Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20)

Exploiting Neuron and Synapse Filter Dynamics in Spatial Temporal Learning of Deep Spiking Neural Network

Haowen Fang, Amar Shrestha, Ziyi Zhao and Qinru Qiu
Syracuse University
{hfang02, amshrest, zzhao37}@syr.edu, [email protected]

Abstract

The recently discovered spatial-temporal information processing capability of bio-inspired spiking neural networks (SNN) has enabled some interesting models and applications. However, designing large-scale and high-performance models is still a challenge due to the lack of robust training algorithms. A bio-plausible SNN model with spatial-temporal property is a complex dynamic system. Synapses and neurons behave as filters capable of preserving temporal information. As such neuron dynamics and filter effects are ignored in existing training algorithms, the SNN downgrades into a memoryless system and loses the ability of temporal signal processing. Furthermore, spike timing plays an important role in information representation, but conventional rate-based spike coding models only consider spike trains statistically and discard the information carried by their temporal structures. To address the above issues and exploit the temporal dynamics of SNNs, we formulate the SNN as a network of infinite impulse response (IIR) filters with neuron nonlinearity. We propose a training algorithm that learns spatial-temporal patterns by searching for the optimal synapse filter kernels and weights. The proposed model and training algorithm are applied to construct associative memories and classifiers for synthetic and public datasets including MNIST, NMNIST, DVS 128, etc. Their accuracy outperforms state-of-the-art approaches.

1 Introduction

Spiking neural networks have demonstrated their capability in signal processing and pattern detection by mimicking the behavior of biological neural systems. In SNNs, information is represented by sparse and discrete spike events. The sparsity of spike activities can be exploited by event-driven implementation for energy efficiency. In a more bio-realistic neuron and synapse model, each neuron is a dynamic system, which is capable of spatial temporal information processing. A network made of such neurons can memorize and detect spatial temporal patterns with an ability superior to conventional artificial neural networks (ANN) [Wu et al., 2018b].

The potential of SNNs has not been fully explored. First of all, due to the lack of unified and robust training algorithms, the performance of SNNs is still not comparable with that of deep neural networks (DNN). Directly adapting backpropagation is not feasible because their output is a sequence of Dirac delta functions, hence non-differentiable. Secondly, most SNN models and training algorithms use rate coding, representing a numerical value in a DNN by spike counts in a time window, and consider only the statistics of spike activities. The temporal structure of a spike train and spike timing also convey information [Mohemmed et al., 2012]. Spike trains with similar rates may have distinct temporal patterns representing different information. To detect the temporal pattern in a spike train, novel synapse and neuron models with temporal dynamics are needed. However, synapse dynamics are often ignored in the computational models of SNNs.

To address the problem of the non-differentiable neuron output, one approach is to train an ANN such as a multi-layer perceptron (MLP) and convert the model to an SNN. This method is straightforward, but it requires additional fine-tuning of weights and thresholds [Diehl et al., 2015]. There are also works that directly apply backpropagation to SNN training by approximating the gradient of the spiking function [Lee et al., 2016; Esser et al., 2015; Shrestha et al., 2019], or utilizing gradient surrogates [Wu et al., 2018b; Shrestha and Orchard, 2018]. Other approaches include using derivatives of a soft spike [Neftci et al., 2019] or of the membrane potential [Zenke and Ganguli, 2018].

The ability to capture temporal patterns relies on neuron and synapse dynamics [Gütig and Sompolinsky, 2006]. Synapse function can be modeled as filters, whose states preserve rich temporal information. The challenge is how to capture the dependencies between the current SNN states and previous input spikes. This challenge has been addressed by some existing works. [Gütig and Sompolinsky, 2006] and [Gütig, 2016] train individual neurons to classify different temporal spike patterns. [Mohemmed et al., 2012] is capable of training neurons to associate an input spatial temporal pattern with a specific output spike pattern. However, the aforementioned works cannot be extended to multiple layers and therefore are not scalable.

Some recent works utilize backpropagation through time (BPTT) to address the temporal dependency problem. [Wu et al., 2018b] proposed a simplified iterative leaky integrate and fire (LIF) neuron model. [Gu et al., 2019] derived an iterative model from a current-based LIF neuron. Based on the iterative model, the network can be unrolled, hence BPTT is possible. However, these works only consider the temporal dynamics of the membrane potential; the synapse dynamics and the filter effect of the SNN are ignored. There are also works that introduced the concept of IIR and FIR filters into the Multi Layer Perceptron (MLP) [Back and Tsoi, 1991; Campolucci et al., 1999], which enabled the MLP to model time series.

In this work, our contributions are summarized as follows:

1. The dynamic behavior of the LIF neuron is formulated by infinite impulse response (IIR) filters. We exploit the synapse and neuron filter effect and derive a general representation of the SNN as a network of IIR filters with neuron non-linearity.

2. A general algorithm is proposed to train such an SNN to learn both rate-based and spatial temporal patterns. The algorithm does not only learn the synaptic weights, but is also capable of optimizing the impulse response kernels of the synapse filters to improve convergence. Similar learning behavior has been discovered in biological systems [Hennig, 2013]. Our training algorithm can be applied to train simple LIF neurons, as well as neurons with more complex synapses such as the alpha synapse, dual-exponential synapse, etc.

3. Our algorithm is tested on various datasets including MNIST, neuromorphic MNIST, DVS128 gesture, TIDIGITS and the Australian Sign Language dataset, and outperforms state-of-the-art approaches.

2 Neuron Model

Without loss of generality, we consider a LIF neuron with a dual-exponential synapse for its biological plausibility. The neuron can be described as a hybrid system, i.e. the membrane potential and synapse status evolve continuously over time, depicted by ordinary differential equations (ODE), while a spike event triggers the update of the state variables, as follows [Brette et al., 2007]:

\tau_m \frac{dv(t)}{dt} = -(v(t) - v_{rest}) + \eta^{\eta/(\eta-1)} \sum_{i}^{M} w_i x_i(t)   (1a)
\tau_s \frac{dx_i(t)}{dt} = -x_i(t)   (1b)
x_i(t) \leftarrow x_i(t) + 1, \quad \text{upon receiving a spike}   (1c)
v(t) \leftarrow v_{rest}, \quad \text{if } v(t) = V_{th}   (1d)

where x_i is the state variable of the ith synapse, w_i is the associated weight, and M is the total number of synapses. τ_m and τ_s are time constants, and η = τ_m/τ_s. v and v_rest are the neuron membrane potential and rest potential. For simplicity, we set v_rest = 0. Every synapse has its own potential, which is called the postsynaptic potential (PSP). The neuron accumulates the PSPs of all input synapses. The membrane potential resets when an output spike is generated.

The ODE system is linear time invariant (LTI). It can also be interpreted as the convolution of a filter's impulse response with the input spike train, which leads to the spike response model [Gerstner et al., 2014]. The relation between v(t), O(t) and the historical spike input can clearly be seen in the spike response model. We denote the input spike trains as sequences of time-shifted Dirac delta functions, S_i(t) = Σ_j δ(t − t_i^j), where t_i^j denotes the jth spike arrival time at the ith input synapse. Similarly, the output spike train can be defined as O(t) = Σ_f δ(t − t^f), t^f ∈ {t^f : v(t^f) = V_th}. To simplify the discussion, we consider only one synapse. The impulse response kernel k(t) of a neuron described by the above ODE system is obtained by passing a single spike at time 0 to the input, such that the initial conditions are x(0) = 1 and v(0) = 0. By solving equations 1a and 1b, we have k(t) = η^{η/(η−1)} (e^{−t/τ_m} − e^{−t/τ_s}). Given a general input S(t), the PSP is the convolution of k(t) and S(t). For a neuron with M synapses, without reset, the sub-threshold membrane potential is the summation of all PSPs, such that v(t) = Σ_{i}^{M} w_i ∫_0^∞ k(s) S_i(t − s) ds.

In a hybrid model, the reset is modeled by simply setting v to v_rest, regarding the reset as the start of the next evaluation and discarding the neuron's history information. A more biological way is to treat the reset as a negative current impulse applied to the neuron itself [Gerstner et al., 2014]. The reset impulse response is h(t) = −V_th e^{−t/τ_r}, where τ_r controls the decay speed of the reset impulse. The membrane potential is then the summation of all PSPs and the reset voltage:

v(t) = \int_0^{\infty} h(s) O(t-s)\,ds + \sum_{i}^{M} w_i \int_0^{\infty} k(s) S_i(t-s)\,ds   (2)

Treating the reset as a negative impulse enables an adaptive threshold, which is observed in biological neurons. A neuron's threshold depends on its prior spike activity. With adaptation, frequent spike activity increases the reset voltage, which inhibits the neuron activity and prevents SNNs from over-activation, such that additional tuning methods such as weight-threshold balancing [Diehl et al., 2015] are not necessary.

The above equations reveal the filter nature of the biologically realistic neuron model. Each synapse acts like a low-pass filter. The synapse filter is causal, and the kernel is defined to decay over time, hence the current state of the PSP is determined by all previous input spikes up to the current time. This temporal dependency calls for temporal error propagation in the training.

3 Neuron and Synapse as IIR Filters

In practice, for computational efficiency, spiking neural networks are usually simulated in the discrete time domain and the network states are evaluated every unit time. The discrete time version of equation 2 can be written as:

v[t] = \sum_{s} h[s] O[t-s] + \sum_{i}^{M} w_i \sum_{s} k[s] S_i[t-s]   (3)

where t ∈ Z≥0. It is clear that v[t] is a combination of a reset filter and multiple synapse filters.
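To make the convolution view of equations 2 and 3 concrete, the following sketch simulates a single neuron by directly convolving the kernels with the spike trains. This is an illustration, not the authors' released code; the time constants, weight, reset time constant and spike times are placeholders (τ_m = 4, τ_s = 1 and V_th = 1 merely echo the settings later used in Section 5).

```python
import numpy as np

T = 100
tau_m, tau_s, tau_r = 4.0, 1.0, 2.0   # placeholders; tau_r is an assumption
V_th, w = 1.0, 0.8
eta = tau_m / tau_s

t = np.arange(T)
# Dual-exponential PSP kernel: k(t) = eta^(eta/(eta-1)) * (exp(-t/tau_m) - exp(-t/tau_s))
k = eta ** (eta / (eta - 1.0)) * (np.exp(-t / tau_m) - np.exp(-t / tau_s))
# Reset kernel (negative current impulse): h(t) = -V_th * exp(-t/tau_r)
h = -V_th * np.exp(-t / tau_r)

S = np.zeros(T)                      # one input spike train (Dirac deltas become 0/1 samples)
S[[5, 20, 21, 60]] = 1.0

# Equation 3: v[t] = sum_s h[s]*O[t-s] + w * sum_s k[s]*S[t-s].
# Because O feeds back through the reset kernel, spikes must be generated step by step.
v = np.zeros(T)
O = np.zeros(T)
for n in range(T):
    psp = w * np.dot(k[:n + 1], S[n::-1])     # sum_s k[s] * S[n-s]
    reset = np.dot(h[:n + 1], O[n::-1])       # sum_s h[s] * O[n-s]
    v[n] = psp + reset
    O[n] = 1.0 if v[n] >= V_th else 0.0
```

Note that each time step sums over the entire spike history, which is exactly the cost the IIR reformulation below avoids.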


However, this convolution form is not practical for implementation because of the infinite number of convolution coefficients. We express the above system using Linear Constant-Coefficient Difference Equations (LCCDE):

v[t] = -V_{th}\, r[t] + \sum_{i}^{M} w_i f_i[t]   (4a)
r[t] = e^{-1/\tau_r}\, r[t-1] + O[t-1]   (4b)
f_i[t] = \alpha_1 f_i[t-1] + \alpha_2 f_i[t-2] + \beta x_i[t-1]   (4c)

where f_i[t] denotes the ith synapse filter, which is a second order IIR filter, r[t] is the reset filter, α_1 = e^{−1/τ_m} + e^{−1/τ_s}, α_2 = −e^{−(τ_m+τ_s)/(τ_m τ_s)}, and β = e^{−1/τ_m} − e^{−1/τ_s}.

Introducing synapse dynamics could cause significantly large computation overhead, because the number of synapses is quadratic in the number of neurons. Maintaining such a large number of synaptic states is infeasible. In a biological system, spikes are transmitted through axons, and an axon connects to multiple destination neurons through synapses. Therefore, the synapses that connect to the same axon have identical spike history, hence the same states. Based on this observation, tracking the states of synapses that have the same fan-in neuron is unnecessary, as these synapses can share the same state and computation.

The neuron itself can also be a filter, and v[t] may also rely on its previous states. We can extend equations 4a - 4c to a more general form, such that the SNN can be interpreted as a network of IIR filters with non-linear neurons:

V_i^l[t] = \lambda V_i^l[t-1] + I_i^l[t] - V_{th} R_i^l[t]   (5a)
I_i^l[t] = \sum_{j}^{N_{l-1}} w_{i,j}^l F_j^l[t]   (5b)
R_i^l[t] = \theta R_i^l[t-1] + O_i^l[t-1]   (5c)
F_j^l[t] = \sum_{p=1}^{P} \alpha_{j,p}^l F_j^l[t-p] + \sum_{q=0}^{Q} \beta_{j,q}^l O_j^{l-1}[t-q]   (5d)
O_i^l[t] = U(V_i^l[t] - V_{th})   (5e)
U(x) = 0 \text{ for } x < 0, \text{ otherwise } 1   (5f)

where l and i denote the index of the layer and neuron respectively, j denotes the input index, t is the time, and N_l is the number of neurons in the lth layer. V^l_i[t] is the neuron membrane potential, I^l_i[t] is the weighted input, R^l_i[t] is the reset voltage, F^l_j[t] is the PSP, O^l_i[t] is the spike output, and U(x) is a Heaviside step function. P and Q denote the feedback and feed-forward orders. λ, θ, α^l_{j,p} and β^l_{j,q} are the coefficients of the neuron filter, reset filter and synapse filter respectively. Equation 5d is a general form of IIR filter; it allows the PSP to take arbitrary shapes. The above formulation is not specific to particular neuron models; it provides a flexible and universal representation and is capable of describing more complex spiking neuron models than the LIF neuron. For example, by setting α_1 = 2e^{−1/τ}, α_2 = −e^{−2/τ}, α_p = 0 for p ∈ {3, ..., P}, β_1 = (1/τ)e^{−1/τ} and β_q = 0 for q ∈ {0, 2, 3, ..., Q}, it models a neuron with an alpha synapse. By setting α_p = 0 for p ∈ {1, 2, ..., P}, β_0 = 1, and β_q = 0 for q ∈ {1, 2, ..., Q}, the synapse filter is removed and the model becomes the simple LIF neuron as in [Diehl et al., 2015; Gu et al., 2019]. Based on equations 5a - 5f, a general model of a spiking neuron can be represented as a network of IIR filters, as shown in Figure 1. Axonal delay is explicitly modeled in equation 5d by the delayed input β^l_{j,q} O^{l−1}_j[t − q], hence it enables more complex and biologically plausible temporal behavior. Neurons can also have heterogeneous synapses, i.e. the synapses' feed-forward and feedback orders can vary across layers. To avoid notation clutter, we assume that all neurons in this paper have homogeneous synapse types.

Figure 1: General neuron model as IIR filters

Equations 5a to 5f provide an explicitly iterative way to model synapse and neuron dynamics, hence it is possible to unfold the network over time and apply BPTT. The spatial and temporal data flow and the unfolded network with a second order synapse filter are shown in Figure 2. Similar formulations can be found in [Wu et al., 2018b; Gu et al., 2019], however they are aimed at specific neuron models.

Figure 2: Spatial temporal data flow (layers n−1, n, n+1 unrolled over time steps t−1, t, t+1; spatial path and temporal path)
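Read this way, one layer of the SNN is just a bank of IIR filters followed by a threshold. The following is a minimal PyTorch sketch of equations 5a-5f for one fully connected layer; it is an illustrative reading, not the authors' released code. The coefficient initialization uses the dual-exponential values implied by τ_m = 4, τ_s = 1 from Section 5, the filter coefficients are shared across input channels for brevity, and the raw hard threshold would in practice be paired with the surrogate gradient of Section 4.1.

```python
import torch
import torch.nn as nn

class IIRSpikingLayer(nn.Module):
    """Sketch of equations 5a-5f for one fully connected spiking layer."""
    def __init__(self, n_in, n_out, P=2, Q=1, v_th=1.0, lam=0.0, theta=0.7788):
        super().__init__()
        self.w = nn.Parameter(0.1 * torch.randn(n_out, n_in))
        # Synapse filter coefficients: alpha (feedback, order P), beta (feedforward, order Q+1).
        # Registering them as Parameters lets the rule of equation 15 adapt the PSP kernel.
        # Initial values correspond to the dual-exponential synapse with tau_m=4, tau_s=1.
        self.alpha = nn.Parameter(torch.tensor([1.147, -0.287] + [0.0] * (P - 2)))
        self.beta = nn.Parameter(torch.tensor([0.0, 0.411] + [0.0] * (Q - 1)))
        self.v_th, self.lam, self.theta = v_th, lam, theta

    def forward(self, spikes_in):                   # spikes_in: [T, batch, n_in]
        T, B, n_in = spikes_in.shape
        P, Q1 = self.alpha.numel(), self.beta.numel()
        F = spikes_in.new_zeros(P, B, n_in)         # past PSPs F[t-1], ..., F[t-P]
        Oin = spikes_in.new_zeros(Q1, B, n_in)      # past inputs O^{l-1}[t], ..., O^{l-1}[t-Q]
        V = spikes_in.new_zeros(B, self.w.shape[0])
        R = torch.zeros_like(V)
        O = torch.zeros_like(V)
        out = []
        for t in range(T):
            Oin = torch.cat([spikes_in[t:t + 1], Oin[:-1]], dim=0)
            # Equation 5d: F[t] = sum_p alpha_p F[t-p] + sum_q beta_q O^{l-1}[t-q]
            F_t = (self.alpha.view(P, 1, 1) * F).sum(0) + (self.beta.view(Q1, 1, 1) * Oin).sum(0)
            F = torch.cat([F_t.unsqueeze(0), F[:-1]], dim=0)
            I_t = F_t @ self.w.t()                  # equation 5b
            R = self.theta * R + O                  # equation 5c
            V = self.lam * V + I_t - self.v_th * R  # equation 5a
            O = (V >= self.v_th).float()            # equations 5e/5f (hard threshold)
            out.append(O)
        return torch.stack(out)                     # [T, batch, n_out]
```

Stacking such layers and unrolling the time loop over all T steps is what makes the BPTT-based training of the next section applicable.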

4 Spatial Temporal Error Propagation

We discuss spatial temporal backpropagation in the context of two learning tasks. In the first task, the neuron that fires the most represents the correct result. Since this is a classification task, we use the cross-entropy loss, and the spike count of the output neuron represents the probability. The loss is defined as:

E_{rate} = -\sum_{i}^{N_L} y_i \log(p_i)   (6)

where p_i is given by:

p_i = \frac{\exp\big(\sum_{t}^{T} O_i^L[t]\big)}{\sum_{j=1}^{N_L} \exp\big(\sum_{t}^{T} O_j^L[t]\big)}   (7)

where y_i is the label, L is the number of layers, and O^L_i[t] denotes the output of the last layer.

In the second learning task, the goal is to train the SNN to generate spikes at specified times, such that the output spike pattern O[t] is spatially and temporally similar to the target spike pattern S_target[t]. We refer to it as temporal learning. The loss function of this learning is the distance between the actual output spike trains and the target spike trains. Inspired by the Van Rossum distance, we pass the actual and target spike trains through a synapse filter k[t] to convert them to continuous traces. The loss is defined as:

E_{dist} = \frac{1}{2T} \sum_{i=1}^{N_L} \sum_{t=1}^{T} \big(k[t] * O_i^L[t] - k[t] * S_{target}^i[t]\big)^2   (8)

where S^i_target[t] is the ith spike train of the target spike pattern.

For both tasks, we define: δ^l_i[t] = ∂E/∂O^l_i[t], ε^l_i[t] = ∂U(V^l_i[t] − V_th)/∂V^l_i[t], and κ^l_i[t] = ∂V^l_i[t+1]/∂V^l_i[t]. Please note that the spike activation function U(x) is not differentiable; its approximation will be discussed in Section 4.1. By unfolding the model into a spatial path and a temporal path as shown in Figure 2, BPTT can be applied to train the network. κ^l_i[t] can be computed as:

\kappa_i^l[t] = \partial V_i^l[t+1] / \partial V_i^l[t] = \lambda - V_{th}\,\epsilon_i^l[t]   (9)

δ^l_i[t] can be computed recursively as follows:

\delta_i^l[t] = \sum_{q=0}^{Q} \sum_{j}^{N_{l+1}} \frac{\partial E}{\partial O_j^{l+1}[t+q]} \frac{\partial O_j^{l+1}[t+q]}{\partial O_i^l[t]} + \frac{\partial E}{\partial O_i^l[t+1]} \frac{\partial O_i^l[t+1]}{\partial O_i^l[t]}   (10)

where the temporal term expands as

\frac{\partial E}{\partial O_i^l[t+1]} \frac{\partial O_i^l[t+1]}{\partial O_i^l[t]} = \frac{\partial E}{\partial O_i^l[t+1]} \frac{\partial O_i^l[t+1]}{\partial V_i^l[t+1]} \frac{\partial V_i^l[t+1]}{\partial R_i^l[t+1]} \frac{\partial R_i^l[t+1]}{\partial O_i^l[t]}   (11)
= -V_{th}\, \delta_i^l[t+1]\, \epsilon_i^l[t+1]   (12)

and the spatial term expands as

\frac{\partial E}{\partial O_j^{l+1}[t+q]} \frac{\partial O_j^{l+1}[t+q]}{\partial O_i^l[t]} = \frac{\partial E}{\partial O_j^{l+1}[t+q]} \frac{\partial O_j^{l+1}[t+q]}{\partial V_j^{l+1}[t+q]} \frac{\partial V_j^{l+1}[t+q]}{\partial I_j^{l+1}[t+q]} \frac{\partial I_j^{l+1}[t+q]}{\partial O_i^l[t]} = \beta_{j,q}^{l+1}\, \delta_j^{l+1}[t+q]\, \epsilon_j^{l+1}[t+q]\, w_{j,i}^{l+1}   (13)

where δ^l_i[t+q] = 0 for t + q > T. Unlike an LSTM/RNN, or SNNs such as [Wu et al., 2018b; Gu et al., 2019], there may be dependencies from layer l+1 to layer l at multiple time steps due to the axonal delay. Based on the above equations, the error can be propagated recursively. By applying the chain rule, we can obtain the gradient with respect to the weights:

\frac{\partial E}{\partial w^l} = \sum_{t=1}^{T} \delta^l[t]\,\epsilon^l[t] \Big( F^l[t] + \sum_{i=1}^{t-1} F^l[i] \prod_{j=i}^{t-1} \kappa^l[j] \Big)   (14)

In a real biological system, synapses may respond to spikes differently. The PSP kernels can be modulated by the input as part of the synaptic plasticity [Hennig, 2013]. It is possible to employ gradient descent to optimize the filter kernels in equation 5d [Campolucci et al., 1999]. The gradients of the loss with respect to α^l_{j,p} and β^l_{j,q} are:

\frac{\partial E}{\partial \alpha_{j,p}^l\,(\partial \beta_{j,q}^l)} = \sum_{t=1}^{T} \sum_{i}^{N_l} \delta_i^l[t]\,\epsilon_i^l[t]\, \frac{\partial I_i^l[t]}{\partial \alpha_{j,p}^l\,(\partial \beta_{j,q}^l)}   (15)

where ∂I^l_i[t]/∂α^l_{j,p} and ∂I^l_i[t]/∂β^l_{j,q} are:

\frac{\partial I_i^l[t]}{\partial \alpha_{j,p}^l} = w_{i,j}^l \Big( F_j^l[t-p] + \sum_{r=1}^{P} \alpha_{j,r}^l F_j^l[t-p-r] \Big)   (16)
\frac{\partial I_i^l[t]}{\partial \beta_{j,q}^l} = w_{i,j}^l \Big( O_j^{l-1}[t-q] + \sum_{r=1}^{P} \alpha_{j,r}^l O_j^{l-1}[t-q-r] \Big)   (17)

The above learning rule assumes the SNN to be an LTI system. The loss calculation, error propagation, and the filter coefficient and synaptic weight updates are performed at the end of each training iteration. Therefore, within one iteration, the SNN is still linear time-invariant.

4.1 Spike Function Gradient Approximation

The non-differentiable spike activation is a major road-block for applying backpropagation. One solution is to use a gradient surrogate [Neftci et al., 2019]. In the forward path, a spike is still generated by a hard threshold function, while in the backward path, the gradient of the hard threshold function is replaced by a smooth function. One such surrogate can be the spike probability [Esser et al., 2015; Neftci et al., 2019]. Although the LIF neuron is deterministic, stochasticity can be obtained from noise [Stevens and Zador, 1996]. Under Gaussian noise of mean 0 and standard deviation σ, in a short interval, the LIF neuron can behave like a Poisson neuron, such that the spike probability is a function of the membrane potential v as follows:

P(v) = \frac{1}{2}\,\mathrm{erfc}\Big( \frac{V_{th} - v}{\sqrt{2}\,\sigma} \Big)   (18)

where erfc(x) is the complementary error function. With this replacement, the gradient of U(x) can be approximated as:

\frac{\partial U(v)}{\partial v} \approx \frac{\partial P(v)}{\partial v} = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(V_{th}-v)^2 / (2\sigma^2)}   (19)
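The erfc-based surrogate of equations 18-19 drops naturally into an autograd framework: the forward pass keeps the hard threshold, and only the backward pass is replaced. Below is a minimal PyTorch sketch, not the authors' code; the value of σ is an illustrative hyperparameter, since the excerpt above does not fix it.

```python
import math
import torch

class SpikeFn(torch.autograd.Function):
    """Hard threshold forward, erfc-derived surrogate backward (equations 18-19)."""
    sigma = 0.5   # smoothing width; placeholder value

    @staticmethod
    def forward(ctx, v, v_th):
        ctx.save_for_backward(v)
        ctx.v_th = v_th
        return (v >= v_th).float()                 # U(v - V_th)

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        sigma = SpikeFn.sigma
        # dP/dv = exp(-(V_th - v)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)   (equation 19)
        surrogate = torch.exp(-(ctx.v_th - v) ** 2 / (2 * sigma ** 2)) \
                    / (math.sqrt(2 * math.pi) * sigma)
        return grad_out * surrogate, None          # no gradient for the threshold

# usage inside the time loop of the layer sketched in Section 3:
#   O = SpikeFn.apply(V, 1.0)
```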

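For the temporal learning task, the Van Rossum-style distance of equation 8 can likewise be written directly against the filtered spike traces. The sketch below is an illustration under assumptions: the exponential kernel shape, its length K and time constant are placeholders, as the excerpt above does not specify them, and the result is additionally averaged over the batch.

```python
import torch
import torch.nn.functional as F

def van_rossum_loss(out_spikes, target_spikes, tau=4.0, K=32):
    """Filter output and target spike trains with a causal kernel k[t] and take the
    squared difference of the traces (equation 8). Shapes: [T, batch, N_L]."""
    T = out_spikes.shape[0]
    k = torch.exp(-torch.arange(K, dtype=out_spikes.dtype, device=out_spikes.device) / tau)
    k = k.flip(0).view(1, 1, K)                    # conv1d computes correlation, so flip

    def trace(s):
        x = s.permute(1, 2, 0).reshape(-1, 1, T)   # [batch*N_L, 1, T]
        x = F.pad(x, (K - 1, 0))                   # left-pad so the filter stays causal
        return F.conv1d(x, k)                      # filtered trace, [batch*N_L, 1, T]

    diff = trace(out_spikes) - trace(target_spikes)
    return 0.5 / T * diff.pow(2).sum(dim=-1).mean()
```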

5 Experiments

The proposed model and algorithm are implemented in PyTorch (code is available at: https://github.com/Snow-Crash/snn-iir). We demonstrate the effectiveness using three experiments: the first experiment is a non-trivial generative task using associative memory; the second is vision classification; and the third is to classify temporal patterns. In the following experiments, we use the Adam optimizer, the learning rate is set to 0.0001, and the batch size is 64. We employ the synapse model depicted by equation 4c, in which τ_m = 4, τ_s = 1, λ = 0, θ = e^{−1/τ_m}, and V_th = 1.

5.1 Associative Memory

An associative memory network retrieves the stored pattern that most closely resembles the one presented to it. To demonstrate the capability of our approach to learn complex spatial temporal spike patterns, we train a network of structure 300x500x200x500x300. We generate 10 spatial temporal spike train patterns, each containing 300 spike trains of length 300; samples of these patterns are shown in Figure 3a. Each dot corresponds to a spike event, the x-axis represents the time, and the y-axis represents the spike train index. The SNN is trained to reconstruct the pattern. The first column of Figure 3b shows two noisy sample inputs. Noisy samples are formed by adding random noise, which includes obfuscation and deletion of some parts of the patterns, jitter in the input spikes' timing following a Gaussian distribution, and random background spikes. After 50 epochs of training, the network is able to reconstruct the original patterns and remove the background noise. The corresponding outputs at epoch 5 and 50 are shown in Figure 3b. Such a task is difficult for rate-based training methods, as they are not capable of capturing temporal dependencies. It is noteworthy that the intermediate layer has 200 neurons, which is smaller than the input layer, and that the intermediate layer is learning the spatial and temporal representation of the input patterns. Thus, this network also acts like a spatial temporal auto-encoder.

We drove the input of the network with 64 different testing samples and recorded the output of the 200 neurons in the intermediate layer. Figure 4a color codes the spiking rate of those neurons. The x-axis gives the index of the neurons, and the y-axis gives the index of the different testing samples. Those samples belong to 10 different classes, and are sorted so that data of the same class are placed close to each other vertically. The 10 different colors on the left side bar indicate each of the 10 classes. The pixel (x, y) represents the spiking rate of neuron x given testing sample y. The spiking rate of any neuron is almost constant regardless of which class the testing sample belongs to. Figure 4b shows the Van Rossum distances between the 200 neurons' output spike trains. The x-axis and y-axis give the input sample index. The color intensity of pixel (x, y) is proportional to the Van Rossum distance between the 200 neurons' outputs when given input samples x and y respectively. Similar to Figure 4a, the color bar on the left side indicates the class of each sample. It can clearly be seen from Figure 4b that the temporal structures of these 200 neurons' outputs are significantly different. The fact that our model is able to take those 64 sets of spike trains with almost the same firing rate and generate 10 different classes indicates that it is capable of utilizing features in the temporal distribution of the spikes in addition to the spike rates.

5.2 Vision Tasks

We evaluated our method on three vision datasets. Results and comparisons with state-of-the-art works in the SNN domain are shown in Table 1. For MNIST, we utilize rate-based encoding to convert the input image into 784 spike trains, where the number of spikes in each spike train is proportional to the pixel value. With a convolutional SNN with the structure 32C3-32C3-64C3-P2-64C3-P2-512-10, our model achieves state-of-the-art accuracy in the SNN domain. The work next in terms of accuracy (99.42%) [Jin et al., 2018] employs an ensemble of 64 spiking CNNs. Compared to conversion-based approaches that require hyper-parameter search and fine tuning [Diehl et al., 2015], our approach does not require post-training processing. It directly trains the SNN using BPTT and obtains models with quality comparable to DNNs.

Unlike MNIST, which consists of static images, Neuromorphic MNIST (N-MNIST) is a dynamic dataset which consists of spike events captured by a DVS camera and is a popular dataset for SNN evaluation. An N-MNIST sample is obtained by mounting the DVS camera on a moving platform to record an MNIST image on a screen. The pixel changes trigger spike events. Thus, this dataset contains more temporal information. With a convolutional network of size 32C3-32C3-64C3-P2-64C3-P2-256-10, our model outperforms the current state-of-the-art. The results are shown in Table 1. [Lee et al., 2016] introduced an additional winner-take-all (WTA) circuit to improve performance. [Wu et al., 2019] gets 99.35% accuracy with a very large network, whose structure is 128C3-256C3-AP2-512C3-AP2-1024C3-512C3-1024FC-512FC-Voting; there is also an additional voting circuit at the output layer. We use a significantly smaller network to achieve the same accuracy, and no additional voting layer or WTA circuits are required.

The DVS128 Gesture Dataset contains 10 hand gestures such as hand clapping, arm rolling, etc., collected from 29 individuals under 3 illumination conditions using a DVS camera. The network is trained to classify these hand gestures. This dataset contains rich temporal information. For this task, we utilize a network with a 64C7-32C3-32C3-256-10 structure. The advantage of our work is clearly seen in the third column of Table 1. We achieved 96.09% accuracy, which is state-of-the-art in the spiking domain, while other works, such as [Amir et al., 2017], require additional filters for data preprocessing and a WTA circuit at the output layer. Our model and learning algorithm do not need specialized neuron circuits or any data preprocessing techniques, as the spike streams are directly fed into the network.
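The rate-based encoding used for the MNIST experiments above can be realized in several ways; the paper only states that the spike count is proportional to the pixel value. A common choice, shown as a sketch below, is Bernoulli sampling per time step with the pixel intensity as the firing probability (the sampling scheme and T are assumptions).

```python
import torch

def rate_encode(images, T=100):
    """Bernoulli rate coding: each pixel intensity in [0, 1] is used as the per-step
    firing probability, so the expected spike count over T steps is proportional to
    the pixel value. images: [batch, 28, 28] -> returns [T, batch, 784]."""
    p = images.clamp(0, 1).reshape(images.shape[0], -1)
    return (torch.rand(T, *p.shape, device=p.device) < p).float()
```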


We also studied the effect of training the synapse response kernels. The learned synapse kernels are shown in Figure 5. The solid red line represents the original kernel. The decay speed of the synapse response of the learned kernels diverges from the original kernel. A slower decay speed indicates that the synapses are capable of remembering information for a longer time. Such behavior is similar to the gates in an LSTM. The accuracies with and without training the synapse filter kernels are shown in Figure 6. No improvement is observed for the MNIST dataset; the accuracies with and without a trained kernel are 99.46% and 99.43% respectively. This is because MNIST is a static dataset, hence has no temporal information. There is a slight improvement on N-MNIST from training the synapse filter kernel: the accuracy increases from 99.24% to 99.39%. On the DVS128 dataset, the advantage of training the synapse filter kernel is clearly seen; the model not only converges faster, the accuracy also increases from 94.14% to 96.09%.

Figure 3: Spatial temporal input and output spike patterns of the associative memory network. (a) Original patterns; (b) input (jitter + noise + deletion) and output spike trains at epoch 5 and epoch 50 (spike train index vs. time).

Figure 4: Intermediate layer output spike rate and Van Rossum distance. (a) Intermediate layer spike rate (neuron index vs. input sample index); (b) van Rossum distance (input sample index vs. input sample index).

Figure 5: Learned synapse impulse response (PSP vs. time).

Figure 6: Training performance comparison with and without training the synapse filter. (a) NMNIST accuracy; (b) DVS128 accuracy (accuracy vs. epoch).

5.3 Time Series Classification

Our work also shows advantages in time series classification. We evaluated our work on the TIDIGITS and Australian Sign Language [Kadous and others, 2002] datasets. TIDIGITS is a speech dataset that consists of more than 25,000 digit sequences spoken by 326 individuals. For training and testing, we extracted MFCC features from each sample, resulting in 20 time series of length 90. The Australian Sign Language dataset [Kadous and others, 2002] is a multivariate time series dataset, collected from 22 data glove sensors that track acceleration and hand movements such as roll, pitch, etc. Each recorded hand sign is a sequence of sensor readings. The average duration of a hand sign is 45 samples. The dataset has 95 classes of hand signs. To convert the time series into spike trains, we use a current-based LIF neuron as an encoder. It accumulates the input data as current and converts the time-varying continuous values into time-varying spike patterns.
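A minimal reading of this current-based LIF encoder is sketched below; the leak, gain, threshold and the subtraction-style reset are illustrative assumptions, since the paper does not report these values.

```python
import torch

def lif_encode(series, v_th=1.0, leak=0.9, gain=1.0):
    """Current-based LIF encoder sketch: each input channel is integrated as a
    current, and a spike is emitted (and v_th subtracted) whenever the accumulated
    value crosses the threshold. series: [T, batch, channels] -> [T, batch, channels];
    leak, gain and v_th are placeholders."""
    v = torch.zeros_like(series[0])
    spikes = []
    for x_t in series:                 # iterate over time steps
        v = leak * v + gain * x_t      # integrate the input as current
        s = (v >= v_th).float()
        v = v - s * v_th               # one possible reset: subtract the threshold
        spikes.append(s)
    return torch.stack(spikes)
```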


The networks used to classify TIDIGITS and Australian Sign Language have structures 300-300-11 and 300-300-95 respectively. We trained a two-layer stacked LSTM of unit size 300 as a baseline. Results are shown in Table 2. The best accuracy on TIDIGITS is achieved by [Abdollahi and Liu, 2011], however, it is a non-spiking approach. On the Australian Sign Language dataset, we outperformed the vanilla LSTM and DNN-based approaches. [Shrestha et al., 2019] uses EMSTDP to train an SNN to classify 50 classes of the hand signs; the network size is 990-150-150-50. It buffers the entire sequence and flattens the time series into a vector. Our work, in contrast, is trained to classify all 95 classes, and it processes the time series in a more efficient and natural way: the input data is converted into spikes on the fly. Since flattening is no longer necessary, the input dimension is also reduced.

Method                          MNIST    NMNIST   IBM-DVS128
[Wu et al., 2018b]              99.42    98.78    -
[Jin et al., 2018]              99.42    98.84    -
[Wu et al., 2019]               -        99.35    -
[Lee et al., 2016]              99.31    98.66    -
[Gu et al., 2019]               98.60    -        -
[Tavanaei and Maida, 2019]      97.20    -        -
[Shrestha and Orchard, 2018]    99.36    99.2     93.64
[Kaiser et al., 2018]           98.77    -        94.18
[Kaiser et al., 2019]           -        -        92.7
[Amir et al., 2017]             -        -        91.77
This work                       99.46    99.39    96.09

Table 1: Results on vision datasets (accuracy, %)

Method                       Architecture   TIDIGITS   Sign language
[Wu et al., 2018a]           SNN            97.6       -
[Pan et al., 2019]           SNN-SVM        94.9       -
[Abdollahi and Liu, 2011]    MFCC-HMM       99.7       -
[Shrestha et al., 2019]      SNN-STDP       -          97.5
[Karim et al., 2019]         LSTM-CNN       -          97.00
Vanilla LSTM                 LSTM           97.9       96.7
This work                    SNN            99.13      98.21

Table 2: Results on temporal datasets (accuracy, %)

6 Conclusion

In this work, we proposed a general model that formulates an SNN as a network of IIR filters with neuron non-linearity. The model is independent of neuron types and is capable of modeling complex neuron and synapse dynamics. Based on this model, we derived a learning rule to efficiently train both the synapse weights and the synapse filter impulse response kernels. The proposed model and method are evaluated on various tasks, including associative memory, MNIST, NMNIST, DVS 128 gesture, TIDIGITS, etc., and achieve state-of-the-art accuracy.

References

[Abdollahi and Liu, 2011] Mohammad Abdollahi and Shih-Chii Liu. Speaker-independent isolated digit recognition using an AER silicon cochlea. In 2011 IEEE Biomedical Circuits and Systems Conference (BioCAS), pages 269–272. IEEE, 2011.

[Amir et al., 2017] Arnon Amir, Brian Taba, David Berg, Timothy Melano, Jeffrey McKinstry, Carmelo Di Nolfo, Tapan Nayak, Alexander Andreopoulos, Guillaume Garreau, Marcela Mendoza, et al. A low power, fully event-based gesture recognition system. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7243–7252, 2017.

[Back and Tsoi, 1991] Andrew D Back and Ah Chung Tsoi. FIR and IIR synapses, a new neural network architecture for time series modeling. Neural Computation, 3(3):375–385, 1991.

[Brette et al., 2007] Romain Brette, Michelle Rudolph, Ted Carnevale, Michael Hines, David Beeman, James M Bower, Markus Diesmann, Abigail Morrison, Philip H Goodman, Frederick C Harris, et al. Simulation of networks of spiking neurons: a review of tools and strategies. Journal of Computational Neuroscience, 23(3):349–398, 2007.

[Campolucci et al., 1999] Paolo Campolucci, Aurelio Uncini, Francesco Piazza, and Bhaskar D Rao. On-line learning algorithms for locally recurrent neural networks. IEEE Transactions on Neural Networks, 10(2):253–271, 1999.

[Diehl et al., 2015] Peter U Diehl, Daniel Neil, Jonathan Binas, Matthew Cook, Shih-Chii Liu, and Michael Pfeiffer. Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In 2015 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2015.

[Esser et al., 2015] Steve K Esser, Rathinakumar Appuswamy, Paul Merolla, John V Arthur, and Dharmendra S Modha. Backpropagation for energy-efficient neuromorphic computing. In Advances in Neural Information Processing Systems, pages 1117–1125, 2015.

[Gerstner et al., 2014] Wulfram Gerstner, Werner M Kistler, Richard Naud, and Liam Paninski. Neuronal dynamics: From single neurons to networks and models of cognition. Cambridge University Press, 2014.

[Gu et al., 2019] Pengjie Gu, Rong Xiao, Gang Pan, and Huajin Tang. STCA: spatio-temporal credit assignment with delayed feedback in deep spiking neural networks. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, pages 1366–1372. AAAI Press, 2019.

[Gütig and Sompolinsky, 2006] Robert Gütig and Haim Sompolinsky. The tempotron: a neuron that learns spike timing-based decisions. Nature Neuroscience, 9(3):420, 2006.

[Gütig, 2016] Robert Gütig. Spiking neurons can discover predictive features by aggregate-label learning. Science, 351(6277):aab4113, 2016.

[Hennig, 2013] Matthias H Hennig. Theoretical models of synaptic short term plasticity. Frontiers in Computational Neuroscience, 7:45, 2013.

[Jin et al., 2018] Yingyezhe Jin, Wenrui Zhang, and Peng Li. Hybrid macro/micro level backpropagation for training deep spiking neural networks. In Advances in Neural Information Processing Systems, pages 7005–7015, 2018.


[Kadous and others, 2002] Mohammed Waleed Kadous et al. Temporal classification: Extending the classification paradigm to multivariate time series. University of New South Wales Kensington, 2002.

[Kaiser et al., 2018] Jacques Kaiser, Hesham Mostafa, and Emre Neftci. Synaptic plasticity dynamics for deep continuous local learning. arXiv preprint arXiv:1811.10766, 2018.

[Kaiser et al., 2019] Jacques Kaiser, Alexander Friedrich, J Tieck, Daniel Reichard, Arne Roennau, Emre Neftci, and Rüdiger Dillmann. Embodied event-driven random backpropagation. arXiv preprint arXiv:1904.04805, 2019.

[Karim et al., 2019] Fazle Karim, Somshubra Majumdar, Houshang Darabi, and Samuel Harford. Multivariate LSTM-FCNs for time series classification. Neural Networks, 116:237–245, 2019.

[Lee et al., 2016] Jun Haeng Lee, Tobi Delbruck, and Michael Pfeiffer. Training deep spiking neural networks using backpropagation. Frontiers in Neuroscience, 10:508, 2016.

[Mohemmed et al., 2012] Ammar Mohemmed, Stefan Schliebs, Satoshi Matsuda, and Nikola Kasabov. SPAN: Spike pattern association neuron for learning spatio-temporal spike patterns. International Journal of Neural Systems, 22(04):1250012, 2012.

[Neftci et al., 2019] Emre O Neftci, Hesham Mostafa, and Friedemann Zenke. Surrogate gradient learning in spiking neural networks. arXiv preprint arXiv:1901.09948, 2019.

[Pan et al., 2019] Zihan Pan, Jibin Wu, Malu Zhang, Haizhou Li, and Yansong Chua. Neural population coding for effective temporal classification. In 2019 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2019.

[Shrestha and Orchard, 2018] Sumit Bam Shrestha and Garrick Orchard. SLAYER: Spike layer error reassignment in time. In Advances in Neural Information Processing Systems, pages 1412–1421, 2018.

[Shrestha et al., 2019] Amar Shrestha, Haowen Fang, Qing Wu, and Qinru Qiu. Approximating back-propagation for a biologically plausible local learning rule in spiking neural networks. In Proceedings of the International Conference on Neuromorphic Systems, page 10. ACM, 2019.

[Stevens and Zador, 1996] Charles F Stevens and Anthony M Zador. When is an integrate-and-fire neuron like a Poisson neuron? In Advances in Neural Information Processing Systems, pages 103–109, 1996.

[Tavanaei and Maida, 2019] Amirhossein Tavanaei and Anthony Maida. BP-STDP: Approximating backpropagation using spike timing dependent plasticity. Neurocomputing, 330:39–47, 2019.

[Wu et al., 2018a] Jibin Wu, Yansong Chua, and Haizhou Li. A biologically plausible speech recognition framework based on spiking neural networks. In 2018 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2018.

[Wu et al., 2018b] Yujie Wu, Lei Deng, Guoqi Li, Jun Zhu, and Luping Shi. Spatio-temporal backpropagation for training high-performance spiking neural networks. Frontiers in Neuroscience, 12, 2018.

[Wu et al., 2019] Yujie Wu, Lei Deng, Guoqi Li, Jun Zhu, Yuan Xie, and Luping Shi. Direct training for spiking neural networks: Faster, larger, better. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 1311–1318, 2019.

[Zenke and Ganguli, 2018] Friedemann Zenke and Surya Ganguli. SuperSpike: Supervised learning in multilayer spiking neural networks. Neural Computation, 30(6):1514–1541, 2018.
