
Extending the Neural Engineering Framework for Nonideal Silicon Synapses

Aaron R. Voelker∗, Ben V. Benjamin†, Terrence C. Stewart∗, Kwabena Boahen† and Chris Eliasmith∗
{arvoelke, tcstewar, celiasmith}@uwaterloo.ca, {benvb, boahen}@stanford.edu
∗Centre for Theoretical Neuroscience, University of Waterloo, Waterloo, ON, Canada.
†Bioengineering and Electrical Engineering, Stanford University, Stanford, CA, U.S.A.

Abstract—The Neural Engineering Framework (NEF) is a theory for mapping computations onto biologically plausible networks of spiking neurons. This theory has been applied to a number of neuromorphic chips. However, within both silicon and real biological systems, synapses exhibit higher-order dynamics and heterogeneity. To date, the NEF has not explicitly addressed how to account for either feature. Here, we analytically extend the NEF to directly harness the dynamics provided by heterogeneous mixed-analog-digital synapses. This theory is successfully validated by simulating two fundamental dynamical systems in Nengo using circuit models validated in SPICE. Thus, our work reveals the potential to engineer robust neuromorphic systems with well-defined high-level behaviour that harness the low-level heterogeneous properties of their physical primitives with millisecond resolution.

I. THE NEURAL ENGINEERING FRAMEWORK

The field of neuromorphic engineering is concerned with building specialized hardware to emulate the functioning of the nervous system [1]. The Neural Engineering Framework (NEF; [2]) complements this goal with a theory for "compiling" dynamical systems onto spiking neural networks, and has been used to develop the largest functioning model of the brain, capable of performing various perceptual, cognitive, and motor tasks [3]. This theory allows one to map an algorithm, expressed in software [4], onto some neural substrate realized in silicon [5]. The NEF has been applied to neuromorphic chips including Neurogrid [5], [6] and a VLSI prototype from ETH Zurich [7].

However, the NEF assumes that the postsynaptic current (PSC) induced by a presynaptic spike is modelled by a first-order lowpass filter (LPF); that is, an impulse representing the incoming spike is convolved with an exponentially decaying impulse-response. Furthermore, the exponential time-constant is assumed to be the same for all synapses within the same population. In silicon, synapses are neither first-order nor homogeneous, and spikes are not represented by impulses.¹ Synapse circuits have parasitic elements that result in higher-order dynamics, transistor mismatch introduces variability from circuit to circuit, and spikes are represented by pulses with finite width and height. Previously, these features restricted the overall accuracy of the NEF within neuromorphic hardware (e.g., in [5], [6], [7]).

The silicon synapses that we study here are mixed-analog-digital designs that implement a pulse-extender [9] and a first-order LPF [10], modelled as a second-order LPF to account for parasitic capacitances. We also account for the variability (i.e., heterogeneity) introduced by transistor mismatch in the extended pulse's width and height, and in the LPF's two time-constants. In §II, we demonstrate how to extend the NEF to directly harness these features for system-level computation. This extension is tested by software simulation in §IV using the circuit models described in §III.

¹These statements also hold for real biological systems [8].

II. EXTENDING THE NEURAL ENGINEERING FRAMEWORK

The NEF consists of three principles for describing neural computation: representation, transformation, and dynamics [2]. This framework enables the mapping of dynamical systems onto recurrently connected networks of spiking neurons. We begin by providing a self-contained overview of these three principles using an ideal first-order LPF. We then extend these principles to the heterogeneous pulse-extended second-order LPF, and show how this maps onto a target neuromorphic architecture.

A. Principle 1 – Representation

The first NEF principle states that a vector x(t) ∈ R^k may be encoded into the spike-trains δ_i of n neurons with rates:

  r_i(x) = G_i[α_i e_i · x(t) + β_i],  i = 1 … n,   (1)

where G_i is a neuron model whose input current to the soma is the linear encoding α_i e_i · x(t) + β_i, with gain α_i > 0, unit-length encoding vector e_i (the row-vectors of E ∈ R^{n×k}), and bias current β_i. The state x(t) is typically decoded from spike-trains (see Principle 2) by convolving them with a first-order LPF that models the PSC triggered by spikes arriving at the synaptic cleft. We denote this filter as h(t) in the time-domain and as H(s) in the Laplace domain:

  h(t) = (1/τ) e^{−t/τ}  ⟺  H(s) = 1/(τs + 1).   (2)

Traditionally, the same time-constant is used for all synapses projecting to a given population.²

²See [11] for a recent exception.
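To make Principle 1 concrete, the following minimal sketch (not from the paper) evaluates the rates in (1) for a small population. It assumes a rectified-linear rate model for G_i and randomly drawn encoders, gains, and biases; any spiking neuron model could be substituted for G_i without changing the encoding structure.

```python
import numpy as np

def encode_rates(x, E, alpha, beta):
    """Principle 1 (eq. 1): r_i(x) = G_i[alpha_i * e_i . x + beta_i].

    G_i is taken here to be a rectified-linear rate model for illustration.
    """
    J = alpha * (E @ x) + beta     # input current to each soma
    return np.maximum(J, 0.0)      # rectified-linear "G_i"

rng = np.random.default_rng(seed=0)
n, k = 100, 2                      # neurons, represented dimensions
E = rng.standard_normal((n, k))
E /= np.linalg.norm(E, axis=1, keepdims=True)   # unit-length encoders e_i
alpha = rng.uniform(0.5, 2.0, size=n)           # gains alpha_i > 0
beta = rng.uniform(-1.0, 1.0, size=n)           # bias currents beta_i

rates = encode_rates(np.array([0.3, -0.5]), E, alpha, beta)   # shape (n,)
```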

B. Principle 2 – Transformation

The second principle is concerned with decoding some desired vector function f : S → R^k of the represented vector. Here, S is the domain of the vector x(t) represented via Principle 1—typically the unit k-cube or the unit k-ball. Let r_i(x) denote the expected firing-rate of the i-th neuron in response to a constant input x encoded via (1). To account for noise from spiking and extrinsic sources of variability, we introduce the noise term η ∼ N(0, σ²). Then the matrix D^{f(x)} ∈ R^{n×k} that optimally decodes f(x) from the spike-trains δ encoding x is obtained by solving the following problem (via regularized least-squares):

  D^{f(x)} = argmin_{D ∈ R^{n×k}} ∫_S ‖f(x) − Σ_{i=1}^{n} (r_i(x) + η) d_i‖² d^k x   (3)

  ⇒ Σ_{i=1}^{n} (δ_i ∗ h)(t) d_i^{f(x)} ≈ (f(x) ∗ h)(t).   (4)

The quantity in (4) may then be encoded via Principle 1 to complete the connection between two populations of neurons.³

³The effective weight-matrix in this case is W = E D^{f(x)ᵀ}.

C. Principle 3 – Dynamics

The third principle addresses the problem of implementing the following nonlinear dynamical system:

  ẋ = f(x) + u,  u(t) ∈ R^k,
  y = g(x).   (5)

Since we take the synapse (2) to be the dominant source of dynamics for the represented vector [2, p. 327], we must essentially "convert" (5) into an equivalent system where the integrator is replaced by a first-order LPF. This transformation is accomplished by driving the filter h(t) with:

  w := τẋ + x = (τf(x) + x) + (τu)   (6)
  ⇒ (w ∗ h)(t) = x(t),   (7)

so that convolution with h(t) achieves the desired integration. Therefore, the problem reduces to representing x(t) in a population of neurons using Principle 1, while recurrently decoding w(t) using the methods of Principle 2 (Fig. 1).

Fig. 1. Standard Principle 3 (see (6)) mapped onto an ideal architecture to implement a general nonlinear dynamical system (see (5)). The state-vector x is encoded in a population of neurons via Principle 1. The required signal w is approximated by τu plus the recurrent decoders for τf(x) + x applied to δ, such that the first-order LPF correctly outputs x. The output vector y is approximated using the decoders D^{g(x)}.
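The decoders in (3) amount to a few lines of linear algebra. The sketch below is an illustration rather than the paper's code: the rate model, evaluation grid, and regularization constant are assumptions. It solves the regularized least-squares problem over sampled evaluation points and then forms the recurrent decoders for τf(x) + x from (6), here for the simple choice f(x) = −x.

```python
import numpy as np

def solve_decoders(A, targets, reg=0.1):
    """Regularized least-squares solution of (3).

    A: (samples, n) matrix of rates r_i(x) at the evaluation points.
    targets: (samples, k) values of the target function at those points.
    """
    sigma = reg * A.max()
    G = A.T @ A + (sigma ** 2) * A.shape[0] * np.eye(A.shape[1])
    return np.linalg.solve(G, A.T @ targets)          # D, shape (n, k)

# Illustrative population of rectified-linear "neurons" over S = [-1, 1].
rng = np.random.default_rng(1)
n = 100
gains = rng.uniform(0.5, 2.0, n)
biases = rng.uniform(-1.0, 1.0, n)
encoders = rng.choice([-1.0, 1.0], n)

X = np.linspace(-1, 1, 200).reshape(-1, 1)            # evaluation points in S
A = np.maximum(gains * encoders * X + biases, 0.0)    # rates r_i(x), (200, n)

# Principle 3 (eq. 6): recurrent decoders for tau*f(x) + x, here with f(x) = -x.
tau = 0.1
D_rec = solve_decoders(A, tau * (-X) + X)             # (n, 1)
x_hat = A @ D_rec                                     # decodes roughly (1 - tau) * x
```

This mirrors the ridge-regression style of decoder solver used by NEF software such as Nengo, although the exact regularization convention there may differ.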
D. Extensions to Silicon Synapses

Consider an array of m heterogeneous pulse-extended second-order LPFs (in the Laplace domain):

  H_j(s) = γ_j (1 − e^{−ε_j s}) s^{−1} / [(τ_{j,1} s + 1)(τ_{j,2} s + 1)],  j = 1 … m,   (8)

where ε_j is the width of the extended pulse, γ_j is the height of the extended pulse, and τ_{j,1}, τ_{j,2} are the two time-constants of the LPF. H_j(s), whose circuit is described in §III, is an extended pulse γ_j (1 − e^{−ε_j s}) s^{−1} convolved with a second-order LPF ((τ_{j,1} s + 1)(τ_{j,2} s + 1))^{−1}. These higher-order effects result in incorrect dynamics when the NEF is applied using the standard Principle 3 (e.g., in [5], [6], [7]), as shown in §IV.

From (7), observe that we must drive the j-th synapse with some signal w_j(t) that satisfies the following (W(s) denotes the Laplace transform of w(t), and we omit j for clarity):

  X(s) = H(s) W(s)
  ⟺ W(s) (1 − e^{−εs}) s^{−1} = γ^{−1} (1 + (τ_1 + τ_2)s + τ_1 τ_2 s²) X(s).   (9)

To solve for w in practice, we first substitute 1 − e^{−εs} = εs − (εs)²/2 + O((εs)³), and then convert back to the time-domain to obtain the following approximation:

  w = (εγ)^{−1} (x + (τ_1 + τ_2)ẋ + τ_1 τ_2 ẍ) + (ε/2)ẇ.   (10)

Next, we differentiate both sides of (10):

  ẇ = (εγ)^{−1} (ẋ + (τ_1 + τ_2)ẍ + τ_1 τ_2 d³x/dt³) + (ε/2)ẅ,

and substitute this ẇ back into (10) to obtain:

  w = (εγ)^{−1} (x + (τ_1 + τ_2 + ε/2)ẋ + (τ_1 τ_2 + (ε/2)(τ_1 + τ_2))ẍ) + (εγ)^{−1} (ε/2) τ_1 τ_2 d³x/dt³ + (ε²/4)ẅ.

Finally, we make the approximation (εγ)^{−1} (ε/2) τ_1 τ_2 d³x/dt³ + (ε²/4)ẅ ≪ w, which yields the following solution to (9):

  w_j = Φ Γ_j,   (11)
  Φ := [x ẋ ẍ],
  Γ_j := (ε_j γ_j)^{−1} [1, τ_{j,1} + τ_{j,2} + ε_j/2, τ_{j,1} τ_{j,2} + (ε_j/2)(τ_{j,1} + τ_{j,2})]ᵀ,

where ε_j γ_j is the area of the extended pulse.⁴ We compute the time-varying matrix representation Φ in the recurrent connection (plus an input transformation) via Principle 2 (similar to Fig. 1). We then drive the j-th synapse with its time-invariant linear transformation Γ_j (Fig. 2).

⁴This form for (6) is Φ = [x ẋ] and Γ_j = [1, τ]ᵀ. Further generalizations are explored in [12].

Fig. 2. Using extended Principle 3 (see (11)) to implement a general nonlinear dynamical system (see (5)) on a neuromorphic architecture. The matrix representation Φ is linearly transformed by Γ_j to drive the j-th synapse. Dashed lines surround the silicon-neuron array that filters and encodes x into spike-trains (see Fig. 3).

A more refined solution may be found by expanding 1 − e^{−εs} to the third order, which adds ε²/12 to the third coordinate of Γ_j.⁵ However, this refinement does not improve our results. Some remaining details are addressed in §IV.

⁵In general, the coefficients 1, ε/2, ε²/12, and so on, correspond to the [0/q] Padé approximants of Σ_{i=0}^{q} (−ε)^i / (i+1)!.
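Computing the per-synapse transformation in (11) is a direct translation of the formula above. The sketch below is illustrative only: the parameter distributions are rough stand-ins for the log-normal values reported in Fig. 4, whereas the paper samples them from a SPICE-validated transistor-mismatch model.

```python
import numpy as np

def gamma_j(eps, gam, tau1, tau2):
    """Per-synapse linear transformation Gamma_j from eq. (11)."""
    return (1.0 / (eps * gam)) * np.array([
        1.0,
        tau1 + tau2 + eps / 2.0,
        tau1 * tau2 + (eps / 2.0) * (tau1 + tau2),
    ])

# Heterogeneous parameters, log-normally distributed roughly as in Fig. 4
# (illustrative values; units are seconds and 1/seconds).
rng = np.random.default_rng(2)
m = 512
tau1 = rng.lognormal(np.log(31e-3), 0.20, m)    # ~31 ms
tau2 = rng.lognormal(np.log(0.8e-3), 0.14, m)   # ~0.8 ms
eps = rng.lognormal(np.log(0.4e-3), 0.15, m)    # ~0.4 ms
gam = rng.lognormal(np.log(1.0e3), 0.29, m)     # ~1.0 / ms

Gammas = np.stack([gamma_j(*p) for p in zip(eps, gam, tau1, tau2)])  # (m, 3)

# Given the matrix representation Phi = [x, x_dot, x_ddot] at one time-step,
# the drive to synapse j is w_j = Phi . Gamma_j:
Phi = np.array([0.5, 0.1, -2.0])     # illustrative [x, x_dot, x_ddot]
w = Gammas @ Phi                     # drive for each of the m synapses
```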
III. CIRCUIT DESCRIPTION

We now describe the silicon synapse and soma circuits (Fig. 3) that we use to validate our extensions to the NEF. An incoming pulse discharges C_P, which I_ε subsequently charges [9]. As a result, M_P2 turns on momentarily, producing an output current-pulse I_PE with width and height:

  ε = C_P V_γ / I_ε,  γ = I_γ,

where V_γ is the gate-voltage at which M_P2 can no longer pass I_γ. This current-pulse is filtered to obtain the synapse's output, I_syn, whose dynamics obeys:

  τ_1 dI_syn/dt + I_syn = A I_PE,

where τ_1 = C_L1 U_T / I_τ1, A = I_τ2 / I_τ1, and U_T is the thermal voltage. The above assumes all transistors operate in the subthreshold region and ignores all parasitic capacitances [10]. If we include the parasitic capacitance C_L2, a small-signal analysis reveals second-order dynamics:

  τ_1 τ_2 d²I_syn/dt² + (τ_1 + τ_2) dI_syn/dt + I_syn = A I_PE,

where τ_2 = C_L2 U_T / (κ I_τ2), and κ is the subthreshold-slope coefficient. This second-order LPF and the aforementioned pulse-extender are modelled together in the Laplace domain by (8) after scaling by A.

The dynamics of the soma are described by:

  (C_S U_T / (κ I_m)) dI_m/dt = I_m + I_syn,

where I_m is the current in the positive-feedback loop, assuming all transistors operate in the subthreshold region [13], [14]. This equation may be solved to obtain the trajectory of I_m, and hence the steady-state spike-rate:

  r(I_syn) = (κ I_syn / (C_S U_T)) [ln((I_syn / I_0 + 1) / (I_syn / I_thr + 1))]^{−1},

where I_0 and I_thr are the values of I_m at reset and threshold, respectively. These values correspond to the leakage current of the transistors and the peak short-circuit current of the inverter, respectively.

Fig. 3. Silicon synapse and soma. The synapse consists of a pulse-extender (M_P1–2) and a LPF (M_L1–6). The pulse-extender converts subnanosecond digital pulses—representing input spikes—into submillisecond current-pulses (I_PE). Then the LPF filters these current-pulses to produce the synapse's output (I_syn). The soma integrates this output current on a capacitor (C_S) and generates a subnanosecond digital pulse—representing an output spike—through positive-feedback (M_S2–5). This pulse is followed by a reset pulse, which discharges the capacitor. This schematic has been simplified for clarity.

Fig. 4. Log-normally distributed parameters for the silicon synapses. τ_1 (µ ± σ = 31 ± 6.4 ms) and τ_2 (0.8 ± 0.11 ms) are the two time-constants of the second-order LPF; ε (0.4 ± 0.06 ms) and γ (1.0 ± 0.29 ms^{−1}) are the widths and heights of the extended pulse, respectively (see (8)).
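To see how this circuit model behaves, the following sketch integrates the second-order synapse dynamics above with forward Euler, driven by a single extended current-pulse. The time-constants and pulse dimensions are illustrative values near the means in Fig. 4, and the scaling A is set to 1 for simplicity.

```python
import numpy as np

# Second-order synapse: tau1*tau2*I'' + (tau1 + tau2)*I' + I = A*I_PE,
# driven by one extended pulse of width eps and height gam (A = 1 here).
tau1, tau2 = 31e-3, 0.8e-3      # time-constants (s), near the Fig. 4 means
eps, gam = 0.4e-3, 1.0e3        # pulse width (s) and height (1/s)
A = 1.0
dt, T = 1e-5, 0.2               # Euler step and duration (s)

steps = int(T / dt)
I_syn = np.zeros(steps)         # synaptic output current (normalized units)
dI = 0.0                        # dI_syn/dt
for n in range(1, steps):
    t = n * dt
    I_PE = gam if t < eps else 0.0                      # extended pulse
    d2I = (A * I_PE - I_syn[n - 1] - (tau1 + tau2) * dI) / (tau1 * tau2)
    dI += d2I * dt
    I_syn[n] = I_syn[n - 1] + dI * dt

# With unit DC gain, the area under I_syn should approach eps*gam = 0.4,
# the area of the extended pulse.
print(I_syn.sum() * dt)
```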

IV. VALIDATION WITH CIRCUIT MODELS

We proceed by validating the extended NEF neuromorphic architecture (see Fig. 2), implemented using the circuit models (see §III), on two fundamental dynamical systems: an integrator and a controlled oscillator. We use Nengo 2.3.1 [4] to simulate this neuromorphic system with a time-step of 50 µs. Test data is sampled independently of the training data used to optimize (3) via regularized least-squares. For comparison with (5)—simulated via Euler's method—spike-trains are filtered using (2) with τ = 10 ms.

For each trial, the somatic parameters and synaptic parameters (Fig. 4) are randomly sampled from distributions generated using a model of transistor mismatch (validated in SPICE). These parameters determine each linear transformation Γ_j, as defined in (11).

A. Integrator

Consider the one-dimensional integrator, ẋ = u, y = x. We represent the scalar x(t) in a population of 512 modelled silicon neurons. To compute the recurrent function, we use (11), and assume ẍ = u̇ is given. To be specific, we optimize (3) for D^x, and then add u and u̇ to the second and third coordinates of the representation, respectively, to decode:

  Φ = [x, u, u̇].

To evaluate the impact of each extension to the NEF, we simulate our network under five conditions: using standard Principle 3, accounting for second-order dynamics, accounting for the pulse-extender, accounting for the transistor mismatch in τ_1, and our full extension (Fig. 5). We find that the last reduces the error by 63%, relative to the first, across a wide range of input frequencies (5–50 Hz).

Fig. 5. Effect of each NEF extension, applied to a simulated integrator given frequencies ranging from 5–50 Hz (mean spike-rate 143 Hz). The standard approach (Principle 3) achieves a normalized RMSE of 0.203 with 95% CI of [0.189, 0.216] compared to the ideal, while our extension (Full) achieves 0.073 with 95% CI of [0.067, 0.080]—a 63% reduction in error—averaged across 25 trials and 10 frequencies. The largest improvement comes from accounting for the second-order dynamics, while a smaller improvement comes from accounting for the pulse-extended dynamics. Accounting for the transistor mismatch on its own is counter-productive.
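As a sanity check on this construction, the following simplified, noise-free sketch (not the paper's spiking simulation) builds Φ = [x, u, u̇] from an ideal integrator trajectory, drives a single model synapse with w = ΦΓ_j per (11), and confirms that the synapse output tracks x. The synapse parameters are illustrative values near the Fig. 4 means.

```python
import numpy as np

# Ideal integrator trajectory driven by a 5 Hz sinusoidal input u(t).
dt, T, freq = 1e-4, 1.0, 5.0
t = np.arange(0, T, dt)
u = np.sin(2 * np.pi * freq * t)
du = 2 * np.pi * freq * np.cos(2 * np.pi * freq * t)
x = np.cumsum(u) * dt                       # x_dot = u  =>  x = integral of u

Phi = np.column_stack([x, u, du])           # [x, x_dot, x_ddot] = [x, u, u_dot]

# One model synapse (illustrative parameters near the Fig. 4 means).
tau1, tau2, eps, gam = 31e-3, 0.8e-3, 0.4e-3, 1.0e3
Gamma = (1 / (eps * gam)) * np.array([
    1.0,
    tau1 + tau2 + eps / 2,
    tau1 * tau2 + (eps / 2) * (tau1 + tau2),
])
w = Phi @ Gamma                             # per-synapse drive, eq. (11)

# Pass w through H_j(s) from (8): the pulse-extender stage gam*(1 - e^{-eps*s})/s
# is a sliding integral of width eps, followed by the second-order LPF (Euler).
K = int(eps / dt)
pulse = np.convolve(w, np.full(K, gam * dt))[:len(w)]
y = np.zeros_like(w)                        # synapse output; should track x
dy = 0.0
for n in range(1, len(w)):
    d2y = (pulse[n - 1] - y[n - 1] - (tau1 + tau2) * dy) / (tau1 * tau2)
    dy += d2y * dt
    y[n] = y[n - 1] + dy * dt

print(np.abs(y - x).max())                  # small relative to the amplitude of x
```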

B. Controlled 2D Oscillator

Consider a controllable two-dimensional oscillator,

  ẋ = f(x) + u,  y = x,
  f(x_1, x_2, x_3) = [−ω x_3 x_2, ω x_3 x_1, 0]ᵀ,

where ω is the angular frequency in radians per second, x_3 controls this frequency multiplicatively, and x_3* is the fixed-point target supplied via input u_3 = x_3* − x_3. The inputs u_1 and u_2 initiate the oscillation with a brief impulse. We represent the three-dimensional state-vector x(t) in a population of 2048 modelled silicon neurons.⁶ To compute the recurrent function, we again use (11). For this example, u(t) ≈ 0 for most t (apart from initial transients and changes to the target x_3*), and so ẍ = J_f(x)·ẋ + u̇ ≈ J_f(x)·f(x) (where J_f denotes the Jacobian of f). We then optimize (3) for D^x, D^{f(x)}, and D^{J_f(x)·f(x)}, and add u to the second column of the matrix representation to decode:

  Φ = [x, f(x) + u, J_f(x)·f(x)].

We find that this solution reduces the error by 73% relative to the standard Principle 3 solution (Fig. 6).

⁶We use this many neurons in order to minimize the noise from spiking.

Fig. 6. Output of the controlled 2D oscillator with ω = 5 Hz (mean spike-rate 140 Hz). The control (x_3*) is changed from 0.5 to −0.5 at 1 s to reverse the direction of oscillation. The standard Principle 3 (ỹ) achieves a normalized RMSE of 0.188 with 95% CI of [0.166, 0.210] compared to the ideal (y), while our extension (ŷ) achieves 0.050 with 95% CI of [0.040, 0.063]—a 73% reduction in error—averaged across 25 trials.
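The Jacobian-based approximation of ẍ used for the oscillator is easy to evaluate in closed form. The sketch below is illustrative (ω corresponds to 5 Hz, as in Fig. 6) and computes f(x), J_f(x), and the three columns of Φ at a single state.

```python
import numpy as np

omega = 2 * np.pi * 5.0          # 5 Hz, in radians per second

def f(x):
    """Controlled oscillator dynamics: f(x) = [-w*x3*x2, w*x3*x1, 0]."""
    x1, x2, x3 = x
    return np.array([-omega * x3 * x2, omega * x3 * x1, 0.0])

def Jf(x):
    """Jacobian of f with respect to x."""
    x1, x2, x3 = x
    return np.array([
        [0.0,          -omega * x3, -omega * x2],
        [omega * x3,    0.0,         omega * x1],
        [0.0,           0.0,         0.0],
    ])

# With u(t) ~ 0, x_ddot = Jf(x) . x_dot ~ Jf(x) . f(x); these three columns
# form the matrix representation Phi = [x, f(x) + u, Jf(x) . f(x)].
x = np.array([1.0, 0.0, 0.5])
u = np.zeros(3)
Phi = np.column_stack([x, f(x) + u, Jf(x) @ f(x)])   # shape (3, 3)
```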

V. SUMMARY

We have provided a novel extension to the NEF that directly harnesses the dynamics of heterogeneous pulse-extended second-order LPFs. This theory is validated by software simulation of a neuromorphic system, using circuit models with parameter variability validated in SPICE, for two fundamental examples: an integrator and a controlled oscillator. When compared to the previous standard approach, our extension is shown to reduce the error by 63% and 73% for the integrator and oscillator, respectively. Thus, our theory enables a more accurate mapping of nonlinear dynamical systems onto a recurrently connected neuromorphic architecture using nonideal silicon synapses. Furthermore, we derive our theory independently of the particular neuron model and encoding. This advance helps pave the way toward understanding how nonideal physical primitives may be systematically analyzed and then subsequently exploited to support useful computations in neuromorphic hardware.

ACKNOWLEDGEMENTS

This work was supported by CFI and OIT infrastructure, the Canada Research Chairs program, NSERC Discovery grant 261453, ONR grants N000141310419 and N000141512827, and NSERC CGS-D funding. The authors thank Wilten Nicola for inspiring (9) with a derivation for the double-exponential.

REFERENCES

[1] C. Mead, Analog VLSI and Neural Systems. Boston, MA: Addison-Wesley, 1989.
[2] C. Eliasmith and C. H. Anderson, Neural Engineering: Computation, Representation, and Dynamics in Neurobiological Systems. MIT Press, 2003.
[3] C. Eliasmith, T. C. Stewart, X. Choo, T. Bekolay, T. DeWolf, Y. Tang, and D. Rasmussen, "A large-scale model of the functioning brain," Science, vol. 338, no. 6111, pp. 1202–1205, 2012.
[4] T. Bekolay, J. Bergstra, E. Hunsberger, T. DeWolf, T. C. Stewart, D. Rasmussen, X. Choo, A. R. Voelker, and C. Eliasmith, "Nengo: A Python tool for building large-scale functional brain models," Frontiers in Neuroinformatics, vol. 7, 2013.
[5] S. Choudhary, S. Sloan, S. Fok, A. Neckar, E. Trautmann, P. Gao, T. Stewart, C. Eliasmith, and K. Boahen, "Silicon neurons that compute," in International Conference on Artificial Neural Networks, vol. 7552. Springer, 2012, pp. 121–128.
[6] S. Menon, S. Fok, A. Neckar, O. Khatib, and K. Boahen, "Controlling articulated robots in task-space with spiking silicon neurons," in 5th IEEE RAS/EMBS International Conference on Biomedical Robotics and Biomechatronics. IEEE, 2014, pp. 181–186.
[7] F. Corradi, C. Eliasmith, and G. Indiveri, "Mapping arbitrary mathematical functions and dynamical systems to neuromorphic VLSI circuits for spike-based neural computation," in 2014 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2014, pp. 269–272.
[8] A. Destexhe, Z. F. Mainen, and T. J. Sejnowski, "Synaptic currents, neuromodulation, and kinetic models," The Handbook of Brain Theory and Neural Networks, vol. 66, pp. 617–648, 1995.
[9] J. V. Arthur and K. Boahen, "Recurrently connected silicon neurons with active dendrites for one-shot learning," in International Joint Conference on Neural Networks (IJCNN), vol. 3. IEEE, 2004, pp. 1699–1704.
[10] W. Himmelbauer and A. G. Andreou, "Log-domain circuits in subthreshold MOS," in Proceedings of the 40th Midwest Symposium on Circuits and Systems, vol. 1. IEEE, 1997, pp. 26–30.
[11] K. E. Friedl, A. R. Voelker, A. Peer, and C. Eliasmith, "Human-inspired neurorobotic system for classifying surface textures by touch," IEEE Robotics and Automation Letters, vol. 1, no. 1, pp. 516–523, 2016.
[12] A. R. Voelker and C. Eliasmith, "Improving spiking dynamical networks: Accurate delays, higher-order synapses, and time cells," 2017, manuscript in preparation.
[13] E. Culurciello, R. Etienne-Cummings, and K. Boahen, "A biomorphic digital image sensor," IEEE Journal of Solid-State Circuits, vol. 38, no. 2, pp. 281–294, 2003.
[14] P. Gao, B. V. Benjamin, and K. Boahen, "Dynamical system guided mapping of quantitative neuronal models onto neuromorphic hardware," IEEE Transactions on Circuits and Systems, vol. 59, no. 10, pp. 2383–2394, 2012.