A Synchronized Axon Hillock Neuron for Memristive Neuromorphic Systems

Ryan Weiss, Gangotree Chakma and Garrett S. Rose Department of Electrical Engineering and Computer Science University of Tennessee, Knoxville Knoxville, Tennessee 37996 USA Email: {rweiss2, gchakma, garose}@utk.edu

978-1-5090-6389-5/17/$31.00 ©2017 IEEE

Abstract: In this paper we present circuit techniques to optimize analog neurons specifically for operation in memristive neuromorphic systems. Since the peripheral circuits and control signals of the system are digital in nature, we take a mixed-signal circuit design approach to leverage analog computation in multiplying and accumulating digital input spikes and generate binary spikes as outputs to be consistent with surrounding synchronous digital logic circuits. A novel approach for synchronization is leveraged based on domino logic. The principal advantage of utilizing analog neurons within an overall digital system design is to ensure efficiency in size and power consumption. Energy per spike was determined to be 20 fJ, based on Cadence Spectre simulations of the proposed domino-based neuron.

I. INTRODUCTION

The human brain represents a very complex architecture for computing and processing data. A deep understanding of how biological neural systems provide both accuracy and energy efficiency remains a goal for many researchers. During this continuing process, researchers have developed neural network models which emulate brain functionality, including deep learning techniques used to train such systems. However, offline training approaches are still inefficient compared to the low energy consumption exhibited by the human brain. Further, for many brain-inspired approaches, researchers often leverage von Neumann machines to implement neural networks. The training and execution of many machine learning systems require a large amount of energy and area resources, which is not efficient for embedded systems applications. Thus, many researchers try to overcome such limitations by adapting unconventional computing architectures, such as neuromorphic computing [1]. Memristive neuromorphic computing leverages metal-oxide memristors [2], [3] to minimize the area and energy consumption.

In this research, the proposed memristive neuromorphic architecture is constructed from both analog and digital components. Further, communication between neurons is synchronous, enabling a simplified approach for controlling the timing of spike information. Artificial synapses are implemented using metal-oxide memristors to store weight values and transmit analog weighted results to post-synaptic neurons. The neuron uses the analog output of the synapse to produce a firing event (or spike) that is synchronized with the system. Further, the system considered leverages the relative timing of synchronous digital spikes to perform spike time dependent plasticity (STDP) based online learning. The synchronous nature of the system helps simplify the necessary control logic for STDP, thus reducing area utilization. Here we describe an analog axon hillock neuron design approach leveraging domino logic to synchronize output spikes. We further show that the proposed neuron provides desired energy and area efficiency for emerging CMOS-memristive neuromorphic implementations.

II. MEMRISTIVE SYNAPSE

The synapse of a neural network stores a weight value that relates the strength of a firing spike of a pre-synaptic neuron to the following post-synaptic neuron. The weights characterize the required activation of the preceding neuron to activate the following neuron. This relationship is created by the output current of the synapse.

The synapse design considered here uses a twin memristor configuration (shown in Fig. 1) to store the weight value. Memristors are two terminal, nanoscale, non-volatile devices first theorized by Chua [2] in 1971. The non-volatile nature of these devices provides opportunities for high efficiency in area and energy consumption. A memristor provides multiple resistance levels between the low resistance state (LRS) and high resistance state (HRS). The LRS and HRS of any memristor are dependent on the switching material, process conditions, noise and environmental conditions. Several materials are used to build memristors for their switching behavior, including TaOx [3], TiO2 [4], and HfOx [5]. All of these memristors are differentiated according to LRS values, LRS to HRS ratios, threshold voltage, and switching time. For this design a suitable range of LRS and HRS has been considered based on the literature.

Fig. 1: Synaptic input illustrating the "twin memristor" approach to providing both positive and negative weights. Also shown are the FETs used in the learning phase and integrator representing the input of a neuron.

A. Synapse Design and Timing

The synapse uses a digital logic block to drive each memristor in the pair to a high voltage. The two phases of operation examined here are the accumulation phase and learning phase. Other phases important in practice but beyond the scope of this work are the forming phase and programming phase, which enable memristor behavior and set the pair to an initial weight.

During the accumulation phase, the synapse produces an output current, Iin, that is fed into the neuron. The difference in the resistance values of the memristors in each pair creates a positive or negative weight. Positive weights are created when the output current is adding charge into the neuron. This happens when Rp in Fig. 1 is less than Rn. Negative weights are created when the output current is pulling charge from the neuron. This is accomplished when Rn is less than Rp.

During the learning phase, the synapse updates the weights by changing the resistance of the memristors.

The waveforms in Fig. 2 and 3 show the signals used by the synapse. To update the weight of the synapse, the voltage across the memristor pair needs to be larger than the threshold voltage for the memristor [6]. To accomplish this the neuron drives the summing node while the synapse drives a voltage of opposite polarity from the left. Long term potentiation is initiated when the synapse causes the neuron to fire. This occurs when the synapse fires (Fpre) the clock cycle before the neuron fires (Fpost), illustrated in Fig. 2. Long term depression (Fig. 3) is performed on the synapse when it fires the clock cycle after the output neuron fires. During the neuron's output fire, the neuron drives the summing node high for one clock cycle and then low for the next.

Fig. 2: Timing diagram of LTP. (Traces, each from -0.6 V to 0.6 V: CLK, Fpre, Fpre,t, Fpost, Vop, Von, Summing Node.)

Fig. 3: Timing diagram of LTD. (Traces, each from -0.6 V to 0.6 V: CLK, Fpre, Fpost, Fpost,t, Vop, Von, Summing Node.)

III. NEURON

The axon hillock neuron first proposed by Carver Mead [1] takes an input current, integrates the input current on a capacitor and outputs a voltage spike upon crossing a threshold. The axon hillock circuit considered has a variable threshold that triggers a fire when the stored voltage on the capacitor Cmem reaches the voltage Vref. The output spike width, refractory period and reset time are dependent on the sizing of the capacitors and transistors Cmem, Cfb, M9, and M10. For this project the output spike must be synchronized with the system clock.

Fig. 4: Implementation of Axon-Hillock neuron with a comparator for variable threshold.

Fig. 5: Proposed Implementation of synchronous neuron with a comparator for variable threshold.

Neuron operation is determined by the pre-neuron synapses. If there are pre-neuron synapse fires, the neuron sums and integrates the weight dependent output currents of the synapses and holds the charge on the capacitor until there is another input into the neuron. When the voltage on the capacitor goes above the threshold voltage, the neuron fires and resets the capacitor. The neuron has a refractory period of two clock cycles during which it will not accumulate inputs from the pre-neuron synapses. The output fire of the neuron drives the post-neuron synapses to fire into the next neuron layer in the network. The output of the neuron also controls the weight updates of its pre-neuron synapses.

The proposed neuron design combines the synchronous system and control signals used to operate the system with the ideas presented for the axon hillock neuron. The design follows the same basic functionality of integrating the input current and triggering a firing event upon crossing the threshold. In the proposed design, the control signals synchronize the timing of neuron output spikes.

A. Neuron Input Circuit Design

The current input of the neuron flows from the memristors of the synapse through a series of PFETs into the input capacitor. The proposed design uses two PFETs that block the capacitor Cmem from the voltage drive of the summing node during learning but pass current into the capacitor during accumulation. Design specifications for the PFETs and the capacitor are based on the available resistance levels of the memristors, as well as clock speed. Analogous to the standard axon hillock neuron, the proposed domino based design uses the PFETs at the input to define the refractory period. For the domino based neuron, the refractory period is defined by the clock cycles needed in the learning phase and is not determined by sizing the capacitor Cfb or NFETs M9 and M10. Results presented for this implementation assume a defined refractory period of two clock cycles.

For this design, the PFETs and Cmem are intended to be small to allow for integrating many neurons per chip. A high speed clock is used to keep the size of Cmem small. The sizes of the PFETs are also small because, as the rds resistance of the PFETs increases, the input current, Iin, decreases. Keeping the size of the PFETs small also keeps the capacitance the neuron output needs to drive small.

B. Neuron Output Circuit Design

The neural network structure in this research uses a synchronous digital logic system to implement the weight update of the synapses. Because of this synchronous approach the neuron must output a synchronous signal to the synapse to update its weight upon a post-synapse neuron fire. To synchronize the asynchronous analog integration and comparison computations performed by the neuron, it outputs a clocked digital pulse. To align the output signal with the pre-neuron synapse, a D flip flop is used to capture the output pulse. The proposed design uses a domino logic inverter to control pulse timing in a way that reduces transistor count and energy.

For comparison purposes, a flip flop synchronized axon hillock is considered. This neuron uses two inverters to drive the output to the positive rail when the voltage stored on Cmem crosses the threshold Vref. The inverter drives the voltage Vmem higher than the threshold by positive feedback through Cfb. The output turns on NFET M9 and stores the output on the first D flip flop. The D flip flop outputs the signal Fpost. The voltage Vpw is set to equalize the output pulse width of the neuron with the clock period. This allows the D flip flop to capture the output of the neuron only on the next clock cycle. If the pulse width of the neuron is not equal to the clock period, the D flip flop will either miss spikes or drive a high output for more than one clock cycle. The output fire signal Fpost can also be delayed by a clock cycle. This occurs when the integrated voltage on the capacitor crosses the threshold and the delay to Vout going high causes the D flip flop to capture it on the next clock cycle. The neuron needs a built in refractory period for resetting Cmem, which is overshadowed by the two clock cycle refractory period for learning.

C. Pulse Control Inverter

The proposed domino based design incorporates the control signals needed for the learning phase in the computing phase. The design feeds the delayed fire signal of the preceding neuron into the subsequent neuron. This signal drives a domino logic inverter that evaluates the neuron's output signal Fpost at the appropriate time. Since the evaluation of the comparison is performed using domino logic, the output signal will not fall to the negative rail until the end of the clock period when Fpre,t goes low. This means the functionality of the output is restricted to driving the necessary gates for learning and computing. The output of the neuron is not internally connected to the input to set the refractory period. The reset of the capacitor is immediately driven, while the refractory period is still defined by the Fpost and Fpost,t signals on the input PFETs.

IV. SIMULATION RESULTS

Fig. 6: Simulation result for the analog neuron with domino logic. (Traces, each from -0.6 V to 0.6 V: Fpre, Fpost, Vmem, Vcmpr; time axis 0 to 0.08 µs.)

In the examples provided in Fig. 6 and Fig. 7 the neurons both take three spikes through the synapse to cause an output fire. Both designs function as intended, taking inputs from the synapses, integrating them on Vmem and generating an output fire pulse Fpost. For the standard axon hillock, Vout is captured on a D flip flop to produce Fpost. As described above, Vout is designed to be high for one clock period. The concern for the differential amplifier in the standard axon hillock is the rise time, while for the proposed domino based design it is the fall time. Both designs require that the voltage Vcmpr is quickly driven high so as not to miss a fire. The proposed domino design does not require the same high speed, and is limited by the reset time of Vcmpr.
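As an illustration only, the cycle-level behavior described in Sections II to IV can be sketched in a few lines of Python. This is a behavioral sketch under stated assumptions, not the Spectre-level circuit simulation reported in this paper. The conductance-difference current model, the drive voltage, the capacitance, the threshold, and the memristor resistances are all hypothetical values chosen so that three input spikes produce one output fire, as in Fig. 6. The names synapse_current, SyncNeuron, run_neuron, and stdp_update are likewise hypothetical.

```python
from dataclasses import dataclass

CLK_PERIOD_S = 6e-9        # about 167 MHz, the frequency quoted in Table I
REFRACTORY_CYCLES = 2      # two-cycle refractory period stated in the text


def synapse_current(r_p, r_n, v_drive=0.6):
    """Net synapse current in amperes (assumed conductance-difference model).

    Positive (adds charge) when r_p < r_n, negative when r_n < r_p.
    v_drive is a hypothetical drive voltage.
    """
    return v_drive * (1.0 / r_p - 1.0 / r_n)


@dataclass
class SyncNeuron:
    c_mem: float = 1e-12   # hypothetical membrane capacitance, farads
    v_ref: float = 0.3     # hypothetical threshold, volts
    v_mem: float = 0.0
    refractory_left: int = 0
    crossed: bool = False  # threshold crossing waiting for the next clock cycle

    def step(self, input_currents):
        """Advance one clock cycle and return F_post for this cycle."""
        f_post = self.crossed          # spike caused by last cycle's crossing
        self.crossed = False
        if self.refractory_left > 0:
            self.refractory_left -= 1  # inputs are blocked while refractory
        else:
            self.v_mem += sum(input_currents) * CLK_PERIOD_S / self.c_mem
            if self.v_mem >= self.v_ref:
                self.v_mem = 0.0       # reset the membrane capacitor
                self.crossed = True
                self.refractory_left = REFRACTORY_CYCLES
        return f_post


def run_neuron(pre_spike_cycles, n_cycles, r_p=20e3, r_n=60e3):
    """Return the clock cycles in which F_post is high (hypothetical resistances)."""
    neuron = SyncNeuron()
    i_in = synapse_current(r_p, r_n)
    fires = []
    for cycle in range(n_cycles):
        currents = [i_in] if cycle in pre_spike_cycles else []
        if neuron.step(currents):
            fires.append(cycle)
    return fires


def stdp_update(pre_cycle, post_cycle):
    """Classify the weight update from the relative spike timing."""
    if post_cycle - pre_cycle == 1:
        return "LTP"   # synapse fired the cycle before the neuron fired
    if pre_cycle - post_cycle == 1:
        return "LTD"   # synapse fired the cycle after the neuron fired
    return "none"


if __name__ == "__main__":
    print(run_neuron({0, 1, 2}, 6))              # prints [3]
    print(stdp_update(2, 3), stdp_update(4, 3))  # prints LTP LTD
```

In this sketch the threshold crossing in one cycle appears as Fpost in the following cycle, which matches the LTP timing of Fig. 2, and the two blocked cycles after a crossing stand in for the Fpost and Fpost,t windows on the input PFETs.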

Fig. 7: Simulation result for the analog axon-hillock neuron. (Traces, each from -0.6 V to 0.6 V: Fpre, Vmem, Vcmpr, Vout, Fpost; time axis 0 to 0.08 µs.)

TABLE I: Performance Metrics for Synchronous AH Neurons

Metric                     Axon Hillock (Flop)    Axon Hillock (Domino)
Transistor Count           41                     28
Energy per Spike (fJ)      140                    20
Frequency (MHz)            167                    167
Idle Power (µW)            4.76                   2.55
Accumulation Power (µW)    0.417                  6.10

Table I shows simulation results for the neurons presented in the 65nm process. The simulation setup is intended to be equivalent. The idle power is an average of the power used between accumulations and between neuron fires. The idle power is much lower for the domino logic because there is almost no current flow through M5, M6, and M7. In the standard axon hillock, as the differential amplifier output voltage increases, current through M5 and M6 increases. This causes a linear increase in power with accumulated voltage. The idle power is close to the same for both since the same differential amplifier is used. However, because the differential amplifier in the axon hillock needs to reliably control an inverter, the bias current is greater.

The accumulation power averages all the time where the voltage Vmem is accumulating current from the synapse output. For the flop approach it is measured from the synapse fire and truncates when Vout goes high, while for the domino, this constitutes the entire clock cycle for every Fpre. For the flop based approach, the power consumed when Vmem is driven high by the positive feedback loop through Cfb is not included in the accumulation power because it is considered part of the clock cycles that produce a spike. Energy per spike for the domino includes the energy used in the clock cycles of Fpost and Fpost,t.

V. DISCUSSION

The importance of synchronizing is the ability to reliably use the output to perform learning for memristive synapses. The synchronous nature of our system allows for advantages in terms of power and reliability. However, since the system requires a high speed clock to keep the feature size small, issues can arise from the clock. The delay and skew of the clock signals affect the control of the circuit and could require capturing the outputs. Also not rigorously defined are the limits of fan in and fan out of the circuit. The neuron should be able to handle a multitude of synapses since the input is summed currents. The fan out capability is determined by the strength of the domino logic inverter. If the load capacitance is excessively high, the drive strength of the inverter will need to be increased, requiring larger devices and more energy.

VI. CONCLUSION

In this paper, we have presented the design of a synchronous analog neuron for robust low power activity. The neuron was designed to work for memristive synapses that perform online learning. Given the synchronous nature of our system, we take advantage of the signals necessary for learning and utilize them in reducing power and guaranteeing timing. From the simulation results we see that our proposed circuit reduces energy and power along with the transistor count.

ACKNOWLEDGMENT

The authors would like to thank Dr. Mark Dean, Md. Musabbir Adnan, Sagarvarma Sayyaparaju, and Sherif Amer from the University of Tennessee, Knoxville for interesting and useful discussions on this topic.

This material is based in part upon research sponsored by Air Force Research Laboratory under agreement number FA8750-16-0065 and the National Science Foundation under Grant No. NCS-FO-1631472. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon.

REFERENCES

[1] C. A. Mead, Analog VLSI and Neural Systems. Reading, MA: Addison-Wesley, 1989.

[2] L. O. Chua, "Memristor-the missing circuit element," IEEE Transactions on Circuit Theory, vol. 18, no. 5, pp. 507–519, September 1971.

[3] J. J. Yang, M. Zhang, J. P. Strachan, F. Miao, M. D. Pickett, R. D. Kelley, G. Medeiros-Ribeiro, and R. S. Williams, "High switching endurance in TaOx memristive devices," Applied Physics Letters, vol. 97, no. 23, p. 232102, 2010.

[4] G. Medeiros-Ribeiro, F. Perner, R. Carter, H. Abdalla, M. D. Pickett, and R. S. Williams, "Lognormal switching times for titanium dioxide bipolar memristors: origin and resolution," Nanotechnology, vol. 22, no. 9, p. 095702, 2011.

[5] H. Lee, Y. Chen, P. Chen, T. Wu, F. Chen, C. Wang, P. Tzeng, M.-J. Tsai, and C. Lien, "Low-power and nanosecond switching in robust hafnium oxide resistive memory with a thin Ti cap," IEEE Electron Device Letters, vol. 31, no. 1, pp. 44–46, 2010.

[6] S. Kvatinsky, M. Ramadan, E. G. Friedman, and A. Kolodny, "VTEAM: A general model for voltage-controlled memristors," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 62, no. 8, pp. 786–790, 2015.