Ultra-Low Energy Charge Trap Flash Based Synapse Enabled by Parasitic Leakage Mitigation
Total Page:16
File Type:pdf, Size:1020Kb
Ultra-low Energy charge trap flash based synapse enabled by parasitic leakage mitigation Shalini Shrivastava*, Tanmay Chavan and Udayan Ganguly† Department of Electrical Engineering, Indian Institute of Technology Bombay, Mumbai, 400076, India † *[email protected]; [email protected] Abstract shown [13][18]. In addition, for fixed 푉푝푢푙푠푒 applied Brain-inspired computation promises complex cognitive repeatedly, 퐺 increases/decreases gradually and then saturates 4 푚푎푥 tasks at biological energy efficiencies. The brain contains 10 (Fig. 1d). For fixed 푉푝푢푙푠푒, the maximum Δ퐺 (i.e., 훥퐺 ) synapses per neuron. Hence, ultra-low energy, high-density occurs initially. Synapses have two key challenges. First, both synapses are needed for spiking neural networks (SNN). In LTP and LTD this paper, we use tunneling enabled CTF (Charge Trap Flash) stack for ultra-low-energy operation (1F); Further, CTF on an SOI platform and back-to-back connected pn diode and Zener diode (2D) prevent parasitic leakage to preserve energy advantage in array operation. A bulk 100 휇푚 × 100 휇푚 CTF operation offers tunable, gradual conductance change (Δ퐺) i.e. 104 levels, which gives 100 × improvement over literature. SPICE simulations of 1F2D synapse shows ultra- low energy (≤ 3 fJ/pulse) at 180nm node for long-term potentiation (LTP) and depression (LTD), which is comparable to energy estimate in biological synapses (10 fJ). A record low learning rate (i.e., maximum Δ퐺<1% of 퐺- range) is observed – which is tunable. Excellent reliability (> 106 endurance cycles at full conductance swing) is observed. Such a highly energy efficient synapse with tunable learning rate on the CMOS platform is a key enabler for the Fig. 1 a) A synapse connects a pre-neuron to a post-neuron. b) Synaptic human-brain-scale systems. conductance change (ΔG) depends on the spike time difference (Δ푡) between Keywords: Spiking Neural Network; Charge trap flash, post-neuron ( 푡푝표푠푡 : yellow) and pre-neuron (푡푝푟푒: blue) known as Spike Time SONAS, Fowler-Nordheim Tunneling, Synapse Dependent Plasticity (STDP). In memristors, write pulses (LTP: blue and LTD: pink) and read pulses (grey) are applied to change 퐺. c) 퐺 changes with The high-performance computing such as IBM Watson the 푉푝푢푙푠푒 increase is typical memristors behavior. d) 퐺 change and then Δ 푚푎푥 supercomputer is 106 × less energy- and 103 × less area- saturates when fixed 푉푝푢푙푠푒 is applied repeatedly. The G occurs initially. efficient than biological neural network [1]. Each biological This is the critical requirement for SNNs e) An artificial synapse is implemented in a Charge Trap Flash (CTF) enables LTP & LTD by the pre-neuron transmits information to post-neurons through a threshold voltage (푉 ) shift to replicate the learning rate behavior at fixed gate 4 푇 synapse (Fig. 1a). There are approximately 10 synapses per pulses. neuron, which make the synapses the largest component of the neural network. Hence, area density and energy of operation need to be gradual for SNN algorithms in software [19]. In of the synapse are critical determinants of system fact, the learning rate, i.e., 훥퐺푚푎푥 for repeated maximum performance. Recently, instead of large digital circuits to 푉푝푢푙푠푒 needs to be low (<2% of the total range of 퐺) for stable represent analog weights [2], various nanoscale memristive weight evolution in a network during training [20-21][27]. In synapses have been proposed [3-16]. These memristive other words, the change in G should saturate after a larger devices store the weight as an analog conductance (퐺) value to number of identical 푉푝푢푙푠푒푠 i.e., in excess of 256 identical provide excellent areal density improvement [17]. The pulses for analog-valued datasets like Fischer Iris [22]. synapse “learns” by conductance change (Δ퐺) which depends Second, low energy operation (especially write-energy) in the upon the spike time difference (Δ푡) of pre- and post-neurons analog synapse is a critical challenge for memristors [17]. (Fig. 1b), known as spike time dependent plasticity (STDP). Presently, nanoscale memristive synapse may not have Long-term potentiation (LTP: Δ퐺 > 0) and depression (LTD: gradual LTP and LTD. For example, some synapses are binary Δ퐺 < 0) is needed. For memristive synapses, the application (e.g., HfO RRAM [23]). Other synapse has gradual LTP but of a Δ푡-dependent pulse voltage (푉 ) at fixed pulse-width 2 푝푢푙푠푒 abrupt LTD (e.g., Phase Change Memory), which requires (푡 ) causes Δ퐺. The Δ퐺 per pulse depends upon both the 푝 novel synapse circuit design with enhanced controller instantaneous conductance 퐺 and 푉푝푢푙푠푒. For varying 푉푝푢푙푠푒, complexity [24-25]. Other memristors (e.g., Mn doped Δ퐺 increases/decreases with 푉푝푢푙푠푒 (Fig. 1c), which is widely HfO2 [12] or PCMO RRAM [13]) have analog LTP and LTD Shalini Shrivastava and Tanmay Chavan contributed equally to this work. with <100 states. In fact, multiple analog RRAMs in parallel 퐼퐷 = 퐾(푉퐺푆 − 푉푇)푉퐷푆 (1) are required to obtain sufficient precision in weight storage to 퐼퐷 = 퐺푉퐷푆 enable software equivalent learning [26][27]. As an alternative 퐺 = 퐾(푉퐺푆 − 푉푇) (2) to RRAMs, Flash memory-based synapses have been where K is proportionality constant [38]. Erasing (Δ푉푇 < 0) proposed [28-34]. The channel hot electron injection (CHI) for results in LTP (Δ퐺 > 0) and programming (Δ푉푇 > 0) programming during LTP [28-30] uses a large current to inject implies the LTD (Δ퐺 < 0). Thus, 푉푇 shift translates to electrons in the floating gate, resulting in a high energy-loss conductance change (Δ퐺) to enable LTP and LTD. Given [35]. NBTI based trap generation in high 휅 dielectric that 푉푇 has a range from 푉푇−푚푖푛 to 푉푇−푚푎푥, and we applied MOSFETs is also shown to mimic synaptic activity, but it 푉퐺푆 = 푉푇−푚푎푥; then, requires large electrical stressing to prepare the device [31- 퐺 = 퐾(푉푇−푚푎푥 − 푉푇) (3) 35]. Thus, a highly energy efficient, scalable synaptic device This implies a 퐺푚푖푛 ≈ 0 and 퐺푚푎푥 = 퐾(푉푇−푚푎푥 − 푉푇−푚푖푛) with gradual LTP and LTD is still challenging. corresponding to 푉푇−푚푎푥 and 푉푇−푚푖푛. In this paper, we propose a synapse based on highly Experimental LTD and LTP based on 푉푇 shift with repeated manufacturable Charge Trap Flash (CTF) Memory (1F) device application of identical 푉퐺 pulse is shown in Fig. 2a. The program pulse (12.5V for 1ms) and erase pulse (-14.5V for [36] on an SOI platform shown in Fig. 1e. The bulk Si 20ms) are chosen such that symmetric operation LTD & LTP technology based CTF capacitors exhibit gradual 푉 shift by 푇 (푉 shift from -1.3 V to -0.3 V, i.e., 푅푎푛푔푒(푉 ) = 1 푉) (same Fowler Norheim (FN) tunneling to enable symmetric LTP & 푇 푇 LTP and LTD window) occurs within 1000 pulses. As pulse LTD with high energy efficiency and a designable learning voltage reduces for fixed pulse-width, the 푉 shift magnitude rate. A careful design of operation and sub-circuit using SOI- 푇 reduces. The 푅푎푛푔푒(푉푇) after a fixed number of (say 1000) platform and diodes (2D) in SPICE show the essential energy pulses for different pulse bias is plotted in Fig. 2b. The efficiency (< 3 푓퐽/spike) is preserved by preventing 푅푎푛푔푒(푉푇) vs. 푉퐺 curves are linearly extrapolated to undesirable (parasitic) leakage currents. A mathematical 푅푎푛푔푒(푉푇) = 0, to estimate the threshold of 푉퐺 for finite LTP model of the experimental LTP / LTD is developed. A Spiking and LTD. For 1000 pulses, the threshold was estimated as 9.8 V Neural Network (SNN) for Iris classification is implemented at 1 ms pulse width of LTD and -11.5 V at 20 ms pulse width of with CTF synpase to show excellent performance due to LTP. The 푉푇 change vs. programming time at fixed gradual LTP/LTD. Finally, a benchmarking of this work with programming voltage (푉퐺) is well-documented [41]. the state-of-the-art is presented. Charge Trap Flash (CTF) device To demonstrate synapse by FN tunneling in CTF device, a 100 휇푚 × 100 휇푚 CTF capacitor shown in Fig. 1e is fabricated as described in detail earlier [36]. In brief, the device is fabricated on an n-Si substrate with 4 nm thermal SiO2 as the tunnel oxide, 6 nm LPCVD Si3N4 as the charge trap layer (CTL), and 12 nm MOCVD Al2O3 as the blocking oxide and n+ polysilicon on 300 mm Si substrate by Applied Materials cluster tool. Aluminum is used as the back contact. A self-aligned Boron implant provides a source for minority carriers for fast programming. In this CTF capacitor test vehicle, the bias is applied at the gate, and the source/drain and body are shorted to ground to measure total current. Though the demonstration is based on p-channel based CTF device, the conclusions are valid for n-channel CTF MOSFETs. Fowler Nordheim (FN) Tunneling based CTF Synapse The gradual program/erase operation to enable LTP & Fig. 2 a) Repeated application of same 푉푝푢푙푠푒 on gate (푉퐺) shows gradual 푉푇 shift for symmetric LTP and LTD with pulse-widths of 1 ms and 20 ms LTD is performed through FN tunneling, which is an electric respectively. Higher 푉 increases 푉 range. b) Range of 푉 shift field driven current transport to charge/discharge the CTL, 푝푢푙푠푒 푇 푇 (푅푎푛푔푒(푉푇)) extracted increases with the total number of pulses and 푉퐺 which enables low current operation. The detailed physical respectively. The extrapolation to 푅푎푛푔푒(푉푇) = 0 estimates the threshold for mechanisms of CTF memory operation is presented elsewhere write and erase for different pulse numbers. c) LTP and LTD behaviour can be [37]. However, in brief, when a positive (negative) voltage is tuned by the pulse-width arbitrarily, for fixed 푉퐺 12.5 V (LTD) and -14.5 V 푚푎푥 푚푎푥 (LTP).