Redesigning Commercial Floating-Gate Memory for Analog Computing Applications

F. Merrikh Bayat¹, X. Guo¹, H. A. Om’mani², N. Do², K. K. Likharev³, and D. B. Strukov¹*
¹ UC Santa Barbara, Santa Barbara, CA 93106-9560, U.S.A.
² Silicon Storage Technology Inc., A Subsidiary of Microchip Technology Inc., San Jose, CA 95134, U.S.A.
³ Stony Brook University, Stony Brook, NY 11794-3800, U.S.A.
*Corresponding author; email [email protected]

Abstract: We have modified a commercial NOR flash memory array to enable high-precision tuning of individual floating-gate cells for analog computing applications. The modified array area per cell in a 180 nm process is about 1.5 μm². While this area is approximately twice the original cell size, it is still at least an order of magnitude smaller than in the state-of-the-art analog circuit implementations. The new memory cell arrays have been successfully tested, in particular confirming that each cell may be automatically tuned, with ~1% precision, to any desired subthreshold readout current value within an almost three-orders-of-magnitude dynamic range, even using an unoptimized tuning algorithm. Preliminary results for a four-quadrant vector-by-matrix multiplier, implemented with the modified memory array gate-coupled with additional peripheral floating-gate transistors, show highly linear transfer characteristics over a broad range of input currents.

Keywords: Floating-gate memory; Analog memory; Analog computing; Vector-matrix multiplier

This work was supported by DARPA under Contract No. HR0011-13-C-0051UPSIDE via BAE Systems. Useful discussions with M. Graziano, O. Kavehei, L. Sengupta and V. Tiwari are gratefully acknowledged.

I. INTRODUCTION

Nonvolatile floating-gate memory devices are very attractive for analog computing [1, 2], because their state may be tuned continuously. For example, vector-by-matrix multiplication is a bottleneck in many signal processing and artificial neural network tasks [1, 3]. Floating-gate devices enable such multiplication, for slowly changing matrix elements, with relatively low precision, but very high performance: high speed, high density, and low power [4-6].

The so-called synaptic transistors and similar devices [4, 5, 8-17, 19] (see also reviews [7, 18]) are the most commonly reported implementation of this idea. This technique is very convenient due to its compatibility with the generic CMOS fabrication process. Its handicap is a large cell area and lower quality (e.g., in terms of retention [20]) of the floating-gate devices in comparison with those used in highly optimized flash memories. Figure 1 illustrates one such flash memory technology (ESF-1), from Silicon Storage Technology, Inc. (SST) [21]. Its relative cell area A/F², where F is the half-pitch of the employed CMOS process, is close to 20. Such relative area is at least an order of magnitude smaller than that in synaptic transistors [8-10, 15, 17]. However, the baseline SST floating-gate technology has been designed for digital NOR flash memory applications, and does not allow setting a precise analog state of each cell, necessary for analog applications. This paper describes a successful redesign of the SST memory, which enables such individual cell tuning.

Fig. 1. SST's ESF-1 technology [21]: (a) schematic cross-section of a supercell, (b) its equivalent circuit, and (c) TEM cross-section image of one half of the supercell implemented in a 180-nm process.

II. MEMORY ARRAY DESIGN

The SST NOR memory array consists of “supercells” (Fig. 1). Each supercell is a common-source assembly of two floating-gate memory cells with a highly asymmetric structure: the control gate (usually connected to a "word" line) overlaps the drain region of the cell’s MOSFET, while being separated from its source region by the floating gate. Because of that, the direct effect of the gate voltage on the process of electron emission by the source is very small. This is evident from the readout characteristics of the cell, shown in Fig. 2: at VDS > 0, when the source-to-drain current is due to the electron emission from the source, a large gate voltage is necessary to open the transistor of a fully programmed cell (with negatively charged floating gate). On the other hand, at VDS < 0, when electrons are emitted by the transistor’s drain, the effect of the control gate voltage on the current is much stronger, while that of the floating-gate charge is much weaker.

The same structure asymmetry affects the switching dynamics of the cell (Fig. 3). During the “programming” process, the negative charge of the floating gate may be increased very fast using very effective hot-electron injection from the source area of the transistor’s channel, while the simplest way to decrease it (and hence “erase” the cell) is via the Fowler-Nordheim tunneling of electrons from the floating gate to the control gate, by applying a rather high voltage (~11 V) to the latter electrode.

The top row of Fig. 4 shows the usual structure of the NOR flash memory and its programming/erasure voltage protocols, employing these properties of the SST cells. In this architecture, cells of the same row share transistor source and control gate (“word”) lines, while transistor drains of all cells of the same column are connected to the same “bit” line. Fig. 4a shows the set of applied voltages used for programming of the top left cell, while avoiding state disturb in all other cells. In particular, a positive bias VD^P’ > 2 V, applied to all unselected bit lines, inhibits unintentional hot-electron injection in all unselected cells, including type-A half-selected cells (sitting on the selected word line). Also, grounding of unselected word lines guarantees the absence of disturb processes (such as the back Fowler-Nordheim tunneling) in all unselected cells, including half-selected cells of type B (sharing the source voltage with the selected cell). As Fig. 3a indicates, the same programming protocol, only with pulsed source voltage and slightly modified voltage values, allows analog programming of the selected cell, also without disturbing the half-selected cells, regardless of their charge state.

Unfortunately, in this memory architecture the opposite process of cell erasure (Fig. 4b) is much less controllable. Namely, the fully selected cell and the type-C half-selected cell share their gate and source voltages, and due to the cell structure (Fig. 1) the process responsible for erasure (the Fowler-Nordheim tunneling of electrons from the floating gate to the control gate) is only weakly affected by the drain voltage VD, the only voltage which may be different for these two cells. (The possible increase of VD^E’ is limited by the onset of large drain-to-source current.) For digital applications this feature is not a handicap, because in flash memories all cells of the same row are erased simultaneously. However, in analog applications it is highly desirable to perform not only a gradual programming of each cell, but also a gradual erasure of each cell without disturbing its neighbors. Our detailed measurements (see, e.g., Fig. 3c,d) have shown that in the baseline architecture (Fig. 4a-c) the latter operation is impossible for any bias voltage set.

To resolve this problem, we have modified the array structure (without changing the optimized cell fabrication technology) as shown in the bottom row of Fig. 4, namely by re-routing the gate lines in the “vertical” direction, i.e., perpendicular to the source lines. A straightforward analysis of the data shown in Figs. 2 and 3 shows that the new design

[Figs. 2 and 3 (plots): |IDS| versus VG, VS, VD, and |VDS| for fully erased and fully programmed cells, with legend values VS = 5 V to 8.5 V, VG = 8 V to 10 V, and pulse durations t = 600 μs and 6 ms.]
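The line-sharing argument behind the erase disturb in the baseline array, and the effect of re-routing the gate lines, can be illustrated with a purely topological sketch. It is not taken from the paper: the function names (`lines_for_cell`, `erase_coupled_cells`), the 4 x 4 array size, and the simplification that every row has its own source line (ignoring the pairing of cells into supercells) are all assumptions made for illustration. The sketch only counts cells that share both the gate line and the source line with the selected cell, i.e. the two lines that, according to the text, dominate Fowler-Nordheim erasure.

```python
from itertools import product


def lines_for_cell(row, col, layout):
    """Return (gate_line, source_line, bit_line) labels for one cell.

    Assumption (hypothetical model): source lines run along rows and bit
    lines along columns in both layouts; only the gate routing differs.
    """
    if layout == "baseline":
        gate = ("G", row)   # gate ("word") line shared along the row
    elif layout == "modified":
        gate = ("G", col)   # gate line re-routed perpendicular to source lines
    else:
        raise ValueError(f"unknown layout: {layout}")
    return gate, ("S", row), ("D", col)


def erase_coupled_cells(selected, n_rows, n_cols, layout):
    """List cells sharing BOTH gate and source lines with the selected cell."""
    sel_gate, sel_source, _ = lines_for_cell(*selected, layout)
    coupled = []
    for cell in product(range(n_rows), range(n_cols)):
        if cell == selected:
            continue
        gate, source, _ = lines_for_cell(*cell, layout)
        if gate == sel_gate and source == sel_source:
            coupled.append(cell)
    return coupled


# Select the top left cell of a hypothetical 4 x 4 array.
baseline = erase_coupled_cells((0, 0), 4, 4, "baseline")
modified = erase_coupled_cells((0, 0), 4, 4, "modified")
print("baseline:", baseline)
print("modified:", modified)
```

Under these assumptions, the baseline layout leaves every other cell of the selected row tied to the same gate and source voltages as the selected cell, while the re-routed layout leaves none. The sketch says nothing about the actual voltage levels or about disturb through the remaining shared lines, which the measurements in Figs. 2 and 3 address.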
