Neural Network-Inspired Analog-To-Digital Conversion to Achieve Super-Resolution with Low-Precision RRAM Devices
Total Page:16
File Type:pdf, Size:1020Kb
Neural Network-Inspired Analog-to-Digital Conversion to Achieve Super-Resolution with Low-Precision RRAM Devices Weidong Cao*, Liu Ke*, Ayan Chakrabarti** and Xuan Zhang* * Department of ESE, ** Department of CSE, Washington University, St.louis, MO, USA Abstract— Recent works propose neural network- (NN-) NN operations, and can be trained to approximate the spe- inspired analog-to-digital converters (NNADCs) and demon- cific quantization=conversion functions required by different strate their great potentials in many emerging applications. systems. However, a major challenge for designing such These NNADCs often rely on resistive random-access memory (RRAM) devices to realize the NN operations and require NNADCs is the limited conductance=resistance resolution high-precision RRAM cells (6∼12-bit) to achieve a moderate of RRAM devices. Although these NNADCs optimistically quantization resolution (4∼8-bit). Such optimistic assumption assume that each RRAM cell can be precisely programmed of RRAM resolution, however, is not supported by fabrication with 6∼12-bit resolution, measured data from realistic fab- data of RRAM arrays in large-scale production process. In this rication process suggest the actual RRAM resolution tends paper, we propose an NN-inspired super-resolution ADC based on low-precision RRAM devices by taking the advantage of to be much lower (2∼4-bit) [8], [9]. Therefore, there exists a co-design methodology that combines a pipelined hardware a gap between the reality and the assumption of RRAM architecture with a custom NN training framework. Results precision, yet lacks a design methodology to build super- obtained from SPICE simulations demonstrate that our method resolution NNADCs from low-precision RRAM devices. leads to robust design of a 14-bit super-resolution ADC using 3- In this paper, we bridge this gap by introducing an NN- bit RRAM devices with improved power and speed performance and competitive figure-of-merits (FoMs). In addition to the inspired design methodology that constructs super-resolution linear uniform quantization, the proposed ADC can also sup- ADCs with low-precision RRAM devices. Taking advantage port configurable high-resolution nonlinear quantization with of a co-design methodology that combines a pipelined hard- high conversion speed and low conversion energy, enabling ware architecture with deep learning-based custom training future intelligent analog-to-information interfaces for near- framework, our method is able to achieve an NN-inspired sensor analytics and processing. ADC whose resolution far exceeds the precision of the underlying RRAM devices. The key idea of a pipelined I. INTRODUCTION architecture is that many consecutive low-resolution (1∼3- Many emerging applications have posed new challenges bit) quantization stages can be cascaded in a chain structure for the design of conventional analog-to-digital (A=D) con- to obtain higher resolution. Since each stage now only needs verters (ADCs) [1]–[4]. For example, multi-sensor systems to resolve 1∼3-bit, we can accurately train and instantiate it desire programmable nonlinear A=D quantization to maxi- with low-precision RRAM devices to approximate the ideal mize the extraction of useful features from the raw analog quantization functions and residue functions. Key innova- signal, instead of directly performing uniform quantization tions and contributions in this paper are as follow: by conventional ADCs [3], [4]. This can alleviate the compu- • We propose a co-design methodology leveraging tational burden and reduce the power consumption of back- pipelined hardware architecture and custom training end digital processing, which is the dominant bottleneck in framework to achieve super-resolution analog-to-digital intelligent multi-sensor systems. However, such flexible and conversion that far exceeds the limited precision of the configurable quantization schemes are not readily supported RRAM device. arXiv:1911.12815v1 [cs.LG] 28 Nov 2019 by conventional ADCs with dedicated circuitry that has fixed • We systematically evaluate the impacts of NN size conversion references and thresholds. and RRAM precision on the accuracy of NN-inspired To overcome this inherent limitation of conventional sub-ADC and residue block and perform design space ADCs, several recent works [5]–[7] have introduced neural exploration to search for optimal pipelined stage con- network-inspired ADCs (NNADCs) as a novel approach figuration with balanced trade-off between speed, area, to designing intelligent and flexible A=D interfaces. For and power consumption. instance, a learnable 8-bit NNADC [7] is presented to • SPICE simulation results demonstrate that our proposed approximate multiple quantization schemes where the NN method is able to generate robust design of a 14-bit weight parameters are trained off-line and can be config- super-resolution NNADC using 3-bit RRAM devices. ured by programming the same hardware substrate. Another Comparisons with both the state-of-the-art ADCs and example is a 4-bit neuromorphic ADC [6] proposed for other NNADC designs reveal improved performance general-purpose data conversion using on-line training by and competitive figure-of-merits (FoMs). leveraging the input amplitude statistics and application sen- • Our proposed ADC can also support configurable non- sitivity. These NNADCs are often built on resistive random- linear quantization with high-resolution, high conver- access memory (RRAM) crossbar array to realize the basic sion speed, and low conversion energy. P upper Vin,H X sub-array i,n P V U in,k g k,j bi 1,m P Vin,1 V Source line o,M Xi,3 bi 2,p Vo,j VTC Bit line Vo,1 N bi 1,2 Vin,1 N Vin,k L X bi 2,1 gk,j i,2 lower VN in,H sub-array bi 1,1 i 1,i 2 PN Vin (V in ,V in ) i,i 1 PN Xi,1 VV,VVVin,k in,k in,k DD in,k (a) (b) upper sub-array Source line VTC Bit line lower sub-array (a) (b) P upper Vin, H X sub-array in, V P in, k g P kj, bim1, P Vin,1 Source line Xi,3 bip2, VTC Bit line b N i 1,2 Vin,1 V N in, k N N X bi 2,1 gkj, i,2 Vin, H lower sub-array ωii1, 2 bi 1,1 PN Vin (,)VVin in PN X ωii,1 VVVVVin,,, k in k, in k DD in, k i,1 (a) (b) upper RELIMINARIES Xi,n II. P sub-array b A. RRAM Device, Crossbar Array and NN i 1,m b 1) RRAM device: A RRAM device is a passive two- Source line Xi,3 i 2,p VTC terminal element with variable resistance and possesses Bit line bi 1,2 many special advantages, such as small cell size (4F 2, F – Xi,2 bi 2,1 the minimum feature size), excellent scalability (<10nm), lower 10 sub-array b ωi 1,i 2 faster read=write time (<10ns) and better endurance (∼10 i 1,1 ω cycles) than Flash devices [2], [10]. Xi,1 i,i 1 2) RRAM crossbar array: RRAM devices can be orga- (a) (b) nized into various ultra-dense crossbar array architectures. Fig. 1: (a) Hardware substrate to perform basic NN operations, where the Fig. 1(a) shows a passive crossbar array composed of passive crossbar array with two sub-arrays executes VMM and the VTC of CMOS inverter acts as NAF. (b) An example of NN. two sub-arrays to realize bipolar weights without the use of power-hungry operational-amplifiers (op-amps) [7]. The rely on a training process to learn the appropriate NN weights ~ relationship between the input voltage “vector” (Vin) and to approximate flexible quantization schemes that can be ~ output voltage “vector” (Vo) can be expressed as Vo;j = configured by programming the weights stored in RRAM P k Wk;j · Vin;k + Voff;j. Here, k (k 2 f1; 2;:::;Hg) and conductance=resistance. However, existing NNADCs [5]– j (j 2 f1; 2;:::;Mg) are the indices of input ports and [7] often exhibit modest conversion resolution (4∼8-bit) output ports of the crossbar array. The weight Wk;j can be and invariably rely on optimistic assumption of the RRAM represented by the subtraction of two conductances in upper precision (6∼12-bit), which is not well substantiated by mea- (U) sub-array and lower (L) sub-array as surement data from realistic RRAM fabrication process [8], U L X X X U L [9]. This resolution limitation severely constrains the appli- Wk;j = (gk;j − gk;j)= ; = (gk;j + gk;j): (1) k cation of NNADCs in emerging multi-sensor systems that Therefore, the RRAM crossbar array is capable of per- require high-resolution (>10-bit) A=D interfaces for feature forming analog vector-matrix multiplication (VMM) and the extraction and near-sensor processing [1], [3], [4]. parameters of the matrix rely on the RRAM resistance states. 3) Artificial NN: With the RRAM crossbar array, an NN C. Pipelined ADCs shown in Fig. 1(b) can be implemented on such hardware Pipelined architecture is a well-established ADC topology substrate. Generally, the NN processes the data by executing to achieve high sampling rate and high resolution with low- the following operations layer-wise [17]: resolution quantization stages [11]. Fig. 2(a) illustrates a ~ typical pipelined ADC with M stages whose resolution ~yi+1 = f(Wi;i+1 · ~xi + bi+1): (2) RESO can be achieved by concatenating Ni-bit of each stage Here, Wi;i+1 is the weight matrix to connect the layer i and PM with digital combiner: RESO = i=1 Ni. Note that Ni is layer (i + 1). f(·) is a nonlinear activation function (NAF). usually ≤ 4 and not necessarily identical in all stages. As the These basic NN operations, e.g., VMM and NAF, can be Fig.