Neural Network-Inspired Analog-to-Digital Conversion to Achieve Super-Resolution with Low-Precision RRAM Devices

Weidong Cao*, Liu Ke*, Ayan Chakrabarti** and Xuan Zhang*
* Department of ESE, ** Department of CSE, Washington University in St. Louis, MO, USA

Abstract— Recent works propose neural network- (NN-) inspired analog-to-digital converters (NNADCs) and demonstrate their great potential in many emerging applications. These NNADCs often rely on resistive random-access memory (RRAM) devices to realize the NN operations and require high-precision RRAM cells (6∼12-bit) to achieve a moderate quantization resolution (4∼8-bit). Such an optimistic assumption of RRAM resolution, however, is not supported by fabrication data of RRAM arrays in large-scale production processes. In this paper, we propose an NN-inspired super-resolution ADC based on low-precision RRAM devices by taking advantage of a co-design methodology that combines a pipelined hardware architecture with a custom NN training framework. Results obtained from SPICE simulations demonstrate that our method leads to a robust design of a 14-bit super-resolution ADC using 3-bit RRAM devices, with improved power and speed performance and competitive figures of merit (FoMs). In addition to linear uniform quantization, the proposed ADC can also support configurable high-resolution nonlinear quantization with high conversion speed and low conversion energy, enabling future intelligent analog-to-information interfaces for near-sensor analytics and processing.

I. INTRODUCTION

Many emerging applications have posed new challenges for the design of conventional analog-to-digital (A/D) converters (ADCs) [1]–[4]. For example, multi-sensor systems desire programmable nonlinear A/D quantization to maximize the extraction of useful features from the raw analog signal, instead of directly performing uniform quantization with conventional ADCs [3], [4]. This can alleviate the computational burden and reduce the power consumption of back-end digital processing, which is the dominant bottleneck in intelligent multi-sensor systems. However, such flexible and configurable quantization schemes are not readily supported by conventional ADCs, whose dedicated circuitry has fixed conversion references and thresholds.

To overcome this inherent limitation of conventional ADCs, several recent works [5]–[7] have introduced neural network-inspired ADCs (NNADCs) as a novel approach to designing intelligent and flexible A/D interfaces. For instance, a learnable 8-bit NNADC [7] is presented to approximate multiple quantization schemes, where the NN weight parameters are trained off-line and can be configured by programming the same hardware substrate. Another example is a 4-bit neuromorphic ADC [6] proposed for general-purpose data conversion using on-line training, leveraging the input amplitude statistics and application sensitivity. These NNADCs are often built on resistive random-access memory (RRAM) crossbar arrays to realize the basic NN operations, and can be trained to approximate the specific quantization/conversion functions required by different systems. However, a major challenge for designing such NNADCs is the limited conductance/resistance resolution of RRAM devices. Although these NNADCs optimistically assume that each RRAM cell can be precisely programmed with 6∼12-bit resolution, measured data from realistic fabrication processes suggest the actual RRAM resolution tends to be much lower (2∼4-bit) [8], [9]. Therefore, there exists a gap between the reality and the assumption of RRAM precision, and a design methodology to build super-resolution NNADCs from low-precision RRAM devices is still lacking.

In this paper, we bridge this gap by introducing an NN-inspired design methodology that constructs super-resolution ADCs with low-precision RRAM devices. Taking advantage of a co-design methodology that combines a pipelined hardware architecture with a deep learning-based custom training framework, our method is able to achieve an NN-inspired ADC whose resolution far exceeds the precision of the underlying RRAM devices. The key idea of a pipelined architecture is that many consecutive low-resolution (1∼3-bit) quantization stages can be cascaded in a chain structure to obtain higher resolution. Since each stage now only needs to resolve 1∼3-bit, we can accurately train and instantiate it with low-precision RRAM devices to approximate the ideal quantization functions and residue functions. The key innovations and contributions of this paper are as follows:

• We propose a co-design methodology leveraging a pipelined hardware architecture and a custom training framework to achieve super-resolution analog-to-digital conversion that far exceeds the limited precision of the RRAM device.

• We systematically evaluate the impact of NN size and RRAM precision on the accuracy of the NN-inspired sub-ADC and residue block, and perform design space exploration to search for the optimal pipelined stage configuration with a balanced trade-off between speed, area, and power consumption.

• SPICE simulation results demonstrate that our proposed method is able to generate a robust design of a 14-bit super-resolution NNADC using 3-bit RRAM devices. Comparisons with both state-of-the-art ADCs and other NNADC designs reveal improved performance and competitive figures of merit (FoMs).

• Our proposed ADC can also support configurable nonlinear quantization with high resolution, high conversion speed, and low conversion energy.

II. PRELIMINARIES

A. RRAM Device, Crossbar Array and NN

1) RRAM device: An RRAM device is a passive two-terminal element with variable resistance. It possesses many special advantages over Flash devices, such as small cell size (4F², where F is the minimum feature size), excellent scalability (<10nm), faster read/write time (<10ns), and better endurance (∼10^10 cycles) [2], [10].

2) RRAM crossbar array: RRAM devices can be organized into various ultra-dense crossbar array architectures. Fig. 1(a) shows a passive crossbar array composed of two sub-arrays to realize bipolar weights without the use of power-hungry operational amplifiers (op-amps) [7]. The relationship between the input voltage vector (V_in) and the output voltage vector (V_o) can be expressed as V_{o,j} = Σ_k W_{k,j} · V_{in,k} + V_{off,j}. Here, k (k ∈ {1, 2, ..., H}) and j (j ∈ {1, 2, ..., M}) are the indices of the input and output ports of the crossbar array. The weight W_{k,j} is represented by the subtraction of two conductances in the upper (U) sub-array and the lower (L) sub-array as

W_{k,j} = (g^U_{k,j} − g^L_{k,j}) / Σ,   Σ = Σ_k (g^U_{k,j} + g^L_{k,j}).   (1)

Therefore, the RRAM crossbar array is capable of performing analog vector-matrix multiplication (VMM), where the parameters of the matrix depend on the RRAM resistance states.

Fig. 1: (a) Hardware substrate to perform basic NN operations, where the passive crossbar array with two sub-arrays executes VMM and the VTC of a CMOS inverter acts as the NAF. (b) An example of an NN.

3) Artificial NN: With the RRAM crossbar array, an NN as shown in Fig. 1(b) can be implemented on such a hardware substrate. Generally, the NN processes data by executing the following operations layer-wise [17]:

y_{i+1} = f(W_{i,i+1} · x_i + b_{i+1}).   (2)

Here, W_{i,i+1} is the weight matrix connecting layer i and layer (i+1), and f(·) is a nonlinear activation function (NAF). These basic NN operations, i.e., VMM and NAF, can be mapped to the RRAM crossbar array and CMOS inverters shown in Fig. 1(a), where the voltage transfer characteristic (VTC) of the inverter is used as the NAF [7].
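To make the mapping concrete, the short sketch below (our own illustration, not part of the paper's design flow; the array sizes and conductance levels are made up) emulates the differential crossbar VMM of Eq. (1) in Python:

```python
import numpy as np

def crossbar_vmm(v_in, g_upper, g_lower, v_off=0.0):
    """Emulate the two-sub-array RRAM crossbar of Fig. 1(a):
    V_o[j] = sum_k W[k,j] * V_in[k] + V_off, with the bipolar weight
    W[k,j] = (g_U[k,j] - g_L[k,j]) / Sigma[j] of Eq. (1)."""
    sigma = (g_upper + g_lower).sum(axis=0)      # per-column normalization
    w = (g_upper - g_lower) / sigma              # bipolar weights from conductances
    return v_in @ w + v_off

# Toy example: H = 4 inputs, M = 2 outputs, 3-bit (8-level) conductances.
rng = np.random.default_rng(0)
levels = np.linspace(1e-6, 1e-4, 2**3)           # 8 programmable states (siemens)
g_u = rng.choice(levels, size=(4, 2))
g_l = rng.choice(levels, size=(4, 2))
print(crossbar_vmm(np.array([0.2, 0.5, 0.7, 0.1]), g_u, g_l))
```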
B. NN-Inspired ADCs

A/D conversion can be viewed as a special case of classification, which maps a continuous analog signal to a multi-bit digital code. An NN can be trained to learn this input-output relationship, and a hardware implementation of this NN can be instantiated in the analog and mixed-signal domain. This is the basic idea behind NNADCs, which implement the learned NN on a hardware substrate to approximate the desired quantization function for data conversion:

Σ_{i=1}^{M} 2^{i−1} · b_{o,i} = round((V_in − V_min)/(V_max − V_min) × (2^M − 1)),   (3)

where M is the resolution, V_in is the input analog signal, b_o is the output digital code, and V_min and V_max are the minimum and maximum values of the scalar input signal V_in. Since the RRAM crossbar array provides a promising hardware substrate to build NNs, recent work has demonstrated several NNADCs based on RRAM devices [5]–[7]. Although the NN architectures adopted by these NNADCs vary, they all rely on a training process to learn the appropriate NN weights to approximate flexible quantization schemes that can be configured by programming the weights stored as RRAM conductance/resistance. However, existing NNADCs [5]–[7] often exhibit modest conversion resolution (4∼8-bit) and invariably rely on an optimistic assumption of the RRAM precision (6∼12-bit), which is not well substantiated by measurement data from realistic RRAM fabrication processes [8], [9]. This resolution limitation severely constrains the application of NNADCs in emerging multi-sensor systems that require high-resolution (>10-bit) A/D interfaces for feature extraction and near-sensor processing [1], [3], [4].

C. Pipelined ADCs

The pipelined architecture is a well-established ADC topology to achieve high sampling rate and high resolution with low-resolution quantization stages [11]. Fig. 2(a) illustrates a typical pipelined ADC with M stages, whose resolution RESO is achieved by concatenating the N_i bits of each stage with a digital combiner: RESO = Σ_{i=1}^{M} N_i. Note that N_i is usually ≤ 4 and not necessarily identical in all stages. As Fig. 2(a) illustrates, an arbitrary stage-i contains two sub-blocks: a sub-ADC and a residue. The sub-ADC resolves an N_i-bit binary code D_{N_i} from the input residue r_{i−1}, while the residue part amplifies the difference between the input residue r_{i−1} and the analog output of the sub-DAC by 2^{N_i} to generate the output residue r_i for the next stage. This process can be expressed as a simple function:

r_i = [r_{i−1} − REF(D_{N_i})] · 2^{N_i}.   (4)

Here, REF(D_{N_i}) is the analog output of the sub-DAC, which depends on D_{N_i}. For example, assuming r_{i−1} ∈ [0, V_DD] and N_i = 1, then REF(0) = 0 and REF(1) = V_DD/2; Fig. 2(b) shows the corresponding residue function. To understand the basic working principle of pipelined ADCs, we use the 4-bit pipelined ADC with four 1-bit stages in Fig. 2(c) as an example. Assuming the initial analog input is 0.7V (V_DD = 1V), the first stage outputs a digital code "1" and an analog residue "0.4V" according to Eq. (4), which is then processed by the following stage in the same way as the initial analog input. Finally, we obtain the 4-bit output 1011, which is the quantization of 0.7V (0.7/1 = 11.2/2^4 ≈ 11/2^4). This example also shows that a higher resolution (4-bit) can indeed be constructed from low-precision (1-bit) stages in a pipelined ADC.
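As a sanity check on Eq. (4), the following sketch (our own illustrative addition, assuming ideal 1-bit stages and V_DD = 1V) reproduces the 0.7V example above:

```python
def ideal_pipeline(v_in, stage_bits, vdd=1.0):
    """Ideal pipelined conversion: each stage resolves an N_i-bit code
    and applies Eq. (4), r_i = (r_{i-1} - REF(code)) * 2^N_i,
    with REF(code) = code * VDD / 2^N_i."""
    bits, r = [], v_in
    for n in stage_bits:
        code = min(int(r / vdd * 2**n), 2**n - 1)   # N_i-bit sub-ADC decision
        r = (r - code * vdd / 2**n) * 2**n          # amplified residue for next stage
        bits.append(format(code, f'0{n}b'))
    return ''.join(bits), r

code, _ = ideal_pipeline(0.7, [1, 1, 1, 1])
print(code)  # '1011': 0.7 V quantized to 11/16, matching the example above
```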

Fig. 2: (a) General architecture of a pipelined ADC. (b) An example of the residue function when N_i = 1. (c) A quantization example of a 4-bit pipelined ADC with four 1-bit stages.

III. CO-DESIGN METHODOLOGY

A. Hardware Substrate

1) Pipelined architecture: The observation from traditional pipelined ADCs motivates us to extend this architecture to NNADCs to enhance their resolution beyond the limit of RRAM precision. The overall hardware architecture of the proposed high-resolution NNADC is presented in Fig. 3(a), where a pipelined architecture composed of cascaded conversion stages is adopted. This pipelined architecture brings two direct benefits. First, each stage in the proposed NNADC now only needs to resolve 1∼3-bit quantization, which is well within the precision limit of current RRAM fabrication processes [8], [9] and can be easily achieved with the automated design methodology introduced in previous work [7]. Second, although many cascaded stages are needed, there exist only three distinct low-resolution configurations to choose from for each stage, namely N_i = 1, 2, 3. This allows us to simplify the design process by focusing on optimizing the sub-block design of each stage at the different resolutions. The full pipelined system can then be assembled by iterating through different combinations of the sub-blocks with different resolutions.

Fig. 3: Proposed co-design framework for the super-resolution NNADC. (a) Pipelined architecture of the proposed NNADC. (b) Off-line training model of each stage-i. The proposed training framework takes ground-truth datasets as inputs during off-line training to find the optimal weights and derive the RRAM resistances that minimize the cost function and best approximate the ideal quantization and residue functions.

2) Low-resolution NNADC stage: For stage-i of the proposed NNADC, we use a five-layer NN to implement the sub-ADC and the residue block. The five-layer NN can be decomposed into two three-layer sub-blocks, each of which maps onto the corresponding sub-ADC and residue in Fig. 2(a). The cornerstone of this mapping methodology is the universal approximation theorem: a feed-forward three-layer NN with a single hidden layer can approximate arbitrarily complex functions [13]. We use the RRAM crossbar array and CMOS inverter illustrated in Fig. 1(a) as the hardware substrate for the sub-blocks of each stage. As Fig. 3(b) shows, for the sub-ADC, the input analog signal represents the single "placeholder" neuron in the MLP's input layer. Therefore, the weight matrix dimensions are H_{F,i} × 1 between the hidden and the input layer, and H_{F,i} × S_i between the hidden and the output layer, assuming there are H_{F,i} and S_i neurons in the hidden and output layers. Here, we use a redundant "smooth" S_i → N_i encoding method to replace the standard N_i-bit binary encoding with S_i bits (S_i > N_i), following previous work [7], as it improves the training accuracy and reduces the hidden layer size of the sub-ADC. For example, we use 3 → 2 smooth codes to train a 2-bit sub-ADC with 3-bit smooth codes as output in Fig. 4(b). For the residue, there are (1 + S_i) input neurons (one analog input and the S_i-bit smooth digital code from the preceding sub-ADC block) and only one analog output neuron; therefore, the weight matrix dimensions are H_{R,i} × (1 + S_i) between the hidden and the input layer and H_{R,i} × 1 between the hidden and the output layer, assuming there are H_{R,i} hidden neurons. Sample-and-hold (S/H) circuits [18] are used in the output layer to drive the next stage. Since the op-amps of Fig. 2(a) are eliminated in the NN-inspired design of the residue circuit, considerable power savings can be obtained in each stage.

B. Training Framework

1) Training overview: We propose a training framework that accurately captures the circuit-level behavior of the hardware substrate in its mathematical model and is able to learn robust NNs and their associated hardware design parameters (i.e., RRAM conductances) to approximate the sub-ADC and residue of each stage. The training framework incorporates two important features. First, we employ collaborative training for the two sub-blocks in each stage: the sub-ADC is first trained to approximate the ideal quantization function with high fidelity, and then its digital outputs and the original analog input are directly fed to the residue block for residue training. This collaborative training flow effectively minimizes the discrepancy between the circuit artifacts and the ideal conversion at each stage. Second, device non-idealities, such as process, voltage, and temperature (PVT) variations of the CMOS devices and the limited precision of the RRAM devices, can be incorporated into training to make the proposed NNADC robust to these defects [14]. This is another advantage of the proposed NNADC over traditional ADC designs, where the non-idealities cannot be fully mitigated even with delicate calibration techniques [11].

2) Training steps: The detailed training flow is shown in Fig. 3(b) and consists of four steps. We focus on describing the training steps for the residue block, as we adopt a similar sub-ADC training method that has been elaborated in previous work [7], [14].

Step 1: establish the learning objective. For the residue circuit, the output is an analog value; therefore, the hardware substrate can be modeled as a three-layer NN with a "placeholder" output neuron:

h̃_i = L_1(r_{i−1}, D_{S_i}; θ_{1,i}),   r_i = L_2(h_i; θ_{2,i}).   (5)

Here, h_i = σ_{VTC,i}(h̃_i); D_{S_i} indicates the digital outputs of the sub-ADC ("1" means V_DD and "0" means GND), and r_{i−1} is the scalar residue input of stage-i. h̃_i denotes the outputs of the first crossbar layer, which are modeled as a linear function L_1 of r_{i−1} and D_{S_i}, with learnable parameters θ_{1,i} = {W_1, V_1} corresponding to the RRAM crossbar array conductances. Each of these voltages is passed through an inverter (shown in Fig. 1(a)), whose input-output relationship is modeled by the nonlinear function σ_{VTC,i}(·), to yield the vector h_i. The linear function L_2 models the second crossbar layer, which produces the output residue r_i for the next stage with learnable parameters θ_{2,i} = {W_2, V_2}.
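For concreteness, the following NumPy sketch (our own illustration; the logistic-shaped vtc merely stands in for the SPICE-simulated inverter curves, and all dimensions and parameter values are invented) models the residue sub-block of Eq. (5):

```python
import numpy as np

def vtc(v, vdd=1.0, gain=20.0):
    """Stand-in for an inverter voltage transfer characteristic: a steep
    inverting sigmoid around VDD/2 (the real curves come from SPICE
    Monte Carlo simulations of the CMOS inverter)."""
    return vdd / (1.0 + np.exp(gain * (v - vdd / 2)))

def residue_forward(r_prev, d_codes, W1, V1, W2, V2):
    """Three-layer residue model of Eq. (5):
    h~ = L1(r_{i-1}, D_{S_i}; theta_1),  h = VTC(h~),  r_i = L2(h; theta_2)."""
    x = np.concatenate(([r_prev], d_codes))     # (1 + S_i) input neurons
    h_tilde = x @ W1 + V1                       # first crossbar layer
    h = vtc(h_tilde)                            # inverter used as the NAF
    return (h @ W2 + V2).item()                 # second crossbar layer -> r_i

# Illustrative 2-bit stage: S_i = 3 smooth code bits, H_{R,i} = 7 hidden neurons.
rng = np.random.default_rng(1)
W1, V1 = rng.normal(0, 0.1, (4, 7)), rng.normal(0, 0.1, 7)
W2, V2 = rng.normal(0, 0.1, (7, 1)), rng.normal(0, 0.1, 1)
print(residue_forward(0.6, np.array([1.0, 1.0, 0.0]), W1, V1, W2, V2))
```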


The learning objective is to find optimal values of the parameters {θ_{1,i}, θ_{2,i}} such that, for all values of r_{i−1} in the input range, the circuit yields a corresponding residue r_i equal or close to the desired "ground truth" r_GT of Eq. (4). To this end, we define a cost function C(r_i, r_GT) that measures the discrepancy between the predicted r_i and the true r_GT based on the mean-square loss:

C(r_i, r_GT) = Σ_j [r_GT(j) − r_i(j)]².   (6)

Step 2: model the hardware constraints. Hardware constraints come from three aspects: CMOS neuron PVT variations, the limited precision of the RRAM devices, and the passive crossbar array. To reflect these constraints, we first group all VTCs obtained by Monte Carlo simulation into the set A_VTC, using the technology specification of Section IV-A. Meanwhile, we control the precision of the weights to A_R bits during training. Finally, we require the summation of the absolute values of all elements in each column (axis "0") of W_1 and W_2 to be less than 1,

Σ(abs(W_1), 0) < 1;   Σ(abs(W_2), 0) < 1,   (7)

to reflect the weight constraints of Eq. (1).

Step 3: hardware-oriented training. We initialize the parameters {θ_{1,i}, θ_{2,i}} randomly and update them iteratively based on gradients computed on mini-batches of {(r_{i−1}, D_{N_i}, r_GT)} pairs randomly sampled from the input range. To incorporate the hardware constraints of Step 2 into training, we let each neuron j in Eq. (5) randomly pick a VTC from A_VTC during training:

σ^j_{VTC,i} = A_VTC[randint(N)],   j = 1, 2, ..., H_{R,i}.   (8)

We then periodically clip all values of W_1 to [−1/(1+N_i), 1/(1+N_i)] and all values of W_2 to [−1/H_{R,i}, 1/H_{R,i}] to satisfy Eq. (7).

Step 4: instantiate the conductance values. We adopt the same instantiation method as previous work [7], which is proven to always find a set of equivalent conductances from the trained weights and biases to map onto the RRAM devices of the hardware substrate. After this, we perturb each resistance R by

R ← R · e^θ,   θ ∼ N(0, σ²),   (9)

to evaluate the robustness of the NN model to the stochastic variation of the RRAM resistance [2].

C. Examples of Trained Sub-ADC and Residue

Fig. 4 illustrates SPICE simulations of different stages trained with the proposed framework. The sub-ADC and the residue in Fig. 4(a) are trained through a 1×3×2 NN and a 3×5×1 NN, respectively, by setting N_i = 1, while the sub-ADC and the residue in Fig. 4(b) are trained through a 1×4×3 NN and a 4×7×1 NN by setting N_i = 2. In both figures, we use 3-bit RRAM and set σ = 0.05 in Eq. (9) for the evaluation. The comparison between the trained functions and the ideal functions shows that each low-precision-RRAM stage can accurately approximate the ideal stage function with the aid of the proposed training framework.

Fig. 4: Illustrations of the trained sub-ADC and residue functions for pipeline stages of different resolutions. (a) 1-bit stage (N_i = 1). (b) 2-bit stage (N_i = 2).
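The training procedure of Section III-B can be condensed into the sketch below (our own NumPy illustration; the actual framework is implemented in TensorFlow with the Adam optimizer, as described in Section IV-A, and the uniform weight quantizer here is a simplified stand-in for the A_R-bit precision control):

```python
import numpy as np

rng = np.random.default_rng(0)

def project_weights(W1, W2, n_i, h_ri, a_r_bits=3):
    """Periodic projection used during training: clip W1 to
    [-1/(1+N_i), 1/(1+N_i)] and W2 to [-1/H_Ri, 1/H_Ri] (Step 3,
    enforcing Eq. (7)), then snap every weight to one of the 2^A_R
    uniform levels (simplified stand-in for the A_R-bit control)."""
    def clip_quant(w, bound):
        levels = np.linspace(-bound, bound, 2**a_r_bits)
        idx = np.abs(np.clip(w, -bound, bound)[..., None] - levels).argmin(axis=-1)
        return levels[idx]
    return clip_quant(W1, 1.0 / (1 + n_i)), clip_quant(W2, 1.0 / h_ri)

def perturb_resistances(R, sigma=0.05):
    """Step 4 robustness check (Eq. (9)): R <- R * e^theta,
    theta ~ N(0, sigma^2), applied to each instantiated resistance."""
    return R * np.exp(rng.normal(0.0, sigma, size=R.shape))

# Per-iteration recipe of Steps 1-3 (gradients come from autodiff in the
# actual TensorFlow flow):
#   1. sample a mini-batch {(r_{i-1}, D_{S_i}, r_GT)} built from Eq. (4)
#   2. assign each hidden neuron a random VTC from the Monte Carlo
#      bank A_VTC (Eq. (8))
#   3. take an optimizer step on C = sum_j (r_GT(j) - r_i(j))^2  (Eq. (6))
#   4. every few hundred iterations, apply project_weights(...)

W1, W2 = rng.normal(0, 0.2, (4, 7)), rng.normal(0, 0.2, (7, 1))
W1, W2 = project_weights(W1, W2, n_i=2, h_ri=7)
```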

2-bit 1 3-bit 3 5 1

) 10 3 4 1 bits ( 1.0 102 MSE

ENOB ENOB 1 4 2 103 1 3 2 1 2 2 0.5 104 1 2 3 4 5 1 2 3 4 5 6 7 RRAM precision (bits) RRAM precision (bits) 2.5 100 Ni  2 3-bit Ni  2 4 7 1

2.0 1 4 6 1

) 10 4 5 1 bits

( 1.5 102 4-bit

1.0 MSE

ENOB ENOB 1 4 3 103 0.5 1 3 3 1 2 3 0 104 1 2 3 4 5 1 2 3 4 5 6 7 RRAM precision (bits) RRAM precision (bits) 3.5 100 Ni  3 Ni  3 5 7 1 3.0 5 6 1 101 ) 2.5 4-bit 5 5 1 bits ( 2.0 102 5-bit MSE 1.5

ENOB ENOB 1 4 4 103 1.0 1 3 4 1 2 4 0.5 104 1 2 3 4 5 1 2 3 4 5 6 7 RRAM precision (bits) RRAM precision (bits) (a) (b)

Sub-ADC Evaluation Residue Evaluation 1.5

2-bit 3-bit ) bits ( 1.0 MSE ENOB ENOB

0.5 1 2 3 4 5 1 2 3 4 5 6 7 RRAM precision (bits) RRAM precision (bits) 2.5 3-bit Sub-ADC Evaluation Residue Evaluation 2.0 1.5 ) bits

( 1.5 2-bit 3-bit

4-bit ) MSE

1.0 bits ( ENOB ENOB 1.0

0.5 MSE ENOB ENOB 0 1 2 3 4 5 1 2 3 4 5 6 7 RRAM precision (bits) RRAM precision (bits) 3.5 0.5 1 2 3 4 5 1 2 3 4 5 6 7 3.0 RRAM precision (bits) RRAM precision (bits) 2.5 ) 2.5 4-bit 3-bit bits

( 2.0 2.0 ) 5-bit MSE bits

( 1.5 1.5 ENOB ENOB 4-bit 1.0 1.0 MSE ENOB ENOB 0.5 1 2 3 4 5 1 2 3 4 5 6 7 0.5 RRAM precision (bits) RRAM precision (bits) (a) (b) 0 1 2 3 4 5 1 2 3 4 5 6 7 RRAM precision (bits) RRAM precision (bits) 3.5 3.0

IV. EXPERIMENTAL RESULTS

A. Experimental Methodology

1) Training configuration: We set N_i = 1, 2, 3 to obtain three distinct resolution configurations for each pipeline stage in our experiments. For each stage, we train different NN models; each NN model is trained via stochastic gradient descent with the Adam optimizer using TensorFlow [15]. The weight precision A_R during training is set to 1∼7-bit. The batch size is 4096, and the projection step is performed every 256 iterations. We train for a total of 2×10^4 iterations for each sub-ADC model and residue model, varying the learning rate from 10^−3 to 10^−4 across the iterations.

2) Technology model: We use an HfOx-based RRAM device model to simulate the crossbar array [16]. We set the resistance stochastic variation to σ = 0.05, a moderate variation based on the evaluations in prior work [17]. The transistor model is based on a standard 130nm CMOS technology. The inverters, output comparators, and transistor switches in the RRAM crossbars are simulated with the 130nm model using Cadence Spectre. The VTC group A_VTC is obtained by running 100 Monte Carlo simulations. The simulation results presented in the following section are all based on SPICE simulation.

3) Metric of training accuracy: The trained accuracy of the sub-ADC and of the proposed NNADC is represented by the effective number of bits (ENOB), a metric that evaluates the effective resolution of an ADC. We report ENOB based on its standard definition ENOB = (SNDR − 1.76)/6.02, where the signal-to-noise-and-distortion ratio (SNDR) is measured from the sub-ADC's (or the proposed NNADC's) output spectrum. The training accuracy of the residue circuit is represented by the mean-square error (MSE) between the predicted residue function and the ideal residue function. We report the MSE based on 2048 uniform sampling points over the full input range [0, V_DD].

B. Sub-block Evaluations

1) Resolution and robustness: To find a robust design for each stage, we study the relationship between the trained accuracy and the RRAM precision of each sub-block with different NN sizes at a fixed stochastic variation. For these experiments, we first incorporate both CMOS PVT variations and the limited precision of the RRAM devices into training, then instantiate several batches of 100-run Monte Carlo simulations with a resistance variation σ = 0.05 in Eq. (9), and finally compute the median accuracy of each model. We plot the trends in Fig. 5. Generally, (N_i + 1)-bit RRAM precision is enough to train an NN model to accurately approximate an N_i-bit sub-ADC, which confirms the conclusion of previous work [7]. In particular, larger NN models with more hidden neurons can even accurately approximate an N_i-bit sub-ADC with N_i-bit RRAM precision. Similar conclusions can be drawn from the trained performance of the residue circuits. As Fig. 5(b) shows, (N_i + 2)-bit RRAM precision is enough to train an NN model to accurately approximate a residue circuit. Moreover, a larger NN with more hidden-layer neurons can accurately approximate the residue circuit of an N_i-bit stage with (N_i + 1)-bit RRAM precision.

Fig. 5: Sub-block training performance using different NN models and RRAM precision at a fixed stochastic variation σ = 0.05. (a) Trend between ENOB and RRAM precision of the sub-ADC under different NN models, with N_i set to 1, 2, 3 respectively. (b) Trend between MSE and RRAM precision of the residue circuit under different NN models, with N_i set to 1, 2, 3 respectively.

2) Sub-block design trade-off: Each stage-i has a design trade-off among power consumption P_i, sampling rate f_{S,i}, and area A_{s,i}. A complete design space exploration may involve searching over different NN sizes for each sub-block in stage-i, RRAM precision, and stochastic variations. Here, we use the three pairs of sub-blocks highlighted by the solid boxes in Fig. 5 as an example to illustrate the design trade-off, since each of them shows sufficient accuracy and robustness with no more than 4-bit RRAM precision. For these experiments, we combine each pair of sub-blocks to form three distinct sub-blocks with resolution N_i = 1, 2, 3, respectively. We then fix the precision of the RRAM devices at 3-bit for all building blocks except the residue of the N_i = 3 stage, which uses 4-bit RRAM devices. We finally study the relationship between the power E_j, speed f_j, and area A_j of each distinct stage-j (j = 1, 2, 3) by simulating the minimum power consumption/area of each distinct stage that works well at different sampling rates. The trends are plotted in Fig. 6, which shows clear trade-offs between speed and power consumption, as well as between speed and area, for each distinct stage. This is because, to make each sub-block work well at a faster speed, we need to increase the driving strength of the neurons by sizing up the inverters, which increases the power consumption and area of each stage.
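For reference, the ENOB metric used throughout these evaluations can be computed as in the sketch below (our own illustration: the SNDR here is estimated from the FFT of an ideally quantized sine record rather than from the SPICE output spectrum used in the paper):

```python
import numpy as np

def enob_from_sine(codes):
    """Estimate SNDR from a quantized sine record and convert to ENOB
    via the standard definition ENOB = (SNDR - 1.76) / 6.02."""
    n = len(codes)
    x = (codes - codes.mean()) * np.hanning(n)     # remove DC, window
    spec = np.abs(np.fft.rfft(x))**2
    k = np.argmax(spec[1:]) + 1                    # fundamental bin
    sig = spec[max(k - 2, 1):k + 3].sum()          # fundamental + window leakage
    noise = spec[1:].sum() - sig                   # noise + distortion power
    sndr = 10 * np.log10(sig / noise)
    return (sndr - 1.76) / 6.02

n, cycles = 4096, 101                              # coherent sampling
t = np.arange(n)
sine = 0.5 + 0.5 * np.sin(2 * np.pi * cycles * t / n)
codes = np.round(sine * (2**14 - 1)) / (2**14 - 1) # ideal 14-bit quantizer
print(enob_from_sine(codes))                       # close to 14 bits
```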

3) Design optimization: Based on the exploration of the different sub-block configurations, an optimal design for the proposed ADC with a given resolution can be derived by solving the following optimization problem:

min FoM_W = P / (2^ENOB · f_S),   min A_ADC
s.t.  ENOB ≤ Σ_{i=1}^{M} N_i,      N_i ∈ {1, 2, 3},
      P = Σ_{i=1}^{M} P_i,         P_i ∈ {E_1, E_2, E_3},
      f_S = min_{1≤i≤M} {f_{S,i}}, f_{S,i} ∈ {f_1, f_2, f_3},
      A_ADC = Σ_{i=1}^{M} A_{s,i}, A_{s,i} ∈ {A_1, A_2, A_3}.   (10)

Here, the first objective FoM_W (fJ/conv) is a standard figure of merit that describes the energy consumed by one conversion of an ADC, and the second objective A_ADC is the area of the proposed ADC. We set FoM_W as the main objective, since energy efficiency is usually the most important consideration for most applications. In this way, as shown in Fig. 7, we can obtain an optimal design for a maximum 14-bit pipelined NNADC with 12.5 bits of ENOB and an 11.6fJ/conv FoM_W working at 1GS/s. This showcases the advantage of our proposed co-design framework, which incorporates many circuit-level non-idealities in the training process, allowing us to realize a robust design cascading up to eleven stages, a level often unattainable with traditional pipelined ADCs.

Fig. 6: Design trade-offs of the three distinct stages, with resolution N_i = 1, 2, 3 respectively. (a) Power vs. speed. (b) Area vs. speed.

Fig. 7: (a) Reconstruction of a 14-bit pipelined NNADC with 3-bit RRAM, whose pipelined chain consists of eleven stages: nine 1-bit stages, one 2-bit stage, and one 3-bit sub-ADC. (b) SNDR trend.
C. Full Pipelined NNADC Evaluation

We choose the three distinct stages of Section IV-B to evaluate the quantization ability of the proposed full pipelined NNADC. We find that although the co-design framework can help us train a low-resolution stage to approximate the ideal quantization function and residue function with high fidelity, the minor discrepancy between the trained stage and the ideal stage will propagate and aggregate along the pipeline and finally result in a wrong quantization. Our simulations based on various combinations of different pipeline stages show that a maximum 14-bit pipelined NNADC working at 1GS/s can be achieved by cascading nine 1-bit stages, one 2-bit stage, and one 3-bit sub-ADC with 3-bit RRAM precision. Note that the last stage of the 14-bit pipelined NNADC does not need to generate a residue. The reconstructed signal of this 14-bit ADC is shown in Fig. 7(a), where the ENOB is 12.5 bits under a 1GHz sampling frequency. We also report the SNDR trend with input signal frequency in Fig. 7(b). The SNDR begins to degrade beyond a 0.5GHz input, verifying that the sampling frequency (twice the input signal frequency) of the proposed 14-bit NNADC is well above 1GHz.

Finally, we train a nonlinear ADC based on the same methodology using a logarithmic encoding of the input signal, replacing V_in in Eq. (3) with V_{in,log} = V_DD · log2(a + 1) (a ∈ [0, 1]) to train a 1-bit stage. We find that a 10-bit logarithmic ADC with 9.1-bit ENOB working at 1GS/s can be achieved by cascading ten such 1-bit stages; the reconstructed signal is illustrated in Fig. 8.

Fig. 8: A 10-bit logarithmic NNADC with ten 1-bit stages.

D. Performance Comparisons

1) Comparison with existing NNADCs: We first design an optimal 8-bit NNADC by cascading eight 1-bit stages from Section IV-B and compare it with previous NNADCs [6], [7]. The comparative data are summarized in the left columns of Table I. Compared with them, the proposed 8-bit NNADC achieves the same resolution with higher energy efficiency using ultra-low-precision 3-bit RRAM devices. Both NNADC1 and NNADC2 adopt a typical NN (Hopfield or MLP) architecture to directly train an 8-bit ADC without optimizing the architecture; therefore, they need high-precision RRAM to achieve the targeted ADC resolution. NNADC1 uses a large (1 × 48 × 16) three-layer MLP as the circuit model, where parasitic aggregation on the large crossbar array degrades the conversion speed. In addition, NNADC1 uses more hidden neurons, which consume more energy. Since each stage in the proposed 8-bit NNADC resolves only 1 bit and has a very small size, it achieves faster conversion speed with higher energy efficiency, and high resolution with low-precision RRAM devices. Please note that the FoM_W reported for NNADC2 is based on sampling a low-frequency (44KHz) signal at a high frequency (1.66GHz); it therefore falls outside the scope of a Nyquist ADC and cannot be compared directly with our work on the same FoM_W basis.

2) Comparison with traditional nonlinear ADCs: We then compare the trained 10-bit logarithmic ADC with state-of-the-art traditional nonlinear ADCs [3], [11]. The comparative data are summarized in the middle columns of Table I. As the table shows, the proposed 10-bit logarithmic ADC has competitive advantages in area, sampling rate, and energy efficiency. JSSC 09' [11] uses a pipelined architecture to implement an 8-bit logarithmic ADC; due to device mismatch, its ENOB degrades somewhat from the targeted resolution. ISSCC 18' [3] requires a >10-bit capacitive DAC to achieve a configurable 10-bit nonlinear quantization resolution; it therefore achieves high ENOB but only works at ∼KS/s with significant area overhead. Since we adopt the proposed training framework to directly train on a log-encoded signal using small NN models while incorporating device non-idealities, we achieve a logarithmic ADC with small area, high sampling rate, and high ENOB.

3) Comparison with a traditional uniform ADC: Finally, we compare the trained 14-bit uniform ADC with a state-of-the-art traditional uniform ADC. The comparative data are summarized in the right columns of Table I. They show that the proposed 14-bit NNADC has competitive advantages in sampling rate, ENOB, and energy efficiency. JSSC 15' [12] uses power-hungry op-amps and dedicated calibration techniques, resulting in power consumption overhead and degraded conversion speed. The proposed 14-bit NNADC uses low-resolution stages with a very small NN size, enabling faster conversion speed with higher energy efficiency. The slight ENOB degradation of the proposed ADC is caused by the propagation along the pipeline stages of the discrepancy between the trained stages and the ideal stages. Also note that the performance of the proposed NNADCs and of previous NNADCs is based on simulations, while the performance of the traditional nonlinear ADCs and the uniform ADC is based on measurements.

TABLE I: Performance comparison with different types of ADCs.

ADC types            |              NNADC                       |            Nonlinear ADC                  |       Uniform ADC
Work                 | NNADC1 [7]* | NNADC2 [6]*  | This work*  | JSSC 09' [11]** | ISSCC 18' [3]** | This work* | JSSC 15' [12]** | This work*
Technology (nm)      | 130         | 180          | 130         | 180             | 90              | 130        | 65              | 130
Supply (V)           | 1.2         | 1.2          | 1.5         | 1.62            | 1.2             | 1.5        | 1.2             | 1.5
Area (mm^2)          | 0.2         | 0.005/0.01   | 0.02        | 0.56            | 1.54            | 0.03       | 0.594           | 0.1
Power (mW)           | 30          | 0.1/0.65     | 25          | 2.54            | 0.0063          | 31.3       | 49.7            | 67.5
fS (S/s)             | 0.3G        | 1.66G/0.74G  | 1G          | 22M             | 33K             | 1G         | 0.25G           | 1G
Resolution (bits)    | 8           | 4/8          | 8           | 8               | 10              | 10         | 12              | 14
ENOB (bits)          | 7.96        | 3.7/(N/A)    | 8           | 5.68            | 9.5             | 9.1        | 10.6            | 12.5
FoM_W (fJ/c)         | 401         | 8.25/7.5     | 97.7        | 2380            | 263             | 57         | 108.5           | 11.6
RRAM precision       | 9           | 6/12         | 3           | N/A             | N/A             | 3          | N/A             | 3
Reconfigurable?      | Yes         | Yes/Yes      | Yes         | No              | Yes             | Yes        | No              | Yes

* Results based on simulation. ** Results measured on chip.

V. CONCLUSION

In this paper, we present a co-design methodology that combines a pipelined hardware architecture with a custom NN training framework to achieve a high-resolution NN-inspired ADC with low-precision RRAM devices. A systematic design exploration is performed to search the design space of the sub-ADCs and residue blocks to achieve a balanced trade-off between the speed, area, and power consumption of each distinct low-resolution stage. Using SPICE simulation, we evaluate our design based on various ADC metrics and perform a comprehensive comparison of our work with different types of state-of-the-art ADCs. The comparison results demonstrate the compelling advantages of the proposed NN-inspired ADC with pipelined architecture in high energy efficiency, high ENOB, and fast conversion speed. This work opens a new avenue toward future intelligent analog-to-information interfaces for near-sensor analytics using an NN-inspired design methodology.

ACKNOWLEDGEMENT

This work was partially supported by the National Science Foundation (CNS-1657562).

REFERENCES

[1] R. LiKamWa et al., "RedEye: Analog ConvNet Image Sensor Architecture for Continuous Mobile Vision," IEEE ISCA, 2016, pp. 255-266.
[2] B. Li et al., "RRAM-Based Analog Approximate Computing," IEEE TCAD, vol. 34, no. 12, pp. 1905-1917, 2015.
[3] J. Pena-Ramos et al., "A Fully Configurable Non-Linear Mixed-Signal Interface for Multi-Sensor Analytics," IEEE JSSC, vol. 53, no. 11, pp. 3140-3149, Nov. 2018.
[4] M. Buckler et al., "Reconfiguring the Imaging Pipeline for Computer Vision," IEEE ICCV, 2017, pp. 975-984.
[5] L. Gao et al., "Digital-to-analog and analog-to-digital conversion with metal oxide memristors for ultra-low power computing," IEEE/ACM NanoArch, 2013, pp. 19-22.
[6] L. Danial et al., "Breaking Through the Speed-Power-Accuracy Tradeoff in ADCs Using a Memristive Neuromorphic Architecture," IEEE TETCI, vol. 2, no. 5, pp. 396-409, Oct. 2018.
[7] W. Cao et al., "NeuADC: Neural Network-Inspired RRAM-Based Synthesizable Analog-to-Digital Conversion with Reconfigurable Quantization Support," DATE, 2019, pp. 1456-1461.
[8] T. F. Wu et al., "14.3 A 43pJ/Cycle Non-Volatile Microcontroller with 4.7µs Shutdown/Wake-up Integrating 2.3-bit/Cell Resistive RAM and Resilience Techniques," IEEE ISSCC, 2019, pp. 226-228.
[9] Y. Cai et al., "Training low bitwidth convolutional neural network on RRAM," ASP-DAC, 2018, pp. 117-122.
[10] H.-S. P. Wong et al., "Metal-Oxide RRAM," Proceedings of the IEEE, vol. 100, no. 6, pp. 1951-1970, June 2012.
[11] J. Lee et al., "A 2.5mW 80dB DR 36dB SNDR 22MS/s Logarithmic Pipeline ADC," IEEE JSSC, vol. 44, no. 10, pp. 2755-2765, Oct. 2009.
[12] H. H. Boo et al., "A 12b 250MS/s Pipelined ADC With Virtual Ground Reference Buffers," IEEE JSSC, vol. 50, no. 12, pp. 2912-2921, 2015.
[13] K. Hornik, "Approximation capabilities of multilayer feedforward networks," Neural Networks, vol. 4, no. 2, pp. 251-257, 1991.
[14] W. Cao et al., "NeuADC: Neural Network-Inspired Synthesizable Analog-to-Digital Conversion," IEEE TCAD, 2019, Early Access.
[15] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[16] P. Chen and S. Yu, "Compact Modeling of RRAM Devices and Its Applications in 1T1R and 1S1R Array Design," IEEE TED, vol. 62, no. 12, pp. 4022-4028, Dec. 2015.
[17] B. Li et al., "MErging the Interface: Power, area and accuracy co-optimization for RRAM crossbar-based mixed-signal computing system," ACM/EDAC/IEEE DAC, 2015, pp. 1-6.
[18] W. Cao et al., "A 40Gb/s 39mW 3-tap adaptive closed-loop decision feedback equalizer in 65nm CMOS," IEEE MWSCAS, 2015, pp. 1-4.