eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Converters and Charge Domain Computing

Shanshan Xie1, Can Ni1, Aseem Sayal1, Pulkit Jain2, Fatih Hamzaoglu2, Jaydeep P. Kulkarni1

1University of Texas at Austin, Austin, TX 2Intel, Hillsboro, OR

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 1 of 60 Self Introduction

Education • B.S. 2018 Worcester Polytechnic Institute • Ph.D. student at University of Texas at Austin since 2018 • Circuit Research Lab: https://sites.utexas.edu/CRL/ • Advisor: Professor Jaydeep P. Kulkarni Experience • Analog Design Intern, Texas Instruments, 2019 • Application Engineering Intern, Analog Devices Inc., 2018 Shanshan Xie • Test Engineering Intern, Analog Devices Inc., 2017 • Product Engineering Intern, Analog Devices Inc., 2016 Email: [email protected] Research Area Website: http://cyanxie.com/ • Mixed signal approaches for ML accelerators • Compute-in-memory techniques

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 2 of 60 Outline n Compute-in-memory: motivation and challenges

n Proposed eDRAM-CIM macro

l Overall architecture

l Digital-to-analog conversion (DAC)

l Multiply-accumulate-averaging (MAV) computation

l Analog-to-digital conversion (ADC) n Chip measurement and classification accuracy results

n Summary

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 3 of 60 Outline n Compute-in-memory: motivation and challenges

n Proposed eDRAM-CIM macro

l Overall architecture

l Digital-to-analog conversion (DAC)

l Multiply-accumulate-averaging (MAV) computation

l Analog-to-digital conversion (ADC) n Chip measurement and classification accuracy results

n Summary

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 4 of 60 Deep Neural Network Applications

AI High AI Cloud Edge Computing Performance Computing Face Detection Object Detection Data Center

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 5 of 60 DNN Model Size Challenges

Total Number of MACs Energy Efficiency Challenges

341k LeNet-5 1 Data Intensive 724M AlexNet 2 Huge Data Movement 2.8G OverFeat 3 Data Transfer Latency 3.9G ResNet50

1.43G GoogleNet v1 4 Computation Energy Cost

15.5G VGG16 5 Memory Access Energy Cost

Ref: Vivienne Sze, Proceedings of the IEEE 105.12 2017

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 6 of 60 eDRAM-CIM Motivation

Operation Energy(pJ) Computation Energy Cost Integer Add (32b) 0.1 Integer Multiply (32b) 3.1 Floating Point Add (32b) 0.9 Floating Point Multiply (32b) 3.7 Memory Access Energy Cost 8KB SRAM (64b) 10 ~650X 1MB SRAM (64b) 100 DRAM 2000

DRAM memory access is expensive: Ref: Mark Horowitz, ~650X higher than computation energy cost ISSCC 2014

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 7 of 60 eDRAM Advantages for CIM

ü Highest bitcell density ü High access speed ü Low read/write energy Desired attributes for CIM designs ü High write endurance ü High bandwidth

Ref: Gregory Fredeman, JSSC 2015; Fatih Hamzaoglu, JSSC 2014; Nasser Kurd, JSSC 2015 © 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 8 of 60 Our Approach: eDRAM based CIM

MAV Compute Array

• Directly perform computation inside embedded DRAM (eDRAM) • Reconfigure existing 1T1C eDRAM columns as charge domain computing units • Reduce off-chip memory access and latency

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 9 of 60 Outline n Compute-in-memory: motivation and challenges

n Proposed eDRAM-CIM macro

l Overall architecture

l Digital-to-analog conversion (DAC)

l Multiply-accumulate-averaging (MAV) computation

l Analog-to-digital conversion (ADC) n Chip measurement and classification accuracy results

n Summary

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 10 of 60 eDRAM CIM Overall Architecture: DAC

Data Flow *DAC: Digital-to-Analog Converter

1. Load Data to eDRAM Load Input to DAC eDRAM array Bitcell

Va & Va_bar

2. Generate Analog Values Va & Va_bar

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 11 of 60 eDRAM CIM Overall Architecture: MAV *DAC: Digital-to-Analog Converter *MAV: Multiply-Accumulation-Average

3. Load Weights to MAV Compute Array

VMAV

4. Generate VMAV © 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 12 of 60 eDRAM CIM Overall Architecture: Partial Sum *DAC: Digital-to-Analog Converter *MAV: Multiply-Accumulation-Average

Store Partial Sum

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 13 of 60 eDRAM CIM Overall Architecture: Partial Sum *DAC: Digital-to-Analog Converter *MAV: Multiply-Accumulation-Average

Pooling & ReLU

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 14 of 60 eDRAM CIM Overall Architecture: ADC *DAC: Digital-to-Analog Converter *SAR: Successive-Approximation *MAV: Multiply-Accumulation-Average *ADC: Analog-to-Digital Converter

5. Generate VREF

6. Final YOUT

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 15 of 60 eDRAM CIM Overall Architecture *DAC: Digital-to-Analog Converter *SAR: Successive-Approximation *MAV: Multiply-Accumulation-Average *ADC: Analog-to-Digital Converter

3. Load Weights to MAV Compute Array 5. Generate VREF 1. Load Data to eDRAM Bitcell

2. Generate Analog Values Va & Va_bar 4. Generate VMAV 6. Final YOUT

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 16 of 60 In-eDRAM Differential Digital-to-Analog Converter (DAC)

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 17 of 60 In-eDRAM DAC – Positive DAC eDRAM Bitcell Storage Constant “1”

X3

X2 Input Data (4b)

X1

X0

Constant “0”

32 charge share result 16×1 + 8×� + 4×� + 2×� + 1×� � = � ∗ 32

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 18 of 60 In-eDRAM DAC – Negative DAC eDRAM Bitcell Storage Constant “0”

X3

X2 Complimentary to positive DAC X1

X0

Constant “1”

32 capacitors charge share result

8×� + 4×� + 2×� + 1×� + 1 � = � ∗ _ 32

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 19 of 60 Differential Analog Voltages, Va and Va_bar

VDD

Va Positive DAC

Zero Reference=VDD/2

Va_bar Negative DAC

VSS

• Prepare differential analog voltages (Va and Va_bar) for sign computation

• Positive DAC: Va; Negative DAC: Va_bar • Depends on sign of the weight, choose either Va or Va_bar

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 20 of 60 Positive DAC Demonstration

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 21 of 60 8b In-eDRAM DAC Dataflow – Step 1 Step 1: Load lower 4- to DAC Constant ‘1’ WL<5> 0 WL<4> Turn on twice to make sure 16 capacitors are fully charged WL<3> 0 Input WL<2> Data 0 WL<1> WL<0> Load data on to memory array 1 WLx16 using eDRAM peripheral circuits Constant ‘0’ WLx1 (not shown in the diagram) DAC_CS

Vx16, Vx1 Floating Va/Va_bar Va Step 1

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 22 of 60 8b In-eDRAM DAC Dataflow – Step 2 Step 2: Charge Share Data Flow

1 WL<5> 0 WL<4> Activate all wordlines for WL<3> charge share to convert 0 WL<2> to analog 0 WL<1> voltage WL<0> 1 WLx16 0 WLx1 DAC_CS

Vx16 Voltage is sampled on Vx16, Vx1 Cx1

Va/Va_bar Va Step 1 Step 2 Vx1 © 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 23 of 60 8b In-eDRAM DAC Dataflow – Step 3 Step 3: Load higher 4-bit to DAC Constant ‘1’ WL<5> 1 WL<4> WL<3> 1 Input WL<2> Data 0 WL<1> WL<0> 1 WLx16 Constant ‘0’ WLx1 DAC_CS

Vx16, Vx1

Va/Va_bar Va Step 1 Step 2 Step 3

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 24 of 60 8b In-eDRAM DAC Dataflow – Step 4 Step 4: Charge Share Data Flow

1 WL<5> 1 WL<4> WL<3> 1 WL<2> 0 WL<1> WL<0> 1 WLx16 0 WLx1 DAC_CS Voltage is sampled on Cx16 capacitor Vx16 Vx16, Vx1

Va/Va_bar Va Step 1 Step 2 Step 3 Step 4 Vx1 © 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 25 of 60 8b In-eDRAM DAC Dataflow – Step 5 Data Flow Step 5: Generate Va and Va_bar

WL<5> WL<4> WL<3> WL<2> WL<1> WL<0> WLx16 WLx1 DAC_CS Charge share to generate final analog voltage

Vx16 Vx16, Vx1

Va generated Va Step 1 Step 2 Step 3 Step 4 Step 5 Vx1 © 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 26 of 60 1T1C In-eDRAM Differential DAC Demonstration

V a VDD/2 = 600mV

100ns VCx1 Va_bar 0.5V VCx16 Load 4b LSB CS Load 4b MSB CS Generate *CS=Charge Share DAC_CS Va & Va_bar WLx16 WLx1 WL<5> WL<0> 2 cycles: Make sure 16 capacitors are fully charged • Signals match the expected waveforms

• Va and Va_bar are symmetrical around VDD/2 (600mV) • Higher 4 are loaded last: minimize capacitor leakage on final charge share value

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 27 of 60 In-eDRAM DAC Characterization: Measured Results

Positive DAC Negative DAC

• Systematic deviation: positive DAC saturate at high value (can be correct during training)

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 28 of 60 In-eDRAM DAC Characterization: Zoomed In Positive DAC Negative DAC

• Good linearity, DNL<1LSB

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 29 of 60 In-eDRAM Multiply-Accumulate- Averaging (MAV) Computation

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 30 of 60 eDRAM-CIM: MAV Computation

Weights are stored in 1T1C eDRAM bitcell: W7 (sign bit); W6~W0 (magnitude bits)

Analog Voltage from DAC

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 31 of 60 eDRAM-CIM: MAV Computation Data Flow

W7 (sign bit) is used to select

between Va and

Va_bar Voltage from DAC Zero reference

Weight magnitude

(W0~W6) are used as MUX control signals

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 32 of 60 eDRAM-CIM: MAV Computation Data Flow

2:1 MUX: analog dot-product unit

Reuse 1T1C eDRAM bitcell

Voltages are sampled on binary scaled 1T1C eDRAM capacitors

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 33 of 60 eDRAM-CIM: MAV Computation Data Flow

(S0~S6) are closed to charge share between all the 1T1C eDRAM cells

• VMAV = final MAV computation result

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 34 of 60 ReLU

V > V /2 VMAV Zero Reference MAV DD V /2 DD Comparator Final MAV Analog Results VMAV < VDD/2 VDD/2

• Comparator compares VMAV and zero reference voltage VDD/2

• If VMAV > VDD/2 à output VMAV

• If VMAV < VDD/2 à output VDD/2

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 35 of 60 In-eDRAM Analog-to-Digital Converter (ADC)

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 36 of 60 eDRAM-CIM: ADC Operation

st VREF Generation Waveforms for 1 (b=1) conversion cycle

VREF2 VMAV

VREF1 VREF1 V REF2 VREF3 VREF1 VREF2 VREF3 VREF3 YOUT[7] SA2_OUT

SA3_OUT YOUT[6]

• Step 1: Charge corresponding bitcells

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 37 of 60 eDRAM-CIM: ADC Operation

st VREF Generation Waveforms for 1 (b=1) conversion cycle

VREF2 VMAV

VREF1 VREF1 V REF2 VREF3 VREF1 VREF2 VREF3 VREF3 YOUT[7] SA2_OUT

SA3_OUT YOUT[6]

• Step 1: Charge corresponding bitcells • Step 2: IDLE

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 38 of 60 eDRAM-CIM: ADC Operation

st VREF Generation Waveforms for 1 (b=1) conversion cycle

VREF2 VMAV ¾VDD ½VDD VREF1 VREF1 VREF2 ¼VDD VREF3 VREF1 VREF2 VREF3 VREF3 VMAV > VREF1 ½VDD ¾VDD ¼VDD YOUT[7] SA2_OUT VMAV < VREF2

SA3_OUT YOUT[6]

• Step 1: Charge corresponding bitcells • Step 2: IDLE • Step 3: Charge Share to generate reference voltages

YOUT[7] YOUT[6] © 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 39 of 60 eDRAM-CIM: ADC Operation

st VREF Generation Waveforms for 1 (b=1) conversion cycle

VREF2 VMAV

VREF1 VREF1 VREF2 VREF3 VREF1 VREF2 VREF3 VREF3 YOUT[7] SA2_OUT

SA3_OUT YOUT[6]

• Step 1: Charge corresponding bitcells • Step 2: IDLE • Step 3: Charge Share to generate reference voltages • Step 4: Discharge and reset 1T1C eDRAM capacitors

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 40 of 60 2b/Conversion In-eDRAM ADC Waveforms

VMAV VREF1 VREF2 VREF3

VMAV

• 2b/conversion cycle

10 10 00 11 • 2 extra eDRAM columns

• 3 reference voltages (VREF1~VREF3)

V Step 4 REF2 • Minimize VMAV capacitor leakage

VREF1 during conversion VREF3 Step 1 Step 2 Step 3

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 41 of 60 eDRAM-CIM: Adaptive ADC Motivation

Convolution Outputs (CiFAR-10 dataset Layer 1) X = Clipping Threshold µ=-0.07 Outputs Outputs σ=25.7 are are X X clipped clipped

VLB VHB VLB VHB Occurrence Frequency Occurrence Frequency

Output Data YOUT Output Data YOUT

Infrequent appearing MAV computation outputs (VMAV) are trimmed to mitigate ADC energy and latency

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 42 of 60 Clipping In-eDRAM ADC Data Flow

VMAV VREF2 = VHB

VREF2 LB HB VREF3 = V = V ADC_DONE REF3 REF2 V V SA2_OUT

SA3_OUT

• Step 1: Charge corresponding bitcells • Step 2: IDLE

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 43 of 60 Clipping In-eDRAM ADC Data Flow

VMAV VREF2 = VHB VMAV>VHB

VREF2 LB HB

DD VREF3 = VLB VREF3 DD = V = V ADC_DONE = ¾ V REF3 REF2 V = ¼ V V SA2_OUT

SA3_OUT

• Step 1: Charge corresponding bitcells • Step 2: IDLE • Step 3: Charge share to generate reference voltages

• Step 4: Compare à VMAV>VHB? or VMAV

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 44 of 60 Clipping ADC Simulation Waveforms

VMAV VREF2 VREF3 ADC_DONE

VMAV>VHB

Step 1 Step 2 Step 3 Step 4 Step 5

• Skip subsequent ADC conversion cycles if VMAV is higher than VHB or lower than VLB

• If VMAV is in conversion range, in-eDRAM ADC will perform regular SAR conversion

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 45 of 60 Clipping Threshold vs Accuracy and Energy

60%

Clipping threshold

DNN accuracy

ADC Energy

Optimum clipping threshold = 60%

*ADC with clipping energy/without clipping energy

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 46 of 60 Clipping vs Non-Clipping ADC Tradeoff

1 5 1.21% - - 3.79% Accuracy (%) Accuracy (%) Measured Top Measured Top Measured

1.14X

1.14X (GOPS) Throughput ADC Energy (pJ)

• Throughput is improved by 1.14X while incurring 3.79% (1.21%) drop in Top-1 (Top-5) in CIFAR-10

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 47 of 60 In-eDRAM ADC Measurement Results

(VHB, 191)

(VLB, 64)

Simulated Simulated DNL (LSB) DNL DNL (LSB) DNL

Code Code • Linear for both clipping and non-clipping in-eDRAM ADC

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 48 of 60 Outline n Compute-in-memory: motivation and challenges

n Proposed eDRAM-CIM macro

l Overall architecture

l Digital-to-analog conversion (DAC)

l Multiply-accumulate-averaging (MAV) computation

l Analog-to-digital conversion (ADC) n Chip measurement and classification accuracy results

n Summary

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 49 of 60 Chip Micrograph

• 65nm native CMOS process

• Supply Voltage: 1~1.2V Input image eDRAM • Input Clock Frequency: scanchain1 Weight array

50~200MHz DAC scanchain2 MAV computation array

2 0.669mm • Bitcell Size: 22.08um ADC test MAV Main ADC test • Macro Area: 0.57mm2 structure structure

• eDRAM Capacitor Type:

Metal-Oxide-Metal (MOM) 1.759mm

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 50 of 60 Test-chip Area Breakdown

Input image array Unit: mm2 DAC array Input image eDRAM scanchain1 Weight array

Weight DAC scanchain2 array MAV computation array 0.669mm ADC test MAV MAV Main ADC test structure structure computation array Non-Macro: ADC array scan-chains & test structures eDRAM-CIM 1.759mm Macro Area

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 51 of 60 CIFAR-10 Classification Details

64 Filters 64 Filters Input Image 32 Filters 32 Filters

Avg. Avg. c 2x2 3x3 Pooling Pooling 3x3 2x2 … 3x3 …

Input Xi CONV1 CONV2 CONV3 CONV3 (32x32x3) (32,32,32) (32,32,32) (16,16,64) (16,16,64) FC1 FC2 4096x512 512x10 • 4 convolution layers, 2 average pooling layers, and 2 fully connected layers • Input precision: 8 bits • Weight precision: signed 8 bits • Output precision: 8 bits

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 52 of 60 Classification Accuracy on CIFAR-10 Dataset

Top-1 Accuracy Top-5 Accuracy 1T1C eDRAM-CIM 1T1C eDRAM-CIM 100% 100% 83.27% 82.96% 82.60% 98.43% 98.40% 98.40% 98.10% 96.89% 80% 80.10% 76.31% 80%

60% 2.86% 60% 0.3%

40% 40% 1 Accuracy 1 Accuracy 5 - 6.29% - 1.51% 20% 20% Top Top 0% 0% Software Without With 60% Without With 60% Software Without With 60% Without With 60% 8-bit INT Clip Clip Clip Clip 8-bit INT Clip Clip Clip Clip

Simulated over Measured over Simulated over Measured over 10,000 images 5,000 images 10,000 images 5,000 images • Measured Top-5 accuracy is dropped 0.3% with non-clipping ADC and dropped 1.51% with clipping ADC • Comparing with 8b software accuracy, eDRAM-CIM preserved good accuracy results

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 53 of 60 Comparison with Prior Works This work ISSCC’20 [2] ISSCC’20 [3] ISSCC’20 [4] ISSCC'18 [5] Technology 65nm 28nm 28nm 22nm 65nm 6T + Local 1T1R SLC 1T1C eDRAM 6T SRAM Computing 6T SRAM Structure ReRAM SRAM Array Size 16Kb 64Kb 64Kb 2Mb 128Kb Input Precision (bit) 8 8 8 4 8 Weight Precision (bit) 8 8 8 4 8 Supply Voltage (V) 1~1.2 0.85~1.0 0.7~0.9 0.8 1 Dataset CIFAR-10 CIFAR-10 MNIST CNN: 4 CONV + CNN: ResNet- CNN: Model N/A SVM 2 Pooling + 2 FC 20 ResNet-20 80.1% (Top-1), Measured Accuracy 591.91% 592.02% N/A 583.27% 98.1 % (Top-5) Throughput (GOPS) 1,34.71 N/A N/A N/A 4 Average Energy 7.3 14.08 28.93 14.76 3.125 Efficiency (TOPS/W) 2(1.35) 2(2.61) 2(3.31) GOPS/mm2 8.26 N/A N/A N/A 2.78 4FoM 304.6 86.4 167 53 201.6 1measured at 1.1V 2Scaled to 65nm, assume energy ∝ (Tech.)2 [7] 3 Limited by clocking infrastructure, chip size, technology and bit cell area 4FoM = input precision x weight precision x energy efficiency (scaled to 65nm) 5Top-1 or Top-5 is not mentioned

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 54 of 60 Comparison with Prior Works This work ISSCC’20 [2] ISSCC’20 [3] ISSCC’20 [4] ISSCC'18 [5] Technology 65nm 28nm 28nm 22nm 65nm 6T + Local Memory Cell 1T1R SLC 1T1C eDRAM 6T SRAM Computing 6T SRAM Structure ReRAM SRAM Array Size 16Kb 64Kb 64Kb 2Mb 128Kb Input Precision (bit) 8 8 8 4 8 Weight Precision (bit) 8 8 8 4 8 Supply Voltage (V) 1~1.2 0.85~1.0 0.7~0.9 0.8 1 Dataset CIFAR-10 CIFAR-10 MNIST CNN: 4 CONV + CNN: ResNet- CNN: Model N/A SVM 2 Pooling + 2 FC 20 ResNet-20 80.1% (Top-1), Measured Accuracy 591.91% 592.02% N/A 583.27% 98.1 % (Top-5) Throughput (GOPS) 1,34.71 N/A N/A N/A 4 Average Energy 7.3 14.08 28.93 14.76 3.125 Efficiency (TOPS/W) 2(1.35) 2(2.61) 2(3.31) GOPS/mm2 8.26 N/A N/A N/A 2.78 4FoM 304.6 86.4 167 53 201.6 1measured at 1.1V 2Scaled to 65nm, assume energy ∝ (Tech.)2 [7] 3 Limited by clocking infrastructure, chip size, technology and bit cell area 4FoM = input precision x weight precision x energy efficiency (scaled to 65nm) 5Top-1 or Top-5 is not mentioned

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 55 of 60 Comparison with Prior Works This work ISSCC’20 [2] ISSCC’20 [3] ISSCC’20 [4] ISSCC'18 [5] Technology 65nm 28nm 28nm 22nm 65nm 6T + Local Memory Cell 1T1R SLC 1T1C eDRAM 6T SRAM Computing 6T SRAM Structure ReRAM SRAM Array Size 16Kb 64Kb 64Kb 2Mb 128Kb Input Precision (bit) 8 8 8 4 8 Weight Precision (bit) 8 8 8 4 8 Supply Voltage (V) 1~1.2 0.85~1.0 0.7~0.9 0.8 1 Dataset CIFAR-10 CIFAR-10 MNIST CNN: 4 CONV + CNN: ResNet- CNN: Model N/A SVM 2 Pooling + 2 FC 20 ResNet-20 80.1% (Top-1), Measured Accuracy 591.91% 592.02% N/A 583.27% 98.1 % (Top-5) Throughput (GOPS) 1,34.71 N/A N/A N/A 4 Average Energy 7.3 14.08 28.93 14.76 3.125 Efficiency (TOPS/W) 2(1.35) 2(2.61) 2(3.31) GOPS/mm2 8.26 N/A N/A N/A 2.78 4FoM 304.6 86.4 167 53 201.6 1measured at 1.1V 2Scaled to 65nm, assume energy ∝ (Tech.)2 [7] 3 Limited by clocking infrastructure, chip size, technology and bit cell area 4FoM = input precision x weight precision x energy efficiency (scaled to 65nm) 5Top-1 or Top-5 is not mentioned

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 56 of 60 Test-chip Summary

Advanced Technology Test-chip Summary Node Estimation Technology 65nm CMOS 22nm tri-gate CMOS Total eDRAM Size 16Kb 238.4MB Bitcell Size (µm2) 22.08 0.029 Unit eDRAM Capacitor Size (fF) 13 13 eDRAM Capacitor Type Metal-oxide-metal (MOM) Capacitor-over-bitline (COB) 50 MHz (measured) Input Clock Frequency 1.6GHz 200MHz (simulation) GOPS/mm2 8.26 1305 MAV Operation Latency (ns) 10 (2 cycles) 1.25 (2 Cycles) w/o ADC Clipping 4.037 GOPS 19.5 TOPS 1Throughput @ 1.1V with 60% ADC Clipping 4.71 GOPS 22.2 TOPS 3Average Energy Efficiency (TOPS/W) @ 1.1V 4.76 11.12 Energy/MAV (fJ/MAV) with 60% ADC Clipping @ 30.6 40.21 1.1V 1Assume 1MAV = 2 operations 3Excluded scan-chains power 2Assume 30% utilization of 128MB (77mm2) [6] eDRAM memory for CIM usage 4Increased parallelism in large-capacity eDRAM array

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 57 of 60 Outline n Compute-in-memory: motivation and challenges

n Proposed eDRAM-CIM macro

l Overall architecture

l Digital-to-analog conversion (DAC)

l Multiply-accumulate-averaging (MAV) computation

l Analog-to-digital conversion (ADC) n Chip measurement and classification accuracy results

n Summary

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 58 of 60 Summary

• 1T1C eDRAM bit cell: good candidate for compute-in-memory design • Repurposing existing 1T1C eDRAM bitcell • Intrinsic charge share operation in eDRAM bitcell • Support 8b input and 8b signed/unsigned weight for MAV operations • Measured Top-5 (Top-1) peak accuracy 98.1% (80.1%) on CIFAR-10 • Measured average energy efficiency 4.76 TOPS/W • Measured throughput 4.71GOPS • FoM metrics can exceed significantly in an advanced eDRAM technology • Potential for scalability to advanced eDRAM technology node

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 59 of 60 Acknowledgements

• Clifford Ong for technical discussions • for funding support

Thank you for your attention!

© 2021 IEEE 16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array International Solid-State Circuits Conference Realizing Adaptive Data Converters and Charge Domain Computing 60 of 60