Nanosystem Design Kit (NDK): Transforming Emerging Technologies into Physical Designs of VLSI Systems

G. Hills, M. Shulaker, C.-S. Lee, H.-S. P. Wong, S. Mitra

Stanford University, Massachusetts Institute of Technology

Abundant-Data Explosion

“Swimming in sensors, drowning in data”

[Figure: data volume in exabytes (billions of GB), 0 to 40K, vs. year, 2006 to 2020; annotations: wide variety & complexity, unstructured data]

• Mine, search, analyze data in near real-time

• Data centers, mobile phones, robots

Abundant-Data Applications

Huge memory wall: processors, accelerators

Energy Measurements

• Genomics classification: 5% compute, 95% memory

• Natural language processing: 18% compute, 82% memory

Intel performance counter monitors; 2 CPUs, 8 cores/CPU + 128 GB DRAM

US National Academy of Sciences (2011)

Computing Today: 2-Dimensional

3-Dimensional Nanosystems

Computation immersed in memory

[Figure labels: increased functionality; memory; computing logic; fine-grained, ultra-dense 3D]

Impossible with today’s technologies

Enabling Technologies

[Figure: layer stack of the 3-D nanosystem]

• 3D Resistive RAM: massive storage

• 1D CNFET, 2D FET: compute, RAM access

• STT MRAM: quick access

• 1D CNFET, 2D FET: compute, RAM access

• 1D CNFET, 2D FET: compute, power, clock

Figure annotations: thermal (on each CNFET/FET layer); no TSV; ultra-dense, fine-grained vias; silicon compatible

Nanosystems: Compact Models Essential

[Figure: the same layer stack (3D Resistive RAM, STT MRAM, 1D CNFET, 2D FET); annotations: nanohub, m-Cell]

Compact Models: Insufficient Alone

Design for Realistic Systems

• Wire parasitics

• Inter-module interface circuits

• Routing congestion

• Application-dependent workloads

• Multiple clock domains

• Cache architecture

• Memory access patterns

• Processor vs. memory

• Sleep mode & sleep transistors

• High performance vs. low power

• SRAM retention

• Dynamic power vs. leakage power

• DRAM refresh

• Placement utilization

• …

Example: OpenSparc T2 Processor Core

[Figure: energy/cycle (nJ) vs. clock frequency (GHz) for FinFET and CNFET, with the preferred corner marked]

Design for Realistic Systems, with variations added to the list above:

• Timing variations

• Noise immunity

• Energy variations

• Functional yield

• …

[Figure: same axes, adding a CNFET + variations curve]

Variation-Aware Nanosystem Design

[Figure: same axes; CNFET + variations with co-optimized processing & design]

Our Nanosystem Design Kit (NDK)

• Available: nanohub.org

Accessing the NDK

• Link

. https://www.nanohub.org/groups/nanosystems

• File

. ndk_v2016-12-13.tar

NDK: Tool Dependencies

• Tools

. Synopsys CAD tools: lc_shell, dc_shell, icc_shell, Milkyway, StarRC

. Cadence CAD tools: spectremdl

. Common Unix utilities: sed, grep, cat, ...

. Matlab

. perl

• External resources (free download)

. Compact model, e.g., virtual source CNFET model

. Process Design Kit (PDK), e.g., NanGate 15 nm library

. Register-Transfer Level (RTL) hardware description, e.g., OpenSparc T2 processor core

Installing the NDK

1. Untar ndk_v2016-12-13.tar

2. Set environment variables

. $SVNROOT, $NDK, $DATAROOT, $XT

3. Download external resources

. Compact model, PDK, RTL

4. Run installation scripts

. bash scripts inside $NDK/install
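Before running the installation scripts, it can help to confirm that the environment variables and command-line tools named on the slides are visible. The following is a minimal sketch of such a pre-flight check, not part of the NDK itself: the function names are hypothetical, `matlab` as the launcher name is an assumption, and Milkyway and StarRC are left out because the slides do not give their executable names.

```python
import os
import shutil

# Environment variables named on the installation slide.
REQUIRED_ENV_VARS = ["SVNROOT", "NDK", "DATAROOT", "XT"]

# Executables named on the tool-dependencies slides (assumed to be on PATH
# under these names; "matlab" as the launcher name is an assumption).
REQUIRED_TOOLS = ["lc_shell", "dc_shell", "icc_shell", "spectremdl",
                  "sed", "grep", "cat", "perl", "matlab"]


def missing_env_vars(names, environ=None):
    """Return the variables in `names` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [n for n in names if not env.get(n)]


def missing_tools(names, which=shutil.which):
    """Return the executables in `names` that cannot be found on PATH."""
    return [n for n in names if which(n) is None]


if __name__ == "__main__":
    for var in missing_env_vars(REQUIRED_ENV_VARS):
        print(f"environment variable not set: ${var}")
    for tool in missing_tools(REQUIRED_TOOLS):
        print(f"tool not found on PATH: {tool}")
```

An empty output means every listed variable is set and every listed tool was found; it does not verify tool versions or licenses.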

NDK User Guide: directory structure within $NDK

Case Study: CNFET Processor Core

[Figure: OpenSparc T2 SoC, core1 through core8 (www.opensparc.net), + CNFET compact model; layers labeled thermal]

Carbon Nanotube FET (CNFET)

[Figure labels: carbon nanotube (CNT), d ~ 1 nm; sub-lithographic CNT pitch; gate oxide]

NDK: High-Level Overview

• Experimental data + compact models [Figure: I_D (mA/μm), 0 to 3, vs. V_DS (V), 0 to 0.4; data and model]

• Variations: metallic and semiconducting CNTs

• Physical layouts: NanGate 15 nm Library + wire parasitics

• Full system: OpenSparc T2 SoC + wire parasitics

• Design targets: delay, noise immunity, energy, yield, …

Variation-Aware NDK: 2 Steps

Step 1) Library characterization

. Parasitic extraction

. SPICE analysis using compact models

Step 2) VLSI circuit EDP optimization

a) Synthesis, place & route, power/timing

b) Rapidly quantify variations


Step 1) Library Characterization

Required inputs: compact model

module vscnfet_1_0_1(D,G,S);

NDK User Guide: open Graphical User Interface (GUI)

1. Open the SystemVariations GUI (run ‘SystemVariations’ from the Matlab terminal).

This is the “StepStruct”: a comma-separated values (.csv) file that you can edit in LibreOffice (or Excel).

NDK User Guide: load configuration “StepStruct”

Browse for the example StepStruct file, which contains technology information such as gate length, contact length, gate oxide thickness, gate oxide dielectric constant, etc.

The example is for carbon nanotube field-effect transistors (CNFETs). Browse for ‘$SVNROOT/cnfet_modeling/SystemVariations/SPICE_deck_gen/cnfet_macro/n07_cnt_pex_top_end_lg09_lc09_lx12_s04_d17_r03_k10/VDD500mV/lvt/pex/BaseParameters.csv’; this loads all the fields into the GUI.

NDK User Guide: run configuration “StepStruct”

a) Click “Run SPICE StepStruct” to run the circuit simulation (using Cadence spectremdl) associated with this StepStruct. The SystemVariations GUI generates a spectre file, runs it, and then loads and plots the output. This particular file shows the current-voltage (I-V) characteristics, the capacitance-voltage (C-V) characteristics, and some other key device-level parameters for the CNFET.

b) Run single step.

c) The simulation takes about 1 minute to run; you can see the spectre output in the Matlab command window.

NDK User Guide: example FET characteristics: I-V, C-V

Here are example I-V and C-V characteristics and CNFET parameters.

[Figure panels: I_D vs. V_DS; I_D vs. V_GS; I_ON & I_EFF vs. V_DD; C_GS vs. V_GS; C_GS vs. V_DD; device parasitics: resistance & capacitance]

NDK User Guide: other compact models

[Screenshot annotations: wrapper model to hook into NDK; compact model instantiation]

Step 1) Library Characterization

Required inputs: compact model + physical layouts

[Figure: compact model (module vscnfet_1_0_1(D,G,S);) and the AOI222_X1 cell layout from the NanGate 15nm Library]

NDK User Guide: view in virtuoso

Standard cell layout for AOI21_X1 in the NanGate 15 nm library

NDK User Guide: .macro.lef file

This is the .macro.lef file. It contains the locations of the wires in each standard cell, as well as the standard cell area and pin locations.

NDK User Guide: .tech.lef file

You can open the input .lef files in a text editor: N7_3X2Y2Z_P42_mint.tech.lef and N7_3X2Y2Z_P42_mint.macro.lef

This is the .tech.lef file. It contains the via pitch, width, spacing, and height, as well as the metal layer parameters (width, pitch, minimum spacing, etc.).

NDK User Guide: .itf file

You can open the input .itf file in a text editor. It contains the resistance and capacitance of the wires and vias on each layer, as well as the inter-layer dielectrics (dielectric constant and spacing).

[Screenshot: example .itf specification]

NDK User Guide: extracted netlist

Example: AOI21_X1 (NanGate 15 nm OCL)

[Screenshot annotations: extracted parasitics (R & C) from standard cell layouts; FET instances (instantiating NDK wrapper)]

Step 1) Library Characterization

Required inputs: compact model + physical layouts + variations

[Figure: compact model (module vscnfet_1_0_1(D,G,S);), AOI222_X1 layout from the NanGate 15nm Library, metallic and semiconducting CNTs]

NDK User Guide: CNT variations parameters

• Parameterize measured CNT spacing variations:

$\sigma_{\text{spacing}}^{2} / \mu_{\text{spacing}}^{2} = 0.5$
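To make the ratio above concrete, the sketch below draws inter-CNT spacings whose variance-to-mean-squared ratio equals 0.5. It is only an illustration: the slides do not say which distribution the NDK uses, so the gamma distribution is an assumption, and the 0.2 µm mean spacing and the function name are hypothetical.

```python
import random
import statistics


def sample_spacings(mean_um, cv_squared, n, seed=0):
    """Draw n inter-CNT spacings (µm) from a gamma distribution.

    For a gamma distribution, variance/mean^2 = 1/shape, so the slide's
    ratio sigma^2/mu^2 fixes the shape; the mean then fixes the scale.
    The gamma family itself is an assumption of this sketch.
    """
    shape = 1.0 / cv_squared
    scale = mean_um / shape
    rng = random.Random(seed)
    return [rng.gammavariate(shape, scale) for _ in range(n)]


# Hypothetical mean spacing of 0.2 µm; 0.5 is the ratio from the slide.
spacings = sample_spacings(mean_um=0.2, cv_squared=0.5, n=100_000)
mu = statistics.fmean(spacings)
ratio = statistics.pvariance(spacings) / mu**2
print(f"mean = {mu:.3f} µm, sigma^2/mu^2 = {ratio:.3f}")
```

With enough samples, the printed ratio should sit close to 0.5, which is a quick sanity check on any spacing model before it is used in Monte Carlo trials.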

[Figure: histogram of inter-CNT spacing s (µm), 0 to 1, vs. probability, 0 to 0.1, for CNTs]

NDK User Guide: variations: other compact models

[Screenshot annotations: switch statement in Matlab based on compact model; function call + processing: compact-model dependent]

(instructions on how to add new functions for new variations based on new compact models)

Step 1) Library Characterization

• Inputs: compact model, physical layouts (AOI222_X1), variations (metallic and semiconducting CNTs)

• Characterize timing/power libraries: automatic SPICE deck generation & analysis

NDK User Guide: generate Power/Timing configuration “StepStruct”

a) Click “Generate NOMINAL Leakage/Cin/Timing StepStructs”. This creates spectre simulations for many different standard library cells to characterize leakage current, input capacitance, timing, and power for the standard cell library.

b) For ‘Choose SPICE StepStruct file for cell type: comb’, select ‘template_LeakageCinTiming_Ioff_retarget.mdl’.

NDK User Guide: select variations parameters

Choose the parameters for ‘Input for cnfet_macro’:

NOMINAL: Number of Monte Carlo trials = 1, IDC = 0, pm = 0, pRs = 0, pRm = 1

VARIATIONS: trials = 100, IDC = 0.5, pm = 0.10, pRs = 1%, pRm = 99.99%

NDK User Guide: select load capacitance & input slew vectors

Choose the default parameters for ‘fan-out and input slew for cell type: comb’. These set the output load capacitance and input slew rate used to characterize timing/power in the library characterization file for combinational cells (default: 7x7 table).

NDK User Guide: run example timing/power script

a) As a demonstration, you can run one step of the StepStruct to see the timing/power numbers for one of the cells. In this case, the first library cell is ‘AOI21D0’, a 2:1 And-Or-Invert with drive strength 0 (minimum-width transistors).

b) Run single step.

NDK User Guide: example timing/power output

These are example timing and energy plots for the AOI21D0 library cell, along with leakage power and input capacitance information.

[Figure panels: output slew vs. C_LOAD (increasing input slew); delay vs. C_LOAD (increasing input slew); energy vs. input slew (increasing C_LOAD); other key parameters]

NDK User Guide: example timing/power output waveform

It also displays a trace from one spectre simulation for one combination of input slew rate and output load. Note that time is on a log scale (to easily display the full simulation).

NDK User Guide: generate SPICE files & perl execution script

a) Click ‘Generate SPICE Files & Execution script’. This creates the scripts that run all the spectre files needed for the full library characterization (230 scripts in this case). It also generates a perl file that you can run to execute all the spectre commands.

b) Generating all the files takes about 10 minutes. You can watch the status until it reaches “Running script 230 of 230” (starting from “1 of 230”).

c) Choose how many separate perl scripts you want to generate. If you choose more than 1 (e.g., 4), you can run them in parallel so they complete sooner (for now, you can just choose 1).

NDK User Guide: run perl script

Here is the terminal that was opened. You can see directories for each of the standard library cells, as well as the perl script generated to run all the spectre commands. If you like, you can traverse into those directories to see the spectre files that were generated. To run all the commands, execute ‘perl run_spectremdl_files_001-230_of_230.pl’ (if you had generated multiple scripts, you could execute them in parallel). The full script takes about 1 hour to run.

NDK User Guide: load SPICE output

Many plots are displayed, showing timing/power for each input/output pin combination for each library cell. Once all the data is loaded and verified (about 5 minutes), a summary figure (shown here) is plotted for all the library cells.

[Figure panels: delay; energy; leakage power; input capacitance]

NDK User Guide: generate .lib file

Now the library generation is complete! To see the library, navigate (e.g., using the terminal) to $NDK/lib/all

You can see the files you just created:

. n07_cnt_pex_top_end_lg09_lc09_lx12_s04_d17_r03_k10.lvt.tt.vdd.500.T25_nldm.lib

. n07_cnt_pex_top_end_lg09_lc09_lx12_s04_d17_r03_k10.lvt.tt.vdd.500.T25_nldm.db

NDK User Guide: example .lib file

Further down in the file, you can see the entries for each library cell. For example, for the inverter (INVD1: drive strength 1), you can see the timing and energy tables that were plotted in Matlab while the file was being generated.
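Timing tools generally read such tables by interpolating between the characterized points. The sketch below shows bilinear interpolation over a two-axis table, using the axis roles annotated on this slide (Index1: C_LOAD, Index2: t_INSLEW). It illustrates the general lookup idea only: the slides do not state which interpolation the downstream tools apply, and the function names and all table values here are hypothetical.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Return i so that axis[i], axis[i + 1] is the table segment used for x.

    The index is clamped to the table, so values of x outside the axis
    range are linearly extrapolated from the nearest segment.
    """
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def lookup(index_1, index_2, values, x1, x2):
    """Bilinearly interpolate values[i][j], tabulated at (index_1[i], index_2[j])."""
    i, j = _bracket(index_1, x1), _bracket(index_2, x2)
    t = (x1 - index_1[i]) / (index_1[i + 1] - index_1[i])
    u = (x2 - index_2[j]) / (index_2[j + 1] - index_2[j])
    return ((1 - t) * (1 - u) * values[i][j]
            + t * (1 - u) * values[i + 1][j]
            + (1 - t) * u * values[i][j + 1]
            + t * u * values[i + 1][j + 1])


# Hypothetical 3x3 table (the NDK default is 7x7); all numbers are made up.
c_load = [1.0, 2.0, 4.0]        # index_1: load capacitance
t_slew = [10.0, 20.0, 40.0]     # index_2: input slew
rise_delay = [[5.0, 6.0, 8.0],
              [7.0, 8.0, 10.0],
              [11.0, 12.0, 14.0]]

print(lookup(c_load, t_slew, rise_delay, x1=1.5, x2=15.0))
```

A query that falls exactly on a grid point returns the tabulated entry, so the table itself is always reproduced.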

[Screenshot annotations: Index1: C_LOAD; Index2: t_INSLEW; rise delay table; rise output slew table; fall delay table; fall output slew table; rise power table (power multiplied by time unit)]

Step 1) Library Characterization

• Inputs: compact model, physical layouts (AOI222_X1), variations (metallic and semiconducting CNTs)

• Characterize timing/power libraries: automatic SPICE deck generation & analysis

• Variation-aware libraries: delay, noise, energy, yield

NDK User Guide: example variation-aware power/timing library

[Figure annotations: delay distribution for delay & load capacitance; example variation-aware timing/power library]

Variation-Aware NDK: 2 Steps

Step 1) Library characterization

. Parasitic extraction

. SPICE analysis using compact models

Step 2) VLSI circuit optimization

a) Synthesis, place & route, power/timing

b) Rapidly quantify variations

Step 2) VLSI Circuit Optimization

• Oracle OpenSparc T2 SoC

. Synthesis + Place & Route

. Parasitics: standard cells + interconnects

. FETs: 27 M

. Area: 0.27 mm²

. Total wire length: 9.5 meters

[Figure: OpenSparc T2 SoC, core1 through core8 (www.opensparc.net)]

Required inputs:

• Variation models: variation-aware timing/power library

• Full system: OpenSparc T2 SoC + wire parasitics

NDK User Guide: OpenSparc T2 modules location

[Screenshot annotations: OpenSparc T2 “un-core” modules; OpenSparc T2 core modules]

Step 2) VLSI Circuit Optimization

• Required inputs: variation models + full system + design targets (delay, noise immunity, energy, yield, ...)

• Optimize system energy & delay

NDK User Guide: open ‘ndk’ GUI

All the necessary files should now be correctly installed for physical design. We will use the ARM M0 as an example: a 32-bit stand-alone processor that contains about 5,000 logic gates, so it is relatively quick to design.

To open the GUI for the physical design tools, run ‘ndk’ from the Matlab command window. You should see the following GUI pop up.

NDK User Guide: input circuit module

a) Select ‘RES_module_vec’, then press ‘return’ to edit the module.

b) Enter ‘CORTEXM0DS’ to select the M0 processor (“m zero”).

Note some of the other fields: ‘RES_target_syn_frequency_GHz_vec’ specifies the target frequency at which the physical design tools must meet all the timing constraints (default: 1 GHz), and ‘RES_target_pt_frequency_GHz_vec’ specifies the operating frequency of the chip at which the power is measured (e.g., the chip can be designed to run at 1 GHz, but power can then be reduced by running the chip at 0.1 GHz).

NDK User Guide: generate scripts: synthesis, place & route, power/timing

a) Click “generate synthesis, place & route, power/timing”. This generates scripts to run the Synopsys physical design tools, similar to how the SystemVariations GUI generated the perl scripts to run all the spectre simulations.

b) Select the library we just created through the SystemVariations GUI (this looks for all the .db files generated in $NDK/lib/all): n07_cnt_pex_top_end_lg09_lc09_lx12_s04_d17_r03_k10.db

NDK User Guide: run scripts: synthesis, place & route, power/timing

Here is the terminal opened by the ndk GUI. Run all the scripts for synthesis (syn), place & route (pnr), and power/timing (pt) by running ‘bash run_syn_pnr_pt_’, where the script name ends with the time the script was generated, in the format: ______

The bash script calls the perl script inside the directory created for the specific library (or multiple perl scripts if multiple libraries were selected), which calls the scripts for:

. synthesis (ndk_syn_wrapper)

. place & route (ndk_pnr_wrapper)

. power/timing (ndk_pt_wrapper)

Each of these scripts can be called individually, e.g., during development or debugging.

NDK User Guide: load physical design data

Now that the simulations are complete, load them into the NDK GUI to visualize statistics gathered from the physical design.

a) Click ‘Load data: synthesis, place & route, power/timing’.

b) There is an option to ‘load previously saved data’, since it can take a few minutes to load all the physical design data over many different operating frequencies. In this case, we have not loaded any data before, so it doesn’t matter whether you click ‘yes’ or ‘no’. In general, click ‘no’ each time after you re-run synthesis, place & route, power/timing (so that the newly generated data is loaded); otherwise click ‘yes’ (e.g., if you’ve loaded the data before and want to compare it to a physical design with a different technology).

NDK User Guide: example energy & frequency

Congratulations! You should now see the physical design statistics for the ARM M0 processor you just designed, using the carbon nanotube standard cell library. The single point in the graph shows the total energy consumption per cycle while operating at a clock frequency of 1 GHz. The table on the right shows many of the parameters extracted from the physical design, including the breakdown of power into dynamic/leakage energy components, the total chip power density, the number of logic gates, the average wire length and resistance/capacitance, etc.
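Energy per cycle and average power are related through the clock frequency, which is a useful check when reading these statistics. The sketch below uses the standard relations (power = energy per cycle × frequency; leakage energy per cycle = leakage power × clock period). The function names are hypothetical, and only the 0.577 pJ at 1 GHz figure comes from this example.

```python
def average_power_w(energy_per_cycle_j, frequency_hz):
    """Average power (W) = energy per cycle (J) x cycles per second (Hz)."""
    return energy_per_cycle_j * frequency_hz


def energy_per_cycle_j(dynamic_energy_j, leakage_power_w, frequency_hz):
    """Energy per cycle = switching energy + leakage power x clock period."""
    return dynamic_energy_j + leakage_power_w / frequency_hz


# 0.577 pJ per cycle at 1 GHz is the value reported for this example design.
p = average_power_w(0.577e-12, 1e9)
print(f"{p * 1e3:.3f} mW")
```

The second function shows why the leakage contribution to energy per cycle grows as the operating frequency drops: the same leakage power is integrated over a longer clock period.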

[Figure annotations: physical design statistics; total energy consumption per cycle: 0.577 pJ for 1 GHz clock frequency]

NDK User Guide: generate multiple libraries (previous steps)

Now you will use each of the generated BaseParameters.csv files to generate its own liberty file. In particular, you will generate the additional library files:

. n07_cnt_pex_top_end_lg09_lc09_lx12_s04_d17_r03_k10.lvt.tt.vdd0.375.T25_nldm.db

. n07_cnt_pex_top_end_lg09_lc09_lx12_s04_d17_r03_k10.lvt.tt.vdd0.400.T25_nldm.db

. n07_cnt_pex_top_end_lg09_lc09_lx12_s04_d17_r03_k10.lvt.tt.vdd0.425.T25_nldm.db

. n07_cnt_pex_top_end_lg09_lc09_lx12_s04_d17_r03_k10.lvt.tt.vdd0.450.T25_nldm.db

. n07_cnt_pex_top_end_lg09_lc09_lx12_s04_d17_r03_k10.lvt.tt.vdd0.475.T25_nldm.db

Repeat the steps on slides 4-28 for each BaseParameters.csv file generated, i.e., once to generate each .db file. As a reminder, these steps involve:

a) Loading the BaseParameters.csv file

b) Generating the LeakageCinTiming_subckt-ALL.csv StepStruct

c) Generating the perl script to run all the circuit simulations (using spectremdl)

d) Running all the perl scripts (NOTE: these can be run in parallel; they take about 1 hour each)

e) Building the Leakage/Cin/Timing models using the SystemVariations GUI

f) Writing the .lib files using the SystemVariations GUI

g) Converting the .lib files to .db files

NDK User Guide: input multiple frequencies simultaneously

Now that all the library database (.db) files have been generated, we will use the ndk GUI to generate the scripts for synthesis, place & route, and power/timing analysis with these libraries.

a) Select ‘RES_target_frequency_syn_GHz_vec’, press ‘return’, then enter ‘1:1:10’ (Matlab syntax for the vector [1,2,3,4,5,6,7,8,9,10]). This sweeps the target frequency from 1 GHz to 10 GHz during synthesis (in 1 GHz increments).

b) Select ‘RES_target_frequency_pt_GHz_vec’, press ‘return’, then enter ‘1:0.1:10’. This sweeps the operating frequency (used during power/timing analysis) from 1 GHz to 10 GHz in 100 MHz increments (for a total of 91 different power/timing frequencies).

NDK User Guide: run & load output data

Loading all the data takes about 5-10 minutes (you can watch the output in the Matlab command window to see the status).

Once it’s loaded, you will see all the different combinations of operating frequency and energy consumption for each physical design. Note that these points come from many different physical designs of the ARM M0 processor: designed at different supply voltages, to meet timing at different target frequencies, and operating at different frequencies (only designs that meet timing are shown).

Only the design with the minimum energy-delay product (EDP) is shown in the table (marked with a gray circle in the plot).
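As a rough illustration of how a minimum-EDP design can be picked from a set of results, the sketch below takes the delay to be the clock period (1/frequency), so EDP = energy per cycle / frequency. The `Design` type, the `edp` helper, and all numbers are hypothetical; the NDK's own selection code is not shown in the slides.

```python
from typing import NamedTuple


class Design(NamedTuple):
    name: str
    frequency_ghz: float
    energy_pj: float  # energy per cycle


def edp(d: Design) -> float:
    """Energy-delay product in pJ*ns, taking delay as the clock period 1/f."""
    return d.energy_pj / d.frequency_ghz


# Hypothetical designs; the numbers are made up for illustration.
designs = [
    Design("A", 1.0, 0.60),
    Design("B", 2.0, 0.90),
    Design("C", 4.0, 2.40),
]

best = min(designs, key=edp)
print(best.name, edp(best))
```

Here design B wins: it is neither the fastest nor the lowest-energy point, but it has the smallest product of the two.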

The points represent all the different designs, and the solid curve represents the Pareto-optimal trade-off curve (a design is Pareto-optimal if no other design can operate at a higher frequency while simultaneously consuming less energy).

NDK User Guide: view Pareto-optimal curve
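A Pareto-optimal set of (frequency, energy) points can be extracted with a single sorted sweep. The sketch below is one common way to do it and is not taken from the NDK source. It uses the slightly stricter convention that a design is also dropped when another design ties it on one axis and beats it on the other. The function name and the sample points are hypothetical.

```python
def pareto_front(designs):
    """Return designs not dominated in (higher frequency, lower energy).

    Each design is a (frequency_ghz, energy_pj) tuple. A design is dropped
    if another design is at least as fast and uses no more energy, and is
    strictly better in at least one of the two.
    """
    # Sort fastest first; among equal frequencies, lowest energy first.
    ordered = sorted(set(designs), key=lambda d: (-d[0], d[1]))
    front = []
    best_energy = float("inf")
    for freq, energy in ordered:
        # Everything already visited is at least as fast, so this design
        # survives only if it uses strictly less energy than all of them.
        if energy < best_energy:
            front.append((freq, energy))
            best_energy = energy
    return sorted(front)


# Hypothetical (frequency in GHz, energy per cycle in pJ) points.
points = [(1.0, 0.60), (2.0, 0.90), (2.0, 1.10), (3.0, 0.80), (4.0, 2.40)]
print(pareto_front(points))
```

In this example both 2 GHz designs are dropped, because the 3 GHz design is faster and uses less energy than either of them.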

Alternatively, select ‘Convex hull’ from the ‘Display options’ panel to show only designs that are on the boundary of the convex hull of Pareto-optimal designs (in log-frequency, log-energy space). This is primarily for aesthetic purposes: it “smooths” the curve, highlighting the typical “convexity” of energy vs. delay trade-off curves.

Step 2) VLSI Circuit Optimization

• Inputs: variation models, full system, design targets (delay, noise immunity, energy, yield, ...)

• Optimize system energy & delay

• System performance distributions: delay, noise, energy, yield

NDK User Guide: select design for variations analysis script

Select ‘fd_pt’ and then press ‘s’. This generates the scripts for Monte Carlo timing trials.

NDK User Guide: run variations analysis script

Run the ndk_pt_timing_paths.tcl script generated in the power/timing directory.

NDK User Guide: load distribution of circuit critical path delay

[Figure annotations: T_NOMINAL; delay penalty at 95% cumulative probability]
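The delay penalty read off such a cumulative distribution is a percentile of the Monte Carlo critical-path delays, expressed relative to the nominal delay. The sketch below shows one way to compute it with a nearest-rank percentile. The function name, the synthetic trial data, and the choice of percentile method are all assumptions; the NDK's own post-processing is not described in the slides.

```python
import math
import random


def delay_penalty(delays, nominal, yield_target=0.95):
    """Relative delay at the given cumulative probability, vs. the nominal delay.

    Uses the nearest-rank percentile: the smallest sampled delay such that
    at least `yield_target` of the trials are no slower than it.
    """
    ordered = sorted(delays)
    rank = max(1, math.ceil(yield_target * len(ordered)))
    return ordered[rank - 1] / nominal


# Hypothetical Monte Carlo trials: delays scattered around a nominal of 1.0.
rng = random.Random(0)
trials = [1.0 + abs(rng.gauss(0.0, 0.05)) for _ in range(1000)]
print(f"95% delay = {delay_penalty(trials, nominal=1.0):.3f}x nominal")
```

A result of, say, 1.1x would mean the clock period must be relaxed by 10% for 95% of the sampled circuits to meet timing.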

[Figure axes: cumulative probability (0% to 95%) vs. relative delay (0.9X to 1.3X of T_NOMINAL)]

OpenSparc T2 Processor Core

[Figure: energy/cycle (nJ) vs. clock frequency (GHz) at the 7 nm node for FinFET, CNFET, and CNFET + variations, with the preferred corner marked]

[Figure: same axes; CNFET + variations with co-optimized process & design, annotated +5% delay, +5% energy]

NDK: CNFET Processor Core

[Figure: OpenSparc T2 SoC, core1 through core8 (www.opensparc.net), + CNFET compact model]

NDK: 2-D FET Memory Interface

[Figure: memory interface circuitry + 2-D FET compact model]

NDK: CNFET Processor Core Design

[Figure: 3-D Resistive RAM (RRAM) + RRAM compact model]

Conclusion

• Our Nanosystem Design Kit: nanohub.org

. Primary focus: CNFETs

. Future compact models supported

• New opportunities enabled

. Technology benefits for realistic systems

. Physical design of 3-D nanosystems

. ...