Nanosystem Design Kit (NDK): Transforming Emerging Technologies into Physical Designs of VLSI Systems
G. Hills, M. Shulaker, C.-S. Lee, H.-S. P. Wong, S. Mitra
Stanford University, Massachusetts Institute of Technology
Abundant-Data Explosion
“Swimming in sensors, drowning in data”
[Figure: data volume in ExaB (billions of GB), 0 to 40K, vs. year, 2006 to 2020; annotations: wide variety & complexity, unstructured data]
Mine, search, analyze data in near real-time
Data centers, mobile phones, robots
Abundant-Data Applications
Huge memory wall: processors, accelerators
Energy measurements (compute vs. memory):
. Genomics classification: 5% compute, 95% memory
. Natural language processing: 18% compute, 82% memory
Intel performance counter monitors: 2 CPUs, 8 cores/CPU + 128 GB DRAM
US National Academy of Sciences (2011)
Computing Today
2-Dimensional
3-Dimensional Nanosystems
Computation immersed in memory
[Figure labels: increased functionality; fine-grained, ultra-dense 3D; memory; computing logic]
Impossible with today’s technologies
Enabling Technologies
[Figure: 3D nanosystem layer stack with thermal layers between tiers]
. 3D Resistive RAM: massive storage
. 1D CNFET, 2D FET: compute, RAM access
. STT MRAM: quick access
. 1D CNFET, 2D FET: compute, RAM access
. 1D CNFET, 2D FET: compute, power, clock
. Annotations: no TSV; ultra-dense, fine-grained vias; silicon compatible
Nanosystems: Compact Models Essential
[Figure: the same layer stack (3D Resistive RAM, STT MRAM, 1D CNFET, 2D FET); labels: nanohub, m-Cell]
Compact Models: Insufficient Alone
Design for Realistic Systems
. Wire parasitics
. Inter-module interface circuits
. Routing congestion
. Application-dependent workloads
. Multiple clock domains
. Cache architecture
. Memory access patterns
. Processor vs. memory
. Sleep mode & sleep transistors
. High performance vs. low power
. SRAM retention
. Dynamic power vs. leakage power
. DRAM refresh
. Placement utilization
. …
Example: OpenSparc T2 Processor Core
[Figure: energy/cycle (nJ), 0.05 to 0.5, vs. clock frequency (GHz), 0.2 to 5; curves: FinFET, CNFET; preferred corner marked]
Example: OpenSparc T2 Processor Core
Design for Realistic Systems, with the following added to the list:
. Timing variations
. Noise immunity
. Energy variations
. Functional yield
. …
[Figure: same axes; curves: FinFET, CNFET, CNFET + variations; preferred corner marked]
Variation-Aware Nanosystem Design
[Figure: same axes; curves: FinFET, CNFET, CNFET + variations with co-optimized processing & design; preferred corner marked]
Our Nanosystem Design Kit (NDK)
• Available: nanohub.org
Accessing the NDK
• Link
. https://www.nanohub.org/groups/nanosystems
• File
. ndk_v2016-12-13.tar
NDK: Tool Dependencies
• Tools
. Synopsys CAD tools: lc_shell, dc_shell, icc_shell, Milkyway, StarRC
. Cadence CAD tools: spectremdl
. Common Unix utilities: sed, grep, cat, ...
. Matlab
. Perl
• External resources (free download)
. Compact model
. E.g., virtual source CNFET model
. Process Design Kit (PDK)
. E.g., NanGate 15 nm standard cell library
. Register-Transfer Level (RTL) hardware description
. E.g., OpenSparc T2 processor core
Installing the NDK
1. untar ndk_v2016-12-13.tar
2. Set environment variables
. $SVNROOT, $NDK, $DATAROOT, $XT
3. Download external resources
. Compact model, PDK, RTL
4. Run installation scripts
. bash scripts inside $NDK/install
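Before running the installation scripts in step 4, it can help to confirm that step 2 was completed. The following is a minimal Python sketch and is not part of the NDK: the function name `missing_ndk_variables` is hypothetical, and the check only verifies that the four environment variables listed above are set and non-empty.

```python
import os

# Environment variables named in step 2 of the installation instructions.
REQUIRED_VARS = ("SVNROOT", "NDK", "DATAROOT", "XT")


def missing_ndk_variables(env=None):
    """Return the required variables that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_ndk_variables()
    if missing:
        print("Set these variables before installing:", ", ".join(missing))
    else:
        print("All NDK environment variables are set.")
```

Passing a dictionary instead of relying on `os.environ` keeps the check easy to try out without changing the shell environment.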
NDK User Guide: directory structure within $NDK
Case Study: CNFET Processor Core
[Figure: layer stack with thermal layers; OpenSparc T2 SoC (www.opensparc.net), core1 to core8, + CNFET compact model]
Carbon Nanotube FET (CNFET)
[Figure: CNFET structure; labels: carbon nanotube (CNT), d ~ 1 nm; sub-lithographic CNT pitch; gate oxide]
NDK: High-Level Overview
. experimental data + compact models [Figure: ID (mA/μm), 0 to 3, vs. VDS (V), 0 to 0.4; data and model]
. variations (metallic and semiconducting CNTs)
. physical layouts (NanGate 15 nm Library, + wire parasitics)
. full system (OpenSparc T2 SoC, + wire parasitics)
. design targets (delay, noise immunity, energy, yield, …)
Variation-Aware NDK: 2 Steps
Step 1) Library characterization
. Parasitic extraction
. SPICE analysis using compact models
Step 2) VLSI circuit EDP optimization
a) Synthesis, place & route, power/timing
b) Rapidly quantify variations
Step 1) Library Characterization
Required Inputs: compact model
module vscnfet_1_0_1(D,G,S);
NDK User Guide: open Graphical User Interface (GUI)
1. Open SystemVariations GUI (run ‘SystemVariations’ from Matlab terminal)
This is the “StepStruct”: a comma-separated values (.csv) file that you can edit in LibreOffice (or Excel).
NDK User Guide: load configuration “StepStruct”
Browse for the example StepStruct file, which contains technology information such as gate length, contact length, gate oxide thickness, gate oxide dielectric constant, etc.
The example is for carbon nanotube field-effect transistors (CNFETs). Browse for ‘$SVNROOT/cnfet_modeling/SystemVariations/SPICE_deck_gen/cnfet_macro/n07_cnt_pex_top_end_lg09_lc09_lx12_s04_d17_r03_k10/VDD500mV/lvt/pex/BaseParameters.csv’; this loads all the fields into the GUI.
NDK User Guide: run configuration “StepStruct”
a) Click “Run SPICE StepStruct” to run the circuit simulation (using Cadence spectremdl) associated with this StepStruct. The SystemVariations GUI generates a spectre netlist, runs it, and then loads & plots the output. This particular file shows the current-voltage (I-V) characteristics, the capacitance-voltage (C-V) characteristics, and some other key device-level parameters for the CNFET.
b) Run single step.
c) The simulation takes ~1 minute to run; you can see the spectre output in the Matlab command window.
NDK User Guide: example FET characteristics: I-V, C-V
Here are example I-V, C-V, and CNFET parameters
[Figure panels: ID vs. VDS; ID vs. VGS; ION & IEFF vs. VDD; CGS vs. VGS; CGS vs. VDD; device parasitics: resistance & capacitance]
NDK User Guide: other compact models
[Figure annotations: wrapper model to hook into NDK; compact model instantiation]
Step 1) Library Characterization
Required Inputs: compact model + physical layouts
. compact model: module vscnfet_1_0_1(D,G,S);
. physical layouts: AOI222_X1 (NanGate 15 nm Library)
NDK User Guide: view in Virtuoso
Standard cell layout for AOI21_X1 in the NanGate 15 nm library
NDK User Guide: .macro.lef file
The .macro.lef file contains information on the locations of the wires in each standard cell, as well as standard cell area and pin locations.
NDK User Guide: .tech.lef file
You can open the input .lef files in a text editor: N7_3X2Y2Z_P42_mint.tech.lef and N7_3X2Y2Z_P42_mint.macro.lef
The .tech.lef file contains information on the via pitch, width, spacing, and height, as well as the metal layers (width, pitch, min spacing, etc.).
NDK User Guide: .itf file
You can open the input .itf file in a text editor. It contains information about the resistance and capacitance of the wires and vias on each layer, as well as the inter-layer dielectrics (dielectric constant and spacing).
[Figure: example .itf specification]
NDK User Guide: extracted netlist
Example: AOI21_X1 (NanGate 15 nm OCL)
[Figure annotations: extracted parasitics (R & C) from standard cell layouts; FET instances (instantiating NDK wrapper)]
Step 1) Library Characterization
Required Inputs: compact model + physical layouts + variations
. compact model: module vscnfet_1_0_1(D,G,S);
. physical layouts: AOI222_X1 (NanGate 15 nm Library)
. variations: metallic and semiconducting CNTs
NDK User Guide: CNT variations parameters
• Parameterize measured CNT spacing variations: $\sigma_{\text{spacing}}^2 / \mu_{\text{spacing}}^2 = 0.5$
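To make the spacing parameter concrete, the sketch below draws random inter-CNT spacings whose variance-to-mean-squared ratio is 0.5. It is an illustration only and not the NDK's Matlab code: the gamma distribution is an assumed model (chosen because its ratio equals 1/shape), and the function name `sample_spacings` and the mean spacing value are hypothetical.

```python
import random
import statistics

CV_SQUARED = 0.5  # sigma_spacing^2 / mu_spacing^2, the value given on the slide


def sample_spacings(mean_spacing, n, seed=0):
    """Draw n inter-CNT spacings with the given mean and sigma^2/mu^2 = 0.5.

    Assumption: spacings follow a gamma distribution. For a gamma
    distribution sigma^2/mu^2 = 1/shape, so shape = 2 and scale = mean/2.
    """
    shape = 1.0 / CV_SQUARED
    scale = mean_spacing / shape
    rng = random.Random(seed)
    return [rng.gammavariate(shape, scale) for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical mean spacing, in the same units as the histogram axis.
    spacings = sample_spacings(mean_spacing=0.1, n=100_000)
    mu = statistics.fmean(spacings)
    var = statistics.pvariance(spacings, mu)
    print(f"mean = {mu:.4f}, sigma^2 / mu^2 = {var / mu ** 2:.3f}")
```

With enough samples, the printed ratio should sit close to 0.5 regardless of the mean spacing chosen.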
[Figure: histogram of CNTs; probability, 0 to 0.1, vs. inter-CNT spacing s (µm), 0 to 1]
NDK User Guide: variations: other compact models
[Figure annotations: switch statement in Matlab based on compact model; function call + processing: compact-model dependent; instructions on how to add new functions for new variations based on new compact models]
Step 1) Library Characterization
Inputs: compact model, physical layouts (AOI222_X1), variations (metallic and semiconducting CNTs)
. Characterize timing/power libraries
. Automatic SPICE deck generation & analysis
NDK User Guide: generate Power/Timing configuration “StepStruct”
a) Click “Generate NOMINAL Leakage/Cin/Timing StepStructs”. This creates spectre simulations for many different standard library cells to characterize leakage current, input capacitance, timing, and power for the standard cell library.
b) For ‘Choose SPICE StepStruct file for cell type: comb’, select ‘template_LeakageCinTiming_Ioff_retarget.mdl’.
NDK User Guide: select variations parameters
Choose the parameters for ‘Input for cnfet_macro’:
. NOMINAL: Number of Monte Carlo trials = 1, IDC = 0, pm = 0, pRs = 0, pRm = 1
. VARIATIONS: trials = 100, IDC = 0.5, pm = 0.10, pRs = 1%, pRm = 99.99%
NDK User Guide: select load capacitance & input slew vectors
Choose the default parameters for ‘fan-out and input slew for cell type: comb’. These set the output load capacitance and input slew rate used to characterize timing/power in the library characterization file for combinational cells (default: 7x7 table).
NDK User Guide: run example timing/power script
a) As a demonstration, you can run one step of the StepStruct to see the timing/power numbers for one of the cells. In this case, the first library cell is ‘AOI21D0’, a 2:1 And-Or-Invert logic gate with drive strength 0 (minimum-width transistors).
b) Run single step.
NDK User Guide: example timing/power output
These are example timing and energy plots for the AOI21D0 library cell, along with leakage power and input capacitance information.
[Figure panels: output slew vs. CLOAD (increasing input slew); delay vs. CLOAD (increasing input slew); energy vs. input slew (increasing CLOAD); other key parameters]
NDK User Guide: example timing/power output waveform
The GUI also displays a trace from one spectre simulation for one combination of input slew rate & output load. Note that time is on a log scale (to easily display the full simulation).
NDK User Guide: generate SPICE files & perl execution script
a) Click ‘Generate SPICE Files & Execution script’. This creates many scripts to run all the necessary spectre files for the full library characterization (230 scripts in this case). It also generates a perl file that you can run to execute all the spectre commands.
b) Generating all the files takes about 10 minutes. You can watch the status until it reaches “Running script 230 of 230” (starting from “1 of 230”).
c) Choose how many separate perl scripts you want to generate. If you choose more than 1 (e.g., 4), you can run them in parallel so they complete sooner (for now, you can just choose 1).
NDK User Guide: run perl script
This is the terminal that was opened. You can see directories for each of the standard library cells, as well as the perl script generated to run all the spectre commands. If you like, you can traverse into those directories to see the spectre files that were generated. To run all the commands, execute ‘perl run_spectremdl_files_001-230_of_230.pl’ (if you had generated multiple scripts, you could execute them in parallel). The full script takes about 1 hour to run.
NDK User Guide: load SPICE output
Many plots will be displayed showing timing/power for each input/output pin combination for each library cell. Once all the data is loaded & verified (about 5 minutes), the GUI plots a summary figure for all the library cells.
[Figure: summary for all library cells; panels: delay, energy, leakage power, input capacitance]
NDK User Guide: generate .lib file
Now the library generation is complete! To see the library, you can navigate (e.g., using the terminal) to $NDK/lib/all
You can see the files you just created:
. n07_cnt_pex_top_end_lg09_lc09_lx12_s04_d17_r03_k10.lvt.tt.vdd.500.T25_nldm.lib
. n07_cnt_pex_top_end_lg09_lc09_lx12_s04_d17_r03_k10.lvt.tt.vdd.500.T25_nldm.db
NDK User Guide: example .lib file
Further down in the file, you can see the entries for each library cell. For example, for the inverter (INVD1: drive strength 1), you can see the timing and energy tables that were plotted in Matlab while the file was being generated.
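Each table in such an entry is a two-dimensional grid indexed by load capacitance and input slew, and a value between grid points has to be interpolated. The sketch below shows one common way to do that, bilinear interpolation. It is an illustration under stated assumptions, not the lookup used by any particular tool: the function names are hypothetical, and the 3x3 table values are made up rather than taken from the NDK library.

```python
from bisect import bisect_right


def _bracket(axis, value):
    """Return i such that axis[i] and axis[i + 1] bracket value.

    Values outside the axis range reuse the first or last segment, so the
    lookup extrapolates linearly there.
    """
    i = bisect_right(axis, value) - 1
    return max(0, min(i, len(axis) - 2))


def lookup(table, index1, index2, x1, x2):
    """Bilinearly interpolate table[i][j], indexed by (index1[i], index2[j])."""
    i = _bracket(index1, x1)
    j = _bracket(index2, x2)
    t1 = (x1 - index1[i]) / (index1[i + 1] - index1[i])
    t2 = (x2 - index2[j]) / (index2[j + 1] - index2[j])
    low = table[i][j] * (1 - t2) + table[i][j + 1] * t2
    high = table[i + 1][j] * (1 - t2) + table[i + 1][j + 1] * t2
    return low * (1 - t1) + high * t1


# Hypothetical 3x3 excerpt (values and units are illustrative only).
cload = [1.0, 2.0, 4.0]       # Index1: load capacitance
inslew = [10.0, 20.0, 40.0]   # Index2: input slew
rise_delay = [
    [5.0, 6.0, 8.0],
    [7.0, 8.0, 10.0],
    [11.0, 12.0, 14.0],
]

print(lookup(rise_delay, cload, inslew, 3.0, 30.0))  # prints 11.0
```

The same function applies unchanged to a 7x7 table, since it only needs the two index vectors and the grid of values.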
[Figure annotations on the .lib entry: Index1: CLOAD; Index2: tINSLEW; rise delay table; rise output slew table; fall delay table; fall output slew table; rise power table (power multiplied by time unit)]
Step 1) Library Characterization
Inputs: compact model, physical layouts (AOI222_X1), variations (metallic and semiconducting CNTs)
. Characterize timing/power libraries
. Automatic SPICE deck generation & analysis
. Variation-aware libraries: delay, noise, energy, yield
NDK User Guide: example variation-aware power/timing library
[Figure: example variation-aware timing/power library; delay distribution for delay & load capacitance]
Variation-Aware NDK: 2 Steps
Step 1) Library characterization
. Parasitic extraction
. SPICE analysis using compact models
Step 2) VLSI circuit optimization
a) Synthesis, place & route, power/timing
b) Rapidly quantify variations
Step 2) VLSI Circuit Optimization
• Oracle OpenSparc T2 SoC (www.opensparc.net)
. Synthesis + Place & Route
. Parasitics: standard cells + interconnects
. FETs: 27 M
. Area: 0.27 mm²
. Total wire length: 9.5 meters
[Figure: OpenSparc T2 SoC, core1 to core8]
Step 2) VLSI Circuit Optimization
Required Inputs: variation models + full system
. variation models: variation-aware timing/power library
. full system: OpenSparc T2 SoC + wire parasitics
NDK User Guide: OpenSparc T2 modules location
[Figure annotations: OpenSparc T2 “un-core” modules; OpenSparc T2 core modules]
Step 2) VLSI Circuit Optimization
Required Inputs: variation models + full system + design targets
. variation models: variation-aware timing/power library
. full system: OpenSparc T2 SoC + wire parasitics
. design targets: delay, noise immunity, energy, yield, ...
. Optimize system energy & delay
NDK User Guide: open ‘ndk’ GUI
All the necessary files should now be correctly installed for physical design. We will use the ARM M0 as an example: it is a full, stand-alone 32-bit processor with about 5,000 logic gates, so it is relatively quick to design.
To open the GUI for the physical design tools, run ‘ndk’ from the Matlab command window. You should see the following GUI pop up.
NDK User Guide: input circuit module
a) Select ‘RES_module_vec’, then press ‘return’ to edit the module.
b) Enter ‘CORTEXM0DS’ to select the M0 processor (“m zero”).
Note some of the other fields: ‘RES_target_syn_frequency_GHz_vec’ specifies the target frequency at which the physical design tools must meet all the timing constraints (default: 1 GHz), and ‘RES_target_pt_frequency_GHz_vec’ specifies the operating frequency of the chip at which the power is measured (e.g., the chip can be designed to run at 1 GHz, but power can then be reduced by running the chip at 0.1 GHz).
NDK User Guide: generate scripts: synthesis, place & route, power/timing
a) Click “generate synthesis, place & route, power/timing”. This generates scripts to run the Synopsys physical design tools, similar to how the SystemVariations GUI generated the perl scripts to run all the spectre simulations.
b) Select the library we just created through the SystemVariations GUI (this looks for all the .db files generated in $NDK/lib/all): n07_cnt_pex_top_end_lg09_lc09_lx12_s04_d17_r03_k10.db
NDK User Guide: run scripts: synthesis, place & route, power/timing
Here is the terminal opened by the ndk GUI, run all the scripts for synthesis (syn), place & route (pnr), and power/timing (pt) by running: ‘bash run_syn_pnr_pt_
The bash script calls the perl script inside the directory created for the specific library (or multiple perl scripts if multiple libraries were selected), which calls the scripts for:
. synthesis (ndk_syn_wrapper)
. place & route (ndk_pnr_wrapper)
. power/timing (ndk_pt_wrapper)
Each of these scripts can be called individually, e.g., during development or debugging.
NDK User Guide: load physical design data
Now that the simulations are complete, load them into the NDK GUI to visualize statistics gathered from the physical design.
a) Click ‘Load data: synthesis, place & route, power/timing’.
b) There is an option to ‘load previously saved data’, since it can take a few minutes to load all the physical design data over many different operating frequencies. In this case, no data has been loaded before, so it does not matter whether you click ‘yes’ or ‘no’. In general, click ‘no’ each time after you re-run synthesis, place & route, power/timing (so that the newly generated data is picked up); otherwise click ‘yes’ (e.g., if you have loaded the data before and want to compare it to a physical design with a different technology).
NDK User Guide: example energy & frequency
Congratulations! You should now see the physical design statistics for the ARM M0 processor you just designed, using the carbon nanotube standard cell library. The single point in the graph shows the total energy consumption per cycle while operating at a clock frequency of 1 GHz. The table on the right shows many of the parameters extracted from the physical design, including the breakdown of power into dynamic/leakage energy components, the total chip power density, the number of logic gates, the average wire length and resistance/capacitance, etc.
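The energy-per-cycle figure and the clock frequency together determine average power. The sketch below works through that arithmetic for the value reported in this example (0.577 pJ per cycle at 1 GHz). The function names are hypothetical, and the dynamic/leakage split is the usual first-order model (energy per cycle = dynamic energy + leakage power x clock period), stated here as an assumption rather than as the NDK's exact accounting.

```python
def average_power_watts(energy_per_cycle_joules, frequency_hz):
    """Average power = energy per cycle x cycles per second."""
    return energy_per_cycle_joules * frequency_hz


def energy_per_cycle_joules(dynamic_energy_joules, leakage_power_watts, frequency_hz):
    """First-order model: dynamic energy plus leakage power over one clock period."""
    return dynamic_energy_joules + leakage_power_watts / frequency_hz


# Value reported for the example design: 0.577 pJ per cycle at 1 GHz.
power = average_power_watts(0.577e-12, 1e9)
print(f"{power * 1e3:.3f} mW")  # prints 0.577 mW
```

Under this model, lowering the operating frequency reduces average power, while the leakage contribution to energy per cycle grows because each cycle lasts longer.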
[Figure annotations: physical design statistics; total energy consumption per cycle: 0.577 pJ for 1 GHz clock frequency]
NDK User Guide: generate multiple libraries (previous steps)
Now you will use each of the generated BaseParameters.csv files to generate its own liberty file. In particular, you will generate the additional library files:
. n07_cnt_pex_top_end_lg09_lc09_lx12_s04_d17_r03_k10.lvt.tt.vdd0.375.T25_nldm.db
. n07_cnt_pex_top_end_lg09_lc09_lx12_s04_d17_r03_k10.lvt.tt.vdd0.400.T25_nldm.db
. n07_cnt_pex_top_end_lg09_lc09_lx12_s04_d17_r03_k10.lvt.tt.vdd0.425.T25_nldm.db
. n07_cnt_pex_top_end_lg09_lc09_lx12_s04_d17_r03_k10.lvt.tt.vdd0.450.T25_nldm.db
. n07_cnt_pex_top_end_lg09_lc09_lx12_s04_d17_r03_k10.lvt.tt.vdd0.475.T25_nldm.db
Repeat the steps on slides 4-28 for each BaseParameters.csv file generated, i.e., once to generate each .db file. As a reminder, these steps involve:
a) Loading the BaseParameters.csv file
b) Generating the LeakageCinTiming_subckt-ALL.csv StepStruct
c) Generating the perl script to run all the circuit simulations (using spectremdl)
d) Running all the perl scripts. NOTE: these can be run in parallel (they take about 1 hour each)
e) Building the Leakage/Cin/Timing models using the SystemVariations GUI
f) Writing the .lib files using the SystemVariations GUI
g) Converting the .lib files to .db files
NDK User Guide: input multiple frequencies simultaneously
Now that all the library database (.db) files have been generated, we will use the ndk GUI to generate the scripts for synthesis, place & route, and power/timing analysis with these libraries.
a) Select ‘RES_target_frequency_syn_GHz_vec’, press ‘return’, then enter ‘1:1:10’ (Matlab syntax for the vector [1,2,3,4,5,6,7,8,9,10]). This sweeps the target frequency from 1 GHz to 10 GHz during synthesis (in 1 GHz increments).
b) Select ‘RES_target_frequency_pt_GHz_vec’, press ‘return’, then enter ‘1:0.1:10’. This sweeps the operating frequency (used during power/timing analysis) from 1 GHz to 10 GHz in 100 MHz increments (for a total of 91 different power/timing frequencies).
NDK User Guide: run & load output data
Loading all the data will take about 5-10 minutes (you can watch the output in the Matlab command terminal to see the status)
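The amount of data being loaded follows from the two sweep vectors entered in the ndk GUI, which use Matlab colon syntax (start:step:stop). The Python sketch below expands both vectors to confirm the point counts quoted above: 10 synthesis targets and 91 power/timing frequencies. The helper name `colon_range` is hypothetical and only mimics the colon operator for positive steps.

```python
import math


def colon_range(start, step, stop):
    """Expand Matlab-style start:step:stop (stop inclusive) for a positive step."""
    # A small tolerance guards against floating-point error in the division.
    count = int(math.floor((stop - start) / step + 1e-9)) + 1
    return [round(start + k * step, 9) for k in range(count)]


syn_targets_ghz = colon_range(1, 1, 10)       # '1:1:10'
pt_frequencies_ghz = colon_range(1, 0.1, 10)  # '1:0.1:10'

print(len(syn_targets_ghz), len(pt_frequencies_ghz))  # prints 10 91
```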
Once it's loaded, you will see all the different combinations of operating chip frequency and energy consumption for each different physical design. Note that these points come from many different physical designs of the ARM M0 processor: designed at different supply voltages, to meet timing at different target frequencies, and operating at different frequencies (only designs that meet timing are shown).
Only the design with the minimum energy-delay product (EDP) is shown in the table (marked with a gray circle in the plot).
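The selection of the minimum-EDP design, and the filtering of designs down to a trade-off curve, can be sketched in a few lines. This is a minimal Python illustration, not the NDK's Matlab code: the function names are hypothetical, the design points are made up, delay is taken to be the clock period (1/frequency), and a design is treated as Pareto-optimal when no other design is at least as fast and at least as low-energy while being strictly better in one of the two.

```python
def edp(frequency_ghz, energy_per_cycle):
    """Energy-delay product, with delay taken as the clock period."""
    return energy_per_cycle / frequency_ghz


def min_edp_design(designs):
    """Return the (frequency, energy) pair with the smallest EDP."""
    return min(designs, key=lambda d: edp(*d))


def pareto_front(designs):
    """Keep designs that no other design beats on both frequency and energy."""
    front = []
    for f, e in designs:
        dominated = any(
            f2 >= f and e2 <= e and (f2 > f or e2 < e)
            for f2, e2 in designs
        )
        if not dominated:
            front.append((f, e))
    return sorted(front)


# Hypothetical designs: (clock frequency in GHz, energy per cycle in pJ).
designs = [(1.0, 0.50), (2.0, 0.60), (2.0, 0.80), (3.0, 0.75), (4.0, 1.60), (1.5, 0.90)]

print(min_edp_design(designs))  # prints (3.0, 0.75)
print(pareto_front(designs))    # prints [(1.0, 0.5), (2.0, 0.6), (3.0, 0.75), (4.0, 1.6)]
```

The quadratic dominance check is fine for a few hundred design points; a sort-based sweep would be the natural choice for much larger sets.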
The points represent all the different designs, and the solid curve represents the Pareto-optimal trade-off curve (a design is Pareto-optimal if no other design can operate at a higher frequency while simultaneously consuming less energy).
NDK User Guide: view Pareto-optimal curve
Alternatively, select ‘Convex hull’ from the ‘Display options’ panel to show only designs on the boundary of the convex hull of Pareto-optimal designs (in log-frequency, log-energy space). This is primarily for aesthetic purposes: it “smooths” the curve and highlights the typical “convexity” of energy vs. delay trade-off curves.
Step 2) VLSI Circuit Optimization
Inputs: variation models, full system, design targets (delay, noise immunity, energy, yield, ...)
. Optimize system energy & delay
. System performance distributions: delay, noise, energy, yield
NDK User Guide: select design for variations analysis script
Select ‘fd_pt’ and then press ‘s’. This generates the scripts for Monte Carlo timing trials.
NDK User Guide: run variations analysis script
Run the ndk_pt_timing_paths.tcl script generated in the power/timing directory.
NDK User Guide: load distribution of circuit critical path delay
[Figure: cumulative probability, from 0%, vs. relative delay (vs. TNOMINAL), 0.9X to 1.3X; TNOMINAL and the delay penalty at 95% cumulative probability are marked]
OpenSparc T2 Processor Core
[Figure: energy/cycle (nJ), 0.05 to 0.5, vs. clock frequency (GHz), 0.2 to 5, at the 7 nm node; curves: FinFET, CNFET, CNFET + variations; preferred corner marked]
OpenSparc T2 Processor Core
[Figure: same axes, 7 nm node; curves: FinFET, CNFET, and CNFET + variations with co-optimized process & design (+5% delay, +5% energy); preferred corner marked]
NDK: CNFET Processor Core
[Figure: layer stack with thermal layers; OpenSparc T2 SoC (www.opensparc.net), core1 to core8, + CNFET compact model]
NDK: 2-D FET Memory Interface
[Figure: layer stack with thermal layers; Memory Interface Circuitry + 2-D FET compact model]
NDK: CNFET Processor Core Design
[Figure: layer stack with thermal layers; 3-D Resistive RAM (RRAM) + RRAM compact model]
Conclusion
• Our Nanosystem Design Kit: nanohub.org
. Primary focus: CNFETs
. Future compact models supported
• New opportunities enabled
. Technology benefits for realistic systems
. Physical design of 3-D nanosystems
. ...