EDA Challenges for Low Power Design

Anand Iyer, Cadence Design Systems

Agenda

• Introduction
• LP techniques in detail
• Challenges to low power techniques
• Guidelines for choosing various techniques

Why is Power an Issue?

[Figure: leakage and active power (mW/MHz) across technology nodes of 180, 130, 90 and 65 nm, "Performance = Power" (Source: Intel, 2004); power-hungry process, complex systems and sluggish battery life improvement from 2000 to 2004 (Source: EETimes, 2004)]

Approaches To Reducing Leakage and Active Power

Design and System Level Optimization
• System architecture (multi-core)
• Software/hardware power management system, e.g. ARM IEM
• Voltage scaling / frequency scaling
• Multiple voltage islands

Implementation (we will discuss this in detail)
• Clock gating, logic structuring
• Multi-Vth cell selection to reduce leakage
• Support for multi-voltage islands (aka "multi-Vdd" aka "MSV") implementation
• Sign-off accurate analysis

Process Level Optimization
• SOI
• High-K, gate stack, power gating, etc.
• LLD

Controlling Power in Implementation

Dynamic power: $P_{dyn} \approx k \cdot C_L \cdot V_{DD}^2 \cdot f_{CLK}$
• Clock gating (including de-clone)
• Area optimization
• Static voltage scaling (MSV)*
• Dynamic voltage frequency scaling (DVFS)*
• Adaptive voltage scaling (AVS)*

Leakage power: $P_{leak} \approx V_{DD} \cdot I_{leakage}$
• Multi-Vt cell optimization
• Substrate biasing (VTCMOS)
• Power shut-off (PSO), aka MTCMOS, including state retention
  – Fine grain control
  – Coarse grain control

* Techniques that affect both dynamic and leakage power

Techniques and Trade-offs

Power reduction technique                     | Leakage power | Dynamic power | Timing penalty | Area penalty | Methodology impact
Dynamic power optimization                    | 10%           | 10%           | 0%             | -10%         | None
Multi-Vt optimization                         | 6X            | 0%            | 0%             | 0%           | Low
Clock gating                                  | 0%            | 20%           | 0%             | <2%          | Low
Voltage islands                               | 2X            | 40-50%        | 0%             | <10%         | Medium
Power shut-off (PSO)                          | 10-50X        | 0%            | 4-8%           | 5-15%        | Medium-high
Dynamic and adaptive voltage frequency scaling | 2-3X         | 40-70%        | 0%             | <10%         | High
  (DVFS and AVS)                              |               |               |                |              |
Substrate biasing                             | 10X           | -             | 10%            | <10%         | High

Source: customer interviews, conference papers (ISSCC), magazine articles

LP Techniques in Detail

Dynamic Power Optimization (No V)
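The dynamic and leakage power expressions from "Controlling Power in Implementation" can be evaluated directly. The following minimal Python sketch does so. It assumes k is the switching-activity factor, and every block parameter is made up for illustration. With VDD held fixed, as in this section, trimming switched capacitance lowers dynamic power only linearly. The voltage-scaling case is included for contrast.

```python
def dynamic_power(k: float, c_load: float, vdd: float, f_clk: float) -> float:
    """P_dyn ~= k * C_L * VDD^2 * f_CLK, with k taken as the switching-activity factor."""
    return k * c_load * vdd ** 2 * f_clk


def leakage_power(vdd: float, i_leakage: float) -> float:
    """P_leak ~= VDD * I_leakage."""
    return vdd * i_leakage


if __name__ == "__main__":
    # Hypothetical block: 20% activity, 1 nF switched capacitance, 1.2 V, 200 MHz.
    base = dynamic_power(0.2, 1e-9, 1.2, 200e6)
    # "No V" optimization: e.g. sizing or pin swapping trims switched capacitance by 10%.
    smaller_c = dynamic_power(0.2, 0.9e-9, 1.2, 200e6)
    # For contrast only: voltage scaling from 1.2 V to 0.9 V at the same capacitance.
    lower_v = dynamic_power(0.2, 1e-9, 0.9, 200e6)
    print(f"base dynamic power: {base * 1e3:.1f} mW")
    print(f"10% less C: x{smaller_c / base:.3f}")  # linear in C_L
    print(f"VDD 0.9 V:  x{lower_v / base:.3f}")    # quadratic in VDD
    print(f"leakage at 1.2 V, 2 mA: {leakage_power(1.2, 2e-3) * 1e3:.1f} mW")
```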

Pin swapping: connect high-frequency (high-toggle) nets to low-capacitance pins

Gate sizing: CMOS power usage is related to gate size

Buffer removal: remove unnecessary buffers
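Pin swapping works because logically equivalent inputs of a gate often present different capacitances. A minimal sketch follows, assuming a hypothetical 3-input NAND with made-up pin capacitances and toggle rates. It pairs the highest-toggle net with the lowest-capacitance pin. By the rearrangement inequality, pairing the two sorted lists this way minimizes the toggle-weighted capacitance, and a brute-force check confirms it for this example.

```python
from itertools import permutations


def pin_switching_cap(assignment: dict[str, str], toggle_rate: dict[str, float],
                      pin_cap: dict[str, float]) -> float:
    """Toggle-weighted input capacitance: sum of toggle_rate(net) * C(pin).

    For fixed VDD and clock frequency this is proportional to input switching power.
    """
    return sum(toggle_rate[net] * pin_cap[pin] for net, pin in assignment.items())


def swap_pins(toggle_rate: dict[str, float], pin_cap: dict[str, float]) -> dict[str, str]:
    """Pair the highest-toggle net with the lowest-capacitance (logically equivalent) pin."""
    nets = sorted(toggle_rate, key=toggle_rate.get, reverse=True)
    pins = sorted(pin_cap, key=pin_cap.get)
    return dict(zip(nets, pins))


if __name__ == "__main__":
    # Hypothetical NAND3 pin capacitances (fF) and net toggle rates.
    pin_cap = {"A1": 2.0, "A2": 1.6, "A3": 1.2}
    toggle_rate = {"n_slow": 0.05, "n_mid": 0.2, "n_fast": 0.6}
    original = {"n_slow": "A3", "n_mid": "A2", "n_fast": "A1"}
    swapped = swap_pins(toggle_rate, pin_cap)
    # Exhaustive search over pin permutations, to compare against the greedy pairing.
    best = min(pin_switching_cap(dict(zip(toggle_rate, p)), toggle_rate, pin_cap)
               for p in permutations(pin_cap))
    print("original:", original, pin_switching_cap(original, toggle_rate, pin_cap))
    print("swapped: ", swapped, pin_switching_cap(swapped, toggle_rate, pin_cap), "best:", best)
```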

MVT Optimization
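Multi-Vt optimization trades leakage against delay by choosing, per cell, among same-footprint Low, Normal and High Vt variants. The minimal greedy sketch below assumes hypothetical delay and leakage numbers and treats every cell as sitting on its own independent path. Each cell gets the lowest-leakage variant whose extra delay fits in the available slack. A production flow would re-time after each swap, because paths share cells, and that step is omitted here.

```python
from dataclasses import dataclass

# Hypothetical same-footprint variants: Vt flavor -> (delay in ps, leakage in nW).
VARIANTS = {"LVT": (10.0, 60.0), "SVT": (12.0, 20.0), "HVT": (15.0, 6.0)}
# Try the lowest-leakage flavor first.
PREFERENCE = ["HVT", "SVT", "LVT"]


@dataclass
class Cell:
    name: str
    slack_ps: float  # slack of the path through this cell when the cell uses LVT
    vt: str = "LVT"


def swap_to_higher_vt(cells: list[Cell]) -> list[Cell]:
    """Greedy leakage recovery; cells with negative slack keep their current flavor."""
    lvt_delay = VARIANTS["LVT"][0]
    for cell in cells:
        for vt in PREFERENCE:
            if VARIANTS[vt][0] - lvt_delay <= cell.slack_ps:
                cell.vt = vt
                break
    return cells


def total_leakage_nw(cells: list[Cell]) -> float:
    return sum(VARIANTS[c.vt][1] for c in cells)


if __name__ == "__main__":
    cells = [Cell("u1", 0.0), Cell("u2", 3.0), Cell("u3", 50.0)]
    before = total_leakage_nw(cells)
    swap_to_higher_vt(cells)
    print([(c.name, c.vt) for c in cells], before, "->", total_leakage_nw(cells))
```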

[Figure: implementation mixing Low Vt, Normal Vt and High Vt cells]

Clock Gating

• Relies on clock gate control signal in RTL or netlist
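The enable-register RTL on this slide can be implemented either with a free-running clock and a hold mux, or with a gated clock. The minimal cycle-level Python sketch below compares the two. It abstracts away the clock gate's enable latch and glitch behavior. The register output is identical in both versions, but the gated flop sees a clock edge only in enabled cycles, and those edges are where clock power is spent.

```python
def simulate(inputs: list[int], enables: list[int], gated: bool) -> tuple[list[int], int]:
    """Cycle-level model of: always @(posedge clk) if (en) out <= in;

    gated=False: the flop is clocked every cycle and a feedback mux holds the value.
    gated=True:  a clock gate forwards the rising edge only in cycles where en is 1.
    Returns the value of `out` after each cycle and the number of clock edges the flop saw.
    """
    out, edges, trace = 0, 0, []
    for d, en in zip(inputs, enables):
        if gated:
            if en:
                edges += 1
                out = d
        else:
            edges += 1
            out = d if en else out
        trace.append(out)
    return trace, edges


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 1, 0, 0]
    en = [1, 0, 0, 1, 1, 0, 0, 0]
    ungated = simulate(data, en, gated=False)
    gated = simulate(data, en, gated=True)
    assert ungated[0] == gated[0]  # same register behavior
    print(f"clock edges at the flop: {ungated[1]} ungated vs {gated[1]} gated")
```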

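RTL:

    module block_a (input clk, input en, input in, output reg out);  // names are illustrative
      always @(posedge clk)
        if (en)        // en: clock-gate control signal
          out <= in;
    endmodule

Maps to either:
1. User-defined gating module
2. Clock-gating integrated cell from the library
3. Gating function built from standard logic

[Figure: Block A and Block B, each clocked through a gated clk]

Designing with Voltage Islands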

[Figure: chip with a 1.0 V power domain, a 1.2 V power domain and power domain 3 (0.8 V); Low Vt (high speed), Normal Vt and High Vt (low leakage, lower speed) regions and a memory; voltage level shifters and clamps at the domain boundaries]

Power-Off (PSO) Methodologies

[Figure: Fine grain power switches: each standard cell has the power switch built in, between its Virtual VSS and Real VSS, controlled by a SLEEP pin; logical representation: no change except for SLEEP. Coarse grain power switches: standard cells (no SLEEP pin) share a Virtual VSS that a separate switch module connects to Real VSS under SLEEP control; logical representation: logic needs to be power aware!]

Coarse Grain PSO Methodologies

[Figure: two coarse grain arrangements, each with an Always On (default) domain and an On/Off domain on a global Vdd and common GND: a power switching cell cluster feeding a switched VDD to a separated area, and segmented power switches distributed among the standard cells]

Dynamic Voltage Frequency Scaling

• Hardware that scales supply voltage and clock frequency in response to software demands

  – 16 levels of VDD from 1.1 V to 0.6 V (5 to 7 are used in practice)
  – Clock frequency from 200 MHz to 700 MHz in increments of 33 MHz
• Triggered when the load changes (detected by CPU software or by hardware)
  – Load means the number of functions to be executed
  – Heavier load → ramp up the supply voltage; when it is stable, scale up the clock frequency
  – Lighter load → scale down the clock frequency; when the PLL locks onto the new rate, ramp down the supply voltage
• Must keep the clock frequency within the limits required by the supply voltage to avoid clock skew problems and timing violations
  – A worst-case full swing from 0.6 V to 1.1 V and from 200 MHz to 700 MHz could take about 280 microseconds

[Figure: energy characteristics (power and energy vs. operating point); operating points listed in order: 300 MHz at 0.80 V, 433 MHz at 0.875 V, 533 MHz at 0.95 V, 667 MHz at 1.05 V, 800 MHz at 1.15 V, 900 MHz at 1.25 V, 1000 MHz at 1.3 V. Source: magazine article]

Dynamic Voltage Frequency Scaling

Mode     | CORE             | SLEEP            | SLOW
Baseline | 1.08 V, 125 MHz  | 1.08 V, 125 MHz  | 1.08 V, 125 MHz
Slow     | 1.08 V, 125 MHz  | 1.08 V, 125 MHz  | 0.9 V, 66 MHz
Standby  | 0.0 V            | 1.08 V, 125 MHz  | 0.0 V
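The DVFS sequencing rule (raise voltage before frequency, lower frequency before voltage) can be checked mechanically. The minimal sketch below uses the operating points listed in the energy-characteristics figure, assuming the frequencies and voltages pair up in the order given. The step names and the `is_safe` checker are hypothetical. The checker verifies that, after every step, the supply in effect supports the running frequency.

```python
# Operating points from the DVFS energy figure: MHz -> VDD (V); pairing assumed in listed order.
OPP = {300: 0.80, 433: 0.875, 533: 0.95, 667: 1.05, 800: 1.15, 900: 1.25, 1000: 1.3}


def plan_transition(cur_mhz: int, target_mhz: int) -> list[tuple[str, object]]:
    """Order the DVFS steps so the frequency never outruns the supply voltage."""
    if target_mhz > cur_mhz:
        # Heavier load: raise VDD first, wait for it to settle, then raise the clock.
        return [("set_vdd", OPP[target_mhz]), ("wait", "vdd_stable"), ("set_freq", target_mhz)]
    if target_mhz < cur_mhz:
        # Lighter load: lower the clock first, wait for PLL lock, then lower VDD.
        return [("set_freq", target_mhz), ("wait", "pll_locked"), ("set_vdd", OPP[target_mhz])]
    return []


def is_safe(cur_mhz: int, steps: list[tuple[str, object]]) -> bool:
    """True if, after every step, the VDD in effect supports the running frequency."""
    freq, vdd = cur_mhz, OPP[cur_mhz]
    for action, value in steps:
        if action == "set_vdd":
            vdd = value
        elif action == "set_freq":
            freq = value
        if OPP[freq] > vdd:  # clock faster than this supply allows
            return False
    return True


if __name__ == "__main__":
    for cur, tgt in ((300, 800), (800, 300)):
        steps = plan_transition(cur, tgt)
        print(cur, "->", tgt, steps, "safe:", is_safe(cur, steps))
```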

• Multiple modes need to be analyzed/optimized for multiple corners
  – Example: setup analysis for (WC, 1,125C) corner
• Multiple constraints (.sdc)
  – Example: baseline.sdc, ios.sdc, slow.sdc, sleep.sdc
• Libraries
  – stdcell_1.08sl.lib, stdcell_0.9sl.lib, stdcell_1.08fs.lib, stdcell_0.9fs.lib

Adaptive Voltage Scaling
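AVS closes the loop between on-chip performance monitors (PM) and the power management unit. The minimal sketch below is a hypothetical bang-bang controller, not any specific product's algorithm. It raises VDD by one step when the reported timing margin is below a window and lowers it when the margin is above. The margin window, step size, voltage limits and the linear `toy_margin_ps` silicon response are all made-up assumptions.

```python
def avs_step(vdd: float, margin_ps: float, lo_ps: float = 20.0, hi_ps: float = 60.0,
             step_v: float = 0.0125, vmin: float = 0.8, vmax: float = 1.2) -> float:
    """One iteration of a hypothetical closed-loop AVS controller.

    The performance monitors report a timing margin; the power management unit raises
    VDD when the margin is too small and lowers it when the margin is larger than needed.
    """
    if margin_ps < lo_ps:
        return min(vmax, vdd + step_v)
    if margin_ps > hi_ps:
        return max(vmin, vdd - step_v)
    return vdd


def toy_margin_ps(vdd: float) -> float:
    """Made-up silicon response: margin grows 4 ps per 10 mV above 0.9 V."""
    return 400.0 * (vdd - 0.9)


def run_loop(vdd: float, iterations: int = 50) -> float:
    for _ in range(iterations):
        vdd = avs_step(vdd, toy_margin_ps(vdd))
    return vdd


if __name__ == "__main__":
    final = run_loop(1.2)
    print(f"settled at {final:.4f} V, margin {toy_margin_ps(final):.1f} ps")
```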

[Figure: closed-loop control: performance monitors (PM) distributed over the CPU/SoC report performance parameters to the power management unit, which sets the operating voltage]

Substrate Bias Control
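The rise of Vth with reverse substrate bias follows the standard long-channel body-effect equation. The minimal sketch below writes it in terms of the body-to-source voltage, so negative values are reverse bias as on the plot's horizontal axis. Only vth0 = 0.45 V is taken from the plot. The body-effect coefficient gamma, the surface potential 2·phi_F and the 100 mV/decade subthreshold swing used to translate a Vth shift into a leakage reduction are illustrative assumptions, not fitted values.

```python
import math


def vth_nmos(v_bs: float, vth0: float = 0.45, gamma: float = 0.4, two_phi_f: float = 0.7) -> float:
    """Long-channel body-effect model for an n-channel device.

    v_bs is the body (substrate) voltage relative to the source; v_bs < 0 is reverse
    body bias and raises Vth. gamma and two_phi_f are illustrative, not fitted.
    """
    if v_bs >= two_phi_f:
        raise ValueError("forward bias beyond 2*phi_F is outside this model")
    return vth0 + gamma * (math.sqrt(two_phi_f - v_bs) - math.sqrt(two_phi_f))


def leakage_reduction(delta_vth: float, swing_v_per_decade: float = 0.1) -> float:
    """Subthreshold leakage falls ~10x for every `swing_v_per_decade` volts of added Vth."""
    return 10 ** (delta_vth / swing_v_per_decade)


if __name__ == "__main__":
    vth_unbiased = vth_nmos(0.0)
    for v_bs in (0.0, -0.5, -1.0, -1.5, -2.0, -2.5):
        vth = vth_nmos(v_bs)
        print(f"Vbs={v_bs:+.1f} V  Vth={vth:.3f} V  "
              f"leakage reduced ~{leakage_reduction(vth - vth_unbiased):.0f}x")
```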

[Figure: Vth (V) vs. substrate bias Vsb (V) for an n-channel device, with well bias terminals Vbp and Vbn between Vdd and Vss: Vth = 0.45 V at 0 V, 0.54 V at -0.5 V, 0.625 V at -1 V, 0.7 V at -1.5 V, 0.775 V at -2 V, 0.84 V at -2.5 V]

• For an n-channel device, the substrate is normally tied to ground (Vsb = 0)
• Biasing the substrate negative with respect to the source (reverse body bias) causes Vth to increase
• Substrate biasing can be done during packaging (VTCMOS) or during operation (ABB)

Challenges to Implementing LP Techniques

Dynamic Power Optimization (No V)

• Toggle reduction
  – Efficient synthesis
• Capacitance reduction
  – Placement
  – Physical synthesis
• Toggle-based capacitance reduction
  – Pin swapping
  – Area compaction
  – Wire length minimization (high-toggle, high-fanout nets)
• Useful skew

MVT Optimization

• Library characterization
  – Identical footprint
  – Footprint independent
• Implementation
  – Efficiently replacing lower-Vt cells with higher-Vt cells
• Analysis
  – How/when to measure leakage power?
  – Signal integrity analysis
  – Lowest leakage state

Clock Gating

• Identifying gating conditions
• Testability requirements
• Physical effects of clock gating
• Timing effects of clock gating

[Figure: observability logic: clock-gating cells (latch_posedge_precontrol_obs, latch_posedge_precontrol, latch_posedge) with test enable, ck_in, ck_out and obs pins; the observed enables are captured by observability flops on the scan chain (SI/SO, SE)]

Specify max # observable per observability flop (default=36)

Low Power Clock Tree Synthesis – De-Cloning

[Figure: CLK driving clock gates (CGEnable) and their flip-flops, before and after de-cloning; many cloned clock gates on the same enable cause congestion, skew and extra dynamic power]

Voltage Islands
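Level shifter insertion in the netlist starts with finding every net whose driver and load sit in domains at different voltages. The minimal sketch below uses the three domain voltages from the voltage-island figure. The domain names are hypothetical. It flags both crossing directions, because whether a high-to-low crossing actually needs a shifter depends on the library and sign-off rules.

```python
# Domain voltages from the voltage-island figure; the domain names are hypothetical.
DOMAIN_VDD = {"PD_1V0": 1.0, "PD_1V2": 1.2, "PD3_0V8": 0.8}


def level_shifters_needed(nets: list[tuple[str, str, str]]) -> dict[str, str]:
    """Return {net: direction} for every net whose driver and load are at different VDDs.

    nets: (net_name, driver_domain, load_domain) tuples.
    """
    shifters = {}
    for net, drv, load in nets:
        v_drv, v_load = DOMAIN_VDD[drv], DOMAIN_VDD[load]
        if v_drv < v_load:
            shifters[net] = "low_to_high"
        elif v_drv > v_load:
            shifters[net] = "high_to_low"
    return shifters


if __name__ == "__main__":
    nets = [("a", "PD3_0V8", "PD_1V2"), ("b", "PD_1V2", "PD_1V0"), ("c", "PD_1V0", "PD_1V0")]
    print(level_shifters_needed(nets))
```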

• Which logic modules are suitable for voltage scaling?
• What should be the scaled voltage value for these blocks?
• Library characterization
  – Multiple voltages / multiple conditions
  – Additional components: voltage level shifters
• Implementation
  – Physical shape of the voltage islands
  – Level shifter insertion in the netlist
  – Placement of level shifters
  – Routing to a level shifter
  – Power connection of a level shifter
• Analysis
  – Timing analysis of islands
  – Optimization including level shifters
  – Signal integrity analysis
  – IR drop and how it affects timing

Power Switch-Off
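Two of the library parameters for power switch-off, the maximum current through a switch (Id) and the maximum voltage drop, bound how many switches a power-gated domain needs. The minimal, idealized sketch below assumes identical switches sharing the current equally in parallel. The numbers in the usage line are hypothetical. It ignores rush current, rail resistance and placement, which the transient and IR analyses must cover.

```python
import math


def switches_needed(i_peak_ma: float, id_max_ma: float, r_on_ohm: float,
                    v_drop_max_mv: float) -> int:
    """Parallel power switches needed for one power-gated domain (idealized)."""
    # Each switch may carry at most id_max_ma.
    by_current = math.ceil(i_peak_ma / id_max_ma)
    # N identical switches in parallel drop I * R_on / N (mA * ohm = mV).
    by_drop = math.ceil(i_peak_ma * r_on_ohm / v_drop_max_mv)
    return max(by_current, by_drop)


if __name__ == "__main__":
    # Hypothetical domain: 100 mA peak, 0.5 mA per switch, 500 ohm on-resistance, 30 mV budget.
    print(switches_needed(100.0, 0.5, 500.0, 30.0))
```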

• Library characterization
  – Additional parameters: leakage power, max. current through the cells (Id), max. voltage drop
  – Additional cells: switches, isolation cells, state retention cells
• Implementation
  – Logic-level switch insertion/simulation/verification
  – Switch placement schemes: ring/column/distributed
  – Switch enable distribution: high-fanout net
  – Power planning/routing: fine grain, coarse grain
  – SRPG control signals
• Analysis
  – Transient analysis
  – On/Off analysis
  – Functional verification
  – Sneak path analysis

DVFS/AVS

• Library characterization
  – Advanced modeling (ECSM, CCS)
• Implementation
  – Clock synchronization
  – Use of level shifters in the clock design
• Analysis
  – Multi-mode/multi-corner analysis/optimization
  – Functional verification (huge for AVS)

Substrate Bias

• Timing analysis
  – Characterization for VTCMOS
  – Custom analysis for ABB
• Optimization
  – Must be aware of body bias
• Well separation
  – Between the regions that are subjected to control and those that are not
• Planning/routing additional power signals
  – Congestion
  – EM
• Cell design
• Functional verification/validation

Variability and Low Power

[Figure: Test chip timing path slack distribution, -100 ps to +200 ps: % of paths vs. slack (ps) for the notime, timed, MVT and MSV implementations]

Functional Checks Need to be Done @ Transistor Level

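[Figure: a power control FSM drives PwrEn1 and PwrEn2 to switch two VDD supplies (V1, V2); signals A and B cross domains through a level shifter (supplies Vs, Vc) and an isolation cell controlled by ISO, producing Y. Panel captions: "Level Shifting" and "Isolation Cell in Source Domain, which will be shut off"]

State Retention Register Checks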

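[Figure: a power controller drives switch SW (PwrEn1) on VDD and the RET input of an SRPG register (pins D, Q, RTCLK, VDD, VRET; supplies V1, V2); waveforms of RET, RTCLK, D and Q through sleep and wake, with RTCLK and D don't-care and Q shown as X while asleep]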
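The SRPG checks can be expressed over a simple power-domain view of each register. The minimal sketch below is not any tool's rule deck. The class, field and domain names are hypothetical, and the slide's RETL/RETK retention signals are folded into a single "retention asserted" flag. The structural function applies the three pin checks, and the functional function scans a per-cycle trace for RTCLK activity while retention is asserted.

```python
from dataclasses import dataclass


@dataclass
class SrpgInstance:
    """Power-domain view of one state-retention (SRPG) register; names are hypothetical."""
    name: str
    vdd_domain: str   # domain whose switched supply feeds the VDD pin
    d_domain: str     # domain of the logic driving the D pin
    ret_domain: str   # domain of the logic driving the RET pin
    vret_supply: str  # supply net tied to the VRET pin


def structural_errors(inst: SrpgInstance, always_on_domains: set[str],
                      continuous_supplies: set[str]) -> list[str]:
    """RET from an always-on domain, VRET on continuous power, VDD and D in one domain."""
    errors = []
    if inst.ret_domain not in always_on_domains:
        errors.append(f"{inst.name}: RET is not driven from an always-on domain")
    if inst.vret_supply not in continuous_supplies:
        errors.append(f"{inst.name}: VRET is not tied to continuous power")
    if inst.vdd_domain != inst.d_domain:
        errors.append(f"{inst.name}: VDD and D belong to different power domains")
    return errors


def retention_clock_violations(trace: list[tuple[int, int]]) -> list[int]:
    """Functional check: cycle indices where retention is asserted but RTCLK toggles.

    trace holds one (ret_asserted, rtclk_toggled) pair per cycle.
    """
    return [i for i, (ret, rtclk) in enumerate(trace) if ret and rtclk]


if __name__ == "__main__":
    reg = SrpgInstance("r1", "PD_CORE", "PD_CORE", "PD_AON", "VDD_AON")
    print(structural_errors(reg, {"PD_AON"}, {"VDD_AON"}))
    print(retention_clock_violations([(0, 1), (1, 0), (1, 1), (0, 1)]))
```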

• Structural checks
  – Checks that the RET signal comes from an always-on power domain and that VRET is tied to continuous power
  – Checks that the VDD and D pins connect to the same power domain
• Functional checks
  – assert (RETL || RETK) → (RTCLK off)

Sneak Path Detection

A floating node when block X is switched off can cause additional leakage

[Figure: switched-off block X drives node A of a mux stage controlled by EN/ENB, between VDD and VSS, with output Y]

Common in mux logic

Guidelines for LP Technique Selection

How To Choose Between Various LP Techniques

• Understand the application/technology need for power reduction
• Choose the techniques based on the power reduction requirement, not vice versa
• Understand the trade-offs, especially the methodology implications

High-level Guidelines for Power Reduction in Design

• Power is a performance parameter, similar to area and timing
  – Optimize and analyze timing, power and area concurrently
• Choose the LP techniques early in the implementation
  – Helps to get the maximum power reduction
  – Architecture/process selection must be driven by the power need
• Voltage scaling techniques (e.g. MSV, DVFS) give a quadratic reduction in dynamic power
• When not in use, shut it off!
• Verify, verify, verify!

Steps for Successful LP Design Tapeout!

• LP implementation is complex and requires more time (2X) than normal. Plan ahead!
• Library characterization can be time consuming, as new cells need to be designed and existing cells characterized under new conditions.
• Choose a comprehensive implementation tool that addresses not only a range of techniques but also the trade-offs between power, area and timing.
• LP techniques force you to change the existing methodology by adding new tools and steps. To be successful, consider partnering with an EDA vendor (Cadence!).
• Verification is key to successful implementation. Make sure the verification tool understands low power techniques.

Backup