EDA Challenges for Low Power Design
Anand Iyer, Cadence Design Systems

Agenda
• Introduction
• LP techniques in detail
• Challenges to low power techniques
• Guidelines for choosing various techniques

Why is Power an Issue?
[Figure: leakage and active power (mW/MHz) across process technology nodes from 180 nm to 65 nm, annotated "Performance = Power". Source: Intel, 2004]
• Power-hungry process
• Complex systems
• Sluggish battery life improvement, 2000–2004 (Source: EETimes, 2004)

Approaches to Power Management
Design and system level optimization
• System architecture (multi-core)
• Software/hardware power management system (ARM IEM)
• Voltage scaling / frequency scaling
• Multiple voltage islands
• Clock gating, logic structuring
• Multi-Vth cell selection to reduce leakage
Implementation
• Support for multi-voltage islands (aka "multi-Vdd" aka "MSV") implementation
• Signoff-accurate analysis
Process level optimization
• SOI
• High-K, gate stack, power gating, etc.
• LLD
We will discuss the clock gating, multi-Vth and multi-voltage island techniques in detail.

Controlling Power in Implementation
Dynamic power, $P_{dyn} \approx k \cdot C_L \cdot V_{DD}^2 \cdot f_{CLK}$:
• Clock gating (including de-cloning)
• Area optimization
• Static voltage scaling (MSV)*
• Dynamic voltage frequency scaling (DVFS)*
• Adaptive voltage scaling (AVS)*
Leakage power, $P_{leak} \approx V_{DD} \cdot I_{leakage}$:
• Multi-Vt cell optimization
• Substrate biasing (VTCMOS)
• Power shut-off (PSO), aka MTCMOS, including state retention
  – Fine grain control
  – Coarse grain control
* Techniques that affect both dynamic and leakage power

Techniques and Trade-offs
Technique | Leakage power reduction | Dynamic power reduction | Timing penalty | Area penalty | Methodology impact
Dynamic power optimization | 10% | 10% | 0% | -10% | None
Multi-Vt optimization | 6X | 0% | 0% | 0% | Low
Clock gating | 0% | 20% | 0% | <2% | Low
Voltage islands | 2X | 40-50% | 0% | <10% | Medium
Power shut-off (PSO) | 10-50X | 0% | 4-8% | 5-15% | Medium-high
Dynamic and adaptive voltage frequency scaling (DVFS and AVS) | 2-3X | 40-70% | 0% | <10% | High
Substrate biasing | 10X | - | 10% | <10% | High
Source: Customer
interviews, conference papers (ISSCC), magazine articles

LP Techniques in Detail

Dynamic Power Optimization (No V)
• Pin swapping: put high-frequency (high-toggle) signals on low-capacitance pins
• Gate sizing: CMOS power usage is related to cell size
• Buffer removal: remove unnecessary buffers

MVT Optimization
[Figure: an implementation mixing low-Vt, normal-Vt and high-Vt cells]

Clock Gating
• Relies on a clock-gate control signal in the RTL or netlist, e.g. the enable en in this RTL:

    always @(posedge clk)
      if (en)
        out <= in;

• Maps to one of:
  1. A user-defined gating module
  2. A clock-gating integrated cell from the library
  3. A gating function built from standard logic

Designing with Voltage Islands
[Figure: a die with a 1.0 V power domain, a 1.2 V power domain and power domain 3 (0.8 V), connected through voltage level shifters and clamps; regions are labelled low Vt (high speed), high Vt (low leakage, lower speed), normal Vt, and memory]

Power Switch-Off (PSO) Methodologies
Fine grain power switches
• A switch between real VSS and virtual VSS is built into each standard cell and controlled by a SLEEP pin
• Logical representation: no change except for SLEEP
Coarse grain power switches
• A separate switch module, controlled by SLEEP, connects the virtual VSS of a group of standard cells to real VSS; the cells themselves have no SLEEP pin
• Logical representation: logic needs to be power aware!
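Mapping 2 on the clock-gating slide uses an integrated clock-gating cell. The slides do not show its internals. A common structure for gating positive-edge flops is a level-sensitive latch that is transparent while the clock is low, followed by an AND gate, and the sketch below assumes that structure. It is a small behavioural model in Python, sampled at four points per clock period, that compares this structure with a bare AND gate when the enable changes during the clock's high phase. The function names `latch_clock_gate`, `and_clock_gate` and `pulse_widths` are hypothetical helpers, not a library API.

```python
from typing import List


def latch_clock_gate(clk: List[int], en: List[int]) -> List[int]:
    """Latch-based clock gate: gated_clk = clk AND latched(en).

    The enable latch is transparent while clk is low and holds its value
    while clk is high, so an enable change during the high phase can
    neither cut a clock pulse short nor create a runt pulse.
    """
    q = 0  # latch output, assumed to start at 0
    gated = []
    for c, e in zip(clk, en):
        if c == 0:
            q = e  # transparent phase: the latch follows the enable
        gated.append(c & q)  # AND gate after the latch
    return gated


def and_clock_gate(clk: List[int], en: List[int]) -> List[int]:
    """Naive gating with a bare AND gate (no latch)."""
    return [c & e for c, e in zip(clk, en)]


def pulse_widths(sig: List[int]) -> List[int]:
    """Width, in samples, of each high pulse in a sampled waveform."""
    widths, run = [], 0
    for s in sig:
        if s:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


# Four samples per clock period; the enable rises and falls while clk is high.
clk = [0, 0, 1, 1] * 3
en = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0]

print(pulse_widths(and_clock_gate(clk, en)))    # [1, 2, 1]: runt and truncated pulses
print(pulse_widths(latch_clock_gate(clk, en)))  # [2, 2]: only full-width pulses
```

The bare AND gate passes a one-sample runt pulse and a truncated pulse. The latch-based version only passes full clock pulses, and only in cycles where en was already high before the rising edge. For a synchronous enable, that matches the `if (en) out <= in;` behaviour of the RTL. Which of the three mappings a given tool chooses is not specified here.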
Coarse Grain PSO Methodologies
[Figure: coarse-grain arrangements: an always-on default domain alongside on/off domains; global VDD vs. switched VDD; separated-area vs. common-GND layouts; power switching cells, switch clusters and segmented switches]

Dynamic Voltage Frequency Scaling
• Hardware that scales supply voltage and clock frequency in response to software demands
  – 16 levels of VDD from 1.1 V down to 0.6 V (5 to 7 are used in practice)
  – Clock frequency from 200 MHz to 700 MHz in increments of 33 MHz
• Triggered when the load changes, as detected by CPU software or hardware (load means the number of functions to be executed)
  – Heavier load → ramp up the supply voltage; once it is stable, scale up the clock frequency
  – Lighter load → scale down the clock frequency; once the PLL locks onto the new rate, ramp down the supply voltage
• The clock frequency must be kept within the limits required by the supply voltage to avoid clock skew problems and timing violations
  – A worst-case full swing from 0.6 V to 1.1 V and from 200 MHz to 700 MHz could take about 280 microseconds
[Figure: energy/power characteristics of a processor. Source: magazine article]
Operating points: 300 MHz @ 0.80 V, 433 MHz @ 0.875 V, 533 MHz @ 0.95 V, 667 MHz @ 1.05 V, 800 MHz @ 1.15 V, 900 MHz @ 1.25 V, 1000 MHz @ 1.3 V

Dynamic Voltage Frequency Scaling: Modes
Mode | CORE | SLEEP | SLOW
Baseline | 1.08 V, 125 MHz | 1.08 V, 125 MHz | 1.08 V, 125 MHz
Slow | 1.08 V, 125 MHz | 1.08 V, 125 MHz | 0.9 V, 66 MHz
Standby | 0.0 V | 1.08 V, 125 MHz | 0.0 V
• Multiple constraints (.sdc), e.g. baseline.sdc, ios.sdc, slow.sdc, sleep.sdc
• Libraries, e.g. stdcell_1.08sl.lib, stdcell_0.9sl.lib, stdcell_1.08fs.lib, stdcell_0.9fs.lib
• Multiple modes need to be analyzed/optimized for multiple corners
  – e.g. setup analysis for the (WC, 1, 125C) corner

Adaptive Voltage Scaling
[Figure: closed-loop control. Performance monitors (PM) in the CPU/SoC feed performance parameters to a power management unit, which sets the operating voltage]

Substrate Bias Control
[Figure: Vth vs. Vsb for a device with Vdd, Vss and body-bias terminals Vbp, Vbn]
Vsb (V) | Vth (V)
-2.5 | 0.84
-2.0 | 0.775
-1.5 | 0.70
-1.0 | 0.625
-0.5 | 0.54
0 | 0.45
• For an n-channel device, the substrate is normally tied to ground (Vsb = 0)
• A negative bias on Vsb causes Vth to increase
• Substrate biasing can be applied during packaging (VTCMOS) or during operation (ABB)

Challenges to Implementing LP Techniques

Dynamic Power Optimization (No V)
• Toggle reduction
  – Efficient synthesis
• Capacitance reduction
  – Placement
  – Physical synthesis
• Toggle-based capacitance reduction
  – Pin swapping
  – Area compaction
  – Wire length minimization (high-toggle, high-fanout nets)
• Useful skew

MVT Optimization
• Library characterization
  – Identical footprint
  – Footprint independent
• Implementation
  – Efficiently replacing lower-Vt cells with higher-Vt cells
• Analysis
  – How/when to measure leakage power?
  – Signal integrity analysis
  – Lowest-leakage state

Clock Gating
• Identifying gating conditions
• Testability requirements
• Physical effects of clock gating
• Timing effects of clock gating
[Figure: clock-gating cell variants latch_posedge, latch_posedge_precontrol and latch_posedge_precontrol_obs, with test enable, ck_in/ck_out, and an obs output feeding observability logic on a scan chain (SI, SO, SE)]
• Specify the maximum number of flops observable per observability flop (default = 36)

Low Power Clock Tree Synthesis: De-Cloning
[Figure: clock tree from CLK through clock gates (CGEnable) to flip-flops, before and after de-cloning; the cloned version is annotated "Congestion! Skew! Dynamic power"]

Voltage Islands
• Which logic modules are suitable for voltage scaling?
• What should the scaled voltage be for these blocks?
• Library characterization
  – Multiple voltages/multiple conditions
  – Additional components: voltage level shifters
• Implementation
  – Physical shape of the voltage islands
  – Level shifter insertion in the netlist
  – Placement of level shifters
  – Routing to a level shifter
  – Power connection of a level shifter
• Analysis
  – Timing analysis of islands
  – Optimization including level shifters
  – Signal integrity analysis
  – IR drop and how it affects timing

Power Switch-Off
• Library characterization
  – Additional parameters: leakage power, max. current through the cells (Id), max. voltage drop
  – Additional cells: switches, isolation cells, state retention cells
• Implementation
  – Logic-level switch insertion/simulation/verification
  – Switch placement schemes: ring/column/distributed
  – Switch enable distribution: high-fanout net
  – Power planning/routing: fine grain, coarse grain
  – SRPG control signals
• Analysis
  – Transient analysis
  – On/off analysis
  – Functional verification
  – Sneak path analysis

DVFS/AVS
• Library characterization
  – Advanced modeling (ECSM, CCS)
• Implementation
  – Clock synchronization
  – Use of level shifters in the clock design
• Analysis
  – Multi-mode/multi-corner analysis/optimization
  – Functional verification (huge for AVS)

Substrate Bias
• Timing analysis
  – Characterization for VTCMOS
  – Custom analysis for ABB
• Optimization
  – Must be aware of body bias
• Well separation
  – Between the regions that are subject to control and those that are not
• Planning/routing additional power signals
  – Congestion
  – EM
  – Cell design
  – Functional verification/validation

Variability and Low Power
[Figure: test chip timing path slack distribution from -100 ps to +200 ps (% of paths vs. slack in ps) for MSV, MVT, "timed" and "notime" runs]

Functional Checks Need to be Done @ Transistor Level
[Figure: two power domains (V1, V2) with power enables PwrEn1/PwrEn2 from a power control FSM, isolation (ISO) cells, and level shifting between the domains]
[Figure label: isolation cell in the source domain, which will be shut off]

State Retention Register Checks
[Figure: SRPG register with VDD and VRET supplies, RET and RTCLK controls from a power controller, a switch (SW) enabled by PwrEn1, and a sleep/wake waveform of RTCLK, D, Q and RET]
• Structural checks
  – RET comes from an always-on power domain; VRET is tied to continuous power
  – VDD and D pins connect to the same power domain
• Functional checks
  – assert (RETL || RETK) → (RTCLK off)

Sneak Path Detection
• A floating node when block X is switched off can cause additional leakage
• Common in mux logic
[Figure: logic driven from switched-off block X through enable-controlled (EN/ENB) devices between VDD and VSS to output Y]

Guidelines for LP Technique Selection

How to Choose Between Various LP Techniques
• Understand the application/technology need for power reduction
• Choose the techniques based on the power reduction requirement, not vice versa
• Understand the trade-offs, especially the methodology implications

High-Level Guidelines for Power Reduction in Design
• Power is a performance parameter similar to area and timing
  – Optimize and analyze timing, power and area concurrently
• Choose the LP techniques early in the implementation
  – Helps to get the maximum power reduction
  – Architecture/process selection must be driven by power needs
• Voltage scaling techniques (e.g. MSV, DVFS) lead to a quadratic reduction in power
• When not in use, shut it off!
• Verify, verify, verify!

Steps for Successful LP Design
[Figure: steps leading to tapeout]
• LP implementation is complex and requires more time (2X) than normal
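The guideline that voltage scaling gives a quadratic reduction in power follows from the dynamic-power expression $P_{dyn} \approx k \cdot C_L \cdot V_{DD}^2 \cdot f_{CLK}$ in "Controlling Power in Implementation". As a rough sketch, the Python below evaluates that expression at the operating points from the DVFS slide. It normalises each point to the 1000 MHz / 1.3 V point so that the unknown constant $k \cdot C_L$ cancels. The sketch assumes $k \cdot C_L$ is the same at every operating point and ignores leakage. The names `dynamic_power` and `relative_to_max` are hypothetical.

```python
from typing import Dict, List, Tuple

# Operating points (frequency in MHz, supply in V) from the DVFS slide.
OPERATING_POINTS: List[Tuple[int, float]] = [
    (300, 0.80), (433, 0.875), (533, 0.95), (667, 1.05),
    (800, 1.15), (900, 1.25), (1000, 1.30),
]


def dynamic_power(k_cl: float, vdd: float, f_clk: float) -> float:
    """P_dyn ~ k * C_L * V_DD^2 * f_CLK; k_cl lumps activity factor and capacitance."""
    return k_cl * vdd ** 2 * f_clk


def relative_to_max(points: List[Tuple[int, float]]) -> Dict[int, Tuple[float, float]]:
    """Relative power and energy per cycle at each point, versus the fastest point.

    Assumes the same k * C_L at every point, so it cancels in the ratios.
    """
    f_max, v_max = max(points)  # highest-frequency operating point
    p_max = dynamic_power(1.0, v_max, f_max)
    table = {}
    for f, v in points:
        power = dynamic_power(1.0, v, f) / p_max
        energy_per_cycle = (v / v_max) ** 2  # power divided by frequency, normalised
        table[f] = (power, energy_per_cycle)
    return table


for f, (p, e) in relative_to_max(OPERATING_POINTS).items():
    print(f"{f:5d} MHz: power {p:.2f}x, energy/cycle {e:.2f}x")
```

At 300 MHz and 0.80 V, the sketch gives roughly 0.11x the power and 0.38x the energy per cycle of the 1000 MHz / 1.3 V point. Lowering the frequency alone reduces power but not energy per operation. The accompanying voltage reduction lowers both, which is where the quadratic term pays off.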