Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
PRACTICAL POWER GATING AND DYNAMIC VOLTAGE/FREQUENCY SCALING
Stephen Kosonocky, AMD Fellow

OUTLINE
• Motivation for DVFS and Power Gating
• Dynamic Voltage/Frequency Scaling
• Power Gating
• Conclusion

MOTIVATION

SERVER COST TRENDS
• Server compute capacity/$ is increasing steadily
– Driving up total power/$ spent on servers
– Also driving up electrical power spending
• Green initiative and focus on energy independence create additional pressure to lower energy costs
• Increased energy efficiency directly reduces costs
• Source Data: Uptime Institute, 2009

MOBILE AND DESKTOP
Now: Parallel/Data-Dense
– 16:9 @ 7 megapixels
– HD video flipcams, phones, webcams (1GB)
– 3D Internet apps and HD video online, social networking w/HD files
– 3D Blu-ray HD
– Multi-touch, facial/gesture/voice recognition + mouse & keyboard
– All day computing (8+ Hours)
– Immersive and interactive performance
• High degree of integration of special functions
• Diverse workloads
– Data serial, data parallel
– Streaming
– Compute intensive in bursts
– Long periods of inactivity
• Small form factor
• Long battery life
• Instant access of the device
• Autonomous power management
Workloads - Samuel Naffziger, VLSI Symposium 2011, Kyoto

POWER/PERFORMANCE TRADE-OFF
[Figure: Processor Power-Frequency/Voltage. Curves: Power (Watts/Core), Fmax/Voltage (Frequency in GHz), and Perf/Watt, plotted against Voltage from 0.725 to 1.225. Annotation: Performance/Watt maximized at the lowest voltage operating point.]
• Performance/Watt maximized at VMIN

HIGH PERFORMANCE CORE POWER
• Depending on process node, leakage is ~40% of total power

Two major knobs have emerged for controlling power
1. Dynamic Voltage and Frequency Scaling
– Optimize performance for the application while it's running
2. Power Gating
– Gate power during idle periods
Each presents unique challenges for implementation and optimization

DYNAMIC VOLTAGE/FREQUENCY SCALING

LLANO ACCELERATED PROCESSING UNIT (APU)
• Integration Provides Improvement
• Eliminate power and latency of extra chip crossing
• 3X bandwidth between GPU and Memory
• Same sized GPU is substantially more effective
• Power efficient, advanced technology for both CPU and GPU
• Thermal management optimization between CPU and GPU
~ Power Breakdown
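As a rough illustration of the POWER/PERFORMANCE TRADE-OFF slide above, the following Python sketch evaluates a textbook first-order model, dynamic power proportional to C·V²·f plus a voltage-dependent leakage term, over roughly the same voltage range. The linear Fmax/voltage relation, the form of the leakage term, and every constant (`v_th`, `k`, `c_eff`, `i_leak_ref`, `v_ref`) are assumptions chosen only for illustration; none of them come from the slides or from measured silicon.

```python
# Illustrative model only: every constant below is an assumed placeholder,
# not a value taken from the slides or from any real processor.

def fmax_ghz(v, v_th=0.35, k=4.0):
    """Assumed linear Fmax/voltage relation: f = k * (V - v_th), in GHz."""
    return k * (v - v_th)

def core_power_w(v, c_eff=6.0, i_leak_ref=4.0, v_ref=1.0):
    """Dynamic power (C * V^2 * f) plus a leakage term that grows with voltage."""
    dynamic = c_eff * v**2 * fmax_ghz(v)          # W, with c_eff in nF and f in GHz
    leakage = v * i_leak_ref * (v / v_ref) ** 2   # W, assumed cubic growth with V
    return dynamic + leakage

def perf_per_watt(v):
    """Use frequency as a stand-in for performance (GHz per watt)."""
    return fmax_ghz(v) / core_power_w(v)

# Sweep a voltage range similar to the one on the slide's axis.
voltages = [0.725 + 0.05 * i for i in range(11)]
table = [(v, fmax_ghz(v), core_power_w(v), perf_per_watt(v)) for v in voltages]

for v, f, p, ppw in table:
    print(f"{v:.3f} V  {f:4.2f} GHz  {p:5.1f} W  {ppw:.3f} GHz/W")
```

Under these assumed parameters, frequency per watt falls steadily as voltage rises, which is consistent with the slide's point that performance/watt peaks at the lowest operating voltage. Real curves depend on the process, the design, and the workload.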
• Key features for Mobile Mark 07
– Lower Idle, Active power
– DVFS / Clock-gating / Power Gating
– Robust and Flexible Power Management Control
[Figure: power breakdown chart; segments: IO, Analog PHY, CPU, NB, DDR PHY, GPU]

APU POWER MANAGEMENT
- Sebastien Nussbaum, IEEE Vail Computer Elements, June 22, 2009

DYNAMIC VOLTAGE SCALING
• Dynamic frequency and voltage scaling (DVFS)
• Match the frequency of operation with the workload
• Lower voltage to sustain required frequency
Examples:
• P-states for active control of voltage/frequency
• C-states for degrees of idle control
[Figure: voltage level per state. Voltage axis labels: V(P-states), V(Altvid), V(deeper Altvid), 0V.]
– P-states: Core executing at some frequency
– C1/C2: Core halted and frequency ramped down
– C3/C4: Core halted, clocks ramped to even lower frequency, maintain caches, maintain core
– C5: Core halted, clocks ramped to even lower frequency, flush caches, maintain core
– C6: Core off, flush caches, core state saved to DRAM

DESIGN FOR DVFS
• Optimization of timing across the entire voltage range is necessary for efficient DVFS
• Required two separate timing corners with equally aggressive cycle time goals
– 0.8V & 1.3V
• Both gate-dominated timing paths and paths with high RC content have been tuned to provide better scaling than the prior design
• Enables the core to thoroughly exploit boost capability

APU DVFS
- Sebastien Nussbaum, IEEE Vail Computer Elements, June 22, 2009

APU THERMAL PROFILE
[Figure: thermal map, color scale 85.0 to 110.0. CPU-centric workload]
- Samuel Naffziger, VLSI Symposium 2011, Kyoto

APU THERMAL PROFILE
[Figure: thermal map, color scale 85.0 to 110.0. GPU-centric data parallel workload]
- Samuel Naffziger, VLSI Symposium 2011, Kyoto

APU THERMAL PROFILE
[Figure: thermal map, color scale 85.0 to 110.0. Balanced workload]
- Samuel Naffziger, VLSI Symposium 2011, Kyoto

WHAT HAPPENS WHEN WE ADD MORE CORES?
• Already at 6 supplies, 2 of them high current
[Figure: block diagram. Labels: CPU 0, CPU 1, CPU 2, CPU 3 (each with a PLL), Northbridge (PLL), Graphics, Misc, PLLs, DIMMs, VRM, SVI. Supplies: VDD, VDDIO, VLDT, VTT, VDDA, VDDNB.]

PACKAGE LIMITS
• Packages add constraints to increasing number of discrete supplies
• Typically only use 4 thick layers for high current power distribution
• 3-4 thin build-up layers on each side
– Used for secondary supply and signal routing to pins
[Figure: Typical low cost package structure, with the chip labeled]
• Difficult to add many more supply rails

TYPICAL PACKAGE LAYER ROUTING LAYERS
[Figure: package layer plots. M1 – PWR (top metal layer), with regions VDDR, VDDA, VDDNB, VDD, VDDIO. M2 – routing layer 1. Annotations: Package capacitors; Routing congestion; Space limited for capacitors.]

CORE VOLTAGE POWER DELIVERY IN PACKAGE
[Figure: surface plots of VDD Max Package Droop and VSS Max Package Droop. Annotation: 2.4x higher.]
• Package routing constraints force compromises for the VDD plane
– More difficult with increasing VDD planes
• More resources dedicated to VSS
– Shows less droop

WHAT DOES THE ACTUAL VDD LOOK LIKE?
[Figure: 25ns trace of DGEMM benchmark. VDD and Normalized Current versus Time. Annotation: ~50mV.]
• X86 core supply simulation w/ package board model
• Increased local decoupling cap could help

PACKAGE CAPS FOR SUPPLY DECOUPLING
[Figure: package capacitor locations. Annotations: Most effective; Marginally effective.]
• Hard to find places for additional components

LARGE ON-DIE DECOUPLING CAPACITOR FOR REDUCTION OF SUPPLY NOISE
• Trench Capacitor
• Shift resonant frequency from 80MHz -> 7MHz
- D. Wendel, et al.,
“The Implementation of POWER7: A Highly Parallel and Scalable Multi-Core High-End Server Processor”, ISSCC 2010
• Adds significant process cost
• 100fF/um2 ~ 25x thick-ox

METHODS FOR INTEGRATED VOLTAGE REGULATION
• Switched capacitor regulator
• Buck converter
• Linear regulator
• Switched voltages

INTEGRATED SWITCHED CAPACITOR REGULATOR
• Efficiency dependent on load current
– Most efficient at 2:1 voltage conversion ratio
– Limits number of switches in the path
– ~90% for 2:1, ~70% for other values
• Requires large capacitors
– Can use trench capacitor for better area efficiency
• Large voltage and current ripple
– Can mitigate with large decap and/or interleaved converters
[Figure: 2:1 Conversion]
- Leland Chang, Robert K. Montoye, Brian L. Ji, Alan J. Weger, Kevin G. Stawiasz, and Robert H. Dennard, “A Fully-Integrated Switched-Capacitor 2:1 Voltage Converter with Regulation Capability and 90% Efficiency at 2.3A/mm2”, VLSI Symposium, 2010

INTEGRATED BUCK CONVERTER
• Integrated inductor options
• ~ 0.33nH possible