Sandy Bridge Power Management Overview
Total Page:16
File Type:pdf, Size:1020Kb
Power management architecture of the 2nd generation Intel® Core™ microarchitecture, formerly codenamed Sandy Bridge Efi Rotem - Sandy Bridge power architect Alon Naveh, Doron Rajwan, Avinash Ananthakrishnan, Eli Weissmann Hot Chips Aug-2011 Agenda Power management overview Intel® Turbo Boost Technology 2.0 Thermal management Energy efficiency Average power management Platform view Summary High CPU and PG performance Power and energy efficiency Sandy Bridge - Hot Chips 2011 2 Power management overview Sandy Bridge - Hot Chips 2011 3 Sandy Bridge power mgmt ID card VR Sandy Bridge is: DMI PCI Express* 1-4 CPU cores + PG System Agent Integrated System Agent (SA) SVID Sliced LLC shared by all cores/PG IMC PECI PCU Ring interconnect + power management link Display EC Package Control Unit (PCU) : Core LLC On chip logic and embedded controller running power management firmware Core LLC Communicates internally with cores, ring and SA Monitors physical conditions Core LLC − Voltage, temperature, power consumption Core LLC Controls power states − CPU and PG voltage and frequency Graphics − Controls voltage regulators DDR and system External power management interface Accepts external inputs − System power management requests and limits − Power and temperature reading MSR, MMIO and PECI system bus Sandy Bridge - Hot Chips 2011 4 Voltage and frequency domains VCC Periphery Two Independent Variable Power Planes: VCC SA CPU cores, ring and LLC Embedded power gates - Each core can be turned off individually VCC Core Cache power gating - Turn off portions or all (Gated) cache at deeper sleep states Graphics processor VCC Core Can be varied or turned off when not active VCC Core (ungated) Shared frequency for all IA32 cores and ring (Gated) Independent frequency for PG Fixed Programmable power plane for System Agent Optimize SA power consumption VCC Core System On Chip functionality and PCU logic (Gated) Periphery: DDR, PCIe, Display Embedded power gates VCC Core (Gated) VCC Graphics VCC Periphery Sandy Bridge - Hot Chips 2011 5 Power performance fundamentals Maximize user experience under multiple constraints User Experience (May have different preferences): − Throughput performance − Responsiveness - burst performance − CPU / PG performance − Battery life / Energy bills − Ergonomics (acoustic noise, heat) Optimizing around Constraints to meet user preferences − Silicon capabilities − System Thermo-Mechanical capabilities − Power delivery capabilities − S/W and Operating system explicit control − Workload and usage Rich set of control knobs for the user and system designer to tailor power - performance preferences Sandy Bridge - Hot Chips 2011 6 Power management features topology Operating system, PG driver, BIOS, Embedded Controller and user preferences S/W Platform Control Power/performance optimization algorithms Opt. Milliseconds to seconds control algorithms Power Control Perf PCU “kernel” – mission critical power management events C-state control, P-states transitions and latency sensitive actions Control events Real Time Real Time Thermal sensing, Maximum current control, physical layer communication Platform control: DDR thermal, Voltage Regulator optimization, hot sensors etc. Layer Physical Sandy Bridge - Hot Chips 2011 7 Intel® Turbo Boost Technology 2.0 Sandy Bridge - Hot Chips 2011 8 Power metering Power management is based on power metering Sandy Bridge implements a digital power meter 3rd generation of power metering in Intel® products Active power - Event counters track main building blocks activities − 100 Micro arch. event counters - apply active energy cost to each event − CPU, PG, Ring, Cache, and I/O Static power – Leakage and idle as a function of voltage and temperature Used for power management algorithms Architecturally exposed to software and system For the use of S/W or system embedded controller 45 40 35 30 25 20 CPU - predicted PG - predicted 15 Average accuracy – 0.9% Package - predicted CPU - actual 10 STDEV 0.6% PG - actual 5 Package - actual 0 0 50 100 150 200 250 Sandy Bridge - Hot Chips 2011 9 What is CPU Turbo P-state: a voltage/frequency pair (ACPI terminology) P0 1C P1 is guaranteed frequency “Turbo” CPU and PG simultaneous heavy load at worst case conditions H/W Control Actual power has high dynamic range P0 is max possible frequency P1 Pn is the energy efficient state OS OS control Pn-P1 range Visible P1-P0 has significant frequency range (GHz) frequency States P1 to P0 range is fully H/W controlled Pn User preferences and policies Single thread or lightly loaded applications Thermal Control GFX <>CPU balancing LFM Sandy Bridge - Hot Chips 2011 10 What is Turbo Turbo enabled product specifications CPU PG TDP total package P1 P0 P1 P0 sustained power Source: http://www.intel.com/Assets/PDF/datasheet/324692.pdf Sandy Bridge - Hot Chips 2011 11 New concept: thermal capacitance Classic Model New Model Steady-State Thermal Resistance Steady-State Thermal Resistance Design guide for steady state PG and CPU sharing AND Dynamic Thermal Capacitance CPU PG Example: Cp_Al ~ 0.9 J/(gr*’K) 100gr heat sink heated by 35W CPU 100Sec Classic model More realistic respond response to Temperature Temperature power changes Time Time Sandy Bridge - Hot Chips 2011 12 New concept: thermal capacitance Classic Model New Model Steady-State Thermal Resistance Steady-State Thermal Resistance Design guide for steady state PG and CPU sharing AND Dynamic Thermal Capacitance • Managing of energy budget rolling average • Heat sink capacity time constant – few sec. CPU PG • Short time constants for power delivery En+1 = αEn + (1−α)*(TDPn − Pn )∆tn • Package energy sharing between CPU and PG • Multiple sources of controls • Software or external embedded controller Classic model More realistic respond response to Temperature Temperature power changes Time Time PCU manages energy budgets over multiple time constants Accumulated energy during idle period used when needed Sandy Bridge - Hot Chips 2011 13 Intel® Turbo Boost Technology 2.0 - Dynamic Power After idle periods, the system accumulates “energy budget” and can accommodate high C0/P0 (Turbo) power/performance for up to a minute Max power 1.2-1.3X TDP In Steady State conditions the power stabilizes on TDP, possibly at higher then nominal frequency 30-60 Sec “TDP” Use accumulated energy budget to enhance user 5 Sec / 30-60 Sec Sleep or exponential Average experience Low power Time Buildup thermal budget during idle periods Sandy Bridge - Hot Chips 2011 14 Usage Scenario: Responsive Behavior • Interactive work benefits from Intel® Turbo Boost 2.0 • Idle periods intermixed with user actions Photo editing • Open image • Process • View • Balance colors • Red eye removal • Contrast • Filters • Etc. User interactive actions I mage open and process IntelIntel Restricted Top Secret Secret – RSNDA 35 Sandy Bridge - Hot Chips 2011 15 Turbo controls in action Voltage Regulator reported capability Actual instantaneous CURRENT_CONFIG_CONTROL MSR power Allow programmability P-state Power TURBO_POWER_LIMIT Control MSR Hard Limit Enables and locks Max Icc Package Power limit 2 – Instantaneous Power limit 2 Package Power limit 1 Time interval Power delivery Package Power limit 1 clamp bit Package Power limit 1 - power C0 P0 •Also: • Individual power controls available Power limit 1 • Explicit frequency control Config TDP User / OEM / OS preference PL1 time exp. average Time Sandy Bridge - Hot Chips 2011 16 Intel® Turbo Boost Technology 2.0 - Package Power specification is defined for the entire package Monolithic die – power budget shared by CPU and PG Sum of component power at or below specifications Core Heavy CPU Power [W] workload Total package power CPU+ PG= const Heavy Graphics workload Applications PG Power [W] Sandy Bridge - Hot Chips 2011 17 Intel® Turbo Boost Technology 2.0 - Package Power specification is defined for the entire package Monolithic die – power budget shared by CPU and PG Sum of component power at or below specifications Energy budget spit dynamically according to user preference Control algorithm translates energy headroom to turbo bins CPU preference Core Power budget is Power [W] assigned to the CPU Balanced Power budget split between CPU and PG IA preference MSR GT preference MSR PG preference Power budget is assigned to the CPU Application PG Power [W] Sandy Bridge - Hot Chips 2011 18 Turbo in action – measurements • Four core 45W 2.2 up to 3.5 GHz Sandy Bridge example • Running CPU and PG simultaneous workloads • Control power management knobs on the fly using a control utility After idle period turbo to 56W for ~20Sec - stabilize at TDP = 45W Frequency varies Sandy Bridge - Hot Chips 2011 19 Energy Efficient P-State - optimizing MIPS / Watt Frequency voltage scaling up is not energy efficient Cubic increase in power for linear increase in frequency and performance Used to get raw performance at the cost of increased energy consumption Not all workloads gain performance from frequency For example – many memory accesses poor performance scalability “Wait slowly” lower frequency at memory bound intervals − Save energy to be used for core bounded phases − Or just save energy with minimal performance impact Continuously generate “scalability” metric Drop frequency if scalability is low Workload Scalability over time 1 User preference control 0.98 Max performance – ignore energy cost 0.96 0.94 Balanced – lower frequency at 0.92 0.9 memory-bound intervals Performance Scaling 0.88 0.86 Max energy