White Paper | ADVANCED POWER MANAGEMENT HELPS BRING IMPROVED PERFORMANCE TO HIGHLY INTEGRATED X86 PROCESSORS

TABLE OF CONTENTS
THE IMPORTANCE OF POWER MANAGEMENT
THE X86 EXAMPLE
ESTABLISH A REALISTIC WORST-CASE FOR POWER
POWER LIMITS CAN TRANSLATE TO PERFORMANCE LIMITS
AMD TACKLES THE UNDERUSED TDP HEADROOM ISSUE
GOING ABOVE TDP
INTELLIGENT BOOST
CONFIGURABLE TDP
SUMMARY

Complex heterogeneous processors have the potential to leave a large amount of performance headroom untapped when workloads don’t utilize all cores. Advanced power management techniques for x86 processors are designed to reduce the power of underutilized cores while also allowing for dynamic allocation of the thermal budget between cores for improved performance.

THE IMPORTANCE OF POWER MANAGEMENT
Those with implementation experience know the importance of proper power management. Whether for simple applications processors or high-end server processors, the ability to down-clock, clock-gate, power-off, or in some manner disable unused or underused hardware blocks is crucial in limiting power consumption.

Better power management benefits range from energy savings within the data center to improved battery life in mobile devices. But don’t underestimate the value of reducing power and increasing efficiency. In fact, power reduction and increased efficiency are even more important today, as processors integrate more and varied functional blocks.

THE X86 EXAMPLE
Typical x86 processors widely used in both consumer and embedded applications are a perfect example: integration of network and security engines, memory controllers, graphics processing units (GPUs), and video encode/decode engines has effectively turned them into heterogeneous compute units that excel at a wide variety of workloads.

The notable thing about traditional reduction-based power management is that a particular functional block is only turned off when unused, or down-clocked when higher performance is not needed by the application. What about applications that desire more performance? Shouldn’t saving power in one area allow you to utilize it in another?

Specifying power usage is complex, particularly with highly integrated processors. If the worst-case power for each individual hardware block in a heterogeneous processor were added together, the resulting total could be several times the achievable worst-case power for the device. One reason is that it is nearly impossible to write software that simultaneously utilizes all functional blocks to their fullest extent. Simply feeding the various compute engines and I/O ports with enough data to keep them all 100% utilized would likely exceed the available bandwidth of internal buses. Central processing unit (CPU) cores manage data movement, and time spent there is less time spent executing higher-power instructions.

Another issue is that different instruction sequences can incur vastly different power usage, which can further complicate specifying processor power. For instance, complex floating-point instructions burn much more power than a simple I/O data read due to the significant difference in transistor logic they activate during execution. The combination of varying instruction types and utilized hardware blocks makes the actual power usage of the processor highly workload-dependent, and explains why it is rare to see a “typical” power specification for this device type. Still, implementers expect a maximum power specification on which to base their design.

ESTABLISH A REALISTIC WORST-CASE FOR POWER
The pragmatic approach for silicon providers is to survey real-world application software to establish a more realistic worst-case power and add some guard-band for safety. AMD and other silicon providers use this type of methodology and specify the result as thermal design power (TDP). TDP is essentially the maximum sustained power a processor can draw with “real world” software while operating under defined temperature and voltage limits.

POWER LIMITS CAN TRANSLATE TO PERFORMANCE LIMITS
Most embedded x86-based systems are power-constrained in some way. Designers will look for the best performance they can get in a given power envelope, at a price they can afford. The worst-case power limit can translate directly into a performance limit for a given processor product by effectively defining the maximum operating frequency.

Using TDP as a worst-case power specification instead of the cumulative per-block maximum power helps to increase that operating frequency, but it’s also based on an assumption of the software workload. Applications using fewer hardware blocks, or using them to a lesser extent, use less power and effectively leave performance headroom on the table.
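As a rough illustration of how a power limit defines a maximum frequency, the minimal Python sketch below inverts the relation P = CAC × V² × f + Pstatic that this paper later uses for the APU power calculation. The function name max_frequency_hz, the fixed-voltage simplification, and every numeric value are illustrative assumptions, not AMD data.

```python
def max_frequency_hz(power_limit_w, static_power_w, cac_farads, voltage_v):
    """Highest frequency keeping CAC * V^2 * f + Pstatic within the power limit.

    Assumes a fixed voltage; real performance states change voltage and
    frequency together, so this is only a first-order illustration.
    """
    dynamic_budget_w = power_limit_w - static_power_w
    if dynamic_budget_w <= 0:
        return 0.0
    return dynamic_budget_w / (cac_farads * voltage_v ** 2)


# Hypothetical values chosen only to make the arithmetic easy to follow.
TDP_W = 35.0
STATIC_W = 5.0
VOLTAGE_V = 1.0
WORST_CASE_CAC_F = 10e-9   # workload that defines TDP
LIGHT_CAC_F = 6e-9         # workload using fewer hardware blocks

# Frequency the part can be rated for if the worst-case workload must fit in TDP.
rated_hz = max_frequency_hz(TDP_W, STATIC_W, WORST_CASE_CAC_F, VOLTAGE_V)

# The lighter workload draws less power at that rated frequency...
light_power_w = LIGHT_CAC_F * VOLTAGE_V ** 2 * rated_hz + STATIC_W
unused_headroom_w = TDP_W - light_power_w

# ...and could run faster within the same limit.
light_hz = max_frequency_hz(TDP_W, STATIC_W, LIGHT_CAC_F, VOLTAGE_V)

print(f"rated frequency: {rated_hz / 1e9:.2f} GHz")
print(f"light workload power at rated frequency: {light_power_w:.1f} W")
print(f"unused headroom: {unused_headroom_w:.1f} W")
print(f"frequency the light workload could sustain: {light_hz / 1e9:.2f} GHz")
```

Under these assumed numbers the lighter workload leaves part of the power budget unused at the rated frequency, which is the headroom the rest of this paper is concerned with.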

AMD TACKLES THE UNDERUSED TDP HEADROOM ISSUE
AMD’s recent move to integrate discrete-class GPUs with x86 processor cores in accelerated processing units (APUs) underscores this power management challenge. Some APUs contain a GPU that accounts for more than half of the silicon die and a proportional amount of the power budget. A much larger potential for under-utilization of the APU’s power envelope exists in this scenario if the software workload is highly CPU-centric or GPU-centric. The trend toward integration of these complex, heterogeneous cores is likely to continue and necessitates a means of harnessing the excess thermal headroom.

[Figure: APU block diagram showing two “Piledriver” dual-core x86 modules, each with 2MB L2, the northbridge, the memory interface, PCI Express®, DP & VGA outputs, and graphics cores & multimedia.]
Integration of large GPU cores, as done in AMD R-Series APUs, increases the potential for unused power budget.

AMD Turbo CORE technology¹ was launched several years ago to address underutilized TDP headroom. AMD Turbo CORE began with a simple core-counting mechanism that allowed some CPU cores to use higher-frequency “boost” states while other CPU cores were idle. This approach only affected the CPU cores, and was primarily targeted at accelerating single-threaded applications that didn’t leverage a multi-core architecture.

Generational improvements have increased the granularity and effectiveness of the technology by adding more boost states for CPU and GPU cores, adding real-time power and temperature monitors, and enabling dynamic power budget allocation between cores.

Increasing performance by boosting to higher frequencies is relatively simple, since the use of multiple performance states (voltage and frequency operating points) has been around for a while. However, the complexity lies in determining when and which cores to boost. For AMD Embedded R-Series APUs, the algorithm starts by dividing the processor into separate thermal entities: one for each CPU core-pair and one for the GPU. I/O power is small by comparison, so it is defined as a fixed value based on characterization to reduce complexity.

An integrated microcontroller manages AMD Turbo CORE calculations, allowing a more complex and therefore more effective algorithm. In deciding whether boosting a given core is possible, the power usage of each thermal entity must be determined. On-die analog power measurement at many amps is not practical in a 32nm silicon on insulator (SOI) process, and external measurement is not possible because the various cores share power rails. Alternatively, proprietary activity monitors that are integrated throughout the processor architecture model current logic activity as an AC capacitance (CAC). The CAC monitors effectively profile the running application to determine if it is one of those “worst-case” workloads that defines TDP or something less laborious.

Static power of the core is determined by transistor leakage at a given voltage and temperature, which can be characterized for the device and hard-coded into the algorithm as a function of temperature. A calculated temperature value from a previous iteration is used for reasons that will be explained later. Total instantaneous power of the thermal entity can then be calculated by $P = C_{AC} \cdot V^2 \cdot f + P_{static}$, where $C_{AC}$ is the CAC value, $V$ is the voltage and $f$ is the frequency. Total power for the APU equals the summation of the power for each thermal entity and the I/O power offset.

The instantaneous power calculation result is compared to an allocated power budget for the thermal entity, as well as to the device’s thermal design current specification to ensure that current demand does not exceed what the voltage regulator can provide. If either value is too close to the limit, firmware can impose throttling by reducing the core’s performance state. The ability to boost the performance state is maintained when headroom exists on both parameters.

[Figure: APU power (CPU core power plus I/O power) and CPU die temperature for two applications, APP 1 (high CAC) and APP 2 (low CAC), shown against the TDP budget and the maximum die temperature limit; the unused CPU power budget is marked.]
Applications with a low CAC can leave unused TDP and temperature headroom. New power management techniques can exploit both for improved performance.

GOING ABOVE TDP
Even if an application with a high CAC drives the APU to consume the full TDP, operation at this level may occur in bursts or be preceded by idle time such that the die temperature at the start of the high CAC period is far below the maximum specification. The latest version of AMD Turbo CORE also takes the opportunity to boost in this scenario by allowing brief excursions above TDP when there is adequate temperature headroom. After all, the purpose of a TDP limit is only to ensure die temperature stays in check.

Real-time temperature values from around the thermal entity provide a scaling factor to the power calculation, so they influence the boost decision without controlling it directly. The calculated temperature is derived by applying the calculated power to a reference thermal solution model. Reducing the influence of actual die temperature on the boost algorithm is an intentional tradeoff to increase deterministic performance of the device. The calculated temperature is then combined with temperature data from other thermal entities to determine if thermal headroom exists.
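The power bookkeeping described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not AMD firmware: the ThermalEntity class, the 5% safety margin, the derivation of current as power divided by voltage, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThermalEntity:
    name: str
    cac_farads: float       # modeled AC capacitance reported by activity monitors
    voltage_v: float
    frequency_hz: float
    static_power_w: float   # leakage, characterized for the device
    budget_w: float         # power budget allocated to this entity

    def power_w(self) -> float:
        """Instantaneous power: CAC * V^2 * f + Pstatic."""
        dynamic_w = self.cac_farads * self.voltage_v ** 2 * self.frequency_hz
        return dynamic_w + self.static_power_w

    def current_a(self) -> float:
        """Assumed current estimate: power divided by rail voltage."""
        return self.power_w() / self.voltage_v


def apu_power_w(entities, io_power_w):
    """Total APU power: sum over thermal entities plus the fixed I/O offset."""
    return sum(e.power_w() for e in entities) + io_power_w


def boost_decision(entity, entities, tdc_limit_a, margin=0.05):
    """Keep boost only when both power and current have headroom."""
    power_ok = entity.power_w() <= entity.budget_w * (1.0 - margin)
    total_current_a = sum(e.current_a() for e in entities)
    current_ok = total_current_a <= tdc_limit_a * (1.0 - margin)
    return "boost_allowed" if power_ok and current_ok else "throttle"


# Hypothetical operating points for one CPU core-pair and the GPU.
cpu = ThermalEntity("cpu_module_0", 4e-9, 1.1, 3.0e9, 2.0, 20.0)
gpu = ThermalEntity("gpu", 8e-9, 1.0, 0.8e9, 3.0, 12.0)
entities = [cpu, gpu]

total_w = apu_power_w(entities, io_power_w=3.0)
cpu_decision = boost_decision(cpu, entities, tdc_limit_a=30.0)

# Same core-pair running a workload with a higher CAC.
busy_cpu = ThermalEntity("cpu_module_0", 5e-9, 1.1, 3.0e9, 2.0, 20.0)
busy_decision = boost_decision(busy_cpu, [busy_cpu, gpu], tdc_limit_a=30.0)

print(f"APU power: {total_w:.2f} W, CPU decision: {cpu_decision}")
print(f"Higher-CAC workload decision: {busy_decision}")
```

The sketch treats the two checks as independent gates, matching the text's statement that boost is maintained only when headroom exists on both parameters; how close "too close to the limit" is in practice is not specified in this paper, so the margin here is a placeholder.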

[Figure: AMD Turbo CORE control loop for one x86 module. Activity monitor (CAC), frequency and voltage inputs feed a power calculation; the calculated power feeds a temperature calculation; the calculated temperature, together with the temperatures of other thermal entities, is compared with the maximum temperature to select a new operating frequency and voltage among the Pboost0, Pboost1, Pboost2, P0 and P1 states. A boost scalar derived from on-die temperature sensor readings and a cTDP scalar are additional inputs. The second x86 module and the GPU have similar control loops, coordinated by the Intelligent Boost manager.]
AMD Turbo CORE algorithms use a variety of frequency, voltage, temperature and logic activity inputs to dynamically determine which cores need a performance boost and how much thermal headroom is available.
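The temperature calculation and threshold comparison in the control loop shown in the figure can be approximated with the Python sketch below. The paper does not disclose the reference thermal solution model, so a first-order thermal resistance and capacitance model is assumed here; the threshold temperatures, the sensor offset, the mapping of thresholds to the Pboost0, Pboost1, Pboost2 and P0 states, and all numeric values are likewise hypothetical.

```python
def step_temperature_c(temp_c, power_w, ambient_c, r_c_per_w, c_j_per_c, dt_s):
    """One Euler step of an assumed first-order thermal model.

    Heat flows in at power_w and out through thermal resistance r_c_per_w;
    thermal capacitance c_j_per_c sets how quickly temperature responds.
    """
    heat_out_w = (temp_c - ambient_c) / r_c_per_w
    return temp_c + dt_s * (power_w - heat_out_w) / c_j_per_c


# Hypothetical thresholds: a cooler calculated temperature allows more boost.
BOOST_THRESHOLDS = [(70.0, "Pboost0"), (80.0, "Pboost1"), (90.0, "Pboost2")]


def select_state(calc_temp_c, sensor_offset_c=2.0):
    """Pick a performance state from the calculated temperature.

    The offset pads the calculated value to allow for sensor tolerance.
    """
    guarded_c = calc_temp_c + sensor_offset_c
    for limit_c, state in BOOST_THRESHOLDS:
        if guarded_c < limit_c:
            return state
    return "P0"  # no thermal headroom left for any boost state


# A burst of 20 W starting from a cool die (all values hypothetical).
POWER_W, AMBIENT_C, R_C_PER_W, C_J_PER_C, DT_S = 20.0, 45.0, 2.0, 10.0, 0.1

temp_c = 50.0
first_step_c = step_temperature_c(temp_c, POWER_W, AMBIENT_C, R_C_PER_W, C_J_PER_C, DT_S)
state_at_start = select_state(first_step_c)

# Let the modeled temperature settle under sustained load.
for _ in range(2000):
    temp_c = step_temperature_c(temp_c, POWER_W, AMBIENT_C, R_C_PER_W, C_J_PER_C, DT_S)
state_when_settled = select_state(temp_c)

print(f"start: {first_step_c:.3f} C -> {state_at_start}")
print(f"settled: {temp_c:.2f} C -> {state_when_settled}")
```

Because the boost state follows the modeled temperature rather than a raw sensor reading, the same workload produces the same sequence of states on each run of this sketch, which mirrors the determinism argument made in the text.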

Other thermal entities can act as heat sources or sinks, depending on their temperature state, and therefore must be considered. Temperature offsets are also included to account for sensor tolerance and help make sure that the maximum junction temperature is never exceeded. The calculated temperature is compared to predefined thresholds to determine the amount of boost that is possible.

INTELLIGENT BOOST
The final stage of AMD Turbo CORE technology is called Intelligent Boost; it uses a proprietary algorithm that helps improve efficiency by only allowing a core to boost if it can translate that higher frequency to increased performance. If each thermal entity control loop operated independently under a demanding workload, all cores would attempt to boost until they reached their maximum performance state or until the device thermal limit was reached. It is very unlikely that the application will be perfectly balanced; more likely, it will be limited by one core type (CPU or GPU) being saturated.

Intelligent Boost examines the workload at a very high frequency to give more thermal budget to the core that needs it the most by preventing the other cores from boosting more than necessary, maximizing efficiency without affecting overall processor performance.

With an understanding of how boost technologies work, designers should consider where they could affect their application or design practices. A common concern is that designers may have become accustomed to the idea that the power draw of their software application doesn’t come close to driving a processor near its TDP, leading them to design to a lower specification. Historically, this may have been safe, but boost technologies will tend to drive the processor closer to TDP than before by allowing the active cores to consume more power.

Operating closer to TDP may sound like a bad thing, but keep in mind that performance can be gained with the increase in power. An example could be a machine vision application achieving higher frame rates for faster recognition. Total processor power might increase in that scenario, but doesn’t materially increase with applications such as media playback in a digital signage player. Fixed, periodic workloads may complete faster at a higher power level, but when the burst of activity is over, cores then spend more time in lower-power idle states, so the average power is approximately the same. Applications like this can still benefit from boosted performance through better responsiveness.

Some designers will dislike the idea of variable performance, especially in real-time applications, but the potential delta is actually quite small in the AMD case. The CAC value of the application has the biggest impact on how much a core will boost, and is very deterministic in behavior on a given processor model. Testing an application on the target processor is a simple way to determine actual performance. The temperature-based boost scalar is the only mechanism that provides variability, and it is intentionally limited to minimize its impact. While users will not see a significant performance delta across the operating temperature of the processor, any variation that does exist can be seen when operating the device near its maximum die temperature.

It is worth noting that the temperature-based boost scalar described only serves to increase boost but does not gradually scale down performance if the maximum die temperature is exceeded. A separate and less granular hardware thermal control mechanism can be used to drop cores to minimum performance states in the event of an over-temperature condition. For applications that are very sensitive to deterministic performance, the boost features can always be disabled.

CONFIGURABLE TDP
Beyond performance benefits, the ability of AMD Turbo CORE algorithms to control average power consumption of the processor also enables a new and interesting feature on the latest generation of AMD APUs, called configurable TDP. It essentially provides the system designer a knob to modify the processor TDP to better fit the needs of the application.

A useful example might be a system design with a thermal budget for a 20W processor but vendor offerings that only include 15W and 25W options. Configurable TDP enables flexibility so the designer isn’t forced to choose a lower-performing 15W option in order to remain within the 20W power budget. Instead, the 25W processor might be used but configured for 20W.
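The 20W example above can be expressed as a small selection routine. The Python sketch below is illustrative only: the function name choose_processor is hypothetical, and it assumes a higher-rated part can be configured down to exactly the system's thermal budget, whereas actual support and range vary by processor model.

```python
def choose_processor(thermal_budget_w, rated_tdps_w, configurable_tdp):
    """Return (rated_tdp_w, configured_tdp_w) for a given thermal budget.

    Without configurable TDP, the best choice is the highest-rated part
    that fits. With it, the next part up can be configured down to the
    budget (assumed to be within that part's configurable range).
    """
    fitting = [t for t in rated_tdps_w if t <= thermal_budget_w]
    larger = [t for t in rated_tdps_w if t > thermal_budget_w]

    # An option that matches the budget exactly needs no reconfiguration.
    if fitting and max(fitting) == thermal_budget_w:
        return (thermal_budget_w, thermal_budget_w)
    if configurable_tdp and larger:
        return (min(larger), thermal_budget_w)
    if fitting:
        return (max(fitting), max(fitting))
    raise ValueError("no processor option fits the thermal budget")


options_w = [15.0, 25.0]
with_ctdp = choose_processor(20.0, options_w, configurable_tdp=True)
without_ctdp = choose_processor(20.0, options_w, configurable_tdp=False)

print(f"with configurable TDP: rated {with_ctdp[0]} W, configured {with_ctdp[1]} W")
print(f"without configurable TDP: rated {without_ctdp[0]} W, configured {without_ctdp[1]} W")
```

With configurable TDP the routine selects the 25W part configured for 20W, and without it the routine falls back to the 15W part, matching the scenario described in the text.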

SUMMARY
AMD Turbo CORE technology will help to dynamically provide the processor’s best available performance while keeping thermal dissipation under the specified amount. Support for configurable TDP and the level of configurability varies by processor model, but it can be a very useful feature for those that support it.

System designers should keep these new concepts in mind when choosing and implementing embedded x86 processors. Power management isn’t just about saving power anymore.

1. AMD Turbo CORE technology is available only with select AMD APUs and GPUs.

DISCLAIMER
The information contained herein is for informational purposes only, and is subject to change without notice. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of non-infringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD’s products are as set forth in a signed agreement between the parties or in AMD’s Standard Terms and Conditions of Sale.

AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies. PID 54696-A

© 2014 Advanced Micro Devices, Inc. All rights reserved.
