Haswell: The Fourth-Generation Intel Core Processor
Haswell, the fourth-generation Intel Core processor architecture, delivers a range of client parts, a converged core for the client and server, and technologies used across many products. It uses an optimized version of Intel 22-nm process technology. Haswell provides enhancements in power-performance efficiency, power management, form factor and cost, core and uncore microarchitecture, and the core’s instruction set.

Per Hammarlund, Alberto J. Martinez, Atiq A. Bajwa, David L. Hill, Erik Hallnor, Hong Jiang, Martin Dixon, Michael Derr, Mikal Hunsaker, Rajesh Kumar, Randy B. Osborne, Ravi Rajwar, Ronak Singhal, Reynold D’Sa, Robert Chappell, Shiv Kaushik, Srinivas Chennupaty, Stephan Jourdan, Steve Gunther, Tom Piazza, Ted Burton
Intel

Published by the IEEE Computer Society. 0272-1732/14/$31.00 © 2014 IEEE. Hot Chips, March/April 2014.

Haswell, the fourth-generation Intel Core Processor, delivers a family of processors with new innovations [1, 2]. Haswell delivers a range of client parts, a converged microprocessor core for the client and server, and technologies used across many products. Many of Haswell’s innovations are in the areas of improving power-performance efficiency and power management. Power-performance efficiency has been enhanced to increase the processor’s operating range and improve its inherent performance in power-limited scenarios and its battery life. Improvements in power management include additional idle states, specifically the new active idle state S0ix, which enables a 20× reduction in idle power. One key enabler for power-performance improvements is the fully integrated voltage regulator (FIVR), which also improves board space and cost. Performance improvements in the core and graphics come with corresponding improvements in cache hierarchies; the first two cache levels have twice the bandwidth. For the top graphics configurations, Intel Iris Pro Graphics, Haswell also introduces a new fourth-level, 128-Mbyte on-package cache that enables a new level of integrated graphics performance.

Haswell is a “tock”: a significant microarchitecture change over the previous-generation Ivy Bridge. Haswell is built with an SoC design approach that allows fast and easy creation of derivatives and variations on the baseline. Graphics and media come with more scalability that lets designers build efficient configurations from the lowest to highest end. The core comes with power-performance enhancements and a set of new instructions, such as floating-point fused multiply-add (FMA) and transactional synchronization extensions (TSX).

Haswell uses an enhanced version of Intel’s 22-nm process technology, which has enhanced tri-gate transistors to reduce leakage current by a factor of 2× to 3× with the same frequency capability. Haswell’s version of the 22-nm process has 11 metal interconnect layers, compared to nine for Ivy Bridge, to optimize for better performance, area, and cost.

Power efficiency and management

Current processors operate in power-constrained modes; they must maximize the performance they deliver inside a fixed power envelope. This power constraint is true for both server and mobile applications. One of the most important goals of a new processor generation is to dramatically improve power-performance efficiency. In Figure 1, the basic nonlinear relationship between power and performance is shown in the solid line. To improve power-performance efficiency across the voltage-frequency scaling range, we must achieve three goals, as shown in the dashed line:

• extending the operating range downward to allow the processor to go into smaller form factors that are even more power constrained,
• improving the basic power-performance efficiency of the processor by pushing each operating point to the right and down, and
• extending the operating range upward for more burst and Turbo headroom.

Figure 1. Power and performance voltage-frequency scaling improvements. The baseline (solid line) is improved (dashed line) by being lowered and by being extended for better burst and Turbo headroom. (Axes: power and performance.)

In Haswell, we employ multiple techniques to improve power-performance efficiency. We can describe them in three categories: low-level implementation, high-level architecture, and platform power management.

Examples of low-level implementation improvements include the following:

• Optimized manufacturing, process technology, and circuits help achieve all three goals just listed. These improvements are enabled by Intel’s manufacturing capability and a deep collaboration across the different Intel teams.
• Optimized microarchitecture and algorithms. In each generation, we evaluate the design for sufficient power-performance efficiency. Areas that fall below our goals are reimplemented in ways that improve the power-performance efficiency.
• Optimization of design and implementation through continued focus on gating unused logic and using low-power modes.

An example of a high-level architecture improvement in Haswell is extending the use of independent voltage-frequency domains. Figure 2 shows a conceptual block diagram of the different voltage-frequency domains. Cores, caches, graphics, and the system agent all run at dedicated, individually controlled voltage-frequency points. A power control unit (PCU) dynamically allocates the power budget among the domains to maximize performance. Prioritization based on runtime characteristics selects the domain with the highest performance return. For example, for a graphics-focused workload, most of the processor power is allocated to the graphics domain. Sufficient power is allocated to the rest of the blocks that the graphics domain depends on for performance, such as the system agent to provide memory bandwidth.

Figure 2. Conceptual block diagram of the Haswell processor showing the different independent voltage domains. The figure also shows Haswell’s cache hierarchy and memory controller, which features bandwidth, load balancing, and DRAM efficiency improvements. (Labeled blocks: DMI, PCI Express, system agent, IMC, display, four core and LLC pairs, and processor graphics.)

At a platform level, we improved battery life to deliver “all-day experiences.” To achieve this, we focused both on active workloads, such as media playback, and on idle power. Haswell achieves a 20× improvement in idle power. Haswell has evolutionary power-management improvements, such as improvements in C-states (CPU idle states). Haswell has both new, deeper C-states and improvements in the entry-exit latencies to C-states. These latency improvements let Haswell more aggressively enter deep C-states.

Haswell also has revolutionary power-management improvements, for example, the introduction of a new active idle-power state, S0ix. We apply lessons from past phone and tablet development to deliver 20× improvements in idle power compared to the prior generation. This improvement enables significant improvements in realizable battery life. S0ix appears to software as an active state, while in actuality the hardware autonomously enters and exits deep idle states with low latency. The new power state is transparent to well-written software. Power management of platform components is continuous and fine grained; everything that is not needed is individually turned off.

Fully integrated voltage regulator

[...] converts into burst performance), a substantial battery life increase, and a 70 to 80 percent platform footprint reduction.

Figure 3 gives an overview of FIVR. A first-stage voltage regulator (VR), which is on the motherboard, converts from the power supply or battery voltage (12 to 20 V) to approximately 1.8 V, and the second conversion stage is provided by parallel FIVRs (one for each major architectural domain). As illustrated, FIVR eliminates four VRs from the prior platform. To support the new Intel Iris Pro Graphics variants of Haswell, those platform VRs would have grown in both size and number. With FIVR, a platform-size reduction opportunity was achieved instead of what would have been a substantial growth. That platform space can be used to add platform features, increase the battery size, and reduce the platform dimensions in many Haswell mobile products.

At the outset of the Haswell design, FIVR’s expected benefits fell into half a dozen categories:

• Battery life increase. FIVR’s 140-MHz switching frequency enables several orders of magnitude less output decoupling and much lower input decoupling than the prior generation’s voltage rails, allowing input and output voltages to be quickly reduced or powered off to save power, and quickly ramped back up for brief high-performance bursts.
• Increased available power for increased burst performance, where FIVR can direct the entire package power to the unit that needs the