Nehalem Client
Transitioning the Intel® Next Generation Microarchitectures (Nehalem and Westmere) into the Mainstream
Lily Looi, Stephan Jourdan
Intel Corporation
Hot Chips 21, August 24, 2009

Agenda
• Next Generation Mainstream CPUs
• New Technologies for Integration for 2009 and beyond

Intel® Core™ i7 Recap
• Core microarchitecture
  – Increased parallelism, e.g. 33% larger out-of-order window, handle more cache misses simultaneously
  – Enhanced algorithms, e.g. faster “unaligned” cache accesses, faster sync primitives, loop streaming detector, macro-fusion
  – Better branch prediction, e.g. 2nd level branch predictor, renamed RSB
  – New Instructions (SSE4)
  – Intel® Hyper-Threading Technology
• Uncore microarchitecture and connectivity
  – Scalable multi-core fabric
  – Shared last level cache
  – Integrated memory controller
  – Intel® QuickPath Interconnect
• Power management technologies
  – PCU Microcontroller
  – Intel® Turbo Boost Technology
  – Integrated power gates
[Die diagram: four Cores, Queue, Shared L3 Cache, Memory Controller, QPI 0, Misc IO]
QPI: Intel® QuickPath Interconnect (Intel® QPI)

Enabling Nehalem for Every Segment
[Roadmap figure. Timeline: 2008, 2009+. Configurations: 4 Cores / 8 Threads; 2 Cores / 4 Threads with Integrated Graphics. Segments: Mainstream Desktop; Thin & Light Laptop. Process: 45 nm High-K; 32 nm High-K]
Delivering Outstanding Nehalem Performance to Mainstream Desktops and Laptop Computers

Intel® Microarchitecture codenamed Nehalem

Mainstream Platform Partitioning
Intel® Core™2 Processor based: 3-Chip Solution
Nehalem/Westmere based: 2-Chip Solution
• Graphics moves into the Processor (1)
• Memory Controller moves into the Processor
• Display moves into Intel® 5 series chipset (Ibex Peak)
• Intel® Manageability Engine moves into Intel® 5 series chipset (Ibex Peak)
[Block diagram labels: Processor, FSB, PCIe*, iGFX, IMC, DDR2/3, DDR3, Intel® 4 Series Chipset, ICH10, Intel® Flexible Display Interface (Intel® FDI), DMI, Display, ME, Intel® 5 series Chipset (Ibex Peak), Clock Buffer, Clocks, I/O]
Greater Performance and Lower Power through Integration
1. Integrated graphics on Clarkdale/Arrandale

Westmere: 32nm version of Intel® microarchitecture codename Nehalem
Mainstream Microprocessors
[Block diagrams]
Lynnfield/Clarksfield: 45nm processor (single die); DDR3; DMI to Intel® 5 series Chipset. Caption: Discrete graphics
Clarkdale/Arrandale: 32nm Westmere processor; 45nm iGfx graphics core; DDR3; Intel® FDI and DMI to Intel® 5 series Chipset; Discrete / Switchable Graphics. Caption: Integrated or discrete graphics
Other labels: Discrete Graphics, 1x16 or 2x8, Gen2; Intel® 5 series Chipset: Manageability, Security, Display, PCIe, SATA, etc.

Agenda
• Next Generation Mainstream CPUs
• New Technologies for Integration for 2009 and beyond
  – Integrated Power Gates
  – Intel® Turbo Boost Technology
  – Intel® Hyper-Threading Technology
  – Energy Efficient performance
[Figure labels: Single Thread, Multi-threads]

Integrated Power Gates
• Integrated Power Gates (switches) are critical for integration, turning individual component blocks on/off
  – Zero leakage power, low latency to wake block
  – Key benefits in both idle and active power
• Nehalem turns individual cores on/off
  – Transparent to OS
  – Reduces latency to wake a core
  – Modular/Scalable Clocking
  – Cores, Memory System, I/O can run at independent voltage/frequency
• Extended in 2009 platforms as Integrated Power Gates also used in shared cache and I/O logic to dynamically power down when inactive
[Diagram: VCC supplying Core0, Core1, Core2, Core3; VTT supplying Memory System, Cache, I/O]
Integrated Power Gates enable Energy Efficient Integration

Intel® Turbo Boost Technology
• Integration splits power allocation among more component blocks
• Intel® Turbo Boost Technology is critical to dynamically manage power allocation and seamlessly maximize performance
  – Higher benefits in smaller form factors
[Diagram: Lynnfield/Clarksfield, Core 1 to Core 4. Labels: Awake: Lower core activity; Sleep; Higher Frequency]
Dynamically Scaled Performance Boost
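The idea that sleeping cores free up power for the cores that stay awake can be illustrated with a toy model. This is only a sketch under stated assumptions: the slides do not describe the PCU's actual policy, so the even split, the function name `per_core_budget_watts`, and the 80 W package budget are all hypothetical. Power-gated cores are modelled as drawing 0 W, in line with the "zero leakage power" bullet on the Integrated Power Gates slide.

```python
from __future__ import annotations


def per_core_budget_watts(package_budget_w: float, core_awake: list[bool]) -> list[float]:
    """Split a fixed package power budget evenly among awake cores.

    Assumption (not from the slides): the budget is divided evenly.
    Power-gated (sleeping) cores are modelled as drawing 0 W.
    """
    awake_count = sum(core_awake)
    if awake_count == 0:
        # Nothing is awake, so no core receives any budget.
        return [0.0] * len(core_awake)
    share = package_budget_w / awake_count
    return [share if is_awake else 0.0 for is_awake in core_awake]


# Hypothetical 80 W budget on a four-core part.
all_awake = per_core_budget_watts(80.0, [True, True, True, True])
two_asleep = per_core_budget_watts(80.0, [True, True, False, False])

print("all four cores awake:", all_awake)
print("two cores asleep:    ", two_asleep)
```

In this model, putting two of four cores to sleep doubles the budget available to each remaining core, which is the headroom a turbo mechanism could spend on higher frequency.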
Intel® Hyper-Threading Technology
• Nehalem is a scalable multi-core architecture
• Hyper-Threading Technology augments benefits
  – Power-efficient way to boost performance in all form factors: higher multi-threaded performance, faster multi-tasking response
[Figure labels: Without HT Technology; With HT Technology]
[Table. Columns: Hyper-Threading Technology (Shared or Partitioned; Replicated), Multi-cores (Replicated). Rows, each marked with X: Register State, Return Stack, Reorder Buffer, Instruction TLB, Reservation Stations, Cache (L1, L2), Data TLB, Execution Units]
• Next generation Hyper-Threading Technology:
  – Low-latency pipeline architecture
  – Enhanced cache architecture
  – Higher memory bandwidth
Enables 8-way processing in Quad Core systems, 4-way processing in Small Form Factors

Energy Efficient Performance
• Many innovations in energy efficiency such as loop-streaming detector and dynamic loadline
• As looping is very common in every type of application, the Nehalem loop-streaming detector captures bigger loops and saves more energy
[Pipeline diagrams: Intel® Core™2 Processor Pipeline and Intel® Core™ i7 Processor Pipeline. Stages: Branch Prediction, Fetch, Decode, Queue (Loop), Execute]
[Graph: V versus I, P = V x I. Lines: Worst case only; Current cdtn]
• Dynamic loadline (PCU)
  – Power = Voltage x Current
  – In prior processors, the voltage line is anchored based on the worst case
  – Nehalem lowers voltage based on current conditions (number of active cores, temperature) and saves more energy
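The loadline bullets above rest on the relation Power = Voltage x Current. A short worked example may make the saving concrete. The operating points below are hypothetical, since the slides give no figures, and the current is held constant purely to isolate the effect of the voltage term; in a real part the current would also change with voltage.

```python
def power_watts(voltage_v: float, current_a: float) -> float:
    """P = V x I, the relation quoted on the slide."""
    return voltage_v * current_a


# Hypothetical operating points (assumptions, not Intel data).
WORST_CASE_VOLTAGE_V = 1.20  # voltage anchored for the worst case
ADJUSTED_VOLTAGE_V = 1.10    # voltage lowered for current conditions
LOAD_CURRENT_A = 50.0        # held constant to isolate the voltage term

worst_case_w = power_watts(WORST_CASE_VOLTAGE_V, LOAD_CURRENT_A)
adjusted_w = power_watts(ADJUSTED_VOLTAGE_V, LOAD_CURRENT_A)
saved_w = worst_case_w - adjusted_w

print(f"worst-case loadline: {worst_case_w:.1f} W")
print(f"dynamic loadline:    {adjusted_w:.1f} W")
print(f"saved:               {saved_w:.1f} W")
```

Under these assumed numbers, a 0.10 V reduction at a fixed 50 A load corresponds to roughly 5 W less power.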
Major Innovations in Energy Efficiency

Summary
• Intel maintains pace of innovation and execution
  – Next generation performance
  – 32nm: Another Process Technology Breakthrough
• Enabling Nehalem for every segment
  – Delivering outstanding Nehalem performance to mainstream desktops and laptop computers
• Redesigning more efficient platforms
  – Best performance across all segments
  – Low power and better power management
  – Higher levels of integration

Q&A

Backup

Lynnfield/Clarksfield Microarchitecture
• Built on modularity of Intel® Core™ i7
• Further integration to support new mainstream platforms:
  – VTd and IO virtualization support
  – PCI Express* interface
  – x16 PCIe configurable to 2x8
  – 2.5 GT/s (Gen1) and 5 GT/s (Gen2)
  – Flexible interface: lane reversal, dynamic speed and link width changes, peer to peer posted writes
  – Power Optimization: L0s/L1 support, low-voltage swing mode and de-emphasis
  – x4 DMI Interface
  – Series 5 enhancements
  – Extended power management
[Die diagram labels: four Cores with two Threads each, 8M, IMC, DDR3, PCI-E Graphics, Power, DMI, Intel® 5 series Chipset]

Integrated Graphics and Media Architecture (Clarkdale, Arrandale)
• Unified Shader Architecture
  – Evolution of G965 & GM965
  – DX10 & Shader Model 4.0 in HW
  – Full HD Decode, High Quality Video
  – 6 threads/EU
  – Hierarchical Depth Buffer
• Dynamic load balanced
• Multi-functional; multi-threaded
• Enables scalability and flexibility
• Improved Extended Math, larger caches
[Block diagram labels: VS/GS, Vertex Fetch, Setup/Rasterize, Add’l 3D Fixed Fxn, Shaded Vertices, Unified Execution Unit Array (EUs), Mathbox, Texture Unit, Cache, Pixel backend, MC/LF in EUs]
[Dedicated HD Decode Pipelines: MPEG2 (VLD, IT, MC); VC1 (VLD, IT, MC, LF); AVC (VLD, IT, MC, LF); Post Process]

Legal Disclaimer
• INFORMATION IN THIS DOCUMENT IS PROVIDED IN
CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS.
• Intel may make changes to specifications and product descriptions at any time, without notice.
• All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
• Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.
• Nehalem, Lynnfield, Clarkdale, Clarksfield, Arrandale, Westmere, Ibex Peak and other code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not authorized by Intel to use code names in advertising, promotion or marketing of any product or services and any such use of Intel's internal code names is at the sole risk of the user.
• Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance.
• Intel, Core and the Intel logo are trademarks of Intel Corporation in the United States and other countries.
• * Other names and brands may be claimed as the property of others.
• Copyright © 2009 Intel Corporation.
• Intel® Active Management Technology