Intel® Core™ Microarchitecture
March 8, 2006

Stephen L. Smith, Vice President, Digital Enterprise Group
Bob Valentine, Architect, Intel Architecture Group

Agenda
• Multi-core Update and New Microarchitecture Level Set
• New Intel® Core™ Microarchitecture
• Wrap Up

Intel Multi-core Roadmap – Updates since Fall IDF
[roadmap chart]

Ramping Multi-core Everywhere
[chart]
All products and dates are preliminary and subject to change without notice.

Refresher: What is Multi-Core?
Two or more independent execution cores in the same processor.
Specific implementations will vary over time, driven by product implementation and manufacturing efficiencies.
• Best mix of product architecture and volume mfg capabilities
  – Architecture: Shared Caches vs. Independent Caches
  – Mfg capabilities: volume packaging technology
• Designed to deliver performance, OEM and end user experience

Single die (monolithic) based processor:
  – Example: 90nm Pentium® D Processor (Smithfield): Core0 and Core1 on one Front Side Bus
  – Example: Intel Core™ Duo Processor (Yonah): Core0 and Core1 on one Front Side Bus
Multi-chip processor:
  – Example: 65nm Pentium D Processor (Presler): Core0 and Core1 on one Front Side Bus
*Not representative of actual die photos or relative size

Intel® Core™ Micro-architecture
[die photo]
*Not representative of actual die photo or relative size

Intel Multi-core Roadmap
[roadmap charts, two slides]

Intel® Core™ Microarchitecture Based Platforms
Each entry lists segment, platform, and processors by year.
• MP Servers: Caneland Platform (2007)
  – 2007: Tigerton (QC) (2007)
• DP Servers / DP Workstation: Bensley Platform (Q2'06) / Glidewell Platform (Q2'06)
  – 2006: Woodcrest (Q3'06)
  – 2007: Clovertown (QC) (Q1'07)
• UP Servers / UP Workstation: Kaylo Platform (Q3'06) / Wyloway Platform (Q3'06)
  – 2006: Conroe (Q3'06)
  – 2007: Kentsfield (QC) (Q1'07)
• Desktop - Home: Bridge Creek Platform (Mid'06)
  – 2006: Conroe (Q3'06)
  – 2007: Kentsfield (QC) (Q1'07)
• Desktop - Office: Averill Platform (Mid'06)
  – 2006: Conroe (Q3'06)
• Mobile Client: Napa Platform (Q1'06)
  – 2006: Merom (2H'06)
Note: only Intel® Core™ microarchitecture based processors listed. QC refers to Quad-Core.

Intel® Core™ Microarchitecture Performance
Delivering both industry leading performance and performance/watt
• Conroe: >40% improvement in performance¹ and >40% reduction in power²
  – As compared to today's high-end Pentium® D processor 950 (formerly Presler)
• Woodcrest: >80% improvement in performance¹ and >35% reduction in power²
  – As compared to today's high-end Dual-Core Intel® Xeon® processor 2.8GHz (formerly Paxville DP)
• Merom: Extends the already significant performance and performance/watt leadership delivered with today's Intel® Core™ Duo processor with greater than 20% additional performance¹ improvement
  – As compared to today's high-end Intel® Core™ Duo processor (formerly Yonah)
¹ Estimated SPECint*_rate_base2000
² Expected reduction in TDP

Inside the Intel® Core™ Microarchitecture

Agenda
– Multi-core Update and New Micro-architecture level set
– New Intel® Core™ Microarchitecture
– Intel Microarchitecture History
– Intel® Core™ Microarchitecture Design Goals and Roadmap
– Processor Architecture 101
– Intel® Core™ Microarchitecture
– Software Implications
– Wrap Up

Microarchitecture History
[chart]

New Microarchitecture Coming in 2006
[chart]

Intel® Core™ Microarchitecture: Design Goals
• Deliver world class performance combined with superior energy/power efficiency
  – Existing and emerging applications and uses
  – Greater performance and performance/watt
  – Optimized for Intel Multi-core platforms
• Deliver single foundation for optimized processors across each segment and power envelope
  – Optimized for mobile, desktop and server segments
Driving Performance and Performance/Watt Leadership

Processor Architecture 101
Delivered Performance = Frequency * Instructions Per Cycle (IPC)
Power ∝ C_dynamic * V * V * Frequency
Goal is higher performance and lower power.
C_dynamic is roughly a product of area and activity: "how many bits" * "how much do they toggle".

Frequency is proportional to voltage, so frequency reduction coupled with voltage reduction results in cubic reduction in power.
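
The scaling argument in these slides can be checked with a few lines of arithmetic. The following is a minimal sketch, not Intel's power model: it treats Power ∝ C_dynamic * V * V * Frequency as exact up to a constant factor, so only ratios are meaningful. The function names and the 0.8 scale factor are assumptions chosen for illustration.

```python
def relative_dynamic_power(c_scale: float, v_scale: float, f_scale: float) -> float:
    """Power relative to a baseline, from Power ∝ C_dynamic * V * V * Frequency."""
    return c_scale * v_scale * v_scale * f_scale


def relative_performance(f_scale: float, ipc_scale: float) -> float:
    """Performance relative to a baseline, from Performance = Frequency * IPC."""
    return f_scale * ipc_scale


if __name__ == "__main__":
    # Assumed scenario: lower frequency to 80% and, because voltage is taken
    # to track frequency, lower voltage to 80% as well. C_dynamic and IPC
    # are left unchanged.
    s = 0.8
    power = relative_dynamic_power(c_scale=1.0, v_scale=s, f_scale=s)
    perf = relative_performance(f_scale=s, ipc_scale=1.0)
    print(f"relative power:       {power:.3f}")  # 0.512, i.e. s cubed
    print(f"relative performance: {perf:.3f}")   # 0.800
```

Under these assumptions a 20% drop in frequency costs 20% of performance but removes roughly half of the dynamic power, which is the "cubic reduction" the slide refers to.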
Higher IPC usually results in wider data paths and/or more speculation, directly increasing C_dynamic.

Intel® Core™ Microarchitecture: Block Diagram Walkthrough
[Block diagram, repeated on each walkthrough slide. Labelled blocks:]
• Instruction Fetch and PreDecode
• Instruction Queue
• Decode, with uCode ROM
• Rename/Alloc
• Retirement Unit (ReOrder Buffer)
• Schedulers
• Execution units, in three stacks plus memory ports:
  – ALU, Branch, MMX/SSE, FPmove
  – ALU, FAdd, MMX/SSE, FPmove
  – ALU, FMul, MMX/SSE, FPmove
  – Load
  – Store
• L1 D-Cache and D-TLB
• 2M/4M shared L2 Cache
• up to 10.4 Gb/s FSB
[Width labels on the diagram: 5 after the Instruction Queue, 4 after Decode, 4 at the Retirement Unit]

In order:
• instruction fetch
• instruction decode
• micro-op rename
• micro-op allocate

Out of order:
• micro-op schedule
• micro-op execute

Out of order:
• memory pipelines
• memory order unit maintains architectural ordering requirements

In order:
• micro-op retirement
• fault handling
Retirement Unit maintains illusion of in order instruction retirement.

New, State-of-the-Art Microarchitecture
• Wide Dynamic Execution
• Advanced Digital Media Boost
• Smart Memory Access
• Advanced Smart Cache
• Intelligent Power Capability

Wide Dynamic Execution
Start with Instruction Fetch
• four(+) instructions / cycle
• >33%