Barcelona Power Efficient Design Enhancements
Quad-core Press Briefing
Giuseppe Amato
Director, Technical Marketing
AMD
Do Not Distribute. UNDER EMBARGO: Until May 14, 2007, 12:01 a.m. ET

Agenda
• AMD Client Computing Innovation and Milestones
• Quad Core Performance/Watt Features on Server and on Desktop: Why true quad core matters
• Desktop Momentum At a Glance
• New Desktop Family Introduction
• Upcoming Desktop Processor Positioning
• Summary

AMD Client Computing Innovation
AMD Continues to Raise the Bar on Innovation, Performance and Features in x86 Processors
[Timeline figure; the axis is marked 1990, 2000 and 2005]

286, 386, 486: AMD Takes x86 To The Max
• 80286 Extended To 20MHz
• Am386® Extended To 80MHz
• Am486® Extended To 133MHz

AMD K5
• AMD’s 1st Ground Up x86 Design
• World’s 1st x86 RISC Processor

AMD’s Highly Competitive K6s
• AMD K6®: Faster, Smaller, Lower Power
• AMD K6-2: 1st Processor with 3DNow!
• AMD Mobile K6-2: Intro of PowerNow!
• AMD K6-III: 1st x86 with Integrated L2

AMD Athlon™ Processors With Industry Leading Performance
• Designed For Great Integer & FP Performance
• QuantiSpeed Architecture
• True Performance Index Model Numbers
• Advanced 3DNow! and SSE
• Performance and Value Brands

Award Winning AMD64
• HyperTransport™ Technology
• Direct Connect Architecture
• Dual Core
• Cool’n’Quiet™
• Enhanced Virus Protection
• Enthusiast, Performance, Mainstream and Value Brands

What’s Next for AMD?
Mid-2007: Quad-Core AMD Opteron™ Processors

More than just four cores
• Significant CPU Core Enhancements
• Significant Cache Enhancements

World-class performance goals
• Native Quad-Core
  – Faster data sharing between cores
• AMD Virtualization™ enhancements
  – Nested paging acceleration for virtual environments

Reducing total cost of ownership
• Performance/Watt leadership
  – Consistent 95W thermal design point
  – Low power 68W solutions
• Drop-in upgrade
  – Socket F compatibility
  – BIOS upgrade
  – Leverage existing platform infrastructure
• Common Core Architecture
  – One core technology top-to-bottom
  – Top-to-bottom platform feature consistency

[Block diagram: Quad-Core AMD Opteron™ Processor Design for Socket F (1207)]
• Core 1, Core 2, Core 3, Core 4, each with a 512KB L2 Cache
• 2MB L3 Cache
• System Request Interface
• Crossbar Switch
• HyperTransport™ technology Link 1, Link 2, Link 3: HyperTransport™ technology links provide up to 24GB/s peak bandwidth per processor
• DDR2, two 72 bit channels: 10.7GB/s @ DDR2-667
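The two bandwidth figures on the block diagram can be roughly reproduced with back-of-the-envelope arithmetic. The sketch below is not taken from the slides: it assumes each 72 bit DDR2 channel carries 64 data bits plus 8 ECC bits, and that each HyperTransport link is 16 bits wide at 2 GT/s per direction. Only the 10.7GB/s and 24GB/s totals, the two 72 bit channels and the three links appear in the source; the function names are illustrative.

```python
# Assumptions (not stated on the slide): 64 data bits per 72-bit DDR2 channel,
# and 16-bit HyperTransport links running at 2 GT/s in each direction.

DDR2_667_TRANSFERS = 2e9 / 3  # DDR2-667 is nominally 666.67 million transfers/s


def ddr2_peak_gb_s(transfers_per_s: float, data_bits: int = 64, channels: int = 2) -> float:
    """Peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return transfers_per_s * (data_bits / 8) * channels / 1e9


def ht_peak_gb_s(links: int = 3, width_bits: int = 16, transfers_per_s: float = 2e9) -> float:
    """Peak aggregate HyperTransport bandwidth in GB/s, counting both directions."""
    per_direction = transfers_per_s * (width_bits / 8) / 1e9
    return links * per_direction * 2


if __name__ == "__main__":
    print(f"DDR2-667, two channels: {ddr2_peak_gb_s(DDR2_667_TRANSFERS):.1f} GB/s")
    print(f"Three HT links:         {ht_peak_gb_s():.1f} GB/s")
```

Under these assumptions the memory figure comes out near 10.7 GB/s and the link figure at 24 GB/s, which matches the diagram; different link widths or clocks would give different totals.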
AMD Quad-Core Processor Architecture
A Closer Look at Barcelona
• Comprehensive Upgrades for SSE128: Quadruples floating-point capabilities
• Enhanced Virtualization: New “Nested Paging” feature designed for near native performance on virtualization applications
• New Highly Efficient Cache Structure including a shared L3: Balance of dedicated and shared cache for optimal Quad-Core performance
• Advanced Power Management: Provides granular power management resulting in improved power efficiency
• Enhanced DRAM Controller: Specifically tuned for Quad-Core memory accesses, improves overall memory performance
• CPU Cores Enhancements: Benefits all applications by improving the overall efficiency and performance of the cores

Barcelona … Not Just Four Cores
Comprehensive 128-bit SSE Upgrades

Goal: Balanced
                             64-bit      Intel        AMD
                             Platforms   Clovertown   Barcelona
SSE Execution                1x          2x           2x
Instruction Fetch Bandwidth  1x          1x           2x
Data Cache Bandwidth         1x          1x           2x
L2/NB Bandwidth              1x          2x           2x

Barcelona doubles Instruction and Data delivery … Intel’s pipeline doesn’t
• Helps keep our 128-bit SSE pipeline full for optimal performance
Dedicated 36-entry floating-point scheduler helps reduce application latency
• Intel’s 32-entry scheduler is shared between floating-point and integer operations
Over 80% performance boost, per core, on target applications!

Native Quad-Core Benefit: Faster Data Sharing
Situation: Core 1 needs data in Core 3 cache … How Does it Get There?

Native Quad-Core AMD Opteron™
[Diagram: Core 1 to Core 4, one L2 per core, shared L3, System Request Queue, Crossbar, HyperTransport™, Memory Controller]
1. Core 1 probes Core 3 cache, data is copied directly back to Core 1
This happens at processor frequency
Result: Improved Quad-Core Performance

Quad-Core Clovertown
[Diagram: Core 1 to Core 4, two L2 caches, two Front-Side Buses, Northbridge with Memory Controller]
1. Core 1 sends a request to the memory controller, which probes Core 3 cache
2. Core 3 sends data back to the memory controller, which forwards it to Core 1
This happens at front-side bus frequency
Result: Reduced Quad-Core Performance

Quad-Core AMD Opteron™ Processor Performance Projections
[Chart: two projections, an estimated 21% gain over Xeon 5355 and an estimated 50% gain over Xeon 5355]
Native quad-core design and enhanced processor features translate into superior performance across floating point and database applications

AMD Opteron™ Processor Power Efficiency
A structured approach
1. Energy-efficient Processor Design: Start with processors specifically designed to reduce power consumption
2. Energy-efficient Platforms: Next, integrate these processors into platforms from leading system builders that focus on power efficiency
3. AMD PowerNow!™ technology: Finally, enable AMD PowerNow!™ technology to help dynamically manage power based on day-to-day usage

Improving Processor Power Management with Enhanced AMD PowerNow!™ Technology

“GOOD”: Current AMD Opteron™ Dual-core processors
[Gauges, scale from IDLE to MHz: CORE 0 at 75%, CORE 1 at 35%]
• MHz and voltage are locked to the highest utilized core’s p-state

“GREAT”: ‘Barcelona’
[Gauges, scale from IDLE to MHz: CORE 0 at 75%, CORE 1 at 35%, CORE 2 at 10%, CORE 3 at 1%]
• MHz is adjusted independently per core
• Voltage is locked to the highest utilized core’s p-state

Native Quad-Core technology enables enhanced power management across all four cores
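The difference between the “GOOD” and “GREAT” policies can be sketched as a small model. Everything numeric here is hypothetical: the p-state table, the load thresholds and the function names are illustrative assumptions, not AMD specifications. Only the two rules come from the slide: on current dual-core parts both MHz and voltage follow the busiest core, while on ‘Barcelona’ MHz is chosen per core and voltage still follows the busiest core.

```python
# Hypothetical p-state table: (frequency in MHz, voltage in V), index 0 = fastest.
# These values are illustrative only and are not taken from the slides.
P_STATES = [(2000, 1.20), (1600, 1.10), (1200, 1.00), (800, 0.90)]


def wanted_p_state(load_pct: float) -> int:
    """Toy policy: lower load selects a slower p-state (thresholds are assumptions)."""
    if load_pct >= 60:
        return 0
    if load_pct >= 30:
        return 1
    if load_pct >= 5:
        return 2
    return 3


def dual_core_policy(loads):
    """'GOOD': MHz and voltage are both locked to the busiest core's p-state."""
    fastest = min(wanted_p_state(load) for load in loads)
    return [P_STATES[fastest] for _ in loads]


def barcelona_policy(loads):
    """'GREAT': MHz is chosen per core; voltage follows the busiest core's p-state."""
    wanted = [wanted_p_state(load) for load in loads]
    shared_voltage = P_STATES[min(wanted)][1]
    return [(P_STATES[p][0], shared_voltage) for p in wanted]


if __name__ == "__main__":
    print("dual-core:", dual_core_policy([75, 35]))
    print("barcelona:", barcelona_policy([75, 35, 10, 1]))
```

In this model the lightly loaded cores drop their clock while the shared voltage stays at the level the busiest core needs, which is the behavior the slide describes.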
Reducing power through logic design
Increased the amount of clock gating in the design
– Automatically shuts down areas of logic when not in use to further reduce processor power consumption
Increased Coarse Gaters
– Shuts down an entire block of logic at a time
Increased Fine Gaters
– Shuts down pieces of logic when appropriate
[Figure: Example only: does not reflect actual areas of clock gating]
Reducing power consumption is a high priority in AMD processor designs

AMD Virtualization™ versus Intel VT
[Diagram: on the Intel side, VMs run on CPUs that share a Memory Controller Hub; on the AMD side, VMs run on CPUs that each have their own Memory Controller]

Intel VT: Shared memory can create bottlenecks
• Shared front-side bus can decrease application performance within a virtual machine
• Untagged TLB means less efficient switching between virtual machines
• Software-based memory management and security (via external Memory Controller Hub) can reduce overall virtualization performance and efficiency

AMD Virtualization™: Dedicated memory for scalability
• Direct Connect Architecture helps improve application performance within a virtual machine
• Tagged TLB means more efficient switching between virtual machines
• Hardware-based memory management and security (Integrated memory controller with DEV) can improve overall virtualization performance and efficiency
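The tagged versus untagged TLB point can be illustrated with a toy model. This is a sketch under stated assumptions, not a description of either vendor's hardware: the `TLB` class, its unlimited capacity and the access pattern are all hypothetical. The only idea it encodes is that an untagged TLB has to discard its translations when the running virtual machine changes, whereas a tagged one can keep entries for several virtual machines side by side.

```python
class TLB:
    """Toy TLB with unlimited capacity; tagged entries are keyed by (vm_id, page)."""

    def __init__(self, tagged: bool):
        self.tagged = tagged
        self.entries = set()
        self.current_vm = None
        self.misses = 0

    def switch_to(self, vm_id: int) -> None:
        # Untagged: translations of the previous VM would be wrong, so flush them.
        if not self.tagged and vm_id != self.current_vm:
            self.entries.clear()
        self.current_vm = vm_id

    def access(self, virtual_page: int) -> None:
        key = (self.current_vm if self.tagged else None, virtual_page)
        if key not in self.entries:
            self.misses += 1          # stand-in for a page-table walk
            self.entries.add(key)


def count_misses(tagged: bool, rounds: int = 3) -> int:
    """Alternate between two VMs that each touch the same four virtual pages."""
    tlb = TLB(tagged)
    for _ in range(rounds):
        for vm_id in (1, 2):
            tlb.switch_to(vm_id)
            for page in range(4):
                tlb.access(page)
    return tlb.misses


if __name__ == "__main__":
    print("tagged TLB misses:  ", count_misses(tagged=True))
    print("untagged TLB misses:", count_misses(tagged=False))
```

In this model the tagged TLB only misses on the first touch of each page per virtual machine, while the untagged one misses again after every switch.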
Device Exclusion Vector (DEV)
[Diagram: VM 1 to VM 6 run on the Hypervisor (VMM) over Core 1 and Core 2; the Memory Controller holds the DEV Table; HT 1, HT 2 and HT 3 connect to it]

• DEV lets the Hypervisor (VMM) know if a device is allowed to access a page of memory or not
• So DEV improves virtualization security by denying memory accesses for unauthorized requests
• For example:
  – VM 1 requests page 28: access is granted … quickly
  – VM 5 requests page 25: access is denied … quickly
• Xeon can do this, but it happens in software … so it happens slower
• Only processors with an Integrated Memory Controller offer this benefit

DEV Table
VM     Pages Owned
VM 1   1, 5, 11, 28, 29
VM 2   3, 9, 15, 20, 27
VM 3   4, 7, 13, 22, 25
VM 4   8, 12, 19, 21, 30
VM 5   2, 10, 16, 23, 26
VM 6   6, 14, 17, 18, 24
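The example on this slide amounts to a table lookup, which can be sketched as follows. The page ownership data is copied from the DEV Table above; representing it as Python sets, and the names `DEV_TABLE`, `access_allowed` and `owner_of`, are illustrative assumptions, since the slide does not describe the hardware's actual table format.

```python
from typing import Optional

# Page ownership as listed in the slide's DEV Table. The dict-of-sets layout is a
# modelling choice for this sketch, not the hardware representation.
DEV_TABLE = {
    "VM 1": {1, 5, 11, 28, 29},
    "VM 2": {3, 9, 15, 20, 27},
    "VM 3": {4, 7, 13, 22, 25},
    "VM 4": {8, 12, 19, 21, 30},
    "VM 5": {2, 10, 16, 23, 26},
    "VM 6": {6, 14, 17, 18, 24},
}


def access_allowed(vm: str, page: int) -> bool:
    """Grant the request only if the page is listed for the requesting VM."""
    return page in DEV_TABLE.get(vm, set())


def owner_of(page: int) -> Optional[str]:
    """Return the VM that owns a page, or None if no VM lists it."""
    for vm, pages in DEV_TABLE.items():
        if page in pages:
            return vm
    return None


if __name__ == "__main__":
    for vm, page in (("VM 1", 28), ("VM 5", 25)):
        verdict = "granted" if access_allowed(vm, page) else "denied"
        print(f"{vm} requests page {page}: access is {verdict} (owner: {owner_of(page)})")
```

The second request is denied because page 25 is listed under VM 3 rather than VM 5, which matches the slide's example.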