Intel Multi-Core Presentation

Total Page:16

File Type:pdf, Size:1020Kb

Intel Multi-Core Presentation MultiMulti--CoreCore MicroprocessorMicroprocessor Chips:Chips: MotivationMotivation && ChallengesChallenges DileepDileep Bhandarkar,Bhandarkar, Ph.Ph. D.D. Architect at Large Digital Enterprise Group Intel Corporation May 2006 Copyright © 2006 Intel Corporation. 2006 Intel Distinguished Lecture Agenda yy SemiconductorSemiconductor TechnologyTechnology EvolutionEvolution yy DesignDesign ChallengesChallenges yy WhyWhy MultiMulti--CoreCore ProcessorProcessor Chips?Chips? yy Power/PerformancePower/Performance TradeTrade--OffsOffs yy CMPCMP DirectionsDirections yy BeyondBeyond CMPCMP yy SummarySummary ©2006, Intel Corporation Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries *Other names and brands may be claimed as the property of others www.intel.com/education 2006 Intel Distinguished Lecture IntelIntel only:only: OnOn--timetime ““22--yearyear--cyclecycle”” 180nm 130nm 90nm 65nm 45nm Wafer Size (mm): 200 200/300 300 300 300 1st Production: 1999 2001 2003 2005 2007 Transistors: SiGe SiGe Interconnects: 100nm LG 70nm LG 50nm LG 35nm LG Details CoSi2 CoSi2 NiSi NiSi Coming! Strain Si Strain Si 6 Al 6 Cu 7 Cu 8 Cu SiOF SiOF Low-k Low-k 4545 nmnm LogicLogic ProcessProcess onon TrackTrack forfor DeliveryDelivery inin 20072007 Process Name P1262 P1264 P1266 P1268 Lithography 90 nm 65 nm 45 nm 32 nm 1st Production 2003 2005 2007 2009 Moore'sMoore's LawLaw continues!continues! IntelIntel continuescontinues toto developdevelop aa newnew technologytechnology generationgeneration everyevery 22 yearsyears Intel 11th EMEA Academic Forum HistoricalHistorical DrivingDriving ForcesForces Increased Performance Shrinking Geometry via Increased Frequency 100000 10 10000 1 1000 Feature Frequency Size (MHz) 100 (um) 0.1 10 1 0.01 1970 1980 1990 2000 2010 2020 1970 1980 1990 2000 2010 2020 1971 1978 1985 1993 2005 4004 Processor 8008 Processor i386 Processor Pentium Processor Montecito 2300 Transistors IBM PC 32-bit 3.1M transistors 1.7B Transistors TheThe ChallengesChallenges Power Limitations Diminishing Voltage Scaling 1000 10 0.0.7um7um 0.0.5um5um 0.0.35um35um 0.0.25um25um 0.0.18um18um ~30% 0.0.13um13um CPU Supply 90nm90nm 100 1 65nm65nm Power Voltage 45nm45nm (W) (V) 30nm30nm 10 0.1 1990 1995 2000 2005 2010 2015 1990 1993 1997 2001 2005 2009 Power = Capacitance x Voltage2 x Frequency also Power ~ Voltage3 Agenda yy SemiconductorSemiconductor TechnologyTechnology EvolutionEvolution yy DesignDesign ChallengesChallenges yy WhyWhy MultiMulti--CoreCore ProcessorProcessor Chips?Chips? yy Power/PerformancePower/Performance TradeTrade--OffsOffs yy CMPCMP DirectionsDirections yy BeyondBeyond CMPCMP yy SummarySummary ©2005, Intel Corporation Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries *Other names and brands may be claimed as the property of others www.intel.com/education 2006 Intel Distinguished Lecture DesignDesign ChallengesChallenges yy MemoryMemory latencylatency notnot scalingscaling asas fastfast asas processorprocessor speedspeed yy PowerPower growinggrowing nonnon--linearlylinearly withwith singlesingle threadthread performanceperformance yy DesignerDesigner productivityproductivity lagginglagging designdesign complexitycomplexity yy AbilityAbility toto validatevalidate andand testtest complexcomplex designdesign yy KeepingKeeping upup withwith newnew processprocess technologytechnology everyevery twotwo yearsyears www.intel.com/education 2006 Intel Distinguished Lecture LongLong LatencyLatency DRAMDRAM Accesses:Accesses: NeedsNeeds LatencyLatency TolerantTolerant TechniquesTechniques 1400 1200 1000 800 600 ions during DRAM Access ions during DRAM Access 400 200 Peak Instruct Peak Instruct 0 Pentium® Pentium-Pro Pentium III Pentium 4 Future CPUs Processor Processor Processor Processor 66 MHz 200 MHz 1100 MHz 2 GHz www.intel.com/education 2006 Intel Distinguished Lecture DRAMDRAM LatencyLatency ToleranceTolerance yy ContinueContinue buildingbuilding eveneven largerlarger cachescaches – Every semiconductor process generation provides opportunity to double cache size – Cache becomes larger part of die yy HideHide multiplemultiple threadsthreads ofof executionexecution behindbehind memorymemory latencylatency yy IntelIntel implementedimplemented simultaneoussimultaneous multimulti-- threadingthreading inin 20002000 yy ImplementImplement multimulti--corecore productsproducts asas MooreMoore’’ss LawLaw allowsallows www.intel.com/education 2006 Intel Distinguished Lecture Agenda yy SemiconductorSemiconductor TechnologyTechnology EvolutionEvolution yy DesignDesign ChallengesChallenges yy WhyWhy MultiMulti--CoreCore ProcessorProcessor Chips?Chips? yy Power/PerformancePower/Performance TradeTrade--OffsOffs yy CMPCMP DirectionsDirections yy BeyondBeyond CMPCMP yy SummarySummary ©2005, Intel Corporation Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries *Other names and brands may be claimed as the property of others www.intel.com/education 2006 Intel Distinguished Lecture SituationalSituational AnalysisAnalysis y With Each Process Generation transistor density doubles – Frequency has increased by ~1.5X; ~1.3x in future – Vcc has scaled by about ~0.8x; ~0.9x in future – Capacitance has scaled by 0.7x – Total power may not scale down due to increased leakage y Instruction Level Parallelism harder to find y Increasing single-stream performance often requires non-linear increase in design complexity y Many server applications are inherently parallel y Parallelism exists in multimedia applications y Multi-tasking usage models becoming popular www.intel.com/education 2006 Intel Distinguished Lecture ProcessorProcessor PowerPower www.intel.com/education 2006 Intel Distinguished Lecture DesignDesign ComplexityComplexity andand ProductivityProductivity factorsfactors yy HugeHuge transistortransistor budgetsbudgets stressstress abilityability toto designdesign andand verifyverify complexcomplex chipschips yy MultiMulti--corecore fitsfits wellwell withwith increasingincreasing transistortransistor budgetsbudgets yy MultiMulti--corecore designdesign addressesaddresses density/designerdensity/designer gapgap www.intel.com/education 2006 Intel Distinguished Lecture Agenda yy SemiconductorSemiconductor TechnologyTechnology EvolutionEvolution yy DesignDesign ChallengesChallenges yy WhyWhy MultiMulti--CoreCore ProcessorProcessor Chips?Chips? yy Power/PerformancePower/Performance TradeTrade--OffsOffs yy CMPCMP DirectionsDirections yy BeyondBeyond CMPCMP yy SummarySummary ©2005, Intel Corporation Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries *Other names and brands may be claimed as the property of others www.intel.com/education 2006 Intel Distinguished Lecture IronIron LawLaw ofof PerformancePerformance yy ExecutionExecution TimeTime isis thethe productproduct ofof –– PathPath LengthLength –– CyclesCycles PerPer InstructionInstruction (CPI)(CPI) –– CycleCycle TimeTime yyCPICPI isis thethe sumsum ofof –– infiniteinfinite--cachecache corecore cpicpi –– missmiss raterate ** effectiveeffective memorymemory latencylatency yy BadBad (good)(good) newsnews isis thatthat performanceperformance doesdoes notnot scalescale upup (down)(down) linearlylinearly withwith frequencyfrequency www.intel.com/education 2006 Intel Distinguished Lecture TheThe MagicMagic ofof VoltageVoltage ScalingScaling yy PowerPower == CapacitanceCapacitance ** VoltageVoltage2 * FrequencyFrequency yy FrequencyFrequency αα VoltageVoltage inin regionregion ofof interestinterest yy PowerPower increasesincreases asas thethe cubecube ofof FrequencyFrequency yy GoodGood newsnews isis thatthat voltagevoltage scalingscaling worksworks yy 10%10% reductionreduction inin voltagevoltage yieldsyields – 10% reduction in frequency – 30% reduction in power – less than 10% reduction in performance www.intel.com/education 2006 Intel Distinguished Lecture SimpleSimple DualDual CoreCore ExampleExample yy AssumeAssume SingleSingle CoreCore processorprocessor atat 100W100W – 80W for core, 20W for cache and I/O – 50% die are is core yy DualDual corecore withinwithin samesame powerpower envelopenvelop – 20W for I/O and cache – 40W per core – Die size increases by 50% – Reduce voltage by 21% to reduce core power to 40W – Frequency reduces by ~20% – Single thread perf reduces by ~15% – Throughput increases by 70-80% www.intel.com/education 2006 Intel Distinguished Lecture PossiblePossible ImprovementsImprovements yy DevelopDevelop newnew powerpower efficientefficient corecore – E.g. extensive clock gating – Big power savings with little or no performance loss yy DesignDesign aa smallersmaller corecore withwith lowerlower performanceperformance – Area and power savings much greater than performance loss – Use larger number of cores yy AdjustAdjust frequencyfrequency andand powerpower ofof eacheach corecore withwith loadload factorfactor – Inactive cores can be put in sleep mode – Maintain overall die power constant www.intel.com/education 2006 Intel Distinguished Lecture AA NewNew EraEra…… THE NEW Performance Equals IPC THE OLD Multi-Core Power Efficiency Microarchitecture Performance Advancements Equals Frequency Unconstrained Power Voltage Scaling
Recommended publications
  • AMD Athlon™ Processor X86 Code Optimization Guide
    AMD AthlonTM Processor x86 Code Optimization Guide © 2000 Advanced Micro Devices, Inc. All rights reserved. The contents of this document are provided in connection with Advanced Micro Devices, Inc. (“AMD”) products. AMD makes no representations or warranties with respect to the accuracy or completeness of the contents of this publication and reserves the right to make changes to specifications and product descriptions at any time without notice. No license, whether express, implied, arising by estoppel or otherwise, to any intellectual property rights is granted by this publication. Except as set forth in AMD’s Standard Terms and Conditions of Sale, AMD assumes no liability whatsoever, and disclaims any express or implied warranty, relating to its products including, but not limited to, the implied warranty of merchantability, fitness for a particular purpose, or infringement of any intellectual property right. AMD’s products are not designed, intended, authorized or warranted for use as components in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or in any other applica- tion in which the failure of AMD’s product could create a situation where per- sonal injury, death, or severe property or environmental damage may occur. AMD reserves the right to discontinue or make changes to its products at any time without notice. Trademarks AMD, the AMD logo, AMD Athlon, K6, 3DNow!, and combinations thereof, AMD-751, K86, and Super7 are trademarks, and AMD-K6 is a registered trademark of Advanced Micro Devices, Inc. Microsoft, Windows, and Windows NT are registered trademarks of Microsoft Corporation.
    [Show full text]
  • Inside Intel® Core™ Microarchitecture Setting New Standards for Energy-Efficient Performance
    White Paper Inside Intel® Core™ Microarchitecture Setting New Standards for Energy-Efficient Performance Ofri Wechsler Intel Fellow, Mobility Group Director, Mobility Microprocessor Architecture Intel Corporation White Paper Inside Intel®Core™ Microarchitecture Introduction Introduction 2 The Intel® Core™ microarchitecture is a new foundation for Intel®Core™ Microarchitecture Design Goals 3 Intel® architecture-based desktop, mobile, and mainstream server multi-core processors. This state-of-the-art multi-core optimized Delivering Energy-Efficient Performance 4 and power-efficient microarchitecture is designed to deliver Intel®Core™ Microarchitecture Innovations 5 increased performance and performance-per-watt—thus increasing Intel® Wide Dynamic Execution 6 overall energy efficiency. This new microarchitecture extends the energy efficient philosophy first delivered in Intel's mobile Intel® Intelligent Power Capability 8 microarchitecture found in the Intel® Pentium® M processor, and Intel® Advanced Smart Cache 8 greatly enhances it with many new and leading edge microar- Intel® Smart Memory Access 9 chitectural innovations as well as existing Intel NetBurst® microarchitecture features. What’s more, it incorporates many Intel® Advanced Digital Media Boost 10 new and significant innovations designed to optimize the Intel®Core™ Microarchitecture and Software 11 power, performance, and scalability of multi-core processors. Summary 12 The Intel Core microarchitecture shows Intel’s continued Learn More 12 innovation by delivering both greater energy efficiency Author Biographies 12 and compute capability required for the new workloads and usage models now making their way across computing. With its higher performance and low power, the new Intel Core microarchitecture will be the basis for many new solutions and form factors. In the home, these include higher performing, ultra-quiet, sleek and low-power computer designs, and new advances in more sophisticated, user-friendly entertainment systems.
    [Show full text]
  • Evolution of Microprocessor Performance
    EvolutionEvolution ofof MicroprocessorMicroprocessor PerformancePerformance So far we examined static & dynamic techniques to improve the performance of single-issue (scalar) pipelined CPU designs including: static & dynamic scheduling, static & dynamic branch predication. Even with these improvements, the restriction of issuing a single instruction per cycle still limits the ideal CPI = 1 Multiple Issue (CPI <1) Multi-cycle Pipelined T = I x CPI x C (single issue) Superscalar/VLIW/SMT Original (2002) Intel Predictions 1 GHz ? 15 GHz to ???? GHz IPC CPI > 10 1.1-10 0.5 - 1.1 .35 - .5 (?) Source: John P. Chen, Intel Labs We next examine the two approaches to achieve a CPI < 1 by issuing multiple instructions per cycle: 4th Edition: Chapter 2.6-2.8 (3rd Edition: Chapter 3.6, 3.7, 4.3 • Superscalar CPUs • Very Long Instruction Word (VLIW) CPUs. Single-issue Processor = Scalar Processor EECC551 - Shaaban Instructions Per Cycle (IPC) = 1/CPI EECC551 - Shaaban #1 lec # 6 Fall 2007 10-2-2007 ParallelismParallelism inin MicroprocessorMicroprocessor VLSIVLSI GenerationsGenerations Bit-level parallelism Instruction-level Thread-level (?) (TLP) 100,000,000 (ILP) Multiple micro-operations Superscalar /VLIW per cycle Simultaneous Single-issue CPI <1 u Multithreading SMT: (multi-cycle non-pipelined) Pipelined e.g. Intel’s Hyper-threading 10,000,000 CPI =1 u uuu u u Chip-Multiprocessors (CMPs) u Not Pipelined R10000 e.g IBM Power 4, 5 CPI >> 1 uuuuuuu u AMD Athlon64 X2 u uuuuu Intel Pentium D u uuuuuuuu u u 1,000,000 u uu uPentium u u uu i80386 u i80286
    [Show full text]
  • Intel® Core™ Microarchitecture • Wrap Up
    EW N IntelIntel®® CoreCore™™ MicroarchitectureMicroarchitecture MarchMarch 8,8, 20062006 Stephen L. Smith Bob Valentine Vice President Architect Digital Enterprise Group Intel Architecture Group Agenda • Multi-core Update and New Microarchitecture Level Set • New Intel® Core™ Microarchitecture • Wrap Up 2 Intel Multi-core Roadmap – Updates since Fall IDF 3 Ramping Multi-core Everywhere 4 All products and dates are preliminary and subject to change without notice. Refresher: What is Multi-Core? Two or more independent execution cores in the same processor Specific implementations will vary over time - driven by product implementation and manufacturing efficiencies • Best mix of product architecture and volume mfg capabilities – Architecture: Shared Caches vs. Independent Caches – Mfg capabilities: volume packaging technology • Designed to deliver performance, OEM and end user experience Single die (Monolithic) based processor Multi-Chip Processor Example: 90nm Pentium® D Example: Intel Core™ Duo Example: 65nm Pentium D Processor (Smithfield) Processor (Yonah) Processor (Presler) Core0 Core1 Core0 Core1 Core0 Core1 Front Side Bus Front Side Bus Front Side Bus *Not representative of actual die photos or relative size 5 Intel® Core™ Micro-architecture *Not representative of actual die photo or relative size 6 Intel Multi-core Roadmap 7 Intel Multi-core Roadmap 8 Intel® Core™ Microarchitecture Based Platforms Platform 2006 20072007 Caneland Platform (2007) MP Servers Tigerton (QC) (2007) Bensley Platform (Q2’06)/ Glidewell Platform (Q2’06) ) DP Servers/ Woodcrest (Q3’06) DP Workstation Clovertown (QC) (Q1’07) Kaylo Platform (Q3’06)/ Wyloway Platform (Q3 ’06) UP Servers/ Conroe (Q3’06) UP Workstation Kentsfield (QC) (Q1’07) Bridge Creek Platform (Mid’06) Desktop -Home Conroe (Q3’06) Kentsfield (QC) (Q1’07) Desktop -Office Averill Platform (Mid’06) Conroe (Q3’06) Mobile Client Napa Platform (Q1’06) Merom (2H’06) All products and dates are preliminary 9 Note: only Intel® Core™ microarchitecture QC refers to Quad-Core and subject to change without notice.
    [Show full text]
  • Nt* and Rtl* INT 2Eh CALL Ntdll!Kifastsystemcall
    ȘFĢ: Fųřțįm’ș Đěřįvǻțįvě Bỳ Jǿșěpħ Ŀǻňđřỳ ǻňđ Ųđį Șħǻmįř Țħě Ŀǻbș țěǻm ǻț ȘěňțįňěŀǾňě řěčěňțŀỳ đįșčǿvěřěđ ǻ șǿpħįșțįčǻțěđ mǻŀẅǻřě čǻmpǻįģň șpěčįfįčǻŀŀỳ țǻřģěțįňģ ǻț ŀěǻșț ǿňě Ěųřǿpěǻň ěňěřģỳ čǿmpǻňỳ. Ųpǿň đįșčǿvěřỳ, țħě țěǻm řěvěřșě ěňģįňěěřěđ țħě čǿđě ǻňđ běŀįěvěș țħǻț bǻșěđ ǿň țħě ňǻțųřě, běħǻvįǿř ǻňđ șǿpħįșțįčǻțįǿň ǿf țħě mǻŀẅǻřě ǻňđ țħě ěxțřěmě měǻșųřěș įț țǻķěș țǿ ěvǻđě đěțěčțįǿň, įț ŀįķěŀỳ pǿįňțș țǿ ǻ ňǻțįǿň-șțǻțě șpǿňșǿřěđ įňįțįǻțįvě, pǿțěňțįǻŀŀỳ ǿřįģįňǻțįňģ įň Ěǻșțěřň Ěųřǿpě. Țħě mǻŀẅǻřě įș mǿșț ŀįķěŀỳ ǻ đřǿppěř țǿǿŀ běįňģ ųșěđ țǿ ģǻįň ǻččěșș țǿ čǻřěfųŀŀỳ țǻřģěțěđ ňěțẅǿřķ ųșěřș, ẅħįčħ įș țħěň ųșěđ ěįțħěř țǿ įňțřǿđųčě țħě pǻỳŀǿǻđ, ẅħįčħ čǿųŀđ ěįțħěř ẅǿřķ țǿ ěxțřǻčț đǻțǻ ǿř įňșěřț țħě mǻŀẅǻřě țǿ pǿțěňțįǻŀŀỳ șħųț đǿẅň ǻň ěňěřģỳ ģřįđ. Țħě ěxpŀǿįț ǻffěčțș ǻŀŀ věřșįǿňș ǿf Mįčřǿșǿfț Ẅįňđǿẅș ǻňđ ħǻș běěň đěvěŀǿpěđ țǿ bỳpǻșș țřǻđįțįǿňǻŀ ǻňțįvįřųș șǿŀųțįǿňș, ňěxț-ģěňěřǻțįǿň fįřěẅǻŀŀș, ǻňđ ěvěň mǿřě řěčěňț ěňđpǿįňț șǿŀųțįǿňș țħǻț ųșě șǻňđbǿxįňģ țěčħňįqųěș țǿ đěțěčț ǻđvǻňčěđ mǻŀẅǻřě. (bįǿměțřįč řěǻđěřș ǻřě ňǿň-řěŀěvǻňț țǿ țħě bỳpǻșș / đěțěčțįǿň țěčħňįqųěș, țħě mǻŀẅǻřě ẅįŀŀ șțǿp ěxěčųțįňģ įf įț đěțěčțș țħě přěșěňčě ǿf șpěčįfįč bįǿměțřįč věňđǿř șǿfțẅǻřě). Ẅě běŀįěvě țħě mǻŀẅǻřě ẅǻș řěŀěǻșěđ įň Mǻỳ ǿf țħįș ỳěǻř ǻňđ įș șțįŀŀ ǻčțįvě. İț ěxħįbįțș țřǻįțș șěěň įň přěvįǿųș ňǻțįǿň-șțǻțě Řǿǿțķįțș, ǻňđ ǻppěǻřș țǿ ħǻvě běěň đěșįģňěđ bỳ mųŀțįpŀě đěvěŀǿpěřș ẅįțħ ħįģħ-ŀěvěŀ șķįŀŀș ǻňđ ǻččěșș țǿ čǿňșįđěřǻbŀě řěșǿųřčěș. Ẅě vǻŀįđǻțěđ țħįș mǻŀẅǻřě čǻmpǻįģň ǻģǻįňșț ȘěňțįňěŀǾňě ǻňđ čǿňfįřměđ țħě șțěpș ǿųțŀįňěđ běŀǿẅ ẅěřě đěțěčțěđ bỳ ǿųř Đỳňǻmįč Běħǻvįǿř Țřǻčķįňģ (ĐBȚ) ěňģįňě. Mǻŀẅǻřě Șỳňǿpșįș Țħįș șǻmpŀě ẅǻș ẅřįțțěň įň ǻ mǻňňěř țǿ ěvǻđě șțǻțįč ǻňđ běħǻvįǿřǻŀ đěțěčțįǿň. Mǻňỳ ǻňțį-șǻňđbǿxįňģ țěčħňįqųěș ǻřě ųțįŀįżěđ.
    [Show full text]
  • The Intel X86 Microarchitectures Map Version 2.0
    The Intel x86 Microarchitectures Map Version 2.0 P6 (1995, 0.50 to 0.35 μm) 8086 (1978, 3 µm) 80386 (1985, 1.5 to 1 µm) P5 (1993, 0.80 to 0.35 μm) NetBurst (2000 , 180 to 130 nm) Skylake (2015, 14 nm) Alternative Names: i686 Series: Alternative Names: iAPX 386, 386, i386 Alternative Names: Pentium, 80586, 586, i586 Alternative Names: Pentium 4, Pentium IV, P4 Alternative Names: SKL (Desktop and Mobile), SKX (Server) Series: Pentium Pro (used in desktops and servers) • 16-bit data bus: 8086 (iAPX Series: Series: Series: Series: • Variant: Klamath (1997, 0.35 μm) 86) • Desktop/Server: i386DX Desktop/Server: P5, P54C • Desktop: Willamette (180 nm) • Desktop: Desktop 6th Generation Core i5 (Skylake-S and Skylake-H) • Alternative Names: Pentium II, PII • 8-bit data bus: 8088 (iAPX • Desktop lower-performance: i386SX Desktop/Server higher-performance: P54CQS, P54CS • Desktop higher-performance: Northwood Pentium 4 (130 nm), Northwood B Pentium 4 HT (130 nm), • Desktop higher-performance: Desktop 6th Generation Core i7 (Skylake-S and Skylake-H), Desktop 7th Generation Core i7 X (Skylake-X), • Series: Klamath (used in desktops) 88) • Mobile: i386SL, 80376, i386EX, Mobile: P54C, P54LM Northwood C Pentium 4 HT (130 nm), Gallatin (Pentium 4 Extreme Edition 130 nm) Desktop 7th Generation Core i9 X (Skylake-X), Desktop 9th Generation Core i7 X (Skylake-X), Desktop 9th Generation Core i9 X (Skylake-X) • Variant: Deschutes (1998, 0.25 to 0.18 μm) i386CXSA, i386SXSA, i386CXSB Compatibility: Pentium OverDrive • Desktop lower-performance: Willamette-128
    [Show full text]
  • Inside Intel® Core™ Microarchitecture and Smart Memory Access an In-Depth Look at Intel Innovations for Accelerating Execution of Memory-Related Instructions
    White Paper Inside Intel® Core™ Microarchitecture and Smart Memory Access An In-Depth Look at Intel Innovations for Accelerating Execution of Memory-Related Instructions Jack Doweck Intel Principal Engineer, Merom Lead Architect Intel Corporation Entdecken Sie weitere interessante Artikel und News zum Thema auf all-electronics.de! Hier klicken & informieren! White Paper Intel Smart Memory Access and the Energy-Efficient Performance of the Intel Core Microarchitecture Introduction . 2 The Five Major Ingredients of Intel® Core™ Microarchitecture . 3 Intel® Wide Dynamic Execution . 3 Intel® Advanced Digital Media Boost . 4 Intel® Intelligent Power Capability . 4 Intel® Advanced Smart Cache . 5 Intel® Smart Memory Access . .5 How Intel Smart Memory Access Improves Execution Throughput . .6 Memory Disambiguation . 7 Predictor Lookup . 8 Load Dispatch . 8 Prediction Verification . .8 Watchdog Mechanism . .8 Instruction Pointer-Based (IP) Prefetcher to Level 1 Data Cache . .9 Traffic Control and Resource Allocation . 10 Prefetch Monitor . 10 Summary . .11 Author’s Bio . .11 Learn More . .11 References . .11 2 Intel Smart Memory Access and the Energy-Efficient Performance of the Intel Core Microarchitectures White Paper Introduction The Intel® Core™ microarchitecture is a new foundation for Intel® architecture-based desktop, mobile, and mainstream server multi-core processors. This state-of-the-art, power-efficient multi-core microarchi- tecture delivers increased performance and performance per watt, thus increasing overall energy efficiency. Intel Core microarchitecture extends the energy-efficient philosophy first delivered in Intel's mobile microarchitecture (Intel® Pentium® M processor), and greatly enhances it with many leading edge microarchitectural advancements, as well as some improvements on the best of Intel NetBurst® microarchitecture.
    [Show full text]
  • Intel Mobile CPU (2007-2010)
    Intel Mobile CPU (2007-2010) Brand Core 2 Extreme Core 2 Quad Core 2 Extreme Core i7 Processor # X9000 X9100 Q9000 Q9100 QX9xxx Voltage Extreme Extreme Power optimized Standard V Standard V Extreme Codename Penryn 6M Penryn QC Auburndale Clarksfield Platform Santa Rosa Montevina Montevina Calpella Micro-architecture Core MA Core MA Nehalem # of Core Dual Core Quad Core Dual Core Quad Core Hyper-Threading N/A 2 threads/core 2 threads/core Intel 64 Intel 64 Intel 64 Intel 64 VT VT Extended VT-x/d Extended VT-x/d EIST(SpeedStep) EIST(SpeedStep) EIST(SpeedStep) IDA N/A IDA Turbo Mode Cache 6MB L2 2x3MB L2 2x6MB 512KB L2+4MB L3 1MB L2+8MB L3 FSB FSB 800 FSB1066 FSB1066 PCIe x16/DMI PCIe x16/DMI Memory interface N/A N/A DDR3 x2 DDR3 x2 GPU core N/A N/A GPU core N/A Package PGA PGA rPGA989 rPGA989 Socket Socket P Socket P Process Technology 45nm 45nm 45nm 45nm # of Die 1 2 2 2 2(CPU+GMCH) 1 TDP 44W 45W 45W 45W 35W 45W? 45W? 55W Launch Q1'08 Q2'08 Q1'09 Q3'08 Q3'08 Q3'09 09 09 Q3'09 Brand Atom Celeron Core 2 Duo Core 2 Duo Core 2 Duo Processor # N2xx 7xx T8100/8300 T9300/9500 SP9200/9400 P8400/8600 P9500 T9400/9600 Voltage Standard V Standard V ULV Standard V Power optimized Power optimized Standard V Codename Diamondville SC Pineview SC Pineview DC Penryn Penryn 3M Penryn 6M Penryn 6M Penryn 3M Penryn 6M Platform Montevina Santa Rosa Montevina Micro-architecture Silverthorne Lincroft Core MA Core MA Core MA # of Core Single Core Single Core Dual Core Single Core Dual Core Dual Core Hyper-Threading 2 threads/core 2 threads/core N/A N/A N/A Intel
    [Show full text]
  • Energy Per Instruction Trends in Intel® Microprocessors
    Energy per Instruction Trends in Intel® Microprocessors Ed Grochowski, Murali Annavaram Microarchitecture Research Lab, Intel Corporation 2200 Mission College Blvd, Santa Clara, CA 95054 [email protected], [email protected] Abstract where throughput performance is the primary objective. In order to deliver high throughput performance within a Energy per Instruction (EPI) is a measure of the amount fixed power budget, a microprocessor must achieve low of energy expended by a microprocessor for each EPI. instruction that the microprocessor executes. In this It is important to note that MIPS/watt and EPI do not paper, we present an overview of EPI, explain the consider the amount of time (latency) needed to process factors that affect a microprocessor’s EPI, and derive a an instruction from start to finish. Other metrics such as MIPS 2/watt (related to energy•delay) and MIPS 3/watt historical comparison of the trends in EPI over multiple 2 generations of Intel microprocessors. We show that the (related to energy•delay ) assign increasing importance recent Intel® Pentium® M and Intel® Core™ Duo to the time required to process instructions, and are thus microprocessors achieve significantly lower EPI than used in environments in which latency performance is what would be expected from a continuation of historical the primary objective. trends. 2. What Determines EPI? 1. Introduction Consider a capacitor that is charged and discharged With the power consumption of recent desktop by a CMOS inverter as shown in Figure 1. microprocessors having reached 130 watts, power has emerged at the forefront of challenges facing the V microprocessor designer [1, 2].
    [Show full text]
  • 5 Microprocessors
    Color profile: Disabled Composite Default screen BaseTech / Mike Meyers’ CompTIA A+ Guide to Managing and Troubleshooting PCs / Mike Meyers / 380-8 / Chapter 5 5 Microprocessors “MEGAHERTZ: This is a really, really big hertz.” —DAVE BARRY In this chapter, you will learn or all practical purposes, the terms microprocessor and central processing how to Funit (CPU) mean the same thing: it’s that big chip inside your computer ■ Identify the core components of a that many people often describe as the brain of the system. You know that CPU CPU makers name their microprocessors in a fashion similar to the automobile ■ Describe the relationship of CPUs and memory industry: CPU names get a make and a model, such as Intel Core i7 or AMD ■ Explain the varieties of modern Phenom II X4. But what’s happening inside the CPU to make it able to do the CPUs amazing things asked of it every time you step up to the keyboard? ■ Install and upgrade CPUs 124 P:\010Comp\BaseTech\380-8\ch05.vp Friday, December 18, 2009 4:59:24 PM Color profile: Disabled Composite Default screen BaseTech / Mike Meyers’ CompTIA A+ Guide to Managing and Troubleshooting PCs / Mike Meyers / 380-8 / Chapter 5 Historical/Conceptual ■ CPU Core Components Although the computer might seem to act quite intelligently, comparing the CPU to a human brain hugely overstates its capabilities. A CPU functions more like a very powerful calculator than like a brain—but, oh, what a cal- culator! Today’s CPUs add, subtract, multiply, divide, and move billions of numbers per second.
    [Show full text]
  • Course #: CSI 440/540 High Perf Sci Comp I Fall ‘09
    High Performance Computing Course #: CSI 440/540 High Perf Sci Comp I Fall ‘09 Mark R. Gilder Email: [email protected] [email protected] CSI 440/540 This course investigates the latest trends in high-performance computing (HPC) evolution and examines key issues in developing algorithms capable of exploiting these architectures. Grading: Your grade in the course will be based on completion of assignments (40%), course project (35%), class presentation(15%), class participation (10%). Course Goals Understanding of the latest trends in HPC architecture evolution, Appreciation for the complexities in efficiently mapping algorithms onto HPC architectures, Familiarity with various program transformations in order to improve performance, Hands-on experience in design and implementation of algorithms for both shared & distributed memory parallel architectures using Pthreads, OpenMP and MPI. Experience in evaluating performance of parallel programs. Mark R. Gilder CSI 440/540 – SUNY Albany Fall '08 2 Grades 40% Homework assignments 35% Final project 15% Class presentations 10% Class participation Mark R. Gilder CSI 440/540 – SUNY Albany Fall '08 3 Homework Usually weekly w/some exceptions Must be turned in on time – no late homework assignments will be accepted All work must be your own - cheating will not be tolerated All references must be sited Assignments may consist of problems, programming, or a combination of both Mark R. Gilder CSI 440/540 – SUNY Albany Fall '08 4 Homework (continued) Detailed discussion of your results is expected – the program is only a small part of the problem Homework assignments will be posted on the class website along with all of the lecture notes ◦ http://www.cs.albany.edu/~gilder Mark R.
    [Show full text]
  • Itanium Processor
    IA-64 Microarchitecture --- Itanium Processor By Subin kizhakkeveettil CS B-S5….no:83 INDEX • INTRODUCTION • ARCHITECTURE • MEMORY ARCH: • INSTRUCTION FORMAT • INSTRUCTION EXECUTION • PIPELINE STAGES: • FLOATING POINT PERFORMANCE • INTEGER PERFORMANCE • CONCLUSION • REFERENCES Itanium Processor • First implementation of IA-64 • Compiler based exploitation of ILP • Also has many features of superscalar INTRODUCTION • Itanium is the brand name for 64-bit Intel microprocessors that implement the Intel Itanium architecture (formerly called IA-64). • Itanium's architecture differs dramatically from the x86 architectures (and the x86-64 extensions) used in other Intel processors. • The architecture is based on explicit instruction-level parallelism, in which the compiler makes the decisions about which instructions to execute in parallel Memory architecture • From 2002 to 2006, Itanium 2 processors shared a common cache hierarchy. They had 16 KB of Level 1 instruction cache and 16 KB of Level 1 data cache. The L2 cache was unified (both instruction and data) and is 256 KB. The Level 3 cache was also unified and varied in size from 1.5 MB to 24 MB. The 256 KB L2 cache contains sufficient logic to handle semaphore operations without disturbing the main arithmetic logic unit (ALU). • Main memory is accessed through a bus to an off-chip chipset. The Itanium 2 bus was initially called the McKinley bus, but is now usually referred to as the Itanium bus. The speed of the bus has increased steadily with new processor releases. The bus transfers 2x128 bits per clock cycle, so the 200 MHz McKinley bus transferred 6.4 GB/sand the 533 MHz Montecito bus transfers 17.056 GB/s.
    [Show full text]