Intel Multi-Core Presentation

Multi-Core Microprocessor Chips: Motivation & Challenges

Dileep Bhandarkar, Ph.D.
Architect at Large, Digital Enterprise Group, Intel Corporation
May 2006
Copyright © 2006 Intel Corporation.
2006 Intel Distinguished Lecture

Agenda
• Semiconductor Technology Evolution
• Design Challenges
• Why Multi-Core Processor Chips?
• Power/Performance Trade-Offs
• CMP Directions
• Beyond CMP
• Summary

©2006, Intel Corporation. Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. *Other names and brands may be claimed as the property of others. www.intel.com/education

Intel only: On-time "2-year-cycle"

                  180nm     130nm     90nm       65nm       45nm
Wafer Size (mm):  200       200/300   300        300        300
1st Production:   1999      2001      2003       2005       2007
Transistors:      100nm LG  70nm LG   50nm LG    35nm LG    Details Coming!
                  CoSi2     CoSi2     NiSi       NiSi
                                      Strain Si  Strain Si
                                      SiGe       SiGe
Interconnects:    6 Al      6 Cu      7 Cu       8 Cu
                  SiOF      SiOF      Low-k      Low-k

45 nm Logic Process on Track for Delivery in 2007

Process Name:     P1262     P1264     P1266      P1268
Lithography:      90 nm     65 nm     45 nm      32 nm
1st Production:   2003      2005      2007       2009

Moore's Law continues!
Intel continues to develop a new technology generation every 2 years
Intel 11th EMEA Academic Forum

Historical Driving Forces

[Chart: Shrinking Geometry. Feature Size (um), log scale, 1970 to 2020]
[Chart: Increased Performance via Increased Frequency. Frequency (MHz), log scale, 1970 to 2020]

1971  4004 Processor     2300 transistors
1978  8008 Processor     IBM PC
1985  i386 Processor     32-bit
1993  Pentium Processor  3.1M transistors
2005  Montecito          1.7B transistors

The Challenges

[Chart: Power Limitations. CPU Power (W), log scale from 10 to 1000, 1990 to 2015]
[Chart: Diminishing Voltage Scaling. Supply Voltage (V), log scale from 0.1 to 10, 1990 to 2009, for process generations 0.7um, 0.5um, 0.35um, 0.25um, 0.18um, 0.13um, 90nm, 65nm, 45nm and 30nm; annotated "~30%"]

Power = Capacitance x Voltage^2 x Frequency
also Power ~ Voltage^3
Design Challenges
• Memory latency not scaling as fast as processor speed
• Power growing non-linearly with single-thread performance
• Designer productivity lagging design complexity
• Ability to validate and test complex designs
• Keeping up with new process technology every two years

Long Latency DRAM Accesses: Needs Latency Tolerant Techniques

[Chart: Peak Instructions during DRAM Access (axis 0 to 1400) for the Pentium® Processor (66 MHz), Pentium-Pro Processor (200 MHz), Pentium III Processor (1100 MHz), Pentium 4 Processor (2 GHz) and Future CPUs]

DRAM Latency Tolerance
• Continue building even larger caches
  – Every semiconductor process generation provides opportunity to double cache size
  – Cache becomes larger part of die
• Hide memory latency behind multiple threads of execution
• Intel implemented simultaneous multi-threading in 2000
• Implement multi-core products as Moore's Law allows
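The trend in the DRAM access chart can be approximated with a short calculation: the number of instruction slots a core gives up while waiting on one DRAM access is roughly the access latency in cycles times the peak issue rate. The sketch below is illustrative only. The clock frequencies come from the slide, but the 100 ns latency and the peak issue rate of 3 are assumptions chosen for the example (held constant across processors for simplicity), not values read from the chart.

```python
def peak_instructions_during_access(freq_mhz, dram_latency_ns, peak_ipc):
    """Instruction slots that pass while one DRAM access is outstanding."""
    # MHz * ns = 1e6 * 1e-9 = 1e-3 cycles, hence the division by 1000.
    cycles = freq_mhz * dram_latency_ns / 1000.0
    return cycles * peak_ipc


# Assumptions for illustration only (not taken from the slide).
ASSUMED_LATENCY_NS = 100.0
ASSUMED_PEAK_IPC = 3

# Clock frequencies as listed on the slide.
clocks_mhz = {
    "Pentium": 66,
    "Pentium-Pro": 200,
    "Pentium III": 1100,
    "Pentium 4": 2000,
}

lost = {
    name: peak_instructions_during_access(mhz, ASSUMED_LATENCY_NS, ASSUMED_PEAK_IPC)
    for name, mhz in clocks_mhz.items()
}

for name, slots in lost.items():
    print(f"{name:12s} {slots:8.1f} instruction slots per DRAM access")
```

With DRAM latency held fixed in nanoseconds, the cost of a miss grows in direct proportion to clock frequency, which is the motivation for the latency-tolerance techniques listed above.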
Situational Analysis
• With each process generation, transistor density doubles
  – Frequency has increased by ~1.5x; ~1.3x in future
  – Vcc has scaled by ~0.8x; ~0.9x in future
  – Capacitance has scaled by 0.7x
  – Total power may not scale down due to increased leakage
• Instruction Level Parallelism harder to find
• Increasing single-stream performance often requires non-linear increase in design complexity
• Many server applications are inherently parallel
• Parallelism exists in multimedia applications
• Multi-tasking usage models becoming popular

Processor Power

[Figure not recoverable from the extracted text]

Design Complexity and Productivity factors
• Huge transistor budgets stress ability to design and verify complex chips
• Multi-core fits well with increasing transistor budgets
• Multi-core design addresses density/designer gap
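The per-generation factors on the Situational Analysis slide can be combined with Power = Capacitance x Voltage^2 x Frequency to see why total power stops scaling down. The sketch below is a rough estimate under two assumptions that are not stated in the slides: the 0.7x capacitance factor is treated as per transistor, so doubling the transistor count doubles the switched capacitance, and leakage is ignored entirely.

```python
def dynamic_power_scale(cap_scale, vcc_scale, freq_scale, transistor_scale=1.0):
    """Relative dynamic power after one process generation (P ~ C * V^2 * f)."""
    return transistor_scale * cap_scale * vcc_scale ** 2 * freq_scale


# Factors quoted on the slide: capacitance 0.7x, Vcc 0.8x (0.9x in future),
# frequency 1.5x (1.3x in future).
historical_same_design = dynamic_power_scale(0.7, 0.8, 1.5)
future_same_design = dynamic_power_scale(0.7, 0.9, 1.3)

# Assumption: using the doubled density means switching twice as many transistors.
historical_full_die = dynamic_power_scale(0.7, 0.8, 1.5, transistor_scale=2.0)
future_full_die = dynamic_power_scale(0.7, 0.9, 1.3, transistor_scale=2.0)

print(f"same design, historical scaling: {historical_same_design:.3f}x")
print(f"same design, future scaling:     {future_same_design:.3f}x")
print(f"2x transistors, historical:      {historical_full_die:.3f}x")
print(f"2x transistors, future:          {future_full_die:.3f}x")
```

Under these assumptions a shrunk design saves less power per generation as voltage scaling slows (about 0.67x historically versus about 0.74x in future), and a die that spends the doubled transistor budget at full frequency draws more power than its predecessor even before leakage is counted.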
Iron Law of Performance
• Execution Time is the product of
  – Path Length
  – Cycles Per Instruction (CPI)
  – Cycle Time
• CPI is the sum of
  – infinite-cache core CPI
  – miss rate * effective memory latency
• Bad (good) news is that performance does not scale up (down) linearly with frequency

The Magic of Voltage Scaling
• Power = Capacitance * Voltage^2 * Frequency
• Frequency α Voltage in region of interest
• Power increases as the cube of Frequency
• Good news is that voltage scaling works
• 10% reduction in voltage yields
  – 10% reduction in frequency
  – 30% reduction in power
  – less than 10% reduction in performance

Simple Dual Core Example
• Assume Single Core processor at 100W
  – 80W for core, 20W for cache and I/O
  – 50% of die area is core
• Dual core within same power envelope
  – 20W for I/O and cache
  – 40W per core
  – Die size increases by 50%
  – Reduce voltage by 21% to reduce core power to 40W
  – Frequency reduces by ~20%
  – Single thread perf reduces by ~15%
  – Throughput increases by 70-80%

Possible Improvements
• Develop new power efficient core
  – E.g. extensive clock gating
  – Big power savings with little or no performance loss
• Design a smaller core with lower performance
  – Area and power savings much greater than performance loss
  – Use larger number of cores
• Adjust frequency and power of each core with load factor
  – Inactive cores can be put in sleep mode
  – Maintain overall die power constant

A New Era…

THE OLD
  Performance Equals Frequency
  Unconstrained Power
  Voltage Scaling

THE NEW
  Performance Equals IPC
  Multi-Core
  Power Efficiency
  Microarchitecture Advancements
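The Iron Law of Performance slide can be sketched as a small model showing why performance does not track frequency linearly. The function below follows the slide's decomposition (path length x CPI x cycle time, with CPI split into a core term and a memory term). All numeric inputs are hypothetical values picked for the illustration, and memory latency is assumed to be fixed in nanoseconds so that it costs fewer cycles at a lower clock.

```python
def execution_time_s(path_length, core_cpi, misses_per_instr, mem_latency_ns, freq_ghz):
    """Iron Law: execution time = path length * CPI * cycle time."""
    cycle_time_ns = 1.0 / freq_ghz
    # Memory latency is constant in ns, so its cost in cycles depends on the clock.
    mem_latency_cycles = mem_latency_ns / cycle_time_ns
    cpi = core_cpi + misses_per_instr * mem_latency_cycles
    return path_length * cpi * cycle_time_ns * 1e-9


# Hypothetical workload: 1e9 instructions, infinite-cache core CPI of 1.0,
# 0.005 misses per instruction, 100 ns effective memory latency.
t_fast = execution_time_s(1e9, 1.0, 0.005, 100.0, freq_ghz=2.0)
t_slow = execution_time_s(1e9, 1.0, 0.005, 100.0, freq_ghz=1.6)

relative_perf = t_fast / t_slow
print(f"2.0 GHz: {t_fast:.3f} s, 1.6 GHz: {t_slow:.3f} s")
print(f"20% lower frequency keeps {relative_perf:.1%} of the performance")
```

In this example a 20% frequency cut costs only about 11% of performance, because the memory-stall portion of execution time is unchanged. The exact loss depends on how memory-bound the workload is.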
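The arithmetic behind the voltage scaling and Simple Dual Core Example slides can be checked with a few lines. This is a sketch assuming, as the slides do, that frequency is proportional to voltage so that core power scales with the cube of voltage. The 15% single-thread loss is taken from the slide as an input rather than derived, and leakage is ignored.

```python
def scaled_core_power(base_core_w, voltage_scale):
    """Core power after voltage scaling, with frequency proportional to voltage."""
    return base_core_w * voltage_scale ** 3


# Voltage scaling slide: a 10% voltage reduction.
power_saving_10pct = 1.0 - 0.9 ** 3

# Dual-core slide: 80 W core, 20 W cache and I/O, voltage reduced by 21%.
UNCORE_W = 20.0
core_w = scaled_core_power(80.0, voltage_scale=0.79)
dual_core_total_w = UNCORE_W + 2 * core_w

# Single-thread performance loss of ~15% is quoted on the slide (assumed here).
per_core_perf = 0.85
throughput_gain = 2 * per_core_perf - 1.0

print(f"10% lower voltage saves {power_saving_10pct:.1%} of core power")
print(f"each core: {core_w:.1f} W, dual-core total: {dual_core_total_w:.1f} W")
print(f"throughput gain: {throughput_gain:.0%}")
```

The cube law gives a 27% power saving for a 10% voltage reduction, which the slide rounds to 30%, and two cores at 79% voltage land just under the original 100 W envelope with roughly 70% more throughput, the low end of the slide's 70-80% range.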
