Visual Computing:Computing: Graphicsgraphics Rere--Defineddefined
Total Page:16
File Type:pdf, Size:1020Kb
IntelIntel ArchitectureArchitecture PressPress BriefingBriefing StephenStephen L.L. SmithSmith Vice President Director, Digital Enterprise Group Operations RonakRonak SinghalSinghal Principal Engineer Digital Enterprise Group 17 March, 2008 TodayToday’’ss NewsNews IntelIntel Technology:Technology: DeliveringDelivering onon thethe PromisePromise Mission Critical Tick-Tock Expandable Server Nehalem Larrabee 2 Intel:Intel: TheThe ArchitectureArchitecture forfor LifeLife Internet Netbook HPC Workstation Smartphone MID Mobile PC Mission Critical Embedded Desktop PC Server Milli Watts IA Compatible and Scalable Peta FLOPs 3 Tukwila:Tukwila: DeliveringDelivering PerformancePerformance toto WorldWorld’’ss MostMost PowerfulPowerful ComputersComputers y Quad-core with 30 MB cache y 2 billion transistors y Multi-Threading Technology y Intel QuickPath interconnect y Dual Integrated Memory Controllers y Estimate >2x* performance y Mainframe-class RAS “With Intel’’s upcoming quad-core Tukwila processor, Windows Server solutions running on Itanium-based systems will provide an even more scalable, reliable, agile and dynamic datacenter foundation for our customers.” —Bill Laing, GM Windows Server & Solutions Division, Microsoft * Compared to Dual-core Itanium® Processor 9100 series 4 ProductProduct CadenceCadence forfor SustainedSustained LeadershipLeadership 20072007--0808 PenrynPenryn Processors 45nm45nm TICKTICK TOCKTOCK Delivering Products on Schedule and Moore’s Law 5 ExpandableExpandable andand Scalable:Scalable: QuadQuad--CoreCore IntelIntel®® XeonXeon®® processorprocessor 73007300 y Caneland platform built for virtualization and consolidation y Energy Efficient performance:: LeadingLeading inin benchmarksbenchmarks y Scalable y Enterprise proven reliability and investment protection y Great customer acceptance Industry’’s Virtualization Platform of Choice 6 ExpandableExpandable andand Scalable:Scalable: GetsGets BetterBetter withwith DunningtonDunnington y 6 core Processor y 1.9 billion transistors cores cores y 45nm Hi-K technology y 45nm Hi-K technology cache cache y 16 MB L3 cache system interface y Latest Intel virtualization capabilities cores y Socket compatible with cache Caneland platform y Available 2H’’08 7 EnergyEnergy Efficiency:Efficiency: TopTop SPECpowerSPECpower** ResultsResults SPECPower_ Rank Sponsor ssj2008 result Platform Processors 1 HP 778 DL180 G5 2x Intel® Xeon® E5450 2 Dell 719 PE 2950 III 2x Intel® Xeon® E5440 3 Dell 712 PE 1950 III 2x Intel® Xeon® E5440 4 HP 698 DL160 G5 2x Intel® Xeon® E5450 5 FSC 690 RX300 S4 2x Intel® Xeon® E5440 6 Dell 682 PE 2950 2x Intel® Xeon® E5440 7 FSC 667 TX150 S6 1x Intel® Xeon® X3220 8 HP 662 DL360 G5 2x Intel® Xeon® E5450 9 HP 546 DL580 G5 4x Intel® Xeon® L7345 10 Intel 468 SM 6025B 2x Intel® Xeon® L5335 First industry standard Energy Efficiency benchmark Public SPECpower results from http://www.spec.org/power_ssj2008/results/power_ssj2008.html as of Feb 28, 2008 SPECPower_ssj2008 results measured as ssj_ops/watt 8 *SPEC, the SPEC logo and SPECpower are registered trademarks of the Standard Performance Evaluation Corporation ProductProduct CadenceCadence forfor SustainedSustained LeadershipLeadership 20072007--0808 PenrynPenryn NehalemNehalem Processors Processors 45nm45nm 45nm45nm TICKTICK TOCKTOCK Driving Products to Deliver on Moore’s Law 9 NehalemNehalem MicroMicro--architecture:architecture: DynamicallyDynamically ScalableScalable andand InnovativeInnovative NewNew DesignDesign Scalable from 2 to 8 cores Micro-architecture enhancements (4 –wide) Integrated Memory Controller – 3 Ch DDR3 2-way simultaneous multi-threading Integrated memory controller Core 0 Core 1 Core 2 Core 3 QuickPath interconnect Q Shared and Inclusive Level-3 cache P Shared L3 Cache Dynamic power management I SSE 4.2 Production: Q4’’08 10 NehalemNehalem DesignDesign ScalableScalable ViaVia ModularityModularity Nehalem Building Ex: 4 Core Ex: 8 Core Block Library iGraphics IA Core QPI … … Cache … IMC Sample Range of Product Options 11 Block combinations are for illustration only and do not represent actual product plans. Block sizes are not indicative of die size contributions. Nehalem:Nehalem: CoreCore uArchuArch EnhancementsEnhancements Foundation: Intel® Core™ Microarchitecture Significant Performance and Efficiency Enhancements y Increased parallelism Concurrent uOps Possible 33% more ¾ 33% more micro-ops in flight possible 128 parallelism y Enhanced algorithms 112 96 ¾ Faster “unaligned” cache accesses 80 ¾ Faster synchronization primitives 64 48 y Further branch prediction enhancements 32 16 ¾ New 2nd level branch predictor 0 ¾ Renamed Return Stack Buffer Intel Core Intel Core 2 Nehalem BuildsBuilds uponupon IndustryIndustry LeadingLeading 44 InstructionInstruction issueissue IntelIntel®® CoreCore micromicro--architecturearchitecture 12 SimultaneousSimultaneous MultiMulti--ThreadingThreading (SMT)(SMT) y Each core able to execute two software threads simultaneously y Extremely power efficient y Enhanced with larger caches and more memory bandwidth y Benefits ¾ Highly threaded workloads (eg, multi-media apps, databases, search engines) ¾ Multi-Tasking scenarios SimultaneousSimultaneous MultiMulti--threadingthreading EnhancesEnhances PerformancePerformance andand EnergyEnergy EfficiencyEfficiency 13 EnhancedEnhanced CacheCache SubsystemSubsystem y New 3-level Cache Hierarchy 32K L1 32K L1 32K L1 32K L1 ¾ L1 cache same as Intel Core™ uArch I-cache I-cache I-cache I-cache 32K L1 32K L1 32K L1 32K L1 • 32 KB Instruction/32 KB Data D-Cache D-Cache D-Cache D-Cache ¾ New 256 KB/core, low latency L2 cache 256K 256K 256K 256K ¾ New Large 8MB fully-shared L3 cache L2 L2 L2 L2 Cache Cache Cache Cache • Inclusive Cache Policy - minimize snoop traffic 8 MB Last Level Cache y New 2-level TLB hierarchy ¾ Adds 2nd level 512 entry Translation Look-aside Buffer SuperiorSuperior multimulti--levellevel sharedshared cachecache extendsextends IntelIntel®® SmartSmart CacheCache technologytechnology 14 Nehalem/TylersburgNehalem/Tylersburg PlatformsPlatforms (High(High EndEnd DesktopDesktop andand Server/Workstation)Server/Workstation) Desktop Nehalem Nehalem Nehalem QPI QPI Tylersburg Tylersburg Tylersburg I/O Hub PCI Express* I/O Hub PCI Express* PCI Express* Gen 2 Gen 2 DMI Gen 2 DMI ICH Server ICH ICH y Intel® QuickPath Interconnect y Integrated DDR3 Memory ¾ New point to point Controller interconnect ¾ 3 channels per processor ¾ 2 links per CPU socket ¾ Massive amounts of Bandwidth ¾ Up to 25.6 Gb/sec total ¾ Significant Memory Latency bandwidth/link Reduction HugeHuge LatencyLatency DecreaseDecrease andand BandwidthBandwidth IncreaseIncrease overover PriorPrior GenerationGeneration 15 NehalemNehalem HighHigh EndEnd Desktop/ServerDesktop/Server IMCIMC Nehalem Nehalem y 3 channels per socket y Up to 3 DIMMs/channel Tylersburg y DDR3-800, 1066, 1333 ¾ Future scalability 2 Socket Memory Bandwidth* y Supports RDIMM and UDIMM 5 4 >4x y Very low latency Band- 3 width y Very high bandwidth 2 1 y Built-In RAS Features 0 y Built-In RAS Features Harpertown (1600 Nehalem(3xDDR3- FSB) 1333) LeadershipLeadership MemoryMemory BandwidthBandwidth 16 *Source: Intel internal measurement ProductProduct CadenceCadence forfor SustainedSustained LeadershipLeadership 20092009--1010 WestmereWestmere SandySandy BridgeBridge Processors Processors 3232 nmnm 3232 nmnm TICKTICK TOCKTOCK Continuing the Pace of Innovation 17 IntelIntelIntel®®® AdvancedAdvanced VectorVector ExtensionExtension (AVX)(AVX) 256256-bit-bit vectorvector extensionextension toto SSESSE foforr FPFP intensiveintensive applicationsapplications KEY FEATURES BENEFITS Wider Vectors Increased from 128 bit to 256 bit Up to 2x peak FLOPs output Enhanced Data Organize, access and pull only Rearrangement Use the new 256 bit primitives to necessary data broadcast, mask loads and do data more quickly and efficiently permutes Three Operand, Fewer register copies, better Non Destructive Syntax register use, more opportunities Designed for efficiency and future for parallel loads and compute extensibility operations, smaller code size 18 VisualVisual Computing:Computing: GraphicsGraphics ReRe--defineddefined Mainstream Graphics Visual Computing y Triangle / Rasterization y New life-like Rendering e.g. Global illumination y Rigid pipeline architecture y Programmable, ubiquitous architecture y Tools constrained by y High definition audio and video architecture processing y Inefficient for non-graphics y Combines with model based computing computing (e.g. Physics) 19 20 VisualVisual ComputingComputing Acquiring,Acquiring, Analyzing,Analyzing, ModelingModeling andand SynthesizingSynthesizing VisualVisual WorkloadsWorkloads Photorealistic Computational 3D Rendering Modeling Interactive High Definition User Interface Audio, Video 20 21 VisualVisual Computing:Computing: WhatWhat DoesDoes itit Take?Take? Intel Leadership • Platforms: Client, Workstation, Server • CPU, Graphics, Media Architecture • Process and Technology Leadership • Software, Tools & Developer Support Photorealistic Computational 3D Rendering Modeling Interactive High Definition User Interface Audio, Video 21 22 LarrabeeLarrabee:: VisualVisual ComputingComputing ArchitectureArchitecture y Many IA cores ¾ Scalable to TeraFLOPS y New cache architecture y New vector instruction set ¾ Vector memory operations ¾ Conditionals ¾ Integer and FP arithmetic y New vector processing unit / wide SIMD 22 IntelIntel SoftwareSoftware:: UnleashesUnleashes DeveloperDeveloper FreedomFreedom IndustryIndustry LeadingLeading IntelIntel®® SoftwareSoftware