Intel® Omni-Path Architecture Overview and Update
The architecture for Discovery
June, 2016
Intel Confidential

Caught in the Vortex…?
• Business Efficiency & Agility
• Data: Trust, Privacy, Sovereignty
• Innovation: New Economy Biz Models
• Macro Economic Effect
• Growth Enablers/Inhibitors

Intel® Solutions Summit 2016

Data Center Blocks
• Reduce Complexity: Intel engineering, validation, support
• Speed time to market: Begin with a higher level of integration
• Increase Value: Reduce TCO, value pricing
• Fuel innovation: Focus R&D on value-add and differentiation
Server blocks for specific segments (HPC, Cloud, Enterprise, Storage): VSAN Ready Node, HPC Compute Block, SMB Server Block

A Holistic Design Solution for All HPC Needs
Intel® Scalable System Framework
• Small Clusters Through Supercomputers
• Compute and Data-Centric Computing
• Standards-Based Programmability
• On-Premise and Cloud-Based

Compute: Intel® Xeon® Processors; Intel® Xeon Phi™ Processors; Intel® Xeon Phi™ Coprocessors; Intel® Server Boards and Platforms
Memory/Storage: Intel® Solutions for Lustre*; Intel® SSDs; Intel® Optane™ Technology; 3D XPoint™ Technology
Fabric: Intel® Omni-Path Architecture; Intel® True Scale Fabric; Intel® Ethernet; Intel® Silicon Photonics
Software: HPC System Software Stack; Intel® Software Tools; Intel® Cluster Ready Program; Intel® Visualization Toolkit

Parallel is the Path Forward
Intel® Xeon® and Intel® Xeon Phi™ Product Families are both going parallel.
How do we attain extremely high compute density for parallel workloads AND maintain the robust programming models and tools that developers crave?
Processor                                               Core(s)   Threads    SIMD Width
Intel® Xeon® processor 64-bit                           1         2          128
Intel® Xeon® processor 5100 series                      2         2          128
Intel® Xeon® processor 5500 series                      4         8          128
Intel® Xeon® processor 5600 series                      6         12         128
Intel® Xeon® processor code-named Sandy Bridge EP       8         16         256 (AVX)
Intel® Xeon® processor code-named Ivy Bridge EP         12        24         256 (AVX)
Intel® Xeon® processor code-named Haswell EP            18        36         256 (AVX2)
Intel® Xeon Phi™ coprocessor Knights Corner             57-61     228-244    512
Intel® Xeon Phi™ coprocessor Knights Landing¹           72        288        2 x 512

More cores, more threads, wider vectors.
*Product specification for launched and shipped products available on ark.intel.com.
1. Not launched - in planning.

Tick-Tock Development Model
Sustained Microprocessor Leadership

Microarchitecture        Product        Process   Step   What is new
Nehalem (SSE)            Nehalem        45nm      TOCK   New microarchitecture
Nehalem (SSE)            Westmere       32nm      TICK   New process technology
Sandy Bridge (AVX)       Sandy Bridge   32nm      TOCK   New microarchitecture
Sandy Bridge (AVX)       Ivy Bridge     22nm      TICK   New process technology
Haswell (AVX2)           Haswell        22nm      TOCK   New microarchitecture
Haswell (AVX2)           Broadwell      14nm      TICK   New process technology
SkyLake (AVX512)         SkyLake        14nm      TOCK   New microarchitecture
SkyLake (AVX512)         Future         XXnm      TICK   New process technology

Typically, an increase in transistor density enables new capabilities, higher performance levels, and greater energy efficiency.

Intel® Xeon® processor E5-2600 v4 product family
Grantley-Refresh Overview
• Broadwell microarchitecture
• Built on 14nm process technology
• Socket compatible# replacement for Intel® Xeon® processor E5-2600 v3 on Grantley
• Several new features and capabilities

Feature                               Xeon E5-2600 v3 (Haswell-EP)                   Xeon E5-2600 v4 (Broadwell-EP)
Cores Per Socket                      Up to 18                                       Up to 22
Threads Per Socket                    Up to 36 threads                               Up to 44 threads
Last-level Cache (LLC)                Up to 45 MB                                    Up to 55 MB
QPI Speed (GT/s)                      2x QPI 1.1 channels 6.4, 8.0, 9.6 GT/s (same for both)
PCIe* Lanes/Controllers/Speed (GT/s)  40 / 10 / PCIe* 3.0 (2.5, 5, 8 GT/s) (same for both)
Memory Population                     4 channels of up to 3 RDIMMs or 3 LRDIMMs      + 3DS LRDIMM&
Max Memory Speed                      Up to 2133                                     Up to 2400
TDP (W)                               160 (Workstation only), 145, 135, 120, 105, 90, 85, 65, 55 (same for both)

# Requires BIOS and firmware update
& Depends on market availability
All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice.
Intel may make changes to specifications and product descriptions at any time, without notice.

Intel® Xeon Phi™ Product Family
Highly-parallel processing to power your breakthrough innovations
“Meet Knight's Landing: Intel's most powerful chip ever is overflowing with cutting-edge technologies” – PC World 06/2014

Available Today: Knights Corner, Intel® Xeon Phi™ x100 Product Family
• 22 nm process
• Coprocessor only
• >1 TF DP Peak
• Up to 61 Cores
• Up to 16GB GDDR5

Coming Soon: Knights Landing, Intel® Xeon Phi™ x200 Product Family
• 14 nm process
• Host Processor & Coprocessor
• >3 TF DP Peak¹
• Up to 72 Cores
• Up to 16GB HBM
• Up to 384GB DDR4²
• ~500 GB/s STREAM
• Integrated Fabric²

Future: Knights Hill, 3rd generation
• 10 nm process
• Integrated Fabric (2nd Generation)
• In Planning…

*Results will vary. This simplified test is the result of the distillation of the more in-depth programming guide found here: https://software.intel.com/sites/default/files/article/383067/is-xeon-phi-right-for-me.pdf
1 Over 3 Teraflops of peak theoretical double-precision performance is preliminary and based on current expectations of cores, clock frequency and floating point operations per cycle. FLOPS = cores x clock frequency x floating-point operations per cycle.
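The FLOPS formula in the footnote above can be sketched as a short calculation. This is a minimal illustration, not Intel's published method: the function names are hypothetical, the 1.4 GHz clock is an assumed placeholder (the slides give no clock frequency), and counting a fused multiply-add as two floating-point operations per lane is an assumption. Only the core count (72) and the two 512-bit vector units per core come from the slides.

```python
def flops_per_cycle(vpus_per_core: int, vector_bits: int,
                    element_bits: int = 64, fma: bool = True) -> int:
    """Floating-point operations one core can issue per cycle.

    Assumption: every vector unit processes one full-width vector per
    cycle, and a fused multiply-add counts as two operations per lane.
    """
    lanes = vector_bits // element_bits   # 512 / 64 = 8 double-precision lanes
    ops_per_lane = 2 if fma else 1
    return vpus_per_core * lanes * ops_per_lane


def peak_flops(cores: int, clock_hz: float, per_cycle: int) -> float:
    """FLOPS = cores x clock frequency x floating-point operations per cycle."""
    return cores * clock_hz * per_cycle


if __name__ == "__main__":
    # 72 cores and 2 x 512-bit VPUs per core are taken from the slides;
    # the 1.4 GHz clock is an illustrative assumption only.
    per_cycle = flops_per_cycle(vpus_per_core=2, vector_bits=512)
    peak = peak_flops(cores=72, clock_hz=1.4e9, per_cycle=per_cycle)
    print(f"{per_cycle} DP FLOPs/cycle/core, {peak / 1e12:.2f} TF peak")
```

Under these assumptions the result is roughly 3.2 TF, which is in line with the ">3 TF DP Peak" figure quoted for Knights Landing; a different clock frequency would shift the number proportionally.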
2 Host processor only

Three (3) Knights Landing Products

Knights Landing Processor (Groveport Platform)
“Self-boot” Intel® Xeon Phi™ processor platform
Solution for future clusters with both Xeon and Xeon Phi
• Binary-compatible with Intel® Xeon® processor (Skylake)
• Higher performance density for highly parallel applications²
• Reduced system power consumption²
• Higher perf/Watt & perf/$$³

Knights Landing Coprocessor (Ingredient in Grantley/Purley Platform)
Requires Intel® Xeon® processor host
Solution for general purpose servers and workstations
• Targeted for applications with larger sections of serial work¹
• Upgrade path from Knights Corner as PCIe* card

*Other names and brands may be claimed as the property of others.
1 Projections based on early product definition and as compared to prior generation Intel® Xeon Phi™ Coprocessors
2 Based on Intel internal analysis. Lower power based on power consumption estimates between (2) HCAs compared to 15W additional power for KNL-F. Higher density based on removal of PCIe* slots and associated HCAs populated in those slots.
3 Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.
For more info, download the Groveport (KNL) Snapshot: https://sharepoint.amr.ith.intel.com/sites/snapshot/Groveport

Knights Landing Architectural Diagram
• Over 3 TF DP peak
• Full Xeon ISA compatibility through AVX-512
• ~3x single-thread vs. Knights Corner
• Up to 72 cores, 2D mesh architecture
• Tile: 2 cores, 2 VPUs per core, 1MB L2, HUB
• 2x 512b VPU (Vector Processing Units) per core
• Up to 16GB high-bandwidth on-package memory (MCDRAM), exposed as NUMA node, ~500 GB/s sustained BW
• DDR4: 6 channels, up to 384GB
• PCIe Gen3 x36
• Wellsburg PCH over DMI (common with Grantley PCH)
• HFI: on-package integrated fabric (Storm Lake), 2 ports, 50 GB/s bi-directional, Micro-Coax Cable (IFP)
• Based on Intel® Atom Silvermont processor with many HPC enhancements: deep out-of-order buffers, gather/scatter in hardware, improved branch prediction, 4 threads/core, high cache bandwidth & more

Intel® True Scale Fabric
~10% of Verbs-based Instructions
• Network Infrastructure - Optimized Price/Performance interconnect for HPC
• Host Architecture - High MPI message rate & low end-to-end latency
• Scalable Switch Solution - Performance & Latency scales with network

Intel® Omni-Path Architecture: Changing the Fabric Landscape
[Roadmap figure: CPU-Fabric Integration over Time, optimizing • Performance • Density • Power • Cost]
• Discrete PCIe HFI: Intel® OPA HFI Card + Intel® Xeon® processor E5-2600 v3
• Multi-chip package integration: Intel® Xeon Phi™ processor (Knights Landing), Next Intel® Xeon® processor
• Next Generation: Next Intel® Xeon Phi™ processor (Knights Hill), Next Intel® Xeon® processor

Intel® Omni-Path Architecture Product Family
• HFI ASIC (Wolf River): Intel OPA Gen1 Host Fabric Interface (HFI) Silicon; 2 x 100 Gbps, 50 GB/sec Fabric Bandwidth
• Switch ASIC (Prairie River): Intel OPA Gen1 Switch Silicon; 48 ports, 9.6Tb/s, 1200 GB/sec Fabric Bandwidth
• Software: Intel® Fabric Suite [based on OFA with Intel® OPA support]
• Cables: Passive Copper & Active Optical Cable (AOC)

Product Line
• Standard PCIe Board (Chippewa Forest): Low Profile PCIe v3.0 x16; Low Profile PCIe v3.0 x8
• Integrated Xeon® and Xeon Phi™: Knights Landing: 2 x 100 Gbps ports
• Edge Switch¹ (Eldorado Forest): 24 / 48 port individual QSFP28 ports
• Director Class Switches (DCS) (Sawtooth Forest): 768-port (20U)², 192-port (7U)², 1056-port (in planning), 264-port (in planning); Full Bisection Bandwidth; QSFP-based leaf module
• Cables: Passive Cu; Short reach – QSFP28 Cu cables
• Custom Mezz & PCIe Cards¹: OEM products based on Wolf
• Custom Switches: OEM products based on Prairie