OneAPI workshop with HLRN 2021, March 02-03, 2021
Dr. Jean-Laurent Philippe, EMEA HPC Technical and Sales Director, Intel DCG Sales
Agenda
• Server CPUs, focus on Ice Lake-SP processor
• Client CPUs, focus on Tiger Lake
• Intel HW Discrete Graphics Accelerators, focus on Ponte Vecchio
• Intel HW FPGAs
Intel Proprietary For OneAPI Workshop with HLRN 2021 March 02-03, 2021
Server CPUs, focus on Ice Lake-SP processor
3rd Gen Intel® Xeon® Scalable Processor (Ice Lake)
Optimized for an outstanding HPC and AI experience
• Better performance per core via new architecture
• Higher memory bandwidth: 8 DDR4 channels & 3200 MT/s
• Faster I/O with PCIe Gen 4
• Supporting exascale storage with up to 6TB memory / socket and PMem
• Security Innovations: Intel® SGX & crypto acceleration
• Volume Ramp in Q1 2021
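The memory bandwidth bullet can be turned into a peak figure with simple arithmetic. The sketch below is illustrative only: the channel count and transfer rate come from the slide, while the 8 bytes per transfer assumes the standard 64-bit DDR4 data bus per channel (ECC bits excluded). The result is a theoretical peak, not a measured value.

```python
# Peak theoretical DRAM bandwidth per socket for the configuration on the slide.
CHANNELS_PER_SOCKET = 8      # from the slide: 8 DDR4 channels
TRANSFER_RATE_MT_S = 3200    # from the slide: 3200 MT/s
BYTES_PER_TRANSFER = 8       # assumption: 64-bit data bus per DDR4 channel


def peak_bandwidth_gb_s(channels, mt_per_s, bytes_per_transfer):
    """Return the theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return channels * mt_per_s * 1e6 * bytes_per_transfer / 1e9


per_channel = peak_bandwidth_gb_s(1, TRANSFER_RATE_MT_S, BYTES_PER_TRANSFER)
per_socket = peak_bandwidth_gb_s(CHANNELS_PER_SOCKET, TRANSFER_RATE_MT_S, BYTES_PER_TRANSFER)

print(f"per channel: {per_channel:.1f} GB/s")
print(f"per socket:  {per_socket:.1f} GB/s")
```

Under these assumptions one channel peaks at 25.6 GB/s and a socket at 204.8 GB/s; sustained bandwidth in real workloads is lower.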
Ice Lake-SP is the new 3rd Gen Intel Xeon Scalable Processor
• 10nm+ process technology
• 2-socket Whitley platform
• Incorporates Sunny Cove core
• Brings scalable and balanced architecture for increased throughput and per-core performance across all workloads in the datacenter
Figure labels: Whitley 2-socket System; Ice Lake-SP (ICX) x2; UPI; PCIe Gen4; DMI; Lewisburg R; DDR4 DIMMs; DDR4/Intel® Optane™ Persistent Memory
Results have been estimated based on pre-production tests as of July 2020. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
Ice Lake-SP new core, new uncore
Ice Lake Server CPU Core – Sunny Cove
• Core Microarchitecture
• New ISA Usages for Server
Ice Lake Server – The SoC
• Scalable Infrastructure and Architecture
• Memory Hierarchy and IOs
• Performance and Power enhancements
Figure: Die picture of a 28C Ice Lake-SP die
Sunny Cove Core Microarchitecture
                                  Cascade Lake (per core)   Ice Lake (per core)
Out-of-order Window               224                       352
In-flight Loads + Stores          72 + 56                   128 + 72
Scheduler Entries                 97                        160
Register Files – Integer + FP     180 + 168                 280 + 224
Allocation Queue                  64/thread                 70/thread; 140/1 thread
L1D Cache (KB)                    32                        48
L1D BW (B/Cyc) – Load + Store     128 + 64                  128 + 64
L2 Unified TLB                    1.5K                      2K
Mid-level Cache (MB)              1                         1.25
• Improved Front-end: higher capacity and improved branch predictor
• Wider and deeper machine: wider allocation and execution resources + larger structures
• Enhancements in TLBs, single thread execution, prefetching
• Server enhancements – larger Mid-level Cache (L2) + second FMA
~18% Increase In IPC On Existing SPECcpu2017(est) Integer Rate Binaries
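To make the table easier to read, the per-core structure sizes can be expressed as relative growth from Cascade Lake to Ice Lake. This is a small illustrative sketch: the values are copied from the table, while the dictionary layout and the helper name `growth_percent` are assumptions made for the example. Structure growth is not a performance prediction; the slide's own IPC estimate is the ~18% figure.

```python
# (Cascade Lake, Ice Lake) per-core values copied from the table.
STRUCTURES = {
    "Out-of-order window": (224, 352),
    "In-flight loads": (72, 128),
    "In-flight stores": (56, 72),
    "Scheduler entries": (97, 160),
    "Integer register file": (180, 280),
    "FP register file": (168, 224),
    "L1D cache (KB)": (32, 48),
    "L2 unified TLB (K entries)": (1.5, 2.0),
    "Mid-level cache (MB)": (1.0, 1.25),
}


def growth_percent(old, new):
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


for name, (cascade_lake, ice_lake) in STRUCTURES.items():
    print(f"{name:28s} {cascade_lake:>7} -> {ice_lake:>7}  (+{growth_percent(cascade_lake, ice_lake):.1f}%)")
```

For example, the L1D cache grows by 50% (32 KB to 48 KB) and the mid-level cache by 25% (1 MB to 1.25 MB).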
Results have been estimated based on pre-production tests at iso core count, frequency and memory BW per core as of July 2020. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
AVX Frequency Improvements
Goal: Minimize frequency impact on AVX 512 instructions when not bounded by physical limits
• Not all AVX512 instructions consume high power
– 512-bit loads, 512-bit stores, 256-bit FP, integer multiply are a few examples
– Smarter mapping between instructions and specific power levels
Provides software writers greater latitude when using these instructions to optimize their code for performance

Power level   Class of instruction   Instruction types per class (not the full list)
0             SSE/256L               all 64b and 128bit bit operations
1             256H                   FP Mul, INT Mul, VNNI, FMA 256b
1             512L                   VPCLMUL, VAES, VBMI, Ld, St 512b
2             512H                   FP Mul, INT Mul, VNNI, FMA, VPMADD52 512b

Significantly better frequency profile over prior generation for AVX256 or AVX512 operations
Figure: Turbo Frequency scaling for 3 different instruction classes¹. Relative frequency per instruction class when power unconstrained, for SSE/256L, 256H/512L and 512H on SKX, CLX and ICX.
¹ Baseline: all cores active turbo frequency for SSE for each product
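The class-to-power-level mapping from the slide can be written down as a small lookup. This is a sketch for illustration only: the levels come from the slide's table, but the helper `kernel_power_level` and its rule that the heaviest class present determines the level are assumptions made here, not a description of the actual hardware policy.

```python
# Power level per instruction class, as listed in the slide's table.
POWER_LEVEL = {
    "SSE": 0,
    "256L": 0,
    "256H": 1,
    "512L": 1,
    "512H": 2,
}


def kernel_power_level(classes):
    """Hypothetical helper: return the highest power level among the
    instruction classes a kernel uses (assumed rule, see lead-in)."""
    return max(POWER_LEVEL[c] for c in classes)


# A kernel mixing SSE code with 512-bit loads/stores (class 512L).
print(kernel_power_level(["SSE", "512L"]))
# A kernel using 512-bit FMA (class 512H).
print(kernel_power_level(["256H", "512H"]))
```

Under this assumed rule, a kernel that only adds 512-bit loads and stores stays at level 1, while 512-bit FMA moves it to level 2.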
Results have been estimated based on pre-production part tests as of July 2020. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
Ice Lake-SP IO and Memory Hierarchy
Integrating PCIe Gen4 controllers
• New IO Virtualization design, enables up to 3x BW scaling on large payloads (2x frequency, larger TLB, supports 2M/1G pages for translation requests)
• New P2P credit fabric implementation to reach top P2P BW targets
3 independently clocked UPI links
4 Memory Controllers with enhanced per channel schedulers
• New memory controller design w/ optimizations
Intel® Total Memory Encryption (TME)
• DRAM encrypted using AES-XTS 128bit
Intel Optane Persistent Memory 200 Series (Barlow Pass)
• Higher speed and better power profile
Figure: Ice Lake SP (28 core example). Labels: SNC Core tiles, each with CHA/SF/LLC; UPI; PCIe Gen4; DMI/CBDMA; Memory Controllers with Ch 0 and Ch 1
Emphasis On Performance Scalability
Results have been estimated based on pre-production tests at iso core count and frequency as of July 2020. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
Intel® Speed Select Technology (Intel® SST) Features
Offers a suite of capabilities to allow users to re-configure the processor dynamically, at runtime, to match the usage / workload (WL) and maximize performance
• Intel Speed Select Technology–Performance Profile (Intel SST-PP)
• Intel Speed Select Technology–Base Frequency (Intel SST-BF)
• Intel Speed Select Technology–Core Power (Intel SST-CP)
• Intel Speed Select Technology–Turbo Frequency (Intel SST-TF)
Figure labels: Frequency vs. Core Count; Base Frequency (SST-BF On), Max Turbo Frequency, Base Frequency (SST-BF Off); Total Available Cores, Cores Running, Frequency/Power, High Priority WL, Min. Frequency, PCU
New Intel SST Capabilities Enable Prioritization Of Critical WLs with Ease Of Use & Deployment
Client CPUs, focus on Tiger Lake
The World’s Best Processor for thin and light laptops
• new CPU
• new GPU
• Industry leading AI experience
• best Wi-Fi 6
• Thunderbolt™ 4 integration
• deep software Optimizations
11th Gen Intel® Core™ Processor with Iris® Xe Graphics
As measured by Intel® Core™ i7-1185G7 processor’s status as the world’s best processor for productivity, creation, gaming, collaborating and entertainment on a thin and light laptop. For more complete information about performance and benchmark results, visit www.intel.com/11thgen.
• BEST for productivity
• BEST for creation
• BEST for gaming
• up to 2.7X faster photo editing
• up to 2X faster video editing
• up to 2X higher framerates on popular games
For more complete information about performance and benchmark results, visit www.intel.com/11thgen.
Project Athena Innovation Program
Innovating Beyond the CPU
• Thunderbolt™ 4
• compact antenna enabling edge-to-edge displays
• Wi-Fi 6 (Gig+): best in class wireless connectivity
• intelligent performance
Intel® Evo™
The best laptops for getting things done
Intel® Evo™ platform: Verified designs via Project Athena for an exceptional experience, anywhere
As measured by industry benchmark and Representative Usage Guides testing and unique features of 11th Gen Intel® Core™ processors. Intel's comprehensive laptop innovation program Project Athena ensures designs are tested, measured, and verified against a premium specification and key experience indicators. For more complete information about performance and benchmark results, visit www.intel.com/Evo.
Intel HW Discrete Graphics Accelerators, focus on Ponte Vecchio
NEXT ERA IN HPC DRIVEN BY INSATIABLE AI COMPUTE
Figure labels: timeline 1970, 1980, 1990, 2000, 2010, 2020; # OF HPC SYSTEMS with ticks 10’s, 1000’s, 100,000’s, 1,000,000’s, 10,000,000’s; EXASCALE
INTEL GPU IMPACT
Billion+ users reach
Gen11
ARCHITECTURE FROM TERAFLOPS TO EXASCALE
ONE GPU ARCHITECTURE
• HPC EXASCALE
• DATACENTER / AI
• ENTHUSIAST
• MID-RANGE
• INTEGRATED + ENTRY
TERAFLOPS SCALABILITY
COMPUTE AI PERFORMANCE
HPC PERFORMANCE
HPC FEATURES
SCALABILITY
MEMORY BANDWIDTH
UNIFIED MEMORY
HPC
Intel HW FPGAs
• 2X Core Performance
• 5.5M Logic Elements
• Up to 70% Lower Power
• Heterogeneous 3D SiP Integration
• Up to 10 TFLOPS
• Intel 14 nm Tri-Gate
• HBM2 DRAM In Package
• Quad-Core Cortex-A53 ARM Processor
• Most Comprehensive Security
Stratix 10 Advantages for System Architects
Performance & Efficiency
• 2X performance
• 70% lower power
• Up to 10 TFLOPS
System Integration
• Heterogeneous 3D SiP (System in Package) integration
• Up to 5.5M LE monolithic FPGA fabric
• Quad-Core ARM® A53
Security
• Most comprehensive security capabilities in a high-end FPGA
Figure labels: 400Gbit, 800Gbit, 1.2Tbit; f1, f2, f3, f4, f5
Heterogeneous System-in-Package (SiP) Integration
• Future SiP devices integrate various technologies with FPGAs
• Enables higher efficiency and flexibility
• Mixing process nodes and system functions into a single device
• Reducing board area and power
Figure labels: DAC / ADC & Analog; Optical; Other Hardened Protocols; Other ASIC; Processor