OneAPI Workshop with HLRN 2021, March 02-03, 2021

Dr. Jean-Laurent Philippe, EMEA HPC Technical and Sales Director, DCG Sales

Agenda

• Server CPUs, focus on Ice Lake-SP processor

• Client CPUs, focus on Tiger Lake

• Intel HW Discrete Graphics Accelerators, focus on Ponte Vecchio

• Intel HW FPGAs

Intel Proprietary – For OneAPI Workshop with HLRN 2021 – March 02-03, 2021

Agenda: Server CPUs, focus on Ice Lake-SP processor

3rd Gen Intel® Xeon® Scalable Processor (Ice Lake): optimized for an outstanding HPC and AI experience

• Better performance per core via new architecture

• Higher memory bandwidth: 8 DDR4 channels at 3200 MT/s

• Faster I/O with PCIe Gen 4

• Supporting exascale storage with up to 6 TB memory per socket and PMem

• Security innovations: Intel® SGX and crypto acceleration

• Volume ramp in Q1 2021
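The sketch below is a rough sanity check on the memory and I/O figures above. It computes theoretical peak bandwidth under three assumptions:

- each DDR4 channel has the standard 64-bit (8-byte) data bus;
- PCIe Gen 4 runs at 16 GT/s per lane with 128b/130b encoding;
- the link is a hypothetical x16.

Sustained bandwidth, for example as measured by STREAM, is lower than these peaks.

```python
# Theoretical peak bandwidth from the slide's figures (a sketch, not a measurement).

DDR4_CHANNELS = 8            # per socket, from the slide
DDR4_MT_PER_S = 3200e6       # DDR4-3200: 3200 mega-transfers per second
DDR4_BYTES_PER_TRANSFER = 8  # assumption: standard 64-bit data bus, ECC bits excluded

PCIE4_GT_PER_S = 16e9        # PCIe Gen 4 raw rate per lane, per direction
PCIE4_ENCODING = 128 / 130   # 128b/130b line-encoding efficiency
PCIE_LANES = 16              # assumption: one hypothetical x16 link


def ddr_peak_gbs(channels: int, transfers_per_s: float, bytes_per_transfer: int) -> float:
    """Peak DRAM bandwidth in GB/s (10**9 bytes per second)."""
    return channels * transfers_per_s * bytes_per_transfer / 1e9


def pcie_peak_gbs(lanes: int) -> float:
    """Peak PCIe Gen 4 bandwidth per direction in GB/s, before TLP/protocol overhead."""
    bits_per_s = lanes * PCIE4_GT_PER_S * PCIE4_ENCODING
    return bits_per_s / 8 / 1e9


mem_per_socket = ddr_peak_gbs(DDR4_CHANNELS, DDR4_MT_PER_S, DDR4_BYTES_PER_TRANSFER)
print(f"DDR4 per socket: {mem_per_socket:.1f} GB/s")              # 204.8 GB/s
print(f"DDR4, 2 sockets: {2 * mem_per_socket:.1f} GB/s")          # 409.6 GB/s
print(f"PCIe Gen 4 x16, per direction: {pcie_peak_gbs(PCIE_LANES):.1f} GB/s")  # 31.5 GB/s
```

The two-socket figure simply doubles the per-socket result for a 2-socket Whitley-style system. It ignores any NUMA or UPI effects.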

Ice Lake-SP is the new 3rd Gen Intel Xeon Scalable Processor

• 10nm+ process technology

• 2-socket Whitley platform

• Incorporates Sunny Cove core

• Brings scalable and balanced architecture for increased throughput and per-core performance across all workloads in the datacenter

[Diagram: Whitley 2-socket system with two Ice Lake-SP (ICX) sockets linked by UPI; each socket has PCIe Gen4 and a DMI link to a Lewisburg R; memory shown as DDR4 DIMMs and DDR4/Intel® Optane™ Persistent Memory]

Results have been estimated based on pre-production tests as of July 2020. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.

Ice Lake-SP: new core, new uncore

Ice Lake Server CPU Core – Sunny Cove

• Core

• New ISA usages for server

Ice Lake Server – The SoC

• Scalable infrastructure and architecture

• Memory hierarchy and IOs

• Performance and power enhancements

[Die picture of a 28-core Ice Lake-SP die]

Sunny Cove Core Microarchitecture

| Feature | Cascade Lake (per core) | Ice Lake (per core) |
|---|---|---|
| Out-of-order window | 224 | 352 |
| In-flight loads + stores | 72 + 56 | 128 + 72 |
| Scheduler entries | 97 | 160 |
| Register files: integer + FP | 180 + 168 | 280 + 224 |
| Allocation queue | 64/thread | 70/thread; 140/1 thread |
| L1D cache (KB) | 32 | 48 |
| L1D BW (B/cycle): load + store | 128 + 64 | 128 + 64 |
| L2 unified TLB | 1.5K | 2K |
| Mid-level cache (MB) | 1 | 1.25 |

• Improved front end: higher capacity and improved branch predictor

• Wider and deeper machine: wider allocation and execution resources + larger structures

• Enhancements in TLBs, single-thread execution, prefetching

• Server enhancements: larger mid-level cache (L2) + second FMA

~18% increase in IPC on existing SPECcpu2017 (est) integer rate binaries
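The second FMA unit matters most for dense floating-point code. The sketch below works out peak per-core throughput, assuming both 512-bit FMA units issue one AVX-512 FMA every cycle. The 28-core count matches the die pictured above. The 2.0 GHz clock is a hypothetical placeholder, because the real AVX-512 frequency depends on the SKU and on the instruction class being executed.

```python
# Peak floating-point operations per cycle for one Sunny Cove server core,
# assuming both 512-bit FMA units issue one FMA every cycle.

VECTOR_BITS = 512        # AVX-512 register width
FMA_UNITS = 2            # the "second FMA" listed on the slide
FLOPS_PER_FMA_LANE = 2   # a fused multiply-add counts as one multiply + one add


def flops_per_cycle(element_bits: int, fma_units: int = FMA_UNITS) -> int:
    """FLOPs per core per cycle for a given element size (64 = FP64, 32 = FP32)."""
    lanes = VECTOR_BITS // element_bits
    return lanes * FLOPS_PER_FMA_LANE * fma_units


def peak_gflops(cores: int, ghz: float, element_bits: int) -> float:
    """Peak GFLOP/s for `cores` cores all running at `ghz` GHz."""
    return cores * ghz * flops_per_cycle(element_bits)


print(flops_per_cycle(64))               # 32 FP64 FLOPs/cycle with two FMA units
print(flops_per_cycle(64, fma_units=1))  # 16 with a single FMA unit
print(flops_per_cycle(32))               # 64 FP32 FLOPs/cycle
# Hypothetical example: 28 cores at an assumed 2.0 GHz AVX-512 clock.
print(peak_gflops(28, 2.0, 64))          # 1792.0
```

This is an upper bound. Real kernels also need the load bandwidth shown in the table to keep both FMA units fed.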

Results have been estimated based on pre-production tests at iso core count, frequency and memory BW per core as of July 2020. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.

AVX Frequency Improvements

Goal: minimize the frequency impact of AVX-512 instructions when not bounded by physical limits.

| Class of power level | Instruction types | Instructions per class (not the full list) |
|---|---|---|
| 0 | SSE/256L | All 64-bit and 128-bit operations |
| 1 | 256H | FP Mul, INT Mul, VNNI, FMA (256-bit) |
| 1 | 512L | VPCLMUL, VAES, VBMI, loads, stores (512-bit) |
| 2 | 512H | FP Mul, INT Mul, VNNI, FMA, VPMADD52 (512-bit) |

• Not all AVX-512 instructions consume high power
– 512-bit loads, 512-bit stores, 256-bit FP and integer multiply are a few examples
– Smarter mapping between instructions and specific power levels

• Significantly better frequency profile than the prior generation for 256-bit AVX and AVX-512 operations

• Gives software writers greater latitude, when power is unconstrained, to use these instructions to optimize their code for performance

[Chart: turbo frequency scaling for 3 different instruction classes¹; relative frequency per instruction class (SSE/256L, 256H/512L, 512H) for SKX, CLX and ICX]

¹Baseline: all-cores-active turbo frequency for SSE for each product
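The class table above can be read as a lookup from (operation, vector width) to power class. The sketch below has a few limits that should be kept in mind:

- it encodes only the example instructions named on the slide;
- treating unlisted 256-bit operations as 256L (class 0) is an assumption;
- the relative frequencies are hypothetical placeholders, not Intel's chart values.

It also illustrates the trade-off the slide alludes to. Under an idealized model that ignores memory-bound behavior, wider vectors win whenever the frequency drop is smaller than the width gain.

```python
# Sketch of the instruction-to-power-class mapping listed on the slide.
# Only the example instructions named on the slide are covered.

HEAVY_256 = {"fp_mul", "int_mul", "vnni", "fma"}              # 256H -> class 1
LIGHT_512 = {"vpclmul", "vaes", "vbmi", "load", "store"}      # 512L -> class 1
HEAVY_512 = {"fp_mul", "int_mul", "vnni", "fma", "vpmadd52"}  # 512H -> class 2


def power_class(op: str, width_bits: int) -> int:
    """Return the power-level class (0, 1 or 2) for an operation and vector width."""
    if width_bits <= 128:
        return 0  # SSE: all 64-bit and 128-bit operations
    if width_bits == 256:
        # Assumption: 256-bit operations not listed as 256H fall under 256L (class 0).
        return 1 if op in HEAVY_256 else 0
    if width_bits == 512:
        if op in HEAVY_512:
            return 2
        if op in LIGHT_512:
            return 1
        raise KeyError(f"512-bit '{op}' is not among the slide's examples")
    raise ValueError(f"unsupported vector width: {width_bits}")


def relative_throughput(width_a: int, freq_a: float, width_b: int, freq_b: float) -> float:
    """Idealized throughput of width_a code vs width_b code: width x relative clock."""
    return (width_a * freq_a) / (width_b * freq_b)


# Hypothetical relative frequencies per class (not the chart's actual values).
rel_freq = {0: 1.00, 1: 0.95, 2: 0.85}
speedup = relative_throughput(512, rel_freq[power_class("fma", 512)],
                              256, rel_freq[power_class("fma", 256)])
print(f"512-bit vs 256-bit FMA, idealized: {speedup:.2f}x")  # 1.79x
```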

Results have been estimated based on pre-production part tests as of July 2020. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.

Ice Lake-SP IO and Memory Hierarchy

Integrating PCIe Gen4 controllers

• New IO virtualization design enables up to 3x bandwidth scaling on large payloads (2x frequency, larger TLB, support for 2M/1G pages for translation requests)

• New P2P credit fabric implementation to reach top P2P bandwidth targets

3 independently clocked UPI links

4 memory controllers with enhanced per-channel schedulers

• New design with optimizations

Intel® Total Memory Encryption (TME)

• DRAM encrypted using AES-XTS 128-bit

Intel® Optane™ Persistent Memory 200 Series (Barlow Pass)

• Higher speed and better power profile

[Diagram: Ice Lake-SP (28-core example) mesh of SNC cores with CHA/SF/LLC slices, 4 memory controllers with channels Ch 0 and Ch 1 each, UPI links, PCIe Gen4, DMI/PCIe Gen4 and CBDMA]

Emphasis On Performance Scalability
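2M/1G page support helps large I/O payloads partly because each page needs its own address translation. The sketch below counts translations for a hypothetical page-aligned 1 GiB DMA buffer at each page size, with 4K pages assumed as the baseline. It is an idealized count. It does not model the 3x bandwidth figure quoted above.

```python
# Count the address translations needed to map one DMA buffer
# for different page sizes (idealized: page-aligned buffer).

PAGE_SIZES = {"4K": 4 * 1024, "2M": 2 * 1024**2, "1G": 1024**3}


def translations_needed(buffer_bytes: int, page_bytes: int) -> int:
    """Ceiling division: a partially used last page still needs a translation."""
    return -(-buffer_bytes // page_bytes)


BUFFER_BYTES = 1024**3  # hypothetical 1 GiB payload

for name, size in PAGE_SIZES.items():
    print(name, translations_needed(BUFFER_BYTES, size))
# 4K 262144
# 2M 512
# 1G 1
```

Fewer translations per payload means fewer IOTLB entries and misses for the same transfer. This is why larger pages pair naturally with the larger TLB.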

Results have been estimated based on pre-production tests at iso core count and frequency as of July 2020. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.

Intel® Speed Select Technology (Intel® SST) Features

Offers a suite of capabilities that let users reconfigure the processor dynamically, at runtime, to match the usage or workload (WL) and maximize performance.

• Intel Speed Select Technology–Performance Profile (Intel SST-PP)

• Intel Speed Select Technology–Base Frequency (Intel SST-BF)

• Intel Speed Select Technology–Core Power (Intel SST-CP)

• Intel Speed Select Technology–Turbo Frequency (Intel SST-TF)

[Figures: SST-PP frequency vs. core count; SST-BF per-core base frequency with SST-BF on and off, against max turbo frequency; SST-CP and SST-TF diagram with labels for total available cores, cores running, frequency/power, high-priority WL, min. frequency and PCU]

New Intel SST capabilities enable prioritization of critical workloads (WLs) with ease of use and deployment.

SPR

Agenda: Client CPUs, focus on Tiger Lake

The world's best processor for thin and light laptops

• New CPU

• New GPU

• Best AI experience

• Deep software optimizations

• Thunderbolt™ 4 integration

• Industry-leading Wi-Fi 6

11th Gen Intel® Core™ Processor with Iris® Xe Graphics

• Best for productivity

• Best for creation

• Best for gaming

As measured by Intel® Core™ i7-1185G7 processor's status as the world's best processor for productivity, creation, gaming, collaborating and entertainment on a thin and light laptop. For more complete information about performance and benchmark results, visit www.intel.com/11thgen.

• Up to 2.7X faster photo editing

• Up to 2X faster video editing

• Up to 2X higher framerates on popular games

Project Athena Innovation Program: Innovating Beyond the CPU

• Thunderbolt™ 4

• Compact antenna enabling edge-to-edge displays

• Wi-Fi 6 (Gig+): best-in-class wireless connectivity

• Intelligent performance

Intel® Evo™: the best laptops for getting things done

Intel® Evo™ platform: verified designs via Project Athena for an exceptional experience, anywhere

As measured by industry benchmark and Representative Usage Guides testing and unique features of 11th Gen Intel® Core™ processors. Intel's comprehensive laptop innovation program Project Athena ensures designs are tested, measured, and verified against a premium specification and key experience indicators. For more complete information about performance and benchmark results, visit www.intel.com/Evo.

Agenda: Intel HW Discrete Graphics Accelerators, focus on Ponte Vecchio

Next era in HPC: driven by insatiable AI compute

[Chart: number of HPC systems, from 10's to 10,000,000's, over 1970 to 2020, heading toward exascale]

Intel GPU impact: billion+ users reach

One GPU architecture from teraflops to exascale

[Diagram: one GPU architecture, starting from Gen11, spanning integrated + entry, mid-range, enthusiast, datacenter/AI and HPC exascale; labels: teraflops scalability, compute AI performance, HPC performance, HPC features, scalability, memory bandwidth, unified memory]

Agenda: Intel HW FPGAs

• 2X core performance

• 5.5M logic elements

• Up to 70% lower power

• Heterogeneous 3D SiP integration

• Up to 10 TFLOPS

• Intel 14 nm Tri-Gate

• HBM2 in-package DRAM

• Quad-core ARM Cortex-A53 processor

• Most comprehensive security

Advantages for System Architects

• Performance & efficiency: 2X performance, 70% lower power, up to 10 TFLOPS

• System integration: heterogeneous 3D SiP (System in Package) integration

• FPGA: up to 5.5M LE monolithic fabric

• Quad-core ARM® A53

• Security: most comprehensive security capabilities in a high-end FPGA

[Diagram: FPGA fabric in a 3D SiP with tiles f1 to f5; bandwidth labels 400 Gbit, 800 Gbit and 1.2 Tbit]

Heterogeneous System-in-Package (SiP) Integration

Future SiP devices integrate various technologies with FPGAs:

• Enable higher efficiency and flexibility

• Mix process nodes and system functions into a single device

• Reduce board area and power

[Diagram: FPGA in package with DAC/ADC & other analog, optical, other ASIC, processor and hardened protocols]
