The IBM POWER9 Scale Up Processor

Jeff Stuecheli, William Starke
POWER Systems, IBM Systems

© 2018 IBM Corporation

POWER Processor Technology Roadmap

POWER7 (45 nm, 1H10): Enterprise
- 8 Cores
- SMT4
- eDRAM L3 Cache

POWER7+ (32 nm, 2H12): Enterprise
- 2.5x Larger L3 cache
- On-die acceleration
- Zero-power core idle state

POWER8 Family (22nm, 1H14 – 2H16): Enterprise & Optimized
- Up to 12 Cores
- SMT8
- CAPI Acceleration
- High Bandwidth GPU Attach

POWER9 Family (14nm, 2H17 – 2H18+): Built for the Cognitive Era
- Enhanced Core and Chip Architecture Optimized for Emerging Workloads
- Processor Family with Scale-Up and Scale-Out Optimized Silicon
- Premier Platform for Accelerated Computing

POWER9 Processor – Common Features

New Core
• Stronger thread performance
• Efficient agile
• POWER ISA v3.0

Enhanced Cache Hierarchy
• 120MB NUCA L3 architecture
• 12 x 20-way associative regions
• Advanced replacement policies
• Fed by 7 TB/s on-chip bandwidth

Cloud + Virtualization Innovation
• Quality of service assists
• New interrupt architecture
• Workload optimized frequency
• Hardware enforced trusted execution

Leadership Hardware Acceleration Platform
• Enhanced on-chip acceleration
• NVLink 2.0: High bandwidth and advanced new features (25G)
• CAPI 2.0: Coherent accelerator and storage attach (PCIe G4)
• OpenCAPI: Improved latency and bandwidth, open interface (25G)

State of the Art I/O Subsystem
• PCIe Gen4 – 48 lanes

High Bandwidth Signaling Technology
• 16 Gb/s interface
  – Local SMP
• PowerAXON 25 GT/sec Link interface
  – Accelerator, remote SMP

14nm finFET Semiconductor Process
• Improved device performance and reduced energy
• 17 layer metal stack and eDRAM
• 8.0 billion transistors

[Die diagram labels: Core, L2, L3 Region, SMP/Accelerator Signaling, Memory Signaling, PCIe Signaling, PCIe, On-Chip Accel, SMP Interconnect & Off-Chip Accelerator Enablement]
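The L3 figures above (120MB split into 12 regions, each 20-way associative) imply a per-region geometry. The sketch below works it out under two assumptions that are not on the slide: a 128-byte cache line, and "MB" read as 2^20 bytes. The function name is illustrative only.

```python
# Per-region L3 geometry implied by the slide's figures.
# From the slide: 120 MB total, 12 regions, 20-way associative.
# Assumptions (not on the slide): 128-byte cache lines, 1 MB = 2**20 bytes.

MIB = 2**20
L3_TOTAL_MB = 120
L3_REGIONS = 12
WAYS = 20
LINE_BYTES = 128  # assumption


def l3_region_geometry(total_mb=L3_TOTAL_MB, regions=L3_REGIONS,
                       ways=WAYS, line_bytes=LINE_BYTES):
    """Return (bytes per region, lines per region, sets per region)."""
    region_bytes = total_mb * MIB // regions
    lines = region_bytes // line_bytes
    sets = lines // ways
    return region_bytes, lines, sets


region_bytes, lines, sets = l3_region_geometry()
print(f"region size: {region_bytes // MIB} MB")
print(f"lines per region: {lines}")
print(f"sets per region: {sets}")
```

Under these assumptions each region is 10 MB and the set count comes out as a power of two, which is what a simple address-indexed lookup would want.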

POWER9 – Dual Memory Subsystems

Scale Out: Direct Attach Memory
8 Direct DDR4 Ports
• Up to 150 GB/s of sustained bandwidth
• Low latency access
• Commodity packaging form factor
• Adaptive 64B / 128B reads

Scale Up: Buffered Memory
8 Buffered Channels
• Up to 230GB/s of sustained bandwidth
• Extreme capacity – up to 8TB / socket
• Superior RAS with chip kill and lane sparing
• Compatible with POWER8 system memory
• Agnostic interface for alternate memory innovations
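The 230GB/s buffered-memory figure can be cross-checked against the per-channel widths given on the Scale Up Chipset Block Diagram slide (1B write, 2B read @ 9.6 GHz, 8 channels per processor). The sketch below is one reading that reproduces the number; treating "9.6 GHz" as 9.6 billion transfers per second, and adding read and write bandwidth together, are assumptions.

```python
# Raw bandwidth of the buffered (scale-up) memory channels.
# From the slides: 8 channels per processor, 2 bytes read and 1 byte write
# per transfer, 9.6 GHz signaling.
# Assumptions: 9.6 GHz means 9.6 GT/s, and the quoted total counts
# read and write traffic together.

CHANNELS = 8
TRANSFER_RATE_GT_S = 9.6
READ_BYTES_PER_TRANSFER = 2
WRITE_BYTES_PER_TRANSFER = 1


def channel_bandwidth_gb_s(bytes_per_transfer, rate_gt_s=TRANSFER_RATE_GT_S):
    """GB/s carried by one channel in one direction."""
    return bytes_per_transfer * rate_gt_s


read_total = CHANNELS * channel_bandwidth_gb_s(READ_BYTES_PER_TRANSFER)
write_total = CHANNELS * channel_bandwidth_gb_s(WRITE_BYTES_PER_TRANSFER)
combined = read_total + write_total

print(f"read:     {read_total:.1f} GB/s")
print(f"write:    {write_total:.1f} GB/s")
print(f"combined: {combined:.1f} GB/s")
```

The 2:1 read-to-write split means the figure is only approached by workloads with a similar mix of reads and writes.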

PowerAXON: High-speed 25 GT/s Signaling

• Utilize Best-of-Breed 25 GT/s Optical-Style Signaling Technology
• Flexible & Modular Packaging Infrastructure

[Diagram labels: Power Processor (x2), Coherence, OpenCAPI, App, FPGA]

Scale Out: Direct Attach Memory, 2 Socket SMP
• PowerAXON: two x24 blocks
• Memory: two Mem Cntl blocks, DDR x4 each
• SMP x2
• PCIe

Scale Up: Buffered Memory, 16 Socket SMP
• PowerAXON: two x48 blocks
• Memory: two Mem Cntl blocks, DMI x4 each
• SMP x3
• PCIe

Scale Up Chipset Block Diagram

POWER9 Scale-Up processor
• Processor Cores, Cache and Interconnect
• PCI G4 I/O: x48 (16 GHz)
• Local SMP Links: 3x32 (16 GHz)
• Accel Attach
• Remote SMP Link Control, OpenCAPI Control, NVLINK Control
• PowerAXON 25 GHz Signaling: x96 (lane groups shown: 4x8, 6x8, 4x24)
  – Connect to other POWER9 Scale-up chips, GPU's, and OpenCAPI Devices
• Memory Control, Framing / Parsing
  – 8 Channels per Processor
  – Command, 1B write, 2B read @ 9.6 GHz

Centaur Memory Buffer
• Framing / Parsing
• Operation Sched and Data Control
• 16 MB Cache
• 4 ports of 1600 MHz DDR4 DRAM

[Other diagram labels: Active Operations 64 + 12, Bypass, 4 ops, 32B, 4 KB, 2 KB, 10B, 8 x]

Four Socket Server Topology

[Diagram: four POWER9 Scale-Up chips connected by 16 GHz x32 links; label: 1 TB Buffered Memory]

Sixteen Socket Server Topology

[Diagram: sixteen POWER9 Scale-Up chips in four rows of four; link labels: 16 GHz x32 and 25 GHz x24]

Power E980 Server

• Modular, Scalable POWER9 server
  – 1 to 4 x 5U CEC drawers + 2U Control Unit
• POWER9 Enterprise processor
• Up to 192 cores in a single system
• Up to 64 TB DDR4 memory
• Up to 32 PCIe Gen4 slots
• PowerAXON 25Gb/s ports
  – Used for SMP cabling between nodes - 4x bandwidth improvement
  – Enabled for OpenCAPI accelerators
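The E980 maximums line up with the sixteen socket topology shown earlier. A minimal sketch of the arithmetic, assuming four sockets per CEC drawer and the 12-core option from the roadmap's "12/24 cores" (neither is stated on this slide; the names below are illustrative):

```python
# How the E980 maximums follow from per-socket figures.
# From the slides: 1 to 4 CEC drawers, up to 192 cores, up to 64 TB memory,
# buffered memory capacity of up to 8 TB per socket.
# Assumptions: 4 sockets per drawer, 12 cores per socket.

MAX_DRAWERS = 4
SOCKETS_PER_DRAWER = 4   # assumption
CORES_PER_SOCKET = 12    # assumption
MAX_MEMORY_TB = 64
SOCKET_CAPACITY_LIMIT_TB = 8


def system_totals(drawers):
    """Return (sockets, cores) for a system with the given drawer count."""
    if not 1 <= drawers <= MAX_DRAWERS:
        raise ValueError("drawers must be between 1 and 4")
    sockets = drawers * SOCKETS_PER_DRAWER
    return sockets, sockets * CORES_PER_SOCKET


for drawers in range(1, MAX_DRAWERS + 1):
    sockets, cores = system_totals(drawers)
    print(f"{drawers} drawer(s): {sockets} sockets, {cores} cores")

max_sockets, max_cores = system_totals(MAX_DRAWERS)
memory_per_socket_tb = MAX_MEMORY_TB / max_sockets
print(f"memory per socket at 64 TB: {memory_per_socket_tb:.0f} TB")
```

Under these assumptions the 64 TB system maximum corresponds to 4 TB per socket, which sits below the 8TB / socket limit quoted for the buffered memory subsystem.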

PCIe Expansion Drawer

[Diagram labels: PCIe NVMe Bays; SMP/etc Conn.; Cable Card (x2); CXP Conn (x4); FPGA (x2); Fan-out Module (x2)]

GPU on PowerAXON

[Diagram: four POWER9 Scale-Up chips linked at 16 GHz x32, with 24 GPUs attached over 25 GHz x8 links]

Demonstration of POWER9 capability, does not imply specific system plans

Storage Class Memory on PowerAXON

[Diagram: four POWER9 Scale-Up chips linked at 16 GHz x32, with sixteen 32 TB SCM devices attached over 25 GHz x8 links]

Demonstration of POWER9 capability, does not imply specific system plans

Future Evolution of System Architecture

Yesterday's Plumbing, Tomorrow's Differentiation

[Diagram labels: Cores & Caches; System Bus; B (buffer) blocks; JEDEC Buffer; OpenCAPI Northbound; 768 GB/s OpenCAPI & PCI; PCI Gen X; OpenCAPI / NVLink]

CPU/Accelerator Bandwidth: 1x, 2x, 5x, 7-10x
System bottleneck

OpenCAPI Memory

• Signaling: AXON @25.6GHz vs DDR4 @ 3200 MHz
  – 4 times the bandwidth per IO
• Idle latency over traditional DDR
  – POWER8/9 Centaur design ~10 ns
  – OpenCAPI target of ~5 ns
• Centaur: One proprietary design
• OpenCAPI: Open

[Diagram label: SCM]
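The "4 times the bandwidth per IO" claim can be reproduced from the two signaling rates if "IO" is read as a physical pin. That reading is an assumption: the sketch takes an AXON lane to be a differential pair (two pins) and a DDR4 data signal to be single-ended (one pin), which the slide does not state.

```python
# Per-pin bandwidth comparison: 25.6 GT/s AXON lane vs 3200 MT/s DDR4 signal.
# From the slide: the two rates and the "4 times per IO" claim.
# Assumptions: an AXON lane uses 2 pins (differential pair),
# a DDR4 data bit uses 1 pin (single-ended).

AXON_RATE_MT_S = 25_600
DDR4_RATE_MT_S = 3_200
PINS_PER_AXON_LANE = 2      # assumption
PINS_PER_DDR4_DATA_BIT = 1  # assumption


def per_pin_rate(rate_mt_s, pins):
    """Megabits per second carried per physical pin."""
    return rate_mt_s / pins


raw_ratio = AXON_RATE_MT_S / DDR4_RATE_MT_S
per_io_ratio = (per_pin_rate(AXON_RATE_MT_S, PINS_PER_AXON_LANE)
                / per_pin_rate(DDR4_RATE_MT_S, PINS_PER_DDR4_DATA_BIT))

print(f"raw signaling ratio: {raw_ratio:.0f}x")
print(f"per-pin ratio:       {per_io_ratio:.0f}x")
```

The raw rate ratio is 8x; halving it for the second pin of each differential pair gives the 4x quoted on the slide.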

Proposed POWER Processor Technology and I/O Roadmap

Processor         Year   Cores  Process  Sustained memory BW  Standard I/O   Advanced I/O signaling  Advanced I/O architecture
POWER7            2010   8      45nm     65 GB/s              PCIe Gen2      N/A                     N/A
POWER7+           2012   8      32nm     65 GB/s              PCIe Gen2      N/A                     N/A
POWER8            2014   12     22nm     210 GB/s             PCIe Gen3      N/A                     CAPI 1.0
POWER8 w/ NVLink  2016   12     22nm     210 GB/s             PCIe Gen3      20 GT/s, 160GB/s        CAPI 1.0, NVLink 1.0
P9 SO             2017   12/24  14nm     150 GB/s             PCIe Gen4 x48  25 GT/s, 300GB/s        CAPI 2.0, OpenCAPI3.0, NVLink2.0
P9 SU             2018   12/24  14nm     210 GB/s             PCIe Gen4 x48  25 GT/s, 300GB/s        CAPI 2.0, OpenCAPI3.0, NVLink2.0
P9 w/ Adv. I/O    2019   12/24  14nm     350+ GB/s            PCIe Gen4 x48  25 GT/s, 300GB/s        CAPI 2.0, OpenCAPI4.0, NVLink3.0
P10               2020+  TBA             435+ GB/s            PCIe Gen5      32 & 50 GT/s            TBA

POWER7 Architecture
• POWER7: New Micro-Architecture, New Process Technology
• POWER7+: Enhanced Micro-Architecture, New Process Technology

POWER8 Architecture
• POWER8: New Micro-Architecture, New Process Technology
• POWER8 w/ NVLink: Enhanced Micro-Architecture With NVLink

POWER9 Architecture
• P9 SO: New Micro-Architecture, Direct attach memory, New Process Technology
• P9 SU (today's talk): Enhanced Micro-Architecture, Buffered Memory
• P9 w/ Adv. I/O: Enhanced Micro-Architecture, New Memory Subsystem

POWER10
• P10: New Micro-Architecture, New Technology
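The Advanced I/O Signaling bandwidths in the roadmap (160GB/s at 20 GT/s, 300GB/s at 25 GT/s) are consistent with a simple lanes x rate calculation. The sketch below shows one combination that reproduces them; the lane counts (32 and 48) and the choice to count both directions are assumptions, since the slides do not say how the totals are derived.

```python
# Raw link bandwidth from lane count and signaling rate.
# From the roadmap: 20 GT/s -> 160GB/s, 25 GT/s -> 300GB/s.
# Assumptions: 32 lanes at 20 GT/s, 48 lanes at 25 GT/s, one bit per lane
# per transfer, both directions counted, no encoding overhead.

BITS_PER_BYTE = 8


def link_bandwidth_gb_s(lanes, rate_gt_s, directions=2):
    """Raw bandwidth in GB/s for a set of serial lanes."""
    return lanes * rate_gt_s * directions / BITS_PER_BYTE


power8_nvlink = link_bandwidth_gb_s(lanes=32, rate_gt_s=20)  # assumed lane count
power9_axon = link_bandwidth_gb_s(lanes=48, rate_gt_s=25)    # assumed lane count

print(f"20 GT/s x 32 lanes, both directions: {power8_nvlink:.0f} GB/s")
print(f"25 GT/s x 48 lanes, both directions: {power9_axon:.0f} GB/s")
```

These are raw signaling numbers; usable bandwidth depends on protocol and encoding overhead, which the roadmap does not break out.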

Statement of Direction, Subject to Change

Thanks

Questions?