OpenCAPI, Gen-Z, CCIX: Technology Overview, Trends, and Alignments
BRAD BENTON | AUGUST 8, 2017
OpenSHMEM Workshop | August 8, 2017

AGENDA
Overview and motivation for new system interconnects
Technical overview of the new, proposed technologies
Emerging trends
Paths to convergence?

Overview and Motivation

NEWLY EMERGING BUS/INTERCONNECT STANDARDS
Three new bus/interconnect standards announced in 2016
CCIX: Cache Coherent Interconnect for Accelerators
  ‒ Formed May, 2016
  ‒ Founding members include: AMD, ARM, Huawei, IBM, Mellanox, Xilinx
  ‒ ccixconsortium.com
Gen-Z
  ‒ Formed August, 2016
  ‒ Founding members include: AMD, ARM, Cray, Dell EMC, HPE, IBM, Mellanox, Micron, Xilinx
  ‒ genzconsortium.org
OpenCAPI: Open Coherent Accelerator Processor Interface
  ‒ Formed September, 2016
  ‒ Founding members include: AMD, Google, IBM, Mellanox, Micron
  ‒ opencapi.org

NEWLY EMERGING BUS/INTERCONNECT STANDARDS
Motivations for these new standards
Tighter coupling between processors and accelerators (GPUs, FPGAs, etc.)
  ‒ unified, virtual memory address space
  ‒ reduce data movement and avoid data copies to/from accelerators
  ‒ enables sharing of pointer-based data structures w/o the need for deep copies
Open, non-proprietary standards-based solutions
Higher bandwidth solutions
  ‒ 25 Gb/s and above vs. 16 Gb/s for PCIe Gen4
Better exploitation of new and emerging memory/storage technologies
  ‒ NVMe
  ‒ Storage class memory (SCM), persistent memory
  ‒ HMC, HBM, etc.
Ease of programming
  ‒ accelerator operates in the address space of the process
  ‒ support for atomic operations

NEWLY EMERGING BUS/INTERCONNECT STANDARDS
Why 3 different standards bodies?
Different groups have been working to solve similar problems
However, each approach has its differences
Many companies involved with more than one consortium
Possible shake outs/convergence as things move forward
We’ve been here before: Future I/O + Next Generation I/O => InfiniBand
But the landscape is different this time around

Technical Overview

OpenCAPI: OPEN COHERENT ACCELERATOR PROCESSOR INTERFACE
Tightly coupled interface between processor, accelerators and memory
  ‒ Operates on virtual addresses
    ‒ virt-to-phys translation occurs on the host CPU
  ‒ OpenCAPI 3.0:
    ‒ Bandwidth: 25 Gb/s per lane, x8
    ‒ Coherent access to system memory from accelerator
  ‒ OpenCAPI 4.0:
    ‒ Support for caching on accelerators
    ‒ Bandwidth: Support for additional link widths: x4, x8, x16, x32
Use Cases
  ‒ Coherent access from accelerator to system memory
  ‒ Advanced memory technologies
  ‒ Coherent storage controller
  ‒ Agnostic to processor architecture
http://www.opencapi.org/

ATTRIBUTES: OpenCAPI
Integrated into POWER9 (e.g., Zaius)
Supports features of interest to NIC/FPGA vendors
  ‒ Virtual address translation services
  ‒ Aggregation of accelerator & system memory
Communication with OpenCAPI-attached devices managed by vendor-specific drivers/libraries
Accelerator holds virtual address; address translation managed by the host
http://opencapi.org/wp-content/uploads/2016/11/OpenCAPI-Overview-SC16-vf.pptx

GEN-Z: MEMORY SEMANTIC FABRIC
Scalable from component to cross-rack communications
  ‒ Direct attach, switched, or fabric topologies
  ‒ Bandwidth:
    ‒ 32 GB/s to 400+ GB/s
    ‒ Support for intermediate speeds
  ‒ Can gateway to other networks, e.g., Ethernet, InfiniBand
  ‒ Unify general data access as memory operations
    ‒ byte addressable load/store
    ‒ messaging (put/get)
    ‒ IO (block memory)
Use Cases
  ‒ Component disaggregation
  ‒ Persistent memory
  ‒ Long haul/rack-to-rack interconnect
  ‒ Rack-scale composability
http://www.genzconsortium.org/

ATTRIBUTES: Gen-Z
Defines new mechanicals/connectors
But will also integrate with existing mechanical form factors, connectors, and cables
Multiple ways to move data:
  ‒ Load/store
  ‒ Messaging interfaces
  ‒ Block I/O interfaces
Gen-Z to provide libfabric integration for distributed messaging (MPI, OpenSHMEM, etc.)
Scalability
  ‒ Up to 4096 components/subnet
  ‒ Up to 64K subnets
http://www.genzconsortium.org/

CCIX: CACHE COHERENT INTERCONNECT FOR ACCELERATORS
Tightly coupled interface between processor, accelerators and memory
  ‒ Bandwidth:
    ‒ 16/20/25 Gb/s per lane
  ‒ Hardware cache coherence enabled across the link
  ‒ Driver-less and interrupt-less framework for data sharing
Use Cases
  ‒ Allows low-latency main memory expansion
  ‒ Extend processor cache coherency to accelerators, network/storage adapters, etc.
  ‒ Supports multiple ISAs over a single interconnect standard
[Figure: CCIX topology diagrams; recoverable labels: Processor, Bridge, Switch, Accel, Accelerator, CCIX-attached Memory]
http://www.ccixconsortium.com/

ATTRIBUTES: CCIX
Link layer is a straightforward extension of PCIe
In box/node system interconnect
Preserves existing mechanicals, connectors, etc.
Communication with CCIX-attached devices managed by vendor-specific drivers/libraries
Address translation services via ATS/PRI
http://www.ccixconsortium.com/

COMPARISONS
CCIX
  ‒ Physical layer: PCIe PHY
  ‒ Topology: p2p and PCIe switched
  ‒ Unidirectional bandwidth: 32-50 GB/s x16
  ‒ Mechanicals: Supports existing PCIe mechanicals/form factors
  ‒ Coherence: Full cache coherence between processors and accelerators
Gen-Z
  ‒ Physical layer: IEEE 802.3 Short and Long Reach PHY
  ‒ Topology: p2p and switched
  ‒ Unidirectional bandwidth: Multiple signaling rates: 16, 25, 28, 56 GT/s; multiple link widths: 1 to 256 lanes
  ‒ Mechanicals: Will develop new, Gen-Z specific mechanicals/form factors
  ‒ Coherence: Does not specify cache coherent agent operations, but does specify protocols that support cache coherent agents
OpenCAPI 3.0
  ‒ Physical layer: IEEE 802.3 Short Reach
  ‒ Topology: p2p
  ‒ Unidirectional bandwidth: 25 GB/s x8
  ‒ Mechanicals: In definition, see Zaius design for a possible approach
  ‒ Coherence: Coherent access to system memory
OpenCAPI 4.0
  ‒ Physical layer: IEEE 802.3 Short Reach
  ‒ Topology: p2p
  ‒ Unidirectional bandwidth: Multiple link widths x4, x8, x16, x32: 12.5, 25, 50, 100 GB/s
  ‒ Coherence: Coherent access to memory; cache on accelerator

Trends

OPENCAPI MEMBERSHIP
Board Level: AMD, Google, IBM, Mellanox, Micron, NVIDIA, Western Digital, Xilinx
Contributor Level: Amphenol, Microsemi, Molex*, Parade, Samsung, SK hynix*, TE Connectivity*, Tektronix, Toshiba
Observer/Academic Level: Achronix, Applied Materials, Dell EMC, ELI Beamlines, Everspin*, HPE, NGCodec, SmartDV*, SuperMicro, Synology, Univ. Cordoba
*New Members (since March, 2017)

GEN-Z MEMBERSHIP
General Member: Alpha Data, AMD, Amphenol Corporation, ARM, Avery Design*, Broadcom, Cadence*, Cavium, Cray, Dell EMC, Everspin Technologies*, Foxconn, HPE, Huawei, IBM, Integrated Device Tech, IntelliProp, Jabil Circuit, Lenovo, Lotes, Luxshare*, Mellanox, Mentor Graphics*, Micron, Microsemi, Molex*, NetApp*, Nokia, Numascale*, PLDA Group, Red Hat, Samsung, Seagate, SK hynix, SMART Modular Tech*, Spin Transfer, TE Connectivity*, Tyco Electronics*, Western Digital, Xilinx, Yadro
*New Members (since March, 2017)

CCIX MEMBERSHIP
Promoter Level: AMD, ARM, Huawei, Mellanox, Qualcomm, Xilinx
Contributor Level: Amphenol, Avery Design, Broadcom, Cadence, Cavium, Keysight, Lenovo*, Micron, Netspeed, Red Hat, Samsung*, SK hynix*, Synopsys, TE Connectivity
Adopter Level: Integrated Device Tech, INVECAS, Netronome*, Phytium Tech*, PLDA Group*, Shanghai Zhaoxin*, Silicon Labs*, SmartDV*
No Longer Members: Arteris, Bull/Atos, IBM, Teledyne, TSMC
*New Members (since March, 2017)

CONSORTIA MEMBERSHIP DISTRIBUTION
Membership Distribution: March, 2017; All Levels
[Figure: two bar charts, "Consortia Membership Count" and "Consortia Membership Breakdown: Single & Multiple Memberships"; y-axis: Member Count; x-axis: CCIX, Gen-Z, OpenCAPI; series: Single Membership, Multiple Memberships]

CONSORTIA MEMBERSHIP DISTRIBUTION
Membership Distribution: August, 2017; All Levels
[Figure: two bar charts, "Consortia Membership Count" and "Consortia Membership Breakdown: Single & Multiple Memberships"; y-axis: Member Count; x-axis: CCIX, Gen-Z, OpenCAPI; series: Single Membership, Multiple Memberships]

CONSORTIA MEMBERSHIP DISTRIBUTION
Membership Distribution: August, 2017; Voting Levels
[Figure: two bar charts, "Consortia Membership Count (Voting)" and "Consortia Membership Breakdown: Single & Multiple Memberships (Voting)"; y-axis: Member Count; x-axis: CCIX, Gen-Z, OpenCAPI; series: Single Membership, Multiple Memberships]

MULTIPLE MEMBERSHIP DISTRIBUTION
[Figure: three pie charts showing the share of each membership combination among multi-consortium members]
March, 2017: CCIX/Gen-Z 38%, CCIX/Gen-Z/OpenCAPI 31%, Gen-Z/OpenCAPI 31%, CCIX/OpenCAPI 0%
August, 2017: CCIX/Gen-Z 42%, CCIX/Gen-Z/OpenCAPI 29%, Gen-Z/OpenCAPI 25%, CCIX/OpenCAPI 4%
August, 2017 (Voting): CCIX/Gen-Z 40%, CCIX/Gen-Z/OpenCAPI 40%, Gen-Z/OpenCAPI 20%, CCIX/OpenCAPI 0%

MEMBERSHIP PAIRINGS
August, 2017 Voting Membership Levels August, 2017 All Membership Levels GenZ/OpenCAPI 33% March, 2017 All Membership Levels GenZ/OpenCAPI