OpenCAPI and Its Roadmap
Myron Slota, President, OpenCAPI Consortium


Join the Conversation: #OpenPOWERSummit. Industry Collaboration and Innovation.

Topics
• Industry background and its influence on IBM's I/O and microprocessor roadmap
• Limitations of CAPI running over PCIe
• Creation of the OpenCAPI architecture and its benefits
• The OpenCAPI Consortium
• Potential solutions based on coherently attached accelerators

Industry Background that Defined OpenCAPI
• Emerging workloads (e.g., AI, cognitive computing) are driving computational demand
• Moore's Law is no longer supported by traditional silicon scaling
• This drives increased dependence on hardware acceleration for performance to continue at a Moore's-Law pace
• Hyperscale datacenters and HPC require much higher network bandwidth (100 Gb/s, then 200 Gb/s and 400 Gb/s, are emerging)
• HPC and artificial intelligence (deep learning, inferencing) need more bandwidth between accelerators and memory
• Emerging memory/storage technologies are driving the need for bandwidth with low latency

Industry Background that Defined OpenCAPI (continued)
• Hardware accelerators are defining the attributes required of a high-performance bus:
  • Offloading computational demand from the microprocessor; if you are going to use accelerators, the bus must move large amounts of data quickly
  • Growing demand for network performance and network offload
  • Introduction of device coherency requirements
  • Emergence of complex storage and memory solutions
  • Support for various form factors (e.g., GPUs, FPGAs, ASICs)
• ...all relevant to modern data centers

IBM's I/O Offering Legacy
• IBM's I/O offerings have always been leading edge (IBM's 'PowerAccel' family)
• Most recently:
  • PCIe advanced generations (first to Gen4)
  • NVLink (a first-of-its-kind proprietary link to NVIDIA GPUs)
  • CAPI (a coherent interface that ran over PCIe)
  • OpenCAPI (a new architecture)

POWER Processor Technology Roadmap
• POWER7 (45 nm, 1H10): enterprise; 8 cores; SMT4; eDRAM L3 cache
• POWER7+ (32 nm, 2H12): enterprise; 2.5x larger L3 cache; on-die acceleration; zero-power core idle state
• POWER8 Family (22 nm, 1H14-2H16): enterprise and big-data optimized; up to 12 cores; SMT8; CAPI acceleration; high-bandwidth GPU attach
• POWER9 Family (14 nm, 2H17-2H18+): built for the cognitive era; enhanced core and chip architecture optimized for emerging workloads; processor family with scale-up and scale-out optimized silicon; premier platform for accelerated computing
© 2018 IBM Corporation

POWER8
• IBM felt a need to introduce to the industry a coherent interface for accelerators
• The I/O offering expanded:
  • Higher performance with PCIe Gen3
  • Introduced CAPI 1.0 (Coherent Accelerator Processor Interface), which runs over PCIe
• CAPI 1.0 acceptance:
  • Power Architecture specific
  • Improvement was needed in ease of implementation (i.e., the PSL), use, and programming
  • Newness, and limited products leveraged this technology
  • However, it proved there was industry interest in driving accelerators

POWER9
• Continued expansion of the I/O offering
• Higher performance with PCIe Gen4 and NVLink 2.0
• Given the validation of the need for a coherent high-performance bus, IBM worked to improve the CAPI architecture that runs over PCIe and introduced CAPI 2.0:
  • Improved bandwidth performance by ~3x
  • Reduced overhead and complexity in the accelerator design; address translation on the host results in more efficiency
• In addition, IBM introduced a new I/O architecture and released the OpenCAPI 3.0 and OpenCAPI 3.1 specifications

Why OpenCAPI and What Is It?
• OpenCAPI is a new 'bottoms-up' I/O standard
• Key attributes of OpenCAPI 3.0:
  • Open I/O standard: a choice for developers and others to contribute and grow an ecosystem
  • Coherent interface: microprocessor memory, accelerator memory, and caches share the same memory space
  • Architecture agnostic, not tied to Power: capable of going beyond Power Architecture
  • High performance: no OS/hypervisor/firmware overhead, for low latency and high bandwidth
  • Ease of programming
  • Ease of implementation, with minimal accelerator design overhead
  • Ideal for accelerated computing and storage-class memory (SCM), including various form factors (FPGA, GPU, ASIC, TPU, etc.)
  • Optimized for use within a single system node
  • Supports heterogeneous environments and use cases
• OpenCAPI 3.1:
  • Applies OpenCAPI technology to standard DRAM attached off the microprocessor
  • Based on an Open Memory Interface (OMI)
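The coherent-interface attribute above is the key programming-model change: host and accelerator share one memory space, so data does not have to be staged through separate device memory. A minimal sketch of the difference, in plain Python with hypothetical function names (no real OpenCAPI API is used here):

```python
# Illustrative sketch only, not a real OpenCAPI API: contrasts the classic
# copy-based offload model with the coherent shared-memory model, where the
# accelerator works on host memory directly and no staging copies are needed.

def offload_copy_model(host_buf):
    """Classic I/O-attached accelerator: stage data into device memory,
    compute there, then copy the result back to the host."""
    device_buf = bytearray(host_buf)      # DMA copy, host -> device
    for i in range(len(device_buf)):      # accelerator computes on its copy
        device_buf[i] ^= 0xFF
    host_buf[:] = device_buf              # DMA copy, device -> host
    return host_buf

def offload_coherent_model(host_buf):
    """Coherent accelerator (the OpenCAPI model): operates in place on the
    host's buffer; hardware keeps caches and memory consistent."""
    view = memoryview(host_buf)           # same memory, zero-copy handle
    for i in range(len(view)):
        view[i] ^= 0xFF
    return host_buf

data = b"\x00\x0f\xf0"
assert offload_copy_model(bytearray(data)) == offload_coherent_model(bytearray(data))
```

Both paths produce the same result; the coherent path simply skips the two copies, which is where the latency and ease-of-programming claims above come from.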
  • Further tuned for extremely low latency

Use Cases: A True Heterogeneous Architecture Built Upon OpenCAPI
• The OpenCAPI 3.0 and 3.1 specifications are downloadable from the website at www.opencapi.org (register, then download)

Proposed POWER Processor Technology and I/O Roadmap (Statement of Direction, Subject to Change)
• POWER7 (2010; 45 nm; 8 cores): new microarchitecture; new process technology; 65 GB/s sustained memory bandwidth; PCIe Gen2 standard I/O
• POWER7+ (2012; 32 nm; 8 cores): enhanced microarchitecture; new process technology; 65 GB/s; PCIe Gen2
• POWER8 (2014; 22 nm; 12 cores): new microarchitecture; new process technology; 210 GB/s; PCIe Gen3; CAPI 1.0
• POWER8 w/ NVLink (2016; 22 nm; 12 cores): enhanced microarchitecture with NVLink; 210 GB/s; PCIe Gen3; 20 GT/s advanced I/O signaling (160 GB/s); CAPI 1.0, NVLink
• P9 SO (2017; 14 nm; 12/24 cores): new microarchitecture, direct-attach memory; new process technology; 150 GB/s; PCIe Gen4 x48; 25 GT/s (300 GB/s); CAPI 2.0, OpenCAPI 3.0, NVLink
• P9 SU (2018; 14 nm; 12/24 cores): enhanced microarchitecture, buffered memory; 210 GB/s; PCIe Gen4 x48; 25 GT/s (300 GB/s); CAPI 2.0, OpenCAPI 3.0, NVLink
• P9 w/ Advanced I/O (2019; 14 nm; 12/24 cores): enhanced microarchitecture, new memory subsystem; 350+ GB/s; PCIe Gen4 x48; 25 GT/s (300 GB/s); CAPI 2.0, OpenCAPI 4.0, NVLink
• P10 (2020+; TBA cores): new microarchitecture; new process technology; 435+ GB/s; PCIe Gen5; 32 & 50 GT/s advanced I/O signaling; advanced I/O architecture TBA

Development Activities Leading to Solutions with Development Partners
• Storage-class memory
• DRAM memory solutions
• FPGA accelerators
• GPUs
• Smart NICs
• ASICs
• Systems

Comparison of Memory Paradigms
• Common physical interface between non-memory and memory devices
• The OpenCAPI protocol was architected to minimize latency; excellent for classic DRAM memory
• Extreme bandwidth beyond the classical DDR memory interface
• An agnostic interface will handle evolving memory technologies
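As a sanity check on the roadmap's advanced I/O figures, the quoted 300 GB/s is consistent with 48 lanes at 25 Gb/s counted bidirectionally. The bidirectional interpretation is an assumption here, and this is raw signaling rate, ignoring 64b/66b encoding and protocol overhead:

```python
# Back-of-the-envelope check of the roadmap's POWER9 advanced I/O figures
# (a sketch; assumes the 300 GB/s figure is aggregate bidirectional raw
# signaling across 48 lanes, ignoring encoding and protocol overhead).

lanes = 48
gbit_per_lane = 25                               # 25 GT/s ~ 25 Gb/s raw per lane

per_direction_GBps = lanes * gbit_per_lane / 8   # bits -> bytes
bidirectional_GBps = 2 * per_direction_GBps

print(per_direction_GBps)    # 150.0 GB/s each way
print(bidirectional_GBps)    # 300.0 GB/s aggregate, matching the roadmap
```

The same arithmetic on the POWER8 row (20 GT/s) yields 120 GB/s per direction, so the 160 GB/s figure there evidently counts lanes or directions differently; the slide does not say which.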
  in the future (e.g., compute-in-memory)
• Ability to use a memory buffer to decouple raw memory from the host interface, optimizing power, cost, and performance

Main Memory
• Example: basic DDR4/5 attach using the OpenCAPI 3.1 architecture (processor chip to DLx/TLx to data)
• An ultra-low-latency ASIC buffer chip adds only ~5 ns on top of a native DDR direct connect

Emerging Storage-Class Memory
• Storage-class memories (processor chip to DLx/TLx to SCM) have the potential to be the next disruptive technology
• Examples include ReRAM, MRAM, and Z-NAND; all are racing to become the de facto standard

Tiered Memory
• Storage-class memory tiered with traditional DDR4/5 memory, all built upon the OpenCAPI 3.1 and 3.0 architecture
• Load/store semantics can still be used

Acceleration Paradigms with Great Performance
OpenCAPI is ideal for acceleration due to its bandwidth to and from accelerators, best-of-breed latency, and the flexibility of an open architecture.

Memory Transform
• Example: basic work offload
• Examples: machine or deep learning, such as natural language processing, sentiment analysis, or other actionable intelligence using OpenCAPI-attached memory

Egress Transform
• Examples: encryption, compression, and erasure coding prior to delivering data to the network or storage

Ingress Transform
• Examples: video analytics, network security, deep packet inspection, data-plane acceleration, video encoding (H.265), high-frequency trading, etc.

Bi-Directional Transform
• Examples: NoSQL, such as Neo4j with graph node traversals, etc.

Needle-in-a-Haystack Engine
• A large haystack of data is searched by accelerators; only the needles are sent to the processor
• Examples: database searches, joins, intersections, merges

OpenCAPI Consortium
• Incorporated September 13, 2016; announced October 14, 2016
• Open forum founded by AMD, Google,
  IBM, Mellanox, and Micron
• Manages the OpenCAPI specification, establishes enablement, and grows the ecosystem
• Currently over 35 members
• Why its own consortium? It is architecture agnostic, and thus capable of going beyond Power Architecture
• The consortium is now established:
  • Board of Directors (AMD, Google, IBM, Mellanox Technologies, Micron, NVIDIA, Western Digital, Xilinx)
  • Governing documents (bylaws, IPR policy, membership) with established membership levels
  • Technical Steering Committee with an established work-group process
  • Marketing/Communications Committee
  • Website: www.opencapi.org
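The needle-in-a-haystack paradigm described earlier reduces link traffic by filtering where the data lives. A toy sketch of the idea, with ordinary Python standing in for accelerator logic (`accelerator_filter` and `host_query` are hypothetical names, not part of any OpenCAPI API):

```python
# Sketch of the needle-in-a-haystack offload pattern: the accelerator-side
# filter scans the large haystack near the data and ships only the matching
# "needles" across the link to the processor.

def accelerator_filter(haystack, predicate):
    """Stands in for the OpenCAPI-attached accelerator: scans everything
    locally and returns only the matches."""
    return [item for item in haystack if predicate(item)]

def host_query(haystack, predicate):
    # The host never touches the bulk data; only needles cross the link.
    return accelerator_filter(haystack, predicate)

records = range(1_000_000)                       # the large haystack
needles = host_query(records, lambda r: r % 250_000 == 0)
print(needles)    # prints [0, 250000, 500000, 750000]
```

Here a million-record scan returns four items, which is the whole point: link bandwidth is spent on results, not raw data.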
