The Need for New Hardware for Artificial Intelligence
‘DOTA’ Neural network visualization from POPLARTM OUR IPU LETS INNOVATORS CREATE THE NEXT BREAKTHROUGHS IN MACHINE INTELLIGENCE MACHINE INTELLIGENCE COMPUTE IS FAST GROWING
300,000x growth
Source: OpenAI # Parameters (Millions) 1000 1200 1400 1600 1800 200 400 600 800 0 2016 Jan ResNet50 25M MODELS AREMODELS ONLY BIGGER GETTING BERT 330M - Large 2018 Oct 1.55Bn GPT2 2019 Mar COLOSSUS IPU the worlds most complex processor chip with 23.6Bn transistors
- 125TFlops @ 120W
- 1216 independent processor cores
- Complete model held inside processor
- 45TB/s memory bandwidth
- 8TB/s on-chip exchange between cores
- 2.5Tbps chip-to-chip IPU-Links C2 IPU PROCESSOR CARD
2 – COLOSSUS GC2 IPU PROCESSORS CARD-TO-CARD IPU-LINKSTM (2.5TBps) 250 TERA-FLOP MIXED PRECISION IPU COMPUTE @ 300W DELL-EMC IPU SERVER 8x C2 IPU PCIe CARDS | 2PFLOP IPU COMPUTE COMPUTE 2.0 DEVELOPMENT FLOW
COMPUTE 1.0 COMPUTE 2.0
Integrated Design Environment: ML Graph Framework: e.g. Visual Studio/ Eclipse e.g. TensorFlow/ PyTorch
Toolchain: e.g. ICC/ Cuda Graph Toolchain: POPLAR™
Compiler: e.g. GCC/ LLVM Graph Compiler
Debugger: e.g. GDB Graph Engine
Profiler: e.g. VTune Graph Debug / Optimization
Program/ Learn Knowledge Algorithm from Model Data DeepVoice GRAPH VISUALIZATION FROM POPLAR®
POPLAR® expands the ML Framework output to a full compute graph. THE POPLAR® SDK
High-level graph APIs built on Poplar® native graph
abstraction ™)
Other frames PopART to be ML supported Poplar Graph
With Training Support RUNTIME ( ADV. POPLAR Control Program POPLAR®
STANDARD MACHINE LEARNING FRAMEWORKS
POPLAR® : Graph Toolchain for IPU
Graph Libraries Graph Framework Graph Compiler popnn poputils poplin popops poprandom popsparse
IPU-Processor IPU Servers and PCIe Card System DEVELOPMENT ENVIRONMENT
High Level Abstraction Poplar SDK Native RunTime Lower Level Coding
PopART™ POPLAR® API
• Deep learning/linear algebra • Poplar Advanced RunTime • API for close to the metal • Tailored for efficiency on • Supports ONNX file format development mainstream deep learning algos • Supports both inference and • Anything is possible (e.g. supervised SGD on a neural training • Can create new paradigms net) • High performance and flexible • Also used for custom operators • Has flexibility for different • Can be used as backend to other within other frameworks optimizers/modifications ML frameworks BULK SYNCHRONOUS PARALLEL (BSP) Software bridging model for parallel computing
Compute BSP Sync Exchange
10,000s of compute threads All threads are Data is exchanged so that every thread all operating in parallel synchronized has all the data that it needs for the each with all the data that next phase of Compute they need, held locally
BSP phase.1 BSP phase.2 BSP phase.3 THE IMPORTANCE OF BSP
CPU/GPU
- Optimization is difficult DRAM (Knowledge model held in DRAM) - Timing hazards occur Convolution Max Convolution Max Convolution Max - Hard to modify or change PROCESSOR Layer pooling Layer pooling Layer pooling - Data handling challenging Step.1 Step.2 Step.3 Step.4
BSP Sync between layers IPU
- High performance IPU Convolution Max Convolution Max Convolution Max straight out of the box Layer pooling Layer pooling Layer pooling
- No timing hazards BSP compute phases - Easy to modify makes modifying layers easy
- Whole Knowledge Model Convolution Max Convolution NEW Convolution Max held inside the IPU IPU Layer pooling Layer LAYER Layer pooling POPLAR™ : Graph Toolchain
POPLAR Graph Libraries POPLAR Framework
popnn poputils poplin
popops poprandom popsparse POPLAR® C++ / Python Graph Framework
Graph Toolchain POPLAR POPLAR Visualization & Debug Tools Compute Graph
POPLAR Graph Compiler POPLAR Graph Engine IPU-Processor IPU Servers and PCIe Card System OPEN-SOURCE GRAPH LIBRARIES
> 50 open-source GRAPH FUNCTIONS example GRAPH FUNCTION 32in_32out_Fully_Connected_Layer available including (matmul, conv, etc) built from…
> 750 optimized COMPUTE ELEMENTS such as (ReduceAdd, AddToChannel, Zero, etc) easily create new GRAPH FUNCTIONS using the library of COMPUTE ELEMENTS modify and create new COMPUTE ELEMENTS
share library elements and new innovations POPLAR GRAPH FRAMEWORK
C++ / PYTHON – POPLAR GRAPH FRAMEWORK LETS YOU EASILY MODIFY OR CREATE YOUR OWN GRAPH FUNCTIONS POPLAR® MAPS AND COMPILES GRAPH TO IPUs
POPLAR® GRAPH COMPILER:
Load balances code across IPU-CORES
Allocates data to IN-PROCESSOR-MEMORY Orchestrates data exchanges POPLAR® GRAPH ENGINE:
Executes graph under BSP on IPU or multiple IPUs ResNet50 network visualization from POPLARTM
WE HAVE DEVELOPED NEW HARDWARE AND SOFTWARE THAT LETS INNOVATORS CREATE THE NEXT GENERATION OF MACHINE INTELLIGENCE THANK YOU [email protected] @fleurdevie