
The Need for New Hardware for Artificial Intelligence
[Image: 'DOTA' neural network visualization from POPLAR™]

OUR IPU LETS INNOVATORS CREATE THE NEXT BREAKTHROUGHS IN MACHINE INTELLIGENCE

MACHINE INTELLIGENCE COMPUTE IS FAST GROWING
300,000x growth
Source: OpenAI

MODELS ARE ONLY GETTING BIGGER
[Chart: # Parameters (Millions), axis 0 to 1800, plotted over Jan 2016, Oct 2018, Mar 2019. Data labels: ResNet50 25M, BERT-Large 330M, GPT2 1.55Bn]

COLOSSUS IPU
the world's most complex processor chip with 23.6Bn transistors
- 125TFlops @ 120W
- 1216 independent processor cores
- Complete model held inside processor
- 45TB/s memory bandwidth
- 8TB/s on-chip exchange between cores
- 2.5Tbps chip-to-chip IPU-Links

C2 IPU PROCESSOR CARD
- 2 – COLOSSUS GC2 IPU PROCESSORS
- CARD-TO-CARD IPU-LINKS™ (2.5TBps)
- 250 TERA-FLOP MIXED PRECISION IPU COMPUTE @ 300W

DELL-EMC IPU SERVER
8x C2 IPU PCIe CARDS | 2PFLOP IPU COMPUTE

COMPUTE 2.0 DEVELOPMENT FLOW
COMPUTE 1.0 | COMPUTE 2.0
Integrated Design Environment: e.g. Visual Studio/Eclipse | ML Graph Framework: e.g. TensorFlow/PyTorch
Toolchain: e.g. ICC/Cuda | Graph Toolchain: POPLAR™
Compiler: e.g. GCC/LLVM | Graph Compiler
Debugger: e.g. GDB | Graph Engine
Profiler: e.g. VTune | Graph Debug / Optimization
Program/Algorithm | Learn Knowledge Model from Data

DeepVoice GRAPH VISUALIZATION FROM POPLAR®
POPLAR® expands the ML Framework output to a full compute graph.

THE POPLAR® SDK
High-level graph APIs built on Poplar® native graph abstraction
[Diagram: standard machine learning frameworks, PopART™ (Poplar Adv. Runtime) and other ML frameworks to be supported, sitting on POPLAR®: Poplar Graph With Training Support, Control Program]

POPLAR® : Graph Toolchain for IPU
- Graph Libraries: popnn, poputils, poplin, popops, poprandom, popsparse
- Graph Framework
- Graph Compiler
- IPU-Processor
- IPU Servers and PCIe Card System

DEVELOPMENT ENVIRONMENT

High Level Abstraction
- Deep learning/linear algebra
- Tailored for efficiency on mainstream deep learning algos (e.g. supervised SGD on a neural net)
- Has flexibility for different optimizers/modifications

Poplar SDK Native RunTime: PopART™
- Poplar Advanced RunTime
- Supports ONNX file format
- Supports both inference and training
- High performance and flexible
- Can be used as backend to other ML frameworks

Lower Level Coding: POPLAR® API
- API for close to the metal development
- Anything is possible
- Can create new paradigms
- Also used for custom operators within other frameworks

BULK SYNCHRONOUS PARALLEL (BSP)
Software bridging model for parallel computing
- Compute: 10,000s of compute threads all operating in parallel, each with all the data that they need, held locally
- BSP Sync: All threads are synchronized
- Exchange: Data is exchanged so that every thread has all the data that it needs for the next phase of Compute
[Diagram: BSP phase.1, BSP phase.2, BSP phase.3]

THE IMPORTANCE OF BSP

CPU/GPU
- Optimization is difficult
- Timing hazards occur
- Hard to modify or change
- Data handling challenging
[Diagram: DRAM (Knowledge model held in DRAM) above PROCESSOR running Convolution Layer, Max pooling, Convolution Layer, Max pooling, Convolution Layer, Max pooling; Step.1, Step.2, Step.3, Step.4]

IPU
- High performance straight out of the box
- No timing hazards
- Easy to modify
- Whole Knowledge Model held inside the IPU
[Diagram: IPU running Convolution Layer, Max pooling, Convolution Layer, Max pooling, Convolution Layer, Max pooling, with BSP Sync between layers. BSP compute phases makes modifying layers easy: a NEW LAYER is inserted between the existing layers]

POPLAR™ : Graph Toolchain
- POPLAR Graph Libraries: popnn, poputils, poplin, popops, poprandom, popsparse
- POPLAR® C++ / Python Graph Framework
- POPLAR Visualization & Debug Tools
- Compute Graph
- POPLAR Graph Compiler
- POPLAR Graph Engine
- IPU-Processor
- IPU Servers and PCIe Card System

OPEN-SOURCE GRAPH LIBRARIES
- > 50 open-source GRAPH FUNCTIONS available including (matmul, conv, etc), built from…
- > 750 optimized COMPUTE ELEMENTS such as (ReduceAdd, AddToChannel, Zero, etc)
- easily create new GRAPH FUNCTIONS using the library of COMPUTE ELEMENTS
- modify and create new COMPUTE ELEMENTS
- share library elements and new innovations
example GRAPH FUNCTION: 32in_32out_Fully_Connected_Layer

POPLAR GRAPH FRAMEWORK
C++ / PYTHON – POPLAR GRAPH FRAMEWORK LETS YOU EASILY MODIFY OR CREATE YOUR OWN GRAPH FUNCTIONS

POPLAR® MAPS AND COMPILES GRAPH TO IPUs
POPLAR® GRAPH COMPILER:
- Load balances code across IPU-CORES
- Allocates data to IN-PROCESSOR-MEMORY
- Orchestrates data exchanges
POPLAR® GRAPH ENGINE:
- Executes graph under BSP on IPU or multiple IPUs

[Image: ResNet50 network visualization from POPLAR™]
WE HAVE DEVELOPED NEW HARDWARE AND SOFTWARE THAT LETS INNOVATORS CREATE THE NEXT GENERATION OF MACHINE INTELLIGENCE

THANK YOU
[email protected] @fleurdevie.
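
The compute figures quoted on the COLOSSUS IPU, C2 IPU PROCESSOR CARD and DELL-EMC IPU SERVER slides can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers that appear on the slides; the variable names are illustrative, and the per-core figure is a simple average derived here, not a number the deck states.

```python
# Figures quoted on the slides.
TFLOPS_PER_IPU = 125       # COLOSSUS IPU: 125TFlops @ 120W
CORES_PER_IPU = 1216       # independent processor cores per IPU
IPUS_PER_CARD = 2          # C2 card: 2 Colossus GC2 IPU processors
CARDS_PER_SERVER = 8       # Dell-EMC server: 8x C2 IPU PCIe cards

# Card and server totals follow by multiplication.
card_tflops = TFLOPS_PER_IPU * IPUS_PER_CARD
server_pflops = card_tflops * CARDS_PER_SERVER / 1000

# Derived here, not on the slides: average throughput per core.
gflops_per_core = TFLOPS_PER_IPU * 1000 / CORES_PER_IPU

# Parameter growth between the smallest and largest model on the chart
# (ResNet50 at 25M, GPT2 at 1.55Bn = 1550M).
param_growth = 1550 / 25

print(f"card: {card_tflops} TFLOPS, server: {server_pflops} PFLOPS")
print(f"average per core: {gflops_per_core:.1f} GFLOPS")
print(f"GPT2 vs ResNet50 parameters: {param_growth:.0f}x")
```

The card and server totals match the 250 TERA-FLOP and 2PFLOP figures on the slides.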
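
The three BSP phases described on the BULK SYNCHRONOUS PARALLEL (BSP) slide (Compute, BSP Sync, Exchange) can be illustrated with a small simulation. This is a minimal sketch in plain Python, not Poplar code: the ring of workers, the averaging computation and the function name `bsp_run` are all assumptions made for illustration, and the workers run one after another rather than on real parallel threads.

```python
def bsp_run(local, supersteps):
    """Simulate workers arranged in a ring under BSP.

    local[i] is the private value held by worker i. In each superstep a
    worker may read only its own value and the copies of its neighbours'
    values delivered by the previous exchange.
    """
    n = len(local)
    # Initial exchange: every worker receives its two neighbours' values.
    inbox = [(local[(i - 1) % n], local[(i + 1) % n]) for i in range(n)]
    for _ in range(supersteps):
        # Compute: each worker uses only locally held data.
        local = [(inbox[i][0] + local[i] + inbox[i][1]) / 3 for i in range(n)]
        # Sync: reaching this point means every worker has finished its
        # compute phase; in this sequential simulation that is the barrier.
        # Exchange: deliver the new values needed for the next compute phase.
        inbox = [(local[(i - 1) % n], local[(i + 1) % n]) for i in range(n)]
    return local


result = bsp_run([4.0, 0.0, 0.0, 0.0], supersteps=1)
print(result)
```

Because no worker reads another worker's value during the compute phase, the result does not depend on the order in which workers run, which is the property the slide summarizes as "no timing hazards".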
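
The OPEN-SOURCE GRAPH LIBRARIES slide describes graph functions, such as a 32-in, 32-out fully connected layer, as being built from smaller compute elements such as ReduceAdd and Zero. The sketch below shows that layering idea in plain Python. It does not use the Poplar API: the functions `zero`, `reduce_add` and `fully_connected` are hypothetical stand-ins that borrow the element names from the slide, and the weights, bias and input are made-up values.

```python
def zero(n):
    """Compute element (illustrative): a vector of n zeros."""
    return [0.0] * n


def reduce_add(values):
    """Compute element (illustrative): sum a sequence of numbers."""
    total = 0.0
    for v in values:
        total += v
    return total


def fully_connected(x, weights, bias):
    """Graph function (illustrative) composed from the elements above.

    weights has one row per output; each row has one weight per input.
    """
    out = zero(len(weights))
    for j, row in enumerate(weights):
        out[j] = reduce_add(w * xi for w, xi in zip(row, x)) + bias[j]
    return out


# A 32-in, 32-out layer with made-up constant weights and bias.
x = [1.0] * 32
weights = [[0.5] * 32 for _ in range(32)]
bias = [1.0] * 32
y = fully_connected(x, weights, bias)
print(len(y), y[0])
```

A new graph function would be assembled the same way, by combining existing elements or adding a new one alongside `zero` and `reduce_add`.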