The Computer That Could Be Smarter Than Us: Cognitive Computing

outthink limits
Ingolf Wittmann, Technical Director, Leader of HPC

Kinetica - Unstructured Databases

What is Kinetica?
• Unparalleled acceleration: Kinetica's in-memory database, powered by graphics processing units (GPUs), was built from the ground up to deliver truly real-time insights on data in motion: orders of magnitude faster performance (potentially 100X, 1000X) at 10% to 25% of the cost of traditional data platforms.
• 2.5x Bandwidth: In an industry first, POWER8 with NVIDIA NVLink delivers 2.5X the bandwidth to GPU accelerators, allowing you to experience Kinetica at the speed it was intended, compared to x86 based systems.
• 10x Performance: With the unique capabilities of Tesla P100 + POWER8, Kinetica has 2.4x the performance of competing systems, enabling you to analyze and visualize large datasets in milliseconds vs. hours or minutes.

What are the Key Markets?
• Retail: Inventory Mgt, BI, Apps, Big Data tools, HPA
• Distribution / Logistics: Supply Chain Mgt
• Financial Services: Fraud Detection, AML
• Ad-Tech: More Targeted Marketing
• IoT: End Point Management, RFID

Real-time results (figure: query time comparison)
• Competing system, PCI-E x16 3.0: 100 tick query time (data transfer 73 ticks, calculation* 27 ticks)
• S822LC for HPC, NVLink: 40 tick query time (data transfer 26 ticks, calculation* 14 ticks)
• 65% reduction in data transfer time (3X improvement) in POWER8 with NVLink
* Includes non-overlapping CPU, GPU, and idle times.

Kinetica GPU-accelerated DB
• PCIe x16 3.0/x86 system (Xeon E5-2640 v4 with 4 Tesla K80s): 73,320 queries per hour
• POWER8 with NVLink system (Power Systems S822LC with 4 Tesla P100s): 188,852 queries per hour
• 1.95X of the 2.5X overall performance improvement attributable to NVLink
• Less data-induced latency in all applications
• Unique to POWER8 with NVLink
• Less coding to compensate for slow data movement!

IBM HPC/HPDA

Cognitive driven workflow uses Deep Learning - Getting HPC to 'Work Smart Not Hard'
• Typically HPC development is focused on increased speed.
• The fastest calculation is the one which you don't run!
• Can we use machine learning to make better decisions on which simulations give the most value?
• Can we use machine learning to improve the resolution of information?
Figure: cognitive steering of an ensemble of simulations. Callouts: "1/3 of the calculations to achieve [...]"; "4x"; "Orders of magnitude resolution increase".
Application of cognitive techniques in HPC can overcome and go beyond Moore's law.

Cognitive landscape: terms and relationship
Terms shown: Big Data; Artificial Intelligence & Cognitive Applications; Machine Learning; Deep Learning (Neuronal Nets).

Deep Learning / AI Lexicon
• Artificial Intelligence > Machine Learning > Deep Learning
• Deep Learning = Training (datacenter, compute intensive) + Inference (edge, embedded, closer to user)
• Training = neural "inspired", fed by millions of data points; repetition drives weighting and connection
• Platform = Frameworks + Supporting Libraries + Compute
• Compute = Acceleration + Extreme Bandwidth
• Desired outcome: higher accuracy in perceptive tasks, a model for inference

Neuromorphic "Right Brain Computing"
Figure labels: P8, TrueNorth; scale marks 1x, 20x, 100x, ∞.
$$o = \delta \sum_k p_k i_k$$
http://research.ibm.com/cognitive-computing/neurosynaptic-chips.shtml#fbid=5o2q0UxHcFa

Deep Learning areas
"The general idea of deep learning is to use neural networks to build multiple layers of abstraction to solve a complex semantic problem." - Aaron Chavez, IBM Watson
• Voice assisted recognition
• Critical environment investigation
• Fraud prevention
• Image recognition

Neuron Function (Architecture)
TrueNorth Chip (Synapse Chip)
$$o = \delta \sum_k p_k i_k$$
o Emulation of analog behaviour by a +/- 255 INT variable
o 2-dimensional on-chip synaptic weighted network and off-chip packet-based thru-neuron routing for multi-chip scaling
o Update of the synaptic network every ms (logical / biological clock), internal processing ~ 1 MHz
o Neuron fires a spike (45 pJ) to the network if in the last update cycle a threshold was reached or exceeded
o Stochastic and leak behavior configurable

Membrane potential for neuron j at time t:
$$V_j(t) = V_j(t-1) + \sum_{i=0}^{255} A_i(t)\, w_{ij}\, \big[(1 - b_j)\, s_j + \operatorname{sign}(s_j)\, b_j\, F(|s_j|, \rho_j)\big] + \mathrm{Leak}$$
$$\mathrm{Leak} = -1 \cdot \big[(1 - c_j)\, \lambda_j + \operatorname{sign}(\lambda_j)\, c_j\, F(|\lambda_j|, \rho_j)\big]$$
Labels on the slide: membrane potential V_j {signed int}; input spike A_i(t) {0,1}; synapse matrix w_ij {0,1}; weight s_j {signed int}; "Weight" {0,1} and "Step" {0,1} marking the two bracketed terms; random number ρ_j {uint}; leak weight λ_j {signed int}. The slide also marks "+1 (0)" beside the leading -1 of the leak term.

Liquid Synapse, an Extreme Blue project
Maintenance in 234 flights
David Stöckel, Julian Heyne, Maximillian Löhr, Pascal Nieters
Liquid state machine with TrueNorth Chip (Synapse Chip)
Figure: Sensor Data -> Liquid State Machine (Neural Network on the IBM Neurosynaptic System) -> Readout

POWER9 PowerAccel: Accelerator Connection Bandwidths
• Extreme Processor / Accelerator Bandwidth and Reduced Latency
• Coherent Memory and Virtual Addressing Capability for all Accelerators
• OpenPOWER Community Enablement: Robust Accelerated Compute Options

State of the Art I/O and Acceleration Attachment Signaling
• PCIe Gen 4 x 48 lanes: 192 GB/s duplex bandwidth
• 25G Link x 48 lanes: 300 GB/s duplex bandwidth

Robust Accelerated Compute Options with OPEN standards
• On-Chip Acceleration: Gzip x1, 842 Compression x2, AES/SHA x2
• CAPI 2.0: 4x bandwidth of POWER8 using PCIe Gen 4
• NVLink 2.0: Next generation of GPU/CPU interconnect
  o Up to 2x bandwidth of NVLink 1.0
  o Easier programming model for complex analytic & cognitive applications
  o Coherency, virtual addressing, low overhead communication
• OpenCAPI 3.0: High bandwidth, low latency and open interface using 25G Link

Accelerators and today's Systems
Components shown: Mellanox, FPGA, Power + CAPI, NVIDIA, FlashSystem, Synapse Chip, Quantum Chip.
• Today: Focus on Data, Analytics, Cognitive, and HPC full workflow performance; heterogeneous compute
• Tomorrow: Add improved Cognitive capabilities, integration of new technologies (e.g. SyNAPSE, Quantum computing), seamless enablement of heterogeneous compute, on-premises and in the cloud

Summary: Cognitive Computing in an HPC environment
• Application-driven design: Use real workloads/workflows to drive design points. Co-design for customer value.
• Data-centric computing: Data-induced latency is an issue for every installation. Data-centric computing is our answer, pioneered by IBM Research: home of AI, Deep & Machine Learning, Neuromorphic Cognitive Computing, and neuronal networks. Acknowledged by our competitors, governments, customers.
• Minimize Data Motion: Data motion is expensive. Hardware and software to support & enable compute in data. Allow workloads to run where they run best.
• Modularity: Balanced, composable architecture for Big Data & analytics, modeling, and simulation. Modular and driven by accelerators, with upgradeable design scalable from subrack to 100s of racks.
• Enable Compute Everywhere: Introduce "active" system elements including network, memory, storage.

Ingolf Wittmann
Technical Director, Diplom-Informatiker
IBM-Allee 1, D-71139 Ehningen
Mail: D-71137 Ehningen
Phone: +49-7034-15-4881
Mobile: +49-171-2265256
[email protected]
https://www.facebook.com/ingolf.wittmann.7
@ijwatHAL
de.linkedin.com/pub/ingolf-wittmann/27/189/132/
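
The membrane potential formula on the Neuron Function slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not IBM's implementation: the slide does not define F, so it is assumed here to return 1 when the magnitude reaches the random number and 0 otherwise; the random number is assumed to be drawn uniformly from 0..255; and resetting the potential to 0 after a spike is likewise an assumption. The function and parameter names (`membrane_update`, `fire`, `lam_j` for the leak weight) are hypothetical.

```python
import random


def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def gate(magnitude, rho):
    """Assumed form of F: 1 if the magnitude reaches the random draw, else 0."""
    return 1 if magnitude >= rho else 0


def membrane_update(v_prev, spikes, synapses, s_j, b_j, lam_j, c_j, rng):
    """One update tick for neuron j, following the slide's formula."""
    v = v_prev
    for a_i, w_ij in zip(spikes, synapses):
        if a_i and w_ij:  # A_i(t) * w_ij is 1 only for active, connected inputs
            rho = rng.randint(0, 255)  # assumed range of the random number
            v += (1 - b_j) * s_j + sign(s_j) * b_j * gate(abs(s_j), rho)
    rho = rng.randint(0, 255)
    leak = -1 * ((1 - c_j) * lam_j + sign(lam_j) * c_j * gate(abs(lam_j), rho))
    return v + leak


def fire(v, threshold):
    """Spike if the threshold was reached or exceeded (reset to 0 is assumed)."""
    if v >= threshold:
        return 1, 0
    return 0, v


rng = random.Random(0)
# Deterministic mode (b_j = 0, c_j = 0): two active connected inputs, weight 3, leak 1.
v = membrane_update(10, [1, 0, 1, 1], [1, 1, 0, 1], s_j=3, b_j=0, lam_j=1, c_j=0, rng=rng)
spike, v_after = fire(v, threshold=15)
print(v, spike, v_after)  # 15 1 0
```

With b_j = 0 and c_j = 0 the update is a plain weighted sum minus the leak weight; setting either flag to 1 switches that term to the stochastic form, where each contribution is at most one step in the direction of the weight's sign.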

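The duplex bandwidth figures quoted for POWER9 (192 GB/s for PCIe Gen 4 x 48 lanes, 300 GB/s for 25G Link x 48 lanes) appear to follow from raw per-lane signaling rates. The sketch below reproduces that arithmetic, assuming 16 Gbit/s per PCIe Gen 4 lane and 25 Gbit/s per 25G Link lane in each direction, and ignoring encoding and protocol overhead. The function name is hypothetical.

```python
def duplex_bandwidth_gb_per_s(lanes, gbit_per_lane):
    """Raw duplex bandwidth in GB/s: lanes x rate, 8 bits per byte, two directions."""
    per_direction = lanes * gbit_per_lane / 8
    return 2 * per_direction


# PCIe Gen 4: 16 Gbit/s raw per lane per direction (overhead ignored)
print(duplex_bandwidth_gb_per_s(48, 16))  # 192.0
# 25G Link: 25 Gbit/s raw per lane per direction (overhead ignored)
print(duplex_bandwidth_gb_per_s(48, 25))  # 300.0
```

Usable throughput would be somewhat lower than these raw figures once encoding and protocol overhead are included.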