
INTRO TO

Carlo Nardone, Sr. Solution Architect, EMEA
February 2017

HPC ADMINTECH 2017

“THE AI COMPUTING COMPANY”

Computer Graphics GPU Computing

AI IS EVERYWHERE

“Find where I parked my car”
“Find the bag I just saw in this magazine”
“What movie should I watch next?”

FUELING ALL INDUSTRIES

Increasing public safety with smart video surveillance at airports & malls
Providing intelligent services in hotels, banks and stores
Separating weeds as it harvests, reduces chemical usage by 90%

AI & DEEP LEARNING: THE HPC «KILLER APP»

MACHINE LEARNING & DEEP LEARNING

THE BIG BANG IN

DNN GPU

“The GPU is the workhorse of modern A.I.”

MACHINE LEARNING IN A NUTSHELL

[Diagram: LIVE DATA, ANALYSIS, PREDICTION; MACHINE LEARNING; TRAINING DATA]

GPUs for Classification
GPUs for Training

MACHINE LEARNING & DEEP LEARNING

Machine Learning

Neural Networks

Deep Learning

TRADITIONAL MACHINE PERCEPTION
Hand-tuned features

Raw data → Feature extraction → Classifier/detector → Result

Classifier/detector             Result
SVM, neural net, …
HMM, neural net, …              Speaker ID, speech transcription, …
Clustering, HMM, LDA, LSA, …    Topic classification, machine translation, sentiment analysis, …

Source: courtesy of Andrew Ng, Stanford U.

NEURAL NETWORKS APPROACH

Training: labeled examples (Dog, Cat, Raccoon, Honey badger) → “Classifier” model → output, scored by a loss function
Testing: input → “Classifier” model (same structure, trained parameters) → Dog

Source: courtesy of Andrew Ng, Stanford U.

DEEP LEARNING: A NEW COMPUTING MODEL
“Software that writes software”

LEARNING ALGORITHM

“millions of trillions of FLOPS”

(A lot of) Labeled Data

“little girl is eating piece of cake”

DEEP NEURAL NETWORK

Today’s Largest Networks

~10 layers
1B parameters
10M images
~30 Exaflops
~30 GPU days

Human brain has trillions of parameters, roughly 1,000 times more.
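A quick arithmetic check of the figures above, as a minimal sketch. It assumes that "~30 Exaflops" means the total number of floating-point operations for one training run (about 30e18) and that "~30 GPU days" is the time one GPU needs to do that work. The function name `sustained_flops` is illustrative only.

```python
# Back-of-the-envelope check (assumed reading of the slide, see lead-in):
# total work = ~30e18 floating-point operations, spread over ~30 GPU days.
TOTAL_FLOP = 30e18
GPU_DAYS = 30
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_flops(total_flop: float, gpu_days: float) -> float:
    """Average floating-point operations per second for a single GPU."""
    return total_flop / (gpu_days * SECONDS_PER_DAY)


rate = sustained_flops(TOTAL_FLOP, GPU_DAYS)
print(f"{rate / 1e12:.1f} TFLOP/s sustained per GPU")
```

Under these assumptions the two numbers are consistent with a sustained rate on the order of ten teraflops per GPU.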

Input Result

Image source: “ of Hierarchical Representations with Convolutional Deep Belief Networks” ICML 2009 & Comm. ACM 2011. Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Ng.

AMAZING RATE OF IMPROVEMENT

[Charts, vertical axis: accuracy. Image Recognition (ImageNet, 2010 to 2015, CV-based vs. DNN-based); Pedestrian Detection (Caltech, top score, NVIDIA GPU); Object Detection (KITTI, 11/2013 to 1/2016, NVIDIA DRIVENet)]

DEEP NEURAL NETWORKS 101

BIOLOGICALLY INSPIRED “NEURON”

Source: @karpathy U.Stanford course CS231n http://cs231n.github.io/

Sigmoid: squashes to range [0,1], $\sigma(x) = \frac{1}{1 + e^{-x}}$
Tanh: squashes to range [-1,1]
ReLU: clips negative inputs to zero, range [0, ∞), $f(x) = \max(0, x)$
… many more in literature

Left: Sigmoid non-linearity squashes real numbers to range between [0,1]. Right: The tanh non-linearity squashes real numbers to range between [-1,1].
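The three activation functions named above can be sketched in a few lines of plain Python. This is an illustration only; the function names are chosen here, and the sigmoid is written in a numerically stable form so that large negative inputs do not overflow.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^-x), split by sign to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def tanh(x: float) -> float:
    """Hyperbolic tangent, output between -1 and 1."""
    return math.tanh(x)


def relu(x: float) -> float:
    """Rectified Linear Unit: zero for negative input, identity otherwise."""
    return max(0.0, x)


for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.3f}  tanh={tanh(x):+.3f}  relu={relu(x):.1f}")
```

Note that ReLU does not squash its input into a bounded interval: positive values pass through unchanged.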

Left: Rectified Linear Unit (ReLU) activation function, which is zero when x < 0 and then linear with slope 1 when x > 0. Right: A plot from the Krizhevsky et al. paper indicating the 6x improvement in convergence with the ReLU unit compared to the tanh unit.

FEED-FORWARD NEURAL NETWORKS
Multilayer architecture

Deep nets: with multiple hidden layers
Efficient training enabled with , in 1974
Each neuron and each weight parameter is just a single floating point number

[Diagram: inputs X1 to X4; hidden layers Z1,1 to Z1,3 and Z2,1 to Z2,3; outputs Y1, Y2]
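A minimal sketch of a forward pass through a network shaped like the diagram: 4 inputs, two hidden layers of 3 neurons, 2 outputs. The weight initialisation, the ReLU hidden activation and all function names are assumptions made for illustration; the slide does not specify them.

```python
import random


def relu(v):
    return max(0.0, v)


def identity(v):
    return v


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases; every entry is one floating point number."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, weights, biases, activation):
    """Fully connected layer: activation(weighted sum of inputs + bias) per neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
layers = [init_layer(4, 3, rng), init_layer(3, 3, rng), init_layer(3, 2, rng)]


def forward(x):
    z1 = dense(x, layers[0][0], layers[0][1], relu)       # X  -> Z1
    z2 = dense(z1, layers[1][0], layers[1][1], relu)      # Z1 -> Z2
    return dense(z2, layers[2][0], layers[2][1], identity)  # Z2 -> Y


n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
y = forward([1.0, 0.5, -0.5, 2.0])
print("outputs:", y)
print("parameters:", n_params)
```

Even this toy network already holds 35 floating point parameters (weights plus biases), which gives a feel for how quickly the count grows with width and depth.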

Machine Learning Software

Training
Forward Propagation: [diagram: images labeled Tree, Cat, Dog; network output “turtle”]
Backward Propagation: compute weight update to nudge from “turtle” towards “dog”
Repeat

Trained Model

Inference
[diagram: network output “cat”]
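The forward propagation, backward propagation and repeat loop can be sketched with a tiny softmax classifier. Everything concrete here is hypothetical: the two-number feature vector, the initial weights (chosen so that the first answer is "turtle"), the learning rate and the function names. Real networks have many layers, but the update rule follows the same idea.

```python
import math

CLASSES = ["tree", "cat", "dog", "turtle"]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, x):
    """Forward propagation: class scores, then probabilities."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return softmax(scores)


def backward(probs, x, target):
    """Gradient of the cross-entropy loss with respect to each weight."""
    return [[(p - (1.0 if k == target else 0.0)) * xi for xi in x]
            for k, p in enumerate(probs)]


def train_step(weights, x, target, lr=0.5):
    """One forward pass, one backward pass, one weight update."""
    grads = backward(forward(weights, x), x, target)
    return [[w - lr * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, grads)]


def predict(weights, x):
    """Inference: forward pass only, no weight update."""
    probs = forward(weights, x)
    return CLASSES[probs.index(max(probs))]


x = [1.0, 2.0]                                   # hypothetical features of a dog image
weights = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.5, 0.5]]  # initially favours "turtle"
target = CLASSES.index("dog")

before = predict(weights, x)
for _ in range(20):                              # "Repeat"
    weights = train_step(weights, x, target)
after = predict(weights, x)
print(before, "->", after)
```

With these made-up values the prediction moves from "turtle" to "dog". Inference then reuses `predict` on new inputs with the weights frozen.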

TRAINING VS INFERENCE

CONVOLUTIONAL NEURAL NETWORKS
Local receptive field + weight sharing

Introduced by Yann LeCun et al. in 1998
MNIST: 0.7% error rate

Center element of the kernel is placed over the source pixel. The source pixel is then replaced with a weighted sum of itself and nearby pixels.

[Diagram: source pixel, convolution kernel, new pixel value (destination pixel)]
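The weighted-sum rule described above can be sketched directly. The image and kernel values below are made up for illustration (the numbers in the slide's figure are not recoverable), pixels outside the image are treated as zero, and the kernel is applied without flipping, as is usual in convolutional networks.

```python
def convolve_pixel(image, kernel, row, col):
    """Weighted sum of the pixel at (row, col) and its neighbours."""
    k = len(kernel) // 2  # kernel radius; kernel assumed square with odd size
    total = 0
    for dr in range(-k, k + 1):
        for dc in range(-k, k + 1):
            r, c = row + dr, col + dc
            if 0 <= r < len(image) and 0 <= c < len(image[0]):  # zero padding
                total += kernel[dr + k][dc + k] * image[r][c]
    return total


def convolve(image, kernel):
    """Slide the same kernel over every pixel (weight sharing)."""
    return [[convolve_pixel(image, kernel, r, c) for c in range(len(image[0]))]
            for r in range(len(image))]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[0, 1, 0],
          [1, -4, 1],
          [0, 1, 0]]

result = convolve(image, kernel)
for row in result:
    print(row)
```

The same nine kernel weights are reused at every position, which is what keeps the parameter count of a convolutional layer small.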

POOLING (SUBSAMPLING)
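Pooling shrinks a feature map by summarising each small window with a single value. The sketch below assumes max pooling with a 2x2 window and stride 2, a common choice; the slide's figure is not recoverable here, so the window size, the stride and the function name are assumptions.

```python
def max_pool(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    rows = (len(feature_map) - size) // stride + 1
    cols = (len(feature_map[0]) - size) // stride + 1
    return [[max(feature_map[r * stride + dr][c * stride + dc]
                 for dr in range(size) for dc in range(size))
             for c in range(cols)]
            for r in range(rows)]


feature_map = [[1, 3, 2, 4],
               [5, 6, 7, 8],
               [3, 2, 1, 0],
               [1, 2, 3, 4]]

pooled = max_pool(feature_map)
print(pooled)  # [[6, 8], [3, 4]]
```

A 4x4 map becomes 2x2, so the next layer sees a quarter of the values.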

Source: @karpathy U.Stanford course CS231n http://cs231n.github.io/

RECURRENT NEURAL NETWORKS
Time-Dependent or Sequence-Based Data

RNN Backpropagation: Rumelhart, Hinton, Williams, 1986

LSTM Cell: Hochreiter and Schmidhuber, 1997
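For sequence data, a recurrent network feeds its previous hidden state back in at every time step. The sketch below shows a plain recurrent step with a tanh activation, not an LSTM cell; the weights, sizes and function names are hypothetical and chosen only to make the example runnable.

```python
import math


def rnn_step(x, h_prev, w_xh, w_hh, b_h):
    """New hidden state from the current input and the previous hidden state."""
    return [math.tanh(sum(w * xi for w, xi in zip(w_xh[j], x))
                      + sum(w * hi for w, hi in zip(w_hh[j], h_prev))
                      + b_h[j])
            for j in range(len(b_h))]


def run_rnn(sequence, h0, w_xh, w_hh, b_h):
    """Apply the same step (same weights) to every element of the sequence."""
    h, states = h0, []
    for x in sequence:
        h = rnn_step(x, h, w_xh, w_hh, b_h)
        states.append(h)
    return states


# Hypothetical parameters: 1 input feature, 2 hidden units.
w_xh = [[0.5], [-0.3]]
w_hh = [[0.1, 0.2], [0.0, 0.4]]
b_h = [0.0, 0.0]

states = run_rnn([[1.0], [0.0], [-1.0]], [0.0, 0.0], w_xh, w_hh, b_h)
for t, h in enumerate(states):
    print(t, h)
```

The second input is zero, yet the second hidden state is not: the state carries information forward from earlier time steps.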

Source: courtesy of Alex Graves, TUM and Boris Ginzburg, NVIDIA

Prediction becomes action

http://www.ausy.tu-darmstadt.de/


GOOGLE DEEPMIND ALPHAGO CHALLENGE

ACTIVE LEARNING

[Diagram: Data Scientist, Vehicle, Solver, Network, Dashboard, Model; Classification, Detection, Segmentation; DGX-1: Train; Drive PX: Deploy]

GPU DEEP LEARNING

TESLA REVOLUTIONIZES DEEP LEARNING

GOOGLE BRAIN APPLICATION

               BEFORE TESLA      AFTER TESLA
Cost           $5,000K           $200K
Servers        1,000 Servers     16 Tesla Servers
Energy         600 kW            4 kW
Performance    1x                6x

GPUS AND DEEP LEARNING

                       NEURAL NETWORKS    GPUS
Inherently Parallel    ✓                  ✓
Matrix Operations      ✓                  ✓
FLOPS                  ✓                  ✓
Bandwidth              ✓                  ✓

GPUs deliver:
- same or better prediction accuracy
- faster results
- smaller footprint
- lower power

POWERING THE DEEP LEARNING ECOSYSTEM
NVIDIA SDK accelerates every major framework

SPEECH & AUDIO, NATURAL LANGUAGE PROCESSING

OBJECT DETECTION, IMAGE CLASSIFICATION, VOICE RECOGNITION, LANGUAGE TRANSLATION, RECOMMENDATION ENGINES, SENTIMENT ANALYSIS

DEEP LEARNING FRAMEWORKS

Mocha.jl

NVIDIA DEEP LEARNING SDK

developer.nvidia.com/deep-learning-software

EVERY DEEP LEARNING FRAMEWORK IS GPU-ACCELERATED

TORCH, CAFFE, THEANO, MATCONVNET, MOCHA.JL, PURINE, MINERVA, MXNET*, BIG SUR, TENSORFLOW, WATSON, CNTK

cuDNN
Deep Learning Primitives

IGNITING ARTIFICIAL INTELLIGENCE

▪ GPU-accelerated Deep Learning subroutines
▪ High performance neural network training
▪ Accelerates Major Deep Learning frameworks: Caffe, , Torch
▪ Up to 3.5x faster AlexNet training in Caffe than baseline GPU

[Chart: Millions of Images Trained Per Day; Tiled FFT up to 2x faster than FFT]

developer.nvidia.com/cudnn

NVIDIA DIGITS
Interactive Deep Learning GPU Training System

Interactive deep neural network development environment for image classification and object detection

Schedule, monitor, and manage neural network training jobs

Analyze accuracy and loss in real time

Track datasets, results, and trained neural networks

Scale training jobs across multiple GPUs automatically

developer.nvidia.com/digits

A COMPLETE DEEP LEARNING PLATFORM
MANAGE, TRAIN, DEPLOY

DIGITS: MANAGE / AUGMENT, TRAIN, TEST
TENSOR RT (GIE): EMBEDDED, DATA CENTER, AUTOMOTIVE

DEEP LEARNING EVERYWHERE

THANK YOU!
developer.nvidia.com/deep-learning