INTRO TO DEEP LEARNING
Carlo Nardone, Sr. Solution Architect, EMEA
February 2017
HPC ADMINTECH 2017
NVIDIA “THE AI COMPUTING COMPANY”
Computer Graphics GPU Computing Artificial Intelligence
AI IS EVERYWHERE
“Find where I parked my car”
“Find the bag I just saw in this magazine”
“What movie should I watch next?”
FUELING ALL INDUSTRIES
Increasing public safety with smart video surveillance at airports & malls
Providing intelligent services in hotels, banks and stores
Separating weeds as it harvests, reduces chemical usage by 90%
AI & DEEP LEARNING: THE HPC «KILLER APP»
MACHINE LEARNING & DEEP LEARNING
THE BIG BANG IN MACHINE LEARNING
DNN BIG DATA GPU
“The GPU is the workhorse of modern A.I.”
MACHINE LEARNING IN A NUTSHELL
[Diagram: LIVE DATA → ANALYSIS → PREDICTION, where the analysis step (“?”) is produced by MACHINE LEARNING from TRAINING DATA. Annotations: GPUs for Training, GPUs for Classification.]
MACHINE LEARNING & DEEP LEARNING
Machine Learning
Neural Networks
Deep Learning
TRADITIONAL MACHINE PERCEPTION
Hand-tuned features
Raw data → Feature extraction → Classifier/detector → Result
- Classifier/detector: SVM, neural net, …
- Classifier/detector: HMM, neural net, … Result: speaker ID, speech transcription, …
- Classifier/detector: clustering, HMM, LDA, LSA, … Result: topic classification, machine translation, sentiment analysis, …
Source: courtesy of Andrew Ng, Stanford U.
NEURAL NETWORKS APPROACH
[Diagram. Training: labeled images (dog, cat, raccoon, honey badger) pass through the model (“classifier”), and a loss function scores the output against the label. Testing: the model (same structure) with its learned parameters classifies a new image as “dog”.]
Source: courtesy of Andrew Ng, Stanford U.
DEEP LEARNING: A NEW COMPUTING MODEL
“Software that writes software”
LEARNING ALGORITHM
“millions of trillions of FLOPS”
(A lot of) Labeled Data
“little girl is eating piece of cake”
DEEP NEURAL NETWORK
Today’s Largest Networks
~10 layers
1B parameters
10M images
~30 Exaflops
~30 GPU days
Human brain has trillions of parameters, about 1,000x more.
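The figures above can be related with a little arithmetic. The sketch below assumes that “~30 Exaflops” means the total number of floating point operations for one training run, which the slide does not state. Under that reading it derives the sustained rate implied by ~30 GPU days, and the parameter count implied by “1,000x more”. The variable names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: "~30 Exaflops" is read as a total operation count, not a rate.
total_flop = 30e18
gpu_days = 30

# Sustained throughput one GPU would need to finish in ~30 days.
sustained_flops = total_flop / (gpu_days * SECONDS_PER_DAY)
print(f"{sustained_flops / 1e12:.1f} TFLOP/s sustained")

# 1B parameters scaled by 1,000 gives the order of magnitude quoted for the brain.
network_params = 1e9
brain_params = network_params * 1000
print(f"{brain_params:.0e} parameters")
```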
Input Result
Image source: “Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks” ICML 2009 & Comm. ACM 2011. Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Ng.
AMAZING RATE OF IMPROVEMENT
[Charts: accuracy over time in three panels. Image Recognition (ImageNet), CV-based vs DNN-based; Pedestrian Detection (Caltech); Object Detection (KITTI), top score, with NVIDIA GPU and NVIDIA DRIVENet marked. Time axes: 2010 to 2015 and 11/2013 to 1/2016.]
DEEP NEURAL NETWORKS 101
BIOLOGICALLY INSPIRED “NEURON”
Source: @karpathy U.Stanford course CS231n http://cs231n.github.io/
ACTIVATION FUNCTION
Sigmoid: squashes to range [0,1], $\sigma(x) = \frac{1}{1 + e^{-x}}$
Tanh: squashes to range [-1,1]
ReLU: thresholds at zero, so the output lies in [0, ∞), $f(x) = \max(0, x)$
… many more in literature
Left: Sigmoid non-linearity squashes real numbers to range between [0,1]. Right: The tanh non-linearity squashes real numbers to range between [-1,1].
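A minimal sketch of the three activation functions named on this slide, using only the Python standard library. The function names and sample inputs are illustrative. Unlike sigmoid and tanh, ReLU is unbounded above, so its output range is [0, ∞).

```python
import math

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + e^-x); output lies between 0 and 1."""
    # Two branches avoid overflow in exp() for large negative x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def tanh(x):
    """Hyperbolic tangent; output lies between -1 and 1."""
    return math.tanh(x)

def relu(x):
    """Rectified linear unit max(0, x); zero for x < 0, identity for x > 0."""
    return max(0.0, x)

for x in [-5.0, -1.0, 0.0, 1.0, 5.0]:
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.4f}  tanh={tanh(x):+.4f}  relu={relu(x):.1f}")
```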
Left: Rectified Linear Unit (ReLU) activation function, which is zero when x < 0 and then linear with slope 1 when x > 0. Right: A plot from the Krizhevsky et al. paper indicating the 6x improvement in convergence with the ReLU unit compared to the tanh unit.
FEED-FORWARD NEURAL NETWORKS
Multilayer architecture
Deep nets: with multiple hidden layers
Efficient training enabled with backpropagation, in 1974
Each neuron and each weight parameter is just a single floating point number
[Diagram: inputs X1 to X4; hidden layers Z1,1 to Z1,3 and Z2,1 to Z2,3; outputs Y1, Y2]
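The multilayer architecture in the diagram (4 inputs, two hidden layers of 3 neurons, 2 outputs) can be sketched as a forward pass in plain Python. The slide does not specify weights or activations, so the random initial weights, the ReLU hidden activation, and the helper names `dense`, `init_layer` and `forward` are assumptions made for illustration.

```python
import random

def relu(v):
    return max(0.0, v)

def identity(v):
    return v

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(weights . inputs + bias) per neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Each weight and bias is a single floating point number."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

rng = random.Random(0)
sizes = [4, 3, 3, 2]  # X1..X4 -> Z1,1..Z1,3 -> Z2,1..Z2,3 -> Y1..Y2
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

def forward(x):
    """Feed the input through every layer in order (no feedback connections)."""
    h = x
    for i, (weights, biases) in enumerate(layers):
        is_output = i == len(layers) - 1
        h = dense(h, weights, biases, identity if is_output else relu)
    return h

y = forward([0.5, -1.0, 2.0, 0.1])
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
print(y, n_params)
```

Counting weights and biases gives 35 parameters for this tiny network, which makes the gap to the 1B-parameter networks mentioned earlier concrete.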
Machine Learning Software
Forward Propagation: the network outputs “turtle” (labels shown: Tree, Cat, Dog)
Backward Propagation: compute weight update to nudge from “turtle” towards “dog”
Training
Repeat forward propagation (“turtle”) and backward propagation (compute weight update to nudge from “turtle” towards “dog”) to obtain the Trained Model.
Inference: the Trained Model is applied to new data, e.g. “cat”
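The training and inference loop above can be sketched in a few lines. This is a deliberately small stand-in, not the software shown on the slides: it assumes a single softmax layer instead of a deep network, and the toy data, `LABELS`, `train` and `predict` are hypothetical names introduced for illustration. Forward propagation computes class probabilities, backward propagation computes the cross-entropy gradient, and the weight update nudges the prediction towards the correct label.

```python
import math

LABELS = ["tree", "cat", "dog"]

# Hypothetical toy data set: (two features, index into LABELS).
DATA = [([1.0, 0.0], 0), ([0.9, 0.1], 0),
        ([0.0, 1.0], 1), ([0.1, 0.9], 1),
        ([-1.0, -1.0], 2), ([-0.9, -1.1], 2)]

def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(weights, biases, x):
    """Forward propagation: class probabilities for one input."""
    scores = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)

def train(data, n_classes, n_features, lr=0.5, epochs=200):
    weights = [[0.0] * n_features for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):  # "Repeat"
        for x, target in data:
            probs = forward(weights, biases, x)
            # Backward propagation: the gradient of the cross-entropy loss
            # with respect to each class score is (probability - one-hot label).
            for k in range(n_classes):
                grad = probs[k] - (1.0 if k == target else 0.0)
                for j in range(n_features):
                    weights[k][j] -= lr * grad * x[j]
                biases[k] -= lr * grad
    return weights, biases  # the trained model

def predict(model, x):
    """Inference: forward propagation only, no weight updates."""
    weights, biases = model
    probs = forward(weights, biases, x)
    return LABELS[probs.index(max(probs))]

model = train(DATA, n_classes=3, n_features=2)
print(predict(model, [-1.0, -0.9]))
```

Inference reuses only the forward pass, which is why it is much cheaper than training.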
TRAINING VS INFERENCE
CONVOLUTIONAL NEURAL NETWORKS
Local receptive field + weight sharing
Introduced by Yann LeCun et al in 1998 MNIST: 0.7% error rate
CONVOLUTION
Center element of the kernel is placed over the source pixel. The source pixel is then replaced with a weighted sum of itself and nearby pixels.
[Figure: source pixel, convolution kernel, new pixel value (destination pixel)]
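The operation described above can be sketched directly. The sketch makes three assumptions that are not taken from the slide: borders are zero-padded, the kernel is not flipped (the cross-correlation convention common in deep learning frameworks), and the 3x3 image and Laplacian-style kernel are made-up sample values.

```python
def convolve2d(image, kernel):
    """Place the kernel centre over each source pixel and write the weighted
    sum of that pixel and its neighbours to the destination pixel."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ch, cw = kh // 2, kw // 2  # position of the kernel's centre element
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - ch, c + j - cw
                    # Pixels outside the image count as zero (zero padding).
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += image[rr][cc] * kernel[i][j]
            out[r][c] = acc
    return out

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
laplacian = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]
result = convolve2d(image, laplacian)
print(result)
```

The same small kernel is reused at every position, which is the weight sharing that keeps convolutional layers compact.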
POOLING LAYER (SUBSAMPLING)
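A minimal sketch of one common pooling variant, max pooling with a 2x2 window and stride 2. The slide gives only the title, so the choice of max pooling, the window size and the sample feature map are assumptions.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Subsample by keeping only the largest value in each window."""
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, h - size + 1, stride):
        row = []
        for c in range(0, w - size + 1, stride):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled

fmap = [[1, 1, 2, 4],
        [5, 6, 7, 8],
        [3, 2, 1, 0],
        [1, 2, 3, 4]]
pooled = max_pool2d(fmap)
print(pooled)  # [[6, 8], [3, 4]]
```

A 4x4 map shrinks to 2x2, so later layers process a quarter of the values.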
Source: @karpathy U.Stanford course CS231n http://cs231n.github.io/
RECURRENT NEURAL NETWORKS
Time-Dependent or Sequence-Based Data
RNN Backpropagation: Rumelhart, Hinton, Williams, 1986
LSTM Cell: Hochreiter and Schmidhuber, 1997
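To illustrate how a recurrent network handles sequence data, here is a sketch of a plain (Elman-style) recurrent step, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h). It is not an LSTM cell, and the fixed weights, layer sizes and function names are hypothetical values chosen for illustration.

```python
import math

def rnn_step(x_t, h_prev, w_xh, w_hh, b_h):
    """One time step: the new state depends on the input and the previous state."""
    h_t = []
    for k in range(len(b_h)):
        s = b_h[k]
        s += sum(w * x for w, x in zip(w_xh[k], x_t))
        s += sum(w * h for w, h in zip(w_hh[k], h_prev))
        h_t.append(math.tanh(s))
    return h_t

def rnn_forward(sequence, w_xh, w_hh, b_h):
    """Run the same step, with the same weights, over every element in order."""
    h = [0.0] * len(b_h)
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, w_xh, w_hh, b_h)
        states.append(h)
    return states

# Hypothetical weights: 1 input feature, 2 hidden units.
w_xh = [[0.5], [-0.3]]
w_hh = [[0.1, 0.2], [0.0, 0.4]]
b_h = [0.0, 0.0]

sequence = [[1.0], [0.0], [0.0]]  # an impulse followed by silence
states = rnn_forward(sequence, w_xh, w_hh, b_h)
for t, h in enumerate(states):
    print(t, [round(v, 4) for v in h])
```

The hidden state stays non-zero after the input returns to zero, which is the memory that makes the network suited to time-dependent data.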
Source: courtesy of Alex Graves, TUM and Boris Ginzburg, NVIDIA
REINFORCEMENT LEARNING
Prediction becomes action
http://www.ausy.tu-darmstadt.de/
GOOGLE DEEPMIND ALPHAGO CHALLENGE
ACTIVE LEARNING
[Diagram labels: Data Scientist, Vehicle, Solver, Network, Dashboard, Model; Classification, Detection, Segmentation; DGX-1: Train; Drive PX: Deploy]
GPU DEEP LEARNING
TESLA REVOLUTIONIZES DEEP LEARNING
GOOGLE BRAIN APPLICATION
              BEFORE TESLA     AFTER TESLA
Cost          $5,000K          $200K
Servers       1,000 Servers    16 Tesla Servers
Energy        600 kW           4 kW
Performance   1x               6x
GPUS AND DEEP LEARNING
                      NEURAL NETWORKS   GPUS
Inherently Parallel   ✓                 ✓
Matrix Operations     ✓                 ✓
FLOPS                 ✓                 ✓
Bandwidth             ✓                 ✓

GPUs deliver:
- same or better prediction accuracy
- faster results
- smaller footprint
- lower power
POWERING THE DEEP LEARNING ECOSYSTEM
NVIDIA SDK accelerates every major framework
COMPUTER VISION: OBJECT DETECTION, IMAGE CLASSIFICATION
SPEECH & AUDIO: VOICE RECOGNITION, LANGUAGE TRANSLATION
NATURAL LANGUAGE PROCESSING: RECOMMENDATION ENGINES, SENTIMENT ANALYSIS
DEEP LEARNING FRAMEWORKS
Mocha.jl
NVIDIA DEEP LEARNING SDK
developer.nvidia.com/deep-learning-software
EVERY DEEP LEARNING FRAMEWORK IS GPU-ACCELERATED
TORCH, CAFFE, THEANO, MATCONVNET, MOCHA.JL, PURINE, MINERVA, MXNET*, BIG SUR, TENSORFLOW, WATSON, CNTK
cuDNN
Deep Learning Primitives
IGNITING ARTIFICIAL INTELLIGENCE
▪ GPU-accelerated Deep Learning subroutines
▪ High performance neural network training
▪ Accelerates Major Deep Learning frameworks: Caffe, Theano, Torch
▪ Up to 3.5x faster AlexNet training in Caffe than baseline GPU
[Chart: Millions of Images Trained Per Day. Tiled FFT up to 2x faster than FFT.]
developer.nvidia.com/cudnn
NVIDIA DIGITS
Interactive Deep Learning GPU Training System
Interactive deep neural network development environment for image classification and object detection
Schedule, monitor, and manage neural network training jobs
Analyze accuracy and loss in real time
Track datasets, results, and trained neural networks
Scale training jobs across multiple GPUs automatically
developer.nvidia.com/digits
A COMPLETE DEEP LEARNING PLATFORM
[Diagram: MANAGE (MANAGE / AUGMENT), TRAIN (DIGITS: TRAIN, TEST), DEPLOY (TENSOR RT (GIE): EMBEDDED, DATA CENTER, AUTOMOTIVE)]
DEEP LEARNING EVERYWHERE
THANK YOU!
[email protected] developer.nvidia.com/deep-learning