IMPLEMENTING DEEP LEARNING USING CUDNN
Yeha Lee (이예하), VUNO Inc.

CONTENTS
- Deep Learning Review
- Implementation on GPU using cuDNN
- Optimization Issues
- Introduction to VUNO-Net

DEEP LEARNING REVIEW

BRIEF HISTORY OF NEURAL NETWORK
[Timeline figure: the rises and falls of neural network research — Electronic Brain, Golden Age, Dark Age ("AI Winter"), and revival]
- 1943: S. McCulloch & W. Pitts, Electronic Brain (adjustable weights; weights are not learned)
- 1957: F. Rosenblatt, Perceptron (learnable weights and threshold)
- 1960: B. Widrow & M. Hoff, ADALINE
- 1969: M. Minsky & S. Papert, XOR Problem (perceptrons cannot solve nonlinearly separable problems)
- 1986: D. Rumelhart, G. Hinton & R. Williams, Multi-layered Perceptron (Backpropagation) (solution to nonlinearly separable problems; but big computation, local optima and overfitting)
- 1995: V. Vapnik & C. Cortes, SVM (limitations of learning prior knowledge; kernel function requires human intervention)
- 2006: G. Hinton & R. Salakhutdinov, Deep Neural Network (Pretraining) (hierarchical feature learning)

MACHINE/DEEP LEARNING IS EATING THE WORLD!

BUILDING BLOCKS
- Restricted Boltzmann machine
- Auto-encoder
- Deep belief network
- Deep Boltzmann machine
- Generative stochastic networks
- Recurrent neural networks
- Convolutional neural networks

CONVOLUTIONAL NEURAL NETWORKS
LeNet-5 (Yann LeCun, 1998)
AlexNet (Alex Krizhevsky et al., 2012)
GoogLeNet (Szegedy et al., 2015)

CONVOLUTIONAL NEURAL NETWORKS
[Figure: the network as a stack of layers — Convolution Layer → Pooling Layer → Fully Connected Layer → Softmax Layer (Output) — with the Forward Pass flowing up and the Backward Pass flowing down]
Each layer holds:
- Input / Output
- Weights
- Neuron activations
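The per-layer structure above (a forward pass producing outputs, a backward pass propagating errors, cached activations) can be sketched as a minimal layer interface. The class and method names below are my own illustration, not cuDNN's API:

```python
import numpy as np

class Layer:
    """Illustrative base class: each layer owns its input/output
    buffers and propagates activations forward and errors backward."""
    def forward(self, x):
        raise NotImplementedError
    def backward(self, delta):
        raise NotImplementedError

class ReLU(Layer):
    """Rectified-linear activation as an example layer."""
    def forward(self, x):
        self.x = x                   # cache input for the backward pass
        return np.maximum(x, 0.0)
    def backward(self, delta):
        return delta * (self.x > 0)  # gate the error element-wise
```

A network is then just a list of such layers, walked forward for inference and in reverse for backpropagation.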
FULLY CONNECTED LAYER - FORWARD
- Matrix calculation is very fast on the GPU (cuBLAS library)

FULLY CONNECTED LAYER - BACKWARD
- Matrix calculation is very fast on the GPU
- Element-wise multiplication can be done efficiently using GPU threads
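In matrix form the forward pass is y = Wx + b, and the backward pass propagates δx = Wᵀδy while accumulating the weight gradient. A minimal NumPy sketch of what the cuBLAS-backed GEMM calls compute (illustrative CPU code, single sample):

```python
import numpy as np

def fc_forward(W, x, b):
    """Fully connected forward pass: y = W x + b."""
    return W @ x + b

def fc_backward(W, x, delta_y):
    """Backward pass: gradients w.r.t. input, weights, and bias."""
    delta_x = W.T @ delta_y       # error propagated to the previous layer
    dW = np.outer(delta_y, x)     # weight gradient
    db = delta_y                  # bias gradient
    return delta_x, dW, db
```

On the GPU, each of these lines maps to a single GEMM (or element-wise kernel) call.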
CONVOLUTION LAYER - FORWARD
[Figure: a 2×2 filter w1..w4 slides over a 3×3 input x1..x9, producing a 2×2 output y1..y4; each output is a weighted sum over one input window, e.g. y1 = w1·x1 + w2·x2 + w3·x4 + w4·x5]
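A direct sliding-window implementation of this forward pass, sketched in NumPy for a single channel (illustrative only; this is one of the implementation strategies discussed below, not cuDNN's default):

```python
import numpy as np

def conv2d_forward(x, w):
    """Valid cross-correlation of input x (H×W) with filter w (kH×kW),
    as in the 3×3-input / 2×2-filter example above."""
    H, W = x.shape
    kH, kW = w.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # each output is the weighted sum over one input window
            out[i, j] = np.sum(x[i:i+kH, j:j+kW] * w)
    return out
```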
CONVOLUTION LAYER - BACKWARD
[Figure: the output error at y1..y4 is propagated back through the same 2×2 filter w1..w4 to the input positions x1..x9; each input gradient sums the contributions of every output window that covered it]
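The data-gradient pass scatters each output error back through the filter window it came from (equivalently, a "full" convolution with the 180°-rotated filter). A NumPy sketch of this standard identity (illustrative, not cuDNN's actual kernel):

```python
import numpy as np

def conv2d_backward_data(delta_y, w, x_shape):
    """Propagate the output error delta_y back to an input of shape
    x_shape: each output error is scattered over its filter window."""
    dx = np.zeros(x_shape)
    kH, kW = w.shape
    for i in range(delta_y.shape[0]):
        for j in range(delta_y.shape[1]):
            dx[i:i+kH, j:j+kW] += delta_y[i, j] * w
    return dx
```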
CONVOLUTION LAYER - BACKWARD
[Figure: the filter gradient — ∂L/∂w1, ∂L/∂w2, ∂L/∂w3, ∂L/∂w4 — is computed from the input x1..x9 and the output error; each weight's gradient sums the products of the inputs it touched with the corresponding output errors]
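Correspondingly, the weight gradient is itself a correlation between the input and the output error. A NumPy sketch (illustrative):

```python
import numpy as np

def conv2d_backward_filter(x, delta_y):
    """Gradient of the loss w.r.t. the filter: each weight w[i, j]
    accumulates input values times the output errors they produced."""
    oH, oW = delta_y.shape
    kH = x.shape[0] - oH + 1
    kW = x.shape[1] - oW + 1
    dw = np.zeros((kH, kW))
    for i in range(kH):
        for j in range(kW):
            dw[i, j] = np.sum(x[i:i+oH, j:j+oW] * delta_y)
    return dw
```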
HOW TO EVALUATE THE CONVOLUTION LAYER EFFICIENTLY?
- Both forward and backward passes can be computed with the same convolution scheme
- There are several ways to implement convolutions efficiently:
  - Lower the convolutions into a matrix multiplication (cuDNN)
  - Fast Fourier Transform to compute the convolutions (cuDNN_v3)
  - Computing the convolutions directly (cuda-convnet)

IMPLEMENTATION ON GPU USING CUDNN

INTRODUCTION TO CUDNN
cuDNN is a GPU-accelerated library of primitives for deep neural networks:
- Convolution forward and backward
- Pooling forward and backward
- Softmax forward and backward
- Neuron activations forward and backward: rectified linear (ReLU), sigmoid, hyperbolic tangent (TANH)
- Tensor transformation functions
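As an example of what the pooling primitives compute, here is max pooling forward (take the window maximum) and backward (route the error to the argmax) in NumPy — an illustrative sketch, not cuDNN code:

```python
import numpy as np

def maxpool2x2_forward(x):
    """Non-overlapping 2×2 max pooling over an even-sized input."""
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

def maxpool2x2_backward(x, delta_y):
    """Route each output error back to the position of its window's max."""
    dx = np.zeros_like(x)
    for i in range(delta_y.shape[0]):
        for j in range(delta_y.shape[1]):
            win = x[2*i:2*i+2, 2*j:2*j+2]
            k = np.unravel_index(np.argmax(win), win.shape)
            dx[2*i + k[0], 2*j + k[1]] = delta_y[i, j]
    return dx
```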
INTRODUCTION TO CUDNN (VERSION 2)
- cuDNN's convolution routines aim for performance competitive with the fastest GEMM implementations
- Lowering the convolutions into a matrix multiplication
(Sharan Chetlur et al., 2015)
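The "lowering" idea unrolls each input window into one row of a matrix, so the whole convolution becomes a single large GEMM. A NumPy sketch of im2col-style lowering for one channel (illustrative; cuDNN performs the lowering in tiles on the GPU):

```python
import numpy as np

def im2col(x, kH, kW):
    """Unroll each kH×kW window of x into one row of a matrix."""
    oH = x.shape[0] - kH + 1
    oW = x.shape[1] - kW + 1
    cols = np.empty((oH * oW, kH * kW))
    for i in range(oH):
        for j in range(oW):
            cols[i * oW + j] = x[i:i+kH, j:j+kW].ravel()
    return cols

def conv2d_gemm(x, w):
    """Convolution as one matrix multiplication (the cuDNN v2 approach)."""
    oH = x.shape[0] - w.shape[0] + 1
    oW = x.shape[1] - w.shape[1] + 1
    return (im2col(x, *w.shape) @ w.ravel()).reshape(oH, oW)
```

The trade-off is extra memory for the lowered matrix in exchange for reusing a highly tuned GEMM kernel.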
INTRODUCTION TO CUDNN
Benchmarks:
- https://developer.nvidia.com/cudnn
- https://github.com/soumith/convnet-benchmarks

LEARNING VGG MODEL USING CUDNN
- Data Layer
- Convolution Layer
- Pooling Layer
- Fully Connected Layer
- Softmax Layer

COMMON DATA STRUCTURE FOR LAYER
- Device memory & tensor descriptions for input/output data & error
- A tensor description defines the dimensions of the data
    float *d_input, *d_output, *d_inputDelta, *d_outputDelta;
    cudnnTensorDescriptor_t inputDesc;
    cudnnTensorDescriptor_t outputDesc;

DATA LAYER

CONVOLUTION LAYER
- Initialization

POOLING LAYER / SOFTMAX LAYER

OPTIMIZATION ISSUES
- Speed
- Parallelism

INTRODUCING VUNO-NET
- The Team
- VUNO-Net
- Performance
- Application
- Visualization
THANK YOU