
DEEP LEARNING - REVIEW
Yann LeCun, Yoshua Bengio & Geoffrey Hinton (Nature, 2015)

OUTLINE

• History, Background & Applications.

• Recent Revival.

• Convolutional Neural Networks.

• Recurrent Neural Networks.

• Future.

WHAT IS DEEP LEARNING?

• A particular class of Learning Algorithms.

• Rebranded Neural Networks: with multiple layers.

• Inspired by the Neuronal architecture of the Brain.

• Renewed interest in the area due to a few recent breakthroughs.

• Learn parameters from data.

• Non-Linear Classification.

SOME CONTEXT - HISTORY

• 1943 - McCulloch & Pitts develop computational model for neural network.

Idea: neurons with a binary threshold were analogous to first order logic sentences.

• 1949 - Donald Hebb proposes Hebb’s rule

Idea: Neurons that fire together, wire together!

• 1958 - Frank Rosenblatt creates the Perceptron.

• 1959 - Hubel and Wiesel describe simple and complex cells in the visual cortex.

• 1975 - Paul J. Werbos develops the backpropagation algorithm.

• 1980 - Neocognitron, a hierarchical multilayered ANN.

• 1990 - Convolutional Neural Networks.

APPLICATIONS

• Predict the activity of potential drug molecules.

• Reconstruct Brain circuits.

• Predict effects of mutation on non-coding regions of DNA.

• Speech/Image Recognition & Language Translation.

MULTILAYER NEURAL NETWORK

COMMON NON-LINEAR FUNCTIONS:

1) ReLU: f(z) = max(0, z)   2) Sigmoid   3) Logistic

COST FUNCTION: E = 1/2 Σ_l (y_l - t_l)^2
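The forward computation on this slide can be sketched in a few lines of NumPy. This is an illustrative toy (the layer sizes, random weights and inputs are invented here), not code from the reviewed paper:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)          # f(z) = max(0, z)

def forward(x, W1, b1, W2, b2):
    h = relu(W1 @ x + b1)              # hidden layer with ReLU non-linearity
    return W2 @ h + b2                 # linear output layer y

def squared_error(y, t):
    return 0.5 * np.sum((y - t) ** 2)  # E = 1/2 Σ_l (y_l - t_l)^2

rng = np.random.default_rng(0)
x = rng.normal(size=4)                 # toy input vector
t = np.array([1.0, 0.0])               # toy target vector
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)
print(squared_error(forward(x, W1, b1, W2, b2), t))
```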

Source: LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (28 May 2015). doi:10.1038/nature14539

STOCHASTIC GRADIENT DESCENT (SGD)

• Analogy: A person is stuck in the mountains and is trying to get down (i.e. trying to find the minima).

• SGD: the person represents the gradient-descent algorithm, and the path taken down the mountain represents the sequence of parameter settings the algorithm explores. The steepness of the hill represents the slope of the error surface at that point, and the instrument used to measure it is differentiation (the slope is the derivative of the squared-error function at that point). The direction chosen to travel in is the direction of steepest descent (against the gradient of the error surface), and the distance walked before taking another measurement is the learning rate of the algorithm.
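A toy version of the analogy in code. The one-parameter least-squares problem below is invented for illustration; each step samples one example (the "stochastic" part), measures the slope by differentiation, and walks downhill by a step whose size is the learning rate:

```python
import numpy as np

# Toy data: y = 2x + noise. We fit a single weight w by SGD.
rng = np.random.default_rng(0)
xs = rng.normal(size=200)
ys = 2.0 * xs + 0.1 * rng.normal(size=200)

w = -5.0               # start far from the minimum ("high on the mountain")
learning_rate = 0.05   # how far to walk before re-measuring the slope
for step in range(1000):
    i = rng.integers(len(xs))           # stochastic: one random example per step
    grad = (w * xs[i] - ys[i]) * xs[i]  # d/dw of 1/2 (w*x - y)^2 at that example
    w -= learning_rate * grad           # step downhill, against the gradient
print(w)                                # ends up close to 2.0
```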

Source: http://sebastianraschka.com/images/faq/closed-form-vs-gd/ball.png; Wikipedia.

BACKPROPAGATION

COST FUNCTION: E = 1/2 Σ_l (y_l - t_l)^2
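To make the chain rule concrete, here is a hedged NumPy sketch of one forward and one backward pass through a tiny two-layer ReLU network with the squared-error cost above; the shapes and initial values are illustrative only:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
x, t = rng.normal(size=4), np.array([1.0, 0.0])
W1, b1 = 0.1 * rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = 0.1 * rng.normal(size=(2, 3)), np.zeros(2)

# Forward pass.
z1 = W1 @ x + b1
h = relu(z1)
y = W2 @ h + b2
E = 0.5 * np.sum((y - t) ** 2)      # E = 1/2 Σ_l (y_l - t_l)^2

# Backward pass: propagate the error derivative layer by layer (chain rule).
dy = y - t                          # dE/dy
dW2, db2 = np.outer(dy, h), dy      # gradients for the output layer
dh = W2.T @ dy                      # dE/dh
dz1 = dh * (z1 > 0)                 # derivative of ReLU is 1 where z1 > 0, else 0
dW1, db1 = np.outer(dz1, x), dz1    # gradients for the hidden layer
print(E, dW1.shape, dW2.shape)      # gradients ready for a gradient-descent update
```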

Source: LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (28 May 2015). doi:10.1038/nature14539

WHY ALL THE BUZZ?

ImageNet:

• ~15M labeled high-resolution images.

• Roughly 22K categories.

• Collected from the web & labeled via Amazon Mechanical Turk.

http://vision.stanford.edu/teaching/cs231b_spring1415/slides/alexnet_tugce_kyunghee.pdf

DIMINISHING ERROR RATES

NEURAL NETWORKS - CORE IDEA

• Color image: 32 x 32 pixels with 3 color channels.

• Pixel intensity: 0 - 255.

• Image representation: a 32 x 32 x 3 array of numbers, each between 0 and 255.

• Idea: feed the numerical array to a ConvNet and obtain probabilities for each class of objects as an N-dimensional vector, where N is the number of classes; a sketch follows below.
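A minimal end-to-end sketch of this idea, assuming PyTorch is available; the layer sizes, channel counts and the 10-class output are illustrative choices, not an architecture taken from the reviewed material:

```python
import torch
import torch.nn as nn

# A 32x32 colour image is just a 3 x 32 x 32 array of numbers in [0, 255].
image = torch.randint(0, 256, (1, 3, 32, 32)).float() / 255.0   # toy input, batch of 1

# (Convolution + non-linearity + pooling) x 2, then a fully connected classifier.
net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),      # N = 10 classes in this toy example
)

probs = torch.softmax(net(image), dim=1)   # N-dimensional vector of class probabilities
print(probs.shape)                         # torch.Size([1, 10])
```

With labeled data, the same stack would be trained end to end by backpropagation and stochastic gradient descent.

CONVOLUTIONAL NEURAL NETS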

• Multi-stage neural nets that model the V1, V2, V3 areas of the visual cortex.

• (Convolutional Layer + Non-Linear Layer + Pooling Layer)^n + Fully Connected Layer.

• Exploit the fact that nearby values (e.g. neighboring pixels) are highly correlated and form local motifs that are easily detected.

• Ideal for data that come in the form of multiple arrays, e.g., color images.

• Learn the 'essence' of images well.

• Applications in …

INITIAL WORK - YANN LECUN

• Primitive recognition without hand coded features.

• Adaptive, yet constrained architecture.

• Hand written digit recognition served as a simple and powerful model.

• Training sample: 9,298 zip codes from mail passing through Buffalo, NY.

CONCEPTUAL OVERVIEW

Source: LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (28 May 2015). doi:10.1038/nature14539

WHAT IS A CONVOLUTION?

• Several meanings depending on the area of application.

• Convolution - Operation of applying filters/kernels through overlapping regions of the image.

• Stride - the step size with which the filter moves across the image; it determines how much the regions overlap.

• Each filter applies the same set of weights and biases at every position (weight sharing), which minimizes the number of parameters; see the sketch below.
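A minimal NumPy sketch of the operation described above: one shared kernel slides over overlapping patches of the image, and a larger stride reduces the overlap. The 8x8 input and the averaging filter are arbitrary examples:

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide one shared kernel over overlapping patches of a 2-D image."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)   # the same weights at every position
    return out

image = np.random.default_rng(0).random((8, 8))
kernel = np.ones((3, 3)) / 9.0                    # a simple 3x3 averaging filter
print(convolve2d(image, kernel, stride=1).shape)  # (6, 6)
print(convolve2d(image, kernel, stride=2).shape)  # (3, 3): larger stride, less overlap
```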

Sources: http://ufldl.stanford.edu/ ; http://www.kdnuggets.com/2016/09/beginners-guide-understanding-convolutional-neural-networks-part-1.html

FILTERS/KERNELS

• Filters - Carefully designed feature detectors (matrices) to detect edges, curves, colors etc.

• Receptive field - Area covered by a single filter.

• 3x3 and 5x5 are the most common sizes.

• AlexNet used 96 kernels in its first convolutional layer.
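For intuition, a hand-designed filter such as the classic Sobel vertical-edge detector gives a strong response over a receptive field that contains an edge and no response over a flat region; a small NumPy check:

```python
import numpy as np

# A classic hand-designed 3x3 filter: the Sobel vertical-edge detector.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

# A 3x3 patch (one receptive field) straddling a dark-to-bright vertical edge.
edge_patch = np.array([[0, 0, 255],
                       [0, 0, 255],
                       [0, 0, 255]])
flat_patch = np.full((3, 3), 128)

print(np.sum(edge_patch * sobel_x))   # 1020: strong response on the edge
print(np.sum(flat_patch * sobel_x))   # 0: no response on a flat region
```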

Source: http://web.pdx.edu/~jduh/courses/Archive/geog481w07/Students/Ludwig_ImageConvolution.pdf

RELU & POOLING LAYERS

• ReLU - applies the non-linear activation function max(0, x) to every value of the feature map. Other common activation functions include tanh and sigmoid.

• ReLU addresses the 'vanishing gradient problem'.

• Pooling - Reduces the spatial size and minimizes overfitting.

• MAX 2x2 is the most common pooling operation.

• Dropout - Random elimination of neurons to minimize overfitting.

• Pooling vs. larger strides (see the max-pooling sketch below).
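A minimal NumPy sketch of the most common pooling operation, 2x2 max pooling, which keeps the strongest response in each 2x2 block and halves each spatial dimension (the 4x4 feature map is an arbitrary example):

```python
import numpy as np

def max_pool_2x2(x):
    """Keep the maximum of each non-overlapping 2x2 block, halving each spatial dim."""
    h, w = x.shape
    return x[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

feature_map = np.arange(16).reshape(4, 4)
print(max_pool_2x2(feature_map))
# [[ 5  7]
#  [13 15]]
```

NON-LINEARITIES - TREND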

https://arxiv.org/pdf/1606.02228.pdf

ALEXNET - PERSPECTIVE

http://vision.stanford.edu/teaching/cs231b_spring1415/slides/alexnet_tugce_kyunghee.pdf

CONVNETS - THE BIG PICTURE

RECURRENT NEURAL NETWORKS - RNN

• RNN - neural nets with feedback loops: multiple copies of the same network, each passing a message to a successor.

• Used to model sequential inputs, e.g., speech, DNA sequences.

• Operate over sequences of vectors and predict the next character, word, etc. (see the sketch below).
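A minimal NumPy sketch of this idea: a vanilla RNN applies the same weights at every time step and passes a hidden-state "message" forward through the feedback loop; the dimensions and random inputs are illustrative only:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b):
    """One step of a vanilla RNN; the same weights are reused at every time step."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b)

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 8
W_xh = 0.1 * rng.normal(size=(hidden_dim, input_dim))
W_hh = 0.1 * rng.normal(size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

sequence = rng.normal(size=(10, input_dim))   # e.g. 10 time steps of input vectors
h = np.zeros(hidden_dim)                      # the "message" passed between copies
for x_t in sequence:
    h = rnn_step(x_t, h, W_xh, W_hh, b)       # feedback loop: h feeds back into itself
print(h.shape)                                # a prediction layer would read out from h
```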

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

LONG SHORT-TERM MEMORY (LSTM)

• Sequences have long term dependencies.

Why? “the clouds are in the sky” vs. “I grew up in … I speak fluent French.”

• Problem: Hard to store information for very long.

• Solution: LSTM (a minimal cell sketch follows below).
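A hedged NumPy sketch of a single LSTM step (one common formulation; real implementations differ in details): gates decide what to forget from the cell state, what to write into it, and what to expose as output, which is what lets information survive over long sequences:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step with input (i), forget (f), output (o) gates and a candidate (g)."""
    z = W @ np.concatenate([x_t, h_prev]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c_prev + i * g          # the cell state can carry information for a long time
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 8
W = 0.1 * rng.normal(size=(4 * hidden_dim, input_dim + hidden_dim))
b = np.zeros(4 * hidden_dim)
h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x_t in rng.normal(size=(10, input_dim)):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)
```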

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

EXAMPLE BY ANDREJ KARPATHY

• Source: 474 MB of C code from GitHub.

• Multiple 3-layer LSTMs.

• A few days of training on GPUs.

• Parameters: ~10 million.

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

RNNS AND BEYOND

• RNNs augmented with Memory Networks:

1. Improved Performance.

2. Applications in question answering systems.

• ConvNets + RNNs = novel applications.

FUTURE - DEEP LEARNING

• Extension of recent successes from supervised to unsupervised learning.

• End-to-end integration: Reinforcement Learning + ConvNets + RNNs.

• Natural language understanding.

• Complex systems that combine learning, memory and reasoning.

Source: https://developer.amazon.com/alexaprize

REFERENCES

• Andrej Karpathy’s Course: http://cs231n.stanford.edu/

• DeepLearning.tv : https://www.youtube.com/channel/UC9OeZkIwhzfv-_Cb7fCikLQ

• Wikipedia!