Deep Learning


Elisa Ricci, University of Perugia

Outline
● Introduction to Deep Learning
● Brief history of Neural Networks
● Deep Learning
● Recent technical advances
● Models: Autoencoders, CNNs, RNNs
● Open problems
● Deep Learning hands-on

The AI Revolution

Computer Vision
● Amazing progress in the last few years with Convolutional Neural Networks (CNNs).
● ImageNet: 1.4M images, 1K categories (2009)
● ImageNet Large-Scale Visual Recognition Challenge (ILSVRC)
● Traditional ML vs deep models vs human performance

Speech Processing
● Machine translation: Rick Rashid in Tianjin, China, October 25, 2012. A voice recognition program translated a speech given by Richard F. Rashid, Microsoft's top scientist, into Mandarin Chinese.
● Text to Speech: WaveNet.

Creativity
● Style Transfer [Ruder et al.]
● Generate Donald Trump's Twitter eruptions.

Machine Learning vs Deep Learning: 2002 vs 2012

Deep Learning
● What is deep learning?
● Why is it generally better than traditional ML methods on image, speech and certain other types of data?

Deep Learning
● What is deep learning? Deep learning means using a neural network with several layers of nodes between input and output (input → hidden → hidden → hidden → output).

More formally
● A family of parametric models which learn non-linear hierarchical representations:
  a_L = h_L(h_{L-1}(… h_1(x; θ_1) …; θ_{L-1}); θ_L)
  where x is the input, θ_1, …, θ_L are the parameters of the network, h_l is a non-linear activation, and a_L is the activation of layer L.

… and informally

Deep Learning
● Why is it generally better than other ML methods on image, speech and certain other types of data? The series of layers between input and output compute relevant features automatically in a series of stages, just as our brains seem to.

Deep Learning
● ...but neural networks have been around for 25 years... So, what is new?

Biological neuron
● A neuron has:
  – branching input (the dendrites)
  – branching output (the axon)
  – a cell body, or soma, containing the nucleus
● Information moves from the dendrites to the axon via the cell body
● The axon connects to the dendrites of other neurons via synapses
  – Synapses vary in strength
  – Synapses may be excitatory or inhibitory

Perceptron
● An Artificial Neuron (Perceptron) is a non-linear parameterized function with restricted output range.
  [Diagram: inputs x_1 … x_5 weighted by w_1 … w_5, summed (Σ) and passed through an activation function h to produce the output]

Biological neuron and Perceptron
  [Diagram: dendrites, synapses, cell body and axon shown side by side with the inputs, weights, sum and activation function of the perceptron]

Brief History of Neural Networks [VUNO]

1943 – McCulloch & Pitts Model
● Early model of the artificial neuron
● Generates a binary output
● The weight values are fixed
  [Diagram: weighted inputs are summed and compared against a threshold]

1958 – Perceptron by Rosenblatt
● Perceptron as a machine for linear classification
● Main idea: learn the weights and consider a bias
● One weight per input
● Multiply the weights with the respective inputs and add the bias
● If the result is larger than the threshold return 1, otherwise 0
  [Diagram: weighted inputs plus bias b, summed and passed through the activation function]

Activation functions

First AI winter
● The exclusive or (XOR) cannot be solved by perceptrons
● Neural models cannot be applied to complex tasks

First AI winter
● But, can XOR be solved by neural networks?
● Multi-layer perceptrons (MLPs) can solve XOR (a minimal sketch follows below)
● A few years later Minsky built such an MLP
  [Diagram: inputs x_1, x_2, …, x_n, hidden layers and an output layer]
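To make the XOR point concrete, here is a minimal numpy sketch (not from the slides): a single perceptron with a hard threshold cannot realize XOR, but a two-layer network with hand-chosen weights can. The specific weights and thresholds below are one illustrative choice among many.

```python
import numpy as np

def perceptron(x, w, b):
    # Single artificial neuron: weighted sum of the inputs plus a bias,
    # followed by a hard threshold (step activation).
    return int(np.dot(w, x) + b > 0)

def two_layer_xor(x):
    # Hand-chosen weights (illustrative assumption): the hidden units compute
    # OR and AND, and the output unit computes OR AND NOT(AND), i.e. XOR.
    h1 = perceptron(x, np.array([1.0, 1.0]), -0.5)   # x1 OR x2
    h2 = perceptron(x, np.array([1.0, 1.0]), -1.5)   # x1 AND x2
    return perceptron(np.array([h1, h2]), np.array([1.0, -1.0]), -0.5)

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", two_layer_xor(np.array(x, dtype=float)))
# Prints 0, 1, 1, 0: the XOR truth table, which no single perceptron can produce.
```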
Multi-layer feed-forward Neural Network
● Main idea: densely connect artificial neurons to realize compositions of non-linear functions
● The information is propagated from the inputs to the outputs
● Directed Acyclic Graph (DAG)
● Tasks: classification, regression
● The input data are n-dimensional, usually feature vectors
  [Diagram: inputs x_1, x_2, …, x_n, two hidden layers and an output layer]

First AI winter
● How to train an MLP?
● Rosenblatt's algorithm is not applicable, as it expects to know the desired target
● For the hidden layers we cannot know the desired target
  [Diagram: output y_i = {0, 1}, desired target d_i = ?, hidden layers, inputs x_1 … x_n]

1986 – Backpropagation
● Backpropagation revitalized the field
● Learning MLPs for complicated functions became solvable
● An efficient algorithm which processes "large" training sets
● Allowed for complicated neural network architectures
● Today backpropagation is still at the core of neural network training
Werbos (1974). Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Ph.D. Thesis, Harvard University.
Rumelhart, Hinton, Williams (1986). Learning representations by back-propagating errors. Nature.

Backpropagation
● Learning is the process of modifying the weights θ_l of each layer in order to produce a network that performs some function
  [Diagram: θ_l = ? for the hidden and output layers]

Backpropagation
● Preliminary steps:
  – Collect/acquire a training set {X, C} (inputs and labels)
  – Define the model and randomly initialize the weights
● Given the training set, find the weights θ that minimize a loss L(y_i, h_θ(x_i)) summed over the training samples

Backpropagation
● True label vector: y_i
● Hypothesis: a_L = h_θ(x_i)
1) Forward propagation: sum inputs, produce activations, feed forward
2) Error estimation
3) Back-propagate the error signal and use it to update the weights

Backpropagation (training loop; see the sketch after these slides)
Randomly initialize the weights
While the error is too large:
  For each training sample (presented in random order):
    (1) Apply the inputs to the network; calculate the output of every neuron, from the input layer through the hidden layers to the output layer
    (2) Calculate the error at the outputs
    (3) Use the output error to compute error signals for the previous layers; use the error signals to compute the weight adjustments; apply the weight adjustments
  Periodically evaluate the network performance

Backpropagation
● Optimization with gradient descent
● The most important component is how to compute the gradient
● The backward computation of the network returns the gradient

Backpropagation
  [Diagram: forward and backward passes; the recursive rule propagates the gradient from the current layer back to the previous layer]
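The loop above can be written out concretely. The following is a minimal numpy sketch, not code from the lecture: a two-layer MLP with sigmoid units and a squared-error loss, trained by full-batch gradient descent on a toy dataset. The layer sizes, learning rate and data are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                         # toy inputs, n = 4 features
y = (X[:, 0] + X[:, 1] > 0).astype(float)[:, None]    # toy binary targets

# Randomly initialize the weights (theta_1, theta_2 in the slides' notation).
W1, b1 = rng.normal(scale=0.5, size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)
lr = 0.5

for epoch in range(200):
    # (1) Forward propagation: sum inputs, produce activations, feed forward.
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)

    # (2) Error estimation at the output (gradient of 0.5 * ||a2 - y||^2).
    err = a2 - y

    # (3) Back-propagate the error signal (recursive rule: current layer ->
    # previous layer) and use it to compute the weight adjustments.
    delta2 = err * a2 * (1 - a2)                 # output-layer error signal
    delta1 = (delta2 @ W2.T) * a1 * (1 - a1)     # hidden-layer error signal

    W2 -= lr * a1.T @ delta2 / len(X); b2 -= lr * delta2.mean(axis=0)
    W1 -= lr * X.T @ delta1 / len(X); b1 -= lr * delta1.mean(axis=0)

# Periodically evaluate the network performance (here just once at the end).
print("training accuracy:", ((a2 > 0.5) == y).mean())
```

The slides describe per-sample updates in random order; the sketch uses full-batch updates for brevity, but the forward / error / backward / update structure is the same.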
1990s – CNN and LSTM
● Important advances in the field:
  – Backpropagation
  – Recurrent Long Short-Term Memory networks (Schmidhuber 1997)
  – Convolutional Neural Networks: OCR solved before the 2000s (LeNet 1998)

Second AI winter
● NNs could not exploit many layers:
  – Overfitting
  – Vanishing gradient (NN training requires multiplying several small numbers → they become smaller and smaller)
  – Lack of processing power (no GPUs)
  – Lack of data (no large annotated datasets)

Second AI winter
● Kernel machines (e.g. SVMs) suddenly became very popular:
  – Similar accuracies to NNs on the same tasks
  – Much fewer heuristics and parameters
  – Nice proofs on generalization

The believers

2006 – Learning deep belief nets
● A clever way of initializing network weights:
  – Train each layer one by one with unsupervised training (using contrastive divergence)
  – Much better than random values
● Fine-tune the weights with a round of supervised learning, just as is normal for neural nets
● State-of-the-art performance on the MNIST dataset [Hinton et al.]

2012 – AlexNet
● Hinton's group implemented a CNN similar to LeNet [LeCun 1998], but...
● Trained on ImageNet with two GPUs
● With some technical improvements (ReLU, dropout, data augmentation)

Traditional ML vs deep models vs human performance

Why so powerful?
● It builds an improved feature space:
  – The first layer learns first-order features (e.g. edges, …)
  – Subsequent layers learn higher-order features (combinations of first-layer features, combinations of edges, etc.)
  – The final layer of transformed features is fed into the supervised layer(s)

Learning hierarchical representations
● Traditional framework: input image → handcrafted features → classification model (simple classifier) → "pear"
● Deep learning: input image → Layer 1 (θ_1) → Layer 2 (θ_2) → Layer 3 (θ_3) → simple classifier → "pear"

Why Deep Learning now?
● Three main factors:
  – Better hardware
  – Big data
  – Technical advances:
    – Layer-wise pretraining
    – Optimization (e.g. Adam, batch normalization)
    – Regularization (e.g. dropout)
    – …

GPUs [NVIDIA Blog]

Big Data
● Large, fully annotated datasets

Advances with Deep Learning
● Better:
  – Activation functions (ReLU)
  – Training schemes
  – Weight initialization
  – Ways to address overfitting (dropout)
  – Normalization between layers
  – Residual deep learning
  – …

Sigmoid activations
● Positive facts:
  – The output can be interpreted as a probability
  – The output is bounded in [0, 1]
● Negative facts:
  – Always multiplying by values < 1, the gradients can become small
  – The gradient at the tails is flat (≈ 0): almost no weight updates

Rectified Linear Units (ReLU)
● More efficient gradient propagation (the derivative is 0 or constant)
● More efficient computation (only comparison, addition and multiplication)
● Sparse activation: e.g. in a randomly initialized network only about 50% of the hidden units are activated (have a non-zero output)
● Lots of variations have been proposed recently

Losses
● The sum-squared-error (L2) loss seeks the maximum-likelihood hypothesis under the assumption that the training data can be modeled by Normally distributed noise added to the target function value.
● Fine for regression, but less natural for classification.
● For classification problems it is advantageous, and increasingly popular, to use the softmax activation function at the output layer together with the cross-entropy loss function.

Softmax and Cross Entropy
● Softmax: softens 1-of-n targets to mimic a probability vector for each output: softmax(z)_j = exp(z_j) / Σ_k exp(z_k)
● Cross-entropy loss: the most popular classification loss for classifiers that output probabilities: L = − Σ_j y_j log p_j
● A generalization of logistic regression to more than two outputs.
● These loss and activation functions help avoid gradient saturation.

Stochastic Gradient Descent (SGD)
● Use a mini-batch sampled from the dataset for the gradient estimate
● Sometimes helps to escape from local minima
● Noisy gradients act as regularization
● Also suitable for datasets that change over time
● The variance of the gradients increases when the batch size decreases
● It is not clear how many samples to use per batch

Learning rate
● Great impact on learning performance

Momentum
● Gradient updates with momentum (see the sketch below)
● Prevents the gradient direction from switching all the time
● Faster and more robust convergence

Adaptive Learning
● Popular schemes:
  – Nesterov Momentum: calculate the point you would go to if using normal momentum.
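As a small illustration of the update rules in the last few slides (not code from the lecture), the following numpy sketch applies the classical momentum update to a simple quadratic objective, with the Nesterov "look-ahead" variant shown in comments. The objective, step size and momentum coefficient are arbitrary illustrative choices.

```python
import numpy as np

def grad(theta):
    # Gradient of f(theta) = 0.5 * theta^T A theta for a fixed diagonal A
    # (an illustrative stand-in for the loss gradient returned by backprop).
    return np.array([10.0, 1.0]) * theta

theta = np.array([1.0, 1.0])
v = np.zeros_like(theta)
lr, mu = 0.02, 0.9            # learning rate and momentum coefficient

for step in range(100):
    # Classical momentum: accumulate a velocity, then move along it.
    v = mu * v - lr * grad(theta)
    theta = theta + v

    # Nesterov momentum (alternative): evaluate the gradient at the point you
    # would reach with the normal momentum step, then apply the correction.
    # v = mu * v - lr * grad(theta + mu * v)
    # theta = theta + v

print("theta after 100 steps:", theta)   # approaches the minimum at (0, 0)
```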
