Deep Learning

Kairit Sirts
Lecture in TUT, 19.12.2016

Outline
• What can be done with deep learning?
• Deep learning demystified
• How can you get started with deep learning?

Why deep learning?
[Figure: deep learning compared with gradient boosting, random forest, and a linear model]
http://www.infoworld.com/article/3003315/big-data/deep-learning-a-brief-guide-for-practical-problem-solvers.html

What can be done with deep learning?

Handwritten digit recognition
• MNIST benchmark dataset
• The best reported error rate is 0.21%

Street view number recognition
• Obtained from house numbers in Google Street View images
• Best error rate is 1.69%

Image classification
• 10 objects
• 6000 labeled instances for each object
• Best accuracy so far: 96.53%

Image classification
• 20 superclasses
• 100 fine-grained classes
• 600 labeled images per class
• Best classification accuracy: 75.72%

Detecting doodles
• https://quickdraw.withgoogle.com
• There are other simple and fun AI experiments launched by Google: https://aiexperiments.withgoogle.com

Image captioning

Image captioning – not so great results

Automatic colorization of images
http://richzhang.github.io/colorization/resources/images/teaser3.jpg

Automatic colorization of images – failed

DeepDream
https://deepdreamgenerator.com

Word embeddings
http://metaoptimize.s3.amazonaws.com/cw-embeddings-ACL2010/embeddings-mostcommon.EMBEDDING_SIZE=50.png

Word embeddings
[Figure: clusters of months, weekdays, and numbers in the embedding space]

Word embeddings
• $v(\text{man}) - v(\text{woman}) \approx v(\text{king}) - v(\text{queen})$
• $v(\text{walking}) - v(\text{walked}) \approx v(\text{swimming}) - v(\text{swam})$

Automatic text generation – pseudo Shakespeare
http://karpathy.github.io/2015/05/21/rnn-effectiveness

Machine translation
• Google Translate app

Learning to play Atari Arcade games
https://www.youtube.com/watch?v=cjpEIotvwFY

AlphaGo
https://www.youtube.com/watch?v=PQCrX1sQSzY

Other tasks tackled with deep neural networks
• Speech recognition
• Various tasks in robotics
• Log analysis/risk detection
• Recommendation systems
• Motion detection from videos
• Business and economics analytics
• Etc.

Deep learning demystified
How does deep learning work?

• Biological neuron
• Artificial neuron
http://www.theprojectspot.com/tutorial-post/introduction-to-artificial-neural-networks-part-1/7

• Biological neural network
• Artificial neural network
https://www.eeweb.com/blog/rob_riemen/deep-machine-learning-and-the-google-brain
http://www.theprojectspot.com/tutorial-post/introduction-to-artificial-neural-networks-part-1/7

What happens inside a neuron?
• Weighted sum of the inputs: $z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n = \sum_{i=1}^{n} w_i x_i$
• Output: $h = f(z)$

Activation function
• Threshold: $f(z) = 1$ if $z \ge th$, $f(z) = 0$ if $z < th$
• Sigmoid: $f(z) = \frac{1}{1 + e^{-z}}$
• Tanh: $f(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$
• ReLU: $f(z) = \max(0, z)$
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/neural_networks.html

Single neuron logic gates
• Threshold activation function
https://blog.abhranil.net/2015/03/03/training-neural-networks-with-genetic-algorithms/

XOR gate
• Cannot be done with a single neuron
• A hidden layer is necessary
• The hidden layer computes OR and NOT AND of the inputs; the output neuron computes AND of those two hidden outputs

x1  x2   OR                         NOT AND                           AND
0   0    f(0·1 + 0·1 > 0.5) = 0     f(0·(−1) + 0·(−1) > −1.5) = 1     f(0·1 + 1·1 > 1.5) = 0
0   1    f(0·1 + 1·1 > 0.5) = 1     f(0·(−1) + 1·(−1) > −1.5) = 1     f(1·1 + 1·1 > 1.5) = 1
1   0    f(1·1 + 0·1 > 0.5) = 1     f(1·(−1) + 0·(−1) > −1.5) = 1     f(1·1 + 1·1 > 1.5) = 1
1   1    f(1·1 + 1·1 > 0.5) = 1     f(1·(−1) + 1·(−1) > −1.5) = 0     f(1·1 + 0·1 > 1.5) = 0
https://blog.abhranil.net/2015/03/03/training-neural-networks-with-genetic-algorithms/

How to assign weights?
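For a network as small as the XOR gate, the weights can still be assigned by hand. The sketch below is one possible reading of the XOR table: it wires three threshold neurons together using the weights and thresholds listed there. The function names `threshold_neuron` and `xor_network` are hypothetical and chosen only for illustration.

```python
def threshold_neuron(inputs, weights, th):
    """Return 1 when the weighted sum of the inputs exceeds the threshold th, else 0."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return 1 if z > th else 0


def xor_network(x1, x2):
    """Two hidden threshold neurons (OR, NOT AND) feeding one output neuron (AND)."""
    h_or = threshold_neuron((x1, x2), (1, 1), 0.5)        # OR gate
    h_nand = threshold_neuron((x1, x2), (-1, -1), -1.5)   # NOT AND gate
    return threshold_neuron((h_or, h_nand), (1, 1), 1.5)  # AND of the hidden outputs


# Evaluate the network on all four input combinations.
xor_outputs = [xor_network(x1, x2) for x1 in (0, 1) for x2 in (0, 1)]
for (x1, x2), out in zip([(0, 0), (0, 1), (1, 0), (1, 1)], xor_outputs):
    print(x1, x2, out)
```

Hand-picking weights like this works for three neurons but does not scale, which is what motivates learning the weights automatically.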
$8 \times 9 + 9 \times 9 + 9 \times 9 + 9 \times 4 = 270$ weights
http://neuralnetworksanddeeplearning.com/

Backpropagation
• Standard and efficient method for training neural networks
• The general idea:
  • Compute the error with a forward pass
  • Propagate the error back to change the weights such that the error becomes smaller
• ERROR → ERROR′, with ERROR′ < ERROR

Diversion to calculus - derivative
• $y' = f'(x)$
• The derivative is the slope of the tangent line
• It gives the rate of change of the function at the point $x$

Derivatives
• When $f'(x) = 0$, $x$ is a local or global maximum or minimum, or a saddle point
• When $f'(x) > 0$, the function is increasing
• When $f'(x) < 0$, the function is decreasing

Gradients
• Generalization of derivatives to multivariate functions
• The gradient is a vector pointing in the direction of steepest ascent
• $\nabla f(x, y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$
• $\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}$ are partial derivatives: take the derivative wrt one variable while treating all others as constant

Gradients and backpropagation
• Backpropagation is used to compute the gradients with respect to all parameters in a neural network.
• The gradients are then used in the general method of gradient descent for minimizing functions.
• We want to minimize the cost function that measures the error made by the neural network.
• To do that, we need to move in the direction of steepest descent, which is the direction opposite to the gradient.

Gradient descent
• An iterative algorithm
• Start with initial parameter values $\theta_0$
• Update parameters iteratively until convergence: $\theta_{t+1} := \theta_t - \eta \nabla J(\theta_t)$
• $\eta$ is the learning rate; it controls the step size

Deep learning demystified
How does backpropagation work?
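Before the worked example, the gradient descent update described earlier can be sketched on a toy problem. Everything specific here is an assumption made for illustration: the quadratic `cost` function, its minimum at (3, -1), the learning rate of 0.1, and the stopping tolerance. None of these come from the slides.

```python
import math


def cost(theta):
    """Hypothetical cost function with its minimum at (3, -1)."""
    x, y = theta
    return (x - 3.0) ** 2 + (y + 1.0) ** 2


def gradient(theta):
    """Vector of partial derivatives of the cost wrt x and y."""
    x, y = theta
    return (2.0 * (x - 3.0), 2.0 * (y + 1.0))


def gradient_descent(theta0, eta=0.1, tol=1e-10, max_steps=10_000):
    """Repeat theta := theta - eta * gradient(theta) until the gradient is tiny."""
    theta = tuple(theta0)
    for _ in range(max_steps):
        g = gradient(theta)
        if math.hypot(*g) < tol:  # convergence check on the gradient norm
            break
        theta = tuple(t - eta * gi for t, gi in zip(theta, g))
    return theta


theta_min = gradient_descent((0.0, 0.0))
print(theta_min, cost(theta_min))
```

In a neural network the parameters are the weights and biases, and the gradient is not written by hand as it is here but computed with backpropagation.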
Backpropagation explained
• Example from: https://mattmazur.com/2015/03/17/
• 2 inputs
• 1 hidden layer with 2 neurons
• Bias terms in both the hidden and output layer
• 2 outputs

Initial configuration
• Training values
• Initial weights: $w_1, \dots, w_8$
• Initial biases: $b_1, b_2$

Forward pass – first hidden unit

Forward pass – second hidden unit

Forward pass – first output unit

Forward pass – second output unit

Forward pass – error of the first output

Forward pass – output error

Backwards pass
• Consider $w_5$
• How much does a change in $w_5$ affect the total error?
• Apply the chain rule:

Chain rule
• Formula for computing the derivative of the composition of two or more functions
• $F(x) \equiv f(g(x)) \equiv (f \circ g)(x)$ is the composition of functions $f$ and $g$
• $F'(x) = f'(g(x)) \, g'(x)$
• Example: $F(x) = e^{3x}$, with $g(x) = 3x$ and $f(g(x)) = e^{g(x)} = e^{3x}$
• $F'(x) = f'(g(x)) \, g'(x) = e^{g(x)} (3x)' = e^{3x} \cdot 3 = 3e^{3x}$

How much does error change wrt the output?

How much does output change wrt its net input?

Derivative of the sigmoid function
• $\sigma(z) = \frac{1}{1 + e^{-z}}$
• $\sigma'(z) = \sigma(z)(1 - \sigma(z))$

How much does net input change wrt $w_5$?
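The two derivative formulas used in this part of the walk-through, the chain rule example and the sigmoid derivative, can be checked numerically. The sketch below compares each analytic formula against a central finite difference. The helper `numeric_derivative` and the evaluation points 0.7 and 0.5 are arbitrary choices for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def numeric_derivative(fn, x, eps=1e-6):
    """Central finite-difference approximation of fn'(x)."""
    return (fn(x + eps) - fn(x - eps)) / (2.0 * eps)


# Sigmoid derivative: sigma'(z) = sigma(z) * (1 - sigma(z))
z = 0.7
analytic_sigmoid = sigmoid(z) * (1.0 - sigmoid(z))
numeric_sigmoid = numeric_derivative(sigmoid, z)

# Chain rule example: F(x) = e^(3x), so F'(x) = 3 * e^(3x)
x = 0.5
analytic_chain = 3.0 * math.exp(3.0 * x)
numeric_chain = numeric_derivative(lambda t: math.exp(3.0 * t), x)

print(analytic_sigmoid, numeric_sigmoid)
print(analytic_chain, numeric_chain)
```

The same kind of finite-difference comparison is a common way to sanity-check a hand-written backpropagation implementation.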
Putting it all together

This is known as the delta rule
• The delta rule is the gradient descent rule for updating the weights of the inputs to neurons in a single-layer neural network

Apply delta rule to outer layer weights

Update the weights with gradient descent
• Set learning rate $\eta = 0.5$
• $w_i^{+} := w_i - \eta \frac{\partial E_{total}}{\partial w_i}$

Backpropagation to hidden layer
• Continue the backwards pass to calculate new values for $w_1$, $w_2$, $w_3$ and $w_4$

BP through hidden layer
• $out_{h1}$ affects both $o_1$ and $o_2$, so its derivative needs to take both into account:

BP through hidden layer
• Consider one of those:
• The first term can be calculated using values computed before:
• The second term is just $w_5$

BP through hidden layer
• Plug the values in:
• Compute the same value for $o_2$:
• Compute the total:

BP through hidden layer
• Next we need $\frac{\partial out_{h1}}{\partial net_{h1}}$ and $\frac{\partial net_{h1}}{\partial w}$ for each weight $w$
• Compute the partial derivative wrt a weight

BP through hidden layer
• Putting it together
• We can now update $w_1$

BP through hidden layer
• Compute the partial derivatives in the same way for $w_2$, $w_3$ and $w_4$
• Update $w_2$, $w_3$ and $w_4$

After first update with backpropagation

Did the error decrease?
• Old error was: 0.298371109
• Improvement: 0.007343335
• After 10000 updates the error will be ca 0.000035085
• The generated outputs will be 0.015912196 for the 0.01 target and 0.984065734 for the 0.99 target

In conclusion
• Neural networks consist of artificial neurons organized into layers and connected to each other with learnable weights.
• Backpropagation with gradient descent is the standard method for training neural networks.
• Backpropagation can be used to compute the gradients of a neural network, regardless of the depth of the network.
• Of course, there are other important tricks and tips, but this is the basis for understanding neural networks and deep learning.
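The whole walk-through (forward pass, backward pass through the output and hidden layers, weight update) can be sketched in a few lines of Python. The concrete inputs, targets, initial weights and biases below are not given in the slide text; they are assumed to be the values from the blog post the example is taken from. The sketch also assumes a squared-error cost and, as a simplification, updates only the weights and leaves the biases fixed.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# ASSUMED values (taken to match the cited example, not stated in the slides).
x = [0.05, 0.10]                         # inputs
target = [0.01, 0.99]                    # training targets
w_hidden = [[0.15, 0.20], [0.25, 0.30]]  # [w1, w2] into h1, [w3, w4] into h2
w_output = [[0.40, 0.45], [0.50, 0.55]]  # [w5, w6] into o1, [w7, w8] into o2
b_hidden, b_output = 0.35, 0.60          # b1, b2
eta = 0.5                                # learning rate


def forward(w_hidden, w_output):
    """Forward pass: hidden outputs, network outputs, total squared error."""
    out_h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b_hidden)
             for row in w_hidden]
    out_o = [sigmoid(sum(w * h for w, h in zip(row, out_h)) + b_output)
             for row in w_output]
    error = sum(0.5 * (t - o) ** 2 for t, o in zip(target, out_o))
    return out_h, out_o, error


def backprop_step(w_hidden, w_output):
    """One gradient descent update of all eight weights (biases stay fixed)."""
    out_h, out_o, _ = forward(w_hidden, w_output)
    # Output layer: dE/dnet_o = dE/dout_o * dout_o/dnet_o
    delta_o = [(o - t) * o * (1.0 - o) for o, t in zip(out_o, target)]
    # Hidden layer: sum the contributions from both outputs (old output weights).
    delta_h = [sum(delta_o[k] * w_output[k][j] for k in range(2))
               * out_h[j] * (1.0 - out_h[j]) for j in range(2)]
    new_w_output = [[w_output[k][j] - eta * delta_o[k] * out_h[j]
                     for j in range(2)] for k in range(2)]
    new_w_hidden = [[w_hidden[j][i] - eta * delta_h[j] * x[i]
                     for i in range(2)] for j in range(2)]
    return new_w_hidden, new_w_output


_, _, error_before = forward(w_hidden, w_output)
w_hidden, w_output = backprop_step(w_hidden, w_output)
_, _, error_after = forward(w_hidden, w_output)
print(error_before, error_after)
```

Under these assumed starting values the error before the update agrees with the old error quoted above, and calling `backprop_step` in a loop corresponds to the repeated updates mentioned there.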
Common neural network architectures

Feed-forward network
• Simplest type of neural network
• Connections between units do not form cycles
• Information always moves in one direction
• It never goes backwards
https://upload.wikimedia.org/wikipedia/en/5/54/Feed_forward_neural_net.gif

Recurrent neural network
• Connections between units form cycles
• They possess internal memory: they "remember" past inputs
• Suitable for modeling sequential/temporal data, such as text and language data

Convolutional neural networks
• Convolutional layers have neurons arranged in 3 dimensions
• Especially suitable for processing image data
http://parse.ele.tue.nl/education/cluster2

Autoencoders
• Output layer attempts to reconstruct the input
• Used for unsupervised feature learning
• The hidden layer typically has fewer neurons than the input layer, thus performing data compression

Getting started with neural networks

Courses and tutorials
• https://www.coursera.org/learn/machine-learning
  • Introductory course on machine learning, provides necessary background
• https://www.coursera.org/learn/neural-networks
  • Course on neural networks, assumes knowledge about machine learning
•
