Practical Deep Learning
December 12-13, 2019
CSC – IT Center for Science Ltd., Espoo
Markus Koskela, Mats Sjöberg

All original material (C) 2019 by CSC – IT Center for Science Ltd.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, http://creativecommons.org/licenses/by-sa/4.0
All other material copyrighted by their respective authors.

Course schedule

Thursday
9.00-10.30   Lecture 1: Introduction to deep learning
10.30-10.45  Coffee break
10.45-11.00  Exercise 1: Introduction to Notebooks, Keras
11.00-11.30  Lecture 2: Multi-layer perceptron networks
11.30-12.00  Exercise 2: Classification with MLPs
12.00-13.00  Lunch break
13.00-14.00  Lecture 3: Images and convolutional neural networks
14.00-14.30  Exercise 3: Image classification with CNNs
14.30-14.45  Coffee break
14.45-15.30  Lecture 4: Text data, embeddings, 1D CNN, recurrent neural networks, attention
15.30-16.00  Exercise 4: Text sentiment classification with CNNs and RNNs

Friday
9.00-9.45    Lecture 5: Deep learning frameworks
9.45-10.15   Lecture 6: GPUs and batch jobs
10.15-10.30  Coffee break
10.30-12.00  Exercise 5: Image classification: dogs vs. cats; traffic signs
12.00-13.00  Lunch break
13.00-14.00  Exercise 6: Text categorization: 20 newsgroups
14.00-14.45  Lecture 7: Cloud, GPU utilization, using multiple GPUs
14.45-15.00  Coffee break
15.00-16.00  Exercise 7: Using multiple GPUs

Up-to-date agenda and lecture slides can be found at https://tinyurl.com/r3fd3st
Exercise materials are on GitHub: https://github.com/csc-training/intro-to-dl/
Wireless accounts for the CSC-guest network are behind the badges. Alternatively, use the eduroam network with your university accounts, or the LAN cables on the tables. Accounts for the Puhti-AI cluster will be delivered separately.

Lecture 1: Introduction to deep learning

About this course
• Introduction to deep learning
  • basics of ML assumed
  • mostly high-school math
  • much of the theory and many details skipped
• 1st day: lectures + small-scale exercises using notebooks.csc.fi
• 2nd day: experiments using GPUs at Puhti-AI
• Slides at: https://tinyurl.com/r3fd3st
• Other materials (and link to Gitter) at GitHub: https://github.com/csc-training/intro-to-dl
• Focus on text and image classification; no fancy stuff
• Using Python, TensorFlow 2 / Keras, and PyTorch

Further resources
• This course is largely “inspired by” “Deep Learning with Python” by François Chollet
• Recommended textbook: “Deep Learning” by Goodfellow, Bengio, and Courville
• Lots of further material available online, e.g.:
  http://cs231n.stanford.edu/
  http://course.fast.ai/
  https://developers.google.com/machine-learning/crash-course/
  www.nvidia.com/dlilabs
  http://introtodeeplearning.com/
  https://github.com/oxford-cs-deepnlp-2017/lectures
  https://jalammar.github.io/
• Academic courses

What is artificial intelligence?

Artificial intelligence is the ability of a computer to perform tasks commonly associated with intelligent beings.

What is machine learning?

Machine learning is the study of algorithms that learn from examples and experience, instead of relying on hard-coded rules, and make predictions on new data.

What is deep learning?

Deep learning is a subfield of machine learning focusing on learning data representations as successive layers of increasingly meaningful representations.
“Traditional” machine learning:
  handcrafted features → learned classifier → “cat”

Deep, “end-to-end” learning:
  learned low-level features → learned mid-level features → learned high-level features → learned classifier → “cat”

Image from https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/

Demotivational slide

“All of these AI systems we see, none of them is ‘real’ AI” – Josh Tenenbaum

“Neural networks are … neither neural nor even networks.” – François Chollet, author of Keras

From: Wang & Raj: On the Origin of Deep Learning (2017)

Main types of machine learning

• Supervised learning (e.g., classifying images as “cat” or “dog”)
• Unsupervised learning
• Self-supervised learning
• Reinforcement learning

Image from https://arxiv.org/abs/1710.10196
By Chire [CC BY-SA 3.0], from Wikimedia Commons
Animation from https://yanpanlau.github.io/2016/07/10/FlappyBird-Keras.html

Fundamentals of machine learning

Data

• Humans learn by observation and unsupervised learning
  • model of the world / common sense reasoning
• Machine learning needs lots of (labeled) data to compensate

Image from: https://arxiv.org/abs/1707.08945

Data

• Tensors: generalization of matrices to n dimensions (also called rank, order, or degree)
  • 1D tensor: vector
  • 2D tensor: matrix
  • 3D, 4D, 5D tensors
  • numpy.ndarray(shape, dtype)
• Training – validation – test split (+ adversarial test)
• Minibatches
  • small sets of input data used at a time
  • usually processed independently
(numpy sketches of tensors, splits, and minibatches appear at the end of this section)

Optimization

• Mathematical optimization: “the selection of a best element (with regard to some criterion) from some set of available alternatives” (Wikipedia)
• Main types: finite-step, iterative, heuristic
• Learning as an optimization problem
  • cost function: loss + regularization
  • parameters θ and hyperparameters

http://playground.tensorflow.org/

Model – learning/training – inference

By Rebecca Wilson (originally posted to Flickr as Vicariously) [CC BY 2.0], via Wikimedia Commons

Gradient descent

• Derivative and minima/maxima of functions
• Gradient: the derivative of a multivariable function
• Gradient descent (a worked toy example is sketched at the end of this section)
• (Mini-batch) stochastic gradient descent (and its variants)

Image from: Li et al. “Visualizing the Loss Landscape of Neural Nets”, arXiv:1712.09913
Image from: https://towardsdatascience.com/gradient-descent-algorithm-and-its-variants-10f652806a3

Over- and underfitting, generalization, regularization

• Models with lots of parameters can easily overfit to the training data
• Generalization: the quality of an ML model is measured on new, unseen samples
• Regularization: any method* to prevent overfitting
  • simplicity, sparsity, dropout, early stopping (see the sketch at the end of this section)
  • *) other than adding more data

By Chabacano [GFDL or CC BY-SA 4.0], from Wikimedia Commons

Deep learning

Anatomy of a deep neural network

• Layers
• Input data and targets
• Loss function
• Optimizer
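To make the tensor terminology above concrete, here is a minimal numpy sketch; the shapes and dtype are illustrative choices, not anything prescribed by the course:

```python
import numpy as np

# 1D tensor (vector): 4 elements
vector = np.zeros(shape=(4,), dtype=np.float32)

# 2D tensor (matrix): 3 rows, 4 columns
matrix = np.zeros(shape=(3, 4), dtype=np.float32)

# 4D tensor, a typical shape for a minibatch of images:
# (batch size, height, width, color channels)
images = np.zeros(shape=(32, 64, 64, 3), dtype=np.float32)

print(images.ndim)   # 4
print(images.shape)  # (32, 64, 64, 3)
print(images.dtype)  # float32
```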
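The training – validation – test split and minibatches can be sketched the same way; the 60/20/20 proportions and batch size of 32 below are arbitrary example values:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 20))     # 1000 samples, 20 features
y = rng.integers(0, 2, size=1000)   # binary labels

# shuffle the indices, then split 60/20/20 into train/validation/test
idx = rng.permutation(len(X))
train, val, test = np.split(idx, [600, 800])
X_train, y_train = X[train], y[train]
X_val, y_val = X[val], y[val]
X_test, y_test = X[test], y[test]

# iterate over minibatches of 32 samples
batch_size = 32
for start in range(0, len(X_train), batch_size):
    xb = X_train[start:start + batch_size]
    yb = y_train[start:start + batch_size]
    # ... one optimization step on (xb, yb) would go here
```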
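And a toy illustration of mini-batch gradient descent: fitting a line y ≈ wx + b by mean squared error, with made-up data and an illustrative learning rate:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=200)   # noisy line

w, b = 0.0, 0.0   # random-ish initial parameters
lr = 0.1          # learning rate (a hyperparameter)
for step in range(200):
    batch = rng.choice(200, size=32, replace=False)   # minibatch
    xb, yb = x[batch], y[batch]
    err = (w * xb + b) - yb          # prediction error on the batch
    grad_w = 2 * np.mean(err * xb)   # d(MSE)/dw
    grad_b = 2 * np.mean(err)        # d(MSE)/db
    w -= lr * grad_w                 # step against the gradient
    b -= lr * grad_b

print(w, b)   # should approach 3.0 and 1.0
```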
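Of the regularization methods listed above, dropout and early stopping are available directly in tf.keras; a minimal sketch on random stand-in data, with illustrative layer sizes, dropout rate, and patience:

```python
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(20,)),
    keras.layers.Dropout(0.5),   # dropout: randomly zero 50% of activations in training
    keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='rmsprop', loss='binary_crossentropy')

# early stopping: halt training when validation loss stops improving
stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=3,
                                     restore_best_weights=True)

X = np.random.rand(500, 20).astype('float32')   # random stand-in data
y = np.random.randint(0, 2, size=500)
model.fit(X, y, validation_split=0.2, epochs=20, batch_size=32,
          callbacks=[stop], verbose=0)
```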
Layers

• Data processing modules
• Many different kinds exist
  • densely connected
  • convolutional
  • recurrent
  • pooling, flattening, merging, normalization, etc.
• Input: one or more tensors; output: one or more tensors
• Usually have a state, encoded as weights
  • learned, initially random
• When combined, form a network or a model

Input data and targets

• The network maps the input data X to predictions Y′
• During training, the predictions Y′ are compared to the true targets Y using the loss function

Loss function

• The quantity to be minimized (optimized) during training
  • the only thing the network cares about
  • there might also be other metrics you care about
• Common tasks have “standard” loss functions:
  • mean squared error for regression
  • binary cross-entropy for two-class classification
  • categorical cross-entropy for multi-class classification
  • etc.
• https://lossfunctions.tumblr.com/

Optimizer

• How to update the weights based on the loss function
• Learning rate (+ scheduling)
• Stochastic gradient descent, momentum, and their variants
  • RMSProp is usually a good first choice
  • more info: http://ruder.io/optimizing-gradient-descent/

Animation from: https://imgur.com/s25RsOr

Deep learning frameworks

• High-level APIs: Lasagne, Keras, TF Estimator, torch.nn, Gluon
• Frameworks: Theano, TensorFlow, CNTK, PyTorch, MXNet, Caffe
• Backend libraries: CUDA, cuDNN (GPUs); MKL, MKL-DNN (CPUs)

• Actually tools for defining static or dynamic general-purpose computational graphs
• Automatic differentiation
• Seamless CPU / GPU usage
  • multi-GPU, distributed
• Python/numpy or R interfaces
  • instead of C, C++, or CUDA
• Open source

• Keras is a high-level neural networks API
  • we will use TensorFlow as the compute backend
  • included in TensorFlow 2 as tf.keras
  • https://keras.io/ , https://www.tensorflow.org/guide/keras
• PyTorch is:
  • a GPU-based tensor library
  • an efficient library for dynamic neural networks
  • https://pytorch.org/
(minimal Keras and PyTorch sketches appear at the end of this section)

Lecture 2: Multi-layer perceptron networks

Neuron as a linear classifier

By User:ZackWeinberg, based on PNG version by User:Cyc [CC BY-SA 3.0], via Wikimedia Commons

A non-linear classifier?

Non-linearity = activation function

• A smooth (differentiable) nonlinear function that is applied after the inner product with the weights (numpy sketch at the end of this section)

Neural network

● An (artificial) neural network is a collection of neurons
● Usually organized in layers
  ○ input layer
  ○ one or more hidden layers (sizes, activation functions are hyperparameters)
  ○ output layer

Output activation

● Usually a non-linear activation after each layer
● Typically ReLU between the layers
● At the output layer we need to consider the task, i.e., what kinds of outputs we want, e.g.,
  ○ Multi-label classification

Backpropagation

• Neural networks are trained with gradient descent, starting from a random weight initialization
• Algorithm for computing the gradients for a neural network: an error (loss) when comparing
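The single neuron of Lecture 2 computes an inner product of the input with its weights, adds a bias, and applies an activation function. A minimal numpy sketch with arbitrary example weights:

```python
import numpy as np

def sigmoid(z):
    # smooth (differentiable) nonlinearity
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # typical choice between hidden layers
    return np.maximum(0.0, z)

x = np.array([0.5, -1.2, 3.0])   # input vector
w = np.array([0.4, 0.1, -0.2])   # weights (learned in practice; arbitrary here)
b = 0.1                          # bias

z = np.dot(w, x) + b             # inner product with the weights, plus bias
print(sigmoid(z))                # squashed into (0, 1): the neuron's output
print(relu(z))                   # ReLU alternative
```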
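Tying together the anatomy of a deep neural network above (layers, input data and targets, loss function, optimizer), a minimal tf.keras sketch; the 784-dimensional inputs, layer sizes, and random stand-in data are assumptions for illustration only:

```python
import numpy as np
from tensorflow import keras

# layers, combined into a model
model = keras.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    keras.layers.Dense(10, activation='softmax'),
])

# optimizer + loss function (+ any extra metrics we care about)
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# input data X and integer targets y (random stand-ins here)
X = np.random.rand(256, 784).astype('float32')
y = np.random.randint(0, 10, size=256)

# training: minibatches of 32, with a validation split
model.fit(X, y, batch_size=32, epochs=2, validation_split=0.2)
```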
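The same ingredients in PyTorch, where the computational graph is built dynamically as the code runs; sizes are again illustrative:

```python
import torch
import torch.nn as nn

# a GPU-based tensor library: tensors move between devices seamlessly
device = 'cuda' if torch.cuda.is_available() else 'cpu'

model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
).to(device)

loss_fn = nn.CrossEntropyLoss()   # applies softmax internally
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)

X = torch.randn(32, 784, device=device)    # one random minibatch
y = torch.randint(0, 10, (32,), device=device)

loss = loss_fn(model(X), y)   # forward pass (graph built on the fly)
optimizer.zero_grad()
loss.backward()               # automatic differentiation
optimizer.step()              # update the weights
```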