Practical Deep Learning

December 12-13, 2019
CSC – IT Center for Science Ltd., Espoo
Markus Koskela, Mats Sjöberg

All original material (C) 2019 by CSC – IT Center for Science Ltd.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 Unported License, http://creativecommons.org/licenses/by-sa/4.0
All other material copyrighted by their respective authors.

Course schedule

Thursday
9.00-10.30   Lecture 1: Introduction to deep learning
10.30-10.45  Coffee break
10.45-11.00  Exercise 1: Introduction to Notebooks, Keras
11.00-11.30  Lecture 2: Multi-layer perceptron networks
11.30-12.00  Exercise 2: Classification with MLPs
12.00-13.00  Lunch break
13.00-14.00  Lecture 3: Images and convolutional neural networks
14.00-14.30  Exercise 3: Image classification with CNNs
14.30-14.45  Coffee break
14.45-15.30  Lecture 4: Text data, embeddings, 1D CNN, recurrent neural networks, attention
15.30-16.00  Exercise 4: Text sentiment classification with CNNs and RNNs

Friday
9.00-9.45    Lecture 5: Deep learning frameworks
9.45-10.15   Lecture 6: GPUs and batch jobs
10.15-10.30  Coffee break
10.30-12.00  Exercise 5: Image classification: dogs vs. cats; traffic signs
12.00-13.00  Lunch break
13.00-14.00  Exercise 6: Text categorization: 20 newsgroups
14.00-14.45  Lecture 7: Cloud, GPU utilization, using multiple GPUs
14.45-15.00  Coffee break
15.00-16.00  Exercise 7: Using multiple GPUs

Up-to-date agenda and lecture slides can be found at https://tinyurl.com/r3fd3st
Exercise materials are at GitHub: https://github.com/csc-training/intro-to-dl/

Wireless accounts for the CSC-guest network are behind the badges. Alternatively, use the eduroam network with your university accounts or the LAN cables on the tables.
Accounts for the Puhti-AI cluster are delivered separately.
Lecture 1: Introduction to deep learning

About this course
• Introduction to deep learning
  • basics of ML assumed
  • mostly high-school math
  • much of theory, many details skipped
• 1st day: lectures + small-scale exercises using notebooks.csc.fi
• 2nd day: experiments using GPUs at Puhti-AI
• Slides at: https://tinyurl.com/r3fd3st
• Other materials (and link to Gitter) at GitHub: https://github.com/csc-training/intro-to-dl
• Focus on text and image classification, no fancy stuff
• Using Python, TensorFlow 2 / Keras, and PyTorch

Further resources
• This course is largely “inspired by”: “Deep Learning with Python” by François Chollet
• Recommended textbook: “Deep learning” by Goodfellow, Bengio, Courville
• Lots of further material available online, e.g.:
  • http://cs231n.stanford.edu/
  • http://course.fast.ai/
  • https://developers.google.com/machine-learning/crash-course/
  • www.nvidia.com/dlilabs
  • http://introtodeeplearning.com/
  • https://github.com/oxford-cs-deepnlp-2017/lectures
  • https://jalammar.github.io/
• Academic courses

What is artificial intelligence?
Artificial intelligence is the ability of a computer to perform tasks commonly associated with intelligent beings.

What is machine learning?
Machine learning is the study of algorithms that learn from examples and experience instead of relying on hard-coded rules and make predictions on new data.

What is deep learning?
Deep learning is a subfield of machine learning focusing on learning data representations as successive layers of increasingly meaningful representations.
“Traditional” machine learning: handcrafted features → learned classifier → cat
Deep, “end-to-end” learning: learned low-level features → learned mid-level features → learned high-level features → learned classifier → cat
Image from https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/

Demotivational slide
“All of these AI systems we see, none of them is ‘real’ AI” – Josh Tenenbaum
“Neural networks are … neither neural nor even networks.” – François Chollet, author of Keras
From: Wang & Raj: On the Origin of Deep Learning (2017)

Main types of machine learning
• Supervised learning (figure labels: cat, dog)
• Unsupervised learning
• Self-supervised learning
• Reinforcement learning
Image from https://arxiv.org/abs/1710.10196
By Chire [CC BY-SA 3.0], from Wikimedia Commons
Animation from https://yanpanlau.github.io/2016/07/10/FlappyBird-Keras.html

Fundamentals of machine learning

Data
• Humans learn by observation and unsupervised learning
  • model of the world / common sense reasoning
• Machine learning needs lots of (labeled) data to compensate

Data
• Tensors: generalization of matrices to n dimensions (or rank, order, degree)
  • 1D tensor: vector
  • 2D tensor: matrix
  • 3D, 4D, 5D tensors
  • numpy.ndarray(shape, dtype)
• Training – validation – test split (+ adversarial test)
• Minibatches
  • small sets of input data used at a time
  • usually processed independently
Image from: https://arxiv.org/abs/1707.08945

Model – learning/training – inference
• θ
• parameters and hyperparameters
• http://playground.tensorflow.org/
By Rebecca Wilson (originally posted to Flickr as Vicariously) [CC BY 2.0], via Wikimedia Commons

Optimization
• Mathematical optimization: “the selection of a best element (with regard to some criterion) from some set of available alternatives” (Wikipedia)
• Main types: finite-step, iterative, heuristic
• Learning as an optimization problem
  • cost function: loss + regularization

Optimization
• Derivative and minima/maxima of functions
• Gradient: the derivative of a multivariable function
Image from: Li et al. “Visualizing the Loss Landscape of Neural Nets”, arXiv:1712.09913

Gradient descent
• Gradient descent
• (Mini-batch) stochastic gradient descent (and its variants)
Image from: https://towardsdatascience.com/gradient-descent-algorithm-and-its-variants-10f652806a3

Over- and underfitting, generalization, regularization
• Models with lots of parameters can easily overfit to training data
• Generalization: the quality of an ML model is measured on new, unseen samples
• Regularization: any method* to prevent overfitting
  • simplicity, sparsity, dropout, early stopping
  • *) other than adding more data
By Chabacano [GFDL or CC BY-SA 4.0], from Wikimedia Commons

Deep learning

Anatomy of a deep neural network
• Layers
• Input data and targets
• Loss function
• Optimizer

Layers
• Data processing modules
• Many different kinds exist
  • densely connected
  • convolutional
  • recurrent
  • pooling, flattening, merging, normalization, etc.
• Input: one or more tensors; output: one or more tensors
• Usually have a state, encoded as weights
  • learned, initially random
• When combined, form a network or a model

Input data and targets
• The network maps the input data X to predictions Y′
• During training, the predictions Y′ are compared to true targets Y using the loss function
Figure labels: cat, dog

Loss function
• The quantity to be minimized (optimized) during training
  • the only thing the network cares about
  • there might also be other metrics you care about
• Common tasks have “standard” loss functions:
  • mean squared error for regression
  • binary cross-entropy for two-class classification
  • categorical cross-entropy for multi-class classification
  • etc.
• https://lossfunctions.tumblr.com/

Optimizer
• How to update the weights based on the loss function
• Learning rate (+scheduling)
• Stochastic gradient descent, momentum, and their variants
  • RMSProp is usually a good first choice
  • more info: http://ruder.io/optimizing-gradient-descent/
Animation from: https://imgur.com/s25RsOr

Deep learning frameworks

Deep learning frameworks
Figure (software stack, top to bottom):
  Lasagne, Keras, TF Estimator, torch.nn, Gluon
  + Theano, TensorFlow, CNTK, PyTorch, MXNet, Caffe
  + CUDA, cuDNN; MKL, MKL-DNN
  GPUs; CPUs

Deep learning frameworks
• Actually tools for defining static or dynamic general-purpose computational graphs
• Automatic differentiation
• Seamless CPU / GPU usage
  • multi-GPU, distributed
• Python/numpy or R interfaces
  • instead of C, C++, or CUDA
• Open source
Figure: example computational graph (nodes ✕, ✕, +; inputs x, y, 5)

• Keras is a high-level neural networks API
  • we will use TensorFlow as the compute backend
  • included in TensorFlow 2 as tf.keras
  • https://keras.io/ , https://www.tensorflow.org/guide/keras
• PyTorch is:
  • a GPU-based tensor library
  • an efficient library for dynamic neural networks
  • https://pytorch.org/

Lecture 2: Multi-layer perceptron networks

Neuron as a linear classifier
By User:ZackWeinberg, based on PNG version by User:Cyc [CC BY-SA 3.0], via Wikimedia Commons

A non-linear classifier?

Non-linearity = activation function
• A smooth (differentiable) nonlinear function that is applied after the inner product with the weights

Neural network
● An (artificial) neural network is a collection of neurons
● Usually organized in layers
  ○ input layer
  ○ one or more hidden layers (sizes, activation functions are hyperparameters)
  ○ output layer

Output activation
● Usually a non-linear activation after each layer
● Typically ReLU between the layers
● At the output layer we need to consider the task, i.e., what kinds of outputs we want, e.g.,
  ○ Multi-label classification
Figure: Input → Layer 1 → ReLU → Layer 2

Backpropagation
• Neural networks are trained with gradient descent, starting from a random weight initialization
• Algorithm for computing the gradients for a neural network:
  An error (loss) when comparing
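
The Data slide lists tensors, the training – validation – test split, and minibatches without showing them in code. A minimal NumPy sketch follows; the toy data, the 60/20/20 split, the batch size, and the helper name minibatches are illustrative assumptions, not part of the course material.

```python
import numpy as np

# Hypothetical toy data: 100 grayscale "images" of 8x8 pixels with binary labels.
rng = np.random.default_rng(seed=0)
images = rng.random((100, 8, 8), dtype=np.float32)  # 3D tensor: (samples, height, width)
labels = rng.integers(0, 2, size=100)                # 1D tensor: (samples,)

print(images.ndim, images.shape, images.dtype)

# Training - validation - test split (assumed 60/20/20) on shuffled indices.
order = rng.permutation(len(images))
train_idx, val_idx, test_idx = order[:60], order[60:80], order[80:]
x_train, y_train = images[train_idx], labels[train_idx]
x_val, y_val = images[val_idx], labels[val_idx]
x_test, y_test = images[test_idx], labels[test_idx]

def minibatches(x, y, batch_size):
    """Yield consecutive (inputs, targets) slices of at most batch_size samples."""
    for start in range(0, len(x), batch_size):
        yield x[start:start + batch_size], y[start:start + batch_size]

batch_shapes = [xb.shape for xb, _ in minibatches(x_train, y_train, batch_size=16)]
print(batch_shapes)
```

The last minibatch is smaller than the others whenever the number of training samples is not a multiple of the batch size.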
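
The Gradient descent slide names the method but the update rule itself did not survive in the text. A common form is θ ← θ − η ∇J(θ), where the learning rate symbol η is an assumption here. The sketch below applies mini-batch stochastic gradient descent to a hypothetical one-variable linear model with a mean squared error cost; the data, learning rate, and batch size are made up for illustration.

```python
import numpy as np

# Hypothetical noise-free data from y = 2x + 1; the model must recover w=2, b=1.
rng = np.random.default_rng(seed=0)
x = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 * x + 1.0

w, b = 0.0, 0.0          # parameters (theta)
learning_rate = 0.1      # hyperparameter (eta), assumed value
batch_size = 20          # hyperparameter, assumed value

for epoch in range(200):
    order = rng.permutation(len(x))                 # reshuffle every epoch
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        error = (w * x[idx] + b) - y[idx]           # prediction minus target
        grad_w = 2.0 * np.mean(error * x[idx])      # d(MSE)/dw on this minibatch
        grad_b = 2.0 * np.mean(error)               # d(MSE)/db on this minibatch
        w -= learning_rate * grad_w                 # step against the gradient
        b -= learning_rate * grad_b

print(w, b)
```

Each update uses only one minibatch, so the gradient is a noisy estimate of the full-data gradient; that is the "stochastic" part.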
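
The three “standard” loss functions named on the Loss function slide can be written in a few lines of NumPy. This is a sketch under assumptions: the function names, the clipping constant eps, and averaging over samples are choices made here, and framework implementations (for example in Keras) may differ in details.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Regression loss: average squared difference."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Two-class loss: y_true in {0, 1}, p_pred is the predicted P(class 1)."""
    y_true = np.asarray(y_true, float)
    p = np.clip(np.asarray(p_pred, float), eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))

def categorical_cross_entropy(y_onehot, p_pred, eps=1e-12):
    """Multi-class loss: one-hot targets, one probability row per sample."""
    y_onehot = np.asarray(y_onehot, float)
    p = np.clip(np.asarray(p_pred, float), eps, 1.0)        # avoid log(0)
    return float(-np.mean(np.sum(y_onehot * np.log(p), axis=1)))

print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))
print(binary_cross_entropy([1, 0], [0.5, 0.5]))
print(categorical_cross_entropy([[1, 0, 0]], [[0.5, 0.25, 0.25]]))
```

In each case a perfect prediction gives a loss near zero, and the loss grows as predictions move away from the targets.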
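
The Lecture 2 slides describe a neuron as an inner product with the weights followed by an activation function, and a network as layers of such neurons with ReLU between the layers. A minimal NumPy sketch of the forward pass is given below; the layer sizes, the sigmoid output activation, and the names neuron, mlp_forward, and params are assumptions chosen for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, activation):
    """Single neuron: inner product with the weights, plus bias, then activation."""
    return activation(np.dot(w, x) + b)

# One neuron on a 2D input: w.x + b = 0.5 - 2.0 + 0.25 = -1.25
x = np.array([1.0, 2.0])
w = np.array([0.5, -1.0])
b = 0.25
relu_out = neuron(x, w, b, relu)
sigmoid_out = neuron(x, w, b, sigmoid)
print(relu_out, sigmoid_out)

# Two-layer network: input (2) -> hidden layer (4, ReLU) -> output (1, sigmoid).
# Weights start out random, as they would before any training.
rng = np.random.default_rng(seed=0)
params = {
    "W1": rng.normal(size=(4, 2)), "b1": np.zeros(4),
    "W2": rng.normal(size=(1, 4)), "b2": np.zeros(1),
}

def mlp_forward(x, params):
    hidden = relu(params["W1"] @ x + params["b1"])        # hidden layer with ReLU
    return sigmoid(params["W2"] @ hidden + params["b2"])  # output in (0, 1)

output = mlp_forward(x, params)
print(output)
```

A sigmoid output suits a two-class task; other tasks call for a different output activation, which is the point the Output activation slide starts to make.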
