Deep Learning
Frank Noé (FU Berlin)

Resources
1. Goodfellow, Bengio and Courville: Deep Learning, MIT Press 2016. http://www.deeplearningbook.org/ https://github.com/janishar/mit-deep-learning-book-pdf

2. P. Mehta, M. Bukov, C.-H. Wang, A.G.R. Day, C. Richardson, C.K. Fisher, D.J. Schwab: A high-bias, low-variance introduction to machine learning for physicists. https://arxiv.org/abs/1803.08823

WaveNet

Supervised learning with feedforward neural networks

Artificial neuron
[Figure: analogy with a biological neuron: inputs x1, ..., xn arrive with weights w1, ..., wn, are summed (Σ), and the result leaves along the axon and its terminal branches.]
An artificial neuron computes σ(wᵀx), a nonlinear activation of the weighted sum of its inputs. With a bias node (a constant input 1 with weight b = w0), the output / activation of the neuron is σ(wᵀx + b).

Activation functions
https://en.wikipedia.org/wiki/Activation_function
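As a minimal sketch of the forward pass above (numpy; the input values, weights and choice of tanh are illustrative, not from the slides):

```python
import numpy as np

def neuron(x, w, b, activation=np.tanh):
    """Single artificial neuron: nonlinear activation of w^T x + b."""
    return activation(np.dot(w, x) + b)

# Illustrative values (not from the lecture):
x = np.array([0.5, -1.0, 2.0])   # inputs x1..xn
w = np.array([0.1, 0.4, -0.2])   # weights w1..wn
b = 0.3                          # bias (weight w0 on a constant input 1)
y = neuron(x, w, b)              # output / activation of the neuron
```

Swapping `activation` for a different nonlinearity changes the neuron's response curve but not the structure of the computation.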

Traditionally used: sigmoid and tanh (saturating nonlinearities).
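A short sketch of the common activation functions discussed here (numpy; the grid of sample points is illustrative):

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: saturates in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Rectified linear unit: identity for z > 0, zero otherwise."""
    return np.maximum(z, 0.0)

z = np.linspace(-3.0, 3.0, 7)
s = sigmoid(z)     # values in (0, 1)
t = np.tanh(z)     # values in (-1, 1)
r = relu(z)        # zero for z < 0, sparse activations
```

ReLU's exact zeros for negative inputs are what "results in sparse networks" refers to: many hidden units are exactly inactive for a given input.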

ReLU (rectified linear unit): currently most widely used. Empirically easier to train and results in sparse networks.
Nair and Hinton: Rectified linear units improve restricted Boltzmann machines. ICML'10, 807-814 (2010)
Glorot, Bordes and Bengio: Deep sparse rectifier neural networks. PMLR 15:315-323 (2011)

Perceptron (Rosenblatt 1957)
Used as a binary classifier; closely related to the support vector machine (SVM).

[Figure: single-layer network with a constant bias input 1 (weight w0) and inputs x1, ..., xn with weights w1, ..., wn.]
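A minimal sketch of the perceptron decision rule and Rosenblatt's learning rule (numpy; the toy data set is illustrative, not from the slides):

```python
import numpy as np

def perceptron_predict(x, w, w0):
    """Binary classifier: sign of the weighted sum plus bias."""
    return 1 if np.dot(w, x) + w0 > 0 else -1

def perceptron_train(X, y, epochs=10, lr=1.0):
    """Rosenblatt's rule: update weights only on misclassified points."""
    w = np.zeros(X.shape[1])
    w0 = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if perceptron_predict(xi, w, w0) != yi:
                w += lr * yi * xi   # move the boundary toward the mistake
                w0 += lr * yi
    return w, w0

# Linearly separable toy data (illustrative)
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w, w0 = perceptron_train(X, y)
```

For linearly separable data the rule is guaranteed to converge; for non-separable data it never settles, which is one motivation for the margin-based SVM view.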

Multilayer perceptron (~1985)
Universal Representation Theorem (Hornik et al., 1989; Cybenko, 1989)

Parameter optimization (weight training): minimization of a loss function
Backprop; computing graphs
Stochastic gradient descent
Second-order methods

What does a neural network learn? Feature detectors
Hidden layers become self-organized feature detectors
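A minimal sketch of weight training by backprop and sample-wise stochastic gradient descent (numpy; the one-hidden-layer architecture, toy regression data, and hyperparameters are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (illustrative): fit y = sin(x)
X = rng.uniform(-2, 2, size=(64, 1))
Y = np.sin(X)

# One hidden layer of tanh units, linear output
W1 = rng.normal(0, 0.5, size=(1, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, size=(8, 1)); b2 = np.zeros(1)

def forward(x):
    h = np.tanh(x @ W1 + b1)        # hidden activations
    return h, h @ W2 + b2           # network output

def mse(x, y):
    return float(np.mean((forward(x)[1] - y) ** 2))

loss_before = mse(X, Y)
lr = 0.02
for epoch in range(200):
    for i in rng.permutation(len(X)):       # stochastic: one sample at a time
        x, y = X[i:i+1], Y[i:i+1]
        h, yhat = forward(x)
        # backprop: chain rule applied backwards through the computing graph
        d_out = 2 * (yhat - y)              # dL/d(yhat)
        d_h = (d_out @ W2.T) * (1 - h**2)   # through the tanh nonlinearity
        W2 -= lr * h.T @ d_out; b2 -= lr * d_out[0]
        W1 -= lr * x.T @ d_h;   b1 -= lr * d_h[0]
loss_after = mse(X, Y)
```

The gradient of each layer reuses the gradient already computed for the layer above it; that reuse is what makes backprop cheap.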

What does this unit detect?
[Figure: a hidden unit connected to input pixels 1, 5, 10, 15, 20, 25, ..., 63, with strong weights highlighted against low/zero weights.]
A unit with strong weights along the top row of pixels will send a strong signal for a horizontal line in the top row, ignoring everything else.
A unit with strong weights in the top left gives a strong signal for a dark area in the top left corner.

What features might you expect a good NN to learn when trained with data like this? Vertical lines, horizontal lines, small circles.

But what about position invariance? Our example unit detectors were tied to specific parts of the image.

Deep networks
Successive layers can detect higher-level features.

Lower layers detect lines at specific positions; higher-level detectors combine them into features such as "horizontal line", "RHS vertical line", "upper loop", etc.

But until a few years ago, deep networks could not be trained efficiently.
2006: the deep learning breakthrough

2019 Turing award
Key advances in deep learning

Applications to molecular systems
Physical systems, example: the O2 molecule
Data augmentation
Building physics into the ML model: translation, rotation, energy conservation; permutation invariance
Learning to represent energy functions

[Figure: Behler-Parrinello network for H2O. (a) Coordinates x1, ..., xn. (b) Per-atom features g1, ..., gk for O, H1 and H2 feed atomwise networks that output atomic energies Ei (EO, EH1, EH2), which sum to the total energy E.]

Behler and Parrinello, PRL 98, 146401 (2007)

Learning coarse-grained free energy functions

Convolutional neural networks
Convolutional filters
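The Behler-Parrinello idea can be sketched as follows (numpy; the feature functions, per-element "networks" as plain weight vectors, and geometry values are toy stand-ins, not the published model): the total energy is a sum of atomic energies, each produced from distance-based features by a network shared across atoms of the same element, so the result is invariant to translation and to permuting identical atoms.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy per-element "networks": one weight vector per element (O, H), shared
# across atoms of the same element -- this is what gives permutation invariance.
nets = {"O": rng.normal(size=4), "H": rng.normal(size=4)}

def atom_features(positions, i):
    """Toy features of atom i: invariant functions of interatomic distances."""
    d = np.linalg.norm(positions - positions[i], axis=1)
    d = d[d > 0]  # drop the self-distance
    return np.array([d.sum(), (1/d).sum(), np.exp(-d).sum(), np.exp(-d**2).sum()])

def total_energy(elements, positions):
    """E = sum_i E_i, each E_i from the network of atom i's element."""
    return sum(float(atom_features(positions, i) @ nets[e])
               for i, e in enumerate(elements))

# H2O: swapping the two hydrogens must not change the energy
elems = ["O", "H", "H"]
pos = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
E = total_energy(elems, pos)
E_swapped = total_energy(elems, pos[[0, 2, 1]])
```

Because the features depend only on distances, translating the whole molecule also leaves E unchanged.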

Convolutional kernel / filter
Apply convolutions
Convolutions are translation-equivariant
Convolutional filters perform image processing
Example: face recognition
MNIST dataset; MNIST misclassified examples
Convolution is a linear operation
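The properties above can be checked directly on a naive implementation (numpy; the edge-detecting kernel and toy image are illustrative): convolution is linear, and shifting the input shifts the output by the same amount.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 'valid' 2D cross-correlation (no kernel flipping): a linear op."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A vertical-edge filter (Sobel-like) on a toy image: dark left, bright right
image = np.zeros((8, 8))
image[:, 4:] = 1.0
kernel = np.array([[1., 0., -1.],
                   [2., 0., -2.],
                   [1., 0., -1.]])
edges = conv2d_valid(image, kernel)

# Shift the input one pixel to the right; away from the boundary, the
# response map shifts by exactly one pixel too (translation equivariance).
shifted = np.roll(image, 1, axis=1)
```

Pooling (next slide) is what turns this equivariance into approximate invariance.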

Convolutional networks: max pooling
Motivation from neuroscience

Unsupervised learning
Principal component analysis
Autoencoder

$\mathrm{Loss}(x;\theta) = \sum_i \left[\hat{y}_i(x;\theta) - x_i\right]^2$

Recurrent neural networks (RNNs)
Basic universal RNN
RNNs are deep: gradient annihilation
Long short-term memory (LSTM)
Generate Shakespeare
Generate math
Commonly-used tricks
Multi-task learning
AlphaGo Zero
Adversarial attacks and training
ResNets
Generative networks ==> Monday
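The reconstruction loss above can be sketched for the simplest case, a linear autoencoder with tied weights (numpy; the toy data and the comparison direction are illustrative). This also shows the link to PCA from the previous slide: the best one-dimensional linear code is the top principal component.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data living near a 1-D subspace of R^3 (illustrative)
t = rng.normal(size=(100, 1))
X = t @ np.array([[1.0, 2.0, -1.0]]) + 0.01 * rng.normal(size=(100, 3))

def recon_loss(X, W):
    """Loss(x; theta) = sum_i [yhat_i(x; theta) - x_i]^2, averaged over data.
    Linear autoencoder with tied weights: encode with W, decode with W^T."""
    Xhat = (X @ W) @ W.T
    return float(np.mean(np.sum((Xhat - X) ** 2, axis=1)))

# Optimal 1-D linear code: the top right singular vector of X (first PC)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
W_pca = Vt[:1].T                    # 3x1 encoder onto the first PC

W_rand = rng.normal(size=(3, 1))
W_rand /= np.linalg.norm(W_rand)    # random unit direction for comparison
```

With a nonlinear encoder and decoder, the same loss trains autoencoders that go beyond what PCA can represent.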