Deep Neural Network Tony Wong 2017-02-18 Outline ▸ Introduction to Neural Networks ▸ Image classifiers using Tensorflow (version 1.0) ▹ MNIST

▹ IOI Art Class

▹ GFRIEND dataset

Deep Neural Network 2 Machine Learning ▸ Supervised Learning: ▹ Learn from examples to find patterns that explain the relationship between inputs and outputs ▸ http://playground.tensorflow.org

▸ Can you name some other applications that make use of machine learning?

Deep Neural Network 3 Steps to Perform Machine Learning ▸ Collection: Define data structure and collect data ▸ Design: Select suitable algorithm ▸ Implementation: Build the computation graph by writing code ▸ Preprocessing: Split data into training and validation sets ▸ Training: Train the model using training examples ▸ Validation: How well does the model perform? ▸ Serving: Use the model to predict new data

Deep Neural Network 4 Linear Regression ▸ When there are 0 hidden layers ▸ Find parameters θ

h(x) = θ1x1+θ2x2

Deep Neural Network 5 Neural Network ▸ A neural network can help us find non-linear patterns in the data ▸ Example: label = (x1 < 0) XOR (x2 < 0)

Deep Neural Network 6 Neuron ▸ Each neuron computes c = sum{xi * wi} + b (wi: weights, b: bias) and outputs y = f(c), where f is the activation function ▸ Using an activation function such as ReLU (Rectified Linear Unit), y = max(0, c), introduces non-linearity, which helps the neural network learn

sigmoid: y = 1/(1+e^-x)

tanh: y = 2/(1+e^-2x) - 1 Deep Neural Network 7 Loss ▸ Loss measures how badly the model is doing (e.g. the error rate) ▸ Machine learning is about minimizing the loss by adjusting the parameters (weights) ▸ Gradient Descent: repeatedly move the weights downhill along the slope of the loss
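To make the gradient descent idea concrete, here is a minimal plain-Python sketch (not from the original slides; the toy data, learning rate and step count are made up): it fits a single weight theta to the data y = 2x by repeatedly stepping against the gradient of the squared-error loss.

# Minimal gradient descent sketch: fit y = theta * x to toy data
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]              # the true relationship is y = 2x

theta = 0.0                       # arbitrary initialization
learning_rate = 0.05

for step in range(100):
    # gradient of the mean squared error (theta*x - y)^2 with respect to theta
    grad = sum(2 * (theta * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    theta -= learning_rate * grad # move against the gradient (downhill on the loss)

print(theta)                      # approaches 2.0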

Deep Neural Network 8 Loss != Accuracy

▸ Loss decreases when ▹ Accuracy increases ▹ Correct predictions are made with higher confidence ▹ Incorrect predictions are made with lower confidence ▸ Which model is better?

            Model 1   Model 2
Q1  T         20%       90%
    F         80%       10%
Q2  T         52%       49%
    F         48%       51%
Q3  T         47%       52%
    F         53%       48%
Accuracy     66.7%     33.3%

Deep Neural Network 9 Random initialization ▸ The weights are randomly initialized so that each neuron can detect different patterns ▸ Different initializations can arrive at different local minima

Deep Neural Network 10 Training and Validation (Testing) Loss ▸ We split the examples into 2 sets: training and validation ▸ For example, if we have 1000 examples ▹ We can make 800 training examples for the network to learn ▹ 200 validation examples to evaluate the network's ability to explain unseen examples

Deep Neural Network 11 Underfitting and Overfitting ▸ Underfit (x and y features, 1 hidden layer with 2 nodes): high training and validation losses ▸ Just right (x and y features, 1 hidden layer with 4 nodes): low training and validation losses ▸ Overfit (all input features, 3 hidden layers: 8-6-4): gap between test and training loss

Deep Neural Network 12 Underfitting and Overfitting ▸ To fix underfitting ▹ More input features ▹ More hidden layers ▸ To fix overfitting ▹ More training examples ▹ Use a simpler model ▹ Add regularization

Deep Neural Network 13 MNIST Database ▸ Modified National Institute of Standards and Technology ▸ Multiclass classification of hand-written digits ▸ Each example in MNIST consists of the 28x28 image and the corresponding label (0-9) ▸ If we express the label using one-hot encoding, it would be a 1D tensor of shape [10] ▹ [0,0,0,0,0,0,1,0,0,0] for digit 6

CC-BY 3.0: https://www.tensorflow.org/versions/master/tutorials/mnist/beginners/ Deep Neural Network 14 MNIST Database ▸ The database contains 55000 training examples (for learning) and 10000 validation examples (for evaluation) ▸ Can you guess the validation accuracy achieved by the best model? (random guessing = 10%) ▸ What about the average human?

<80% 80-90% 90-95% 95%-97% 97-98% 98-99% 99-99.5% >99.5%

Deep Neural Network 15 Tensorflow ▸ The flow of data is a directed acyclic graph ▸ Tensors (multidimensional arrays) flow along the edges ▸ Nodes are operators (e.g. load data, addition, matrix multiplication, max) ▸ Write Python/C++ code to describe the graph ▸ Tensorflow manages the execution (e.g. distributes computation to different CPUs / GPUs) ▸ pip install --upgrade tensorflow Deep Neural Network 16 Tensor ▸ 1D example: ▹ height weight age loc_nt loc_kln loc_hk ▹ 158 50 12 0 1 0 (one-hot encoding for location) ▸ 2D example: grayscale images ▹ 28(h) x 28(w) ▸ 3D example: RGB images ▹ 32(h) x 32(w) x 3(channels)

CC-BY 3.0: https://www.tensorflow.org/versions/master/tutorials/mnist/beginners/ Deep Neural Network 17 Tensor ▸ When we stack 55000 28x28 grayscale images together ▹ We have a tensor of shape [55000, 28, 28,1] ▹ We can also reshape the tensor to [55000, 784]
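As a quick illustration of the reshape (a sketch, not the template code; the placeholder is only for demonstration):

import tensorflow as tf

images = tf.placeholder(tf.float32, [None, 28, 28, 1])  # a batch of grayscale images
flat = tf.reshape(images, [-1, 784])                     # -1 lets Tensorflow infer the batch size
# flat has shape [batch_size, 784]; the pixel values are unchanged, only the indexing differs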

CC-BY 3.0: https://www.tensorflow.org/versions/master/tutorials/mnist/beginners/

Deep Neural Network 18 Windows Remote Desktop Connection ▸ A virtual machine has been set up with Tensorflow installed ▸ Windows -> type "Remote"

▸ Hostname: hkoi11530.cloudapp.net

Deep Neural Network 19 File Templates and Datasets ▸ Some templates and datasets have been prepared for you ▸ Go to the Public folder and copy tensorflow, tfpreprocess and mystery.7z to your user's Documents folder

Deep Neural Network 20 Notes ▸ In the next steps we are only defining the graph ▹ Nothing is computed yet ▸ In Python, indentation defines the scope ▹ Each line in a scope must be indented in the same way ▹ Mixing tabs with spaces results in an error ▹ Since Tensorflow is developed by Google, which uses 4 spaces, you should use 4 spaces as well

Deep Neural Network 21 Creating a Neural Network ▸ Let's build a network with 1 hidden layer ▹ Each node i computes ci = sum{aj * wij} + bi and fi = max(0, ci) ▹ Activation function: ReLU

▸ Input Layer → (+ bias) Hidden Layer → (+ bias) Output Layer ▹ Input layer: 28x28 nodes (1 for each pixel) ▹ Hidden layer: ?? nodes (1 for each kernel) ▹ Output layer: 10 nodes (1 for each digit)

Deep Neural Network 22 Creating a Neural Network

▸ Create a file simple.py

▸ The inference function takes an input tensor images of shape [-1, 28, 28, 1] (an unknown number of images, image_size x image_size, channels) and returns y of shape [-1, 10] (num_classes)

Deep Neural Network 23 Architecture of a Layer (remember: indent with 4 spaces)

▸ A layer maps input_size nodes to output_size nodes ▹ input_tensor: shape [-1, input_size] ▹ weights: shape [input_size, output_size] ▹ biases: shape [output_size] ▹ local activations = matmul + biases, then ReLU ▹ returns a tensor of shape [-1, output_size]
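A minimal sketch of what such a layer helper might look like in TensorFlow 1.x, following the shapes above. The function name layer, the relu switch and the initialization values are assumptions, not the exact code in the prepared templates:

import tensorflow as tf

def layer(input_tensor, input_size, output_size, name='layer', relu=True):
    """Fully connected layer: matmul + biases, optionally followed by ReLU."""
    with tf.variable_scope(name):
        # Random initialization so that each node can learn a different pattern
        weights = tf.Variable(tf.truncated_normal([input_size, output_size], stddev=0.1))
        biases = tf.Variable(tf.zeros([output_size]))
        local = tf.matmul(input_tensor, weights) + biases  # shape [-1, output_size]
        return tf.nn.relu(local) if relu else local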

Deep Neural Network 24 Stacking the Layers ▸ image_size = 28, channels = 1, HIDDEN_NODES = 48, num_classes = 10 ▸ images [-1, 28, 28, 1] → reshape → reshaped [-1, 784] → layer(784, 48) → hidden [-1, 48] → layer(48, 10) → output [-1, 10], which is returned
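Stacking the layers as in the diagram, a hedged sketch of the inference function in simple.py (the real template may differ; relu=False on the output layer is my choice so that the raw scores are returned):

image_size = 28
channels = 1
HIDDEN_NODES = 48
num_classes = 10

def inference(images):
    """images: [-1, 28, 28, 1] -> output scores: [-1, 10]"""
    flat_size = image_size * image_size * channels                           # 784
    reshaped = tf.reshape(images, [-1, flat_size])                           # [-1, 784]
    hidden = layer(reshaped, flat_size, HIDDEN_NODES, 'hidden')              # [-1, 48]
    output = layer(hidden, HIDDEN_NODES, num_classes, 'output', relu=False)  # [-1, 10]
    return output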

Deep Neural Network 25 Interpreting the Output ▸ We predict the class with the highest value in the output of inference

main.py (written for you already)

Deep Neural Network 26 Probabilities ▸ Probabilities should add up to 1 ▸ We apply softmax to the output ▸ P(y_ = i) = e^y[i] / sum_j{e^y[j]}
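In TensorFlow 1.x terms this is only a couple of lines (a sketch continuing the one above; tensor names are illustrative):

y = inference(images)                   # raw output scores, shape [-1, 10]
probabilities = tf.nn.softmax(y)        # each row now sums to 1
predicted_class = tf.argmax(y, axis=1)  # index of the highest value for each example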

The digit has a 37% chance of being "2" Deep Neural Network 27 Loss function ▸ Compute the loss for the optimizer to train our model

Shape: [-1], one cross entropy loss per example

Shape: scalar, the average of the cross entropy losses
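A minimal sketch of how this loss could be written in TensorFlow 1.x, assuming one-hot labels and continuing the sketch above (main.py may use a slightly different variant):

labels = tf.placeholder(tf.float32, [None, 10])  # one-hot ground-truth labels

# One cross entropy value per example -> shape [-1]
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=y)
# Average over the batch -> a single scalar for the optimizer to minimize
loss = tf.reduce_mean(cross_entropy)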

Deep Neural Network 28 Cross Entropy ▸ For classification, one popular loss function is the Cross Entropy loss ▸ It takes into account the probabilities instead of just the accuracy

Cross Entropy loss = −log (0.375) ≈ 0.98 (lower is better)

Deep Neural Network 29 Optimizer ▸ The optimizer traverses the graph in reverse topological order to compute gradients at each node ▸ Chain rule: ∂z/∂x = (∂z/∂y) · (∂y/∂x) ▸ Then the optimizer adjusts the weight variables to lower the loss ▸ Different optimizers adjust the weights differently
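Continuing the sketch, the corresponding TensorFlow 1.x call is short (the learning rate value here is illustrative; the template exposes it through the --rate flag):

# Backpropagation plus one gradient descent step each time train_op is run
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
train_op = optimizer.minimize(loss)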

Deep Neural Network 30 Notes ▸ These parts have been written for you ▹ Data loading (it has to be efficient because the data loading part is usually the bottleneck for small networks) ▹ Progress logging (accuracy and loss) ▹ Saving the model so that it can be restored for more training or serving

Deep Neural Network 31 Training ▸ Open command prompt in the working directory

▸ python main.py --logdir log/mnist1

What's the accuracy at Step 1000? Training Validation

Deep Neural Network 32 Visualizing the Training Progress ▸ Tensorflow comes with Tensorboard – a visualization tool ▸ Open another command prompt in the working directory ▸ tensorboard --logdir log --reload_interval 15 --port PORT ▸ Open the browser and enter URL: http://localhost:PORT

Stop the training (CTRL+C) when the validation loss decreases very slowly

Deep Neural Network 33 Confusion Matrix ▸ A Confusion Matrix helps us understand how well the model is performing, and which classes are particularly difficult

Deep Neural Network 34 Stopping, Resuming and Resetting Training ▸ Training will resume provided that you specify the same logdir ▹ The graph must be the same ▸ If you would like to reset and restart training using the same logdir, you must: ▹ Stop tensorboard (CTRL+C), otherwise you cannot delete the logdir ▹ Delete the logdir (e.g. mnist1 in log) ▹ Start training ▹ Start tensorboard again

Deep Neural Network 35 Visualizing the Hidden Layer ▸ By visualizing the nodes in the Hidden Layer, we can have a sense of what patterns the neural network is trying to recognize ▸ Beware of indentation!

Deep Neural Network 36 Visualizing the Hidden Layer ▸ python main.py --logdir log/mnist2 ▸ In Tensorboard, go to the Images tab ▸ Here you can see the weights of the 48 nodes in the hidden layer

Deep Neural Network 37 Adding Regularization ▸ We add an L2 loss ▹ The squared weights are added to the loss function ▹ The regularization parameter controls the strength of regularization (0 = no regularization) ▸ python main.py --logdir log/mnist3 --regularization 0.005
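A hedged sketch of the L2 term (the set of weight tensors here is an assumption; the template's layers decide which variables are actually penalized):

regularization = 0.005                     # strength, as passed via --regularization
# tf.nn.l2_loss(w) computes sum(w ** 2) / 2 for one weight tensor
weight_tensors = tf.trainable_variables()  # ideally only the layer weights, excluding biases
l2_term = regularization * tf.add_n([tf.nn.l2_loss(w) for w in weight_tensors])
total_loss = loss + l2_term                # the optimizer now also prefers small weights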

What's the accuracy at Step 1000? Training Validation

Deep Neural Network 38 Effects of Adding Regularization ▸ Less noise in the weights (note the dying kernel in the visualization) ▹ Especially around the edges where there is little information ▸ The nodes in the hidden layer try to recognize more sensible patterns ▸ Better generalization

Deep Neural Network 39 Adding Dropout Layer ▸ Dropout is a technique to improve generalization and reduce overfitting ▸ The Dropout layer randomly sets some nodes to 0 ▹ The % of nodes dropped can be controlled by a parameter dropout ▸ Reduces inter-dependence between nodes ▸ Recognizes better patterns
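A hedged sketch of a dropout layer in TensorFlow 1.x. Note that tf.nn.dropout takes the probability of keeping a node, so a dropout rate of 0.2 corresponds to keep_prob = 0.8 (variable names are illustrative):

dropout_rate = 0.2                   # fraction of nodes to drop, as in --dropout 0.2
keep_prob = 1.0 - dropout_rate
# During training each node is zeroed with probability 0.2 and the survivors are scaled up
hidden = tf.nn.dropout(hidden, keep_prob)
# During validation and serving, dropout should be disabled (keep_prob = 1.0)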

▸ python main.py --logdir log/mnist4 --regularization 0.005 --dropout 0.2 Deep Neural Network 40 Effects of Adding Dropout ▸ Higher contrast ▸ Fewer dead kernels ▸ Higher capacity with the same computation power ▸ (our neural network is small, so the effects of adding Dropout are insignificant)

What's the accuracy at Step 1000? Training Validation

Deep Neural Network 41 Serving ▸ After training the neural network, we can use it to classify unseen inputs ▸ python main.py --predict images/digit-*.png --logdir log/mnist4

Deep Neural Network 42 Optional Exercises ▸ Experiment with different parameter settings

Learning Rate   Regularization   No. of nodes   Dropout   Activation Function
0.01            0.02             24
0.003           0.01             32             0.1       tanh
0.001           0.005            48             0.2       ReLU      <- current
0.0003          0.002            64             0.5       sigmoid
0.0001          0.001            96

Deep Neural Network 43 IOI 2013 Art Class (I1312) ▸ There are 4 styles of images ▸ For each style you are given 9 examples ▸ >= 90% accuracy = Accepted ▸ Note: this task is solvable without using a neural network

Deep Neural Network 44 Data Preparation ▸ Each image has different dimensions ▹ Resize all images to the same size (e.g. 32x32) ▸ Consolidate the images into TFRecord format for efficient training ▸ Under the tfpreprocess folder, execute these ▹ python preprocess.py artclass training 32 ▹ python preprocess.py artclass validation 32 ▸ The dataset in TFRecord format will be saved in tensorflow/data/artclass_(training/validation) Deep Neural Network 45 Training

▸ Edit main.py ▹ Change import mnist as data to import artclass as data ▸ python main.py --logdir log/artclass1 --regularization 0.005 --dropout 0.2 --batch_size 256

Deep Neural Network 46 Art Class Hidden Layer ▸ Hidden layer weights at Step 500 and Step 1000 ▸ Nothing more to learn from the training examples ▸ Extreme Overfitting!

Deep Neural Network 47 Data Augmentation ▸ We have too few training examples ▸ We can artificially create (much) more training examples by ▹ Rotating images ▹ Flipping images ▹ Note: it may not make sense to flip vertically ▹ Cropping images ▹ Adjusting brightness / contrast / colour ▹ Drawing more examples using Paint

Deep Neural Network 48 Random Adjustment ▸ Edit artclass.py ▸ Add the following code to perform random brightness / contrast adjustment
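The exact code from the workshop files is not reproduced here; a minimal sketch of random brightness / contrast adjustment using TensorFlow's image ops (the parameter values are illustrative, and image stands for one [height, width, channels] training image):

image = tf.image.random_brightness(image, max_delta=0.3)       # randomly brighten or darken
image = tf.image.random_contrast(image, lower=0.7, upper=1.3)  # randomly change the contrast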

Deep Neural Network 49 Random Cropping ▸ Code for random cropping has already been written for you because it is a little bit complicated

▸ Change INPUT_SIZE to 48 ▸ We have to preprocess the images again ▸ Under the tfpreprocess folder, execute these ▹ python preprocess.py artclass training 48 ▹ python preprocess.py artclass validation 48 ▸ During training, a random 32x32 region will be selected from the 48x48 image Deep Neural Network 50 Random Flipping ▸ Can you figure out how to perform random flipping? ▹ Copy the line with tf.image.per_image_standardization and change tf.image.per_image_standardization to tf.image.random_flip_left_right

To also flip vertically, add another line with tf.image.random_flip_up_down
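Following those instructions, the relevant lines in artclass.py might end up looking roughly like this (a sketch; the surrounding code is assumed):

image = tf.image.random_flip_left_right(image)     # horizontal flip with 50% probability
image = tf.image.random_flip_up_down(image)        # optional: vertical flip as well
image = tf.image.per_image_standardization(image)  # the existing normalization line stays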

Deep Neural Network 51 Training with Augmented Data

▸ python main.py --logdir log/artclass2 --regularization 0.005 --dropout 0.2 --batch_size 256 ▸ It is normal that the accuracy fluctuates because the neural network now learns from data with more randomness

Deep Neural Network 52 Convolutional Neural Network ▸ In our hidden layer, every node considers the whole 32x32 image ▸ Therefore, the neural network is not good at recognizing spatial patterns in small areas, such as: ▹ A small rectangle with the same color ▹ A black cross ▸ By adding convolution layers before fully connected layers, the neural network can then extract patterns in different small areas, which are then processed by the fully connected layers

Deep Neural Network 53 Convolution Filter ▸ Convolution: output[x][y][z] = input[x-2 : x+2][y-2 : y+2] * filter[z] ▸ Here, we use 16 5x5 filters ▹ Each pixel's surrounding 5x5 area is matched against 16 "patterns" ▹ Each pixel's dimension changes from 3 (RGB) to 16 ▹ The same filter weights are applied to each area ▹ Fewer weights to train

Deep Neural Network 54 Max Pooling and Local Response Normalization ▸ After applying convolution with 16 5x5 filters to our 32x32x3 image, we have a 32x32x16 "image" ▸ Then, we add a Max Pooling layer with a stride of 2x2 ▹ The highest response within each 2x2 area is combined into 1 pixel

▹ Output: 16x16x16 (figure: each 2x2 block of the feature map is reduced to its maximum value) ▸ Finally, a Local Response Normalization layer is added to increase independence between filters Deep Neural Network 55 Stacking 2 Convolution Layers ▸ Let's add another convolution layer with 32 filters
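A hedged TensorFlow 1.x sketch of one such convolution block (filter sizes follow the slides; the function name, variable names and initialization are assumptions, not the prepared cnn.py):

def conv_block(input_tensor, in_channels, out_filters, name='conv'):
    """5x5 convolution + ReLU, then 2x2 max pooling and local response normalization."""
    with tf.variable_scope(name):
        filters = tf.Variable(tf.truncated_normal([5, 5, in_channels, out_filters], stddev=0.1))
        biases = tf.Variable(tf.zeros([out_filters]))
        conv = tf.nn.conv2d(input_tensor, filters, strides=[1, 1, 1, 1], padding='SAME')
        activated = tf.nn.relu(conv + biases)                          # e.g. 32x32x16
        pooled = tf.nn.max_pool(activated, ksize=[1, 2, 2, 1],
                                strides=[1, 2, 2, 1], padding='SAME')  # e.g. 16x16x16
        return tf.nn.lrn(pooled)  # local response normalization

# conv1 = conv_block(images, 3, 16, 'conv1')   # [-1, 32, 32, 3] -> [-1, 16, 16, 16]
# conv2 = conv_block(conv1, 16, 32, 'conv2')   # second layer, 32 filters as on the slide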

Deep Neural Network 56 Copy simple.py -> cnn.py Implementation Copy layer function

▸ Remove the regularization

▸ Edit main.py

Deep Neural Network 57 Training Art Class CNN

▸ python main.py --logdir log/artclass3 --regularization 0.0001 --dropout 0.2 --batch_size 128

Deep Neural Network 58 Art Class Confusion Matrix ▸ confusion/confusion.htm ▸ Comment / uncomment the indicated lines

Conv layer 1 learned filters

Deep Neural Network 59 Convolutional Neural Network Performance

Number of trainable weights (excluding biases):
▸ 1 Hidden Layer (48 nodes) ▹ Hidden layer: 32x32x3x48 ▹ Output layer: 48x4 ▹ Sum = 147648
▸ Convolutional Neural Network ▹ Conv layer 1: 5x5x3x16 ▹ Conv layer 2: 5x5x16x16 ▹ Fully connected: 8x8x16x32 ▹ Output layer: 32x4 ▹ Sum = 40496

What's the accuracy @ Step 1000?
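Working out the sums: 32x32x3x48 + 48x4 = 147456 + 192 = 147648 for the fully connected network, and 5x5x3x16 + 5x5x16x16 + 8x8x16x32 + 32x4 = 1200 + 6400 + 32768 + 128 = 40496 for the CNN, so the CNN uses roughly 3.6 times fewer trainable weights.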

Deep Neural Network 60 GFRIEND Classifier ▸ Let's use the Convolutional Neural Network to classify images containing GFRIEND members ▸ Left to right: Sowon, Eunha, Yerin, SinB, Umji, Yuju

▸ Appropriate constraints on the images can help the network perform better ▹ 1:1 ratio ▹ Centered ▹ Crop to hair and chin Deep Neural Network 61 Data Collection ▸ I spent a long time collecting photos ▸ Wrote a tool in C# to capture the screen and save images ▸ Still images from fansites, Instagram, Baidu, etc. ▸ Captured frames from videos

Deep Neural Network 62 Preprocessing the GFRIEND Dataset ▸ 625 x 6 = 3750 photos are split into ▹ 500 x 6 = 3000 training examples ▹ 125 x 6 = 750 validation examples

▸ Right click mystery.7z > Extract here... ▸ In the tfpreprocess folder ▹ python preprocess.py gfriend training 42 ▹ python preprocess.py gfriend validation 42

Deep Neural Network 63 GFRIEND Dataset Loader

▸ Copy artclass.py to gfriend.py

▸ Edit main.py, change to import gfriend instead of artclass

Deep Neural Network 64 Training GFRIEND CNN

▸ python main.py --logdir log/gfriend1 --regularization 0.0001 --dropout 0.2 --batch_size 128

Training loss is high -- underfit

Deep Neural Network 65 GFRIEND Confusion Matrix confusion/confusion.htm

Deep Neural Network 66 Deeper Convolutional Neural Network ▸ Things to do when under-fitting occurs: ▹ Increase the number of filters in the convolution layers -> 96 / 128 / 192 ▹ Add Dropout to the convolution layers as well ▹ Add another fully-connected layer -> 2 FC layers ▹ Increase the number of nodes in the hidden layers -> 384 / 256

Deep Neural Network 67 Deeper Convolutional Neural Network

Copy cnn.py to deep.py

▸ python main.py --logdir log/gfriend2 --rate 0.0001 --regularization 0.01 --dropout 0.2 --batch_size 128 ▸ Unfortunately, training such a large network is slow on a CPU ▹ A GPU is around 10x faster ▸ 96.5% validation accuracy Deep Neural Network 68

Performance ▸ At Step 10000: loss 0.09668, validation accuracy 96.53%

Deep Neural Network 69 Serving

▸ python main.py --predict images/gfriend*.png --logdir log/gfriend-trained

Deep Neural Network 70 Conclusion

▸ You have learned: ▹ How to use Tensorflow to build a neural network graph ▹ How to use a Convolutional Neural Network to classify images when under-fitting occurs ▹ How to augment data to generate more training examples when over-fitting occurs ▸ Note: If you would like to use main.py to train your own classifier with the GPU version of Tensorflow, you should put the with tf.variable_scope('inputs') as scope: block within a with tf.device('/cpu:0'): block Deep Neural Network 71 Conclusion ▸ Now go collect your own data and train your own classifier! ▸ You can find today's code here: ▹ https://github.com/hkoi/neural-network ▸ If you want to learn more about other machine learning algorithms and backpropagation in neural networks: ▹ http://www.ml-class.org/

Deep Neural Network 72