Deep Neural Network
Tony Wong
2017-02-18

Outline
▸ Introduction to Neural Network
▸ Image classifiers using Tensorflow (version 1.0)
▹ MNIST
▹ IOI Art Class
▹ GFRIEND dataset

Machine Learning
▸ Supervised Learning:
▹ Learn from examples to find patterns that explain the relationship between inputs and outputs
▸ http://playground.tensorflow.org
▸ Can you name some other applications that make use of machine learning?

Steps to Perform Machine Learning
▸ Collection: Define the data structure and collect data
▸ Design: Select a suitable algorithm
▸ Implementation: Build the computation graph by writing code
▸ Preprocessing: Split the data into training and validation sets
▸ Training: Train the model using the training examples
▸ Validation: How well does the model perform?
▸ Serving: Use the model to predict on new data

Linear Regression
▸ When there are 0 hidden layers
▸ Find parameters θ such that h(x) = θ1x1 + θ2x2

Neural Network
▸ A neural network can help us find non-linear patterns in the data
▸ Example: y = (x1 < 0) XOR (x2 < 0)

Neuron
▸ A neuron computes a weighted sum of its inputs plus a bias, then applies an activation function:
▹ c = sum{xi * wi} + b
▹ y = f(c)
▸ Using a non-linear activation function introduces non-linearity, which helps the neural network learn:
▹ ReLU (Rectified Linear Unit): y = max(0, c)
▹ sigmoid: y = 1 / (1 + e^(-c))
▹ tanh: y = 2 / (1 + e^(-2c)) - 1
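To make the formulas above concrete, here is a small plain-Python sketch of a single neuron (not taken from the slides; the function and variable names are invented for this illustration):

```python
import math

def relu(c):
    return max(0.0, c)

def sigmoid(c):
    return 1.0 / (1.0 + math.exp(-c))

def tanh(c):
    return 2.0 / (1.0 + math.exp(-2.0 * c)) - 1.0

def neuron(inputs, weights, bias, activation=relu):
    # c = sum{xi * wi} + b
    c = sum(x * w for x, w in zip(inputs, weights)) + bias
    # y = f(c)
    return activation(c)

# Example: two inputs x1, x2 with their weights and a bias
print(neuron([1.0, -2.0], [0.5, 0.3], 0.1))           # ReLU output
print(neuron([1.0, -2.0], [0.5, 0.3], 0.1, sigmoid))  # sigmoid output
```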
Loss
▸ The loss measures the error rate
▸ Machine learning is about minimizing the loss by adjusting the parameters (weights), e.g. by gradient descent on the loss curve

Loss != Accuracy
▸ Loss decreases when
▹ Accuracy increases
▹ Correct predictions are made with higher confidence
▹ Incorrect predictions are made with lower confidence
▸ Which model is better?

              Model 1   Model 2
  Q1   T        20%       90%
       F        80%       10%
  Q2   T        52%       49%
       F        48%       51%
  Q3   T        47%       52%
       F        53%       48%
  Accuracy     66.7%     33.3%

Random initialization
▸ The weights are randomly initialized so that each neuron can detect different patterns
▸ Different initializations can arrive at different local minima

Training and Validation (Testing) Loss
▸ We split the examples into 2 sets: training and validation
▸ For example, if we have 1000 examples
▹ We can use 800 training examples for the network to learn
▹ and 200 validation examples to evaluate the network's ability to explain unseen examples

Underfitting and Overfitting
▸ x and y features, 1 hidden layer with 2 nodes: underfit (high training and validation losses)
▸ x and y features, 1 hidden layer with 4 nodes: just right (low training and validation losses)
▸ All input features, 3 hidden layers (8-6-4): overfit (gap between validation and training loss)

Underfitting and Overfitting
▸ To fix underfitting
▹ More input features
▹ More hidden layers
▸ To fix overfitting
▹ More training examples
▹ Use a simpler model
▹ Add regularization

MNIST Database
▸ Mixed National Institute of Standards and Technology
▸ Multiclass classification of hand-written digits
▸ Each example in MNIST consists of a 28x28 image and the corresponding label (0-9)
▸ If we express the label using one-hot encoding, it is a 1D tensor of shape [10]
▹ [0,0,0,0,0,0,1,0,0,0] for digit 6
(CC-BY 3.0: https://www.tensorflow.org/versions/master/tutorials/mnist/beginners/)

MNIST Database
▸ The database contains 50000 training examples (for learning) and 10000 validation examples (for evaluation)
▸ Can you guess the validation accuracy achieved by the best model? (random guessing = 10%)
▸ What about the average human?
▹ <80% / 80-90% / 90-95% / 95-97% / 97-98% / 98-99% / 99-99.5% / >99.5%

Tensorflow
▸ The flow of data is a directed acyclic graph
▸ Tensors (multidimensional matrices) flow through the edges
▸ Nodes are operators (e.g. load data, addition, matrix multiplication, max)
▸ Write Python/C++ code to describe the graph
▸ Tensorflow manages the execution (e.g. distributes computation to different CPUs / GPUs)
▸ pip install --upgrade tensorflow

Tensor
▸ 1D example:
▹ height  weight  age  loc_nt  loc_kln  loc_hk
▹ 158     50      12   0       1        0
▹ (one-hot encoding for location)
▸ 2D example: grayscale images
▹ 28(h) x 28(w)
▸ 3D example: RGB images
▹ 32(h) x 32(w) x 3(channels)
(CC-BY 3.0: https://www.tensorflow.org/versions/master/tutorials/mnist/beginners/)

Tensor
▸ When we stack 55000 28x28 grayscale images together
▹ We have a tensor of shape [55000, 28, 28, 1]
▹ We can also reshape the tensor to [55000, 784]
(CC-BY 3.0: https://www.tensorflow.org/versions/master/tutorials/mnist/beginners/)

Windows Remote Desktop Connection
▸ A virtual machine has been set up with Tensorflow installed
▸ Windows -> type "Remote"
▸ Hostname: hkoi11530.cloudapp.net

File Templates and Datasets
▸ Some templates and datasets have been prepared for you
▸ Go to the Public folder and copy tensorflow, tfpreprocess and mystery.7z to your user's Documents

Notes
▸ In the next steps we are only defining the graph
▹ Nothing is computed yet
▸ In Python, indentation defines the scope
▹ Each line in a scope must be indented in the same way
▹ Mixing tabs with spaces results in an error
▹ Since Tensorflow is developed by Google, which uses 4 spaces, you should use 4 spaces as well

Creating a Neural Network
▸ Let's build a network with 1 hidden layer
▹ Activation function: ReLU
▹ ci = sum{aj * wij} + bi, fi = max(0, ci)
▸ Input layer: 28x28 nodes (1 for each pixel), plus bias
▸ Hidden layer: ?? nodes (1 for each kernel), plus bias
▸ Output layer: 10 nodes (1 for each digit)

Creating a Neural Network
▸ Create a file simple.py
▸ The inference function takes an input tensor images of shape [-1, 28, 28, 1] (-1 = unknown number of images) and returns y of shape [-1, 10]
▸ Parameters: image_size, channels, num_classes

Architecture of a Layer
▸ A layer maps an input_tensor of shape [-1, input_size] to activations of shape [-1, output_size]
▹ weights: shape [input_size, output_size]
▹ biases: shape [output_size]
▹ local activations: matmul(input_tensor, weights) + biases, followed by ReLU
▹ returns a tensor of shape [-1, output_size]
▸ Remember the 4-space indentation
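The slides show this layer only as a code screenshot. A minimal TensorFlow 1.x sketch matching the description above might look like the following; the function name layer and the truncated-normal initialisation are assumptions for illustration, not necessarily the template's exact code:

```python
import tensorflow as tf

def layer(input_tensor, input_size, output_size):
    # weights: [input_size, output_size], biases: [output_size]
    weights = tf.Variable(tf.truncated_normal([input_size, output_size], stddev=0.1))
    biases = tf.Variable(tf.zeros([output_size]))
    # matmul + bias, then ReLU; the result has shape [-1, output_size]
    return tf.nn.relu(tf.matmul(input_tensor, weights) + biases)
```

The stacked network described in the next slide would then be built roughly as reshaped = tf.reshape(images, [-1, 784]), hidden = layer(reshaped, 784, 48), output = layer(hidden, 48, 10).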
Stacking the Layers
▸ image_size = 28, channels = 1, HIDDEN_NODES = 48, num_classes = 10
▸ images [-1, 28, 28, 1] -> reshape -> reshaped [-1, 784] -> layer(784, 48) -> hidden [-1, 48] -> layer(48, 10) -> output [-1, 10]

Interpreting the Output
▸ We predict the class having the highest value in the output of inference
▸ main.py (written for you already)

Probabilities
▸ Probabilities should add up to 1
▸ We apply softmax to the output: P(y_ = i) = e^y[i] / sum{e^y[j]}
▸ Example: the digit has a 37% chance to be "2" and is unlikely to be an "8"

Loss function
▸ Compute the loss for the optimizer to train our model
▹ Shape [-1]: rows containing the cross entropy loss of each example
▹ Shape [1]: average of the cross entropy losses

Cross Entropy
▸ For classification, one popular loss function is the cross entropy loss
▸ It takes into account the probabilities instead of just the accuracy
▸ Example: cross entropy loss = -log(0.375) ≈ 0.98 (lower is better)

Optimizer
▸ The optimizer traverses the graph in reverse topological order to compute gradients at each node using the chain rule: ∂z/∂x = (∂z/∂y) · (∂y/∂x)
▸ Then the optimizer adjusts the weight variables to lower the loss
▸ Different optimizers adjust the weights differently
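The loss and training code is likewise shown only as screenshots in the deck. A hedged TensorFlow 1.x sketch of these two pieces could be as follows; the names labels and logits and the 0.01 learning rate are assumed for this example:

```python
import tensorflow as tf

def loss_fn(logits, labels):
    # Cross entropy loss of each example, shape [-1]
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
    # Average over the batch
    return tf.reduce_mean(cross_entropy)

def training(loss, learning_rate=0.01):
    # The optimizer back-propagates gradients and adjusts the weights to lower the loss
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    return optimizer.minimize(loss)
```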
Notes
▸ These parts have been written for you
▹ Data loading (it has to be efficient because data loading is usually the bottleneck for small networks)
▹ Progress logging (accuracy and loss)
▹ Saving the model so that it can be restored for more training or for serving

Training
▸ Open a command prompt in the working directory
▸ python main.py --logdir log/mnist1
▸ What's the accuracy at step 1000? (training / validation)

Visualizing the Training Progress
▸ Tensorflow comes with Tensorboard, a visualization tool
▸ Open another command prompt in the working directory
▸ tensorboard --logdir log --reload_interval 15 --port PORT
▸ Open the browser and enter the URL: http://localhost:PORT
▸ Stop the training (CTRL+C) when the validation loss decreases very slowly

Confusion Matrix
▸ The confusion matrix helps us understand how well the model is performing, and which classes are particularly difficult

Stopping, Resuming and Resetting Training
▸ Training will resume provided that you specify the same logdir
▹ The graph must be the same
▸ If you would like to reset and restart training using the same logdir, you must:
▹ Stop tensorboard (CTRL+C), otherwise you cannot delete the logdir
▹ Delete the logdir (e.g. mnist1 in log)
▹ Start training
▹ Start tensorboard again

Visualizing the Hidden Layer
▸ By visualizing the nodes in the hidden layer, we can get a sense of what patterns the neural network is trying to recognize
▸ Beware of indentation!

Visualizing the Hidden Layer
▸ python main.py --logdir log/mnist2
▸ In Tensorboard, go to the Images tab
▸ Here you can see the weights of the 48 nodes in the hidden layer

Adding Regularization
▸ We add an L2 loss
▹ The squared weights will be added to the loss function
▹ The regularization parameter controls the strength of regularization (0 = no regularization)
▸ python main.py --logdir log/mnist3 --regularization 0.005
▸ What's the accuracy at step 1000? (training / validation)
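The actual regularization code lives in the provided template; a plausible TensorFlow 1.x sketch of adding an L2 penalty controlled by the --regularization value is shown below (the helper name and the way the weight tensors are collected are assumptions for illustration):

```python
import tensorflow as tf

def add_l2_regularization(loss, weights_list, regularization):
    # tf.nn.l2_loss computes sum(w**2) / 2 for each weight tensor
    l2 = tf.add_n([tf.nn.l2_loss(w) for w in weights_list])
    # regularization = 0 leaves the loss unchanged (no regularization)
    return loss + regularization * l2
```

With regularization = 0 this reduces to the original loss, matching the note above that 0 means no regularization.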