Unsupervised Image Feature Learning for Convolutional Neural Networks

A thesis submitted to The University of Manchester for the degree of Doctor of Philosophy in the Faculty of Science and Engineering, 2019.

By Richard Hankins
Department of Electrical and Electronic Engineering, School of Engineering

Contents

Abstract
Declaration
Copyright
Acknowledgements
List of Publications

1 Introduction
  1.1 Background
  1.2 Aims and Objectives
  1.3 Scope
  1.4 Structure

2 Classical Methodologies
  2.1 Feature Representations and Learning
    2.1.1 Hand-crafted Feature Representations
    2.1.2 Unsupervised Feature Learning
  2.2 Classifiers
    2.2.1 Logistic Regression
    2.2.2 Support Vector Machines
    2.2.3 k-nearest Neighbours and Decision Trees
  2.3 Datasets
    2.3.1 Image Datasets
    2.3.2 Video Datasets

3 Deep Neural Networks
  3.1 Introduction
    3.1.1 Feedforward Networks
  3.2 Related Work
  3.3 Convolutional Neural Networks
    3.3.1 Layers and Architectures
    3.3.2 Other Layers
    3.3.3 Optimisation and the Backpropagation Algorithm
    3.3.4 Pre-processing
    3.3.5 Issues
  3.4 2D Convolutional Neural Networks Case Studies
    3.4.1 Image Classification on the MNIST Dataset
    3.4.2 Action Classification on the Weizmann Dataset
  3.5 3D Convolutional Neural Networks Case Study
    3.5.1 Action Classification on the UCF Sports Dataset

4 Self-Organising Map Network
  4.1 Introduction
  4.2 Related Work
  4.3 Methodology
    4.3.1 Convolutional Self-Organising Map
    4.3.2 Discrete Cosine Transform (DCT)
    4.3.3 Markov Random Field
    4.3.4 Self-Organising Map Network (SOMNet)
    4.3.5 Markov Random Field Self-Organising Map Network (MRF-SOMNet)
    4.3.6 Computational Complexity
  4.4 Experiments and Discussion
    4.4.1 Comparison of Different Features and Encodings
    4.4.2 Evaluation on the MNIST Dataset
    4.4.3 Optimising Parameters on the CIFAR-10 Dataset
    4.4.4 Evaluation on the CIFAR-10 Dataset
  4.5 Conclusions and Future Work

5 SOMNet with Aggregated Channel Connections
  5.1 Introduction
  5.2 Related Work
  5.3 Methodology
    5.3.1 Proposed Method
  5.4 Experiment and Discussion
    5.4.1 Whitening
    5.4.2 Digit Classification on the MNIST Dataset
    5.4.3 Object Classification on the CIFAR-10 Dataset
  5.5 Conclusion and Future Work

6 Filter Replacement in Convolutional Networks using Self-Organising Maps
  6.1 Introduction
  6.2 Related Work
  6.3 Methodology
    6.3.1 Proposed Method
    6.3.2 Self-Organising Maps
    6.3.3 Convolutional Neural Networks
    6.3.4 Filter Replacement with Self-Organising Maps
  6.4 Object Classification Experiments and Discussion on the CIFAR-10 and CIFAR-100 Datasets
    6.4.1 Convolutional Neural Networks
    6.4.2 Filter Replacement with Self-Organising Maps
  6.5 Action Classification Experiments and Discussion on the UCF-50 Dataset
    6.5.1 Convolutional Neural Networks
    6.5.2 Filter Replacement with Self-Organising Maps
  6.6 Conclusion and Future Work

7 Conclusions and Future Work
  7.1 Conclusions
  7.2 Future Work
    7.2.1 SOMNet
    7.2.2 Supervised Channel Pooling
    7.2.3 Combining Supervised and Unsupervised Learning
    7.2.4 Temporal Models

Bibliography

Word Count: 40269

List of Tables

3.1 Validation and test error as well as intra-class error using different subjects as the test set on Weizmann
3.2 Absolute misclassification for each class using different subjects as the test set on Weizmann
3.3 3D CNN architecture for UCF Sports
3.4 3D baseline: accuracy on UCF Sports
3.5 3D bounding box: accuracy on UCF Sports
3.6 Accuracy on UCF Sports
4.1 Computational Complexity
4.2 Comparing features and encodings
4.3 Error rate on MNIST
4.4 Variations in block size and overlapping ratio of SOMNet on CIFAR-10
4.5 Variations in feature numbers of SOMNet on CIFAR-10
4.6 Accuracy on CIFAR-10
5.1 FAC layer: error rate on MNIST (p-values represent the comparison of each method with the method labelled not applicable (N/A) for each section)
5.2 SAC layer: error rate on MNIST (p-values represent the comparison of each method with the method labelled not applicable (N/A) for each section)
5.3 GAC layer: error rate on MNIST (p-values represent the comparison of each method with the method labelled not applicable (N/A) for each section)
5.4 Error rate on MNIST (p-values represent the comparison of each method with the method labelled not applicable (N/A) for each section)
5.5 FAC layer: accuracy on CIFAR-10 (p-values represent the comparison of each method with the method labelled not applicable (N/A))
5.6 SAC layer: accuracy on CIFAR-10 (p-values represent the comparison of each method with the method labelled not applicable (N/A))
5.7 GAC layer: accuracy on CIFAR-10 (p-values represent the comparison of each method with the method labelled not applicable (N/A))
5.8 Accuracy on CIFAR-10 (p-values represent the comparison of each method with the method labelled not applicable (N/A) for each section)
6.1 Baseline 2D CNN architecture for CIFAR-10/CIFAR-100
6.2 Baseline 3D CNN architecture for UCF-50
6.3 Proposed 2D CNN_NIN+SOM architecture for CIFAR-10/100
6.4 Baseline 2D CNN accuracy on CIFAR-10 (p-values represent the comparison of each method with the method labelled not applicable (N/A))
6.5 Baseline 2D CNN accuracy on CIFAR-100 (p-values represent the comparison of each method with the method labelled not applicable (N/A))
6.6 2D CNN_NIN+SOM accuracy on CIFAR-10
6.7 2D CNN_NIN+SOM accuracy on CIFAR-10
6.8 2D baseline CNN vs CNN+SOM subset accuracy on CIFAR-10
6.9 2D baseline CNN vs CNN+SOM_TI subset accuracy on CIFAR-100
6.10 2D CNN_NIN+SOM_TI accuracy on CIFAR-10 (p-values represent the comparison of each method with the method labelled not applicable (N/A) in each column)
6.11 2D CNN_NIN+SOM_TI using 3 × 3 filters accuracy on CIFAR-100 (p-values represent the comparison of each method with the method labelled not applicable (N/A) in each column)
6.12 2D CNN_NIN+SOM_TI using 5 × 5 filters accuracy on CIFAR-100 (p-values represent the comparison of each method with the method labelled not applicable (N/A) in each column)
6.13 2D CNN_NIN+SOM_TI using 7 × 7 filters accuracy on CIFAR-100 (p-values represent the comparison of each method with the method labelled not applicable (N/A) in each column)
6.14 Accuracy on CIFAR-10 and CIFAR-100 (p-values represent the comparison of each method with the method labelled not applicable (N/A) for each section)
6.15 Baseline 3D CNN accuracy on UCF-50 (p-values represent the comparison of each method with the method labelled not applicable (N/A) for each section in each column)
6.16 3D CNN+SOM_TI accuracy on UCF-50
6.17 3D CNN+SOM_TI accuracy on UCF-50 (p-values represent the comparison of each method with the method labelled not applicable (N/A) for each section)
6.18 Accuracy on UCF-50

List of Figures

2.1 Logistic sigmoid function
2.2 A selection of examples from the MNIST dataset
2.3 Frames from example videos from each class of the Weizmann dataset
2.4 Frames from example videos from each class of the UCF Sports dataset
2.5 Frames from example videos from each class of the UCF-50 dataset
2.6 Frames from example videos from each class of the UCF-101 dataset
3.1 Perceptron model (adapted from Rosenblatt 1958 [151])
3.2 Multilayer perceptron with a single hidden layer
3.3 Non-linear sigmoid functions
3.4 Simple cells (adapted from Hubel 1995 [83])
3.5 Complex cells (adapted from Hubel 1995 [83])
3.6 Non-linear activation functions
3.7 LeNet-5 architecture (adapted from LeCun et al. 1998 [112])
3.8 First-layer convolutional filters at different stages of training on MNIST
3.9 Confusion matrix for the MNIST experiment
3.10 Comparison of 2D (a) and 3D (b) convolutions (the temporal depth of the 3D filter is equal to 3); the colours indicate shared weights (adapted from Ji et al. 2013 [89])
3.11 Confusion matrix for the 3D baseline experiment on UCF Sports
3.12 Confusion matrix for the …
