Modularity in Artificial Neural Networks
University of Nottingham PhD Thesis

Modularity in Artificial Neural Networks

Student: Mohammed Amer
Supervisor: Dr. Tomás Maul
Co-Supervisor: Dr. Iman Yi Liao

Faculty of Science and Engineering
School of Computer Science
June, 2021

Acknowledgements

I am indebted to Dr. Tomás for his endless support on the professional and the personal levels during the long PhD journey. I was really lucky to have the opportunity to be his student and to work under his supervision. I would also like to thank my co-supervisor, Dr. Iman, for her valuable advice and feedback during my PhD.

I wish to show my gratitude to all my colleagues, especially Moataz, Ijaz, Khoa and Tuong. It was really a privilege to be around you.

No words can really express how indebted I am to my father, mother and brother for always being there for me. Their endless caring is the reason I am writing these words right now. I cannot express how blessed I am to have Asmaa, my wife, beside me during the rough and the good times. Without her support, I could not have made it this far. Special mention to my little daughters, Malika and Farida, who were my shining stars during this journey.

I dedicate this to Hamid Abdullatif, my grandfather.

Publications and Grants

• Amer, M., Maul, T. A review of modularization techniques in artificial neural networks. Artif Intell Rev 52, 527–561 (2019). https://doi.org/10.1007/s10462-019-09706-7
• Amer, M., Maul, T. Path Capsule Networks. Neural Process Lett 52, 545–559 (2020). https://doi.org/10.1007/s11063-020-10273-0
• Amer, M., Maul, T. Weight Map Layer for Noise and Adversarial Attack Robustness. arXiv:1905.00568 (2019). [Submitted]
• Four quarters, each of 200k CPU hours, at HPC Midlands+ (Athena), funded by the EPSRC under grant EP/P020232/1 as part of the HPC Midlands+ consortium.

Abstract

Artificial neural networks are deep machine learning models that excel at complex artificial intelligence tasks by abstracting concepts through multiple layers of feature extraction. Modular neural networks are artificial neural networks that are composed of multiple subnetworks called modules. The study of modularity has a long history in the field of artificial neural networks, and many of the actively studied models in the domain have modular aspects. In this work, we aim to formalize the study of modularity in artificial neural networks and outline how modularity can be used to enhance some neural network performance measures. We conduct an extensive review of the current practices of modularity in the literature. Based on that, we build a framework that captures the essential properties characterizing the modularization process. Using this modularization framework as an anchor, we investigate the use of modularity to solve three different problems in artificial neural networks: balancing latency and accuracy, reducing model complexity, and increasing robustness to noise and adversarial attacks.

Artificial neural networks are high-capacity models with high data and computational demands. This represents a serious problem for using these models in environments with limited computational resources. Using a differentiable architecture search technique, we guide the modularization of a fully-connected network into a modular multi-path network. By evaluating sampled architectures, we can establish a relation between latency and accuracy that can be used to meet a required soft balance between these conflicting measures.
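As a minimal illustration of the multi-path relaxation just described (a sketch only, not the procedure developed in Chapter 3), consider a fully-connected block split into parallel paths mixed by learnable architecture weights; low-weight paths could later be pruned to trade accuracy for latency. All class names, layer sizes and the mixing scheme below are hypothetical.

import torch
import torch.nn as nn

class MultiPathBlock(nn.Module):
    def __init__(self, in_dim, out_dim, n_paths=4, width=64):
        super().__init__()
        # Parallel fully-connected paths replacing one monolithic layer.
        self.paths = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, width), nn.ReLU(),
                          nn.Linear(width, out_dim))
            for _ in range(n_paths)
        )
        # One architecture weight per path, relaxed to a softmax mixture
        # in the spirit of differentiable architecture search.
        self.alpha = nn.Parameter(torch.zeros(n_paths))

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=0)
        # Weighted sum of path outputs; after search, paths with small
        # weights can be dropped to reduce latency.
        return sum(wi * p(x) for wi, p in zip(w, self.paths))

net = nn.Sequential(MultiPathBlock(784, 256), nn.ReLU(), nn.Linear(256, 10))
logits = net(torch.randn(32, 784))  # e.g. a flattened MNIST batch
print(logits.shape)                 # torch.Size([32, 10])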
A related problem is reducing the complexity of neural network models while minimizing accuracy loss. CapsNet is a neural network architecture that builds on the ideas of convolutional neural networks. However, the original architecture is shallow and has wide layers that contribute significantly to its complexity. By replacing the early wide layers with parallel deep independent paths, we can significantly reduce the complexity of the model. Combining this modular architecture with max-pooling, DropCircuit regularization and a modified variant of the routing algorithm, we can achieve lower model latency with the same or better accuracy compared to the baseline.

The last problem we address is the sensitivity of neural network models to random noise and to adversarial attacks, a highly disruptive form of engineered noise. Convolutional layers are the basis of state-of-the-art computer vision models and, much like other neural network layers, they suffer from sensitivity to noise and adversarial attacks. We introduce the weight map layer, a modular layer based on the convolutional layer, that can increase model robustness to noise and adversarial attacks. We conclude our work with a general discussion of the investigated relation between modularity and the addressed problems, and of potential future research directions.
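A schematic sketch of the parallel-paths idea described above for CapsNet: several deep, independent convolutional paths each emit one capsule vector, and DropCircuit-style regularization silences whole paths during training. The layer sizes and names are hypothetical, and the routing step is omitted; this is not the architecture of Chapter 4.

import torch
import torch.nn as nn

class ParallelPaths(nn.Module):
    def __init__(self, n_paths=8, caps_dim=16, p_drop=0.2):
        super().__init__()
        self.caps_dim = caps_dim
        self.p_drop = p_drop
        # Each path is a deep, independent conv stack replacing one
        # wide early layer.
        self.paths = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(16, caps_dim),
            )
            for _ in range(n_paths)
        )

    def forward(self, x):
        outs = []
        for path in self.paths:
            if self.training and torch.rand(1).item() < self.p_drop:
                # DropCircuit-style: drop an entire path so the
                # remaining paths learn independently useful features.
                outs.append(x.new_zeros(x.size(0), self.caps_dim))
            else:
                outs.append(path(x))
        # One capsule vector per path: (batch, n_paths, caps_dim),
        # which a routing algorithm would then consume.
        return torch.stack(outs, dim=1)

caps = ParallelPaths()(torch.randn(4, 1, 28, 28))  # MNIST-sized input
print(caps.shape)  # torch.Size([4, 8, 16])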
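One plausible reading of the weight map layer, sketched below purely as an assumption from the description above (not the formulation of Chapter 5): a learned elementwise weight map scales each input location before a standard convolution aggregates the result, so localized noise can be damped per pixel.

import torch
import torch.nn as nn

class WeightMapLayer(nn.Module):
    def __init__(self, in_ch, out_ch, height, width):
        super().__init__()
        # One learnable weight per input location (elementwise map).
        self.wmap = nn.Parameter(torch.ones(in_ch, height, width))
        self.agg = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        # Elementwise scaling modulates each location independently
        # before the convolution aggregates across the receptive field.
        return self.agg(x * self.wmap)

y = WeightMapLayer(3, 8, 32, 32)(torch.randn(2, 3, 32, 32))
print(y.shape)  # torch.Size([2, 8, 32, 32])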
Contents

1 Introduction
  1.1 An Overview of Deep Learning
  1.2 Basic Topologies of ANNs
  1.3 Learning Types
  1.4 Application Domains
  1.5 Challenges to Deep Learning
    1.5.1 Data Greediness
    1.5.2 Overfitting and Underfitting
    1.5.3 Complexity
    1.5.4 Adversarial Attacks
    1.5.5 Catastrophic Forgetting
  1.6 Biological Origins of Modularity
  1.7 Modularity in ANNs
  1.8 Why Modularity?
  1.9 Challenges to Modularity
  1.10 Our Research
    1.10.1 Review of Modularity in ANNs
    1.10.2 Balancing Latency and Accuracy
    1.10.3 Reducing Complexity and Maintaining Accuracy
    1.10.4 Reducing Sensitivity to Noise and Adversarial Attacks
  1.11 Experimental Design and Implementation
    1.11.1 MNIST
    1.11.2 CIFAR10
    1.11.3 iWildCam2019
  1.12 Thesis Guide

2 A Review of Modularization Techniques in Artificial Neural Networks
  2.1 Preface
  2.2 Introduction
  2.3 Modularity
  2.4 Modularization Techniques
    2.4.1 Domain
      2.4.1.1 Manual
      2.4.1.2 Learned
    2.4.2 Topology
      2.4.2.1 Highly-Clustered Non-Regular (HCNR)
      2.4.2.2 Repeated Block
      2.4.2.3 Multi-Architectural
    2.4.3 Formation
      2.4.3.1 Manual
      2.4.3.2 Evolutionary
      2.4.3.3 Learned
    2.4.4 Integration
      2.4.4.1 Arithmetic-Logic
      2.4.4.2 Learned
  2.5 Case Studies
  2.6 Conclusion

3 Balancing Accuracy and Latency in Multipath Neural Networks
  3.1 Preface
  3.2 Introduction
  3.3 Multipath Neural Networks
  3.4 Neural Network Compression
  3.5 Neural Architecture Search
  3.6 Methodology
  3.7 Experiments
    3.7.1 Results
  3.8 Discussion
  3.9 Conclusion
  3.10 Chapter Acknowledgements

4 Path Capsule Networks
  4.1 Preface
  4.2 Introduction
  4.3 Capsule Network
  4.4 Multipath Architectures
  4.5 Methods
  4.6 Results
    4.6.1 PathCapsNet Architecture
    4.6.2 MNIST
    4.6.3 CIFAR10
    4.6.4 iWildCam2019
    4.6.5 RSA Analysis
  4.7 Discussion
  4.8 Conclusion
  4.9 Chapter Acknowledgements

5 Weight Map Layer for Noise and Adversarial Attack Robustness
  5.1 Preface
  5.2 Introduction
  5.3 Adversarial Attack
  5.4 Methods
  5.5 Experiments
    5.5.1 Results
  5.6 Discussion
  5.7 Conclusion
  5.8 Chapter Acknowledgements

6 Discussion and Conclusion

A Acronyms

References

Chapter 1

Introduction

1.1 An Overview of Deep Learning

Machine Learning (ML) is a branch of Artificial Intelligence (AI) that approaches complex tasks, mainly those that do not lend themselves to classical types of well-defined algorithms, by tuning mathematical models using a corpus of data in a process known as learning. Learning is a process that aims at increasing the performance of some mathematical model on a given task by exposing it to experience, that is, a suitable form of information or data. While classical ML approaches revolutionized