A Framework for Vector-Weighted Deep Neural Networks
UNLV Theses, Dissertations, Professional Papers, and Capstones
5-1-2020

Repository Citation
Chiu, Carter, "A Framework for Vector-Weighted Deep Neural Networks" (2020). UNLV Theses, Dissertations, Professional Papers, and Capstones. 3876.
http://dx.doi.org/10.34917/19412042

A FRAMEWORK FOR VECTOR-WEIGHTED DEEP NEURAL NETWORKS

By

Carter Chiu

Bachelor of Science – Computer Science
University of Nevada, Las Vegas
2016

A dissertation submitted in partial fulfillment of the requirements for the

Doctor of Philosophy – Computer Science

Department of Computer Science
Howard R. Hughes College of Engineering
The Graduate College

University of Nevada, Las Vegas
May 2020

© Carter Chiu, 2020
All Rights Reserved

Dissertation Approval

The Graduate College
The University of Nevada, Las Vegas

April 16, 2020

This dissertation prepared by Carter Chiu entitled A Framework for Vector-Weighted Deep Neural Networks is approved in partial fulfillment of the requirements for the degree of Doctor of Philosophy – Computer Science, Department of Computer Science.

Justin Zhan, Ph.D., Examination Committee Chair
Kathryn Hausbeck Korgan, Ph.D., Graduate College Dean
Kazem Taghva, Ph.D., Examination Committee Member
Laxmi Gewali, Ph.D., Examination Committee Member
Yoohwan Kim, Ph.D., Examination Committee Member
Ge Kan, Ph.D., Graduate College Faculty Representative

Abstract

The vast majority of advances in deep neural network research operate on the basis of a real-valued weight space. Recent work in alternative spaces has challenged and complemented this idea; for instance, the use of complex- or binary-valued weights has yielded promising and fascinating results. We propose a framework for a novel weight space consisting of vector values, which we christen VectorNet. We first develop the theoretical foundations of our proposed approach, including formalizing the requisite theory for forward and backpropagating values in a vector-weighted layer. We also introduce the concept of expansion and aggregation functions for conversion between real and vector values. These contributions enable the seamless integration of vector-weighted layers with conventional layers, resulting in network architectures exhibiting height in addition to width and depth, and consequently models which we might be inclined to call tall learning.

As a means of evaluating its effect on model performance, we apply our framework on top of three neural network architectural families – the multilayer perceptron (MLP), the convolutional neural network (CNN), and the directed acyclic graph neural network (DAG-NN) – trained over multiple classic machine learning and image classification benchmarks. We also consider evolutionary algorithms for performing neural architecture search over the new hyperparameters introduced by our framework. Lastly, we solidify the case for the utility of our contributions by implementing our approach on real-world data in the domains of mental illness diagnosis and static malware detection, achieving state-of-the-art results in both. Our implementations are made publicly available to drive further investigation into the exciting potential of VectorNet.
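Ahead of the formal development in Chapter 3, the following minimal NumPy sketch may help fix intuitions about the ideas summarized above. It is illustrative only: the names (expand, aggregate, VectorDense, h), the mean as the aggregation function, and the ReLU activation are assumptions of this sketch, not the dissertation's implementation.

    import numpy as np

    def expand(x, h):
        # Expansion: replicate a real-valued batch of shape (n, d) across h
        # channels, yielding vector values of shape (h, n, d), so that
        # conventional layers can feed vector-weighted ones.
        return np.broadcast_to(x, (h,) + x.shape)

    def aggregate(v):
        # Aggregation: collapse vector values (h, n, d) back to real values
        # (n, d). The mean is one simple choice among several.
        return v.mean(axis=0)

    class VectorDense:
        # A dense layer whose every scalar weight is a length-h vector;
        # equivalently, h real-valued layers of identical shape stacked in
        # "height" alongside the usual width and depth.
        def __init__(self, d_in, d_out, h, seed=0):
            rng = np.random.default_rng(seed)
            self.W = rng.standard_normal((h, d_in, d_out)) * np.sqrt(2.0 / d_in)
            self.b = np.zeros((h, 1, d_out))

        def forward(self, v):
            # v: (h, n, d_in) -> (h, n, d_out); each height slice propagates
            # independently, exactly as a conventional dense layer would.
            return np.maximum(0.0, np.einsum('hni,hio->hno', v, self.W) + self.b)

    # Toy forward pass: real input -> expansion -> vector layer -> aggregation.
    x = np.ones((4, 3))                        # batch of 4 samples, 3 features
    layer = VectorDense(3, 5, h=2)
    y = aggregate(layer.forward(expand(x, 2))) # real-valued output, shape (4, 5)

Under these assumptions, a height of h = 1 recovers an ordinary real-valued layer, which is why vector-weighted and conventional layers compose seamlessly.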
Acknowledgments

I am deeply grateful for the countless people in my life who gave me strength, drive, and courage as I embarked on this journey of a dissertation:

For my advisor, Dr. Justin Zhan, who opened the door for me to the world of research in computer science, provided endless opportunities to grow as a scholar, and whose encouragement and guidance were instrumental in the production of this work;

For the members of my committee, Dr. Kazem Taghva, Dr. Laxmi Gewali, Dr. Yoohwan Kim, and Dr. Ge Kan, for their input and continued support throughout my years at the University of Nevada, Las Vegas;

For Elliott, Shen, Henry, Matt, Michael, Aeren, and everyone in the Big Data Hub family, with whom I had many fruitful conversations and who were an unfailing source of inspiration and motivation;

For my parents and family, for helping me find my path to where – and who – I am today, and supporting me every step of the way;

And for JJ, for all her love and support, for believing in me, and for brightening every day. Thank you.

Carter Chiu
University of Nevada, Las Vegas
May 2020

Table of Contents

Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
List of Algorithms

Chapter 1  Introduction
  1.1 Contributions
  1.2 Outline

Chapter 2  Literature Review
  2.1 Network Architectures: Origins and Advancements
  2.2 Developments in Hyperparameters
  2.3 Weight Spaces
  2.4 Ensemble Learning
  2.5 Architecture and Hyperparameter Search
    2.5.1 Search Spaces
    2.5.2 Search Strategies
    2.5.3 Performance Estimation Strategies

Chapter 3  Theory of Vector-Weighted Layers
  3.1 Definitions
  3.2 Forward Propagation
  3.3 Backpropagation
  3.4 Weight Space Conversion
    3.4.1 Expansion
    3.4.2 Aggregation
  3.5 Comparison with Ensemble Approaches

Chapter 4  VectorNet Architectures
  4.1 Benchmark Datasets
    4.1.1 Machine Learning Benchmarks
    4.1.2 Computer Vision Benchmarks
  4.2 Vector-Weighted Multilayer Perceptrons
    4.2.1 Implementation
    4.2.2 Datasets
    4.2.3 Settings
    4.2.4 Results
      Effects of Hyperparameter Selection
  4.3 Vector-Weighted Convolutional Neural Networks
    4.3.1 Implementation
    4.3.2 Datasets
    4.3.3 Settings
    4.3.4 Results
      Effects of Hyperparameter Selection
  4.4 Vector-Weighted Directed Acyclic Graph Neural Networks
    4.4.1 Definition
      Network Representation
      Training Procedure Via Evolutionary Algorithm
      Comparison with Multilayer Perceptrons
    4.4.2 Theory of Vectorized DAG-NNs
      Forward Propagation
      Backpropagation
    4.4.3 Implementation
    4.4.4 Datasets
    4.4.5 Settings
    4.4.6 Results
      Effects of Hyperparameter Selection

Chapter 5  Neural Architecture Search Over VectorNet Hyperparameters
  5.1 VectorNet Hyperparameters
  5.2 Evolutionary Algorithm
    5.2.1 Phenotype and Genotype
    5.2.2 Initialization
    5.2.3 Evaluation of Fitness
    5.2.4 Selection
    5.2.5 Crossover and Mutation
    5.2.6 Termination
  5.3 Implementation
  5.4 Experiments
    5.4.1 Preliminary Evaluation
      Settings
      Results
    5.4.2 Main Experiments
      Settings
      Results

Chapter 6  Case Studies
  6.1 Diagnosis of Schizophrenia
    6.1.1 Methods
      Data Sources
      Preprocessing and Feature Selection
      Models and Experimental Settings
    6.1.2 Results
  6.2 Static Malware Detection
    6.2.1 Methods
      Data, Feature Engineering, and Preprocessing
      Model and Experimental Settings
    6.2.2 Results

Chapter 7  Conclusions and Future Work

Appendix A  Copyright Acknowledgments
Appendix B  Source Code
Appendix C  Comparison of DAG-NNs and Multilayer Perceptrons
  C.1 Datasets
  C.2 Settings
  C.3 Results
    C.3.1 Comparison with Larger Networks
    C.3.2 Runtime Analysis

Bibliography
Curriculum Vitae

List of Tables

4.1 Benchmark dataset characteristics
4.2 Accuracy (%) on the first multilayer perceptron
4.3 Accuracy (%) on the second multilayer perceptron
4.4 Accuracy (%) on the third multilayer perceptron
4.5 Effect of VectorNet hyperparameters on reduction in error relative to baseline multilayer perceptron (%)
4.6 Classification accuracy (%) on the first convolutional neural network
4.7 Classification accuracy (%) on the second convolutional neural network
4.8 Effect of VectorNet hyperparameters on reduction in error relative to baseline convolutional neural network (%)
4.9 Symbols describing a DAG-NN
4.10 Accuracy (%) on the directed acyclic graph.