Autogenerative Networks

Oscar Chang

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy under the Executive Committee of the Graduate School of Arts and Sciences

COLUMBIA UNIVERSITY
2021

© 2021 Oscar Chang
All Rights Reserved


Abstract

Autogenerative Networks

Oscar Chang

Artificial intelligence powered by deep neural networks has seen tremendous improvements in the last decade, achieving superhuman performance on a diverse range of tasks. Many worry that it may one day develop the ability to recursively self-improve, leading to an intelligence explosion known as the Singularity. Autogenerative networks, or neural networks that generate neural networks, are one major plausible pathway towards realizing this possibility. The object of this thesis is to study various challenges and applications of small-scale autogenerative networks in domains such as artificial life, reinforcement learning, neural network initialization and optimization, gradient-based meta-learning, and logical networks.

Chapters 2 and 3 describe novel mechanisms for generating neural network weights and embeddings. Chapters 4 and 5 identify problems and propose solutions for optimization difficulties in differentiable mechanisms of neural network generation known as Hypernetworks. Chapters 6 and 7 study implicit models of network generation, such as backpropagating through gradient descent itself and integrating discrete solvers into continuous functions. Together, the chapters in this thesis contribute novel proposals for non-differentiable neural network generation mechanisms, significant improvements to existing differentiable network generation mechanisms, and an assimilation of different learning paradigms in autogenerative networks.


Table of Contents

Acknowledgments
Motivation

Chapter 1: Overview
    1.1 Preliminaries
        1.1.1 Limits of Recursive Computation
        1.1.2 Importance of Good Representations
    1.2 Overview of Common Deep Learning Methods
    1.3 Related Work
        1.3.1 Early Prior Work (Pre-2000s)
        1.3.2 Recent Prior Work
    1.4 Summary of Our Contributions
    1.5 Publications

Chapter 2: Neural Network Quine
    2.1 Introduction
        2.1.1 Motivations
        2.1.2 Related Work
    2.2 Building the Network
        2.2.1 How can a neural network refer to itself?
        2.2.2 Vanilla Quine
        2.2.3 Auxiliary Quine
    2.3 Training the Network
        2.3.1 Network Architecture
        2.3.2 How do we train a neural network quine?
    2.4 Results and Discussion
        2.4.1 Vanilla Quine
        2.4.2 Is this a quine?
        2.4.3 Hill-climbing
        2.4.4 Generational Replication
        2.4.5 Auxiliary Quine
    2.5 Conclusion

Chapter 3: Agent Embeddings
    3.1 Introduction
        3.1.1 Our Contribution
    3.2 Related Work
        3.2.1 Interpretability
        3.2.2 Generative Modeling
        3.2.3 Meta-Learning
        3.2.4 Bayesian Neural Networks
    3.3 Learning Agent Embeddings for Cart-Pole
        3.3.1 Supervised Generation
        3.3.2 Cart-Pole
        3.3.3 CartPoleNet
        3.3.4 CartPoleGen
        3.3.5 Sampling from CartPoleGen
    3.4 Experimental Results and Discussion
        3.4.1 Convergent Learning
        3.4.2 Exploring the Latent Space
        3.4.3 Repairing Missing Weights
    3.5 Limitations of Supervised Generation
        3.5.1 High Sample Complexity
        3.5.2 Subpar Model Performance
        3.5.3 Scaling Issues
    3.6 Potential Applications for AI
    3.7 Conclusion

Chapter 4: Hypernetwork Initialization
    4.1 Introduction
    4.2 Preliminaries
        4.2.1 Ricci Calculus
        4.2.2 Xavier Initialization
        4.2.3 Kaiming Initialization
    4.3 Review of Current Methods
    4.4 Hyperfan Initialization
        4.4.1 Hyperfan-in
        4.4.2 Hyperfan-out
    4.5 Experiments
        4.5.1 Feedforward Networks on MNIST
        4.5.2 Continual Learning on Regression Tasks
        4.5.3 Convolutional Networks on CIFAR-10
        4.5.4 Bayesian Neural Networks on ImageNet
    4.6 Conclusion

Chapter 5: Hypernetwork Optimization
    5.1 Introduction
    5.2 Catalog of Hypergenerative Networks
    5.3 Stability under Hypergeneration
    5.4 Experiments
    5.5 Conclusion

Chapter 6: Gradient-Based Meta-Learning
    6.1 Introduction
    6.2 Review of Gradient-Based Meta-Learning
        6.2.1 MAML
        6.2.2 Meta-SGD
        6.2.3 MAML++
        6.2.4 Regularization Methods
    6.3 Insights from Multi-Task Learning
        6.3.1 Multi-Task Learning Regularizes Meta-Learning
        6.3.2 Meta-Learning Complements Multi-Task Learning
        6.3.3 Applying Multi-Task Learning Asynchronously
    6.4 Gradient Sharing
    6.5 Experimental Results and Discussions
        6.5.1 Acceleration of Meta-Training
        6.5.2 Bigger Inner Loop Learning Rates
        6.5.3 Comparable Meta-Test Performance
        6.5.4 Evolution of m and , through Meta-Training
    6.6 Conclusion

Chapter 7: Logical Networks
    7.1 Introduction
        7.1.1 Our Contribution
    7.2 Background
        7.2.1 SATNet
        7.2.2 Visual Sudoku
    7.3 SATNet Fails at Symbol Grounding
        7.3.1 The Absence of Output Masking
        7.3.2 Visual Sudoku
    7.4 MNIST Mapping Problem
        7.4.1 Configuring SATNet Properly
    7.5 Conclusion
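For readers unfamiliar with the term, the abstract's notion of a hypernetwork (a differentiable mechanism in which one network generates the weights of another) can be made concrete with a minimal sketch. The code below is not taken from the thesis; the class name, dimensions, and learned embedding are assumptions chosen purely for illustration of the general idea.

    # Minimal illustrative sketch of a hypernetwork (not the thesis's implementation).
    # A small hypernetwork maps a learned embedding to the weight matrix and bias
    # of a target linear layer; because the mapping is differentiable, gradients
    # flow through the generated weights and train the hypernetwork end to end.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HyperLinear(nn.Module):
        def __init__(self, in_features, out_features, embed_dim=8):
            super().__init__()
            # Embedding consumed by the hypernetwork (an assumption for this sketch).
            self.z = nn.Parameter(torch.randn(embed_dim))
            # Hypernetwork: produces every weight and bias of the target layer.
            self.hyper = nn.Linear(embed_dim, in_features * out_features + out_features)
            self.in_features, self.out_features = in_features, out_features

        def forward(self, x):
            params = self.hyper(self.z)
            n_w = self.in_features * self.out_features
            weight = params[:n_w].view(self.out_features, self.in_features)
            bias = params[n_w:]
            return F.linear(x, weight, bias)

    # Usage: the generated layer behaves like an ordinary nn.Linear.
    layer = HyperLinear(in_features=4, out_features=2)
    out = layer(torch.randn(3, 4))  # -> shape (3, 2)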