Copyright by Mingzhang Yin 2020 the Dissertation Committee for Mingzhang Yin Certifies That This Is the Approved Version of the Following Dissertation


Copyright by Mingzhang Yin 2020

The Dissertation Committee for Mingzhang Yin certifies that this is the approved version of the following dissertation:

Variational Methods with Dependence Structure

Committee:
Mingyuan Zhou, Supervisor
Purnamrita Sarkar
Qixing Huang
Peter Mueller

Variational Methods with Dependence Structure

by Mingzhang Yin

DISSERTATION

Presented to the Faculty of the Graduate School of The University of Texas at Austin in Partial Fulfillment of the Requirements for the Degree of DOCTOR OF PHILOSOPHY

THE UNIVERSITY OF TEXAS AT AUSTIN
May 2020

Dedicated to my loved ones.

Acknowledgments

I always think the pursuit of truth is a path of beauty. I am fortunate to take such a journey at the University of Texas at Austin with many insightful and supportive people, who make the experience memorable.

First and foremost, I want to express my sincere gratitude to my advisor, Dr. Mingyuan Zhou. Back in 2016, I stepped into the deep learning realm and was very new to independent research. I will never forget the patience he had when I was in difficulties, the firm confidence he gave when I was hesitating and the moments we cheered together for new discoveries. With comprehensive knowledge, he taught me a taste for choosing the right problems and an attitude towards science: honest, rigorous and innovative. Also, many exciting projects could not have been completed without his generous allowance and encouragement to explore new areas of my own interest. I appreciate his fostering and protection of my curiosity. In retrospect, I am blessed to have Dr. Zhou as my advisor, to have the opportunity to learn and work with him.

I am also grateful to Dr. Purnamrita Sarkar, for the pure joy I felt when we played math on the whiteboard and for the comfort she gave when I hit rock bottom. I want to thank Dr.
George Tucker for hosting my internship at Google Brain, where we worked with Doctors Chelsea Finn and Sergey Levine. I learned a lot from the inspiring discussions over hundreds of emails. Thanks to Dr. Corwin Zigler for guiding me into causal inference in the final semester. I want to thank all my other co-authors: Dr. Bowei Yan, who gave generous help when I was a junior in the Ph.D. program; Yuguang Yue, who has been studying with me since our undergraduate studies; and Dr. Y. X. Rachel Wang, who gave me detailed mentorship during our collaboration. Thanks to Doctors Stephen G Walker, James Scott, Constantine Caramanis, Peter Müller and Qixing Huang for the wonderful courses and for being on my committee.

I want to take this chance to thank my friends in Austin with whom I spent happy leisure time: Carlos Tadeu Pagani Zanini, Su Chen, Yang Ni, Kelly Kang, Matteo Vestrucci, Vera Liu, Xinjie Fan, Yanxin Li, Zewen Hanxi, Qiaohui Lin, Yan Long, Xiaoyu Qian, Yibo Hu. Thanks to all my friends who are not in Austin; I can feel your support even remotely. I want to thank the warm staff and the Department of Statistics and Data Science for the bright office and financial support. Thanks to all the cafes, bookstores and parks in Austin for hosting my pondering upon life’s smallest and biggest questions.

My mother, Ying Hu, and father, Cunzhen Yin, you are my role models. I hope my work can be as solid as the water turbines my mother designed, and as useful as the food machines whose construction my father guided. Thanks for your everlasting life guidance, incredible support of my studies and your care for my health. This space is too small to contain all my gratitude.

Finally, a very special thanks to Jing Zhang. Though often far away, I never felt for a second that you were not standing by me. You are always in my heart. This thesis is dedicated to you and my parents.

Variational Methods with Dependence Structure

Publication No.

Mingzhang Yin, Ph.D.
The University of Texas at Austin, 2020

Supervisor: Mingyuan Zhou

It is a common practice among humans to deduce, to explain and to make predictions based on concepts that are not directly observable. In Bayesian statistics, the underlying propositions of the unobserved latent variables are summarized in the posterior distribution. With the increasing complexity of real-world data and statistical models, fast and accurate inference for the posterior becomes essential. Variational methods, by casting the posterior inference problem in the optimization framework, are widely used for their flexibility and computational efficiency. In this thesis, we develop new variational methods, studying their theoretical properties and applications.

In the first part of the thesis, we utilize dependence structures towards addressing fundamental problems in variational inference (VI): posterior uncertainty estimation, convergence properties, and discrete optimization. Though it is flexible, variational inference often underestimates the posterior uncertainty. This is a consequence of the over-simplified variational family. Mean-field variational inference (MFVI), for example, uses a product of independent distributions as a coarse approximation to the posterior. As a remedy, we propose a hierarchical variational distribution with flexible parameterization that can model the dependence structure between latent variables. With a newly derived objective, we show that the proposed variational method can achieve accurate and efficient uncertainty estimation.

We further theoretically study the structured variational inference in the setting of the Stochastic Blockmodel (SBM). The variational distribution is constructed with a pairwise structure among the nodes of a graph. We prove that, in a broad density regime and for general random initializations, the estimated class labels by structured VI converge to the ground truth with high probability.
Empirically, we demonstrate that structured VI is more robust than MFVI when the graph is sparse and the signal-to-noise ratio is low.

When the latent variables are discrete, gradient-descent-based VI often suffers from bias and high variance in the gradient estimation. With correlated random samples, we propose a novel unbiased, low-variance gradient estimator. We demonstrate that under certain constraints, such correlated sampling gives an optimal control variate for variance reduction. The efficient gradient estimation can be applied to a wide range of problems such as variable selection, reinforcement learning and natural language processing, among others.

For the second part of the thesis, we apply variational methods to the study of generalization problems in meta-learning. When trained over multiple tasks, we identify that a variety of meta-learning algorithms implicitly require the tasks to have a mutually exclusive dependence structure. This prevents the task-level overfitting problem and ensures the fast adaptation of the algorithm in the face of a new task. However, such a dependence structure may not exist for general tasks. When the tasks are non-mutually exclusive, we develop new meta-learning algorithms with variational regularization to prevent task-level overfitting. Consequently, we can expand meta-learning to domains where it could not be effective before.

Attribution

This dissertation incorporates the outcomes of extensive collaborations. Chapter 2, on uncertainty estimation, is the product of collaboration with Dr. Mingyuan Zhou and was published at the International Conference on Machine Learning (ICML) 2018. Chapter 3 pertains to the theoretical analysis of structured VI, which was completed with Doctors Y. X. Rachel Wang and Purnamrita Sarkar and was published at the International Conference on Artificial Intelligence and Statistics (AISTATS) 2020. Chapter 4 includes the discrete optimization work with Dr.
Mingyuan Zhou and was published at the International Conference on Learning Representations (ICLR) 2019. Chapter 5, on meta-learning, is the result of collaboration with Doctors George Tucker, Mingyuan Zhou, Sergey Levine and Chelsea Finn, published at ICLR 2020.

Table of Contents

Acknowledgments
Abstract
List of Tables
List of Figures

Chapter 1. Introduction
  1.1 Variational Inference
  1.2 Variational Methods for Statistical Learning
    1.2.1 Expectation-Maximization Algorithm
    1.2.2 Deep Generative Model
    1.2.3 Multi-task Learning

Chapter 2. Uncertainty Estimation in Variational Inference
  2.1 Variational Inference with Dependence Structures
  2.2 Semi-Implicit Variational Inference
  2.3 Optimization for SIVI
    2.3.1 Degeneracy Problem
    2.3.2 Surrogate Lower Bound
  2.4 Experimental Results
    2.4.1 Expressiveness of SIVI
    2.4.2 Negative Binomial Model
    2.4.3 Bayesian Logistic Regression
    2.4.4 Semi-Implicit Variational Autoencoder
  2.5 Concluding Remarks

Chapter 3. Structured Variational Inference for Community Detection
  3.1 Theoretical Analysis for Variational Inference
  3.2 Problem Setup and Proposed Work
    3.2.1 Preliminaries
    3.2.2 Variational Inference with Pairwise Structure (VIPS)
  3.3 Main Results
  3.4 Experimental Results
  3.5 Discussion and Generalizations

Chapter 4. Variational Inference with Discrete Latent Variables
  4.1 Optimization for Discrete Latent Variable Models
  4.2 Main Result
    4.2.1 Univariate ARM Estimator
    4.2.2 Multivariate Generalization
    4.2.3 Effectiveness of ARM for Variance Reduction
  4.3 Applications in Discrete Optimization
    4.3.1 ARM for Variational Auto-Encoder
    4.3.2 ARM for Maximum Likelihood Estimation
  4.4 Experimental Results
    4.4.1 Discrete Variational Auto-Encoders
    4.4.2 Maximizing Likelihood for a Stochastic Binary Network
  4.5 Concluding Remarks

Chapter 5. Meta-Learning with Variational Regularization
  5.1 Meta-Learning and Task Overfitting
  5.2 Preliminaries
  5.3 The Memorization Problem in Meta-Learning
  5.4 Meta Regularization Using Variational Methods
    5.4.1 Meta Regularization on Activations
    5.4.2 Meta Regularization on Weights