Gen: A High-Level Programming Platform for Probabilistic Inference

by

Marco Francis Cusumano-Towner

B.S., University of California, Berkeley (2011)
M.S., Stanford University (2013)

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science and Engineering

at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

September 2020

© Massachusetts Institute of Technology 2020. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, August 28, 2020

Certified by: Vikash K. Mansinghka, Principal Research Scientist, Thesis Supervisor

Accepted by: Leslie A. Kolodziejski, Professor of Electrical Engineering and Computer Science, Chair, Department Committee on Graduate Students

Submitted to the Department of Electrical Engineering and Computer Science on August 28, 2020, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science and Engineering

Abstract

Probabilistic inference provides a powerful theoretical framework for engineering intelligent systems. However, diverse modeling approaches and inference algorithms are needed to navigate engineering tradeoffs between robustness, adaptability, accuracy, safety, interpretability, data efficiency, and computational efficiency. Structured generative models represented as symbolic programs provide interpretability. Structure learning of these models provides data-efficient adaptability. Uncertainty quantification is needed for safety. Bottom-up, discriminative inference provides computational efficiency. Iterative “model-in-the-loop” algorithms can improve accuracy by fine-tuning inferences and improve robustness to out-of-distribution data. Recent probabilistic programming systems fully or partially automate inference, but are too restrictive for many applications. Differentiable programming systems are also inadequate: they do not support structure learning of generative models or hybrids of “model-in-the-loop” and discriminative inference. Therefore, probabilistic inference is still often implemented by translating tedious mathematical derivations into low-level numerical programs, which are error-prone and difficult to modify and maintain. This thesis presents the design and implementation of the Gen programming platform for probabilistic inference. Gen automates the low-level implementation of probabilistic inference algorithms while remaining flexible enough to support heterogeneous algorithmic approaches and extensible enough for practical inference engineering. Gen users define their models explicitly using probabilistic programs, but instead of compiling the model directly into an inference algorithm implementation, Gen compiles the model into data types that encapsulate low-level inference operations whose semantics are derived from the model, like sampling, density evaluation, and gradients. Users write their inference application in a general-purpose programming language using Gen’s abstract data types as primitives. This thesis defines Gen’s data types and shows that they can be used to compose a variety of inference techniques including sophisticated Monte Carlo algorithms and hybrids of Monte Carlo, variational, and discriminative techniques. The same data types can be generated from multiple probabilistic programming languages that strike different expressiveness and performance tradeoffs. By decoupling probabilistic programming language implementations from inference algorithm design, Gen enables more flexible specialization of both, leading to performance improvements over existing probabilistic programming systems.

Thesis Supervisor: Vikash K. Mansinghka
Title: Principal Research Scientist

Acknowledgments

My primary thesis advisor, Vikash Mansinghka, planted the seeds for the research described in this thesis and created an environment in which this work was possible. In the beginning of my PhD, he directed me to pull at research threads that seemed to never end, resulting in the most intellectually rewarding phase of my life. The research problem and the approach of this thesis build directly on his earlier work, and are a product of a new probabilistic programming research paradigm that he has persistently worked to shape. I am also grateful for his example of independent thinking and entrepreneurial attitude, his conscientiousness as an advisor, and his efforts to support his students in non-technical challenges. The past five years would have been much harder without knowing that I could count on him. Several other advisors and mentors have played roles in my graduate school journey. Josh Tenenbaum has inspired me for years with his insightful work, and led me into probabilistic programming. I am grateful for his approachable demeanor, encouragement, and continuing inspiration over the past five years. I am also grateful to Martin Rinard and Michael Carbin for being on my thesis committee and helping me to communicate this work more effectively. Martin Rinard gave valuable advice on the core terminology, framing, and exposition in this document. Martin Vechev gave helpful mentorship and advice during our collaboration, which built the foundation for the ‘trace translator’ construct in this thesis. I also thank Sam Gershman for his advice, mentorship, and support as I started my graduate career at MIT, Sivan Bercovici for supporting my transition back into graduate school, and Pieter Abbeel for providing my first exposure to computer science research. The first few years of graduate school would not have been the same without the friendship and support of my cohort of fellow PhD students and researchers, including Feras Saad, Alex Lew, and Ulrich Schaechtle. Feras has been a good friend and ally during the ups and downs of the past five years. Alex has been a constant source of encouragement, and late night whiteboard discussions with him helped to clarify many of the ideas in this thesis. Feras and Alex were crucial in helping to push the Gen PLDI paper over the finish line, and in various other efforts. The research in this thesis was also aided by discussions with many other current and former affiliates of the MIT Probabilistic Computing Project, including Alexey Radul and Anthony Lu, and with support from Amanda Brower, Rachel Paiste, and many others. Several people contributed to the applications of Gen described in this thesis. Ben Zinberg, Austin Garrett, and Javier Felip Leon contributed to the scene graph inference application; Ulrich Schaechtle and Feras Saad devised the algorithm used in the Gaussian process structure learning application. Since Gen was released, many people including Alex Lew, Ben Zinberg, Tan Zhi-Xuan, George Matheos, and Sam Witty have helped to improve it, and their contributions and use of Gen have been incredibly encouraging. Alex contributed syntax improvements that are reflected in this thesis. This thesis would not have been possible without the support of my parents Maria Cusumano and Mark Towner, who consistently put me before themselves and did everything in their power to help me succeed. Finally, I would not have embarked on this PhD journey without Lisa Bashkirova, who has been a constant source of good advice for the past decade.

Contents

1 Introduction
  1.1 A new approach to implementing probabilistic inference
  1.2 Overview of programming languages concepts in Gen
    1.2.1 Generative probabilistic models and probabilistic inference
    1.2.2 Using probabilistic programming languages to express generative probabilistic models
    1.2.3 Abstract data types for generative functions and traces
    1.2.4 Generating implementations of the abstract data types from the source code of probabilistic programs
    1.2.5 Approximate probabilistic inference algorithms
    1.2.6 Implementing inference algorithms with abstract data types

2 Abstract Data Types for Inference: Generative Functions and Traces
  2.1 An abstract formal representation for generative models
    2.1.1 Random choices, addresses, and choice dictionaries
    2.1.2 Probability distributions on choice dictionaries
    2.1.3 Marginal likelihood, conditioning, and expectation
    2.1.4 Generalizing beyond discrete random choices
    2.1.5 Generative functions
  2.2 Languages for defining generative functions
    2.2.1 Gen Dynamic Modeling Language
    2.2.2 Formal semantics of a toy modeling language
  2.3 Abstract data types for probabilistic inference
    2.3.1 Generative function and trace ADTs
    2.3.2 Implementing the ADT operations compositionally
  2.4 Related work

3 Implementing Inference Using Generative Functions and Traces
  3.1 Simple Monte Carlo with traces
  3.2 Importance sampling with traces
    3.2.1 Regular importance sampling
    3.2.2 Self-normalized importance sampling
  3.3 Training proposal distributions on simulated data
  3.4 Markov chain Monte Carlo with traces
    3.4.1 MCMC with the trace abstract data type
    3.4.2 Metropolis-Hastings using generative functions as proposals
    3.4.3 Hamiltonian Monte Carlo with traces
    3.4.4 A language for composing MCMC kernels
  3.5 Resample-move particle filtering with traces
    3.5.1 Trace-based particle filtering with rejuvenation kernels
    3.5.2 Annealed importance sampling with traces
  3.6 Bridging between models with trace translators
    3.6.1 Trace translators
    3.6.2 Sparsity-aware Jacobian computation
    3.6.3 A differentiable programming language for trace transforms
    3.6.4 Sequential Monte Carlo with trace translators
  3.7 Involutive MCMC
    3.7.1 Symmetric trace translators
    3.7.2 Incremental computation for symmetric trace translators
    3.7.3 Involutive MCMC
    3.7.4 Implementing reversible jump MCMC using involutive MCMC
    3.7.5 State-dependent mixture kernels and involutive MCMC
  3.8 Related work

4 Encapsulating Inference Logic in Generative Functions and Traces
  4.1 Generative functions with internal proposals
    4.1.1 Extending the generate operation using the internal proposal
    4.1.2 The regenerate trace operation
    4.1.3 Example internal proposal families
  4.2 Importance sampling with the internal proposal
  4.3 Selection Metropolis-Hastings
  4.4 A combinator for overriding the internal proposal
  4.5 Encapsulated randomness
    4.5.1 Extending the data type operations with encapsulated randomness
    4.5.2 Untraced random choices
    4.5.3 Pseudo-marginal Monte Carlo methods and encapsulation
    4.5.4 Using encapsulated randomness inside proposal distributions
  4.6 Related Work

5 Compiling Generative Function and Trace Data Types from Probabilistic Modeling Code
  5.1 The Dynamic Modeling Language compiler
    5.1.1 Implementing generative functions and traces via effect handlers
    5.1.2 Invoking generative functions
  5.2 The Static Modeling Language compiler
  5.3 Generative function combinators for control flow
  5.4 Domain-specific generative functions
  5.5 Related work

6 Applications
  6.1 Inference in generative models of intelligent behavior
    6.1.1 An algorithmic generative model of goal-directed movement
    6.1.2 A simple and generic inference implementation
    6.1.3 Adding uncertainty about structure and stochastic control flow
    6.1.4 A sequential Monte Carlo inference algorithm
    6.1.5 Symbolic reasoning from noisy data via probabilistic inference
  6.2 Inferring object pose and existence from point clouds
  6.3 Real-time camera pose estimation
  6.4 Inferring the dynamic geometric structure of a 3D scene
  6.5 Gaussian process structure learning for time series

7 Conclusion
  7.1 Tradeoffs in probabilistic inference systems architecture
  7.2 Generative and discriminative models and heuristics
  7.3 Towards a mature inference engineering methodology

List of Figures

1-1 The high-level inference implementation approach proposed in this thesis
1-2 Illustrations of selected Gen applications from Table 1.1
1-3 Notation used in this introductory section
1-4 Illustration of the ‘update’ operation of the trace ADT
1-5 Approximation error of self-normalized importance sampling algorithms

2-1 Syntax and denotational semantics of toy probabilistic modeling language

3-1 Convergence of simple Monte Carlo implemented with traces
3-2 Comparing accuracy of importance sampling and simple Monte Carlo
3-3 Comparing self-normalized importance sampling using different proposals
3-4 Training the parameters of a data-driven importance-sampling proposal
3-5 Metropolis-Hastings using generative functions as proposal distributions
3-6 Iterates produced by a composite MCMC kernel in a polynomial curve model
3-7 A state space model used to illustrate particle filtering with traces
3-8 Generative functions for proposal distributions in particle filtering
3-9 Resample-move particle filtering using generative functions and traces
3-10 Trace translators allow translation between arbitrary latent representations
3-11 Efficiency of coarse-to-fine sequential Monte Carlo using trace translators
3-12 Visualization of samples from a model with stochastic control flow
3-13 Involutive MCMC can express efficient structure-changing moves
3-14 Split-merge reversible jump MCMC in an infinite Gaussian mixture model
3-15 Generative function for an infinite Gaussian mixture model
3-16 Auxiliary generative function for a split-merge reversible jump MCMC move
3-17 Involution trace transform for a split-merge reversible MCMC move
3-18 Schematic of the involution trace transform for a split-merge MCMC move
3-19 A Gaussian process model with a nonparametric prior on covariance functions
3-20 Auxiliary generative function for a state-dependent mixture MCMC kernel
3-21 Trace transform for a state-dependent mixture MCMC kernel

4-1 Schematic of internal proposal family for a simple generative function

5-1 Lifecycle of probabilistic source code, generative functions, and traces
5-2 SML intermediate representation for a generative model
5-3 Generative function combinators for common control flow patterns

6-1 A prior sample from a generative model of goal-directed intelligent behavior
6-2 Inferences about a person’s destination from their observed movement
6-3 Samples from a model of intelligent behavior with stochastic structure
6-4 Prior samples from alternate models of a person’s activity and motion
6-5 Gen implementation of an SMC algorithm for inferring a person’s destination
6-6 A composite MCMC kernel that combines several types of primitive kernels
6-7 Contrasting generic and specialized MCMC moves for changing control flow
6-8 A Gen trace transform that switches control flow branches
6-9 A data-driven proposal based on a heuristic
6-10 Using a coarse-grained surrogate model to aid inference in a fine-grained model
6-11 Inferring past events, and predicting future events and trajectories with Gen
6-12 Using MCMC for Bayesian inference of 6DoF object pose from point clouds
6-13 Inferring the presence, absence, and pose of multiple objects from point clouds
6-14 Tracking camera pose using online Monte Carlo in a generative model
6-15 Comparing bottom-up and top-down inference approaches for pose estimation
6-16 Probabilistic inference of scene graphs makes pose estimation more robust
6-17 Using Gen to model the symbolic structure of a 3D scene with multiple objects
6-18 A transition kernel on scene graph structures
6-19 Experimenting with different MCMC schedules using Gen

List of Tables

1.1 Selected Gen applications that use diverse modeling and inference approaches
1.2 Performance of different implementations of the same trace ADT

5.1 Performance of different implementations of an MCMC inference algorithm
5.2 Performance comparison of particle filtering implementations
5.3 Performance of different implementations of a trans-dimensional MCMC kernel
5.4 Performance comparison of inference operations for Bayesian robust regression
5.5 Performance comparison of MCMC algorithms for Bayesian robust regression

Chapter 1

Introduction

Probabilistic inference in generative models is a core part of the modern toolkit for automated reasoning from data. Probabilistic inference plays a central role in Bayesian statistics, machine learning, and signal processing and enables applications in various fields where uncertainty is important, from robotics and autonomous vehicles [43] to biomedicine [106], natural sciences [57] and data analysis [39]. However, implementing probabilistic inference algorithms from scratch involves error-prone and tedious mathematical derivations and numerical programming. As a result, many software tools in recent decades have attempted to automate probabilistic inference [49]. Some tools have been widely adopted within certain fields [20], but are restricted in the type of inference problems they solve and their performance characteristics. Deep learning, which overlaps with probabilistic inference, is well-supported by differentiable programming languages [11, 1, 94], but these languages have limited support for inference in structured generative models, which can be more robust, interpretable, and data-efficient than deep learning.1

This thesis describes a new approach to implementing probabilistic inference in generative models (Figure 1-1), and a system called Gen [32] that implements the approach. As in many recent probabilistic programming systems, users of Gen explicitly define their generative model by writing a probabilistic program. However, instead of attempting to solve the inference problem for the user, Gen automatically generates data types from the probabilistic program that can be used to compose inference algorithms. The user implements their inference application using the generated data types. The current Gen implementation generates Julia [12] data types for implementing inference algorithms, but a similar approach can be implemented in other languages. The data types automate the low-level details of inference algorithm implementations, resulting in shorter and more maintainable inference code relative to implementations written from scratch, while maintaining the user’s freedom to specialize the algorithm to their model and to flexibly integrate the algorithm into their application. This thesis describes abstract data types for probabilistic inference, a number of inference recipes that use the data types, how the data types are implemented by the compilers of modeling languages, and several applications of the resulting system.

1The approach proposed in this thesis complements existing programming languages and software tools for deep learning. The current Gen implementation is interoperable with TensorFlow [1] and PyTorch [94].

[Figure: probabilistic programs (generative models, discriminative models, proposal distributions, variational approximations, auxiliary distributions) are compiled into data types (generative functions and traces), which the user combines with inference code (sequential Monte Carlo, MCMC, optimization, variational inference, discriminative inference, heuristics, hybrid algorithms).]

Figure 1-1: The high-level inference implementation approach proposed in this thesis

1.1 A new approach to implementing probabilistic inference

A defining feature of probabilistic modeling and inference is the use of a declarative mathematical representation for uncertain knowledge in the form of a probabilistic model that is distinct from inference queries made on that model and the algorithms that are used to answer them [66]. However, while the model is separate from the inference algorithm in the mind of the inference practitioner as they devise the algorithm, the separation is often lost during implementation. The algorithm is often first specialized to the model on pencil and paper and then implemented in a way that obscures the original model and makes it difficult to modify the model or the algorithm. For example, the data structures used to store the latent state may be specialized to the structure of the model, making it challenging to change the modeling approach without rewriting the implementation; or the implementation might exploit cancellations of factors that make it unclear how a change in the model that invalidates the cancellation should be reflected in the inference code.

Since probabilistic models are mathematically well-defined objects (probability distributions), it is in principle possible to address these issues using inference ‘solvers’ that take as input a machine-readable formal representation of a model and observed data, and automate inference using general-purpose inference algorithms. This is the approach taken by many probabilistic programming systems [49, 95, 51, 20]. While this approach can be effective for specialized classes of inference problems and applications, probabilistic inference encompasses a very broad set of problems ranging from real-time object tracking to program synthesis to Bayesian statistics. No fixed set of solution strategies is sufficient to meet the requirements of such a diverse range of applications, and manual specialization of the algorithm to the model is important for acceptable performance in many applications. As a result, despite the invention of many probabilistic programming systems in recent years, probabilistic inference algorithms are still often implemented from scratch.

Probabilistic programming researchers have begun to explore the design space in between solver-oriented probabilistic programming and hand-coded probabilistic inference implementations by extending probabilistic programming languages with frameworks and domain-specific languages for customizing inference algorithms. This approach has been called programmable inference [80], in contrast to solver-oriented probabilistic programming systems that use built-in algorithms. The resulting systems can be more flexible than solver-oriented systems, while automating the low-level details of the algorithm’s implementation. However, these systems remain too rigid for many inference applications because they force

the user to write their inference application in a framework that takes control over the flow of the application away from the user. Domain-specific languages for describing inference algorithms can also be restrictive and difficult to extend.

This thesis builds on research in probabilistic programming systems and programmable inference, but employs a different approach that favors flexibility and extensibility over automation. The approach is based on the observation that many Monte Carlo and variational inference techniques can be implemented using a core set of primitive data types and low-level operations. The key idea is to automatically generate code for these data types and operations from an explicit machine-readable representation of the user’s model, such as a probabilistic program. The role of the operations is analogous to the role of automatic differentiation in deep learning algorithms, but for probabilistic inference algorithms. After defining the model, users implement an inference algorithm in a general-purpose programming language using the data types and operations that were generated from the model. This approach to implementing inference algorithms enjoys many of the benefits of existing probabilistic programming systems, including the presence of an explicit definitive representation of the user’s probabilistic model, automation of low-level computations associated with the model, and the ability to easily use models that possess structure uncertainty. However, the approach avoids the algorithmic rigidity of existing probabilistic programming systems, and is better suited for integration into inference applications. Table 1.1 lists several applications that exercise the modeling and inference flexibility afforded by this approach and have been implemented using Gen, either by the author or others.

New data types for probabilistic inference: generative functions and traces The approach of this thesis is based on two new data types. The first data type is the generative function and the second is the trace.2 A generative function is an object that represents a generative probabilistic model. A trace is an object that represents a sample from a generative probabilistic model that includes assignments to all latent and observed variables. Generative functions support operations that generate traces in different ways, including by unconditional sampling (e.g. from the prior distribution), and by stochastically filling in a trace given values for some subset of the random variables. Traces support operations that include computing the log density of the model, taking gradients of the log density, and updating the values of some subset of the random variables. Generative functions and traces are abstract data types [75], which means that the user of these data types does not need to know about the data structures that are used internally to store the latent and observed variables or how the operations are implemented. Generative function and trace data types are often implemented by compiling a probabilistic program that encodes a model, but they can be implemented in other ways as well, and the inference code that uses them will run independently of how they are implemented. Chapter 2 gives a mathematical definition of generative functions and traces based on probability distributions on dictionaries that is flexible enough to represent generative models with structure uncertainty, including models defined as programs with stochastic control flow.
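To preview how these data types are used, the following is a minimal sketch in the syntax of the Julia implementation described later in this thesis. The model (a single latent variable with one noisy observation) and the inference loop are illustrative placeholders, not an example drawn from the chapters that follow.

using Gen

# A toy generative model: a latent variable x and a noisy observation y.
@gen function model()
    x ~ normal(0.0, 1.0)   # latent random choice at address :x
    y ~ normal(x, 0.1)     # observed random choice at address :y
end

# Inference is ordinary Julia code that manipulates traces of the model.
function infer(num_iters::Int)
    observations = choicemap((:y, 1.2))           # fix the observed choice
    trace, _ = generate(model, (), observations)  # trace with :y constrained
    for iter in 1:num_iters
        # update the latent choice :x with a Metropolis-Hastings move
        trace, _ = metropolis_hastings(trace, select(:x))
    end
    return get_choices(trace)[:x]   # an approximate posterior sample for x
end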

2The word ‘trace’ has been used in probabilistic programming before. In this thesis a ‘trace’ is an instance of the trace abstract data type, which has specific mathematical content and a set of supported operations.

Each entry below lists an application of Gen, its modeling approach, and its inference approach. Labels (A)-(F) correspond to the panels of Figure 1-2.

(A) Estimating causal effects in the presence of latent confounders [129]. Modeling approach: hierarchical latent variable Gaussian process model. Inference approach: elliptical slice sampling [88] and Metropolis-Hastings.

(B) Inferring cell signaling pathways from time series [81]. Modeling approach: dynamic Bayesian network with prior on edge presence. Inference approach: custom Metropolis-Hastings moves on network structure.

(C) Time series interpretation and forecasting (Section 6.5). Modeling approach: probabilistic context-free grammar prior on GP covariance functions [113]. Inference approach: reversible jump MCMC [53].

(D) Human activity understanding and trajectory prediction (Section 6.1). Modeling approach: generative simulator-based models of rational behavior using rapidly exploring random trees [31]. Inference approach: SMC [33]; reversible jump MCMC; surrogate models trained on simulated data; dynamic programming.

(E) Inferring goals of boundedly rational agents for compositional tasks [133]. Modeling approach: general-purpose planners; priors on reward functions; dynamic Bayesian networks. Inference approach: SMC; rejuvenation MCMC kernels with heuristic-based data-driven proposals.

(F) Inferring physical relationships between objects from RGB-D video [134] (Section 6.4). Modeling approach: dynamic Bayesian networks over scene graph structure and 6DoF poses; robust likelihood models. Inference approach: deep neural network pose estimator [124]; particle filtering; reversible jump MCMC.

Inferring articulated 3D human body pose from depth images. Modeling approach: prior on articulated body pose; 3D rendering of articulated body model [67]. Inference approach: deep neural network proposal trained on simulated data; importance sampling.

Table 1.1: Selected Gen applications that use diverse modeling and inference approaches

[Figure: six panels labeled A-F illustrating the applications in Table 1.1; images adapted from [82] and [133]]

Figure 1-2: Illustrations of selected Gen applications from Table 1.1

18 Efficient inference over symbolic structures Because generative functions and traces support models with structure uncertainty, they can be used to implement probabilistic inference in nonparametric Bayesian models and inference over compositional symbolic structures like the source code of programs. Section 3.7 presents an interface for specifying custom trans-dimensional Markov chain Monte Carlo (MCMC) kernels including arbitrary reversible jump [53] kernels and a procedure that uses generative functions and traces to automate the low-level implementation of these kernels, including computation of the acceptance probability. This interface is significantly more flexible than existing interfaces for specifying MCMC kernels in probabilistic programming systems, and can be used to construct algorithms that explore the space of structures more efficiently than the generic MCMC kernels used in many prior probabilistic programming systems [51, 130, 52, 79].

Robust hybrids of Monte Carlo and deep learning inference techniques Discriminative models and generative models have complementary strengths. Discriminative approaches are often relatively fast but require training time and data. Inference that uses the generative model “in the loop” is often able to more robustly generalize to unlikely observations, and structured generative models can be more easily modified. In Gen, generative functions and traces can be constructed for either generative models or discriminative models, and used together in inference code. It is therefore straightforward to implement hybrid inference algorithms using generative functions and traces that take advantage of the best aspects of both approaches. Sections 3.2, 3.4.2, and 3.5 describe procedures based on generative functions and traces for implementing importance sampling, MCMC, and sequential Monte Carlo algorithms, respectively, using arbitrary proposal distributions that can be based on discriminative models like deep neural networks or hand-crafted heuristics. Section 3.3 shows how generative functions that represent discriminative models can be trained using traces simulated from a generative model, which is a form of amortized variational inference [58, 118, 16].
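As a sketch of what such a hybrid can look like in the syntax of the Julia implementation: the built-in importance_resampling procedure in Gen.jl accepts an arbitrary proposal expressed as a generative function. Here model, observations, and neural_proposal are hypothetical placeholders; neural_proposal stands for a data-driven proposal (e.g. one wrapping a deep neural network) that proposes latent choices given the observed data.

using Gen

# Hypothetical setup: a generative model, observed choices, and a data-driven
# proposal written as a generative function (e.g. wrapping a neural network).
observations = choicemap((:y, 1.2))
(trace, log_ml_estimate) = importance_resampling(
    model, (),                          # generative model and its arguments
    observations,                       # observed random choices
    neural_proposal, (observations,),   # custom proposal and its arguments
    100)                                # number of importance samples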

Surrogate modeling and bridging multiple latent representations Unlike in solver-oriented probabilistic programming systems, there is no single probabilistic model in a Gen inference application. Instead, generative models are simply a type of data, and, as with the combinations of discriminative and generative models described above, it is possible to construct inference algorithms that make use of multiple generative models. Section 3.6 describes a construct called trace translators that allows traces of one generative model to be translated into traces of another generative model, even when the models use different latent representations. Section 3.6.4 shows a sequential Monte Carlo algorithm based on trace translators, and includes an example that first performs inference in a coarse-grained surrogate model that was trained offline on data from a fine-grained model, and then translates traces of the surrogate model into traces of the fine-grained model.

Modularity and reuse via internal proposals Chapter 4 extends the generative function and trace data types with the ability to encapsulate internal proposal distributions that generate values for some random variables given values for others. Because internal proposals are encapsulated with the model and implement a common specification, a

19 user’s inference code can use the internal proposal without knowing its details. Modeling languages automatically implement default internal proposals, but users can also override internal proposal distributions with more efficient custom distributions, and inference code that uses the model will automatically benefit from the improved proposal.

Extensibility and performance Because generative functions and traces are abstract data types, they can be implemented in a variety of ways. Gen includes multiple probabilistic modeling languages with different strengths and weaknesses that all compile into generative function and trace data types (Chapter 5). Users can choose the modeling language that best suits their model. For example, if the model includes stochastic control flow and black-box simulators, then a Turing-universal modeling language can be used. If the model does not require these features, then a more specialized modeling language can be used that gives better performance. The system is also straightforward to extend with new modeling languages. Gen has already been extended with plugins that allow generative functions to be constructed from TensorFlow [1] and PyTorch [94] models. It is also possible to hand-code optimized implementations of the generative function and trace data types later in the development lifecycle to improve performance, without modifying the inference code. Finally, generative functions are composable, so generative functions written in different modeling languages can invoke one another, and the user can choose the appropriate modeling language for different parts of their model.
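A sketch of this composability in the syntax of the Julia implementation: a generative function written in the more specialized Static Modeling Language can invoke one written in the Turing-universal Dynamic Modeling Language. The names are illustrative, and depending on the Gen.jl version, a call to @load_generated_functions() may also be needed after defining static functions.

using Gen

# Dynamic Modeling Language: supports stochastic control flow.
@gen function random_walk(n::Int)
    total = 0.0
    for i in 1:n
        total += ({(:step, i)} ~ normal(0.0, 1.0))
    end
    return total
end

# Static Modeling Language: restricted structure, compiled for performance.
@gen (static) function walk_model()
    n ~ poisson(10)               # number of steps is itself random
    endpoint ~ random_walk(n)     # call into the dynamic-language function
    return endpoint
end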

A platform for investigating probabilistic computational rationality Studies in computational rationality [46] aim to understand how naturally and artificially intelligent agents can make decisions while accounting for the costs and feasibility of computation. Gen and the approach to implementing probabilistic inference described in this thesis are natural tools for studying computational rationality because of their flexible support for bottom-up, heuristic, model-free, and discriminative inference techniques; top-down, model-based techniques; and hybrids of these approaches. The ability to write inference code in a general-purpose language allows for open exploration of heterogeneous inference architectures that may be necessary for optimal resource-constrained decision-making [107], while still using probabilistic knowledge representations (generative functions).

Open-source implementation and modeling and inference ecosystem The ideas in this thesis are the basis for an open-source implementation3 of Gen that is embedded in the Julia language [12]. The Julia implementation of Gen is itself the basis for a set of libraries for probabilistic modeling and inference that use Gen’s core data types of generative functions and traces. Examples include plugins for wrapping TensorFlow and PyTorch models in generative functions, and modeling and inference libraries for probabilistic reasoning about three-dimensional objects and scenes. Implementations of Gen in other languages are in progress. Chapter 6 describes selected applications using the Julia implementation, and the other chapters of the thesis give pedagogical code examples that also use the syntax of this implementation. The Julia implementation has been used

3https://gen.dev

to teach probabilistic inference by groups at multiple universities, has an active community of users and enthusiastic contributors, and has already been used in several research projects [81, 133, 14, 128, 129, 134].

1.2 Overview of programming languages concepts in Gen

This section gives an overview of some of the key ideas in Gen’s design from a programming languages perspective, using pedagogical examples to introduce new concepts when possible. As described above, Gen’s design is based on new abstract data types (ADTs) for probabilistic programs (the generative function ADT) and execution traces of probabilistic programs (the trace ADT). These ADTs provide a set of primitive operations for implementing inference algorithms that include generating execution traces, querying execution traces for the value of random choices and their gradients, and updating execution traces. Implementations of these ADTs are typically automatically generated from the source code of a probabilistic program. Gen also provides multiple probabilistic programming languages that all generate the same ADTs but strike different tradeoffs between ease-of-use and performance. Furthermore, these languages are interoperable, so that different parts of a model can be described in different languages, and users can also hand-implement the ADTs for performance-critical parts of their model. While Gen is already a practical tool for prototyping probabilistic inference algorithms, Gen’s design provides fertile ground for future research on new probabilistic programming languages with more efficient implementations of Gen’s ADTs, and in analyzing, compiling, optimizing, and verifying high-level user inference code that is expressed in terms of Gen’s ADTs.

As mentioned above, Gen builds on the programmable inference paradigm in probabilistic programming [80] as exemplified by Venture [79, 78]. However, Gen’s design differs significantly from that of Venture, which does not provide comparable ADTs for probabilistic models and execution traces. For example, while the runtime systems of other probabilistic programming languages including Venture perform automatic differentiation, they do not expose an operation to users that is analogous to Gen’s ‘gradient’ trace ADT operation, which encapsulates the details of how automatic differentiation is performed, and presents a simple interface for obtaining gradients associated with an execution trace. Similarly, while Venture’s runtime system internally performs incremental computation, it does not expose an operation to users that is analogous to Gen’s ‘update’ trace ADT operation, which presents a simple interface for making incremental updates to a trace. The same is true of Gen’s other core trace operations. Also, Venture does not support multiple interoperable probabilistic programming languages, uses a restricted domain-specific language for inference, and does not support many of the inference techniques supported by Gen that are important for efficient inference, including custom proposal distributions.

While this thesis gives mathematical descriptions of Gen’s ADTs and shows how various inference algorithms can be decomposed into operations for the ADTs, it does not give formal semantics for Gen’s probabilistic programming languages, or attempt to statically verify probabilistic inference code. Lew et al. [73] give the formal semantics for, and describe static verification of, a more restricted probabilistic modeling and inference system

that is embedded in Haskell and was based in part on Gen. Gen uses dynamic checks for some invariants in certain inference operations to detect user errors, but allows for arbitrary user code to be combined with Gen inference code, and allows users to use custom implementations of Gen’s ADTs that would be difficult to verify.

The main programming languages contributions of this thesis include:

1. The design of Gen, which combines probabilistic programming languages for specifying models with data types for implementing inference algorithms in a general-purpose programming language.

2. New abstract data types for probabilistic programs and their execution traces that encapsulate the interpretation and compilation of probabilistic programming languages, and are sufficient for implementing a broad array of flexible Monte Carlo and variational inference algorithms.

3. The first probabilistic programming system with multiple interoperable probabilistic programming languages that use different compilers and strike different expressiveness-performance tradeoffs.

4. The first probabilistic programming system to support several inference techniques that are routinely used in practice, including arbitrary reversible jump MCMC samplers and custom data-driven MCMC proposals expressed as probabilistic programs.

5. The first probabilistic programming language to support encapsulation of inference logic within modeling components via a novel construct called internal proposals.

6. An inference programming construct called trace translators that allows for inference programs to use multiple models of the same domain, using a differentiable programming language for bijections between spaces of traces of two probabilistic programs.

This section introduces the key concepts behind some of these contributions for programming language researchers, and provides necessary background in probabilistic modeling and inference as needed.

1.2.1 Generative probabilistic models and probabilistic inference

First, we give an overview of the mathematical formalism that the thesis uses to describe probabilistic models and probabilistic inference. The full formalism is given in Chapter 2.

Probability distributions on dictionaries This thesis defines probabilistic inference as conditional probabilistic inference in a generative probabilistic model. We define a probabilistic model as a probability distribution 푝 on dictionaries (i.e. finite associative arrays), denoted 휎 or 휏 or 휐, from some set of ‘addresses’ (i.e. keys) to values. We denote a literal dictionary that contains two addresses a and b with (Boolean) values T and F respectively, by 휏 = {a ↦→ T, b ↦→ F}. Each entry in one of these dictionaries encodes some piece of information about the state of the domain being modeled. The model 푝 expresses our assumptions about the probable states of the domain, prior to observing any data.

Dictionaries form a flexible sample space because a given piece of information (a given address) may be present in some states but not in others. For example, consider the model 푝 defined below, which is a probability distribution over a set of six possible states:

휏                           푝(휏)
{a ↦→ F, c ↦→ F}            0.45
{a ↦→ F, c ↦→ T}            0.05
{a ↦→ T, b ↦→ F, c ↦→ F}    0.05
{a ↦→ T, b ↦→ F, c ↦→ T}    0.2
{a ↦→ T, b ↦→ T, c ↦→ F}    0.125
{a ↦→ T, b ↦→ T, c ↦→ T}    0.125

The left column lists the possible states, and the right column contains the probabilities for each state, which sum to one.4 For this model the states contain entries with Boolean values (T or F). Four of the states contain an entry for address b and two do not.

Generative probabilistic models and conditioning The probabilistic models 푝 considered in this thesis are generative because they specify normalized probability distributions on dictionaries that contain both observable and non-observable (or ‘latent’) entries. Observed data takes the form of a dictionary 휌 that assigns values to some subset of the addresses. For example, we might obtain observed data 휌 = {c ↦→ T}. The goal of probabilistic inference is to answer queries about the probable values of unobserved addresses, in light of the observed data 휌 and the model 푝 that specifies how they are related. This is made precise by the conditional distribution, a probability distribution over unobserved dictionaries that is denoted 휎. The conditional distribution 푝(·|휌) is defined by (i) marking each state 휏 with whether it matches the observed data or not, and (ii) normalizing the probability over the unobserved part of all states that do match the observed data. We assume the observed addresses are present in all states, so for example, the address b could not be observed for the example model 푝 defined earlier. For this 푝 and 휌 = {c ↦→ T}, the conditional distribution is computed as follows:

휏                           푝(휏)     휏 matches 휌?
{a ↦→ F, c ↦→ F}            0.45     no
{a ↦→ F, c ↦→ T}            0.05     yes
{a ↦→ T, b ↦→ F, c ↦→ F}    0.05     no
{a ↦→ T, b ↦→ F, c ↦→ T}    0.2      yes
{a ↦→ T, b ↦→ T, c ↦→ F}    0.125    no
{a ↦→ T, b ↦→ T, c ↦→ T}    0.125    yes

which, after normalizing over the unobserved parts of the matching states, gives:

휎                   푝(휎|휌)
{a ↦→ F}            0.05/0.375
{a ↦→ T, b ↦→ F}    0.2/0.375
{a ↦→ T, b ↦→ T}    0.125/0.375

There are a number of queries that we may want to make given a probabilistic model 푝 and the observed data 휌. We may want to know the expected value, under the conditional distribution, of some test function 푔 on the set of unobserved states (∑_휎 푔(휎)푝(휎|휌)). For

4 The thesis includes a measure-theoretic mathematical framework, but in this section we assume that probability distributions are discrete to simplify the notation and make it more broadly accessible.

23 example, when 푔 is an indicator function of an event, its expected value is the conditional probability of the event. For the conditional distribution defined above, and 푔(휎) := [휎[a] = T], the expected value is:

푝({a ↦→ T, b ↦→ F}|{c ↦→ T}) + 푝({a ↦→ T, b ↦→ T}|{c ↦→ T}) = 0.325/0.375 = 0.8666...

So the conditional probability that a is true, given that c is true, is about 0.87. Another common task is computing the marginal likelihood of the data, which is the sum of the probabilities of all states 휏 that match the observed data. This is a quantitative measure of how well the model explains the data. For our example, the marginal likelihood is

푝({a ↦→ F, c ↦→ T}) + 푝({a ↦→ T, b ↦→ F, c ↦→ T}) + 푝({a ↦→ T, b ↦→ T, c ↦→ T}) = 0.375

Gen can help users solve these two types of tasks (and others), but for the remainder of this introduction we will define probabilistic inference as the process of evaluating the expected value of a test function 푔 under the conditional distribution induced by a probabilistic model 푝 and observed data 휌.
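The two quantities computed above can be checked mechanically. The following plain-Julia sketch (ordinary code, not Gen) enumerates the six states of the example model 푝, conditions on 휌 = {c ↦→ T}, and evaluates the marginal likelihood and the conditional expectation of 푔.

# The example distribution p, as a map from states (dictionaries) to probabilities.
p = Dict(
    Dict(:a => false, :c => false)             => 0.45,
    Dict(:a => false, :c => true)              => 0.05,
    Dict(:a => true, :b => false, :c => false) => 0.05,
    Dict(:a => true, :b => false, :c => true)  => 0.2,
    Dict(:a => true, :b => true, :c => false)  => 0.125,
    Dict(:a => true, :b => true, :c => true)   => 0.125)

ρ = Dict(:c => true)   # observed data
matches(τ) = all(τ[addr] == val for (addr, val) in ρ)

# Marginal likelihood: total probability of the states that match ρ.
Z = sum(prob for (τ, prob) in p if matches(τ))   # 0.375

# Conditional expectation of the test function g(σ) = [σ[a] = T].
g(σ) = σ[:a] ? 1.0 : 0.0
expectation = sum(g(τ) * prob for (τ, prob) in p if matches(τ)) / Z   # ≈ 0.867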

1.2.2 Using probabilistic programming languages to express generative probabilistic models

The previous section introduced generative probabilistic models and conditioning using probability distributions represented as tables, but probability tables are an impractical medium for expressing probabilistic models because they scale exponentially in the number of addresses, and cannot represent continuous probability distributions. This thesis builds on prior work on probabilistic programming languages, and uses general-purpose executable programming languages with random choice to encode probabilistic models.

A probabilistic programming language Gen provides a probabilistic programming language called the Dynamic Modeling Language, which extends the syntax of the Julia language [12]. The major syntactic extension to Julia is the addition of an extra expression type, called a labeled random choice expression, which has the form:

{address} ~ bernoulli(0.5)

The part of the expression to the left of ~ and inside the braces encodes a dynamically computed address for a randomly chosen value, and the part of the expression to the right of ~ encodes a probability distribution from which the value should be sampled. For example, ‘{:a} ~ bernoulli(0.5)’ samples a Boolean-valued Bernoulli random choice at address5 a, and ‘{5} ~ bernoulli(0.5)’ samples a random choice from the same distribution, but at address 5. There is syntactic sugar that also assigns the value of the random choice to a variable in the program, and uses the variable name as the address:

a ~ bernoulli(0.5) is equivalent to a = ({:a} ~ bernoulli(0.5))

5The colon syntax (e.g. :a) denotes a Julia symbol, or interned string.

Users construct a probabilistic program that defines their probabilistic model using a combination of these random choice expressions, Julia expressions, and Julia control flow. For example, consider the Dynamic Modeling Language program below:

@gen function burglary_model()
    burglary ~ bernoulli(0.01)
    if burglary
        disabled ~ bernoulli(0.1)
    else
        disabled = false
    end
    if !disabled
        alarm ~ bernoulli(if burglary 0.94 else 0.01 end)
    else
        alarm = false
    end
    calls ~ bernoulli(if alarm 0.70 else 0.05 end)
    return nothing
end

This program defines a simple probabilistic model of a scenario in which a burglar may be breaking in to one’s home when one is away. There is a low probability (0.01) that there is a burglary. If there is a burglary, then the burglar may have disabled the alarm, with probability 0.1. If the alarm was not disabled, then there is some probability (0.94) that it is triggered. But there is also a small probability (0.01) that the alarm is triggered via a false positive when there is no burglary. If the alarm sounds, then there is a probability (0.70) that a neighbor calls you on the phone. But there is also a small probability (0.05) that the neighbor calls even if the alarm does not sound.

Trace-based semantics and generative functions The semantics of a program p in this language is denoted ⟦p⟧ and has two parts. The first part is the probability distribution 푝 on dictionaries from which the following process samples: Execute the program, but intercept random choice expressions, and record the randomly sampled value for each random choice in a dictionary 휏 (called a trace) that maps addresses of random choices to their values; when the program finishes executing, return the dictionary 휏. To be valid, the program must never attempt to record a value at the same address twice in an execution, and the program must finish executing with probability 1. The second part of the semantics is the function 푓 that maps 휏 to the return value of the program. In Gen, the return values of probabilistic programs exist to allow probabilistic programs to be composed by calling other probabilistic programs (e.g. via sequencing). In summary, probabilistic programs in Gen encode a mathematical object that contains a probability distribution 푝 on dictionaries, and a function 푓 from dictionaries to a return value.6 We call this type of mathematical object a generative function, denoted 풫 = (푝, 푓). Therefore, the semantic function ⟦·⟧ of a probabilistic programming language in Gen maps the source code of a probabilistic program to a generative function. This thesis does not formally define the

6This is a simplified definition. It is defined first in Chapter 2 and then extended in Chapter 4.

semantic function for Gen’s probabilistic programming languages, which extend the Julia language. Instead, we define the semantic function for a simple toy language, and refer the interested reader to prior works on trace-based semantics for an understanding of how to define trace-based semantics for more complex languages [15, 73]. For probabilistic program source code given in this thesis, the name that is assigned to the function in the source code will denote the generative function that is derived from the source code via the semantic function. So, the semantics of the program above is a generative function burglary_model = (푝, 푓) where (nothing is a built-in Julia ‘null’ value):

휏                                                         푝(휏)                        푓(휏)
{burglary ↦→ F, alarm ↦→ F, calls ↦→ F}                   0.99 · 0.99 · 0.95          nothing
{burglary ↦→ F, alarm ↦→ F, calls ↦→ T}                   0.99 · 0.99 · 0.05          nothing
{burglary ↦→ F, alarm ↦→ T, calls ↦→ F}                   0.99 · 0.01 · 0.30          nothing
{burglary ↦→ F, alarm ↦→ T, calls ↦→ T}                   0.99 · 0.01 · 0.70          nothing
{burglary ↦→ T, disabled ↦→ F, alarm ↦→ F, calls ↦→ F}    0.01 · 0.9 · 0.06 · 0.95    nothing
{burglary ↦→ T, disabled ↦→ F, alarm ↦→ F, calls ↦→ T}    0.01 · 0.9 · 0.06 · 0.05    nothing
{burglary ↦→ T, disabled ↦→ F, alarm ↦→ T, calls ↦→ F}    0.01 · 0.9 · 0.94 · 0.30    nothing
{burglary ↦→ T, disabled ↦→ F, alarm ↦→ T, calls ↦→ T}    0.01 · 0.9 · 0.94 · 0.70    nothing
{burglary ↦→ T, disabled ↦→ T, calls ↦→ F}                0.01 · 0.1 · 0.95           nothing
{burglary ↦→ T, disabled ↦→ T, calls ↦→ T}                0.01 · 0.1 · 0.05           nothing

Note that probabilistic programs in this language can also take arguments, in which case their probability distribution on dictionaries is parametrized by possible values of their arguments 푥 (denoted 푝(·; 푥)), and the function 푓 maps arguments 푥 and dictionaries 휏 to return values. The approach to defining semantics via a probability distribution on traces is called trace semantics, and is well-covered formally in the probabilistic programming literature. The most important difference between the probabilistic programming languages used in Gen and most other probabilistic programming languages with trace-based semantics is that Gen’s probabilistic programming languages use user-defined addresses for random choices (instead of the address being determined by its order in the execution [15] or its structural location in the abstract syntax tree [127]). It is important that the addresses of random choices in Gen are intuitive for the user, because in Gen, users write inference code that refers to specific random choices by their addresses.
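For instance, the following sketch (with illustrative names) is a Dynamic Modeling Language program that takes arguments, so that its distribution 푝(·; 푥) on dictionaries is parametrized by 푥 = (n, p_heads):

# p(·; n, p_heads) assigns probability to dictionaries with addresses
# (:flip, 1), ..., (:flip, n); changing the arguments changes the distribution.
@gen function coin_flips(n::Int, p_heads::Float64)
    for i in 1:n
        {(:flip, i)} ~ bernoulli(p_heads)
    end
    return nothing
end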

Observed data and test functions are external to the probabilistic program Another departure from many recent probabilistic programming languages is that Gen’s probabilistic programming languages do not include observe or factor statements. Therefore, programs always define normalized probability distributions 푝 from which it is trivial to sample by running the program. Only by pairing a program with a dictionary 휌 containing observed data can we define a conditional distribution. For example, we can represent the observation that we received a phone call from a neighbor with the dictionary 휌 = {calls ↦→ T}. The conditional distribution induced by burglary_model together with

26 the observed data 휌 is shown in the table below:

휎                                             푝(휎|휌)
{burglary ↦→ F, alarm ↦→ F}                   0.7912
{burglary ↦→ F, alarm ↦→ T}                   0.1119
{burglary ↦→ T, disabled ↦→ F, alarm ↦→ F}    0.0004
{burglary ↦→ T, disabled ↦→ F, alarm ↦→ T}    0.0956
{burglary ↦→ T, disabled ↦→ T}                0.0008

Note that in addition to the observed data, the test function 푔 is also separate from the program that defines the generative model. For example, if we are interested in the conditional probability that there was a burglary, given that we received a phone call, the test function is 푔(휎) = [휎[burglary]], which evaluates to 0 if 휎[burglary] = F and 1 if 휎[burglary] = T. For this choice of 푔 and 휌 = {calls ↦→ T}, the expected value is:

∑_휎 푔(휎)푝(휎|휌) = 푝({burglary ↦→ T, disabled ↦→ F, alarm ↦→ F}|휌)
               + 푝({burglary ↦→ T, disabled ↦→ F, alarm ↦→ T}|휌)
               + 푝({burglary ↦→ T, disabled ↦→ T}|휌)
               = 0.097

The choices to define the observed data 휌 and the test function 푔 separately from the probabilistic program are motivated by modularity—the observed data and the test function frequently change for a given fixed model, and it is preferable to not have to modify the model code when the observed data or test function changes. Also, as we will see later, generative functions can be used to express auxiliary probability distributions that must support straightforward sampling, and Gen opts to use the same probabilistic programming languages for models and for these auxiliary distributions. Finally, Gen also uses data that is simulated from generative models as part of training components of inference algorithms. Excluding observe and factor statements from the modeling language ensures that it is always possible to generate simulated observed data simply by running the program.
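Because the program defines a normalized distribution that can be sampled by simply running it, the conditional expectation above can even be estimated naively by rejection: run the model repeatedly and average 푔 over the executions that agree with the observed data. The sketch below uses the syntax of the Julia implementation; Chapter 3 develops far more efficient estimators.

using Gen

# Estimate P(burglary = T | calls = T) by rejection sampling.
function naive_estimate(num_samples::Int)
    total, matched = 0.0, 0
    for _ in 1:num_samples
        choices = get_choices(simulate(burglary_model, ()))
        if choices[:calls]                      # τ matches ρ = {calls ↦→ T}
            matched += 1
            total += choices[:burglary] ? 1.0 : 0.0
        end
    end
    return total / matched                      # ≈ 0.097 for large num_samples
end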

Dictionaries: 휌, 휎, 휏, 휐
Literal dictionaries: {burglary ↦→ F, alarm ↦→ F}
Probability distributions: 푝, 푞
Instances of the generative function ADT: 풫 = (푝, 푓), 풬 = (푞, 푓)
Map from probabilistic programs to generative functions: ⟦·⟧
Generative function ADT operations: simulate, generate
Instances of the trace ADT: t = (풫, 푥, 휏), s = (풬, 푥, 휎)
Trace ADT operations: logpdf, choices, update

Figure 1-3: Notation used in this introductory section

1.2.3 Abstract data types for generative functions and traces

This thesis is concerned largely with the design and implementation of an API for implementing Monte Carlo and variational probabilistic inference algorithms (Monte Carlo algorithms are the focus of the thesis). The design of Gen’s API is based on the observation that these algorithms can be broken down into a small set of primitive operations whose semantics are derived from the trace-based semantics of probabilistic programs. The API consists of two abstract data types (ADTs)—one for generative functions and one for execution traces of generative functions. These ADTs encapsulate and automate low-level computations in inference algorithms, so that code that uses these ADTs is relatively high-level, abstract, resembles algorithm pseudocode, and has reduced surface area for bugs. The ADTs also provide a clean abstraction barrier that separates probabilistic programming language implementation and compiler architecture (which are internal to the implementation of the ADTs) from probabilistic inference algorithms (which are specified using the ADTs). This section describes a simplified version of the generative function and trace ADTs, and the next section introduces how they can be used to perform probabilistic inference. The body of the thesis describes the full set of ADT operations, and shows how they can be used for a broad set of inference algorithms. Note that it is not necessary to understand approximate inference algorithms in order to understand and implement these ADTs. This is an important benefit of Gen’s design, because it allows probabilistic programming language design and implementation to be decoupled from the design of inference algorithms.

In Gen, the compiler for a probabilistic programming language generates specialized generative function and trace ADTs for each probabilistic program. The data stored in a generative function ADT is a generative function 풫 = (푝, 푓), as defined above. The data stored in a trace ADT is a tuple t = (풫, 푥, 휏) where 풫 is a generative function, 푥 are arguments to the function, and the dictionary 휏 stores the value of each random choice made during a possible execution of the generative function (i.e. 휏 such that 푝(휏; 푥) > 0). We now introduce a subset of the operations supported by these ADTs.

Simulate operation The first operation supported by the generative function ADT is 풫.simulate(푥), which takes arguments 푥 to the generative function, samples a dictionary of random choices 휏 according to the distribution 푝, and returns the resulting trace t = (풫, 푥, 휏). This operation allows us to sample execution traces from generative functions.

Example: For 풫 = burglary_model, a call to 풫.simulate(푥) returns one of its 10 possible traces. It returns the trace t = (풫, 푥, {burglary ↦→ F, alarm ↦→ F, calls ↦→ F}) with probability 0.99 · 0.99 · 0.95, returns the trace t = (풫, 푥, {burglary ↦→ F, alarm ↦→ F, calls ↦→ T}) with probability 0.99 · 0.99 · 0.05, and so on.
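These probabilities can be checked empirically by repeated simulation. The sketch below is written in Julia-style pseudocode using the simplified API of this introduction (the Julia spellings Gen.simulate and Gen.get_choices are introduced in Section 1.2.6); it is an illustration, not code from the thesis:

hits = 0
for _ in 1:100_000
    t = Gen.simulate(burglary_model, ())
    choices = Gen.get_choices(t)   # the dictionary 휏
    if !choices[:burglary] && !choices[:alarm] && !choices[:calls]
        hits += 1
    end
end
println(hits / 100_000)   # ≈ 0.99 · 0.99 · 0.95 ≈ 0.931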

Generate operation The second generative function ADT operation, 풫.generate(푥, 휏), also returns an execution trace t = (풫, 푥, 휏), but instead of sampling the random choices 휏 according to 푝, it takes them as input. This operation may seem overly simple; it will be extended in the thesis with the ability to take a partial dictionary that contains only some of the choices and fill in the rest stochastically. We do not formalize this here, to keep the introduction accessible.

Example: For 풫 = burglary_model, the call 풫.generate(푥, {burglary ↦→ F, alarm ↦→ F, calls ↦→ F}) returns the trace t = (풫, 푥, {burglary ↦→ F, alarm ↦→ F, calls ↦→ F}).

Logpdf operation The first operation supported by the trace ADT is t.logpdf(), which returns the log probability log 푝(휏; 푥) that the random choices in the trace would have been sampled if 푥 were the arguments to the generative function. This is typically the sum of the log-probabilities of each random choice made.

Example: For 풫 = burglary_model and t = (풫, 푥,{burglary ↦→ F, alarm ↦→ F, calls ↦→ F}), we have t.logpdf() = log(0.99 · 0.99 · 0.95).

Choices operation The second operation supported by the trace ADT is t.choices(), which for t = (풫, 푥, 휏 ) returns the dictionary 휏 .

Example: For t = (풫, 푥,{burglary ↦→ F, alarm ↦→ F, calls ↦→ F}) we have t.choices() = {burglary ↦→ F, alarm ↦→ F, calls ↦→ F}.

Update operation The third trace ADT operation is t.update(푥′, 훿_푋, 휐). This operation is more complex—it allows the arguments to, and the random choices made by, an execution trace to be modified. Suppose t = (풫, 푥, 휏). The first argument to this operation (푥′) provides new arguments to the generative function, which may be different from the arguments 푥 that are stored in the initial execution trace t. The second argument is a change hint 훿_푋 that provides optional information about the difference between the original arguments 푥 and the new arguments 푥′, which allows an implementation to perform the operation more efficiently via incremental computation. The third argument 휐 is a dictionary that contains entries for any addresses that are included in the previous trace but whose value should be changed, as well as entries for any addresses that were not included in the previous trace but should be included in the new trace. The operation returns (t′, 휐′, log 푤, 훿_푌) where t′ = (풫, 푥′, 휏′) is a new trace with choices 휏′ constructed from 휏 and 휐. The other return values are metadata about the change from t to t′. In particular, log 푤 is the log ratio of the new probability to the previous probability (log(푝(휏′; 푥′)/푝(휏; 푥))), and 휐′ is the dictionary that, if passed to the update operation on t′, would reverse the update and result in trace t. Note that this operation does not mutate t. Traces are immutable.

Example: Figure 1-4 illustrates the application of the update operation to a trace t = (풫, 푥, 휏) where 풫 = burglary_model and 휏 = {burglary ↦→ T, disabled ↦→ T, calls ↦→ T}. Note that this generative function takes no arguments, so both 푥 and 푥′ are the empty tuple. We will focus on how 휐 determines the change in the choices from 휏 to 휏′, and the log weight log 푤. On the left of the figure is the initial trace t, and the argument 휐 that specifies the change to the choices. The left side also shows the lines of code in the probabilistic program that defines burglary_model, with the lines that are visited and sample random choices shown in black. The dictionary 휐 indicates that the value at address disabled should be set to false. As shown in the code listing on the right, this causes the control flow in the program to change, and there is now a random choice sampled at address alarm. Therefore, the dictionary 휐 also contains an entry for address alarm. The resulting dictionary 휏′ is shown on the right. The ratio 푝(휏′; 푥′)/푝(휏; 푥) is shown at the bottom. Because only some of the random choices were changed from 휏 to 휏′, two of the factors cancel in this ratio.

t = (burglary_model, 푥, 휏)  --update-->  t′ = (burglary_model, 푥′, 휏′)

휏  = {burglary ↦→ T, disabled ↦→ T, calls ↦→ T}
휏′ = {burglary ↦→ T, disabled ↦→ F, alarm ↦→ F, calls ↦→ T}

휐  = {disabled ↦→ F, alarm ↦→ F}
휐′ = {disabled ↦→ T}

[The figure shows the source code of burglary_model twice, once for each trace, with the visited lines that sample random choices highlighted:]

@gen function burglary_model()
    burglary ∼ bernoulli(0.01)
    if burglary
        disabled ∼ bernoulli(0.1)
    else
        disabled = false
    end
    if !disabled
        alarm ∼ bernoulli(..)
    else
        alarm = false
    end
    calls ∼ bernoulli(..)
end

푝(휏′; 푥′)/푝(휏; 푥) = (0.01/0.01) · (0.9/0.1) · 0.06 · (0.05/0.05)

Figure 1-4: Illustration of the ‘update’ operation of the trace ADT
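As a sketch, the update illustrated in Figure 1-4 can be written in Julia-style pseudocode, mirroring the simplified signature t.update(푥′, 훿_푋, 휐) used above (the spelling, argument order, and return values of the full Gen.jl operation differ, and the choicemap syntax is borrowed from Section 1.2.6):

t = Gen.simulate(burglary_model, ())        # suppose this produced 휏 from the figure
v = Gen.choicemap()
v[:disabled] = false                        # change an existing choice
v[:alarm] = false                           # supply the newly reachable choice
(t_new, v_rev, log_w, _) = Gen.update(t, (), nothing, v)
# if the initial trace matched the figure, log_w == log(0.9/0.1) + log(0.06)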

1.2.4 Generating implementations of the abstract data types from the source code of probabilistic programs

Implementations of the generative function and trace ADTs in the previous section are statically compiled from probabilistic programs. Because the data types are abstract, their implementations can be changed without requiring a change to the inference code that uses them (we will introduce inference code that uses these ADTs shortly, in Section 1.2.6). Chapter 5 describes several approaches for generating generative function and trace ADTs. In particular, it presents two probabilistic programming languages that are part of Gen—the Dynamic Modeling Language (DML) that we saw earlier, and the Static Modeling Language (SML). These two languages strike different tradeoffs between expressiveness and performance of their ADT implementations. The syntax of DML includes most of Julia's syntax, and supports standard Julia control flow constructs including recursion. But the high expressiveness of this language makes generating efficient implementations of the ADTs more difficult, because statically analyzing arbitrary general-purpose Julia code is difficult. In contrast, SML uses a restricted set of control flow constructs, and is amenable to straightforward static analysis that is used to statically specialize the implementation of the ADT and achieve better performance. But while SML has better performance, it is less natural to learn for users who are accustomed to programming in languages like Julia and Python. Because DML and SML produce the same ADTs, the same user inference code can be applied to models defined in either language. This allows users to start with the more familiar DML probabilistic programming language for their model, and then migrate to the more complex SML language if and when they need better performance, without modifying their inference code. Gen has also been extended with domain-specific modeling languages that implement the generative function and trace ADTs, and are more restrictive and more performant than even SML. Decoupling the probabilistic programming language compiler from user inference code is a distinctive feature of Gen's design that defines new directions for future work on compiling probabilistic programs.

Two probabilistic programs with the same semantics Consider the DML program below, which defines a generative function hmm_dynamic that describes a Hidden Markov Model (HMM), a classic generative probabilistic model:

@gen function hmm_dynamic()
    z = 1
    for i in 1:1000
        z = ({:steps=>i=>:z} ∼ categorical(A[z,:]))
        {:steps=>i=>:y} ∼ categorical(B[z,:])
    end
end

The variables A and B are the ‘transition matrix’ A ∈ ℝ^(100×100) and ‘emission matrix’ B ∈ ℝ^(100×50) of the HMM, with entries denoted 푎_{푖,푗} and 푏_{푖,푗}. Each row of A (denoted A[i,:]) and each row of B (denoted B[i,:]) are vectors that sum to one, and represent probability distributions on ‘hidden states’ and ‘observed states’, respectively. There are 100 possible values of each hidden state, 50 possible values for each observed state, and 1000 time steps. The addresses of the random choices representing the hidden states are (:steps=>i=>:z) for each i ranging from 1 to 1000, and the addresses of the random choices representing the observed states are (:steps=>i=>:y) for each i ranging from 1 to 1000. Therefore, this program makes a total of 2000 random choices. Note that => is Julia syntax for constructing a pair (i.e. ‘cons’), so these addresses are linked lists with three entries. Any Julia value can be used as an address in a DML program. This program uses Julia's for loop to iterate through the time steps and consecutively (i) sample the current hidden state from a discrete probability distribution (using categorical) that depends on the previous hidden state and (ii) sample the current observed state from a discrete distribution that depends on the current hidden state.

The SML probabilistic program below defines a generative function called hmm_static that is semantically equivalent to hmm_dynamic. That is, it defines the same probability distribution on dictionaries, and the same return value function (neither program has an explicit return value, and therefore both return nothing).

@gen (static) function hmm_static()
    steps ∼ Unfold(step)(1000, 1)
end

@gen (static) function step(i, z_prev)
    z ∼ categorical(A[z_prev,:])
    y ∼ categorical(B[z,:])
    return z
end

We defer an explanation of the syntax of SML to Chapter 5. But briefly, SML does not permit use of Julia for loops. Instead, this program uses a special construct called Unfold to express the sequence over steps. The program is also factored into two parts—an inner part that expresses what happens during each step (step) and an outer part that applies step sequentially using Unfold. Like the DML program, this program samples random choices at addresses (:steps=>i=>:z) and (:steps=>i=>:y) for each i ranging from 1 to 1000. Indeed, we used addresses of this form in the DML version of the program specifically so that the two programs would sample at the same addresses.

Dynamic Modeling Language compiler The DML compiler handles a very expressive language, but generates ADT implementations that are asymptotically inefficient. This compiler uses a generic Julia dictionary data type to store the random choices in the trace. Most of the trace ADT operations, including update, are implemented by transforming the body of the program into an executable Julia function, by replacing each random choice expression with a call to an effect handler. This is a very simple compilation strategy that makes it straightforward for DML to support all of Julia's control flow operations. However, this approach means that running an update operation involves performing an end-to-end execution of the transformed probabilistic program. Therefore, all update operations scale linearly in the number of addresses in the input trace, even when the optimal scaling behavior is sublinear. In particular, for updates to traces that only modify the values of a few addresses at a time (i.e. 휐 has only a few entries), the generated ADT implementations are asymptotically inefficient relative to an optimal implementation.

Static Modeling Language compiler The SML compiler first translates the probabilistic program into an intermediate representation based on static directed acyclic dependency graphs, and generates a Julia record type, specialized to the specific random choices made by the program, to store the random choices in the trace. The code for the update trace operation is generated via static analysis of the intermediate representation.

Specifically, static analysis of the conditional independence properties, which can be read off the static dependence graph, is used to avoid the unnecessary re-execution of fragments of the probabilistic program that occurs in the code generated by the DML compiler. The resulting implementations of the ADT operations are asymptotically efficient and have lower constant-factor overhead, because they use a more efficient underlying data structure for the trace that is specialized to the probabilistic model.

Hand-compiling implementations of the ADTs It is also possible to implement the generative function and trace ADTs directly for a probabilistic model, without using a probabilistic programming language or compiler. This is more involved, but allows the implementation to be more heavily specialized to the model, which can result in better performance than with an ADT generated from a probabilistic program using a compiler. The ability to migrate one's model from a probabilistic-program-based ADT implementation to a hand-coded ADT implementation is a useful affordance for performance optimization of inference code, especially in performance-constrained application settings. Note that the inference code that uses the ADT does not need to be modified when migrating from an ADT implementation based on probabilistic programs to a hand-coded ADT implementation.

Trace ADT implementation                     Time per operation
Compiled from Dynamic Modeling Language      6.78 ms
Compiled from Static Modeling Language       15.4 휇s
Hand-coded                                   1.65 휇s

Table 1.2: Performance of different implementations of the same trace ADT

Performance of different ADT implementations We now illustrate the potential for performance differences between different implementations of the trace ADT. Suppose we are given a trace t of 풫 = hmm_dynamic or 풫 = hmm_static containing the following choices:

휏 = {(:steps=>1=>:z) ↦→ 푧_1, ..., (:steps=>1000=>:z) ↦→ 푧_1000, (:steps=>1=>:y) ↦→ 푦_1, ..., (:steps=>1000=>:y) ↦→ 푦_1000}

For this model, the probability of a trace is 푝(휏) = ∏_{푖=1}^{1000} 푎_{푧_{푖−1},푧_푖} 푏_{푧_푖,푦_푖} where 푧_0 := 1. Suppose we want to run an update operation on this trace, where we update the value of the 346th hidden state from 푧_346 to 푧′_346. That is:

t.update(푥′, 훿_푋, 휐) where 휐 = {(:steps=>346=>:z) ↦→ 푧′_346}

The resulting trace t′ = (풫, 푥′, 휏′) has choices:

휏′ = {(:steps=>1=>:z) ↦→ 푧′_1, ..., (:steps=>1000=>:z) ↦→ 푧′_1000, (:steps=>1=>:y) ↦→ 푦′_1, ..., (:steps=>1000=>:y) ↦→ 푦′_1000}

where 푧′_푖 = 푧_푖 for all 푖 ≠ 346, and 푦′_푖 = 푦_푖 for all 푖. Recall that one of the outputs of the update operation is log 푤, which is:

log 푤 = log(푝(휏′)/푝(휏)) = ∑_{푖=1}^{1000} (log 푎_{푧′_{푖−1},푧′_푖} + log 푏_{푧′_푖,푦′_푖}) − ∑_{푖=1}^{1000} (log 푎_{푧_{푖−1},푧_푖} + log 푏_{푧_푖,푦_푖})

The code for this operation that is generated by the DML compiler computes log 푤 by computing each of the individual terms and summing. However, for this model and this particular call to update, most terms in the first sum cancel with a term in the second sum, because most of the 푧′_푖 and 푧_푖 are equal, and all of the 푦′_푖 and 푦_푖 are equal. Therefore, the log weight can be simplified to:

log 푤 = (log 푎_{푧_345,푧′_346} − log 푎_{푧_345,푧_346}) + (log 푎_{푧′_346,푧_347} − log 푎_{푧_346,푧_347}) + (log 푏_{푧′_346,푦_346} − log 푏_{푧_346,푦_346})

The implementation of the update operation that is generated by the SML compiler only computes these six necessary terms, because it uses static analysis to identify the opportunity for cancellation. The implementation of the operation for a reasonable hand-coded implementation of the ADT has the same asymptotic scaling behavior as the SML implementation, but can have lower constant-factor overhead because it is manually specialized to the model. Table 1.2 shows a comparison of the performance of the different ADT implementations for this update operation, measured by taking the median running time across 1000 runs. The ADT implementation generated by the SML compiler is almost 500x faster than the implementation from the DML compiler. The hand-coded implementation is approximately 8x faster than the implementation generated by the SML compiler.
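To make the cancellation concrete, the sketch below computes the simplified log weight directly. The helper hmm_update_logweight is hypothetical (it is not part of Gen), and assumes direct access to A, B, and the relevant old and new hidden states:

function hmm_update_logweight(A, B, z_prev, z_old, z_new, z_next, y)
    # only the three factors of p(휏) that mention the changed hidden state survive
    (log(A[z_prev, z_new]) - log(A[z_prev, z_old])) +   # transition into the step
    (log(A[z_new, z_next]) - log(A[z_old, z_next])) +   # transition out of the step
    (log(B[z_new, y]) - log(B[z_old, y]))               # emission at the step
end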

Interoperable probabilistic programming languages Gen's ADTs for generative functions and execution traces are compositional in the following sense. Consider a generative function that is constructed by sequencing two other generative functions. The ADT for the composite generative function can be implemented using the ADT operations of the two respective constituent generative functions, without needing the source code of the constituent functions. Gen's probabilistic programming languages have a language construct for invoking generative functions, which may have been compiled from the same, or different, probabilistic programming language, or hand-compiled. The resulting data structures used to implement trace ADTs store instances of the trace ADTs for the callee generative functions, called subtraces. The compositionality of Gen's ADTs allows the user to write a complex model that uses different probabilistic programming languages for different parts of the model. Also, users can implement specialized instances of the ADTs themselves for highly performance-sensitive parts of the model, if necessary. The interoperability between multiple probabilistic programming languages, and the interoperability with hand-coded implementations of the ADTs, gives users flexible routes to incremental performance optimization, which is important for (typically computationally intensive) probabilistic inference implementations.
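The following DML sketch illustrates this compositionality. The model names are hypothetical, and the callee could equally be an SML-compiled or hand-coded generative function, since the caller only uses its ADT operations:

@gen function sensor_model(z)      # hypothetical callee; could be DML, SML, or hand-coded
    y ∼ normal(z, 0.1)
    return y
end

@gen function outer_model()
    z ∼ normal(0, 1)
    obs ∼ sensor_model(z)          # the callee's trace is stored as a subtrace at :obs
end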

1.2.5 Approximate probabilistic inference algorithms

Before discussing how the generative function and trace ADTs can be used to implement inference algorithms, we first provide background on inference algorithms, using our mathematical framework of probability distributions on dictionaries. Recall that we defined probabilistic inference as the task of evaluating the conditional expectation of a test function under the conditional distribution induced by a model and observed data. For this task, the answer is a real number. To perform this type of task, some systems use symbolic analysis and simplification to try to simplify the symbolic expression down to a number (e.g. [42]). Although this approach is exact, it suffers from a major limitation—it may not always be possible to simplify fully, and systems based on this approach often take a long and unpredictable time to return an answer, or they time out, or they return expressions that are not fully simplified and therefore are not directly useful. This is problematic for many applications of probabilistic inference, including in performance-constrained and online settings, where timeouts or failures are not acceptable.

Monte Carlo approximate inference algorithms The challenges with exact probabilistic inference explain why most recent algorithmic work in probabilistic inference is concerned with approximation algorithms, with Monte Carlo and variational inference being the primary approaches, each with an extensive literature spanning decades [104, 126]. Gen supports both Monte Carlo and variational inference approaches, but this thesis will focus on Gen's support for Monte Carlo probabilistic inference algorithms. The Monte Carlo algorithms supported by Gen are randomized algorithms that return an approximation to the conditional distribution in the form of a collection of weighted samples of unobserved dictionaries 휎_1, ..., 휎_푛 with weights 푤_1, ..., 푤_푛. These collections of samples can then be used to estimate the expected value of a test function 푔 by taking a weighted average:

∑_{푖=1}^{푛} 푤_푖 푔(휎_푖) ≈ ∑_{휎} 푝(휎|휌) 푔(휎)    (1.1)

The estimate is on the left-hand side of ≈, and the true value is on the right-hand side. For example, suppose we are using the model burglary_model and we observe 휌 = {calls ↦→ T}, and we want to know the probability of a burglary—that is, the expected value of the test function 푔(휎) = [휎[burglary] = T] under the conditional distribution 푝(·|휌). Suppose we ran a Monte Carlo algorithm that returned the following data:

휎_1 = {burglary ↦→ F, alarm ↦→ F}                       푤_1 = 0.83
휎_2 = {burglary ↦→ T, disabled ↦→ F, alarm ↦→ F}         푤_2 = 0.06
휎_3 = {burglary ↦→ T, disabled ↦→ F, alarm ↦→ F}         푤_3 = 0.06
휎_4 = {burglary ↦→ T, disabled ↦→ T}                    푤_4 = 0.05

Then, the estimate would be:

∑_{푖=1}^{푛} 푤_푖 푔(휎_푖) = 0 · 0.83 + 1 · 0.06 + 1 · 0.06 + 1 · 0.05 = 0.17
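This calculation can be rendered as a short Julia sketch, with the values copied from the example above:

weights = [0.83, 0.06, 0.06, 0.05]    # 푤_1, ..., 푤_4 from above
gvals = [0.0, 1.0, 1.0, 1.0]          # 푔(휎_1), ..., 푔(휎_4): burglary indicator
estimate = sum(weights .* gvals)      # == 0.17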

Note that the estimate is stochastic—if we run the algorithm again, we will get a different estimate. There are two ways that such approximate inference algorithms can be evaluated, which focus on the asymptotic and non-asymptotic properties of the algorithm, respectively.

Asymptotically exact Monte Carlo inference algorithms First, we can ask whether the algorithm is asymptotically exact. That is, does the output of the algorithm converge almost-surely to the true value, as the computation budget 푛 is increased to infinity (there may be other parameters of the algorithm that factor into the computation budget, but the number of samples 푛 is the parameter that is used to characterize the asymptotic behavior):

∑_{푖=1}^{푛} 푤_푖 푔(휎_푖) →^{a.s.} ∑_{휎} 푝(휎|휌) 푔(휎)    (1.2)

Monte Carlo inference algorithms are often designed by construction to be asymptotically exact, as this provides some degree of confidence in the algorithm, and reduces questions of its correctness to questions of computation budget. There are several standard templates for constructing asymptotically exact Monte Carlo inference algorithms, including Markov chain Monte Carlo [121], importance sampling, and sequential Monte Carlo [33]. Whether or not an algorithm is indeed asymptotically exact depends on how these templates are instantiated, but the necessary conditions can be reasoned about statically. This reasoning is currently typically done manually by practitioners, but automatic static verification that the implementation of an approximate inference algorithm is asymptotically exact is a promising and active area of research [55, 7, 112, 73]. Dynamic statistical tests for detecting bugs that cause a failure to be asymptotically exact are widely used [47]. The emphasis on the asymptotic exactness of algorithms in the Monte Carlo literature is not surprising, since much of the probabilistic inference methodology was developed for use in statistics, which is often not heavily performance-constrained. For example, in statistics, practitioners often aim to (i) write asymptotically exact inference implementations, and to (ii) use a large enough computation budget so that the algorithm has (hopefully) converged to the asymptote (e.g. “overnight” [17]). Heuristic statistical tests can be used to detect lack of convergence in some cases [23].

Non-asymptotic approximation error of Monte Carlo inference algorithms In more performance-constrained applications of probabilistic inference like robotics, it is natural to ask about the non-asymptotic approximation error of the algorithm for some finite computation budget. In these settings, an approximate algorithm is run on a test problem for some computation budget and the results are compared with a reference value 푣, which ideally is the true value but is more often an estimate produced by an algorithm with a large computation budget (the ‘gold standard’). Evaluation involves computing the difference between the reference value and the values produced by the algorithm being evaluated, which is run 푚 times to account for its variability, because it is a randomized algorithm. The average error across 푚 runs is one numeric summary of the algorithm's error:

(1/푚) ∑_{푗=1}^{푚} |∑_{푖=1}^{푛} 푔(휎_{푖,푗}) 푤_{푖,푗} − 푣|    (1.3)

Here 휎_{푖,푗} and 푤_{푖,푗} are the 푖th sample and weight from the 푗th run of the algorithm. This approach relies on having trust in the gold-standard algorithm and its implementation, and trust that the evaluation results can be extrapolated from the test problem(s) to the real-world problem(s). This approach also depends on the function 푔: an algorithm that is accurate for one choice of 푔 may be inaccurate for another. Developing more rigorous methods for estimating the non-asymptotic approximation error of probabilistic inference algorithms is an important area of ongoing research [26, 61]. One approach to implementing probabilistic inference is to use the asymptotically exact algorithm templates listed above, but set the computation budget to trade off computational cost with non-asymptotic approximation error. How these algorithm templates are parametrized determines the relationship between computation cost and approximation error. For example, while it is not difficult to construct an asymptotically exact Markov chain Monte Carlo (MCMC) algorithm based on a simple generic MCMC kernel, constructing one that has low approximation error when only run for 100 steps is much more challenging, and usually requires customizing the MCMC kernel to match the problem characteristics. Similarly, in importance sampling, any proposal distribution subject to mild conditions results in an asymptotically exact sampler, but poor choices of proposal distribution require an enormous and impractical number of samples for accurate results, while other choices require only modest numbers of samples [21]. Therefore, customizing Monte Carlo algorithm templates with elements like custom proposal distributions and custom kernels is essential to writing efficient inference algorithms, and practitioners of inference routinely employ these customizations.

Example: Self-normalized importance sampling One standard Monte Carlo algorithm template for approximate inference is self-normalized importance sampling [104]. This algorithm template is parametrized by (i) a probabilistic model 푝, (ii) observed data 휌, (iii) a test function 푔, (iv) an auxiliary probability distribution 푞 that is called the proposal distribution, and (v) a number of samples 푛 to produce. Many other Monte Carlo inference algorithms also make use of auxiliary probability distributions. We now briefly describe how self-normalized importance sampling works, using probability distributions and test functions on dictionaries, as introduced in Section 1.2.1. Self-normalized importance sampling produces a weighted collection of samples 휎_1, ..., 휎_푛 of the unobserved random choices, with weights 푤_1, ..., 푤_푛. Each sample 휎_푖 is sampled independently from the proposal distribution (휎_푖 ∼ 푞). After all the samples are collected, the weight for each sample is computed using the following formula:

푤_푖 := (푝(휎_푖 ⊕ 휌)/푞(휎_푖)) / (∑_{푗=1}^{푛} 푝(휎_푗 ⊕ 휌)/푞(휎_푗))    (1.4)

where 휎 ⊕ 휌 denotes the dictionary formed by merging the two dictionaries 휎 and 휌, which have disjoint sets of addresses. The weighted collection can then be used to estimate the expected value of a test function using Equation (1.1). The proposal distribution 푞 must satisfy one property: for every unobserved dictionary 휎 that has some nonzero probability under the conditional distribution, there needs to be a nonzero probability that it will be sampled from the proposal. That is,

푝(휎|휌) > 0 =⇒ 푞(휎) > 0 (1.5)

If this condition holds, then the strong law of large numbers can be used to show that self-normalized importance sampling is asymptotically exact as 푛 increases to infinity, as in Equation (1.2). The intuition behind this algorithm is that if we could sample each 휎_푖 from the conditional distribution 푝(·|휌), then the samples could all be equally weighted with 푤_푖 = 1/푛 (this is called simple Monte Carlo, but it is not typically possible because we cannot sample exactly from the conditional distribution in practice). The weights in Equation (1.4) correct for the fact that the proposal distribution is not the same as the conditional distribution, by giving less weight to samples that are over-sampled by the proposal distribution relative to the conditional distribution and giving more weight to samples that are under-sampled by the proposal distribution. Because the probabilities used in the weight calculation are often very small, for numerical stability the weight calculation is typically performed in ‘log-space’ as follows:

푚 ← max_푖 {log 푝(휎_푖 ⊕ 휌) − log 푞(휎_푖)}    (1.6)

ℓ ← 푚 + log ∑_{푖=1}^{푛} exp(log 푝(휎_푖 ⊕ 휌) − log 푞(휎_푖) − 푚)    (1.7)

푤_푖 ← exp(log 푝(휎_푖 ⊕ 휌) − log 푞(휎_푖) − ℓ)  for 푖 = 1, ..., 푛    (1.8)
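Equations (1.6-1.8) have a direct Julia rendering, shown below as a sketch; the input log_w[i] is assumed to hold log 푝(휎_푖 ⊕ 휌) − log 푞(휎_푖):

function normalize_log_weights(log_w::Vector{Float64})
    m = maximum(log_w)                      # Equation (1.6)
    l = m + log(sum(exp.(log_w .- m)))      # Equation (1.7): stable log-sum-exp
    return exp.(log_w .- l)                 # Equation (1.8): weights sum to one
end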

The next section will illustrate how self-normalized importance sampling can be implemented using the abstract data types introduced earlier, using inference in burglary_model as an example, and how the choice of proposal distribution 푞 affects the efficiency of self-normalized importance sampling algorithms.

1.2.6 Implementing inference algorithms with abstract data types

Users of Gen implement inference algorithms in a general-purpose host language using the abstract data types (ADTs) that were introduced in Section 1.2.3. In the Gen implementation described in this thesis, the host language is Julia. Users can implement inference algorithms that are either exact (in rare cases when this is feasible), asymptotically exact, or asymptotically inexact. Users can implement approximate algorithms that have either high or low non-asymptotic error. Gen does not perform any analysis of the user's Julia code that implements their inference algorithm. However, Gen does assist users in implementing asymptotically exact algorithms in several ways: Gen includes a library of Julia implementations of the asymptotically exact algorithm templates listed above (e.g. self-normalized importance sampling). While the properties of the algorithm depend on how these templates are instantiated, implementing the templates for users reduces the surface area for bugs in user code. Gen also includes library functions for constructing primitive MCMC kernels that have certain properties that are necessary (but not sufficient) for asymptotic exactness of MCMC algorithms, as well as a DSL for constructing composite kernels that are automatically instrumented with dynamic checks that detect common bugs in composite MCMC kernels. Finally, Gen's ADTs automate and encapsulate the low-level computations of probabilities, probability densities, derivatives, and samplers, which significantly reduces the code size and, as a result, the surface area for bugs in user code.

Implementing the self-normalized importance sampling template using Gen We now illustrate Gen’s programming model using an example. The Julia code below implements the self-normalized importance sampling template using Gen’s ADTs:

1   function importance_sampling(model::Gen.GenerativeFunction, observations,
2                                proposal::Gen.GenerativeFunction, n::Int)
3       traces = Vector{Any}(nothing, n)
4       log_weights = [NaN for i in 1:n]
5       for i in 1:n
6           proposal_trace = Gen.simulate(proposal, ())
7           all_choices = merge(observations, Gen.get_choices(proposal_trace))
8           (traces[i], _) = Gen.generate(model, all_choices)
9           log_weights[i] = Gen.logpdf(traces[i]) - Gen.logpdf(proposal_trace)
10      end
11      weights = exp.(log_weights .- Gen.logsumexp(log_weights))
12      return (traces, weights)
13  end

This implementation is adapted from an implementation in Gen’s Julia library, but there is no difference between inference code in Gen’s library and code that users write—both make use of the same generative function and trace ADTs introduced in Section 1.2.3. We will now walk through this code, describing how it behaves when run on the generative model burglary_model to estimate the probability that there was a burglary (휎[burglary] = T) given the observed data that a phone call was received (휌 = {calls ↦→ T}). The true answer to this inference query was computed in Section 1.2.2. First, consider the arguments to the function. The first argument is an instance of the generative function ADT, and defines the generative model:

model::Gen.GenerativeFunction

We will pass the generative function burglary_model for this argument. The second argument is a dictionary containing the observed data:

observations

We will pass the dictionary 휌 := {calls ↦→ T} for this argument. In the Julia implementation of Gen, we can construct this dictionary using the following code:

observations = Gen.choicemap() observations[:calls] = true

The third argument is another instance of the generative function ADT that describes the proposal distribution:

proposal::Gen.GenerativeFunction

Recall that the proposal distribution is an auxiliary probability distribution (푞) that is used by self-normalized importance sampling. Gen uses the same representation for generative models and auxiliary probability distributions like proposal distributions. That is, proposal distributions are expressed in the same probabilistic programming languages as generative models. For example, consider the probabilistic program below, which defines a generative function proposal1 = (푞, 푓):

@gen function proposal1()
    burglary ∼ bernoulli(0.01)
    if burglary
        disabled ∼ bernoulli(0.1)
    else
        disabled = false
    end
    if !disabled
        alarm ∼ bernoulli(if burglary 0.94 else 0.01 end)
    end
    return nothing
end

where 푞 and 푓 are given by:

휎                                             푞(휎)                푓(휎)
{burglary ↦→ F, alarm ↦→ F}                   0.99 · 0.99         nothing
{burglary ↦→ F, alarm ↦→ T}                   0.99 · 0.01         nothing
{burglary ↦→ T, disabled ↦→ F, alarm ↦→ F}    0.01 · 0.9 · 0.06   nothing
{burglary ↦→ T, disabled ↦→ F, alarm ↦→ T}    0.01 · 0.9 · 0.94   nothing
{burglary ↦→ T, disabled ↦→ T}                0.01 · 0.1          nothing

By comparing with the distribution 푝 for burglary_model, we can verify that the distribution 푞 satisfies the requirement for validity of importance sampling in Equation (1.5). The fourth and final argument to the importance sampling function is the number of samples (푛). Now, examine the body of importance_sampling. Lines 3 and 4 initialize arrays to store the traces of model and the log weights log 푤_푖 for each of the 푛 samples:

traces = Vector{Any}(nothing, n)
log_weights = [NaN for i in 1:n]

Next, we loop over each sample and populate these arrays. Line 6 samples a trace of the generative function proposal using the generative function ADT operation simulate:

proposal_trace = Gen.simulate(proposal, ())

This line implements the ADT call proposal.simulate(푥) where 푥 = (). The empty tuple is passed for 푥 because the proposal takes no arguments. The resulting value proposal_trace is an instance of Gen's trace ADT that contains the data (proposal, (), 휎), where 휎 was sampled from the proposal distribution (휎 ∼ 푞). Line 7 first retrieves the dictionary 휎 from this trace via the choices ADT operation, which is implemented in Julia with:

Gen.get_choices(proposal_trace)

Then, Line 7 merges the resulting dictionary with the observations dictionary to produce a dictionary called all_choices:

all_choices = merge(observations, Gen.get_choices(proposal_trace))

This corresponds to the mathematical operation 휎 ⊕ 휌. Next, Line 8 uses the generate ADT operation on the generative function model, passing in the dictionary (휎 ⊕ 휌) as 휏:

(traces[i], _) = Gen.generate(model, all_choices)

The next line (Line 9) uses the trace ADT operation logpdf separately on the model trace and the proposal trace to compute log 푝(휎 ⊕ 휌) − log 푞(휎):

log_weights[i] = Gen.logpdf(traces[i]) - Gen.logpdf(proposal_trace)

After the for loop has completed, the array traces contains traces of the model t_푖 = (model, (), 휏_푖) where 휏_푖 = 휎_푖 ⊕ 휌 for each 푖 = 1, ..., 푛. Next, Line 11 implements Equations (1.6-1.8):

weights = exp.(log_weights .- Gen.logsumexp(log_weights))

In particular, Gen.logsumexp implements Equations (1.6-1.7), and the broadcast exp implements Equation (1.8). Finally, on Line 12, the function returns the array of traces [t_1, ..., t_푛] and the array of weights [푤_1, ..., 푤_푛]:

return (traces, weights)

Applying self-normalized importance sampling in Gen Having defined the algorithm template for self-normalized importance sampling, we now apply it to our inference problem involving burglary_model using the following Julia code. First, we implement the test function 푔, which indicates whether there was a burglary or not, in Julia:

g(trace) = trace[:burglary]

Then, we run the importance sampling algorithm template, passing burglary_model as model and proposal1 as proposal. This generates the weighted collection of traces, which we use to estimate the expected value of the test function:

observations = Gen.choicemap() observations[:calls] = true n = 100 (traces, weights) = importance_sampling(burglary_model, observations, proposal1, n) estimate = sum([g(trace) for trace in traces] .* weights)

This run uses 푛 = 100 samples. Using the ground truth value computed earlier, we can evaluate the non-asymptotic approximation error of this algorithm. The algorithm was run 푚 = 1000 times for each of 21 different values of 푛 ranging from 푛 = 10 to 푛 = 3000. Histograms of the estimates for each value of 푛 are shown in Figure 1-5a. They converge on the true value, as expected from an asymptotically exact algorithm. The mean approximation error (Equation 1.3) as a function of running time is shown in the blue curve in Figure 1-5c. The running time for each 푛 is the median across all 푚 replicates.

[Figure: four histogram panels, for n = 10, n = 100, n = 1000, and n = 10000.]

(a) Histograms of estimates of the conditional probability of burglary via self-normalized importance sampling. The ground truth probability is shown in red.

[Figure: four histogram panels, for n = 10, n = 100, n = 1000, and n = 10000.]

(b) Estimates of the conditional probability using an alternative proposal (proposal2).

[Figure: average error versus running time in seconds (log scale), for proposal1 and proposal2.]

(c) Average approximation error of self-normalized importance sampling using two different proposal distributions to estimate a conditional probability.

Figure 1-5: Approximation error of self-normalized importance sampling algorithms

Effect of proposal distribution on non-asymptotic approximation error Recall that the choice of proposal distribution 푞 determines how many samples 푛 are needed to achieve a given approximation error. We now illustrate this for our running example, by defining an alternative proposal distribution:

@gen function proposal2()
    burglary ∼ bernoulli(0.5)
    if burglary
        disabled ∼ bernoulli(0.5)
    else
        disabled = false
    end
    if !disabled
        alarm ∼ bernoulli(if burglary 0.5 else 0.5 end)
    end
end

Figure 1-5b shows estimates from self-normalized importance sampling with this proposal distribution for different numbers of samples, and Figure 1-5c compares the approximation error to that of the algorithm using proposal1. Both proposals result in asymptotically exact algorithms, but the non-asymptotic error decreases much more rapidly for the algorithm that uses proposal2 than for the algorithm that uses proposal1 (note the log scale on the x-axis). The number of samples in self-normalized importance sampling needed to achieve a given level of error depends on how close the proposal distribution is to the conditional distribution of interest [21], which is why practitioners of inference routinely employ custom proposals. The proposal distributions used for the running example are simple, but proposal distributions for real inference problems often use neural networks, heuristics, or even probabilistic inference in other models, with important consequences for the efficiency of the inference algorithm. The importance of the choice of proposal distributions and other auxiliary probability distributions motivates the use of expressive probabilistic programming languages to specify them. In addition to the self-normalized importance sampling construct introduced here, the thesis presents constructs that allow users to express custom proposal distributions within MCMC and sequential Monte Carlo algorithms, and to train proposal distributions to be more efficient using machine learning.
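Because the importance_sampling template takes the proposal as an argument, switching proposals requires no other changes. For example, reusing the definitions from above:

(traces2, weights2) = importance_sampling(burglary_model, observations, proposal2, 100)
estimate2 = sum([g(trace) for trace in traces2] .* weights2)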

Chapter 2

Abstract Data Types for Inference: Generative Functions and Traces

This chapter proposes that algorithms for inference in generative models be implemented using an explicit software representation for generative models. We introduce two abstract data types for inference called generative functions and traces. Generative functions represent models, and traces represent the values of latent and observed random variables in models. Representing models explicitly, and defining fixed data types for models and the values of their variables, affords advantages that broaden the accessibility of probabilistic inference, improve the productivity of users of probabilistic inference, and help to manage the complexity of inference algorithm implementations. First, explicitly writing the model makes modeling assumptions explicit and more easily checked and modified than if the algorithm is implemented by translating the model and algorithm directly into low-level numerical code. Second, the same common low-level computations on models are used across inference algorithms and across models, which presents an opportunity for these operations to be implemented once as part of a system and reused, reducing user implementation burden and the frequency of bugs. Finally, by abstracting away the generation of data structures, density and gradient computations, incremental computation, and other implementation details of inference algorithms, the data types let users focus on using their knowledge of the problem to design efficient inference strategies.

Generative functions are an abstract mathematical representation of probabilistic models that is expressive enough to permit models with random structures. Each random variable (or random choice) in a model is assigned a unique name (or address) by the modeler. To a first approximation, a generative function is a probability distribution on dictionaries that map addresses to their values. Users define generative functions using modeling languages like Gen's Dynamic Modeling Language (DML), which allow code in a general-purpose language (e.g. Julia) to be interleaved with sampling statements where random choices are made and labeled with an address. An important feature of generative functions is that they are composable—it is possible to construct generative functions from other generative functions using regular function-call syntax in a modeling language. To support such compositions, generative functions have arguments and a return value in addition to their random choices, and modeling languages allow users to construct a hierarchical address namespace for random choices that mirrors the call tree of generative functions.

Generative functions and traces expose a core set of operations that are sufficient for implementing a broad array of inference algorithms. This chapter introduces operations including simulating traces from a generative function (simulate), generating a trace from arguments and values of random choices (generate), making an incremental update to a trace (update), evaluating the probability of a trace (logpdf), and computing gradients of the probability of a trace with respect to the values of random choices and the arguments to the generative function (gradients). This chapter gives examples of generative functions expressed mathematically, as well as with corresponding Gen DML code. Chapter 3 shows how the operations provided by generative functions and traces can be used to implement inference algorithms at a high level of abstraction. Chapter 4 extends the generative function and trace data types with additional capabilities, and Chapter 5 discusses how generative functions are compiled from modeling language code.

2.1 An abstract formal representation for generative models

This section describes a mathematical representation for generative models called the generative function. The design of the generative function representation is based on several desiderata. First, the representation should be flexible enough to represent models that employ discrete and continuous random variables and structure uncertainty (i.e. where the set of random variables sampled is itself random); therefore, a representation based on joint distributions over a fixed vector of random variables will not suffice. Second, the representation should be sufficient for implementing a wide variety of Monte Carlo and variational approaches to inference. Third, the representation should be minimal—it should not contain more features than are needed to implement these algorithms. In Chapter 4, we will extend generative functions with the ability to encapsulate some limited inference logic within themselves. (Incrementally extending generative functions and characterizing the added expressiveness introduced at each stage is intended to motivate each aspect of Gen's design, and to illuminate the type of tradeoffs between complexity, expressiveness, and performance that are inherent in probabilistic programming platform design more broadly.) Finally, it should be possible to translate programs written in probabilistic modeling languages into their mathematical representation as generative functions. In this chapter, I show examples of models written in Gen's Dynamic Modeling Language (DML) alongside their mathematical representation as generative functions.

2.1.1 Random choices, addresses, and choice dictionaries

Probabilistic generative models encode a joint probability distribution over a set of random variables. Because we seek to represent generative models where the set of random variables is itself random, care is needed in identifying these random variables. To distinguish between the standard notion of random variable and the notion used in generative functions (and adopting nomenclature from probabilistic programming), we use the phrase random choice instead of ‘random variable’. Random choices are identified by their address. We first formalize the setting where random choices take values from a finite or countably infinite set (like discrete random variables), and relax this restriction later in the section.

Definition 2.1.1 (Address universe). Let 퐴 be a finite or countably infinite set of addresses. For each 푎 ∈ 퐴, let 푉_푎 denote the domain of 푎, where 푉_푎 is finite or countably infinite. A pair (퐴, 푉), which defines a set of addresses and their domains, is called an address universe.

Example: Suppose 퐴 = {푎, 푏}, and 푉푎 = 푉푏 = {0, 1}. Then address universe (퐴, 푉 ) contains two addresses, and the domain of both addresses is {0, 1}.

Example: Suppose 퐴 = N, 푉1 = N and 푉푖 = {T, F} for 푖 ∈ {2, 3,...}. Then address uni- verse (퐴, 푉 ) contains a countably infinite number of addresses, where address 1 has domain N and other addresses (2, 3,...) have domain {T, F}.

A probabilistic generative model is traditionally defined as a probability distribution on 푛-tuples (푥_1, ..., 푥_푛) ∈ 푉_1 × · · · × 푉_푛, called a joint probability distribution, for some fixed set of 푛 random variables with domains 푉_1, ..., 푉_푛. But joint probability distributions cannot represent models where the set of random variables is itself random. Therefore, we define a probabilistic generative model as a probability distribution on a richer class of objects, namely associative arrays or dictionaries. We call these objects choice dictionaries.

Definition 2.1.2 (Choice dictionary). A choice dictionary in address universe (퐴, 푉) is a map 휎 : 퐴_휎 → ∪_{푎∈퐴} 푉_푎 where 퐴_휎 is a finite subset of 퐴 and where 휎(푎) ∈ 푉_푎 for all 푎 ∈ 퐴_휎. Equivalently, 휎 is a finite set of pairs {(푎_1, 푣_1), ..., (푎_푘, 푣_푘)} such that each 푎_푖 ∈ 퐴 appears once and 푣_푖 ∈ 푉_{푎_푖} for each 푖.

We will use bold lowercase Greek letters (usually 휌, 휎, 휏 and 휐) to denote choice dictionaries. We will use the syntax 휎[푎] := 휎(푎) to denote the value stored at address 푎. We will sometimes represent choice dictionaries using the notation {푎_1 ↦→ 푣_1, 푎_2 ↦→ 푣_2, ···}. For example, the choice dictionary 휎 with 휎[푎_1] = 0 and 휎[푎_2] = 1 is denoted {푎_1 ↦→ 0, 푎_2 ↦→ 1}. Assume that there exists some address universe (퐴, 푉). For a set 퐵 ⊆ 퐴, let 휎|_퐵 denote the restriction of 휎 to the set 퐵; that is, the dictionary obtained by removing entries for all addresses not in 퐵. We say that two choice dictionaries 휎 and 휏 agree (denoted 휎 ∼ 휏) if 휎[푎] = 휏[푎] for all 푎 ∈ 퐴_휎 ∩ 퐴_휏. For two disjoint choice dictionaries 휎 and 휏 (|퐴_휎 ∩ 퐴_휏| = 0), let 휎 ⊕ 휏 denote the merge of 휎 and 휏. That is, 휌 = 휎 ⊕ 휏 is a choice dictionary where 퐴_휌 = 퐴_휎 ∪ 퐴_휏 and 휌[푎] = 휎[푎] for all 푎 ∈ 퐴_휎 and 휌[푎] = 휏[푎] for all 푎 ∈ 퐴_휏. For a finite set 퐵 ⊆ 퐴, let 풯_퐵 denote the set of all choice dictionaries 휎 such that 퐴_휎 = 퐵; that is, the set of all choice dictionaries that contain an entry for each address in 퐵 and only addresses in 퐵. For a set 퐵 ⊆ 퐴, let 풯_퐵^⋆ denote the set of all choice dictionaries that contain entries for some subset of the addresses in 퐵:

풯_퐵^⋆ := ⋃_{퐶⊆퐵, |퐶|<∞} 풯_퐶
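As an illustration only (not Gen's implementation), the dictionary operations defined above can be sketched in Julia as follows:

const ChoiceDict = Dict{Any,Any}        # address -> value

restrict(σ, B) = ChoiceDict(a => v for (a, v) in σ if a in B)       # 휎|_퐵

# 휎 ∼ 휏: equal values on every shared address
agree(σ, τ) = all(τ[a] == v for (a, v) in σ if haskey(τ, a))

# 휎 ⊕ 휏: merge of dictionaries with disjoint address sets
function dict_merge(σ, τ)
    @assert isempty(intersect(keys(σ), keys(τ))) "addresses must be disjoint"
    merge(σ, τ)
end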

Example: For 퐴 = {푎, 푏} and 푉_푎 = 푉_푏 = {0, 1}, 풯_퐴^⋆ contains nine choice dictionaries:

풯_퐴^⋆ = { {},
          {푎 ↦→ 0}, {푎 ↦→ 1}, {푏 ↦→ 0}, {푏 ↦→ 1},
          {푎 ↦→ 0, 푏 ↦→ 0}, {푎 ↦→ 0, 푏 ↦→ 1}, {푎 ↦→ 1, 푏 ↦→ 0}, {푎 ↦→ 1, 푏 ↦→ 1} }

Note that choice dictionaries are always finite, but there may be no upper bound on the number of addresses they may contain. For example, for 퐴 = ℕ and 푉_푛 = {1} for all 푛 ∈ ℕ, there is a choice dictionary {1 ↦→ 1, 2 ↦→ 1, ..., 푘 ↦→ 1} ∈ 풯_퐴^⋆ for each 푘 ∈ ℕ. However, if each address has a countable domain, then the set of all choice dictionaries is countable.

Proposition 2.1.1. 풯_퐴^⋆ is countable for all address universes (퐴, 푉).

Proof. Recall that 퐴 is countable. Let 퐶 be a finite subset of 퐴. Then 풯_퐶 is countable because it is isomorphic to the Cartesian product of a finite set of countable sets (푉_푎 for each 푎 ∈ 퐶). Then 풯_퐴^⋆ is countable because it is the countable union of the countable sets 풯_퐶.

2.1.2 Probability distributions on choice dictionaries

Given an address universe (퐴, 푉), we denote probability distributions on choice dictionaries in 풯_퐴^⋆ by 푝, where 푝(휏) denotes the probability of choice dictionary 휏 ∈ 풯_퐴^⋆. Formally, 푝 is a map 푝 : 풯_퐴^⋆ → [0, 1] such that ∑_{휏∈풯_퐴^⋆} 푝(휏) = 1. We denote the support of 푝 by supp(푝) := {휏 ∈ 풯_퐴^⋆ : 푝(휏) > 0}. We will sometimes denote probability distributions on choice dictionaries using tables, with a row for each dictionary in the support.

Example: Making a single random choice The table below defines a probability distribution 푝 on choice dictionaries for 퐴 = {푎} and 푉_푎 = {T, F}.

휏            푝(휏)
{푎 ↦→ T}    0.3
{푎 ↦→ F}    0.7

Example: Making either one or two random choices For 퐴 = {푎, 푏} and 푉푎 = 푉푏 = {T, F}, the following probability distribution 푝 cannot be expressed as a standard joint distribution. With probability 0.3 there is a single random choice with address 푎 and with probability 0.7 there are two random choices, with addresses 푎 and 푏:

휏                     푝(휏)
{푎 ↦→ T, 푏 ↦→ T}    0.42
{푎 ↦→ T, 푏 ↦→ F}    0.28
{푎 ↦→ F}             0.3

Example: Making an unbounded number of random choices Probability distributions on choice dictionaries can have countably infinite support, due to making an unbounded number of random choices and/or due to individual random choices having a countably infinite domain. For example, consider the distribution 푝_1 for 퐴 = ℕ and 푉_푛 = {T, F} for all 푛 ∈ ℕ, and the distribution 푝_2 for 퐴 = {1} and 푉_1 = ℕ:

휏                              푝_1(휏)          휏           푝_2(휏)
{1 ↦→ F}                      0.5              {1 ↦→ 1}   0.5
{1 ↦→ T, 2 ↦→ F}             0.5²             {1 ↦→ 2}   0.5²
{1 ↦→ T, 2 ↦→ T, 3 ↦→ F}    0.5³             {1 ↦→ 3}   0.5³
...                            ...              ...         ...

Each of the example distributions given above satisfies the following property:

Definition 2.1.3 (Structured probability distribution on choice dictionaries). A probability distribution 푝 on choice dictionaries is called structured if for all 휏, 휏′ ∈ supp(푝), either 휏 = 휏′ or 휏[푎] ≠ 휏′[푎] for some 푎 ∈ 퐴_휏 ∩ 퐴_{휏′}. Equivalently, 휏, 휏′ ∈ supp(푝) and 휏 ∼ 휏′ implies that 휏 = 휏′.

Example: Probability distribution on choice dictionaries that is not structured Let 퐴 = {푎} and 푉_푎 = {T, F}. The distribution 푝 on choice dictionaries defined below is not structured. For 휏_1 = {} and 휏_2 = {푎 ↦→ T} we have 퐴_{휏_1} = ∅ ≠ {푎} = 퐴_{휏_2}, and 휏_1 and 휏_2 have no addresses in common to which they assign different values.

휏            푝(휏)
{}           0.5
{푎 ↦→ T}    0.25
{푎 ↦→ F}    0.25

Example: Another probability distribution on choice dictionaries that is not structured Let 퐴 = ℕ and 푉_푖 = {1} for all 푖 ∈ ℕ. The following is also not a structured probability distribution on choice dictionaries, because e.g. for 휏_1 = {1 ↦→ 1} and 휏_2 = {1 ↦→ 1, 2 ↦→ 1} we have 퐴_{휏_1} = {1} ≠ {1, 2} = 퐴_{휏_2} but 휏_1[1] = 휏_2[1]:

휏                             푝(휏)
{1 ↦→ 1}                     0.5
{1 ↦→ 1, 2 ↦→ 1}            0.25
{1 ↦→ 1, 2 ↦→ 1, 3 ↦→ 1}   0.125
...                           ...

The structured property allows differences in structure (the set of addresses) between two choice dictionaries in the support of a distribution 푝 to be associated with a difference in the value of some shared random choice. We will use this property in Section 1.2.3.
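Using the agree sketch from Section 2.1.1 above, the structured property can be checked over an explicitly enumerated support; this is a sketch (real supports may be countably infinite, so this only applies to finite enumerations):

function is_structured(support::Vector{ChoiceDict})
    for σ in support, τ in support
        # a violating pair agrees on all shared addresses but is not identical
        if agree(σ, τ) && σ != τ
            return false
        end
    end
    return true
end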

2.1.3 Marginal likelihood, conditioning, and expectation

In standard generative model representations, a joint probability distribution on latent and observed random variables is conditioned on the observed values of the observed random variables to obtain a conditional distribution on the latent random variables. This section defines an analogous notion of conditioning for probability distributions on choice dictionaries. We assume that observed data is represented as a choice dictionary 휌 ∈ 풯_퐴^⋆. There are two types of events associated with a choice dictionary 휌, denoted 퐸_휌 and 퐸′_휌, that result in two different notions of conditioning on 휌:

퐸_휌 = {휏 ∈ 풯_퐴^⋆ : 휏 ∼ 휌}  and  퐸′_휌 = {휏 ∈ 풯_퐴^⋆ : (퐴_휏 ⊇ 퐴_휌) ∧ (휏 ∼ 휌)}    (2.1)

퐸_휌 requires that 휏 agrees with 휌 on all addresses that it shares with 휌, but does not require that 휏 contains all addresses in 휌. 퐸′_휌 does require that 휏 contains all addresses in 휌.

Example: Two types of events associated with an observed choice dictionary For 휌 = {2 ↦→ T}, compare 퐸_휌 and 퐸′_휌 for the following collection of choice dictionaries 휏:

휏                              휏 ∈ 퐸_휌   휏 ∈ 퐸′_휌
{1 ↦→ F}                      ✓           ✗
{1 ↦→ T, 2 ↦→ F}             ✗           ✗
{1 ↦→ T, 2 ↦→ T, 3 ↦→ F}    ✓           ✓

Note that the two events 퐸_휌 and 퐸′_휌 are equivalent if 휌 has the following property.

Definition 2.1.4 (Existentially sound choice dictionary). A choice dictionary 휌 ∈ 풯_퐴^⋆ is existentially sound under distribution 푝 if 휏 ∈ supp(푝) implies 퐴_휌 ⊆ 퐴_휏.

We will proceed with defining conditioning using the first type of event (퐸_휌), because it simplifies the formalism of internal proposals that will be introduced in Chapter 4. Throughout the thesis we assume that dictionaries 휌 representing observed data are existentially sound, so there is no distinction between 퐸_휌 and 퐸′_휌. First, we define the marginal likelihood of a choice dictionary 휎 as the probability of the event 퐸_휎.

Definition 2.1.5 (Marginal likelihood of a choice dictionary). For a probability distribution 푝 on choice dictionaries and some 휎 ∈ 풯_퐴^⋆, the marginal likelihood of 휎 under 푝 is:

푝̄(휎) := ∑_{휏∈풯_퐴^⋆} 푝(휏)[휏 ∈ 퐸_휎] = ∑_{휏∈풯_퐴^⋆} 푝(휏)[휏 ∼ 휎] = ∑_{퐵⊆퐴_휎} ∑_{휐∈풯^⋆_{퐴∖퐴_휎}} 푝(휐 ⊕ (휎|_퐵))    (2.2)

where [·] denotes the indicator function.

If the distribution 푝 is structured, then there is only one 퐵 with a nonzero term in Equation (2.2). If 푝 is structured and 휎 is existentially sound under 푝, then the nonzero term has 퐵 = 퐴_휎 and Equation (2.2) simplifies to:

푝̄(휎) = ∑_{휐∈풯^⋆_{퐴∖퐴_휎}} 푝(휐 ⊕ 휎)    (2.3)

Definition 2.1.6 (Conditional distribution on choice dictionaries). For a probability distribution on choice dictionaries 푝 and a choice dictionary 휎 ∈ 풯_퐴^⋆ such that 푝̄(휎) > 0, the conditional distribution induced by 푝 and 휎 is the probability distribution 푝(·|휎) : 풯^⋆_{퐴∖퐴_휎} → [0, 1] given by:

푝(휐|휎) := (∑_{퐵⊆퐴_휎} 푝(휐 ⊕ (휎|_퐵))) / 푝̄(휎)    (2.4)

If 푝 is structured, then only one term in this sum is nonzero. If 푝 is structured and 휎 is existentially sound, then this term has 퐵 = 퐴_휎 and Equation (2.4) simplifies to:

푝(휐|휎) = 푝(휐 ⊕ 휎) / 푝̄(휎)    (2.5)
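For intuition, when supp(푝) is finite and explicitly represented, the marginal likelihood and conditional distribution can be computed by brute-force enumeration. The sketch below continues the ChoiceDict sketch from Section 2.1.1; support is assumed to map each dictionary in the support to its probability, and this is not an efficient or general implementation:

marginal(support, σ) = sum((p for (τ, p) in support if agree(τ, σ)); init=0.0)

function conditional(support, σ)
    Z = marginal(support, σ)
    result = Dict{ChoiceDict,Float64}()
    for (τ, p) in support
        agree(τ, σ) || continue
        υ = restrict(τ, setdiff(keys(τ), keys(σ)))   # drop the observed part
        result[υ] = get(result, υ, 0.0) + p / Z      # accumulate Equation (2.4)
    end
    return result
end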

Example: Consider the distribution 푝 given by 푝({푛 ↦→ 1, 푎 ↦→ T}) := 0.5 · 0.2, 푝({푛 ↦→ 1, 푎 ↦→ F}) := 0.5 · 0.8, 푝({푛 ↦→ 푖, 푎 ↦→ T}) := 0.5^푖 · 0.9 and 푝({푛 ↦→ 푖, 푎 ↦→ F}) := 0.5^푖 · 0.1 for 푖 > 1, and 푝(휏) = 0 otherwise. Let 휎_1 := {푎 ↦→ T} and 휎_2 := {푎 ↦→ F}. Then 푝̄(휎_1) = 0.55 and 푝̄(휎_2) = 0.45, and the respective conditional distributions are:

휏                    푝(휏)          휏 ∈ 퐸_{휎_1}   휏 ∈ 퐸_{휎_2}
{푛 ↦→ 1, 푎 ↦→ T}   0.5 · 0.2     ✓               ✗
{푛 ↦→ 1, 푎 ↦→ F}   0.5 · 0.8     ✗               ✓
{푛 ↦→ 2, 푎 ↦→ T}   0.5² · 0.9    ✓               ✗
{푛 ↦→ 2, 푎 ↦→ F}   0.5² · 0.1    ✗               ✓
{푛 ↦→ 3, 푎 ↦→ T}   0.5³ · 0.9    ✓               ✗
{푛 ↦→ 3, 푎 ↦→ F}   0.5³ · 0.1    ✗               ✓
...                  ...            ...             ...

휐           푝(휐|휎_1)   푝(휐|휎_2)
{푛 ↦→ 1}   0.1818       0.8888
{푛 ↦→ 2}   0.40909      0.05555
{푛 ↦→ 3}   0.204545     0.027777
...         ...          ...

Example: Consider the distribution 푝 below, and the conditional distribution 푝(·|휎) for 휎 := {2 ↦→ T}. The marginal likelihood is 푝̄(휎) = 0.5 + ∑_{푖=3}^{∞} 0.5^푖 = 0.75. Note that 휎 is not existentially sound. In particular, the address 2 is not in 휏 = {1 ↦→ F} ∈ supp(푝).

$\tau$ | $p(\tau)$ | $\tau \in E_\sigma$
$\{1 \mapsto \mathrm{F}\}$ | $0.5$ | ✓
$\{1 \mapsto \mathrm{T}, 2 \mapsto \mathrm{F}\}$ | $0.5^2$ | ✗
$\{1 \mapsto \mathrm{T}, 2 \mapsto \mathrm{T}, 3 \mapsto \mathrm{F}\}$ | $0.5^3$ | ✓
$\{1 \mapsto \mathrm{T}, 2 \mapsto \mathrm{T}, 3 \mapsto \mathrm{T}, 4 \mapsto \mathrm{F}\}$ | $0.5^4$ | ✓
... | ... | ...

$\upsilon$ | $p(\upsilon|\sigma)$
$\{1 \mapsto \mathrm{F}\}$ | $0.\overline{6}$
$\{1 \mapsto \mathrm{T}\}$ | $0$
$\{1 \mapsto \mathrm{T}, 3 \mapsto \mathrm{F}\}$ | $0.1\overline{6}$
$\{1 \mapsto \mathrm{T}, 3 \mapsto \mathrm{T}, 4 \mapsto \mathrm{F}\}$ | $0.08\overline{3}$
... | ...

Recall that the conditional distribution $p(\cdot|\sigma)$ is not defined if $\bar{p}(\sigma) = 0$. The following property guarantees that the conditional distribution is defined for any $\sigma \in \mathcal{T}_A^\star$.

Definition 2.1.7 (Supportive probability distribution on choice dictionaries). For an address universe $(A, V)$, a probability distribution on choice dictionaries $p$ is called supportive if $\bar{p}(\sigma) > 0$ for all $\sigma \in \mathcal{T}_A^\star$.

Example: A non-supportive probability distribution on choice dictionaries. Let $A = \{a\}$. The following $p$ is not supportive if $V_a = \{\mathrm{T}, \mathrm{F}\}$, but it is supportive if $V_a = \{\mathrm{T}\}$.

$\tau$ | $p(\tau)$
$\{a \mapsto \mathrm{T}\}$ | $1$

Definition 2.1.8 (Expectation of a function of a choice dictionary). We denote the expectation of a function $g : \mathcal{T}_A^\star \to \mathbb{R}$ with respect to a probability distribution $p$ on $\mathcal{T}_A^\star$ by:
$$\mathbb{E}_{\tau \sim p}[g(\tau)] := \sum_{\tau \in \mathcal{T}_A^\star} p(\tau)\, g(\tau)$$
when the value of this sum is well-defined.

Note that even when the expectation is well-defined, it may be infinite.

2.1.4 Generalizing beyond discrete random choices

The previous section described discrete probability distributions on choice dictionaries. However, it is common to use continuous probability distributions (e.g. normal, beta) to construct generative models, which necessitates domains $V_a$ for random choices that are uncountably infinite (e.g. $\mathbb{R}$ or $[0, 1]$). This section generalizes the notion of a probability distribution on choice dictionaries, replacing the probability mass function $p$ with a probability density function with respect to an appropriate reference measure.

Definition 2.1.9 (Measure-theoretic address universe). A measure-theoretic address universe is a tuple $(A, V, M)$ where $A$ is a finite or countably infinite set of addresses and, for each $a \in A$, $V_a$ is a set called the domain of $a$ that may be finite, countably infinite, or uncountably infinite, and $M_a = (\Sigma_a, \mu_a)$ where $(V_a, \Sigma_a, \mu_a)$ is a measure space and $\mu_a$ is a $\sigma$-finite measure.

The sets of dictionaries $\mathcal{T}_B$ and $\mathcal{T}_B^\star$ for sets $B \subseteq A$ are defined analogously to how they were defined for the discrete address universes above. However, these are no longer necessarily countable sets.

Definition 2.1.10 (Reference measure on choice dictionaries). Let $(A, V, M)$ be a measure-theoretic address universe. For each finite $B \subseteq A$, let $(\mathcal{T}_B, \Sigma_B)$ denote the product measurable space where $\mathcal{T}_B := \{(B, \mathbf{v}) : \mathbf{v} \in \times_{a \in B} V_a\}$ and $\Sigma_B$ is the $\sigma$-algebra generated by $\{\{(B, \times_{a \in B} C_a)\} : C_a \in \Sigma_a \text{ for all } a \in B\}$. Let $\mu_B$ denote the (unique, $\sigma$-finite) measure with $\mu_B(\{(B, \times_{a \in B} C_a)\}) = \prod_{a \in B} \mu_a(C_a)$. Let $\mathcal{T}_A^\star := \cup_{B \subseteq A, |B| < \infty} \mathcal{T}_B$, and let $\Sigma_A^\star := \{\cup_{B \subseteq A, |B| < \infty} C_B : C_B \in \Sigma_B \text{ for all } B\}$. The reference measure induced by $(A, V, M)$ is the measure $\mu_A^\star$ on the measurable space $(\mathcal{T}_A^\star, \Sigma_A^\star)$, defined by:
$$\mu_A^\star(C) := \sum_{\substack{B \subseteq A \\ |B| < \infty}} \mu_B(C_B) \quad \text{for} \quad C = \bigcup_{\substack{B \subseteq A \\ |B| < \infty}} C_B$$

Example: Let $A := \{\texttt{n}, \texttt{y}\} \cup \{(\texttt{x}, 1), (\texttt{x}, 2), \ldots\}$ where n, y, and x are (single-character) strings¹, and $V_\texttt{n} := \{1, 2, \ldots\}$, $V_\texttt{y} := \mathbb{R}$, and $V_{(\texttt{x}, i)} := \mathbb{R}$ for $i \in \{1, 2, \ldots\}$. Let $\Sigma_\texttt{n}$ be the set of all subsets of $V_\texttt{n}$ and $\mu_\texttt{n}(C) := |C|$ (the counting measure). For $a \in \{\texttt{y}\} \cup \{(\texttt{x}, 1), (\texttt{x}, 2), \ldots\}$, let $\Sigma_a$ and $\mu_a$ denote the Borel $\sigma$-algebra and Lebesgue measure on $\mathbb{R}$, respectively. Let $C_1 := \{\{\texttt{n} \mapsto 1, (\texttt{x}, 1) \mapsto x_1\} : x_1 \in [\alpha_{11}, \beta_{11}]\}$, $C_2 := \{\{\texttt{n} \mapsto 2, (\texttt{x}, 1) \mapsto x_1, (\texttt{x}, 2) \mapsto x_2\} : x_1 \in [\alpha_{21}, \beta_{21}], x_2 \in [\alpha_{22}, \beta_{22}]\}$, and $C_3 := \{\{\texttt{n} \mapsto 3, (\texttt{x}, 1) \mapsto x_1, (\texttt{x}, 2) \mapsto x_2\} : x_1 \in [\alpha_{31}, \beta_{31}], x_2 \in [\alpha_{32}, \beta_{32}]\}$. Consider the set $C := C_1 \cup C_2 \cup C_3 \in \Sigma_A^\star$. The reference measure of this set is:

$$\mu_A^\star(C) = \mu_{\{\texttt{n},(\texttt{x},1)\}}(C_1) + \mu_{\{\texttt{n},(\texttt{x},1),(\texttt{x},2)\}}(C_2) + \mu_{\{\texttt{n},(\texttt{x},1),(\texttt{x},2)\}}(C_3)$$
$$= 1 \cdot (\beta_{11} - \alpha_{11}) + 1 \cdot (\beta_{21} - \alpha_{21})(\beta_{22} - \alpha_{22}) + 1 \cdot (\beta_{31} - \alpha_{31})(\beta_{32} - \alpha_{32})$$

For a measure-theoretic address universe $(A, V, M)$, we denote probability densities with respect to the reference measure $\mu_A^\star$ by $p$. Formally, $p$ is a $\mu_A^\star$-measurable function $p : \mathcal{T}_A^\star \to [0, \infty)$ such that $\int_{\mathcal{T}_A^\star} p(\tau)\, \mu_A^\star(d\tau) = 1$. We denote the set of dictionaries with nonzero density under $p$ by $\mathrm{supp}(p)$.

Example: Let $(A, V, M)$ be the measure-theoretic address universe defined in the previous example. Consider the following density $p$ with respect to $\mu_A^\star$:
$$p(\tau) := \begin{cases} p_{\mathrm{geom}(0.5)}(\tau[\texttt{n}]) \prod_{i=1}^{\tau[\texttt{n}]} p_{\mathrm{norm}(0,1)}(\tau[(\texttt{x}, i)]) & \text{if } A_\tau = \{\texttt{n}, (\texttt{x}, 1), \ldots, (\texttt{x}, \tau[\texttt{n}])\} \\ 0 & \text{otherwise} \end{cases}$$

where $p_{\mathrm{geom}(0.5)}$ is the probability mass function for a geometric distribution with success probability parameter $0.5$, and $p_{\mathrm{norm}(0,1)}$ is the probability density function for a normal distribution with mean $0$ and standard deviation $1$. A natural procedure for sampling a choice dictionary from the measure encoded by the density $p$ is to (i) first sample $n$ from the geometric distribution, and then (ii) sample $n$ times from the standard normal distribution, as in the sketch below.
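The following is a minimal sketch of this two-step sampling procedure in plain Julia. The function name and the use of a Dict as the choice dictionary are illustrative assumptions, not part of the formalism:

function sample_choice_dictionary()
    trace = Dict{Any,Any}()
    # (i) sample n from a geometric distribution on {1, 2, ...} with
    # success probability 0.5, so that P(n = k) = 0.5^k
    n = 1
    while rand() > 0.5
        n += 1
    end
    trace[:n] = n
    # (ii) sample n values independently from a standard normal
    for i in 1:n
        trace[(:x, i)] = randn()
    end
    return trace
end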

Definition 2.1.11 (Well-behaved probability density on choice dictionaries). Note that for every $B \subseteq A$ there is a measure-theoretic address universe $(A \setminus B, V, M)$, consisting of choice dictionaries $\upsilon \in \mathcal{T}_{A \setminus B}^\star$, with reference measure $\mu_{A \setminus B}^\star$. A probability density $p$ on choice dictionaries is well-behaved if for every $\sigma \in \mathcal{T}_A^\star$, the function $\upsilon \mapsto p(\upsilon \oplus \sigma)$ is measurable with respect to $\mu_{A \setminus A_\sigma}^\star$ and $\int_{\mathcal{T}_{A \setminus A_\sigma}^\star} p(\upsilon \oplus \sigma)\, \mu_{A \setminus A_\sigma}^\star(d\upsilon) < \infty$.

¹We allow addresses to take any type of value. We will denote strings using fixed-width font. For example, foo is a string. The address (x, 2) is a tuple of a string and an integer.

We also define structured probability densities on choice dictionaries, and existentially sound choice dictionaries, using the earlier definitions for the discrete case without modification (Definition 2.1.3 and Definition 2.1.4), but where $p$ is a probability density function instead of a probability mass function. We now define measure-theoretic notions of marginal likelihood and conditioning.

Definition 2.1.12 (Measure-theoretic marginal likelihood of a choice dictionary). For a well-behaved probability density on choice dictionaries $p$ and a choice dictionary $\sigma \in \mathcal{T}_A^\star$, the marginal likelihood of $\sigma$ under $p$ is:
$$\bar{p}(\sigma) := \sum_{B \subseteq A_\sigma} \int_{\mathcal{T}_{A \setminus A_\sigma}^\star} p(\upsilon \oplus (\sigma|_B))\, \mu_{A \setminus A_\sigma}^\star(d\upsilon) \qquad (2.6)$$
In the special case when all of the random choices are discrete, this is equivalent to the definition of the marginal likelihood given in the previous section. However, unlike in the case of discrete choices only, the measure-theoretic marginal likelihood is not a probability and may be greater than 1. If $\sigma$ is existentially sound and $p$ is structured, then there is at most one nonzero term in Equation (2.6), with $B = A_\sigma$, and the measure-theoretic marginal likelihood of $\sigma$ is:
$$\bar{p}(\sigma) = \int_{\mathcal{T}_{A \setminus A_\sigma}^\star} p(\upsilon \oplus \sigma)\, \mu_{A \setminus A_\sigma}^\star(d\upsilon) \qquad (2.7)$$

Example: Consider an address universe with $A := \{\texttt{n}, \texttt{y}\} \cup \bigcup_{i=1}^{\infty} \{(\texttt{x}, i)\}$, where $V_\texttt{n} := \{1, 2, \ldots\}$, $V_\texttt{y} := \mathbb{R}$, and $V_{(\texttt{x}, i)} := \mathbb{R}$ for all $i \in \{1, 2, \ldots\}$. Now consider the density $p(\tau)$ that is zero if $\texttt{n} \notin A_\tau$ or if $A_\tau \neq \{\texttt{n}, \texttt{y}, (\texttt{x}, 1), \ldots, (\texttt{x}, \tau[\texttt{n}])\}$, and is otherwise

$$p_{\mathrm{geom}(0.5)}(\tau[\texttt{n}]) \cdot \left( \prod_{i=1}^{\tau[\texttt{n}]} p_{\mathrm{norm}(0,1)}(\tau[(\texttt{x}, i)]) \right) \cdot p_{\mathrm{norm}(m(\tau),1)}(\tau[\texttt{y}]) \quad \text{where } m(\tau) := \sum_{i=1}^{\tau[\texttt{n}]} \tau[(\texttt{x}, i)]$$

That is, there is a geometrically distributed number of random choices $(\texttt{x}, i)$ that are each independently and identically distributed, and a normally distributed random choice y whose mean is the sum of these. For $\sigma := \{\texttt{y} \mapsto 1.5\}$, the marginal likelihood is, where $B_n := \{\texttt{n}, (\texttt{x}, 1), \ldots, (\texttt{x}, n)\}$:

$$\bar{p}(\sigma) = \sum_{n=1}^{\infty} \int_{\mathcal{T}_{B_n}} p_{\mathrm{geom}(0.5)}(\upsilon[\texttt{n}]) \prod_{i=1}^{\upsilon[\texttt{n}]} p_{\mathrm{norm}(0,1)}(\upsilon[(\texttt{x}, i)])\, p_{\mathrm{norm}(m(\upsilon),1)}(\sigma[\texttt{y}])\, \mu_{B_n}(d\upsilon)$$
$$= \sum_{n=1}^{\infty} p_{\mathrm{geom}(0.5)}(n) \int_{\mathbb{R}^n} \prod_{i=1}^{n} p_{\mathrm{norm}(0,1)}(x_i) \cdot p_{\mathrm{norm}(\sum_{i=1}^{n} x_i,\, 1)}(\sigma[\texttt{y}])\, d\mathbf{x}$$
$$= \sum_{n=1}^{\infty} 0.5^n \frac{1}{\sqrt{2\pi(n+1)}\, \exp(0.5\, \sigma[\texttt{y}]^2 / (n+1))} \approx 0.15573434$$
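As a quick numeric sanity check (not part of the formal development), the series can be evaluated in Julia by truncating it at a large number of terms:

# Numeric check of the marginal likelihood series for σ[y] = 1.5,
# truncated at 1000 terms; each term follows the last line of the
# derivation above.
y = 1.5
pbar = sum(0.5^n / (sqrt(2pi * (n + 1)) * exp(0.5 * y^2 / (n + 1))) for n in 1:1000)
# pbar ≈ 0.15573434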

Definition 2.1.13 (Measure-theoretic conditional density). For a well-behaved probability density on choice dictionaries $p$ and a choice dictionary $\sigma \in \mathcal{T}_A^\star$ where $\bar{p}(\sigma) > 0$, the conditional density induced by $p$ and $\sigma$ is the following probability density on $\upsilon$, with respect to the reference measure $\mu_{A \setminus A_\sigma}^\star$:
$$p(\upsilon|\sigma) := \sum_{B \subseteq A_\sigma} \frac{p(\upsilon \oplus (\sigma|_B))}{\bar{p}(\sigma)} \qquad (2.8)$$

When $A$ only contains discrete random choices, this simplifies to Definition 2.1.6. If $\sigma$ is existentially sound under $p$ and $p$ is structured, then the conditional density simplifies:
$$p(\upsilon|\sigma) = \frac{p(\upsilon \oplus \sigma)}{\bar{p}(\sigma)} \qquad (2.9)$$

Example: For the previous example, the conditional density $p(\cdot|\sigma)$ where $A_\sigma = \{\texttt{y}\}$ (that is, where only the address y is observed) is:
$$p(\upsilon|\sigma) = \frac{0.5^{\upsilon[\texttt{n}]} \cdot \prod_{i=1}^{\upsilon[\texttt{n}]} p_{\mathrm{norm}(0,1)}(\upsilon[(\texttt{x}, i)]) \cdot p_{\mathrm{norm}(m(\upsilon),1)}(\sigma[\texttt{y}])}{\sum_{n=1}^{\infty} 0.5^n (2\pi)^{-1/2} (n+1)^{-1/2} \exp(-0.5\, \sigma[\texttt{y}]^2/(n+1))}$$
if $A_\upsilon = \{\texttt{n}, (\texttt{x}, 1), \ldots, (\texttt{x}, \upsilon[\texttt{n}])\}$, and $p(\upsilon|\sigma) = 0$ otherwise.

Proposition 2.1.2. If $p$ is a structured density on choice dictionaries, then for any $\sigma$ such that $\bar{p}(\sigma) > 0$, the density $p(\cdot|\sigma)$ is also structured.

Proof. Suppose $p(\cdot|\sigma)$ is not structured. Then there exist $\upsilon_1, \upsilon_2 \in \mathcal{T}_{A \setminus A_\sigma}^\star$ such that $\upsilon_1 \neq \upsilon_2$ and $\upsilon_1 \sim \upsilon_2$ and $p(\upsilon_1|\sigma) > 0$ and $p(\upsilon_2|\sigma) > 0$. This implies that $p(\upsilon_1 \oplus (\sigma|_{B_1})) > 0$ and $p(\upsilon_2 \oplus (\sigma|_{B_2})) > 0$ for some $B_1$ and $B_2$. But then $\tau_1 := \upsilon_1 \oplus (\sigma|_{B_1})$ and $\tau_2 := \upsilon_2 \oplus (\sigma|_{B_2})$ satisfy $\tau_1 \neq \tau_2$ and $\tau_1 \sim \tau_2$ and $p(\tau_1) > 0$ and $p(\tau_2) > 0$, which is a contradiction since $p$ is structured.

Proposition 2.1.3. For a structured probability density $p$ on choice dictionaries and some $\sigma \in \mathcal{T}_A^\star$, if $\tau_1 = \sigma|_{B_1} \in \mathrm{supp}(p)$ and $\tau_2 = \sigma|_{B_2} \in \mathrm{supp}(p)$ for some $B_1, B_2$ then $\tau_1 = \tau_2$.

Proof. Suppose that $\tau_1, \tau_2 \in \mathrm{supp}(p)$ and $\tau_1 = \sigma|_{B_1}$ and $\tau_2 = \sigma|_{B_2}$, where $\tau_1 \neq \tau_2$. Note that $\tau_1 \sim \tau_2$. If $A_{\tau_1} = A_{\tau_2}$ then there must exist $a$ such that $\tau_1[a] \neq \tau_2[a]$, which is a contradiction because $\tau_1 \sim \tau_2$. If $A_{\tau_1} \neq A_{\tau_2}$ then because $p$ is structured, there must exist $a$ such that $\tau_1[a] \neq \tau_2[a]$, which is a contradiction because $\tau_1 \sim \tau_2$.

Now we adapt a few more definitions to the measure-theoretic setting.

Definition 2.1.14 (Supportive probability density on choice dictionaries). A well-behaved probability density $p$ on choice dictionaries is supportive if $\bar{p}(\sigma) > 0$ for all $\sigma \in \mathcal{T}_A^\star$.

Definition 2.1.15 (Expectation with respect to a probability density on choice dictionaries). For a measure-theoretic address universe $(A, V, M)$, we denote the expectation of a

$\mu_A^\star$-measurable function $g : \mathcal{T}_A^\star \to \mathbb{R}$ with respect to a probability density $p$ on $\mathcal{T}_A^\star$ by:
$$\mathbb{E}_{\tau \sim p}[g(\tau)] := \int_{\mathcal{T}_A^\star} g(\tau)\, p(\tau)\, \mu_A^\star(d\tau)$$
when the value of the integral is well-defined.

2.1.5 Generative functions

Well-behaved and structured probability distributions on choice dictionaries will be the basis for our representation of generative models. However, we want our representation of generative models to be composable. That is, given two generative models, it should be possible to construct a third generative model that includes the random variables in both models. For example, we would like to sequence the two generative models, so that the distribution of the second generative model can depend on the values of random choices made by the first. To achieve this, we add a notion of input and output to our notion of well-behaved probability distribution on choice dictionaries. This gives us our initial definition of a generative function:

Definition 2.1.16 (Generative function). A generative function $\mathcal{P}$ in address universe $(A, V, M)$ is a tuple $\mathcal{P} = (X, Y, p, f)$, with components as follows. $X$ is the argument type. $Y$ is the return type. For each $x \in X$, $p(\cdot; x) : \mathcal{T}_A^\star \to [0, 1]$ is a structured (Definition 2.1.3) and well-behaved (Definition 2.1.11) probability density on $\mathcal{T}_A^\star$, so that $p$ is a family of densities indexed by $x$. Finally, $f : \{(x, \tau) : \tau \in \mathrm{supp}(p(\cdot; x))\} \to Y$ is the return value function.

Example: A deterministic function. A generative function can be built from a regular function by choosing $p$ such that $p(\{\}; x) = 1$ for all $x$. For example, for $x \mapsto 2x$:

$\tau$ | $p(\tau; x)$ | $f(x, \tau)$
$\{\}$ | $1$ | $2x$

Example: The distribution depends on the argument. Let $X = (0, 1) \subset \mathbb{R}$ and $Y = [0, 1) \subset \mathbb{R}$. Then $(X, Y, p, f)$ is a generative function where $p$ and $f$ are given by:

$\tau$ | $p(\tau; x)$ | $f(x, \tau)$
$\{a \mapsto \mathrm{T}\}$ | $x$ | $x$
$\{a \mapsto \mathrm{F}\}$ | $1 - x$ | $0$

Example: Making different random choices depending on the argument. Let $X = \{\mathrm{T}, \mathrm{F}\}$ and $Y = \{0, 1, 2, \ldots\}$. Then $(X, Y, p, f)$ is a generative function that only makes random choices for input $x = \mathrm{T}$ when $p$ and $f$ are given by:

$\tau$ | $p(\tau; \mathrm{F})$ | $f(\mathrm{F}, \tau)$
$\{\}$ | $1$ | $0$

$\tau$ | $p(\tau; \mathrm{T})$ | $f(\mathrm{T}, \tau)$
$\{n \mapsto 0\}$ | $0.5$ | $0$
$\{n \mapsto 1\}$ | $0.5^2$ | $0$
$\{n \mapsto 2\}$ | $0.5^3$ | $0$
... | ... | ...

Composing generative functions by sequencing. Consider two generative functions $\mathcal{P}_1 = (X_1, Y_1, p_1, f_1)$ and $\mathcal{P}_2 = (X_2, Y_2, p_2, f_2)$. Suppose that the return type of the first generative function matches the argument type of the second ($Y_1 = X_2$). Can we produce a third generative function $\mathcal{P}_3$ from $\mathcal{P}_1$ and $\mathcal{P}_2$ that samples from $\mathcal{P}_1$ and then samples from $\mathcal{P}_2$ given the return value of $\mathcal{P}_1$? This is straightforward if there are two disjoint sets $A_1, A_2 \subset A$ such that $\mathcal{P}_1$ only ever makes random choices at addresses in $A_1$ and $\mathcal{P}_2$ only ever makes random choices at addresses in $A_2$. That is, if for all $x_1 \in X_1$, $\{a : \exists \tau \in \mathrm{supp}(p_1(\cdot; x_1)) \text{ s.t. } a \in A_\tau\} \cap A_2 = \emptyset$, and for all $x_2 \in X_2$, $\{a : \exists \tau \in \mathrm{supp}(p_2(\cdot; x_2)) \text{ s.t. } a \in A_\tau\} \cap A_1 = \emptyset$. Then we can construct a third generative function $\mathcal{P}_3 = (X_3, Y_3, p_3, f_3)$ defined by:

$$X_3 := X_1 \qquad Y_3 := Y_2$$
$$f_3(x, \tau) := f_2(f_1(x, \tau|_{A_1}), \tau|_{A_2})$$
$$p_3(\tau; x) := p_1(\tau|_{A_1}; x)\, p_2(\tau|_{A_2}; f_1(x, \tau|_{A_1}))$$

Example: Sequencing generative functions with disjoint addresses. Given the two generative functions $\mathcal{P}_1$ and $\mathcal{P}_2$ below, we construct $\mathcal{P}_3$ using the construction above with $A_1 = \{n\}$ and $A_2 = \{x\}$:

$\tau$ | $p_1(\tau)$ | $f_1(\tau)$
$\{n \mapsto 0\}$ | $0.5$ | $1$
$\{n \mapsto 1\}$ | $0.5^2$ | $2$
$\{n \mapsto 2\}$ | $0.5^3$ | $3$
... | ... | ...

$\tau$ | $p_2(\tau; x)$ | $f_2(x, \tau)$
$\{x \mapsto \mathrm{T}\}$ | $x/(x+1)$ | $0$
$\{x \mapsto \mathrm{F}\}$ | $1/(x+1)$ | $0$

$\tau$ | $p_3(\tau)$ | $f_3(\tau)$
$\{n \mapsto 0, x \mapsto \mathrm{T}\}$ | $0.5 \cdot (1/2)$ | $0$
$\{n \mapsto 0, x \mapsto \mathrm{F}\}$ | $0.5 \cdot (1/2)$ | $0$
$\{n \mapsto 1, x \mapsto \mathrm{T}\}$ | $0.5^2 \cdot (2/3)$ | $0$
$\{n \mapsto 1, x \mapsto \mathrm{F}\}$ | $0.5^2 \cdot (1/3)$ | $0$
$\{n \mapsto 2, x \mapsto \mathrm{T}\}$ | $0.5^3 \cdot (3/4)$ | $0$
$\{n \mapsto 2, x \mapsto \mathrm{F}\}$ | $0.5^3 \cdot (1/4)$ | $0$
... | ... | ...
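A sketch of the sequencing construction in plain Julia may help make it concrete. Here a generative function is represented as a named tuple holding a density over Dict-based choice dictionaries and a return value function; the helper names (`restrict`, `sequence`) are hypothetical, introduced only for illustration:

# Restrict a choice dictionary to the addresses in A.
restrict(τ::Dict, A) = Dict(a => v for (a, v) in τ if a in A)

# Sequencing construction: P1 and P2 must use disjoint address sets A1, A2.
function sequence(P1, P2, A1, A2)
    f3(x, τ) = P2.f(P1.f(x, restrict(τ, A1)), restrict(τ, A2))
    p3(τ, x) = P1.p(restrict(τ, A1), x) *
               P2.p(restrict(τ, A2), P1.f(x, restrict(τ, A1)))
    return (p = p3, f = f3)
end

# The example above: P1 samples n at :n with P(n = k) = 0.5^(k+1) and returns n+1;
# P2 flips a coin at :x with P(T) = x/(x+1) and returns 0.
P1 = (p = (τ, x) -> 0.5^(τ[:n] + 1), f = (x, τ) -> τ[:n] + 1)
P2 = (p = (τ, x) -> τ[:x] ? x / (x + 1) : 1 / (x + 1), f = (x, τ) -> 0)
P3 = sequence(P1, P2, Set([:n]), Set([:x]))
P3.p(Dict(:n => 1, :x => true), nothing)  # 0.5^2 * (2/3), matching the table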

Example: Generative functions with overlapping addresses. However, if generative functions $\mathcal{P}_1$ and $\mathcal{P}_2$ have overlapping addresses (here, $x$), then the construction fails:

$\tau$ | $p_1(\tau)$ | $f_1(\tau)$
$\{x \mapsto 0\}$ | $0.5$ | $1$
$\{x \mapsto 1\}$ | $0.5^2$ | $2$
... | ... | ...

$\tau$ | $p_2(\tau; x)$ | $f_2(x, \tau)$
$\{x \mapsto \mathrm{T}\}$ | $x/(x+1)$ | $0$
$\{x \mapsto \mathrm{F}\}$ | $1/(x+1)$ | $0$

Address namespaces. Like composition of regular functions and procedures in programming, composition of generative functions is most useful when it can be done safely without knowledge of the internals of the two generative functions. One general approach to guaranteeing that two generative functions $\mathcal{P}_1$ and $\mathcal{P}_2$ use disjoint addresses is to wrap them in generative functions $\mathcal{P}'_1$ and $\mathcal{P}'_2$ for which we can guarantee the disjoint address property. Specifically, given two distinct tokens $k_1$ and $k_2$, we construct $\mathcal{P}'_1$ by modifying each address $a$ used by $\mathcal{P}_1$ into the tuple address $(k_1, a)$. Let $\mathrm{namespace}(\tau, k)$ denote the dictionary $\tau'$ with $\tau'[(k, a)] = \tau[a]$ for each $a \in A_\tau$, and no other entries. Then we define $\mathcal{P}'_1 := (X_1, Y_1, p'_1, f'_1)$ and $\mathcal{P}'_2 := (X_2, Y_2, p'_2, f'_2)$ where:

$$p'_1(\mathrm{namespace}(\tau, k_1); x) := p_1(\tau; x) \qquad p'_2(\mathrm{namespace}(\tau, k_2); x) := p_2(\tau; x)$$
$$f'_1(x, \tau) := f_1(x, \{a \mapsto \tau[(k_1, a)] : (k_1, a) \in A_\tau\})$$
$$f'_2(x, \tau) := f_2(x, \{a \mapsto \tau[(k_2, a)] : (k_2, a) \in A_\tau\})$$

Then $\mathcal{P}'_1$ and $\mathcal{P}'_2$ can be safely composed because they use addresses in the disjoint sets $A_1 = \{(k_1, a) : a \in A\}$ and $A_2 = \{(k_2, a) : a \in A\}$.

Example: Applying address namespaces $k_1$ and $k_2$ to the generative functions $\mathcal{P}_1$ and $\mathcal{P}_2$ defined above with overlapping address $x$ allows them to be sequenced:

$\tau$ | $p'_1(\tau)$ | $f'_1(\tau)$
$\{(k_1, x) \mapsto 0\}$ | $0.5$ | $1$
$\{(k_1, x) \mapsto 1\}$ | $0.5^2$ | $2$
$\{(k_1, x) \mapsto 2\}$ | $0.5^3$ | $3$
... | ... | ...

$\tau$ | $p'_2(\tau; x)$ | $f'_2(x, \tau)$
$\{(k_2, x) \mapsto \mathrm{T}\}$ | $x/(x+1)$ | $0$
$\{(k_2, x) \mapsto \mathrm{F}\}$ | $1/(x+1)$ | $0$

$\tau$ | $p_3(\tau)$ | $f_3(\tau)$
$\{(k_1, x) \mapsto 0, (k_2, x) \mapsto \mathrm{T}\}$ | $0.5 \cdot (1/2)$ | $0$
$\{(k_1, x) \mapsto 0, (k_2, x) \mapsto \mathrm{F}\}$ | $0.5 \cdot (1/2)$ | $0$
$\{(k_1, x) \mapsto 1, (k_2, x) \mapsto \mathrm{T}\}$ | $0.5^2 \cdot (2/3)$ | $0$
$\{(k_1, x) \mapsto 1, (k_2, x) \mapsto \mathrm{F}\}$ | $0.5^2 \cdot (1/3)$ | $0$
$\{(k_1, x) \mapsto 2, (k_2, x) \mapsto \mathrm{T}\}$ | $0.5^3 \cdot (3/4)$ | $0$
$\{(k_1, x) \mapsto 2, (k_2, x) \mapsto \mathrm{F}\}$ | $0.5^3 \cdot (1/4)$ | $0$
... | ... | ...
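In Julia, the namespacing transformation on choice dictionaries themselves is a one-liner; the helper names below are hypothetical, for illustration only:

# Prefix every address in τ with the token k, and recover the inner
# dictionary from a prefixed one.
namespace(τ::Dict, k) = Dict((k, a) => v for (a, v) in τ)
strip_namespace(τ::Dict, k) =
    Dict(a[2] => v for (a, v) in τ if a isa Tuple && a[1] == k)

τ = Dict(:x => true)
namespace(τ, :k2)                             # Dict((:k2, :x) => true)
strip_namespace(namespace(τ, :k2), :k2) == τ  # true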

Composing generative functions by branching. We can also compose two generative functions $\mathcal{P}_1$ and $\mathcal{P}_2$ into a third generative function $\mathcal{P}_3 = (X_3, Y_3, p_3, f_3)$ that, depending on its argument, delegates to either $\mathcal{P}_1$ or $\mathcal{P}_2$. Note that unlike sequencing, this does not require that $\mathcal{P}_1$ and $\mathcal{P}_2$ use disjoint addresses. We define a new generative function $\mathcal{P}_3$ that takes three arguments $(b, x_1, x_2)$, where if $b = 1$ then the distribution and return value will match those of generative function $\mathcal{P}_1$, and if $b = 2$ then the distribution and return value will match those of $\mathcal{P}_2$:

$$X_3 := \{1, 2\} \times X_1 \times X_2 \qquad Y_3 := Y_1 \cup Y_2$$
$$p_3(\tau; (b, x_1, x_2)) := \begin{cases} p_1(\tau; x_1) & \text{if } b = 1 \\ p_2(\tau; x_2) & \text{if } b = 2 \end{cases} \qquad f_3((b, x_1, x_2), \tau) := \begin{cases} f_1(x_1, \tau) & \text{if } b = 1 \\ f_2(x_2, \tau) & \text{if } b = 2 \end{cases}$$

These two types of composition (sequencing and conditional delegation) are just two of the myriad ways that generative functions can be composed into more complex generative functions. Indeed, generative functions can be composed in all the ways that regular functions are routinely composed in functional programming languages. This is one motivation for using probabilistic modeling languages, like the one introduced in the next section.

2.2 Languages for defining generative functions

The previous section introduced a formal mathematical representation for generative models called generative functions. However, the probability distributions were described using either tables of choice dictionaries and associated probabilities, or expressions for the density. Often much more compact and intuitive representations of these probability distributions are available. We call these more compact representations probabilistic modeling languages, or 'modeling languages' for short. Modeling languages allow generative functions to be specified compactly in a way that makes the inherent structure in the generative function explicit to the modeler. Modeling languages also allow generative functions to be easily composed into more complex generative functions. This section describes one modeling language that is provided in the Julia implementation of Gen, called the Dynamic Modeling Language (DML), which is embedded in the Julia programming language. Other modeling languages will be introduced in Chapter 5, which shows how the clear separation between the generic mathematical representation of generative functions and how they are defined in specific modeling languages affords Gen extensibility that is important for performance.

Formally, a program in a modeling language (or any programming language) is a string p. Each Gen modeling language has a semantic function that maps programs p to generative functions $\mathcal{P}$. Following convention from programming language theory, we denote application of the semantic function with $\llbracket \cdot \rrbracket$ instead of parentheses, so that $\llbracket \texttt{p} \rrbracket = \mathcal{P}$. The semantic functions for practical modeling languages like DML are very complex, and this chapter does not define a formal semantics for DML. Instead this chapter includes selected instructive DML programs p alongside their corresponding generative function $\mathcal{P} = \llbracket \texttt{p} \rrbracket$. To make the relationship between modeling language programs and their mathematical meaning as generative functions more concrete, this chapter does define a toy modeling language and its semantic function.

2.2.1 Gen Dynamic Modeling Language

The DML is a language for defining generative functions that is embedded in the Julia programming language. DML generative functions are defined using syntax based on Julia's own function definition syntax. For example, a generative function whose argument $x$ is a pair $x = (x_1, x_2)$ is represented as:

@gen function (x1, x2) end

A deterministic generative function $\mathcal{P} = (X, Y, p, f)$ is one that makes no random choices (i.e. for which $\mathrm{supp}(p(\cdot; x)) = \{\{\}\}$ for all $x \in X$). A DML function with no random choice expressions (described below) represents a regular (deterministic) function. For example, we can encode the deterministic generative function with $f(x) = 2x_1 + x_2$ as:

@gen function (x1, x2)
    return 2 * x1 + x2
end

$\tau$ | $p(\tau; x)$ | $f(x, \tau)$
$\{\}$ | $1$ | $2x_1 + x_2$

The program p above defines the generative function $\mathcal{P} := \llbracket \texttt{p} \rrbracket$ shown in the table.

Random choice expressions. In order to represent generative functions with choice dictionary distributions other than $p(\{\}; x) = 1$, DML functions use labeled random choice expressions, which are a syntax construct not present in the Julia language. Random choice expressions contain two sub-expressions, an address expression and an expression that specifies a probability distribution, separated by '∼':

{<address expression>} ∼ <distribution expression>

Probability distribution expressions in DML take the form of an application of a distribution family to arguments, which are Julia expressions that specify the parameters of the probability distribution. For example, the probability distribution expression below defines a geometric discrete probability distribution with success probability 0.5:

{<address expression>} ∼ geometric(0.5)

The address expression is a Julia expression that can depend on any variables in scope. We often use Julia symbols (interned strings) for addresses, which begin with a colon (e.g. :a), although other Julia values, including e.g. integers, tuples, and strings, can be used as well:

{:a} ∼ geometric(0.5)

The expression evaluates to the sampled value of the random choice. For example, the expression above evaluates to an integer that is greater than or equal to zero. We can then use this expression in other expressions. For example, the expression below evaluates to an integer that is greater than or equal to one:

({:a} ∼ geometric(0.5)) + 1

Often, we want to assign a random choice directly to a variable in the program. There is syntactic sugar for this case that automatically generates the address based on the variable name:

a ∼ geometric(0.5)    is equivalent to    a = ({:a} ∼ geometric(0.5))

We build the probability distribution 푝 of the generative function by making several random choices, where the distribution of the later choices depends on the random value obtained for the earlier choices, e.g.:

a ∼ geometric(0.5)
{:b} ∼ bernoulli(a/(a+1))

Defining a generative function. The definition of a generative function in DML includes the @gen macro, the arguments to the function, and the body of the function. Consider the following DML code p, and the generative function $\mathcal{P} := \llbracket \texttt{p} \rrbracket$ that it defines:

@gen function (x)
    a ∼ geometric(x)
    prob = (a+1)/(a+2)
    return ({:b} ∼ bernoulli(prob))
end

$\tau$ | $p(\tau; x)$ | $f(x, \tau)$
$\{a \mapsto 0, b \mapsto \mathrm{T}\}$ | $(1-x)^0\, x \cdot \frac{1}{2}$ | $\mathrm{T}$
$\{a \mapsto 0, b \mapsto \mathrm{F}\}$ | $(1-x)^0\, x \cdot \frac{1}{2}$ | $\mathrm{F}$
$\{a \mapsto 1, b \mapsto \mathrm{T}\}$ | $(1-x)^1\, x \cdot \frac{2}{3}$ | $\mathrm{T}$
$\{a \mapsto 1, b \mapsto \mathrm{F}\}$ | $(1-x)^1\, x \cdot \frac{1}{3}$ | $\mathrm{F}$
... | ... | ...

Note that Julia symbol addresses of the form :a are denoted mathematically as $a$. The code above defines an anonymous generative function. We can also assign a generative function to a variable, as shown below:

@gen function foo(x)
    a ∼ geometric(x)
    return ({:b} ∼ bernoulli((a+1)/(a+2)))
end

Now $\llbracket \texttt{foo} \rrbracket = \mathcal{P}$, where $\mathcal{P}$ is the generative function defined in the table above.

The density on choice dictionaries. Informally, a choice dictionary $\tau$ has $p(\tau; x) > 0$ if there exists a returning execution of the DML function in which each random choice expression with address $a$ is replaced with the value $\tau[a]$ from the choice dictionary, and such that for every random choice expression encountered in the execution with address $a$, the probability that the expression evaluates to the value $\tau[a]$ is nonzero. For such a choice dictionary, the probability $p(\tau; x)$ is the product of the probabilities of each of the random choices encountered in the execution.

Control flow. The language allows many of Julia's control flow constructs to be used in the body of DML functions, including branches and loops. These constructs can be combined with random choice expressions:

if ({:a} ∼ bernoulli(0.7))
    {:b} ∼ bernoulli(0.6)
end

i = 1
while ({i} ∼ bernoulli(0.5))
    i = i + 1
end

Invoking other generative functions. DML generative functions can invoke other DML generative functions (as well as generative functions constructed via other means, as we will see later). Recall that in Section 2.1.5 we found that we could safely compose generative functions by sequencing if we wrapped the generative functions in address namespaces, so that all addresses had the form $(k, a)$ for some token $k$. DML functions can invoke other generative functions using this construction, using a syntax similar to the random choice expression syntax. In place of the address expression, we place an address namespace expression that encodes the token $k$. Below is an example of a recursive DML function that uses this construct to make two calls, one with token $k_1 = \texttt{L}$ and the other with token $k_2 = \texttt{R}$. The generative function that it represents is shown below the code.

@gen function g()
    if ({:go} ∼ bernoulli(0.2))
        n1 = ({:L} ∼ g())
        n2 = ({:R} ∼ g())
        return n1 + n2 + 1
    else
        return 1
    end
end

$\tau$ | $p(\tau; x)$ | $f(x, \tau)$
$\{go \mapsto \mathrm{F}\}$ | $0.8$ | $1$
$\{go \mapsto \mathrm{T}, (L, go) \mapsto \mathrm{F}, (R, go) \mapsto \mathrm{F}\}$ | $0.2 \cdot 0.8^2$ | $3$
$\{go \mapsto \mathrm{T}, (L, go) \mapsto \mathrm{T}, (R, go) \mapsto \mathrm{F}, (L, (L, go)) \mapsto \mathrm{F}, (L, (R, go)) \mapsto \mathrm{F}\}$ | $0.2^2 \cdot 0.8^3$ | $5$
... | ... | ...

It is also possible to invoke another DML function without introducing an address namespace, although this is discouraged. This is done using the same syntax, with the token * for the namespace expression. Below is an example DML function that has the same probability distribution on return values as the recursive DML function above, but uses different addresses:

@gen function g(i)
    if ({(i, :go)} ∼ bernoulli(0.2))
        n1 = ({*} ∼ g(i*2))
        n2 = ({*} ∼ g(i*2+1))
        return n1 + n2 + 1
    else
        return 1
    end
end

$\tau$ | $p(\tau; x)$ | $f(x, \tau)$
$\{(1, go) \mapsto \mathrm{F}\}$ | $0.8$ | $1$
$\{(1, go) \mapsto \mathrm{T}, (2, go) \mapsto \mathrm{F}, (3, go) \mapsto \mathrm{F}\}$ | $0.2 \cdot 0.8^2$ | $3$
$\{(1, go) \mapsto \mathrm{T}, (2, go) \mapsto \mathrm{T}, (3, go) \mapsto \mathrm{F}, (4, go) \mapsto \mathrm{F}, (5, go) \mapsto \mathrm{F}\}$ | $0.2^2 \cdot 0.8^3$ | $5$
... | ... | ...

Restrictions. There are several properties of DML code that must be satisfied in order for the code to represent a generative function.

1. Halts with probability 1. For all values of its arguments, the function must halt with probability 1. For example, infinite while loops and infinite recursion are not permitted. Note that loops of unbounded length and recursion of unbounded depth are permitted (the recursive DML functions shown above are examples). Note that this property is not decidable in general. However, it can often be easily checked for the types of DML programs that are used in probabilistic modeling.²

2. Addresses must be unique. The function must never sample a random choice at the same address twice. This is possible to check statically in common cases.

3. Restricted use of randomness outside of random choice expressions. Random choices that are not part of a random choice expression (e.g. from calling Julia's rand() function directly) are valid only if they do not affect control flow or the support of future random choices. This requirement rules out cases that can cause generative functions to not have structured probability distributions (Definition 2.1.3).

4. Restricted use of mutation. The function may not mutate its arguments or any variables in its lexical scope that are not private to the function.

5. DML functions cannot be passed to Julia higher-order functions. Note that it is possible to implement higher-order functions using recursion (although in a later chapter we will discuss modeling constructs that play the role of higher-order functions, that are specialized for use with generative functions and provide better performance).

²For certain recursive programs like those above, simple linear algebra analyses [41] can be used to determine if the program satisfies this property.

Note that satisfying this list of restrictions does not alone guarantee that DML code defines a valid generative function. For example, the well-behaved property (Definition 2.1.11) does not necessarily hold.

An illustrative generative model. The DML code below defines a generative model that uses a random number of random choices. We will use this model in the next chapter to illustrate inference procedures.

@gen function poly_model(x_coordinates)
    degree ∼ geometric(0.5)
    var ∼ inv_gamma(1, 1)
    coefficients = [({(:c, i)} ∼ normal(0, 1)) for i in 0:degree]
    for i=1:length(x_coordinates)
        x = x_coordinates[i]
        mu = coefficients' * x.^(0:degree)
        {(:y, i)} ∼ normal(mu, sqrt(var))
    end
end

This generative function encodes a generative model of y-coordinates given a fixed vector of x-coordinates. The model assumes that there is a polynomial of unknown degree that maps x-coordinates to their corresponding y-coordinates, and that normally distributed noise is added to the y-coordinates. The variance of the y-coordinates is unknown and has an inverse-gamma prior. The function first samples a random degree for the polynomial from a geometric distribution, at address degree. The degree takes values from $\mathbb{Z}_{\geq 0}$. Then, the function samples a value from an inverse gamma distribution at address var, taking values from $\mathbb{R}_{> 0}$. Then, the function samples values for each of the coefficients from a normal prior, at addresses of the form $(c, i)$. Finally, the function samples the y-coordinates by looping over the x-coordinates, computing the value of the polynomial at each point, and adding normally distributed noise. The y-coordinates are random choices with addresses of the form $(y, i)$.

The addresses used by this generative function are $A = \{var, degree\} \cup \{(c, i) : i \in \mathbb{Z}_{\geq 0}\} \cup \{(y, i) : i \in \mathbb{Z}_{> 0}\}$. The random choice with address degree is discrete with $V_a = \mathbb{Z}_{\geq 0}$, and the other choices are continuous. The address $a = var$ has $V_a = \mathbb{R}_{> 0}$, and $a = (c, i)$ and $a = (y, i)$ have $V_a = \mathbb{R}$; all of these use the Borel $\sigma$-algebra and Lebesgue measure. The density with respect to the reference measure $\mu_A^\star$ is:
$$p(\tau; x) = p_{\mathrm{geom}(0.5)}(\tau[degree]) \cdot p_{\mathrm{invgamma}(1,1)}(\tau[var]) \cdot \prod_{j=0}^{\tau[degree]} p_{\mathrm{norm}(0,1)}(\tau[(c, j)]) \cdot \prod_{i=1}^{n} p_{\mathrm{norm}(m(\tau,i,x),\, \sqrt{\tau[var]})}(\tau[(y, i)])$$
where $m(\tau, i, x) := \sum_{j=0}^{\tau[degree]} x_i^j \cdot \tau[(c, j)]$ and $n$ is the number of x-coordinates.

2.2.2 Formal semantics of a toy modeling language

Figure 2-1 defines a toy modeling language and a semantic function that maps programs p in this language to generative functions $\mathcal{P} = (X, Y, p, f)$. The language is a heavily restricted version of DML that does not permit mutation in the body of the function, does not have loops, cannot call other generative functions, and only samples random choices from discrete distributions. Also, the addresses used in random choice expressions in programs must be literal Julia symbols, instead of arbitrary dynamic Julia values as in Gen's DML. The language includes 'if-else' expressions and 'let' expressions.

Constants: $c \in \mathbb{R}$
Variables: $x$
Address literals: $a$
Primitive distributions: $d$ ::= bernoulli | geometric | ...
Primitive real-valued functions: $f$ ::= ...
Primitive Boolean-valued functions: $g$ ::= ...
Expressions: $E$ ::= $c$ | $x$ | $\{a\} \sim d(E)$ | let $x = E_1$ in $E_2$ end | if $E_1$ then $E_2$ else $E_3$
Function definition: $F$ ::= @gen function($x_1$, $x_2$, ...) $E$ end

(a) Syntax

$\mathrm{Addrs}\llbracket c \rrbracket = \emptyset$
$\mathrm{Addrs}\llbracket x \rrbracket = \emptyset$
$\mathrm{Addrs}\llbracket \{a\} \sim d(E) \rrbracket = \{a\} \cup \mathrm{Addrs}\llbracket E \rrbracket$ (where $a \notin \mathrm{Addrs}\llbracket E \rrbracket$)
$\mathrm{Addrs}\llbracket \text{let } x = E_1 \text{ in } E_2 \text{ end} \rrbracket = \mathrm{Addrs}\llbracket E_1 \rrbracket \cup \mathrm{Addrs}\llbracket E_2 \rrbracket$
$\mathrm{Addrs}\llbracket \text{if } E_1 \text{ then } E_2 \text{ else } E_3 \rrbracket = \mathrm{Addrs}\llbracket E_1 \rrbracket \cup \mathrm{Addrs}\llbracket E_2 \rrbracket \cup \mathrm{Addrs}\llbracket E_3 \rrbracket$

$\mathrm{Val}\llbracket c \rrbracket(\sigma)(\tau) = c$
$\mathrm{Val}\llbracket x \rrbracket(\sigma)(\tau) = \sigma[x]$
$\mathrm{Val}\llbracket \{a\} \sim d(E) \rrbracket(\sigma)(\tau) = \tau[a]$
$\mathrm{Val}\llbracket \text{let } x = E_1 \text{ in } E_2 \text{ end} \rrbracket(\sigma)(\tau) = \mathrm{Val}\llbracket E_2 \rrbracket(\sigma[x \mapsto \mathrm{Val}\llbracket E_1 \rrbracket(\sigma)(\tau)])(\tau)$
$\mathrm{Val}\llbracket \text{if } E_1 \text{ then } E_2 \text{ else } E_3 \rrbracket(\sigma)(\tau) = \begin{cases} \mathrm{Val}\llbracket E_2 \rrbracket(\sigma)(\tau) & \text{if } \mathrm{Val}\llbracket E_1 \rrbracket(\sigma)(\tau) \\ \mathrm{Val}\llbracket E_3 \rrbracket(\sigma)(\tau) & \text{if } \neg \mathrm{Val}\llbracket E_1 \rrbracket(\sigma)(\tau) \end{cases}$

$\mathrm{Dist}\llbracket c \rrbracket(\sigma)(\tau) = [\tau = \{\}]$
$\mathrm{Dist}\llbracket x \rrbracket(\sigma)(\tau) = [\tau = \{\}]$
$\mathrm{Dist}\llbracket \{a\} \sim d(E) \rrbracket(\sigma)(\tau) = \begin{cases} p_{d(\mathrm{Val}\llbracket E \rrbracket(\sigma)(\tau))}(v) & \text{if } \tau = \{a \mapsto v\} \text{ for some } v \\ 0 & \text{otherwise} \end{cases}$
$\mathrm{Dist}\llbracket \text{let } x = E_1 \text{ in } E_2 \text{ end} \rrbracket(\sigma)(\tau) = \mathrm{Dist}\llbracket E_1 \rrbracket(\sigma)(\tau|_{\mathrm{Addrs}\llbracket E_1 \rrbracket}) \cdot \mathrm{Dist}\llbracket E_2 \rrbracket(\sigma[x \mapsto \mathrm{Val}\llbracket E_1 \rrbracket(\sigma)(\tau)])(\tau|_{\mathrm{Addrs}\llbracket E_2 \rrbracket})$
$\mathrm{Dist}\llbracket \text{if } E_1 \text{ then } E_2 \text{ else } E_3 \rrbracket(\sigma)(\tau) = \mathrm{Dist}\llbracket E_1 \rrbracket(\sigma)(\tau|_{\mathrm{Addrs}\llbracket E_1 \rrbracket}) \cdot \begin{cases} \mathrm{Dist}\llbracket E_2 \rrbracket(\sigma)(\tau|_{\mathrm{Addrs}\llbracket E_2 \rrbracket}) & \text{if } \mathrm{Val}\llbracket E_1 \rrbracket(\sigma)(\tau) \\ \mathrm{Dist}\llbracket E_3 \rrbracket(\sigma)(\tau|_{\mathrm{Addrs}\llbracket E_3 \rrbracket}) & \text{if } \neg \mathrm{Val}\llbracket E_1 \rrbracket(\sigma)(\tau) \end{cases}$

$$\llbracket \text{@gen function}(X_1, \ldots, X_n)\ E \text{ end} \rrbracket = (\mathbb{R}^n,\ \mathbb{R},\ \lambda x, \tau .\, \mathrm{Dist}\llbracket E \rrbracket([X_1 \mapsto x_1, \ldots, X_n \mapsto x_n])(\tau),\ \lambda x, \tau .\, \mathrm{Val}\llbracket E \rrbracket([X_1 \mapsto x_1, \ldots, X_n \mapsto x_n])(\tau))$$

(b) Denotational semantics. Three auxiliary semantic functions ($\mathrm{Addrs}\llbracket \cdot \rrbracket$, $\mathrm{Val}\llbracket \cdot \rrbracket$, and $\mathrm{Dist}\llbracket \cdot \rrbracket$) are used to define the main semantic function $\llbracket \cdot \rrbracket$, which maps modeling language source code to a generative function tuple. The $\mathrm{Addrs}\llbracket \cdot \rrbracket$ function defines the set of addresses of random choices used in the source code. The $\mathrm{Val}\llbracket \cdot \rrbracket$ function defines the value of an expression, evaluated using an environment $\sigma$ and choice dictionary $\tau$. The $\mathrm{Dist}\llbracket \cdot \rrbracket$ function defines the probability distribution on choice dictionaries represented by an expression in environment $\sigma$.

Figure 2-1: Syntax and denotational semantics of the toy probabilistic modeling language
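To make Figure 2-1 concrete, the following sketch realizes the three semantic functions as a small Julia interpreter over an explicit AST. The struct names and the pmf helper are assumptions made for illustration; only the bernoulli distribution is implemented, and choice dictionaries are Dicts from address symbols to values:

abstract type Expr_ end
struct Const <: Expr_; c::Float64 end
struct Var <: Expr_; x::Symbol end
struct Choice <: Expr_; a::Symbol; d::Symbol; arg::Expr_ end  # {a} ~ d(arg)
struct Let <: Expr_; x::Symbol; e1::Expr_; e2::Expr_ end
struct If <: Expr_; e1::Expr_; e2::Expr_; e3::Expr_ end

# Probability of value v under distribution d with the given parameter.
pmf(d::Symbol, param, v) =
    d == :bernoulli ? (v ? param : 1 - param) : error("unknown distribution")

addrs(e::Const) = Set{Symbol}()
addrs(e::Var) = Set{Symbol}()
addrs(e::Choice) = union(Set([e.a]), addrs(e.arg))
addrs(e::Let) = union(addrs(e.e1), addrs(e.e2))
addrs(e::If) = union(addrs(e.e1), addrs(e.e2), addrs(e.e3))

val(e::Const, σ, τ) = e.c
val(e::Var, σ, τ) = σ[e.x]
val(e::Choice, σ, τ) = τ[e.a]
val(e::Let, σ, τ) = val(e.e2, merge(σ, Dict(e.x => val(e.e1, σ, τ))), τ)
val(e::If, σ, τ) = val(e.e1, σ, τ) ? val(e.e2, σ, τ) : val(e.e3, σ, τ)

restrict(τ, A) = Dict(a => v for (a, v) in τ if a in A)

dist(e::Const, σ, τ) = isempty(τ) ? 1.0 : 0.0
dist(e::Var, σ, τ) = isempty(τ) ? 1.0 : 0.0
dist(e::Choice, σ, τ) = (length(τ) == 1 && haskey(τ, e.a)) ?
    pmf(e.d, val(e.arg, σ, τ), τ[e.a]) : 0.0
dist(e::Let, σ, τ) = dist(e.e1, σ, restrict(τ, addrs(e.e1))) *
    dist(e.e2, merge(σ, Dict(e.x => val(e.e1, σ, τ))), restrict(τ, addrs(e.e2)))
dist(e::If, σ, τ) = dist(e.e1, σ, restrict(τ, addrs(e.e1))) *
    (val(e.e1, σ, τ) ? dist(e.e2, σ, restrict(τ, addrs(e.e2)))
                     : dist(e.e3, σ, restrict(τ, addrs(e.e3))))

# p(τ) for: if ({:a} ~ bernoulli(0.7)) then ({:b} ~ bernoulli(0.6)) else 0
e = If(Choice(:a, :bernoulli, Const(0.7)),
       Choice(:b, :bernoulli, Const(0.6)), Const(0.0))
dist(e, Dict{Symbol,Any}(), Dict(:a => true, :b => false))  # 0.7 * 0.4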

2.3 Abstract data types for probabilistic inference

The previous section described an abstract mathematical representation for generative probabilistic models, called generative functions. This section bridges this abstract mathematical representation of generative models with the implementation of inference algorithms for these models. The key idea is that probabilistic inference algorithms can be broken down into a set of primitive operations whose semantics are based on the semantics of programs written in probabilistic modeling languages. In particular, given a generative function representing a generative probabilistic model, it is possible to implement a large array of algorithms for inference in that model using only a handful of operations whose mathematical meaning is derived from the generative function. In particular, we elevate generative functions into an abstract data type [75] (ADT). An ADT is a class of objects that stores data and supports operations that are defined abstractly and independently of what data structures are used internally to store the data or how the operations are implemented. The section also defines a second ADT called a trace that stores the latent state and observed data for a generative function.

Together, the generative function and trace ADTs encapsulate and automate the low-level computations in Monte Carlo and variational inference algorithms, including sampling, density evaluation, incremental updates, and gradients. As a result, the inference code that uses these ADTs is high-level and abstract, resembles algorithm pseudocode, and has reduced surface area for bugs. The ADTs also provide an abstraction barrier that separates the design and implementation of inference algorithms (which is done by the user) from the low-level computations (which are within the purview of the modeling language compiler). This section describes a basic version of the generative function and trace ADTs. Then, Chapter 3 shows how a number of inference techniques can be implemented using these ADTs. Chapter 4 then extends the ADTs with additional features that allow more complex logic to be encapsulated within generative functions and traces. Chapter 5 describes techniques that modeling language compilers can use to implement these ADTs. These ADTs form the core of the Gen system [32, 25].

2.3.1 Generative function and trace ADTs

Recall that a generative function (Definition 2.1.16) is a tuple $\mathcal{P} = (X, Y, p, f)$. The tuple $(X, Y, p, f)$ is the data stored in an instance of the generative function ADT, which is also denoted $\mathcal{P}$ ($\mathcal{Q}$ is also used). The data stored in an instance of the trace ADT is a tuple $(\mathcal{P}, x, \tau)$ where $\mathcal{P} = (X, Y, p, f)$ is a generative function, $x \in X$ are valid arguments to the generative function, and $\tau$ is a choice dictionary that stores the value of each random choice made during a possible execution of the generative function (that is, $\tau \in \mathrm{supp}(p(\cdot; x))$). Instances of the trace ADT are denoted $\mathrm{t}$ or $\mathrm{s}$, so e.g. $\mathrm{t} = (\mathcal{P}, x, \tau)$. We now describe the operations supported by these ADTs. We will illustrate several of these operations using the following generative function burglary_model:

@gen function burglary_model()
    burglary ∼ bernoulli(0.01)
    if burglary
        disabled ∼ bernoulli(0.1)
    else
        disabled = false
    end
    if !disabled
        alarm ∼ bernoulli(if burglary 0.94 else 0.01 end)
    else
        alarm = false
    end
    calls ∼ bernoulli(if alarm 0.70 else 0.05 end)
    return nothing
end

Note that this generative function takes no arguments, so the argument value 푥 is an empty tuple in the examples below.

Simulate operation. The first operation supported by the generative function ADT is $\mathcal{P}$.simulate($x$), which takes arguments $x \in X$ to the generative function, samples a dictionary of random choices $\tau$ according to the distribution $p(\cdot; x)$, and returns the resulting trace $\mathrm{t} = (\mathcal{P}, x, \tau)$. This operation can be used to simulate paired unobservable and observable data from the prior distribution of a generative model. As will be shown in Chapter 3, it can also be used to sample from a proposal distribution, inference model, or variational approximation that is represented as a generative function.

Example: For $\mathcal{P}$ = burglary_model, a call to $\mathcal{P}$.simulate($x$) returns one of the 10 possible traces of $\mathcal{P}$. For example, it returns the trace $\mathrm{t} = (\mathcal{P}, x, \{burglary \mapsto \mathrm{F}, alarm \mapsto \mathrm{F}, calls \mapsto \mathrm{F}\})$ with probability $0.99 \cdot 0.99 \cdot 0.95$, the trace $\mathrm{t} = (\mathcal{P}, x, \{burglary \mapsto \mathrm{F}, alarm \mapsto \mathrm{F}, calls \mapsto \mathrm{T}\})$ with probability $0.99 \cdot 0.99 \cdot 0.05$, and so on.
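In the Julia implementation of Gen, this operation corresponds approximately to the following sketch (function names follow Gen.jl's API; exact details may vary across versions):

using Gen
trace = Gen.simulate(burglary_model, ())   # 풫.simulate(x), with x = ()
choices = Gen.get_choices(trace)           # the sampled choice dictionary
trace[:burglary], trace[:calls]            # sugar for reading individual choices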

Generate operation. The second generative function ADT operation, $\mathcal{P}$.generate($x$, $\sigma$), is defined for $x \in X$ and $\sigma \in \mathcal{T}_A^\star$. Like simulate, this operation also returns an execution trace $\mathrm{t} = (\mathcal{P}, x, \tau)$, but instead of sampling the random choices $\tau$ according to $p$, it constructs $\tau$ deterministically via $\tau := \sigma|_B$ for some $B$ (this $\tau$ is unique by Proposition 2.1.3, since $p(\cdot; x)$ is structured). This operation serves as a deterministic constructor for traces. It is used in Chapter 3 to initialize Markov chains and compute importance weights. It is extended in Chapter 4 with the ability to take a partial dictionary that only contains some of the choices, filling in the rest stochastically.

Example: For $\mathcal{P}$ = burglary_model, the call $\mathcal{P}$.generate($x$, $\{burglary \mapsto \mathrm{F}, alarm \mapsto \mathrm{F}, calls \mapsto \mathrm{F}\}$) returns the trace $\mathrm{t} = (\mathcal{P}, x, \{burglary \mapsto \mathrm{F}, alarm \mapsto \mathrm{F}, calls \mapsto \mathrm{F}\})$. A call $\mathcal{P}$.generate($x$, $\{burglary \mapsto \mathrm{F}, alarm \mapsto \mathrm{F}, calls \mapsto \mathrm{F}, foo \mapsto \mathrm{F}\}$) that passes an extra random choice foo results in the same output as when the entry for foo is not provided.

Logpdf operation. The first operation supported by the trace ADT is $\mathrm{t}$.logpdf(), which for $\mathrm{t} = (\mathcal{P}, x, \tau)$ returns the log probability $\log p(\tau; x)$ that the random choices in the trace would have been sampled if $x \in X$ were the arguments to the generative function. Note that since all traces $\mathrm{t}$ have $p(\tau; x) > 0$, the value returned by logpdf is never $-\infty$. Chapter 3 uses this operation to compute importance weights and acceptance probabilities in various Monte Carlo inference algorithms. As will be described in Chapter 5, the value is typically precomputed and stored within the implementation of $\mathrm{t}$.

Example: For $\mathcal{P}$ = burglary_model and $\mathrm{t} = (\mathcal{P}, x, \{burglary \mapsto \mathrm{F}, alarm \mapsto \mathrm{F}, calls \mapsto \mathrm{F}\})$, we have $\mathrm{t}$.logpdf() $= \log(0.99 \cdot 0.99 \cdot 0.95)$.
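A sketch of checking this with Gen.jl's API, where generate with a fully constraining choice map plays the role of $\mathcal{P}$.generate and get_score plays the role of t.logpdf() (names as in Gen.jl; minor details may vary by version):

using Gen
constraints = choicemap((:burglary, false), (:alarm, false), (:calls, false))
trace, _ = Gen.generate(burglary_model, (), constraints)
Gen.get_score(trace) ≈ log(0.99 * 0.99 * 0.95)   # true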

Choices operation. The second operation supported by the trace ADT is $\mathrm{t}$.choices(), which for $\mathrm{t} = (\mathcal{P}, x, \tau)$ returns the dictionary $\tau$. This operation highlights the important difference between traces and choice dictionaries: choice dictionaries are a simple data type that does not support any operations whose semantics are derived from a particular probabilistic model. Note that in the Gen implementation, syntactic sugar allows the value at an address $a$ in $\tau$ to be read from $\mathrm{t}$ concisely with $\mathrm{t}[a]$ instead of $\mathrm{t}$.choices()$[a]$.

Example: For $\mathrm{t} = (\mathcal{P}, x, \{burglary \mapsto \mathrm{F}, alarm \mapsto \mathrm{F}, calls \mapsto \mathrm{F}\})$ we have $\mathrm{t}$.choices() $= \{burglary \mapsto \mathrm{F}, alarm \mapsto \mathrm{F}, calls \mapsto \mathrm{F}\}$.

The third operation supported by the trace ADT allows for a new trace to be constructed from a previous trace by changing the values of some of its random choices and its arguments. We now introduce definitions that form the basis of this operation's semantics.

Definition 2.3.1. For a structured probability density $p$ on choice dictionaries, let $h_{\mathrm{update}} : \mathrm{supp}(p) \times \mathcal{T}_A^\star \to (\mathrm{supp}(p) \times \mathcal{T}_A^\star) \cup \{\perp\}$ be defined by: $h_{\mathrm{update}}(\tau, \sigma) := (\tau', \tau|_C)$ where $\tau' := (\tau|_B) \oplus \sigma$ and $C := A_{\tau'}^c \cup A_\sigma$, if there exists $B$ such that $\tau' \in \mathrm{supp}(p)$ (by Proposition 2.1.3 $\tau'$ is unique), and $h_{\mathrm{update}}(\tau, \sigma) := \perp$ if there exists no such $B$.

Example: For the structured probability density $p$ below, for $\tau = \{a \mapsto \mathrm{T}, b \mapsto \mathrm{F}\}$ and $\sigma = \{a \mapsto \mathrm{F}\}$, we have $h_{\mathrm{update}}(\tau, \sigma) = (\tau', \sigma')$ where $\tau' = (\tau|_B) \oplus \sigma = \{a \mapsto \mathrm{F}\} \in \mathrm{supp}(p)$ (with $B = \{\}$) and $\sigma' = \{a \mapsto \mathrm{T}, b \mapsto \mathrm{F}\}$. For $\tau = \{a \mapsto \mathrm{T}, b \mapsto \mathrm{F}\}$ and $\sigma = \{b \mapsto \mathrm{T}\}$, we have $\tau' = (\tau|_B) \oplus \sigma = \{a \mapsto \mathrm{T}, b \mapsto \mathrm{T}\} \in \mathrm{supp}(p)$ (with $B = \{a\}$) and $\sigma' = \{b \mapsto \mathrm{F}\}$.

$\tau$ | $p(\tau)$
$\{a \mapsto \mathrm{T}, b \mapsto \mathrm{T}\}$ | $0.42$
$\{a \mapsto \mathrm{T}, b \mapsto \mathrm{F}\}$ | $0.28$
$\{a \mapsto \mathrm{F}\}$ | $0.3$

Proposition 2.3.1. If, for some pair of structured probability densities $p_1$ and $p_2$ on choice dictionaries, $(\tau', \sigma') = h_{\mathrm{update}}^{(1)}(\tau, \sigma)$, then $(\tau, \sigma) = h_{\mathrm{update}}^{(2)}(\tau', \sigma')$, where $h_{\mathrm{update}}^{(1)}$ and $h_{\mathrm{update}}^{(2)}$ are defined with respect to $p_1$ and $p_2$ respectively.

Proof. To show that $(\tau'|_{B_2}) \oplus \sigma' = \tau$ for some $B_2$, use $B_2 := (A_{\tau'} \cap A_\tau) \setminus A_\sigma$. Next, $A_{\tau'} \cap A_{\sigma'} = A_{\tau'} \cap (A_\tau \cap (A_{\tau'}^c \cup A_\sigma)) = A_\tau \cap (A_{\tau'} \cap (A_{\tau'}^c \cup A_\sigma)) = A_\tau \cap (A_{\tau'} \cap A_\sigma) = A_\tau \cap A_\sigma$. Then, $\tau'|_{A_\tau^c \cup A_{\sigma'}} = \tau'|_{(A_{\tau'} \setminus A_\tau) \cup (A_{\tau'} \cap A_{\sigma'})} = \tau'|_{(A_\sigma \setminus A_\tau) \cup (A_\tau \cap A_\sigma)} = \tau'|_{A_\sigma} = \sigma$.

[Figure: Venn diagram showing the relationships between the sets of addresses involved in $h_{\mathrm{update}}(\tau, \sigma) = (\tau', \sigma')$, with each set shaded in gray.]

The utility of $h_{\mathrm{update}}$ is that it lets us specify a new choice dictionary $\tau' \in \mathrm{supp}(p)$ by specifying only an incremental change $\sigma$ from the previous choice dictionary $\tau$. Note that $h_{\mathrm{update}}(\tau, \tau') = (\tau', \tau)$ for all $\tau' \in \mathrm{supp}(p)$. While we could always specify $\sigma = \tau'$, specifying only the parts of $\tau'$ that are actually different from $\tau$ makes it possible to incrementally compute quantities associated with the change from $\tau$ to $\tau'$.

Example: Consider the probability density $p(\tau) := p_{\mathrm{norm}(0,1)}(\tau[1]) \prod_{i=2}^{100} p_{\mathrm{norm}(\tau[i-1],1)}(\tau[i])$ if $A_\tau = \{1, \ldots, 100\}$, and $0$ otherwise. For $\sigma := \{55 \mapsto \beta\}$ for some $\beta \in \mathbb{R}$, $h_{\mathrm{update}}(\tau, \sigma) = (\tau', \sigma')$ where $\tau' = \{1 \mapsto \tau[1], \ldots, 54 \mapsto \tau[54], 55 \mapsto \beta, 56 \mapsto \tau[56], \ldots, 100 \mapsto \tau[100]\}$ and $\sigma' = \{55 \mapsto \tau[55]\}$. Consider the function $f(\tau) := \sum_{i=1}^{100} \tau[i]$. Note that $f(\tau') = f(\tau) + \sigma[55] - \tau[55]$. More generally, for $(\tau', \sigma') = h_{\mathrm{update}}(\tau, \sigma)$, and given a precomputed value for $f(\tau)$, it is possible to compute $f(\tau')$ in $2|A_\sigma|$ arithmetic operations with $f(\tau') = f(\tau) + \sum_{i \in A_\sigma} \sigma[i] - \sum_{i \in A_\sigma} \tau[i]$. While we could specify $\sigma = \tau'$, that would result in 200 operations, whereas specifying $\sigma = \{55 \mapsto \beta\}$ results in just 2 arithmetic operations. Similarly, consider computing the density ratio $p(\tau')/p(\tau)$. For $\sigma = \{i \mapsto \beta\}$ for some $1 < i < 100$,
$$\frac{p(\tau')}{p(\tau)} = \frac{p_{\mathrm{norm}(\tau[i-1],1)}(\sigma[i])\, p_{\mathrm{norm}(\sigma[i],1)}(\tau[i+1])}{p_{\mathrm{norm}(\tau[i-1],1)}(\tau[i])\, p_{\mathrm{norm}(\tau[i],1)}(\tau[i+1])},$$
which uses 4 evaluations of the normal distribution density function, whereas computing $p(\tau')$ from scratch requires 100 evaluations.
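A sketch of these two incremental computations in plain Julia (the helper names are hypothetical; here $\tau$ is stored as a length-100 vector and $\sigma$ as a Dict):

# Density of a normal with mean μ and standard deviation 1, at x.
normpdf(μ, x) = exp(-0.5 * (x - μ)^2) / sqrt(2pi)

# f(τ′) from a precomputed f(τ), touching only the changed addresses in σ:
f_incremental(f_old, τ, σ::Dict) =
    f_old + sum(values(σ)) - sum(τ[i] for i in keys(σ))

# p(τ′)/p(τ) for a single change σ = {i ↦ β} with 1 < i < 100:
# only the two density terms involving index i change.
density_ratio(τ, i, β) =
    (normpdf(τ[i-1], β) * normpdf(β, τ[i+1])) /
    (normpdf(τ[i-1], τ[i]) * normpdf(τ[i], τ[i+1]))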

Recall that the densities $p(\cdot; x)$ of generative functions are structured for all $x \in X$. Therefore $h_{\mathrm{update}}$ is well-defined for all $p(\cdot; x)$. We will use $h_{\mathrm{update}}$ to define an operation that incrementally modifies a trace of a generative function. However, recall that a trace $\mathrm{t} = (\mathcal{P}, x, \tau)$ contains arguments $x$ in addition to the choice dictionary $\tau$. To allow for incremental computation in a setting where both the random choices and the arguments undergo an incremental modification, we introduce a notion of change hint.

Definition 2.3.2 (Change hint). For a set $X$ and a set $\Delta_X$ containing at least the elements $\top$ and $\perp$, let $\odot_X : X \times \Delta_X \to 2^X$ be a function, where $2^X$ is the power set of $X$, with values denoted $x \odot_X \delta_X$ for $\delta_X \in \Delta_X$. Let $(x \odot_X \perp) := X$ and $(x \odot_X \top) := \{x\}$ for all $x \in X$. Then each element $\delta_X \in \Delta_X$ is called a change hint.

Intuitively, a change hint 훿푋 may provide information about how a given value 푥 ∈ 푋 has changed. Specifically, 훿푋 = ⊥ provides no information about the new value, and 훿푋 = ⊤ indicates that there was no change, uniquely determining the new value.

Example: Consider the set $X = \cup_{n=0}^{\infty} \{(n, \mathbf{x}) : \mathbf{x} \in \mathbb{R}^n\}$ of real-valued vectors. Let $\Delta_X := \{B : |B| < \infty, B \subseteq \{1, 2, \ldots\}\}$. For each length-$n$ vector $\mathbf{x} \in \mathbb{R}^n$ and each $B \in \Delta_X$ such that $B \subseteq \{1, \ldots, n\}$, let $\mathbf{x} \odot_X B := \{\mathbf{x}' \in \mathbb{R}^m : \mathbf{x}'[i] = \mathbf{x}[i] \text{ for all } i \in \{1, \ldots, \min(m, n)\} \setminus B\}$, where $m$ ranges over the possible lengths of $\mathbf{x}'$. For $\mathbf{x} = [4.2, 5.3, 3.1, 1.1]$ and $B = \{2\}$, $\mathbf{x} \odot_X B$ includes the vectors $[4.2, -5.3, 3.1, 1.1, 1.0]$ and $[4.2]$ but not the vector $[-4.2, 5.3, 3.1, 1.1]$.

Update operation. The third trace ADT operation is $\mathrm{t}$.update($x'$, $\delta_X$, $\sigma$). This operation allows the arguments to, and the random choices made by, an execution trace to be modified. Suppose $\mathrm{t} = (\mathcal{P}, x, \tau)$ and $\mathcal{P} = (X, Y, p, f)$. The first argument ($x' \in X$) provides new arguments, which may be different from the arguments $x \in X$ that are stored in the initial execution trace $\mathrm{t}$. The second argument is a change hint $\delta_X \in \Delta_X$ such that $x' \in (x \odot_X \delta_X)$. The third argument $\sigma$ is a dictionary that must satisfy $h_{\mathrm{update}}(\tau, \sigma) \neq \perp$ for $h_{\mathrm{update}}$ defined with respect to $p(\cdot; x)$. The operation computes $\tau'$ and $\sigma'$ using $(\tau', \sigma') := h_{\mathrm{update}}(\tau, \sigma)$ and returns $(\mathrm{t}', \sigma', \log w, \delta_Y)$ where $\mathrm{t}' = (\mathcal{P}, x', \tau')$ is a new trace with choices $\tau'$ constructed from $\tau$ and $\sigma$, and the remaining values are as follows: $\log w$ is the log ratio of the new probability to the previous probability ($\log(p(\tau'; x')/p(\tau; x))$), and $\delta_Y$ is a change hint for the return value of the generative function, with $f(x', \tau') \in (f(x, \tau) \odot_Y \delta_Y)$. Note that by Proposition 2.3.1, $\sigma'$ is the dictionary that, if passed to the update operation on $\mathrm{t}'$, would reverse the update and result in trace $\mathrm{t}$.

Example: See Figure 1-4 in Section 1.2.3 for an example that involves changing the structure of the choice dictionary for the generative function burglary_model.

Example: A generative function that appends to a vector. Consider a generative function $\mathcal{P} = (X, Y, p, f)$ where $X, Y := \cup_{n=0}^{\infty} \{(n, \mathbf{x}) : \mathbf{x} \in \mathbb{R}^n\}$ (the set of real-valued vectors), $p(\{a \mapsto \beta\}; x) := p_{\mathrm{norm}(0,1)}(\beta)$ for all $x \in X$, and $f((n, \mathbf{x}), \tau) := (n+1, [x_1, \ldots, x_n, \tau[a]])$. This generative function samples a value $\beta$ from a standard normal distribution and appends it to the input vector to produce the output vector. Consider $\Delta_X = \Delta_Y$ and $\odot_X = \odot_Y$ for the set of real vectors as defined above. Consider $x = [4.2, 5.3, 3.3, 1.1]$ and $x' = [4.2, -5.3, 3.3, 1.1]$ and $\delta_X = \{2\}$ and $\sigma = \{\}$. Then, for a trace $\mathrm{t} = (\mathcal{P}, x, \tau)$ with $\tau = \{a \mapsto \beta\}$, $\mathrm{t}$.update($x'$, $\delta_X$, $\sigma$) returns $(\mathrm{t}', \sigma', 0, \delta_Y)$ where $\mathrm{t}' = (\mathcal{P}, x', \tau)$ and $\sigma' = \{\}$, and where $\delta_Y$ satisfies $[4.2, -5.3, 3.3, 1.1, \beta] \in ([4.2, 5.3, 3.3, 1.1, \beta] \odot_Y \delta_Y)$. Examples of valid $\delta_Y$ include $\delta_Y = \{2\}$ and $\delta_Y = \perp$.

Example: Using update with a generative function that has fixed structure. Consider the special case when the generative function has fixed structure; that is, for some $B \subseteq A$, $p(\tau; x) > 0$ implies $A_\tau = B$ for all $x \in X$. Then, $\sigma$ simply contains new values for some subset of the addresses $A_\sigma \subseteq B$, and $\sigma'$ contains the previous values ($\sigma' = \tau|_{A_\sigma}$).
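In Gen.jl, the update operation looks approximately as follows (a sketch; argdiffs play the role of change hints, and the returned discard plays the role of $\sigma'$):

using Gen
trace = Gen.simulate(burglary_model, ())
new_trace, log_w, retdiff, discard =
    Gen.update(trace, (), (), choicemap((:calls, true)))
# `discard` holds the overwritten value at :calls, so passing it to
# `update` on `new_trace` would reverse the change, as in Prop. 2.3.1.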

Many probabilistic inference algorithms make use of gradients of probability densities with respect to parameter values. Consider the class of generative functions $\mathcal{P} = (X, Y, p, f)$ with real-valued arguments $X = \mathbb{R}^n$, where the support of the density on choice dictionaries does not change with $x$ ($\mathrm{supp}(p(\cdot; x)) = \mathrm{supp}(p(\cdot; x'))$ for all $x, x' \in X$), and where for all $\tau$ in the support, the function $x \mapsto \log p(\tau; x)$ from $\mathbb{R}^n$ to $\mathbb{R}$ is differentiable. Suppose also that $Y = \mathbb{R}^m$ for some $m$ and that for each $\tau$ in the support of the density, the function $x \mapsto f(x, \tau)$ from $\mathbb{R}^n$ to $\mathbb{R}^m$ is differentiable. Let the gradient of $\log p(\tau; x)$ with respect to $x$ be denoted $\nabla_x \log p(\tau; x) \in \mathbb{R}^n$ and let the Jacobian of $x \mapsto f(x, \tau)$ be denoted $\mathbf{J}(x, \tau) \in \mathbb{R}^{n \times m}$. Then consider a function

$$g(x, \tau) := \log p(\tau; x) + \ell(f(x, \tau)) \qquad (2.10)$$

where $\ell : \mathbb{R}^m \to \mathbb{R}$ is some differentiable function. Then, the gradient of $g$ with respect to $x$ is, by the chain rule:

$$\nabla_x g(x, \tau) = \nabla_x \log p(\tau; x) + (\mathbf{J}(x, \tau))\mathbf{v} \qquad (2.11)$$

where $\mathbf{v} \in \mathbb{R}^m$ is $\nabla_y \ell(y)$ evaluated at $y = f(x, \tau)$. For generative functions satisfying the properties listed above, we can define an operation for the trace ADT that computes $\nabla_x g(x, \tau)$ given a trace $\mathrm{t} = (\mathcal{P}, x, \tau)$ and a vector $\mathbf{v} \in \mathbb{R}^m$.

However, many inference algorithms also make use of gradients of probability densities with respect to the values of random choices. Consider generative functions satisfying the above requirements, as well as the following additional requirements for each $x \in X$. There must exist some set $\mathcal{B}_x \subseteq 2^A$ such that for each $B \in \mathcal{B}_x$, each address $a \in B$ is continuous (that is, $V_a = \mathbb{R}$ and $\mu_a$ is the Lebesgue measure). The function from the set $\mathrm{supp}(p(\cdot; x)) \cap \{\tau \in \mathcal{T}_A^\star : B \subseteq A_\tau\}$ to $\mathbb{R}$ given by $\tau \mapsto \log p(\tau; x)$ must be differentiable with respect to $\tau[a]$ for each $a \in B$. For each $B \in \mathcal{B}_x$, the function from $\mathrm{supp}(p(\cdot; x)) \cap \{\tau \in \mathcal{T}_A^\star : B \subseteq A_\tau\}$ to $Y$ given by $\tau \mapsto f(x, \tau)$ must be differentiable with respect to $\tau[a]$ for each $a \in B$. Then, the partial derivative of $g(x, \tau)$ with respect to the value of a random choice at an address $a \in B$ is:
$$\frac{\partial g(x, \tau)}{\partial \tau[a]} = \frac{\partial \log p(\tau; x)}{\partial \tau[a]} + \sum_{i=1}^{m} \frac{\partial f(x, \tau)_i}{\partial \tau[a]}\, v_i \qquad (2.12)$$
where $\mathbf{v} \in \mathbb{R}^m$ is $\nabla_y \ell(y)$ evaluated at $y = f(x, \tau)$. Note that this formalism can be easily extended to handle multidimensional continuous random choices with $V_a = \mathbb{R}^k$ for $k > 1$, in which case $\partial g(x, \tau)/\partial \tau[a]$ is replaced with a gradient $\nabla_{\tau[a]} g(x, \tau) \in \mathbb{R}^k$.

Gradient operation. The fourth trace ADT operation is $\mathrm{t}$.gradient($B$, $\mathbf{v}$) where $\mathrm{t} = (\mathcal{P}, x, \tau)$, $\mathcal{P}$ is a generative function satisfying the requirements above for some $n$ and $m$, $B \in \mathcal{B}_x$, and $\mathbf{v} \in \mathbb{R}^m$. The operation returns a tuple $(\mathbf{u}, \gamma)$ where $\mathbf{u} := \nabla_x \log p(\tau; x) + (\mathbf{J}(x, \tau))\mathbf{v} \in \mathbb{R}^n$ and where $\gamma$ is a choice dictionary with $A_\gamma = B$ and $\gamma[a] = (\partial \log p(\tau; x)/\partial \tau[a]) + \sum_{i=1}^{m} (\partial f(x, \tau)_i/\partial \tau[a])\, v_i$ for each $a \in B$. The set $B$ is called the selection. This operation computes gradients of the log density with respect to the arguments and the values of the selected random choices. The input $\mathbf{v}$ is included to allow for compositional implementations of this operation that use reverse-mode automatic differentiation.
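In Gen.jl, this operation corresponds approximately to choice_gradients (a sketch; x_coordinates is an assumed in-scope vector, and it is assumed that the model's density is differentiable with respect to the selected choices; exact return conventions may vary by version):

using Gen
trace = Gen.simulate(poly_model, (x_coordinates,))
selection = select(:var, (:c, 0))   # the set B of selected addresses
# `nothing` as the return-value gradient corresponds to ℓ ≡ 0:
arg_grads, choice_values, choice_grads =
    Gen.choice_gradients(trace, selection, nothing)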

2.3.2 Implementing the ADT operations compositionally

The ADT operations listed above were designed so that each could be implemented compositionally. This allows the ADT operations of a generative function that is composed from other generative functions to be implemented using the ADT operations of the respective constituent generative functions, without needing the source code of the constituent functions. We now demonstrate this compositionality for one of the trace ADT operations.

Recall the construction from Section 2.1.5 of a generative function $\mathcal{P}_3 = (X_3, Y_3, p_3, f_3)$ from two other generative functions $\mathcal{P}_1 = (X_1, Y_1, p_1, f_1)$ and $\mathcal{P}_2 = (X_2, Y_2, p_2, f_2)$ by sequencing $\mathcal{P}_1$ followed by $\mathcal{P}_2$:

$$X_3 := X_1 \qquad Y_3 := Y_2$$
$$f_3(x, \tau) := f_2(f_1(x, \tau|_{A_1}), \tau|_{A_2})$$
$$p_3(\tau; x) := p_1(\tau|_{A_1}; x)\, p_2(\tau|_{A_2}; f_1(x, \tau|_{A_1}))$$

Consider a trace $\mathrm{t}_3 = (\mathcal{P}_3, x_3, \tau_3)$, the trace $\mathrm{t}_1 := (\mathcal{P}_1, x_1, \tau_1)$ defined by $x_1 := x_3$ and $\tau_1 := \tau_3|_{A_1}$, and the trace $\mathrm{t}_2 := (\mathcal{P}_2, x_2, \tau_2)$ defined by $x_2 := f_1(x_3, \tau_3|_{A_1})$ and $\tau_2 := \tau_3|_{A_2}$. Consider $x_3' \in X_3$ and $\delta_{X3} \in \Delta_X$ such that $x_3' \in (x_3 \odot \delta_{X3})$, and $\sigma_3$ such that $h_{\mathrm{update}}^{(3)}(\tau_3, \sigma_3) \neq \perp$, where $h_{\mathrm{update}}^{(3)}$ is defined with respect to the density $p_3(\cdot; x_3)$. Then the behavior of $\mathrm{t}_3$.update($x_3'$, $\delta_{X3}$, $\sigma_3$) is defined. Consider $\mathrm{t}_1'$, $\log w_1$, $\sigma_1'$, $\delta_{Y1}$, $\mathrm{t}_2'$, $\log w_2$, $\sigma_2'$, $\delta_{Y2}$ given by:

$$(\mathrm{t}_1', \log w_1, \sigma_1', \delta_{Y1}) = \mathrm{t}_1.\mathrm{update}(x_3', \delta_{X3}, \sigma_3|_{A_1}) \quad \text{where } \mathrm{t}_1' = (\mathcal{P}_1, x_1', \tau_1')$$
$$(\mathrm{t}_2', \log w_2, \sigma_2', \delta_{Y2}) = \mathrm{t}_2.\mathrm{update}(f_1(x_3', \tau_1'), \delta_{Y1}, \sigma_3|_{A_2}) \quad \text{where } \mathrm{t}_2' = (\mathcal{P}_2, x_2', \tau_2')$$

Proposition 2.3.2. Given the conditions above, a valid value of $\mathrm{t}_3$.update($x_3'$, $\delta_{X3}$, $\sigma_3$) is $(\mathrm{t}_3', \log w_3, \sigma_3', \delta_{Y3})$ where $\mathrm{t}_3' = (\mathcal{P}_3, x_3', \tau_1' \oplus \tau_2')$, $\log w_3 = \log w_1 + \log w_2$, $\sigma_3' = \sigma_1' \oplus \sigma_2'$, and $\delta_{Y3} = \delta_{Y2}$. Note that the valid value of $\mathrm{t}_3$.update($x_3'$, $\delta_{X3}$, $\sigma_3$) would be unique, except for the change hint that is returned, for which there may be multiple valid values (e.g. $\perp$ is always a valid change hint to return).

Proof. We prove the claim $\log w_3 = \log w_1 + \log w_2$:
$$\log w_3 = \log p_3(\tau_1' \oplus \tau_2'; x_3') - \log p_3(\tau_1 \oplus \tau_2; x_3) = \log p_1(\tau_1'; x_1') + \log p_2(\tau_2'; x_2') - \log p_1(\tau_1; x_1) - \log p_2(\tau_2; x_2)$$
$$= (\log p_1(\tau_1'; x_1') - \log p_1(\tau_1; x_1)) + (\log p_2(\tau_2'; x_2') - \log p_2(\tau_2; x_2)) = \log w_1 + \log w_2$$

Chapter 5 discusses implementing the generative function and trace ADTs in more detail. In particular, Chapter 5 discusses how implementations of these ADTs can be generated from modeling language source code.

2.4 Related work

Semantics of probabilistic programming languages. Numerous efforts have been made to define semantics for universal probabilistic programming languages and inference algorithms for these languages [15, 55, 115]. We do not define the denotational semantics of Gen's modeling languages. Instead, we provide a definition of an abstract mathematical object (generative functions) that makes the addresses of random choices explicit, on top of which Gen's trace data type is built. The probability densities and reference measures used in the definition of a generative function are similar to those used in prior work [15], except that we use arbitrary addresses that have individual base measures.

Incremental computation and change hints. The update trace operation is designed to be implemented using compositional incremental computation schemes via change hints. Note that update requires computation of not just the new return value (as in traditional deterministic computation) but also the ratio of probability densities of the new and old random choices, and the new choice dictionary. Incremental computation has a long history [97, 2]. Change hints in Gen are similar to over-approximations of change structures [18]. The addresses of the random choices passed in as constraints to update play a similar role as the memory addresses in self-adjusting computation based on memory locations [2]. Gen does not prescribe how modeling language implementations use or compute change hints; different approaches may be better suited for different modeling languages. Automatically generating code that computes change hints for code in general-purpose programming languages is an interesting area of future work.

Compositional automatic differentiation. Numerous systems support compositional automatic differentiation, where users write custom operators and provide gradient computations designed for reverse-mode automatic differentiation based on the chain rule [1, 94]. The gradient operator in Gen is unique because it distinguishes between differentiation with respect to the arguments of the computation and differentiation with respect to the values of random choices in the trace, based on a dynamic selection over addresses that may only exist dynamically, depending on control flow.

Addresses for random choices and generative function calls. Unlike the modeling languages of Church [51], Venture [79], and most other probabilistic programming systems, Gen's dynamic modeling language allows users to assign specific addresses to random choices for fine-grained control over traces when implementing inference algorithms. The Pyro system [13] also introduced individual addresses for random choices independently and concurrently, although the entire execution uses the same address namespace in Pyro, whereas Gen allows users to assign an address to a function call. This reflects a difference in architecture: Gen's modeling languages can invoke any generative function that implements Gen's abstract data types; the implementation details of the callee and the random choices that it makes are encapsulated.

Chapter 3

Implementing Inference Using Generative Functions and Traces

A variety of algorithms for probabilistic inference and learning can be implemented using the tools of the previous chapter: generative functions and traces. This chapter gives procedures in pseudocode for implementing inference algorithms, and building blocks for inference algorithms, using generative functions and traces. Despite the fact that these procedures operate on flexible model representations derived from probabilistic programs, they are short and lay bare the mathematical structure of inference algorithms, because the implementation details are abstracted away by the generative function and trace abstract data types. Such details include how the trace data structure is implemented, how probabilities or gradients are computed, and how conditional independence in the model is exploited by the modeling language compiler to make the abstract data type operations more efficient.

Users of Gen implement their own inference algorithms in a general-purpose programming language using a Gen API that provides generative function and trace data types, together with embedded probabilistic modeling languages for defining generative functions. At the time of this writing, the Gen implementation uses Julia as the general-purpose host language, but Gen's architecture and the inference pseudocode given in this chapter could be implemented in other host languages as well. The pseudocode procedures provided in this chapter can be implemented by users in Julia. In some cases these procedures have also been implemented within Gen's inference library, which contains reusable inference logic built on top of the abstract data types. The chapter includes pedagogical examples that are implemented in Julia. The examples use Gen's probabilistic modeling languages to express generative functions, Gen's Julia API, and in some cases Gen's inference library.

The inference procedures defined in this chapter are in many cases composable with one another. Examples of composability include (i) composing MCMC kernels together into more complex MCMC kernels, (ii) composing particle filtering with MCMC kernels (where the MCMC kernels play the role of rejuvenation moves), and (iii) composing learning and inference by training proposal distributions for use in Monte Carlo inference algorithms on data simulated from a generative model. The composability is due in part to the fact that all of the procedures are implemented with the same abstract data types. The ability to implement an open-ended set of inference algorithms from primitive building blocks in a general-purpose programming language, while using probabilistic programming languages to explicitly express probabilistic models, is a distinctive feature of Gen.

The chapter also introduces several new programming constructs for Monte Carlo algorithms, including domain-specific languages for composing MCMC kernels and for expressing deterministic transformations of traces. One construct, called a trace translator, is a versatile building block for implementing inference algorithms based on two or more probabilistic models of the same domain that use different latent representations.

3.1 Simple Monte Carlo with traces

Monte Carlo inference algorithms [104] are randomized approximation algorithms that are used to estimate properties of probability distributions that are difficult to compute symbolically. The most basic Monte Carlo algorithm, sometimes called simple Monte Carlo, involves (i) approximating a probability distribution by a collection of samples drawn from the distribution, and (ii) using this approximation to estimate expectations of test functions. Consider a probability distribution 푝 on choice dictionaries, and a test function 푔 : 풯_퐴^⋆ → ℝ whose expectation (E_{휏∼푝}[푔(휏)], assuming it exists) we want to estimate. Simple Monte Carlo approximates the distribution by a collection (휏^(1), . . . , 휏^(푛)) where each 휏^(푖) is independently and identically distributed according to 푝, and averages the value of the test function across the samples in the collection:

E_{휏∼푝}[푔(휏)] ≈ (1/푛) ∑_{푖=1}^{푛} 푔(휏^(푖)),   where 휏^(푖) ∼ 푝 i.i.d.    (3.1)

In particular, the estimate converges almost surely to the expectation as 푛 increases. When the function takes the form 푔(휏) := [휏 ∈ 퐸] for some event 퐸 ⊆ 풯_퐴^⋆, Equation (3.1) estimates the probability of the event 퐸. Note that simple Monte Carlo requires the ability to sample from the probability distribution, which is often not possible for the conditional distributions arising in probabilistic inference. The subsequent sections in this chapter give more sophisticated approximation algorithms that do support conditional distributions.

Simple Monte Carlo with traces As described in the previous chapter, a generative function 풫 defines a probability distribution 푝 on choice dictionaries 휏. The first step in employing simple Monte Carlo to estimate an expectation under a probability distribution 푝 is to construct a generative function 풫, called the 'model', whose distribution on choice dictionaries is 푝, typically by writing a program in a probabilistic modeling language. To sample from this distribution we use 풫.simulate, which returns a trace t that wraps the sampled choice dictionary 휏 ∼ 푝 with additional metadata and operations that are useful for implementing more sophisticated inference algorithms. The simple Monte Carlo procedure (Algorithm 1) does not make use of these additional features of traces, and just retrieves the sampled choice dictionary with t.choices(). The extra capabilities of traces are important for the more sophisticated algorithms appearing later in this chapter. In this chapter, the arguments to the generative function 풫 that represents the model often play no role in the algorithm and can be treated as constant. To reduce complexity of the notation, in such cases we denote the arguments to 풫 with an underscore (_). For example, 풫.simulate(_) invokes simulate on 풫, which may or may not take arguments.

Algorithm 1 Simple Monte Carlo with traces

procedure simple-monte-carlo(풫, 푔, 푛)
    for 푖 ← 1 . . . 푛 do
        t^(푖) ← 풫.simulate(_)
    end for
    return (1/푛) ∑_{푖=1}^{푛} 푔(t^(푖).choices())
end procedure

Example: Estimating the probability of an event Consider the generative function 풫 := poly_model defined below that encodes a generative model of y-coordinates generated from a random polynomial of random degree:

@gen function poly_model(x_coordinates)
    degree ∼ uniform_discrete(0, 4)
    var ∼ inv_gamma(1, 1)
    coefficients = [({(:c, i)} ∼ normal(0, 1)) for i in 0:degree]
    for i=1:length(x_coordinates)
        x = x_coordinates[i]
        mu = coefficients' * x .^ (0:degree)
        {(:y, i)} ∼ normal(mu, sqrt(var))
    end
end

Suppose we want to estimate the probability that the second y-coordinate is greater than the first y-coordinate, for some given x-coordinates. Then 푔(휏) := [휏[(y, 2)] > 휏[(y, 1)]]. To estimate this probability using Algorithm 1, we first assemble the collection of (푛 = 100) traces by simulating from the generative function. Julia code to do this, using the version of Gen at the time of this writing, is shown below:

x_coordinates = [0.0, 1.0, 2.0, 3.0]
n = 100
traces = [Gen.simulate(poly_model, (x_coordinates,)) for i in 1:n]

The Julia function call 'Gen.simulate(poly_model, (xs,))' implements 풫.simulate(푥), where 푥 is the tuple of arguments to the generative function (which in this case contains a single element, x_coordinates). Then, we implement the test function as a Julia function and compute its average value across the traces:

g(trace) = trace[(:y, 2)] > trace[(:y, 1)]
estimate = sum([g(trace) for trace in traces]) / n

Note that as syntactic sugar, traces in Gen allow direct access to the values of random choices by their address, using square brackets. That is, the syntax trace[푎], where trace represents trace t, evaluates to t.choices()[푎]. For example, in the code above, each value trace represents t^(푖) = (풫, 푥, 휏^(푖)) where 휏^(푖) is a choice dictionary, and trace[(y, 1)] evaluates to 휏^(푖)[(y, 1)]. Running the Julia code above gives a simple Monte Carlo estimate of the desired probability. Figure 3-1 shows histograms of estimate collected from 1000 executions of this code, for different settings of 푛. The true probability is ≈ 0.5. Increasing 푛 reduces the variance in the estimates as expected.

Figure 3-1: Convergence of simple Monte Carlo implemented with traces. (Histograms of estimate over 1000 runs, for 푛 = 10, 100, 1000, 10000.)

Note that the test function whose expectation we estimate is defined externally to the generative function 풫. It is also possible to define the return value of 풫 to be 푔(휏 ), in which case the values 푔(휏 (푖)) in Algorithm 1 are computed during simulate, and are accessible from each trace via the retval trace operation. However, this is undesirable because it reduces modularity of the code—modifying the test function would require modifying the model code. Also, encoding the test function separately from the model code allows the same collection of traces to be reused with many test functions.

3.2 Importance sampling with traces

Using simple Monte Carlo we estimated a property of a distribution 푝 using samples from that distribution. More sophisticated Monte Carlo methods involve sampling from other probability distributions, called proposal distributions, that are denoted 푞. Importance sampling [104] is the simplest class of Monte Carlo methods that uses proposal distributions.

This section describes how to implement importance sampling algorithms using generative functions and traces. A key idea is that proposal distributions are represented in the same way as model distributions: as generative functions. Users write their proposal distributions using the same probabilistic modeling languages used to write generative models. Because proposals are defined using expressive modeling languages, a variety of different types of proposals are possible, including proposals based on the prior distribution, data-driven proposals, algorithmic proposals, simulator-based proposals, and proposals based on neural networks that are fully or partially learned from data. Proposal generative functions can be used for models with stochastic structure. In this section, generative functions representing a model are denoted 풫, generative functions representing a proposal are denoted 풬, and the probability density on choice dictionaries for 풬 is denoted 푞.

The two design parameters of an importance sampling algorithm are the number of samples 푛 and the proposal distribution 푞. Increasing 푛 reduces error while increasing computational cost, and the choice of proposal distribution determines the efficiency of the algorithm (i.e. the error for a given 푛). Tailoring the proposal distribution to the inference problem can vastly improve the efficiency of the algorithm. This motivates our use of flexible probabilistic modeling languages to express proposal distributions.

3.2.1 Regular importance sampling

Like simple Monte Carlo, regular importance sampling [104] is used to estimate expectations of a test function 푔 under a distribution 푝. However, instead of sampling each 휏^(푖) from 푝, we sample each 휏^(푖) from 푞, correct the estimate by 'weighting' each sample by an importance weight 푝(휏^(푖))/푞(휏^(푖)), and then average the weighted values of the test function:

E_{휏∼푝}[푔(휏)] ≈ (1/푛) ∑_{푖=1}^{푛} 푔(휏^(푖)) 푤^(푖),   where 푤^(푖) := 푝(휏^(푖))/푞(휏^(푖)) and 휏^(푖) ∼ 푞 i.i.d.    (3.2)

Like simple Monte Carlo, this estimator is unbiased and converges to the desired expectation as 푛 increases. This procedure is often used when it is possible to sample from 푝, but simple Monte Carlo would give a high-variance estimate because 푔 has large magnitude in regions of the state space that have low probability under 푝. For example, 푔 may be the indicator function for an event that has low probability under 푝. In this setting, it is possible to choose a 푞 that reduces the variance of the estimator below that of simple Monte Carlo.

Algorithm 2 Regular importance sampling with traces

procedure importance-sampling(풫, 풬, 푔, 푛)
    for 푖 ← 1 . . . 푛 do
        s ← 풬.simulate(_)
        휎 ← s.choices()
        t^(푖) ← 풫.generate(_, 휎)
        푤^(푖) ← exp(t^(푖).logpdf() − s.logpdf())
    end for
    return (1/푛) ∑_{푖=1}^{푛} 푤^(푖) · 푔(t^(푖).choices())
end procedure

Algorithm 2 shows an implementation of regular importance sampling using a generative function 풫 that encodes the distribution 푝, and a generative function 풬 that encodes the proposal distribution 푞. This algorithm associates each random choice sampled in 풬 with the random choice sampled in 풫 that has the same address. In order to be a valid proposal distribution for use with 풫, 풬 must use the same address universe as 풫. In the measure-theoretic setting, this implies that 풫 and 풬 must use the same reference measure for each address. For example, 풫 cannot sample a discrete random choice at some address 푎 while 풬 samples a continuous random choice. 풬 and 풫 must also satisfy the following condition:

푝(휏) > 0 =⇒ 푞(휏) > 0    (3.3)

In particular, if 풫 employs stochastic control flow, then 풬 must also employ stochastic control flow (although the Julia language constructs used to implement the control flow need not be the same). Users of Gen must reason about the relationships between the sets of possible choice dictionaries 휏 for the two generative functions when designing proposals. The current Gen implementation does not automatically verify that Equation (3.3) holds.

Example: Importance sampling and stochastic structure Consider the model 풫 := p_inf_squares below, and the function 푔(휏) := ∑_{푖=1}^{∞} [휏[(go, 푖)]] · 휏[(x, 푖)]^2. We use Algorithm 2 to estimate the value of the expectation:

E_{휏∼푝}[푔(휏)] = ∑_{푖=1}^{∞} (0.5)^푖 (1 + ∑_{푗=1}^{푖−1} 푗^2) = 7.0

(This example is chosen because it is possible to compute the expectation symbolically; in general this is not possible). We use proposal distribution 풬 := q_inf_squares below:

@gen function p_inf_squares()
    i = 1; xtot = 0
    while ({(:go, i)} ∼ bernoulli(0.5))
        xtot += ({(:x, i)} ∼ normal(i, 1))^2
        i += 1
    end
    return xtot
end

@gen function q_inf_squares()
    i = 1
    while ({(:go, i)} ∼ bernoulli(0.7))
        {(:x, i)} ∼ normal(i, 1)
        i += 1
    end
end

Julia code implementing this estimator using the current version of Gen is:

n = 1000
estimate = 0.0
for i in 1:n
    q_trace = Gen.simulate(q_inf_squares, ())
    (p_trace, _) = Gen.generate(p_inf_squares, (), Gen.get_choices(q_trace))
    w = exp(Gen.get_score(p_trace) - Gen.get_score(q_trace))
    estimate += w * Gen.get_retval(p_trace)  # or: estimate += w * g(p_trace)
end
estimate = estimate / n

Note that in this code, the test function 푔 is computed within the body of the generative function 풫, and stored in the return value of its trace t, which is accessed using Gen.get_retval. It is also possible to instead implement the test function separately in a Julia function g that reads the trace:

@gen function p_inf_squares()
    i = 1
    while ({(:go, i)} ∼ bernoulli(0.5))
        {(:x, i)} ∼ normal(i, 1)
        i += 1
    end
end

function g(trace)
    i = 1; xtot = 0
    while trace[(:go, i)]
        xtot += (trace[(:x, i)])^2
        i += 1
    end
    return xtot
end

Specifying 푔 as part of the generative function has the benefit of reducing the total amount of code, but has the downside of tying the test function to the model code, so that the model code needs to be modified if the test function is modified. Figure 3-2 shows the results of running this code 1000 times, for different settings of 푛, and compares the results to those of a simple Monte Carlo estimator. The true value of the expectation (7.0) is shown in red. Increasing 푛 reduces the variance in the estimates as expected. The importance sampling estimator is more accurate for a given 푛 than the simple Monte Carlo estimator. Intuitively, this is because 풬 allocates more probability mass than 풫 to traces for which 푔(휏) is large, by increasing the likelihood that larger 푖 are sampled.

Figure 3-2: Comparing accuracy of importance sampling and simple Monte Carlo. (a) Simple Monte Carlo (Algorithm 1); (b) importance sampling (Algorithm 2). Histograms of estimates for 푛 = 100, 1000, 10000; the true value 7.0 is marked in red.

3.2.2 Self-normalized importance sampling

Regular importance sampling (Algorithm 2) can be used when the distribution of interest 푝 is the unconditioned distribution on choice dictionaries of a generative function. But typically in probabilistic inference the distribution of interest is a conditional distribution 푝(·|휌) arising from conditioning such a distribution on observed data 휌. In this chapter, we will assume that the observed data 휌 are existentially sound and have nonzero marginal likelihood (푝̄(휌) > 0). We will denote the latent choice dictionary by 휎. In this setting it is not typically possible to evaluate the target density 푝(휎|휌) := 푝(휎 ⊕ 휌)/푝̄(휌), and therefore it is not possible to use Algorithm 2. However, it is possible to evaluate 푝(휎 ⊕ 휌) and use an estimate of the marginal likelihood 푝̄(휌) to approximate the importance weight used by regular importance sampling. This is the approach taken by self-normalized importance sampling [104], which uses a collection of samples from the proposal 푞 to construct an estimate of the marginal likelihood, and weights these same samples by approximate importance weights obtained using the estimated marginal likelihood instead of 푝̄(휌):

E_{휎∼푝(·|휌)}[푔(휎)] ≈ (1/푛) ∑_{푖=1}^{푛} 푔(휎^(푖)) 푤^(푖),   where 푤^(푖) := (푝(휎^(푖) ⊕ 휌)/푧̂) / 푞(휎^(푖)) and 휎^(푖) ∼ 푞 i.i.d.    (3.4)

푧̂ := (1/푛) ∑_{푗=1}^{푛} 푝(휎^(푗) ⊕ 휌) / 푞(휎^(푗))    (3.5)

Unlike regular importance sampling, this procedure is not in general unbiased, but it still converges asymptotically to the correct value. Note that self-normalized importance sampling can actually be more accurate than regular importance sampling even in the absence of conditioning [104] (e.g. with 휌 = {}).

Algorithm 3 shows an implementation of self-normalized importance sampling using generative functions and traces. Note that the weights 푤^(푖) sum to 1. It is convenient to treat the weighted collection of traces {(t^(푖), 푤^(푖))}_{푖=1}^{푛} as an approximation to the conditional distribution 푝(·|휌), independently of the test function we are trying to estimate. For example, we can use the same weighted collection to estimate expectations of various test functions. Also, as we will see in Section 3.5, the weighted collection of traces can be subjected to further inference operations within an inference algorithm. Therefore, instead of taking a test function 푔 as input and returning the estimated expectation as output, Algorithm 3 returns a weighted collection of traces that approximates a given conditional distribution 푝(·|휌). It also returns the log marginal likelihood estimate log 푧̂, because this quantity is of intrinsic interest for model comparison. Note that each trace t^(푖) contains both the latent choices 휎^(푖) and the observed choices 휌, which are the same across all traces.

Since Algorithm 3 produces an approximation to the conditional distribution 푝(·|휌) instead of an estimate of an expectation of some test function 푔, the proposal 풬 should be chosen to be accurate for various test functions. A general design principle for proposals that are not specialized to some test function 푔 is to make the proposal distribution 푞 as close as possible to the conditional distribution 푝(·|휌). In particular, the number of samples that is necessary and sufficient to achieve a particular expected estimation error grows exponentially in the difference between target and proposal distributions, as measured by the Kullback-Leibler divergence from target to proposal [21].

Algorithm 3 Self-normalized importance sampling with traces

procedure self-norm-importance-sampling(풫, 풬, 휌, 푛)
    for 푖 ← 1 . . . 푛 do
        s ← 풬.simulate(_)
        휏 ← s.choices() ⊕ 휌
        t^(푖) ← 풫.generate(_, 휏)
        log 푤̃^(푖) ← t^(푖).logpdf() − s.logpdf()
    end for
    ((푤^(1), . . . , 푤^(푛)), log 푧̂) ← normalize(log 푤̃^(1), . . . , log 푤̃^(푛))
    return ({(t^(1), 푤^(1)), . . . , (t^(푛), 푤^(푛))}, log 푧̂)
end procedure

procedure normalize(ℓ^(1), . . . , ℓ^(푛))
    ℓ̄ ← max_푗 ℓ^(푗) + log(∑_{푖=1}^{푛} exp(ℓ^(푖) − max_푗 ℓ^(푗)))
    for 푖 ← 1 . . . 푛 do
        푤^(푖) ← exp(ℓ^(푖) − ℓ̄)
    end for
    return ([푤^(1), . . . , 푤^(푛)], ℓ̄)
end procedure
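The normalize procedure computes the normalized weights in log space using the log-sum-exp trick, which avoids numerical overflow and underflow when the unnormalized log weights have large magnitude. A minimal Julia sketch of this helper is shown below (the standalone function and its name normalize_logweights are illustrative, not part of Gen's API):

function normalize_logweights(log_w::Vector{Float64})
    # stably compute the log of the sum of the unnormalized weights
    m = maximum(log_w)
    l_total = m + log(sum(exp.(log_w .- m)))
    # normalized weights; these sum to one
    weights = exp.(log_w .- l_total)
    return (weights, l_total)
end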

The requirement for the proposal generative function 풬 to be valid for self-normalized importance sampling is similar to that for regular importance sampling, but instead of covering the support of 푝, the support of 푞 must cover the support of 푝(·|휌):

푝(휎 ⊕ 휌) > 0 =⇒ 푞(휎) > 0 (3.6)

Intuitively, the proposal 'fills in' values for the latent random choices in the model's trace by sampling its own random choices 휎 at the same addresses that are sampled in the model. These choices are then merged with the observed data 휌 to construct the choice dictionary 휏 for the model. Note that if 푝(휎^(푖) ⊕ 휌) = 0, where 휎^(푖) := s.choices(), for all 푖 in Algorithm 3, then log 푤̃^(푖) = −∞ for all 푖, it is not possible to compute a log marginal likelihood estimate or normalized weights 푤^(푖), and the procedure will error. While the probability of this event decreases to zero as 푛 increases, to ensure the algorithm always successfully returns, the following converse support requirement is added:

푞(휎) > 0 =⇒ 푝(휎 ⊕ 휌) > 0 (3.7)

Example: Potential proposals for self-normalized importance sampling Consider the generative function 풫 := p_self_norm defined below, and two generative functions 풬1 := q1_self_norm and 풬2 := q2_self_norm that encode potential proposals for use in Algorithm 3. For observations 휌 := {c ↦→ T}, 풬1 is a valid importance sampling proposal for 풫 and 휌, but 풬2 is not, because it never produces choice dictionaries containing address b. For example, for 휎 := {a ↦→ T, b ↦→ F} we have 푝(휎 ⊕ 휌) > 0 but 푞2(휎) = 0.

@gen function p_self_norm()
    a ∼ bernoulli(0.1)
    b ∼ bernoulli(0.2)
    c ∼ bernoulli((a && b) ? 0.9 : 0.1)
end

@gen function q1_self_norm()
    a ∼ bernoulli(0.5)
    b ∼ bernoulli(0.5)
end

@gen function q2_self_norm()
    a ∼ bernoulli(0.5)
end

Example: Using the prior distribution as a proposal A simple way of constructing a generative function 풬 that is a valid proposal for some pair of model 풫 and observed data 휌 is to write 풬 so that it samples from the prior distribution on the latent random choices. When 풫 is written in a probabilistic modeling language like Gen's DML, it is often possible to obtain such a 풬 by simply removing the lines of code that sample the observed data from the model code. For example, the following generative function 풬 := q_is_prior defines a valid proposal for the polynomial curve model 풫 := poly_model (with lines for observed random choices commented out):

@gen function q_is_prior(x_coordinates)
    degree ∼ uniform_discrete(0, 4)
    var ∼ inv_gamma(1, 1)
    coefficients = [({(:c, i)} ∼ normal(0, 1)) for i in 0:degree]
    # deleted lines that sample observed addresses (:y, i)
    # ..
end

Note that if the proposal instead iterated over 'i in 1:degree' (skipping the constant-term coefficient at i = 0), then it would become invalid, because all choice dictionaries 휎 that contain address (c, 0) would have 푞(휎) = 0. Chapter 4 shows how the generative function and trace abstract data types can be extended so that simple proposals like this can be automatically generated by the modeling language compiler.

Example: A data-driven proposal distribution Although a proposal based on the prior distribution is easy to construct, it is unlikely to result in an efficient importance sampling algorithm, because the prior distribution is usually not similar to the target distribution 푝(·|휌). In order to better match the target distribution, it is necessary for the proposal to take into account the observed data 휌. Such proposals are sometimes called data-driven [125]. One way of constructing data-driven proposals is to use a heuristic to estimate the mode of the target distribution (or one of its conditional distributions) and to sample values near the estimate of the mode, but with noise added. For example, consider conditional inference in 풫 := poly_model given observations of the form 휌 = {(y, 1) ↦→ 푦1, . . . , (y, 5) ↦→ 푦5} where the length of x_coordinates is 5. If we knew the degree of the polynomial, then we could use the least squares fit of a polynomial of that degree as a heuristic estimate of the mode of the distribution on coefficients given data and degree. The generative function 풬 := q_is_data_driven below defines a proposal distribution that samples the degree from its prior distribution, but runs least squares repeatedly to estimate each coefficient in sequence. We propose each coefficient from a Cauchy distribution centered at the least squares estimate, with scale parameters that have been tuned for this model (see Section 3.3 for techniques for tuning parameters of proposal distributions).

@gen function q_is_data_driven(x_coords, y_coords)
    scales = [0.395, 0.242, 0.088, 0.020, 0.007]
    n = length(x_coords)
    @assert n == length(y_coords)
    degree ∼ uniform_discrete(0, 4)
    coeffs = [NaN for i in 0:degree]
    predicted = [0.0 for i in 1:n]
    for i in 0:degree
        residuals = y_coords .- predicted  # elementwise subtraction
        # fit a polynomial to residuals with coefficients 0..i-1 fixed to zero
        est_coeffs = least_squares(x_coords, residuals, degree, min_degree=i)
        coeffs[i+1] = ({(:c, i)} ∼ cauchy(est_coeffs[1], scales[i+1]))
        predicted = [coeffs[1:i+1]' * x .^ (0:i) for x in x_coords]
    end
    residuals = y_coords .- predicted
    var ∼ inv_gamma(1 + n/2, 1 + 0.5 * residuals' * residuals)
end

Note that this proposal takes both the input data (x_coords) and the observed data (y_coords) as arguments. In general, proposal generative functions are allowed to take arbitrary arguments. Also, crucially, the proposal is able to run least-squares Julia code. The ability to run general-purpose code as part of a proposal distribution is a key motivator for Gen's use of flexible probabilistic modeling languages to express proposal distributions. The proposal finishes by sampling the variance from its conditional distribution, which was derived using conjugate analysis.

Suppose our goal is to estimate the conditional probability that the degree is three for some data set. The Julia code below uses Gen's inference library implementation of Algorithm 3, with the data-driven proposal 풬. Note that the inference library implementation returns log-weights log 푤^(푖) instead of 푤^(푖).

observations = Gen.choicemap()  # initialize empty choice dictionary
for i in 1:length(y_coords)
    observations[(:y, i)] = y_coords[i]
end
(traces, log_weights) = Gen.importance_sampling(
    poly_model, (x_coords,), observations,
    q_is_data_driven, (x_coords, y_coords), n)
weights = exp.(log_weights)
estimate = sum(weights .* map(g, traces))

Figure 3-3 shows the results of running this code 100 times, for different settings of 푛, and compares the results with those of the same code run with the less efficient proposal q_is_prior. Increasing 푛 reduces the variance in the estimates as expected. The importance sampling estimator that uses the data-driven proposal is much more accurate for a given 푛 than the estimator using the prior as the proposal (the true value is near 0.9). Note that many other probabilistic programming systems with universal modeling languages exclusively support importance sampling proposals based on the prior distribution [130, 40].

Figure 3-3: Comparing self-normalized importance sampling using different proposals. (a) Estimates from self-normalized importance sampling with a prior proposal; (b) estimates from self-normalized importance sampling with a data-driven proposal. Histogram panels for 푛 = 10^2, 10^3, 10^4, 10^5.

3.3 Training proposal distributions on simulated data

Manually devising efficient proposal distributions for use in self-normalized importance sampling and other Monte Carlo algorithms can be difficult. The process of constructing an efficient proposal generative function 풬 can be partially automated by training numerical parameters of 풬 using supervised learning on training data that is generated from the generative model 풫. The general strategy of training a proposal or inference computation on data simulated from a generative model has a long history in the machine learning and inference literature [58, 87]. The strategy can be motivated by a mathematical relationship between maximum likelihood learning of discriminative models and Kullback-Leibler (KL) divergence. Consider a generative model distribution 푝(휎 ⊕ 휌) where 휎 contains latent choices and 휌 contains observed random choices. We can train a family of proposal distributions 푞(·; 휌, 휃), where 휌 is observed data and 휃 are numerical parameters of the proposal program, to maximize the expectation of the log-likelihood log 푞(휎; 휌, 휃), where the latent and observed data 휎 and 휌 are jointly sampled from 푝:

max_휃 E_{(휎⊕휌)∼푝} [log 푞(휎; 휌, 휃)]    (3.8)

This is equivalent to solving the following optimization problem:

min_휃 E_{휌∼푝} [D_KL(푝(·|휌) || 푞(·; 휌, 휃))]    (3.9)

The equivalence has been noted several times in recent literature [16, 70, 31, 71]. Because the KL divergence D_KL(푝(·|휌) || 푞(·; 휌, 휃)) governs the efficiency of a proposal distribution [21], the equivalence between Equation (3.8) and Equation (3.9) implies that Monte Carlo proposals can be tuned using supervised learning on data simulated from the generative model. The optimization problem in Equation (3.8) can be solved using stochastic gradient techniques, with gradient estimates obtained from minibatches of simulated data.

This training procedure is implemented naturally using Gen's abstract data types for generative functions and traces. Algorithm 4 gives a procedure for training the numerical parameters 휃 of a generative function 풬 that encodes a proposal distribution, on data generated from a generative function 풫 that encodes a generative model. The procedure assumes that the numerical parameters 휃 are arguments to the generative function 풬, and it obtains gradients of log 푞(휎; 휌, 휃) with respect to 휃 using the gradients trace operation. The same procedure can be used to train parameters of generative functions for use as proposals within the Metropolis-Hastings algorithms that will be described in Section 3.4.2 and the particle filtering algorithms that will be described in Section 3.5. Also note that with small adjustments this procedure can be used for learning of discriminative generative functions 풬 directly from real data, or from combinations of real and simulated data.

Algorithm 4 Training parameters of a proposal on data generated from a model

procedure simulated-data-train-sgd(풫, 풬, 휃_0, 훼, 푘, 푚)
    for 푖 ← 1 . . . 푘 do
        u ← 0
        for 푗 ← 1 . . . 푚 do
            t ← 풫.simulate(_)
            (휎 ⊕ 휌) ← t.choices()    ◁ Separate out latents 휎 and observations 휌
            s ← 풬.generate((휌, 휃_{푖−1}), 휎)
            ((_, v), _) ← s.gradient({}, 0)    ◁ v = ∇_휃 log 푞(휎; 휌, 휃) at 휃 = 휃_{푖−1}
            u ← u + (1/푚) · v
        end for
        휃_푖 ← 휃_{푖−1} + 훼 · u
    end for
    return 휃_푘
end procedure

Example: Automatically tuning the stochasticity of a heuristic-based proposal Consider the proposal 풬 := q_is_data_driven above. This data-driven proposal uses least-squares to determine the mode of the (Cauchy) sampling distribution for each coefficient. The numerical parameters 'scales' govern how stochastic the sampling distribution of each coefficient should be. Intuitively, if the scale parameters are too large, then the proposal will often propose samples with very low posterior probability, and the importance sampling algorithm will require many particles in order for a significant number to lie in regions of appreciable posterior probability. The algorithm may give overly uncertain estimates unless the number of particles is very large. If the scale parameters are too small, then regions of the parameter space will not be sampled frequently enough, and the algorithm may give overconfident estimates. The code below implements Algorithm 4 in Julia, using the version of Gen at the time of this writing:

n = length(x_coords)
theta = [10.0, 10.0, 10.0, 10.0, 10.0]
for iter in 1:500  # iterations of stochastic gradient descent
    grad_accumulator = [0.0, 0.0, 0.0, 0.0, 0.0]
    for i in 1:100  # minibatch size
        p_trace = Gen.simulate(poly_model, (x_coords,))
        q_args = (x_coords, [p_trace[(:y, i)] for i in 1:n], theta)
        latents = Gen.complement(Gen.select([(:y, i) for i in 1:n]...))
        latent_choices = Gen.get_selected(Gen.get_choices(p_trace), latents)
        (q_trace, _) = Gen.generate(q_is_data_driven, q_args, latent_choices)
        ((_, _, grad), _) = Gen.choice_gradients(q_trace, Gen.select())
        grad_accumulator .+= grad / 100
    end
    theta .+= (grad_accumulator * 0.1)
end

This algorithm uses Gen.choice_gradients, which implements the gradient abstract data type operation. Note that the program q_is_data_driven was modified to add a third argument log_scales, to which the variable theta above is passed. We use the log of the scale vector as the argument so that optimization can be performed over an unconstrained space. Also, we label the argument with 'grad' to indicate that gradients with respect to this argument will be required:

@gen function q_is_data_driven(xs, ys, (grad)(log_scales))
    scales = exp.(log_scales)
    # ..
end

Note that Gen also has more specialized gradient operations that can be used when optimizing large parameter arrays. The code above is chosen for simplicity, and to use only the subset of Gen's abstract data type operations that were introduced in Section 2.3.

Figure 3-4a shows the estimated objective function over the course of training these parameters using Algorithm 4, where data is simulated from the generative model 풫 := poly_model. Figures 3-4b and 3-4c show the estimates of the conditional probability that the degree of the polynomial is three, obtained using self-normalized importance sampling with proposal q_is_data_driven without training (all scale parameters were set heuristically to 0.5 based on qualitative accuracy on some simulated data sets) and after training (the values of the trained scale parameters are shown in the body of q_is_data_driven). Each of the two algorithms was run 100 times for four different numbers of particles, ranging from 푛 = 10^2 to 푛 = 10^5, and the estimated probabilities from the 100 runs were aggregated into histograms. While both algorithms appear to converge to the same value as 푛 grows, the algorithm after training requires orders of magnitude fewer particles to achieve comparable levels of accuracy.

Figure 3-4: Training the parameters of a data-driven importance-sampling proposal. (a) Objective function estimates over the course of parameter training (500 iterations of stochastic gradient descent); (b) conditional probability estimates using self-normalized importance sampling with a data-driven proposal, with heuristically chosen scale parameters, for different numbers of particles 푛; (c) the same estimates after training the scale parameters on simulated data, for 푛 = 10^2 to 10^5.

3.4 Markov chain Monte Carlo with traces

Markov chain Monte Carlo (MCMC) is a flexible framework for constructing algorithms that sample approximately from target probability distributions that are defined by an unnormalized density function [104], and conditional distributions induced by generative models in particular. In the setting of generative models, MCMC algorithms initialize a state that contains the values of latent random variables, and then repeatedly apply a stochastic kernel to this state, producing a new state. This section shows how to implement MCMC algorithms using Gen's generative function and trace abstract data types. The section includes (i) a programming construct for primitive MCMC kernels based on Metropolis-Hastings with arbitrary proposal distributions encoded as generative functions, (ii) a programming construct for Hamiltonian Monte Carlo, and (iii) a language for composing MCMC kernels into more complex kernels. This section focuses primarily on constructing kernels that are stationary with respect to the target distribution; Gen does not aid users in evaluating the ergodicity of their MCMC kernels.

3.4.1 MCMC with the trace abstract data type

MCMC is implemented in Gen using traces (Section 2.3) to store the state of the latent and observed random choices. In this framework, an MCMC kernel is a procedure kern that takes as input a trace t of the model 풫 and returns as output a new trace t′ of 풫. Let 휌 denote an existentially sound choice dictionary of observed data with positive marginal likelihood (푝̄(휌) > 0), so that the conditional distribution 푝(·|휌) is well-defined. An MCMC algorithm targeting 푝(·|휌) generates a sequence of traces t_0, t_1, . . . , t_푛 such that each trace contains the observed choices 휌 (which are the same across all iterations 푖) as well as the latent choices 휐_푖 (which change across iterations):

t_푖.choices() = 휐_푖 ⊕ 휌   for all 푖 = 1, . . . , 푛

This thesis describes several constructions for MCMC kernels that are stationary with respect to a given distribution 푝(·|휌), but the following template for constructing MCMC algorithms is the same regardless of the kernel used. First, we obtain the initial trace t_0 for an MCMC chain using

t_0 ← 풫.generate(_, 휐_0 ⊕ 휌)

where 휐_0 is some choice dictionary such that 푝(휐_0|휌) > 0 (recall that '_' denotes the arguments to 풫, which are treated as constant for the purposes of this section). Here, 휐_0 encodes the initial values for the latent random choices. We may obtain 휐_0 by simulating from a generative function 풬_0:

s ← 풬_0.simulate()
휐_0 ← s.choices()

After the initial trace t_0 is obtained, an MCMC kernel kern is repeatedly applied to the trace 푛 times to generate the MCMC chain (where in general, kern is stochastic):

for 푖 ← 1 . . . 푛 do
    t_푖 ← kern(t_{푖−1})
end for

Then, the collection of traces {t_푖}_{푖=푏}^{푛} after some number of 'burn-in' iterations 푏 can be treated as a particle approximation to 푝(·|휌), assuming that kern is ergodic and 푛 is large.
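This template can be written down directly in Julia. A minimal sketch is shown below, assuming the polynomial model from earlier in this chapter, a choice dictionary observations holding 휌, and a Julia function kernel implementing a stationary kernel; here the latent choices are initialized by passing only the observations to Gen.generate, which fills in initial latent values internally (the variable names are illustrative):

(trace, _) = Gen.generate(poly_model, (x_coords,), observations)  # initial trace t_0
traces = []
for i in 1:n
    trace = kernel(trace)  # apply the stationary kernel
    push!(traces, trace)
end
approximation = traces[b+1:end]  # discard b burn-in iterations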

Users of Gen implement MCMC algorithms by writing code in a general-purpose programming language (Julia) that implements the simple template above. Other probabilistic programming systems that support MCMC place the initialization and loop over iterations within their internal inference engine implementation. Gen's approach makes the algorithm's high-level structure explicit in user code, which allows users to debug and instrument the algorithm without having to understand the inference engine or compiler architecture of a probabilistic programming system. The remainder of this section gives constructions for kernels kern that are stationary with respect to the target distribution 푝(·|휌), including a class of primitive kernels based on Metropolis-Hastings and a class of composite kernels that combine other kernels using loops, branching, and random mixtures. Section 3.7 gives a more general class of constructions for MCMC kernels on traces that can express any kernel in the reversible jump MCMC framework [53].

3.4.2 Metropolis-Hastings using generative functions as proposals

Metropolis-Hastings (MH) [22] is a general technique for constructing kernels with the correct stationary distribution that involves first constructing a proposal kernel, which is a transition kernel that does not in general have the correct stationary distribution. The MH kernel invokes the proposal kernel and either returns the proposed state (this is called accepting) or returns the previous state (this is called rejecting). The decision to accept or reject is stochastic. The probability of accepting a proposed state is constructed using a standard formula to ensure the correct stationary distribution for the MH kernel.

Implementing a Metropolis-Hastings kernel from scratch requires (i) a data structure to store the latent state, (ii) a sampler for the proposal, (iii) an evaluator for the density of the proposal, and (iv) an evaluator for the density of the unnormalized target distribution. Gen's construct for Metropolis-Hastings (mh-kernel, Algorithm 5) automates these implementation details, while still allowing the user to hand-design their proposal. The construct is expressive enough to allow for kernels that change the structure (e.g. control flow) of the model's trace. This construct is based on the following key ideas:

1. A Metropolis-Hastings proposal is represented as a generative function 풬 that takes in the current trace t of the model generative function 풫 as an argument. Like the model generative function, the proposal generative function 풬 is defined using a probabilistic modeling language like Gen’s Dynamic Modeling Language.

2. 풬 determines how the trace of the model should be changed by making random choices at the same addresses used by the model 풫. The proposal's choices 휎 are used to replace the values at some addresses in the previous model trace and may change the structure (control flow) of the model trace, in which case the choices are also used to fill in values for any newly introduced addresses.

3. Any choices in the previous model trace t that are not replaced by choices in 휎, and are not removed from the trace as a result of control flow changes, are automatically retained in the new model trace t′.

In particular, for a current model trace t, the proposal 풬 is simulated (simulate), and the resulting choices 휎 are passed as constraints to t.update, which returns the proposed trace t′ and metadata (log 푤 and 휎′) that is used to compute the acceptance probability.

Algorithm 5 Metropolis-Hastings kernel using a generative function as the proposal

procedure mh-kernel_{풬,휌}(t)
    s ← 풬.simulate(t)
    휎 ← s.choices()
    (t′, log 푤, 휎′, _) ← t.update(_, ⊤, 휎)
    assert |퐴_휌 ∩ 퐴_{휎′}| = 0
    s′ ← 풬.generate(t′, 휎′)
    훼 ← min{1, exp(log 푤 − s′.logpdf() + s.logpdf())}
    푟 ∼ Uniform(0, 1)
    if 푟 ≤ 훼 then return t′ else return t
end procedure

In order for mh-kernel_{풬,휌} to be stationary with respect to the target distribution 푝(·|휌), the proposal generative function 풬 must satisfy certain requirements with respect to the model generative function 풫 and the observations 휌. First, it must take an argument of the form t, where t is a trace of 풫. Second, it must never make a random choice at an observed address 푎 ∈ 퐴_휌:

푞(휎; t) > 0 =⇒ |퐴_휎 ∩ 퐴_휌| = 0

Third, for all (t, 휐, 휎) where 푝(휐|휌) > 0 and t.choices() = 휐 ⊕ 휌 and 푞(휎; t) > 0, there must exist some t′ and 퐵 ⊆ 퐴_휐 such that t′.choices() = 휎 ⊕ (휐|_퐵) ⊕ 휌 and 푝(휎 ⊕ (휐|_퐵) ⊕ 휌) > 0 and 푞(휎′; t′) > 0 where 휎′ := 휐|_{퐵^c}. The third requirement means that (i) every choice dictionary 휎 that can be sampled from 풬 given input t can be used to construct a new choice dictionary (휎 ⊕ (휐|_퐵) ⊕ 휌) that has nonzero density under the model, by merging 휎 with some subset 퐵 of the previous latent choices (휐|_퐵) and the observations (휌), and (ii) for every possible proposed transition from 휐 to 휐′, it is possible to reverse this transition using some proposed choices 휎′ that could be sampled from 풬 given the new trace t′ as input.

Example: Invalid Metropolis-Hastings proposals Consider the model 풫 := p_mh below, and 휌 := {c ↦→ T}. The conditional distribution 푝(·|휌) is shown below the model.

@gen function p_mh()
    z = ({:a} ∼ bernoulli(0.5))
    if z
        z = z && ({:b} ∼ bernoulli(0.5))
    end
    c ∼ bernoulli(z ? 0.9 : 0.1)
end

휐                      푝(휐|휌)
{a ↦→ F}               0.167
{a ↦→ T, b ↦→ F}       0.083
{a ↦→ T, b ↦→ T}       0.750

Consider three potential proposals 풬1 := q1_mh, 풬2 := q2_mh and 풬3 := q3_mh defined below. Each is invalid for a different reason. 풬1 violates the second requirement because it samples at address c. 풬2 violates the third requirement because for 휐 = {a ↦→ F} and 휎 = {a ↦→ T}, there is no new trace t′ of 풫 that can be constructed, because a value for address b is not included in 휎. 풬3 violates the third requirement because for 휐 = {a ↦→ T, b ↦→ F} and 휎 = {a ↦→ F}, the proposed choices that would reverse the move are 휎′ = {a ↦→ T, b ↦→ F} but 푞(휎′; t′) = 0.

@gen function q1_mh(t)
    c ∼ bernoulli(0.5)
end

@gen function q2_mh(t)
    a ∼ bernoulli(0.5)
end

@gen function q3_mh(t)
    a ∼ bernoulli(0.0)
end

Example: Valid Metropolis-Hastings proposal Consider the generative function 풬 := q4_mh below. 풬 is a valid proposal for the conditional distribution above. The proposal proposes to switch branches (from a ↦→ T to a ↦→ F or vice-versa) with probability 0.9. If the previous model trace t had a ↦→ T and it does not switch branches, then it retains the previous value for b. If the previous model trace had a ↦→ F and it does switch branches (setting a ↦→ T), then it samples a fresh value for b.

@gen function q4_mh(t)
    if t[:a]
        a ∼ bernoulli(0.1)
    else
        if ({:a} ∼ bernoulli(0.9))
            b ∼ bernoulli(0.5)
        end
    end
end

Below, for every possible previous latent choice dictionary 휐 and every possible proposed choice dictionary 휎, we show the set of retained choices 퐵 and the resulting proposed latent choice dictionary 휐′, together with the proposed choice dictionary 휎′ that would reverse the move:

휐                      휎                      퐵      휐′                     휎′
{a ↦→ F}               {a ↦→ F}               {}     {a ↦→ F}               {a ↦→ F}
{a ↦→ F}               {a ↦→ T, b ↦→ F}       {}     {a ↦→ T, b ↦→ F}       {a ↦→ F}
{a ↦→ F}               {a ↦→ T, b ↦→ T}       {}     {a ↦→ T, b ↦→ T}       {a ↦→ F}
{a ↦→ T, b ↦→ F}       {a ↦→ T}               {b}    {a ↦→ T, b ↦→ F}       {a ↦→ T}
{a ↦→ T, b ↦→ F}       {a ↦→ F}               {}     {a ↦→ F}               {a ↦→ T, b ↦→ F}
{a ↦→ T, b ↦→ T}       {a ↦→ T}               {b}    {a ↦→ T, b ↦→ T}       {a ↦→ T}
{a ↦→ T, b ↦→ T}       {a ↦→ F}               {}     {a ↦→ F}               {a ↦→ T, b ↦→ T}

Example: Random walk proposal Proposals based on applying a random walk to a single random variable are a simple and common (but generally inefficient) type of Metropolis-Hastings proposal. This is a simple class of proposals that never alters the structure of the trace. Recall the generative model 풫 := poly_model. The proposal 풬 := q_mh_random_walk below performs a random walk on one of the coefficients of the polynomial in this model. The proposal reads the previous value of the coefficient from the model trace at address (c, 푖), and then samples the new value from a normal distribution centered at the previous value, at the same address.

@gen function q_mh_random_walk(trace, i::Int)
    {(:c, i)} ∼ normal(trace[(:c, i)], 0.05)
end

Note that the generative function 풬 takes a second argument (i) in addition to the model trace (t). The second argument indicates which coefficient should be proposed to. Proposal generative functions 풬 that take additional arguments besides the previous model trace actually determine a parametrized family of kernels {mh-kernel_{풬,휌,푥}}_{푥∈푋}, where 푥 indexes the possible values of the additional arguments to 풬.
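Using the Julia function Gen.mh from Gen's inference library (described later in this section), each choice of the extra argument selects a different member of this family. For example, the following sketch applies the member that perturbs the constant-term coefficient (the variable trace is assumed to hold the current model trace):

(trace, accepted) = Gen.mh(trace, q_mh_random_walk, (0,))  # perturb (:c, 0)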

Example: Data-driven proposal Random walk proposals on individual random choices are relatively inefficient because they only operate on one choice at a time, and they can take a long time to move to regions of higher probability (especially if the standard deviation of the random walk is too small or too large). Random walk proposals are also not informed by the observed data. Instead of proposing a new value for the 푖th coefficient from a random walk around its previous value, we can estimate the value of the coefficient using a least-squares fit to the residuals resulting from setting the 푖th coefficient to zero:

min_{푐_푖} || (y − ∑_{푗≠푖} 푐_푗 [푥_1^푗, . . . , 푥_푛^푗]^⊤) − 푐_푖 [푥_1^푖, . . . , 푥_푛^푖]^⊤ ||_2

where x and y are the input x-coordinates and the observed y-coordinates, respectively. The data-driven proposal below solves the optimization problem above to find 푐̂_푖 and then proposes a new value at address (c, 푖) by sampling from a Cauchy distribution centered at 푐̂_푖. Unlike the random walk proposal, this proposal is capable of proposing large jumps in the latent space.

@gen function q_mh_data_driven(trace, x_coords, y_coords, i)
    n = length(x_coords)
    degree = trace[:degree]
    coeffs = [trace[(:c, i)] for i in 0:degree]

    # least-squares for coefficient i with other coefficients fixed
    coeffs[i+1] = 0.0
    residuals = y_coords .- [coeffs' * x .^ (0:degree) for x in x_coords]
    x_vec = [x^i for x in x_coords]
    coefficient_guess = (x_vec' * residuals) / sqrt(x_vec' * x_vec)

    # sample around the estimated coefficient
    {(:c, i)} ∼ cauchy(coefficient_guess, 0.5)
end

Example: Gibbs sampling Gibbs sampling is equivalent to a Metropolis-Hastings kernel where the proposal is the conditional distribution on the proposed-to addresses. In particular, the MH kernel constructed from the proposal below is equivalent to a Gibbs sampling move on the variance:

@gen function q_var_gibbs(trace, x_coords, y_coords)
    n = length(x_coords)
    coeffs = [trace[(:c, i)] for i in 0:trace[:degree]]
    residuals = y_coords .- [coeffs' * x .^ (0:trace[:degree]) for x in x_coords]
    var ∼ inv_gamma(1 + n/2, 1 + 0.5 * residuals' * residuals)
end

Note that this move will always be accepted. The conditional distribution was derived by hand, which is possible because the inverse-gamma distribution is the conjugate prior on the variance for a normal likelihood.

Example: Structure-changing proposal The previous two proposals did not alter the structure of the model trace. For the polynomial curve model, the structure of the trace is determined by the value at address degree. The proposal below samples a new degree. If the new degree is less than or equal to the previous degree, then it does not sample any additional random choices; it simply removes coefficients from the polynomial. If the new degree is greater than the previous degree, then it samples values for the newly introduced coefficients near zero.

@gen function q_mh_structure(trace)
    prev_degree = trace[:degree]
    new_degree = ({:degree} ∼ uniform_discrete(0, 4))
    for i in prev_degree+1:new_degree
        {(:c, i)} ∼ cauchy(0.0, 0.4)
    end
end

Note that if the proposal did not sample values for the newly introduced coefficients, then it would not be a valid proposal, because it would not be possible to construct the new trace t′ without an assignment to these addresses.

Using the Gen inference library implementation of Metropolis-Hastings The Gen inference library contains an implementation of mh-kernel in the Julia function Gen.mh. This function returns a tuple containing the trace and metadata. The code below shows how to implement an MCMC algorithm in Julia using Gen's inference library. In particular, we construct an MCMC chain that repeatedly samples the MH kernel constructed from the proposal 풬 := q_mh_structure.

traces = []
kernel(trace) = Gen.mh(trace, q_mh_structure, ())[1]
trace = init_trace
for i=1:n
    trace = kernel(trace)
    push!(traces, trace)
end

Figure 3-5 shows the results of running this code with each of the different proposal generative functions 풬 defined above, starting with the same initial trace of 풫 := poly_model. The results emphasize that none of these kernels is individually sufficient for an efficient MCMC algorithm. The values at five addresses are plotted for the first 100 iterations. The kernel constructed from q_mh_random_walk with i = 0 makes small perturbations to the constant-term coefficient. The kernel constructed from q_mh_data_driven with i = 0 makes larger perturbations to the constant-term coefficient. The kernel constructed from q_var_gibbs samples from the conditional distribution on the variance with the other parameters fixed. The kernel constructed from q_mh_structure makes changes to the degree. Only the kernel based on the structure-changing move is ergodic, but it is very inefficient because its proposals for new coefficient values are uninformed by both the data and the previous values. In contrast, the data-driven proposal gives a kernel that can give rough estimates of the coefficients for a fixed degree, but cannot estimate the degree itself, and cannot efficiently fine-tune the value of any given coefficient. The kernel based on the random walk proposal is capable of fine-tuning the value of individual coefficients, but cannot estimate the degree or the value of the other coefficients. Section 3.4.4 will show how to compose kernels like these into more powerful kernels that result in practical MCMC algorithms.

Figure 3-5: Metropolis-Hastings using generative functions as proposal distributions. Panels show the values at addresses :degree, (:c, 0), (:c, 1), (:c, 2), and var over the first 100 iterations, for the random walk (i=0), data-driven (i=0), Gibbs (var), and structure kernels.

3.4.3 Hamiltonian Monte Carlo with traces

The Metropolis-Hastings kernel construct of the previous section is highly flexible because the proposal distribution is defined in a probabilistic modeling language. This flexibility allows for specialization of the kernel to the model and, in principle, improved efficiency, but writing effective specialized proposals requires work and domain knowledge. Hamiltonian Monte Carlo (HMC [92]) is a class of MCMC kernels that can be applied to generic models without as much customization. Gen includes a class of HMC kernels (hmc-kernel in Algorithm 6) that can be applied to any model with continuous random variables for which the log density function is differentiable with respect to the values of these variables. The kernel internally uses the following generative function 풬_HMC := q_hmc:

@gen function q_hmc(addresses)
    for address in addresses
        {address} ∼ normal(0, 1)
    end
end

The kernel is parametrized by a set of addresses 퐴. Each 푎 ∈ 퐴 must be a continuous random choice, and the log joint density function 휏 ↦→ log 푝(휏) must be differentiable with respect to 휏[푎]. Like mh-kernel, hmc-kernel uses the update operation of the trace data type. It also uses the gradients operation, which computes the necessary partial derivatives of the log density with respect to the value at each address 푎. This kernel is one example of selection-based inference operators that act on traces. Selection-based operators only require the user to specify a set of addresses on which to act, and strike a balance between ease-of-use and customizability that favors ease-of-use more than the Metropolis-Hastings construct presented in Section 3.4.2. Other examples of selection-based operators supported by Gen include MAP optimization and elliptical slice sampling [88].
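For concreteness, the inference library version of this kernel at the time of this writing is invoked with an address selection. A hedged sketch for the polynomial model, assuming the current trace has degree at least two (the keyword names L, for the number of leapfrog steps, and eps, for the step size, follow the current library and may differ across versions):

selection = Gen.select((:c, 0), (:c, 1), (:c, 2))  # continuous addresses to act on
(trace, accepted) = Gen.hmc(trace, selection; L=10, eps=0.01)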

Algorithm 6 Hamiltonian Monte Carlo with address selections and traces

procedure hmc-kernel_{퐴,푘,휖,휌}(t)
    assert |퐴_휌 ∩ 퐴| = 0
    휏 ← t.choices()
    (_, 훾) ← t.gradient(퐴, 0)
    s ← 풬_HMC.simulate(퐴)    ◁ Sample momenta
    휎 ← s.choices()
    t_0 ← t
    for 푖 ← 1 . . . 푘 do
        for 푎 ∈ 퐴 do
            휎[푎] ← 휎[푎] + (휖/2)훾[푎]    ◁ Half-step on momenta
            휏[푎] ← 휏[푎] + 휖휎[푎]    ◁ Full step on positions
        end for
        (t_푖, _, _, _) ← t_{푖−1}.update(_, _, 휏)
        (_, 훾) ← t_푖.gradient(퐴, 0)
        for 푎 ∈ 퐴 do
            휎[푎] ← 휎[푎] + (휖/2)훾[푎]    ◁ Half-step on momenta
        end for
    end for
    휎′ ← {푎 ↦→ −휎[푎]}_{푎∈퐴}    ◁ Negate momenta
    s′ ← 풬_HMC.generate(퐴, 휎′)
    훼 ← min{1, exp(t_푘.logpdf() − t_0.logpdf() − s.logpdf() + s′.logpdf())}
    푟 ∼ Uniform(0, 1)
    if 푟 ≤ 훼 then return t_푘 else return t
end procedure

3.4.4 A language for composing MCMC kernels

This section defines constructs for composing more sophisticated kernels from simpler primitive kernels, like the Metropolis-Hastings kernels with custom proposals and Hamiltonian Monte Carlo kernels defined in the previous two sections. In particular, we describe sufficient conditions for three types of composite kernel to be stationary with respect to the target distribution: (i) conditional application of a kernel, (ii) applying a collection of kernels in sequence, and (iii) applying a kernel randomly chosen from a set of kernels. We prove the validity of these compositions for the setting of discrete random choices. Although each of these three types of composite kernels can be easily implemented in Julia, we also introduce a composite kernel DSL that makes it easier to write stationary kernels, by using a restricted syntax and built-in dynamic checks that detect common programming errors that invalidate the sufficient conditions for stationarity for each composition. We show example code in an implementation of the DSL that is embedded in Julia.

Constructing stationary composite kernels from stationary primitive kernels We now describe a set of compositions of MCMC kernels that are stationary with respect to some target distribution, and give sufficient conditions for the resulting composite kernel to be stationary with respect to the same target distribution.

Sequencing kernels Given two kernels kern_1 and kern_2 that are stationary for some target distribution, the kernel 푘 constructed by sequencing the two kernels is itself stationary. This is a standard result in the MCMC literature [5]. The procedure seq-kernel below shows how to construct a kernel that sequences two kernels using traces.

procedure seq-kernel_{kern_1,kern_2}(t_0)
    t_1 ← kern_1(t_0)
    t_2 ← kern_2(t_1)
    return t_2
end procedure

Dependent mixture of kernels Given an indexed collection of kernels {kern_푖}_{푖∈퐼}, and a probability distribution on the kernels (휃_푖 ≥ 0 with ∑_{푖∈퐼} 휃_푖 = 1), we can construct a mixture kernel that randomly samples an index 푖 according to 휃 and then applies kernel kern_푖. If the component kernels kern_푖 are each stationary with respect to the target distribution, then it is a standard result that the mixture kernel is also stationary with respect to the target distribution [5]. But what if the mixture probabilities change as a function of the current trace? That is, 휃_푖(t) ∈ [0, 1] and ∑_푖 휃_푖(t) = 1 for each input trace t of the model such that t.choices() agrees with the observations 휌. A composite kernel (dependent-mixture-kernel) that implements this construction is shown below:

procedure dependent-mixture-kernel_{{kern_푖},휃}(t)
    푖 ∼ 휃(t)    ◁ Sample 푖 ∈ 퐼 from discrete distribution 휃(t)
    t′ ← kern_푖(t)
    return t′
end procedure

Although this composite kernel is not in general stationary with respect to the target distribution for stationary kern_푖, the following gives a sufficient condition relating {kern_푖}_{푖∈퐼} and 휃 that does guarantee stationarity of the composite kernel.

Proposition 3.4.1 (Sufficient condition for stationary dependent mixture kernel). Let 푘_푖(휐′; 휐) denote the probability that kernel kern_푖 returns a trace with latent state 휐′ given latent state 휐. If kern_푖 for each 푖 ∈ 퐼 is stationary with respect to a target distribution 푝(·|휌), and if 휃_푖(t) = 휃_푖(t′) for each (t, 푖, t′) such that t.choices() = 휐 ⊕ 휌 and t′.choices() = 휐′ ⊕ 휌 and 푘_푖(휐′; 휐) > 0, then dependent-mixture-kernel_{{kern_푖},휃} is stationary with respect to 푝(·|휌).

Proof. The dependence of 휃_푖 on t simplifies to a function of the latent choices only (휃_푖(휐)), because the other elements of the trace t are constant. dependent-mixture-kernel_{{kern_푖},휃} has distribution:

푘(휐′; 휐) = ∑_{푖∈퐼} 휃_푖(휐) 푘_푖(휐′; 휐)

It suffices to show that

∑_{휐 : 푝(휐⊕휌)>0} ∑_{푖∈퐼} 휃_푖(휐) 푘_푖(휐′; 휐) 푝(휐|휌) = 푝(휐′|휌)   for all 휐′ s.t. 푝(휐′ ⊕ 휌) > 0

Switching the order of summation and using the requirement on 푘 and 휃 to substitute 휃_푖(휐′) 푘_푖(휐′; 휐) for 휃_푖(휐) 푘_푖(휐′; 휐) gives:

(∑_{푖∈퐼} 휃_푖(휐′)) 푝(휐′|휌) = 푝(휐′|휌)   for all 휐′ s.t. 푝(휐′|휌) > 0

which holds because the mixture probabilities sum to one.

Note that the requirement that 휃_푖(휐) = 휃_푖(휐′) when 푘_푖(휐′; 휐) > 0 can be interpreted as an invariant on each of the kernels 푘_푖: for each kernel 푘_푖 for 푖 ∈ 퐼, and for any input 휐 such that 푝(휐 ⊕ 휌) > 0 and any execution of 푘_푖 resulting in an output trace with latent choices 휐′, the mixture probability must be unchanged (휃_푖(휐) = 휃_푖(휐′)).

Conditionally applying a kernel. The procedure below applies a stationary MCMC kernel kern only if some predicate g(t) ∈ {0, 1} of the current model trace holds, and otherwise deterministically returns the previous trace.

procedure cond-kernel_{kern,g}(t)
    if g(t) = 1 then
        t′ ← kern(t)
    else
        t′ ← t
    end if
    return t′
end procedure

This is a special case of a dependent mixture, as follows. For kernel kern and predicate g, let kern1 := kern and let kern2(t) := t. Let θ1(t) := g(t) and θ2(t) := 1 − g(t). Consider the sufficient condition for stationarity. It holds trivially for kern2 because t′ = t. Therefore, the condition reduces to g(t) = g(t′) for all t, t′ such that there is some nonzero probability that t′ is produced by kern on input t. That is, the value of the predicate g must be invariant under all executions of the kernel kern.

Dependent looping over kernels. Consider a collection of kernels {kern_i}_{i∈I} that are stationary with respect to p(·|ρ), and a function r(t) ∈ ∪_{n=0}^{∞} I^n that selects a finite list of kernels kern_{j_1}, ..., kern_{j_n} (where j_ℓ ∈ I) given an input trace t of the model (r stands for 'range'). Consider the class of composite kernels that compute this kernel sequence and then apply each kernel in turn:

procedure dependent-loop-kernel_{kern,r}(t)
    (j_1, ..., j_n) ← r(t)    ◁ Compute kernel sequence
    t_0 ← t
    for i ∈ 1 ... n do
        t_i ← kern_{j_i}(t_{i−1})
    end for
    return t_n
end procedure

Under what conditions will dependent-loop-kernel_{kern,r} be stationary with respect to p(·|ρ)? It is equivalent to the composition of dependent-mixture-kernel with seq-kernel, as follows. For each value of R := (j_1, ..., j_n), denote the kernel composed by sequencing kern_{j_1}, ..., kern_{j_n} by kern_R. Then kern_R is stationary with respect to p(·|ρ), because sequencing preserves stationarity. Next, consider a set of (degenerate) probability distributions on elements R, parametrized by t, given by θ_R(t) := [r(t) = R]. Then dependent-loop-kernel_{kern,r} is equivalent to dependent-mixture-kernel_{{kern_R},θ} (where the indexed family of kernels is indexed by R). Therefore, a sufficient condition for stationarity is that r(t) = r(t′) for all t, t′ such that t′ is produced from kern_R with nonzero probability, for all R. This is equivalent to the invariant that r(t) cannot change under any execution of the kernel sequence kern_{j_1}, ..., kern_{j_n}, beginning from any input trace t of the model.

A domain-specific inference language for composing stationary MCMC kernels

Recall that users of Gen can construct primitive Metropolis-Hastings MCMC kernels using Gen.mh, which implements the Metropolis-Hastings construct of Section 3.4.2. Since user inference code in Gen is regular Julia code, users can easily compose these primitive kernels into more complex kernels by implementing the types of compositions introduced above in Julia code. For example, seq-kernel can be implemented simply by sequencing Julia statements, dependent-mixture-kernel can be implemented by making random choices and applying a different kernel depending on these choices, and cond-kernel and dependent-loop-kernel can be implemented using Julia's branching and looping constructs.

However, it can be difficult to reason about the stationarity of general kernels composed this way in Julia, because Julia's flexibility makes it easy to accidentally write kernels that are not stationary with respect to the desired target distribution. To address this issue, Gen includes a domain-specific language (DSL) for composing stationary MCMC kernels, called the Composite Kernel DSL. The DSL, which is embedded in Julia, allows for safer composition of MCMC kernels by using a restricted syntax that syntactically eliminates the possibility of some types of errors that can lead to unsoundness. As described previously, sufficient conditions for the soundness of various MCMC kernel compositions reduce to invariants. Although the DSL does not statically verify that these invariants hold, the DSL compiler generates code for dynamic checks of these invariants. The dynamic checks can be enabled and disabled as needed (e.g. enabled during testing and prototyping, and disabled during deployment for higher performance).

The DSL uses syntax similar to Julia function definition syntax, but with a @kern macro in front of the function definition expression, and with a distinct set of language constructs that can be used in the function body. The first argument to the function represents the trace of the model to which the constituent kernels will be applied. The supported language constructs are:

∙ Applying a stationary kernel. To apply a kernel, the syntax trace ∼ k(trace, args..) is used. k must be a kernel constructed using the Composite Kernel DSL, a Gen primitive kernel (e.g. Gen.mh), or a Julia function that has been declared as stationary by the user.

∙ For loops. For loops are based on regular Julia for loops. The range of the loop (which in general may be a function of the trace) must be invariant under all possible executions of the body of the for loop. A dynamic check for this invariant is automatically inserted by the DSL compiler (a sketch of such a check appears after this list).

∙ If-end expressions. If-end expressions are based on Julia if-end expressions. The branching condition may be a deterministic function of the trace, but it must also remain true under all possible executions of the body of the true branch. A dynamic check for this invariant is automatically inserted by the DSL compiler.

∙ Deterministic pure let expressions. It is possible to bind a value to a variable using let, but the expression on the right-hand side must be a deterministic function of its free variables, and its value must be invariant under all possible executions of the body. A dynamic check for this invariant is automatically inserted by the DSL compiler.

∙ Stochastic pure let expressions. It is possible to make random choices using the syntax let x ∼ dist(args..) .. end. The expression on the right-hand side of the '∼' must be the application of a Gen probability distribution to arguments, and the choice of distribution and all of its arguments must be invariant under all possible executions of the body of the let expression. A dynamic check for this invariant is automatically inserted by the DSL compiler.

Julia language features that do not have analogues include while loops, reassignment, and mutation. Arbitrary deterministic pure Julia expressions may appear in (i) the arguments

to a stationary kernel application, (ii) the range expression of for loops, (iii) the branching condition of if-end expressions, (iv) the right-hand side of deterministic let expressions, and (v) the arguments to distributions on the right-hand side of stochastic let expressions.
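As a concrete illustration of these dynamic checks, the following is a minimal sketch (in the spirit of, but not identical to, the code the DSL compiler actually generates) of the check for the for-loop construct: the loop range, recomputed after the body has run, must equal the range computed beforehand. Here kernel and range_of are hypothetical stand-ins for the loop body and the range expression:

# Sketch of the dynamic check for the for-loop construct. `kernel` is a
# stationary kernel applied at each index, and `range_of` computes the
# loop range from the trace.
function checked_loop_kernel(trace, kernel, range_of)
    r = range_of(trace)
    for i in r
        trace = kernel(trace, i)
    end
    # the invariant required for stationarity: the range is unchanged
    @assert range_of(trace) == r "for-loop range changed by loop body"
    return trace
end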

Example: Composing MCMC kernels for Bayesian polynomial regression. Recall the model 𝒫 := poly_model, and the Metropolis-Hastings proposals constructed for use with this model in Section 3.4.2. No one of these kernels was individually sufficient for accurate inference. The code below, written in Gen's Composite Kernel DSL, constructs an MCMC kernel called 'composite_kernel' from these individual Metropolis-Hastings kernels.

@kern function composite_kernel(trace, xs, ys)

    for i in 0:trace[:degree]
        trace ∼ Gen.mh(trace, q_mh_random_walk, (i,))
    end

    let do_structure ∼ bernoulli(0.1)
        if do_structure
            trace ∼ Gen.mh(trace, q_mh_structure, ())
        end
    end

    for i in 0:trace[:degree]
        trace ∼ Gen.mh(trace, q_mh_data_driven, (xs, ys, i))
    end

    trace ∼ Gen.mh(trace, q_var_gibbs, (xs, ys))
end

The composite kernel can be decomposed into a sequence of four kernels: (i) a loop kernel that loops over the coefficients and applies a random walk MH kernel to each one in turn; (ii) a mixture kernel that passes through the previous trace with probability 0.9 and otherwise applies a structure-changing MH kernel; (iii) a loop kernel that loops over the coefficients and applies a data-driven MH kernel to each one in turn; and (iv) a primitive MH kernel that is equivalent to a Gibbs-sampling kernel on the variance. Figure 3-6 shows a history of the values of random choices in the model trace over the course of iterations of this kernel. The values at four addresses are plotted for the first 400 iterations. The effect of each of the constituent MH kernels is visible in this plot: the changes to degree are due to the structure kernel (which is constructed from q_structure); the sudden changes to the coefficients are due to the data-driven kernel (which is constructed from q_data_driven); and the small adjustments to the coefficients are due to the random walk kernels (which are constructed from q_random_walk).

3.5 Resample-move particle filtering with traces

Particle filtering [35] is a class of Monte Carlo techniques for inference in probabilistic models where observed data accumulates over time. Each new piece of observed data leads to a new target distribution.

Figure 3-6: Iterates produced by a composite MCMC kernel in a polynomial curve model. (The plot shows the values at addresses :degree, (:c, 0), (:c, 1), (:c, 2), and var over the first 400 iterations.)

Particle filters maintain a weighted particle approximation to the current target distribution by evolving the collection of particles over time to approximate successive target distributions. Particle filtering has been widely applied to tracking and online parameter estimation in computer vision, robotics, finance, and other fields. A popular variant of particle filtering, called resample-move [50], uses a resampling step in which some particles are replicated and others are culled in proportion to their weight, and applies 'rejuvenation' MCMC kernels that are stationary with respect to the current target distribution to each particle at each time step.

This section shows how to implement resample-move particle filtering using the generative function and trace abstract data types. Each particle is represented by a trace of a generative function that represents the model. Traces are extended from one time step to the next using the update operation. The MCMC kernels constructed in Section 3.4.2 can be used directly as rejuvenation MCMC kernels, because they are also based on the same trace abstract data type. The section also shows how generative functions and traces can be used to implement annealed importance sampling (AIS) [91], a gold-standard technique for marginal likelihood estimation in probabilistic models. AIS evolves a single particle over a fine-grained sequence of target distributions that interpolates from a simple distribution for which sampling is trivial to the distribution whose marginal likelihood is required, by gradually changing a parameter like temperature or stochasticity. Like the resample-move particle filter, MCMC moves are applied in between each distribution transition. The marginal likelihood estimate is computed by accumulating incremental importance weights over all time steps. We implement AIS with traces using a construction similar to that used for resample-move particle filtering: update is used to implement the incremental transition from one target distribution to the next, and the weight returned by update is precisely the incremental importance weight used in AIS to compute the marginal likelihood estimate.

3.5.1 Trace-based particle filtering with rejuvenation kernels

This section introduces an inference programming construct for implementing resample-move particle filters [50] using the generative function and trace abstract data types. This construct can be used with generative models represented as a generative function 𝒫 that takes arguments (j, θ), where j is an integer argument indicating the time step (ranging from 0 to m), and θ represents any other arguments. The other arguments θ are constant

for the purposes of this algorithm, and therefore we omit them from the density notation in this section, using p(·; j) in place of p(·; (j, θ)). Each time step is associated with its own set of latent addresses S_j ⊂ A and its own set of observed addresses R_j ⊂ A such that

S_{j1} and S_{j2} are disjoint and R_{j1} and R_{j2} are disjoint for all j1 ≠ j2, and R_{j1} and S_{j2} are disjoint for all j1, j2. We denote latent choice dictionaries and observed choice dictionaries for time step j by σ_j and ρ_j respectively (where A_{σ_j} ⊆ S_j and A_{ρ_j} ⊆ R_j). The sequence of observed data is an input to the algorithm, and takes the form ρ_0, ..., ρ_m.

The model generative function must satisfy additional properties that relate the distributions on choice dictionaries p(·; j) for different j. First, every choice dictionary with nonzero density must decompose into latent and observed choice dictionaries. That is:

τ = (σ_0 ⊕ ··· ⊕ σ_j) ⊕ (ρ_0 ⊕ ··· ⊕ ρ_j)    for all τ s.t. p(τ; j) > 0, for all j = 0, ..., m

where A_{σ_k} ⊆ S_k and A_{ρ_k} ⊆ R_k for all k = 0, ..., j. Second, the distributions for each j − 1 and j must agree on the marginal density of choice dictionaries.¹ That is, for all j = 1, ..., m and all τ = σ_0 ⊕ ··· ⊕ σ_{j−1} ⊕ ρ_0 ⊕ ··· ⊕ ρ_{j−1} such that p(τ; j − 1) > 0:

p̄(σ_0 ⊕ ··· ⊕ σ_{j−1} ⊕ ρ_0 ⊕ ··· ⊕ ρ_{j−1}; j) = p(σ_0 ⊕ ··· ⊕ σ_{j−1} ⊕ ρ_0 ⊕ ··· ⊕ ρ_{j−1}; j − 1)

In addition to 𝒫 and the observed data, the construction uses a sequence of generative functions 𝒬_0, ..., 𝒬_m that encode proposal distributions for each time step. 𝒬_0 is equivalent to a self-normalized importance sampling proposal (Section 3.2). For j > 0, 𝒬_j takes as input a trace t_{j−1} of 𝒫 on arguments (j − 1, θ). The model and the proposals must satisfy the following requirement:

p(σ_0 ⊕ ··· ⊕ σ_j ⊕ ρ_0 ⊕ ··· ⊕ ρ_j; j) > 0 ⟹ q_k(σ_k; σ_0 ⊕ ··· ⊕ σ_{k−1}) > 0 for all k = 1, ..., j

where σ_0, ..., σ_{k−1} denote the latent choices in the trace t_{k−1} that is input to 𝒬_k for each k (the proposal distributions may depend on the observations in the trace as well, but this is omitted from the notation since the observations are treated as constant).

Given such 𝒫, ρ_0, ..., ρ_m, and 𝒬_0, ..., 𝒬_m, the particle filtering procedure (Algorithm 7) sequentially produces a weighted collection of particles at each time step j = 0, ..., m, where each particle is represented by a trace t_j^{(i)} of 𝒫 for i = 1, ..., n. At each time step, each trace is extended by first sampling new latent choices σ_j^{(i)} ∼ q_j(·; t_{j−1}^{(i)}) by invoking simulate on 𝒬_j, and then invoking update on the previous trace t_{j−1}^{(i)}. The call to update passes new arguments (j, θ), where (j − 1, θ) were the arguments for trace t_{j−1}^{(i)}, as well as a change hint for these arguments, δ_X = (⊥, ⊤), which indicates that the first argument j may have changed and that the second argument θ has not changed. Knowledge that θ has not changed allows an implementation of update to more efficiently compute its weight, which, due to the second requirement on 𝒫 above, simplifies and can typically

¹Technically this requirement can be removed, but it is necessary to recover the asymptotic scaling properties typically associated with particle filtering (linearity in the total number of time steps m).

be computed in a number of operations that is constant, instead of linear, in j:

p(σ_0 ⊕ ··· ⊕ σ_j ⊕ ρ_0 ⊕ ··· ⊕ ρ_j; j) = p(σ_j, ρ_j | σ_0 ⊕ ··· ⊕ σ_{j−1} ⊕ ρ_0 ⊕ ··· ⊕ ρ_{j−1}; j) · p(σ_0 ⊕ ··· ⊕ σ_{j−1} ⊕ ρ_0 ⊕ ··· ⊕ ρ_{j−1}; j − 1)

The call to update also passes the new observations ρ_j and the newly sampled latent choices σ_j^{(i)}, which, together with the accumulated observations ρ_0 ⊕ ··· ⊕ ρ_{j−1} and the previous latent choices υ_{j−1}^{(i)}, uniquely determine the new trace t_j^{(i)} via the semantics of update. Before extending the trace, the particle filter runs a rejuvenation MCMC kernel kern_{j−1} that must be stationary with respect to the target distribution p(·|ρ_0 ⊕ ··· ⊕ ρ_{j−1}; (j − 1, θ)). This kernel is constructed using the techniques of Section 3.4. Note that the kernel can modify the value of any latent choice, including latent choices that were originally introduced at any time step.

Algorithm 7 Resample-move particle filtering with traces

procedure resample(w^{(1)}, ..., w^{(n)})
    for i ← 1, ..., n do
        b^{(i)} ∼ Categorical(w^{(1)}, ..., w^{(n)})
    end for
    return [b^{(1)}, ..., b^{(n)}]
end procedure

procedure resample-move-particle-filter(𝒫, θ, n, {(ρ_j, 𝒬_j)}_{j=0}^{m}, {kern_j}_{j=0}^{m−1})
    for i ← 1 ... n do
        s_0^{(i)} ← 𝒬_0.simulate(_)
        t_0^{(i)} ← 𝒫.generate((0, θ), s_0^{(i)}.choices() ⊕ ρ_0)
        log w̃^{(i)} ← t_0^{(i)}.logpdf() − s_0^{(i)}.logpdf()
    end for
    ((w^{(1)}, ..., w^{(n)}), ℓ̄) ← normalize(log w̃^{(1)}, ..., log w̃^{(n)})
    ẑ_0 ← ℓ̄
    for j ← 1 ... m do
        [b^{(1)}, ..., b^{(n)}] ← resample(w^{(1)}, ..., w^{(n)})
        for i ← 1 ... n do
            t_{j−1}^{(i)} ← kern_{j−1}(t_{j−1}^{(b^{(i)})})
            s_j^{(i)} ← 𝒬_j.simulate(t_{j−1}^{(i)})
            (t_j^{(i)}, log w̃^{(i)}, υ, _) ← t_{j−1}^{(i)}.update((j, θ), (⊥, ⊤), s_j^{(i)}.choices() ⊕ ρ_j)
            assert |A_υ| = 0
        end for
        ((w^{(1)}, ..., w^{(n)}), ℓ̄) ← normalize(log w̃^{(1)}, ..., log w̃^{(n)})
        ẑ_j ← ẑ_{j−1} + ℓ̄
    end for
    return ({(t_m^{(1)}, w^{(1)}), ..., (t_m^{(n)}, w^{(n)})}, ẑ_m)
end procedure
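Algorithm 7 relies on a helper normalize that converts log unnormalized weights into normalized weights and the average log weight ℓ̄ (the increment to the log marginal likelihood estimate). Below is a minimal numerically stable Julia sketch, under the assumption that this is the intended semantics of normalize:

# Sketch of the `normalize` helper assumed by Algorithm 7: returns
# normalized weights and the log of the average unnormalized weight.
function normalize(log_w::Vector{Float64})
    m = maximum(log_w)
    lse = m + log(sum(exp.(log_w .- m)))   # stable logsumexp
    w = exp.(log_w .- lse)                 # normalized weights, sum to 1
    avg_log_w = lse - log(length(log_w))   # ℓ̄ = log((1/n) Σᵢ w̃⁽ⁱ⁾)
    return (w, avg_log_w)
end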

Example: A resample-move particle filter for a state space model. Figure 3-7 shows a generative function that describes a state space model 𝒫 = bearing_model adapted from Gilks and Berzuini [50]. 𝒫 models the motion of an object in a two-dimensional coordinate system over a sequence of time steps, with noisy measurements (z) of the bearing of the object relative to the origin. In this model, the latent and observed addresses at each time step are S_0 = {x0, y0, vx0, vy0}, R_0 = {z0}, and S_j = {(vx, j), (vy, j)} and R_j = {(z, j)} for each j > 0. Given a sequence of noisy bearing measurements ρ_j = {(z, j) ↦ z_j} for each time step j, we use Algorithm 7 to infer the trajectory of the object over time. A Julia implementation using the version of Gen at the time of this writing is shown in Figure 3-9a. The code uses functions from Gen's built-in inference library for some of the key operations in Algorithm 7. In particular, particle_filter_step! performs the calls to 𝒬_j.simulate(t_{j−1}^{(i)}) and t_{j−1}^{(i)}.update for each particle. The proposals 𝒬_0 = q_init_bearing and 𝒬_j := q_bearing are defined in Figure 3-8. In addition to the proposals, the implementation also makes use of a family of composite rejuvenation MCMC kernels (bearing_kernel, not shown), which apply a sequence of Metropolis-Hastings moves to the latent variables for the five most recent time steps. The value (Gen.UnknownChange(),) represents the change hint (⊥,), which indicates that no information is given about the change to the index; there are no other arguments to this model. If the model did take additional arguments, then the tuple would be extended with Gen.NoChange(), which represents the change hint ⊤.

@gen function bearing_model(num_steps::Int)

    # prior on initial conditions
    x = ({:x0} ∼ normal(0.01, 0.01))
    y = ({:y0} ∼ normal(0.95, 0.01))
    vx = ({:vx0} ∼ normal(0.002, 0.01))
    vy = ({:vy0} ∼ normal(-0.013, 0.01))

    # initial bearing measurement
    z0 ∼ normal(atan(y, x), 0.005)

    # generate successive states and measurements
    for t in 1:num_steps

        # update the state of the point
        vx = ({(:vx, t)} ∼ normal(vx, sqrt(1.0/1e6)))
        vy = ({(:vy, t)} ∼ normal(vy, sqrt(1.0/1e6)))
        x += vx; y += vy

        # bearing measurement
        {(:z, t)} ∼ normal(atan(y, x), 0.005)
    end
end

Figure 3-7: A state space model used to illustrate particle filtering with traces

@gen function q_init_bearing()
    x0 ∼ normal(0.01, 0.01)
    y0 ∼ normal(0.95, 0.01)
    vx0 ∼ normal(0.002, 0.01)
    vy0 ∼ normal(-0.013, 0.01)
end

@gen function q_bearing(trace, t::Int)
    {(:vx, t)} ∼ normal(trace[(:vx, t-1)], sqrt(1.0/1e6))
    {(:vy, t)} ∼ normal(trace[(:vy, t-1)], sqrt(1.0/1e6))
end

Figure 3-8: Generative functions for proposal distributions in particle filtering

init_obs = Gen.choicemap((:z0, zs[1]))
state = Gen.initialize_particle_filter(
    model, (0,), init_obs, q_init_bearing, (), n)
for t=1:length(zs)-1

    # apply rejuvenation MCMC moves to each particle in parallel
    Threads.@threads for i in 1:n
        state.traces[i] = bearing_kernel(state.traces[i], max(1, t-5), t-1)
    end

    # resample if the effective sample size is low
    Gen.maybe_resample!(state, ess_threshold=n/2)

    # extend particles to the next time step
    obs = Gen.choicemap(((:z, t), zs[t+1]))
    arg_change = (UnknownChange(),)
    Gen.particle_filter_step!(state, (t,), arg_change, obs, q_bearing, (t,))
end

(a) An implementation of a resample-move particle filter in Julia using Gen.

(b) Left: Ground truth trajectory (blue) and observed bearing measurements relative to the origin (red). Right: Trajectories inferred using a resample-move particle filter implemented in Gen.

Figure 3-9: Resample-move particle filtering using generative functions and traces

3.5.2 Annealed importance sampling with traces

A small modification to the particle filtering procedure above recovers a procedure for annealed importance sampling (AIS) [91] using generative functions and traces (Algorithm 8). This AIS procedure applies to models 𝒫 that take an argument x for which gradually adjusting x from an initial value x_0 to a final value x_m causes the conditional distribution p(·|ρ; x_j) to evolve gradually from a distribution that is easy to sample from (e.g. unimodal) to the more complex target distribution p(·|ρ; x_m), for some observed data ρ. The procedure computes an estimate ẑ_m of the log marginal likelihood log p̄(ρ; x_m) under the final setting of x = x_m. To apply Algorithm 8, 𝒫 must be such that incrementing x_j does not change the set of random choices that are sampled. More precisely, we require that:

p(τ; x_{j−1}) > 0 ⟺ p(τ; x_j) > 0    for all j = 1, ..., m

The procedure repeatedly interleaves an application of an MCMC kernel kern_{j−1}, which must be stationary with respect to p(·|ρ; x_{j−1}), with a call to update, which performs each change of the argument from x_{j−1} to x_j. The procedure passes a change hint δ_X to update that provides information about the change from x_{j−1} to x_j. If the argument x is complex, but only a single scalar parameter changes from x_{j−1} to x_j, the information in the change hint may allow update to use asymptotically fewer operations, depending on how the trace abstract data type for 𝒫 is implemented.

Algorithm 8 Annealed importance sampling with traces

procedure annealed-is(𝒫, ρ, 𝒬, δ_X, {x_j}_{j=0}^{m}, {kern_j}_{j=0}^{m−1})
    s ← 𝒬.simulate(_)
    σ ← s.choices()
    t_0 ← 𝒫.generate(x_0, σ ⊕ ρ)
    log w̃ ← t_0.logpdf() − s.logpdf()
    ẑ_0 ← log w̃
    for j ← 1 ... m do
        t_j ← kern_{j−1}(t_{j−1})
        (t_j, log w̃, υ, _) ← t_j.update(x_j, δ_X, {})
        assert |A_υ| = 0
        ẑ_j ← ẑ_{j−1} + log w̃
    end for
    return (t_m, ẑ_m)
end procedure
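For concreteness, the following is a minimal Julia sketch of Algorithm 8 on top of Gen's trace operations, using Gen's internal proposal (via Gen.generate) in place of an explicit 𝒬, and a generic Gen.mh move over a selection of latent addresses as the rejuvenation kernel. The names model, observations, and latents are assumptions of this sketch, not part of the text above:

using Gen

# A minimal AIS sketch with traces. `model` is assumed to take a single
# annealing-parameter argument, `observations` is a Gen.choicemap of ρ,
# and `latents` selects the addresses the MH rejuvenation kernel moves.
function annealed_is(model, observations, xs::Vector, latents, n_mcmc::Int)
    trace, lml = Gen.generate(model, (xs[1],), observations)
    for j in 2:length(xs)
        for _ in 1:n_mcmc  # rejuvenate at the previous setting x_{j-1}
            trace, _ = Gen.mh(trace, latents)
        end
        # move to x_j; the returned weight is the incremental AIS weight
        trace, incr, _, _ = Gen.update(
            trace, (xs[j],), (Gen.UnknownChange(),), Gen.choicemap())
        lml += incr
    end
    return trace, lml
end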

3.6 Bridging between models with trace translators

This section defines a new probabilistic programming inference construct called trace translators that allows a trace of one generative function to be used as a proposal for another generative function, even when there is no one-to-one correspondence between the sets of traces of the two generative functions, and no correspondence between the addresses in the two generative functions. Trace translators significantly expand the expressiveness of Monte Carlo inference with probabilistic programs. Trace translators are inspired by the general sequential Monte Carlo sampler of Del Moral et al. [33], but are adapted to the probabilistic programming setting, can employ deterministic differentiable transformations, and can be used in settings other than sequential Monte Carlo (a special case is used for MCMC in the next section). This section also shows how a combination of probabilistic programming and differentiable programming techniques can be used to automate the low-level implementation of trace translators.

3.6.1 Trace translators

Consider two generative functions 𝒫_1 and 𝒫_2 representing two generative models of a domain. Without loss of generality, we assume the generative functions take no arguments, to simplify notation. Suppose that 𝒫_1 and 𝒫_2 both use an address universe containing only discrete (counting measure) addresses. Consider observation dictionaries ρ_1 and ρ_2 for these two generative models (note that if 𝒫_1 and 𝒫_2 use the same representation for observed data, then ρ_1 and ρ_2 may be the same, but this is not necessary). Let 𝒵_1 := {υ_1 : p_1(υ_1 ⊕ ρ_1) > 0} and 𝒵_2 := {υ_2 : p_2(υ_2 ⊕ ρ_2) > 0}. Suppose that there is a bijection h between 𝒵_1 and 𝒵_2. Consider the process where we obtain a trace t_1 of 𝒫_1 with t_1.choices() = υ_1 ⊕ ρ_1 via exact conditional sampling from 𝒫_1 conditioned on ρ_1 (i.e. υ_1 ∼ p_1(·|ρ_1)), then compute υ_2 = h(υ_1), and return t_2 = 𝒫_2.generate(_, υ_2 ⊕ ρ_2). The probability that this process returns some t_2 where t_2.choices() = υ_2 ⊕ ρ_2 is simply p_1(h^{−1}(υ_2)|ρ_1). Therefore, an importance weight, if we are targeting the conditional distribution p_2(·|ρ_2), is:

p_2(υ_2 ⊕ ρ_2) / p_1(h^{−1}(υ_2) ⊕ ρ_1) = p_2(h(υ_1) ⊕ ρ_2) / p_1(υ_1 ⊕ ρ_1)    (3.10)

If ρ_1 := {}, then by repeatedly sampling υ_1 ∼ p_1(·|ρ_1) = p_1 and weighting samples according to the above weight, we can construct a self-normalized importance sampling algorithm where 𝒫_1 plays the role of the proposal. This algorithm is more flexible than Algorithm 3, because in this procedure υ_1 and υ_2 need not share the same addresses, whereas Algorithm 3 assumes that the model and proposal sample at the same latent addresses (and therefore share the same latent representation). Trace translators generalize this idea along two additional dimensions: permitting continuous addresses to be used, and permitting extension of one or both state spaces with auxiliary random variables (which is necessary when there is no one-to-one correspondence between the two sets 𝒵_1 and 𝒵_2).

Consider an address universe with discrete addresses D (μ_a is the counting measure for a ∈ D) and continuous addresses C (μ_a is the Lebesgue measure on R^{n_a} for a ∈ C), with A = D ∪ C. To handle continuous addresses in 𝒫_1 and 𝒫_2, we require

that there are countable partitions of 𝒵_1 and 𝒵_2 into {𝒵_{1,e} : e ∈ E_1} and {𝒵_{2,e} : e ∈ E_2} respectively, such that if υ, υ′ ∈ 𝒵_{1,e} for some e then υ[a] = υ′[a] for all discrete addresses a, and similarly for each 𝒵_{2,e}. Then, each set 𝒵_{1,e} and 𝒵_{2,e} is isomorphic to a subset of a Euclidean space of assignments to continuous addresses. Let e_1(υ) ∈ E_1 denote the element of the partition that a dictionary υ ∈ 𝒵_1 belongs to, and similarly for e_2. One example of a valid partition is given by the equivalence classes of the following equivalence relation:

υ ∼ υ′ ⟺ (A_υ = A_{υ′}) ∧ (υ[a] = υ′[a] for all a ∈ A_υ ∩ D)    (3.11)

Suppose there exists a bijection g : E_1 → E_2, and a family of continuously differentiable bijections h_e : 𝒵_{1,e} → 𝒵_{2,g(e)}. Then h : 𝒵_1 → 𝒵_2 given by h(υ_1) = h_{e_1(υ_1)}(υ_1) is a bijection, with inverse h^{−1} : 𝒵_2 → 𝒵_1 given by h^{−1}(υ_2) = h^{−1}_{g^{−1}(e_2(υ_2))}(υ_2). Let |J_h|(υ_1) denote the absolute value of the determinant of the Jacobian of h_{e_1(υ_1)}, evaluated at υ_1. Then, an importance weight and sampling process for target distribution p_2(·|ρ_2) is:

( p_2(h(υ_1) ⊕ ρ_2) / p_1(υ_1 ⊕ ρ_1) ) · |J_h|(υ_1)    for υ_1 ∼ p_1(·|ρ_1)    (3.12)

If there is no bijection between 𝒵_1 and 𝒵_2, then we extend the two state spaces in such a way that there is a bijection. This is done using a pair of auxiliary generative functions 𝒬_1 and 𝒬_2, which we use to extend the generative functions 𝒫_1 and 𝒫_2 respectively. Let 𝒬_1 and 𝒬_2 be generative functions taking arguments τ_1 ∈ supp(p_1) and τ_2 ∈ supp(p_2) respectively. Without loss of generality, assume that 𝒫_1 and 𝒬_1 use disjoint sets of addresses, and similarly for 𝒫_2 and 𝒬_2.² Let 𝒵_1 := {υ ⊕ σ : p_1(υ ⊕ ρ_1) q_1(σ; υ ⊕ ρ_1) > 0} and 𝒵_2 := {υ ⊕ σ : p_2(υ ⊕ ρ_2) q_2(σ; υ ⊕ ρ_2) > 0}. Suppose that 𝒵_1 and 𝒵_2 can be partitioned as above, with a bijection g and a family of continuously differentiable bijections h_e. Then we can construct a proposal process and an importance weight for an unnormalized target density π̃(υ_2 ⊕ σ_2) := p_2(υ_2 ⊕ ρ_2) q_2(σ_2; υ_2 ⊕ ρ_2) on the extended state space 𝒵_2 of dictionaries of the form υ_2 ⊕ σ_2. The importance weight is equivalent to Equation (3.12), but with the unnormalized target density π̃ in place of the unnormalized target density p_2(υ ⊕ ρ_2):

( p_2(υ_2 ⊕ ρ_2) q_2(σ_2; υ_2 ⊕ ρ_2) / (p_1(υ_1 ⊕ ρ_1) q_1(σ_1; υ_1 ⊕ ρ_1)) ) · |J_h|(υ_1 ⊕ σ_1)
    for υ_1 ∼ p_1(·|ρ_1), σ_1|υ_1 ∼ q_1(·; υ_1 ⊕ ρ_1), and (υ_2 ⊕ σ_2) = h(υ_1 ⊕ σ_1)    (3.13)

Definition 3.6.1 (Trace translator and trace transform). A trace translator is a tuple (풫1, 휌1, 풫2, 휌2, 풬1, 풬2, ℎ) of elements that satisfy the conditions above. A trace transform is a bijection ℎ between spaces of choice dictionaries.

The importance weight in Equation 3.13 is valid in the setting where υ_1 ∼ p_1(·|ρ_1), but the same weight computation is useful in other settings where υ_1 is not necessarily sampled from p_1(·|ρ_1). In particular, Section 3.6.4 gives an algorithm that uses this weight function as an incremental importance weight within a sequential Monte

²If the sets of addresses sampled by the two generative functions are not disjoint, they can be made so by adding a different prefix to the addresses of each set.

Carlo scheme, and Section 3.7 gives an MCMC algorithm that uses a special case of this weight function as the basis of an acceptance probability. To support these and other uses, we associate a procedure (translate in Algorithm 9) with every trace translator, which takes a trace of 𝒫_1 as input and returns a trace of 𝒫_2 as output (along with a weight). The procedure (i) takes as input a trace t_1 of 𝒫_1 with t_1.choices() = υ_1 ⊕ ρ_1 for some υ_1, (ii) samples σ_1|υ_1 ∼ q_1(·; υ_1 ⊕ ρ_1), (iii) computes (υ_2 ⊕ σ_2) = h(υ_1 ⊕ σ_1), and (iv) returns a trace t_2 of 𝒫_2 with t_2.choices() = υ_2 ⊕ ρ_2, along with the importance weight of Equation 3.13.

Algorithm 9 Trace translator

procedure transform(t, ρ, 𝒫′, ρ′, h, σ)
    τ ← t.choices()
    B ← (A_ρ)^c    ◁ complement of the observed addresses
    υ ← τ|_B
    (υ′ ⊕ σ′) ← h(σ ⊕ υ)
    t′ ← 𝒫′.generate(_, υ′ ⊕ ρ′)
    I_σ ← (A_σ ∩ C) ∖ {a ∈ A_σ : (∃ b ∈ A_{σ′} with σ′[b] = σ[a]) ∨ (∃ b ∈ A_{υ′} with υ′[b] = σ[a])}
    I_υ ← (A_υ ∩ C) ∖ {a ∈ A_υ : (∃ b ∈ A_{σ′} with σ′[b] = υ[a]) ∨ (∃ b ∈ A_{υ′} with υ′[b] = υ[a])}
    O_{σ′} ← (A_{σ′} ∩ C) ∖ {b ∈ A_{σ′} : (∃ a ∈ A_σ with σ[a] = σ′[b]) ∨ (∃ a ∈ A_υ with υ[a] = σ′[b])}
    O_{υ′} ← (A_{υ′} ∩ C) ∖ {b ∈ A_{υ′} : (∃ a ∈ A_σ with σ[a] = υ′[b]) ∨ (∃ a ∈ A_υ with υ[a] = υ′[b])}
    J_11 ← [∂σ′[b]/∂σ[a]] for b ∈ O_{σ′}, a ∈ I_σ;    J_12 ← [∂υ′[b]/∂σ[a]] for b ∈ O_{υ′}, a ∈ I_σ
    J_21 ← [∂σ′[b]/∂υ[a]] for b ∈ O_{σ′}, a ∈ I_υ;    J_22 ← [∂υ′[b]/∂υ[a]] for b ∈ O_{υ′}, a ∈ I_υ
    α ← t′.logpdf() − t.logpdf() + log | det [ J_11  J_12 ; J_21  J_22 ] |
    return (t′, σ′, α)
end procedure

procedure translate(t_1, ρ_1, 𝒫_2, ρ_2, 𝒬_1, 𝒬_2, h)
    s_1 ← 𝒬_1.simulate(t_1)
    σ_1 ← s_1.choices()
    (t_2, σ_2, α) ← transform(t_1, ρ_1, 𝒫_2, ρ_2, h, σ_1)
    s_2 ← 𝒬_2.generate(t_2, σ_2)
    α ← α + s_2.logpdf() − s_1.logpdf()
    return (t_2, α)
end procedure

The procedure translate uses the trace abstract data type operations to sample σ_1 and compute the various probability densities used in Equation 3.13. The bijection h and its Jacobian determinant are evaluated by the deterministic subroutine transform. We now describe a differentiable programming language for writing the bijections h, and an interpreter for that language that uses a general-purpose implementation of transform.

3.6.2 Sparsity-aware Jacobian computation

Recall the Jacobian determinant |J_h| in Equation (3.13). While it is possible to first compute the entire m-by-m Jacobian matrix, where m is the total number of continuous addresses in υ_1 ⊕ σ_1 (υ_2 ⊕ σ_2 also has m continuous addresses), and then compute the absolute value of its determinant, the Jacobian matrix may have sparse structure that we can exploit to avoid unnecessary computation. In many applications of trace translators, the values at some continuous addresses in the input traces are directly copied to addresses in the output traces. Copied addresses result in columns of the Jacobian matrix that have a single 1 entry and all remaining entries equal to 0. For example, for the function (u, v, x, y) ↦ (u, 2u − v, y, x) = (u′, v′, x′, y′), where the first, third, and fourth elements are copied, the Jacobian is (with rows corresponding to u, v, x, y and columns corresponding to u′, v′, x′, y′):

⎡ 1   2  0  0 ⎤  u
⎢ 0  −1  0  0 ⎥  v
⎢ 0   0  0  1 ⎥  x
⎣ 0   0  1  0 ⎦  y

Using the cofactor expansion of the determinant, we observe that for any 'copy' column in an m-by-m Jacobian matrix (a column with a single 1 and all other entries 0), the absolute value of the determinant is equal to that of the (m − 1)-by-(m − 1) sub-matrix with the corresponding column and row omitted (even if that removes other nonzero entries from the matrix). By applying this rule recursively, we can instead compute the determinant of a much smaller matrix; for the example above with m = 4, the absolute value of the determinant simplifies to the absolute value of a single entry (|−1|). Indeed, if some input address is copied to some output address, then we can entirely avoid computing its row (and the corresponding column) of the Jacobian. Therefore, assuming an LU decomposition is used for the determinant, the number of operations (which is dominated by the determinant) reduces from m³ to (m − c)³, where c is the number of input addresses that were copied to some output address. The procedure transform implements this logic by checking for copied columns and only computing the determinant of the sub-matrix of the Jacobian matrix that excludes these columns (and their corresponding rows). For an efficient implementation, we would like the copied addresses to be immediately available without requiring a quadratic scan over all pairs of addresses. The differentiable programming language of Section 3.6.3 uses a language construct to make the set of copied addresses immediately available to the interpreter, and uses automatic differentiation to fully automate the computation of the sub-matrix of the Jacobian that excludes the copied addresses.
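The cofactor rule above translates directly into code. Below is a minimal Julia sketch (not Gen's implementation) that, given the full Jacobian and the indices of copy columns, drops each copy column together with the row holding its 1-entry before taking the determinant; it assumes distinct copy columns have distinct source rows:

using LinearAlgebra

# Sketch of the sparsity-aware determinant: `J` is the m-by-m Jacobian
# (rows = inputs, columns = outputs) and `copy_cols` lists columns that
# contain a single 1 entry (outputs copied from some input).
function abs_det_skipping_copies(J::Matrix{Float64}, copy_cols::Vector{Int})
    copy_rows = [findfirst(!iszero, J[:, c]) for c in copy_cols]
    keep_rows = setdiff(1:size(J, 1), copy_rows)
    keep_cols = setdiff(1:size(J, 2), copy_cols)
    return abs(det(J[keep_rows, keep_cols]))  # LU-based, (m-c)^3 work
end

# The example above: columns u', x', y' are copies; the result is |-1| = 1.
J = [1.0 2.0 0.0 0.0; 0.0 -1.0 0.0 0.0; 0.0 0.0 0.0 1.0; 0.0 0.0 1.0 0.0]
@assert abs_det_skipping_copies(J, [1, 3, 4]) == 1.0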

3.6.3 A differentiable programming language for trace transforms

Algorithm 9 shows how the densities used in the weight of Equation 3.13 can be computed via the operations of the generative function and trace abstract data types, but it does not describe how to implement the transformation h or how to compute the derivatives required for |J_h|. This section describes a differentiable programming language in which users specify the transformation h, and an interpreter for this language that evaluates the transformation ((υ′ ⊕ σ′) = h(υ ⊕ σ)) and also automatically computes the absolute value of the Jacobian determinant (|J_h|(υ ⊕ σ)) using automatic differentiation.

The language extends the Julia language [12] with special syntax for reading from an address in the input υ ⊕ σ ∈ 𝒵_1, and writing to an address in the output υ′ ⊕ σ′ ∈ 𝒵_2. The

logic of the function h is implemented using regular Julia code. The language also includes a construct for copying a value directly from some address in the input to some address in the output. Users read from addresses in the input with the @read keyword:

value = @read(<source>, <type>)

The first argument is the address a, and the second argument is a type tag that is either :disc or :cont, and informs the interpreter whether the random choice at that address is drawn from a discrete or continuous distribution (this information is used to support automatic differentiation). Recall that υ is the choice dictionary of the model generative function 𝒫_1, and σ is the choice dictionary of the auxiliary generative function 𝒬_1. Each address a therefore needs to specify which of these two choice dictionaries to read from, and the address within that dictionary. The dictionaries are given names in the function signature, which has the following syntax:

@transform h (model_in, aux_in) to (model_out, aux_out) begin .. end

Here, υ, σ, υ′, and σ′ have been assigned the user-defined names model_in, aux_in, model_out, and aux_out, respectively. The inputs also support reading the values of derived quantities, including the return values of 𝒫_1 and 𝒬_1 (if any), and additional context information like the observations ρ and the arguments to 𝒫_1 and 𝒬_1 (if any). Because derived quantities and context information are not part of the input choice dictionaries, but are part of the traces t and s, we refer to the inputs and outputs as traces in the remainder of this section. The syntax for address a within the choice dictionary of trace trace is trace[a]. For example, to read the value of a continuous address a from the input model trace t (that is, either a value υ[a] or ρ[a]):

val = @read(model_in[a], :cont)

and similarly for reading from the input auxiliary trace s (that is, a value σ[a]):

val = @read(aux_in[a], :cont)

The syntax for writing to an address in υ′ ⊕ σ′ ∈ 𝒵_2 is similar. For example, the expressions for writing a discrete value val to υ′[a] and σ′[a] respectively are:

@write(model_out[a], val, :disc)
@write(aux_out[a], val, :disc)

The inputs υ and σ can only be read from, and the outputs υ′ and σ′ can only be written to. For example, it is not possible to write a value to υ′ with @write and then read the value back with @read later. Often, a user intends to simply copy the value from some address in the input traces to some address in the output traces. While this is possible via a @read followed by a @write, the language provides a special syntax:

@copy(<source>, <destination>)

For example, to copy the value from address a in υ to address b in υ′, we use:

@copy(model_in[a], model_out[b])

Of course, it is also possible to copy from υ to σ′, from σ to υ′, and from σ to σ′. It is preferable to use @copy when possible, instead of reading and then writing, as this informs the interpreter of the special sparse structure in the Jacobian that the copy introduces, as discussed in Section 3.6.2.

The interpreter for the differentiable programming language uses automatic differentiation (AD) to compute the necessary submatrix of the Jacobian, as defined in transform in Algorithm 9. Forward-mode, reverse-mode, or other approaches to AD can be used, but the current Gen implementation of this language uses forward-mode AD. The interpreter runs the body of the transform function several times. The first run computes the output υ′ ⊕ σ′ and records any addresses that were copied. There is then one run for each input address that was read but not copied, and this run computes the row of the submatrix of the Jacobian corresponding to that input address. Note that only the continuous addresses are involved in the Jacobian calculation, but the language offers a unified syntax for specifying the action of the bijection h on both the discrete and continuous addresses.

Example: A deterministic change-of-variables trace translator. Consider the generative functions 𝒫_1 := cartesian and 𝒫_2 := polar given by the following code:

@gen function cartesian()
    x ∼ normal(0, 1)
    y ∼ normal(0, 1)
    obsx ∼ normal(x, 0.1)
    obsy ∼ normal(y, 0.1)
end

@gen function polar()
    r ∼ gamma(1, 1)
    theta ∼ uniform(-pi, pi)
    obsr ∼ normal(r, 1.)
end

Assume that obsx and obsy are observed for the first model (A_{ρ_1} = {obsx, obsy}), and that obsr is observed for the second model (A_{ρ_2} = {obsr}). Then the latent spaces for both models (the sets of υ_1 and υ_2 with positive density, respectively) are isomorphic to R², but they parametrize R² differently (Cartesian versus polar coordinates). We can translate between (the latent parts of) traces of these two models using the following pair of trace transform programs, expressed in the differentiable programming language of this section:

@transform h_cart_to_polar (t1) to (t2)
    x = @read(t1[:x], :cont)
    y = @read(t1[:y], :cont)
    r = sqrt(x*x + y*y)
    theta = atan(y, x)
    @write(t2[:r], r, :cont)
    @write(t2[:theta], theta, :cont)
end

@transform h_polar_to_cart (t2) to (t1)
    r = @read(t2[:r], :cont)
    theta = @read(t2[:theta], :cont)
    x = r * cos(theta)
    y = r * sin(theta)
    @write(t1[:x], x, :cont)
    @write(t1[:y], y, :cont)
end
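As a sanity check on what the interpreter computes automatically: the (x, y) ↦ (r, θ) map written in h_cart_to_polar has a Jacobian determinant with absolute value 1/r, which one can confirm numerically. This check is an illustration only, not part of the DSL:

using ForwardDiff, LinearAlgebra

# The map implemented by h_cart_to_polar on the continuous addresses.
h(v) = [sqrt(v[1]^2 + v[2]^2), atan(v[2], v[1])]

v = [0.3, -0.4]
J = ForwardDiff.jacobian(h, v)
@assert isapprox(abs(det(J)), 1 / hypot(v[1], v[2]); atol=1e-12)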

There is no need to extend the spaces with auxiliary choice dictionaries σ and σ′, so 𝒬_1 and 𝒬_2 are both empty generative functions that sample no random choices, and the transforms h and h^{−1} simply read from and write to traces of 𝒫_1 or 𝒫_2.

Example: A coarse-to-fine and discrete-to-continuous trace translator. This example describes a trace translator that does employ auxiliary generative functions. Consider the following pair of generative models (𝒫_1 := disc_model and 𝒫_2 := cont_model):

@gen function disc_model(start_cell, n)
    destination_cell ∼ uniform_discrete(1, num_cells)
    observed_cells ∼ disc_likelihood_model(start_cell, destination_cell, n)
end

@gen function cont_model(start_loc, n)
    x ∼ uniform(0, 1)
    y ∼ uniform(0, 1)
    destination_loc = (x, y)
    observed_locs ∼ cont_likelihood_model(start_loc, destination_loc, n)
end

Both model the observed movement of a person through a space over a set of n time steps, using a combination of their starting location (which is known a priori) and their destination location (which is a latent variable). 𝒫_1 parametrizes the space in terms of a discretization into a set of 20 'cells' identified by integers, and 𝒫_2 uses a continuous parametrization of the space in terms of (x, y) coordinates (see Figure 3-10). In addition to the different latent representations (address destination_cell for 𝒫_1, and addresses x and y for 𝒫_2), the two generative functions also use different observed representations: we assume that the likelihood models disc_likelihood_model and cont_likelihood_model are generative functions that only sample at addresses 1 through n for each observed data point (i.e. the likelihood models do not include any latent variables). Each address of the form 'observed_cells => i' is integer-valued, and each address of the form 'observed_locs => i' takes values in [0, 1]². (The details of the likelihood models are not shown.) The semantic relationship between these two models is clear (Figure 3-10).

Figure 3-10: Trace translators allow translation between arbitrary latent representations. (The figure depicts the stochastic direction, 𝒬_1 and h, mapping a discrete cell to continuous coordinates (x, y), and the deterministic direction, 𝒬_2 and h^{−1}, mapping back.)

To translate a trace of the coarse and discrete model to a trace of the continuous model, recall that we need to translate between the latent parts of these traces

υ_1 and υ_2 (containing addresses {destination_cell} and {x, y} respectively), and not the observed parts ρ_1 and ρ_2 (containing addresses {observed_cells => i : i = 1 ... n} and {observed_locs => i : i = 1 ... n} respectively). However, the space of all υ_2 is uncountably infinite (isomorphic to [0, 1]²) and the space of all υ_1 is finite (containing just 20 elements). Therefore, no bijection is possible, and we generate additional randomness as part of 𝒬_1 := q_disc_to_cont, shown below.

@gen function q_disc_to_cont(trace)
    (xmin, xmax, ymin, ymax) = get_bounds(trace[:destination_cell])
    x ∼ uniform(xmin, xmax)
    y ∼ uniform(ymin, ymax)
end

The generative function takes as an argument a trace (trace) of the discrete model 𝒫_1, reads the value of the destination cell from this trace, and looks up the bounds of the rectangular region that this cell represents. It then samples the x and y coordinates of a random point within the region, at addresses x and y. It is now straightforward to write a transform h: it simply copies the continuous destination location from the trace of 𝒬_1 to the output model trace:

@transform h_disc_to_cont (model_in, aux_in) to (model_out)
    @copy(aux_in[:x], model_out[:x])
    @copy(aux_in[:y], model_out[:y])
end

The backward auxiliary distribution (풬2 := q_cont_to_disc) samples no random choices:

@gen function q_cont_to_disc(trace)
end

Given observed data ρ_1 and ρ_2 for the two models, we can then compose a trace translator (𝒫_1, ρ_1, 𝒫_2, ρ_2, 𝒬_1, 𝒬_2, h) from these elements, which is able to translate traces of the discrete model into traces of the continuous model. To translate a trace of the continuous model into a trace of the coarse and discrete model, we can use the same 𝒬_1 and 𝒬_2, but with their roles reversed. For the transform we can use h^{−1}, which computes the destination cell (using get_cell) and populates the discrete model trace with this value:

@transform h_cont_to_disc (model_in) to (model_out, aux_out)
    x = @read(model_in[:x], :cont)
    y = @read(model_in[:y], :cont)
    @copy(model_in[:x], aux_out[:x])
    @copy(model_in[:y], aux_out[:y])
    @write(model_out[:destination_cell], get_cell((x, y)), :disc)
end

Note that h^{−1} also copies the value of the destination location into the trace of 𝒬_1. Intuitively, the invertibility requirement on h means that neither direction of the bijection can

discard information: information must be preserved either in the new model trace or the new proposal trace. The resulting trace translator is (𝒫_2, ρ_2, 𝒫_1, ρ_1, 𝒬_2, 𝒬_1, h^{−1}). Note that while our pair of trace translators, from the discrete model to the continuous model and vice versa, are constructed from the same elements (swapping the roles of 𝒬_1 and 𝒬_2 and using h and h^{−1} for the transforms), this is not required; there is an infinite set of valid trace translators from any inference problem (𝒫_1, ρ_1) to any other problem (𝒫_2, ρ_2).

3.6.4 Sequential Monte Carlo with trace translators

Sequential Monte Carlo (SMC) samplers [33] are a flexible template for constructing Monte Carlo inference algorithms that consecutively approximate the target distributions of a sequence of inference problems. SMC is most commonly applied using a sequence of inference problems that incrementally extend the state space of one model, but the framework is significantly more general, and can be used with arbitrary sequences of inference problems. For each step in the sequence, the approximation for the previous inference problem is used as a starting point for the next inference problem. This is done by sampling a particle for the new inference problem conditioned on each particle in the approximation for the previous inference problem, and incrementing an importance weight so that the new particle and its weight constitute a 'properly weighted' sample [76]. Trace translators, implemented using Gen's abstract data types, are a general-purpose construct for sampling the new particle given the previous particle, and can be used as the basis of a general SMC procedure, shown in Algorithm 10. The previous particle is represented by the previous trace (t_{j−1}^{(i)}), the new particle is represented by the new trace (t_j^{(i)}), and the log weight returned by translate is the incremental importance weight required by SMC. Note that the SMC sampler framework also allows these transitions from one inference problem to the next to be interleaved with applications of MCMC kernels (kern_j) that are stationary for each target distribution (p_{j−1}(·|ρ_{j−1})) in the sequence. Note also that this SMC construction generalizes particle filtering (Algorithm 7) and annealed importance sampling (Algorithm 8).

While trace translators are very flexible, and the asymptotic properties of SMC hold for any valid trace translator, the efficiency of the resulting SMC algorithm depends on the details of the trace translator. Cusumano-Towner et al. [27] includes a more detailed discussion of the properties of a trace translator that make SMC efficient. Briefly, the distribution on (υ_2, σ_2) associated with sampling υ_1 ∼ p_1(·|ρ_1) and σ_1|υ_1 ∼ q_1(·; υ_1) and applying (υ_2, σ_2) = h(υ_1, σ_1) should be close in Kullback-Leibler (KL) divergence to the distribution associated with sampling υ_2 ∼ p_2(·|ρ_2) and then σ_2|υ_2 ∼ q_2(·; υ_2). Specifically, the following KL divergence, from the distribution induced by p_2 and q_2 to the distribution induced by p_1, q_1, and h, should be small, where (υ_1, σ_1) := h^{−1}(υ_2 ⊕ σ_2):

∫ p_2(υ_2|ρ_2) q_2(σ_2; υ_2) log( ( p_2(υ_2|ρ_2) q_2(σ_2; υ_2) / (p_1(υ_1|ρ_1) q_1(σ_1; υ_1)) ) · |J_h|(υ_1 ⊕ σ_1) ) μ*_A(d(υ_2 ⊕ σ_2))    (3.14)

SMC with trace translators is likely to be an effective inference strategy when the initial target distribution (induced by conditioning 𝒫_0 on data ρ_0) is trivial or straightforward to sample from efficiently, and the sequence of inference problems and trace translators is such that each instance of Equation 3.14 is small.

Algorithm 10 Sequential Monte Carlo with trace translators

procedure trace-translator-smc(𝒬_0, {(𝒫_j, ρ_j)}_{j=0}^{m}, {(𝒬_{j,1}, 𝒬_{j,2}, h_j, kern_{j−1})}_{j=1}^{m})
    for i ← 1 ... n do
        s_0^{(i)} ← 𝒬_0.simulate(_)
        τ ← s_0^{(i)}.choices() ⊕ ρ_0
        t_0^{(i)} ← 𝒫_0.generate(_, τ)
        log w̃^{(i)} ← t_0^{(i)}.logpdf() − s_0^{(i)}.logpdf()
    end for
    ((w^{(1)}, ..., w^{(n)}), ℓ̄) ← normalize(log w̃^{(1)}, ..., log w̃^{(n)})
    β ← ℓ̄
    for j ← 1 ... m do
        [b^{(1)}, ..., b^{(n)}] ← resample(w^{(1)}, ..., w^{(n)})
        for i ← 1 ... n do
            t_{j−1}^{(i)} ← kern_{j−1}(t_{j−1}^{(b^{(i)})})
            (t_j^{(i)}, log w̃^{(i)}) ← translate(t_{j−1}^{(i)}, ρ_{j−1}, 𝒫_j, ρ_j, 𝒬_{j,1}, 𝒬_{j,2}, h_j)
        end for
        ((w^{(1)}, ..., w^{(n)}), ℓ̄) ← normalize(log w̃^{(1)}, ..., log w̃^{(n)})
        β ← β + ℓ̄
    end for
    return ({(t_m^{(1)}, w^{(1)}), ..., (t_m^{(n)}, w^{(n)})}, β)
end procedure

Example: Coarse-to-fine and discrete-to-continuous SMC. Recall the trace translator described above for translating between 𝒫_1 := disc_model and 𝒫_2 := cont_model. Because disc_model is a discrete model with a modest-size latent state space, we can perform exact inference in this model conditioned on the observed cells, by enumerating over the discrete states destination_cell, computing the log joint density for each state, and normalizing. Julia code for this, using the current version of Gen, is given below:

function compute_disc_posterior(start_cell::Int, observed_cells::Vector{Int})
    n = length(observed_cells)
    constraints = Gen.choicemap()
    for (i, v) in enumerate(observed_cells)
        constraints[:observed_cells => i] = v
    end
    log_probs = [NaN for destination_cell in 1:20]
    for destination_cell in 1:20
        constraints[:destination_cell] = destination_cell
        (trace, _) = Gen.generate(disc_model, (start_cell, n), constraints)
        log_probs[destination_cell] = Gen.get_score(trace)
    end
    destination_cell_probs = exp.(log_probs .- Gen.logsumexp(log_probs))
    return destination_cell_probs
end

We compute this distribution for the observed data (observed_cells) and other contextual information (start_cell), whose definitions are not shown. Then, we define a generative function disc_model_proposal that samples from this distribution:

destination_cell_probs = compute_disc_posterior(start_cell, observed_cells)

@gen function disc_model_proposal()
    destination_cell ∼ categorical(destination_cell_probs)
end

We use trace translator SMC (Algorithm 10) to do inference in the continuous model (cont_model) conditioned on data {(observed_locs => i) ↦ (x_i, y_i)} for i = 1, ..., 10. We set:

m := 1
𝒫_0 := disc_model
𝒬_0 := disc_model_proposal
𝒫_1 := cont_model
ρ_1 := {(observed_locs => i) ↦ (x_i, y_i)} for i = 1, ..., 10
ρ_0 := {(observed_cells => i) ↦ get_cell((x_i, y_i))} for i = 1, ..., 10
𝒬_{1,1} := q_disc_to_cont
𝒬_{1,2} := q_cont_to_disc
h_1 := as defined by h_disc_to_cont
kern_0 := t ↦ t

Because 𝒬_0 := disc_model_proposal samples exactly from the conditional distribution p_0(·|ρ_0), we have satisfied one of our desiderata for an efficient SMC algorithm: the first inference problem in our sequence can be solved efficiently. Since 𝒬_0 samples from the conditional distribution, we do not need to employ an MCMC kernel, so we set kern_0 := t ↦ t (i.e. it returns its input). The efficiency of the resulting algorithm depends on the discrete model being an accurate approximation to the continuous model, in the sense of making Equation (3.14) small. For this experiment, the likelihood function of the discrete model was trained using maximum likelihood on data simulated from the continuous model. This technique for training a surrogate model for use as an intermediate target distribution in SMC is closely related to the technique of training a proposal from Section 3.3.

Figure 3-11 shows estimates of the accuracy of the resulting 'coarse-to-fine' SMC algorithm as the number of particles is increased, and compares these estimates to those obtained with importance sampling using the prior as the proposal distribution in the continuous model. For each setting of the number of particles (n), accuracy was measured by averaging the log marginal likelihood estimate over 200 independent runs of the two algorithms with n particles. (The average log marginal likelihood estimate is a common metric of accuracy for importance sampling and sequential Monte Carlo algorithms that is closely related to KL divergence [26].) Also for each n, the median running time of the algorithms over all replicates was measured. These running time and accuracy estimates were computed for 29 different values of n, from n = 1 to n = 2000, and plotted.

Note that for the SMC algorithm, the distribution destination_cell_probs was precomputed, instead of being computed within each run of the proposal 𝒬_0. The trace translator SMC algorithm is much more efficient than importance sampling in the continuous model.

Figure 3-11: Efficiency of coarse-to-fine sequential Monte Carlo using trace translators. (The plot shows the log marginal likelihood estimate versus running time in milliseconds, for importance sampling and for coarse-to-fine SMC.)

Crucially, the greater efficiency is obtained without requiring us to tailor an MCMC algorithm for the continuous model, which possesses a complex likelihood model that is prone to local minima, making efficient MCMC inference in the continuous model difficult.

3.7 Involutive MCMC

Markov chain Monte Carlo (MCMC) algorithms are powerful tools for approximate sampling from probability distributions, but designing and deriving efficient MCMC algorithms is mathematically involved, and implementing MCMC kernels is tedious and notoriously error-prone. These challenges are especially pronounced when sampling from probability distributions on complex state spaces that combine symbolic, numeric, and structural uncertainty. This section introduces a very general class of MCMC kernels, called involutive MCMC kernels, that can be used to construct efficient structure-changing moves, and a programming construct for implementing involutive MCMC kernels using a combination of generative functions and transforms implemented in the differentiable programming language of Section 3.6.3. This construct is implemented in Gen. The section ends with examples of involutive MCMC kernels implemented using Gen for (i) a split-merge reversible jump [53] move in an infinite mixture model, and (ii) a state-dependent mixture of Metropolis-Hastings proposals on an infinite combinatorial space of covariance functions for a Gaussian process.

3.7.1 Symmetric trace translators

Recall that trace translators are a general construct for generating a trace of one model given a trace of another model, while computing an appropriate importance weight that can be used in importance sampling or sequential Monte Carlo. This section discusses a special case of trace translators, called symmetric trace translators, that will form the basis for

involutive MCMC kernels. Symmetric trace translators are a type of trace translator from a conditional distribution p_1(·|ρ_1) to itself:

Definition 3.7.1 (Symmetric trace translator). A symmetric trace translator is a trace translator (𝒫_1, ρ_1, 𝒫_2, ρ_2, 𝒬_1, 𝒬_2, h) where 𝒫_1 = 𝒫_2, ρ_1 = ρ_2, 𝒬_1 = 𝒬_2, and h is an involution (that is, h = h^{−1}). A symmetric trace translator is denoted by the tuple (𝒫, ρ, 𝒬, h), with 𝒫 = 𝒫_1 = 𝒫_2, ρ = ρ_1 = ρ_2, and 𝒬 = 𝒬_1 = 𝒬_2.

The importance weight (Equation 3.13) for a symmetric trace translator simplifies to:

( p(υ′ ⊕ ρ) q(σ′; υ′) / (p(υ ⊕ ρ) q(σ; υ)) ) · |J_h|(υ ⊕ σ)    for σ ∼ q(·; υ) and h(υ ⊕ σ) = (υ′ ⊕ σ′)    (3.15)

Below are some simple pedagogical examples of symmetric trace translators. More realistic symmetric trace translators will be defined in the next section.

Example: Identity symmetric trace translator. The simplest type of symmetric trace translator has 𝒬 sample no random choices (so that σ_1 = σ_2 = {}), and h is simply the identity function (h(υ, σ) = (υ, σ)). The importance weight for this translator is always 1.

Example: A deterministic symmetric trace translator. Consider a generative function 𝒫 that makes a single latent random choice at address a taking values in (0, ∞), sampled from a Gamma(1, 1) distribution, and where 𝒬 makes no random choices. Consider the function h(υ, {}) = ({a ↦ υ[a]^{−1}}, {}). The importance weight for this translator, for υ = {a ↦ x}, is:

( p_{Gamma(1,1)}(x^{−1}) / p_{Gamma(1,1)}(x) ) · (1 / x²)
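Since Gamma(1, 1) is the unit-rate exponential distribution, this weight equals exp(x − 1/x)/x², which can be checked numerically (an illustration only, using Distributions.jl):

using Distributions

# Weight of the x -> 1/x symmetric trace translator under Gamma(1, 1).
w(x) = pdf(Gamma(1, 1), 1 / x) / pdf(Gamma(1, 1), x) * (1 / x^2)

@assert isapprox(w(2.0), exp(2.0 - 0.5) / 4)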

Example: Conditional sample and swap symmetric trace translator. Consider the class of symmetric trace translators where q := p(·|ρ) and h(υ, σ) = (σ, υ) (that is, h swaps υ and σ). The importance weight for this translator is always 1. This symmetric trace translator returns a trace that is sampled from the conditional distribution, but such a trace translator is not typically feasible to implement, because it requires writing a generative function 𝒬 that samples exactly from the conditional distribution p(·|ρ).

Example: Propose and swap symmetric trace translator. Consider a trace translator where h(υ, σ) = (σ, υ), but where q ≠ p(·|ρ). The importance weight is:

( p(υ′ ⊕ ρ) q(υ) ) / ( p(υ ⊕ ρ) q(υ′) )    for υ′ ∼ q(·)    (3.16)

This expression is closely related to the acceptance probability in the Metropolis-Hastings algorithm, where q(·) is the proposal distribution and p(·|ρ) is the target distribution. As we will see in the next section, this is not a coincidence: symmetric trace translators can be used to construct MCMC kernels that include Metropolis-Hastings kernels and their generalizations, including reversible jump MCMC kernels.
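To make the connection explicit: accepting the proposed state with probability min(1, w), where w is the weight above, yields an independence Metropolis-Hastings kernel. A minimal sketch in log space, with the hypothetical names log_p, log_q, and propose standing in for the target log density, the proposal log density, and the proposal sampler:

# Sketch: the propose-and-swap weight used as an MH acceptance ratio.
function propose_and_swap_mh(x, log_p::Function, log_q::Function, propose::Function)
    x_new = propose()  # υ′ ∼ q(·)
    log_w = (log_p(x_new) + log_q(x)) - (log_p(x) + log_q(x_new))
    return log(rand()) < min(0.0, log_w) ? x_new : x
end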

3.7.2 Incremental computation for symmetric trace translators

Consider running translate (Algorithm 9) for a symmetric trace translator on an input trace t of 풫 with t.choices() = 휐 ⊕ 휌, sampling an auxiliary trace s of 풬 with s.choices() = 휎, and applying the involution ℎ(휐 ⊕ 휎) = (휐′ ⊕ 휎′). Let $n_1 := |A_\upsilon|$, $n_2 := |A_\sigma|$, $n_1' := |A_{\upsilon'}|$, $n_2' := |A_{\sigma'}|$, $n := n_1 + n_2$, and $n' := n_1' + n_2'$. Suppose that $n_1 + n_1' \gg n_2 + n_2'$. Assuming the involution ℎ is specified using the differentiable programming language of Section 3.6.3, translate runs the program for ℎ and writes the value for each element of 휐′ ⊕ 휎′, requiring $n' = n_1' + n_2' \approx n_1'$ @write or @copy operations. It also uses approximately $n_1'$ operations to evaluate the log-densities required for the importance weight, because running generate on 휐′ must visit each random choice in 휐′ (we assume the log-density t.logpdf() was already computed and cached inside t).

If the involution ℎ has sparse structure and the model density 푝 has conditional independence structure, it is possible to improve the asymptotic efficiency of trace translation for a symmetric trace translator to become sub-linear or even constant in $n_1 + n_1'$. The key idea is to modify the program specifying ℎ so that it only specifies the subset of the addresses 푎 in the output 휐′ for which 휐′[푎] ≠ 휐[푎] or for which $a \notin A_\upsilon$, and to use t.update (Section 2.3) to efficiently generate the new trace t′ and compute the density ratio 푝(휐′ ⊕ 휌)/푝(휐 ⊕ 휌) that is required for the importance weight. Let 휐̃′ denote the restriction of 휐′ to the set of addresses that are either not in 휐, or are in 휐 but have different values, and let ℎ̃ denote the function that is equivalent to ℎ but maps 휐 ⊕ 휎 to 휐̃′ ⊕ 휎′ instead of 휐′ ⊕ 휎′. Then, assuming 푝 is a structured density, t.update(_, _, 휐̃′) is well-defined, and results in the same trace t′ that would be returned by the call to 풫.generate in translate. But unlike the call to 풫.generate, t.update is able to exploit cancellation in the density ratio 푝(휐′ ⊕ 휌)/푝(휐 ⊕ 휌) due to conditional independence in 푝, and can use efficient functional data structures to produce t′ from t in logarithmic or effectively constant time. Algorithm 11 shows a specialized procedure for symmetric trace translators that makes use of update.

Algorithm 11 Symmetric trace translator procedure

procedure symmetric-transform(t, 휌, ℎ̃, 휎)
    휏 ← t.choices()
    assert 휏[푎] = 휌[푎] for all 푎 ∈ 퐴휌
    휐 ← 휏|_{퐴휌ᶜ}
    (휐̃′, 휎′) ← ℎ̃(휎, 휐)
    (t′, log 푤, 휐̃, _) ← t.update(_, _, 휐̃′)
    휏′ ← t′.choices()
    assert 휏′[푎] = 휌[푎] for all 푎 ∈ 퐴휌
    I_휎 ← (퐴휎 ∩ 퐶) ∖ {푎 ∈ 퐴휎 : (∃푏 ∈ 퐴휎′ . 휎′[푏] = 휎[푎]) ∨ (∃푏 ∈ 퐴휐̃′ . 휐̃′[푏] = 휎[푎])}
    I_휐 ← (퐴휐 ∩ 퐶) ∖ {푎 ∈ 퐴휐 : (∃푏 ∈ 퐴휎′ . 휎′[푏] = 휐[푎]) ∨ (∃푏 ∈ 퐴휐̃′ . 휐̃′[푏] = 휐[푎])}
    O_휎′ ← (퐴휎′ ∩ 퐶) ∖ {푏 ∈ 퐴휎′ : (∃푎 ∈ 퐴휎 . 휎[푎] = 휎′[푏]) ∨ (∃푎 ∈ 퐴휐 . 휐[푎] = 휎′[푏])}
    O_휐̃′ ← (퐴휐̃′ ∩ 퐶) ∖ {푏 ∈ 퐴휐̃′ : (∃푎 ∈ 퐴휎 . 휎[푎] = 휐̃′[푏]) ∨ (∃푎 ∈ 퐴휐 . 휐[푎] = 휐̃′[푏])}
    J11 ← [∂휎′[푏]/∂휎[푎]] for 푏 ∈ O_휎′, 푎 ∈ I_휎;  J12 ← [∂휐̃′[푏]/∂휎[푎]] for 푏 ∈ O_휐̃′, 푎 ∈ I_휎
    J21 ← [∂휎′[푏]/∂휐[푎]] for 푏 ∈ O_휎′, 푎 ∈ I_휐;  J22 ← [∂휐̃′[푏]/∂휐[푎]] for 푏 ∈ O_휐̃′, 푎 ∈ I_휐
    return (t′, 휎′, log 푤 + log |det [ J11 J12 ; J21 J22 ]|)
end procedure

procedure symmetric-translate(t, 휌, 풬, ℎ̃)
    s ← 풬.simulate(t)
    휎 ← s.choices()
    (t′, 휎′, 훼) ← symmetric-transform(t, 휌, ℎ̃, 휎)
    s′ ← 풬.generate(t′, 휎′)
    훼 ← 훼 + s′.logpdf() − s.logpdf()
    return (t′, 훼)
end procedure

3.7.3 Involutive MCMC

Consider a model generative function 풫 with density 푝 and observed data 휌 for which 푝¯(휌) > 0. Every symmetric trace translator (풫, 휌, 풬, ℎ) for some 풬 and ℎ defines an involutive MCMC kernel, shown in Algorithm 12, that is stationary with respect to the target distribution 푝(·|휌). The kernel takes a trace t of 풫 as input, where t.choices() = 휐 ⊕ 휌 for some 휐, and returns a new trace t′ of 풫 where t′.choices() = 휐′ ⊕ 휌 for some 휐′. The kernel is parametrized by 휌, 풬, and ℎ̃, and depends on the model generative function 풫 through the input trace t. The kernel is parametrized by ℎ̃ instead of ℎ so that we can use the symmetric-translate procedure to exploit incremental computation, as described above.

Cusumano-Towner et al. [28] give a proof that involutive MCMC kernels are stationary with respect to 푝(·|휌), which is summarized here: Briefly, Tierney [122] defines a construction for a class of deterministic involutive MCMC kernels based on accepting or rejecting the state proposed via an involution. The construction is shown to satisfy detailed balance (and therefore stationarity) with respect to a target distribution. It is possible to instantiate this construction for the involution ℎ on the space $\{(\upsilon \oplus \sigma) \in \mathcal{T}^\star_A : p(\upsilon|\rho)\, q(\sigma; \upsilon) > 0\}$ with the target distribution 휋(휐 ⊕ 휎) := 푝(휐|휌)푞(휎; 휐) and with acceptance probability, where (휐′ ⊕ 휎′) = ℎ(휐 ⊕ 휎):

$$\min\left\{1,\; \frac{p(\upsilon' \oplus \rho)\, q(\sigma'; \upsilon')}{p(\upsilon \oplus \rho)\, q(\sigma; \upsilon)}\, |J_h|(\upsilon \oplus \sigma)\right\} \qquad (3.17)$$

Stationarity of the kernel involutive-mcmc-kernel on the space $\{\upsilon \in \mathcal{T}^\star_A : p(\upsilon|\rho) > 0\}$ then follows from the fact that if 휐 ∼ 푝(·|휌), then the joint distribution of 휐 and 휎 resulting from sampling 휎|휐 ∼ 푞(·|휐) in symmetric-translate is precisely the stationary distribution 휋 of the deterministic involutive kernel on the joint space. By stationarity of the deterministic involutive kernel, (휐′ ⊕ 휎′) ∼ 휋. Marginalizing out 휎′ then gives 휐′ ∼ 푝(·|휌).

Involutive MCMC is a very flexible family of kernels, due to the flexibility of the symmetric trace translator construction. The next sections describe two classes of kernels that are special cases, and show examples of these implemented using probabilistic and differentiable programs in Gen.

Algorithm 12 Involutive MCMC

procedure involutive-mcmc-kernel_{휌,풬,ℎ̃}(t)
    (t′, 훼) ← symmetric-translate(t, 휌, 풬, ℎ̃)
    푟 ∼ Uniform(0, 1)
    if 푟 ≤ exp(훼) then return t′ else return t
end procedure
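In Julia, the accept/reject step of Algorithm 12 is only a few lines. The following minimal sketch is our own illustration (not Gen's API); it assumes a function symmetric_translate that behaves like the procedure of Algorithm 11, returning the proposed trace together with the log acceptance ratio 훼:

function involutive_mcmc_step(trace, symmetric_translate)
    # Propose a new trace and compute the log acceptance ratio alpha.
    (proposed_trace, log_alpha) = symmetric_translate(trace)
    # Accept with probability min(1, exp(log_alpha)); otherwise keep the input trace.
    rand() <= exp(log_alpha) ? proposed_trace : trace
end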

3.7.4 Implementing reversible jump MCMC using involutive MCMC

Reversible jump MCMC [53, 56] is a special case of involutive MCMC, and the implementation of reversible jump MCMC kernels can be simplified using Algorithm 12, which uses probabilistic modeling languages and a differentiable programming language to automate the acceptance probability calculation. We first review reversible jump MCMC, then show how it can be implemented with generative functions and differentiable programs using the involutive MCMC inference construct.

The reversible jump MCMC framework uses a set of 'models' 푘 ∈ 풦 and a prior distribution on models 푝(푘). For each model 푘 there is a latent parameter vector $\theta_k \in \mathbb{R}^{n(k)}$, where 푛(푘) is the dimension of model 푘, and a likelihood function ℓ푘(휃푘). The latent state 푥 is a pair (푘, 휃푘) of a model and its continuous parameters.

There is a set of move types ℳ. Each move type 푚 ∈ ℳ is associated with an unordered pair of models (푘1, 푘2) and a dimensionality 푑(푚) such that 푑(푚) ≥ 푛(푘1) and 푑(푚) ≥ 푛(푘2) (zero, one, or more than one move types may be associated with a given pair of models). For each latent state 푥 = (푘, 휃푘), there is a probability distribution 푞푥(푚) on move types such that 푞푥(푚) > 0 implies that 푘 is one of the models for move type 푚. For each move type 푚 ∈ ℳ between 푘1 and 푘2 there is a pair of continuously differentiable bijections $g_{m,k_1 \to k_2} : \mathbb{R}^{d(m)} \to \mathbb{R}^{d(m)}$ and $g_{m,k_2 \to k_1} := g_{m,k_1 \to k_2}^{-1}$, and a pair of proposal densities $q_{m,k_1 \to k_2}(u_{k_1 \to k_2})$ and $q_{m,k_2 \to k_1}(u_{k_2 \to k_1})$, where $u_{k_1 \to k_2} \in \mathbb{R}^{d(m) - n(k_1)}$ and $u_{k_2 \to k_1} \in \mathbb{R}^{d(m) - n(k_2)}$. A proposal is made from state 푥 = (푘, 휃푘) by (i) sampling a move type 푚 ∼ 푞푥(·), (ii) sampling a continuous variable $u \sim q_{m,k \to k'}(\cdot)$ for the pair (푘, 푘′) associated with 푚, and (iii) computing $(\theta'_{k'}, u') := g_{m,k \to k'}(\theta_k, u)$ and proposing the new state $x' = (k', \theta'_{k'})$. The move is accepted with probability:

$$\min\left\{1,\; \frac{p(k')\, p_{k'}(\theta'_{k'})\, \ell_{k'}(\theta'_{k'})\, q_{x'}(m)\, q_{m,k' \to k}(u')}{p(k)\, p_k(\theta_k)\, \ell_k(\theta_k)\, q_x(m)\, q_{m,k \to k'}(u)}\, |J_{g_{m,k \to k'}}|(\theta_k, u)\right\}$$

To encode reversible jump MCMC in our framework, we use a generative function 풫 that encodes the space of models 풦, the prior distribution on models 푝(푘), the per-model priors 푝푘(휃푘), and the per-model likelihoods ℓ푘(휃푘). Each choice dictionary 휏 with 푝(휏) > 0 decomposes into 휏 = 휐d ⊕ 휐c ⊕ 휌, where 휐d are discrete latent choices, 휐c are continuous latent choices, and 휌 are observations. The set 풦 of all models is encoded by the set of all pairs $\{(A_\upsilon, \upsilon_{\mathrm{d}}) : \upsilon_{\mathrm{d}} \in \mathcal{T}^\star_D,\ \exists \upsilon_{\mathrm{c}} \in \mathcal{T}^\star_C\ p(\upsilon_{\mathrm{d}} \oplus \upsilon_{\mathrm{c}} \oplus \rho) > 0\}$. That is, a model is a possible trace structure (i.e. a control-flow path through the program encoding 풫) together with the latent discrete choices. The per-model continuous parameters 휃 are encoded as 휐c. The likelihood is encoded as 푝(휌|휐c ⊕ 휐d). The auxiliary generative function 풬 encodes both the probability distribution on move types, using discrete random choices and possibly stochastic control flow, and the per-move-type probability densities on 푢, using continuous random choices. The involution ℎ factors into (i) an involution 푔 on the set of dictionaries of discrete choices 푖 = (휐d ⊕ 휎d) made by 풫 or 풬, which defines the association between move types ℳ and the model pairs (푘1, 푘2); and (ii) a family of bijections ℎ푖 on the space of pairs 휐c ⊕ 휎c of continuous random choices made by 풫 or 풬, indexed by fixed values 푖 of the discrete random choices.

Example: Improving model-switching acceptance rates We now give a simple example of a reversible jump MCMC move encoded using involutive MCMC with a symmetric trace translator (풫, 휌, 풬, ℎ), where 풫 and 풬 are specified using Gen's dynamic modeling language, and where ℎ is specified using Gen's trace transform language. The example motivates the added expressiveness of involutive MCMC over the Metropolis-Hastings construct of Section 3.4.2. The generative model is 풫 = branching_model, and the observed data are 휌 = {y1 ↦→ 1.0, y2 ↦→ 1.3}:

@gen function branching_model()
    if ({:z} ∼ bernoulli(0.5))
        m1 ∼ gamma(1, 1)
        m2 ∼ gamma(1, 1)
    else
        m ∼ gamma(1, 1)
        (m1, m2) = (m, m)
    end
    y1 ∼ normal(m1, 0.1)
    y2 ∼ normal(m2, 0.1)
end

Because this model has stochastic control flow, it represents two distinct structural hypotheses about how the observed data are generated: If z is true then we enter the first branch, and we hypothesize that the two data points were generated from separate means, sampled at addresses m1 and m2. If z is false then we enter the second branch, and we hypothesize that there is a single mean that explains both data points, sampled at address m. Figure 3-12a shows 10 traces of this model sampled using 풫.simulate, rendered using black dots to represent the two y-values (y1 on the left and y2 on the right), a blue line to represent a latent sample for which z = F, and red lines to represent a sample for which z = T. Figure 3-12b shows approximate posterior samples for given observed data that exhibit uncertainty about the value of z.

We want to construct an MCMC kernel that is able to transition between these two distinct structural hypotheses. We could construct a Metropolis-Hastings kernel (Section 3.4.2)

mh-kernel_{풬_switch, 휌} with 풬_switch := q_mh_switch that switches between the two branches and proposes new values for each branch; and we could interleave this kernel with another

kernel mh-kernel_{풬_walk, 휌}, where 풬_walk := q_mh_walk performs a random walk on the values within each branch:

(a) Prior samples from branching_model.

(b) Approximate conditional samples from branching_model given data {y1 ↦→ 1.0, y2 ↦→ 1.3}.

Figure 3-12: Visualization of samples from a model with stochastic control flow

@gen function q_mh_switch(trace)
    z ∼ bernoulli(trace[:z] ? 0.0 : 1.0)
    if z
        m1 ∼ gamma(1, 1)
        m2 ∼ gamma(1, 1)
    else
        m ∼ gamma(1, 1)
    end
end

@gen function q_mh_walk(trace)
    if trace[:z]
        m1 ∼ normal(trace[:m1], 0.1)
        m2 ∼ normal(trace[:m2], 0.1)
    else
        m ∼ normal(trace[:m], 0.1)
    end
end

Sequencing these two kernels together gives a composite kernel that is stationary with respect to the target distribution 푝(·|휌) (and ergodic). However, this kernel will not be very efficient, because the branch-switching proposals from 풬_switch are unlikely to be accepted. When switching from the branch with a single mean to the branch with two means, the values of the new addresses m1 and m2 are proposed from the prior distribution. This is inefficient, since if we have inferred an accurate value for m, we expect the values for m1 and m2 to be near this value. The same is true when proposing a structure change in the opposite direction. That means it will take many iterations of the composite kernel to get an accurate estimate of the posterior probability distribution on the two structures. We would like to use inferred values for m1 and m2 to inform our proposal for the value of m. For example, we could take the geometric mean (m = sqrt(m1 * m2)). However, there are many combinations of m1 and m2 that have the same geometric mean. In other words,

the geometric mean is not invertible. However, if we return the additional degree of freedom (dof) alongside the geometric mean, then we do have an invertible function (merge_means, with inverse split_mean):

function merge_means(m1, m2)
    m = sqrt(m1 * m2)
    dof = m1 / (m1 + m2)
    (m, dof)
end

function split_mean(m, dof)
    m1 = m * sqrt(dof / (1 - dof))
    m2 = m * sqrt((1 - dof) / dof)
    (m1, m2)
end

We use these two functions to construct an involution ℎ, and we use this involution to construct an involutive MCMC kernel that we call a 'split/merge' kernel, because it either splits a parameter value or merges two parameter values. The auxiliary distribution 풬 := q_branch_inv of the symmetric trace translator is responsible for generating the extra degree of freedom when splitting:

@gen function q_branch_inv(trace)
    if trace[:z]
        # currently two means, switch to one
    else
        # currently one mean, switch to two
        {:dof} ∼ uniform_continuous(0, 1)
    end
end

The transform ℎ, written in Gen's trace transform language, invokes either merge_means or split_mean, depending on the current branch in the input model trace t (model_in):

@transform h (model_in, aux_in) to (model_out, aux_out) begin
    if @read(model_in[:z], :disc)
        # currently two means, switch to one
        @write(model_out[:z], false, :disc)
        m1 = @read(model_in[:m1], :cont)
        m2 = @read(model_in[:m2], :cont)
        (m, dof) = merge_means(m1, m2)
        @write(model_out[:m], m, :cont)
        @write(aux_out[:dof], dof, :cont)
    else
        # currently one mean, switch to two
        @write(model_out[:z], true, :disc)
        m = @read(model_in[:m], :cont)
        dof = @read(aux_in[:dof], :cont)
        (m1, m2) = split_mean(m, dof)
        @write(model_out[:m1], m1, :cont)
        @write(model_out[:m2], m2, :cont)
    end
end

The Julia code below shows how to construct the involutive MCMC kernel using 풬 and ℎ and apply it in a cycle with the random-walk Metropolis-Hastings kernel, using the version of Gen at the time of this writing:

branch_inv_kernel(trace) = Gen.involutive_mcmc(trace, q_branch_inv, (), h)[1]
random_walk_kernel(trace) = Gen.metropolis_hastings(trace, q_mh_walk, ())[1]
(y1, y2) = (1.0, 1.3)
constraints = Gen.choicemap((:y1, y1), (:y2, y2), (:z, false), (:m, 1.))
trace, = Gen.generate(branching_model, (), constraints)
for iter in 1:100
    trace = branch_inv_kernel(trace)
    trace = random_walk_kernel(trace)
end
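To estimate the posterior probability of each structural hypothesis, the loop above can record the value of z at each iteration; a sketch (the bookkeeping is ours):

using Statistics: mean

zs = Bool[]
for iter in 1:100
    trace = branch_inv_kernel(trace)
    trace = random_walk_kernel(trace)
    push!(zs, trace[:z])  # record the current structural hypothesis
end
println("estimated P(z = true | y1, y2) = ", mean(zs))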

Figure 3-13 compares the results of this algorithm, which uses the involutive MCMC kernel to switch branches, with the results of the algorithm that uses the Metropolis-Hastings kernel to switch branches. The added expressiveness afforded by involutive MCMC allows users to construct efficient reversible jump MCMC kernels that transform the state of random choices in the previous control-flow path into values of random choices in the new control-flow path, leading to higher acceptance rates. Proposals from the Metropolis-Hastings structure-changing kernel are rarely accepted.
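As a standalone sanity check of the merge_means/split_mean pair used above (plain Julia; the test values 1.2 and 0.9 are arbitrary), the round trip recovers the original means, which is exactly the invertibility property the trace transform relies on:

function merge_means(m1, m2)
    m = sqrt(m1 * m2)       # geometric mean
    dof = m1 / (m1 + m2)    # extra degree of freedom
    (m, dof)
end

function split_mean(m, dof)
    m1 = m * sqrt(dof / (1 - dof))
    m2 = m * sqrt((1 - dof) / dof)
    (m1, m2)
end

# Round trip: split_mean inverts merge_means (and vice versa).
(m, dof) = merge_means(1.2, 0.9)
@assert all(isapprox.(split_mean(m, dof), (1.2, 0.9)))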

(Figure: two panels, "Involutive MCMC (RJMCMC)" and "Metropolis-Hastings"; each plots the values of m, m1, and m2 (top) and of z ∈ {T, F} (bottom) against the number of kernel applications, 0–100.)

Figure 3-13: Involutive MCMC can express efficient structure-changing moves

(Figure: diagram of a split move taking a k=2 mixture to k=3, and the corresponding merge move taking k=3 back to k=2.)

Figure 3-14: Split-merge reversible jump MCMC in an infinite Gaussian mixture model

@gen function infinite_mixture(n::Int)
    k ∼ poisson_plus_one(1)
    means = [({(:mu, j)} ∼ normal(0, 10)) for j in 1:k]
    vars = [({(:var, j)} ∼ inv_gamma(1, 10)) for j in 1:k]
    weights ∼ dirichlet([2.0 for j in 1:k])
    for i in 1:n
        {(:x, i)} ∼ mixture_of_normals(weights, means, vars)
    end
end

Figure 3-15: Generative function for an infinite Gaussian mixture model
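A trace of this model can be sampled with Gen's simulate operation; for example (a usage sketch, where 100 is an arbitrary number of data points):

trace = Gen.simulate(infinite_mixture, (100,))
println("number of clusters: ", trace[:k])
println("mean of cluster 1: ", trace[(:mu, 1)])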

@gen function q_split_merge(trace)
    k = trace[:k]  # current number of clusters
    split = (k == 1) ? true : ({:split} ∼ bernoulli(0.5))
    if split
        cluster_to_split ∼ uniform_discrete(1, k)
        u1 ∼ beta(2, 2); u2 ∼ beta(2, 2); u3 ∼ beta(1, 1)
    else
        cluster_to_merge ∼ uniform_discrete(1, k-1)
    end
end

Figure 3-16: Auxiliary generative function for a split-merge reversible jump MCMC move

Example: Split-merge reversible jump MCMC in an infinite mixture model One common application of reversible jump MCMC is the construction of 'split-merge' kernels in infinite mixture models [102] (Figure 3-14). Figure 3-15 shows a generative function (풫 := infinite_mixture) that expresses an infinite univariate Gaussian mixture model, defined using Gen's dynamic modeling language. We construct an involutive MCMC kernel that implements a split-merge move from an auxiliary generative function (풬 := q_split_merge) defined using Gen's dynamic modeling language (Figure 3-16) and an involutive trace transform (ℎ, Figure 3-17) defined using Gen's trace transform language. The model generative function 풫 takes the number of data points as input, then samples the number of clusters from a Poisson distribution, then samples cluster parameters and mixture proportions, and finally samples the data points from the resulting finite mixture. The auxiliary generative function 풬 takes a trace of the model program as input, and randomly decides whether to split a cluster (increasing the number of clusters by one) or merge two clusters (decreasing the number of clusters by one). Then, the program randomly picks which cluster to split, or which clusters to merge.³ If a split is chosen, then the program also samples the three degrees of freedom necessary to generate the new parameters for the clusters in an invertible manner. Figure 3-18 shows graphically how the involutive trace transform defined in Figure 3-17 acts on pairs of model traces and auxiliary traces.

³This kernel always merges the last cluster with a random other cluster; for ergodicity, the kernel can be composed with a kernel (that is always accepted) that swaps a random cluster with the last cluster.

The yellow section (1) defines an involution 푔 on the discrete random choices, specifying (i) that the split choice should be flipped (so that split moves are always mapped to merge moves and vice versa), (ii) that the number of clusters should be increased by one for a split move and decreased by one for a merge move, and (iii) which merged cluster corresponds to which split clusters. The green section (2) specifies the continuous bijections that govern the transformation of continuous random choices during split moves, and the purple section (3) specifies the inverses of these bijections, which govern the transformation of continuous choices during merge moves.

@transform h_split_merge (model_in, aux_in) to (model_out, aux_out) begin
    k = @read(model_in[:k], :discrete)
    split = (k == 1) ? true : @read(aux_in[:split], :discrete)
    # (1) involution on the discrete random choices
    if split
        cluster_to_split = @read(aux_in[:cluster_to_split], :discrete)
        @write(model_out[:k], k+1, :discrete)
        @copy(aux_in[:cluster_to_split], aux_out[:cluster_to_merge])
        @write(aux_out[:split], false, :discrete)
    else
        cluster_to_merge = @read(aux_in[:cluster_to_merge], :discrete)
        @write(model_out[:k], k-1, :discrete)
        @copy(aux_in[:cluster_to_merge], aux_out[:cluster_to_split])
        if (k > 2)
            @write(aux_out[:split], true, :discrete)
        end
    end
    if split
        # (2) continuous bijection applied during split moves
        u1 = @read(aux_in[:u1], :cont)
        u2 = @read(aux_in[:u2], :cont)
        u3 = @read(aux_in[:u3], :cont)
        weights = @read(model_in[:weights], :cont)
        mu = @read(model_in[(:mu, cluster_to_split)], :cont)
        var = @read(model_in[(:var, cluster_to_split)], :cont)
        new_weights = split_weights(weights, cluster_to_split, u1, k)
        w1 = new_weights[cluster_to_split]; w2 = new_weights[k+1]
        (mu1, mu2, var1, var2) = split_params(mu, var, u2, u3, w1, w2)
        @write(model_out[:weights], new_weights, :cont)
        @write(model_out[(:mu, cluster_to_split)], mu1, :cont)
        @write(model_out[(:mu, k+1)], mu2, :cont)
        @write(model_out[(:var, cluster_to_split)], var1, :cont)
        @write(model_out[(:var, k+1)], var2, :cont)
    else
        # (3) inverse bijection applied during merge moves
        mu1 = @read(model_in[(:mu, cluster_to_merge)], :cont)
        mu2 = @read(model_in[(:mu, k)], :cont)
        var1 = @read(model_in[(:var, cluster_to_merge)], :cont)
        var2 = @read(model_in[(:var, k)], :cont)
        weights = @read(model_in[:weights], :cont)
        w1 = weights[cluster_to_merge]; w2 = weights[k]
        (new_weights, u1) = merge_weights(weights, cluster_to_merge, k)
        w = new_weights[cluster_to_merge]
        (mu, var, u2, u3) = merge_params(mu1, mu2, var1, var2, w1, w2, w)
        @write(model_out[:weights], new_weights, :cont)
        @write(model_out[(:mu, cluster_to_merge)], mu, :cont)
        @write(model_out[(:var, cluster_to_merge)], var, :cont)
        @write(aux_out[:u1], u1, :cont)
        @write(aux_out[:u2], u2, :cont)
        @write(aux_out[:u3], u3, :cont)
    end
end

Figure 3-17: Involution trace transform for a split-merge reversible jump MCMC move

(Figure: schematic of the involution applied to pairs of model traces (traces of p, holding (:x, i), (:mu, j), (:var, j), weights, and k) and auxiliary traces (traces of q, holding split, cluster_to_split or cluster_to_merge, and u1, u2, u3). Applying h to a k=2 model trace paired with a split proposal yields a k=3 model trace paired with a merge proposal, and applying h again returns the original pair.)

Figure 3-18: Schematic of the involution trace transform for a split-merge MCMC move

Schematic of the involution ℎ being applied twice in succession to a model trace and auxiliary proposal trace. The first application merges cluster 2 with the last cluster (cluster 3); the second application splits the resulting cluster, and returns the original pair of traces.


3.7.5 State-dependent mixture kernels and involutive MCMC

A common tactic for composing MCMC algorithms is to randomly select an MCMC kernel to apply [121]. As shown in Section 3.4.4, the mixture probabilities cancel out in the Metropolis-Hastings acceptance probability when they are independent of the state (as in, e.g., a 0.4 probability of applying one kernel and a 0.6 probability of applying another). But in general the mixture probabilities may depend on the state, and in this case they must be accounted for in the acceptance probability. Involutive MCMC allows users to express state-dependent mixture kernels using generative functions and trace transforms, and automates the computation of the acceptance probability for the entire composite mixture kernel, including the mixture probabilities. The pattern for this construction is:

1. The auxiliary generative function 풬 is partitioned into two segments, with choices 휎 = 휎1 ⊕ 휎2. The random choices 휎1 made in the first segment are all discrete, and represent the mixture distribution. Since 풬 takes the model trace t as input, this distribution depends on the current state of the model. For every possible 휎1, there is a set of random choices in the model for which new values will be proposed. The random choices 휎2 in the second segment, which may be discrete or continuous, are the proposed values for these choices in the model.

2. The involution ℎ swaps the previous values of the proposed-to random choices with their new values 휎2, by swapping data between the model trace and the auxiliary trace. The involution program also copies the random choices 휎1 that determined what subset of random choices to propose to from the input auxiliary trace to the output auxiliary trace.

Example: Mixture kernel for Bayesian inference of Gaussian process structure Figure 3-19 shows a generative function 풫 := gp_model, written in Gen's dynamic modeling language, that defines a generative model of univariate time series data based on a Gaussian process, with a prior on the covariance function of the Gaussian process that is based on a probabilistic context-free grammar. The prior posits a compositional and combinatorial space of covariance functions based on the set of parse trees of a context-free grammar. The generative function 풫 samples from the prior by invoking the generative function cov_function_prior, which is itself recursive. Therefore, traces of 풫 have a hierarchical address structure that mirrors the structure of the parse trees. One example trace is depicted in Figure 3-19b. We will construct an MCMC kernel that (i) randomly picks a subtree of the parse tree that should be modified, and (ii) randomly samples a new subtree to replace the previous subtree by sampling from cov_function_prior. The kernel is a mixture distribution over the set of all nodes in the parse tree, which depends on the current state. Related inference algorithms were previously studied in [113, 109], based on a model of Grosse et al. [54]. We express this kernel using an involutive MCMC construction that includes an auxiliary generative function 풬 := q_mixture (Figure 3-20) and the involution ℎ defined by h_mixture in Figure 3-21.

When the resulting involutive MCMC kernel is applied, the auxiliary generative function 풬 first picks a random node in the parse tree of the covariance function, by doing a stochastic walk of the existing parse tree that terminates at the chosen node:

prev_cov_function = trace[:cov_function]
path ∼ walk_tree(prev_cov_function, ..)

The code walks the tree using the following recursion, which results in a probability distribution that assigns exponentially lower probability to nodes that are deeper in the tree:

if ({:done} ∼ bernoulli(0.5))
    return path
elseif ({:recurse_left} ∼ bernoulli(0.5))
    path = (path..., :left_node)
    return ({:left} ∼ walk_tree(node.left, path))
else
    path = (path..., :right_node)
    return ({:right} ∼ walk_tree(node.right, path))
end
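This distribution can be computed exactly for a given tree. The following standalone Julia sketch (the Leaf/Internal types are our stand-ins for the parse-tree nodes, not Gen code) enumerates the probability that the walk stops at each node, treating leaves as always terminating, as walk_tree in Figure 3-20 does:

struct Leaf end
struct Internal
    left
    right
end

# Probability that the depth-biased walk terminates at each node: stop with
# probability 1/2 at an internal node (probability 1 at a leaf); otherwise
# recurse into the left or right child with probability 1/2 each.
function node_probs(node, prob=1.0, path=())
    node isa Leaf && return Any[(path, prob)]
    probs = Any[(path, prob / 2)]
    append!(probs, node_probs(node.left, prob / 4, (path..., :left_node)))
    append!(probs, node_probs(node.right, prob / 4, (path..., :right_node)))
    probs
end

tree = Internal(Internal(Leaf(), Leaf()), Leaf())
for (path, p) in node_probs(tree)
    println(path, " => ", p)  # 1/2 at the root; 1/4 and 1/8 one level down; ...
end
@assert sum(p for (_, p) in node_probs(tree)) ≈ 1.0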

The resulting distributions on selected nodes for two possible input trees are shown below:

(Figure: two example parse trees annotated with the node-selection probabilities of this walk; each root has probability 1/2, nodes one level down have probabilities such as 1/4 and 1/8, and deeper nodes have exponentially smaller probabilities, down to 1/256.)

Each node in the tree represents a different proposal that will be applied. The mixture distribution over possible nodes in the tree is state-dependent (and even the support of this mixture distribution is state-dependent, because the set of nodes in the tree can change from one state to the next). The rest of the auxiliary generative function 풬 proposes a new subtree by sampling from the same process used to recursively define the prior distribution:

new_subtree ∼ cov_function_prior()

The first part of the involution (Figure 3-21, Lines 4–5) swaps the newly proposed random choices for the subtree with the existing choices for that subtree in the model trace:

@copy(model_in[subtree_address], aux_out[:new_subtree])
@copy(aux_in[:new_subtree], model_out[subtree_address])

The second part of the involution (Line 6) copies the random choices made during this walk from the input auxiliary trace to the output auxiliary trace:

@copy(aux_in[:path], aux_out[:path])

Note that in these @copy expressions, an entire namespace of random choices is copied at once (e.g. all of the random choices under :path in aux_in are copied to :path in aux_out). Because the mixture distribution is specified using a probabilistic program, it is straightforward to modify the program to define a different mixture distribution. The code below specifies a mixture distribution that is uniform over all nodes in the tree:

n1 = size(node.left); n2 = size(node.right)
if ({:done} ∼ bernoulli(1 / (1 + n1 + n2)))
    return path
elseif ({:recurse_left} ∼ bernoulli(n1 / (n1 + n2)))
    path = (path..., :left_node)
    return ({:left} ∼ walk_tree(node.left, path))
else
    path = (path..., :right_node)
    return ({:right} ∼ walk_tree(node.right, path))
end

The resulting distributions, for two possible input trees, are:

(Figure: the same two example parse trees annotated with uniform node-selection probabilities; every node of the five-node tree has probability 1/5, and every node of the nine-node tree has probability 1/9.)

Note that the probability of choosing a given subtree to propose to is itself changed when the subtree changes. Therefore, the mixture probabilities do not in general cancel in the acceptance probability calculation, and must be accounted for. For the original mixture distribution, the ratio of mixture probabilities is either 1, 0.5, or 2, depending on whether the previous and new subtrees are leaf or internal nodes. For this alternative mixture distribution, the ratio of mixture probabilities is the ratio of the sizes of the two trees (e.g. 9/5 or 5/9 for the trees above). In both cases, involutive MCMC (Algorithm 12) automatically computes the acceptance probability.

@gen function gp_model(x_values::Vector)
    cov_function ∼ cov_function_prior()
    cov_matrix = compute_cov_matrix(cov_function, x_values)
    n = length(x_values)
    y_values ∼ mvnormal(zeros(n), cov_matrix .+ 0.01*I(n))
end

@gen function cov_function_prior()
    node_type ∼ categorical(production_rule_probs)
    if node_type == CONSTANT
        param ∼ uniform(0, 1)
        return ConstantNode(param)
    elseif node_type == LINEAR
        param ∼ uniform(0, 1)
        return LinearNode(param)
    elseif node_type == SQUARED_EXP
        length_scale ∼ uniform(0, 1)
        return SquaredExponentialNode(length_scale)
    elseif node_type == PERIODIC
        ..
    elseif node_type == PLUS
        left_node ∼ cov_function_prior()
        right_node ∼ cov_function_prior()
        return PlusNode(left_node, right_node)
    elseif node_type == TIMES
        left_node ∼ cov_function_prior()
        right_node ∼ cov_function_prior()
        return TimesNode(left_node, right_node)
    end
end

(a) Generative function encoding the generative model.

(b) A trace of the generative function in (a), encoding the covariance function $k(x, x') = 0.45 + e^{-(x-x')^2/0.13}\,(x - 0.08)(x' - 0.08)$ as a tree: a PLUS node whose children are a CONSTANT node (param 0.45) and a TIMES node, whose children are in turn a SQUARED_EXP node (length_scale 0.13) and a LINEAR node (param 0.08), with y_values = [1.1, 2.5, 3.7, ..]. (c) A sampled covariance function and data.

Figure 3-19: A Gaussian process model with a nonparametric prior on covariance functions

@gen function q_mixture(trace)
    prev_cov_function = trace[:cov_function]
    path ∼ walk_tree(prev_cov_function, (:cov_function,))
    new_subtree ∼ cov_function_prior()
    return path
end

@gen function walk_tree(node::Node, path)
    if isa(node, LeafNode)
        done ∼ bernoulli(1)
        return path
    elseif ({:done} ∼ bernoulli(0.5))
        return path
    elseif ({:recurse_left} ∼ bernoulli(0.5))
        path = (path..., :left_node)
        return ({:left} ∼ walk_tree(node.left, path))
    else
        path = (path..., :right_node)
        return ({:right} ∼ walk_tree(node.right, path))
    end
end

Figure 3-20: Auxiliary generative function for a state-dependent mixture MCMC kernel

1  @transform h_mixture (model_in, aux_in) to (model_out, aux_out) begin
2      path = @read(aux_in[], :discrete)
3      subtree_address = foldr(=>, path)
4      @copy(model_in[subtree_address], aux_out[:new_subtree])
5      @copy(aux_in[:new_subtree], model_out[subtree_address])
6      @copy(aux_in[:path], aux_out[:path])
7  end

(a) The trace transform encoded in Gen's trace transform language.

(Figure: schematic showing h applied twice to a model trace (holding y_values and the cov_function namespace) and an auxiliary trace (holding :new_subtree and :path). Line 4 moves the subtree at subtree_address from the model trace into :new_subtree of the output auxiliary trace, Line 5 moves :new_subtree of the input auxiliary trace into the model trace, and Line 6 copies :path; applying h a second time returns the original pair of traces.)

(b) Schematic of the trace transform.

Figure 3-21: Trace transform for a state-dependent mixture MCMC kernel

3.8 Related work

Composing Monte Carlo inference algorithms using probabilistic programs Some other probabilistic programming systems also support importance sampling using proposals expressed in probabilistic programming languages [13] and Hamiltonian Monte Carlo over selected sets of random choices [79]. However, to our knowledge no other probabilistic programming system supports a Metropolis-Hastings construct analogous to that of Section 3.4.2, particle filtering and annealed importance sampling constructs analogous to those of Section 3.5, an involutive MCMC construct analogous to that of Section 3.7, or general sequential Monte Carlo that bridges arbitrary pairs of models (Section 3.6).

Involutive MCMC The involutive MCMC construction was implemented [24] as part of the inference library of the Gen probabilistic programming system [32]. The construction was motivated in part by a desire for a simple interface that automated the implementation of reversible jump MCMC samplers [53], state-dependent mixtures of proposals on complex state spaces, and data-driven neural proposals. Gen's involutive MCMC construction has since been used by a number of researchers to design and implement MCMC algorithms in diverse domains, including computational biology [81] and artificial intelligence [133]. Neklyudov et al. [93] independently identified the involutive MCMC construction as a unifying framework for MCMC algorithms, and showed how more than a dozen classic and recent MCMC algorithms can be cast within this framework. Neklyudov et al. [93] also identified design principles for developing new MCMC algorithms using the involutive MCMC construction, and showed that the framework aids in the derivation of novel efficient MCMC algorithms. The involutive MCMC construction encompasses many existing classes of MCMC kernels, some of which explicitly make use of bijective or involutive deterministic maps. In particular, the reversible jump framework [53, 56] employs a family of continuously differentiable bijections between the parameter spaces of different models. Tierney [122] described a family of deterministic proposals based on a deterministic involution that is equivalent to involutive MCMC but without the auxiliary probability distribution. More recently, Spanbauer et al. [116] defined a class of deep generative models based on differentiable involutions and trained these models to serve as efficient proposal distributions on continuous state spaces; the resulting algorithm is an instance of involutive MCMC. Other probabilistic programming systems besides Gen support some custom reversible jump samplers: Roberts et al. [105] present a system embedded in Haskell that generates the implementation of some reversible jump MCMC kernels from a high-level specification, and Narayanan and Shan [90] give a technique that automatically computes Metropolis-Hastings acceptance probabilities in some settings; however, these approaches do not handle many kernels that can be handled by our involutive MCMC construct.

Sequential Monte Carlo in probabilistic programs While many probabilistic programming systems implement generic sequential Monte Carlo (SMC) schemes based on incremental extension of the model with additional state [130, 79, 40, 89], to our knowledge Gen is the only system that supports more general SMC schemes that involve doing inference in an arbitrary sequence of models, represented as probabilistic programs, that may use different representations. Gen's more flexible variant of SMC is based on a general template for SMC algorithms [33] that includes annealed importance sampling [91] as a special case. A more restricted version of Gen's support for inference across multiple models, which does not use an arbitrary bijection to map between states, was described by Cusumano-Towner et al. [27]. Stuhlmüller et al. [119] devised a sequential Monte Carlo scheme that coarsens the domains of random choices incrementally, implemented in a probabilistic programming language via a program transformation that generated a larger model including all other models as sub-models, which was then processed by the generic sequential Monte Carlo inference engine. However, that approach is much more restrictive: it only handles a limited class of models, does not inter-operate well with rejuvenation moves, and can only use sequences of the same model with coarsened domains for random choices.

Differentiable programming languages for transforming random choices Other systems support computing probability density functions of random variables that result from deterministic transformations of other random variables written in differentiable programming languages [34]. However, these systems do not support distributions on objects with stochastic structure like choice dictionaries. Incorporating performance optimizations from these systems into Gen's trace transform DSL is an interesting area for future work.

Checking validity of inference procedures Gen performs dynamic checks that can detect some violations of its assumptions, and these checks find many bugs in practice. However, Gen does not statically verify the asymptotic soundness of inference algorithms. Although Gen encourages the use of asymptotically sound inference algorithm templates, it does not enforce this—users are free to use Gen's primitives however they please. However, it is possible to statically verify the validity of proposals for the types of constructs used by Gen. A natural approach for reasoning about the validity of proposals is to introduce a type system that describes the support of the probability distributions on choice dictionaries. For example, Lew et al. [73] build on the inference programming constructs introduced by Gen, describe a restricted modeling language and type system for models and proposals, and show that automatic type checking and type inference can detect invalid proposals. Lee et al. [72] demonstrated verification of stochastic variational inference in probabilistic programs via static analysis. Adding type systems to Gen that express the 'shape' (i.e. the set of domains of choice dictionaries with nonzero density) and support of random choices (i.e. the address universes they are valid in) for generative functions, and providing users with optional automatic verification of soundness, is a promising area of future work. Atkinson et al. [7] introduce a language for hand-coded inference implementations and a compiler that checks the validity of these implementations and generates an optimized implementation. While their inference language is significantly different from that of Gen (which is based on explicit manipulation of traces), it is possible that ideas from their verification approach could be applied to analyze loops of MCMC kernels written with Gen.

Chapter 4

Encapsulating Inference Logic in Generative Functions and Traces

A system with a modular design is more easily changed than one with highly coupled components, because changes can be made to one module without requiring knowledge of how the other modules function. Modules with clearly defined interfaces are also more likely to be reusable across multiple applications, and more modular systems are more easily debugged and explained than less modular ones. How can we support modular probabilistic inference implementations? The abstract data types defined in Chapter 2 provide one type of modularity—they separate the low-level implementation details of inference algorithms from the high-level algorithm strategies described in Chapter 3. The designer of an inference algorithm does not need to understand how a trace data type is implemented in order to use traces, and the trace implementation for a model can be improved, resulting in speedups of any inference code that uses it, without requiring the inference code to change.

Many other probabilistic programming systems have focused on a more ambitious type of modularity in which modeling is completely decoupled from the inference algorithm using an 'inference engine' based on generic, built-in probabilistic inference algorithms. This is an appealing goal, but it can be ineffective because many inference problems require specialization to the model that is not readily automated. This chapter explores an intermediate level of modularity in inference implementations that is based on extending generative functions and their traces with built-in inference capabilities called internal proposals. These built-in inference capabilities have well-defined interfaces, so that users can use them without knowing how they are implemented, like the user of an inference engine. However, this design differs from the 'inference engine' architecture in three key respects: First, users are not restricted to using the built-in inference capabilities—they can freely combine their own custom inference logic with the built-in inference logic (e.g. by combining custom proposal distributions with the internal proposals). Second, because generative functions are composed from other generative functions, built-in inference capabilities can be used for only selected parts of the model. Third, the built-in capabilities themselves have flexibility. For example, with selection Metropolis-Hastings (Section 4.3), users can apply MCMC moves based on the internal proposal to arbitrary subsets of random choices of their choosing.

(Figure: five panels for a generative function with arguments 푥, encapsulated randomness 휔, and random choices at addresses a and b, depicting the model distribution 푝(휏; 푥) and the internal proposal distributions 푞(휐; 푥, {}), 푞(휐; 푥, {a ↦→ 푎}), 푞(휐; 푥, {b ↦→ 푏}), and 푞(휐; 푥, {a ↦→ 푎, b ↦→ 푏}), together with the densities ˚푝(휔; 푥, 휏) and ˚푞(휔; 푥, 휏) on encapsulated randomness.)

Figure 4-1: Schematic of internal proposal family for a simple generative function

4.1 Generative functions with internal proposals

This section extends the generative function and trace data types with a notion of internal proposal, which allows generative functions to encapsulate reusable inference logic. Generative functions with internal proposals can be used with the inference procedures of Chapter 3, but also serve as the basis for additional inference procedures described later in this chapter. Internal proposals enable reuse of inference logic—the internal proposal of a generative function can be used by any inference implementation for any model that invokes the generative function. The consumer of a generative function with an internal proposal does not need to know what the internal proposal is in order to write asymptotically exact inference procedures that use the proposal.

A generative function 풫 with an internal proposal is a tuple of five elements (풫 = (푋, 푌, 푝, 푓, 푞)), where 푞 is the internal proposal family. Briefly, 푞 is a family of probability densities on choice dictionaries that, like 푝, is parametrized by the arguments 푥. But unlike 푝, 푞 is also parametrized by a choice dictionary 휎 that represents a partial assignment to the random choices in a trace. The internal proposal family maps partial assignments 휎 to probability distributions that 'fill in' the rest of the random choices. Figure 4-1 diagrams these distributions for a generative function that makes two random choices at addresses a and b (휔 is encapsulated randomness and will be introduced in Section 4.5).

Definition 4.1.1 (Internal proposal). A generative function 풫 with an internal proposal is a tuple 풫 = (푋, 푌, 푝, 푓, 푞), where 푋, 푌, 푝, 푓 are as in Definition 2.1.16, and where for each (푥, 휎) such that 푥 ∈ 푋 and 푝¯(휎; 푥) > 0, $q(\cdot; x, \sigma) : \mathcal{T}^\star_{A \setminus A_\sigma} \to [0, \infty)$ is a probability density on $\mathcal{T}^\star_{A \setminus A_\sigma}$. In the discrete setting, this means that $\sum_{\upsilon \in \mathcal{T}^\star_{A \setminus A_\sigma}} q(\upsilon; x, \sigma) = 1$. In the general setting, this means that 푞(·; 푥, 휎) is a $\mu^\star_{A \setminus A_\sigma}$-measurable function such that $\int_{\mathcal{T}^\star_{A \setminus A_\sigma}} q(\upsilon; x, \sigma)\, \mu^\star_{A \setminus A_\sigma}(d\upsilon) = 1$. Finally, 푞 must satisfy 푞(휐; 푥, 휎) > 0 ⇐⇒ 푝(휐|휎; 푥) > 0 for all 푥 and 휎 where 푥 ∈ 푋 and 푝¯(휎; 푥) > 0.

Although the internal proposal is only defined for 푥 and 휎 such that 푝¯(휎; 푥) > 0, if 푝(·; 푥) is supportive (Definition 2.1.7) for all 푥 ∈ 푋, then 푞(·; 푥, 휎) is defined for all $(x, \sigma) \in X \times \mathcal{T}^\star_A$.

4.1.1 Extending the generate operation using the internal proposal

Recall the generate operation from Chapter 2.3, defined by 풫.generate(푥, 휎) := t = (풫, 푥, 휏) for a generative function 풫 = (푋, 푌, 푝, 푓), where 푥 ∈ 푋 and 휏 = 휎|퐵 ∈ supp(푝(·; 푥)) for some 퐵. A limitation of generate as defined earlier is that it may be hard to construct a choice dictionary 휎 satisfying the requirements. We now describe a generalization of the operation that allows it to accept essentially any choice dictionary (denoted 휎) and automatically construct a valid choice dictionary 휏 that agrees with 휎, using the internal proposal. Intuitively, the operation is able to 'fill in' the choices required to construct a trace that are not provided in the choice dictionary 휎. Furthermore, the operation returns a log importance weight, relating the internal proposal density to the model density 푝, that can be used in various inference algorithms.

Extended generate operation The new operation's input signature is 풫.generate(푥, 휎), where 풫 = (푋, 푌, 푝, 푓, 푞), 푥 ∈ 푋, and 푝¯(휎; 푥) > 0. It returns a pair (t, log 푤) where t = (풫, 푥, 휏), with 휏 and log 푤 defined as follows. First, 휐 ∼ 푞(·; 푥, 휎). Next, let 퐵 ⊆ 퐴휎 be such that 휏 := 휐 ⊕ (휎|퐵) ∈ supp(푝(·; 푥)). Such a 퐵 is guaranteed to exist because (i) 푞(휐; 푥, 휎) > 0 implies 푝(휐|휎; 푥) > 0, and (ii) by the definition of 푝(휐|휎; 푥) in Equation (2.8). The resulting dictionary 휏 is unique by Proposition 2.1.3 because 푝(·; 푥) is structured. Finally, let log 푤 := log (푝(휏; 푥)/푞(휐; 푥, 휎)).

Example: Consider the generative function 풫 = (푋, 푌, 푝, 푓, 푞) that takes no arguments and has return value nothing, but where 푝 and 푞(·; 푥, 휎) for 휎1 = {c ↦→ T} and 휎2 = {b ↦→ F} are defined below, along with the 휏 = 휐 ⊕ (휎|퐵) that results from each combination of 휎 and 휐. Note that defining 푞 for all 휎 would require many probability tables.

휐                     푞(휐; 푥, 휎1)    휏
{a ↦→ F}              0.4            {a ↦→ F, c ↦→ T}
{a ↦→ T, b ↦→ F}      0.3            {a ↦→ T, b ↦→ F, c ↦→ T}
{a ↦→ T, b ↦→ T}      0.3            {a ↦→ T, b ↦→ T, c ↦→ T}

휐                     푞(휐; 푥, 휎2)    휏
{a ↦→ F, c ↦→ F}      0.2            {a ↦→ F, c ↦→ F}
{a ↦→ F, c ↦→ T}      0.2            {a ↦→ F, c ↦→ T}
{a ↦→ T, c ↦→ F}      0.3            {a ↦→ T, b ↦→ F, c ↦→ F}
{a ↦→ T, c ↦→ T}      0.3            {a ↦→ T, b ↦→ F, c ↦→ T}

휏                              푝(휏)
{a ↦→ F, c ↦→ F}              0.45
{a ↦→ F, c ↦→ T}              0.05
{a ↦→ T, b ↦→ F, c ↦→ F}      0.05
{a ↦→ T, b ↦→ F, c ↦→ T}      0.2
{a ↦→ T, b ↦→ T, c ↦→ F}      0.125
{a ↦→ T, b ↦→ T, c ↦→ T}      0.125

Suppose we called 풫.generate(푥, 휎1) and sampled 휐 = {a ↦→ F} from 푞(·; 푥, 휎1). Then, 퐵 = {c} and 휏 = {a ↦→ F, c ↦→ T}, and log 푤 = log 푝(휏 ) − log 푞(휐; 푥, 휎1) = log 0.05 − log 0.4.
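This bookkeeping can be checked mechanically. The following standalone Julia sketch (our own encoding of the tables above as dictionaries, with T/F as Booleans) verifies that both tables are normalized and reproduces the log weight just computed:

# The density p and the internal proposal q(.; x, sigma1) from the tables above.
p = Dict(
    Dict(:a => false, :c => false)              => 0.45,
    Dict(:a => false, :c => true)               => 0.05,
    Dict(:a => true, :b => false, :c => false)  => 0.05,
    Dict(:a => true, :b => false, :c => true)   => 0.2,
    Dict(:a => true, :b => true, :c => false)   => 0.125,
    Dict(:a => true, :b => true, :c => true)    => 0.125,
)
q_sigma1 = Dict(  # internal proposal given sigma1 = {c -> T}
    Dict(:a => false)              => 0.4,
    Dict(:a => true, :b => false)  => 0.3,
    Dict(:a => true, :b => true)   => 0.3,
)
@assert sum(values(p)) ≈ 1.0 && sum(values(q_sigma1)) ≈ 1.0

# generate(x, sigma1) with upsilon = {a -> F}: tau = upsilon merged with sigma1 restricted to B = {c}.
upsilon = Dict(:a => false)
tau = merge(upsilon, Dict(:c => true))
log_w = log(p[tau]) - log(q_sigma1[upsilon])  # = log 0.05 - log 0.4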

4.1.2 The regenerate trace operation

Recall the update operation of the trace ADT from Chapter 2.3, t.update(푥′, 훿푋, 휎). This operation returned a new trace t′ with new choices 휏′ that are constructed from a combination of the previous choices 휏 and the newly provided choices 휎. This section describes an additional operation that can be used to obtain a modified trace t′ from an input trace t. But instead of taking the new values 휎 as an input, this new operation, which is called regenerate, samples new choices from the internal proposal distribution,

143 and the caller only needs to specify which addresses should have new values sampled, not what the values should be.

Regenerate operation This operation is only defined for traces t = (풫, 푥, 휏) with 풫 = (푋, 푌, 푝, 푓, 푞), where 푝(·; 푥) is supportive (Definition 2.1.7) for all 푥 ∈ 푋. The input signature of the operation is t.regenerate(푥′, 훿푋, 푆), where t = (풫, 푥, 휏). Like for the update operation, the first and second inputs are new arguments 푥′ ∈ 푋 and a change hint 훿푋 from 푥 to 푥′, respectively. The third argument 푆 ⊆ 퐴 is a set of addresses (퐴 is the set of all addresses in the address universe) called the selection. The operation returns a tuple (t′, log 푤, 훿푌) where t′ = (풫, 푥′, 휏′), with 휏′ and log 푤 defined as follows. Let $\sigma := \tau|_{S^c}$ (the dictionary resulting from removing all addresses in 푆 from 휏). The operation samples from the internal proposal: 휐 ∼ 푞(·; 푥′, 휎). Let 퐵 ⊆ 퐴휎 be such that $\tau' := \upsilon \oplus (\sigma|_B) \in \mathrm{supp}(p(\cdot; x'))$. Such a 퐵 is guaranteed to exist and the resulting 휏′ is guaranteed to be unique by the same argument used for the generate operation above. Let t′ := (풫, 푥′, 휏′). Let $\sigma' := \tau'|_{S^c}$ and let $\upsilon' := \tau|_{((A_\tau \cap A_{\tau'}) \setminus S)^c}$. Then, define log 푤 to be:

$$\log w := \log \frac{p(\tau'; x')\, q(\upsilon'; x, \sigma')}{p(\tau; x)\, q(\upsilon; x', \sigma)} \qquad (4.1)$$

Finally, set 훿푌 to a change hint from 푓(푥, 휏) to 푓(푥′, 휏′). The supportive requirement ensures that 푝¯(휎; 푥) > 0 for all 푥 and all 휎. In particular, it ensures that $\bar{p}(\tau|_{S^c}; x') > 0$ and $\bar{p}(\tau'|_{S^c}; x) > 0$ for all 푆, which ensures that the internal proposal is defined as necessary for all 푆. While the regenerate operation could be defined for generative functions where 푝(·; 푥) is not supportive for all 푥, the set of valid selections 푆 would be complicated to characterize.

Example: Consider the generative function 풫 = (푋, 푌, 푝, 푓, 푞) defined in the immediately preceding example. Consider an input trace t = (풫, 푥, 휏) where 휏 = {a ↦→ T, b ↦→ F, c ↦→ T}. Consider applying regenerate with selection 푆 = {a, b} (the inputs 푥 and 훿푋 do not matter here because 풫 takes no arguments). This choice of selection indicates that new values should be proposed using the internal proposal for a and b, and/or the addresses may be removed. Specifically, we have $\sigma := \tau|_{S^c}$ = {c ↦→ T}. The possible 휐 that could be sampled from 푞(·; 푥, {c ↦→ T}) are {푎 ↦→ F}, {푎 ↦→ T, 푏 ↦→ F}, and {푎 ↦→ T, 푏 ↦→ T}, with probabilities 0.4, 0.3, and 0.3 respectively. Suppose we sample 휐 = {푎 ↦→ F}. Then 휏′ := 휐 ⊕ (휎|퐵) = {a ↦→ F, c ↦→ T} where 퐵 = {c}. Then $\upsilon' := \tau|_{\{c\}^c}$ = {a ↦→ T, b ↦→ F}. The log weight is:

$$\log w := \log \frac{p(\tau'; x')\, q(\upsilon'; x, \sigma')}{p(\tau; x)\, q(\upsilon; x', \sigma)} = \log \frac{0.05 \cdot 0.3}{0.2 \cdot 0.4}$$

4.1.3 Example internal proposal families

There is a great deal of flexibility in the internal proposal family of a generative function. We now describe two classes of internal proposal families and how the ADT operations generate and regenerate behave for these classes.

Example: Forward sampling in Bayesian networks Suppose that 푋 and 푝 are such that 휏 ∈ supp(푝(·; 푥)) implies 퐴휏 = {푎1, . . . , 푎푛} for all 푥 ∈ 푋. Let 퐴1:푖 := {푎1, . . . , 푎푖} for each 푖. Suppose that for all 휏 ∈ supp(푝(·; 푥)):

$$p(\tau; x) = \prod_{i=1}^n p(\{a_i \mapsto \tau[a_i]\} \mid (\tau|_{A_{1:i-1}}); x)$$

Consider the internal proposal family defined by:

$$q(\upsilon; x, \sigma) := \prod_{i=1}^n \left( p(\{a_i \mapsto \upsilon[a_i]\} \mid (\sigma|_{A_{1:i-1}} \oplus \upsilon|_{A_{1:i-1}}); x) \right)^{[a_i \notin A_\sigma]}$$

for each 휐 such that 휐 ⊕ (휎|{푎1,...,푎푛}) ∈ supp(푝(·; 푥)) and zero otherwise. The log weight log 푤 returned by the extended generate operation is:

$$\log w = \sum_{i=1}^n [a_i \in A_\sigma]\, \log p(\{a_i \mapsto \sigma[a_i]\} \mid (\sigma|_{A_{1:i-1}} \oplus \upsilon|_{A_{1:i-1}}); x)$$

The log weight returned by regenerate is, due to cancellation of factors:

$$\log w = \sum_{i=1}^n [a_i \in A_\sigma] \left( \log p(\{a_i \mapsto \sigma[a_i]\} \mid (\sigma|_{A_{1:i-1}} \oplus \upsilon|_{A_{1:i-1}}); x) - \log p(\{a_i \mapsto \sigma[a_i]\} \mid (\sigma|_{A_{1:i-1}} \oplus \upsilon'|_{A_{1:i-1}}); x) \right)$$

Note that the behavior of generate and regenerate is different depending on the ordering of the addresses 푎1, . . . , 푎푛 used. A similar construction is possible for more general 푝 for which the set of addresses is not fixed. Chapter 5 discusses how proposal families based on forward sampling can be compiled for flexible probabilistic modeling languages that include stochastic control flow. Briefly, we sample from the internal proposal by executing the source code of the generative function, but we intercept each random choice expression and look up its address 푎 in 휎. If 푎 ̸∈ 퐴휎 then the value of the random choice is randomly sampled as in a regular execution; if 푎 ∈ 퐴휎 then the value of the random choice is deterministically set to 휎[푎].
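Here is a standalone sketch of this interception scheme for a tiny two-choice model (plain Julia with choice dictionaries as Dicts; the model and helper are ours, not Gen's compiled implementation):

# Toy model: a ~ bernoulli(0.3); b ~ bernoulli(a ? 0.9 : 0.1).
# generate_forward visits the choices in order: a constrained address is read
# from sigma and its log-density added to log_w; an unconstrained address is
# sampled as in a regular execution.
function generate_forward(sigma::Dict{Symbol,Bool})
    tau = Dict{Symbol,Bool}()
    log_w = 0.0
    function visit(addr, p_true)
        if haskey(sigma, addr)
            tau[addr] = sigma[addr]
            log_w += log(sigma[addr] ? p_true : 1 - p_true)
        else
            tau[addr] = rand() < p_true
        end
    end
    visit(:a, 0.3)
    visit(:b, tau[:a] ? 0.9 : 0.1)
    (tau, log_w)
end

(tau, log_w) = generate_forward(Dict(:b => true))  # log_w = log p(b = T | a)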

Example: The optimal internal proposal family For a given 푋 and 푝, the optimal internal proposal family 푞 is given by 푞(휐; 푥, 휎) := 푝(휐|휎; 푥) for all 휎 such that 푝¯(휎; 푥) > 0, all 푥 ∈ 푋, and all 휐. For a generative function with such a 푞, the log weight log 푤 returned by the extended generate operation is:

$$\log w = \log \frac{p(\upsilon \oplus (\sigma|_B); x)}{p(\upsilon|\sigma; x)} = \log \left( \bar{p}(\sigma; x)\, \frac{p(\upsilon \oplus (\sigma|_B); x)}{\sum_{C \subseteq A_\sigma} p(\upsilon \oplus (\sigma|_C); x)} \right) = \log \bar{p}(\sigma; x)$$

That is, the log weight is deterministic and gives the log marginal likelihood. For the regenerate operation, the log weight when using the optimal internal proposal family is:

$$\log w = \log \frac{p(\tau'; x')\, p(\upsilon'|\sigma'; x)}{p(\tau; x)\, p(\upsilon|\sigma; x')} = \log \frac{\bar{p}(\sigma; x')}{\bar{p}(\sigma'; x)} = \log \frac{\bar{p}(\tau|_{S^c}; x')}{\bar{p}(\tau'|_{S^c}; x)}$$

In the special case when $A_\tau \,\Delta\, A_{\tau'} \subseteq S$ and 푥 = 푥′, we have $\tau|_{S^c} = \tau'|_{S^c}$, so log 푤 = 0 and the move is always accepted (here Δ denotes the symmetric difference of sets). By further specializing to 퐴휏 = 퐴휏′, the regenerate operation with the optimal proposal family recovers Gibbs sampling [44]. Section 4.3 shows how regenerate can be used to construct MCMC kernels, including when 푞 is not the optimal proposal family. Chapter 5 discusses how optimal proposal families can be constructed automatically for generative functions specified in specialized probabilistic modeling languages.

4.2 Importance sampling with the internal proposal

Internal proposal families are a mechanism for inference logic to be encapsulated inside generative functions and traces. Internal proposal families, like the other aspects of the generative function and trace ADTs, are automatically generated from probabilistic modeling code by the modeling language compiler. Users can choose to employ the internal proposal for their model instead of writing a proposal generative function by hand. Many of the inference operations described in Chapter 3 have analogues that are based on internal proposals and require much less work from the user.

For example, consider the self-normalized importance sampling construct in Algorithm 3, for which the user writes a generative function 풬 to serve as a proposal distribution for a model 풫. Algorithm 13 gives an analogous procedure for self-normalized importance sampling that uses the internal proposal family instead and is much simpler, requiring the user to specify only the generative model (풫), the observed data (휌), and the number of samples (푛). Gen's inference library includes an implementation of this procedure, which is the simplest entry point for new users to Gen because of its minimal interface.

Algorithm 13 Self-normalized importance sampling using the internal proposal

procedure self-norm-importance-sampling-internal(풫, 휌, 푛)
    for 푖 ← 1 . . . 푛 do
        (t^(푖), log 푤̃^(푖)) ← 풫.generate(_, 휌)
    end for
    ((푤^(1), . . . , 푤^(푛)), log 푧̂) ← normalize(log 푤̃^(1), . . . , log 푤̃^(푛))
    return ({(t^(1), 푤^(1)), . . . , (t^(푛), 푤^(푛))}, log 푧̂)
end procedure
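Using Gen's generate operation, Algorithm 13 can be written directly. The following is a minimal sketch (the helper name and the inlined logsumexp are ours; Gen's inference library provides an equivalent built-in, as noted above):

# Self-normalized importance sampling using the internal proposal (Algorithm 13).
# `model` is a generative function, `args` its arguments, and `observations`
# a choice map holding the observed data rho.
function snis_internal(model, args, observations, n)
    results = [Gen.generate(model, args, observations) for _ in 1:n]
    log_w = [w for (_, w) in results]
    m = maximum(log_w)
    log_total = m + log(sum(exp.(log_w .- m)))  # logsumexp of the log weights
    weights = exp.(log_w .- log_total)          # normalized weights
    log_z_est = log_total - log(n)              # estimate of the log marginal likelihood
    ([(t, w) for ((t, _), w) in zip(results, weights)], log_z_est)
end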

It is also possible to adapt Algorithm 3 for use with generative functions 풫 that have internal proposal families. Algorithm 14 shows the modified procedure. In this procedure, if the external proposal 풬 does not sample all random choices visited in the model, the remaining choices will be sampled by the internal proposal, resulting in an importance sampling algorithm whose proposal is a combination of the internal proposal and the external proposal.

Algorithm 14 Self-normalized importance sampling with an internal and external proposal

procedure self-norm-importance-sampling-mixed(풫, 풬, 휌, 푛)
    for 푖 ← 1 . . . 푛 do
        s ← 풬.simulate()
        휎 ← s.choices() ⊕ 휌
        (t^(푖), log 푤̃^(푖)) ← 풫.generate(_, 휎)
        log 푤̃^(푖) ← log 푤̃^(푖) − s.logpdf()
    end for
    ((푤^(1), . . . , 푤^(푛)), log 푧̂) ← normalize(log 푤̃^(1), . . . , log 푤̃^(푛))
    return ({(t^(1), 푤^(1)), . . . , (t^(푛), 푤^(푛))}, log 푧̂)
end procedure

Such an external proposal would result in an error if 풫 were not equipped with an internal proposal, because intuitively generate would not be able to fill in the choices that were not provided by the external proposal. Therefore, the presence of internal proposals greatly relaxes the constraints on inference programmers—they are able to incrementally increase the complexity and efficiency of their inference algorithms by gradually increasing the purview of their external proposals. This can be understood as interpolating between the approach of systems like Church [51], Venture [79], Anglican [130], and Turing [40], which are based on default proposals, and the inference programming approach of Chapter 3, which required external proposals to sample all relevant random choices in the model in the context of importance sampling and particle filtering.

4.3 Selection Metropolis-Hastings

The Markov chain Monte Carlo (MCMC) kernel constructs that were described in Chapter 3 are flexible enough to permit a wide class of custom Metropolis-Hastings and reversible jump MCMC kernels, including custom trans-dimensional moves, while automating the sampling and acceptance probability calculation. However, these constructs require the user to understand the semantics of the model and, in some cases, write significant code specifying the kernel. Selection Metropolis-Hastings (selection MH) is a novel construct for MCMC kernels that allows the user to only select the set of random choices that should be updated, in the form of an address selection 퐵 ⊆ 퐴, where 퐴 is the set of all addresses. The construct uses the internal proposal family of the model instead of requiring the user to specify a custom external proposal. Selection MH kernels can be applied in cycles or mixtures with different selections, or with other kernel types. With selection MH, the user maintains significant control over the inference algorithm—the user chooses how the random choices should be partitioned for updates. Using the new regenerate operation, the procedure for selection MH is quite simple, and is shown in Algorithm 15. Note that it is simply a wrapper around a call to regenerate that accepts or rejects the resulting trace according to the log weight log 푤.

147 Algorithm 15 Selection Metropolis-Hastings procedure selection-mh-kernel퐵(t) (t′, log 푤, _) ← t.regenerate(t.args(), ⊤, 퐵) 푟 ∼ Uniform(0, 1) if 푟 ≤ log 푤 then return t′ else return t end procedure argument to regenerate passes the arguments from the input trace (푥′ = 푥), and the second argument is the change hint 훿푋 = ⊤ that indicates that the new arguments are the same as the previous arguments. The third argument indicates the set of addresses for which new values should be sampled. Note that regenerate, and this algorithm, only support models 풫 whose densities are supportive (Definition 2.1.7). For models expressed in DML, it is possible construct a model with a supportive density by guaranteeing that, for each address sampled in any execution, the support of the distribution at that distribution is constant across all executions (see Chapter 5 for details).

Selection MH in Gen’s inference library This construct is provided in Gen’s inference library as a variant of the Gen.mh function:

(new_trace, _) = Gen.mh(trace, selection)

Note that this construct requires very little code to use, because it leverages the encapsulated inference capabilities of the internal proposal. Gen includes various ways of constructing address selections, including selecting individual addresses, selecting whole namespaces, and set operations on selections. For example, to construct the selection 퐵 = {foo, bar}, we use Gen.select(:foo, :bar), and to select all addresses except for foo and bar we use Gen.complement(Gen.select(:foo, :bar)).

Example: Selection MH and stochastic control flow Consider the (anonymous) model generative function 풫 defined below, which has stochastic structure.

@gen function ()
    val = true
    if ({:a} ∼ bernoulli(0.3))
        val = ({:b} ∼ bernoulli(0.6)) && val
    end
    p = val ? 0.9 : 0.2
    c ∼ bernoulli(p)
end

The possible choice dictionaries 휏 and their probabilities 푝(휏) are:

휏                                      푝(휏)
{푎 ↦→ T, 푏 ↦→ T, 푐 ↦→ T}              0.3 · 0.6 · 0.9
{푎 ↦→ T, 푏 ↦→ T, 푐 ↦→ F}              0.3 · 0.6 · 0.1
{푎 ↦→ T, 푏 ↦→ F, 푐 ↦→ T}              0.3 · 0.4 · 0.2
{푎 ↦→ T, 푏 ↦→ F, 푐 ↦→ F}              0.3 · 0.4 · 0.8
{푎 ↦→ F, 푐 ↦→ T}                       0.7 · 0.9
{푎 ↦→ F, 푐 ↦→ F}                       0.7 · 0.1

Suppose the internal proposal uses forward sampling, and that the previous trace contains 휏 = {푎 ↦→ T, 푏 ↦→ T, 푐 ↦→ T}. Suppose we apply selection-mh with the selection 퐵 = {푎}. Then, 휎 is the restriction of 휏 to the complement of {푎}, giving 휎 = {푏 ↦→ T, 푐 ↦→ T}. The possible samples from the internal proposal for this 휎 are listed below, along with the resulting 휏′ and the acceptance probability:

휐            푞(휐; 푥, 휎)    휏′                                    휎′                       Acceptance probability
{푎 ↦→ T}    0.3            {푎 ↦→ T, 푏 ↦→ T, 푐 ↦→ T}            {푏 ↦→ T, 푐 ↦→ T}        1
{푎 ↦→ F}    0.7            {푎 ↦→ F, 푐 ↦→ T}                     {푐 ↦→ T}                 0.6

Note that it is possible to use selection MH with an arbitrary set of selected addresses, including addresses that may not exist in the current trace.
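To illustrate, the following sketch (hypothetical usage, assuming the anonymous model above is bound to a Julia variable model) selects :b unconditionally, even though traces with 푎 = F contain no choice at that address:

# Hedged sketch: `model` binds the anonymous generative function above.
trace, _ = Gen.generate(model, ())
for iter in 1:100
    trace, _ = Gen.mh(trace, Gen.select(:a, :b))
end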

Example: Selection MH in a Dirichlet process mixture model Consider modeling a collection of 푛 real-valued data points 푦푖 ∈ R for 푖 ∈ {1, . . . , 푛} using a mixture model in which there are 푘 mixture components. For each 푖, the variable 푧푖 ∈ {1, . . . , 푘} gives the component to which data point 푖 is assigned. Each mixture component 푗 for 푗 ∈ {1, . . . , 푘} has a mean parameter 휇푗 ∈ R with a prior distribution that is a normal distribution with large variance. All data points 푦푖 for which 푧푖 = 푗 are modeled as independently and identically distributed according to a normal distribution with mean 휇푗 and variance 휎². The generative function 풫1 = mixture_data below implements this model. It takes as input the cluster assignments (the vector [푧1, 푧2, . . . , 푧푛]) and the variance 휎², and proceeds to sample the cluster mean 휇푗 at address (mu, 푗) for each 푗 ∈ {1, . . . , 푘}. Next it samples each data point 푦푖 at address (y, 푖) for 푖 ∈ {1, . . . , 푛} from the respective normal distribution.

@gen function mixture_data(z, var)
    n = length(z)
    k = maximum(z)

    # sample the cluster parameters from the prior
    cluster_means = Dict()
    for j in 1:k
        cluster_means[j] = ({(:mu, j)} ∼ normal(0.0, 10.0))
    end

    # sample the data points
    for i in 1:n
        {(:y, i)} ∼ normal(cluster_means[z[i]], sqrt(var))
    end
end

The generative function 풫2 = dpmm below defines a Dirichlet process mixture model, using mixture_data as a subroutine. The code takes as arguments the number of data points (푛) and the alpha parameter for the Chinese restaurant process [3] (CRP) prior on the cluster assignments. The code then loops through the data points, and samples the cluster assignments (z) from the CRP prior using a parametrization based on a random choice (new_cluster, 푖) for each data point. If the data point does not form a new cluster, then it samples the index of a ‘friend’, which is one of the previous data points, and joins the cluster assignment of the friend. This parametrization of the CRP prior was designed so that the density is supportive—the support at each (friend, 푖) address is always {1, . . . , 푖 − 1}. Note that the number of clusters is random, and can vary between 1 and 푛. The variance

(var) is shared for all clusters, and has an inverse-gamma prior distribution. Given the cluster assignments and the variance, we invoke mixture_data from above, which samples parameters for each cluster, and then samples each data point. The function then returns the number of clusters 푘 so that this derived value is easily accessible from inference code with t.retval() and does not need to be recomputed from the values of the random choices.

@gen function dpmm(n, alpha)
    z = Dict()
    k = 0
    for i in 1:n
        prob_new_cluster = alpha / (i - 1 + alpha)
        if ({(:new_cluster, i)} ∼ bernoulli(prob_new_cluster))
            z[i] = k + 1
            k += 1
        else
            friend = ({(:friend, i)} ∼ uniform_discrete(1, i-1))
            z[i] = z[friend]
        end
    end
    var ∼ inv_gamma(1, 1)
    {:data} ∼ mixture_data(z, var)
    return k
end

Suppose we are given observed data of the form 휌 = {(y, 1) ↦→ 푦1, . . . , (y, 푛) ↦→ 푦푛}. Consider the following composite MCMC kernel, constructed using the composite kernel DSL of Section 3.4.4 from primitive MCMC kernels that are each constructed using selection-mh:

@kern function dpmm_kernel(trace)
    for rep in 1:5
        trace ∼ Gen.mh(trace, Gen.select(:var))
    end
    let n = Gen.get_args(trace)[1]
        for i in 1:n
            let k = Gen.get_retval(trace)
                let selection = Gen.select(
                        [:data => (:mu, j) for j in 1:k]...,
                        (:new_cluster, i), (:friend, i))
                    trace ∼ Gen.mh(trace, selection)
                end
            end
        end
    end
end

Each application of this kernel begins by applying selection MH five times in a row to update just the variance random choice. Then, it loops through all the data points, and applies selection MH where the selection is a block of random choices that includes the cluster assignment choices for one data point, as well as the per-cluster parameters (in this example, just the cluster means) for all clusters. Using the current version of Gen, we apply this kernel in the code below, which reads the data from a variable y (definition not shown).

observations = Gen.choicemap()
for i in 1:length(y)
    observations[:data => (:y, i)] = y[i]
end
trace, = Gen.generate(dpmm, (length(y), 1.0), observations)
for iter in 1:1000
    trace, = dpmm_kernel(trace)
end

The inference program begins by initializing a trace that contains the observed data, and then repeatedly applies the composite kernel. Note that which MCMC kernel is being applied by this code depends on the internal proposal used by dpmm. Since dpmm is written in DML, it uses forward sampling for its internal proposal by default. This means that the selection MH kernels for the variance will be independence MH moves that use the prior distribution as the proposal. However, note that dpmm invokes the separate generative function mixture_data to sample the cluster parameters and the data, and the internal proposal of that generative function will be invoked as part of the internal proposal of dpmm. If mixture_data is implemented in DML as above, then it will also use forward sampling for its internal proposal, but mixture_data could also have been implemented in a different modeling language, or with a custom ADT implementation that uses a different internal proposal family. The next section shows how to implement a version of mixture_data that uses a more efficient internal proposal family.

4.4 A combinator for overriding the internal proposal

In Chapter 5 we will discuss how internal proposal families can be implemented as part of modeling language compilers, which generate the generative function and trace abstract data types (ADTs). However, it is also possible to override the internal proposal family of a generative function 풫 after it has already been constructed, using another generative function 풬 to define the internal proposal family. This section defines a combinator that implements the generative function and trace ADTs for a new generative function ℛ using generative functions and traces of the original generative function 풫 and the generative function 풬. The resulting generative function ℛ has the same probability distribution on choice dictionaries as 풫 but a different internal proposal family. This combinator is intended to be used to optimize the inference algorithm implementation for a model, by replacing a generic internal proposal family that is automatically generated by a compiler with a more efficient specialized internal proposal family.

Consider a generative function 풫 = (푋1, 푌1, 푝, 푓, 푞̃) where 푞̃ is the original internal proposal family. Suppose there exists another generative function 풬 = (푋2, 푌2, 푞, 푓2, 푟) where 푋2 = 푋1 × 풯퐴^⋆ and for each 푥 ∈ 푋1 and 휎 ∈ 풯퐴^⋆, 푞(휐; 푥, 휎) > 0 if and only if 푝(휐|휎) > 0. Consider the generative function ℛ := (푋1, 푌1, 푝, 푓, 푞). Algorithm 16 shows how to implement each of the operations for the generative function and trace ADTs, using only operations of the generative function and trace ADTs of 풫 and 풬.

Algorithm 16 Combinator for overriding the internal proposal

procedure ℛ.simulate(푥)
    t ← 풫.simulate(푥)
    u ← (ℛ, 푥, t.choices())
    return u
end procedure

procedure ℛ.generate(푥, 휎)
    s ← 풬.simulate((푥, 휎))
    휐 ← s.choices()
    (t, _) ← 풫.generate(푥, 휐 ⊕ 휎)
    log 푤 ← t.logpdf() − s.logpdf()
    u ← (ℛ, 푥, t.choices())
    return (u, log 푤)
end procedure

Require: u = (ℛ, 푥, 휏)

procedure u.regenerate(푥′, 훿푋, 푆)
    휏 ← u.choices()
    휎 ← 휏|푆^c
    (u′, log 푤1) ← ℛ.generate(푥′, 휎)
    휎′ ← u′.choices()|푆^c
    s ← 풬.generate((푥′, 휎′), 휏)
    log 푤2 ← u.logpdf() − s.logpdf()
    return (u′, log 푤1 − log 푤2, ⊥)
end procedure

procedure u.choices()
    return 휏
end procedure

procedure u.args()
    return 푥
end procedure

procedure u.logpdf()
    t ← (풫, 푥, 휏)
    return t.logpdf()
end procedure

procedure u.retval()
    t ← (풫, 푥, 휏)
    return t.retval()
end procedure

Example: Rao-Blackwellizing mixture model parameters The generative function mixture_data above was implemented using Gen’s Dynamic Modeling Language (DML). At the time of writing, Gen’s DML compiler constructs an internal proposal based on forward sampling. In particular, given only the vector of data points 휎 = {(y, 1) ↦→ 푦1,..., (y, 푛) ↦→ 푦푛}, the internal proposal distribution samples the cluster means from their prior distribution:

푞(휐; 푥, 휎) = ∏_{푗=1}^{푘(푥)} 푝norm(0,10)(휐[(mu, 푗)])

Here, 푥 is the tuple ([푧1, . . . , 푧푛], 휎²) and 푘(푥) extracts the number of clusters (the maximum of the cluster assignment vector). Note that this distribution does not depend on the values in 휎. This is an inefficient proposal distribution because the prior distribution on the parameters is very different from the posterior for typical values of 휎² and 푛. Because the normal distribution is the conjugate prior for a normal likelihood, the optimal internal proposal family for the density 푝 of this generative function can be analytically derived, and is:

푞(휐; 푥, 휎) := ∏_{푗=1}^{푘(푥)} 푝norm(푚푗(푥,휎), √푣푗(푥,휎))(휐[(mu, 푗)])

푣푗(푥, 휎) := ( ∑_{푖=1}^{푛} [푧푖 = 푗]/휎² + 1/10² )^{−1}        푚푗(푥, 휎) := 푣푗(푥, 휎) ( 0/10² + ∑_{푖=1}^{푛} [푧푖 = 푗] 휎[(y, 푖)]/휎² )

when 퐴휐 = {(mu, 1), . . . , (mu, 푘(푥))} and zero otherwise. While we could aim to improve Gen's DML compiler so that it can automatically derive and generate this optimal internal proposal family, improving modeling language compilers is an ongoing (and promising) research challenge, and it is likely that there remain large classes of models that are difficult to automatically analyze. Therefore, a user of Gen can use the combinator defined in Algorithm 16 to override the internal proposal of mixture_data with the optimal internal proposal expressed in DML. The combinator is provided in Gen. The generative function 풬 = mixture_proposal below defines the optimal proposal family described above.

@gen function mixture_proposal(z, var, constraints)
    n = length(z)
    k = maximum(z)

    # compute sufficient statistics for each cluster
    sums = [0.0 for j in 1:k]
    counts = [0.0 for j in 1:k]
    for i in 1:n
        if Gen.has_value(constraints, (:y, i))
            sums[z[i]] += constraints[(:y, i)]
            counts[z[i]] += 1
        end
    end

    # sample cluster parameters from the conditional distribution
    cluster_means = Dict()
    for j in 1:k
        if Gen.has_value(constraints, (:mu, j))
            cluster_means[j] = constraints[(:mu, j)]
        else
            cond_mu, cond_sigma = cond_params(var, sums[j], counts[j])
            cluster_means[j] = ({(:mu, j)} ∼ normal(cond_mu, cond_sigma))
        end
    end

    # sample the data points that were not included in constraints
    for i in 1:n
        if !Gen.has_value(constraints, (:y, i))
            {(:y, i)} ∼ normal(cluster_means[z[i]], sqrt(var))
        end
    end
end
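The helper cond_params is not shown in the excerpt above; the following is a minimal sketch (hypothetical; not part of Gen) consistent with the conjugate posterior equations for 푚푗 and 푣푗, assuming the normal(0.0, 10.0) prior on cluster means used in mixture_data:

# Hedged sketch of a `cond_params` helper (hypothetical name).
function cond_params(var, sum_y, count)
    prior_var = 10.0^2                                    # prior is normal(0, 10)
    cond_var = inv(count / var + 1.0 / prior_var)         # v_j(x, σ)
    cond_mu = cond_var * (0.0 / prior_var + sum_y / var)  # m_j(x, σ)
    return (cond_mu, sqrt(cond_var))  # Gen's normal takes a standard deviation
end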

The arguments to this generative function consist of (i) the arguments to the original generative function (z and var, which encode [푧1, . . . , 푧푛] and 휎² respectively) and (ii) a choice dictionary constraints (which encodes 휎). Note that to represent a valid internal proposal family, the code must be capable of handling any possible 휎. Therefore, it checks whether addresses are present in 휎 using Gen.has_value, and changes its behavior accordingly—if only a subset of the data points (with addresses (y, 푖)) for a given cluster are provided in 휎, then the sufficient statistics used to compute the conditional distribution on the cluster parameter only accumulate over the subset that is available, and the data points that are not provided in 휎 are sampled at the end, conditioned on the cluster parameters. Similarly, some subset of the cluster means (with addresses (mu, 푗)) may be provided, in which case a value is not sampled. For cluster means that are not provided, the parameters of the conditional distribution (푚푗 and 푣푗 in the equation) are computed by cond_params and the cluster mean is sampled. To construct the new generative function ℛ from the original generative function 풫 = mixture_data and the proposal generative function 풬 = mixture_proposal, we invoke the following Julia function, which returns a new generative function whose ADTs are implemented using Algorithm 16.

mixture_data = Gen.override_internal_proposal(mixture_data, mixture_proposal)

We call this function a combinator because it combines two generative functions and returns a new generative function. We will see other examples of generative function combinators in Chapter 5. After running this code, the new generative function mixture_data has the same probability distribution 푝 as the original generative function but it uses the optimal internal proposal family. Now, in the MCMC algorithm for dpmm in the previous section, the selection MH kernels that propose new values for both the cluster assignments and the cluster parameters use a combination of forward sampling (for the new cluster assignment) and conditional sampling (for the cluster parameters). The resulting kernel can be understood as an MH kernel that uses a naive proposal for the cluster assignment, but marginalizes out the cluster parameters when computing the acceptance probability.

Encapsulation and modularity via overriding internal proposals Note that a user could have written an efficient proposal distribution that exploited conjugacy as a generative function, and used it with the original model with the inference constructs from Chapter 3, which do not require internal proposal families. However, after replacing the internal proposal in the example above, any model generative function that invokes mixture_data will now make use of its more efficient internal proposal distribution for the random choices made by mixture_data. Furthermore, any inference algorithm that uses generate or regenerate will function without a code change, and will benefit automatically from the efficiency gain due to this change, because the new capability has been encapsulated. Composing reusable components of generative models (generative functions) and encapsulating custom inference logic (internal proposal families) inside these components is a promising approach to achieving more modularity in inference algorithm implementations.

154 4.5 Encapsulated randomness

The internal proposal family presented above allows us to encapsulate proposal distributions within generative functions and traces, but the resulting random choices are always included in 휏 and accessible via the data type operations. This section extends generative functions and traces with the ability to encapsulate random choices themselves, which makes these data types significantly more flexible. When we have a sampler for a probability distribution, together with the ability to evaluate its probability mass (or density) pointwise, we can use the distribution in a variety of ways, including in priors, likelihoods, proposals, and variational approximations. This is why software libraries for probability distributions typically provide routines for pointwise evaluation of the probability. However, evaluating a probability distribution pointwise is often intractable when sampling from it is not. Probabilistic generative models provide good examples of this—sampling from a generative model's prior distribution on data is trivial, whereas computing the marginal probability of some observed data is the difficult (and typically intractable) problem of evaluating the marginal likelihood. The generative function and trace data types we have introduced so far require that log 푝(휏 ; 푥) can be evaluated. This section extends the data types by allowing them to use estimates of log 푝(휏 ; 푥) in place of log 푝(휏 ; 푥). The estimates are obtained using encapsulated randomness 휔 ∈ Ω that is used internally by the data type operations but is not part of the choice dictionary 휏 and cannot be read from the trace directly. Using auxiliary variable and pseudo-marginal Monte Carlo arguments, it is possible to show that the resulting generative functions can be used within inference algorithms in the same ways that generative functions that compute log 푝(휏 ; 푥) exactly can be used, without invalidating the asymptotic properties of the algorithms. This includes as components in generative models and as components in proposal distributions. The construction given in this section allows Gen users to express, in a modular and compositional way, a wide array of pseudo-marginal Monte Carlo inference algorithms.

Definition 4.5.1 (Encapsulated randomness). A generative function 풫 with encapsulated randomness is a tuple 풫 = (푋, 푌, 푝, 푓, 푞, Ω, ˚푝, ˚푞), where (푋, 푌, 푝, 푓, 푞) is a generative function with an internal proposal family¹, and ˚푝 and ˚푞 are families of probability densities on 휔 ∈ Ω (with respect to some 휎-finite reference measure 휈 on Ω) such that ˚푝(휔; 푥, 휏) > 0 if and only if ˚푞(휔; 푥, 휏) > 0, for all 푥, 휏 such that 푝(휏; 푥) > 0. A trace t of such a generative function is a tuple (풫, 푥, 휏, 휔) where (풫, 푥, 휏) are as defined earlier, and where 휔 ∈ Ω satisfies ˚푝(휔; 푥, 휏) > 0.

Generative functions with encapsulated randomness behave like generative functions without encapsulated randomness, except that in each of their operations, the density on choice dictionaries 푝(휏 ; 푥) is replaced with an estimate 휉(푥, 휏 , 휔) where 휔 ∈ Ω is a value of the encapsulated randomness.

Definition 4.5.2 (Choice density estimate). For a generative function 풫 with encapsulated randomness, the choice density estimate is a function 휉 of 푥, 휏, and 휔, defined for 푥, 휏, 휔 such that 푝(휏; 푥) > 0 and ˚푝(휔; 푥, 휏) > 0, and given by:

휉(푥, 휏, 휔) := 푝(휏; 푥) ˚푝(휔; 푥, 휏) / ˚푞(휔; 푥, 휏)    (4.2)

¹Except that here, 푓 is a function of 푥, 휏, and 휔 for 휔 ∈ Ω instead of just 푥 and 휏.

Example: Encapsulated randomness is a generalization A generative function without encapsulated randomness is equivalent to a generative function with encapsulated randomness that has a trivial set of possible values for encapsulated randomness Ω = {⊥} and ˚푝(⊥; 푥, 휏 ) = ˚푞(⊥; 푥, 휏 ) = 1.

Encapsulated randomness and unbiased estimators We can gain some intuition about Ω, ˚푝 and ˚푞 by interpreting them as components of estimators of the choice density 푝(휏 ; 푥) that satisfy certain properties. In particular, note that sampling the encapsulated randomness 휔 from ˚푞 and evaluating the choice density estimate gives an unbiased estimate of the choice density 푝(휏 ; 푥):

E_{휔∼˚푞(·;푥,휏)}[휉(푥, 휏, 휔)] = 푝(휏; 푥)    (4.3)

Similarly, sampling the encapsulated randomness 휔 from ˚푝 and evaluating the reciprocal of the choice density estimate gives an unbiased estimate of the reciprocal of the choice density 푝(휏; 푥):

E_{휔∼˚푝(·;푥,휏)}[1/휉(푥, 휏, 휔)] = 1/푝(휏; 푥)    (4.4)

Indeed, any non-negative and unbiased estimator of 푝(휏; 푥) can be used as the basis of an encapsulated randomness construction. The inputs to the estimator are 푥 and 휏. The internal randomness used in the estimator is 휔, ˚푞(·; 푥, 휏) is the distribution on this randomness, and 휉(푥, 휏, 휔) ≥ 0 is the estimate returned for randomness 휔. Then, the density ˚푝 in the encapsulated randomness construction is:

˚푝(휔; 푥, 휏) := 휉(푥, 휏, 휔) ˚푞(휔; 푥, 휏) / 푝(휏; 푥)    (4.5)

which is a normalized probability density because

∫_Ω ˚푝(휔; 푥, 휏) 휈(푑휔) = ∫_Ω 휉(푥, 휏, 휔) ˚푞(휔; 푥, 휏) / 푝(휏; 푥) 휈(푑휔) = 푝(휏; 푥)/푝(휏; 푥) = 1    (4.6)

4.5.1 Extending the data type operations with encapsulated randomness

We now define the data type operations for generative functions with encapsulated randomness.

Simulate Sample 휏 ∼ 푝(·; 푥) and 휔 ∼ ˚푝(·; 푥, 휏). Return the trace t := (풫, 푥, 휏, 휔).

Generate Sample 휐 ∼ 푞(·; 푥, 휎), set 휏 := 휐 ⊕ (휎|퐵) for some 퐵 such that 푝(휏; 푥) > 0, then sample 휔 ∼ ˚푞(·; 푥, 휏). Return the trace t := (풫, 푥, 휏, 휔) and log 푤 := log(휉(푥, 휏, 휔)/푞(휐; 푥, 휎)).

156 Logpdf Given trace t = (풫, 푥, 휏 , 휔), return log 휉(푥, 휏 , 휔).

Update Given t = (풫, 푥, 휏, 휔), compute (휏′, 휎′) = ℎupdate(휏, 휎) as before, then sample 휔′ ∼ ˚푞(·; 푥′, 휏′). Return a new trace t′ = (풫, 푥′, 휏′, 휔′) and the log weight log 푤 := log(휉(푥′, 휏′, 휔′)/휉(푥, 휏, 휔)). The other return values are the same as previously.

Regenerate Given t = (풫, 푥, 휏, 휔). As before, let 휎 := 휏|푆^c and sample 휐 ∼ 푞(·; 푥′, 휎); let 휏′ := 휐 ⊕ (휎|퐵) for some 퐵 such that 푝(휏′; 푥′) > 0, and let 휎′ := 휏′|푆^c. But also sample 휔′ ∼ ˚푞(·; 푥′, 휏′). Return a new trace t′ = (풫, 푥′, 휏′, 휔′) and the log weight

log 푤 := log [휉(푥′, 휏′, 휔′) 푞(휐′; 푥, 휎′)] / [휉(푥, 휏, 휔) 푞(휐; 푥′, 휎)]    (4.7)

The other return values are the same as previously.

Example: A mixture of normal distributions Consider a generative function that makes a single random choice at address 푎, which is distributed according to a mixture of a countably infinite collection of normal distributions with means 휇푖, standard deviations 휎푖, and proportions 휋푖 such that ∑_{푖=1}^{∞} 휋푖 = 1:

푝(휏) := ∑_{푖=1}^{∞} 휋푖 · 푝norm(휇푖,휎푖)(푦)  for 휏 = {푎 ↦→ 푦}, and zero otherwise    (4.8)

(there are no arguments 푥). Consider Ω := {1, 2, . . .} and ˚푝 given by:

˚푝(휔; 휏) := 휋휔 · 푝norm(휇휔,휎휔)(휏[푎]) / ∑_{푖=1}^{∞} 휋푖 · 푝norm(휇푖,휎푖)(휏[푎])    (4.9)

and any proposal distribution ˚푞(휔; 휏) such that ˚푞(휔; 휏) > 0 for all 휔 ∈ Ω. Then, the choice density estimate is:

휉(푥, 휏, 휔) := 푝(휏; 푥) ˚푝(휔; 푥, 휏) / ˚푞(휔; 푥, 휏) = 휋휔 푝norm(휇휔,휎휔)(휏[푎]) / ˚푞(휔; 푥, 휏)    (4.10)

For example, for ˚푞(휔; 푥, 휏) := 휋휔 the choice density estimate simplifies to 푝norm(휇휔,휎휔)(휏[푎]). Within 풫.generate(푥, 휏) where 휏 = {푎 ↦→ 푦}, this generative function will sample an index 휔 from the distribution 휋휔, resulting in trace t = (풫, 푥, 휏, 휔). Subsequently, t.logpdf() will return log 푝norm(휇휔,휎휔)(푦). Within 풫.simulate(푥), we sample 휔 from a discrete probability distribution with probability 휋휔 and then we sample 푦 ∼ 푝norm(휇휔,휎휔)(·), and set 휏 := {푎 ↦→ 푦}. Note that this is observationally equivalent to sampling 휏 ∼ 푝(·; 푥) and then 휔 ∼ ˚푝(·; 푥, 휏). A subsequent call to t.logpdf() will also return log 푝norm(휇휔,휎휔)(푦).

Observational equivalence for sampling encapsulated randomness in simulate Typically we construct generative functions with encapsulated randomness such that it is possible to sample 휔 ∼ ˚푞(·; 푥, 휏) efficiently but where it is not possible to sample 휔 ∼ ˚푝(·; 푥, 휏) efficiently. The example above illustrates this—sampling from ˚푝 using a standard approach of computing the normalizing constant in Equation (4.9) is not possible in general because it requires summing over an infinite set of 휔. While it is not possible to sample 휔 from ˚푝 for a specific value of 휏, it is often possible to efficiently sample (휏, 휔) jointly such that their joint distribution is identical to that of 휏 ∼ 푝(·; 푥) and 휔|휏 ∼ ˚푝(·; 푥, 휏). As in the example above, this is often achieved by sampling 휔 first from the marginal distribution, and then sampling 휏 from its conditional distribution given 휔.

The encapsulated randomness densities need not be individually evaluable Note that it is not required to efficiently evaluate either ˚푞(휔; 푥, 휏) or ˚푝(휔; 푥, 휏), and implementations of the data types often do not compute these values internally. Instead, they only need to compute the choice density estimate in Equation (4.2) efficiently, which, due to cancellation of factors, can be straightforward when evaluating ˚푞(휔; 푥, 휏) and ˚푝(휔; 푥, 휏) is not. This property gives the data types substantial flexibility that is exercised in the following construction.

4.5.2 Untraced random choices

Modeling languages like Gen's DML can run general-purpose code, including code for which the source code is not available and is not analyzed or instrumented by the modeling language compiler (for example, calling a Julia function). What if this black-box code is stochastic? Encapsulated randomness can be used to formally model this setting. We call random choices that are made by a probabilistic program but not annotated with a choice dictionary address untraced random choices. Probabilistic programs in languages like Gen's DML can still define valid generative functions, even in the presence of untraced random choices, by modeling all of the untraced random choices as part of the encapsulated randomness 휔, and by using forward sampling to define ˚푞(·; 푥, 휏). For example, consider a generative function composed of (i) sampling a random choice at address 푎 from a distribution 푝1(·), (ii) passing the value for 푎 into a call to stochastic black-box code that makes untraced random choices 휔 sampled from 푝2(·; 휏[푎]), and (iii) sampling another random choice at address 푏 from a distribution 푝3(·; 휏[푎], 휔) that depends on the result of the black-box code and the value for 푎:

(The dependency structure is 푎 → 휔 → 푏.)

Suppose, for the sake of simpler notation, that 푎, 휔, and 푏 are discrete. Then, the probability distribution on choice dictionaries for the resulting generative function is:

푝(휏; 푥) := 푝1(휏[푎]) ∑_{휔∈Ω} 푝2(휔; 휏[푎]) 푝3(휏[푏]; 휏[푎], 휔)    (4.11)

Let ˚푝 be the conditional distribution of 휔 given 푎 and 푏:

˚푝(휔; 푥, 휏) := 푝2(휔; 휏[푎]) 푝3(휏[푏]; 휏[푎], 휔) / ∑_{휔′∈Ω} 푝2(휔′; 휏[푎]) 푝3(휏[푏]; 휏[푎], 휔′)    (4.12)

Let ˚푞 be the distribution of forward sampling 휔:

˚푞(휔; 푥, 휏 ) := 푝2(휔; 휏 [푎]) (4.13)

Then, the choice density estimate (Equation (4.2)) simplifies to:

휉(푥, 휏 , 휔) = 푝1(휏 [푎])푝3(휏 [푏]; 휏 [푎], 휔) (4.14)

The DML source code below shows an example of this pattern, when foo_simulator is a black-box stochastic Julia function.

@gen function with_untraced()
    a ∼ bernoulli(0.5)
    result = foo_simulator(a)
    b ∼ bernoulli(result > 0.6 ? 0.1 : 0.9)
end

Note that for DML code to define a valid generative function, untraced random choices cannot influence control flow, because this can result in densities on choice dictionaries that are not structured (Definition 2.1.3).

4.5.3 Pseudo-marginal Monte Carlo methods and encapsulation

The encapsulated randomness of a generative function can be constructed from estimators of 푝(휏; 푥), including Monte Carlo methods like importance sampling, annealed importance sampling, and more generally sequential Monte Carlo. When the resulting generative function is used as a generative model (or a component in a generative model) with the inference procedures of Chapter 3, Section 4.2, and Section 4.3, we recover many pseudo-marginal Monte Carlo algorithms [4] from the computational statistics literature, including particle Markov chain Monte Carlo [6]. These algorithms retain the asymptotic guarantees of regular Monte Carlo algorithms, but can be more efficient. Also, Gen's encapsulated randomness construction makes the auxiliary variable arguments that form the basis of these algorithms readily apparent.

Example: Pseudo-marginal Metropolis-Hastings The class of Metropolis-Hastings kernels can be generalized into the class of pseudo-marginal Metropolis-Hastings kernels, by using unbiased estimates of the data likelihood in place of the true data likelihood when computing the acceptance probability in the Metropolis-Hastings algorithm. Implementing Metropolis-Hastings (Section 3.4.2) using a generative function (풫) for the model that employs encapsulated randomness results in a pseudo-marginal Metropolis-Hastings kernel. Recall the acceptance probability from the Metropolis-Hastings procedure in Algorithm 5:

min{1, exp(log 푤 − s′.logpdf() + s.logpdf())}    (4.15)

where log 푤 is the log weight returned by update and s and s′ are the forward and backward proposal traces, respectively. When the model generative function uses encapsulated randomness, the acceptance probability is:

min{1, [휉(푥, 휏′, 휔′) 푞(휎; 푥, 휏′)] / [휉(푥, 휏, 휔) 푞(휎′; 푥, 휏)]} = min{1, [푝(휏′; 푥) ˚푝(휔′; 푥, 휏′) 푞(휎; 푥, 휏′) ˚푞(휔; 푥, 휏)] / [푝(휏; 푥) ˚푝(휔; 푥, 휏) 푞(휎′; 푥, 휏) ˚푞(휔′; 푥, 휏′)]}    (4.16)

(here, 푝 is the density on choice dictionaries 휏 for the model generative function 풫, 푞 is the density on choice dictionaries 휎 for the proposal generative function 풬, and ˚푞 is the density on encapsulated randomness for 풫). It is straightforward to see that this is a regular Metropolis-Hastings kernel, but on an extended state space that includes 휏 as well as auxiliary variables 휔. Similar auxiliary variable arguments can be used to justify various Monte Carlo algorithms. When the estimator of 푝(휏; 푥) is constructed from a sequential Monte Carlo (SMC) algorithm, pseudo-marginal Metropolis-Hastings reduces to the particle marginal Metropolis-Hastings [6] (PMMH) algorithm. The encapsulated randomness 휔 then contains the particle system of a sequential Monte Carlo algorithm, which includes all particles at all time steps and all of the ancestor choices; 휉(푥, 휏, 휔) is the (unbiased) SMC estimate of 푝(휏; 푥) that results from a particle system 휔, ˚푞(·; 푥, 휏) is the distribution on the particle system that represents the sampling process within SMC, and ˚푝(·; 푥, 휏) is the distribution on the particle system that is sampled from during the conditional SMC update [6] given retained particle 휏. It is possible to implement PMMH using Gen by (i) implementing the SMC estimator using Gen's inference procedures for SMC (e.g. Section 3.5), then (ii) constructing a generative function representing part of the generative model that internally uses the resulting SMC estimator, and (iii) applying the Metropolis-Hastings procedure (Section 3.4.2) to the generative function for the resulting generative model.

4.5.4 Using encapsulated randomness inside proposal distributions

The pseudo-marginal constructions introduced in the previous section use encapsulated randomness within the generative function 풫 = (푋, 푌, 푝, 푓, 푞, Ω, ˚푝, ˚푞) that represents the model. These algorithms invoke the operations 풫.generate, 풫.update, and 풫.regenerate, which all sample the encapsulated randomness from ˚푞, and do not invoke 풫.simulate, which samples the encapsulated randomness from ˚푝. Consider instead the setting where 풫 is used as a proposal and not the model. It is also possible to employ encapsulated randomness in a generative function that is used as a proposal within importance sampling, SMC, or MCMC algorithms. These algorithms internally invoke 풫.simulate when 풫 is the proposal, which involves sampling from ˚푝 and not ˚푞. Like pseudo-marginal algorithms, the resulting algorithms can also be justified as instances of the original algorithm templates, but on an extended space of auxiliary variables [30]. Note that the desiderata for ˚푞 and ˚푝 are somewhat modified in this setting. For example, in importance sampling, when the model uses encapsulated randomness with densities ˚푝 and ˚푞, the efficiency of the resulting algorithm depends on DKL(˚푝(·; 푥, 휏)||˚푞(·; 푥, 휏)), because ˚푝 extends the model density and ˚푞 extends the proposal density, and because importance sampling efficiency depends on the KL divergence from the model to the proposal [21]. When the proposal in importance sampling uses encapsulated randomness with densities ˚푝 and ˚푞, the efficiency of the resulting algorithm depends on DKL(˚푞(·; 푥, 휏)||˚푝(·; 푥, 휏)), because ˚푝 extends the proposal density and ˚푞 extends the model density.

4.6 Related Work

Selection MH and blocked Gibbs sampling The interface presented by selection MH, where users select a set of variables to be updated conditioned on the values of other variables, is similar to that of Gibbs sampling. The convenience of Gibbs sampling as a mathematical framework for constructing samplers popularized the use of MCMC for Bayesian networks [59] and motivated the development of the BUGS probabilistic programming system [48]. The utility of Gibbs sampling is furthered by the use of blocking [63], in which groups of random variables are updated jointly, either to improve mixing or to exploit parallelism. However, selection MH is more general than blocked Gibbs sampling—it is well-defined for models with stochastic structure, and the internal proposal may or may not sample from the exact conditional distribution. Note that selection MH does not itself perform Gibbs sampling—it relies on either the modeling language implementation or the user to derive and implement the conditional distributions, when possible.

Selection MH and subproblem MCMC in Venture The inference language of Venture [79] also includes an operator that applies a Metropolis-Hastings kernel based on forward simulation, and an operator for Gibbs sampling, to a subset of random choices defined by a subproblem selection [80]. Later work carefully formalized a more general notion of subproblem MCMC [55] and characterized conditions under which compositional MCMC algorithms based on subproblem selections are asymptotically sound. While these works use a representation of a trace that includes dependency structure, and define a subproblem as the extent of a change including random choices that may go out of existence, Gen is formalized using probability distributions on choice dictionaries without dependence structure, and the selection includes the set of random choices that the user wants to change, corresponding to the 'minimal subproblem' in Venture. The encapsulation of these details within the trace data type allows for the internal proposal distribution to be overridden, generating heterogeneous proposal distributions that combine generic elements like forward simulation with elements that specialize to structure in sub-models. This is not supported by Venture. It may be possible to use implementations and analyses based on traces with dependence structure in the implementation of specific modeling languages that are complementary to Gen's existing general-purpose modeling languages.

Forward sampling as an automatic proposal Forward sampling has long been used in Bayesian networks as a default proposal mechanism within importance sampling, which results in the likelihood weighting inference algorithm [66]. A likelihood weighting algorithm for a more flexible class of open-universe generative models was later introduced for the BLOG system [84]. The use of forward sampling as a generic proposal mechanism for Monte Carlo algorithms in universal probabilistic programming systems dates at least to Church [51], and has been the central MCMC proposal mechanism for all universal sampling-based probabilistic programming systems since [52, 79, 130, 40], and the only trans-dimensional MCMC mechanism in these systems. While Gen's SML and DML modeling languages use forward simulation for their generic internal proposal mechanisms, this proposal is only provided as a default for users of Gen—users are able to replace forward simulation with their own custom proposals as more performance is needed. Furthermore, Gen allows users to seamlessly combine forward simulation with other more efficient proposal mechanisms, either by implementing custom internal proposals for fragments of generative models, or by combining external with internal proposals (within sequential Monte Carlo and importance sampling). Finally, Gen is intended to be extended with new modeling languages in which fragments of models with specialized structure implement their own more efficient internal proposals.

Encapsulation of inference logic with models in probabilistic programming There is relatively little prior work on encapsulation of inference capabilities within components of generative models in the context of probabilistic programming systems. The implementation of the Venture system [79] included a foreign function interface (FFI) for procedures used in models. Although the intent of Venture's FFI is similar to that of Gen's abstract trace data type (and served as an inspiration for Gen), Venture's FFI provides a relatively weak form of encapsulation. Foreign functions pass expressions to Venture's main interpreter for evaluation, and changes to control flow within a foreign function require passing data back to the Venture interpreter. Venture's FFI is also significantly more complex—even forward execution requires back-and-forth communication between Venture's interpreter and the foreign function; in Gen the inference code simply calls simulate. Also, whereas Venture allows a foreign function to possess an encapsulated ergodic MCMC kernel that can only be applied atomically, Gen allows generative functions to possess internal proposals, which can be combined seamlessly with the internal proposals from other parts of the model or the user's external proposals; and these proposals can be used in the context of both MCMC and SMC algorithms.

Encapsulated randomness and pseudo-marginal methods Cusumano-Towner and Mansinghka [29] described an early version of the encapsulated randomness construction of Section 4.5, and in a later paper [30] we applied it to using probabilistic programs as proposal distributions within Monte Carlo algorithms. We also applied a simplified version of the encapsulated randomness construction to estimating the KL divergence between the output distributions of two probabilistic programs (the paper introduces the algorithm as a technique for estimating the KL divergence between two inference samplers, but it is applicable to estimating the KL divergence between any two generative functions) [26]. The MetaPPL language [74] elaborates on the internal proposal family construction of this chapter and distinguishes between two classes of internal proposals—one for use when the generative function is used as a proposal and the other for use when the generative function is used as a model. The encapsulated randomness construction allows a number of auxiliary variable algorithms described in the computational statistics literature to be implemented in a modular and compositional manner using probabilistic programming languages. Pseudo-marginal MCMC [4] is a broad family of MCMC algorithms that use unbiased estimates of likelihoods in place of intractable likelihoods within acceptance probability calculations. Storvik [117] describes a general family of auxiliary variable constructions for Metropolis-Hastings that includes many that can be represented using encapsulated randomness and generative functions. Particle marginal Metropolis-Hastings [6] is a class of pseudo-marginal MCMC methods that use sequential Monte Carlo algorithms to estimate intractable densities or likelihoods within the acceptance probability.

Chapter 5

Compiling Generative Function and Trace Data Types from Probabilistic Modeling Code

The previous chapters have shown that a wide variety of inference algorithms can be implemented at a high level of abstraction using generative functions and traces. This chapter focuses on how traces and generative functions are implemented. Generative functions are typically compiled from a modeling language, so this chapter is primarily concerned with language design and implementation (although, as we will see, generative functions can be constructed in other ways as well). Gen contains two built-in modeling languages that strike different tradeoffs between ease-of-use and performance: Gen's Dynamic Modeling Language (DML) is simple to implement and easy for users to learn, but is inefficient. Gen's Static Modeling Language (SML) has a more complex implementation, and is more complex to use, but enables significant asymptotic and constant-factor speedups for trace operations relative to DML. Note that generative functions expressed in both DML and SML can call arbitrary generative functions, including those compiled from other modeling languages. More specialized and domain-specific modeling languages allow for even further specialization and improved performance over SML. This chapter also introduces generative function combinators that express common patterns of repeated computation and conditional independence in models. The chapter ends with a discussion of how programmers can hand-implement generative functions and traces that are specialized to their model for further improvements in performance. Because each of these approaches to defining models generates the same abstract data types, programmers can easily move between approaches and dynamically adjust their balance between ease-of-use and performance. The lifecycle of modeling code, generative functions, and traces is shown in Figure 5-1.

5.1 The Dynamic Modeling Language compiler

The dynamic modeling language (DML) that was introduced in Section 2.2.1 is a Turing-universal probabilistic modeling language that is embedded in the Julia programming language [12].

[Figure 5-1: Lifecycle of probabilistic source code, generative functions, and traces. (a) Compile-time: a compiler turns a probabilistic program into the generative function and trace data types (풫.simulate, 풫.generate, t.update, t.regenerate). (b) Run-time: 풫.simulate and 풫.generate produce traces t, and t.update and t.regenerate produce new traces t′.]

It includes all the control flow features of Julia itself, such as while loops and recursion. This section describes the implementations of the generative function and trace ADTs produced by the DML compiler. The compiler does not attempt to specialize the implementations of the data types to a particular model: the trace data structure is generic and based on Julia dictionaries, where keys are addresses of random choices or calls to other generative functions. All operations for the generative function and trace data types are implemented via non-standard interpretations of the Julia code in the body of the function definition. The non-standard interpretations are implemented by transforming the body of the generative function and replacing random choice expressions and generative function call expressions with calls to an effect handler that is specific to the operation being implemented. The language implementation performs no static dependency analysis or dynamic dependency tracking—every operation involves a full end-to-end execution of the body of the generative function. With the exception of random choice expressions, the body of the function is not instrumented, so the performance characteristics of the underlying Julia code are preserved (Julia type inference and Julia JIT compilation are performed on the transformed code). Therefore, its performance is asymptotically suboptimal for trace operations that involve only a subset of the random choices in the trace (e.g. updating the value of a small number of random choices, or requesting the gradient with respect to a small number of random choices). However, the DML is valuable because (i) it provides a simple, readable reference implementation of the generative function and trace data types, (ii) its similarity to Julia makes it very easy to learn and gives it predictable performance for new users of Gen, and (iii) it is expressive, in that complex stochastic control flow can be specified more concisely than with the more performant modeling languages and constructs discussed later in this chapter. Finally, note that DML functions can invoke generative functions compiled from other modeling languages, and vice versa, so the tradeoffs that it makes can be applied by users selectively to parts of their model.
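To illustrate the kind of transformation involved, the following is a highly simplified sketch (not Gen's actual implementation; all names are hypothetical) of how a random choice expression can be rewritten into a call to an operation-specific effect handler:

# Hedged sketch of the effect-handler pattern (hypothetical names).
using Distributions

struct InterpState
    choices::Dict{Any,Any}        # τ: address ↦ value
    logpdfs::Dict{Any,Float64}    # λ: address ↦ log density
end

# Handler used when interpreting the body for simulate (cf. Algorithm 17):
function simulate_handle!(state::InterpState, addr, dist::Distribution)
    val = rand(dist)                          # sample the choice
    state.choices[addr] = val                 # record τ[a]
    state.logpdfs[addr] = logpdf(dist, val)   # record λ[a]
    return val
end

# The compiler would rewrite a random choice expression such as
#     x ∼ normal(0, 1)
# into a handler call along the lines of
#     x = simulate_handle!(state, :x, Normal(0, 1))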

166 5.1.1 Implementing generative functions and traces via effect handlers

The trace data type for DML generative functions is based on a dictionary that maps addresses to records that either contain a value and a log probability density (for random choices), or an opaque trace data structure called a 'subtrace' (for calls to other generative functions, discussed in Section 5.1.2). The trace also contains the arguments, the return value, the log density of each random choice, the total log density of the random choices made under each generative function call, and the total log density across all random choices. Each generative function and trace data type operation is implemented via a non-standard interpretation of the body of the function with effect handlers that perform specialized modifications to state depending on which operation is implemented.
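A minimal sketch (not Gen's actual internal representation; names are hypothetical) of this kind of dictionary-based trace data structure:

# Hedged sketch of a dictionary-based DML-style trace (hypothetical names).
struct ChoiceRecord
    value::Any          # the sampled value τ[a]
    logpdf::Float64     # its log density λ[a]
end

struct DMLTrace
    gen_fn::Any               # the generative function 𝒫
    args::Tuple               # the arguments x
    records::Dict{Any,Any}    # address ↦ ChoiceRecord, or address ↦ subtrace
    retval::Any               # the return value y
    total_logpdf::Float64     # log p(τ; x)
end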

Implementing the simulate and update operations Pseudocode for the DML implementations of the simulate operation and the update operation is shown in Algorithm 17. This code does not show the effect handlers for calling other generative functions, which will be discussed in Section 5.1.2. Each operation is implemented using two auxiliary procedures—one that initializes the state of the non-standard interpreter (simulate-init, update-init) and another that implements an effect handler for random choice expressions (simulate-handler, update-handler). The handlers take as input the state of the interpreter (state), the address of a random choice (푎), and a probability distribution of a random choice (dist), do some computation and update the state of the interpreter, and then return a value for the random choice. The handlers are invoked when executing the body of the code with 풫.exec, which takes the arguments to the generative function (푥), the initial state of the interpreter (so it can be passed to handlers), and the handler that should be used. Running 풫.exec serves to update the interpreter state, and also produces the return value of the generative function body (ret). Next, a trace of the generative function is constructed based on the state of the interpreter.

The state of the interpreter for simulate includes a dictionary of random choice values (휏) and a dictionary of log probability densities (휆). The effect handler for simulate is straightforward—it simply samples the value of the random choice from the given distribution, stores the value in the dictionary 휏, computes the log density and stores it in 휆, and then returns the value back to the running program. When constructing the trace, we use the notation t ← (풫, 푥, 휏) to denote the assignment of abstract semantics to the trace t (which, as defined in Section 2.3, consists of the generative function 풫, the arguments 푥, and the dictionary of choices 휏). We denote implementation-specific information that is recorded in the trace data structure using notation of the form t.휆 ← state.휆. This means that we store the dictionary of per-choice log densities in the trace. Similarly, we store the return value of the generative function (t.푦 ← ret); the implementation of t.retval simply returns the recorded value t.푦 and does not re-run the body of the function.

The interpreter for update is more complex. The state includes the dictionary that will contain all choices in the new trace after exec returns (휏′), the new dictionary of log densities (휆′), the dictionary of new random choice values (휎), and a dictionary that will be the dictionary 휎′ that is the third return value of update after exec returns. The effect handler update-handler first checks if there is an entry in 휎 for the address 푎. If there is, then the value for the choice is set to 휎[푎]; otherwise the value is taken from the previous trace (휏[푎]). If neither 휎 nor 휏 contains an entry for 푎, then there is an error, because there is no dictionary 휏′ = 휎 ⊕ (휏|퐵) for any 퐵 for which 푝(휏′; 푥′) > 0, and update is undefined (ℎupdate(휏, 휎) = ⊥). (If there were such a 휏′ then the execution of the program that draws values for random choices from 휏′ would result in the same sequence of program states as an execution that draws values from 휎 ⊕ 휏.) If there is a new value (휎[푎]) and a previous value (휏[푎]), then the previous value is stored in 휎′[푎]. After exec returns, we add the old values (휏[푎]) for any addresses that were not visited during the execution to 휎′. Intuitively, these addresses are removed from the trace. Finally, we compute the previous and new total log densities and the difference is log 푤 (the actual implementation stores the total log density in the trace, but we omit this for simplicity). Note that the DML implementation of update does not use the change hint 훿푋 for the change from 푥 to 푥′, and returns the non-informative change hint 훿푌 = ⊥ for the change from the previous return value (t.푦) to the new return value (t′.푦).

Algorithm 17 Pseudocode for DML implementations of simulate and update operations

procedure simulate-init()
    휏 ← {}    ◁ Initialize dictionary for choices
    휆 ← {}    ◁ Initialize dictionary for log densities
    state ← (휏, 휆)
    return state
end procedure

procedure simulate-handler(state, 푎, dist)
    val ∼ dist
    state.휏[푎] ← val
    state.휆[푎] ← log 푝dist(val)
    return val
end procedure

procedure 풫.simulate(푥)
    state ← simulate-init()
    ret ← 풫.exec(푥, state, simulate-handler)
    t ← (풫, 푥, state.휏)
    t.휆 ← state.휆
    t.푦 ← ret
    return t
end procedure

procedure update-init(t = (풫, 푥, 휏), 휎)
    휏′ ← {}; 휎′ ← {}    ◁ Initialize dictionaries for new choices and reverse constraints
    휆′ ← {}    ◁ Initialize dictionary for new log densities
    state ← (휏, 휏′, 휆′, 휎, 휎′)
    return state
end procedure

procedure update-handler(state, 푎, dist)
    if 푎 ∈ 퐴state.휎 then
        val ← state.휎[푎]
        if 푎 ∈ 퐴state.휏 then
            state.휎′[푎] ← state.휏[푎]
        end if
    else if 푎 ∈ 퐴state.휏 then
        val ← state.휏[푎]
    else
        error    ◁ ℎupdate(state.휏, state.휎) = ⊥
    end if
    state.휏′[푎] ← val
    state.휆′[푎] ← log 푝dist(val)
    return val
end procedure

procedure (t = (풫, 푥, 휏)).update(푥′, 훿푋, 휎)
    state ← update-init(t, 휎)
    ret ← 풫.exec(푥′, state, update-handler)
    t′ ← (풫, 푥′, state.휏′)
    t′.휆 ← state.휆′
    t′.푦 ← ret
    for 푎 ∈ 퐴state.휏 ∖ 퐴state.휏′ do
        state.휎′[푎] ← state.휏[푎]
    end for
    ℓ1 ← ∑_{푎 ∈ 퐴t.휆} t.휆[푎]
    ℓ2 ← ∑_{푎 ∈ 퐴state.휆′} state.휆′[푎]
    return (t′, ℓ2 − ℓ1, state.휎′, ⊥)
end procedure

Implementing an internal proposal family using forward sampling Recall that the internal proposal family of a generative function (Section 4.1) is a family of probability distributions on choice dictionaries 휐 (with densities 푞(휐; 푥, 휎)) that 'fill in' a partially specified set of random choices 휎 to form a complete trace. That is, merging 휐 with some subset of 휎 results in a dictionary 휏′ = 휐 ⊕ (휎|퐵) that encodes a possible complete execution of the generative function (푝(휏′; 푥) > 0). The DML uses forward sampling for its internal proposal family. In particular, the DML implementations of generate and regenerate (Algorithm 18) perform forward sampling.

First, consider generate, which is the more straightforward implementation of forward sampling. The state for the interpreter for this operation stores the values and log densities for each choice (휏 and 휆), and an accumulator (ℓ) for the log weight (log 푤). The handler for random choices (generate-handler) first checks if the address is provided in the input dictionary 휎. If it is, it retrieves the value 휎[푎] and increments the log weight. Otherwise, it samples the value from the given distribution, and does not increment the log weight.

The DML implementation of regenerate is more complex. The dictionary 휎 used by the internal proposal is never explicitly constructed. Recall that it is derived from the previous trace via 휎 = 휏|퐵^c (that is, the set of all entries in the previous trace, less those in the selection 퐵). Note that 퐴휎 = 퐴휏 ∩ 퐵^c and therefore 퐴휎^c = 퐴휏^c ∪ 퐵. The handler (regenerate-handler) checks if the address is in 퐴휏^c ∪ 퐵. If it is, then it samples the value. Otherwise, it retrieves the value from 휏 and increments the log weight, using the value of the log density for the address from the previous trace (state.휆). Note that although the value of the choice is the same in the previous and new trace, its density may be different because the distribution (dist) may have changed. Note that unlike in the implementation of update, after the execution 풫.exec is finished, the set of choices that were not visited is not referenced when computing the log weight. This is possible because the contributions of the terms for these choices in log 푝(휏; 푥) and log 푞(휐′; 푥, 휎′) in Equation (4.1) cancel.

Constructing generative functions with supportive probability densities Recall that regenerate is only supported for generative functions whose probability densities are supportive (Definition 2.1.7). It is possible to guarantee that the probability densities 푝(·; 푥) for generative functions defined in DML (and SML) are supportive by ensuring that the support of the probability density at each address 푎 is 푉푎 for all executions that sample a choice at the address. For example, consider the following two DML generative functions:

@gen function not_supportive()
    a ∼ uniform_discrete(1, 10)
    b ∼ uniform_discrete(1, a)
end

@gen function supportive()
    a ∼ uniform_discrete(1, 10)
    p = [if (i <= a) 0.9/a else 0.1/(10-a) end for i in 1:10]
    b ∼ categorical(p)
end

169 at address b is not constant (e.g. the support is {1, 2} for 휏1 = {푎 ↦→ 2, 푏 ↦→ 2} and is {1, 2, 3} for 휏2 = {푎 ↦→ 3, 푏 ↦→ 2} and 푝(휏1) > 0 and 푝(휏2) > 0. The generative function on the right is supportive, because the support at address 푎 is {1,..., 10} and the support at address 푏 is {1,..., 10} for all possible executions. Note that as in this example we can approximate a non-supportive generative function by a supportive one by extending the support but placing low probability mass on regions of the space that had zero support. Intuitively, to see the relationship between supportive densities on choice dictionaries and constant support for each address, consider running the implementation of generate in Algorithm 18 for a generative function where the support of each address 푎 is constant (푉푎) in all executions, and producing a trace with choices 휏 . Then 휏 has 휏 ∼ 휎 and positive density (푝(휏 ) > 0) since the density is the product of individual densities 푝dist(val), which are all necessarily positive.

Algorithm 18 Pseudocode for DML generate and regenerate operations

procedure generate-init(휎)
    휏 ← {}                ◁ Initialize dictionary for choices
    휆 ← {}                ◁ Initialize dictionary for log densities
    ℓ ← 0                 ◁ Initialize log weight
    state ← (휏, 휆, 휎, ℓ)
    return state
end procedure

procedure generate-handler(state, 푎, dist)
    if 푎 ∈ 퐴state.휎 then
        val ← state.휎[푎]
        state.ℓ ← state.ℓ + log 푝dist(val)
    else
        val ∼ dist
    end if
    state.휏[푎] ← val
    state.휆[푎] ← log 푝dist(val)
    return val
end procedure

procedure 풫.generate(푥, 휎)
    state ← generate-init(휎)
    ret ← 풫.exec(푥, state, generate-handler)
    t ← (풫, 푥, state.휏)
    t.휆 ← state.휆
    t.푦 ← ret
    log 푤 ← state.ℓ
    return (t, log 푤)
end procedure

procedure regenerate-init(t = (풫, 푥, 휏), 퐵)
    휏′ ← {}               ◁ Init. dictionary for new choices
    휆′ ← {}               ◁ Init. dictionary for new log densities
    휆 ← t.휆               ◁ Previous log densities of choices
    ℓ ← 0                 ◁ Initialize log weight
    state ← (휏, 휏′, 휆, 휆′, 퐵, ℓ)
    return state
end procedure

procedure regenerate-handler(state, 푎, dist)
    if (푎 ∉ state.휏) ∨ (푎 ∈ state.퐵) then
        val ∼ dist
    else
        val ← state.휏[푎]
        state.ℓ ← state.ℓ + log 푝dist(val) − state.휆[푎]
    end if
    state.휏′[푎] ← val
    state.휆′[푎] ← log 푝dist(val)
    return val
end procedure

procedure (t = (풫, 푥, 휏)).regenerate(푥′, 훿푋, 퐵)
    state ← regenerate-init(t, 퐵)
    ret ← 풫.exec(푥′, state, regenerate-handler)
    t′ ← (풫, 푥′, state.휏′)
    t′.휆 ← state.휆′
    t′.푦 ← ret
    log 푤 ← state.ℓ
    return (t′, log 푤, ⊥)
end procedure
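To make these operations concrete, the following is a minimal sketch of how they are invoked through Gen.jl's public API (the model and values are illustrative):

    using Gen

    @gen function model()
        a ~ normal(0, 1)
        b ~ normal(a, 1)
    end

    # generate: :b is constrained, so :a is forward-sampled and the log
    # weight is the log density of the constrained choice, log N(1.5; a, 1)
    (trace, log_w) = Gen.generate(model, (), Gen.choicemap((:b, 1.5)))

    # regenerate: :a is resampled from the forward-sampling internal
    # proposal; the weight only reflects choices whose densities changed
    # (here :b, whose mean depends on :a); all other terms cancel
    (new_trace, log_w2, retdiff) = Gen.regenerate(trace, (), (), Gen.select(:a))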

Implementing the gradient operation using automatic differentiation  Recall that the gradient trace operation (Section 2.3) computes gradients of the log probability density of all of the random choices (log 푝(휏; 푥)) with respect to the arguments to the generative function and with respect to the values 휏[푎] of a selected subset (푎 ∈ 퐵) of the continuous random choices. The DML implementation of this operation uses reverse-mode automatic differentiation of the Julia code that forms the body of the function. Note that the derivative with respect to the value of a random choice at address 푎 has two additive terms: the first is the gradient due to the probability density of the choice's own distribution, and the second is the gradient due to any other choices (occurring later in the execution) whose distribution parameters depend on the value at 푎. The gradient operation also accepts an input gradient with respect to the return value, which enables gradient to be implemented compositionally, as described next.
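The two additive terms can be seen in a small example. The following sketch uses Gen.jl's choice_gradients operation (assuming the current Gen.jl API) on a model where the mean of :b depends on :a, so the gradient with respect to :a combines a term from :a's own density and a term from :b's density:

    using Gen

    @gen function chain()
        a ~ normal(0, 1)
        b ~ normal(a, 1)
    end

    trace = Gen.simulate(chain, ())
    # gradient of log p(tau; x) with respect to the selected choice :a;
    # equals -a (from N(a; 0, 1)) plus (b - a) (from N(b; a, 1))
    (arg_grads, choice_values, choice_grads) =
        Gen.choice_gradients(trace, Gen.select(:a), nothing)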

5.1.2 Invoking generative functions

A DML generative function can invoke other generative functions, regardless of how the generative function and trace data types for the other generative function were implemented. Recall that, like random choices, generative function calls are also associated with an address 푎. Following the sequencing composition of generative functions in Section 2.1.5, all random choices made by the callee are given addresses by the caller that are based on 푎 (in particular, of the form (푎, 푏) where 푏 is the address of the choice according to the callee generative function). The DML trace data structure stores the trace for each callee generative function and the address 푎 for the call. These stored traces for callees are called subtraces. Each of the data type operations for the caller DML generative function has an effect handler that is analogous to the handlers in Algorithm 17 and Algorithm 18 but is invoked at generative function call sites instead of at random choice call sites. The effect handlers for most of the operations recursively invoke the same operation, but on the callee generative function or subtrace. For example, suppose we invoke 풫.generate(푥, 휎) for a generative function 풫 that invokes another generative function ℛ. Suppose the generate effect handler encounters a call to generative function ℛ at address 푎 with arguments 푧. Then the effect handler invokes ℛ.generate(푧, 휎|퐵) where 퐵 contains all addresses of the form (푎, 푏) for some 푏.

Example: Calling an SML generative function  Consider the DML generative function foo, which invokes the SML generative function bar at address c:

@gen function foo()
    a ∼ normal(0, 1)
    b ∼ normal(a, 1)
    c ∼ bar(b)
    d ∼ normal(c, 1)
    e ∼ normal(d, 1)
end

@gen (static) function bar(b)
    w ∼ normal(b, 1)
    x ∼ normal(w, 1)
    y ∼ normal(x, 1)
    z ∼ normal(y, 1)
    return z
end

All traces of bar contain dictionaries 휏 with addresses {w, x, y, z} and all traces of foo contain dictionaries 휏 with addresses {a, b, (c, w), (c, x), (c, y), (c, z), d, e}. Consider the operation

foo.generate(_, {(c, y) ↦→ 1.2, e ↦→ −4.3})

Suppose the execution of the body of foo reaches the third line with value 푏. Then the effect handler for the call to bar at address c invokes bar.generate(푏, {y ↦→ 1.2}).
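In Gen.jl's concrete syntax, hierarchical addresses are written as pairs, so the same operation might be expressed as follows (a hedged sketch; foo is the function defined above):

    constraints = Gen.choicemap()
    constraints[:c => :y] = 1.2     # hierarchical address (c, y)
    constraints[:e] = -4.3
    # :a, :b, :d and the remaining choices of bar are forward-sampled
    (trace, log_w) = Gen.generate(foo, (), constraints)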

Sharing a trace with callee DML generative functions  To avoid collisions between addresses of random choices, DML generative functions typically supply a distinct address namespace for each call to a generative function, as above. However, it is sometimes convenient to use a 'flat' address space. This is permitted only for calls to other DML generative functions, in which case there is no separate trace associated with the called generative function: its random choices and generative function calls are recorded in the same trace as the caller, making this a type of trace inlining. This makes it possible to write recursive DML generative functions that walk data structures using a custom flat addressing scheme for nodes in these data structures (e.g. associating an integer with each node in a binary tree), and avoid excessively hierarchical addresses (the Recurse combinator, discussed later in this section, automates this construction). Note that the user is responsible for indicating whether a call to another DML function should result in a new trace or should share the caller's trace, based on whether they provide an address or not.

Gradients and compositional automatic differentiation The DML implementation of gradient first executes a (deterministic) forward pass of the function, reading values of random choices from the trace instead of sampling them, and recording deterministic values and random choices. During the forward pass, at each generative function call site, the gradient operation is invoked for the trace of the callee, and the result is recorded for use by a later backward pass. Note that this result includes gradients with respect to the arguments to the callee, but may also include gradients with respect to random choices made by the callee. Finally, a backward pass is performed and the results of gradient for each callee are used. Note that because gradient is an abstract trace data type operation, the callee’s implementation can use a completely different runtime system. For example, the callee may use the TensorFlow runtime to implement gradient, whereas the caller may be written in DML and use automatic differentiation of Julia code.

5.2 The Static Modeling Language compiler

Static Modeling Language  Bayesian networks are a classical graphical modeling formalism based on directed acyclic graphs, where nodes are random variables and the edges in the graph indicate the static conditional independencies that hold for the model's joint distribution [66]. Like Bayesian networks, Gen's Static Modeling Language (SML) uses directed acyclic graphs, but generalizes Bayesian networks along several dimensions: (i) random variables can be sampled from discrete and/or continuous distributions, (ii) deterministic nodes that compute arbitrary functions are supported, and (iii) nodes in the graph can represent generative functions (which themselves encode a joint distribution over a collection of random variables, including distributions where the number of variables is itself random). SML can also be seen as a subset of DML designed to support straightforward static analysis. The central restriction of SML relative to DML is that function bodies must be static single assignment basic blocks, and addresses must be literal symbols and not arbitrary dynamically computed values as in DML. These restrictions enable static inference of the address space of random choices, which supports the generation of faster specialized trace data structures, as well as static information flow and conditional independence analysis that supports efficient implementations of update and other trace operations. SML is implemented in Julia, but similar modeling languages can be implemented in other host languages as well.

Consider the example SML generative function below, which models the destination of an intelligent agent as they move around an environment with fixed obstacles. There is a uniform prior distribution (random_location) on the destination location, and the speed of the agent is also unknown. The agent's trajectory is modeled in two stages: First, a Julia function (plan_path) is used to plan a path from the starting location of the agent (which is an argument and is assumed known) to the destination. Then, a generative function (noise_model) generates the trajectory of the agent given the agent's speed and the amount of noise in the trajectory, which are both latent variables. The (static) annotation in the function signature indicates that the SML compiler should be used.

@gen (static) function agent_model(start_location, num_time_steps)
    destination ∼ random_location()
    path = plan_path(start_location, destination)
    speed ∼ uniform(0, 1)
    noise ∼ uniform(0, 0.1)
    trajectory ∼ noise_model(path, num_time_steps, speed, noise)
end

Graph-based intermediate representation  The SML compiler generates an intermediate representation (IR) based on a directed acyclic graph, with different node types for (i) arguments to the generative function, (ii) random choices, (iii) Julia expressions, and (iv) calls to generative functions. Random choice nodes include the distribution family, the address, and references to other nodes that are parameters to the distribution family. Julia expression nodes include an anonymous Julia function that computes their value, and references to other nodes that represent the unbound symbols in the expression. Generative function call nodes include a reference to the generative function being called and the address under which the callee's random choices will be placed. Figure 5-2 diagrams the intermediate representation for the SML function agent_model above. The intermediate representation is used at compile time to generate a trace data structure specialized to the SML generative function, using a Julia record type (struct) with entries for each random choice and entries that contain the subtrace for each callee generative function.
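As an illustration, the specialized trace for agent_model might resemble the following struct (a hypothetical sketch; the actual field names and layout generated by the SML compiler may differ):

    # one concrete field per random choice and per callee subtrace, so
    # reads are field accesses rather than dictionary lookups
    struct AgentModelTrace
        start_location::Any        # argument
        num_time_steps::Int        # argument
        destination::Any           # random choice
        path::Any                  # value of the Julia expression node
        speed::Float64             # random choice
        noise::Float64             # random choice
        trajectory::Any            # subtrace of noise_model
        score::Float64             # cached total log density
    end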

Implementing the generative function and trace operations Code for each of the generative function and trace operations is generated from the graph-based intermediate representation. While code for some of the operations is generated statically, exploiting the conditional independence structure that is recorded in the graph when implementing an operation requires analyzing the graph in the context of the arguments to the operation. Consider the update operation, which takes as inputs new arguments 푥′, a change hint 훿푋 for the arguments, and a dictionary 휎 of new values for random choices. The set of

Figure 5-2: SML intermediate representation for a generative model. The graph for agent_model contains nodes start_location and num_time_steps (arguments); destination, speed, and noise (random choices); path (Julia function call); and trajectory (generative function call).

keys in 휎 and the change hint 훿푋 determine, together with the static graph structure, which statements in the program need to be re-evaluated and which do not. For example, for agent_model, if 훿푋 = ⊤ (indicating that the arguments have not changed) and if 휎 = {noise ↦→ 0.5}, it is not necessary to re-evaluate the Julia function plan_path because none of its inputs changed, but it is necessary to recursively invoke update on the subtrace of noise_model because one of its inputs changed. But if 훿푋 = ⊥, indicating that there may be an arbitrary change to the arguments, then both plan_path and noise_model need to be revisited. Determining what code is necessary to re-execute, based on a joint analysis of the static graph, 훿푋, and 퐴휎, must happen dynamically. To reduce the overhead, we use a type system for 훿푋 and 퐴휎 and only perform the analysis and code generation once for each operation applied to each unique combination of types. The types include the information that is necessary to perform the analysis: for 훿푋 the type indicates whether each argument may have changed or not, and for 퐴휎 the type indicates which subset of the top-level addresses (in the example: destination, speed, noise, or trajectory) are represented in 퐴휎. The function caching and lookup based on these types are performed by Julia's JIT compiler, and the code generation is implemented using an extension to Julia's JIT compiler.
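The corresponding call through Gen.jl's public API might look like the following (a hedged sketch; agent_model and a trace of it are assumed from above):

    # the arguments are unchanged, and only :noise receives a new value
    constraints = Gen.choicemap((:noise, 0.5))
    argdiffs = (Gen.NoChange(), Gen.NoChange())
    (new_trace, log_w, retdiff, discard) =
        Gen.update(trace, Gen.get_args(trace), argdiffs, constraints)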

Example: Exploiting static model structure to avoid recomputation  Consider an MCMC algorithm for inference over the destination given the observed trajectory, using the generative model agent_model. The algorithm uses a composite MCMC kernel that cycles through three Metropolis-Hastings MCMC kernels, each of which updates the value of one of the latent random choices (destination, noise, and speed respectively). The SML trace implementation, which is specialized to the conditional independence pattern of the model, avoids an unnecessary re-evaluation of the plan_path function when either the noise or speed variables are updated. Table 5.1 shows a comparison of different implementations of this algorithm, for 1000 iterations of the composite cycle kernel; a sketch of the kernel follows the table. The plan_path procedure is computationally intensive and dominates the running time of the algorithm. The SML implementation only re-executes plan_path during one of the three constituent kernels (the kernel for destination), whereas a DML implementation and a Turing [40] implementation re-execute it within all three kernels, because they do not exploit the independence of speed and path or of noise and path. The SML implementation gives a roughly threefold speedup over the DML implementation, and both implementations outperform the Turing implementation.

Implementation    Running time (for 1000 steps)
Gen (SML)         0.73s (±0.01)
Gen (DML)         2.50s (±0.08)
Turing            4.21s (±0.13)

Table 5.1: Performance of different implementations of an MCMC inference algorithm
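The composite cycle kernel used in this comparison can be written as ordinary Julia inference code; the following is a minimal sketch using Gen.jl's selection-based Metropolis-Hastings (address names as in agent_model above):

    function cycle_kernel(trace)
        # each constituent kernel updates one latent choice; only the
        # :destination move forces a re-execution of plan_path under SML
        (trace, _) = Gen.metropolis_hastings(trace, Gen.select(:destination))
        (trace, _) = Gen.metropolis_hastings(trace, Gen.select(:speed))
        (trace, _) = Gen.metropolis_hastings(trace, Gen.select(:noise))
        return trace
    end

    for iter in 1:1000
        trace = cycle_kernel(trace)
    end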

Propagating change hints through Julia code  Recall that the change hint 훿푋 provides information about how the arguments of a generative function changed (from 푥 to 푥′) to the trace operations update and regenerate. As described above, the SML compiler uses an over-approximation of the change hint at compile time to help statically identify parts of the body that do not need to be re-executed. But the code that SML generates also propagates change hints dynamically through the body of the function. For calls to generative functions, this is straightforward, because the trace operations of the callee consume a change hint and return a change hint for their return value, which can be propagated forward. Propagating change hints through Julia code requires performing a non-standard abstract interpretation of the Julia code. This is currently implemented via function overloading, but other implementation techniques are also possible. The ability to propagate change hints through Julia code to the inputs of callee generative functions is important for achieving asymptotically efficient updates using the generative function combinators described in the next section.
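The following toy illustration (not Gen's actual implementation) shows how a non-standard interpretation via function overloading can propagate change hints through deterministic Julia code:

    # a value tagged with a hint recording whether it may have changed
    struct Hinted{T}
        value::T
        maybe_changed::Bool
    end

    # overloaded operations propagate the hint: an output may have
    # changed only if one of its inputs may have changed
    Base.:+(x::Hinted, y::Hinted) = Hinted(x.value + y.value, x.maybe_changed || y.maybe_changed)
    Base.:*(x::Hinted, y::Hinted) = Hinted(x.value * y.value, x.maybe_changed || y.maybe_changed)

    a = Hinted(2.0, false)   # unchanged since the previous execution
    b = Hinted(3.0, true)    # may have changed
    c = a * b + a            # c.maybe_changed == true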

5.3 Generative function combinators for control flow

Probabilistic models with large numbers of random variables often arise in practice from patterns of repeating computation that possess conditional independencies that can be characterized statically and used to devise efficient inference algorithms. This fact is reflected in existing template-based representations like 'plates', which represent mutually conditionally independent groups of random variables, and dynamic Bayesian networks, which represent repeated application of a fragment of a probabilistic model [66]. This section shows how generative functions for models with large numbers of random choices can be constructed from other generative functions via generative function combinators: functions that, given one or more input generative functions, return a generative function. The generative function combinators introduced in this section describe common patterns of structured control flow. Each combinator uses its own implementation of the generative function and trace data types that is specialized to the static pattern of conditional independence in the probability distributions that it represents. This section gives three examples, shown in Figure 5-3: (i) a generative function that maps another generative function independently across a vector of inputs (appropriate for modeling independently distributed data), (ii) a generative function that 'unfolds' another generative function (appropriate for modeling sequential processes like dynamic Bayesian networks), and (iii) a generative function that implements a basic pattern of recursion (appropriate for modeling processes like probabilistic context free grammars). Note that although these combinators share names with familiar higher-order functions from functional programming in some cases (e.g. 'map', 'unfold'), they are not generative functions themselves.

Figure 5-3: Generative function combinators for common control flow patterns. (a) The Map generative function combinator; (b) the Unfold generative function combinator; (c) the Recurse generative function combinator.

Map combinator  The Map combinator (Figure 5-3a) constructs a generative function 풫′ = (푋′, 푌′, 푝′, 푓′, 푞′) that independently applies a generative function 풫 = (푋, 푌, 푝, 푓, 푞) to each element of a vector of inputs (푥₁, . . . , 푥ₙ) ∈ 푋′ := ⊔_{푚=0}^{∞} 푋^푚 and returns a vector of outputs (푦₁, . . . , 푦ₙ) ∈ 푌′ := ⊔_{푚=0}^{∞} 푌^푚, where ⊔ denotes disjoint union. Each application of 풫 is assigned an address namespace 푖 ∈ {1, . . . , 푛}. The trace data structure of the resulting generative function 풫′ stores a vector of subtraces, one for each application of 풫. The implementation of update for 풫′ only makes recursive calls to trace operations for the inner generative function 풫 for those applications 푖 for which the choice dictionary 휎 contains addresses under namespace 푖, or for which 푥ᵢ′ ≠ 푥ᵢ. Similarly, the implementation of regenerate only recursively invokes regenerate on subtraces for which there exists an address of the form (푖, 푏) in the selected set 퐵 or for which 푥ᵢ′ ≠ 푥ᵢ. These operations use the change hint 훿푋 to determine which applications 푖 have 푥ᵢ′ ≠ 푥ᵢ without requiring a full scan over all 푖, and they can pass change hints through to individual applications if 훿푋 contains this information.
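A usage sketch with Gen.jl's Map combinator (the kernel and inputs are illustrative):

    @gen function kernel(x)
        y ~ normal(x, 1)
        return y
    end

    mapped = Gen.Map(kernel)

    # one application per element; choices live under namespaces 1, 2, 3
    trace = Gen.simulate(mapped, ([1.0, 2.0, 3.0],))
    y2 = trace[2 => :y]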

Unfold combinator  The Unfold combinator (Figure 5-3b) constructs a generative function 풫′ = (푋′, 푌′, 푝′, 푓′, 푞′) that applies a generative function 풫 repeatedly in sequence. Unfold generalizes dynamic Bayesian networks, a common building block of probabilistic models, by representing a Markov chain with state space 푌 and transition 풫 from 푌 × 푍 to 푌. The generative function 풫 maps (푦, 푧) ∈ 푌 × 푍 to a new state 푦′ ∈ 푌. The generative function produced by this combinator takes as input the initial state 푦₀ ∈ 푌, parameters 푧 ∈ 푍, and the number of Markov chain steps 푛 to apply (that is, 푋′ := 푌 × 푍 × {1, 2, . . .}). Each callee application is given an address namespace 푖 ∈ {1, . . . , 푛}. Starting with initial state 푦₀, Unfold repeatedly applies 풫 to each state and returns the vector of states (푦₁, . . . , 푦ₙ) ∈ 푌′ := ⊔_{푚=0}^{∞} 푌^푚. The trace data structure of the resulting generative function stores a vector of subtraces, one for each application of 풫. The combinator's implementation of the abstract data type operations exploits the conditional independence of applications 푖 and 푗 given the return value of an intermediate application 푙, where 푖 < 푙 < 푗. The change hint values 훿푌 returned by each application 푖 of 풫 are used to determine which applications require a recursive call to the data type operations for the subtraces.
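A usage sketch with Gen.jl's Unfold combinator (in Gen.jl, the kernel additionally receives the time index as its first argument; the kernel here is illustrative):

    @gen function step(t::Int, prev_state::Float64, drift::Float64)
        new_state ~ normal(prev_state + drift, 1.0)
        return new_state
    end

    chain = Gen.Unfold(step)

    # arguments: number of steps n, initial state y0, parameters z
    trace = Gen.simulate(chain, (10, 0.0, 0.5))
    state3 = trace[3 => :new_state]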

Recurse combinator  The Recurse combinator (Figure 5-3c) takes two generative functions, a production function 풫p and a reduction function 풫r, and returns a generative function 풫′ = (푋, 푌, 푝, 푓, 푞) that (i) recursively applies 풫p to produce a tree of values, then (ii) recursively applies 풫r to reduce this tree to a single return value. Recurse is used to implement recursive computation patterns including probabilistic context free grammars, which are a common building block of probabilistic models [38, 54, 65]. Letting 푏 be the maximum branching factor of the tree, 풫p maps 푋 to 푉 × (푋 ∪ {⊥})^푏 and 풫r maps 푉 × (푌 ∪ {⊥})^푏 to 푌, where ⊥ indicates no child. The update implementation uses change hint values 훿푋 from the production and reduction functions to determine which subset of production and reduction applications require a recursive call to update. A recursive DML function that is semantically equivalent to that produced by the Recurse combinator is shown below, followed by Julia code that constructs a semantically equivalent generative function using the Recurse combinator. The function invokes generative functions production and reduction (definitions not shown).

@gen function f(x)
    (v, xs) = ({:prod} ∼ production(x))
    ys = Dict()
    for i in 1:length(xs)
        ys[i] = ({i} ∼ f(xs[i]))
    end
    y = ({:red} ∼ reduction(v, ys))
    return y
end

f = Gen.Recurse(production, reduction)

A particle filtering benchmark  Table 5.2 shows a performance comparison of several implementations of particle filtering (Section 3.5) for object tracking using a nonlinear state-space model. An object is assumed to move along a piecewise-linear path at constant speed, with Gaussian noise added to the distance traveled at each time step. The measurement model also assumes additive Gaussian noise. The task is to track the object over time along its assumed path. We evaluated two particle filtering inference algorithms implemented in Gen using the Unfold combinator. The first uses a generic proposal distribution based on forward sampling, which is the automatically generated internal proposal family (Section 4.1). The second uses a custom proposal derived by manual analysis of the model and expressed in Gen's DML (Section 3.5). We compared these implementations to particle filtering implementations in Turing [40], Anglican [130], and Venture [79], none of which supported custom particle filtering proposals. The results show that the custom proposal gives accurate results in an order of magnitude less time than the generic proposal. Furthermore, the Gen implementation using the generic proposal significantly outperforms the Anglican, Turing, and Venture implementations of the same algorithm. Note that each of these implementations scales linearly in the number of time steps, whereas an algorithm implemented using only Gen DML and without Unfold would scale quadratically in the number of time steps, because each time step would require a full end-to-end execution as part of the update operation. The running time is measured as the median runtime across multiple replicates required to achieve a mean log marginal likelihood estimate above a threshold within two nats of a gold-standard estimate. A sketch of the particle filter loop, as written with Gen's inference library, follows Table 5.2.

Implementation        Proposal distribution    Running time
Gen (DML + Unfold)    Custom                   4.9ms (±0.07)
Gen (DML + Unfold)    Generic                  82ms (±3.6)
Anglican              Generic                  275ms (±11)
Turing                Generic                  1174ms (±25)
Venture               Generic                  > 10⁶ ms

Table 5.2: Performance comparison of particle filtering implementations
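A hedged sketch of such a particle filter using Gen.jl's inference library (the model, its observation addresses (:obs => t), and the data are illustrative; the model's first argument is assumed to be the number of time steps):

    state = Gen.initialize_particle_filter(model, (0,), Gen.choicemap(), 100)
    for t in 1:num_time_steps
        # resample when the effective sample size drops too low
        Gen.maybe_resample!(state, ess_threshold=50)
        obs = Gen.choicemap((:obs => t, data[t]))
        # extend each particle's trace by one time step via update
        Gen.particle_filter_step!(state, (t,), (Gen.UnknownChange(),), obs)
    end
    log_ml_estimate = Gen.log_ml_estimate(state)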

A structure inference benchmark  Table 5.3 shows a performance comparison for MCMC inference in a Gaussian process structure learning model based on a fully Bayesian adaptation [113, 109] of the model of [77]. The task is to infer the covariance function of a Gaussian process (GP) model of a time series data set, where the prior on covariance functions is defined using a probabilistic context free grammar (PCFG). The MCMC kernel is a trans-dimensional kernel that modifies the parse tree of the PCFG. We evaluated a Gen implementation of the kernel with and without the Recurse combinator, which automatically caches intermediate covariance matrix computations. The Gen DML implementation uses Julia recursion in DML to sample from the PCFG. The Gen 'SML + DML + Recurse' implementation uses SML for the top-level generative function, a generative function constructed by Recurse for the PCFG-based prior distribution on covariance functions, and DML generative functions for the production and reduction kernels that are passed into the Recurse combinator. The Gen DML implementation gives a 40x speedup over Venture, even without using Recurse or SML, and performs comparably to a hand-coded Julia implementation that does not cache intermediate covariance matrix computations. Using SML and Recurse gives a 1.7x speedup over the hand-coded implementation, due to the Recurse combinator's automatic reuse of cached intermediate covariance matrix computations, which is possible due to propagation of change hints within the tree of production and reduction applications. Interestingly, the DML implementation outperforms the Venture implementation, despite the fact that Venture's backend avoids unnecessary recomputation via dynamic dependency tracking and the DML implementation does not. This is explained by the high constant-factor overhead of Venture's backend relative to Gen's trace data type implementations. Because Venture uses a single generic backend implementation, in order to achieve asymptotically efficient updates it implements fine-grained incremental computation for all code written in its modeling language. This leads to a complex data structure for storing random choices and a dynamic dependency graph with high overhead. In contrast, because Gen uses abstract data types for traces, and allows for interoperable and specialized implementations of these data types, Gen avoids the need for fine-grained dynamic dependency tracking of arbitrary model code: Gen's combinators (in this case, Recurse) can achieve asymptotically efficient updates based on coarse-grained and static patterns of conditional independence, and the DML implementation does not need to employ any dependency tracking, which vastly reduces constant-factor overhead.

Implementation                 Running time (per step)
Gen (SML + DML + Recurse)      2.57ms (±0.09)
Julia (handcoded)              4.73ms (±0.45)
Gen (DML)                      6.21ms (±0.94)
Venture                        279ms (±31)

Table 5.3: Performance of different implementations of a trans-dimensional MCMC kernel

A Bayesian robust regression benchmark  We also benchmarked several inference implementations applied to a Bayesian robust regression problem. The aim is to infer the slope and intercept of a linear model of data from observations that include outliers. Table 5.4 shows a comparison of Gen and Venture on an uncollapsed variant of the model that contains explicit discrete random choices indicating whether each data point is an inlier or an outlier. A Gen implementation using SML and the Map combinator gives a >200x speedup over Venture for two inference algorithms (Metropolis-Hastings moves on both the parameters and the outlier indicator variables, and gradient-based optimization on the parameters combined with Metropolis-Hastings moves on the outlier indicator variables). The incremental computation optimizations enabled by SML and the Map combinator give a roughly 100x speedup over the DML implementation, which does not exploit incremental computation and scales as 푂(푛²), where 푛 is the number of observed data points (푛 is ∼500 in the experiments). The ± values in the table indicate the inter-quartile range across many runs of each step. Table 5.5 shows a comparison of a Gen implementation of a Gaussian random walk Metropolis-Hastings algorithm with implementations in Anglican and Venture for a collapsed variant of the model, where the discrete random choices for the outlier indicators are manually eliminated from the model by summing them out. For each implementation we measured the running time needed to exceed an accuracy threshold on held-out data. The Gen implementation is 10x faster than the Anglican implementation and many orders of magnitude faster than the Venture implementation.

Implementation     Inference Algorithm            Running time (per step)
Gen (SML + Map)    Custom Metropolis-Hastings     64ms (±1)
Gen (DML)          Custom Metropolis-Hastings     7,376ms (±87)
Venture            Custom Metropolis-Hastings     15,910ms (±500)
Gen (SML + Map)    Gradient-Based Optimization    74ms (±2)
Gen (DML)          Gradient-Based Optimization    7,384ms (±85)
Venture            Gradient-Based Optimization    17,702ms (±234)

Table 5.4: Performance comparison of inference operations for Bayesian robust regression

Implementation     Running time (to converge)
Gen (SML + Map)    75.3ms
Anglican           783ms
Venture            1.3×10⁶ ms

Table 5.5: Performance comparison of MCMC algorithms for Bayesian robust regression

5.4 Domain-specific generative functions

The DML and SML modeling languages and the generative function combinators discussed above are general-purpose: they are not specialized to any given application domain or class of models. They are also completely interoperable, in that generative functions compiled using any of these languages or combinators can internally invoke generative functions compiled from any of the other languages or constructs. And they can invoke essentially arbitrary pure-functional general-purpose code. While these dimensions of flexibility are useful, they are in tension with performance. High-performance implementations of the generative function and trace ADTs require specialization to the particular structures and properties of the generative function. The DML does not exploit any such structure in its generative function and trace implementations, and the SML and combinators only exploit high-level conditional independence structure.

While DML, SML, and combinators provide a reasonable baseline level of performance, it is possible to improve performance further by implementing more specialized generative functions. There is a wide gamut of potential specializations, ranging from hand-crafting a trace implementation that optimizes the memory layout of the trace data structure for a user's model, to leveraging a specialized runtime like TensorFlow, to using dynamic programming in a restricted modeling language to construct a more efficient internal proposal distribution. We now discuss a few types of more specialized implementations of the generative function and trace ADTs that are possible.

Wrapping TensorFlow and PyTorch computations in generative functions  It is possible to compile generative functions from code written in domain-specific languages like TensorFlow and PyTorch that use runtime systems specialized for certain computational workloads. The resulting ADT implementations invoke these alternate runtime systems as part of every generative function and trace operation. Constructors for generative functions based on pure-functional deterministic TensorFlow and PyTorch code have been implemented. In particular, the gradient trace operation is implemented using the automatic differentiation capabilities of TensorFlow and PyTorch respectively. For DML and SML generative functions that invoke a generative function compiled from one of these other languages, the gradient operation combines automatic differentiation of Julia code with automatic differentiation of the specialized runtime system. The code below shows the construction of a TensorFlow-based generative function (neural_net), followed by a DML generative function (inference_model) that invokes it. In particular, the DML function implements a discriminative inference model that takes as input an image and passes the image to neural_net, which returns parameters of beta distributions from which the DML function samples at addresses rot, left_x, etc.

tf = pyimport("tensorflow")
image_flat = tf.placeholder(..)
W_conv1 = tf.Variable(..)
b_conv1 = tf.Variable(..)
...
output = tf.add(..)
neural_net = GenTF.TFFunction(
    [W_conv1, b_conv1, ..],
    [image_flat], output)

@gen function inference_model(image)
    nn_output ∼ neural_net(image)
    rot ∼ beta(nn_output[1], nn_output[2])
    left_x ∼ beta(nn_output[3], nn_output[4])
    right_y ∼ beta(nn_output[5], nn_output[6])
    right_z ∼ beta(nn_output[7], nn_output[8])
    ...
end

A specialized generative function for discrete hidden Markov models  Discrete-valued hidden Markov models [99] are a popular class of models because they admit efficient algorithms for inference over latent states and marginal likelihood evaluation, via the forward algorithm and forward-filtering backward-sampling. These algorithms are based on dynamic programming, and are specialized to both the graph structure of hidden Markov models (a chain) and the domain of the individual random choices (discrete). Importantly, these algorithms are exact, not approximate, and have predictable running time (for a chain with 푇 time steps and 퐾 possible hidden states per time step, the forward-filtering backward-sampling algorithm can be used to sample from the conditional distribution of hidden states given observations in time 푂(퐾²푇)). It is possible to leverage these specialized algorithms within a Gen model by implementing a generative function that uses them as the basis of an optimal internal proposal family (Section 4.1) that samples from the conditional distributions 푝(·|휎) on random choices given choices 휎. It is also possible to implement a generative function that implements a collapsed HMM. For example, the generative function collapsed_hmm below takes as input the parameters of an HMM, and encodes a probability distribution on the observed variables only (with addresses 1, . . . , 푛). Each evaluation of 푝(휏; 푥) within collapsed_hmm triggers a run of the forward-backward algorithm to compute the marginal likelihood. In the context of the DML generative function hmm_based_model that calls collapsed_hmm, the observed states have addresses (y, 1), . . . , (y, 푛). Note that the author of hmm_based_model does not need to know that collapsed_hmm is a specialized generative function implementation that uses dynamic programming internally, because it behaves externally the same way as all other generative functions.

@gen function hmm_based_model(n)
    p ∼ dirichlet(10)
    E ∼ emission_matrix_prior()
    T ∼ transition_matrix_prior()
    y ∼ collapsed_hmm(p, E, T, n)
end

(Accompanying diagram: a graphical model with latent nodes 푝, 퐸, 푇 and observed nodes (y, 1), (y, 2), (y, 3), (y, 4).)
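The marginal likelihood computation inside collapsed_hmm can be sketched as follows (a hypothetical implementation; the forward pass alone suffices for the marginal likelihood, while the backward pass is needed for sampling latent states):

    function hmm_log_marginal_likelihood(init, trans, emit, ys)
        # init[k]: prior probability of state k; trans[k, l]: probability
        # of moving from state k to state l; emit[k, y]: probability of
        # emitting observation y from state k; runs in O(K^2 T) time
        alpha = init .* emit[:, ys[1]]
        log_ml = log(sum(alpha))
        alpha = alpha / sum(alpha)
        for t in 2:length(ys)
            alpha = (trans' * alpha) .* emit[:, ys[t]]
            s = sum(alpha)          # rescale for numerical stability
            log_ml += log(s)
            alpha = alpha / s
        end
        return log_ml
    end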

5.5 Related work

Conditional independence and probabilistic inference  Conditional independence of random variables is the basis of the 'graphical models' approach to probabilistic modeling and inference [66], in which the presence of conditional independencies admits the construction of efficient inference algorithms. The graphical models paradigm includes modeling constructs for expressing conditional independence, including Bayesian networks (directed acyclic graphs where nodes are random variables), 'plate notation', and dynamic Bayesian networks. The graphical models paradigm is most adept at representing static conditional independencies that always hold, but it has also formalized notions of context-specific conditional independencies that only hold when certain random variables take specific values. Although conditional independence of random variables is usually studied in the setting of a fixed-length vector of random variables, related types of structure have been studied in more general classes of probabilistic models. Examples include open-universe probabilistic models [84, 83] and infinite contingent Bayesian networks [85], probabilistic models and inference algorithms from Bayesian nonparametrics [64], and dynamic programming algorithms for stochastic context free grammars [68].

Incremental computation in probabilistic programming languages  A number of sampling-based probabilistic programming systems use the universal Metropolis-Hastings (MH) MCMC inference algorithm that was introduced in Church [51] and later generalized for use with lightweight embeddings of probabilistic modeling languages in host languages [127]. The algorithm, sometimes called 'lightweight MH', benefits from incrementalization because it involves updating one random choice at a time in an execution of the program that represents the probabilistic model. A naive implementation of the algorithm involves interpreting a variant of the program end-to-end, and computing the MH acceptance probability by separately computing the numerator and denominator, even when most factors in the numerator and denominator cancel due to conditional independencies in the model. A number of approaches have been taken to incrementalize this and related algorithms by leveraging conditional independencies. The Shred [132] tracing interpreter for Church recognizes the opportunity for specialization of the inference algorithm to traces of a certain structure (i.e. control flow path), and compiles and caches MH kernel implementations that are specialized to each structure that is encountered throughout the MH algorithm. Shred JIT-compiles straight-line traces of the probabilistic model and applies a reaching analysis ('slicing') to these traces to reduce redundant computation in the MH acceptance probability calculation. The C3 implementation [103] of lightweight MH wraps call sites in the model with a cache lookup, transforms the model to continuation-passing style (CPS), and maintains a call stack data structure. One essential difference of these two approaches from Gen's is that they are specialized to one particular inference algorithm (lightweight MH), whereas Gen's trace data type supports an open-ended set of inference algorithms via the much more flexible update operation. A crucial feature of update is that it can modify the values of multiple random choices at once; this permits the use of proposal distributions that are specialized to the model, as opposed to the generic forward sampling proposal distribution that is employed within lightweight MH. Furthermore, update can be used to implement algorithms other than MCMC, including various sequential Monte Carlo [33] algorithms, annealed importance sampling [91], and numerical and stochastic local optimization. However, it should be possible to use Shred's JIT specialization approach as the basis of a more optimized trace data type for a variant of Gen's dynamic modeling language. Gen's SML trace implementation already does exploit JIT specialization of trace operations to the set of random choices being manipulated (e.g. the set of choices being changed in a call to update), but SML does not itself support control flow (control flow is instead handled by other modeling constructs, including generative function combinators and DML functions).

Although the Venture [79] probabilistic programming system also builds on ideas from Church [51] and employs incremental computation, it is distinct from Shred and C3 because its data structures are designed to simultaneously support multiple inference algorithms, including algorithms that involve updating multiple random choices at once. Venture uses a custom interpreter for a pure variant of Scheme [100] that constructs a directed acyclic graph that contains one or more nodes for every application of eval. Edges in this graph reflect dependencies between evaluated expressions. To compute the set of re-evaluations that are necessary for a given set of proposed-to choices, a walk of this graph is performed starting at the nodes corresponding to the proposed-to choices. This data structure is used as the basis for several inference algorithms, including a restricted form of Metropolis-Hastings that proposes to multiple random variables at once using forward sampling, and a restricted form of sequential Monte Carlo that extends a model with additional statements using forward sampling as the proposal for any new latent variables. While Venture's approach to incremental computation is more flexible than that of Shred and C3, it has high constant-factor overhead.
Gen's use of incremental computation differs from that of Venture in several respects. First, Gen's trace data types are significantly more flexible than those of Venture, because they allow arbitrary modifications to the values of random choices using unique user-defined addresses. Second, whereas Venture uses a single generic implementation of the trace data structure for all models, Gen specializes the trace data structure to the model. This allows Gen to perform much of the computation for identifying dependencies statically, during generation of its specialized trace data type implementation, instead of dynamically as in Venture, leading to very large performance improvements on benchmarks that compare implementations of the same algorithm in the two systems. The subset of Gen's current modeling constructs that do exploit incremental computation (SML and certain generative function combinators) is less expressive than Venture's modeling language. However, Gen's DML is equally expressive to Venture's modeling language and incurs much less constant-factor overhead, making it substantially more performant even for traces with reasonably large numbers of random choices, for which we would expect incremental computation to provide significant gains. Furthermore, in Gen it is possible to construct models from a combination of DML, SML, and combinators, which allows for selective use of compile-time trace specialization for certain parts of the model. It should be possible to implement a highly expressive Gen modeling language that uses a trace data type implementation based on Venture's dynamic dependency graph and is interoperable with Gen's existing modeling constructs, although it remains to be seen whether this implementation strategy can be made performant enough to compete with Gen's current modeling constructs. Finally, Venture automatically performs incremental computation in certain cases of exchangeable coupling between random choices. Gen's current modeling constructs do not automate this, but devising modeling constructs (e.g. generative function combinators) that do is an interesting area for future work. One potential challenge is that random choices that are distant in the call tree of the model can be exchangeably coupled, and updates to these choices therefore require non-local communication between parts of the trace, which may complicate compile-time specialization of traces.

Effect handlers in probabilistic programming system implementations  Effect handlers are also used in the implementations of several other probabilistic programming systems, although in a different manner from how they are used in Gen. There are two main ways that effect handlers have been used in other probabilistic programming systems. First, systems including WebPPL [52], Anglican [130], and Turing [40] use a separate effect handler for each inference algorithm that is supported by their respective inference engines. Random choice statements (e.g. sample) and in some cases observe or factor statements in the modeling languages of these systems are associated with a special effect handler for each algorithm. Therefore, extending these systems with new inference algorithms requires understanding and extending the implementation of the probabilistic modeling language with a new effect handler.

Other systems, including Pyro [13] and Edward2 [86], also use effect handlers to implement non-standard interpretations of probabilistic modeling code as part of inference, but each handler implements a lower-level behavior than those of WebPPL, Anglican, and Turing. In Pyro and Edward2, the effect handlers are intended to be composed to generate more complex behaviors [86]. This allows new inference algorithms to be implemented more concisely in some cases than in WebPPL, Anglican, and Turing. However, implementers of new inference algorithms still need to understand these effect handlers and how they are implemented by the runtime system, and may need to implement new ones.

In contrast to the two approaches discussed above, in Gen effect handlers do not appear in user inference code. Instead, effect handlers are one particular implementation strategy that can be used to implement the trace abstract data type. Some of the effect handlers used in the implementation of Gen's Dynamic Modeling Language are similar to those used in Pyro. However, other Gen modeling languages do not use effect handlers at all, but instead are based on other implementation strategies, including source-to-source transformation. Indeed, the purpose of Gen's abstract trace data type is to decouple language implementation details (like effect handlers) from user inference code, making inference code simpler and easier to understand and extend. The trace abstract data type also serves as a type of foreign function interface, which allows generative functions in general-purpose modeling languages to invoke generative functions compiled from much more specialized languages with higher-performance and more specialized implementations. These callee generative functions may also use their own specialized inference algorithms internally to implement the trace abstract data type more efficiently (e.g. exploiting dynamic programming in the case of factor graph and hidden Markov model generative functions). The composable effect handler architecture of Pyro and Edward2 is not designed to support this type of extensibility, and is rooted in the assumption that a single runtime system is used for the entire model. However, by committing to a small interface to the modeling language runtime (the trace abstract data type), Gen does sacrifice some potential for specialization of low-level inference code to the inference algorithm context. For example, the NumPyro [96] implementation of Pyro is able to perform optimizations like automatic vectorization on the implementations of inference algorithms, whereas Gen encapsulates the low-level operations that comprise inference algorithms within the trace abstract data type, making dynamic transformations and other optimizations of high-level inference algorithm logic less natural.

Static compilers for probabilistic programming languages  Gen statically compiles model code into artifacts representing generative functions that have associated specialized trace data type implementations. While static compilers for probabilistic programming systems have been developed [131, 60], these compilers are responsible for generating inference implementations for more general-purpose inference algorithms. In contrast, Gen does not currently statically compile the inference algorithm, which is composed by user and library Julia code on top of the trace data type. Instead, Gen uses static compilation primarily as a means to statically specialize trace implementations to the structure of models, agnostic to the inference algorithm. JIT compilation of some trace operations for certain modeling languages does specialize to limited inference algorithm context, but this context is limited to the set of addresses involved in the trace operation, and not the whole inference algorithm. Development of inference compilation techniques that retain the flexibility of Gen's inference programming capabilities is an interesting area of future work.

Chapter 6

Applications

The previous chapters described Gen's design and implementation. This chapter describes several applications of Gen. The applications utilize Gen's modeling flexibility: the models make use of black-box simulators, stochastic discrete structures, and both discrete and continuous random variables. The applications also utilize Gen's inference algorithm flexibility, in the form of Monte Carlo inference, hybrid Monte Carlo and neural inference, surrogate models, and custom trans-dimensional MCMC kernels.

6.1 Inference in generative models of intelligent behavior

This section describes an application of Gen to the task of interpreting the behavior of an intelligent agent. The application adopts the inverse planning paradigm [8], in which we make inferences about the intentions or beliefs of an intelligent agent from their behavior, using probabilistic inference in a generative model that explains behavior as arising from rational pursuit of objectives given beliefs. The section presents several variants of the model and the inference algorithm, implemented using Gen. The models exercise the expressiveness of Gen's modeling languages via stochastic control flow and the use of black-box code simulators. The inference constructs used include importance sampling and MCMC with data-driven proposals (Section 3.2 and Section 3.4.2), the composite kernel DSL (Section 3.4.4), resample-move particle filtering (Section 3.5), involutive MCMC (Section 3.7), and the ability to construct specialized domain-specific ADT implementations (Chapter 5). An earlier version of the modeling approach used for this application was described by Cusumano-Towner et al. [31].

6.1.1 An algorithmic generative model of goal-directed movement

The code below defines a generative probabilistic model of a person's goal-directed movement throughout an indoor environment using Gen's DML:

@gen function agent_model(floorplan, start_location, num_time_steps)
    destination ∼ random_location(floorplan)
    path = plan_path(floorplan, start_location, destination)
    observations ∼ noise_model(path, num_time_steps)
end

The model takes three arguments, which represent information that is assumed to be known: the floorplan of the indoor environment (including walls and other obstacles), the starting location of the person, and the number of time steps at which the person's location will be observed. The first statement samples the destination of the agent from a uniform prior distribution on the free space in the floorplan.

destination ∼ random_location(floorplan)

Here, random_location is a custom family of probability distributions on values in ℝ². A possible value for the destination is shown as a red dot in Figure 6-1a, and the starting location is shown in blue. Next, we invoke the Julia function plan_path, which takes the floorplan, start location, and destination location, and uses a path planning algorithm to generate a path from the start location to the destination location that does not intersect with any walls or obstacles in the floorplan:

path = plan_path(floorplan, start_location, destination)

Note that the ability to include essentially arbitrary executable code in generative models allows the use of algorithmic models of complex phenomena. Here we are using a path planning algorithm to model the complex goal-directed behavior of an intelligent agent. This model assumes the agent is approximately rational and takes relatively efficient routes from its current location to its destination. It would be more difficult to express such a model mathematically, but here we can repurpose off-the-shelf code for use in our model. The algorithm internally constructs a rapidly exploring random tree [69] (Figure 6-1b) and uses this tree together with trajectory optimization to generate the path (Figure 6-1c). The internal randomness used in the path planning algorithm is encapsulated randomness (Section 4.5). The next line in the DML function body samples hypothetical observations

using a generative function (noise_model) that simulates the person walking along the path with random perturbations as well as measurement noise (Figure 6-1d):

Figure 6-1 (panels a-d): A prior sample from a generative model of goal-directed intelligent behavior

188 observations ∼ noise_model(path, num_time_steps)

Unlike the sampling statement for destination, noise_model is a generative function, not a distribution, and it generates many random choices recorded in the trace at addresses that are nested under the address namespace :observations. In this case, the sub-addresses are simply the integers 1 through num_time_steps, and each sub-address contains the hypothetical observed location at one time step. The generative function noise_model uses a specialized ADT implementation (Section 5.4) that implements a probabilistic variant of dynamic time warping [111] to account for variability in the person's progress along their path, combined with normally distributed sensor noise. Specifically, we assume a nominal speed for the agent along their path, then allow the agent some probability of either taking no steps or taking an extra step at each time step, and we marginalize out these discrete random variables using the forward-backward dynamic programming algorithm.

6.1.2 A simple and generic inference implementation

The following Julia code takes as input the floorplan (floorplan, definition not shown), the starting location (start_location, definition not shown), and an array (data, definition not shown) of num_time_steps observed locations on the floorplan. It generates approximate conditional samples of the destination location using self-normalized importance sampling with the default internal proposal family (Algorithm 13) applied to the generative model above.

observations = Gen.choicemap()
for i in 1:20
    observations[:observations => i] = data[i]
end
args = (floorplan, start_location, 20)
traces, weights = Gen.importance_sampling(agent_model, args, observations, 100)
traces = [traces[categorical(weights)] for _ in 1:length(traces)]
destination_samples = [trace[:destination] for trace in traces]

The first lines populate a choice dictionary with the observed data at the hierarchical address :observations => i:

observations = Gen.choicemap()
for i in 1:20
    observations[:observations => i] = data[i]
end

Next, we invoke the importance_sampling function from Gen's inference library, passing the arguments to the model generative function that contain values known a priori, to produce a weighted collection of traces that are approximately distributed according to the conditional distribution:

args = (floorplan, start_location, 20)
traces, weights = Gen.importance_sampling(agent_model, args, observations, 100)

We then extract the destination from each trace and resample in proportion to the weights:

traces = [traces[categorical(weights)] for _ in 1:length(traces)]
destination_samples = [trace[:destination] for trace in traces]

The results for an observed trajectory are shown in Figure 6-2. Each red dot shows a destination sampled approximately from the conditional distribution given the observed locations, which are shown in black. Note that there is substantial uncertainty about the destination: the conditional distribution is approximately uniform within the set of three rooms in the lower-left corner.

Figure 6-2: Inferences about a person’s destination from their observed movement

The inference code above is one of the simplest types of inference programs that users can write using Gen. Generic inference programs like this can give reasonable results on easy inference problems, but are not efficient, and do not scale to harder inference problems. Next, we elaborate on the generative model and the inference algorithm.

6.1.3 Adding uncertainty about structure and stochastic control flow

In addition to algorithmic models, another modeling capability that is useful for cognitively inspired generative models is uncertainty about model structure. Gen users express uncertainty about model structure by writing models where the set of addresses of random choices that are sampled is itself random. Like agent_model above, the DML function leave_model below defines a generative model of the motion of a person moving around an indoor space, but is more complex:

@gen function leave_model(floorplan, start_location, num_time_steps)
    num_attempts ∼ geometric(0.5)
    path = [start_location]
    previous_location = start_location
    for i in 1:num_attempts
        current_location = ({(:search_loc, i)} ∼ likely_key_location())
        append!(path, plan_path(floorplan, previous_location, current_location))
        previous_location = current_location
    end
    destination ∼ location_near_exit()
    append!(path, plan_path(floorplan, previous_location, destination))
    observations ∼ noise_model(path, num_time_steps)
end

Specifically, leave_model models the behavior of a person who seeks to leave their apartment, but searches for their keys first. The number of times they search for their keys is random (sampled from a geometric distribution). The person knows where they typically store their keys, and they tend to search for their keys in these locations (likely_key_location is a probability distribution on locations within the floorplan that is biased towards these locations; see the left of Figure 6-3, where green and yellow indicate high probability density). According to leave_model, after the person finds their keys, they head for a location near the exit of the apartment (in the lower right corner, near the green dots in Figure 6-3). Whereas the previous model assumed the person had a single destination and walked an efficient route from their starting location to the destination, this model posits the existence of an unknown number of events (searching for keys) and assumes that the person walks efficient routes between these search locations and from the final search location to the exit. Therefore, this model generates more complex trajectories. The right of Figure 6-3 shows six prior samples from leave_model, where the start location is shown in blue, the exit location is shown in green, and purple dots are locations where the person searched for their keys. The person searches for their keys a random (geometrically-distributed) number of times before they find them, and then leaves the apartment.

Figure 6-3: Samples from a model of intelligent behavior with stochastic structure

Combining two models using stochastic branching  If we knew the person was leaving the apartment, then we could use the model leave_model to infer whether the person has their keys, and where and when they searched for their keys as they moved around the apartment. These are more complex explanations than were possible with agent_model. However, suppose we do not know whether the person is leaving the apartment. To infer whether they are leaving, we need another generative model of their movement patterns in the condition where they are not leaving. Consider stay_model, defined below:

@gen function stay_model(floorplan, start_location, num_time_steps)
    num_waypoints ∼ poisson(2)
    path = [start_location]
    previous_location = start_location
    for i in 1:num_waypoints
        current_location = ({(:waypoint, i)} ∼ random_location(floorplan))
        append!(path, plan_path(floorplan, previous_location, current_location))
        previous_location = current_location
    end
    observations ∼ noise_model(path, num_time_steps)
end

This model is similar to leave_model, but instead of searching for their keys and then heading for the exit, this model simply posits an unknown number of 'waypoints' of unknown purpose. The number of waypoints is random and Poisson-distributed with mean 2, and the waypoints are sampled at random locations in the floorplan. Samples from the prior distribution of stay_model are shown in Figure 6-4a. To infer whether the person is leaving or not, we could explicitly perform Bayesian model comparison, by separately computing estimates of how well each of the two models explains a given set of observed data, and comparing these values.¹ Instead, we will exercise the flexibility of Gen's modeling languages again, and take a more efficient approach—we write a single model that combines the two models into one using a stochastic if-else branch. Note that both models end in the same way, by sampling the observations from the noise model. Therefore, we refactor leave_model and stay_model by removing this common line of code and instead returning the path from both functions. The new model is:

@gen function combined_model(floorplan, start_location, num_time_steps)
    if ({:is_leaving} ∼ bernoulli(0.1))
        path = ({:leave} ∼ leave_model(floorplan, start_location))
    else
        path = ({:stay} ∼ stay_model(floorplan, start_location))
    end
    observations ∼ noise_model(path, num_time_steps)
end

Prior samples from combined_model are shown in Figure 6-4b. Note how generative models in Gen are composable and easy to modify, combine, and refactor because they are based on functions. By doing inference in the combined model, we will infer not only whether the person is leaving or staying (by reading off the Boolean value at address is_leaving in each inferred trace), but also the specific sequence of waypoints or search locations.
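For example, given traces and normalized weights from self-normalized importance sampling applied to combined_model (as was done for agent_model in Section 6.1.2), the posterior probability that the person is leaving can be estimated as a weighted count; a minimal sketch:

# A minimal sketch, assuming `traces` and normalized `weights` come from
# self-normalized importance sampling applied to combined_model.
p_leaving = sum(w for (t, w) in zip(traces, weights) if t[:is_leaving]; init=0.0)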

¹For example, averaging the importance weights returned for a model gives an estimate of the marginal likelihood of that model, and we could multiply the marginal likelihood ratio by the prior ratio (which expresses the prior probability of one model versus the other) to obtain the Bayes factor and then the posterior probabilities of the two models.

6.1.4 A sequential Monte Carlo inference algorithm

The Julia code in Figure 6-5 below is application inference code that implements an efficient sequential Monte Carlo [35] algorithm for inference in combined_model. We now highlight key elements of this code. First, the program creates and manipulates traces of combined_model using Gen.generate (Line 10) and Gen.update (Line 32), which are two of the operations supported by the trace data type, as well as a custom MCMC kernel (custom_kernel, on Line 24), which is a Julia function that takes a trace as input and returns a new trace as output. The program is written in a straightforward, familiar Julia style, and makes use of Julia's control flow (loops), including its multi-threading features (Threads.@threads). The program begins by obtaining initial traces of the model where num_time_steps is 1; the initial values for each random choice besides the observation are produced by Gen.generate using the model's internal proposal distribution, which is encapsulated by the trace data type. The outer loop (Line 13) iterates over time steps, and alternates between culling low-weight traces (Lines 15-17), doing MCMC inference on each trace given the observed data so far (Lines 19-26), and extending the traces to the next time step and incorporating the new observed data point (Lines 28-33). The application code closely mirrors the pseudocode for the resample-move particle filter algorithm (Algorithm 7), which has also been added to Gen's inference library (although users may want to re-implement it for more control over the flow and logging).

(a) Prior samples from stay_model. The start location is shown in blue, and waypoints (of unknown purpose) are shown in pink. There is a Poisson-distributed number of waypoints, and waypoints are sampled uniformly at random from locations in the floorplan.

(b) Prior samples from combined_model. With probability 0.1, the agent is leaving (and otherwise staying). In one of the samples shown here (rightmost), the agent is leaving.

Figure 6-4: Prior samples from alternate models of a person's activity and motion

 1  # Given: data (an array of noisy measurements of the 2D location)
 2  num_particles = 128
 3  traces = Vector{Trace}(undef, num_particles)
 4  weights = Vector{Float64}(undef, num_particles)
 5
 6  # obtain initial traces, which include the observation for the first time step
 7  observations = Gen.choicemap(); observations[:observations => 1] = data[1]
 8  arguments = (floorplan, start, 1)
 9  for i in 1:num_particles
10      traces[i], weights[i] = Gen.generate(combined_model, arguments, observations)
11  end
12
13  for t in 2:length(data)
14
15      # duplicate traces with high weights, delete traces with low weights
16      normalized_weights = normalize(weights)
17      traces = [traces[categorical(normalized_weights)] for i in 1:num_particles]
18
19      # apply rejuvenation MCMC kernels to each trace in parallel
20      Threads.@threads for i in 1:num_particles
21          for iteration in 1:20
22
23              # the custom MCMC kernel is defined with Gen's composite kernel DSL
24              traces[i] = custom_kernel(traces[i])
25          end
26      end
27
28      # extend traces to the next time step, incorporating new data
29      observations = Gen.choicemap(); observations[:observations => t] = data[t]
30      new_arguments = (floorplan, start, t)
31      for i in 1:num_particles
32          traces[i], weights[i] = Gen.update(traces[i], new_arguments, observations)
33      end
34  end

Figure 6-5: Gen implementation of an SMC algorithm for inferring a person's destination

Composing a custom rejuvenation MCMC kernel  The part of the inference algorithm that is most heavily specialized to the model at hand is the rejuvenation MCMC kernel (custom_kernel). While users are free to define this kernel using plain Julia code, we define it using the composite kernel DSL (Section 3.4.4) in Figure 6-6. This composite MCMC kernel applies a sequence of primitive MCMC kernels. First, it applies a selection MH (Section 4.3) kernel (using Gen.mh, Line 2) that proposes new values of all latent random choices using the model's encapsulated internal proposal. Next, it applies an involutive MCMC (Section 3.7) kernel (switch_kernel, Line 3) that proposes to switch branches (inverting the Boolean value is_leaving). This MCMC kernel will be discussed in more detail shortly. Finally, the composite kernel applies a different sequence of kernels depending on which branch the trace has taken (Line 4).

A specialized reversible jump move for efficiently switching branches  Consider the kernel switch_kernel, which is responsible for proposing a switch of the branch. This kernel is constructed using the involutive MCMC construction of Section 3.7. Specifically, it is composed from an auxiliary generative function (code not shown) and a trace transform (switch_bijection, defined in Figure 6-8):

switch_kernel(trace) = Gen.mh(trace, q_switch, (), switch_bijection)

The bijective trace transform switch_bijection in Figure 6-8 defines a transformation on traces of the model that first switches the branch and then populates the values of random choices in one branch using the values of random choices in the other branch. Note that most other universal probabilistic programming systems lack the ability to reuse values in one branch when switching branches. Instead, they rely on a generic mechanism for filling in values in the new branch: the new branch is executed from scratch, ignoring the values in the old branch. Although this generic approach to MCMC inference over structure converges to the correct distribution if repeated infinitely many times, generic proposals of this form amount to random guessing and are unlikely to be accepted, making inference over structure inefficient. Recognizing that 'waypoints' in the 'stay' model have a related semantics to 'search locations' in the 'leave' model, our switch_bijection reinterprets waypoints as search locations and vice versa, making proposals much more likely to be accepted. Effectively, we are reusing the inference already done for one control flow path to jumpstart inference for the new control flow path. Figure 6-7 shows a state in which is_leaving is true (middle) and new states with is_leaving set to false that are proposed by our specialized kernel switch_kernel (left) and a generic kernel (right), respectively. The specialized kernel proposes a hypothesis that explains the data well and is likely to be accepted, whereas the generic kernel proposes hypotheses for the 'stay' model from the prior that are likely to be rejected.

 1  @kern function custom_kernel(trace)
 2      trace ∼ Gen.mh(trace, Gen.complement(Gen.select(:observations)))
 3      trace ∼ switch_kernel(trace)
 4      if trace[:is_leaving]
 5          trace ∼ add_remove_search_attempt_kernel(trace)
 6          trace ∼ leave_random_walks_kernel(trace)
 7      else
 8          trace ∼ add_remove_waypoint_kernel(trace, points)
 9          trace ∼ stay_random_walks_kernel(trace)
10      end
11      return trace
12  end

Figure 6-6: A composite MCMC kernel that combines several types of primitive kernels

Figure 6-7: Contrasting generic and specialized MCMC moves for changing control flow

Proposing new waypoints based on a heuristic detector  Consider the involutive MCMC kernels add_remove_search_attempt_kernel and add_remove_waypoint_kernel that are invoked on Line 5 and Line 8 of custom_kernel, respectively. These kernels add or delete search attempts or waypoints from the trace. Different techniques can be used to propose new waypoint locations in these kernels. The generic proposal mechanism that forms the basis for most existing universal probabilistic programming systems (forward sampling) would propose values for waypoints uniformly at random, just as when switching branches. However, as shown in Figure 6-9, we can instead use a simple heuristic to find points along the observed trajectory where the trajectory appears to change direction. These points will often be waypoints, because the person is unlikely to change direction unless there is a waypoint. Note that this heuristic is not perfect—the heuristic does not know about the floorplan, so it cannot distinguish between turns that are explained by the floorplan and those that are not. But Monte Carlo inference algorithms, including importance sampling (Section 3.2), MCMC (Section 3.4.2, Section 3.7), and particle filtering (Section 3.5), that use data-driven proposals can filter out the incorrect proposals by assigning a low importance weight or by rejecting a proposal outright. Furthermore, data-driven proposals can be learned or tuned automatically via supervised learning on simulated or real training data (Section 3.3).

@transform switch_bijection (model_in, aux_in) to (model_out, aux_out) begin

    # switch branches
    is_leaving = @read(model_in[:is_leaving], :disc)
    @write(model_out[:is_leaving], !is_leaving, :disc)

    if is_leaving

        # populate choices in the stay branch, using old choices in the leave branch
        @copy(aux_in[:reuse_dest], aux_out[:set_dest_from_waypoint])
        num_attempts = @read(model_in[:num_search_attempts], :disc)
        if @read(aux_in[:reuse_dest], :disc)
            num_waypoints = num_attempts + 1
            @copy(model_in[:destination], model_out[(:waypoint, num_waypoints)])
        else
            num_waypoints = num_attempts
            @copy(model_in[:destination], aux_out[:destination])
        end
        @write(model_out[:num_waypoints], num_waypoints, :disc)
        for i in 1:num_attempts
            @copy(model_in[(:search_loc, i)], model_out[(:waypoint, i)])
        end
    else

        # populate choices in the leave branch, using old choices in the stay branch
        ...
    end
end

Figure 6-8: A Gen trace transform that switches control flow branches

Using a coarse-grained surrogate model as the basis of a proposal  The data-driven proposal for waypoints described above is based on a non-probabilistic heuristic. We can also use a probabilistic heuristic to propose new waypoints. In particular, we use probabilistic inference in a simplified version of the generative model as a proposal distribution for inference in the original generative model. Instead of modeling the precise location of an agent using a path planning algorithm, we can simply model what room they are in (Figure 6-10a) using a family of hidden Markov models (in this case with 20 discrete states). Because exact inference in hidden Markov models is efficient, we can perform exact inference easily in this coarse, discrete model of the application (Figure 6-10b). The simplified generative model can be trained offline on data simulated from the original generative model. Then, inferences in the simplified model can be used to aid inference in the original model in various ways, including (i) within proposal distributions in importance sampling, MCMC, and sequential Monte Carlo, and (ii) as part of a coarse-to-fine sequential Monte Carlo algorithm (Section 3.6). Here we use inference in the discrete model to propose new waypoint locations within add_remove_waypoint_kernel and add_remove_search_attempt_kernel.

(a) Changes in apparent direction are often indicative of a waypoint (pink) because we assume that the agent tends to take direct paths to their next waypoint. But this heuristic can produce false positives because it does not take into account prior knowledge about the obstacles.

(b) Left: Heuristic waypoint detections from the Ramer-Douglas-Peucker [36] line simplification algorithm. Right: The density of a data-driven proposal distribution that proposes a waypoint by randomly picking one of the heuristic detections and adding Gaussian noise to it.

Figure 6-9: A data-driven proposal based on a heuristic
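The data-driven proposal illustrated in Figure 6-9b can be written as a short DML generative function. The following is a hedged sketch; detect_change_points (a hypothetical wrapper around the Ramer-Douglas-Peucker detector) and the noise scale are assumptions, not the thesis implementation:

# A hedged sketch of the auxiliary proposal for a new waypoint location.
# `detect_change_points` is a hypothetical helper that runs the
# Ramer-Douglas-Peucker detector on the observed trajectory.
@gen function new_waypoint_proposal(observed_trajectory)
    detections = detect_change_points(observed_trajectory)
    index ∼ uniform_discrete(1, length(detections))  # pick a detection at random
    (x, y) = detections[index]
    new_x ∼ normal(x, 0.2)  # add Gaussian noise around the chosen detection
    new_y ∼ normal(y, 0.2)
end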


Using custom implementations of Gen's abstract data types  The surrogate model based on hidden Markov models can itself be implemented using Gen's built-in modeling languages like DML. However, since a hidden Markov model is a reusable modeling motif, it makes sense to implement specialized, performant, and reusable generative function and trace data types for it. Here we used a specialized implementation for collapsed discrete HMMs that internally uses an efficient implementation of the forward-backward algorithm to compute marginal likelihoods that contribute to the acceptance probabilities of the involutive MCMC kernels add_remove_waypoint_kernel and add_remove_search_attempt_kernel.
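The marginal-likelihood computation at the core of such an implementation is the standard scaled forward recursion. A minimal sketch in plain Julia (independent of Gen's data type interfaces):

# A minimal sketch of the scaled forward recursion for the log marginal
# likelihood of a discrete HMM.
# init[s] = p(state_1 = s); trans[i, j] = p(state_{t+1} = j | state_t = i);
# obs_probs[t, s] = p(observation_t | state_t = s)
function hmm_log_marginal_likelihood(init::Vector{Float64},
                                     trans::Matrix{Float64},
                                     obs_probs::Matrix{Float64})
    alpha = init .* obs_probs[1, :]
    log_ml = log(sum(alpha))
    alpha ./= sum(alpha)
    for t in 2:size(obs_probs, 1)
        alpha = (trans' * alpha) .* obs_probs[t, :]  # predict, then condition
        log_ml += log(sum(alpha))                    # log p(obs_t | obs_1:t-1)
        alpha ./= sum(alpha)                         # rescale to avoid underflow
    end
    return log_ml
end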

(a) Left: The continuous state space is discretized into cells. Middle: For each destination cell, there is a transition matrix that determines the distribution on the next cell given the current cell. The matrix was learned from data simulated from the original generative model, which uses a path planning algorithm. Right: The resulting generative model is a mixture of hidden Markov models.

(b) Exact inference in the coarsened model via dynamic programming is efficient. The posterior distribution on the destination cell is shown for three observed data sets.

Figure 6-10: Using a coarse-grained surrogate model to aid inference in a fine-grained model

6.1.5 Symbolic reasoning from noisy data via probabilistic inference

Figure 6-11 shows inferences made using the inference code of Figure 6-5. First, note that the inferences provide much more than activity classification—the traces contain detailed information about both past and future events, as well as predictions of future trajectories. The inferences exhibit the combination of logical reasoning with uncertainty over symbolic structures (events) and inference from noisy data, capabilities that motivate the use of universal probabilistic programming systems. However, performing inferences like this in nontrivial models requires more specialization of the inference code than is allowed by existing universal probabilistic programming systems, which are largely based on generic inference algorithms.

(a) Observed data (left), predicted future events (middle), and predicted trajectories (right). Predicted future events for this observed data set include leaving the apartment (green) and visiting waypoints of unknown purpose (pink), but not searching for keys (purple). The inferences are intuitive—if the person did not have their keys, they would have walked towards the likely key locations, but instead they are walking directly towards the exit.

(b) An observed data set where the person first walks towards the likely key location in the lower left, then to the other likely key location in the center-left, and then back again to the lower left. The Gen inferences conclude that the person is probably searching for their keys and is planning to leave the apartment, long before they actually start moving in the direction of the exit.

Figure 6-11: Inferring past events, and predicting future events and trajectories with Gen

6.2 Inferring object pose and existence from point clouds

Robust, accurate, and more human-like reasoning about scenes requires the ability to account for uncertainty due to occlusion. This section describes an application of Gen to 3D perception of objects from point cloud data under self-occlusion and full or partial occlusion by other objects. It presents two example applications: inferring the pose of a self-occluded object with uncertainty quantification, and inference about the existence or non-existence of a fully occluded object based on Bayesian spatial reasoning about multiple objects in a scene. We take an approach based on 'vision as inverse graphics' [10], a paradigm for computer vision that is based on inverting the generative process of rendering to produce interpretable descriptions of an image in terms of discrete primitives like objects.

Bayesian inference of 6DoF object pose under self-occlusion  Self-occlusion is a common source of uncertainty about object poses in 3D perception, especially for objects with articulation (e.g. a human body) and objects with regularities or symmetries (e.g. a coffee mug that is rotationally symmetric, except for the handle). Figure 6-12a shows a Gen DML generative function that defines a generative model of the full positional and rotational pose (6DoF) of a rigid object and the point cloud generated by a depth camera facing the object. The pose is factored into a 3D Cartesian position and a rotation that is implemented using a custom data type provided by the domain-specific modeling and inference library Gen3DRot. The data type uses a custom reference measure, and the Gen3DRot library provides a set of probability distributions with densities defined with respect to this reference measure, as well as a set of involutive MCMC moves (Section 3.7) for this data type. Each move proposes a specific common-sense mental rotation of the object (e.g. flipping or rotating around some axis of rotation). These moves allow the construction of efficient MCMC algorithms for pose uncertainty quantification that exploit the structure of the objects. For example, for the mug used in this example, rotating around the z-axis (which is perpendicular to the base of the mug) is an efficient way to explore the posterior, which is highly concentrated along this one degree of freedom. Gen is designed to support an ecosystem of interoperable modeling and inference libraries like Gen3DRot. Multiple implementations of the likelihood model (noise_model) were tested, including a probabilistic 3D extension of the chamfer distance [9] between point clouds that marginalizes out the unknown correspondence between observed points and rendered points, with outlier indicator variables for observed points.

Inference about the presence or absence of fully occluded objects  Suppose a robot is tasked with assembling a piece of furniture, performing maintenance on a vehicle, or retrieving something from the kitchen. In each of these cases, the robot has a prior belief that some object (e.g. a tool, component, or kitchen item) is present in the environment. However, unstructured environments are complex and cluttered, and the target object is likely to be fully occluded from the robot's view. The target object may in fact be absent from the environment, especially in human-robot interactive scenarios (e.g. the component or item is missing or misplaced). To act rationally in such situations, the robot needs to consider plausible poses of the object that can explain the absence of its percept. Also, the robot must consider the possibility that the object is indeed not present, by weighing the lack of a percept of the object against the prior probability that it is present or absent. We propose a principled proof-of-concept solution to these challenges based on Bayesian inference. In particular, we apply Bayesian inference in a generative model of point clouds with an interpretable latent representation that includes the existence and 6DoF pose of multiple objects. The generative model (Figure 6-13a) employs a 3D renderer to produce a synthetic point cloud from the latent variables, accounting for mutual occlusion of objects. The noise model is robust to outliers, which are common in real depth sensor data. Most work on inverse graphics has focused on single objects, and therefore cannot be used to address existential uncertainty due to full occlusion. Work in this area that has considered occlusion has been limited to situations without existential uncertainty, where the set of objects that are present is known a priori [62]. Perhaps one reason why structural and existential uncertainty has received less attention is the mathematical complexity that it introduces. Gen simplifies the implementation of Bayesian inference in the presence of structural uncertainty. Figure 6-13 shows the results of MCMC inference in Gen (inference program not shown) on several scenarios involving occlusion, using object models from the YCB 3D object data set [19]. The results illustrate Bayesian inference about the 6DoF pose and presence or absence of a fully occluded object, and investigate the dynamics of these inferences as the fraction of occluded volume in the scene is varied. In particular, the lack of a percept of the mug in the visual field (i) reduces the posterior probability of its presence, but also (ii) informs the distribution on its 6DoF pose, if it is present. Note that as the fraction of volume in the scene that is occluded by the box increases, the posterior probability that the mug is present in the environment increases. This reflects common sense—the more volume occluded by the box, the more places the mug could be hidden from view.

@gen function object_pose_prior()
    x ∼ uniform(xmin, xmax)
    y ∼ uniform(ymin, ymax)
    z ∼ uniform(zmin, zmax)
    rot ∼ Gen3DRot.uniform()
    return (x, y, z, rot)
end

@gen function mug_model()
    mug ∼ object_pose_prior()
    points ∼ render_depth([(mug, mug_mesh)])
    observed_points ∼ noise_model(points)
end

(a) A generative model of the 6DoF pose of an object and point clouds produced by a depth camera. The 3D rotation (rot) is a quaternion sampled from the Haar measure on the group of unit quaternions. The Gen3DRot domain-specific modeling and inference library provides several probability distributions on 3D rotations.

@kern function pose_kernel(trace, addr)
    trace ∼ Gen.mh(trace, position_random_walk, (0.01, addr))

    # flip over x-axis, then y-axis, then z-axis
    trace ∼ Gen3DRot.flip_around_fixed_axis_mh(trace, addr => :rot, [1,0,0])
    trace ∼ Gen3DRot.flip_around_fixed_axis_mh(trace, addr => :rot, [0,1,0])
    trace ∼ Gen3DRot.flip_around_fixed_axis_mh(trace, addr => :rot, [0,0,1])

    # rotate around z-axis (perpendicular to the base of the mug)
    trace ∼ Gen3DRot.uniform_angle_fixed_axis_mh(trace, addr => :rot, [0,0,1])
    trace ∼ Gen3DRot.small_angle_fixed_axis_mh(trace, addr => :rot, [0,0,1], pi/16)

    # random small rotations to explore within a mode
    trace ∼ Gen3DRot.small_angle_random_axis_mh(trace, addr => :rot, pi/32)
    return trace
end

(b) A custom composite MCMC kernel for the pose of an object, parametrized by the address of the object's pose (for the model in (a), addr = :mug). The kernel uses reusable involutive MCMC moves on 3D rotations implemented in the domain-specific modeling and inference library Gen3DRot.

(c) Posterior inferences from a simulated point cloud. Left: Handle is visible, and the posterior is concentrated. Right: Handle is occluded, and there is uncertainty about the angle of the handle.

Figure 6-12: Using MCMC for Bayesian inference of 6DoF object pose from point clouds

@gen function existential_doubt_model()
    objects = []
    if ({:box_present} ∼ bernoulli(0.9))
        pose = ({:box} ∼ object_pose_prior())
        push!(objects, (pose, box_mesh))
    end
    if ({:mug_present} ∼ bernoulli(0.9))
        pose = ({:mug} ∼ object_pose_prior())
        push!(objects, (pose, mug_mesh))
    end
    points = render_depth(objects)
    observed_points ∼ noise_model(points)
end

(a) Left: A scenario in which a depth camera is viewing a scene that may or may not contain a cracker box and a mug. Only the cracker box is visible in the observed depth images (see below). Right: The generative model. The prior probability that the object exists is 0.9 for both objects.

(b) An observed simulated depth image and a subset of the posterior samples. The observed image has the box angled so that it occupies less of the field of view than in (c). The posterior probabilities that the mug and box are present are 0.33 and 1.0, respectively.

(c) An observed simulated depth image and a subset of the posterior samples. The observed image has the box closer to the camera, so that it occupies more of the field of view than in (b). The posterior probabilities that the mug and box are present are 0.63 and 1.0, respectively.

Figure 6-13: Inferring the presence, absence, and pose of multiple objects from point clouds

6.3 Real-time camera pose estimation

This section describes an application of Gen to real-time camera pose estimation in an indoor environment, along with inference of environment parameters, and examines the relative strengths and weaknesses of model-based and neural-network-based inference in this setting. Specifically, the task is to estimate the height of the room, the elevation of the camera above the floor, and the pitch and roll angles of the camera, in real time, from a stream of point cloud data generated by a depth camera. An online MCMC inference algorithm based on the following generative model of depth images was implemented in Gen:

@gen function cam_pose_prior(height)
    z ∼ uniform(0.0, height)
    pitch ∼ uniform(pi/4, 3*pi/4)
    roll ∼ uniform(-pi/4, pi/4)
    return (0, 0, z, quat(roll, pitch, 0))
end

@gen function cam_pose_model()
    height ∼ uniform(2.0, 3.0)
    cam_pose ∼ cam_pose_prior(height)
    objects = [
        (plane, (0, 0, 0, quat(0, 0, 0))),
        (plane, (0, 0, height, quat(0, 0, 0)))
    ]
    points = render_depth(objects, cam_pose)
    observed_points ∼ noise_model(points)
end

The left of Figure 6-14 shows an input depth image frame from an indoor office scene, where red pixels indicate missing data. The middle frame shows the inferred ceiling and floor depth pixels (blue) using the online Monte Carlo algorithm implemented with Gen. On the right is a visualization of synthetic data generated from the generative model. The generative model only models the floor and ceiling, and not any walls or other objects in the scene. The noise model uses an independent per-pixel mixture of a uniform distribution (to explain unmodeled objects and outliers) and a normal distribution centered at the modeled depth at every pixel. A combination of random-walk and independent Metropolis-Hastings moves was used to obtain an initial pose estimate, and to track the pose over time by initializing the Markov chain for each new observation at the previous pose estimate. This can be interpreted as a sequential Monte Carlo algorithm (Section 3.5) with a single particle.

Figure 6-14: Tracking camera pose using online Monte Carlo in a generative model
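A hedged sketch of this single-particle tracking loop is shown below; the helpers make_observations, pose_mh_kernel, and report_pose are assumptions, not part of Gen:

# A hedged sketch of the online tracking loop described above.
trace, _ = Gen.generate(cam_pose_model, (), make_observations(frames[1]))
for frame in frames[2:end]
    # re-constrain the model on the new frame; the chain for the new
    # observation starts at the previous pose estimate
    trace, _ = Gen.update(trace, (), (), make_observations(frame))
    for iter in 1:50
        trace = pose_mh_kernel(trace)  # random-walk and independent MH moves
    end
    report_pose(trace)
end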

It is also possible to use a 'bottom-up' neural approach to regress camera pose from depth data. To compare the tradeoffs between a neural approach and the online Monte Carlo approach, we implemented both approaches using Gen and compared them on different simulated data sets. For this experiment, the room height was fixed, so there were three latent variables (z, pitch, and roll). Note that modeling the yaw is unnecessary since only the ceiling and floor are modeled, and they are modeled as infinite planes. The neural network was a multi-layer perceptron with one hidden layer of 20 hidden units, implemented as a generative function in DML, and trained to regress from an observed depth image to the pose of the camera, using data simulated from the generative model (Section 3.3). The simulated training data included a range of pitch angles between −π/4 and π/4 (where 0 is looking straight ahead). As shown in Figure 6-15a and Figure 6-15b, the trained neural network performs reasonably for input data within its training distribution, but fails when given out-of-distribution data with pitch angles ranging from −π/2 to π/2 (that is, looking straight up at the ceiling or straight down at the floor). The model-based online Monte Carlo inference algorithm works with no re-training required. However, it is more computationally expensive than the neural network. Gen supports combining bottom-up and model-based methods. For example, when the neural network is used as a proposal within Monte Carlo, its proposals are rejected in the out-of-distribution regime where it is inaccurate. Gen is designed to support research into robust hybrid inference architectures.
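For instance, a hedged sketch of one such hybrid, using the network as an independence proposal within Metropolis-Hastings (neural_regressor is assumed to be a DML generative function that reads the current observation and proposes the pose addresses):

# A hedged sketch: one MH step that proposes a pose with the neural network;
# out-of-distribution proposals receive low model density and are rejected.
function neural_mh_step(trace, depth_image)
    trace, accepted = Gen.mh(trace, neural_regressor, (depth_image,))
    return trace
end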

(a) A neural network was trained to regress from an observed depth image to the pose of the camera, using data simulated from the generative model. Black points are observed data, green are reconstructed data. Left: The trained neural network performs well for new data that lies within its training distribution. Middle: When the distribution changes to include unfamiliar data (looking upwards), a model-based inference algorithm works with no re-training required. Right: The neural network fails for cases on which it was not trained. Gen is designed for combining bottom-up inference with model-based reasoning, which can give inference that is both robust and fast.

[Plots: error in z, pitch, and roll versus time (0 to 50 seconds), for the in-distribution and out-of-distribution conditions.]

(b) Left: Error for the online Monte Carlo algorithm. Right: Error for a simple neural network. These two algorithms took comparable time (on the order of minutes) to implement using Gen; this does not include the training time for the neural network, which was a few minutes.

Figure 6-15: Comparing bottom-up and top-down inference approaches for pose estimation

6.4 Inferring the dynamic geometric structure of a 3D scene

This section describes an application of Gen to robust 3D scene understanding. While neural network techniques like Deep Object Pose Estimation (DOPE) [124], or heuristics like using RANSAC to fit object models to point clouds [114], can give approximate estimates of the 6DoF pose of individual objects, they do not take into account prior knowledge about the physical dynamics of scenes, or prior knowledge about the physical relationships between objects. Additional filtering or post-processing is needed to use these estimates for downstream tasks. For example, pose estimates from DOPE can be unreliable, and frequently change drastically from frame to frame or drop out completely for certain frames under, for example, unfavorable or out-of-distribution lighting conditions. Developing more robust perception systems for embodied intelligence applications requires synthesizing more prior knowledge about scenes with perceptual data, and inference in probabilistic generative models of scenes is well suited to this challenge. We used Gen to develop a probabilistic generative model and inference algorithm that synthesizes (i) heuristic 6DoF object pose estimates, (ii) prior knowledge about the temporal dynamics of scenes, and (iii) prior knowledge about the physical relationships between objects, to produce beliefs about the poses of objects in a scene and their physical relationships. Figure 6-16 shows example input data in the form of heuristic 6DoF pose estimates of three objects (cracker box, mustard bottle, and table surface) and the 6DoF pose estimates produced by our algorithm. For this video sequence, the algorithm is able to filter out incorrect estimates of the pose of the cracker box (green), and also infer that the mustard bottle is resting on the cracker box (white square 1) and that the cracker box is resting on the table (white square 2). The remainder of this section describes the generative model, the inference algorithm, and its implementation.

(a) Heuristic 6DoF object pose estimates at selected frames in a video

(b) For each frame in (a), the inferred 6DoF object poses and object-object contact planes

Figure 6-16: Probabilistic inference of scene graphs makes pose estimation more robust

Scene graphs  The basis of our generative model is the scene graph, a directed tree that spans a set of coordinate frames. Our scene graph representation is inspired by the scene graph representations used for rendering in computer graphics and video games [123], as well as augmented and virtual reality [101], and related structures in robotics [98]. These scene graphs typically parametrize a scene as a tree, rooted at a world coordinate frame, with nodes for objects and object parts. Edges in the tree encode relative 6DoF poses that are typically generated by some lower-dimensional parametrization (e.g. rotating an elbow joint in an articulated model of a character body). In our framework, a scene graph is a directed tree, where the root node represents the world coordinate frame and each non-root node represents one object. Any object whose parent in the tree is the world coordinate frame is called floating, and objects whose parent is another object are called sliding. The pose of a floating object is parametrized as a full 6DoF transformation (three dimensions of position, and a unit quaternion for orientation) from the world coordinate frame to the object's coordinate frame. The pose of a sliding object is parametrized as a 3DoF relative pose transformation between the coordinate frame of one of the parent object's faces and one of the faces of the child object, where the faces of the two objects slide on one another with anti-parallel outward normal vectors. There are two degrees of translational freedom and one degree of freedom for rotation around the normal vectors. We use cuboids to model most objects. Given the scene graph, it is possible to compute the 6DoF pose of each object relative to the world coordinate frame by walking the graph from the root to the leaves.
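A minimal sketch of this walk, with hypothetical helpers (topological_order, get_parent, pose_from_float_params, pose_from_slide_params, compose) standing in for the actual implementation:

# A minimal sketch of computing world-frame poses by walking the scene
# graph from the root to the leaves (helper names are hypothetical).
function compute_world_poses(scene_graph)
    (structure, params) = scene_graph
    world_poses = Dict{Int,Any}()
    for object in topological_order(structure)  # parents are visited first
        parent = get_parent(structure, object)
        if parent == WORLD_FRAME
            world_poses[object] = pose_from_float_params(params[object])
        else
            # compose the parent's world pose with the face-to-face
            # relative pose encoded by the sliding parameters
            relative = pose_from_slide_params(params[object])
            world_poses[object] = compose(world_poses[parent], relative)
        end
    end
    return world_poses
end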

A generative model for scene graphs and pose estimates  Figure 6-17 shows a variant of the generative model, implemented in Gen's DML. The set of objects, and their mesh models, are assumed to be known a priori. The generative model (scene_graph_model) first samples the structure of the scene graph, by sampling an undirected spanning tree over n + 1 nodes uniformly at random. The first node represents the world coordinate frame, and the remaining nodes represent objects in the scene. Then, we convert the undirected spanning tree into a directed spanning tree by forcing the world coordinate frame node to be the root of the tree. Next, the model samples the numerical parameters for each object node, which is either floating or sliding. The code for sampling the sliding object parameters assumes that both objects are cuboids and have six faces. Then, the scene graph is the tuple of the structure and parameters. Next, the model computes the 6DoF poses of each object relative to the world coordinate frame by walking the scene graph.² Consider the resulting prior probability distribution on world-frame 6DoF poses for each object (the distribution on world_poses). This distribution is complex, and encodes the prior knowledge that objects are often in face-to-face contact with one another. However, note that while interpenetration of one object by another is less likely under this prior than if each object were modeled independently, it is still not ruled out, because objects that are not connected directly in the tree may interpenetrate. Finally, a robust likelihood model models the heuristic estimates of each object's 6DoF pose relative to the world frame via a combination of zero-mean Gaussian noise in the position, von Mises-Fisher noise in the orientation, and outliers in both position and orientation.

²When walking the scene graph, we also sample extra slack degrees of freedom for each sliding object (not shown). The slack variables make reversible jump MCMC inference over the structure more efficient.


@gen function slide_params_prior()
    other_side ∼ uniform_discrete(1, 6)
    my_side ∼ uniform_discrete(1, 6)
    u ∼ normal(0.0, side_len / 3)
    v ∼ normal(0.0, side_len / 3)
    theta ∼ uniform(0., 2 * pi)
    return (other_side, my_side, u, v, theta)
end

@gen function float_params_prior()
    x ∼ uniform(xmin, xmax)
    y ∼ uniform(ymin, ymax)
    z ∼ uniform(zmin, zmax)
    rot ∼ Gen3DRot.uniform()
    return (x, y, z, rot)
end

@gen function scene_graph_model(num_objects::Int)

    # generate the structure of the scene graph, a directed spanning tree
    num_nodes = num_objects + 1  # the world coordinate frame is a node
    undirected_spanning_tree ∼ uniform_spanning_tree(num_nodes)
    structure = to_directed_tree(undirected_spanning_tree)

    # generate numerical parameters of the scene graph
    params = Dict()
    for object in 1:num_objects
        if object_is_floating(structure, object)
            params[object] = ({(object, :floating)} ∼ float_params_prior())
        else
            params[object] = ({(object, :sliding)} ∼ slide_params_prior())
        end
    end

    # construct the scene graph from its structure and parameters
    scene_graph = (structure, params)

    # generate a noisy measurement of each object pose relative to the world frame
    world_poses = compute_world_poses(scene_graph)
    observations ∼ noise_model(world_poses)
end

Figure 6-17: Using Gen to model the symbolic structure of a 3D scene with multiple objects
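The noise_model invoked at the end of scene_graph_model is not shown in the figure. A hedged sketch of the robust likelihood described above (uniform_box and Gen3DRot.vmf are assumed distribution names, not confirmed APIs):

# A hedged sketch of the robust per-object pose likelihood; assumes
# world_poses maps each object to a (position, orientation) pair.
@gen function noise_model(world_poses)
    for (object, (position, orientation)) in world_poses
        if ({(object, :is_outlier)} ∼ bernoulli(0.1))
            # outlier: the heuristic estimate is unrelated to the true pose
            {(object, :obs_position)} ∼ uniform_box(workspace_min, workspace_max)
            {(object, :obs_orientation)} ∼ Gen3DRot.uniform()
        else
            # zero-mean Gaussian noise in position; von Mises-Fisher in orientation
            {(object, :obs_position)} ∼ broadcasted_normal(position, 0.01)
            {(object, :obs_orientation)} ∼ Gen3DRot.vmf(orientation, 500.0)
        end
    end
end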

Inferring the scene graph via involutive MCMC  Given a collection of labeled object pose estimates for each object produced by either DOPE (for YCB objects) or a RANSAC-based estimator that uses depth data (for the table top), the inference algorithm produces approximate posterior samples of scene graphs, in the form of traces of the model scene_graph_model. The inference algorithm is based on MCMC and is implemented using Gen. The MCMC algorithm uses a combination of data-driven proposals, random-walk moves on individual parameters, and a custom reversible jump MCMC move that is implemented using Gen's involutive MCMC construct (Section 3.7). The reversible jump proposal is illustrated in Figure 6-18. Given a graph G, with root node r and object nodes o_i for each i, the auxiliary generative function samples a random node to sever from the tree (in this case o_5, shown in green), and a random other node onto which to graft the severed node (in this case, r, shown in pink). Note that this change to the structure is reversible, by using the same severed node but the previous parent (o_3) as the graft node. The involution transforms between the equivalent parametrizations of the pose of the severed object. The resulting inferences about contacts between objects arise naturally via Bayesian Occam's razor—if two objects are nearly in face-to-face contact, this is most parsimoniously explained via an edge between the two objects in the scene graph.
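A hedged sketch of the auxiliary generative function for this structure move (the helpers are assumptions, not the thesis implementation):

# A hedged sketch: choose a node to sever and a node to graft it onto.
@gen function sever_graft_proposal(trace)
    structure = get_structure(trace)  # hypothetical accessor
    # choose a random object node to sever from the tree
    severed ∼ uniform_discrete(1, num_objects(structure))
    # choose a graft node among nodes outside the severed subtree
    candidates = nodes_outside_subtree(structure, severed)
    graft ∼ uniform_discrete(1, length(candidates))
end

The reverse move severs the same node and grafts it back onto its previous parent, so the move is reversible, and Gen's involutive MCMC construct computes the acceptance probability automatically.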

Figure 6-18: A transition kernel on scene graph structures. Randomly choose: 1) a node to sever; 2) a graft node.

Adding temporal dynamics  We augmented the model shown above with temporal dynamics over both the structure of the scene graph and its continuous parameters. Time is modeled as a sequence of discrete steps. At each step, there is some probability that the scene graph structure changes. We model transitions on scene graph structure using a mixture of a uniform distribution on a new spanning tree (with low probability) and a local random walk on the space of spanning trees based on the same transition kernel shown in Figure 6-18. The transition model on continuous parameters is based on a random walk in parameter space. The inference algorithm for the temporal model uses sequential Monte Carlo (Section 3.5), with MCMC moves (including structure-changing involutive MCMC moves) used as rejuvenation kernels between new observations.

Integrating Gen and ROS  This application also served as a proof of concept for the integration of Gen and the Robot Operating System (ROS) [98]. In particular, the Gen inference algorithm was wrapped within a ROS node as part of a real-time ROS application that also integrated a depth camera, the DOPE detector, and a visualizer.

6.5 Gaussian process structure learning for time series

Bayesian synthesis  Gen's modeling languages are flexible enough to express prior distributions on programs. By combining a prior distribution on programs with a likelihood model that defines a probability distribution on data for each possible program, we obtain a generative model over program code and data. Inferring the source code of the program via approximate sampling from the conditional distribution given observed data is called Bayesian synthesis [109]. Bayesian synthesis can be tractable when the space of programs is defined using a small domain-specific language.

Inferring the structure and parameters of a covariance function  Consider the task of time series forecasting. Time series often have symbolic explanations; the complex patterns in a time series arise from a composition of simpler patterns [54]. Modeling the symbolic structures (e.g. periodicity, trends, changepoints) in the data allows for much more accurate extrapolation compared to autoregressive methods. Saad et al. [109] give a domain-specific expression language for covariance functions of a Gaussian process (GP) that builds on a class of models proposed by [54], and describe MCMC algorithms for Bayesian inference over expressions in this language, conditioned on an observed time series. The generative model for this Bayesian synthesis problem is expressed in Gen's Dynamic Modeling Language in Figure 3-19. The model first samples an expression in a domain-specific expression language that encodes the covariance function. The prior distribution on expressions is based on a probabilistic context-free grammar (PCFG). Then, the model samples additional white noise to add to the data, computes the covariance matrix from the covariance function and the inputs (in this case a vector of times at which data will be sampled), and finally samples the vector of output data over all time points from a multivariate normal distribution. The prior distribution on covariance functions is defined by a DML generative function that recursively generates subexpressions from the grammar of covariance functions. An alternative Gen implementation of this generative model that uses the Static Modeling Language and the Recurse Combinator is discussed in Section 5.3.
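A hedged sketch of what such a recursive PCFG prior can look like in DML, with simplified production probabilities and hypothetical expression node types (SquaredExponential, Periodic, Linear, Plus, Times):

# A hedged sketch of a PCFG prior over covariance-function expressions.
@gen function covariance_prior()
    node_type ∼ categorical([0.2, 0.2, 0.2, 0.2, 0.2])
    if node_type == 1
        lengthscale ∼ gamma(1, 1)
        return SquaredExponential(lengthscale)
    elseif node_type == 2
        period ∼ gamma(1, 1)
        return Periodic(period)
    elseif node_type == 3
        intercept ∼ normal(0, 1)
        return Linear(intercept)
    elseif node_type == 4
        left ∼ covariance_prior()   # recursive subexpression
        right ∼ covariance_prior()
        return Plus(left, right)
    else
        left ∼ covariance_prior()
        right ∼ covariance_prior()
        return Times(left, right)
    end
end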

Figure 6-19: Experimenting with different MCMC schedules using Gen

Varying the MCMC kernel schedule  Several variations of the MCMC inference algorithm of Saad et al. [109] were implemented using Gen's involutive MCMC construct (Section 3.7), and the code implementing these is shown and discussed in Section 3.7.5. Notably, the involutive MCMC construct gives the user fine-grained control over structure-changing MCMC moves. In this case, we use this control to explore the effect of different variants of an MCMC kernel that proposes to replace subtrees of the covariance function parse tree. Figure 6-19 shows an example data set, forecasts for this data set based on the inferred distribution on covariance functions, and a comparison of the time-accuracy profiles of three different MCMC algorithms. One algorithm (full) re-proposes the entire tree from the root at each iteration, and is less efficient. The other two schedules only re-propose subtrees (one samples subtrees uniformly, and the other biases the sampling towards subtrees at smaller depths). Note that Gen automatically computes all acceptance probabilities for these moves. Also, Gen's involutive MCMC construct prevented a bug in the acceptance probability calculation that was present in one of the baseline hand-coded implementations of this algorithm. This bug involved accounting for the probability of selecting a subtree, which is difficult to reason about but automated by Gen's involutive MCMC construct.

Chapter 7

Conclusion

7.1 Tradeoffs in probabilistic inference systems architecture

A central insight from the research into Gen’s design is that the design space of software systems for probabilistic inference is large and subject to fundamental tradeoffs, and that systems designed for practical use must intentionally and carefully balance these tradeoffs. This section explains some of these tradeoffs, situates Gen in the design space, and outlines areas for future work.

Balancing automation and user control  Probabilistic inference systems must choose which of the implementation choices that arise in probabilistic inference to automate, and which choices to leave to the user. Users must also choose what level of control they need for their application when evaluating which system to use. At one extreme are modeling languages with built-in 'solvers' that allow users to declaratively specify their model, perhaps set some parameters of the algorithm, and then run the algorithm [20] (systems that automatically construct the model from input data are even further down this spectrum, but out of scope of this thesis). Near the other extreme, the inference practitioner may only use a library of standard probability distributions. They will derive the updates and other primitive operations in the inference algorithm from their pencil-and-paper mathematical formulation of the model, and translate these operations directly into a numerical program. We can better understand the design space by recognizing the different dimensions of automation and control. First, because inference algorithms are often computationally intensive and take significant time to run, some users may require control over the high-level flow of the process, or how it is distributed across parallel computing resources. Other users may want these decisions to be made for them by the system. Second, the system may use built-in algorithms [51], or may allow the user to customize a schedule of different built-in operations used in the algorithm, or may allow the user to add custom heuristics to the algorithm, or may require the user to define the algorithm themselves. Finally, the system might automate low-level implementation details like density evaluations and gradients, or might require the user to implement these themselves. Many recent probabilistic programming systems have emphasized automation over user control [51, 52, 130], while those in the 'programmable inference' paradigm [80] have emphasized the importance of user control over the algorithm.

One approach to sidestepping the tradeoff between automation and user control is to design multiple levels of interfaces, with more user control at the bottom level and more automation at the top level, such that users can smoothly transition between levels as their needs evolve. This is the approach taken in Gen: At the top level, Gen's inference library includes an implementation of self-normalized importance sampling with the automatically-generated internal proposal family (Section 4.2) that requires the user to write a single line of code and provide a single parameter that governs the amount of computation to use. One level down, if the user needs to instrument the loop of the algorithm, or distribute it across their infrastructure in a custom way, they can re-implement the loop of the algorithm, which invokes the generate operation and requires on the order of 10 lines of code. If the user then needs more efficient inference, they can employ a custom proposal distribution written as a probabilistic program instead of the built-in proposal family (Section 3.2), and train it on simulated data (Section 3.3), which might require on the order of 100 lines of code. Or, they can employ MCMC (Section 3.4) or sequential Monte Carlo (Section 3.5 and Section 3.6) techniques, each of which has more and less automated variants. Finally, if the user wants to manually Rao-Blackwellize certain random variables in their model, they can implement custom generative function and trace data types for that part of their model (Section 5.4). Crucially, the user can navigate this design space incrementally and smoothly, without having to rewrite their application from scratch, sacrifice the separation between the model and the inference code, or switch platforms.

User control is required when it is impractical to automatically meet the user's requirements. In probabilistic inference, meeting performance requirements often requires customization of the algorithm. While automatic adaptation and learning of parameters in models and inference algorithms is possible in Gen (e.g. Section 3.3), Gen does not attempt to automatically generate an inference algorithm that is customized to the model. Instead, Gen delegates this task to the user, because devising efficient custom inference algorithms often draws on prior knowledge of the problem or related problems and knowledge of relevant heuristics. Automatically generating efficient algorithms specialized to a probabilistic model is a long-term research challenge that will build on expertise in probabilistic modeling and inference, machine learning, and program synthesis. This task subsumes the design of efficient heuristics for use in proposals, schedules of MCMC kernels, reversible jump moves, annealing schedules, coarse-to-fine inference schemes, and algorithms for neural network architecture search. A truly automated system for constructing inference algorithms must search over structured, hierarchical, and combinatorial spaces of inference programs. Therefore, we might define this as inference program synthesis. Gen lays groundwork for this research by providing a set of primitives from which inference programs are composed. One important subproblem is building probabilistic models of the efficiency of inference programs on inference problems that capture the same sort of intuitions used by human inference algorithm experts. This line of research is closely related to metareasoning [108].
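As an illustration of the 'one level down' option described above (re-implementing the importance sampling loop directly against the generate operation), a hedged sketch:

# A hedged sketch of the roughly ten-line importance sampling loop.
function my_importance_sampling(model, args, observations, num_samples)
    traces = Vector{Any}(undef, num_samples)
    log_weights = Vector{Float64}(undef, num_samples)
    for i in 1:num_samples
        traces[i], log_weights[i] = Gen.generate(model, args, observations)
    end
    log_total = Gen.logsumexp(log_weights)
    return traces, exp.(log_weights .- log_total)  # self-normalized weights
end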

Achieving good inference performance while managing complexity  A central task in software architecture is designing interfaces that manage the complexity of applications without preventing them from meeting their functional requirements. Because of the computational intensity of probabilistic inference, meeting the performance requirements of applications can be challenging. Doing so with high-level inference code based on an explicit and easy-to-evolve model representation is even more challenging. Gen's abstract data types strike a particular balance between the simplicity of their interfaces and the performance that can be achieved using these interfaces. The most important design choice in Gen is to use an explicit definitive representation of the model, in the form of a generative function, that separates the low-level computations associated with the model from the inference algorithm code. This choice is critical for managing the complexity of an inference application—it makes the application much easier to maintain and modify. Additionally, by automatically compiling the generative function from a probabilistic program, we considerably reduce the surface area for bugs in low-level computations associated with the model. This choice does, however, have consequences for performance. For example, potential cancellations between a proposal distribution's density and a model's density cannot be exploited (unless the two are re-implemented in a custom generative function with a custom internal proposal family). Devising intermediate representations for probabilistic models that expose more information about the model, and therefore allow for more optimizations than Gen's abstract data types, is an interesting area for future work. However, for a practical system it will be important to ensure users can smoothly transition from an implementation based on black-box data types to one based on analysis of intermediate model representations. These two are not in conflict; indeed, it should be possible to extend Gen's generative function abstract data type with operations that return intermediate representations, and similarly extend Gen's inference library with functions that analyze these representations and generate optimized inference code.

Another design choice in Gen that aims to balance complexity and performance is the immutability of traces. Immutability of traces allows a single trace data type to be used for all inference algorithms, including sequential Monte Carlo algorithms that interleave duplication of particles with rejuvenation MCMC moves (Section 3.5). For example, implementing a resample-move particle filter from scratch requires careful tracking of references to memory resources that are shared by multiple particles, and careful dynamic allocation and copying. In Gen this complexity is handled by low-level persistent functional data structures [37] that are used within trace implementations. However, not all algorithms make use of the immutability of Gen's traces. For example, unconstrained optimization algorithms can be implemented using in-place updates, and algorithms that include accept and reject steps can also be implemented using in-place updates without allocations. Similarly, Gen's trace data type does not distinguish between updates that change the structure (i.e. control flow) of a trace and those that do not, which have different potential low-level implementation strategies. While some performance improvements are possible (and have been implemented) via just-in-time compilation of trace operations (Section 5.2), extending Gen's trace data type with mutable variants, and more generally giving users some intermediate level of control over a trace's internal data structure and representation so they can optimize their inference applications without having to write a custom generative function and trace data type, is an important next step. However, these extensions should be optional both for users and generative function implementers, and should be designed with an awareness of the costs of increasing system complexity.

7.2 Generative and discriminative models and heuristics

A core design requirement for the architecture described in this thesis is that it can support multiple algorithmic paradigms for probabilistic inference. In particular, we distinguish between discriminative and model-based probabilistic inference. We do not use 'discriminative model' and 'generative model' here because generative models and simulators can be used offline to train discriminative models. We use 'model-based inference' to refer to processes that query an explicitly represented generative model at inference time and 'discriminative inference' to refer to processes that do not, and define both as producing an approximation to the conditional distribution of latent quantities given observed quantities. Discriminative inference, which is sometimes called 'bottom-up inference', is often based on discriminative probabilistic models like deep neural networks or random forests trained on real or simulated data, but can also be based on heuristics, as in using least squares within a robust Bayesian regression inference problem (Section 3.2). A discriminative inference algorithm can be interpreted as a type of 'amortized inference' [45], because it is constructed based on experience with other related inference problems. Gen is designed to support algorithms that combine discriminative and model-based inference approaches. In Gen, both discriminative models and generative models are represented as generative functions and are typically constructed by writing probabilistic programs. Discriminative models can be used to initialize an iterative model-based sampling or search algorithm like MCMC in a generative model, or as a proposal distribution within various Monte Carlo algorithms. Generative models can also be used to generate synthetic data for training discriminative models. Using probabilistic programs to represent both statistical discriminative models like neural networks and proposal distributions based on heuristics clarifies their equivalence, and their relationship to model-based inference. In particular, it suggests a methodology [30] based on fitting the stochasticity of a proposal distribution centered on the output of an arbitrary heuristic to data simulated from a generative model, and then employing this proposal within Monte Carlo algorithms (Section 3.3). Gen's abstractions also shed light on, and encourage the use of, principled inference methodologies based on 'surrogate' generative models. Like discriminative models, surrogate generative models can be interpreted as a type of heuristic from the perspective of model-based inference. But also like discriminative models, they can be made less heuristic by training their parameters on data generated from the generative model, and they can be used to accelerate asymptotically exact inference in generative models via sequential Monte Carlo and trace translators (Section 3.6). Indeed, there is a close mathematical relationship between a surrogate generative model used as an intermediate target distribution in sequential Monte Carlo and a discriminative model used as a proposal distribution in importance sampling. A generative model can serve as a sort of specification for a discriminative model or a surrogate generative model, and there are well-defined metrics based on Kullback-Leibler divergence for quantifying how well a discriminative or surrogate model meets the specification (Section 3.3).
Because all of these types of models are expressed explicitly as generative functions in Gen, it is straightforward to evaluate and optimize these metrics as part of a coherent engineering methodology.
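As a concrete illustration of the proposal methodology referenced above, the following sketch (written against Gen's Julia API; the model, addresses, data, and noise scales are illustrative assumptions, not code from earlier chapters) centers a proposal on the output of an ordinary least-squares heuristic and then uses it within Gen's importance resampling:

    using Gen

    # Toy generative model: Bayesian linear regression.
    @gen function model(xs::Vector{Float64})
        slope ~ normal(0, 2)
        intercept ~ normal(0, 2)
        for (i, x) in enumerate(xs)
            {(:y, i)} ~ normal(slope * x + intercept, 1.0)
        end
    end

    # Proposal centered on a deterministic least-squares fit; the spreads
    # (here fixed at 0.1) model the heuristic's error, and could instead
    # be trained on data simulated from `model`, per the methodology above.
    @gen function lsq_proposal(xs::Vector{Float64}, ys::Vector{Float64})
        coefs = hcat(xs, ones(length(xs))) \ ys  # heuristic point estimate
        slope ~ normal(coefs[1], 0.1)
        intercept ~ normal(coefs[2], 0.1)
    end

    xs = collect(range(-1, 1, length=20))
    ys = 1.5 .* xs .+ 0.2 .+ 0.1 .* randn(20)

    observations = choicemap()
    for (i, y) in enumerate(ys)
        observations[(:y, i)] = y
    end

    # Importance resampling with the heuristic-centered proposal.
    (trace, lml_estimate) = importance_resampling(
        model, (xs,), observations, lsq_proposal, (xs, ys), 100)

Because the proposal is itself a generative function, its log density on latent values simulated from the model can be evaluated directly, which is what makes the Kullback-Leibler-based metrics of Section 3.3 estimable in practice.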

7.3 Towards a mature inference engineering methodology

This thesis is intended to be a step towards an inference engineering methodology, grounded in interpretable generative models, that is accessible to a broader community of programmers and engineers who do not have extensive mathematical training in probabilistic modeling and inference. Progress must be made along several directions to reach this goal.

First, we need general-purpose libraries for probabilistic modeling and inference that use explicit representations of structured generative models in human-readable modeling languages, and that flexibly support various algorithmic methodologies, including Monte Carlo, variational, and discriminative approaches and their hybrids, at a high level of abstraction. These libraries should be based on compositional model representations with common interfaces that encourage modular modeling and inference code and reuse of modeling and inference components. Gen aims to be such a library. Several other high-level modeling and inference libraries based on Gen's data types and interoperable with Gen have been developed.

Second, tools for evaluating, testing, and verifying inference implementations relative to specifications defined by generative models are needed. Empirical analyses based on samples from inference implementations [120, 110], static analyses of inference implementations [73, 7, 72] that check for key invariants and identify bugs, and techniques that combine aspects of static analysis and sampling [26] all have a role to play; a minimal sketch of one such sample-based check appears at the end of this section. For performance-constrained applications like robotics, where safety and uncertainty quantification are important, tools for evaluating the non-asymptotic approximation error of inference implementations and discriminative models using metrics grounded in probability theory are needed [26]. These tools need to be flexible enough to evaluate errors on specific data sets and distributions of data sets, to search for data sets on which algorithms fail, and to collect data used to build empirical models of inference algorithm performance.

Finally, we need to make this engineering methodology learnable without relying on mathematical knowledge of probability theory. Gen has been used to teach multiple classes on applied probabilistic inference for which prior coursework in probabilistic inference was not required, with encouraging results, but more work is needed to systematize practical knowledge about how to implement, debug, test, and maintain inference implementations.
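The following sketch applies a simulation-based calibration check in the spirit of [120] to a single scalar latent address. All names are illustrative assumptions: the model has a latent address :mu and observations at addresses (:y, i), and infer stands for any approximate posterior sampler under test; this is a minimal sketch, not a complete testing tool:

    using Gen

    @gen function model(n::Int)
        mu ~ normal(0, 1)
        for i in 1:n
            {(:y, i)} ~ normal(mu, 0.5)
        end
    end

    # Simulation-based calibration: if `infer` sampled exactly from the
    # posterior, the rank of the ground-truth latent among posterior
    # samples would be uniform on 0..n_post across replicates.
    function sbc_ranks(model, n_obs::Int, infer; n_reps=200, n_post=19)
        ranks = Int[]
        for _ in 1:n_reps
            gt = simulate(model, (n_obs,))  # ground truth plus synthetic data
            obs = choicemap()
            for i in 1:n_obs
                obs[(:y, i)] = gt[(:y, i)]
            end
            post = [infer(model, (n_obs,), obs)[:mu] for _ in 1:n_post]
            push!(ranks, count(<(gt[:mu]), post))  # rank statistic
        end
        return ranks  # check uniformity, e.g. with a chi-squared statistic
    end

    # Example sampler under test: Gen's built-in importance resampling.
    infer(model, args, obs) = first(importance_resampling(model, args, obs, 100))
    ranks = sbc_ranks(model, 50, infer)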

Notation

푎, 푏                 Scalars and addresses in choice dictionaries (lowercase)
v, g                 Vectors (bold)
M, J                 Matrices (bold italicized uppercase)
퐴, 퐵                 Sets (uppercase)
∅                    Empty set
퐴c                   Complement of set 퐴
휌, 휎, 휏, 휐, 휆        Choice dictionaries (bold lowercase Greek letters)
p, q                 Probabilistic modeling language source code
풫, 풬                 Generative functions
t, s                 Traces (bold lowercase sans-serif font)
휏[푎]                 Value lookup in a choice dictionary
휎 ⊕ 휏                Merge of two choice dictionaries
퐴휏                   Set of addresses in a choice dictionary (its domain)
휏|퐴                  Restriction of a choice dictionary to a set of addresses
{푎 ↦ 2, 푏 ↦ 3}       Literal choice dictionary
풯⋆퐴                  Set of choice dictionaries with addresses that are a subset of 퐴
풯퐴                   Set of choice dictionaries with addresses that are exactly 퐴
푝(휏), 푞(휎; 푥)        Probability density of a choice dictionary
supp(푝)              Support of density 푝 (the set {휏 : 푝(휏) > 0})
휇(푑휏), 휈(푑휏)         Probability measure on choice dictionaries
푝̄(휌)                 Marginal likelihood
푝(휎|휌)               Conditional probability density
휏 ∼ 푝, 휎 ∼ 푞(·; 휎)   Sampling from the measure induced by a probability density
⟦p⟧                  Semantics of source code p (a generative function)
[푥]                  Indicator: 1 if 푥 is true and 0 if 푥 is false
푝norm(휇, 휎)          Density function for a primitive probability distribution

Bibliography

[1] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 265–283, 2016.

[2] Umut A. Acar. Self-adjusting computation: (an overview). In Proceedings of the 2009 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation, pages 1–6, 2009.

[3] David J Aldous. Exchangeability and related topics. In École d’Été de Probabilités de Saint-Flour XIII-1983, pages 1–198. Springer, 1985.

[4] Christophe Andrieu and Gareth O. Roberts. The pseudo-marginal approach for efficient Monte Carlo computations. The Annals of Statistics, 37(2):697–725, 2009.

[5] Christophe Andrieu, Nando De Freitas, Arnaud Doucet, and Michael I Jordan. An introduction to MCMC for machine learning. Machine Learning, 50(1-2):5–43, 2003.

[6] Christophe Andrieu, Arnaud Doucet, and Roman Holenstein. Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(3):269–342, 2010.

[7] Eric Atkinson, Cambridge Yang, and Michael Carbin. Verifying handcoded probabilistic inference procedures. arXiv preprint arXiv:1805.01863, 2018.

[8] Chris L Baker, Rebecca Saxe, and Joshua B Tenenbaum. Action understanding as inverse planning. Cognition, 113(3):329–349, 2009.

[9] H. G. Barrow, J. M. Tenenbaum, R. C. Bolles, and H. C. Wolf. Parametric correspondence and chamfer matching: Two new techniques for image matching. In Proceedings of the 5th International Joint Conference on Artificial Intelligence - Volume 2, IJCAI'77, pages 659–663, 1977.

[10] Bruce Guenther Baumgart. Geometric Modeling for Computer Vision. PhD thesis, Stanford, CA, USA, 1974.

[11] James Bergstra, Olivier Breuleux, Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, Guillaume Desjardins, Joseph Turian, David Warde-Farley, and Yoshua Bengio. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), volume 4, pages 1–7, 2010.

[12] Jeff Bezanson, Alan Edelman, Stefan Karpinski, and Viral B Shah. Julia: A fresh approach to numerical computing. SIAM Review, 59(1):65–98, 2017.

[13] Eli Bingham, Jonathan P Chen, Martin Jankowiak, Fritz Obermeyer, Neeraj Pradhan, Theofanis Karaletsos, Rohit Singh, Paul Szerlip, Paul Horsfall, and Noah D Goodman. Pyro: Deep universal probabilistic programming. The Journal of Machine Learning Research, 20(1):973–978, 2019.

[14] Serena Booth, Yilun Zhou, Ankit Shah, and Julie Shah. Bayes-probe: Distribution-guided sampling for prediction level sets. arXiv preprint arXiv:2002.10248, 2020.

[15] Johannes Borgström, Ugo Dal Lago, Andrew D. Gordon, and Marcin Szymczak. A lambda-calculus foundation for universal probabilistic programming. In Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming, ICFP 2016, pages 33–46, 2016.

[16] Jörg Bornschein and Yoshua Bengio. Reweighted wake-sleep. arXiv preprint arXiv:1406.2751, 2014.

[17] Steve Brooks, Andrew Gelman, Galin Jones, and Xiao-Li Meng. Handbook of Markov Chain Monte Carlo. CRC Press, 2011.

[18] Yufei Cai, Paolo G Giarrusso, Tillmann Rendel, and Klaus Ostermann. A theory of changes for higher-order languages: Incrementalizing 휆-calculi by static differentiation. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 145–155, 2014.

[19] Berk Calli, Arjun Singh, Aaron Walsman, Siddhartha Srinivasa, Pieter Abbeel, and Aaron M Dollar. The YCB object and model set: Towards common benchmarks for manipulation research. In 2015 International Conference on Advanced Robotics (ICAR), pages 510–517. IEEE, 2015.

[20] Bob Carpenter, Andrew Gelman, Matthew Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. Stan: A probabilistic programming language. Journal of Statistical Software, 76(1):1–32, 2017.

[21] Sourav Chatterjee, Persi Diaconis, et al. The sample size required in importance sampling. The Annals of Applied Probability, 28(2):1099–1135, 2018.

[22] Siddhartha Chib and Edward Greenberg. Understanding the Metropolis-Hastings algorithm. The American Statistician, 49(4):327–335, 1995.

[23] Mary Kathryn Cowles, Gareth O Roberts, and Jeffrey S Rosenthal. Possible biases induced by MCMC convergence diagnostics. Journal of Statistical Computation and Simulation, 64(1):87–104, 1999.

[24] Marco Cusumano-Towner. Inference library of the Gen probabilistic programming system. https://github.com/probcomp/Gen.jl/blob/b9d72b/src/inference/mh.jl#L73-L108, 2018. Accessed: 2018-12-27.

[25] Marco Cusumano-Towner and contributors. Gen: A general-purpose probabilistic programming system with programmable inference. https://www.gen.dev. Accessed: 2020-08-28.

[26] Marco Cusumano-Towner and Vikash K. Mansinghka. AIDE: An algorithm for measuring the accuracy of probabilistic inference algorithms. In Advances in Neural Information Processing Systems 30, pages 3000–3010, 2017.

[27] Marco Cusumano-Towner, Benjamin Bichsel, Timon Gehr, Martin Vechev, and Vikash K. Mansinghka. Incremental inference for probabilistic programs. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, pages 571–585. ACM, 2018.

[28] Marco Cusumano-Towner, Alexander K Lew, and Vikash K Mansinghka. Automating involutive MCMC using probabilistic and differentiable programming. arXiv preprint arXiv:2007.09871, 2020.

[29] Marco F Cusumano-Towner and Vikash K Mansinghka. Encapsulating models and approximate inference programs in probabilistic modules. arXiv preprint arXiv:1612.04759, 2016.

[30] Marco F Cusumano-Towner and Vikash K Mansinghka. Using probabilistic programs as proposals. arXiv preprint arXiv:1801.03612, 2018.

[31] Marco F Cusumano-Towner, Alexey Radul, David Wingate, and Vikash K Mansinghka. Probabilistic programs for inferring the goals of autonomous agents. arXiv preprint arXiv:1704.04977, 2017.

[32] Marco F Cusumano-Towner, Feras A Saad, Alexander K Lew, and Vikash K Mansinghka. Gen: a general-purpose probabilistic programming system with programmable inference. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 221–236. ACM, 2019.

[33] Pierre Del Moral, Arnaud Doucet, and Ajay Jasra. Sequential Monte Carlo samplers. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(3):411–436, 2006.

[34] Joshua V Dillon, Ian Langmore, Dustin Tran, Eugene Brevdo, Srinivas Vasudevan, Dave Moore, Brian Patton, Alex Alemi, Matt Hoffman, and Rif A Saurous. TensorFlow Distributions. arXiv preprint arXiv:1711.10604, 2017.

[35] Arnaud Doucet, Nando De Freitas, and Neil Gordon. An introduction to Sequential Monte Carlo methods. In Sequential Monte Carlo methods in practice, pages 3–14. Springer, 2001.

[36] David H Douglas and Thomas K Peucker. Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica: The International Journal for Geographic Information and Geovisualization, 10(2):112–122, 1973.

[37] James R Driscoll, Neil Sarnak, Daniel D Sleator, and Robert E Tarjan. Making data structures persistent. Journal of Computer and System Sciences, 38(1):86–124, 1989.

[38] David Duvenaud, James Robert Lloyd, Roger Grosse, Joshua B. Tenenbaum, and Zoubin Ghahramani. Structure discovery in nonparametric regression through compositional kernel search. In Proceedings of the 30th International Conference on Machine Learning, ICML 2013, pages 1166–1174, 2013.

[39] Ted Enamorado, Benjamin Fifield, and Kosuke Imai. Using a probabilistic model to assist merging of large-scale administrative records. American Political Science Review, 113(2):353–371, 2019.

[40] Hong Ge, Kai Xu, and Zoubin Ghahramani. Turing: A language for flexible probabilistic inference. In Proceedings of Machine Learning Research, volume 84, pages 1682–1690. PMLR, 2018.

[41] Roland Gecse and Attila Kovács. Consistency of stochastic context-free grammars. Mathematical and Computer Modelling, 52(3-4):490–500, 2010.

[42] Timon Gehr, Sasa Misailovic, and Martin Vechev. PSI: Exact symbolic inference for probabilistic programs. In International Conference on Computer Aided Verification, pages 62–83. Springer, 2016.

[43] Andreas Geiger, Martin Lauer, and Raquel Urtasun. A generative model for 3D urban scene understanding from movable platforms. In CVPR 2011, pages 1945–1952, 2011.

[44] Stuart Geman and Donald Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, (6):721–741, 1984.

[45] Samuel Gershman and Noah Goodman. Amortized inference in probabilistic reasoning. In Proceedings of the Annual Meeting of the Cognitive Science Society, volume 36, 2014.

[46] Samuel J Gershman, Eric J Horvitz, and Joshua B Tenenbaum. Computational rationality: A converging paradigm for intelligence in brains, minds, and machines. Science, 349(6245):273–278, 2015.

[47] John Geweke. Getting it right: Joint distribution tests of posterior simulators. Journal of the American Statistical Association, 99(467):799–804, 2004.

[48] Wally R Gilks, Andrew Thomas, and David J Spiegelhalter. A language and program for complex Bayesian modelling. Journal of the Royal Statistical Society: Series D (The Statistician), 43(1):169–177, 1994.

[49] Wally R Gilks, Andrew Thomas, and David J Spiegelhalter. A language and program for complex Bayesian modelling. Journal of the Royal Statistical Society: Series D (The Statistician), 43(1):169–177, 1994.

[50] Walter R Gilks and Carlo Berzuini. Following a moving target-Monte Carlo inference for dynamic Bayesian models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(1):127–146, 2001.

[51] Noah Goodman, Vikash Mansinghka, Daniel M. Roy, Keith Bonawitz, and Joshua B. Tenenbaum. Church: a language for generative models. In Proceedings of the 24th Annual Conference on Uncertainty in Artificial Intelligence, UAI 2008, pages 220–229, 2008.

[52] Noah D Goodman and Andreas Stuhlmüller. The Design and Implementation of Probabilistic Programming Languages. http://dippl.org, 2014. Accessed: 2016-10-31.

[53] Peter J Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82(4):711–732, 1995.

[54] Roger B. Grosse, Ruslan Salakhutdinov, William T. Freeman, and Joshua B. Tenenbaum. Exploiting compositionality to explore a large space of model structures. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence, UAI 2012, pages 306–315, 2012.

[55] Shivam Handa, Vikash Mansinghka, and Martin Rinard. Compositional inference metaprogramming with convergence guarantees. arXiv preprint arXiv:1907.05451, 2019.

[56] David I Hastie and Peter J Green. Model choice using reversible jump Markov chain Monte Carlo. Statistica Neerlandica, 66(3):309–338, 2012.

[57] Keith Hawkins, Boris Leistedt, Jo Bovy, and David W Hogg. Red clump stars and Gaia: Calibration of the standard candle using a hierarchical probabilistic model. Monthly Notices of the Royal Astronomical Society, 471(1):722–729, 2017.

[58] GE Hinton, P Dayan, BJ Frey, and RM Neal. The "wake-sleep" algorithm for unsupervised neural networks. Science, 268(5214):1158–1161, 1995.

[59] Tomas Hrycej. Gibbs sampling in Bayesian networks. Artificial Intelligence, 46(3):351–363, 1990.

[60] Daniel Huang, Jean-Baptiste Tristan, and Greg Morrisett. Compiling Markov chain Monte Carlo algorithms for probabilistic modeling. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 111–125, 2017.

[61] Jonathan H Huggins, Trevor Campbell, Mikołaj Kasprzak, and Tamara Broderick. Practical bounds on the error of Bayesian posterior approximations: A nonasymptotic approach. arXiv preprint arXiv:1809.09505, 2018.

[62] Varun Jampani, Sebastian Nowozin, Matthew Loper, and Peter V Gehler. The informed sampler: A discriminative approach to Bayesian inference in generative computer vision models. Computer Vision and Image Understanding, 136:32–44, 2015.

[63] Claus Skaanning Jensen, Augustine Kong, and Uffe Kjaerulff. Blocking Gibbs sampling in very large probabilistic expert systems. International Journal of Human Computer Studies, 42(6):647–666, 1995.

[64] Charles Kemp, Joshua B Tenenbaum, Thomas L Griffiths, Takeshi Yamada, and Naonori Ueda. Learning systems of concepts with an infinite relational model. In AAAI, volume 3, page 5, 2006.

[65] Bjarne Knudsen and Jotun Hein. Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Research, 31(13):3423–3428, 2003.

[66] Daphne Koller and Nir Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.

[67] Tejas D Kulkarni, Pushmeet Kohli, Joshua B Tenenbaum, and Vikash Mansinghka. Picture: A probabilistic programming language for scene perception. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4390–4399, 2015.

[68] Karim Lari and Steve J Young. The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech & Language, 4(1):35–56, 1990.

[69] Steven M. LaValle. Rapidly-exploring random trees: A new tool for path planning. Technical Report TR 98-11, Computer Science Department, Iowa State University, 1998.

[70] Tuan Anh Le, Atılım Günes Baydin, and Frank Wood. Inference compilation and universal probabilistic programming. arXiv preprint arXiv:1610.09900, 2016.

[71] Tuan Anh Le, Atılım Günes Baydin, Robert Zinkov, and Frank Wood. Using synthetic data to train neural networks is model-based reasoning. In 2017 International Joint Conference on Neural Networks (IJCNN), pages 3514–3521. IEEE, 2017.

[72] Wonyeol Lee, Hangyeol Yu, Xavier Rival, and Hongseok Yang. Towards verified stochastic variational inference for probabilistic programs. Proceedings of the ACM on Programming Languages, 4(POPL):1–33, 2019.

[73] Alexander K Lew, Marco F Cusumano-Towner, Benjamin Sherman, Michael Carbin, and Vikash K Mansinghka. Trace types and denotational semantics for sound programmable inference in probabilistic languages. Proceedings of the ACM on Programming Languages, 4(POPL):1–32, 2019.

[74] Alexander K. Lew, Benjamin Sherman, Marco Cusumano-Towner, Michael Carbin, and Vikash Mansinghka. MetaPPL: Inference algorithms as first-class generative models. Languages for Inference Workshop, 2020. URL https://popl20.sigplan.org/details/lafi-2020/14/MetaPPL-Inference-Algorithms-as-First-Class-Generative-Models.

[75] Barbara Liskov and Stephen Zilles. Programming with abstract data types. ACM Sigplan Notices, 9(4):50–59, 1974.

[76] Jun S. Liu. Monte Carlo Strategies in Scientific Computing. Springer Publishing Company, Inc., 2001.

[77] James Robert Lloyd, David Duvenaud, Roger Grosse, Joshua Tenenbaum, and Zoubin Ghahramani. Automatic construction and natural-language description of nonparametric regression models. In Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014.

[78] Anthony Lu. Venture: An extensible platform for probabilistic meta-programming. Master’s thesis, Massachusetts Institute of Technology, 2016.

[79] Vikash Mansinghka, Daniel Selsam, and Yura Perov. Venture: A higher-order probabilistic programming platform with programmable inference. arXiv preprint arXiv:1404.0099, 2014.

[80] Vikash K Mansinghka, Ulrich Schaechtle, Shivam Handa, Alexey Radul, Yutian Chen, and Martin Rinard. Probabilistic programming with programmable inference. In ACM SIGPLAN Notices, volume 53, pages 603–616. ACM, 2018.

[81] David Merrell and Anthony Gitter. Inferring signaling pathways with probabilistic programming. In Proceedings of the Nineteenth European Conference on Computational Biology, 2020.

[82] David Merrell and Anthony Gitter. Inferring signaling pathways with probabilistic programming, 2020.

[83] Brian Milch and Stuart Russell. General-purpose MCMC inference over relational structures. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, pages 349–358, 2006.

[84] Brian Milch, Bhaskara Marthi, and Stuart Russell. BLOG: Relational modeling with unknown objects. In ICML 2004 workshop on statistical relational learning and its connections to other fields, pages 67–73, 2004.

[85] Brian Milch, Bhaskara Marthi, David Sontag, Stuart Russell, Daniel L Ong, and Andrey Kolobov. Approximate inference for infinite contingent Bayesian networks. In Proceedings of the 10th International Conference on Artificial Intelligence and Statistics, AISTATS 2005, pages 238–245, 2005.

[86] Dave Moore and Maria I Gorinova. Effect handling for composable program transformations in Edward2. arXiv preprint arXiv:1811.06150, 2018.

[87] Quaid Morris. Recognition networks for approximate inference in BN20 networks. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pages 370–377, 2001.

[88] Iain Murray, Ryan Adams, and David MacKay. Elliptical slice sampling. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2010, pages 541–548, 2010.

[89] Lawrence M Murray, Daniel Lundén, Jan Kudlicka, David Broman, and Thomas B Schön. Delayed sampling and automatic Rao-Blackwellization of probabilistic programs. arXiv preprint arXiv:1708.07787, 2017.

[90] Praveen Narayanan and Chung-chieh Shan. Symbolic disintegration with a variety of base measures. ACM Transactions on Programming Languages and Systems (TOPLAS), 42(2):1–60, 2020.

[91] Radford M. Neal. Annealed importance sampling. Statistics and Computing, 11(2):125–139, 2001.

[92] Radford M Neal et al. MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 2(11):2, 2011.

[93] Kirill Neklyudov, Max Welling, Evgenii Egorov, and Dmitry Vetrov. Involutive MCMC: A Unifying Framework. arXiv preprint arXiv:2006.16653, 2020.

[94] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pages 8026–8037, 2019.

[95] Avi Pfeffer. IBAL: A probabilistic rational programming language. In IJCAI, pages 733–740, 2001.

[96] Du Phan, Neeraj Pradhan, and Martin Jankowiak. Composable effects for flexible and accelerated probabilistic programming in NumPyro. arXiv preprint arXiv:1912.11554, 2019.

[97] William Pugh and Tim Teitelbaum. Incremental computation via function caching. In Proceedings of the 16th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 315–328, 1989.

[98] Morgan Quigley, Ken Conley, Brian Gerkey, Josh Faust, Tully Foote, Jeremy Leibs, Rob Wheeler, and Andrew Y Ng. ROS: an open-source Robot Operating System. In ICRA Workshop on Open Source Software, volume 3, page 5. Kobe, Japan, 2009.

[99] L. Rabiner and B. Juang. An introduction to hidden Markov models. IEEE ASSP Magazine, 3(1):4–16, 1986.

[100] Jonathan Rees and William Clinger. Revised3 report on the algorithmic language Scheme. ACM Sigplan Notices, 21(12):37–79, 1986.

[101] Dirk Reiners. OpenSG: A scene graph system for flexible and efficient realtime rendering for virtual and augmented reality applications. PhD thesis, Darmstadt University of Technology, 2002.

[102] Sylvia Richardson and Peter J Green. On Bayesian analysis of mixtures with an unknown number of components (with discussion). Journal of the Royal Statistical Society: Series B (Statistical Methodology), 59(4):731–792, 1997.

[103] Daniel Ritchie, Andreas Stuhlmüller, and Noah Goodman. C3: Lightweight incrementalized MCMC for probabilistic programs using continuations and callsite caching. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, AISTATS 2016, pages 28–37, 2016.

[104] Christian P. Robert and George Casella. Monte Carlo Statistical Methods. Springer Texts in Statistics. Springer-Verlag, 2005.

[105] David A Roberts, Marcus Gallagher, and Thomas Taimre. Reversible jump probabilistic programming. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019, pages 634–643, 2019.

[106] Andrew Roth, Jiarui Ding, Ryan Morin, Anamaria Crisan, Gavin Ha, Ryan Giuliany, Ali Bashashati, Martin Hirst, Gulisa Turashvili, Arusha Oloumi, et al. JointSNVMix: a probabilistic model for accurate detection of somatic mutations in normal/tumour paired next-generation sequencing data. Bioinformatics, 28(7):907–913, 2012.

[107] Stuart Russell. Rationality and intelligence: A brief update. In Fundamental Issues of Artificial Intelligence, pages 7–28. Springer, 2016.

[108] Stuart Russell and Eric Wefald. Principles of metareasoning. Artificial Intelligence, 49(1-3):361–395, 1991.

[109] Feras A. Saad, Marco Cusumano-Towner, Ulrich Schaechtle, Martin C. Rinard, and Vikash K. Mansinghka. Bayesian synthesis of probabilistic programs for automatic data modeling. Proceedings of the ACM on Programming Languages, 3(POPL):37:1–37:29, 2019.

[110] Feras A Saad, Cameron E Freer, Nathanael L Ackerman, and Vikash K Mansinghka. A family of exact goodness-of-fit tests for high-dimensional discrete distributions. arXiv preprint arXiv:1902.10142, 2019.

[111] Hiroaki Sakoe and Seibi Chiba. Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1):43–49, 1978.

[112] Tetsuya Sato, Alejandro Aguirre, Gilles Barthe, Marco Gaboardi, Deepak Garg, and Justin Hsu. Formal verification of higher-order probabilistic programs. Proceedings of the ACM on Programming Languages, 3(POPL):1–30, 2019.

[113] Ulrich Schaechtle, Feras Saad, Alexey Radul, and Vikash K. Mansinghka. Time series structure discovery via probabilistic program synthesis. arXiv preprint arXiv:1611.07051, 2016.

[114] Ruwen Schnabel, Roland Wahl, and Reinhard Klein. Efficient RANSAC for point-cloud shape detection. In Computer Graphics Forum, volume 26, pages 214–226. Wiley Online Library, 2007.

[115] Adam Ścibior, Zoubin Ghahramani, and Andrew D Gordon. Practical probabilistic programming with monads. In Proceedings of the 2015 ACM SIGPLAN Symposium on Haskell, pages 165–176, 2015.

[116] Span Spanbauer, Cameron Freer, and Vikash Mansinghka. Deep involutive generative models for neural MCMC. arXiv preprint arXiv:2006.15167, 2020.

[117] Geir Storvik. On the flexibility of Metropolis–Hastings acceptance probabilities in auxiliary variable proposal generation. Scandinavian Journal of Statistics, 38(2):342–358, 2011.

[118] Andreas Stuhlmüller, Jacob Taylor, and Noah Goodman. Learning stochastic inverses. In Advances in Neural Information Processing Systems 26, pages 3048–3056, 2013.

[119] Andreas Stuhlmüller, Robert XD Hawkins, N Siddharth, and Noah D Goodman. Coarse-to-fine sequential Monte Carlo for probabilistic programs. arXiv preprint arXiv:1509.02962, 2015.

[120] Sean Talts, Michael Betancourt, Daniel Simpson, Aki Vehtari, and Andrew Gelman. Validating Bayesian inference algorithms with simulation-based calibration. arXiv preprint arXiv:1804.06788, 2018.

[121] Luke Tierney. Markov chains for exploring posterior distributions. The Annals of Statistics, pages 1701–1728, 1994.

[122] Luke Tierney. A note on Metropolis-Hastings kernels for general state spaces. Annals of Applied Probability, 8(1):1–9, 1998.

[123] Robert F Tobler. Separating semantics from rendering: a scene graph based architecture for graphics applications. The Visual Computer, 27(6-8):687–695, 2011.

[124] Jonathan Tremblay, Thang To, Balakumar Sundaralingam, Yu Xiang, Dieter Fox, and Stan Birchfield. Deep object pose estimation for semantic robotic grasping of household objects. In Conference on Robot Learning (CoRL), 2018.

[125] Zhuowen Tu and Song-Chun Zhu. Image segmentation by data-driven Markov chain Monte Carlo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5):657–673, 2002.

[126] Martin J Wainwright and Michael I Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1–305, 2008.

[127] David Wingate, Andreas Stuhlmüller, and Noah Goodman. Lightweight implementations of probabilistic programming languages via transformational compilation. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, AISTATS 2011, pages 770–778, 2011.

[128] Sam Witty, Alexander Lew, David Jensen, and Vikash Mansinghka. Bayesian causal inference via probabilistic program synthesis. arXiv preprint arXiv:1910.14124, 2019.

[129] Sam Witty, Kenta Takatsu, David Jensen, and Vikash Mansinghka. Causal inference using Gaussian processes with structured latent confounders. arXiv preprint arXiv:2007.07127, 2020.

[130] Frank Wood, Jan-Willem van de Meent, and Vikash Mansinghka. A new approach to probabilistic programming inference. In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, AISTATS 2014, pages 1024–1032, 2014.

[131] Yi Wu, Lei Li, Stuart Russell, and Rastislav Bodik. Swift: Compiled inference for probabilistic programming languages. arXiv preprint arXiv:1606.09242, 2016.

[132] Lingfeng Yang, Patrick Hanrahan, and Noah Goodman. Generating efficient MCMC kernels from probabilistic programs. In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, AISTATS 2014, pages 1068–1076, 2014.

[133] Tan Zhi-Xuan, Jordyn L Mann, Tom Silver, Joshua B Tenenbaum, and Vikash K Mansinghka. Online Bayesian goal inference for boundedly-rational planning agents. arXiv preprint arXiv:2006.07532, 2020.

[134] Ben Zinberg, Marco Cusumano-Towner, and Vikash K Mansinghka. Structured differentiable models of 3D scenes via generative scene graphs. Workshop on Perception as Generative Reasoning, NeurIPS 2019, Vancouver, Canada. URL https://pgr-workshop.github.io/img/PGR025.pdf.
