Estimation in Exponential Families with Unknown Normalizing Constant

A dissertation submitted to the Department of Statistics and the Committee on Graduate Studies of Stanford University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Sumit Mukherjee
May 2014

© 2014 by Sumit Mukherjee. All Rights Reserved. Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License. http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/cf854qf4476

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Persi Diaconis, Primary Adviser
Sourav Chatterjee
Amir Dembo

Approved for the Stanford University Committee on Graduate Studies.

Patricia J. Gumport, Vice Provost for Graduate Education

Abstract

Exponential families of probability measures have been a subject of considerable interest in statistics, both in theoretical and applied areas. One problem that frequently arises in such models is that the normalizing constant is not known in closed form. Moreover, numerical computation of the normalizing constant is infeasible because the underlying space is huge. As a result, carrying out inferential procedures becomes challenging.
In this thesis, the main objects of study are specific examples of exponential families on graphs and permutations, where the normalizing constant is hard to compute. Using large deviation results, asymptotic estimates of the normalizing constant are obtained, and methods to estimate the normalizing constant are developed. In the case of graphs, this analysis gives some insight into the phenomenon of "degeneracy" observed in empirical studies in the social science literature. In the case of permutations, this analysis is used to show the consistency of pseudo-likelihood.

Acknowledgments

One of the best things that happened to me at Stanford was the chance to interact with Persi. Working with Persi was a thoroughly enjoyable experience, more so because the process of learning never stopped being fun. Persi's approach to any problem fascinates me. He has been a constant source of knowledge, support and encouragement throughout my Ph.D.

I had the opportunity to work with Amir during my five years at Stanford, and I learnt a lot from him. His door was always open to students, and it is possible I may have overused the privilege. He was always willing to answer any question that I might have, no matter how easy it might be.

I had the chance to interact a lot with Sourav regarding my research, more so when he joined Stanford. He always had useful comments and suggestions regarding my research, and for that I'm grateful to him.

I thank Susan and David for helpful comments and suggestions during my Ph.D. proposal. I would also like to thank Guenther for writing me a teaching recommendation.

During my time at Stanford I have had the privilege to take courses under Amir Dembo, Persi Diaconis, Iain Johnstone, Andrea Montanari, Art Owen, Joe Romano, David Siegmund and Jonathan Taylor.

During my Ph.D. I collaborated with A. Basak and B. Bhattacharya, and my thesis benefitted a lot from the resulting interactions.
I would like to thank Caroline, Cindy, Ellen, Heather, Helen, Nora and Regina for all their help, and for making it very easy to concentrate on the academic side of things.

As far as the environment of the department goes, this is a great department to be in. My batch-mates Anirban, Alex, Max, Matan, Michael and Yong have been extremely helpful and were there for me whenever I needed them. My fellow graduate students have been wonderful to be around, especially Amir, Austen, Bhaswar, Dennis, Gourab, Josh, Kinjal, Murat, Nike, Qingyuan, Rahul, Shilin, Stefan, Will and Zhen. I would like to thank the entire Stanford Statistics community for making the last five years special.

The single most important reason for my interest in the areas of Statistics and Probability is the Indian Statistical Institute, where I earned both my undergraduate and master's degrees in Statistics. I'm grateful to my professors at ISI for introducing me to this field. I also benefitted a lot from my discussions with my friends at ISI, and I'm thankful to them for all their help and co-operation over the years.

None of this would have been possible without my parents, and this place is not enough to mention all the things they have done for me.

And last, but by no means least, I would like to thank my wife Kaustari for always staying by my side, and being the wonderful person that she is.

Contents

Abstract
Acknowledgments
1 Introduction
2 Tools required
    2.1 Graph Limits
    2.2 Permutation Limits
    2.3 Understanding permutation limits
    2.4 Large deviations theory
3 Previous work
    3.1 Methods
    3.2 Theoretical work
4 Exponential family on sparse graphs
    4.1 Introduction
    4.2 Statement of main results
    4.3 The large deviation principle
    4.4 Proofs of main results
    4.5 A particular example
    4.6 Fitting the model
    4.7 Proof of the large deviation principle
5 Exponential family on permutations
    5.1 Introduction
    5.2 Statement of main results
    5.3 The large deviation principle
    5.4 Proofs of main results
    5.5 Approximations for small θ
    5.6 A particular example
    5.7 Analysis of the 1970 draft lottery data
    5.8 Proof of the large deviation principle
6 Conclusion
References

Chapter 1

Introduction

Exponential families of probability measures have been a topic of considerable research in statistics, both in theoretical and applied areas. The concept of exponential families is credited to E. J. G. Pitman, G. Darmois, and B. Koopman in 1935-36. The large literature on exponential families is developed in the book-length treatments of Barndorff-Nielsen ([2]) and Brown ([48]). More recent literature can be found in Jordan-Wainwright ([65]).

This thesis will consider exponential families on finite spaces. The definition of exponential family considered in this thesis is given below:

Definition 1.1. Let $\mathcal{X}$ be a finite space, and let $T : \mathcal{X} \mapsto \mathbb{R}^d$ be a real vector valued statistic on $\mathcal{X}$. Then for any $\theta \in \mathbb{R}^d$ one has
\[ \sum_{x \in \mathcal{X}} e^{\theta' T(x)} < \infty, \]
and so the function $Z(\cdot) : \mathbb{R}^d \mapsto \mathbb{R}$ defined by
\[ Z(\theta) := \log\Big[ \sum_{x \in \mathcal{X}} e^{\theta' T(x)} \Big] \]
is finite for all $\theta$.

For $x \in \mathcal{X}$, $\theta \in \mathbb{R}^d$ define $p_\theta(x) = e^{\theta' T(x) - Z(\theta)}$. Then for every $\theta$ the sum $\sum_{x \in \mathcal{X}} p_\theta(x)$ equals 1, and so $p_\theta(\cdot)$ is a probability measure on $\mathcal{X}$. This is defined to be an exponential family on $\mathcal{X}$, with natural parameter $\theta$ and sufficient statistic $T$.
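As a minimal sketch of Definition 1.1, the following Python snippet computes $Z(\theta)$ and $p_\theta$ by brute-force enumeration on a toy space. The space (all labelled graphs on 4 vertices), the statistic (edge and triangle counts), the value of $\theta$, and all function names are illustrative assumptions chosen here, not taken from the thesis. Enumeration of this kind is only possible because the toy space has 64 elements; for graphs on $n$ vertices the space has $2^{\binom{n}{2}}$ elements, which is why direct computation of $Z(\theta)$ quickly becomes infeasible.

```python
import itertools
import math

def log_normalizer(theta, space, T):
    """Z(theta) = log sum_x exp(theta' T(x)), computed by full enumeration."""
    exponents = [sum(t * s for t, s in zip(theta, T(x))) for x in space]
    m = max(exponents)  # shift by the maximum (log-sum-exp) for numerical stability
    return m + math.log(sum(math.exp(e - m) for e in exponents))

def pmf(theta, space, T):
    """p_theta(x) = exp(theta' T(x) - Z(theta)) for every x in the space."""
    Z = log_normalizer(theta, space, T)
    return {
        x: math.exp(sum(t * s for t, s in zip(theta, T(x))) - Z)
        for x in space
    }

# Hypothetical toy space: labelled graphs on 4 vertices, each encoded as a
# 0/1 tuple indexed by the 6 unordered vertex pairs.
PAIRS = list(itertools.combinations(range(4), 2))
SPACE = list(itertools.product((0, 1), repeat=len(PAIRS)))

def edges_and_triangles(x):
    """Hypothetical sufficient statistic T(x) = (number of edges, number of triangles)."""
    present = {pair for pair, bit in zip(PAIRS, x) if bit}
    n_edges = len(present)
    n_triangles = sum(
        1
        for a, b, c in itertools.combinations(range(4), 3)
        if (a, b) in present and (a, c) in present and (b, c) in present
    )
    return (n_edges, n_triangles)

theta = (0.3, -0.5)  # arbitrary illustrative parameter value
p = pmf(theta, SPACE, edges_and_triangles)
total = sum(p.values())  # should equal 1 up to floating-point error
```

Checking that `total` is numerically 1 confirms the normalization claim in the definition for this toy example; at $\theta = 0$ the same code gives $Z(0) = \log 64$, the log of the size of the space.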
The function $Z(\theta)$ plays a very important role in the analysis of exponential families. It is commonly referred to as the log normalizing constant in the statistics literature. One of the most important properties of $Z(\theta)$ is the following:
\[ E(T_i) = \frac{\partial Z(\theta)}{\partial \theta_i}, \qquad \mathrm{Cov}(T_i, T_j) = \frac{\partial^2 Z(\theta)}{\partial \theta_i \, \partial \theta_j}, \]
where the mean and covariance of $T$ are computed under $p_\theta$.

Another interesting property pertaining to statistical inference is the following: Suppose one observes $X \sim p_\theta$ for some $\theta$, and it is required to estimate $\theta$ by the Maximum Likelihood Estimate (MLE), defined by
\[ \hat{\theta}_{ML} := \arg\max_{\theta \in \mathbb{R}^d} e^{\theta' T(X) - Z(\theta)} = \arg\max_{\theta \in \mathbb{R}^d} \{ \theta' T(X) - Z(\theta) \}. \]
It readily follows that in this setting the MLE $\hat{\theta}_{ML}$ solves the equation $T(X) = \nabla Z(\theta)$. Thus for such models the MLE is easy to compute, provided one has some control on $Z(\theta)$.

Consequently, the analysis of the model $p_\theta$ becomes a lot simpler when the quantity $Z(\theta)$ is available in closed form and is readily amenable to algebraic calculations. However, there are many examples of exponential families where the function $Z(\theta)$ is not expressible in closed form. Moreover, if the underlying space $\mathcal{X}$ has a large cardinality, it is not usually feasible to compute $Z(\theta)$ numerically. Without knowledge of the normalizing constant, Bayesian methods can be difficult to implement and analyze. For example, the usual definition of the conjugate prior for $\theta$ involves the unknown normalizing constant. As such, the analysis of such models becomes challenging.

Some examples of spaces where such exponential families arise naturally are given below:

(a) Graphs

Suppose there are $n$ researchers in a field of active research, denoted by $\{s_1, \cdots, s_n\}$, and given any pair of researchers $(s_i, s_j)$ it is known whether they collaborate or not. A very convenient way to encode this data is through a graph on $n$ vertices labelled by $[n] := \{1, 2, \cdots, n\}$, with node $i$ representing researcher $s_i$.
