Approximate Inference in Graphical Models
Philipp Hennig Robinson College
dissertation submitted in candidature for the degree of Doctor of Philosophy, University of Cambridge
Inference Group Cavendish Laboratory University of Cambridge
14 November 2010
Declaration
I hereby declare that my dissertation entitled “Approximate Inference in Graphical Models” is not substantially the same as any that I have submitted for a degree or diploma or other qualification at any other University. I further state that no part of my dissertation has already been or is concurrently submitted for any such degree, diploma or other qualification. Except where specifically indicated in the text, this dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration. This dissertation does not exceed sixty thousand words in length.
Date: 14 November 2010 Signed: Philipp Hennig, Cambridge
Abstract
Probability theory provides a mathematically rigorous yet conceptually flexible calculus of uncertainty, allowing the construction of complex hierarchical models for real-world inference tasks. Unfortunately, exact inference in probabilistic models is often computationally expensive or even intractable. A close inspection in such situations often reveals that computational bottlenecks are confined to certain aspects of the model, which can be circumvented by approximations without having to sacrifice the model’s interesting aspects. The conceptual framework of graphical models provides an elegant means of representing probabilistic models and deriving both exact and approximate inference algorithms in terms of local computations. This makes graphical models an ideal aid in the development of generalizable approximations. This thesis contains a brief introduction to approximate inference in graphical models (Chapter 2), followed by three extensive case studies in which approximate inference algorithms are developed for challenging applied inference problems. Chapter 3 derives the first probabilistic game tree search algorithm. Chapter 4 provides a novel expressive model for inference in psychometric questionnaires. Chapter 5 develops a model for the topics of large corpora of text documents, conditional on document metadata, with a focus on computational speed. In each case, graphical models help in two important ways: they first provide important structural insight into the problem, and then suggest practical approximations to the exact probabilistic solution.
Acknowledgments
I would like to thank all members of the Inference Group at the Cavendish Laboratory in Cambridge for their support during the preparation of this thesis. In particular, I am grateful to Carl Scheffler, Philip Sterne, Keith Vertanen, Emli-Mari Nel, Tamara Broderick, Oliver Stegle and Christian Steinrücken for engaging discussions and thoughtful comments over the past years, as well as for making this time as much fun as it has been. I feel privileged to have had the chance to work alongside, interact with, and learn from a number of members of the extended Cambridge machine learning community. Among them are Tom Borchert, John Cunningham, Marc Deisenroth, Jürgen van Gael, Zoubin Ghahramani, Katherine Heller, Ferenc Huszár, Gergji Kasneci, David Knowles, Simon Lacoste-Julien, Máté Lengyel, Tom Minka, Shakir Mohamed, Iain Murray, Peter Orbanz, Juaquin Quiñonero-Candela, Carl Rasmussen, Yunus Saatçi, Richard Samworth, Anton Schwaighofer, Martin Szummer, Yee Whye Teh, Ryan Turner, Sinead Williamson and John Winn. I am especially indebted to David Stern and Ralf Herbrich, both of Microsoft Research, for repeatedly and actively offering their time and expertise, often after hours, to help me with final preparations of papers and talks, and for suggesting interesting experiments and extensions. My work was supported through a grant from Microsoft Research Ltd. Besides the financial support, I am also grateful to Microsoft for access to the great researchers and technical possibilities in its Cambridge research laboratories. My deepest gratitude goes to my two supervisors: to David MacKay, for innumerable enlightening comments and constructive criticism as well as for being a shining example of a scientist’s boldness and freedom. And to Thore Graepel, for relentlessly supporting and challenging me throughout the last three years. This thesis would not have been possible without either of them.

Contents
1 Introduction
  1.1 Probability
    1.1.1 A Calculus of Uncertainty
  1.2 Numerical Complexity Mandates Approximations
    1.2.1 The Classic Answer
    1.2.2 Provability Is Not Everything
    1.2.3 About This Text

2 Graphical Models
  2.1 Factor Graphs
    2.1.1 Other Graphical Models
    2.1.2 Outlook
  2.2 The Sum-Product Algorithm
    2.2.1 The Sum-Product Algorithm is No Panacea
  2.3 Approximate Inference Methods
    2.3.1 Exponential Families
    2.3.2 Expectation Propagation
    2.3.3 Variational Inference
    2.3.4 Laplace Approximations
    2.3.5 Markov Chain Monte Carlo
    2.3.6 Assuming Independence
    2.3.7 Comparison of Approximation Schemes

3 Inference on Optimal Play in Games
  3.1 Introduction
  3.2 Methods
    3.2.1 Problem Definition
    3.2.2 Generative Model
    3.2.3 Inference
    3.2.4 Algorithm
    3.2.5 Replacing a Hard Problem with a Simple Prior
    3.2.6 Exploration Policies
  3.3 Results
    3.3.1 Structure of Go Game Trees
    3.3.2 Inference on the Generators
    3.3.3 Recursive Inductive Inference on Optimal Values
    3.3.4 Errors Introduced by Model Mismatch
    3.3.5 Use as a Standalone Tree Search
  3.4 Conclusion
  3.5 Addendum: Related Subsequent Work

4 Approximate Bayesian Psychometrics
  4.1 Introduction
  4.2 Item Response Theory
    4.2.1 Contemporary Item Response Models
  4.3 A Generative Model
    4.3.1 Expressiveness of the Model
    4.3.2 Approximate Inference
  4.4 Results
    4.4.1 Computational cost
    4.4.2 Approximate Bayesian Estimates
    4.4.3 Predictive Performance
  4.5 Discussion
  4.6 Conclusion

5 Fast, Online Inference for Conditional Topic Models
  5.1 Introduction
  5.2 Model
    5.2.1 Semi-Collapsed Variational Inference
    5.2.2 Laplace Approximation for Dirichlets
    5.2.3 Gaussian Regression
    5.2.4 Inference from Data Streams
    5.2.5 Queries to the Model
  5.3 Experiments
    5.3.1 Quality of the Laplace Bridge
    5.3.2 Experiments on the Twitter Corpus
    5.3.3 Learning Sparse Topics
    5.3.4 Comparing Conditioned and Unconditioned Models
    5.3.5 Single Pass versus Iterative Inference
  5.4 Conclusion

6 Conclusions

A The Maximum of Correlated Gaussian Variables
  A.1 Introduction
  A.2 The Maximum of Two Gaussian Variables
    A.2.1 Notation
    A.2.2 Some Integrals
    A.2.3 Analytic Forms
    A.2.4 Moment Matching
  A.3 The Maximum of a Finite Set
    A.3.1 Analytic Form
    A.3.2 A Heuristic Approximation
  A.4 Discussion of the Approximation’s Quality
  A.5 Conclusion

B Efficient Rank 1 EP Updates
  B.1 The Step Factor

C A Laplace Map Linking Gaussians and Dirichlets
  C.1 The Dirichlet in the Softmax Basis
    C.1.1 Ensuring Identifiability
    C.1.2 Transforming to the Softmax Basis
  C.2 The Laplace Map
    C.2.1 Mode and Hessian in the Softmax Basis
    C.2.2 A Sparse Representation
    C.2.3 Inverse Map
  C.3 The Two-Dimensional Case
  C.4 Summary

List of Figures
2.1 Factor graph
2.2 Plates
2.3 Advantages of directed graph notation
2.4 Advantages of factor graph notation
2.5 Derivation of sum-product algorithm
2.6 Messages sent by the sum-product algorithm
2.7 Defect of the Laplace approximation
2.8 Sketch of the explaining away phenomenon
2.9 Factorized regression

3.1 Sketch of game tree
3.2 Sketch of game tree smoothness
3.3 Generative model for game trees
3.4 Inference on optimal values
3.5 Replacing a hard problem with a simple prior
3.6 Empirical distribution of generator values in Go
3.7 Changes to mean values
3.8 Log likelihood of ground truth game values under model
3.9 Inductive model predictions vs ground truth
3.10 Performance in a tree search task

4.1 Marginal frequencies of reply categories
4.2 Directed model for ordinal regression
4.3 Factor graph for the user-item response model
4.4 Ordinal and response factors
4.5 Sketch of test set setup
4.6 Thresholds learned by the approximate inference algorithm
4.7 Sampled threshold values
4.8 Predictions for user responses to items

5.1 Conceptual sketch of topic models
5.2 Directed graphical model
5.3 Approximate inference in the conditional topic model
5.4 Factorized Gaussian regression
5.5 Convergence behaviour of the Laplace bridge
5.6 Sparsity of learned author topic predictions
5.7 Predictive topic distribution for individual authors
5.8 Learned topics: conditioned model
5.9 Learned topics: un-conditioned model
5.10 Topic distributions for four authors
5.11 Comparison of iterative and single-pass convergence
5.12 Regrets and log probabilities

A.1 Sketch of integration ranges
A.2 Analytical distributions of max and generating variables
A.3 Minimal factor graph using the max factor
A.4 Illustrations of Gaussian approximations for the max
A.5 Factor graph for finite set max factor
A.6 Failure modes of the max iterative approximation
A.7 Examples of the max factor’s results

C.1 Factor graph for the softmax factor
C.2 Conceptual sketch of the soft constraint onto a subspace
C.3 Effect of the softmax basis change on the Dirichlet distribution

Chapter 1
Introduction
1.1 Probability
1.1.1 A Calculus of Uncertainty
The essence of human existence is constant interaction with the world, aiming to steer and shape our surroundings to our benefit. To this end, we constantly form hypotheses about the structure of the world and the rules that govern it, act according to our theories, and then change our beliefs in the light of new experience. The process of updating one’s belief about the validity of a hypothesis, and about the values of parameters describing the hypothesis, is known as inference, or learning. Biological learning systems have been very successful at adapting to their environments in this way, and still outperform machines in many areas. In an attempt to replicate some of the abilities of biological systems, the discipline of machine learning has emerged over the past two decades, bringing together researchers from a diverse range of fields, such as computer science, engineering, physics, mathematics, neuroscience and biology, rallying around the central idea that the behaviour of a system should be governed to a large degree by examples (data), rather than explicit predetermined rules. An unavoidable consequence of learning from finite data is incompleteness of the acquired knowledge, because some areas of the data space will not have been visited or experience might otherwise be limited. This incompleteness is associated with some level of uncertainty, and quantifying this uncertainty is essential for good decision making. Probability theory provides an ideal, mathematically rigorous framework to represent and manipulate uncertain information. In fact, any rational inference paradigm consistent with common sense (as defined by three rather incontrovertible axioms) can be mapped to probability theory [Cox, 1946]. In an extension of formal logic, where statements A are either true or false, the probabilistic framework introduces the concept of a probability p(A) ∈ [0, 1] of
statement A being true.¹ The probability of both statements A and B being true is denoted by the joint probability p(A, B); the probability of A being true if B is known to be true is the conditional probability p(A | B). The means to manipulate probabilities is provided by a calculus arising directly from Cox’s axioms, which can be subsumed in the two central rules [Jaynes and Bretthorst, 2003]
\[ p(X) = \sum_Y p(X, Y) \qquad \text{and} \qquad p(X, Y) = p(Y \mid X)\, p(X) \tag{1.1} \]
known as the sum and product rules, respectively. The sum symbol is meant to represent the sum over all possible values for Y. An important corollary of these results is Bayes’ theorem
\[ p(X \mid Y) = \frac{p(Y \mid X)\, p(X)}{\sum_X p(Y \mid X)\, p(X)} \tag{1.2} \]
which relates the posterior probability of X after observation of Y to the prior probability of X, the likelihood of X (the conditional probability of Y given X) and the evidence for Y under the probabilistic model we use to evaluate conditional probabilities:

\[ \text{posterior} = \frac{\text{likelihood} \cdot \text{prior}}{\text{evidence}} \tag{1.3} \]

This text uses a somewhat sloppy notation to increase readability, in which the probability of a particular variable X having the concrete value x will simply be written as p(x), instead of p(X = x). This applies to both discrete variables and real-valued variables; in the latter case, the sums in Equations (1.1) and (1.2) have to be replaced with integrals. The above framework, sometimes called “inferring inverse probabilities”, is most widely known as the Bayesian interpretation of probability. Although named in honour of early works by a non-conformist priest [Bayes, 1763], the theory was first formally developed by Laplace [1774].² Cox [1946] and Kolmogorov [1933] established the fundamental nature of these rules of reasoning. Nevertheless, the Bayesian paradigm has been under fierce debate throughout the 20th century, mostly based on philosophical [see Jaynes and Bretthorst, 2003, for an opinionated review] and technical [e.g. Walker, 2004] issues involving the prior. In recent years, though, the Bayesian framework has increasingly found acceptance within the statistics and machine learning communities, mostly thanks to successful experimental demonstration of its powerful abilities in the description of complicated data and prior knowledge. Some important strengths of the Bayesian formulation are:

¹ In principle, probabilities can be defined as members of any space isomorphic to the real line [Jaynes and Bretthorst, 2003], but [0, 1] has become the established space. Appendix C contains an example of the inconveniences caused by this choice in some cases.
² However, Laplace did not yet state Bayes’ theorem explicitly.
- Bayesian methods allow quantitative statements about uncertainty. This allows reasoning about the information conveyed by data about the value of a latent variable, and about the confidence with which future data can be predicted.
- In contrast to classical statistical methods, which distinguish between random data and deterministic parameters that are not random variables, every variable in a Bayesian model can be assigned probabilities. This makes it possible to design hierarchical probabilistic models, describing very general ‘hypotheses over hypotheses’.
- Because probabilistic models use probability measures to spread a finite amount of probability mass over all possible outcomes, they do not suffer from the problem of ‘over-fitting’ that can plague estimative methods. Bayesian methods can thus deal consistently with arbitrary, even infinite numbers of free parameters in a model (the latter case is known as Bayesian nonparametric inference).
- From the standpoint of classical statistics, Bayesian methods can be shown to be consistent and to converge optimally fast (subject to minor technical restrictions³) [Le Cam, 1973, Ibragimov and Has’minskii, 1981, Ghosal et al., 2000]. This has allowed researchers to concentrate their efforts on the development of flexible models and efficient inference methods without having to worry about theoretical guarantees.
The body of literature on the subject of probabilistic inference, although still expanding rapidly, has become too large even to be characterized by individual exemplary works. Good introductions to the applied aspects of the field can be found in the textbooks by Jaynes and Bretthorst [2003], MacKay [2003] and Bishop [2006].
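The mechanics of Equations (1.1) and (1.2) can be sketched in a few lines of code. The numbers below (a 1% prior and a test with hypothetical error rates) are invented purely for illustration:

```python
# Hedged sketch: posterior inference for a binary hypothesis via Bayes' theorem.
# The prior and likelihoods are made-up illustrative numbers, not from the text.

prior = {True: 0.01, False: 0.99}        # p(X): hypothesis X true / false
likelihood = {True: 0.95, False: 0.10}   # p(Y = positive | X)

# Evidence p(Y) by the sum rule (Equation 1.1): sum over all values of X.
evidence = sum(likelihood[x] * prior[x] for x in prior)

# Posterior p(X | Y) by Bayes' theorem (Equation 1.2).
posterior = {x: likelihood[x] * prior[x] / evidence for x in prior}

print(posterior[True])   # ≈ 0.088: the weak prior dominates a fairly reliable test
```

Note how the evidence term is nothing but the sum rule applied to the numerator, which is why the posterior automatically normalizes to one.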
³ More precisely: consider a dataset {Xᵢ} generated by a probability measure P_θ({Xᵢ}) parametrized by a parameter θ from a parameter set Θ. If the prior over Θ puts nonzero mass on every sufficiently small open neighbourhood of the true value θ₀, if Θ is a subset of a finite-dimensional Euclidean space, and if the functional relationship θ → P_θ({Xᵢ}) is sufficiently regular, then the posterior distribution on θ converges at the optimal rate [Le Cam, 1973, Ibragimov and Has’minskii, 1981]. The situation is less clear-cut when Θ is infinite-dimensional [Cox, 1993, Ghosal et al., 2000]. Note that such analyses of both Bayesian and frequentist methods assume the likelihood function to be correct, which is a causal assumption about the generative process that is often incorrect in real-world applications, such as the ones studied in this thesis.
1.2 Numerical Complexity Mandates Approximations
There is also one aspect of Bayesian modeling that is still seen as a weakness by many applied researchers, and that provides the motivation for this thesis: Although Bayes’ rule provides a unique and in some sense straightforward way to do inference, obtaining the posterior for a particular choice of prior is often expensive, or even intractable. Research in Bayesian methods addresses this issue in two ways:
1. Priors that allow closed-form, or computationally cheap, evaluation of the posterior without sacrificing generality. Some Bayesian nonparametric methods fall into this category, including Gaussian process algorithms [Williams and Rasmussen, 1996] and, more recently, random processes over discrete probability spaces [Beal et al., 2002, Blei et al., 2004, Griffiths and Ghahramani, 2006, Roy and Teh, 2009].
2. Approximation methods that capture important characteristics of the posterior at tractable computational cost, such as Markov Chain Monte Carlo methods [Neal, 1996], and approximate inference methods for graphical models. The latter are the methods that this text will focus on. See Chapter 2 for an introduction and further references.
1.2.1 The Classic Answer
Classic statistical research takes a different approach: noting the numerical complexity mentioned above, researchers in this field construct deterministic estimators, and then take care to prove several desirable characteristics of these estimators, such as the absence of bias, good rates of convergence, and bounds on the error between the “true” and estimated value of a parameter. Carefully constructed estimators can perform very well. But their serious drawback is a lack of generalization: even slight changes in the structure of a problem can invalidate the derivation of an estimator and require the attention of a specialized researcher starting from scratch. Since not every applied researcher can afford the luxury of their own statistician, estimators are often marketed as “black boxes” to the applied fields, despite usually being subject to several technical restrictions. This creates a considerable risk of misuse [Jaffe and Spirer, 1987, Altman, 1982, 1994].
1.2.2 Provability Is Not Everything
From the Bayesian viewpoint, complex models are not a weakness, but a reflection of the complexity of the modeled system. The ability to capture intricate non-linear relationships may come at the price of losing analytic mathematical formulation. This structural complexity also means that it is often very difficult to provide explicit proof of the correctness of a particular approximate algorithm, even though the underlying exact algorithms might be known to be optimal. This situation is similar to that in the natural sciences, where there are epistemological limits to the discoverability of the “correct” model [Popper, 1934, Hume, 1739], and predictions are made not just on the basis of assumptions, but also as a result of extensive approximations. Experimental evaluation has to replace mathematical proofs in many cases. In fact, modern machine learning might be seen as an extension of the effort of physics to explain the inanimate world, to ever more complex systems, and to systems dominated by human behaviour (the field is thus also sometimes termed, more generally, probabilistic data analysis). Although provable performance is of course a virtue for any method, absence of a proof should not keep us from using approximate algorithms known empirically to perform better. In recent years several approximate algorithms with unknown general performance have become established tools of machine learning, because they have been shown to perform better than simpler algorithms with provable characteristics. In some cases (such as loopy belief propagation [Frey and MacKay, 1998] and expectation propagation [Minka, 2001]), the approximations are even known to fail in certain cases, and are still used for their superior capacity in those cases where they do not fail.
1.2.3 About This Text
This thesis presents a series of applied inference schemes. Although solving largely independent concrete problems, the chapters are linked by their use of approximate inference techniques within the framework of graphical models. These are a particularly expressive form of notation exposing the factorization properties of multivariate probability distributions. The framework itself was developed largely within the statistics and machine learning community, starting with seminal work by Pearl [1988], Lauritzen and Spiegelhalter [1988] and Frey [1998]. Approximations using graphical models [Frey and MacKay, 1998, Minka, 2001, Winn and Bishop, 2006] allow tractable inference in highly structured models. This thesis makes extensive use of graphical models, less as a theoretical framework than as a robust toolbox for the applied researcher. It utilizes graphical models to present and describe the often intricate structure of a series of probabilistic models, and to discover and study approximate solutions to inference problems where exact posterior distributions are intractable. Chapter 2 presents a short introduction to graphical models, and to popular approximate inference methods, all of which will be used in the following chapters. Chapter 3 derives the first probabilistic game tree search algorithm, allowing the
update of probabilistic beliefs on the solution of an EXPTIME-complete problem, from individual data points, in linear time. Chapter 4 presents an approximate Bayesian inference scheme for the evaluation of psychometric questionnaires, allowing inference on individual evaluation ranges for every item in the questionnaire and every respondent to the questionnaire. Chapter 5 derives a lightweight inference algorithm for the popular Latent Dirichlet Allocation topic model for corpora of text documents, which allows the model to be conditioned on features of individual documents and which can run on very large datasets in a single pass.

Chapter 2
Graphical Models
At the core of probability theory lies the idea that knowledge about the values of quantities can be “spread out” over a measurable space, rather than having to be confined to a single exact value. As a direct consequence, inference in probabilistic models requires probability mass to be summed up over ranges or, in the case of continuous spaces, to be integrated. Unfortunately, integration is a tricky art, and analytical solutions to the integrals over general probability distributions are the exception rather than the rule. Hence much of applied inference relies on the use of a small set of parametric distributions amenable to integration under certain atomic operations, such as multiplication.
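A standard example of such an integration-friendly parametric family (not named at this point in the text, but used heavily later) is the Gaussian: the product of two Gaussian densities in the same variable is again an unnormalized Gaussian, so multiplication never leaves the family. A quick numerical check of that identity, as a hedged sketch:

```python
import math

def gauss(x, mu, var):
    """Univariate Gaussian density N(x; mu, var)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def product_params(mu1, v1, mu2, v2):
    """Parameters of N(x; mu1, v1) * N(x; mu2, v2) = c * N(x; mu12, v12)."""
    v12 = 1.0 / (1.0 / v1 + 1.0 / v2)    # precisions add under multiplication
    mu12 = v12 * (mu1 / v1 + mu2 / v2)   # precision-weighted mean
    c = gauss(mu1, mu2, v1 + v2)         # normalizing constant of the product
    return mu12, v12, c

mu12, v12, c = product_params(0.0, 1.0, 1.0, 2.0)

# Verify the identity pointwise at an arbitrary location.
x = 0.3
lhs = gauss(x, 0.0, 1.0) * gauss(x, 1.0, 2.0)
rhs = c * gauss(x, mu12, v12)
print(abs(lhs - rhs) < 1e-12)   # True: the product stays Gaussian
```

This closure under multiplication is precisely the kind of "atomic operation" that makes a parametric family convenient for inference.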
In multivariate models, the complexity of integration can be immense. The computational complexity of integrating, even numerically, a general multivariate probability density p(x) of a variable x ∈ 𝕍^D in a D-dimensional space 𝕍^D = ⊗_D 𝕍 rises exponentially with D, because the volume over which we need to integrate rises exponentially with D. This is known as the curse of dimensionality [Bellman, 1957]. However, if p(x) has structure, in particular if certain dimensions of p(x) are conditionally independent of certain dimensions given others, then the problem might be much easier than in the general case. In the most extreme instance, if all dimensions are independent, p(x) = ∏_d p(x_d), then the cost of integration is only O(D). With a bit of luck (and foresight during model design), the univariate marginals p(x_d) might also be more susceptible to analytic integration. Of course, such a fully factorizing model is also less interesting, as it corresponds to the assumption that none of the elements of x have any influence on each other. Between this linear and the general exponential-cost regime lies a wide spectrum of conditional independence structures. Particularly interesting cases are hierarchical models in which the elements of x have a natural ordering (known as directed acyclic models). Complexity is not just determined by conditional independence
either; the actual functional form of the relationships between different elements of x also has an influence on computational cost. For example, some elements of x might act as switches, controlling the functional influence of some elements upon others (this is the case in mixture models). Such structure can have an influence on the computational complexity of inference.
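The cost gap between the general and the fully factorized case can be made concrete. The following sketch (with an assumed N = 50 quadrature nodes per dimension) counts function evaluations for grid-based integration, then exploits a fully factorized standard normal density to integrate in O(D) one-dimensional passes:

```python
import math

N, D = 50, 10                  # assumed: quadrature nodes per dimension, dimensions

general_cost = N ** D          # joint grid: one density evaluation per grid point
factorized_cost = N * D        # independent dimensions: D one-dimensional sums

def marginal_integral(n=N, lo=-6.0, hi=6.0):
    """Midpoint-rule integral of a standard normal marginal over [lo, hi]."""
    h = (hi - lo) / n
    return sum(h * math.exp(-0.5 * (lo + (k + 0.5) * h) ** 2) / math.sqrt(2 * math.pi)
               for k in range(n))

# A fully factorized density integrates as a product of D cheap 1-D integrals.
total = 1.0
for _ in range(D):
    total *= marginal_integral()

print(general_cost)      # 97656250000000000 evaluations in the general case
print(factorized_cost)   # 500 evaluations in the factorized case
print(total)             # close to 1.0: the factorized density normalizes
```

The exponential term N^D is the curse of dimensionality in its rawest form; the factorized model replaces it with N·D.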
2.1 Factor Graphs
Figure 2.1: In factor graph notation, probabilistic variables are denoted by hollow circles. Functions and probability distributions are denoted by gray squares called factors. Edges between nodes denote membership in a functional relation.
Factor graphs are bipartite graphs providing a symbolic representation of the functional relationships between elements of multivariate probability distributions, where the term “relationship” is meant in the sense that the expression y = f(x, z) defines a relationship f between the variables (x, y, z).
\[ \int \prod_{d=1}^{D} p(x_d)\, \mathrm{d}x_d = \prod_{d=1}^{D} \int p(x_d)\, \mathrm{d}x_d \]
Figure 2.2: In factor graph notation, identical copies of functional relationships are denoted by a black rectangle around the copied group of variables and factors, called a plate. A parameter in a corner of the plate denotes the number of copies.
Figures 2.1 and 2.2 introduce the main components of factor graphs: factors (functional relationships between groups of nodes), variable nodes, and plates (copies of groups of nodes). In addition, other graphs in this text will also contain observed variables, denoted by black circles. Factor graphs have three main raisons d’être:

- they provide an expressive graphical language which can help reveal structure in multivariate probability distributions
- often, inference algorithms can be read off mechanically from the factor graph (and in fact there exist compilers that can generate machine code directly from factor graphs [Minka and Winn, 2008])
- graph theoretical concepts (such as whether the graph is planar, or even a tree) can be used to describe certain algebraic characteristics of the underlying multivariate model and make general statements about solvability and computational complexity.
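The bipartite structure described above translates directly into code. The following container (class and method names are my own invention, not from the text) stores discrete variable domains and factor tables, which is essentially all that message computations on a discrete factor graph require:

```python
# Hedged sketch of a bipartite factor-graph container for discrete variables.
# Factor values are stored as dicts mapping joint assignments to non-negative reals.

class FactorGraph:
    def __init__(self):
        self.domains = {}   # variable name -> list of admissible values
        self.factors = []   # list of (variable-name tuple, table dict)

    def add_variable(self, name, domain):
        self.domains[name] = list(domain)

    def add_factor(self, variables, table):
        # table: {joint assignment tuple: factor value}; the edges of the
        # bipartite graph are implicit in each factor's variable list.
        self.factors.append((tuple(variables), table))

    def joint(self, assignment):
        """Unnormalized joint density: the product of all factor values."""
        value = 1.0
        for variables, table in self.factors:
            value *= table[tuple(assignment[v] for v in variables)]
        return value

# A tiny two-variable graph: p(x, y) proportional to f(x, y) * g(y).
g = FactorGraph()
g.add_variable("x", [0, 1])
g.add_variable("y", [0, 1])
g.add_factor(("x", "y"), {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0})
g.add_factor(("y",), {(0,): 0.5, (1,): 0.5})
print(g.joint({"x": 1, "y": 1}))   # 4.0 * 0.5 = 2.0
```

Graph-theoretical checks (is the graph a tree?) and the sum-product recursion of Section 2.2 can both be phrased as walks over `self.factors`.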
The development of a theory of graphical models in general, and factor graphs in particular, has been a community effort, and the available literature is now too vast to give a full account. Seminal works were contributed by Pearl [1988], Lauritzen and Spiegelhalter [1988] and Frey et al. [1997]. More recently, interest in approximate inference algorithms operating on graphs led to important contributions by Frey and MacKay [1998], Minka [2001] and Winn and Bishop [2006]. Good introductions to graphical models can be found in Bishop [2006] and Frey [1998].
2.1.1 Other Graphical Models
Besides factor graphs, there are two other forms of graphical models in general use, both of which predate factor graphs historically. They are known as directed graphical models, or Bayesian Networks, and undirected graphical models, or Markov Random Fields (MRFs) / Markov Networks. In both kinds of models, there are no factor nodes. Instead, variable nodes are connected by arrows and lines, respectively, when they share functional relationships. Undirected graphical models will not feature in this text. Directed graphical models are of particular utility in the design phase of a probabilistic model, as they are often better suited to depict motivations and convey the intuition behind probabilistic models, and consequently will be used for this purpose regularly throughout this text.
Figure 2.3: Advantages of directed graph notation: the factor graph on the left may be expressed by either of the directed graphs in the middle and on the right. However, in the directed graph in the middle, x and y are dependent conditional on z; in the graph on the right, x and y are independent conditional on z. The factor graph does not make this structure explicit.
Another advantage of directed graphical models is that they allow conditional independence structure to be read off from the graph (Figure 2.3), which is not possible in general in factor graphs. Conversely, factor graphs are more expressive than directed graphs in depicting functional relationships (Figure 2.4): not all distributions that can be encoded as a factor graph can be encoded as a directed graphical model, and one directed graphical model can often encode different models which would be distinguishable when written as factor graphs [Bishop, 2006]. Hence, algorithmic constructions will invariably be accompanied by both factor graphs and directed graphs in this thesis.
Figure 2.4: Advantages of factor graph notation: The directed model might represent either of the two factor graphs.
2.1.2 Outlook
The remainder of this chapter proceeds as follows: Section 2.2 introduces the central concept of inference in graphical models, message passing, and constructs the most important exact algorithm for inference on the marginals of nodes in graphs, the sum-product algorithm. Since this thesis focuses on approximate inference, the rest of the chapter (Section 2.3) introduces some important general approximate algorithms for inference which leverage the graphical view: Expectation Propagation (2.3.2), variational bounds (2.3.3), local Laplace approximations (2.3.4), Markov Chain Monte Carlo algorithms (2.3.5) and other less formalized methods (2.3.6). All of these methods use convenient classes of parametric probability distributions known as exponential families, introduced in Section 2.3.1.
2.2 The Sum-Product Algorithm
One of the main strengths of the factor graph representation is that inference becomes almost mechanical. This is true in particular if the graph is a tree, in which case there is an exact algorithm for finding the marginal distribution of each variable in the graph individually, known as the sum-product algorithm. Technical derivations can be found in Kschischang et al. [2001] and Bishop [2006]; this section gives a brief introduction.
Figure 2.5: A factor graph for the derivation of the sum-product algorithm. The clouds on the right are placeholders for subgraphs.
Consider the tree-structured factor graph shown in Figure 2.5. Assume there is a potentially large number $K_i$ of variables, jointly denoted $v_i$, connected to the factors $f_i$ to the right of $x$, indicated by the clouds in Figure 2.5. We will assume that the conditional dependencies between these variables retain the tree structure.
For notational clarity, we will denote (va, vb) = v1. If we subsume the normalization into a constant Z, then the graph represents the multivariate distribution
$$p(x, v_a, v_b, \{v_{i>1}\}) = Z^{-1} f_1(x, v_a, v_b)\, p_{g_a}(v_a)\, p_{g_b}(v_b) \prod_{i>1} f_i(x, v_i)\, p_{g_i}(v_i) \tag{2.1}$$
Assume that we are interested in the marginal on $x$. For a general function $f(\alpha, \beta, \gamma)$ with finite integral (i.e. any function that can be turned into a probability distribution), we have
$$\frac{\int f(\alpha,\beta,\gamma)\,\mathrm d\beta\,\mathrm d\gamma}{\int f(\alpha,\beta,\gamma)\,\mathrm d\alpha\,\mathrm d\beta\,\mathrm d\gamma} = \frac{\tilde f(\alpha)}{\int \tilde f(\alpha)\,\mathrm d\alpha} \tag{2.2}$$
(using an implicitly defined “marginal function” $\tilde f$ whose exact form is irrelevant). To get a marginal on $x$, we can thus work with the unnormalized joint distribution $\tilde p(x, \{v_i\})$ and normalize over each variable individually only at the end. So we can write the marginal as
$$\begin{aligned} \tilde p(x) &= \int \cdots \int \prod_i f_i(x, v_i)\, g_i(v_i)\, \mathrm dv_i \\ &= \prod_i \int f_i(x, v_i)\, g_i(v_i)\, \mathrm dv_i \\ &= \prod_i m_i(x). \end{aligned} \tag{2.3}$$
Note that the step from the first line to the second in Equation (2.3) is only possible because of the structure of the graph. The terms of the product in the second line are functions of $x$ only. They can thus be interpreted as messages $m_i(x)$, sent to the node for $x$ from the factors $f_i$ (Figure 2.6). This is a helpful paradigm, because it allows us to speak about the process of deriving local marginals in terms of local objects: the messages.
Figure 2.6: Messages mi sent from the neighbouring factor nodes to x, and messages µj from variables to one of the factors.
But what is the actual form of the messages $m_i(x)$? The factor $f_i$ is connected to $K_i$ variable nodes $v_k^i$ other than $x$. Because the graph is a tree, these nodes are themselves connected to separate sub-graphs of their own, containing other variables $w_k^i$, connected through a (potentially factorizing) distribution $h_k^i(v_k^i, w_k^i)$. So we can write
$$\begin{aligned} m_i(x) &= \int \mathrm dv_1^i \cdots \int \mathrm dv_{K_i}^i\, f_i(x, v_1^i, \dots, v_{K_i}^i) \prod_{k=1}^{K_i} \int h_k^i(v_k^i, w_k^i)\, \mathrm dw_k^i \\ &= \int \mathrm dv_1^i \cdots \int \mathrm dv_{K_i}^i\, f_i(x, v_1^i, \dots, v_{K_i}^i) \prod_{k=1}^{K_i} \mu_k^i(v_k^i) \end{aligned} \tag{2.4}$$
In the second line, we have noted that the product again contains terms which are local objects, depending only on $v_k^i$ because all other variables are integrated out, and have denoted these terms as messages $\mu_k^i(v_k^i)$ from the variable nodes $v_k^i$ to the factor $f_i$. In its general form, Equation (2.4) is difficult to parse, so consider specifically the example in Figure 2.6. The subgraph connected to $f_1$ contains the variables $v_a$ and $v_b$, each of which is connected to only one other factor, $g_a$ and $g_b$ respectively. We thus have
$$m_1(x) = \int \mathrm dv_a \int \mathrm dv_b\, f_1(x, v_a, v_b)\, g_a(v_a)\, g_b(v_b) \tag{2.5}$$
and are already done; the messages from the variables are simply given by $g_a$ and $g_b$. In the more general case of Equation (2.4), it might seem like evaluating the messages $\mu_k^i$ is complicated, because it involves integrating out all other variables connected to this node. However, the sub-graph connected to this node is itself a tree, containing other factors, so $h_k^i$ separates into other terms $h_{k,1}^i, \dots, h_{k,H}^i$. This is where an inductive argument can take hold: in fact, $\mu_k^i$ is of a form analogous to Equation (2.3), and the message can be constructed simply by multiplying the incoming messages from all factors connected to $v_k^i$, except for $f_i$:
$$\mu_k^i(v_k^i) = \prod_{\ell=1}^{H} m_{h_{k,\ell}^i}(v_k^i) \tag{2.6}$$
We summarize: Evaluating the marginal of any variable node in a tree-structured factor graph involves a series of localized computations represented by messages.
- Messages from variables $v_i$ to factors $f_k$ are formed by multiplying all incoming factor-to-variable messages into $v_i$, other than that from $f_k$ (denote that neighbourhood set by $\mathrm{ne}(v_i) \setminus f_k$):
$$\mu_{v_i \to f_k}(v_i) = \prod_{\ell \in \mathrm{ne}(v_i) \setminus f_k} m_{f_\ell \to v_i}(v_i) \tag{2.7}$$
- Messages from factors $f_k$ to variables $v_i$ are formed by multiplying the factor with all incoming variable-to-factor messages and integrating out (summing over) all variables connected to $f_k$ except $v_i$ itself (denoted by $\mathrm{ne}(f_k) \setminus v_i$):
$$m_{f_k \to v_i}(v_i) = \int f_k\bigl(v_i, \{v_h\}\bigr) \prod_{h \in \mathrm{ne}(f_k) \setminus v_i} \mu_{v_h \to f_k}(v_h)\, \mathrm dv_h \tag{2.8}$$
Using this scheme, calculating marginals in factor graphs reduces to a mechanical process of summing and multiplying probability distributions. At the leaves of the graph, the induction can be anchored by defining that the messages from leaf nodes are
$$\mu_{x \to f}(x) = 1 \qquad \text{and} \qquad m_{f \to x}(x) = f(x) \tag{2.9}$$
as already used in Equation (2.5). Another appealing property is the fact that message passing to obtain marginals on all variables in the graph is linear in the number of variables. To see this, choose any variable in the graph and define it to be the root of the tree. Send messages from the leaves towards the root, which is possible because all necessary messages are available locally. Once the root has received all messages, send messages back towards the leaves, which is now possible because all nodes have their necessary incoming messages. The result is a set of marginals (products of incoming messages) on all variables in the tree, where every node was involved in message passing twice.
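The two-pass scheme above can be sketched in a few lines for discrete variables. The chain below (factors between three binary variables) and all numbers are purely illustrative, not taken from the thesis:

```python
# Minimal sum-product sketch on the chain x1 -- f12 -- x2 -- f23 -- x3,
# with binary variables; factors stored as nested lists.
f12 = [[0.9, 0.1], [0.2, 0.8]]   # f12(x1, x2)
f23 = [[0.7, 0.3], [0.4, 0.6]]   # f23(x2, x3)

# Leaf variables send the constant message 1 (Equation 2.9).
mu_x1 = [1.0, 1.0]
mu_x3 = [1.0, 1.0]

# Factor-to-variable messages into x2 (Equation 2.8): multiply the factor
# by the incoming variable message and sum out the other variable.
m_f12_x2 = [sum(f12[a][b] * mu_x1[a] for a in range(2)) for b in range(2)]
m_f23_x2 = [sum(f23[b][c] * mu_x3[c] for c in range(2)) for b in range(2)]

# Marginal on x2: normalized product of incoming messages (Equation 2.3).
unnorm = [m_f12_x2[b] * m_f23_x2[b] for b in range(2)]
Z = sum(unnorm)
p_x2 = [u / Z for u in unnorm]

# Brute-force check against full enumeration of the joint.
brute = [sum(f12[a][b] * f23[b][c] for a in range(2) for c in range(2))
         for b in range(2)]
Zb = sum(brute)
assert all(abs(p - q / Zb) < 1e-12 for p, q in zip(p_x2, brute))
```

For this toy chain the two-message product matches exhaustive enumeration of the joint, illustrating why the cost stays linear in the number of variables.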
Conditioning on Data
Equation (2.2) already established that normalization of marginals can be performed locally. So far it was assumed that all variables are described by probability measures (i.e. they are latent). In real applications, a subset of the variables will invariably have been observed, or otherwise set to some fixed value. Extending the message-passing paradigm to include this situation is straightforward from a theoretical standpoint: we simply connect all observed variables to “pin-down” factors containing Dirac $\delta$ distributions fixing their values. Because $\int \delta(x - x_0) f(x)\, \mathrm dx = f(x_0)$, the effect on the resulting algorithm is even simpler: instead of evaluating integrals over observed variables, their values are simply fixed to the observations.
General Graphs
The sum-product algorithm is applicable only to trees. However, it is possible to extend it to general graphs by combining cliques of nodes (non-tree-structured subgraphs) into joint nodes of a meta-graph, known as the junction tree, on which the sum-product algorithm can then be run [Lauritzen and Spiegelhalter, 1988]. This scheme will be used implicitly in Chapters 4 and 5 of this thesis. Unfortunately, inference within the cliques now involves high-dimensional integrals again, and the overall computational cost will be exponential in, and hence dominated by, the size of the largest clique in the graph.
2.2.1 The Sum-Product Algorithm is No Panacea
The graphical model paradigm and the sum-product algorithm leverage the conditional independence structure of multivariate distributions to find low-cost ways of evaluating marginals. Unfortunately, they do not solve all challenges of applied inference. Some problems remain:
Integration Constructing factor-to-variable messages still involves multidimensional integrations, albeit of lower dimensionality. Possible approximate approaches include representing the messages by a set of samples (Section 2.3.5), or by a “nonparametric” representation on a grid of fixed resolution, i.e. essentially a histogram. Another popular approach, which will be used widely in this thesis, is to choose the distributions involved carefully such that analytical integration is possible, or to approximate the resulting distributions with others of simpler, parametric structure (Section 2.3.2).
Conditional Dependence Even taking all factorization properties into account, many models will still retain large cliques of dependent variables which have to be represented by joint vector-valued nodes in the graph to retain tree structure. Inference involving such variables can still be subject to the curse of dimensionality. Even in the easiest case, if the relationships are linear, inference will involve solving linear systems of equations. Leaving some technical issues aside [Golub and Van Loan, 1996], solving such systems has computational cost cubic in the clique size. This can still be too slow for many real-world applications. Often, the best remaining options will then be to construct independent approximations (Section 2.3.3), or simply to assume independence in the marginals and use certain heuristics to deal with the resulting defects (Section 2.3.6).
At their core, these issues reflect the fact that factorization simplifies integration, but does not make it trivial: we are still left with integrals, and integration, both analytic and approximate, remains an art rather than a mechanical process. There are no general analytic solution strategies for integrals. Applied inference remains a challenging field. The remainder of this thesis revolves around approximate techniques, as outlined above, which can make hard inference problems tractable without sacrificing the crucial aspects of the problem itself. In effect, this entire thesis is a collection of approximations and design tricks. This is not necessarily a deficiency: entire disciplines in other fields, e.g. condensed matter physics, could arguably be described as collections of highly developed integration tricks. The hope is that the algorithms presented in the following sections and the remaining chapters of this thesis convey concepts useful for applied inference problems in general.
2.3 Approximate Inference Methods
One might think that approximation in probabilistic models is a completely unstructured field: we build a model, realize that inference is intractable, and then have to pluck an ad-hoc approximation out of thin air. In fact, there are several more structured approaches available. Many problems can be prevented from the start by using convenient probability distributions. Distributions forming exponential families provide parametric descriptions with several favourable properties. The next section gives a brief introduction to this approach. Although exponential families are in themselves exact mathematical representations of beliefs, they are presented here in a section on approximate methods because, from a strict Bayesian point of view, they already represent a form of approximation: the designing human represents his or her internal beliefs, which are never explicitly stated, in an “alphabet” of parametric distributions.
2.3.1 Exponential Families
An exponential family is a set of distributions $q(x)$ over a variable $x \in \mathbb R^D$, parameterized by parameters $\eta \in \mathbb R^F$, of the general form
$$q(x; \eta) = g(\eta) \exp\bigl(\eta^{\mathsf T} u(x)\bigr) = \exp\bigl(\eta^{\mathsf T} u(x) + \log g(\eta)\bigr) \tag{2.10}$$
with functions $g : \mathbb R^F \to \mathbb R$ and $u : \mathbb R^D \to \mathbb R^F$. In this, the so-called canonical representation, the elements of $\eta$ are the natural parameters of the distribution. The elements of $u(x)$, or linear combinations thereof, are known as the sufficient statistics of the distribution. Not all functions of the form (2.10) are integrable; the definition should be understood to contain the implicit assumption that the distribution is in fact normalizable and normalized (which may also imply that $x$ and/or $\eta$ are only well defined on a subspace). Under this requirement, the function value $g(\eta)$ can be interpreted as the normalization constant, because
$$\int g(\eta) \exp\bigl(\eta^{\mathsf T} u(x)\bigr)\, \mathrm dx = 1. \tag{2.11}$$
Example: The Gaussian Distribution The family of all normal distributions forms an exponential family, because it can be written as
$$\mathcal N(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) = \exp\left[\begin{pmatrix}\mu/\sigma^2 \\ 1/\sigma^2\end{pmatrix}^{\!\mathsf T} \begin{pmatrix}x \\ -\tfrac12 x^2\end{pmatrix} - \frac12\left(\frac{\mu^2}{\sigma^2} + \log(2\pi\sigma^2)\right)\right] \tag{2.12}$$
Its natural parameters are the precision-adjusted mean $\mu/\sigma^2$ and the precision $\sigma^{-2}$. Note that the normalization function
$$g(\mu/\sigma^2, \sigma^{-2}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac12 \frac{\mu^2}{\sigma^2}\right) \tag{2.13}$$
has a slightly different form than the term normally interpreted as the normalization constant of the Gaussian. The Gaussian's sufficient statistics are often called the sample mean and sample variance (together with the number of samples), for the following reason: if a dataset consists of $N$ data points $x_i$ generated from a Gaussian distribution, then their conditional probability given the natural parameters is
$$\begin{aligned} p(\mathcal D \mid \mu, \sigma^2) &= \prod_i^N \mathcal N(x_i; \mu, \sigma^2) \\ &= \exp\left[\begin{pmatrix}\mu/\sigma^2 \\ 1/\sigma^2\end{pmatrix}^{\!\mathsf T} \begin{pmatrix}\sum_i x_i \\ -\tfrac12 \sum_i x_i^2\end{pmatrix} - \frac N2 \left(\frac{\mu^2}{\sigma^2} + \log(2\pi\sigma^2)\right)\right] \\ &= \exp\left[\begin{pmatrix}\mu/\sigma^2 \\ 1/\sigma^2\end{pmatrix}^{\!\mathsf T} \begin{pmatrix}N\bar x \\ -\tfrac N2 (S + \bar x^2)\end{pmatrix} - \frac N2 \left(\frac{\mu^2}{\sigma^2} + \log(2\pi\sigma^2)\right)\right] \end{aligned} \tag{2.14}$$
with
$$\bar x = \frac1N \sum_i x_i \qquad \text{and} \qquad S = \frac1N \sum_i (x_i - \bar x)^2. \tag{2.15}$$
Exponential families have several algebraic properties that make them particularly well suited for use in structured probabilistic models; the following paragraphs point out some of these properties.
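As a small numerical illustration of Equations (2.14)–(2.15), the sketch below (with invented data and parameter values) checks that the Gaussian log-likelihood of a dataset depends on the data only through $N$, $\bar x$ and $S$:

```python
import math

x = [1.2, -0.4, 2.3, 0.9, 1.7, -0.1, 0.5]   # invented observations
mu, var = 0.7, 1.3                           # arbitrary parameter values

# Direct evaluation, point by point.
loglik_direct = sum(-0.5 * math.log(2 * math.pi * var)
                    - (xi - mu) ** 2 / (2 * var) for xi in x)

# Via the sufficient statistics N, x_bar and S only (Equations 2.14-2.15),
# using sum((x_i - mu)^2) = N * (S + (x_bar - mu)^2).
N = len(x)
x_bar = sum(x) / N
S = sum((xi - x_bar) ** 2 for xi in x) / N
loglik_suff = (-0.5 * N * math.log(2 * math.pi * var)
               - N * (S + (x_bar - mu) ** 2) / (2 * var))

assert abs(loglik_direct - loglik_suff) < 1e-9
```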
Closure Under Multiplication and Exponentiation The members of one exponential family form an associative algebra over the positive real numbers, in the sense that two members of the family can be multiplied together and with a positive real number, returning another unnormalized distribution which also lies in the family. They are also closed under exponentiation: in general, we find
$$\begin{aligned} a \cdot q(x; \eta)^\alpha\, q(x; \eta')^\beta &= \exp\bigl[(\alpha\eta + \beta\eta')^{\mathsf T} u(x) + \alpha \log g(\eta) + \beta \log g(\eta') + \log a\bigr] \\ &= a\, \frac{g^\alpha(\eta)\, g^\beta(\eta')}{g(\alpha\eta + \beta\eta')}\, q(x; \alpha\eta + \beta\eta') \qquad \forall a \in \mathbb R_+,\ \forall \alpha, \beta \in \mathbb R \end{aligned} \tag{2.16}$$
Consider for example the multiplication of two Gaussian distributions, $\mathcal N(x; \mu_1, \sigma_1^2)$ and $\mathcal N(x; \mu_2, \sigma_2^2)$. From Equation (2.12) above, we see that multiplying Gaussians amounts to adding the precision-adjusted means and precisions. We can also find the constant scaling factor under multiplication (starting from Equation (2.13) and leaving out a few lines of straightforward algebra) to be
$$\frac{g(\mu_1/\sigma_1^2, \sigma_1^{-2})\, g(\mu_2/\sigma_2^2, \sigma_2^{-2})}{g(\mu_1/\sigma_1^2 + \mu_2/\sigma_2^2,\ \sigma_1^{-2} + \sigma_2^{-2})} = \frac{1}{\sqrt{2\pi(\sigma_1^2 + \sigma_2^2)}} \exp\left(-\frac12 \frac{(\mu_1 - \mu_2)^2}{\sigma_1^2 + \sigma_2^2}\right) \tag{2.17}$$
which gives the widely known formula
$$\mathcal N(x; \mu_1, \sigma_1^2)\, \mathcal N(x; \mu_2, \sigma_2^2) = \mathcal N(\mu_1; \mu_2, \sigma_1^2 + \sigma_2^2)\, \mathcal N\!\left(x;\ \left[\frac{\mu_1}{\sigma_1^2} + \frac{\mu_2}{\sigma_2^2}\right]\left[\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}\right]^{-1},\ \left[\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}\right]^{-1}\right) \tag{2.18}$$
So multiplication within exponential families amounts to summation of the natural parameters. This is crucial because multiplication is such a central operation in message passing algorithms, and exponentiation will be a convenient trick in some approximate schemes (Section 2.3.2). Of course, if we can multiply by adding natu- ral parameters, we can also divide distributions, by subtracting natural parameters — as long as we are careful not to break integrability and end up with an improper distribution. Up to that restriction, exponential families do indeed form Abelian groups under multiplication.
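Equation (2.18) is easy to check numerically. The sketch below (arbitrary example values) multiplies two Gaussians by adding precisions and precision-adjusted means, and verifies the product formula pointwise:

```python
import math

def gauss(x, mu, var):
    # normalized Gaussian density
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

mu1, v1 = 0.5, 2.0
mu2, v2 = -1.0, 0.5

# Multiply by adding natural parameters (precision, precision-adjusted mean).
prec = 1 / v1 + 1 / v2
pam = mu1 / v1 + mu2 / v2
mu_new, v_new = pam / prec, 1 / prec

# Equation (2.18): the scaling constant is itself a Gaussian in mu1.
for x in (-2.0, 0.0, 1.3):
    lhs = gauss(x, mu1, v1) * gauss(x, mu2, v2)
    rhs = gauss(mu1, mu2, v1 + v2) * gauss(x, mu_new, v_new)
    assert math.isclose(lhs, rhs, rel_tol=1e-9)
```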
Analytic Conjugate Priors on the Parameters In Bayesian inference, conjugate priors are a much-used tool in the construction of analytic inference algorithms. A conjugate prior π(η; ω) to the likelihood λ(x | η) is a probability distribution over η, parametrized by some parameters ω, such that the posterior on η (the normalized product of π and λ) is a member of the same functional family as the prior:
$$\frac{\lambda(x \mid \eta)\, \pi(\eta; \omega)}{\int \lambda(x \mid \eta)\, \pi(\eta; \omega)\, \mathrm d\eta} = \pi(\eta; \omega') \tag{2.19}$$
Such forms are crucial for efficient inference because they allow data to be incorporated into a belief analytically. For exponential families, conjugate priors can be constructed analytically, up to normalization: assume that we have access to $N$ data points $x_n$ generated from a distribution with uncertain parameters $\eta$ of the form (2.10). We define a prior distribution $r$ on the parameters $\eta$ of the form
$$r(\eta; \chi, \nu) = f(\chi, \nu)\, g(\eta)^\nu \exp\bigl(\eta^{\mathsf T} \chi\bigr) = \exp\left[\begin{pmatrix}\chi \\ \nu\end{pmatrix}^{\!\mathsf T} \begin{pmatrix}\eta \\ \log g(\eta)\end{pmatrix} + \log f(\chi, \nu)\right]. \tag{2.20}$$
Note that this distribution itself forms an exponential family, with natural parameters $\omega \equiv (\chi, \nu)$, sufficient statistics $u(\eta) = (\eta, \log g(\eta))$, and normalization constant $f(\omega)$. Multiplying the likelihood (2.10) with this prior, we get the unnormalized posterior
$$p(\eta \mid \chi, \nu, \{x_n\}) \propto g(\eta)^{\nu + N} \exp\left[\eta^{\mathsf T}\left(\sum_n u(x_n) + \chi\right)\right] \propto r\left(\eta;\ \chi + \sum_n u(x_n),\ \nu + N\right) \tag{2.21}$$
which is of the same form as (2.20). This construction of course does not provide the normalization constant $f$ of either prior or posterior, which can be difficult to evaluate in practice. Its technical nature also means that it does not necessarily provide the most convenient parametrization of the conjugate prior. For example, consider again the Gaussian distribution from Equation (2.12). Equation (2.20) gives a conjugate prior of the unnormalized form
$$r(\mu, \sigma^2; \chi, \nu) \propto (2\pi\sigma^2)^{-\nu} \exp\bigl[(\mu\chi_1 + \chi_2 + \mu^2\chi_3)\,\sigma^{-2}\bigr] \propto (\sigma^{-2})^{\nu}\, e^{\chi_2 \sigma^{-2}}\, e^{(\mu\chi_1 + \mu^2\chi_3)\sigma^{-2}} \tag{2.22}$$
which is in fact an unnormalized form of the well-known normal-gamma prior for the parameters of a Gaussian. The usual parametrization of the normal-gamma distribution is
$$r(\mu, \sigma^{-2} \mid a, b, \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} \sqrt{\frac{b}{2\pi}}\, (\sigma^{-2})^{\alpha - 1/2}\, e^{-\beta\sigma^{-2}}\, e^{-\frac{b(\mu - a)^2}{2\sigma^2}} \tag{2.23}$$
so we can identify the linear transformation $\nu = \alpha - 1/2$, $\chi_1 = ab$, $\chi_2 = -\beta$, $\chi_3 = -b/2$ to establish an isomorphism between our derivation and the standard form. The remaining dissimilarities between Equations (2.22) and (2.23) are functions of the parameters only and can thus be subsumed into the normalization constant.
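For a concrete instance of the update (2.21), consider the Bernoulli family, where $u(x) = x$. Reading the resulting prior (2.20) as a function of $\theta$ (ignoring the change-of-variables Jacobian) gives an unnormalized Beta density; the bookkeeping below is our own mapping, not notation from the thesis, and checks that (2.21) reproduces the familiar Beta count update:

```python
data = [1, 0, 1, 1, 0, 1]             # Bernoulli observations (invented)
N = len(data)
k = sum(data)                          # sum of sufficient statistics u(x) = x

chi, nu = 2.0, 5.0                     # prior natural parameters (invented)
chi_post, nu_post = chi + k, nu + N    # the generic update of Equation (2.21)

# Read as a function of theta, the prior is an (unnormalized)
# Beta(chi + 1, nu - chi + 1); the familiar Beta posterior, which adds the
# number of successes and failures, must agree with the generic update.
a, b = chi + 1, nu - chi + 1
a_post, b_post = a + k, b + (N - k)
assert (a_post, b_post) == (chi_post + 1, nu_post - chi_post + 1)
```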
Estimation Through Differentiation For exponential families, the expected value of the sufficient statistics is closely related to the gradient of the normalization constant. To see this, we differentiate Equation (2.11) with respect to $\eta$ on both sides:
$$0 = \int [\nabla g(\eta)] \exp\bigl(\eta^{\mathsf T} u(x)\bigr)\, \mathrm dx + \int g(\eta) \exp\bigl(\eta^{\mathsf T} u(x)\bigr)\, u(x)\, \mathrm dx \tag{2.24}$$
Using (2.11) for the first term and (2.10) for the second, we get
$$0 = [\nabla g(\eta)] \cdot g^{-1}(\eta) + \mathbb E[u(x)] \tag{2.25}$$
and hence
$$\mathbb E[u(x)] = -\nabla \log g(\eta). \tag{2.26}$$
So the expected value of the sufficient statistics equals the negative gradient of the log normalization constant with respect to the natural parameters. This result may seem obscure at first sight, but it will become important in the derivation of several approximate methods in the following sections. Its significance arises because for many of the widely used exponential families at least one of these terms is readily evaluated; being able to express one in terms of the other is therefore helpful.
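Equation (2.26) is easy to verify numerically; the sketch below uses finite differences on the Gaussian's log normalization function (2.13), with arbitrary parameter values:

```python
import math

def log_g(e1, e2):
    # log normalization function of the Gaussian, as a function of the
    # natural parameters e1 = mu/sigma^2, e2 = 1/sigma^2 (Equation 2.13)
    return -0.5 * (e1 ** 2 / e2 - math.log(e2) + math.log(2 * math.pi))

e1, e2 = 1.2, 0.8                      # arbitrary natural parameters
mu, var = e1 / e2, 1 / e2

h = 1e-6                               # central finite differences
grad1 = (log_g(e1 + h, e2) - log_g(e1 - h, e2)) / (2 * h)
grad2 = (log_g(e1, e2 + h) - log_g(e1, e2 - h)) / (2 * h)

# Expected sufficient statistics of u(x) = (x, -x^2/2):
# E[x] = mu and E[-x^2/2] = -(var + mu^2)/2.
assert math.isclose(-grad1, mu, rel_tol=1e-5)
assert math.isclose(-grad2, -(var + mu ** 2) / 2, rel_tol=1e-5)
```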
Expressive Forms Many of the most popular parametric distributions in applied statistics do indeed form exponential families. Among them are the (univariate and multivariate) Gaussian, and the Gamma and Wishart distributions (defined on positive real numbers and positive definite matrices, respectively). The Multinomial distribution on categorical variables, and the Dirichlet distribution over discrete probabilities (including their special two-dimensional or single-count cases known as the Bernoulli, Discrete, Binomial and Beta distributions), as well as the Poisson and Exponential distributions, the von Mises distribution, and several others, also form exponential families.
There are additional beneficial aspects of exponential family distributions which are beyond the scope of this text. For example, they are in some sense easy to extend to nonparametric Bayesian models [Orbanz, 2009]. The upshot for the practitioner of approximate inference is that it is generally a good idea to use exponential family distributions when designing probabilistic models, as this avoids many problems that arise immediately when using general, less structured distributions. The diversity of available exponential family distributions means that they are applicable in many situations; and they can be combined (using graphical models) to form very expressive hierarchical models.
2.3.2 Expectation Propagation
Minimizing KL-Divergence — Gaussian Moment Matching
Even when approximations become necessary, exponential family distributions can still be helpful. Intractability will typically be due to nonlinear factors in the graph or the connection of variables with nonconjugate marginals. When using the sum-product algorithm, both of these problems can often be addressed through local approximations: at a given problematic variable node $x$, the marginal
$$p(x) = \prod_i m_i(x) \tag{2.27}$$
will consist of a product over messages $m_i(x)$ that do not give a simple parametric form when multiplied. We can then try to find a projection of the marginal into an exponential family $q(x)$, and hope that the resulting approximate marginal still captures the “important” aspects of the exact marginal. How to choose this projection is a matter of mathematical convenience and of what the approximation is supposed to achieve (for example, whether the approximation should give a good representation of the exact distribution's overall width, or whether it should give a good representation of a certain region of the distribution).
One popular method is to minimize the Kullback–Leibler divergence $D_{\mathrm{KL}}(p \| q)$ of $q$ from $p$, which leads to an algorithm known as Expectation Propagation (EP) [Minka, 2001]. The KL-divergence [Kullback and Leibler, 1951] is also known as the relative entropy and is defined as
$$D_{\mathrm{KL}}(p \| q) = \int p(x) \log \frac{p(x)}{q(x)}\, \mathrm dx = -\int p(x) \log q(x)\, \mathrm dx - H[p(x)] \tag{2.28}$$
where $H[p(x)] \equiv -\mathbb E_p[\log p]$ is the entropy of $p$. Since we want $q$ to be in an exponential family, i.e. of the form (2.10), we can evaluate (2.28) as
$$D_{\mathrm{KL}}(p \| q) = -\log g(\eta) - \eta^{\mathsf T} \mathbb E_p[u(x)] + \text{const.} \tag{2.29}$$
To minimize the divergence, take the gradient with respect to $\eta$ and set it to zero, giving
$$-\nabla \log g(\eta) = \mathbb E_p[u(x)]. \tag{2.30}$$
Now we can use Equation (2.26) on the left-hand side to find
$$\mathbb E_q[u(x)] = \mathbb E_p[u(x)]. \tag{2.31}$$
So, to minimize the KL-divergence from $p$ to $q$, we need to match the expected sufficient statistics of $q$ to their expected value under $p$. The most widely used exponential family for expectation propagation is the Gaussian family. Extending the univariate form of Equation (2.12) to the general, $D$-dimensional case, the exponential family form of the multivariate Gaussian is
$$\begin{aligned} \mathcal N(x; \mu, \Lambda^{-1}) &= \frac{|\Lambda|^{1/2}}{(2\pi)^{D/2}} \exp\left(-\frac12 (x - \mu)^{\mathsf T} \Lambda (x - \mu)\right) \\ &= \exp\left(-\frac12 \bigl[x^{\mathsf T}\Lambda x - 2x^{\mathsf T}\Lambda\mu + \mu^{\mathsf T}\Lambda\mu - \log|\Lambda| + D\log 2\pi\bigr]\right) \\ &= \exp\left[-\frac12 \begin{pmatrix}\Lambda\mu \\ \operatorname{vec}\Lambda\end{pmatrix}^{\!\mathsf T} \begin{pmatrix}-2x \\ \operatorname{vec} xx^{\mathsf T}\end{pmatrix} + \log g(\Lambda\mu, \Lambda)\right] \end{aligned} \tag{2.32}$$
where $\operatorname{vec} A$ denotes some arbitrary but consistent way of stacking matrix $A$ into a vector¹. Analogous to the univariate case, the canonical parameters of the multivariate Gaussian are the precision matrix $\Lambda$ (the inverse of the covariance matrix) and the precision-adjusted mean vector $\Lambda\mu$; and the sufficient statistics of the multivariate Gaussian are $x$ and $xx^{\mathsf T}$. Hence, for Gaussian EP, we want to choose a Gaussian approximation $q(x) = \mathcal N(x; \mu, \Lambda^{-1})$ such that its first two moments (the mean $\mathbb E[x]$ and the covariance $\mathbb E[xx^{\mathsf T}] - \mathbb E[x]\mathbb E[x]^{\mathsf T}$) match those of the true marginal $p$.
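When the true marginal is a mixture of Gaussians, the moments required by Equation (2.31) are available in closed form, so the KL-optimal Gaussian can be written down directly. A sketch with invented mixture parameters:

```python
# Mixture p(x) = sum_k w_k N(x; mu_k, v_k); all parameters invented.
w = [0.3, 0.7]
mus = [-2.0, 1.0]
vs = [0.5, 1.5]

# Mean and variance of the mixture (laws of total expectation/variance):
# E[x] is the weighted mean; E[x^2] is the weighted sum of second moments.
mean_p = sum(wk * mk for wk, mk in zip(w, mus))
second_moment = sum(wk * (vk + mk ** 2) for wk, mk, vk in zip(w, mus, vs))
var_p = second_moment - mean_p ** 2

# q = N(mean_p, var_p) is the moment-matched, i.e. KL(p||q)-optimal,
# Gaussian approximation to the mixture.
```

Note how the matched variance is much larger than either component's: the approximation spans both modes, illustrating the "broad" behaviour of minimizing $D_{\mathrm{KL}}(p \| q)$ discussed later in this section.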
Approximating Individual Messages
In message passing algorithms, we will deal with marginals consisting of a product of messages:
$$p(x) = Z^{-1} \prod_i f_i(x). \tag{2.33}$$
Because the messages are the fundamental objects of the sum-product algorithm, we will want the approximating Gaussian to be itself a product of Gaussians
$$q(x) = \tilde Z^{-1} \prod_i \tilde f_i(x) \qquad \text{with} \qquad \tilde f_i(x) = \mathcal N(x; m_i, v_i) \tag{2.34}$$
The naïve thing to do would be to match the moments of each unnormalized distribution $\tilde f_i(x)$ to the moments of $f_i(x)$. However, we are ultimately interested in the marginals, not the messages. So it would be better if we could ensure that the approximate marginal minimizes KL-divergence to the exact marginal. Unfortunately, matching the moments of the product of complicated messages $f_i$ can be challenging; splitting the resulting marginal into meaningful messages even more so. Minka [2001] found a surprisingly simple and elegant intermediate approach, based on the ease with which exponential family distributions can be multiplied
¹This operation is well defined here, because we can write, using the sum convention [Einstein, 1916], $x^{\mathsf T}\Lambda x = x_i \Lambda_{ij} x_j = \Lambda_{ij}(xx^{\mathsf T})_{ij}$, so any consistent way of mapping the indices $ij \in [1,\dots,D] \times [1,\dots,D]$ to a new index $k \in [1,\dots,D^2]$ works.
and divided (Section 2.3.1, Equation (2.16)). He proposes an iterative scheme in which, for any $j$, the message $\tilde f_j$ is chosen such that
$$q'(x) \propto \tilde f_j(x) \prod_{i \neq j} \tilde f_i(x) \tag{2.35}$$
is as close as possible, in KL-divergence, to
$$f_j(x) \prod_{i \neq j} \tilde f_i(x). \tag{2.36}$$
At runtime of the algorithm, we have available a collection of approximate messages $\tilde f_i(x)$, as well as an approximate marginal $q(x)$ (if some messages are not yet available, they are replaced by uninformative degenerate messages with natural parameter vector $0$; in the Gaussian case, this corresponds to setting the precision matrix to $0$). To update the message $\tilde f_j(x)$, we calculate the cavity distribution
$$q^{\setminus j}(x) \equiv \frac{q(x)}{\tilde f_j(x)}. \tag{2.37}$$
Note that this division of distributions is a well-defined operation, corresponding to subtracting natural parameters (Section 2.3.1). Then, we calculate the expected values of the sufficient statistics (for Gaussians: the first two moments) of the product
$$f_j(x)\, q^{\setminus j}(x). \tag{2.38}$$
These statistics (moments) define an approximation $q'(x)$. Minka introduces an operator $\operatorname{proj}$ for this operation, which allows the intuitive notation
$$q'(x) = \operatorname{proj}\bigl[f_j(x)\, q^{\setminus j}(x)\bigr]. \tag{2.39}$$
The distribution $q'$ is the new approximate marginal and defines the approximate message as
$$\tilde f_j(x) = \frac{q'(x)}{q^{\setminus j}(x)} = \frac{\operatorname{proj}\bigl[f_j(x)\, q^{\setminus j}(x)\bigr]}{q^{\setminus j}(x)} \tag{2.40}$$
The process is repeated until convergence, which can be measured as follows: because the message is in the exponential family, the update from the old message $\tilde f_j^{\text{old}}$ to the new message $\tilde f_j^{\text{new}}$ can be written as
$$\Delta \tilde f_j = \frac{\tilde f_j^{\text{new}}}{\tilde f_j^{\text{old}}}. \tag{2.41}$$
Since this update is itself an element of the exponential family, it is described by a natural parameter vector, which can be used to measure convergence through any norm.² It is interesting to note in passing that messages themselves do not need to be proper distributions (see Section 2.2), as long as the marginal is normalizable. Indeed, in some applications of EP, “Gaussian” messages with negative precision are not unusual.
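A single EP update (Equations (2.37)–(2.40)) can be traced through for the step factor $f(x) = \mathbb 1[x > 0]$, for which $\operatorname{proj}[\cdot]$ reduces to the standard truncated-Gaussian moments. The cavity parameters below are arbitrary, and the closing numerical integral is only a sanity check:

```python
import math

def phi(z):   # standard normal pdf
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):   # standard normal cdf
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

m, v = -0.3, 1.2                      # cavity distribution q^{\j} = N(m, v)
s = math.sqrt(v)
z = m / s
r = phi(z) / Phi(z)

# proj[f(x) q^{\j}(x)]: mean and variance of the Gaussian truncated to x > 0.
mean_new = m + s * r
var_new = v * (1 - r * (r + z))

# New message (Equation 2.40): a ratio of Gaussians, i.e. subtract natural
# parameters. In general such a message may even have negative precision.
prec_msg = 1 / var_new - 1 / v
pam_msg = mean_new / var_new - m / v

# Sanity check of the projected mean by midpoint-rule integration over x > 0.
n = 100000
hi = m + 8 * s
dx = hi / n
xs = [(i + 0.5) * dx for i in range(n)]
ws = [phi((x - m) / s) / s for x in xs]
Z = sum(ws) * dx
num_mean = sum(wi * x for wi, x in zip(ws, xs)) * dx / Z
assert abs(num_mean - mean_new) < 1e-6
```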
Power EP
One interpretation of EP is as a form of approximate numerical integration: the normalization constant $Z_q = \int \prod_i \tilde f_i(x)\, \mathrm dx$ (which is easy to evaluate thanks to Equation (2.16) and Equation (2.11)) is an approximation to the local partition function $Z_p = \int \prod_i f_i(x)\, \mathrm dx$. Sometimes, however, even the individual integrals required to evaluate $\operatorname{proj}[f_j(x)\, q^{\setminus j}(x)]$ can be intractable. In that case, an extension of EP called Power EP [Minka, 2004] can offer hope: for many functions $f_j(x)$ for which $\int f_j(x)\, q^{\setminus j}(x)\, \mathrm dx$ is difficult to evaluate, there exists a scalar $n_j$ such that $\int f_j^{n_j}(x)\, q^{\setminus j}(x)\, \mathrm dx$ is tractable. As a simple example, consider $f(x) = 1/x$. The integral
$$\int x^{-1}\, \mathcal N(x; \mu, \sigma^2)\, \mathrm dx \tag{2.42}$$
is much more challenging than
$$\int x\, \mathcal N(x; \mu, \sigma^2)\, \mathrm dx = \mu \tag{2.43}$$
In the Power EP algorithm, we introduce new functions $h_i(x) = f_i^{n_i}(x)$. Then, the algorithm proceeds almost identically to EP: calculate the cavity distribution
$$q^{\setminus j}(x) = q(x) / \tilde h_j(x); \tag{2.44}$$
project,
$$q'(x) = \operatorname{proj}\bigl[h_j(x)\, q^{\setminus j}(x)\bigr]; \tag{2.45}$$
and store the message
$$\tilde h_j^{\text{new}}(x) = \frac{q'(x)}{q^{\setminus j}(x)}; \tag{2.46}$$
but now update the marginal to the new distribution
$$q^{\text{new}} = q^{\text{old}}(x) \left(\frac{\tilde h_j^{\text{new}}(x)}{\tilde h_j^{\text{old}}(x)}\right)^{1/n_j} \tag{2.47}$$
²Because the natural parameters are elements of $\mathbb R^M$ for some $M$, and all norms on finite-dimensional real vector spaces are equivalent, the choice of norm is arbitrary: if there is an $\varepsilon_a$ such that the parameter vectors $\eta_\Delta$ of the updates become and remain, for all subsequent iterations, smaller than $\varepsilon_a$ under the norm $\|\cdot\|_a$, i.e. $\|\eta_\Delta\|_a < \varepsilon_a$, then for any other norm $\|\cdot\|_b$ there exists an $\varepsilon_b$ such that the updates' parameters become and remain smaller than $\varepsilon_b$ under $\|\cdot\|_b$. Hence, only the numerical value of the stopping threshold $\varepsilon$ has to be chosen sensibly.
where the exponent $1/n_j$ denotes exponentiation of exponential family distributions, as defined in Equation (2.16). Minka [2004] shows that this iterative scheme has the same fixed points as an implementation of EP which does not minimize the KL-divergence, but instead the more general $\alpha$-divergence [Amari, 1985] for $\alpha = 1/n_i$. The $\alpha$-divergence is defined as
$$D_\alpha(p \| q) = \frac{1}{\alpha(1 - \alpha)} \int \bigl(\alpha p(x) + (1 - \alpha) q(x) - p^\alpha(x)\, q^{1-\alpha}(x)\bigr)\, \mathrm dx \tag{2.48}$$
It satisfies $D_\alpha(p \| q) \geq 0$, with equality if and only if $p = q$. For $\alpha \to 1$, it reduces³ to $D_{\mathrm{KL}}(p \| q)$; for $\alpha \to 0$, it equals $D_{\mathrm{KL}}(q \| p)$. For $\alpha = 0.5$ it gives the Hellinger distance, for $\alpha = 2$ the $\chi^2$ distance. The substitution $\alpha \to 1 - \alpha$ swaps the positions of $p$ and $q$ [Minka, 2005].
Damping Messages
A drawback of EP is that it is not guaranteed to converge [Minka, 2004]. The messages passed around the graph constitute a dynamical system that is not always stable. One approach to this problem, which is not guaranteed to be effective but has been found empirically helpful in many cases, is to “damp the messages” [Minka, 2004], i.e. to weaken the iterative updates to the messages in a way that does not change the fixed points of the algorithm. For the general case of Power EP
(for standard EP, set $n_j = 1$), we use a scalar $0 < \gamma \leq 1$ to construct the projected marginal as in Equation (2.45), then create a new message as
$$\tilde h_j^{\text{new}}(x) = \tilde h_j^{\text{old}}(x)^{1-\gamma} \left(\frac{q'(x)}{q^{\setminus j}(x)}\right)^{\!\gamma} = \tilde h_j^{\text{old}}(x) \left(\frac{q'(x)}{q(x)}\right)^{\!\gamma} \tag{2.49}$$
and update the marginal to
$$q^{\text{new}}(x) = q^{\text{old}}(x) \left(\frac{\tilde h_j^{\text{new}}(x)}{\tilde h_j^{\text{old}}(x)}\right)^{\!1/n_j} = q^{\text{old}}(x) \left(\frac{q'(x)}{q(x)}\right)^{\!\gamma/n_j}. \tag{2.50}$$
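Because exponentiation acts linearly on natural parameters (Equation (2.16)), the damped message (2.49) is just a convex combination of the old and undamped natural parameters. A one-dimensional Gaussian sketch with invented numbers:

```python
import math

gamma = 0.4                            # damping factor
prec_old, pam_old = 2.0, 1.0           # old message (precision, prec.-adj. mean)
prec_tgt, pam_tgt = 5.0, -2.0          # undamped ("target") message

# Geometric mixture of densities = linear mixture of natural parameters.
prec_new = (1 - gamma) * prec_old + gamma * prec_tgt
pam_new = (1 - gamma) * pam_old + gamma * pam_tgt

def unnorm(x, pam, prec):
    # unnormalized Gaussian in natural parameters
    return math.exp(pam * x - 0.5 * prec * x * x)

# Pointwise check: h_old^{1-gamma} * h_tgt^{gamma} equals h_new.
for x in (-1.0, 0.5, 2.0):
    geometric = (unnorm(x, pam_old, prec_old) ** (1 - gamma)
                 * unnorm(x, pam_tgt, prec_tgt) ** gamma)
    assert math.isclose(geometric, unnorm(x, pam_new, prec_new))
```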
EP: Summary
EP and Power EP provide a flexible approximate scheme for message passing which can double as an approximate integration method. Power EP iteratively minimizes
³To see this, note that both the numerator and the denominator of Equation (2.48) are smooth functions of $\alpha$ which vanish for $\alpha \to 1$ and $\alpha \to 0$, so L'Hôpital's rule applies. Differentiating numerator and denominator with respect to $\alpha$ gives
$$\lim_{\alpha \to 1} D_\alpha(p \| q) = \lim_{\alpha \to 1} \frac{-\int \log(p/q)\, p^\alpha q^{1-\alpha}\, \mathrm dx}{1 - 2\alpha} = D_{\mathrm{KL}}(p \| q).$$
Because $\alpha \to 1 - \alpha$ swaps the positions of $p$ and $q$, the other statement follows directly.
the local $\alpha$-divergence
$$D_\alpha\!\left(f_j(x) \prod_{i \neq j} \tilde f_i(x) \,\Big\|\, \prod_i \tilde f_i(x)\right) \tag{2.51}$$
with $\alpha = 1/n_i$; standard EP sets $n_i = 1$, corresponding to minimizing the local KL-divergence from $p$ to $q$. Note that, since this KL-divergence contains the term $-\int p(x) \log q(x)\, \mathrm dx$, it becomes large if $q(x)$ puts little mass on values of $x$ where $p(x)$ is large. Conversely, regions where $p(x)$ vanishes do not influence
$D_{\mathrm{KL}}(p \| q)$. This means EP prefers broad approximations. This is in stark contrast to the other approximations following in this section, and is an important characteristic determining the use cases for EP.
The two biggest issues with EP are
- EP still requires analytic evaluation of certain integrals (albeit much easier ones than those required for the full joint distribution). This can make EP difficult to apply (see Chapter 3 and Appendix A for the derivations required for just one specific factor, and the technical stretches involved).
- EP is not guaranteed to converge (but often does so anyway).
In this thesis, EP is used extensively in Chapter 3 and Chapter 4. Appendices A and B contain derivations of EP updates for one specific functional factor.
2.3.3 Variational Inference
The previous section established EP, an approximation based on minimizing the KL-divergence $D_{\mathrm{KL}}(p \| q)$ from $p$ to $q$ in some local objective. Minimizing the KL-divergence in this direction leads to a “broad” approximation, because the divergence becomes large if $p$ puts mass on a region where $q$ vanishes. This section considers in some sense the opposite approach (but see the alternative motivation at the end of this section): given the joint distribution $p(x, \theta)$ of a set of observable variables (data) $x$ and latent parameters $\theta$, we try to minimize the KL-divergence
$$D_{\mathrm{KL}}[q(\theta) \| p(\theta \mid x)] = -\int q(\theta) \log \frac{p(\theta \mid x)}{q(\theta)}\, \mathrm d\theta \tag{2.52}$$
between an approximating distribution $q$ for the parameters and their exact posterior distribution. This leads to an approximation attempting to fit the shape of $p$ in some region, while ignoring other regions. This behaviour can be desirable for example in mixture models, which often have combinatorial symmetries: they put identical mass on equivalent realizations. In such situations, the approximation should pick out one good realization and represent a belief over it well. In contrast,
EP would construct a very broad approximation that covers all equivalent realizations, but does not necessarily capture the structure of any individual realization well.
Inspecting Equation (2.52), it would seem that minimizing this objective involves having to evaluate the exact posterior, which would beg the question, as that is exactly what we are trying to avoid. However, there is a “shortcut”, which involves an algebraic trick: Notice that, using any arbitrary proper distribution q(θ), we can use Bayes’ rule to expand the log model evidence (which is a fixed number) as
log p(x) = ∫ q(θ) log [ p(x, θ) / q(θ) ] dθ + ∫ q(θ) log [ q(θ) / p(θ | x) ] dθ    (2.53)
         = L[q(θ)] + D_KL[q(θ)‖p(θ | x)]
where we have defined a variational bound L(q), also known as the variational free energy in statistical physics, on the log evidence. To see that this is indeed a lower bound, note that the KL-divergence is non-negative, a statement known as Gibbs' inequality⁴. Hence, since log p(x) is fixed, to minimize the KL-divergence, maximize L(q) instead. This is potentially much easier, because the joint is often available in closed form. This is known as a variational approximation, because maximizing the bound involves functional derivatives of the functional L, and the branch of mathematics concerned with functional derivatives is known as the calculus of variations.
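The decomposition (2.53) can be checked numerically on a toy discrete model; the joint values below are arbitrary and purely illustrative:

```python
import math

# Hypothetical joint p(x, θ) for one fixed observation x and a discrete
# latent θ ∈ {0, 1, 2}; the numbers are arbitrary.
joint = [0.3, 0.1, 0.05]                 # p(x, θ) for θ = 0, 1, 2
evidence = sum(joint)                    # p(x) = Σ_θ p(x, θ)
posterior = [j / evidence for j in joint]

q = [0.5, 0.3, 0.2]                      # any proper distribution over θ

# Bound L[q] and KL divergence from Eq. (2.53), written as sums
L = sum(qi * math.log(ji / qi) for qi, ji in zip(q, joint))
KL = sum(qi * math.log(qi / pi) for qi, pi in zip(q, posterior))

assert abs((L + KL) - math.log(evidence)) < 1e-12  # log p(x) = L + KL
assert KL >= 0.0   # Gibbs' inequality, so L is a lower bound on log p(x)
```

Any choice of q reproduces the same identity; only the split between L and the KL term changes, which is exactly why maximizing L minimizes the KL-divergence.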
Alternate Motivation The derivation of the bound by Equation (2.53) may seem somewhat unsatisfactory, and indeed it of course leaves out certain technical requirements on the approximating distribution. It is also possible to motivate the variational approximation in an alternate way, inverse to the argument of the previous section: To find a good approximation q, construct a bound on the log evidence of the model, note that it involves the non-negative KL-divergence, and maximize the bound. The connection between EP and variational approximations provided by the different directions of KL-divergence they minimize can provide a helpful intuition when deciding which of the two to use (see Section 2.3.7). However, it also contains a pitfall: Note that variational inference directly minimizes the KL-divergence, i.e. using the actual joint distribution, not some factorized approximation of it, while EP only minimizes a local KL-divergence. This difference is less benign than it might seem: Using variational inference, one can at least rely on having the right objective (even if the resulting bound might well be loose), while
⁴It is easy to prove Gibbs' inequality [e.g. MacKay, 2003] using Jensen's inequality [Jensen, 1906]: Consider the convex function f(u) = log(1/u) and u = p(x)/q(x) > 0. Then D_KL(q‖p) = E_q[f(u)] ≥ f[E_q(u)] = f(∫ q(x) p(x)/q(x) dx) = −log ∫ p(x) dx = −log 1 = 0.

with EP, one always has to worry somewhat about the local approximations missing important aspects of the product of the factors.
Factorized Variational Approximations
If we pose no restrictions on q, the bound L reaches a unique maximum at q(θ) = p(θ | x), where D_KL(q‖p) = 0. Since the assumed intractability of this distribution is the reason to use an approximation in the first place, this most general form is of little use. To find approximations with more tractable structure, we can impose factorization properties on q over K disjoint subsets θ_k of θ:
q(θ) = ∏_{k=1}^{K} q_k(θ_k)    (2.54)
In line with the overall point of this chapter that factorization leads to local computations, this factorized form for q allows us to derive a local bound which is a function of only one q_j, simply by inserting (2.54) into L from (2.53). Writing q_k as shorthand for q_k(θ_k), we get

L(q) = ∫ ∏_k q_k [ log p(x, θ) − ∑_ℓ log q_ℓ ] dθ
     = ∫ q_k { ∫ log p(x, θ) ∏_{ℓ≠k} q_ℓ dθ_ℓ } dθ_k − ∫ q_k log q_k dθ_k + const.    (2.55)
     = ∫ q_k · E_{ℓ≠k}[log p(x, θ)] dθ_k + H[q_k] + const.

where the term in braces defines E_{ℓ≠k}[log p(x, θ)], and H[q_k] is the entropy of q_k.
The last line of this equation has the form (up to sign) of a KL-divergence between q_k(θ_k) and the distribution p̃ that satisfies the relation

log p̃(θ_k) = E_{ℓ≠k}[log p(x, θ)] + const.    (2.56)

Hence, L is maximized when q_k = p̃, i.e.

log q_k(θ_k) = E_{ℓ≠k}[log p(x, θ)] + const.,    i.e.    q_k(θ_k) ∝ exp(E_{ℓ≠k}[log p(x, θ)])    (2.57)
If the expectation of log probabilities in this expression is easy to evaluate (here again, using exponential family distributions for model design can greatly reduce the computational complexity), then Equation (2.57) directly leads to an iterative optimization scheme: For each variable group θ_k, maximize the local bound, then repeat. If we choose the θ_k to correspond to one variable in the graphical model
each, this scheme is known as Variational Message Passing (VMP) [Winn and Bishop, 2006]. To find good approximations, however, it can sometimes be a better idea to adapt the graphical representation to the approximation, rather than the other way round (see Chapter5).
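As a sketch of such a coordinate-ascent scheme, assume a toy two-dimensional Gaussian target, where the updates of Equation (2.57) are available in closed form (the numbers below are arbitrary):

```python
# Mean-field variational inference for p(θ) = N(m, Σ) in two dimensions:
# each factor q_k(θ_k) is Gaussian with precision Λ_kk, and its mean is
# re-estimated from the current mean of the other coordinate (Eq. 2.57).
m = [0.5, -0.5]
rho = 0.9                                  # correlation; Σ has unit variances
det = 1.0 - rho * rho
L11, L12 = 1.0 / det, -rho / det           # precision matrix Λ = Σ^{-1}

mu = [0.0, 0.0]                            # variational means
for _ in range(200):                       # coordinate ascent on the bound
    mu[0] = m[0] - (L12 / L11) * (mu[1] - m[1])
    mu[1] = m[1] - (L12 / L11) * (mu[0] - m[0])

var_q = 1.0 / L11                          # variational marginal variance

assert abs(mu[0] - m[0]) < 1e-9 and abs(mu[1] - m[1]) < 1e-9
# Zero-forcing behaviour: q underestimates the true marginal variance (= 1)
assert var_q < 1.0
```

The means converge to the true mean, but the factorized q reports variance 1 − ρ² = 0.19 instead of the true marginal variance 1 — a concrete instance of the "zero-enforcing" compactness of this direction of KL-divergence.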
2.3.4 Laplace Approximations
The approximate inference methods described in the previous two sections can provide expressive approximations; but they both require the evaluation of some integrals involving the exact distribution p in some way. Even though the integrals in question are much simpler than the integration required to evaluate the exact posterior, they can still be challenging in some cases. This section introduces an approximation scheme, known as a Laplace approximation, which only involves derivatives of p. Since integration is harder than differentiation, this approximation can be applied to a larger class of distributions. Unfortunately, this simplicity comes at the cost of a considerable defect, which means that this scheme should only be applied with care.

Since conditional dependence structure will not be important for the following derivations, we will abandon the separation of variables and parameters, and consider, for simplicity, a probability distribution p(x) of some multidimensional variable x. "Laplace approximation" is a grand name for a second-order Taylor expansion of log p(x) around a mode µ of p(x) (which, because the logarithm is a strictly monotonic function, is identical to the mode of log p(x)): To find a mode, differentiate once and set to zero
∇ log p(µ) = ∂ log p(x)/∂x |_{x=µ} = 0    (2.58)

If log p(x) is analytic in a neighbourhood of µ, then it can be expanded in this neighbourhood as

log p(x) = log p(µ) + (x − µ)ᵀ ∇ log p(µ) − ½ (x − µ)ᵀ Λ (x − µ) + O[(x − µ)³]    (2.59)
         = −½ (x − µ)ᵀ Λ (x − µ) + const. + O[(x − µ)³]

where the linear term vanishes by Equation (2.58), and Λ is the negative Hessian matrix with elements

Λ_ij = −∂² log p(x) / ∂x_i ∂x_j |_{x=µ}.    (2.60)
There is an exponential family whose natural parameters are linearly related to µ
Figure 2.7: Defect of the Laplace approximation: because modes are a local feature of the distribution rather than a global one, approximations based solely on the structure of the distribution around the mode can lead to a bad representation of the true distribution p(x) by the approximation q(x).

and Λ: the Gaussian

log N(x; µ, Λ⁻¹) = −½ (x − µ)ᵀ Λ (x − µ) + ½ (log |Λ| − D log 2π)    (2.61)

so the Laplace approximation consists of approximating p(x) with a Gaussian q(x) = N(x; µ, Λ⁻¹) whose mean is set to the location of a mode of p, and whose precision matrix (inverse covariance matrix) is set to the negative Hessian of log p at its mode.
From this construction, the main strength of this approximation scheme is immediately obvious: Laplace approximations require very little algebraic structure of p. It suffices to be able to differentiate p twice. It is a particularly convenient approximation if we can find the mode and Hessian analytically, but if everything fails, a numerical optimization scheme, such as Newton–Raphson iterations, or even numerical differentiation can be used as well. At this level, virtually every distribution p which can be evaluated at all becomes fair game. Unfortunately, there is a major drawback: Modes are a local feature and need by no means represent the overall distribution well (see Figure 2.7). If the distribution has heavy tails, is multimodal, is strongly asymmetric, or if its shape is badly represented by the Hessian at the mode, the Laplace approximation can be arbitrarily bad, and this defect may not even be easy to spot during the derivation! Hence, before resorting
to this form of approximation, it should be ensured that the approximated function does have an overall shape that can at least be roughly represented by a Gaussian. See Sections 5.2.2 and 5.3.1 as well as Appendix C for an application where this approach arguably makes sense.
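A minimal one-dimensional sketch, assuming a Gamma-shaped target (the specific density is an arbitrary illustrative choice); only log p and its derivatives are used, never an integral:

```python
# Laplace approximation to p(x) ∝ x^(a-1) exp(-b x), a Gamma(a, b) shape.
a, b = 5.0, 2.0
logp_d1 = lambda x: (a - 1.0) / x - b        # d/dx   log p(x)
logp_d2 = lambda x: -(a - 1.0) / (x * x)     # d²/dx² log p(x)

x = 1.0                                      # initial guess for the mode
for _ in range(50):                          # Newton–Raphson mode search
    x = x - logp_d1(x) / logp_d2(x)

mode = x                                     # analytic mode is (a-1)/b = 2.0
Lam = -logp_d2(mode)                         # negative Hessian at the mode

# Laplace approximation: q(x) = N(x; mode, 1/Lam).
# Note the defect: the true mean of this Gamma is a/b = 2.5, but q is
# centered at the mode 2.0 — the skew of p is invisible to the Hessian.
assert abs(mode - (a - 1.0) / b) < 1e-10
assert Lam > 0.0
```

Replacing the analytic derivatives with finite differences would make the same sketch work for any twice-differentiable black-box log-density, which is exactly the generality (and the danger) discussed above.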
2.3.5 Markov Chain Monte Carlo
Another approach to approximate inference, quite different from the three methods presented in the previous sections, is to replace analytical parametric distributions with samples, leading to Monte Carlo algorithms. If exact sampling from the distribution in question is not possible — which is the typical situation — sampling algorithms based on ergodic random walks through the distribution, known as Markov Chain Monte Carlo (MCMC) algorithms, provide an approximate answer. MCMC methods will not feature prominently in this thesis; good introductions to this vast field can be found in Murray [2007] and MacKay [2003, chapters 29 and 30]. But approximate sampling can be a great tool for approximate inference, for two main reasons:
- MCMC algorithms are guaranteed by construction to sample from the exact posterior in the limit of large sample sizes.
- They tend to be easy to implement for general probabilistic models.
They also have a few important drawbacks which are mostly to do with practical issues of implementation rather than with theoretical guarantees:
- Finding and fixing bugs in MCMC algorithms can be difficult, as the results are stochastic by nature.
- More importantly, diagnosing convergence of an MCMC sampler is challenging, and there is no generally applicable criterion for convergence. A very bad MCMC sampler, performing a random walk in a confined region of the distribution, can look dangerously similar to a very good MCMC sampler from the point of view of convergence diagnostics, such as autocorrelation measures. The time required for convergence is also difficult to predict from the model structure alone. In models where a particular sampling scheme mixes well, it can in fact provide the most computationally efficient solution. But if the algorithm is badly designed and mixes slowly, sampling can be an exceedingly expensive solution.
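For concreteness, here is a minimal random-walk Metropolis sampler (a sketch, not tuned for efficiency). All it needs is the unnormalized log-density of the target — here a standard Gaussian, chosen only so the result is checkable; note that nothing in the code would warn us if the chain were mixing badly.

```python
import math
import random

def log_p(x):
    """Unnormalized log-density of the target (standard Gaussian here)."""
    return -0.5 * x * x

random.seed(0)
x, samples = 0.0, []
for _ in range(20000):
    prop = x + random.gauss(0.0, 1.0)        # symmetric random-walk proposal
    # Metropolis acceptance: only a *ratio* of densities is needed,
    # so the normalization constant never has to be computed.
    if math.log(random.random()) < log_p(prop) - log_p(x):
        x = prop
    samples.append(x)

burned = samples[2000:]                      # discard burn-in
mean = sum(burned) / len(burned)
var = sum((s - mean) ** 2 for s in burned) / len(burned)
```

The sample mean and variance approach 0 and 1 for this target, but the samples are autocorrelated; estimating how many effectively independent samples the chain produced is precisely the convergence-diagnosis problem described above.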
The fact that MCMC methods only show up on the fringes of this thesis should not be interpreted as a statement about their usefulness, but as a conscious decision to focus on parametric analytic approximations.
Figure 2.8: Sketch of the "explaining away" phenomenon. An independent Gaussian prior (green circle) over the weights w turns into an anti-correlated posterior (blue ellipse) upon observing noisy information about the value of their sum y = w1 + w2 (red band, with thick mean line and light lines at one standard deviation).
2.3.6 Assuming Independence
A final, drastic measure to simplify inference is to simply assume certain parts of the model to be independent, even though they are not. Since every interesting aspect of probabilistic inference relies on dependencies between variables, this is obviously not a good general approach. But there are a few special situations in which a set of variables is "all but independent" in such a way that they can in fact be treated as independent. The most important such case is sparse Gaussian regression: Assume that we get to observe a noisy data point y ∈ R, generated in a linear way from weights w ∈ R^D, over which we have a Gaussian prior:

p(w) = N(w; µ, Σ)    p(y | w, φ, τ) = N(y; φᵀw, τ)    (2.62)
(see Section 5.2.3 for the more general case with y ∈ RK ). The posterior on w can be found by “completing the square” [e.g. Bishop, 2006, §2.3.3], and is given by
p(w | y, φ) = N(w; Ψ(τ⁻¹φy + Σ⁻¹µ), Ψ)

where    Ψ = (Σ⁻¹ + τ⁻¹φφᵀ)⁻¹ = Σ − ΣφφᵀΣ / (τ + φᵀΣφ)    (2.63)
(The last line follows from the matrix inversion lemma. See also Equation (C.17) and the footnote associated with it.) A typical situation in applications is that D is very large and φ is a sparse vector with very few non-zero entries. For example, consider the task of predicting user preferences for certain types of music from the users' past rankings of records belonging to certain classes. (This is just an example, and should not be considered a particularly good solution to this kind of task. See Bennett and Lanning [2007] for a review of good approaches
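The equivalence of the two forms of Ψ in Equation (2.63) can be checked numerically in a small hand-rolled example (D = 2, arbitrary numbers):

```python
# Check: (Σ^{-1} + τ^{-1} φ φᵀ)^{-1}  =  Σ − Σ φ φᵀ Σ / (τ + φᵀ Σ φ)
Sigma = [[2.0, 0.3], [0.3, 1.0]]
phi = [1.0, 0.0]        # sparse feature vector: a single non-zero entry
tau = 0.5               # observation noise variance

def inv2(M):
    """Inverse of a 2x2 matrix."""
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

# Left-hand side: invert the rank-one-updated precision directly
Si = inv2(Sigma)
A = [[Si[i][j] + phi[i] * phi[j] / tau for j in range(2)] for i in range(2)]
Psi_direct = inv2(A)

# Right-hand side: matrix inversion lemma, no D x D inverse needed
Sphi = [sum(Sigma[i][k] * phi[k] for k in range(2)) for i in range(2)]
denom = tau + sum(phi[i] * Sphi[i] for i in range(2))
Psi_lemma = [[Sigma[i][j] - Sphi[i] * Sphi[j] / denom for j in range(2)]
             for i in range(2)]

for i in range(2):
    for j in range(2):
        assert abs(Psi_direct[i][j] - Psi_lemma[i][j]) < 1e-12
```

For large D the right-hand side is the only practical route, since it replaces a D × D inversion by a single scalar division — exactly why the lemma matters for the sparse setting discussed next.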
Figure 2.9: On factorized regression: Left: fully connected, exact regression factor graph. Right: factor graph for the approximate assumption of independence. The “sum factor” is marked by an f.
to such tasks). Here, φ might be a binary vector that is zero everywhere, except for a one at the ID of a particular user, and a one at one other index, indicating that “this is a user from North America”.
This setup poses a computational challenge: On the one hand, because φ has more than one nonzero entry, the posterior on w is correlated — even if the prior has a diagonal covariance matrix Σ = diag(σ²), the outer product φφᵀ induces correlations (nonzero off-diagonal elements). This effect is famously known as explaining away [Pearl, 1988]; Figure 2.8 shows a simple sketch providing an intuition. On the other hand, it would be computationally intractable to try to model the entire D × D correlation matrix: If D ∼ 10⁷ (a typical user number for popular contemporary web services), then a single-precision representation of Ψ would require petabytes of storage.
The alternative is to enforce the assumption that the posterior beliefs on the individual weights w_d be independent: p(w) = N[w; µ, diag(σ²)]. This approach has been an established way to bring down computational cost for some time [Maybeck, 1982, Opper, 1996]. Figure 2.9 shows a factor graph reflecting this assumption.
Under this assumption of independence, the message from the elements of w to
each data point y_i — the "likelihood" term of Equation (2.62) — has a diagonal covariance and can thus be evaluated with low cost:

m_{f→y_i}(y_i) = ∫ f(ỹ_i | φ_i, w) N(ỹ_i; y_i, τ) N(w; µ, diag(σ²)) dỹ_i dw
             = ∫ N(y_i; ∑_d φ_id w_d, τ) N(w; µ, diag(σ²)) dw    (2.64)
             = N(y_i; ∑_d φ_id µ_d, τ + ∑_d φ²_id σ²_d).
Because the marginals on w_d are considered independent, we can find the messages to the individual weights simply by re-arranging the terms (because the Gaussian is symmetric around the mean: N(x; m, v) = N(m; x, v)). This gives

m_{f→w_d}(w_d) = N[ w_d; (1/φ_id)(y_i − ∑_{d′≠d} φ_id′ µ_d′), (1/φ²_id)(τ + ∑_{d′≠d} φ²_id′ σ²_d′) ]    (2.65)
The drawback of this approach is that we have ignored a correction that would show up in the fully connected model. Over time, the beliefs on w_d will thus become too confident. A simple but ad-hoc approach to rectify this behaviour is known as exponential forgetting: When iterating through the dataset, at each y_i, before sending the messages to the w_d, add a small constant ε to the uncertainty (variance) on each w_d with φ_id ≠ 0:

σ²_d ← σ²_d + ε    (2.66)

where ε is a free parameter chosen through some heuristics. There are motivations for this approach — for example, one can interpret it as modeling "drift" of the correct values for w_d from one data point to the other — but they have the whiff of ex post facto motivations. A more honest motivation is that this factorization assumption is simply so much cheaper computationally that one is willing to accept small inaccuracies. As long as the φ_i are sparse and pairwise dissimilar (in the sense that ∑_i φ_i φ_iᵀ is a matrix dominated by its diagonal), the resulting defects might be unproblematic. If additional knowledge about the structure of the φ_i is available, e.g. if co-occurrences of features are limited to a sub-space, block-diagonal approximations can provide a good trade-off between the full posterior of Equation (2.63) and the fully factorized marginals arising from Equation (2.65).
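A sketch of the resulting update cycle for a single data point, under the stated independence assumption (variable names and numbers are illustrative; updating the active weights sequentially, so that later messages see already-updated means, is one of several possible scheduling choices):

```python
# Factorized sparse regression update, following Eqs. (2.64)-(2.66).
D = 5
mu = [0.0] * D                       # factorized posterior means
s2 = [1.0] * D                       # factorized posterior variances diag(σ²)
tau = 0.25                           # observation noise variance
eps = 1e-3                           # exponential-forgetting constant (heuristic)

active = [1, 3]                      # indices d with φ_id ≠ 0 (sparse φ)
phi = {1: 1.0, 3: 1.0}
y = 2.0                              # observed target for this data point

for d in active:                     # Eq. (2.66): inflate variances first
    s2[d] += eps

# Predictive moments of y under the factorized belief (Eq. 2.64)
m_pred = sum(phi[d] * mu[d] for d in active)
v_pred = tau + sum(phi[d] ** 2 * s2[d] for d in active)

# Message to each active weight (Eq. 2.65), multiplied into its marginal
for d in active:
    m_msg = (y - sum(phi[e] * mu[e] for e in active if e != d)) / phi[d]
    v_msg = (tau + sum(phi[e] ** 2 * s2[e]
                       for e in active if e != d)) / phi[d] ** 2
    prec = 1.0 / s2[d] + 1.0 / v_msg          # Gaussian multiplication
    mu[d] = (mu[d] / s2[d] + m_msg / v_msg) / prec
    s2[d] = 1.0 / prec
```

Only the few active entries are touched, so the cost per data point is independent of D — the entire point of the factorization. The ignored correction is visible here too: after the update the two active weights are treated as independent although the data couples them.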
2.3.7 Comparison of Approximation Schemes
All the approximate inference algorithms presented in the preceding sections are generally applicable and fit well with the graphical models framework. Yet they each have their advantages and shortcomings. Table 2.1 contains a compact overview of these aspects. Because of the table's space limitations, the following list provides some background on the individual entries:
EP provides broad, "zero-avoiding" [Bishop, 2006] approximations (i.e. q tries to be non-zero everywhere where p is non-zero). This is advantageous if we are searching for an approximation of the type "estimate with error bar" — an unstructured Gaussian approximation covering the entire spread of the underlying exact, structured distribution (such as in Chapter 3). The same characteristic is a problem in distributions with combinatorial symmetry if we are only interested in one of the modes. The most important example of such combinatorial symmetries is the case of mixture models.
Variational Inference is in some sense the counterpart to EP, providing a "zero-enforcing" [Bishop, 2006] approximation. Hence, it is a good candidate for inference in mixture models (such as in Chapter 5), but not well suited for structured broad distributions with several modes, whose width we would like to approximate (such as the max function of Appendix A).
Laplace Approximations can be used in two different ways: As an overall approximation for the evidence of a big multivariate model, or to construct a lightweight approximation for the message between two variables within a larger graphical model. In the former case, the mode and Hessian are usually found numerically. The second case can be found in Chapter 5 and Appendix C, where it is used to provide a link between two parts of a graphical model in which different approximate schemes (factorized message passing and variational inference) produce two different exponential family distributions (multivariate Gaussian and Dirichlet distributions, respectively).
MCMC sampling algorithms have the advantage of providing a nonparametric representation of a distribution through samples, and converge to the exact posterior in the limit of many samples. They are thus a "gold standard" to which other, faster approximations can be compared. The single big drawback of MCMC methods is that, unless great care is taken to ensure good mixing, they can mix badly and take exceedingly long to converge — and that this behaviour is difficult to discover. This problem is particularly pronounced in hierarchical models of complicated structure, particularly when a large amount of data is available and the posterior is thus highly concentrated in several "disconnected" regions.
Exact Sampling Methods mostly refers to rejection sampling algorithms, such as adaptive rejection sampling [Gilks and Wild, 1992, Wild and Gilks, 1993]. Aside from analytical exact sampling schemes, which are rarely available, a rejection sampler with high acceptance rate provides arguably the best possible representation of a joint posterior. Unfortunately, in high-dimensional models any rejection sampler has low acceptance rate [MacKay, 2003, §29.3] and is thus of little help. However, in some cases it can be helpful to sample individual dimensions exactly. For example, single dimensions in a larger Gibbs sampling scheme might be sampled by a method like adaptive rejection sampling, when the conditional distributions are not of closed exponential family form.

Table 2.1: Overview of the strengths and weaknesses of the approximate inference schemes introduced in this chapter.

EP — requirements: evaluation of moment integrals ∫ f(θ) q*(θ) dθ. Strengths: broad approximation. Weaknesses: convergence not guaranteed; can smooth out modes. Good applications: broad distributions; non-linear factors (e.g. max, step-function, ...). Bad applications: mixture models.

Variational Inference — requirements: evaluation of ∫ q(θ) log p(x, θ) dθ. Strengths: mode-finding (symmetry-breaking). Weaknesses: weak tails; ignores minor modes. Good applications: mixture models. Bad applications: broad distributions whose width we wish to approximate.

Laplace — requirements: ∇ log p, ∂² log p / ∂θ_i ∂θ_j. Strengths: lightweight. Weaknesses: approximation based on a local feature of the distribution only (the mode). Good applications: non-linear linking factors. Bad applications: distributions where the mode representation is no good.

MCMC — requirements: unnormalized distribution p*(θ, x). Strengths: exact in the limit of many samples. Weaknesses: convergence/mixing not measurable; can be arbitrarily wrong. Good applications: gold standard to compare other models to; models with complex structure. Bad applications: highly hierarchical models (unless mixing well); applications where computational cost matters.

Exact MC — requirements: good proposal distribution. Strengths: exact, known convergence rate. Weaknesses: hard to design proposal; high rejection rate. Good applications: challenging 1D sub-parts of a model. Bad applications: high-dimensional models (curse of dimensionality).

Assumed Independence — requirements: none. Strengths: can drastically reduce cost. Weaknesses: over-confident marginals. Good applications: almost independent distributions. Bad applications: strongly correlated distributions.

Chapter 3
Inference on Optimal Play in Games
The work presented in this chapter was carried out in collaboration with Thore Graepel and David Stern, both of Microsoft Research Ltd. All mathematical derivations, algorithmic implementations and experimental evaluations were performed by the author of this thesis. A shorter version of this chapter was published as [Hennig et al., 2010]: Coherent Inference on Optimal Play in Games; P. Hennig, D. Stern and T. Graepel; Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, 2010. Journal of Machine Learning Research: W&CP 9.
Abstract
Round-based games are an instance of discrete planning problems. Some of the best contemporary game tree search algorithms use random roll-outs as data. Relying on a good policy, they learn on-policy values by propagating information upwards in the tree, but not between sibling nodes. This chapter presents a generative model and a corresponding approximate message passing scheme for inference on the optimal, off-policy value of nodes in smooth AND/OR trees, given random roll-outs. The crucial insight is that the distribution of values in game trees is not completely arbitrary. We define a generative model of the on-policy values using a latent score for each state, representing the value under the random roll-out policy. Inference on the values under the optimal policy separates into an inductive, pre-data step and a deductive, post-data part. Both can be solved approximately with Expectation Propagation, allowing off-policy value inference for any node in the (exponentially big) tree in linear time.
3.1 Introduction
Games are one of the oldest problems of artificial intelligence research, going back to early works by Wiener [1948] and Shannon [1950]. Most early research was concerned with Chess. Following the success of Deep Blue [Campbell et al., 2002] against Garry Kasparov, interest has shifted to the Asian board game Go. Compared to Chess, Go has a much larger number of possible positions and is not as amenable to the design of hand-crafted evaluation functions.

Many round-based two-player games, like Chess and Go, can be represented, up to transpositions, by graphs with the structure of AND/OR trees [Nilsson, 1971]. These are trees with binary leaf nodes, in which branch nodes at alternate depths are assigned the OR (maximum) value of their children or AND (minimum) value of their children, respectively, reflecting the adversarial efforts of two competing players trying to achieve either binary value as the outcome of the game. If the branching factor b is constant, a tree of depth d contains b^d nodes. For Go, b and d are on the order of 200, so it seems finding the optimal path through the game should be intractable (in fact, finding an optimal path through the Go game tree has been shown to be EXPTIME-complete [Robson, 1983]¹). Yet humans can find good paths through the Go tree in finite time. A crucial property of games that humans use is that the tree has structure: Winning positions are not distributed uniformly among the tree's leaves; they are clustered in the sense that leaf nodes that are close to each other (in terms of the graph) tend to have similar outcomes. This structure can be modeled by a latent score, representing the amount by which one player is 'ahead' or 'behind'. Random play leads to a random walk, typically changing the evaluation by small increments. Critical moves, changing the score drastically, are a rare occurrence in the tree overall.
Although it is not usually mentioned explicitly, this smoothness is a crucial intuition behind Monte Carlo algorithms for 'best first' tree search, like UCT [Kocsis and Szepesvári, 2006], which have been very successful recently. These algorithms repeatedly play roll-outs—random descents through the tree to a leaf, generated by a relatively weak or even uniformly random policy. The search tree is expanded asymmetrically from the root, based on the frequency of wins and losses in the roll-outs. If wins and losses were distributed uniformly at random among the leaves, the roll-out results would be almost completely uninformative [Pearl, 1985]. The best contemporary Go machines use UCT as part of their method [Gelly and Silver, 2008]. However, UCT-like methods base their value estimates directly on average
¹EXPTIME problems are problems solvable by a deterministic Turing machine in O(2^{p(n)}), where p(n) is a polynomial in n. EXPTIME-complete problems are those problems in EXPTIME to which every other EXPTIME problem has a polynomial-time many-to-one reduction. In particular, all NP problems are a subset of the EXPTIME problems.

outcomes of tree-descents, making them dependent on a good exploration and roll-out policy. They also do not propagate information laterally in the tree, although, thanks to the smoothness of the tree, the value of a node does contain information about the value of its siblings.

This chapter constructs an explicit generative model for the value of game tree nodes under the random roll-out policy. Finding the value under the optimal policy would amount to solving a min-max optimization problem of complexity O(b^d) if all nodes were observed. However, for best-first search algorithms, it will be shown how an approximate closed form for the unobserved parts of the tree can be constructed and used to derive an approximate message passing scheme. The resulting algorithm tracks a joint posterior belief over the optimal values of all nodes in the tree. It incorporates a new roll-out at depth k from the root in O(kb) time (using heuristics, this can be brought to O(k), the complexity class of UCT), arriving at an intermediate set of local marginals that is sufficient to evaluate the posterior of an arbitrary node at depth ℓ in the tree (something classic Monte Carlo algorithms cannot do) with cost O(ℓ). The algorithm can be interpreted as an instance of Bayesian off-policy reinforcement learning [Watkins and Dayan, 1992, Duff, 2002], inferring the optimal policy from samples generated by a non-optimal policy. Our method might be applicable, to varying degree, to other tree-structured optimization problems, if they exhibit a functional relationship between steps; cf.
the metric (though not tree-structured) sets of bandits considered by Kleinberg et al. [2008]. The main research contributions are the generative model for the score of game positions, the formulation of a probabilistic best-first tree search algorithm, and a demonstration of Expectation Propagation on min-max trees. Probabilistic approaches to game tree search have been suggested before [see e.g. Baum and Smith, 1997, Russell and Wefald, 1991, Palay, 1985, Stern et al., 2007]. These works concentrated on guiding the search policy. Here we focus on efficient and consistent off-policy inference for the entire tree. This is valuable because a coherent posterior over off-policy values provides meaningful probabilistic training data for Bayesian algorithms attempting to learn a generalizing evaluation function based on features of the game state.
3.2 Methods
This section defines the problem (3.2.1), then develops the algorithm in several steps. We define the generative model of on-policy and off-policy values (3.2.2), show how to perform inference in this model by way of example (3.2.3), and finally combine the results into an explicit algorithm (3.2.4). 40 Inference on Optimal Play in Games
3.2.1 Problem Definition
Figure 3.1: A zero-sum, two-player, round-based game represented by a game tree. Terminal positions are shown as filled nodes, nonterminal positions as hollow circles. The meaning of the branching factor b (here b = 2) and tree depth d (here d = 4) is indicated by schematic annotations.
We consider a tree-structured graph defining a round-based, loop-free, zero-sum game between two opposing players, MAX and MIN, with binary outcomes 1 and −1 — "win" and "loss" from MAX's point of view. (Games with real-valued outcomes are in fact an easier variant of this problem, because the scores g_t of terminal nodes can be observed directly, rather than just their signs.) MAX is defined to be the first player to move. The task is to predict the outcome of the game, assuming optimal play by both players, from any position in the tree. The only type of data available (at request) is the length ℓ ∈ N and result r_i ∈ {−1, 1} of random roll-outs of the game starting at node i. A roll-out is a path through the tree to a terminal node, generated by a policy choosing stochastically (not necessarily uniformly) among available moves in each encountered position. (The policies of contemporary UCT algorithms do not choose moves uniformly at random. See Section 3.3.4 for experimental evidence that our model can still be approximately valid in this case.)
3.2.2 Generative Model
Two kinds of evaluations for nodes in the game tree will be central to the analysis: The value under the non-optimal roll-out policy, known as the on-policy value
in reinforcement learning, will be called the score for clarity. The optimal value achieved by two hypothetical ideal players is known as the off-policy value in reinforcement learning. The following two definitions jointly form a generative model M for the latent scores G = {g_i} and optimal values V = {v_i} of tree nodes i.

Figure 3.2: Conceptual sketch of game tree smoothness. The binary win/loss values of terminal positions (shown as black/white nodes) are the result of incremental changes in the game score over the course of the game, indicated by grayness levels.
Definition: The score gi of node i models the value of i under random play. It is a real number such that
- for any terminal position t, sign(g_t) = r_t, where r_t is the binary result of the game at t. The likelihood for g_t is thus a step function (denoted θ):

  p(r_t | g_t) = θ(r_t g_t).    (3.1)
- the score of a child node c of node i is generated from g_i by a zero-mean, unit-variance Gaussian step (see also the note below):

  p(g_c | g_i, M) = N(g_c; g_i, 1).    (3.2)
- the prior for the score of the root node is Gaussian, p(g_0) = N(µ_0, σ_0²) (one is free to choose µ_0 and σ_0, although µ_0 = 0 is the obvious choice).
Thus, scores of sibling nodes are independent given their parent’s score, and the prior distribution of the value during a roll-out is a Brownian random walk. For binary results, the scale factor of the steps is arbitrary, and so is the choice of 1 here. In the simpler case where the actual real value of a terminal position is observed (which will not be considered further here), this step size obtains a meaning, and should then be learned. How to do this will follow straightforwardly from the following derivations in Section 3.2.3: It involves inferring the variance of values at the leaf nodes (which is trivial from the observed data), and dividing that number by the depth of the tree.
Definition: The off-policy value vi of node i is the true value of node i under optimal play by both players. That is

- for terminal positions t, we have vt = gt.

- for non-terminal positions i with children {c}i,

  vi = max_c {vc}  if MAX plays at i,
  vi = min_c {vc}  if MIN plays at i. (3.3)
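To make the two definitions concrete, the generative model M can be sketched in a few lines of code: scores perform a unit-variance Gaussian random walk down the tree (Equation (3.2)), and optimal values follow the minimax recursion of Equation (3.3). This is an illustration of the model only, not of the inference algorithm; all function names are ours.

```python
import random

def sample_scores(g_parent, depth, b):
    """Scores g follow a unit-variance Gaussian random walk down the tree
    (Eq. 3.2); the recursion bottoms out at terminal positions."""
    children = [] if depth == 0 else [
        sample_scores(g_parent + random.gauss(0.0, 1.0), depth - 1, b)
        for _ in range(b)]
    return {'g': g_parent, 'children': children}

def minimax_value(node, max_to_move=True):
    """Optimal value v_i (Eq. 3.3): v_t = g_t at terminal positions, else
    the max (MAX to move) or min (MIN to move) of the children's values."""
    if not node['children']:
        return node['g']
    vals = [minimax_value(c, not max_to_move) for c in node['children']]
    return max(vals) if max_to_move else min(vals)

random.seed(0)
tree = sample_scores(0.0, depth=4, b=2)   # root prior mean mu_0 = 0
v0 = minimax_value(tree)                  # optimal value of the root
r0 = 1 if v0 > 0 else -1                  # binary result under optimal play
```

Only the signs of the leaf scores would be observable to the learner; everything else in this simulation is latent.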
Figure 3.3: Generative model (Bayesian network, directed graphical model) for the scores G, the roll-out results R and the optimal values V of an example game represented by a binary tree of depth 4. Shown is the situation after four search-descents into the tree; the blue and orange shaded nodes represent the exploratory tree in the algorithm's memory. The most recent roll-out (of length 2) is shown as two black curved arrows, previous roll-outs (from nodes (1), (2) and (11)) are not shown. The lower half of the diagram shows the generative process of optimal values v. In a minor deviation from the standard notation for directed graphical models, the ground truth identity of the optimal paths through the tree (not observed by the algorithm) is indicated by thick arrows with double heads (i.e. optimal play consists of MAX moving from node 0 to node 1, then MIN moving to 12, etc.). This is only for intuition; the generative model itself just consists of the nodes and arrows. Note that, while the v nodes are shown below the g nodes here for readability, in the text 'up' and 'down' refer to the game tree structure, with parent nodes vi being 'above' children vij.
We note in passing that OR-trees (i.e. tree-structured optimization problems, and non-adversarial games like Solitaire) are a trivial variant of this formulation. Figure 3.3 shows the full generative model for a small tree of b = 2 and d = 4, as a directed graphical model. Note that the only observed data is the sign (the binary value) of terminal positions (black nodes); all other variables are latent. In some games — such as Chess, see e.g. Campbell et al. [2002] — noisy value estimates for non-terminal positions might be available from some evaluation function. This situation will not be studied here, but in many cases it might be possible to include such information in a principled way. In fact, if the observation noise can be considered Gaussian with known mean and variance, incorporating such data is a straightforward case of belief propagation (see Section 2.2).
3.2.3 Inference
We will use the results from Chapter 2 to derive an approximate message passing algorithm to perform inference both on the scores and values, using Gaussian Expectation Propagation (Section 2.3.2) to project the messages to the normal exponential family. Inspecting Figure 3.3, it might seem like inference in the tree would call for messages among all b^d nodes. In this section, we will show that this can be avoided, because the messages from unobserved parts of the tree can be derived a priori, in jointly O(b·d) time. We assume the learner acquires data in a best-first manner: At any given point in time, it tracks an asymmetric but contiguous tree in memory which includes the root. Additional nodes are added to the boundary of the stored tree by requesting a roll-out from that node (Figure 3.3 shows the situation after four roll-outs, with four (blue/orange shaded) nodes already added to the memory representation). The message passing is performed in parallel with the search process. The resulting message passing schedule is quite complex. For clarity, we will use an example descent through the small tree of Figure 3.3.
On-Policy Inference on g
Each search descent begins at the root node 0. As the descent passes through the tracked part of the tree, a policy π chooses among available children, potentially based on the current beliefs over their v. For our example, say the policy chose the descent 0 → 1 → 12. At each step, we update the message from the current node to the chosen child. The message out of g1 in the direction of g12 is
m_pa(1)(g1) · ∏_{j ∈ ch(1)\12} m_gj(g1) ≡ N(µ_{1\12}, σ²_{1\12}) (3.4)

where ch(1)\12 is the set of child nodes of 1 excluding node 12, and pa(1) is the parent node (i.e. 0) of 1. And the message into g12 is

m_g1(g12) = ∫ p(g12 | g1) p(g1 | g_pa(1), {g_{ch(1)\12}}) dg1 = N(µ_{1\12}, σ²_{1\12} + 1) (3.5)
Assume node 12 just reached is not part of the stored tree yet. To add it to the tree, we request a roll-out starting from 12, which turns out to be of length ℓ = 2 and to have result r = +1 (see Figure 3.3). The data thus gained is a likelihood p(r|gt) of the score of the terminal position t at the end of the roll-out, which is a step function. We can generate a prior over gt as a message from the current
marginal over g12 by integrating out the ℓ intermediate steps (see Equation (3.5)):

p(gt | g12) = N(µ12, σ²12 + ℓ) (3.6)
giving a posterior over gt which is a truncated Gaussian. To get the EP message back to g12, we need the function that calculates the moments of this truncated Gaussian distribution (this function will be denoted f^EP_trG in Algorithm 1). The calculation is straightforward, because for Gaussians in general

∫_0^∞ x N(x; µ, σ²) dx = µ Φ(µ/σ) + σ φ(µ/σ)   and
∫_0^∞ x² N(x; µ, σ²) dx = (µ² + σ²) Φ(µ/σ) + µσ φ(µ/σ),

where φ(x) = N(x; 0, 1) is the standard Gaussian density and Φ(x) = ∫_{−∞}^x φ(y) dy is the cumulative Gaussian. This result was used previously for EP by Herbrich et al. [2007]. To finally arrive at the message m_r12(g12) from the roll-out to g12, we need to apply f_{N(0,ℓ)} again to the resulting EP message. Note that the message contains
no g other than gi. It is thus possible to perform inference on gi using exactly two messages: One from its parent node, and one from the outcome of the roll-out.
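The two moment identities used above are easy to check numerically. The following sketch compares them against brute-force quadrature; the function names are ours, not those of Algorithm 1, and the moments are left unnormalised (i.e. not yet divided by the truncation mass Φ(µ/σ)).

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def truncated_moments(mu, sigma):
    """Unnormalised first and second moments of N(mu, sigma^2) on [0, inf),
    via the closed-form identities in the text."""
    a = mu / sigma
    m1 = mu * Phi(a) + sigma * phi(a)
    m2 = (mu ** 2 + sigma ** 2) * Phi(a) + mu * sigma * phi(a)
    return m1, m2

def quad_moments(mu, sigma, n=200000, hi=12.0):
    """Same moments by midpoint-rule quadrature over [0, hi], as a check."""
    dx = hi / n
    m1 = m2 = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        w = phi((x - mu) / sigma) / sigma * dx
        m1 += x * w
        m2 += x * x * w
    return m1, m2
```

Normalising by Φ(µ/σ) then yields the mean and variance of the truncated Gaussian posterior needed for the EP projection.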
To incorporate the new knowledge from this roll-out into all ancestor nodes of
g12, we pass messages of analogous form to Equation (3.5) back up the tree. This obviously does not propagate the information through the whole tree, but it leads
to a situation where the score gi of any node i in the tree can be evaluated in linear time, simply by performing one descent from the root towards i, updating messages analogously to Equation (3.5) downwards during the descent.
There is one more pitfall to avoid: Consider the next time a search descent passes through node 12, which currently lies at the boundary of the tracked tree. This leads to the addition of a child node of 12 and a roll-out from there. Now the roll-out result associated with 12 has to be dealt with in a consistent way. Simply keeping the corresponding message in the marginal is not correct: Because the roll-out necessarily passed through one of its children, information from that child would otherwise be counted twice. There are two other options: If information about the course of the roll-out was stored, the corresponding message can be moved down along the path of the roll-out to the boundary of the search tree. If the number of roll-outs played is too large to store the paths of all roll-outs, it becomes necessary to remove the information gained from the roll-out at node 12 from the marginal. This removal corresponds to 'dividing' the corresponding message out of the marginal, as discussed in Sections 2.3.1 and 2.3.2. In collecting the experimental results reported in Section 3.3.2, we opted for this latter, more memory-conservative (but slightly information-wasting) approach, to facilitate comparison to contemporary Monte Carlo tree search algorithms.

Figure 3.4: Inference on optimal values vi separates into two independent parts: the value gi of playing to node i, and the value ∆i of playing optimally from node i onwards. Inference on gi is deductive, because it uses the roll-out data oi, while inference on ∆i uses the model only, and is thus inductive. The factors shown are p(gi | g_pa(i), M) = N(gi; g_pa(i), 1), p(oi | gi, ℓi, M) = N(oi; gi, ℓi), the prior p(∆i | M), and the sum vi = gi + ∆i(ℓi).
Inference on Optimal Values v
The previous paragraph sketched the message passing inference leading to a consistent posterior over scores G. How can these beliefs be used to obtain a posterior over the values V? We will use the definitions of Section 3.2.2 to again derive a message-passing scheme running parallel to the search process. First, consider again node i = 12 in Figure 3.3, at the boundary of the stored tree. The value under optimal play is the sum of the two independent variables (see factor graph in Figure 3.4)
vi = gi + ∆i (3.7)

where ∆i is the optimal reachable increment to the score of i. So the inference breaks up into a deductive part (on gi, as solved in the previous section) and an inductive part (on ∆i). If the next move after i is controlled by MAX (replace max with min in the opposite case), then
∆i = max_{j ∈ children(i)} {ξj + ∆j} (3.8)

where ξj ∼ N(0, 1) is the unknown Brownian step to the score of node j. Deriving a belief over ∆i for any node i, which is ℓi steps from a terminal position, is a recursive problem which depends only on ℓi and the branching factor b: Assuming
MIN gets the last move (with straightforward variations in other cases), ∆i is

∆i(ℓi, b) = 0,
∆i(ℓi, b) = max_{j=1...b} {∆j(ℓi − 1, b) + ξj},
∆i(ℓi, b) = min_{j=1...b} {∆j(ℓi − 1, b) + ξj} (3.9)

for ℓ = 0, for ℓ mod 2 = 0 and for ℓ mod 2 = 1, respectively.
Similarly to the situation in the previous section, the beliefs generated by this recursive operation are not Gaussian themselves. To perform the EP approximation, we need the function calculating the moments of the maximum or minimum over Gaussian variables (this function will be denoted f^EP_max/min in Algorithm 1). The necessary derivations are lengthy and have thus been moved to Appendix A, where these moments are derived for the case of the maximum of two Gaussian variables, and it is shown how to combine such binary comparisons iteratively into an approximate posterior belief over the maximum of a finite set of such variables. The corresponding message for the minimum of variables is a trivial variant, because min_i{xi} = − max_i{−xi}.
Using this approximation, we arrive at a recursive operation in closed form, which can be used to derive the message from ∆(ℓ) for all ℓ up to a pre-defined depth. Perhaps surprisingly, this lookup table for optimal increments can be constructed prior to data acquisition, once for the entire game tree, in O(b·d) time (as opposed to the O(b^d) cost of probing the entire tree), which is easily tractable even for massive game trees like that of 19×19 Go. To see this, note that the optimal value increments from nodes one level above the leaves to the leaves are the maximum (or minimum) of b unit-variance, zero-mean Gaussian random variables. The above approximation gives a new Gaussian approximation for the belief over the value of this maximum (or minimum) that is identical for all these nodes. So the fact that there are b^{d−1} of these nodes is irrelevant for this computation.
In my simple, non-optimized implementation, constructing this table takes about 2 minutes on a contemporary desktop machine, for a tree of Go-like dimensions (10^400 nodes). This step is a parametrized version of the Monte Carlo technique known as density evolution (see e.g. [MacKay, 2003, §47.5] and Richardson and Urbanke [2008, §4.5]). See Section 3.3.3 for an experimental analysis of the quality of the approximation.
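A minimal sketch of how such a table can be built. It uses the classic pairwise moment match for the maximum of two independent Gaussians (Clark's approximation), iterated over the b children as described in the text; the exact EP equations, including the correlated case, are in Appendix A, and all names here are illustrative.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def max_of_two(m1, v1, m2, v2):
    """Gaussian moment match to max(X1, X2) for independent Gaussians."""
    a = math.sqrt(v1 + v2)
    alpha = (m1 - m2) / a
    mean = m1 * Phi(alpha) + m2 * Phi(-alpha) + a * phi(alpha)
    second = (m1 ** 2 + v1) * Phi(alpha) + (m2 ** 2 + v2) * Phi(-alpha) \
             + (m1 + m2) * a * phi(alpha)
    return mean, second - mean ** 2

def fold_extreme(m, v, b, use_max):
    """Approximate max (or min) of b i.i.d. N(m, v) variables by iterated
    pairwise moment matching; min via min_i x_i = -max_i(-x_i)."""
    if not use_max:
        m = -m
    acc_m, acc_v = m, v
    for _ in range(b - 1):
        acc_m, acc_v = max_of_two(acc_m, acc_v, m, v)
    return (acc_m if use_max else -acc_m), acc_v

def delta_table(depth, b):
    """Beliefs N(m, v) over Delta(l) for l = 0..depth; MIN moves last,
    so odd l take the min and even l > 0 the max, as in Eq. (3.9)."""
    table = [(0.0, 0.0)]                         # Delta(0) = 0 exactly
    for l in range(1, depth + 1):
        m_c, v_c = table[-1]
        # each child contributes Delta(l-1) plus one unit-variance step xi
        table.append(fold_extreme(m_c, v_c + 1.0, b, use_max=(l % 2 == 0)))
    return table
```

The loop visits each depth once and folds b children per depth, giving the O(b·d) cost noted above; `table[l]` then plays the role of the pre-computed inductive message for a node ℓ steps above the terminal positions.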
We sum the independent variables gi and ∆i, using the exact function

f_Σ[N(µa, σ²a), N(µb, σ²b)] = N(µa + µb, σ²a + σ²b). (3.10)
For a node j that does not lie on the boundary of the stored tree, vj is given by Equation (3.3). Marginals p(v_cj) are available for all children cj of j in this case. Hence, an approximate Gaussian message to vj from its children can be found using the same method as above. However, it is important to note that the children of vj are correlated variables, because they are all of the form shown in Equation (3.7), sharing the contribution gj. The EP equations derived in Appendix A include the correlated case. Using approximate Gaussian messages

q(vk) = N(µk, σ²k) = N(µ_gj + µ_∆k, σ²_gj + σ²_∆k) (3.11)

for each child k of j, the correlation coefficient² ϱ_{k1 k2} between two children k1 and k2 is

ϱ_{k1 k2} = V^{−1} (σ²_gj − µ_gj µ_∆k1 − µ_gj µ_∆k2 − µ_∆k1 µ_∆k2)

where

V ≡ (σ²_gj + σ²_∆k1 − 2 µ_gj µ_∆k1)^{1/2} · (σ²_gj + σ²_∆k2 − 2 µ_gj µ_∆k2)^{1/2}.
3.2.4 Algorithm
Algorithm 1 sums up the message-passing scheme presented in the previous sections. It defines a recursive function that descends through the tracked tree to a leaf, passing messages downward (the operator f_{N(0,1)}(p) refers to Equation (3.5)). To choose the part of the tree to explore, the algorithm uses a generic policy π (line 3), whose precise form can be arbitrary. In particular, the policy may or may not make use of the algorithm's value estimates (see Section 3.2.6 below). At the boundary of the stored tree, the algorithm performs a roll-out (line 14), then passes g and v messages upwards to the root. The actual top-level search algorithm repeatedly calls this function, accumulating more and more data, at roughly constant computational cost per call (apart from the small increase in cost caused by the growth of the stored tree). The notation pa(i) and si(i) refers to the parent of i and the set of siblings of (and including) i, respectively. The function stored accesses a one-dimensional array of stored inductive messages from the unexplored parts of the tree, as discussed in Section 3.2.3.
² The correlation coefficient ϱij between two Gaussian variables i and j is defined by cov(i, j) = ϱij √(Var(i) Var(j)).
Algorithm 1 Bayesian Best-First Tree Search
1: procedure descent(i)
2:   if i previously visited then
3:     c ← π(i)   ▷ policy π chooses child c to explore
4:     p(gc) ← p(gc)/m_gi(gc)   ▷ update message to c
5:     m_gi(gc) ← f_{N(0,1)}[p(gi)/m_gc(gi)]
6:     p(gc) ← p(gc) · m_gi(gc)
7:     m′_c(gi) ← descent(c)   ▷ continue descent (returns g message from child)
8:     p(gi) ← p(gi)/m_c(gi) · m′_c(gi)   ▷ update marginals
9:     m_c(gi) ← m′_c(gi)
10:  else
11:    p(g_pa(i)) ← p(g_pa(i))/m_r_pa(i)(g_pa(i))   ▷ divide out roll-out from parent's marginal
12:    m_r_pa(i)(g_pa(i)) ← N(0, ∞)
13:    [lines 3 to 5]   ▷ update message from parent to gi, identical to above
14:    (ri, ℓi) ← roll-out(i)   ▷ do roll-out
15:    m_ri(gi) ← f_{N(0,ℓi)}[f^EP_trG(ri, f_{N(0,ℓi)}[p(gi)])]   ▷ build message from roll-out result to i
16:    p(gi) ← p(gi) · m_ri(gi)
17:    p(vi) ← f_Σ[p(gi), stored(ℓi)]   ▷ generate marginal for vi
18:  end if
19:  m_i(g_pa(i)) ← f_{N(0,1)}(p(gi)/m_pa(i)(gi))   ▷ calculate messages to parent's g and v
20:  p(v_pa(i)) ← f^EP_max/min({p(vk)}_{k∈si(i)}, p(gi))
21:  return m_i(g_pa(i))
22: end procedure
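The message arithmetic that Algorithm 1 relies on (multiplying a message into a marginal, dividing one out, and the random-walk convolution f_{N(0,ℓ)} of Equation (3.5)) is ordinary Gaussian algebra, most compactly written in natural parameters. The class below is an illustrative sketch with names of our choosing, not the thesis implementation.

```python
class Gauss:
    """Gaussian message in natural parameters: pi = 1/var, tau = mu/var.
    pi = tau = 0 encodes the flat (improper) message N(0, infinity)."""
    def __init__(self, pi=0.0, tau=0.0):
        self.pi, self.tau = pi, tau

    @classmethod
    def from_moments(cls, mu, var):
        return cls(1.0 / var, mu / var)

    def __mul__(self, other):
        # multiplying messages adds natural parameters (e.g. lines 6, 16)
        return Gauss(self.pi + other.pi, self.tau + other.tau)

    def __truediv__(self, other):
        # dividing a message out of a marginal subtracts them (lines 4, 8, 11)
        return Gauss(self.pi - other.pi, self.tau - other.tau)

    def convolve(self, step_var):
        # the operator f_{N(0,l)}: a Brownian step of variance step_var
        # adds to the variance of a proper (pi > 0) message
        mu, var = self.moments()
        return Gauss.from_moments(mu, var + step_var)

    def moments(self):
        return self.tau / self.pi, 1.0 / self.pi
```

For instance, multiplying two unit-variance messages halves the variance, and convolving with one Brownian step recovers the "+1" in Equation (3.5).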
3.2.5 Replacing a Hard Problem with a Simple Prior
Figure 3.5 provides an intuition for the computational simplification achieved with the probabilistic model and inference algorithm introduced in the previous sections. Finding the exact answer to the tree search problem is an exponentially hard problem. But finding a good path through the tree — estimating the value of a position under optimal play — is a much simpler problem if we allow ourselves the luxury of a probabilistic model for the tree structure. With this model, inferring a belief (rather than an exact statement) over the value of any node in the tree has only linear computational cost. This also provides some insight into why humans can play exponentially hard games like Go, despite their ostensibly incredible structural complexity: Humans do not predict entire games from the start: They think ahead a few moves, and assume that the game proceeds "as usual" (i.e. as described by some simple model) from there on. In classic tree search algorithms, this intuition is encoded as a hard evaluation function by the designer of the algorithm. In games where such an evaluation is hard to obtain, such as Go, random roll-outs combined with the inference algorithm presented above provide a means of learning the evaluation function.
Figure 3.5: Factor graph illustrating the advantage of a probabilistic model over classic tree search algorithms: the exponentially large tree from Figure 3.3 (indicated by gray clouds) has been replaced, literally, with black boxes modelling its effect on optimal play. The exponentially hard problem of identifying the optimal path through the tree is replaced with the linear-cost problem of inferring a belief over the optimal path.
3.2.6 Exploration Policies
While the inference process itself is independent of the chosen policy π in Algorithm 1, the policy is crucial to make the roll-outs informative about the optimal path (this is a general feature of off-policy reinforcement learning). Many possible policies are available, among them the point-estimate based UCT [Kocsis and Szepesvári, 2006], greedy choice among samples [Thompson, 1933], and information gain [Dearden et al., 1998]. An approach that has received some renewed interest recently [Kolter and Ng, 2009] is 'optimistic' exploration based on weighted sums of mean and variance of the value estimates. That is, given approximate beliefs q(vi) = N(µi, σ²i) over the optimal value of children i, the policy chooses

π_optimistic[q(v)] = arg max_{children i} (µi + β σi) (3.12)

with a parameter β controlling the trade-off between exploration and exploitation. In our experiments, we used this optimistic exploration where applicable, but it should be understood that this is essentially an arbitrary choice, and the 'right' choice of policy is still a matter of open debate in the reinforcement learning community.
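Equation (3.12) amounts to a one-line selection rule; a sketch with an illustrative function name:

```python
def optimistic_choice(means, stds, beta=3.0):
    """Pick the index of the child maximising mu_i + beta * sigma_i
    (Eq. 3.12); beta trades off exploitation against exploration."""
    scores = [m + beta * s for m, s in zip(means, stds)]
    return max(range(len(scores)), key=scores.__getitem__)
```

The default β = 3 matches the choice µ_v + 3σ_v used in the experiments of Section 3.3.2; a large β favours uncertain children, β = 0 is purely greedy.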
Figure 3.6: Quantile-quantile plot of 71 empirical distributions of g, centered on their mean, at varying depth from the root, for one random path through Go, against the standard normal distribution Φ(x) (black dotted line). Left: Random roll-out policy. Right: Smart roll-out policy. Color intensity of the plots decays linearly with depth from the root. Note that the distributions far from the root are based on increasingly small sample sizes.
3.3 Results
We performed experiments to answer several questions arising with regard to the described inference model: Is the model of Brownian motion applicable to real games, like Go (3.3.1)? Is the EP approximation on the roll-out results effective (3.3.2)? Does the recursive min/max approximation in the inductive part of the inference produce a reasonable approximation of the true MIN/MAX values (3.3.3)? How robust is the model to mismatch between generative model and ground truth (3.3.4)? And could the algorithm be used as a standalone searcher (3.3.5)?
3.3.1 Structure of Go Game Trees
We generated one random path through the tree of 9 × 9 Go. At each level in this path, roll-outs from all legal moves i at this position were generated (1000 roll-outs from each i). Depending on the depth from the root, there were between 81 and 0 such legal positions. We stopped the game after 71 steps, when fewer than 5 legal moves were available. The average length ℓ̄i of the roll-outs and the empirical frequency p̂(win|i) of a win for the MAX player from i under a random roll-out
policy was stored. This implicitly defines the value of gi under the model through
p(win | gi) = ∫_0^∞ N(gt; gi, ℓi) dgt = Φ(gi / √ℓi) (3.13)
where we replace p(win|gi) and ℓi with empirical averages. With sufficiently many samples, p(win|gi) and thus gi can be evaluated up to negligible error. The generative process defined as part of our model M then leads to the statement that the
{gi} of sibling nodes in the tree are distributed like the standard normal distribution around their parent's value.

Figure 3.7: Change to means of positions' values during roll-outs, as a function of depth in the tree (i.e. this plot shows the effects explicitly removed in Figure 3.6). Left: random roll-out policy. Right: smart roll-out policy. Nodes controlled by the MIN player in green, nodes controlled by the MAX player in red. Connecting blue lines for visual aid only. Means over the entire dataset, MAX's moves only, and MIN's moves only as dashed blue, red and green lines, respectively (for the smart policy, all these averages only include data points up to move 48). Note the strong amplitudes at the end of the game for the smart policy. The data points in this region, cut off in this plot to aid readability, reach values of up to −14/+31.

Figure 3.6 shows Q-Q plots of these empirical distributions against Φ for depths d = 0 to 71 from the root. A standard normal would lie on the diagonal in this plot; weaker-tailed distributions are steeper, heavier-tailed ones flatter. The left plot shows results from a uniform roll-out policy (only excluding illegal and trivially bad 'suicide' moves, which are not technically illegal, but would never be played by a human player, and are cheap to test for). Given the limited sample sizes, especially towards the end of the game, the empirical distributions are strikingly similar to Φ(x). Contemporary Monte Carlo search algorithms use 'heavy roll-outs', i.e. policies that produce less random, more informative results. Clearly, the generative part of our model will become more and more invalid the smarter the policy — the limit of a perfect policy would repeatedly generate only a single perfect roll-out, and it is unlikely that this roll-out would conform to the assumptions for randomly generated games made in the generative model. To examine how drastic this effect is, we repeated the above experiment with a smart policy, similar to the published parts of MoGo [Gelly et al., 2006] (Figure 3.6, right).
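The empirical scores entering these plots follow from inverting Equation (3.13): given the empirical win frequency p̂ and mean roll-out length ℓ̄, the latent score is ĝ = √ℓ̄ · Φ⁻¹(p̂). A dependency-free sketch (the inverse cdf is computed by bisection; function names are ours):

```python
import math

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Phi_inv(p, lo=-10.0, hi=10.0, tol=1e-10):
    """Inverse standard normal cdf by bisection (Phi is monotone)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def score_from_rollouts(p_win, mean_length):
    """Invert p(win|g) = Phi(g / sqrt(l)) (Eq. 3.13) for the latent score g."""
    return math.sqrt(mean_length) * Phi_inv(p_win)
```

A win frequency of exactly 1/2 maps to a score of zero regardless of roll-out length, as the symmetry of Equation (3.13) demands.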
The results do develop heavier tails, particularly deep in the tree, towards the end of the simulated game. The reason is that in this late phase, only a few good moves remain, and the roll-outs under the smart policy become very similar to each other. Based on Figure 3.6, one could argue that the Gaussian generative model remains an acceptable approximation. Alternatively, one could of course search for a better generative model for any particular roll-out policy used; example approaches might include heavier-tailed distributions like the Laplace distribution. The analysis so far has only considered the distribution of children relative to their parent nodes, and explicitly excluded the drift of the mean value gi from one
node to the next chosen child. Figure 3.7 shows the relative difference gi+1 − gi between subsequent moves, as a function of the depth in the tree, again for the uniformly random and "smart" roll-out policies. For the random policy, subsequent means have no clear tendency towards one particular direction. An analysis of variance shows not enough evidence for the rejection of the null hypothesis that either player's increments come from the same distribution (p-value 0.19). For the smart policy, a clearer tendency to choose moves favouring the player's desired outcomes is apparent. More importantly, in the final twenty moves of the game, the step sizes become extreme, as it becomes possible for the policy to choose very good moves based on its heuristic alone. This effect is also reflected in the extreme step-shaped cumulative distribution functions visible in Figure 3.6. In this late phase of the game, the generative model presented here arguably becomes invalid under this particular smart roll-out policy. Note, however, that this very deviation from the model also suggests that this part of the game tree has a simpler structure that can in fact be modeled well with heuristics. The more benign tendency to choose moves changing the score towards the direction favoured by the current player, evident in the earlier phase of the game, can be modeled straightforwardly by introducing a player-dependent bias term in the mean of the generative step in Equation (3.2).
3.3.2 Inference on the Generators
A good way to evaluate the quality of Bayesian models is the (log) likelihood they assign to ground truth in known test environments. To do so, we generated 500 artificial trees of b = 2, d = 18 from M. The inference model was implemented as presented in Algorithm 1. A time step corresponds to one descent into the tree, ending with a roll-out at a previously unexplored node. For the descent through previously visited nodes, an optimistic policy was used (see Section 3.2.6), choosing greedily among children i based on µ_vi + 3σ_vi. In addition, the value of the root node's score was assumed to be 0 with high precision. Figure 3.8 shows the log likelihood assigned to the ground truth of both g and v at distances 1, 2 and 3 from the root, as a function of the number of roll-outs performed. As expected, the likelihoods rise on average during learning. They saturate when the majority of the nodes are expanded, because the (binary) roll-out results do not contain sufficient information to determine g, and thus v, to arbitrary precision. Nodes deeper in the tree saturate at smaller likelihoods because they receive information from fewer offspring nodes. They also start out with a smaller likelihood because their priors contain more uncertainty. Note that the message passing causes the beliefs to develop simultaneously at all three depths, even though the nodes at greater depths are not explored until several descents after initialization.
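The evaluation metric here is simply the Gaussian log density of the ground-truth value under the current belief; a one-function sketch (the name is ours):

```python
import math

def gauss_logpdf(x, mu, var):
    """log N(x; mu, var): the log likelihood a Gaussian belief N(mu, var)
    assigns to a ground-truth value x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
```

Beliefs that concentrate near the truth score higher, which is why the curves in Figure 3.8 rise as data accumulates and why the broader priors of deeper nodes start lower.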
Figure 3.8: Log likelihood of the ground truth value of gi (crosses) and vi (triangles) under the beliefs of the model, at varying depth from the root in artificial trees, as a function of number of roll-outs performed. Averages over all nodes at those depths.
3.3.3 Recursive Inductive Inference on Optimal Values
To evaluate the quality of the inductive part of the approximation, 1000 artificial game trees were generated and solved by explicit min/max search as in the preceding section. Figure 3.9 compares empirical ground truth of ∆ with the values predicted by pre-data inference, for all nodes at two different distances ℓ from the leaves. Despite the repeated application of the Gaussian approximation to non-Gaussian beliefs, there is good agreement between predictions and ground truth.
3.3.4 Errors Introduced by Model Mismatch
For the last two experiments, the artificial game trees were generated by the generative model M, and we showed in Section 3.3.1 that the real-world game Go is in fact approximated well by this model. However, other games might be less well approximated by M. As a tentative test of the severity of the errors thus introduced, the Bayesian searcher was tested on 500 generated p-game trees [Kocsis and Szepesvári, 2006] with b = 2, d = 18. These trees are also generated by a latent stochastic variable, but with a different generative model, choosing ξi uniformly from [−1, 0] if the player is MIN and uniformly from [0, 1] if the player is MAX. We performed a best-first search in these trees (results not shown), and compared to the performance on trees for which M is the correct model. The performance
Figure 3.9: Empirical histogram and model predictions (lines) for the optimal future value increment ∆ (see Equation (3.9)), for p(∆ | ℓ = 6, b = 5) (first move for MAX, last move for MIN) and p(∆ | ℓ = 5, b = 5) (first and last move for MIN).
on these two types of trees during the search was very similar (i.e. a corresponding plot looks very much like Figure 3.10), except for a globally slightly higher chance of the model misclassifying nodes into winning and losing nodes. At least in this particular case, the model mismatch causes only a minor loss in performance.
3.3.5 Use as a Standalone Tree Search
It is tempting to interpret the presented inference algorithm as a standalone search method. Experiments on artificial game trees (Figure 3.10) suggest that the result- ing algorithm does not necessarily improve on a vanilla UCT searcher, and that performance depends strongly on the policy used. The intent of the presented algo- rithm is not to develop a good tree searcher, but to provide a consistent posterior from which generalizing evaluation functions can be learned.
3.4 Conclusion
We have presented a generative model for game trees, and derived an approximate message-passing scheme for inference on the optimal value of game positions, given samples from random roll-outs. Inference separates into two tractable deductive and inductive parts, and allows evaluation of the marginal value of any node in the tree in linear time. Similar to the way humans think about games, the computational cost of our algorithm depends more directly on the number of data collected than on the size of the game tree. Like any model, the assumption of Brownian motion is imperfect. Neither does it catch all the details of any one game, nor does it necessarily apply to all games. But it provides a quantitative concept of a smoothly developing game between competing players. Our experimental results suggest that errors introduced by model mismatch are not severe, and that this is a meaningful model for Go. We argue that it lies at a 'sweet spot' between unstructured priors and non-probabilistic, rule-based methods, which can over-fit. Retaining a posterior does not necessarily improve on point-based methods in the search for the optimal next move. Nevertheless, the results presented here are valuable in two ways: First, humans use sophisticated feature-based concepts to play games, not full tree search. Emulating this behavior with a Bayesian algorithm, learning an evaluation function of features of the game position, requires probabilistic beliefs over the optimal values, which our algorithm provides, decoupling the task of designing an evaluation function from that of data acquisition. Secondly, game trees are discrete planning tasks (hierarchical bandit problems). Our results indicate that such problems can exhibit structure (in this case, correlation) even if they might ostensibly look unstructured, offering potential for considerable performance increase.

Figure 3.10: Performance in a tree search task. Comparison between α-β search, standard UCT and a probabilistic tree search algorithm (PTS) based on our model, as a function of the number of evaluated leaf nodes / descents. Averages over a set of 500 artificially generated game trees from the model, with b = 5, d = 4 (larger trees are very challenging for α-β). Left: Probability of the models predicting the correct binary value (win/loss) of the root node of the tree (α-β assigns random values with uniform probability before convergence, and the exact value afterwards; the smooth development of the corresponding curve is an effect of averaging). Note that, due to the optimistic policy, the probabilistic algorithm is not guaranteed to actually explore the entire tree and thus sometimes does not observe enough data to converge, leading to the roughly 1% error rate in the limit of many descents. Right: Probability of the algorithm's policy not choosing the correct optimal next move from the root.
3.5 Addendum: Related Subsequent Work
The work presented in the preceding sections of this chapter was carried out in 2009 and, after an initial unsuccessful submission to NIPS 2009, published at AISTATS 2010 (see citation at the beginning of the chapter). Three months after the publication of our paper, Tesauro et al. [2010] published a paper at UAI 2010 titled Bayesian inference in Monte-Carlo tree search in which they independently arrived at very similar results to ours. Their paper focused on guiding tree search more than on providing good absolute predictions of minimax values, but the algorithm shares many core aspects of our work, including the Gaussian moment matching for inference on minimax values. The main difference between the two works is that Tesauro et al. do not rely on a generative model for the game tree, but rather describe a tree search algorithm closer to the standard form of UCT. They construct beliefs over the generative
score $g_i$ of leaf nodes of the search tree by replacing our inductive step with inference from the binary roll-out results. For this inference, they use a local Beta posterior (moment matched to obtain Gaussian approximations). They also construct an explicit policy for exploration of the tree, using the current beliefs over minimax values of leaf nodes of the search tree. After a node has been visited $N$ times, their policy explores the child node $i$ with minimax belief $p(v_i) = \mathcal{N}(v_i; \mu_i, \sigma_i^2)$ that maximizes the score
$$s_i = \mu_i + \sqrt{2 \log N}\,\sigma_i \tag{3.14}$$
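A minimal sketch of this selection rule follows; the child statistics below are made up for illustration, and Tesauro et al.'s actual algorithm maintains these beliefs via moment matching rather than supplying them by hand.

```python
import math

def select_child(children, N):
    """Pick the child maximizing s_i = mu_i + sqrt(2 log N) * sigma_i.

    `children` is a list of (mu, sigma) pairs -- the Gaussian minimax
    beliefs p(v_i) = N(v_i; mu_i, sigma_i^2) of each child node;
    `N` is the visit count of the parent node.
    """
    bonus = math.sqrt(2.0 * math.log(N))
    scores = [mu + bonus * sigma for mu, sigma in children]
    return max(range(len(children)), key=lambda i: scores[i])
```

Note how a child with a lower mean but larger uncertainty can win the selection, which is exactly the optimism that drives exploration.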
The disadvantage of this approach is that it only allows inference on parts of the tree which have already been incorporated into the search tree. The estimates will also be of larger uncertainty, as they do not incorporate the additional knowledge about the tree’s structure available in our model. Both of these issues are of no concern to Tesauro and colleagues, because their work focuses on guiding the tree search, rather than providing high quality absolute value estimates. Finally, since their scheme does not take the depth of the remaining game tree below a leaf node of the search tree into account, one could also conjecture that it might lead to inefficient exploration if different children of a given node have drastically differing remaining trees below them. However, their algorithm also has some important advantages over ours. Most importantly, it is guaranteed to converge to the exact minimax values as the number of sampled roll-outs approaches infinity, and the authors provide proof-sketches for these guarantees in their paper. The proofs are simple enough to be sketched here in a few lines, and provide some insight into why proving a similar statement is more challenging with regard to our algorithm.
To prove that an algorithm inferring the Bernoulli probabilities $\pi_i$ of the roll-out results at leaf nodes $i$ of the search tree (not the full game tree), and the minimax value under these probabilities for nodes within the search tree, converges to the exact minimax probabilities $\pi_i^*$ for all nodes in the search tree in the limit of large sample numbers, consider the following argument.
• For the leaf nodes: Using any prior distribution assigning nonzero mass to all $0 \le \pi_i \le 1$ (e.g. a Beta distribution) and $n_i$ binary samples with Bernoulli likelihoods, the posterior converges towards the correct $\pi_i^*$ with the optimal convergence rate of $O(1/\sqrt{n_i})$. This is a general characteristic of Bayesian inference.
• Theorem 1 (Tesauro et al.): Ergodic policies lead to convergence. Consider a fixed finite bandit tree with binary reward leaf nodes and priors as in the previous point, and assume that no two sibling nodes have the exact same minimax pay-off rates. Then for any policy which visits every leaf an unbounded number of times, the minimax posteriors of all nodes converge to Dirac δ-distributions at the exact minimax values. Proof: by induction, starting from the point made above for leaf nodes. If a node’s children all collapse to correct δ’s, due to the finite separation between children’s values, the parents’ values also collapse. This also applies under Gaussian approximation, because the approximation errors vanish as the input Gaussians’ variances vanish.
• Theorem 2 (Tesauro et al.): For the policy of Equation (3.14), the algorithm converges to the exact minimax reward probabilities everywhere in the search tree in the limit of $n_i \to \infty$ for all leaves $i$. Proof: Show that policy (3.14) is ergodic. This is straightforward, because for any node $j$ which is never selected over its siblings $i$, the variance $\sigma_j^2$ will remain constant and thus the score $s_j$ will rise without bound. Hence, the policy visits every node an unbounded number of times. This also applies for the Gaussian approximation, because the approximation’s variance is strictly positive if the children’s variances are positive.
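The leaf-node part of this argument is easy to check numerically: under a conjugate Beta prior, the posterior standard deviation shrinks like $O(1/\sqrt{n_i})$. A minimal sketch follows; the Beta(1, 1) prior and the sample counts are illustrative choices, not values from the text.

```python
import math

def beta_posterior(a0, b0, samples):
    """Conjugate Bayesian update: Beta(a0, b0) prior, binary Bernoulli samples.

    Returns the posterior mean and standard deviation of pi_i.
    """
    k = sum(samples)
    a, b = a0 + k, b0 + len(samples) - k
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)

# One hundred times the data gives roughly ten times smaller error bars.
_, s_small = beta_posterior(1, 1, [1] * 7 + [0] * 3)
_, s_large = beta_posterior(1, 1, [1] * 700 + [0] * 300)
```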
By contrast, our algorithm does not infer roll-out reward probabilities, but optimal values from a model. Of course, in applications where the model is wrong, the inferred values can be arbitrarily wrong. If the model is correct, however, we argue that because our algorithm is based on a probabilistic construction, it is exact up to the effects of the Gaussian approximations made in the message passing. The effects of this approximation are subtle (see also Appendix A), and it is thus difficult to make exact statements about its quality (apart from the trivial convergence characteristics mentioned in the proofs above). As in the other chapters of this thesis, this should not be seen as a defect of the approximation, but as a symptom of the complexity of the underlying model: If it were easy to make general structural
statements about the exact beliefs, we would not need the approximations to begin with.

Chapter 4
Approximate Bayesian Psychometrics
The work presented in this chapter is the result of a collaboration with Michal Kosinski, of the Department of Social and Developmental Psychology at the University of Cambridge, who collected the presented data, and David Stern and Thore Graepel, both of Microsoft Research Ltd. Except where the work of others is explicitly cited, all mathematical derivations and algorithmic implementations are the work of the author of this thesis.
Abstract
This chapter applies approximate Bayesian inference methods to a problem from psychometrics. Specifically, we provide a Bayesian methodology for analyzing psychometric questionnaire data in which people answer tens to hundreds of questions on a five-point ordinal scale. Taking the established item response theory as a starting point, we develop a Bayesian model of ordinal user-item responses and propose an inference algorithm, based on Expectation Propagation, to infer marginal distributions on underlying traits, such as “extraversion” or “neuroticism”. We present results on a subset of a very large scale psychometric data set collected on Facebook and illustrate the advantages of the proposed method in terms of predictive accuracy, computational speed, and the availability of error bars in the inferred traits.
4.1 Introduction
Psychometrics, the business of measuring, describing and classifying the human mind, has traditionally relied on the use of statistical methods. The complicated
variations of human behavior, the stochastic noise induced by the often limited sample size, and the unavoidable biases in the selection of experimental subjects make for challenging inference. This might account for the fact that explicitly structured generative models of experimental data have not been applied as widely in psychology as in other fields. This work is an initial attempt at showing how methods from machine learning might be helpful for data description and inference in psychology.

Psychometrics is a subfield of psychology, concerned with the design and evaluation of psychological measurements, such as questionnaires and personality tests. Psychological tests are usually designed based on accepted models of the human mind. That means questions (called items) are chosen specifically to measure an explicit latent quantity, such as “extraversion”, “agreeableness”, “neuroticism”, etc. Because such traits are abstract concepts, they are typically defined implicitly through associated responses to items (such as “Do you see yourself as a careless person?” or “Are you inventive?”). The items themselves are often carefully designed to “load” on a single trait only. That is, ideally, every item should co-vary with only one trait. This focus on formulated traits raises the scientific question of how well these latent dimensions describe the actual data, in particular when compared to other models. However, from the scientists’ point of view, the latent personality traits are not nuisance parameters here, so integrating them out for prediction is not usually the desired path. Whether this focus on ease of interpretation (rather than information content) and binary item loadings is a good design decision by the community (or even is achievable at all) is a question far beyond the scope of the technical work presented here.
For the purpose of this chapter, the modeling assumptions of the test designers will be taken at face value (see Section 4.2 for specific definitions).

Many psychological tests use discrete ordinal answers; for example a range of $N$ choices ranging from “completely disagree” through “indifferent” to “completely agree” (known as Likert scales [Likert, 1932]). From these discrete answers, real-valued latent variables associated with personality traits of the user are inferred (such latent scores are often called the ability of the respondent in the associated trait, even in tests which are not supposed to be judgmental). This discretized setup is not as simple as it might seem at first sight, as the way in which the ordinal responses are interpreted varies both between individual items and between individual respondents. Apart from scaling and location, classical test theory [Novick, 1966] does not take effects specific to items or respondents into account. The more modern item response theory [e.g. Hambleton and Swaminathan, 1990] uses a special case of generalized logistic regression (see Section 4.2), allowing for varying “difficulty” among items, but does not consider variations in the way users interpret the Likert scale. Here, we will construct a probabilistic model for ordinal responses which, in addition to variations between items, also accounts for varying nonlinear scalings of the Likert response levels among respondents. We introduce an approximate inference scheme using Expectation Propagation, which returns an approximate belief over the latent meaning of an answer on a unified scale for all individual respondents. Bayesian methods have so far occupied a niche in the psychometric literature, and are usually confined to Markov chain Monte Carlo techniques [Lord, 1986, Patz and Junker, 1999, Arima, 2006]; such treatments are often accompanied by warnings about the computational cost of these methods.
With the advent of social networking sites, psychometric tests have become popular among internet users, producing large quantities of data: The experiments presented in Section 4.4 were carried out on a data set comprising 10,000 respondents, which is only a subset of a much larger data set collected on Facebook.com (the total number of respondents is approximately 4 million). While these large amounts of available data can potentially lead to improvements in test reliability and design, they also create challenging inference tasks. Approximate Bayesian inference can provide inference results of high quality in acceptable computational time from such large amounts of data, improving on maximum likelihood results by introducing an explicit representation of parameter uncertainty.

As pointed out several times already in this thesis, approximate inference algorithms have the perceived disadvantage of not providing provable performance from a Frequentist point of view. It is thus important to point out that there is no unequivocal “ground truth” in psychometry; so the usual Frequentist framework of proving convergence toward ground truth in the limit of large amounts of data does not seem meaningful. The human mind is too complex to be described by a small number of parameters, so any simple low-dimensional regression model will be patently wrong, whether it be based on point estimates or Bayesian beliefs. In such a situation, insisting on provable convergence towards some ad-hoc population has little value. If anything, such requirements limit the expressiveness of new models by imposing unnecessarily hard requirements on their algebraic structure. We argue that, since the ultimate goal of psychometry is to predict human behaviour, the only fair comparison of descriptive models and associated inference algorithms is given by predictive performance on a test set.
The following Section 4.2 formally introduces the item response problem, and gives a brief introduction to some state-of-the-art methods (a thorough overview of the entire field would be far beyond the scope of this thesis). Section 4.3 extends the classic models to a novel generative probabilistic description of users’ responses to items based on low-dimensional latent traits. Section 4.3.2 derives an approximate inference algorithm based on Expectation Propagation. Section 4.4 gives empirical evidence of the increased expressiveness of this scheme relative to maximum
likelihood estimation in contemporary item response models.
4.2 Item Response Theory
The Likert scale is a ubiquitous format for psychological tests, and indeed many other forms of questionnaires. In its general form, it involves a range of $R$ ordered discrete answers, ranging from strong disagreement to strong agreement with given statements. The machine learning community is accustomed to this form of data from the widely studied Netflix movie ranking data set [Bennett and Lanning, 2007], where the $R = 5$ discrete answers correspond to “stars” rating a user’s satisfaction with a particular movie.

We assume that a given questionnaire consisting of $I$ items has been answered by $U$ respondents (or “users”), providing a data set of $U \times I$ responses $r_{ui} \in \{1,\dots,R\}$. The goal of inference is to assign a latent score (ability) $x_{uc} \in \mathbb{R}$, $c \in [1,\dots,C]$, to $C \le I$ different traits for each user. It is usually assumed that there exists a function $c(i) : [1,\dots,I] \to [1,\dots,C]$ assigning items of the questionnaire to only one trait (as pointed out in Section 4.1, this assumption may be questionable in reality, but it is ubiquitous in the literature, and will be taken for granted for the purpose of this work). Since the responses $r_{ui}$ are ordinal, some relationship between $r_{ui}$ and $x_{uc(i)}$ is required. The definition of this relationship usually involves an intermediate real-valued latent variable $y_{ui}$ which is related to $x_{uc(i)}$ in some straightforward way (e.g. through Gaussian noise), and which is assumed to generate the item-specific response $r_{ui}$.

For example, in the popular Big Five factorization model, the item $i$ = “I am always the center of attention at parties.” is supposed to test for a trait $c$ = “Extraversion”, one of five such traits, and the answer to this item is not assumed to contain any information about the other four traits, labeled “Conscientiousness”, “Openness”, “Agreeableness” and “Neuroticism”. The Big Five model is a longstanding framework of character classification, and was developed over an extended period by a number of authors. Seminal papers include McDougall [1932] and Cattell [1943]. See Digman [1990] and Barrick and Mount [1991] for historical reviews. An important detail about the history of this model is that it is in fact based on statistical analysis, rather than philosophical deduction: College students were asked to describe peers with English adjectives; co-occurring clusters of words were identified by factor analysis and then named. Early models had many more than 5 clusters, and the decision to limit $C = 5$ seems to be based on analytical convenience as much as on statistical evidence.

A problem with the Likert scale is that users’ responses do not only depend on the latent trait itself, but also on personal interpretations of both the Likert scale’s and the items’ meaning. Figure 4.1 shows empirical histograms of the marginal
Figure 4.1: Marginal frequencies of reply categories in the dataset studied in Section 4.4. Connecting lines for visual aid only. Left: Marginal frequency of replies from users 1, 2, and 3 from the dataset. Right: Marginal frequencies for items 1, 2, and 3 in the dataset. Note the strong variations in the frequencies between users, and that none of the histograms follows a bell curve.

frequencies of the Likert answers 1 to 5 for three users to 100 items (Figure 4.1 left) and for three items (right) from 5000 users. It is immediately evident that the users make very varied use of the responses, in a nonlinear way: Some users prefer the “conservative” statements 2 and 4, others use response 5 much more often than any other option, others again do not use response 1 at all. A similar behaviour emerges in the marginals on the items.
4.2.1 Contemporary Item Response Models
A full overview of the psychometric literature is beyond the scope of this work. For the purpose of the model presented here, the relevant conceptual development in that literature is a trend toward more expressive and flexible descriptions, first through an explicit model for the Likert response scale, then through the addition of parameters specific to item and response. We will here review three important models in this development. Our contribution is then the addition of parameters specific to every respondent and response.

Classic test theory [Lord, 1959, Novick, 1966] presupposes some means of turning respondent $u$’s response $r_{ui}$ to item $i$ into an empirical score $y_{ui} \in \mathbb{R}$. This mapping is assumed to be provided by the test’s designer. The task left for the statistician is then to infer a so-called true score $\xi_{ui}$ for further analysis. For example, a frequent
assumption is a linear relationship between true score and empirical score through
$$\xi_{ui} = \bar{y}_i + n_i (y_{ui} - \bar{y}_i), \tag{4.1}$$
where $\bar{y}_i$ is the average empirical score of all respondents, and $n_i$ is the reliability of item $i$, i.e. a measure of variance. Since the mapping from responses to scores is left to the test designer, it might be tempting to simply use a linear or logistic scale for Likert-type responses, mapping directly from a discrete ordered response to a real-valued score. However, as Figure 4.1 shows, the empirical answer frequencies exhibit considerable structure. It thus seems desirable to learn some of this structure directly from the data.

An important development in this direction is the Rasch model [Rasch, 1960], and a generalized version of it known as the generalized partial credit model [Muraki, 1992] (GPCM). These models are instances of logistic regression. Each item $i$ is assigned a difficulty $d_i$, and each response type $r = 1,\dots,R$ is represented by an item-specific threshold $t_{ri}$. In the GPCM, the probability of person $u$ answering $r_{ui}$ to item $i$ is given by a logistic distribution parameterized by the latent value $y_{ui}$:
$$p(r_{ui} = 2,\dots,R \mid y_{ui}) = \frac{\exp\left(\sum_{\tilde{r}=1}^{r_{ui}} d_i (y_{ui} - t_{\tilde{r}i})\right)}{1 + \sum_{m=1}^{R} \exp\left(\sum_{\tilde{r}=1}^{m} d_i (y_{ui} - t_{\tilde{r}i})\right)} \quad\text{and}\quad p(r_{ui} = 1 \mid y_{ui}) = \frac{1}{1 + \sum_{m=1}^{R} \exp\left(\sum_{\tilde{r}=1}^{m} d_i (y_{ui} - t_{\tilde{r}i})\right)} \tag{4.2}$$
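For concreteness, the GPCM response probabilities can be sketched in a few lines. The weights below follow the numerators of Equation (4.2), with category 1 given the baseline weight 1; for simplicity the sketch normalizes by the explicit sum of the category weights, and the numerical inputs are illustrative.

```python
import math

def gpcm_probs(y, d, t):
    """GPCM-style category probabilities for one respondent-item pair.

    y : latent value y_ui of the respondent on this item
    d : item difficulty d_i
    t : list of thresholds [t_1i, ..., t_Ri] (R response categories)

    Category 1 gets baseline weight 1; category r >= 2 gets weight
    exp(sum_{s=1}^{r} d * (y - t_si)), as in the numerator of Eq. (4.2).
    """
    R = len(t)
    weights = [1.0]
    for r in range(2, R + 1):
        weights.append(math.exp(sum(d * (y - t[s]) for s in range(r))))
    Z = sum(weights)
    return [w / Z for w in weights]
```

With all thresholds equal and $y$ at the threshold level, all categories are equally likely; raising $y$ shifts probability mass towards higher categories.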
The Rasch model is the restricted case where $d_i \equiv 1$ for all items $i$. The parameters are usually set to maximum likelihood (ML) values. While these models can account for variations in the distribution of responses over items, they evidently cannot take into account variations between individual users.
4.3 A Generative Model
This section will construct a probabilistic model for ordinal answers. A compact directed graphical model is shown in Figure 4.2. Instead of the logistic link function used in the GPCM, we will use a multivariate probit (i.e. cumulative Gaussian) link function, because it has algebraic advantages when used with the Expectation Propagation approximation. We will pair this link function with linear regression on a combined model of both item-specific and user-specific thresholds.
Quite similarly to the GPCM, we will use a set of $L = R - 1$ thresholds $h_{ui} = (h_{ui}^{\ell})_{\ell=1,\dots,L}$ to describe the generative process of the responses $r_{ui} = 1,\dots,L+1$. In contrast, however, we will assume that these thresholds are also a function of the respondent, not only of the item. Specifically, we assume there is a set of $L$
Figure 4.2: Directed graphical model for ordinal regression on user and item thresholds and user traits. Priors and intermediate variables are left out in this representation for clarity. See Figure 4.3 for a more explicit factor graph.
thresholds $b_u$ for each user and similarly a set of item-specific thresholds $b_i$ for item $i$, such that the overall threshold is given by
$$h_{ui} = b_u + b_i. \tag{4.3}$$
These thresholds are latent variables, and we construct a prior over their values by combining a general Gaussian prior with the explicit requirement that they be ordered:
$$p(b_u) \propto \mathcal{N}(b_u; \mu_u, \Sigma_u) \prod_{\ell=2}^{L} \theta(b_u^{\ell} - b_u^{\ell-1}), \tag{4.4}$$
where $\theta$, here and throughout the rest of the chapter, is Heaviside’s step function, which equals 1 for positive arguments and 0 elsewhere. The normalization constant of this prior is nontrivial, but the approximate algorithm derived in Section 4.3.2 will construct an approximate Gaussian belief, which is easily normalized.
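For illustration, a sample from a prior of this form can be obtained by rejection: draw from the Gaussian and keep only ordered draws. The sketch below uses independent components (a diagonal stand-in for the full $\Sigma_u$ of Equation (4.4)); the means and scales passed in are illustrative.

```python
import random

def sample_ordered_thresholds(mu, sigma, rng=random):
    """Rejection sampling from a prior of the form (4.4):
    draw b ~ N(mu, diag(sigma^2)) and accept only if the
    thresholds come out in strictly increasing order."""
    while True:
        b = [rng.gauss(m, s) for m, s in zip(mu, sigma)]
        if all(b[l] > b[l - 1] for l in range(1, len(b))):
            return b
```

Rejection becomes inefficient when the Gaussian puts little mass on the ordered region, which is one reason the chapter instead builds a Gaussian EP approximation to this prior.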
As in the previous section, the latent variable $x_{uc} \in \mathbb{R}$, $c \in [1,\dots,C]$, represents the respondent’s traits. It generates the latent opinion / ability $y_{ui}$ on item $i$ through the addition of Gaussian noise with variance $\tau^2$:
$$p(y_{ui} \mid x_{uc}) = \mathcal{N}(y_{ui}; x_{uc}, \tau^2). \tag{4.5}$$
Similarly, we also assume that the actual threshold in the user-item combination $ui$ is a noisy version $\tilde{h}_{ui}$ of $h_{ui}$:
$$\tilde{h}_{ui} \sim \mathcal{N}(h_{ui}, \beta^2 I_L), \tag{4.6}$$
where $I_L$ represents the $L$-dimensional identity matrix. The respondent’s discrete answer $r_{ui}$ represents the statement that $y_{ui}$ lies between $\tilde{h}_{ui}^{r-1}$ and $\tilde{h}_{ui}^{r}$, or above or below all thresholds for $r_{ui} = R = L + 1$ and $r_{ui} = 1$, respectively:
$$p(r_{ui} \mid \tilde{h}_{ui}, y_{ui}) = \left[\prod_{\ell=1}^{r-1} \theta(y_{ui} - \tilde{h}_{ui}^{\ell})\right] \left[\prod_{\ell=r}^{L} \theta(\tilde{h}_{ui}^{\ell} - y_{ui})\right] \tag{4.7}$$
4.3.1 Expressiveness of the Model
By defining a generative model and a prior for its parameters, we have implicitly defined a prior distribution over the responses. It is clear that this prior does not put finite mass on all possible response processes. For example, the additive relationship in Equation (4.3) allows either of the thresholds to “veto” the other: There is no way for the model to encode a situation in which a specific respondent never uses, say, reply number 3 (implying that $b_u^2 = b_u^3$), because the item thresholds (assuming at least some of the other respondents do use reply 3) will enforce a finite width of this region. So, even though the model is more expressive than a model which does not take respondent-specific effects into account at all, it is not fully general. Other simple models one might consider instead of the additive interaction in Equation (4.3) have defects of their own: For example, a multiplicative interaction exhibits a vetoing effect similar to that described above. We thus consider the model defined above, while far from perfect, still a considerable improvement over existing models.

It is also clear that a further generalization, to a situation in which every individual respondent-item pair has its own thresholds, is challenging because it would remove all identifiability. The degree to which responses were described by the latent trait over the latent free parameters for the thresholds would become critically dependent on the priors over these different parameters, and great care would have to be taken to analyze and find good priors for this situation.
4.3.2 Approximate Inference
Exact inference in the probabilistic model constructed in this section is intractable in two ways: First, the prior (4.4) on the thresholds $b_u$ and $b_i$ has no closed form, because it corresponds to a multivariate Gaussian distribution constrained to a sub-space. Similarly, the posterior on $(h_{ui}, y_{ui})$ given the response $r_{ui}$, from the likelihood (4.7), is intractable, as it too is a restriction of a Gaussian. This is the general form of the probit problem [Ashford and Sowden, 1970], which is known to be intractable [Huguenin et al., 2009].

To circumvent this problem, we will use Expectation Propagation [Minka, 2001] (Section 2.3.2) to construct approximate Gaussian posteriors. Even using this approach, inference remains potentially challenging, because the plates in Figure 4.2
Figure 4.3: Factor graph for the user-item response model. To reduce clutter, the fixed noise parameters β and τ have not been drawn as observed variables. Their influence is indicated by labels underneath the corresponding factors. The factors marked O and A represent structured subgraphs. The factor O enforces ordinality among thresholds, the factor A encodes the precise meaning of the answer r. To improve readability, these sub-graphs are shown separately in Figure 4.4.
introduce correlations between different respondent and item thresholds, so that we would in principle have to track a joint approximate Gaussian posterior over all variables in the model. For large datasets, such as the one studied in Section 4.4, with many thousands of users, memory requirements alone make this approach impractical. Instead, we will make an approximation to the posterior in which the beliefs on the variables drawn as individual nodes in Figure 4.2 are independent. That is, we will only retain covariance terms between the elements of individual thresholds $b_u$ and $b_i$. Figure 4.3 shows a factor graph from which a message passing algorithm will be derived in the following sections. The subgraphs for the ordinal factors have been replaced with placeholders, which are shown in detail in Figure 4.4.

Note that the factor graph contains a large number of loops. In such situations, Expectation Propagation is not guaranteed to converge, but is empirically known to often perform well anyway [Minka, 2001], similar to other loopy inference methods [Frey and MacKay, 1998]. Inference is performed in an iterative way, cycling through the data set several times. At each iteration, the message previously incorporated in the marginal from this data point is “divided out” of the marginal to avoid double counting (see Section 2.3.2).
Figure 4.4: Factor graphs for the ordinal (left) and response-assignment (right) inference, denoted by O and A in Figure 4.3, respectively. The ordinal subgraph O contains the step factors $\theta(b^i - b^{i-1})$ for $i = 2,\dots,B$; the response-assignment subgraph A acts on $v_{ui} \equiv (h_{ui}, y_{ui})$ and contains the factors $f^k(v_{ui}, r_{ui})$ for $k = 1,\dots,B$, with
$$f^k = \begin{cases} \theta(h_{ui}^k - y_{ui}) & \text{if } r_{ui} \le k \\ \theta(y_{ui} - h_{ui}^k) & \text{else} \end{cases}$$
Ensuring Proper Ordering of the Boundaries
We begin by constructing approximate Gaussian marginals on the boundaries $b_u$ (the treatment for $b_i$ is entirely analogous). From Figure 4.4, an approximate Gaussian marginal on $b_u$ can be constructed from a set of Gaussian EP messages representing the effect of the step functions $\theta(b_u^{\ell} - b_u^{\ell-1})$. This step factor has been used for similar purposes before in Herbrich et al. [2007] and in Stern et al. [2009]. A technical report by Minka [2008] describes a particularly efficient way of updating the corresponding EP message, based on the observation that the integral
$$Z_\ell = \int \theta(\pi_\ell^{\top} b_u)\, \mathcal{N}(b_u; \mu_u, \Sigma_u)\, \mathrm{d}b_u \tag{4.8}$$
with a vector $(\pi_\ell)_k = \delta_{k\ell} - \delta_{k(\ell-1)}$ (using Kronecker’s $\delta$) depends on $\mu_u$ and $\Sigma_u$ only through rank-1 derivatives
$$\nabla_\mu \log Z_\ell = \alpha_\ell \pi_\ell \qquad\qquad \nabla_\Sigma \log Z_\ell = \tfrac{1}{2}(\alpha_\ell^2 - \beta_\ell)\, \pi_\ell \pi_\ell^{\top} \tag{4.9}$$
with some scalar terms $\alpha_\ell$ and $\beta_\ell$. Detailed derivations are reproduced in Appendix B for completeness. This allows the individual updates to be performed with computational cost $O(L^2)$, rather than the $O(L^3)$ a naïve implementation would require. For the purpose of the following derivations, it suffices to know that there exists an algorithm constructing a joint approximate Gaussian marginal with dense covariance matrix $\hat{\Sigma}_u$ on $b_u$, which approximates the second moments of the distribution (4.4) and, in later stages of the algorithm, of the posterior marginal on $b_u$ (and analogously for $b_i$).
Factorized Regression
We perform straightforward Gaussian belief propagation to construct an intermediate marginal on $\tilde{h}_{ui}$ (i.e. a message from this variable to the factor labeled “join” in Figure 4.3): Because the approximate marginals on the thresholds are Gaussian, $\mathcal{N}(b_u; \hat{\mu}_u, \hat{\Sigma}_u)$ and $\mathcal{N}(b_i; \hat{\mu}_i, \hat{\Sigma}_i)$, the message can be evaluated analytically to
$$m(\tilde{h}_{ui}) = \int \mathcal{N}(\tilde{h}_{ui}; b_u + b_i, \beta^2 I_L)\, \mathcal{N}(b_u; \hat{\mu}_u, \hat{\Sigma}_u)\, \mathcal{N}(b_i; \hat{\mu}_i, \hat{\Sigma}_i)\, \mathrm{d}b_i\, \mathrm{d}b_u \tag{4.10}$$
$$= \mathcal{N}(\tilde{h}_{ui}; \hat{\mu}_u + \hat{\mu}_i, \hat{\Sigma}_u + \hat{\Sigma}_i + \beta^2 I_L) \equiv \mathcal{N}(\tilde{h}_{ui}; \mu_{h_{ui}}, \Sigma_{h_{ui}})$$
Similarly, the message from $y_{ui}$ in this direction is $\mathcal{N}(y_{ui}; \mu_{x_{uc}}, \sigma_{x_{uc}}^2 + \tau^2)$, using the mean and variance of the marginal on the trait $x_{uc}$, where $c = c(i)$ is the trait responsible for this item. From these two marginals, we construct a marginal on their joint $v_{ui} \equiv (\tilde{h}_{ui}, y_{ui})$ in the trivial way, by stacking them:
$$p(v_{ui}) = \mathcal{N}\left(v_{ui}; \begin{pmatrix} \mu_{h_{ui}} \\ \mu_{y_{ui}} \end{pmatrix}, \begin{pmatrix} \Sigma_{h_{ui}} & 0 \\ 0 & \sigma_{y_{ui}}^2 \end{pmatrix}\right) \tag{4.11}$$
Since the conditional (4.7) of the response is again a product of step functions, EP inference on the marginal of $v_{ui}$ can be performed in a similar way to the ordinal inference in Section 4.3.2, establishing a marginal on $v_{ui}$ with a dense covariance matrix $\Sigma_{ui}$. Now that there is a set of messages from $r_{ui}$ towards $v_{ui}$ in Figure 4.3, the direction of message passing can be reversed: The messages from $v_{ui}$ to $\tilde{h}_{ui}$ and $y_{ui}$ can be constructed by simply separating the elements of the mean and variance of the marginal of $v_{ui}$, because for Gaussians in general (see also Equation A.13)
$$p(x, y) = \mathcal{N}\left(\begin{pmatrix} x \\ y \end{pmatrix}; \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{xy}^{\top} & \Sigma_{yy} \end{pmatrix}\right) \implies p(x) = \mathcal{N}(x; \mu_x, \Sigma_{xx}) \tag{4.12}$$
The new message to $x_{uc}$ is constructed in an analogous way to the presentation in Equation (4.10), adding Gaussian noise of variance $\tau^2$. The factorized messages to the thresholds $b_u$ and $b_i$ are found by re-arranging the conditional in Equation (4.10),
$$\mathcal{N}(\tilde{h}_{ui}; b_u + b_i, \beta^2 I_L) = \mathcal{N}(b_u; \tilde{h}_{ui} - b_i, \beta^2 I_L) = \mathcal{N}(b_i; \tilde{h}_{ui} - b_u, \beta^2 I_L), \tag{4.13}$$
and marginalizing once more, as in Equation (4.10). This completes the derivation of all approximate Gaussian messages necessary for Expectation Propagation in this model.
Algorithmic Implementation
For a large dataset, the number of messages passed in the scheme laid out in the previous sections is considerable, and the message-passing schedule can have a marked effect on the speed of convergence and numerical stability. In our implementation, the algorithm begins with a small loop over the ordinality constraints. Once this loop has converged, the algorithm iterates over all pairs $(u, i)$ of user $u$ responding to item $i$ with answer $r_{ui}$, stopping at regular intervals to update the ordinal messages, and loops over the data set in this fashion several times until the messages have converged. Convergence can be measured in several ways. In our implementation, we track the maximal change to the logarithm of the determinant of the precision matrix of any message in the algorithm. We also damp the messages slightly, as described in Section 2.3.2, which improves performance slightly. The algorithm typically converges within less than 10 iterations, even for large datasets.
Free Parameters and Identifiability
The approximate probabilistic algorithm has two obvious free parameters, $\beta$ and $\tau$. To fix these in our experiments, the evidence of the training set was evaluated for a small number of settings and maximized, leading to the values $\tau = 3.0$ and $\beta = 0.2$. The parameters of the priors jointly define the scale and location of the latent space, which is arbitrary. If the priors’ means are set to zero, there is only one further meaningful degree of freedom, which is the relative width of the prior on $x_{uc}$ to the effective prior on $h_{ui}$ defined through the priors on $b_u$ and $b_i$. Similarly to the treatment of the other free parameters, a number of settings were tested based on the assigned evidence, and the priors were chosen to be $p(b_i) = p(b_u) = \mathcal{N}(b_{u/i}; 0, I_4)$, where $I_4$ is the four-dimensional unit matrix, and $p(x_{uc}) = \mathcal{N}(x_{uc}; 0, 1)\ \forall c$. So the overall number of free parameters to choose by maximum evidence is three, which should be compared to the $U + I(L+1)$ free parameters fitted by maximum likelihood in the GPCM.
In the literature on generalized regression, it is often pointed out that models such as the one presented here are not fully "identified", because under the transformation \(\tilde h_{ui} \to \tilde h_{ui} + k\mathbf{1}\) and \(x_{ui} \to x_{ui} + k\), i.e. under addition of a constant to all involved quantities, the model produces the same predictions. In the GPCM, the different structure of the last line of Equation (4.2) is a solution to this issue, enforcing identifiability. However, since the algorithm presented here retains beliefs rather than point estimates, producing predictions by integrating out the latent beliefs, this degree of freedom is not a problem, as long as the latent distribution is proper. The simple independent Gaussian prior mentioned above suffices for this task. There is never a need to identify any specific solution for the latent parameters.
[Figure: two 5,000-user × 50-item blocks, labelled training set and test set.]
Figure 4.5: Sketch of the separation of the data set into training and test sets.
4.4 Results
We evaluated the methods presented in the previous sections on a data set provided courtesy of the MyPersonality.org application¹, which allows users of the Facebook social networking system to complete personality tests. The entire dataset comprises approximately 4 million users, but to limit computational cost, it was restricted to a subset of 10,000 users. The test taken is a version of the "Big Five" personality test introduced in Section 4.2. The version of the test used here has 100 items, answered on a one-out-of-five Likert scale. As discussed before, each item is associated, in theory, with one and only one of five traits called "openness", "conscientiousness", "extraversion", "agreeableness" and "neuroticism". Because all algorithms require training data for both users and items, the data set was split into training and test set as follows (Figure 4.5): A first set of 5,000 users was used to train item characteristics (\(b_i\)). The second half (5,000 users) was split in half by removing every other question, with one half providing another training set for user parameters (\(b_u\)), and the second half being the test set. For the Rasch and GPCM models, the latent trait is the only variable conveying information about the user; for our approximate scheme we used the users' inferred traits as well as the inferred user boundaries.
4.4.1 Computational cost
The GPCM and Rasch models were trained with the publicly available ltm package² for the R programming language. This package returns a maximum likelihood fit for the I(R + 1) parameters of the GPCM model (Equation (4.2)), and the IR parameters of the Rasch model (Equation (4.2) with β = 1). It also returns maximum a-posteriori Gaussian approximations for the latent traits \(x_{uc}\) of users. The EP algorithm was implemented in the .NET language F#. Because of the incomparable platforms used, a precise timing comparison is not meaningful, but the overall computation times were roughly comparable (the GPCM took 5 hours to conclude, the EP implementation about 2 hours), even though the probabilistic algorithm infers about 100 times more latent parameters.

¹http://mypersonality.org/research/interested-in-collaborating/
²http://rwiki.sciviews.org/doku.php?id=packages:cran:ltm
4.4.2 Approximate Bayesian Estimates
Figure 4.6 shows beliefs over the values of \(b_u\) and \(b_i\) for the same users and items as shown in Figure 4.1. Note that the scale for user 8, who rarely uses response r = 3, is "steeper", with the boundaries \(b^2_{u=8}\) and \(b^3_{u=8}\) moved closer together. The opposite effect can be seen for user 2, who prefers response 3. User 15 barely uses the lowest response r = 1, leading to the boundary \(b^1_{u=15}\) moving further down, and the uncertainty on its value to increase. Similar effects can be seen for the items, shown on the right. Since every item takes part in a much larger number of user-item pairs (5000 pairs for items in the test set) than the individual users (50 pairs for each user in the test set), the beliefs on the item thresholds are much more precise than those on the users. The bottom half of Figure 4.6 shows the resulting belief on the threshold \(\tilde h_{ui}\) for one specific item i = 2 and the same three users as above. Note that, despite the relatively high precision of the user and item marginals, the noise term β still produces a relatively broad distribution. It is important to note that, while all the figures show the beliefs on threshold values as four individual distributions, these are marginals of a joint distribution. One can imagine samples from this distribution to be threshold values taking small or large values in co-ordination (Figure 4.7).
4.4.3 Predictive Performance
Table 4.1 reports the log probability assigned by the different methods to the response chosen by the user. The probabilistic algorithm can make use of its more expressive model to predict users' answers with higher accuracy. The table also contains predictive probabilities for the approximate inference model with \(b_i\) and \(b_u\) fixed to 0, respectively. In the former setting, which is very similar to the GPCM setup, the probabilistic algorithm produces approximately the same predictive performance as GPCM, while using only user-specific thresholds yields a less expressive model. This is not surprising, as this setup effectively creates independent datasets for every user, each consisting of 50 data points used to predict 50 other data points.
[Figure 4.6 panels: marginal threshold beliefs \(b^1_u \dots b^4_u\) for users \(u_{\text{test}} = 2, 8, 15\) (left, latent scale) and \(b^1_i \dots b^4_i\) for items \(i_{\text{test}} = 1, 2, 3\) (right, latent scale); bottom panel: combined thresholds \(\tilde h^1_{ui} \dots \tilde h^4_{ui}\) for \(\tilde h_{2,2}\), \(\tilde h_{8,2}\), \(\tilde h_{15,2}\) in the latent space, arbitrary scale.]
Figure 4.6: Thresholds learned by the approximate inference algorithm, for the users and items introduced in Figure 4.1. Top left: Marginal beliefs for the first three users. Means denoted by a marker, "error bars" of two standard deviations width to each side in the same colour as the marker (the error bars have been slightly off-set from the means to avoid overlap). The vertical position of each marginal is indicative of its ordinal position, from threshold 1 at the bottom to threshold 4 at the top. The connecting lines are for visual aid only. To create a smooth connection, they were generated by evaluating the function \(f(x) = \sum_\ell \Phi(x; \mu_{b^\ell_{ui}}, \sigma^2_{b^\ell_{ui}})\), where \(\Phi(x; \mu, \sigma^2)\) is the cumulative density function of the one-dimensional Gaussian with mean µ and variance σ². Nonlinear functions of analogous form are sometimes used to represent thresholded models in the literature on GPCM; note, however, that this distribution is not connected in an obvious way to the actual predictive probability of replies \(r_{ui}\). Top right: Similar plot for the three items shown in Figure 4.1. Bottom: A similar plot for the threshold \(\tilde h_{ui}\) resulting for the combination of user u = 8 and item i = 2.
[Figure 4.7 panels: 20 samples each (sample index vs. value); left over \(b_u\), right over \(\tilde h_{ui}\) and \(y_{ui}\).]
Figure 4.7: 20 sampled threshold values, for the item-respondent combination u = 8, i = 2 (see also Figure 4.6). Left: sampled respondent-specific thresholds \(b_{u=8}\). Samples as circles, connecting lines for visual aid only (rows of samples were generated independently, their order is arbitrary). Marginal means and standard deviations as thick and dashed lines, respectively. The order of the threshold elements is from 1 (left) to 4 (right). Note the ordering among the thresholds, and that the approximate character of the inference means that minor variations of the ordering are not entirely impossible. Right: Similar plot for the combined thresholds \(\tilde h_{u=8,i=2}\) (coloured circles) and the predicted opinion \(y_{ui}\) (black squares). Due to the independent regression noise τ, these thresholds do mix sometimes, which is an explicit part of the model, not an artifact of approximation.
[Figure 4.8 panels: items 1–3, each showing threshold marginals \(\tilde h^\ell_{ui}\), the belief \(p(y_{ui})\), and GPCM vs. EP prediction bars; horizontal axis: latent ability, arbitrary scale.]
Figure 4.8: Predictions for the responses of user 8 (see Figure 4.1) to items 1 (top), 2 (middle), and 3 (bottom). All these items are in the test set, so the algorithms have not observed this combination of item and user before. Solid green distributions: Marginals for the locations of the four latent thresholds \(\tilde h_{ui}\) (note that these are marginals of the elements of a highly correlated joint distribution). Dashed green distributions: Marginals on this user's latent ability \(y_{ui}\). Blue bars (in front): Predictions of the approximate inference scheme on user-item pair regression for the five possible answers. Red bars (in background): Predictions of the GPCM. The answer actually chosen by the user is highlighted in a stronger colour. The location of the bars is essentially arbitrary, as the replies are elements of a different space than the thresholds and abilities, but has been chosen suggestively: The left-most bar, to the left of the first threshold, corresponds to reply 1, and so on up to reply 5, to the right of the highest threshold.
Method M                        (1/N_test)·log p(D_test | M)    p(D_test | M)^(1/N_test)
Chance (for comparison)                  −1.609                          0.200
Rasch Model (ML)                         −1.362                          0.256
GPCM (ML)                                −1.299                          0.273
Ordinal Model, items only                −1.306                          0.271
Ordinal Model, users only                −1.385                          0.250
Ordinal Model, users & items             −1.257                          0.285
Table 4.1: Average log and direct probabilities assigned, by the different models, to the response actually chosen by the user, on the test set of 5,000 users, on 50 items (see Figure 4.5). Larger values are better. "Items only" denotes a model modeling only item-specific thresholds; analogously for "users only".
Figure 4.8 gives an intuition of the prediction process. The marginals of the thresholds are shown as Gaussian distributions, the marginal on the latent ability \(y_{ui}\) as a broader, dashed Gaussian distribution. The predictions of both the GPCM and the probabilistic algorithm are shown as bars, with the actual response (which was not observed by the algorithms) highlighted. Note that these predictions are a nontrivial function of the depicted Gaussian beliefs and cannot simply be read off from the plot. The figure shows the beliefs for the combination of user 8 with the first three items in the test set, as also shown in Figure 4.6. The GPCM tends to make broader, less certain predictions than the more expressive probabilistic model. In the case of item 1, the GPCM happens to put larger mass on the correct answer; in the other two cases, the probabilistic algorithm predicts the right answer with higher probability. As Table 4.1 shows, overall the probabilistic algorithm is more successful in predicting the users' responses.
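To make the prediction step concrete, the following Monte Carlo sketch integrates the ordinal likelihood over Gaussian beliefs. For simplicity it treats the threshold marginals as independent (the actual beliefs in this chapter are correlated), and all numerical values are illustrative rather than fitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def predictive_probs(mu_h, sd_h, mu_y, sd_y, n_samples=200_000):
    """Monte Carlo predictive distribution over the five ordinal responses.

    mu_h/sd_h: marginal means and standard deviations of the four
    thresholds; mu_y/sd_y: belief on the latent opinion. The sampled
    response is 1 + (number of sampled thresholds below the sampled
    opinion). Independent marginals are an illustrative simplification.
    """
    h = rng.normal(mu_h, sd_h, size=(n_samples, len(mu_h)))
    y = rng.normal(mu_y, sd_y, size=(n_samples, 1))
    r = 1 + (h < y).sum(axis=1)
    return np.bincount(r, minlength=len(mu_h) + 2)[1:] / n_samples

p = predictive_probs(np.array([-1.5, -0.5, 0.5, 1.5]),
                     np.full(4, 0.4), mu_y=0.8, sd_y=1.0)
```

Adding the same constant to all threshold means and to the opinion mean leaves this predictive distribution unchanged; this is the translation invariance noted in the discussion of identifiability above.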
4.5 Discussion
The model and accompanying inference algorithms presented in this chapter are intended mainly as technical demonstrations. They obviously do not constitute psychological research. We have presented and evaluated them using the framework and evaluation criteria of the machine learning community. It should be pointed out that the psychometric literature differs from this point of view in some important aspects. Most importantly, in psychological contexts, inferring and then studying the latent dimensions of the data is the main aim of the research process. Many psychologists would thus find it unacceptable to retain a finite-width Bayesian belief over the possible values of these dimensions, even though, from a statistician's perspective, this would be a good course of action if the ultimate goal of the research were to predict human behavior. The need for scientific as well as public communication of the research results has so far discouraged reporting a set of beliefs as final research outcomes. This also means that testing a particular probabilistic model solely on its predictive power on held-out test data might not be sufficient for applications in psychology, and that interpretability of the inferred latent structure is just as important. On the other hand, it should be pointed out that, on a fundamental level, neither the models in general use today nor the probabilistic algorithm presented here can claim to be a faithful representation of the complicated generative process for item responses. Such a model, if it exists, would have to take into account both the semantic structure of the items and the complicated thought process the human respondent uses to arrive at an answer. In contrast to the GPCM, the algorithm presented here at least provides some crude treatment of the latter aspect.
If modern inference methods are to make a significant contribution to the development of psychological analysis, it will be necessary to find solutions to these conceptual issues. The aim of this work is, more humbly, to communicate to the machine learning community that psychometric datasets provide interesting and challenging applications for machine learning methods. Thanks to social networking, such datasets are reaching previously unseen scales, making efficient data modeling techniques crucial.
4.6 Conclusion
Psychometry is a well-established discipline within psychology, with considerable mathematical underpinnings in classical statistics, developed through a large body of literature. The recent advent of social networking has created a surge in test results that has the potential to raise the fidelity and reliability of psychometric tests. At the same time, the ongoing fast pace of machine learning research has generated new tools for efficient and robust approximate inference, which allow inference in larger models of more complex structure. As a demonstration of the usefulness of contemporary machine learning methods for psychological modeling, we have presented an approximate inference scheme, based on a combination and extension of several recently developed methods and models, which allows fully probabilistic inference from ordinal discrete item responses on latent structural descriptions of human behavior.

Chapter 5
Fast, Online Inference for Conditional Topic Models
The work presented in this chapter is the result of a collaboration with Thore Graepel, Ralf Herbrich and David Stern, all of Microsoft Research Ltd. Except where the work of others is explicitly cited, all mathematical derivations, algorithmic implementations and experimental evaluations were performed by the author of this thesis. The author thanks Thomas Borchert for providing access to a different method as a benchmark during development, as well as David Knowles and Simon Lacoste-Julien for helpful comments. A shorter version of this chapter is under review by the International World Wide Web Conference 2011.
Abstract
Topic models provide a means of describing semantic similarity between documents in a multidimensional latent space, based on similarity of word frequencies. Unfortunately, virtually all contemporary inference algorithms for topic models require multiple passes over the document corpus, making them ill-suited for use on the large-scale corpora typical of the web. Moreover, modern online document repositories regularly provide document features other than the words themselves, which can provide helpful semantic information but are not part of standard topic models. These issues have so far confined the use of topic models to small and specific corpora. Using a series of approximations that have proven helpful in other areas of machine learning, we construct an efficient approximate inference scheme for topic models conditional on arbitrary features of the document. We also study the viability of online (single pass) inference in such models, and show experimental
results from large online document corpora.
5.1 Introduction
Topic models are probabilistic descriptions of corpora of text documents as "bags of words" (i.e. ignoring word order), generated from a mixture of discrete distributions over words. Each document in a collection is considered to be generated by a mixture of topics, such that each word in the document is generated by first drawing a particular topic, then drawing the word from that topic. One way to interpret these models is as a form of dimensionality reduction: Instead of having to explain every word in every document individually, the model describes the documents' words in terms of a lower-dimensional space of topics (see sketch in Figure 5.1, top). Such discrete mixture models are of course more generally applicable than only to texts. But texts are currently the most important application, and the metaphors of documents, topics and texts are useful for visualization. This chapter will thus stick to this nomenclature, and all experiments will be carried out on collections of actual text documents.

A wide array of generative models for text (with a view towards indexing) have been studied in the past, starting from ad-hoc methods such as the tf-idf heuristic still popular in information retrieval [Salton and McGill, 1983], to methods such as latent semantic indexing [Deerwester et al., 1990], which is based on a maximum likelihood estimate of a singular value decomposition. Seminal steps towards models retaining probabilistic beliefs for this task were the development of a probabilistic model for latent semantic indexing [Hofmann, 1999], and the introduction of latent Dirichlet allocation [Blei et al., 2003] (in the compression and language modelling communities, whose goals differ from the indexing setting discussed here, probabilistic language models based on latent Dirichlet variables had been studied much earlier than that; see e.g. MacKay and Bauman-Peto [1995]). Since then, a considerable amount of further work on this subject has been produced.
An overview can be found in [Blei and Lafferty, 2009]. Of particular relevance for the work presented here are the development of collapsed inference algorithms [Griffiths and Steyvers, 2004, Teh et al., 2007], and of topic models conditioned on features of the document [Mimno and McCallum, 2008, Zhu and Xing, 2010].

In most published applications, topic models are applied to collections of "conventional" documents, such as scholarly articles or newswire reports. In such corpora, the overall number of documents is limited, usually on the order of a few thousand to tens of thousands, making several iterations over the corpus computationally feasible with standard hardware. Because the corpus is either of fixed size or growing slowly, computation time (typically on the order of hours or days) is short compared to the pace at which the corpus develops. Document collections
[Figure 5.1 panels: top, \(W \sim \Pi \times \Theta\) with \(W \in \mathbb{N}^{D \times V}\), \(\Pi \in [0,1]^{D \times K}\), \(\Theta \in [0,1]^{K \times V}\); bottom, \(W \sim \sigma(\Phi \times Z) \times \Theta\) with document features \(\Phi\) (D documents × F features), feature-topic weights \(Z\), and \(\Phi Z \in \mathbb{R}^{D \times K}\).]
Figure 5.1: Conceptual sketches. Top: Classic topic model. Words from a vocabulary of size V observed with varying frequencies over a corpus of D documents. The generative process for the words is described in terms of a K-dimensional latent space of topics (with K ≪ D and K ≪ V). Bottom: Topic model conditioned on observable features of the document. Note projection through the softmax function σ. The symbols in the matrices refer to the nomenclature used throughout the chapter, where the elements of the matrices are denoted by the corresponding lower-case symbols. Black circles denote observed variables, white circles latent ones. In many applications, Φ is a sparse matrix and F is large, though usually smaller than D. The colour coding for features and topics will be re-used in Section 5.2.3 and Section 5.2.3 to clarify two different sources of conditional dependence.
on the web differ considerably from this setup: these corpora are constantly growing, and topics must be available (almost) in real time, so an inference algorithm has to be able to run at speeds comparable to the growth rate of the corpus. An extreme example of such a setting is the microblogging service Twitter.com, where users publish short "status updates" of ≤ 140 characters in length. At the time of writing, the English subset of the Twitter corpus grows at more than \(10^6\) status updates per hour, and this rate is increasing. This makes single-pass, or online, algorithms the only option for inference on the structure of this and similar corpora (such as Wikipedia.org, Facebook.com, etc., each of which contains well over \(10^7\) documents and is growing constantly).

A second, potentially beneficial aspect of documents on the web is the wide availability of explanatory metadata, such as the identity of the author, the time of writing, annotation tags by other users, the geographic location of the author, or her/his membership in certain groups. Such features can provide strong information about the topic distribution of a document. For example, being in a region currently struck by natural disaster may almost completely determine the topic of a Twitter status update. A more mundane example is the high frequency of "Good morning world" status updates on Twitter at certain times of day, or the focus of individual Wikipedia authors on particular subjects.

The following sections present an extension of the latent Dirichlet allocation (LDA) model, with a focus on computational efficiency, and the use of (sparse) features for regression on topics. Such conditional topic models have recently been introduced by other authors [Mimno and McCallum, 2008, Zhu and Xing, 2010]; but those implementations have relatively high computational cost (see Section 5.2), making these algorithms difficult to apply to large corpora.
Instead, we propose a semi-collapsed variational inference scheme adapted from work by Teh et al. [2007], and combine it with sparse linear Gaussian regression based on assumed density filtering in Gaussian message passing (see e.g. Opper [1996]). The necessary link function between these two schemes is provided by a Laplace approximation for Dirichlet distributions in the softmax basis, first introduced by MacKay [1998]. The result is a fast implementation of conditional topic models that can deal with document collections of the size of Twitter and other online document collections, and allows prediction of topics from features, inference of a document's topic distribution from words and features, as well as inference on each topic's word distribution.
5.2 Model
Consider a corpus of D documents. Document d contains \(I_d\) words \(w_{di} \in \{1, \dots, V\}\), \(d \in \{1, \dots, D\}\), \(i \in \{1, \dots, I_d\}\), from a vocabulary of size V. Some description of d is available in the form of a feature vector \(\phi_d \in \mathbb{R}^F\) (in many applications, this
[Figure 5.2: nodes \(\beta_k\), \(\theta_k\) (topic descriptions), \(Z\) (feature weights), \(\phi_d\) (document features), \(y_d\), \(\pi_d\) (topic proportions), \(c_{di}\) (topic assignments), \(w_{di}\), and τ; plates over K, \(I_d\) and D.]
Figure 5.2: Directed graphical model of the conditional topic model. Some of the latent parameters have been labeled with descriptions for clarity.

vector will be sparse). We construct a topic model conditional on observable features of the documents, using the following generative process for all vectors \(w_d\), \(d \in \{1, \dots, D\}\), from K topics:
• For each topic \(k \in \{1, \dots, K\}\), generate a discrete probability distribution with parameters \(\theta_k \in [0,1]^V\) over the vocabulary of size V by sampling from a Dirichlet distribution with parameter vector \(\beta_k\):
\[
p(\theta_k \mid \beta_k) = \mathcal{D}(\theta_k; \beta_k) = \frac{\Gamma\left(\sum_v^V \beta_{kv}\right)}{\prod_v^V \Gamma(\beta_{kv})} \prod_v \theta_{kv}^{\beta_{kv} - 1} \tag{5.1}
\]
where Γ is the Gamma function.
• Generate a matrix \(Z \in \mathbb{R}^{F \times K}\) of feature-topic weights, by sampling each weight independently from a Gaussian distribution with mean \(\mu_{fk}\) and variance \(\sigma^2_{fk}\):
\[
p(z_{fk} \mid \mu, \Sigma) = \mathcal{N}(z_{fk}; \mu_{fk}, \sigma^2_{fk}) \tag{5.2}
\]
(note that this notation allows a nonzero mean for every individual topic, assuming the existence of a "bias" feature with fixed value 1).
• For each document d with features \(\phi_d \in \mathbb{R}^F\),
– Draw a latent variable \(y_d\) from the noisy linear model with a single noise parameter τ:
\[
p(y_d \mid Z, \tau, \phi_d) = \mathcal{N}(y_d; Z^\top \phi_d, \operatorname{diag}(\tau^2)) \tag{5.3}
\]
– Define the topic proportions \(\pi_d = \sigma(y_d) \in [0,1]^K\), where σ is the vector softmax function
\[
\sigma_k(y) = \frac{\exp(y_k)}{\sum_\ell^K \exp(y_\ell)} \tag{5.4}
\]
– For the i-th of \(I_d\) words,
∗ draw a topic \(c_{di}\) from the discrete distribution defined by \(\pi_d\):
\[
p(c_{di} = k \mid \pi_d) = \pi_{dk} \tag{5.5}
\]
∗ draw word \(w_{di}\) from the discrete distribution of topic \(c_{di}\):
\[
p(w_{di} = v \mid c_{di}, \Theta) = \theta_{c_{di} v} \tag{5.6}
\]
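The generative process above can be sketched end-to-end. The following Python snippet samples a toy corpus; all array shapes and parameter values are illustrative assumptions, not the experimental settings used later in the chapter:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_corpus(Phi, beta, mu, sigma, tau, doc_lengths):
    """Sample documents from the conditional topic model.

    Phi: (D, F) feature matrix; beta: (K, V) Dirichlet parameters;
    mu/sigma: (F, K) prior mean and std of the feature-topic weights Z;
    tau: regression noise. Returns one array of word indices per document.
    """
    K, V = beta.shape
    Theta = np.array([rng.dirichlet(beta[k]) for k in range(K)])  # (5.1)
    Z = rng.normal(mu, sigma)                                     # (5.2)
    docs = []
    for d, length in enumerate(doc_lengths):
        y = rng.normal(Phi[d] @ Z, tau)                           # (5.3)
        pi = np.exp(y - y.max()); pi /= pi.sum()                  # (5.4), stable softmax
        c = rng.choice(K, size=length, p=pi)                      # (5.5)
        w = np.array([rng.choice(V, p=Theta[k]) for k in c])      # (5.6)
        docs.append(w)
    return docs
```

Each commented line corresponds to one numbered step of the generative process.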
Figure 5.2 shows a directed graphical model representing this generative model. If we replace everything to the left of \(\pi_d\) in that figure by a single Dirichlet parameter vector α (identical for all d), then the parts shown to the right of and including the node π correspond to the traditional LDA model [Blei et al., 2003]. The extension to a conditional topic model as defined above has been used before by Mimno and McCallum [2008], who developed an inference scheme based on expectation maximization and Gibbs sampling for it. Such an algorithm does not lend itself to large datasets. There is also a related probabilistic model, known as the correlated topic model [Blei and Lafferty, 2007], which shares the softmax relationship introduced above and everything to its right in Figure 5.2, but not the regression element to its left. While it is straightforward to extend the correlated topic model with the regression part as introduced above, the published inference method for this model uses an overall variational bound that can be optimized only numerically and involves frequent matrix multiplications of cost \(O(K^2)\); this scheme thus comes at considerable computational cost, too. Our contribution will be the derivation of a fast inference scheme. To construct this algorithm, we will integrate out the distribution over Θ (i.e. we will treat this part of the model exactly), and derive a variational bound on Π and C (Section 5.2.1), using approximate discrete beliefs on the \(c_{di}\) and Dirichlet beliefs on the \(\pi_d\). This requires a Dirichlet prior for \(\pi_d\). To construct this prior, we will use factorized Gaussian message passing in the regression algorithm (Section 5.2.3), which establishes a Gaussian belief on \(y_d = \sigma^{-1}(\pi_d)\). We will use a Laplace approximation to establish a match between Gaussians and Dirichlets in the softmax basis (Section 5.2.2 and Appendix C).
5.2.1 Semi-Collapsed Variational Inference
This section constructs a variational inference scheme for LDA which we will call semi-collapsed for the following reason: the necessary derivations are an extension of a fully collapsed variational bound derived in an excellent paper by Teh et al. [2007]. The parts of our algorithm formulated in this section differ from Teh's work in only two details, which cause minor changes in the derivation but will be crucial for their use in our overall model: We will retain an explicit variational approximation to the posterior beliefs on the \(\pi_d\), instead of integrating them out (i.e. this part of the inference is not collapsed). We will also make the more general assumption of non-uniform Dirichlet parameters. To keep the notation uncluttered, it will be assumed that a Dirichlet prior \(p(\pi_d \mid \alpha_d) = \mathcal{D}(\pi_d; \alpha_d)\) on \(\pi_d\) is available at the time of inference. This prior will be constructed in Section 5.2.2. From the definitions in Section 5.2, the joint probability of all latent and observed variables in the LDA model, as a function of the parameters α and β, is a large product of Dirichlet distributions. To see this, note that for every observed document \(w_d\), the likelihood for \(\pi_d\) and Θ is multinomial in both parameters:
\[
p(w_d, c_d \mid \pi_d, \Theta) \propto \prod_i^{I_d} \pi_{d c_{di}}\, \theta_{c_{di} w_{di}}. \tag{5.7}
\]
Because the two Dirichlet distributions in the prior (per-document over topics and per-topic over words) are conjugate to this double multinomial, the posterior is Dirichlet. A compact representation can be achieved using counts
\[
n_{dkv} \equiv \left|\{i \ \text{s.t.}\ c_{di} = k;\ w_{di} = v\}\right|. \tag{5.8}
\]
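A minimal Python sketch of these counts and their summed-out marginals (toy inputs and illustrative names, not the chapter's corpus):

```python
import numpy as np

def topic_word_counts(words, assignments, D, K, V):
    """Counts n_{dkv} of Equation (5.8) and their marginals.

    words[d][i] and assignments[d][i] are the word and topic of token i
    in document d. A dot in a subscript denotes a summed-out dimension.
    """
    n = np.zeros((D, K, V), dtype=int)
    for d in range(D):
        for w, c in zip(words[d], assignments[d]):
            n[d, c, w] += 1
    n_dk_ = n.sum(axis=2)       # n_{dk.}: tokens of topic k in document d
    n__kv = n.sum(axis=0)       # n_{.kv}: tokens of word v under topic k
    n__k_ = n.sum(axis=(0, 2))  # n_{.k.}: total tokens of topic k
    return n, n_dk_, n__kv, n__k_
```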
We will use a dot to denote that a dimension has been summed out; for example \(n_{\cdot kv} = \sum_d n_{dkv}\), etc. With this, we get the joint (note that the left-hand-side variable C is represented implicitly on the right through the counts \(n_{dkv}\))
\[
p(W, C, \Theta, \Pi \mid \alpha, \beta) = \prod_d \frac{\Gamma\left(\sum_k \alpha_{dk}\right)}{\prod_k \Gamma(\alpha_{dk})} \prod_k \pi_{dk}^{\alpha_{dk} - 1 + n_{dk\cdot}} \prod_k \frac{\Gamma\left(\sum_v \beta_{kv}\right)}{\prod_v \Gamma(\beta_{kv})} \prod_v \theta_{kv}^{\beta_{kv} - 1 + n_{\cdot kv}}. \tag{5.9}
\]
Standard inference in LDA [Blei et al., 2003] uses a fully factorized approximate distribution
\[
\tilde q_{\text{LDA}}(\Pi, C, \Theta) = \prod_d \tilde q(\pi_d) \prod_i \tilde q(c_{di}) \prod_k \tilde q(\theta_k) \tag{5.10}
\]
on all latent parameters. This raises two problems.
• Due to the mixing terms \(n_{dk\cdot}\), \(n_{\cdot k \cdot}\) and \(n_{\cdot kv}\), the posterior beliefs over the \(c_{di}\) are highly correlated. The fully factorized bound thus may be loose.
• Tracking approximate beliefs for Θ and Π is computationally inefficient. This is less a problem of memory or nominal computation cost than of algorithmic design, and related to the previous point: In a standard implementation, the algorithm passes through the corpus, updating the estimates on Π and C. It then steps out and updates the estimate on Θ. Because the other estimates are highly correlated with Θ, they now have to be re-estimated, and this cycle has to be repeated many times: The algorithm converges slowly. Note that this problem is more pronounced for inference on Θ than for inference on Π because the latter depends directly only on the words in one document,
while the former is a function of the entire corpus.
To address this problem, we adapt Equation (5.10) by introducing a dependence of Θ on C:
\[
q(\Theta, \Pi, C) = q(\Theta \mid C) \prod_d^D q(\pi_d) \prod_i^{I_d} q(c_{di}) \tag{5.11}
\]
The variational bound (see Section 2.3.3) is
\[
\begin{aligned}
\mathcal{L}[q(C)q(\Theta \mid C)q(\Pi)]
&= \mathbb{E}_{q(C)q(\Theta \mid C)q(\Pi)}\left[\log p(W, C, \Pi, \Theta \mid \beta, \alpha)\right] + H\left[q(C)q(\Theta \mid C)q(\Pi)\right] \\
&= \mathbb{E}_{q(C)q(\Pi)}\left[ \mathbb{E}_{q(\Theta \mid C)}\left[\log p(W, C, \Pi, \Theta \mid \beta, \alpha)\right] + H[q(\Theta \mid C)] \right] + H\left[q(C)q(\Pi)\right]
\end{aligned} \tag{5.12}
\]
Since we are not restricting the functional form, optimizing for \(q(\Theta \mid C)\) leads to a unique optimum at the exact posterior \(p(\Theta \mid C, W, \beta)\).
\[
\mathcal{L}(q(C)q(\Pi)) = \max_{q(\Theta \mid C)} \mathcal{L}[q(C)q(\Theta \mid C)q(\Pi)] = \mathbb{E}_{q(C)q(\Pi)}\left[\log p(W, C, \Pi \mid \beta, \alpha)\right] + H\left[q(C)q(\Pi)\right] \tag{5.13}
\]
An obvious advantage of this treatment over the classic factorized bound is that, because Equation (5.11) is a strictly weaker assumption on the structure than Equation (5.10), the resulting bound on the true posterior under the model is strictly tighter. A second advantage will become apparent when we derive an explicit scheme for the optimization of the remaining approximate distributions. To do so, we first integrate out all \(\theta_k\) from Equation (5.9), arriving at
\[
p(W, C, \Pi \mid \beta, \alpha) = \prod_d \frac{\Gamma\left(\sum_k \alpha_{dk}\right)}{\prod_k \Gamma(\alpha_{dk})} \prod_k \pi_{dk}^{\alpha_{dk} - 1 + n_{dk\cdot}} \prod_k \frac{\Gamma\left(\sum_v \beta_{kv}\right)}{\Gamma\left(n_{\cdot k \cdot} + \sum_v \beta_{kv}\right)} \prod_v \frac{\Gamma(\beta_{kv} + n_{\cdot kv})}{\Gamma(\beta_{kv})}. \tag{5.14}
\]
We now choose a particular parametric form for the approximating distributions, namely independent discrete distributions on the \(c_{di}\) with parameters \(q(c_{di} = k) = \gamma_{dik}\), and independent Dirichlet distributions on the \(\pi_d\), with parameter vectors \(\nu_d \in \mathbb{R}_+^K\):
\[
q(C) = \prod_{d,i} q(c_{di}) \quad \text{and} \quad q(\Pi) = \prod_d \mathcal{D}(\pi_d \mid \nu_d) \tag{5.15}
\]
We insert this and Equation (5.14) into Equation (5.13), only retain the terms containing π, and use that the entropy of the Dirichlet is [e.g. Bishop, 2006]
\[
H[\mathcal{D}(x \mid \omega)] = \log \frac{\prod_k \Gamma(\omega_k)}{\Gamma(\hat\omega)} + (\hat\omega - K)\psi(\hat\omega) - \sum_k (\omega_k - 1)\psi(\omega_k) \tag{5.16}
\]
with \(\hat\omega \equiv \sum_k \omega_k\) and the digamma function ψ, which is the first derivative of the log of the Gamma function. This gives an explicit expression for the bound on each document's topic proportion as a function of \(\nu_d\), up to additive constants:
\[
\mathcal{L}(\nu_d) = \sum_k \left[\alpha_{dk} - 1 + \mathbb{E}_q(n_{dk\cdot})\right]\mathbb{E}_q[\log \pi_{dk}] + (\hat\nu_d - K)\psi(\hat\nu_d) - \sum_k (\nu_{dk} - 1)\psi(\nu_{dk}) + \sum_k \log\Gamma(\nu_{dk}) - \log\Gamma(\hat\nu_d) \tag{5.17}
\]
Using that the expected logarithm under Dirichlet beliefs is
\[
\mathbb{E}_{\mathcal{D}(\omega)}[\log x_k] = \psi(\omega_k) - \psi(\hat\omega) \tag{5.18}
\]
we get
" K !# X X L(νd) = (αdk + Eq(ndk·) − νdk) z(νdk) − z νd` (5.19) k ` X + log Γ(νdk) − log Γ(ˆνk). k
Differentiating this expression with respect to νdk and setting to zero establishes an optimum at
νdk = αdk + Eq(ndk·). (5.20)
We will deal with the expectation in this equation in the next section. First, we consider the optimization with respect to γ_dik. To do so, we introduce the notation \(n^{\setminus di}_{dkv}\) for \(n_{dkv}\) counting all words except the one at location (d, i). The entropy of the discrete distribution with parameters γ_v is \(H_\gamma[q(c)] = -\sum_v \gamma_v \log \gamma_v\). Further, we can expand
\[
\log\frac{\Gamma(x + a)}{\Gamma(x)} = \sum_{\ell = 0}^{a-1} \log(x + \ell) \qquad \forall\, x \in \mathbb{R}_+,\; a \in \mathbb{N}. \tag{5.21}
\]
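Identity (5.21) follows from the recursion Γ(x + 1) = xΓ(x) and is easy to confirm numerically with `math.lgamma` (a quick sanity check, not part of the algorithm):

```python
from math import lgamma, log

def log_gamma_ratio(x, a):
    """log Gamma(x + a) - log Gamma(x) via the finite product in (5.21)."""
    return sum(log(x + l) for l in range(a))

x, a = 0.35, 7   # illustrative values: x real positive, a a natural number
direct = lgamma(x + a) - lgamma(x)
print(direct, log_gamma_ratio(x, a))  # both evaluate the same quantity
```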
Using all this, again Equation (5.14) and some of the results in the previous equations, and canceling terms appearing in both numerator and denominator, we arrive at the update rule
\[
\gamma_{dik} \propto \exp\left\{ \psi(\nu_{dk}) + \mathbb{E}_q\!\left[\log\left(\beta_{kw_{di}} + n^{\setminus di}_{\cdot k w_{di}}\right)\right] - \mathbb{E}_q\!\left[\log\left(\sum_v \beta_{kv} + n^{\setminus di}_{\cdot k \cdot}\right)\right] \right\}. \tag{5.22}
\]
A Gaussian approximation to the expected values of counts
The final problem in this section is the evaluation of the expectation terms in Equations (5.19) and (5.22). Computing these terms exactly is computationally expensive. However, Teh et al. [2007] point out that these terms are sums of usually large numbers of independent Bernoulli variables. For example,
\[
n^{\setminus di}_{\cdot k v} = \sum_{(d', i') \neq (d, i)} \mathbb{I}(c_{d'i'} = k) \tag{5.23}
\]
(and analogously for the other counts). Sums of independent Bernoulli variables are approximately Gaussian distributed, with mean and variance given by the sums of the means and variances of the individual Bernoulli variables:
\[
\mathbb{E}_q(n_{\cdot k \cdot}) = \sum_{d,i} \gamma_{dik} \quad\text{and}\quad \mathrm{Var}_q(n_{\cdot k \cdot}) = \sum_{d,i} \gamma_{dik}(1 - \gamma_{dik}) \tag{5.24}
\]
We can thus approximate these expectations by a second-order Taylor expansion:
\[
\mathbb{E}_q\!\left[\log\left(\beta_{kw_{di}} + n^{\setminus di}_{\cdot k w_{di}}\right)\right] \approx \log\left(\beta_{kw_{di}} + \mathbb{E}_q\!\left[n^{\setminus di}_{\cdot k w_{di}}\right]\right) - \frac{\mathrm{Var}_q\!\left(n^{\setminus di}_{\cdot k w_{di}}\right)}{2\left(\beta_{kw_{di}} + \mathbb{E}_q\!\left[n^{\setminus di}_{\cdot k w_{di}}\right]\right)^2} \tag{5.25}
\]
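To get a feel for the quality of (5.24) and (5.25), one can compare them against exact enumeration for a small count. The sketch below (illustrative, with hypothetical responsibilities γ) builds the exact distribution of a sum of independent Bernoulli variables, a Poisson-binomial distribution, by dynamic programming:

```python
from math import log

def poisson_binomial_pmf(gammas):
    """Exact pmf of a sum of independent Bernoulli(gamma_j) variables."""
    pmf = [1.0]
    for g in gammas:
        new = [0.0] * (len(pmf) + 1)
        for n, p in enumerate(pmf):
            new[n] += p * (1 - g)      # this Bernoulli came out 0
            new[n + 1] += p * g        # this Bernoulli came out 1
        pmf = new
    return pmf

gammas = [0.2, 0.5, 0.8, 0.3, 0.6, 0.9, 0.4, 0.7] * 4   # 32 hypothetical terms
beta = 0.1                                               # illustrative beta_kv

pmf = poisson_binomial_pmf(gammas)
mean = sum(gammas)                              # Eq. (5.24), mean
var = sum(g * (1 - g) for g in gammas)          # Eq. (5.24), variance
exact_mean = sum(n * p for n, p in enumerate(pmf))
exact_var = sum((n - exact_mean) ** 2 * p for n, p in enumerate(pmf))

exact_log = sum(p * log(beta + n) for n, p in enumerate(pmf))
taylor_log = log(beta + mean) - var / (2 * (beta + mean) ** 2)  # Eq. (5.25)
print(exact_log, taylor_log)  # the Taylor value tracks the exact one closely
```

The mean and variance in (5.24) are exact; only the Gaussian shape and the truncation of the Taylor series are approximations.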
Teh et al. [2007] studied this approximation and concluded that, even though the count is usually larger than β, this Taylor expansion gives good results in this application. In our model, this gives the update equation
\[
\gamma_{dik} \propto \exp\!\left[\psi(\nu_{dk})\right] \frac{\beta_{kw_{di}} + \mathbb{E}_q\!\left[n^{\setminus di}_{\cdot k w_{di}}\right]}{\sum_v \beta_{kv} + \mathbb{E}_q\!\left[n^{\setminus di}_{\cdot k \cdot}\right]} \exp\left\{ -\frac{\mathrm{Var}_q\!\left(n^{\setminus di}_{\cdot k w_{di}}\right)}{2\left(\beta_{kw_{di}} + \mathbb{E}_q\!\left[n^{\setminus di}_{\cdot k w_{di}}\right]\right)^2} + \frac{\mathrm{Var}_q\!\left(n^{\setminus di}_{\cdot k \cdot}\right)}{2\left(\sum_v \beta_{kv} + \mathbb{E}_q\!\left[n^{\setminus di}_{\cdot k \cdot}\right]\right)^2} \right\} \tag{5.26}
\]
Comparison to fully collapsed model
It is instructive to take a short diversion here and study the differences between this semi-collapsed algorithm, where only Θ is integrated out, and the fully collapsed algorithm derived in Teh et al. [2007]. In the latter scheme, there is no
approximation on Π and the update rule for γdik is
\[
\gamma_{dik} \propto \left(\alpha_{dk} + \mathbb{E}_q\!\left[n^{\setminus di}_{dk\cdot}\right]\right) \frac{\beta_{kw_{di}} + \mathbb{E}_q\!\left[n^{\setminus di}_{\cdot k w_{di}}\right]}{\sum_v \beta_{kv} + \mathbb{E}_q\!\left[n^{\setminus di}_{\cdot k \cdot}\right]} \exp\left\{ -\frac{\mathrm{Var}_q\!\left(n^{\setminus di}_{dk\cdot}\right)}{2\left(\alpha_{dk} + \mathbb{E}_q\!\left[n^{\setminus di}_{dk\cdot}\right]\right)^2} - \frac{\mathrm{Var}_q\!\left(n^{\setminus di}_{\cdot k w_{di}}\right)}{2\left(\beta_{kw_{di}} + \mathbb{E}_q\!\left[n^{\setminus di}_{\cdot k w_{di}}\right]\right)^2} + \frac{\mathrm{Var}_q\!\left(n^{\setminus di}_{\cdot k \cdot}\right)}{2\left(\sum_v \beta_{kv} + \mathbb{E}_q\!\left[n^{\setminus di}_{\cdot k \cdot}\right]\right)^2} \right\} \tag{5.27}
\]
We observe further that the asymptotic expansion of the digamma function is given by [Abramowitz and Stegun, 1972, §6.4.12]
\[
\psi(x) = \log(x) - \frac{1}{2x} + \mathcal{O}(x^{-2}). \tag{5.28}
\]
So if νdk is updated after every update of γ (which is not a computationally efficient scheme, but possible in principle), the two update rules are identical, up to second order corrections, up to a factor
\[
\begin{aligned}
\Delta &= \frac{\gamma_{dik}^{\text{semi-collapsed}}}{\gamma_{dik}^{\text{collapsed}}} \\
&\approx \exp\left( -\frac{1}{2\left(\alpha_{dk} + \mathbb{E}_q\!\left[n^{\setminus di}_{dk\cdot}\right]\right)} + \frac{\mathrm{Var}_q\!\left[n^{\setminus di}_{dk\cdot}\right]}{2\left(\alpha_{dk} + \mathbb{E}_q\!\left[n^{\setminus di}_{dk\cdot}\right]\right)^2} \right) \\
&= \exp\left( \frac{\mathrm{Var}_q\!\left[n^{\setminus di}_{dk\cdot}\right] - \alpha_{dk} - \mathbb{E}_q\!\left[n^{\setminus di}_{dk\cdot}\right]}{2\left(\alpha_{dk} + \mathbb{E}_q\!\left[n^{\setminus di}_{dk\cdot}\right]\right)^2} \right) \\
&= \exp\left( -\frac{\alpha_{dk} + \sum_{i' \neq i} \gamma_{di'k}^2}{2\left(\alpha_{dk} + \sum_{i' \neq i} \gamma_{di'k}\right)^2} \right) \qquad \text{(using Eq. 5.24)}
\end{aligned} \tag{5.29}
\]
Because γ_dik < 1 for all d, i, k, we have Δ ≈ 1, and the update rules are approximately equal up to second-order corrections. From this point of view, the main difference between the two schemes is thus one of scheduling. The fully collapsed scheme can utilize a slightly more efficient representation of the approximate beliefs, which can aid mixing. However, for large corpora, where the mean field is dominated by effects from other documents, the differences between the two algorithms can be expected to be small.
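This comparison can be checked numerically. The sketch below (illustrative values, not from the thesis) implements a simple digamma via the recurrence ψ(x + 1) = ψ(x) + 1/x plus an asymptotic series, then compares the ν-dependent factor exp(ψ(ν_dk)) of the semi-collapsed rule, with ν_dk set by Eq. (5.20), against the corresponding factor of the collapsed rule, and against the prediction of Eq. (5.29):

```python
from math import exp, log

def digamma(x):
    """psi(x): recurrence up to x >= 6, then an asymptotic series."""
    acc = 0.0
    while x < 6:
        acc -= 1.0 / x
        x += 1
    inv2 = 1.0 / (x * x)
    return acc + log(x) - 0.5 / x - inv2 * (1/12 - inv2 * (1/120 - inv2 / 252))

alpha = 0.5                                       # illustrative alpha_dk
gammas = [0.3 + 0.4 * (j % 2) for j in range(60)] # hypothetical q(c = k) values
mean = sum(gammas)                       # E_q[n^{\di}_{dk.}]
var = sum(g * (1 - g) for g in gammas)   # Var_q[n^{\di}_{dk.}]
nu = alpha + mean                        # Eq. (5.20)

semi_collapsed = exp(digamma(nu))                  # factor in Eq. (5.26)
collapsed = nu * exp(-var / (2 * nu ** 2))         # factor in Eq. (5.27)
ratio = semi_collapsed / collapsed
predicted = exp(-(alpha + sum(g * g for g in gammas)) / (2 * nu ** 2))  # (5.29)
print(ratio, predicted)  # both close to 1, and close to each other
```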
5.2.2 Laplace Approximation for Dirichlets
The previous section introduced a variational inference scheme for the subgraph to the right of and including πd in Figure 5.2. The next section will construct a fast approximate message passing scheme for regression from Gaussian beliefs on
yd, to the left of that variable in the graphical model. To link these two parts
of the inference, we require some way of connecting the beliefs on πd and yd.
The deterministic functional relationship σ(yd) = πd identifies this task as a probabilistic form of logistic regression (probabilistic in the sense that discrete
samples cdn from the distribution defined by πd are replaced by probabilistic beliefs
over cdn). As pointed out above, solutions to this problem found by previous authors suffer from comparatively high computational cost. Here, we will retain the Dirichlet
beliefs on πd returned by the semi-collapsed variational scheme, and explicitly
construct approximate Gaussian beliefs on yd. For this, we utilize a Laplace approximation in the softmax basis, which was derived by MacKay [1998]. It can be extended to provide an invertible map from the set of parameters of K-dimensional Dirichlet distributions to a subset of the parameters of K-dimensional Gaussians, which can be approximately represented by the means and variances of K independent Gaussians. The detailed derivation of this approximation is lengthy, and has thus been moved to Appendix C. The map between a Dirichlet with pseudocounts α and a multivariate Gaussian with mean and covariance matrix (µ, Σ) is (Equations C.32 and C.21)
\[
\mu_k = \log \alpha_k - \frac{1}{K}\sum_{\ell=1}^{K} \log \alpha_\ell \tag{5.30}
\]
\[
\Sigma_{k\ell} = \delta_{k\ell}\,\frac{1}{\alpha_k} - \frac{1}{K}\left(\frac{1}{\alpha_k} + \frac{1}{\alpha_\ell}\right) + \frac{1}{K^2}\sum_u \frac{1}{\alpha_u} \tag{5.31}
\]
\[
\text{and}\quad \alpha_k = \frac{1}{\Sigma_{kk}}\left(1 - \frac{2}{K} + \frac{e^{\mu_k}}{K^2}\sum_\ell e^{-\mu_\ell}\right) \quad\text{for } k = 1, \dots, K \tag{5.32}
\]
In the following Section 5.2.3, we will use this approximation in a special form of generalized regression, to link a set of approximately independent Gaussians to a Dirichlet distribution. A few interesting characteristics to note are:
. The correlation between components of the Gaussian approximation is small for K ≫ 1 and will thus be ignored here, giving K independent Gaussians with means as above and variances (see also Equation C.22)
\[
\sigma_k^2 = \Sigma_{kk} = \frac{1}{\alpha_k}\left(1 - \frac{2}{K}\right) + \frac{1}{K^2}\sum_\ell \frac{1}{\alpha_\ell} \tag{5.33}
\]
In the other direction, from Gaussians to Dirichlets, the approximation discards any correlation structure between the components. Since the factorized approximation to be introduced in Section 5.2.3 does not capture such correlations anyway, this does not cause any additional issues.
. The Gaussian resulting from this map has weaker tails than the Dirichlet distribution in the softmax basis. The resulting weights are thus slightly overconfident. The factorized regression introduced in the next section will have a similar defect, and both of these problems need to be addressed by ad-hoc solutions.
. Inspecting Equations (5.30) to (5.32), it is clear that this approximation is well defined for the entire parameter space of Dirichlets, including values of 0 < αk < 1. For such sparse cases, the resulting Gaussian approximation can have considerable uncertainty, but it does not lead to ill-defined parameter settings.
See Figure C.3 in Appendix C for an intuition on this approximation.
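To make the map concrete, the following sketch (not from the thesis) implements the forward map of Equations (5.30) and (5.33) and the inverse map (5.32), and verifies that mapping a Dirichlet to the diagonal Gaussian approximation and back recovers the original pseudo-counts up to floating-point error:

```python
from math import log, exp

def dirichlet_to_gaussian(alpha):
    """Softmax-basis Laplace map: means from Eq. (5.30), variances from (5.33)."""
    K = len(alpha)
    mean_log = sum(log(a) for a in alpha) / K
    mu = [log(a) - mean_log for a in alpha]
    inv_sum = sum(1.0 / a for a in alpha)
    sigma2 = [(1.0 / a) * (1 - 2.0 / K) + inv_sum / K ** 2 for a in alpha]
    return mu, sigma2

def gaussian_to_dirichlet(mu, sigma2):
    """Inverse map, Eq. (5.32), using only the diagonal variances."""
    K = len(mu)
    s = sum(exp(-m) for m in mu)
    return [(1 - 2.0 / K + exp(m) * s / K ** 2) / v for m, v in zip(mu, sigma2)]

alpha = [0.5, 2.0, 3.0, 1.2]   # illustrative pseudo-counts, one sparse entry
mu, sigma2 = dirichlet_to_gaussian(alpha)
alpha_back = gaussian_to_dirichlet(mu, sigma2)
print(alpha_back)  # recovers [0.5, 2.0, 3.0, 1.2] up to floating-point error
```

Note that the round trip passes through only the means and diagonal variances, illustrating the claim above that the discarded correlations are not needed to invert the map.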
5.2.3 Gaussian Regression
Figure 5.3: Factor graph representation of approximate inference in the conditional topic model. Note that the semi-collapsed variational algorithm provides independent beliefs q(πd) on the per-document topic distributions, allowing inference by message passing on the weights Z in the Gaussian regression part. The regression is here represented in its exact joint form. Figure 5.4 and Section 5.2.3 introduce a further factorizing approximation allowing O(EK) inference, where E is the average number of non-zero features per document.
Under the exact model given by Eq. (5.9), the beliefs on the per-document topic distributions πd are correlated indirectly through the correlation of the word topic labels cdn, and would only become independent if these labels were observed. The semi-collapsed variational inference scheme introduced in Section 5.2.1 explicitly constructed independent beliefs on the πd (see the factor graph representation in Figure 5.3). In this section, we will make use of these independent beliefs to learn relations between the observable features of a document and its topic mixture, using the linear model with weights zfk as defined in Eq. (5.3). Thanks to the Laplace map introduced in Section 5.2.2, we have access to an approximate
Gaussian belief on the inverse softmax of πd, denoted by \(y_d = \sigma^{-1}(\pi_d)\). This reduces the problem to straightforward Gaussian regression, which has been studied very extensively in the past [e.g. Bishop, 2006, Section 3.3], and will only be outlined in this section.
Fully Connected Regression
Noting that the Laplace approximation returns a correlated belief over the elements of yd, and because “explaining away” causes correlations between the weights zfk for different features f, an exact treatment of the regression problem calls for a fully connected (i.e. joint) posterior on the weights Z. Because this exact solution requires the inversion of a K × K matrix, at cost O(K³), it is usually too costly for real applications, and the next Section 5.2.3 will introduce a factorized scheme which is not exact, but much faster. For completeness, this section first provides the exact answer to the regression problem. It is the multivariate version of the Gaussian linear regression introduced in Section 2.3.6.
Because we will be interested in the posterior on the zfk, to simplify the algebra we introduce the stacked vector \(z = \mathrm{vec}(Z) \in \mathbb{R}^{KF}\), i.e. a re-arrangement of the F × K matrix Z into a vector, and the matrix \(F_d \in \mathbb{R}^{K \times FK}\), a re-arrangement of the vector φd, such that
\[
Z^{\mathsf{T}} \phi_d \equiv F_d z = \tilde{y}_d \tag{5.34}
\]
The concrete realization of the projection is irrelevant for the analysis, as long as
it is consistent. The likelihood for z under one document “data point” yd is then
\[
p(y_d \mid z, F_d) = \mathcal{N}\!\left(y_d;\, F_d z,\, \mathrm{diag}(\tau^2)\right) \tag{5.35}
\]
To get the full posterior, assuming a Gaussian prior p(z) = N(z; m, S), we use Bayes’ rule and “complete the square” (for a step-by-step derivation, see for example Bishop [2006], Section 2.3.6). The posterior is also Gaussian, and takes the form
\[
\begin{aligned}
p(z \mid y_d, F_d) &= \mathcal{N}\!\left(z;\, \Psi\!\left[F_d^{\mathsf{T}} \mathrm{diag}(\tau^{-2})\, y_d + S^{-1} m\right],\, \Psi\right) \\
\text{where}\quad \Psi &= \left(S^{-1} + F_d^{\mathsf{T}} \mathrm{diag}(\tau^{-2})\, F_d\right)^{-1} \\
&= S - S F_d^{\mathsf{T}}\left(\mathrm{diag}(\tau^2) + F_d S F_d^{\mathsf{T}}\right)^{-1} F_d S
\end{aligned} \tag{5.36}
\]
where we have applied the matrix inversion lemma (C.17) in the last step, which reduces the size of the matrix to be inverted from KF × KF to the size K × K of the Schur complement. Since the variational inference provides only a belief \(p(y_d) = \mathcal{N}(y_d; \mu_d, \Sigma_d)\), rather than an exact value, we have to marginalize over \(y_d\):
\[
p(z \mid \mu_d, \Sigma_d, F_d) = \int p(z \mid y_d, F_d)\, p(y_d)\, \mathrm{d}y_d
\]
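A small numerical sketch of Equation (5.36), with arbitrary illustrative dimensions and randomly drawn prior and features (all names and sizes are hypothetical), checking that the matrix-inversion-lemma form of Ψ agrees with the direct form while only inverting a K × K matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
K, F = 3, 5                      # topics and features (illustrative sizes)
KF = K * F

# Gaussian prior p(z) = N(m, S) on the stacked weights z = vec(Z)
m = rng.normal(size=KF)
A = rng.normal(size=(KF, KF))
S = A @ A.T + KF * np.eye(KF)    # random symmetric positive-definite prior cov

phi = rng.normal(size=F)         # one document's feature vector
F_d = np.kron(np.eye(K), phi)    # one concrete K x KF realization of F_d
tau2 = rng.uniform(0.5, 2.0, size=K)   # observation noise variances
y_d = rng.normal(size=K)

# Direct form:  Psi = (S^-1 + F^T diag(tau^-2) F)^-1
Psi_direct = np.linalg.inv(np.linalg.inv(S) + F_d.T @ np.diag(1 / tau2) @ F_d)
# Matrix inversion lemma:  Psi = S - S F^T (diag(tau^2) + F S F^T)^-1 F S
inner = np.linalg.inv(np.diag(tau2) + F_d @ S @ F_d.T)   # only K x K inverted
Psi_lemma = S - S @ F_d.T @ inner @ F_d @ S

post_mean = Psi_lemma @ (F_d.T @ (y_d / tau2) + np.linalg.solve(S, m))
print(np.max(np.abs(Psi_direct - Psi_lemma)))  # should be numerically tiny
```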