Statistical Computation with Kernels
by
François-Xavier Briol
Thesis
Submitted to the University of Warwick
for the degree of
Doctor of Philosophy
University of Warwick, Department of Statistics
September 2018

Contents
List of Figures
Acknowledgments
Declarations
Abstract
Abbreviations
Chapter 1  Challenges for Statistical Computation
  1.1  Challenge I: Numerical Integration and Sampling
    1.1.1  Applications in Bayesian Statistics
    1.1.2  Applications in Frequentist Statistics
    1.1.3  Existing Methodology
    1.1.4  Issues Faced by Existing Methods
  1.2  Challenge II: Intractable Models
    1.2.1  Intractability in Unnormalised Models
    1.2.2  Intractability in Generative Models
  1.3  Additional Challenges
  1.4  Contributions of the Thesis

Chapter 2  Kernel Methods, Stochastic Processes and Bayesian Nonparametrics
  2.1  Kernel Methods
    2.1.1  Introduction and Characterisations
    2.1.2  Properties of Reproducing Kernel Hilbert Spaces
    2.1.3  Examples of Kernels and their Associated Spaces
    2.1.4  Applications and Related Research
  2.2  Stochastic Processes
    2.2.1  Introduction to Stochastic Processes
    2.2.2  Characterisations of Stochastic Processes
    2.2.3  Connection Between Kernels and Covariance Functions
  2.3  Bayesian Nonparametric Models
    2.3.1  Bayesian Models in Infinite Dimensions
    2.3.2  Gaussian Processes as Bayesian Models
    2.3.3  Practical Issues with Gaussian Processes

Chapter 3  Bayesian Numerical Integration: Foundations
  3.1  Bayesian Probabilistic Numerical Methods
    3.1.1  Numerical Analysis in Statistics and Beyond
    3.1.2  Numerical Methods as Bayesian Inference Problems
    3.1.3  Recent Developments in Bayesian Numerical Methods
  3.2  Bayesian Quadrature
    3.2.1  Introduction to Bayesian Quadrature
    3.2.2  Quadrature Rules in Reproducing Kernel Hilbert Spaces
    3.2.3  Optimality of Bayesian Quadrature Weights
    3.2.4  Selection of States
  3.3  Theoretical Results for Bayesian Quadrature
    3.3.1  Convergence and Contraction Rates
    3.3.2  Monte Carlo, Importance Sampling and MCMC Point Sets
    3.3.3  Quasi-Monte Carlo Point Sets
  3.4  Considerations for Practical Implementation
    3.4.1  Prior Specification for Integrands
    3.4.2  Tractable and Intractable Kernel Means
  3.5  Simulation Study
    3.5.1  Assessment of Uncertainty Quantification
    3.5.2  Validation of Convergence Rates
  3.6  Some Applications to Statistics and Engineering
    3.6.1  Case Study 1: Large-Scale Model Selection
    3.6.2  Case Study 2: Computer Experiments
    3.6.3  Case Study 3: High-Dimensional Random Effects
    3.6.4  Case Study 4: Computer Graphics

Chapter 4  Bayesian Numerical Integration: Advanced Methods
  4.1  Bayesian Quadrature for Multiple Related Integrals
    4.1.1  Multi-output Bayesian Quadrature
    4.1.2  Convergence for Priors with Separable Covariance Functions
    4.1.3  Numerical Experiments
  4.2  Efficient Point Selection Methods I: The Frank-Wolfe Algorithm
    4.2.1  Frank-Wolfe Bayesian Quadrature
    4.2.2  Consistency and Contraction in Finite-Dimensional Spaces
    4.2.3  Numerical Experiments
  4.3  Efficient Point Selection Methods II: A Sequential Monte Carlo Sampler
    4.3.1  Limitations of Bayesian Importance Sampling
    4.3.2  Robustness of Bayesian Quadrature to the Choice of Kernel
    4.3.3  Sequential Monte Carlo Bayesian Quadrature
    4.3.4  Numerical Experiments

Chapter 5  Statistical Inference and Computation with Intractable Models
  5.1  Stein's Method and Reproducing Kernels
    5.1.1  Distances on Probability Measures
    5.1.2  Kernel Stein Discrepancies
    5.1.3  Stein Reproducing Kernels for Approximating Measures
    5.1.4  Stein Reproducing Kernels for Numerical Integration
  5.2  Kernel-based Estimators for Intractable Models
    5.2.1  Minimum Distance Estimators
    5.2.2  Estimators for Unnormalised Models
    5.2.3  Estimators for Generative Models
    5.2.4  Practical Considerations

Chapter 6  Discussion
  6.1  Contributions of the Thesis
  6.2  Remaining Challenges

Bibliography

Appendix A  Background Material
  A.1  Topology and Functional Analysis
  A.2  Measure and Probability Theory

Appendix B  Proofs of Theoretical Results
  B.1  Proofs of Chapter 3
  B.2  Proofs of Chapter 4
  B.3  Proofs of Chapter 5
List of Figures

1.1  Monte Carlo, importance sampling, Markov chain Monte Carlo and quasi-Monte Carlo (Halton sequence) point sets.
1.2  Closed Newton-Cotes, open Newton-Cotes and Gauss-Legendre point sets.
2.1  Sketch of a Gaussian process prior and posterior.
2.2  Importance of model selection for Gaussian processes.
2.3  Ill-conditioning of the Gram matrix in Gaussian process regression.
3.1  Sketch of Bayesian quadrature.
3.2  Test functions for evaluation of the uncertainty quantification provided by Bayesian Monte Carlo and Bayesian quasi-Monte Carlo.
3.3  Coverage of Bayesian Monte Carlo (with marginalisation) on the test functions.
3.4  Coverage of Bayesian Monte Carlo (without marginalisation) on the test functions.
3.5  Convergence rates for Bayesian Monte Carlo and Bayesian quasi-Monte Carlo.
3.6  Posterior integrals obtained through Bayesian quadrature for thermodynamic integration in model selection.
3.7  Calibration of Bayesian quadrature for thermodynamic integration in model selection.
3.8  The Teal South oil field.
3.9  Bayesian Markov chain Monte Carlo estimates of posterior means on the parameter of the Teal South oil field model (centered around the exact values).
3.10  Bayesian quasi-Monte Carlo for semi-parametric random effects regression.
3.11  Global illumination integrals in computer graphics.
3.12  Bayesian quadrature estimates of the red, green and blue colour intensities for the California lake environment.
3.13  Worst-case error of Bayesian Monte Carlo and Bayesian quasi-Monte Carlo for global illumination integrals.
4.1  Test functions and Gaussian process interpolants in multi-fidelity modelling.
4.2  Uni-output and multi-output Bayesian quadrature estimates for multi-fidelity modelling.
4.3  Uni-output and multi-output Bayesian quadrature estimates in the global illumination problem.
4.4  Frank-Wolfe algorithm on test functions.
4.5  Quantifying numerical error in a model selection problem using Frank-Wolfe Bayesian quadrature.
4.6  Comparison of experimental design-based quadrature rules on the proteomics application.
4.7  Influence of the importance distribution in Bayesian importance sampling.
4.8  Sensitivity of Bayesian importance sampling to the choice of both the covariance function and importance distribution.
4.9  Lack of robustness of experimental design-based quadrature rules.
4.10  Implementation of the stopping criterion for sequential Monte Carlo Bayesian quadrature.
4.11  Performance of sequential Monte Carlo Bayesian quadrature on the running illustration.
4.12  Histograms for the optimal (inverse) temperature parameter on the illustrative example.
4.13  Sensitivity of sequential Monte Carlo Bayesian quadrature to the choice of initial distribution and to the random number generator.
4.14  Performance of sequential Monte Carlo Bayesian quadrature for synthetic problems of increasing complexity.
4.15  Performance of sequential Monte Carlo Bayesian quadrature on the running illustration in increasing dimensions.
4.16  Performance of sequential Monte Carlo Bayesian quadrature for an inverse problem based on an ordinary differential equation.
5.1  Stein greedy points for the Rosenbrock density.
5.2  Descent trajectories of gradient descent and natural gradient descent algorithms on a one-dimensional Gaussian model for estimators based on the KL, SM and KSD divergences.
5.3  Performance of natural gradient descent algorithms on a 20-dimensional Gaussian model for estimators based on the KL, SM and KSD divergences.
5.4  Maximum mean discrepancy estimator based on a Gaussian RBF kernel for a Gaussian location model.
5.5  Maximum mean discrepancy estimator based on a Gaussian RBF kernel for a Gaussian scale model.
Acknowledgments
I am first and foremost grateful to my advisor, Mark Girolami, for giving me opportunities which are usually reserved for researchers at later stages of their careers. He introduced me to exciting areas of research across the fields of statistics, applied mathematics and machine learning, and to researchers doing interesting work in these areas. Mark also gave me the opportunity to attend conferences which greatly enriched my PhD experience and broadened my overview of statistics research. Finally, he has always been a great mentor, and regularly taken the time to make sure my PhD was running smoothly. Of course, most of my work would not have been possible without the support and guidance of Chris Oates, whom I consider a great mentor and unofficial second advisor. Chris has been very patient in answering many of my mathematical questions, and taught me how to structure my research in an effective way. I would also like to express my gratitude to many of the faculty members, students and visitors at the various institutions I have attended during my PhD, including the University of Oxford, the University of Warwick, Imperial College London and The Alan Turing Institute. I am particularly grateful to all of the lecturers and students in the first year of the Oxford-Warwick Statistics programme, which I greatly enjoyed. I am also thankful to Michael Osborne and Dino Sejdinovic, who initiated me into the world of machine learning; Andrew Duncan and Alessandro Barp, who have greatly widened my understanding of mathematics; and Jon Cockayne, Louis Ellam and Thibaut Lienart, who have been of great help in answering many of my questions about statistics, machine learning and programming. I am also grateful to the Society for Industrial and Applied Mathematics
(SIAM), the American Statistical Association (ASA), the Statistical and Applied Mathematical Sciences Institute (SAMSI) and the International Society for Bayesian Analysis (ISBA) for several travel awards which funded some of my travels to international conferences. I have also been fortunate to have the opportunity to visit several universities across the world, including a week-long visit at the University of Technology Sydney hosted by Chris Oates and Matt Wand, a month-long visit at the California Institute of Technology hosted by Andrew Stuart and Houman Owhadi, and a three-week visit at the Isaac Newton Institute at the University of Cambridge, where I was hosted by Catherine Powell. To all my hosts: thank you. Finally, I would also like to thank Wilfrid Kendall and Arthur Gretton for carefully examining this thesis, for their many suggestions, and for an enjoyable viva.
Declarations
This thesis is submitted to the University of Warwick in support of my application for the degree of Doctor of Philosophy. It has been composed by myself and has not been submitted in any previous application for any degree. The work presented (including data generated and data analysis) was carried out by the author except in the cases outlined below. Parts of this thesis have been published by the author:
1. F-X. Briol, C. J. Oates, M. Girolami, and M. A. Osborne. Frank-Wolfe Bayesian quadrature: Probabilistic integration with theoretical guarantees. In Advances in Neural Information Processing Systems 28, pages 1162–1170, 2015a.

The article was mostly written by Chris Oates and myself, and we both contributed equally to the methodology and theory sections. The numerical experiments were done by myself, with some code from a previous publication contributed by Chris Oates. Mark Girolami and Michael Osborne provided helpful suggestions to improve the manuscript. The paper was awarded a "spotlight presentation" at NIPS 2015, which was awarded only to the top 4.5% of submitted papers.
2. F-X. Briol, C. J. Oates, M. Girolami, M. A. Osborne, and D. Sejdinovic. Probabilistic integration: A role in statistical computation? To appear in "Statistical Science" with discussion and rejoinder, arXiv:1512.00933, 2015b.

The article was mostly written by Chris Oates and myself, and we both contributed equally to the methodology and theory sections. The numerical experiments were done by myself, with some code from previous publications also contributed by Chris Oates, Shiwei Lan and Ricardo Marques.

The paper was awarded a "Best Student Paper" award in 2016 by the Section on Bayesian Statistical Science of the American Statistical Association, and was reviewed in a series of blog posts by eminent statisticians including Andrew Gelman (http://andrewgelman.com/2015/12/07/28279/) and Christian Robert (https://xianblog.wordpress.com/2015/12/17/). It has been accepted by "Statistical Science", and will appear with discussions from leading statisticians (including Michael L. Stein, Ying Hung, Art Owen, Fred Hickernell and R. Jagadeeswaran) and a rejoinder from the authors [Briol et al., 2018].
3. F-X. Briol, J. Cockayne, and O. Teymur. Comments on "Bayesian solution uncertainty quantification for differential equations" by Chkrebtii, Campbell, Calderhead & Girolami. Bayesian Analysis, 11(4):1285–1293, 2016.

This paper is a contributed discussion of Chkrebtii et al. [2016], and is the outcome of a series of discussions between Jon Cockayne, Onur Teymur and myself. The paper was mostly written by myself.
4. F-X. Briol, C. J. Oates, J. Cockayne, W. Y. Chen, and M. Girolami. On the sampling problem for kernel quadrature. In Proceedings of the International Conference on Machine Learning, pages 586–595, 2017.

This article is the result of collaborative work between Chris Oates and myself. The numerical experiments were all performed by myself.
5. C. J. Oates, S. Niederer, A. Lee, F-X. Briol, and M. Girolami. Probabilistic models for integration error in the assessment of functional cardiac models. In Advances in Neural Information Processing Systems, 2017d.

This article originates from discussions between Chris Oates and myself. The article and numerical experiments are Chris Oates' work. Angela Lee and Steven Niederer provided the application for the paper.
6. F.-X. Briol and M. Girolami. Bayesian numerical methods as a case study for statistical data science. In Statistical Data Science, pages 99–110. World Scientific, 2018.

This article is a chapter in the book "Statistical Data Science", edited by Niall Adams and Edward Cohen, and was written independently by myself.
7. A. Barp, F.-X. Briol, A. D. Kennedy, and M. Girolami. Geometry and dynamics for Markov chain Monte Carlo. Annual Review of Statistics and Its Application, 5, 2018.

The article was written by Alessandro Barp and myself, with suggestions and revisions by Anthony Kennedy and Mark Girolami.
8. C. J. Oates, J. Cockayne, F.-X. Briol, and M. Girolami. Convergence rates for a class of estimators based on Stein's identity. Bernoulli, 2018.

The article was mostly written by Chris Oates, who also developed most of the theory. Jon Cockayne contributed numerical simulations, and I helped generalise the proofs.
9. X. Xi, F-X. Briol, and M. Girolami. Bayesian quadrature for multiple related integrals. In Proceedings of the International Conference on Machine Learning, PMLR 80:5369–5378, 2018.

This article emanates from Xiaoyue Xi's thesis project for the MSc in Statistics at Imperial College London, which was supervised by myself. The project, methodology and theory were all done by myself; Xiaoyue Xi was in charge of numerical simulations. The article was accepted for a "long talk" at ICML 2018, which was awarded to the top 8% of submitted papers.
10. W. Y. Chen, L. Mackey, J. Gorham, F-X. Briol, and C. J. Oates. Stein points. In Proceedings of the International Conference on Machine Learning, PMLR 80:843–852, 2018.

The idea was originally proposed by myself. Chris Oates wrote most of the paper, Wilson Chen performed the numerical experiments, and Lester Mackey and Jackson Gorham developed the theory.
11. F.-X. Briol, C. J. Oates, M. Girolami, M. A. Osborne, and D. Sejdinovic. Rejoinder for "Probabilistic integration: a role in statistical computation?". Statistical Science (to appear), arXiv:1811.10275, 2018.

The article was written by myself, with comments and feedback from all other authors. It will appear as the rejoinder for Briol et al. [2015b] in "Statistical Science".
Chapter 5 is also part of ongoing work with Andrew Duncan, Alessandro Barp, Lester Mackey & Chris Oates.
Abstract
Modern statistical inference has seen a tremendous increase in the size and complexity of models and datasets. As such, it has become reliant on advanced computational tools for implementation. A first canonical problem in this area is the numerical approximation of integrals of complex and expensive functions. Numerical integration is required for a variety of tasks, including prediction, model comparison and model choice. A second canonical problem is that of statistical inference for models with intractable likelihoods. These include models with intractable normalisation constants, or models which are so complex that their likelihood cannot be evaluated, but from which data can be generated. Examples include large graphical models, as well as many models in imaging or spatial statistics. This thesis proposes to tackle these two problems using tools from the kernel methods and Bayesian non-parametrics literature. First, we analyse a well-known algorithm for numerical integration called Bayesian quadrature, and provide consistency and contraction rates. The algorithm is then assessed on a variety of statistical inference problems, and extended in several directions in order to reduce its computational requirements. We then demonstrate how the combination of reproducing kernels with Stein's method can lead to computational tools which can be used with unnormalised densities, including numerical integration and approximation of probability measures. We conclude by studying two minimum distance estimators derived from kernel-based statistical divergences which can be used for unnormalised and generative models. In each instance, the tractability provided by reproducing kernels and their properties allows us to provide easily-implementable algorithms whose theoretical foundations can be studied in depth.
Abbreviations
BIS ...... Bayesian importance sampling
BMC, BMCMC ...... Bayesian Monte Carlo, Bayesian Markov chain Monte Carlo
BQ ...... Bayesian quadrature
BQMC ...... Bayesian quasi-Monte Carlo
FW, FWLS ...... Frank-Wolfe, Frank-Wolfe with line search
FWBQ ...... Frank-Wolfe Bayesian quadrature
FWLSBQ ...... Frank-Wolfe with line search Bayesian quadrature
GP ...... Gaussian process
IID ...... Identically and independently distributed
IS ...... Importance sampling
KL ...... Kullback-Leibler
KSD ...... Kernel Stein discrepancy
MC, MCMC ...... Monte Carlo, Markov chain Monte Carlo
MMD ...... Maximum mean discrepancy
QMC ...... Quasi-Monte Carlo
RBF ...... Radial basis function
RKHS ...... Reproducing kernel Hilbert space
SM ...... Score matching
SMC ...... Sequential Monte Carlo
SMC-BQ ...... Sequential Monte Carlo Bayesian quadrature
SMC-BQ-KL ...... Sequential Monte Carlo Bayesian quadrature with kernel learning
WCE ...... Worst-case error
Chapter 1
Challenges for Statistical Computation
“Computations are an issue in statistics whenever processing a dataset becomes a difficulty, a liability, or even an impossibility.”
Green et al. [2015]
As illustrated by Green et al. [2015], computation has always been an issue for large-scale statistical inference. Recently, computational issues have been exacerbated by increases in computing resources and the availability of larger datasets, which have encouraged scientists to fit ever-more complex models. Keeping up with these changes is a constant challenge for researchers in computational statistics. In this thesis, we review some of the main problems in this area and contribute novel methodology to two of them: (i) the problem of numerical integration of complex and expensive functions, and (ii) the problem of statistical inference for models with intractable likelihoods.
1.1 Challenge I: Numerical Integration and Sampling
Let $(\mathcal{X}, \mathcal{F}, \mu)$ be a measure space¹. A major issue preventing the application of many complex statistical methodologies is the need to compute the Lebesgue integral of
¹We assume the reader is familiar with notions of measure and probability theory. If this is not the case, see Appendix A.2 for a brief introduction.
some integrable function $f : \mathcal{X} \to \mathbb{R}$:
\[
\Pi[f] := \int_{\mathcal{X}} f(x)\,\Pi(\mathrm{d}x), \tag{1.1}
\]
where $\Pi$ is some probability measure on $(\mathcal{X}, \mathcal{F})$ assumed to admit some probability density function $\pi$ with respect to some underlying reference measure $\mu$ on the space $\mathcal{X}$. The space $\mathcal{X}$ is called the state space and is usually a subspace of $\mathbb{R}^d$, or some manifold embedded in $\mathbb{R}^d$, for some $d \in \mathbb{N}$ (where we adopt the convention that $\mathbb{N}$ does not include 0).

From the point of view of statistical computation, the main issue arises when these integrals cannot be evaluated in closed form and have to be estimated numerically. Historically, classical quadrature rules such as Gaussian quadratures have been used extensively [Naylor and Smith, 1982; Smith et al., 1985]. These are, however, only suitable for low-dimensional integrals with a smooth integrand. Nowadays, it is common to use Monte Carlo (MC) methods [Meyn and Tweedie, 1993; Liu, 2001; Robert and Casella, 2004] to approximate the integral by taking an average of function values at samples from $\Pi$ (either identically and independently distributed (IID) or approximately IID). In both of the cases above, we obtain an approximation of the form:
\[
\hat{\Pi}[f] := \sum_{i=1}^{n} w_i f(x_i), \tag{1.2}
\]
called a quadrature (or cubature) rule, based on point sets (also called samples) $x_i \in \mathcal{X}$ and weights $w_i \in \mathbb{R}$ for $i = 1, \dots, n$. Under certain regularity conditions, this estimator converges to the solution of the integral as $n \to \infty$. For finite but large sample sizes $n$, the estimator reasonably approximates the truth. However, these estimators will have (potentially very) large errors whenever $\pi$ is highly multimodal, the state space $\mathcal{X}$ is high-dimensional, or the integrand $f$ is computationally expensive to evaluate. Adapting numerical integration methods to each of these scenarios is one of the main tasks in computational statistics. We now highlight several applications of numerical integration in statistics.
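The generic form (1.2) covers both classical quadrature and Monte Carlo estimators; only the choice of points and weights differs. As a minimal sketch (our own illustration, not from the thesis; the function names are ours), the snippet below builds a Gauss-Legendre rule with NumPy, rescales it to the uniform probability measure on $[0,1]$, and applies it to an integrand whose integral is known:

```python
import numpy as np

def quadrature_rule(f, points, weights):
    # Generic estimator of Equation (1.2): sum_i w_i f(x_i).
    return float(np.sum(weights * f(points)))

# Gauss-Legendre nodes and weights on [-1, 1], rescaled so that the
# rule integrates against the uniform (probability) measure on [0, 1].
nodes, raw_weights = np.polynomial.legendre.leggauss(10)
points = 0.5 * (nodes + 1.0)
weights = 0.5 * raw_weights          # the rescaled weights sum to 1

estimate = quadrature_rule(np.exp, points, weights)
exact = np.e - 1.0                   # integral of exp(x) over [0, 1]
print(abs(estimate - exact))         # tiny: the integrand is smooth
```

With only 10 points the error is near machine precision here, which illustrates the remark above: classical rules excel for smooth, low-dimensional integrands, but are not the tool of choice beyond that regime.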
1.1.1 Applications in Bayesian Statistics
In Bayesian statistics [Robert, 1994; Gelman et al., 2013], once a model and a prior have been specified, all that remains to be done is to repeatedly apply Bayes' theorem until we obtain a distribution on the variables of interest conditioned on every other observable variable. Denote by $X$ the matrix whose rows are data points $\{x_i\}_{i=1}^{n}$ from some data space denoted $\mathcal{D}$, and by $\theta \in \Theta$ the parameters of a statistical model. For simplicity, we assume that $\mathcal{D}$ and $\Theta$ are both Euclidean spaces. The simplest formulation of Bayesian inference (assuming the existence of all densities) is the following equation²:
\[
p(\theta|X) = \frac{p_0(\theta)\,p(X|\theta)}{p(X)}, \tag{1.3}
\]
where $p(X|\theta)$ denotes the likelihood, or statistical model, and describes the plausibility of the parameter taking value $\theta$ when $X$ is observed. Furthermore, $p_0(\theta)$ is the prior density on the unknown model parameters $\theta$, and $p(\theta|X)$ denotes the posterior density (after having observed $X$) on these same parameters. The quantity in the denominator, $p(X)$, is called the model evidence or marginal likelihood, and can be expressed as
\[
p(X) = \int_{\Theta} p(X|\theta)\,p_0(\theta)\,\mathrm{d}\theta. \tag{1.4}
\]
The model evidence is an example of an integral that almost always needs to be computed, in this particular case in order to be able to evaluate our posterior on the parameters $\theta$. This is possible in closed form only in special cases, in which the Bayesian approach is called a conjugate analysis. Integrals are also required when predicting new data values $x' \in \mathcal{D}$. This can be done by computing the posterior predictive distribution
\[
p(x'|X) = \int_{\Theta} p(x'|\theta)\,p(\theta|X)\,\mathrm{d}\theta, \tag{1.5}
\]
which allows us to propagate the uncertainty in our posterior through to predictions. Similar integrals are also required for model selection with Bayes factors [Kass and Raftery, 1995] or for Bayesian model averaging [Hoeting et al., 1999]. Clearly, Bayesian inference would be restricted to very simple models without numerical integration. This explains why Bayesian methods only became widely popular across the sciences in the 1990s, at which point the statistics community had been introduced to Markov chain Monte Carlo (MCMC) methods [Robert and Casella, 2011].
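For intuition, the evidence integral (1.4) can be checked numerically in a simple conjugate setting where it is also available in closed form. The sketch below is our own toy illustration (the model and all names are ours, not from the thesis): a Gaussian likelihood with a Gaussian prior on its mean, for which the evidence is $p(x) = \mathcal{N}(x; 0, \sigma^2 + \tau^2)$:

```python
import numpy as np

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

# Model: x | theta ~ N(theta, sigma^2), prior: theta ~ N(0, tau^2).
sigma, tau, x_obs = 1.0, 2.0, 0.7

# Evidence p(x) = integral of p(x|theta) p_0(theta) dtheta (Equation 1.4),
# approximated with a simple Riemann sum over a wide grid of theta values.
theta = np.linspace(-25.0, 25.0, 100_001)
step = theta[1] - theta[0]
integrand = normal_pdf(x_obs, theta, sigma) * normal_pdf(theta, 0.0, tau)
evidence_numeric = float(np.sum(integrand) * step)

# Conjugacy gives the evidence in closed form: x ~ N(0, sigma^2 + tau^2).
evidence_exact = float(normal_pdf(x_obs, 0.0, np.sqrt(sigma**2 + tau**2)))
print(evidence_numeric, evidence_exact)  # the two agree closely
```

A one-dimensional grid works here precisely because $\Theta$ is one-dimensional; as the chapter goes on to argue, such grid-based schemes break down quickly as the dimension of $\Theta$ grows, which is what motivates Monte Carlo methods.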
²There is a clear abuse of notation in this equation since $p$ is used for different densities. However, this is common in practice for ease of exposition.
1.1.2 Applications in Frequentist Statistics
Challenging integrals are also ubiquitous in frequentist statistics, and are often required for maximum likelihood estimation. Suppose we have IID realisations $\{x_i\}_{i=1}^{n} \subset \mathcal{D}$ from a probability measure $P_{\theta^*}$ belonging to some parametric family of Borel probability measures $\mathcal{P}_\Theta(\mathcal{D}) = \{P_\theta : \theta \in \Theta\}$ defined on $\mathcal{D}$. Once again assume $\mathcal{D}$ and $\Theta$ are Euclidean spaces, and denote by $p(\cdot|\theta)$ the Lebesgue density of $P_\theta$. We are interested in finding the "true" parameter $\theta^* \in \Theta$ which generated these samples, and the maximum likelihood approach proposes to do so by maximising the expected log-likelihood under the data-generating process:
\[
\operatorname*{arg\,max}_{\theta \in \Theta} \int_{\mathcal{D}} \log p(x|\theta)\, p(x|\theta^*)\,\mathrm{d}x. \tag{1.6}
\]
In practice, the integral is usually approximated using an MC estimate based on the samples $\{x_i\}_{i=1}^{n}$ that are readily available, and we get the following optimisation problem:
\[
\operatorname*{arg\,max}_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} \log p(x_i|\theta). \tag{1.7}
\]
Numerical integration is therefore clearly fundamental here, and we may wish to use more efficient methods to approximate the integral. Note that this approach is of course only feasible if the likelihood can be evaluated in closed form. In the case of latent variable models, this does not necessarily hold. Indeed, assume we have a set of unobserved variables $y$ (called nuisance parameters) in some space $\mathcal{Y}$. In this case, we would usually have access to a conditional likelihood $p(x|y, \theta)$, and therefore need to integrate out all possible values of the latent variable $y$ to get a marginal likelihood:
\[
p(x|\theta) = \int_{\mathcal{Y}} p(x|y, \theta)\,p(y|\theta)\,\mathrm{d}y, \tag{1.8}
\]
which we can then use for maximum likelihood estimation. This will be infeasible for most models and will once again require numerical integration; see Diggle et al. [2013] for an example with log-Gaussian Cox models or Grazzini et al. [2017] for an example with agent-based models.
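To make the passage from (1.6) to (1.7) concrete, the sketch below (our own toy example, not from the thesis) draws IID samples from a Gaussian location model and maximises the Monte Carlo approximation (1.7) over a crude grid of candidate parameters; the maximiser lands close to the true $\theta^*$:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_star = 1.5
x = rng.normal(loc=theta_star, scale=1.0, size=100_000)  # IID from P_theta*

def mc_expected_loglik(theta, x):
    # Monte Carlo approximation (1.7) of the expected log-likelihood (1.6)
    # for the model x ~ N(theta, 1); constants in theta kept for clarity.
    return np.mean(-0.5 * (x - theta) ** 2 - 0.5 * np.log(2.0 * np.pi))

# Crude maximisation of (1.7) over a grid of candidate parameters.
grid = np.linspace(0.0, 3.0, 601)
theta_hat = float(grid[np.argmax([mc_expected_loglik(t, x) for t in grid])])
print(theta_hat)  # close to theta_star = 1.5
```

For this particular model the maximiser of (1.7) is of course just the sample mean; the grid search only serves to mirror the abstract optimisation problem as stated.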
1.1.3 Existing Methodology
The ubiquity of integration across statistics should now be clear to the reader. We now move on to discuss how the problem of numerical integration can be tackled in practice. In this section, we briefly review existing methods, then discuss some of their shortcomings. Recall that we assume throughout this chapter that $(\mathcal{X}, \mathcal{F}, \mu)$ is a measure space and $\Pi$ is a probability measure with density (with respect to $\mu$) denoted $\pi$.
Monte Carlo Integration and Importance Sampling
Monte Carlo methods are quadrature rules based on uniform weights. The simplest of these methods, which is usually simply referred to as "Monte Carlo" [Robert and Casella, 2004; Glasserman, 2004], consists of obtaining IID realisations $\{x_i^{\mathrm{MC}}\}_{i=1}^{n}$ from the measure $\Pi$ and approximating $\Pi[f]$ as:
\[
\hat{\Pi}_{\mathrm{MC}}[f] := \frac{1}{n} \sum_{i=1}^{n} f\big(x_i^{\mathrm{MC}}\big).
\]
An illustration of such a point set is available in Figure 1.1 for the case where $\Pi$ is a uniform measure on the unit cube $\mathcal{X} = [0,1]^2$. MC estimators are popular in statistics owing to their wide applicability and their well-known properties. For instance, under regularity conditions (omitted for brevity), the central limit theorem gives that
\[
\sqrt{n}\,\big(\hat{\Pi}_{\mathrm{MC}}[f] - \Pi[f]\big) \xrightarrow{\;D\;} \mathcal{N}\big(0, \mathrm{Var}_\pi[f]\big), \tag{1.9}
\]
where $\xrightarrow{\;D\;}$ denotes convergence in distribution. We use the notation $\mathcal{N}(m, c)$ to denote a normal distribution with mean $m$ and covariance $c$, and $\mathrm{Var}_\pi[f] = \Pi[f^2] - \Pi[f]^2$ is the variance of $f$ under $\Pi$. MC is well-suited to numerical integration problems since it provides a dimension-independent convergence rate of $O_P(n^{-1/2})$.³

A major limitation of MC is the need to sample IID realisations from $\Pi$, which is only possible for a limited set of distributions. An alternative estimator with weighted point sets is called importance sampling (IS) and is of the form:
\[
\hat{\Pi}_{\mathrm{IS}}[f] := \sum_{i=1}^{n} w_i^{\mathrm{IS}} f\big(x_i^{\mathrm{IS}}\big), \tag{1.10}
\]
where $\{x_i^{\mathrm{IS}}\}_{i=1}^{n}$ are IID realisations from another probability measure $\Pi'$, called the importance measure. This importance measure is defined on $(\mathcal{X}, \mathcal{F})$ and is specified a priori by the user. Its density with respect to $\mu$ satisfies $\pi'(x) > 0$ whenever $\pi(x)f(x) \neq 0$, and the IS weights are given by:
\[
w_i^{\mathrm{IS}} := \frac{\pi(x_i)}{n\,\pi'(x_i)}, \tag{1.11}
\]
for $i = 1, \dots, n$. The IS estimator in Equation 1.10 can be seen as an MC estimator where the function $f'(x) = f(x)\pi(x)/\pi'(x)$ is integrated with respect to the measure $\Pi'$. IS is most often used when IID sampling from $\Pi$ is not feasible, or because clever choices of importance distribution $\Pi'$ can lead to significant variance reduction in the corresponding central limit theorem. However, IS tends to become inefficient in high dimensions, where most samples will have near-zero weight. This is due to the fact that, in high dimensions, regions of high probability will tend to be concentrated on small subsets of the sample space $\mathcal{X}$ (a phenomenon known as the curse of dimensionality; see [MacKay, 2003; Betancourt, 2017]).

An illustration of IS is given in Figure 1.1 (middle left), where IID realisations $\{x_i^{\mathrm{IS}}\}_{i=1}^{n}$ are obtained from some importance measure $\Pi'$ which is a truncated Gaussian centred at the origin. The size of the samples is plotted proportional to their weight (as given by Equation 1.11). As observed, there are fewer realisations in the top right corner, but these have larger weights. This compensates for the fact that $\Pi'$ has very low mass in that part of the domain. The choice of importance distribution will be particularly efficient if the integrand $f$ is such that $\pi'(x) \propto f(x)\pi(x)$. In this case, $\Pi'$ would be the optimal importance sampling distribution and the IS estimator would have lower asymptotic variance than the MC estimator.

³We write that some function $f(x)$ is $O(g(x))$ if the statement "$\exists M, x_0 > 0$ such that $|f(x)| \leq M g(x)$ whenever $x \geq x_0$" holds. Furthermore, we write $f(x)$ is $O_P(g(x))$ if the statement holds with high probability.
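Both estimators are simple to implement. The sketch below (our own illustration; the target, integrand and importance measure are arbitrary choices, not from the thesis) compares plain MC with IS for $\Pi = \mathcal{N}(0,1)$ and $f(x) = x^2$, so that $\Pi[f] = 1$; the importance measure $\Pi'$ is a wider Gaussian and the weights follow Equation (1.11):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
f = lambda x: x ** 2  # Pi[f] = 1 for Pi = N(0, 1)

def normal_pdf(x, sd):
    return np.exp(-0.5 * (x / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

# Plain Monte Carlo: IID realisations from Pi, uniform weights 1/n.
x_mc = rng.normal(size=n)
mc_estimate = float(np.mean(f(x_mc)))

# Importance sampling: IID realisations from Pi' = N(0, 2), with weights
# pi(x_i) / (n pi'(x_i)) as in Equation (1.11); the 1/n sits inside np.mean.
x_is = rng.normal(loc=0.0, scale=2.0, size=n)
w = normal_pdf(x_is, sd=1.0) / normal_pdf(x_is, sd=2.0)
is_estimate = float(np.mean(w * f(x_is)))

print(mc_estimate, is_estimate)  # both close to 1
```

The positivity condition on $\pi'$ stated above is what keeps the weights well-defined: the wider Gaussian covers the support of $\pi f$, so no region where $\pi(x)f(x) \neq 0$ is missed.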
Markov Chain Monte Carlo Integration
Often we only know π, the density of Π, up to a multiplicative constant. That + is, we are able to evaluateπ ˜ where π(x) =π ˜(x)/Z for some unknown Z ∈ R . This is for example the case in Bayesian statistics where the probability measure Π is a posterior measure and the normalisation constant Z is the model evidence in Equation 1.4. In this case, neither MC or IS can be used, but MCMC methods [Meyn and Tweedie, 1993; Robert and Casella, 2004] can be a useful alternative. The idea MCMC n behind MCMC is to generate correlated samples {xi }i=1 which, marginally, are approximately IID realisations from the target measure Π by obtaining a realisation from a Markov chain whose stationary measure is Π. The estimator of the integral
Figure 1.1: Monte Carlo, importance sampling, Markov chain Monte Carlo and quasi-Monte Carlo (Halton sequence) point sets. Plot of n = 100 points for each algorithm for integration against a uniform distribution on [0, 1]².
then becomes:
Π̂_MCMC[f] := (1/n) Σ_{i=1}^n f(x_i^MCMC). (1.12)
Recall that a Markov chain is a sequence of random variables X_0, X_1, . . . such that the distribution of X_i is conditional only on X_{i−1}. A Markov chain may be specified by an initial measure H_0 (with density h_0) for X_0 and a transition measure T (with density t(·|x) : X → R+) from which we can sample. X_i is then a realisation from a measure H_i with density given by h_i(x′) = ∫_X t(x′|x) h_{i−1}(x) dx. The measure Π is called a stationary measure of the Markov chain if, whenever X_i is a realisation from
Π, then X_{i+1} is also a realisation from Π. This can be summarised succinctly by the following condition: π(x′) = ∫_X t(x′|x) π(x) dx. If the Markov chain is ergodic, it will converge to its stationary measure independently of its initialisation. The Metropolis-Hastings algorithm [Metropolis et al., 1953; Hastings, 1970] is the most widely used example of MCMC. It aims to construct a Markov chain converging to the desired target measure Π by means of a proposal kernel K : X × X → [0, 1], where for each x ∈ X, K(·, x) is a probability measure with density
κ(·, x): X → R. The algorithm proceeds as follows. First, draw a realisation x0 ∈ X from H0. Then, at each iteration, given the current state xi ∈ X :
1. Propose a new state x̃ by obtaining a realisation from K(·, x_i).
2. Accept the proposed state (i.e. set x_{i+1} = x̃) with probability A(x̃|x_i) := min{1, [π(x̃)κ(x_i, x̃)] / [π(x_i)κ(x̃, x_i)]}; else keep the previous state (i.e. set x_{i+1} = x_i).
This induces a transition kernel T : X × X → [0, 1] where, for each fixed x ∈ X, T(·, x) is a probability distribution. When it exists, the density of T(·, x) is given by:
t(x′|x) := κ(x′, x)A(x′|x) + 1_{x′=x}(1 − A(x′|x)),
where 1_{x′=x} takes value 1 when x′ = x and 0 otherwise. Note that the computation of A(x′|x) does not rely on the constant Z, due to a cancellation in the ratio. In principle, only mild requirements on the proposal kernel are needed to obtain an asymptotically correct algorithm. The choice of proposal kernel will however have a strong influence on the performance of the algorithm. Intuitively, the aim is to choose a proposal kernel which favours values with high probability of acceptance. Concurrently, we would also like the proposal kernel to be designed so that the chain explores the state space well in a small number of iterations, so that the realisations are as close to IID as possible. A common choice is a symmetric distribution centred on the current state of the chain, which gives the well-known random-walk Metropolis algorithm. This algorithm is illustrated in Figure 1.1 (middle right), where we have used a Gaussian proposal centred at the current state and with variance 0.1. The black dots give the samples of the Markov chain and their size depends on the number of times they are repeated in the chain, whilst the dotted lines indicate the path of the chain. This example highlights the difficulty of tuning Markov chains: the chain is not very efficient at covering the whole space, which could most likely be resolved by increasing the variance of the proposal. It is also often possible to use more efficient transition kernels. A more advanced algorithm is the Metropolis-adjusted Langevin algorithm [Rossky et al., 1978; Scalettar et al., 1986; Roberts and Rosenthal, 1998], which exploits gradients by approximating the path of a diffusion whose invariant measure is the target distribution. Duane et al. [1987] later proposed a method based on approximating Hamiltonian dynamics with potential energy given by the negative log target density.
This method was originally named Hybrid Monte Carlo, but is also commonly known as Hamiltonian Monte Carlo [Neal, 2011; Betancourt, 2017; Barp et al., 2018]. Informally, these two methods have the advantage of using transition kernels which direct the Markov chain towards areas of high probability, and are hence preferable to the random-walk Metropolis algorithm above. Another alternative to these algorithms is to restrict our proposal measure to a parametric family of transition kernels. We then assume that a member of this family is a good choice, and attempt to learn the corresponding parameter on the fly. Algorithms of this form are called adaptive MCMC algorithms [Gilks et al., 1994; Haario et al., 2001; Andrieu and Thoms, 2008]. Adaptive MCMC algorithms can be very efficient, but proving their correctness is difficult: the Markov property no longer holds, since we are allowing the process to depend on more than the current state.
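As a minimal illustration of the Metropolis-Hastings algorithm above, the following sketch (a hypothetical example, assuming NumPy; the target, step size and chain length are illustrative) runs a random-walk Metropolis chain for a standard Gaussian target that is only known up to its normalisation constant:

```python
import numpy as np

rng = np.random.default_rng(1)

# Unnormalised log-target: standard Gaussian (Z need not be known).
def log_pi_tilde(x):
    return -0.5 * x ** 2

n, step = 50_000, 1.0
chain = np.empty(n)
x = 0.0  # realisation from the initial measure H_0 (illustrative choice)
for i in range(n):
    proposal = x + step * rng.normal()  # symmetric Gaussian proposal
    # For a symmetric kernel, the kappa terms cancel in A(x_tilde | x_i).
    if np.log(rng.uniform()) < log_pi_tilde(proposal) - log_pi_tilde(x):
        x = proposal  # accept
    chain[i] = x      # a rejection keeps the previous state

mcmc_estimate = chain.mean()  # Equation 1.12 with f(x) = x
```

The acceptance ratio is computed on the log scale, which is the standard way of avoiding numerical overflow in practice.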
Sequential Monte Carlo samplers
An approach which combines ideas from IS and MCMC is the class of sequential Monte Carlo (SMC) methods. SMC (and other particle-based schemes) have had an enormous influence in signal processing, and more generally filtering and smoothing in a Bayesian context [Doucet and Johansen, 2011; Särkkä, 2013]. More recently, SMC algorithms have been proposed to sample from complex distributions, and these can be particularly efficient for multimodal target distributions. SMC samplers [Chopin, 2002; Del Moral et al., 2006] start by defining a sequence of probability measures Π_0, Π_1, . . . , Π_T where Π_T = Π is the measure we would like to integrate against. The main idea behind SMC is to sequentially obtain realisations from this sequence of measures by moving a set of particles. If Π_0 is simple to sample from, and moving particles across consecutive measures in this sequence is also relatively easy, then SMC samplers can render the task of sampling from Π manageable even when sampling from Π directly (e.g. using MCMC) would be difficult or even infeasible. The algorithm proceeds as follows. First, we obtain particles {x_i}_{i=1}^n as IID realisations from Π_0; then, at each iteration t of the algorithm:
1. Update the weights using the formula for IS weights in Equation 1.11, with importance distribution Π_{t−1} and target Π_t.
2. If some resampling criterion (described below) is satisfied, do a resampling step. This means sampling (with replacement) from our current set of particles according to their respective weights, and setting the weights of the resampled particles to 1/n.
3. Update the particles using MCMC step(s) with invariant measure Πt.
Once iteration T is attained, a last resampling step is used to obtain a final set of equally-weighted particles, which we will denote {x_i^SMC}_{i=1}^n. When this procedure is completed, we end up with the estimator:
Π̂_SMC[f] := (1/n) Σ_{i=1}^n f(x_i^SMC).
The resampling strategy can be useful to avoid the degeneracy of particle weights which is common with IS methods (i.e. most particles end up having near-zero weight). The most common resampling approach is the multinomial sampling described above, but alternatives can be more efficient [Douc et al., 2005]. Note that, to avoid resampling at every step, it is common to use a criterion based on the variability of the current samples, such as the effective sample size. Another example, called the conditional effective sample size, was proposed by Zhou et al. [2016]. Given a set of weighted particles {x_i, w_i}_{i=1}^n at iteration j, it can be computed as:
CESS = n (Σ_{i=1}^n w_i z_i)² / (Σ_{i=1}^n w_i z_i²),
where z_i = (π(x_i)/π_0(x_i))^{t_j − t_{j−1}} for i = 1, . . . , n and π_0 is the density of Π_0.
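The effective sample size and the conditional effective sample size above are both simple functions of the weights; a small sketch (hypothetical helper names, assuming NumPy) implements them directly from their formulas:

```python
import numpy as np

def ess(w):
    """Effective sample size of (possibly unnormalised) weights w."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

def cess(w, z):
    """Conditional ESS of Zhou et al. [2016] for incremental weights z."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    z = np.asarray(z, dtype=float)
    return len(w) * np.sum(w * z) ** 2 / np.sum(w * z ** 2)

# Equal weights and constant increments recover the full sample size n.
w, z = np.ones(4) / 4, np.ones(4)
print(ess(w), cess(w, z))  # 4.0 4.0
```

Degenerate weights drive the ESS towards 1, which is exactly the situation the resampling step is designed to detect.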
Quasi-Monte Carlo Integration
All of the methods we have seen so far focus on approximating the target measure Π. When Π is simple, it is common to exploit properties of the integrand instead. Quasi-Monte Carlo (QMC) methods [Dick and Pillichshammer, 2010; Dick et al., 2013] are estimators based on point sets with grid-like structures and uniform weights, usually defined on a domain X which is the unit cube, for integration against the uniform measure Π on this cube:
Π̂_QMC[f] := (1/n) Σ_{i=1}^n f(x_i^QMC).
The point set {x_i^QMC}_{i=1}^n is chosen to minimise some notion of discrepancy between an empirical measure and the measure Π. For this reason, these point sets are often referred to as “space-filling designs”, and different notions of discrepancy lead to different QMC rules. These designs are either nested (i.e. the point set with n + 1 points can be obtained by adding one point to the point set of size n), in which case they are called “open”, or non-nested (the point set of size n + 1 needs to be recomputed from scratch), in which case they are called “closed”. An example of an open QMC point set is the Halton sequence, shown for the case d = 2 in red in Figure 1.1 (right). It can be observed that the sequence fills the space in a much more uniform way than the plot of MC points (in blue). This space-filling property means that QMC rules can usually attain faster convergence than MC methods. Under mild conditions on f, the error decreases at the asymptotic rate O(n^{−1+ε}), where ε denotes logarithmic terms. Specific methods have also been used to obtain fast convergence rates of O(n^{−α/d}) in the classical Sobolev spaces W_2^α(X) (see footnote 4), where α denotes the number of weak derivatives of functions in the space. It is also well known [Sloan and Woźniakowski, 1998] that QMC can perform particularly well in high dimensions; for example, a dimension-independent convergence rate of O(n^{−α+ε}) can be proved for Sobolev spaces of dominating mixed smoothness (usually denoted S_2^α(X)). Another direction of research has been randomised quasi-Monte Carlo, which proposes to randomise QMC point sets in a way which preserves their space-filling properties. A particular example is the scrambling method of Owen [1997], which can also be shown to converge quickly for smooth functions (in mean-squared error). Although QMC methods can be used to obtain fast convergence rates, they tend to be impractical for many applications due to their restriction to the cube and the uniform measure. Several ways of avoiding this issue have been proposed in the literature, mostly focusing on transforming alternative problems to fit in this setup, but these tend to be impractical.
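A minimal sketch of the Halton construction mentioned above (assuming NumPy; the function names are illustrative, and the integrand is a toy choice whose exact integral over [0, 1]² is 1/4):

```python
import numpy as np

def van_der_corput(i, base):
    """i-th term (i >= 1) of the van der Corput sequence in a given base."""
    x, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        x += digit / denom
    return x

def halton(n, bases=(2, 3)):
    """First n points of the Halton sequence on the unit cube."""
    return np.array([[van_der_corput(i, b) for b in bases]
                     for i in range(1, n + 1)])

# QMC estimate of the integral of f(x, y) = x * y over [0, 1]^2 (= 1/4).
pts = halton(1024)
qmc_estimate = np.mean(pts[:, 0] * pts[:, 1])
```

Each coordinate uses a distinct prime base, which is what gives the Halton sequence its low discrepancy; the sequence is open, so extending the point set only requires computing the additional terms.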
Classical Deterministic Quadrature Rules
As already pointed out, another alternative which has historically been popular, but is now rarely used in modern statistical inference problems, are classical deterministic quadrature rules [Davis and Rabinowitz, 2007]. These rules are usually designed to integrate functions on some interval (a, b) ⊂ R, and the weights and points are often chosen so as to integrate any polynomial up to a certain degree exactly. The simplest examples include the midpoint rule, consisting of one point x_1 = (a + b)/2 with weight w_1 = |b − a|, and the trapezoidal rule, consisting of the two points x_1 = a, x_2 = b with weights w_1 = w_2 = |b − a|/2. These two rules both integrate exactly all polynomials up to degree 1, and are part of the family of Newton-Cotes rules, which are based on equally-spaced points. These rules can be either closed, in which case they evaluate the integrand on the boundary of the interval, or open (see footnote 5), in which case they do not evaluate integrands on the boundary. See Figure 1.2 for an illustration. Another class of quadrature rules, which can integrate polynomials of degree up to 2n − 1 with n points, are the Gaussian quadrature rules [Golub and Welsch, 1969]. Different examples of Gaussian quadrature rules exist, depending on the measure
Footnote 4: See Appendix A.1 for definitions and some additional background on functional analysis.
Footnote 5: Note that the concepts of open and closed are different to those used in the QMC literature.
Figure 1.2: Closed Newton-Cotes, open Newton-Cotes and Gauss-Legendre point sets. Plot of n = 100 points for each method for integration against a uniform distribution on [0, 1]².
against which the integral is taken. For example, the Gauss-Hermite rule can be used when the measure Π is Gaussian, and the Gauss-Legendre rule (see Figure 1.2) when the measure Π is uniform. Similarly to some QMC sequences, classical deterministic quadrature rules can also be nested: see for example the class of Fejér quadrature rules (closely related to Clenshaw-Curtis quadrature rules). The reason for their lack of use in statistics is that they are usually limited to integration over one-dimensional intervals. Several attempts have been made to scale these rules to multiple dimensions (in which case they are called cubature rules), including tensor product structures and sparse quadrature structures such as Smolyak sparse grids. However, these methods have not really been used in statistics, due to the fact that most classical deterministic quadrature rules require integrals to be taken against very simple measures such as the uniform or Gaussian measure.
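The exactness property of Gaussian quadrature can be checked directly; a small sketch using NumPy's built-in Gauss-Legendre nodes and weights (`numpy.polynomial.legendre.leggauss`; the test integrand is an illustrative choice):

```python
import numpy as np

# Gauss-Legendre: n points integrate all polynomials of degree <= 2n - 1
# exactly over [-1, 1].
nodes, weights = np.polynomial.legendre.leggauss(3)  # n = 3 points

def quad(f):
    return float(np.sum(weights * f(nodes)))

# x^4 has degree 4 <= 2 * 3 - 1 = 5, so the rule is exact:
# the integral of x^4 over [-1, 1] is 2/5.
print(quad(lambda x: x ** 4))  # 0.4 (up to floating point)
```

To integrate against the uniform probability measure on [−1, 1] rather than the Lebesgue measure, one would simply divide the result by the interval length 2.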
Laplace Approximations and Variational Inference
We conclude with a brief discussion of optimisation-based methods such as the Laplace approximation and variational inference. Although these methods do not of themselves replace numerical integration, they are often used in order to approximate distributions, and integrals with respect to the target distribution are then replaced by integrals with respect to these surrogates. The simplest approach is the Laplace approximation, which consists of fitting a Gaussian with mean at the mode of the posterior and using the local geometry of this mode for the covariance. This will of course be efficient if the posterior is peaked and resembles a Gaussian, but can be extremely poor if the posterior has heavy tails or is multimodal. Efficient modern implementations include the integrated nested Laplace approximation (INLA) [Rue et al., 2009, 2016], which focuses on the class
of latent Gaussian models. An alternative approach is variational inference [Jordan et al., 1999; Blei et al., 2017]. The aim here is to approximate some challenging probability density (usually a Bayesian posterior) by choosing a parametric class of distributions, and approximating the target density by the member of this class which is closest in some notion of distance (usually a statistical divergence). This approach has the advantage that it is much less computationally demanding than most advanced Monte Carlo methods, but it also has several disadvantages. Firstly, it can be fairly limited if the variational family is not large enough. Indeed, the main issue with variational inference is that it is not asymptotically exact: even in the limit of infinite computational effort, there is no guarantee that the approximation error will tend to zero if the target measure is not in the variational family. The method is therefore not recommended if precise approximations are required. Secondly, the divergences used in variational inference are non-convex objectives and therefore cannot be minimised exactly. As a result, variational inference approaches often end up significantly underestimating or overestimating the variance of the target.
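The Laplace approximation described above can be sketched in a few lines (a hypothetical one-dimensional example, assuming NumPy; a real implementation would find the mode with a numerical optimiser rather than a grid, and would use exact derivatives where available):

```python
import numpy as np

# Laplace approximation of a one-dimensional density known up to a
# constant: fit N(mode, -1 / (log pi)''(mode)).
def log_pi_tilde(x):
    return -0.5 * (x - 2.0) ** 2  # toy target: N(2, 1), unnormalised

# Crude grid search for the mode (for illustration only).
grid = np.linspace(-10.0, 10.0, 100_001)
mode = grid[np.argmax(log_pi_tilde(grid))]

# Second derivative of log pi at the mode by central finite differences.
h = 1e-4
curvature = (log_pi_tilde(mode + h) - 2.0 * log_pi_tilde(mode)
             + log_pi_tilde(mode - h)) / h ** 2

laplace_mean, laplace_var = mode, -1.0 / curvature
```

Since the toy target is itself Gaussian, the fitted mean and variance recover it exactly; for skewed, heavy-tailed or multimodal targets the same construction can be arbitrarily poor, as noted above.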
1.1.4 Issues Faced by Existing Methods
Having introduced the most common tools for numerical integration, we now highlight some of the challenges faced by these methods.
1) High Computational Cost
The most obvious issue is that of densities or integrands which are expensive to evaluate. The term “expensive” can refer to either computational time or financial cost. For example, complex integrands can take several hours to evaluate on a computer. Alternatively, in medical applications, evaluating an integrand might mean having to run a set of experiments on patients, which may incur a large financial cost. These costs mean that only a limited number of integrand evaluations are available. One class of problems where this occurs is when a numerical method has to be used at each evaluation of the density or integrand (or in fact any of their derivatives). This is the case in the fields of uncertainty quantification [Sullivan, 2016] and inverse problems [Stuart, 2010; Dashti and Stuart, 2016], where evaluating the likelihood often requires solving a differential equation numerically. Bui-Thanh and Girolami [2014] give an example of Bayesian inference in a heat conduction problem
which requires solving a partial differential equation with finite element methods every time we want to obtain a realisation from the posterior. See also Mohamed et al. [2010] for a similar problem in reservoir simulation, Martin et al. [2012] in seismic models and Petra et al. [2014] for ice sheet models. Other challenging applications include Gaussian process models, which require numerically inverting a potentially large positive definite matrix [Rasmussen and Williams, 2006]. A second class of problems is the so-called “tall data” problem, where the number of samples entering the likelihood is very large. Examples of application fields where this is a problem include astronomy [Sharma, 2017], spatial statistics [Møller and Waagepetersen, 2004], as well as machine learning methods, e.g. topic models [Griffiths and Steyvers, 2004; Blei et al., 2012] or neural networks [Goodfellow et al., 2016]. This is particularly challenging for Bayesian statistics, since the posterior distribution can become too computationally expensive to evaluate or simulate from exactly, and has led researchers to develop a range of new approximate algorithms; see [Angelino et al., 2016] for an overview. Bardenet et al. [2017] also offer a discussion of solutions in the Monte Carlo literature, whilst Hoffman et al. [2013] discuss this issue in the context of variational inference. Another direction of research in the tall data setting has been to consider methods to summarise large datasets with a subset of representative weighted samples. This is called a coreset [Bachem et al., 2017; Huggins et al., 2016; Campbell and Broderick, 2017] and can be used instead of the entire dataset to reduce the computational cost associated with evaluating likelihoods. However, these methodologies are still in their infancy and further developments are required.
2) High Dimensionality
A second challenge is the problem of concentration of measure, which is particularly problematic in high dimensions (and hence often called the curse of dimensionality). This concentration means that most of the state space has negligible probability mass, and is therefore uninteresting from an approximation point of view. Designing samplers which can probe the relevant subset of the sample space X is therefore challenging, yet of critical importance. To highlight only a few examples: sampling from the posterior over Bayesian neural network parameters is extremely challenging and requires efficient MCMC proposals [Neal, 1995]. High dimensionality can also be a particular challenge in model selection [Johnson and Rossell, 2012] and its applications to genomics [Li and Zhang, 2010], or in phylogenetics [Larget and Simon, 1999; Mau et al., 1999]
when sampling high-dimensional structured spaces such as trees. Finally, we point out that it is sometimes desirable to sample from spaces of functions; applications include fluid dynamics and computational tomography [Cotter et al., 2013]. A common solution for MCMC is to focus on samplers which take into account first- and second-order gradient information of log π, such as the Hamiltonian Monte Carlo samplers discussed above. These can provide more efficient updates compared to simpler algorithms which do not take the density into account in the proposal. For quadrature rules based on functional approximation, a common solution is to restrict the class of functions we are interested in approximating. See the references in the previous section for an overview in the context of QMC methods.
3) Approximation of Complex Distributions
A particular challenge for sampling methods arises when the density π is highly multimodal. The reason is that most of these methods are based on local moves: the next sample is usually obtained by moving away from the location of the current sample. However, in practice it is common for densities to have regions of low probability between the modes, making moves between different modes a rare event. This multimodality problem occurs for example in mixture models [Marin et al., 2005] or in certain models driven by differential equations [Calderhead and Girolami, 2011]. In multimodal cases, it is common to make use of tempering-based algorithms [Swendsen and Wang, 1986; Neal, 1996], but these can be challenging and expensive to implement. Sampling is also often complicated when the state space X is not Euclidean, but instead given by some manifold [Byrne and Girolami, 2013]. Examples of manifolds of interest in statistics include the circle and the sphere [Kent, 1982], which are the central spaces of interest in directional statistics. Alternatively, computing integrals on spaces of structured matrices, such as the Stiefel and Grassmann manifolds, is also useful in signal processing [Srivastava and Klassen, 2004] or computer vision [Turaga et al., 2008]. Finally, another important scenario occurs in model comparison, where sampling is sometimes done jointly across parameters and models, and the sample space is hence highly complex [Green, 1995].
4) Quantification of Numerical Error
Of course, it is only ever possible to evaluate an integrand at a finite number of points n, and as such there is usually some numerical error remaining. For this reason, quantifying the error remaining after finite computation is of paramount
importance. There is however only very limited work in this area. In the context of standard MC methods and IS, error estimates are usually based on asymptotic results such as the central limit theorem (recall Equation 1.9). Estimates of the asymptotic variance can be used to approximate the error of the numerical scheme. However, there is in general no guarantee that the finite-sample performance is acceptably close to the asymptotic performance. Similar approaches can also be used for MCMC, with the added difficulty that the Markov structure induces correlation across samples, and that convergence of the chain to the target distribution is difficult (if not impossible) to assess. In any case, these estimates tend to be based on very weak assumptions and are therefore pessimistic in certain cases. Indeed, these estimates are based solely on approximations of the measure with respect to which we are integrating, and do not use any properties of the integrand of interest. As such, the same error estimate would be provided regardless of whether we are integrating a constant function or a rough and highly-oscillatory function. A partial remedy to this problem can be found in the information-based complexity literature [Traub et al., 1983; Novak and Woźniakowski, 2008, 2010; Novak, 2016; Ritter, 2000]. The general approach is to consider some arbitrary function class H, and to study certain types of errors obtained by quadrature rules when integrating functions f in this class. The most popular example is the worst-case integration error over H, given by:
e_wor(Π̂; Π, H) := sup_{‖f‖_H = 1} |Π[f] − Π̂[f]|. (1.13)
Another alternative is the average-case integration error, for which an additional measure µH on the space of functions is required, and which is given by:
e_avg(Π̂; Π, μ_H) := ∫_H |Π[f] − Π̂[f]| dμ_H(f). (1.14)
Unfortunately, these can only be computed for very limited combinations of probability measure Π and function space H, due to the need to compute a supremum over the unit ball of H, or an integral against μ_H. For this reason, these methods remain mostly analytical tools which allow theoreticians to guarantee the optimality of certain quadrature rules, rather than practical tools for the assessment of numerical error. Finally, it is important to note that certain algorithms have been proposed to approximate the numerical error, and use this approximation to make the methods adaptive to the integrand. However, from an information-based complexity point
of view, it can be shown under mild conditions that adaptivity is not helpful, in the sense that adaptive algorithms do not lead to faster asymptotic convergence rates for the worst-case or average-case error [Ritter, 2000]. These approaches have however been shown to be useful in practice. For example, for classical deterministic quadrature rules, several adaptive schemes have been proposed, usually based on Richardson extrapolation (e.g. Romberg integration, which is a Newton-Cotes method with Richardson extrapolation) or epsilon-algorithms [Davis and Rabinowitz, 2007].
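The CLT-based error estimates discussed at the start of this subsection can be sketched as follows (a hypothetical example, assuming NumPy; the integrand and sampler are illustrative). Note that the reported standard error depends on the integrand only through its empirical variance, which illustrates why such estimates ignore smoothness entirely:

```python
import numpy as np

rng = np.random.default_rng(2)

def mc_with_error(f, sample, n):
    """Plain MC estimate with a CLT-based standard error sigma_f / sqrt(n)."""
    fx = f(sample(n))
    return np.mean(fx), np.std(fx, ddof=1) / np.sqrt(n)

# Integrate cos(x) against the uniform measure on [0, 1]; the true value
# is sin(1). The error estimate uses only the empirical variance of f.
est, se = mc_with_error(np.cos, lambda n: rng.uniform(0.0, 1.0, n), 10_000)
```

Any integrand with the same empirical variance would receive exactly the same error estimate, regardless of how smooth or oscillatory it is.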
1.2 Challenge II: Intractable Models
We have now concluded our initial discussion of numerical integration, the main challenge that will be tackled in this thesis. A second challenge is that of statistical models for which the density is not available. It should be clear from previous sections that both Bayesian inference (see Equation 1.3) and maximum likelihood inference (see Equation 1.7) are likelihood-based procedures, meaning that they require us to be able to evaluate the likelihood at different data points and parameter values. However, in the case of complex statistical models, this may not be possible or computationally feasible. We now highlight two such scenarios.
1.2.1 Intractability in Unnormalised Models
A first scenario which is common in applications of statistics is when the likelihood can only be accessed in an unnormalised form:
p(x|θ) = p̃(x|θ) / Z(θ), (1.15)
where p̃(x|θ) is an unnormalised density which can be evaluated and Z(θ) ∈ R+ is an unknown normalisation constant which depends on the parameter vector θ. Usually this scenario arises due to the high computational cost of evaluating the normalisation constant, or because this constant is itself defined as some intractable integral of the form Z(θ) = ∫_D p̃(x|θ) dx (when D is a continuous domain) or Z(θ) = Σ_{x∈D} p̃(x|θ) (when D is a discrete, but very large, domain). Examples include Gibbs distributions, which are popular in statistical physics and the study of social networks [Caimo and Mira, 2015], as well as Markov random fields, which are popular in image modelling and spatial statistics [Hyvärinen, 2006, 2007; Moores et al., 2015]. Of course, this can be a particular challenge for maximum likelihood estimation, since we need to know the normalisation constant Z(θ) in order to solve the
optimisation problem in Equation 1.7. Such problems have also received a lot of attention in the Bayesian literature, where they are known as “doubly intractable” problems, due to the fact that both the normalisation constant of the likelihood and the normalisation constant of the posterior (i.e. the model evidence) are unknown. In these cases, combining Equations 1.3 and 1.15, we get that the posterior distribution takes the form:
p(θ|X) = p̃(X|θ) p_0(θ) / (Z(θ) p(X)), (1.16)
where both Z(θ) and p(X) are unknown. To resolve the issue of the unknown normalisation constant for the likelihood, several authors have proposed to use plug-in MC and MCMC estimates of the intractable integrals [Geyer, 1991; Lyne et al., 2015] (this clearly highlights another area where numerical integration is important!). Other popular approaches have focused on approximations to the likelihood which can be computed at much lower computational cost; see for example the pseudo-likelihood method of Besag [1974] and related composite likelihood methods (see Varin et al. [2011] for an overview). These are however not asymptotically exact, and it is not always easy to assess the bias created by the approximations. In a frequentist setting, issues with these approaches have led to the development of alternative methods to maximum likelihood, most notably score-based inference methods such as score matching [Hyvärinen, 2006, 2007; Karakida et al., 2016] or proper scoring rules [Gneiting and Raftery, 2007; Dawid, 2007; Parry et al., 2012] (see Chapter 4 for more details). These methods only require access to the gradient of the log-density. Advantages include the fact that we can bypass the computation of expensive normalisation constants whilst still obtaining an asymptotically exact solution, since:
∇_x log p(x|θ) = ∇_x log p̃(x|θ) − ∇_x log Z(θ) = ∇_x log p̃(x|θ), (1.17)
since ∇_x log Z(θ) = 0, where ∇_x is a vector of partial derivatives with respect to each of the coordinates of x. In a Bayesian setting, pseudo-marginal approaches, including the exchange algorithm [Murray et al., 2006; Møller et al., 2006], have been proposed to sample from posterior distributions efficiently. These usually provide good approximations, but at a high computational cost.
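Equation 1.17 is easy to verify numerically for a toy model; the following sketch (a hypothetical example, assuming NumPy) checks that the score of a Gaussian computed from the unnormalised density matches the score computed from the normalised one:

```python
import numpy as np

# Numerical check of Equation 1.17 for a Gaussian with unnormalised
# density p_tilde(x) = exp(-x^2 / 2) and Z = sqrt(2 * pi): the score
# grad_x log p is unchanged if the normalisation constant is dropped.
Z = np.sqrt(2.0 * np.pi)

def log_p_tilde(x):
    return -0.5 * x ** 2

def log_p(x):
    return log_p_tilde(x) - np.log(Z)

def grad(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

x0 = 0.7
print(grad(log_p, x0), grad(log_p_tilde, x0))  # both ~ -0.7
```

The constant term − log Z cancels in the finite difference, which is exactly the cancellation that score-based inference methods exploit.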
1.2.2 Intractability in Generative Models
A second scenario, which has recently received renewed interest, is that of generative models [Mohamed and Lakshminarayanan, 2016], sometimes also called implicit models or likelihood-free models, for which the likelihood is not available in any form. Instead, we assume that it is possible to obtain IID samples from the model for any value of the parameter vector θ ∈ Θ. Let (U, Σ_U, U) be a probability space. Formally, we regard generative models as a family of probability measures such that, for any value of the parameter θ ∈ Θ, we can obtain some IID data {x_i}_{i=1}^n from the corresponding probability measure P_θ. This data is obtained in two steps: first, n IID random variables {u_i}_{i=1}^n are obtained from U; then, some map G_θ : U → X is applied to each of these random variables to obtain P_θ-distributed random variables: x_i = G_θ(u_i) for i = 1, . . . , n. Generative models are used throughout the sciences, including in the fields of ecology [Wood, 2010; Beaumont, 2010; Hartig et al., 2011], population genetics [Beaumont et al., 2002] and astronomy [Cameron and Pettitt, 2012]. They also appear in machine learning as black-box models; see for example generative adversarial networks (GANs) [Goodfellow et al., 2014; Dziugaite et al., 2015; Li et al., 2015] and variational autoencoders (VAEs) [Kingma and Welling, 2014]. The problem of inference within generative models is of course very closely related to the classical problem of density estimation [Diggle and Gratton, 1984]. To tackle it, a common approach is the method of simulated moments and its special case of indirect inference [Hall, 2005]. Here, the idea is to simulate data from P_θ for a wide range of parameter values θ ∈ Θ, and keep the parameter value for which a weighted linear combination of moments (such as the mean or variance) of the samples agrees most closely with the corresponding moments of the data from the true data generating process.
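The two-step sampling mechanism u_i ~ U, x_i = G_θ(u_i) can be sketched as follows (a hypothetical example, assuming NumPy, with G_θ chosen as a Box-Muller map so that P_θ is Gaussian; the moment comparison at the end is in the spirit of the method of simulated moments):

```python
import numpy as np

rng = np.random.default_rng(3)

# A generative model as a pushforward: u ~ U (uniform on (0, 1)^2 here),
# x = G_theta(u). A Box-Muller map turns the uniforms into draws from
# N(mu, sigma^2), so data can be simulated without ever writing down
# the likelihood.
def G(theta, u):
    mu, sigma = theta
    z = np.sqrt(-2.0 * np.log(1.0 - u[0])) * np.cos(2.0 * np.pi * u[1])
    return mu + sigma * z

theta = (1.0, 2.0)
xs = np.array([G(theta, rng.uniform(size=2)) for _ in range(20_000)])

# Simulated-moments flavour: compare simulated moments with the target.
print(round(xs.mean(), 1), round(xs.std(), 1))  # ~ 1.0 2.0
```

In this toy case the likelihood is of course available; the point is that the simulation only ever uses G_θ, which is all that is assumed for a generative model.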
Another recent approach to this problem, related to optimal transport of measure, was discussed in Bassetti et al. [2006]; Bernton et al. [2017]; Genevay et al. [2018], where the authors proposed to minimise the Wasserstein distance, or an approximation thereof, between an empirical probability measure induced by samples from the true data generating process and the statistical model under consideration. In a Bayesian context, a common approach to obtain an approximate posterior is approximate Bayesian computation. Here, a parameter value θ is accepted as a sample from the approximate posterior if data generated for this value is close enough (in the sense of some summary statistics) to the data from the true generating process. See Marin et al. [2012]; Lintusaari et al. [2017] for an overview.
1.3 Additional Challenges
We have now introduced the two main challenges studied in this thesis. For completeness, we briefly discuss some of the other contemporary challenges of computational statistics. The list below is of course far from complete.
Parallel Programming First, ongoing research is focusing on how to adapt existing algorithms to new hardware architectures such as GPUs or clusters of computers [Suchard et al., 2010; Lee et al., 2010; Calderhead, 2014]. These new architectures can help scale algorithms significantly, but require reducing communication costs across threads as much as possible. Furthermore, many algorithms in statistics, such as MCMC, are inherently sequential, and so completely new algorithms may need to be developed to take advantage of this type of hardware.
Optimisation A second challenge which we will not address in detail in this thesis is that of convex and non-convex optimisation. This is of course useful for likelihood-based inference, such as the maximum likelihood problem in Equation 1.7 or profile likelihood approaches [Murphy and van der Vaart, 2000]. Alternatively, it can also be used in regression and functional approximation problems to overcome the high computational costs associated with exact least-squares solutions. Numerical optimisation remains far from solved in the same settings where sampling is challenging: high-dimensional, multimodal and expensive applications.
Privacy/Security Finally, the privacy risks associated with the increasing digitalisation of our society have been demonstrated by several authors (see for example de Montjoye et al. [2015]), and recent studies [Kaufman et al., 2009] have demonstrated that the public is increasingly sensitive to the risks associated with sharing data about themselves. Another challenge is therefore to develop algorithms for statistical inference and computation which include some notion of privacy. This might mean inference methods with restricted access to data [Graepel et al., 2012], or access only to noisy versions of the data, a common scenario in differential privacy [Dwork, 2008].
1.4 Contributions of the Thesis
We have now concluded our discussion of important challenges in computational statistics. The aim of this thesis is to explore how the theory of kernel methods can be used to address some of the issues discussed in the previous section. Kernel
methods can be used for several tasks, most notably functional approximation. They are very flexible, since kernel spaces include a wide range of function spaces with varying regularity and properties such as smoothness and periodicity. They can also be used for tractable computation in high- or infinite-dimensional spaces by making use of a property called the kernel trick. This thesis makes the following contributions to this area:
• Chapter 1 highlighted some of the main challenges in statistical computation, focusing mainly on issues surrounding numerical integration and statistical inference for models with intractable likelihoods. For numerical integration, we discussed popular approaches in the statistics literature including classical quadrature rules, as well as MC, QMC and MCMC methods. These methods will later be used as a baseline in Chapter 3. This chapter was partly based on Barp et al. [2018].
• Chapter 2 reviews background material, most notably the theory of reproducing kernel Hilbert spaces (RKHS), stochastic processes, and their formal relations. We also discuss how these have been successfully applied in machine learning and statistical modelling, and highlight the strengths and weaknesses they bring to computational statistics. Finally, we discuss how stochastic processes can be used in the context of Bayesian nonparametrics, focusing in detail on the particular case of Gaussian processes.
• In Chapter 3, we introduce Bayesian probabilistic numerical methods. We revisit in detail the Bayesian quadrature (BQ) algorithm of O’Hagan [1991], and provide an extensive theoretical analysis of its properties. This includes the first asymptotic convergence results, which will be based on an analysis of quadrature rules in reproducing kernel Hilbert spaces. Later, we discuss details required for the implementation of the method, then study the performance of the method on a wide range of applied problems from computer graphics to inference in dynamical systems. The chapter is partially based on Briol et al. [2015b, 2016]; Oates et al. [2017d]; Briol and Girolami [2018].
• In Chapter 4, we propose several novel extensions to the basic BQ algorithm. The first extension provides an approach to tackling several numerical integration problems simultaneously by defining the BQ algorithm on a vector-valued function space. This method will be particularly useful when multiple integrals of highly correlated functions need to be computed simultaneously or sequentially. The two other novel extensions consist of sampling schemes aimed at taking advantage of the properties of the BQ estimator in order to speed up convergence to the solution of the integral. These are based on the Frank-Wolfe algorithm and SMC samplers. This chapter is partially based on Briol et al. [2015a, 2017]; Xi et al. [2018].
• Chapter 5 proposes several applications of kernel methods to problems linked to intractable models (including both unnormalised and generative models). In particular, it discusses how Stein's method, a popular tool to assess convergence in probability theory, can be combined with kernel methods to obtain flexible functional approximation tools for unnormalised models. Finally, it discusses inference for unnormalised and generative models in the context of minimum distance estimators. The chapter is partly based on Briol et al. [2017]; Oates et al. [2018]; Chen et al. [2018].
• Chapter 6 concludes with a discussion of the contributions of the thesis and potential extensions.
Chapter 2
Kernel Methods, Stochastic Processes and Bayesian Nonparametrics
“Probability theory is nothing but common sense reduced to calculation.”
Pierre-Simon Laplace
Reproducing kernel Hilbert spaces (RKHS) have had a significant impact in the mathematical sciences. This is mainly due to a property called the "reproducing property", by which many quantities of interest are rendered tractable. Working in a RKHS is therefore a convenient and practical choice. When this is not directly feasible, it is often possible to embed a given space into another, often larger, space using kernels. The reason embeddings are useful is that many operations which are intractable or complex in the original space can be trivial to implement in the embedding space. There are several ways in which embeddings are commonly used. The first is called the "kernel trick", and consists of replacing inner products in the original space by kernel evaluations. This has the advantage that the kernel implicitly computes inner products in the embedding space; an operation which may be infeasible by direct computation (since the embedding space might be high-dimensional, or even infinite-dimensional). The second type of embedding is the embedding of probability measures into a RKHS, which allows comparison of these measures in a straightforward way. These two types of embeddings will be used throughout this thesis to study quadrature rules.
RKHSs are also useful in a different context: the study of stochastic processes. In particular, it can be shown that every stochastic process with finite second moment has a covariance function which corresponds to a reproducing kernel, and vice-versa. This is for example the case for Gaussian processes (GPs). The RKHS corresponding to the covariance of a GP is called its native space, and it can be used to understand properties of the process. Finally, GPs and kernel methods have more generally been studied and applied throughout the field of Bayesian nonparametrics, which is concerned with infinite-dimensional Bayesian models. This thesis makes use of these three intertwined research areas to propose novel algorithms in computational statistics. The following chapter therefore introduces well-known results which will be used throughout later chapters.
2.1 Kernel Methods
2.1.1 Introduction and Characterisations
Although the theory of Banach and Hilbert spaces already gives us a lot of structure to work with, we will focus mainly on a specific subclass of Hilbert spaces called reproducing kernel Hilbert spaces, also sometimes called proper Hilbert spaces¹. These spaces have the property that functions which are close to one another in the sense of the metric induced by the inner product will have close pointwise values. More precisely, these are classes of functions where the evaluation functional is a continuous mapping. RKHSs were introduced and later studied by many authors, most notably Mercer [1909]; Bergman [1922]; Schoenberg [1937]; Aronszajn [1950] and Schwartz [1964]. For the interested reader, a nice historical survey of these spaces is presented in Stewart [1976]; Fasshauer [2011], and a rigorous modern treatment can be found in Berlinet and Thomas-Agnan [2004]; Schaback and Wendland [2006]; Steinwart and Christmann [2008]. A RKHS on some arbitrary non-empty set X is characterised by a function k : X × X → R called a kernel². For simplicity, we restrict ourselves to X ⊆ R^d for d ∈ N in the remainder of this chapter. We say that a kernel is symmetric if it satisfies k(x, y) = k(y, x) ∀x, y ∈ X. In the case of a RKHS, we are only interested in a very specific type of kernel called a reproducing kernel:
Definition 1 (Reproducing kernel Hilbert space). A kernel k : X × X → R is a reproducing kernel of the Hilbert space H if and only if:
¹Note that any reader unfamiliar with basic notions in functional analysis and topology is referred to Appendix A.1. ²Reproducing kernels are not to be confused with the transition kernels discussed in the previous chapter.
1. ∀x ∈ X, k(·, x) ∈ H,
2. ∀x ∈ X, ∀f ∈ H, ⟨f, k(·, x)⟩_H = f(x). A Hilbert space with such a reproducing kernel is called a reproducing kernel Hilbert space.
The second property above is called the reproducing property and is ex- tremely useful for many applications of these spaces. Due to this property, k(·, x) is often also called the representer of the evaluation functional. H is known as the native space of k, and a RKHS is often denoted Hk to emphasise the reproducing kernel k associated to it. Any reproducing kernel leads to a function space with continuous evaluation functionals, as specified by Riesz’s representation theorem:
Theorem 1 (Riesz's representation theorem. [Berlinet and Thomas-Agnan, 2004], Theorem 1). A Hilbert space of functions on X has a reproducing kernel if and only if all the evaluation functionals e_x : H → R, defined by e_x(f) = f(x) ∀x ∈ X, are continuous on H.
An important class of kernels are positive definite kernels. We say that the kernel k : X × X → R is a positive-definite kernel if:

Σ_{i=1}^n Σ_{j=1}^n α_i α_j k(x_i, x_j) > 0,    (2.1)

∀n ∈ N, x_1, . . . , x_n ∈ X, and for all non-zero α = (α_1, . . . , α_n) ∈ R^n. Positive definite kernels are important because for every positive definite kernel k : X × X → R, there exists a unique RKHS H_k with k as its reproducing kernel and, on the other hand, the reproducing kernel of a RKHS is unique and positive definite. This result, first proved by Aronszajn [1950], is called the Moore-Aronszajn theorem.
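Equation 2.1 can be checked numerically on a finite point set: the quadratic form equals α^⊤Kα for the Gram matrix K_ij = k(x_i, x_j), so positive definiteness on those points is equivalent to the Gram matrix having positive eigenvalues. A small sketch, with helper names of our own choosing:

```python
import numpy as np

def gram(k, xs):
    """Gram matrix K_ij = k(x_i, x_j) for a finite point set xs."""
    return np.array([[k(x, y) for y in xs] for x in xs])

def is_positive_definite(k, xs, tol=1e-10):
    """Check Equation 2.1 on xs: sum_ij a_i a_j k(x_i, x_j) = a^T K a is
    positive for all non-zero a iff the symmetric Gram matrix has
    positive eigenvalues (up to numerical tolerance)."""
    eigvals = np.linalg.eigvalsh(gram(k, xs))
    return bool(np.all(eigvals > -tol))

rbf = lambda x, y: np.exp(-0.5 * (x - y) ** 2)
xs = np.linspace(-2.0, 2.0, 10)
print(is_positive_definite(rbf, xs))   # True for the Gaussian RBF kernel
```

Passing this check on a finite set is of course only necessary, not sufficient, for Equation 2.1 to hold over all of X.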
Theorem 2 (Moore-Aronszajn theorem. Berlinet and Thomas-Agnan [2004], Theorem 3). Let k : X × X → R be a positive definite kernel. There exists only one Hilbert space H_k of functions on X with k as reproducing kernel. The subspace H_0 of H_k spanned by the functions k(·, x) for x ∈ X is dense in H_k, and H_k is the set of functions on X which are point-wise limits of Cauchy sequences in H_0 with the inner product

⟨f, g⟩_{H_0} = Σ_{i=1}^n Σ_{j=1}^m α_i β_j k(y_j, x_i),

where f(x) = Σ_{i=1}^n α_i k(x, x_i) and g(x) = Σ_{j=1}^m β_j k(x, y_j).
Note that good intuition can be gained concerning the functions that actually constitute a RKHS H_k by looking at the form of f and g in the above theorem. This form makes it clear that many basic properties of the reproducing kernel, viewed as a function of one argument in its representer form k(·, x), will be inherited by functions in H_k. This is for example the case for properties such as periodicity or smoothness.
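The inner product formula of the Moore-Aronszajn theorem is easy to exercise on finite expansions f = Σᵢ αᵢ k(·, xᵢ) and g = Σⱼ βⱼ k(·, yⱼ). The sketch below, with an arbitrary Gaussian RBF kernel and made-up coefficients, also verifies the reproducing property ⟨f, k(·, t)⟩_{H_0} = f(t):

```python
import numpy as np

k = lambda x, y: np.exp(-0.5 * (x - y) ** 2)   # Gaussian RBF reproducing kernel

# f = sum_i alpha_i k(., x_i) and g = sum_j beta_j k(., y_j) in H_0
alpha, xs = np.array([1.0, -0.5]), np.array([0.0, 1.0])
beta, ys = np.array([2.0]), np.array([0.5])

f = lambda t: sum(a * k(t, xi) for a, xi in zip(alpha, xs))

# <f, g>_{H_0} = sum_i sum_j alpha_i beta_j k(y_j, x_i)
inner_fg = sum(a * b * k(yj, xi)
               for a, xi in zip(alpha, xs) for b, yj in zip(beta, ys))

# Reproducing property: <f, k(., t)>_{H_0} = f(t), since k(., t) has a
# single coefficient equal to one
t = 0.3
inner_f_kt = sum(a * k(t, xi) for a, xi in zip(alpha, xs))
assert np.isclose(inner_f_kt, f(t))
```

Note how neither computation ever touches the (possibly infinite-dimensional) space itself; everything reduces to kernel evaluations.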
2.1.2 Properties of Reproducing Kernel Hilbert Spaces
The fact that every positive definite kernel leads to a RKHS is useful since we only need to verify Equation 2.1 to check whether the space associated to a kernel is a RKHS. In general, it is hard to check the property directly, but an alternative approach is available through the following result:
Theorem 3 (Kernel trick. Berlinet and Thomas-Agnan [2004], Lemma 1). Let H_0 be some Hilbert space with inner product ⟨·, ·⟩_{H_0} and let φ : X → H_0. Then, any function k : X × X → R defined as k(x, y) := ⟨φ(x), φ(y)⟩_{H_0} is positive definite.

To verify whether a given kernel leads to a RKHS, we therefore only need to check whether it can be written as an inner product. The kernel trick is also useful since it allows us to write operations involving high-, or even infinite-dimensional, spaces using only evaluations of the kernel. In the machine learning literature, the map
φ : X → H_0 is often called a feature map, whilst H_0 is known as the feature space, and the above lemma is known as the "kernel trick". Note that the relationship between kernel and feature map (or correspondingly feature space) is not one-to-one. Many algorithms have been "kernelised", meaning that all inner products are replaced by reproducing kernels [Schölkopf and Smola, 2002; Shawe-Taylor and Cristianini, 2004; Hofmann et al., 2008]. Popular examples include support vector machines and principal component analysis. In mathematical language, the kernel trick corresponds to an embedding of the space X into the space H_0, and the feature space corresponds to an embedding space. We now give several examples of feature maps. First, an obvious choice is φ(x) = k(x, ·), where the feature space is the RKHS itself: H_0 = H_k. Another example is given by Mercer's theorem, which was originally proposed in Mercer [1909]. See Riesz and Nagy [1990] for a detailed discussion and proof; for convenience in our context we consider the following simplified version as proposed by Muandet et al. [2016].
Theorem 4 (Mercer's theorem. [Muandet et al., 2016], Theorem 2.1). Let X be a compact Hausdorff space and ν a finite Borel measure with support X. Suppose
k is a continuous positive definite kernel on X × X, and assume that it satisfies ∫_X ∫_X k(x, y)f(x)f(y) dx dy > 0 for any non-zero f ∈ L²(X; ν), where L²(X; ν) is the space of functions with ∫_X f(x)² ν(dx) < ∞. Define the integral operator K : L²(X; ν) → L²(X; ν), called the Hilbert-Schmidt operator, as:

K[f](x) := ∫_X k(x, x′)f(x′) ν(dx′).
Then there is an orthonormal basis {ψ_i} of L²(X; ν) consisting of eigenfunctions of K such that the corresponding sequence of eigenvalues {λ_i} are non-negative. The eigenfunctions corresponding to non-zero eigenvalues are continuous functions on X and the kernel has the representation:
k(x, y) = Σ_{i=1}^∞ λ_i ψ_i(x) ψ_i(y),

where the convergence of the series is absolute and uniform. All reproducing kernels satisfying this theorem are called Mercer kernels. For Mercer kernels, we can easily obtain an explicit expression for a feature map φ : X → H_0 of the form φ(x) = (√λ_1 ψ_1(x), √λ_2 ψ_2(x), . . .), where H_0 is the space of square-summable sequences. We note that Theorem 4 requires continuity of the kernel on X × X. From now on we will assume that this is the case for all kernels in this thesis. For X a bounded interval in R, this assumption means that all of the functions in the RKHS will be continuous. More general conditions for continuity of the elements of a RKHS can be found in Section 1.5 of Berlinet and Thomas-Agnan [2004] (see for example Theorem 17 therein). Several other properties of RKHSs are also worth mentioning. The first is the concept of universality of the RKHS. In general, we say that a RKHS is universal if it is rich enough to approximate any function of interest arbitrarily well in some function class. There exist multiple notions of universality, each depending on the choice of domain X, the function space we want to approximate, and the type of approximation. See Sriperumbudur et al. [2010a] for an overview. The second notion is that of a characteristic kernel, which relates to embeddings of probability measures in RKHSs [Sriperumbudur et al., 2010b]. Denote by P(X) the set of all Borel probability measures defined on the topological space X. A kernel k : X × X → R is said to be characteristic if the kernel mean Π[k(·, x)] := ∫_X k(·, x) Π(dx) ∈ H_k exists for all Π ∈ P(X) and the map Π ↦ Π[k(·, x)] is injective. That is, any element of P(X) is embedded to a unique element
in H_k called the kernel mean or mean element. Finally, the third property is that certain kernels induce a metric on the space X, defined as [Schoenberg, 1937]: d_k(x, y) := √(k(x, x) − 2k(x, y) + k(y, y)); see Berg et al. [1984], Chapter 3, Section 3 for an in-depth discussion. Clearly the metric implies a notion of distance between points which depends on how similar their features are (as given by properties of the feature/embedding map). It is therefore possible to induce geometries of interest by reverse-engineering feature maps, and the choice of kernel will thus have considerable impact on applications.
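Both the embedding of probability measures and the kernel-induced metric d_k can be illustrated with a short sketch. Using the empirical kernel mean (1/n)Σᵢ k(·, xᵢ), the RKHS distance between two embedded samples expands entirely into kernel evaluations via the reproducing property; the biased V-statistic estimator below and all names are our illustrative choices:

```python
import numpy as np

k = lambda x, y: np.exp(-0.5 * (x - y) ** 2)   # characteristic Gaussian kernel

def d_k(x, y):
    """Kernel-induced metric on X: d_k(x,y) = sqrt(k(x,x) - 2k(x,y) + k(y,y))."""
    return np.sqrt(k(x, x) - 2.0 * k(x, y) + k(y, y))

def mmd(xs, ys):
    """Distance between the empirical kernel means of two samples,
    ||mu_X - mu_Y||_{H_k}, expanded via the reproducing property."""
    Kxx = k(xs[:, None], xs[None, :]).mean()
    Kyy = k(ys[:, None], ys[None, :]).mean()
    Kxy = k(xs[:, None], ys[None, :]).mean()
    return np.sqrt(max(Kxx - 2.0 * Kxy + Kyy, 0.0))

rng = np.random.default_rng(2)
xs, ys = rng.normal(0, 1, 500), rng.normal(0, 1, 500)
zs = rng.normal(3, 1, 500)
print(mmd(xs, ys) < mmd(xs, zs))   # same-distribution samples are closer
```

For a characteristic kernel the population version of this distance vanishes only when the two measures coincide, which is what makes these embeddings useful for comparing distributions.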
2.1.3 Examples of Kernels and their Associated Spaces
We now highlight some of the popular choices in the literature, focusing mainly on real valued kernels:
1. The family of polynomial kernels is given by:
k(x, y) := (x^⊤ y + c)^p,    (2.2)

where c > 0 and p ∈ N. The RKHS corresponding to this kernel is a finite-dimensional vector space and consists of all real-valued pth order polynomials on X.
2. The family of Matérn kernels [Matérn, 1960] is given by:
k_ν(x, y) := λ (2^{1−ν}/Γ(ν)) (√(2ν) ‖x − y‖₂/σ)^ν J_ν(√(2ν) ‖x − y‖₂/σ),    (2.3)

where J_ν is the modified Bessel function of the second kind and λ, σ > 0. The RKHS induced by k_ν is norm-equivalent to the Sobolev space W₂^ν(X) [Adams and Fournier, 2003]³. Note that another class of kernels which are norm-equivalent to Sobolev spaces are Wendland's polynomial kernels [Wendland, 2005]. The expression for the Matérn kernels is tedious to evaluate in general, but simplifies when ν = p + 1/2 for some p ∈ N. A few popular examples are given below:
• k_{1/2}(x, y) = λ exp(−‖x − y‖₂/σ),
• k_{3/2}(x, y) = λ (1 + √3 ‖x − y‖₂/σ) exp(−√3 ‖x − y‖₂/σ),
• k_{5/2}(x, y) = λ (1 + √5 ‖x − y‖₂/σ + 5‖x − y‖₂²/(3σ²)) exp(−√5 ‖x − y‖₂/σ),
³See Appendix A.1 for a detailed definition.
where in each case λ, σ > 0.
3. The squared-exponential kernel, also called Gaussian RBF kernel, is given by:
k(x, y) := λ exp(−‖x − y‖₂²/(2σ²)),    (2.4)
where λ, σ > 0. The functions in this RKHS are smoother and can in fact be shown to be holomorphic (i.e. the functions are infinitely differentiable and equal to their Taylor series). Define e_{α_i} : X → R such that e_{α_i}(x) := √((σ²)^{−α_i}/(α_i!)) x^{α_i} exp(−x²/(2σ²)). Then, Proposition 3.6 in Steinwart et al. [2006] provides the following characterisation: for any function f : X → R (with X ⊂ R^d with non-empty interior) in the RKHS with exponentiated-quadratic kernel with lengthscale σ, there exists (b_α) (where α := (α_1, . . . , α_d) ∈ N₀^d is a multi-index) satisfying Σ_{α ∈ N₀^d} b_α² < ∞ such that f(x) = Σ_{α ∈ N₀^d} b_α (e_{α_1} ⊗ . . . ⊗ e_{α_d})(x), where ⊗ denotes the tensor product (i.e. ∀g, h : Y → R where Y ⊆ R, (g ⊗ h)(y, y′) = g(y)h(y′) ∀y, y′ ∈ Y).
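The closed-form Matérn kernels listed above and the squared-exponential kernel of Equation 2.4 can be implemented directly; the helper below is an illustrative sketch with names of our own choosing, not code from the thesis:

```python
import numpy as np

def matern(x, y, nu, lam=1.0, sigma=1.0):
    """Closed-form Matern kernels at half-integer nu (as listed above)."""
    r = np.linalg.norm(np.atleast_1d(x) - np.atleast_1d(y)) / sigma
    if nu == 0.5:
        return lam * np.exp(-r)
    if nu == 1.5:
        return lam * (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)
    if nu == 2.5:
        return lam * (1.0 + np.sqrt(5.0) * r + 5.0 * r ** 2 / 3.0) \
                   * np.exp(-np.sqrt(5.0) * r)
    raise NotImplementedError("general nu requires the Bessel-function form")

def squared_exponential(x, y, lam=1.0, sigma=1.0):
    """Squared-exponential (Gaussian RBF) kernel, Equation 2.4."""
    r2 = np.sum((np.atleast_1d(x) - np.atleast_1d(y)) ** 2)
    return lam * np.exp(-r2 / (2.0 * sigma ** 2))

# All three evaluate to lam at x = y; larger nu gives a smoother RKHS.
print(matern(0.0, 1.0, 0.5), matern(0.0, 1.0, 2.5), squared_exponential(0.0, 1.0))
```

In practice ν is usually fixed at one of these half-integer values precisely to avoid evaluating the Bessel-function form.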
The last two kernels above are translation-invariant and radial; i.e. there exist f, g such that the kernels can be written as k(x, y) = f(x − y) and k(x, y) = g(‖x − y‖₂) respectively. This is the case for most RKHSs used in applications, since this property allows us to study the induced RKHS through Fourier analysis [Wendland, 2005]. Together with rotation invariance, these are also convenient modelling assumptions. To construct more complex reproducing kernels, one strategy consists of combining several base kernels [Rasmussen and Williams, 2006]. For example, given two real-valued reproducing kernels k₁ and k₂, the sum k(x, y) := k₁(x, y) + k₂(x, y) and product k(x, y) := k₁(x, y)k₂(x, y) are also reproducing kernels. Furthermore, given some function a : X → R, the rescaling k(x, y) := a(x)k₁(x, y)a(y) and convolution k(x, y) := ∫_{X×X} k₁(x, z)k₂(z, z′)k₁(z′, y) dz dz′ are reproducing kernels. Finally, the restriction of a kernel on some domain X to some domain Y ⊂ X is also a reproducing kernel. An interesting discussion of the resulting kernels can be found in Duvenaud [2014].
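The combination rules above are straightforward to exercise; the sketch below (all choices illustrative) builds sum, product and rescaled kernels from two base kernels and checks positive definiteness of the product on a finite point set:

```python
import numpy as np

k1 = lambda x, y: np.exp(-0.5 * (x - y) ** 2)   # squared-exponential
k2 = lambda x, y: (x * y + 1.0) ** 2            # polynomial kernel, p = 2

# Sums and products of reproducing kernels are reproducing kernels
k_sum  = lambda x, y: k1(x, y) + k2(x, y)
k_prod = lambda x, y: k1(x, y) * k2(x, y)

# Rescaling a(x) k1(x, y) a(y) is also a reproducing kernel
a = lambda x: 1.0 + x ** 2
k_scaled = lambda x, y: a(x) * k1(x, y) * a(y)

# Positive definiteness of the combination, checked on a finite point set
xs = np.linspace(-1.0, 1.0, 8)
K = np.array([[k_prod(u, v) for v in xs] for u in xs])
print(np.all(np.linalg.eigvalsh(K) > -1e-10))   # True
```

Such compositions are the standard way of encoding structure (e.g. smooth trend plus periodic component) into a single kernel.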
2.1.4 Applications and Related Research
The useful properties of RKHSs discussed above have certainly helped spread the use of these spaces to a wide range of applications. Although it is out of the scope of this thesis to give a complete introduction, we now provide a brief overview of these applications.
First, RKHSs have had a very large influence in the statistics community, most notably in the theory of kriging [Krige, 1951], a widely used method for interpolation in geostatistics. A detailed historical review of kriging highlighting the importance of kernels (called variograms in this literature) is provided by Cressie [1990]. RKHS theory has also been useful for the theoretical study of the closely related spline interpolation [Wahba, 1991]. Kernels have also been used to study Gaussian processes [Stein, 1999; Berlinet and Thomas-Agnan, 2004; Rasmussen and Williams, 2006]. A detailed description of the relationship between kernels and GPs can be found in Kanagawa et al. [2018]. Most recently, they have also been useful in other areas of statistics, such as hypothesis testing [Gretton et al., 2006, 2008, 2012a; Chwialkowski et al., 2016] and sampling methods [Chen et al., 2010; Sejdinovic et al., 2014; Strathmann et al., 2015; Liu and Wang, 2016; Chen et al., 2018]. In the numerical analysis literature, reproducing kernels are used to analyse differential equations [Bergman and Schiffer, 1953], and to design numerical solvers such as meshless methods [Babuska et al., 2003]. They also have a central role in approximation theory; see Buhmann [2003] and Schaback and Wendland [2006]. Finally, reproducing kernels have had a significant impact in machine learning [Schölkopf and Smola, 2002; Shawe-Taylor and Cristianini, 2004; Hofmann et al., 2008], most notably in learning theory [Cucker and Smale, 2002]. They have also helped project support vector machines [Boser et al., 1992] to the forefront of machine learning techniques.
2.2 Stochastic Processes
We have concluded our introduction to RKHSs and now move on to discuss stochastic processes. As we will see, the theory of stochastic processes, and especially GPs, is closely intertwined with that of RKHSs. Understanding this relation will be important in the theoretical developments of later chapters.
2.2.1 Introduction to Stochastic Processes
Stochastic processes are one of the major tools used throughout probability theory and statistics, and providing a complete overview of this topic is out of the scope of this thesis. In this chapter, we will mostly focus on the notions which will be useful in the following chapters, and highlight connections with the theory of RKHSs. Further details can be found in the books of Doob [1953]; Gikhman and Skorokhod [1969]; Karlin and Taylor [1975]; Grimmett and Stirzaker [2001]; Koralov and Sinai [2007]; Pavliotis [2014]. See also the paper by Meyer [2009] for a historical overview.
To avoid delaying this further, we begin with the definition of a stochastic process (also called a random process).
Definition 2 (Stochastic process). Let (X , B(X )) be a measurable space consist- ing of an index set X and its corresponding Borel σ-algebra B(X ). Let (Ω, F, P) be a probability space, and (Y, G) a measurable space. A stochastic process is a collection {g(x, ·): x ∈ X } such that for each fixed x ∈ X , g(x, ·):Ω → Y is a random variable.
Stochastic processes are informally viewed as random functions. For a fixed x ∈ X, a stochastic process is a Y-valued random variable, whereas for a fixed ω ∈ Ω, it consists of a (deterministic) function g(·, ω) : X → Y. The set X is known as the sample space, while Y is the state space of the stochastic process. In the literature, the sample space is often denoted using the dummy variable T due to the historical context of random functions over time. However, it is now common to have X be a multidimensional index (e.g. time and space). In particular, when X ⊆ R², the stochastic process is often called a random field. Note that X can be either a finite or infinite index set. The stochastic processes that we will look at in later chapters will have X and Y being Euclidean spaces. For this reason, we will limit ourselves to this level of generality for the remainder of the chapter. A first example of a stochastic process that we have already encountered in this thesis is the discrete-time Markov chains used in MCMC methods, for example the random-walk Metropolis algorithm with Gaussian proposal (see Chapter 1). In this case X is clearly discrete and the process is real-valued. A second example is the Langevin diffusion which was used to construct the Metropolis-adjusted Langevin algorithm. In this case the index set is one-dimensional and continuous: X = R₊. Furthermore, the discretisation of the diffusion is itself also a stochastic process, but defined on a discrete space X.
2.2.2 Characterisations of Stochastic Processes
Now that we have introduced stochastic processes, we can ask how to characterise and classify them further. There are two main ways to characterise stochastic processes: through their finite-dimensional distributions, and through their Karhunen-Loève expansion.
31 Characterisation via Finite-Dimensional Distributions
The finite-dimensional distributions of a stochastic process are the family of distributions of the Y^n-valued random variables (g(x₁, ·), . . . , g(x_n, ·)) for all n ∈ N and {x_i}_{i=1}^n ⊂ X. There are several important properties of stochastic processes which are usually specified using the finite-dimensional distributions of the process. First, we say that a stochastic process is stationary if and only if its finite-dimensional distributions are invariant with respect to shifts in the index set. In other words, the process is stationary if the distribution of (g(x₁, ·), . . . , g(x_n, ·)) is the same as that of (g(x₁ + x′, ·), . . . , g(x_n + x′, ·)) for all x′ ∈ X such that x_i + x′ ∈ X for all i = 1, . . . , n and n ∈ N. Clearly it is important to understand whether the relation between stochastic processes and their finite-dimensional distributions is one-to-one. The answer is yes under certain regularity conditions, provided by the theorem below. The result will be given for real-valued stochastic processes, but it can be significantly generalised, as in Dudley [2002].
Theorem 5 (Kolmogorov Consistency Theorem. Koralov and Sinai [2007], Theorem 12.8). Let {P_{{x_i}_{i=1}^n} : {x_i}_{i=1}^n ⊂ X, n ∈ N} be a family of distributions, each associated to the product σ-algebra B(R^n). Suppose these satisfy:

• For every permutation {x′_i}_{i=1}^n of {x_i}_{i=1}^n ⊂ X and events A₁, . . . , A_n ∈ F with n ∈ N:

P_{{x_i}_{i=1}^n}[(g(x₁, ·), . . . , g(x_n, ·)) ∈ A₁ × . . . × A_n] = P_{{x′_i}_{i=1}^n}[(g(x′₁, ·), . . . , g(x′_n, ·)) ∈ A₁ × . . . × A_n].

• For every set of points {x_i}_{i=1}^{n+1} ⊂ X and events A₁, . . . , A_n ∈ F with n ∈ N:

P_{{x_i}_{i=1}^n}[(g(x₁, ·), . . . , g(x_n, ·)) ∈ A₁ × . . . × A_n] = P_{{x_i}_{i=1}^{n+1}}[(g(x₁, ·), . . . , g(x_n, ·), g(x_{n+1}, ·)) ∈ A₁ × . . . × A_n × Ω].

Then there is a unique stochastic process whose finite-dimensional distributions coincide with this collection.
The first example goes back to the Markov chains introduced in the previous chapter (the random-walk Metropolis algorithm and Metropolis-adjusted Langevin algorithm). We notice that in both cases, their finite-dimensional distributions are
given by
P_{{x_i}_{i=1}^n}[(g(x₁, ·), . . . , g(x_n, ·)) ∈ A₁ × . . . × A_n] = ∫_{A₁} . . . ∫_{A_n} T(dx₁, dx₀) × . . . × T(dx_n, dx_{n−1}),

for any events A₁, . . . , A_n in F, where T denotes the transition kernel of the chain. This theorem also allows us to introduce our first characterisation of GPs. A real-valued GP is a stochastic process g : X × Ω → R such that all the finite-dimensional distributions are Gaussian; i.e. (g(x₁, ·), . . . , g(x_n, ·)) is an N(m_n, c_n) random variable for some n-dimensional vector m_n and some n × n symmetric non-negative definite matrix c_n, ∀n ∈ N and {x_i}_{i=1}^n ⊂ X. GPs will be the basis of most of the work in later chapters. Extended introductions can be found in Adler [1990]; Stein [1999]; Rasmussen and Williams [2006]. An important property is that two GPs defined on the same measurable space are either equivalent or mutually singular [Feldman, 1958]. Another example of a stochastic process is the Dirichlet process [Ferguson, 1973]. We say a stochastic process is a Dirichlet process with base measure G and concentration parameter α if and only if its finite-dimensional distributions are
Dirichlet distributions; i.e. given any finite measurable partition (X₁, . . . , X_n) of X, we have that (g(X₁, ·), . . . , g(X_n, ·)) is Dir(αG(X₁), . . . , αG(X_n))-distributed for some concentration parameter α > 0. Here, the notation Dir is used to denote a Dirichlet distribution. Note that this case would require a more general version of the Kolmogorov extension theorem than that presented in this thesis (see for example Dudley [2002]).
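The finite-dimensional characterisation of a GP lends itself directly to simulation: on any finite set of inputs, a draw from the process is simply a draw from N(m_n, c_n). A minimal sketch; the function names and the jitter term are our choices, not part of the thesis:

```python
import numpy as np

def gp_sample(xs, mean_fn, cov_fn, n_draws, rng):
    """Sample the finite-dimensional distribution (g(x_1),...,g(x_n)),
    which for a GP is N(m_n, c_n) with (m_n)_i = m(x_i), (c_n)_ij = c(x_i, x_j)."""
    m = np.array([mean_fn(x) for x in xs])
    C = np.array([[cov_fn(u, v) for v in xs] for u in xs])
    C += 1e-8 * np.eye(len(xs))           # jitter for numerical stability
    return rng.multivariate_normal(m, C, size=n_draws)

rng = np.random.default_rng(3)
xs = np.linspace(0.0, 1.0, 50)
draws = gp_sample(xs, lambda x: 0.0,
                  lambda u, v: np.exp(-0.5 * (u - v) ** 2 / 0.1 ** 2),
                  n_draws=4, rng=rng)
print(draws.shape)   # (4, 50)
```

Each row of `draws` is a (discretised) realisation g(·, ω) of the process on the grid.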
Characterisation via the Karhunen-Lo`eve Expansion
A second characterisation of a stochastic process is as an infinite series of basis functions with random coefficients, called a Karhunen-Loève expansion [Loève, 1978]. This expansion depends on the first two moments of the stochastic process, which are the mean function m : X → Y and covariance function c : X × X → Y. Denote by E_P[X] the expectation of some random variable X under P. The mean and covariance functions are defined as:
m(x) := E_P[g(x, ω)],
c(x, y) := E_P[(g(x, ω) − m(x))(g(y, ω) − m(y))].
Theorem 6 (Karhunen-Loève Theorem. Sullivan [2016], Theorem 11.4). Suppose that g : X × Ω → R is a stochastic process such that for all x, y ∈ X: (i) g(x, ·) ∈ L²(Ω; P), (ii) m(x) = 0 and (iii) the covariance function c(x, y) is a continuous function of both x and y. Then:
g(x, ω) = Σ_{j=1}^∞ Z_j(ω) ψ_j(x),
where {ψ_j}_{j=1}^∞ are orthonormal eigenfunctions of the Hilbert-Schmidt operator C : L²(X) → L²(X) defined as C[f] := ∫_X c(x, y)f(y)dy, and the eigenvalues {λ_j}_{j=1}^∞ are non-negative (assumed without loss of generality to be ordered λ₁ ≥ λ₂ ≥ . . .). The convergence of the series is in L²(Ω; P) and uniform over compact families of x ∈ X, with:

Z_j(ω) = ∫_X g(x, ω) ψ_j(x) dx.
Furthermore, the random variables Z_j are centred, uncorrelated, and have variance λ_j: E_P[Z_j] = 0 and E_P[Z_j Z_k] = λ_j δ_{jk}.
This characterisation can be particularly useful for approximating the stochastic process. First, it orthogonalises the stochastic and deterministic parts of the process. Furthermore, since we have assumed that the eigenvalues are in decreasing order, a truncation Σ_{j=1}^L Z_j(ω)ψ_j(x) for L > 0 of this series is the best L-dimensional approximation of the stochastic process in an L²(Ω; P) sense. Such a truncation is therefore the analogue of principal component analysis for stochastic processes. The truncation can also be useful for approximate sampling of a stochastic process: all that is required is to sample the random variables {Z_j}_{j=1}^L. See Huang et al. [2001] for a detailed study. The Karhunen-Loève characterisation therefore provides us with a second definition of a GP, as the series g(x, ω) := Σ_{j=1}^∞ √λ_j ε_j(ω) ψ_j(x), where {ε_j}_{j=1}^∞ are IID N(0, 1) random variables and {λ_j, ψ_j}_{j=1}^∞ are the eigenvalues and eigenfunctions of the Hilbert-Schmidt operator.
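A discretised sketch of such a truncation for the Gaussian case: approximate the Hilbert-Schmidt operator by a quadrature-weighted covariance matrix on a grid, keep the L leading eigenpairs, and draw Z_j ~ N(0, λ_j). All implementation choices below are illustrative assumptions:

```python
import numpy as np

def kl_truncation(xs, cov_fn, L, rng):
    """Approximate Karhunen-Loeve sampling: discretise the operator
    C[f](x) = int c(x,y) f(y) dy on a grid, keep the L leading eigenpairs,
    and draw Z_j ~ N(0, lambda_j) (the Gaussian-process case)."""
    n = len(xs)
    h = (xs[-1] - xs[0]) / (n - 1)                  # grid spacing (quadrature weight)
    C = cov_fn(xs[:, None], xs[None, :]) * h        # discretised operator
    lam, psi = np.linalg.eigh(C)                    # ascending eigenvalues
    lam, psi = lam[::-1][:L], psi[:, ::-1][:, :L] / np.sqrt(h)
    Z = rng.standard_normal(L) * np.sqrt(np.maximum(lam, 0.0))
    return psi @ Z                                  # sum_{j<=L} Z_j psi_j(x)

rng = np.random.default_rng(4)
xs = np.linspace(0.0, 1.0, 200)
path = kl_truncation(xs, lambda u, v: np.exp(-0.5 * (u - v) ** 2 / 0.05 ** 2),
                     L=20, rng=rng)
print(path.shape)   # (200,)
```

Because the eigenvalues decay quickly for smooth covariances, a modest L already captures most of the variance of the process, which is the sense in which this is PCA for random functions.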
2.2.3 Connection Between Kernels and Covariance Functions
As hinted at previously, there is a close relationship between reproducing kernels and covariance functions. Consider without loss of generality a stochastic process with m = 0 and covariance function c. We say that a stochastic process is a second-order stochastic process if E_P[|g(x, ω)|²] < ∞ for all x ∈ X (i.e. the process has finite second moment). It turns out that reproducing kernels correspond to covariance
functions of second-order stochastic processes:
Theorem 7 (Lo`eve’s Theorem. Lo`eve [1978], p132). A function c : X × X → R is the covariance function of a second-order stochastic process if and only if it is positive definite.
Focusing on the special case of Gaussian processes, we have that for any pair of mean function $m$ and reproducing kernel $k$, there exists a GP with mean $m$ and covariance $k$, and vice versa; see Theorem 12.1.3 in Dudley [2002]. An important point, however, is that a realisation of a Gaussian process (or in fact any second-order stochastic process) will usually not lie in the RKHS associated with its kernel/covariance function. Several conditions for realisations to lie in the RKHS are provided in [Driscoll, 1973; Lukić and Beder, 2001; Pillai et al., 2007]. See also the extended discussion in Kanagawa et al. [2018].
2.3 Bayesian Nonparametric Models
We have now concluded our introduction to reproducing kernel Hilbert spaces and stochastic processes, and highlighted their connections. These two research areas are commonly used in Bayesian inference in cases where the parameter belongs to a function space. This subfield of Bayesian inference is commonly called Bayesian nonparametrics⁴ [Dey et al., 1998; Müller et al., 2015; Giné and Nickl, 2016; Ghosal and van der Vaart, 2017].
2.3.1 Bayesian Models in Infinite Dimensions
It is often said that priors and posteriors on functions are infinite-dimensional. An intuitive justification for this name is that assigning a distribution on a function is often equivalent to assigning a distribution on an infinite sequence of scalars. Consider the problem of constructing a prior model which is a stochastic process. Assigning such a prior to a function is equivalent to selecting a prior distribution for the stochastic part of the Karhunen-Loève expansion of the stochastic process; i.e. selecting a prior for the sequence $\{Z_j\}_{j=1}^{\infty}$ in Theorem 6.

Two canonical examples of priors in Bayesian nonparametrics are the Gaussian process and the Dirichlet process (both introduced in the previous section). Recently, infinite-dimensional models have become popular in the literature due to the fact that they place probability mass on a wider range of models. This is not the
⁴ This name is particularly misleading since the likelihoods are indeed parametric, but the parameter in this case is a function.
case for parametric models, which place very restrictive assumptions on the data-generating mechanism being modelled, and in that sense correspond to very strong prior knowledge. When working with Bayesian statistics for infinite-dimensional models, it is necessary to tread carefully, as finite-dimensional intuitions and results do not always carry through. Recall the statement of Bayes' Theorem in Chapter 1,
Equation 1.3: $\pi(\theta|X) = \pi_0(\theta)\pi(X|\theta)/\pi(X)$. Here, $\theta \in \Theta$ was some parameter of interest in some Euclidean space, and this identity assumed the existence of densities with respect to the Lebesgue measure on $\Theta$. However, when $\Theta$ is not a subset of some Euclidean space but some function space, this identity cannot hold, since there is no infinite-dimensional equivalent of the Lebesgue measure and so these densities do not exist. Instead, a generalisation of Bayes' theorem in infinite dimensions can be given in terms of the Radon-Nikodym derivative of the posterior measure with respect to the prior measure. The reason is that under regularity conditions on the likelihood, the posterior will be absolutely continuous with respect to the prior (see Stuart [2010];
Dashti and Stuart [2016]). In these cases, denote by Π0 the prior measure and by Π the posterior measure. Bayes’ Theorem can be expressed as:
$$\frac{\mathrm{d}\Pi}{\mathrm{d}\Pi_0}(\theta) = \frac{1}{Z(X)} \exp(-\Phi(\theta; X)), \qquad (2.5)$$
where $\Phi$ is called a potential and encapsulates information from the likelihood. The normalisation constant ensures that the posterior $\Pi$ is a probability measure:
$$Z(X) = \int_{\Theta} \exp(-\Phi(\theta; X)) \, \Pi_0(\mathrm{d}\theta). \qquad (2.6)$$
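As an illustration of Equations 2.5 and 2.6, the normalisation constant $Z(X)$ can be approximated by Monte Carlo over samples from the prior. The sketch below uses a hypothetical Gaussian potential $\Phi$ and a standard normal prior on a scalar parameter; both are illustrative choices, not part of the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical potential: Gaussian likelihood for data X given theta,
# Phi(theta; X) = sum_i (X_i - theta)^2 / 2 (negative log-likelihood up to a constant).
X = np.array([0.9, 1.1, 1.3])
phi = lambda theta: 0.5 * np.sum((X[None, :] - theta[:, None]) ** 2, axis=1)

# Prior Pi_0 = N(0, 1); Z(X) = E_{theta ~ Pi_0}[exp(-Phi(theta; X))].
theta = rng.standard_normal(100_000)
weights = np.exp(-phi(theta))
Z = weights.mean()

# Self-normalised importance weights also give posterior expectations, e.g. the mean.
post_mean = np.sum(weights * theta) / np.sum(weights)
print(Z, post_mean)
```

For this conjugate toy example the exact posterior mean is $\sum_i X_i / (n + 1) = 0.825$, which the weighted estimate recovers to Monte Carlo accuracy.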
There are several additional challenges when working with priors on infinite-dimensional spaces. First, specifying a prior on a large parameter space is hard. In infinite dimensions, any choice of prior will be mutually singular with respect to infinitely many measures (i.e. these priors put zero mass on a very large class of functions), and so infinite-dimensional priors can be thought of as being "infinitely informative". Eliciting a representative subjective prior is therefore challenging, and establishing objective priors even more so.

An important property for Bayesian inference is posterior consistency: the property that the posterior eventually concentrates in a small neighbourhood of the true parameter. In the finite-dimensional case, this will hold under very mild conditions as long as the true parameter value is in the support
of the prior; however, this will not be the case in infinite dimensions. Early results for consistency include the work of Doob [1949] and Schwartz [1965]. Several more recent results, both positive and negative, have also been established [Diaconis and Freedman, 1986; Freedman, 1999; Choi and Schervish, 2007; Owhadi et al., 2015]. One should therefore always verify that this property holds on a case-by-case basis.

Due to the fact that infinite-dimensional priors have support in an infinite-dimensional space, they also tend to require more data than their parametric counterparts. General asymptotic consistency rates, also sometimes called contraction rates, have been established in the IID setting by Ghosal et al. [2000]; Shen and Wasserman [2001] and in the non-IID setting (including Gaussian time series and Markov processes) by Ghosal and van der Vaart [2007]. See also Chapter 8 of Ghosal and van der Vaart [2017] for an overview of more recent results.

Finally, computation is challenging when the posterior is not in a well-known family of models and needs to be approximated. For example, choosing good proposals for MCMC in high dimensions is complicated, and the use of standard proposals such as random walks can lead to acceptance rates tending to zero as the dimension of the problem increases. Extensive research is dedicated to the design of algorithms with an acceptance rate which does not degrade as the dimension increases; see for example Beskos et al. [2011, 2017]; Cotter et al. [2013].

Keeping all of the drawbacks above in mind, it is important to point out the main philosophical appeal of the Bayesian methodology: given a carefully chosen prior, the posterior provides a full characterisation of the uncertainty about the unknown parameter of interest (rather than a simple point estimate, as would be available with alternative methodologies).
2.3.2 Gaussian Processes as Bayesian Models
In this section, we discuss in more detail one of the most popular models in the Bayesian nonparametrics literature: Gaussian processes. These models will be used extensively throughout Chapters 3 and 4. Over the years, GPs have been used in a Bayesian setting in a range of applications, including computer experiments [Kennedy and O'Hagan, 2001], machine learning (as regression and classification models) [Rasmussen and Williams, 2006], Bayesian inverse problems [Stuart, 2010; Dashti and Stuart, 2016] and Bayesian numerical methods [Larkin, 1972; Diaconis, 1988; O'Hagan, 1992].

GPs have been particularly popular due to their conjugacy property: under the assumption of exact function evaluations, or function evaluations with Gaussian noise, the posterior resulting from a GP prior is also a GP. More precisely,
Figure 2.1: Sketch of a Gaussian process prior and posterior (panels $n = 0$, $n = 3$, $n = 8$). The one-dimensional function $f$ (red) is increasingly well approximated by the posterior mean $m_n$ (blue) as the number $n$ of function evaluations is increased. The dashed lines represent pointwise 95% posterior credible intervals.
denote by $g : \mathcal{X} \times \Omega \to \mathbb{R}$ a GP prior on some function $f : \mathcal{X} \to \mathbb{R}$, with mean $m : \mathcal{X} \to \mathbb{R}$ and covariance $c : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$. Denote by $f \in \mathbb{R}^n$ the vector of values $f_i = f(x_i)$ and by $m \in \mathbb{R}^n$ the vector of values $m_i = m(x_i)$ for some data points in the set $X = \{x_i\}_{i=1}^n$. Furthermore, let $c(x, X) = c(X, x)^\top$ denote the $1 \times n$ vector whose $i$th entry is $c(x, x_i)$, and write $C$ for the matrix with entries $(C)_{i,j} = c(x_i, x_j)$. After conditioning the GP prior $g$ on some data $X$ observed with IID noise which is $\mathcal{N}(0, \sigma^2)$ distributed, the posterior $g_n : \mathcal{X} \times \Omega \to \mathbb{R}$ is a GP with mean $m_n : \mathcal{X} \to \mathbb{R}$ and covariance $c_n : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ given by [Rasmussen and Williams, 2006, Chapter 2]:
$$m_n(x) = m(x) + c(x, X)(C + \sigma^2 I_{n \times n})^{-1}(f - m), \qquad (2.7)$$
$$c_n(x, x') = c(x, x') - c(x, X)(C + \sigma^2 I_{n \times n})^{-1} c(X, x'), \qquad (2.8)$$
where $I_{n \times n}$ is the identity matrix of dimension $n \times n$.

The formal definition of $g_n$ as a posterior model is tricky in the noiseless case, since defining a likelihood is not straightforward. However, we can formally see $g_n$ as a conditioned stochastic process to avoid technical obfuscation. A sketch of the conditioning procedure is provided in Figure 2.1.

The expression for the posterior given above only considers function evaluations, but our observation model could be more complex. In fact, the conjugacy property of GPs holds when the data consists of any bounded linear functional of $f$. Useful examples of observations include integrals over parts of the domain, $\int_{\mathcal{Y}} f(x) \mathrm{d}x$ for $\mathcal{Y} \subseteq \mathcal{X}$, or derivative observations $\nabla_x f(x)$.

Unfortunately, the conjugacy property of GPs can break down in several cases. First, if the hyperparameters $\gamma$ of the covariance function $c(x, y; \gamma)$ are unknown and a hierarchical Bayesian approach is taken (i.e. a prior is specified over hyperparameters), then the problem is usually not conjugate anymore. These problems then require advanced MCMC methods to sample from the posterior [Filippone and Girolami, 2014]. Another example where the conjugacy property is lost is in the case of deep GPs [Damianou and Lawrence, 2013; Dunlop et al., 2017; Monterrubio-Gómez et al., 2018], where the covariance function itself is modelled as a realisation from a GP (up to several levels).
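Equations 2.7 and 2.8 translate directly into a few lines of linear algebra. The following sketch (an illustration under stated assumptions, using a Gaussian RBF covariance, a zero prior mean and a small illustrative noise level) computes the posterior mean and covariance at test points:

```python
import numpy as np

def rbf(x, y, sigma=1.0):
    """Gaussian RBF covariance c(x, y) = exp(-(x - y)^2 / (2 sigma^2))."""
    return np.exp(-(x[:, None] - y[None, :]) ** 2 / (2.0 * sigma ** 2))

def gp_posterior(x_new, x_obs, f_obs, c=rbf, noise=1e-2,
                 m=lambda x: np.zeros_like(x)):
    """Posterior mean and covariance of a GP conditioned on noisy function
    evaluations, following Equations 2.7 and 2.8 (zero prior mean by default)."""
    n = len(x_obs)
    K = c(x_obs, x_obs) + noise ** 2 * np.eye(n)   # C + sigma^2 I
    k_star = c(x_new, x_obs)                       # c(x, X) for each test point
    # Solve the linear system rather than forming the matrix inverse explicitly.
    alpha = np.linalg.solve(K, f_obs - m(x_obs))
    mean = m(x_new) + k_star @ alpha
    cov = c(x_new, x_new) - k_star @ np.linalg.solve(K, k_star.T)
    return mean, cov

x_obs = np.array([0.0, 1.0, 2.0])
f_obs = np.sin(x_obs)                              # three function evaluations
x_new = np.linspace(0.0, 2.0, 50)
mean, cov = gp_posterior(x_new, x_obs, f_obs)
print(mean.shape, cov.shape)
```

The posterior mean interpolates the observations up to the noise level, and the diagonal of `cov` gives the pointwise posterior variance used for credible intervals such as those in Figure 2.1.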
2.3.3 Practical Issues with Gaussian Processes
GPs will be used extensively throughout the remainder of this thesis, and we therefore pause to discuss issues relating to their practical implementation. This includes how to select a particular type of GP prior for Bayesian inference, the stability of the numerical systems underlying conditional distributions, and issues relating to scalability in high-dimensional or large-data settings.
Prior Specification
Prior specification (also called model selection) is an important consideration when working with GPs [Stein, 1999; Xu and Stein, 2017]. It consists of selecting the mean function $m : \mathcal{X} \to \mathbb{R}$ and the covariance function $c : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ of the GP prior; see Oakley [2002] for elicitation of priors in the area of computer experiments. Since prior models are "infinitely informative" in the nonparametric case, this choice is of prime importance, as it will significantly influence the result of the Bayesian analysis. Care is therefore required.

The choice of mean function for GPs has received relatively little attention. This is mainly due to the fact that an appropriate choice of prior mean should be guided by problem-specific knowledge. A common practice is to set the mean function to $m = 0$, then let the data influence the posterior. In cases where $n$ is large and the dimension of the domain $\mathcal{X}$ is low, this may be an acceptable approach. When this is not the case, however, the data will not be informative about the function on the entire domain and, as a result, the posterior will revert to the prior in areas which are unexplored. An arbitrary choice of prior such as $m = 0$ can therefore have severe consequences in these cases. To avoid this problem, it is also possible to use a parametric model as prior mean, for example a linear combination of basis functions [Kennedy and O'Hagan, 2001], or to use meta-learning; see for example Fortuin and Rätsch [2019].

A problem which has received significantly more attention is the choice of
Figure 2.2: Importance of model selection for Gaussian processes. Left: draws from a Gaussian process prior with mean zero and covariance a Gaussian RBF kernel with lengthscale σ = 0.1 (red), σ = 1 (blue) and σ = 5 (green). Right: draws from a Gaussian process prior with Matérn covariance with lengthscale σ = 1, amplitude λ = 1 and varying smoothness hyperparameter ν.
covariance function. This is because the covariance function determines essential properties of the realisations and of the posterior mean. Some popular covariance functions have already been introduced in Section 2.1.3, and other examples can be found in [Duvenaud, 2014]. It is common to base this choice on smoothness, periodicity and tail properties. As was seen in these examples, covariance functions also tend to have several hyperparameters (jointly denoted by the vector $\gamma$) which need to be selected, and which have a significant influence on the prior obtained. This is illustrated in Figure 2.2 (left), where realisations from a GP with Gaussian RBF covariance function (see Equation 2.4) are plotted for various values of the lengthscale $\sigma$ but fixed amplitude $\lambda = 1$. Similarly, Figure 2.2 (right) contains realisations from a GP with Matérn covariance (see Equation 2.3) with lengthscale $\sigma = 1$ and amplitude $\lambda = 1$ but varying smoothness hyperparameter $\nu$. In both cases, the hyperparameters have a significant impact on the realisations obtained.

Consider a parametric covariance function $c(x, x'; \gamma_l, \gamma_s)$, with a distinction drawn here between scale hyperparameters $\gamma_l$ and smoothness hyperparameters $\gamma_s$. The former are defined as parameterising the norm of the associated RKHS, whereas the latter affect the corresponding RKHS itself. Selection of $\gamma_l, \gamma_s$ based on data can only be successful in the absence of acute sensitivity to these hyperparameters. For scale hyperparameters, a wide body of evidence demonstrates that this is usually not a concern [Stein, 1999]. We now outline several approaches, which are described in more detail by Rasmussen and Williams [2006]:
• Marginalisation: A natural approach, from a Bayesian perspective, is to set a prior on the hyperparameters $\gamma$ and then to marginalise over the posterior distribution on these parameters. Recent results for certain infinitely differentiable covariance functions establish minimax optimal rates for this approach, including in the practically relevant setting where $\pi$ is supported on a low-dimensional sub-manifold of the ambient space $\mathcal{X}$ [Yang and Dunson, 2016]. However, the act of marginalisation itself involves an intractable integral, which will usually break the conjugacy property of GPs. It is therefore important to keep in mind the additional computational resources required when assessing the advantages provided by marginalisation.
• Cross-Validation: Another approach to the choice of covariance function is cross-validation. It consists of separating the data into $M \in \mathbb{N}$ subsets and then, for a given hyperparameter value, conditioning the GP on $M - 1$ subsets and assessing its predictive performance on the data points in the remaining subset. The procedure is repeated over all choices of $M - 1$ subsets, to obtain an indication of how well the hyperparameter value performs for prediction. This procedure can then be repeated for several hyperparameter values, and the best-performing hyperparameter is retained. This method tends to be robust, since it is less prone to suffering from outliers. However, it can be considered less principled than marginalisation from a Bayesian point of view, since it selects a prior using the data. Another issue is that it can perform poorly when the number $n$ of data points is small, since the data needs to be further divided into $M$ subsets; the performance estimates are known to have large variance in those cases.
• Empirical Bayes: An alternative to the above approaches is empirical Bayes. This consists in selecting hyperparameters $\gamma$ to maximise the log-marginal likelihood of the data $\{f(x_i)\}_{i=1}^n$:
$$l(\gamma) = -\frac{1}{2} f^\top C^{-1} f - \frac{1}{2} \log |C| - \frac{n}{2} \log 2\pi,$$
where $|C|$ denotes the determinant of the matrix $C$. In practice, this objective can be maximised using any numerical optimisation routine. Empirical Bayes has the advantage of providing an objective function that is easier to optimise than the cross-validation objective, but it is not fully Bayesian, since it also makes use of the data to select the hyperparameters. Empirical Bayes can lead to over-confidence when $n$ is very small, since the full irregularity of the function has yet to be uncovered [Szabó et al., 2015]. In addition, it can be shown that empirical Bayes estimates need not converge as $n \to \infty$. This is for example the case when the GP is supported on infinitely differentiable functions [Xu and Stein, 2017].
Selection of smoothness hyperparameters is a much harder problem and an active area of theoretical research; see Szabó et al. [2015]. In some cases it is possible to elicit a smoothness hyperparameter from physical or mathematical considerations, such as a known number of derivatives of the function. Alternatively, the three methods highlighted above can also be used for smoothness hyperparameters, but they are much less well understood in this case.
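As a hypothetical illustration of the empirical Bayes approach, the sketch below maximises the log-marginal likelihood $l(\gamma)$ over the lengthscale of a Gaussian RBF covariance. The synthetic data, the small stabilising noise term and the optimisation bounds are all illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def rbf(x, y, sigma):
    return np.exp(-(x[:, None] - y[None, :]) ** 2 / (2.0 * sigma ** 2))

def neg_log_marginal(log_sigma, x, f, noise=1e-2):
    """Negative log-marginal likelihood -l(gamma) as a function of the
    log-lengthscale, with a small noise term for numerical stability."""
    n = len(x)
    C = rbf(x, x, np.exp(log_sigma)) + noise ** 2 * np.eye(n)
    _, logdet = np.linalg.slogdet(C)
    return (0.5 * f @ np.linalg.solve(C, f)
            + 0.5 * logdet
            + 0.5 * n * np.log(2.0 * np.pi))

x = np.linspace(0.0, 10.0, 40)
f = np.sin(x)                        # synthetic "data": exact function evaluations
res = minimize_scalar(lambda s: neg_log_marginal(s, x, f),
                      bounds=(-3.0, 3.0), method="bounded")
print(np.exp(res.x))                 # empirical Bayes estimate of the lengthscale
```

Optimising over the log-lengthscale keeps the search unconstrained-looking and numerically well-behaved; in practice one would optimise all hyperparameters jointly.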
Stability of the Numerical System
The main computational challenge associated with the use of GPs is inverting the $n \times n$ Gram matrix $C$. This is required in order to obtain the posterior mean and variance in Equations 2.7 and 2.8. When $n$ is large, or in unfavourable hyperparameter regimes, the inversion of the covariance matrix can become numerically unstable. Understanding when this may happen is of great practical importance, and we refer the reader to Chapter 12 of Wendland [2005] for a detailed discussion.

Consider Figure 2.3, where we highlight this problem for the simple case of GP regression with a Gaussian RBF covariance function, where the function is evaluated at 100 equidistant points on $[0, 10]$. When the covariance function has a large lengthscale $\sigma$, the matrix is ill-conditioned, since neighbouring rows or columns are very similar to one another. This may not be an issue from a theoretical viewpoint, but it is likely that the matrix will become numerically singular. Schaback and Wendland [2006] point out that this behaviour occurs for a large class of radial kernels. Another observation in this paper is that the conditioning of the Gram matrix worsens with the smoothness of the covariance function.

Often we need to compute the product $C^{-1}b$, where $b$ is a vector of length $n$. In this case, first solving the linear system $b = Ca$ for $a$ tends to be more numerically stable than computing the matrix inverse directly. Several approaches to further improve stability include multipole expansions [Greengard and Strain, 1991], domain decomposition methods [Beatson et al., 2001], partition of unity methods [Babuška and Melenk, 1997], compactly supported kernels [Floater and Iske, 1996; Wendland, 2005] and preconditioning of the covariance matrix [Mouat, 2001].
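The following sketch illustrates this ill-conditioning and a common practical remedy: adding a small diagonal "jitter" before a Cholesky solve, rather than forming $C^{-1}$ explicitly. The jitter value is an illustrative choice, not a recommendation from the thesis.

```python
import numpy as np

# Gram matrix for a Gaussian RBF kernel with a large lengthscale (sigma = 5)
# at 100 equidistant points on [0, 10]: severely ill-conditioned.
x = np.linspace(0.0, 10.0, 100)
C = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * 5.0 ** 2))
rng = np.random.default_rng(2)
b = rng.standard_normal(100)

print(np.linalg.cond(C))            # enormous condition number

# A small diagonal "jitter" restores numerical stability; a Cholesky solve
# then computes a = (C + jitter)^{-1} b without ever forming the inverse.
jitter = 1e-6 * np.eye(100)
L = np.linalg.cholesky(C + jitter)
a = np.linalg.solve(L.T, np.linalg.solve(L, b))
residual = np.max(np.abs((C + jitter) @ a - b))
print(residual)
```

The jitter is mathematically equivalent to assuming a tiny observation noise, which is one reason a small noise term is routinely included even for noiseless data.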
Figure 2.3: Ill-conditioning of the Gram matrix in Gaussian process regression. We continue the example in Figure 2.2 and plot the Gram matrices corresponding to 100 equidistant points in [0, 10] for a GP with Gaussian RBF kernel with amplitude hyperparameter λ = 1 and lengthscale hyperparameter σ = 0.1 (left), σ = 1 (middle) and σ = 5 (right).
Scalability
In situations where obtaining data is cheap, the naive $O(n^3)$ computational cost associated with inverting the covariance matrix renders GP regression slow. It is then natural to ask whether the uncertainty quantification provided by GPs is worth the increased off-line computational overhead. Below, several approaches to reducing the computational overhead of GPs are highlighted.

Exact inversion can be achieved at low cost by exploiting structure in the kernel matrix. Examples include tensor product kernels [O'Hagan, 1991], circulant embeddings [Davies and Bryant, 2013] and low-rank kernels such as polynomial kernels. In addition, there are many approximate inversion techniques. We highlight a few: reduced-rank approximations [Quinonero-Candela and Rasmussen, 2005; Bach, 2013; El Alaoui and Mahoney, 2015], explicit feature maps designed for additive kernels [Vedaldi and Zisserman, 2012], local approximations [Gramacy and Apley, 2015], multi-scale approximations [Iske, 2004; Katzfuss, 2017], random approximations of the kernel itself, such as random Fourier features [Rahimi and Recht, 2007], spectral methods [Lazaro-Gredilla et al., 2010; Bach, 2017], hash kernels [Shi et al., 2009], parallel programming [Dai et al., 2014] and efficient use of data structures [Wendland, 2005, Section 14]. Furthermore, several of the approaches to improving the conditioning of the linear system discussed in the previous subsection also reduce the computational cost as a by-product; these include fast multipole methods and compactly supported covariance functions.

This, of course, does not represent an exhaustive list of the (growing) literature on kernel matrix methods. Note that the majority of approximate kernel
43 methods do not come with probability models for the additional source of numerical error introduced by the approximation.
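As one concrete example from the list above, random Fourier features [Rahimi and Recht, 2007] replace the $n \times n$ Gram matrix with an explicit low-dimensional feature map. The sketch below (a minimal illustration with illustrative hyperparameters) approximates a Gaussian RBF kernel matrix:

```python
import numpy as np

def rff_features(x, n_features, sigma, rng):
    """Random Fourier feature map z(x) such that z(x) @ z(y) approximates
    the Gaussian RBF kernel exp(-|x - y|^2 / (2 sigma^2)). x is (n, d)."""
    n, d = x.shape
    # Frequencies sampled from the spectral density of the RBF kernel.
    W = rng.standard_normal((d, n_features)) / sigma
    b = rng.uniform(0.0, 2.0 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(x @ W + b)

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, (200, 1))
Z = rff_features(x, n_features=2000, sigma=0.5, rng=rng)

K_exact = np.exp(-(x - x.T) ** 2 / (2.0 * 0.5 ** 2))
K_approx = Z @ Z.T                   # low-rank approximation of the Gram matrix
print(np.max(np.abs(K_exact - K_approx)))
```

Downstream linear algebra can then be performed on the $n \times m$ feature matrix `Z` (with $m \ll n$ in realistic settings), at a cost that scales linearly in $n$. Note that, as remarked above, the approximation error introduced here is not itself modelled probabilistically.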
Chapter 3
Bayesian Numerical Integration: Foundations
“We believe that they demonstrate very strongly that, in a fundamental sense, Monte Carlo is statistically unsound.”
[O’Hagan, 1984]
Our objective in this thesis will be to make use of the theory of reproducing kernel Hilbert spaces to tackle challenges in computational statistics. The first challenge that was previously highlighted is the numerical integration of expensive functions, and this will be the focus of the current chapter. In particular, we will review an existing algorithm called Bayesian quadrature (BQ). We will begin with a brief overview of the field of Bayesian probabilistic numerical methods, then move on to the analysis of BQ. This will include consistency rates and numerical experiments on a wide range of statistical applications. In Chapter 4, we will then propose novel extensions of the algorithm.
3.1 Bayesian Probabilistic Numerical Methods
3.1.1 Numerical Analysis in Statistics and Beyond
Numerical analysis is a subfield of mathematics extensively used throughout appli- cations across the sciences (and beyond). There are several competing definitions of the field, but it is often described by researchers as “the study of algorithms for the problems of continuous mathematics”; see Trefethen [1992]. More precisely, numerical analysis is concerned with how to best project continuous problems into
discrete scales. Canonical examples include approximating the solution of integral equations [Davis and Rabinowitz, 2007] and differential equations [Hairer et al., 1993; Hairer and Wanner, 1996], or even the solution of problems in interpolation [Wahba, 1991; Wendland, 2005], optimisation [Boyd and Vandenberghe, 2004; Nocedal and Wright, 2006] and linear algebra [Trefethen and Bau, 1997]. In each of these cases, a continuous mathematical quantity, such as a function or an operator, is discretised in such a way that the solution to the discretised problem can be computed in closed form by a computer. Discretising the quantity of interest in different manners leads to different algorithms, and the approximation properties of each discretisation scheme are, of course, of great importance. For this reason, a second, less glamorous yet popular, definition of numerical analysis is the "study of numerical errors".

A standard approach to developing a new algorithm in numerical analysis goes as follows. First, it is important to check that the problem at hand is well-posed and that the proposed algorithm is numerically stable; that is, we want to verify that the method does not magnify approximation errors. A second step consists of studying the convergence of the algorithm and the associated order of this convergence in the size of the discretisation grid or mesh. This is done by defining some notion of error and studying how this error decreases as the number of iterations increases. These errors are usually chosen to be worst-case errors over some set of assumptions on the problem, and therefore provide rather conservative bounds on the error incurred by the use of a given algorithm; see Chapter 1 for a brief discussion.

Numerical methods are essential to mathematical modelling in many applied settings, as well as specifically within statistics and machine learning.
Take for example the field of Bayesian statistics: the biggest challenge here (as described in the previous chapters) is the approximation of expensive or high-dimensional integrals which are required to obtain a posterior distribution on quantities of interest. At this stage, numerical methods are usually treated by practitioners as computational black boxes that return a point estimate for the integral, and whose numerical error is then neglected. This means that the posterior distribution on the quantity of interest will not account for the numerical error. Numerical integration is thus one part of Bayesian inference for which uncertainty is not routinely accounted for in a fully Bayesian way.

This lack of Bayesian uncertainty quantification should of course be alarming to Bayesian statisticians. Can one really trust a posterior distribution which was obtained by approximating an integral or differential equation? How would the
posterior differ if this additional source of uncertainty were incorporated? These questions are often ignored, but in many cases should probably not be.
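The notion of convergence order described in Section 3.1.1 can be illustrated with a short, self-contained sketch; here the classical composite trapezoid rule stands in for the quadrature methods studied later in this chapter, and the empirical convergence order is read off from a log-log fit. The toy integrand and grid sizes are illustrative choices.

```python
import numpy as np

# Empirical convergence study: estimate the order of the composite trapezoid
# rule on a toy integral by measuring how the error decays with the grid size.
f, truth = np.cos, np.sin(1.0)      # the integral of cos over [0, 1] is sin(1)

ns = np.array([8, 16, 32, 64, 128, 256])
errors = []
for n in ns:
    x = np.linspace(0.0, 1.0, n + 1)
    y = f(x)
    h = 1.0 / n
    est = h * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])
    errors.append(abs(est - truth))

# The slope of log(error) against log(n) gives the convergence order,
# which should be close to 2 for the trapezoid rule.
p = -np.polyfit(np.log(ns), np.log(errors), 1)[0]
print(p)
```

This is exactly the worst-case, rate-centric style of analysis described above; the Bayesian methods discussed next replace the single error number with a posterior distribution over the unknown integral.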
3.1.2 Numerical Methods as Bayesian Inference Problems
The description of numerical analysis as the discretisation of a continuous quantity should sound familiar to statisticians. It is in fact very much related to statistical inference, where we are interested in inferring some unknown quantity by observing a finite number of values, and in studying asymptotic properties of associated estimators. In this case, the unknown quantity would be the solution of the continuous mathematical problem (e.g. some intractable integral), and the data consist of functionals of some underlying function (e.g. integrand evaluations).

As a Bayesian, a natural procedure would therefore be to think of the unknown quantity as a random variable and specify a prior over it, then update one's beliefs using the observations available. This is the approach proposed by Bayesian probabilistic numerical methods, which originate in the work of Poincaré [1896] and were independently proposed by a number of eminent mathematicians and statisticians in the 1970s, 1980s and 1990s [Larkin, 1972; Diaconis, 1988; O'Hagan, 1992; Kadane and Wasilkowski, 1985; Skilling, 1991], and most recently reviewed in Hennig et al. [2015]; Briol et al. [2015b] and Cockayne et al. [2017]. See also Oates and Sullivan [2019] for a review of the early history of the field.

The Bayesian approach to numerical analysis did not see many significant developments between the 1990s and 2010s. As Diaconis [1988] put it in the late 1980s: "most people, even Bayesians, think this sounds crazy when they first hear about it". Indeed, back then the idea of using Bayesian statistics to solve numerical analysis problems was unusual, and the advantages over classical methods (which had already been developed for decades) were unclear. To some extent, although Bayesian methods are now more widely accepted, this criticism remains valid. This question will be studied throughout this chapter.
Is it really useful to formulate numerical problems from the Bayesian viewpoint, or are we just reformulating known algorithms in the Bayesian language? This thesis argues that there is much more to Bayesian numerical methods than a change of vocabulary. A first advantage of specifying a prior distribution is that we make all of our assumptions on the quantity of interest, or any subject of computation, explicit. The prior also allows the user to add information which does not fit into any of the existing methods. For example, BQ is very flexible and can incorporate a wide range of prior knowledge on the integrand through selection of the mean and covariance function of a GP. We might know that the
integrand is periodic or monotonic, perhaps through inspection of the functional form of $f$ or via domain-specific knowledge, and this can be directly encoded in the algorithm.

Second, Bayesian statistics is a principled way of performing uncertainty quantification [Robert, 1994]. The recent work by Cockayne et al. [2017] outlines how many Bayesian probabilistic numerical methods can be framed as Bayesian inverse problems [Stuart, 2010; Dashti and Stuart, 2016]. Instead of point estimates for the quantities of interest, these methods can in fact provide entire Bayesian posteriors on these quantities. These posteriors should provide much more faithful representations of our uncertainty than classic worst-case error bounds. Note that the notions of uncertainty and error discussed here are very different from those used in numerical analysis: we are talking about epistemic uncertainty (representing our personal lack of knowledge about a problem) rather than aleatoric uncertainty (which concerns inherent randomness in a system).

Thirdly, the Bayesian approach is also useful in cases where numerical methods are used in a sequential manner. Of course, in many situations numerical error will be negligible and no further action is required, but if numerical errors are propagated through a computational pipeline and allowed to accumulate, then failure to properly account for such errors could have drastic consequences on subsequent statistical inferences. Such consequences could be akin to Lorenz's butterfly effect in chaos theory, where small changes to the initial state of a system can have large consequences on later states. See Mosbach and Turner [2009] for an example of numerical error accumulating when solving differential equations, and Oates et al. [2017b] for a large-scale application of Bayesian probabilistic numerical methods to a problem in electrical impedance tomography.
Finally, the Bayesian framework allows us to frame numerical problems in the setting of transfer learning, where we re-use the computations performed for a first numerical problem to improve performance when solving a second numerical problem. This will be illustrated in Chapter 4 in the case where we have multiple integrals of interest $\Pi[f_1], \ldots, \Pi[f_P]$ ($P \in \mathbb{N}$), and we will demonstrate how knowledge of the correlation structure between $f_1, \ldots, f_P$ can be used to improve the estimate of each of these integrals. This setting is not usually considered in numerical analysis, but arises naturally from the statistical formulation.
3.1.3 Recent Developments in Bayesian Numerical Methods
Since the early 1980s, a range of Bayesian probabilistic numerical methods have been invented and developed to address most canonical problems in numerical analysis.
Of course, there are many statistical methods for function approximation; see for example GPs and Dirichlet processes, as introduced in the previous chapter. We now provide a brief overview for other canonical problems:
• Ordinary Differential Equations: The first method for ordinary differential equations was proposed by Skilling [1991], and an approximate Bayesian framework using GPs was introduced in Chkrebtii et al. [2016]. Raissi et al. [2017] later proposed a version using multifidelity GPs. Hennig and Hauberg [2014]; Schober et al. [2014] provided a probabilistic version of Runge-Kutta methods, which takes the form of a filtering model that was further studied in Schober et al. [2018]. Various priors were later explored in [Magnani et al., 2017], and Kersting and Hennig [2016] explored the close link between the solution of ordinary differential equations and quadrature. Similar algorithms with a filtering flavour were also proposed by Teymur et al. [2016, 2018] in the case of multi-step methods. In a separate line of research, Conrad et al. [2017] proposed an uncertainty quantification framework which proceeds by introducing random noise at each step of any existing numerical solver. This was further studied from a theoretical point of view in Lie et al. [2017] and extended to a random time-step formulation in Abdulle and Garegnani [2017].
• Partial Differential Equations: Extensions of Chkrebtii et al. [2016]; Conrad et al. [2017] were also proposed for partial differential equations. The first independent work for partial differential equations is due to Owhadi [2015], who framed the problem of numerical homogenisation as a Bayesian inference problem. Later, [Cockayne et al., 2016; Oates et al., 2017b] developed probabilistic meshless methods which allow for uncertainty quantification within the popular stochastic collocation methods. Finally, Owhadi [2017]; Owhadi and Zhang [2017]; Owhadi and Scovel [2017] developed gamblets, a computationally efficient approach in the case of hierarchical information. More generally, [Owhadi and Scovel, 2017] also provided an in-depth discussion of the link between methods which are optimal in Bayesian and game-theoretic settings.
• Optimisation: By far the most popular Bayesian numerical method is Bayesian optimisation [Mockus, 1989]. It is widely applicable and used extensively throughout machine learning [Snoek et al., 2012], both in academic research and in industry. Further work includes Hennig and Kiefel [2013], who provided a probabilistic perspective on quasi-Newton methods, Mahsereci and Hennig [2015], who introduced a probabilistic line search algorithm, and Wills and Schön [2017], who combined both of the methods above.
• Linear Algebra: So far, less work has been done at the intersection of linear algebra and Bayesian statistics. Hennig [2015] proposed a probabilistic interpretation for certain solvers of unconstrained linear systems, whilst Bartels and Hennig [2016] proposed an extension for the specific case of least-squares problems. Fitzsimons et al. [2017] proposed a Bayesian approach to inferring log-determinants. Finally, Cockayne et al. [2018] proposed a Bayesian version of the conjugate gradient method.
Of course, one of the canonical problems which has not been discussed so far is numerical integration; this is the focus of the remainder of this chapter. We conclude this section with a remark on the definition of Bayesian numerical methods. Although most of the methods above claim to be Bayesian, they do not all satisfy some of the main Bayesian principles, such as propagation of uncertainty by conditioning and marginalisation. A formal definition of a Bayesian probabilistic numerical method was proposed by Cockayne et al. [2017], and a discussion of which methods satisfy this definition can be found in Table 1 of that paper. Although some of these methods are not fully Bayesian, they may be considered approximately Bayesian. The goal of such methods is then to strike a good balance between the useful properties of Bayesian methods and the computational challenges and other practicalities surrounding implementation.
3.2 Bayesian Quadrature
For the remainder of this chapter, we study an algorithm called Bayesian Quadrature (BQ) [O'Hagan, 1991], which provides a Bayesian approach to numerical integration. In this section, we introduce BQ and relate it to the study of quadrature rules in RKHSs. We then provide theoretical results for several variants of BQ in Section 3.3, before discussing details of importance for its efficient implementation in Section 3.4. Finally, we study its performance on several problems in statistics and engineering in Sections 3.5 and 3.6.
3.2.1 Introduction to Bayesian Quadrature
Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$ be a measurable space with $\mathcal{X} \subseteq \mathbb{R}^d$ for $d \in \mathbb{N}$, where $\mathcal{B}(\mathcal{X})$ denotes the Borel $\sigma$-algebra. Let $f : \mathcal{X} \to \mathbb{R}$ be some function for which we would like to compute the integral $\Pi[f]$.
Recall that a quadrature rule describes any functional of the form of a linear combination of function values: $\hat{\Pi}[f] = \sum_{i=1}^{n} w_i f(x_i)$ for some states (or samples) $\{x_i\}_{i=1}^n \subset \mathcal{X}$ and weights $\{w_i\}_{i=1}^n \subset \mathbb{R}$. The notation $\hat{\Pi}[f]$ is motivated by the fact that this expression can be re-written as the integral of $f$ with respect to the empirical measure $\hat{\Pi} = \sum_{i=1}^n w_i \delta_{x_i}$, where $\delta_{x_i}$ is a Dirac measure (i.e. for all $A \in \mathcal{B}(\mathcal{X})$, $\delta_{x_i}(A) = 1$ if $x_i \in A$ and $\delta_{x_i}(A) = 0$ otherwise). The weights $w_i$ can be negative and need not satisfy $\sum_{i=1}^n w_i = 1$.

BQ begins by defining a stochastic process $g : \mathcal{X} \times \Omega \to \mathbb{R}$, formally seen as a prior model for the integrand $f$. The most popular choice, originally made by Larkin [1972], is a GP, but others could also be used. Recall from Chapter 2 that a GP can be characterised by its mean function $m(x) = \mathbb{E}_{\mathbb{P}}[g(x, \omega)]$ and its covariance function $c(x, x') = \mathbb{E}_{\mathbb{P}}[(g(x, \omega) - m(x))(g(x', \omega) - m(x'))]$. From now on, we assume without loss of generality that $m \equiv 0$. Conditioning the GP on function values at the quadrature points $X = \{x_i\}_{i=1}^n \subset \mathcal{X}$ gives a new GP, denoted $g_n : \mathcal{X} \times \Omega \to \mathbb{R}$, with mean $m_n(x) = m(x) + c(x, X) C^{-1} (f - m)$ and covariance function $c_n(x, x') = c(x, x') - c(x, X) C^{-1} c(X, x')$, where $C$ is the $n \times n$ matrix with entries $C_{ij} = c(x_i, x_j)$ and $f = (f(x_1), \ldots, f(x_n))^\top$. For simplicity we will assume there is no measurement error. After obtaining the conditioned GP $g_n$, the final step is to produce a distribution on the value of the integral $\Pi[g_n]$ by considering the pushforward of the process $g_n$ through the integration operator. A sketch of the procedure is presented in Figure 3.1 and the relevant formulae are now provided.
Proposition 1 (BQ posterior distribution on the solution of the integral).¹ The distribution of $\Pi[g_n]$ is Gaussian with mean and variance

$$\mathbb{E}[\Pi[g_n]] = \Pi[c(\cdot, X)]\, C^{-1} f, \qquad (3.1)$$
$$\mathbb{V}[\Pi[g_n]] = \Pi\Pi[c(\cdot, \cdot)] - \Pi[c(\cdot, X)]\, C^{-1} \Pi[c(X, \cdot)]. \qquad (3.2)$$
All of the proofs in this thesis can be found in Appendix B, ordered by chapter and in the order in which they appear in the main text; in particular, see B.1 for all the proofs in this chapter. Here, $\Pi\Pi[c(\cdot, \cdot)]$ denotes the integral of $c$ with respect to each argument. Note that the computational cost of obtaining this full posterior (in the worst case $O(n^3)$) is much higher than that of obtaining a point estimate for the integral using MC methods. However, many methods for scaling GPs (discussed in the previous chapter) can be used to speed this up, and Karvonen and Särkkä [2018] also proposed a novel scalable method specifically targeted at BQ.
1 The mean and variance are taken with respect to P, but we do not repeatedly specify this to avoid overloading the notation.
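Equations 3.1 and 3.2 can be evaluated directly whenever the kernel mean $\Pi[c(\cdot, x)]$ and the initial error $\Pi\Pi[c(\cdot, \cdot)]$ are available in closed form. The sketch below is not taken from the thesis: the function name, the choice $\Pi = N(0, 1)$ and the Gaussian kernel are our own illustrative assumptions, chosen because this kernel-measure pair has well-known closed-form kernel means.

```python
import numpy as np

def bq_posterior(x, f_vals, ell=1.0):
    """BQ posterior mean and variance (Eqs. 3.1-3.2) for the kernel
    c(x, y) = exp(-(x - y)^2 / (2 ell^2)) and Pi = N(0, 1)."""
    x = np.asarray(x, dtype=float)
    C = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / ell**2)
    C += 1e-10 * np.eye(len(x))  # jitter for numerical stability
    # Kernel mean: Pi[c(., x_i)] = ell / sqrt(ell^2 + 1) * exp(-x_i^2 / (2 (ell^2 + 1)))
    z = ell / np.sqrt(ell**2 + 1) * np.exp(-0.5 * x**2 / (ell**2 + 1))
    pipic = ell / np.sqrt(ell**2 + 2)  # initial error PiPi[c(., .)]
    w = np.linalg.solve(C, z)          # BQ weights w = C^{-1} Pi[c(X, .)]
    return w @ np.asarray(f_vals), pipic - z @ w

x = np.linspace(-4.0, 4.0, 25)
mean, var = bq_posterior(x, x**2)  # true value: Pi[x^2] = 1 under N(0, 1)
```

With 25 states the posterior mean lands close to the true value 1, and the posterior variance is small and non-negative, shrinking as further states are added.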
Figure 3.1: Sketch of Bayesian quadrature. The top row shows the approximation of the integrand f (red) by the posterior mean $m_n$ (blue) as the number n of function evaluations is increased. The dashed lines represent point-wise 95% posterior credible intervals. The bottom row shows the Gaussian distribution with mean $\mathbb{E}[\Pi[g_n]]$ and variance $\mathbb{V}[\Pi[g_n]]$, and the dashed black line gives the true value of the integral $\Pi[f]$.
Since BQ formally associates g with a prior on f, $\Pi[g_n]$ in turn provides a posterior distribution over the value of the integral $\Pi[f]$, representing our epistemic uncertainty. An interesting remark is that Equation 3.1 takes the form of a quadrature rule:

$$\mathbb{E}[\Pi[g_n]] = \hat{\Pi}_{BQ}[f] := \sum_{i=1}^{n} w_i^{BQ} f(x_i), \qquad (3.3)$$

with weight vector given by $w^{BQ} := (\Pi[c(X, \cdot)]\, C^{-1})^\top$. Furthermore, the posterior variance in Equation 3.2 does not depend on the function values $\{f(x_i)\}_{i=1}^n$, but only on the locations of the states $\{x_i\}_{i=1}^n$ and the choice of covariance function c. This is useful as it allows state locations and weights to be precomputed and reused. However, it also means that the variance is completely driven by the choice of prior. A valid quantification of uncertainty thus relies on a well-specified prior; we consider this issue further in Section 3.4.²

The BQ mean (Equation 3.1) coincides with classical quadrature rules for specific choices of covariance function c. For example, in one dimension the Brownian covariance function $c(x, x') = \min(x, x')$ leads to a posterior mean $m_n$ that is a piecewise linear interpolant of f between the states $\{x_i\}_{i=1}^n$, i.e. the trapezium rule [Suldin, 1959]. Similarly, Särkkä et al. [2016] constructed a covariance function c for which Gauss-Hermite quadrature is recovered, and Karvonen and Särkkä [2017] showed how other polynomial-based quadrature rules can be recovered. In another research direction, Karvonen et al. [2018] showed how to design a BQ rule whose mean corresponds to the point estimate of any cubature rule. Clearly the point estimator in Equation 3.3 is a natural object; it has also received attention in both the kernel quadrature literature [Sommariva and Vianello, 2006] and the empirical interpolation literature [Kristoffersen, 2013]. In those contexts, the point estimator is derived from different assumptions on the integrand: namely, that it is an element of an RKHS with kernel c, rather than a draw from a GP with covariance c. Although other stochastic processes could of course be used as priors [Cockayne et al., 2017], GPs are popular due to their conjugacy properties, and the terminology Bayesian quadrature usually refers to this case. Other names for BQ with GP priors include Gaussian process quadrature or kernel quadrature. Alternative conjugate priors include Student-t processes, which can afford heavier tails for the values assumed by the integrand. There has been a wide range of applications of BQ, including to other numerical methods in optimisation, linear algebra and functional approximation [Kersting and Hennig, 2016; Fitzsimons et al., 2017], inference in complex computer models [Oates et al., 2017d], and problems in econometrics [Oettershagen, 2017] and computer graphics [Brouillat et al., 2009; Marques et al., 2013; Briol et al., 2015b; Xi et al., 2018].

²Note that other choices of priors for f will give posteriors which do not necessarily have this property.
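The Brownian-covariance example can be checked numerically. The snippet below is our own illustration (with $\Pi$ uniform on $[0,1]$, for which the kernel mean $\mu(x) = x - x^2/2$ follows by direct integration of $\min(x, t)$). It recovers trapezium-type weights; the left endpoint carries no weight because $c(0, 0) = 0$ pins $g(0) = 0$ under the prior.

```python
import numpy as np

n = 4
x = np.arange(1, n + 1) / n                # equispaced states (1/n, ..., 1)
K = np.minimum(x[:, None], x[None, :])     # Brownian covariance c(x, x') = min(x, x')
z = x - x**2 / 2                           # kernel mean: int_0^1 min(x, t) dt
w = np.linalg.solve(K, z)                  # BQ weights (Eq. 3.3)
# Interior states receive the trapezium weight 1/n and the endpoint x_n = 1
# receives 1/(2n): here w = [0.25, 0.25, 0.25, 0.125].
estimate = w @ x                           # integrand f(x) = x, so Pi[f] = 1/2
```

Since the trapezium rule is exact for linear integrands (and f(0) = 0 agrees with the prior), `estimate` equals 0.5 up to solver precision.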
3.2.2 Quadrature Rules in Reproducing Kernel Hilbert Spaces
Next we review how analysis of the approximation properties of the quadrature rule
$\hat{\Pi}_{BQ}[f]$ can be carried out in terms of functional approximation in an RKHS. Denote by $\mathcal{H}_k$ an RKHS with kernel $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, and denote its inner product by $\langle \cdot, \cdot \rangle_{\mathcal{H}_k}$ and associated norm by $\|\cdot\|_{\mathcal{H}_k}$. In the remainder of this thesis all kernels $k$ are assumed to satisfy $\int_{\mathcal{X}} k(x, x)\, \Pi(dx) < \infty$. In particular this guarantees $\int_{\mathcal{X}} f(x)^2\, \Pi(dx) < \infty$ for all $f \in \mathcal{H}_k$. An important object in the study of quadrature rules is the kernel mean $\mu(\Pi) : \mathcal{X} \to \mathbb{R}$, defined as
$$\mu(\Pi)(x) := \Pi[k(\cdot, x)]. \qquad (3.4)$$
The kernel mean is an element of the RKHS $\mathcal{H}_k$ as a consequence of $\int_{\mathcal{X}} k(x, x)\, \Pi(dx) < \infty$ [Smola et al., 2007]. The kernel mean is often also called the representer of integration, which is justified by the fact that for all $f \in \mathcal{H}_k$:

$$\Pi[f] = \int_{\mathcal{X}} f(x)\, \Pi(dx) = \int_{\mathcal{X}} \langle f, k(\cdot, x) \rangle_{\mathcal{H}_k}\, \Pi(dx) = \Big\langle f, \int_{\mathcal{X}} k(\cdot, x)\, \Pi(dx) \Big\rangle_{\mathcal{H}_k} = \langle f, \mu(\Pi) \rangle_{\mathcal{H}_k},$$

since the integral and inner product commute due to the existence of $\mu(\Pi)$ as a Bochner integral [Steinwart and Christmann, 2008, p. 510]. The main reason that RKHSs are popular in the study of quadrature rules is the reproducing property, which permits an elegant theoretical analysis in which many quantities of interest, such as worst-case and average-case errors, become tractable. In the language of kernel means, quadrature rules of the form $\hat{\Pi}[f] = \sum_{i=1}^{n} w_i f(x_i)$ can be written as $\hat{\Pi}[f] = \langle f, \mu(\hat{\Pi}) \rangle_{\mathcal{H}_k}$, where $\mu(\hat{\Pi})$ is the approximation to the kernel mean given by $\mu(\hat{\Pi})(x) = \hat{\Pi}[k(\cdot, x)]$ (equivalently, it is the kernel mean with respect to the empirical measure $\hat{\Pi}$). For fixed $f \in \mathcal{H}_k$, the integration error associated with $\hat{\Pi}[f]$ can be expressed as
$$\hat{\Pi}[f] - \Pi[f] = \langle f, \mu(\hat{\Pi}) \rangle_{\mathcal{H}_k} - \langle f, \mu(\Pi) \rangle_{\mathcal{H}_k} = \langle f, \mu(\hat{\Pi}) - \mu(\Pi) \rangle_{\mathcal{H}_k}.$$
A tight upper bound for the error is obtained by the Cauchy-Schwarz inequality:
$$\big| \hat{\Pi}[f] - \Pi[f] \big| \le \|f\|_{\mathcal{H}_k}\, \|\mu(\hat{\Pi}) - \mu(\Pi)\|_{\mathcal{H}_k}. \qquad (3.5)$$
The expression above, sometimes called the Koksma-Hlawka inequality [Hickernell, 1998], decouples the magnitude in $\mathcal{H}_k$ of the integrand $f$ from the kernel mean approximation error. The first term in this bound is a constant over which we have no control, since it depends on the integrand $f$. However, since the second term does not depend on $f$, it is common to design quadrature rules to minimise it, as this leads to an integration error which is small for all functions in $\mathcal{H}_k$. The following sections discuss how quadrature rules can be tailored to target this term.
3.2.3 Optimality of Bayesian Quadrature Weights
An interesting and well-known fact is that the worst-case error (WCE) in the RKHS $\mathcal{H}_k$ is characterised as the error in estimating the kernel mean, also called the maximum mean discrepancy (MMD) [Gretton et al., 2006]:
Proposition 2 (The WCE in an RKHS corresponds to the MMD).

$$e(\hat{\Pi}; \Pi, \mathcal{H}_k) := \sup_{\|f\|_{\mathcal{H}_k} \le 1} \big| \Pi[f] - \hat{\Pi}[f] \big| = \|\mu(\hat{\Pi}) - \mu(\Pi)\|_{\mathcal{H}_k}.$$
Minimisation of the WCE in $\mathcal{H}_k$ is natural and corresponds to solving a least-squares problem in the feature space induced by the kernel. Let $X$ denote the quadrature points $\{x_i\}_{i=1}^n$, let $w = (w_1, \ldots, w_n)^\top \in \mathbb{R}^n$ denote the vector of quadrature weights, let $z \in \mathbb{R}^n$ be the vector with entries $z_i = \mu(\Pi)(x_i)$, and let $K \in \mathbb{R}^{n \times n}$ be the matrix with entries $(K)_{i,j} = k(x_i, x_j)$. Combining Proposition 2 with a direct calculation gives a tractable formula for the WCE in $\mathcal{H}_k$:
$$e(\hat{\Pi}; \Pi, \mathcal{H}_k)^2 = \|\mu(\hat{\Pi}) - \mu(\Pi)\|_{\mathcal{H}_k}^2 = \sum_{i,j=1}^{n} w_i w_j k(x_i, x_j) - 2 \sum_{i=1}^{n} w_i \int_{\mathcal{X}} k(x, x_i)\, \Pi(dx) + \int_{\mathcal{X}} \int_{\mathcal{X}} k(x, x')\, \Pi(dx) \Pi(dx') \qquad (3.6)$$
$$= w^\top K w - 2 w^\top \Pi[k(X, \cdot)] + \Pi\Pi[k(\cdot, \cdot)]. \qquad (3.7)$$
Several optimality properties for integration in RKHSs are provided in Section 4.2 of Novak and Woźniakowski [2008]. Relevant to this work is that, given n evaluations of the function, an optimal estimate in the sense of the WCE in $\mathcal{H}_k$ can, without loss of generality, take the form of a quadrature rule [Bakhvalov, 1971]. More precisely, any non-linear and/or adaptive estimator (where the locations of function evaluations are chosen adaptively) can be matched in terms of asymptotic WCE in $\mathcal{H}_k$ by a quadrature rule as we have defined it. Note that adaptive quadrature may of course provide superior performance for a single fixed function f, and the minimax result may not hold in general outside the RKHS framework [Novak, 1996]. To relate these ideas to BQ, consider the challenge of deriving an optimal quadrature rule, conditional on fixed states $\{x_i\}_{i=1}^n$, that minimises the WCE in the RKHS $\mathcal{H}_k$ over the weights w. The solution to this convex problem is $w = K^{-1} z$ and is called kernel quadrature in the literature. Clearly, if the reproducing kernel k is equal to the covariance function c of the GP prior, then the posterior mean from BQ is identical to the optimal quadrature rule in the RKHS [Kadane and Wasilkowski, 1985]. Furthermore, with k = c, the BQ posterior variance can be expressed in terms of the WCE; in fact the identity $\mathbb{V}[\Pi[g_n]] = e(\hat{\Pi}_{BQ}; \Pi, \mathcal{H}_k)^2$ holds. Regarding optimality, the problem is thus reduced to the selection of the states $\{x_i\}_{i=1}^n$.
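Equation 3.7 makes the WCE directly computable whenever the kernel mean is tractable, and so allows different weightings to be compared. A small sketch of our own, using the tractable pair $k(x, x') = \min(x, x')$ and $\Pi = U[0, 1]$, for which $\Pi[k(\cdot, x)] = x - x^2/2$ and $\Pi\Pi[k(\cdot, \cdot)] = 1/3$ by direct integration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(size=20)                 # arbitrary states in [0, 1]
K = np.minimum(x[:, None], x[None, :])   # Gram matrix for k = min
z = x - x**2 / 2                         # mu(Pi)(x_i)
pipik = 1.0 / 3.0                        # PiPi[k] = int int min(x, x') dx dx'

def wce_sq(w):
    """Worst-case error squared, Eq. (3.7)."""
    return w @ K @ w - 2.0 * w @ z + pipik

w_mc = np.full(20, 1.0 / 20.0)           # uniform (MC) weights
w_bq = np.linalg.solve(K, z)             # optimal weights w = K^{-1} z
```

`wce_sq(w_bq)` is non-negative and never exceeds `wce_sq(w_mc)`, illustrating the optimality of the kernel/BQ weights for fixed states.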
3.2.4 Selection of States
Optimal Point Sets An optimal BQ rule would select states to globally minimise the variance $\mathbb{V}[\Pi[g_n]]$, or equivalently the WCE in $\mathcal{H}_k$:

$$\{x_i^{OBQ}\}_{i=1}^{n} := \operatorname*{arg\,min}_{\{x_i\}_{i=1}^n \subset \mathcal{X}}\; e(\hat{\Pi}_{BQ}; \Pi, \mathcal{H}_k).$$
Optimal BQ corresponds to classical quadrature rules (e.g. Gauss-Hermite) for specific choices of kernels [Karvonen and Särkkä, 2017]. In general, however, it cannot be implemented because optimising the states is NP-hard [Schölkopf and Smola, 2002, Section 10.2.3]. In earlier work, several approaches to the choice of quadrature points have been proposed. For example, O'Hagan [1991] considered states $\{x_i\}_{i=1}^n$ that are employed in Gaussian quadrature methods. Rasmussen and Ghahramani [2002] used MC realisations. Recent work by Gunter et al. [2014]; Briol et al. [2015a] selected states using experimental design to target the variance $\mathbb{V}[\Pi[g_n]]$. These different approaches are now briefly recalled.
Monte Carlo Methods MC, IS, MCMC and QMC (all introduced in Chapter 1) are widely used in statistical computation. Here we pursue the idea of using these algorithms to generate states for BQ, with the aim of exploiting BQ to account for the possible impact of numerical integration error on inferences made in statistical applications. In MCMC it is possible that two states $x_i = x_j$ are identical. To prevent the covariance matrix C from becoming singular, duplicate states should be discarded; this is justified since the information contained in the function evaluations $f_i = f_j$ is not lost, and, in contrast to MC methods, it does not introduce additional bias into BQ methods. We define the following Bayesian estimators, which correspond to BQ algorithms where the integrand is conditioned at MC, IS, MCMC and QMC states:
$$\hat{\Pi}_{BMC}[f] := \sum_{i=1}^{n} w_i^{BQ} f(x_i^{MC}), \qquad \hat{\Pi}_{BIS}[f] := \sum_{i=1}^{n} w_i^{BQ} f(x_i^{IS}),$$
$$\hat{\Pi}_{BMCMC}[f] := \sum_{i=1}^{n} w_i^{BQ} f(x_i^{MCMC}), \qquad \hat{\Pi}_{BQMC}[f] := \sum_{i=1}^{n} w_i^{BQ} f(x_i^{QMC}),$$

where $\{x_i^{MC}\}_{i=1}^n$ are IID realisations from $\Pi$, $\{x_i^{IS}\}_{i=1}^n$ are IID realisations from some importance distribution $\Pi'$, $\{x_i^{MCMC}\}_{i=1}^n$ are samples from a Markov chain with invariant distribution $\Pi$, and $\{x_i^{QMC}\}_{i=1}^n$ is a QMC point set.
This two-step procedure requires no modification to existing sampling methods, and has the advantage that each estimator is associated with a full posterior distribution. We refer to quadrature rules of the form $\hat{\Pi}_{BMC}[f]$ as Bayesian Monte Carlo (BMC), $\hat{\Pi}_{BIS}[f]$ as Bayesian importance sampling (BIS), $\hat{\Pi}_{BMCMC}[f]$ as Bayesian Markov chain Monte Carlo (BMCMC) and $\hat{\Pi}_{BQMC}[f]$ as Bayesian quasi-Monte Carlo (BQMC). As previously discussed, BMC and BIS were first proposed by Rasmussen and Ghahramani [2002], but to date we are not aware of any previous use of BMCMC, presumably due to analytic intractability of the kernel mean when π is unnormalised. BQMC has been described by Hickernell et al. [2005]; Marques et al. [2013]; Särkkä et al. [2016]. Note that other Monte Carlo sampling methods could also be used; for example, Briol et al. [2017] proposed to combine SMC methods with BQ weights. This will be further discussed in Chapter 4.
Experimental Design Methods An alternative approach to the choice of states comes from the experimental design literature. The simplest example is a greedy algorithm that sequentially minimises $\mathbb{V}[\Pi[g_n]]$. This method, commonly referred to as sequential Bayesian quadrature [Osborne et al., 2012; Gunter et al., 2014], consists of repeating the step $x_n^{SBQ} := \operatorname*{arg\,max}_{x \in \mathcal{X}} \Pi[c(\cdot, X)]\, C^{-1}\, \Pi[c(X, \cdot)]$, where $X = (x_1, \ldots, x_n, x)^\top$. More sophisticated optimisation algorithms have also been used. For example, Eftang and Stamm [2012] proposed adaptive procedures to iteratively divide the domain of integration into subdomains, and Briol et al. [2015a] used conditional gradient algorithms (this will be discussed in detail in Chapter 4). Several gradient-based global optimisation algorithms were also considered in Oettershagen [2017].
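A minimal version of the greedy step can be written by restricting the arg max to a finite candidate grid, so that it becomes a simple enumeration. This is our own simplification of the cited methods, again using the tractable pair $c(x, x') = \min(x, x')$ with $\Pi = U[0, 1]$:

```python
import numpy as np

cand = np.linspace(0.01, 1.0, 100)   # candidate states (x = 0 excluded: c(0,0) = 0)
mu = cand - cand**2 / 2              # kernel mean on the candidate grid
pipik = 1.0 / 3.0                    # PiPi[c] for this kernel-measure pair

def posterior_var(idx):
    """V[Pi[g_n]] for the states cand[idx]; equals the squared WCE
    at the optimal weights."""
    xs, z = cand[idx], mu[idx]
    K = np.minimum(xs[:, None], xs[None, :])
    return pipik - z @ np.linalg.solve(K, z)

chosen, variances = [], []
for _ in range(5):                   # five greedy (sequential BQ) steps
    remaining = [i for i in range(len(cand)) if i not in chosen]
    best = min(remaining, key=lambda i: posterior_var(chosen + [i]))
    chosen.append(best)
    variances.append(posterior_var(chosen))
```

`variances` is non-increasing: with optimal weights, adding a state can only shrink the posterior variance, and each greedy step picks the largest such reduction on the grid.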
3.3 Theoretical Results for Bayesian Quadrature
The role of this section is to derive convergence rates for BQ algorithms. We begin by discussing a general set of tools which can be used to derive such rates, then focus specifically on the cases of $\hat{\Pi}_{MC}[f]$, $\hat{\Pi}_{MCMC}[f]$ and $\hat{\Pi}_{QMC}[f]$. Further results for an experimental design-based BQ rule will be given in Chapter 4. The main setting we consider assumes that the true integrand f belongs to an RKHS $\mathcal{H}_k$ and that the GP prior is based on a covariance function c which is identical to the kernel k of $\mathcal{H}_k$. This assumption is of course more natural from a kernel approximation point of view (in which case the algorithm is called kernel quadrature) than from the Bayesian viewpoint. Indeed, it would be more natural in the latter case to assume that f is a realisation from a GP with covariance k; an event which has probability zero in most cases of interest [Driscoll, 1973; Lukić and Beder, 2001]. Further results which build on ours and consider prior misspecification can be found in [Kanagawa et al., 2016, 2017]; these could be used to provide theoretical guarantees under the more natural Bayesian assumptions.
3.3.1 Convergence and Contraction Rates
The convergence results in this thesis are based on two simple lemmas. The first lemma shows that probabilistic integrators provide a point estimate that is at least as good as their non-probabilistic counterparts:
Lemma 1 (Bayesian reweighting bound). Consider the quadrature rule $\hat{\Pi}[f] = \sum_{i=1}^{n} w_i f(x_i)$ and the corresponding BQ rule $\hat{\Pi}_{BQ}[f] = \sum_{i=1}^{n} w_i^{BQ} f(x_i)$. Then $e(\hat{\Pi}_{BQ}; \Pi, \mathcal{H}_k) \le e(\hat{\Pi}; \Pi, \mathcal{H}_k)$.
Proof. Since the BQ rule corresponds to the optimally weighted quadrature rule in $\mathcal{H}_k$, we must have that:

$$e(\hat{\Pi}_{BQ}; \Pi, \mathcal{H}_k)^2 = \inf_{w \in \mathbb{R}^n} \sup_{\|f\|_{\mathcal{H}_k} \le 1} \Big( \Pi[f] - \sum_{i=1}^{n} w_i f(x_i) \Big)^2 \le \sup_{\|f\|_{\mathcal{H}_k} \le 1} \big( \Pi[f] - \hat{\Pi}[f] \big)^2 = e(\hat{\Pi}; \Pi, \mathcal{H}_k)^2.$$
Clearly, whenever we have a BQ rule based on reweighting an existing quadrature rule (e.g. BMC, BIS, BMCMC or BQMC), it is straightforward to obtain an upper bound on the WCE convergence rate if a convergence rate is known for the original rule. Results based on Lemma 1 are useful in that they provide some guarantees on the performance of the method, but tend to be unsatisfying for several reasons. First, they can lead to loose upper bounds, since they do not take into account any gains from reweighting. Furthermore, since the BQ weights tend to be more expensive to compute, it is questionable whether reweighting is actually beneficial from a point-estimate point of view (there is of course still the advantage, from an uncertainty quantification point of view, of having a Bayesian estimator). The second lemma we use to derive convergence rates often leads to more satisfying results, although it is not sharp in general [Ritter, 2000, Proposition II.4]. The lemma shows that the convergence of $\hat{\Pi}_{BQ}[f]$ is controlled by the quality of the GP mean approximation $m_n$:
Lemma 2 (Regression bound). Let $f \in \mathcal{H}_k$ and fix states $\{x_i\}_{i=1}^n \subset \mathcal{X}$. Then we have $|\Pi[f] - \hat{\Pi}_{BQ}[f]| \le \|f - m_n\|_{L^2(\mathcal{X}; \Pi)}$.

Proof. Applying Jensen's inequality we get:

$$\big( \Pi[f] - \hat{\Pi}_{BQ}[f] \big)^2 = \Big( \int_{\mathcal{X}} f(x) - m_n(x)\, \Pi(dx) \Big)^2 \le \int_{\mathcal{X}} (f(x) - m_n(x))^2\, \Pi(dx) = \|f - m_n\|_{L^2(\mathcal{X}; \Pi)}^2.$$

Taking square roots gives the required result.
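Lemma 2 can be checked numerically in the Brownian-kernel example, where $m_n$ is a linear combination of kernel sections. This is our own illustration; $f(x) = x$ lies in the Brownian RKHS, $\Pi = U[0, 1]$ and $\Pi[f] = 1/2$.

```python
import numpy as np

x = np.array([0.2, 0.5, 0.9])                    # states
fx = x                                           # f(x) = x evaluated at the states
K = np.minimum(x[:, None], x[None, :])           # Brownian kernel
alpha = np.linalg.solve(K, fx)                   # m_n(.) = sum_i alpha_i min(., x_i)
z = x - x**2 / 2                                 # kernel mean
bq_estimate = z @ alpha                          # Pi-hat_BQ[f] = Pi[m_n]

grid = np.linspace(0.0, 1.0, 10001)
m_n = np.minimum(grid[:, None], x[None, :]) @ alpha
l2_err = np.sqrt(np.mean((grid - m_n) ** 2))     # ||f - m_n||_{L^2}, Pi uniform
lhs = abs(0.5 - bq_estimate)                     # |Pi[f] - Pi-hat_BQ[f]|
```

Here $m_n$ interpolates f exactly on [0, 0.9] and stays constant beyond the last state, so the integration error is 0.005 while the $L^2$ bound evaluates to about 0.018, confirming the inequality.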
Using the lemma above, we can transfer known results from the literature on approximation with kernel interpolants (or equivalently, GP means) to results for BQ rules. These results usually depend on the kernel and on space-filling properties of the selected point set within the domain $\mathcal{X}$. An important quantity used to formalise this statement is the fill distance of a point set $X = \{x_i\}_{i=1}^n$:

$$h_X := \sup_{x \in \mathcal{X}} \min_{i=1,\ldots,n} \|x - x_i\|_2. \qquad (3.8)$$
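The fill distance of Equation 3.8, together with the related separation radius and mesh ratio defined in the text, can be approximated on a dense grid. A one-dimensional sketch of our own:

```python
import numpy as np

def fill_distance(states, domain):
    """h_X of Eq. (3.8): the largest distance from any domain point to
    its nearest state, approximated on a dense grid."""
    return np.abs(domain[:, None] - states[None, :]).min(axis=1).max()

def separation_radius(states):
    """q_X: half the smallest pairwise distance between states."""
    D = np.abs(states[:, None] - states[None, :])
    return 0.5 * D[~np.eye(len(states), dtype=bool)].min()

X = np.array([0.1, 0.4, 0.5, 0.9])
grid = np.linspace(0.0, 1.0, 100001)
h_X = fill_distance(X, grid)        # 0.2, attained near x = 0.7
q_X = separation_radius(X)          # 0.05 (closest pair is 0.4, 0.5)
mesh_ratio = h_X / q_X              # about 4
```

A small mesh ratio indicates a point set that is simultaneously space-filling and well separated, which is the regime in which kernel-interpolation error bounds behave best.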
Other quantities of interest include the separation radius $q_X := \frac{1}{2} \min_{j \ne k} \|x_j - x_k\|_2$ and the mesh ratio $\rho_X := h_X / q_X$. For most sensible choices of point sets, we have $h_X \to 0$ as $n \to \infty$. For kernel interpolants, it is common to have an upper bound on the error of the form $|f(x) - m_n(x)| \le C\, v(h_X) \|f\|_{\mathcal{H}_k}$, where the role of $v$ can be compared with that of the power function in the scattered data approximation literature (see Wendland [2005, Section 11.1] for more details) and will depend on the kernel k. Such results can clearly be combined with Lemma 2 to obtain rates for BQ. We now have two results, Lemmas 1 and 2, which refer to the point estimators provided by BQ and can be used to provide convergence rates. However, we also aim to quantify the change in probability mass as the number of samples increases, and a contraction rate is therefore also of interest. Fortunately, such rates can be obtained from the convergence rates of the point estimators:
Lemma 3 (BQ contraction bound). Assume $f \in \mathcal{H}_k$ and that a GP prior with covariance k was specified. Suppose that $e(\hat{\Pi}_{BQ}; \Pi, \mathcal{H}_k) \le \gamma_n$ where $\gamma_n \to 0$ as $n \to \infty$. Let $I_\delta = [\Pi[f] - \delta, \Pi[f] + \delta]$ be an interval of radius $\delta > 0$ centred on the true value of the integral. Then

$$\mathbb{P}\{\Pi[f] - \delta < \Pi[g_n] < \Pi[f] + \delta\} = 1 - O\big( \exp(-(\delta^2/2)\, \gamma_n^{-2}) \big).$$
This result demonstrates that the posterior distribution is well-behaved: probability mass concentrates in a neighbourhood of $\Pi[f]$. Hence, if our prior is well calibrated (see Chapter 2), the posterior provides uncertainty quantification over the solution of the integral resulting from performing only a finite number n of integrand evaluations.
3.3.2 Monte Carlo, Importance Sampling and MCMC Point Sets
The three lemmas from the previous section provide us with a set of tools which can be used to analyse BQ rules based on specific point sets. In this section, we provide results for BQ rules based on points obtained through several Monte Carlo methods. All of the results in this section are stated for $\mathcal{X} = [0, 1]^d$ with $d \in \mathbb{N}$ for simplicity, although this assumption can be relaxed in all cases. As a baseline, we begin by noting a general result for BMC estimation under weak conditions on the RKHS, which is based on Lemma 1:
Theorem 8 (Consistency and contraction of BMC for functions in RKHSs with bounded kernel). Let $\mathcal{X} = [0, 1]^d$, $d \in \mathbb{N}$, and let $\mathcal{H}_k$ be an RKHS satisfying $\sup_{x \in \mathcal{X}} k(x, x) < \infty$. Then: $e(\hat{\Pi}_{BMC}; \Pi, \mathcal{H}_k) = O_P(n^{-\frac{1}{2}})$. Furthermore, if $f \in \mathcal{H}_k$ and $\delta > 0$:

$$\mathbb{P}\{\Pi[f] - \delta < \Pi[g_n] < \Pi[f] + \delta\} = 1 - O_P\big( \exp(-C_\delta\, n^{\frac{1}{d}}) \big),$$

where $C_\delta > 0$ depends on $\delta$.
In fact, similar results can also be obtained for IS and MCMC. As previously discussed, results based on Lemma 1 can be far from tight. This is clearly highlighted by the result in the theorem below, obtained using Lemma 2. These results will assume that our RKHS is norm-equivalent to $\mathcal{H}^\alpha$, a classical Sobolev space of order $\alpha$. We will need one of the following conditions on the point sets:
(A1) The states are generated IID from the measure Π, which is assumed to have a density bounded away from zero on X .
(A2) The states are generated IID from some importance measure Π0, which is assumed to have a density bounded away from zero on X .
(A3) The states are generated by a reversible, uniformly ergodic Markov chain that targets the measure Π, which is assumed to have a density bounded away from zero on X .
Furthermore, a minor technical assumption that enables us to simplify the presentation of the results below is that the set $X = \{x_i\}_{i=1}^n$ may be augmented with a finite, predetermined set $Y = \{y_i\}_{i=1}^m$, where m does not increase with n. Clearly this has no bearing on asymptotics.
Theorem 9 (Consistency and contraction of BMC, BIS and BMCMC in $\mathcal{H}^\alpha$). Let $\mathcal{X} = [0, 1]^d$ and let $\mathcal{H}_k$ be norm-equivalent to $\mathcal{H}^\alpha$ where $\alpha > d/2$, $\alpha \in \mathbb{N}$. Suppose $\hat{\Pi}_{BQ}[f]$ is a BQ rule with point set satisfying (A1), (A2) or (A3). Then:

$$e(\hat{\Pi}_{BQ}; \Pi, \mathcal{H}_k) = O_P\big( n^{-\alpha/d + \epsilon} \big)$$

for all $\epsilon > 0$ arbitrarily small. Furthermore, if $f \in \mathcal{H}_k$ and $\delta > 0$,

$$\mathbb{P}\{\Pi[f] - \delta < \Pi[g_n] < \Pi[f] + \delta\} = 1 - O_P\Big( \exp\big(-C_\delta\, n^{2(\frac{\alpha}{d} - \epsilon)}\big) \Big),$$

where $C_\delta > 0$ depends on $\delta$ and $\epsilon > 0$ is arbitrarily small.
The use of $\epsilon$ is common in the literature on quadrature rules in order to hide $O(\log n)$ terms. The convergence rate improves with smoothness, but it suffers from a curse of dimensionality. When the assumption that $\alpha > d/2$ does not hold, a root-n rate can still be recovered from Theorem 8. The Matérn kernel leads to function spaces which are norm-equivalent to the Sobolev spaces $\mathcal{H}^\alpha$, so the result above provides a bound for some of the most common BQ rules. A lower bound for the WCE of randomised algorithms in $\mathcal{H}^\alpha$ in this setting is $O_P(n^{-\alpha/d - 1/2})$ [Novak and Woźniakowski, 2010]. Thus our result shows that the point estimate is at most one MC rate away from being optimal. The control variate trick of Bakhvalov [2015] can be used to achieve the optimal randomised WCE, but this steps outside of the Bayesian framework. Bach [2017] obtained a similar result for fixed n and a specific importance sampling distribution. However, this importance sampling distribution is difficult to sample from in general, and his analysis does not directly imply our asymptotic results, nor vice versa. After completion of this work, similar results appeared in Oettershagen [2017]; Bauer et al. [2017]; Kanagawa et al. [2017]. A slight extension of Theorem 9 shows that certain infinitely differentiable kernels lead to exponential rates. These include the Gaussian RBF kernel introduced in Chapter 2, as well as the multiquadric kernel $k(x, y) = (c^2 + \|x - y\|_2^2)^{\frac{1}{2}}$ and the inverse-multiquadric kernel $k(x, y) = (c^2 + \|x - y\|_2^2)^{-\frac{1}{2}}$ for $c > 0$.

Theorem 10 (Consistency and contraction of BMC, BIS and BMCMC in RKHSs with infinitely differentiable kernels). Let $\mathcal{X} = [0, 1]^d$, and assume
that $\mathcal{H}_k$ is an RKHS which is norm-equivalent to the RKHS of the Gaussian RBF kernel, the multiquadric kernel or the inverse-multiquadric kernel. Suppose $\hat{\Pi}_{BQ}[f]$ is a BQ rule with states satisfying either (A1), (A2) or (A3). Then: