Topics in Statistics and Convex Geometry: Rounding, Sampling, and Interpolation
Total Page:16
File Type:pdf, Size:1020Kb
©Copyright 2018 Adam M. Gustafson Topics in Statistics and Convex Geometry: Rounding, Sampling, and Interpolation Adam M. Gustafson A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Washington 2018 Reading Committee: Hariharan Narayanan, Chair Michael D. Perlman Sébastien Bubeck Program Authorized to Offer Degree: Department of Statistics University of Washington Abstract Topics in Statistics and Convex Geometry: Rounding, Sampling, and Interpolation Adam M. Gustafson Chair of the Supervisory Committee: Title of Chair Hariharan Narayanan Department of Statistics We consider a few aspects of the interplay between convex geometry and statistics. We consider three problems of interest: how to bring a convex body specified by a self-concordant barrier into a suitably “rounded” position using an affine-invariant random walk; how to design a rapidly-mixing affine-invariant random walk with maximal volume ellipsoids; and how to perform interpolation in multiple dimensions given noisy observations of the original function on a finite set. We begin with an overview of some background information on convex bodies which recur in this dissertation, discussing polytopes, the Dikin ellipsoid, and John’s ellipsoid in particular. We also discuss cutting plane methods and Vaidya’s algorithm, which we employ in subsequent analysis. We then review Markov chains on general state spaces to motivate designing rapidly mixing geometric random walks, and the means by which the mixing time may be analyzed. Using these results regarding convex bodies and general state space Markov chains, along with recently developed concentration inequalities on general state space Markov chains, we employ an affine-invariant random walk using Dikin ellipsoids to provably bring aconvex body specified by a self-concordant barrier into an approximately “rounded” position. We also design a random walk using John’s ellipsoids and derive its mixing time. Departing somewhat from these themes, we also discuss regression in multiple dimensions over classes of continuously differentiable functions. We provide a cutting plane algorithm for the case of first-order differentiable functions with Lipschitz gradients. We additionally consider the more general case of higher-order continuously differentiable functions, proving Whitney’s extension theorem for this case, and outlining a quadratic program to perform regression for this setting. TABLE OF CONTENTS Page List of Figures ....................................... iii Chapter 1: Introduction ................................ 1 Chapter 2: Some Properties of Convex Geometry and Convex Optimization .... 4 2.1 Convex Sets and Convex Functions ....................... 4 2.2 Dikin Ellipsoids .................................. 10 2.3 John’s Maximal Volume Ellipsoid Theorem ................... 13 2.4 Cutting Plane Methods .............................. 16 Chapter 3: Markov Chains on General State Spaces ................. 22 3.1 Definition ..................................... 22 3.2 Stationarity, Reversibility, and Laziness ..................... 23 3.3 The Metropolis Filter ............................... 25 3.4 Convergence and Conductance .......................... 26 3.5 Conditional Expectation as a Bounded Linear Operator ............ 29 3.6 Ergodicity, the Spectral Gap, and Mixing Times ................ 31 Chapter 4: Rounding a Convex Body Specified by a Self-Concordant Barrier ... 34 4.1 Introduction .................................... 34 4.2 Isotropic Position ................................. 34 4.3 Sampling with the Dikin Walk .......................... 37 4.4 Moment Bounds for Log-Concave Densities in Isotropic Position ....... 39 4.5 Rounding the Body ................................ 40 4.6 Simulation ..................................... 47 4.7 Discussion ..................................... 56 i Chapter 5: An Affine-Invariant Random Walk Using John’s Maximal Volume Ellipsoid 57 5.1 Introduction .................................... 57 5.2 John’s Walk .................................... 59 5.3 Analysis of Mixing Time ............................. 62 5.4 Approximate John’s Ellipsoids .......................... 77 5.5 Discussion ..................................... 82 Chapter 6: Regression for Differentiable Functions with Lipschitz Gradients in Mul- tiple Dimensions .............................. 84 6.1 Introduction .................................... 84 6.2 Regression ..................................... 90 6.3 Simulation ..................................... 102 6.4 Discussion ..................................... 106 Chapter 7: Regression for Higher-Order Differentiable Functions in Multiple Di- mensions .................................. 107 7.1 The Function and Jet Interpolation Problems ................. 107 7.2 Whitney’s Extension Theorem .......................... 109 7.3 Function Interpolation and Regression ..................... 117 Bibliography ........................................ 120 Appendix A: Proof of John’s Maximal Volume Ellipsoid Theorem .......... 126 A.1 Polars of Sets ................................... 126 A.2 Proof of John’s Theorem ............................. 129 Appendix B: The Spectrum of Bounded Linear Operators ............... 133 ii LIST OF FIGURES Figure Number Page 4.1 Mean errors for n = 10. ............................. 50 4.2 Covariance errors for n = 10. .......................... 51 4.3 Mean errors for n = 20. ............................. 52 4.4 Covariance errors for n = 20. .......................... 53 4.5 Mean errors for n = 40. ............................. 54 4.6 Covariance errors for n = 40. .......................... 55 6.1 Generalization error as a function of jEj = n .................. 104 6.2 Interpolants fb in the noisy and noiseless scenarios ............... 105 7.1 CZ Decompositions for d = 2 and n = 2; 4; 8; 16. ................ 113 iii ACKNOWLEDGMENTS The author wishes to express his sincere appreciation for the patience and guidance of Hariharan Narayanan, his research advisor, and Michael D. Perlman, his academic advisor. He would also like to thank Jon Wakefield for his wisdom and advice. iv DEDICATION To my father, David, in loving memory, for pushing me to achieve. To my mother, Deborah, for her unending love and support. To my sister, Danielle, for her compassion. To my step-father, Stephen, for his encouragement. To my friend, Kumi, for always lifting my spirits. v 1 Chapter 1 INTRODUCTION This dissertation considers a few aspects of the interplay between convex geometry and statistics. We consider three problems of interest: how to bring a convex body specified by a self-concordant barrier into a suitably “rounded” position using an affine-invariant random walk; how to design a rapidly-mixing affine-invariant random walk with maximal volume ellipsoids; and how to perform interpolation in multiple dimensions given noisy observations of the original function on a finite set. Chapter 2 introduces background information on convex bodies that recur in this disser- tation. We give a basic overview some convex bodies, such as polytopes and ellipsoids. We discuss a few manners in which convex bodies may be specified, and we also discuss cutting- plane methods for minimizing convex functions using one of these specifications. We finally discuss two types of ellipsoids in detail, namely the Dikin ellipsoid and John’s ellipsoid, and the means by which they provide a suitable approximation for certain convex bodies. Similarly, Chapter 3 introduces background information on Markov chains which recur in this dissertation. Geometric random walks provide the means by which approximately uniform random samples from such convex bodies may be taken, a necessary step in bringing a convex body into a rounded position. This chapter discusses how to construct a random walk with a specified stationary distribution on a convex body, and basic tools for analyzing the mixing time of such random walks. Chapter 4 provides an algorithm to bring a convex body into a suitably rounded position via an affine transformation. The algorithm makes use of samples drawn from anaffine- invariant random walk known as the Dikin walk, which uses samples drawn from Gaussian proposal distributions with covariances specified by scaled Dikin ellipsoids. The proof that 2 this algorithm works rely on recent concentration inequalities for general state space Markov chains. Chapter 5 proposes another affine-invariant random walk known as John’s walk, which utilizes the uniform distribution on scaled John’s ellipsoids to draw samples. The mixing time of this random walk is derived, and an algorithm to provide suitably approximate John’s ellipsoids using cutting-plane methods are Departing somewhat from the theme of the previous chapters, we discuss interpolation in multiple dimensions for the class of first-order differentiable functions with Lipschitz gra- dients in Chapter 6. We discuss noiseless and noisy variants of the problem, the latter which refer to as the regression problem. A cutting-plane method for solving this regression problem is discussed. Sample complexity arguments yielding the choice of the complexity parameter for the regression problem are given. Empirical results of the methods of this chapter are provided. Finally, in Chapter 7 we conclude with describing an algorithm to extend the methods of Chapter 6 to the class of functions whose m-th order derivatives are continuous. We give a Calderón-Zygmund decomposition