<<

Continuous-Time Models of Arrival Times and Optimization Methods for Variable Selection

by

Michael Lindon

Department of Statistical Science Duke University

Date: Approved:

Surya T. Tokdar, Supervisor

Merlise A. Clyde

Mike West

Jennifer M. Groh

Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Statistical Science in the Graduate School of Duke University, 2018


An abstract of a dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Statistical Science in the Graduate School of Duke University, 2018

Copyright © 2018 by Michael Lindon
All rights reserved except the rights granted by the Creative Commons Attribution-Noncommercial Licence

Abstract

This thesis naturally divides itself into two sections. The first two chapters concern the development of Bayesian semi-parametric models for arrival times. Chapter 2 considers Bayesian inference for a Gaussian process modulated temporal inhomogeneous Poisson process, made challenging by an intractable likelihood. The intractable likelihood is circumvented by two novel data augmentation strategies which result in Gaussian measurements of the Gaussian process, connecting the model with a larger literature on modelling time-dependent functions, from Bayesian non-parametric regression to time series. A scalable state-space representation of the Matérn Gaussian process in one dimension is used to provide access to linear-time filtering algorithms for performing inference. An MCMC algorithm based on Gibbs sampling with slice-sampling steps is provided and illustrated on simulated and real datasets. The MCMC algorithm exhibits excellent mixing and scalability.

Chapter 3 builds on the previous model to detect specific signals in temporal point patterns arising in neuroscience. The firing of a neuron over time in response to an external stimulus generates a temporal point pattern, or "spike train". Of special interest is how neurons encode information from dual simultaneous external stimuli. Among many hypotheses is the presence of multiplexing: interleaving periods of firing as it would for each individual stimulus in isolation. Statistical models are developed to quantify evidence for a variety of experimental hypotheses. Each experimental hypothesis translates to a particular form of intensity function for the dual stimuli

trials. The dual-stimuli intensity is modelled as a dynamic superposition of single-stimulus intensities, defined by a time-dependent weight function that is modelled non-parametrically as a transformed Gaussian process. Experiments on simulated data demonstrate that the model is able to learn the weight function very well, but other model parameters which have meaningful physical interpretations less well.

Chapters 4 and 5 concern mathematical optimization and theoretical properties of Bayesian models for variable selection. Such optimizations are challenging due to non-convexity, non-smoothness and discontinuity of the objective. Chapter 4 presents advances in continuous optimization algorithms based on relating mathematical and statistical approaches defined in connection with several iterative algorithms for penalized linear regression. I demonstrate the equivalence of parameter mappings using EM under several data augmentation strategies: location-mixture representations, orthogonal data augmentation and LQ design matrix decompositions. I show that these model-based approaches are equivalent to algorithmic derivation via proximal gradient methods. This provides new perspectives on model-based and algorithmic approaches, connects across several research themes in optimization and statistics, and provides access, beyond EM, to relevant theory from the proximal gradient and convex analysis literatures.

Chapter 5 presents a modern and technologically up-to-date approach to discrete optimization for variable selection models through their formulation as mixed integer programming models. Mixed integer quadratic and quadratically constrained programs are developed for the point-mass-Laplace prior and the g-prior.
Combined with warm starts and optimality-based bounds tightening procedures provided by the heuristics of the previous chapter, the MIQP model developed for the point-mass-Laplace prior converges to global optimality in a matter of seconds for moderately sized real datasets. The obtained estimator is demonstrated to possess superior predictive performance over that obtained by cross-validated lasso in a number of real datasets.

The MIQCP model for the g-prior struggles to match the performance of the former and highlights the fact that the performance of the mixed integer solver depends critically on the ability of the prior to rapidly concentrate posterior mass on good models.

To my family and friends

Contents

Abstract iv

List of Figures xii

List of Abbreviations and Symbols xiv

Acknowledgements xvi

1 Introduction 1

2 A Continuous-Time Model of Arrival Times 7

2.1 The Inhomogeneous Poisson Point Process...... 7

2.1.1 The Likelihood Function...... 7

2.1.2 Thinning and Superposition...... 8

2.2 The Gaussian Process Modulated Inhomogeneous Poisson Point Process...... 10

2.2.1 Gaussian Process Intensity Functions...... 10

2.2.2 Data Augmentation Strategies...... 11

2.2.3 State-Space Representation of Gaussian Processes...... 16

2.2.4 Forward Filtering Backward Sampling...... 19

2.3 Full Model Specification...... 23

2.4 Gibbs Sampling...... 24

2.5 Illustrations...... 27

2.5.1 Simulated Data...... 27

2.5.2 Coal Mining Disasters...... 27

2.6 Discussion...... 28

2.7 A Novel Implementation of Gaussian Processes...... 30

2.7.1 Motivation...... 30

2.7.2 Conditional Distributions Under State-Space Construction.. 32

2.7.3 The Persistent AVL-Tree...... 34

2.7.4 Example Implementation...... 35

3 Detecting Multiplexing in Neuronal Spike Trains 38

3.1 Statistical Model for Multiplexing...... 40

3.2 Data Augmentation...... 42

3.3 Complete Model Description...... 44

3.4 The Euler-Maruyama Approximation...... 46

3.4.1 Forward Filtering...... 47

3.4.2 Backwards Sampling...... 48

3.4.3 Stability...... 48

3.5 Gibbs Sampling...... 49

3.5.1 Steps for Single Trial Specific Parameters...... 49

3.5.2 Steps for Augmented Realizations...... 50

3.5.3 Steps for Dual Trial Specific Parameters...... 51

3.6 Model Performance on Simulated Data...... 53

3.7 A Simplified Model...... 56

3.8 Cell YKIC092311 1 609Hz 24° 742Hz −6° ...... 58

4 Continuous Optimization for Variable Selection 61

4.1 Spike and Slab Priors...... 61

4.2 Theory Based on Compatibility Condition...... 62

4.3 Theory Based on η-Null Consistency Condition...... 70

4.4 Continuous Optimization Methods...... 73

4.5 Introduction...... 73

4.6 Preliminaries...... 76

4.6.1 MM algorithm...... 76

4.6.2 EM algorithm...... 77

4.6.3 Lipschitz-Gradient Functions...... 78

4.7 Algorithm Comparisons...... 79

4.7.1 Proximal Gradient Method...... 79

4.7.2 Orthogonal Data Augmentation...... 80

4.7.3 Location Mixtures of Spherical Normals...... 82

4.7.4 LQ-decompositions of the Design Matrix...... 84

4.8 Comparative Discussion...... 85

4.9 Experimentation for Non-Convex Problems...... 86

5 Discrete Optimization for Variable Selection 91

5.1 Discrete Optimization...... 91

5.2 Branch and Bound...... 94

5.3 Mixed Integer Programming...... 99

5.3.1 Introduction to Solving Mixed Integer Programs...... 100

5.4 A MIQP for Best Subset Selection...... 103

5.4.1 Big-M Transformation...... 104

5.4.2 Monotonicity vs Continuous Relaxation...... 105

5.5 Optimality Based Bounds Tightening...... 106

5.6 Mixed Integer Models for Point Mass-Laplace Mixture Priors..... 107

5.6.1 Example: Diabetes Dataset...... 109

5.6.2 Example: Ozone Dataset...... 114

5.7 Mixed Integer Models for Zellner’s G-Prior...... 115

5.7.1 Parameter Expansion...... 117

5.7.2 Piecewise Linear Approximations...... 121

5.7.3 Model Space Priors...... 123

5.7.4 Specially Ordered Sets of Type I/II...... 124

5.7.5 Heuristics...... 126

5.7.6 Bound Tightening...... 127

5.7.7 Example: Diabetes Dataset...... 128

6 Concluding Remarks 133

A Figures 139

Bibliography 144

Biography 152

List of Figures

2.1 Thinning a homogeneous Poisson point process...... 9

2.2 Posterior summary of intensity function for simulated dataset I with true intensity $\lambda(t) = 400\sigma(-1 + 3\sin(2\pi 3t))$ ...... 29

2.3 Posterior summary of intensity function for coal mining disasters dataset 29

3.1 Posterior summaries of hyperparameters for simulated dataset I... 55

3.2 Posterior summaries of dual trial specific parameters for simulated dataset I...... 56

3.3 Posterior summaries of dual trial specific parameters for simulated dataset II...... 57

3.4 Posterior summaries for cell YKIC092311 1 609Hz 24° 742Hz −6° ... 59

3.5 Raw data for cell YKIC092311 1 609Hz 24° 742Hz −6°...... 60

4.1 Proximal functions for various penalties...... 87

4.2 Performance of rcd, gcd and proximal gradient w.r.t. ccd on dataset 1 89

4.3 Performance of rcd, gcd and proximal gradient w.r.t. ccd on dataset 2 89

4.4 Performance of rcd, gcd and proximal gradient w.r.t. ccd on dataset 3 90

5.1 An illustration of monotonicity-based branch and bound...... 97

5.2 Effect of tighter coefficient bounds on progress of optimality gap... 108

5.3 Predictor correlations in diabetes dataset...... 109

5.4 Coefficient bounds obtained through optimality-based bounds tightening of model (5.9) for diabetes dataset...... 112

5.5 Estimated coefficients resulting from point-mass-Laplace mixture prior and cross-validated lasso for diabetes dataset...... 112

5.6 Coefficient bounds obtained through optimality-based bounds tightening of model (5.6) for diabetes dataset...... 114

5.7 Coefficient bounds obtained through optimality-based bounds tightening of model (5.9) for ozone dataset...... 115

5.8 Predictor correlations in Ozone dataset...... 116

5.9 Estimated coefficients resulting from point-mass-Laplace mixture prior and cross-validated lasso for ozone dataset...... 117

5.10 Piecewise linear approximation and polyhedral relaxation of logarithmic term...... 120

5.11 Coefficient bounds obtained through optimality-based bounds tightening of model (5.32) for diabetes dataset...... 130

5.12 Progress of the optimality gap for model (5.32) on diabetes dataset. 130

A.1 Additional posterior summaries of simulated dataset I...... 140

A.2 Additional posterior summaries of coal mining disasters dataset... 141

A.3 Posterior summaries of A-trial intensity parameters for cell YKIC092311 1 609Hz 24° 742Hz −6° ...... 142

A.4 Posterior summaries of B-trial intensity parameters for cell YKIC092311 1 609Hz 24° 742Hz −6° ...... 143

List of Abbreviations and Symbols

Symbols


N(T) Number of time points in interval T.

|S| Cardinality of set S.

D[a, b] A distribution D truncated to the interval [a, b].

N(µ, σ²) A Normal distribution with mean µ and variance σ².

PG(b, c) A Polya-Gamma distribution with parameters b > 0 and c ∈ ℝ.

PPP(λ, D) Inhomogeneous Poisson point process with intensity λ and mark distribution D (optional).

Ga(a, b) A gamma distribution with shape a and rate b.

DE(λ) A double-exponential (Laplace) distribution with rate λ.

B(a, b) A Beta distribution with shapes a and b.

Bern(θ) A Bernoulli distribution with success probability θ.

IG(a, b) Inverse gamma distribution with shape a and scale b.

Dir(α₁, . . . , α_k) A Dirichlet distribution with concentration parameters αᵢ > 0.

Abbreviations


MCMC Markov chain Monte Carlo.

GP Gaussian process.

PPP Poisson point process.

ccd Cyclic coordinate descent.

gcd Greedy coordinate descent.

rcd Randomized coordinate descent.

BnB Branch and bound.

MIP Mixed integer program/programming.

MIQP Mixed integer quadratic program/programming.

MIQCP Mixed integer quadratically constrained program/programming.

MINLP Mixed integer non-linear program/programming.

Acknowledgements

I would like to thank each of my committee members for their friendly encouragement and patience while I completed my PhD. I have had the pleasure of working with each of them, and each has distinctly supported and shaped my progress, for which I am very grateful. I hope that they will enjoy reading this thesis. In addition to my committee members, I would like to acknowledge the support provided by other students in the PhD program, in particular my friend Kaoru Irie, with whom I have enjoyed many a helpful conversation about research. I am thankful to the Duke Graduate School for providing me with a James B. Duke fellowship, to the BEST foundation for the BEST award, which provided me with funding over the summer of 2017, and to SAMSI for funding me as a year-long research assistant in their 2016-2017 program on optimization. Coming to Duke has opened up many doors for me and I am thankful to those who brought me here.

1

Introduction

Chapters 2 and 3, which construct continuous time Bayesian semi-parametric models for arrival times and temporal point patterns, can be read completely in isolation from chapters 4 and 5, which deal with mathematical optimization of problems arising in variable selection. The reader should feel free to start with the half that most interests them.

• Chapter 2: “A Continuous-Time Model of Arrival Times”.

This chapter develops a Bayesian semi-parametric model of arrival times or temporal point patterns. The observed times are modelled as a point process in one dimension: a realization from an inhomogeneous Poisson point process. Inference in these models is focussed on learning the intensity function, from which all properties of the Poisson point process are derived. The intensity function is modelled nonparametrically as a transformed Gaussian process, which provides support over a rich family of possible intensity functions. The challenge with such "doubly stochastic" Poisson process models is that the resulting likelihood is intractable: it contains an integral over the function being modelled as a

1 Gaussian process. Typical approaches to this challenge are through discretiza- tion, using an approximate likelihood resulting from a discrete approximation to the integral. This requires an undesirable arbitrary choice of discretization, something which has the potential to introduce bias into the analysis. Similar approaches bin the data into a partition of the time over which the process is observed, proceeding to then model the data as counts. This also requires an arbitrary choice of partition, which may also introduce bias into the anal- ysis. This is a larger issue in chapter 3, where choosing a partition of time may obscure temporal signals in the point pattern. A recent alternative to discretization is a data augmentation approach based on thinning, augment- ing an unobserved realization from an additional inhomogeneous Poisson point process which can be regarded as a byproduct of the data generating process of the observed point pattern. The likelihood under the complete data contains only a tractable integral over a homogeneous Poisson point process. Previous works developing this approach were lacking attractive MCMC algorithms for performing inference, requiring many tuning parameters and still complicated proposals for the intensity function. The attractiveness of the model proposed in this chapter is perhaps its simplicity. In addition to augmenting the thinned realization, each realization is elevated to a marked inhomogeneous Poisson point process, where each point in addition to a time has an associated Gaus- sian . The latter is related to probit and Polya-gamma data augmentation strategies, reducing the problem to essentially having Gaussian distributed measurements of the intensity function over time. 
This is attractive because it transforms an unfamiliar problem into a familiar problem, namely, a problem where one has Gaussian distributed measurements of a function over time, which has received a lot of attention in the nonparametric regression and time series literature. This enables many possibilities in modelling the temporal dependence of the intensity function. The approach adopted in this chapter is to model the intensity function via a Gaussian process that admits an exact state-space representation in one dimension by additionally modelling first and second derivatives. This allows inference to be performed in linear time by using filtering algorithms that exploit the Markov conditional independence structure of the state-space representation.

• Chapter 3: “Detecting Multiplexing in Neuronal Spike Trains”.

This chapter leverages the scalability and flexibility of the previously developed model for detecting time-dependent signals in temporal point patterns arising in neuroscience. The experimental goal is to better understand how neurons encode information from multiple simultaneous external stimuli, behaviour that is currently poorly understood. By recording the firing times of a neuron presented with a single stimulus and then comparing these with the firing times of the same neuron presented with multiple stimuli, it is hoped that their juxtaposition will reveal the relationships between single- and multiple-stimuli firing rates. By modelling the temporal point patterns as realizations from an inhomogeneous Poisson point process, different experimental hypotheses about the firing rate can be translated to different forms for the intensity function. One such hypothesis amounts to the intensity function for the dual-stimuli trials being a dynamic superposition of the intensity functions for the single-stimulus trials. The most common analyses in neuroscience record multiple spike trains and regard these as realizations from the same inhomogeneous Poisson point process, pooling them together to form a sufficient statistic. If the temporal dependence of the superposition is not the same across repeated measurements, then any temporal signal is lost when aggregating trials. In addition, previous analyses also binned firing times into an arbitrary partition of the time interval of observation. If the bin width is of the order of the temporal dependence, then the temporal signal can be lost. The model developed in this chapter addresses all of these issues. Experiments on simulated and real data reveal that the model is capable of learning the temporal dependence of the superposition very well, but other model parameters with meaningful physical interpretations are difficult to learn from such data.

• Chapter 4: “Continuous Optimization for Variable Selection”.

Continuous optimization refers to mathematical optimization problems in which

the parameters are elements of a continuum such as $\mathbb{R}^p$. This does not mean that the objective function is continuous. In fact, the optimization problems which are motivated as finding MAP estimates in Bayesian variable selection models are typically non-convex, non-smooth and even discontinuous. These applications fall beyond classical optimization theory and require new tailored algorithms. This chapter initially began by exploring the utility of orthogonal data augmentation in developing an EM algorithm for finding good-quality local solutions. After some literature review, it became clear that the same algorithm had been discovered and rediscovered in many disciplines from a variety of different motivations. This chapter begins by providing theoretical support for the MAP estimate under the point-mass-Laplace mixture prior, as an estimator is only attractive if it is both computationally tractable and possesses desirable theoretical properties. The first approach adapts theory for the lasso by assuming a compatibility criterion on the design matrix, allowing bounds on sparsity, estimation error and predictive error that hold with high probability to be derived. The resulting bounds reduce to existing bounds for the lasso as special cases and also provide an upper bound on the sparsity of the estimated regression vector, a bound which is not available for the lasso.

The second approach reaches similar results through different assumptions, namely the η-null consistency criterion and the restricted invertibility factor of the design matrix. Following these results, the performance of the proximal gradient and coordinate descent methods is evaluated on simulated datasets. Surprisingly, the coordinate descent methods appear to find better-quality local optima.

• Chapter 5: “Discrete Optimization for Variable Selection”.

The theoretical properties derived in the previous chapter are for the MAP estimate, in other words the global optimum of the optimization formulation. Heuristics such as the proximal gradient and coordinate descent methods provide no guarantees of finding the globally optimal solution and can frequently get stuck in very poor local optima. As such, their practical use carries a high risk, as one has no idea of the quality of the solution returned. This chapter on global discrete optimization combines the heuristics of the previous chapter with algorithms that are guaranteed to find the globally optimal solution or, if stopped early, provide quantitative statements about how close to optimal the best solution found so far is. These problems are NP-hard and have exponential worst-case complexity, but the algorithms can actually work surprisingly well for certain problems. This chapter will illustrate when one can expect these algorithms to work well. In particular, the mixed integer quadratic program developed for obtaining the MAP estimate under point-mass-Laplace mixture priors converges rapidly, in a matter of seconds, for moderately sized real datasets, making it a realistic and practically viable approach. The mixed integer quadratically constrained program developed for the g-prior, however, fails to converge in a reasonable amount of time. This chapter begins by recalling the history of exact discrete optimization methods applied to variable selection and proceeds to present a

modern state-of-the-art approach based on mixed integer programming, generalizing earlier work.

2

A Continuous-Time Model of Arrival Times

2.1 The Inhomogeneous Poisson Point Process

2.1.1 The Likelihood Function

The first two chapters of this thesis concern the modelling of temporal point patterns: sets of event times, denoted $\{t_i\}_{i=1}^{n}$, observed over an interval of time $T \subset \mathbb{R}$. For point patterns there are in fact two objects to be modelled. The first is the number of points in the interval $T$, denoted $N(T)$, and the second is the times themselves. For this reason a realization from a point process is commonly denoted as an ordered pair $\xi = (n, \{t_i\}_{i=1}^{n})$, which makes the two quantities explicit. Sets are used for the times instead of ordered tuples because the ordering of the times is irrelevant. A typical assumption is to model the point pattern as a realization from an inhomogeneous (or non-homogeneous) Poisson point process, popular for its simplicity and to some extent its tractability. It is defined by an intensity function $\lambda : T \to \mathbb{R}_{+}$ that is locally integrable, $\int_B \lambda(\xi)\,d\xi < \infty$ for all bounded $B \subset T$, defining an intensity measure $\Lambda(B) = \int_B \lambda(\xi)\,d\xi$. The number of points in the interval of observation $N(T)$ is under this assumption a Poisson distributed random variable with rate $\Lambda(T) = \int_T \lambda(s)\,ds$, and the times are assumed to be iid random variables from a distribution with probability density $p(t) = \lambda(t)/\Lambda(T)$ for $t \in T$. From exchangeability it follows that $p(\{t_i\}_{i=1}^{n}) = n! \prod_{i=1}^{n} \lambda(t_i)/\Lambda(T)^{n}$, as there are $n!$ different orderings of times which result in the set $\{t_i\}_{i=1}^{n}$. It then follows that the likelihood for $\lambda$ is given by

$$
\begin{aligned}
p((n, \{t_i\}_{i=1}^{n}) \mid \lambda) &= p(\{t_i\}_{i=1}^{n} \mid n, \lambda)\, p(n \mid \lambda) \\
&= n! \prod_{i=1}^{n} \frac{\lambda(t_i)}{\Lambda(T)} \cdot \frac{\Lambda(T)^{n}}{n!}\, e^{-\Lambda(T)} \\
&= \prod_{i=1}^{n} \lambda(t_i)\, e^{-\int_T \lambda(\xi)\,d\xi}.
\end{aligned}
\tag{2.1}
$$

In the data augmentation strategies to follow it is useful to introduce the idea of a marked point process. Under an enumeration of the points, each point $i$ has an associated time $t_i$ and mark $m_i$, and so a realization from a marked point process is denoted $\xi = (n, \{(t_i, m_i)\}_{i=1}^{n})$. If the mark distribution for $m_i$ is conditionally independent of $\{t_j\}_{j \neq i}$ given $t_i$, with density $p(m_i \mid t_i)$, then the likelihood is simply

$$
p((n, \{(t_i, m_i)\}_{i=1}^{n}) \mid \lambda) = \prod_{i=1}^{n} p(m_i \mid t_i)\, \lambda(t_i)\, e^{-\int_T \lambda(\xi)\,d\xi}. \tag{2.2}
$$

There are three important operations for manipulating point processes, namely mapping, thinning and superposition. The first is of no importance in the applications modelled in this thesis, but the latter two warrant some discussion.
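Equation (2.1) can be evaluated numerically for any intensity that can be computed pointwise. The following is a minimal sketch of my own (the function names are illustrative, not from the thesis), with the time integral approximated by the trapezoidal rule on a fine grid:

```python
import numpy as np

def ppp_loglik(times, lam, T=(0.0, 1.0), grid_size=1000):
    """Log-likelihood (2.1) of an inhomogeneous Poisson point process:
    sum_i log lam(t_i) minus the integral of lam over T, the latter
    approximated by the trapezoidal rule on a fine grid."""
    s = np.linspace(T[0], T[1], grid_size)
    vals = lam(s)
    integral = np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(s))
    return np.sum(np.log(lam(np.asarray(times)))) - integral

# Sanity check: a constant intensity lam(t) = 2 on [0, 1] with a single
# event has log-likelihood log(2) - 2.
ll = ppp_loglik([0.5], lambda t: 2.0 * np.ones_like(t))
```

For a constant intensity the quadrature is exact up to floating-point rounding, which makes this a convenient correctness check before using a non-trivial intensity.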

2.1.2 Thinning and Superposition

Thinning provides a way to sculpt new point processes from existing ones. In particular, suppose $\xi = (n, \{(t_i, m_i)\}_{i=1}^{n})$ is a realization from an inhomogeneous marked Poisson point process with intensity $\lambda(\cdot)$ and mark distribution $m_i = $ "accept" with probability $\alpha(t_i)$ or $m_i = $ "reject" with probability $1 - \alpha(t_i)$, where $\alpha : \mathbb{R} \to [0, 1]$. Using the notation $|S|$ to denote the cardinality of the set $S$, let $\xi_A = (|A|, A)$ where $A = \{(t, m) : (t, m) \in \{(t_i, m_i)\}_{i=1}^{n},\ m = \text{"accept"}\}$, and let $\xi_R = (|R|, R)$ where $R = \{(t, m) : (t, m) \in \{(t_i, m_i)\}_{i=1}^{n},\ m = \text{"reject"}\}$. Then $\xi_A$ and $\xi_R$ are realizations from inhomogeneous Poisson point processes with intensity functions $\alpha(\cdot)\lambda(\cdot)$ and $(1 - \alpha(\cdot))\lambda(\cdot)$ respectively. The earliest proof of this result is due to Lewis and Shedler (1979), but an alternative proof based on densities can be found in Rao (2012).

[Figure 2.1: Thinning a homogeneous Poisson point process with intensity $\Lambda = 50$ to obtain an inhomogeneous Poisson point process with intensity $\lambda(t) = \Lambda\sigma(4\sin(10t))$, where $\sigma$ is the logistic function. Points marked with "o" are accepted, whereas those marked with "x" are rejected.]

This provides a very convenient way to generate realizations over an interval $T$ from an inhomogeneous Poisson point process with intensity satisfying $\sup\{\lambda(t), t \in T\} < \infty$. Let $\Lambda = \sup\{\lambda(t), t \in T\}$ and generate a realization from a homogeneous Poisson point process with constant intensity function $\Lambda$. This is achieved by drawing a Poisson random variable $n$ with rate $\Lambda T$ and then simulating $n$ uniform random variables over $T$; let the latter be denoted $\{u_i\}_{i=1}^{n}$. Retain each $u \in \{u_i\}_{i=1}^{n}$ with probability $\lambda(u)/\Lambda$. The set of retained times then forms a realization from an inhomogeneous Poisson point process with rate $\lambda(\cdot)$ by thinning. This process is illustrated in figure 2.1.

Superposition, on the other hand, is in some sense the reverse of thinning. Given two realizations $\xi_A = (n_a, A)$ and $\xi_B = (n_b, B)$ from inhomogeneous Poisson point processes with intensity functions $\lambda_A(\cdot)$ and $\lambda_B(\cdot)$ respectively, $\xi_{A \cup B} := (n_a + n_b, A \cup B)$ is a realization from an inhomogeneous Poisson point process with intensity function $\lambda_A(\cdot) + \lambda_B(\cdot)$. A formal proof of the superposition theorem can be found in the classic text of Kingman (1993).
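The thinning recipe just described can be sketched in a few lines of Python (an illustration under my own naming, not the thesis code):

```python
import numpy as np

def sample_inhomogeneous_ppp(lam, lam_max, T, rng=None):
    """Simulate an inhomogeneous Poisson point process on [0, T] by the
    thinning construction of Lewis and Shedler (1979): draw a homogeneous
    PPP with rate lam_max, then keep each candidate time u independently
    with probability lam(u) / lam_max."""
    rng = np.random.default_rng(rng)
    n = rng.poisson(lam_max * T)           # number of candidate points
    u = rng.uniform(0.0, T, size=n)        # candidate times, iid uniform on [0, T]
    keep = rng.uniform(size=n) < lam(u) / lam_max
    return np.sort(u[keep])

# Reproduce the setting of Figure 2.1: Lambda = 50 and
# lam(t) = Lambda * sigma(4 sin(10 t)), sigma the logistic function.
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
times = sample_inhomogeneous_ppp(
    lambda t: 50.0 * sigmoid(4.0 * np.sin(10.0 * t)),
    lam_max=50.0, T=1.0, rng=0)
```

The validity of the sampler depends only on `lam_max` genuinely dominating the intensity over the interval; a loose bound still yields exact draws, just with more rejected candidates.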

9 2.2 The Gaussian Process Modulated Inhomogeneous Poisson Point Process

2.2.1 Gaussian Process Intensity Functions

In order to avoid making particularly strong assumptions about the intensity function, such as restricting it to belong to a small parametric family, it is often preferable to model it nonparametrically as a further realization from another stochastic process, one supporting a rich class of functions. Such a model is called a doubly stochastic Poisson process or Cox process (see Cox (1955)). If a Bayesian nonparametric approach is adopted, modelling $\lambda(t) = \exp(f(t))$ with a Gaussian process prior on $f$, then the resulting model is called a log-Gaussian Cox process. The log-Gaussian Cox process has proven to be a very popular model, most certainly due to its flexibility, as the Gaussian process prior has support over a rich class of functions (see Tokdar and Ghosh (2007), Choi and Schervish (2007) and van der Vaart and van Zanten (2008)).

A Gaussian process is defined in terms of its mean function $\mu(\cdot)$ and covariance kernel $K(\cdot, \cdot)$, possessing the characteristic property that if $f \sim \mathrm{GP}(\mu(\cdot), K(\cdot, \cdot))$, then for any finite number of times $t_{1:p}$ the function values follow a finite-dimensional multivariate Gaussian distribution $f_{1:p} \sim \mathcal{N}_p(\mu, K)$, where $\mu_i = \mu(t_i)$ and $K_{ij} = K(t_i, t_j)$. For further reading on Gaussian processes the reader is referred to Rasmussen and Williams (2005).

A major challenge with the doubly stochastic Poisson process is the presence of an intractable integral over an infinite-dimensional random function in the likelihood. Adams et al. (2009b) overcome this integral by reintroducing the thinned realization in a data-augmented model. Conditional on the observed and thinned realizations, the likelihood is that of a homogeneous Poisson point process and the integral is over a constant intensity function. In order to generate the thinned realization, however, an upper bound on the intensity function is required. This is not available in the log-Gaussian Cox process, where $\lambda(t) = \exp(f(t))$, and so the authors instead choose to model the intensity function as $\lambda(t) = \Lambda\sigma(f(t))$, where $\sigma$ is the sigmoid or logistic function and $\Lambda$ is a non-negative scaling constant. The authors name this parameterization the sigmoidal Gaussian Cox process and propose a Metropolis-Hastings based MCMC algorithm to marginalize over the number and locations of the thinned times. A more efficient MCMC algorithm was later provided by Rao (2012), who demonstrated that it is possible to sample the entire thinned realization in one step within a Gibbs sampling algorithm, resulting in faster convergence due to the global nature of the updates. When sampling the function $f$ conditional on thinned and observed realizations, Adams et al. (2009b) use Hamiltonian Monte Carlo (Duane et al. (1987)), whereas Rao (2012) uses elliptical slice sampling (Murray et al. (2010)). The model introduced in this chapter provides two contributions. The first contribution concerns two data augmentation strategies that result in a multivariate Gaussian full conditional distribution for the function $f$. The second uses the exact state-space representation of the Matérn covariance Gaussian process in one dimension due to Hartikainen and Särkkä (2010), which enables linear-time filtering algorithms to be used for likelihood evaluations and sampling of $f$ from its full conditional distribution.
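The defining finite-dimensional property of a Gaussian process can be made concrete with a short sketch (my own illustration, with hypothetical function names): values at any finite set of times are a draw from a multivariate normal with mean $\mu(t_i)$ and covariance $K(t_i, t_j)$. The Matérn kernel with smoothness 3/2 is used below as one member of the family discussed in the text.

```python
import numpy as np

def gp_draw(times, mean_fn, kernel_fn, rng=None, jitter=1e-9):
    """Draw (f(t_1), ..., f(t_p)) from a Gaussian process: by the defining
    property these values follow N_p(mu, K) with mu_i = mean_fn(t_i) and
    K_ij = kernel_fn(t_i, t_j)."""
    rng = np.random.default_rng(rng)
    t = np.asarray(times, dtype=float)
    mu = np.array([mean_fn(ti) for ti in t])
    K = np.array([[kernel_fn(ti, tj) for tj in t] for ti in t])
    K = K + jitter * np.eye(len(t))        # numerical stabilization
    return rng.multivariate_normal(mu, K)

def matern32(s, t, variance=1.0, lengthscale=0.2):
    """Matern covariance with smoothness 3/2, one of the kernels admitting
    an exact state-space representation in one dimension."""
    r = np.sqrt(3.0) * abs(s - t) / lengthscale
    return variance * (1.0 + r) * np.exp(-r)

f = gp_draw(np.linspace(0.0, 1.0, 50), lambda t: 0.0, matern32, rng=1)
```

The $O(p^3)$ cost of this dense-covariance draw is exactly what the state-space representation of Section 2.2.3 avoids, replacing it with linear-time filtering.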

2.2.2 Data Augmentation Strategies

The motivation of the data augmentation strategies proposed in this section is to take an unfamiliar problem and transform it into a familiar one. The current problem essentially concerns the learning of a function for which the likelihood is atypical. It is more common to see inference on functions performed in applications such as nonparametric regression, where one possesses noisy observations of the function over a set of inputs, for which there already exists a large base of literature. In the inhomogeneous Poisson point process, however, information about the intensity function is not provided by noisy observations but by the presence or absence of time points. The proposed data augmentation strategies connect with the existing literature on nonparametric regression and time series by imputing noisy observations of the intensity function through the information provided by the presence or absence of time points.

Let the intensity function be parameterized as $\lambda(t) = \Lambda\sigma(\mu + f(t))$, where $\Lambda$ is a non-negative scaling constant and $\sigma : \mathbb{R} \to [0, 1]$. Let $\xi_a = (n_a, A)$, where $A = \{t_i^a\}_{i=1}^{n_a}$, denote the observed data, which will be modelled as a realization from an inhomogeneous Poisson point process with rate $\lambda(t)$, and let $\xi_r = (n_r, R)$, where $R = \{t_i^r\}_{i=1}^{n_r}$, denote an unobserved realization from an inhomogeneous Poisson point process with rate $\Lambda - \lambda(t)$. The likelihood then reduces to that of a homogeneous Poisson point process:

$$
p(\xi_a, \xi_r \mid f, \Lambda, \mu) = \prod_{t \in A} \sigma(\mu + f(t)) \prod_{t \in R} \left(1 - \sigma(\mu + f(t))\right)\, \Lambda^{n_a + n_r} e^{-\Lambda T}. \tag{2.3}
$$

With a Gaussian process prior on $f$, it would indeed be possible to sample the function values over $A \cup R$ via Hamiltonian Monte Carlo or elliptical slice sampling, but it is possible to obtain a convenient conjugate update for $f$ via data augmentation strategies. Recall that augmenting a model of data $Y_{\mathrm{obs}}$ and parameters $\theta$ with missing data $Y_{\mathrm{mis}}$ is a valid data augmentation when $p(Y_{\mathrm{obs}} \mid \theta) = \int p(Y_{\mathrm{obs}}, Y_{\mathrm{mis}} \mid \theta)\, dY_{\mathrm{mis}}$. The two data augmentation approaches considered are the probit data augmentation of Albert and Chib (1993) and the Polya-Gamma data augmentation of Polson et al. (2013). These approaches target different link functions but both result in tractable full conditionals for $f$. It is my experience that the Polya-Gamma approach results in visibly better mixing, perhaps by nature of it being a scale mixture as opposed to a location mixture, but at the increased computational cost of generating the Polya-Gamma random variables. It is not clear whether this pays off overall.

Probit Data Augmentation

Suppose $\sigma(\cdot) = \Phi(\cdot)$, where the latter is the standard normal cumulative distribution function. I shall denote this model the probit Gaussian Cox process. The data generating process for $\xi_a$ and $\xi_r$ can be viewed as thinning a homogeneous Poisson point process with rate $\Lambda$ by assigning a time-point $t$ to $\xi_a$ with probability $\lambda(t)/\Lambda = \Phi(\mu + f(t))$, otherwise assigning it to $\xi_r$. As noted by Albert and Chib (1993), this is probabilistically equivalent to drawing a random variable $m \sim \mathcal{N}(\mu + f(t), 1)$ and assigning $t$ to $\xi_a$ when $m > 0$, otherwise assigning it to $\xi_r$. To see this, observe that $\Phi(\mu + f(t)) = \int \mathbb{1}_{[0,\infty)}(m)\, \mathcal{N}(m \mid \mu + f(t), 1)\, dm$. By Bayes theorem it follows that the distribution of $m$ conditional on $t$ is $\mathcal{N}_{[0,\infty)}(\mu + f(t), 1)$ if $t$ were assigned to $\xi_a$ and $\mathcal{N}_{(-\infty,0]}(\mu + f(t), 1)$ otherwise, where the subscripted interval denotes an interval of truncation. It is useful to reintroduce these $m$'s back into the model in the form of unobserved marks.
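The equivalence between thinning with probability $\Phi(\mu + f(t))$ and thresholding a latent Gaussian draw at zero is easy to check by simulation. The sketch below is pure Python, with a hypothetical fixed value standing in for $\mu + f(t)$:

```python
import math
import random

def std_normal_cdf(x):
    # Phi(x) via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def thin_probit(mu_plus_f, n_draws, rng):
    """Assign each candidate point via the latent-variable scheme:
    draw m ~ N(mu + f(t), 1) and keep the point iff m > 0."""
    kept = 0
    for _ in range(n_draws):
        m = rng.gauss(mu_plus_f, 1.0)
        if m > 0.0:
            kept += 1
    return kept / n_draws

rng = random.Random(7)
x = 0.4                      # a hypothetical value of mu + f(t)
frac = thin_probit(x, 200_000, rng)
# the empirical keep-fraction should match Phi(mu + f(t))
print(frac, std_normal_cdf(x))
```

The two assignment mechanisms agree, which is exactly what licenses reintroducing the latent $m$'s as unobserved marks.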

Suppose that $\xi_a$ and $\xi_r$ are modelled as marked inhomogeneous Poisson point processes with mark distributions $\mathcal{N}_{[0,\infty)}(\mu + f(t), 1)$ and $\mathcal{N}_{(-\infty,0]}(\mu + f(t), 1)$ respectively. The likelihood is then

$$p(\xi_a, \xi_r \mid f, \Lambda, \mu) = \prod_{t \in \{t^a_i\}_{i=1}^{n_a}} \Phi(\mu + f(t)) \prod_{t \in \{t^r_i\}_{i=1}^{n_r}} \left(1 - \Phi(\mu + f(t))\right) \Lambda^{n_a + n_r} e^{-\Lambda T} \times \prod_{(t,m) \in A} \frac{\mathcal{N}(m \mid \mu + f(t), 1)\, \mathbb{1}_{[0,\infty)}(m)}{\Phi(\mu + f(t))} \prod_{(t,m) \in R} \frac{\mathcal{N}(m \mid \mu + f(t), 1)\, \mathbb{1}_{(-\infty,0]}(m)}{1 - \Phi(\mu + f(t))}. \qquad (2.4)$$

The convenient property of this data augmented model is that the $\Phi(\mu + f(t))$ and $1 - \Phi(\mu + f(t))$ terms cancel with the normalizing constants of the mark distributions, resulting in a full conditional for $f$ given by

$$p(f \mid -) \propto \prod_{(t,m) \in A} \mathcal{N}(m \mid \mu + f(t), 1) \prod_{(t,m) \in R} \mathcal{N}(m \mid \mu + f(t), 1)\, p(f). \qquad (2.5)$$

Under a Gaussian process prior for $f$, the full conditional distribution over the function values at $\{t^a_i\}_{i=1}^{n_a} \cup \{t^r_i\}_{i=1}^{n_r}$ is a tractable multivariate normal. It is intuitive to regard each mark $m_i$ as providing a noisy measurement or observation of $\mu + f$ at the time $t_i$ with unit variance. The full conditionals for the unobserved marks are simply their mark distributions. The tractability of all full conditionals presents the opportunity of developing a Gibbs sampling algorithm to generate draws of $f$, $\mu$ and $\Lambda$ from the posterior distribution. An alternative data augmentation strategy is also available when $\sigma$ is the logistic function.

Polya-Gamma Data Augmentation

Suppose $\sigma(\cdot)$ is the logistic function, as in the sigmoidal Gaussian Cox process. Theorem 1 of Polson et al. (2013) states that

$$\sigma(x) = \frac{e^x}{1 + e^x} = \frac{1}{2} e^{x/2} \int_0^\infty e^{-\omega x^2/2}\, p(\omega)\, d\omega, \qquad (2.6)$$

where $p(\omega)$ is the probability density function of a $\mathrm{PG}(1,0)$ (Polya-Gamma) random variable. Moreover the conditional distribution

$$p(\omega \mid x) = \frac{e^{-\omega x^2/2}\, p(\omega)}{\int_0^\infty e^{-\omega x^2/2}\, p(\omega)\, d\omega}, \qquad (2.7)$$

obtained by treating the integrand in (2.6) as an unnormalized joint density in $(x, \omega)$, is that of a $\mathrm{PG}(1, x)$ random variable. Similarly

$$1 - \sigma(x) = \frac{1}{1 + e^x} = \frac{1}{2} e^{-x/2} \int_0^\infty e^{-\omega x^2/2}\, p(\omega)\, d\omega. \qquad (2.8)$$

As with the probit data augmentation approach, consider modelling $\xi_a$ and $\xi_r$ as marked inhomogeneous Poisson point processes where the mark distributions are $\mathrm{PG}(1, \mu + f(t))$. The likelihood then becomes

$$p(\xi_a, \xi_r \mid f, \Lambda, \mu) = \prod_{t \in \{t^a_i\}_{i=1}^{n_a}} \frac{1}{2} e^{(\mu + f(t))/2} \int_0^\infty e^{-\omega(\mu + f(t))^2/2} p(\omega)\, d\omega \prod_{t \in \{t^r_i\}_{i=1}^{n_r}} \frac{1}{2} e^{-(\mu + f(t))/2} \int_0^\infty e^{-\omega(\mu + f(t))^2/2} p(\omega)\, d\omega \times \prod_{(t,m) \in A} \frac{e^{-m(\mu + f(t))^2/2}\, p(m)}{\int_0^\infty e^{-m(\mu + f(t))^2/2}\, p(m)\, dm} \prod_{(t,m) \in R} \frac{e^{-m(\mu + f(t))^2/2}\, p(m)}{\int_0^\infty e^{-m(\mu + f(t))^2/2}\, p(m)\, dm} \, \Lambda^{n_a + n_r} e^{-\Lambda T}, \qquad (2.9)$$

where $p(m)$ is the density of a $\mathrm{PG}(1,0)$ in $m$. As before with the probit data augmentation, a nice property of the Polya-Gamma augmented model is that the integrals in the logistic link function cancel with the normalizing constants of the mark distributions, resulting in the full conditional for $f$ given by

$$p(f \mid -) \propto \prod_{(t,m) \in A} e^{-\frac{m}{2}\left(\frac{1}{2m} - \mu - f(t)\right)^2} \prod_{(t,m) \in R} e^{-\frac{m}{2}\left(-\frac{1}{2m} - \mu - f(t)\right)^2} p(f), \qquad (2.10)$$

resulting once again in a multivariate normal full conditional distribution for the $f$ values over $\{t^a_i\}_{i=1}^{n_a} \cup \{t^r_i\}_{i=1}^{n_r}$ under a Gaussian process prior assumption. For each pair $(t, m) \in A$ it is intuitive to regard $1/(2m)$ as providing a noisy measurement or observation of $\mu + f$ at time $t$ with observation variance $1/m$. Similarly, for each pair $(t, m) \in R$ it is intuitive to regard $-1/(2m)$ as providing a noisy measurement or observation of $\mu + f$ at time $t$ with measurement variance $1/m$. It will later provide simplification and aid intuition to define new quantities $\{y_i\}_{i=1}^{n_a+n_r}$ and $\{\sigma^2_i\}_{i=1}^{n_a+n_r}$ in terms of the marks $\{m^a_i\}_{i=1}^{n_a} \cup \{m^r_i\}_{i=1}^{n_r}$ and define full conditionals with respect to these newly defined quantities instead of the original marks. The full conditional distributions for the marks are simply $\mathrm{PG}(1, \mu + f(t))$ random variables.
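The integral identity (2.6) underlying this augmentation can be checked numerically. The sketch below draws approximate $\mathrm{PG}(1,0)$ variates by truncating the infinite convolution of exponentials that defines the Polya-Gamma family — a simple but inexact sampler, used here only for illustration; the truncation level and test point are arbitrary choices:

```python
import math
import random

def sample_pg_1_0(rng, trunc=100):
    """Approximate PG(1,0) draw: omega = (1/(2*pi^2)) * sum_k g_k/(k-1/2)^2
    with g_k ~ Exp(1), truncated after `trunc` terms."""
    s = 0.0
    for k in range(1, trunc + 1):
        s += rng.expovariate(1.0) / ((k - 0.5) ** 2)
    return s / (2.0 * math.pi ** 2)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

rng = random.Random(42)
x = 1.3                      # an arbitrary test point
n = 20_000
# Monte Carlo estimate of the integral in (2.6)
mc = sum(math.exp(-sample_pg_1_0(rng) * x * x / 2.0) for _ in range(n)) / n
lhs = sigmoid(x)
rhs = 0.5 * math.exp(x / 2.0) * mc
print(lhs, rhs)
```

The two sides agree to Monte Carlo accuracy, confirming the mixture representation that the augmented marks exploit.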

2.2.3 State-Space Representation of Gaussian Processes

The data augmentation strategies outlined in the previous section result in a tractable multivariate Gaussian full-conditional distribution for $f$ evaluated at times in $\{t^a_i\}_{i=1}^{n_a} \cup \{t^r_i\}_{i=1}^{n_r}$. As a result, the computational complexity of sampling $f$ from its full-conditional distribution is $O((n_a + n_r)^3)$. This computational burden is the price one commonly pays for working with Gaussian processes. To address this issue a number of approximations have been introduced in the literature; some are numerical, some are mathematical. The former usually deal with numerical approximations to the linear algebra computations required in the multivariate normal. Popular approaches are the low-rank Cholesky decomposition, the randomized singular value decomposition of Halko et al. (2011) or approximate factorizations within a Krylov subspace (Chow and Saad (2014)). The latter concerns developing mathematical models that approximate the original. Approximations of this ilk include the predictive process of Banerjee et al. (2008) and the nearest neighbour approaches of Datta et al. (2016), to name a few. All of these methods are general purpose in the sense that the covariance kernel used does not really matter. There are, however, particular kinds of Gaussian process that possess specific favourable properties. Take the exponential covariance kernel, for instance, $K(x, y) = \rho^2 \exp(-|x - y|/\tau)$. Sample paths of this GP are nowhere differentiable, but the precision matrix is sparse, betraying conditional independence properties of the function values. Indeed this GP can be constructed from a continuous-time AR(1) model, possessing the Markov conditional independence property. To see this, let $f_i = f(t_i)$ and consider the state equation

$$f_i = \phi_i f_{i-1} + \epsilon_i, \qquad (2.11)$$

where $\phi_i = \exp(-|t_i - t_{i-1}|/\tau)$ and $\epsilon_i \sim \mathcal{N}(0, \rho^2(1 - \exp(-|t_i - t_{i-1}|/\tau)^2))$. A little algebra together with $f_1 \sim \mathcal{N}(0, \rho^2)$ shows that $f_{1:n} \sim \mathcal{N}(0, K)$ with $K_{ij} = \rho^2 \exp(-|t_i - t_j|/\tau)$. In this special case, therefore, a state-space model gives rise to a Gaussian process. This means that instead of using the general $O(n^3)$ machinery for GPs, inference can be performed in $O(n)$ complexity using the Kalman filter, a specialized algorithm that exploits the Markov structure. Depending on the application, one might desire to have smooth sample paths.
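Before moving on to smoother processes, the claim that the recursion (2.11) reproduces the exponential covariance kernel can be verified deterministically, by propagating the covariances implied by the state equation and comparing them with $K_{ij} = \rho^2\exp(-|t_i - t_j|/\tau)$. The hyperparameters and time grid below are arbitrary:

```python
import math

rho2, tau = 1.7, 0.35                 # hypothetical kernel hyperparameters
times = [0.0, 0.1, 0.35, 0.4, 0.9]
n = len(times)

# Propagate f_i = phi_i f_{i-1} + eps_i exactly, tracking Cov(f_i, f_j).
phis = [None] + [math.exp(-(times[i] - times[i-1]) / tau) for i in range(1, n)]
cov = [[0.0] * n for _ in range(n)]
cov[0][0] = rho2                      # f_1 ~ N(0, rho^2)
for i in range(1, n):
    # innovation variance chosen so the marginal variance stays rho^2
    q = rho2 * (1.0 - phis[i] ** 2)
    cov[i][i] = phis[i] ** 2 * cov[i-1][i-1] + q
    for j in range(i):
        cov[i][j] = cov[j][i] = phis[i] * cov[i-1][j]

# Compare against the exponential covariance kernel
max_err = max(abs(cov[i][j] - rho2 * math.exp(-abs(times[i] - times[j]) / tau))
              for i in range(n) for j in range(n))
print(max_err)
```

The discrepancy is at floating-point level, so the recursion and the kernel specify the same joint distribution on any time grid.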

A common choice is the squared exponential covariance kernel $K(x, y) = \rho^2 \exp(-(x - y)^2/\tau^2)$, which is a limiting case of the Matern covariance kernel

$$K_\nu(x, y) = \rho^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\sqrt{2\nu}\, \frac{|x - y|}{\tau}\right)^{\nu} K_\nu\!\left(\sqrt{2\nu}\, \frac{|x - y|}{\tau}\right), \qquad (2.12)$$

in the limit $\nu \to \infty$, where $\Gamma$ is the gamma function and $K_\nu$ is the modified Bessel function of the second kind. The exponential covariance kernel is also a special case of the Matern with $\nu = 1/2$. A natural question to ask is whether there exists a happy medium somewhere between $\nu = 1/2$ and $\nu = \infty$. Sample paths of 1-dimensional GPs with the Matern-$\nu$ covariance kernel are $\lceil\nu\rceil - 1$ times differentiable and, using the results of Hartikainen and Särkkä (2010), have exact $(\nu + 1/2)$-dimensional state-space representations. I shall use a $\nu = 5/2$ Matern covariance kernel, both because it provides sample paths that are twice differentiable and because the state-space has dimension 3, which allows highly optimized linear algebra routines to be used. For small fixed-size matrices there exist highly optimized linear algebra routines compared to those that handle matrices of arbitrary size. This is because many operations on $3 \times 3$ matrices admit closed form expressions, and also because such matrices have received a lot of attention computationally due to the large number of physical systems that are simulated in 3-dimensional Euclidean space.

Let $F_i = (f(t_i), f'(t_i), f''(t_i))^T$ and consider the following stochastic differential equation

$$\frac{dF(t)}{dt} = AF(t) + Lw(t), \quad \text{where } A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -\lambda^3 & -3\lambda^2 & -3\lambda \end{pmatrix}, \quad L = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \qquad (2.13)$$

$\lambda = \sqrt{5}/\tau$ and $w(t)$ is a white noise process with spectral density $q = (16/3)\rho^2\lambda^5$. The observation of Hartikainen and Särkkä (2010) is that the first element of the solution to equation (2.13) is, probabilistically, a zero-mean Gaussian process with Matern $\nu = 5/2$ covariance kernel. The continuous-time model in equation (2.13) can be converted to a discrete-time model on a finite number of times $t_{1:n}$ via

$$F_i = \Phi_i F_{i-1} + \epsilon_i, \qquad (2.14a)$$
$$F_1 \sim \mathcal{N}(0, Q_\infty), \qquad (2.14b)$$

where

$$\Phi_i = \exp(A\Delta_i), \qquad (2.14c)$$
$$\Delta_i = t_i - t_{i-1}, \qquad (2.14d)$$
$$\epsilon_i \sim \mathcal{N}(0, Q_i), \qquad (2.14e)$$
$$Q_i = \int_0^{\Delta_i} \exp(A\tau)\, L q L^T \exp(A\tau)^T\, d\tau, \qquad (2.14f)$$
$$Q_\infty = \int_0^{\infty} \exp(A\tau)\, L q L^T \exp(A\tau)^T\, d\tau. \qquad (2.14g)$$

The reader can find more details on these steps in (Bar-Shalom et al., 2002, Chapter 4). The matrices $\Phi_i$, $Q_i$ and $Q_\infty$ admit closed form mathematical expressions as

$$\Phi_i = e^{-\Delta_i\lambda} \begin{pmatrix} \frac{\Delta_i^2\lambda^2}{2} + \Delta_i\lambda + 1 & \lambda\Delta_i^2 + \Delta_i & \frac{\Delta_i^2}{2} \\[4pt] -\frac{\Delta_i^2\lambda^3}{2} & -\Delta_i^2\lambda^2 + \Delta_i\lambda + 1 & \Delta_i - \frac{\Delta_i^2\lambda}{2} \\[4pt] \frac{\Delta_i^2\lambda^4}{2} - \Delta_i\lambda^3 & \Delta_i^2\lambda^3 - 3\Delta_i\lambda^2 & \frac{\Delta_i^2\lambda^2}{2} - 2\Delta_i\lambda + 1 \end{pmatrix}, \qquad (2.15)$$

$$Q_\infty = \begin{pmatrix} \rho^2 & 0 & -\frac{1}{3}\lambda^2\rho^2 \\ 0 & \frac{1}{3}\lambda^2\rho^2 & 0 \\ -\frac{1}{3}\lambda^2\rho^2 & 0 & \lambda^4\rho^2 \end{pmatrix}, \qquad (2.16)$$

although the expression for $Q_i$ is a bit too cumbersome to state here. With the complicated details now stated, it is in fact very simple to validate this construction by showing that the correct marginal distribution for $f_{1:n}$ is recovered,

considering $f'_{1:n}$ and $f''_{1:n}$ to be augmented data. From $F_1 \sim \mathcal{N}(0, Q_\infty)$ and looking at $Q_{\infty,11} = \rho^2$ it is clear that the marginal distribution for $f_1$ is $\mathcal{N}(0, \rho^2)$. From properties of the stationary covariance matrix it follows that $\Phi_i Q_\infty \Phi_i^T + Q_i = Q_\infty$, and so by a similar argument the marginal distribution for $f_i$ is also $\mathcal{N}(0, \rho^2)$ for $i \in \{1, \ldots, n\}$. The covariance between $F_i$ and $F_{i-j}$ is given by $\Phi_i \Phi_{i-1} \cdots \Phi_{i-j+1} Q_\infty = \exp(A(t_i - t_{i-j})) Q_\infty$. The element in row 1, column 1 of $\exp(A(t_i - t_{i-j})) Q_\infty$ is identically

$$\rho^2 \left(1 + \lambda(t_i - t_{i-j}) + \frac{\lambda^2}{3}(t_i - t_{i-j})^2\right) e^{-\lambda(t_i - t_{i-j})}, \qquad (2.17)$$

which is exactly the covariance defined by the Matern $\nu = 5/2$ covariance kernel. This shows that the state-space construction gives rise to the correct marginal distribution for $f_{1:n}$ and that adding $f'_{1:n}$ and $f''_{1:n}$ constitutes a valid data augmentation.
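Another quick check is that the closed-form $Q_\infty$ in (2.16) really is the stationary covariance of the SDE (2.13), i.e. that it solves the continuous-time Lyapunov equation $AQ_\infty + Q_\infty A^T + LqL^T = 0$. A minimal sketch in pure Python, with arbitrary hyperparameter values:

```python
import math

rho2, tau = 1.0, 0.5
lam = math.sqrt(5.0) / tau
q = (16.0 / 3.0) * rho2 * lam ** 5

# Drift matrix A of the Matern-5/2 SDE and the closed-form Q_infinity (2.16)
A = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [-lam ** 3, -3.0 * lam ** 2, -3.0 * lam]]
Qinf = [[rho2, 0.0, -lam ** 2 * rho2 / 3.0],
        [0.0, lam ** 2 * rho2 / 3.0, 0.0],
        [-lam ** 2 * rho2 / 3.0, 0.0, lam ** 4 * rho2]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(X):
    return [[X[j][i] for j in range(3)] for i in range(3)]

# Lyapunov residual: A Qinf + Qinf A^T + L q L^T, where L = (0,0,1)^T
# places q in the (3,3) entry only.
AQ = matmul(A, Qinf)
QAt = matmul(Qinf, transpose(A))
resid = [[AQ[i][j] + QAt[i][j] + (q if i == 2 and j == 2 else 0.0)
          for j in range(3)] for i in range(3)]
max_resid = max(abs(resid[i][j]) for i in range(3) for j in range(3))
print(max_resid)
```

The residual vanishes to floating-point precision, which is equivalent to the stationarity identity $\Phi_i Q_\infty \Phi_i^T + Q_i = Q_\infty$ used above.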

2.2.4 Forward Filtering Backward Sampling

In general, sampling from the posterior resulting from a Gaussian process prior, or evaluating the marginal likelihood of the data, are operations with cubic complexity. The state-space construction makes possible the use of the forward filtering backward sampling algorithm, for which the complexity is only linear. The Polya-Gamma and probit data augmentations can be regarded as imputing noisy observations of the function $F$ over time. Consider the following state-space model

$$y_i = \mu + HF_i + \nu_i \quad \text{(Observation Equation)}, \qquad (2.18)$$

$$F_i = \Phi_i F_{i-1} + \epsilon_i \quad \text{(State Equation)}, \qquad (2.19)$$

where $H = (1, 0, 0)$, $\nu_i \sim \mathcal{N}(0, \sigma_i^2)$, and $\Phi_i$ and $\epsilon_i$ are as defined before.

Ignoring all hyperparameters for simplicity, the posterior distribution for $F_{1:n}$ can be factored in compositional form as

$$p(F_{1:n} \mid y_{1:n}) = p(F_n \mid y_{1:n}) \prod_{i=1}^{n-1} p(F_i \mid F_{i+1:n}, y_{1:n}). \qquad (2.20)$$

$F_{1:n}$ could then be sampled compositionally from the factored distributions in (2.20). Looking at these factors more closely reveals, through the observation

$$p(F_i \mid F_{i+1:n}, y_{1:n}) \propto p(F_{i+1} \mid F_i)\, p(F_i \mid y_{1:i}), \qquad (2.21)$$

that $p(F_i \mid F_{i+1:n}, y_{1:n}) = p(F_i \mid F_{i+1}, y_{1:i})$, i.e. $F_i$ is conditionally independent of $F_{i+2:n}$ given $F_{i+1}$ and $y_{1:i}$. Forward filtering constructs in one pass through the data the conditional distributions $p(F_i \mid F_{i+1}, y_{1:i})$, the product of which defines the joint posterior. The backward sampling step, initialized by sampling $F_n \sim p(F_n \mid y_{1:n})$, samples in one pass the $F_i$'s from their conditional distributions $p(F_i \mid F_{i+1}, y_{1:i})$.

Forward Filtering

Suppose it is known that $F_i \mid y_{1:i} \sim \mathcal{N}(m_i, M_i)$; then

$$\begin{pmatrix} y_{i+1} \\ F_{i+1} \\ F_i \end{pmatrix} \Bigg|\, y_{1:i} \sim \mathcal{N}\left( \begin{pmatrix} \mu + H\Phi_{i+1} m_i \\ \Phi_{i+1} m_i \\ m_i \end{pmatrix}, O_i \right), \qquad (2.22)$$

where

$$O_i = \begin{pmatrix} \sigma_{i+1}^2 + H(\Phi_{i+1} M_i \Phi_{i+1}^T + Q_{i+1})H^T & H(\Phi_{i+1} M_i \Phi_{i+1}^T + Q_{i+1}) & H\Phi_{i+1} M_i \\ (\Phi_{i+1} M_i \Phi_{i+1}^T + Q_{i+1})H^T & \Phi_{i+1} M_i \Phi_{i+1}^T + Q_{i+1} & \Phi_{i+1} M_i \\ M_i \Phi_{i+1}^T H^T & M_i \Phi_{i+1}^T & M_i \end{pmatrix}. \qquad (2.23)$$

The new $m_{i+1}$ and $M_{i+1}$, in $F_{i+1} \mid y_{1:i+1} \sim \mathcal{N}(m_{i+1}, M_{i+1})$, are then derived from multivariate normal theory as

$$m_{i+1} = \Phi_{i+1} m_i + (\Phi_{i+1} M_i \Phi_{i+1}^T + Q_{i+1})H^T \left[\sigma_{i+1}^2 + H(\Phi_{i+1} M_i \Phi_{i+1}^T + Q_{i+1})H^T\right]^{-1} \left(y_{i+1} - (\mu + H\Phi_{i+1} m_i)\right) \qquad (2.24)$$

and

$$M_{i+1} = \Phi_{i+1} M_i \Phi_{i+1}^T + Q_{i+1} - (\Phi_{i+1} M_i \Phi_{i+1}^T + Q_{i+1})H^T \left[\sigma_{i+1}^2 + H(\Phi_{i+1} M_i \Phi_{i+1}^T + Q_{i+1})H^T\right]^{-1} H(\Phi_{i+1} M_i \Phi_{i+1}^T + Q_{i+1}). \qquad (2.25)$$

Some computation can be saved by recognizing that multiplying by $H = (1, 0, 0)$ merely corresponds to taking a slice of a matrix. This recursive definition is initialized with the base case

$$m_1 = Q_\infty H^T \left[\sigma_1^2 + H Q_\infty H^T\right]^{-1}(y_1 - \mu), \qquad (2.26)$$

$$M_1 = Q_\infty - Q_\infty H^T \left[\sigma_1^2 + H Q_\infty H^T\right]^{-1} H Q_\infty. \qquad (2.27)$$

Backward Sampling

The conditional distributions can then be obtained from (2.22) using multivariate normal theory, giving

$$F_i \mid F_{i+1}, y_{1:i} \sim \mathcal{N}\Big(m_i + M_i \Phi_{i+1}^T \left[\Phi_{i+1} M_i \Phi_{i+1}^T + Q_{i+1}\right]^{-1}(F_{i+1} - \Phi_{i+1} m_i),\; M_i - M_i \Phi_{i+1}^T \left[\Phi_{i+1} M_i \Phi_{i+1}^T + Q_{i+1}\right]^{-1} \Phi_{i+1} M_i\Big). \qquad (2.28)$$

Starting with a draw of $F_n$ from $F_n \mid y_{1:n} \sim \mathcal{N}(m_n, M_n)$, function values $F_i$ can be sampled given $F_{i+1}$ in linear time via (2.28).

Marginal Likelihood Evaluation

The Gaussian process prior has associated hyperparameters of characteristic time-scale $\tau$ and variance $\rho^2$, upon which one would like to place priors. At times it may be useful to evaluate the marginal likelihood, $p(y_{1:n} \mid \mu, \rho^2, \tau)$, having integrated out $f_{1:n}$. This too can be written in compositional form as $p(y_1 \mid \mu, \rho^2, \tau) \prod_{i=1}^{n-1} p(y_{i+1} \mid y_{1:i}, \mu, \rho^2, \tau)$, and conveniently these factors are produced as a byproduct of the forward-filtering stage as

$$y_{i+1} \mid y_{1:i}, \mu, \rho^2, \tau \sim \mathcal{N}\left(\mu + H\Phi_{i+1} m_i,\; \sigma_{i+1}^2 + H(\Phi_{i+1} M_i \Phi_{i+1}^T + Q_{i+1})H^T\right), \qquad (2.29)$$
$$y_1 \mid \mu, \rho^2, \tau \sim \mathcal{N}(\mu,\; \sigma_1^2 + H Q_\infty H^T).$$

This means that evaluating the marginal likelihood has complexity $O(n)$ instead of $O(n^3)$. In addition, a nice unit test to check the validity of the mathematics and code is to compare the numerical value of the marginal likelihood computed in the manner described previously with the value computed by evaluating $\mathcal{N}_n(Y \mid \mu\mathbf{1}, \Sigma + K)$, where $\Sigma = \mathrm{Diag}(\sigma_1^2, \ldots, \sigma_n^2)$.
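This unit test is easy to carry out in a scalar-state setting. The sketch below uses the exponential kernel ($\nu = 1/2$), for which the state is one-dimensional and $H = 1$, and compares the marginal log-likelihood accumulated by the filtering recursions with a direct dense evaluation of $\mathcal{N}_n(Y \mid \mu\mathbf{1}, \Sigma + K)$; all data and hyperparameter values are made up for illustration:

```python
import math

rho2, tau, mu = 0.8, 0.5, 0.3          # hypothetical hyperparameters
times = [0.05, 0.2, 0.5, 0.55, 1.1]
ys = [0.9, 0.4, -0.2, 0.1, 0.7]        # imputed noisy observations
sig2 = [1.0, 0.7, 1.2, 0.5, 1.0]       # per-point observation variances

def kalman_loglik(times, ys, sig2):
    ll, m, M = 0.0, 0.0, rho2          # prior: f_1 ~ N(0, rho2)
    prev_t = times[0]
    for i, (t, y, s2) in enumerate(zip(times, ys, sig2)):
        if i > 0:                      # predict step
            phi = math.exp(-(t - prev_t) / tau)
            m, M = phi * m, phi ** 2 * M + rho2 * (1 - phi ** 2)
        S = M + s2                     # predictive variance of y_i
        r = y - (mu + m)
        ll += -0.5 * (math.log(2 * math.pi * S) + r * r / S)
        K = M / S                      # Kalman gain, then update step
        m, M = m + K * r, M - K * M
        prev_t = t
    return ll

def dense_loglik(times, ys, sig2):
    n = len(times)
    C = [[rho2 * math.exp(-abs(times[i] - times[j]) / tau)
          + (sig2[i] if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Cholesky C = L L^T, then log-det and quadratic form by forward solve
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    z = [0.0] * n
    for i in range(n):
        z[i] = (ys[i] - mu - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2 * math.pi) + logdet + sum(v * v for v in z))

print(kalman_loglik(times, ys, sig2), dense_loglik(times, ys, sig2))
```

The two values agree to machine precision; the same comparison applies verbatim to the 3-dimensional Matern-5/2 filter, with $\Phi_i$, $Q_i$ and $Q_\infty$ in place of the scalars.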

Marginal Posterior for µ

First, a note on model parameterization. Note that the mean parameter $\mu$, which occurs in the observation equation, could have equally been placed in the state equation. Although the posterior is unaffected, different model parameterizations lead to Markov chains with different mixing properties. A discussion of this can be found in (Prado and West, 2010, Chapter 7). Combining the two different parameterizations would be a neat application of the ancillary-sufficient interweaving strategy of Yu and Meng (2011). Instead, the approach adopted here is to sample $(F, \mu)$ as a block by first sampling $\mu \mid y_{1:n}, \rho^2, \tau$ and then $F_{1:n} \mid \mu, y_{1:n}, \rho^2, \tau$. The conditional distribution for $\mu$ is available in closed form. To derive this it is necessary to reformulate the forward filtering steps a little, because currently each $m_i$ is an expression that depends on $\mu$. Suppose $m_i$ is of the form $m_i = C_i + D_i\mu$; then

$$m_{i+1} = C_{i+1} + D_{i+1}\mu,$$

where

$$C_{i+1} = (I - F_{i+1}H)\Phi_{i+1} C_i + F_{i+1} y_{i+1}, \qquad (2.30)$$
$$D_{i+1} = (I - F_{i+1}H)\Phi_{i+1} D_i - F_{i+1},$$
$$F_{i+1} = (\Phi_{i+1} M_i \Phi_{i+1}^T + Q_{i+1})H^T \left[\sigma_{i+1}^2 + H(\Phi_{i+1} M_i \Phi_{i+1}^T + Q_{i+1})H^T\right]^{-1}.$$

This is clearly the case for $m_1$, with $C_1 = Q_\infty H^T[\sigma_1^2 + H Q_\infty H^T]^{-1} y_1$ and $D_1 = -Q_\infty H^T[\sigma_1^2 + H Q_\infty H^T]^{-1}$, and so it is true for all $i \in \{1, \ldots, n\}$ by induction.

Suppose the prior distribution on $\mu$ is $\mathcal{N}(\tilde\mu, \sigma_\mu^2)$; then the conditional distribution is $\mu \mid \xi_a, \xi_r, \rho^2, \tau \sim \mathcal{N}(c, v^2)$, where

$$v^2 = \left(\sum_{i=1}^{n-1} \frac{(1 + H\Phi_{i+1} D_i)^2}{\sigma_{i+1}^2 + H(\Phi_{i+1} M_i \Phi_{i+1}^T + Q_{i+1})H^T} + \frac{1}{\sigma_1^2 + H Q_\infty H^T} + \frac{1}{\sigma_\mu^2}\right)^{-1}, \qquad (2.31)$$

$$c = v^2 \left(\sum_{i=1}^{n-1} \frac{(1 + H\Phi_{i+1} D_i)(y_{i+1} - H\Phi_{i+1} C_i)}{\sigma_{i+1}^2 + H(\Phi_{i+1} M_i \Phi_{i+1}^T + Q_{i+1})H^T} + \frac{y_1}{\sigma_1^2 + H Q_\infty H^T} + \frac{\tilde\mu}{\sigma_\mu^2}\right).$$

The mathematics and code can be easily unit tested by comparing numerical values of $c$ and $v^2$ with their $O(n^3)$ counterparts $v^2 = (1/\sigma_\mu^2 + \mathbf{1}^T(\Sigma + K)^{-1}\mathbf{1})^{-1}$ and $c = v^2(\mathbf{1}^T(\Sigma + K)^{-1}Y + \tilde\mu/\sigma_\mu^2)$.

2.3 Full Model Specification

It is time to give a complete description of the proposed model for arrival times. Suppressing the augmentation of $f'$ and $f''$, the model reads

$$\begin{aligned}
\xi_a \mid \Lambda, f, \mu &\sim \mathrm{PPP}(\Lambda\sigma(\mu + f(\cdot)),\, D_a), \\
\xi_r \mid \Lambda, f, \mu &\sim \mathrm{PPP}(\Lambda - \Lambda\sigma(\mu + f(\cdot)),\, D_r), \\
f \mid \rho^2, \tau &\sim \mathrm{GP}(0,\, K^{\nu=5/2}_{\rho^2,\tau}(\cdot, \cdot)), \\
\mu &\sim \mathcal{N}(0, \sigma_\mu^2), \\
\Lambda &\sim \mathrm{Ga}(l_1, l_2), \\
\rho^2 &\sim \mathrm{IG}\left(\frac{r_1}{2}, \frac{r_2}{2}\right), \\
\tau &\sim \mathrm{Ga}(t_1, t_2),
\end{aligned} \qquad (2.32)$$

where $D_a$ and $D_r$ are the appropriate mark distributions depending on which link function is used. $\xi_a = (n_a, \{t^a_i, m^a_i\}_{i=1}^{n_a})$, where $t^a_i$ and $m^a_i$ are the time and mark for time-point $i$ in the point pattern $\xi_a$. Similarly $\xi_r = (n_r, \{t^r_i, m^r_i\}_{i=1}^{n_r})$, where $t^r_i$ and $m^r_i$ are the time and mark for point $i$ in the point pattern $\xi_r$. When $\sigma$ is the probit function, $D_a$ is $\mathcal{N}_{[0,\infty)}(\mu + f(\cdot), 1)$ and $D_r$ is $\mathcal{N}_{(-\infty,0]}(\mu + f(\cdot), 1)$. When $\sigma$ is the logistic function, both $D_a$ and $D_r$ are $\mathrm{PG}(1, \mu + f(\cdot))$ distributions.

As alluded to earlier, it is intuitive to regard the pairs of times and marks contained in $\xi_a$ and $\xi_r$ as providing noisy observations of $\mu + f$ over time. At present one has separate enumerations of the points in $\xi_a$ and in $\xi_r$. It is more convenient when stating the full conditional distributions to have an enumeration of the union of these points and, for the purposes of the state-space representation, it is helpful to choose an enumeration such that the time points are ordered. For this reason it is helpful to define three new quantities $t_{1:(n_a+n_r)}$, $y_{1:(n_a+n_r)}$ and $\sigma^2_{1:(n_a+n_r)}$, and to specify full conditional distributions in terms of these instead of the original marks. It is also sometimes helpful to use these quantities to define $\Sigma = \mathrm{Diag}(\sigma_1^2, \ldots, \sigma_{n_a+n_r}^2)$ and $K$, the covariance matrix resulting from the Matern covariance kernel. In the case when $\sigma$ is the logistic function, define two new sets $\mathcal{Y}^a = \{(t^a_i, 1/(2m^a_i), 1/m^a_i)\}_{i=1}^{n_a}$ and $\mathcal{Y}^r = \{(t^r_i, -1/(2m^r_i), 1/m^r_i)\}_{i=1}^{n_r}$. The first element of each 3-tuple is once again a time, the second element corresponds to a noisy observation and the third element corresponds to an observation variance. Let $\mathcal{Y} = \mathcal{Y}^a \cup \mathcal{Y}^r$ and let $\mathcal{Y}_{1:(n_a+n_r)}$ denote an enumeration of the elements of this set such that the times are ordered, i.e. $\mathcal{Y}_{1,1} \le \mathcal{Y}_{2,1} \le \cdots \le \mathcal{Y}_{(n_a+n_r),1}$. The noisy observations $y_{1:(n_a+n_r)}$ can then be defined as $\mathcal{Y}_{1:(n_a+n_r),2}$, with observation variances $\sigma^2_{1:(n_a+n_r)}$ defined as $\mathcal{Y}_{1:(n_a+n_r),3}$, occurring at times $t_{1:(n_a+n_r)}$ defined as $\mathcal{Y}_{1:(n_a+n_r),1}$.

The case when $\sigma$ is the probit function is considerably easier: it is the same as the previous definition except $\mathcal{Y}^a = \{(t^a_i, m^a_i, 1)\}_{i=1}^{n_a}$ and $\mathcal{Y}^r = \{(t^r_i, m^r_i, 1)\}_{i=1}^{n_r}$.
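A minimal sketch of this reindexing in the logistic case (the mark values below are made up; in the sampler they would be Polya-Gamma draws):

```python
# Merge the marked points of xi_a and xi_r into a single time-ordered
# sequence of pseudo-observations (t_i, y_i, sigma_i^2).
obs_points = [(0.12, 2.1), (0.80, 0.4), (0.33, 1.3)]   # (t_i^a, m_i^a)
thinned_points = [(0.55, 0.9), (0.21, 3.0)]            # (t_i^r, m_i^r)

Ya = [(t, 1.0 / (2.0 * m), 1.0 / m) for t, m in obs_points]       # accepted
Yr = [(t, -1.0 / (2.0 * m), 1.0 / m) for t, m in thinned_points]  # thinned
Y = sorted(Ya + Yr)                                    # order by time

ts = [row[0] for row in Y]      # t_{1:(na+nr)}
ys = [row[1] for row in Y]      # y_{1:(na+nr)}
sig2 = [row[2] for row in Y]    # sigma^2_{1:(na+nr)}
print(ts)
```

Accepted points map to positive pseudo-observations of $\mu + f$ and thinned points to negative ones, each with variance $1/m$, which is exactly the Gaussian pseudo-data the filtering recursions consume.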

2.4 Gibbs Sampling

A cyclical Gibbs sampler can be constructed by iteratively sampling from the following conditional distributions.

$\xi_r \mid \Lambda, f, \mu$

$\xi_r$ is a new independent realization of a marked inhomogeneous Poisson point process with rate $\Lambda - \Lambda\sigma(\mu + f(\cdot))$ and mark distribution $D_r$. When $\sigma$ is the probit function, $m^r_i \mid t^r_i \sim \mathcal{N}_{(-\infty,0]}(\mu + f(t^r_i), 1)$, and when $\sigma$ is the logistic function, $m^r_i \mid t^r_i \sim \mathrm{PG}(1, \mu + f(t^r_i))$.

$\{m^a_i\}_{i=1}^{n_a} \mid \{t^a_i\}_{i=1}^{n_a}, f, \mu$

The unobserved marks of $\xi_a$ are simulated from the mark distribution $D_a$. When $\sigma$ is the probit function, $m^a_i \mid t^a_i \sim \mathcal{N}_{[0,\infty)}(\mu + f(t^a_i), 1)$, and when $\sigma$ is the logistic function, $m^a_i \mid t^a_i \sim \mathrm{PG}(1, \mu + f(t^a_i))$.

$\tau \mid \xi_a, \xi_r, \mu, \rho^2$

$$p(\tau \mid \xi_a, \xi_r, \mu, \rho^2) \propto p(\xi_a, \xi_r \mid \mu, \rho^2, \tau)\, p(\tau) \qquad (2.33)$$
$$\propto \mathcal{N}(y_{1:n_a+n_r} \mid \mathbf{1}\mu, \Sigma + K)\, p(\tau). \qquad (2.34)$$

There is no closed form mathematical expression for the conditional distribution of $\tau$. As this is 1-dimensional, however, this step is easily completed via slice-sampling (see Neal (2003)), which only requires the ability to evaluate the distribution up to a constant of proportionality. The marginal likelihood $p(\xi_a, \xi_r \mid \mu, \tau, \rho^2)$ is easily evaluated in linear time by running the forward filtering steps described in Section 2.2.4 on $\mathcal{Y}$.

$\rho^2 \mid \xi_a, \xi_r, \mu, \tau$

This step, just as for τ, is completed via slice-sampling.

$(F, \mu) \mid \xi_a, \xi_r, \rho^2, \tau$

First $\mu$ is drawn from $\mu \mid \xi_a, \xi_r, \rho^2, \tau$ via the method described in Section 2.2.4 applied to $\mathcal{Y}$. Conditional on the new value of $\mu$, $F$ is sampled via forward filtering backward sampling on $\mathcal{Y}$.

$\Lambda \mid \xi_a, \xi_r$

The $\mathrm{Ga}(l_1, l_2)$ prior is conjugate for the homogeneous Poisson point process with rate $\Lambda$. It follows that

$$\Lambda \mid - \sim \mathrm{Ga}(n_a + n_r + l_1,\, T + l_2). \qquad (2.35)$$

Comments and Alternatives

The reader may wonder how it is justified to sample from $\rho^2 \mid \xi_a, \xi_r, \mu, \tau$ instead of $\rho^2 \mid \xi_a, \xi_r, F, \mu, \tau$, i.e. without conditioning on $F$. Similarly for the $\tau$ step. The above steps are best viewed as sampling blocks $(F, \tau) \mid -$, $(F, \rho^2) \mid -$ and $(F, \mu) \mid -$. It is simply the case that drawing from $F \mid \tau, -$ and $F \mid \rho^2, -$ is unnecessary, because the next step draws from a distribution that does not condition on $F$. There is only one tuning parameter in the proposed MCMC algorithm, namely, the length of the window in the "stepping-out" procedure of slice sampling. A poorly tuned length may result in many likelihood evaluations for sampling the full conditionals of $\rho^2$ and $\tau$. Although likelihood evaluations under this model are inexpensive, there are alternative MCMC approaches for which this issue does not arise. It can be resolved by moving to a discrete prior on $\tau$ and sampling $\rho^2$ from $\rho^2 \mid F, \tau$ instead of $\rho^2 \mid \tau, \xi_a, \xi_r, \mu$. The full conditional distribution $\rho^2 \mid F, \tau$ is a tractable inverse-gamma distribution. It is an open question whether the MCMC exhibits better mixing when sampling $\rho^2$ from $\rho^2 \mid F, \tau$ or from $\rho^2 \mid \tau, \mu, \xi_a, \xi_r$ via slice-sampling. In addition, moving to a discrete prior on $\tau$ loses very little in terms of model flexibility. What is gained is that the sampling step for $\tau$ becomes easy, as $\tau \mid \rho^2, \mu, \xi_a, \xi_r$ is also a discrete distribution, requiring a fixed number of likelihood evaluations which can be computed, if desired, in parallel.
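A sketch of this discrete-$\tau$ step follows, with a made-up stand-in for the log marginal likelihood; in the sampler it would be the forward-filtering evaluation of (2.34) at each grid point:

```python
import math
import random

# Evaluate the (log) marginal likelihood on a fixed grid, normalize with
# log-sum-exp, and draw tau from the resulting categorical distribution.
tau_grid = [0.05, 0.1, 0.2, 0.4, 0.8]
log_prior = [math.log(1.0 / len(tau_grid))] * len(tau_grid)  # uniform prior

def log_marginal_lik(tau):
    # placeholder: in the model this is p(xi_a, xi_r | mu, rho^2, tau),
    # computed by one O(n) filtering pass (these passes could run in parallel)
    return -((math.log(tau) - math.log(0.2)) ** 2)

logw = [log_marginal_lik(t) + lp for t, lp in zip(tau_grid, log_prior)]
mx = max(logw)                         # log-sum-exp stabilization
w = [math.exp(v - mx) for v in logw]
probs = [v / sum(w) for v in w]

rng = random.Random(0)
tau_draw = rng.choices(tau_grid, weights=probs, k=1)[0]
print(probs, tau_draw)
```

The number of likelihood evaluations is fixed at the grid size regardless of where the posterior mass lies, which is the advantage over the stepping-out procedure noted above.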

2.5 Illustrations

2.5.1 Simulated Data

A single realization from an inhomogeneous Poisson point process with rate $\lambda(t) = 400\sigma(-1 + 3\sin(2\pi \cdot 3t))$ is observed. The prior on $\mu$ is $\mathcal{N}(0, 1/4)$, so that with approximately 0.95 prior probability $\mu$ is in the interval $[-1, 1]$. The prior on $\rho^2$ is $\mathrm{Ga}_{[0,4]}(4/2, 1/2)$, which gives approximately 0.9 prior probability that $\rho^2 < 1$. In general it is a good idea to constrain $\mu$ and the GP variance $\rho^2$: because $\mu$ and $f$ are mapped through the link function, there is no reason for these terms to be excessively large. Without informative priors for these parameters, the posterior could potentially support very large values. The non-informative prior on $\Lambda$ is $\mathrm{Ga}(2, l_2)$, with $l_2$ chosen such that with 0.95 prior probability $\Lambda < 400$. The prior on $\tau$ is $\mathrm{Ga}(2, t_2)$, such that with 0.95 prior probability $\tau < 1$. A shape parameter of 2 is used instead of 1 because concentrating prior mass at zero for the time-scale can lead to over-fitting. The posterior summaries of all parameters are shown in figure A.1. The posterior summaries show very good mixing properties of the Markov chain and good learning of the parameters $\mu$ and $\Lambda$. Posterior credible intervals of $\lambda(t)$ are shown in figure 2.2, showing that the intensity function is also learned very well.

2.5.2 Coal Mining Disasters

This dataset records in continuous time a sequence of 191 coal mining explosions resulting in 10 or more fatalities over a period from March 15 1851 until March 22 1962. Since its introduction in Maguire et al. (1952) and subsequent correction by Jarrett (1979), it has become a popular test dataset in the point process literature and has been studied in the works of Adams et al. (2009b) and Rao (2012). The data is translated and scaled so that times occur on the unit interval. The priors used in the analysis are the same as in the previous section. Trace-plots displaying excellent mixing and posterior summaries are shown in figure A.2. The estimated intensity function is shown in figure 2.3, which corroborates the analysis in previous works.

2.6 Discussion

The previous sections described a continuous-time Bayesian model for arrival times. Based on the generative idea of thinning a homogeneous Poisson point process, the intractable likelihood of the inhomogeneous Poisson point process is circumvented. Parameterizing the intensity function with the logistic or probit link function and augmenting the observed and thinned points to have Polya-Gamma or truncated Gaussian marks connects this model with the larger literature on nonparametric regression and time-series, allowing great flexibility in how the intensity function is modelled over time. The approach proposed in this chapter has been to model the linear term of the intensity function as a particular Gaussian process which admits an exact state-space representation in one dimension. This circumvents the cubic complexity usually associated with Gaussian processes and allows draws from the posterior and evaluations of the marginal likelihood to be computed using filtering algorithms with computational complexity that is linear in the number of time-points. The blocked Gibbs sampler exhibits excellent mixing and scales linearly in the problem size. Before this chapter closes it is necessary to discuss some further details that have so far been postponed. Most importantly, the conditional distributions under the state-space representation required for predicting the intensity function at arbitrary new time points, as is required during the thinning process. Necessarily, the appropriate datastructure for storing realized function values that makes prediction at new time points performant by exploiting the temporal nature of the data. Most interestingly, from a computer science perspective, a novel implementation of Gaussian

Figure 2.2: 95% posterior credible band for the intensity function. The true intensity function $\lambda(t) = 400\sigma(-1 + 3\sin(2\pi \cdot 3t))$ is shown by the dashed black line. The observed data is shown by the black ticks on the x-axis.

Figure 2.3: 95% posterior credible band for the intensity function on the coal mine dataset. The observed data is shown by the black ticks on the x-axis.

processes that uses lazy evaluation, narrowing the disconnect between working with Gaussian processes mathematically and computationally.

2.7 A Novel Implementation of Gaussian Processes

2.7.1 Motivation

The term expressiveness in the context of software engineering is the ability of a language to reduce the number of transformations that a problem description must undergo before it can be solved computationally. Usually there is some disconnect between the mathematical description of the problem and how it is coded in a pro- gram. Ideally one would like the code to be as close to the mathematics as possible. This disconnect is inconveniently large when working with Gaussian processes. They are commonly used to model functions, yet are never implemented as such compu- tationally. The usual argument for this is that Gaussian processes are inherently infinite dimensional objects, as for every t P R one must associate fptq, and so it is not possible to work with them directly. The usual problem transformation is to specify ahead of time a finite discretization on which to learn and predict the function and work with the discretized form of f computationally. This amounts to storing function values in an array. This suffices in the simple applications, such as a two-part Gaussian process regression where one has a set of observations on a set of inputs and wishes to predict the function on a new set of inputs. In this case the required discretization of f is known ahead of time, but matters become complicated and messy in applications where it is not. For more complicated operations with Gaussian processes the approach based on discretization soon becomes inelegant. Consider the following motivation. It is very clear from the mathematics how to generate a realization via thinning. One might write a function accepting as input an intensity function, an upper bound thereupon, and an interval over which to simulate. When f is not a function but a discretized version the thinning procedure

30 must undergo a transformation to deal with the discretized version, diverging once again away from the mathematics. In addition, the thinning procedure requires knowing the function value on randomly selected number of inputs that is not known

in advance. There is a further problem. Once the value of fptq has been drawn or realized at a time t, this function value must be conditioned upon when realizing further f values at subsequent times. Informally once the value of f has been seen at an input, then it is not possible to unsee it. The implication of this is that one must maintain a record of all values for times upon which f has been realized. In the following application this leads to an enormous amount of book-keeping. During the thinning operation values of f need to be realized on uniformly drawn times to compute the acceptance probability. Not all of these time are accepted and so the times upon which f has been realized is not equal to the collection of times contained in all observed and augmented realizations. This problem becomes exacerbated in the following application where the model contains up to 60 different realizations from different inhomogeneous Poisson point processes. This soon forces one to search for a more elegant implementation of Gaussian processes than the common discretized implementation. Clearly the best implementation would allow f to remain a function as there is then no divergence of the code from the mathematics. After all, much effort has gone into modelling the arrivals in continuous-time without discretization, it would be nice to avoid discretization in the computation also. The usual objection in imperative languages is that it is not possible work with the infinite dimensional object f, but in functional languages working with infinite dimensional objects is not just common, it is the norm. The difference is in the method of evaluation. In imperative languages, assignments are evaluated eagerly whereas in functional languages assignments are evaluated lazily. Lazy evaluation means that the computation is delayed until is it required. This allows one to work with the entire natural numbers an infinite lists of

MCMC draws, for example. It is perfect for implementing Gaussian processes, as f

can be defined for all t, but computation is only performed when evaluating f(t) for a

time t ∈ R. This section provides an implementation that allows random functions f to be sampled from p(f) or p(f | Y) in a manner that is no different from generating a random variate from a uniform distribution. The random function is a function just like any other; it can be evaluated at any input t or passed around to other parts of the code. The output of the MCMC provides a linked list of functions, not a linked list of arrays corresponding to f values on a random discretization. This implementation is much closer to the mathematics, requiring no problem transformation at all, and results in very clean code.
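The flavour of lazy evaluation described above can be imitated even in an eager language. As a minimal illustration (in Python rather than the Clojure used later, and not code from the thesis), a generator represents an infinite list of MCMC draws, and only the elements actually requested are ever computed.

```python
import itertools

def mcmc_draws(step, state):
    """Lazily yield an infinite stream of MCMC states.

    `step` maps the current state to the next one; nothing is
    computed until a draw is actually requested downstream.
    """
    while True:
        state = step(state)
        yield state

# A toy "sampler" whose step just increments the state.
chain = mcmc_draws(lambda s: s + 1, 0)

# Only the first three elements of the infinite chain are ever computed.
first_three = list(itertools.islice(chain, 3))
```

The generator is the whole (conceptually infinite) chain; `islice` forces just the prefix that is needed, which is exactly the evaluation model the Clojure implementation below relies on.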

2.7.2 Conditional Distributions Under State-Space Construction

The first consideration is a data structure in which to store key-value pairs (t_i, F(t_i)), where F(t) = (f(t), f'(t), f''(t))^T, corresponding to previously realized function values. The desiderata of the data structure will soon become clear after studying the conditional distributions under the state-space representation of the Gaussian process. Suppose the function F has been learned on a set of times T, providing a set of function values F = {F(t) : t ∈ T}. Given a new time t_n one would like to draw

F_n = F(t_n) from p(F_n | F), i.e. conditioning on all previously learned values. From the Markov property of the continuous-time state-space representation, the conditional distribution for F_n is determined by its nearest neighbours in time only. Let t_a = min{t ∈ T : t > t_n} and t_b = max{t ∈ T : t < t_n} (a and b for after and before), with associated F_a = F(t_a) and F_b = F(t_b). It follows from the Markov property

that F_n is conditionally independent of F \ {F_a, F_b} given F_a and F_b. Let

Φ_a = exp(A(t_a - t_n)),    (2.36)

Φ_n = exp(A(t_n - t_b)),    (2.37)

Φ_ab = exp(A(t_a - t_b)),    (2.38)

Q_a = ∫_0^{t_a - t_n} exp(Aτ) L q L^T exp(Aτ)^T dτ,    (2.39)

Q_n = ∫_0^{t_n - t_b} exp(Aτ) L q L^T exp(Aτ)^T dτ,    (2.40)

Q_ab = ∫_0^{t_a - t_b} exp(Aτ) L q L^T exp(Aτ)^T dτ,    (2.41)

denote the required auto-regression and innovation matrices as in equations (2.14); then the joint distribution of F_a and F_n conditional on the previous value F_b is given by

( F_a )           ( ( Φ_a Φ_n F_b )   ( Φ_a Q_n Φ_a^T + Q_a   Φ_a Q_n ) )
( F_n ) | F_b ~ N ( ( Φ_n F_b     ),  ( Q_n Φ_a^T             Q_n     ) ).    (2.42)

This distribution can undergo some simplification. It is also true, by direct forward simulation, that F_a | F_b ~ N(Φ_ab F_b, Q_ab), which can be compared with the marginal in

(2.42). It follows that terms can be simplified as Φ_a Φ_n = exp(A(t_a - t_n)) exp(A(t_n - t_b)) = exp(A(t_a - t_b)) = Φ_ab and Φ_a Q_n Φ_a^T + Q_a = Q_ab. The conditional distribution is then available through multivariate normal theory as

F_n | F_b, F_a ~ N( Φ_n F_b + Q_n Φ_a^T Q_ab^{-1} (F_a - Φ_ab F_b),  Q_n - Q_n Φ_a^T Q_ab^{-1} Φ_a Q_n ).    (2.43)

If the set {t ∈ T : t > t_n} is empty but {t ∈ T : t < t_n} is not, then the function has already been learned at a time before t_n but not after. In this case the conditional distribution is that of forward simulation

F_n | F_b ~ N(Φ_n F_b, Q_n).    (2.44)

If the set {t ∈ T : t < t_n} is empty but {t ∈ T : t > t_n} is not, then the function has already been learned at a time after t_n but not before. In this case the conditional distribution is that of reverse simulation

F_n | F_a ~ N( Q_∞ Φ_a^T Q_∞^{-1} F_a,  Q_∞ - Q_∞ Φ_a^T Q_∞^{-1} Φ_a Q_∞ )    (2.45)
           = N( Φ_a^R F_a, Q_a^R ),

where Φ^R_{a,ij} = Φ_{a,ij} for (i, j) ∈ {(1, 1), (1, 3), (2, 2), (3, 1), (3, 3)} and Φ^R_{a,ij} = -Φ_{a,ij} elsewhere. Informally this is because the sign of the first derivative changes when time is reversed. Q^R_a is defined similarly. If both sets are empty, then the function has not been learned anywhere yet. In this case, the distribution of F_n is the stationary distribution

F_n ~ N(0, Q_∞).    (2.46)
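Equation (2.43) and its one-sided special cases are ordinary Gaussian conditioning identities, so they can be sanity-checked numerically. The following is a minimal numpy sketch (illustrative, not code from the thesis); `sandwich_conditional` works for any linear-Gaussian state-space model, and the usage below checks the one-dimensional Wiener case, where (2.43) reduces to the Brownian bridge.

```python
import numpy as np

def sandwich_conditional(F_b, F_a, Phi_n, Q_n, Phi_a, Q_a):
    """Mean and covariance of F_n | F_b, F_a as in equation (2.43).

    (Phi_n, Q_n) propagate from t_b to t_n and (Phi_a, Q_a) from t_n
    to t_a; the composites Phi_ab and Q_ab are rebuilt from them.
    """
    Phi_ab = Phi_a @ Phi_n                    # exp(A(t_a - t_b))
    Q_ab = Phi_a @ Q_n @ Phi_a.T + Q_a        # innovation over [t_b, t_a]
    G = Q_n @ Phi_a.T @ np.linalg.inv(Q_ab)   # gain matrix
    mean = Phi_n @ F_b + G @ (F_a - Phi_ab @ F_b)
    cov = Q_n - G @ Phi_a @ Q_n
    return mean, cov

# 1-d Wiener check: Phi = 1 and Q = elapsed time, with t_b = 0,
# t_n = 0.3, t_a = 0.8, F_b = 0, F_a = 1. Equation (2.43) should then
# reduce to the Brownian bridge mean 0.375 and variance 0.1875.
I = np.eye(1)
m, C = sandwich_conditional(np.zeros(1), np.ones(1), I, 0.3 * I, I, 0.5 * I)
```

Agreement with the closed-form bridge gives an independent check that the gain and covariance terms have been assembled correctly.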

Clearly, in order to efficiently realize F at a new time point t, it is necessary to efficiently retrieve its nearest neighbours in time from the data structure. In addition, as F is progressively learned through its repeated use in different MCMC steps, it must be possible to efficiently insert new key-value pairs into the data structure while preserving the ability to quickly retrieve nearest neighbours of future time points.

2.7.3 The Persistent AVL-Tree

When talking about key-value pairs in the current context, the keys are times on the real line and the values are the corresponding function values. As they are real valued, the keys possess a natural ordering, which is what makes fast retrieval of nearest neighbours possible. The appropriate data structure is any implementation of a “sorted map”. In C++ this is implemented using a red-black tree, whereas in Clojure it is implemented as a persistent (immutable) AVL-tree. The tree structure of the AVL-tree, based on the ordering of its keys, allows

nearest neighbours to be looked up with O(log N) complexity, where N is the total

number of key-value pairs stored. Once the function value has been realized at this

new time, the key-value pair (t_n, F(t_n)) must be added to the data structure, as it is necessary to condition upon it when realizing F at subsequent new times. Insertion into the AVL-tree is an operation that also enjoys logarithmic complexity. Moreover, in contrast to other types of binary tree, the AVL-tree is self-balancing, meaning that the maximum depth is kept as small as possible. This ensures logarithmic complexity of retrieval of nearest neighbours, or indeed of the function values themselves if one asks for a time that has already been realized, as F is progressively realized through multiple usages in the code.
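The lookup pattern the sampler needs can be sketched with Python's standard library. This is an illustrative stand-in, not the thesis implementation: a sorted list with `bisect` gives the same O(log N) nearest-neighbour search, though insertion into a Python list is O(N) rather than the O(log N) of a balanced tree.

```python
import bisect

class SortedStore:
    """Minimal sorted key-value store illustrating the two lookups the
    sampler needs: strictly-before and strictly-after neighbours.
    """
    def __init__(self):
        self._keys, self._vals = [], []

    def insert(self, t, F):
        i = bisect.bisect_left(self._keys, t)
        self._keys.insert(i, t)       # O(N) here; O(log N) in an AVL-tree
        self._vals.insert(i, F)

    def nearest_before(self, t):
        """Largest key strictly less than t, or None."""
        i = bisect.bisect_left(self._keys, t)
        return (self._keys[i - 1], self._vals[i - 1]) if i > 0 else None

    def nearest_after(self, t):
        """Smallest key strictly greater than t, or None."""
        i = bisect.bisect_right(self._keys, t)
        return (self._keys[i], self._vals[i]) if i < len(self._keys) else None
```

The strict inequalities mirror the sets {t ∈ T : t < t_n} and {t ∈ T : t > t_n} used in the conditional distributions above.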

2.7.4 Example Implementation

Listing 2.1 shows the definition of a function which produces lazily evaluated realizations of F, written in Clojure.

1  (defn sample-gp [known-values gp-var gp-time-scale]
2    (let [cond-set (atom (into (avl/sorted-map) known-values))]
3      (with-meta
4        (fn [t]
5          (if (contains? @cond-set t) (@cond-set t)
6            (let [bef (avl/nearest @cond-set < t)
7                  aft (avl/nearest @cond-set > t)
8                  has-bef (not (nil? bef))
9                  has-aft (not (nil? aft))
10                 has-both (and has-bef has-aft)]
11              (cond
12                has-both (let [[tb Fb] bef
13                               [ta Fa] aft
14                               delay-v (- t tb)
15                               delay-n (- ta t)
16                               Ft (sample-sandwich Fb delay-v Fa delay-n gp-var gp-time-scale)]
17                           (do (swap! cond-set assoc t Ft) Ft))
18                has-bef (let [[tb Fb] bef
19                              delay (- t tb)
20                              Ft (sample-neighbour Fb delay gp-var gp-time-scale)]
21                          (do (swap! cond-set assoc t Ft) Ft))
22                has-aft (let [[ta Fa] aft
23                              delay (- ta t)
24                              Ft (sample-neighbour Fa delay gp-var gp-time-scale)]
25                          (do (swap! cond-set assoc t Ft) Ft))
26                :else (let [Ft (sample-mvn (statcov gp-var gp-time-scale))]
27                        (do (swap! cond-set assoc t Ft) Ft))))))
28        {:var gp-var :time-scale gp-time-scale :data cond-set})))

Listing 2.1: A function for sampling a lazily evaluated realization of a Gaussian process

The sample-gp function is an example of a closure, a concept introduced by Peter Landin in 1964 and popularized by the Scheme programming language. It is a function that returns a new function with its own environment, in which an AVL-tree

is bound to the name cond-set (short for conditioning set; it will store the realized times and function values). The function that it returns is defined between lines 4 and 27. Whenever this new function is evaluated at a new time t, it first checks to see if this key is present in the AVL-tree, i.e. if the function has already been evaluated at this input. If it is present, the function returns F(t); if not, it attempts to find the keys nearest to t, in logarithmic time, and binds these to bef and aft (previously “b” and “a”). The conditions that can arise are that t has a nearest neighbour both before and after, only before, only after, or that there are currently no keys in the AVL-tree. In the case when t is sandwiched between two other times already present in the AVL-tree, a value for F(t) is drawn from equation (2.43). The other cases are handled similarly. The new key-value pair (t, F(t)) is inserted into the AVL-tree (in logarithmic time) before returning the result to the user. The behaviour from the user's perspective is as follows

core> (def mygp (sample-gp {} 3 1))
#core/mygp
core> (mygp 0.5)
#vector [2.5855374199430243, 0.741972115586161, -10.952079103500354]
core> (mygp 10.5)
#vector [-1.1879519809380346, 3.6235009428526337, -13.156076870588237]
core> (mygp 10.51)
#vector [-1.1523807108872246, 3.4913880171082368, -14.086361936311798]

i.e. mygp is a lazily evaluated realization of a Gaussian process with variance 3 and time-scale 1. It behaves like any other function in Clojure. By inspecting the metadata it is possible to see the internal AVL-tree become updated with new key-value pairs as the function is subsequently evaluated.

core> (def mygp (sample-gp {} 3 1))
#core/mygp
core> (meta mygp)
{:var 3, :time-scale 1, :data #atom[{} 0x5164dd9a]}
core> @(:data (meta mygp))
{}
core> (mygp 1)
#vector [0.3017980988600516, 2.101152522085235, 2.327574588156968]
core> @(:data (meta mygp))
{1 #vector [0.3017980988600516, 2.101152522085235, 2.327574588156968]}
core> (mygp 1.5)
#vector [1.40519429958541, 2.1969074173882324, -1.312884773740397]
core> @(:data (meta mygp))
{1 #vector [0.3017980988600516, 2.101152522085235, 2.327574588156968],
 1.5 #vector [1.40519429958541, 2.1969074173882324, -1.312884773740397]}

This corresponds to a realization of F from its prior distribution. The same code can be used to generate lazily evaluated realizations of F from the posterior distribution

also. The procedure is in two parts. Suppose one has a set of observations {(t_i, y_i)}_{i=1}^n. The FFBS algorithm is first used to realize F at the times {t_i}_{i=1}^n. These times and function values are used to initialize the AVL-tree, passed to the sample-gp function in

the first known-values argument. For a new time t ∉ {t_i}_{i=1}^n the posterior distribution is conditionally independent of {y_i}_{i=1}^n given {F_i}_{i=1}^n, and so the conditional distribution is p(F(t) | F_a, F_b) as before. The main points are that the self-balancing AVL-tree maintains logarithmic complexity of insertion and lookup operations as more and more function values are realized. This means that looking up F(t) for a previously evaluated t, in addition to sampling a new value F(t) | F_a, F_b for a new t, is fast. All the book-keeping associated with keeping track of the times at which F has been realized is handled internally and hidden from the user. The function F behaves like any other function in the language and can be passed to other functions, such as the thinning function used to generate realizations. Moreover, the reference to the AVL-tree is atomic, meaning that this function is thread safe should one wish to use it in a parallel application. In the following application, sampling the large number of augmented independent realizations as well as their hyperparameters can be performed in parallel. Thread safety in this context means that there is no race condition on the AVL-tree.
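The same closure-plus-memoization pattern can be sketched in Python. The following is a simplified, hypothetical analogue of sample-gp (not code from the thesis): it lazily realizes a Wiener-process path rather than the Matern process, so the conditional draws are the elementary Brownian bridge and forward-simulation formulas instead of equations (2.43)-(2.44), and a plain sorted list stands in for the persistent AVL-tree.

```python
import bisect
import random

def sample_wiener(seed=None):
    """Closure returning a lazily realized Wiener-process path.

    Same memoize-and-condition pattern as Listing 2.1, but with the
    simpler Brownian conditionals and the path pinned at W(0) = 0
    instead of a draw from the stationary distribution.
    """
    rng = random.Random(seed)
    keys, vals = [0.0], [0.0]              # realized times and values

    def realize(t):
        if t < 0:
            raise ValueError("path defined for t >= 0 only")
        i = bisect.bisect_left(keys, t)
        if i < len(keys) and keys[i] == t:
            return vals[i]                 # already realized: look it up
        tb, wb = keys[i - 1], vals[i - 1]  # nearest realized time before t
        if i < len(keys):                  # bridge between the neighbours
            ta, wa = keys[i], vals[i]
            mean = wb + (t - tb) / (ta - tb) * (wa - wb)
            var = (t - tb) * (ta - t) / (ta - tb)
        else:                              # forward simulation past the end
            mean, var = wb, t - tb
        w = rng.gauss(mean, var ** 0.5)
        keys.insert(i, t)                  # memoize the new realization
        vals.insert(i, w)
        return w

    return realize
```

As with mygp above, repeated evaluation at the same time returns the stored value, and every new evaluation conditions on everything realized so far.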

3

Detecting Multiplexing in Neuronal Spike Trains

This chapter continues the former by using the previous model in an application within neuroscience. In this application the arrival times are the times at which a neuron (cell) discharges (fires) in response to an external stimulus. The recorded point pattern is referred to in the literature as a spike train. Understanding temporal patterns in spike trains is the focus of neural encoding - how the brain is able to encode information from the external environment. Encoding of information can be at the individual cell level, through patterns in the spike train, or at a population level, with multiple cells encoding the information as an ensemble. How the brain encodes information from a single external stimulus is considered to be well understood, yet this is not the case when the subject is presented with multiple external stimuli. The following application is motivated by the desire to understand this better. The experimental setup, with some simplification of the details, is as follows. A subject is played a sound at a frequency of A Hertz and multiple spike trains are recorded from a single cell. The subject is subsequently played a sound at a frequency of B Hertz and multiple spike trains are recorded from the same cell. The subject is then played both sounds simultaneously and multiple spike trains are recorded

from the same cell. Each spike train measurement will be referred to as a trial. The single sound trials will be referred to as A and B, whereas the dual sound trials are simply

referred to as dual sound trials. The collection of ^An A trials, ^Bn B trials and ^Dn dual trials will be referred to as a triplet, which corresponds to a single cell. Following data collection, the analysis concerns discovering relationships between the dual and single sound trials. There are multiple hypotheses, based on empirical observations, for how the brain encodes information from multiple stimuli. A simple hypothesis is that within a population some cells fire like A and some like B. The respective proportions are something that needs to be learned, and these proportions may depend on the frequencies of each sound. An alternative is that each cell fires at an intermediate rate, somewhere between the firing rate of each single sound trial, yet this hypothesis is less supported empirically because behaviour differs from that observed when the subject is played a single sound at a frequency somewhere between sounds A and B. A more novel and exciting hypothesis is the possibility of temporal multiplexing - a technique used in telecommunications to transmit multiple signals across a single channel. This behaviour is characterized by interleaving periods of firing like A and firing like B. If such behaviour is present then it is desired to learn the period of multiplexing - how quickly the cell switches between firing like A and firing like B. This behaviour will also be referred to as switching or dynamic. The most common analyses in neuroscience are not able to detect within-trial dynamic behaviour. It is most common to regard repeated recorded spike trains as repeated measurements of the same object, aggregating them together to form a sufficient statistic. When repeated trials are pooled together, this smooths out any trial-specific temporal dependence. In addition, the data is not modelled in continuous time. Instead, an arbitrary choice of bin-width is made and the time-points are binned to form count data - the number of time-points in each bin. Avoiding the

arbitrary choice of bin-width is already desirable, but if the period of multiplexing is shorter than the bin-width then this transformation of the data will obscure the temporal behaviour. For these reasons new statistical tools are required. There are two sides to the analysis. One side is to build a classifier that is able to detect the presence of dynamic behaviour. The other side concerns learning properties of the cell. For instance, suppose the cell encodes information by sometimes firing like A and sometimes like B. If some trials are like A and others are like B, then it is desired to understand with what probability a new trial will be like A or like B. Similarly, dynamic behaviour might only be present in a subset of the trials, in which case it is desired to understand with what probability a new trial will exhibit dynamic behaviour. The period of multiplexing might also differ between trials - some trials may multiplex at different rates - and so the cell's distribution of multiplexing periods is also desirable to learn. The statistical challenge is to develop a model that is flexible enough to quantify evidence for each of these different hypotheses, while at the same time avoiding the dangers of making a model that is too flexible. The latter concerns over-fitting the data and having too many parameters to be meaningfully learned. Indeed the analysis is ambitious, attempting to learn many quantities of interest from data with only a moderate signal-to-noise ratio. On simulated datasets, the model presented in this chapter learns some quantities of interest well, but others less so.

3.1 Statistical Model for Multiplexing

The variety of different types of data necessitates a specific notation. There are three kinds of trial, A, B and D, and for each trial kind there are ^An, ^Bn and ^Dn spike trains recorded respectively. Each spike train is modelled as a realization from an inhomogeneous Poisson point process, for which additional augmented realizations are necessary. Instead of having four subscripts separated by commas, the following

notational rules are adopted. The left superscript denotes the type of trial, the right superscript denotes the trial number, the left subscript denotes the realization type and the right subscript denotes the point in an enumeration of the point pattern. If no left superscript is present, the quantity corresponds to a dual trial. The reason for doing this is that some derived quantities will not be indexed by all four parameters, and so if one were using only subscripts one would end up with quantities possessing variable numbers of subscripts, whose meaning is difficult to keep track of. Although having left and right sub- and superscripts is unsightly, it is at least consistent. To begin, each A and B single sound trial is modelled as a realization from an

inhomogeneous Poisson point process with rate ^Aλ(t) = ^AΛ σ(^Aµ + ^Af(t)) and ^Bλ(t) = ^BΛ σ(^Bµ + ^Bf(t)) respectively. Each dual trial is modelled as a realization from an inhomogeneous Poisson point process with rate λ^i(t) = α^i(t) ^Aλ(t) + (1 - α^i(t)) ^Bλ(t), where each realization has its own weight function α^i : T → [0, 1], structured as α^i(t) = σ(µ^i + g^i(t)). The link function σ once again can be the probit or logistic function. The weight function describes mathematically the extent to which the dual sound trial behaviour is weighted toward single sound trials A or B. The different experimental hypotheses translate to different forms of α^i(t). There is a time-averaged or static component µ^i and a dynamic component g^i(t). To accommodate purely static behaviour, g^i(t) is modelled as a mixture of the zero function (δ(t) = 0 for all t ∈ T) and a zero-mean Matern Gaussian process with weights (1 - θ) and θ respectively. Dual sound trials for which g^i(t) is non-zero will be referred to as dynamic, otherwise static. The parameter θ is a probability of generating dynamic trials that is intrinsic to each cell and is a quantity of interest. This motivates a non-informative B(1, 1) prior over θ. Statements about the firing rate translate to statements about the intensity function. To illustrate the behaviour that this model can capture, consider the following

examples. When g^i(t) = 0 for all t, the dual sound intensity is a static superposition of the single sound intensities with weights σ(µ^i) and (1 - σ(µ^i)). When µ^i is large in magnitude and positive, the trial is firing like A, whereas when it is large in magnitude and negative, the trial is firing like B. When µ^i is moderate in magnitude, the firing rate is somewhere between that of A and B. µ^i is of course allowed to vary between trials, hence the superscript i. There is a hypothesis that the cell sometimes fires like A and sometimes like B. In order to learn how the cell decides new µ^i values, the distribution of the µ^i's is modelled as a mixture of

normals. When g^i(t) is non-zero, the firing rate is between the firing rates of A and B with dynamic dependence. The multiplexing hypothesis corresponds to functions

g^i(t) that exhibit high and low swings with periodic behaviour. The period may vary between trials, and so in order to learn the distribution of periods, the time-scale of the Gaussian process is modelled with a discrete prior combined with a Dirichlet prior on the probability vector. Learning the probability vector corresponds to learning the time-scales favoured by the cell.
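To make the multiplexing hypothesis concrete, the following Python sketch simulates a single dual-sound trial by thinning. The ingredients are hypothetical stand-ins rather than draws from the model: constant single-sound rates, µ^i = 0 and a sinusoidal g^i, none of which come from the thesis.

```python
import math
import random

def simulate_dual_trial(lam_a, lam_b, alpha, T, bound, seed=None):
    """Thinning (Lewis-Shedler) simulation of one dual-sound trial with
    intensity alpha(t) * lam_a(t) + (1 - alpha(t)) * lam_b(t) on [0, T].
    `bound` must dominate that intensity everywhere on [0, T].
    """
    rng = random.Random(seed)
    spikes, t = [], 0.0
    while True:
        t += rng.expovariate(bound)        # candidate from the bounding process
        if t > T:
            return spikes
        rate = alpha(t) * lam_a(t) + (1 - alpha(t)) * lam_b(t)
        if rng.random() < rate / bound:    # accept with probability rate/bound
            spikes.append(t)

# Hypothetical ingredients: constant single-sound rates 40 and 10, and
# a sinusoidal g(t) so the trial multiplexes between firing like A and
# firing like B over the window [0, 1].
sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
alpha = lambda t: sigmoid(3.0 * math.sin(2.0 * math.pi * t))
train = simulate_dual_trial(lambda t: 40.0, lambda t: 10.0, alpha, 1.0, 50.0, seed=1)
```

When α^i(t) swings between 0 and 1, the accepted points cluster alternately at the A-like and B-like rates, which is exactly the switching signature the model is designed to detect.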

3.2 Data Augmentation

Each trial, modelled as a realization from an inhomogeneous Poisson point process, requires additional unobserved realizations to be augmented to the model in order to

make the likelihood tractable. Let ^A_aξ^i = (^A_an^i, {(^A_at^i_j, ^A_am^i_j)}_{j=1}^{^A_an^i}) denote the realization from an inhomogeneous Poisson point process with intensity ^Aλ and appropriate mark distribution, corresponding to the i'th A trial with unobserved marks. As

before, an additional realization ^A_rξ^i from an inhomogeneous Poisson point process with intensity ^AΛ - ^Aλ and appropriate mark distribution is required to make the

likelihood tractable. The realizations ^B_aξ^i and ^B_rξ^i for the B trials are defined similarly for i ∈ {1, ..., ^Bn}.

The realization corresponding to the i'th observed dual sound trial is denoted

^D_oξ^i and is modelled as an inhomogeneous Poisson point process with rate λ^i(t) = α^i(t) ^Aλ(t) + (1 - α^i(t)) ^Bλ(t), namely a superposition of two intensities. By the superposition theorem for the Poisson point process, see Kingman (1993), this can be

modelled in a generative way by assuming that {^D_ot^i_j}_{j=1}^{^D_on^i} = {^D_1t^i_j}_{j=1}^{^D_1n^i} ∪ {^D_4t^i_j}_{j=1}^{^D_4n^i}, where {^D_1t^i_j}_{j=1}^{^D_1n^i} and {^D_4t^i_j}_{j=1}^{^D_4n^i} are realizations of inhomogeneous Poisson point processes with rates α^i(t) ^Aλ(t) and (1 - α^i(t)) ^Bλ(t) respectively. The knowledge of whether a time ^Dt^i_j came from the former or the latter realization is assumed lost in the observed data and must be imputed. These realizations contain information about ^Aλ, ^Bλ and

α^i, and so to get a tractable full conditional for g^i, (^Af, ^Aµ) and (^Bf, ^Bµ), the same strategy of adding latent truncated-normal or Polya-gamma marks must be used. Moreover, an intractable integral remains in the likelihood. Additional augmented realizations are required in order to remove it, which motivates the following set of augmented realizations

^D_1ξ^i | ^AΛ, ^Af, ^Aµ, µ^i, g^i ~ PPP(α^i(t) ^Aλ(t), ^AD_+, D^i_+, D_1),    (3.1)

^D_2ξ^i | ^AΛ, ^Af, ^Aµ, µ^i, g^i ~ PPP((1 - α^i(t)) ^Aλ(t), ^AD_+, D^i_-),    (3.2)

^D_3ξ^i | ^AΛ, ^Af, ^Aµ ~ PPP(^AΛ - ^Aλ(t), ^AD_-),    (3.3)

^D_4ξ^i | ^BΛ, ^Bf, ^Bµ, µ^i, g^i ~ PPP((1 - α^i(t)) ^Bλ(t), ^BD_+, D^i_-, D_4),    (3.4)

^D_5ξ^i | ^BΛ, ^Bf, ^Bµ, µ^i, g^i ~ PPP(α^i(t) ^Bλ(t), ^BD_+, D^i_+),    (3.5)

^D_6ξ^i | ^BΛ, ^Bf, ^Bµ ~ PPP(^BΛ - ^Bλ(t), ^BD_-),    (3.6)

for all i ∈ {1, ..., ^Dn}. The intensities sum to ^AΛ + ^BΛ and so the likelihood reduces to that of two homogeneous Poisson point processes.

When the link function is the probit function, the mark distribution ^XD_+ is m | t ~ N_[0,∞)(^Xµ + ^Xf(t), 1) and ^XD_- is m | t ~ N_(-∞,0](^Xµ + ^Xf(t), 1) for X ∈ {A, B}. D^i_+ is w | t ~ N_[0,∞)(µ^i + g^i(t), 1) and D^i_- is w | t ~ N_(-∞,0](µ^i + g^i(t), 1). In the logistic case ^XD_+ = ^XD_- is m | t ~ PG(1, ^Xµ + ^Xf(t)) for X ∈ {A, B}, and similarly D^i_+ = D^i_- is w | t ~ PG(1, µ^i + g^i(t)). The mark distributions D_1 and D_4 are trivial discrete distributions with Pr[l = 1 | t] = 1 and Pr[l = 0 | t] = 1 respectively. Augmenting points in these realizations with a binary label to record from which realization a time-point arises is useful for the purposes of imputation, as it is assumed that

^D_oξ^i = (^D_1n^i + ^D_4n^i, {(^D_1t^i_j, ^D_1m^i_j, ^D_1w^i_j, ^D_1l^i_j)}_{j=1}^{^D_1n^i} ∪ {(^D_4t^i_j, ^D_4m^i_j, ^D_4w^i_j, ^D_4l^i_j)}_{j=1}^{^D_4n^i}).

The labels are unobserved but, when imputed, allow the realizations ^D_1ξ^i and ^D_4ξ^i to be reconstructed.

3.3 Complete Model Description

The full model is quite large and so it is best to state it in parts. The realizations and parameters associated with the single sound trials are modelled as

^X_aξ^i | ^XΛ, ^Xf, ^Xµ ~ PPP(^Xλ(t), ^XD_+),    (3.7)

^X_rξ^i | ^XΛ, ^Xf, ^Xµ ~ PPP(^XΛ - ^Xλ(t), ^XD_-),    (3.8)

^Xλ(t) = ^XΛ σ(^Xµ + ^Xf(t)),    (3.9)

^XΛ ~ Ga(l_1, l_2),    (3.10)

^Xµ ~ N(0, σ_µ²),    (3.11)

^Xf | ^Xρ², ^Xτ ~ GP(0, K^{ν=5/2}_{^Xρ², ^Xτ}(·, ·)),    (3.12)

^Xτ ~ Ga(t_1, t_2),    (3.13)

^Xρ² ~ IG(r_1/2, r_2/2),    (3.14)

for all X ∈ {A, B} and i ∈ {1, ..., ^Xn}. The observed and augmented realizations associated with the dual sound trials are modelled as

^D_1ξ^i | ^AΛ, ^Af, ^Aµ, µ^i, g^i ~ PPP(α^i(t) ^Aλ(t), ^AD_+, D^i_+, D_1),    (3.15)

^D_2ξ^i | ^AΛ, ^Af, ^Aµ, µ^i, g^i ~ PPP((1 - α^i(t)) ^Aλ(t), ^AD_+, D^i_-),    (3.16)

^D_3ξ^i | ^AΛ, ^Af, ^Aµ ~ PPP(^AΛ - ^Aλ(t), ^AD_-),    (3.17)

^D_4ξ^i | ^BΛ, ^Bf, ^Bµ, µ^i, g^i ~ PPP((1 - α^i(t)) ^Bλ(t), ^BD_+, D^i_-, D_4),    (3.18)

^D_5ξ^i | ^BΛ, ^Bf, ^Bµ, µ^i, g^i ~ PPP(α^i(t) ^Bλ(t), ^BD_+, D^i_+),    (3.19)

^D_6ξ^i | ^BΛ, ^Bf, ^Bµ ~ PPP(^BΛ - ^Bλ(t), ^BD_-),    (3.20)

for all i ∈ {1, ..., ^Dn} with

^D_oξ^i = (^D_1n^i + ^D_4n^i, {(^D_1t^i_j, ^D_1m^i_j, ^D_1w^i_j, ^D_1l^i_j)}_{j=1}^{^D_1n^i} ∪ {(^D_4t^i_j, ^D_4m^i_j, ^D_4w^i_j, ^D_4l^i_j)}_{j=1}^{^D_4n^i}). The priors on parameters associated with the dual sound trials are given by

α^i(t) = σ(µ^i + g^i(t)),    (3.21)

g^i | γ^i, ^Dρ², ^Dτ^i ~ γ^i GP(0, K^{ν=5/2}_{^Dρ², ^Dτ^i}(·, ·)) + (1 - γ^i) δ,    (3.22)

γ^i | θ ~ Bern(θ),    (3.23)

θ ~ B(b_1, b_2),    (3.24)

p(^Dτ^i | p) ∝ Σ_{k=1}^{K} p_k δ(^Dτ^i - τ_k),    (3.25)

p ~ Dir(a_1, ..., a_K),    (3.26)

^Dρ² ~ IG(r/2, r ρ_0²/2),    (3.27)

µ^i | µ̃^i ~ N(µ̃^i, c σ_µ²),    (3.28)

µ̃^1, ..., µ̃^{^Dn} ~ G,    (3.29)

G ~ DP(a, π_0),    (3.30)

where the base measure π_0 is a N(0, σ_µ²) distribution and c is a positive constant less than 1.

3.4 The Euler-Maruyama Approximation

Before discussing MCMC for this model it is necessary to discuss the computational burden and the approximations that can be used to improve performance. A back-of-the-envelope calculation shows that each D trial contributes an expected number ^AΛ of noisy measurements of ^Af and ^BΛ of ^Bf, given all augmented realizations. Similarly, each A trial contributes an expected number ^AΛ of noisy measurements of ^Af, and each B trial contributes an expected number ^BΛ of noisy measurements of ^Bf. With 15 trials of each type and typical values of ^AΛ ≈ ^BΛ ≈ 400, this amounts to approximately 12000 data points each. These noisy measurements are uniformly distributed in time over T, forming a very dense random discretization in time. Given the proximity of subsequent time points, the Euler-Maruyama approximation, see (Kloeden and Platen, 2011, Chapter 9), to equation (2.13) becomes almost exact numerically. The Euler-Maruyama approximation is essentially a first-order approximation to the solution, resulting in the approximate state equation

F_i = Φ̃_i F_{i-1} + ε_i,    ε_i ~ N(0, Q̃_i),

where

        [ 1          ∆_i         0           ]
Φ̃_i  =  [ 0          1           ∆_i         ]    (3.31)
        [ -λ³∆_i     -3λ²∆_i     1 - 3λ∆_i   ]

        [ 0   0   0                ]
Q̃_i  =  [ 0   0   0                ]
        [ 0   0   (16/3) ρ² λ⁵ ∆_i ]

∆_i = t_i - t_{i-1},    λ = √5 / τ.

These approximate matrices Φ̃_i and Q̃_i can also be motivated as performing series expansions of Φ_i and Q_i, defined in equation (2.14), and retaining terms up to leading order in ∆_i/τ. With such a dense random grid, ∆_i/τ is typically very small and this approximation performs very well. Φ_i and Q_i can then be replaced everywhere by their approximations. The computational advantage comes from the zeros present in these matrices: some linear algebra operations can be avoided, replaced by expressions derived by hand. This mainly avoids the computation of inverses, as the matrices to be inverted turn out to be diagonal.

3.4.1 Forward Filtering

To illustrate this, consider the forward filtering stage. Let M_i be of the form

        [ M_{i,11}   0          M_{i,13} ]
M_i  =  [ 0          M_{i,22}   0        ]    (3.32)
        [ M_{i,31}   0          M_{i,33} ]

It follows from the form of Φ̃_{i+1} that Φ̃_{i+1} M_i Φ̃_{i+1}^T + Q̃_{i+1} is a diagonal matrix given by

                                    [ M_{i,11}   0          0                                               ]
Φ̃_{i+1} M_i Φ̃_{i+1}^T + Q̃_{i+1}  =  [ 0          M_{i,22}   0                                               ]    (3.33)
                                    [ 0          0          (16/3) ∆_{i+1} ρ² λ⁵ + M_{i,33}(1 - 3λ∆_{i+1})² ]

which leads to a complete forward filtering update of

           [ M_{i,11} - M_{i,11}²/(M_{i,11} + σ²_{i+1})   0          0                                               ]
M_{i+1}  = [ 0                                            M_{i,22}   0                                               ]    (3.34)
           [ 0                                            0          (16/3) ∆_{i+1} ρ² λ⁵ + M_{i,33}(1 - 3∆_{i+1}λ)² ]

M_1 is clearly of this form,

       [ ρ² - ρ⁴/(ρ² + σ²_1)   0            -(1/3)λ²ρ² ]
M_1 =  [ 0                      (1/3)λ²ρ²   0          ]    (3.35)
       [ -(1/3)λ²ρ²            0            λ⁴ρ²       ]

and so it follows that deriving M_{i+1} from M_i corresponds to merely updating the diagonals of a diagonal matrix. Similar arguments lead to the expression for m_{i+1} as

           [ M_{i,11}(y_{i+1} - µ - m_{i,1} - m_{i,2}∆_{i+1})/(M_{i,11} + σ²_{i+1}) + m_{i,1} + m_{i,2}∆_{i+1} ]
m_{i+1}  = [ m_{i,2} + m_{i,3}∆_{i+1}                                                                          ]    (3.36)
           [ -λ³ m_{i,1}∆_{i+1} - 3λ² m_{i,2}∆_{i+1} + m_{i,3}(1 - 3∆_{i+1}λ)                                  ]

3.4.2 Backwards Sampling

The backwards sampling steps also enjoy simplified expressions without the need for linear algebra. In particular F_i | F_{i+1}, y_{1:i+1} ~ N(c_i, V_i) where

       [ F_{i+1,1} - m_{i,2}∆_{i+1} ]
c_i =  [ F_{i+1,2} - m_{i,3}∆_{i+1} ]    (3.37)
       [ 3M_{i,33}(1 - 3∆_{i+1}λ)(F_{i+1,3} + λ³m_{i,1}∆_{i+1} + 3λ²m_{i,2}∆_{i+1} - m_{i,3}(1 - 3∆_{i+1}λ)) / (3M_{i,33}(1 - 3∆_{i+1}λ)² + 16λ⁵ρ²∆_{i+1}) + m_{i,3} ]

and

       [ 0          0   M_{i,13}                                                                            ]
V_i =  [ 0          0   0                                                                                   ]    (3.38)
       [ M_{i,31}   0   M_{i,33} - 3M_{i,33}²(1 - 3∆_{i+1}λ)² / (16∆_{i+1}ρ²λ⁵ + 3M_{i,33}(1 - 3∆_{i+1}λ)²) ]

where of course M_{i,31} = M_{i,13} = 0 for all i except i = 1. There is also an argument for using the Euler-Maruyama approximation for numerical stability.

3.4.3 Stability

Equation (2.43) requires taking the inverse of the matrix Q_ab (in the sense of solving a linear system). When the times t_a and t_b are close, this matrix becomes very poorly conditioned and loses numerical accuracy. This is also the case when evaluating p(F_{1:n} | ρ², τ), as this involves terms like ||F_{i+1} - Φ_{i+1}F_i||²_{Q⁻¹_{i+1}}. These become unstable when times are very close to each other, as the conditional distribution essentially degenerates down onto the third dimension, and so it is better to use the Euler-Maruyama approximation to avoid numerical instabilities. The conditional density for F_{1:n} can then be approximated as

p(F_{1:n} | ρ², τ) ≈ ∏_{i=2}^{n} (3/(16ρ²λ⁵∆_i))^{1/2} exp( -(1/2)(3/(16ρ²λ⁵∆_i)) (f''_i - (-λ³∆_i f_{i-1} - 3λ²∆_i f'_{i-1} + (1 - 3λ∆_i) f''_{i-1}))² )
    × (1/|V_∞|^{1/2}) (1/ρ²)^{3/2} exp( -(1/(2ρ²)) F_1^T V_∞^{-1} F_1 ),    (3.39)

where V_∞ = Q_∞/ρ². This can be used with a conjugate inverse-gamma prior on ρ² to obtain an inverse-gamma full conditional distribution. The form of F(t) | F_a, F_b under the Euler-Maruyama approximation is simple but a little cumbersome, and is omitted for the sake of brevity.

3.5 Gibbs Sampling

When discussing the MCMC implementation it helps to break it down into parts. To begin, consider the steps associated with the parameters of the single sound trials.

3.5.1 Steps for Single Trial Specific Parameters

The Gibbs steps for sampling the conditional distributions of ^XF, ^Xµ, ^Xρ² and ^Xτ for X ∈ {A, B} are much the same as in section 2.4, with the modification that dual trials now also contribute noisy measurements of ^Xµ + ^Xf. All of these measurements must be collected and enumerated, preferably in sorted order so as to use the state-space representation. When the logistic link function is used, let

^AS = ∪_{i=1}^{^An} ( {(^A_at^i_j, 1/(2 ^A_am^i_j), 1/^A_am^i_j)}_{j=1}^{^A_an^i} ∪ {(^A_rt^i_j, 1/(2 ^A_rm^i_j), 1/^A_rm^i_j)}_{j=1}^{^A_rn^i} ),
^AD = ∪_{i=1}^{^Dn} ∪_{k=1}^{3} {(^D_kt^i_j, 1/(2 ^D_km^i_j), 1/^D_km^i_j)}_{j=1}^{^D_kn^i},    (3.40)
^AY = ^AS ∪ ^AD;

then ^AY corresponds to a set of noisy measurements of ^Aµ + ^Af. The first element of each 3-tuple is a time, the second is a measurement and the third is the measurement variance. These need to be sorted before using the state-space representation, so for this

reason let ^AN = |^AY| and ^AY_{1:^AN} denote an enumeration of the elements of ^AY such that the times are ordered, i.e. ^AY_{1,1} ≤ ^AY_{2,1} ≤ ... ≤ ^AY_{^AN,1}. An ordered sequence of times can then be defined as ^At_{1:^AN} = ^AY_{1:^AN,1}, with corresponding measurements ^Ay_{1:^AN} = ^AY_{1:^AN,2} and measurement variances ^Aσ²_{1:^AN} = ^AY_{1:^AN,3}. The newly defined ^At_{1:^AN}, ^Ay_{1:^AN} and ^Aσ²_{1:^AN} can then be used to sample the conditional distributions (^AF, ^Aτ) | -, (^AF, ^Aρ²) | - and (^AF, ^Aµ) | - as in section 2.4. For the parameters associated with ^Bλ the definitions of the previously defined quantities change only slightly. The only modification is

^BS = ∪_{i=1}^{^Bn} ( {(^B_at^i_j, 1/(2 ^B_am^i_j), 1/^B_am^i_j)}_{j=1}^{^B_an^i} ∪ {(^B_rt^i_j, 1/(2 ^B_rm^i_j), 1/^B_rm^i_j)}_{j=1}^{^B_rn^i} ),
^BD = ∪_{i=1}^{^Dn} ∪_{k=4}^{6} {(^D_kt^i_j, 1/(2 ^D_km^i_j), 1/^D_km^i_j)}_{j=1}^{^D_kn^i},    (3.41)
^BY = ^BS ∪ ^BD,

from which the process is the same as before. The only difference when the probit link

X i X i X i X i X i x n X i X i x n function is used is that tpx tj, 1{p2x mjq, 1{x mjquj“1 is replaced by tpx tj, x mj, 1quj“1 for each realization. The conditional distribution for X Λ is

$$
{}^{X}\Lambda \mid - \sim \mathrm{Ga}\!\left(l_{1} + {}^{X}N,\; l_{2} + {}^{X}n\,T + {}^{D}n\,T\right). \tag{3.42}
$$
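A minimal sketch of this conjugate Gamma draw, assuming a shape/rate parameterization of $\mathrm{Ga}$; the hyperparameter and count values below are illustrative, not taken from the text.

```python
import numpy as np

# Sketch of the Gamma full-conditional draw in (3.42). The shape/rate
# parameterization is an assumption, and l1, l2, the measurement count N
# and the total exposure terms are illustrative stand-ins.
rng = np.random.default_rng(0)

def sample_Lambda(l1, l2, N, exposure_single, exposure_dual, rng):
    """Draw Lambda | - ~ Ga(l1 + N, l2 + exposure_single + exposure_dual)."""
    shape = l1 + N
    rate = l2 + exposure_single + exposure_dual
    return rng.gamma(shape, 1.0 / rate)  # numpy's gamma takes scale = 1/rate

draw = sample_Lambda(l1=15, l2=0.05, N=120,
                     exposure_single=15.0, exposure_dual=20.0, rng=rng)
```

The draw is strictly positive, as a rate parameter must be.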

3.5.2 Steps for Augmented Realizations

The realizations ${}^{A}_{r}\xi^{i}$ for $i \in \{1, \ldots, {}^{A}n\}$, ${}^{B}_{r}\xi^{i}$ for $i \in \{1, \ldots, {}^{B}n\}$ and ${}^{D}_{k}\xi^{i}$ for $i \in \{1, \ldots, {}^{D}n\}$ and $k \in \{2, 3, 5, 6\}$ are independent realizations of the marked inhomogeneous Poisson point processes defined in (3.7) and (3.15). The steps for sampling marks for the remaining dual sound trials are as follows

$$
{}^{D}_{o}l^{i}_{j} \mid {}^{D}_{o}t^{i}_{j}, {}^{A}\lambda(\cdot), {}^{B}\lambda(\cdot), \alpha^{i}(\cdot) \sim \mathrm{Bern}\!\left( \frac{\alpha^{i}({}^{D}_{o}t^{i}_{j})\, {}^{A}\lambda({}^{D}_{o}t^{i}_{j})}{\alpha^{i}({}^{D}_{o}t^{i}_{j})\, {}^{A}\lambda({}^{D}_{o}t^{i}_{j}) + (1 - \alpha^{i}({}^{D}_{o}t^{i}_{j}))\, {}^{B}\lambda({}^{D}_{o}t^{i}_{j})} \right) \tag{3.43}
$$
and

$$
\begin{aligned}
{}^{D}_{o}m^{i}_{j} \mid {}^{D}_{o}t^{i}_{j}, {}^{A}\lambda(\cdot), {}^{B}\lambda(\cdot), \alpha^{i}(\cdot), {}^{D}_{o}l^{i}_{j} = 1 &\sim \mathcal{D}^{+} \\
{}^{D}_{o}m^{i}_{j} \mid {}^{D}_{o}t^{i}_{j}, {}^{A}\lambda(\cdot), {}^{B}\lambda(\cdot), \alpha^{i}(\cdot), {}^{D}_{o}l^{i}_{j} = 0 &\sim \mathcal{D}^{+} \\
{}^{D}_{o}w^{i}_{j} \mid {}^{D}_{o}t^{i}_{j}, {}^{A}\lambda(\cdot), {}^{B}\lambda(\cdot), \alpha^{i}(\cdot), {}^{D}_{o}l^{i}_{j} = 1 &\sim \mathcal{D}^{+} \\
{}^{D}_{o}w^{i}_{j} \mid {}^{D}_{o}t^{i}_{j}, {}^{A}\lambda(\cdot), {}^{B}\lambda(\cdot), \alpha^{i}(\cdot), {}^{D}_{o}l^{i}_{j} = 0 &\sim \mathcal{D}^{-}.
\end{aligned} \tag{3.44}
$$

The newly sampled labels determine ${}^{D}_{1}\xi^{i}$ and ${}^{D}_{4}\xi^{i}$. Explicitly, let $O^{i} = \{({}^{D}_{o}t^{i}_{j}, {}^{D}_{o}m^{i}_{j}, {}^{D}_{o}w^{i}_{j}, {}^{D}_{o}l^{i}_{j})\}_{j=1}^{{}^{D}_{o}n^{i}}$ and define subsets $L^{i} = \{(t, m, w, l) \in O^{i} : l = 1\}$ and $R^{i} = \{(t, m, w, l) \in O^{i} : l = 0\}$; then ${}^{D}_{1}\xi^{i}$ and ${}^{D}_{4}\xi^{i}$ are given by
$$
{}^{D}_{1}\xi^{i} = (|L^{i}|, L^{i}), \qquad {}^{D}_{4}\xi^{i} = (|R^{i}|, R^{i}). \tag{3.45}
$$
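The label-then-partition steps of (3.43)-(3.45) can be sketched as follows; the intensities, weight function and spike times below are illustrative stand-ins for ${}^{A}\lambda$, ${}^{B}\lambda$ and $\alpha^{i}$, not values from the text.

```python
import numpy as np

# Sketch of the label step (3.43)-(3.45): each dual-trial spike time is
# assigned to the A or B stream with probability proportional to
# alpha(t)*lamA(t) versus (1 - alpha(t))*lamB(t), and the resulting
# partition defines the two thinned realizations.
rng = np.random.default_rng(1)

def sample_labels_and_split(times, alpha, lamA, lamB, rng):
    times = np.asarray(times)
    wA = alpha(times) * lamA(times)
    wB = (1.0 - alpha(times)) * lamB(times)
    probs = wA / (wA + wB)                   # Bernoulli probability in (3.43)
    labels = rng.random(times.size) < probs  # True -> assigned to stream A
    L = times[labels]                        # analogue of the set L^i
    R = times[~labels]                       # analogue of the set R^i
    return (L.size, L), (R.size, R)          # analogues of the two realizations

xi1, xi4 = sample_labels_and_split(
    times=[0.1, 0.3, 0.6, 0.9],
    alpha=lambda t: 0.5 + 0.0 * t,           # static weight function
    lamA=lambda t: 400.0 + 0.0 * t,
    lamB=lambda t: 100.0 + 0.0 * t,
    rng=rng,
)
```

Every spike ends up in exactly one of the two sets, so the partition sizes sum to the number of spikes.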

3.5.3 Steps for Dual Trial Specific Parameters

For each dual trial $i$, define ${}^{D}Y^{i} = \bigcup_{k \in \{1,2,4,5\}} \{({}^{D}_{k}t^{i}_{j}, {}^{D}_{k}w^{i}_{j}, 1)\}_{j=1}^{{}^{D}_{k}n^{i}}$ and $N^{i} = |{}^{D}Y^{i}|$. These correspond to all noisy measurements of the function $g^{i}$ across all realizations. Once again let ${}^{D}Y^{i}_{1:N^{i}}$ denote an enumeration such that the times are ordered; the times $t^{i}_{1:N^{i}}$, measurements $y^{i}_{1:N^{i}}$ and variances $(\sigma^{2})^{i}_{1:N^{i}}$ can then be defined as the first, second and third elements of these ordered 3-tuples. It is also helpful to let these quantities define $\Sigma^{i} = \mathrm{Diag}((\sigma^{2})^{i}_{1}, \ldots, (\sigma^{2})^{i}_{N^{i}})$ and $K^{i}$, the covariance matrix resulting from the Matérn covariance kernel on times $t^{i}_{1:N^{i}}$. As the state-space construction requires augmenting the first and second derivatives of the function $g^{i}$, define
$$
G^{i}(t) = (g^{i}(t), (g^{i})'(t), (g^{i})''(t)).
$$

Conditional $(G^{1:{}^{D}n}, {}^{D}\rho^{2}) \mid -$

The variance term ${}^{D}\rho^{2}$ enters into the likelihood through all dual trials for which $\gamma^{i} = 1$, therefore the full conditional is
$$
p({}^{D}\rho^{2} \mid -) \propto \prod_{i : \gamma^{i} = 1} \prod_{k \in \{1,2,4,5\}} p({}^{D}_{k}\xi^{i} \mid {}^{D}\rho^{2}, {}^{D}\tau^{i}, \gamma^{i}, \mu^{i}, {}^{A}\lambda, {}^{B}\lambda)\, p({}^{D}\rho^{2}) \tag{3.46}
$$
$$
\propto \prod_{i : \gamma^{i} = 1} \mathcal{N}(y^{i}_{1:N^{i}} \mid \mathbf{1}\mu^{i}, \Sigma^{i} + K^{i})\, p({}^{D}\rho^{2}), \tag{3.47}
$$
which can be evaluated in linear time by running forward filtering on each required ${}^{D}Y^{i}$. A new value of ${}^{D}\rho^{2}$ is sampled via slice sampling. Sampling from $G^{1:{}^{D}n} \mid {}^{D}\rho^{2}, -$ needn't be performed in the cyclical Gibbs sampler.

Conditional $(G^{i}, {}^{D}\tau^{i}) \mid -$

The conditional distribution when $\gamma^{i} = 1$ is
$$
p({}^{D}\tau^{i} \mid -) \propto \prod_{k \in \{1,2,4,5\}} p({}^{D}_{k}\xi^{i} \mid {}^{D}\rho^{2}, {}^{D}\tau^{i}, \mu^{i}, \gamma^{i}, {}^{A}\lambda, {}^{B}\lambda)\, p({}^{D}\tau^{i}) \tag{3.48}
$$
$$
\propto \mathcal{N}(y^{i}_{1:N^{i}} \mid \mathbf{1}\mu^{i}, \Sigma^{i} + K^{i})\, p({}^{D}\tau^{i}), \tag{3.49}
$$
which can be computed in linear time by forward filtering on ${}^{D}Y^{i}$. When $\gamma^{i} = 0$ it is merely the prior distribution. This is also sampled via slice sampling, and in the cyclical Gibbs sampler the draw of $G^{i}$ conditional on ${}^{D}\tau^{i}$ needn't be performed.

Conditional $(G^{i}, \mu^{i}) \mid -$

Starting with $\mu^{i} \mid -$, when $\gamma^{i} = 0$ the full conditional is $\mathcal{N}(c^{i}, (v^{2})^{i})$ where
$$
(v^{2})^{i} = \left( \sum_{j=1}^{N^{i}} \frac{1}{(\sigma^{2})^{i}_{j}} + \frac{1}{\sigma^{2}_{\mu}} \right)^{-1}, \qquad
c^{i} = (v^{2})^{i} \left( \sum_{j=1}^{N^{i}} \frac{y^{i}_{j}}{(\sigma^{2})^{i}_{j}} + \frac{\tilde{\mu}}{\sigma^{2}_{\mu}} \right). \tag{3.50}
$$
When $\gamma^{i} = 1$ the full conditional is normal with mean and variance computed as in section 2.2.4. In the cyclical Gibbs sampler $G^{i}$ conditional on $\mu^{i}$ is not actually drawn.
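The conjugate update in (3.50) combines a normal prior with independent Gaussian measurements; a minimal sketch, with all inputs illustrative:

```python
import numpy as np

# Sketch of the conjugate normal update (3.50): a N(mu_tilde, sigma2_mu)
# prior on mu combined with independent Gaussian measurements y_j with
# variances sigma2_j. The numbers below are illustrative only.

def normal_full_conditional(y, sigma2, mu_tilde, sigma2_mu):
    """Return (c, v2): posterior mean and variance of mu given the data."""
    y = np.asarray(y, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    precision = np.sum(1.0 / sigma2) + 1.0 / sigma2_mu  # sum of precisions
    v2 = 1.0 / precision
    c = v2 * (np.sum(y / sigma2) + mu_tilde / sigma2_mu)
    return c, v2

c, v2 = normal_full_conditional(y=[1.0, 2.0], sigma2=[1.0, 1.0],
                                mu_tilde=0.0, sigma2_mu=1.0)
# total precision = 1 + 1 + 1 = 3, so v2 = 1/3 and c = (1 + 2 + 0)/3 = 1
```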

Conditional $(G^{i}, \gamma^{i}) \mid -$

$\gamma^{i} \mid -$ is a Bernoulli distribution with probability $p^{i}$ given by $p^{i} = O^{i}/(1 + O^{i})$,
$$
O^{i} = \frac{\mathcal{N}(y^{i}_{1:N^{i}} \mid \mu^{i}, \Sigma^{i} + K^{i})}{\mathcal{N}(y^{i}_{1:N^{i}} \mid \mu^{i}, \Sigma^{i})} \cdot \frac{\theta}{1 - \theta}, \tag{3.51}
$$
where the numerator is computed by forward filtering on ${}^{D}Y^{i}$. $G^{i}$ is then sampled from the conditional distribution $G^{i} \mid \gamma^{i}, -$ by FFBS on ${}^{D}Y^{i}$.
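The odds in (3.51) compare two Gaussian marginal likelihoods. In the text these densities come from linear-time forward filtering; the dense computation below, with made-up inputs, is purely an illustration of the same quantity.

```python
import numpy as np

# Sketch of the gamma^i step (3.51): posterior odds compare the marginal
# likelihood under a GP-plus-noise covariance (Sigma + K) against noise
# alone (Sigma). Dense linear algebra stands in for forward filtering.

def mvn_logpdf(y, mean, cov):
    d = y - mean
    sign, logdet = np.linalg.slogdet(cov)
    quad = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + quad)

def gamma_probability(y, mu, Sigma, K, theta):
    """p^i = O^i / (1 + O^i) with O^i as in equation (3.51)."""
    mean = np.full(len(y), mu)
    log_odds = (mvn_logpdf(y, mean, Sigma + K)
                - mvn_logpdf(y, mean, Sigma)
                + np.log(theta) - np.log(1 - theta))
    return 1.0 / (1.0 + np.exp(-log_odds))

y = np.array([0.9, 1.1, 1.0])
Sigma = 0.25 * np.eye(3)
K = 0.5 * np.exp(-np.abs(np.subtract.outer(range(3), range(3))) / 2.0)
p = gamma_probability(y, mu=1.0, Sigma=Sigma, K=K, theta=0.5)
```

Working on the log scale avoids underflow when the two marginal likelihoods differ by many orders of magnitude.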

Conditional $\tilde{\mu}^{1}, \ldots, \tilde{\mu}^{{}^{D}n} \mid \mu^{1}, \ldots, \mu^{{}^{D}n}, -$

The priors used are conjugate and so in this case the $\tilde{\mu}^{1}, \ldots, \tilde{\mu}^{{}^{D}n}$ can be sampled via algorithm 3 of Neal (2000).

Conditional $\theta \mid \gamma^{1:{}^{D}n}, -$

The beta prior is conjugate resulting in the beta full conditional

$$
\theta \mid \gamma^{1:{}^{D}n}, - \sim \mathrm{Be}\!\left(b_{1} + \sum_{i=1}^{{}^{D}n} \gamma^{i},\; {}^{D}n - \sum_{i=1}^{{}^{D}n} \gamma^{i} + b_{2}\right). \tag{3.52}
$$
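This conjugate update is simple enough to sketch directly; the indicator values are illustrative, and $b_{1} = b_{2} = 1$ matches the uniform prior used in the simulation study below.

```python
import numpy as np

# Sketch of the conjugate beta update (3.52): with a Be(b1, b2) prior on
# theta and Bernoulli indicators gamma^i, the full conditional is
# Be(b1 + sum(gamma), Dn - sum(gamma) + b2). Indicator values are made up.
rng = np.random.default_rng(2)

def sample_theta(gamma, b1, b2, rng):
    gamma = np.asarray(gamma)
    successes = gamma.sum()
    return rng.beta(b1 + successes, len(gamma) - successes + b2)

theta_draw = sample_theta(gamma=[1, 0, 1, 1, 0], b1=1.0, b2=1.0, rng=rng)
```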

Conditional $p \mid {}^{D}\tau^{1:{}^{D}n}, -$

The Dirichlet prior is conjugate to the discrete distribution on ${}^{D}\tau^{i}$, so the full conditional for $p$ is $\mathrm{Dir}(a_{1} + c_{1}, \ldots, a_{k} + c_{k})$ where $c_{i} = \sum_{j=1}^{{}^{D}n} \mathbf{1}\{{}^{D}\tau^{j} = \tau_{i}\}$.

3.6 Model Performance on Simulated Data

The behaviour of the model will now be demonstrated on two simulated datasets. In addition to being able to correctly classify dynamic vs static $\alpha^{i}$'s for each trial, of particular interest is the ability to learn properties of the cell, namely, the probability that a new trial will be dynamic and the cell's distribution of $\mu^{i}$'s and ${}^{D}\tau^{i}$'s. With this information one would be able to make quantitative statements about $\alpha^{i}(\cdot)$ for a new trial. It seems that the distribution of the $\mu^{i}$'s is learned well, but $\theta$ and the distribution over ${}^{D}\tau^{i}$ not so well. There is a tendency of the model to over-fit by making trials dynamic when they should be static, and concentrating posterior mass on smaller time-scales. In order to isolate and study this behaviour, the intensities ${}^{A}\lambda$ and ${}^{B}\lambda$ are considered known and fixed for the purposes of experimentation. Consider the following simulated data. 15, 20 and 20 trials are generated for types A, B and D respectively. The A and B single sound trials are realizations from homogeneous Poisson point processes with rates 400 and 100 respectively, observed over the unit interval. Each dual sound trial $i$ is generated from an inhomogeneous Poisson point process where $\alpha^{i}$ is dynamic with probability 0.5. When dynamic, $\alpha^{i}(t) = 0.5 + 0.4\sin(2\pi(\phi_{i} + t)/T_{i})$ where $T_{i}$ is a period sampled uniformly on the interval $[0.4, 1]$ and $\phi_{i}$ is a phase shift sampled uniformly from $[0, T_{i}]$. When static, $\alpha^{i}(t) = c_{i}$ where $c_{i}$ is drawn randomly from a mixture of two uniform distributions with support $[0.2, 0.3]$ and $[0.7, 0.8]$ with equal weight. In total there are 14 dynamic trials out of 20. The MCMC is run for 8000 iterations following 2000 burn-in iterations

and every 10th iterate is retained. $b_{1}$ and $b_{2}$ are taken to be unity so that the prior on $\theta$ is uniform. The first comment is that the posterior density for $\theta$, shown in figure 3.1, is more skewed to 1 than expected. Figure 3.2 displays typical posterior summaries for a truthfully dynamic and static trial. In both cases the 95% credible interval correctly covers the true weight function, but the static trials are classified as dynamic. Indeed most posterior draws of weight functions for static trials are dynamic. Moreover, the time-scales for the static trials are also estimated to be small, which is not expected. Consider the probability vector in the discrete prior on the time-scale, upon which a Dirichlet prior was placed. Averaging each element across the MCMC draws results in $[0.18, 0.53, 0.19, 0.04, 0.05]$. Now consider the posterior draws of the time-scale for the static trial shown in figure 3.2. The relative proportions of draws is almost identical to the probability vector $p$ in the prior, revealing that the likelihood for the static trials is rather indifferent to the time-scale. When dynamic trials strongly favour short time-scales, and static trials are indifferent toward time-scales, the probability vector $p$ is pulled toward loading on smaller time-scales. Generating new $\alpha^{i}$'s from the posterior predictive would result in more dynamic weight functions, but would spike-trains generated from the posterior predictive look any different from those upon which the model was trained? Before assigning too much weight to this analysis, however, there is clearly an issue regarding the mixing of $\gamma^{i}$. This is not unexpected for a data augmented model. This cannot reliably be used to classify trials or for learning a cell's propensity to create dynamic trials through learning the parameter $\theta$. In order to resolve this issue the model was simplified.

Figure 3.1: (Left) Posterior density for ${}^{D}\rho^{2}$. (Middle) Posterior density for $\theta$. (Right) Density functions $p(\mu^{{}^{D}n+1} \mid \tilde{\mu}^{1:{}^{D}n})$ using posterior draws of $\tilde{\mu}^{1:{}^{D}n}$.
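The simulated-data mechanism described above can be sketched as follows. Generating the inhomogeneous Poisson realizations by thinning is an assumed implementation detail, not something specified in the text; the rates and intervals mirror the description of the experiment.

```python
import numpy as np

# Sketch of the simulation design: weight functions alpha^i are dynamic
# with probability 0.5, and dual-trial spike times follow an inhomogeneous
# Poisson process with intensity alpha(t)*lamA + (1 - alpha(t))*lamB on
# [0, 1]. Thinning is an assumed generation mechanism.
rng = np.random.default_rng(3)

def make_alpha(rng):
    """Dynamic sinusoid with probability 0.5, otherwise a static level."""
    if rng.random() < 0.5:
        T = rng.uniform(0.4, 1.0)                     # period T_i
        phi = rng.uniform(0.0, T)                     # phase shift phi_i
        return lambda t: 0.5 + 0.4 * np.sin(2 * np.pi * (phi + t) / T)
    low, high = (0.2, 0.3) if rng.random() < 0.5 else (0.7, 0.8)
    c = rng.uniform(low, high)
    return lambda t: c + 0.0 * np.asarray(t)          # constant weight c_i

def simulate_dual_trial(alpha, lamA=400.0, lamB=100.0, rng=rng):
    """Thinning: dominate by lam_max, keep each point w.p. lam(t)/lam_max."""
    lam_max = max(lamA, lamB)
    n = rng.poisson(lam_max)                          # dominating homogeneous process
    candidates = rng.uniform(0.0, 1.0, size=n)
    lam = alpha(candidates) * lamA + (1 - alpha(candidates)) * lamB
    keep = rng.random(n) < lam / lam_max
    return np.sort(candidates[keep])

spikes = simulate_dual_trial(make_alpha(rng))
```

Thinning is valid here because $\alpha(t) \in [0.1, 0.9]$ keeps the intensity below the dominating rate at all times.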

Figure 3.2: Posterior summaries of parameters associated with dual sound trial (left two columns) $i = 9$ (right two columns) $i = 14$. Shown for each trial: trace plot and posterior density of $\mu^{i}$, trace plot and posterior probability of $\gamma^{i} = 1$, posterior probabilities of ${}^{D}\tau^{i}$ and the posterior of $\alpha^{i}$. The true $\alpha^{i}(\cdot)$'s are represented with black lines.

3.7 A Simplified Model

The mixture prior on the $g^{i}$'s is actually a remnant from an earlier model where the time-scales for each dual trial were a common parameter given a continuous prior. Having moved to a discrete prior on the time-scale, it is considerably simpler to specify a discrete prior on the inverse of the time-scale with a point mass at zero. This would also result in static $\alpha^{i}$'s. The Dirichlet prior could then be used to give a prior probability of 1/2 to the point mass at zero, and distribute the remaining mass across the remaining point masses. Learning a cell's propensity to generate flat $\alpha^{i}$'s is then possible by considering the posterior over $p$. Working with a prior on the inverse time-scale with a point mass at zero is a little awkward numerically, and so instead consider a model which places a discrete prior on the time-scale with a point mass at a very large value. The prior considered is taken, somewhat arbitrarily, to have point masses at $[0.1, 0.2, 0.5, 1, 100]$. It is expected that the posterior probability that ${}^{D}\tau^{i} = 100$ is high for the static trials. In addition a new dataset was generated. The conditions are exactly the same as for the previous dataset, but it is now more balanced with 10 static and 10 dynamic trials. Figure 3.3 shows typical posterior summaries for a dynamic and static trial. As before, the posterior for the weight functions concentrates well around the true functions but the time-scale is poorly learned. Averaging each element of the probability vector $p$ across MCMC draws results in $[0.03, 0.71, 0.16, 0.07, 0.04]$. The posterior probability of ${}^{D}\tau^{i}$ values is also very similar to $p$ for the static trials, as shown in figure 3.2, adding support to the idea that the likelihood function for static trials varies very slowly as a function of the time-scale. At this point, attempting to classify trials as dynamic or static based on the posterior of their time-scale parameter is abandoned.

Figure 3.3: Posterior summaries of parameters associated with dual sound trial (left) $i = 17$ (right) $i = 2$. Shown for each trial: trace plot and posterior density of $\mu^{i}$, posterior probabilities of ${}^{D}\tau^{i}$ and the posterior of $\alpha^{i}$. The true $\alpha^{i}(\cdot)$ is shown in black in the bottom right figure.

3.8 Cell YKIC092311_1 609Hz 24° 742Hz −6°

Previous analyses identified this cell under these conditions as exhibiting the most dynamic behaviour among all triplets. The goal is merely to learn the weight functions and not to attempt to classify them as switching or dynamic. In addition, learning the cell's time-scale distribution is abandoned and the probability vector $p$ is fixed with no Dirichlet prior. The discrete prior consists of locations $[0.05, 0.1, 0.2, 0.3, 100]$ with equal probabilities. The two frequencies are 609 and 742 Hertz, at locations 24 and −6 degrees respectively. The measured spike trains are shown in figure 3.5. An informative prior on ${}^{A}\Lambda$ and ${}^{B}\Lambda$ of $\mathrm{Ga}(15, 1/20)$ was specified using domain specific knowledge, putting approximately 0.95 prior probability between 170 and 470. The variance, $\sigma^{2}_{\mu}$, of the prior for ${}^{A}\mu$ and ${}^{B}\mu$ and the base measure in the Dirichlet process was taken to be 0.25. The prior on the GP variance was an inverse Gamma with shape 2 and scale 0.5 truncated to the interval $[0, 4]$. The purpose of constraining the means and variances of the GP is that the function is composed with the link function, and so beyond a point it is unnecessary for the function values to be any larger. The prior on the time-scale of ${}^{A}f$ and ${}^{B}f$ was taken to be a $\mathrm{Ga}(5, 15)$ distribution truncated to the interval $[0.05, \infty)$ in order to eliminate very small time-scales which resulted in over-fitting. This puts approximately 0.95 prior probability between 0.1 and 0.7. The MCMC was run for 10000 iterations of which every 10th iterate was retained. Posterior summaries for ${}^{X}\mu$, ${}^{X}\rho^{2}$, ${}^{X}\tau$, ${}^{X}f$, ${}^{X}\Lambda$ and ${}^{X}\lambda$ for $X \in \{A, B\}$ are shown in figures A.3 and A.4. As is clear from figure A.4, the posterior favours smaller time-scales even for the single sound intensity, hence the truncation in the prior was necessary to avoid over-fitting. The estimated single sound intensities and dual sound weight functions are shown in figure 3.4. Under this analysis, the flattish nature of the weight functions is most consistent with the intermediary static superposition hypothesis.

Figure 3.4: Posterior summaries for cell YKIC092311_1 609Hz 24° 742Hz −6°. (Left) Estimated intensity functions ${}^{A}\lambda$ and ${}^{B}\lambda$. Black lines represent posterior median values and colour bands represent 0.95 posterior probability credible intervals. (Right) Estimated weight functions $\alpha^{i}$ for each dual trial 1 through 7. Black lines represent posterior median values and colour bands represent 0.5 posterior probability credible intervals.

Figure 3.5: Raw data for cell YKIC092311_1 609Hz 24° 742Hz −6°. (Top) A trials, (middle) B trials, (bottom) dual trials; each panel shows the spike times for trials 1 through 7 over 0 to 600 milliseconds.

4

Continuous Optimization for Variable Selection

4.1 Spike and Slab Priors

This chapter is primarily devoted to spike and slab models. Initially introduced by Mitchell and Beauchamp (1988), the spike and slab prior constituted a prior on the regression coefficients which was a mixture of a point mass at zero and a diffuse uniform prior. These two-component mixture priors have received a lot of attention in the variable selection literature (see Ishwaran and Rao (2005), George and McCulloch (1993), Clyde et al. (1996)) and remain popular to date. The specific model that is studied in this chapter is the following spike and slab model
$$
\begin{aligned}
Y \mid \beta &\sim \mathcal{N}(X\beta, \sigma^{2}) \\
\beta_{i} \mid \gamma_{i} &\sim \gamma_{i}\,\mathrm{DE}(\lambda_{1}/\sigma) + (1 - \gamma_{i})\,\delta_{0} \\
\gamma_{i} &\sim \mathrm{Bern}(\theta),
\end{aligned} \tag{4.1}
$$

although many of the following algorithms are also valid for other choices of slab density. Initially $\theta$ is considered a fixed hyperparameter, but this assumption is relaxed later. The properties of the posterior resulting from this prior have been studied in Castillo et al. (2015). This chapter considers the MAP (maximum a posteriori) estimator of (4.1). It is organized as follows. The chapter begins with a study of the theoretical properties of this estimator, generalizing existing results for lasso based on $\ell_{2}$-regularity conditions of the design matrix quantified by the compatibility number (see Buehlmann and van de Geer (2011)). Thereafter, an alternative approach deriving similar results based on the general theory of Zhang and Zhang (2012) is provided. After the initial theory the remaining focus is on optimization methods for computing this estimator. This chapter focusses only on continuous optimization methods, which provide only a locally optimal solution. This leads to an interesting survey of algorithms that have arisen in different disciplines. In particular four algorithms are studied: the proximal gradient method, orthogonal data augmentation, LQ decompositions of the design matrix and location mixtures of spherical normals which exploit the separability of the penalty. All of these algorithms shall prove to result in identical parameter mappings. The next chapter considers discrete optimization: algorithms which use the local solutions found from algorithms in this chapter as starting points to find the globally optimal solution.

4.2 Theory Based on Compatibility Condition

Optimization Formulation

The problem of finding the global posterior mode of equation (4.1) can be written as the following mathematical program

$$
\begin{aligned}
\underset{\beta, \gamma}{\text{Minimize}} \quad & F(\beta, \gamma) := \frac{1}{2\sigma^{2}}\|Y - X\beta\|_{2}^{2} - \sum_{i}\left( \log\!\left(\frac{\theta}{1-\theta}\right) + \log\!\left(\frac{\lambda_{1}}{\sigma}\right) \right)\gamma_{i} + \frac{\lambda_{1}}{\sigma}\|\beta\|_{1} \\
\text{subject to} \quad & \gamma \in \{0, 1\}^{p}, \\
& (\beta_{i} \neq 0 \wedge \gamma_{i} = 1) \vee (\beta_{i} = 0 \wedge \gamma_{i} = 0),
\end{aligned} \tag{4.2}
$$

which corresponds to minimizing the negative of the log-posterior. This in turn can be rewritten in the form of a penalized regression
$$
L(\beta) := \frac{1}{2\sigma^{2}}\|Y - X\beta\|_{2}^{2} + \lambda_{0}\|\beta\|_{0} + \frac{\lambda_{1}}{\sigma}\|\beta\|_{1}, \tag{4.3}
$$
where $\lambda_{0} = -(\log(\theta/(1-\theta)) + \log(\lambda_{1}/\sigma))$. This clearly contains both $\ell_{0}$ and $\ell_{1}$ (lasso) regularized least squares as special cases. This model is of interest to Bayesian statisticians who wish to model sparsity through the prior $p(\gamma)$, and apply modest shrinkage to the non-zero coefficients through the Laplace slab distribution. As the lasso is obtained through the special case of $\lambda_{0} = 0$, it is interesting to see how available results for lasso can be generalized for (4.3).
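As a concrete illustration of the objective in (4.3), the following sketch evaluates $L(\beta)$ for a given coefficient vector; the design matrix, responses and hyperparameter values are randomly generated stand-ins, and `lambda0` is computed from $\theta$ and $\lambda_{1}$ exactly as defined above.

```python
import numpy as np

# Sketch of evaluating the penalized objective L(beta) of (4.3), with
# lambda0 = -(log(theta/(1-theta)) + log(lambda1/sigma)). All data are
# illustrative: a noiseless response from a sparse truth.

def spike_slab_objective(beta, Y, X, sigma, lambda1, theta):
    lambda0 = -(np.log(theta / (1 - theta)) + np.log(lambda1 / sigma))
    fit = 0.5 * np.sum((Y - X @ beta) ** 2) / sigma ** 2
    l0 = np.count_nonzero(beta)            # ||beta||_0
    l1 = np.sum(np.abs(beta))              # ||beta||_1
    return fit + lambda0 * l0 + (lambda1 / sigma) * l1

rng = np.random.default_rng(4)
X = rng.standard_normal((20, 5))
beta_true = np.array([2.0, 0.0, 0.0, -1.5, 0.0])
Y = X @ beta_true                          # noiseless for illustration
val_true = spike_slab_objective(beta_true, Y, X, sigma=1.0, lambda1=1.0, theta=0.1)
val_zero = spike_slab_objective(np.zeros(5), Y, X, sigma=1.0, lambda1=1.0, theta=0.1)
```

With noiseless data the sparse truth pays only the penalty terms, while the all-zero vector pays the full squared-error cost, so the truth attains a lower objective value.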

Generalizing Results for Lasso to Spike and Slab

In the literature concerning theoretical results of the lasso it is actually more common to study the following formulation
$$
\hat{\beta} := \underset{\beta \in \mathbb{R}^{p}}{\arg\min}\; \|Y - X\beta\|_{2}^{2}/2n + \lambda_{0}'\|\beta\|_{0} + \lambda_{1}'\|\beta\|_{1}, \tag{4.4}
$$
where $\lambda_{0}' = \sigma^{2}\lambda_{0}/n$ and $\lambda_{1}' = \sigma\lambda_{1}/n$. This is simply rescaling equation (4.3) by $(\sigma^{2}/n)$. I follow this approach only for the sake of continuity in some of the constants that have been derived in other works, and I shall drop the primes on $\lambda_{0}'$ and $\lambda_{1}'$ from here on. The following results assume that the truth is linear, i.e. that $Y = X\beta^{0} + \varepsilon$ for some true but unknown $\beta^{0}$, with noise terms $\varepsilon_{i} \sim \mathcal{N}(0, \sigma^{2})$ for a known variance $\sigma^{2}$. Moreover it is assumed that the columns of the design matrix have been scaled as $\|x_{i}\|_{2} = \sqrt{n}$ for each column $i \in \{1, \ldots, p\}$. It follows immediately from the definition of $\hat{\beta}$ in (4.4) that
$$
\|Y - X\hat{\beta}\|_{2}^{2}/2n + \lambda_{0}\|\hat{\beta}\|_{0} + \lambda_{1}\|\hat{\beta}\|_{1} \le \|Y - X\beta^{0}\|_{2}^{2}/2n + \lambda_{0}\|\beta^{0}\|_{0} + \lambda_{1}\|\beta^{0}\|_{1}. \tag{4.5}
$$

Equation (4.5) can be rearranged to yield the commonly named basic inequality
$$
\|X(\hat{\beta} - \beta^{0})\|_{2}^{2}/2n + \lambda_{0}\|\hat{\beta}\|_{0} + \lambda_{1}\|\hat{\beta}\|_{1} \le \varepsilon^{T}X(\hat{\beta} - \beta^{0})/n + \lambda_{0}\|\beta^{0}\|_{0} + \lambda_{1}\|\beta^{0}\|_{1}, \tag{4.6}
$$
which isolates the only random term on the right hand side. Buehlmann and van de Geer (2011) call this the empirical process term. One approach to deal with this is to bound the empirical process term from above with high probability. Indeed, the extended Hölder inequality can be used to provide an upper bound as $|\varepsilon^{T}X(\hat{\beta} - \beta^{0})| \le \|X^{T}\varepsilon\|_{\infty}\|\hat{\beta} - \beta^{0}\|_{1}$. The next step is to make suitable assumptions on the noise term to bound $\|X^{T}\varepsilon\|_{\infty}$ with high probability. This can be achieved by assuming the noise variables are sub-Gaussian and using the tail probabilities to get the bound.

Definition 1. A random variable $X \in \mathbb{R}$ is said to be sub-Gaussian with variance proxy $\sigma^{2}$, denoted $X \sim \mathrm{subG}(\sigma^{2})$, if $\mathbb{E}[X] = 0$ and
$$
\mathbb{E}[\exp(sX)] \le \exp\!\left(\frac{\sigma^{2}s^{2}}{2}\right). \tag{4.7}
$$

Corollary 2. Let $X \sim \mathrm{subG}(\sigma^{2})$, then for any $t > 0$
$$
\mathbb{P}[|X| > t] \le 2\exp(-t^{2}/(2\sigma^{2})). \tag{4.8}
$$

Proof. Choose any $s > 0$. The result can be obtained using the Chernoff bound as follows:
$$
\begin{aligned}
\mathbb{P}[X > t] &= \mathbb{P}[\exp(Xs) > \exp(ts)] \\
&\le \frac{\mathbb{E}[\exp(Xs)]}{\exp(ts)} \quad \text{(Markov's inequality)} \\
&\le \exp(s^{2}\sigma^{2}/2 - ts) \quad \text{(sub-Gaussian)}.
\end{aligned} \tag{4.9}
$$
The last line holds for any $s > 0$ and so the bound can be tightened by using the value of $s$ for which the right hand side achieves a minimum. This occurs at $\hat{s} = t/\sigma^{2}$, which provides an upper bound on the right hand side of $\exp(-t^{2}/(2\sigma^{2}))$. The result then follows from $\mathbb{P}[|X| > t] = \mathbb{P}[X > t] + \mathbb{P}[X < -t]$.

Corollary 3. Let $X_{1}, \ldots, X_{p}$ denote independent $\mathrm{subG}(\sigma^{2})$ random variables and let $a \in \mathbb{R}^{p}$ be an arbitrary vector, then $\sum_{i=1}^{p} a_{i}X_{i} \sim \mathrm{subG}(\sigma^{2}\|a\|_{2}^{2})$.

Proof.
$$
\begin{aligned}
\mathbb{E}[\exp(s a^{T}X)] &= \prod_{i=1}^{p} \mathbb{E}[\exp(s a_{i}X_{i})] \quad \text{(independence)} \\
&\le \prod_{i=1}^{p} \exp(s^{2}a_{i}^{2}\sigma^{2}/2) \quad \text{(sub-Gaussian)} \\
&= \exp(s^{2}\sigma^{2}\|a\|_{2}^{2}/2).
\end{aligned} \tag{4.10}
$$

With the definition of sub-Gaussian random variables in mind attention can be restricted to a set upon which the empirical process term is bounded with high probability.

Lemma 4. Let $\mathcal{F} = \{\|X^{T}\varepsilon\|_{\infty}/n \le T\}$ where $\varepsilon_{1}, \ldots, \varepsilon_{n}$ are independent $\mathrm{subG}(\sigma^{2})$ random variables and $T = \sigma\sqrt{\frac{t^{2} + 2\log p}{n}}$, then
$$
\mathbb{P}[\mathcal{F}] \ge 1 - 2\exp(-t^{2}/2). \tag{4.11}
$$

Proof.
$$
\begin{aligned}
1 - \mathbb{P}[\mathcal{F}] &= \mathbb{P}[\max_{i} |X_{i}^{T}\varepsilon| \ge nT] \\
&\le \sum_{i=1}^{p} \mathbb{P}[|X_{i}^{T}\varepsilon| \ge nT] \quad \text{(union bound)} \\
&\le 2p\exp(-n^{2}T^{2}/(2n\sigma^{2})) \quad \text{(sub-Gaussian tail probability)} \\
&= 2\exp(-t^{2}/2).
\end{aligned} \tag{4.12}
$$

It is now possible to state the main result.

Theorem 5. Let $Y = X\beta^{0} + \varepsilon$ where $\varepsilon_{1}, \ldots, \varepsilon_{n}$ are independent $\mathrm{subG}(\sigma^{2})$ random variables and $X$ is such that the columns have been scaled to have $\ell_{2}$-norm of $\sqrt{n}$. Let $\hat{\beta}$ be defined as in equation (4.4) and let $S_{0} = \{i : \beta^{0}_{i} \neq 0\}$. With $\lambda_{1} \ge 2\sigma\sqrt{\frac{t^{2} + 2\log p}{n}}$ the following bounds
$$
\|\hat{\beta}\|_{0} \le \left(1 + \frac{\lambda_{1}^{2}}{2\lambda_{0}\,\phi^{2}(S_{0}; \lambda_{0}, \lambda_{1})}\right)|S_{0}|, \tag{4.13}
$$
$$
\|\hat{\beta} - \beta^{0}\|_{1} \le \left(\frac{2\lambda_{0}}{\lambda_{1}} + \frac{\lambda_{1}}{\phi^{2}(S_{0}; \lambda_{0}, \lambda_{1})}\right)|S_{0}|, \tag{4.14}
$$
$$
\|X(\hat{\beta} - \beta^{0})\|_{2}^{2}/n \le \frac{D}{D - 1}\left(\frac{\lambda_{1}^{2}D}{\phi^{2}(S_{0}; \lambda_{0}, \lambda_{1})} + 2\lambda_{0}\right)|S_{0}| \quad \forall D > 1, \tag{4.15}
$$
hold on a set $\mathcal{F}$ with $\mathbb{P}[\mathcal{F}] \ge 1 - 2\exp(-t^{2}/2)$, where $\phi^{2}(S_{0}; \lambda_{0}, \lambda_{1})$ is a modified compatibility number defined as
$$
\phi(S; \lambda_{0}, \lambda_{1}) = \inf\left\{ \frac{|S|^{1/2}\|Xu\|_{2}}{n^{1/2}\|u_{S}\|_{1}} : 2\lambda_{0}\|u_{S^{c}}\|_{0} + \lambda_{1}\|u_{S^{c}}\|_{1} \le 3\lambda_{1}\|u_{S}\|_{1} + 2\lambda_{0}\|u_{S}\|_{0} \right\}. \tag{4.16}
$$

Proof. The set $\mathcal{F} = \{\|X^{T}\varepsilon\|_{\infty}/n \le \sigma\sqrt{\frac{t^{2} + 2\log p}{n}}\}$ has a probability of at least $1 - 2\exp(-t^{2}/2)$ by lemma 4. On $\mathcal{F}$ the empirical process term can be bounded in the basic inequality given by (4.6), yielding
$$
\|X(\hat{\beta} - \beta^{0})\|_{2}^{2}/n + 2\lambda_{0}\|\hat{\beta}\|_{0} + 2\lambda_{1}\|\hat{\beta}\|_{1} \le \lambda_{1}\|\hat{\beta} - \beta^{0}\|_{1} + 2\lambda_{0}\|\beta^{0}\|_{0} + 2\lambda_{1}\|\beta^{0}\|_{1}. \tag{4.17}
$$

The $\ell_{1}$ norms in equation (4.17) can be manipulated through the observations that
$$
\|\hat{\beta}\|_{1} = \|\hat{\beta}_{S_{0}}\|_{1} + \|\hat{\beta}_{S_{0}^{c}}\|_{1} \ge \|\beta^{0}_{S_{0}}\|_{1} - \|\hat{\beta}_{S_{0}} - \beta^{0}_{S_{0}}\|_{1} + \|\hat{\beta}_{S_{0}^{c}}\|_{1}, \tag{4.18}
$$
$$
\|\hat{\beta} - \beta^{0}\|_{1} = \|\hat{\beta}_{S_{0}} - \beta^{0}_{S_{0}}\|_{1} + \|\hat{\beta}_{S_{0}^{c}}\|_{1}, \tag{4.19}
$$
to give
$$
\begin{aligned}
&\|X(\hat{\beta} - \beta^{0})\|_{2}^{2}/n + 2\lambda_{0}\|\hat{\beta}\|_{0} + 2\lambda_{1}\|\beta^{0}_{S_{0}}\|_{1} - 2\lambda_{1}\|\hat{\beta}_{S_{0}} - \beta^{0}_{S_{0}}\|_{1} + 2\lambda_{1}\|\hat{\beta}_{S_{0}^{c}}\|_{1} \\
&\quad \le \lambda_{1}\|\hat{\beta}_{S_{0}} - \beta^{0}_{S_{0}}\|_{1} + \lambda_{1}\|\hat{\beta}_{S_{0}^{c}}\|_{1} + 2\lambda_{0}\|\beta^{0}\|_{0} + 2\lambda_{1}\|\beta^{0}\|_{1}.
\end{aligned} \tag{4.20}
$$

This simplifies to
$$
\|X(\hat{\beta} - \beta^{0})\|_{2}^{2}/n + 2\lambda_{0}\|\hat{\beta}\|_{0} + \lambda_{1}\|\hat{\beta}_{S_{0}^{c}}\|_{1} \le 3\lambda_{1}\|\hat{\beta}_{S_{0}} - \beta^{0}_{S_{0}}\|_{1} + 2\lambda_{0}\|\beta^{0}\|_{0}, \tag{4.21}
$$
providing the inequality
$$
2\lambda_{0}\|\hat{\beta}\|_{0} + \lambda_{1}\|\hat{\beta}_{S_{0}^{c}}\|_{1} \le 3\lambda_{1}\|\hat{\beta}_{S_{0}} - \beta^{0}_{S_{0}}\|_{1} + 2\lambda_{0}\|\beta^{0}\|_{0}. \tag{4.22}
$$

Because $\beta^{0}_{S_{0}^{c}} = 0$, equation (4.22) can be written as
$$
2\lambda_{0}\|\hat{\beta}_{S_{0}^{c}} - \beta^{0}_{S_{0}^{c}}\|_{0} + \lambda_{1}\|\hat{\beta}_{S_{0}^{c}} - \beta^{0}_{S_{0}^{c}}\|_{1} \le 3\lambda_{1}\|\hat{\beta}_{S_{0}} - \beta^{0}_{S_{0}}\|_{1} + 2\lambda_{0}\|\hat{\beta}_{S_{0}} - \beta^{0}_{S_{0}}\|_{0}. \tag{4.23}
$$
This shows that the estimation error $\Delta := \hat{\beta} - \beta^{0}$ satisfies $2\lambda_{0}\|\Delta_{S_{0}^{c}}\|_{0} + \lambda_{1}\|\Delta_{S_{0}^{c}}\|_{1} \le 3\lambda_{1}\|\Delta_{S_{0}}\|_{1} + 2\lambda_{0}\|\Delta_{S_{0}}\|_{0}$. Returning to equation (4.21) and substituting equation (4.19) gives

$$
\begin{aligned}
&\|X(\hat{\beta} - \beta^{0})\|_{2}^{2}/n + 2\lambda_{0}\|\hat{\beta}\|_{0} + \lambda_{1}\|\hat{\beta} - \beta^{0}\|_{1} \\
&\quad = \|X(\hat{\beta} - \beta^{0})\|_{2}^{2}/n + 2\lambda_{0}\|\hat{\beta}\|_{0} + \lambda_{1}\|\hat{\beta}_{S_{0}} - \beta^{0}_{S_{0}}\|_{1} + \lambda_{1}\|\hat{\beta}_{S_{0}^{c}}\|_{1} \\
&\quad \le 4\lambda_{1}\|\hat{\beta}_{S_{0}} - \beta^{0}_{S_{0}}\|_{1} + 2\lambda_{0}\|\beta^{0}\|_{0}.
\end{aligned} \tag{4.24}
$$
At this point, one might try to get rid of the $\|\hat{\beta}_{S_{0}} - \beta^{0}_{S_{0}}\|_{1}$ term on the RHS by incorporating it somehow into the $\|X(\hat{\beta} - \beta^{0})\|_{2}^{2}/n$ term on the LHS. The Cauchy-Schwarz inequality could be used to bound $\|\hat{\beta}_{S_{0}} - \beta^{0}_{S_{0}}\|_{1}$ from above by $\sqrt{|S_{0}|}\,\|\hat{\beta}_{S_{0}} - \beta^{0}_{S_{0}}\|_{2}$, but this still needs to be related to $\|X(\hat{\beta} - \beta^{0})\|_{2}$. It is for this reason that the modified compatibility criterion is required. As seen from equation (4.23), the estimation error $\Delta \in \{u \in \mathbb{R}^{p} : 2\lambda_{0}\|u_{S_{0}^{c}}\|_{0} + \lambda_{1}\|u_{S_{0}^{c}}\|_{1} \le 3\lambda_{1}\|u_{S_{0}}\|_{1} + 2\lambda_{0}\|u_{S_{0}}\|_{0}\}$, and so it follows that $\|\hat{\beta}_{S_{0}} - \beta^{0}_{S_{0}}\|_{1} \le |S_{0}|^{1/2}\|X\hat{\beta} - X\beta^{0}\|_{2}/(n^{1/2}\phi(S_{0}; \lambda_{0}, \lambda_{1}))$. Equation (4.24) then becomes
$$
\|X(\hat{\beta} - \beta^{0})\|_{2}^{2}/n + 2\lambda_{0}\|\hat{\beta}\|_{0} + \lambda_{1}\|\hat{\beta} - \beta^{0}\|_{1} \le 4\lambda_{1}\frac{|S_{0}|^{1/2}\|X(\hat{\beta} - \beta^{0})\|_{2}}{n^{1/2}\phi(S_{0}; \lambda_{0}, \lambda_{1})} + 2\lambda_{0}\|\beta^{0}\|_{0}. \tag{4.25}
$$

1 λ2D 1 ´ }Xpβˆ ´ β0q}2{n ` 2λ }βˆ} ` λ }βˆ ´ β0} ď 1 ` 2λ |S |, D 2 0 0 1 1 φ2pS; λ , λ q 0 0 ˆ ˙ ˆ 0 1 ˙ (4.26) for any D ą 0. D can be chosen to tighten the bounds. In particular for D “ 1

λ2 }βˆ} ď 1 ` 1 |S | (4.27) 0 2λ φ2pS; λ , λ q 0 ˆ 0 0 1 ˙ and 2λ λ }βˆ ´ β0} ď 0 ` 1 |S |. (4.28) 1 λ φ2pS; λ , λ q 0 ˆ 1 0 1 ˙ For D ě 1 D λ2D }Xpβˆ ´ β0q}2{n ď 1 ` 2λ |S |. (4.29) 2 D ´ 1 φ2pS; λ , λ q 0 0 ˆ 0 1 ˙

Equations (4.14) and (4.15) are direct generalizations of results for lasso under the same set of assumptions to the spike and slab prior. In particular with $\lambda_{0} = 0$, $D = 1$ and $D = 4$ they reduce to inequalities found in theorem 6.1 of Buehlmann and van de Geer (2011). Theorem 5 provides an additional bound, however, on the sparsity of the estimated regression vector, which is not present in the analysis for the lasso. It is known that lasso typically over-estimates the number of active predictors, and so having an upper bound on the sparsity of the estimator is useful. The new modified compatibility number also warrants some discussion. When $\lambda_{0} = 0$, the modified compatibility number reduces to the original compatibility number present in other works such as in Buehlmann and van de Geer (2011) and Castillo et al. (2015). The original compatibility number $\phi(S; 0, \lambda_{1})$ is actually independent of $\lambda_{1}$ as it cancels out, and so for the sake of simplicity the original compatibility number can be denoted $\phi(S; 0, 1)$. There are two relevant questions. The first is how does $\phi(S; \lambda_{0}, \lambda_{1})$ relate to $\phi(S; 0, 1)$. The second question is how does $\phi(S; \lambda_{0}, \lambda_{1})$ behave as $\lambda_{0}$ and $\lambda_{1}$ are varied. In particular, one would like to choose a specific form of $\lambda_{0}$ and $\lambda_{1}$ in terms of $n$ and $p$ to try and recover an oracle inequality, but as $\phi(S_{0}; \lambda_{0}, \lambda_{1})$ appears in the bounds in theorem 5 it is not clear how to do this. Recall that the compatibility number was motivated by a desire to bound the

$\ell_{1}$ norm of the estimation error $\hat{\beta}_{S_{0}} - \beta^{0}_{S_{0}}$ in terms of $\|X\hat{\beta} - X\beta^{0}\|_{2}^{2}$. In this sense the compatibility number corresponds to an $\ell_{2}$-regularity condition on the design matrix. For a given set $S$ it might be possible to find a constant $c$ such that $\|u_{S}\|_{1} \le |S|^{1/2}\|Xu\|_{2}/(n^{1/2}c)$ for all $u \in \mathbb{R}^{p}$, but for this to hold for all $u \in \mathbb{R}^{p}$ the constant $c$ would likely need to be very small. The analysis for lasso shows, however, that $\hat{\beta}_{S_{0}} - \beta^{0}_{S_{0}} \in \mathcal{O}(S_{0}) \subset \mathbb{R}^{p}$ where $\mathcal{O}(S) = \{u : \|u_{S^{c}}\|_{1} \le 3\|u_{S}\|_{1}\}$, and so one only needs a constant $c$ such that $\|u_{S_{0}}\|_{1} \le |S_{0}|^{1/2}\|Xu\|_{2}/(n^{1/2}c)$ holds for all vectors in $\mathcal{O}(S_{0})$. As $\mathcal{O}(S_{0})$ is a subset of $\mathbb{R}^{p}$, the constant does not need to be as small. This is the original compatibility number. The analysis for the point-mass-Laplace mixture prior, however, showed that $\hat{\beta}_{S_{0}} - \beta^{0}_{S_{0}} \in \mathcal{M}(S_{0}; \lambda_{0}, \lambda_{1}) \subset \mathbb{R}^{p}$, where $\mathcal{M}(S; \lambda_{0}, \lambda_{1}) = \{u : 2(\lambda_{0}/\lambda_{1})(\|u_{S^{c}}\|_{0} - \|u_{S}\|_{0}) + \|u_{S^{c}}\|_{1} \le 3\|u_{S}\|_{1}\}$. Under the assumption that the true vector $\beta^{0}$ is more sparse than it is dense, i.e. $|S_{0}| < |S_{0}^{c}|$, then $\mathcal{M}(S_{0}; \lambda_{0}, \lambda_{1}) \subseteq \mathcal{O}(S_{0})$. To see this let $x \in \mathcal{M}(S_{0}; \lambda_{0}, \lambda_{1})$. Because the term $2(\lambda_{0}/\lambda_{1})(\|x_{S_{0}^{c}}\|_{0} - \|x_{S_{0}}\|_{0})$ is positive by assumption, it follows that $\|x_{S_{0}^{c}}\|_{1} \le 3\|x_{S_{0}}\|_{1}$ and so $x \in \mathcal{O}(S_{0})$. The modified compatibility number therefore results in a tighter bound, because $\|u_{S_{0}}\|_{1} \le |S_{0}|^{1/2}\|Xu\|_{2}/(n^{1/2}c)$ need only hold for $u$ in the smaller set $\mathcal{M}(S_{0}; \lambda_{0}, \lambda_{1})$. Alternatively it is clear from the definition that $\phi(S_{0}; \lambda_{0}, \lambda_{1}) \ge \phi(S_{0}; 0, 1)$ as the infimum is over a smaller set. Increasing $\lambda_{0}$ decreases the size of $\mathcal{M}(S_{0}; \lambda_{0}, \lambda_{1})$ and the value of $1/\phi(S_{0}; \lambda_{0}, \lambda_{1})$. The implications for the bounds in theorem 5 then are that there is a trade-off between increasing $\lambda_{0}$ and decreasing $1/\phi(S_{0}; \lambda_{0}, \lambda_{1})$. In addition, $1/\phi(S_{0}; \lambda_{0}, \lambda_{1}) \le 1/\phi(S_{0}; 0, 1)$ when $|S_{0}| < |S_{0}^{c}|$ allows the following result to be stated.

Theorem 6. Under the assumptions of theorem 5 with $\lambda_{1} = 2\sigma\sqrt{2\tau\log(p)/n}$, $\lambda_{0} = \sigma^{2}(\log p)/n$ and the additional assumption that $|S_{0}| < |S_{0}^{c}|$, $\hat{\beta}$ achieves the oracle inequality
$$
\|X(\hat{\beta} - \beta^{0})\|_{2}^{2}/n \le \text{const.}\,\frac{\log p}{n}\,\sigma^{2}|S_{0}|, \tag{4.30}
$$
on a set $\mathcal{F}$ with $\mathbb{P}[\mathcal{F}] \ge 1 - 2\exp(-(\tau - 1)\log p)$.

Proof. From theorem 5 with $t^{2} = (2\tau - 2)\log p$,
$$
\begin{aligned}
\|X(\hat{\beta} - \beta^{0})\|_{2}^{2}/n &\le \frac{D}{D - 1}\left(\frac{\lambda_{1}^{2}D}{\phi^{2}(S_{0}; \lambda_{0}, \lambda_{1})} + 2\lambda_{0}\right)|S_{0}| \\
&\le \frac{D}{D - 1}\left(\frac{4(2\tau\log p)D}{n\,\phi^{2}(S_{0}; 0, 1)} + 2\frac{\log p}{n}\right)\sigma^{2}|S_{0}| \\
&\le \text{const.}\,\frac{\log p}{n}\,\sigma^{2}|S_{0}|.
\end{aligned}
$$

4.3 Theory Based on η-Null Consistency Condition

An alternative approach to understanding the theoretical properties of the point estimator obtained from (4.4) is in the context of the general theory for concave regularizers by Zhang and Zhang (2012). This was the approach used by Rockova and George (2016) to establish theoretical properties for their spike and slab lasso prior, which differs from the prior studied here in that another Laplace distribution is used instead of a point mass at zero. As there are many popular regularizers in the modern literature, Zhang and Zhang (2012) derive general results that hold under mild assumptions on the regularizer used. If the regularizer used is denoted $g(x)$, the authors assume that

1. $g(0) = 0$ (zero)

2. $g(-x) = g(x)$ (symmetric)

3. $g(x) \le g(x + y)$ $\forall x, y > 0$ (nondecreasing)

4. $g(x + y) \le g(x) + g(y)$ (subadditive).

These assumptions hold true for $\ell_{0}$ and $\ell_{1}$ penalties and for combinations of them. In addition, it is assumed that the regularizer satisfies the $\eta$-null consistency ($\eta$-NC) condition.

Definition 7. Let $\eta \in (0, 1]$. A regularizer $g$ satisfies the $\eta$-NC condition if
$$
\min_{\beta \in \mathbb{R}^{p}} \|\varepsilon/\eta - X\beta\|_{2}^{2}/(2n) + g(\beta) = \|\varepsilon/\eta\|_{2}^{2}/(2n). \tag{4.31}
$$

The $1$-NC condition means that if the true regression vector $\beta^{0} = 0$, then the global optimizer $\hat{\beta} = 0$. I am using a shorthand where if the argument of $g$ is a vector, like $g(\beta)$, this is understood to mean $\sum_{i=1}^{p} g(\beta_{i})$; it should be clear from the context whether the argument is a vector or a scalar. Given a regularizer $g$, the threshold level is defined as
$$
\lambda^{\star} = \inf_{t > 0}\left\{ \frac{t}{2} + \frac{g(t)}{t} \right\}, \tag{4.32}
$$
named so because the solution to $\arg\min_{t}\{(z - t)^{2}/2 + g(t)\}$ is $0$ iff $|z| \le \lambda^{\star}$. For the point-mass-Laplace mixture prior, the threshold level is $\lambda^{\star} = \sqrt{2\lambda_{0}} + \lambda_{1}$. The assumptions on the design matrix are that the restricted invertibility factor is non-zero, defined as follows.

Definition 8. For $q \ge 1$, $S \subset \{1, \dots, p\}$, and $\xi > 0$, the restricted invertibility factor is defined as
\[
RIF_q(\xi, S) = \inf\left\{\frac{|S|^{1/q}\,\|X^T X u\|_\infty}{n\,\|u\|_q} : g(u_{S^c}) < \xi\, g(u_S)\right\}. \tag{4.33}
\]

In order to import the results of Zhang and Zhang (2012), all that must be shown

is that the $\eta$-NC condition holds with high probability for $g(\beta) = \lambda_0\|\beta\|_0 + \lambda_1\|\beta\|_1$.
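Before proceeding, the closed-form threshold level for this penalty can be checked numerically against definition (4.32). The sketch below is illustrative only (arbitrary penalty values, grid minimization over $t > 0$); for $t > 0$ the penalty evaluates to $g(t) = \lambda_0 + \lambda_1 t$.

```python
import numpy as np

# Numerical sanity check of the threshold level (4.32) for the
# point-mass-Laplace penalty g(t) = lam0*1{t != 0} + lam1*|t|.
lam0, lam1 = 0.7, 0.3
t = np.linspace(1e-4, 10.0, 200_000)          # dense grid over t > 0
h = t / 2 + (lam0 + lam1 * t) / t             # t/2 + g(t)/t for t > 0
numeric = h.min()
closed_form = np.sqrt(2 * lam0) + lam1        # sqrt(2*lam0) + lam1 from the text
print(numeric, closed_form)
```

The interior minimizer is $t = \sqrt{2\lambda_0}$, at which the two halves of the quadratic term balance, recovering $\sqrt{2\lambda_0} + \lambda_1$.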

Theorem 9. Let the penalty be $g(\beta) = \lambda_0\|\beta\|_0 + \lambda_1\|\beta\|_1$ with $\lambda_1 \ge \frac{\sigma}{\eta}\sqrt{\frac{t^2 + 2\log p}{n}}$. Assume $\epsilon_1, \dots, \epsilon_n$ are independent $\mathrm{subG}(\sigma^2)$ random variables. Then the $\eta$-NC condition holds with probability at least $1 - 2\exp(-t^2/2)$.

Proof. Let $\hat\beta = \arg\min\{(1/2n)\|\epsilon/\eta - X\beta\|_2^2 + g(\beta)\}$. Then by definition

\[
(1/2n)\|\epsilon/\eta - X\hat\beta\|_2^2 + g(\hat\beta) \le (1/2n)\|\epsilon/\eta - X\beta\|_2^2 + g(\beta) \quad \forall \beta \in \mathbb{R}^p. \tag{4.34}
\]

As the RHS holds for all $\beta \in \mathbb{R}^p$, it also holds for $\beta = 0_p$, the vector of zeros. It follows that
\[
(1/2n)\|\epsilon/\eta - X\hat\beta\|_2^2 + g(\hat\beta) \le (1/2n)\|\epsilon/\eta\|_2^2. \tag{4.35}
\]

It remains to show that LHS $\ge$ RHS.

\begin{align*}
(1/2n)\|\epsilon/\eta - X\hat\beta\|_2^2 + g(\hat\beta)
&= (1/2n)\|\epsilon/\eta\|_2^2 + (1/2n)\hat\beta^T X^T X \hat\beta - \epsilon^T X\hat\beta/(n\eta) + g(\hat\beta)\\
&\ge (1/2n)\|\epsilon/\eta\|_2^2 - |\epsilon^T X\hat\beta/(n\eta)| + g(\hat\beta)\\
&\ge (1/2n)\|\epsilon/\eta\|_2^2 - \|\epsilon^T X/(n\eta)\|_\infty \|\hat\beta\|_1 + g(\hat\beta).
\end{align*}

It follows from the assumption of sub-Gaussian errors and the form of $\lambda_1$ that $\Pr[\|\epsilon^T X/(n\eta)\|_\infty \le \lambda_1] \ge 1 - 2\exp(-t^2/2)$. Therefore
\begin{align*}
(1/2n)\|\epsilon/\eta - X\hat\beta\|_2^2 + g(\hat\beta)
&\ge (1/2n)\|\epsilon/\eta\|_2^2 - \lambda_1\|\hat\beta\|_1 + \lambda_1\|\hat\beta\|_1 + \lambda_0\|\hat\beta\|_0\\
&\ge (1/2n)\|\epsilon/\eta\|_2^2,
\end{align*}
with probability at least $1 - 2\exp(-t^2/2)$.

With the definitions of the threshold level $\lambda^\star$ and the restricted invertibility factor $RIF_q(\xi, S)$, and conditions under which the $\eta$-NC condition is expected to hold with high probability, it is possible to state the main result.

Theorem 10. Let $\hat\beta$ be defined as in equation (4.4), $\beta^0$ the true regression vector, $S_0 = \{i : \beta^0_i \ne 0\}$, and $Y = X\beta^0 + \epsilon$ where $\epsilon_1, \dots, \epsilon_n$ are independent $\mathrm{subG}(\sigma^2)$ random variables. Let $\eta \in (0,1]$, $\xi = (1+\eta)/(1-\eta)$ and $\lambda_1 \ge \frac{\sigma}{\eta}\sqrt{\frac{t^2+2\log p}{n}}$. Then for all $q \ge 1$ the bounds
\begin{align}
\|\hat\beta - \beta^0\|_q &\le \frac{(1+\eta)}{RIF_q(\xi, S_0)}\,(\sqrt{2\lambda_0} + \lambda_1)\,|S_0|^{1/q}, \tag{4.36}\\
\|X\hat\beta - X\beta^0\|_2^2/n &\le 2\xi\left(2 \vee \frac{(1+\eta)^2}{RIF_1(\xi, S_0)}\right)(\sqrt{2\lambda_0} + \lambda_1)^2\,|S_0|, \tag{4.37}
\end{align}
hold with probability at least $1 - 2\exp(-t^2/2)$.

The proof uses Theorem 9 to assert that the $\eta$-NC condition holds with high probability; the rest is then a direct consequence of Theorem 1 of Zhang and Zhang (2012). Obtaining a bound on the sparsity of $\hat\beta$ requires more work, but is possible through Theorem 2 of Zhang and Zhang (2012). Setting $\lambda_0 = (\sigma^2 \log p)/(\eta^2 n)$ attains the oracle inequality with high probability as before. The next section switches attention to optimization algorithms for obtaining locally optimal solutions to problem (4.3).

4.4 Continuous Optimization Methods

4.5 Introduction

The optimization problem (4.3) is a member of the following class of problems

\[
\text{minimize } F(\beta) := f(\beta) + g(\beta), \tag{4.38}
\]

where $f(\beta)$ is a differentiable term and $g(\beta)$ is a separable penalty, meaning that $g(\beta) = \sum_{i=1}^p h_i(\beta_i)$ for functions $h_1, \dots, h_p$, possibly non-differentiable, possibly discontinuous and possibly non-convex. For the point-mass-Laplace prior, $g(\beta) = \lambda_0\|\beta\|_0 + \lambda_1\|\beta\|_1 = \sum_{i=1}^p \lambda_0 1\{\beta_i \ne 0\} + \lambda_1|\beta_i|$. I will use the same function name $g$ for the univariate case, as it should be clear from the context whether the argument is a vector or a scalar. The MAP estimate of logistic regression models under the generalized double Pareto priors of Armagan et al. (2013) is another example of an optimization problem that falls within this class. The possibly non-smooth, possibly discontinuous term presents significant challenges and in general an analytic solution to equation (4.38) will not be available. The primary structural property that is exploited by most algorithms is the separability of the penalty. This is useful when the solution to the univariate or orthogonal case is analytically tractable, provided by the proximal operator of the penalty, defined as

\[
\mathrm{prox}_{tg}(x) = \arg\min_y \left\{\frac{1}{2t}(y-x)^2 + g(y)\right\}. \tag{4.39}
\]

A catalogue of penalties and their proximal operators can be found in the review paper by Polson et al. (2015). When $f(\beta)$ corresponds to a univariate regression, equation (4.38) can be re-expressed in the form required by equation (4.39), and so the minimizer can be computed through use of the proximal operator. When $f(\beta)$ corresponds to a $p$-dimensional orthogonal regression, (4.38) reduces to $p$ independent univariate regressions. Therefore, assuming that $g$ has a tractable proximal operator, there are at least two immediate ways of tackling the minimization in equation (4.38). The first approach is to condition on all but one variable and solve the resulting univariate conditional minimization problem. This is then repeated for other variables in a cyclic, greedy or random way. The second approach is to approximate the function $f(\beta)$ by a separable function $\tilde f(\beta)$ and solve the $p$ resulting univariate minimization problems simultaneously.

Coordinate descent is an algorithm in the former category. It was largely ignored until Friedman et al. (2007) found it to be effective in $\ell_1$-regularized problems. It remains competitive and is implemented in the popular R package glmnet for fitting lasso constrained linear models, described in Friedman et al. (2010). Non-asymptotic

rates of convergence for this method, when applied to convex problems, are given in Saha and Tewari (2013). A closely related algorithm is greedy coordinate descent, where the variable to be updated is chosen in a greedy manner, also known as Gauss-Southwell selection. The variable to update is selected as the one out of $p$ candidates for which solving the univariate minimization problem provides the greatest reduction in the objective. Greedy coordinate descent is known to converge faster than random coordinate descent for convex problems, see Nutini et al. (2015), but requires more computation per iteration. For non-convex problems one arguably cares less about the rate of convergence to a local optimum and more about the ability of the algorithm to find the global optimum. There are few theoretical results about the optimality of the previous algorithms, but common belief suggests that updating variables one at a time is very susceptible to entrapment in local modes. For this reason, this chapter focuses on algorithms of the second category that update all variables simultaneously. This work began with the development of two new algorithms motivated by the expectation-minimization principle of Dempster et al. (1977). The first algorithm was based on the orthogonal data augmentation idea of Ghosh and Clyde (2011), while the second exploited a location mixture representation of the multivariate Gaussian distribution. Comparing these algorithms with each other, the proximal gradient method (see Rockafellar (1970)) and the iterative algorithm of Figueiredo and Nowak (2003) led to a surprising observation: all four algorithms are, in fact, identical. In order to compare the four algorithms considered in this chapter, a common perspective with which to view them is needed. For the purposes of comparison it is most convenient to consider each algorithm as a majorization-minimization (MM) algorithm, reviewed in section 4.6.1.
Although the proximal gradient method is more commonly motivated by showing that the fixed point of the proximal gradient map satisfies the desired optimality condition for the minimum of a convex composite non-smooth objective function, this motivation does not hold for non-convex problems. Instead it can be motivated as an MM algorithm by considering functions with Lipschitz continuous derivative, which is outlined in section 4.6.3. This allows it to be compared most readily with the EM algorithms arising from data augmentation techniques, which are themselves special cases of MM algorithms operating within a probabilistic context, as demonstrated in section 4.6.2. The following subsections review the relevant theory for MM/EM algorithms and Lipschitz gradient functions which is required in section 4.7 to demonstrate that all four algorithms result in identical parameter mappings.

4.6 Preliminaries

4.6.1 MM algorithm

The majorization-minimization (MM) algorithm is a general method for generating

a descent algorithm. A function $Q(\cdot|\beta^{(n)}) : \mathcal{Q} \to \mathbb{R}$ is said to majorize a function $f : \mathcal{Q} \to \mathbb{R}$ at $\beta^{(n)} \in \mathcal{Q}$ if

1. $f(\beta^{(n)}) = Q(\beta^{(n)}|\beta^{(n)})$ (tangency)

2. $f(\beta) \le Q(\beta|\beta^{(n)})\ \forall \beta \in \mathcal{Q}$ (domination)

The MM algorithm constitutes two steps: first majorizing the objective about the current iterate $\beta^{(n)}$ with a function $Q(\cdot|\beta^{(n)})$ (the surrogate), and then minimizing this surrogate to obtain the next iterate $\beta^{(n+1)}$. The tangency and domination properties alone are sufficient to guarantee a decrease in the objective:

\[
f(\beta^{(n+1)}) \le Q(\beta^{(n+1)}|\beta^{(n)}) \le Q(\beta^{(n)}|\beta^{(n)}) = f(\beta^{(n)}). \tag{4.40}
\]
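As a toy illustration of the descent property (4.40), consider the scalar objective $F(b) = \tfrac12(y-b)^2 + \lambda|b|$ and the classical quadratic majorizer $|b| \le b^2/(2|b_n|) + |b_n|/2$, which is tangent at $b_n \ne 0$ and dominates elsewhere by the AM-GM inequality. The sketch below is illustrative only; the specific numbers are arbitrary.

```python
# MM for F(b) = 0.5*(y - b)^2 + lam*|b|, majorizing the penalty at b_n != 0 by
# the quadratic b^2/(2*|b_n|) + |b_n|/2 (equal at b_n, dominating elsewhere).
y, lam = 2.0, 0.5
F = lambda b: 0.5 * (y - b) ** 2 + lam * abs(b)

b = y                                   # initial iterate (nonzero)
values = [F(b)]
for _ in range(50):
    b = y * abs(b) / (abs(b) + lam)     # closed-form minimizer of the surrogate
    values.append(F(b))

# the objective sequence is monotonically non-increasing, as guaranteed by (4.40)
assert all(values[i + 1] <= values[i] + 1e-12 for i in range(len(values) - 1))
print(b)                                # approaches the soft threshold y - lam
```

The fixed point of the update is the soft-thresholded solution $y - \lambda$, so the surrogate never needs to be formed explicitly.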

How well suited the MM principle is to any given problem depends upon how easy it is to find a readily minimizable surrogate, thus converting a hard optimization problem into a sequence of simpler ones. For the class of $C^1$ functions with Lipschitz continuous derivative there is a surrogate which has strong connections with first-order gradient descent methods. Numerous examples in statistics can be found in Lange (2016).

4.6.2 EM algorithm

The EM algorithm of Dempster et al. (1977) is an example of an MM algorithm where the objective function to be minimized is defined within the context of a statistical model. The surrogate $Q$ is obtained by taking expectations and applying Jensen's inequality. Formally, let $f(\beta)$ correspond to the negative log density $-\log p(\beta)$, where $p(\beta)$ is the marginal distribution obtained from some joint distribution $p(\beta, \eta)$. Then

\[
f(\beta) = -\log \int \frac{p(\beta, \eta)}{p(\eta|\beta^{(n)})}\, p(\eta|\beta^{(n)})\,d\eta
\le -\mathrm{E}[\log p(\beta, \eta)|\beta^{(n)}] + \mathrm{E}[\log p(\eta|\beta^{(n)})|\beta^{(n)}], \tag{4.41}
\]

where I have used concavity of the log function and Jensen's inequality, with the expectation taken with respect to the conditional distribution for $\eta$ given $\beta^{(n)}$. The right hand side of equation (4.41) possesses the desired tangency and domination properties required for the MM algorithm, and so minimizing these terms with respect to $\beta$ to obtain the next iterate results in an algorithm possessing the descent property. Technically both terms constitute the $Q$ function in the MM algorithm, but as the second term is only an additive constant it does not affect the minimization step and is typically dropped. There is freedom in how one chooses this joint distribution, which usually takes the form of data augmentation or parameter expansion, and is most useful when the joint distribution is much more amenable to minimization than the resulting marginal. This chapter will visit three distinct data augmentation ideas which are useful for penalized linear least squares problems.

4.6.3 Lipschitz-Gradient Functions

In order to motivate the proximal gradient method as an MM algorithm the following results for Lipschitz gradient functions are required. A differentiable function f :

$\mathbb{R}^p \to \mathbb{R}$ is said to be $\rho$-smooth if it satisfies
\[
f(y) \le f(x) + \langle \nabla f(x), y - x\rangle + \frac{\rho}{2}\|y - x\|_2^2 \quad \forall\, y, x \in \mathbb{R}^p. \tag{4.42}
\]

A differentiable function $f : \mathbb{R}^p \to \mathbb{R}$ is said to be Lipschitz gradient if $\nabla f$ is Lipschitz continuous. We will use the result that a Lipschitz gradient function with parameter $L$ is $L$-smooth. This follows from an application of Taylor's theorem and the Cauchy-Schwarz inequality:

\begin{align}
f(y) &= f(x) + \langle \nabla f(x), y - x\rangle + \int_0^1 \langle \nabla f(x + \alpha(y-x)) - \nabla f(x), y - x\rangle\,d\alpha \nonumber\\
&\le f(x) + \langle \nabla f(x), y - x\rangle + \int_0^1 \|\nabla f(x + \alpha(y-x)) - \nabla f(x)\|_2\,\|y - x\|_2\,d\alpha \nonumber\\
&\le f(x) + \langle \nabla f(x), y - x\rangle + L\|y - x\|_2^2 \int_0^1 \alpha\,d\alpha \nonumber\\
&= f(x) + \langle \nabla f(x), y - x\rangle + \frac{L}{2}\|y - x\|_2^2. \tag{4.43}
\end{align}

The right hand side of inequality (4.43) clearly satisfies the desired tangency and domination properties of a majorizing function. Its connection with gradient descent methods can be readily seen by rewriting it in the following form with $x = \beta^{(n)}$:

\[
f(\beta) \le f(\beta^{(n)}) - \frac{1}{2L}\|\nabla f(\beta^{(n)})\|_2^2 + \frac{L}{2}\left\|\beta - \left(\beta^{(n)} - \frac{1}{L}\nabla f(\beta^{(n)})\right)\right\|_2^2. \tag{4.44}
\]

The minimization of the surrogate in equation (4.44) yields the next iterate as $\beta^{(n+1)} = \beta^{(n)} - \frac{1}{L}\nabla f(\beta^{(n)})$, which is recognizable as a gradient step with step size $1/L$. The proximal gradient method can be motivated as an MM algorithm similarly.
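These properties are easy to check numerically. The sketch below (illustrative names, random data) verifies domination of the surrogate (4.44) for a least squares $f$ with $L = \lambda_1(X^TX)$, and confirms that minimizing the surrogate, i.e. taking a gradient step of size $1/L$, does not increase $f$.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 8))
Y = rng.standard_normal(30)

f = lambda b: 0.5 * np.sum((Y - X @ b) ** 2)
grad = lambda b: X.T @ (X @ b - Y)
L = np.linalg.eigvalsh(X.T @ X).max()     # Lipschitz constant of the gradient

x = rng.standard_normal(8)
for _ in range(1000):                     # domination (4.42) at random points
    yv = 3.0 * rng.standard_normal(8)
    surrogate = f(x) + grad(x) @ (yv - x) + 0.5 * L * np.sum((yv - x) ** 2)
    assert f(yv) <= surrogate + 1e-8

x_next = x - grad(x) / L                  # minimizer of the surrogate
assert f(x_next) <= f(x)                  # the MM descent guarantee
print("majorization and descent verified")
```

For a quadratic $f$ the check is exact: the surrogate differs from $f$ only by $\tfrac12(y-x)^T(LI - X^TX)(y-x) \ge 0$.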

4.7 Algorithm Comparisons

In the next section I will demonstrate the main result that all four iterative algorithms result in identical updating rules. For algorithms that operate within a probabilistic framework I assume that the prior has the form $p(\beta) \propto \exp(-g(\beta))$, where $g$ is the penalty in the optimization formulation. Furthermore I assume, for simplicity and without loss of generality, that the observation variance $\sigma^2 = 1$. Notationally, I use $\lambda_i(A)$ to denote the $i$'th largest eigenvalue of the matrix $A$.

4.7.1 Proximal Gradient Method

The proximal gradient method concerns convex minimization of functions that can be decomposed into a smooth and a non-smooth part, i.e.
\[
\min_{\beta\in\mathbb{R}^p} F(\beta) := f(\beta) + g(\beta), \tag{4.45}
\]
where $f$ is, as before, assumed to be a continuous function with $L$-Lipschitz continuous derivative and $g$ a function, possibly non-smooth, with a tractable proximal operator. Given a current parameter value $\beta^{(n)}$ one can apply the MM principle using the surrogate derived from the Lipschitz continuity of the derivative to obtain the proximal gradient method
\begin{align}
\beta^{(n+1)} &= \arg\min_{\beta\in\mathbb{R}^p}\left(f(\beta^{(n)}) + \langle\nabla f(\beta^{(n)}), \beta - \beta^{(n)}\rangle + \frac{1}{2t}\|\beta - \beta^{(n)}\|_2^2 + g(\beta)\right) \nonumber\\
&= \arg\min_{\beta\in\mathbb{R}^p}\left(g(\beta) + \frac{1}{2t}\left\|\beta - (\beta^{(n)} - t\nabla f(\beta^{(n)}))\right\|_2^2\right) \tag{4.46}\\
&= \mathrm{prox}_{tg}\left(\beta^{(n)} - t\nabla f(\beta^{(n)})\right) \nonumber
\end{align}
with a step size $t \le \frac{1}{L}$. The following rate of convergence is due to Beck and Teboulle (2009). For convex problems, the step size $t = 1/L$ guarantees the following rate of

convergence in the objective

\[
F(\beta^{(n)}) - F^\star \le \frac{1}{2nt}\|\beta^{(0)} - \beta^\star\|_2^2, \tag{4.47}
\]

where $F^\star$ is the solution to equation (4.45) and $\beta^\star$ the argument that achieves this minimum. This implies that the number of iterations required to obtain an $\epsilon$-optimal solution is at most $\lceil \frac{1}{2t\epsilon}\|\beta^{(0)} - \beta^\star\|_2^2 \rceil$. Note that the proximal gradient method applied to the least squares problem considered here reduces to
\[
\beta^{(n+1)} = \mathrm{prox}_{tg}\left(\beta^{(n)} - tX^T(X\beta^{(n)} - Y)\right). \tag{4.48}
\]

4.7.2 Orthogonal Data Augmentation

Ghosh and Clyde (2011) introduced a novel data augmentation technique known as orthogonal data augmentation, whereby additional rows $X_a$ are added to the original design matrix $X_o$ such that $X_o^T X_o + X_a^T X_a = L I_p$ for $L \ge \lambda_1(X_o^T X_o)$. The data augmented model is then
\begin{align}
Y_o|\beta, \sigma^2 &\sim \mathcal{N}(X_o\beta, \sigma^2 I) \tag{4.49}\\
Y_a|\beta, \sigma^2 &\sim \mathcal{N}(X_a\beta, \sigma^2 I) \nonumber
\end{align}
The “complete” design matrix $X_c = [X_o^T, X_a^T]^T$ together with the observed and augmented data $Y_o$ and $Y_a$ then forms an orthogonal regression problem which permits closed form expressions for posterior model and variable inclusion probabilities (conditional on the observation variance $\sigma^2$). This was applied successfully in conjunction with Markov chain Monte Carlo (MCMC) to marginalize over the augmented data and obtain Rao-Blackwellized estimates of posterior model and inclusion probabilities, and is implemented in the R package BoomSpikeSlab by Scott (2017). The same data augmentation technique can be applied in the context of optimization to generate an iterative algorithm for solving (penalized) least squares problems. By observing that an orthogonal penalized least squares problem reduces

to $p$ univariate optimization problems, the algorithm iterates between updating the augmented data $Y_a$ and the regression coefficients $\beta$. The surrogate is obtained by taking the expectation of the negative log-posterior under the complete data, with respect to the augmented data $Y_a$, conditional on the current value of the parameters $\beta^{(n)}$:
\[
Q(\beta|\beta^{(n)}) = \mathrm{E}[-\log p(\beta|Y_o, Y_a)|\beta^{(n)}] \propto \frac{L}{2}\left\|\beta - \frac{1}{L}\left(X_o^T Y_o + X_a^T \mathrm{E}[Y_a|\beta^{(n)}]\right)\right\|_2^2 + g(\beta). \tag{4.50}
\]

The conditional expectation of $Y_a$ given $\beta^{(n)}$ is simply $X_a\beta^{(n)}$, and so the surrogate provided by the missing data is
\[
Q(\beta|\beta^{(n)}) = \frac{L}{2}\left\|\beta - \frac{1}{L}\left(X_o^T Y_o + X_a^T X_a \beta^{(n)}\right)\right\|_2^2 + g(\beta). \tag{4.51}
\]

Although motivated very differently, a little algebra shows that the surrogate provided by the orthogonal data augmentation idea is identical to the surrogate provided by Lipschitz continuity of the derivative, as in gradient descent:
\begin{align}
\frac{1}{L}\left(X_o^T Y_o + X_a^T X_a \beta^{(n)}\right) &= \frac{1}{L}\left(X_o^T Y_o + (L I - X_o^T X_o)\beta^{(n)}\right) \tag{4.52}\\
&= \beta^{(n)} - \frac{1}{L} X_o^T (X_o \beta^{(n)} - Y_o). \nonumber
\end{align}

The surrogate is then minimized to obtain the new iterate as
\[
\beta^{(n+1)} = \mathrm{prox}_{\frac{1}{L}g}\left(\beta^{(n)} - \frac{1}{L}X_o^T(X_o\beta^{(n)} - Y_o)\right). \tag{4.53}
\]

The condition $L \ge \lambda_1(X_o^T X_o)$ from the orthogonal data augmentation perspective, required to ensure positive semi-definiteness of $L I_p - X_o^T X_o$ and consequently a real valued $X_a$, is equivalent to the condition that $L$ be greater than or equal to the Lipschitz constant of the derivative of $\frac{1}{2}\|Y_o - X_o\beta\|_2^2$, namely $\lambda_1(X_o^T X_o)$. This shows that, for linear least squares problems with arbitrary penalties, the orthogonal data augmentation and proximal gradient algorithms are essentially equivalent. This algorithm was also subsequently published by Xiong et al. (2016), but the authors fail to identify the connection with other methods.
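The equivalence (4.52) is easy to check numerically. In the sketch below (illustrative, random data) the augmented rows are taken to be the symmetric matrix square root $X_a = (LI_p - X_o^TX_o)^{1/2}$, which is one valid choice of augmentation.

```python
import numpy as np
from numpy.linalg import eigh

rng = np.random.default_rng(2)
n, p = 20, 5
Xo = rng.standard_normal((n, p))
Yo = rng.standard_normal(n)

G = Xo.T @ Xo
L = eigh(G)[0].max() * 1.01                  # L >= lambda_1(Xo' Xo)

# one valid augmentation: Xa = (L*I - Xo'Xo)^{1/2}, a p x p matrix square root
w, V = eigh(L * np.eye(p) - G)
Xa = V @ np.diag(np.sqrt(w)) @ V.T
assert np.allclose(Xo.T @ Xo + Xa.T @ Xa, L * np.eye(p))

# identity (4.52): the E-step mean of the ODA surrogate equals a gradient step
beta = rng.standard_normal(p)
oda_mean = (Xo.T @ Yo + Xa.T @ (Xa @ beta)) / L
grad_step = beta - Xo.T @ (Xo @ beta - Yo) / L
assert np.allclose(oda_mean, grad_step)
print("ODA surrogate centre equals the gradient step")
```

Note that the check does not depend on the particular square root used, since only $X_a^TX_a = LI - X_o^TX_o$ enters (4.52).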

4.7.3 Location Mixtures of Spherical Normals

Note that the Lipschitz surrogate in (4.44) is, up to a negative logarithm, proportional to the kernel of a Normal with mean $\beta^{(n)} - (1/L)\nabla f(\beta^{(n)})$ and covariance matrix $(1/L)I_p$. One might ask if the original density might be expressed as a mixture of such Normals, so that forming the aforementioned surrogate can be viewed as conditioning on the mixing parameter. For multivariate Gaussian likelihoods this is indeed the case. Assume that the likelihood is $\mathcal{L}(\beta) = \mathcal{N}(\beta|\mu, \Sigma)$, where the latter represents the density function of a multivariate normal with mean $\mu$ and covariance matrix $\Sigma$. Then, for any $t \le \lambda_p(\Sigma)$, the posterior density can be regarded as the marginal arising from the joint
\[
p(\beta|Y) \propto \int \mathcal{N}(\beta|\mu + z, tI_p)\,\mathcal{N}(z|0, \Sigma - tI_p)\,p(\beta)\,dz, \tag{4.54}
\]
that is, the multivariate likelihood can be expressed by mixing a spherical normal over a latent location parameter. The parameter expansion is nice because now the parameters of interest appear in the likelihood of a spherical normal distribution. This means, assuming independent priors specified on the individual components of $\beta$, that the individual $\beta_i$'s are conditionally independent of each other given $z$. Conditioning on $z$ then once again reduces a multivariate optimization problem to $p$ univariate optimization problems. The only restriction is that $t \le \lambda_p(\Sigma)$, to ensure positive semi-definiteness of the covariance matrix of $z$. Let us consider an EM algorithm, treating $z$ as the missing “data”, and show how this too is just another disguised version of the proximal gradient method.

Let us start by forming the surrogate for the EM algorithm:

\[
Q(\beta|\beta^{(n)}) = \mathrm{E}[-\log p(\beta, z|Y)|\beta^{(n)}] \propto \frac{1}{2t}\left\|\beta - (\mu + \mathrm{E}[z|\beta^{(n)}])\right\|_2^2 + g(\beta), \tag{4.55}
\]

where

\[
z|\beta \sim \mathcal{N}\left((t^{-1}I + (\Sigma - tI)^{-1})^{-1}\, t^{-1}(\beta - \mu),\; (t^{-1}I + (\Sigma - tI)^{-1})^{-1}\right). \tag{4.56}
\]
It is useful at this stage to use the Sherman-Morrison-Woodbury formula to simplify the above as

\[
z|\beta \sim \mathcal{N}\left((I - t\Sigma^{-1})(\beta - \mu),\; tI - t^2\Sigma^{-1}\right). \tag{4.57}
\]
It follows that

\[
Q(\beta|\beta^{(n)}) \propto \frac{1}{2t}\left\|\beta - \left(\beta^{(n)} - t\Sigma^{-1}(\beta^{(n)} - \mu)\right)\right\|_2^2 + g(\beta), \tag{4.58}
\]

for which minimization provides the next iterate as

\[
\beta^{(n+1)} = \mathrm{prox}_{tg}\left(\beta^{(n)} - t\Sigma^{-1}(\beta^{(n)} - \mu)\right). \tag{4.59}
\]

This is identical to the proximal gradient method, observing that $\nabla f(\beta^{(n)}) = \Sigma^{-1}(\beta^{(n)} - \mu)$ for $f(\beta) = \frac{1}{2}(\beta - \mu)^T\Sigma^{-1}(\beta - \mu)$. For linear least squares problems, for which the corresponding likelihood function is $\mathcal{L}(\beta) = \mathcal{N}(\beta|(X^TX)^{-1}X^TY, (X^TX)^{-1})$, equation (4.59) reduces to
\[
\beta^{(n+1)} = \mathrm{prox}_{tg}\left(\beta^{(n)} - tX^T(X\beta^{(n)} - Y)\right), \tag{4.60}
\]
as expected. Note that the condition arising from positive semi-definiteness of the covariance matrix for $z$, namely $t \le \lambda_p(\Sigma)$, is the same condition arising from the Lipschitz inequality, namely that the inverse step size $1/t$ be greater than or equal to the Lipschitz constant, which in this case is $\lambda_1(\Sigma^{-1})$ and specifically $\lambda_1(X^TX)$ in the linear least squares context.
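The Woodbury simplification in (4.57) can be verified numerically. The following sketch (random positive definite $\Sigma$, illustrative names) checks both the covariance and the mean multiplier of the conditional distribution of $z$ given $\beta$.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 6
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)              # a positive definite covariance
Si = np.linalg.inv(Sigma)
t = 0.9 * np.linalg.eigvalsh(Sigma).min()    # step size t < lambda_p(Sigma)
I = np.eye(p)

# covariance of z | beta from (4.56) and its Woodbury simplification (4.57)
Om = np.linalg.inv(I / t + np.linalg.inv(Sigma - t * I))
assert np.allclose(Om, t * I - t**2 * Si)
# mean multiplier: Om * (1/t) = I - t * Sigma^{-1}
assert np.allclose(Om / t, I - t * Si)
print("conditional moments match the simplified forms")
```

Taking $t$ strictly below $\lambda_p(\Sigma)$ keeps $\Sigma - tI$ invertible, so the unsimplified form (4.56) can be evaluated directly.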

4.7.4 LQ-decompositions of the Design Matrix

This data augmentation strategy, due to Figueiredo and Nowak (2003), appeared in the context of wavelet based image processing, where the design matrix is a product of a matrix $H$ with an orthogonal matrix $W$ corresponding to the discrete wavelet transform. The authors introduced missing data $Z$ such that the augmented model is
\begin{align}
Y|Z &\sim \mathcal{N}(HZ,\, I_p - tHH^T) \tag{4.61}\\
Z|\beta &\sim \mathcal{N}(W\beta,\, tI_p) \nonumber
\end{align}
where the covariance matrices are carefully chosen such that marginalization over $Z$ recovers the original distribution for $Y|\beta$. Once again this places the parameters of interest in the context of orthogonal regression, resulting in $p$ tractable univariate minimizations for penalties possessing tractable proximal operators. Although the form of $X$ as $X = HW$ was determined by the application, this data augmentation strategy can be applied quite generally, as one can express an arbitrary design matrix $X$ as a product of a lower triangular matrix $H$ and an orthogonal matrix $W$ via the LQ-decomposition. Establishing the surrogate for the EM algorithm is achieved by computing the expectation over $Z$ conditional on $Y$ and the current iterate $\beta^{(n)}$:

\[
Q(\beta|\beta^{(n)}) = \mathrm{E}[-\log p(\beta|Y, Z)|Y, \beta^{(n)}] \propto \frac{1}{2t}\left\|\beta - W^T\mathrm{E}[Z|Y, \beta^{(n)}]\right\|_2^2 + g(\beta), \tag{4.62}
\]

where the conditional expectation is given by

\[
Z|Y, \beta \sim \mathcal{N}\left(\Omega\left(t^{-1}W\beta + H^T(I_p - tHH^T)^{-1}Y\right),\; \Omega\right), \tag{4.63}
\]

\[
\Omega = \left(t^{-1}I + H^T(I_p - tHH^T)^{-1}H\right)^{-1}. \tag{4.64}
\]

The expressions in equations (4.63) and (4.64) simplify considerably through repeated application of the Sherman-Morrison-Woodbury identity, and are key to revealing the surrogate as the same one developed earlier. In particular, repeated application shows that the covariance matrix can be expressed as
\[
\Omega = \left(t^{-1}I + H^T(I_p - tHH^T)^{-1}H\right)^{-1} = tI - t^2 H^T H, \tag{4.65}
\]
and the mean as

\begin{align}
\Omega\left(t^{-1}W\beta + H^T(I_p - tHH^T)^{-1}Y\right) &= (tI - t^2H^TH)\left(t^{-1}W\beta + H^T(I_p - tHH^T)^{-1}Y\right) \nonumber\\
&= W\beta + tH^T(I_p - tHH^T)^{-1}Y \nonumber\\
&\quad - tH^THW\beta - t^2H^THH^T(I_p - tHH^T)^{-1}Y \tag{4.66}\\
&= (W - tH^THW)\beta + tH^T(I_p - tHH^T)(I_p - tHH^T)^{-1}Y \nonumber\\
&= (W - tH^THW)\beta + tH^TY. \nonumber
\end{align}

It follows that the term involving the expectation in equation (4.62) can be rewritten as
\begin{align}
W^T\mathrm{E}[Z|Y, \beta^{(n)}] &= (W^TW - tW^TH^THW)\beta^{(n)} + tW^TH^TY \tag{4.67}\\
&= \beta^{(n)} - tX^T(X\beta^{(n)} - Y), \nonumber
\end{align}

and so the final minimization operation of the EM algorithm can be seen as a proximal gradient step
\[
\beta^{(n+1)} = \mathrm{prox}_{tg}\left(\beta^{(n)} - tX^T(X\beta^{(n)} - Y)\right), \tag{4.68}
\]
as expected. Once again, the condition that $I_p - tHH^T \succeq 0$ requires that $t \le 1/\lambda_1(HH^T) = 1/\lambda_1(X^TX)$.
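The chain of identities (4.63)-(4.67) can likewise be checked numerically. The sketch below (illustrative; a square design keeps the dimensions simple) forms the LQ-decomposition from a QR factorization of $X^T$, computes the E-step mean directly from (4.63)-(4.64), and confirms it equals a gradient step after rotation by $W^T$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = p = 6                                       # square X keeps dimensions simple
X = rng.standard_normal((n, p))
Y = rng.standard_normal(n)

# LQ decomposition X = H W via QR of X^T: H lower triangular, W orthogonal
Q, R = np.linalg.qr(X.T)
H, W = R.T, Q.T
assert np.allclose(X, H @ W) and np.allclose(W @ W.T, np.eye(p))

t = 0.9 / np.linalg.eigvalsh(X.T @ X).max()     # ensures I - t*H*H^T is PD
I = np.eye(p)
Minv = np.linalg.inv(I - t * H @ H.T)

Om = np.linalg.inv(I / t + H.T @ Minv @ H)      # covariance (4.64)
assert np.allclose(Om, t * I - t**2 * H.T @ H)  # simplification (4.65)

beta = rng.standard_normal(p)
mean_Z = Om @ (W @ beta / t + H.T @ Minv @ Y)   # E[Z | Y, beta] from (4.63)
assert np.allclose(W.T @ mean_Z, beta - t * X.T @ (X @ beta - Y))  # (4.67)
print("EM E-step equals a gradient step")
```

The final assertion is exactly identity (4.67), so the minimization of (4.62) is the proximal gradient step (4.68).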

4.8 Comparative Discussion

The last section considered three competing data augmentation strategies from the statistics literature, used to generate iterative algorithms for solving penalized least squares problems, and compared them to the proximal gradient method for composite non-smooth optimization. It demonstrated that all three algorithms, though motivated by three distinct data augmentation ideas, result in parameter mappings that are identical to each other and to those resulting from the proximal gradient method. For convex problems, it follows that they all exhibit the same rate of convergence. None of these algorithms as presented so far are, however, optimal. In particular, a lower bound on the complexity for any first order optimization algorithm is given by
\[
F(\beta^{(n)}) - F^\star \ge \frac{3L}{32(n+2)^2}\|\beta^{(0)} - \beta^\star\|_2^2, \tag{4.69}
\]
see for example Nesterov (2014, chap. 2). An optimal first order algorithm would attain the lower bound on the convergence rate up to a constant factor, whereas the algorithms presented so far have a suboptimal convergence rate of $O(1/n)$. Beck and Teboulle (2009) and Nesterov (2013) give acceleration strategies for the proximal gradient method which achieve the optimal rate of convergence in (4.69) up to a constant factor. The established connection between the three data augmentation algorithms and the proximal gradient method means that they can share in this acceleration also. The data augmentation strategies are restricted to Gaussian likelihoods, whereas the proximal gradient method is more general, requiring only that the function $f(\beta)$ is Lipschitz gradient, and can therefore be applied to other problems such as penalized logistic regression. When the problem is non-convex, one cares more about the quality of the local solution obtained, not how fast it gets there. There is little literature establishing properties of these algorithms for non-convex problems, and so this chapter concludes with some experimentation.
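For the convex $\ell_1$ case, the Beck-Teboulle acceleration amounts to adding a momentum extrapolation step to the proximal gradient update. The sketch below is illustrative only (a random lasso problem with arbitrary tuning, not an implementation from this thesis); it compares the plain and accelerated iterates after the same number of iterations.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 100, 40
beta_true = rng.standard_normal(p) * (rng.random(p) < 0.2)
X = rng.standard_normal((n, p))
Y = X @ beta_true + rng.standard_normal(n)
lam = 0.1
L = np.linalg.eigvalsh(X.T @ X).max()            # Lipschitz constant

soft = lambda z, c: np.sign(z) * np.maximum(np.abs(z) - c, 0.0)
F = lambda b: 0.5 * np.sum((Y - X @ b) ** 2) + lam * np.abs(b).sum()

def ista(iters):
    # plain proximal gradient (4.48) with step size 1/L
    b = np.zeros(p)
    for _ in range(iters):
        b = soft(b - X.T @ (X @ b - Y) / L, lam / L)
    return b

def fista(iters):
    # accelerated proximal gradient in the style of Beck and Teboulle (2009)
    b = np.zeros(p)
    z = b.copy()
    s = 1.0
    for _ in range(iters):
        b_new = soft(z - X.T @ (X @ z - Y) / L, lam / L)
        s_new = (1 + np.sqrt(1 + 4 * s * s)) / 2
        z = b_new + ((s - 1) / s_new) * (b_new - b)   # momentum extrapolation
        b, s = b_new, s_new
    return b

b_ista, b_fista = ista(30), fista(30)
print(F(b_ista), F(b_fista))   # the accelerated variant is typically far closer to the optimum
```

The only change relative to (4.48) is the extrapolation point $z$, which improves the $O(1/n)$ rate to $O(1/n^2)$ for convex problems.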

4.9 Experimentation for Non-Convex Problems

The proximal gradient method, randomized coordinate descent (RCD), cyclic coordinate descent (CCD) and greedy coordinate descent (GCD) will be required in the next section to get good quality initial solutions. The proximal operator of $g(\beta) = \lambda_0\|\beta\|_0 + \lambda_1\|\beta\|_1$ is
\[
\mathrm{prox}_{tg}(z) = \mathrm{sign}(z)(|z| - t\lambda_1)\,1\left\{|z| > \sqrt{2t\lambda_0} + t\lambda_1\right\}, \tag{4.70}
\]
and is shown in figure 4.1. It is similar to lasso but adds an additional region of

Figure 4.1: Proximal functions for $t = 1$ and (black, dashed) $g(x) = 0$, (blue, solid) $g(x) = |x|$ and (red, dot-dashed) $g(x) = |x| + 0.5\cdot 1\{x \ne 0\}$.


hard-thresholding, controlled by $\lambda_0$. This helps to understand the properties of the penalty. It is known that lasso typically includes too many predictors. In addition, choosing the tuning parameter $\lambda_1$ in lasso involves a trade-off between selection and shrinkage. Making $\lambda_1$ too large gives good selection, but shrinks coefficients heavily. Making $\lambda_1$ too small does not shrink coefficients too heavily, but too many predictors are included. The point-mass-Laplace mixture prior has an additional degree of freedom, in that $\lambda_0$ can be used to control selection and $\lambda_1$ can be used to control shrinkage. It is a generalization of both $\ell_1$ and $\ell_0$ penalized regression. For $f(\beta) = \|Y - X\beta\|_2^2/(2n)$, the gradient vector is $\nabla f(\beta) = (1/n)X^T(X\beta - Y)$ and the Lipschitz constant is $L_f = \lambda_{\max}(X^TX)/n$. The proximal gradient update is then
\[
\beta \leftarrow \mathrm{prox}_{tg}\left(\beta - (t/n)X^T(X\beta - Y)\right), \tag{4.71}
\]
where the step size $t < 1/L_f$. For coordinate descent methods, the update of variable $\beta_i$, holding all other coordinates fixed, is
\[
\beta_i \leftarrow \mathrm{prox}_{1g}\left(x_i^T\Big(Y - \sum_{j\ne i} x_j\beta_j\Big)/n\right). \tag{4.72}
\]

To assess how susceptible the four algorithms (ccd, rcd, gcd and proximal gradient) are to entrapment by local modes, 100 datasets are replicated under the same conditions. Let ccd be the reference algorithm. For each replica, the difference in the objective between the value returned by each algorithm and the reference is recorded, and the 100 instances are used to form box-plots. An algorithm exhibiting better performance than cyclic coordinate descent should result in more negative values. All algorithms were seeded with an initial $\beta = 0_p$ and run until the absolute relative change in the objective fell below a tolerance of 0.0001. The penalties used are

$\lambda_1 = \sigma\sqrt{2\log p/n}$ and $\lambda_0 = (\sigma^2/n)\log p$.

Dataset 1

$n = 300$, $p = 80$, $X_i \sim \mathcal{N}(0, \Sigma)$ for $i \in \{1, \dots, p\}$ with $\Sigma_{ij} = 0.95^{|i-j|}$, and $Y = X\beta^0 + \epsilon$ where $\epsilon \sim \mathcal{N}(0, I_n)$ and $\beta_i^0 = 1$ with probability $1/8$ and zero otherwise. Figure 4.2 shows that the coordinate descent algorithms all perform similarly, whereas the proximal gradient method performs much worse.
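A minimal version of this experiment can be sketched as follows (smaller $n$ and $p$ than Dataset 1, fixed iteration counts instead of the tolerance rule; all names illustrative). It implements the proximal operator (4.70), the proximal gradient update (4.71) and cyclic coordinate descent (4.72).

```python
import numpy as np

rng = np.random.default_rng(6)

def prox(z, t, lam0, lam1):
    # proximal operator (4.70) of g(b) = lam0*1{b != 0} + lam1*|b|
    soft = np.sign(z) * np.maximum(np.abs(z) - t * lam1, 0.0)
    return np.where(np.abs(z) > np.sqrt(2 * t * lam0) + t * lam1, soft, 0.0)

def objective(beta, X, Y, lam0, lam1):
    n = len(Y)
    return (np.sum((Y - X @ beta) ** 2) / (2 * n)
            + lam0 * np.count_nonzero(beta) + lam1 * np.abs(beta).sum())

def proximal_gradient(X, Y, lam0, lam1, iters=2000):
    n, p = X.shape
    t = 0.9 * n / np.linalg.eigvalsh(X.T @ X).max()   # t < 1/L_f
    beta = np.zeros(p)
    for _ in range(iters):
        beta = prox(beta - (t / n) * (X.T @ (X @ beta - Y)), t, lam0, lam1)
    return beta

def cyclic_cd(X, Y, lam0, lam1, sweeps=200):
    # update (4.72); assumes columns scaled so ||x_i||_2^2 = n
    n, p = X.shape
    beta = np.zeros(p)
    r = Y.copy()
    for _ in range(sweeps):
        for i in range(p):
            r += X[:, i] * beta[i]                    # remove i's contribution
            beta[i] = prox(X[:, i] @ r / n, 1.0, lam0, lam1)
            r -= X[:, i] * beta[i]
    return beta

# a small instance in the spirit of Dataset 1 (AR-type correlated design)
n, p, sigma = 120, 20, 1.0
idx = np.arange(p)
Sigma = 0.95 ** np.abs(np.subtract.outer(idx, idx))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
X *= np.sqrt(n) / np.linalg.norm(X, axis=0)           # columns to sqrt(n) norm
beta0 = (rng.random(p) < 1 / 8).astype(float)
Y = X @ beta0 + sigma * rng.standard_normal(n)
lam1 = sigma * np.sqrt(2 * np.log(p) / n)
lam0 = (sigma ** 2 / n) * np.log(p)

b_pg = proximal_gradient(X, Y, lam0, lam1)
b_cd = cyclic_cd(X, Y, lam0, lam1)
print(objective(b_pg, X, Y, lam0, lam1), objective(b_cd, X, Y, lam0, lam1))
```

Both routines are monotone descent methods from $\beta = 0_p$; comparing the two printed objectives on replicated draws reproduces the kind of box-plot comparison described above, on a smaller scale.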

Dataset 2

This dataset uses a real design matrix $X$ from the diabetes dataset of Efron et al. (2004), available in the R package “lars”. 350 rows are randomly selected and the columns are standardized to have zero mean and $\sqrt{n}$ euclidean norm. There are

$p = 64$ predictors and $\beta_i^0 = 1$ with probability $10/p$ and zero otherwise. The data is simulated as $Y = X\beta^0 + \epsilon$ where $\epsilon \sim \mathcal{N}(0, 3I_n)$. Figure 4.3 indicates that the proximal gradient method exhibits poorer performance than

cyclic coordinate descent on this dataset, with greedy coordinate descent performing marginally better.

Figure 4.2: Difference in objective value between the solution from cyclic coordinate descent (ccd) and the solutions provided by randomized coordinate descent (rcd), greedy coordinate descent (gcd) and the proximal gradient method (proximal). The 0.025, 0.25, 0.5, 0.75, 0.975 quantiles are -0.15, -0.04, 0.01, 0.06, 0.13 for rcd; -0.18, -0.06, 0.00, 0.06, 0.19 for gcd; and 0.15, 0.26, 0.34, 0.43, 0.53 for proximal.

Figure 4.3: Difference in objective value between the solution from cyclic coordinate descent (ccd) and the solutions provided by randomized coordinate descent (rcd), greedy coordinate descent (gcd) and the proximal gradient method (proximal). The 0.025, 0.25, 0.5, 0.75, 0.975 quantiles are -0.23, -0.05, 0.00, 0.03, 0.18 for rcd; -0.28, -0.08, -0.02, 0.00, 0.09 for gcd; and -0.07, 0.15, 0.28, 0.44, 0.87 for proximal.

Dataset 3

The design matrix for this dataset has been studied in Yang et al. (2016). $n = 300$, $p = 80$ and the rows are simulated iid from a zero-mean multivariate normal with covariance matrix
\[
\Sigma = \begin{bmatrix}
1 & \mu & \mu & \dots & \mu\\
\mu & 1 & 0 & \dots & 0\\
\mu & 0 & 1 & \dots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
\mu & 0 & 0 & \dots & 1
\end{bmatrix},
\]
where $\mu = 10/(20\sqrt{80})$. $Y|\beta \sim \mathcal{N}(X\beta, 3I_n)$, where $\beta_i$ is simulated from a standard normal with probability $10/p$ and zero otherwise. This design matrix was initially studied in Wainwright (2009) and was demonstrated to be particularly challenging for lasso. As shown in figure 4.4, the coordinate descent methods almost always return the same solution, whereas the proximal gradient method almost always returns something worse than cyclic coordinate descent.

Figure 4.4: Difference in objective value between the solution from cyclic coordinate descent (ccd) and the solutions provided by randomized coordinate descent (rcd), greedy coordinate descent (gcd) and the proximal gradient method (proximal). The 0.025, 0.25, 0.5, 0.75, 0.975 quantiles are 0.00, 0.00, 0.00, 0.00, 0.00 for rcd; 0.00, 0.00, 0.00, 0.00, 0.00 for gcd; and 0.00, 0.00, 0.02, 0.07, 0.16 for proximal.


5

Discrete Optimization for Variable Selection

5.1 Discrete Optimization

Selecting a single model with respect to some criterion, $p(\gamma|Y)$ for instance, is a discrete optimization problem. Discrete optimization refers to a class of problems in which the domain $D$ of the objective function is a finite or countably infinite set. In addition, there may be certain constraints required by the problem that partition the domain into a feasible set $S$, the set of elements in the domain satisfying all constraints, and its complement. A discrete optimization problem can then be represented abstractly as follows:

\[
\underset{x}{\text{Minimize}}\quad F(x) \qquad \text{subject to } x \in S.
\]

In classical non-discrete optimization, D is typically a continuous metric space. The distance metric is important as it defines the notion of a local neighbourhood.

In addition, continuity and differentiability of $F(x)$ provide information about what the objective may look like over these neighbourhoods. The basic intuition behind

iterative gradient methods is to consider directions which decrease the objective most rapidly, based on derivative information contained within a local neighbourhood of the current point. The discrete optimization case is considerably harder. When $S$ is discrete, it may be harder to define a reasonable notion of distance and hence the meaning of a local neighbourhood. Even when a neighbourhood is well defined, with no discrete counterpart of continuity or differentiability it is not possible to infer how the function may change over these neighbourhoods. This lack of information makes discrete optimization problems more challenging and limits the number of methods with which one can find a solution. In principle it is possible to evaluate the objective for all values of $x$ in $S$ and choose the one achieving $\min\{F(x) : x \in S\}$. This brute force approach soon becomes computationally prohibitive as the problem size grows, and one must use a method that exploits more problem structure. In general these methods can be classified into two categories: exact methods, which compute the globally optimal solution, and heuristics, which find a feasible solution with no guarantees of optimality. Both types have their merits and can be argued to be complementary to one another. Heuristics are typically very fast algorithms which provide a good quality sub-optimal solution. Yet, without any knowledge of the solution that the heuristic will provide, it is difficult to supply any properties that the solution may possess, statistical or otherwise. Exact methods on the other hand are guaranteed to find the globally optimal solution, for which the properties may be easier to ascertain, yet usually require considerably more computational effort.
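To make the brute-force approach concrete, the sketch below enumerates all $2^p$ subsets of a hypothetical small regression problem and selects the best; the $2\log p$ complexity penalty is an arbitrary illustrative choice, not a criterion used in this thesis.

```python
import itertools
import numpy as np

rng = np.random.default_rng(7)
n, p = 50, 8                        # 2**8 = 256 models: enumeration is feasible
beta0 = np.zeros(p)
beta0[:2] = 1.5                     # two strong true predictors
X = rng.standard_normal((n, p))
Y = X @ beta0 + rng.standard_normal(n)

def criterion(S):
    # residual sum of squares of the least squares fit on subset S,
    # plus a simple l0-type complexity penalty (illustrative only)
    if S:
        Xs = X[:, list(S)]
        r = Y - Xs @ np.linalg.lstsq(Xs, Y, rcond=None)[0]
    else:
        r = Y
    return np.sum(r ** 2) + 2 * len(S) * np.log(p)

best = min((frozenset(S)
            for k in range(p + 1)
            for S in itertools.combinations(range(p), k)),
           key=criterion)
print(sorted(best))
```

Even at $p = 8$ this visits 256 models; at $p = 80$, as in the datasets above, the count is $2^{80}$, which is why the structure-exploiting methods discussed next are needed.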
Among the most popular general-purpose heuristics are genetic (Holland, 1992), tabu-search (Glover, 1989, 1990), variable neighbourhood search (Mladenović and Hansen, 1997), multi-start greedy (Laguna and Marti, 1999), ant colony (Dorigo and Di Caro, 1999) and simulated annealing (Kirkpatrick et al., 1983) algorithms. Overviews of these popular methods and many more can be

found in the excellent texts of Aarts and Lenstra (1997) and Gendreau and Potvin (2010). Early heuristics in the statistics community for variable selection problems focused on stepwise methods, dating back at least as early as Efroymson (1960). This algorithm interleaved forward and backward selection steps, using $p$-values under $F$-tests between models to select the predictor to be added or removed. This can also be viewed as a greedy neighbourhood search algorithm on the residual sum of squares (see Christensen, 1987). Although it is easy to create examples for which stepwise methods do not select the optimal model (see Hocking, 1976), it is sometimes possible to quantify how bad the model selected by these methods can be. Submodularity of the objective function is a useful property for understanding the theoretical properties of greedy methods, and is used by Das and Kempe (2008) and Das and Kempe (2011) to derive a constant-factor approximation in the best subset selection problem. Stepwise and greedy procedures can be generalized by regarding them as neighbourhood search algorithms on $\{0,1\}^p$, where the Hamming distance is the appropriate metric. A recognized failure of these deterministic algorithms is their inability to recover from earlier mistakes and escape local optima. Such observations motivated the development of stochastic neighbourhood search algorithms, such as the shotgun stochastic search of Hans et al. (2007), which makes local stochastic moves in neighbourhoods of Hamming distance 2. Exact methods, on the other hand, can usually be categorized as either dynamic programming (Bellman, 1954, 1957) or branch and bound algorithms. Both are divide-and-conquer approaches, differing in how subproblems are related to each other. In branch and bound algorithms, the domain is recursively partitioned into smaller independent subproblems. In dynamic programming algorithms, however, the problem admits a special substructure that allows its solution to be computed via solving smaller overlapping subproblems. In this manner, the solution is “built up”, where solutions to subproblems are computed with the aid of previously

found solutions to the sub-subproblems. Branch and bound will be a core concept in the following chapters and therefore requires elaboration.
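The greedy neighbourhood-search view of forward selection described above can be sketched as follows. This is an illustrative implementation using a plain RSS criterion computed via numpy least squares, not the $F$-test rule of Efroymson (1960); all names and the simulated data are my own.

```python
import numpy as np

def rss(X, y, subset):
    """Residual sum of squares of the least squares fit on the given columns."""
    cols = sorted(subset)
    if not cols:
        return float(y @ y)          # empty model: RSS is ||y||^2
    beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    r = y - X[:, cols] @ beta
    return float(r @ r)

def forward_selection(X, y, k):
    """Greedily add, k times, the predictor whose inclusion most reduces the RSS."""
    selected = set()
    for _ in range(k):
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        best = min(candidates, key=lambda j: rss(X, y, selected | {j}))
        selected.add(best)
    return selected

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))
y = X[:, 2] + 0.5 * X[:, 5] + 0.1 * rng.standard_normal(100)
print(sorted(forward_selection(X, y, 2)))   # picks predictors 2 and 5 on this easy example
```

Each step is a greedy move to the best neighbour at Hamming distance 1; nothing in the procedure can undo an earlier inclusion, which is precisely the failure mode noted above.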

5.2 Branch and Bound

The earliest usages of branch and bound algorithms are believed to be Land and Doig (1960), who were working on discrete programming problems, and Little et al. (1963), who were working on the travelling salesman problem. Examples of varied statistical applications solved by branch and bound can be found in Brusco and Stahl (2005). Otherwise known as partial enumeration, branch and bound is able to find the global optimum without evaluating every point in the feasible set. It adaptively partitions the domain through a branching process, in which it eliminates parts of the domain that cannot contain the global optimum through bounding. The branching process can be viewed as creating subproblems in which additional constraints are added, requiring the solution to lie in a restricted part of the domain. Solving the original problem is then equivalent to solving these subproblems, but some of these subproblems may be skipped, either because they have no feasible solutions or via bounding. An upper bound on the globally optimal value is usually obtained by evaluating the objective at a feasible point, chosen through a heuristic in order to get a good upper bound. A lower bound on the optimal value of a subproblem is obtained by solving a relaxation.

Definition 11 (Relaxation). An optimization problem $\min\{F_R(x) : x \in D_R\}$ is a relaxation of $\min\{F(x) : x \in D\}$ if $F_R(x) \leq F(x)$ for all $x \in D$ and/or $D \subseteq D_R$. A feasible set $D_R$ is a relaxation of a feasible set $D$ if $D \subseteq D_R$.

A good relaxation is one that provides a tight lower bound and is easy to compute.

If a lower bound $L$ on the subproblem corresponding to $\min\{F(x) : x \in A \subset D\}$ is found to be greater than the current upper bound on the global solution, then the

global optimum does not occur in $A$ and the subproblem can be skipped. Using the notation $P_F(S)$ to denote the problem (or “program”) of finding $\min\{F(x) : x \in S\}$, a general branch and bound algorithm can be described in pseudocode as follows.

Data: An initial problem P_F(S ∩ D)
Result: x̃ such that F(x̃) ≤ F(x) for all x ∈ S
1   Initialize:
2       M ← {P_F(S ∩ D)}           /* data structure to store problems */
3       U ← ∞                       /* set upper bound to infinity */
4       x̃ ← Nil                    /* best solution found so far */
5   while M is not empty do
6       take a problem P_F(S ∩ A) from M;
7       if A ∩ S ≠ ∅ then
8           y ← solution to the relaxation of P_F(S ∩ A);
9           if F_R(y) < U then
10              if y ∈ S then
11                  U ← F_R(y);     /* update upper bound */
12                  x̃ ← y;         /* update best feasible solution */
13              else
14                  partition A = A_1 ∪ ... ∪ A_n;            /* branch */
15                  M ← M ∪ {P_F(S ∩ A_i) : i = 1, ..., n};   /* add subproblems */
16              end
17          end
18      end
19  end
20  return x̃
Algorithm 1: A generic branch and bound algorithm
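Algorithm 1 can be transcribed almost line for line. The sketch below (all names are mine) specializes it to a deliberately tiny problem: minimizing a one-dimensional quadratic over the integers in an interval, where the relaxation drops integrality and is solved by clamping the continuous minimizer into the interval. The LIFO stack makes the traversal depth-first, as discussed next.

```python
import math

def branch_and_bound_1d(f, lo, hi, argmin_cont):
    """Algorithm 1 for min f(x) over the integers in [lo, hi].

    The relaxation drops integrality; its solution is the continuous
    minimizer `argmin_cont` clamped into the current interval."""
    stack = [(lo, hi)]                      # M as a LIFO stack -> depth-first
    U, best = math.inf, None                # upper bound, incumbent
    while stack:
        a, b = stack.pop()
        if a > b:                           # infeasible: no integers in [a, b]
            continue
        y = min(max(argmin_cont, a), b)     # solve the relaxation
        if f(y) < U:                        # otherwise pruned by bounding
            if y == int(y):                 # relaxation solution is feasible
                U, best = f(y), int(y)
            else:                           # branch around the fractional value
                stack.append((a, math.floor(y)))
                stack.append((math.ceil(y), b))
    return best, U

f = lambda x: (x - 3.6) ** 2
print(branch_and_bound_1d(f, 0, 10, argmin_cont=3.6))   # best integer 4, value ~ 0.16
```

Note that the sibling intervals $[a, \lfloor y \rfloor]$ and $[\lceil y \rceil, b]$ partition the remaining integers, mirroring line 14 of Algorithm 1.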

The behaviour of the algorithm depends critically on the branching/partitioning rule used to create subproblems (line 14) and on how the next subproblem to be solved is selected from the collection stored in M (line 6). If M is a stack with the LIFO (last in, first out) property, then the order of subproblems visited follows a depth-first traversal of the tree created by the branching process. The advantage of this strategy is that it yields feasible solutions very quickly, providing an upper bound U early on. Alternatively, if there is some domain-specific information that allows the probability of finding the globally optimal solution in a certain partition to be

estimated, then the next subproblem can be selected as the one that appears most fruitful. In this case M is a priority queue, resulting in a best-first search strategy. There are many more search strategies, and a theoretical comparison of some of the most popular can be found in Ibaraki (1976) and Linderoth and Savelsbergh (1999). Subproblems can be pruned or fathomed (skipped) if they are infeasible, meaning there are no feasible solutions in this part of the domain (line 7), or if they are dominated by an upper bound on the optimal value of the initial problem, meaning it is not possible to find the optimal solution to the initial problem in this part of the domain (line 9). The earliest exact methods for variable selection were the branch and bound algorithms of Beale et al. (1967), Hocking and Leslie (1967), LaMotte and Hocking (1970) and Furnival and Wilson (1974), the legacy of which can be found in the R package “leaps” by Lumley (2017). Each algorithm considered the residual sum of squares as the objective function and obtained bounds through its monotonicity property

$\mathrm{RSS}(A) \leq \mathrm{RSS}(B)$, where $A$ is any set of predictors and $B \subset A$. This earlier work serves as a nice example with which to illustrate the concepts of branch and bound.

Branch and Bound for Residual Sum of Squares

As an example, suppose one wishes to find the model of size 2, out of a possible $p = 5$ predictors, that minimizes the residual sum of squares. The problem may be formulated as follows:

\[
\begin{aligned}
\underset{\gamma}{\text{Minimize}} \quad & F(\gamma) := \|Y - X_\gamma (X_\gamma^T X_\gamma)^{-1} X_\gamma^T Y\|_2^2, \\
\text{subject to} \quad & \gamma \in \{0,1\}^p, \\
& \sum_{i=1}^{p} \gamma_i = 2.
\end{aligned}
\tag{5.1}
\]

The domain can be partitioned by branching on a predictor $i$, creating two subproblems in which the constraints $\gamma_i = 1$ and $\gamma_i = 0$ are added. In order to quickly compute a lower bound on the solution of these subproblems, a relaxation needs to be defined. Recall that to obtain a relaxation, one can either provide a function that lower bounds the objective function or enlarge the feasible set. In this case, a relaxation can be obtained by retaining the same objective function but enlarging the feasible set by removing the $\sum_{i=1}^{p} \gamma_i = 2$ constraint. Without this constraint, the subproblem has no combinatorial complexity and is trivially minimized by the RSS under the largest model satisfying the $\gamma_i = 1$ and $\gamma_i = 0$ constraints. It is illustrative to step through Algorithm 1, generating the BnB tree shown in Figure 5.1.

[Figure 5.1 here: the branch and bound tree over subsets of the five predictors. Relaxation values annotated at visited nodes: $\Gamma$ (590), $\Gamma^1$ (590), $\Gamma_1$ (617), $\Gamma^{12}$ (590), $\Gamma^1_2$ (616), $\Gamma^{12}_3$ (596), $\Gamma^{12}_{34}$ (597), $\Gamma^{12}_{345}$ (615).]

Figure 5.1: A branch and bound tree generated from Algorithm 1. Red nodes are pruned through infeasibility, blue nodes are pruned through bounding.

It is useful to introduce the shorthand notation $\Gamma^{l \ldots n}_{i \ldots k} = \{\gamma \in \{0,1\}^p : \gamma_a = 0 \ \forall a \in \{i, \ldots, k\},\ \gamma_b = 1 \ \forall b \in \{l, \ldots, n\}\}$; each node labelled $\Gamma^{l \ldots n}_{i \ldots k}$ corresponds to the subproblem $P_F(S \cap \Gamma^{l \ldots n}_{i \ldots k})$, annotated with the value of its relaxation if computed. The root node $\Gamma$ is relaxed to obtain the RSS under the full model, 590. Branching is performed on predictor 1, and two subproblems with additional constraints $\gamma_1 = 1$ and $\gamma_1 = 0$ are added to the stack M. Popping the top problem off

of the stack corresponds to the subproblem with constraint $\gamma_1 = 1$, $\Gamma^1$, for which the relaxation is the same as at the previous node. Branching is performed on predictor 2 and two subproblems are added to the stack. Problem $\Gamma^{12}$ is popped from the top of the stack, for which the relaxation is again the same, and branching occurs on predictor 3, creating two subproblems added to the stack. The next problem is now $\Gamma^{123}$. Models in this part of the domain include three predictors and are infeasible. Nothing is done with this node and it, along with its subtree, is passively pruned. The next problem corresponds to $\Gamma^{12}_{3}$, for which the relaxation is 596. Branching occurs on predictor 4 and two subproblems are added to the stack. The next problem is infeasible as it contains three predictors, and so this node along with its subtree is pruned. The next problem is $\Gamma^{12}_{34}$, for which the relaxation is 597. Branching occurs on predictor 5 and two subproblems are added to the stack. The next problem is infeasible and is pruned. The relaxation of the next problem is 615 and provides a feasible solution to the original problem, namely $(1,1,0,0,0)$. The upper bound is updated to 615 and the best feasible solution acquired so far is recorded. The next problem is $\Gamma^{1}_{2}$, for which the relaxation is 616. This is greater than the current upper bound $U = 615$, and so the optimal solution cannot be found in this subtree. This node is pruned by dominance. The next problem is $\Gamma_{1}$, for which the relaxation is 617. As before, this is pruned by dominance. The stack is now empty, and the result that $(1,1,0,0,0)$ attains the optimal objective value of 615 is returned. This basic illustration was able to guarantee the optimal model of size 2 in 6 function evaluations instead of 10. It masks, however, an important point regarding the decision of which predictor to branch on. This branching strategy followed an arbitrary order, whereas typically information from the solution to the relaxation is used to decide on which predictor to branch. The performance of this BnB algorithm is limited by the tightness of the relaxation. The solution to the relaxation provides a lower bound on a subproblem, and subproblems are skipped when this lower bound is greater than the current best upper bound. If it were possible to obtain a greater (tighter) lower bound on a subproblem, then more of them could be skipped. It is possible to get a much tighter lower bound by modelling (5.1) as a mixed integer

quadratic program.
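The walkthrough above can be reproduced in code. The following sketch (my own, not the implementation behind the “leaps” package) runs Algorithm 1 on the size-$k$ RSS problem with the monotonicity relaxation and a LIFO stack, checking the answer against exhaustive enumeration on a small simulated dataset.

```python
import numpy as np
from itertools import combinations

def rss(X, y, cols):
    """Residual sum of squares of the least squares fit on the given columns."""
    cols = sorted(cols)
    if not cols:
        return float(y @ y)
    beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    r = y - X[:, cols] @ beta
    return float(r @ r)

def bnb_best_subset(X, y, k):
    """Algorithm 1 for min RSS over models of size k, with the relaxation
    obtained by dropping the cardinality constraint (monotonicity bound)."""
    p = X.shape[1]
    U, best = np.inf, None
    stack = [(0, frozenset(), frozenset())]   # (next predictor, included, excluded)
    while stack:
        i, inc, exc = stack.pop()
        if len(inc) > k or p - len(exc) < k:
            continue                            # infeasible: cannot reach size k
        lower = rss(X, y, set(range(p)) - exc)  # relaxation: largest allowed model
        if lower >= U:
            continue                            # pruned by bounding
        if len(inc) == k:
            value = rss(X, y, inc)              # the node's only feasible model
            if value < U:
                U, best = value, inc
        elif i < p:
            stack.append((i + 1, inc, exc | {i}))   # branch right: gamma_i = 0
            stack.append((i + 1, inc | {i}, exc))   # branch left:  gamma_i = 1
    return sorted(best), U

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 6))
y = 2 * X[:, 1] + X[:, 4] + 0.1 * rng.standard_normal(50)
subset, value = bnb_best_subset(X, y, 2)
brute = min(combinations(range(6), 2), key=lambda s: rss(X, y, s))
print(subset, sorted(brute))   # the BnB result agrees with exhaustive search
```

Branching in index order mirrors the arbitrary branching of the illustration; a production solver would instead use information from the relaxation, as discussed next.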

5.3 Mixed Integer Programming

Recent work by Bertsimas et al. (2016) has demonstrated that mixed integer quadratic programming models are very effective in solving problem (5.1). A mixed integer quadratic program describes problems of the form

\[
\begin{aligned}
\underset{x}{\text{Minimize}} \quad & x^T H x + c^T x, \\
\text{subject to} \quad & Ax \leq b, \\
& x_i \in \mathbb{Z} \quad \forall i \in I,
\end{aligned}
\]

where $I \subset \{1, \ldots, p\}$ is an index set, $H \in \mathbb{R}^{p \times p}$ is a symmetric positive-semidefinite matrix, $c \in \mathbb{R}^p$, $A \in \mathbb{R}^{k \times p}$ and $b \in \mathbb{R}^k$. The program is mixed because there are both real-valued and integer variables. Special cases include quadratic programs ($I = \emptyset$), linear programs ($H = 0$, $I = \emptyset$) and mixed integer linear programs ($H = 0$), from which it follows that MIQPs are NP-hard (Kannan and Monma, 1978). The performance of the MIQP formulation is attributed to a combination of theoretical advancements in MIO solvers with enhanced hardware. Bixby (2012) compared the performance of twelve consecutive versions of CPLEX, a mixed integer solver from IBM (2011), dating from 1991 to 2007, on a set of mixed integer problems on the same computer, and reported a speedup of 29,000. This impressive speedup is due to theoretical and practical developments in the field of mixed integer programming, such as cutting plane methods (Marchand et al., 2002), column generation (Barnhart et al., 1996), integration with constraint programming to handle disjunctive constraints, preprocessing and automatic model reformulation. In combination with general speedups in computer hardware, solving mixed integer optimization problems is dramatically faster than it was 30 years ago. It is easy to verify, however,

that the relaxation provided by the MIQP is much tighter than that provided by monotonicity in the last section.

5.3.1 Introduction to Solving Mixed Integer Programs

An excellent introduction to solving mixed integer nonlinear convex/nonconvex programs, of which MIQP is a special case, is given in Belotti et al. (2012). The most basic algorithm for solving MIQP problems is a branch and bound algorithm, which dates back to the work of Dakin (1965). A relaxation of a MIQP problem is obtained by relaxing the integrality constraints on the integer variables, allowing them to be real-valued while satisfying any additional bounds present in the problem. This is sometimes referred to as a continuous relaxation or QP-relaxation, as the relaxation is a quadratic programming problem. A lower bound on a subproblem is then obtained by solving this quadratic program via any appropriate algorithm, such as the QP-simplex, interior point, or barrier algorithm, to name a few. Branching is then performed on one of the integer variables, creating two subproblems into which the set of feasible integers for that variable is partitioned. Selecting an integer variable to branch on has received much attention, as it is critical to the performance of the solver. A nice review of the most commonly used strategies can be found in Achterberg et al. (2005). The idea of branching on the most fractional variable under the solution to the relaxation (the variable furthest from integrality), known as maximum fractional branching or most infeasible branching, is apparently no better than randomly selecting a variable. Instead, the two most commonly used strategies are strong branching and pseudocost branching. Both of these approaches try to estimate the increase in the lower bound from branching on a fractional variable, and choose the variable that provides the greatest increase. For any subtree starting at a parent node, corresponding to a particular subset of the domain $A \subset D$, one has a lower bound on all the feasible solutions in $A$ provided by the relaxation. When

$A$ is partitioned into two new subsets $A^{+}$ and $A^{-}$ via branching, corresponding to two child nodes, one obtains an increased lower bound on all the feasible solutions in $A$, given by the minimum of the two relaxation solutions provided by the child nodes. The idea is to make this lower bound increase as fast as possible, so that as many nodes as possible can be pruned via comparison with the upper bound. In

the notation of Belotti et al. (2012), introduce degradation estimates $D_i^{+}$ and $D_i^{-}$ for the increase in the lower bound from branching left and right on variable $i$. A good variable to branch on is one for which both degradation estimates are large. A commonly used function to score variables is

\[
s_i = \mu \min(D_i^{+}, D_i^{-}) + (1 - \mu) \max(D_i^{+}, D_i^{-}), \tag{5.2}
\]

where $\mu$ is close to 1. Branching is then performed on the variable with the highest score. If the lower bound for the current node is $L$, strong branching computes

the lower bounds $L_i^{+}$ and $L_i^{-}$ by solving the relaxation of each subproblem for every possible choice of branching variable. The degradation estimates are then no longer

estimates but exact: $D_i^{+} = L_i^{+} - L$ and $D_i^{-} = L_i^{-} - L$. This results in a branch and bound tree with very few nodes, but it is computationally heavy, as each branching decision requires solving a number of quadratic programs. In short, strong branching takes a peek at all possible branching options and chooses the one which makes the lower bound increase the fastest. Pseudocost branching, on the other hand, is an alternative to strong branching which aims to avoid this heavy computation. It attempts to estimate the increase in the lower bound by looking at historical increases whenever a variable has previously been branched upon. In particular, if variable $i$ has been branched upon $n_i$ times in the history of the algorithm, one can estimate the increase in the lower bound per

unit change in $x_i$ by averaging across previous instances:

\[
\Delta_L^{+} = \frac{1}{n_i} \sum_{j=1}^{n_i} \frac{L_{ji}^{+} - L_j}{\lceil y_{ji} \rceil - y_{ji}}, \qquad
\Delta_L^{-} = \frac{1}{n_i} \sum_{j=1}^{n_i} \frac{L_{ji}^{-} - L_j}{\lfloor y_{ji} \rfloor - y_{ji}}, \tag{5.3}
\]

where $L_j$ is the lower bound of parent node $j$, $y_j$ is the argument at which the relaxation

of node $j$ attains its minimum, $L_{ji}^{+}$ is the lower bound on the child node attained from branching left on variable $i$, and $L_{ji}^{-}$ is defined similarly. If the relaxation of the current problem attains its minimum at $y$, then the degradation estimates

\[
D_i^{+} = \Delta_L^{+} \left( \lceil y_i \rceil - y_i \right), \qquad
D_i^{-} = \Delta_L^{-} \left( \lfloor y_i \rfloor - y_i \right) \tag{5.4}
\]

can be used. These ideas can be combined to form hybrid approaches, or generalized into an approach called reliability branching (Achterberg et al., 2005). Common node selection strategies are depth-first, best-first and least-discrepancy search, with no single strategy being uniformly better than the others. Depth-first search has two favourable properties. The first is that it dives down the tree very quickly to obtain a feasible solution, and hence an upper bound, early in the algorithm. This helps subsequent nodes to be pruned. In addition, sometimes the BnB algorithm is not run until the optimality gap (upper bound minus lower bound) becomes zero, but instead until it falls within some tolerance or a time limit is exceeded. In this case it is useful to be able to return the best feasible solution found so far. With respect to memory, depth-first search is also efficient: when there are only $p$ binary variables to branch on, there are at most $p$ subproblems stored in M, as the algorithm branches left until a node is pruned or infeasible and then backtracks. Best-first search, on the other hand, has exponential worst-case memory complexity. It selects as the next problem to solve the one with the smallest lower bound among all subproblems stored in M. This has the potential to store all subproblems in M: suppose best-first search decides to branch left on a node, but the children have a

lower bound that is greater than those of all the other nodes in M; it then branches on one of the other nodes, and the same thing happens. In the worst case this branches on all nodes and adds them all to M. If, however, our relaxation is perfect, so that the solution to the relaxation is no longer merely a lower bound but actually the solution to the subproblem, then best-first search branches optimally and creates the tree with the smallest number of nodes. Best-first search therefore works well when the relaxation is tight. Least- or limited-discrepancy search is a very interesting idea which uses a heuristic to drive the search strategy. It is based on the intuition that a heuristic might have succeeded in locating the global optimum were it not for a small number of mistakes along the way. Consider a greedy forward selection procedure that selects predictors 3, 1 and 4 out of a possible 5. This can be regarded as branching left on 3, then on 1, then on 4, and then branching right on 2 and then 5. A path of discrepancy $n$ is a set of branching decisions which differ from those taken by the heuristic $n$ times. For example, left on 3, right on 1, left on 4, right on 2 and then right on 5 is a path of discrepancy one, whereas left on 3, right on 1, left on 4, left on 2 and right on 5 is a path of discrepancy two. Limited discrepancy search traverses the tree in order of increasing discrepancy. As the discrepancy number increases, the heuristic is trusted less and less. It embodies the idea that the global optimum may be reached by a sequence of steps similar to those taken by the heuristic. Depending on its implementation, its memory efficiency falls between that of depth-first and best-first search.
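The discrepancy ordering can be made concrete with a small generator. The encoding of branching decisions as 'L'/'R' tuples is my own illustrative choice, not a solver convention.

```python
from itertools import combinations

def paths_by_discrepancy(heuristic_path):
    """Yield all branching paths in order of increasing discrepancy from the
    heuristic's path, where each decision is 'L' (left) or 'R' (right)."""
    flip = {"L": "R", "R": "L"}
    n = len(heuristic_path)
    for d in range(n + 1):                     # discrepancy 0, 1, 2, ...
        for positions in combinations(range(n), d):
            yield tuple(flip[c] if i in positions else c
                        for i, c in enumerate(heuristic_path))

# Greedy forward selection chose predictors 3, 1, 4 (then excluded 2 and 5):
heuristic = ("L", "L", "L", "R", "R")
paths = list(paths_by_discrepancy(heuristic))
print(paths[0])    # the heuristic's own path (discrepancy 0)
print(paths[1])    # a discrepancy-1 path: one decision flipped
print(len(paths))  # all 2^5 = 32 paths are eventually visited
```

The heuristic's path comes first, then every path with one flipped decision, and so on, so the search explores the tree in decreasing order of trust in the heuristic.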

5.4 A MIQP for Best Subset Selection

Bertsimas et al. (2016) introduced a mixed integer quadratic program equivalent to (5.1):

\[
\begin{aligned}
\underset{\beta, \gamma}{\text{Minimize}} \quad & F(\beta) := \|Y - X\beta\|_2^2, \\
\text{subject to} \quad & \gamma \in \{0,1\}^p, \\
& \sum_{i=1}^{p} \gamma_i = k, \\
& (\beta_i \neq 0 \wedge (1 - \gamma_i) = 0) \vee (\beta_i = 0 \wedge (1 - \gamma_i) \neq 0).
\end{aligned}
\tag{5.5}
\]

Problem (5.5) is not technically a MIQP, as the last constraint is not linear. In fact, the feasible set defined by this disjunction is not even convex. There are ways to incorporate logical constraints into the problem by performing special branching procedures, but discussion of these is deferred to a later chapter. Instead, it is useful to relax this constraint to a linear constraint via a big-M transformation.

5.4.1 Big-M Transformation

The big-M transformation is a neat trick for relaxing disjunctions to linear inequalities, allowing the problem to be solved as a MIQP. It requires knowledge of upper and lower bounds on the parameters, in this case the regression coefficients. Using these upper and lower bounds, problem (5.5) can be reformulated as

\[
\begin{aligned}
\underset{\beta, \gamma}{\text{Minimize}} \quad & F(\beta) := \|Y - X\beta\|_2^2, \\
\text{subject to} \quad & \gamma \in \{0,1\}^p, \\
& \sum_{i=1}^{p} \gamma_i = k, \\
& \underline{\beta}_i \gamma_i \leq \beta_i \leq \overline{\beta}_i \gamma_i.
\end{aligned}
\tag{5.6}
\]

When $\gamma_i = 1$, $\beta_i$ is free to lie within the interval $[\underline{\beta}_i, \overline{\beta}_i]$; otherwise $\beta_i = 0$. Naturally, the lower and upper bounds must not render the globally optimal solution for $\beta$ infeasible, yet choosing tighter bounds results in better performance. The reason

for this is that choosing larger bounds increases the size of the feasible set, which degrades the quality of the relaxation, as minimizing a function over a larger set can only achieve a smaller or equal value. These upper and lower bounds can be estimated from the data using optimization, a technique known as optimality based bounds tightening. A discussion of this is deferred to a later section in order to first illustrate an important point.

5.4.2 Monotonicity vs Continuous Relaxation

The reason the MIQP-based branch and bound outperforms the monotonicity-based branch and bound is that its relaxation is always tighter. To illustrate this, suppose an example dataset has been generated where $n = 100$, $p = 60$, the rows of the design matrix are simulated from a zero-mean multivariate normal with covariance matrix $\Sigma_{ij} = 0.7^{|i-j|}$, and observations $y_i \mid \beta \sim N(x_i^T \beta, 1)$, where the true regression vector has elements equal to 1 for $i \in \{10, 20, 30, 40, 50, 60\}$ and zero otherwise. Suppose the cardinality constraint is $k = 6$ and a node has been reached through branching that corresponds to a subproblem of (5.6) with constraints $\gamma_i = 1$ for $i \in \{10, 30\}$ and $\gamma_i = 0$ for $i \in \{20, 40\}$. The choice of node here is arbitrary; it is just

for the purposes of illustration. The optimal model in $\Gamma^{10,30}_{20,40}$ subject to the cardinality constraint contains predictors 7, 10, 30, 50, 57 and 60, for which the regression coefficients are 0.48, 1.05, 0.76, 0.95, −0.40 and 1.03, with an RSS value of 251.53. A lower

bound on the RSS for the optimal model in $\Gamma^{10,30}_{20,40}$ can be obtained via monotonicity, i.e. solving the relaxation obtained by removing the cardinality constraint, by evaluating the RSS at $\gamma^a$, where $\gamma^a_i = 1$ for all $i \notin \{20, 40\}$ and zero otherwise. This gives a lower bound of 28.26. In contrast, suppose one places lower and upper bounds on each regression coefficient of −1.5 and 1.5 respectively. Adding these constraints does not change the optimal solution of the subproblem, as the optimal solution remains feasible. The lower bound obtained from solving the continuous relaxation of (5.6)

is 150.43. With moderate bounds on the regression coefficients, therefore, the lower bound provided by the continuous relaxation is much tighter than that obtained by removing the cardinality constraint. As the lower and upper bounds for $\beta$ tighten around their values under the optimal model, the continuous relaxation becomes perfect, and the lower bound becomes equal to the solution of the subproblem. As they become larger, however, the lower bound provided by the continuous relaxation approaches the lower bound provided by monotonicity. Recall that the relaxation used for monotonicity was obtained by simply removing the cardinality constraint; having large bounds essentially does the same thing. A very large lower or upper bound on $\beta_i$ multiplied by a small fractional value of $\gamma_i$ is still large, and the feasible set in $\mathbb{R}^p$ for $\beta$ defined by the constraints is then so large that the constraints are practically irrelevant. It is clear, therefore, that in order to get better performance than the branch and bound based on monotonicity one need only specify finite bounds on the regression coefficients, so long as they do not render the globally optimal solution for $\beta$ infeasible. These bounds can be obtained through optimality based bounds tightening.

5.5 Optimality Based Bounds Tightening

Generally speaking, if one is trying to minimize an objective function $F(x)$ over a feasible set $S$ and one already has a feasible solution $\tilde{x}$, obtained perhaps through a heuristic, providing an upper bound $\tilde{F} := F(\tilde{x})$, then one can obtain upper and lower bounds on $x_i$ by solving

\[ \overline{x}_i = \max\{x_i : x \in S,\ F(x) \leq \tilde{F}\}, \tag{5.7a} \]

\[ \underline{x}_i = \min\{x_i : x \in S,\ F(x) \leq \tilde{F}\}. \tag{5.7b} \]

In practice the feasible set $S$ may be non-convex, and so the bounds are typically optimized over a convex relaxation of $S$. With respect to the best subset selection problem, the feasible set is defined by constraints on the model size. Maximizing or minimizing $\beta_i$ over this set would lead to another discrete optimization problem, so instead one can solve a relaxation obtained by removing this constraint:

\[ \overline{\beta}_i = \max\{\beta_i : F(\beta) \leq \tilde{F}\}, \tag{5.8a} \]

\[ \underline{\beta}_i = \min\{\beta_i : F(\beta) \leq \tilde{F}\}, \tag{5.8b} \]

as advocated in Bertsimas et al. (2016). How to obtain these bounds is described later through optimality based bounds tightening. To conclude the example of best subset selection, two full runs of the algorithm are demonstrated, with $\overline{\beta}_i = -\underline{\beta}_i = 1.5$ and with $\overline{\beta}_i = -\underline{\beta}_i = 300$, the latter emulating the behaviour of the branch and bound based on monotonicity. The same dataset as described earlier is used, with the exception of reducing the number of observations to $n = 70$ in order to increase the runtime; otherwise the BnB with regression coefficient bounds at 1.5 converges too quickly. In addition, the model size was increased to $k = 10$. Figure 5.2 shows the optimality gap of both runs over time. When $\overline{\beta}_i = -\underline{\beta}_i = 1.5$, the BnB algorithm converges to global optimality in under 6 seconds, whereas when $\overline{\beta}_i = -\underline{\beta}_i = 300$, emulating lower bounds equivalent to those obtained by monotonicity, it takes 524 seconds.
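For the pure least squares objective, problems (5.8a) and (5.8b) can in fact be solved in closed form: the set $\{\beta : F(\beta) \leq \tilde{F}\}$ is the ellipsoid $(\beta - \hat{\beta})^T X^T X (\beta - \hat{\beta}) \leq \tilde{F} - F(\hat{\beta})$ centred at the OLS estimate $\hat{\beta}$, over which the extreme values of coordinate $i$ are $\hat{\beta}_i \pm \sqrt{(\tilde{F} - F(\hat{\beta}))\,(X^T X)^{-1}_{ii}}$. The sketch below implements this closed form; it is my own derivation rather than code from the text, and assumes $X$ has full column rank.

```python
import numpy as np

def obbt_bounds(X, y, F_tilde):
    """Closed-form solutions of (5.8a)-(5.8b) for F(beta) = ||y - X beta||_2^2.

    {beta : F(beta) <= F_tilde} is an ellipsoid centred at the OLS estimate,
    so each coordinate attains its extremes at
    beta_hat_i +/- sqrt((F_tilde - RSS_hat) * (X^T X)^{-1}_{ii})."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    rss_hat = float(np.sum((y - X @ beta_hat) ** 2))
    radius = np.sqrt(np.maximum(F_tilde - rss_hat, 0.0) * np.diag(XtX_inv))
    return beta_hat - radius, beta_hat + radius

rng = np.random.default_rng(2)
X = rng.standard_normal((40, 3))
y = X @ np.array([1.0, 0.0, -1.0]) + 0.3 * rng.standard_normal(40)
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
F_tilde = 1.2 * float(np.sum((y - X @ beta_hat) ** 2))   # upper bound from a feasible point
lower, upper = obbt_bounds(X, y, F_tilde)
print(np.all(lower < beta_hat) and np.all(beta_hat < upper))   # True: OLS lies strictly inside
```

The tighter the upper bound $\tilde{F}$, the smaller the ellipsoid and hence the tighter the resulting big-M bounds, which is exactly why a good heuristic solution improves the MIQP relaxation.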

5.6 Mixed Integer Models for Point Mass-Laplace Mixture Priors

In order to include the $\ell_1$ norm in the objective of a mixed integer quadratic program, one introduces auxiliary variables $s_i$ representing $|\beta_i|$ and adds the constraints $-s_i \leq \beta_i \leq s_i$. The MIQP

Figure 5.2: Optimality gap (upper bound minus lower bound) when the lower and upper bounds on the regression coefficients have absolute value 1.5 (blue, short-dashed) and 300 (red, long-dashed).

can then be formulated as

\[
\begin{aligned}
\underset{s, \beta, \gamma}{\text{Minimize}} \quad & F(s, \beta, \gamma) := \frac{1}{2n}\|Y - X\beta\|_2^2 + \lambda_1 \sum_{i=1}^{p} s_i + \lambda_0 \sum_{i=1}^{p} \gamma_i, \\
\text{subject to} \quad & \underline{\beta}_i \gamma_i \leq \beta_i \leq \overline{\beta}_i \gamma_i \quad \forall i \in \{1, \ldots, p\}, \\
& -s_i \leq \beta_i \leq s_i \quad \forall i \in \{1, \ldots, p\}, \\
& s_i \geq 0 \quad \forall i \in \{1, \ldots, p\}, \\
& \underline{k} \leq \sum_{i=1}^{p} \gamma_i \leq \overline{k}, \\
& \gamma_i \in \{0,1\} \quad \forall i \in \{1, \ldots, p\},
\end{aligned}
\tag{5.9}
\]

where $\underline{k}$ and $\overline{k}$ are lower and upper bounds on the model size. These can initially be taken to be 0 and $p$ respectively, but can be tightened using bound tightening procedures. It may be tempting to think that the big-M constraint could use $s_i$ instead, that is, $-s_i \gamma_i \leq \beta_i \leq \gamma_i s_i$. This would be very convenient, as it would avoid estimation of the lower and upper bounds $\underline{\beta}_i$ and $\overline{\beta}_i$, but it is neither a linear nor a quadratic constraint, as the associated second-order matrix is not positive semi-definite. These bounds must instead be estimated through optimality based bounds tightening.
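The role of the auxiliary variables $s_i$ can be checked directly: for fixed $\beta$, minimizing $\lambda_1 \sum_i s_i$ subject to $-s_i \leq \beta_i \leq s_i$ and $s_i \geq 0$ drives each $s_i$ down to $|\beta_i|$, so the penalty term equals $\lambda_1 \|\beta\|_1$ at the optimum. A tiny sketch (names are mine):

```python
def l1_via_auxiliary(beta, lam):
    """For fixed beta, the smallest feasible s_i under -s_i <= beta_i <= s_i
    and s_i >= 0 is |beta_i|, so the auxiliary penalty equals lam * ||beta||_1."""
    s = [max(b, -b) for b in beta]   # smallest s_i satisfying both constraints
    return lam * sum(s)

print(l1_via_auxiliary([1.5, -2.0, 0.0], 0.1))   # lam * (1.5 + 2.0 + 0.0), about 0.35
```

This is the standard linearization of an absolute value, and is what makes the $\ell_1$ term representable in the quadratic program.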

5.6.1 Example: Diabetes Dataset

Once again consider the diabetes dataset of Efron et al. (2004). The dataset consists of $n = 442$ observations and $p = 64$ predictors. A training/test partition of the data was made by randomly selecting 350 rows to train on. The correlation among predictors is shown in Figure 5.3. The response and design matrix columns are stan-

[Figure 5.3 here: correlation matrix of the 64 predictors, comprising the ten main effects (age, sex, bmi, map, tc, ldl, hdl, tch, ltg, glu), their squares, and the pairwise interactions, with a colour scale from 0.2 to 1.]
● ● ● ● ● ● ● ● ● ● ● ● sex:tc ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● sex:ldl ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● sex:hdl ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 sex:tch ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● sex:ltg ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● sex:glu ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● bmi:map ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● bmi:tc ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● bmi:ldl ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −0.2 bmi:hdl ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● bmi:tch ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● bmi:ltg ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● bmi:glu ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● map:tc ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● map:ldl ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 
● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −0.4 map:hdl ● ● ● ● ● ● ●● ● ● ● map:tch ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● map:ltg ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● map:glu ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● tc:ldl ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● tc:hdl ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● tc:tch ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● −0.6 tc:ltg ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● tc:glu ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ldl:hdl ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ldl:tch ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ldl:ltg ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ldl:glu ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −0.8 hdl:tch ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● hdl:ltg ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● hdl:glu ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● tch:ltg ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● tch:glu ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ltg:glu ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● −1

Figure 5.3: Predictor correlations in diabetes dataset.

dardized to have zero mean and ℓ2-norm of √n. To get some perspective, 10-fold cross-validated lasso was performed with the R package glmnet. The optimal value of

λ1 obtained by cross validation is 0.017, resulting in a regression vector with 31 nonzero elements, for which the largest element is 0.35 in absolute value, achieving an out-of-sample squared predictive error of 51.87. For the point-mass-Laplace model,

I take in this case a subjective approach to selecting λ0 and λ1. An estimate σ̂² is obtained from fitting the full model, providing a value of 0.44. This determines λ0 as −(σ̂²/n) log(p). Given that λ1 in lasso is struggling to achieve a balance between selection and shrinkage, I expect that λ1 need not be so large for the point-mass-Laplace prior, as λ0 will take care of the selection, so I make the subjective choice λ1 = 0.01, i.e. smaller than the value obtained by cross validation previously. Given that the largest element selected by lasso is 0.35 in absolute value, I am also willing to make the assumption that all the coefficients lie in the interval [−4, 4] (for the Big-M transformation). With these values, it takes no longer than 11 seconds to solve the MIQP, resulting in a regression vector with 7 nonzero elements and a squared predictive error of 46.59. This is usually a first pass at solving the MIQP, as typically one may wish to avoid specifying a hard constraint such as β_i ∈ [−4, 4]. A more thorough approach follows.
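The Big-M constraint referenced here can be made concrete: with M = 4 the encoding is −Mγ_i ≤ β_i ≤ Mγ_i, which forces β_i = 0 whenever γ_i = 0. A hedged sketch (illustrative vectors, not the fitted diabetes coefficients):

```python
# Hedged sketch of the Big-M encoding of "beta_i = 0 unless gamma_i = 1".
# M = 4 corresponds to the assumption beta_i in [-4, 4]; names are illustrative.

def big_m_feasible(beta, gamma, M=4.0):
    """Check -M*gamma_i <= beta_i <= M*gamma_i for all i."""
    return all(-M * g <= b <= M * g for b, g in zip(beta, gamma))

# A coefficient may be nonzero only where its indicator is switched on.
print(big_m_feasible([0.35, 0.0, -1.2], [1, 0, 1]))  # True
print(big_m_feasible([0.35, 0.1, -1.2], [1, 0, 1]))  # False: beta_2 != 0 but gamma_2 = 0
```

When γ_i = 0, both sides of the constraint collapse to 0 ≤ β_i ≤ 0, which is exactly the disjunctive requirement written linearly.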

Optimality Based Bound Tightening

The greedy coordinate descent heuristic produces a regression vector with 7 nonzero elements, attaining an objective value of 0.26. Let F̃ = 0.26 and initially let β̄_i = ∞ and β̲_i = −∞ for all i = 1, ..., p, with model-size bounds k̲ = 0 and k̄ = p. Then an upper bound on β_i can be obtained from solving

    maximize    β_i
    subject to  F(s, β, γ) ≤ F̃,
                −β̲_i γ_i ≤ β_i ≤ β̄_i γ_i    ∀i ∈ {1, ..., p},        (5.10)
                −s_i ≤ β_i ≤ s_i             ∀i ∈ {1, ..., p},
                s_i ≥ 0                      ∀i ∈ {1, ..., p},
                k̲ ≤ Σ_{i=1}^p γ_i ≤ k̄,

and similarly minimizing to obtain a lower bound on β_i. Clearly the globally optimal solution must be a member of the set {(s, β, γ) ∈ ℝ₊^p × ℝ^p × {0,1}^p : F(s, β, γ) ≤ F̃}. It must also lie in the original feasible set. Problem (5.10) maximizes β_i over a relaxation of the intersection of these two sets. The relaxation obtained by dropping the integrality constraints is used, so that the upper bound comes from solving a QCP rather than a MIQCP. One also has two choices. Solving these problems serially means that the bounds are updated between consecutive indices i, so the upper and lower bounds obtained are tighter. Alternatively, one can solve these p quadratic programs in parallel, keeping the upper and lower bounds at plus and minus infinity. Solving each of these quadratically constrained programs takes approximately 0.01 to 0.02 seconds for the diabetes dataset. After looping serially through the coefficients, the resulting upper and lower bounds are displayed in blue in figure 5.4.
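The bound-tightening subproblem has a transparent special case. If the penalty terms are dropped and only the level-set constraint ‖y − Xβ‖²₂ ≤ F̃ is kept, the feasible β form an ellipsoid centred at the OLS solution β̂, and maximizing or minimizing β_i over it has the closed form β̂_i ± √((F̃ − RSS)·(XᵀX)⁻¹_ii). A minimal sketch under these simplifying assumptions (toy data, not the diabetes problem):

```python
import math

# Toy design: 2 predictors, 3 observations (illustrative, not the diabetes data).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 2.5]

def xtx(X):
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]

def xty(X, y):
    p = len(X[0])
    return [sum(X[r][i] * y[r] for r in range(len(X))) for i in range(p)]

def inv2(A):  # inverse of a 2x2 matrix
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]

A = xtx(X)            # X'X
Ainv = inv2(A)
b = xty(X, y)
beta_hat = [Ainv[0][0] * b[0] + Ainv[0][1] * b[1],
            Ainv[1][0] * b[0] + Ainv[1][1] * b[1]]
rss = sum((y[r] - sum(X[r][j] * beta_hat[j] for j in range(2))) ** 2 for r in range(3))

F_tilde = rss + 0.5   # incumbent objective value from a heuristic, chosen by hand here
slack = F_tilde - rss
# Max/min of beta_i over the ellipsoid (beta - beta_hat)' X'X (beta - beta_hat) <= slack.
bounds = [(beta_hat[i] - math.sqrt(slack * Ainv[i][i]),
           beta_hat[i] + math.sqrt(slack * Ainv[i][i])) for i in range(2)]
print(bounds)  # each interval brackets the corresponding OLS coefficient
```

The full problem (5.10) additionally carries the indicator and model-size constraints, so the solver's intervals are at least as tight as these ellipsoid bounds.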

Given tightened bounds on the β_i's, it is now possible to obtain an upper bound on the maximum model size by solving

    maximize_{γ ∈ {0,1}^p}    Σ_{i=1}^p γ_i
    subject to  F(s, β, γ) ≤ F̃,
                −β̲_i γ_i ≤ β_i ≤ β̄_i γ_i    ∀i ∈ {1, ..., p},        (5.11)
                −s_i ≤ β_i ≤ s_i             ∀i ∈ {1, ..., p},
                s_i ≥ 0                      ∀i ∈ {1, ..., p},
                k̲ ≤ Σ_{i=1}^p γ_i ≤ k̄,

and similarly minimizing to obtain a lower bound. This gives an upper bound of 12 and a lower bound of 1. One can keep iterating between tightening the bounds on the β_i's and then on the model size, but subsequent improvements are modest. Figure 5.4 shows the bounds on the regression coefficients after a second round of tightening, performed after tightening k̲ and k̄.

Figure 5.4: Intervals of permissible coefficient values after the first bound tightening (blue) and second bound tightening (red) of model (5.9) on the diabetes dataset.

Results and Discussion

Using the bounds achieved by optimality-based bound tightening and warm-starting the solver with the solution found by the heuristic, it takes a mere 4 seconds to solve the MIQP, exploring a total of 10562 nodes. In addition, the bounds obtained this way are guaranteed not to render the global solution infeasible. As noted before, the estimator obtained by solving (5.9) achieves a squared predictive error of 46.59, whereas the lasso estimator achieves 51.87. The estimated regression vector is shown in figure 5.5. It illustrates a common phenomenon: lasso tends to include too many predictors.

Figure 5.5: Coefficients estimated by solving (5.9) (top) and by 10-fold cross-validated lasso (bottom).

It is an interesting question to ask how this method relates to the MIQP formulation of best subset selection by Bertsimas et al. (2016). Firstly, the heuristic used by the authors, which they call a "discrete first order algorithm", simply exploits the Lipschitz gradient property of the objective, and so is yet again a discovery of the

same algorithm discussed in the previous chapter. Consider solving (5.6) with k = 7. To be generous, the globally optimal solution is computed first and assumed to be found by the heuristic. Performing bound tightening on (5.6) provides the bounds shown in figure 5.6, which are much larger than the bounds computed for (5.9), as (5.6) contains no regularization term. Using the bounds found by bound tightening and warm-starting the solver with the globally optimal solution, it takes the solver 30 seconds to certify optimality, exploring a total of 55929 nodes. In short, it takes much longer to solve (5.6) than (5.9). Recall that branch and bound performs well when the globally optimal solution has an objective value that is much less than the rest; in that case pruning is very effective. If there are many local solutions with objective values very close to that attained by the global solution, then the branch and bound algorithm is inefficient at pruning. In the variable selection context, branch and bound performs well when

there is a dominant model. Adding the ℓ1 and ℓ0 penalties helps to differentiate the good models from the bad models, and hence increases the difference in their objective values. It was shown in Castillo et al. (2015) that point-mass-Laplace mixture priors enjoy optimal rates of posterior contraction; that is, they concentrate posterior mass on good models very quickly. In contrast, the best subset selection problem contains no prior or penalty, resulting in many feasible models with similar objective values. This is evident from the output of the solver: solving (5.6) explores 55929 nodes in 30 seconds, whereas solving (5.9) explores only 10562 nodes in 4 seconds. The solver was able to prune the tree much more quickly in problem (5.9) than in (5.6). In addition, (5.9) does not require an a priori constraint on the model size.
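The "discrete first order algorithm" is, in essence, projected gradient descent with a hard-thresholding projection: β ← H_k(β − (1/L) Xᵀ(Xβ − y)), where H_k keeps the k largest entries in magnitude and L bounds the Lipschitz constant of the gradient. A minimal sketch on synthetic data (illustrative, not the implementation compared above):

```python
# Hedged sketch of the Lipschitz-gradient heuristic for best subset selection:
# minimize ||y - X b||^2 subject to ||b||_0 <= k, via hard-threshold projection.
# Data and parameter values are illustrative, not the chapter's experiments.

def hard_threshold(beta, k):
    """Keep the k largest entries in magnitude, zero out the rest."""
    keep = set(sorted(range(len(beta)), key=lambda i: abs(beta[i]), reverse=True)[:k])
    return [b if i in keep else 0.0 for i, b in enumerate(beta)]

def iht(X, y, k, L, iters=200):
    """beta <- H_k(beta - (1/L) X^T (X beta - y))."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        resid = [sum(X[r][j] * beta[j] for j in range(p)) - y[r] for r in range(n)]
        grad = [sum(X[r][j] * resid[r] for r in range(n)) for j in range(p)]
        beta = hard_threshold([beta[j] - grad[j] / L for j in range(p)], k)
    return beta

# Orthogonal toy design: the heuristic should recover the two strongest signals.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [3.0, 0.1, -2.0]
print(iht(X, y, k=2, L=2.0))  # approximately [3.0, 0.0, -2.0]
```

The same update is the heuristic rediscovered in the previous chapter; only the projection step (onto the sparsity constraint) changes with the formulation.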

Figure 5.6: Intervals of permissible coefficient values after bound tightening of (5.6).

5.6.2 Example: Ozone Dataset

This example considers the Ozone dataset of Breiman and Friedman (1985). It consists of 366 daily maximum one-hour-averaged ozone measurements with twelve predictors. Of the twelve predictors, three are factors concerning time and date for book-keeping, and nine are meteorological covariates: pressure (Pr), wind speed (WS), humidity (Hu), temperature at Sandburg, California (TS), temperature at El Monte, California (TE), inversion base height (IH), pressure gradient (PG), inversion base temperature (IT) and visibility (Vi). After removing missing data there are only 203 observations. The data was randomly partitioned into training and test data by selecting 150 measurements to train on. It is common in previous analyses to drop the second temperature measurement. The models considered contain the remaining eight meteorological covariates, all two-way interactions between them, and their quadratic terms, resulting in p = 44 predictors. The correlations between predictors are shown in figure 5.8. The predictors and the response were centered and scaled to have ℓ2-norm of √n.

It is my practical experience that choosing the universal threshold of λ1 = σ√(2 log(p)/n) appears to be too strong. Performing cross-validated lasso results

in a tuning parameter of λ1 = 0.002. As in the analysis of the diabetes dataset, λ1 need not be as large in the point-mass-Laplace mixture prior, and so this parameter was taken as 0.001. An estimate of σ² was obtained from fitting a linear model on

the full model, defining λ0 as −(σ̂²/n) log(p). The greedy gradient descent heuristic identified a model with 5 predictors, of which only 2 are present in the optimal model, which contains the predictors Hu², PG², IT, Hu:PG, Hu:IT, TS:PG and TS:IT, with the colon denoting interaction. Optimality-based bound tightening resulted in the bounds displayed in blue in figure 5.7, where solving (5.10) for each index took less than 0.01 seconds. The bounds on the model size were then tightened to 1 and 17 by solving (5.11). A further round of bound tightening on the covariates resulted in the bounds shown in red in figure 5.7. The MIQP solver was seeded with the solution obtained from the heuristic and took 11 seconds to solve to global optimality. The squared out-of-sample predictive error of the resulting estimator was 11.37, compared with 11.55 for cross-validated lasso.
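The hyperparameter recipe used in both examples, λ0 = −(σ̂²/n) log(p), is a one-line computation. The values below are illustrative placeholders (diabetes-like dimensions with the reported σ̂² = 0.44), not a prescription:

```python
import math

# Hedged illustration of the lambda_0 formula; sigma2_hat, n and p are placeholders.
sigma2_hat, n, p = 0.44, 442, 64
lam0 = -(sigma2_hat / n) * math.log(p)
print(lam0)
```

The magnitude shrinks with n, so for large samples the selection penalty contributed by λ0 is mild relative to the data fit.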

Figure 5.7: Intervals of permissible coefficient values in the Ozone dataset after the first bound tightening (blue) and second bound tightening (red) of model (5.9).

5.7 Mixed Integer Models for Zellner’s G-Prior

In addition to spike-and-slab priors, Zellner's g-prior remains popular for its computational tractability and desirable properties. Consider a canonical linear model

Y | γ, α, β, σ² ∼ N(1α + X_γ β_γ, σ²I), where γ is a model index, α is an intercept term, and X_γ and β_γ are the design matrix and regression vector under model γ. A typical set of priors might be the g-prior for the regression coefficients and reference priors for


Figure 5.8: Predictor correlations in Ozone dataset.

the common model parameters α and σ², i.e.

    β_γ | σ², γ ∼ N(0, g σ² (X_γᵀ X_γ)⁻¹),    (5.12a)

    p(α, σ²) ∝ 1/σ².    (5.12b)

The computational tractability of these priors is largely imparted through a closed-form expression for the marginal likelihood

    p(Y | γ) = [Γ((n−1)/2) / (π^{(n−1)/2} √n)] ‖Y − Ȳ‖₂^{−(n−1)} (1+g)^{(n−1−p_γ)/2} / [1 + g(1 − R²_γ)]^{(n−1)/2},    (5.13)

which can be seen to strike a principled balance between the competing objectives of favouring small models and favouring models with a large coefficient of determination R²_γ. In addition to the posterior being independent of the units in which the covariates and response are measured, the only tuning parameter is g, making model selection under this prior a remarkably attractive automatic procedure. Many choices for g have been proposed in the literature, including g = n in the unit information prior of Kass and Wasserman (1995), resulting in behaviour similar to the BIC, and g = p²

in the risk inflation criterion of Foster and George (1994), as well as mixtures of g-priors in Liang et al. (2008). Despite the attractive properties of the resulting posterior, extracting information from it remains a challenging task. Enumeration of the model space soon becomes computationally prohibitive as p grows, and MCMC approaches, the computational complexity of which is studied in Yang et al. (2016), can be slow to converge and exhibit poor mixing. This section develops a mixed integer quadratically constrained programming model for selecting the optimal model under the g-prior.

Figure 5.9: Coefficients estimated by solving (5.9) (top) and by 10-fold cross-validated lasso (bottom) for the Ozone dataset.

5.7.1 Parameter Expansion

The posterior probability of a model γ resulting from priors (5.12) is given by

    log p(γ | Y) = k − (log(1+g)/2) Σ_{i=1}^p γ_i − ((n−1)/2) log(1 + g ‖Y_c − X_γ β̂_γ‖²₂ / ‖Y_c‖²₂) + log p(γ),    (5.14)

where k is a constant of proportionality and β̂_γ = (X_γᵀ X_γ)⁻¹ X_γᵀ Y_c. It is assumed that the data have been centered, Y_c = Y − Ȳ, in addition to the columns of the design matrix, so that 1ᵀX = 0_p. In what follows the subscript c will be dropped from Y_c, and Y shall refer to the centered data unless stated otherwise. In order to find the modal model γ̂ := arg max_{γ ∈ {0,1}^p} log p(γ | Y), it is in fact useful to re-introduce the regression vector β through the following observation.

Lemma 12.

    max_{γ ∈ {0,1}^p} log p(γ | Y) = max_{(γ,β) ∈ P} L(γ, β),    (5.15a)

where

    P = {(γ, β) ∈ {0,1}^p × ℝ^p : β_i = 0 if γ_i = 0, ∀i ∈ {1, ..., p}},    (5.15b)

    L(γ, β) = −(log(1+g)/2) Σ_{i=1}^p γ_i + log p(γ) − ((n−1)/2) log(1 + g ‖Y − Xβ‖²₂ / ‖Y‖²₂).    (5.15c)

Lemma 12 states that maximizing the posterior over models is equivalent to maximizing a new function L(γ, β) over a feasible set P which encodes the constraint that β_i cannot be nonzero unless γ_i = 1.

Proof. Let Φ(β) = ‖Y − Xβ‖²₂ and consider a specific γ. Minimizing Φ(β) over the feasible set P_γ = {b ∈ ℝ^p : b_i = 0 if γ_i = 0} results in the minimal value Φ(β̂_γ) = ‖Y − X β̂_γ‖²₂, where β̂_γ is the OLS estimator of β under model γ, padded with zeros to make a p-dimensional vector. By monotonicity of the logarithm and the observation that Φ(β̂_γ) ≤ Φ(β) for all β ∈ P_γ, it follows that

    −log(1 + g Φ(β̂_γ)/‖Y‖²₂) ≥ −log(1 + g Φ(β)/‖Y‖²₂)    ∀β ∈ P_γ,

and consequently

    log p(γ | Y) = max_{β ∈ P_γ} L(γ, β).

The result then follows from maximizing the left-hand side over γ ∈ {0,1}^p.
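Lemma 12 can be sanity-checked numerically on toy data: for a fixed γ, the β-dependent part of L(γ, β) is maximized at the OLS estimate β̂_γ, recovering the profile that appears in log p(γ | Y). A sketch (illustrative centered data and g; the model-size and log-prior terms are omitted since they do not depend on β):

```python
import math

# Toy centered data (illustrative, not the chapter's datasets): one predictor,
# so gamma = (1,) and X_gamma is the single column x.
y = [1.0, -0.5, 0.2, -0.7]
x = [0.9, -0.4, 0.1, -0.6]
n, g = len(y), 10.0

def L_beta(beta):
    """beta-dependent part of L(gamma, beta) in Lemma 12; the model-size and
    log-prior terms are constant in beta and are dropped."""
    rss = sum((yi - beta * xi) ** 2 for yi, xi in zip(y, x))
    yy = sum(yi ** 2 for yi in y)
    return -(n - 1) / 2.0 * math.log(1 + g * rss / yy)

beta_ols = sum(yi * xi for yi, xi in zip(y, x)) / sum(xi ** 2 for xi in x)
# The OLS estimate maximizes L over beta, as the proof of Lemma 12 argues.
assert all(L_beta(beta_ols) >= L_beta(b / 10.0) for b in range(-50, 51))
print(round(beta_ols, 4))
```

Because log is monotone, any β that lowers the residual sum of squares raises L, so the maximizer over β coincides with the least-squares solution for that model.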

There are two main challenges with the parameter-expanded formulation: neither the (negative) objective function nor the feasible set is convex. The disjunctive constraints can once again be overcome with a Big-M transformation, but the logarithmic term of the objective requires special attention.

Approximating the Logarithmic Term

The main challenge is the non-concave term resulting from the composition of the negative logarithm with a quadratic. One approach is to approximate this term using piecewise linear functions. It is easier to build piecewise linear approximations to univariate rather than multivariate functions over a bounded domain, and so it is useful to further reformulate the problem with the introduction of an auxiliary variable z as follows:

    maximize    −(log(1+g)/2) Σ_{i=1}^p γ_i − ((n−1)/2) log(z) + log p(γ)
    s.t.        −β̲_i γ_i ≤ β_i ≤ β̄_i γ_i,                              (5.16)
                1 + g ‖Y − Xβ‖²₂ / ‖Y‖²₂ ≤ z,
                z_min ≤ z ≤ z_max.

The objective is maximized by making z small, which in turn drives the residual sum of squares toward smaller values through the second constraint in (5.16). The initial lower and upper bounds on z are provided by the observation that the residual sum of squares is minimized under the full model and maximized under the null model, and can be specified as 1 + g‖Y − X(XᵀX)⁻¹XᵀY‖²₂/‖Y‖²₂ and 1 + g respectively. This conveniently defines an initial interval over which the logarithm must be approximated by a piecewise linear function, although this interval can be tightened. Once the logarithmic term is replaced by a piecewise linear function, the program falls into the category of a mixed integer quadratically constrained program.
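The initial interval for the auxiliary variable can be computed directly from this observation: the lower end uses the full-model residual sum of squares, and the upper end is 1 + g, since the null model has RSS = ‖Y‖² for centered data. A sketch with illustrative toy data:

```python
# Initial bounds on the auxiliary variable z in (5.16): lower end from the
# full-model RSS, upper end 1 + g from the null model (RSS = ||Y||^2, centered Y).
# Toy one-predictor data for illustration; values are placeholders.
y = [1.0, -0.5, 0.2, -0.7]
x = [0.9, -0.4, 0.1, -0.6]
g = 10.0

beta_full = sum(yi * xi for yi, xi in zip(y, x)) / sum(xi * xi for xi in x)
rss_full = sum((yi - beta_full * xi) ** 2 for yi, xi in zip(y, x))
yy = sum(yi * yi for yi in y)

z_lower = 1 + g * rss_full / yy   # full model minimizes the RSS
z_upper = 1 + g                   # null model: RSS equals ||Y||^2
print(z_lower, z_upper)
```

Any tightening of these bounds shrinks the interval over which the logarithm has to be approximated, and so reduces the number of segments needed for a given accuracy.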

Figure 5.10: (left) A piecewise linear approximation to the logarithm defined on breakpoints {b_i}, i = 1, ..., 6; (right) a polyhedral relaxation of the constraint z₁ = log z₂ for z₂ ∈ [z_min, z_max].

An alternative to approximation through piecewise linear functions is to use spatial branch and bound methods. The introduction of the auxiliary variable z also helps in applying these techniques, although we shall now call this variable z₂ and introduce an additional variable z₁:

    maximize    −(log(1+g)/2) Σ_{i=1}^p γ_i − ((n−1)/2) z₁ + log p(γ)
    s.t.        −β̲_i γ_i ≤ β_i ≤ β̄_i γ_i,                              (5.17)
                z₁ = log z₂,
                1 + g ‖Y − Xβ‖²₂ / ‖Y‖²₂ ≤ z₂,
                z_min ≤ z₂ ≤ z_max.

So far this has done nothing but replace a non-concave term in the objective with a non-convex constraint. This constraint can, however, be relaxed to a set of linear constraints through linearization, increasing the feasible set as illustrated in figure

5.10. Relaxing integrality constraints on the binary decision variables γi and solving the continuous relaxation over this enlarged convex feasible set provides a valid upper

bound on the original maximization problem. Spatial branch and bound methods

can now branch on z2 in addition to the binary γi’s, creating subproblems into which

the interval of feasible values for z₂ is partitioned. In each subproblem, the interval of feasible values for z₂ is smaller than in the parent problem, and so the polyhedral relaxation better approximates the true non-convex feasible set. Computing the upper bound for each subproblem requires solving a quadratically constrained program. While this can be implemented with minimal effort, some additional care is required about when to branch on the continuous z₂ versus the binary γ_i's, for which a number of branching strategies are discussed in Belotti et al. (2012). The approach of solving a QCP at each subproblem is valid but is not pursued here. Instead, one can go one step further and linearize all non-linear constraints, including the quadratic constraint, solving an LP at each subproblem. As linearizing the quadratic constraint increases the size of the feasible set, the upper bound provided by the LP relaxation is not as tight as that provided by the QCP relaxation. It is easier, however, to solve an LP than a QCP, which may result in less computation time overall. This is the approach used by the open-source solver Couenne of Belotti et al. (2009), a general-purpose mixed integer nonlinear program solver, against which the formulation based on piecewise linear approximations will be compared. For the latter, the Gurobi Optimization (2016) solver is used.
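The polyhedral relaxation of z₁ = log z₂ sketched in figure 5.10 can be checked numerically: by concavity of the logarithm, tangent lines overestimate it and the chord between the interval endpoints underestimates it, so the region between chord and tangents contains the graph. A small illustration (interval and tangent points chosen arbitrarily):

```python
import math

# Hedged sketch of the polyhedral relaxation of z1 = log(z2) on [z_min, z_max].
z_min, z_max = 1.5, 11.0

def tangent(at):
    # Tangent of log at 'at': log(at) + (z - at)/at; lies above log by concavity.
    return lambda z: math.log(at) + (z - at) / at

def chord(z):
    # Chord between the interval endpoints; lies below log by concavity.
    t = (z - z_min) / (z_max - z_min)
    return (1 - t) * math.log(z_min) + t * math.log(z_max)

tangents = [tangent(a) for a in (z_min, 4.0, z_max)]
for k in range(101):
    z = z_min + k * (z_max - z_min) / 100
    upper = min(t(z) for t in tangents)
    # The graph of log lies inside the relaxed region {chord <= z1 <= min tangents}.
    assert chord(z) - 1e-12 <= math.log(z) <= upper + 1e-12
print("relaxation contains z1 = log z2 on", (z_min, z_max))
```

Branching on z₂ splits the interval, and on each half the chord and tangents hug log more closely, which is exactly why the subproblem bounds tighten as spatial branch and bound proceeds.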

5.7.2 Piecewise Linear Approximations

Approximating non-linear and/or non-concave terms via piecewise linear functions allows the model to stay within the definition of a MIQCP and use optimized solvers, without the need to reach for more complicated MINLP techniques. To construct a piecewise linear approximation f̂(z) to f(z) over [z_min, z_max], one may start by defining d segments through a collection of breakpoints z_min = b⁰ < b¹ < ··· < b^d = z_max. The approximating function f̂ is then given by

    f̂(z) = f(b^{k−1}) + [(f(b^k) − f(b^{k−1})) / (b^k − b^{k−1})] (z − b^{k−1}),    z ∈ [b^{k−1}, b^k],    (5.18)

or equivalently

    f̂(z) = m_k z + a_k,    z ∈ [b^{k−1}, b^k],    (5.19)

where m_k = (f(b^k) − f(b^{k−1}))/(b^k − b^{k−1}) and a_k = f(b^{k−1}) − m_k b^{k−1}. The piecewise linear behaviour can then be encoded in at least two different ways: either with the introduction of new binary variables, or using specially ordered sets of type II (Beale and Tomlin, 1970), discussed in section 5.7.4. There are multiple ways to model piecewise linear functions using binary variables including, but not limited to, the multiple choice, disaggregated convex combination, and incremental models, a review of which can be found in Belotti et al. (2012). According to the computational experiments conducted by Vielma et al. (2010), the multiple choice model exhibits the best performance for small to moderate numbers of breakpoints

(≈ 16). This approach introduces d binary variables z_k, where z_k = 1 indicates that z lies in the interval [b^{k−1}, b^k], and d continuous variables w_k, where w_k = z if z lies in [b^{k−1}, b^k] and zero otherwise. These requirements can be encoded through the following constraints:

    Σ_{k=1}^d z_k = 1,    z_k ∈ {0, 1},
    Σ_{k=1}^d w_k = z,                                   (5.20)
    b^{k−1} z_k ≤ w_k ≤ b^k z_k,    k = 1, ..., d.

How to choose the breakpoints and the number of segments is also a consideration, as there is a trade-off between acquiring a good approximation and increasing the problem size through additional binary variables. Some approaches try to minimize the number of segments used, whilst others try to minimize the approximation error for a given number of segments. The approach used in this paper is simple: for a given number of segments d, partition the interval [log(z_min), log(z_max)] into d intervals of equal length, defining d + 1 equally spaced grid-points. The breakpoints are then defined as the image of the grid-points under the exponential function, as illustrated in figure 5.10. An alternative approach using specially ordered sets of type II instead of binary decision variables is discussed in section 5.7.4.
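The breakpoint construction just described is a one-liner: equally spaced points on [log z_min, log z_max], mapped back through the exponential. A sketch (illustrative d and interval); note that chords of the concave logarithm never overshoot it, so f̂ ≤ log on the whole interval:

```python
import math

def breakpoints(z_min, z_max, d):
    """d+1 breakpoints: an equally spaced grid on [log z_min, log z_max], then exp."""
    lo, hi = math.log(z_min), math.log(z_max)
    return [math.exp(lo + k * (hi - lo) / d) for k in range(d + 1)]

def f_hat(z, b):
    """Piecewise linear interpolation of log between consecutive breakpoints,
    i.e. (5.18) with f = log."""
    for k in range(1, len(b)):
        if z <= b[k]:
            m = (math.log(b[k]) - math.log(b[k - 1])) / (b[k] - b[k - 1])
            return math.log(b[k - 1]) + m * (z - b[k - 1])
    return math.log(b[-1])

b = breakpoints(1.5, 11.0, 6)
z = 5.0
# Chords of a concave function lie below it, so f_hat underestimates log.
print(round(f_hat(z, b), 4), "<=", round(math.log(z), 4))
```

Because the grid is uniform in log-space, consecutive breakpoints share a constant ratio, placing more of them where the logarithm curves most over the interval.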

5.7.3 Model Space Priors

So far there has been no reference to the kind of priors over the model space. The easiest priors to incorporate into a mixed integer formulation are those of the form

θ |γ| ppγq9 1 pγq, (5.21) 1 ´ θ tkď|γ|ďku ˆ ˙

where θ is a prior inclusion probability and k_min and k_max are minimum and maximum model sizes respectively, as this translates to a linear penalty on the model size on the log scale together with linear constraints. Such sparsity priors have been studied in combination with Zellner's g-prior in Yang et al. (2016). Naturally it is possible to set k_max = p, but choosing a smaller value when possible has computational advantages. Priors of the form p(γ) = h(∑_{i=1}^p γ_i), such as beta-binomial and Poisson priors, can also be formulated as part of a mixed integer program. In order to achieve true upper bounds it is necessary that the continuous relaxation is concave. Unfortunately h(∑_{i=1}^p γ_i) is not concave when the integrality constraints on the γ_i are relaxed. As done previously for the logarithm function, h could be approximated using piecewise linear functions.
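For the simple truncated prior (5.21), by contrast, no approximation is needed at all, since its log is linear in the model size; a minimal sketch (hypothetical helper name) evaluates it directly:

```python
import math

def log_model_prior(gamma, theta, k_min, k_max):
    """Log of the truncated Bernoulli model-space prior (5.21), up to its
    normalizing constant: linear in the model size, -inf outside
    [k_min, k_max]."""
    size = sum(gamma)
    if not (k_min <= size <= k_max):
        return -math.inf
    return size * math.log(theta / (1.0 - theta))
```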

Alternatively one can exploit the property that there are only p + 1 possible model sizes, and hence only p + 1 function values of h to model. This is a natural candidate for SOS constraints of type I, for which the construction is as follows.

5.7.4 Specially Ordered Sets of Type I/II

Generally speaking, a problem formulation may demand non-linear and even non-convex functions to be realistic. Fortunately, there are some tricks which, under certain circumstances, allow one to remain within the definition of a mixed integer (quadratically constrained) program, without reaching for more complicated MINLP methods. In the current context the prior on the model space may result in a non-linear penalty on the model size. Moreover, the penalty may be non-concave, ruling out the possibility of formulating the problem as a mixed integer non-linear concave program. In addition, the (negative) logarithm in equation (5.14) is a non-linear non-concave function. In this section we discuss the application of specially ordered sets (SOS) of type I and II to address these challenges.

Type I

When the non-linear function is a function of an integer decision variable assuming a finite number of possible values, the function can be modelled using SOS constraints of type I. Priors that result in a penalty on the size of the model contribute a term h(∑_{i=1}^p γ_i) to the objective, where h is a possibly non-linear and non-convex function. Despite these difficulties it is possible to exploit the fact that this term can only assume at most p + 1 values, which can be modelled linearly in the following manner:

h_0 x_0 + h_1 x_1 + ··· + h_p x_p = h(∑_{i=1}^p γ_i)    Function Row    (5.22a)

0·x_0 + 1·x_1 + ··· + p·x_p = ∑_{i=1}^p γ_i    Reference Row    (5.22b)

x_0 + x_1 + ··· + x_p = 1    Convexity Row    (5.22c)

x_k ≥ 0,

where h_i = h(i). The idea is to replace all occurrences of h(∑_{i=1}^p γ_i) with ∑_{i=0}^p h_i x_i, while adding the reference and convexity rows as constraints. Although the convexity row constraint could be achieved by requiring the x_k to be binary decision variables, modelling them as an SOS constraint of type I results in a more efficient branching process, enforcing a single x_k to be unity in the final solution with all others zero.

Instead of branching on an individual binary x_k, whereby two subproblems are created in which x_k = 0 and x_k = 1, an index k ∈ {0, ..., p} is chosen and two subproblems are created in which x_i = 0 for all i < k and x_i = 0 for all i > k, respectively. For more details, such as how to choose the aforementioned index, the reader is directed to Beale and Tomlin (1970).
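As a sanity check on the encoding (5.22), the following sketch (illustrative only, not solver code) builds the one-hot vector x for a given model and verifies the three rows:

```python
def sos1_rows(gamma, h):
    """Check the SOS-I encoding (5.22): build the one-hot vector x for the
    model size and verify the function, reference and convexity rows."""
    p = len(gamma)
    size = sum(gamma)
    x = [1 if k == size else 0 for k in range(p + 1)]
    assert sum(x) == 1                                   # convexity row
    assert sum(k * x[k] for k in range(p + 1)) == size   # reference row
    return sum(h(k) * x[k] for k in range(p + 1))        # function row = h(size)
```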

Type II

The second type of SOS constraint differs from the first in that two neighbouring z_k are allowed to be non-zero in the final solution instead of just one. This is useful for modelling piecewise linear functions, as the sum-to-one constraint encoded in the convexity row allows function values to be linearly interpolated between two break points. In contrast to the method presented in section 5.7.2, however, SOS-II constraints do not use any binary decision variables. This constraint finds application in the current context by allowing a piecewise linear approximation to the logarithm to be built. Let f(·) denote a piecewise linear approximation to the logarithm function over a collection of breakpoints {b_k}_{k=0}^d, as described in equation (5.19). This can be encoded linearly using SOS type II constraints in the following way:

f_0 z_0 + f_1 z_1 + ··· + f_d z_d = f(z)    Function Row    (5.23a)

b_0 z_0 + b_1 z_1 + ··· + b_d z_d = z    Reference Row    (5.23b)

z_0 + z_1 + ··· + z_d = 1    Convexity Row    (5.23c)

z_k ≥ 0,

where f_i = f(b_i). The idea is to replace every occurrence of f(z) with ∑_{k=0}^d f_k z_k, while adding the reference and convexity rows as constraints. Specifying (z_0, ..., z_d) as SOS-II enforces the constraint that only two neighbouring (z_i, z_{i+1}) can be non-zero in the final solution. The solution then satisfies z = b_i z_i + b_{i+1}(1 − z_i) and f(z) = f(b_i) z_i + f(b_{i+1})(1 − z_i), which linearly interpolates the function between the two breakpoints.
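The SOS-II interpolation can be illustrated directly: for a given z only the two weights on the bracketing breakpoints are non-zero, they sum to one, and the function row returns the piecewise linear approximation to the logarithm. A sketch with hypothetical helper names:

```python
import math

def sos2_weights(z, breakpoints):
    """A feasible SOS-II solution for a given z: at most two neighbouring
    weights are non-zero, they sum to one, and the reference row recovers z."""
    for i in range(len(breakpoints) - 1):
        b0, b1 = breakpoints[i], breakpoints[i + 1]
        if b0 <= z <= b1:
            w = [0.0] * len(breakpoints)
            w[i] = (b1 - z) / (b1 - b0)   # weight on b_i
            w[i + 1] = 1.0 - w[i]          # weight on b_{i+1}
            return w
    raise ValueError("z outside breakpoint range")

def piecewise_log(z, breakpoints):
    """Function row (5.23a): interpolate log between the bracketing breakpoints."""
    w = sos2_weights(z, breakpoints)
    return sum(wk * math.log(bk) for wk, bk in zip(w, breakpoints))
```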

5.7.5 Heuristics

Once again it is possible to develop heuristics for the g-prior similar to those in the last chapter. Under independent Bernoulli priors with inclusion probability θ, maximizing L(β, γ) in (5.15c) is equivalent to solving

Minimize  F(β) := log( 1 + g ‖Y − Xβ‖₂² / ‖Y‖₂² ) + Ω(β),    (5.24)

where

Ω(β) = λ_0 ‖β‖_0,    (5.25)

λ_0 = (2/(n−1)) ( −log(θ/(1−θ)) + (1/2) log(1+g) ),    (5.26)

since ‖β‖_0 = ∑_{i=1}^p γ_i. Indeed, if β̂ is optimal under (5.24), then (β̂, γ̂ := 1{β̂_i ≠ 0}) is optimal under (5.15c), and vice versa. This connects the problem with the literature on penalized regression. Marjanovic et al. (2015) and Blumensath and Davies (2008) present iterative algorithms for finding local solutions to ℓ_0-regularized least squares problems. The only complication is that the least squares term presents itself as the argument of a logarithmic expression. This can simply be overcome with a

linearization step, motivated as an MM (Majorization-Minimization) algorithm. Observe that the function

Q(β ‖ β^(t)) := log(z^(t)) + λ_0 ‖β‖_0 + (1/z^(t)) ( 1 + g ‖Y − Xβ‖₂² / ‖Y‖₂² − z^(t) ),

where z^(t) = 1 + g ‖Y − Xβ^(t)‖₂² / ‖Y‖₂²,

possesses the desired tangency and domination properties necessary for an MM algorithm. All that is left is to exploit the separability of the ℓ_0 penalty, which admits a tractable minimization solution in the univariate or orthogonal case:

prox_{tΩ}(z) = argmin_{x ∈ ℝ} { (1/(2t)) (z − x)² + λ_0 1{x ≠ 0} }    (5.27)
            = z · 1{ |z| > √(2 t λ_0) }.

With this in mind one can define coordinate descent updates,

β_i ← prox_{φ(t)Ω}( x_iᵀ ( Y − ∑_{j≠i} x_j β_j ) / n ),  where φ(t) = z^(t) ‖Y‖₂² / (g n),    (5.28)

or proximal gradient updates

β ← prox_{φ(t)Ω}( β − (1/L) Xᵀ(Xβ − Y) ),

where φ(t) = z^(t) ‖Y‖₂² / (g L),    (5.29)

L = λ_max(XᵀX).
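The MM proximal gradient update (5.29) can be sketched in a few lines (a minimal NumPy illustration; the function name and the fixed iteration count are my own choices, not from the thesis code):

```python
import numpy as np

def mm_proximal_gradient(X, Y, g, lam0, beta0=None, iters=200):
    """Sketch of the MM proximal gradient heuristic (5.29) for minimizing
    log(1 + g*||Y - X@beta||^2 / ||Y||^2) + lam0*||beta||_0."""
    n, p = X.shape
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    Y2 = Y @ Y
    L = np.linalg.eigvalsh(X.T @ X).max()      # L = lambda_max(X'X)
    for _ in range(iters):
        r = Y - X @ beta
        zt = 1.0 + g * (r @ r) / Y2            # majorization point z^(t)
        step = beta - (X.T @ (X @ beta - Y)) / L   # gradient step
        phi = zt * Y2 / (g * L)                # effective prox weight phi(t)
        thresh = np.sqrt(2.0 * phi * lam0)     # hard-threshold level from (5.27)
        beta = np.where(np.abs(step) > thresh, step, 0.0)
    return beta
```

On a noiseless toy problem with a single active predictor, the iteration recovers the sparse solution; on real data it serves only as a heuristic for warm-starting the exact solver.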

5.7.6 Bound Tightening

Let the objective in (5.32) be denoted L(γ, β, z). Once a feasible solution (γ̃, β̃, z̃) has been found through a heuristic, one can obtain a lower bound on (5.32) by evaluating the objective at this feasible point, L̃ = L(γ̃, β̃, z̃). It follows that the globally optimal value of z, denoted ẑ, must satisfy

ẑ ≤ exp( −2L̃ / (n−1) ),    (5.30)

meaning that the upper bound z_max can be reduced to z_max = min{1 + g, exp(−2L̃/(n−1))}. When using independent Bernoulli priors over the model space, it follows from the same reasoning that

k_max ≤ ⌊ ( L̃ + ((n−1)/2) log z_max ) / ( log(θ/(1−θ)) − log(1+g)/2 ) ⌋.    (5.31)

Upper and lower bounds on the regression coefficients can once again be computed by optimality based bounds tightening, similar to what was done in section 5.6.1.
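The bound tightening rules (5.30) and (5.31) are simple enough to compute directly; a sketch (hypothetical function name):

```python
import math

def tightened_bounds(L_tilde, n, g, theta):
    """Tighten the upper bound on z and the maximum model size, given the
    objective value L_tilde of a feasible (heuristic) solution, following
    (5.30) and (5.31)."""
    z_max = min(1.0 + g, math.exp(-2.0 * L_tilde / (n - 1)))
    # Per-predictor penalty in the objective (negative for theta < 1/2):
    c = math.log(theta / (1.0 - theta)) - 0.5 * math.log(1.0 + g)
    k_max = math.floor((L_tilde + 0.5 * (n - 1) * math.log(z_max)) / c)
    return z_max, k_max
```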

5.7.7 Example: Diabetes Dataset

Using independent Bernoulli priors with inclusion probability θ, and SOS-II constraints to model the piecewise linear function defined by a set of d breakpoints {b_i}_{i=0}^d with corresponding function values f_i = log(b_i), the overall MIQCP model is

Maximize   log(θ/(1−θ)) ∑_{i=1}^p γ_i − (log(1+g)/2) ∑_{i=1}^p γ_i − ((n−1)/2) ∑_{i=0}^d f_i z_i

s.t.   β_i^min γ_i ≤ β_i ≤ β_i^max γ_i   ∀ i ∈ {1, ..., p}

k_min ≤ ∑_{i=1}^p γ_i ≤ k_max

∑_{i=0}^d b_i z_i = z

∑_{i=0}^d z_i = 1    (5.32)

z_i ≥ 0   ∀ i ∈ {0, ..., d}

{z_0, ..., z_d} SOS-II

1 + g ‖Y − Xβ‖₂² / ‖Y‖₂² ≤ z

z_min ≤ z ≤ z_max.

Using the same dataset as in section 5.6.1, the MM-cyclic coordinate descent algorithm identified a model with objective value -919.21 containing predictors 3 and 9. The MM-proximal gradient descent algorithm failed to escape from its initial value of the zero vector. In contrast, shotgun stochastic search was run for 30 iterations, identifying a model with objective value -917.04 containing predictors 3, 4, 5 and 9. This allows z_max and k_max to be tightened to 191.57 and 13 respectively. The lower bound z_min, computed using the RSS of the full model, is 127.32. The d = 4 linear segments are specified through the set of breakpoints [127.32, 141.01, 156.17, 172.97, 191.57], at which the f_i values are [4.85, 4.95, 5.05, 5.15, 5.26]. Bound tightening on the regression coefficients results in the bounds shown in figure 5.11. The solver failed to converge to global optimality within 1390 seconds, as shown in figure 5.12.

Figure 5.11: Intervals of permissible coefficient values defined after bound tightening for model (5.32).

Figure 5.12: Progress of optimality gap over time.

The results are disappointing. The lower bound on the maximization problem, obtained through shotgun stochastic search, is already at -917.04. The upper bound is provided by the minimum over all open subproblem relaxations. One possible explanation for the slow progress of the solver is that the poor bounds on the regression coefficients result in a very weak relaxation. The bounds provided by optimality based bound tightening are very large because the −log(·) term in the objective decreases very slowly as its argument increases. This means that when maximizing β_i, for example, it takes a long time before the objective falls below that obtained by the heuristic. Suppose, then, that one were somehow able to provide much tighter bounds on the regression coefficients; perhaps this would help. The top models identified by SSS have coefficient values that are all less than 1 in absolute value. It is interesting to see whether reducing the feasible set by adding these constraints accelerates the progress of the solver. When bounding the coefficients to be inside [−1, 1], the optimality gap is still 5.15% at 1390 seconds (c.f. 5.66%), so adding tighter bounds does not appear to help dramatically. For comparison, the Couenne solver

of Belotti et al. (2009), a general purpose MINLP solver, applied to the original problem only reaches an optimality gap of 11% after 1390 seconds. Another possibility is that the way the piecewise linear approximation is handled through SOS-II constraints is the source of the slow performance. To investigate this, one can replace the logarithmic expression log(1 + g ‖Y − Xβ‖₂² / ‖Y‖₂²) with 1 + g ‖Y − Xβ‖₂² / ‖Y‖₂², which requires no piecewise linear approximation and hence no SOS-II variables, and see how long the solver takes. This is now similar to the best subset selection problem, except that the cardinality constraint of 13 is much larger than before and there is now a very modest penalty on the model size. Keeping the constraints on the coefficients at [−1, 1], the solver converges in 1334 seconds. This is an improvement, but it still takes a very long time. Another possibility is that there is a problem with this formulation, but running Couenne, a black-box MINLP solver, only reaches an optimality gap of 11% in 1390 seconds. A more likely explanation is that the posterior under the g-prior concentrates mass on a single model at a much slower rate than the posterior resulting from the point-mass-Laplace mixture prior. Recall that a branch and bound algorithm performs best when there is a dominant model, or more generally when the globally optimal solution of a maximization problem has an objective value much larger than that of all other locally optimal solutions. When this is the case, it is much easier to prune/fathom subproblems via bounding. This behaviour was already observed when comparing the performance of the mixed integer solver on the best subset selection problem against the posterior resulting from the point-mass-Laplace mixture prior; the former required the exploration of many more nodes in order to certify optimality. The posterior under the point-mass-Laplace mixture prior is known to contract at an optimal rate (see Castillo et al. (2015)) and it therefore makes sense that the mixed integer solver is able to certify optimality quickly. The problem is much worse for the g-prior for the following reason. In the best subset selection problem the objective

is the residual sum of squares. Consider two models of the same size and the difference in their residual sums of squares; this is the difference in objective for the best subset selection problem. The difference in their objective values under the MIQCP formulation for the g-prior is reduced due to the concavity of the logarithm: if the difference between their objective values was A − B in the best subset selection problem, it is reduced to log(1 + A) − log(1 + B) in the MIQCP formulation for the g-prior (assuming g = n and ‖Y‖₂² = n). The objective values across models become less separated. This can be argued to be a good thing from the perspective of handling model uncertainty, but it makes it difficult to select the modal model via discrete optimization.
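The compression can be seen with a two-line numerical illustration, under the stated assumption g = n and ‖Y‖₂² = n, for two hypothetical scaled RSS values A and B:

```python
import math

# Two hypothetical models of the same size with scaled RSS values A and B.
A, B = 10.0, 8.0
gap_best_subset = A - B                               # objective gap without the log
gap_g_prior = math.log(1.0 + A) - math.log(1.0 + B)   # gap in the g-prior MIQCP objective
assert gap_g_prior < gap_best_subset                  # the log compresses the separation
```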

6

Concluding Remarks

This thesis has presented research advances in Bayesian modelling of temporal point processes and mathematical optimization in variable selection.

• Chapter 2: “A Continuous-Time Model of Arrival Times”.

Chapter 2 presents an elegant model for arrival times without the need for any binning or discretization. Its elegance lies in its simplicity, using data augmentation strategies that are familiar to Bayesian statisticians. The state-space representation of the Matern covariance Gaussian process in one dimension leads to a highly scalable model with a very practical implementation based on AVL-trees. The proposed Gibbs sampler with slice sampling steps exhibits excellent mixing, as shown on a number of simulated and real datasets. The developed model could also readily find application in density estimation, a problem very similar to learning the intensity function. This connection has previously been exploited by Taddy and Kottas (2012), who model the normalized intensity as a Dirichlet process mixture of Betas. The idea is to model a set of iid observations from an unknown density as those that are accepted from a rejection sampling procedure. Augmenting the rejected proposals and modelling the density in a similar fashion as Λσ(f) leads to a model for density estimation similar to the Gaussian Process Density Sampler of Adams et al. (2009a), with obvious advantages. Naturally the scope is limited to densities with bounded support.

• Chapter 3: “Detecting Multiplexing in Neuronal Spike Trains”.

Chapter 3 attempted to explore how neurons encode information from dual stimuli by discovering relationships between the intensity of the dual stimuli trials and the intensities of the single sound trials. Early attempts to model the dual stimuli trials were through a continuous-time Markov chain, switching between states of firing like A and firing like B, using the uniformization based MCMC approach of Rao (2012). While this model is closest to the multiplexing hypothesis, it was later decided to relax the assumption of firing purely like A or purely like B. The dual intensity was subsequently modelled as a dynamic superposition of the intensity functions for the single stimulus trials. The weight function was modelled nonparametrically as a transformed mixture of a Gaussian process and the zero function. It was attempted to learn a cell-specific parameter corresponding to the probability that the cell creates dynamic dual stimuli trials. The MCMC algorithm for this model experienced difficulties with mixing, most likely due to the amount of missing data which is augmented on each iteration. It was decided to move from the probit link function to the logistic link function and use the Polya-Gamma data augmentation technique in the hope of better mixing. This did not result in a sufficient improvement in mixing. Overall I am not convinced that the Polya-Gamma data augmentation trumps the probit augmentation, as the time required to generate the Polya-Gamma random variables made a non-negligible contribution to the overall runtime of the MCMC. In addition to the issues surrounding mixing, there was a tendency to over-fit the data by favouring trials to be dynamic. The poor mixing prevented meaningful interpretation of the MCMC output, and so the model was simplified by considering trials to be static if the posterior probability of their time-scale loaded heavily on long time-scales. In simulation studies this did not provide a good means of classification, as once again there is a tendency to over-fit the data by favouring small time-scales. The eventual model assumed all trials were dynamic and was able to provide good posterior credible intervals for the function values, but neither the time-scale of the GP for each trial nor the cell's distribution of time-scales was learned well. For the triplet analysed, the weight functions did not exhibit much temporal dependence.

The stationarity assumption of the Gaussian process could also be challenged. In the data for the single stimulus trials it is commonly observed that there is an initial rapid increase in firing rate, followed by a smaller, less time dependent firing rate. This favours a smaller time-scale to begin with, followed by a longer time-scale. Modelling this with a stationary Gaussian process produces curves that either fluctuate too much in the latter part or fail to adapt to the rapid increase in the former part. To overcome this I explored modelling the intensities using the nested GP model of Zhu and Dunson (2013), which also admits a state-space representation. In experimentation this failed to provide a noticeable improvement over the stationary Matern GP.

• Chapter 4: “Continuous Optimization for Variable Selection”.

Chapter 4 generalized existing results for the Lasso, based on the compatibility criterion and sub-Gaussian error assumptions, to the point-mass-Laplace mixture prior, which amounts to a penalty containing both ℓ_0 and ℓ_1 norms. The previous results for the Lasso are recovered as a special case when the ℓ_0 component of the penalty vanishes. Additional theoretical results were provided using the results of Zhang and Zhang (2012), which assume the η-null consistency criterion and a nonzero restricted invertibility factor. While the former is a direct generalization of the “classical” proofs for the Lasso due to Buehlmann and van de Geer (2011), future work must consider which of these results is stronger and how these assumptions relate to the many other assumptions used in developing theory for sparse estimation procedures.

Two new heuristic algorithms were presented for finding good quality local optima in nonconvex problems resulting from separable penalties, namely through EM algorithms based on orthogonal data augmentation and location-mixture representations. The resulting parameter mappings were found to be identical to those of a number of other algorithms proposed in the literature, namely the classical proximal gradient method from convex analysis, the LQ decompositions of the design matrix of Figueiredo and Nowak (2003), and the discrete first-order method proposed in Bertsimas et al. (2016). All of these algorithms attempt to exploit the separability of the prior and, from different starting points, end up at the same parameter mappings. The connection that I observe helps to provide new perspectives on each algorithm and provides access to theory from convex analysis beyond EM. While there is a lot of theory for the proximal gradient and coordinate descent methods applied to convex problems, little is known about their performance when applied to nonconvex problems. Simulation studies suggested that coordinate descent methods are less susceptible to entrapment by suboptimal local modes in nonconvex problems.

• Chapter 5: “Discrete Optimization for Variable Selection”.

Chapter 5 explored the utility of some exact discrete optimization methods for

solving challenging mathematical optimization problems in variable selection. Since the early monotonicity-based branch and bound algorithms of the 1970s, exact methods have received little attention within the statistics community, while they have been actively developed outside it. Mixed integer programming has established itself as a very powerful and flexible approach for solving a wide variety of discrete optimization problems. The performance of mixed integer solvers has increased dramatically over the past decades, with remarkable speedups being provided by the development of new theory and methodologies but also through technological progress. The wide variety of optimization problems that can be solved through mixed integer programming means that the theory and methods continue to be actively developed, with solver performance ever improving. The MIQP model developed for obtaining the MAP estimate resulting from the point-mass-Laplace mixture prior converges in a matter of seconds for real moderately-sized problems. This chapter has demonstrated that solving some discrete optimization problems in statistics to global optimality is, in fact, practically viable. It encourages a renewed interest in these methods, with the possibility of them finding utility in many other areas of statistics. There are obvious extensions to group variable selection, and there is active research on the development of mixed integer programming models for sparse classification, see Bertsimas et al. (2017). Beyond mixed integer methods, there is active research on solving high dimensional sparse regression through binary convex reformulations, see Bertsimas and Parys (2017). Through use of a novel cutting plane algorithm, the authors claim to be able to solve to global optimality sparse regression problems with numbers of observations and predictors in the hundreds of thousands in a matter of seconds.

Just as one should not expect all discrete optimizations to be impossible, one

should not expect all to be easy either. Indeed, it was observed that the MIQCP model developed for the g-prior did not converge to global optimality in a practically reasonable amount of time. This is because the g-prior is rather objective and does not concentrate posterior mass on a single model at the same rate as the point-mass-Laplace mixture prior, making it difficult to solve through branch and bound. There are, however, some other strategies that could be tried to improve performance. Among models of the same size, the model with largest posterior probability is the model that minimizes the residual sum of squares. If one were to search for the optimal model of a certain size k, one could solve the best subset selection problem with the constraint ∑_i γ_i = k. This would totally avoid the need for a piecewise linear approximation to the logarithm term. Finding the globally optimal model would then correspond to finding the optimal model within each model size. If the number of possible model sizes could be restricted through optimality-based bounds tightening, and this number were small, then it could be implemented in the following way. Starting by finding the optimal model of the largest size, the optimal model of the second largest size could be found at a fraction of the total cost by implementing a callback to the solver. Implementing a callback allows one to add additional constraints without restarting the solver from the very beginning. Reducing the model size would render some previously feasible nodes infeasible. The solver is then able to solve the new problem in much less time using information from the previous problem. How well this works in practice is yet to be explored. In addition, the focus in this chapter has been on finding the mode of p(γ|Y) for the g-prior. Had one sought the joint mode of p(β, γ|Y), it would have been considerably easier.

Appendix A

Figures

Figure A.1: Posterior summaries (trace plots and posterior densities of τ, ρ², μ and Λ) based on 3000 MCMC post burn-in draws on the simulated dataset I.

Figure A.2: Posterior summaries (trace plots and posterior densities of τ, ρ², μ and Λ) based on 3000 MCMC post burn-in draws on the coal mining dataset.

Figure A.3: Posterior summaries (GP mean, GP variance, GP time-scale, maximum intensity and intensity function) for the A trial intensity of cell YKIC092311 1 609Hz 24° 742Hz −6°, based on retaining every 10th iterate from 10000 MCMC iterates.

Figure A.4: Posterior summaries (GP mean, GP variance, GP time-scale, maximum intensity and intensity function) for the B trial intensity of cell YKIC092311 1 609Hz 24° 742Hz −6°, based on retaining every 10th iterate from 10000 MCMC iterates.

Bibliography

Aarts, E. and Lenstra, J. K. (eds.) (1997), Local Search in Combinatorial Optimiza- tion, John Wiley & Sons, Inc., New York, NY, USA, 1st edn.

Achterberg, T., Koch, T., and Martin, A. (2005), “Branching Rules Revisited,” Oper. Res. Lett., 33, 42–54.

Adams, R. P., Murray, I., and MacKay, D. J. C. (2009a), “Nonparametric Bayesian Density Modeling with Gaussian Processes.”

Adams, R. P., Murray, I., and MacKay, D. J. C. (2009b), “Tractable Nonparametric Bayesian Inference in Poisson Processes with Gaussian Process Intensities,” in Proceedings of the 26th Annual International Conference on Machine Learning, ICML ’09, pp. 9–16, New York, NY, USA, ACM.

Albert, J. H. and Chib, S. (1993), “Bayesian Analysis of Binary and Polychotomous Response Data,” Journal of the American Statistical Association, 88, 669–679.

Armagan, A., Dunson, D. B., and Lee, J. (2013), “Generalized double Pareto shrinkage,” Statistica Sinica, 23, 119–143.

Banerjee, S., Gelfand, A. E., Finley, A. O., and Sang, H. (2008), “Gaussian predictive process models for large spatial data sets,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70, 825–848.

Bar-Shalom, Y., Kirubarajan, T., and Li, X.-R. (2002), Estimation with Applications to Tracking and Navigation, John Wiley & Sons, Inc., New York, NY, USA.

Barnhart, C., Johnson, E. L., Nemhauser, G. L., Savelsbergh, M. W. P., and Vance, P. H. (1996), “Branch-and-Price: Column Generation for Solving Huge Integer Programs,” Operations Research, 46, 316–329.

Beale, E. and Tomlin, J. (1970), “Special facilities in a general mathematical programming system for non-convex problems using ordered sets of variables,” in Proceedings of the 5th International Conference on Operations Research, ed. J. Lawrence, p. 447, Venice, Italy.

Beale, E. M. L., Kendall, M. G., and Mann, D. W. (1967), “The Discarding of Variables in Multivariate Analysis,” Biometrika, 54, 357–366.

Beck, A. and Teboulle, M. (2009), “A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems,” SIAM Journal on Imaging Sciences, 2, 183–202.

Bellman, R. (1954), “The theory of dynamic programming,” Bull. Amer. Math. Soc., 60, 503–515.

Bellman, R. (1957), Dynamic Programming, Princeton University Press, Princeton, NJ, USA, 1 edn.

Belotti, P., Lee, J., Liberti, L., Margot, F., and Wächter, A. (2009), “Branching and bounds tightening techniques for non-convex MINLP,” Optimization Methods and Software, 24, 597–634.

Belotti, P., Kirches, C., Leyffer, S., Linderoth, J. T., Luedtke, J., and Mahajan, A. (2012), “Mixed-Integer Nonlinear Optimization.”

Bertsimas, D. and Parys, B. V. (2017), “Sparse High-Dimensional Regression: Exact Scalable Algorithms and Phase Transitions.”

Bertsimas, D., King, A., and Mazumder, R. (2016), “Best subset selection via a modern optimization lens,” Ann. Statist., 44, 813–852.

Bertsimas, D., Pauphilet, J., and Parys, B. V. (2017), “Sparse Classification and Phase Transitions: A Discrete Optimization Perspective.”

Bixby, R. E. (2012), “A Brief History of Linear and Mixed-Integer Programming Computation,” Documenta Mathematica, pp. 107–121.

Blumensath, T. and Davies, M. E. (2008), “Iterative Thresholding for Sparse Approximations,” Journal of Fourier Analysis and Applications, 14, 629–654.

Breiman, L. and Friedman, J. H. (1985), “Estimating Optimal Transformations for Multiple Regression and Correlation: Rejoinder,” Journal of the American Statistical Association, 80, 614–619.

Brusco, M. and Stahl, S. (2005), Branch-and-Bound Applications in Combinatorial Data Analysis (Statistics and Computing), Springer, 1 edn.

Buehlmann, P. and van de Geer, S. (2011), Statistics for High-Dimensional Data: Methods, Theory and Applications, Springer Publishing Company, Incorporated, 1st edn.

Castillo, I., Schmidt-Hieber, J., and van der Vaart, A. (2015), “Bayesian linear regression with sparse priors,” Ann. Statist., 43, 1986–2018.

Choi, T. and Schervish, M. J. (2007), “On posterior consistency in nonparametric regression problems,” Journal of Multivariate Analysis, 98, 1969–1987.

Chow, E. and Saad, Y. (2014), “Preconditioned Krylov Subspace Methods for Sam- pling Multivariate Gaussian Distributions,” SIAM Journal on Scientific Comput- ing, 36, A588–A608.

Christensen, R. A. (1987), Plane Answers to Complex Questions: The Theory of Linear Models, Springer-Verlag New York, Inc., New York, NY, USA.

Clyde, M., DeSimone, H., and Parmigiani, G. (1996), “Prediction via orthogonalized model mixing,” Journal of the American Statistical Association, 91, 1197–1208.

Cox, D. R. (1955), “Some Statistical Methods Connected with Series of Events,” Journal of the Royal Statistical Society. Series B (Methodological), 17, 129–164.

Dakin, R. J. (1965), “A tree-search algorithm for mixed integer programming prob- lems,” The Computer Journal, 8, 250–255.

Das, A. and Kempe, D. (2008), “Algorithms for Subset Selection in Linear Re- gression,” in Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, STOC ’08, pp. 45–54, New York, NY, USA, ACM.

Das, A. and Kempe, D. (2011), “Submodular meets Spectral: Greedy Algorithms for Subset Selection, Sparse Approximation and Dictionary Selection.” in ICML, eds. L. Getoor and T. Scheffer, pp. 1057–1064, Omnipress.

Datta, A., Banerjee, S., Finley, A. O., and Gelfand, A. E. (2016), “Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets,” Journal of the American Statistical Association, 111, 800–812.

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society, Series B, 39, 1–38.

Dorigo, M. and Di Caro, G. (1999), “New Ideas in Optimization,” chap. The Ant Colony Optimization Meta-heuristic, pp. 11–32, McGraw-Hill Ltd., Maidenhead, UK.

Duane, S., Kennedy, A., Pendleton, B. J., and Roweth, D. (1987), “Hybrid Monte Carlo,” Physics Letters B, 195, 216–222.

Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004), “Least angle regression,” Annals of Statistics, 32, 407–499.

Efroymson, M. A. (1960), “Multiple Regression Analysis,” in Mathematical Methods for Digital Computers, eds. A. Ralston and H. S. Wilf, John Wiley, New York.

Figueiredo, M. A. T. and Nowak, R. D. (2003), “An EM algorithm for wavelet-based image restoration,” IEEE Transactions on Image Processing, 12, 906–916.

Foster, D. P. and George, E. I. (1994), “The Risk Inflation Criterion for Multiple Regression,” Ann. Statist., 22, 1947–1975.

Friedman, J., Hastie, T., Höfling, H., and Tibshirani, R. (2007), “Pathwise coordinate optimization,” Ann. Appl. Stat., 1, 302–332.

Friedman, J., Hastie, T., and Tibshirani, R. (2010), “Regularization Paths for Generalized Linear Models via Coordinate Descent,” Journal of Statistical Software, Articles, 33, 1–22.

Furnival, G. M. and Wilson, R. W. (1974), “Regressions by Leaps and Bounds,” Technometrics, 16, 499–511.

Gendreau, M. and Potvin, J.-Y. (2010), Handbook of Metaheuristics, Springer Publishing Company, Incorporated, 2nd edn.

George, E. I. and McCulloch, R. E. (1993), “Variable Selection via Gibbs Sampling,” Journal of the American Statistical Association, 88, 881–889.

Ghosh, J. and Clyde, M. A. (2011), “Rao–Blackwellization for Bayesian Variable Selection and Model Averaging in Linear and Binary Regression: A Novel Data Augmentation Approach,” Journal of the American Statistical Association, 106, 1041–1052.

Glover, F. (1989), “Tabu Search–Part I,” ORSA Journal on Computing, 1, 190–206.

Glover, F. (1990), “Tabu Search–Part II,” ORSA Journal on Computing, 2, 4–32.

Gurobi Optimization, Inc. (2016), “Gurobi Optimizer Reference Manual.”

Halko, N., Martinsson, P. G., and Tropp, J. A. (2011), “Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions,” SIAM Review, 53, 217–288.

Hans, C., Dobra, A., and West, M. (2007), “Shotgun Stochastic Search for Large p Regression,” Journal of the American Statistical Association, 102, 507–516.

Hartikainen, J. and Särkkä, S. (2010), “Kalman filtering and smoothing solutions to temporal Gaussian process regression models,” in 2010 IEEE International Workshop on Machine Learning for Signal Processing, pp. 379–384.

Hocking, R. R. (1976), “A Biometrics Invited Paper. The analysis and selection of variables in linear regression,” Biometrics, 32, 1–49.

Hocking, R. R. and Leslie, R. N. (1967), “Selection of the Best Subset in Regression Analysis,” Technometrics, 9, 531–540.

Holland, J. H. (1992), Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, MIT Press, Cambridge, MA, USA.

Ibaraki, T. (1976), “Theoretical comparisons of search strategies in branch-and-bound algorithms,” International Journal of Computer & Information Sciences, 5, 315–344.

IBM (2011), IBM ILOG CPLEX Optimization Studio CPLEX User’s Manual.

Ishwaran, H. and Rao, J. S. (2005), “Spike and slab variable selection: Frequentist and Bayesian strategies,” Ann. Statist., 33, 730–773.

Jarrett, R. G. (1979), “A Note on the Intervals Between Coal-Mining Disasters,” Biometrika, 66, 191–193.

Kannan, R. and Monma, C. L. (1978), “On the Computational Complexity of Integer Programming Problems,” in Optimization and Operations Research, eds. R. Henn, B. Korte, and W. Oettli, pp. 161–172, Berlin, Heidelberg, Springer Berlin Heidelberg.

Kass, R. E. and Wasserman, L. (1995), “A Reference Bayesian Test for Nested Hypotheses and its Relationship to the Schwarz Criterion,” Journal of the American Statistical Association, 90, 928–934.

Kingman, J. F. C. (1993), Poisson processes, vol. 3 of Oxford Studies in Probability, The Clarendon Press Oxford University Press, New York, Oxford Science Publications.

Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P. (1983), “Optimization by simulated annealing,” SCIENCE, 220, 671–680.

Kloeden, P. E. and Platen, E. (2011), Numerical Solution of Stochastic Differential Equations, Springer, New York, corrected edn.

Laguna, M. and Marti, R. (1999), “GRASP and Path Relinking for 2-Layer Straight Line Crossing Minimization,” INFORMS Journal on Computing, 11, 44–52.

LaMotte, L. R. and Hocking, R. R. (1970), “Computational Efficiency in the Selection of Regression Variables,” Technometrics, 12, 83–93.

Land, A. H. and Doig, A. G. (1960), “An Automatic Method of Solving Discrete Programming Problems,” Econometrica, 28, 497–520.

Lange, K. (2016), MM Optimization Algorithms, Society for Industrial and Applied Mathematics, Philadelphia, PA.

Lewis, P. A. W. and Shedler, G. S. (1979), “Simulation of Nonhomogeneous Poisson Processes with Degree-Two Exponential Polynomial Rate Function,” Operations Research, 27, 1026–1040.

Liang, F., Paulo, R., Molina, G., Clyde, M. A., and Berger, J. O. (2008), “Mixtures of g Priors for Bayesian Variable Selection,” Journal of the American Statistical Association, 103, 410–423.

Linderoth, J. T. and Savelsbergh, M. W. P. (1999), “A Computational Study of Search Strategies for Mixed Integer Programming,” INFORMS Journal on Computing, 11, 173–187.

Little, J. D. C., Murty, K. G., Sweeney, D. W., and Karel, C. (1963), “An Algorithm for the Traveling Salesman Problem,” Oper. Res., 11, 972–989.

Lumley, T. (2017), leaps: Regression Subset Selection, R package version 3.0.

Maguire, B. A., Pearson, E. S., and Wynn, A. H. A. (1952), “The Time Intervals Between Industrial Accidents,” Biometrika, 39, 168–180.

Marchand, H., Martin, A., Weismantel, R., and Wolsey, L. (2002), “Cutting planes in integer and mixed integer programming,” Discrete Applied Mathematics, 123, 397–446.

Marjanovic, G., Ulfarsson, M. O., and Hero, A. O. (2015), “MIST: L0 sparse linear regression with momentum,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3551–3555.

Mitchell, T. J. and Beauchamp, J. J. (1988), “Bayesian Variable Selection in Linear Regression,” Journal of the American Statistical Association, 83, 1023–1032.

Mladenović, N. and Hansen, P. (1997), “Variable Neighborhood Search,” Comput. Oper. Res., 24, 1097–1100.

Murray, I., Adams, R. P., and MacKay, D. J. C. (2010), “Elliptical slice sampling,” in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), JMLR W&CP 9, pp. 541–548.

Neal, R. M. (2000), “Markov Chain Sampling Methods for Dirichlet Process Mixture Models,” Journal of Computational and Graphical Statistics, 9, 249–265.

Neal, R. M. (2003), “Slice sampling,” Ann. Statist., 31, 705–767.

Nesterov, Y. (2013), “Gradient methods for minimizing composite functions,” Mathematical Programming, 140, 125–161.

Nesterov, Y. (2014), Introductory Lectures on Convex Optimization: A Basic Course, Springer Publishing Company, Incorporated, 1 edn.

Nutini, J., Schmidt, M., Laradji, I. H., Friedlander, M., and Koepke, H. (2015), “Coordinate Descent Converges Faster with the Gauss–Southwell Rule Than Random Selection,” in Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37, ICML’15, pp. 1632–1641, JMLR.org.

Polson, N. G., Scott, J. G., and Windle, J. (2013), “Bayesian Inference for Logistic Models Using Pólya–Gamma Latent Variables,” Journal of the American Statistical Association, 108, 1339–1349.

Polson, N. G., Scott, J. G., and Willard, B. T. (2015), “Proximal Algorithms in Statistics and Machine Learning,” Statist. Sci., 30, 559–581.

Prado, R. and West, M. (2010), Time Series: Modeling, Computation, and Inference, Chapman & Hall/CRC, 1st edn.

Rao, V. A. (2012), “Markov chain Monte Carlo for continuous-time discrete-state systems,” Ph.D. thesis, University College London.

Rasmussen, C. E. and Williams, C. K. I. (2005), Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning), The MIT Press.

Rockafellar, R. T. (1970), Convex analysis, Princeton Mathematical Series, Princeton University Press, Princeton, N. J.

Rockova, V. and George, E. I. (2016), “The Spike-and-Slab LASSO,” Journal of the American Statistical Association, to appear.

Saha, A. and Tewari, A. (2013), “On the Nonasymptotic Convergence of Cyclic Coordinate Descent Methods,” SIAM Journal on Optimization, 23, 576–601.

Scott, S. L. (2017), BoomSpikeSlab: MCMC for Spike and Slab Regression, R package version 0.8.0.

Taddy, M. A. and Kottas, A. (2012), “Mixture Modeling for Marked Poisson Processes,” Bayesian Anal., 7, 335–362.

Tokdar, S. T. and Ghosh, J. K. (2007), “Posterior consistency of logistic Gaussian process priors in density estimation,” Journal of Statistical Planning and Inference, 137, 34–42.

van der Vaart, A. W. and van Zanten, J. H. (2008), “Rates of contraction of posterior distributions based on Gaussian process priors,” Ann. Statist., 36, 1435–1463.

Vielma, J. P., Ahmed, S., and Nemhauser, G. (2010), “Mixed-Integer Models for Nonseparable Piecewise-Linear Optimization: Unifying Framework and Extensions,” Operations Research, 58, 303–315.

Wainwright, M. J. (2009), “Information-Theoretic Limits on Sparsity Recovery in the High-Dimensional and Noisy Setting,” IEEE Transactions on Information Theory, 55, 5728–5741.

Xiong, S., Dai, B., Huling, J., and Qian, P. Z. G. (2016), “Orthogonalizing EM: A Design-Based Least Squares Algorithm,” Technometrics, 58, 285–293.

Yang, Y., Wainwright, M. J., and Jordan, M. I. (2016), “On the computational complexity of high-dimensional Bayesian variable selection,” Ann. Statist., 44, 2497–2532.

Yu, Y. and Meng, X.-L. (2011), “To Center or Not to Center: That Is Not the Question–An Ancillarity–Sufficiency Interweaving Strategy (ASIS) for Boosting MCMC Efficiency,” Journal of Computational and Graphical Statistics, 20, 531–570.

Zhang, C.-H. and Zhang, T. (2012), “A General Theory of Concave Regularization for High-Dimensional Sparse Estimation Problems,” Statist. Sci., 27, 576–593.

Zhu, B. and Dunson, D. B. (2013), “Locally Adaptive Bayes Nonparametric Regression via Nested Gaussian Processes,” Journal of the American Statistical Association, 108, doi:10.1080/01621459.2013.838568.

Biography

Michael Lindon was born on 4th April 1990 in Bournemouth, England. He received his BSc MPhys degree in Physics from the University of Warwick in 2012 and his MSc in Statistical Science from Duke University in 2014. He is expected to earn his PhD in Statistical Science from Duke University in 2018, after which he will join Tesla as a senior data scientist.
