Empirical Bayes posterior concentration in sparse high-dimensional linear models Ryan Martin∗ Raymond Mess Department of Mathematics, Statistics, and Computer Science University of Illinois at Chicago (rgmartin, rmess1)@uic.edu Stephen G. Walker Department of Mathematics University of Texas at Austin
[email protected] December 6, 2018 Abstract We propose a new empirical Bayes approach for inference in the p ≫ n normal linear model. The novelty is the use of data in the prior in two ways, for centering and regularization. Under suitable sparsity assumptions, we establish a variety of concentration rate results for the empirical Bayes posterior distribution, relevant for both estimation and model selection. Computation is straightforward and fast, and simulation results demonstrate the strong finite-sample performance of the empirical Bayes model selection procedure. Keywords and phrases: Data-dependent prior; fractional likelihood; minimax rate; regression; variable selection. arXiv:1406.7718v5 [math.ST] 5 Dec 2018 1 Introduction In this paper, we consider the Gaussian linear regression model, given by Y = Xβ + ε, (1) where Y is a n × 1 vector of response variables, X is a n × p matrix of predictor variables, β is a p × 1 vector of slope coefficients, and ε is a n × 1 vector of iid N(0, σ2) random errors. Recently, there has been considerable interest in the high-dimensional case, where p ≫ n, driven primarily by challenging applications. Indeed, in genetic studies, where ∗As of August 2016, RM is affiliated with North Carolina State University,
[email protected]. 1 the response variable corresponds to a particular observable trait, the number of subjects, n, may be of order 103, while the number of genetic features, p, in consideration can be of order 105.