Calibrated Bayes Factors for Model Selection and Model Averaging
Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By
Pingbo Lu, B.S.
Graduate Program in Statistics

The Ohio State University
2012

Dissertation Committee:
Dr. Steven N. MacEachern, Co-Advisor
Dr. Xinyi Xu, Co-Advisor
Dr. Christopher M. Hans

© Copyright by Pingbo Lu, 2012

Abstract

Statistical model selection and averaging have a long history and remain active areas of research. Arguably, the Bayesian ideal of model comparison based on Bayes factors successfully overcomes the difficulties and drawbacks for which frequentist hypothesis testing has been criticized. By placing prior distributions on model-specific parameters, Bayesian models are completed hierarchically, and many statistical summaries, such as posterior model probabilities, model-specific posterior distributions, and model-averaged posterior distributions, can then be derived naturally. These summaries enable us to compare inferences such as predictive distributions.

Importantly, a Bayes factor between two models can be greatly affected by the prior distributions on the model parameters. When prior information is weak, very dispersed proper prior distributions are often used. This is known to create a problem for the Bayes factor when comparing models that differ in dimension, and it is of even greater concern when one of the models is of infinite dimension. We therefore propose a new criterion called the calibrated Bayes factor, which uses training samples to calibrate the prior distributions so that they achieve a reasonable level of "information". The calibrated Bayes factor is then computed as the Bayes factor over the remaining data.
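The training-sample idea just described can be illustrated with a toy sketch: update a very diffuse prior on a training portion of the data, then compute the Bayes factor on the remainder. Here we compare M0: x ~ N(0, 1) against M1: x ~ N(θ, 1) with a dispersed θ ~ N(0, τ²) prior. This model pair, the conjugate updating, and all function names are our own illustrative assumptions for the sketch, not code from the dissertation:

```python
import math

def log_norm_pdf(x, mean, var):
    # Log density of N(mean, var) evaluated at x.
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def log_marginal_m1(xs, m, v):
    # Log marginal likelihood of xs under x ~ N(theta, 1), theta ~ N(m, v),
    # via the one-step-ahead predictive decomposition. Also returns the
    # updated posterior mean and variance of theta.
    total = 0.0
    for x in xs:
        total += log_norm_pdf(x, m, v + 1.0)  # predictive of next obs: N(m, v + 1)
        v_new = 1.0 / (1.0 / v + 1.0)         # conjugate normal update, sigma^2 = 1
        m = v_new * (m / v + x)
        v = v_new
    return total, m, v

def log_bayes_factor_10(xs, n_train, tau2):
    # Log Bayes factor of M1 (unknown mean) versus M0 (mean zero), computed
    # on the data remaining after the first n_train points update M1's prior.
    train, rest = xs[:n_train], xs[n_train:]
    _, m, v = log_marginal_m1(train, 0.0, tau2)   # training update only
    log_m1, _, _ = log_marginal_m1(rest, m, v)
    log_m0 = sum(log_norm_pdf(x, 0.0, 1.0) for x in rest)
    return log_m1 - log_m0

# A deterministic data set centered near 0.5, with a very diffuse prior.
xs = [0.5 + 0.3 * math.sin(i) for i in range(100)]
bf_plain = log_bayes_factor_10(xs, n_train=0, tau2=1e6)   # ordinary Bayes factor
bf_cal = log_bayes_factor_10(xs, n_train=10, tau2=1e6)    # training-calibrated version
print(bf_plain, bf_cal)
```

With τ² this large, the ordinary Bayes factor carries a heavy diffuse-prior penalty against M1 (a Lindley-type effect); after ten training observations concentrate the prior, the Bayes factor over the remaining data favors the nonzero mean much more decisively.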
The level of "information" is tied to the concentration of the training-updated prior distributions, which is evaluated by monitoring the distribution of the symmetrized Kullback-Leibler divergence between two likelihood functions drawn independently from the training-updated prior distribution. Markov chain Monte Carlo algorithms are widely used in this research to generate parameter draws from training-updated prior distributions, and subsampling is applied to reduce dependence among the draws.

In this thesis, we focus mainly on comparisons of one-sample (i.i.d.) models and on the variable selection problem in regression settings, under a variety of model-specific prior distributions. We illustrate through simulation studies that the calibrated Bayes factor yields robust and reliable model preferences under various true models. We further demonstrate the calibrated Bayes factor on obesity data from the Ohio Family Health Survey (one-sample model) and on the ozone data originally studied in Breiman and Friedman (1985) (variable selection problem). The calibrated Bayes factor is applicable to a wide variety of model comparison problems because it makes no assumption about model form (parametric or nonparametric) and can be used with both proper and improper priors.

Dedication

This is dedicated to my family, especially my grandmother, my parents, my wife Zhijia, and those I love.

Acknowledgments

First, I would like to express my gratitude to my co-advisors, Dr. Steven MacEachern and Dr. Xinyi Xu, for inspiring this research and guiding me with endless patience. They generously shared their creative ideas with me and always directed me in the right way. I sincerely appreciate their continual academic and financial support over these years and the time they spent checking every page of this thesis. I feel very fortunate to have had the opportunity to learn from them. Without their patient guidance and unselfish help, this work could not have been done.
I also want to thank The Ohio State University, the Department of Statistics, and Dr. Prem Goel for training me and funding me as a graduate associate during my study. I also wish to thank Dr. Chris Hans and Dr. Peter Craigmile for serving on my dissertation committee and my candidacy exam committee, and for their great comments and suggestions. Their generous help and support deserve more than thanks.

My gratitude also goes to Beth Duffy and all her family for their generosity and kindness. I want to thank them for hosting me when I first arrived in Columbus and for being continuously supportive. It is always my pleasure to have you as my host family.

Finally, I would like to thank all my friends at Ohio State. I always feel lucky to have met them and to know them. My special thanks also go to my family for their constant trust and encouragement. Working with everyone who made this thesis possible has been a precious experience.

Vita

August 6, 1983: Born, Tianjin, China
2006: B.S. Mathematics, Beijing Institute of Technology
2008-present: Graduate Teaching/Research Associate, The Ohio State University

Fields of Study

Major Field: Statistics

Table of Contents

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

1. Introduction
   1.1 Bayesian Model Comparison and Bayes Factors
       1.1.1 Introduction to the Comparison of Probability Models
       1.1.2 Definition of Bayes Factors
   1.2 Interpretation of the Bayes Factor
   1.3 Advantages of Using the Bayes Factor for Hypothesis Testing
   1.4 Motivation for a "New" Bayes Factor
       1.4.1 Impact of the prior distribution
       1.4.2 Example 1: Lindley's Paradox
       1.4.3 Example 2: An i.i.d. Case
       1.4.4 Remarks
2. Computational Techniques
   2.1 Markov Chain Monte Carlo Algorithms
       2.1.1 The Gibbs Sampler
       2.1.2 The Metropolis-Hastings Algorithm
       2.1.3 Remarks
   2.2 Computation of Marginal Likelihood
       2.2.1 The Laplace Approximation
       2.2.2 Monte Carlo Integration and Importance Sampling
       2.2.3 The Harmonic Mean Estimator
       2.2.4 Chib's MCMC-based Methods
3. The Calibrated Bayes Factor
   3.1 Alternative Forms of the Bayes Factor
       3.1.1 Pseudo-Bayes Factors
       3.1.2 Posterior Bayes Factors
       3.1.3 Partial and Fractional Bayes Factors
       3.1.4 Intrinsic Bayes Factors
       3.1.5 Remarks
   3.2 Information Metric
   3.3 Calibrating Bayes Factors
   3.4 Example 2 revisited
   3.5 One-sample Model
       3.5.1 Target Information Level
       3.5.2 Calibrating Priors
       3.5.3 Simulations
       3.5.4 Study 1: Adult Males, Aged 18-24
   3.6 Two-sample Model
       3.6.1 Study 2: Diabetes Group vs. Non-Diabetes Group
   3.7 Griffin's MDP Priors
4. Linear Regression Model
   4.1 Background
   4.2 Model Specification
   4.3 Prior Distributions
       4.3.1 Improper Priors
       4.3.2 Proper Priors
   4.4 Calibration
       4.4.1 Information Metric
       4.4.2 Target of Concentration
       4.4.3 Calibration sample size
   4.5 Simulations
       4.5.1 Independent normal priors on low-dimensional β
       4.5.2 Horseshoe priors on high-dimensional β
   4.6 Case Study
       4.6.1 Description of the Ozone data
       4.6.2 Pairwise Comparisons among Models 1-4
       4.6.3 Comparisons with Model 5
5. Conclusions
Bibliography
Appendices
A. An Algorithm for Calibrated Sample Size Search
B. An Overview of the Ozone Data

List of Tables

1.1 Jeffreys' scale of evidence
1.2 Kass and Raftery's scale of evidence
3.1 Summary statistics for OFHS (Male Adult, Aged 18-24)
3.2 Probability of obtaining more non-diabetes observations
3.3 Summary statistics for OFHS (Married Male, Aged 55-64, High School Graduates with Diabetes)
3.4 Summary statistics for OFHS (Married Male, Aged 55-64, High School Graduates without Diabetes)
4.1 Quantiles of χ²(1) · χ²(p)
4.2 Estimation of coefficients for Models 1-4
B.1 Variables used in the ozone data

List of Figures

1.1 Motivation Example
3.1 Cumulative density function of SKL samples versus χ²(1)
3.2 Small departure from normality
3.3 Median departure from normality
3.4 Large departure from normality
3.5 Super large departure from normality
3.6 Calibration sample size versus departure from normality on the log10 scale
3.7 Histogram of BMI data (Male Adult)
3.8 Bayes factors for BMI data (Male Adult)
3.9 The mean log Bayes factor for the two-sample model
3.10 Histogram of BMI data (Diabetes Group)
3.11 Bayes factors for BMI data (Diabetes Group)
3.12 Histogram of BMI data (Non-Diabetes Group)
3.13 Bayes factors for BMI data (Non-Diabetes Group)
4.1 Box-plots of the posterior draws of βh's and λh's
4.2 Plot of least squares estimates against horseshoe posterior means
4.3 Cumulative density functions of SKL samples versus χ²(1) · χ²(p)
4.4 Bayes factors for multiple prefixed zeros in βγ
4.5 Bayes factors for one prefixed zero in βγ
4.6 Bayes factors under multicollinearity
4.7 Impact of the diffuseness of priors on original and calibrated Bayes factors
4.8 An example of a Bayes factor with two turning points
4.9 Bayes factor for comparing high-dimensional models I
4.10 Bayes factor for comparing high-dimensional models II
4.11 Bayes factors for comparisons among Models 1-4