Paradoxes and Priors in Bayesian Regression
Paradoxes and Priors in Bayesian Regression

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Agniva Som, B. Stat., M. Stat.

Graduate Program in Statistics

The Ohio State University

2014

Dissertation Committee:

Dr. Christopher M. Hans, Advisor
Dr. Steven N. MacEachern, Co-advisor
Dr. Mario Peruggia

Copyright by Agniva Som, 2014

Abstract

The linear model has been by far the most popular and most attractive choice of a statistical model over the past century, ubiquitous in both the frequentist and Bayesian literature. The basic model has been gradually improved over the years to deal with stronger features in the data, such as multicollinearity, non-linear or functional data patterns, and violations of underlying model assumptions. One valuable direction pursued in the enrichment of the linear model is the use of Bayesian methods, which blend information from the data likelihood and suitable prior distributions placed on the unknown model parameters to carry out inference.

This dissertation studies the modeling implications of many common prior distributions in linear regression, including the popular g prior and its recent ameliorations. Formalization of desirable characteristics for model comparison and parameter estimation has led to the growth of appropriate mixtures of g priors that conform to the seven standard model selection criteria laid out by Bayarri et al. (2012). The existence of some of these properties (or lack thereof) is demonstrated by examining the behavior of the prior under suitable limits on the likelihood or on the prior itself. The first part of the dissertation introduces a new form of an asymptotic limit, the conditional information asymptotic, driven by a situation arising in many practical problems when one or more groups of regression coefficients are much larger than the rest.
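As background for the priors discussed above, Zellner's g prior, the starting point for the mixtures of g priors considered in this dissertation, is conventionally written as follows. This is the standard textbook formulation, reproduced here for orientation rather than quoted from the dissertation itself:

```latex
% Linear model with Zellner's g prior (standard formulation):
% y is n x 1, X is the n x p design matrix, I_n the identity.
y \mid \beta, \sigma^2 \sim \mathrm{N}_n\!\left( X\beta,\; \sigma^2 I_n \right),
\qquad
\beta \mid g, \sigma^2 \sim \mathrm{N}_p\!\left( 0,\; g\,\sigma^2 \left( X^{\top} X \right)^{-1} \right).
% Mixtures of g priors place a hyper-prior on g; for example, the
% hyper-g prior of Liang et al. uses
% p(g) \propto (1+g)^{-a/2}, \quad g > 0, \; a > 2.
```

A single scalar g governs the shrinkage of every regression coefficient simultaneously, which is the "mono-shrinkage" behavior analyzed in the first part of the dissertation.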
Under this asymptotic, many prominent "g-type" priors are shown to suffer from two new unsatisfactory behaviors, the Conditional Lindley's Paradox and Essentially Least Squares estimation. The cause behind these unwanted behaviors is the existence of a single, common mixing parameter in these priors that induces mono-shrinkage. The novel block g priors are proposed as a collection of independent g priors on distinct groups of predictor variables and are improved further through mixing distributions on the multiple scale parameters. The block hyper-g and block hyper-g/n priors are shown to overcome the deficiencies of mono-shrinkage while simultaneously displaying promising performance on other important prior selection criteria.

The second part of the dissertation proposes a variation of the basic block g prior, defined through a reparameterized design, which has added computational benefits and also preserves the desirable properties of the original formulation.

While construction of prior distributions for linear models usually focuses on the regression parameters themselves, it is often the case that functions of the parameters carry more meaning to a researcher than do the individual regression coefficients. If prior probability is not apportioned to the parameter space in a sensible manner, the implied priors on these summaries may clash greatly with reasonable prior knowledge. The third part of the dissertation examines the modeling implications of many traditional priors on an important model summary: the population correlation coefficient, which measures the strength of a linear regression. After detailing deficiencies of standard priors, a new, science-driven prior is introduced that directly models the strength of the linear regression. The resulting prior on the regression coefficients belongs to the class of normal scale mixture distributions in particular situations.
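A plausible schematic of the block g prior construction described above, with illustrative notation (the precise definition and hyper-prior choices appear in Chapter 3):

```latex
% Block g prior sketch: partition the design matrix into K blocks,
% X = (X_1, ..., X_K), with coefficient blocks beta_1, ..., beta_K.
\beta_k \mid g_k, \sigma^2
  \;\overset{\text{ind}}{\sim}\;
  \mathrm{N}\!\left( 0,\; g_k\,\sigma^2 \left( X_k^{\top} X_k \right)^{-1} \right),
\qquad k = 1, \dots, K.
% Independent mixing distributions on g_1, ..., g_K (hyper-g or
% hyper-g/n type) yield the block hyper-g and block hyper-g/n priors.
```

Each block receives its own shrinkage factor g_k, in contrast to the single common g of the ordinary g prior, which is how the construction avoids mono-shrinkage.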
Utilizing a fixed-dimensional reparameterization of the model, an efficient MCMC strategy that scales well with model size and requires little storage space is developed for posterior inference.

Dedicated to my father, Mr. Arup Kumar Som; my late mother, Mrs. Sanchaita Som; and my grandmother, Mrs. Lila Som.

Acknowledgments

I would like to express my gratitude to everyone who has made it possible for me to arrive at this momentous occasion in my life. First and foremost, I wish to thank my advisors, Dr. Chris Hans and Dr. Steve MacEachern, for their endless support and amazing mentorship during my four years of doctoral studies. I am greatly indebted to them for their unwavering patience with me, which helped me immensely to overcome my limitations whenever I faltered. I consider myself very fortunate that I got the opportunity to learn the discipline from such brilliant minds and to enhance my knowledge and understanding of statistics under their watchful guidance. My advisors are greatly responsible for instilling in me the tenacity and passion toward research that is needed to be successful at this high level. Fond memories of our meetings, including the serious discussions that frequently opened my mind to new perspectives on approaching unresolved questions as well as the light-hearted chats on non-academic topics, will remain with me forever.

I am grateful to Dr. Mario Peruggia for serving on my doctoral and candidacy exam committees and to Dr. Xinyi Xu for serving on my candidacy exam committee. Their comments and suggestions helped to broaden the scope of my dissertation research. A word of thanks should also go to Dr. Radu Herbei for introducing me to new areas of statistics in my first year of graduate school and for helping me develop a fascination for research in this field. I am thankful to the NSF for partially supporting my research under grant numbers DMS-1007682, DMS-1209194 and DMS-1310294.
My family has always been a great source of love and support, and without their immense influence this would never have been possible. Maa, Baba, Dida and Dada have always been there for me whenever I needed them, especially during the tough times that I experienced in my first year, living so far away from home for the first time in my life. It was my parents' dream that I get a doctorate degree, but now that I am so close to receiving one, I feel heartbroken that my mother is no longer with us to enjoy this moment with the rest of the family. I dedicate this dissertation to my dad, who has been a constant source of inspiration, and to my mother and my grandmother, both of whom sacrificed so much for me to achieve this cherished dream.

I would also like to thank all my wonderful friends who made life in graduate school and in Columbus enjoyable and worth cherishing all my life. There is a long list of friends without whom my life in the last four years would not have been as much fun. Thank you Abrez, Aritra, Arjun, Arka, Casey, Dani, Grant, Indrajit, Jingjing, John, Matt, Parichoy, Prasenjit, Puja, Steve and the many others whom I did not mention here (but who have been equally important to me in recent years), for helping me get through the thick and thin of life as a graduate student. And most importantly, thank you Sarah, the most special person in my life, for our wonderful times together. I am highly appreciative of Sarah's support; she stood by me throughout, in happiness and in sorrow, and put up with my craziness these past few years.

Vita

August 4, 1987 . . . . . Born, Kolkata, India
2005-2008 . . . . . . . . B. Stat. (Hons.), Indian Statistical Institute, Kolkata, India
2008-2010 . . . . . . . . M. Stat., Indian Statistical Institute, Kolkata, India
2010-2014 . . . . . . . . Graduate Teaching Associate, The Ohio State University

Publications

Research Publications

Som, A., Hans, C. M., MacEachern, S. N. "Block Hyper-g Priors in Bayesian Regression". Technical Report No.
879, Department of Statistics, The Ohio State University, 2014.

Fields of Study

Major Field: Statistics

Table of Contents

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

1. Introduction
   1.1 Prior Specification in the Linear Model
   1.2 Problems with Mono-Shrinkage
   1.3 Implications of the Prior on Other Regression Summaries

2. Review of Popular Priors in Bayesian Regression Problems
   2.1 Variable Selection Priors
       2.1.1 "Spike and Slab"-type Priors
       2.1.2 Stochastic Search Algorithms
       2.1.3 Priors on the Model Space
   2.2 Shrinkage Priors
       2.2.1 Scale Mixture Representation
   2.3 Classification Based on Prior Knowledge
       2.3.1 Subjective Priors
       2.3.2 Objective Priors
   2.4 Other Discussions on Bayesian Regression

3. Block g Priors
   3.1 Review of g Priors
       3.1.1 Zellner's g Prior
       3.1.2 Mixtures of g Priors
   3.2 Asymptotic Evaluations: Paradoxes, Old and New
       3.2.1 A Conditional Information Asymptotic
       3.2.2 A Conditional Lindley's Paradox
       3.2.3 The CLP in Other Mixtures of g Priors
       3.2.4 Avoiding ELS and CLP Behaviors
   3.3 Block g Priors
       3.3.1 The Block Hyper-g Prior
       3.3.2 Sampling from the Block Hyper-g Prior
       3.3.3 Expansion with respect to Least Squares Estimates under the Block g Prior
   3.4 Asymptotic Evaluations of the Block Hyper-g Prior
   3.5 Consistency of the Block Hyper-g Prior
       3.5.1 Information Consistency
       3.5.2 Conditions and Assumptions
       3.5.3 Model Selection Consistency
       3.5.4 Prediction Consistency
   3.6 The Block Hyper-g/n Prior
       3.6.1 The Conditional Information Asymptotic in the Block Hyper-g/n Prior
       3.6.2 Consistency of the Block Hyper-g/n Prior .