Nonparametric Bayes: Inference Under Nonignorable Missingness and Model Selection
NONPARAMETRIC BAYES: INFERENCE UNDER NONIGNORABLE MISSINGNESS AND MODEL SELECTION

By

ANTONIO LINERO

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2015

© 2015 Antonio Linero

Dedicated to Mike and Hani for their gracious support, and Katie for her patience!

ACKNOWLEDGMENTS

I would first like to convey my sincerest gratitude to my advisers, Professor Michael J. Daniels and Professor Hani Doss, for their help and encouragement. I have been fortunate to have two excellent mentors to bounce ideas off of, and to offer assurances when I doubted my work. They have provided me with every opportunity to succeed, and it has been an honor to work with them. I would also like to thank my committee members, Professor Malay Ghosh who always had an open door and Professor Arunava Banerjee for providing a valuable Machine Learning perspective. Outside of my committee, I am grateful to Professor Daniel O. Scharfstein for his insights into the missing data problem and for giving me the opportunity to visit Johns Hopkins. I am also particularly grateful to Professor Ron Randles, whose Introduction to Mathematical Statistics course provided the initial inspiration for me to pursue Statistics, and to Professor Andrew Rosalsky for providing me with a deep appreciation of Probability. Lastly, I would like to thank my partner Katie for her inspiration and patience, and my parents, without whose support I would not have had the opportunity to succeed.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 PRELIMINARIES ON BAYESIAN NONPARAMETRICS
  1.1 Introduction
  1.2 Posterior Consistency
  1.3 Review of Random Measures
    1.3.1 Dirichlet Processes
    1.3.2 Mixtures of Dirichlet Processes and Dirichlet Process Mixtures
    1.3.3 Dependent Random Measures
2 INFORMATIVE MISSINGNESS IN LONGITUDINAL STUDIES
  2.1 Introduction
  2.2 Notation
  2.3 Rubin's Classification of Missing Data
  2.4 Why Bayesian Nonparametrics?
  2.5 Existing Approaches
    2.5.1 Likelihood Factorizations
    2.5.2 Non-Likelihood Based Approaches
  2.6 Identifying Restrictions and Sensitivity Parameters
  2.7 Partial and Latent Ignorability
  2.8 Intermittent Missingness
  2.9 Summary of Our Strategy and Our Contributions
3 NONPARAMETRIC BAYES FOR NONIGNORABLE MISSINGNESS
  3.1 Introduction
  3.2 Strategy for Prior Specification
  3.3 Posterior Consistency of p_obs
  3.4 Kullback-Leibler Property for Kernel Mixture Models
  3.5 Identifying Restrictions
    3.5.1 Monotone Missingness
    3.5.2 Non-monotone Missingness
  3.6 Inference by G-Computation
  3.7 Discussion
4 A DIRICHLET PROCESS MIXTURE WORKING MODEL, WITH APPLICATION TO A SCHIZOPHRENIA CLINICAL TRIAL
  4.1 Introduction
  4.2 The Schizophrenia Clinical Trial
  4.3 A Dirichlet Process Mixture Working Prior
  4.4 The Extrapolation Distribution
  4.5 Computation and Inference
  4.6 Simulation Studies
    4.6.1 Performance for Mean Estimation under MAR
    4.6.2 Performance for Effect Estimation Under MNAR
  4.7 Application to the Schizophrenia Clinical Trial
    4.7.1 Comparison to Alternatives and Assessing Model Fit
    4.7.2 Inference and Sensitivity Analysis
  4.8 Discussion
5 EMPIRICAL BAYES ESTIMATION AND MODEL SELECTION FOR HIERARCHICAL NONPARAMETRIC PRIORS
  5.1 Introduction
    5.1.1 Motivating Examples
    5.1.2 Our Contributions
  5.2 Theoretical Development
    5.2.1 Marginal Likelihoods
    5.2.2 Limiting Cases of the Hierarchical Dirichlet Process
  5.3 Estimation of Bayes Factor Surfaces
    5.3.1 Testing Against Boundary Values
    5.3.2 Empirical Bayes Estimation
  5.4 Illustrations
    5.4.1 Quality of Hospital Care Data
    5.4.2 Topic Modeling
  5.5 Discussion
6 DISCUSSION AND FUTURE WORK
  6.1 Rates of Convergence
  6.2 More Work on Non-monotone Missingness
  6.3 Multivariate Models for Missing Data
  6.4 Causal Inference
  6.5 Alternatives to the Hierarchical Dirichlet Process

APPENDIX

A APPENDIX TO CHAPTER 3
  A.1 Proof of Theorem 3.2
  A.2 Proof of Theorem 3.3
  A.3 Proof of Theorem 3.5
B APPENDIX TO CHAPTER 4
  B.1 Blocked Gibbs Sampler
  B.2 Prior Specification
    B.2.1 Parametric Priors
    B.2.2 Nonparametric Default Priors
  B.3 Simulation Settings
    B.3.1 Section 4.6.1
    B.3.2 Section 4.6.2
  B.4 Exponential Tilting
C APPENDIX TO CHAPTER 5
  C.1 Proof of Theorem 5.2
  C.2 Proof of Lemma 5.1
  C.3 Proof of Lemma 5.2
  C.4 Impropriety of Posterior Under an Improper Prior

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

Table
2-1 Schematic representation of ACMV
2-2 Schematic representation of NFD
4-1 Simulation results under MAR
4-2 Comparison of results on SCT data under MAR
B-1 Results from simulation study in Section 4.6.2

LIST OF FIGURES

Figure
3-1 Schematic describing the working prior framework
3-2 Graphical depiction of the coupling interpretation of the transformation method
4-1 Trajectories of two latent classes
4-2 Results from simulation study in Section 4.6.2
4-3 Model checking for SCT
4-4 Improvement of treatments relative to placebo
4-5 Contour plot for treatment effects as functions of sensitivity parameters
5-1 Graphical depiction of a Bayesian hierarchical data generating mechanism
5-2 Graphical representation of the HDP as a directed acyclic graph
5-3 Draws from a Markov chain targeting improper posterior of γ
5-4 Models corresponding to boundary values of (α, γ)
5-5 Models obtained by letting α or γ tend to 0 or ∞
5-6 MCMC output for (α, γ) under an informative prior
5-7 Logarithm of Bayes factor surface of (α, γ)
5-8 True topics used in simulation experiment
5-9 Histogram of samples from the posterior distribution of α and γ
5-10 L1 error in estimating the most prevalent topic
5-11 Sensitivity of estimation of the most prevalent topic to choice of hyperparameter
B-1 Dataset generated under M2
B-2 Dataset generated under M3

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

NONPARAMETRIC BAYES: INFERENCE UNDER NONIGNORABLE MISSINGNESS AND MODEL SELECTION

By

Antonio Linero

August 2015

Chair: Michael J. Daniels
Cochair: Hani Doss
Major: Statistics

This dissertation concerns two essentially independent topics, with the primary link between the two being the use of Bayesian nonparametrics as an inference tool. The first topic concerns inference in the presence of missing data, with emphasis on longitudinal clinical trials with attrition. In this setting, it is well known that many effects of interest are not identified in the absence of untestable assumptions; the best one can do is to conduct a sensitivity analysis to determine