A Bayesian Semiparametric Joint Model for Longitudinal and Survival Data (Grant Title: Smart Early Screening for Autism and Communication Disorders in Primary Care)
Florida State University Libraries
Electronic Theses, Treatises and Dissertations
The Graduate School
2019

A Bayesian Semiparametric Joint Model for Longitudinal and Survival Data
Pengpeng Wang

FLORIDA STATE UNIVERSITY
COLLEGE OF ARTS AND SCIENCES

A BAYESIAN SEMIPARAMETRIC JOINT MODEL FOR LONGITUDINAL AND SURVIVAL DATA

By
PENGPENG WANG

A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy

2019

Copyright © 2019 Pengpeng Wang. All Rights Reserved.

Pengpeng Wang defended this dissertation on April 16, 2019.
The members of the supervisory committee were:

Elizabeth H. Slate, Professor Co-Directing Dissertation
Jonathan R. Bradley, Professor Co-Directing Dissertation
Amy M. Wetherby, University Representative
Lifeng Lin, Committee Member

The Graduate School has verified and approved the above-named committee members, and certifies that the dissertation has been approved in accordance with university requirements.

ACKNOWLEDGMENTS

First of all, I would like to express my sincere gratitude to my major advisor, Dr. Elizabeth H. Slate, for her continuous support of my PhD study and related research, and for her motivation, patience, and immense knowledge. Her guidance and encouragement helped me throughout my research. Her elegant and well-organized personality has had a great impact on my life. I am very grateful to have her as my advisor, and she will always be a role model in my life.

I would like to thank my co-advisor, Dr. Jonathan R. Bradley, for sharing his research ideas and expertise so willingly, and for his insightful comments and encouragement. I appreciate the time he spent on my research and our many discussions. Without his guidance and help, this dissertation would not have been possible.
I am also grateful to the rest of my dissertation committee, Dr. Amy M. Wetherby and Dr. Lifeng Lin, for their time, support, guidance, and goodwill throughout the preparation for my defense, and for their review of this document. I am especially appreciative of the opportunity to be involved with Dr. Amy M. Wetherby's research team on multiple projects, including "Autism Adaptive Community-based Treatment to Improve Outcomes Using Navigators (ACTION) Network" and "Mobilizing Community Systems to Engage Families in Early ASD Detection & Services," which provided support for my research.

In addition, my sincere thanks go to my family, friends, and fellow students, who gave me help and support in my PhD study and in life in general. Having fun with them made my graduate school life wonderful.

TABLE OF CONTENTS

List of Tables . . . vi
List of Figures . . . vii
Abstract . . . x

1 INTRODUCTION . . . 1
2 BACKGROUND . . . 4
  2.1 Gaussian Assumption . . . 4
  2.2 Log-Gamma Distribution . . . 5
  2.3 Multivariate Log-Gamma Distribution . . . 7
  2.4 Conditional Multivariate Log-Gamma Distribution . . . 8
  2.5 Slice Sampler . . . 9
  2.6 Dirichlet Process . . . 10
  2.7 Kaplan-Meier Estimator . . . 12
  2.8 Posterior Predictive p-Value . . . 13
  2.9 Review of DIC . . . 14
3 A GAUSSIAN JOINT MODEL . . . 15
  3.1 Model Formulation . . . 15
  3.2 Likelihood . . . 16
  3.3 Dirichlet Process Prior . . . 18
    3.3.1 Dirichlet Process Prior in the Gaussian Joint Model . . . 18
    3.3.2 Concentration Parameter . . . 19
  3.4 Prior Distributions . . . 20
  3.5 Gibbs Sampler . . . 21
4 LOG-GAMMA JOINT MODEL . . . 26
  4.1 Model Formulation . . . 26
  4.2 Likelihood . . . 27
  4.3 Dirichlet Process Prior in the Log-Gamma Joint Model . . . 29
  4.4 Priors and Hyperpriors . . . 30
  4.5 Gibbs Sampler . . . 31
5 SIMULATION STUDY . . . 38
  5.1 Gaussian Joint Model . . . 38
    5.1.1 Simulation Design . . . 39
    5.1.2 Simulation Results . . . 41
    5.1.3 Discussion . . . 42
  5.2 Log-Gamma Joint Model . . . 48
    5.2.1 Simulation Design . . . 48
    5.2.2 Simulation Results . . . 50
    5.2.3 Discussion . . . 51
  5.3 Model Comparison . . . 57
    5.3.1 Comparison between Gaussian Joint Model and Log-Gamma Joint Model . . . 57
    5.3.2 Comparison with Parametric Models . . . 65
6 APPLICATIONS . . . 68
  6.1 HIV Data . . . 68
    6.1.1 Data Description . . . 68
    6.1.2 Results Using Joint Models . . . 69
    6.1.3 Diagnosis . . . 72
  6.2 PSA Data . . . 78
    6.2.1 Data Description . . . 78
    6.2.2 Results Using Joint Models . . . 79
    6.2.3 Diagnosis . . . 84
  6.3 Discussion . . . 85
7 CONCLUSION AND FUTURE WORK . . . 87

Appendix
A ADDITIONAL DETAILS ON THE MULTIVARIATE LOG-GAMMA DISTRIBUTION . . . 89
  A.1 Marginal Multivariate Log-Gamma Distribution . . . 89
  A.2 Data Augmentation Strategies for the cMLG Distribution . . . 90
  A.3 Proof of Proposition in Section 2.2 . . . 94
  A.4 Proof of Proposition in Section 2.3 . . . 95
  A.5 Proof in Section 2.4 . . . 95
B IRB Approvals . . . 96

Bibliography . . . 98
Biographical Sketch . . . 102

LIST OF TABLES

5.1 Parameter estimates for the simulation study of Cases 1-3 (Gaussian distributed data) using model GaussianMH (the Gaussian joint model with Metropolis-Hastings) . . . 43
5.2 Parameter estimates for the simulation study of Cases 1-3 (Gaussian distributed data) using model GaussianSS (the Gaussian joint model with slice sampler) . . . 44
5.3 MSE for the Gaussian joint model and the log-gamma joint model . . . 55
5.4 Parameter estimates for the simulation study of Cases 4-6 (log-gamma distributed data) using the log-gamma joint model . . . 56
5.5 Clustering evaluation for the Gaussian joint model and the log-gamma joint model . . . 61
5.6 Contingency table for the Rand index . . . 61
5.7 Effective sample size for the Gaussian joint model and the log-gamma joint model . . . 66
6.1 Parameter estimates for the HIV data using the Gaussian joint model with slice sampler and the log-gamma joint model . . . 72
6.2 Parameter estimates for the PSA data using the Gaussian joint model with slice sampler and the log-gamma joint model . . . 81

LIST OF FIGURES

2.1 Kernel density plots of the log-gamma distributions and the standard Gaussian distribution . . . 6
5.1 Histograms of simulated survival time. The number of clusters is indicated in the title heading of each panel. . . . 39
5.2 Trace plots and density curves for the last 3,000 iterations using model GaussianSS in Case 3. (a) MCMC trace plot of β01 (intercept for the first subject). (b) Posterior density estimate of β01 from model GaussianSS (solid line) and the true value of β01 (dashed line). (c) MCMC trace plot of β11 (slope for the first subject). (d) Posterior density estimate of β11 from model GaussianSS (solid line) and the true value of β11 (dashed line). . . . 45
5.3 Trace plots and density curves for the last 3,000 iterations using model GaussianSS in Case 3. (a) MCMC trace plot of γ (link parameter). (b) Posterior density estimate of γ from model GaussianSS (solid line) and the true value of γ (dashed line). (c) MCMC trace plot of α (covariate parameter). (d) Posterior density estimate of α from model GaussianSS (solid line) and the true value of α (dashed line). . . . 46
5.4 True longitudinal observations vs. estimated longitudinal trajectories using model GaussianSS in Case 3 (the simulation study with three-cluster Gaussian distributed data for the Gaussian joint model). . . . 47
5.5 True hazard rate vs. estimated hazard rate from model GaussianSS in Case 3 (the simulation study with three-cluster Gaussian distributed data for the Gaussian joint model). . . . 47
5.6 Histograms of simulated survival time. The number of clusters is indicated in the title heading of each panel. . . . 49
5.7 Trace plots and density curves for the last 3,000 iterations using the log-gamma joint model in Case 6. (a) MCMC trace plot of β0. (b) Posterior density estimate of β0 from the semiparametric log-gamma joint model (solid line) and the true value of β0 (dashed line). (c) MCMC trace plot of β1. (d) Posterior density estimate of β1 from the semiparametric log-gamma joint model (solid line) and the true value of β1 (dashed line). . . . 52
5.8 Trace plots and density curves for the last 3,000 iterations using the log-gamma joint model in Case 6. (a) MCMC trace plot of γ. (b) Posterior density estimate of γ from the semiparametric log-gamma joint model (solid line) and the true value of γ (dashed line). (c) MCMC trace plot of α. (d) Posterior density estimate of α from the semiparametric log-gamma joint model (solid line) and the true value of α (dashed line). . . . 53
5.9 True longitudinal observations vs. estimated longitudinal trajectories using the log-gamma joint model in Case 6 (three-cluster simulation study of log-gamma distributed data). . . . 54
5.10 True hazard rate vs. estimated hazard rate using the log-gamma joint model in Case 6 (three-cluster simulation study of log-gamma distributed data). . . . 54
5.11 True vs. estimated longitudinal trajectories and hazard rates at the cluster level in Case 3. The red solid lines are the true trajectories and hazard rates in each cluster.