Copula Regression Models for the Analysis of Correlated Data with Missing Values

Copula Regression Models for the Analysis of Correlated Data with Missing Values by Wei Ding A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Biostatistics) in The University of Michigan 2015 Doctoral Committee: Professor Peter X-K Song, Chair Assistant Professor Veronica Berrocal Assistant Professor Hui Jiang Associate Professor Qiaozhu Mei c Wei Ding 2015 All Rights Reserved To my parents ii ACKNOWLEDGEMENTS It was the best of time. It was the worst of time. For Charles Dickinson, it was French Revolution, but for me, it was my life in the past five years as a phd student in Ann Arbor. When I first came here, I was with curiosity, with fear, with doubt, and with uncertainty. I wish I can leave here with confidence, with independence, and with hope, to a bright future. First and foremost, I am deeply grateful to my wonderful advisor, Dr. Peter Song, who encouraged me, inspired me, and helped me with great tolerance and patience. Without him, this dissertation is never made possible. I was moved by his rigorous thinking, broad vision, and his passion to statistics. Moreover, Dr. Peter Song is not just an advisor, but also a mentor, and a role model. I learned a lot from his thoughtfulness and kindness. My thanks go out to Dr. Veronica Berrocal, Dr. Hui Jiang, and Dr. Qiaozhu Mei for serving on my dissertation committee and providing valuable suggestions and support on my research. I am thankful for all my best friends in Ann Arbor and elsewhere around the world. With them, I had a great time, and never felt lonely. The most important of all, I must thank my parents, who love me unconditionally, support me no matter what, and always have absolute faith in me. This thesis is dedicated to my beloved parents. iii TABLE OF CONTENTS DEDICATION :::::::::::::::::::::::::::::::::::::::::: ii ACKNOWLEDGEMENTS :::::::::::::::::::::::::::::::::: iii LIST OF FIGURES :::::::::::::::::::::::::::::::::::::: vi LIST OF TABLES ::::::::::::::::::::::::::::::::::::::: vii ABSTRACT ::::::::::::::::::::::::::::::::::::::::::: ix CHAPTER I. Introduction ....................................... 1 1.1 Summary . 1 1.2 Objectives . 3 1.3 Literature Review . 6 1.3.1 Correlated Data . 6 1.3.2 Missing Data . 7 1.3.3 Copula . 8 1.3.4 EM Algorithm . 9 1.3.5 Composite Likelihood Method . 10 1.3.6 Partial Identification . 10 1.3.7 Multilevel Model . 11 1.3.8 Motivating data I: Quality of Life Study . 11 1.3.9 Motivating data II: Infants' Visual Recognition Memory Study . 12 1.4 Outline of Dissertation . 13 II. EM Algorithm in Gaussian Copula with Missing Data . 14 2.1 Summary . 14 2.2 Introduction . 15 2.3 Model . 18 2.3.1 Location-Scale Family Distribution Marginal Model . 18 2.3.2 Gaussian Copula . 19 2.3.3 Examples of Marginal Models . 20 2.4 EM Algorithm . 21 2.4.1 Expectation and Maximization . 21 2.4.2 Examples Revisited . 26 2.4.3 Standard Error Calculation . 27 2.4.4 Initialization . 28 2.5 Simulation Study . 28 2.5.1 Skewed Marginal Model . 29 2.5.2 Misaligned Missing Data . 31 iv 2.6 Data Example . 32 2.7 Discussion . 36 III. Composite Likelihood Approach in Gaussian Copula Regression Models with Missing Data ................................... 39 3.1 Summary . 39 3.2 Introduction . 40 3.3 Method . 45 3.3.1 Complete-Case Composite Likelihood . 45 3.3.2 Likelihood Orthogonality . 46 3.3.3 Large-Sample Properties . 49 3.3.4 Estimation of Partially Identifiable Parameters . 50 3.4 Implementation . 53 3.5 Gaussian Copula Regression Model . 54 3.5.1 Location-Scale Family Distribution Marginal Model . 54 3.5.2 Gaussian Copula . 55 3.5.3 Complete-Case Composite Likelihood . 56 3.5.4 Implementation . 57 3.6 Simulation Experiments . 58 3.6.1 Three-Dimensional Linear Regression Model . 58 3.6.2 Integration of Four Data Sources . 61 3.6.3 Estimation with Two Partially Identifiable Parameters . 62 3.7 PROMIS Data Analysis . 65 3.8 Concluding Remarks . 66 IV. Multilevel Gaussian Copula Regression Model . 72 4.1 Summary . 72 4.2 Introduction . 73 4.3 EEG Data . 74 4.3.1 Data Exploration . 76 4.3.2 Mixed Effects Model . 77 4.4 Model . 84 4.4.1 Location-scale Family Marginal Model . 85 4.4.2 Gaussian Copula . 85 4.4.3 Example of Marginal Model: Gamma Margin . 87 4.5 Maximum Likelihood Estimation . 87 4.5.1 Log Likelihood Function . 88 4.5.2 Peeling Algorithm . 88 4.5.3 Statistical Inference . 90 4.6 Simulation Study . 90 4.6.1 Multilevel Model with Normal Margins I . 91 4.6.2 Multilevel Model with Normal Margins II . 92 4.7 Analysis of EEG Data . 94 4.8 Conclusions & Discussion . 97 V. Discussions and Future Works ............................100 APPENDIX ::::::::::::::::::::::::::::::::::::::::::: 102 BIBLIOGRAPHY :::::::::::::::::::::::::::::::::::::::: 108 v LIST OF FIGURES Figure 1.1 Connection and Structure of the Dissertation . 13 2.1 Plots of observed and predicted residuals from the EM algorithm. 35 3.1 Upper and Lower Bounds for a Partially Identifiable Correlation Parameter . 52 3.2 Bounds for Partial Identifiable Parameters . 64 4.1 Contour of the 128-Channel EEG Sensor Net (L: left; R: right; A: anterior; P: posterior) . 75 4.2 LSW related time series cross-classified by iron status and stimulus. 76 4.3 Density of LSW measurements at each electrodes over stimulus level. 78 4.4 Scatter Plot of LSW Measurements in Each Subregion Stratified by Pictorial Stim- uli. 80 4.5 Scatter Plot of LSW Measurements in Each Subregion Stratified by Pictorial Stim- uli. 81 4.6 Boxplots of region-specific LSW cross iron status and stimulus. 82 4.7 Boxplots of region-specific LSW cross iron status and stimulus. 83 4.8 Z-statistics of Iron Status Effect on Mother's and Stranger's Pictorial Stimuli . 98 vi LIST OF TABLES Table 2.1 Simulation results of correlation (Kendall's tau) parameters estimation in copula model for marginal skewed distributed data obtained by full data likelihood, EM algorithm and Imputation methods with different missing percentage. (Standard error ratio is calculated by a ratio of two standard errors between a method and the gold standard.) . 30 2.2 Simulation results concerning estimation of Pearson correlation and marginal regression parameters in the copula model for partially misaligned missing at random data obtained EM algorithm, compared with the gold standard with full data, Mul- tiple Imputation and Hot-Deck Imputation. (Standard error ratio is calculated by a ratio of two standard errors between a method and the gold standard.) . 33 2.3 Estimation of correlation and marginal regression parameters in the copula model for quality of life study obtained by univariate analysis, EM algorithm and Multiple Imputation. 35 3.1 Summary of simulation results, including average point estimates, empirical standard errors (ESE) and average model-based standard errors (AMSE), under the misaligned missing data pattern for 3-dimensional linear regression model. 68 3.2 Summary of simulation results in the integration of four data sets, including average point estimates, empirical standard errors (ESE) and average model-based standard errors (AMSE), under various missing data patterns over four different data sources, each of which is generated from a three-dimensional linear regression model . 69 3.3 Summary of simulation results in a 5-dimensional linear model, including average point estimates, empirical standard errors (ESE) and average model-based standard errors (AMSE), under two pairs of misaligned missingness between the first and second margins and between the fourth and fifth margins, respectively. 70 3.4 Estimation of correlation and marginal regression parameters for quality of life study obtained by complete-case univariate analysis, EM algorithm with two initial values of correlation parameters, two runs multiple imputations, and one method of complete-case composite likelihood. 71 4.1 Summary of Pearson Correlation of LSW Measurements in Each Subregion Strati- fied by Pictorial Stimuli . 79 4.2 Maximum Likelihood Estimates of the Fixed Effects . 84 4.3 Restricted Maximum Likelihood Estimates of the Variance Components . 84 vii 4.4 Summary of simulation results from the analysis of correlated normal outcomes under 2-level correlation: the wave correlation and AR-1 correlation. The multilevel Gaussian copula regression model is compared with the univariate analysis, including average point estimates, empirical standard errors (ESE) and average model-based standard errors (AMSE) over 1000 running of simulations. 93 4.5 Summary of simulation results in four-level correlated normally distributed margins with Matérn correlation, exchangeable correlation, AR-1 correlation, and unstruc- tured correlation obtained from the multilevel Gaussian copula regression model, including average point estimates and empirical standard errors (ESE). 94 4.6 Summary of iron status effect on LSW measurements collected from each node and each stimulus, including estimate, model-based standard errors, and z-statistics. 96 4.7 Summary of estimation results for the correlation parameters. 97 viii ABSTRACT Copula Regression Models for the Analysis of Correlated Data with Missing Values by Wei Ding Advisor: Peter X-K Song The class of Gaussian copula regression models provides a unified modeling frame- work to accommodate various marginal distributions and flexible dependence struc- tures. In the presence of missing data, the Expectation-Maximization (EM) algorithm plays a central role in parameter estimation. This classical method is greatly challenged by multilevel.

Copula Regression Models for the Analysis of Correlated Data with Missing Values

Estimation of Common Location and Scale Parameters in Nonregular Cases Ahmad Razmpour Iowa State University

A Comparison of Unbiased and Plottingposition Estimators of L

The Asymmetric T-Copula with Individual Degrees of Freedom

A Bayesian Hierarchical Spatial Copula Model: an Application to Extreme Temperatures in Extremadura (Spain)

Procedures for Estimation of Weibull Parameters James W

Location-Scale Distributions

Lecture 22 - 11/19/2019 Lecture 22: Robust Location Estimation Lecturer: Jiantao Jiao Scribe: Vignesh Subramanian

Robust Estimator of Location Parameter1)

Maximum Likelihood Characterization of Distributions

Parametric Forecasting of Value at Risk Using Heavy-Tailed Distribution

Sufficient Conditions for Monotone Hazard Rate an Application to Latency-Probability Curves

Functions by Name