A Three-Paper Dissertation on Longitudinal Data Analysis in Education and Psychology
Total Page:16
File Type:pdf, Size:1020Kb
A Three-Paper Dissertation on Longitudinal Data Analysis in Education and Psychology Hedyeh Ahmadi Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy under the Executive Committee of the Graduate School of Arts and Sciences COLUMBIA UNIVERSITY 2019 c 2019 Hedyeh Ahmadi All rights reserved ABSTRACT A Three-Paper Dissertation on Longitudinal Data Analysis in Education and Psychology Hedyeh Ahmadi In longitudinal settings, modeling the covariance structure of repeated measure data is essential for proper analysis. The first paper in this three-paper dissertation presents a survey of four journals in the fields of Education and Psychology to identify the most commonly used methods for analyzing longitudinal data. It provides literature reviews and statistical details for each identified method. This paper also offers a summary table giving the benefits and drawbacks of all the surveyed methods in order to help researchers choose the optimal model according to the structure of their data. Finally, this paper highlights that even when scholars do use more advanced methods for analyzing repeated measure data, they very rarely report (or explore in their discussions) the covariance structure implemented in their choice of modeling. This suggests that, at least in some cases, researchers may not be taking advantage of the optimal covariance patterns. This paper identifies a gap in the standard statistical practices of the fields of Education and Psychology, namely that researchers are not modeling the covariance structure as an extension of fixed/random effects modeling. The second paper introduces the General Serial Covariance (GSC) approach, an extension of the Linear Mixed Modeling (LMM) or Hierarchical Linear Model (HLM) techniques that models the covariance structure using spatial correlation functions such as Gaussian, Exponential, and other patterns. These spatial correlations model the covariance structure in a continuous manner and therefore can deal with missingness and imbalanced data in a straightforward way. A simulation study in the second paper reveals that when data are consistent with the GSC model, using basic HLMs is not optimal for the estimation and testing of the fixed effects. The third paper is a tutorial that uses a real-world data set from a drug abuse prevention intervention to demonstrate the use of the GSC and basic HLM models in R programming language. This paper utilizes variograms (a visualization tool borrowed from geostatistics) among other exploratory tools to determine the covariance structure of the repeated measure data. This paper aims to introduce the GSC model and variogram plots to Education and Psychology, where, according to the survey in the first paper, they are not in use. This paper can also help scholars seeking guidance for interpreting the fixed effect-parameters. Table of Contents Paper 1: A Comprehensive Review of Methods for Analyzing Repeated Measure Data in Education and Psychology . 1 Paper 2: A Simulation Study of Linear Mixed Modeling with Spatial Correlation for Longi- tudinal Data . 129 Paper 3: A Tutorial on Using Linear Mix Modeling and Spatial Correlation with an Online Drug Abuse Prevention Intervention Data in R . 213 i ACKNOWLEDGMENTS I feel tremendously fortunate to have the unflagging support of my academic advisors, my family, and my friends who always believed in me and encouraged me to pursue my academic ambitions. I want to gratefully acknowledge all of my academic and non-academic mentors (whose names may not be listed here), who throughout the years have guided me along the way. I will forever be their student. I would like to first and foremost thank my wonderful husband who has been there for me in the past 12 years, supporting me in every step of my goals, academic and otherwise. He is the wisest and best cheerleader I could ask for. I would like to also thank my amazing advisor, Prof. Elizabeth Tipton, who despite her very busy schedule has been there for me at every step|from situating me in the de- partment, encouraging me to take on jobs I thought were beyond my abilities, stopping me when I was going overboard, and sending me the \hang in there" emails that meant so much to me. She is the mentor and role model that every graduate student dreams of. I would like to also acknowledge Prof. Bryan Keller for being so generous with his time and providing the data set for this dissertation. His sage advice constantly shed light on this project. In addition, I would like to thank Prof. Matthew Johnson for his valuable feedback that always took my dissertation to the next level. I would like to also express my gratitude to the rest of my committee, Prof. Oren Pizmony-Levy and Prof. Minjeong Jeon, who both graciously provided helpful comments. In addition, I would like to extend my sincere thanks to Prof. Daniel Gillen for his informative lecture notes that provided the seed for this dissertation. I feel enormously lucky to have my mom and my in-laws, who patiently waited all ii these years for this moment, sharing all the happy and the more trying days and providing invaluable emotional and professional support. I would like to thank the rest of my family, who always lent me an ear when I needed love and support. Finally, I'm grateful for the memory and lessons of my dad, who I know would be proud today. To my friends (you know who you are!), who listened uncomplainingly in my venting moments, who understood when I missed a lot of their special occasions to stay on track with my goals, and who always reminded me that \this shall pass"|thank you. Last but not least, I would be remiss if I did not mention the fabulous Columbia University campus, with all of its beautiful libraries and cafeterias that sheltered me on all those cold and snowy days and helped me to find a new place to study every month! iii PREFACE This dissertation consists of three papers that are intended to be submitted to three different journals. It therefore departs from the standard dissertation structure of sequen- tial chapters. Each paper has its own abstract, table of contents, main body, discussion, appendix, and references in accordance with Columbia University's formatting guidelines. The structure of the three papers may change upon submission to the respective relevant journals, so readers interested in the most updated version of the three papers can contact me for updated versions of the papers. iv Paper 1: A Comprehensive Review of Methods for Analyzing Repeated Measure Data in Education and Psychology Hedyeh Ahmadi Teachers College, Columbia University 1 ABSTRACT Paper 1: A Comprehensive Review of Methods for Analyzing Repeated Measure Data in Education and Psychology Hedyeh Ahmadi This paper presents a comprehensive review of longitudinal data analysis methods in both qualitative and quantitative formats. The qualitative review can help researchers find examples of methods of interest in the Education and Psychology literature. The quantitative survey and methods sections can help researchers to identify the most commonly used methods in specific journals and in these disciplines overall. For each journal, detailed statistical summaries, including frequency of each method, sample sizes, and number of repeated measurements per study, are provided as well. Recommendations are offered based on these observations that can improve the data collection and analysis methods in Education and Psychology research. The longitudinal methods are surveyed and broken down into two categories to demonstrate how many researchers continue to use traditional methods with rigid and unrealistic assumptions when advanced models can offer improved statistical properties and more realistic assumptions. To better understand the strengths and limitations of each method, this paper also presents a brief statistical methods section for every model reviewed. A summary table of all the reviewed methods is also presented to help scholars consider the pros and cons of each approach and select the optimal method. 2 Table of Contents 1. Introduction to the Review of Longitudinal Data Analysis in Education and Psychology . 6 2. A Survey of Longitudinal Analysis Related Articles . 9 2.1. Sample Size and Number of Repeated Measures for Reviewed Articles . 13 3. Data Analysis Methods for Longitudinal Data in Education and Psychology . 19 3.1. Traditional Approaches . 19 3.1.1. Review: Paired t-test. 20 3.1.1.1. Method: Paired t-test . 21 3.1.2. Review: Analysis of Covariance (ANCOVA) . 22 3.1.2.1. Method: Analysis of Covariance (ANCOVA) . 23 3.1.3. Review: Analysis of Variance (ANOVA) . 25 3.1.4. Review: Regression Analysis . 26 3.1.5. Review: Derived Variable Approach . 26 3.1.5.1. Method: Derived Variable Approach . 27 3.1.6. Review: Repeated Measures Univariate and Multivariate Analysis of Variance (RM ANOVA and RM MANOVA) . 28 3.1.6.1. Method: RM ANOVA and RM MANOVA. 30 3.1.6.1.1. Method: Single-Sample RM ANOVA . 31 3.1.6.1.2. Method: Multiple-Sample RM ANOVA . 38 3.1.6.1.3. Method: One-Sample MANOVA . 42 3.1.6.1.4. Method: Multiple Samples MANOVA . 48 3.2. Advanced Analysis Approaches for Longitudinal Data. 51 3.2.1. Review: Marginal Models via Generalized Estimating Equations . 52 3 3.2.1.1. Method: Generalized Estimating Equations . 53 3.2.2. Review: Mixed-Effects Models . 58 3.2.2.1. Method: Linear Mixed Models . 61 3.2.2.1.1. Method: Random Intercept Model . 62 3.2.2.1.2. Method: Random Coefficient Model . 64 3.2.2.1.3. Method: Random Coefficient Model with a Time-Invariant Covariate . 65 3.2.3. Review: Two Extensions of Mixed-Effects Models . 66 3.2.3.1. Review: Generalized Linear Mixed Models . 66 3.2.3.1.1.