The Utility of Latent Class Analysis to Understand Heterogeneity in Youth’S Coping Strategies
Total Page:16
File Type:pdf, Size:1020Kb
The Utility of Latent Class Analysis to Understand Heterogeneity in Youth’s Coping Strategies: A Methodological Introduction Karen Nylund-Gibson1, Adam C. Garber1, Jay Singh1, Melissa R. Witkow2 Adrienne Nishina3, Amy Bellmore4 1University of California, Santa Barbara, 2Willamette University, 3University of California, Davis, 4Univerity of Wisconsin Corresponding Author: Karen Nylund-Gibson, [email protected] Preprint published February 9, 2021 Note: This preprint has not been peer reviewed The research reported in this paper was supported in part by the Institute of Education Sciences, U.S. Department of Education, through Grant # R305A170559 to the University of California, Davis. The opinions expressed are those of the authors and do not represent views of the Institute of Education Sciences or the U.S. Department of Education 1 Abstract Latent class analysis (LCA) is a useful statistical approach for understanding heterogeneity in a population. This paper provides a pedagogical introduction to LCA modeling and provides an example of its use to understand youth’s daily coping strategies. The analytic procedures are outlined for choosing the number of classes and integration of the LCA variable within a structural equation model framework, specifically a latent class moderation model, and a detailed table provides a summary of relevant modeling steps. This applied example demonstrates the modeling context when the LCA variable is moderating the association between a covariate and two outcome variables. Results indicate that students’ coping strategies moderate the association between social stress and negative mood, however they do not moderate the social stress-positive mood association. Appendices include R (MplusAutomation) code to automate the enumeration procedure, 3-step auxiliary variable integration, and the generation of figures for visually depicting LCA results. 2 Latent class analysis (LCA) is becoming a widely used statistical technique for a range of social scientists. This analytic approach allows researchers to address questions that involve grouping individuals based on who respond similarly to a set of observed items. This modeling approach provides a way for researchers to better understand patterns of responses to a set of observed measures, allowing a deeper understanding of unmeasured groups or types within the data that alternative modeling approaches are not able to reveal. Applications of LCA are found across a wide range of research fields. Within the social sciences, LCA has been used in science education to reveal patterns in how children describe a day (Harlow et al., 2011); in epidemiology, to assess the effectiveness of a bullying and victimization prevention program (Tsiantis et al., 2013); and in behavioral disorder research, to investigate the correspondence between teacher management practices and elementary school students’ disruptive behavior (Gage et al., 2018). Common amongst these fields is a trend towards valuing modeling approaches that are capable of revealing heterogeneity within populations and go beyond descriptions of samples based on standard regression-type estimates (e.g., unconditional means). The general appeal of LCA is that it is a statistical method that can be used to create groups, called latent classes, in a given population. These latent classes are thought of as unobserved “subgroups” or “typologies” that capture heterogeneity in a population with respect to a specific set of measured items. That is, LCA uses patterns in the set of observed items to identify groups of individuals who have similar response patterns. Thus, LCA can be thought of as a multivariate approach to creating classifications. This is in contrast to creating groups using cutoff scores which impose arbitrary boundary conditions to determine class membership (Nylund, et al., 2007b). Once classes are identified, associations with other variables can be 3 made to better understand who comprises these latent groups as well as associations with how class membership relates to a set of distal outcomes. The procedure of conducting an LCA analysis with covariates and distal outcomes is a multi-step process that can be overwhelming to a researcher attempting this approach for the first time. To provide a roadmap for researchers to follow we have included a table which details the modeling steps (Table 1). Modeling Analysis tasks How it shows up in a manuscript Phase 0: Model Introduction: design Articulate research questions In the framing and potentially at the end of the (RQs) Introduction, mention and justify the use of Identify items for the latent class analysis and how it addresses RQs measurement model (Sinha et al., 2020) Identify auxiliary variables to Briefly integrate a rational for looking at use auxiliary variables (both covariates and distal Investigate sample size outcome; McCutcheon, 2002 ) parameter ratio Method: considerations Include justification for selection of items Decide how variables are ( Nylund-Gibson & Choi, 2018 ) going to be coded and Describes auxiliary variables (i.e., covariates modeled (i.e., continuous, and distal outcomes) and rational for including dichotomous, multi- them categorical). Optionally include a path diagram of final Visualize model with a path model in methods section or supplementary diagram for conceptual materials ( Online resource (Draw.io) ) understanding Limitations: Discussion of tradeoffs around item selection, model complexity, and sample size (Wurpts & Geiser, 2014 ) Acknowledge limitation of using sum-scores or any coarse recoding that was done to simplify complexity of model. (Snijders & Bosker 2011 (Ch. 3) ) 1: Pre- Method: processing Clean data using best Describe software used, any data & practices for reproducibility recoding/cleaning done, describing prevalence exploration (Zurr et al., 2010; of missing data and how it was handled. of data Tabachnick & Fidell, 2001 ) Include a data analysis plan, including how you Explore missing data will evaluate model fit patterns If including auxiliary variables include rationale Look at descriptive for choice of method for integration. information of variables (i.e., ( Asparouhov & Muthen, 2014 ) distributions, frequency Results: counts) Create table of descriptive statistics to describe Re-code variables (if needed) the variables. Explore response patterns Limitations: (utility varies by research Acknowledge missing data and/or context; See Response measurement challenges. Also, whether Pattern Tutorial) modeling assumptions are realistic or 4 reasonable. (Enders, 2010) 2: Un- Method: conditional Estimate models with Describe the analytic plan including software model varying number of classes and analysis strategy details (Nylund et al., specificatio ( See Appendix 1.1 ) 2007) n Study information criteria Generate Table 2 of model fit indices (See plot to compare model fit Appendix 1.2) (See Appendix 1.3) Describe the pattern of model fit endorsement Tabulate fit information based on the model fit indices and other Compare conditional item considerations probably plots for contender models ( See Appendix 1.4 ) 3: Results: Enumeratio Study pattern of fit indices Figure 1. Conditional item probability plot of n decision model endorsements. the final model ( See Appendix 1.5 ) Compare conditional item In text, describe each emergent class, to probability plots for each K- clearly translate response pattern shape. class model. Create a latent class name to label each class. Consult substantive theory in Include relative and absolute class size (e.g., light of emergent classes. class 1 is 10%; N=22). ( Masyn, 2013 ) Study classification precision Report Entropy and other classification (AvgPP, Entropy, etc.) information for selected model (e.g., AvePP, Select a subset of candidate etc.; optionally include in the Appendix; Masyn, models 2019 ) Re-consult theory Select final model Discussion: Create the conditional item Summary of the emergent classes and their probability plot for final labels model. Limitations: Specific considerations made when deciding on classes (e.g., no clear model, really small class, etc.) (Oberski, 2015) Mention limitation of solution as needing validation. Not the only model, but the model selected in this context. 4a: Include Method: auxiliary Decide auxiliary variable Justify auxiliary variable integration method variables integration method chosen. McLarnon & O’Neill, 2018 ; Results: Nylund-Gibson et al., 2014 Describe covariate/moderation results (e.g., Follow integration method present in a table or figure; Nylund-Gibson et specification steps based on al., 2019) appropriate method See Create distal outcome comparisons plot. Figure Appendix 1.6 ( ML 3-step ) 3 in manuscript Specify the moderation Scaffold the results section to talk about model See Appendix 1.7 covariate relations with the latent class Generate figures to variables. Then distal outcome relations. If communicate results clearly relevant, distal and covariate direct relations. See Appendix 1.8 – 1.9 Discussion: When relevant, explore Summarize patterns of results relating to covariate-indicator DIFF covariates and distal outcomes. Masyn, 2017 4b: Specify regressions with Method: 5 Moderation latent class variable as In the measures section, hypothesize which moderator Muthen et al., variable is given the status of moderator within 2017; McLarnon & O’Neill, the interaction term based on theory 2018 Results: Include write up of moderation