Latent Class Analysis
Total Page:16
File Type:pdf, Size:1020Kb
International Journal for Cross-Disciplinary Subjects in Education (IJCDSE), Volume 11, Issue 2, 2020 Latent Class Analysis Diana Mindrila University of West Georgia, USA Abstract A variety of phenomena examined in education can difficulties in the implementation of this method. The be described using categorical latent models, which use of LCA became more widespread after 1974, help identify groups of individuals who have a series when Goodman [6] developed a more general of characteristics in common. Usually, data are too procedure for computing maximum likelihood complex to identify and describe groups through parameter estimates. Later, Dempster, Laird, and inspection alone; therefore, multivariate grouping Rubin [7] developed the “expectation-maximization techniques such as latent class analysis (LCA) should algorithm”, which placed LCA in the log-linear model be employed. LCA refers to a set of classification framework and increased its generalizability [8]. The procedures used to identify subgroups of individuals expectation-maximization algorithm allowed who have several characteristics in common [1]. The researchers to assess model fit [9] and to estimate following article provides a general description of more complex models such as LCA with covariates LCA. It describes the latent class model and explains [10] and longitudinal “individual growth trajectories” the steps involved in latent class modeling. Finally, [11]. From this point on, researchers developed the article includes an empirical application of LCA several procedures for estimating changes in latent with binary indicators using the Mplus statistical class membership over time [12]. software. 3. A General Description of LCA 1. Introduction The LCA method is a multivariate classification A variety of phenomena examined in education procedure that allows researchers to categorize can be described using categorical latent variables, individuals into homogeneous groups [2]. It is which help identify groups of individuals who share a sometimes referred to as “mixture modeling based set of characteristics. Usually the array of observed clustering”, “mixture-likelihood approach to data is too complex to identify and describe groups clustering”, or “finite mixture modeling” [13]. In fact, through inspection alone; therefore, researchers use “finite mixture modeling” is a more general term for person-oriented classification procedures such latent latent variable modeling where latent variables are class analysis (LCA) to accomplish this goal. categorical. The latent categories represent a set of Although LCA has been discussed in the sub-populations of individuals, and individuals’ methodological literature, most authors focused either memberships to these sub-populations are inferred on computational procedures [2], on conceptual based on patterns of variations in the data [13]. explanations [1,3], or on describing empirical LCA postulates the existence of a categorical applications [4]. The present article provides an variable free of measurement error [1]. The latent overview of all aspects of LCA, by discussing both the construct that underlies the data is not measured theoretical foundations and the practical applications directly, it is estimated based on the variance shared of this method. For the applied researchers, the article by a set of observed variables. Researchers take a includes a step-by-step description of LCA with a similar approach when conducting factor analysis, clear example of how to apply this procedure using which also infers underlying latent variables based on the Mplus statistical software and how to interpret the the relationships among observed variables. results. Nevertheless, factor analysis groups variables, and is, therefore, a variable-oriented classification procedure. 2. Milestones in the Development of LCA In contrast, LCA groups individuals and is considered a person-oriented classification procedure [14]. Lazarfeld and Henry [5] brought the first major Further, in latent class models the latent variable is contribution to the development of LCA. Although categorical rather than continuous [2], which means they were not the first suggesting the possibility of that groups do not necessarily differ quantitatively but estimating categorical latent variables, they were the have distinct combinations of characteristics and first authors to provide a detailed, comprehensive, represent a phenomenon that is inherently categorical mathematical and conceptual description of LCA. rather than continuous [1]. Nevertheless, the absence of a reliable and general LCA is similar to cluster analysis (CA), which is method for calculating parameter estimates posed also a person-oriented classification procedure. An important distinction between CA and LCA is that Copyright © 2020, Infonomics Society 4323 International Journal for Cross-Disciplinary Subjects in Education (IJCDSE), Volume 11, Issue 2, 2020 cluster analysis does not estimate the error of 푃(푖 1 , 푖 2 , … . , 푖 푟 ) = ∑_(푘 = 1)^퐾 푃(퐶 = 푘)푃(퐶 measurement and is, therefore, conducted “at the = 푘)푃(퐶 = 푘) … 푃(푖 푟 │퐶 = 푘) observed level” [3]. In contrast, LCA assumes that a latent categorical variable underlies the data; it When the observed indicators are continuous, estimates the error of measurement of the latent parameters are represented by the means and categorical variable, which is taken into account in the variances of the observed variables and the LCA calculation of parameter estimates and the estimation model can be expressed as: of class membership probabilities. Additionally, LCA 퐾 allows the computation of fit indices, which show how 푓(푦 푝 ) = ∑ 푃(퐶 = 푘)푓(푦 푝 |퐶 = 푘) well the latent class model fits the data [3]. 푘=1 LCA relies on the assumption that homogeneous sub-populations exist within the data. These where yp represents the set of responses provided by a subgroups have distinct probability distributions and person p on all the observed indicators [1]. are mutually exclusive [15]; therefore, the The statistical model described above is the basis percentages of individuals assigned to each latent of all types of LCA: exploratory LCA, confirmatory class add up to 100%. LCA also relies on the LCA, latent profile analysis (LPA), latent transition assumption of local independence. Specifically, LCA analysis (LTA), multilevel LCA, etc. [1]. Although assumes that the variance shared by the observed most often used as an exploratory procedure, LCA can indicators is accounted for only by the latent also be used as a confirmatory procedure by categorical variable [16]. Further, LCA assumes that constraining model parameters based on the the number of latent classes specified by the latent researchers’ hypotheses [17]. LPA is a special case of model is correct [3]. LCA where observed indicators are continuous rather than ordered categorical or binary [2]. LPA is based 4. The LCA Model on the assumption of multivariate normality, meaning that the multivariate distribution of the continuous A mixture model includes a measurement model observed variables is normal within each group [3]. and a structural model. LCA is the measurement The LTA approach is a variation of LCA for model, which consists of a set of observed variables, longitudinal data, which allows researchers to also referred to as observed indicators, regressed on a examine changes in latent class memberships across latent categorical variable [2]. Figure 1 illustrates time [1]. Multilevel LCA is employed with data from relationships between a set of r observed indicators i complex sampling designs and allows the estimation and an underlying categorical variable C [2]. of a multilevel latent class model to account for the nested structure of the data. Mplus permits the estimation of exploratory and confirmatory LCA models, LTA models, as well as multilevel LCA models. Further, Mplus allows the estimation of relationships among latent categorical variables and second order factors, covariates, or observed dependent variables also known as distal outcomes [2]. The current article will further describe the exploratory LCA model. 4.1 Model estimation Figure 1. Diagram of a general latent class model The computation procedures used for estimating model parameters are based on the type of variables Observed variables can be continuous, counts, used as observed indicators (Table 1). The default ordered categorical, binary, or unordered categorical estimation method for mixture modeling in Mplus is variables [2]. When estimating a latent variable C with robust maximum likelihood (MLR), which uses “log- k latent classes (C=k; k=1, 2,…K), the “marginal item likelihood functions derived from the probability probability” for item ij=1 can be expressed as: density function underlying the latent class model” 퐾 [2]. Other estimation procedures such as maximum 푃(푖 푗 = 1) = ∑ 푃(퐶 = 푘)푃(푖 푗 = 1|퐶 = 푘) likelihood (ML), or Bayesian estimation (BAYES) 푘=1 can be specified in Mplus using the ESTIMATOR Assuming that the assumption of local option of the ANALYSIS command. Although MLR independence is met, the joint probability for all corrects standard errors and test statistics, other observed variables can be expressed as: estimators such as BAYES may provide more accurate results with smaller sample sizes, ordinal data, and non-normal continuous variables [2]. Copyright © 2020, Infonomics Society 4324 International Journal for Cross-Disciplinary Subjects in Education (IJCDSE), Volume 11, Issue 2, 2020 class is labeled based on its