Structural Equation Models and the Quantification of Behavior

Kenneth A. Bollen¹ and Mark D. Noble

Department of Sociology, University of North Carolina, Chapel Hill, NC 27599

Edited by Donald W. Pfaff, The Rockefeller University, New York, NY, and approved June 1, 2011 (received for review February 8, 2011)

Quantifying behavior often involves using variables that contain measurement errors and formulating multiequations to capture the relationship among a set of variables. Structural equation models (SEMs) refer to modeling techniques popular in the social and behavioral sciences that are equipped to handle multiequation models, multiple measures of concepts, and measurement error. This work provides an overview of latent variable SEMs. We present the equations for SEMs and the steps in modeling, and we provide three illustrations of SEMs. We suggest that the general nature of the model is capable of handling a variety of problems in the quantification of behavior, where the researcher has sufficient knowledge to formulate hypotheses.

error in variables | factor analysis | path analysis | LISREL | covariance structures

This paper results from the Arthur M. Sackler Colloquium of the National Academy of Sciences, “Quantification of Behavior,” held June 11–13, 2010, at the AAAS Building in Washington, DC. The complete program and audio files of most presentations are available on the NAS Web site at www.nasonline.org/quantification.

Author contributions: K.A.B. and M.D.N. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

¹To whom correspondence should be addressed. E-mail: [email protected].

www.pnas.org/cgi/doi/10.1073/pnas.1010661108 PNAS | September 13, 2011 | vol. 108 | suppl. 3 | 15639–15646

Among the many problems in quantifying behavior are the challenges presented by the multiple equations to study and the difficulty of accurately measuring key concepts. For instance, scientists might be interested in general arousal and how it relates to more specific forms of arousal, or researchers might want to analyze quality of sleep and its relationship to different forms and intensities of pain. Arousal, quality of sleep, pain intensity, and numerous other variables are difficult to measure without considerable measurement error. Additionally, studying how several or more of these difficult-to-measure latent variables relate to each other is an even more arduous task when the multiequation nature of the problem is included.

Ignoring these issues leads to inaccuracy of findings. Ignoring the measurement error in arousal or pain data, for instance, leads to inaccurate assessments of effects. Therefore, our assessment of the effect of perception of pain on quality of sleep is unlikely to be correct if we do not take account of the measurement error. We might have several ways to measure the same latent arousal variable and not be sure how to incorporate these ways into the model. Also, if we ignore the indirect effects of one variable on another variable and concentrate only on the direct effect, we are more likely to be mistaken in our assessment of how one variable affects another variable.

Structural equation models (SEMs) refer to modeling techniques popular in the social and behavioral sciences that are equipped to handle multiequation models, multiple measures of concepts, and measurement error. This general model incorporates more familiar models as special cases. For instance, multiple regression is a special form of SEM, where there is a single dependent variable and multiple covariates and the covariates are assumed to be measured without measurement error. ANOVA is another specialization where the covariates are assumed to be dichotomous variables. Factor analysis is yet another special form of the latent variable SEM. Here, we assume that we have multiple indicators that measure one or more factors and that the factors are permitted to correlate or not correlate. Recursive models, nonrecursive models, growth curve models, certain fixed and random effects models, etc. can all be incorporated as special cases of the general latent variable SEM. However, the structural component in SEM reflects that the researcher is bringing causal assumptions to the model, whereas multiple regression, ANOVA, etc. might be applied purely for descriptive purposes without any causal assumptions (1).

The purpose of this work is to give a brief overview of latent variable SEMs and to illustrate them with several hypothetical examples. It is meant to give the reader a sense of the potential of these procedures for research on the quantification of behavior. The next section explains SEMs with a brief verbal description followed by a more formal model of latent variable SEMs and its assumptions. This section is followed by a section that contains three illustrations. These illustrations are examples meant to give the reader a flavor of the type of applications that might be done. The section on illustrations is followed by the conclusions.

What Are SEMs?

SEMs are traceable at least back to the path analysis work of Wright (1, 2). SEMs did not receive much attention until they were introduced into sociology in the 1960s by Blalock (3) and Duncan (4). From sociology, they spread to the other social sciences and psychology. A turning point was the development of the LISREL model by Jöreskog (5) and the LISREL SEM software. There have been numerous contributors to SEM from several disciplines, and we cannot fully describe these here. However, historic reviews are available from several sources (6, 7). SEMs have continued to diffuse through numerous disciplines and are reaching beyond the social and behavioral sciences into biostatistics, epidemiology, and other areas. Furthermore, there are a variety of SEM software packages, including LISREL (8), Mplus (9), AMOS (10), EQS (11), and the SEM procedure in R (12).

SEMs are marked by typically including two or more equations in the model. This differs from the usual single equation regression model that has a single dependent variable and multiple covariates. In SEMs, it is not unusual to have a number of equations with several explanatory variables in each equation. The usual terms of dependent variable and independent variable make less sense in this context, because the dependent variable in one equation might be an independent variable in another equation. For this reason, the variables in a model are called either endogenous or exogenous variables. At the risk of oversimplifying, endogenous variables are variables that appear as dependent variables in at least one equation. Exogenous variables are never dependent variables and typically are allowed to correlate with each other, although explaining the source of their associations is not part of the model.

Another division among the variables is between latent and observed variables. Latent variables are variables that are important to the model but for which we have no data in our dataset (13). Observed variables are variables that are part of our analysis but for which we have values in our dataset. For instance, sharp pain might be a latent variable of interest, and the self-reports of perceived sharp pain would be an observed variable to measure it. We recognize that the subjective measure is a less than perfect measure of the latent variable of sharp pain. Indeed, we could devise several differently worded questions to try to tap the latent sharp pain variable. With SEMs, we would build a measurement model of the relationship between each indicator and the latent sharp pain variable. This model would enable us to estimate the relationship between each indicator and the latent variable and determine which measure is the most closely related to the latent variable.

Similarly, if interest lies in another latent variable such as quality of sleep, then we could follow a similar procedure to develop several indicators of sleep quality and build a measurement model with each indicator related to the sleep quality latent variable. Combining the measurement model of sleep quality with that for sharp pain, we could construct a latent variable model that examines whether the latent sharp pain variable influences the latent sleep quality variable while controlling for the measurement error in the indicators.

The preceding discussion illustrates that the latent variable SEM permits us to study the relationships of latent variables to each other and between latent and observed measures of these latent variables.

[...] matrices of factor loadings or regression coefficients giving the impact of the latent $\eta_i$ and $\xi_i$ on $y_i$ and $x_i$, respectively, and $\varepsilon_i$ and $\delta_i$ are the unique factors of $y_i$ and $x_i$. We assume that the unique factors ($\varepsilon_i$ and $\delta_i$) have expected values of zero, have covariance matrices of $\Sigma_{\varepsilon\varepsilon}$ and $\Sigma_{\delta\delta}$, respectively, and are uncorrelated with each other and with $\zeta_i$ and $\xi_i$.

Using these equations, we can illustrate the generality of the model. First, researchers can do all confirmatory factor analysis models using only Eq. 3. This model includes the observed variables or measures in the vector $x_i$, the factor loadings are in the $\Lambda_x$ matrix, and the intercepts of the equation are in $\alpha_x$. The $\delta_i$ vector contains the unique factors. We could have just as easily used Eq. 2 to present the confirmatory factor analysis model.

Multiple regression provides another example. By introducing the restrictions of assuming no measurement error in $x_i$ ($\alpha_x = 0$, $\Lambda_x = I$, $\Sigma_{\delta\delta} = 0$) and no error in $y_i$ ($\alpha_y = 0$, $\Lambda_y = I$, $\Sigma_{\varepsilon\varepsilon} = 0$), using a single dependent variable ($y_i$ is a scalar), and setting $B = 0$, we get a multiple regression model where the implicit assumption of no measurement error is made explicit. ANOVA follows if we let $x_i$ consist only of dummy variables.

Econometrics developed the idea of simultaneous equation models, which is a multiequation model where a series of endogenous variables are related to each other as well as to a series
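The reduction to multiple regression described above can be written out step by step. Eqs. 1–3 are not reproduced in this excerpt, so the display below assumes the conventional LISREL-type form of the latent variable and measurement equations. The symbols $\alpha_\eta$ and $\Gamma$ in particular are assumed notation, not quoted from the text, and the whole block is a sketch under that assumption.

```latex
% Assumed general form (conventional notation, not quoted from the excerpt)
\begin{align}
\eta_i &= \alpha_\eta + B\,\eta_i + \Gamma\,\xi_i + \zeta_i \\
y_i    &= \alpha_y + \Lambda_y\,\eta_i + \varepsilon_i \\
x_i    &= \alpha_x + \Lambda_x\,\xi_i + \delta_i
\end{align}

% No measurement error in x_i: alpha_x = 0, Lambda_x = I, Sigma_{delta delta} = 0,
% so delta_i has mean zero and zero variance and drops out.
\begin{equation*}
x_i = \xi_i
\end{equation*}

% No measurement error in y_i: alpha_y = 0, Lambda_y = I,
% Sigma_{varepsilon varepsilon} = 0.
\begin{equation*}
y_i = \eta_i
\end{equation*}

% Setting B = 0 and substituting both identities into the first equation:
\begin{equation*}
y_i = \alpha_\eta + \Gamma\,x_i + \zeta_i
\end{equation*}
```

With $y_i$ a scalar, $\Gamma$ is a row vector of regression coefficients, $\alpha_\eta$ is the intercept, and $\zeta_i$ plays the role of the regression error. The usual regression equation is therefore the general model with the measurement error terms constrained to zero.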

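The claim that ignoring measurement error leads to inaccurate assessments of effects can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not in the paper: a single latent explanatory variable with unit variance, an illustrative coefficient `gamma = 0.8`, an illustrative error variance `delta_var`, and ordinary least squares rather than an SEM estimator. The function name `simulate_slopes` is hypothetical.

```python
import numpy as np


def simulate_slopes(n=200_000, gamma=0.8, delta_var=1.0, seed=0):
    """Compare OLS slopes when x is measured exactly vs. with error.

    All numeric values are illustrative assumptions, not taken from the paper.
    Returns (slope_exact, slope_noisy, reliability).
    """
    rng = np.random.default_rng(seed)
    xi = rng.normal(0.0, 1.0, n)      # latent explanatory variable, variance 1
    zeta = rng.normal(0.0, 0.5, n)    # equation disturbance
    y = 2.0 + gamma * xi + zeta       # outcome, assumed measured without error

    x_exact = xi                      # no measurement error: x equals the latent variable
    delta = rng.normal(0.0, np.sqrt(delta_var), n)
    x_noisy = xi + delta              # indicator contaminated by a unique factor

    def ols_slope(x, outcome):
        # Simple-regression slope: cov(x, y) / var(x)
        c = np.cov(x, outcome)
        return c[0, 1] / c[0, 0]

    # Share of the noisy indicator's variance that is due to the latent variable
    reliability = 1.0 / (1.0 + delta_var)
    return ols_slope(x_exact, y), ols_slope(x_noisy, y), reliability


exact, noisy, rel = simulate_slopes()
print(f"slope with x measured exactly:    {exact:.3f}")
print(f"slope with x measured with error: {noisy:.3f}")
print(f"gamma times reliability:          {0.8 * rel:.3f}")
```

In this setup the slope estimated from the error-laden indicator is pulled toward zero by roughly the reliability of the indicator, while the slope from the error-free variable stays close to `gamma`. A latent variable SEM with several indicators is one way to model that error explicitly instead of ignoring it.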