Worse than Measurement Error: Consequences of Inappropriate Latent Variable Measurement Models

Mijke Rhemtulla
Department of Psychology, University of California, Davis

Riet van Bork and Denny Borsboom
Department of Psychological Methods, University of Amsterdam

Author Note. This research was supported in part by grants FP7-PEOPLE-2013-CIG-631145 (Mijke Rhemtulla) and Consolidator Grant No. 647209 (Denny Borsboom) from the European Research Council. Parts of this work were presented at the 30th Annual Convention of the Association for Psychological Science. A draft of this paper was made available as a preprint on the Open Science Framework. Address correspondence to Mijke Rhemtulla, Department of Psychology, University of California, Davis, One Shields Avenue, Davis, CA 95616. [email protected]. [This version accepted for publication in Psychological Methods on March 11, 2019]

Abstract

Previous research and methodological advice have focused on the importance of accounting for measurement error in psychological data. That perspective assumes that psychological variables conform to a common factor model. We explore what happens when data that are not generated from a common factor model are nonetheless modeled as reflecting a common factor. Through a series of hypothetical examples and an empirical re-analysis, we show that when a common factor model is misused, structural parameter estimates that indicate the relations among psychological constructs can be severely biased. Moreover, this bias can arise even when model fit is perfect. In some situations, composite models perform better than common factor models. These demonstrations point to a need for models to be justified on substantive, theoretical bases, in addition to statistical ones.

Keywords: Latent Variables; Measurement Models; Structural Equation Modeling; Measurement Error; Causal Indicators; Reflective Indicators

Latent variable models – including item response theory, latent class, and factor models – are the current gold standard of measurement models in psychology (e.g., Bandalos, 2018). These models share the canonical assumption that observed item or test scores can be decomposed into two parts: one part that can be traced back to a psychological construct – typically represented as a latent common factor – and one part that is due to measurement error. When this assumption holds, the latent variable model is able to disentangle these two sources of variance, so that the latent variable in the model is a proxy for the psychological construct under consideration. It is widely assumed that successful application of the latent variable model supports the interpretation of test scores as measurements of the underlying construct (Borsboom, 2005, 2008; Cronbach & Meehl, 1955; Markus & Borsboom, 2013; Maul, 2017). In addition, when embedded in a larger structural equation model, latent variables allow the modeller to obtain unbiased estimates of relations among psychological constructs and to model causal processes involving these constructs (Bollen & Pearl, 2013).

The most popular alternative to latent variable modeling is to treat observed composites as proxies for psychological constructs.
This strategy includes the use of summed or averaged item scores obtained in the absence of a measurement model, as well as the use of weighted composites like principal component scores and related methods like partial least squares regression (Hair, Ringle, & Sarstedt, 2011; Rigdon, 2012). As methodologists are keen to point out, however, composite scores contain a mix of true construct scores and measurement error, resulting in biased estimates of construct relations and a host of concomitant problems (Bollen & Lennox, 1991). Recent research has documented the consequences of neglecting measurement error in detail; for instance, Cole and Preacher (2014) argued that the widespread neglect of measurement error in psychological science likely compromises the quality of scientific research. Such concerns have led some scholars to advocate for the use of latent variable models wherever possible (Borsboom, 2008).

The existing methodological literature almost uniformly assumes that some version of the latent variable model is true, and then analyzes what goes wrong when measurement error is ignored (e.g., Cole & Preacher, 2014; Jaccard & Wan, 1995; Ledgerwood & Shrout, 2011; Rhemtulla, 2016). But such analyses tend to ignore the other side of the psychometric coin. There is a growing appreciation within some areas of psychology that the latent variable model may not be the right model to capture relations between many psychological constructs and their observed indicators (Borsboom & Cramer, 2013; Cramer et al., 2012; Dalege, Borsboom, van Harreveld, van den Berg, Conner, & van der Maas, 2016; Kossakowski, Epskamp, Kieffer, van Borkulo, Rhemtulla, & Borsboom, 2016; van der Maas, Dolan, Grasman, Wicherts, Huizenga, & Raijmakers, 2006). In the field of marketing, some researchers have used simulations to show that reflective indicator models fit to composite-generated data can result in extreme bias (Jarvis, Mackenzie, & Podsakoff, 2003; Sarstedt, Hair, Ringle, Thiele, & Gudergan, 2016; Hair, Hult, Ringle, Sarstedt, & Thiele, 2017). To date, however, little research has evaluated what goes wrong, and why, when observed scores are not composed of underlying construct scores plus random measurement error, but are nevertheless analyzed as if they were.

Terminology

Throughout this paper, we use the term “common factor model” to refer to a model that represents a construct as a latent variable with reflective indicators only. We use “target construct” to refer to the conceptual variable that the researcher intends to represent using a statistical proxy, that is, the attribute under investigation (Markus & Borsboom, 2013). Following Rigdon (2012, 2013), we refer to the statistical variable (whether it is a latent variable or a composite score) as a “proxy” that stands in for the construct under investigation. Finally, we use “bias” to refer to the degree to which an asymptotic parameter estimate from a modeling approach deviates from its value in the data-generating model (i.e., the quantity that it is meant to represent). This usage is consistent with previous literature, for example, as measurement error is said to “bias” regression coefficients (Cole & Preacher, 2014; Sarstedt et al., 2016).
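To fix notation for these two kinds of proxy, the sketch below contrasts a reflective common factor specification with a composite specification. It is added here for reference only; the single-factor form, the weights w_j, and the symbols are generic conventions, not equations taken from this paper.

```latex
% Generic notation (illustrative, not this paper's own equations):
% a reflective common factor proxy versus a composite proxy for p indicators.
\begin{align*}
  \text{Common factor (reflective) proxy:}\quad
    & y_j = \lambda_j \eta + \varepsilon_j, \qquad j = 1, \dots, p \\
  \text{Composite proxy:}\quad
    & C = \sum_{j=1}^{p} w_j y_j
\end{align*}
```

In the first specification the proxy η is a latent common cause of the indicators, and each ε_j absorbs item-specific and error variance; in the second, the proxy C is defined by the indicators themselves, as with sum scores (unit weights) or principal component scores (optimized weights).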
Overview

The goal of the present paper is to elucidate the consequences of using common factor models in circumstances when the true model that generated a set of observed variables is something different. We begin with a theoretical overview of the common factor model, describing the problem of measurement error and how the model solves it. We then argue that common factor models are frequently applied without justification. We propose that this practice may lead to construct invalidity (i.e., a mismatch between a construct and its statistical proxy) and, as a result, to biased structural parameter estimates. We show analytically that when the construct is truly equivalent to a sum of a set of observed variables (i.e., it is an “inductive summary”; Cronbach & Meehl, 1955), the bias that arises when it is modeled as a common factor is similar in strength and opposite in direction to the bias that arises when an underlying common factor is erroneously modeled as a composite. We then present three more nuanced examples. In each example, a population model is defined in which the relations among a set of variables that represent a construct do not conform to a common factor model: Example 1 uses a set of causal indicators, Example 2 includes both causal and reflective indicators, and Example 3 uses a set of variables described by a directed graph. For each example, we fit composite and reflective common factor models to the population data and interpret the resulting discrepancies. Finally, we re-analyse a published data set using three different measurement models and compare the results to demonstrate the interpretational differences that arise.

Theoretical Background

The common factor model is based on the central equation of classical test theory, which states that scores on an observed variable are composed of a true score plus random error: X = T + E. In classical test theory, a person’s true score is defined as that person’s expected test score over repeated sampling (Lord & Novick, 1968). The logic underlying the use of factor analysis as a measurement model is a multivariate extension of this idea: it relies on the assumption that the true scores for a set of observed variables are perfectly linearly related because each of the observed variables measures the same latent variable. As a result, a person’s position on this latent variable can be represented by a single value – the factor score (Jöreskog, 1971). Therefore, what was the true score in classical test theory becomes the common factor in the factor model: each person’s score on every measured variable in the set can be decomposed into a common factor score, η, weighted by a factor loading for that item, λ, plus random error: y = λη + ε. Individual differences in η determine the common variance among the measured variables and, as such, the factor model can be used to separate this common variance (typically attributed to a theoretical construct of interest) from variance that is due to measurement error.
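As a concrete illustration of that separation, the following minimal simulation sketch (not part of the paper; the five loadings, the structural effect of .5, and all variable names are illustrative assumptions) generates data from a single common factor and shows how a unit-weighted sum score attenuates the estimated relation between the construct and an external variable, relative to the factor itself:

```python
# Minimal sketch, assuming a single common factor with five reflective
# indicators and an external variable z that depends only on the factor.
# All parameter values below are illustrative, not taken from the paper.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000                                   # large n, so estimates are near asymptotic values

eta = rng.normal(size=n)                      # common factor (standardized)
loadings = np.array([0.7, 0.6, 0.8, 0.5, 0.7])
errors = rng.normal(size=(n, 5)) * np.sqrt(1 - loadings**2)
y = eta[:, None] * loadings + errors          # reflective indicators with unit variance

beta = 0.5                                    # structural effect of the factor on z
z = beta * eta + rng.normal(scale=np.sqrt(1 - beta**2), size=n)

def std_slope(x, z):
    """Standardized regression slope of z on x (equal to their correlation)."""
    return np.corrcoef(x, z)[0, 1]

sum_score = y.sum(axis=1)                     # unit-weighted composite proxy
print("true standardized effect:        ", beta)
print("estimate using the factor itself:", round(std_slope(eta, z), 3))
print("estimate using the sum score:    ", round(std_slope(sum_score, z), 3))
# The sum-score estimate is attenuated (about .45 here) because the composite
# mixes common-factor variance with indicator-specific error variance.
```

With these values the sum-score estimate is roughly .45 rather than .50, the familiar attenuation due to measurement error; the examples in the remainder of the paper concern the reverse situation, in which the common factor assumption itself does not hold.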