Measurement, 6: 25–53, 2008 Copyright © Taylor & Francis Group, LLC ISSN 1536-6367 print / 1536-6359 online DOI: 10.1080/15366360802035497

Latent Variable Theory

Denny Borsboom University of Amsterdam

This paper formulates a metatheoretical framework for latent variable modeling. It does so by spelling out the difference between observed and latent variables. This difference is argued to be purely epistemic in nature: We treat a variable as observed when the inference from data structure to variable structure can be made with certainty and as latent when this inference is prone to error. This difference in epistemic accessibility is argued to be directly related to the data-generating process, i.e., the process that produces the concrete data patterns on which statistical analyses are executed. For a variable to count as observed through a set of data patterns, the relation between variable structure and data structure should be (a) deterministic, (b) causally isolated, and (c) of equivalent cardinality. When any of these requirements is violated, (part of) the variable structure should be considered latent. It is argued that, on these criteria, observed variables are rare to nonexistent in psychology; hence, psychological variables should be considered latent until proven observed.

Key words: latent variables, measurement theory, philosophy of science, test theory

Correspondence should be addressed to Denny Borsboom, Department of Psychology, University of Amsterdam, Roetersstraat 15, 1018 WB Amsterdam. E-mail: [email protected]

In the past century, a number of models have been proposed that formulate probabilistic relations between theoretical constructs and empirical data. These models posit a hypothetical structure and specify how the location of an object in this structure relates to the object’s location on a set of indicator variables. It is common to refer to the hypothetical structure in question as a latent structure and to the indicator variables as observed variables. In general, models that follow the idea set forth above are called latent variable models. There are several kinds of latent variable models, which are often categorized in terms of the types of observed and latent variables to which they apply. If the observed and latent variables are both continuous, then the resulting model is called a factor model (Jöreskog, 1971; Lawley & Maxwell, 1963; Bollen, 1989); if the observed variable is categorical and the latent variable is continuous, then we have an Item Response Theory (IRT) model (Rasch, 1960; Birnbaum, 1968; Hambleton & Swaminathan, 1985; Embretson & Reise, 2000; Sijtsma & Molenaar, 2002); if the observed and latent variables are both categorical, the resulting model is known as a latent class model (Lazarsfeld & Henry, 1968; Goodman, 1974); and if the observed variable is continuous while the latent variable is categorical, then we get a mixture model (McLachlan & Peel, 2000), which upon appropriate distributional assumptions becomes a latent profile model (Lazarsfeld & Henry, 1968; Bartholomew, 1987). However, various mixed forms of these models are possible. For instance, at the latent level, one may have several distinct systems of continuous latent variables that themselves define latent classes (Lubke & Muthén, 2005; Rost, 1990), and at the observed front these models may also relate to a mixture of categorical and continuous observed variables (e.g., Moustaki, 1996; Moustaki & Knott, 2000). In fact, any model that relates some kind of latent structure to an observed structure could be called a latent variable model; and the possibilities regarding the dimensionality and form of these structures are endless, as is the number of functions that can be used to relate one to the other. Like most statistical techniques, latent variable modeling is not an isolated statistical number-crunching endeavor but part of a research procedure embedded in a set of more or less closely associated ideas, norms, and practices regarding the proper treatment of data in scientific research. The present paper represents an attempt to articulate these ideas by formulating a fitting metatheoretical framework for latent variable modeling.
To distinguish this framework from latent variable models themselves, we may refer to it with the term latent variable theory, which indicates that latent variable modeling is central to it and at the same time emphasizes that the theory is broader in scope than the purely statistical formulation of latent variable models.
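The fourfold classification of latent variable models given above can be summarized in a small lookup sketch. The dictionary and function below are purely illustrative helpers, not anything defined in the paper:

```python
# Pairing (observed type, latent type) with the model families named in the
# text; both variable types are taken as either "continuous" or "categorical".
MODEL_FAMILY = {
    ("continuous", "continuous"): "factor model",
    ("categorical", "continuous"): "item response theory (IRT) model",
    ("categorical", "categorical"): "latent class model",
    ("continuous", "categorical"): "mixture model / latent profile model",
}

def model_family(observed: str, latent: str) -> str:
    """Return the conventional model family for a type combination."""
    return MODEL_FAMILY[(observed, latent)]

print(model_family("categorical", "continuous"))  # item response theory (IRT) model
```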

WHAT BINDS LATENT VARIABLE MODELS?

Mathematically, latent variable models specify a generalized regression function that can be written as f(E(X)) = g(θ), where f is a link function, E is the expectation operator, X denotes a matrix of observed variables, θ is a latent structure, and g is some function that relates the latent structure to the observed variables. If, upon a suitable choice of f, the function g is linear, then the resulting family of models is covered by Generalized Linear Item Response Theory (Mellenbergh, 1994). This is true for most of the models used in factor analysis and IRT. By expanding the matrices X and θ to apply to series of observations made at different time points, models for time series, like the hidden Markov model (Rabiner, 1989; Visser, Raijmakers, & Molenaar, 2002) or the dynamic factor model (Molenaar, 1985), may be formulated; it is also possible to model inter- and intraindividual differences simultaneously (Hamaker, Molenaar, & Nesselroade, 2007). Thus, a latent variable model is simply a model that relates the expectation of observables to a latent structure through some regression function. However, most people working in latent variable modeling have a strong intuition that the group of latent variable models comprises a homogeneous structure, in the sense that they have something in common that separates them from other commonly used statistical models (say, analysis of variance or principal components analysis). It is, however, useful to note that no such delineation follows from the mathematical structure of the model. Mathematically, all that is being said in this structure is that the expectation of some set of variables is a function of another set of variables, and it is difficult to say why this should specify a latent variable model.
In fact, if we should take the mathematical structure itself to define latent variable models, then virtually all statistical techniques count as latent variable models, because it is in the nature of statistical techniques to specify a relation between the expectation of one set of variables and another set of variables. Hence, on this basis a latent variable model would be indistinguishable from, say, analysis of variance, a technique that one intuitively feels should not be included as a latent variable model. If one wants to explain what binds latent variable models, an appeal to the mathematical structure of the model does not do the trick. Clearly, what makes a latent variable model a latent variable model is not the mathematical structure that is being used to link different sets of variables. Rather, the important feature of the regression function central to latent variable models is that the left-hand side of the equation contains a set of observed variables, whereas the right-hand side contains a latent structure. Hence, if we insist on distinguishing latent variable models from observed variable models, we need to make clear what this distinction consists in.
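To make the generalized regression form concrete, the sketch below writes the Rasch model in the form f(E(X)) = g(θ), with the logit as link function f and the linear predictor g(θ) = θ − b. The item difficulties and person position are invented illustration values, not estimates from any data:

```python
import numpy as np

def expected_response(theta, b):
    """E(X) for a Rasch item: inverse logit of theta - b."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

b = np.array([-1.0, 0.0, 1.5])       # hypothetical item difficulties
theta = 0.5                          # a person's latent position
probs = expected_response(theta, b)  # E(X) for each item

# Applying the link f (logit) to E(X) recovers the linear predictor g(theta):
logits = np.log(probs / (1 - probs))
print(np.allclose(logits, theta - b))  # True
```

The same skeleton covers the factor model by taking f as the identity and g as a linear function of a continuous θ, which is the sense in which Generalized Linear Item Response Theory unifies these families.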

LATENT AND OBSERVED VARIABLES

What is the difference between latent and observed variables? The use of the term latent, but especially the term observed, suggests that this distinction is of an epistemological character. Observed variables are variables that are somehow epistemically accessible to the researcher, whereas latent variables are not epistemically accessible. It is customary to illustrate this distinction with substantive examples. Thus, one says that IQ scores are recorded, but general intelligence is not; hence general intelligence is a latent variable and IQ an observed variable. However, in order to characterize latent variable models generally, it is not sufficient to point to some illustrative examples. Neither is it clarifying to give a tautological, and hence not very informative, characterization of latent variables, as is not uncommon in the literature, for instance when scholars say that “a latent variable is a variable that is not directly measured,” or “a latent variable is a theoretical construct,” or “a latent variable is a variable that underlies the observations.” It is important to understand somewhat more precisely what the distinction between observed and latent variables amounts to. This matter is not entirely straightforward. The reason for this is not so much that the concept of a latent variable, as a hypothetical structure of inter- or intraindividual differences, is problematic, but rather that it is difficult to grasp the idea that a variable might be observed. Take familiar examples of variables that, in statistical analyses, are commonly conceptualized as observed variables, such as sex or age. It is hard to uphold the idea that the distinction between these variables and variables that are seen as latent, such as general intelligence, lies in the fact that the first are observed whereas the second are not. In a strict reading of the word observed, nobody can claim to have observed sex, length, or age.
These are theoretical constructs just as well as general intelligence is. Age is not a concrete object, subject to our perceptual processes, like stones or trees or people might be taken to be, but a theoretical dimension. Theoretical dimensions do not fall in the category of observable things. Thus, when one says that one has learned, upon interrogation of the twins John and Jane, that John is 15 minutes older than Jane, one cannot claim to have thereby observed the variable age. On the basis of one’s observation of John and Jane, one has made an inference regarding their relative positions on the dimension age, but it is not thereby true that one has observed age itself. It seems that, while the distinction between latent and observed variables has something to do with a difference in the epistemic accessibility of such variables, it is not literally a distinction between observable and unobservable things. So the distinction between latent and observed variables does not parallel the distinction between commonly discussed observables and unobservables like, say, rocks and quarks. The interesting thing is that the main difference here does not lie at the front of latent variables (both the existence of general intelligence and the existence of quarks involve a hypothesis on the structure of the world that is indirectly tested by experimental data) but at the front of observed variables. Age is not observable in the sense that concrete objects like rocks are but is itself a rather abstract dimensional concept. Thus, observed variables like age and latent variables like intelligence seem to have more in common than one might think. This can be further illustrated when we examine the context in which latent variable models are used. For instance, at the university where I work, we have testing sessions in which the entire population of first-year students fills in a substantial number of psychological tests as a part of their curriculum.
When I analyze the so obtained data for, say, sex differences in extraversion, I will conceptualize the variable sex as an observed variable. Nobody will object to this practice. Strictly speaking, however, I have not even myself observed the people in question, because I was not present at the testing sessions, let alone their sex. What I observe when I am doing data analyses is a column of ones and zeros that has the variable label sex attached to it—nothing more. To make matters worse, I also have a column that contains the scores on an extraversion test, and these have the variable label extraversion attached to them, so in this sense there appears to be very little difference with regard to the situation with the variable sex. Yet I am not inclined to treat extraversion as an observed variable; rather I will treat the test scores as indicators of a latent structure. Why is this? It is tempting, in the context of justifying my discriminative practices regarding the treatment of the variables sex and extraversion in data analysis, to justify my policy on the grounds that extraversion is not identical to a test score. This is consonant with the dominant opinion in the literature on testing: Extraversion is a theoretical construct with surplus meaning over the test scores (Cronbach & Meehl, 1955; Messick, 1989). Therefore I need to conceptualize extraversion as a latent variable. Unfortunately, however, exactly the same line of reasoning holds for the variable sex. What I observe is that, in the row that corresponds to subject no. 2005967, the column sex contains the entry 1. This is what I have to work with. But of course one’s sex is not identical to a number occurring in the relevant entry in my data file any more than extraversion is, so it also carries surplus meaning; and I need to infer the relevant property (e.g., being male) from the data just as well.
Thus, following this line of reasoning, equating sex with the column of ones and zeros in my data file is no more justified than equating extraversion with the score on an extraversion test. Therefore, we must draw two conclusions. First, it is misleading to take the distinction between latent and observed variables literally, because observed variables are not at all observed in the normal sense of the word. Second, the difference in handling observed and latent variables in actual data analysis cannot be defended by referring to the surplus meaning of latent variables as theoretical constructs, for observed variables carry such surplus meaning just as well. Moreover, in both cases it is necessary to make an inference from an observed data pattern to an underlying property. It seems that the distinction between latent and observed variables, which appears to be crucial in distinguishing latent variable modeling techniques from statistical techniques in general, has to be justified differently.

EPISTEMOLOGICAL PLANES

Both in the case of latent and of observed variables, the researcher makes an inference from observed data patterns to the conclusion that objects have or do not have a certain property (e.g., being of above average intelligence; being male). However, there is an important distinction between these inferences: Namely, the probability of an erroneous conclusion on the basis of data is judged differently. That is, if sex is conceptualized as an observed variable, then the researcher assumes that if, say, 1 is observed in the column that codes for sex, it is certain that the corresponding person is male. For a latent variable this is not the case. Upon observing an IQ score of 120, the researcher does not assume that it is certain that the person has above average intelligence; the person in question may, for instance, have been fortunate in guessing the right answers. Thus, the inference to the person’s property (i.e., having above average intelligence) is, at most, probable given the data but not certain. It seems to me that the terminology of latent and observed variables codes precisely this distinction. When we treat a variable as observed, we mean nothing more than that we assume that the location of a person on that variable can be inferred with certainty from the data. When we treat a variable as latent, we mean that the inference in question cannot be made with certainty. It is important to see that this formulates the distinction between latent and observed variables as a purely epistemological distinction. As such, this distinction is partly a function of the observer. Thus variables are not inherently latent or observed. They can only be latent or observed with respect to the data at hand, or, in other words, with respect to the observer and his or her measurement equipment. There is no need, however, to assume that the distinction between latent and observed variables is an ontological distinction between different kinds of variables.
Thus, in this view, latent variables are no more of a mystery than observed variables (although observed variables are considerably more mysterious than commonly supposed). The relativity of the predicates observed and latent is graphically illustrated in Figure 1. Figure 1 shows a data plane, which contains strictly observable data structures (appropriately arranged strings of zeroes and ones), and a world plane, which represents the structures in the world to which inferences are to be made (say, “Ed is male”). Such inferences are represented as arrows that run from the data plane to the world plane. If an inference is certain, the connection between the data plane and the world plane is drawn as a continuous arrow; if it is not certain, the connection is drawn as a dotted arrow. Thus, in the figure, data structure x gives a certain inference to ξ. Hence ξ is an observed variable with respect to x. Data structure y gives a certain inference to ζ, whereas z does not give a certain inference to ζ. Hence ζ is observed with respect to y but not with respect to z. Now consider two researchers, John and Jane. The epistemic accessibility of the world for these researchers is represented by their epistemological planes, which are drawn inside the data plane in Figure 1. John has access to data structures x and y, but Jane has access to only z. Thus, ζ is an observed variable for John, but it is a latent variable for Jane.

FIGURE 1 The structure of the epistemic process. Boxes represent observed data structures that sustain certain (continuous arrows) or uncertain (dotted arrows) inferences to variables (represented as spheres).

Hence, in this scheme of thinking, the predicates latent and observed are relative to the epistemological state of the researcher. A variable that I consider latent, given the data structures available to me, may be considered observed by somebody who has access to different data structures. This is consonant with the views of Bollen (2002), who takes the view that variables that are considered latent today may, upon improvement of measurement procedures, be considered observed in the future. Now the term relative as it is used here must be interpreted with caution, for it has a tendency to activate associations with relativist schemes of thinking in the philosophy of science, which are in the present context inappropriate. Relative here does not mean that the researcher is justified in treating variables as observed or unobserved as he or she pleases. It means that the sentence “ζ is observed for John” may be true, whereas “ζ is observed for Jane” may be false, even though we are talking about the same structure ζ. It is evident, however, that this observer dependence does not imply that the question whether these statements are true for any given observer is a relative issue. Thus, although a variable may be latent for one person and observed for the next, it could be defended that whether it is latent or observed for any given person is a matter of fact. This is intuitively plausible. A researcher who states that she treats sex as an observed variable (meaning that the inferences from the data structure to the variable structure are epistemically certain) will be given the benefit of the doubt in most situations. But a person who claims to have observed intelligence will be considered as either methodologically naive or a scientific con artist.
It appears that, apart from the knowledge that the researcher has on how to make inferences regarding variables on the basis of data patterns, there is also an objective difference between the state of affairs in these two situations that determines whether the variables can be treated as observed or not. The question then becomes, what is this difference?

CAUSAL STRUCTURES AND EPISTEMOLOGICAL ACCESSIBILITY

One answer that suggests itself is that the degree of epistemological accessibility is a function of the causal structure that gives rise to the observations. Such a view would hold that a variable can be taken as observed if the patterns in the data structure have the proper causal antecedents. The question then is what these proper causal antecedents are. Some basic notation and definitions are required to explicate the problem situation. The researcher observes data patterns in his or her data file (e.g., John has produced data pattern 010010; Jane, 100101). Define the equivalence relation ∼ with respect to the data patterns D. The notation Da ∼ Db then means that two objects a and b have produced the same data pattern. The equivalence relation ∼ is reflexive (Da ∼ Da is true), symmetric (if Da ∼ Db, then Db ∼ Da), and transitive (if Da ∼ Db and Db ∼ Dc, then Da ∼ Dc) and therefore partitions the objects into equivalence classes that make up the elements of the data structure. It is assumed here that the same construction can be made for the variable structure (i.e., for any three objects a, b, and c having variable levels θa, θb, and θc, it is true that [a] θa ∼ θa; [b] if θa ∼ θb, then θb ∼ θa; and [c] if θa ∼ θb and θb ∼ θc, then θa ∼ θc). Call this system the variable structure. The variable structure may be much richer than this (e.g., when one assumes that the levels of the variable are ordered, or quantitative), but the existence of equivalence classes would seem to be the minimal requirement for present purposes. The assumption that the measured variable has a definite structure requires that such a thing as a variable structure exists independent of our observations; i.e., it involves realism about the structure of the variable measured. It should be noted that this is not a metaphysically innocent assumption (e.g., see Borsboom, 2005).
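The equivalence-class construction on the data side can be sketched in a few lines; the names and data patterns below are invented illustrations in the spirit of the examples above:

```python
from collections import defaultdict

# Objects and the data patterns they produced (invented values).
data = {
    "John": "010010",
    "Jane": "100101",
    "James": "010010",
}

# Group objects by pattern: Da ~ Db iff a and b produced the same pattern.
classes = defaultdict(set)
for obj, pattern in data.items():
    classes[pattern].add(obj)

# The classes partition the objects: John and James are ~-equivalent.
# Reflexivity, symmetry, and transitivity hold trivially because ~ is
# defined through equality of patterns, which is why ~ yields a partition.
print(dict(classes))
```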
We thus have two sets of equivalence classes, one pertaining to the data patterns (e.g., to strings like 100110) and the other to the variable structure. The data patterns are what the researcher looks at on a computer screen when doing the statistical analyses; the variable structure is inherent in the attribute the researcher wants to measure, i.e., a structure that is “out there.” The question before us is what the relation between the variable structure and the data structure should be like for us to be justified in treating the structure as an observed variable. One option that suggests itself is that the causal chain (i.e., the measurement process) that links the variable structure to the data structure be deterministic. In this view, we could imagine a mapping of the distinction between deterministic and probabilistic epistemological relations to a corresponding distinction between probabilistic and deterministic causal processes that give rise to the data. Thus, if a variable stands in a deterministic causal relation to the observations in the data structure, then the inferences from observations to that variable are also deterministic, and hence, one might think, it can be treated as observed. I will indicate the requirement that the data structure depends deterministically on a variable structure as the requirement of determination. This idea is consonant with the way observed variables are viewed in certain latent variable modeling quarters. For instance, in the factor model, which has continuous indicators and a continuous latent variable, setting the error variance of an indicator variable equal to zero is equivalent to treating the factor as an observed variable. In the logic of the factor model, this treatment could be interpreted to involve the specification of a deterministic causal relation between the factor and its indicators.
Thus, observed variables can then be viewed as limiting cases; they are latent variables that have been measured with perfect reliability. Although in a simple, unidimensional factor model such a view can be upheld, I do not think that it will work in general. An immediate complication arises, for example, in the case of multidimensionality. Suppose that two latent variables conjointly determine the structure of the observations deterministically via some function x = f(θ1, θ2). In this case, the value of θ1 can be assessed from the observed value of x if θ2 is known, and the value of θ2 can be assessed if θ1 is known, but it is not possible to identify both at the same time from x. Hence, in this case, a deterministic causal structure gives rise to the data, but epistemological accessibility is limited; it is not possible to make inferences to θ1 and θ2 with certainty. Therefore, neither θ1 nor θ2 can be considered observed with respect to the epistemological plane determined by x. That the generating causal structure is deterministic is not a sufficient condition for treating a variable as observed. A plausible requirement that might deal with this problem could be called causal isolation. For a variable to be observed, it must not only be related to the data structure in a deterministic fashion but also be the only causally relevant structure at work in producing variation in the data structure. This is the philosophical pendant of the commonly made requirement of unidimensionality in latent variable modeling. Thus, all the variation present in the data structure must be uniquely determined by the variable structure. It could be argued, in a worldview holding that every event has a deterministic cause, that causal isolation implies determination: If the variable structure completely determines the data structure, then there cannot be noise in the data, for this would mean that something besides the variable structure is responsible for this noise, which contradicts determination. In the present work, however, I will not confine myself to a deterministic worldview, so as to leave open the possibility that there may be genuinely probabilistic processes in nature. Hence I will treat causal isolation and determination as distinct requirements. Are determination and causal isolation sufficient for a variable to be considered observed? It appears that this is not the case. Consider, for instance, a Rasch model (Rasch, 1960), which has a unidimensional continuous latent structure, dichotomous indicators, and a logistic item response function. Now suppose that the indicators are improved so that whether they take the value 1 or 0 depends on the latent structure only, and that they do so in a deterministic way.
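The identification problem raised by multidimensionality can be made concrete with a toy deterministic function; the additive form of f and the numerical values are assumptions chosen purely for illustration:

```python
# Two latent variables determine x deterministically via x = f(t1, t2).
def f(t1, t2):
    return t1 + t2

# Distinct points on the latent plane...
a = (1.0, 2.0)
b = (0.5, 2.5)

# ...produce identical data, although the causal process is deterministic:
print(f(*a) == f(*b))  # True: x = 3.0 in both cases, so (t1, t2) is not identified

# Knowing t2 restores certainty: t1 = x - t2 is then uniquely determined.
x = f(*a)
t1_recovered = x - a[1]
print(t1_recovered == a[0])  # True
```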
This means that the item response functions become step functions, i.e., we get Guttman’s (1950) model. In this case, the conditions of determination and causal isolation are met. Nevertheless the variable structure cannot be taken as observed. This is because the structure is much richer than the structure of the observations: The variable structure is continuous, and therefore contains infinitely many levels, whereas the data structure has only five possible data patterns (0000, 1000, 1100, 1110, and 1111). Although inferences of the type “John, who has pattern 1000, has a lower position on the latent variable than Jane, who has pattern 1100” are not subject to error, inferences of the type “John and James both have data pattern 1100, and therefore they have the same position on the latent variable” cannot be made with certainty; hence, even though certain aspects of the variable structure are observable here, part of it is still hidden from our view. For an observed variable, we need not only determination and causal isolation; it should also be the case that the number of distinct data patterns is the same as the number of distinct variable positions occupied by the objects that gave rise to the data structure. Call this the requirement of equivalent cardinality. In the case of the Guttman model discussed above, there will ordinarily be more variable positions than data patterns, which means that the cardinality of data structure and variable structure is not the same. Now suppose the requirements of determination, causal isolation, and equivalent cardinality are met. What does this imply for the relation between the variable structure and the data structure? First, equivalent cardinality means that there are as many distinct data patterns as there are distinct variable positions occupied by the objects that gave rise to the data structure.
Second, determination means that the causal chain relating the variable structure to the data structure is deterministic, so that there is no measurement error or noise. Third, causal isolation means that the data pattern a given object produces is exclusively dependent on that object’s position in the variable structure, which precludes situations where the same data pattern can originate from distinct positions, as might be the case in a multidimensional variable structure. If these conditions are met, then this implies that distinct variable positions correspond to distinct data patterns (the mapping of variable positions into data patterns is injective) and distinct data patterns correspond to distinct variable positions (the mapping is surjective). An injective and surjective (hence, bijective) mapping constitutes an isomorphism; in this case, the data structure and variable structure are isomorphic up to equivalence. Determination and causal isolation ensure that the isomorphism exists for the right reasons, i.e., that the causal antecedents of the isomorphism indeed involve the variable measured and do not arise as an accident (as could, for instance, be the case if a column of data arose through the tossing of a coin and happened to turn out such that it is isomorphic to that variable). It seems plausible to me that, in this situation, the variable can be considered observed. Thus, in the case of observed variables, the data-gathering process must be set up in such a way that distinct positions on the variable measured (e.g., being male versus being female) translate into distinct positions in the data structure (e.g., 1 or 0 in the relevant column of the data file) with no further residue; as a result, there is a perfect one-to-one correspondence between the equivalence classes of the data structure and those of the variable structure.
Moreover, this isomorphism exists “just like that,” i.e., it does not require any additional activity on the part of the researcher: This is simply the way the data come in.
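The joint force of the three requirements can be checked mechanically: the mapping from occupied variable positions to data patterns must be bijective. The sketch below, with invented item thresholds and latent positions, shows how the deterministic Guttman case discussed above fails equivalent cardinality:

```python
def guttman_pattern(theta, thresholds=(-1.0, 0.0, 1.0, 2.0)):
    """Deterministic step-function items: item scored 1 iff theta exceeds its threshold."""
    return tuple(int(theta > t) for t in thresholds)

# Occupied variable positions (invented): two people fall between thresholds.
positions = [-0.5, 0.2, 0.4, 2.5]
mapping = {p: guttman_pattern(p) for p in positions}

def is_bijective(mapping):
    """Injective iff no two positions share a pattern; onto its image by construction."""
    return len(set(mapping.values())) == len(mapping)

print(is_bijective(mapping))  # False: 0.2 and 0.4 both collapse to (1, 1, 0, 0)
```

Determination and causal isolation are built into `guttman_pattern` (the pattern depends on theta alone, with no noise), yet the map is not injective, so part of the variable structure stays hidden.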

THE INTERNAL STRUCTURE OF ATTRIBUTES

The notion of observability as conceptualized above does not imply that all of the relations present in the variable structure are epistemically accessible to the researcher. For instance, suppose that the variable is quantitative (Michell, 1997) and that the above conditions hold. In this case, the fact that different variable positions map into unique data patterns is not enough to utilize the quantitative nature of the variable in question. This requires additional steps on the part of the researcher; namely, numerical values that preserve the quantitative structure of the variable measured have to be assigned to the objects that produced the data patterns. To see why this is so, it is useful to realize that, even when the quite strong conditions of determination, causal isolation, and equivalent cardinality hold, the researcher still only has an isomorphism up to equivalence to work with. As it stands, all that can be done with this is nominal measurement: the allocation of objects into unordered categories. Now, it is important to avoid the connotation of arbitrary labeling that surrounds the idea of a nominal scale. In the present scheme of thinking, there is no arbitrary labeling going on; whether or not two objects get the same label is fully determined by the variable structure. Thus, if a psychologist is doing diagnostic work by labeling some people as depressed and others as normal, this is quite insufficient to speak of nominal measurement. One can speak of nominal measurement only if (a) the variable structure is indeed made up of these two categories, and (b) the psychologist’s diagnostic work (in which the psychologist may act as the measurement instrument) is of such a character that it leads to data patterns that are isomorphic with the variable structure (i.e., there is a perfect correspondence between the variable structure and the data structure). One can view this as a latent class model without error.
Such a model will be quite hard to realize in practice. If the data at hand are to support more complicated research practices than allocation to unordered categories, while the idea of a variable being observed is to be retained, then the isomorphism between the data patterns and the variable structure will have to extend beyond the preservation of equivalence classes required for nominal measurement. The problem of explicating which relations ought to hold among the objects in the population in order for stronger representations to be constructible has been taken up in the literature on axiomatic measurement theory (Krantz, Luce, Suppes, & Tversky, 1971; Narens & Luce, 1986). Note that the concept of homomorphism (many-to-one mapping), as utilized in axiomatic measurement theory, applies to the relation between the set of objects and the variable structure (many objects may occupy the same position on the variable). The notion of observability as defined here requires an isomorphism (one-to-one and onto mapping), not between the set of objects and the variable structure, but between the variable structure and the data structure (each position in the variable structure is uniquely associated with a data pattern in the data structure and vice versa). Hence, in this sense, the present scheme of thinking is consistent with axiomatic measurement theory but applies to a different part of the measurement process. The deterministic nature of the models considered in the representational measurement literature is congruent with the requirements that were argued to be necessary to speak of an observed variable. The different levels of measurement as introduced by Stevens (1946) and refined in Krantz et al. (1971) can then be thought of as specifying the detail to which the variable structure is mapped in the data structure.
One may note that the requirements made here are rather minimalist in the sense that they do not imply full observability of the internal structure of the variable. One might, for instance, object to the presently proposed views that a variable like length has internal relations (e.g., it sustains transitivity) that are not necessarily observable under the requirements made here. For instance, if one assigned randomly selected numbers to objects of different lengths, then the variable length would still count as observed, provided that one did this consistently and in a way that led the numbers to causally depend on length (i.e., such that objects of the same length receive the same numbers, and objects of different length receive different numbers). However, the scale so constructed would not have the properties of the scales we commonly use to measure length (e.g., preservation of transitive relations, invariance up to multiplication by a constant). Clearly, if the variable figures in statistical analyses that place stronger assumptions on this structure (e.g., when the variable is assumed to be linearly related to some other variable), then more aspects of the variable structure must be observed, that is, preserved in the data structure. The reason for not making principled requirements on this score is that the measurement level is a property of the scale in question, not of the variable measured. This follows from the fact that one can measure a variable like length on nominal, ordinal, interval, and ratio levels, so that a measurement level cannot be uniquely attached to a variable. What one can say, however, is that some variables allow for different scale levels than others; length is measurable on a ratio scale whereas sex is not.
This can be construed as a dispositional property of the variable in question (if appropriate methods were followed, the variable could be measured on a ratio scale level), which has as its base (Rozeboom, 1973) the internal structure of the variable. In this scheme of thinking, a variable could be said to be quantitative in the sense of Michell (1997) if it sustains such a dispositional statement; the base of this disposition could then be construed to lie in the internal structure of the variable, as articulated axiomatically by Hölder in 1901 for quantitative attributes (Michell & Ernst, 1996, 1997; see also Michell, 1997, 1999). It is important to note that the process of scale construction requires the researcher to do more than just record data patterns; these commonly have to be assigned function values in order to construct the type of isomorphisms required for scales stronger than the nominal one. In such cases, the epistemological plane is partly a function of the knowledge that the researcher has concerning how to assign function values to different data patterns. In assigning such function values, the researcher is actively expanding the data structure in order to achieve a stronger isomorphism. The methods of extensive measurement (Campbell, 1920; Krantz et al., 1971) can be thought of as specifying ways to construct data structures for quantitative variables that form one of the most powerful isomorphisms possible, namely a ratio scale. Extensive measurement is based on experimental tests of the quantitative structure of the attribute (Michell, 1997, 1999) through concatenation. Concatenation is the combination of two objects to form a new one; in the case of length, for instance, by laying two rods end-to-end.
When the attribute measured combines in an additive fashion (so that the new rod's length equals the sum of the original rods' lengths), it is straightforward to form a ratio scale by repeated application of the concatenation operation (see Campbell, 1920, p. 180, for a lucid description of this process). Extensive measurement is possible for those physical attributes that can be concatenated (e.g., mass and length) but has so far proven inapplicable to psychological attributes such as intelligence and extraversion. An alternative for establishing the quantitative structure of variables, known as conjoint measurement (Luce & Tukey, 1964), can at least in theory be used to establish quantitative scales without a concatenation operation. However, this technique is rarely used in psychology (Michell, 1997, 1999). As a result, it is unknown whether variables that figure prominently in psychological testing, such as intelligence or personality variables, can be taken to have quantitative structure. A quantitative variable structure is isomorphic to the real line (Michell, 1997), and, of course, representing variables as lines is very common in latent variable models with continuous variables (e.g., consider factors in a factor model). One could, indeed, defend the thesis that such models assume that latent variables have quantitative structure. However, it is important to see that this does not mean that a variable like general intelligence is on equal footing with a variable like length. There are at least two reasons for this.
First, the requirements to speak of an observed variable (as defined above) are not met in the case of general intelligence (in the case of length they are at least approximately met for middle-sized objects), so that the formation of equivalence classes on the basis of observations is very hard: We do not know how to establish that two people are equally intelligent, independently of looking at their test scores (in contrast, we do know how to establish that two rods are equally long without using a tape measure, namely by using our naked eye). Second, the quantitative structure of length is an established fact, directly testable through concatenation, whereas the quantitative structure of general intelligence is a hypothesis, which has not been subjected to direct empirical tests (although one could defend the thesis that such hypotheses are indirectly tested in latent variable modeling; Borsboom & Mellenbergh, 2004).
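The logic of extensive measurement through concatenation, discussed above, can be sketched as follows. This is a deliberately idealized illustration with made-up numbers: concatenation is assumed perfectly additive, and "measuring" a rod simply counts how many copies of an arbitrarily chosen unit must be concatenated to match it.

```python
def concatenate(a, b):
    # The empirical operation (laying two rods end to end), assumed to
    # combine lengths additively, as extensive measurement requires.
    return a + b

def measure(rod, unit=1.0, tol=1e-9):
    """Count how many concatenated copies of the unit match the rod."""
    total, copies = 0.0, 0
    while total + unit <= rod + tol:
        total = concatenate(total, unit)
        copies += 1
    return copies

# The ratio-scale property: ratios of measures mirror ratios of lengths,
# whichever unit is chosen (rod lengths here are arbitrary numbers).
print(measure(6.0), measure(3.0), measure(6.0, unit=0.5) / measure(3.0, unit=0.5))  # 6 3 2.0
```

The choice of unit is conventional, but the ratio of two objects' measures is not; that invariance is what distinguishes a ratio scale from weaker representations.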

Probabilistic Structures and Latent Variable Models

This is where we are now. A set of data patterns can be treated as an observed variable if (a) the data patterns bear a deterministic causal connection to that variable, (b) the variable in question is the only cause of variation in the measures, and (c) the cardinality of the variable structure and the data structure is the same. In this case we have an isomorphism up to equivalence. This establishes nominal measurement; if stronger scales are to be formed, this requires demonstrating that stronger relations (e.g., greater than) hold among the objects that form the equivalence classes and are preserved in the function values assigned to these objects on the basis of the data patterns they generated. This requires a sensible way of assigning such function values: that is, scale construction. Whether such stronger representations are possible depends on the internal structure of the variable (e.g., whether it is quantitative) as well as on the resources of the researcher (whether the researcher knows how to assign the function values). The work in the representational theory of measurement (Krantz et al., 1971) treats this problem in detail.
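Two of the three requirements just summarized, determination and equivalent cardinality, lend themselves to a mechanical check once one has a (hypothetical) record of which data patterns each variable position produced; the sketch below uses made-up category and pattern names. Causal isolation, by contrast, is an empirical matter that no such bookkeeping can establish.

```python
# Hypothetical record: variable position -> set of data patterns it has
# produced. All labels are invented for illustration.
records = {
    "low":    {"000"},
    "medium": {"110"},
    "high":   {"111"},
}

# Determination: each variable position produces exactly one data pattern.
deterministic = all(len(patterns) == 1 for patterns in records.values())

# Equivalent cardinality: as many distinct data patterns as positions.
all_patterns = set().union(*records.values())
equivalent_cardinality = len(records) == len(all_patterns)

print(deterministic, equivalent_cardinality)  # True True
```

Were "medium" to produce both "110" and "100", determination would fail and the variable would count as latent on the present criteria.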

In cases where we measure a variable while one or more of the above requirements are violated, we should consider that variable latent. In principle, this may occur either because determination is violated (as in probabilistic models, like the Rasch model), because causal isolation is violated (as in multidimensional deterministic models), or because equivalent cardinality is violated (as in a unidimensional deterministic IRT model, like the Guttman model). The most extensively studied case is the one in which the relation between the variable structure and the data structure is probabilistic; hence I will consider this case in some detail. When the measurement process is assumed to be probabilistic, this means that shifting along the levels of the variable affects not the data patterns themselves but the probability with which they arise. Thus, in this case, the researcher cannot map the response patterns D to the variable structure. It is, however, possible to map the probability of the response patterns, P(D), to the variable structure. This is what most latent variable models currently in use do. The exact mapping is given by the function that relates the response probabilities to the latent structure (i.e., the item response function). Ideally, this function should be determined by substantive considerations on the relation between the latent structure and the response process (see Tuerlinckx & De Boeck, 2005, or Dolan, Jansen, & Van der Maas, 2004, for some examples); in practice, however, the choice is often one of mathematical convenience. To accommodate the fact that in the context of measurement (rather than, for instance, prediction) the measured variable must have causal relevance for the observations (see Borsboom, Mellenbergh, & Van Heerden, 2004), a nondeterministic notion of causality can be adopted. Various schemes for conceptualizing probabilistic causality exist and may be used for this purpose.
The notions of causality closest in spirit to common practices in latent variable modeling are those of Pearl (2000) and Spirtes, Glymour, and Scheines (2000), in which causal relations are represented in graphs, and tested via the conditional independence relations that they imply. In this literature, many latent variable models (namely all those that are unidimensional, have multiple indicators, and satisfy local independence) would be classified as (unobserved) common cause models. Common cause models are characterized by the fact that the common cause “screens off” covariation between its effects: If X is the common cause of Y and Z, then Y and Z must be conditionally independent given X. In the latent variable modeling literature, essentially the same requirement is known as local independence (i.e., the indicators are assumed to be statistically independent conditional on the latent variable). In the situation where the relation between data patterns and the variable measured is probabilistic, the causal isolation requirement may be satisfied in modified form. Naturally, variation in the latent variable cannot be the cause of all variation in the data patterns (otherwise it would bear a deterministic relation to the data). However, it is often taken to be the only cause of systematic variation in the data; that is, the probabilities P(D) depend only on the latent variable. In the latent variable literature, this notion is known as the assumption of unidimensionality. However, this assumption is not strictly necessary, as the existence of multidimensional IRT models and multiple factor models with cross-loadings testifies. The equivalent cardinality requirement is normally violated in probabilistic models, because the number of latent variable positions is usually different from the number of distinct data patterns.
For instance, in an IRT model that number is greater (because the latent variable is continuous, whereas the observations are categorical), and in a latent profile model it is smaller (because the observations are continuous, whereas the latent structure is categorical). Also, due to the probabilistic structure of latent variable models, it is possible that objects with different positions on the latent variable obtain the same response patterns and that objects with the same position on the latent variable obtain different response patterns. An interesting question that arises on the present viewpoint is how to determine the scale level of variables in the case of a nondeterministic measurement structure. In the latent variable modeling literature, it has usually been assumed that the scale level is determined by the class of transformations that leave the empirical predictions of the model invariant (i.e., the probabilities assigned to data patterns; see Fischer & Molenaar, 1995; Perline, Wright, & Wainer, 1979; see also Kyngdon, 2008; Borsboom & Zand Scholten, in press). This is in accordance with the fact that latent variable models construct a mapping between the variable structure and the probability of data patterns, rather than between the variable structure and the data patterns themselves. So, for instance, the empirical predictions of the Rasch model are invariant up to linear transformations of the parameters in the model, and these parameters are therefore considered to lie on an interval scale. It can be doubted whether this is in keeping with the intended definition of scale levels, as for instance utilized in Krantz et al. (1971); in these works, the isomorphism that should be preserved is one between the actual function values assigned to objects and an empirical relational system in which they play their part (the empirical relational system could be taken to constitute a variable structure in the terminology used in this paper).
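The Rasch invariance just mentioned can be checked numerically. In the sketch below, all parameter values are arbitrary, and only a common shift of person and item parameters, one member of the class of admissible linear transformations, is illustrated: because the Rasch item response function depends only on the difference between person and item parameters, the model's empirical predictions are unchanged.

```python
import math

def rasch(theta, b):
    # Rasch item response function: P(correct) depends only on theta - b.
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Arbitrary person parameter, item difficulty, and shift constant.
theta, b, shift = 1.2, 0.4, 5.0

# Shifting both parameters by the same constant leaves the predicted
# probability intact (up to floating-point rounding).
print(math.isclose(rasch(theta, b), rasch(theta + shift, b + shift)))  # True
```

It is precisely this invariance of P(D), rather than of function values assigned to concrete data patterns, that grounds the interval-scale interpretation of Rasch parameters questioned in the text.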
In latent variable models, the actual function values assigned are model-based estimates. A relevant question would therefore seem to be under which class of transformations of these actual function values the isomorphism between the variable structure and the data structure is preserved. But the immediate answer to this question is “never.” The reason is that one can preserve an isomorphism only if that isomorphism is present in the first place. And under a latent variable model, there cannot be an isomorphism between the data patterns (and hence the function values assigned to them) and the variable structure; in fact, as has been argued in this paper, the fact that no such isomorphism exists is exactly what necessitates the use of a latent variable model. Therefore, the question of what the scale type of variables is in case of a nondeterministic measurement structure is, as far as I can see, open to investigation; in fact, it is not entirely clear to me that the concept of scale types, interpreted in this particular sense, applies to a nondeterministic measurement model. In scientific research, but especially in sciences like psychology, the assumption that variables are observed is often too strong. In such cases, dropping this assumption is plausible given the substantive context. However, this is not a free lunch: dropping the observability assumption saddles one with serious problems.
First, the inference from data patterns to variable positions is no longer automatic; second, such inferences can only be made under the assumption that a particular probabilistic model generated the data, and the number of candidate models is basically infinite, so one has to choose among them; third, the structure of the latent space could be entirely different from that of the observations (e.g., the observations are continuous while the latent space is categorical or vice versa) and there is no easy way of figuring out what it looks like. Of course, there are many ways of attacking these problems—in fact, how best to do this is what much of the work in latent variable modeling is about; relevant topics are model specification, identification, parameter estimation, and model selection. Nevertheless, it is worth noting that giving up the observability assumption means that hard work is going to have to be done.
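The “screening off” property of common cause models discussed in this section can be made concrete with a small simulation. This is a hedged sketch, not an analysis from the paper: the Rasch-type response function, item difficulties, and sample sizes are all made up. Two items covary in a population where the latent variable varies, but the covariation vanishes once the latent position is held fixed, which is exactly local independence.

```python
import math
import random

random.seed(0)

def p_correct(theta, difficulty):
    # Rasch-type item response function; difficulty values are made up.
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def simulate(thetas, b1=-0.5, b2=0.5):
    # One (item1, item2) binary response pair per simulated person.
    return [(int(random.random() < p_correct(t, b1)),
             int(random.random() < p_correct(t, b2))) for t in thetas]

def correlation(pairs):
    # Pearson correlation between the two binary items.
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs) / n
    vx = sum((x - mx) ** 2 for x, _ in pairs) / n
    vy = sum((y - my) ** 2 for _, y in pairs) / n
    return cov / math.sqrt(vx * vy)

# Marginally, the latent variable varies across persons and the items covary;
# conditional on a fixed latent position, the covariation disappears.
marginal = correlation(simulate([random.gauss(0.0, 2.0) for _ in range(20000)]))
conditional = correlation(simulate([0.0] * 20000))
print(marginal, conditional)
```

The marginal correlation is substantial while the conditional one hovers near zero; the latent variable is the sole source of systematic covariation among its indicators, just as the unidimensionality assumption requires.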

Some Conceptual Problems of Latent Variable Theory

It has been argued here that what differs across situations where one is inclined to use a latent variable model, and situations where one is not, is the degree to which one can plausibly assume the inferences from data structure to variable structure to be free of error. This view is predicated on a categorical distinction between the structure of data and the structure of variables. In all cases, whether labeled observed or latent, an inference from data structure to variable structure is required. This inference has to be based on some theory concerning what is often called the data-generating process. Now it is clear that, according to the present scheme of thinking, the primary causal agents that are supposed to figure in such a theory are variables. This means that we must grant a serious ontological status to variables; they are supposed to exist, have some definite structure, and be causally relevant to the data. However, there are several problems in assigning this role to variables (some of which have been discussed by Borsboom, Mellenbergh, & Van Heerden, 2003). Most of these problems follow from the abstract character of variables. It is useful to discuss these issues to show that they are not detrimental to the theoretical framework proposed here.

The abstract nature of variables. To see why the abstract nature of variables is problematic, it is useful to note that variables, if they are taken to exist, are not, properly speaking, localized in space or time. People vary in age, but although the people to which different ages are ascribed may be localized in space and time, their ages are not. For instance, on October 1, 2007, at approximately 12:28 local time, I am writing this sentence at the approximate coordinates (52° 23' N, 4° 55' E); my age at the present time is approximately 33 years and 10 months. This localization means that a traveler who happened to walk into my office right now would encounter a human being by the name of Denny Borsboom. But the traveler will not find Denny Borsboom’s age at these coordinates. Neither will it help the traveler to continue his journey to my home address or place of birth. Although this will allow him to pick up various clues as to the property in question, he will not encounter my age there either. There is, in fact, no place in the universe where the traveler will stumble across an object and be able to exclaim, “Aha! Finally, there it is! The age of Denny Borsboom!” This conceptual impossibility applies, as far as I can see, to all variables as utilized in scientific theories. It does not really matter whether these are the subject of currently accepted scientific theory or not. To be sure, one can say that the cup of coffee standing at my desk right now has a certain mass. But it would be strange to say that the cup’s mass is itself located on my desk. The situation is even more complicated when we consider variables in their generic form, rather than specific instances of their levels.
Although the sort of language abuse implied by a statement to the effect that the particular mass of this particular cup of coffee is located on my desk may be considered a proverbial mode of speech, stating that the variable mass is located somewhere in the universe is beyond the tolerable bounds of absurdity. This lack of localization also applies to psychological variables. Extraversion, general intelligence, spatial ability, attitudes, and self-efficacy are not in people’s heads. When we open up a person’s head, we find a sort of gray jelly, not psychological variables like general intelligence. It is to me a somewhat absurd idea that we may, in the not too distant future, localize general intelligence in the brain. This has nothing to do with the fact that general intelligence is a psychological rather than a physical thing. It has to do with the fact that it is a variable and not a concrete object; as such it has an inherently abstract character—just like length, age, and volume do—that precludes it being localized anywhere, and hence it cannot be localized in the brain either. The most that reductionists can hope for when it comes to the reduction of psychological variables is that data patterns pertaining to, say, the number of neurons in a person’s head and data patterns pertaining to the performance on psychological tests will conform to a unidimensional measurement model, so that both of these data structures could be taken to depend on the same variable. In that case the variable that gives rise to variation in psychological test scores would be identical to the variable that gives rise to variation in, say, counts of the number of neurons in people’s heads. Note, however, that even in this case the variable itself would not be in anybody’s head.
This issue should not be confused with the fact that between-subjects attributes and dimensions are not the same as within-subjects attributes and dimensions, a point made in, among others, Borsboom, Mellenbergh, and Van Heerden (2003, 2004); Borsboom and Dolan (2006); Cervone (2005); Hamaker, Molenaar, and Nesselroade (2007); and Molenaar (2004). Within-subjects attributes also refer to an abstract structure, albeit one that describes variation across time points rather than across subjects. Although we call such dimensions intraindividual or within-subjects dimensions, this use of language should not be taken to mean that they are literally inside persons. Strictly speaking, at each time point a person can only be said to occupy one of the levels of such a variable. Such variables are person-specific but not inside the person in any physical sense.

The causal relevance of variables. The question that now occurs is how such an abstract, nonlocal thing as a variable can have causal effects. Causal effects are often taken to describe a relation between events. One event happens and then necessitates the occurrence of another event. If a number of such events are coupled, we speak of a causal chain. A variable, however, is not an event and hence cannot enter directly in such a causal chain. Thus, a different way of thinking about this issue has to be found. There are several ways in which this matter can be construed. First, one may think about variables as describing structural differences in properties across different individuals, across time, or both. These properties, which are not variables but may be seen as levels of a variable, are then attached to individual objects at a given time point. The structure of a variable derives from differences in these properties. Thus, in this view, length (in the abstract) is not a property of an object, but “being 7 inches long” is. Although such a property is not an event, the measurement procedure does consist of a sequence of events that lead to a given data pattern, and this sequence may be interpreted as a causal chain. The property of being 7 inches long could be viewed as a parameter in the model that describes this sequence of events. Because different objects have different values for this parameter, they get different data patterns. Depending on the precise nature of the causal chain, these data patterns may sustain various measurement levels. A causal role for the variable measured can then be construed, because the measurement procedure sustains counterfactuals of the type “if this object had been shorter, the measurement procedure would have led to a different data pattern”; these can, for instance, be interpreted in terms of possible-world semantics (Lewis, 1973; Kripke, 1980).

A difficulty with this approach is that it reifies the scale level. That is, in interpreting “being 7 inches long” as an inherent property of an object, independent of any other objects that exist and independent of the details of the observational procedure, the scale level is disconnected from our scaling efforts and viewed as a feature of nature rather than as a result of our activities. I find it difficult to believe, however, that such a property actually exists independently of our measurement efforts. Also, the statement is meaningless unless there is a unit of measurement corresponding to the word inch, and this unit is not absolute but a matter of convention. Conventions, of course, are not very promising candidates as ingredients of reality. A second way of construing the issue is by taking the variable structure as a primitive and deriving properties, like “being 7 inches long,” from an object’s place in this structure in conjunction with our scaling activities, including any conventions that may be coupled to these activities. The advantage of such a view is that the concept of “place in a structure” is relational by definition. This means that the same object can occupy a different place in different structures (for instance, between-subjects structures of individual differences versus within-subjects structures of change in time). Also, there is no need to reify the scale level. Nevertheless, given that the object does occupy a certain place in the variable structure, and that a given measurement procedure has been followed, and that certain scaling conventions are in place, the resulting outcome value is fixed. Similarly, if the object had occupied a different place, and the same measurement procedure had been followed, and the same scaling conventions had been in place, then the resulting outcome value would have been different. This appears to me a prima facie plausible way to construe the reality and causal relevance of variables.
To sustain counterfactuals like those proposed above, the measurement procedure must have an element of lawfulness. The reason is that such counterfactuals (e.g., “if this object had occupied a different position on the variable, we would have observed a different measurement outcome”) involve a thought experiment that considers what would have happened in a world that is different from the one we inhabit. In order for the outcome of such a thought experiment to be definite, there has to be a lawful relationship between the parameter varied (e.g., the mass of the object) and the consequences of such variation (a different measurement outcome). If there is no such lawfulness, the outcome of counterfactuals is indeterminate. Thus, measurement as conceptualized here involves a lawful relation between the variable measured and the measurement outcome. A measurement model can be considered to spell out the structure of such a law. There is an important consequence of this analysis. Namely, for data patterns to count as measures of some variable structure, there must be a causal law that connects positions in the variable structure with the values of measurement outcomes; however, that law does not describe a causal system at work in the individual objects subjected to the measurement process. The causal system consists of variables connected by parameters; the individual objects occupy places in the variable structure that pertains to them. But, just like variables themselves are not in the individuals that occupy their levels, the causal system is not inside the individuals measured. What is required from the object measured is that it behaves in accordance with the causal system, not that the object contains, or is itself, the causal system. Exactly the same holds for intraindividual variation and the causal relations that govern it.
The system describes how the individual varies over time; it is true of the individual but is not located in the individual.

Pragmatics and the context of explanation. Causal relations are tied to explanations. When we ask why something happened, we expect an answer that subsumes the event under a set of general causal laws that explain its occurrence. There is a natural tendency to ask of such an explanation that it is true. Truth, however, is exclusive in the sense that most people believe that there cannot be two distinct true causal explanations of a phenomenon. Thus, if a true explanation of a phenomenon has been given, there is no room left for another one. This view, however, leads to problems that result from the pragmatic character of why-questions (Van Fraassen, 1980). Consider the following example. As the phenomenon to be explained, we take a penalty missed by Dutch soccer player Frank de Boer during the penalty shootout in the semifinal of the 2000 European Championship. Suppose that we ask why De Boer missed this penalty. Is there a single correct causal explanation that answers this question? A moment’s reflection shows that this is not the case. To see why this is so, consider the following specifications of the question: (1) Why did Frank de Boer miss this penalty (rather than a given other penalty), and (2) why did Frank de Boer miss the penalty (rather than other players in the shootout like, say, Patrick Kluivert, who scored)? Both of these questions are perfectly bona fide requests for a causal explanation, but the answer given need not be the same. Any answer to question 1 will seek to delineate the circumstances that set apart this penalty from the many others taken by De Boer. For instance, in this case he had already missed a penalty earlier in the match, which is likely to be included in the required explanation. The answer to question 2, in contrast, will seek to delineate the differences between Kluivert and De Boer at that particular moment in time; say, Kluivert was handling the pressure better than De Boer.
These two explanations introduce distinct dimensions of variation: having missed a previous penalty versus not having missed a previous penalty for question 1, and being able to handle the pressure versus not being able to handle the pressure for question 2. Thus, the causal systems these answers invoke do not include the same variable structures.1 Nevertheless, they are plausible causal explanations that involve one and the same event. The same situation occurs in psychological measurement. Suppose that John has correctly answered an item in an IQ test, and we ask for the explanation of this event. When we consider the question why John answered the item correctly, while Jane did not, we will make reference to dimensions of individual differences between John and Jane. But when we ask why John answered the item correctly this time, while he failed it a few years ago, we will invoke John’s pattern of development over time. These two distinct causal stories may both be true, even though they explain one and the same event. We may conclude that the occurrence of a data pattern in itself does not have a unique causal explanation; the question “why did data pattern D arise?” is ambiguous because it does not articulate a contrast class of alternatives. Is this a problem for the presently articulated view? I suggest it is not. The reason for this is that the type of causal relations that is important in a measurement context should depend on the specification of a contrast class of alternatives because that class of alternatives is constitutive of the variable to be measured. To see this, consider the following example. A match is lit in a wooden house, and this causes the house to burn down. There are many causal stories to be told about this situation depending on the contrast classes of alternatives that are chosen.
With respect to a class of alternative wooden houses in which no match was lit, lighting versus not lighting the match is identified as the causally relevant variable. But with respect to a class of alternative houses in which a match was also lit, but where the houses did not burn down due to the fact that they were built of concrete rather than wood, the causally relevant variable is not whether the match was lit but whether the house was made out of concrete or wood. So, in the first population of houses, the observable data patterns house burned down/not burned down can be considered to measure the variable match was lit/match was not lit. In the second population of houses, the same data patterns can be considered to measure the variable house was made of wood/house was made of concrete. Finally, if the intention is to measure change over time, we may consider the observation that the house burned down at time t as an indicator of a transition that took place at an earlier time t' (i.e., the match was lit). Thus, the same data patterns can measure different variables depending on the contrast class of alternatives that they attach to. By picking out a given contrast class, the researcher specifies the domain of variation of the attribute to be measured through a given measurement procedure. Thus, any single observed data pattern is "polygamous" in that it can serve as a measure of distinct variable structures depending on the research context.2 This explains, for instance, why it is possible to use one and the same test score (say, a person's performance on a digit span test) as an indicator of individual differences in one context (e.g., when one studies the factorial structure of working memory capacity tests by examining their covariance structure), as an indicator of the effect of experimental manipulations in another (e.g., when one studies the effect of interference on test performance by inspecting mean differences across conditions), and as an indicator of changes in cognitive functioning in studying development (e.g., when one examines a time series of repeated administrations of the test for the same person at different times).

1 One may think that these explanations may be merged into one by introducing the additional hypothesis that De Boer was handling the pressure less well (answer 1) because he had missed a penalty before whereas Kluivert had not (answer 2). This, however, does not work because it so happens that Kluivert had also missed a penalty earlier in the match.

In psychological research, the most relevant domains of variation are variation over time, variation over people, and variation over situations. It is important in empirical research that the utilized domain of variation matches the purposes of the researcher. For instance, one should not expect data on interindividual differences to be informative of the structure of intraindividual processes unless there is an explicit rationale to justify such an expectation (which may involve very strong assumptions; see Molenaar, 2004). Nor should one take the stability of individual differences in, say, personality test scores over time to be indicative of consistency of behavior over different situations (Mischel, 1968). In general, one should be very careful when making inferences to domains of variation that were not themselves sampled in the research setup.
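The gap between domains of variation can be made concrete with a small simulation (an illustrative sketch only; the variables and numbers are hypothetical and not part of the original argument). Here two variables are positively related across people but negatively related within each person over time, so an interindividual analysis and an intraindividual analysis of the very same data reach opposite conclusions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_times = 200, 50

# Between-person structure: people high on X also tend to be high on Y.
trait = rng.normal(0, 1, n_persons)
mean_x = trait
mean_y = trait + rng.normal(0, 0.3, n_persons)

# Within-person structure: momentary ups in X go with downs in Y.
person_means_x, person_means_y, within_rs = [], [], []
for i in range(n_persons):
    e = rng.normal(0, 1, n_times)                    # shared momentary fluctuation
    x = mean_x[i] + e
    y = mean_y[i] - e + rng.normal(0, 0.3, n_times)  # sign reversed within person
    within_rs.append(np.corrcoef(x, y)[0, 1])
    person_means_x.append(x.mean())
    person_means_y.append(y.mean())

between_r = np.corrcoef(person_means_x, person_means_y)[0, 1]
print(f"average within-person correlation: {np.mean(within_rs):+.2f}")
print(f"between-person correlation:        {between_r:+.2f}")
```

The interindividual correlation comes out strongly positive while the intraindividual correlations are strongly negative; neither structure can be read off from the other, which is the point of Molenaar's (2004) warning.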

IMPLICATIONS FOR PSYCHOLOGICAL RESEARCH

Latent variable modeling is rising in popularity but nevertheless is not a standard tool in many areas in psychology. This is a fact that continues to surprise me: One would expect that, in a field as plagued by measurement problems as psychology, latent variable modeling would be a common instrument to get at least some grip on the relation between the data and the variables one intends to measure, if only to determine whether one can get the job done without a latent variable model (for instance, because under reasonable model assumptions the sumscore is good enough for research purposes). But this is not the case. Instead, researchers use all sorts of procedures to construct variables, which are subsequently treated as observed; and these procedures involve an awkward number of arbitrary decisions and unclear assumptions (e.g., see Borsboom, 2006).

To give one example, in psychology, the task of specifying a structure for psychological attributes, or of testing hypotheses concerning that structure, is not widely perceived as a challenge for empirical researchers. Michell (1997, 1999) has called attention to this problem by exposing the fact that psychological attributes like extraversion or intelligence are often considered to have quantitative structure, even though no serious theoretical motivation or empirical backup for this assumption exists. The converse problem, which is that attributes are assumed to be categorical where they might as well be continuous, occurs frequently in psychiatric research; there, the categorical structure of the diagnostic categories in the Diagnostic and Statistical Manual of Mental Disorders (American Psychiatric Association, 1994) is often unproblematically equated with the structure of the mental disorders thought to underlie them. In both cases, the choice for the structure of psychological variables appears to be mainly a function of historically determined conventions that have little empirical or theoretical support.

Now, there exist some papers on the problem of distinguishing between different latent variable structures (Molenaar & Von Eye, 1994; De Boeck, Wilson, & Acton, 2005; Waller & Meehl, 1998; Maraun, Slaney, & Goddyn, 2003), but these represent relatively isolated efforts by methodologists and do not play a part in a coordinated massive attack by the psychological research community. One would expect—naively perhaps—that determining the structure of one's central theoretical terms (e.g., mental disorders) is a matter of monumental importance for any science. But, judging from where the research activity in psychology concentrates, apparently this is not a widely shared conviction.

In fact, I do not think that many psychological researchers worry about measurement problems; it actually seems that very few realize what their magnitude really is. The reason for this is that psychology is in the grip of a rather awkward form of operationalism. The general feeling appears to be that if one constructs the data file in such a way that it contains numbers, and these numbers are run through the most popular analyses today—like analysis of variance or principal components analyses—then conclusions that concern these numbers unproblematically generalize to the psychological attributes that the researcher is interested in. In such procedures, psychological attributes are equated with, or assumed to be isomorphic to, the numbers in the data file. It is obvious that the assumption here is that the researcher is dealing with observed variables as defined earlier in this paper. It is also obvious, however, that this assumption is far too strong for most psychological measurement procedures.

2 This should not be taken to mean that, to some degree, everything measures everything else, as for instance a correlational conception of validity would imply (Borsboom, Mellenbergh, & Van Heerden, 2004). When treating an observed data pattern as a measure of a variable, that variable has to be causally relevant to the occurrence of the data pattern. However, the example does show that what can be taken as causally relevant depends on the selection of a contrast class, i.e., is dependent on the comparison in which the data pattern figures; and this may vary over studies depending on the details of the investigation.

This is of course not meant to imply that every researcher should use a latent variable model in every type of research; in many cases, simple functions of the data patterns (like the sumscore) may be fine for the purposes at hand. It is meant to imply that a researcher who cannot plausibly argue that he or she is dealing with observed variables should be aware of the fact that the attributes measured do not automatically conform to the way the numbers are constructed in a data file; hence, that there is a problem; hence, that something should be done about it. What that something is—e.g., making an all-out modeling effort, or estimating the robustness of observed variable techniques under plausible modeling assumptions—depends on the research context.

The present investigations do suggest that the onus of proof lies with the researcher who wants to assume that his or her variables are observed. Presently, this is not the case: Researchers use observed variable techniques, except when there are exceptional circumstances that lead them to use latent variable models. It would seem rather more plausible that the researcher proceeds in the opposite direction: using latent variable techniques unless there are circumstances that justify or necessitate treating psychological attributes as observed. As has been argued in this paper, the researcher who treats variables as observed is making some very strong assumptions about the quality of the measurement procedures that have been utilized. If such assumptions lack justification, which would seem to be the rule rather than the exception in psychology, this is theoretically inadequate (although not necessarily practically inadequate). Thus, the conclusion that can be drawn from the analysis presented here is simple: A psychological variable is latent until proven observed.
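Estimating the robustness of observed variable techniques under plausible modeling assumptions can be given a minimal concrete form (a hypothetical sketch; the loadings, item count, and sample size are my assumptions, not the paper's): simulate data from a linear one-factor model and check how closely the sumscore tracks the latent variable.

```python
import numpy as np

rng = np.random.default_rng(1)
n_persons, n_items = 5000, 10

theta = rng.normal(0, 1, n_persons)   # latent variable (unobserved in practice)
loadings = np.full(n_items, 0.7)      # assumed equal loadings
error_sd = np.sqrt(1 - loadings**2)   # unit-variance items

# Linear one-factor model: item score = loading * latent value + noise.
items = theta[:, None] * loadings + rng.normal(0, error_sd, (n_persons, n_items))

sumscore = items.sum(axis=1)
r = np.corrcoef(sumscore, theta)[0, 1]
print(f"corr(sumscore, latent variable) = {r:.3f}")
```

Under these (strong) assumptions the sumscore correlates above .9 with the latent variable, so observed variable techniques would lose little; whether such assumptions hold for a real test is exactly what needs arguing.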

DISCUSSION

In this paper, an attempt was made to construct the conceptual foundations for latent variable modeling under the more general heading of latent variable theory. It was argued that there is no reason to make an ontological distinction between latent and observed variables; hence, all variables are ontologically on a par. What differs between situations where one treats variables as latent or observed is the degree to which the researcher assumes variable structures to be epistemically accessible. To treat variables as observed is to assume full accessibility; that is, inferences from data to variable structure are assumed to be without error. Such accessibility requires that the causal process that gives rise to variation in data patterns is deterministic, that the variable measured is causally isolated in the sense that it is the only variable at work in producing variation in data patterns, and that the number of distinct patterns in the data equals the number of levels of the variable measured. If one or more of these assumptions are violated, then the inference from data patterns to variable structure is prone to error. The variable the researcher intends to measure is then to be conceptualized as a latent variable.

In setting up a model for such a situation, the researcher faces the problem of specifying the structure of the variable in question as well as the function that relates this structure to the variation in data patterns. These choices are ideally made on substantive grounds, although in practice this is seldom the case. With regard to the choice of form for the latent variable structure, there appears to be a strong influence of one's statistical upbringing; for instance, those who are accustomed to working with factor models seem to conceptualize theoretical constructs as continuous dimensions more or less automatically.
Indeed, one sometimes wonders whether psychologists are sufficiently well-informed on the fact that psychological attributes may not necessarily behave as linearly ordered dimensions (e.g., like the factors in a factor model) and that making this assumption while it is false may seriously distort the interpretation of research findings. With regard to the choice of the function that relates the observations to the latent variable structure, mathematical convenience appears to be a primary determinant; for instance, assuming a logistic function in an IRT model, or assuming linearity in a factor model, enables standard parameter estimation procedures and widely available software to be used. Although one should not underestimate the importance of such practical concerns, I think that psychology stands to gain considerably if more attention is devoted to the substantive underpinnings of such modeling assumptions. The reason for this is not so much technical as theoretical: Thinking about the relation between a psychological attribute and the data patterns that are supposed to measure it forces a deeper investigation into the nature of the attribute and the way the measurement instrument is supposed to work. It requires one to spell out, at least at a very coarse level, why one is justified in treating the data patterns as measurements; i.e., it gives one the beginnings of an argument for the validity of the measurement instrument used. Such arguments are badly needed in psychological measurement.

It is unfortunately necessary to recognize that, in many areas of psychology and the social sciences, the construction of measurement models is not among researchers' favorite activities. Also, there is a widespread trust in the representational power of the numbers that happen to pop up in the data files researchers feed to statistical computing programs. Rarely is there an explicit recognition that these numbers may not actually represent theoretical attributes very well.
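For concreteness, the two conventional functional forms referred to above can be written out; these are the standard textbook formulations (a two-parameter logistic IRT model and a linear common factor model), summarized here rather than proposed:

```latex
% Two-parameter logistic IRT model for person i and dichotomous item j:
P(X_{ij} = 1 \mid \theta_i) \;=\; \frac{1}{1 + \exp\{-a_j(\theta_i - b_j)\}}

% Linear common factor model for a continuous indicator:
x_{ij} \;=\; \mu_j + \lambda_j \theta_i + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim N(0, \sigma_j^2)
```

Here theta_i is the latent variable, a_j and b_j are the item's discrimination and difficulty parameters, and lambda_j is a factor loading. Nothing in the measurement context itself forces these particular shapes; they are chosen largely because they make estimation tractable.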
Of course, with regard to the theoretical attributes hypothesized in the social sciences (e.g., fearfulness, depression, intelligence), we know very little, and this greatly complicates the informed construction of a measurement model. On the other hand, when it comes to the representation of such attributes in the data files used in empirical research, we can be quite certain about one thing: In this particular area of science, observed variables probably do not exist.
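As a closing illustration, the three accessibility conditions summarized in this discussion (determinism, causal isolation, matching cardinality) can be mimicked in a toy simulation (entirely hypothetical numbers; the point is only that violating a condition makes the inference from data pattern to variable level fallible):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# All three conditions satisfied: the data pattern just is the variable level,
# so inferring the variable from the data incurs no error at all.
v = rng.integers(0, 2, n)
pattern = v.copy()
print("deterministic error rate:", np.mean(pattern != v))

# Determinism violated: a noisy response process flips the pattern 20% of the
# time, so reading the variable off the pattern is wrong in ~20% of cases.
noisy = np.where(rng.random(n) < 0.2, 1 - v, v)
print("probabilistic error rate:", np.mean(noisy != v))

# Cardinality violated: three variable levels but only two distinct data
# patterns; levels 1 and 2 collapse, so even the best inference confuses them.
v3 = rng.integers(0, 3, n)
coarse = (v3 > 0).astype(int)
recovered = np.where(coarse == 0, 0, 1)  # forced to pick one of the merged levels
print("cardinality error rate:", np.mean(recovered != v3))
```

The first error rate is exactly zero; the other two are substantial, which is what relegates the variable to latent status.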

ACKNOWLEDGMENTS

I would like to thank Conor Dolan for his comments on an earlier draft of this paper. This research was supported by NWO innovational research grant no. 451-03-068.

REFERENCES

American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: American Psychiatric Association.
Bartholomew, D. J. (1987). Latent variable models and factor analysis. London: Griffin.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397–479). Reading, MA: Addison-Wesley.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Bollen, K. A. (2002). Latent variables in psychology and the social sciences. Annual Review of Psychology, 53, 605–634.
Borsboom, D. (2005). Measuring the mind: Conceptual issues in contemporary psychometrics. Cambridge: Cambridge University Press.
Borsboom, D. (2006). The attack of the psychometricians. Psychometrika, 71, 425–440.
Borsboom, D., & Dolan, C. V. (2006). Why g is not an adaptation: A comment on Kanazawa. Psychological Review, 113, 433–437.
Borsboom, D., & Mellenbergh, G. J. (2004). Why psychometrics is not pathological: A comment on Michell. Theory & Psychology, 14, 105–120.
Borsboom, D., Mellenbergh, G. J., & Van Heerden, J. (2003). The theoretical status of latent variables. Psychological Review, 110, 203–219.
Borsboom, D., Mellenbergh, G. J., & Van Heerden, J. (2004). The concept of validity. Psychological Review, 111, 1061–1071.
Borsboom, D., & Zand Scholten, A. (2008). The Rasch model and additive conjoint measurement theory from the perspective of psychometrics. Theory & Psychology, 18, 111–117.
Campbell, N. R. (1920). Physics, the elements. Cambridge: Cambridge University Press.
Cervone, D. (2005). Personality architecture: Within-person structures and processes. Annual Review of Psychology, 56, 423–452.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.
De Boeck, P., Wilson, M., & Acton, G. S. (2005).
A conceptual and psychometric framework for distinguishing categories and dimensions. Psychological Review, 112, 129–158.
Dolan, C. V., Jansen, B. R. J., & Van der Maas, H. L. J. (2004). Constrained and unconstrained normal finite mixture modeling of multivariate conservation data. Multivariate Behavioral Research, 39, 69–98.
Embretson, S. E., & Reise, S. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.
Fischer, G. H., & Molenaar, I. W. (1995). Rasch models: Foundations, recent developments, and applications. New York: Springer.
Goodman, L. A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61, 215–231.
Guttman, L. (1950). The basis for scalogram analysis. In S. A. Stouffer, L. Guttman, E. A. Suchman, P. L. Lazarsfeld, S. A. Star, & J. A. Clausen (Eds.), Studies in social psychology in World War II: Vol. IV. Measurement and prediction (pp. 60–90). Princeton, NJ: Princeton University Press.

Hamaker, E. L., Nesselroade, J. R., & Molenaar, P. C. M. (2007). The integrated trait-state model. Journal of Research in Personality, 41, 295–315.
Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston: Kluwer-Nijhoff.
Jöreskog, K. G. (1971). Statistical analysis of sets of congeneric tests. Psychometrika, 36, 109–133.
Krantz, D. H., Luce, R. D., Suppes, P., & Tversky, A. (1971). Foundations of measurement (Vol. 1). New York: Academic Press.
Kripke, S. A. (1980). Naming and necessity. Oxford: Blackwell.
Kyngdon, A. (2008). The Rasch model from the perspective of the representational theory of measurement. Theory & Psychology, 18, 89–109.
Lawley, D. N., & Maxwell, A. E. (1963). Factor analysis as a statistical method. London: Butterworth.
Lazarsfeld, P. F., & Henry, N. W. (1968). Latent structure analysis. Boston: Houghton Mifflin.
Lewis, D. (1973). Counterfactuals. Oxford: Blackwell.
Lubke, G. H., & Muthén, B. (2005). Investigating population heterogeneity with factor mixture models. Psychological Methods, 10, 21–39.
Luce, R. D., & Tukey, J. W. (1964). Simultaneous conjoint measurement: A new type of fundamental measurement. Journal of Mathematical Psychology, 1, 1–27.
Maraun, M., Slaney, K., & Goddyn, L. (2003). An analysis of Meehl's MAXCOV-HITMAX procedure for the case of dichotomous indicators. Multivariate Behavioral Research, 38, 81–112.
McLachlan, G., & Peel, D. (2000). Finite mixture models. New York: Wiley.
Mellenbergh, G. J. (1994). Generalized linear item response theory. Psychological Bulletin, 115, 300–307.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (pp. 13–103). Washington, DC: American Council on Education and National Council on Measurement in Education.
Michell, J., & Ernst, C. (1996). The axioms of quantity and the theory of measurement: Part I, an English translation of Hölder (1901). Journal of Mathematical Psychology, 40, 235–252.
Michell, J., & Ernst, C. (1997).
The axioms of quantity and the theory of measurement: Part II, an English translation of Hölder (1901). Journal of Mathematical Psychology, 41, 345–356.
Michell, J. (1997). Quantitative science and the definition of measurement in psychology. British Journal of Psychology, 88, 355–383.
Michell, J. (1999). Measurement in psychology: A critical history of a methodological concept. Cambridge: Cambridge University Press.
Mischel, W. (1968). Personality and assessment. New York: Wiley.
Molenaar, P. C. M. (1985). A dynamic factor model for the analysis of multivariate time series. Psychometrika, 50, 181–202.
Molenaar, P. C. M. (2004). A manifesto on psychology as idiographic science: Bringing the person back into scientific psychology, this time forever. Measurement, 2, 201–218.
Molenaar, P. C. M., & Von Eye, A. (1994). On the arbitrary nature of latent variables. In A. von Eye & C. C. Clogg (Eds.), Latent variables analysis (pp. 226–242). Thousand Oaks: Sage.
Moustaki, I. (1996). A latent trait and a latent class model for mixed observed variables. British Journal of Mathematical and Statistical Psychology, 49, 313–334.
Moustaki, I., & Knott, M. (2000). Generalized latent trait models. Psychometrika, 65, 391–411.
Narens, L., & Luce, R. D. (1986). Measurement: The theory of numerical assignments. Psychological Bulletin, 99, 166–180.
Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge, England: Cambridge University Press.
Perline, R., Wright, B. D., & Wainer, H. (1979). The Rasch model as additive conjoint measurement. Applied Psychological Measurement, 3, 237–255.

Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77, 257–286.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danmarks Paedagogiske Institut.
Rost, J. (1990). Rasch models in latent classes: An integration of two approaches to item analysis. Applied Psychological Measurement, 14, 271–282.
Rozeboom, W. W. (1973). Dispositions revisited. Philosophy of Science, 40, 59–74.
Sijtsma, K., & Molenaar, I. W. (2002). Introduction to nonparametric item response theory. Thousand Oaks: Sage.
Spirtes, P., Glymour, C. N., & Scheines, R. (2000). Causation, prediction, and search. Cambridge, MA: MIT Press.
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 667–680.
Tuerlinckx, F., & De Boeck, P. (2005). Two interpretations of the discrimination parameter. Psychometrika, 70, 629–650.
Van Fraassen, B. C. (1980). The scientific image. Oxford: Clarendon Press.
Visser, I., Raijmakers, M. E. J., & Molenaar, P. C. M. (2002). Fitting hidden Markov models to psychological data. Scientific Programming, 10, 185–199.
Waller, N. G., & Meehl, P. E. (1998). Multivariate taxometric procedures. Thousand Oaks: Sage.