
Latent Variable Models in Clinical Psychology

Aidan G.C. Wright

University of Pittsburgh

Please cite as: Wright, A.G.C. (in press). Latent variable models in clinical psychology. In Wright, A.G.C., & Hallquist, M.N. (Eds.), Cambridge handbook of research methods in clinical psychology. New York, NY: Cambridge University Press.

Correspondence concerning this article should be addressed to Aidan Wright, Department of Psychology, University of Pittsburgh, 4119 Sennott Square, 210 S. Bouquet St., Pittsburgh, PA, 15260. E-mail: [email protected]

Abstract

Most of what clinical psychology concerns itself with is directly unobservable. Concepts like neuroticism and depression, but also learning and development, represent dispositions, states, or processes that must be inferred and cannot (currently) be directly measured. Latent variable modeling, as a statistical framework, encompasses a range of techniques that involve estimating the presence and effect of unobserved variables from observed data. This chapter provides a non-technical overview of latent variable modeling in clinical psychology. Dimensional latent variable models are emphasized, although categorical and hybrid models are touched on briefly. Challenges with specific models, such as the bifactor model, are discussed. Examples draw from the psychopathology literature.

Keywords: Latent Variable Models; Exploratory Structural Equation Modeling; Bifactor Models; Factor Mixture Models

Latent Variable Models in Clinical Psychology

Most of what clinical psychology concerns itself with is directly unobservable. Concepts like neuroticism and depression, but also learning and development, represent dispositions, states, or processes that must be inferred and cannot (currently) be directly measured. Latent variable modeling encompasses a range of techniques that involve estimating the presence and effect of unobserved variables from observed data. In other words, latent variable models are a class of statistical techniques that allow investigators to work at the construct level, even when the constructs in question elude direct measurement. They can also be used to directly compare distinct and often competing conceptualizations of mental disorder. The basic logic underpinning these approaches is similar to the diagnostician’s task of inferring an inaccessible disease state from outwardly available signs and symptoms. For instance, if an individual patient were to complain of lack of interest and/or pleasure, persistent low mood, guilt, as well as a number of vegetative symptoms like fatigue and appetite disturbance, the clinician might infer that they are suffering from depression. In this example, depression is a clinical concept that is presumed to drive the manifestation of these debilitating mental and physical states. If one wants to study depression, one would ideally have access not just to those observable features, all of which have other potential causes, but to what they share in the form of the unobserved depressive episode—latent variable models provide investigators direct access to the inferred construct. This is the crux of this set of techniques, although as I discuss below, they can be leveraged in creative ways to understand the very nature of psychopathology.

This chapter provides a non-technical introduction to latent variable models with an emphasis on how they can be used to answer challenging questions relevant to clinical psychology. Therefore, the emphasis is largely conceptual, and the reader is directed elsewhere for technical treatments and detailed instructions for applications (e.g., Bollen, 1989; Brown, 2015; Collins & Lanza, 2010; Loehlin, 2004; Mulaik, 2010). I begin with a discussion of the definition of latent variables, then follow with an elaboration of several exemplar models, including factor analytic techniques and mixture modeling. Portions of the model overviews borrow heavily from Wright (2017) and Wright & Zimmerman (2015), which can be consulted for additional detail. Throughout, examples from the clinical literature are provided.

Definitions and Conceptual Underpinnings

How should we think about latent variables? What are their defining features? What role can they play in applied clinical research programs?

Scholars and methodologists have answered these questions in different ways over the years, with some providing formal but narrow definitions in specific quantitative terms and others providing informal and loosely defined criteria. In a review on the use of latent variable models in psychology, Bollen (2002) summarized several of the most common definitions, and in so doing contrasted informal and formal definitions. On the informal side, Bollen (2002) noted that several authors have argued that latent variables are “hypothetical variables” or “hypothetical constructs” (Edwards & Bagozzi, 2000; Harman, 1960; Nunnally, 1978). Although this definition is certainly accurate, it is only conceptual, and does not link back to the formal statistical models in any clear way. Another common informal definition that Bollen (2002) identified is that latent variables are “unobservable or unmeasurable.” The major concern with this definition is that it presupposes advances in technology that may someday make measurable what currently defies measurement. Finally, some have argued that latent variables are merely summaries of the observed variables, serving little more than a descriptive function. Adopting this perspective unencumbers the researcher of making any challenging assumptions about the true nature of latent variables, but it also defangs the models and reduces them to their weakest form.

Although Bollen (2002) also reviewed a number of formal definitions that have been offered for latent variables over the years (e.g., the local independence definition), he noted that each of these is overly restrictive or too narrow for various reasons. Ultimately, he offered a new definition that is simultaneously formal yet non-restrictive—a latent variable is a variable (i.e., not a constant) for which there is no sample realization in a given sample. What this means is that any variable that is not directly observed in (at least some portion of) a sample is latent. One attractive feature of this definition is that it acknowledges that a variable may be latent in one sample (because it is unmeasured) but observed in another, and that this may be something that changes over time (e.g., as new technology is developed). This approach is also useful for accommodating certain technical aspects of latent variable models, such as their use in handling missing data, allowing for correlated residuals, and treating residuals in regressions and random effects in mixed effects models as latent variables. Finally, although very abstract, this definition does provide the necessary link between the unobserved and the observed data. Thus, this definition encompasses all of the informal definitions listed above, and other formal definitions often serve as special cases of this more general definition.

The “sample realization” definition is useful for establishing when a variable is latent, but it does not speak to their ontology. That is, how should we conceive of latent variables? Borsboom, Mellenbergh, and van Heerden (2003) took up this question in what has become a classic treatment on the conceptualization of latent variables. Borsboom and colleagues raise a number of technical points to motivate this question, but the fundamental issue they consider is whether latent variables exist independent of the data used to estimate them. That is to say, given any set of data, one can run and estimate a latent variable model, and if it fits well, the latent variable can be interpreted. However, that is not to say that anything meaningfully independent of the data has been ascertained. One needs to make an ontological assumption to link the operational latent variable estimated from the observed data to the formal latent variable of theoretical interest. Borsboom and colleagues describe two distinct ontological stances, although one of these has layers of stringency. The first stance is the realist stance, which assumes that the latent variable in question is something real in nature distinct from the data that are used to measure it. This can be contrasted with the constructivist stance, which regards latent variables as constructed by the human mind, and therefore as not existing independent of their measurement. An extreme variation on this latter perspective is that the latent variable is just a data reduction method, much like Bollen (2002) discussed as one informal definition of latent variables.

Borsboom and colleagues (2003) argue that to take latent variable modeling seriously, one must adopt a realist stance. That is, one must assume that the latent variable is something that exists in nature independent of the data and causes the patterns in observed data. In contrast, an operationalist perspective is fundamentally at odds with latent variable theory, and assumes that nothing more than a summary of the available data has been achieved. That is, any latent variable estimated from the data is not independent from it, and is just a construction of our minds that does not otherwise exist in reality.

Borsboom and colleagues’ arguments are well conceived and articulated, but they would seem to place clinical psychologists in an uncomfortable position by drawing a crisp distinction between realism and constructivism. Either you must assert that our constructs like depression, narcissism, obsessive-compulsivity, and the like are real and exist in nature per se, or you must adopt a constructivist stance that treats our theories and constructs as little more than summaries of what are often unsatisfyingly imprecise data. Given the current state of our science, I believe that a pseudo-realism (but probably more realistic) perspective is justified. Pseudo-realism argues that latent variables represent the sum of the shared processes generating covariation among some set of observed variables. In this perspective, one assumes that the latent data generating processes are real, but our understanding and labelling of these processes are only approximations of the real processes, due to our limited knowledge and poor measurement tools. Several implications follow from this perspective. First, a single latent variable need not represent a single process, and can (and likely does in clinical psychology) represent several processes at once. These can include processes of substantive interest (e.g., neuroticism) as well as artifacts of measurement (e.g., response acquiescence). Second, as with Bollen’s (2002) sample realization definition, the latent variable is assumed to exist separate from the data used to estimate it. Third, because there is no expectation that a latent variable be fixed and real as currently conceived, they require construct validation (Cronbach & Meehl, 1955) and further explication through any variety of research programs (see chapters in this volume by Williams and Simms and Lenzenweger).

To illustrate this perspective, I will borrow and expand on the thermometer example used by Bollen (2002). A thermometer offers a classic example of measuring and estimating a latent variable. In this case heat is the latent variable and the volume of mercury is the observed data. As heat increases mercury expands, and as heat decreases mercury contracts. Thus, latent heat, an unobserved variable, can be inferred based on the observed volume of mercury. It would seem that adopting a fully realist perspective here is warranted, as what could be more real than heat? But what is heat exactly? It should be clear that the concept of heat is a mere abstraction, and a more precise description of the link between the latent variable and the observed data requires an understanding of a process—namely, what has long been referred to as heat is now understood to be the transfer of energy among matter. The earliest thermometers predate this understanding by centuries or even millennia, depending on how you define a thermometer. In a similar fashion, in clinical psychology, we may treat as real what we can only assume to be approximations of the true underlying processes, and which we hope will be better understood and described, if not replaced, in the years, decades, and centuries to come. In this way, the psychopathologist may comfortably adopt a pseudo-realist stance, assuming that latent variables allow her to access our current best approximations of clinically relevant processes, even if she also holds the assumption that how we understand these variables will evolve over time.

Assuming that a latent variable exists separate from the observed data does not mean that all measures (i.e., data) will allow for an equivalent and equally good estimate of the latent variable, even if they are designed to do so. Common paradigms for data collection in clinical psychology research, including survey methodology (Samuel et al., this volume), functional magnetic resonance imaging (fMRI; Holmes & MacDonald, this volume), and peripheral psychophysiology (Levinson & Hajcak, this volume), are all based on implicit latent variable models, at least as they pertain to use in clinical psychology. Survey responses, blood oxygenation in the limbic system, and galvanic skin conductance have all been used as indicators of an underlying emotional state, where the underlying emotional state is presumed to drive the measured variables. At the same time, each of these is a very different type of data, and so it would be imprudent to assume that there are not large methodological processes at work driving the data in addition to the latent variable of interest. Thus, all of these could be used to estimate the same latent variable, but to do so care must be taken to isolate the process of interest.

Some texts provide coverage of both reflective and formative latent variables. Reflective latent variable models are those in which the latent variable causes the data (i.e., the variable is reflected in the data), whereas formative models are those where the data generate (i.e., form) the latent variable. As Borsboom and colleagues (2003) note, only the reflective latent variable assumes a realist interpretation. Therefore, formative variables will not be considered here, because they are something “different” in that they assume no shared processes in what generates them, although they can be useful to consider. For instance, Bollen (2002) refers to the concept of “exposure to media violence,” which might be estimated from the amount of violent television and movies watched and violent videogames played. It would be illogical to assume that the exposure causes the viewing and gaming, but rather the other way around. Other classic examples of putative formative latent variables, like socioeconomic status, are debatable. It may seem to some that an amalgam of variables like educational attainment, employment, neighborhood crime, and the like are distinct enough that they could only be combined arbitrarily. However, strong arguments can be made for social and institutional processes that do, in fact, contribute to positive associations among these variables that are distinct from the data. Whether psychopathology and other constructs relevant for clinical psychology are reflective or formative is no doubt a matter worthy of debate. If it is the latter, it is not clear that latent variable models are the best approach for interrogating their nature. Without foreclosing on this argument, this chapter is intended for those who are interested in using reflective latent variables to study clinical constructs.

Model Estimation and Evaluation

The basic logic of estimation underlying latent variable models is to find values for a proposed model’s parameters that would reproduce the observed data. Traditionally, latent variable modeling has used summary data (e.g., sample variances, covariances, and means), although contemporary methods make use of the raw data because doing so allows for certain attractive features (e.g., accommodating missingness and severely non-normal data).

Latent variables are estimated from the observed data through an iterative process, wherein a software-based algorithm auditions and repeatedly modifies parameter coefficient values in an effort to arrive at the best possible match to the data given the model’s structure. Although the exact mathematical procedures differ as a function of the underlying estimator, maximum likelihood estimation is the most widely used estimation technique and therefore will be used as an example. To keep this example simple, we can consider a basic confirmatory factor analysis (CFA; e.g., as depicted in Figure 1 Panel B). Underlying a CFA is a basic formula that defines how the observed variables are related to each other. In the model diagrammed in Figure 1 Panel B, circles refer to latent variables and boxes refer to observed variables. The two observed variables Y1 and Y2 are related to each other because they each are indicators of (i.e., are caused by; note that the arrows emerge from the latent variable towards the observed variables) the latent variable F1. Thus, their observed correlation is explained by their associations with the latent variable. In contrast, Y2 and Y5 are related to each other not by shared variance with the latent variable, but rather because they are each an indicator for latent variables that are correlated, as well as sharing an additional direct association depicted by the double-headed arrow between them. Without going into the details of the matrix algebra that can be used to calculate their predicted association given the model, it is important to understand that such a formula exists and can be used to generate an implied covariance matrix from the model’s parameter values. In maximum likelihood estimation, each parameter in the user-defined model is initially assigned a starting value, which can be used to calculate an initial implied covariance matrix. This matrix is then compared to the observed data, although presumably it is far off the observed values. However, this comparison can be used to adjust the parameter values so as to reduce the difference between the implied and observed data, and the amount of overall change in this step can be quantified. This step is repeated iteratively. Initially the amount of change is large, but through successive adjustments the amount of change between one set of values and the next becomes small. Once this discrepancy is small enough (i.e., reaches an established threshold), the model is said to have converged on a set of values. In the case of maximum likelihood estimation, these should be the set of values that maximize the likelihood of the data given the model. Importantly, these values do not necessarily provide a good match between the model implied and the observed data—rather, they provide the best match possible. That is, the model converges when it is no longer improving beyond a certain point at each iteration, not when the match between it and the data falls below some certain discrepancy. This has attractive features, namely the ability to falsify models by testing absolute fit as well as comparing models to each other to ascertain relative fit.
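To make this concrete, below is a minimal sketch of maximum likelihood estimation for the two-factor CFA just described, written in Python with simulated data. All names and values are hypothetical, and in practice one would use dedicated software (e.g., Mplus, lavaan), but the core logic is the same: compute the model-implied covariance matrix, compare it to the observed matrix, and iteratively adjust the parameters to minimize the discrepancy.

```python
import numpy as np
from scipy.optimize import minimize

# Toy data generated from a known two-factor structure
# (Y1-Y3 load on F1, Y4-Y6 load on F2, factor correlation .4).
rng = np.random.default_rng(1)
n, p = 500, 6
true_lam = np.zeros((p, 2))
true_lam[:3, 0] = [0.8, 0.7, 0.6]
true_lam[3:, 1] = [0.7, 0.8, 0.6]
true_phi = np.array([[1.0, 0.4], [0.4, 1.0]])
sigma_true = true_lam @ true_phi @ true_lam.T + np.diag([0.4] * p)
Y = rng.multivariate_normal(np.zeros(p), sigma_true, size=n)
S = np.cov(Y, rowvar=False)          # observed covariance matrix

def implied_cov(params):
    """Model-implied covariance: Sigma = Lam Phi Lam' + Theta."""
    lam = np.zeros((p, 2))
    lam[:3, 0] = params[0:3]          # loadings on F1
    lam[3:, 1] = params[3:6]          # loadings on F2
    phi = np.array([[1.0, params[6]], [params[6], 1.0]])  # factor correlation
    theta = np.diag(params[7:13])     # residual variances
    return lam @ phi @ lam.T + theta

def ml_discrepancy(params):
    """ML fit function: F = ln|Sigma| + tr(S Sigma^-1) - ln|S| - p."""
    sigma = implied_cov(params)
    sign, logdet = np.linalg.slogdet(sigma)
    if sign <= 0:
        return np.inf                 # reject non-positive-definite solutions
    return (logdet + np.trace(S @ np.linalg.inv(sigma))
            - np.linalg.slogdet(S)[1] - p)

# Starting values, then iterate until the discrepancy stops improving.
start = np.concatenate([np.full(6, 0.5), [0.2], np.full(6, 0.5)])
bounds = [(-2, 2)] * 6 + [(-0.99, 0.99)] + [(0.01, 2)] * 6
fit = minimize(ml_discrepancy, start, method="L-BFGS-B", bounds=bounds)
print("converged:", fit.success, "loadings:", np.round(fit.x[:6], 2))
```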

Once the model estimation steps arrive at the closest match between the model implied and the observed data, the set of resulting values can be evaluated. There are several ways in which models can be evaluated, including global tests or indices of how well the model fits the data (usually some form of chi-square test), tests of local strain that identify more circumscribed areas of misfit (e.g., residuals between the model implied covariances and the observed covariances), checks of whether the model has generated plausible values (e.g., no values that are out of bounds, like a negative variance), and tests of whether the model in question fits better than alternative models under consideration.
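Continuing the sketch above, the minimized discrepancy value can be converted into familiar global fit statistics, and the residual matrix can be inspected for local strain. The formulas follow standard SEM treatments; the degrees-of-freedom count is specific to this toy model.

```python
# Global fit from the minimized ML discrepancy (continues the sketch above).
df = p * (p + 1) // 2 - len(start)     # unique covariance moments minus parameters
chi_square = (n - 1) * fit.fun         # likelihood ratio test statistic
rmsea = np.sqrt(max(chi_square - df, 0) / (df * (n - 1)))
residuals = S - implied_cov(fit.x)     # local strain: residual covariances
print(f"chi2({df}) = {chi_square:.2f}, RMSEA = {rmsea:.3f}")
print("largest absolute residual:", np.round(np.abs(residuals).max(), 3))
```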

Mapping the Latent Variable Landscape

Latent variable models come in different shapes and sizes. I mean this quite literally, in that one of the major continua along which models can be organized is whether they assume the latent variable to be dimensional, categorical, or somewhere in the middle (Masyn, Henderson, & Greenbaum, 2010). Additionally, latent variable models can be distinguished from each other by whether they are exploratory or confirmatory in nature. Comprehensive coverage of all possible instantiations of latent variables would extend far beyond the scope of this chapter. Coverage here is selective, based on those models that have received the most usage and support in clinical psychology. In particular, dimensional latent variable models are emphasized, including exploratory factor analysis (EFA), confirmatory factor analysis (CFA), exploratory structural equation modeling (ESEM), and bifactor models. Following this, factor mixture models (i.e., categorical and hybrid latent variable models) are introduced briefly, and research is reviewed that has found consistent support favoring latent dimensional structures over these mixture structures. Finally, latent variable models can also be enlisted in the study of processes of change over time. Indeed, a latent growth curve model is just a CFA with factor loadings fixed by the user to specific values. However, due to space limitations longitudinal models will not be covered here; the interested reader is directed to Bollen and Curran (2006), Preacher and colleagues (2008), and Newsom (2015) for more detailed coverage, as well as Wood (Chapter 19, this volume) for advanced applications.

Exploratory Factor Analysis

The earliest latent variable models were exploratory factor analyses (EFA), first developed by Spearman (1904) as a quantitative approach to test his general theory of intelligence. He had observed that those who did well on one mental test tended to perform well on others, thereby perhaps reflecting an underlying cause. Initially, factor analytic approaches were limited to EFA, which is termed exploratory because the investigator does not specify the patterning of items loading on factors, and instead all associations between latent and observed variables are freely estimated (the reader is referred to Mulaik [2010] for detailed treatment of EFA). In Figure 1, Panel A provides a graphical representation of EFA. In this diagram square boxes represent observed variables, circles represent latent variables or factors, straight arrows connecting circles and squares represent factor loadings (i.e., the regression of the observed variable on the latent variable), arrows pointing only towards squares represent observed variable uniquenesses (i.e., variability not accounted for by the latent factors, which includes both unique variance and error variance), and curved arrows represent covariances/correlations. Additionally, solid lines are used to represent model specified parameters, whereas dashed lines represent parameters that can be specified by the investigator. In this example there are six observed variables and two correlated factors (i.e., an oblique model), and each of the observed variables loads on each of the two factors.

In EFA the investigator does not assign observed variables to factors; rather, the relationship between each is estimated and the pattern of loadings is evaluated or “interpreted” after the analysis is run. Because of this, EFA has sometimes been called an atheoretical analytic approach, which is unfortunate, as many aspects of EFA are, in fact, theoretically driven. For one, the latent variables are assumed to be dimensional and normally distributed. Second, it is frequently the case that the investigator has some hypothesis about how many factors are needed to account for the observed variables. Consequently, usually there is a theory, or perhaps this is better construed as an expectation, about which observed variables serve as significant markers for the same factors. More generally, the key modeling decisions in EFA (e.g., selecting which items to include, number of factors to retain, etc.) should ideally be made based on substantive theory. For instance, factors must be interpreted and labeled, and the emergence of a factor that is uninterpretable may prompt one to select fewer factors, drop items, or collect more data. EFA can be a very interactive technique, in the sense that several models are often run under different conditions and compared before settling on a final solution.

An investigator should consider three core questions when conducting an EFA: 1. Which observed variables should be included in order to arrive at a valid latent structure (a concern true of statistical modeling in general)? One concern here is that if too few indicators for a specific construct are included, a corresponding factor is unlikely to emerge and will not be well determined if it does. An example of this can be found in Wright et al. (2013), which examined the latent structure of a number of symptoms of mental disorders in an Australian epidemiological sample. There were only two markers of manic episodes, which is insufficient to determine a distinct mania factor. On the flip side of that coin, an overrepresentation of content from a particular construct will almost guarantee a separate factor, even if the construct is subordinate to another domain (i.e., a bloated specific; see Oltmanns & Widiger, 2016, for a relevant example). 2. How many factors should be retained? Contemporary best practices for selecting the number of factors to retain involve consulting quantitative criteria like Horn’s (1965) parallel analysis, Velicer’s (1976) minimum average partial test, Ruscio and Roche’s (2012) comparison data technique, and model fit criteria (e.g., chi-square, RMSEA) when available based on the estimator (e.g., maximum likelihood) to inform the number to retain.

However, regardless of which methods are used, these are fallible tools that should be weighed in the decision but not blindly followed. The investigator should still make careful choices based on all pertinent information, especially theory. 3. How should these factors be rotated? Factor rotation involves adjusting the relationship between the factors and the indicators so that they are more interpretable. Usually this involves using an algorithm (e.g., Varimax, Oblimin, Geomin) to try to achieve something that approximates simple structure (i.e., each indicator loads on only one factor). Despite the many options available for factor rotation (see Sass & Schmitt, 2010, for a review), the most important distinction is between orthogonal and oblique factors. In an orthogonal rotation, the factors are forced to be unrelated to each other, whereas in an oblique rotation factors are allowed to correlate. Oblique rotation methods are generally preferable because they do not preclude an orthogonal solution from emerging, but allow for substantial factor correlations when indicated. This is a key consideration in psychopathology research, given that there are theoretical and empirical rationales for why factors might be expected to correlate substantially (Caspi & Moffitt, 2018; Sharp et al., 2015). However, factor rotation will potentially have non-negligible effects on factor interpretation, and therefore it should be given thorough consideration. This is especially the case in psychopathology data, where indicators tend to be positively correlated. Recall that the model will attempt to reproduce the observed correlations with the model parameters. Therefore, in an orthogonal model where factors are uncorrelated, the only way that indicators can be associated is “through” the factors, by increasing the factor loadings (particularly secondary or cross-loadings). In an oblique model, because the factors are allowed to correlate, thereby accounting for some of the indicator associations, the secondary loadings are often relatively smaller. See Sharp et al. (2015) for an example of this phenomenon. Conceptually and practically this becomes a question of where you want to locate the model complexity: at the level of the items or at the level of latent constructs.
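As a concrete illustration of the second question, below is a bare-bones sketch of Horn’s (1965) parallel analysis in Python. The function and data are hypothetical stand-ins; dedicated implementations (e.g., in the R psych package or Python’s factor_analyzer) are more complete and should be preferred in practice.

```python
import numpy as np

def parallel_analysis(data, n_sims=100, percentile=95, seed=0):
    """Retain factors whose observed eigenvalues exceed those of random data.

    A sketch: full implementations compare eigenvalues sequentially and
    offer common-factor (rather than principal component) variants.
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs_eigs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    sim_eigs = np.empty((n_sims, p))
    for i in range(n_sims):
        noise = rng.standard_normal((n, p))   # uncorrelated comparison data
        sim_eigs[i] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    threshold = np.percentile(sim_eigs, percentile, axis=0)
    return int(np.sum(obs_eigs > threshold))

# Hypothetical usage with the simulated six-indicator data from earlier:
# parallel_analysis(Y)   # expected to suggest two factors
```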

In clinical psychology EFA is often used in the development of assessment tools and measurement inventories (Clark & Watson, 1995; see also Furr, this volume). Often a large number of items are auditioned as potential indicators for latent constructs of interest, and items are retained or excluded based on their pattern and strength of factor loadings. It has also been put to good use in studying the structure of psychopathology, particularly as quantitative structural models of psychopathology have started to incorporate large numbers of indicators without precise a priori hypotheses about how they should relate to each other (e.g., Forbes et al., 2017; Wright & Simms, 2015).

Confirmatory Factor Analysis

As the name indicates, unlike EFA, CFA is intended to serve primarily as a hypothesis testing analytic approach (the reader is referred to Brown [2015] for detailed discussion of CFA techniques). The confirmatory aspects are that (a) the user may specify any of the model parameters, and (b) the fit (or, more specifically, the lack of fit) of the observed data to the specified model is tested. Figure 1, Panel B, illustrates a typical hypothetical two-factor CFA. In this model the observed variables Y1-Y3 serve as indicators of latent factor F1 only, and Y4-Y6 serve as indicators of F2 only. Note that the CFA in Panel B differs from the EFA in Panel A in that each factor loading was user specified, and not all items load on each factor. Much like the EFA model, the factors are allowed to correlate, making it an oblique model. However, there is no rotation to choose; in CFA factors are either correlated (oblique) or uncorrelated (orthogonal). This is because in CFA the investigator has the ability to impose true simple structure (i.e., indicators load on one factor and not at all on other factors), which rotation algorithms are designed to approximate. Further, each observed variable has a residual variance, reflecting unique variability unaccounted for by the factor plus measurement error. Finally, notice the curved arrow between Y2 and Y5. This reflects a residual covariance, indicating that there is shared variance in items Y2 and Y5 unaccounted for by the modeled factors.

Recall that when testing this model, the statistical package would first optimize the values of the parameters in an effort to match the data, then it would compare the model implied covariance matrix to the observed covariance matrix and generate goodness-of-fit indices based on the degree of match. Each modeling decision has implications for the implied pattern of covariation. For instance, in the case where there are no free error covariances, the factors must account for all of the covariation among the observed variables (i.e., conditional independence). Any unaccounted-for residual covariation in the actual data will contribute to worse fit.

CFA does allow for deviation from the assumption of conditional independence. Factor models are usually specified such that there is no covariance among the indicator residuals, the assumption being that the observed variables are independent from each other once the factors are accounted for (i.e., conditional on the factors). Although this assumption is reasonable given the goal of factor analysis, relaxing it has legitimate uses. For instance, it can be used to account for method variance between specific item sets (e.g., scales from the same instrument, scales completed by the same reporter). However, unprincipled use of residual covariances is discouraged, as it can capitalize on chance in any given data set, especially when sample size is large, and result in non-replicable model complexity.
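As a sketch of how such a model might be specified in software, the snippet below uses the Python semopy package with lavaan-style model syntax (the package choice and variable names are my assumptions for illustration; lavaan in R or Mplus are the more common tools). It specifies the two-factor CFA of Figure 1 Panel B, including the Y2-Y5 residual covariance discussed above.

```python
import pandas as pd
import semopy  # assumed available: pip install semopy

# Hypothetical data frame with columns Y1-Y6 (e.g., the simulated
# data from the estimation sketch earlier).
data = pd.DataFrame(Y, columns=[f"Y{i}" for i in range(1, 7)])

# Model description: '=~' defines loadings, '~~' defines covariances.
# F1 ~~ F2 makes the model oblique; Y2 ~~ Y5 frees one residual
# covariance, relaxing conditional independence for that pair only.
desc = """
F1 =~ Y1 + Y2 + Y3
F2 =~ Y4 + Y5 + Y6
F1 ~~ F2
Y2 ~~ Y5
"""

model = semopy.Model(desc)
model.fit(data)
print(model.inspect())           # parameter estimates
print(semopy.calc_stats(model))  # global fit indices (chi-square, RMSEA, CFI)
```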

CFA has a long track record of use in studying the latent structure of psychopathology (e.g., Brown, Chorpita, & Barlow, 1998; Girard et al., 2017; Kotov et al., 2011; Krueger, 1999; Wright et al., 2013). The observation that certain patterns of diagnostic covariation occur at rates much higher than chance beckons for latent variable hypotheses; specifically, that there might be underlying processes that generate these patterns of observed covariation. In the adult psychopathology literature, the seminal study by Krueger (1999) used the National Comorbidity Survey data to show that patterns of diagnostic covariation among the common mental disorders could be accounted for by two broad factors of Internalizing (mood and anxiety disorders) and Externalizing (substance use and antisocial behavior). CFA also forms the measurement model in full structural equation models, which additionally allow for structural relations (i.e., regression paths) among latent variables. Thus, CFAs are frequently used to develop error-free measures of latent constructs that are then related in more complex patterns of associations in structural equation models.

The Exploratory-Confirmatory Spectrum

Often exploration and confirmation are presented as if they are discrete and mutually exclusive modes of inquiry. In practice, though, the boundary between them is much fuzzier. For instance, it is hard to imagine someone conducting an EFA on a set of items with no intuition or expectations about what she might find. Similarly, it is hard to imagine testing a CFA with not only all latent-to-observed relations articulated, but also their precise values specified a priori. Indeed, either of these scenarios might represent unlikely extremes along a continuum of exploratory to confirmatory. In practice, latent variable modeling contains some mixture of both—in EFAs the investigator usually has some sense about the general structure that might emerge, although the details are quite uncertain, whereas in CFA the investigator often enjoys much greater confidence about many aspects of the model, but some degree of uncertainty remains about the exact values that might emerge. Furthermore, there is likely a willingness to make alterations to non-central features of the model without doing extreme violence to the theoretical structure being tested (e.g., there may be strong theory about the number of factors but weaker theory about whether there are cross-loadings or not). This led me to propose the Exploratory-Confirmatory Spectrum (Wright, 2017) in factor analysis, which is intended to better contextualize one’s degree of a priori knowledge and expectations about the results. Typical EFA modeling scenarios would fall toward the exploratory end of the spectrum and typical CFA estimation would fall toward the confirmatory end, although both could be pushed further out with procedures of the sort I alluded to above. Falling in the middle between the two are variations in certainty about the expected structure in EFA and willingness to make modifications in CFA. The two poles of the spectrum are brought together with Exploratory Structural Equation Modeling (ESEM; Asparouhov & Muthén, 2009).

ESEM blends the core features of EFA (i.e., exploratory factors, a range of rotations) and CFA (i.e., the ability to specify parameters, user specified factors, multiple group analysis), allowing for near total flexibility in modeling. Numerous advantages are gained by this innovation, including the ability to estimate method factors in EFA analyses of multiple scales from different measures, to allow correlated residuals, and to impose parameter equalities across scientifically interesting groups (e.g., genders, patients vs. non-patients). Figure 1, Panel C, provides a hypothetical example of an ESEM model. In this diagram, in addition to two obliquely rotated EFA factors (F1 and F2), there is a third, investigator specified factor (F3) that is orthogonal to the other two. F3 could perhaps represent shared method variance for observed variables Y1-Y3, or that they are markers for more than one construct. Finally, the residuals for Y4 and Y6 are allowed to correlate. In the modeling of complex personality data with large item sets, ESEM benefits from the efficiencies of the EFA framework, while allowing the investigator control over specific modeling features that are afforded with CFA.

Similar to CFA, ESEM relies on estimation methods that ultimately result in an implied covariance matrix that can be compared to an observed matrix in various ways to generate goodness-of-fit indices. The fact that the EFA portion of the structure can model a large number of potentially conceptually negligible but statistically significant cross-loadings generally results in considerable improvement in fit over a strict (and implausible) simple structure imposed by many CFAs. However, it is worth emphasizing that factor analytic techniques are largely separable from the estimation approach. While certain estimation methods (e.g., principal factor analysis) are reserved for EFA, estimators like maximum likelihood and weighted least squares can be applied to EFA, CFA, or ESEM. This underappreciated fact often results in models erroneously labeled as ESEMs, when in reality only a maximum likelihood EFA has been conducted. Although this produces fit criteria, no additional user specified parameters have been included. EFA is a very useful technique, and the objection to labeling a maximum likelihood EFA an ESEM is that it creates the perception that there are user specified parameters when the user has not specified any beyond a standard EFA. Alternatively, a maximum likelihood EFA can be considered a special case of ESEM, and the same can be said for CFA.

Given that well validated personality inventories often fit poorly in CFA models (Hopwood & Donnellan, 2010), personality researchers generally have been early adopters of ESEM. Within clinical psychology, ESEM was in turn quickly adopted by personality disorder researchers, who deal with similar issues. For instance, Gore and Widiger (2013) estimated the joint factor structure of four personality inventories to examine whether the normal range and pathological scales combined to indicate the same five factors. An initial maximum likelihood EFA resulted in poor fit to the data, so an ESEM was estimated allowing the residuals of indicators from the same personality inventory to correlate across factors. This ultimately resulted in an excellently fitting model, and a theoretically expected five-factor structure. In this case an ESEM allowed Gore and Widiger (2013) to account for the dependency among scales from the same inventory within an otherwise exploratory analytic framework. Other examples include Wright and Simms (2014, 2015), who dealt with the same issue in a similar but distinct fashion by estimating method factors for each inventory used in an otherwise exploratory model. For example, Wright and Simms (2015) tested whether the joint structure of clinical syndromes, DSM personality disorder dimensions, and maladaptive personality trait scales would conform to a recognizable five-factor structure (Negative Affectivity, Detachment, Antagonism, Disinhibition, and Psychoticism). In addition to estimating correlated substantive exploratory factors on which all items loaded, orthogonal measure-specific factors were included for the clinical syndrome interview, the personality disorder interview, and the trait scales, on which all indicators from each measure loaded. This served to isolate shared method variance while retaining substantive variance in each indicator.

The advent of ESEM offers investigators considerably more flexibility than EFA or CFA alone. Researchers are encouraged to think of models not as either exploratory or confirmatory, but as falling somewhere along a continuum between those two poles. Thought should be given to whether any parameters can be specified and tested, even if parts of the model will be determined via exploratory techniques.

Bifactor Models

In recent years, the bifactor model has made a considerable impact on our conceptualization of the structure of psychopathology (see Greene et al., in press, for a review). Although it was first described in the 1930s (Holzinger & Swineford, 1937), the bifactor model received little attention in the applied literature until the mid to late 2000s. But since then it has had an outsized effect on the field; so much so that it deserves its own section to unpack the merits and demerits of the model. First, it is important to understand that a bifactor model is just an ordinary factor model (it can be estimated as either an EFA or CFA), but with a specific structure (see Figure 1, Panel D). The key feature is that a general factor, on which all indicators load, is estimated as orthogonal to some number of “group” or “specific” factors, on each of which only a subset of items loads (or, in the case of EFA, loads strongly). The specific factors may be either orthogonal or oblique, although orthogonal is more common. In Figure 1 Panel D, a bifactor model is depicted with a general factor and two oblique specific factors.
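To make the loading structure concrete, the following sketch builds the implied covariance matrix for a hypothetical six-indicator bifactor model with fully orthogonal factors (all values are made up for illustration). Under orthogonality, each indicator’s common variance partitions cleanly into a general portion and a specific portion.

```python
import numpy as np

# Hypothetical bifactor loading matrix: column 0 is the general factor
# (all indicators load), columns 1-2 are specific factors (subsets load).
lam = np.array([
    [0.6, 0.4, 0.0],   # Y1: general + specific factor 1
    [0.6, 0.5, 0.0],   # Y2
    [0.6, 0.3, 0.0],   # Y3
    [0.5, 0.0, 0.5],   # Y4: general + specific factor 2
    [0.5, 0.0, 0.4],   # Y5
    [0.5, 0.0, 0.6],   # Y6
])
phi = np.eye(3)                               # fully orthogonal factors
theta = np.diag(1 - (lam ** 2).sum(axis=1))   # residuals for unit variances
sigma = lam @ phi @ lam.T + theta             # implied covariance matrix

# With orthogonal factors, each indicator's common variance partitions
# cleanly: e.g., for Y1, general = 0.6**2 and specific = 0.4**2.
print(np.round(sigma, 2))
```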

The principal motivation for estimating a bifactor model is to partition the common variance in each indicator into what it shares with all other indicators (i.e., general variance) and into what it shares with only a smaller subset of indicators (i.e., specific or group variance). Conceptually, this can be understood to isolate those processes that are driving scores on all indicators from those processes that are driving scores on specific groupings. For instance, recently Fournier and colleagues (in press) estimated a bifactor model of the NEO-PI-R’s (Costa & McCrae, 1992) neuroticism scales. The NEO-PI-R is structured hierarchically by design, such that 8 individual questions are combined to form facet scales, and 6 facet scales are combined to form a domain, like neuroticism. Neuroticism facets include depression, anxiety, angry hostility, impulsiveness, self-consciousness, and vulnerability. Fournier et al.’s (in press) motivation was to examine whether what was specific to the depression facet was incrementally related to external variables beyond the general factor (i.e., what was shared across all indicators). Thus, a bifactor model was estimated to partition the variance into neuroticism-general and depression-specific processes. Findings demonstrated, across a community and a clinical sample, that although the standard NEO-PI-R depression scale correlated significantly and strongly with other depression measures, the specific depression facet factor in a bifactor model was unassociated with other depression measures. In other words, whatever processes account for the association between the NEO-PI-R depression facet and other measures of depression, these are processes that are general to all neuroticism items and not specific to the depression items.

This sort of theoretically driven analysis, which seeks to partition and separately examine distinct sources of variance in observed indicators, is something the bifactor model is uniquely situated to address (see also Sharp et al., 2015, for another example). However, the bifactor’s popularity has also been driven by its use as an estimate of a higher-order factor in a hierarchical structure while also providing exceptionally good fit to the data. To elaborate, factors estimated from observed indicators can also be used as indicators for higher-order factors, assuming there are sufficient lower-order factors present to identify the higher-order factor (i.e., at least 3). See Panel A in Figure 2 for an example. The bifactor model is an alternative approach to estimating a higher-order factor, in that the general factor in the bifactor model captures shared variance in all indicators, and group or specific factors can be estimated to capture what is unique in the indicators for the aforementioned first-order factors. See Panel B in Figure 2 for a bifactor hierarchical model in the same data. The general factor and the higher-order factor in these models will be very similar, in that the general factor captures what all the indicators share and the higher-order factor captures what is shared in the lower-order factors, which themselves are based on what groups of indicators share. At the same time, they are not identical, because in the higher-order factor model the shared variance is “funneled” through the lower-order factors on its way up to the higher-order factor. The implication is that any additional shared variance among the indicators that is not captured by the group factors would not be explained by the higher-order factor, but would be explained by the general factor in the bifactor model. Importantly, the specific factors will not be identical to, and in many cases in clinical research will be quite distinct from, the first-order factors in a standard higher-order factor model, because the specific factors will only be estimated from the variance in a subset of indicators net of the variance shared with all items. As I will describe below, this last issue has become a point of criticism of the bifactor approach, but those criticisms often neglect an important reality of higher-order models.
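The close relationship between the two parameterizations can be shown directly. In the sketch below (hypothetical values throughout), a higher-order model’s implied covariance matrix is reproduced exactly by a bifactor-form solution via a Schmid-Leiman-style transformation, in which the general loadings are proportional to the first-order loadings. A freely estimated bifactor model relaxes exactly this proportionality constraint, which is one reason it tends to accommodate the data more easily.

```python
import numpy as np

# Hypothetical higher-order model: six indicators, two first-order factors,
# one higher-order factor (cf. Figure 2 Panel A).
lam1 = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                 [0.0, 0.7], [0.0, 0.8], [0.0, 0.6]])  # first-order loadings
gamma = np.array([[0.7], [0.6]])          # loadings of F1, F2 on the higher-order factor
psi = np.diag([1 - 0.7**2, 1 - 0.6**2])   # first-order disturbance variances
theta = np.diag(1 - (lam1 ** 2).sum(axis=1))

# Implied covariance of the higher-order model.
phi = gamma @ gamma.T + psi
sigma_ho = lam1 @ phi @ lam1.T + theta

# Schmid-Leiman-style transformation to an equivalent bifactor form:
# general loadings = lam1 * gamma (hence proportional to the first-order
# loadings); specific loadings are scaled by the disturbance SDs.
general = lam1 @ gamma
specific = lam1 @ np.sqrt(psi)
lam_bf = np.hstack([general, specific])
sigma_bf = lam_bf @ lam_bf.T + theta

print(np.allclose(sigma_ho, sigma_bf))    # True: identical implied covariance
```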

In the domain of psychopathology structure, although conceptually distinct domains like internalizing and externalizing have been estimated in many data sets, they often correlate strongly (e.g., population sample r ~ .5). This has led some to hypothesize a higher-order general factor of psychopathology, or the “p-factor” (Caspi et al., 2014; Lahey et al., 2012; Lahey et al., 2015). In their seminal paper on the topic, Caspi and colleagues (2014) referred to the p-factor as the propensity or liability to developing psychopathology. They compared models with three correlated factors (internalizing, externalizing, and thought disorder) explaining the covariation of mental disorder diagnoses over the lifetime with several other structures, including a bifactor. In this early work, the bifactor model was chosen over the correlated factors model and alternatives because of its markedly superior relative fit in the data. Note that a three-factor oblique model would have identical statistical fit to a higher-order model with one factor explaining the covariation among the three factors. Although papers examining a general factor of psychopathology (e.g., Lahey et al., 2012) predate the work by Caspi et al. (2014), this was the paper that ignited an explosion of literature over the past half-decade examining the p-factor and often comparing its fit to a correlated factor model (Greene et al., in press).

As with any highly visible and potentially transformative research program, the p-factor research has met with detractors, and critics have often taken aim at the bifactor model specifically. Concerns have been raised on both quantitative and conceptual grounds, and these extend well beyond the domain of clinical psychology and p-factors. Nevertheless, they are important to cover here because of the outsized impact this model has had on the field and because bifactors are now used extensively in clinical psychology, not just in the p-factor literature. The statistical concerns largely center on the fact that bifactor models are exceptionally good at matching the empirical structure of the data—perhaps too good, in that some have raised concerns about bias and ill-performing fit indices in comparisons of bifactor models to other models (Bonifay, 2017; Gignac, 2016; Greene et al., in press; Mansolf & Reise, 2017; Morgan, Hodge, Wells, & Watkins, 2015; Murray & Johnson, 2013; Reise, Kim, Mansolf, & Widaman, 2016). To summarize a technically complex and nuanced literature in a couple of brief points, what these studies have shown is that the bifactor model is better able to accommodate departures from the expected pattern of effects generated by other models, most notably the higher-order model described above. In other words, when factor models are not a faithful match to the true data generating mechanism, which is always the case to some degree, the features of the bifactor model allow it to better approximate the data than comparable models, even if it is not the true data generating mechanism. The types of misspecifications that would lead fit statistics to favor the bifactor model over the higher-order factor model, even when the higher-order factor model is the true model, include un-modeled cross-loadings of items on the first-order factors and residual correlations. An added issue is that these sorts of model features can arise by chance in any given data set, thereby favoring the bifactor spuriously.

However, in applied settings the true data generating mechanism is unknown, and whether effects are spurious or not may not be known either. As Mansolf and Reise (2017) put it, this leaves applied researchers with a conundrum. If the fit statistics favor the bifactor model, that could be because it is a better representation of the data, or it could be because it has accommodated unmodeled (and possibly spurious) complexities in the structure arising from something like the higher-order model. They and Gignac (2016) also point out that there are conditions under which the bifactor model will have identical fit to the higher-order factor model, even when the bifactor model is the true data generating mechanism. Ultimately, what this boils down to is that selecting the bifactor model or a comparator based solely on fit is dubious practice; instead, conceptual arguments need to be made in favor of one or the other. As noted above, quantitative indices like fit statistics are useful but fallible guides.

Concerns have also been raised about the bifactor model on conceptual grounds. One of the major issues has been that partitioning variance into general and specific in this way is something of a false exercise because individuals only produce one set of scores, not multiple sources of scores. Or, alternatively, one cannot determine from a single set of scores what underlying processes generated their manifestation. This criticism fundamentally misunderstands the utility of latent variable modeling as providing access to something distinct from the observed data.

A second criticism has been that bifactor models can often generate “nonsensical” specific factor loadings. For instance, it is not uncommon to find null or even negative loadings of indicators on a specific factor, when those same indicators would positively and strongly load on a first-order factor. An example of this can be found in the item loadings for the depression facet in the community sample of the aforementioned Fournier et al. (in press). The concern is that the specific factors in a bifactor model “are not the same thing” as the first-order factors in a higher-order model. This is most assuredly the case, but whether it is a reasonable comparison is another matter. The correct comparison would not be to the first-order factor in a higher-order factor model, but rather to the residuals (sometimes referred to as disturbances) of those first-order factors after partialling out the higher-order factor (see Figure 2). Often relatively little variance remains in the first-order factor after accounting for the higher-order factor, and these residuals are rarely if ever evaluated independently. One of the reasons this is the case is that it can be difficult to estimate identified structural models to evaluate the effect of these residuals.

In sum, the bifactor model can be a useful tool for partitioning variance into general and specific components when theoretically justified. Yet it is likely that the field has placed too much weight and confidence in the model due to its ability to provide a better match to what is often complex observed data. Moving forward, applied researchers should understand the model for what it is and use it when it makes strong conceptual sense, placing less emphasis on its statistical fit.

Factor Mixture Modeling and Comparing Latent Structures

To this point, only dimensional latent variable models have been considered, with the shared assumption being that the latent variables are continuous and normally distributed. This distributional assumption is not a requirement and, as introduced above, models exist that assume non-normality and discontinuities in the latent space. Although initially developed as distinct models, categorical (e.g., latent class and latent profile analysis) and dimensional latent variable models can now be subsumed within the broader framework of factor mixture modeling. Factor mixtures are not limited to simplistic categorical vs. dimensional dichotomies, but rather encompass a Categorical-Dimensional Spectrum (Masyn, Henderson, & Greenbaum, 2010). Thus, the modeled structures can range from the fully dimensional (i.e., factor analyses) to the fully categorical (i.e., latent class analysis), with variations that combine aspects of the two in between (see also Hallquist & Wright, 2014). To illustrate some of the latent structures that are possible in this framework, Figure 3 provides graphical depictions of many, but not all, of the possible structures. Factor analysis (Panel A) represents one pole of the categorical-dimensional spectrum, and assumes a fully dimensional latent structure, such that individuals vary continuously along a normally distributed latent trait (or traits). At the other end of the spectrum is latent class analysis (Panel F), which assumes a fully categorical latent structure, such that individuals differ discretely from each other exclusively in terms of a pattern of features shared among a homogenous subgroup. In terms of hybrid models, Semi-Parametric Factor Analysis (Panel B) estimates a mixture of normally distributed groups along a common dimension to model a non-normal, but continuous, distribution. Thus, individuals vary along the same trait, but the model allows for an extreme tail, or other non-normal (e.g., bimodal) distributions. Alternatively, Non-Parametric Factor Analysis (also referred to as Located Latent Class Analysis; Panel C) models discrete latent groups along a shared dimension. In this case, there are defined “gaps” between latent groups of individuals along the same latent trait. It is also possible to model factor structures that differ across groups (Panel D), which imply different latent dimensions, or that the questions or symptoms have different meanings or function differently across groups. This can even be extended to include discrete disjunctions in those factors (Panel E). This is not an exhaustive catalogue of these models, but rather a sampling to encourage researchers and practitioners to think in more nuanced ways about the possible latent structure of clinical constructs beyond the typical categories and continua. Importantly, these can all be estimated and compared with each other in real data to test theoretical assumptions about the actual latent structure of pathology. This is made possible by advances in maximum likelihood estimation that place all of these models on comparable quantitative footing.
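As a schematic illustration of how such comparisons proceed, the sketch below pits a one-factor dimensional model against two- and three-class latent profile models on the same simulated data and compares them with BIC. This is a deliberately stripped-down stand-in for full factor mixture modeling (typically conducted in Mplus or similar software); the data, parameter counts, and package choices are my own assumptions.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
n, p = 1000, 6

# Hypothetical data generated from a single continuous latent dimension.
eta = rng.standard_normal((n, 1))
X = eta @ np.full((1, p), 0.7) + rng.standard_normal((n, p)) * 0.6

# Dimensional model: one-factor analysis, with BIC computed by hand.
fa = FactorAnalysis(n_components=1).fit(X)
ll_fa = fa.score(X) * n            # total log-likelihood
k_fa = p * 1 + p + p               # loadings + residual variances + means
bic_fa = -2 * ll_fa + k_fa * np.log(n)

# Categorical models: latent-profile-style mixtures with 2 and 3 classes.
bic_lpa = {g: GaussianMixture(n_components=g, covariance_type="diag",
                              random_state=0).fit(X).bic(X)
           for g in (2, 3)}

print(f"BIC one-factor: {bic_fa:.1f}")
print({f"{g} classes": round(b, 1) for g, b in bic_lpa.items()})
# Lower BIC is better; with truly dimensional data the factor model should win.
```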

Adjudicating debates about the latent structure of psychopathology is one fruitful use of factor mixture modeling in clinical psychology. Specifically, numerous studies have now accumulated that compare dimensional, categorical, and hybrid structures to determine which the empirical data fit best. In a typical example of this literature, Conway, Hammen, and Brennan (2012) examined the latent structure of the DSM's nine borderline personality disorder criteria in a large community sample of young adults at risk for psychopathology. They compared dimensional (factor analytic), categorical (latent class analysis), and hybrid (non-parametric factor analysis) models, finding that the data best fit a fully dimensional latent structure. In a more recent study, Aslinger and colleagues (2018) performed similar analyses on the DSM's narcissistic personality disorder criteria, but extended the typical approach by including semi-parametric factor analyses as an alternative hybrid model and by attempting to replicate the initial exploratory results across five samples. The consistent finding favored the dimensional model (CFA) over hybrid or fully categorical models. Expanding the lens beyond single disorders to broad domains of psychopathology (e.g., internalizing, externalizing, psychosis), the findings have been consistent, with fit criteria favoring latent dimensional models over categorical or hybrid models (e.g., Eaton et al., 2013; Markon & Krueger, 2005; Walton, Ormel, & Krueger, 2011; Wendt et al., in press; Witkiewitz et al., 2013; Wright et al., 2013).
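A brief note on what "fit" means in these comparisons: because all of the candidate models are estimated by maximum likelihood, they are typically compared with information criteria that balance likelihood against complexity. The Bayesian information criterion, for example, is defined as

$$\mathrm{BIC} = -2\ln\hat{L} + k\ln(n),$$

where $\hat{L}$ is the maximized likelihood of the model, $k$ is the number of freely estimated parameters, and $n$ is the sample size. Lower values indicate a better balance of fit and parsimony, which is how a spare one-factor model can outperform a more heavily parameterized mixture.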

Although the broad picture is quite consistent, suggesting that the latent structure of psychopathology is likely dimensional in nature, several other aspects of this modeling space are worth noting. For one, factor mixture models can quickly become complex, because they allow essentially any parameter to vary across classes (e.g., factor loadings, variances, residuals, intercepts). Exhaustive testing of structures is therefore impractical and runs the risk of capitalizing on chance. As a result, most applied use of factor mixtures in clinical science has been restricted to the coarse comparisons outlined in the preceding paragraph (cf. Hallquist & Pilkonis, 2012). It may be that judicious and targeted selection of hybrid models will prove useful for some questions; one example is the modeling of non-normality in a dimensional latent space (Masyn et al., 2010). A further consideration is that even when one latent structure is the structure estimated, the results may implicitly support another. For instance, Bornovalova and colleagues (2016) estimated a latent class model of borderline personality disorder symptoms, but the four retained classes differed from each other only in severity, not in the configuration of symptom endorsements (i.e., quantitative, not qualitative, differences between classes), suggesting a dimensional latent structure. That a particular structure can be estimated, fits well, and yields sensible parameter values does not mean it is the most appropriate structure. In many respects this mirrors the issues surrounding bifactor modeling, and it serves as a reminder that considerations of interpretability and conceptualization must be close bedfellows of quantitative indices.
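The quantitative-versus-qualitative distinction can be probed directly. The sketch below is a heuristic of my own construction, not the procedure used in the study just described (Python and scikit-learn again assumed): after fitting a class model to trait-generated data, it inspects whether the class mean profiles are essentially parallel and differ mainly in elevation, which is the severity-only pattern a latent dimension would produce.

```python
# A heuristic check (illustrative only; not the procedure used by
# Bornovalova and colleagues) of whether retained classes differ in
# configuration or only in severity. Assumes scikit-learn.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(seed=2)
n, p = 500, 6
# Data again generated from a single latent trait, as in the prior sketch.
X = (rng.normal(size=(n, 1)) @ rng.uniform(0.5, 0.9, size=(1, p))
     + rng.normal(scale=0.6, size=(n, p)))

model = GaussianMixture(n_components=4, covariance_type="diag",
                        random_state=2).fit(X)

# Sort the class mean profiles by overall elevation (mean across items).
order = np.argsort(model.means_.mean(axis=1))
profiles = model.means_[order]

# If each profile is highly correlated with the most severe class's
# profile and the classes differ mainly in elevation, they look like
# ordered severity strata rather than qualitatively distinct subgroups.
for i, profile in enumerate(profiles):
    r = np.corrcoef(profile, profiles[-1])[0, 1]
    print(f"class {i}: elevation = {profile.mean():.2f}, "
          f"r with most severe class = {r:.2f}")
```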

Conclusion

Latent variable models play an important role in contemporary clinical psychology. Here I have provided a conceptual overview of these techniques, in part because any technical treatment limited to a single chapter would be woefully incomplete. More important, though, the utility of these models rests on how we think about them and what they can tell us. At the start, I argued that it is important to take the ontology of latent variable models seriously, but that does not mean one must adopt an extreme position. The pseudo-realist perspective asserts that the estimated latent variables are meaningful and distinct from the data, but hedges on whether they identify one true underlying cause. In fact, as will be clear to most readers, much of what clinical psychology works with is approximation. This seems reasonable for a science as young as ours, and I believe it will continue to improve with the help of latent variable models. Consistent with this theme, throughout the chapter I noted that many of the ongoing thorny issues in latent variable modeling (e.g., the overreliance on fit statistics when evaluating bifactor models) require deep conceptual thinking, not mere adherence to statistical indices.

The review presented here strongly favored factor analytic, or latent dimensional, models, in large part because these models have provided more replicable and interpretable results. However, mixture models have been, and continue to be, useful in adjudicating latent structure (Aslinger et al., 2018; Wendt et al., in press). Another limitation of this review is that all of the examples I provided were based on self-report or diagnostic interview data. There is no reason to limit these models to psychometric scale data of this sort; one can just as easily use biological or observational behavioral data. The Venables and Patrick chapter in this volume gives examples of how this might proceed fruitfully.

Finally, I hope readers take away the broader perspective that any time a measure of something observable stands in for an underlying construct, the researcher is invoking a latent variable conceptualization.

References

American Psychiatric Association (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Washington, DC: Author.

Aslinger, E.N., Manuck, S.B., Pilkonis, P.A., Simms, L.J., & Wright, A.G.C. (2018). Narcissist or narcissistic? Evaluation of the latent structure of narcissistic personality disorder. Journal of Abnormal Psychology, 127(5), 496-502.

Asparouhov, T., & Muthén, B. (2009). Exploratory structural equation modeling. Structural Equation Modeling, 16, 397-438.

Bollen, K.A. (1989). Structural equations with latent variables. New York: Wiley.

Bollen, K.A. (2002). Latent variables in psychology and the social sciences. Annual Review of Psychology, 53(1), 605-634.

Bollen, K.A., & Curran, P.J. (2006). Latent curve models: A structural equation perspective. John Wiley & Sons.

Bonifay, W. (2017). On the complexity of item response theory models. Multivariate Behavioral Research, 1-20.

Borsboom, D., Mellenbergh, G.J., & van Heerden, J. (2003). The theoretical status of latent variables. Psychological Review, 110(2), 203-219.

Brown, T.A. (2015). Confirmatory factor analysis for applied research (2nd ed.). New York: Guilford Press.

Brown, T.A., Chorpita, B.F., & Barlow, D.H. (1998). Structural relationships among dimensions of the DSM-IV anxiety and mood disorders and dimensions of negative affect, positive affect, and autonomic arousal. Journal of Abnormal Psychology, 107(2), 179-192.

Caspi, A., Houts, R.M., Belsky, D.W., Goldman-Mellor, S.J., Harrington, H., Israel, S., . . . Poulton, R. (2014). The p factor: One general psychopathology factor in the structure of psychiatric disorders? Clinical Psychological Science, 2(2), 119-137.

Caspi, A., & Moffitt, T.E. (2018). All for one and one for all: Mental disorders in one dimension. American Journal of Psychiatry, 175(9), 831-844.

Clark, L.A., & Watson, D. (1995). Constructing validity: Basic issues in objective scale development. Psychological Assessment, 7(3), 309-319.

Collins, L.M., & Lanza, S.T. (2010). Latent class and latent transition analysis: With applications in the social, behavioral, and health sciences. Hoboken, NJ: Wiley.

Conway, C., Hammen, C., & Brennan, P.A. (2012). Comparison of latent class, latent trait, and factor mixture models of DSM-IV borderline personality disorder criteria in a community setting: Implications for DSM-5. Journal of Personality Disorders, 26, 793-803.

Costa, P.T., Jr., & McCrae, R.R. (1992). Revised NEO Personality Inventory (NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) professional manual. Odessa, FL: Psychological Assessment Resources.

Cronbach, L., & Meehl, P. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281-302.

Eaton, N.R., Krueger, R.F., Markon, K.E., Keyes, K.M., Skodol, A.E., Wall, M., Hasin, D.S., ... (2013). The structure and predictive validity of the internalizing disorders. Journal of Abnormal Psychology, 122, 86-92.

Edwards, J.R., & Bagozzi, R.P. (2000). On the nature and direction of relationships between constructs and measures. Psychological Methods, 5, 155-174.

Forbes, M.K., Kotov, R., Ruggero, C.J., Watson, D., Zimmerman, M., & Krueger, R.F. (2017). Delineating the joint hierarchical structure of clinical and personality disorders in an outpatient psychiatric sample. Comprehensive Psychiatry, 79, 19-30.

Fournier, J.C., Wright, A.G.C., Tackett, J.L., Uliaszek, A., Pilkonis, P.A., Manuck, S.B., & Bagby, R.M. (in press). Decoupling personality and acute psychiatric symptoms in a depressed and a community sample. Clinical Psychological Science.

Gignac, G.E. (2016). The higher-order model imposes a proportionality constraint: That is why the bifactor model tends to fit better. Intelligence, 55, 57-68.

Girard, J.M., Wright, A.G.C., Beeney, J.E., Lazarus, S., Scott, L.N., Stepp, S.D., & Pilkonis, P.A. (2017). Interpersonal problems across levels of the psychopathology hierarchy. Comprehensive Psychiatry, 79, 53-69.

Gore, W.L., & Widiger, T.A. (2013). The DSM-5 dimensional trait model and five-factor models of personality. Journal of Abnormal Psychology, 122(3), 816-821.

Greene, A.L., Eaton, N.R., Li, K., Forbes, M.K., Krueger, R.F., Markon, K.E., Waldman, I., Cicero, D.C., Conway, C.C., Docherty, A.R., Fried, E.I., Ivanova, M.Y., Jonas, K.G., Latzman, R.D., Patrick, C.J., Reininghaus, U., Tackett, J.L., Wright, A.G.C., & Kotov, R. (in press). Are fit indices used to test psychopathology structure biased? A simulation study. Journal of Abnormal Psychology.

Hallquist, M.N., & Wright, A.G.C. (2014). Mixture modeling methods for the assessment of normal and abnormal personality, part I: Cross-sectional models. Journal of Personality Assessment, 96(3), 256-268.

Harman, H.H. (1960). Modern factor analysis. Chicago: University of Chicago Press.

Holzinger, K.J., & Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1), 41-54.

Hopwood, C.J., & Donnellan, M.B. (2010). How should the internal structure of personality inventories be evaluated? Personality and Social Psychology Review, 14, 332-346.

Horn, J.L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2), 179-185.

Kotov, R., Ruggero, C.J., Krueger, R.F., Watson, D., Yuan, Q., & Zimmerman, M. (2011). New dimensions in the quantitative classification of mental illness. Archives of General Psychiatry, 68, 1003-1011.

Krueger, R.F. (1999). The structure of common mental disorders. Archives of General Psychiatry, 56, 921-926.

Lahey, B.B., Applegate, B., Hakes, J.K., Zald, D.H., Hariri, A.R., & Rathouz, P.J. (2012). Is there a general factor of prevalent psychopathology during adulthood? Journal of Abnormal Psychology, 121(4), 971.

Lahey, B.B., Rathouz, P.J., Keenan, K., Stepp, S.D., Loeber, R., & Hipwell, A.E. (2015). Criterion validity of the general factor of psychopathology in a prospective study of girls. Journal of Child Psychology and Psychiatry, 56(4), 415-422.

Loehlin, J.C. (2004). Latent variable models: An introduction to factor, path, and structural equation analysis. Taylor & Francis.

Mansolf, M., & Reise, S.P. (2017). When and why the second-order and bifactor models are distinguishable. Intelligence, 61, 120-129.

Markon, K.E., & Krueger, R.F. (2005). Categorical and continuous models of liability to externalizing disorders: A direct comparison in NESARC. Archives of General Psychiatry, 62, 1352.

Masyn, K.E., Henderson, C.E., & Greenbaum, P.E. (2010). Exploring the latent structures of psychological constructs in social development using the dimensional-categorical spectrum. Social Development, 19(3), 470-493.

Morgan, G.B., Hodge, K.J., Wells, K.E., & Watkins, M.W. (2015). Are fit indices biased in favor of bi-factor models in cognitive ability research? A comparison of fit in correlated factors, higher-order, and bi-factor models via Monte Carlo simulations. Journal of Intelligence, 3(1), 2-20.

Mulaik, S. (2010). Foundations of factor analysis (2nd ed.). Boca Raton, FL: Chapman & Hall.

Murray, A.L., & Johnson, W. (2013). The limitations of model fit in comparing the bi-factor versus higher-order models of human cognitive ability structure. Intelligence, 41(5), 407-422.

Newsom, J.T. (2015). Longitudinal structural equation modeling: A comprehensive introduction. Routledge.

Nunnally, J.C. (1978). Psychometric theory. New York: McGraw-Hill.

Oltmanns, J.R., & Widiger, T.A. (2016). Self-pathology, the five-factor model, and bloated specific factors: A cautionary tale. Journal of Abnormal Psychology, 125(3), 423-434.

Preacher, K.J., Wichman, A.L., Briggs, N.E., & MacCallum, R.C. (2008). Latent growth curve modeling. Sage.

Reise, S.P., Kim, D.S., Mansolf, M., & Widaman, K.F. (2016). Is the bifactor model a better model or is it just better at modeling implausible responses? Application of iteratively reweighted least squares to the Rosenberg Self-Esteem Scale. Multivariate Behavioral Research, 51(6), 818-838.

Ruscio, J., & Roche, B. (2012). Determining the number of factors to retain in an exploratory factor analysis using comparison data of known factorial structure. Psychological Assessment, 24(2), 282-292.

Sass, D.A., & Schmitt, T.A. (2010). A comparative investigation of rotation criteria within exploratory factor analysis. Multivariate Behavioral Research, 45, 73-103.

Sharp, C., Wright, A.G.C., Fowler, J.C., Frueh, C., Allen, J.G., Oldham, J., & Clark, L.A. (2015). The structure of personality pathology: Both general ('g') and specific ('s') factors? Journal of Abnormal Psychology, 124(2), 387-398.

Spearman, C. (1904). "General intelligence," objectively determined and measured. American Journal of Psychology, 15, 201-293.

Velicer, W.F. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41, 321-327.

Walton, K.E., Ormel, J., & Krueger, R.F. (2011). The dimensional nature of externalizing behaviors in adolescence: Evidence from a direct comparison of categorical, dimensional, and hybrid models. Journal of Abnormal Child Psychology, 39, 553-561.

Wendt, L.P., Wright, A.G.C., Pilkonis, P.A., Nolte, T., Fonagy, P., Montague, R.P., Benecke, C., Krieger, T., & Zimmermann, J. (in press). Evaluating the latent structure of interpersonal problems: Validity of dimensions and classes. Journal of Abnormal Psychology.

Witkiewitz, K., King, K., McMahon, R.J., Wu, J., Luk, J., Bierman, K.L., Coie, J.D., ... (2013). Evidence for a multi-dimensional latent structural model of externalizing disorders. Journal of Abnormal Child Psychology, 41, 223-237.

Wright, A.G.C. (2017). The current state and future of factor analysis in personality disorder research. Personality Disorders: Theory, Research, and Treatment, 8(1), 14-25.

Wright, A.G.C., Krueger, R.F., Hobbs, M.J., Markon, K.E., Eaton, N.R., & Slade, T. (2013). The structure of psychopathology: Toward an expanded quantitative empirical model. Journal of Abnormal Psychology, 122(1), 281-294.

Wright, A.G.C., & Simms, L.J. (2014). On the structure of personality disorder traits: Conjoint analyses of the CAT-PD, PID-5, and NEO-PI-3 trait models. Personality Disorders: Theory, Research, and Treatment, 5(1), 43-54.

Wright, A.G.C., & Simms, L.J. (2015). A metastructural model of mental disorders and pathological personality traits. Psychological Medicine, 45(11), 2309-2319.

Wright, A.G.C., & Zimmermann, J. (2015). At the nexus of science and practice: Answering basic clinical questions in personality disorder assessment and diagnosis with quantitative modeling techniques. In S. Huprich (Ed.), Personality disorders: Toward theoretical and empirical integration in diagnosis and assessment (pp. 109-144).

Washington, DC: American Psychological Association.

Figure 1. Examples of common factor models. Panel A: Exploratory Factor Analysis; Panel B: Confirmatory Factor Analysis; Panel C: Exploratory Structural Equation Modeling; Panel D: Bifactor Modeling. [Path diagrams not reproduced here.]

Figure Legend
• Squares represent observed variables
• Circles represent latent factors
• Straight arrows connecting observed and latent variables reflect factor loading estimates
• Free straight arrows attached to observed variables reflect residual variances
• Curved arrows represent covariances
• Solid lines reflect EFA-specified parameters
• Dashed lines reflect parameters which the user can specify as free or fixed

Figure 2. Two alternative representations of a hierarchical factor model. Panel A: Bifactor Model; Panel B: Higher-Order Factor Analysis. [Path diagrams not reproduced here.]

Figure Legend
• Squares represent observed variables
• Circles represent latent factors
• Straight arrows connecting observed and latent variables reflect factor loading estimates
• Free straight arrows attached to observed or latent variables reflect residual variances

Figure 3. Graphical depiction of latent distributions associated with various factor mixture models. Lines at the base with arrows represent continuous dimensions. Columns represent groupings of individuals with no variance in latent scores. [Distribution plots not reproduced here.]