Topics in Testing Mediation Models: Power, Confounding, and Bias

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Robert Arthur Agler

Graduate Program in Psychology

The Ohio State University

2015

Dissertation Committee:

Dr. Paulus De Boeck, Advisor

Dr. Robert Cudeck

Dr. Andrew Hayes

Dr. Duane Wegener

Copyrighted by

Robert Arthur Agler

2015

Abstract

In this dissertation we consider statistical methodologies to be employed at all stages of testing mediation claims. We begin by examining the relative performance of various methods of testing direct and indirect effects, both in terms of statistical power and the risk of Type I errors. Specifically, we compare normal-theory approaches to testing direct and indirect effects (Sobel, 1982), using either regression or structural equation models with different estimation methods, to bootstrapping techniques (Efron, 2003). We then discuss factor models as an alternative to mediation models for cases where they make conceptual sense, and as a method of examining worst-case confounding scenarios. We present formulae relating the two models, and investigate the use of structural equation modeling as a way to distinguish between them. Finally, we investigate the utility of fungible weights (Waller, 2008) for examining parameter sensitivity in mediation. Fungible weights describe the dependent variable almost as well as the optimal weights, yet may be quite discrepant from the optimal weights and suggest alternative interpretations. We also provide a function to facilitate their use.


Acknowledgments

I do not believe that I can adequately express my gratitude for the opportunities and support I have been given by my friends, family, and colleagues. Specifically, I wish to thank my advisor Dr. Paulus De Boeck for the opportunity to study quantitative psychology, my parents for always believing in me, and my girlfriend for being by my side through this process. I have come further and overcome far more than I could have ever believed before I began my schooling, and it is because of the many chances and words of encouragement that my friends and family have given me.


Vita

2010 .................... B.S. Psychology, James Madison University

2012 .................... M.A. Social Psychology, The Ohio State University

2010-2011 ............ Graduate Fellow, Department of Psychology, The Ohio State University

2011-present ........ Graduate Teaching Associate, Department of Psychology, The Ohio State University

Publications

Arkin, R. M., & Agler, R. A. (2012). Focus on individual differences: A throwback and a throw down. PsycCRITIQUES, 57(23).

Carroll, P. J., Agler, R. A., & Newhart, D. W. (2015). Beyond cause to consequence: The road from possible to core self-revision. Self and Identity, 14(4), 482-498.

Hayes, A. F., & Agler, R. A. (2014). On the standard error of the difference between independent regression coefficients in analysis. Multiple Linear Regression Viewpoints, 40(2), 16-27.


Fields of Study

Major Field: Psychology


Table of Contents

Abstract

Acknowledgments

Vita

List of Tables

List of Figures

Chapter 1: Introduction

Chapter 2: Relative Performance of Methods of Testing Mediation Effects

Chapter 3: Factor Model as an Alternative Explanation

Chapter 4: Testing a Factor Model against a Mediation Model

Chapter 5: Fungible Weights in Mediation

Chapter 6: General Discussion

References

Appendix A: Full Results for Chapter 3

Appendix B: Formulas for Converting Correlations to Factor Loadings, One Factor

Appendix C: Formulas for Converting Regression Weights to Factor Loadings, One Factor

Appendix D: Formulas for Converting Correlations to Factor Loadings, Two Factors

Appendix E: Fungible Mediation Function

Appendix F: Fungible Mediation Example


List of Tables

Table 1. Power for testing the direct effect for all methods, collapsed across all effect size combinations.

Table 2. Type I error rates when testing the direct effect for all methods, collapsed across all effect size combinations.

Table 3. Power for testing the indirect effect for all methods, collapsed across all effect size combinations.

Table 4. Type I error rates when testing the indirect effect for all methods, collapsed across all effect size combinations.

Table 5. Sample correlation and regression coefficients based on vector angles and lengths, for four select cases and vector lengths of 0.8 and 0.5.

Table 6. Comparison of model fit for the models we estimate here. EL = equal loadings. CE = correlated errors. LL = lag-lag from the latent variable at t0 to the one at t2.

Table 7. Regression results for predictors of the fungible interval of the direct and indirect effects in the single-mediator case.

Table 8. Regression results for the predictors of the fungible interval of the direct and indirect effects. The results for the indirect effect ab2 are not shown; they are of the same nature as those for ab1, except that the effects are related to the corresponding parameters for the second mediator rather than the first.

Table 9. Results presented in Chapter 3, based on possible factor space angles and vector lengths of .8 for a given combination of mediation results.

Table 10. Results presented in Chapter 3, based on possible factor space angles and vector lengths of .5 for a given combination of mediation results.


List of Figures

Figure 1. ROC curve for the direct effect and N = 50, collapsed across all effect size combinations. The full plot comparing specificity (1 – observed Type I error rate) and sensitivity (1 – observed Type II error rate) is on the right, and the plot on the left shows a limited range of nominal α levels for comparison.

Figure 2. ROC curve for the direct effect and N = 100, collapsed across all effect size combinations. The full plot comparing specificity (1 – observed Type I error rate) and sensitivity (1 – observed Type II error rate) is on the right, and the plot on the left shows a limited range of nominal α levels for comparison.

Figure 3. ROC curve for the direct effect and N = 200, collapsed across all effect size combinations. The full plot comparing specificity (1 – observed Type I error rate) and sensitivity (1 – observed Type II error rate) is on the right, and the plot on the left shows a limited range of nominal α levels for comparison.

Figure 4. ROC curve for the indirect effect and N = 50, collapsed across all effect size combinations. The full plot comparing specificity (1 – observed Type I error rate) and sensitivity (1 – observed Type II error rate) is on the right, and the plot on the left shows a limited range of nominal α levels for comparison.

Figure 5. ROC curve for the indirect effect and N = 100, collapsed across all effect size combinations. The full plot comparing specificity (1 – observed Type I error rate) and sensitivity (1 – observed Type II error rate) is on the right, and the plot on the left shows a limited range of nominal α levels for comparison.

Figure 6. ROC curve for the indirect effect and N = 200, collapsed across all effect size combinations. The full plot comparing specificity (1 – observed Type I error rate) and sensitivity (1 – observed Type II error rate) is on the right, and the plot on the left shows a limited range of nominal α levels for comparison.

Figure 7. Difference between the standard deviation of ab estimates across all replications for a given condition and the observed standard deviation of the replications. Calculated as the mean estimated SE for regression across all replications – the SD of the replications.

Figure 8. Difference between the standard deviation of ab estimates across all replications for a given condition and the observed standard deviation of the replications. Calculated as the mean estimated SE for SEM with ML across all replications – the SD of the replications.

Figure 9. Mean of the mean of the bootstrap resamples across all replications, compared to the mean of all replications.

Figure 10. Median of the median of the bootstrap resamples across all replications, compared to the median of all replications.

Figure 11. Mean of the skew of the bootstrap resamples across all replications, compared to the skew of all replications.

Figure 12. Mean of the kurtosis of the bootstrap resamples across all replications, compared to the kurtosis of all replications.

Figure 13. Mean of the standard deviation of the bootstrap resamples across all replications, compared to the standard deviation of all replications.

Figure 14. Difference between the mean ab across all replications for a given condition and the mean ab from each bootstrapped distribution. Calculated as the mean of the means of the bootstrapped distributions – the mean of the replications.

Figure 15. Difference between the median ab across all replications for a given condition and the median of the median ab from each bootstrapped distribution. Calculated as the median of the medians of the bootstrapped distributions – the median of the replications.

Figure 16. Difference between the skew of the distribution of the estimated values of ab for a given condition across all replications and the mean skew from each bootstrapped distribution. Calculated as the mean skew of the bootstrapped distributions – the skew of the replications.

Figure 17. Difference between the kurtosis of the distribution of the estimated values of ab for a given condition across all replications and the mean kurtosis from each bootstrapped distribution. Calculated as the mean kurtosis of the bootstrapped distributions – the kurtosis of the replications.

Figure 18. Difference between the standard deviation of the distribution of the estimated values of ab for a given condition across all replications and the mean standard deviation from each bootstrapped distribution. Calculated as the mean standard deviation of the bootstrapped estimates – the standard deviation of the replications.

Figure 19. One-factor longitudinal model. X0, M1, and Y2 are in bold to indicate that in an incomplete longitudinal design they would be the variables that are measured and subject to a mediation analysis.

Figure 20. Affect circumplex (taken with permission from Yik, Russell, & Steiger, 2011).

Figure 21. Sample mediation triangle placed within factorial space. The three variables from the core affect example are represented as vectors. The specific situation shown is the case where the vector length of each variable is .8, X and Y are orthogonal, and the mediator is 45° from both X (frustrated) and Y (depressed). In this case, X and Y are uncorrelated, and the mediator is correlated with both X and Y at r = .45. Variable labels are somewhat arbitrary and for illustrative purposes only.

Figure 22. Heat map of the calculated indirect and direct effects when rXY = 0. Values vary as a function of the magnitude of rXM and rMY.

Figure 23. Calculated direct effects in a shared factor space. The x-axis represents the magnitude of ∠MY as a proportion of ∠XY. For example, for ∠XY = 60° and proportion = .33 (1/3), ∠XM = 20° and ∠MY = 40°.

Figure 24. Calculated indirect effects in a shared factor space. The x-axis represents the magnitude of ∠MY as a proportion of ∠XY. For example, for ∠XY = 60° and proportion = .33 (1/3), ∠XM = 20° and ∠MY = 40°.

Figure 25. One-factor longitudinal model. X0, M1, and Y2 are in bold to indicate that in an incomplete longitudinal design they would be the variables that are measured and subject to a mediation analysis.

Figure 26. Plot of fungible weights with three predictors when all variable correlations are r = .5 and the criterion value is .98. The point in the center is the OLS estimate of the weights, b1 = b2 = c' = .25. Histograms illustrate the spread of the fungible weights, and show the clear bimodality associated with them.

Figure 27. Plot of fungible weights with three predictors when all variable correlations are r = .3 and the criterion value is .98. The point in the center is the OLS estimate of the weights, b1 = b2 = c' = .19. Histograms illustrate the spread of the fungible weights, and show the clear bimodality associated with them.

Figure 28. Comparison of fungible intervals and confidence intervals with R2. The lines represent the intervals about the OLS estimated weights. As a given value of R2 may result in different fungible intervals depending on the correlations between the variables, the lines have been jittered about R2. Grey lines indicate confidence intervals, and black lines indicate fungible intervals. Confidence intervals are based on the Sobel standard error and N = 100.

Figure 29. Relation of fungible intervals and confidence intervals to the magnitude of the indirect effect. The lines represent the intervals about the OLS estimated weights. Grey lines indicate confidence intervals, and black lines indicate fungible intervals. The lines are overlain because a given value of R2 may result in different fungible intervals, depending on the correlations between the variables. Confidence intervals are based on the Sobel standard error and N = 100.

Figure 30. Range of fungible intervals for the direct effect in the one-mediator case. The results are the same for the range of b.

Figure 31. Range of fungible intervals for indirect effects in the one-mediator case.

Figure 32. Fungible intervals for the direct effect based on a criterion value rcrit derived from the standard error of R2 and N = 100.

Figure 33. Fungible intervals for the indirect effect based on a criterion value derived from the standard error of R2 and N = 100.

Chapter 1: Introduction

Theories that describe causal relationships are made more convincing when one can provide evidence of intervening processes. Doing so requires researchers to develop and test causal hypotheses in a variety of ways, both in terms of the methodologies employed and the statistical analyses used. With regard to the statistical analysis, whatever model is employed, it is necessary both to test the statistical significance of its parameter estimates and to examine the quality (e.g., degree of bias) of the estimates the model yields, as both are crucial components of demonstrating the validity of a causal theory.

Perhaps the most popular analysis explicitly intended for making causal claims is mediation. As the many citations of Baron and Kenny's (1986) seminal article suggest, using mediation is straightforward. In addition to the numerous software packages created for mediation analyses (e.g., PROCESS, Hayes, 2013; mediation, Tingley, Yamamoto, Hirose, Keele, & Imai, 2014), its use is facilitated by the fact that the logic of mediation is simple: it supposes that one variable causes one or more intervening variables, which then cause some ultimate outcome. In the simplest case, this takes the form of an independent variable (X), a mediator (M), and a dependent variable (Y), with X predicting M, and both then predicting Y, although it is easy to add additional mediators, either in parallel or in serial. When using regression, parallel and single-mediator models may be represented by the following formulas:

$M_i = i_{M_i} + a_i X + e_{M_i}$ (1)

$Y = i_Y + c'X + \sum_i b_i M_i + e_Y$ (2)

where $i_{M_i}$ represents the intercept for $M_i$ regressed on X, $a_i$ represents the regression weight for X in the regression equation for $M_i$, $i_Y$ the intercept for Y regressed on X and all Ms, and $b_i$ the regression coefficients for each mediator in the regression equation for Y. The indirect effects are quantified by the product terms $a_i b_i$, and c' represents the direct effect of X on Y. If there are no missing data, these two quantities sum to the total effect c (which, for standardized variables, is equal to the covariance between X and Y).
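As an illustration, the single-mediator model can be fit with two ordinary regressions in R. The following sketch assumes a data frame d with columns x, m, and y (hypothetical names), and is not the simulation code used in later chapters:

# Minimal single-mediator example; `d` is a data frame with columns x, m, y.
fit_m <- lm(m ~ x, data = d)        # equation (1): the a path
fit_y <- lm(y ~ x + m, data = d)    # equation (2): the b and c' paths
a  <- coef(fit_m)["x"]
b  <- coef(fit_y)["m"]
cp <- coef(fit_y)["x"]              # direct effect c'
ab <- a * b                         # indirect effect
c_total <- coef(lm(y ~ x, data = d))["x"]  # total effect c = c' + ab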

Although the model itself is easy to use and understand, neither testing nor estimation has proven to be simple. Significance testing requires methods with good Type I and Type II error rates, and in the case of mediation this has proven difficult because the product term ab is non-normal, so many standard methods of testing significance have dubious statistical properties, with deflated Type I error rates and low power (MacKinnon, Lockwood, Hoffman, West, & Sheets, 2002).

Accurate estimation of the parameters of interest is similarly difficult, as the b and c' paths are necessarily partly correlational (Sobel, 2008). As with any analysis, establishing a clear relationship between variables is difficult because of the common problems of missing variables, unmodeled moderators, redundancy of measurement (e.g., M and Y being the same variable), and so on. For mediation analysis this difficulty is compounded because mediation necessarily posits at least two hypothesized effects on Y (this remains true even in the absence of a significant direct effect), and it is further the case that the indirect effect is composed of two estimated parameters. In general, the estimated effects for a given model can be expected to be biased if the model is in some way inaccurate and not fully reflective of the truth, an issue that many would consider all but a given when using statistical analyses (cf. Box, 1976). The degree of bias may be minor and safely ignorable, but in extreme cases bias may strongly affect how a set of variable relationships is interpreted.

Before continuing, it is useful to further explicate the issues that arise when testing and estimating mediation models. There is of course a rich literature on issues that arise when using regression in general, and because regression is how mediation models are often estimated, that literature applies here as well. However, in the interest of conciseness we limit ourselves to discussing work exclusively focused on mediation.

Testing Effects

The original causal steps approach associated with Baron and Kenny (1986) required significant a, b, and c paths before any claims of mediation could be made. This approach has been outmoded for a variety of reasons, including its lack of quantification of the indirect effect and its low power (Hayes, 2013; MacKinnon et al., 2002). Contrary to what might be expected given that the indirect effect requires the estimation of two parameters rather than one, the test of the total effect often has lower power than the test of the indirect effect (Kenny & Judd, 2013; Rucker, Preacher, Tormala, & Petty, 2011). Further, the indirect effect(s) and the direct effect may be in opposing directions, resulting in suppression (MacKinnon, Krull, & Lockwood, 2000). Together, these issues make the causal steps requirement of a significant total effect an unnecessarily high bar for making mediation claims, one that may prevent detection of genuine effects.

Baron and Kenny (1986) also suggested the use of a normal-theory standard error derived by Sobel (1982) using the multivariate delta method. Although this method of testing indirect effects clearly outperforms the causal steps approach and has much higher power (MacKinnon et al., 2002), it nonetheless suffers from a few shortcomings related to the distribution of ab. The distribution of a product term is typically non-normal without large sample sizes (Kisbu-Sakarya, MacKinnon, & Miočević, 2014), and so the normal-theory approximation rarely holds in practice. The result is that the Sobel test is overly conservative, with relatively low power and Type I error rates well below the nominal and normative α = .05, and users of mediation are advised to avoid it (e.g., Hayes & Scharkow, 2013).

At present, the preferred approach for testing indirect effects is to make use of bootstrapped confidence intervals. Bootstrapping as used in mediation simply resamples the data naively with replacement some number of times (e.g., 5,000), and for each resample an estimate of the indirect effect ab is obtained. These estimates may be used to obtain significance values, using either percentile cutoffs (e.g., 2.5% and 97.5% in the case of α = .05) or some form of correction: for bias in the case of the bias-corrected bootstrap, or for bias and skew in the case of the bias-corrected and accelerated bootstrap (Efron, 1987). Though these bootstrapped confidence intervals differ in their details, each generally outperforms the Sobel test in terms of statistical power (Hayes & Scharkow, 2013). Their Type I error rates are generally more accurate than those of the Sobel test as well, although still below the nominal rates in some cases (Hayes & Scharkow, 2013; MacKinnon et al., 2002), and inflated in others. Specifically, the bias-corrected and accelerated bootstrap shows some signs of an inflated Type I error rate when either the a or the b path is large and the other path is zero, or when the sample size is small (Koopman, Howe, Hollenbeck, & Sin, 2015).

An additional approach to testing mediation is the use of structural equation modeling (SEM). This is typically done with latent variable models, but strictly manifest variable models are easily employed as well. Latent variable mediation models introduce additional considerations beyond the scope of this dissertation, and so we generally limit ourselves to SEM-based mediation models using manifest variables alone (an exception may be found in Chapter 4, where we use SEM as a way to test the validity of mediation claims). To the author's knowledge, only one article has explicitly advocated for the use of structural equation modeling over regression, regardless of whether latent or manifest variables are used. Iacobucci and colleagues (Iacobucci, Saldanha, & Deng, 2007) argued that structural equation modeling always outperforms ordinary least squares (OLS) regression, and so should always be used when testing mediation. Although Iacobucci et al. are unique in their suggestion to use SEM exclusively over regression, their results are not: Cheung (2009) compared multiple methods of estimating standardized confidence intervals for the indirect effect and showed that, at least when using the Sobel standard error, SEM outperforms regression.


Although the suggestion by Iacobucci et al. (2007) to use SEM exclusively over regression is not without merit, it must be qualified by the fact that they based their claims on the use of the Sobel test and did not consider bootstrapped confidence intervals. The issue that arises when comparing SEM to regression is that both yield identical parameter estimates for a mediation model, and so will also yield identical bootstrapping performance. For testing indirect effects, then, whether one uses regression or SEM is irrelevant if one also makes use of bootstrapped confidence intervals (cf. Cheung, 2009).

Further, Iacobucci et al. (2007) used only one of many methods to estimate structural equation models. Although they do not state which method they used, it is safe to assume that they used maximum likelihood (ML) because of its popularity and statistical properties (e.g., efficiency). Nonetheless, weighted least squares (WLS), unweighted least squares (ULS), diagonally weighted least squares (DWLS), and generalized least squares (GLS) are also available to estimate SEMs. Although the estimation methods are asymptotically equivalent (Browne, 1974; Shapiro, 1985) and will further yield identical parameter estimates for a saturated model such as mediation, their performance in finite sample sizes can be expected to differ in terms of Type I and Type II error rates, due to their differing assumptions regarding the error terms associated with the predicted covariance matrix (e.g., homoscedasticity, independence; Savalei, 2014).

Of course, some a priori predictions may be made based on the assumptions each method makes, but it is difficult to anticipate the tradeoff between types of errors when testing indirect effects because of the unusual distribution of the indirect effect. Applying past work is further complicated by the fact that most such work has focused on latent variable models (which are to be recommended in their own right, of course) rather than a strictly observed and saturated model like mediation, and so it is unclear which method is to be preferred when testing for mediation. The superiority of SEM is then conditional on the behavior of these various estimation methods relative to bootstrapping, rather than being a general truth.

Examining Effects

In addition to significance testing, it is also important to consider any bias or confounding that may affect the parameters of interest. Such bias may be modest and mostly ignorable, but in extreme cases the estimates may be so biased as to yield oppositely signed estimates that would entail completely different interpretations. The issue of biased estimates has been raised in one form or another by many authors (e.g., Bullock, Green, & Ha, 2010; Jo, 2008; MacKinnon, Krull, & Lockwood, 2000; Sobel, 2008; VanderWeele & Vansteelandt, 2009), who point out that, in general, many strong assumptions must be satisfied in order for mediation models to yield unbiased estimates of the indirect effect (cf. Jo, 2008; Sobel, 2008). These assumptions will be enumerated later in this dissertation (in Chapter 5), but discussions chiefly focus on the frequency with which confounding may occur (e.g., a spurious mediator, or a proposed mediator that is simply a correlate of the dependent variable; Bullock, Green, & Ha, 2010; Fiedler, Schott, & Meiser, 2011).

The need for methods to consider possible bias in the estimated direct and indirect effects has not been ignored in the mediation literature, and there are multiple methods to examine what effects such variables may have on the trustworthiness of the effect estimates. As with any analysis, the most straightforward method of dealing with biased estimates is to statistically control for additional covariates. Somewhat similarly, the design of the study itself may be changed to one that better supports testing competing explanations. In addition to experimental studies, which when possible are always to be preferred over correlational studies (Stone-Romero & Rosopa, 2008), it is also advisable to make use of longitudinal designs. One such example may be found in Maxwell and Cole (2007), who showed that with cross-sectional designs estimates may be biased such that genuine effects may be nonsignificant or of opposite sign, and that variables with no direct relationship may yield significant paths. To deal with this possibility, it is necessary to measure X, M, and Y each at three different time points using a cross-lagged panel design, and to analyze the data accordingly. Latent growth curve and latent difference score models may also be used to reduce bias, provided that they are more appropriate for the data (cf. Selig & Preacher, 2009).

In cases where alternative study designs or measuring possible confounds are not possible (e.g., because of ethical or financial constraints), there are also methods that instead make use of formulas to determine the sensitivity of the estimated weights to a potential confounder. Rosenbaum and Rubin (1983) and VanderWeele (2010) worked with binary confounders and derived formulas that use arbitrarily selected bias estimates to examine parameter sensitivity. VanderWeele and Arah (2011) extended those approaches to be much more general, able to deal with categorical or continuous variables in a way that is not restricted to any given estimation method (Cox, Kisbu-Sakarya, Miočević, & MacKinnon, 2013). Additional methods make use of correlated residuals, based on the logic that if there is no confounding, then the residuals of the two regressions shown in equations (1) and (2) should be uncorrelated (e.g., Imai, Keele, & Yamamoto, 2010).

As confounders are typically latent in nature, it is a simple extension to view them through the lens of factor analysis. For mediation this is perhaps not often considered, because factor models typically require a larger number of variables than are often used in mediation models, and further because factors are typically used for scale construction. In such cases, the factors serve as a useful summary of variable relationships that also, ideally, explains the relationships. Our argument here is that factor models may be used to examine a worst-case confounding scenario by representing a perfectly confounded relationship rather than mediation. This remains true regardless of the true number of confounding variables or of the number of estimated factors, and the factor(s) may be viewed as composites that summarize all confounding effects, with a weight for each manifest or latent variable.

In addition to describing relationships, factors may also be used to rule out some confounding explanations. This is because the possible patterns of loadings on a number of factors are limited to some degree, with some correlation patterns requiring at least two factors (i.e., confounds). That being said, even if a single factor is adequate to describe the variable relationships, it may simply represent a composite of multiple confounding variables. Showing that one factor is adequate is then only a starting point, but doing so is useful for establishing the validity of claims of mediation, as at least some competing explanations are ruled out.

In addition to establishing that the general conceptual form of a mediation model is appropriate to describe a set of variables (i.e., one variable causing others, which in turn cause another), it is still necessary to consider the effects of less extreme confounding, and to consider the effects of other forms of model violations (e.g., inaccurate functional forms of the relationships or inappropriate error term assumptions). More specifically, it is necessary to consider the degree to which the estimates may be biased as a result of such possibilities (e.g., Imai et al., 2010). Parameter sensitivity is perhaps not often considered in the practice of mediation (though it is advised in the literature, e.g., VanderWeele, 2010), but doing so is necessary because in practice one rarely knows just how a model is inaccurate. Rather, it is necessary to consider the degree to which a model must be wrong before the conclusions drawn from it are invalid (Box, 1976).

Despite the fact that the reason a model is wrong is rarely known, the previously mentioned approaches generally assume some knowledge of the form of the inaccuracy, in that using them implies a belief that the model is biased only because the relationships are confounded. They are then inappropriate for other violations of model assumptions. Such violations arise because a model is necessarily incomplete, with a few examples being missing paths, measures with less than perfect reliability, missing terms (e.g., quadratic), and issues related to error terms such as non-normal residuals or outliers. Although all approaches relevant to regression apply to mediation as well (e.g., outlier detection and residual inspection), at present no general techniques have been developed and applied to mediation specifically.

The general approach to considering parameter sensitivity in mediation that we will explore is based on fungible parameters (Waller, 2008): alternative sets of weights that all yield the same, slightly reduced value of R2. In brief, the appeal of fungible parameters is that all sets of them explain the dependent variable equally well, and do so only slightly worse than the optimal (e.g., ordinary least squares) weights. If these weights are highly discrepant from the optimal weights yet still explain the dependent variable almost as well, then any conclusions drawn from the optimal weights should be considered less trustworthy (Green, 1977; Waller, 2008).
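To make this concrete, the sketch below generates alternative weight sets whose implied R2 is fixed at a slightly reduced fraction of the OLS R2, assuming standardized variables. It is a minimal illustration of the idea, not Waller's (2008) procedure or the function provided in Appendix E; the names Rxx, rxy, and r2.ratio are hypothetical:

# Generate weights w with cor(Xw, y) fixed so that the implied R-squared
# equals r2.ratio * (OLS R-squared); variables assumed standardized.
fungible_sketch <- function(Rxx, rxy, r2.ratio = 0.98, n.dir = 500) {
  b   <- solve(Rxx, rxy)            # OLS weights
  R2  <- sum(b * rxy)               # OLS R-squared
  R2f <- r2.ratio * R2              # reduced R-squared for fungible sets
  out <- NULL
  for (i in seq_len(n.dir)) {
    d <- rnorm(length(b))           # random direction through b
    A <- sum(d * rxy)               # equals d' Rxx b
    C <- drop(t(d) %*% Rxx %*% d)
    # Solve (R2 + t*A)^2 = R2f * (R2 + 2*A*t + C*t^2) for t
    qa <- A^2 - R2f * C
    qb <- 2 * A * (R2 - R2f)
    qc <- R2 * (R2 - R2f)
    if (abs(qa) < 1e-12) next       # skip degenerate directions
    disc <- qb^2 - 4 * qa * qc      # nonnegative whenever R2f < R2
    t12  <- (-qb + c(1, -1) * sqrt(disc)) / (2 * qa)
    out  <- rbind(out, b + t12[1] * d, b + t12[2] * d)
  }
  out <- out[drop(out %*% rxy) > 0, , drop = FALSE]  # drop sign-flipped sets
  list(ols = b, R2 = R2, fungible = out)
}

For the single-mediator model, Rxx would be the 2 × 2 correlation matrix of X and M, and rxy their correlations with Y; the spread of the resulting rows around the OLS weights is what Chapter 5 summarizes as a fungible interval.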

Purpose

Fully proving or disproving mediation claims is of course often impossible, but the plausibility of mediation claims may nonetheless be improved by considering sampling variability and by testing mediation against alternative explanations. The purpose of this dissertation, then, is to examine the performance of various methods of testing direct and indirect effects, and to develop tools that researchers may use to consider the quality of the mediation estimates obtained. That is not to say that the methodologies developed here will be sufficient in all cases (and they will of course require some refinement), but they may serve as useful and simple tools applicable to many situations. All studies presented herein focus on mediation between manifest Gaussian variables with normally distributed error terms, as this is in keeping with common practice in mediation research.


We will begin by comparing various methods of testing direct and indirect effects in Chapter 2, with special consideration paid to structural equation modeling using manifest variables, due to the relatively few examinations of it in the literature. For each method employed, we will examine the Type I and Type II error rates across a variety of α levels (rather than just the normative α = .05 level) so as to obtain a fuller picture of the performance and quality of each method. This will be facilitated by the use of receiver operating characteristic (ROC) curves (Hanley & McNeil, 1982), which compare the risks of Type I and Type II errors across decision thresholds (i.e., α levels in this case). As an extension of considering alternative α levels, we will also make some attempt to better understand the performance of bootstrapping by investigating the shape of the bootstrap distribution for indirect effects.

We will take two approaches to biased estimates. The first approach regarding model inaccuracies focuses on confounding effects and makes use of an alternative logic to explain the relationships between variables. All possible confounding effects can be captured through a factor model with as many factors as there are variables in the mediation model minus one (e.g., two factors for three variables). Alternatively, factor models may be viewed as viable alternative explanations in their own right. We will first consider how mediation results may look given a factor model in Chapter 3, and consider under what conditions the models may diverge. In Chapter 4, we will also make use of the flexibility of SEM to propose an analysis for longitudinal designs that is a mediation model in its own right, but in the sense that a variable may "mediate" itself. In brief, the approach simply tests whether a latent variable is sufficient to explain the variable correlations, and whether the inclusion of additional paths that would yield claims consistent with a mediation model of the sort shown in equations (1) and (2) is warranted.

In Chapter 5, we will consider a general approach to model inaccuracies that may be used without specification (or, for that matter, knowledge) of the type of model inaccuracy. For this area of research simulations are not of much use, as there are infinitely many possible model violations. Instead, we will work with the basic idea that if there are model inaccuracies, then the parameters from a valid model (vis-à-vis the true weights) can lead to a decreased R2 when used in an inaccurate model (e.g., a model missing a mediator or moderator). However, when the weights do not change much given such a decrease, then one may trust the results even when they stem from an inaccurate model. We investigate the range of these alternative, fungible weights as a means of quantifying parameter sensitivity and the uncertainty associated with a possibly biased model.


Chapter 2: Relative Performance of Methods of Testing Mediation Effects

How best to test for mediating effects has been a recurrent question in the mediation literature. Much of this work has been driven by the fact that the distribution of the indirect effect is non-normal, and so it has been necessary both to closely examine approaches that assume normality (e.g., MacKinnon et al., 2002) and to make use of approaches that do not (e.g., bootstrapping; Shrout & Bolger, 2002).

The historically most popular approach does not test the indirect effect per se, but rather the individual significance levels of the two constituent terms a and b (Baron & Kenny, 1986). Known as the causal steps approach, it has been criticized on numerous grounds, including low power, lack of quantification of the indirect effect, and the absence of any test of the indirect effect itself (Hayes, 2013). The limitations of this approach are such that mediation methodologists uniformly agree that it should not be used (e.g., Hayes, 2013; MacKinnon et al., 2002).

In addition to their popularized approach, Baron and Kenny (1986) also suggested using a normal-theory standard error based upon the multivariate delta method (Sobel, 1982). The formula for the standard error is as follows:

$s_{ab} = \sqrt{a^2 s_b^2 + b^2 s_a^2}$

where $s_{ab}$ is the estimated standard error, $a^2$ and $b^2$ are the squared estimates of the path coefficients, and $s_a^2$ and $s_b^2$ are their respective squared standard errors. A test of significance is performed by way of the following formula:

$Z = \frac{ab}{s_{ab}}$

with $Z$ then compared to the appropriate critical value. This approach is also advised against, as it tends to perform poorly in small samples: it assumes normality, but the distribution of the product approaches normality only with large sample sizes (e.g., MacKinnon, Fritz, Williams, & Lockwood, 2007).
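As a sketch, the test can be computed directly from the two fitted regressions; the data frame d with columns x, m, and y is a hypothetical example:

# First-order Sobel (1982) test of ab.
sobel_test <- function(d) {
  fit_m <- lm(m ~ x, data = d)
  fit_y <- lm(y ~ x + m, data = d)
  a  <- coef(fit_m)["x"]; sa <- coef(summary(fit_m))["x", "Std. Error"]
  b  <- coef(fit_y)["m"]; sb <- coef(summary(fit_y))["m", "Std. Error"]
  se <- sqrt(a^2 * sb^2 + b^2 * sa^2)    # delta-method standard error
  z  <- (a * b) / se
  c(ab = unname(a * b), se = unname(se), z = unname(z),
    p = unname(2 * pnorm(-abs(z))))      # normal-theory two-sided p value
}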

At present, the preferred approach for testing indirect effects is to create confidence intervals using nonparametric bootstrapping, with the percentile bootstrap or the bias-corrected and accelerated (BCa) approaches being preferred over other alternatives. Much of the advantage of bootstrapping is owed to the fact that it does not assume normality of the indirect effect, and so it has higher power than the Sobel test (e.g., Fritz & MacKinnon, 2007; Hayes & Scharkow, 2013). Bootstrapping methods also tend to have Type I error rates more in line with the nominal α level than the Sobel test does. However, this is conditional on the magnitude of the non-zero path and the sample size: for smaller sample sizes (e.g., N < 200), deflated Type I error rates are observed when both paths are zero or one path is small (Fritz, Taylor, & MacKinnon, 2012; Koopman et al., 2015).

Another approach that may be used to test mediation effects is structural equation modeling. In addition to the generally higher power afforded by SEM over regression (cf. Cheung, 2009; Iacobucci, Saldanha, & Deng, 2007), such an approach is appealing because mediation models are global models meant to describe the relationships between multiple variables, and estimating all paths at once is consistent with that fact.

Despite the promise of SEM for testing mediation, a potential limitation is that using SEM requires additional decisions that regression does not, and the advantages of SEM must be weighed with the various estimation methods in mind. Each estimation method affects the standard errors of the parameter estimates, and hence the power and Type I error rates of the associated tests. It is further the case that SEMs tend to perform poorly in small sample sizes, and the apparent power that comes with SEM may simply reflect downwardly biased standard error estimates, because ML is considered a large-sample estimator (Bentler & Yuan, 1999).

An additional consideration when evaluating SEM for mediation is that regression and structural equation modeling yield identical parameter estimates, and so will perform identically when bootstrapped. Iacobucci et al. (2007) neglected this point, and so their claims must be qualified accordingly. Rather than claiming that SEM is superior to regression for testing mediation, the question is instead one of the relative performance of the significance tests that result from each method of estimating the standard errors of the parameters of interest.

Model and Estimation Method Assumptions

There are multiple common estimation methods for structural equation modeling, and here we make use of the five readily available in the lavaan package in R (Rosseel, 2012): maximum likelihood (ML), unweighted least squares (ULS), generalized least squares (GLS), weighted least squares (WLS; also known as asymptotically distribution-free estimation), and diagonally weighted least squares (DWLS). These estimation methods are readily available in most statistical packages and are therefore easy to use in practice. Although each method yields identical parameter estimates for a saturated model like mediation, the relative performance of each method with regard to Type I and Type II error rates will depend on the assumptions (or lack thereof) each method makes regarding the error terms, and so these assumptions will be the focus of our investigation.
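A minimal lavaan specification of this manifest-variable model is sketched below; only the estimator argument changes across the five methods, and the variable and object names (d, med_model) are hypothetical:

library(lavaan)

med_model <- '
  m ~ a * x
  y ~ b * m + cp * x
  ab := a * b            # defined parameter: the indirect effect
'
# Same saturated model under different estimators; point estimates agree,
# but standard errors (and hence tests of ab and cp) may differ.
fits <- lapply(c("ML", "ULS", "GLS", "WLS", "DWLS"),
               function(est) sem(med_model, data = d, estimator = est))
parameterEstimates(fits[[1]])    # inspect estimates, SEs, and tests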

To better understand the importance of the assumptions made by an SEM estimation method, it is useful to view SEM as a non-linear regression model that predicts a covariance matrix rather than individual data points (Savalei, 2014). Each estimation method makes up to three relevant assumptions about the residuals of the predicted covariance matrix: that the errors are normally distributed, that they have equal variances (homoscedasticity), and that they are independent. Weighted least squares makes none of these assumptions; maximum likelihood and generalized least squares assume normality; diagonally weighted least squares assumes normality and independence; and unweighted least squares assumes all three (Savalei, 2014; Schumacker & Lomax, 2004).

As with linear regression, the assumptions of each method will affect the standard errors of the parameter estimates. The first two assumptions, normality and equality of variance of the error terms, are not necessarily violated when estimating SEMs (Savalei, 2014), although the normality assumption may not be satisfied in small sample sizes, because the variances and covariances of the residuals are only asymptotically normally distributed (Savalei, 2014). The remaining assumption, independence of the errors, is particularly crucial for our purposes here. For ordinary least squares regression this assumption is easily satisfied: as long as no observation affects another (e.g., participants are all measured individually and do not have any pre-existing relationships), the error terms may be assumed to be independent. In contrast, for SEM this assumption is necessarily violated, because it is not individual observations that are predicted but rather variances and covariances, which necessarily share at least some information in their calculation. As a result, most SEM estimation methods do not make such an assumption. Nonetheless, unweighted least squares (ULS) and diagonally weighted least squares (DWLS) assume independence of the covariance residuals, and so will always be inefficient in an SEM context and perform poorly without some form of correction (Savalei, 2014). Such corrections, often known as robust standard errors, serve to account for the misspecification made by inefficient estimators such as ULS and DWLS, as well as for violations that may arise for other estimation methods due to small sample sizes, such as non-normality of the residuals.

Robust standard errors are calculated using the following formula (adapted from Savalei, 2014):

$\hat{\Omega}_{LS} = (\Delta' W \Delta)^{-1}\, \Delta' W \Gamma W \Delta\, (\Delta' W \Delta)^{-1}$ (3)

where $(\Delta' W \Delta)^{-1}$ is the naïve covariance matrix of the parameter estimates, $\Gamma$ is the true asymptotic covariance matrix of the sample variances and covariances as estimated from the data, and $\Delta$ is the matrix of model derivatives evaluated at the parameter estimates $\hat{\theta}$. The subscript LS refers to any fit function that results in model residuals of a quadratic form, which here includes the estimators ULS, DWLS, GLS, and WLS. The naïve covariance matrix is correct only when the fit function is correctly specified; when it is not, the middle, $\Gamma$-based term corrects for misspecification that may arise when, for example, the residuals associated with the covariance matrix are not independent.

In addition to using robust standard errors to correct for the inefficiency of LS estimators, one may also use them to minimize the impact of violations of the assumptions of ML. Under the ideal conditions we investigate here (i.e., normally distributed variables with standard normal error), robust standard errors for maximum likelihood may seem an odd inclusion, because the corrections are largely unnecessary. Even so, robust standard errors are considered useful in small sample sizes (Savalei, 2014), and the corrections may meaningfully affect tests of the effects of interest in mediation.

Another alternative standard error for ML makes use of the first-order partial derivatives of the marginal log-likelihood rather than the second-order partial derivatives. This method typically underestimates standard errors (Cai, 2008), and so might be expected to have upwardly biased Type I error rates when testing mediation. However, such an underestimate may prove appropriate here, because most methods of testing indirect effects have very low Type I error rates. An underestimate may then result in more accurate Type I error rates (ideally, without exceeding the nominal rate), as well as higher power for testing the indirect effect, and so we consider the use of these standard errors here.


Note that these concerns are not simply a matter of the asymptotic efficiency of an estimator, even for the direct effect. In finite sample sizes it is not straightforward to rank the efficiency of all estimators across all conditions (Savalei, 2014), and similarly Type I error rates need not be constant, let alone always reflect the nominal rate. This issue is compounded by the fact that the distribution of the indirect effect complicates significance testing, making it further unclear how well each estimation method will perform when testing mediation.

Bootstrapping

Relative to the other methods of testing significance we have discussed, bootstrapping is unique in that tests are performed without any assumptions regarding the distribution of the parameter of interest. This has proven to be advantageous for dealing with the non-normal nature of the indirect effect, and investigations into its performance have consistently shown that it is more powerful than the Sobel (1982) test, and it has become the standard approach for testing mediation.

Two often-used bootstrapped confidence intervals for tests of mediation are the percentile bootstrap (Efron, 1979) and the bias-corrected and accelerated bootstrap (Efron, 1987). The former simply applies percentile cut-offs to the bootstrapped resamples to yield a confidence interval. The latter additionally applies a correction for bias in the estimated effect, as well as a correction for skewness in the distribution (Efron, 1987). Which method is to be preferred is somewhat unclear at present due to concerns regarding Type I errors (e.g., Fritz, Taylor, & MacKinnon, 2012; Hayes & Scharkow, 2013), but both methods are nonetheless considered superior to the Sobel test.
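The sketch below computes both intervals with the boot package, again assuming a hypothetical data frame d with columns x, m, and y:

library(boot)

# Statistic: the indirect effect ab computed on a resampled data set.
ab_stat <- function(data, idx) {
  s <- data[idx, ]
  a <- coef(lm(m ~ x, data = s))["x"]
  b <- coef(lm(y ~ x + m, data = s))["m"]
  unname(a * b)
}

set.seed(1)
bt <- boot(d, ab_stat, R = 5000)
boot.ci(bt, conf = 0.95, type = c("perc", "bca"))  # percentile and BCa CIs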


However, despite the popularity of bootstrapping, the reasons it performs so well have not been thoroughly investigated. Although bootstrapping is known to generally perform well for assumption-free distributions such as that of the indirect effect, little is known regarding the empirical distribution of the bootstrapped resamples (e.g., its bias, dispersion, skewness, etc.). Further, although recommended for small sample sizes (Shrout & Bolger, 2002), bootstrapping is in fact not necessarily stable in small sample sizes, and may suffer from increased Type I error rates when one path is large (Koopman et al., 2015). That this is the case suggests that the bootstrapped distribution of the indirect effect does not always accurately reflect the true distribution of the indirect effect.

Comparing Methods across α Levels

Nearly all research regarding methods of testing indirect effects has focused on α = .05 (one exception is MacKinnon, Lockwood, and Williams (2004), who also considered α levels of .1 and .2, but did so largely in passing). This focus on α = .05 seems likely due to two reasons. The first is that it is the normative level for researchers, who are unlikely to use higher α levels. The second is that focusing on α = .05 is to some degree a matter of necessity, as investigations of the performance of tests of the indirect effect typically present results in a number of tables: often one table for Type I error, another for statistical power, and perhaps another for coverage rates. Considering other α levels is then difficult, if only because of space limitations. Further, it is a great deal of information to integrate, and may be integrated only with considerable effort and risk of error.


Despite this difficulty, there is some merit in considering both lower and higher α levels for tests of the indirect effect, as doing so may help to better understand the performance of the bootstrap, as well as whether or not its use should be encouraged for other α levels. Ideally, the bootstrap distribution should have higher power at all α levels relevant to practice (typically .1 or less), but the asymmetry of the bootstrapped distribution may result in the superiority of bootstrapping over the Sobel test being reduced, if not reversed, at different decision thresholds.

One way to deal with the difficulty of considering other α levels is to make use of receiver operating characteristic (ROC) curves (Hanley & McNeil, 1982). In the context of statistical testing, ROC curves compare the ability of a method to detect an effect given that it is there (i.e., sensitivity, power, or 1 – the risk of a Type II error [β]) with the risk of false positives when there is no effect (i.e., specificity, or 1 – the risk of a Type I error [α]), based on various decision thresholds (in this case, α levels). These values are then used to create a curve, and methods may be compared by considering the area under the curve (AUC). The AUC is equal to the probability that a testing method will correctly classify a randomly drawn effect (or lack thereof) as being present or not across all decision thresholds, and may range from 0 to 1. A value of .5 indicates chance performance, and higher values are associated with better classification methods.
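For illustration, the AUC can be approximated by the trapezoid rule from the observed error rates at each α level; type1 and power below are hypothetical vectors of observed Type I error rates and power, one entry per α level:

# Trapezoidal AUC from observed Type I error (1 - specificity) and
# power (sensitivity) across the studied alpha levels.
roc_auc <- function(type1, power) {
  o <- order(type1)
  x <- c(0, type1[o], 1)       # 1 - specificity
  y <- c(0, power[o], 1)       # sensitivity
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}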

The obvious advantage of ROC curves is that they afford a simple means of considering the relative performance of methods across a variety of decision thresholds. If two curves do not cross, then the relative ranking of the two methods is constant across all α levels. The parametric methods we use here can be expected to behave in such a manner, with each method performing consistently better (or worse) than another across decision thresholds. In contrast, the ROC curves for nonparametric bootstrapping might be expected to cross those of the Sobel-based approaches, as bootstrapping does not assume symmetry of the distribution of the parameter of interest. Whether or not this is the case is unclear, but it is possible that bootstrapping may not be as clearly preferable at some decision thresholds, and may in fact be worse in some cases.

Purpose

The primary purpose of this study is simply to compare the relative performance of a variety of methods of testing for indirect effects, and to do so across different α levels both as a means of informing practice, and as a means of better understanding bootstrapping. In service to that, we will also discuss the direct effect because it serves as a control condition and provides a simple starting point before discussing the more complex indirect effect. Further, the secondary purpose of this study is to better understand the performance of bootstrapping by way of considering its distribution. We do so because it is well-known that the BCa has higher power than the percentile bootstrap (e.g., Hayes & Scharkow, 2013), but at present it is unclear why its corrections are necessary to achieve this result. Both purposes will be facilitated by the use of ROC curves.

In regards to SEM estimation methods and regression, it can generally be expected that the estimation methods will perform comparably well, with a slight advantage for ML because of its efficiency for multivariate normal data, which is what we make use of here. Caveats to the hypothesized superiority of SEM are the estimation methods ULS and DWLS, which are known to be inefficient in an SEM context because of their use of a diagonal weight matrix (Savalei, 2014). As such, they are likely to have much lower power than the other estimation methods, and so lower AUC.

The expected behavior of the ROC curve for bootstrapping is less clear, however. Work on bootstrapping in mediation has typically, if not exclusively, focused on the relative performance of each method at a nominal α of .05, but because of the positive skew of the distribution of the product, the advantages of bootstrapping may be reduced, if not reversed, at other α levels. At what α levels this might be true was not predicted a priori.

Method

Design.

Sample sizes. The present study made use of sample sizes of N = 50, 100, and 200. These sample sizes were chosen as reasonably representative of common sample sizes employed in psychological research, and have often been employed in past mediation simulation research (e.g., Hayes & Scharkow, 2013).

Regression weights. We used regression weights of 0, .14, .39, and .59, as these weights are often used in mediation simulation work (e.g., Hayes & Scharkow, 2013), and represent no, small, medium, and large effect sizes, respectively.

Data Generation. Data were created by generating values of X from a standard normal distribution, with M generated from the values of X using the appropriate effect size and standard normal error (e.g., values of X * .14, plus error). Values of Y were then generated from the values of X and M using the appropriate weights, again with standard normal error. Together there are 64 possible effect size combinations, and with sample size considered there were 192 conditions in total. For each condition there were 1,000 replications, for a total of 192,000 simulated data sets.
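A minimal sketch of this generation scheme for one illustrative condition (the weights and N shown are examples, not a specific cell of the design):

# One simulated data set: a = path X -> M, b = path M -> Y, cp = direct effect.
gen_data <- function(n, a, b, cp) {
  x <- rnorm(n)                     # standard normal X
  m <- a * x + rnorm(n)             # M from X plus standard normal error
  y <- cp * x + b * m + rnorm(n)    # Y from X and M plus standard normal error
  data.frame(x = x, m = m, y = y)
}
d <- gen_data(100, a = .39, b = .39, cp = .14)  # illustrative condition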

Methods of Significance Testing.

For each method listed below, we calculated confidence intervals and rejection rates based on α levels of .01, .02, .03, .04, .05, .06, .08, .10, .12, .14, .16, .18, .20, .25, .30, .40, and .50. Additionally, we calculated the average standard error for each method per condition, so as to compare it to the standard deviation of the estimates themselves. This afforded a method of examining any bias that might occur in the estimation of standard errors for the parametric methods.

For bootstrapping, we saved the standard deviation of the bootstrapped resamples for each replication, as well as their mean, skewness, and kurtosis, so as to provide some sense of the average shape of the bootstrapped distribution to compare it to the distribution of the estimates across all replications. It is worth acknowledging that the standard deviation is somewhat irrelevant for bootstrapping because of its use of cutoffs from an empirical distribution and the asymmetry of the indirect effect, but it nonetheless serves as a measure of dispersion (that is admittedly non-optimal because of its reliance on the mean of a skewed distribution) that may be used to describe the distribution of the bootstrapped resamples.
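As a sketch, these summaries can be computed directly from the vector of bootstrap estimates (here bt$t, continuing the hypothetical boot object from the earlier sketch):

# Descriptive summary of one bootstrapped distribution of ab.
est <- as.vector(bt$t)
z   <- (est - mean(est)) / sd(est)
c(mean = mean(est), median = median(est), sd = sd(est),
  skew = mean(z^3), kurtosis = mean(z^4) - 3)   # excess kurtosis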


Regression. For regression models, tests of the direct and indirect effect were calculated using the normal theory approaches of the Sobel standard error for the indirect effect, and typical regression standard errors for the direct effect.

Structural equation modeling. We made use of five SEM estimation methods, and refer to them using the abbreviations from the R package lavaan (Rosseel, 2012): maximum likelihood (ML), unweighted least squares (ULS), generalized least squares (GLS), weighted least squares (WLS), and diagonally weighted least squares (DWLS).

We also considered alternative standard errors. In the case of maximum likelihood, we made use of standard errors based on first-order partial derivatives (MLF), as well as robust standard errors (MLM). Among the least squares estimators, we applied robust standard errors only to ULS (ULSM), because all least squares estimation methods yield identical results for a mediation model when robust standard errors are used.
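A rough sketch of how such models are specified and estimated in lavaan (the estimator labels follow lavaan's interface; the data object dat is from the sketch above):

library(lavaan)
model <- '
  m ~ a * x
  y ~ b * m + cp * x
  ab := a * b          # defined parameter for the indirect effect
'
fit_ml  <- sem(model, data = dat, estimator = "ML")
fit_mlm <- sem(model, data = dat, estimator = "MLM")   # ML with robust SEs
fit_mlf <- sem(model, data = dat, estimator = "MLF")   # ML with first-order SEs
fit_uls <- sem(model, data = dat, estimator = "ULS")
parameterEstimates(fit_ml)   # includes a, b, cp, and the defined parameter ab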

Bootstrapping. Each simulated data set was bootstrapped 5000 times, and we made use of two methods for constructing bootstrapped confidence intervals. The first was the percentile bootstrap, which simply takes the values at the appropriate percentiles of the bootstrapped distribution as confidence limits (Efron, 1979). The second was the bias-corrected and accelerated bootstrap (BCa; Efron, 1987), which includes corrections for bias and skewness in the bootstrapped distribution.
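Both interval types can be obtained with the boot package, as in the following sketch (again using the simulated data from above, with 5000 resamples as in the simulations):

library(boot)
ab_stat <- function(d, idx) {
  d <- d[idx, ]
  coef(lm(m ~ x, data = d))[["x"]] * coef(lm(y ~ x + m, data = d))[["m"]]
}
bt <- boot(dat, ab_stat, R = 5000)
boot.ci(bt, conf = 0.95, type = c("perc", "bca"))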

Results

To begin, we created ROC curves to compare power to detect effects of any magnitude against the risk of Type I errors when at least one path in the indirect effect (a or b) was equal to 0. This allowed a comparison of Type I error rates against the gains in power afforded by increasing α levels, and of whether the relative performance of a method changed across α levels. In order to create each curve and to calculate the AUC, an additional α level of 1 was added, for a total of 17 points. Adjacent points were then linearly interpolated to yield smooth curves, with the exception of straight lines observable in the curves at lower levels of specificity as a result of the gap between α = .5 and α = 1. Additionally, so as to avoid confusion, all percentages referred to here are absolute percentages or differences; e.g., if one method rejected 20% of the time and another 60% of the time, then this was reported as a 40% difference.
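A minimal sketch of the AUC computation from the per-α rejection rates; the fpr and tpr vectors below are placeholders standing in for the observed Type I error and power rates from the simulation output:

alphas <- c(.01, .02, .03, .04, .05, .06, .08, .10, .12, .14,
            .16, .18, .20, .25, .30, .40, .50, 1)
fpr <- c(0, alphas)                  # specificity = 1 - fpr
tpr <- c(0, pmin(1, 1.5 * alphas))   # placeholder power values
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)  # trapezoidal rule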

ROC curves for the direct effect are shown in Figures 1, 2, and 3, and for the indirect effect in Figures 4, 5, and 6. Each figure plots the sensitivity across the full range of specificity values, as well as the sensitivity as a function of nominal α levels for a restricted range (α = .01 to .20), because the uniformly low observed Type I error rates result in overlapping lines that are difficult to distinguish visually in the full plots. Tables 1 and 2 provide sharper resolution regarding the correct and incorrect rejection rates for the direct effect for αs of .01, .05, .10, and .20, and Tables 3 and 4 do so for the indirect effect.


Figure 1. ROC curve for the direct effect and N = 50, collapsed across all effect size combinations. Full plot comparing the specificity (1 – observed Type I error rate) and sensitivity (1 – observed Type II error rate) is on the right, and the plot on the left shows a limited range of nominal α levels for comparison.


Figure 2. ROC curve for the direct effect and N = 100, collapsed across all effect size combinations. Full plot comparing specificity (1 – observed Type I error rate) and sensitivity (1 – observed Type II error rate) is on the right, and the plot on the left shows a limited range of nominal α levels for comparison.


Figure 3. ROC curve for the direct effect and N = 200, collapsed across all effect size combinations. Full plot comparing specificity (1 – observed Type I error rate) and sensitivity (1 – observed Type II error rate) is on the right, and the plot on the left shows a limited range of nominal α levels for comparison.


Observed Power for Methods of Testing the Direct Effect

            α = .01               α = .05               α = .10               α = .20
N           50    100   200       50    100   200       50    100   200       50    100   200
Regression  0.449 0.641 0.742     0.596 0.732 0.818     0.667 0.780 0.859     0.747 0.835 0.904
Percentile  0.461 0.646 0.744     0.607 0.736 0.820     0.676 0.784 0.861     0.754 0.838 0.905
BCa         0.460 0.645 0.744     0.605 0.736 0.820     0.675 0.784 0.861     0.753 0.838 0.905
ML          0.490 0.654 0.746     0.620 0.739 0.821     0.685 0.786 0.861     0.758 0.839 0.905
MLM         0.500 0.657 0.748     0.627 0.742 0.822     0.691 0.789 0.863     0.763 0.841 0.906
MLF         0.402 0.625 0.737     0.552 0.718 0.811     0.628 0.768 0.854     0.715 0.824 0.900
WLS         0.511 0.660 0.749     0.635 0.745 0.823     0.698 0.791 0.864     0.767 0.842 0.907
ULS         0.289 0.546 0.697     0.474 0.669 0.775     0.571 0.727 0.821     0.674 0.795 0.874
ULSM        0.505 0.658 0.749     0.631 0.744 0.823     0.695 0.790 0.863     0.765 0.842 0.907
GLS         0.483 0.652 0.745     0.616 0.738 0.820     0.682 0.785 0.861     0.755 0.838 0.905
DWLS        0.134 0.396 0.625     0.356 0.586 0.737     0.483 0.673 0.790     0.626 0.765 0.853

Table 1. Power for testing the direct effect for all methods, collapsed across all effect size combinations.


Observed Type I Error Rates for Methods of Testing the Direct Effect

            α = .01               α = .05               α = .10               α = .20
N           50    100   200       50    100   200       50    100   200       50    100   200
Regression  0.009 0.010 0.011     0.051 0.053 0.050     0.102 0.104 0.102     0.199 0.199 0.200
Percentile  0.015 0.014 0.012     0.064 0.062 0.055     0.117 0.114 0.106     0.218 0.212 0.204
BCa         0.016 0.014 0.012     0.064 0.062 0.056     0.117 0.113 0.105     0.219 0.210 0.204
ML          0.016 0.014 0.012     0.064 0.061 0.054     0.119 0.112 0.107     0.221 0.208 0.206
MLM         0.022 0.016 0.013     0.077 0.067 0.058     0.133 0.120 0.108     0.235 0.219 0.207
MLF         0.008 0.010 0.010     0.040 0.047 0.047     0.080 0.091 0.096     0.162 0.180 0.190
WLS         0.025 0.017 0.013     0.083 0.070 0.059     0.140 0.123 0.110     0.244 0.223 0.209
ULS         0.002 0.003 0.002     0.019 0.022 0.024     0.047 0.056 0.056     0.117 0.132 0.133
ULSM        0.024 0.017 0.013     0.080 0.069 0.058     0.136 0.121 0.109     0.240 0.220 0.208
GLS         0.014 0.013 0.012     0.062 0.060 0.053     0.115 0.110 0.106     0.216 0.206 0.205
DWLS        0.001 0.002 0.002     0.014 0.017 0.020     0.039 0.045 0.046     0.110 0.118 0.112

Table 2. Type I error rates when testing the direct effect for all methods, collapsed across all effect size combinations.


Direct Effect.

We begin with the direct effect because it is normally distributed and simple to examine before turning to the more complex indirect effect. Immediately apparent is that most methods performed comparably well for testing c', with minimal differences across α levels and sample size combinations. As there are clear groupings of estimator performance, we will discuss them in four blocks.

The first grouping consisted of regression, maximum likelihood (ML), and generalized least squares (GLS), which performed similarly across conditions. Interestingly, regression had a slight advantage in AUC over both estimation methods when averaged across conditions. However, this was driven partly by the fact that regression had Type I error rates below the nominal rate, whereas SEM had more accurate Type I error rates. Further, SEM using ML had higher power than regression, at least for detecting small effects (for larger effects the differences were minimal, e.g., .2%). Where c' = .14, regression rejected 17.1%, 26.8%, and 48.4% of the time for N = 50, 100, and 200, respectively; ML rejected 18.6%, 28.3%, and 49.1% of the time. This is of course a modest difference, but it does suggest that SEM outperformed regression both in power and in the accuracy of Type I error rates.

The second grouping consisted of weighted least squares (WLS), unweighted least squares with robust SEs (ULSM), and maximum likelihood with robust SEs (MLM), all of which performed about the same. These methods were generally worse than ML or regression in terms of AUC, albeit only slightly so. The difference arose because, although these methods generally had slightly higher power than, e.g., ML (roughly 1-2% higher), this gain came at the cost of Type I error rates that were 1-2% above the nominal rate for N = 50 or N = 100; this difference decreased for N = 200.

The third category consists of the remaining parametric methods, which performed clearly worse than all others: maximum likelihood with first-order SEs (MLF), unweighted least squares (ULS), and diagonally weighted least squares (DWLS). In regard to MLF, although its AUC values were comparable to ML, this similarity belies the fact that MLF had both low Type I error rates and low power, particularly for small sample sizes. The remaining two, ULS and DWLS, performed particularly poorly. In contrast to the previous category, these methods had Type I error rates 1-2% lower than the nominal rate. However, their power was far less than that of all other approaches for N = 50 or N = 100, and for small effects these methods correctly rejected the null up to about 7% less often than the other methods (see Tables 1 and 2). DWLS performed particularly poorly: for medium effects it rejected 50% less often given N = 50, and 30% less given N = 100; for large effects, the difference in power was as much as 80% for N = 50, and 44% for N = 100.

The fourth category consists of both bootstrapping methods. These confidence intervals performed well, albeit with slightly lower power than maximum likelihood; this is unsurprising given that the simulated data adhere to the assumptions of maximum likelihood. Across effect sizes, the percentile bootstrap performed slightly better than the BCa in terms of AUC. Both methods had Type I error rates in line with the nominal rates, and had higher power than regression alone, but lower power than ML, WLS, GLS, and the robust standard error variants (MLM and ULSM).


It should be acknowledged that the effect of sample size observed here was not linear. Specifically, for N = 100, all methods, both parametric and nonparametric, had increased Type I error rates relative to N = 50 and N = 200. Because the true Type I error rate does not depend on sample size (only our simulated estimate of it does), we do not interpret this pattern.


Figure 4. ROC curve for the indirect effect and N = 50, collapsed across all effect size combinations. Full plot comparing specificity (1 – observed Type I error rate) and sensitivity (1 – observed Type II error rate) is on the right, and the plot on the left shows a limited range of nominal α levels for comparison.


Figure 5. ROC curve for the indirect effect and N = 100, collapsed across all effect size combinations. Full plot comparing specificity (1 – observed Type I error rate) and sensitivity (1 – observed Type II error rate) is on the right, and the plot on the left shows a limited range of nominal α levels for comparison.


Figure 6. ROC curve for the indirect effect and N = 200, collapsed across all effect size combinations. Full plot comparing specificity (1 – observed Type I error rate) and sensitivity (1 – observed Type II error rate) is on the right, and the plot on the left shows a limited range of nominal α levels for comparison.


Observed Power for Methods of Testing the Indirect Effect

            α = .01               α = .05               α = .10               α = .20
N           50    100   200       50    100   200       50    100   200       50    100   200
Regression  0.135 0.370 0.530     0.309 0.514 0.661     0.416 0.593 0.735     0.544 0.693 0.826
Percentile  0.236 0.445 0.572     0.392 0.564 0.695     0.480 0.632 0.762     0.582 0.715 0.839
BCa         0.283 0.474 0.593     0.449 0.599 0.722     0.542 0.668 0.790     0.649 0.756 0.864
ML          0.150 0.377 0.532     0.326 0.520 0.664     0.431 0.599 0.738     0.556 0.697 0.828
MLM         0.166 0.383 0.536     0.336 0.526 0.665     0.440 0.605 0.740     0.565 0.700 0.828
MLF         0.076 0.327 0.515     0.225 0.482 0.642     0.338 0.563 0.723     0.479 0.667 0.815
WLS         0.178 0.389 0.538     0.349 0.531 0.667     0.452 0.608 0.742     0.572 0.704 0.830
ULS         0.029 0.253 0.483     0.174 0.446 0.612     0.302 0.541 0.698     0.467 0.653 0.796
ULSM        0.171 0.386 0.537     0.343 0.529 0.666     0.446 0.606 0.741     0.569 0.702 0.829
GLS         0.144 0.374 0.532     0.320 0.517 0.662     0.424 0.596 0.737     0.551 0.696 0.827
DWLS        0.000 0.007 0.334     0.004 0.251 0.543     0.086 0.434 0.642     0.344 0.601 0.760

Table 3. Power for testing the indirect effect for all methods, collapsed across all effect size combinations.


Observed Type I Error Rates for Methods of Testing the Indirect Effect

            α = .01               α = .05               α = .10               α = .20
N           50    100   200       50    100   200       50    100   200       50    100   200
Regression  0.000 0.001 0.004     0.011 0.017 0.024     0.034 0.048 0.054     0.097 0.118 0.131
Percentile  0.006 0.007 0.008     0.031 0.036 0.036     0.064 0.072 0.074     0.133 0.148 0.154
BCa         0.011 0.012 0.011     0.052 0.054 0.051     0.106 0.104 0.101     0.216 0.216 0.212
ML          0.001 0.002 0.004     0.014 0.019 0.025     0.039 0.050 0.056     0.106 0.122 0.133
MLM         0.002 0.003 0.004     0.018 0.023 0.026     0.046 0.056 0.059     0.116 0.130 0.138
MLF         0.000 0.001 0.003     0.005 0.012 0.020     0.018 0.036 0.047     0.064 0.097 0.120
WLS         0.002 0.003 0.004     0.021 0.025 0.027     0.051 0.058 0.060     0.123 0.133 0.140
ULS         0.000 0.000 0.001     0.002 0.007 0.013     0.011 0.025 0.036     0.055 0.084 0.102
ULSM        0.002 0.003 0.004     0.020 0.024 0.027     0.049 0.057 0.060     0.120 0.131 0.139
GLS         0.001 0.001 0.004     0.013 0.019 0.024     0.037 0.049 0.055     0.102 0.120 0.133
DWLS        0.000 0.000 0.000     0.000 0.002 0.008     0.003 0.015 0.028     0.038 0.069 0.087

Table 4. Type I error rates when testing the indirect effect for all methods, collapsed across all effect size combinations.


Indirect Effect.

As in the case of the direct effect, we will again present the methods as falling into four categories, with the first three parametric and the fourth the nonparametric bootstrapping approaches. Specifically, the parametric estimation methods fall into three categories: the first includes ML, GLS, and regression; the second WLS and the robust standard error approaches (ULSM and MLM); and the third ULS, DWLS, and MLF.

In general, the relative performance of the parametric methods was as might be expected from the results for the direct effect, again with modest differences in AUC values across methods. The methods in the first category again had the highest AUC values. Further, as with the direct effect, the differences between regression and SEM with ML were small, with the latter appearing preferable in that both methods had approximately equal Type I error rates, but SEM with ML had higher power (cf. Iacobucci et al., 2008). This may be seen in Tables 3 and 4.

Given the recommendation of Iacobucci et al. (2008) to use SEM exclusively over regression, and given that SEM estimation methods tend to require larger sample sizes, we also considered possible bias in the estimated standard error relative to the observed standard deviation of the estimated effects across all 1000 replications, which serves as an estimate of the true distribution of the indirect effect. As can be seen in Figures 7 and 8, as sample size increased so too did the accuracy of both OLS regression and SEM with ML, such that for N = 200 both approaches yielded reasonably accurate estimates of the true distribution. In contrast, for N = 50 both approaches yielded clearly biased estimates. Interestingly, the biases for these two methods were in opposite directions: for N = 50, regression tended to overestimate the standard deviation, whereas SEM with ML tended to underestimate it.
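A sketch of this comparison for the regression/Sobel case, reusing the gen_data() sketch from the Method section (the condition shown is illustrative):

reps <- replicate(1000, {
  d  <- gen_data(n = 50, a = .39, b = .39, cprime = .14)
  fm <- lm(m ~ x, data = d)
  fy <- lm(y ~ x + m, data = d)
  a  <- unname(coef(fm)["x"]); b <- unname(coef(fy)["m"])
  se_a <- coef(summary(fm))["x", "Std. Error"]
  se_b <- coef(summary(fy))["m", "Std. Error"]
  c(ab = a * b, se = sqrt(b^2 * se_a^2 + a^2 * se_b^2))
})
mean(reps["se", ]) - sd(reps["ab", ])  # > 0 means the SE overestimates the SD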

The remaining parametric approaches (MLF, DWLS, and ULS) again had the lowest AUC values among the parametric methods, but the differences were quite small compared to those observed for the direct effect. This belies the fact that for small α levels and sample sizes these methods have much lower power than the other methods, as is readily apparent from the clearly lower lines in Figures 4, 5, and 6.

In regard to bootstrapped confidence intervals, both the percentile and the BCa bootstrap generally had lower AUC values than the parametric approaches. At first glance this appears to stand in contrast to standard practice and recommendations regarding the use of bootstrapping, but in fact our results do not conflict with such recommendations. The gain in power afforded by bootstrapping comes at the modest cost of higher (but still generally acceptable) Type I error rates, with this trade-off resulting in similar AUC values.


Figure 7. Difference between the standard deviation of ab estimates across all replications for a given condition, compared to the observed standard deviation of the replications. Calculated as the mean estimated SE for regression across all replications – SD of replications.

Figure 8. Difference between the standard deviation of ab estimates across all replications for a given condition, compared to the observed standard deviation of the replications. Calculated as the mean estimated SE for SEM with ML across all replications – SD of replications.


Comparison of Distributions

In the previous section we found that methods of testing the indirect effect perform roughly as might be expected from the results for the direct effect, and as might be expected from the literature on testing indirect effects. However, it remains unclear how well bootstrapping recovers the true distribution of the indirect effect.

To better understand the performance of the percentile bootstrap, and by extension why some form of correction, such as that of the BCa, is necessary to maximize power and coverage of the true effect (cf. Hayes & Scharkow, 2013), we considered the average properties of the distribution of the estimates for a condition (the initial estimates for each replication) relative to the averaged bootstrapped distributions (1000 replications, with 5000 bootstraps each). Stated less precisely, we compared the observed distribution of the estimates to the bootstrapped distributions. Specifically, we considered the mean, the median, the standard deviation vs. standard error, the skewness, and the kurtosis of these two distributions. Comparisons of these measures are shown in Figures 9 through 13, and the differences between the two distributions as they relate to the magnitude of ab are shown in Figures 14 through 18.


Figure 9. Mean of the mean for bootstrap resamples across all replications, compared to the mean of all replications.

Figure 10. Median of the median for bootstrap resamples across all replications, compared to the median of all replications.


Figure 11. Mean of the skew for bootstrap resamples across all replications, compared to the skew of all replications.

Figure 12. Mean of the kurtosis for bootstrap resamples across all replications, compared to the kurtosis of all replications.


Figure 13. Mean of the standard deviation for bootstrap resamples across all replications, compared to the standard deviation of all replications.


Figure 14. Difference between the mean ab across all replications for a given condition, compared to the mean ab from each bootstrapped distribution. Calculated as the mean of the means of the bootstrapped distributions – mean of replications.

Figure 15. Difference between the median ab across all replications for a given condition, compared to the median of the median ab from each bootstrapped distribution. Calculated as the median of the medians of the bootstrapped distributions – median of replications.


Figure 16. Difference between the skew of the distribution of the estimated values of ab for a given condition across all replications, compared to the mean skew from each bootstrapped distribution. Calculated as mean skew of bootstrapped distributions – skew of replications.

Figure 17. Difference between the kurtosis of the distribution of the estimated values of ab for a given condition across all replications, compared to the mean kurtosis from each bootstrapped distribution. Calculated as mean kurtosis of bootstrapped distributions – kurtosis of replications.


Figure 18. Difference between the standard deviation of the distribution of the estimated values of ab for a given condition across all replications, compared to the mean standard deviation from each bootstrapped distribution. Calculated as mean standard deviation of bootstrapped estimates – the standard deviation of the replications.


Bootstrapping generally estimated the mean of the distributions accurately (Figures 9 and 14). Figure 14 suggests a slight tendency to overestimate the mean when a = b = .59, but the magnitude of the difference is so small as to be negligible. As such, bootstrapping may be considered to perform well in recovering the mean of the indirect effect for all effect sizes.

For the remaining distribution characteristics, it is useful to distinguish between cases where ab = 0 and where ab > 0. We begin with the cases where ab = 0. For such cases, bootstrapping accurately estimated the median (Figures 10 and 15), with increasing sample size further improving accuracy. In contrast, bootstrapping tended to underestimate the magnitude of skew, such that the skew of the bootstrapped distributions was reduced to near zero despite the distribution of the replications having negative skew (Figures 11 and 16). Similarly, under the null, bootstrapping grossly underestimated more extreme values of kurtosis in the indirect effect distribution (e.g., a true kurtosis of 12 estimated as 7; Figures 12 and 17). Finally, consistent with the well-known fact that percentile bootstrapped confidence intervals rarely result in a Type I error, bootstrapping clearly overestimated the standard deviation of the true distribution when ab = 0 (Figures 13 and 18). In sum, when there was no indirect effect, bootstrapping accurately captured measures of central tendency (the mean and the median), overestimated dispersion, and underestimated the magnitude of skew and kurtosis.

We now turn to the cases where ab > 0. Here, bootstrapping again accurately estimated the mean of the true distribution, with very little difference between the bootstrapped distribution and the replication distribution. However, bootstrapping showed a tendency to underestimate the median (Figure 10), with the magnitude of the underestimate increasing with ab (Figure 15) and decreasing with N. This bias was modest, however, and was at most an underestimate of .011, for ab = .3481 and N = 50. In regard to skew (Figures 11 and 16), bootstrapping again tended to underestimate its magnitude, with the result that the bootstrapped distribution was less skewed than the replication distribution; the degree of underestimation decreased with ab. For kurtosis, unlike when ab = 0, the kurtosis of the true and bootstrapped distributions were similar, and as ab increased, bootstrapping more precisely estimated the kurtosis of the distribution of the replications. In regard to the standard deviation when the null was false, we found that in general bootstrapping overestimated the standard deviation of the distribution of the replications. This effect was curvilinear, however, such that the degree of overestimation decreased until a = b = .39 (ab = .1521), but increased again when either a or b was equal to .59 (ab = .2301 or .3481).

Discussion

Past work on methods of testing indirect effects has almost exclusively focused on α = .05, and such work has clearly established the superiority of bootstrapping over the Sobel test at that decision criterion. As we have shown here, bootstrapping is also preferable at other decision thresholds, including those relevant to common practice (i.e., α = .01 or .10).

For testing direct or indirect effects with parametric methods, there was relatively little difference between the AUC values of most methods, and most performed comparably well across α levels, excepting the poor performance of ULS, DWLS, and ML with first-order standard errors (MLF). Of note is that, consistent with Iacobucci et al. (2007), ML with naïve standard errors had higher power at lower α levels, as well as more accurate Type I error rates, than regression. Nonetheless, the AUCs of ML and regression typically agreed to the second or third decimal place, with regression outperforming SEM under some conditions. Perhaps unsurprisingly, then, deciding between these two methods is a matter of one's weighting of Type I and Type II errors.

In regard to bootstrapping, there were two main findings. The first was simply that at all α levels considered here, and regardless of parameter estimation method, bootstrapping maintained its power advantage over the Sobel test. At least for strictly manifest variable mediation models, then, SEM and regression will yield identical conclusions regarding the indirect effect if one makes use of bootstrapping, and the recommendation of Iacobucci et al. (2008) to use SEM over regression must be qualified by that fact.

The second finding is that, interestingly, the distribution of the bootstrapped indirect effect does not accurately reflect the true distribution of the indirect effect, at least not for the sample sizes considered here. This is not the first example of bootstrapping failing to accurately recover a true distribution; indeed, such failures when constructing confidence intervals are what originally led Efron (1987) to create the bias-corrected and accelerated (BCa) bootstrap.

Although our results here are of course preliminary, they do provide some explanation for the superiority of bootstrapping. It is well known that the percentile bootstrap has low Type I error rates (e.g., Koopman et al., 2015), and as we show here, this may be attributed to the fact that under the null the bootstrapped distributions are clearly wider than the true distribution. In contrast, for small effects this overestimation is much smaller, which may explain in part why bootstrapping performs so well for such effects.

In practice, which bootstrapped confidence interval is to be preferred depends upon one's weighting of Type I and Type II errors (cf. Hayes & Scharkow, 2013), as well as effect size. If one is most concerned about false positives, then the percentile bootstrap appears to be the best option: in general it has Type I error rates below the nominal level, yet is still more powerful than Sobel-based approaches. Additionally, the percentile bootstrap had higher AUC values than the BCa, and in general will result in more accurate conclusions (note that this is across α levels, rather than specific to α = .05).

If one is most concerned about power, then the BCa is to be preferred when testing indirect effects. For α = .05, its correct rejection rates were 3-5% higher than those of the percentile bootstrap. However, this gain in power comes at a cost of increased Type I error rates, both relative to the lower error rates of other methods and in absolute terms, as false positives may exceed the nominal α level if one path is large or sample size is small (cf. Koopman et al., 2015). In such cases, Type I error rates are roughly 1-4% higher than the nominal rates.

Future Directions.

Here we considered only the single mediator case because of its popularity and because it is a natural starting point for mediation research, but it is unlikely that only a single process mediates the relationships of interest in psychology (Baron & Kenny, 1986). A clear next step for future research is therefore to consider the relative performance of the methods examined here in a parallel or serial mediator context, which would have at least two benefits. The first is that it would assess these methods under conditions generally considered more likely to reflect the underlying processes. The second is that it would allow some investigation into the effects of misspecification were one to falsely assume that the path between mediators is zero (i.e., to treat a serial mediator case as a parallel one). This is perhaps not often of interest to users of mediation, but given that the interpretation of the variable relationships may be strongly contingent on the relationship between the two mediators, it is worth considering.

Regarding the performance of bootstrapping, we presented only an initial investigation into the reasons for its performance. In addition to the above point regarding parallel and serial mediator cases, a more thorough investigation into the effects of both sample size and effect size is warranted. Consistent with past work showing that bootstrapping tends to perform poorly in small samples (Koopman et al., 2015), we found that as sample size increased, the accuracy of the bootstrapped distributions improved. In regard to effect size, the biases observed for both skew and the standard deviation were clearly non-constant across effect sizes, and a deeper investigation would likely reveal how this relates to the observed performance of bootstrapping. Such an investigation might also examine why there was no apparent crossing of the ROC curves. Speculatively, this may be due to the non-constant bias in skew and dispersion as effect sizes change, but a more in-depth analysis is required to understand these findings.


Finally, although we considered why the corrections offered by the BCa are necessary when testing the indirect effect, we did not investigate the effectiveness of the corrections in relation to the true distribution. The BCa is known to have inflated Type I error rates in some cases (e.g., small sample sizes), and so it is clear that the corrections do not perfectly recover the true distribution of the indirect effect. This may be investigated in greater detail by comparing the analytic approximation to the BCa developed by Efron (2003) to the distribution of the product developed by MacKinnon and colleagues (MacKinnon, Lockwood, & Williams, 2004).


Chapter 3: Factor Model as an Alternative Explanation

In practice it is rarely the case that all confounding variables are known, and there is often limited information regarding the nature, effects, and number of the unknown confounders. Even so, it is possible to consider the sensitivity of the estimated effects to possible confounding. Doing so provides some information about how wrong a model must be for the conclusions drawn from it to be undermined, or alternatively may be considered to provide information about alternative ways of viewing the data.

Sensitivity analysis for mediation may take a few forms. Ideally, parameter sensitivity is considered by way of measured variables that can be statistically controlled for. Without such variables, sensitivity analysis may be conducted using formulae that allow one to consider arbitrarily strong unknown confounders. These approaches have developed to the point that one may consider dichotomous and continuous confounders without being confined to a particular model (e.g., VanderWeele & Arah, 2011).

A third approach that is not often explicitly discussed when using mediation is to make use of different models as a means of examining parameter sensitivity. Selig and Preacher (2009) discuss a few relevant models, including autoregressive models, latent growth curves, and latent difference scores. Additionally, Maxwell and Cole (2007) discuss the use of cross-lagged panel designs to reduce bias in the estimated mediation effects. The approach we take here also considers the utility of alternative models in examining the quality of mediation claims. Specifically, we make use of factor models, which are formally equivalent to mediation models in that both may perfectly recreate the correlations between a set of variables, provided that enough factors are employed (at most, one less than the number of variables involved in the mediation scheme).

Using factor models to examine parameter estimates has a few advantages. The first is that it captures the potential effects of confounding in an elegant way, and it is simple and straightforward to use and interpret. Factor models have the useful property of constraining all correlations between variables to be 0, conditional on the latent variable(s). Applied to mediation, this yields a way to quantify the worst-case confounding scenario, in which all apparent direct and indirect effects are in fact due to variables not included in the mediation scheme. Regardless of the true number of missing variables, the factors themselves may be interpreted as an amalgamation of all confounding effects, with the loadings serving as weights that summarize their effects on the manifest variables of interest.

In addition to capturing all confounding effects in a worst-case scenario, a factor model may also be considered a viable alternative model in its own right. This is particularly true when the variables involved are of a similar kind (e.g., emotions, attitudes), as the more related a set of measured variables are, the more likely it is that they simply represent the same variable measured repeatedly. Indeed, this is well known to users of mediation, and reflected in concerns that M and Y may represent the same variable (Bullock, Green, & Ha, 2010). The result is that researchers must show that X, M, and Y in fact measure different variables, or at least more than a single variable.

Further, psychological research and theory often assume that a small number of latent variables are responsible for a variety of observed variables. This is clearest in the case of self-report scales that use multiple indicators for a latent variable, but a few general examples include stereotype threat effects explained by working memory (Schmader & Johns, 2003), persuasion by levels of thought confidence (Petty, Briñol, & Tormala, 2002), and performance by arousal (e.g., Anderson, 1994). Of course, variables that are not conceptually or methodologically similar (e.g., self-report vs. physiological measures) are less likely to share a factor space, though they may well share a common cause (e.g., a stressor affecting both reported anxiety levels and cortisol levels).

In such cases where a factor model is appropriate, either as an alternative explanation or as a method of capturing confounding effects, the implication is that it is not necessarily the variables of interest that are responsible for any effects; rather, the underlying factors are responsible for the relationships and the changes in the variables of interest. Woody (2011) briefly discussed this possibility, and showed with a single example that it is easy to use scale items to test for mediation and obtain statistically significant results. Specifically, he showed that using three items from a dieting scale (the Restraint Scale; Herman & Mack, 1975) as X, M, and Y resulted in apparent mediation. Such an example is perhaps a bit contrived, but it nonetheless illustrates a case where a mediation model would conflict with both standard practice and intuition. That is not to say that mediation as it is often described could not occur between single items of a self-report measure, but rather that, because of the similarity of the variables, it would be more difficult to show that there are causal relationships of the kind supposed by mediation.

Factor Spaces in Mediation

Before illustrating how mediation results may appear when a factor space underlies the variables, we will first further develop why a factor space may be used to capture confounding effects or serve as an alternative explanation in its own right. Because the interpretation of a factor model is largely up to its user, we will not belabor the distinction between confounding and alternative models, and instead leave it to the user to decide which is most appropriate for a given set of variables. The viability of a factor model is most evident in the case of non-experimental, cross-sectional data with conceptually similar variables, and so we begin there.

We focus primarily on conceptually similar variables, as these seem likely to be the cases where a factor model is of greatest utility. By conceptually similar variables, we mean variables that may be considered to be of the same type (e.g., emotions, indicators of intelligence or self-esteem, interpersonal styles). The similarity of such variables makes it likely that they share similar causes, and in cases involving conceptually similar variables, a low-dimensional factor space is a simple and self-evident alternative to a mediation scheme that is consistent with common practice.

The possibility that a smaller number of variables is responsible for a set of variable relationships is of course one of the core motivations behind latent variable modeling more generally, and situations where theory describes a shared factor space underlying the observed variables are not uncommon. In general, psychological research and theory assume that a small number of latent variables are responsible for a variety of observed variables. Some circumplex examples include emotions (Russell, 1980; 2003) and interpersonal relationships (Gurtman, 2009), and one or two factors are also described for domains such as self-monitoring (Lennox & Wolfe, 1984; Snyder, 1974), narcissism (Dickinson & Pincus, 2003; Raskin & Terry, 1988), and self-esteem (Rosenberg, 1965; Tafarodi & Swann, 2001).

One might be inclined to believe that using latent variables to capture X, M, and Y, with multiple indicators for each factor, precludes the possibility of a shared factor space, but it does not. Factors are simply summaries intended to capture some dimension of a measure, and may themselves be summarized by still higher-order dimensions. A few examples include self-compassion and its six sub-scales (Neff, 2003) and the higher-order factors of stability and plasticity in relation to the Big Five personality dimensions (DeYoung, Peterson, & Higgins, 2002). The importance of this fact is that while structural equation modeling may be used to great effect in testing mediation (cf. Iacobucci et al., 2007; Selig & Preacher, 2009), it does not provide prima facie evidence that a shared factor space does not underlie the variables of interest.

Although we focus on conceptually similar variables as being most likely to share a factor space, the issue we discuss here is potentially present whenever variables are methodologically similar, in particular for self-report variables. Methodological factors may result in a shared factor space that affects multiple measures, and such factor spaces may arise simply from the response process, as when respondents use overlapping information about themselves or others (e.g., Borkenau, 1986; Wojciszke, 1994). For example, participants (particularly apathetic ones, as any researcher familiar with undergraduate participant pools may attest) may use the same behavior (e.g., doing a difficult favor for a friend) to answer questions about friendliness, helpfulness, and kindness, and perhaps even competency, loyalty, and reliability. Similarly, affect has sufficient heuristic value to influence a variety of dependent variables (cf. Podsakoff, MacKenzie, Lee, & Podsakoff, 2003). As a result, even conceptually distinct variables may nonetheless share a factor space, and so a factor model may still be used to great effect.

A point that we have neglected thus far is that a set of variables may follow a factor model even if there are direct relationships between them beyond those described by the factor model. Such relationships are a violation of the factor model assumption of local independence, i.e., that given the latent variables, the observed variables do not influence each other directly. Detection of such effects would eliminate a factor space interpretation, but it is not possible to detect them with only three variables because of the model equivalency. However, if there are multiple indicators for the variables of interest, then it is possible to make use of the differences between the two models to better validate and test mediation claims. We do not discuss this further here, and instead develop this point in greater detail in Chapter 4. For now, it is sufficient to assume that one has only three variables, as is not uncommon when using mediation models.

Shared Factor Spaces and Study Designs

Broadly, the design of a study may differ in two respects: the degree of manipulation of X, and the amount of time that passes between measurements of the variables of interest. From a causal interpretation point of view, the ideal design is experimental and longitudinal, with random assignment to some manipulation of X, with M following X and Y following M (cf. Maxwell & Cole, 2007; Stone-Romero & Rosopa, 2008). In practice, however, such a design is not always possible, as some variables are difficult if not impossible to manipulate (e.g., socioeconomic status, gender, or health).

It is also not possible to simultaneously apply an experimental manipulation to M without interfering with the mediation of effects (Bullock, Green, & Ha, 2010), as mediation concerns the transmission of an effect of X through M. Randomly assigning both variables in the same experiment interferes with any effect X may have on M, and the effect on Y caused by a manipulation of M may not be the same effect that mediates the XY relationship. One may experimentally manipulate M in a separate study as a test of the relationship between M and Y, but it is not possible to show that an effect of X is being passed through M if a manipulation is applied to both variables simultaneously.

As a consequence, even experimental mediation designs are necessarily partly correlational (cf. Sobel, 2008) and do not eliminate the possibility of a shared factor space. Specifically, if X is a manipulation, the shared factor space idea still applies to the mediator(s) and the dependent variable. Because the factor space idea is necessarily limited for experimental studies, we will not pursue this design here. It is simply worth making clear that experimental manipulations, while generally preferable when examining mediation, do not circumvent the possibility of a shared factor space or otherwise confounded relationships (Bullock, Green, & Ha, 2010; Jo, 2008; Sobel, 2008).

A similar issue applies to the time course of a study. Although longitudinal designs are generally preferable to cross-sectional designs, their use does not eliminate the possibility of confounding. Instead, the advantages they afford relate to reducing bias in the estimates of the direct and indirect effects (Cole & Maxwell, 2003; Maxwell & Cole, 2007; Maxwell, Cole, & Mitchell, 2011). Still, it is worth briefly describing the necessary modifications to the factor space idea in the case of a longitudinal design. Maxwell and Cole (2007) describe both an autoregressive approach and a random effects approach to testing for mediation. Figure 19 gives a representation that can be used for both approaches. We will not use such a design here (but will do so in the next chapter); it simply illustrates that a factor model may still apply regardless of the time between measurements.

[Path diagram: a single latent variable θ, measured at t = 0, 1, and 2, with loadings λX, λM, and λY on the manifest variables X, M, and Y at each occasion, and autoregressive paths β01 and β12 between occasions.]

Figure 19. One-factor longitudinal model. X0, M1, and Y2 are in bold to indicate that in an incomplete longitudinal design they would be the variables that are measured and subject to a mediation analysis.


What looks like three different latent variables in Figure 19 is in fact one latent variable that possibly changes across time, measured at t = 0, t = 1, and t = 2. For the autoregressive model, θt+1 = β0 + βt,t+1 θt + εt+1, with the additional constraint that β01 = β12, i.e., a fixed autoregressive effect that is constant across time. Alternatively, for the latent growth curve model, there are random effects such that θt = θt=0 + βt + εt, with θt=0 the random intercept, β the random slope, and t a measure of time elapsed since t = 0. In contrast to the autoregressive approach, the regression coefficients β in Figure 19 would not be equal; instead, they would depend on t and on the variances and covariances of the random variables of the growth model.

Although the above models are comprised of three factors with three indicators each, in practice one may make use of an incomplete longitudinal design that uses each indicator only once, with X, M, and Y measured at three distinct time points. However, as in the case of a cross-sectional design, a one-factor model would likely be sufficient to recreate their correlations, and therefore also able to explain the results of a mediation analysis. The nature of the study design has no effect on this, as the correlations are agnostic to study design; a factor model may apply equally well regardless.

Purpose and Plan

The relevance of factor models to mediation models will be developed in two parts. First, we start from factor models in order to illustrate what mediation analysis results might look like in such cases. This is our main purpose, as what can be expected from mediation analysis results when the true model is a factor model has not been illustrated. Second, although estimating factor models using just three variables is difficult, and this is likely the primary reason they are not considered when conducting tests of mediation, we will show that it is possible to calculate some factor structures using just the three variables in a mediation model.

To illustrate our point regarding factor and mediation model equivalence, we will show what sorts of mediation results may occur given different positions of a set of variables in a shared factor space. We have chosen the case of a single mediator between the independent and dependent variables because it is the simplest mediation model, and it is sufficient for illustrative purposes. Note however that regardless of the number of variables in a mediation scheme, a factor model may be used either to capture confounding effects or to serve as an alternative model entirely.

We shall work with X, M, and Y as manifest variables that are located in a two- dimensional factor space together with a number of other variables. In the present case, the variables may be considered to be measured with perfect reliability, such that the variance that is not contained in the factor space is unique variance rather than error variance. The effects of unreliability are then eliminated so that the effects of belonging to a factor space can be seen in a pure form, and without any distortions of the paths due to unreliability.

For didactic purposes we shall use Russell's (1980; 2003; Yik, Russell, & Steiger, 2011; see Figure 20) model of emotions1, as the factor space is well established. Further,

1 We acknowledge that this model represents core affect and not emotions per se (for more information, see Russell, 2003), but for the sake of explicating our point we will treat this circumplex as a model of emotions.


emotions have the advantage of being plausibly linked to one another across a variety of settings, due to the rich phenomenological diversity of manifest emotions, and because they are often measured as discrete states. The circumplex is also easily understood as representing either confounding or an alternative explanation. For example, a researcher may wish to see whether being frustrated (X) leads to being miserable (M), and then potentially to feeling depressed (Y). Alternatively, a researcher may wish to see whether being delighted (X) leads to being happy (M), and then potentially to being glad (Y). Such hypotheses seem reasonable, and it is relatively easy to imagine situations where such transitions between emotional states may occur, so it is easy to generate an enticing rationale for tests of mediation. However, past work on emotions suggests that these emotions do not influence each other. Instead, it is the underlying dimensions of valence and arousal that are responsible for their intercorrelations (cf. Russell, 2003). A mediation model involving three or more emotions would then be confounded with these two latent variables.


Figure 20. Affect circumplex (taken with permission from Yik, Russell, & Steiger, 2011).


Regarding the distinction between factor and mediation models, we will provide formulae that may be easily used. These formulae are based on simple derivations that translate the parameters of one model into those of the other. Perhaps their greatest utility is that they demonstrate that not all mediation estimates are possible given a one-factor model, so if such patterns do occur, one can be confident that at the very least there is more than one variable involved. As such, the formulae we provide may be used under some conditions to rule out at least some alternative explanations for a set of mediation results.

From Factor Models to Mediation Analysis Results

Although it is not ideal from a practical perspective to work from a factor model to a mediation model, because researchers are likely to have mediation results before they are interested in alternative explanations, it is necessary: with two factors, an infinite number of factor loading patterns may result in the same set of regression weights. The reverse is not true; a set of factor loadings yields only one set of regression weights. We therefore derive all results here working from a factor space.

Orthogonal Independent and Dependent Variables

We begin with the simple case where the independent and dependent variables are orthogonal to each other. This case is relatively straightforward to discuss, and it is a convenient starting point because it results in rXY = 0, and so no total effect c. To illustrate such a relationship we will use frustrated (X), miserable (M), and depressed (Y).2

2 If these emotions were precisely at the three reference lines we use here (X, IX, and VIII in Figure 20), the angle between frustrated and depressed would be only 60°. The angles we use are approximations useful for didactic purposes.

Roughly, these emotions represent activated displeasure, displeasure, and deactivated displeasure, respectively (see Figure 20). One may hypothesize that feeling frustrated leads to feeling miserable, and that feeling miserable leads to feeling depressed; in other words, that feeling miserable mediates the relationship between feeling frustrated and feeling depressed. This hypothesis is easily tested with a mediation analysis to obtain estimated direct and indirect effects.

Figure 21 illustrates the vectors for these emotions, with their placement approximately the typical emotion circumplex rotated 90° clockwise and then mirrored. Frustrated is then in the upper left quadrant, depressed in the upper right quadrant, and miserable between them (45° from both frustrated and depressed). Further, Figure 21 presents the mediation model superimposed on the factor space. The indirect path roughly follows a semi-circle from frustrated on the left, over miserable in the middle, to depressed on the right. The direct path goes straight from frustrated on the left to depressed on the right.


Figure 21. Sample mediation triangle placed within factorial space. The three variables from the core affect example are represented as vectors. The specific situation shown is the case where the vector length of each variable is .8, X and Y are orthogonal, and the mediator is 45° from both X (frustrated) and Y (depressed). In this case, X and Y are uncorrelated, and the mediator is correlated with both X and Y at r = .45. Variable labels are somewhat arbitrary and for illustrative purposes only.


For this specific case, if we assume that all vectors are 0.8 in length (with standardized variables the length may vary from 0 to 1, and longer vectors indicate that the variables are better explained), the respective factor loadings on Factor I (Valence: unpleasant vs. pleasant) and Factor II (Arousal: passive vs. active) are 0.566 and -0.566 for frustrated (X; displeasure and activated), 0.800 and 0.000 for miserable (M; displeasure)3, and 0.566 and 0.566 for depressed (Y; displeasure and deactivated). These loadings yield the following correlations between the three variables: rXY = 0, rXM = .453, and rMY = .453. The mediation regression weights are then a = 0.453, b = 0.569, ab = .258, and c' = -0.258. Such large effects are quite likely to be statistically significant, and to be interpreted as meaningful. However, the relationships between the variables follow directly from their positions in the factor space. Specifically, when the direct and indirect effects have opposing signs, the explanation from a confounding perspective is that X and Y are in fact orthogonal, with M a vector between the two, and with no effect of one feeling on another in the way a mediation model would imply. Similarly, from a factor model perspective, the variable relationships are due to the shared underlying variables of valence and arousal, and any changes would be due to the underlying factor(s).
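This worked example is easy to verify numerically; a short R sketch, using the loadings given above and the standard regression formulae for two standardized predictors:

L <- rbind(x = c(.566, -.566),   # frustrated: Valence, Arousal loadings
           m = c(.800,  .000),   # miserable
           y = c(.566,  .566))   # depressed
R <- L %*% t(L)                  # implied correlations (off-diagonal elements)
rxm <- R["x", "m"]; rxy <- R["x", "y"]; rmy <- R["m", "y"]
a  <- rxm
b  <- (rmy - rxy * rxm) / (1 - rxm^2)
cp <- (rxy - rxm * rmy) / (1 - rxm^2)
round(c(a = a, b = b, ab = a * b, cprime = cp), 3)
# matches the a, b, ab, and c' values reported above (up to rounding)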

3 Placing the mediator directly on the Y-axis is by convention, and this placement is equivalent to all other rotations.

To better illustrate the results under conditions where rXY = 0, we derived how the mediation effects depend on the position of the M vector relative to X and Y. We allowed for the full range of possible vector lengths (0 to 1), and allowed M to vary between X and Y in the two-dimensional factor space. We did so because the position of M in between X and Y seems the most natural one for a mediator, though it is nonetheless possible to select "mediating" variables that are not between X and Y in the factor space.

The resultant range for rXM and rMY is then between r = 0 and r = 0.707, which is the highest value the two correlations can have simultaneously when rXY = 0. This occurs when the vector lengths are 1 and M is at a 45° angle from both X and Y (similar to the example in Figure 21). We then used the calculated correlations to conduct tests of mediation. As Figure 22 illustrates, in the absence of a total effect, mediation necessarily creates opposing effects that increase rapidly in magnitude as rXM and rMY do. Because rXM and rMY are positive, the indirect effects are also positive and increase in magnitude as the correlations do; the left side of Figure 22 illustrates this. Similarly, the direct effects are necessarily negative and increase in magnitude as the correlations do, as the right side of Figure 22 shows. The two panels are mirror images because the magnitude of the indirect effect equals minus the magnitude of the direct effect when the total effect (the correlation between X and Y) is zero. More specifically, the precise absolute magnitude of both effects is rXM·rMY/(1 - rXM²), which is itself the semi-partial correlation multiplied by rXM/√(1 - rXM²). The larger the product of the nonzero correlations and the larger the squared correlation between X and M, the larger the indirect effect is and the more negative the direct effect is.
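For reference, these expressions follow from the standard formulae for standardized regression weights with two predictors, a = rXM, b = (rMY - rXY·rXM)/(1 - rXM²), and c' = (rXY - rXM·rMY)/(1 - rXM²). Setting rXY = 0 gives

ab = rXM·rMY/(1 - rXM²) and c' = -rXM·rMY/(1 - rXM²) = -ab,

which is the mirror-image property just described.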


Figure 22. Heat map of the calculated indirect and direct effects when rXY = 0. Values vary as a function of the magnitude of rXM and rMY.


From a confounding perspective, the implication of the above is that when one finds direct and indirect effects that are large and in opposition to one another, X and Y may simply be orthogonal in a shared factor space, with M somewhere between them. As the position of M changes in the factor space, a very large range of values for the direct and indirect effects may be obtained. In some cases, the magnitude of the effects is such that one would likely be quite willing to interpret them as meaningful, and as a rather unique finding given that there would be no total effect of X on Y. In a factor space, however, it simply means that X and Y are orthogonal to each other, and no interpretation beyond that is necessary.

Generalizing to other Cases

In order to consider more broadly the behavior of mediation analyses involving variables stemming from a shared factor space, we next derived what sort of results may occur given other angles between X and Y. For this study, we allowed the angle between the X and Y vectors in the shared factor space to be 0°, 30°, 45°, 60°, 90°, 120°, 150°, or 180°. Based on the logic that a mediator is a “variable in the middle,” the angles between X and M were then set to be a proportion of the angle between X and Y: 0/3, 1/3, 1/2, 2/3, or 3/3. The angle between M and Y was then the remainder of the proportion (i.e., 3/3, 2/3, 1/2, 1/3, or 0/3, respectively). For example, in the case of ∠XY = 90°, if ∠XM was 30°, then ∠MY was 60°. In the case of ∠XY = 60°, if ∠XM was 40°, then ∠MY was 20°. When the angle between X and Y is 0°, the special case of a unidimensional structure within the two-dimensional space is the result. In order to focus on the effect of ∠XY and the position of M between these two, all vector lengths were set to be equal to each other, and either .5 or .8 in length (recall that longer vectors indicate that the factors explain the variables better). We then calculated the loadings for each of the three manifest variables, calculated their correlations, and ultimately conducted tests of mediation. We shall focus on the larger vector length, as the effects are of clearly larger magnitude (the full details of the results are available in Appendix A).
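To make the derivation concrete, the following minimal R sketch (our own illustration; the function name is hypothetical) computes the implied correlations and mediation coefficients from a common vector length and the three angles, using the factor-space identity that a correlation equals the product of the two vector lengths and the cosine of the angle between the vectors:

```r
# Correlations and mediation coefficients implied by a two-dimensional
# factor space: r_ij = len_i * len_j * cos(angle_ij).
med_from_angles <- function(len, axy, axm, amy) {
  d   <- pi / 180                            # degrees to radians
  rxy <- len^2 * cos(axy * d)
  rxm <- len^2 * cos(axm * d)
  rmy <- len^2 * cos(amy * d)
  a   <- rxm                                 # X -> M
  b   <- (rmy - rxy * rxm) / (1 - rxm^2)     # M -> Y, controlling X
  cp  <- (rxy - rxm * rmy) / (1 - rxm^2)     # direct effect c'
  c(rXY = rxy, rXM = rxm, rMY = rmy,
    c = rxy, a = a, b = b, ab = a * b, cprime = cp)
}

# Second row of Table 5: length .8, X and Y orthogonal, M at 45 degrees
round(med_from_angles(0.8, axy = 90, axm = 45, amy = 45), 3)
#    rXY   rXM   rMY     c     a     b    ab cprime
#  0.000 0.453 0.453 0.000 0.453 0.569 0.258 -0.258
```

Looping this function over the angles and lengths described above reproduces the rows of Table 5 and the curves in Figures 21 and 22.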

Figures 21 and 22 illustrate the direct and indirect effects, respectively. To use the outdated Baron and Kenny (1986) terminology, the combinations of effects range from the absence of mediation to apparent complete mediation. Further, there are two general trends apparent in the figures:

1. Effect of the angle between X and Y: The direct and indirect effects show the same trend in that both decrease as the angle between X and Y increases.

2. Effect of the relative position of M between X and Y: The direct and indirect effects show opposite trends. The direct effect is smaller (including more negative) and the indirect effect is larger, the closer M comes to roughly halfway between X and Y. In other words, as M moves toward the middle between X and Y, the indirect effect increases and the direct effect decreases.

Interestingly, a mediator is literally a variable in the middle, and the trends reflect this. However, this is only roughly the case, since in fact the maximum indirect and minimum direct effects are reached slightly before or after halfway. How close this point is to the middle also depends on the vector length. Finally, that the curves in Figures 21 and 22 cross at certain points follows from the nonlinear nature of the formulas for the direct and indirect effects.


Figure 21. Calculated direct effects in a shared factor space. The x-axis represents the magnitude of ∠XM as a proportion of ∠XY. For example, for ∠XY = 60° and proportion = .33 (1/3), ∠XM = 20° and ∠MY = 40°.


Figure 22. Calculated indirect effects in a shared factor space. The x-axis represents the magnitude of ∠XM as a proportion of ∠XY. For example, for ∠XY = 60° and proportion = .33 (1/3), ∠XM = 20° and ∠MY = 40°.


Length  ∠XY  ∠XM  ∠MY     rXY     rXM     rMY       c       a       b      ab      c'
 0.8      0    0    0   0.640   0.640   0.640   0.640   0.640   0.390   0.250   0.390
 0.8     90   45   45   0.000   0.453   0.453   0.000   0.453   0.569   0.258  -0.258
 0.8    120    0  120  -0.320   0.640  -0.320  -0.320   0.640  -0.195  -0.125  -0.195
 0.8     60   30   30   0.320   0.554   0.554   0.320   0.554   0.544   0.302   0.018
 0.5      0    0    0   0.250   0.250   0.250   0.250   0.250   0.200   0.050   0.200
 0.5     60   30   30   0.125   0.217   0.217   0.125   0.217   0.199   0.043   0.082
 0.5     90   45   45   0.000   0.177   0.177   0.000   0.177   0.182   0.032  -0.032
 0.5    120  120    0  -0.125  -0.125   0.250  -0.125  -0.125   0.238  -0.030  -0.095

Table 5. Correlation and regression coefficients based on vector angles and lengths, for four select angle configurations at vector lengths of 0.8 and 0.5. Angles are in degrees.


Table 5 details the relationships for select cases that we wish to draw further attention to. The first is the case where X and Y are orthogonal and the mediator is 45° from either; this is the situation of Figure 20. It leads to an indirect effect that is equal in magnitude to the direct effect but opposite in sign. Note that there are an infinite number of cases where X and Y are orthogonal when dealing with a shared factor space, and also that there may be a large number of intermediately located variables that would all yield an indirect effect with a corresponding opposite direct effect.

The second is a case where M is also exactly in between X and Y but the angle between X and Y is smaller (∠XY = 60°, ∠XM = 30°, and ∠MY = 30°). This more accurately represents our example where the mediation model is that frustrated feelings cause miserable feelings, which in turn cause depressed feelings. With variables at these angles, a test of mediation results in an indirect effect and a near-zero direct effect.

Though this may better fit with intuition, it would still be misleading if in fact the apparent relationships were confounded by the underlying factors. It also illustrates that whether or not a direct effect results depends on the angle between X and Y.

Third, when two of the three vectors are identical, such as when ∠XY and ∠XM are both equal to 120° and thus M and Y have identical positions in the factor space, apparent direct and indirect effects still appear. It is also worth noting that such apparent effects also occur when X and M have identical positions and Y is distinct, and that apparent mediation when M is identical to X or Y occurs for nearly all possible angles of X and Y (excepting cases where X and Y are orthogonal).

Finally, a special situation is where the angle between all three variables is 0°. In this case, mediation yields both a positive indirect effect and a positive direct effect. From the confounding perspective, if one finds direct and indirect effects of the same sign then one may in fact be measuring the same variable repeatedly. Even given suspicion that this is the case, it is quite difficult to detect in practice: with short vector lengths such as 0.5 the correlations are as small as 0.25, and one would not even suspect that the measures may simply be measures of the same variable(s). Even given a longer vector length of 0.8, the correlations between the three variables are r = 0.64, and it is therefore easy to argue that the measurements of a study do in fact represent distinct variables. Nonetheless, the effects are completely confounded.

Factor Models and Mediation Models: Equivalence and Differentiation

In practice, factor models are mostly used for a rather large number of variables, whereas mediation models are commonly used for a small number of variables. This is in some sense a matter of convention in that factor models are not often used outside of scale validation efforts, but there are also practical reasons in that estimating a factor model is difficult with a small number of variables. Nonetheless, as we will show, it is possible to determine the factor loadings for the case of three variables and one factor. It remains impossible to estimate factor loadings for two factors and three variables, and even if one imposes numerous restrictions only one of many possible loading patterns may be obtained. Even so, it is useful to know what a test of mediation may suggest regarding the underlying dimensionality of the three variables, as ruling out a single-factor explanation is a useful strategy when making causal claims because it establishes that, at the very least, more than one variable is necessary to explain the observed relationships. We will first formulate the relationships between correlations, factor loadings, and mediation effects for the one-dimensional case, before doing so for the two-dimensional case.

One factor case

A one-factor model is appealing in that it is parsimonious and easily used as a method of considering the effects of confounding in a mediation scheme. In a one-dimensional factor space, one can derive the squared factor loadings from the equations as explained in Appendix B. Two conditions can be derived from the equations. First, either all correlations between the three variables must have the same sign, or alternatively two of the correlations are negative and the third is positive. This is a necessary condition. If only one correlation is negative, it follows that two factors are needed. As such, the one-factor model can be empirically rejected in practice. Second, the absolute value of each correlation must be equal to or greater than the absolute value of the product of the other two correlations. This is a necessary and sufficient condition (see Appendix A).
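Both conditions are easy to check directly. The following sketch (our own illustrative code, not from the appendices) recovers the loadings from the one-factor identities rXM = λXλM, rXY = λXλY, and rMY = λMλY:

```r
# One-factor loadings for three variables, with the two conditions above.
one_factor_loadings <- function(rxm, rxy, rmy) {
  # Necessary sign condition: the product of the three correlations must be
  # non-negative (all positive, or exactly two negative).
  if (rxm * rxy * rmy < 0)
    stop("exactly one negative correlation: two factors are needed")
  l2 <- c(X = rxm * rxy / rmy,   # squared loadings
          M = rxm * rmy / rxy,
          Y = rxy * rmy / rxm)
  # Each |r| must be at least the absolute product of the other two.
  if (any(l2 > 1))
    stop("a squared loading exceeds 1: no one-factor solution exists")
  sqrt(l2)                       # loadings, up to a common sign flip
}

one_factor_loadings(.64, .64, .64)   # X, M, and Y each load .8
```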

The mediation coefficients can be expressed in terms of the factor loadings, and of course the reverse is also true, as explained in Appendix B. Equation (6) in Appendix B implies that if the direct effect has a sign opposite to the indirect effect, then the one-factor model is violated and thus two factors are needed. However, the direct and indirect effects having the same sign is not a sufficient condition for a one-factor model: it is necessary that this be true for all alternative orderings of the three variables (e.g., YXM) before a one-factor explanation may be considered valid. This implies that in the one-factor case, changing the roles of X, M, and Y in the mediation analysis (e.g., with M as the independent variable, Y as the mediator, and X as the dependent variable) will never yield direct and indirect effects of opposing sign. The equal-signs condition, holding across all orderings, is a sufficient condition for a one-factor solution (see Appendix B).

Two factor case

In the case of a two-dimensional factor space it is necessary to fix multiple loadings. One such case is shown in Appendix C, where the first factor explains rXM and rXY, with the second factor explaining the remaining relationship of Y with M. This solution was chosen because it nicely separates the effects of X and M. Even so, this is only one such possibility and the problem of rotational indeterminacy remains. As such, a two-factor model cannot be rejected with only three variables.

Discussion

The issue we have discussed here regards the use of an alternative model to consider the effects of confounding for a mediation model, or to serve as a viable alternative explanation. We showed what a factor space would look like given a set of three variables. We have also presented and discussed methods for how one can translate the parameters from one model to the other when the models are formally equivalent, and we have presented and discussed methods for how they can be differentiated in case they are not formally equivalent.

A factor space may be considered a variable in its own right, or it may be viewed as a summary of confounding effects or model violations. For the two perspectives we discuss, confounding and alternative explanations, the interpretations are largely interchangeable; from the mediation point of view the factors are confounding variables, whereas from the factor model perspective the mediation paths are violations of local independence. They are simply two different perspectives, and both are worth consideration when explaining the relationships between a set of variables. Whether due to an underlying structure or a methodological artifact, the possibility of shared factor spaces and thus confounding is a challenge to researchers’ claims of mediation.

Despite the theoretical utility of a factor model when discussing apparent mediation, we wish to reiterate that even in cases where a factor model of relatively low dimensionality may apply, it is often not possible to differentiate the factor model from the mediation model when using only three variables (each at only one time point), and what we have provided here does not change that. In such cases, a mediation model can explain the data quite well, even if fully confounded with underlying latent variables. To deal with such a possibility, it is necessary to make use of longitudinal designs with repeated measurement of each variable of interest, and to estimate alternative models (cf. Selig & Preacher, 2009).

Future Directions.

We have discussed a factor model as either a means of capturing a confounded model, or a theoretically meaningful alternative. For the confounding perspective, the next step would then be to more directly compare it to other means of examining parameter sensitivity in mediation given a confounded relationship (e.g., VanderWeele, 2010). Such approaches are typically focused on a single confounding variable, but the factor model approach we use here affords some limited flexibility in considering a larger number of confounding variables (as shown in Appendices B and C), and so may provide some means of extending such approaches.

Another avenue for future research is not statistical, but rather more theoretical in nature. We argued here that a factor model is most appropriate given conceptually similar variables, and further refinement of this point may be useful to encourage researchers to make use of the approach we consider here. In general, it seems likely that a factor model (or, more generally, an interpretation of a single variable being responsible for the observed effects) is increasingly appropriate as the correlations between variables increase. As we showed, however, a high observed correlation between variables is not a necessary condition for a factor space to be present, and it is of course also not sufficient given the possibility of spurious correlations, and so other relevant conditions should be considered.


Chapter 4: Testing a Factor Model against a Mediation Model

The previous chapter argued that factor models may be used to consider the worst-case possibility regarding a confounded relationship between X, M, and Y. Further, in many cases factor models are themselves a viable alternative that should be ruled out when making claims of mediation. Nonetheless, factor models are rarely considered when testing for mediation, presumably because researchers often only make use of manifest variables and because estimating factor models usually requires more than three variables. Even with additional variables, researchers may be unaware of how to make use of them to distinguish between factor and mediation models.

The primary purpose of the work here, then, is to develop and illustrate an easy-to-use methodology to test the two competing explanations. In order to do so, we will make use of an approach that is already often discussed in the mediation literature, namely longitudinal designs (Maxwell & Cole, 2007; Selig & Preacher, 2009). In general, such designs are considered desirable because they allow adequate time to distinguish between cause and effect, and so provide additional support for a causal interpretation of a set of variable relationships. More relevant to our purpose here is that longitudinal designs are useful for reducing bias in the estimates of the direct and indirect effects, and for modeling alternative explanations. For example, cross-lagged panel designs with X, M, and Y measured at each time point may be used to reduce bias in the estimated effects (Maxwell & Cole, 2007). With the nine observed variables from such a design it is possible to estimate more complex models such as autoregressive models (e.g., Maxwell, Cole, & Mitchell, 2011) and growth curve based mediation models (Selig & Preacher, 2009) that provide a fuller picture of the relationships between the variables of interest.

Here we will propose another model that may be used to describe the set of relationships between X, M, and Y. The model we make use of is similar to a latent autoregressive model, except that the autoregressive parameter may change depending on the pair of consecutive points in time, and we estimate additional parameters that allow one to test mediation claims simultaneously. This model thus has the advantage of comparing factor and mediation explanations within a single framework, and is also quite easy to estimate and use.

Methodology

The general logic of the method we propose is straightforward. It is based on the supposition that if adding latent variables to a model makes the direct relationships nonsignificant, then mediation is unlikely. Conceptually, this is scarcely different from common practice with regression: if adding an additional variable (e.g., controlling for an alternative explanation) results in another relationship becoming nonsignificant, then no claims may be made regarding the original variable’s relationship with the dependent variable. Here we simply control for latent variables, rather than manifest variables as in the case of regression.

Three General Models.

The approach we present here makes use of three broad classes of models that are compared based on both model fit and the significance of paths between variables. For ease of exposition, our presentation here is limited to comparing simple mediation models to factor models with a single indicator for X, M, and Y, each of which is measured three times. Nonetheless, it is straightforward to include additional mediators, whether in parallel or in serial, while still comparing mediation and factor space explanations.

The first class of models makes use of only a single factor for all indicators across time. Such models are typically limited to cross-sectional designs, but may easily be used for longitudinal designs because correlations are agnostic to the source of any covariation. Doing so does make relatively strong claims, but one may hypothesize that all responses are due to something relatively constant about people (e.g., personality), or one may be concerned that some unmeasured cause(s) before t0 may explain all observed variables. In such cases a one-factor model is a defensible approach. An additional reason to consider a one-factor model is that it provides a point of comparison for more complex models, and so serves as a useful starting point when considering the relationships between the variables we present here.

The second class of models assumes that there is still a single latent variable underlying all three indicators, but one that may vary over time, and so there are three factors (one for each time point) with paths between them. The skeleton of this model is shown in Figure 23. Note that this class of models does not include any paths of the sort implied by the typical mediation scheme. Instead, the latent variable “mediates” itself over time. This model is essentially a latent autoregressive model, but we allow the autoregressive parameter to vary. Each time-point-specific factor is then indicated by the observed variables measured at that time point.


Finally, the third class of models tests for the possibility of mediation between the observed variables, as in the case of a simple mediation scheme. Each item can be expected to have unique variance not attributable to the factor of interest, which allows room for adding paths from Xt0 to Mt1 and Mt1 to Yt2, as well as from Mt0 to Yt1 and from Xt1 to Mt2 if so desired. This approach is taken because the mediation logic would imply such paths. If these added paths are statistically significant (whether by way of normal-theory approaches or by way of bootstrapping methods), and further result in an improved goodness of fit (e.g., lower RMSEA), then this serves as evidence in support of a mediation hypothesis between X, M, and Y, because it suggests that the shared factor space is insufficient to explain the relationships between the three variables and that mediation paths need to be added to the model.
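As a sketch of how the three classes might be fit in practice, the following uses the R package lavaan with hypothetical variable names X0 ... Y2 (one row per participant in a data frame dat); it is one possible implementation of the logic above, not a prescribed specification:

```r
library(lavaan)

# Class 1: a single factor for all nine indicators
model1 <- 'theta =~ X0 + M0 + Y0 + X1 + M1 + Y1 + X2 + M2 + Y2'

# Class 2: one latent variable per time point, free autoregressive paths,
# and correlated residuals for repeated items
model2 <- '
  theta0 =~ X0 + M0 + Y0
  theta1 =~ X1 + M1 + Y1
  theta2 =~ X2 + M2 + Y2
  theta1 ~ theta0
  theta2 ~ theta1
  X0 ~~ X1 + X2;  X1 ~~ X2
  M0 ~~ M1 + M2;  M1 ~~ M2
  Y0 ~~ Y1 + Y2;  Y1 ~~ Y2
'

# Class 3: class 2 plus the lagged relations a mediation scheme implies,
# with a defined product term for the indirect effect
model3 <- paste(model2, '
  X0 ~~ a*M1
  M1 ~~ b*Y2
  ab := a*b
')

fit2 <- sem(model2, data = dat)
fit3 <- sem(model3, data = dat, se = "bootstrap")
anova(fit2, fit3)                                  # LRT for the added paths
parameterEstimates(fit3, boot.ci.type = "perc")    # bootstrap CI for ab
```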


[Path diagram: latent variables θt=0 → θt=1 → θt=2 connected by autoregressive paths β01 and β12, with each θ indicated by the observed Xt, Mt, and Yt via loadings λX, λM, and λY.]

Figure 23. One-factor longitudinal model. X0, M1, and Y2 are in bold to indicate that in an incomplete longitudinal design they would be the variables that are measured and subject to a mediation analysis.


Additional Considerations. For each class of models there are a few decisions that must be made regarding the nature of the latent variables. Each decision results in implicit claims about the underlying nature of the variables involved, and so should be considered carefully.

Further, these decisions will affect model fit statistics to some degree, and possibly alter the conclusions regarding the appropriate model if one were to rely solely on model fit statistics. As such, one should be mindful of one’s theoretical rationale when determining which set of constraints is appropriate.

Correlated indicator residuals. As mentioned previously, each variable can be expected to have some unique variance not attributable to the factor of interest. When estimating the models we discuss here, we strongly recommend the use of correlated residuals in most cases. When a variable is measured multiple times using the exact same method of measurement (e.g., identical item wording), some systematic item-specific variance is all but certain, and failing to account for it will likely result in poor model fit and biased parameter estimates.

Equal loadings across time. Another decision to make is whether or not to constrain the loadings for each indicator variable to be equal across time. For the methodology we discuss here, this decision is unique in that it is the only imposed restriction we discuss; the other decisions are relaxations of restrictions, allowing a path or parameter to vary. Ultimately, this is a question regarding the nature of the latent variable and the quality of measurement. The use of equal loadings assumes measurement invariance, i.e., that the latent variable measured is the same at each time point.


If one believes that the relationship between the latent variable and the indicator variables is constant, then it follows that the loadings for each variable should be constrained to be equal across time. This is in fact an assumption made by latent autoregressive models. If the loadings are constrained to be equal then the result will likely be poorer model fit, and the difference may lead to different conclusions were one to strictly and unwisely adhere to well-known cutoff guidelines (cf. Steiger, 2007).
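In the lavaan sketch above, the equal-loadings restriction can be imposed by reusing parameter labels; note that labeling the first indicator frees its default fixed loading, so the latent scale must then be set another way (here via std.lv = TRUE):

```r
# Equal loadings across time (measurement invariance) via shared labels
model2_el <- '
  theta0 =~ lx*X0 + lm*M0 + ly*Y0
  theta1 =~ lx*X1 + lm*M1 + ly*Y1
  theta2 =~ lx*X2 + lm*M2 + ly*Y2
  theta1 ~ theta0
  theta2 ~ theta1
'
fit2_el <- sem(model2_el, data = dat, std.lv = TRUE)  # latent variances fixed to 1
```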

Path from t0 to t2. It is also up to the user to decide whether or not to include a path from the latent variable at t0 to the latent variable at t2. Although not immediately obvious, such a path is somewhat akin to a direct effect in a mediation model, because both are double lags. For what we present here, we do not advocate the use of such a path, because it makes more sense to state that the most proximal state of the latent variable is more causative than the more distal state. Further, whereas a simple mediation model involves three variables, there is only one latent variable here, and so considerations regarding missing pathways or variables are less relevant.

With that being said, we will nonetheless consider such a path between the latent variables at t0 and t2 with what we present here. We do so because although we believe it to be unnecessary, its conceptual relationship to the direct effect merits at least some testing of such an effect. Additionally, users of the model we discuss here may find such a path relevant to their research topic, and we do not wish to preclude such a possibility.

Empirical Example

In order to demonstrate our model we make use of data from Crocker, Canavello, Breines, and Flynn (2010). Crocker et al.’s data have been used to demonstrate a variety of effects of interpersonal goals, including effects on depression (Garcia & Crocker, 2008), academic performance (Crocker, Olivier, & Nuer, 2009), and well-being (Crocker, 2008). The data were collected over the course of a semester, with 10 measurements per participant spaced a week apart (excepting slow responders, resulting in delayed measurement). Additionally, the data are comprised of N = 230 participants, split into 115 roommate dyads. Eighty-six (75%) of these pairs were female, and participant age ranged from 18 to 21, with M = 18.1 and SD = .36.

We wish to make it clear that for the results presented here, we do not make any substantive claims regarding the variable relationships. In addition to the fact that we ignore the obvious dyadic dependencies for our example here, our choice of measures reflects only a desire to stay consistent with the previous chapter. As such, although it would be simple to demonstrate a case where a mediation model seems clearly better than a factor model, we make use of an example where a factor model is adequate to describe the variable relationships. Any interpretation we offer is only for the sake of explication and demonstration; our choice of items is strictly for didactic purposes, and is not meant to make any claims regarding the superiority of a factor model approach over a mediation approach for these data. Instead, the items were chosen only because they seem likely to be unidimensional.

Measure.

The measure we make use of here is an adapted version of a scale meant to measure objective and subjective burdens placed upon an individual by a depressed significant other (Coyne et al., 1987). Crocker et al. used the subjective burden items, and altered the wording of the original measure to instead reflect perceptions of the burdens placed upon them by their roommate.


Dimensionality.

As an exploratory analysis to determine the items to be used here, we first conducted a principal-components analysis of all scale items at t0. The results suggested that one or two dimensions are sufficient to describe them, as the first five eigenvalues for the 14 items are 7.86, 1.530, .881, .777, and .655. Closer investigation suggests that the items measuring discouraged, ashamed, and depressed may be unidimensional, as these items load strongly on a single principal component, and so are suitable for our demonstration here.

Drawing inspiration from past work on the development of depression, a reasonably plausible mediation story may be told using these data. Specifically, feeling discouraged may lead individuals to feel shame (e.g., Miller, 2013), which then may lead to feeling depressed (e.g., Andrews, Qian, & Valentine, 2002; Shepard & Rabinowitz, 2013). In the case of the present data, the story may be something akin to roommates feeling discouraged by their inability to correct or control their roommates’ behaviors, which then may lead to feelings of shame because their roommates’ actions may speak to their own character or competency, which then ultimately leads to feeling depressed by their roommates’ behavior.

Regression analyses.

On the surface, the data support a mediation hypothesis. Using discouraged at t = 0, ashamed at t = 1, and depressed at t = 2 as X, M, and Y, respectively, the correlations between the three variables are rX0Y2 = .116, rX0M1 = .272, and rM1Y2 = .292, and these correlations result in apparent mediation using regression. Specifically, there is a significant effect of discouraged on ashamed, a = .272, p < .05, and a significant effect of ashamed on depressed, b = .282, p < .05. This yields an indirect effect of ab = .077, and a bootstrapped confidence interval does not include 0, 95% CI [.014, .208]. There was no significant direct effect of feeling discouraged on feeling depressed, c’ = .040, p = .56.
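These standardized estimates can be reproduced directly from the reported correlations (a sketch assuming standardized variables; the third-decimal differences from the reported .282 and .040 presumably reflect rounding of the input correlations):

```r
rxm <- .272; rmy <- .292; rxy <- .116      # reported correlations
a  <- rxm                                  # X -> M
b  <- (rmy - rxy * rxm) / (1 - rxm^2)      # M -> Y, controlling X
cp <- (rxy - rxm * rmy) / (1 - rxm^2)      # direct effect c'
round(c(a = a, b = b, ab = a * b, cprime = cp), 3)
#     a      b     ab cprime
# 0.272  0.281  0.076  0.039
```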

Structural equation modeling analyses.

For presentational purposes, we will focus on the three classes of models, because doing so most reflects the logic of the approach we discuss here. For each class, we will also consider the effects of (un)correlated errors, (un)equal loadings, and the presence or absence of a path between the latent variables at t0 and t2. As mentioned previously, these aspects reflect claims about the nature of the underlying factor structure and so may be considered topic-specific rather than central to our point; we leave it to the reader to choose which is most appropriate for their research problem. Additionally, all models were estimated using maximum likelihood.

All model fit statistics are shown in Table 6. Model 1 represents a single factor for all three time points, Model 2 a factor for each time point, and Model 3 includes the additional mediated paths between the indicator variables. The significance tests represent likelihood ratio tests comparing a given model to the previous one with the same constraints or lack thereof (as a reminder, equal loadings represent an additional imposed constraint upon the estimated model, whereas correlated errors and a path from t0 to t2 are relaxations of constraints). For example, the test for Model 2 with equal loadings and correlated errors (Model 2D) was compared to the variant of Model 1 with equal loadings and correlated errors (Model 1D). There is of course not a clean comparison to be made for variants of Model 2 with a t0 to t2 path for the latent variable, and so they were simply compared to Model 1 variants with the same restrictions regarding loadings and error terms, or lack thereof (e.g., Model 2H to Model 1D).


Model    EL  CE  LL       AIC       BIC   Log-likelihood       χ²      Δχ²  ΔDF      p  RMSEA
Model 1
A                     2474.097  2535.100       -1219.048  344.669                        0.232
B            X        2435.100  2526.605       -1190.550  287.672                        0.262
C        X            2486.842  2527.511       -1231.421  369.414                        0.216
D        X   X        2450.410  2521.580       -1204.205  314.982                        0.235
Model 2
A                     2297.739  2365.520       -1128.869  164.311  180.358    2  0.000   0.160
B            X        2180.471  2278.754       -1061.235   29.043  258.629    2  0.000   0.061
C        X            2314.427  2361.874       -1143.214  193.000  176.415    2  0.000   0.154
D        X   X        2205.514  2283.462       -1079.757   66.086  248.896    2  0.000   0.096
E                X    2299.428  2370.598       -1128.714  164.000  180.669    3  0.000   0.163
F            X   X    2182.188  2283.860       -1061.094   28.760  258.912    3  0.000   0.065
G        X       X    2316.365  2367.201       -1143.183  192.938  176.476    3  0.000   0.157
H        X   X   X    2207.296  2228.634       -1079.648   65.869  249.113    3  0.000   0.100
Model 3
A                     2297.438  2371.997       -1126.719  160.011    4.301    2  0.116   0.165
B            X        2180.696  2285.757       -1059.348   25.268    3.775    2  0.151   0.061
C        X            2317.015  2371.240       -1142.507  191.587    1.412    2  0.493   0.160
D        X   X        2203.920  2288.647       -1076.960   60.493    5.594    2  0.061   0.096
E                X    2303.098  2381.047       -1128.549  163.671    0.329    2  0.848   0.171
F            X   X    2182.325  2290.775       -1059.162   24.897    3.863    2  0.246   0.065
G        X       X    2318.928  2376.542       -1142.464  191.501    1.437    2  0.487   0.163
H        X   X   X    2205.627  2293.743       -1076.814   60.200    5.669    2  0.059   0.100

Table 6. Comparison of model fit statistics for the models estimated here. EL = equal loadings; CE = correlated errors; LL = lag-lag path from the latent variable at t0 to the one at t2; χ² = model chi-square; Δχ² = likelihood ratio test statistic against the comparison model.


We begin by considering the possibility that a single latent variable may explain all nine observed variables. In general this class of model yields a very poor fit to the data, with excessively high values of RMSEA. Given that the indicators were measured a week apart, this is unsurprising. Specifically, Models 1A, 1C, and 1D yielded RMSEA ≈ .23, and Model 1B yielded RMSEA = .262. An overall factor solution for all nine variables is clearly not to be preferred, and is instead rejected.

Next, we considered our second class of models, which are based on a three-factor approach. For this model, each of the three burden items is considered an indicator of the same latent variable that varies over time, with the latent variable at t = 0 predicting the latent variable at t = 1, which then predicts the latent variable at t = 2. All models of this class fit significantly better than the one-factor alternatives, with all chi-square tests yielding ps < .001. Similarly, all had superior AIC, BIC, and RMSEA values.

In regard to the specific constraints, models with correlated residuals yielded the best fit, with Model 2B yielding the best fit at RMSEA = .061, as well as superior AIC, BIC, and log-likelihood values relative to all other models in this class. Model 2D fit somewhat worse and yielded RMSEA = .096, and further comparison of AIC and BIC values between the two models suggests that unequal loadings are to be preferred. In contrast, the models with uncorrelated residuals fit poorly, with RMSEAs of .160 and .154 for Models 2A and 2C, respectively. Finally, the inclusion of a path between the latent variables at t0 and t2 (LL) for Models 2 and 3 resulted in AIC, BIC, and RMSEA values similar to those of the more restricted models, and so does not suggest that such a path improves model fit.


In regard to the estimated latent variables, for the models with correlated errors the item loadings were positive and of similar magnitude. For each item the loadings were roughly .35 ± .1, and all item loadings were statistically significant, ps < .001, suggesting that any change in the nature of the underlying dimension was minimal. Additionally, for both Models 2B and 2D the autoregressive parameters were roughly .6 at each time point. Specifically, for Model 2B they were bt0t1 = .527 and bt1t2 = .643, and for Model 2D they were bt0t1 = .590 and bt1t2 = .608.

For testing the mediation claim, we now turn to our third class of models. These models simply add relations between the indicator items of the sort that mediation would imply, e.g., from Xt0 to Mt1 and from Mt1 to Yt2. Although a single underlying latent variable that varies over time seems plausible based on the previous results, each item has its own unique variance unaccounted for by the factor, and this remaining variance may still yield a mediated effect. To test such a possibility, we allow for correlated errors between discouraged at t = 0 and ashamed at t = 1, and between ashamed at t = 1 and depressed at t = 2.

As a reminder, the logic of our approach is that if these paths are significant and model fit is significantly improved, then a mediation interpretation remains a viable explanation. Here, this was not true for any set of restrictions: adding these paths did not significantly improve model fit, with all chi-square tests yielding ps > .05, and with AIC, BIC, and RMSEA yielding similar results.³ In addition, the paths between X0 and M1 and between M1 and Y2 were not significant, further diminishing the strength of any mediation claims that could be made using these data. Specifically, bootstrapping the product of these two paths as in the case of ab resulted in an indirect effect of .000, 95% CI [-.001, .001], for Model 3B, and for Model 3D the result was an indirect effect of .000, 95% CI [-.002, .001]. As a result, a mediation explanation for the relationships between discouraged, ashamed, and depressed is not supported. Instead, a single latent variable that varies over time, with some additional variance attributable to each indicator, appears an adequate explanation for these data that is not rejected by our tests.

³ We also estimated similar models with paths from M0 to Y1, X1 to M2, and X0 to Y2, as in the case of a cross-lagged panel design. The results were similar to those of the models presented here, and so are not reported.

Discussion

The general approach we discussed here is intended as a way to distinguish between factor and mediation explanations for a set of variable relationships. It represents a practical extension of the previous chapter, affording a flexible method of considering a factor model interpretation that may be applied whenever one makes use of full longitudinal designs.

Our approach is based on a latent autoregressive model, but it is unique in that we compare two competing explanations within the same framework. As in the previous chapter, the factors themselves may be viewed either as alternative explanations or as representing confounded relationships. The example we presented here is most consistent with an alternative explanation, but for other cases the confounding interpretation may be more appropriate. In such cases, the utility of our approach is that it provides a test of the robustness of any mediation claims against unmeasured confounders.

Additionally, our approach is easily extended to allow for tests of more complex mediation models. In order to do so, it is simply necessary to make use of additional time points and to allow for additional residual correlations. The logic would be as we present here: if adding such paths improves model fit and the additional paths are significant, then a mediation model remains a plausible explanation for the variable relationships.

Modeling more than one factor for the same set of variables at each point in time is somewhat more difficult, but only marginally so. In order to estimate the loadings for two factors one would need at least five indicators per time point; for three factors eight indicators are necessary. If one has fewer indicators it is necessary to fix at least one loading. Further, one concern is that more complex factor spaces might be somewhat unwieldy and difficult to interpret. Still, precise interpretations of the factors are not necessary because the crucial point is simply whether or not the additional paths explain the covariance between the items in a way that is consistent with a mediation hypothesis.

Future Directions.

Given that the shared factor spaces we focus on here are an example of a misspecified model, future research may focus on specific forms of misspecification of the longitudinal model we discuss. This may involve the number of latent variables (e.g., one vs. two factors), but also incorrect functional forms of the relationships (e.g., non-linear) or inappropriate assumptions (e.g., falsely assuming equal loadings).

Further, a comparison to other approaches for considering bias and alternative interpretations to that offered by a simple mediation scheme would likely prove useful.

This may be either longitudinal models such as the cross-lagged panel design that Maxwell and Cole (2007) advocate, or other methods of considering parameter sensitivity (e.g., VanderWeele, 2010). Comparing these approaches may be done by way of testing the performance of one given that another is appropriate, or by way of other forms of misspecification, comparing the relative ability of each method to nonetheless yield results that would be interpreted appropriately.


Chapter 5: Fungible Weights in Mediation

Whenever estimating a statistical model, there is always some uncertainty regarding the validity of the estimated weights. In general, many assumptions must be made regarding the relationships between variables. Though there are differing schools of thought regarding these assumptions (cf. Jo, 2008), for the SEM and regression approach that we have used throughout, these assumptions are detailed in Sobel (2008), and are as follows:

Ignorability of mediator status: This is considered the most important assumption in the SEM/regression approach (Jo, 2008). It is in fact composed of two assumptions (Imai et al., 2010). The first is that, when conditioned on the observed pretreatment covariates, X is independent of all possible values of M and Y. This assumption is satisfied with an experimental manipulation of X with random assignment. The second assumption requires that individuals have the same characteristics regardless of their mediator status, again after conditioning on the observed covariates. In other words, when using mediation models one assumes that participants may be treated as having been randomly assigned to their mediator status. If this condition is not satisfied, then neither the direct nor the indirect effect may be considered a causal parameter.


Constant effect: This assumption is that there is no unmodeled moderation whatsoever, i.e., X and M do not interact, and so effects on Y do not change across levels of the mediator. This concern is to some degree addressed by the so-called MacArthur approach (Kraemer, Kiernan, Essex, & Kupfer, 2008), which requires the inclusion of such an interaction term. However, users of mediation typically only make use of simple mediation models, and so it remains a concern.

Linearity: This assumption requires that the dependent variable linearly increases or decreases across levels of the mediator, rather than e.g., a logistic or quadratic relationship.

In short, in a simple mediation model, identifying causal parameters using the SEM/regression approach requires that there is no confounding (similarly, no missing additional mediators), no interactions, and no deviations from linearity. These conditions may well not be met in practice.

In general then, the resultant parameter estimates for an estimated model, Mest, may be biased to some degree whenever it is not isomorphic with the true model, Mtrue.

So as to avoid the stickiness associated with the notion of true models (Edwards, 2013), here we refer simply to models that would have the best cross-validated goodness of fit.

The discrepancies between the two models are of two types. Sobel (2008) details the first type, as it relates to missing terms in the equation, e.g., omitted variables, omitted interactions, and nonlinear effects. The second type of discrepancy is related to error.

This includes variables with measurement error, as well as aberrances in the error term, such as outliers, that may bias the estimates. Measurement error is likely to be an issue when using regression, as regression assumes that the predictors are measured without error, and it is extremely rare that this assumption is satisfied with psychological data. Outliers are similarly of concern, and are quite difficult to deal with, as evidenced by the many methods developed to detect them (e.g., Cook’s D and residual plots). In general then, excepting rare cases, if any aspect of the estimated model is incorrect then the estimated parameters are likely biased to some degree. The degree of bias may be large or small, but regardless of the quality and size of a sample the issue remains, as it is an issue more closely associated with models than with data (Green, 1977).

Although parameter uncertainty applies to any estimated model, the issue is compounded when testing mediation models because indirect effects are somewhat unique in that they are quantified as the product of two regression coefficients. As a result, the uncertainty of one weight is multiplied by that of another, and so too is the need to consider bias in the estimated effects before drawing conclusions based on them.

Inaccurate weights are of course quite likely, but the important issue is not whether they are wrong, but to what degree they must be wrong before affecting any conclusions drawn regarding the relationships between variables. Indeed, in some cases the bias associated with an inaccurate model may be substantial enough that even the simple approach of determining sign and significance (without concern regarding the magnitude of the effect) will yield incorrect interpretations (e.g., Maxwell & Cole, 2007).

Further, mediation methodologists generally assume biased effect estimates because the relationship between a given X and Y is presumed to be multiply mediated (Baron & Kenny, 1986; 2007; Preacher & Hayes, 2008). At the very least then, bias is to be expected when conducting tests using only a single mediator, because if multiple mediators transmit the effects of X on Y, then X is correlated with an unknown number of Ms that are themselves likely correlated, with the number, direction, magnitude, etc. all unknown.

Broadly, considering the possible consequences of an inaccurate model is accomplished by way of methods for examining parameter sensitivity. Such methods place non-optimal weights in a model and examine the degree to which doing so reduces model fit (e.g., Green, 1977), or how much other weights are affected (e.g., VanderWeele & Arah, 2011). The approach we will work with is based on the fact that the bias of an inaccurate Mest has the somewhat counter-intuitive consequence that the true, unbiased weights may actually perform worse than the biased weights when they are used in an estimated model that differs from the true model.

Specifically, we will make use of fungible parameters (Waller, 2008). Models using fungible parameters all yield a predefined decrease in model fit (e.g., RMSEA or R²) compared to the estimates optimized for a given Mest. As a result, all sets of fungible weights are equally discrepant with the optimal estimates in terms of fit, and explain the data equally well. To date, fungible parameters have been applied to regression (Waller, 2008), logistic regression (Jones, 2013), and structural equation modeling and latent growth curves (MacCallum, Lee, & Browne, 2012). Within the context of regression models, fungible parameters are termed fungible weights (Waller, 2008). Weights may be considered insensitive if small changes in R² result in only modest changes in the weights. In contrast, they are considered sensitive if small changes in R² yield large changes in the weights. If a parameter is sensitive to small changes in R², then the strength of conclusions that may be drawn regarding its relationship with the dependent variable is limited, as the parameter estimates are considered less trustworthy (Green, 1977; Waller, 2008).

In the case of two predictors there are two pairs of fungible weights, and thus two different weights per predictor. These two pairs yield the same correlation between the predicted and observed values of the dependent variable. More generally, if there are three or more predictors there are an infinite number of fungible weights. As each set of fungible weights predicts the dependent variable with equal effectiveness, they cannot be distinguished from one another based on this criterion alone. Though the mathematics that describe fungible weights are beyond the scope of this document (see Waller, 2008; Waller & Jones, 2009), in general examining them is quite simple and requires inputting correlations into an R function (R Core Team, 2012) that Waller (2008) created.
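For readers without access to that function, the following is a minimal sketch of the construction for standardized predictors, following the logic in Waller (2008); this is our own illustrative implementation, not Waller's published code, and rcrit denotes the criterion correlation between OLS and fungible predictions discussed below:

```r
fungible_weights <- function(Rxx, rxy, rcrit, n_sets = 500) {
  b <- solve(Rxx, rxy)                           # OLS weights
  e <- eigen(Rxx, symmetric = TRUE)
  G    <- e$vectors %*% diag(sqrt(e$values))     %*% t(e$vectors)  # Rxx^(1/2)
  Ginv <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
  u   <- as.vector(G %*% b)
  len <- sqrt(sum(u^2))                          # = R, the multiple correlation
  u   <- u / len
  k   <- length(b)
  t(replicate(n_sets, {
    z <- rnorm(k)
    v <- z - sum(z * u) * u                      # random direction orthogonal to u
    v <- v / sqrt(sum(v^2))
    # rotate the predicted scores away from the OLS predictions by rcrit
    as.vector(Ginv %*% (len * (rcrit * u + sqrt(1 - rcrit^2) * v)))
  }))
}

# Example: three predictors, all correlations .5 (OLS weights all .25)
Rxx <- matrix(.5, 3, 3); diag(Rxx) <- 1
fw  <- fungible_weights(Rxx, rep(.5, 3), rcrit = .98)
apply(fw, 2, range)                              # fungible interval per weight
```

With two predictors the orthogonal complement is one-dimensional, so this construction yields only the two weight pairs noted above.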

Conducting a sensitivity analysis then simply requires selecting a lower R² by way of the critical correlation between the OLS predictions and the fungible predictions, which we denote r(ŷ, ỹ); below, R and R̃ denote the multiple correlations obtained with the OLS and the fungible weights, respectively. High values of r(ŷ, ỹ) result in sets of weights that are only slightly less effective than the OLS weights at describing the data and yield only a minor drop in predictive value; e.g., R = .5 and r(ŷ, ỹ) = .98 results in R̃ = .49. An example of fungible weights is shown in Figure 24, based on the case that all correlations are equal to .5, and so all bs = .25. Easily visible is that even a modest drop in variance explained based on r(ŷ, ỹ) = .98 yields a large discrepancy between the minimum and maximum values of the fungible weights, and that although the sign remains the same, the interpretation of the importance of the variables would be affected quite strongly. Another example with a smaller discrepancy is shown in Figure 25, based on the case that all correlations are equal to .3.


Figure 24. Plot of fungible weights with three predictors when all variable correlations are r = .5 and r(ŷ, ỹ) = .98. The point in the center is the OLS estimate of the weights, b1 = b2 = c’ = .25. Histograms illustrate the spread of the fungible weights, and show the clear bimodality associated with them.


Figure 25. Plot of fungible weights with three predictors when all variable correlations are r = .3 and r(ŷ, ỹ) = .98. The point in the center is the OLS estimate of the weights, b1 = b2 = c’ = .19. Histograms illustrate the spread of the fungible weights, and show the clear bimodality associated with them.


Although fungible weights may be examined using plots that provide a great deal of information, there is great utility in quantifying uncertainty. This is exemplified by the familiar confidence interval. Confidence intervals serve to estimate the uncertainty regarding the precision of the estimates obtained for a given estimated model, as expected by sampling theory. Unfortunately, quantifying bias is not so easily accomplished because of the many ways it may occur, and there is little information regarding how large such intervals should be. Fungible weights do not circumvent this problem, but by calculating the range of the fungible weights associated with conservative values of r(ŷ, ỹ) it is possible to establish at least some indication of the size of a validity interval that may serve as an analogue to the confidence interval. The range of the fungible weights, which we will call the fungible interval, provides an informative picture of the weights associated with a given r(ŷ, ỹ) value. The bulk of the results and discussion in this chapter is therefore focused on the range of the fungible weights, defined as the minimum and maximum values of the fungible weights associated with a given predictor for a given correlation matrix.

The range will be used for a few reasons. The first is that there is heavy bimodality in the distribution of the fungible weights, with peaks at the boundaries (as shown in Figures 24 and 25), so that the range per predictor weight results in a loss of relatively little information as most weights are near the extremes of the range.

Additionally, reporting a range per predictor is common practice for confidence intervals, and its interpretation is simple. With that being said, the range does not capture the fact that the value of each fungible weight is a tightly constrained function of the other weights (Waller, 2008). The result of this constraint is that the maximum value of a weight at a given r(ŷ, ỹ) is associated with smaller, more modest values of the other weights. Discussing the range does not reflect this because the range is predictor specific, but it is sufficient for illustration purposes.

Fungible Intervals for the Single Mediator Model

To begin, we shall illustrate fungible weights in the case of a single mediator, as this is the most common form of mediation analysis (Maxwell & Cole, 2007). Further, in the case of two predictors there are only two pairs of fungible regression weights for a given r(ŷ, ỹ), and each pair consists of different values for b and for c′. This simplifies determining the range and reporting the results.

In regard to the logic of this study, we approach it from a hypothetical perspective. Given a set of correlations from a sample, we may wonder how much the mediation effects could change due to a drop in R² that is possibly due to using the true regression weights. Note that the result does not depend on the cause of the drop, and that the conclusions therefore apply also when the cause is a model violation. If a modest drop in R² results in highly discrepant weights, then the model is considered sensitive to any unknown violations, and less trustworthy. However, if even larger drops do not result in highly discrepant effects, then the inference is more robust and may be treated as such.

Method

There is no absolute basis for the choice of an r(ŷ, ỹ) criterion, just as there is not one in the case of model fit (e.g., RMSEA cutoffs; Browne & Cudeck, 1993; Steiger, 2007), reliability (e.g., Cronbach’s α = .8 or .9), or confidence intervals (e.g., the familiar 95% and 99% confidence intervals). In each of these cases, what qualifies as acceptable is primarily a matter of convention and of possible considerations one may have regarding the level of uncertainty. Here we generally follow the example of confidence intervals and use three different values, r(ŷ, ỹ) = .90, .95, and .98. These criterion values result in only modest drops in variance explained, and the differences are unlikely to be considered noteworthy in practice. For example, for a correlation of .5 between the predicted dependent variable values and the observed values (R = 0.5), the resultant analogous correlations obtained with the fungible weights (R̃) would be .45, .475, and .49 for r(ŷ, ỹ) values of .90, .95, and .98, respectively.

For this study, the possible correlations between the independent variable, mediator, and dependent variable were positive and negative values of .1, .2, .3, .4, and .5, which resulted in 10³ = 1,000 matrices because there are three correlations in a simple mediation scheme. However, 20 of these matrices were not valid correlation matrices, and so our results here are based on 980 correlation matrices. For each valid matrix, we calculated the OLS weights in addition to the two fungible weight pairs for each of the three r(ŷ, ỹ) values. To determine the OLS weights and the fungible weights no data simulation is needed, as both types of weights may be calculated directly from a correlation matrix. As no simulation was necessary here, our results for this study are simply calculations of the fungible intervals for the direct effect and the indirect effect. To be clear, these intervals do not depend on sample size, and so are fixed for a given correlation matrix.

To provide some sense of the magnitude of the fungible intervals, we also calculated confidence intervals for the direct and indirect effects for N = 100. For the indirect effect we used Sobel’s (1982) standard error to calculate confidence intervals; although it is not recommended in practice, it is computationally simple and sufficient for the purpose of conveying the size of the fungible interval. Specifically, the use of normal-theory confidence intervals allows for a straightforward comparison between the uncertainty that might be expected as a consequence of random sampling and the uncertainty stemming from possible model inaccuracy.
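For completeness, a sketch of the normal-theory interval computed directly from the correlations (standardized variables assumed; the function name and interface are ours, for illustration):

```r
sobel_ci <- function(rxm, rmy, rxy, n, level = .95) {
  a  <- rxm
  b  <- (rmy - rxy * rxm) / (1 - rxm^2)
  r2 <- (rxy^2 + rmy^2 - 2 * rxy * rxm * rmy) / (1 - rxm^2)  # R^2, Y on X and M
  se_a  <- sqrt((1 - rxm^2) / (n - 2))
  se_b  <- sqrt((1 - r2) / ((n - 3) * (1 - rxm^2)))
  se_ab <- sqrt(b^2 * se_a^2 + a^2 * se_b^2)   # Sobel's first-order SE
  z <- qnorm(1 - (1 - level) / 2)
  c(ab = a * b, lower = a * b - z * se_ab, upper = a * b + z * se_ab)
}

sobel_ci(.3, .3, .3, n = 100)   # one of the study's correlation patterns
```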

Results and Discussion.

Comparison with confidence intervals. As a reminder, whereas confidence intervals will decrease in magnitude as sample size increases, fungible weights are fully independent of sample size. Nonetheless, their differences are informative, as there are three general trends apparent in the behavior of fungible intervals that contrast with the behavior of confidence intervals. The first is that whereas confidence intervals tend to become narrower as R² increases, the reverse is true of fungible intervals. This is shown in Figure 26, with the dashed line representing the OLS estimate, and the end points of each line representing the minimum and maximum values of an interval. Whereas the fungible intervals for c’, shown in black, widen as R² increases, the confidence intervals, shown in grey, narrow. This same pattern holds for ab because it holds for b. The fungible intervals of the direct and indirect effects tend to increase as R² does.


Figure 26. Comparison of fungible intervals and confidence intervals with R². The lines represent the intervals about the OLS estimated weights. As a given value of R² may result in different fungible intervals depending on the correlations between the variables, the lines have been jittered about R². Grey lines indicate confidence intervals, and black lines indicate fungible intervals. Confidence intervals are based on the Sobel standard error and N = 100.


Figure 27. Relation of fungible intervals and confidence intervals to the magnitude of the indirect effect. The lines represent the intervals about the OLS estimated weights. Grey lines indicate confidence intervals, and black lines indicate fungible intervals. The lines are overlain because a given value of R² may result in different fungible intervals, depending on the correlations between the variables. Confidence intervals are based on the Sobel standard error and N = 100.


The second and third trends apply to both the direct and indirect effect as well, but will be illustrated using ab. Whereas the confidence intervals for the indirect effect tend to become wider as ab increases, the magnitude of the fungible intervals is relatively constant across all values of the indirect effect; see Figure 27. Finally, the third trend is that the fungible intervals are not symmetric about the estimated weights and tend to deviate toward less extreme values. When the estimate is positive, the deviations from symmetry about the OLS weights tend to be negative. In contrast, when the estimate is negative, the deviations tend to be positive, and they become smaller as the estimate does; for both positive and negative weights, as the OLS weights go to zero the fungible intervals become more symmetric about the OLS weights. That the deviations are opposite in direction reflects that for a given total effect of X on Y, the indirect effect is negatively related to the direct effect (c = c′ + ab, and thus ab = c − c′). Additionally, the fact that the deviations become less extreme as ab decreases is consistent with the magnitude of the fungible intervals decreasing as R² does.

In general then, it seems that as the explained variance of the dependent variable increases, so too does the impact of model inaccuracies. To better understand this result, we now consider other predictors of the range.

Predictors of the fungible intervals. Although it is clear that the fungible range increases with R2, the fact that R2 is based on other properties of a correlation matrix makes it necessary to consider other predictors. For this single mediator illustration, we do not report p-values because there is no sampling involved in the study. Rather than generating data, we directly use the correlations for each of the 980 correlation matrices and so all regression analyses are based on 980 cases.


Preliminary analyses were consistent with Figure 28. Specifically, the R2 of a given model explained 34.9% of the variance in the fungible interval, regardless of rcrit.

We also considered the effects of multicollinearity because of its relationship with the size of standard errors, and found that the variance inflation factor (VIF) added a small amount of variance explained, bringing the total to 35.2%.

However, upon further investigation using all three correlations as predictors of the fungible interval, it becomes clear that our primary finding here is that the magnitude of the fungible interval for a given predictor is almost entirely explained by the absolute value of the correlation between the other predictor and the dependent variable (see Figure 30). These results are reported in Table 7.

The range of the c' path coefficient is best predicted by the absolute value of rMY, and the range of b is predicted by the magnitude of rXY. In both cases, the absolute value of the other correlation, i.e., |rMY| for c' and |rXY| for b, is sufficient to yield an R2 = .986 when predicting the fungible intervals. Including VIF in the regression (there is only one VIF value in this two-predictor case) results in a model with R2 = .998; all weights in these models were positive. With the addition of the interaction term the result is R2 = 1, and so these three predictors perfectly explain the variance of the range. The unstandardized weights for rcrit = .98 are 0.210 for |rMY|, 0.000 for VIF, and 0.181 for |rMY| × VIF; for rcrit = .95 they are 0.320, 0.000, and 0.275; and for rcrit = .90 they are 0.423, 0.000, and 0.363. The values are identical for b, excepting that the other correlation is then |rXY|. Figure 30 illustrates the relationship between |rMY| and the range of c', and shows the clear increase in the magnitude of the fungible interval as |rMY| increases, as well as the increasing spread as VIF increases, due to the interaction between them.

Predictors of Fungible Interval Magnitude

Effect  Predictor            rcrit = .90   rcrit = .95   rcrit = .98
c'      |rMY|                 0.423         0.320         0.210
        VIF                   0.000         0.000         0.000
        |rMY| × VIF           0.363         0.275         0.181
        (R2 = 1)
ab      |rXY|                -0.068        -0.051        -0.034
        |rXM|                 0.000         0.000         0.000
        VIF                   0.000         0.000         0.000
        |rXY| × |rXM|         0.510         0.386         0.254
        |rXY| × VIF           0.068         0.052         0.034
        |rXM| × VIF           0.000         0.000         0.000
        |rXY| × |rXM| × VIF   0.262         0.198         0.130
        (R2 = 1)

Table 7. Regression results for predictors of the fungible interval of the direct and indirect effects in the single mediator case.


A similar set of predictors applies for the indirect effect, but now it is necessary to include the a path as a predictor because it is a multiplicative factor in the indirect effect ab. The results for these analyses are also shown in Table 7, and Figure 31 shows the range of ab as |rXY| increases, with the increased spread of the ranges due to the effects of the a path and VIF.

To begin, we considered a model using only |rXY| as a predictor (the other correlation for the b path); this resulted in R2 = .402. Adding the magnitude of the a path, |rXM|, and VIF resulted in a model with R2 = .890. We realize that |rXM| and VIF are both a function of rXM, but the result shows that it is useful to have both in the same equation. For both models, all weights were positive. When considering the interactions between these three variables the result is R2 = 1. The unstandardized regression weights for rcrit = .98 are -0.034 for |rXY| and 0.034 for |rXY| × VIF, with 0.254 for |rXY| × |rXM| and 0.130 for |rXY| × |rXM| × VIF. For rcrit = .95 the weights are -0.051 and 0.051, and 0.386 and 0.198. Finally, for rcrit = .90 they are -0.068 and 0.068, and 0.510 and 0.262. For all three rcrit values, the remaining weights, for |rXM|, VIF, and |rXM| × VIF, were all equal to 0. In short, these results are again clear and confirm that the fungible interval of the indirect effect can also be fully explained by a few predictors.


Figure 30. Range of fungible intervals for the direct effect in the one-mediator case as a function of |rMY|, based on rcrit. The results are the same for the range of b as a function of |rXY|.


Figure 31. Range of fungible intervals for the indirect effect in the one-mediator case as a function of |rXY|, based on rcrit.


Also visible in Figures 30 and 31 is the fact that lower rcrit values result in larger intervals than do higher values, and so a tendency towards more discrepant weights. The largest interval associated with the direct effect c' was 0.453, 0.343, and 0.225 for rcrit values of .90, .95, and .98, respectively. Conversely, the smallest intervals were .079, .060, and .039 for the same criterion values. Reflecting the maximum value of a = .5 in the present study, the maximum range of the indirect effect is half that of the direct effect, and the values were 0.226, 0.171, and 0.113, with the smallest being .008, .006, and .004, again for rcrit values of .90, .95, and .98. These results are to be expected, as generally speaking increasingly discrepant predictions necessitate increasingly discrepant variable weights, and stronger model violations generally lead to greater bias in the estimates.

The results presented here may be understood by considering that when the weight of one predictor goes up, the weight of the other goes down. This is exactly the case for two predictors, and thus for the two pairs of fungible weights; for more predictors, it follows from the ellipsoid form presented in Figures 26 and 27. As a consequence, the predicted Y values are highly sensitive to a predictor and its weight when the other predictor is only poorly related to Y, compared with a situation where the other predictor is strongly related to Y. Varying the weights has fewer consequences when the other variable compensates for the changes, rather than simply adding noise. When both X and M are highly correlated with Y, then a decrease in c' and a higher b (or vice versa) will lead to approximately the same R2 as that of the optimal OLS estimates. However, if X is highly related to Y and M is not, then decreasing c' and increasing b will have a larger detrimental effect because giving M a larger weight adds noise to the prediction. Conversely, if M is highly correlated with Y but X is not, then changing c' will have a strong effect upon the predicted values whereas changing b will not. It is therefore also evident that this effect is moderated by the absolute correlation between the two predictors X and M (collinearity), because the higher this correlation, the less it matters if a weight is increased at the cost of another.
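To make the two-predictor case concrete, the following R sketch solves for the two fungible weight pairs directly. The correlations are assumed illustrative values, not taken from the studies above, and the sketch assumes Waller's (2008) parametrization, in which a fungible composite has the same variance as the OLS composite and validity rcrit times the OLS multiple correlation.

```r
# Minimal sketch: the two fungible weight pairs in a two-predictor model.
# The correlations below are assumed for illustration only.
rxm <- .3; rxy <- .4; rmy <- .5
Rxx <- matrix(c(1, rxm, rxm, 1), 2, 2)  # predictor intercorrelations
ry  <- c(rxy, rmy)                      # correlations of X and M with Y
b   <- solve(Rxx, ry)                   # OLS weights (c', b)
R2  <- sum(b * ry)                      # squared multiple correlation
rcrit <- .98

# A fungible pair a satisfies a'ry = rcrit * R2 (reduced validity) and
# a'Rxx a = R2 (same composite variance as the OLS composite).
k    <- rcrit * R2
pair <- function(a1) c(a1, (k - a1 * rxy) / rmy)  # enforce the linear constraint
g    <- function(a1) { a <- pair(a1); drop(t(a) %*% Rxx %*% a) - R2 }
lo   <- uniroot(g, c(-2, b[1]))$root              # brackets chosen for these values
hi   <- uniroot(g, c(b[1], 2))$root
rbind(ols = b, fungible1 = pair(lo), fungible2 = pair(hi))
```

Each pair trades weight between the two predictors while leaving the model R at rcrit times its OLS value, which is exactly the compensation described above.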

The implications of the present study for the indirect effect are straightforward. First, as X and Y are more closely related, and so the total effect of X on Y increases, the sensitivity and uncertainty regarding the value of the indirect effect ab will increase as well. Conversely, if rXY is small (whether due to suppression or simply due to a small effect), then the sensitivity and uncertainty regarding the indirect effect will be smaller. Second, the uncertainty regarding the indirect effect also increases with VIF (and similarly with the absolute value of the correlation between X and M), and therefore the uncertainty regarding the indirect effect increases as a increases. The implications for the direct effect c' are that the uncertainty about its value generally increases with the strength of the relationship between M and Y, and the estimated direct effect should be interpreted with caution when the mediator and the outcome variable are highly correlated.

Fungible Weights with Two Mediators

The previous study illustrated a limited case to begin, but it is well known that situations where a single mediator is sufficient are quite rare in psychology. Instead, it is generally assumed that there are additional mediators involved (Bullock, Green, & Ha, 2010). Additionally, it is often useful to include covariates in the model so as to compare, contrast, or exclude alternative explanations. As such, illustrating the behavior of fungible weights in a more complex model is useful and informative, and it is also desirable to confirm the above results using another model.

Method

For this study the possible values for rcrit were again .90, .95, and .98. As the previous investigation showed that it was the absolute value of the correlations that predicted the range, we use only positive correlations, rs = .1, .2, .3, .4, and .5. In a four-variable system there are six correlations, and so in this case with five possible values there were 5^6 = 15625 matrices, all of which were valid correlation matrices. We again conducted tests of mediation using the derived OLS regression weights. We also calculated 100 fungible weight trios using Waller's (2008) R function, which randomly samples from the infinite set of fungible weights. These 100 sets are sufficient to recover the shape of the fungible weight ellipsoid as well as any trends, but this does sacrifice some degree of precision for the regression weights we present here. We then used these fungible weights to calculate the fungible intervals of the direct and indirect effects.
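A minimal R sketch of this design grid follows; the ordering used to fill the six correlations into the matrix is an illustrative assumption.

```r
# Sketch: enumerate the 5^6 = 15625 correlation matrices for the
# four-variable (X, M1, M2, Y) system and check that each is positive definite.
vals <- c(.1, .2, .3, .4, .5)
grid <- as.matrix(expand.grid(r21 = vals, r31 = vals, r41 = vals,
                              r32 = vals, r42 = vals, r43 = vals))
valid <- apply(grid, 1, function(g) {
  R <- diag(4)
  R[lower.tri(R)] <- g                    # fills (2,1),(3,1),(4,1),(3,2),(4,2),(4,3)
  R[upper.tri(R)] <- t(R)[upper.tri(R)]   # symmetrize
  all(eigen(R, symmetric = TRUE, only.values = TRUE)$values > 0)
})
sum(valid)  # 15625: every matrix in this grid is a valid correlation matrix
```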

Results and Discussion

Predictors of the fungible intervals. As in the previous study, multiple regression models predicting the fungible intervals were run, with the predictors being the correlations of the other predictors with the dependent variable Y, as well as their interactions. We also included the VIF for each predictor, as well as the interaction of the VIF terms with the other terms in the regression.

In regards to the effects of the criterion value rcrit, the relationships apparent in the single mediator case recur here, and again the range increases as rcrit decreases. For the critical correlations we use here, the maximum range for a given weight was 0.716, 0.541, and 0.356 for rcrit values of .90, .95, and .98, respectively, with these values halved for the indirect effect, i.e., the maximum values of the range are 0.358, 0.271, and 0.178 for the three rcrit values used here. The minimum values of the range for the direct effect were .091, .069, and .045, and for the indirect effect they were .009, .007, and .005.

Before continuing, it is worth making explicit that the results we present for this study do not include all possible predictors, and are instead meant to provide a simple and useful way to understand the behavior of the fungible interval in this more complex model.4 Table 8 shows the results for both the direct and indirect effect. As we sampled from the infinite set of fungible weights here, we report b values rather than βs.

4 For both the direct and indirect effect, R2 is higher if all correlations and VIF terms are included and allowed to interact, but the increase in variance explained seems insufficient to justify interpreting a 9-way interaction with highly sensitive weights, whose magnitude and sign change readily.


Predictors of Fungible Interval Weights

Effect  Predictor             rcrit = .90   rcrit = .95   rcrit = .98
c'      rM1Y                   0.565         0.428         0.281
        rM2Y                   0.566         0.427         0.281
        VIFX                  -0.018        -0.014        -0.009
        rM1Y × rM2Y           -1.127        -0.851        -0.560
        rM1Y × VIFX            0.328         0.248         0.163
        rM2Y × VIFX            0.327         0.248         0.163
        rM1Y × rM2Y × VIFX    -0.299        -0.227        -0.148
        (r2 = 0.958)
ab1     rXY                   -0.521        -0.394        -0.259
        rM2Y                  -0.521        -0.394        -0.259
        a1                     0.356         0.269         0.177
        VIFM1                 -0.257        -0.194        -0.128
        rXY × rM2Y             0.695         0.526         0.346
        rXY × VIFM1            0.657         0.497         0.327
        rM2Y × VIFM1           0.657         0.497         0.327
        rXY × rM2Y × VIFM1    -0.925        -0.699        -0.459
        (r2 = 0.955)

Table 8. Regression results for the predictors of the fungible interval of the direct and indirect effects. The results for the indirect effect ab2 are not shown, but they are of the same nature, excepting that the effects are related to rM1Y, VIFM2, and a2 rather than rM2Y, VIFM1, and a1.


As in the previous study, the range of the fungible parameters increases with the magnitude of the correlations associated with the other predictors. In this case, the other correlations, rM1Y and rM2Y, are sufficient to yield r2 = 0.839 with all weights positive; adding their interaction results in r2 = 0.908, with the interaction term being negative. Including VIFX and its interactions results in r2 = 0.958. For rcrit = 0.98, the weights for rM1Y and rM2Y were both 0.281 and the weight for rM1Y × rM2Y was -0.560, with VIFX and the other correlations interacting such that the weights for rM1Y × VIFX and rM2Y × VIFX were both 0.163, and the weight for rM1Y × rM2Y × VIFX was -0.148. For rcrit = 0.95, the weights for rM1Y and rM2Y were both 0.428 and the weight for rM1Y × rM2Y was -0.851, with the weights for rM1Y × VIFX and rM2Y × VIFX both 0.248 and the weight for rM1Y × rM2Y × VIFX -0.227. Finally, for rcrit = 0.90, the weights for rM1Y and rM2Y were both 0.565 and the weight for rM1Y × rM2Y was -1.127, with the weights for rM1Y × VIFX and rM2Y × VIFX both 0.328 and the weight for rM1Y × rM2Y × VIFX -0.299. All weights reported here are p < .001.

This pattern, with a positive effect of the single correlations and a negative effect of their interaction term, means that together the two correlations have a disjunctive effect: it is sufficient that one of the two is high for the range to increase. This makes sense because one high correlation is sufficient to give room for a larger range of the direct effect. Additionally, consistent with the previous study, it is not the VIF as such, but the interaction of the VIF with the other predictors, that results in an increased range.

The behavior of the fungible range of the indirect effects is similar. The differences again stem from the fact that the indirect effect is the product of one weight that has a fungible interval, b1, and one weight that does not, a1. Additionally, as in the case of the direct effect in this three-predictor case, the number of possible predictors and interactions becomes quite unwieldy and difficult to express and interpret. Therefore, we consider only the effects of the other correlations, rXY and rM2Y in the case of ab1, as well as VIFM1 as the VIF factor and a1 as the magnitude of the a path; this is sufficient to explain most of the variance, and provides easily interpreted weights. Additionally, we focus here on ab1, but the results apply equally to ab2, excepting that rM1Y is then the other correlation, VIFM2 the VIF predictor, and a2 the magnitude of the a path.

To begin, rXY and rM2Y, as well as a1, without any interactions, are sufficient to yield r2 = 0.893 for ab1, with each again predicting an increase in the magnitude of the range. With the inclusion of VIFM1 the result is r2 = 0.903; allowing for interactions between all variables excepting rXM1 results in r2 = 0.955. Table 8 shows the results for this model, with rXY and rM2Y each entering both directly and in interaction with VIFM1, the two-way interactions having positive weights. Again, all weights reported here are p < .001.

These results largely confirm those of the previous study with a single mediator. The main difference is that the effect of these other correlations turns out to be disjunctive, such that one other high correlation seems to be sufficient to increase the range of a given predictor's fungible interval. For rcrit = 0.98, the specific weights for rXY and rM2Y were both -0.259, the weight for VIFM1 was -0.128, and the weight for a1 was 0.177; the two-way interaction terms were 0.346 for rXY × rM2Y and 0.327 for both rXY × VIFM1 and rM2Y × VIFM1, and the three-way interaction term rXY × rM2Y × VIFM1 was -0.459. For rcrit = 0.95, the weights for rXY and rM2Y were both -0.394, the weight for VIFM1 was -0.194, and the weight for a1 was 0.269; the two-way interaction weights were 0.526 for rXY × rM2Y and 0.497 for both rXY × VIFM1 and rM2Y × VIFM1, and the three-way interaction term was -0.699. Finally, for rcrit = 0.90, the weights for rXY and rM2Y were both -0.521, the weight for VIFM1 was -0.257, and the weight for a1 was 0.356; the two-way interaction weights were 0.695 for rXY × rM2Y and 0.657 for both rXY × VIFM1 and rM2Y × VIFM1, and the three-way interaction term was -0.925.

However, consistent with our findings thus far, a fungible analysis reveals that these weights are themselves extremely sensitive, and so should be taken only as illustrative, because the intervals for a given predictor are many times the magnitude of the weights themselves. Additionally, although the magnitude of the other correlations is sufficient to explain a large amount of the variance, the weights themselves change dramatically as predictors and interaction terms beyond what we show here are included in the regression predicting the range. Nonetheless, given that the effects of the other correlations and collinearity are generally consistent across this study and the previous one, and are sufficient to explain a large amount of the variance, our basic point remains: as the other correlations increase, so too does the uncertainty regarding a given predictor's weight.

One final result of interest from this study is that the maximum range is greater here than in the case of a single mediator. In the single mediator study, the largest range observed was 0.453, but for parallel mediators it was 0.716. This occurred despite lower values of R2 in the parallel mediator study, as the single mediator study used both positive and negative correlations and so allowed higher possible values of R2.

Nonetheless, the additional predictor affords more room for the weights to vary without strongly affecting the predicted values. Importantly, an effect of this is that the more indirect effects are involved in an estimated mediation model, the greater the potential bias due to model violations, and so the greater the uncertainty regarding the correct values of individual effects. This also applies when additional variables are included in a model, as one might do when testing competing theories using additional mediators or covariates. This is perhaps not surprising, given that the more variables are involved, the more assumptions must be made about linearity and interactions, but it is nonetheless an unfortunate consequence of attempting to use more complete models that rule out alternative explanations or propose multiple pathways.

Fungible Weights based on Standard Error

The foregoing illustrations were based on the logic that model violations might affect obtained parameter estimates, and in doing so reduce the validity of the estimated weights. Fungible weights provide a method to assess the sensitivity of the estimated weights to possible model violations. However, one may find the chosen rcrit value as a basis for investigating this sensitivity rather arbitrary, or may be uninterested in the effects of model violations and instead assume that such violations are ignorable.

Both of these issues may be addressed by combining the logic of sampling theory with that of fungible parameters, and viewing fungible parameters then as illustrating the effects of random sampling rather than model violations.

Such an integration is accomplished by way of the fact that R2 itself has a standard error of estimation, which we shall denote SER2. This standard error may be used to delineate a confidence interval for R2 that expresses the uncertainty due to sampling, conditional on the estimated model being the true model. For example, instead of starting from a predetermined rcrit, one can start from an R2 value that is one or two SER2 lower than the estimated R2. Doing so provides a statistical rationale for the selection of rcrit, and further it affords consideration of what the regression weights may have looked like in the lower regions of the confidence interval for R2, and so to examine what is expected by normal sampling theory.

In the following we will work with just one standard error lower, as a cautious approach. The decreased squared multiple R, denoted as $R^2_{SE}$, would then be equal to

$$R^2_{SE} = R^2 - SE_{R^2} \qquad (4)$$

Based on equation 24 in Waller (2008), the rcrit value would then be

$$r_{crit} = \sqrt{R^2_{SE} / R^2} = \sqrt{(R^2 - SE_{R^2}) / R^2} \qquad (5)$$

This formula yields a criterion value that, for a given matrix of correlations, leads to fungible parameters whose R2 equals R2SE, and thus lies one SER2 below the estimated R2. Such a value is presumably within the range of values researchers would consider consistent with past research that makes use of the same model.
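As a concrete sketch, the R code below implements equations (4) and (5). The formula used for SER2 is the common large-sample approximation for the standard error of R2 with k predictors; that formula is our assumption, as the text does not specify one, but it reproduces the thresholds reported below for N = 100.

```r
# Sketch of equations (4) and (5): derive rcrit from the standard error of R^2.
# The SE formula is a standard large-sample approximation (an assumption here).
se_r2 <- function(R2, N, k) {
  sqrt(4 * R2 * (1 - R2)^2 * (N - k - 1)^2 / ((N^2 - 1) * (N + 3)))
}
rcrit_from_se <- function(R2, N, k) {
  R2_se <- R2 - se_r2(R2, N, k)   # equation (4)
  if (R2_se <= 0) return(NA)      # the text falls back to rcrit = .3 in this case
  sqrt(R2_se / R2)                # equation (5)
}
# With N = 100 and k = 3 predictors, R^2 = .184 gives rcrit of about .80,
# and R^2 = .382 gives about .90, matching the thresholds reported below.
rcrit_from_se(.184, N = 100, k = 3)
rcrit_from_se(.382, N = 100, k = 3)
```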

As with the previous two-mediator illustration, the possible correlations between the independent variable, the dependent variable, and the two mediators were again rs = .1, .2, .3, .4, and .5, and there were 5^6 = 15625 matrices. For each of the matrices, 100 fungible weight sets were calculated based on the rcrit derived from SER2 for N = 100. In the cases where SER2 is larger than R2 (e.g., all rs = .1), and so where our formula would yield a negative R2SE and a negative value under the square root when determining rcrit, the criterion value was set to rcrit = .3 instead.

The results of this study suggest that the previous a priori determined criterion values may be considered somewhat high, and therefore conservative, at least from a sampling perspective. They are clearly larger than the values derived from SER2, at least for N = 100 and for R2 values quite typical of social science research. Based on the criterion we use here, in order to have rcrit ≥ .8, it is necessary to have R2 ≥ .184, and so R2SE ≥ .118. For a criterion value of .9, it is necessary to have R2 ≥ .382, which yields R2SE ≥ .310.

Figures 32 and 33 display the results of this study for the direct and indirect effects, respectively. For the direct effect in Figure 32, the other correlation represented by the shades of grey is rM1Y; for the indirect effect in Figure 33, the shades of grey represent rXY. As before, the effects of the other correlations are quite apparent, and it is also easily seen that the fungible intervals increase rapidly as R2 increases, due to the associated increases in the other correlations, despite the fact that rcrit increases with R2.


Figure 32. Fungible intervals for the direct effect based on a criterion value rcrit derived from the standard error of R2 and N = 100.


Figure 33. Fungible intervals for the indirect effect based on a criterion value rcrit derived from the standard error of R2 and N = 100.


Fungible Mediation Function

In order to facilitate the use and exploration of fungible weights in mediation analyses, we provide an R function in Appendix F. Specifically, it is a wrapper function that makes use of a modified version of Waller's (2008) fungible function (the original function does not allow for only two predictors) and outputs additional information relevant to mediation, for either a simple mediation model or a parallel mediation model with two mediators. The function outputs the minimum and maximum values for c', each b, and the corresponding indirect effects. It is not meant to replace general mediation macros and software (e.g., PROCESS; Hayes, 2013), but is instead to be used as a supplement.

The function accepts multiple arguments. The first argument, data, accepts a matrix with rows as observations and columns as variables. The second argument, x, accepts a column index that references the independent variable. This may be a number (e.g., 5) or a string (e.g., "IV"). The same is true of y for the dependent variable. The mediators argument, m, may be a single value or a vector of length 2, and each of these values may again be a number or a string; the first value is treated as M1, and the second as M2. Additionally, the function accepts the argument nsets, which sets the number of randomly selected fungible weight sets to generate; the default is 100, which is sufficient to obtain a reasonable sense of the dispersion of the fungible weights. The criterion value, rcrit, is set by the argument rcrit, and defaults to .98. Finally, users may disable automatic plotting of the fungible weights by setting plot equal to FALSE.
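A hypothetical call might look as follows. The function name fungibleMediation() is assumed for illustration, as the text does not name the Appendix F function; the data are simulated, and the function is assumed to have been sourced from Appendix F.

```r
# Hypothetical usage sketch of the Appendix F wrapper (assumed name).
set.seed(1)
dat <- matrix(rnorm(400), ncol = 4,
              dimnames = list(NULL, c("IV", "M1", "M2", "DV")))

# Single mediator, columns referenced by name:
fungibleMediation(data = dat, x = "IV", y = "DV", m = "M1")

# Two parallel mediators, columns referenced by index, plotting disabled:
fungibleMediation(data = dat, x = 1, y = 4, m = c(2, 3),
                  nsets = 100, rcrit = .95, plot = FALSE)
```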

135

To illustrate the function with a single mediator, we use data from Tal-Or, Cohen, Tsfati, and Gunther (2010). In the original study, a parallel mediator model was estimated with a news manipulation (front or back page story about a sugar shortage) affecting reported intention to buy sugar, with the effect mediated by presumed media influence and perceived importance of the news presented. A simpler model using only perceived media influence as a mediator is also used to demonstrate PROCESS (Hayes, 2013), and we use that model here. The OLS estimated effects using standardized variables are a = 0.361, p < .05, b = 0.432, p < .01, and ab = 0.156, 95% bootstrapped CI with 10000 bootstraps [0.004, 0.342], and c' = 0.164, ns, with r2 = 0.206.

Fungible weights reveal that the ab path does not vary much with rcrit = 0.98. The minimum and maximum values of b were 0.383 and 0.446, respectively, and the indirect effect had a range of 0.138 to 0.161. However, the direct effect did vary a great deal, with a minimum value of -0.019 and a maximum value of 0.334.

This relatively large fungible interval for the direct effect is due to the rather high correlation between perceived media influence and sugar buying intentions, r = .446 (the other correlation for the direct effect). In contrast, the correlation between condition (news manipulation) and sugar buying intentions was much smaller, r = .160 (the other correlation for the b path), and so the fungible interval of the indirect effect is small.

To demonstrate a two-mediator case, we use data from Crocker, Canevello, Breines, and Flynn (2010). These data represent 115 incoming freshman roommate dyads, for a total N = 230. These freshmen were measured using a variety of scales every week throughout their first semester, among them shortened versions of the Purpose in Life Scale (Crumbaugh & Maholick, 1964), the Rosenberg Self-esteem Scale (Rosenberg, 1965), the State Anxiety Inventory (Spielberger et al., 1980), and the Center for Epidemiologic Studies Depression scale (Radloff, 1977), all of which we use here. Though we do not claim that these analyses represent actual mediation, a plausible story could be told whereby purpose in life (X) at week 1 affects self-esteem (Y) at week 3, with this effect being mediated by decreases in depression (M1) and anxiety (M2) during week 2. The analyses are consistent with such a hypothesis, with standardized weights a1 = -0.433, p < .01, a2 = -0.411, p < .01, b1 = -0.303, p < .01, b2 = -0.308, p < .01, and c' = .137, p < .05. The indirect effects were then a1b1 = 0.131, 95% bootstrapped CI with 10000 resamples [0.056, 0.206], and a2b2 = 0.127, 95% bootstrapped CI [0.061, 0.211].

In regards to the fungible weights, using the default setting of rcrit = .98, the direct effect c' ranged from -0.006 to 0.271, b1 from -0.096 to -0.486, and b2 from -0.103 to -0.489. For the indirect effects, the range of a1b1 was 0.042 to 0.209, and that of a2b2 was 0.042 to 0.201, with the maximum value of a given indirect effect associated with the minimum value of the other. In this case, although the direct effect is statistically significant, the range of the fungible direct effect includes 0 despite near-identical R2. The fungible analysis does not cast doubt on the indirect effects, as they remain positive.
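Because the a paths have no fungible interval, each reported indirect-effect range is simply the fixed a path multiplied by the endpoints of the corresponding fungible b interval, as this small check using the values above shows.

```r
# Check: fungible interval of a1b1 = fixed a1 times the fungible b1 interval.
a1 <- -0.433
b1 <- c(-0.096, -0.486)
sort(a1 * b1)  # approximately [0.042, 0.210], matching the reported range
```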

Discussion

Although users of mediation are aware that in some sense their models may be inaccurate and that their estimates may well reflect some other effect(s), it is difficult to know what such model violations might be when conducting research. Of course, if one has good reason to assume that an estimated model is correct, then what we discuss here is not relevant. This was the case with our use of fungible intervals in the single mediator case, where the fungible interval was perfectly explained by only a few predictors. For cases where one is not certain that the estimated model is the true one, we argue that fungible weights may be used as a method of examining the risks associated with such inaccuracies for a given model. By using fungible weights to consider the sensitivity of estimated mediation effects for a given estimated model, researchers may obtain additional information regarding the trustworthiness of their estimated effects.

The need to do so is particularly apparent in cases involving many variables with many high correlations. Specifically, when other predictors are highly correlated with the dependent variable, or when two predictors are highly correlated with each other, the fungible interval for a given predictor will be larger. A similar issue is well known regarding collinearity between predictors (as quantified by, e.g., VIF) and its effects on standard errors, but as we show here, it is sufficient simply for other predictors to be highly correlated with the dependent variable to yield large fungible intervals, without any collinearity between predictors. Indeed, the bulk of the variance in the fungible interval was explained by simply considering the absolute value of the other predictors' correlations with the dependent variable, and the correlations between predictors added only a modest amount to the size of the fungible interval.

Of course, fungible intervals do not themselves prove anything about the extent of the bias, missing variables, missing terms, etc. Instead, they may simply be used to indicate how sensitive the estimated effects are to possible model violations, without suggesting anything about the form or degree of those violations. If one has some indication of what the model violations are, the consequences can be assessed for those specific violations and the model can be adapted. Unfortunately, such information is not always available. In such cases, the methodology of fungible weights may help because it is agnostic to the kind of invalidity, and so it can be used as a general approach to consider the uncertainty that stems from a possibly inaccurate estimated model, and how much one may trust the parameter estimates in a mediation model.

The documented R function described earlier may help to assess the consequences of model violations if they occur, and it is a tool to be recommended for assessing the risks where one does not feel perfectly confident that all relevant variables and interactions are included in the model and that the relationships are indeed linear.

Future Directions.

A possibility for future research is to develop the point regarding the recovery of the true parameter estimates using fungible weights. Developing this point would better demonstrate the utility of fungible weights because it would show the ways in which one may recover true weights under a variety of conditions. This relates to another area of future research, which is to consider the effect of an intentionally misspecified model on R2 relative to the R2 of the correct model. This would provide a means of selecting less arbitrary values of rcrit, as it would give some sense of what values of rcrit are appropriate when one does have some sense of the form of the model misspecification, whether missing variables or misspecified relationships. Researchers would then be able to draw on past research to increase confidence in their parameter estimates when, e.g., a known effect could not be controlled for due to logistical limitations.

Finally, a somewhat more audacious future step may be to apply fungible weights to already published research. As we showed, fungible weights may vary dramatically, and in the case of research on small effects a possible consequence would be regression weights of opposite sign. Such weights would have opposite interpretations as well, and so would be an issue to be addressed. This may also prove useful for researchers, as it may shed light on areas of research in which, although the effects of interest are genuine, results may nonetheless be unlikely to replicate due to various effects that must be controlled for.


Chapter 6: General Discussion

Supporting and testing claims of mediation is never a simple task. Researchers must first establish that an apparent effect is not due to chance alone, and then consider the effects of confounding or an otherwise inaccurate model. Both considerations are complicated by a range of issues that are often outside of a researcher's control. Our general goal here was to provide ways researchers may deal with both testing and estimating models, and what we have discussed may be of use to researchers at all stages of testing claims of mediation.

Significance Testing.

We began by considering the relative performance of various methods of testing for the significance of direct and indirect effects in Chapter 2. We touched upon the matter of using regression or structural equation modeling when testing for mediation, based on Iacobucci et al.'s (2007) claim that SEM should always be used for testing mediation. Consistent with Iacobucci et al. (2007), we did find that SEM using ML had higher power than regression, but the practical import of this is fairly small, as the AUC values of both approaches were similar across all conditions. The similarity of the AUC values was driven by the fact that SEM with ML had higher Type I error rates than did regression, which was itself owed to the bias in estimated standard errors observed for small sample sizes. To be clear, the Type I error rates for ML were more accurate than those of regression, and its power higher, but a lower risk of Type I errors is certainly desirable.

In regards to bootstrapping, there were two main findings. The first was that our results confirmed the well-replicated finding that bootstrapping has higher power than the Sobel test for an α of .05 (e.g., Hayes & Scharkow, 2013), and further that bootstrapping is to be preferred for other decision thresholds relevant to practice. Given the prevalence of bootstrapping and the many recommendations advocating its use for testing the indirect effect, this is good to know for researchers with a preference for α levels other than .05.

Our second finding was that, despite its superiority over the Sobel test, the bootstrapped distribution of the indirect effect does not accurately recover the distribution of the parameter estimates for the indirect effect. It is worth noting that what we present here is not the only example of bootstrapping failing to recover the true distribution of a parameter. This issue originally arose when generating confidence intervals for standard errors, and is what led Efron (1987) to develop the bias-corrected and accelerated (BCa) bootstrap. For testing indirect effects, this failure is particularly relevant for the percentile bootstrap, as it applies percentile cutoffs naively to the bootstrapped distribution without any correction (whereas the BCa does offer corrections and so tends to have higher power). Nonetheless, this failure does not render bootstrapping inappropriate for testing mediation, as evidenced by its clear advantages in power.

Examining the quality of parameter estimates.

One of our goals here was to develop and introduce tools to examine the quality of mediation claims, as even given a statistically significant result it is still necessary to examine the trustworthiness of the estimates and how confounding or other model inaccuracies may affect the conclusions drawn. To that end, we considered alternative models that may be used for mediation, as well as a general methodology for examining parameter sensitivity.

The alternative models we considered in Chapters 3 and 4 were based on the idea of a shared factor space. Such a model serves, at times, as a way to test the validity of a mediation model, and in all cases as a way to capture the worst-case scenario in which all direct and indirect effects are completely confounded. Factor models are quite familiar to researchers, and so are easily understood and implemented as a means of considering alternative explanations or capturing confounding effects.

Of course, we acknowledge that any approach for testing the quality of parameter estimates will be used less than is perhaps appropriate, but even so, simply considering a factor model is useful when evaluating mediation claims. This is because when one works within a given framework, e.g., mediation with only a few variables, it becomes easy to neglect alternative explanations that are quite plausible, simply because it is difficult to estimate them. Often such alternative explanations are viable in their own right, as we argue specifically in regards to conceptually similar variables.

We illustrated this point in Chapter 4, where we developed an SEM-based model that may be used to compare the explanations of mediation and a shared factor space for a set of variables. The logic of the model is simple, and is at its core no different from the standard practice of controlling for additional variables. Specifically, it is based on the notion that if the paths between X, M, and Y are significant even in the presence of shared factors, then mediation remains a viable explanation. The main limitation of our approach is that it requires a full longitudinal design, but given the general superiority of such designs for testing mediation (cf. Maxwell & Cole, 2007), this does not seem a major limitation.

As mentioned previously, when testing the models we discussed it is necessary to make a few additional decisions. Although there are numerous decisions that may be made (e.g., estimation method), we focused on those that imply claims regarding the variable relationships. The latent autoregressive models on which we based our approach nearly always make use of correlated residuals, and we encourage the same here. Equal loadings at different time points are more contentious, however, as they may reduce model fit and may be considered a strong claim regarding the underlying latent variables. They are defensible under many circumstances, and may be preferred depending upon how one views the underlying latent variables and how much regularity there is in their effects. Nonetheless, from our perspective, unequal loadings are to be preferred, given that measurement invariance is perhaps the exception rather than the rule.

Finally, in Chapter 5, we considered parameter sensitivity in mediation models.

Of course, if a model is known to be accurate, then methods for assessing parameter sensitivity are unnecessary, and the use of fungible intervals as a means of considering model inaccuracy is likewise unnecessary. However, if desired, in such cases one may still adopt a sampling perspective to consider what the results may look like across replications, as we did when we made use of the standard error of R2.

If, however, a model is not known to be accurate, it is quite likely that the causes of the inaccuracies are not known either. As a result, there is some uncertainty regarding the trustworthiness of the estimates, and it is wise to consider the sensitivity of the parameters to model violations. We argue that fungible weights may be used as a method of examining the risks associated with such inaccuracies for a given model. By using fungible weights to consider the sensitivity of estimated mediation effects for a given estimated model, researchers may obtain additional information regarding the trustworthiness of the estimated effects.

Broadly, there were two findings regarding fungible weights. The first is that the fungible range is almost entirely explained by the absolute value of the other predictors' correlations with the dependent variable. This poses a potential problem for researchers because oftentimes one must control for other variables that are highly correlated with the dependent variable, in which case the range of the fungible weights for the new variable meant to advance a given theory may be rather large, and the OLS estimates accordingly less trustworthy.

The second finding is that there is a small effect of collinearity between predictors, with interaction effects adding a modest amount of variance explained. This reflects the fact that when two predictors are themselves correlated, it becomes harder to disentangle their individual effects, as one may compensate for the other. It is similar to the well-known issue of collinearity inflating standard errors, but unlike standard errors, this effect cannot be reduced by increasing sample size.

Conclusion.

The process of testing mediation requires many considerations, both when testing significance and when considering the quality of the parameter estimates. The difficulties encountered are numerous and not easily reduced or minimized, let alone solved. Nonetheless, it is our hope that what we have discussed here may be of use to researchers testing mediation.


References

Anderson, K. J. (1994). Impulsivity, caffeine, and task difficulty: A within-subjects test of the Yerkes-Dodson law. Personality and Individual Differences, 16(6), 813-829.

Andrews, B., Qian, M., & Valentine, J. D. (2002). Predicting depressive symptoms with a new measure of shame: The Experience of Shame Scale. British Journal of Clinical Psychology, 41, 29-42.

Bentler, P. M., & Yuan, K. H. (1999). Structural equation modeling with small samples: Test statistics. Multivariate Behavioral Research, 34(2), 181-197.

Bollen, K. A. (1989). Structural equations with latent variables. John Wiley & Sons.

Borkenau, P. (1986). Toward an understanding of trait interrelations: Acts as instances for several traits. Journal of Personality and Social Psychology, 51, 371.

Box, G. E. (1976). Science and statistics. Journal of the American Statistical Association, 71(356), 791-799.

Browne, M. W. (1984). Asymptotically distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37(1), 62-83.

Browne, M. W., & Cudeck, R. (1992). Alternative ways of assessing model fit. Sociological Methods & Research, 21(2), 230-258.

Bullock, J. G., Green, D. P., & Ha, S. E. (2010). Yes, but what's the mechanism? (Don't expect an easy answer). Journal of Personality and Social Psychology, 98, 550.

Cai, L. (2008). SEM of another flavour: Two new applications of the supplemented EM algorithm. British Journal of Mathematical and Statistical Psychology, 61, 309-329.

Cheung, M. W. (2009). Comparison of methods for constructing confidence intervals of standardized indirect effects. Behavior Research Methods, 41(2), 425-438.


Cole, D. A., & Maxwell, S. E. (2003). Testing mediational models with longitudinal data: Questions and tips in the use of structural equation modeling. Journal of Abnormal Psychology, 112, 558-577.

Cox, M. G., Kisbu-Sakarya, Y., Miočević, M., & MacKinnon, D. P. (2014). Sensitivity plots for confounder bias in the single mediator model. Evaluation Review, 0193841X14524576.

Coyne, J. C., Kessler, R. C., Tal, M., Turnbull, J., Wortman, C. B., & Greden, J. F. (1987). Living with a depressed person. Journal of Consulting and Clinical Psychology, 55(3), 347.

Crocker, J. (2008). From egosystem to ecosystem: Implications for learning, relationships, and well-being. In H. Wayment & J. Brauer (Eds.), Transcending self-interest: Psychological explorations of the quiet ego (pp. 63-72). Washington, DC: American Psychological Association. DOI: 10.1037/11771-006

Crocker, J., Canevello, A., Breines, J. G., & Flynn, H. (2010). Interpersonal goals and change in anxiety and dysphoria: Effects of compassionate and self-image goals. Journal of Personality and Social Psychology, 98, 1009-1024.

Crumbaugh, J. C., & Maholick, L. T. (1964). An experimental study in existentialism: The psychometric approach to Frankl's concept of noogenic neurosis. Journal of Clinical Psychology, 20, 200-207.

DeYoung, C. G., Peterson, J. B., & Higgins, D. M. (2002). Higher-order factors of the Big Five predict conformity: Are there neuroses of health? Personality and Individual Differences, 33, 533-552.

Dickinson, K. A., & Pincus, A. L. (2003). Interpersonal analysis of grandiose and vulnerable narcissism. Journal of Personality Disorders, 17, 188-207.

Edwards, M. C. (2013). Purple unicorns, true models, and other things I've never seen. Measurement: Interdisciplinary Research and Perspectives, 11(3), 107-111.

Efron, B. (1979). Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 1-26.

Efron, B. (1987). Better bootstrap confidence intervals. Journal of the American Statistical Association, 82(397), 171-185.

Efron, B. (2003). Second thoughts on the bootstrap. Statistical Science, 18(2), 135-140.


Fiedler, K., Schott, M., & Meiser, T. (2011). What mediation analysis can (not) do. Journal of Experimental Social Psychology, 47, 1231-1236.

Fritz, M. S., Taylor, A. B., & MacKinnon, D. P. (2012). Explanation of two anomalous results in statistical mediation analysis. Multivariate Behavioral Research, 47(1), 61-87.

Garcia, J. A., & Crocker, J. (2008). Reasons for disclosing depression matter: The consequences of having egosystem and ecosystem goals. Social Science and Medicine, 67(3), 453-462.

Gurtman, M. B. (2009). Exploring personality with the interpersonal circumplex. Social and Personality Psychology Compass, 3, 601-619.

Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1), 29-36.

Hayes, A. F. (2009). Beyond Baron and Kenny: Statistical mediation analysis in the new millennium. Communication Monographs, 76, 408-420.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press.

Hayes, A. F., & Scharkow, M. (2013). The relative trustworthiness of inferential tests of the indirect effect in statistical mediation analysis: Does method really matter? Psychological Science, 24, 1918-1927.

Herman, C. P., & Mack, D. (1975). Restrained and unrestrained eating. Journal of Personality, 43(4), 657-660.

Iacobucci, D., Saldanha, N., & Deng, X. (2007). A meditation on mediation: Evidence that structural equations models perform better than regressions. Journal of Consumer Psychology, 17(2), 139-153.

Imai, K., Keele, L., & Yamamoto, T. (2010). Identification, inference and sensitivity analysis for causal mediation effects. Statistical Science, 51-71.

Jaccard, J., & Wan, C. K. (1995). Measurement error in the analysis of interaction effects between continuous predictors using multiple regression: Multiple indicator and structural equation approaches. Psychological Bulletin, 117, 348-357.

Jo, B. (2008). Causal inference in randomized experiments with mediational processes. Psychological Methods, 13(4), 314.


Jones, J. A. (2013). Fungible weights in logistic regression (Unpublished doctoral dissertation). University of Minnesota, Minneapolis and St. Paul, MN.

Kelley, K., & Lai, K. (2011). MBESS. R package version, 3(1).

Kenny, D. A., & Judd, C. M. (2013). Power anomalies in testing mediation. Psychological Science, 0956797613502676.

Kisbu-Sakarya, Y., MacKinnon, D. P., & Miočević, M. (2014). The distribution of the product explains normal theory mediation confidence interval estimation. Multivariate Behavioral Research, 49(3), 261-268.

Koopman, J., Howe, M., Hollenbeck, J. R., & Sin, H. P. (2015). Small sample mediation testing: Misplaced confidence in bootstrapped confidence intervals. Journal of Applied Psychology, 100(1), 194.

Koopman, R. F. (1988). On the sensitivity of a composite to its weights. Psychometrika, 53, 547-552.

Kraemer, H. C., Kiernan, M., Essex, M., & Kupfer, D. J. (2008). How and why criteria defining moderators and mediators differ between the Baron & Kenny and MacArthur approaches. Health Psychology, 27(2S), S101.

Lennox, R. D., & Wolfe, R. N. (1984). Revision of the Self-Monitoring Scale. Journal of Personality and Social Psychology, 46, 1349-1364.

Lockwood, C. M., & MacKinnon, D. P. (n.d.). Bootstrapping the standard error of the mediated effect. Retrieved from http://www2.sas.com/proceedings/sugi23/Posters/p180.pdf

MacCallum, R. C., Lee, T., & Browne, M. W. (2012). Fungible parameter estimates in latent curve models. Current Issues in the Theory and Application of Latent Variable Models, 183.

MacKinnon, D. P., Fairchild, A. J., & Fritz, M. S. (2007). Mediation analysis. Annual Review of Psychology, 58, 593-614.

MacKinnon, D. P., Fritz, M. S., Williams, J., & Lockwood, C. M. (2007). Distribution of the product confidence limits for the indirect effect: Program PRODCLIN. Behavior Research Methods, 39(3), 384-389.

MacKinnon, D. P., Krull, J. L., & Lockwood, C. M. (2000). Equivalence of the mediation, confounding and suppression effect. Prevention Science, 1, 173-181.


MacKinnon, D. P., Lockwood, C. M., Hoffman, J. M., West, S. G., & Sheets, V. (2002). A comparison of methods to test mediation and other intervening variable effects. Psychological Methods, 7(1), 83.

MacKinnon, D. P., Lockwood, C. M., & Williams, J. (2004). Confidence limits for the indirect effect: Distribution of the product and resampling methods. Multivariate Behavioral Research, 39(1), 99-128.

Maxwell, S. E., & Cole, D. A. (2007). Bias in cross-sectional analyses of longitudinal mediation. Psychological Methods, 12, 23-44.

Maxwell, S. E., Cole, D. A., & Mitchell, M. A. (2011). Bias in cross-sectional analyses of longitudinal mediation: Partial and complete mediation under an autoregressive model. Multivariate Behavioral Research, 46(5), 816-841.

Miller, S. (2013). The shame experience. London, UK: Routledge.

Mulaik, S. A. (2010). Foundations of factor analysis (2nd ed.). New York, NY: Chapman & Hall.

Neff, K. D. (2003). The development and validation of a scale to measure self-compassion. Self and Identity, 2, 223-250.

Petty, R. E., Briñol, P., & Tormala, Z. L. (2002). Thought confidence as a determinant of persuasion: The self-validation hypothesis. Journal of Personality and Social Psychology, 82(5), 722.

Podsakoff, P. M., MacKenzie, S. B., Lee, J. Y., & Podsakoff, N. P. (2003). Common method biases in behavioral research: A critical review of the literature and recommended remedies. Journal of Applied Psychology, 88, 879.

Preacher, K. J., & Hayes, A. F. (2008). Asymptotic and resampling strategies for assessing and comparing indirect effects in multiple mediator models. Behavior Research Methods, 40(3), 879-891.

R Core Team (2012). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/

Radloff, L. S. (1977). The CES-D scale: A self-report depression scale for research in the general population. Applied Psychological Measurement, 1(3), 385-401.


Raskin, R., & Terry, H. (1988). A principal-components analysis of the Narcissistic Personality Inventory and further evidence of its construct validity. Journal of Personality and Social Psychology, 54, 890.

Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41-55.

Rosenberg, M. (1965). Society and the adolescent self-image. Princeton, NJ: Princeton University Press.

Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1-36.

Rucker, D. D., Preacher, K. J., Tormala, Z. L., & Petty, R. E. (2011). Mediation analysis in social psychology: Current practices and new recommendations. Social and Personality Psychology Compass, 5, 359-371.

Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39, 1161-1178.

Russell, J. A. (2003). Core affect and the psychological construction of emotion. Psychological Review, 110, 145-172.

Savalei, V. (2014). Understanding robust corrections in structural equation modeling. Structural Equation Modeling: A Multidisciplinary Journal, 21(1), 149-160.

Schmader, T., & Johns, M. (2003). Converging evidence that stereotype threat reduces working memory capacity. Journal of Personality and Social Psychology, 85(3), 440.

Schumacker, R. E., & Lomax, R. G. (2004). A beginner's guide to structural equation modeling. Psychology Press.

Selig, J. P., & Preacher, K. J. (2009). Mediation models for longitudinal data in developmental research. Research in Human Development, 6(2-3), 144-164.

Shepard, D. S., & Rabinowitz, F. E. (2013). The power of shame in men who are depressed: Implications for counselors. Journal of Counseling and Development, 91, 451-457.

Shrout, P. E., & Bolger, N. (2002). Mediation in experimental and nonexperimental studies: New procedures and recommendations. Psychological Methods, 7(4), 422.


Snyder, M. (1974). Self-monitoring of expressive behavior. Journal of Personality and Social Psychology, 30, 526-537.

Sobel, M. E. (1982). Asymptotic confidence intervals for indirect effects in structural equation models. Sociological Methodology, 13, 290-312.

Sobel, M. E. (2008). Identification of causal parameters in randomized studies with mediating variables. Journal of Educational and Behavioral Statistics, 33(2), 230-251.

Spencer, S. J., Zanna, M. P., & Fong, G. T. (2005). Establishing a causal chain: Why experiments are often more effective than mediational analyses in examining psychological processes. Journal of Personality and Social Psychology, 89, 845-851.

Spielberger, C. D., Vagg, P. R., Barker, L. R., Donham, G. W., & Westberry, L. G. (1980). The factor structure of the State-Trait Anxiety Inventory. Stress and Anxiety, 7, 95-109.

Steiger, J. H. (2007). Understanding the limitations of global fit assessment in structural equation modeling. Personality and Individual Differences, 42(5), 893-898.

Stone-Romero, E. F., & Rosopa, P. J. (2008). The relative validity of inferences about mediation as a function of research design characteristics. Organizational Research Methods, 11(2), 326-352.

Tafarodi, R. W., & Swann, W. B. (2001). Two-dimensional self-esteem: Theory and measurement. Personality and Individual Differences, 31, 653-673.

Tal-Or, N., Cohen, J., Tsfati, Y., & Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6), 801-824.

Tingley, D., Yamamoto, T., Hirose, K., Keele, L., & Imai, K. (2014). mediation: R package for causal mediation analysis.

VanderWeele, T. J. (2010). Bias formulas for sensitivity analysis for direct and indirect effects. Epidemiology, 21(4), 540.

VanderWeele, T. J., & Arah, O. A. (2011). Unmeasured confounding for general outcomes, treatments, and confounders: Bias formulas for sensitivity analysis. Epidemiology, 22(1), 42.


VanderWeele, T., & Vansteelandt, S. (2009). Conceptual issues concerning mediation, interventions and composition. Statistics and Its Interface, 2, 457-468.

Waller, N. G. (2008). Fungible weights in multiple regression. Psychometrika, 73(4), 691-703.

Waller, N. G., & Jones, J. A. (2009). Locating the extrema of fungible regression weights. Psychometrika, 74(4), 589-602.

Wojciszke, B. (1994). Multiple meanings of behavior: Construing actions in terms of competence or morality. Journal of Personality and Social Psychology, 67, 222-232.

Woody, E. (2011). An SEM perspective on evaluating mediation: What every clinical researcher needs to know. Journal of Experimental Psychopathology, 2, 210-251.

Yik, M., Russell, J. A., & Steiger, J. H. (2011). A 12-point circumplex structure of core affect. Emotion, 11, 705-731.


Appendix A: Full Results for Chapter 3

c       a       b       ab      c'      rXY     rXM     rMY     ∠XY  ∠XM  ∠MY
0.640   0.640   0.390   0.250   0.390   0.640   0.640   0.640   0    0    0
0.554   0.640   0.338   0.216   0.338   0.554   0.640   0.554   30   0    30
0.554   0.630   0.418   0.264   0.291   0.554   0.630   0.601   30   10   20
0.554   0.618   0.446   0.276   0.279   0.554   0.618   0.618   30   15   15
0.554   0.601   0.465   0.280   0.274   0.554   0.601   0.630   30   20   10
0.554   0.554   0.480   0.266   0.288   0.554   0.554   0.640   30   30   0
0.320   0.640   0.195   0.125   0.195   0.320   0.640   0.320   60   0    60
0.320   0.601   0.467   0.281   0.039   0.320   0.601   0.490   60   20   40
0.320   0.554   0.544   0.302   0.018   0.320   0.554   0.554   60   30   30
0.320   0.490   0.585   0.287   0.033   0.320   0.490   0.601   60   40   20
0.320   0.320   0.599   0.192   0.128   0.320   0.320   0.640   60   60   0
0.000   0.640   0.000   0.000   0.000   0.000   0.640   0.000   90   0    90
0.000   0.554   0.462   0.256   -0.256  0.000   0.554   0.320   90   30   60
0.000   0.453   0.569   0.258   -0.258  0.000   0.453   0.453   90   45   45
0.000   0.320   0.617   0.198   -0.198  0.000   0.320   0.554   90   60   30
0.000   0.000   0.640   0.000   0.000   0.000   0.000   0.640   90   90   0
-0.320  0.640   -0.195  -0.125  -0.195  -0.320  0.640   -0.320  120  0    120
-0.320  0.490   0.353   0.173   -0.493  -0.320  0.490   0.111   120  40   80
-0.320  0.320   0.471   0.151   -0.471  -0.320  0.320   0.320   120  60   60
-0.320  0.111   0.532   0.059   -0.379  -0.320  0.111   0.490   120  80   40
-0.320  -0.320  0.599   -0.192  -0.128  -0.320  -0.320  0.640   120  120  0
-0.554  0.640   -0.338  -0.216  -0.338  -0.554  0.640   -0.554  150  0    150
-0.554  0.411   0.141   0.058   -0.612  -0.554  0.411   -0.111  150  50   100
-0.554  0.166   0.265   0.044   -0.598  -0.554  0.166   0.166   150  75   75
-0.554  -0.111  0.354   -0.039  -0.515  -0.554  -0.111  0.411   150  100  50
-0.554  -0.554  0.480   -0.266  -0.288  -0.554  -0.554  0.640   150  150  0
-0.640  0.640   -0.390  -0.250  -0.390  -0.640  0.640   -0.640  180  0    180
-0.640  0.320   -0.128  -0.041  -0.599  -0.640  0.320   -0.320  180  60   120
-0.640  0.000   0.000   0.000   -0.640  -0.640  0.000   0.000   180  90   90
-0.640  -0.320  0.128   -0.041  -0.599  -0.640  -0.320  0.320   180  120  60
-0.640  -0.640  0.390   -0.250  -0.390  -0.640  -0.640  0.640   180  180  0

Table 9. Results presented in Chapter 3, based on possible factor space angles and vector lengths of .8 for a given combination of mediation results.


c       a       b       ab      c'      rXY     rXM     rMY     ∠XY  ∠XM  ∠MY
0.250   0.250   0.200   0.050   0.200   0.250   0.250   0.250   0    0    0
0.217   0.250   0.173   0.043   0.173   0.217   0.250   0.217   30   0    30
0.217   0.246   0.193   0.048   0.169   0.217   0.246   0.235   30   10   20
0.217   0.241   0.201   0.049   0.168   0.217   0.241   0.241   30   15   15
0.217   0.235   0.207   0.049   0.168   0.217   0.235   0.246   30   20   10
0.217   0.217   0.213   0.046   0.170   0.217   0.217   0.250   30   30   0
0.125   0.250   0.100   0.025   0.100   0.125   0.250   0.125   60   0    60
0.125   0.235   0.172   0.040   0.085   0.125   0.235   0.192   60   20   40
0.125   0.217   0.199   0.043   0.082   0.125   0.217   0.217   60   30   30
0.125   0.192   0.219   0.042   0.083   0.125   0.192   0.235   60   40   20
0.125   0.125   0.238   0.030   0.095   0.125   0.125   0.250   60   60   0
0.000   0.250   0.000   0.000   0.000   0.000   0.250   0.000   90   0    90
0.000   0.217   0.131   0.028   -0.028  0.000   0.217   0.125   90   30   60
0.000   0.177   0.182   0.032   -0.032  0.000   0.177   0.177   90   45   45
0.000   0.125   0.220   0.027   -0.027  0.000   0.125   0.217   90   60   30
0.000   0.000   0.250   0.000   0.000   0.000   0.000   0.250   90   90   0
-0.125  0.250   -0.100  -0.025  -0.100  -0.125  0.250   -0.125  120  0    120
-0.125  0.192   0.070   0.013   -0.138  -0.125  0.192   0.043   120  40   80
-0.125  0.125   0.143   0.018   -0.143  -0.125  0.125   0.125   120  60   60
-0.125  0.043   0.197   0.009   -0.134  -0.125  0.043   0.192   120  80   40
-0.125  -0.125  0.238   -0.030  -0.095  -0.125  -0.125  0.250   120  120  0
-0.217  0.250   -0.173  -0.043  -0.173  -0.217  0.250   -0.217  150  0    150
-0.217  0.161   -0.009  -0.001  -0.215  -0.217  0.161   -0.043  150  50   100
-0.217  0.065   0.079   0.005   -0.222  -0.217  0.065   0.065   150  75   75
-0.217  -0.043  0.152   -0.007  -0.210  -0.217  -0.043  0.161   150  100  50
-0.217  -0.217  0.213   -0.046  -0.170  -0.217  -0.217  0.250   150  150  0
-0.250  0.250   -0.200  -0.050  -0.200  -0.250  0.250   -0.250  180  0    180
-0.250  0.125   -0.095  -0.012  -0.238  -0.250  0.125   -0.125  180  60   120
-0.250  0.000   0.000   0.000   -0.250  -0.250  0.000   0.000   180  90   90
-0.250  -0.125  0.095   -0.012  -0.238  -0.250  -0.125  0.125   180  120  60
-0.250  -0.250  0.200   -0.050  -0.200  -0.250  -0.250  0.250   180  180  0

Table 10. Results presented in Chapter 3, based on possible factor space angles and vector lengths of .5 for a given combination of mediation results.


Appendix B: Formulas for Converting Correlations to Factor Loadings, One Factor

If a one-factor model applies, then it follows that

$$\lambda_X^2 = \frac{r_{XY}\, r_{XM}}{r_{MY}}, \qquad \lambda_M^2 = \frac{r_{XM}\, r_{MY}}{r_{XY}}, \qquad \lambda_Y^2 = \frac{r_{XY}\, r_{MY}}{r_{XM}} \qquad (6)$$

First, since the squared loadings cannot be negative, the product of the three correlations must be positive: either all three correlations are positive, or two are negative and one is positive. This is a necessary condition for a one-factor structure; any other sign pattern requires at least two factors.

Second, since the squared loadings cannot exceed one, the absolute value of each of the three correlations may not be smaller than the product of the other two absolute values. If both conditions are fulfilled, the loadings can be derived from Equation (6), taking the signs of the correlations into account; together, the two conditions are necessary and sufficient for a one-factor structure. A single correlation equal to zero (with the other two nonzero) violates the one-factor structure; two or three zero correlations are trivial cases.
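To make the two conditions concrete, the following minimal R sketch (ours, not part of the original appendix) checks both conditions and recovers the loadings via Equation (6); the convention of fixing $\lambda_X > 0$ is one of the two admissible sign solutions.

## Minimal R sketch of the two conditions and Equation (6).
## Hypothetical helper, not part of the original appendix.
one.factor.loadings <- function(rXM, rMY, rXY){
  ## Condition 1: the product of the three correlations must be positive
  if (rXM * rMY * rXY <= 0)
    stop("sign condition violated: no one-factor structure")
  lX2 <- rXY * rXM / rMY
  lM2 <- rXM * rMY / rXY
  lY2 <- rXY * rMY / rXM
  ## Condition 2: no squared loading may exceed one
  if (any(c(lX2, lM2, lY2) > 1))
    stop("a squared loading exceeds 1: no one-factor structure")
  ## Fix lambda_X > 0; the other signs then follow from the correlations
  c(lambda.X = sqrt(lX2),
    lambda.M = sign(rXM) * sqrt(lM2),
    lambda.Y = sign(rXY) * sqrt(lY2))
}
## Correlations generated by loadings (.8, .7, .6) are recovered exactly:
one.factor.loadings(rXM = .56, rMY = .42, rXY = .48)

Flipping the sign of a single correlation (for example, rXM = -.56) makes the product of the three correlations negative and triggers the sign condition.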


Appendix C: Formulas for Converting Regression Weights to Factor Loadings, One Factor

Assume three variables, X, M, and Y, and a one-factor model, so that there are only three loadings: $\lambda_X$, $\lambda_M$, $\lambda_Y$. We will discuss two cases: (a) full mediation, and thus $c' = 0$, and (b) the general case.

Case I:

If $c' = 0$ (no direct effect), then $r_{XY} = r_{XM}\, r_{MY}$, $r_{XM} = a$, and $r_{MY} = b$.

It then follows that

$$\lambda_X \lambda_Y = \lambda_X \lambda_M \cdot \lambda_M \lambda_Y = \lambda_M^2 \lambda_X \lambda_Y,$$ and thus $\lambda_M = 1$, $\lambda_X = r_{XM}$, and $\lambda_Y = r_{MY}$.

Case II:

In the general case:

$$r_{XM} = a, \qquad r_{XY} = r_{XM}\, b + c', \qquad r_{MY} = r_{XM}\, c' + b.$$

Substituting the one-factor expressions for the correlations gives

$$\lambda_X \lambda_Y = \lambda_X \lambda_M\, b + c', \qquad \lambda_M \lambda_Y = \lambda_X \lambda_M\, c' + b, \qquad (7)$$

and solving these two equations for $b$ and $c'$,

$$b = \frac{\lambda_M \lambda_Y (1 - \lambda_X^2)}{1 - \lambda_X^2 \lambda_M^2}, \qquad c' = \frac{\lambda_X \lambda_Y (1 - \lambda_M^2)}{1 - \lambda_X^2 \lambda_M^2}.$$


It follows from (7) that

$$\frac{ab}{c'} = \frac{\lambda_X \lambda_M \cdot \lambda_M \lambda_Y (1 - \lambda_X^2)}{\lambda_X \lambda_Y (1 - \lambda_M^2)} = \frac{\lambda_M^2 (1 - \lambda_X^2)}{1 - \lambda_M^2} \geq 0, \qquad (8)$$

and thus that the indirect effect and the direct effect have the same sign.

If the indirect and direct effects have the same sign regardless of which of the three variables plays which role in the mediation scheme, then a one-factor structure applies; this means that the second condition from Appendix B is fulfilled. If they do not always have the same sign, this second condition is not fulfilled, and a one-factor solution is not sufficient. Therefore, the equal-sign condition is a necessary and sufficient condition.

The squared factor loadings can be derived from (7) when the correlations are formulated in terms of the standardized mediation model coefficients:

a( ab c ')  2  X ac' b a(') ac b  2  (9) M ab c ' (ac ' b )( ab c ')  2  Y a


Appendix D: Formulas for Converting Correlations to Factor Loadings, Two Factors

Assume three variables: X, M, and Y. The correlations between these three variables can always be explained by two factors, following the general principle that Q variables can be explained by Q − 1 factors. Assuming that one factor is not sufficient to explain the correlations, the three correlations $r_{XM}$, $r_{MY}$, and $r_{XY}$ are not sufficient to estimate the six loadings. This means that, independent of rotational indeterminacy, different factor models can be constructed. Given the six loadings $\lambda_{XF1}, \lambda_{MF1}, \lambda_{YF1}, \lambda_{XF2}, \lambda_{MF2}, \lambda_{YF2}$ of the variables X, M, and Y on the factors F1 and F2, the following is one of many possible orthogonal two-factor structures:

XF1 1 (10)

MF1  r XM (11)

YF1  r XY (12)

159

XF 2  0 (13)

2 MF211 MF (12)

YF2()/r MY  MF 1  YF 1  MF 2 (15) 2 (rMY  r XM r XY ) / 1  r XM

which is the semi-partial correlation rYMX(.) .

The logic behind the above derivation is that X is perfectly captured by the first factor (Equation 10), and this factor explains the correlations of X with M and Y (Equations 11 and 12). The remaining relationship between M and Y is explained by the second factor (Equations 14 and 15). The factor loading matrix, when expressed in terms of standardized regression coefficients of the mediation model, appears as follows:

1    XF12 XF 22 MF12r XM  a  MF 11  r XM   a   2 YF1r XY  c'  ab  YF 2  r Y ( M . X )  b1  a

(16)

Equation 16 shows how the mediation model coefficients correspond to the factor loadings and vice versa. However, again this is only one of the many possible factor matrices for the same set of three correlations when one factor is not sufficient to describe them.
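The following minimal R sketch (ours, added for illustration; the helper name is hypothetical) builds the loading matrix of Equation (16) from the standardized mediation coefficients; the off-diagonal elements of $LL'$ reproduce the three correlations.

## Minimal R sketch of Equation (16): one admissible orthogonal two-factor
## loading matrix implied by standardized mediation coefficients a, b, c'.
## Hypothetical helper, not part of the original appendix.
two.factor.loadings <- function(a, b, cp){
  matrix(c(1,          0,
           a,          sqrt(1 - a^2),
           cp + a * b, b * sqrt(1 - a^2)),
         nrow = 3, byrow = TRUE,
         dimnames = list(c("X", "M", "Y"), c("F1", "F2")))
}
## Off-diagonal elements of L %*% t(L) reproduce rXM, rXY, and rMY
L <- two.factor.loadings(a = .5, b = .4, cp = .2)
L %*% t(L)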


Appendix E: Fungible Mediation Function

#####
## Mediation wrapper for the fungible function from Waller (2008)
## Wrapper written by Robert Agler
##
## One or two parallel mediators are supported in this function.
## More complex models require direct use of Waller's function.
##

## data  = data matrix
## x     = a single string or column index value for x
## y     = a single string or column index value for y
## m     = one or two strings or index values for m(s)
## nsets = number of fungible weight sets to calculate
## rcrit = desired correlation between OLS predictions and fungible predictions
## plot  = boolean; creates 2d plots of fungible weights if TRUE

fung.med <- function(data, x, y, m, nsets = 100, rcrit = .98, plot = TRUE){

  fungible <- function(R.X, rxy, r.yhata.yhatb = rcrit, sets = nsets){

    ## R function: Fungible
    ## Author: Niels Waller
    ## March 11, 2008
    ##
    ## Alterations by Robert Agler:
    ##   removed all text reporting during calculation of weights
    ##   changed code involved in the calculation of d so as to allow just
    ##   two predictors for a single-mediator model
    ## September 19, 2015
    ##


    GenU <- function(mat, u){
      ## Generate U matrix via Gram-Schmidt orthogonalization
      p <- ncol(mat)
      n <- nrow(mat)
      oData <- matrix(0, n, p + 1)
      oData[, 1] <- u
      for(i in 2:(p + 1)){
        oData[, i] <- resid(lm(mat[, (i - 1)] ~ -1 + oData[, 1:(i - 1)]))
      }
      U <- oData[, 2:(p + 1)]
      ## (n - 1) added here so as to allow just 2 predictors
      d <- diag(1/sqrt(diag(crossprod(U))), (n - 1))
      U <- U %*% d
      U
    } # end GenU

    NX <- ncol(R.X)
    a.matrix <- k.matrix <- matrix(0, sets, NX)

    b <- crossprod(solve(R.X), rxy)  # OLS weights

    r <- as.numeric(r.yhata.yhatb)
    VLV <- eigen(R.X)
    V <- VLV$vectors
    L <- diag(VLV$values)

    Linv.sqrt <- solve(sqrt(L))
    u.star <- t(V) %*% b
    u.circle <- sqrt(L) %*% u.star
    u <- u.circle / as.numeric(sqrt(t(u.circle) %*% u.circle))
    r.y.yhatb <- sqrt(t(b) %*% R.X %*% b)
    mat <- matrix(rnorm(NX * (NX - 1)), NX, NX - 1)
    U <- GenU(mat, u)
    for(i in 1:sets){
      z <- rnorm(NX - 1)
      z <- z / as.numeric(sqrt(t(z) %*% z))

      k <- r * u + U %*% z * sqrt(1 - r^2)
      k.star <- Linv.sqrt %*% k
      a <- V %*% k.star
      # scale a to minimize SSE_a
      s <- (t(rxy) %*% a) / (t(a) %*% R.X %*% a)
      a <- as.numeric(s) * a

      a.matrix[i, ] <- a
      k.matrix[i, ] <- k
    }

    colnames(a.matrix) <- paste("a", 1:NX, sep = "")

    # Compute expected moments
    G <- V %*% Linv.sqrt %*% U
    esq <- (1 - r^2)
    # Expected a
    mn.a <- r^2 * b
    Ezsq <- 1/(NX - 1)
    # Expected covariance matrix
    cov.a <- as.numeric(r^2 * r.y.yhatb^2) * Ezsq * esq * G %*% t(G)

    Dmat <- diag(1/sqrt(diag(cov.a)))
    cor.a <- Dmat %*% cov.a %*% Dmat

    list(a = a.matrix, k = k.matrix, b = b, u = u,
         r.yhata.yhatb = r.yhata.yhatb, r.y.yhatb = r.y.yhatb,
         cov.a = cov.a, cor.a = cor.a)
  } # end fungible

  data <- as.matrix(na.omit(data[, c(x, m, y)]))
  cov.mat <- cov(data, use = "complete.obs")

  # a path(s): regress each mediator on x
  est.a <- function(i, m){
    xt <- na.omit(cbind(1, data[, x]))
    a.weights <- round(solve(t(xt) %*% xt) %*% t(xt) %*% data[, m[i]],
                       digits = 5)
    return(a.weights[2])
  }
  n.pred <- length(x) + length(m)
  ols.a <- unlist(lapply(1:length(m), m = m, FUN = est.a))

  # b path(s) and c': regress y on x and the mediator(s)
  ols.bc <- summary(lm(data[, y] ~ data[, x] + data[, m]))
  ols.bc <- ols.bc$coefficients[-1, 1]

  ols.ab <- ols.a * ols.bc[2:length(ols.bc)]  # indirect effect(s)

  fung <- fungible(R.X = cov.mat[1:(1 + length(m)), 1:(1 + length(m))],
                   rxy = cov.mat[1:(1 + length(m)), nrow(cov.mat)])
  fung.short <- fung$cor.a

  # fungible weight sets containing the largest and smallest c' and b values
  c.max <- fung$a[which.max(fung$a[, 1]), ]
  b1.max <- fung$a[which.max(fung$a[, 2]), ]

  c.min <- fung$a[which.min(fung$a[, 1]), ]
  b1.min <- fung$a[which.min(fung$a[, 2]), ]

  if (ncol(fung$a) == 2){  # single-mediator model
    ols.out <- matrix(c(ols.a, ols.bc[2], ols.bc[1], ols.ab), nrow = 1,
                      dimnames = list(c("Weights"), c("a", "b", "c'", "ab")))

    c.max.out <- matrix(c(c.max[1], c.max[2], (ols.a * c.max[2])), nrow = 1,
                        dimnames = list(c("Weights"), c("c'", "b1", "ab1")))
    b1.max.out <- matrix(c(b1.max[1], b1.max[2], (ols.a * b1.max[2])), nrow = 1,
                         dimnames = list(c("Weights"), c("c'", "b1", "ab1")))

    c.min.out <- matrix(c(c.min[1], c.min[2], (ols.a * c.min[2])), nrow = 1,
                        dimnames = list(c("Weights"), c("c'", "b1", "ab1")))
    b1.min.out <- matrix(c(b1.min[1], b1.min[2], (ols.a * b1.min[2])), nrow = 1,
                         dimnames = list(c("Weights"), c("c'", "b1", "ab1")))

    fung.out <- list(ols.out, c.max.out, b1.max.out, c.min.out, b1.min.out)
    names(fung.out) <- c("OLS.estimates", "Max.c", "Max.b1", "Min.c", "Min.b1")
  }

  if (ncol(fung$a) == 3){  # two-mediator model
    ols.out <- matrix(c(ols.a, ols.bc[2], ols.bc[3], ols.bc[1]), nrow = 1,
                      dimnames = list(c("Weights"),
                                      c("a1", "a2", "b1", "b2", "c'")))
    b2.max <- fung$a[which.max(fung$a[, 3]), ]
    b2.min <- fung$a[which.min(fung$a[, 3]), ]

    c.max.out <- matrix(c(c.max[1], c.max[2], c.max[3],
                          (ols.a[1] * c.max[2]), (ols.a[2] * c.max[3])),
                        nrow = 1,
                        dimnames = list(c("Weights"),
                                        c("c'", "b1", "b2", "ab1", "ab2")))
    b1.max.out <- matrix(c(b1.max[1], b1.max[2], b1.max[3],
                           (ols.a[1] * b1.max[2]), (ols.a[2] * b1.max[3])),
                         nrow = 1,
                         dimnames = list(c("Weights"),
                                         c("c'", "b1", "b2", "ab1", "ab2")))
    b2.max.out <- matrix(c(b2.max[1], b2.max[2], b2.max[3],
                           (ols.a[1] * b2.max[2]), (ols.a[2] * b2.max[3])),
                         nrow = 1,
                         dimnames = list(c("Weights"),
                                         c("c'", "b1", "b2", "ab1", "ab2")))

    c.min.out <- matrix(c(c.min[1], c.min[2], c.min[3],
                          (ols.a[1] * c.min[2]), (ols.a[2] * c.min[3])),
                        nrow = 1,
                        dimnames = list(c("Weights"),
                                        c("c'", "b1", "b2", "ab1", "ab2")))
    b1.min.out <- matrix(c(b1.min[1], b1.min[2], b1.min[3],
                           (ols.a[1] * b1.min[2]), (ols.a[2] * b1.min[3])),
                         nrow = 1,
                         dimnames = list(c("Weights"),
                                         c("c'", "b1", "b2", "ab1", "ab2")))
    b2.min.out <- matrix(c(b2.min[1], b2.min[2], b2.min[3],
                           (ols.a[1] * b2.min[2]), (ols.a[2] * b2.min[3])),
                         nrow = 1,
                         dimnames = list(c("Weights"),
                                         c("c'", "b1", "b2", "ab1", "ab2")))

    fung.out <- list(ols.out, c.max.out, b1.max.out, b2.max.out,
                     c.min.out, b1.min.out, b2.min.out)
    names(fung.out) <- c("OLS.estimates", "Max.c", "Max.b1", "Max.b2",
                         "Min.c", "Min.b1", "Min.b2")
  }

  if (plot){
    # fungible weights plotted as points, OLS estimates marked with pch = 2
    if (ncol(fung$a) == 2){
      plot(fung$a[, 1], fung$a[, 2], xlab = "c'", ylab = "b")
      points(ols.bc[1], ols.bc[2], pch = 2)

      plot(fung$a[, 1], ols.a * fung$a[, 2], xlab = "c'", ylab = "ab")
      points(ols.bc[1], ols.a[1] * ols.bc[2], pch = 2)
    }
    if (ncol(fung$a) == 3){
      # ols.bc is ordered (c', b1, b2), matching the columns of fung$a
      plot(fung$a[, 1], fung$a[, 2], xlab = "c'", ylab = "b1")
      points(ols.bc[1], ols.bc[2], pch = 2)
      plot(fung$a[, 1], fung$a[, 3], xlab = "c'", ylab = "b2")
      points(ols.bc[1], ols.bc[3], pch = 2)
      plot(fung$a[, 2], fung$a[, 3], xlab = "b1", ylab = "b2")
      points(ols.bc[2], ols.bc[3], pch = 2)

      # indirect effects: scale the fungible b weights by the OLS a paths
      plot(fung$a[, 1], ols.a[1] * fung$a[, 2], xlab = "c'", ylab = "ab1")
      points(ols.bc[1], ols.a[1] * ols.bc[2], pch = 2)
      plot(fung$a[, 1], ols.a[2] * fung$a[, 3], xlab = "c'", ylab = "ab2")
      points(ols.bc[1], ols.a[2] * ols.bc[3], pch = 2)
      plot(ols.a[1] * fung$a[, 2], ols.a[2] * fung$a[, 3],
           xlab = "ab1", ylab = "ab2")
      points(ols.a[1] * ols.bc[2], ols.a[2] * ols.bc[3], pch = 2)
    }
  }

  return(fung.out)
} # end fung.med


Appendix F: Fungible Mediation Example

Single mediator example:

fung.med(data = pmi, x = "cond", m = "pmi", y = "reaction")

$`OLS estimates`

              a         b       c'        ab
Weights 0.36066 0.4316278 0.164068 0.1556709

$`Max c'`
               c'        b1       ab1
Weights 0.3342014 0.3827577 0.1380454

$`Max b1`
                c'       b1       ab1
Weights -0.0190596 0.446313 0.1609672

$`Min c'`
                c'       b1       ab1
Weights -0.0190596 0.446313 0.1609672

$`Min b1`
               c'        b1       ab1
Weights 0.3342014 0.3827577 0.1380454


Two mediator example:

fung.med(data = roommates, x = "PIL", y = "RSE", m = c("CESD", "ANX"))

$`OLS estimates`
              a1        a2         b1         b2        c'
Weights -0.43089 -0.41142 -0.3032736 -0.3080596 0.1375329

$`Max c'`
               c'         b1         b2      ab1       ab2
Weights 0.2703593 -0.2518693 -0.2433101 0.108528 0.1001026

$`Max b1`
               c'          b1         b2        ab1       ab2
Weights 0.1557986 -0.09740261 -0.4746921 0.04196981 0.1952978

$`Max b2`
               c'         b1         b2       ab1        ab2
Weights 0.1509325 -0.4701773 -0.1036889 0.2025947 0.04265967

$`Min c'`
                  c'         b1         b2       ab1       ab2
Weights -0.006425723 -0.3491311 -0.3300249 0.1504371 0.1357789

$`Min b1`
                c'         b1         b2      ab1        ab2
Weights 0.08524573 -0.4855254 -0.1320327 0.209208 0.05432089

$`Min b2`
               c'         b1         b2        ab1       ab2
Weights 0.1021804 -0.1191812 -0.4885414 0.05135398 0.2009957
