<<

Statistical Science 2010, Vol. 25, No. 1, 51–71 DOI: 10.1214/10-STS321 © Institute of Mathematical Statistics, 2010 Identification, Inference and Sensitivity Analysis for Causal Mediation Effects Kosuke Imai, Luke Keele and Teppei Yamamoto

Abstract. Causal mediation analysis is routinely conducted by applied re- searchers in a variety of disciplines. The goal of such an analysis is to inves- tigate alternative causal mechanisms by examining the roles of intermediate variables that lie in the causal paths between the treatment and outcome vari- ables. In this paper we first prove that under a particular version of sequen- tial ignorability assumption, the average causal mediation effect (ACME) is nonparametrically identified. We compare our identification assumption with those proposed in the literature. Some practical implications of our identifi- cation result are also discussed. In particular, the popular estimator based on the linear structural equation model (LSEM) can be interpreted as an ACME estimator once additional parametric assumptions are made. We show that these assumptions can easily be relaxed within and outside of the LSEM framework and propose simple nonparametric estimation strategies. Second, and perhaps most importantly, we propose a new sensitivity analysis that can be easily implemented by applied researchers within the LSEM framework. Like the existing identifying assumptions, the proposed sequential ignorabil- ity assumption may be too strong in many applied settings. Thus, sensitivity analysis is essential in order to examine the robustness of empirical findings to the possible existence of an unmeasured confounder. Finally, we apply the proposed methods to a randomized experiment from political psychol- ogy. We also make easy-to-use software available to implement the proposed methods. Key words and phrases: Causal inference, causal mediation analysis, direct and indirect effects, linear structural equation models, sequential ignorability, unmeasured confounders.

1. INTRODUCTION such an analysis is to investigate causal mechanisms by examining the role of intermediate variables thought to Causal mediation analysis is routinely conducted by lie in the causal path between the treatment and out- applied researchers in a variety of scientific disciplines come variables. Over fifty years ago, Cochran (1957) including epidemiology, political science, psychology pointed to both the possibility and difficulty of using and sociology (see MacKinnon, 2008). The goal of covariance analysis to explore causal mechanisms by stating: “Sometimes these averages have no physical Kosuke Imai is Assistant Professor, Department of Politics, or biological meaning of interest to the investigator, Princeton University, Princeton, New Jersey 08544, USA and sometimes they do not have the meaning that is (e-mail: [email protected];URL: http://imai.princeton.edu). Luke Keele is Assistant ascribed to them at first glance” (page 267). Recently, Professor, Department Political Science, Ohio State a number of statisticians have taken up Cochran’s chal- University, 2140 Derby Hall, Columbus, Ohio 43210, USA lenge. Robins and Greenland (1992) initiated a formal (e-mail: [email protected]). Teppei Yamamoto is study of causal mediation analysis, and a number of ar- Ph.D. Student, Department of Politics, Princeton ticles have appeared in more recent years (e.g., Pearl, University, 031 Corwin Hall, Princeton, New Jersey 08544, 2001; Robins, 2003; Rubin, 2004; Petersen, Sinisi and USA (e-mail: [email protected]). van der Laan, 2006; Geneletti, 2007;Joffe,Smalland

51 52 K. IMAI, L. KEELE AND T. YAMAMOTO

Hsu, 2007; Ten Have et al., 2007; Albert, 2008;Jo, that the sequential ignorability assumption cannot be 2008; Joffe et al., 2008; Sobel, 2008; VanderWeele, directly tested even in randomized experiments, sen- 2008, 2009; Glynn, 2010). sitivity analysis must play an essential role in causal What do we mean by a causal mechanism? The mediation analysis. Finally, in Section 6 we apply the aforementioned paper by Cochran gives the follow- proposed methods to the empirical example, to which ing example. In a randomized experiment, researchers we now turn. study the causal effects of various soil fumigants on eelworms that attack farm crops. They observe that 2. AN EXAMPLE FROM THE SOCIAL SCIENCES these soil fumigants increase oats yields but wish to know whether the reduction of eelworms represents Since the influential article by Baron and Kenny an intermediate phenomenon that mediates this effect. (1986), mediation analysis has been frequently used In fact, many scientists across various disciplines are in the social sciences and psychology in particular. not only interested in causal effects but also in causal A central goal of these disciplines is to identify causal mechanisms because competing scientific theories of- mechanisms underlying human behavior and opinion ten imply that different causal paths underlie the same formation. In a typical psychological experiment, re- cause-effect relationship. searchers randomly administer certain stimuli to sub- In this paper we contribute to this fast-growing lit- jects and compare treatment group behavior or opin- erature in several ways. After briefly describing our ions with those in the control group. However, to di- motivating example in the next section, we prove in rectly test psychological theories, estimating the causal Section 3 that under a particular version of the sequen- effects of the stimuli is typically not sufficient. Instead, tial ignorability assumption, the average causal media- researchers choose to investigate psychological factors tion effect (ACME) is nonparametrically identified. We such as cognition and emotion that mediate causal ef- compare our identifying assumption with those pro- fects in order to explain why individuals respond to a posed in the literature, and discuss practical implica- certain stimulus in a particular way. Another difficulty tions of our identification result. In particular, Baron faced by many researchers is their inability to directly and Kenny’s (1986) popular estimator (Google Scholar manipulate psychological constructs. It is in this con- records over 17 thousand citations for this paper), text that causal mediation analysis plays an essential which is based on a linear structural equation model role in social science research. (LSEM), can be interpreted as an ACME estimator un- In Section 6 we apply our methods to an influen- der the proposed assumption if additional parametric tial randomized experiment from political psychology. assumptions are satisfied. We show that these addi- Nelson, Clawson and Oxley (1997) examine how the tional assumptions can be easily relaxed within and framing of political issues by the news media affects outside of the LSEM framework. In particular, we pro- citizens’ political opinions. While the authors are not pose a simple nonparametric estimation strategy in the first to use causal mediation analysis in political Section 4. We conduct a Monte Carlo experiment to science, their study is one of the most well-known ex- investigate the finite-sample performance of the pro- amples in political psychology and also represents a posed nonparametric estimator and its asymptotic con- typical application of causal mediation analyses in the fidence interval. social sciences. Media framing is the process by which Like many identification assumptions, the proposed news organizations define a political issue or empha- assumption may be too strong for the typical situations size particular aspects of that issue. The authors hy- in which causal mediation analysis is employed. For pothesize that differing frames for the same news story example, in experiments where the treatment is ran- alter citizens’ political tolerance by affecting more gen- domized but the mediator is not, the ignorability of the eral political attitudes. They conducted a randomized treatment assignment holds but the ignorability of the experiment to test this mediation hypothesis. mediator may not. In Section 5 we propose a new sen- Specifically, Nelson, Clawson and Oxley (1997) sitivity analysis that can be implemented by applied re- used two different local newscasts about a Ku Klux searchers within the standard LSEM framework. This Klan rally held in central Ohio. In the experiment, method directly evaluates the robustness of empirical student subjects were randomly assigned to watch the findings to the possible existence of unmeasured pre- nightly news from two different local news channels. treatment variables that confound the relationship be- The two news clips were identical except for the final tween the mediator and the outcome. Given the fact story on the Klan rally. In one newscast, the Klan rally CAUSAL MEDIATION ANALYSIS 53 was presented as a free speech issue. In the second than various causal effects given in the last column of newscast, the journalists presented the Klan rally as a Table 1. Indeed, in many social science experiments, disruption of public order that threatened to turn vio- researchers’ interest lies in the identification of causal lent. The outcome was measured using two different mediation effects rather than the total causal effect or scales of political tolerance. Immediately after viewing controlled direct effects (these terms are formally de- the news broadcast, subjects were asked two seven- fined in the next section). Causal mediation analysis is point scale questions measuring their tolerance for the particularly appealing in such situations. Klan speeches and rallies. The hypothesis was that the One crucial limitation of this study, however, is that causal effects of the media frame on tolerance are me- like many other psychological experiments the origi- diated by subjects’ attitudes about the importance of nal researchers were only able to randomize news sto- free speech and the maintenance of public order. In ries but not subjects’ attitudes. This implies that there other words, the media frame influences subjects’ atti- is likely to be unobserved covariates that confound the tudes toward the Ku Klux Klan by encouraging them relationship between the mediator and the outcome. As to consider the Klan rally as an event relevant for the we formally show in the next section, the existence of general issue of free speech or public order. The re- such confounders represents a violation of a key as- searchers used additional survey questions and a scal- sumption for identifying the causal mechanism. For ing method to measure these hypothesized mediating example, it is possible that subjects’ underlying politi- factors after the experiment was conducted. cal ideology affects both their public order attitude and Table 1 reports descriptive statistics for these media- their tolerance for the Klan rally within each treatment tor variables as well as the treatment and outcome vari- condition. This scenario is of particular concern since ables. The sample size is 136, with 67 subjects exposed it is well established that politically conservative citi- to the free speech frame and 69 subjects assigned to zens tend to be more concerned about public order is- the public order frame. As is clear from the last col- sues and also, in some instances, be more sympathetic umn, the media frame treatment appears to influence to groups like the Klan. In Section 5 we propose a new both types of response variables in the expected direc- sensitivity analysis that partially addresses such con- tions. For example, being exposed to the public order cerns. frame as opposed to the free speech frame significantly increased the subjects’ perceived importance of public 3. IDENTIFICATION order, while decreasing the importance of free speech In this section we propose a new nonparametric iden- (although the latter effect is not statistically signifi- tification assumption for the ACME and discuss its cant). Moreover, the public order treatment decreased practical implications. We also compare the proposed the subjects’ tolerance toward the Ku Klux Klan speech assumption with those available in the literature. in the news clips compared to the free speech frame. 3.1 The Framework It is important to note that the researchers in this ex- ample are primarily interested in the causal mechanism Consider a simple random sample of size n from a between media framing and political tolerance rather population where for each unit i we observe (Ti,Mi,

TABLE 1 Descriptive statistics and estimated average treatment effects from the media framing experiment. The middle four columns show the means and standard deviations of the mediator and outcome variables for each treatment group. The last column reports the estimated average causal effects of the public order frame as opposed to the free speech frame on the three response variables along with their standard errors. The estimates suggest that the treatment affected each of these variables in the expected directions

Treatment media frames Public order Free speech Response variables Mean S.D. Mean S.D. ATE (s.e.)

Importance of free speech 5.25 1.43 5.49 1.35 −0.231 (0.239) Importance of public order 5.43 1.73 4.75 1.80 0.674 (0.303) Tolerance for the KKK 2.59 1.89 3.13 2.07 −0.540 (0.340) Sample size 69 67 54 K. IMAI, L. KEELE AND T. YAMAMOTO

Xi,Yi).WeuseTi to denote the binary treatment vari- would occur if the treatment status is the same and yet able where Ti = 1(Ti = 0) implies unit i receives (does the mediator takes a value that would result under the not receive) the treatment. The mediating variable of other treatment status. Note that the former is observ- interest, that is, the mediator, is represented by Mi , able (if the treatment variable is actually equal to t), whereas Yi represents the outcome variable. Finally, whereas the latter is by definition unobservable [un- Xi denotes the vector of observed pre-treatment covari- der the treatment status t we never observe Mi(1 − t)]. ates, and we use M, X and Y to denote the support of Some feel uncomfortable with the idea of making in- the distributions of Mi , Xi and Yi , respectively. ferences about quantities that can never be observed What qualifies as a mediator? Since the mediator lies (e.g., Rubin, 2005, page 325), while others emphasize in the causal path between the treatment and the out- their importance in policy making and scientific re- come, it must be a post-treatment variable that occurs search (Pearl, 2001, Section 2.4, 2010, Section 6.1.4; before the outcome is realized. Beyond this minimal Hafeman and Schwartz 2009). requirement, what constitutes a mediator is determined Furthermore, the above notation implicitly assumes solely by the scientific theory under investigation. Con- that the potential outcome depends only on the val- sider the following example, which is motivated by a ues of the treatment and mediating variables and, in referee’s comment. Suppose that the treatment is par- particular, not on how they are realized. For example, ents’ decision to have their child receive the live vac- this assumption would be violated if the outcome vari- cine for H1N1 flu virus and the outcome is whether the able responded to the value of the mediator differently child develops flu or not. For a virologist, a mediator of depending on whether it was directly assigned or oc- interest may be the development of antibodies to H1N1 curred as a natural response to the treatment, that is, for = ∈ M = − live vaccine. But, if parents sign a form acknowledging t 0, 1andallm , Yi(t, Mi(t)) Yi(t, Mi(1 = = = the risks of the vaccine, can this act of form signing t)) Yi(t, m) if Mi(1) Mi(0) m. also be a mediator? Indeed, social scientists (if not vi- Thus, equation (1) formalizes the idea that the medi- rologists!) may hypothesize that being informed of the ation effects represent the indirect effects of the treat- risks will make parents less likely to have their child ment through the mediator. In this paper we focus on receive the second dose of the vaccine, thereby increas- the identification and inference of the average causal ing the risk of developing flu. This example highlights mediation effect (ACME), which is defined as ¯ the important role of scientific theories in causal medi- δ(t) ≡ E(δi(t)) ation analysis. (2) To define the causal mediation effects, we use the po- = E{Yi(t, Mi(1)) − Yi(t, Mi(0))} tential outcomes framework. Let Mi(t) denote the po- for t = 0, 1. In the potential outcomes framework, the tential value of the mediator for unit i under the treat- causal effect of the treatment on the outcome for unit i = ment status Ti t. Similarly, we use Yi(t, m) to repre- is defined as τ ≡ Y (1,M (1)) − Y (0,M (0)),which = i i i i i sent the potential outcome for unit i when Ti t and is typically called the total causal effect. Therefore, the = Mi m. Then, the observed variables can be written as causal mediation effect and the total causal effect have = = Mi Mi(Ti) and Yi Yi(Ti,Mi(Ti)). Similarly, if the the following relationship: mediator takes J different values, there exist 2J poten- = + − tial values of the outcome variable, only one of which (3) τi δi(t) ζi(1 t), can be observed. where ζi(t) = Yi(1,Mi(t)) − Yi(0,Mi(t)) for t = 0, 1. Using the potential outcomes notation, we can define This quantity ζi(t) is called the natural direct effect by the causal mediation effect for unit i under treatment Pearl (2001)andthepure/total direct effect by Robins status t as (see Robins and Greenland, 1992; Pearl, (2003). This represents the causal effect of the treat- 2001) ment on the outcome when the mediator is set to the po- tential value that would occur under treatment status t. (1) δi(t) ≡ Yi(t, Mi(1)) − Yi(t, Mi(0)) In other words, ζi(t) is the direct effect of the treat- for t = 0, 1. Pearl (2001) called δi(t) the natural in- ment when the mediator is held constant. Equation (3) direct effect, while Robins (2003) used the term the shows an important relationship where the total causal pure indirect effect for δi(0) and the total indirect ef- effect is equal to the sum of the mediation effect un- fect for δi(1). In words, δi(t) represents the difference der one treatment condition and the natural direct effect between the potential outcome that would result un- under the other treatment condition. Clearly, this equal- der treatment status t, and the potential outcome that ity also holds for the average total causal effect so that CAUSAL MEDIATION ANALYSIS 55 ¯ ¯ τ¯ ≡ E{Yi(1,Mi(1)) − Yi(0,Mi(0))}=δ(t) + ζ(1 − t) lows for t = 0, 1: for t = 0, 1whereζ(t)¯ = E(ζ (t)). i ¯ The causal mediation effects and natural direct ef- δ(t) = E(Yi|Mi = m, Ti = t,Xi = x) fects differ from the controlled direct effect of the me-  {dF | = = (m) diator, that is, Yi(t, m) − Yi(t, m ) for t = 0, 1and Mi Ti 1,Xi x  m = m , and that of the treatment, that is, Yi(1,m)− − } dFMi |Ti =0,Xi =x(m) dFXi (x), Yi(0,m) for all m ∈ M (Pearl, 2001; Robins, 2003). ¯ Unlike the mediation effects, the controlled direct ef- ζ(t)= {E(Yi|Mi = m, Ti = 1,Xi = x) fects of the mediator are defined in terms of specific  values of the mediator, m and m , rather than its poten- − E(Yi|Mi = m, Ti = 0,Xi = x)} tial values, Mi(1) and Mi(0). While causal mediation dF | = = (m) dF (x), analysis is used to identify possible causal paths from Mi Ti t,Xi x Xi Ti to Yi , the controlled direct effects may be of inter- where FZ(·) and FZ|W (·) represent the distribution est, for example, if one wishes to understand how the function of a random variable Z and the conditional causal effect of Mi on Yi changes as a function of Ti . distribution function of Z given W . In other words, the former examines whether M medi- i A proof is given in Appendix A. Theorem 1 is quite ates the causal relationship between T and Y , whereas i i general and can be easily extended to any types of the latter investigates whether T moderates the causal i treatment regimes, for example, a continuous treatment effect of M on Y (Baron and Kenny, 1986). i i variable. In fact, the proof requires no change except 3.2 The Main Identification Result letting t and t take values other than 0 and 1. Assump- tion 1 can also be somewhat relaxed by replacing equa- We now present our main identification result using the potential outcomes framework described above. We tion (5) with its corresponding mean independence as- show that under a particular version of sequential ig- sumption. However, as mentioned above, this identifi- norability assumption, the ACME is nonparametrically cation result does not hold under the standard sequen- identified. We first define our identifying assumption: tial ignorability assumption. As shown by Avin, Sh- pitser and Pearl (2005) and also pointed out by Robins ASSUMPTION 1 (Sequential ignorability). (2003), the nonparametric identification of natural di-  rect and indirect effects is not possible without an addi- (4) {Yi(t ,m),Mi(t)}⊥⊥ Ti|Xi = x, tional assumption if equation (5) holds only after con-  ⊥⊥ | = = (5) Yi(t ,m) Mi(t) Ti t,Xi x ditioning on the post-treatment confounders Zi as well   ⊥⊥ for t,t = 0, 1, and all x ∈ X where it is also assumed as the pre-treatment covariates Xi ,thatis,Yi(t ,m) | = = =  = that 0 < Pr(Ti = t|Xi = x) and 0

TABLE 2 Finite-sample performance of the proposed estimators and their variance estimators. The table presents the results of a Monte Carlo experiment with varying sample sizes and fifty thousand iterations. The upper half of the table represents the results for δ(ˆ 0) and the bottom half δ(ˆ 1). The columns represent (from left to right) the following: sample sizes, estimated biases, root mean squared errors (RMSE) and the coverage probabilities of the 95% confidence intervals of the nonparametric estimators, and the same set of quantities for the parametric estimators. Thetruevaluesofδ(¯ 0) and δ(¯ 1) are 0.675 and 4.03, respectively. The results indicate that nonparametric estimators have smaller bias than the parametric estimator though its variance is much larger. The confidence intervals converge to the nominal coverage as the sample size increases. The convergence occurs much more quickly for the parametric estimator

Nonparametric estimator Parametric estimator Sample size Bias RMSE 95% CI coverage Bias RMSE 95% CI coverage

δ(ˆ 0) 50 0.002 1.034 0.824 0.096 0.965 0.919 100 0.006 0.683 0.871 0.044 0.566 0.933 500 −0.002 0.292 0.922 0.006 0.229 0.947 δ(ˆ 1) 50 0.010 2.082 0.886 −0.010 1.840 0.934 100 0.005 1.462 0.912 0.003 1.290 0.944 500 0.001 0.643 0.939 0.001 0.570 0.955 CAUSAL MEDIATION ANALYSIS 61 diator is not. Causal mediation analysis is most fre- of ρ and compute the corresponding estimate of the quently applied to such experiments. In this case, equa- ACME. In a quite different context, Roy, Hogan and tion (4) of Assumption 1 is satisfied but equation (5) Marcus (2008) take this general strategy of computing may not hold for two reasons. First, there may exist a quantity of interest at various values of an unidentifi- unmeasured pre-treatment covariates that confound the able sensitivity parameter. relationship between the mediator and the outcome. The next theorem shows that if the treatment is ran- Second, there may exist observed or unobserved post- domized, the ACME is identified given a particular treatment confounders. These possibilities, along with value of ρ. other obstacles encountered in applied research, have led some scholars to warn against the abuse of media- THEOREM 4 (Identification with a given error corre- tion analyses (e.g., Green, Ha and Bullock, 2010). In- lation). Consider the LSEM defined in equations (11), deed, as we formally show below, the data generating (12) and (13). Suppose that equation (4) holds and the process contains no information about the credibility correlation between εi2 and εi3, that is, ρ, is given. − of the sequential ignorability assumption. If we further assume 1 <ρ<1, then the ACME is To address this problem, we develop a method to as- identified and given by sess the sensitivity of an estimated ACME to unmea- ¯ ¯ β2σ1 sured pre-treatment confounding (The proposed sensi- δ(0) = δ(1) = ρ˜ − ρ (1 −˜ρ2)/(1 − ρ2) , σ2 tivity analysis, however, does not address the possible 2 ≡ = ˜ ≡ existence of post-treatment confounders). The method where σj Var(εij ) for j 1, 2 and ρ Corr(εi1, is based on the standard LSEM framework described εi2). in Section 3.4 and can be easily used by applied re- A proof is in Appendix D. We offer several re- searchers to examine the robustness of their empirical marks about Theorem 4. First, the unbiased esti- findings. We derive the maximum departure from equa- mates of (α ,α ,β ,β ) can be obtained by fitting the tion (5) that is allowed while maintaining their original 1 2 1 2 equation-by-equation least squares of equations (11) conclusion about the direction of the ACME (see Imai and Yamamoto, 2010). For notational simplicity, we do and (12). Given these estimates, the covariance ma- 2 2 ˜ not explicitly condition on the pre-treatment covariates trix of (εi1,εi2), whose elements are (σ1 ,σ2 , ρσ1σ2), X . However, the same analysis can be conducted by can be consistently estimated by computing the sam- i ˆ = including them as additional covariates in each regres- ple covariance matrix of the residuals, that is, εi1 −ˆ − ˆ ˆ = −ˆ − ˆ sion. Yi α1 β1Ti and εi2 Mi α2 β2Ti . Second, the partial derivative of the ACME with re- 5.1 Parametric Sensitivity Analysis Based on the spect to ρ implies that the ACME is either monoton- Residual Correlation ically increasing or decreasing in ρ, depending on The proof of Theorem 2 implies that if equation (4) the sign of β2. The ACME is also symmetric about (ρ, δ(t))¯ = (0,β ρσ˜ /σ ) holds, εi2 ⊥⊥ Ti and εi3 ⊥⊥ Ti hold but εi2 ⊥⊥ εi3 does 2 1 2 . not unless equation (5) also holds. Thus, one way to Third, the ACME is zero if and only if ρ equals ρ˜. assess the sensitivity of one’s conclusions to the viola- This implies that researchers can easily check the ro- tion of equation (5) is to use the following sensitivity bustness of their conclusion obtained under the sequen- parameter: tial ignorability assumption via correlation between εi1 and ε . For example, if δ(t)ˆ = βˆ γˆ is negative, the true ≡ i2 2 (24) ρ Corr(εi2,εi3), ACME is also guaranteed to be negative if ρ<ρ˜ holds. where −1 <ρ<1. In Appendix C we show that As- Fourth, the expression of the ACME given in The- sumption 1 implies ρ = 0. (Of course, the contrapos- orem 4 is cumbersome to use when computing the itive of this statement is also true; ρ = 0 implies the standard errors. A more straightforward and general violation of Assumption 1). A nonzero correlation pa- approach is to apply the iterative feasible generalized rameter can be interpreted as the existence of omitted least square algorithm of the seemingly unrelated re- variables that are related to both the observed value of gression (Zellner, 1962), and use the associated asymp- the mediator Mi and the potential outcomes Yi even totic variance formula. This strategy will also work after conditioning on the treatment variable Ti (and the when there is an interaction term between the treatment observed covariates Xi ). Note that these omitted vari- and mediating variables as in equation (15) and/or ables must causally precede Ti . Then, we vary the value when there are observed pre-treatment covariates Xi . 62 K. IMAI, L. KEELE AND T. YAMAMOTO

Finally, Theorem 4 implies the following corollary, confounder. In this case, we use the following sensitiv- which shows that under the LSEM the data generating ity parameters: process is not informative at all about either the sensi- Var(ε ) − Var(ε ) 2 ≡ i2 i2 = − 2 2∗ tivity parameter ρ or the ACME without equation (5). RM (1 RM )RM This result highlights the difficulty of causal mediation Var(Mi) analysis and the importance of sensitivity analysis even and in the parametric modeling setting. Var(ε ) − Var(ε ) 2 ≡ i3 i3 = − 2 2∗ RY (1 RY )RY , COROLLARY 1 (Bounds on the sensitivity parame- Var(Yi) ter). Consider the LSEM defined in equations (11), 2 2 (12) and (13). Suppose that equation (4) holds but where RM and RY represent the coefficients of de- termination from the two regressions given in equa- equation (5) may not. Then, the sharp, that is, best pos- 2∗ 2∗ sible, bounds on the sensitivity parameter ρ and ACME tions (12)and(13). Note that unlike RM and RY 2 2 are given by (−1, 1) and (−∞, ∞), respectively. (as well as ρ given in Corollary 1), RM and RY are bounded from above by Var(εi2)/ Var(Mi) and The first statement of the corollary follows directly Var(εi3)/ Var(Yi), respectively. from the proof of Theorem 4, while the second state- In either case, it is straightforward to show that the ment can be proved by taking a limit of δ(t) as ρ tends following relationship between ρ and these parameters to −1or1. 2 = 2∗ 2∗ = 2 2 { − 2 − holds, that is, ρ RM RY RM RY / (1 RM )(1 2 } 5.2 Parametric Sensitivity Analysis Based on the RY ) or, equivalently, Coefficients of Determination sgn(λ λ )R R = ∗ ∗ = 2 3 M Y ρ sgn(λ2λ3)RM RY , The sensitivity parameter ρ can be given an alterna- (1 − R2 )(1 − R2 ) tive definition which allows it to be interpreted as the M Y magnitude of an unobserved confounder. This alterna- ∗ ∗ [ ] where RM ,RY , RM and RY are in 0, 1 . Thus, in tive version of ρ is based on the following decomposi- this framework, researchers can specify the values of tion of the error terms in equations (12)and(13): 2∗ 2∗ 2 2 (RM ,RY ) or (RM , RY ) as well as the sign of λ2λ3 in  order to determine values of ρ and estimate the ACME ε = λ U + ε ij j i ij based on these values of ρ. Then, the analyst can ex- amine variation in the estimated ACME with respect to for j = 2, 3, where Ui is an unobserved confounder change in these parameters. and the sequential ignorability is assumed given Ui and Ti . Again, note that Ui has to be a pre-treatment 5.3 Extensions to Nonlinear and variable so that the resulting estimates can be given Nonparametric Models a causal interpretation. In addition, we assume that  ⊥⊥ = The proposed sensitivity analysis above is developed εij Ui for j 2, 3. We can then express the influ- ence of the unobserved pre-treatment confounder using within the framework of the LSEM, but some exten- the following coefficients of determination: sions are possible. For example, Imai, Keele and Tin- gley (2009) show how to conduct sensitivity analysis Var(ε ) 2∗ ≡ − i2 with probit models when the mediator and/or the out- RM 1 Var(εi2) come are discrete. In Appendix E, while it is substan- tially more difficult to conduct such an analysis in the and nonparametric setting, we consider sensitivity analysis Var(ε ) 2∗ ≡ − i3 for the nonparametric plug-in estimator introduced in RY 1 , Var(εi3) Section 4.2 (see also VanderWeele, 2010 for an alter- native approach). which represent the proportion of previously unex- plained variance (either in the mediator or in the out- 6. EMPIRICAL APPLICATION come) that is explained by the unobserved confounder (see Imbens, 2003). In this section we apply our proposed methods to the Another interpretation is based on the proportion of influential randomized experiment from political psy- original variance that is explained by the unobserved chology we described in Section 2. CAUSAL MEDIATION ANALYSIS 63

6.1 Analysis under Sequential Ignorability AsshowninSection3.4, we can relax the no- In the original analysis, Nelson, Clawson and Ox- interaction assumption (Assumption 2) that is implicit ley (1997) used a LSEM similar to the one discussed in the LSEM of Baron and Kenny (1986). The first and in Section 3.4 and found that subjects who viewed the second rows of the table present estimates from the Klan story with the free speech frame were signifi- parametric and nonparametric analysis without this as- cantly more tolerant of the Klan than those who saw sumption. These results show that the estimated ACME the story with the public order frame. The researchers under the free speech condition [δ(ˆ 0)] is larger than the also found evidence supporting their main hypothesis effect under the public order condition [δ(ˆ 1)] for both that subjects’ general attitudes mediated the causal ef- the parametric and nonparametric estimators. In fact, fect of the news story frame on tolerance for the Klan. the 95% confidence interval for the nonparametric es- In the analysis that follows, we only analyze the pub- ¯ lic order mediator, for which the researchers found a timate of δ(1) includes zero. However, we fail to reject ¯ = ¯ significant mediation effect. the null hypothesis of δ(0) δ(1) under the parametric As we showed in Section 3.4, the original results can analysis, with a p-value of 0.238. be given a causal interpretation under sequential ig- Based on this finding, the no-interaction assumption norability, that is, Assumption 1. Here, we first make could be regarded as appropriate. The last two rows in this assumption and estimate causal effects based on Table 3 contain the analysis based on the parametric our theoretical results. Table 3 presents the findings. estimator under this assumption. As expected, the es- The second and third columns of the table show the timated ACME is between the previous two estimates, estimated ACME and average total effect based on and the 95% confidence interval does not contain zero. the LSEM and the nonparametric estimator, respec- Finally, the estimated average total effect is identical tively. The 95% asymptotic confidence intervals are constructed using the Delta method. For most of the to that without Assumption 2. This makes sense since estimates, the 95% confidence intervals do not contain the no-interaction assumption only restricts the way the zero, mirroring the finding from the original study that treatment effect is transmitted to the outcome and thus general attitudes about public order mediated the effect does not affect the estimate of the overall treatment ef- of the media frame. fect.

TABLE 3 Parametric and nonparametric estimates of the ACME under sequential ignorability in the media framing experiment. Each cell of the table represents an estimated average causal effect and its 95% confidence interval. The outcome is the subjects’ tolerance level for the free speech rights of the Ku Klux Klan, and the treatments are the public order frame (Ti = 1) andthefreespeechframe(Ti = 0). The second column of the table shows the results of the parametric LSEM approach, while the third column of the table presents those of the nonparametric estimator. The lower part of the table shows the results of parametric mediation analysis under the no-interaction assumption [δ(ˆ 1) = δ(ˆ 0)], while the upper part presents the findings without this assumption, thereby showing the estimated average mediation effects under the treatment and the control, that is, δ(ˆ 1) and δ(ˆ 0)

Parametric Nonparametric

Average mediation effects Free speech frame δ(ˆ 0) −0.566 −0.596 [−1.081, −0.050][−1.168, −0.024] Public order frame δ(ˆ 1) −0.451 −0.374 [−0.871, −0.031][−0.823, 0.074] Average total effect τˆ −0.540 −0.540 [−1.207, 0.127][−1.206, 0.126] With the no-interaction assumption Average mediation effect −0.510 δ(ˆ 0) = δ(ˆ 1) [−0.969, −0.051] Average total effect τˆ −0.540 [−1.206, 0.126] 64 K. IMAI, L. KEELE AND T. YAMAMOTO

6.2 Sensitivity Analysis Next, we present the same sensitivity analysis us- ing the alternative interpretation of ρ which is based The estimates in Section 6.1 are identified if the se- on two coefficients of determination as defined in quential ignorability assumption holds. However, since Section 5; (1) the proportion of unexplained variance the original researchers randomized news stories but that is explained by an unobserved pre-treatment con- subjects’ attitudes were merely observed, it is unlikely 2∗ 2∗ founder (RM and RY ) and (2) the proportion of the this assumption holds. As we discussed in Section 2, original variance explained by the same unobserved one particular concern is that subjects’ pre-existing ide- ˜2 ˜2 confounder (RM and RY ). Figure 2 shows two plots ology affects both their attitudes toward public order is- based on the types of coefficients of determination. The sues and their tolerance for the Klan within each treat- lower left quadrant of each plot in the figure repre- ment condition. Thus, we next ask how sensitive these sents the case where the product of the coefficients for estimates are to violations of this assumption using the the unobserved confounder is negative, while the upper methods proposed in Section 5. We consider politi- right quadrant represents the case where the product is cal ideology to be a possible unobserved pre-treatment positive. confounder. We also maintain Assumption 2. For example, this product will be positive if the Figure 1 presents the results for the sensitivity analy- unobserved pre-treatment confounder represents sub- sis based on the residual correlation. We plot the es- jects’ political ideology, since conservatism is likely timated ACME of the attitude mediator against dif- to be positively correlated with both public order im- fering values of the sensitivity parameter ρ,whichis portance and tolerance for the Klan. Under this sce- equal to the correlation between the two error terms nario, the original conclusion about the direction of the of equations (27)and(28) for each. The analysis in- ACME is perfectly robust to the violation of sequential dicates that the original conclusion about the direction ignorability, because the estimated ACME is always of the ACME under Assumption 1 (represented by the negative in the upper right quadrant of each plot. On dashed horizontal line) would be maintained unless ρ the other hand, the result is less robust to the existence is less than −0.68. This implies that the conclusion is of an unobserved confounder that has opposite effects plausible given even fairly large departures from the on the mediator and outcome. However, even for this ignorability of the mediator. This result holds even af- alternative situation, the ACME is still guaranteed to ter we take into account the variability, as the be negative as long as the unobserved confounder ex- confidence interval covers the value of zero only when plains less than 27.7% of the variance in the mediator −0.79 <ρ<−0.49. Thus, the original finding about or outcome that is left unexplained by the treatment the negative ACME is relatively robust to the violation alone, no matter how large the corresponding portion of equation (5) of Assumption 1 under the LSEM. of the variance in the other variable may be. Similarly, the direction of the original estimate is maintained if the unobserved confounder explains less than 26.7% (14.7%) of the original variance in the mediator (out- come), regardless of the degree of confounding for the outcome (mediator).

7. CONCLUDING REMARKS In this paper we study identification, inference and sensitivity analysis for causal mediation effects. Causal mediation analysis is routinely conducted in various disciplines, and our paper contributes to this fast- growing methodological literature in several ways. First, we provide a new identification condition for the FIG.1. Sensitivity analysis for the media framing experiment. ACME, which is relatively easy to interpret in substan- The figure presents the results of the sensitivity analysis described tive terms and also weaker than existing results in some in Section 5. The solid line represents the estimated ACME for the attitude mediator for differing values of the sensitivity parameter situations. Second, we prove that the estimates based ρ, which is defined in equation (24). Thegrayregionrepresentsthe on the standard LSEM can be given valid causal inter- 95% confidence interval based on the Delta method. The horizontal pretations under our proposed framework. This pro- dashed line is drawn at the point estimate of δ¯ under Assumption 1. vides a basis for formally analyzing the validity of CAUSAL MEDIATION ANALYSIS 65 The term . ) 2 Y ˜ R , 2 M ˜ R ( or ) ∗ 2 Y ,R 2 ∗ M which represent the proportion of (R 2 Y ˜ Each plot contains various mediation R . 5 and 2 M ˜ R which represent the proportion of unexplained variance ∗ 2 Y R and ∗ 2 M R The right plot contains the contours for . . respectively , The left plot contains the contours for . Each line represents the estimated ACME under proposed values of either . The plot presents the results of the sensitivity analysis described in Section . represents the sign on the product of the coefficients of the unobserved confounder An alternative interpretation of the sensitivity analysis ) 3 λ 2 .2. (λ IG F that is explained by the unobserved confounder for the mediator and outcome the variance explained by the unobserved pre-treatment confounder effects under an unobserved pre-treatment confounder of various magnitudes sgn 66 K. IMAI, L. KEELE AND T. YAMAMOTO empirical studies using the LSEM framework. Third, general underlying issues. The H1N1 flu virus exam- we propose simple nonparametric estimation strategies ple mentioned in Section 3.1 also highlights the same for the ACME. This allows researchers to avoid the fundamental point. Thus, causal mediation analysis can stronger functional form assumptions required in the be uncomfortably far from a completely data-oriented standard LSEM. Finally, we offer a parametric sensi- approach to scientific investigations. It is, however, tivity analysis that can be easily used by applied re- precisely this aspect of causal mediation analysis that searchers in order to assess the sensitivity of estimates makes it appealing to those who resist standard statisti- to the violation of this assumption. We view sensitivity cal analyses that focus on estimating treatment effects, analysis as an essential part of causal mediation analy- an approach which has been somewhat pejoratively la- sis because the assumptions required for identifying beled as a “black-box” view of (e.g., Skra- causal mediation effects are unverifiable and often are banek, 1994; Deaton, 2009). It may be the case that not justified in applied settings. causal mediation analysis has the potential to signifi- cantly broaden the scope of statistical analysis of cau- At this point, it is worth briefly considering the pro- sation and build a bridge between scientists and statis- gression of mediation research from its roots in the ticians. empirical psychology literature to the present. In their There are a number of possible future generaliza- seminal paper, Baron and Kenny (1986) supplied ap- tions of the proposed methods. First, the sensitivity plied researchers with a simple method for mediation analysis can potentially be extended to various nonlin- analysis. This method has quickly gained widespread ear regression models. Some of this has been done by acceptance in a number of applied fields. While psy- Imai, Keele and Tingley (2009). Second, an important chologists extended this LSEM framework in a number generalization would be to allow multiple mediators of ways, little attention was paid to the conditions un- in the identification analysis. This will be particularly der which their popular estimator can be given a causal valuable since in many applications researchers aim interpretation. Indeed, the formal definition of the con- to test competing hypotheses about alternative causal cept of causal mediation had to await the later works by mechanisms via mediation analysis. For example, the epidemiologists and statisticians (Robins and Green- media framing study we analyzed in this paper in- land, 1992; Pearl, 2001; Robins, 2003). The progress cluded another measurement (on a separate group ran- made on the identification of causal mediation effects domly split from the study sample) which was pur- by these authors has led to the recent development of ported to test an alternative causal pathway. The formal alternative and more general estimation strategies (e.g., treatment of this issue will be a major topic of future re- Imai, Keele and Tingley, 2009; VanderWeele, 2009). search. Third, implications of measurement error in the In this paper we show that under a set of assumptions mediator variable have yet to be analyzed. This repre- this popular product of coefficients estimator can be sents another important research topic, as mismeasured given a causal interpretation. Thus, over twenty years mediators are quite common, particularly in psycho- later, the work of Baron and Kenny has come full cir- logical studies. Fourth, an important limitation of our cle. framework is that it does not allow the presence of a Despite its natural appeal to applied scientists, sta- post-treatment variable that confounds the relationship tisticians often find the concept of causal mediation between mediator and outcome. As discussed in Sec- mysterious (e.g., Rubin, 2004). Part of this skepticism tion 3.3, some of the previous results avoid this prob- seems to stem from the concept’s inherent dependence lem by making additional identification assumptions on background scientific theory; whether a variable (e.g., Robins, 2003). The exploration of alternative so- lutions is also left for future research. Finally, it is im- qualifies as a mediator in a given empirical study relies portant to develop new experimental designs that help crucially on the investigator’s belief in the theory be- identify causal mediation effects with weaker assump- ing considered. For example, in the social science ap- tions. Imai, Tingley and Yamamoto (2009)present plication introduced in Section 2, the original authors some new ideas on the experimental identification of test whether the effect of a media framing on citizens’ causal mechanisms. opinion about the Klan rally is mediated by a change in attitudes about general issues. Such a setup might make APPENDIX A: PROOF OF THEOREM 1 no sense to another political psychologist who hypoth- esizes that the change in citizens’ opinion about the First, note that equation (4) in Assumption 1 implies   Klan rally prompts shifts in their attitudes about more (25) Yi(t ,m)⊥⊥ Ti|Mi(t) = m ,Xi = x. CAUSAL MEDIATION ANALYSIS 67

Now, for any t,t,wehave tions (12)and(13) using the potential outcome nota-  tion as follows: E(Yi(t, Mi(t ))|Xi = x) = + +  (27) Mi(Ti) α2 β2Ti εi2(Ti), = E Yi(t, m)|Mi(t ) = m, Xi = x Yi(Ti,Mi(Ti)) = α3 + β3Ti + γMi(Ti)  (28) dFMi (t )|Xi =x(m) + εi3(Ti,Mi(Ti)),   = E Yi(t, m)|Mi(t ) = m, Ti = t ,Xi = x where the following normalization is used: E(εi2(t)) = E = = ∈ M  (εi3(t, m)) 0fort 0, 1andm . Then, equa- dFMi (t )|Xi =x(m) tion (4) of Assumption 1 implies εi2(t) ⊥⊥ Ti , yield-  ing E(ε (T )|T = t)= E(ε (t)) = 0foranyt = 0, 1. = E(Yi(t, m)|Ti = t ,Xi = x) i2 i i i2 Similarly, equation (5) implies εi3(t, m) ⊥⊥ Mi|Ti = dFM (t)|X =x(m) t for all t and m, yielding E(εi3(Ti,Mi(Ti))|Ti = i i t,Mi = m) = E(εi3(t, m)|Ti = t) = E(εi3(t, m)) = = E(Yi(t, m)|Ti = t,Xi = x) 0foranyt and m where the second equality fol- lows from equation (4). Thus, the parameters in equa- dFM (t)|T =t,X =x(m) i i i tions (12)and(13) are identified under Assumption 1. Finally, under Assumption 1 and the LSEM, we can = E Yi(t, m)|Mi(t) = m, Ti = t,Xi = x write E(Mi|Ti) = α2 + β2Ti ,andE(Yi|Mi,Ti) = α3 +   β3Ti + γMi . Using these expressions and Theorem 1, dFMi (t )|Ti =t ,Xi =x(m) the ACME can be shown to equal β2γ . = E(Yi|Mi = m, Ti = t,Xi = x) APPENDIX C: PROOF THAT ρ = 0 UNDER   dFMi (t )|Ti =t ,Xi =x(m) ASSUMPTION 1

(26) = E(Yi|Mi = m, Ti = t,Xi = x) First, as shown in Appendix B, Assumption 1 im- plies E(εi2(Ti)|Ti) = 0andE(εi3(Ti,Mi(Ti))|Ti,  dFMi |Ti =t ,Xi =x(m), Mi) = 0 where the (potential) error terms are defined in equations (27)and(28). These mean independence where the second equality follows from equation (25), relationships (together with the law of iterated expec- equation (5) is used to establish the third and fifth tations) imply equalities, equation (4) is used to establish the fourth and last equalities, and the sixth equality follows from 0 = E(εi3(Ti,Mi(Ti))Mi) the fact that M = M (T ) and Y = Y (T ,M (T )).Fi- i i i i i i i i = E ε (T ,M (T )) α + β T + ε (T ) nally, equation (26) implies i3 i i i 2 2 i i2 i  = E{ε (T ,M (T ))ε (T ))}. E(Y (t, M (t ))) i3 i i i i2 i i i Thus, under Assumption 1,wehaveρ = 0 ⇐⇒ = E(Y |M = m, T = t,X = x) i i i i E{εi2(Ti)εi3(Ti,Mi(Ti))}=0.

dFM |T =t,X =x(m) dFX (x). i i i i APPENDIX D: PROOF OF THEOREM 4 ¯ Substituting this expression into the definition of δ(t) First, we write the LSEM in terms of equations (12) given by equations (1)and(2) yields the desired ex- and (14). We omit possible pre-treatment confounders pression for the ACME. In addition, since τ¯ = ζ(t)¯ + ¯   = =  Xi from the model for notational simplicity, although δ(t ) for any t,t 0, 1andt t under Assumption 1, the result below remains true even if such confounders the result for the average natural direct effects is also are included. Since equation (4) implies E(εji|Ti) = 0 immediate. for j = 2, 3, we can consistently estimate (α1,α2,β1, β2),whereα1 = α3 + α2γ and β1 = β3 + β2γ ,as APPENDIX B: PROOF OF THEOREM 2 2 2 ˜ well as (σ1 ,σ2 , ρ). Thus, given a particular value of ˜ = 2 + 2 = 2 2 + We first show that under Assumption 1 the model ρ,wehaveρσ1σ2 γσ2 ρσ2σ3 and σ1 γ σ2 2 + = =˜ parameters in the LSEM are identified. Rewrite equa- σ3 2γρσ2σ3.Ifρ 0, then γ ρσ1/σ2 provided 68 K. IMAI, L. KEELE AND T. YAMAMOTO 2 = 2 −˜2 ≥ = = = = that σ3 σ1 (1 ρ ) 0. Now, assume ρ 0. Then, (31) Pr Yi(1, 1) y11,Yi(1, 0) y10, substituting σ3 = (ρσ˜ 1 − γσ2)/ρ into the above ex- Yi(0, 1) = y01,Yi(0, 0) = y00| pression of σ 2 yields the following quadratic equa- 1  2 − ˜ + 2 ˜2 − 2 { 2 − 2 }= Mi = 0,Ti = t tion: γ 2γ ρσ1/σ2 σ1 (ρ ρ )/ σ2 (1 ρ ) 0. Solving this equation and using σ3 ≥ 0, we ob-  for all t ,ytm, ∈{0, 1}. The equality states that within tain the following desired expression: γ = σ1 {˜ρ − σ2 each treatment group the mediator is assigned indepen- ρ (1 −˜ρ2)/(1 − ρ2)}. Thus, given a particular value dent of potential outcomes. We now consider the fol- of ρ, δ(t)¯ is identified. lowing sensitivity parameter υ, which is the maximum possible difference between the left- and right-hand APPENDIX E: NONPARAMETRIC side of equation (31). That is, υ represents the upper SENSITIVITY ANALYSIS bound on the absolute difference in the proportion of any principal stratum that may exist between those who We consider a sensitivity analysis for the simple take different values of the mediator given the same plug-in nonparametric estimator introduced in Sec- treatment status. Thus, this provides one way to para- tion 4.2. Unfortunately, sensitivity analysis is not as metrize the maximum degree to which the sequential straightforward as the parametric settings. Here, we ex- ignorability can be violated. (Other, potentially more amine the special case of binary mediator and outcome intuitive, parametrization are possible, but, as shown where some progress can be made and leave the devel- below, this parametrization allows for easier computa- opment of sensitivity analysis in a more general non- tion of the bounds.) parametric case for future research. Using the population proportion of each princi- We begin by the nonparametric bounds on the m1m0 ≡ = pal stratum, that is, πy11y10y01y00 Pr(Yi(1, 1) y11, ACME without assuming equation (5) of the sequential Y (1, 0) = y ,Y (0, 1) = y ,Y (0, 0) = y ,M (1) = ignorability assumption. In the case of binary media- i 10 i 01 i 00 i m ,M (0) = m ), we can write this difference as fol- tor and outcome, we can derive the following sharp 1 i 0 lows: bounds using the result of (2009): 1 1m0 1 0m0 ⎧ ⎫ = π = π − − m0 0 y11y10y01y00 m0 0 y11y10y01y00 ⎨ P001 P011 ⎬ − − − − 1 P 1 P max ⎩ P000 P001 P100 ⎭ y=0 y11 y=0 y01 − − − (32) P011 P010 P110 ≤ (29) ⎧ ⎫ υ, ⎨P101 + P111 ⎬ ¯ 1 m11 1 m10 ≤ δ( ) ≤ P + P + P , m =0 πy11y10y01y00 m =0 πy11y10y01y00 1 min ⎩ 000 100 101 ⎭ 1 − 1 + + 1 1 P010 P110 P111 = Py10 = Py00 ⎧ ⎫ y 0 y 0 − − (33) ⎨ P100 P110 ⎬ ≤ υ, − − − max ⎩ P001 P100 P101 ⎭ −P110 − P011 − P111 where υ is bounded between 0 and 1. Clearly, if and (30) ⎧ ⎫ only if υ = 0, the sequential ignorability assumption is + ⎨P000 P010 ⎬ satisfied. ≤ ¯ ≤ + + δ(0) min ⎩P010 P011 P111 ⎭ , Finally, note that the ACME can be written as + + P000 P001 P101 the following linear function of unknown parame- m1m0 where Pymt ≡ Pr(Yi = y,Mi = m|Ti = t) for all ters πy11y10y01y00 : ∈{ } y,m,t 0, 1 . These bounds always contain zero, im- 1 1 1 1 plying that the sign of the ACME is not identified with- δ(t)¯ = out an additional assumption even in this special case. m=0 y1−t,m=0 y1,1−m=0 y0,1−m=0 To construct a sensitivity analysis, we follow the (34) strategy of Imai and Yamamoto (2010) and first express 1 1 π mm0 − π m1m , the second assumption of sequential ignorability using y11y10y01y00 y11y10y01y00 = = the potential outcomes notation as follows: m0 0 m1 0 where one of the subscripts of π corresponding to y Pr Y (1, 1) = y ,Y (1, 0) = y , tm i 11 i 10 is equal to 1. Then, given a fixed value of sensitivity  Yi(0, 1) = y01,Yi(0, 0) = y00|Mi = 1,Ti = t parameter υ, you can obtain the sharp bounds on the CAUSAL MEDIATION ANALYSIS 69

ACME by numerically solving the linear optimization is exactly equal to zero (i.e., υ = 0), so that equa- problem with the linear constraints implied by equa- tion (31) holds. The sharp bounds widen as we increase tions (32)and(33) as well as the following relation- the value of υ, until they flatten out and become equal ship implied by the ignorability of the treatment assign- to the no-assumption bounds given in equations (29) ment: and (30). The results suggest that the point estimates of the 1 1 1 1 P = π m1m0 ACME are rather sensitive to the violation of the se- ymt y11y10y01y00 ¯ = = = = quential ignorability assumption. For both δ(1) and y1−t,m 0 yt,1−m 0 y1−t,1−m 0 m1−t 0 ¯ (35) δ(0), the upper bounds sharply increase as we increase ∈{ } the value of υ and cross the zero line at small values for each y,m,t 0, 1 . In addition, we use the linear ¯ ¯ m1m0 of υ [0.019 for δ(1) and 0.022 for δ(0)]. This con- constraint that all πy11y10y01y00 sum up to 1. We apply this framework to the media framing ex- trasts with the parametric sensitivity analyses reported ample described in Sections 2 and 6. For the purpose in Section 6.2, where the estimates of the ACME ap- of illustration, we dichotomize both the mediator and peared quite robust to the violation of Assumption 1. treatment variables using their sample medians as cut- Although the direct comparison is difficult because points. Figure 3 shows the results of this analysis. In of different parametrization and variable coding, this stark difference illustrates the potential importance of each panel the solid curves represent the sharp upper parametric assumptions in causal mediation analysis; and lower bounds on the ACME for different values a significant part of identification power could in fact of the sensitivity parameter υ. The horizontal dashed be attributed to such functional form assumptions as lines represent the point estimates of δ(¯ 1) (upper panel) opposed to empirical evidence. and δ(¯ 0) (lower panel) under Assumption 1. This cor- responds to the case where the sensitivity parameter ACKNOWLEDGMENTS A companion paper applying the proposed methods to various applied settings is available as Imai, Keele and Tingley (2009). The proposed methods can be im- plemented via the easy-to-use software mediation (Imai et al. 2010), which is an R package available at the Comprehensive R Archive Network (http://cran. r-project.org/web/packages/mediation). The replica- tion data and code for this article are available for download as Imai, Keele and Yamamoto (2010). We thank Brian Egleston, Adam Glynn, Guido Imbens, Gary King, Dave McKinnon, Judea Pearl, Marc Ratko- vic, Jas Sekhon, Dustin Tingley, Tyler VanderWeele and seminar participants at Columbia University, Har- vard University, New York University, Notre Dame, University of North Carolina, University of Colorado– Boulder, University of Pennsylvania and University of Wisconsin–Madison for useful suggestions. The sug- gestions from the associate editor and anonymous ref- erees significantly improved the presentation. Finan- cial support from the National Science Foundation FIG.3. Nonparametric sensitivity analysis for the media framing (SES-0918968) is acknowledged. experiment. In each panel the solid curves show the sharp upper and lower bounds on the ACME as a function of the sensitivity pa- rameter υ, which represents the degree of violation of the sequen- REFERENCES tial ignorability assumption. The horizontal dashed lines represent ALBERT, J. M. (2008). Mediation analysis via potential outcomes ¯ ¯ the point estimates of δ(1) (upper panel) and δ(0) (lower panel) un- models. Stat. Med. 27 1282–1304. MR2420158 der Assumption 1. In contrast to the parametric sensitivity analysis ANGRIST,J.D.,IMBENS,G.W.andRUBIN, D. B. (1996). Identi- reported in Section 6.2, the estimates are shown to be rather sensi- fication of causal effects using instrumental variables (with dis- tive to the violation of Assumption 1. cussion). J. Amer. Statist. Assoc. 91 444–455. 70 K. IMAI, L. KEELE AND T. YAMAMOTO

AVIN,C.,SHPITSER,I.andPEARL, J. (2005). Identifiability of IMBENS, G. W. (2003). Sensitivity to exogeneity assumptions in path-specific effects. In Proceedings of the International Joint program evaluation. American Economic Review 93 126–132. Conference on Artificial Intelligence. Morgan Kaufman, San JO, B. (2008). Causal inference in randomized experiments with Francisco, CA. MR2192340 mediational processes. Psychological Methods 13 314–336. BARON,R.M.andKENNY, D. A. (1986). The moderator– JOFFE,M.M.,SMALL,D.,TEN HAV E ,T.,BRUNELLI,S. mediator variable distinction in social psychological research: and FELDMAN, H. I. (2008). Extended instrumental vari- Conceptual, strategic, and statistical considerations. Journal of ables estimation for overall effects. Int. J. Biostat. 4 Article 4. Personality and Social Psychology 51 1173–1182. MR2399287 COCHRAN, W. G. (1957). Analysis of covariance: Its and JOFFE,M.M.,SMALL,D.andHSU, C.-Y. (2007). Defining and uses. Biometrics 13 261–281. MR0090952 estimating intervention effects for groups that will develop an DEATON, A. (2009). Instruments of development: Randomization auxiliary outcome. Statist. Sci. 22 74–97. MR2408662 in the tropics, and the search for the elusive keys to economic JUDD,C.M.andKENNY, D. A. (1981). Process analysis: Esti- Proc. Br. Acad. 162 development. 123–160. mating mediation in treatment evaluations. Evaluation Review EGLESTON,B.,SCHARFSTEIN,D.O.,MUNOZ,B.andWEST, 5 602–619. S. (2006). Investigating mediation when counterfactuals are not KRAEMER,H.C.,KIERNAN,M.,ESSEX,M.andKUPFER,D.J. metaphysical: Does sunlight UVB exposure mediate the effect (2008). How and why criteria definig moderators and mediators of eyeglasses on cataracts? Working Paper 113, Dept. Biostatis- differ between the Baron & Kenny and MacArthur approaches. tics, Johns Hopkins Univ., Baltimore, MD. Health Psychology 27 S101–S108. ELLIOTT,M.R.,RAGHUNATHAN,T.E.andLI, Y. (2010). Bayesian inference for causal mediation effects using principal KRAEMER,H.C.,WILSON,T.,FAIRBURN,C.G.andAGRAS, stratification with dichotomous mediators and outcomes. Bio- W. S. (2002). Mediators and moderators of treatment effects in statistics. 11 353–372. randomized clinical trials. Archives of General Psychiatry 59 GALLOP,R.,SMALL,D.S.,LIN,J.Y.,ELLIOT,M.R.,JOFFE, 877–883. M. and TEN HAV E , T. R. (2009). Mediation analysis with prin- MACKINNON, D. P. (2008). Introduction to Statistical Mediation cipal stratification. Stat. Med. 28 1108–1130. Analysis. Taylor & Francis, New York. GENELETTI, S. (2007). Identifying direct and indirect effects in NELSON,T.E.,CLAWSON,R.A.andOXLEY, Z. M. (1997). Me- a non-counterfactual framework. J. Roy. Statist. Soc. Ser. B 69 dia framing of a civil liberties conflict and its effect on tolerance. 199–215. MR2325272 American Political Science Review 91 567–583. GLYNN, A. N. (2010). The product and difference fallacies for PEARL, J. (2001). Direct and indirect effects. In Proceedings of the indirect effects. Unpublished manuscript, Dept. Government, Seventeenth Conference on Uncertainty in Artificial Intelligence Harvard Univ. (J. S. Breese and D. Koller, eds.) 411–420. Morgan Kaufman, GOODMAN, L. A. (1960). On the exact variance of products. San Francisco, CA. J. Amer. Statist. Assoc. 55 708–713. MR0117809 PEARL, J. (2010). An introduction to causal inference. Int. J. Bio- GREEN,D.P.,HA,S.E.andBULLOCK, J. G. (2010). Yes, but stat. 6 Article 7. what’s the mechanism? (don’t expect an easy answer). Journal PETERSEN,M.L.,SINISI, S. E. and van der Laan, M. J. (2006). of Personality and Social Psychology 98 550–558. Estimation of direct causal effects. Epidemiology 17 276–284. HAFEMAN,D.M.andSCHWARTZ, S. (2009). Opening the black ROBINS, J. (1999). Marginal structural models versus structural box: A motivation for the assessment of mediation. Interna- nested models as tools for causal inference. In Statistical Mod- tional Journal of Epidemiology 38 838–845. els in Epidemiology, the Environment and Clinical Trials (M. E. HAFEMAN,D.M.andVANDERWEELE, T. J. (2010). Alternative Halloran and D. A. Berry, eds.) 95–134. Springer, New York. assumptions for the identification of direct and indirect effects. MR1731682 Epidemiology 21. To appear. ROBINS, J. M. (2003). Semantics of causal DAG models and the IMAI,K.andYAMAMOTO, T. (2010). Causal inference with dif- identification of direct and indirect effects. In Highly Structured ferential measurement error: Nonparametric identification and Stochastic Systems (P. J. Green, N. L. Hjort and S. Richardson, sensitivity analysis. American Journal of Political Science 54 eds.) 70–81. Oxford Univ. Press, Oxford. MR2082403 543–560. ROBINS,J.M.andGREENLAND, S. (1992). Identifiability and IMAI,K.,KEELE,L.andTINGLEY, D. (2009). A general ap- exchangeability for direct and indirect effects. Epidemiology 3 proach to causal mediation analysis. Psychological Methods.To appear. 143–155. OY OGAN ARCUS IMAI,K.,KEELE,L.,TINGLEY,D.andYAMAMOTO, T. (2010). R ,J.,H ,J.W.andM , B. H. (2008). Principal Causal mediation analysis using R. In Advances in Social Sci- stratification with predictors of compliance for randomized tri- ence Research Using R (H. D. Vinod, ed.). Lecture Notes in Sta- als with 2 active treatments. Biostatistics 9 277–289. tist. 196 129–154. Springer, New York. RUBIN, D. B. (2004). Direct and indirect causal effects via poten- IMAI,K.,KEELE,L.andYAMAMOTO, T. (2010). Replica- tial outcomes (with discussion). Scand. J. Statist. 31 161–170. tion data for: Identification, inference, and sensitivity analysis MR2066246 for causal mediation effects. Available at http://hdl.handle.net/ RUBIN, D. B. (2005). Causal inference using potential outcomes: 1902.1/14412. Design, modeling, decisions. J. Amer. Statist. Assoc. 100 322– IMAI,K.,TINGLEY,D.andYAMAMOTO, T. (2009). Experimen- 331. MR2166071 tal designs for identifying causal mechanisms. Technical re- SJÖLANDER, A. (2009). Bounds on natural direct effects in the port, Dept. Politics, Princeton Univ. Available at http://imai. presence of confounded intermediate variables. Stat. Med. 28 princeton.edu/research/Design.html. 558–571. CAUSAL MEDIATION ANALYSIS 71

SKRABANEK, P. (1994). The emptiness of the black box. Epidemi- VANDERWEELE, T. J. (2008). Simple relations between princi- ology 5 5553–5555. pal stratification and direct and indirect effects. Statist. Probab. SOBEL, M. E. (1982). Asymptotic confidence intervals for indirect Lett. 78 2957–2962. Sociological Methodology effects in structural equation models. VANDERWEELE, T. J. (2009). Marginal structural models for the 13 290–321. estimation of direct and indirect effects. Epidemiology 20 18– SOBEL, M. E. (2008). Identification of causal parameters in ran- 26. domized studies with mediating variables. Journal of Educa- tional and Behavioral Statistics 33 230–251. VANDERWEELE, T. J. (2010). Bias formulas for sensitivity analy- TEN HAV E ,T.R.,JOFFE,M.M.,LYNCH,K.G.,BROWN,G.K., sis for direct and indirect effects. Epidemiology. To appear. MAISTO,S.A.andBECK, A. T. (2007). Causal mediation ZELLNER, A. (1962). An efficient method of estimating seemingly analyses with rank preserving models. Biometrics 63 926–934. unrelated regressions and tests for aggregation bias. J. Amer. MR2395813 Statist. Assoc. 57 348–368. MR0139235