When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

Kosuke Imai (Princeton) In Song Kim (MIT)

Political Methodology Summer Meeting Rice University

July 21, 2016

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 1 / 37 Fixed Effects Regressions in Causal Inference

Linear ﬁxed effects regression models are the primary workhorse for causal inference with longitudinal/panel data

Researchers use them to adjust for unobserved time-invariant confounders (omitted variables, endogeneity, selection bias, ...):

“Good instruments are hard to ﬁnd ..., so we’d like to have other tools to deal with unobserved confounders. This chapter considers ... strategies that use data with a time or cohort dimension to control for unobserved but ﬁxed omitted variables” (Angrist & Pischke, Mostly Harmless Econometrics)

“ﬁxed effects regression can scarcely be faulted for being the bearer of bad tidings” (Green et al., Dirty Pool)

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 2 / 37 Overview of the Talk

Identify two under-appreciated causal assumptions of unit fixed effects regression estimators: 1 Past treatments do not directly affect current outcome 2 Past outcomes do not directly affect current treatments and time-varying confounders can be relaxed under a selection-on-observables approach New matching framework for causal inference with panel data: 1 propose within-unit matching estimators to relax linearity 2 incorporate various estimators, e.g., the before-and-after estimator 3 establish equivalence between matching estimators and weighted linear fixed effects regression estimators Extend the analysis to two-way fixed effects models, difference-in-differences design, and synthetic control method An empirical illustration: Effects of GATT on trade

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 3 / 37 Linear Regression with Unit Fixed Effects

Balanced panel data with N units and T time periods

Yit : outcome variable Xit : causal or treatment variable of interest

Assumption 1 (Linearity)

Yit = αi + βXit + it

Ui : a vector of unobserved time-invariant confounders

αi = h(Ui ) for any function h(·) A ﬂexible way to adjust for unobservables Average contemporaneous treatment effect:

β = E(Yit (1) − Yit (0))

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 4 / 37 Strict Exogeneity and Least Squares Estimator

Assumption 2 (Strict Exogeneity)

it ⊥⊥{Xi , Ui }

Mean independence is sufﬁcient: E(it | Xi , Ui ) = E(it ) = 0 Least squares estimator based on de-meaning:

N T ˆ X X 2 βFE = arg min {(Yit − Y i ) − β(Xit − X i )} β i=1 t=1

where X i and Y i are unit-speciﬁc sample means ATE among those units with variation in treatment:

τ = E(Yit (1) − Yit (0) | Cit = 1) PT where Cit = 1{0 < t=1 Xit < T }.

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 5 / 37 Causal Directed Acyclic Graph (DAG)

Yi1 Yi2 Yi3

Xi1 Xi2 Xi3 arrow = direct causal effect absence of arrows causal assumptions Ui

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 6 / 37 Nonparametric Structural Equation Model (NPSEM)

One-to-one correspondence with a DAG:

Yit = g1(Xit , Ui , it )

Xit = g2(Xi1,..., Xi,t−1, Ui , ηit )

Nonparametric generalization of linear unit ﬁxed effects model: Allows for nonlinear relationships, effect heterogeneity Strict exogeneity holds No arrows can be added without violating Assumptions 1 and 2

Causal assumptions: 1 No unobserved time-varying confounders 2 Past outcomes do not directly affect current outcome 3 Past outcomes do not directly affect current treatment 4 Past treatments do not directly affect current outcome

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 7 / 37 Potential Outcomes Framework

DAG causal structure Potential outcomes treatment assignment mechanism Assumption 3 (No carryover effect) Past treatments do not directly affect current outcome

Yit (Xi1, Xi2,..., Xi,t−1, Xit ) = Yit (Xit )

What randomized experiment satisﬁes unit ﬁxed effects model?

1 randomize Xi1 given Ui 2 randomize Xi2 given Xi1 and Ui 3 randomize Xi3 given Xi2, Xi1, and Ui 4 and so on

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 8 / 37 Assumption 4 (Sequential Ignorability with Unobservables)

T {Yit (1), Yit (0)}t=1 ⊥⊥ Xi1 | Ui . . T {Yit (1), Yit (0)}t=1 ⊥⊥ Xit0 | Xi1,..., Xi,t0−1, Ui . . T {Yit (1), Yit (0)}t=1 ⊥⊥ XiT | Xi1,..., Xi,T −1, Ui

“as-if random” assumption without conditioning on past outcomes Past outcomes cannot directly affect current treatment

Says nothing about whether past outcomes can directly affect current outcome

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 9 / 37 Past Outcomes Directly Affect Current Outcome

Yi1 Yi2 Yi3

Strict exogeneity still Xi1 Xi2 Xi3 holds Past outcomes do not confound Xit −→ Yit given Ui

Ui No need to adjust for past outcomes

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 10 / 37 Past Treatments Directly Affect Current Outcome

Past treatments as confounders Yi1 Yi2 Yi3 Need to adjust for past treatments Strict exogeneity holds given past treatments and Xi1 Xi2 Xi3 Ui Impossible to adjust for an entire treatment history and Ui at the same time Adjust for a small number Ui of past treatments often arbitrary

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 11 / 37 Past Outcomes Directly Affect Current Treatment

Correlation between Yi1 Yi2 Yi3 error term and future treatments Violation of strict exogeneity

Xi1 Xi2 Xi3 No adjustment is sufﬁcient

Together with the previous assumption no feedback effect Ui over time

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 12 / 37 Instrumental Variables Approach

Yi1 Yi2 Yi3 Instruments: Xi1, Xi2, and Yi1 GMM: Arellano and Bond (1991)

Xi1 Xi2 Xi3 Exclusion restrictions Arbitrary choice of instruments Substantive justiﬁcation rarely given Ui

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 13 / 37 An Alternative Selection-on-Observables Approach

Absence of unobserved Yi1 Yi2 Yi3 time-invariant confounders Ui past treatments can directly affect current outcome

past outcomes can directly Xi1 Xi2 Xi3 affect current treatment

Comparison across units within the same time rather than across different time periods within the same unit

Marginal structural models can identify the average effect of an entire treatment sequence Trade-off no free lunch

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 14 / 37 Adjusting for Observed Time-varying Confounders

Yi1 Yi2 Yi3 past treatments cannot directly affect current outcome past outcomes Xi1 Xi2 Xi3 cannot directly affect current treatment

adjusting for Zit does not relax these

Zi1 Zi2 Zi3 assumptions past outcomes cannot indirectly affect current treatment through Zit Ui

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 15 / 37 A New Matching Framework

Even if these assumptions are satisﬁed, the the unit ﬁxed effects estimator is inconsistent for the ATE:

PT PT t=1 Xit Yit t=1(1−Xit )Yit 2 E Ci PT − PT Si p = Xit = 1−Xit βˆ −→ t 1 t 1 6= τ FE 2 E(Ci Si )

2 PT 2 where Si = t=1(Xit − X i ) /(T − 1) is the unit-speciﬁc variance Key idea: comparison across time periods within the same unit ˆ The Within-unit matching estimator improves βFE by relaxing the linearity assumption:

N PT PT ! 1 X Xit Yit (1 − Xit )Yit τˆ = C t=1 − t=1 match PN i PT PT i=1 Ci i=1 t=1 Xit t=1(1 − Xit )

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 16 / 37 Constructing a General Matching Estimator

Mit : matched set for observation (i, t) For the within-unit matching estimator,

match 0 0 0 Mit = {(i , t ): i = i, Xi0t0 = 1 − Xit }

A general matching estimator:

N T 1 X X τˆ = D (Y\(1) − Y\(0)) match PN PT it it it i=1 t=1 Dit i=1 t=1

where Dit = 1{#Mit > 0} and Yit if Xit = x Y\it (x) = 1 P 0 0 Y 0 0 if X = 1 − x #Mit (i ,t )∈Mit i t it

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 17 / 37 Before-and-After Design

No time trend for the average potential outcomes:

E(Yit (x) − Yi,t−1(x) | Xit 6= Xi,t−1) = 0 for x = 0, 1

with the quantity of interest E(Yit (1) − Yit (0) | Xit 6= Xi,t−1)

Or just the average potential outcome under the control condition

E(Yit (0) − Yi,t−1(0) | Xit = 1, Xi,t−1 = 0) = 0

This is a matching estimator with the following matched set:

BA 0 0 0 0 Mit = {(i , t ): i = i, t ∈ {t − 1, t + 1}, Xi0t0 = 1 − Xit }

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 18 / 37 It is also the ﬁrst differencing estimator:

N T ˆ X X 2 βFD = arg min {(Yit − Yi,t−1) − β(Xit − Xi,t−1)} β i=1 t=2 “We emphasize that the model and the interpretation of β are exactly as in [the linear ﬁxed effects model]. What differs is our method for estimating β” (Wooldridge; italics original).

The identiﬁcation assumptions is very different Slightly relaxing the assumption of no carryover effect But, still requires the assumption that past outcomes do not affect current treatment Regression toward the mean: suppose that the treatment is given when the previous outcome takes a value greater than its mean

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 19 / 37 Matching as a Weighted Unit Fixed Effects Estimator

Any within-unit matching estimator can be written as a weighted unit ﬁxed effects estimator with different regression weights

The proposed within-matching estimator:

N T ˆ X X ∗ ∗ 2 τˆmatch = βWFE = arg min Dit Wit {(Yit − Y i ) − β(Xit − X i )} β i=1 t=1

∗ ∗ where X i and Y i are unit-speciﬁc weighted averages, and

 T  PT if Xit = 1, t0=1 Xit0 Wit = T  PT if Xit = 0. t0=1(1−Xit0 )

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 20 / 37 We show how to construct regression weights for different matching estimators (i.e., different matched sets) Idea: count the number of times each observation is used for matching

Beneﬁts: computational efﬁciency model-based standard errors

robustness matching estimator is consistent even when linear unit ﬁxed effects regression is the true model

speciﬁcation test (White 1980) null hypothesis: linear ﬁxed effects regression is the true model

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 21 / 37 Linear Regression with Unit and Time Fixed Effects

Model:

Yit = αi + γt + βXit + it

where γt ﬂexibly adjusts for a vector of unobserved unit-invariant time effects Vt , i.e., γt = f (Vt )

Estimator:

N T ˆ X X 2 βFE2 = arg min {(Yit − Y i − Y t + Y ) − β(Xit − X i − X t + X)} β i=1 t=1

where Y t and X t are time-speciﬁc means, and Y and X are overall means

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 22 / 37 Understanding the Two-way Fixed Effects Estimator

βFE: bias due to time effects

βFEtime: bias due to unit effects

βpool: bias due to both time and unit effects

ˆ ˆ ˆ ˆ ωFE × βFE + ωFEtime × βFEtime − ωpool × βpool βFE2 = wFE + wFEtime − wpool

with sufﬁciently large N and T , the weights are given by,

2 ωFE ≈ E(Si ) = average unit-speciﬁc variance 2 ωFEtime ≈ E(St ) = average time-speciﬁc variance 2 ωpool ≈ S = overall variance

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 23 / 37 Matching and Two-way Fixed Effects Estimators

Problem: No other unit shares the same unit and time Units

4 C T C C T

3 T C T T C

2 C C T C C Time periods

1 T T T C T Two kinds of mismatches 1 Same treatment status 2 Neither same unit nor same time

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 24 / 37 We Can Never Eliminate Mismatches

Units

4 C T C C T

3 T C T T C

2 C C T C C Time periods

1 T T T C T

To cancel time and unit effects, we must induce mismatches No weighted two-way ﬁxed effects model eliminates mismatches

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 25 / 37 Difference-in-Differences Design

Parallel trend assumption:

E(Yit (0) − Yi,t−1(0) | Xit = 1, Xi,t−1 = 0) = E(Yit (0) − Yi,t−1(0) | Xit = Xi,t−1 = 0) 1.0

treatment group ● 0.8

counterfactual 0.6

● 0.4 Average Outcome Average ●

0.2 control group ● 0.0

time t time t+1 Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 26 / 37 General DiD = Weighted Two-Way FE Effects

2 × 2: equivalent to linear two-way ﬁxed effects regression General setting: Multiple time periods, repeated treatments

Units

4 C T C C T

3 T C T T C

2 C C T C C Time periods

1 T T T C T

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 27 / 37 Units

4 0 0 0 0 0

1 1 3 0 2 0 1 2

1 1 2 0 2 0 1 2 Time periods

1 0 0 0 0 0

Fast computation, standard error, speciﬁcation test Still assumes that past outcomes don’t affect current treatment Baseline outcome difference caused by unobserved time-invariant confounders It should not reﬂect causal effect of baseline outcome on treatment assignment Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 28 / 37 Synthetic Control Method (Abadie et al. 2010)

One treated unit i∗ receiving the treatment at time T

Quantity of interest: Yi∗T − Yi∗T (0) Create a synthetic control using past outcomes P Weighted average: Y\i∗T (0) = i6=i∗ wˆi YiT Estimate weights to balance past outcomes and past time-varying covariates

A motivating autoregressive model:

> YiT (0) = ρT Yi,T −1(0) + δT ZiT + iT ZiT = λT −1Yi,T −1(0) + ∆T Zi,T −1 + νiT

Past outcomes can affect current treatment No unobserved time-invariant confounders

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 29 / 37 Causal Effect of ETA’s Terrorism THE AMERICAN ECONOMIC REVIEW MARCH 2003

11- -5 / s / a / 210- 5

0 9-

, -Actual with terrorism - -Synthetic without terrorism 1 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000 year

FIGURE1. PERCAPITA GDP FOR THE BASQUECOUNTRI Abadie and Gardeazabal (2003, AER) 4 - - Imai (Princeton) and Kim\ (MIT) Fixed Effects for Causal Inference GDt' gap PolMeth (July 21, 2016) 30 / 37 I -Terrorist activity

year

FIGIJRE2. TERRORISTACTIVITY AND ESTIMATEDGAP

expect terrorism to have a lagged negative ef- of deaths caused by terrorist actions (used as a fect on per capita GDP. In Figure 2, we plotted proxy for overall terrorist activity). As ex- the per capita GDP gap, Y, - YT, as a percent- pected, spikes in terrorist activity seem to be age of Basque per capita GDP, and the number followed by increases in the amplitude of the The main motivating model:

> > Yit (0) = γt + δt Zit + ξ Ui + it

A generalization of the linear two-way ﬁxed effects model How is it possible to adjust for unobserved time-invariant confounders by adjusting for past outcomes?

The key assumption: there exist weights such that X X wi Zit = Zi∗t for all t ≤ T − 1 and wi Ui = Ui∗ i6=i∗ i6=i∗

In general, adjusting for observed confounders does not adjust for unobserved confounders

The same tradeoff as before

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 31 / 37 Effects of GATT Membership on International Trade

1 Controversy Rose (2004): No effect of GATT membership on trade Tomz et al. (2007): Signiﬁcant effect with non-member participants

2 The central role of fixed effects models: Rose (2004): one-way (year) fixed effects for dyadic data Tomz et al. (2007): two-way (year and dyad) fixed effects Rose (2005): “I follow the profession in placing most confidence in the fixed effects estimators; I have no clear ranking between country-specific and country pair-specific effects.” Tomz et al. (2007): “We, too, prefer FE estimates over OLS on both theoretical and statistical ground”

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 32 / 37 Data and Methods

1 Data Data set from Tomz et al. (2007) Effect of GATT: 1948 – 1994 162 countries, and 196,207 (dyad-year) observations

2 Year ﬁxed effects model:

> ln Yit = αt + βXit + δ Zit + it

Yit : trade volume Xit : membership (formal/participants) Both vs. At most one Zit : 15 dyad-varying covariates (e.g., log product GDP) 3 Assumptions: past membership status doesn’t directly affect current trade volume past trade volume doesn’t affect current membership status Before-and-after increasing trend in trade volume Difference-in-differences after conditional on past outcome?

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 33 / 37 Empirical Results: Formal Membership

Dyad with Both Members vs. One or None Member 0.4 0.3

DID− 0.2 FE caliper

●

0.1 FE ●

● FE ● ● ● 0.0 ● FD ● DID WFE −0.1 WFE Estimated Effects (log of trade) Estimated Effects −0.2 Year Fixed Effects Dyad Fixed Effects Year and Dyad Fixed Effects

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 34 / 37 Empirical Results: Participants Included

Dyad with Both Members vs. One or None Member

● Formal Member Participants FE 0.4

0.3 FE

FE DID− 0.2 FD FE caliper DID− WFE ● caliper 0.1 FE ●

● FE ● ● ● 0.0 ● FD ● WFE DID DID WFE −0.1 WFE Estimated Effects (log of trade) Estimated Effects −0.2 Year Fixed Effects Dyad Fixed Effects Year and Dyad Fixed Effects

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 35 / 37 Concluding Remarks

When should we use linear fixed effects models? Key tradeoff: 1 unobserved time-invariant confounders fixed effects 2 causal dynamics between treatment and outcome selection-on-observables Two key (under-appreciated) causal assumptions of fixed effects: 1 past treatments do not directly affect current outcome 2 past outcomes do not directly affect current treatment A new matching estimator: 1 Within-unit matching estimator no linearity assumption 2 Various causal identification strategies can be incorporated including the before-and-after and difference-in-differences designs 3 Equivalent representation as a weighted linear fixed effects regression estimator R package wfe is available at CRAN

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 36 / 37 Send comments and suggestions to:

[email protected] [email protected]

More information about this and other research:

http://imai.princeton.edu http://web.mit.edu/insong/www

Imai (Princeton) and Kim (MIT) Fixed Effects for Causal Inference PolMeth (July 21, 2016) 37 / 37