
Propensity Score Matching and Beyond

Marc Ratkovic

Department of Politics, Princeton University

EPEN 2014, Day 1

MAIN IDEAS

Main ideas

- Using matching to adjust for differences between treatment and control groups
- Balance: how and why?
- Propensity scores: promise and pitfalls
- Covariate balancing propensity score: an improvement
- Beyond propensity scores

Questions under consideration

1. How do television appearances affect donations?
2. Does slavery have discernible effects on contemporary political behavior?
3. Did the adoption of Stand-Your-Ground laws increase homicides in Florida?

OVERVIEW OF TALK

The Logic of Matching

Propensity Score

Model Misspecification

Beyond Logistic Propensity Scores

STEPHEN COLBERT: PHILOSOPHER-PUNDIT

- Colbert Bump: Does a legislator's appearance on Stephen's show cause an increase in donations?

THE POTENTIAL OUTCOMES FRAMEWORK

Notation

- i in 1, 2, ..., N indexes the i-th of N legislators
- T_i: "treatment"
  - T_i = 0: legislator i did not go on the show
  - T_i = 1: legislator i went on the show
  - T_i is random
- Y_i(T_i): the potential outcome
  - Y_i(0): donation if legislator i does not go on the show
  - Y_i(1): donation if legislator i does go on the show
  - Y_i(1), Y_i(0) non-random

CAUSAL ESTIMANDS

- Individual Causal Effect

  \underbrace{Y_i(1)}_{\text{outcome under treatment}} - \underbrace{Y_i(0)}_{\text{outcome under no treatment}}

- Average Treatment Effect (ATE)
  - Sample Average Treatment Effect (SATE):  \frac{1}{N} \sum_{i=1}^{N} \{ Y_i(1) - Y_i(0) \}
  - Population Average Treatment Effect (PATE):  E\{ Y_i(1) - Y_i(0) \}

- Average Treatment Effect on the Treated (ATT)
  - Sample Average Treatment Effect on the Treated (SATT):  \frac{1}{N_T} \sum_{i: T_i = 1} \{ Y_i(1) - Y_i(0) \}
  - Population Average Treatment Effect on the Treated (PATT):  E\{ Y_i(1) - Y_i(0) \mid T_i = 1 \}

CAUSAL EFFECTS

- Fundamental Problem of Causal Inference
  - We only observe one potential outcome for each observation
  - Equivalently: we do not observe the same observation both getting the treatment and not getting the treatment
- Counterfactual observation
  - The potential outcome we do not observe
  - For treated observations: the outcome, had they not been treated
  - For control observations: the outcome, had they been treated
- No causation without manipulation
  - Race cannot be a cause
  - People's reaction to race can be
  - Even in the absence of manipulation, comparison can be useful

OBSERVED DATA

Unit   Pre-treatment Covariates   Treatment Indicator   Potential Outcome (Treated)   Potential Outcome (Control)
1      X_1                        1                     Y_1(1)                        ?
2      X_2                        0                     ?                             Y_2(0)
3      X_3                        0                     ?                             Y_3(0)
4      X_4                        1                     Y_4(1)                        ?
...
N      X_N                        1                     Y_N(1)                        ?

- Pre-treatment covariates: incumbency, party, etc.
- Treatment indicator: 1 if they go on the show, 0 if they do not
- Potential outcomes
  - Y_i(1): donations, after going on the show
  - Y_i(0): donations, after not going on the show

MATCHING TO BETTER APPROXIMATE CAUSAL INFERENCE

Example: The Colbert Effect. Does appearing on the Colbert Report cause an increase in donations? (Fowler, 2008) BUT reps who appear on the Colbert Report are not directly comparable to all reps.

"The best way to avoid selection effects is to conduct an experiment with a treatment and a control... Thus, to really control for such effects, it would be best to randomly assign which candidates appear on this show. I floated this idea to Stephen, but he isn't returning my calls. Neither is Congress."

COLBERT EFFECT

Can we create a counterfactual?

"So the next best thing is to figure out a way to clone each candidate who appeared on the program, make each clone run for Congress, and see what happens to their campaigns if they don't go on the program."

COLBERT EFFECT

Finding a counterfactual in the data: instead, what we can do is try to find similar candidates to compare with those who went on the Colbert Report. Fowler matches on incumbency, party, and donations in the previous 20 days.

OBSERVED DATA

Unit   Covariates (Treated, Matched)   Treatment Indicator   Observed Outcome (Treated)   Observed Outcome (Matched)
1      X_1, X_1^M                      1, 0                  Y_1(1)                       Y_1^M(0)
2      X_2, X_2^M                      1, 0                  Y_2(1)                       Y_2^M(0)
3      X_3, X_3^M                      1, 0                  Y_3(1)                       Y_3^M(0)
4      X_4, X_4^M                      1, 0                  Y_4(1)                       Y_4^M(0)
...
N      X_N, X_N^M                      1, 0                  Y_N(1)                       Y_N^M(0)

Matching gives the Average Treatment Effect on the Treated (ATT).

[Figure 1 from Fowler (2008), "The Colbert Bounce?": absolute differences in the number and dollar amount of donations to candidates who appear on The Colbert Report compared to matched candidates who do not, from 60 days before to 60 days after the appearance. Democrats who appear on the show see a significant increase in the number and total amount of donations over the following 30-40 days relative to their matched counterparts; open circles mark days where differences are significant (nonparametric Wilcoxon signed rank tests, p < 0.10).]

ATE VS. ATT

Average Treatment Effect (ATE)

- Estimated in a fully randomized experiment
- Change in outcome, on average, were everyone treated versus were everyone assigned to control
- Should all public schools be turned into charter schools?

Average Treatment Effect on the Treated (ATT)

- What matching estimates in the Colbert example
- Effect of the treatment on participants
- Useful in program evaluation
- What is the effect of charter school attendance on students?

WHEN DOES MATCHING WORK?

The argument behind matching:

- If X_i ≈ X_i^M, then Y_i(0) ≈ Y_i^M(0)
- To estimate the treatment effect:

  \frac{1}{N_T} \sum_{i: T_i = 1} \Big\{ Y_i(1) - \underbrace{Y_i(0)}_{\text{unobserved}} \Big\}
  \;\approx\;
  \frac{1}{N_T} \sum_{i: T_i = 1} \Big\{ Y_i(1) - Y_i^M(0) \Big\}

Since appearing on the Report is not fully at random:

- Estimate the counterfactual with the matched subset of the data
- May not work exactly for each observation; may work on average over the matched subset

ASSUMPTIONS BEHIND MATCHING

1. Common Support: a valid counterfactual must exist
   - Treatment is probabilistic
   - Formally: 0 < Pr(T_i = 1 | X_i) < 1
   - Example: there is some probability a legislator will go on Colbert
   - Can gender or race be a treatment?
     - No causation without manipulation
     - Résumé studies: manipulate perceptions, not race
2. Strong Ignorability: all confounders are matched on
   - No omitted confounders
   - Formally: Y_i(1), Y_i(0) ⊥⊥ T_i | X_i
3. Stable Treatment: only a single version of the treatment
4. Non-Interference: one observation's treatment level does not affect another's outcome (Stable Unit Treatment Value Assumption, or SUTVA)

Note: (1) and (2) are necessary for causal effect estimation; (3) and (4) are necessary for the difference-in-means to be unbiased and can be relaxed.

WHY BALANCE?

Matching hopes to achieve balance: equality of the full multivariate density of pre-treatment covariates between treated and untreated,

  Y_i(1), Y_i(0) ⊥⊥ T_i | X_i  ⇔  F_{X*}(X | T_i = 1) = F_{X*}(X | T_i = 0),

where X* denotes the weighted or matched subset.

- In the absence of an experiment, we want the density of X_i for treated and untreated to be as similar as possible
- At best: similarity on observed confounders

Why?

CONFOUNDING

[Figure: Densities of income (in dollars, logged) before the program, for treated and untreated units.]

Difference in outcomes:

- Treatment assignment?
- Causal effect of treatment?

REDUCING BIAS THROUGH MATCHING

[Figure: Densities of income (logged) before the program for untreated units, treated units, and matched controls.]

Treated and matched controls:

- Difference between groups ⇒ causal effect

BALANCING TO REDUCE MODEL DEPENDENCE

Treated and Controls: Raw Data

[Figure: Scatterplot of post-treatment income (logged) against pre-treatment income (logged); treated units marked T, controls marked C.]

Treatment effect:

- Data not balanced
- Model dependence

MODEL DEPENDENCE IN RAW DATA

Linear Models: Raw Data

[Figure: The same scatterplot with separate linear fits for treated (red) and controls (blue).]

Try a linear model:

- Red line: treated
- Blue line: controls

MODEL DEPENDENCE IN RAW DATA

Quadratic Models: Raw Data

[Figure: The same scatterplot with separate quadratic fits for treated and controls.]

Try a quadratic model:

- Different curves
- Different effect estimates

MODEL DEPENDENCE IN RAW DATA

Linear and Quadratic Models: Raw Data

[Figure: The same scatterplot with both linear and quadratic fits for treated and controls.]

Model dependence:

- Inferences vary across specifications
- Unreliable results

BALANCING TO REDUCE MODEL DEPENDENCE

Treated and Controls: Balanced Data

[Figure: Scatterplot of the matched data; treated units and matched controls overlap closely.]

Matched data:

- Similar groups
- Approximates independence

BALANCING TO REDUCE MODEL DEPENDENCE

Linear and Quadratic Models: Balanced Data

[Figure: The matched-data scatterplot with linear and quadratic fits for treated and controls.]

Comparing results:

- Linear model
- Quadratic model


BALANCING TO REDUCE MODEL DEPENDENCE

Linear and Quadratic Models: Balanced Data

[Figure: The matched-data scatterplot with linear and quadratic fits; the fitted curves agree.]

Comparing results:

- Linear model
- Quadratic model
- Reliable result

LOOKING CLOSER AT BALANCE

- When does E(Y_i(1) | T_i = 1) - E(Y_i(0) | T_i = 0) = E(Y_i(1) - Y_i(0) | T_i = 1)?

  \int_X Y_i(1)\, dF_X(X \mid T_i = 1) - \int_X Y_i(0)\, dF_X(X \mid T_i = 0)

  = \int_X Y_i(1)\, dF_X(X \mid T_i = 1) - \int_X Y_i(0)\, dF_X(X \mid T_i = 1)
    + \int_X Y_i(0)\, d\{ F_X(X \mid T_i = 1) - F_X(X \mid T_i = 0) \}

  = \underbrace{\int_X \{ Y_i(1) - Y_i(0) \}\, dF_X(X \mid T_i = 1)}_{\text{ATT}}
    + \underbrace{\int_X Y_i(0)\, dF_X(X \mid T_i = 1) - \int_X Y_i(0)\, dF_X(X \mid T_i = 0)}_{\text{Selection Bias}}

- Sufficient condition: F_X(X | T_i = 1) = F_X(X | T_i = 0)

OVERVIEW OF TALK

The Logic of Matching

Propensity Score

Model Misspecification

Beyond Logistic Propensity Scores

PROPENSITY SCORE

- Notation:
  - T_i ∈ {0, 1}: binary treatment
  - X_i: pre-treatment covariates
  - (Y_i(1), Y_i(0)): potential outcomes
  - Y_i = Y_i(T_i): observed outcome
- Propensity score: the predicted probability of treatment assignment,

  π(X_i) = Pr(T_i = 1 | X_i)
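In practice the propensity score is usually estimated with a logistic regression of the treatment on the covariates. A minimal R sketch on simulated data (all names here are illustrative, not from the slides):

set.seed(1)
n  <- 500
X1 <- rnorm(n); X2 <- rnorm(n)
T  <- rbinom(n, 1, plogis(0.5 * X1 - 0.5 * X2))   # treatment depends on the covariates
dat <- data.frame(T = T, X1 = X1, X2 = X2)

## logistic propensity score model: pi(X_i) = Pr(T_i = 1 | X_i)
ps.fit     <- glm(T ~ X1 + X2, data = dat, family = binomial)
dat$pscore <- fitted(ps.fit)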

WHY RANDOMIZE?

- Assumptions:
  - Ignorability: Y_i(1), Y_i(0) ⊥⊥ T_i
  - Simple randomization: Pr(T_i = 1) = E(T_i) = p for all i

  E(Y_i T_i) = E(Y_i(1) T_i) = Y_i(1) E(T_i) = Y_i(1)\, p

  \Rightarrow\; Y_i(1) = E\!\left( \frac{Y_i T_i}{p} \right)
  \quad\Rightarrow\quad
  \widehat{Y_i(1)} = \frac{1}{N} \sum_{i=1}^{N} \frac{Y_i T_i}{p}

WHY RANDOMIZE?

- Noting \hat{p} = \frac{1}{N} \sum_{i=1}^{N} 1(T_i = 1):

  \widehat{Y_i(1)} = \frac{\sum_{i=1}^{N} Y_i\, 1(T_i = 1)}{\sum_{i=1}^{N} 1(T_i = 1)} = \overline{Y}_{T_i = 1}

- Symmetry implies an unbiased estimate of the ATE:

  \frac{\sum_{i=1}^{N} Y_i\, 1(T_i = 1)}{\sum_{i=1}^{N} 1(T_i = 1)} - \frac{\sum_{i=1}^{N} Y_i\, 1(T_i = 0)}{\sum_{i=1}^{N} 1(T_i = 0)}
  = \overline{Y}_{T_i = 1} - \overline{Y}_{T_i = 0}

WHAT IF T_i IS NOT FROM A SIMPLE RANDOMIZATION?

Assume Y_i(1), Y_i(0) are NOT independent of T_i. Suppose:

- π(X_i) = Pr(T_i = 1 | X_i), with a consistent estimate available
- Ignorability: Y_i(1), Y_i(0) ⊥⊥ T_i | X_i
- Common support: 0 < π(X_i) < 1

  E(Y_i T_i) = E(Y_i(1) T_i) = E\big( E(Y_i(1) T_i \mid X_i) \big)
             = E\big( Y_i(1)\, E(T_i \mid X_i) \big)   [using Y_i(1), Y_i(0) ⊥⊥ T_i | X_i]
             = E\big( Y_i(1) \Pr(T_i = 1 \mid X_i) \big)
             = E\big( Y_i(1)\, \pi(X_i) \big)

  \Rightarrow\; E(Y_i(1)) = E\!\left( \frac{Y_i T_i}{\pi(X_i)} \right)
  \quad\Rightarrow\quad
  \widehat{Y_i(1)} = \frac{1}{N} \sum_{i=1}^{N} \frac{Y_i T_i}{\pi(X_i)}

WEIGHTING BASED ESTIMATORS

- Estimating the ATE:

  ATE = E\!\left[ Y_i \left( \frac{T_i}{\pi(X_i)} - \frac{1 - T_i}{1 - \pi(X_i)} \right) \right],
  \qquad
  \widehat{ATE} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{Y_i T_i}{\pi(X_i)} - \frac{Y_i (1 - T_i)}{1 - \pi(X_i)} \right)

- Estimating the ATT:

  ATT = E_{T_i = 1}\!\left[ Y_i T_i - \frac{Y_i (1 - T_i)\, \pi(X_i)}{1 - \pi(X_i)} \right],
  \qquad
  \widehat{ATT} = \frac{1}{N_T} \sum_{i=1}^{N} \left( Y_i T_i - \frac{Y_i (1 - T_i)\, \pi(X_i)}{1 - \pi(X_i)} \right)
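A minimal sketch of these weighting estimators on simulated data (variable names are illustrative, not from the slides):

set.seed(2)
n     <- 1000
X     <- rnorm(n)
T     <- rbinom(n, 1, plogis(X))
Y     <- 1 + 2 * T + X + rnorm(n)                 # true ATE = ATT = 2
p.hat <- fitted(glm(T ~ X, family = binomial))    # estimated propensity score

## ATE: weight treated by 1/pi and controls by 1/(1 - pi)
ate.hat <- mean(Y * T / p.hat - Y * (1 - T) / (1 - p.hat))

## ATT: keep treated as-is, weight controls by pi/(1 - pi)
att.hat <- sum(Y * T - Y * (1 - T) * p.hat / (1 - p.hat)) / sum(T)

ate.hat
att.hat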

PROPENSITY SCORE WEIGHTING METHODS

Goal: Estimate outcome for treated observations, if all were treated

Y_i(1):              1    1    1    1    1    1    1    1   -1   -1   -1   -1   -1   -1   -1   -1
π(X_i):            0.4  0.4  0.4  0.4  0.4  0.4  0.4  0.4  0.8  0.8  0.8  0.8  0.8  0.8  0.8  0.8
Treatment (T_i):     1    1    1    0    0    0    0    0    1    1    1    1    1    1    0    1
H-T weight (1/π):  2.5  2.5  2.5  2.5  2.5  2.5  2.5  2.5 1.25 1.25 1.25 1.25 1.25 1.25 1.25 1.25

- Truth:  \frac{1}{N} \sum_{i=1}^{N} Y_i(1) = 0
- Sample average of the treated:  \frac{\sum_{i=1}^{N} Y_i T_i}{\sum_{i=1}^{N} T_i} = -0.4
- Horvitz-Thompson:  \frac{1}{N} \sum_{i=1}^{N} \frac{Y_i T_i}{\pi(X_i)} = -0.125
- IPW (normalized):  \frac{\sum_{i=1}^{N} Y_i T_i / \pi(X_i)}{\sum_{i=1}^{N} T_i / \pi(X_i)} = -0.077
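A small sketch reproducing the toy calculation above in R; the Horvitz-Thompson sum is shown under two normalizations (dividing by N and by the number of treated units):

Y1 <- c(rep( 1, 8), rep(-1, 8))              # potential outcomes Y_i(1)
p  <- c(rep(0.4, 8), rep(0.8, 8))            # true propensity scores pi(X_i)
T  <- c(1,1,1,0,0,0,0,0, 1,1,1,1,1,1,0,1)    # treatment indicators

mean(Y1)                                     # truth: 0
sum(Y1 * T) / sum(T)                         # sample average of the treated: -0.4
sum(Y1 * T / p) / length(Y1)                 # Horvitz-Thompson sum divided by N
sum(Y1 * T / p) / sum(T)                     # ... divided by the number of treated
sum(Y1 * T / p) / sum(T / p)                 # normalized IPW: about -0.077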

“DOUBLY ROBUST” ESTIMATOR

Doubly-robust estimators (Robins et al.):

  \frac{1}{N} \sum_{i=1}^{N} \left[
    \left\{ \hat{Y}(1, X_i) + \frac{T_i \big( Y_i - \hat{Y}(1, X_i) \big)}{\hat{\pi}(X_i)} \right\}
    -
    \left\{ \hat{Y}(0, X_i) + \frac{(1 - T_i) \big( Y_i - \hat{Y}(0, X_i) \big)}{1 - \hat{\pi}(X_i)} \right\}
  \right]

- Consistent if either the propensity model \hat{\pi}(X_i) or the outcome model \hat{Y}(T_i, X_i) is correct
  - If \hat{\pi}(X_i) is consistent, the estimator reduces to Horvitz-Thompson: the \hat{Y}(T_i, X_i) terms cancel
  - If \hat{Y}(T_i, X_i) is consistent, Y_i - \hat{Y}(T_i, X_i) is noise: the second and fourth terms in the summand go to zero
- Beware: no guarantees if neither is consistent
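A minimal sketch of this doubly robust (AIPW) estimator on simulated data (illustrative names only):

set.seed(3)
n <- 1000
X <- rnorm(n)
T <- rbinom(n, 1, plogis(X))
Y <- 1 + 2 * T + X + rnorm(n)                          # true ATE = 2

pi.hat <- fitted(glm(T ~ X, family = binomial))        # propensity model
m1 <- predict(lm(Y ~ X, subset = T == 1), newdata = data.frame(X = X))  # Y-hat(1, X)
m0 <- predict(lm(Y ~ X, subset = T == 0), newdata = data.frame(X = X))  # Y-hat(0, X)

dr.hat <- mean( (m1 + T * (Y - m1) / pi.hat) -
                (m0 + (1 - T) * (Y - m0) / (1 - pi.hat)) )
dr.hat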

COMPARING WEIGHTING ESTIMATORS

Comparing Four Estimators, Truth = 0

[Figure: Sampling distributions of the simple mean, Horvitz-Thompson, inverse probability weighting, and doubly robust estimators around the true value of 0.]

ROSENBAUMAND RUBIN (1983)

- Assumptions:
  1. Overlap: 0 < π(X_i) < 1
  2. Unconfoundedness: {Y_i(1), Y_i(0)} ⊥⊥ T_i | X_i
- The main result: the propensity score as a dimension reduction tool,

  {Y_i(1), Y_i(0)} ⊥⊥ T_i | π(X_i)

Rather than balancing every dimension of X_i, it is sufficient to balance/match only on π(X_i)!

MATCHING

1. Fit a propensity score model
2. Match each treated observation to the k untreated observations closest in propensity, or match to all untreated observations within some caliper (e.g., within ±2 percentage points)
3. Reuse observations, if necessary
4. Assess balance on the matched set; return to (1) if necessary

Some notes:

- No guidance on how many matches or what caliper size ⇒ common sense
- No guidance on how much balance is "good enough"
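These steps map onto the MatchIt package; a minimal sketch on the lalonde data that ships with MatchIt (the caliper of 0.1 is an arbitrary illustrative choice):

library(MatchIt)
data("lalonde", package = "MatchIt")

## Steps 1-2: logistic propensity score, nearest-neighbor matching with replacement,
## accepting matches only within a caliper on the propensity score.
m.out <- matchit(treat ~ age + educ + re74 + re75, data = lalonde,
                 method = "nearest", replace = TRUE, caliper = 0.1)

## Step 4: assess balance on the matched set; respecify and refit if needed.
summary(m.out)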

STANDARD ERRORS

Problem: incorporating the uncertainty from matching/weighting into your estimation.

- Why not just treat the matched sample or the weights as fixed?
  - You understate the standard errors!
- Weighting: use the delta method or the bootstrap
  - Lunceford and Davidian (Statistics in Medicine, 2004): tractable, but lots of algebra
- Matching: Abadie and Imbens (Econometrica, 2006 & 2008)
  - Can't bootstrap! Use:

  \hat{V}^{AI} = \frac{1}{N_T^2} \sum_{i=1}^{N} \big( Y_i(1) - \hat{Y}_i(0) - \hat{\tau} \big)^2
  + \frac{1}{2 N_T^2} \sum_{i=1}^{N} \big( K_i^2 - K_i \big) \big( Y_i - Y_{l(i)} \big)^2\, 1(T_i = 0)

  where
  - K_i: the number of times observation i is used as a match
  - Y_{l(i)}: the outcome of the closest matched observation with the same treatment status as i
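The Matching package reports Abadie-Imbens standard errors of this type; a hedged sketch on the lalonde data that ships with that package:

library(Matching)
data(lalonde)

X <- cbind(lalonde$age, lalonde$educ, lalonde$re74, lalonde$re75)
m <- Match(Y = lalonde$re78, Tr = lalonde$treat, X = X, estimand = "ATT", M = 1)

summary(m)   # point estimate with Abadie-Imbens standard error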

SENSITIVITY ANALYSIS

Idea: how large would an unobserved confounder have to be to switch the results?

- Γ: a measure of unobserved confounding, the odds that two observationally equivalent people are prone to different treatment conditions
- How large does Γ have to be to make your result non-significant?
- Formally,

  \frac{1}{\Gamma} \le \frac{\pi_i(X_i) / (1 - \pi_i(X_i))}{\pi_j(X_i) / (1 - \pi_j(X_i))} \le \Gamma

- Equivalently, assume the true propensity model is

  \Pr(T_i = 1) = \mathrm{logit}^{-1}\big( X_i \beta + \log(\Gamma)\, u_i \big),  \quad 0 \le u_i \le 1

- Larger Γ ⇒ more robust result
- Γ ≈ 6 for smoking; normally around 1.2 in social science
- Use the R package rbounds
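A hedged sketch of Rosenbaum bounds with rbounds, applied to a matched ATT estimate from the Matching package (continuing the earlier sketch):

library(Matching)
library(rbounds)
data(lalonde)

m <- Match(Y = lalonde$re78, Tr = lalonde$treat,
           X = cbind(lalonde$age, lalonde$educ, lalonde$re74, lalonde$re75),
           estimand = "ATT", M = 1)

## Wilcoxon signed rank p-values as the hidden-bias parameter Gamma grows
psens(m, Gamma = 2, GammaInc = 0.1)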

OVERVIEW OF TALK

The Logic of Matching

Propensity Score

Model Misspecification

Beyond Logistic Propensity Scores

PROPENSITY SCORE TAUTOLOGY

- The propensity score is unknown and must be estimated
  - The dimension reduction is purely theoretical: we must model T_i given X_i
  - Diagnostics: covariate balance checking
    - In theory: ellipsoidal covariate distributions ⇒ equal percent bias reduction
    - In practice: skewed covariates and ad hoc specification searches
- Model misspecification is always possible
  - Propensity score methods can be sensitive to misspecification
- Tautology: propensity score methods only work when they work

COVARIATE BALANCING PROPENSITY SCORE (CBPS)

- Idea: take advantage of the propensity score tautology
- Recall the dual characteristics of the propensity score:
  1. It predicts treatment assignment
  2. It balances covariates
- Implied conditions:
  1. Score condition (first derivative of the log-likelihood set to zero):

     E\!\left[ \frac{\partial \ell(\beta \mid T_i)}{\partial \beta} \right]
     = E\big[ X_i \big( T_i - \pi_\beta(X_i) \big) \big] = 0

  2. Balancing condition (weighted difference in means between treated and untreated set to zero):

     ATE:  E\!\left[ X_i \left( \frac{T_i}{\pi_\beta(X_i)} - \frac{1 - T_i}{1 - \pi_\beta(X_i)} \right) \right] = 0

     ATT:  E\!\left[ X_i \left( T_i - \frac{(1 - T_i)\, \pi_\beta(X_i)}{1 - \pi_\beta(X_i)} \right) \right] = 0

- CBPS uses the same propensity score model (e.g., logistic regression) but estimates it to best satisfy the conditions above

WEIGHTING CONTROL GROUP TO BALANCE COVARIATES

- Balancing condition (ATT):  E\!\left[ T_i X_i - \frac{\pi_\beta(X_i)(1 - T_i) X_i}{1 - \pi_\beta(X_i)} \right] = 0

[Figure: Distributions of the propensity score for control units and treated units, before weighting.]

WEIGHTING CONTROL GROUP TO BALANCE COVARIATES

- Balancing condition (ATT):  E\!\left[ T_i X_i - \frac{\pi_\beta(X_i)(1 - T_i) X_i}{1 - \pi_\beta(X_i)} \right] = 0

[Figure: The same distributions after ATT weighting; the weighted control units now line up with the treated units.]

WEIGHTING BOTH GROUPS TO BALANCE COVARIATES

- Balancing condition (ATE):  E\!\left[ \frac{T_i X_i}{\pi_\beta(X_i)} - \frac{(1 - T_i) X_i}{1 - \pi_\beta(X_i)} \right] = 0

[Figure: After ATE weighting, both the weighted treated and weighted control distributions are adjusted toward a common distribution.]

GENERALIZED METHOD OF MOMENTS (GMM) ESTIMATION

- Over-identification: more moment conditions than parameters
- GMM (Hansen 1982):

  \hat{\beta}_{GMM} = \operatorname*{argmin}_{\beta \in \Theta}\;
  \bar{g}_\beta(T, X)^\top\, \Sigma_\beta(T, X)^{-1}\, \bar{g}_\beta(T, X),
  \qquad
  \bar{g}_\beta(T, X) = \frac{1}{N} \sum_{i=1}^{N}
  \underbrace{\begin{pmatrix} \text{score condition} \\ \text{balancing condition} \end{pmatrix}}_{g_\beta(T_i, X_i)}

- "Continuous updating" GMM estimator with

  \Sigma_\beta(T, X) = \frac{1}{N} \sum_{i=1}^{N} E\big( g_\beta(T_i, X_i)\, g_\beta(T_i, X_i)^\top \mid X_i \big)

- Newton-type optimization algorithm with the MLE as starting values

SPECIFICATION TEST AND OPTIMAL MATCHING

- CBPS is overidentified
- Specification test based on Hansen's J-statistic:

  J = n\, \bar{g}_\beta(T, X)^\top\, \Sigma_\beta(T, X)^{-1}\, \bar{g}_\beta(T, X) \;\sim\; \chi^2_k,

  where k is the number of moment conditions

- Can also be used to select matching estimators
- Example: optimal 1-to-N matching
  - Assume N control units are matched with each treated unit
  - Calculate the J statistic, downweighting matched control units with weight 1/N
  - Choose N such that the J statistic is minimized

SIMULATION STUDY OF WEIGHTING ESTIMATORS

KANG AND SCHAFER (2007, Statistical Science)

- Simulation study: the deteriorating performance of propensity score weighting methods when the model is misspecified
- Can the CBPS save propensity score weighting methods?
- 4 covariates X_i^*: all are i.i.d. standard normal
- Outcome model: linear model
- Propensity score model: logistic model with linear predictors
- Misspecification induced by measurement error:
  - X_{i1} = exp(X_{i1}^* / 2)
  - X_{i2} = X_{i2}^* / (1 + exp(X_{i1}^*)) + 10
  - X_{i3} = (X_{i1}^* X_{i3}^* / 25 + 0.6)^3
  - X_{i4} = (X_{i1}^* + X_{i4}^* + 20)^2

WEIGHTING ESTIMATORS EVALUATED

1. Horvitz-Thompson (HT):

   \frac{1}{n} \sum_{i=1}^{n} \left( \frac{T_i Y_i}{\hat{\pi}(X_i)} - \frac{(1 - T_i) Y_i}{1 - \hat{\pi}(X_i)} \right)

2. Inverse-probability weighting with normalized weights (IPW): same as HT, but with normalized weights

3. Weighted least squares regression (WLS): with HT weights

4. Doubly-robust least squares regression (DR): consistently estimates the ATE if either the outcome or the propensity score model is correct

WEIGHTING ESTIMATORS WITH CORRECT MODEL

                                     Bias               RMSE
Sample size    Estimator        GLM      True       GLM      True
(1) Both models correct
n = 200        HT              -0.01     0.68      13.07     23.72
               IPW             -0.09    -0.11       4.01      4.90
               WLS              0.03     0.03       2.57      2.57
               DR               0.03     0.03       2.57      2.57
n = 1000       HT              -0.03     0.29       4.86     10.52
               IPW             -0.02    -0.01       1.73      2.25
               WLS             -0.00    -0.00       1.14      1.14
               DR              -0.00    -0.00       1.14      1.14
(2) Propensity score model correct
n = 200        HT              -0.32    -0.17      12.49     23.49
               IPW             -0.27    -0.35       3.94      4.90
               WLS             -0.07    -0.07       2.59      2.59
               DR              -0.07    -0.07       2.59      2.59
n = 1000       HT               0.03     0.01       4.93     10.62
               IPW             -0.02    -0.04       1.76      2.26
               WLS             -0.01    -0.01       1.14      1.14
               DR              -0.01    -0.01       1.14      1.14

WEIGHTING ESTIMATORS WITH INCORRECT MODEL

                                     Bias               RMSE
Sample size    Estimator        GLM      True       GLM      True
(3) Outcome model correct
n = 200        HT              24.72     0.25     141.09     23.76
               IPW              2.69    -0.17      10.51      4.89
               WLS             -1.95     0.49       3.86      3.31
               DR               0.01     0.01       2.62      2.56
n = 1000       HT              69.13    -0.10    1329.31     10.36
               IPW              6.20    -0.04      13.74      2.23
               WLS             -2.67     0.18       3.08      1.48
               DR               0.05     0.02       4.86      1.15
(4) Both models incorrect
n = 200        HT              25.88    -0.14     186.53     23.65
               IPW              2.58    -0.24      10.32      4.92
               WLS             -1.96     0.47       3.86      3.31
               DR              -5.69     0.33      39.54      3.69
n = 1000       HT              60.60     0.05    1387.53     10.52
               IPW              6.18    -0.04      13.40      2.24
               WLS             -2.68     0.17       3.09      1.47
               DR             -20.20     0.07     615.05      1.75

REVISITING KANG AND SCHAFER (2007)

                                Bias                               RMSE
Estimator        GLM   Balance   CBPS    True       GLM   Balance   CBPS    True
(1) Both models correct, n = 200
HT             -0.01     2.02    0.73    0.68     13.07     4.65    4.04   23.72
IPW            -0.09     0.05   -0.09   -0.11      4.01     3.23    3.23    4.90
WLS             0.03     0.03    0.03    0.03      2.57     2.57    2.57    2.57
DR              0.03     0.03    0.03    0.03      2.57     2.57    2.57    2.57
(1) Both models correct, n = 1000
HT             -0.03     0.39    0.15    0.29      4.86     1.77    1.80   10.52
IPW            -0.02     0.00   -0.03   -0.01      1.73     1.44    1.45    2.25
WLS            -0.00    -0.00   -0.00   -0.00      1.14     1.14    1.14    1.14
DR             -0.00    -0.00   -0.00   -0.00      1.14     1.14    1.14    1.14
(2) Propensity score model correct, n = 200
HT             -0.32     1.88    0.55   -0.17     12.49     4.67    4.06   23.49
IPW            -0.27    -0.12   -0.26   -0.35      3.94     3.26    3.27    4.90
WLS            -0.07    -0.07   -0.07   -0.07      2.59     2.59    2.59    2.59
DR             -0.07    -0.07   -0.07   -0.07      2.59     2.59    2.59    2.59
(2) Propensity score model correct, n = 1000
HT              0.03     0.38    0.15    0.01      4.93     1.75    1.79   10.62
IPW            -0.02    -0.00   -0.03   -0.04      1.76     1.45    1.46    2.26
WLS            -0.01    -0.01   -0.01   -0.01      1.14     1.14    1.14    1.14
DR             -0.01    -0.01   -0.01   -0.01      1.14     1.14    1.14    1.14

CBPS MAKES WEIGHTING METHODS WORK BETTER

                                Bias                               RMSE
Estimator        GLM   Balance   CBPS    True       GLM   Balance   CBPS    True
(3) Outcome model correct, n = 200
HT             24.72     0.33   -0.47    0.25    141.09     4.55    3.70   23.76
IPW             2.69    -0.71   -0.80   -0.17     10.51     3.50    3.51    4.89
WLS            -1.95    -2.01   -1.99    0.49      3.86     3.88    3.88    3.31
DR              0.01     0.01    0.01    0.01      2.62     2.56    2.56    2.56
(3) Outcome model correct, n = 1000
HT             69.13    -2.14   -1.55   -0.10   1329.31     3.12    2.63   10.36
IPW             6.20    -0.87   -0.73   -0.04     13.74     1.87    1.80    2.23
WLS            -2.67    -2.68   -2.69    0.18      3.08     3.13    3.14    1.48
DR              0.05     0.02    0.02    0.02      4.86     1.16    1.16    1.15
(4) Both models incorrect, n = 200
HT             25.88     0.39   -0.41   -0.14    186.53     4.64    3.69   23.65
IPW             2.58    -0.71   -0.80   -0.24     10.32     3.49    3.50    4.92
WLS            -1.96    -2.01   -2.00    0.47      3.86     3.88    3.88    3.31
DR             -5.69    -2.20   -2.18    0.33     39.54     4.22    4.23    3.69
(4) Both models incorrect, n = 1000
HT             60.60    -2.16   -1.56    0.05   1387.53     3.11    2.62   10.52
IPW             6.18    -0.87   -0.72   -0.04     13.40     1.86    1.80    2.24
WLS            -2.68    -2.69   -2.70    0.17      3.09     3.14    3.15    1.47
DR            -20.20    -2.89   -2.94    0.07    615.05     3.47    3.53    1.75

CBPS SACRIFICES LIKELIHOOD FOR BETTER BALANCE

[Figure: Scatterplots comparing GLM and CBPS fits on log-likelihood (left) and covariate imbalance (right), when neither model and when both models are specified correctly. CBPS gives up a small amount of likelihood relative to the GLM fit in exchange for substantially lower covariate imbalance.]

SOFTWARE: R PACKAGE CBPS

## upload the package
library("CBPS")

## load the LaLonde data
data(LaLonde)

## Estimate ATT weights via CBPS
fit <- CBPS(treat ~ age + educ + re75 + re74 + I(re75 == 0) + I(re74 == 0),
            data = LaLonde, ATT = TRUE)
summary(fit)

## matching via MatchIt
library(MatchIt)

## one to one nearest neighbor with replacement
m.out <- matchit(treat ~ 1, distance = fitted(fit), method = "nearest",
                 data = LaLonde, replace = TRUE)
summary(m.out)

THE CAUSAL EFFECT OF HEALTH INSURANCE IN CHINA

- Shiro's data: "China increased its insurance coverage from 20% in 2002 to 95% in 2012, but this percentage alone says nothing of whether the content of the insurance is reliable or sufficient, particularly in rural areas where insurance is difficult to implement... In sum, the CHNS data indicates improvement in certain health seeking behavior-related variables, but the metric of health seeking behavior in the survey is insufficient to draw conclusions about the new scheme's impact on health status and equality to access."
- What was the impact of health insurance?

PROPENSITY SCORE

- Example:
  - T_i ∈ {0, 1}: gets insurance or not
  - X_i: income, mean gender, mean age, etc.
  - (Y_i(1), Y_i(0)): city with insurance and the same city without
  - Y_i = Y_i(T_i): the outcome we observe
- Design:
  - Consider cities with treatment in 2000
  - Find similar (matched) cities without treatment in 2000
  - Estimate a difference-in-differences effect
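A hedged sketch of this design on simulated city-level data (all variable names and the $weights element of the CBPS fit are assumptions for illustration, not the actual CHNS variables): CBPS weights for insurance adoption, then a weighted difference-in-differences.

library(CBPS)

set.seed(4)
n <- 200
cities <- data.frame(logInc   = rnorm(n, 5),
                     avAge    = rnorm(n, 38, 3),
                     avFem    = runif(n, 0.45, 0.55),
                     avFarmer = runif(n))
cities$treat2000 <- rbinom(n, 1, plogis(-2 + 0.4 * cities$logInc))
cities$y1997     <- rnorm(n)
cities$y2003     <- cities$y1997 + 0.5 * cities$treat2000 + rnorm(n)

cb <- CBPS(treat2000 ~ logInc + avAge + avFem + avFarmer, data = cities, ATT = TRUE)
w  <- cb$weights                        # balancing weights from the CBPS fit

tr   <- cities$treat2000 == 1
gain <- cities$y2003 - cities$y1997
did  <- weighted.mean(gain[tr], w[tr]) - weighted.mean(gain[!tr], w[!tr])
did                                     # weighted difference-in-differences estimate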

NON-RANDOM TREATMENT ASSIGNMENT

Poor balance along pre-treatment covariates

[Figure: Densities of the pre-treatment covariates (avHsHedeg, avAnyEd, avFarmer, avAge, logInc, avFem) for cities with and without insurance; the two groups differ visibly.]


CBPS

CBPS improves balance

[Figure: The same covariate densities with the CBPS-matched group overlaid; the matched group tracks the insured group much more closely.]

RESULTS WITH LOGISTIC REGRESSION

[Figure: The same covariate densities with the group matched on a plain logistic-regression propensity score, for comparison.]

OVERVIEW OF TALK

The Logic of Matching

Propensity Score

Model Misspecification

Beyond Logistic Propensity Scores

ALTERNATIVE 1: SUBCLASSIFICATION

Imai and Van Dyk, 2004; Cochran and Rubin 1973

- Subclassification: break confounding through subsetting
  - Estimate the propensity score
  - Split at its deciles
  - Take the unweighted difference-in-means in each decile, or take weights as 1/(# in decile)
  - Aggregate up
- Easy! Works!
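A minimal sketch of this recipe on simulated data: decile subclassification, within-decile differences in means, then a size-weighted average.

set.seed(5)
n <- 2000
X <- rnorm(n)
T <- rbinom(n, 1, plogis(X))
Y <- 1 + 2 * T + X + rnorm(n)                         # true effect = 2

ps     <- fitted(glm(T ~ X, family = binomial))
decile <- cut(ps, quantile(ps, 0:10 / 10), include.lowest = TRUE, labels = FALSE)

## difference in means within each decile, then aggregate with decile sizes as weights
diffs <- sapply(1:10, function(d) {
  idx <- decile == d
  mean(Y[idx & T == 1]) - mean(Y[idx & T == 0])
})
sizes <- tabulate(decile, nbins = 10)
sum(diffs * sizes / sum(sizes))                       # subclassification estimate of the ATE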

SUBCLASSIFICATION FOR CONTINUOUS TREATMENT

Subclassify on deciles, quartiles, etc. of a model predicting a continuous treatment (any GLM).

- Outcome: medical expenditures
- Treatment: smoking (pack-years)
- Confounders: smoking duration (years) and frequency (when smoking); age, race, education, etc.

[Figures 3 and 4 from Imai and van Dyk (2004), "Generalizing the Propensity Score": subclassifying on the estimated propensity functions for the duration and frequency of smoking sharply reduces the within-subclass correlations between the two treatment variables and the covariates; within-subclass Gaussian regressions then give the estimated causal effects of smoking on medical expenditure in each cell of a 3 x 3 table of subclasses.]

SUBCLASSIFICATION FOR CONTINUOUS TREATMENT

- Estimating the causal effect with a GAM
- No need for other covariates if balance is achieved!

[Figure 2 and Table 2 from Imai and van Dyk (2004): the estimated causal effect of log(packyear) on medical expenditure from a smooth coefficient model, plotted as a function of the estimated propensity function, alongside estimates from 3-subclass and 10-subclass subclassification and from standard regression models.]

COARSENED EXACT MATCHING

Iacus, King, and Porro 2011

- Form multidimensional strata of X through coarsening
- Example: age into 20-30, 31-40, 41-50, and so on; schooling into 0-8, 9-12, Bachelors, etc.
- Give treated observations weight 1; give untreated observations weight
  ({# total matched controls} / {# total matched treated}) x ({# matched treated in stratum} / {# matched controls in stratum})
- Advantages: easy to understand
- Problems:
  - Sensitivity to stratifying choices
  - Drops treated observations: a local ATT, not the ATT! (extrapolation for the ATT is often not successful)
  - Problems with irrelevant covariates: we should be more concerned about balance on covariates that predict the treatment!
- Takeaway: trim on empirical support for the most relevant confounders
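A hedged sketch of coarsened exact matching through MatchIt's "cem" method, on the lalonde data that ships with MatchIt (automatic coarsening here; custom cutpoints can be supplied instead):

library(MatchIt)
data("lalonde", package = "MatchIt")

cem.out <- matchit(treat ~ age + educ + re74 + re75, data = lalonde, method = "cem")
summary(cem.out)

## the returned weights implement the stratum-level weighting described above
md <- match.data(cem.out)
head(md[, c("treat", "weights")])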

CAUSAL EFFECT OF SLAVERY

Does slavery have an effect on contemporary voting behavior in the South? (link)

1. If we imagined Southern counties without slavery, what would contemporary voting behavior look like?
2. How do we know if it's slavery or simply being in the South?

Question 1: compare adjacent counties with an 1860 slave proportion of the population above and below 50%.

Question 2: compare slave counties in the South to similar counties in the North.

                          Prop. Democrat        Affirm. Action        Racial Resentment
                           (1)       (2)         (3)       (4)          (5)       (6)
Prop. Slave, 1860        -0.225**              -0.216**                0.779**
                         (0.076)               (0.071)                (0.299)
Slave State                         0.005                 0.008                   0.048
                                   (0.034)               (0.031)                 (0.103)

State Fixed Effects        Yes                   Yes                    Yes
1860 Covariates            Yes       Yes         Yes       Yes          Yes       Yes
50% Threshold Match        Yes                   Yes                    Yes
North-South Match                    Yes                   Yes                    Yes

N                          329       181         329       181          274       171
R^2                      0.377     0.079       0.239     0.090        0.345     0.114

* p < .05

Table note: Columns 1, 3, and 5 show results of WLS regressions with state fixed effects and the 1860 covariates for only the subset of counties that border a county in which the proportion slave lies on the other side of the 50% threshold. Columns 2, 4, and 6 show the difference between slave-state counties with very few slaves (less than 3% of the 1860 population) and non-Southern counties matched on geography, farm value per capita, and total population. Coefficients are from a WLS regression on the matched data that includes a dummy variable for Slave State as well as the 1860 covariates. In all regressions, weights are the within-county sample sizes. Standard errors are in parentheses.

[Accompanying text from the paper: the authors restrict the data to slave-state counties where fewer than 3% of the 1860 population was enslaved and match them to similar non-slave-state counties on geography (latitude/longitude), farm value per capita, and total county population, using coarsened exact matching with the default cutpoints (Iacus, King, and Porro). The matched non-slave-state counties come from Arizona, New Jersey, New Mexico, New York, Ohio, Oklahoma, and Pennsylvania; the slave-state counties come from Arkansas, Delaware, Kentucky, Maryland, Texas, Virginia, and West Virginia.]

ALTERNATIVE 2: GENETIC MATCHING

Diamond and Sekhon, Review of Economics and Statistics

- Specify a measure of discrepancy
  - Default: negative p-value across all marginal difference-in-means and Kolmogorov-Smirnov statistics
- Minimize this discrepancy using a genetic algorithm
  - Gives weights; specifies rules for more or less "fit" weights
  - More "fit" weights reproduce and interact
  - Repeat across generations
- Pluses: kind of awesome, generally works
- Minuses: sensitivity to initial conditions (number of generations; population size); no ; small p-values may arise due to small sample sizes!
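A hedged sketch with GenMatch()/Match() from the Matching package (requires the rgenoud package; the small pop.size is only to keep the demo fast):

library(Matching)
data(lalonde)

X <- cbind(lalonde$age, lalonde$educ, lalonde$re74, lalonde$re75)

## search for covariate weights that optimize balance, then match with them
gen <- GenMatch(Tr = lalonde$treat, X = X, BalanceMatrix = X, pop.size = 50)
gm  <- Match(Y = lalonde$re78, Tr = lalonde$treat, X = X, Weight.matrix = gen)

MatchBalance(lalonde$treat ~ lalonde$age + lalonde$educ + lalonde$re74 + lalonde$re75,
             match.out = gm, nboots = 100)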

ALTERNATIVE 3: ENTROPY BALANCING

Hainmueller 2012

- Targets the balancing weights explicitly. Problem: find w_i to satisfy

  \frac{\sum_{i=1}^{N} X_i\, 1(T_i = 1)}{\sum_{i=1}^{N} 1(T_i = 1)}
  = \frac{\sum_{i=1}^{N} w_i X_i\, 1(T_i = 0)}{\sum_{i=1}^{N} w_i\, 1(T_i = 0)}

  such that

  \underbrace{\sum_{i=1}^{N} w_i \log w_i\, 1(T_i = 0)}_{\text{entropy of weights}}

  is as small as possible
- Without the entropy constraint (and with more untreated observations than covariates) ⇒ an infinite number of weights balance exactly!
- Minimum entropy: find weights that achieve perfect in-sample balance in means while staying as close to 1 as possible
- Problems: ; Advantages: guarantees perfect in-sample balance
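A hedged sketch with the ebal package (assuming its ebalance(Treatment, X) interface and the $w element of the fit), on simulated data:

library(ebal)

set.seed(6)
n <- 500
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))
T <- rbinom(n, 1, plogis(0.5 * X[, 1]))

eb <- ebalance(Treatment = T, X = X)
w  <- eb$w                                     # weights for the control units

## the weighted control means now match the treated means in sample
colMeans(X[T == 1, ])
apply(X[T == 0, ], 2, weighted.mean, w = w)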

ALTERNATIVE 4: INVERSE-PROBABILITY TILTING

Graham et al. 2008

- Same as entropy balancing, but identifies the weights by constructing them as functions of logistic-regression estimated probabilities
- Equivalent to CBPS, but using only the balance conditions
- Gives perfect in-sample mean balance; assumes a model for the propensity score

ALTERNATIVE 5: SYNTHETIC MATCHING

Abadie, Diamond, Hainmueller 2010

- Entropy balancing, but with a single treated observation
- Balance on covariates before the intervention to obtain weights
- Compare the evolution of:
  - The treated observation
  - The synthetic treated observation (a weighted average of controls)

EXAMPLE

- Treatment: Stand-Your-Ground law in Florida
- Outcome: firearm deaths
- Synthetic matching: weight untreated observations to look like the pre-treatment treated observation
- Placebo test: permute the treatment

ALTERNATIVE 6: SUPPORT VECTOR MACHINE MATCHING

Assume a model X_i^T β.

Transform X_i by centering on the treated-group mean:

  X_i^* = X_i - \underbrace{\frac{\sum_i X_i \cdot 1(T_i = 1)}{\sum_i 1(T_i = 1)}}_{\text{mean of treated } X_i}

Transform T_i from {0, 1} to {-1, 1}:

  T_i^* = 2 T_i - 1

IDENTIFYING A BALANCED SUBSET OF THE DATA

Define the "hinge loss" |z|_+ = max(z, 0):

  L(β) = \sum_i \big| 1 - T_i^* X_i^{*\top} \beta \big|_+

Examples:

- Easy to classify: T_i^* = 1, X_i^{*\top} β = 2:  |1 - 1 · 2|_+ = |-1|_+ = 0
- Hard to classify: T_i^* = -1, X_i^{*\top} β = -0.5:  |1 - (-1) · (-0.5)|_+ = |0.5|_+ = 0.5

A constraint identifies the ATT.

IDENTIFYING A BALANCED SUBSET OF THE DATA

Define M = { i : 1 - T_i^* X_i^{*\top} β > 0 }. Taking and expanding the first order condition:

  \frac{\partial}{\partial \beta} \sum_i \big| 1 - T_i^* X_i^{*\top} \beta \big|_+
  = -\sum_i T_i^* X_i^*\, 1(i \in M) = 0

  \sum_i X_i^*\, 1(T_i = 0,\, i \in M) = \sum_i X_i^*\, 1(T_i = 1) = 0,

where the last equality holds since X_i^* is centered on the treated observations.

By the Law of Large Numbers:

  E(X_i | T_i = 1) = E(X_i | T_i = 0, i ∈ M)

Therefore, all marginal observations are balanced-in-mean.

THE BINARY TREATMENT SVM

Balance-in-means is not balance in distribution.

To achieve joint independence:

- Replace X_i^{*\top} β with η^*(X_i)
- η^*(X_i) is a smoothing spline
- Observations in M are balanced-in-distribution

Advantages:

- Identifies the largest balanced subset of the data
- Extension to continuous treatments
- Bayesian implementation ⇒ posterior credible intervals for the PATT and SATT