SUMMARY OF RIGOROUS IMPACT EVALUATION METHODOLOGIES Identifying Assumption Data Requirements Advantages Disadvantages Randomized Controlled No systematic differences At least post-intervention  Weakest  Spillovers can Trial in unobservable data on randomly assigned identifying invalidate control group characteristics between members of treated & assumption of 4  Might be politically treated & control groups control groups (baseline methods (as long difficult to argue in favor (bolstered by showing that helps make identifying as randomization of randomization there are no systematic assumption plausible) successful)  External validity can differences in observable  When budget be questionable (was the characteristics between prevents program study population treated & control groups) from being representative of the provided to population served at everyone scale?) simultaneously, randomizing (either timing or in the absolute sense of who does and does not get the program) is sometimes viewed as the fairest way to allocate scarce resources Difference-in-Difference Whatever happened to the At least pre- & post- Might be possible to do Identifying assumption is control group over time is intervention cross sections with existing survey data relatively strong (need to what would have including both treated & consider all other possible happened to the treated control (additional pre- differences between the group in the absence of the intervention data helps groups that could have led program make identifying to the observed outcomes) assumption plausible) Regression Discontinuity Those just above the Data on threshold When threshold is built Only identifies program threshold (the treated) and qualification (already into program design there impact for those at the those just below (the exists) & outcomes is no need to do anything threshold control) are otherwise special for evaluation identical Matching After controlling for Rich data on as many Might be possible to do Strong identifying observables, treated & observable characteristics with existing survey data assumption (even if control are not as possible, large sample they’re otherwise identical, systematically different size (so that it is possible why did some get program to find appropriate match) and others not?)