
Guest Lecture

Patricio S. Dalton, Tilburg University

Centre for Development Studies, University of Glasgow

7 March 2019

Who am I?

- Born and raised in Argentina. Big supporter of San Lorenzo (Buenos Aires football team)
- 18 years ago I moved to Europe
- Studied MSc and PhD in Economics at Warwick (met Theo Koutmeridis and Sayantan Ghosal there)
- Associate Professor at Tilburg University, The Netherlands
- Research on Behavioral Development Economics & Psychology of Poverty
  - Theory: Poverty and aspirations, welfare implications of bounded rationality
  - Lab: Goals, rationality, stress, hormones, self-confidence, generosity
  - Lab-in-the-field experiments: Unemployment benefits (Colombia)
  - RCTs:
    - Mobile money as a payment instrument (Kenya)
    - Goal setting and productivity (Ghana)
    - Business practices and aspirations (Indonesia)
    - Financial worries and risk preferences (Vietnam)
    - Empowerment and accountability (India)

Structure of today's lecture

Block 1: Conceptual overview of RCTs
• What?
• Why?
• How?
• Main challenges

Block 2: Practical example of an RCT

• Dalton, Ruschenpohler, Uras and Zia (2018). "Learning Best Practices from Peers", working paper.

References

• Banerjee and Duflo (2009). “The Experimental Approach to Development Economics”, Annual Review of Economics, 1:151–78

• Bruhn and McKenzie (2009). "In Pursuit of Balance: Randomization in Practice in Development Field Experiments", American Economic Journal: Applied Economics, 1:4, 200–232

• Duflo, Glennerster and Kremer (2008). "Using Randomization in Development Economics Research: A Toolkit". In T. Schultz and J. Strauss, eds., Handbook of Development Economics, Vol. 4. Amsterdam and New York: North-Holland.

• Harrison and List (2004) “Field Experiments”, Journal of Economic Literature, 42:4, 1009-1055.

• Imbens and Angrist (1994). “Identification and Estimation of Local Average Treatment Effects.” Econometrica, 62:2, 467-475

• List (2011). "Why Economists Should Conduct Field Experiments and 14 Tips for Pulling One Off", Journal of Economic Perspectives, 25:3, 3–16

Why RCTs? Dogs, sunrise, umbrellas and rain

Observation I: Every morning I let my dog out, and then shortly after the sun comes up.
Observation II: When people use umbrellas, it rains.

Do dogs make the sun rise? Do umbrellas make it rain?

Policy recommendation: Distribute umbrellas in regions with drought!

Correlation Fallacy

Correlation fallacy: the logical mistake of believing that because two events occurred together, there is a cause-effect relationship.

• People who play a musical instrument have higher IQ. Playing an instrument → increases IQ? People with high IQ → play an instrument?

• People who speak more languages are wealthier. Wealthier people → study more languages? Speaking more languages → wealthier?

Policy implications are very different!

Messerli (2012). "Chocolate Consumption, Cognitive Function, and Nobel Laureates", New England Journal of Medicine, 367:1562–1564

The Problems of Causal Inference

• Causal impact → counterfactual
• What is the effect of a college degree on future earnings?
• How much would you earn without a college degree?
• How much would people who do not have a college degree earn if they had one?
• Comparing people over time (before and after) will not give us, in most cases, a reliable estimate of impact. Why?
  1. Unobserved factors affecting earnings may (and will) change during and after receiving the education.
  2. Going to college is a decision influenced by you or others. Individuals who go to school may differ from those who do not go, in aspects that also affect future earnings → Selection

Selection Bias

• Selection bias arises when individuals are selected (or self-selected) for treatment based on (typically unobserved) characteristics that may also affect their outcomes.
  - Positive: those who come to school are more motivated → overestimate the effect
  - Negative: those who come to school are less motivated → underestimate the effect
• It is difficult to disentangle the impact of the treatment from the factors that drove selection.
• Selection bias is a problem endemic to retrospective evaluation.
• There are many econometric ways of addressing this problem. For example:
  • Find an instrument Z for the endogenous variable X (IV approach)
    - Cor(X, Z) ≠ 0
    - Cor(Z, ε) = 0
  • Assign X randomly (RCT)
    - Addresses the endogeneity caused by unobserved factors and selection bias → creates a counterfactual by design

Types of experiments (Harrison and List, 2004)

• Conventional lab experiment: students, abstract framing, imposed set of rules
• Artefactual field experiment: idem conventional, but with a non-student subject pool
• Framed field experiment (or lab-in-the-field): idem artefactual, but with field context
• Natural field experiment (or RCT): idem framed, but the environment is one where the subjects naturally undertake these tasks and the subjects do not know that they are in an experiment.

• RCTs:
  • To evaluate the impact of existing programs
  • To test economic model predictions

Practicalities of RCTs

Overview: Understanding and Analyzing RCTs

1. Ethics Review Board (ERB) approval & pre-analysis plan registration
2. Timeline of an RCT and data
3. Randomization
   • How do we actually do the randomization?
   • Simple vs stratified
4. Analysis of treatment effects
   • How to estimate the effects of the treatment once you have run the RCT?
   • ITT (ATE) vs TOT (LATE)
   • HTEs
5. Precision of estimates
   • How can we improve the precision of the estimated treatment effects?
   • ANCOVA
6. Internal and external validity

ERB and PAP

• Ethics Review Board (ERB) or Institutional Review Board (IRB)
  • Administrative body established to protect the rights and welfare of human research subjects.
  • Reviews the whole project (including potential risks, damages, etc.)
  • How will data and privacy be protected? Informed consent? Data management protocols?
  • Is it fair to randomize?

• Pre-analysis Plan (PAP)
  • Register the project before implementing it: question, method, hypotheses, analysis.
  • Adds transparency. Separates exploratory from confirmatory analyses.
  • American Economic Association RCT Registry
  • https://aspredicted.org/
  • Berkeley Initiative for Transparency in the Social Sciences (BITSS)

RCT Timeline

RCT: Data and Timeline

Typical Study

Listing → Baseline → Randomization → Intervention → Midline → Endline → Long-term effects?

• Listing: a short census of the population
• Baseline survey: typically a 60–90 minute survey (not strictly necessary)
• Randomization: assign units (individuals, households, villages, etc.) to treatment/control
• Midline: short-term (6 to 12 months) measure of outcome variables (not strictly necessary)
• Endline: measure of outcomes after 12 to 24 months
• Long-term follow-up: measure of outcomes after 4 or more years

How to randomize?

Randomization from scratch: Example

• Listing exercise (e.g. 100 people)
• Randomly select 20 people into the study
• Baseline survey
[Example table: the 20 selected individuals, with ID, Gender, Age, Income, Married, and Wellbeing.]

Randomization from scratch

• Create the variables you may want to use later to stratify the randomization (e.g. an above-median-age indicator).
• Assign a random number to each ID (e.g. using RAND() in Excel, uniform() in Stata, runif(1) in R).
• Note: if you do it in Excel, copy and paste the random numbers as "values".
[Example table: the 20 IDs with Gender, Age, the above-median-age indicator, and the assigned random number.]

Randomization from scratch

• Order the data by the random number (e.g. in ascending order).
• Assign the first 10 IDs to "Control" (= 0) and the second 10 IDs to "Treatment" (= 1).
[Example table: the 20 IDs sorted by random number, with the first half marked Control and the second half Treatment.]
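The same from-scratch procedure can be written in a few lines of R. This is a minimal sketch using simulated individuals (the IDs, ages and genders here are made up, not the slide's data):

```r
# Minimal sketch in R of the randomization-from-scratch steps above
set.seed(123)                                  # fix the seed so the draw is reproducible

n  <- 20
df <- data.frame(id     = 1:n,
                 gender = rbinom(n, 1, 0.5),   # hypothetical covariates
                 age    = sample(20:80, n, replace = TRUE))

df$rand  <- runif(n)                           # one uniform random number per ID
df       <- df[order(df$rand), ]               # order by the random number
df$treat <- rep(c(0, 1), each = n / 2)         # first 10 -> control (0), last 10 -> treatment (1)

table(df$treat)                                # 10 control, 10 treatment
```

Two types of randomization: Simple or Stratified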

Simple randomization
• Draw a random number from a uniform distribution for each observation
• Order the random numbers
• Set cut-off points (first half → control, second half → treatment)

Do you see any problem with this simple randomization?
• If N is small enough, simple randomization can deliver unbalanced samples in characteristics that could be correlated with the expected treatment outcome!

Let's check the results of our randomization:

                          Control   Treatment   p-value
Female = 1 (proportion)   80%       70%         1.00  (Fisher exact test)
Age in years (mean)       46        63.3        0.05* (t-test)

Do you see any problem?

Two types of randomization: Simple or Stratified

Stratified (block) randomization

• Randomization is performed separately within each stratum. Example of strata: below vs. above median age.

[Example: the sample is split into two strata, people below median age and people above median age; a random number is drawn and treatment is assigned separately within each stratum, so that half of each stratum ends up in treatment and half in control.]
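A minimal R sketch of the same idea with stratification; the data are again simulated, and above_med is an illustrative stratification variable:

```r
# Minimal sketch of stratified (block) randomization in R (simulated individuals)
set.seed(456)

n  <- 20
df <- data.frame(id = 1:n, age = sample(20:80, n, replace = TRUE))

df$above_med <- as.integer(df$age > median(df$age))   # stratum: above vs below median age
df$rand      <- runif(n)                              # one random number per ID

# within each stratum, the half with the larger random numbers goes to treatment (=1)
df$treat <- ave(df$rand, df$above_med,
                FUN = function(r) as.integer(rank(r, ties.method = "first") > length(r) / 2))

table(df$above_med, df$treat)                         # roughly half treated within each stratum
```

Two types of randomization: Simple or Stratified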

Did this create balance in our sample?

                          Control   Treatment   p-value
Female = 1 (proportion)   80%       70%         1.00
Age in years (mean)       55.2      54.1        0.90

The sample is now balanced across treatment and control groups in both observable variables, thanks to stratified randomization!

You can stratify by more variables (e.g. race, initial ability, teachers, gender, etc.).

[Example table: the 20 IDs with their stratum, random number, and stratified treatment assignment.]

Two types of randomization: Simple or Stratified

FAQs on stratified randomization
- Which variables should we choose to stratify on? Variables we have reason to believe are strongly correlated with the outcome of interest, or that may interact with the treatment effect.
- How many variables should we use to stratify? In principle, as many as you want, but there is a practical limit (generally between 1 and 5, depending on N).
- How do I use the strata variables in the analysis of the treatment effect? Add them as covariates in the regression → this increases the precision of the estimates and the power of hypothesis tests.

Randomization: Balance Tables in research papers

Balance table
What do the researchers do to support the argument that the sample is balanced?

• Individual tests (t-tests, chi-squared)
• Joint orthogonality test:
  Treat = a + b1*X1 + b2*X2 + b3*X3 + … + b20*X20 + u
  and then test the joint hypothesis b1 = b2 = b3 = … = b20 = 0
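A minimal R sketch of both kinds of balance test, using simulated data with three hypothetical baseline covariates (x1, x2, x3):

```r
# Minimal sketch of balance tests on simulated data
set.seed(789)
bal <- data.frame(treat = rep(0:1, each = 50),
                  x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))

t.test(x1 ~ treat, data = bal)          # individual balance test for one covariate

# joint orthogonality test: regress treatment on all covariates, F-test that all slopes are zero
unrestricted <- lm(treat ~ x1 + x2 + x3, data = bal)
restricted   <- lm(treat ~ 1, data = bal)
anova(restricted, unrestricted)         # a large p-value is consistent with balance
```

Analysis of Treatment Effects: Average Treatment Effect (ATE)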

ATE or ITT (Intention-to-treat)

• Effect of the treatment on all individuals in the study

• ITT = E[Y | Treat = 1] – E[Y | Treat = 0]

Average Treatment Effect (ATE)

• Regression specification: Y(i) = a + b*Treatment(i) + e(i)

Y(i): outcome variable measured at endline
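A minimal R sketch on simulated data (with a hypothetical true effect of 2), showing that the OLS coefficient on the treatment dummy equals the simple difference in endline means:

```r
# Minimal sketch: the ITT/ATE is the OLS coefficient on the treatment dummy
set.seed(321)
n     <- 200
treat <- rep(0:1, each = n / 2)
y     <- 10 + 2 * treat + rnorm(n)             # simulated endline outcome, true effect = 2

coef(lm(y ~ treat))["treat"]                   # ITT estimate
mean(y[treat == 1]) - mean(y[treat == 0])      # identical difference in means
```

Precision of Estimates: ANCOVA Specifications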

ANCOVA specification

• Adds the baseline value of the outcome to the regression specification to reduce the variance of the treatment estimator

• This does not affect the expected value of the estimator of b0 (only its variance)

• Regression specification: Y2(i) = a + b0*Treat(i) + b1*Y1(i) + e(i)
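A minimal R sketch of the ANCOVA idea on simulated data (again with a hypothetical true effect of 2): controlling for the baseline outcome leaves the expectation of the treatment coefficient unchanged but shrinks its standard error:

```r
# Minimal sketch of the ANCOVA specification on simulated data
set.seed(654)
n      <- 200
treat  <- rep(0:1, each = n / 2)
y_base <- rnorm(n)                                   # baseline value of the outcome
y_end  <- 10 + 2 * treat + 0.8 * y_base + rnorm(n)   # endline outcome, correlated with baseline

summary(lm(y_end ~ treat))$coefficients["treat", ]            # plain ITT regression
summary(lm(y_end ~ treat + y_base))$coefficients["treat", ]   # ANCOVA: same expectation, smaller SE
# strata indicators used in the randomization can be added as covariates in the same way
```

Treatment Effects: Threats to the Identification of ATEs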

• What do you think could threaten the identification of a causal effect in the context of an RCT?

• Think of people who are supposed to take part, but do not…
• What different kinds of abstention could you expect in an RCT?
• Can abstention be harmful to identification?

Survey non-response → Attrition
• Respondents can stop or refuse to respond to the (endline) survey due to lack of time, disinterest, mobility, resentment, etc.

• Two separate problems:
1) A general loss of statistical power → reduces the likelihood of finding effects when they are there!
   - Solutions: more N (increase the sample or add waves), minimize attrition by design (short surveys, stable people, incentives?)
2) If attrition is differential across treatment arms → a threat to the ITT identifying the ATE

• Address it by estimating Y(i) = a + b*Treatment(i) + e(i), where Y(i) = 1 if the individual was part of the endline survey.
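A minimal R sketch of this differential-attrition check on simulated data (the response rates are made up for illustration):

```r
# Minimal sketch of a differential-attrition check: regress an endline-response
# indicator on treatment; a significant coefficient signals differential attrition
set.seed(987)
n         <- 500
treat     <- rep(0:1, each = n / 2)
responded <- rbinom(n, 1, 0.85 + 0.05 * treat)   # hypothetical: treated slightly more likely to respond

summary(lm(responded ~ treat))                   # linear probability model of endline response
```

Treatment Effects: Threats to the Identification of ATEs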

Partial compliance
• People can choose not to take part in the intervention due to lack of time, disinterest, mobility, resentment, etc.
• People can switch between experimental groups (treatment dilution)

• Which types of experimental design may be prone to partial take-up ("by design")? → Encouragement designs! Example: an aspirational movie for Indonesian retail businesses (see Block 2 of this lecture)

- Learn about the characteristics of (non-)compliers by estimating Y(i) = a + b*X(i) + e(i), where Y(i) = 1 if the person complies.

Treatment Effects: Threats to the Identification of ATEs

Survey attrition and treatment non-compliance
• If differential, they introduce endogeneity at the stage between assignment and take-up (compliance) or at any later stage (survey attrition)

→ ITT estimates do not capture the causal effect of the treatment

• What happens if attrition rates are similar in treatment and comparison groups? Does the problem remain?

• Yes, it remains possible that the attritors were selected differently in the treatment and comparison groups.

• Way to address this: use "assignment to treatment" as an instrument for compliance → TOT

Treatment Effects: Threats to the Identification of LATEs

Identified parameter: Local Average Treatment Effect (LATE)
• Average treatment effect on individuals induced to get treated by assignment (compliers)

• Also: “Average treatment effect on the treated” (ATT/ATET) or “Treatment-on-the-treated” (TOT)

• Regression specification (2SLS):
  1) First stage: TakeUp(i) = a1 + b1*Treatment(i) + u(i)
  2) Second stage: Y(i) = a2 + b2*TakeUp(i) + e(i)
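A minimal R sketch of this two-stage logic on simulated data: an unobserved "ability" term drives both take-up and the outcome, so naive OLS is biased, while instrumenting take-up with random assignment recovers the effect (set to 3 here; all numbers are illustrative):

```r
# Minimal sketch of the TOT/LATE via two-stage least squares on simulated data
set.seed(2468)
n       <- 2000
z       <- rep(0:1, each = n / 2)                       # random assignment
ability <- rnorm(n)                                     # unobserved confounder
takeup  <- rbinom(n, 1, plogis(-1 + 2 * z + ability))   # imperfect, selective compliance
y       <- 5 + 3 * takeup + 2 * ability + rnorm(n)

coef(lm(y ~ takeup))["takeup"]        # naive OLS: biased upward by selection into take-up

first <- lm(takeup ~ z)               # 1) first stage: take-up on assignment
coef(lm(y ~ fitted(first)))[2]        # 2) second stage: slope approximates the LATE (about 3)
```

With base R the second stage is run on the fitted take-up values; in practice a 2SLS routine such as AER::ivreg(y ~ takeup | z) (assuming the AER package is installed) is used so that standard errors are computed correctly.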

TOT effects examine the treatment effect for a specific subgroup: individuals who comply with the treatment.

Heterogeneous Treatment Effects

Heterogeneous treatment effects
What if we have a theory that predicts differences between subgroups characterized by some observable characteristic?

• Can you imagine an example of a theory predicting heterogeneity in treatment effects?

• Allowing treatment effects to vary by subgroup (defined by covariates)

• Regression specification (interaction term)

Y(i) = a + b0*Treatment(i) + b1*X(i) + b2*(Treatment(i)*X(i)) + e(i)

X(i) is typically (but not necessarily) a dummy variable.
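A minimal R sketch of this interaction specification on simulated data (the subgroup indicator and effect sizes are made up for illustration):

```r
# Minimal sketch of a heterogeneous-treatment-effect regression
# (hypothetical: the effect is 1 when x = 0 and 1 + 2 = 3 when x = 1)
set.seed(1357)
n     <- 400
treat <- rep(0:1, each = n / 2)
x     <- rbinom(n, 1, 0.5)                          # subgroup dummy, e.g. above-median baseline ability
y     <- 2 + 1 * treat + 0.5 * x + 2 * treat * x + rnorm(n)

summary(lm(y ~ treat * x))    # treat * x expands to treat + x + treat:x; b2 is the interaction term
```

Spill-over effects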

Spill-over and general equilibrium effects
Treatment effects on some individuals may influence treatment effects on other individuals.

Can you imagine an example of a treatment with an externality for non-treated individuals?

• Example: Deworming, vaccination, etc.

• How could spill-overs happen?
• What could be done?

Internal and External Validity

Internal validity

• How well an experiment is done, especially whether it avoids confounding.

• The less chance for confounding in a study, the higher its internal validity.

External validity

• To what extent can we apply the conclusions of the study outside its context?