Guest Lecture
Patricio S. Dalton Tilburg University
Centre for Development Studies University of Glasgow
7 March 2019
Who am I?
- Born and raised in Argentina. Big supporter of San Lorenzo (Buenos Aires football team)
- 18 years ago I moved to Europe
- Studied MSc and PhD in Economics at Warwick (met Theo Koutmeridis and Sayantan Ghosal there)
- Associate Professor at Tilburg University, The Netherlands
- Research on Behavioral Development Economics & Psychology of Poverty
  - Theory: Poverty and aspirations, welfare implications of bounded rationality
  - Lab experiments: Goals, rationality, stress, hormones, self-confidence, generosity
  - Lab-in-the-field experiments: Unemployment benefits (Colombia)
  - RCTs:
    - Mobile money as payment instrument (Kenya)
    - Goal setting and productivity (Ghana)
    - Business practices and aspirations (Indonesia)
    - Financial worries and risk preferences (Vietnam)
    - Empowerment and accountability (India)
Structure of today’s lecture
Block 1: Conceptual overview of RCTs
• What?
• Why?
• How?
• Main challenges
Block 2: Practical example of an RCT
• Dalton, Ruschenpohler, Uras, Zia (2018) “Learning Best Practices from Peers”, Working paper.
References
• Banerjee and Duflo (2009). “The Experimental Approach to Development Economics”, Annual Review of Economics, 1:151–78
• Bruhn and McKenzie (2009). “In Pursuit of Balance: Randomization in Practice in Development Field Experiments” American Economic Journal: Applied, 1:4, 200–232
• Duflo, Glennerster and Kremer (2008) “Using Randomization in Development Economics Research: A Toolkit” T. Schultz and John Strauss, eds., Handbook of Development Economics. Vol. 4. Amsterdam and New York: North Holland.
• Harrison and List (2004) “Field Experiments”, Journal of Economic Literature, 42:4, 1009-1055.
• Imbens and Angrist (1994). “Identification and Estimation of Local Average Treatment Effects.” Econometrica, 62:2, 467-475
• List (2011) “Why Economists Should Conduct Field Experiments and 14 Tips for Pulling One Off”, Journal of Economic Perspectives, 25:3, 3-16
Why RCTs?
Dogs, sunrise, umbrellas and rain
Observation I: Every morning I let my dog out, and then shortly after the sun comes up.
Observation II: When people use umbrellas, it rains.
Do dogs make the sun rise? Do umbrellas make it rain?
Policy recommendation: Distribute umbrellas in regions with drought!
Correlation Fallacy
Correlation fallacy: the logical mistake of believing that because two events occurred together, there is a cause-effect relationship.
• People who play a musical instrument have higher IQ. Does playing an instrument increase IQ? Or do people with high IQ choose to play an instrument?
• People who speak more languages are wealthier. Do wealthier people study more languages? Or does speaking more languages make people wealthier?
Policy implications are very different!
Messerli (2012) “Chocolate Consumption, Cognitive Function, and Nobel Laureates”, N Engl J Med, 367:1562-1564
The Problems of Causal Inference
• Causal impact → requires a counterfactual
• What is the effect of a college degree on future earnings?
  - How much would you guys earn without a college degree?
  - How much would people who do not have a college degree earn if they had one?
• Comparing people over time (before and after) will not, in most cases, give us a reliable estimate of impact. Why?
  1. Unobserved factors affecting earnings may (and will) change during and after the education.
  2. Going to college is a decision influenced by you or others. Individuals who go to college may differ from those who do not in aspects that also affect future earnings → selection bias
Selection Bias
• Selection bias arises when individuals are selected (or self-selected) for treatment based on (typically unobserved) characteristics that may also affect their outcomes.
  - Positive: those who come to school are more motivated → we overestimate the effect
  - Negative: those who come to school are less motivated → we underestimate the effect
• It is difficult to disentangle the impact of the treatment from the factors that drove selection.
• Selection bias is a problem endemic to retrospective evaluation.
• There are many econometric ways of addressing this problem. For example:
  • Find an instrument Z for the endogenous variable X (IV approach)
    – Cor(X, Z) ≠ 0
    – Cor(Z, ε) = 0
  • Assign X randomly (experiment)
    – Addresses the endogeneity caused by “unobserved factors” and “selection bias” → creates a counterfactual by design
Types of experiments (Harrison and List, 2004)
• Conventional lab experiment: students, abstract framing, imposed set of rules
• Artefactual field experiment: same as conventional, but with a non-student subject pool
• Framed field experiment (or lab-in-the-field): same as artefactual, but with field context
• Natural field experiment (or RCT): same as framed, but in an environment where the subjects naturally undertake these tasks and where they do not know that they are in an experiment.
• RCTs are used:
  - To evaluate the impact of existing programs
  - To test economic model predictions
Practicalities of RCTs
Overview: Understanding and Analyzing RCTs
1. Ethics Review Board (ERB) approval & pre-analysis plan registration
2. Timeline of an RCT and data
3. Randomization
   • How do we actually do the randomization?
   • Simple vs stratified
4. Analysis of treatment effects
   • How do we estimate the effects of the treatment once the RCT has been run?
   • ITT (ATE) vs TOT (LATE)
   • HTEs
5. Precision of estimates
   • How can we improve the precision of the estimated treatment effects?
   • ANCOVA
6. Internal and external validity
ERB and PAP
• Ethics Review Board (ERB) or Institutional Review Board (IRB)
  - Administrative body established to protect the rights and welfare of human research subjects.
  - Reviews the whole project (including questionnaires, potential risks, damages, etc.)
  - How will data and privacy be protected? Informed consent? Data management protocols?
  - Is it fair to randomize?
• Pre-analysis Plan (PAP)
  - Register the project before implementing it: question, method, hypotheses, analysis.
  - Adds transparency. Separates exploratory from confirmatory analyses.
  - American Economic Association RCT Registry
  - https://aspredicted.org/
  - Berkeley Initiative for Transparency in the Social Sciences (BITSS)
RCT Timeline
RCT: Data and Timeline
Typical Study
Listing → Baseline → Randomization → Intervention → Midline → Endline → Long-term effects?
• Listing: short census of the population
• Baseline survey: typically a 60-90 minute survey (not strictly necessary)
• Randomization: assign units (individuals, households, villages, etc.) to treatment/control
• Midline: short-term (6 to 12 months) measure of outcome variables (not strictly necessary)
• Endline: measure of outcomes after 12 to 24 months
• Long-term follow-up: measure of outcomes after 4 or more years
How to randomize?
Randomization from scratch: Example
Steps: listing exercise (e.g. 100 people) → randomly select 20 people for the study → baseline survey

ID  Gender  Age  Income  Married  Wellbeing
 1       0   71     112        1          1
 2       1   79     222        1          2
 3       0   78     332        0          1
 4       1   63      32        1          4
 5       1   39      44        0          5
 6       1   68     656        1          2
 7       1   58      77        0          1
 8       1   79     878        1          2
 9       1   62     932        0          4
10       1   36     104        1          5
11       1   32    1155        1          3
12       0   54     123        0          2
13       1   52    1323        1          2
14       1   22     124        1          1
15       0   28    1215        0          4
16       0   79    1361        0          4
17       1   29     173        1          5
18       1   65     148        1          4
19       1   76     192        0          2
20       1   23     204        1          1

Notes: median age = 60; Gender is coded Female = 1.
Randomization from scratch
Step 1: Create the variables you may want to use to stratify the randomization later (e.g. above median age).
Step 2: Assign a random number to each ID (e.g. using Rand() in Excel, uniform() in Stata, runif(1) in R).
Note: If you do it in Excel, copy and paste the random numbers as “values”.

ID  Gender  Age  Above median age  Random number
 1       0   71                 1  0.443881712
 2       1   79                 1  0.691642531
 3       0   78                 1  0.810024944
 4       1   63                 1  0.9232475
 5       1   39                 0  0.689051701
 6       1   68                 1  0.47461034
 7       1   58                 0  0.415243898
 8       1   79                 1  0.182071105
 9       1   62                 1  0.399455999
10       1   36                 0  0.085861319
11       1   32                 0  0.992117141
12       0   54                 0  0.85518587
13       1   52                 0  0.026121019
14       1   22                 0  0.210388031
15       0   28                 0  0.303809516
16       0   79                 1  0.756718513
17       1   29                 0  0.322047071
18       1   65                 1  0.87360388
19       1   76                 1  0.666088779
20       1   23                 0  0.081817358
Randomization from scratch
Step 3: Order the data by “random number” (e.g. ascending order).
Step 4: Assign the first 10 IDs to “Control” (=0) and the second 10 IDs to “Treatment” (=1).

ID  Gender  Age  Above median age  Treatment  Random number (rank)
Control
13       1   52                 0          0  0.026121019
20       1   23                 0          0  0.081817358
10       1   36                 0          0  0.085861319
 8       1   79                 1          0  0.182071105
14       1   22                 0          0  0.210388031
15       0   28                 0          0  0.303809516
17       1   29                 0          0  0.322047071
 9       1   62                 1          0  0.399455999
 7       1   58                 0          0  0.415243898
 1       0   71                 1          0  0.443881712
Treatment
 6       1   68                 1          1  0.47461034
19       1   76                 1          1  0.666088779
 5       1   39                 0          1  0.689051701
 2       1   79                 1          1  0.691642531
16       0   79                 1          1  0.756718513
 3       0   78                 1          1  0.810024944
12       0   54                 0          1  0.85518587
18       1   65                 1          1  0.87360388
 4       1   63                 1          1  0.9232475
11       1   32                 0          1  0.992117141
Two types of randomization: Simple or Stratified
Simple randomization
• Draw a random number from a uniform distribution for each observation
• Order the random numbers
• Set cut-off points (first half control, second half treatment)
Do you see any problem with this simple randomization?
• If N is small, simple randomization can deliver samples that are unbalanced in characteristics that could be correlated with the expected treatment outcome!
Let’s check the results of our randomization:
Variable      Statistic   Control  Treatment  p-value  Test
Female = 1    proportion  80%      70%        1        Fisher exact test
Age (years)   mean        46       63.3       0.05*    t-test
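The simple randomization steps and the group averages in the balance check above can be reproduced in a few lines. This is a minimal Python sketch, not the lecture's own code: it reuses the IDs, gender, ages and random numbers shown on the slides, while the names `simple_randomization` and `group_mean` are hypothetical. It only computes the group averages; the p-values of the Fisher exact test and the t-test are not recomputed here.

```python
from statistics import mean

# (ID, female, age, random number), copied from the slides above.
PEOPLE = [
    (1, 0, 71, 0.443881712), (2, 1, 79, 0.691642531), (3, 0, 78, 0.810024944),
    (4, 1, 63, 0.9232475), (5, 1, 39, 0.689051701), (6, 1, 68, 0.47461034),
    (7, 1, 58, 0.415243898), (8, 1, 79, 0.182071105), (9, 1, 62, 0.399455999),
    (10, 1, 36, 0.085861319), (11, 1, 32, 0.992117141), (12, 0, 54, 0.85518587),
    (13, 1, 52, 0.026121019), (14, 1, 22, 0.210388031), (15, 0, 28, 0.303809516),
    (16, 0, 79, 0.756718513), (17, 1, 29, 0.322047071), (18, 1, 65, 0.87360388),
    (19, 1, 76, 0.666088779), (20, 1, 23, 0.081817358),
]


def simple_randomization(rows):
    """Sort by the random draw; first half -> control (0), second half -> treatment (1)."""
    ordered = sorted(rows, key=lambda row: row[3])
    half = len(ordered) // 2
    return {row[0]: (0 if rank < half else 1) for rank, row in enumerate(ordered)}


assignment = simple_randomization(PEOPLE)


def group_mean(column, arm):
    """Mean of one column (1 = female, 2 = age) within an experimental arm."""
    return mean(row[column] for row in PEOPLE if assignment[row[0]] == arm)


print("Share female, control vs treatment:", group_mean(1, 0), group_mean(1, 1))
print("Mean age,     control vs treatment:", group_mean(2, 0), group_mean(2, 1))
```

With only 20 units, a single unlucky draw is enough to produce a large age gap between the two arms, which is what the balance check flags.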
Do you see any problem?
Two types of randomization: Simple or Stratified
Stratified (block) randomization
Randomization is performed separately within each stratum. Example of a stratification variable: above vs below median age
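Randomizing separately within each stratum, as described above, can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the ages are those of the 20 people in the baseline example, the function name `stratified_randomization` and the seed are hypothetical, and the random draws will not match the ones on the slides.

```python
import random
from collections import defaultdict

# Ages by ID, copied from the baseline example.
AGES = {1: 71, 2: 79, 3: 78, 4: 63, 5: 39, 6: 68, 7: 58, 8: 79, 9: 62, 10: 36,
        11: 32, 12: 54, 13: 52, 14: 22, 15: 28, 16: 79, 17: 29, 18: 65, 19: 76,
        20: 23}
MEDIAN_AGE = 60  # median age reported in the baseline example


def above_median(unit):
    """Stratum indicator: 1 if the person is older than the median age."""
    return int(AGES[unit] > MEDIAN_AGE)


def stratified_randomization(units, stratum_of, seed=2019):
    """Randomize separately within each stratum; return {unit: 0 or 1}."""
    rng = random.Random(seed)  # fixed (arbitrary) seed for reproducibility
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    assignment = {}
    for members in strata.values():
        draws = {unit: rng.random() for unit in members}
        ordered = sorted(members, key=draws.get)
        half = len(ordered) // 2
        for rank, unit in enumerate(ordered):
            # Lowest draws -> control (0), highest draws -> treatment (1).
            assignment[unit] = 0 if rank < half else 1
    return assignment


assignment = stratified_randomization(sorted(AGES), above_median)

for arm in (0, 1):
    ages = [AGES[unit] for unit in AGES if assignment[unit] == arm]
    print(f"arm {arm}: n = {len(ages)}, mean age = {sum(ages) / len(ages):.1f}")
```

By construction each arm receives exactly half of every stratum, so the two arms cannot differ in the share of people above the median age. If a stratum has an odd number of units, this sketch sends the extra unit to treatment.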
People with below median age

ID  Gender  Age  Above median age  Treatment  Random number
13       1   52                 0          0  0.07071747
20       1   23                 0          1  0.417151645
10       1   36                 0          0  0.142811272
14       1   22                 0          1  0.27322448
15       0   28                 0          0  0.004400111
17       1   29                 0          1  0.660553207
 7       1   58                 0          1  0.640168547
 5       1   39                 0          0  0.071544272
12       0   54                 0          0  0.25921245
11       1   32                 0          1  0.6313839

People with above median age

ID  Gender  Age  Above median age  Treatment  Random number
 8       1   79                 1          1  0.957940262
 9       1   62                 1          0  0.319488351
 1       0   71                 1          0  0.079220899
 6       1   68                 1          0  0.263201397
19       1   76                 1          1  0.790514337
 2       1   79                 1          0  0.305248374
16       0   79                 1          1  0.521213453
 3       0   78                 1          1  0.958544166
18       1   65                 1          1  0.543191938
 4       1   63                 1          0  0.461254908
Two types of randomization: Simple or Stratified
Did this create balance in our sample?

ID  Gender  Age  Above median age  Treatment  Random number
13       1   52                 0          0  0.07071747
10       1   36                 0          0  0.142811272
15       0   28                 0          0  0.004400111
 5       1   39                 0          0  0.071544272
12       0   54                 0          0  0.25921245
 9       1   62                 1          0  0.319488351
 1       0   71                 1          0  0.079220899
 6       1   68                 1          0  0.263201397
 2       1   79                 1          0  0.305248374
 4       1   63                 1          0  0.461254908
20       1   23                 0          1  0.417151645
14       1   22                 0          1  0.27322448
17       1   29                 0          1  0.660553207
 7       1   58                 0          1  0.640168547
11       1   32                 0          1  0.6313839
 8       1   79                 1          1  0.957940262
19       1   76                 1          1  0.790514337
16       0   79                 1          1  0.521213453
 3       0   78                 1          1  0.958544166
18       1   65                 1          1  0.543191938

Variable      Statistic   Control  Treatment  p-value
Female = 1    proportion  70%      80%        1
Age (years)   mean        55.2     54.1       0.9

The sample is now balanced across the Treatment and Control groups in both observable variables, thanks to stratified randomization!
You can stratify by more variables (e.g. race, initial ability, teachers, gender, etc.)
Two types of randomization: Simple or Stratified
FAQs on stratified randomization
- Which variables should we choose to stratify on? Variables we have reasons to believe are strongly correlated with the outcome of interest, or that may interact with the treatment effect.
- How many variables should we use to stratify? In principle as many as you want, but in practice there is a limit (generally between 1 and 5, depending on N).
- How do I use the strata variables in the analysis of the treatment effect? The strata variables should be added as covariates in the regression analysis → this increases the efficiency and power of hypothesis tests.
Randomization: Balance Tables in research papers
Balance table
What do the researchers do to support the argument that the sample is balanced?
• Individual tests (t-tests, chi-squared tests)
• Joint-orthogonality tests:
  Treat = a + b1*X1 + b2*X2 + b3*X3 + … + b20*X20 + u
  and then test the joint hypothesis b1 = b2 = b3 = … = b20 = 0
Analysis
Treatment Effects: Average Treatment Effect (ATE)
ATE or ITT (Intention-to-treat)
Effect of the treatment on all individuals in the study
ITT = E[Y | Treat = 1] – E[Y | Treat = 0]
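As a small illustration of the ITT formula above, the sketch below computes the difference in mean outcomes between the assigned groups on simulated data, and checks that it coincides with the OLS slope from regressing the outcome on a binary treatment dummy. Everything in it is an assumption made for illustration (the data-generating process, `TRUE_EFFECT`, the sample size and the seed); it is not data from the lecture.

```python
import random

random.seed(7)
TRUE_EFFECT = 2.0  # assumed effect used to simulate the data
N = 1000

# Hypothetical data: random assignment, outcome = 10 + effect * treat + noise.
treat = [random.randint(0, 1) for _ in range(N)]
y = [10.0 + TRUE_EFFECT * t + random.gauss(0, 1) for t in treat]


def mean(values):
    return sum(values) / len(values)


# ITT = E[Y | Treat = 1] - E[Y | Treat = 0], estimated by sample means.
itt = (mean([yi for yi, t in zip(y, treat) if t == 1])
       - mean([yi for yi, t in zip(y, treat) if t == 0]))

# OLS slope of y on the treatment dummy: cov(treat, y) / var(treat).
t_bar, y_bar = mean(treat), mean(y)
ols_slope = (sum((t - t_bar) * (yi - y_bar) for t, yi in zip(treat, y))
             / sum((t - t_bar) ** 2 for t in treat))

print(f"difference in means: {itt:.4f}")
print(f"OLS slope:           {ols_slope:.4f}")
```

The two numbers agree up to floating-point error, which is why the ITT is usually estimated with a simple regression on the assignment dummy.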
Average Treatment Effect (ATE)
Regression specification: Y_i = α + β*Treatment_i + ε_i
Y_i: outcome variable measured at endline
Precision of Estimates: ANCOVA Specifications
ANCOVA specification
• Adds the baseline value of the outcome to the regression specification to reduce the variance of the treatment estimator
• This does not affect the expected value of the estimator of b0 (only the variance)
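The two bullets above can be checked with a small simulation. The following is a sketch under assumed parameters (a persistent unobserved component called `ability`, noise levels, `TRUE_EFFECT`, 300 replications of n = 100), not an analysis from the lecture. It compares the estimator from Y2 = a + b0*Treat + e with the ANCOVA estimator from Y2 = a + b0*Treat + b1*Y1 + e, where Y1 is the baseline outcome. The ANCOVA coefficient is obtained by partialling Y1 out of both Y2 and Treat (Frisch-Waugh-Lovell), which gives the same b0 as the multiple regression.

```python
import random
from statistics import stdev

TRUE_EFFECT = 0.5  # assumed treatment effect used to simulate the data


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    return (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
            / sum((a - x_bar) ** 2 for a in x))


def residuals(x, y):
    """Residuals from an OLS regression of y on x (with an intercept)."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - a - b * xi for xi, yi in zip(x, y)]


def one_experiment(rng, n=100):
    treat = [0] * (n // 2) + [1] * (n // 2)
    rng.shuffle(treat)                              # random assignment
    ability = [rng.gauss(0, 1) for _ in range(n)]   # persistent, unobserved
    y1 = [a + rng.gauss(0, 0.5) for a in ability]   # baseline outcome
    y2 = [a + TRUE_EFFECT * t + rng.gauss(0, 0.5)   # endline outcome
          for a, t in zip(ability, treat)]
    simple = slope(treat, y2)                       # Y2 = a + b0*Treat + e
    # ANCOVA: coefficient on Treat after partialling out the baseline Y1.
    ancova = slope(residuals(y1, treat), residuals(y1, y2))
    return simple, ancova


rng = random.Random(42)
results = [one_experiment(rng) for _ in range(300)]
simple_estimates = [s for s, _ in results]
ancova_estimates = [a for _, a in results]

print(f"simple: mean = {mean(simple_estimates):.3f}, sd = {stdev(simple_estimates):.3f}")
print(f"ANCOVA: mean = {mean(ancova_estimates):.3f}, sd = {stdev(ancova_estimates):.3f}")
```

Both estimators are centred on the assumed effect, but the ANCOVA estimates are less dispersed across replications; the gain grows with the correlation between baseline and endline outcomes.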
Regression specification: Y2(i) = a + b0*Treat(i) + b1*Y1(i) + e(i)
Treatment Effects: Threats to the Identification of ATEs
• What do you think could threaten the identification of a causal effect in the context of an RCT?
• Think of people who are supposed to take part, but do not…
• What different kinds of abstention could you expect in an RCT?
• Can abstention be harmful to identification?
Treatment Effects: Threats to the Identification of ATEs
Survey non-response (attrition)
• Respondents can stop or refuse responding to the (endline) survey due to lack of time, disinterest, mobility, resentment, etc.
• Two separate problems:
  1) General loss of statistical power → reduces the likelihood of finding effects when they exist!
     - Solutions: more N (increase the sample or the number of waves); minimize attrition by design (short surveys, stable people, incentives?)
  2) If attrition is differential across treatment arms → threat to the ITT identifying the ATE
     - Address it by estimating Y_i = α + β*Treatment_i + ε_i, where Y_i = 1 if individual i was part of the endline
Treatment Effects: Threats to the Identification of ATEs
Partial compliance
• People can choose not to take part in the intervention due to lack of time, disinterest, mobility, resentment, etc.
• People can switch between experimental groups (treatment dilution)
• Which types of experimental design may be prone to partial take-up (“by design”)? Encouragement designs! Example: Aspirational movie for Indonesian retail businesses (see Block 2 of this lecture)
• Learn about the characteristics of (non-)compliers: Y_i = α + β*X_i + ε_i, where Y_i = 1 if the person complies
Treatment Effects: Threats to the Identification of ATEs
Survey attrition and treatment non-compliance
• If differential, they introduce endogeneity at the stage between assignment and take-up (compliance) or at any later stage (survey attrition)
→ ITT estimates do not capture the causal effect of the treatment
• What happens if attrition rates are similar in treatment and comparison groups? Does the problem remain?
• Yes, it remains possible that the attritors were selected differently in the treatment and comparison groups.
• Way to address this: use “assignment to treatment” as an instrument for compliance → ToT
Treatment Effects: Threats to the Identification of LATEs
Identified parameter: Local Average Treatment Effect (LATE)
• Average treatment effect on individuals induced to get treated by assignment (compliers)
• Also: “Average treatment effect on the treated” (ATT/ATET) or “Treatment-on-the-treated” (TOT)
• Regression specification (2SLS)
  1) First stage: Take-up(i) = a1 + b1*Treatment(i) + e1(i)
  2) Second stage: Y(i) = a2 + b2*Predicted take-up(i) + e2(i), where predicted take-up comes from the first stage
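With a single binary instrument (assignment) and no covariates, the 2SLS estimate equals the Wald ratio: the ITT effect on the outcome divided by the effect of assignment on take-up. The sketch below illustrates this on simulated data. All parameters are assumptions chosen for illustration: `TRUE_EFFECT`, `COMPLIER_SHARE`, the seed, and a design in which nobody in the control group can get the treatment.

```python
import random

random.seed(11)
N = 5000
TRUE_EFFECT = 3.0      # assumed effect of actually taking up the treatment
COMPLIER_SHARE = 0.6   # assumed share of people who take up when assigned

rows = []
for _ in range(N):
    assigned = random.randint(0, 1)
    complier = random.random() < COMPLIER_SHARE
    take_up = int(assigned == 1 and complier)  # nobody in control is treated
    # Compliers have higher outcomes even without treatment (selection).
    y = 5.0 + 1.0 * complier + TRUE_EFFECT * take_up + random.gauss(0, 1)
    rows.append((assigned, take_up, y))


def mean(values):
    return sum(values) / len(values)


def diff_by(group_index, value_index):
    """Mean of one column where the grouping column is 1, minus where it is 0."""
    ones = [row[value_index] for row in rows if row[group_index] == 1]
    zeros = [row[value_index] for row in rows if row[group_index] == 0]
    return mean(ones) - mean(zeros)


itt = diff_by(0, 2)          # effect of assignment on the outcome
first_stage = diff_by(0, 1)  # effect of assignment on take-up
late = itt / first_stage     # Wald estimator (= 2SLS in this simple case)
naive = diff_by(1, 2)        # takers vs non-takers: contaminated by selection

print(f"ITT = {itt:.3f}, first stage = {first_stage:.3f}")
print(f"LATE (Wald) = {late:.3f}, naive as-treated comparison = {naive:.3f}")
```

The ITT is diluted by non-compliance, the as-treated comparison picks up the selection of compliers, and the Wald ratio recovers the assumed effect for compliers.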
TOT effects are treatment effects for a specific subgroup: individuals who comply with the treatment
Heterogeneous Treatment Effects
Heterogeneous treatment effects
What if we have a theory that predicts differences between subgroups defined by some observable characteristic?
Can you imagine an example of a theory predicting heterogeneity in treatment effects?
• Allowing treatment effects to vary by subgroup (defined by covariates)
• Regression specification (interaction term)
Y(i) = a + b0*Treatment(i) + b1*X(i) + b2*(Treatment(i)*X(i)) + e(i)
X(i): typically (but not necessarily) a dummy variable
Spill-over effects
Spill-over and general equilibrium effects
Treatment effects on some individuals may influence treatment effects on other individuals
Can you imagine an example of a treatment with externality for non-treated individuals?
• Example: Deworming, vaccination, etc.
• How could spill-overs happen?
• What could be done?
Internal and External Validity
Internal validity
How well an experiment is done, especially whether it avoids confounding.
The less chance for confounding in a study, the higher its internal validity.
External validity
• To what extent can we apply the conclusions of the study outside the context of our study?