When should we adjust standard errors for clustering?
A discussion of Abadie et al. 2017

Arthur Heim
PSE Doctoral program: Labor & public economics

October 2nd, 2019

Outline

1 Introduction
2 Dealing with clusters: the usual views
3 What does Abadie et al. 2017 change?
4 Formal results
5 Conclusions
6 Appendix


Introduction
The Clusterjerk¹ of every seminar

• "Did you cluster your standard errors?"
• Yet, most of the time, it is not clear whether one should cluster or not, and at which level of grouping.
• There is also much confusion about the role of fixed effects in accounting for clustering.

Econometricians' haiku, from Angrist and Pischke 2008, end of chapter one:

    T-stat looks too good
    Try cluster standard errors
    significance gone.

1. From a debate on Chris Blattman's blog.

A simple example

• Imagine you wrote a (not-desk-rejected) paper estimating a Mincerian equation using labor force survey data (e.g. the Enquête emploi in France):

    Y_i = α + δS_i + γ_1 e_i + γ_2 e_i² + X_i′β + ε_i

• You are considering whether you should cluster your standard errors.
• Referees strongly encourage you to do so:
  1 Referee 1 tells you: "the wage residual is likely to be correlated within local labor markets, so you should cluster your standard errors by state or village."
  2 Referee 2 argues: "the wage residual is likely to be correlated for people working in the same industry, so you should cluster your standard errors by industry."
  3 Referee 3 argues: "the wage residual is likely to be correlated by age cohort, so you should cluster your standard errors by cohort."
• What should you do?
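In practice, each referee's request is a one-line change in how the variance is computed. A minimal sketch with made-up data and column names (the statsmodels cluster option implements the Liang-Zeger adjustment discussed later):

```python
# Hypothetical illustration: same OLS fit, three clustering levels.
# All data, column names and coefficients here are made up.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "schooling": rng.integers(8, 20, n).astype(float),
    "exper": rng.uniform(0, 40, n),
    "state": rng.integers(0, 20, n),
    "industry": rng.integers(0, 15, n),
    "cohort": rng.integers(0, 10, n),
})
df["log_wage"] = (0.08 * df["schooling"] + 0.03 * df["exper"]
                  - 0.0005 * df["exper"] ** 2 + rng.normal(0, 0.5, n))

model = smf.ols("log_wage ~ schooling + exper + I(exper**2)", data=df)
for level in ["state", "industry", "cohort"]:   # one level per referee
    res = model.fit(cov_type="cluster", cov_kwds={"groups": df[level]})
    print(level, res.bse["schooling"])          # same point estimate, different SE
```

The point estimates are identical across the three fits; only the standard errors change with the clustering level.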

A slightly more sophisticated one

• You conduct a field experiment where, first, a sample of 120 middle schools is randomly selected to participate in a teacher training program.
• Second, you randomly select the teachers (whichever school they belong to) who are to participate in the first year. The others represent a control group for the first year.
• Outcomes are test scores retrieved from national student assessments and concern all students from the classrooms taught by these teachers (let's assume that the student-to-teacher assignment is also fairly random).
• Should you cluster standard errors:
  1 at all?
  2 at the teacher level?
  3 at the school level?

Answer from Abadie et al. 2017:

• Whether one should cluster (or not) should not be decided based on whether or not it changes the results.
• Clustering will almost always matter, even when there is no correlation between residuals within clusters and no correlation between regressors within clusters.
• Inspecting the data is not sufficient to determine whether a clustering adjustment is needed.

• There are three rationales for clustering:
  1 Sampling design: the sampling process consists in selecting a small share of clusters from a larger population of many more clusters.
    • The French Labor force survey samples "grapes" of households.
  2 Experimental design: there exists a correlation between belonging to a certain cluster and the values of your variable of interest.
    • Clustered randomized control trials.
  3 Heterogeneity in treatment effects w.r.t. clusters:
    • Different cluster-specific ATEs.
• Abadie et al. 2017 explain the situations in which one should or shouldn't adjust w.r.t. these rationales.


The textbook case
What is usually meant when one talks about clusters

• Most textbooks² approach the clustering issue as something close to an omitted variable, where the initial model:

    Y_ic = α + X_ic′β + µ_ic

actually hides the fact that the error term has a group structure such that:

    µ_ic = υ_c + ε_ic

• Thus, estimating the model without accounting for this yields biased standard errors because E[µ_ic µ_jc] = ρσ²_µ > 0.

♭ This presentation, although pedagogical, reinforces the confusion between fixed effects and clustering:

    (Y_ic − Ȳ_c) = (X_ic − X̄_c)′β + (µ_ic − µ̄_c)

2. For instance Cameron and Trivedi 2005; Angrist and Pischke 2008; Wooldridge 2010; Wooldridge 2012.

• The second approach usually comes through panel data, and especially difference-in-differences issues.
• The very influential paper by Bertrand, Duflo, and Mullainathan 2004 (QJE) emphasizes the issue of serial correlation in DiD models such as the classic group-time fixed effects estimand:

    Y_ict = γ_c + λ_t + X_ict′β + ε_ict

• The problem is that individuals in a given group are likely to experience common shocks at some time t, such that there is another component hidden in the error above:

    ε_ict = υ_ct + η_ict

• If these group-time shocks are (assumed) independent, then the situation is close to the previous one and one could cluster by group-time.
• Yet, this is often not true (e.g. if groups are states or regions, a bad situation in one period is likely to be bad in the next period too).

The group structure problem

• Heteroskedasticity-robust standard errors assume that the (N × N) matrix E[εε′|X] is diagonal, meaning there is no correlation between errors across observations.
• This assumption is false in many settings, among which:
  • non-stationary or panel data;
  • identical values of one or more regressors for groups of individuals = clusters;
  • ...
• In a setting where potentially all errors are correlated together, we cannot use the estimated residuals as in the robust SE (White 1980), because Σ_i X_i ε̂_i = 0 by construction.
• Hence, one has to allow correlation up to a certain point: in time (Newey and West 1987), or among members of a group (Kloek 1981; Moulton 1986).

• Assuming homoskedasticity:

    E[εε′|X] ≡ Ω_ij =  0    if C_i ≠ C_j
                       ρσ²  if C_i = C_j, i ≠ j
                       σ²   if i = j

• With just 2 groups, this matrix is block-diagonal:

    Ω = | σ²   ⋯  ρσ²   0    ⋯  0   |
        | ⋮    ⋱   ⋮    ⋮        ⋮   |
        | ρσ²  ⋯  σ²    0    ⋯  0   |
        | 0    ⋯  0     σ²   ⋯  ρσ² |
        | ⋮        ⋮    ⋮    ⋱  ⋮   |
        | 0    ⋯  0     ρσ²  ⋯  σ²  |
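The block structure above can be made concrete in a few lines of numpy (a toy sketch; cluster sizes and parameter values are arbitrary):

```python
import numpy as np

def cluster_cov(sizes, sigma2=1.0, rho=0.3):
    """Build the homoskedastic block-diagonal Omega: sigma2 on the
    diagonal, rho*sigma2 within clusters, 0 across clusters."""
    N = sum(sizes)
    Omega = np.zeros((N, N))
    start = 0
    for n_c in sizes:
        block = slice(start, start + n_c)
        Omega[block, block] = rho * sigma2  # within-cluster covariances
        start += n_c
    np.fill_diagonal(Omega, sigma2)         # variances on the diagonal
    return Omega

Omega = cluster_cov([3, 2])  # two clusters, of sizes 3 and 2
print(Omega)
```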

Assuming homoskedasticity & equal group sizes

• Assuming homoskedasticity and equal group sizes:

    V_Kloek(β̂|X) = V_OLS × ( 1 + (N/C − 1) ρ_ε ρ_X )    (1)

  • where ρ_ε is the within-cluster correlation of the errors,
  • and ρ_X is the within-cluster correlation of the regressors.

Relaxing homoskedasticity

• The cluster adjustment by Liang and Zeger 1986, used in most statistical packages:

    V_LZ(β̂|X) = (X′X)⁻¹ ( Σ_{c=1}^C X_c′ Ω_c X_c ) (X′X)⁻¹    (2)
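A quick numeric illustration of equation (1), with made-up parameter values: even tiny correlations inflate the variance a lot once clusters are large.

```python
def moulton_factor(N, C, rho_eps, rho_x):
    """Kloek/Moulton variance inflation factor with equal-sized clusters."""
    n_bar = N / C                          # common cluster size
    return 1 + (n_bar - 1) * rho_eps * rho_x

# 100 clusters of 1 000 units, correlations of only 0.01 and 0.05:
factor = moulton_factor(N=100_000, C=100, rho_eps=0.01, rho_x=0.05)
print(factor)  # about 1.5: the correct SE is ~22% larger than the OLS one
```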

Estimated versions

• The estimated version of the so-called robust (EHW) variance is:

    V̂_EHW(β̂) = (X′X)⁻¹ ( Σ_{i=1}^N (Y_i − β̂′X_i)² X_i X_i′ ) (X′X)⁻¹    (3)

• The estimated version of the cluster-robust (LZ) variance is:

    V̂_LZ(β̂) = (X′X)⁻¹ ( Σ_{c=1}^C ( Σ_{i:C_i=c} (Y_i − β̂′X_i) X_i ) ( Σ_{i:C_i=c} (Y_i − β̂′X_i) X_i )′ ) (X′X)⁻¹    (4)

• These are the main estimators used by applied researchers, between which one has to choose.
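Formulas (3) and (4) are easy to compute directly. A self-contained numpy sketch on simulated (made-up) data, with no small-sample corrections, unlike most software defaults:

```python
import numpy as np

rng = np.random.default_rng(1)
C, n_c = 50, 20
cluster = np.repeat(np.arange(C), n_c)
x = rng.normal(size=C * n_c) + rng.normal(size=C)[cluster]    # correlated within cluster
eps = rng.normal(size=C * n_c) + rng.normal(size=C)[cluster]  # correlated within cluster
y = 1.0 + 0.5 * x + eps
X = np.column_stack([np.ones_like(x), x])

beta = np.linalg.solve(X.T @ X, X.T @ y)     # OLS
e = y - X @ beta
bread = np.linalg.inv(X.T @ X)

# (3) EHW: meat is sum_i e_i^2 x_i x_i'
V_ehw = bread @ ((X * (e**2)[:, None]).T @ X) @ bread

# (4) LZ: meat is sum over clusters of s_c s_c', with s_c = sum_{i in c} e_i x_i
meat = np.zeros((2, 2))
for c in range(C):
    s = (X[cluster == c] * e[cluster == c][:, None]).sum(axis=0)
    meat += np.outer(s, s)
V_lz = bread @ meat @ bread

print(np.sqrt(V_ehw[1, 1]), np.sqrt(V_lz[1, 1]))
```

With both within-cluster correlations sizeable here, the LZ standard error on the slope comes out noticeably larger than the EHW one, as equation (1) predicts.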

The almost forgotten reason for clustering
"How were your data collected?"

• The "textbook cases" discussed before are what one may call "model-based" cases for clustering.
• These examples implicitly assume that data are collected randomly, or randomly enough.
• However, surveys often use more sophisticated sampling methods with nested structures (e.g. sampling cities, then neighborhoods, then households), stratification and/or weighting.

The first clustering issue should be the survey design effect
⇒ Cluster at the primary sampling unit (PSU) level at the minimum.

Conventional wisdom about standard errors
When to cluster according to Colin Cameron and Miller 2015

• Equation (1), while restrictive, shows that the inflation factor increases in:
  • the within-cluster correlation of the regressors ρ_X;
  • the within-cluster correlation of the errors ρ_ε;
  • the number of observations in each cluster.
• Consequently, one could think clustering does not change a thing if either ρ_X = 0 or ρ_ε = 0.
• Moulton 1990 showed that the inflation factor can be large despite very small correlations.
• Colin Cameron and Miller 2015 basically say that whenever there is a reason to believe that there is some correlation within some groups, one should cluster.
• "The consensus is to be conservative and avoid bias and to use bigger and more aggregate clusters when possible." (p. 333)

When to cluster according to Colin Cameron and Miller 2015

"There are settings where one may not need to use cluster-robust standard errors. We outline several, though note that in all these cases it is always possible to still obtain cluster-robust standard errors and contrast them to default standard errors. If there is an appreciable difference, then use cluster-robust standard errors." (p. 334)


Clustering matters, yes, so what?
One misconception according to Abadie et al. 2017

• Because of formula (1), it is often thought that clustering "does not matter" if either ρ_X = 0 or ρ_ε = 0.
• Thus, adjusting for clusters wouldn't change a thing in situations such as:
  • individually randomized control trials;
  • adding cluster fixed effects to the regression.
• Using simulated data, they show that clustering does affect estimated standard errors in these settings.

Example data (not knowing the DGP)

• Sample: N = 100 323, with 100 clusters and ≈ 1 000 units per cluster.
• Observe Y_ic, W_ic = 1(treated), C_ic.
• Estimate Y_i = α + τW_i + ε_i by OLS.

First result

    ρ̂_ε = 0.001    ρ̂_W = 0.001

• Correlations are essentially 0; hence, the correction should not have an impact.

Cluster adjustment matters (1)

    τ̂_LS = −0.120    SE_EHW = 0.004    SE_LZ = 0.100

• Adjusting for clusters matters a lot. ⇒ Inspecting within-cluster correlations is not enough to determine whether adjusting the SE would matter.
• Indeed, the LZ adjustment relies on something else:

    V̂_LZ(β̂) = (X′X)⁻¹ ( Σ_{c=1}^C ( Σ_{i:C_i=c} (Y_i − β̂′X_i) X_i ) ( Σ_{i:C_i=c} (Y_i − β̂′X_i) X_i )′ ) (X′X)⁻¹

• What matters for the adjustment is the within-cluster correlation of the product of the residuals and the regressors.
• Here, ρ̂_ε̂W = 0.500.
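The mechanics of this example are easy to reproduce. A sketch with my own parameter choices (cluster effects of ±1 centered so that both within-cluster correlations are ≈ 0), not the authors' exact DGP:

```python
import numpy as np

rng = np.random.default_rng(2)
C, n_c = 100, 1000
cluster = np.repeat(np.arange(C), n_c)
tau_c = np.where(cluster % 2 == 0, 1.0, -1.0)     # heterogeneous cluster effects, ATE = 0
W = rng.integers(0, 2, C * n_c).astype(float)     # unit-level randomization: rho_W ~ 0
# centering tau_c * W makes the within-cluster residual correlation ~ 0 too
Y = tau_c * (W - 0.5) + rng.normal(size=C * n_c)

X = np.column_stack([np.ones_like(Y), W])
beta = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ beta
bread = np.linalg.inv(X.T @ X)

se_ehw = np.sqrt((bread @ ((X * (e**2)[:, None]).T @ X) @ bread)[1, 1])
meat = np.zeros((2, 2))
for c in range(C):
    s = (X[cluster == c] * e[cluster == c][:, None]).sum(axis=0)
    meat += np.outer(s, s)
se_lz = np.sqrt((bread @ meat @ bread)[1, 1])
print(se_ehw, se_lz)  # the clustered SE is an order of magnitude larger
```

Despite near-zero within-cluster correlations of W and of the residuals, the product e·W is strongly correlated within clusters, which is exactly what drives the LZ adjustment.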

Cluster adjustment matters (2)

Estimating the fixed effects model: Y_i = τW_i + Σ_{c=1}^C α_c C_ic + ε_i

    τ̂_FE = −0.120    SE_EHW = 0.003    SE_LZ = 0.243

• Adding fixed effects did not change the point estimate, but increased the precision (as one would expect) of the EHW robust SE.
• Clustering, however, matters a lot here too. ⇒ Adding fixed effects does not necessarily fix the clustering issue.

So it's not because you can cluster (and it matters) that you should cluster

If we were to follow Colin Cameron and Miller 2015
• We would cluster everything in the previous example.
• Abadie et al. 2017 disagree and illustrate with another example.

Data generating process
• General population of 10 million units, 100 clusters of 100 000 units each.
• Here, W_i is assigned at random with probability p = 1/2.
• The treatment effect is heterogeneous w.r.t. clusters such that:

    τ_c = −1 for half of the clusters
    τ_c =  1 for the other half

• The error term is ∼ N(0, 1) and the ATE = 0.

Which SE is correct?

• Draw random samples 10 000 times with sampling probability = 1% and estimate the models.

Table: Standard errors and coverage rates over random samplings

                  Simple OLS                      Fixed effects
          EHW variance    LZ variance     EHW variance    LZ variance
          SE    cov rate  SE    cov rate  SE    cov rate  SE    cov rate
          0.007  0.950    0.051  1.000    0.007  0.950    0.131  0.986

• The correct standard error is EHW: its coverage matches the nominal 95%, i.e. its tests reject with the appropriate type-1 error probability, whereas LZ over-covers.

Why the differences? How to choose?

• Both standard errors are correct, but for different estimands.
• EHW assumes that the sample is randomly drawn from the relevant population (which is the case here).
• LZ assumes that the clusters at hand are a sample from many more clusters in the main population.
  • This assumption is often implicit in the textbook cases but has important consequences.
  • It is more obvious in the sampling design literature (e.g. the French Labor force survey).
⇒ One cannot tell from the data itself whether other clusters exist in the full population.


Conceptual framework
Sequence of populations

• Sequence of populations defined by M_n units and C_n clusters; M_n is strictly increasing and C_n is weakly increasing in n.
• Rubin's causal framework with 2 potential outcomes for each individual: Y_in(1), Y_in(0).
• 2 treatment-specific errors:

    ε_in(w) = Y_in(w) − (1/M_n) Σ_{j=1}^{M_n} Y_jn(w)    for w = 0, 1.

• Main interest lies in the n-population's average treatment effect:

    τ_n = (1/M_n) Σ_{i=1}^{M_n} ( Y_in(1) − Y_in(0) ) = Ȳ_n(1) − Ȳ_n(0)    (5)

Sampling and assignment
Sampling process

• Define a variable R_in = 1(sampled), such that we observe the triplet (Y_in, W_in, C_in) if R_in = 1 and nothing otherwise.
• For a population of size M_n, we observe a sample of size N = Σ_{i=1}^{M_n} R_in.
• R_in ⊥ (Y_in(1), Y_in(0)).
• 2-stage design:
  • clusters are sampled with probability P_Cn;
  • individuals are sampled within the selected clusters with probability P_Un.
• The probability of an individual being sampled is P_Cn P_Un.
• Both probabilities may be equal to 1, or close to 0:

                P_Cn = 1           P_Cn ≈ 0
    P_Un = 1    full population    everyone in few clusters
    P_Un ≈ 0    random sample      few units from few clusters

Treatment assignment

• Treatment assignment is also a 2-stage process:
• First stage: for cluster c in population n, an assignment probability q_cn is drawn randomly from a distribution f(·) with mean µ_n = 1/2 and variance σ_n² ≥ 0.
  • If σ_n² = 0, we have pure random assignment.
  • If σ_n² > 0, we have correlated assignment within clusters.
  • Special case: if σ_n² = 1/4, then q_cn ∈ {0, 1}: all units within a cluster have identical assignments.
• Second stage: each individual within cluster c is assigned to treatment independently with the cluster-specific probability q_cn.
• Translation: if σ_n² > 0, the individuals of a given cluster are all either more likely or less likely to be treated than average. Thus, there is a correlation between treatment assignment and cluster membership.
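The two-stage sampling and two-stage assignment processes can be simulated directly. A sketch where the distribution f(·) is taken to be a symmetric Beta (my own choice, giving µ = 1/2 and σ² > 0); all sizes and probabilities are made up:

```python
import numpy as np

rng = np.random.default_rng(4)
C, m_c = 50, 200
P_C, P_U = 0.5, 0.1                  # cluster and within-cluster sampling probabilities

# Assignment, stage 1: cluster-level treatment probabilities q_c
a = 2.0                              # Beta(a, a): mean 1/2, variance 1/(4(2a+1))
q_c = rng.beta(a, a, size=C)
sigma2 = 1 / (4 * (2 * a + 1))       # here 0.05 > 0: clustered assignment

cluster = np.repeat(np.arange(C), m_c)
# Assignment, stage 2: units treated independently with their cluster's probability
W = rng.random(C * m_c) < q_c[cluster]

# Sampling: first clusters with P_C, then units within sampled clusters with P_U
sampled_c = rng.random(C) < P_C
R = sampled_c[cluster] & (rng.random(C * m_c) < P_U)
print(R.sum(), "units observed in", np.unique(cluster[R]).size, "clusters")
```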

Estimators
OLS estimation of the treatment effect

• The least squares estimator of τ_n is:

    τ̂ = ( Σ_{i=1}^{M_n} R_in (W_in − W̄_n) Y_in ) / ( Σ_{i=1}^{M_n} R_in (W_in − W̄_n)² ) = Ȳ_n(1) − Ȳ_n(0)

• What matters is the estimation of the variance of τ̂, i.e. the variance of √N_n (τ̂ − τ_n).
• Using large-sample properties, the authors show:

    √N_n (τ̂ − τ_n) − (2/√(M_n P_Cn P_Un)) Σ_{i=1}^{M_n} R_in (2W_in − 1) ε_in = o_p(1)    (6)

  where the second term is a linear approximation of the first.

Properties of the linear approximation of the variance

    η_n = (2/√(M_n P_Cn P_Un)) Σ_{i=1}^{M_n} η_in    with η_in = R_in (2W_in − 1) ε_in    (7)

• They calculate the exact variance of η_n for various values of the parameters, and the corresponding EHW and LZ estimators.

Proposition 1-i

    V[η_n] = (1/M_n) Σ_{i=1}^{M_n} [ 2( ε_in(1)² + ε_in(0)² ) − P_Un ( ε_in(1) − ε_in(0) )² + 4 P_Un σ_n² ( ε_in(1) − ε_in(0) )² ]
           + (P_Un/M_n) Σ_{c=1}^{C_n} M_cn² [ (1 − P_Cn) ( ε̄_cn(1) − ε̄_cn(0) )² + 4 σ_n² ( ε̄_cn(1) + ε̄_cn(0) )² ]    (8)

• The first sum in this formula is approximately the EHW variance:
  • if P_Un ≈ 0, the first sum simplifies to V_EHW = (2/M_n) Σ_{i=1}^{M_n} ( ε_in(1)² + ε_in(0)² ).
• The second sum captures the effects of clustered sampling and clustered assignment on the variance.

Proposition 1-i (continued)

    V[η_n] = (1/M_n) Σ_{i=1}^{M_n} [ 2( ε_in(1)² + ε_in(0)² ) − P_Un ( ε_in(1) − ε_in(0) )² + 4 P_Un σ_n² ( ε_in(1) − ε_in(0) )² ]
           + (P_Un/M_n) Σ_{c=1}^{C_n} M_cn² [ (1 − P_Cn) ( ε̄_cn(1) − ε̄_cn(0) )²    ← sampling
                                              + 4 σ_n² ( ε̄_cn(1) + ε̄_cn(0) )² ]    ← assignment    (9)

• The first part of the second sum disappears if P_Cn = 1, that is, if we have all clusters in the sample.
• The second part of the second sum disappears if σ_n² = 0, i.e. if there is no correlation between assignment to treatment and clustering (pure random assignment).

Proposition 1-ii

• The difference between the correct variance and the limit of the normalized LZ variance estimator is:

    V_LZ − V[η_n] = (P_Cn P_Un / M_n) Σ_{c=1}^{C_n} M_cn² ( ε̄_cn(1) − ε̄_cn(0) )² ≥ 0    (10)

• The LZ variance correctly captures the component due to clustered assignment, but performs poorly for the clustering due to the sampling design unless P_Cn ≈ 0.
• This is due to the assumption that the sampled clusters are a small proportion of the population of clusters, which explains why the difference between the LZ estimator and the true variance is proportional to P_Cn.

Proposition 1-iii

• The difference between the limits of the normalized LZ and EHW variance estimators is:

    V_LZ − V_EHW = −(P_Un/M_n) Σ_{i=1}^{M_n} (1 − 4σ_n²) ( ε_in(1) − ε_in(0) )²
                 + (P_Un/M_n) Σ_{c=1}^{C_n} M_cn² [ ( ε̄_cn(1) − ε̄_cn(0) )² + 4 σ_n² ( ε̄_cn(1) + ε̄_cn(0) )² ]    (11)

• This shows when adjusting with LZ makes a difference relative to EHW.
• The first sum is small relative to the second when there is a large number of units per cluster relative to the number of clusters.
• If the number of units per cluster (M_n/C_n) is constant and large compared to the number of clusters, the second sum is of order M_n/C_n and large relative to the first sum.
⋆ This is how the data in the first example were generated.

Proposition 1-iii (continued)

• In the case where the number of individuals per cluster is large relative to the number of clusters, clustering matters if there is heterogeneity of the treatment effect across clusters, or if there is clustered assignment.
• This comes from the fact that ε̄_cn(1) − ε̄_cn(0) = τ_cn − τ_n.
• Looking at the second sum only:

    (P_Un/M_n) Σ_{c=1}^{C_n} M_cn² [ ( ε̄_cn(1) − ε̄_cn(0) )²    ← heterogeneity
                                     + 4 σ_n² ( ε̄_cn(1) + ε̄_cn(0) )² ]    ← assignment

Corollary 1: when we don't need to cluster

• There is no need for clustering in two situations:
  1 there is no clustering in the sampling (P_Cn = 1 ∀n) and no clustering in the assignment (σ_n² = 0);
  2 there is no heterogeneity of the treatment effect (Y_in(1) − Y_in(0) = τ ∀i) and no clustering in the assignment (σ_n² = 0).
• Corollary 1 is a special case of Proposition 1-i.

Corollary 2: when the LZ correction is correct

• One can use the LZ variance estimator to adjust for clustering if:
  1 there is no heterogeneity of the treatment effect (Y_in(1) − Y_in(0) = τ ∀i);
  2 P_Cn ≈ 0 ∀n, i.e. we only observe a few clusters from the total population;
  3 P_Un is close to 0, so that there is at most one sampled unit per cluster (in which case the clustering adjustment does not matter, but the PSU is one level higher).
• Corollary 2 emerges from Proposition 1-ii, with important restrictions:
  • 1) is not likely to hold in general;
  • 2) cannot be assessed using the actual data: one has to know the sampling conditions.
• If one concludes that all clusters are included, then LZ is in general too conservative.

Clever idea: Using heterogeneity

• In a situation where all clusters are included, LZ is too conservative.
• If the assignment is perfectly correlated within clusters, there is not much one can do.
• If there is variation in the treatment within clusters, one can estimate V_LZ − V[η_n], i.e. subtract the heterogeneity term V[η_n] from V_LZ, using again that ε̄_cn(1) − ε̄_cn(0) = τ_cn − τ_n.
• The proposed cluster-adjusted variance estimator is then:

\hat{\mathbb{V}}_{CA}(\hat\tau) = \hat{\mathbb{V}}_{LZ}(\hat\tau) - \frac{1}{N^2} \sum_{c=1}^{C} N_c^2 \big(\hat\tau_{cn} - \hat\tau_n\big)^2 \quad (12)
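Equation (12) can be sketched in numpy as follows. The data-generating process and all names are hypothetical, and we only compute the correction term that is subtracted from V̂_LZ (assuming, as in the reconstruction above, that the cluster deviations enter squared).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: all clusters are observed and treatment varies WITHIN
# clusters, so the within-cluster effects tau_hat_c are estimable.
C, m = 50, 40
cluster = np.repeat(np.arange(C), m)
D = rng.integers(0, 2, size=C * m).astype(float)
tau_c = rng.normal(2.0, 0.5, size=C)           # heterogeneous cluster effects
y = 1.0 + tau_c[cluster] * D + rng.normal(size=C * m)

tau_hat = y[D == 1].mean() - y[D == 0].mean()  # overall difference in means

N = C * m
N_c = np.bincount(cluster)                     # cluster sizes
tau_hat_c = np.array([
    y[(cluster == c) & (D == 1)].mean() - y[(cluster == c) & (D == 0)].mean()
    for c in range(C)
])

# Correction term of eq. (12); the cluster-adjusted variance is then
# V_CA = V_LZ - correction, with V_LZ computed as on the previous slide.
correction = np.sum(N_c**2 * (tau_hat_c - tau_hat) ** 2) / N**2
print(correction)
```

The correction is larger the more the within-cluster effects τ̂_cn spread out around τ̂_n, which is exactly the heterogeneity that makes plain LZ too conservative when every cluster is in the sample.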


Conclusions

• Adjusting standard errors for clustering is often misunderstood.
• The usual recommendations are often too conservative.
• We should cluster:
  • in the presence of heterogeneous treatment effects and a small number of clusters compared to the overall population;
  • when there is correlation between treatment and clusters (clustered assignment).
• We should not cluster:
  • in a pure randomized controlled trial (or any situation without sampling clustering or assignment clustering);
  • when the treatment effect is constant and there is no clustering in the assignment.
♭ A convincing model, but specific to the stated configurations.
♭ Less useful for less RCT-like designs (e.g. the infamous serial correlation in DiD).

Bibliography I

Abadie, Alberto, Susan Athey, Guido Imbens, and Jeffrey Wooldridge. 2017. When Should You Adjust Standard Errors for Clustering? Working paper, October 8. http://arxiv.org/abs/1710.02926.

Angrist, Joshua D., and Jörn-Steffen Pischke. 2008. Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press.

Bertrand, Marianne, Esther Duflo, and Sendhil Mullainathan. 2004. "How Much Should We Trust Differences-in-Differences Estimates?" The Quarterly Journal of Economics 119 (1): 249–275.

Cameron, A. Colin, and Pravin K. Trivedi. 2005. Microeconometrics: Methods and Applications. Cambridge University Press.

Cameron, A. Colin, and Douglas L. Miller. 2015. "A Practitioner's Guide to Cluster-Robust Inference." Journal of Human Resources 50 (2): 317–372. doi:10.3368/jhr.50.2.317.

Kloek, Teunis. 1981. "OLS Estimation in a Model Where a Microvariable Is Explained by Aggregates and Contemporaneous Disturbances Are Equicorrelated." Econometrica 49 (1): 205–207.

Liang, Kung-Yee, and Scott L. Zeger. 1986. “Longitudinal Data Analysis Using Generalized Linear Models.” Biometrika 73 (1): 13–22.

Bibliography II

Moulton, Brent R. 1986. "Random Group Effects and the Precision of Regression Estimates." Journal of Econometrics 32 (3): 385–397.

Moulton, Brent R. 1990. "An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units." The Review of Economics and Statistics 72 (2): 334–338.

Newey, Whitney K., and Kenneth D. West. 1987. "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix." Econometrica 55 (3): 703–708. doi:10.2307/1913610.

White, Halbert. 1980. "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity." Econometrica 48 (4): 817–838.

Wooldridge, Jeffrey M. 2010. Econometric Analysis of Cross Section and Panel Data. MIT Press.

Wooldridge, Jeffrey M. 2012. Introductory Econometrics: A Modern Approach. South-Western Cengage Learning.


Outline

7 A friendly memo
  • Errors and residuals
  • Estimations depend on error!
  • Estimating the variance-covariance matrix of β̂

8 Proof that ε̄_cn(1) − ε̄_cn(0) = τ_cn − τ_n

Errors and residuals

We sometimes get confused...

• Errors are the vertical distances between the observations and the unknown Conditional Expectation Function (CEF). Therefore, they are unknown.
• Residuals are the vertical distances between the observations and the estimated regression function. Therefore, they are known.
• Errors come from the CEF decomposition property³:

Y_i = E[Y_i | X_i] + ε_i

where ε_i is mean independent of X_i and is therefore uncorrelated with any function of X_i.

3. Angrist and Pischke 2008, Theorem 3.1.1.

Errors and residuals

We sometimes get confused...

• Errors represent the difference between the outcome and the true conditional mean. In matrix notation:

Y = Xβ + ε
ε = Y − Xβ

• Residuals represent the difference between the outcome and the estimated average. In matrix notation:

Y = Xβ̂ + ε̂
ε̂ = Y − Xβ̂

Estimations depend on error!

OLS estimand as seen in class

\begin{aligned}
\hat\beta_{OLS} &= (X'X)^{-1}X'Y \\
&= (X'X)^{-1}X'(X\beta + \varepsilon) \\
&= \beta + (X'X)^{-1}X'\varepsilon
\end{aligned}

β̂_OLS is known to be unbiased, but its variance depends on the unknown error.
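A quick numerical check of the decomposition above, on purely illustrative simulated data (our own seed and design):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate y = X beta + eps and verify the algebraic decomposition
# beta_hat = beta + (X'X)^{-1} X' eps from the derivation above.
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])
eps = rng.normal(size=n)
y = X @ beta + eps

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y             # (X'X)^{-1} X'Y
decomposed = beta + np.linalg.inv(X.T @ X) @ X.T @ eps  # beta + (X'X)^{-1} X'eps

assert np.allclose(beta_hat, decomposed)
```

The two expressions match up to floating-point error, which is the point of the slide: the estimator is β plus a term driven entirely by the unobserved errors.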

Estimations depend on error!

Variance of β̂ as seen in class

\begin{aligned}
\mathbb{V}[\hat\beta|X] &= \mathbb{E}\big[(\hat\beta-\beta)(\hat\beta-\beta)'\,\big|\,X\big] \\
&= \mathbb{E}\big[\,[X'X]^{-1}X'\varepsilon\,\big([X'X]^{-1}X'\varepsilon\big)'\,\big|\,X\big] \\
&= \mathbb{E}\big[\,[X'X]^{-1}X'\varepsilon\varepsilon'X[X'X]^{-1}\,\big|\,X\big]
\end{aligned}

Which gives us the variance-covariance matrix of the betas:

\mathbb{V}[\hat\beta|X] = [X'X]^{-1}\,\mathbb{E}[X'\varepsilon\varepsilon'X|X]\,[X'X]^{-1} \quad (13)

Estimations depend on error!

What does this matrix look like?

Without clusters:

\mathbb{V}[\hat\beta|X] =
\begin{pmatrix}
\mathbb{V}[\hat\beta_0|X] & \mathrm{cov}(\hat\beta_0,\hat\beta_1|X) & \cdots & \mathrm{cov}(\hat\beta_0,\hat\beta_P|X) \\
\mathrm{cov}(\hat\beta_1,\hat\beta_0|X) & \mathbb{V}[\hat\beta_1|X] & \cdots & \mathrm{cov}(\hat\beta_1,\hat\beta_P|X) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{cov}(\hat\beta_P,\hat\beta_0|X) & \mathrm{cov}(\hat\beta_P,\hat\beta_1|X) & \cdots & \mathbb{V}[\hat\beta_P|X]
\end{pmatrix}

This matrix is not identified, and we need extra assumptions, such as homoskedasticity and/or no serial correlation, to estimate it.

Estimating the variance-covariance matrix of β̂

Under homoskedasticity and no serial correlation

If we assume that the correlation between errors is null and that the errors' variance is constant, that is:

\mathbb{E}[\varepsilon\varepsilon'|X] =
\begin{pmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{pmatrix} = \sigma^2 I_N

then the variance-covariance matrix of the betas simplifies a lot:

\mathbb{V}_{homo}[\hat\beta|X] = [X'X]^{-1}X'\sigma^2 I X[X'X]^{-1} = \sigma^2[X'X]^{-1}[X'X][X'X]^{-1} = \sigma^2[X'X]^{-1}

The estimated variance of the error term:

\hat\sigma^2 = \frac{\hat\epsilon'\hat\epsilon}{N-p} = \frac{\sum_{i=1}^{N}\hat\epsilon_i^2}{N-p}

Estimating the variance-covariance matrix of β̂

Allowing heteroskedasticity

If we assume that the correlation between errors is null but that the errors' variance is heterogeneous, that is:

\mathbb{E}[\varepsilon\varepsilon'|X] =
\begin{pmatrix}
\sigma_1^2 & 0 & \cdots & 0 \\
0 & \sigma_2^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_n^2
\end{pmatrix}

then there are now n different variances, and the variance of the coefficients becomes:

\begin{aligned}
\mathbb{V}_{EHW}(\hat\beta|X) &= (X'X)^{-1}X'\,\mathbb{E}[\mathrm{diag}(\sigma_i^2)|X]\,X(X'X)^{-1} \\
&= (X'X)^{-1}\Big(\sum_{i=1}^{n}\sigma_i^2\,X_iX_i'\Big)(X'X)^{-1} \\
&\equiv (X'X)^{-1}\Big(\sum_{i=1}^{n}\Omega_{ii}\,X_iX_i'\Big)(X'X)^{-1}
\end{aligned}

Estimating the variance-covariance matrix of β̂

Allowing heteroskedasticity

\mathbb{V}_{EHW}(\hat\beta|X) = (X'X)^{-1}\Big(\sum_{i=1}^{n}\Omega_{ii}\,X_iX_i'\Big)(X'X)^{-1} \quad (14)

The estimated version is:

\hat{\mathbb{V}}_{EHW}(\hat\beta|X) = (X'X)^{-1}\Big(\sum_{i=1}^{n}\underbrace{(Y_i-\hat\beta'X_i)^2}_{=\hat\epsilon_i^2}\,X_iX_i'\Big)(X'X)^{-1}
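The classic (homoskedastic) and EHW estimators above can be sketched side by side in numpy; the heteroskedastic design below is our own illustration, chosen so the two formulas visibly disagree.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data with error variance growing in |x|, so the homoskedastic
# and EHW variance formulas give different answers for the slope.
n, p = 2000, 2
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * (0.5 + np.abs(x))

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat

# Classic estimator: sigma2_hat * (X'X)^{-1}, with sigma2_hat = e'e/(N-p)
sigma2_hat = resid @ resid / (n - p)
V_homo = sigma2_hat * XtX_inv

# EHW sandwich: (X'X)^{-1} (sum_i e_i^2 X_i X_i') (X'X)^{-1}
V_EHW = XtX_inv @ (X.T * resid**2) @ X @ XtX_inv

se_homo = np.sqrt(np.diag(V_homo))
se_EHW = np.sqrt(np.diag(V_EHW))
print(se_homo[1], se_EHW[1])
```

Because the error variance here is largest where the regressor has the most leverage, the robust slope standard error exceeds the classic one, illustrating why the extra assumption in V_homo matters.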


Proof that ε̄_cn(1) − ε̄_cn(0) = τ_cn − τ_n

Start by writing the difference and substituting the expressions from page 8:

\begin{aligned}
\bar\varepsilon_{cn}(1) - \bar\varepsilon_{cn}(0)
&= \frac{1}{M_{cn}} \sum_{i:c_{in}=c} \big( C_{inc}\varepsilon_{in}(1) - C_{inc}\varepsilon_{in}(0) \big) \\
&= \frac{1}{M_{cn}} \sum_{i:c_{in}=c} C_{inc} \Big( Y_{in}(1) - \frac{1}{n}\sum_{j=1}^{n} Y_{jn}(1) - Y_{in}(0) + \frac{1}{n}\sum_{j=1}^{n} Y_{jn}(0) \Big) \\
&= \frac{1}{M_{cn}} \sum_{i:c_{in}=c} C_{inc} \Big( \big(Y_{in}(1) - Y_{in}(0)\big) - \frac{1}{n}\sum_{j=1}^{n} \big(Y_{jn}(1) - Y_{jn}(0)\big) \Big)
\end{aligned}

Since C_{inc} = 1(c_{in} = c), we have

\frac{1}{M_{cn}} \sum_{i:c_{in}=c} C_{inc} \big(Y_{in}(1) - Y_{in}(0)\big) = \frac{1}{M_{cn}} \sum_{i:c_{in}=c} \big(Y_{in}(1) - Y_{in}(0)\big) \equiv \tau_{cn}

so that

\bar\varepsilon_{cn}(1) - \bar\varepsilon_{cn}(0) = \tau_{cn} - \frac{1}{M_{cn}} \sum_{i:c_{in}=c} C_{inc}\,\frac{1}{n}\sum_{j=1}^{n} \big(Y_{jn}(1) - Y_{jn}(0)\big)

Proof that ε̄_cn(1) − ε̄_cn(0) = τ_cn − τ_n

In the remaining term, the only factor that depends on i is C_{inc}. Thus,

\sum_{i:c_{in}=c} C_{inc} = M_{cn}, \qquad \sum_{c=1}^{C_n} M_{cn} = n

and therefore

\begin{aligned}
\bar\varepsilon_{cn}(1) - \bar\varepsilon_{cn}(0)
&= \tau_{cn} - \frac{M_{cn}}{M_{cn}}\,\frac{1}{n}\sum_{j=1}^{n} \big(Y_{jn}(1) - Y_{jn}(0)\big) \\
&= \tau_{cn} - \underbrace{\frac{1}{n}\sum_{j=1}^{n} \big(Y_{jn}(1) - Y_{jn}(0)\big)}_{=\tau_n} \\
&= \tau_{cn} - \tau_n \qquad \text{(Q.E.D.)}
\end{aligned}

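The identity just proved can be checked numerically on a small simulated finite population (sizes, seed, and distributions below are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical finite population of n units split into C clusters; verify
# eps_bar_c(1) - eps_bar_c(0) = tau_c - tau for every cluster c.
n, C = 120, 6
cluster = rng.integers(0, C, size=n)
Y1 = rng.normal(2.0, 1.0, size=n)   # potential outcomes under treatment
Y0 = rng.normal(size=n)             # potential outcomes under control

tau_n = (Y1 - Y0).mean()            # population average treatment effect
eps1 = Y1 - Y1.mean()               # errors around the population means
eps0 = Y0 - Y0.mean()

for c in range(C):
    idx = cluster == c
    if not idx.any():
        continue                    # skip (unlikely) empty clusters
    tau_cn = (Y1[idx] - Y0[idx]).mean()
    lhs = eps1[idx].mean() - eps0[idx].mean()
    assert np.isclose(lhs, tau_cn - tau_n)  # identity holds cluster by cluster
```

The assertion passes for every cluster because the cluster average of the errors differences is exactly the cluster effect minus the population effect, as the derivation shows.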