Panel Data: Fixed and Random Effects
Total Page:16
File Type:pdf, Size:1020Kb
Short Guides to Microeconometrics Kurt Schmidheiny Panel Data: Fixed and Random Effects 2 Fall 2020 Unversit¨atBasel We will assume throughout this handout that each individual i is ob- served in all time periods t. This is a so-called balanced panel. The Panel Data: Fixed and Random Effects treatment of unbalanced panels is straightforward but tedious. The T observations for individual i can be summarized as 2 3 2 0 3 2 0 3 2 3 yi1 xi1 zi ui1 6 . 7 6 . 7 6 . 7 6 . 7 6 . 7 6 . 7 6 . 7 6 . 7 1 Introduction 6 7 6 7 6 7 6 7 6 7 6 0 7 6 0 7 6 7 yi = 6 yit 7 Xi = 6 xit 7 Zi = 6 zi 7 ui = 6 uit 7 6 7 6 7 6 7 6 7 In panel data, individuals (persons, firms, cities, ... ) are observed at 6 . 7 6 . 7 6 . 7 6 . 7 4 . 5 4 . 5 4 . 5 4 . 5 several points in time (days, years, before and after treatment, ...). This y x0 z0 u iT T ×1 iT T ×K i T ×M iT T ×1 handout focuses on panels with relatively few time periods (small T ) and many individuals (large N). and NT observations for all individuals and time periods as This handout introduces the two basic models for the analysis of panel 2 3 2 3 2 3 2 3 y1 X1 Z1 u1 data, the fixed effects model and the random effects model, and presents 6 . 7 6 . 7 6 . 7 6 . 7 6 . 7 6 . 7 6 . 7 6 . 7 consistent estimators for these two models. The handout does not cover 6 7 6 7 6 7 6 7 6 7 6 7 6 7 6 7 so-called dynamic panel data models. y = 6 yi 7 X = 6 Xi 7 Z = 6 Zi 7 u = 6 ui 7 6 . 7 6 . 7 6 . 7 6 . 7 Panel data are most useful when we suspect that the outcome variable 6 . 7 6 . 7 6 . 7 6 . 7 4 . 5 4 . 5 4 . 5 4 . 5 depends on explanatory variables which are not observable but correlated y X Z u N NT ×1 N NT ×K N NT ×M N NT ×1 with the observed explanatory variables. If such omitted variables are constant over time, panel data estimators allow to consistently estimate The data generation process (dgp) is described by: the effect of the observed explanatory variables. PL1: Linearity 0 0 2 The Econometric Model yit = α + xitβ + ziγ + ci + uit where E[uit] = 0 and E[ci] = 0 Consider the multiple linear regression model for individual i = 1; :::; N The model is linear in parameters α, β, γ, effect ci and error uit. who is observed at several time periods t = 1; :::; T PL2: Independence y = α + x0 β + z0γ + c + u N it it i i it fXi; zi; yigi=1 i.i.d. (independent and identically distributed) 0 where yit is the dependent variable, xit is a K-dimensional row vector of The observations are independent across individuals but not necessarily 0 time-varying explanatory variables and zi is a M-dimensional row vector across time. This is guaranteed by random sampling of individuals. of time-invariant explanatory variables excluding the constant, α is the PL3: Strict Exogeneity intercept, β is a K-dimensional column vector of parameters, γ is a M- dimensional column vector of parameters, ci is an individual-specific effect E[uitjXi; zi; ci] = 0 (mean independent) and uit is an idiosyncratic error term. Version: 26-11-2020, 09:38 3 Short Guides to Microeconometrics Panel Data: Fixed and Random Effects 4 The idiosyncratic error term uit is assumed uncorrelated with the ex- RE3: Identifiability planatory variables of all past, current and future time periods of the 0 a) rank(W ) = K + M + 1 < NT and E[Wi Wi] = QWW is p.d. and same individual. This is a strong assumption which e.g. rules out lagged 0 0 0 finite. The typical element wit = [1 xit zi]. dependent variables. PL3 also assumes that the idiosyncratic error is 0 −1 uncorrelated with the individual specific effect. b) rank(W ) = K + M + 1 < NT and E[Wi Ωv;i Wi] = QW OW is p.d. and finite. Ωv;i is defined below. PL4: Error Variance RE3 assumes that the regressors including a constant are not perfectly a) V [u jX ; z ; c ] = σ2I, σ2 > 0 and finite i i i i u u collinear, that all regressors (but the constant) have non-zero variance (homoscedastic and no serial correlation) and not too many extreme values. b) V [u jX ; z ; c ] = σ2 > 0, finite and it i i i u;it The random effects model can be written as Cov[uit; uisjXi; zi; ci] = 0 8s 6= t (no serial correlation) 0 0 yit = α + x β + z γ + vit c) V [uijXi; zi; ci] = Ωu;i(Xi; zi) is p.d. and finite it i The remaining assumptions are divided into two sets of assumptions: the where vit = ci +uit. Assuming PL2, PL4 and RE1 in the special versions random effects model and the fixed effects model. PL4a and RE2a leads to 0 1 Ωv;1 ··· 0 ··· 0 2.1 The Random Effects Model B . C B . .. C B C B C In the random effects model, the individual-specific effect is a random Ωv = V [vjX; Z] = B 0 Ωv;i 0 C B C variable that is uncorrelated with the explanatory variables. B . .. C @ . A 0 ··· 0 ··· Ω RE1: Unrelated effects v;N NT ×NT with typical element E[cijXi; zi] = 0 0 2 2 2 1 RE1 assumes that the individual-specific effect is a random variable that σv σc ··· σc is uncorrelated with the explanatory variables of all past, current and B 2 2 2 C B σc σv ··· σc C Ωv;i = V [vijXi; zi] = B . C future time periods of the same individual. B . .. C @ . A σ2 σ2 ··· σ2 RE2: Effect Variance c c v T ×T 2 2 2 2 a) V [cijXi; zi] = σc < 1 (homoscedastic) where σv = σc + σu. This special case under PL4a and RE2a is therefore 2 called the equicorrelated random effects model. b) V [cijXi; zi] = σc;i(Xi; zi) < 1 (heteroscedastic) RE2a assumes constant variance of the individual specific effect. 2.2 The Fixed Effects Model In the fixed effects model, the individual-specific effect is a random vari- able that is allowed to be correlated with the explanatory variables. 5 Short Guides to Microeconometrics Panel Data: Fixed and Random Effects 6 FE1: Related effects and RE3a in samples with a large number of individuals (N ! 1). How- { ever, the pooled OLS estimator is not efficient. More importantly, the FE1 explicitly states the absence of the unrelatedness assumption in RE1. usual standard errors of the pooled OLS estimator are incorrect and tests (t-, F -, z-, Wald-) based on them are not valid. Correct standard errors FE2: Effect Variance can be estimated with the so-called cluster-robust covariance estimator { treating each individual as a cluster (see the handout on \Clustering in FE2 explicitly states the absence of the assumption in RE2. the Linear Model"). Fixed effects model: The pooled OLS estimators of α, β and γ are FE3: Identifiability biased and inconsistent, because the variable ci is omitted and potentially ¨ ¨ 0 ¨ correlated with the other regressors. rank(X) = K < NT and E(XiXi) is p.d. and finite P where the typical elementx ¨it = xit − x¯i andx ¯i = 1=T t xit FE3 assumes that the time-varying explanatory variables are not perfectly 4 Random Effects Estimation collinear, that they have non-zero within-variance (i.e. variation over time The random effects estimator is the feasible generalized least squares for a given individual) and not too many extreme values. Hence, x it (GLS) estimator cannot include a constant or any time-invariant variables. Note that only 0 1 the parameters β but neither α nor γ are identifiable in the fixed effects αRE b −1 −1 −1 B C 0 0 model. @ βbRE A = W Ωbv W W Ωbv y: γbRE 3 Estimation with Pooled OLS where W = [ιNT XZ] and ιNT is a NT × 1 vector of ones. The error covariance matrix Ωv is assumed block-diagonal with equicor- The pooled OLS estimator ignores the panel structure of the data and related diagonal elements Ωv;i as in section 2.1 which depend on the two simply estimates α, β and γ as 2 2 unknown parameters σv and σc only. There are many different ways to 0 1 estimate these two parameters. For example, αbP OLS 0 −1 0 T N B βbP OLS C = (W W ) W y @ A 2 1 X X 2 2 2 2 σv = vit ; σc = σv − σu γP OLS b NT b b b b b t=1 i=1 where W = [ιNT XZ] and ιNT is a NT × 1 vector of ones. where T N Random effects model: The pooled OLS estimator of α, β and γ is un- 1 X X σ2 = (v − v )2 bu NT − N bit bi biased under PL1, PL2, PL3, RE1, and RE3 in small samples. Addition- t=1 i=1 ally assuming PL4 and normally distributed idiosyncratic and individual- 0 0 PT and vbit = yit − αP OLS − xitβbP OLS − ziγbP OLS and vbi = 1=T t=1 vbit. The specific errors, it is normally distributed in small samples. It is consistent 2 degree of freedom correction in σbu is also asymptotically important when and approximately normally distributed under PL1, PL2, PL3, PL4, RE1, N ! 1.