T-Statistic Based Correlation and Heterogeneity Robust Inference

t-Statistic Based Correlation and Heterogeneity Robust Inference Rustam IBRAGIMOV Economics Department, Harvard University, 1875 Cambridge Street, Cambridge, MA 02138 Ulrich K. MÜLLER Economics Department, Princeton University, Fisher Hall, Princeton, NJ 08544 ([email protected]) We develop a general approach to robust inference about a scalar parameter of interest when the data is potentially heterogeneous and correlated in a largely unknown way. The key ingredient is the following result of Bakirov and Székely (2005) concerning the small sample properties of the standard t-test: For a significance level of 5% or lower, the t-test remains conservative for underlying observations that are independent and Gaussian with heterogenous variances. One might thus conduct robust large sample inference as follows: partition the data into q ≥ 2 groups, estimate the model for each group, and conduct a standard t-test with the resulting q parameter estimators of interest. This results in valid and in some sense efficient inference when the groups are chosen in a way that ensures the parameter estimators to be asymptotically independent, unbiased and Gaussian of possibly different variances. We provide examples of how to apply this approach to time series, panel, clustered and spatially correlated data. KEY WORDS: Dependence; Fama–MacBeth method; Least favorable distribution; t-test; Variance estimation. 1. INTRODUCTION property of the correlations. The key ingredient to the strategy is a result by Bakirov and Székely (2005) concerning the small Empirical analyses in economics often face the difficulty that sample properties of the usual t-test used for inference on the the data is correlated and heterogeneous in some unknown fash- mean of independent normal variables: For significance levels ion. Many estimators of parameters of interest remain valid and of 8.3% or lower, the usual t-test remains conservative when interesting even under the presence of correlation and hetero- the variances of the underlying independent Gaussian obser- geneity, but it becomes considerably more challenging to cor- vations are not identical. This insight allows the construction rectly estimate their sampling variability. of asymptotically valid test statistics for general correlated and The typical approach is to invoke a law of large numbers to justify inference based on consistent variance estimators: For heterogenous data in the following way: Assume that the data an OLS regression with independent but not identically distrib- can be classified in a finite number q of groups that allow as- uted disturbances, see White (1980). In the context of time se- ymptotically independent normal inference about the scalar pa- ries, popular heteroscedasticity and autocorrelation consistent rameter of interest β. This means that there exists estimators ˆ = (“long-run”) variance estimators were derived by Newey and βj, estimated from groups j 1,...,q, so that approximately ˆ ∼ N 2 ˆ ˆ West (1987) and Andrews (1991). For clustered data, which βj (β, vj ), and βj is approximately independent of βi for ˆ includes panel data as a special case, Rogers’ (1993)clus- j = i. Typically, the estimator βj will simply be the element tered standard errors provide a consistent variance estimator. ˆ ˆ of interest of the vector θ j, where θ j is the estimator of the Conley (1999) derives consistent nonparametric standard errors model’s parameter vector using group j data only. The obser- for datasets that exhibit spatial correlations. vations βˆ ,...,βˆ can then approximately be treated as inde- While quite general, the consistency of the variance estima- 1 q pendent normal observations with common mean β (but not tor is obtained through an assumption that asymptotically, an necessarily equal variance), and the usual t-test concerning β infinite number of observable entities are essentially uncorre- constructed from βˆ ,...,βˆ (with q − 1 degrees of freedom) is lated: heteroscedasticity robust estimators achieve consistency 1 q conservative. If the number of observations is reasonably large by averaging over an infinite number of uncorrelated distur- in all groups, the approximate normality βˆ ∼ N (β, v2) is of bances; clustered standard errors achieve consistency by aver- j j aging over an infinite number of uncorrelated clusters; long- course a standard result for most models and estimators, linear run variance estimators achieve consistency by averaging over or nonlinear. Knowledge about the correlation structure in the data is em- an infinite number of (essentially uncorrelated) low frequency ˆ ˆ periodogram ordinates; and so forth. Also, block bootstrap bodied in the assumption that β1,...,βq are (approximately) and subsampling techniques derive their asymptotic validity independent. In contrast to consistent variance estimators, only from averaging over an infinite number of essentially uncorre- this finite amount of uncorrelatedness is directly required for lated blocks. When correlations are pervasive and pronounced enough, these methods are inapplicable or yield poor results. © 2010 American Statistical Association This paper develops a general strategy for conducting infer- Journal of Business & Economic Statistics ence about a scalar parameter with potentially heterogenous and October 2010, Vol. 28, No. 4 correlated data, when relatively little is known about the precise DOI: 10.1198/jbes.2009.08046 453 454 Journal of Business & Economic Statistics, October 2010 the validity of the t-statistic approach. What is more, by invok- and cluster averages of the individual disturbances are approx- ing the results of Müller (2008), we show that the t-statistic ap- imately iid Gaussian across clusters. Hansen (2007) finds that proach in some sense efficiently exploits the information con- the asymptotic null distribution of test statistics based on the tained in the assumption of asymptotically independent and standard clustered error formula for a panel with one fixed di- ˆ ˆ Gaussian estimators β1,...,βq. Of course, a stronger (correct) mension and one dimension tending to infinity become that of assumption, that is, larger q, will typically lead to more pow- a Student-t with a finite number of degrees of freedom (suit- erful inference, so that one faces the usual trade-off between ably scaled), as long as the fixed dimension is “asymptotically robustness and power in choosing the number of groups. In the homogeneous.” Recent work by Bester, Conley, and Hansen benchmark case of underlying iid observations, a 5% level t- (2009) builds on our paper and proposes inference about both statistic based test with q = 16 equal sized groups loses at most scalar and vector valued parameters based on the usual full 5.8 percentage points of asymptotic local power compared to sample estimator, using clustered standard errors with groups inference with known (or correctly consistently estimated) as- chosen as suggested here and critical values derived in Hansen ymptotic variance in an exactly identified GMM problem. The (2007). This approach requires the design matrix to be (asymp- robustness versus power trade-off is thus especially acute only totically) identical across groups. Under this homogeneity, their for datasets where an even coarser partition is required to yield procedure for inference about a scalar parameter is asymptoti- independent information about the parameter of interest. cally numerically identical to the t-statistic approach under both In applications of the t-statistic approach, the key question the null and local alternatives. At the same time, the t-statistic will nevertheless often be the adequate number and composi- approach remains valid even without this homogeneity, so from tion of groups. Some examples and Monte Carlo evidence for the usual first-order asymptotic point of view, the t-statistic ap- group choices are discussed in Section 3 below. The efficiency proach is strictly preferable. result mentioned above formally shows that one cannot dele- The t-statistic approach has an important precursor in the gate the decision about the adequate group choice to the data. work of Fama and MacBeth (1973). Their work on empirical On a fundamental level, some a priori knowledge about the cor- tests of the CAPM has motivated the following widespread ap- relation structure is required in order to be able to learn from proach to inference in panel regressions with firms or stocks as the data. This is also true of other approaches to inference, al- individuals: Estimate the regression separately for each year, and then test hypotheses about the coefficient of interest by though the assumed regularity tends to be more implicit. For in- the t-statistic of the resulting yearly coefficient estimates. The stance, consider the problem of conducting inference about the Fama–MacBeth approach is thus a special case of our suggested mean real exchange rate in 40 years of quarterly data. It seems method, where observations of the same year are collected in challenging to have a substantive discussion about the appropri- a group. While this approach is routinely applied, we are not ateness of, say, a confidence interval based on Andrews’ (1991) aware of a formal justification. One contribution of this paper is consistent long-run variance estimator, whose formal validity is to provide such a justification, and we find that as long as year based on primitive conditions involving mixing conditions and coefficient estimators are approximately normal (or scale mix- the like. [Or, for that matter, on Kiefer and Vogelsang’s (2005) tures of normals) and independent, the Fama–MacBeth method approach with a bandwidth of, say, 30% of the sample size.] At results in valid inference even for a short panel that is heteroge- the same time, it seems at least conceivable to debate whether nous over time. averages from, say, 8 year blocks provide approximately inde- The t-statistic approach generalizes the previous literature on pendent information; business cycle frequency fluctuations of large sample inference without consistent variance estimation the real exchange rate, for instance, would rule out the appro- to a generic strategy that can be employed in different settings, priateness of 4 year blocks.

T-Statistic Based Correlation and Heterogeneity Robust Inference

Heteroscedastic Errors

Power Comparisons of the Mann-Whitney U and Permutation Tests

Research Report Statistical Research Unit Goteborg University Sweden

Two-Way Heteroscedastic ANOVA When the Number of Levels Is Large

Profiling Heteroscedasticity in Linear Regression Models

Iterative Approaches to Handling Heteroscedasticity with Partially Known Error Variances

Logistic Regression, Part I: Problems with the Linear Probability Model

Lecture 3: Heteroscedasticity

Non-Parametric Vs. Parametric Tests in SAS® Venita Depuy and Paul A

Alternative Methods of Adjusting for Heteroscedasticity in Wheat Growth Data

Skedastic: Heteroskedasticity Diagnostics for Linear Regression

Regression Calibration with Heteroscedastic Error Variance