Testing Heteroscedasticity in Robust Regression

VOLUME 4, 2011, www.researchjournals.co.uk

Jan KALINA
Institute of Computer Science of the Academy of Sciences of the Czech Republic

This work studies the phenomenon of heteroscedasticity and its consequences for various robust estimation methods for linear regression, including the least weighted squares, regression quantiles and trimmed least squares estimators. We investigate hypothesis tests for these regression methods and the removal of heteroscedasticity from the linear regression model. The new asymptotic heteroscedasticity tests for robust regression are asymptotically equivalent to the standard tests computed for least squares regression. We also describe an asymptotic approximation to the exact null distribution of the test statistics. We describe a robust estimation procedure for linear regression with heteroscedastic errors.

Classification and keywords: C14, C12, C21; heteroscedasticity; robust regression.

Homoscedasticity is known to be one of the essential assumptions of linear regression. It is important for estimating not only $\beta$ but also $\sigma^2$.

The paper starts by defining the phenomenon of heteroscedasticity, which is the violation of homoscedasticity, and presents its negative consequences. Tests of heteroscedasticity are presented for the least squares estimator, namely the Goldfeld-Quandt test and the Breusch-Pagan test. The new result is the asymptotic version of these tests derived for the least weighted squares, regression quantiles and trimmed least squares estimators. The solution of estimating parameters in the heteroscedastic model is called heteroscedastic regression, which is again described for various regression estimators.

In the whole paper we consider the linear regression model

$Y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + e_i, \quad i = 1, 2, \dots, n.$ (1)

The variance $\sigma^2$ of the disturbances is known to be a nuisance parameter. The assumption

$\mathrm{var}\, e_i = \sigma^2, \quad i = 1, \dots, n$ (2)

is called homoscedasticity, while its violation is denoted as heteroscedasticity.

Heteroscedasticity can have severe negative consequences, especially if the equality of variances of the disturbances is violated heavily. The regression parameters $\beta$ cannot be estimated efficiently. Denoting the least squares estimator of $\beta$ by $b$, the classical estimator of $\mathrm{var}\, b$ is biased. This disqualifies the classical hypothesis tests and confidence intervals for $\beta$ as well as the value of the coefficient of determination $R^2$. Diagnostic tools checking the assumption of equality of variances of the disturbances can be based on the residuals $u = (u_1, \dots, u_n)^T$, where

$u_i = Y_i - b_0 - b_1 x_{i1} - \dots - b_p x_{ip}, \quad i = 1, \dots, n,$ (3)

and $T$ denotes vector transposition.

This paper however also aims to show that $\sigma^2$ is a key parameter in estimating $\beta$ as well. It is crucial to estimate $\sigma^2$ reliably in order to obtain reliable tests of hypotheses about $\beta$ and reliable confidence intervals. Robust estimators of linear regression parameters are also sensitive to the assumption of homoscedasticity. In linear regression, $\sigma^2$ is known to be a nuisance parameter in estimating the regression parameters $\beta$. This does not mean that $\sigma^2$ is unimportant or that its estimation stands aside during the inference about $\beta$. We bring arguments that $\sigma^2$ plays a very important role in statistical inference and influences estimation procedures which aim only at the regression parameters $\beta$. While the regression is based on the (very nonrobust) sum of squares of residuals, the estimation of $\sigma^2$ is based on exactly the same sum of squares. This connects the problem of nonrobustness not only with the classical least squares estimator, but also with any other (for example robust) estimator of regression parameters.

We describe the classical Goldfeld-Quandt test and Breusch-Pagan test for least squares regression. Each of these tests is designed for a different alternative hypothesis. More details on standard heteroscedasticity tests can be found in econometric references (Greene, 2002) or (Judge et al., 1985). Although originally proposed in econometric journals, they serve as basic diagnostic tools in a general statistical (not only econometric) context.

The Goldfeld-Quandt test (Goldfeld and Quandt, 1965) is easy to compute and interpret. It tests the null hypothesis

$H_0: \mathrm{var}\, e_i = \sigma^2, \quad i = 1, \dots, n,$ (4)

against the alternative hypothesis

$H_1: \mathrm{var}\, e = \sigma^2\, \mathrm{diag}\{k_1, \dots, k_n\}, \quad i = 1, \dots, n,$ (5)

which models heteroscedasticity in a particular way. The constants $k_1, \dots, k_n$ must be selected by the statistician before the computation. In fact the test does not depend on these values, but its power depends on them. The alternative hypothesis expresses that the variance of the disturbances $e_1, \dots, e_n$ depends on some variable (or a combination of variables) in a monotone way. Typically one of the regressors in the linear regression model or the fitted values of the response are selected to explain the variability of the disturbances in this way. The test is based on dividing the data into three groups according to the values of the constants $k_1, \dots, k_n$. Let $SSE_1$ denote the residual sum of squares in the first group of the data and let $SSE_3$ denote the residual sum of squares computed in the third group. Let $r_1$ denote the number of observations in the first group, $r_3$ the number in the third group and $p$ the number of regression parameters in the linear regression model. Under homoscedasticity the test statistic

$F = \dfrac{SSE_3\,(r_1 - p)}{SSE_1\,(r_3 - p)}$ (6)

follows Fisher's F-distribution with $r_3 - p$ and $r_1 - p$ degrees of freedom.

The Breusch-Pagan test (Breusch and Pagan, 1979) requires the alternative hypothesis of heteroscedasticity to be specified in the form

$\mathrm{var}\, e_i = [\dots], \quad i = 1, \dots, n$ (7)

for some variables

$Z_1 = (Z_{11}, \dots, Z_{1n})^T, \; \dots, \; Z_K = (Z_{K1}, \dots, Z_{Kn})^T.$ (8)

Often one or more regressors in the original linear regression model are selected as these auxiliary variables. The null hypothesis corresponds to

$H_0: [\dots]$ (9)

which is tested against the general alternative hypothesis that the null hypothesis is not true. Breusch and Pagan (1979) derived the test statistic in the form of the Rao score test, which is one of the general asymptotic tests based on the likelihood function, in our case in the presence of nuisance parameters. This test assumes a normal distribution of the disturbances $e$.

White (1980) proposed a general test which is known as the White test. The test exploits White's proposal of an estimator of the variance matrix $\mathrm{var}\, e$, which is consistent also under heteroscedasticity. The test is based on comparing two estimators of the variance matrix, where the classical estimator is consistent only under homoscedasticity, while White's estimator is consistent also under the alternative hypothesis. Therefore large values of the test statistic speak in favour of the alternative hypothesis. However, the White test is a special case of the Breusch-Pagan test. Here the auxiliary variables $Z_1, \dots, Z_K$ are chosen to contain the squares of all regressors in the original model and also the products of pairs of regressors in the form $X_i X_j$ for $i \neq j$.

The least squares estimator is known to be too vulnerable to violations of the assumption of the normal distribution of the disturbances $e$. Therefore robust statistical methods are studied intensively in the literature (see Jurečková and Sen, 1996); they serve as a diagnostic tool for the least squares estimator or can be used as an independent tool for statistical modeling. One of the efficient estimators is the least weighted squares proposed [...]

[...] a permutation, which is determined automatically only during the computation, based on the residuals. It is reasonable to choose the weights so that the sequence $w_1, w_2, \dots, w_n$ is decreasing (nonincreasing), so that the most reliable observations obtain the largest weights, while outliers with large values of the residuals get small (or zero) weights.

Let us denote the $i$th order value among the squared residuals for a particular value of the estimate $b$ of the parameter $\beta$ by $u_{(i)}^2(b)$. The least weighted squares estimator $b_{LWS}$ for the model (1) is defined as

$b_{LWS} = \arg\min_b \sum_{i=1}^{h} w_i\, u_{(i)}^2(b).$ (10)

Kalina (2007) proposed an approximative algorithm for the intensive computation of the LWS estimator and described diagnostic tests for the estimator, which are equivalent to those computed for least squares regression. A special case with weights equal to either 1 or 0 is the popular least trimmed squares (LTS) estimator, which has excellent properties in outlier detection (see Hekimoglu et al., 2009).

The least weighted squares estimator has interesting applications, which follow from its robustness combined with efficiency for normal data. Theoretical properties of the estimator, including its breakdown point, are studied by Víšek (2001). The LWS estimator is especially suitable in comparison with other robust regression estimators, because diagnostic tools (such as tests of heteroscedasticity and autocorrelation of the errors $e$) can be computed directly using the weighted residuals and again are not affected by outliers. Another advantage of the estimator is that no detection of outliers is actually needed to compute it, because outlying data are downweighted automatically. Víšek (2010) conjectures that the LWS estimator is a reasonable compromise between the least squares and least trimmed squares, namely that the estimator combines the efficiency of the least squares with the robustness of the least trimmed squares.

Kalina (2009) proposed the asymptotic Goldfeld-Quandt test and the asymptotic Breusch-Pagan test for the least weighted squares estimator. Víšek (2010) derives White's estimator of $\mathrm{var}\, e$ for the LWS regression, which is based on the LWS estimation and is consistent under heteroscedasticity. This allows a test statistic of White (1980) to be defined directly, tailor-made for the context of the LWS regression. Now we use these existing results and the ideas of their proofs to derive asymptotic heteroscedasticity tests for regression quantiles.

Theorem 1. Let the test statistic $F$ of the Goldfeld-Quandt test be computed using residuals of the LWS regression estimator with a parameter $\alpha$.
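As an illustration of the classical Goldfeld-Quandt statistic (6) described earlier for least squares regression, the following is a minimal sketch in plain Python. It rests on several assumptions that are not part of the paper: a simple regression with an intercept and one regressor (so the number of regression parameters is 2), three groups of equal size, and the hypothetical helper names `sse_simple_ols` and `goldfeld_quandt`. It computes only the statistic and its degrees of freedom; comparing the value with a quantile of the F-distribution is left out.

```python
from statistics import mean


def sse_simple_ols(x, y):
    """Residual sum of squares of the least squares line y = b0 + b1 * x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


def goldfeld_quandt(x, y, k, n_params=2):
    """Return (F, (df1, df2)) with F = SSE3 (r1 - p) / (SSE1 (r3 - p)).

    k holds the constants k_1, ..., k_n used to order the observations.
    Assumption: the first and third groups are the lowest and highest
    thirds of the data after ordering by k.
    """
    order = sorted(range(len(y)), key=lambda i: k[i])
    r = len(y) // 3
    first, third = order[:r], order[-r:]
    r1, r3 = len(first), len(third)
    if min(r1, r3) <= n_params:
        raise ValueError("each group needs more observations than parameters")
    # Separate least squares fits in the first and the third group.
    sse1 = sse_simple_ols([x[i] for i in first], [y[i] for i in first])
    sse3 = sse_simple_ols([x[i] for i in third], [y[i] for i in third])
    f = (sse3 * (r1 - n_params)) / (sse1 * (r3 - n_params))
    return f, (r3 - n_params, r1 - n_params)


# Artificial data whose disturbances grow in magnitude with the regressor.
x = [float(i) for i in range(1, 31)]
y = [2.0 + 0.5 * xi + 0.1 * xi * (-1) ** int(xi) for xi in x]
f_stat, dof = goldfeld_quandt(x, y, k=x)
print(f_stat, dof)
```

Here the regressor itself plays the role of the constants $k_1, \dots, k_n$, which corresponds to the typical choice mentioned in the text. A value of the statistic far above 1 speaks against homoscedasticity.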
