Outline Nature of Heteroscedasticity Possible Reasons
Basic Econometrics in Transportation
Heteroscedasticity
Amir Samimi
Civil Engineering Department, Sharif University of Technology
Primary Source: Basic Econometrics (Gujarati)

Outline
What is the nature of heteroscedasticity?
What are its consequences?
How does one detect it?
What are the remedial measures?

Nature of Heteroscedasticity
An important assumption in the CLRM is that $E(u_i^2) = \sigma^2$.
This is the assumption of equal (homo) spread (scedasticity).
Example: higher-income families on average save more than lower-income families, but there is also more variability in their savings.

Possible Reasons
1. As people learn, their errors of behavior become smaller over time.
   As the number of hours of typing practice increases, both the average number of typing errors and their variance decrease.
2. As incomes grow, people have more choices about the disposition of their income.
   Rich people have more choices about their savings behavior.
3. As data-collecting techniques improve, $\sigma_i^2$ is likely to decrease.
   Banks that have sophisticated data processing equipment are likely to commit fewer errors.
4. Heteroscedasticity can arise when there are outliers.
   An outlier is an observation that is much different from the other observations in the sample.
5. Heteroscedasticity arises when the model is not correctly specified.
   Very often what looks like heteroscedasticity may be due to the fact that some important variables are omitted from the model.

Cross-sectional and Time Series Data
Heteroscedasticity is likely to be more common in cross-sectional than in time series data.
In cross-sectional data, one usually deals with members of a population at a given point in time. These members may be of different sizes, income, etc.
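In time series data, the variables tend to be of similar orders of magnitude because one generally collects the data for the same entity over a period of time.

Possible Reasons (continued)
6. Skewness in the distribution of a regressor is another source.
   The distribution of income and wealth in most societies is uneven, with the bulk of the income and wealth being owned by a few at the top.
7. Other sources of heteroscedasticity:
   Incorrect data transformation (ratio or first difference transformations).
   Incorrect functional form (linear versus log-linear models).

OLS Estimation with Heteroscedasticity
The OLS estimator of $\beta_2$ and its variance when $E(u_i^2) = \sigma_i^2$ (lowercase $x_i$, $y_i$ denote deviations from the sample means):
   $\hat\beta_2 = \frac{\sum x_i y_i}{\sum x_i^2}$
   $\mathrm{var}(\hat\beta_2) = \frac{\sum x_i^2 \sigma_i^2}{\left(\sum x_i^2\right)^2}$
Is it still BLUE when we drop only the homoscedasticity assumption?
We can easily prove that it is still linear and unbiased.
We can also show that it is a consistent estimator.
It is no longer best, and the minimum variance is not given by the equation above.
What is BLUE in the presence of heteroscedasticity?

Method of Generalized Least Squares
Ideally, we would like to give less weight to the observations coming from populations with greater variability.
Consider: $Y_i = \beta_1 + \beta_2 X_i + u_i = \beta_1 X_{0i} + \beta_2 X_i + u_i$, where $X_{0i} = 1$ for every i.
Assume the heteroscedastic variances $\sigma_i^2$ are known and divide the model through by $\sigma_i$:
   $\frac{Y_i}{\sigma_i} = \beta_1 \frac{X_{0i}}{\sigma_i} + \beta_2 \frac{X_i}{\sigma_i} + \frac{u_i}{\sigma_i}$, written as $Y_i^* = \beta_1^* X_{0i}^* + \beta_2^* X_i^* + u_i^*$
Variance of the transformed disturbance term is now homoscedastic:
   $\mathrm{var}(u_i^*) = E\left(\frac{u_i}{\sigma_i}\right)^2 = \frac{1}{\sigma_i^2} E(u_i^2) = 1$
Apply OLS to the transformed model and get BLUE estimators.

GLS Estimators
Minimize $\sum w_i \hat u_i^2 = \sum w_i (Y_i - \hat\beta_1^* - \hat\beta_2^* X_i)^2$, where $w_i = 1/\sigma_i^2$.
Following the standard calculus techniques, we have:
   $\hat\beta_2^* = \frac{(\sum w_i)(\sum w_i X_i Y_i) - (\sum w_i X_i)(\sum w_i Y_i)}{(\sum w_i)(\sum w_i X_i^2) - (\sum w_i X_i)^2}$
   $\hat\beta_1^* = \bar Y^* - \hat\beta_2^* \bar X^*$, with $\bar Y^* = \sum w_i Y_i / \sum w_i$ and $\bar X^* = \sum w_i X_i / \sum w_i$

Consequences of Using OLS
The usual OLS estimator of the variance is a biased estimator.
   It overestimates or underestimates the true variance, on average.
   We cannot tell whether the bias is positive or negative.
   We can no longer rely on the usual confidence intervals or on t and F tests.
If we persist in using the usual testing procedures despite heteroscedasticity, whatever conclusions we draw may be very misleading.
Heteroscedasticity is potentially a serious problem and the researcher needs to know whether it is present in a given situation.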
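The GLS idea above can be checked numerically. The following is a minimal sketch in plain Python, assuming the $\sigma_i$ are known (as the slides do); the function names and the data are hypothetical and only for illustration. It estimates the two-variable model once by applying OLS to the transformed variables and once with the weighted formulas using $w_i = 1/\sigma_i^2$; the two routes should agree.

```python
def ols_two_regressors(x0, x1, y):
    """OLS for y = b1*x0 + b2*x1 (no separate intercept), via the 2x2 normal equations."""
    s00 = sum(a * a for a in x0)
    s01 = sum(a * b for a, b in zip(x0, x1))
    s11 = sum(b * b for b in x1)
    s0y = sum(a * c for a, c in zip(x0, y))
    s1y = sum(b * c for b, c in zip(x1, y))
    det = s00 * s11 - s01 * s01
    b1 = (s11 * s0y - s01 * s1y) / det
    b2 = (s00 * s1y - s01 * s0y) / det
    return b1, b2


def gls_by_transformation(x, y, sigma):
    """Divide every variable (including the constant X0 = 1) by sigma_i, then run OLS."""
    x0_star = [1.0 / s for s in sigma]
    x_star = [xi / s for xi, s in zip(x, sigma)]
    y_star = [yi / s for yi, s in zip(y, sigma)]
    return ols_two_regressors(x0_star, x_star, y_star)


def gls_by_weights(x, y, sigma):
    """Closed-form weighted least squares with weights w_i = 1 / sigma_i**2."""
    w = [1.0 / (s * s) for s in sigma]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    b2 = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    b1 = swy / sw - b2 * swx / sw
    return b1, b2


# Hypothetical data: the error spread is assumed to grow in proportion to X.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.1, 4.6, 7.5, 8.1, 12.2, 11.7]
sigma = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]  # assumed known, as in the slides

b1_t, b2_t = gls_by_transformation(x, y, sigma)
b1_w, b2_w = gls_by_weights(x, y, sigma)
print("transformation:", b1_t, b2_t)
print("weights:       ", b1_w, b2_w)
```

In practice $\sigma_i$ is rarely known, which is why the detection and remedial slides that follow matter; this sketch only illustrates the mechanics of the known-variance case.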
Detection
There are no hard-and-fast rules for detecting heteroscedasticity, only a few rules of thumb.
This is inevitable because $\sigma_i^2$ can be known only if we have the entire Y population corresponding to the chosen X's.
More often than not, there is only one sample Y value corresponding to a particular value of X, and there is no way one can know $\sigma_i^2$ from just one Y observation.
Thus, heteroscedasticity may be a matter of intuition, educated guesswork, or prior empirical experience.
Most of the detection methods are based on examination of the OLS residuals $\hat u_i$.
Those are the ones we observe, and not $u_i$. We hope they are good estimates.
This hope may be fulfilled if the sample size is fairly large.

Informal Methods
Nature of the Problem
   The nature of the problem may suggest that heteroscedasticity is likely to be encountered.
   For example, the residual variance around the regression of consumption on income increases with income.
Graphical Method
   The squared residuals $\hat u_i^2$ are plotted against the estimated $\hat Y_i$.
   Is the estimated mean value of Y systematically related to the squared residual?
   a) No systematic pattern: perhaps no heteroscedasticity.
   b-e) Definite pattern: perhaps no homoscedasticity.
   Using such knowledge, one may transform the data to alleviate the problem.

Formal Methods
Park Test
   Park formalizes the graphical method by suggesting a log-linear model:
   $\ln \sigma_i^2 = \ln \sigma^2 + \beta \ln X_i + v_i$
   Since $\sigma_i^2$ is generally unknown, Park suggests using $\hat u_i^2$ as a proxy:
   $\ln \hat u_i^2 = \ln \sigma^2 + \beta \ln X_i + v_i$
   If $\beta$ turns out to be statistically insignificant, the homoscedasticity assumption may be accepted.
   The particular functional form chosen by Park is only suggestive.
   Note: the error term $v_i$ may not satisfy the OLS assumptions.
Glejser Test
   Glejser suggests regressing the absolute value of the estimated error term on the X variable.
   The following functional forms are suggested:
   $|\hat u_i| = \beta_1 + \beta_2 X_i + v_i$
   $|\hat u_i| = \beta_1 + \beta_2 \sqrt{X_i} + v_i$
   $|\hat u_i| = \beta_1 + \beta_2 \frac{1}{X_i} + v_i$
   $|\hat u_i| = \beta_1 + \beta_2 \frac{1}{\sqrt{X_i}} + v_i$
   $|\hat u_i| = \sqrt{\beta_1 + \beta_2 X_i} + v_i$
   $|\hat u_i| = \sqrt{\beta_1 + \beta_2 X_i^2} + v_i$
   For large samples the first four give generally satisfactory results.
   The last two models are nonlinear in the parameters.
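The Park and Glejser regressions described above both reduce to a simple auxiliary OLS fit on the residuals. The sketch below, in plain Python, is one way this could look under simplifying assumptions: a single regressor with strictly positive X values, no residual exactly equal to zero (otherwise the logarithm in the Park regression is undefined), and only the first (linear) Glejser form. The helper names and the data are hypothetical.

```python
import math


def simple_ols(x, y):
    """Two-variable OLS; returns (intercept, slope, se_slope, residuals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    rss = sum(r * r for r in resid)
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return intercept, slope, se_slope, resid


def park_regression(x, resid):
    """Regress ln(u_hat^2) on ln(X); returns (beta, se_beta)."""
    log_x = [math.log(xi) for xi in x]
    log_u2 = [math.log(r * r) for r in resid]
    _, beta, se_beta, _ = simple_ols(log_x, log_u2)
    return beta, se_beta


def glejser_regression(x, resid):
    """First Glejser form: regress |u_hat| on X; returns (beta2, se_beta2)."""
    abs_u = [abs(r) for r in resid]
    _, beta2, se_beta2, _ = simple_ols(x, abs_u)
    return beta2, se_beta2


# Hypothetical data in which the scatter around the line widens with X.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.1, 4.6, 7.5, 8.1, 12.2, 11.7]

_, _, _, u_hat = simple_ols(x, y)          # step 1: original regression
park_beta, park_se = park_regression(x, u_hat)
glejser_beta, glejser_se = glejser_regression(x, u_hat)

# The t ratios would be compared with a critical t value (df = n - 2).
print("Park:    beta = %.3f, t = %.2f" % (park_beta, park_beta / park_se))
print("Glejser: beta = %.3f, t = %.2f" % (glejser_beta, glejser_beta / glejser_se))
```

With only six observations the t ratios here mean little; the slides' caveats about the error term $v_i$ apply to both auxiliary regressions.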
Note on the Glejser test: some have argued that its error term $v_i$ does not have a zero expected value, is serially correlated, and is itself heteroscedastic.

Formal Methods
Spearman's Rank Correlation Test
   Fit the regression to the data on Y and X and estimate the residuals.
   Rank both the absolute value of the residuals and $X_i$ (or estimated $Y_i$) and compute Spearman's rank correlation coefficient:
   $r_s = 1 - 6 \left[ \frac{\sum d_i^2}{n(n^2 - 1)} \right]$
   where $d_i$ = difference in the ranks for the ith observation.
   Assuming that the population rank correlation coefficient is zero and n > 8, the significance of the sample $r_s$ can be tested by the t test, with df = n − 2:
   $t = \frac{r_s \sqrt{n - 2}}{\sqrt{1 - r_s^2}}$
   If the computed t value exceeds the critical t value, we may accept the hypothesis of heteroscedasticity.
Goldfeld-Quandt Test
   Rank the observations according to $X_i$ values.
   Omit c central observations, and divide the remaining observations into two groups, each of (n − c)/2 observations.
   Fit separate OLS regressions to the first and last sets of observations, and obtain the residual sums of squares $RSS_1$ and $RSS_2$.
   Compute the ratio $\lambda = \frac{RSS_2 / df}{RSS_1 / df}$, where each RSS has df = (n − c)/2 − k and k is the number of estimated parameters.
   If $u_i$ are assumed to be normally distributed, and if the assumption of homoscedasticity is valid, then it can be shown that $\lambda$ follows the F distribution.
   The ability of the test to detect heteroscedasticity depends on how c is chosen.
   Goldfeld and Quandt suggest that c = 8 if n = 30, c = 16 if n = 60.
   Judge et al. note that c = 4 if n = 30 and c = 10 if n is about 60.
Breusch-Pagan-Godfrey Test
   The success of the GQ test depends on c and on the X variable with which the observations are ordered.
   Estimate $Y_i = \beta_1 + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i$ by OLS and obtain the residuals.
White's General Heteroscedasticity Test
   Does not rely on the normality assumption and is easy to implement.
   Estimate $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$ and obtain the residuals.
Breusch-Pagan-Godfrey Test (continued)
   Obtain $\tilde\sigma^2 = \sum \hat u_i^2 / n$ (the ML estimator of $\sigma^2$).
   Construct variables $p_i$ defined as $p_i = \hat u_i^2 / \tilde\sigma^2$.
   Regress $p_i$ on the Z's as $p_i = \alpha_1 + \alpha_2 Z_{2i} + \cdots + \alpha_m Z_{mi} + v_i$
      o $\sigma_i^2$ is assumed to be a linear function of the Z's.
      o Some or all of the X's can serve as Z's.
   Obtain the ESS (explained sum of squares) and define $\Theta = \frac{1}{2} ESS$.
   Assuming $u_i$ are normally distributed, one can show that if there is homoscedasticity and if the sample size n increases indefinitely, then $\Theta \sim \chi^2_{m-1}$.
   The BPG test is an asymptotic, or large-sample, test.

White's General Heteroscedasticity Test (continued)
   Run the following auxiliary regression:
   $\hat u_i^2 = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + \alpha_4 X_{2i}^2 + \alpha_5 X_{3i}^2 + \alpha_6 X_{2i} X_{3i} + v_i$
   Higher powers of regressors can also be introduced.
   Under the null hypothesis (homoscedasticity), if the sample size n increases indefinitely, it can be shown that $nR^2 \sim \chi^2$ (df = number of regressors in the auxiliary regression, excluding the constant term).
   If the chi-square value exceeds the critical value, the conclusion is that there is heteroscedasticity.
   If it does not, there is no heteroscedasticity, which is to say that $\alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = \alpha_6 = 0$.
   It has been argued that if cross-product terms are present, then it is a test of both heteroscedasticity and specification bias.

Remedial Measures
Heteroscedasticity does not destroy unbiasedness and consistency.
But OLS estimators are no longer efficient, not even asymptotically.
There are two approaches to remediation: when $\sigma_i^2$ is known, and when $\sigma_i^2$ is not known.
When $\sigma_i^2$ is known:
   The most straightforward method of correcting heteroscedasticity is by means of weighted least squares.
   The WLS method provides BLUE estimators.
When $\sigma_i^2$ is unknown:
   Is there a way of obtaining consistent estimates of the variances and covariances of OLS estimators even if there is heteroscedasticity? The answer is yes.

White's Correction
White has suggested a procedure by which asymptotically valid statistical inferences can be made about the true parameter values.

White's Procedure
For a 2-variable regression model $Y_i = \beta_1 + \beta_2 X_{2i} + u_i$ we showed: