Spurious Regressions
Clive W. J. Granger
From The New Palgrave Dictionary of Economics, Second Edition, 2008
Edited by Steven N. Durlauf and Lawrence E. Blume
©Palgrave Macmillan. The New Palgrave Dictionary of Economics. www.dictionaryofeconomics.com. You may not copy or distribute without permission. Licensee: Palgrave Macmillan.

Abstract
Simulations have shown that if two independent time series, each being highly autocorrelated, are put into a standard regression framework, then the usual measures of goodness of fit, such as t and R-squared statistics, will be badly biased and the series will appear to be ‘related’. This possibility of a ‘spurious relationship’ between variables in economics, particularly in macroeconomics and finance, restrains the form of model that can be used. An error-correction model will provide a solution in some cases.

Keywords
autocorrelation; Durbin–Watson statistic; econometrics; ordinary least squares (OLS); regression; serial correlation; spurious regression; Wiener process

JEL classifications
C1

Article
For the first three-quarters of the 20th century the main workhorse of applied econometrics was the basic regression

$$Y_t = a + bX_t + e_t. \tag{1}$$

Here the variables are indicated as being measured over time, but could be over a cross-section; and the equation was estimated by ordinary least squares (OLS). In practice more than one explanatory variable $X$ would be likely to be used, but the form (1) is sufficient for this discussion. Various statistics can be used to describe the quality of the regression, including $R^2$, t-statistics for $b$, and the Durbin–Watson statistic $d$, which relates to any autocorrelation in the residuals. A good fitting model should have $|t|$ near 2, $R^2$ quite near one, and $d$ near 2.

In standard situations the regression using OLS works well, and researchers used it with confidence. But there were several indications that in special cases the method could produce misleading results. In particular, when the individual series
have strong autocorrelations, time series analysts had realized by the early 1970s that the situation may not be so simple: apparent relationships may often be observed when such regressions are given their standard interpretation. Because the relationship appears to be found between independent series, such regressions have been called ‘spurious’. Note that, if $b = 0$, then $e_t$ must have the same time series properties as $Y_t$, that is, it will be strongly autocorrelated, and so the assumptions of the classical OLS regression will not be obeyed.

The possibility of getting incorrect results from regressions was originally pointed out by Yule (1926) in a much cited paper that discussed ‘nonsense correlations’. Kendall (1954) also pointed out that a pair of independent autoregressive series of order one could have a high apparent correlation between them; and so if they were put into a regression a spurious relationship could be obtained.

The magnitude of the problem was found from a number of simulations. The first simulation on the topic was by Granger and Newbold (1974), who generated pairs of independent random walks, from (1) with $a = b = 1$. Each series had 50 terms and 100 repetitions were used. If the regression is run using series that are temporally uncorrelated, one would expect that roughly 95 per cent of values of $|t|$ on $b$ would be less than 2. This original simulation using random walks found $|t| \le 2$ on only 23 occasions out of the 100; $|t|$ was between 2 and 4 on 24 occasions, between 4 and 7 on 34 occasions, and over 7 on the other 19 occasions.

The reaction to these results was to reassess many of the previously obtained empirical results in applied time series econometrics, which undoubtedly involved highly autocorrelated series but had not previously been concerned by this fact.
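The random walk experiment described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not a reconstruction of Granger and Newbold's original program: the function names, the seed, and the choice of standard normal shocks are assumptions made here, while the sample size of 50 and the 100 repetitions follow the text.

```python
import math
import random


def ols_summary(x, y):
    """Fit y = a + b*x by OLS; return (t-statistic of b, R^2, Durbin-Watson d)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    rss = sum(e * e for e in resid)
    se_b = math.sqrt(rss / (n - 2) / sxx)  # conventional OLS standard error of b
    d = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, n)) / rss
    return b / se_b, 1.0 - rss / syy, d


def random_walk(n_obs, rng):
    """Driftless random walk with standard normal shocks (an assumption)."""
    level, path = 0.0, []
    for _ in range(n_obs):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def simulate(n_obs=50, n_reps=100, seed=0):
    """Regress one independent random walk on another, n_reps times."""
    rng = random.Random(seed)
    return [
        ols_summary(random_walk(n_obs, rng), random_walk(n_obs, rng))
        for _ in range(n_reps)
    ]


results = simulate()
share_rejected = sum(abs(t) > 2 for t, _, _ in results) / len(results)
mean_dw = sum(d for _, _, d in results) / len(results)
print(f"share of |t| > 2: {share_rejected:.2f}; mean Durbin-Watson d: {mean_dw:.2f}")
```

With independent series and well-behaved errors the share of $|t| > 2$ would be close to 5 per cent; in runs of this sketch it should come out far higher, alongside a Durbin–Watson statistic well below 2.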
Just having a high $R^2$ value and an apparently significant value of $b$ was no longer sufficient for a regression to be satisfactory or its interpretations relevant. The immediate questions were how one could easily detect a spurious regression and then correct for it. Granger and Newbold (1974) concentrated on the value of the Durbin–Watson statistic: if the value is too low, it suggests that the regression's results cannot be trusted. Remedial methods such as using a Cochrane–Orcutt technique to correct autocorrelations in the residuals, or differencing the series used in a regression, were inclined to introduce further difficulties and could not be recommended. The problem arises because the equation is mis-specified; the proper reaction to having a possible spurious relationship is to add lagged dependent and independent variables until the errors appear to be white noise, according to the Durbin–Watson statistic.

A random walk is an example of an I(1) process, that is, a process that needs to be differenced to become stationary. Such processes seem to be common in parts of econometrics, especially in macroeconomics and finance. One approach that is widely recommended is to test whether $X_t$, $Y_t$ are I(1) and, if so, to difference before one performs the regression. There are many tests available; a popular one is due to Dickey and Fuller (1979).

A theoretical investigation of the basic unit root, ordinary least squares, spurious regression case was undertaken by Phillips (1986). He considered the asymptotic properties of the coefficients and statistics of eq. (1): $\hat{a}$, $\hat{b}$, the t-statistic for $b$, $R^2$ and the Durbin–Watson statistic $\hat{\rho}$. To do this he introduced the link between normed sums of functions of unit root processes and integrals of Wiener processes.
For example, if a sample $X_t$ of size $T$ is generated from a driftless random walk, then

$$T^{-2} \sum_{t=1}^{T} X_t^2 \to \sigma_\varepsilon^2 \int_0^1 W^2(t)\,dt,$$

where $\sigma_\varepsilon^2$ is the variance of the shock, and $W(t)$ is a Wiener process. As a Wiener process is a continuous time random process on the interval [0,1], the various sums involved are converging and can thus be replaced by integrals of a stochastic process. This transformation makes the mathematics of the investigation much easier, once one becomes familiar with the new tools. Phillips is able to show that

• the distributions of the t-statistics for $\hat{a}$ and $\hat{b}$ from (1) diverge as $T$ becomes large, so there are no asymptotically correct critical values for these conventional tests;
• $\hat{b}$ converges to some random variable whose value changes from sample to sample;
• Durbin–Watson statistics tend to zero; and
• $R^2$ does not tend to zero but to some random variable.

What is particularly interesting is not only that these theoretical results completely explain the simulations but also that the theory deals with asymptotics, $T \to \infty$, whereas the original simulations had only $T = 50$. It seems that spurious regression occurs at all sample sizes.

Haldrup (1994) has extended Phillips's result to the case of two independent I(2) variables and obtained similar results. (An I(2) variable is one that needs differencing twice to get to stationarity, or, here, differencing once to get to random walks.) Marmol (1998) has further extended these results to fractionally integrated I(d) processes. Durlauf and Phillips (1988) regressed I(1) processes on deterministic polynomials in time, that is, polynomial trends, and found spurious relationships.
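Three of the asymptotic results attributed to Phillips above (growing t-statistics, a Durbin–Watson statistic that shrinks towards zero, and an $R^2$ that stays away from zero) can be illustrated numerically by repeating the random walk regression at increasing sample sizes. The sketch below is a finite simulation, not a proof; the sample sizes, the number of repetitions, the seed and all function names are assumptions chosen for illustration.

```python
import math
import random
import statistics


def fit(x, y):
    """OLS of y on a constant and x; returns (|t| for b, R^2, Durbin-Watson d)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    rss = sum(e * e for e in resid)
    t_abs = abs(b) / math.sqrt(rss / (n - 2) / sxx)
    d = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, n)) / rss
    return t_abs, 1.0 - rss / syy, d


def random_walk(n_obs, rng):
    """Driftless random walk with standard normal shocks (an assumption)."""
    level, path = 0.0, []
    for _ in range(n_obs):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def summarize(n_obs, n_reps, rng):
    """Median |t|, mean R^2 and mean d over n_reps independent pairs of walks."""
    fits = [fit(random_walk(n_obs, rng), random_walk(n_obs, rng)) for _ in range(n_reps)]
    return (
        statistics.median(f[0] for f in fits),
        statistics.fmean(f[1] for f in fits),
        statistics.fmean(f[2] for f in fits),
    )


rng = random.Random(1)
summary = {n_obs: summarize(n_obs, 200, rng) for n_obs in (50, 200, 800)}
for n_obs, (median_t, mean_r2, mean_d) in summary.items():
    print(f"T = {n_obs}: median |t| = {median_t:.1f}, mean R^2 = {mean_r2:.2f}, mean d = {mean_d:.3f}")
```

Reading down the printed rows, the median $|t|$ should rise with $T$ and the mean Durbin–Watson statistic should fall, while the mean $R^2$ should remain at a similar, clearly non-zero level.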
Although spurious regressions in econometrics are usually associated with I(1) processes, which were explored in Phillips's well-known theory and in the best known simulations, what is less appreciated is that the problem can also occur, although less clearly, with stationary processes.

Table 1 shows simulation results from independent series generated by two first order autoregressive models with coefficients $a_1$ and $a_2$, where $0 \le a_1 = a_2 \le 1$, and with inputs $e_{xt}$, $e_{yt}$ both Gaussian white noise series, using regression (1) estimated by OLS with sample sizes varying between 100 and 10,000.

Table 1 Regression between independent AR(1) series

Sample size   a = 0   a = 0.25   a = 0.5   a = 0.75   a = 0.9   a = 1.0
100           4.9     6.8        13.0      29.9       51.9      89.1
500           5.5     7.5        16.1      31.6       51.1      93.7
2,000         5.6     7.1        13.6      29.1       52.9      96.2
10,000        4.1     6.4        12.3      30.5       52.0      98.3

Note: $a_1 = a_2 = a$; entries are the percentage of regressions with $|t| > 2$.
Source: Granger, Hyung and Jeon (2001).

It is seen that sample size has little impact on the percentage of spurious regressions found (apparent significance of the $b$ coefficient in (1)). Fluctuations down columns do not change significantly with the number of iterations used. Thus, the spurious regression problem is not a small sample property. It is also seen to be a serious problem with pairs of autoregressive series which are not unit root processes. If $a = 0.75$, for example, then about 30 per cent of regressions will give spurious implications. Further results are available in the original paper but will not be reported in detail. The Gaussian error assumption can be replaced by other distributions with little or no change in the simulation results, except for an exceptional distribution such as the Cauchy. Spurious regressions also occur if $a_1 \ne a_2$, although less frequently, and particularly if the smaller of the two $a$ values is at least 0.5 in magnitude.
The obvious implication of these results is that applied econometricians should worry about spurious regressions not only when dealing with I(1), unit root, processes.
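The stationary case reported in Table 1 can be explored with a small experiment of the same kind. The sketch below is an illustration under stated assumptions, not the procedure of Granger, Hyung and Jeon (2001): the burn-in used to start each AR(1) series, the 500 repetitions, the seed and the function names are all choices made here. Only the sample size of 100, the Gaussian shocks and the $|t| > 2$ criterion are taken from the text.

```python
import math
import random


def ar1_series(a, n_obs, rng, burn_in=100):
    """AR(1) path x_t = a * x_{t-1} + e_t with standard normal shocks.

    The first burn_in values are discarded (an assumption made here) so that
    the retained sample does not depend on the zero starting value.
    """
    level, path = 0.0, []
    for step in range(burn_in + n_obs):
        level = a * level + rng.gauss(0.0, 1.0)
        if step >= burn_in:
            path.append(level)
    return path


def abs_t_stat(x, y):
    """|t| for the slope in an OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a_hat = my - b * mx
    rss = sum((yi - a_hat - b * xi) ** 2 for xi, yi in zip(x, y))
    return abs(b) / math.sqrt(rss / (n - 2) / sxx)


def rejection_rate(a, n_obs=100, n_reps=500, seed=0):
    """Share of regressions between independent AR(1) series with |t| > 2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        x = ar1_series(a, n_obs, rng)
        y = ar1_series(a, n_obs, rng)
        if abs_t_stat(x, y) > 2.0:
            hits += 1
    return hits / n_reps


rates = {a: rejection_rate(a) for a in (0.0, 0.5, 0.9)}
for a, rate in rates.items():
    print(f"a = {a}: {100 * rate:.1f} per cent of regressions have |t| > 2")
```

The printed percentages can be compared with the first row of Table 1. Exact agreement should not be expected, since the number of repetitions and the initialization differ from the published study, but the rise in the rejection rate as $a$ increases should be visible.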