<<

Lecture 15 in

Stationarity If the time series is normally distributed then strictly stationary and weekly stationary time series are considered equivalent.

Autocorrelation The autocorrelation (Box and Jenkins, 1976) function can be used for the following two purposes: 1. To detect non- in data. 2. To identify an appropriate time series model if the data are not random.

Definition Given measurements, Y1, Y2, ..., YN at time X1, X2, ..., XN, the lag k autocorrelation function is defined as − − = − Although the time variable, X, is not used in the formula for autocorrelation, the assumption is that the observations are equi-spaced.

Autocorrelation is a correlation coefficient. However, instead of correlation between two different variables, the correlation is between two values of the same variable at times Xi and Xi+k .

When the autocorrelation is used to detect non-randomness, it is usually only the first (lag 1) autocorrelation that is of interest. When the autocorrelation is used to identify an appropriate time series model, the are usually plotted for many lags.

Sample Output Dataplot generated the following autocorrelation output using the data set: The lag-one autocorrelation coefficient of the 200 observations = -0.3073 The computed value of the constant a = -0.3073 lag autocorrelation 12. -0.55 25. -0.66 38. -0.64 0. 1.00 13. 0.73 26. 0.42 39. 0.08 1. -0.31 14. 0.07 27. 0.39 40. 0.58 2. -0.74 15. -0.76 28. -0.65 41. -0.45 3. 0.77 16. 0.40 29. 0.03 42. -0.28 4. 0.21 17. 0.48 30. 0.63 43. 0.62 5. -0.90 18. -0.70 31. -0.42 44. -0.10 6. 0.38 19. -0.03 32. -0.36 45. -0.55 7. 0.63 20. 0.70 33. 0.64 46. 0.45 8. -0.77 21. -0.41 34. -0.05 47. 0.25 9. -0.12 22. -0.43 35. -0.60 48. -0.61 10. 0.82 23. 0.67 36. 0.43 49. 0.14 11. -0.40 24. 0.00 37. 0.32

Questions The autocorrelation function can be used to answer the following questions 1. Was this sample data set generated from a random process? 2. Would a non-linear or time series model be a more appropriate model for these data than a simple constant plus error model? Instructor: Azmat Nafees Time Series and Forecasting 30-Oct-10 1

Importance Randomness is one of the key assumptions in determining if a univariate statistical process is in control. If the assumptions of constant location and scale, randomness, and fixed distribution are reasonable, then the univariate process can be modeled as:

= + where Ei is an error term. If the randomness assumption is not valid, then a different model needs to be used. This will typically be either a time series model or a non-linear model (with time as the independent variable).

Autocorrelation Plot Autocorrelation plots (Box and Jenkins) are a commonly-used tool for checking randomness in a data set. This randomness is ascertained by computing autocorrelations for data values at varying time lags. If random, such autocorrelations should be near zero for any and all time-lag separations. If non- random, then one or more of the autocorrelations will be significantly non-zero.

In addition, autocorrelation plots are used in the model identification stage for Box-Jenkins autoregressive, moving average time series models.

Sample Plot: Autocorrelations should be near-zero for randomness. Such is not the case in this example and thus the randomness assumption fails.

This sample autocorrelation plot shows that the time series is not random, but rather has a high degree of autocorrelation between adjacent and near-adjacent observations.

Definition : Autocorrelation plots are formed by • Vertical axis: Autocorrelation coefficient =

ℎ ℎ is between − 1 and + 1 where Ch is the autocovariance function 1 = − − and C0 is the function − = • Horizontal axis: Time lag h (h = 1, 2, 3, ...) • The above line also contains several horizontal reference lines. The middle line is at zero. The other four lines are 95% and 99% confidence bands. Note that there are two distinct formulas for generating the confidence bands.

Autocorrelation plots are also used in the model identification stage for fitting ARIMA models. In this case, a moving average model is assumed for the data.

Instructor: Azmat Nafees Time Series and Forecasting 30-Oct-10 2

Importance : Randomness (along with fixed model, fixed variation, and fixed distribution) is one of the four assumptions that typically underlie all measurement processes. The randomness assumption is critically important for the following three reasons: 1. Most standard statistical tests depend on randomness. The validity of the test conclusions is directly linked to the validity of the randomness assumption. 2. Many commonly-used statistical formulae depend on the randomness assumption, the most common formula being the formula for determining the of the sample : = where is the standard deviation of the data. Although heavily used, the results from using this formula are of no value unless the randomness assumption holds.

3. For univariate data, the default model is

Y = constant + error

If the data are not random, this model is incorrect and invalid, and the estimates for the parameters (such as the constant) become nonsensical and invalid.

In short, if the analyst does not check for randomness, then the validity of many of the statistical conclusions becomes suspect. The autocorrelation plot is an excellent way of checking for such randomness.

Instructor: Azmat Nafees Time Series and Forecasting 30-Oct-10 3