
A forecasting model for Absorbent Paper Towels

1 Introduction

The Olympia Paper Company, Inc., makes Absorbent Paper Towels. The company would like to develop a prediction model that can be used to give point and interval forecasts of weekly sales over 100,000 rolls, in units of 10,000 rolls, of Absorbent Paper Towels. With a reliable model, Olympia Paper can more effectively plan its production schedule, plan its budget, and estimate requirements for producing and storing its product. For the past 120 weeks the company has recorded weekly sales of Absorbent Paper Towels. The 120 sales figures, y1, y2, ..., y120, are plotted in Figure 1.


Figure 1: Raw weekly towel sales

The overall mean of the series is ȳ = 11.58391 and the standard deviation is s = 4.377476. But it should be noticed from Figure 1 that the original values of the time series do not seem to fluctuate around a constant mean, and hence it would seem that these values are nonstationary (Bowerman et al., 2005).

We decided to develop an ARIMA(p,d,q) model as our prediction model. There are many other models we could try (e.g., regression with autocorrelated errors, time series regression models, etc.), but due to the good forecasting performance of ARIMA models, and also due to the fact that we have no other information on the series, we decided to start with ARIMA. In future reports, we will compare the forecasting performance of this model with other models. The software used for most of the analysis is SAS V8.2. Plots of the raw data and the prediction intervals were not done with SAS.

2 Identification of an ARIMA(p,d,q) model for the towels data

The first step in ARIMA modeling is to check the sample autocorrelation function (ACF). As we can see in Figure 2 of the SAS output, the autocorrelation plot indicates that the series is not stationary, as we suspected. The correlations die down very slowly.

The autocorrelation check for white noise, when the series is nonstationary, should indicate that the autocorrelations are significantly different from 0. The hypothesis tested is

Ho : all correlation coefficients up to lag l are 0

Ha : not all correlation coefficients up to lag l are 0

The test statistic for this test (see Bowerman et al., page 459, for one specification) is

Q = n(n + 2) Σ_{l=1}^{K*} (n − l)^{-1} r_l^2

and it has a Chi-square distribution. This is called the Ljung-Box test statistic. The P-value is the probability that a Chi-square random variable exceeds our observed Q. Figure 3 in the SAS output contains the results of our Ljung-Box test. We can see that the p-values are so small (P < 0.0001) that we reject the null hypothesis. There is significant evidence that there is autocorrelation between the yi's. This test is not really telling us whether there is nonstationarity or not. It is not adding that much information here, since the ACF is so obvious. It is just confirming that indeed there is some correlation. The test will be more useful later, to test whether the residuals of the model are white noise or not.
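The report computes this test in SAS, but the statistic is simple enough to compute directly. Here is an illustrative Python sketch (the function name is ours; for a raw series the chi-square degrees of freedom equal the number of lags tested):

```python
import numpy as np
from scipy import stats

def ljung_box(y, max_lag):
    """Ljung-Box Q statistic and p-value for lags 1..max_lag.
    For a raw series the degrees of freedom equal max_lag; for model
    residuals one would subtract the number of estimated parameters."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    z = y - y.mean()
    denom = np.sum(z ** 2)
    # sample autocorrelations r_1..r_max_lag
    r = np.array([np.sum(z[l:] * z[:-l]) / denom
                  for l in range(1, max_lag + 1)])
    lags = np.arange(1, max_lag + 1)
    q = n * (n + 2) * np.sum(r ** 2 / (n - lags))
    return q, stats.chi2.sf(q, df=max_lag)
```

Applied to a strongly trending (hence nonstationary) series, the p-value is essentially zero, mirroring the P < 0.0001 seen in the SAS output.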

Since the plot of the data and the correlogram suggest nonstationarity, we should try differencing the series before identifying an ARIMA model for it.

So we difference the series once, i.e., compute zt = yt − yt−1. The differenced series has a mean of 0.005423 and a standard deviation of 1.099416. The number of observations left is 119.
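In Python this differencing step is a single call. A small illustrative sketch (the values below are hypothetical; the report's 120 actual sales figures are not reproduced here):

```python
import numpy as np

# Hypothetical weekly sales values; differencing loses one observation.
y = np.array([15.1, 15.3, 15.0, 15.6, 15.5])
z = np.diff(y)  # z_t = y_t - y_{t-1}
print(len(z), z.mean(), z.std(ddof=1))
```

With the real series of 120 observations, the differenced series would have 119 values, as reported above.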

The correlogram of the differenced data, in Figure 4 of the SAS output, suggests that, since there is a cutoff (or fast decline) of the ACF, we can tentatively say that the differenced series is stationary. The Ljung-Box test for the differenced series (Figure 5 of the SAS output) still suggests that the series is autocorrelated, but now less so (P-value = 0.0206 for lag 6, and the autocorrelation values are much smaller). If the P-value for this test were bigger than 0.05, we would have a white noise, uncorrelated series and we would not need any model. So our results are good: they say that the series can be modeled, since it is somewhat correlated (but not as much as when it was nonstationary).

With a stationary series, we can now try to identify a possible ARIMA model. For that we will also need the partial autocorrelation function (PACF, in Figure 6 of the SAS output). As guidelines for selecting a model we will use Table 9.5, page 436, of Bowerman et al. According to those guidelines, one could say that the ACF of the differenced data dies down quickly and the PACF cuts off after lag 1. But it could be the other way around (see the argument in Chapter 9 of Bowerman et al. in favor of an MA model based on the ACF and PACF). So it is worth trying an AR(1) model (ARIMA(1,1,0)), an MA(1) model (ARIMA(0,1,1)), or both. We must say that since the white noise test suggests that the autocorrelations are not too strong, an MA might be better, but we will see. In this report we estimate an ARIMA(1,1,0).
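The ACF and PACF used for identification can also be computed directly. A numpy-only sketch (function names are ours; the PACF uses the Durbin-Levinson recursion, in which the lag-k value is the last coefficient of the best-fitting AR(k) model):

```python
import numpy as np

def sample_acf(z, max_lag):
    """Sample autocorrelations r_1..r_max_lag."""
    z = np.asarray(z, dtype=float) - np.mean(z)
    denom = np.sum(z ** 2)
    return np.array([np.sum(z[k:] * z[:-k]) / denom
                     for k in range(1, max_lag + 1)])

def sample_pacf(z, max_lag):
    """Sample PACF via the Durbin-Levinson recursion."""
    r = np.concatenate(([1.0], sample_acf(z, max_lag)))
    phi = np.zeros((max_lag + 1, max_lag + 1))
    pacf = np.zeros(max_lag)
    phi[1, 1] = r[1]
    pacf[0] = r[1]
    for k in range(2, max_lag + 1):
        num = r[k] - np.sum(phi[k - 1, 1:k] * r[k - 1:0:-1])
        den = 1.0 - np.sum(phi[k - 1, 1:k] * r[1:k])
        phi[k, k] = num / den
        phi[k, 1:k] = phi[k - 1, 1:k] - phi[k, k] * phi[k - 1, k - 1:0:-1]
        pacf[k - 1] = phi[k, k]
    return pacf
```

An ACF that dies down with a PACF that cuts off after lag 1 points to an AR(1); the mirror-image pattern points to an MA(1), which is why both candidates are worth estimating.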

3 Estimation of the model chosen

The model that we identified is zt = φzt−1 + at, where zt = yt − yt−1. We do not include a constant term because the mean of the differenced series is so close to 0. SAS uses the conditional method of estimation for the parameters. This method obtains the parameters that minimize the sum of squared residuals. We will be talking about methods of estimation later. The algorithm used for the estimation is an iterative procedure that requires an initial value. Most software packages make use of the relation that theory provides between autocorrelation coefficients and model parameters to give an initial estimate (see Table 9.6 in Bowerman et al.), but SAS does not use that for initial estimates. The estimated model is

zt = 0.30688zt−1 + at  (s.e. = 0.08765, t-value = 3.50, p-value = 0.0007)

Before we interpret this model, we should check the invertibility and stationarity conditions (see Table 10.1 in Bowerman et al.). Since the absolute value of the slope coefficient is less than 1, we conclude that the series is stationary. An AR model is always invertible, so we don't need to check that. Since stationarity and invertibility check out, we proceed to interpret the model further.
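With no constant term, conditional least squares for an AR(1) has a closed form: it is the regression of zt on zt−1 through the origin. A Python sketch of the idea (ours, not the actual SAS algorithm; on a noiseless AR(1) sequence it recovers φ exactly):

```python
import numpy as np

def fit_ar1_cls(z):
    """Conditional least squares for z_t = phi * z_{t-1} + a_t (no
    constant): regress z_t on z_{t-1}, conditioning on the first value."""
    z = np.asarray(z, dtype=float)
    sxx = np.sum(z[:-1] ** 2)
    phi = np.sum(z[1:] * z[:-1]) / sxx
    resid = z[1:] - phi * z[:-1]
    sigma2 = np.sum(resid ** 2) / (len(resid) - 1)  # one parameter estimated
    return phi, np.sqrt(sigma2 / sxx), resid
```

The returned standard error of φ is the usual regression-through-the-origin formula, the analogue of the 0.08765 reported by SAS.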

The p-value for the estimate of the slope coefficient is less than 0.05 so we reject the null hypothesis that the slope coefficient is 0 in favor of the alternative that it is different from 0. There is a linear relation between zt and zt−1.

The estimate of σ², the variance of at, is 1.104263; the corresponding standard error estimate (the estimate of σ) is 1.050839. The value of the AIC is 350.5054 and the SBC is 353.2845. These aid in comparing this model to other models, so they will be relevant when comparing the AR to the MA model estimated separately. Smaller AIC and SBC indicate better-fitting models. All the estimation results can be seen in Figure 7 of the SAS output.

If the diagnostic checks we will do next suggest that this is a good model, this will be our forecasting model.

4 Diagnostic Checking

We checked the residuals from the model. To do that, we did another Ljung-Box test for white noise (Figure 8 of the SAS output, this time for the residuals). A model is good if the residuals don't show any correlation or pattern, that is, if they are just plain white noise. In this test, the P-value for up to lag 6 is 0.2013, so we cannot reject the hypothesis that the autocorrelations of the residuals are 0. This suggests that the residuals are not correlated, indicating that the AR(1) or ARIMA(1,1,0) model is appropriate, i.e., it fits the series well and there is no autocorrelation left to explain. What is left after fitting the model is just noise. The autocorrelation and partial autocorrelation functions for the residuals (Figure 8 of the SAS output) also show no autocorrelations past the two-standard-error limits, i.e., there are no significant autocorrelations, confirming the Ljung-Box test result.
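The two-standard-error check can be done programmatically: under white noise, sample autocorrelations are approximately within ±2/√n. A sketch (the band is the usual large-sample approximation; the function name is ours):

```python
import numpy as np

def residual_acf_ok(resid, max_lag):
    """Check whether all residual autocorrelations up to max_lag fall
    inside the approximate two-standard-error band +/- 2/sqrt(n)."""
    resid = np.asarray(resid, dtype=float)
    n = len(resid)
    z = resid - resid.mean()
    denom = np.sum(z ** 2)
    r = np.array([np.sum(z[k:] * z[:-k]) / denom
                  for k in range(1, max_lag + 1)])
    band = 2.0 / np.sqrt(n)
    return bool(np.all(np.abs(r) <= band)), r, band
```

A residual series with any leftover trend or strong correlation fails this check immediately, which is the programmatic version of eyeballing Figure 8.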

5 Forecasting

With a well-fitting model and all the conditions for interpreting it (stationarity, invertibility) satisfied, we now have a forecasting model, which is

yt = yt−1 + 0.30688yt−1 − 0.30688yt−2 + at

To see this, note that zt = yt − yt−1 and zt−1 = yt−1 − yt−2; substituting these into our estimated model above, we get the forecasting model.

We are now ready to forecast 12 steps ahead with this model. The company probably doesn't want to forecast that far ahead, but we will, just to see what happens. Notice that the one-step-ahead forecast is

ŷ121 = y120 + 0.30688y120 − 0.30688y119 + 0

ŷ121 = 15.6453 + 0.30688(15.6453) − 0.30688(15.3410) + 0 = 15.7387, which is the value we obtain with SAS for the forecast of period 121. You can get the other forecast values by following a similar procedure, substituting ŷ when we run out of data values. For example, ŷ122 = ŷ121 + 0.30688ŷ121 − 0.30688y120 + 0.
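This recursion is easy to code. The sketch below reproduces the hand computation for period 121 and iterates forward (only the last two observed values are needed for the point forecasts; the function name is ours):

```python
def arima110_forecast(y_hist, phi, horizon):
    """Iterated point forecasts from an ARIMA(1,1,0) with no constant:
    y_hat(t+1) = y(t) + phi * (y(t) - y(t-1)), future shocks set to 0."""
    y = list(y_hist)
    out = []
    for _ in range(horizon):
        y_next = y[-1] + phi * (y[-1] - y[-2])
        out.append(y_next)
        y.append(y_next)  # treat the forecast as data for the next step
    return out

# Last two observed sales values from the report, 12 steps ahead.
f = arima110_forecast([15.3410, 15.6453], 0.30688, 12)
```

Since |φ| < 1, the increments shrink geometrically and the point forecasts level off at a constant, which is why the long-horizon forecasts barely change.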

The actual forecasts, point and interval, and the standard errors of the point forecasts can be seen in Figure 9 of the SAS output and in the plot in Figure 10 of this report. Notice how the standard error of the forecast increases as we forecast farther and farther into the future. That makes sense: we are more uncertain the farther into the future we forecast.

5.1 Accuracy of the forecasts

Since we don't have data for periods 121 to 130, we cannot compute the out-of-sample forecast error for those periods. However, we can compute the in-sample Root mean squared error (RMSE) using the residuals, or alternatively using the residual variance estimate obtained earlier in the SAS output. The RMSE estimate for this model will help us compare it with other models. The formula we use to compute it is

RMSE = √( Σt (yt − ŷt)² / n ) = √1.104263 ≈ 1.05

6 Recommendations to the Olympia Paper Company

We forecast that next week sales over 100,000 rolls, in units of 10,000 rolls, will be 15.738. Two weeks from now we predict they will be 15.76, the third week 15.776, and from then on 15.778. However, the forecasts will change as new information becomes available, and we will be pretty uncertain about predictions past the fourth week. So we suggest that the company use the model to make forecasts each week, and that it update the model and the forecasts each week as the sales figure for the most recent week becomes available.

Figure 2: Point and Interval Forecasts of Absorbent Paper Towel sales
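The in-sample RMSE computation described in Section 5.1 reduces to a few lines. An illustrative Python sketch (the report reads the value off the SAS residual variance instead; the helper name is ours):

```python
import math

def rmse(actual, fitted):
    """In-sample root mean squared error of one-step-ahead fitted values."""
    errors = [(a - f) ** 2 for a, f in zip(actual, fitted)]
    return math.sqrt(sum(errors) / len(errors))
```

Note that taking the square root of the residual variance estimate, √1.104263 ≈ 1.0508, recovers the standard error estimate reported in the estimation output.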

References

[1] Bowerman, B. L., O'Connell, R. T., and Koehler, A. B. (2005). Forecasting, Time Series, and Regression: An Applied Approach, 4th ed. Thomson Brooks/Cole.

7 Appendix

(see attached labeled SAS output)
