Lecture 7: Exponential Methods

Please read Chapter 4 and Chapter 2 of the MWH book.

1 Big Picture

1. In Lecture 6, a smoothing (averaging) method was used to estimate the trend-cycle (decomposition)

2. Now a modified smoothing method is used to forecast future values. That is, the averaging is now, in general, one-sided, as opposed to two-sided

3. Another difference is that we focus on out-of-sample forecasting errors, rather than the in-sample fitting errors (residuals)

4. Again, which method works best depends on whether trend or seasonality is present

2 Forecasting Scenario

1. The present is the t-th period

2. Past (in-sample) observations Yt, Yt−1, ..., Y1 are used to estimate the model

3. Then forecasts Ft+1, Ft+2, ... are computed for the future (out-of-sample) values Yt+1, Yt+2, ...

4. Do not confuse the out-of-sample forecasting errors Yt+1 − Ft+1, Yt+2 − Ft+2, ... with the in-sample fitting errors (residuals) Yt − Ft, ..., Y1 − F1

3 Averaging Method—Mean

This method uses all past available observations to compute the forecast:

Ft+1 = (Y1 + Y2 + ··· + Yt)/t    (1)

Ft+2 = (Y1 + Y2 + ··· + Yt+1)/(t + 1)    (2)

The mean method works well only when there is no trend, no seasonality, and no change in the mean value. In other words, it assumes the underlying process has a constant mean (stationarity).
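A minimal sketch of the mean method in R (the simulated series y is purely illustrative):

set.seed(1)
y = rnorm(50, mean = 10)   # illustrative data with a constant mean
f_next = mean(y)           # eq. (1): Ft+1 averages all t observations
# Once Yt+1 is observed, append it and re-average to get Ft+2, as in eq. (2).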

4 Averaging Method—Moving Average

The moving average, to some degree, can allow for a time-varying mean value. It does so by including only the most recent k observations. More explicitly, the MA(k) forecast (not smoother) is

Ft+1 = (Yt−k+1 + Yt−k+2 + ··· + Yt)/k    (3)

Ft+2 = (Yt−k+2 + Yt−k+3 + ··· + Yt+1)/k    (4)

It follows that

Ft+2 = Ft+1 + (Yt+1 − Yt−k+1)/k    (5)

So each new forecast Ft+2 is an adjustment of the immediately preceding forecast Ft+1, with adjustment term (Yt+1 − Yt−k+1)/k. A bigger k leads to a smaller adjustment (smoother forecasts). Figure 4-5 in the book clearly shows that the mean method cannot capture a change in the mean value, but the moving average can, though with some lag.
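A corresponding sketch of the MA(k) forecast in R (the data and the choice k = 4 are illustrative):

set.seed(1)
y = rnorm(50, mean = 10)          # illustrative data
k = 4                             # illustrative window length
t = length(y)
f_next = mean(y[(t - k + 1):t])   # eq. (3): average of the k most recent values
# eq. (5): once Yt+1 arrives, update without re-summing:
# Ft+2 = Ft+1 + (Yt+1 - Yt-k+1)/k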

5 Exponential Smoothing Method—Single Exponential Smoothing (SES) I

We hope the forecast can catch up with changes in the mean value more quickly. One way is to assign a bigger weight to the latest observation. Consider

Ft+1 = Ft + α(Yt − Ft ) (6)

Ft+2 = Ft+1 + α(Yt+1 − Ft+1) (7)

F1 = Y1 (initialization)    (8)

where 0 < α < 1. There is substantial adjustment when α is close to one. In fact, single exponential smoothing utilizes the idea of error correction—we revise the forecast upward, i.e.,

Ft+1 > Ft, whenever Ft underestimates the true value (Yt − Ft > 0). The error-correcting process is also called negative feedback.

6 Single Exponential Smoothing II

Equation (6) can be rewritten as

Ft+1 = (1 − α)Ft + αYt (9)

By using the lag operator, we can show that single exponential smoothing is equivalent to an MA(∞) process:

Ft+1 = α[1 − (1 − α)L]^(−1) Yt = αYt + α(1 − α)Yt−1 + α(1 − α)^2 Yt−2 + ...    (10)

Moreover, notice that the weights α, α(1 − α), α(1 − α)^2, ... decay exponentially toward zero, hence the name exponential smoothing.
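The decay of the weights is easy to verify numerically; a short R sketch (α = 0.5 is an illustrative choice):

alpha = 0.5
j = 0:9
w = alpha * (1 - alpha)^j   # weights on Yt, Yt-1, ..., Yt-9 in eq. (10)
round(w, 4)                 # each weight is half the previous one
sum(w)                      # the partial sums approach 1 as more terms are added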

7 Mean Square Error (MSE)

A different smoothing parameter α yields a different forecast. We can evaluate or rank various forecasts using the mean square error (MSE), which is the average of the squared errors (in-sample or out-of-sample):

MSE = (1/n) ∑ (Yi − Fi)^2

where n is the total number of errors and the sum runs over all of them. We square the errors in order to avoid cancellation between positive and negative errors. The forecast with the smallest MSE is the best one. This fact suggests that we can obtain the optimal smoothing parameter α by minimizing the MSE.
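In R the computation is a one-liner (the two vectors below are illustrative):

y = c(10, 12, 11, 13, 12)   # actual values
f = c(9, 11, 12, 12, 13)    # the corresponding forecasts
mse = mean((y - f)^2)       # average squared error; smaller is better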

8 Theil’s U-Statistic

An alternative way to rank forecasts is to consider Theil’s U-statistic:

U = √{ ∑ [(Yt+1 − Ft+1)/Yt]^2 / ∑ [(Yt+1 − Yt)/Yt]^2 }, with both sums running over t = 1, ..., n − 1    (11)

where (Yt+1 − Ft+1)/Yt is the forecast relative change based on a particular method, while (Yt+1 − Yt)/Yt is the relative change based on the naive method Ft+1 = Yt. A U-statistic less than one indicates a method better than the naive method. The method with the smallest U-statistic is the best one.
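A sketch of (11) in R, assuming y holds Y1, ..., Yn and f holds the one-step forecasts with f[t+1] corresponding to Ft+1 (both series are illustrative):

y = c(100, 104, 101, 99, 103, 107)        # illustrative actuals
f = c(NA, 100, 102, 101.5, 100.2, 101.6)  # illustrative forecasts (F1 is never used)
n = length(y)
num = sum(((y[2:n] - f[2:n]) / y[1:(n-1)])^2)      # the method's relative errors
den = sum(((y[2:n] - y[1:(n-1)]) / y[1:(n-1)])^2)  # the naive method's relative errors
U = sqrt(num / den)   # U < 1 means the method beats the naive forecast Ft+1 = Yt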

9 Serial Correlation of Forecasting Error

The forecasting error should be serially uncorrelated if no pattern is left unexploited. So a method that produces a white-noise forecasting error is better than one with a serially correlated forecasting error.
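In R, whiteness of the errors can be checked with the sample ACF or a Ljung-Box test (the simulated errors below are a stand-in for the errors of an actual forecast):

set.seed(1)
e = rnorm(100)   # stand-in for a series of one-step forecasting errors
acf(e)           # for white noise, all sample autocorrelations are near zero
Box.test(e, lag = 10, type = "Ljung-Box")  # large p-value: no evidence of serial correlation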

10 Single Exponential Smoothing III

Equation (6) for SES is recursive, which suggests using a loop. The code below, for example, computes the SES forecast and, at the end, the MSE, for α = 0.5.

alpha = 0.5
f = rep(0, length(y))   # forecast series
f[1] = y[1]             # initialization (8): F1 = Y1
tse = 0                 # running total of squared errors
for (j in 2:length(y)) {
  f[j] = f[j-1] + alpha * (y[j-1] - f[j-1])   # recursion (6)
  tse = tse + (y[j] - f[j])^2
}
mse = tse / (length(y) - 1)   # average over the length(y) - 1 fitted periods
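Wrapping the loop in a function lets R's optimize() search for the MSE-minimizing α suggested in Section 7 (a sketch; the function name and the simulated data are illustrative):

ses_mse = function(alpha, y) {
  f = rep(0, length(y))
  f[1] = y[1]
  for (j in 2:length(y)) f[j] = f[j-1] + alpha * (y[j-1] - f[j-1])
  mean((y[-1] - f[-1])^2)   # in-sample MSE, as in the loop above
}
set.seed(1)
y = cumsum(rnorm(100)) + 50   # illustrative data
opt = optimize(ses_mse, interval = c(0, 1), y = y)
opt$minimum                   # the MSE-minimizing alpha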

11 Adaptive-Response-Rate SES (ARRSES)

Single exponential smoothing has three limitations: (i) it is arbitrary regarding which α to use; (ii) it employs a single α to compute all forecasts (the third limitation is taken up in the next section). By contrast, ARRSES allows the value of α to be determined in a data-driven fashion. That is, α is adjusted automatically when there is a change in the pattern of the data, which is signaled by a big forecasting error. Intuitively, we prefer a bigger α when a bigger forecasting error arises:

Ft+1 = (1 − αt)Ft + αt Yt    (12)

αt+1 = |At/Mt|    (13)

At = βEt + (1 − β)At−1 (14)

Mt = β|Et | + (1 − β)Mt−1 (15)

Et = Yt − Ft    (16)

So Et is the forecasting error; Mt is a smoothed estimate of the absolute forecasting error; and At is a smoothed estimate of the forecasting error itself. Notice that it is αt+1, not αt, in (13).
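A sketch of ARRSES in R; the initializations F2 = Y1 and A1 = M1 = 0, the starting α = β, and β = 0.2 itself are illustrative choices, not prescribed above:

arrses = function(y, beta = 0.2) {
  n = length(y)
  f = rep(NA, n); a = rep(0, n); m = rep(0, n)
  f[2] = y[1]    # illustrative initialization: F2 = Y1
  alpha = beta   # illustrative starting value for alpha
  for (t in 2:(n - 1)) {
    e = y[t] - f[t]                               # eq. (16)
    a[t] = beta * e + (1 - beta) * a[t - 1]       # eq. (14)
    m[t] = beta * abs(e) + (1 - beta) * m[t - 1]  # eq. (15)
    f[t + 1] = (1 - alpha) * f[t] + alpha * y[t]  # eq. (12), using alpha_t
    alpha = abs(a[t] / m[t])                      # eq. (13): alpha for the next period
  }
  f
}
set.seed(1)
f = arrses(cumsum(rnorm(80)) + 20)   # one-step forecasts F2, ..., Fn (F1 undefined)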

12 Holt’s Linear Method (HLM)

The third limitation of SES is that it cannot catch up with a trend quickly enough, while Holt’s Linear Method (HLM) is able to account for trend:

Lt = αYt + (1 − α)(Lt−1 + bt−1) (17)

bt = β(Lt − Lt−1) + (1 − β)bt−1 (18)

Ft+m = Lt + bt m (19)

L1 = Y1 (20)

b1 = Y2 − Y1    (21)

where 0 < α < 1, 0 < β < 1, and m is the number of periods ahead to be forecast. Here Lt denotes the level of the series, and bt the slope of the trend. HLM is also called double exponential smoothing. HLM has the drawback of failing to account for seasonal effects.
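A sketch of HLM in R implementing (17)-(21); the data and the values α = β = 0.5 are illustrative (in practice both would be chosen by minimizing MSE):

holt_forecast = function(y, alpha = 0.5, beta = 0.5, m = 1) {
  n = length(y)
  L = rep(0, n); b = rep(0, n)
  L[1] = y[1]          # eq. (20)
  b[1] = y[2] - y[1]   # eq. (21)
  for (t in 2:n) {
    L[t] = alpha * y[t] + (1 - alpha) * (L[t - 1] + b[t - 1])  # eq. (17)
    b[t] = beta * (L[t] - L[t - 1]) + (1 - beta) * b[t - 1]    # eq. (18)
  }
  L[n] + b[n] * m      # eq. (19): the m-step-ahead forecast Ft+m
}
y = 2 * (1:40) + rnorm(40)   # illustrative series with a linear trend
holt_forecast(y, m = 3)      # forecast three periods ahead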

13 Holt-Winter’s Method (HWM)

We use HWM to account for both trend and seasonality. HWM involves three equations: one for the level, one for the trend, and one for the seasonality.

Lt = α(Yt/St−s) + (1 − α)(Lt−1 + bt−1)    (22)

bt = β(Lt − Lt−1) + (1 − β)bt−1    (23)

St = γ(Yt/Lt) + (1 − γ)St−s    (24)

Ft+m = (Lt + bt m)St−s+m    (25)

where s is the length of the seasonality (e.g., the number of months in a year), and the seasonal component enters in a multiplicative form.
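Rather than hand-coding (22)-(25), note that base R's stats package provides HoltWinters(), which fits this multiplicative seasonal model and, by default, chooses α, β, and γ by minimizing the squared one-step errors; the built-in AirPassengers data set is used purely as an illustration:

fit = HoltWinters(AirPassengers, seasonal = "multiplicative")
c(fit$alpha, fit$beta, fit$gamma)  # the estimated smoothing parameters
predict(fit, n.ahead = 12)         # forecasts for the next s = 12 months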

14 Prediction Intervals

We can use prediction intervals (PI) to illustrate the inherent uncertainty in the point forecast:

95% PI = Ft+1 ± 1.96 √MSE    (26)

where Ft+1 is the point forecast, and MSE is the mean square error for the method that produces Ft+1.
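A sketch in R (the point forecast and MSE below are illustrative stand-ins for the output of any of the methods above):

f_next = 105.2   # illustrative point forecast Ft+1
mse = 4.8        # illustrative MSE for the same method
f_next + c(-1, 1) * 1.96 * sqrt(mse)  # eq. (26): approximate 95% prediction interval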
