Chapter 9: Nonseasonal Box-Jenkins Models

The concepts of ‘stationary time series’ and ‘nonstationary time series’ are important in the Box-Jenkins methodology.

Stationary time series

A time series $\{y_t\}$ is said to be stationary if the following two conditions are satisfied:

(a) The mean function is constant over time, i.e.,

$$\mu_t = E(y_t) = c \quad \text{for all } t.$$

(b) The autocorrelations

$$\rho_{t,s} = \frac{\mathrm{cov}(y_t, y_s)}{\sqrt{\mathrm{var}(y_t)\,\mathrm{var}(y_s)}}$$

are not functions of time, i.e., $\rho_{t,t-k} = \rho_{0,k} = \rho_k$ for all times $t$ and lags $k$. This is equivalent to the condition that the autocovariances $\gamma_{t,s} = \mathrm{cov}(y_t, y_s)$ are also independent of time $t$, i.e., $\gamma_{t,t-k} = \gamma_{0,k} = \gamma_k$ for all $t$ and lags $k$.

In other words, both the autocorrelations $\rho_{t,s}$ and the autocovariances $\gamma_{t,s}$ depend only on the distance between the two time points $s$ and $t$, not on the actual positions of $s$ and $t$.

Note: Since $\gamma_{t,t} = \mathrm{cov}(y_t, y_t) = \mathrm{var}(y_t)$, stationarity also requires that the variance be constant with respect to $t$.

Nonstationary Time Series

If the $n$ values of $y_t$ do not fluctuate around a constant mean, or do not fluctuate with constant variation, then it is reasonable to believe that the time series is not stationary.

[Figure: Random walk with zero mean. Time series plot of $Z_t$ against Time.]

A nonstationary series can often be transformed into a stationary one by first differencing:

$$z_t = \nabla y_t = y_t - y_{t-1}.$$

The Minitab command for differencing is Stat ▷ Time Series ▷ Difference (lag 1).

(Differencing is like differentiation in calculus.) We can write

$$\nabla y_t = y_t - y_{t-1} = \frac{y_t - y_{t-1}}{t - (t-1)},$$

which is similar to the definition of the derivative of a function $f(t)$:

$$f'(t) = \lim_{\Delta \to 0} \frac{f(t+\Delta) - f(t)}{(t+\Delta) - t} = \lim_{\Delta \to 0} \frac{f(t+\Delta) - f(t)}{\Delta}.$$
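As a minimal sketch of first differencing, the following Python snippet differences a simulated zero-mean random walk. The simulated data, the seed and the helper name `difference` are illustrative assumptions and are not part of the notes or of Minitab. Differencing a random walk should simply return the shocks that built it.

```python
import random

def difference(y, lag=1):
    """Return z_t = y_t - y_{t-lag} for t = lag, ..., len(y) - 1."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

# Simulated random walk y_t = y_{t-1} + a_t (illustrative data only).
random.seed(0)
shocks = [random.gauss(0.0, 1.0) for _ in range(150)]
walk = []
level = 0.0
for a in shocks:
    level += a
    walk.append(level)

# One observation is lost by differencing, so z has 149 values.
z = difference(walk)
```

Here `z[i]` matches `shocks[i + 1]` up to floating-point rounding, which is the sense in which the differenced random walk is stationary.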

[Figure: Time Series Plot of Paper Towel Sales. y against Index (1 to 120).]

After first differencing

[Figure: Time Series Plot of first differencing. Differenced series (C2) against Index (1 to 120).]

If this is not sufficient, taking second differences (the first differences of the first differences) of the original series values will normally do the job:

$$z_t = \nabla^2 y_t = \nabla(\nabla y_t) = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}).$$

If a time series plot indicates increasing variability, it is common to transform the series first, using a square root, quartic root or logarithmic transformation, and then to take first differences.
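The two operations just described can be sketched in Python as follows. The helper names and the two small example series are hypothetical and chosen only so the results are easy to check by hand: a quadratic trend needs two differences, and a series growing by 10% per period has constant log differences.

```python
import math

def difference(y, lag=1):
    """Return z_t = y_t - y_{t-lag}."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

def second_difference(y):
    """First differences of the first differences."""
    return difference(difference(y))

def log_difference(y):
    """Take natural logs first, then first differences."""
    return difference([math.log(v) for v in y])

# Illustrative series only.
quadratic = [1, 4, 9, 16, 25]          # quadratic trend
growth = [100.0, 110.0, 121.0, 133.1]  # 10% growth per period

second = second_difference(quadratic)  # constant after two differences
logged = log_difference(growth)        # each value is close to log(1.1)
```

The log transformation turns proportional growth into additive growth, which is why it is a natural choice when variability increases with the level of the series.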

Example: Consider the NCR (New Company Registrations) rates data plotted below:

[Figure: Time Series Plot of NCR. NCR against Index (1 to 36).]

The series is clearly not stationary, since it has a trend and increasing variability, which means that both $E(y_t)$ and $\mathrm{var}(y_t)$ depend on the time variable $t$.

[Figure: Time Series Plot of lnNCR. lnNCR against Index (1 to 36).]

Clearly the log transformation has stabilised the variance somewhat.

Applying differencing on the logged series:

[Figure: Time Series Plot of d1lnNCR. d1lnNCR against Index (1 to 36).]

It now appears that the resulting series is stationary.

Working Series

The textbook uses $z_b, z_{b+1}, \ldots, z_n$ as the ‘working series’ obtained from the original series by transformation or differencing. For example, $b = 2$ if $z_t = y_t - y_{t-1}$.

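Sample autocorrelation coefficient (SAC)

The sample autocorrelation at lag $k$ is

$$r_k = \frac{\sum_{t=b}^{n-k} (z_t - \bar{z})(z_{t+k} - \bar{z})}{\sum_{t=b}^{n} (z_t - \bar{z})^2},$$

where

$$\bar{z} = \frac{\sum_{t=b}^{n} z_t}{n - b + 1}.$$

The standard error of $r_k$ is

$$s_{r_k} = \begin{cases} \dfrac{1}{(n-b+1)^{1/2}}, & \text{if } k = 1, \\[2ex] \dfrac{\left(1 + 2\sum_{j=1}^{k-1} r_j^2\right)^{1/2}}{(n-b+1)^{1/2}}, & \text{if } k = 2, 3, \ldots \end{cases}$$

The $t_{r_k}$-statistic is

$$t_{r_k} = \frac{r_k}{s_{r_k}}.$$

The SAC graph is a graph of the sample autocorrelations (Minitab calls it the ACF plot):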

[Figure: Autocorrelation Function for y (original towel sales), with 5% significance limits for the autocorrelations. Autocorrelation against Lag (1 to 30).]

Spikes

We say that a spike at lag $k$ exists if $r_k$ is statistically large, say $|t_{r_k}| = |r_k / s_{r_k}| > 2$.

In a Minitab ACF graph, any $r_k$ that extends above or below the confidence bands is considered to be a spike, so you do not need to compute the value of $t_{r_k}$.
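The sample autocorrelation, its standard error and the spike rule can be sketched in plain Python as below. This is an illustration of the formulas only, not Minitab's implementation; the function names are hypothetical, and the list `z` is assumed to hold the working series $z_b, \ldots, z_n$, so that `len(z)` plays the role of $n - b + 1$.

```python
import math

def sample_acf(z, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of the working series z."""
    m = len(z)  # m plays the role of n - b + 1
    zbar = sum(z) / m
    denom = sum((v - zbar) ** 2 for v in z)
    return [
        sum((z[t] - zbar) * (z[t + k] - zbar) for t in range(m - k)) / denom
        for k in range(1, max_lag + 1)
    ]

def acf_standard_errors(r, m):
    """s_{r_k} for k = 1, ..., len(r); for k = 1 the inner sum is empty."""
    return [
        math.sqrt((1.0 + 2.0 * sum(rj ** 2 for rj in r[:k - 1])) / m)
        for k in range(1, len(r) + 1)
    ]

def spike_lags(z, max_lag, threshold=2.0):
    """Lags k at which |t_{r_k}| = |r_k / s_{r_k}| exceeds the threshold."""
    r = sample_acf(z, max_lag)
    s = acf_standard_errors(r, len(z))
    return [k for k, (rk, sk) in enumerate(zip(r, s), start=1)
            if abs(rk / sk) > threshold]
```

For the short series `[1, 2, 3, 4, 5]` the function `sample_acf` gives $r_1 = 0.4$ and $r_2 = -0.1$, neither of which is a spike, since with only five observations the standard errors are large.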

Cuts off after lag k

We say that the SAC cuts off after lag $k$ if there are no spikes at lags greater than $k$ in the SAC.

Using the SAC to find a stationary time series

For nonseasonal data:

(i) If the SAC of the time series either cuts off fairly quickly or dies down fairly quickly, then the series is considered stationary.

(ii) If the SAC of the time series dies down extremely slowly, then the series is considered nonstationary.

Note that the SAC of the towel sales series refuses to die down quickly, which is a clear sign that the series is nonstationary.

Sample partial autocorrelation $r_{kk}$

The sample partial autocorrelation at lag $k$ can be thought of as the sample autocorrelation of time series observations separated by a lag of $k$ time units, with the effects of the intervening observations eliminated.

In other words, it measures the extent of the relationship between the current value of a variable and an earlier value of the same variable (for various time lags), while holding the effects of all other time lags constant.
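The notes do not give a formula for $r_{kk}$. One standard way to compute partial autocorrelations from autocorrelations is the Durbin-Levinson recursion, sketched below as an assumption about how such values could be obtained; it is not claimed to be the method used by the textbook or by Minitab, and the function name is hypothetical.

```python
def pacf_from_acf(r):
    """Partial autocorrelations [r_11, r_22, ...] from r = [r_1, r_2, ...].

    Durbin-Levinson recursion:
      r_11 = r_1
      r_kk = (r_k - sum_j r_{k-1,j} r_{k-j}) / (1 - sum_j r_{k-1,j} r_j)
      r_kj = r_{k-1,j} - r_kk r_{k-1,k-j},   j = 1, ..., k-1
    """
    pacf = []
    prev = []  # prev[j - 1] holds r_{k-1, j}
    for k in range(1, len(r) + 1):
        if k == 1:
            rkk = r[0]
            curr = [rkk]
        else:
            num = r[k - 1] - sum(prev[j - 1] * r[k - j - 1] for j in range(1, k))
            den = 1.0 - sum(prev[j - 1] * r[j - 1] for j in range(1, k))
            rkk = num / den
            curr = [prev[j - 1] - rkk * prev[k - j - 1] for j in range(1, k)] + [rkk]
        pacf.append(rkk)
        prev = curr
    return pacf
```

Feeding in autocorrelations of the form $0.5^k$ returns a partial autocorrelation of 0.5 at lag 1 and zero afterwards, i.e., the partial autocorrelations cut off after lag 1 once the effect of the intervening observation is removed.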

Consider now the differenced series of the towel sales.

[Figure: Autocorrelation Function for z (differenced series), with 5% significance limits for the autocorrelations. Autocorrelation against Lag (1 to 30).]

Here the SAC cuts off after lag 1, so the differenced series is stationary.

Simple Stationary Time Series Models: ARMA

Let $\{a_t\}$ be a sequence of random shocks which describes the effect on $z_t$ of all factors other than $z_{t-1}$. It is more or less the sequence of residual errors of the forecast (if the residuals $e_t$ are not independent, then we cannot treat $e_t$ as $a_t$).

Note: Most textbooks call $\{a_t\}$ white noise.

Properties of $\{a_t\}$

(i) $a_1, a_2, a_3, \ldots$ are independent.

(ii) $a_i \sim N(0, \sigma_a^2)$.

(iii) $a_{t+1}$ is independent of $y_t, y_{t-1}, \ldots$

$\{a_t\}$ plays a very important role in the Box-Jenkins methodology. Essentially, every stationary Box-Jenkins model can be expressed in terms of the white noise process.

Simple Box-Jenkins Models

Moving Average Models

We write

$$z_t = a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q}$$

and refer to it as a moving average process of order $q$, denoted by MA(q). (Note that, structurally speaking, MA(q) is expressed as an average of the $a_t$ terms, except for the negative signs.)

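The special case: MA(1)

$$z_t = a_t - \theta_1 a_{t-1}$$

$$E(z_t) = 0$$

$$\mathrm{var}(z_t) = \sigma_a^2 (1 + \theta_1^2)$$

$$\mathrm{cov}(z_t, z_{t+1}) = -\theta_1 \sigma_a^2$$

$$\mathrm{cov}(z_t, z_{t+k}) = 0 \quad \text{for } k \ge 2$$

Thus $\rho_1 = \dfrac{-\theta_1}{1 + \theta_1^2}$ and all other $\rho_k$ are zero. (Make sure you know how to derive the above.)

Hence the TAC (theoretical autocorrelation function) of an MA(1) “cuts off” after lag 1.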
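For readers who want to check the MA(1) results, a short derivation is sketched below. It uses only the stated properties of the shocks: they are independent with mean 0 and variance $\sigma_a^2$, so $E(a_s a_t) = 0$ whenever $s \neq t$.

```latex
\begin{align*}
E(z_t) &= E(a_t) - \theta_1 E(a_{t-1}) = 0, \\
\mathrm{var}(z_t) &= \mathrm{var}(a_t) + \theta_1^2\,\mathrm{var}(a_{t-1})
                   = \sigma_a^2 (1 + \theta_1^2), \\
\mathrm{cov}(z_t, z_{t+1}) &= E\big[(a_t - \theta_1 a_{t-1})(a_{t+1} - \theta_1 a_t)\big]
                            = -\theta_1 E(a_t^2) = -\theta_1 \sigma_a^2, \\
\mathrm{cov}(z_t, z_{t+k}) &= 0 \quad \text{for } k \ge 2
                            \quad (\text{$z_t$ and $z_{t+k}$ share no shock}), \\
\rho_1 &= \frac{\mathrm{cov}(z_t, z_{t+1})}{\mathrm{var}(z_t)}
        = \frac{-\theta_1}{1 + \theta_1^2}.
\end{align*}
```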

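MA(2)

$$z_t = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2}$$

$$E(z_t) = 0$$

$$\mathrm{var}(z_t) = \sigma_a^2 (1 + \theta_1^2 + \theta_2^2)$$

$$\mathrm{cov}(z_t, z_{t+1}) = (-\theta_1 + \theta_1\theta_2)\,\sigma_a^2$$

$$\mathrm{cov}(z_t, z_{t+2}) = -\theta_2 \sigma_a^2$$

$$\mathrm{cov}(z_t, z_{t+k}) = 0 \quad \text{for } k \ge 3$$

$$\rho_1 = \frac{-\theta_1 + \theta_1\theta_2}{1 + \theta_1^2 + \theta_2^2}, \qquad \rho_2 = \frac{-\theta_2}{1 + \theta_1^2 + \theta_2^2},$$

and all other $\rho_k$ are zero. Thus the TAC of an MA(2) “cuts off” after lag 2.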

In general, for MA(q):

(i) $\rho_k \neq 0$ for $k = 1, 2, \ldots, q$, and $\rho_k = 0$ for $k > q$.

(ii) The PAC (partial autocorrelation function) dies down.
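The cut-off property can be checked numerically. The sketch below computes the theoretical autocorrelations of an MA(q) process directly from the coefficients on $a_t, a_{t-1}, \ldots, a_{t-q}$, using the sign convention of these notes ($z_t = a_t - \theta_1 a_{t-1} - \cdots$). The function name and the parameter values in the usage note are illustrative assumptions.

```python
def ma_acf(thetas, max_lag):
    """Theoretical autocorrelations rho_1, ..., rho_max_lag of an MA(q) process
    z_t = a_t - theta_1 a_{t-1} - ... - theta_q a_{t-q}."""
    q = len(thetas)
    # Coefficients on a_t, a_{t-1}, ..., a_{t-q}.
    coeffs = [1.0] + [-th for th in thetas]
    gamma0 = sum(c * c for c in coeffs)  # var(z_t) / sigma_a^2
    rho = []
    for k in range(1, max_lag + 1):
        if k <= q:
            gamma_k = sum(coeffs[j] * coeffs[j + k] for j in range(q + 1 - k))
        else:
            gamma_k = 0.0  # no shocks in common beyond lag q
        rho.append(gamma_k / gamma0)
    return rho
```

For example, `ma_acf([0.5], 3)` gives approximately `[-0.4, 0.0, 0.0]`, and `ma_acf([0.5, 0.3], 3)` is nonzero at lags 1 and 2 and zero at lag 3, in line with the MA(1) and MA(2) formulas.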

Autoregressive Models

$$z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + \cdots + \phi_p z_{t-p} + a_t$$

Here the $z_t$ are regressed on themselves (hence the name), but lagged by various amounts. The simplest case is the first-order model, denoted AR(1), which takes the form

$$z_t = \phi_1 z_{t-1} + a_t$$

$$E(z_t) = 0$$

$$\mathrm{var}(z_t) = \gamma_0 = \frac{\sigma_a^2}{1 - \phi_1^2},$$

so we need $|\phi_1| < 1$ to ensure stationarity. Moreover,

$$\gamma_k = \phi_1 \gamma_{k-1}, \qquad \rho_k = \phi_1^k.$$

Thus $\rho_k$ “dies down” exponentially as $k$ increases, oscillating if $\phi_1 < 0$. Thus if the TAC of a series dies down rather than cuts off, we suspect it to be an AR rather than an MA.
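The AR(1) results can be derived in a few lines. The sketch below assumes stationarity (so that $\mathrm{var}(z_t) = \mathrm{var}(z_{t-1}) = \gamma_0$) and uses the fact that $a_t$ is independent of past values $z_{t-1}, z_{t-2}, \ldots$

```latex
\begin{align*}
\gamma_0 &= \mathrm{var}(\phi_1 z_{t-1} + a_t)
          = \phi_1^2 \gamma_0 + \sigma_a^2
          \;\Longrightarrow\;
          \gamma_0 = \frac{\sigma_a^2}{1 - \phi_1^2}, \\
\gamma_k &= E(z_t z_{t-k})
          = \phi_1 E(z_{t-1} z_{t-k}) + E(a_t z_{t-k})
          = \phi_1 \gamma_{k-1} \quad (k \ge 1), \\
\rho_k &= \frac{\gamma_k}{\gamma_0} = \phi_1 \rho_{k-1} = \phi_1^k .
\end{align*}
```

The expression for $\gamma_0$ is positive and finite only when $|\phi_1| < 1$, which is the stationarity condition.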

Note that AR and MA series are not entirely unrelated. It can be shown that an AR(1) can be expressed as an “infinite” MA series, much like the general linear process. The MA(1) can similarly be expressed as an “infinite” AR series.

Note: a linear process is a time series that has the form

$$y_t = a_t + \psi_1 a_{t-1} + \psi_2 a_{t-2} + \cdots$$

The AR(2) can be written as

$$z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + a_t$$

Its autocorrelations satisfy

$$\rho_1 = \frac{\phi_1}{1 - \phi_2}$$

$$\rho_2 = \phi_1 \rho_1 + \phi_2$$

$$\rho_3 = \phi_1 \rho_2 + \phi_2 \rho_1$$

etc. Thus again the TAC dies down rather than cuts off, though it is difficult at times to tell the difference between the TACs of an AR(1) and an AR(2).
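The AR(2) autocorrelations follow the recursion $\rho_k = \phi_1 \rho_{k-1} + \phi_2 \rho_{k-2}$ with $\rho_0 = 1$, which is easy to iterate. The following Python sketch does so; the function name and the parameter values in the usage note are illustrative assumptions.

```python
def ar2_acf(phi1, phi2, max_lag):
    """Theoretical autocorrelations rho_1, ..., rho_max_lag of
    z_t = phi1 z_{t-1} + phi2 z_{t-2} + a_t (stationarity assumed)."""
    rho = [1.0, phi1 / (1.0 - phi2)]  # rho_0 and rho_1
    for k in range(2, max_lag + 1):
        rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])
    return rho[1:max_lag + 1]
```

For instance, `ar2_acf(0.5, 0.3, 3)` gives values that decline slowly rather than cutting off, and setting `phi2 = 0.0` recovers the AR(1) pattern $\phi_1^k$.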

The TPAC (theoretical partial autocorrelation function) of an AR(2) has nonzero partial autocorrelations at lags 1 and 2 and zero partial autocorrelations at all lags after lag 2, i.e., it cuts off after lag 2.

In general, for AR(p), TAC dies down and TPAC cuts off after lag p.

ARMA(p, q): Mixed autoregressive-moving average models

The model can be written as

$$z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + \cdots + \phi_p z_{t-p} + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q}$$

or

$$z_t - \phi_1 z_{t-1} - \phi_2 z_{t-2} - \cdots - \phi_p z_{t-p} = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q},$$

i.e., we move the autoregressive part to the left and keep the moving average part on the right.

ARMA(1, 1)

$$z_t = \phi_1 z_{t-1} + a_t - \theta_1 a_{t-1}$$

$$\rho_k = \frac{(1 - \theta_1\phi_1)(\phi_1 - \theta_1)}{1 - 2\theta_1\phi_1 + \theta_1^2}\,\phi_1^{k-1}, \quad k \ge 1,$$

i.e., the TAC dies down exponentially from $\rho_1$ (not from $\rho_0 = 1$). The TPAC also dies down exponentially.
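The ARMA(1, 1) autocorrelation formula can be evaluated directly, and two limiting cases give a useful sanity check: with $\theta_1 = 0$ it should reduce to the AR(1) pattern, and with $\phi_1 = 0$ to the MA(1) pattern. The function name below is a hypothetical choice for illustration.

```python
def arma11_acf(phi1, theta1, max_lag):
    """Theoretical autocorrelations rho_1, ..., rho_max_lag of
    z_t = phi1 z_{t-1} + a_t - theta1 a_{t-1}."""
    rho1 = ((1.0 - theta1 * phi1) * (phi1 - theta1)
            / (1.0 - 2.0 * theta1 * phi1 + theta1 ** 2))
    # Beyond lag 1 the autocorrelations decay geometrically at rate phi1.
    return [rho1 * phi1 ** (k - 1) for k in range(1, max_lag + 1)]
```

With `theta1 = 0.0` the result is $\phi_1^k$, and with `phi1 = 0.0` only the lag 1 value $-\theta_1 / (1 + \theta_1^2)$ is nonzero.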

Summary

We can therefore tentatively produce a Model Identification Chart, as follows, based on the behaviours of the SAC and SPAC of a stationary series.

SAC behaviour       SPAC behaviour      Tentative model
Cuts off after 1    Dies down           MA(1)
Cuts off after 2    Dies down           MA(2)
Dies down           Cuts off after 1    AR(1)
Dies down           Cuts off after 2    AR(2)
Dies down           Dies down           ARMA(1, 1)

This looks relatively obvious, but it isn’t as easy in practice as it appears. Note that no process has an ACF and a PACF that both cut off.

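Box-Jenkins Models with a nonzero constant term

MA(q):

$$z_t = \delta + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q}$$

$$E(z_t) = \mu = \delta$$

AR(p):

$$z_t = \delta + \phi_1 z_{t-1} + \phi_2 z_{t-2} + \cdots + \phi_p z_{t-p} + a_t$$

$$\delta = \mu(1 - \phi_1 - \phi_2 - \cdots - \phi_p)$$

$$\mu = \frac{\delta}{1 - \phi_1 - \phi_2 - \cdots - \phi_p}$$

ARMA(p, q):

$$z_t = \delta + \phi_1 z_{t-1} + \phi_2 z_{t-2} + \cdots + \phi_p z_{t-p} + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q}$$

$$\delta = \mu(1 - \phi_1 - \phi_2 - \cdots - \phi_p)$$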
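The relation between the constant term and the mean can be sketched as two small Python helpers. The function names and numerical values are hypothetical; passing an empty list of autoregressive coefficients corresponds to the pure MA case, where the constant equals the mean.

```python
def constant_from_mean(mu, phis):
    """delta = mu * (1 - phi_1 - ... - phi_p)."""
    return mu * (1.0 - sum(phis))

def mean_from_constant(delta, phis):
    """mu = delta / (1 - phi_1 - ... - phi_p)."""
    return delta / (1.0 - sum(phis))
```

For example, an AR(2) with coefficients 0.5 and 0.3 and mean 10 has a constant term of about 2, which shows that the constant in an AR or ARMA model is generally not the mean of the series.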

Time Series Operations and Representation of ARMA(p, q) Models

Backshift Operator

$$B y_t = y_{t-1}$$

(Push the time series back to the previous position.)

Difference operator

$$\nabla = 1 - B, \quad \text{so} \quad \nabla y_t = (1 - B) y_t = y_t - y_{t-1}.$$

Thus $\nabla$ is generally known as the differencing operator. Applying it twice gives

$$\nabla^2 y_t = \nabla(\nabla y_t) = \nabla(y_t - y_{t-1}) = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) = y_t - 2y_{t-1} + y_{t-2}.$$

Also, $\nabla^d = (1 - B)^d$.
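Expanding $(1 - B)^d$ by the binomial theorem gives the weights placed on $y_t, y_{t-1}, \ldots, y_{t-d}$. The Python sketch below computes those weights and applies them to a series; the function names and the test series are illustrative assumptions.

```python
from math import comb

def difference_coefficients(d):
    """Coefficients of B^0, B^1, ..., B^d in the expansion of (1 - B)^d."""
    return [(-1) ** j * comb(d, j) for j in range(d + 1)]

def apply_difference(y, d):
    """Apply (1 - B)^d to y; the first d observations are lost."""
    coeffs = difference_coefficients(d)
    return [sum(c * y[t - j] for j, c in enumerate(coeffs))
            for t in range(d, len(y))]
```

For $d = 2$ the coefficients are `[1, -2, 1]`, matching $y_t - 2y_{t-1} + y_{t-2}$, and applying them to `[1, 4, 9, 16, 25]` yields a constant series.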

Representation of an ARMA(p, q) model

AR(p)

$$z_t = \delta + \phi_1 z_{t-1} + \cdots + \phi_p z_{t-p} + a_t$$

$$z_t - \phi_1 z_{t-1} - \cdots - \phi_p z_{t-p} = \delta + a_t,$$

which can also be written as

$$(1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p)\, z_t = \delta + a_t.$$

Define $\phi_p(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p$, so

$$\phi_p(B)\, z_t = \delta + a_t.$$

MA(q) – moving average model of order q

The model is written as

$$z_t = \delta + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q},$$

which can also be written as

$$z_t = \delta + (1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q)\, a_t.$$

Define $\theta_q(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q$; then

$$z_t = \delta + \theta_q(B)\, a_t.$$

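ARMA(p, q): mixed autoregressive-moving average model of order (p, q)

$$z_t = \delta + \phi_1 z_{t-1} + \phi_2 z_{t-2} + \cdots + \phi_p z_{t-p} + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q}$$

or

$$z_t - \phi_1 z_{t-1} - \phi_2 z_{t-2} - \cdots - \phi_p z_{t-p} = \delta + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q},$$

$$(1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p)\, z_t = \delta + (1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q)\, a_t,$$

or

$$\phi_p(B)\, z_t = \delta + \theta_q(B)\, a_t \qquad (*)$$

where $\phi_p(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p$ and $\theta_q(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q$.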

In this notation, ARMA(p, 0)= AR(p) and ARMA(0, q) = MA(q).

In such cases one would prefer to write AR(p) and MA(q) instead of ARMA(p, 0) and ARMA(0, q).

Point Estimate of the model parameters

Having identified a tentative ARMA model, we must now fit it to the dataset concerned, and in so doing obtain estimates of the parameters defined by the model. For the ARMA(p, q) model, the parameters are $\theta_i$, $\phi_i$ and $\delta$ (if the constant term is required).

These parameters are commonly estimated by the least squares method (as we understand it, both Minitab and SAS use this approach). The least squares method essentially finds the estimates such that

$$SSE = \sum (y_t - \hat{y}_t)^2$$

is a minimum.

You do not need to know the detailed algorithm. Isn’t it nice that the computer packages do it for us?
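Purely to make the idea of minimising the SSE concrete, here is a rough Python sketch for an MA(1) model with no constant. It is a simple conditional least squares grid search under the assumption $a_0 = 0$, and it is not the algorithm used by Minitab or SAS, whose details the notes do not describe. All names, the seed and the simulated data are illustrative assumptions.

```python
import random

def ma1_residuals(z, theta1):
    """Recover shocks from z_t = a_t - theta1 * a_{t-1}, assuming a_0 = 0,
    via a_t = z_t + theta1 * a_{t-1}."""
    residuals = []
    previous = 0.0
    for value in z:
        current = value + theta1 * previous
        residuals.append(current)
        previous = current
    return residuals

def sse(z, theta1):
    """Sum of squared one-step forecast errors for a trial theta1."""
    return sum(a * a for a in ma1_residuals(z, theta1))

def fit_ma1(z):
    """Grid search for theta1 over -0.99, -0.98, ..., 0.99."""
    grid = [i / 100.0 for i in range(-99, 100)]
    return min(grid, key=lambda th: sse(z, th))

# Simulated MA(1) data with a_0 = 0 (illustrative only).
random.seed(1)
true_theta = -0.35
shocks = [random.gauss(0.0, 1.0) for _ in range(2000)]
z = [shocks[0]] + [shocks[t] - true_theta * shocks[t - 1]
                   for t in range(1, len(shocks))]
theta_hat = fit_ma1(z)
```

With a long simulated series the estimate `theta_hat` should land close to the value used to generate the data. Real packages use more refined search methods and starting values.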

Forecasts

What is the meaning of forecasting? $\hat{y}_{t+\tau}(t)$ is a point forecast of the series at time $t + \tau$, given that the series has been observed from time 1 to time $t$. Statistically speaking,

$$\hat{y}_{t+\tau}(t) = E(y_{t+\tau} \mid y_1, y_2, \ldots, y_t).$$

Since ARMA models build upon the series $\{a_t\}$, the properties of $\{a_t\}$ need to be revisited. In particular, $a_1, a_2, a_3, \ldots$ are independent, and future values of the $a$'s are independent of the present and past values of the $y$'s, i.e., $a_{t+1}$ is independent of $y_t, y_{t-1}, \ldots$

Example: Paper Towel Sales

It is found that the differenced series can be fitted by an MA(1), so

$$z_t = a_t - \theta_1 a_{t-1} \quad (\text{assuming } \delta = 0).$$

Since $z_t = y_t - y_{t-1}$, we have

$$y_t - y_{t-1} = a_t - \theta_1 a_{t-1}$$

$$y_t = y_{t-1} + a_t - \theta_1 a_{t-1}$$

(This is known as the difference-equation form of the model.)

One-step forecast:

First, we have $y_{t+1} = y_t + a_{t+1} - \theta_1 a_t$, so

$$\hat{y}_{t+1}(t) = E(y_{t+1} \mid y_1, y_2, \ldots, y_t) = E(y_t + a_{t+1} - \theta_1 a_t \mid y_1, \ldots, y_t) = y_t + 0 - \hat{\theta}_1 \hat{a}_t = y_t - \hat{\theta}_1 \hat{a}_t,$$

since $a_{t+1}$ is independent of $y_1, \ldots, y_t$, so that $E(a_{t+1} \mid y_1, y_2, \ldots, y_t) = E(a_{t+1}) = 0$.

Let $t = 120$ and $\tau = 1$, so

$$\hat{y}_{121}(120) = y_{120} - \hat{\theta}_1 \hat{a}_{120}.$$

In the absorbent towel sales example given in Table 9.1, Minitab gives $\hat{\theta}_1 = -0.3544$:

Final Estimates of Parameters

Type    Coef      SE Coef   T       P
MA 1    -0.3544   0.0864    -4.10   0.000

Differencing: 1 regular difference
Number of observations: Original series 120, after differencing 119
Residuals: SS = 127.367 (backforecasts excluded)
           MS = 1.079  DF = 118

Modified Box-Pierce (Ljung-Box) Chi-Square statistic

Lag          12      24      36      48
Chi-Square   10.3    18.6    27.5    41.2
DF           11      23      35      47
P-Value      0.500   0.725   0.815   0.710

The last two residuals are $e_{119} = -1.0890$ and $e_{120} = 0.6903$, so $\hat{a}_{119} = -1.0890$ and $\hat{a}_{120} = 0.6903$. Thus

$$\hat{y}_{121}(120) = 15.6453 + 0.3544 \times 0.6903 = 15.8899.$$

Using Minitab to forecast, we get

Forecasts from period 120

                    95 Percent Limits
Period   Forecast   Lower     Upper     Actual
121      15.8899    13.8532   17.9267

which is identical.
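The one-step forecast arithmetic can be reproduced in a couple of lines of Python. The numbers are those quoted in the notes ($y_{120} = 15.6453$, $\hat{\theta}_1 = -0.3544$, $\hat{a}_{120} = 0.6903$); the function name is a hypothetical choice for illustration.

```python
def one_step_forecast(y_last, theta1_hat, a_last_hat):
    """y_hat_{t+1}(t) = y_t - theta1_hat * a_hat_t for the fitted ARIMA(0,1,1)."""
    return y_last - theta1_hat * a_last_hat

forecast_121 = one_step_forecast(15.6453, -0.3544, 0.6903)
print(round(forecast_121, 4))  # 15.8899
```

Because $\hat{\theta}_1$ is negative, the positive last residual pushes the forecast above the last observed value.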

Two-step forecast:

$$y_{t+2} = y_{t+1} + a_{t+2} - \theta_1 a_{t+1}$$

$$\hat{y}_{t+2}(t) = \hat{y}_{t+1}(t) + E(a_{t+2}) - \theta_1 E(a_{t+1}) = \hat{y}_{t+1}(t)$$

Again, let $t = 120$; then $\hat{y}_{122}(120) = \hat{y}_{121}(120) = 15.8899$. However, the prediction interval is wider:

Forecasts from period 120

                    95 Percent Limits
Period   Forecast   Lower     Upper     Actual
121      15.8899    13.8532   17.9267
122      15.8899    12.4609   19.3189
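The widening of the interval can be understood from the forecast errors. Substituting the difference equation into itself gives a two-step error of $a_{t+2} + (1 - \theta_1) a_{t+1}$, and in general a $\tau$-step error variance of $\sigma_a^2 [1 + (\tau - 1)(1 - \theta_1)^2]$. The Python sketch below applies this with $\hat{\sigma}_a^2$ taken as the residual MS of 1.079 and a normal multiplier of 1.96. These choices are assumptions: the notes do not describe how Minitab computes its limits, so agreement is only approximate (to about two decimal places).

```python
import math

theta1_hat = -0.3544
sigma_hat = math.sqrt(1.079)  # square root of the residual MS
forecast = 15.8899            # same point forecast at every horizon

def forecast_std_error(steps):
    """Std. error of the steps-ahead forecast error for
    y_t = y_{t-1} + a_t - theta1 * a_{t-1}."""
    return sigma_hat * math.sqrt(1.0 + (steps - 1) * (1.0 - theta1_hat) ** 2)

def limits(steps, multiplier=1.96):
    """Approximate 95% prediction limits (lower, upper)."""
    half_width = multiplier * forecast_std_error(steps)
    return forecast - half_width, forecast + half_width

lower_1, upper_1 = limits(1)
lower_2, upper_2 = limits(2)
```

The point forecast stays the same at both horizons, but each extra step adds the variance of another accumulated shock, so the limits keep spreading out.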

Finally, in ARIMA notation, we may write our model that fits the original series as

ARIMA(0,1,1).
