
THE BASICS OF ARMA MODELS

Stationarity

A time series in discrete time is a sequence {x_t}_{t=−∞}^{∞} of random variables defined on a common probability space. We say that {x_t} is strictly stationary if the joint distributions do not change with time, i.e., if the distribution of (x_{t_1}, ..., x_{t_k}) is the same as the distribution of (x_{t_1+τ}, ..., x_{t_k+τ}) for any integers t_1, ..., t_k, and any integer τ.

The time series is weakly stationary (covariance stationary) if E[x_t] = μ exists and is constant for all t, var[x_t] = σ_x^2 < ∞, and the autocovariance E[(x_t − μ)(x_{t−r} − μ)] = c_r depends only on the lag, r.

Strict stationarity, together with the assumption of finite first and second moments, implies weak stationarity. For Gaussian series (i.e., all joint distributions are Gaussian), weak stationarity is equivalent to strict stationarity. This is because the multivariate Gaussian distribution is uniquely determined by the first two moments, i.e., the mean and covariance. For convenience, and without any real loss of generality, we will take the constant mean value to be zero unless otherwise stated. In this case, we have

c_r = E[x_t x_{t−r}], and the autocorrelation at lag r is ρ_r = E[x_t x_{t−r}] / √(var x_t · var x_{t−r}) = c_r / c_0. Given a time series data set x_1, ..., x_n, we estimate c_r by ĉ_r = (1/n) Σ_{t=r+1}^{n} x_t x_{t−r} (define ĉ_{−r} = ĉ_r), and we estimate ρ_r by ρ̂_r = ĉ_r / ĉ_0.
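
As an illustrative sketch of these estimators (assuming NumPy is available; the function names below are ours, not part of the text), ĉ_r and ρ̂_r can be computed as follows:

    import numpy as np

    def sample_autocovariance(x, r):
        # Estimate c_r by (1/n) * sum_{t=r+1}^{n} x_t x_{t-r}, for a
        # zero-mean series; c_{-r} is defined to equal c_r.
        x = np.asarray(x, dtype=float)
        n = len(x)
        r = abs(r)
        return np.sum(x[r:] * x[:n - r]) / n

    def sample_autocorrelation(x, r):
        # Estimate rho_r = c_hat_r / c_hat_0.
        return sample_autocovariance(x, r) / sample_autocovariance(x, 0)

    # For white noise, the sample autocorrelations at lags r >= 1 should be
    # close to zero (hypothetical data, for illustration only).
    rng = np.random.default_rng(0)
    eps = rng.normal(size=1000)
    print([round(sample_autocorrelation(eps, r), 3) for r in range(4)])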

White Noise Processes

A process {εt } is said to be white noise if it is weakly stationary, and if the εt are mutually

uncorrelated. In the case of zero mean, we have E[ε_t] = 0, E[ε_t^2] = σ_ε^2 < ∞, and E[ε_t ε_u] = 0 if t ≠ u.

The white noise process is a basic building block for more complicated time series models. Note that we have not assumed that the ε_t are independent. Remember that independence implies uncorrelatedness, but that it is possible to have uncorrelated random variables which are not independent. In the Gaussian case, however, independence and uncorrelatedness are equivalent properties.

The white noise process is not linearly forecastable, in the sense that the best linear forecast of ε_{t+1} based on ε_t, ε_{t−1}, ... is simply the series mean (zero), and does not depend on the present and past observations.

Autoregressive Processes

We say that the process {x_t} is autoregressive of order p (AR(p)) if there exist constants a_1, ..., a_p such that

    x_t = Σ_{k=1}^{p} a_k x_{t−k} + ε_t,

where {ε_t} is zero-mean white noise, and ε_t is uncorrelated with x_{t−1}, x_{t−2}, .... The AR(p) process exists and is weakly stationary if and only if all the roots of the polynomial P(z) = 1 − a_1 z − ... − a_p z^p lie outside the unit circle in the complex plane. An example of an autoregressive process which is not stationary is the random walk, x_t = x_{t−1} + ε_t, which is an AR(1) with a_1 = 1. We can verify that the random walk is not stationary by noting that the root of P(z) = 1 − z is z = 1, which is on the unit circle.
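
As a minimal numerical sketch of this condition (assuming NumPy; the coefficient values are illustrative only), the roots of P(z) can be checked directly:

    import numpy as np

    def ar_is_stationary(a):
        # a = [a_1, ..., a_p]; check that every root of
        # P(z) = 1 - a_1 z - ... - a_p z^p lies outside the unit circle.
        # np.roots expects coefficients from the highest power of z down.
        coeffs = np.concatenate(([-c for c in a[::-1]], [1.0]))
        roots = np.roots(coeffs)
        return bool(np.all(np.abs(roots) > 1.0))

    print(ar_is_stationary([0.5]))   # AR(1), a_1 = 0.5: root at z = 2, stationary
    print(ar_is_stationary([1.0]))   # random walk: root at z = 1, not stationary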

For a general AR(p), if we multiply x_t by x_{t+r} (r > 0) and take expectations, we obtain

    c_r = E[ x_t ( Σ_{k=1}^{p} a_k x_{t+r−k} + ε_{t+r} ) ] = Σ_{k=1}^{p} a_k c_{r−k}.

The equations c_r = Σ_{k=1}^{p} a_k c_{r−k} are called the Yule-Walker Equations.
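
As a sketch of how these equations are used in practice (assuming NumPy; the autocovariance values below are hypothetical), one can solve the first p Yule-Walker equations for a_1, ..., a_p given c_0, ..., c_p:

    import numpy as np

    def yule_walker(c):
        # c = [c_0, c_1, ..., c_p]; solve c_r = sum_{k=1}^{p} a_k c_{r-k},
        # r = 1, ..., p, using c_{-k} = c_k, for the coefficients a_1, ..., a_p.
        c = np.asarray(c, dtype=float)
        p = len(c) - 1
        R = np.array([[c[abs(r - k)] for k in range(p)] for r in range(p)])
        return np.linalg.solve(R, c[1:])

    # Check against an AR(1) with a_1 = 0.5 and unit innovation variance,
    # for which c_0 = 4/3 and c_1 = 2/3.
    print(yule_walker([4/3, 2/3]))   # approximately [0.5]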

Moving Average Models

We say that {x_t} is a moving average of order q (MA(q)) if there exist constants b_1, ..., b_q such that

    x_t = Σ_{k=0}^{q} b_k ε_{t−k},

where b_0 = 1. The MA(q) process has a finite memory, in the sense that observations spaced more than q time units apart are uncorrelated. In other words, the autocovariance function of the MA(q) cuts off beyond lag q. This contrasts with the autocovariance function of the AR(p), which decays exponentially, but does not cut off.
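
The cut-off can be seen in a short simulation sketch (assuming NumPy; the MA(2) coefficients are illustrative, and the autocorrelation estimator is the one defined earlier):

    import numpy as np

    def rho_hat(x, r):
        # Sample autocorrelation c_hat_r / c_hat_0, as defined earlier.
        x = np.asarray(x, dtype=float)
        n = len(x)
        return (np.sum(x[r:] * x[:n - r]) / n) / (np.sum(x * x) / n)

    # Simulate an MA(2): x_t = eps_t + 0.4 eps_{t-1} + 0.3 eps_{t-2}
    rng = np.random.default_rng(1)
    eps = rng.normal(size=5000)
    b = np.array([1.0, 0.4, 0.3])            # b_0 = 1, b_1, b_2
    x = np.convolve(eps, b, mode="valid")    # each x_t uses current and past eps

    # Sample autocorrelations should be near zero beyond lag q = 2.
    print([round(rho_hat(x, r), 3) for r in range(6)])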

Autoregressive Moving Average Models

We say that {x_t} is an autoregressive moving average process of order p, q (ARMA(p, q)) if there exist constants a_1, ..., a_p, b_1, ..., b_q such that

    x_t = Σ_{j=1}^{p} a_j x_{t−j} + Σ_{j=1}^{q} b_j ε_{t−j} + ε_t,

where {ε_t} is zero-mean white noise, and ε_t is uncorrelated with x_{t−1}, x_{t−2}, .... The ARMA(p, q)

process exists and is weakly stationary if and only if all the roots of the polynomial P (z ) defined earlier

are outside the unit circle. The process is said to be invertible if all the roots of the polynomial

Q(z) = 1 + b_1 z + ... + b_q z^q lie outside the unit circle. A time series is invertible if and only if it has

an infinite-order autoregressive (AR (∞)) representation of the form

    x_t = Σ_{j=1}^{∞} π_j x_{t−j} + ε_t,

where the π_j are constants with Σ π_j^2 < ∞.
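
As a sketch (assuming NumPy; the coefficients are illustrative), invertibility can be checked the same way stationarity was, via the roots of Q(z):

    import numpy as np

    def ma_is_invertible(b):
        # b = [b_1, ..., b_q]; check that every root of
        # Q(z) = 1 + b_1 z + ... + b_q z^q lies outside the unit circle.
        coeffs = np.concatenate((np.asarray(b, dtype=float)[::-1], [1.0]))
        roots = np.roots(coeffs)
        return bool(np.all(np.abs(roots) > 1.0))

    print(ma_is_invertible([0.5]))   # MA(1), b_1 = 0.5: root at z = -2, invertible
    print(ma_is_invertible([2.0]))   # root at z = -0.5, not invertible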

Linear Prediction of ARMA Processes

In the linear prediction problem, we want to forecast x_{t+h} based on a linear combination of x_t, x_{t−1}, ..., where h > 0 is the lead time. It can be shown that the best linear forecast, i.e., the one which makes the mean squared error of prediction as small as possible, denoted by x̂_{t+h}, is determined by two characteristics: (1) x̂_{t+h} can be expressed as a linear combination of x_t, x_{t−1}, ..., and (2) the prediction error x_{t+h} − x̂_{t+h} is uncorrelated with all linear combinations of x_t, x_{t−1}, .... Thus, for example, the best one-step linear forecast in the AR(p) process is x̂_{t+1} = Σ_{k=1}^{p} a_k x_{t+1−k}. The best one-step linear forecast in the MA(q) model is x̂_{t+1} = Σ_{k=1}^{q} b_k ε_{t+1−k}, which, in order to be useful in practice, must be expressed as a linear combination of the observations x_t, x_{t−1}, ..., by inverting the process to its AR(∞) representation.
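
As a minimal sketch of these forecasts (assuming NumPy; the model coefficients and data are hypothetical), the AR(p) one-step forecast uses the last p observations directly, while an invertible MA(1) can be forecast through its truncated AR(∞) representation, for which π_j = −(−b_1)^j:

    import numpy as np

    def ar_one_step_forecast(x, a):
        # x_hat_{t+1} = sum_{k=1}^{p} a_k x_{t+1-k}; `x` holds the observations
        # with the most recent value last, and a = [a_1, ..., a_p].
        a = np.asarray(a, dtype=float)
        recent = np.asarray(x, dtype=float)[-len(a):][::-1]   # x_t, x_{t-1}, ...
        return float(np.dot(a, recent))

    def ma1_one_step_forecast(x, b1, n_terms=50):
        # Forecast an invertible MA(1) via its AR(infinity) form, truncated:
        # x_hat_{t+1} = sum_{j>=1} pi_j x_{t+1-j}, with pi_j = -(-b1)**j.
        x = np.asarray(x, dtype=float)[::-1]                  # x_t, x_{t-1}, ...
        k = min(len(x), n_terms)
        pi = np.array([-(-b1) ** j for j in range(1, k + 1)])
        return float(np.dot(pi, x[:k]))

    # Hypothetical AR(2) example with a = [0.5, -0.25]:
    print(ar_one_step_forecast([1.2, 0.8, 1.0, 0.6], [0.5, -0.25]))   # 0.5*0.6 - 0.25*1.0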

Optimal Forecasting

It is not necessarily true that the best linear forecast is the best possible forecast. It may be that some nonlinear function of the observations gives a smaller mean squared prediction error than the best linear forecast. It can be shown that the optimal forecast, in terms of mean squared error, is the conditional expectation, E[x_{t+h} | x_t, x_{t−1}, ...]. This is a function of the present and past observations. If the {x_t} are Gaussian, then this function will be linear. Thus, in the Gaussian case, the optimal predictor is the same as the best linear predictor. In fact, for any zero-mean process with a one-sided MA(∞) representation, x_t = Σ_{k=0}^{∞} b_k ε_{t−k}, if the conditional expectation of ε_{t+1} given the available information is zero, it will be true that the optimal predictor is the same as the best linear predictor. In general, however, the optimal predictor may be nonlinear.