CHAPTER 2

Univariate Time Series Models

2.1 Least Squares Regression

We begin our discussion of univariate and multivariate methods by considering the idea of a simple regression model, which we have met before in other contexts. All of the multivariate methods follow, in some sense, from the ideas involved in simple univariate regression. In this case, we assume that there is some collection of fixed known functions of time, say z_{t1}, z_{t2}, ..., z_{tq}, that are influencing our output y_t, which we know to be random. We express this relation between the inputs and outputs as

y_t = β_1 z_{t1} + β_2 z_{t2} + ··· + β_q z_{tq} + e_t    (2.1)

at the time points t = 1, 2, ..., n, where β_1, ..., β_q are unknown fixed regression coefficients and e_t is a random error or noise term, assumed to be white noise; this means that the errors have zero means, equal variances σ² and are independent. We traditionally assume also that the white noise series, e_t, is Gaussian or normally distributed.

Example 2.1: We have assumed implicitly that the model

yt = β1 + β2t + et

is reasonable in our discussion of detrending in Chapter 1. This is in the form of the regression model (2.1) when one makes the identification z_{t1} = 1, z_{t2} = t. The problem in detrending is to estimate the coefficients β_1 and β_2 in the above equation and to detrend by constructing the estimated residual series ê_t. We discuss the precise way in which this is accomplished below.

The linear regression model described by Equation (2.1) can be conveniently written in slightly more general matrix notation by defining the column vectors z_t = (z_{t1}, ..., z_{tq})′ and β = (β_1, ..., β_q)′, so that we write (2.1) in the alternate form

y_t = β′z_t + e_t.    (2.2)

To find estimators for β and σ², it is natural to determine the coefficient vector β minimizing Σ e_t² with respect to β. This yields the least squares or maximum likelihood estimator β̂ and the maximum likelihood estimator for σ², which is proportional to the unbiased estimator

σ̂² = (1/(n − q)) Σ_{t=1}^{n} (y_t − β̂′z_t)².    (2.3)

An alternate way of writing the model (2.2) is as

y = Zβ + e,    (2.4)

where Z = (z_1, z_2, ..., z_n)′ is the n × q matrix composed of the values of the input variables at the observed time points, y = (y_1, y_2, ..., y_n)′ is the vector of observed outputs, and the errors are stacked in the vector e = (e_1, e_2, ..., e_n)′. The estimator β̂ is the solution to the normal equations Z′Zβ̂ = Z′y. You need not be concerned with how the above equation is solved in practice, as all computer packages have efficient software for inverting the q × q matrix Z′Z to obtain

β̂ = (Z′Z)^{−1}Z′y.    (2.5)

An important quantity that all software produces is a measure of uncertainty for the estimated regression coefficients, say

ĉov(β̂) = σ̂² (Z′Z)^{−1}.    (2.6)

If c_{ij} denotes an element of C = (Z′Z)^{−1}, then cov(β̂_i, β̂_j) = σ² c_{ij}, and a 100(1 − α)% confidence interval for β_i is

β̂_i ± t_{n−q}(α/2) σ̂ √c_{ii},    (2.7)

where t_{df}(α/2) denotes the upper 100(α/2)% point of a t distribution with df degrees of freedom.
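The computations in (2.3) and (2.5)-(2.7) are easy to reproduce numerically. The sketch below assumes NumPy and SciPy are available and uses a simulated trend-plus-noise series as a stand-in for any real data set, so none of the numbers it prints correspond to examples in this chapter.

```python
import numpy as np
from scipy import stats

# Simulated stand-in data: y_t = 2 + 0.5 t + white noise, so Z has columns (1, t).
rng = np.random.default_rng(0)
n = 100
t = np.arange(1, n + 1)
Z = np.column_stack([np.ones(n), t])            # n x q design matrix
y = 2.0 + 0.5 * t + rng.normal(0, 1, n)

q = Z.shape[1]
beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)    # (2.5): solve the normal equations
resid = y - Z @ beta_hat
sigma2_hat = resid @ resid / (n - q)            # (2.3): unbiased error variance
cov_beta = sigma2_hat * np.linalg.inv(Z.T @ Z)  # (2.6)

# 95% confidence intervals for the coefficients, as in (2.7)
tcrit = stats.t.ppf(0.975, df=n - q)
for i in range(q):
    se = np.sqrt(cov_beta[i, i])
    print(f"beta[{i}] = {beta_hat[i]:.4f} +/- {tcrit * se:.4f}")
```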

Example 2.2: Consider estimating the possible global warming trend alluded to in Section 1.1.2. The global temperature series, shown previously in Figure 1.3, suggests the possibility of a gradually increasing temperature over the 123-year period covered by the land-based series. If we fit the model in Example 2.1, replacing t by t/100 to convert to a 100-year base, so that the increase will be in degrees per 100 years, we obtain β̂_1 = 38.72 and β̂_2 = .9501 using (2.5). The error variance, from (2.3), is .0752, with q = 2 and n = 123. Then (2.6) yields

ĉov(β̂_1, β̂_2) = [  1.8272   −.0941 ]
                 [  −.0941    .0048 ],

leading to an estimated standard error for the slope of √.0048 = .0696. The value t_{121}(.025), with n − q = 123 − 2 = 121 degrees of freedom, is about 1.98, leading to a narrow confidence interval of .95 ± .138 for the slope and hence to a confidence interval on the one-hundred-year increase of about .81 to 1.09 degrees. We would conclude from this analysis that there is a substantial increase in global temperature, amounting to roughly one degree F per 100 years.

Figure 2.1 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the detrended (top panel) and differenced (bottom panel) global temperature series.

If the model is reasonable, the residuals ê_t = y_t − β̂_1 − β̂_2 t should be essentially independent and identically distributed, with no correlation evident. The plot that we have made in Figure 1.3 of the detrended global temperature series shows that this is probably not the case, because of the long low-frequency swings in the observed residuals. However, the differenced series, also shown in Figure 1.3 (second panel), appears to be more independent, suggesting that perhaps the apparent global warming is more consistent with a long-term swing in an underlying random walk than with a fixed 100-year trend. If we check the autocorrelation function of the regression residuals, shown here in Figure 2.1, it is clear that the significant values at higher lags imply that there is significant correlation in the residuals. Such correlation can be important, since the estimated standard errors of the coefficients computed under the assumption that the least squares residuals are uncorrelated are often too small. We can partially repair the damage caused by the correlated residuals by looking at a model with correlated errors. The procedure and techniques for dealing with correlated errors are based on the autoregressive moving average (ARMA) models to be considered in the next sections. Another method of reducing correlation is to apply a first difference ∆x_t = x_t − x_{t−1} to the global temperature series. The ACF of the differenced series, also shown in Figure 2.1, seems to have lower correlations at the higher lags. Figure 1.3 shows qualitatively that this transformation also eliminates the trend in the original series.

Since we have again made some rather arbitrary-looking specifications for the configuration of explanatory variables in the above regression examples, the reader may wonder how to select among various plausible models. We mention two criteria that reward reducing the squared error and penalize additional parameters: the Akaike Information Criterion

AIC(K) = log σ̂² + 2K/n    (2.8)

and the Schwarz Information Criterion

SIC(K) = log σ̂² + (K log n)/n,    (2.9)

(Schwarz, 1978), where K is the number of parameters fitted (exclusive of variance parameters) and σ̂² is the maximum likelihood estimator for the variance. The latter is sometimes termed the Bayesian Information Criterion, BIC, and will often yield models with fewer parameters than the other selection methods. A modification of AIC(K) that is particularly well suited for small samples was suggested by Hurvich and Tsai (1989). This is the corrected AIC, given by

AIC_C(K) = log σ̂² + (n + K)/(n − K − 2).    (2.10)

The rule for all three measures above is to choose the value of K leading to the smallest value of AIC(K), SIC(K) or AIC_C(K). We will give an example later comparing the above simple least squares model with a model in which the errors have a time series correlation structure.

The organization of this chapter is patterned after the landmark approach to developing models for time series data pioneered by Box and Jenkins (see Box et al, 1994). This approach assumes that there will be a representation of time series data in terms of a difference equation that relates the current value to its past. Such models should be flexible enough to include non-stationary realizations like the random walk given above, as well as seasonal behavior, where the current value is related to past values at multiples of an underlying season; a common one might be multiples of 12 months (1 year) for monthly data. The models are constructed from difference equations driven by random input shocks and are labeled, in the most general formulation, as ARIMA, i.e., AutoRegressive Integrated Moving Average, processes. The analogies with differential equations, which model many physical processes, are obvious.

For clarity, we develop the separate components of the model sequentially, considering the integrated, autoregressive and moving average parts in that order, followed by the seasonal modification. The Box-Jenkins approach suggests three steps in a procedure summarized as identification, estimation and forecasting. Identification combines the ACF and PACF as diagnostics with the versions of the AIC given above to find a parsimonious (simple) model for the data. Estimation of the parameters in the model is the next step. Statistical techniques based on maximum likelihood and least squares are paramount for this stage and will only be sketched in this course. Finally, forecasting of the time series based on the estimated parameters, with sensible estimates of uncertainty, is the bottom line for any assumed model.
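Since the criteria (2.8)-(2.10) are simple functions of the residual variance, the sample size and the parameter count, they are easy to compute once a model has been fitted. A minimal helper, assuming NumPy and that the variance supplied is the maximum likelihood estimate (residual sum of squares divided by n):

```python
import numpy as np

def info_criteria(sigma2_mle, n, K):
    """AIC, SIC (BIC) and corrected AICc, as defined in (2.8)-(2.10).

    sigma2_mle : maximum likelihood variance estimate (SSE / n)
    n          : number of observations used in the fit
    K          : number of fitted parameters, excluding the variance
    """
    aic = np.log(sigma2_mle) + 2 * K / n
    sic = np.log(sigma2_mle) + K * np.log(n) / n
    aicc = np.log(sigma2_mle) + (n + K) / (n - K - 2)
    return aic, sic, aicc
```

Comparing candidate models then amounts to calling this helper for each fit and keeping the model with the smallest value.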

2.2 Integrated (I) Models

We begin our study of time correlation by mentioning a simple model that will introduce strong correlations over time. This is the random walk model which defines the current value of the time series as just the immediately preceding value with additive noise. The model forms the basis, for example, of the random walk theory of stock price behavior. In this model we define

xt = xt−1 + wt, (2.11)

where w_t is a white noise series with zero mean and variance σ². Figure 2.2 shows a typical realization of such a series, and we observe that it bears a passing resemblance to the global temperature series. Appealing to (2.11), the best prediction of the current value would be expected to be given by its immediately preceding value. The model is, in a sense, unsatisfactory, because one would think that better results would be possible by a more efficient use of the past. The ACF of the original series, shown in Figure 2.3, exhibits a slow decay as lags increase. In order to model such a series without knowing that it is necessarily generated by (2.11), one might try looking at a first difference and comparing the result to a white noise or completely independent process. It is

Figure 2.2 A typical realization of the random walk series (top panel) and the first difference of the series (bottom panel).

clear from (2.11) that the first difference would be ∆x_t = x_t − x_{t−1} = w_t, which is just white noise. The ACF of the differenced process, in this case, would be expected to be zero at all lags h ≠ 0, and the sample ACF should reflect this behavior. The first difference of the random walk is also shown in Figure 2.2, and we note that it appears to be much more random. Its sample ACF, shown in Figure 2.3, reflects this predicted behavior, with no significant values at lags other than zero. It is clear that (2.11) is a reasonable model for this data. The original series is nonstationary, with an autocorrelation function that depends on time, of the form

ρ(x_{t+h}, x_t) = √( t/(t+h) )  for h ≥ 0,   and   ρ(x_{t+h}, x_t) = √( (t+h)/t )  for h < 0.

Figure 2.3 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the random walk (top panel) and the first difference (bottom panel) series.
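The behavior summarized in Figures 2.2 and 2.3 is easy to reproduce by simulation. The sketch below, assuming NumPy, generates a random walk via (2.11), differences it, and computes sample ACF values; the walk's ACF stays near one at small lags while the difference's ACF is near zero away from lag 0.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
w = rng.normal(0, 1, n)
x = np.cumsum(w)                 # random walk: x_t = x_{t-1} + w_t
dx = np.diff(x)                  # first difference recovers the white noise w_t

def sample_acf(z, max_lag):
    """Sample autocorrelations of z at lags 0, 1, ..., max_lag."""
    z = z - z.mean()
    c0 = np.dot(z, z) / len(z)
    return np.array([np.dot(z[:len(z) - h], z[h:]) / (len(z) * c0)
                     for h in range(max_lag + 1)])

print("ACF of the walk, lags 0-5:      ", np.round(sample_acf(x, 5), 2))
print("ACF of the difference, lags 0-5:", np.round(sample_acf(dx, 5), 2))
```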

The above example, using a difference transformation to make a random walk stationary, shows a very particular case of the model identification pro- cedure advocated by Box et al (1994). Namely, we seek a linearly filtered transformation of the original series, based strictly on the past values, that will reduce it to completely random white noise. This gives a model that enables prediction to be done with a residual noise that satisfies the usual statistical assumptions about model error. We will introduce, in the following discussion, more general versions of this simple model that are useful for modeling and forecasting series with observations that are correlated in time. The notation and terminology were introduced in the landmark work by Box and Jenkins (1970) (see Box et al, 1994). A requirement for the ARMA model of Box and Jenkins is that the underlying process be stationary. Clearly the first difference of the random walk is stationary but the ACF of the first difference shows relatively little dependence on the past, meaning that the differenced process is not predictable in terms of its past behavior. To introduce a notation that has advantages for treating more general mod- els, define the backshift operator B as the result of shifting the series back by one time unit, i.e.

Bx_t = x_{t−1},    (2.12)

and, applying successively higher powers, B^k x_t = x_{t−k}. The operator has many of the usual algebraic properties and allows, for example, writing the random walk model (2.11) as (1 − B)x_t = w_t. Note that the difference operator discussed previously in Section 1.2.2 is just ∇ = 1 − B. Identifying nonstationarity is an important first step in the Box-Jenkins procedure. From the above discussion, we note that the ACF of a nonstationary process will tend to decay rather slowly as a function of lag h. For example, a straight line would be perfectly correlated, regardless of lag. Based on this observation, we mention the following property, which aids in identifying non-stationarity.

Property P2.1: ACF and PACF of a non-stationary time series The ACF of a non-stationary time series decays very slowly as a function of lag h. The PACF of a non-stationary time series tends to have a peak very near unity at lag 1, with other values less than the significance level.

2.3 Autoregressive (AR) Models

Now, extending the notions above to more general linear combinations of past values might suggest writing

x_t = φ_1x_{t−1} + φ_2x_{t−2} + ··· + φ_px_{t−p} + w_t    (2.13)

as a function of p past values and an additive noise component w_t. The model given by (2.13) is called an autoregressive model of order p, since it is assumed that one needs p past values to predict x_t. The coefficients φ_1, φ_2, ..., φ_p are autoregressive coefficients, chosen to produce a good fit between the observed x_t and its prediction based on x_{t−1}, x_{t−2}, ..., x_{t−p}. It is convenient to rewrite (2.13), using the backshift operator, as

φ(B)x_t = w_t,    (2.14)

where

φ(B) = 1 − φ_1B − φ_2B² − ··· − φ_pB^p    (2.15)

is a polynomial in B with roots (solutions of φ(B) = 0) outside the unit circle, i.e., |B_k| > 1 for each root B_k. This restriction is necessary for expressing the solution x_t of (2.14) in terms of present and past values of w_t. That solution has the form

x_t = ψ(B)w_t,    (2.16)

where

ψ(B) = Σ_{k=0}^{∞} ψ_k B^k    (2.17)

is an infinite polynomial (ψ_0 = 1), with coefficients determined by equating coefficients of B in

ψ(B)φ(B) = 1.    (2.18)

Equation (2.16) can be obtained formally by choosing ψ(B) to satisfy (2.18) and multiplying both sides of (2.14) by ψ(B). It is clear that the random walk has root B_1 = 1, which does not satisfy the restriction, so the process is nonstationary.

Example 2.3 Suppose that we have an autoregressive model (2.13) with p = 1, i.e., x_t − φ_1x_{t−1} = (1 − φ_1B)x_t = w_t. Then (2.18) becomes

(1 + ψ_1B + ψ_2B² + ···)(1 − φ_1B) = 1.

Equating coefficients of B implies that ψ_1 − φ_1 = 0, or ψ_1 = φ_1. For B², we would get ψ_2 − φ_1ψ_1 = 0, or ψ_2 = φ_1². Continuing, we obtain ψ_k = φ_1^k, and the representation is

ψ(B) = 1 + Σ_{k=1}^{∞} φ_1^k B^k,

and we have

x_t = Σ_{k=0}^{∞} φ_1^k w_{t−k}.    (2.19)

The representation (2.16) is fundamental for developing approximate forecasts and also exhibits the series as a linear process of the form considered in Problem 1.4.
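The coefficient matching in (2.18) can also be carried out numerically for any AR polynomial. The sketch below, assuming NumPy, implements the general recursion ψ_k = Σ_j φ_j ψ_{k−j} and checks it against the closed form ψ_k = φ_1^k derived in the example above.

```python
import numpy as np

def ar_psi_weights(phi, n_weights):
    """psi weights from psi(B) phi(B) = 1, where phi = [phi_1, ..., phi_p]."""
    p = len(phi)
    psi = np.zeros(n_weights + 1)
    psi[0] = 1.0
    for k in range(1, n_weights + 1):
        psi[k] = sum(phi[j] * psi[k - 1 - j] for j in range(min(p, k)))
    return psi

phi1 = 0.8
print(ar_psi_weights([phi1], 5))       # [1, .8, .64, .512, .4096, .32768]
print(phi1 ** np.arange(6))            # agrees with psi_k = phi_1^k
```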

For data involving such autoregressive (AR) models as defined above, the main selection problems are deciding that the autoregressive structure is appropriate and then determining the value of p for the model. The ACF of the process is a potential aid for determining the order of the process, as are the model selection measures (2.8)-(2.10). To determine the ACF of the pth order AR in (2.13), write the equation as

x_t − Σ_{k=1}^{p} φ_k x_{t−k} = w_t

and multiply both sides by x_{t−h}, h = 0, 1, 2, .... Assuming that the mean E(x_t) = 0, and using the definition of the autocovariance function (1.2), leads to the equation

E[(x_t − Σ_{k=1}^{p} φ_k x_{t−k}) x_{t−h}] = E[w_t x_{t−h}].

The left-hand side immediately becomes

γ_x(h) − Σ_{k=1}^{p} φ_k γ_x(h − k).

The representation (2.16) implies that

E[w_t x_{t−h}] = E[w_t (w_{t−h} + ψ_1 w_{t−h−1} + ψ_2 w_{t−h−2} + ···)].

For h = 0, we get σ_w². For all other h, the fact that the w_t are independent implies that the right-hand side will be zero. Hence, we may write the equations for determining γ_x(h) as

γ_x(0) − Σ_{k=1}^{p} φ_k γ_x(−k) = σ_w²    (2.20)

and

γ_x(h) − Σ_{k=1}^{p} φ_k γ_x(h − k) = 0    (2.21)

for h = 1, 2, 3, .... Note that one will need the property γ_x(−h) = γ_x(h) in solving these equations. Equations (2.20) and (2.21) are called the Yule-Walker equations (see Yule, 1927; Walker, 1931).

Example 2.4

Consider finding the ACF of the first-order autoregressive model. First, (2.20) implies that γ_x(0) − φ_1γ_x(1) = σ_w². For h = 1, 2, ..., (2.21) gives γ_x(h) − φ_1γ_x(h − 1) = 0. Solving these successively gives

γ_x(h) = γ_x(0) φ_1^h.

Combining with (2.20) yields

γ_x(0) = σ_w² / (1 − φ_1²).

It follows that the autocovariance function is

γ_x(h) = (σ_w² / (1 − φ_1²)) φ_1^h.

Taking into account that γ_x(−h) = γ_x(h) and using (1.3), we obtain

ρ_x(h) = φ_1^{|h|}

for h = 0, ±1, ±2,....
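A quick simulation check of this result is easy to write, assuming NumPy; with φ_1 = .7, the sample ACF of a long simulated AR(1) path should land close to .7^h. The series here is purely synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
phi1, n = 0.7, 5000
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi1 * x[t - 1] + rng.normal()   # AR(1) recursion

x = x - x.mean()
c0 = np.dot(x, x) / n
acf = [np.dot(x[:n - h], x[h:]) / (n * c0) for h in range(6)]
print(np.round(acf, 2))                     # sample ACF, lags 0-5
print(np.round(phi1 ** np.arange(6), 2))    # theoretical rho(h) = phi1^h
```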

The exponential decay is typical of autoregressive behavior and there may also be some periodic structure. However, the most effective diagnostic of AR structure is in the PACF and is summarized by the following identification property:

Property P2.2: PACF for AR Process

The partial autocorrelation function φhh as a function of lag h is zero for h > p, the order of the autoregressive process. This enables one to make a preliminary identification of the order p of the process using the partial autocorrelation function PACF. Simply choose the order beyond which most of the sample values of the PACF are approximately zero.

To verify the above, note that the PACF (see Section 1.3.3) is basically the last coefficient obtained when minimizing the squared error

MSE = E[(x_{t+h} − Σ_{k=1}^{h} a_k x_{t+h−k})²].

Setting the derivatives with respect to a_j equal to zero leads to the equations

E[(x_{t+h} − Σ_{k=1}^{h} a_k x_{t+h−k}) x_{t+h−j}] = 0.

This can be written as

γ_x(j) − Σ_{k=1}^{h} a_k γ_x(j − k) = 0

for j = 1, 2, ..., h. Now, from Equations (2.20) and (2.21), it is clear that, for an AR(p), we may take a_k = φ_k for k ≤ p and a_k = 0 for k > p to get a solution to the above equations. This implies Property P2.2 above.

Having decided on the order p of the model, it is clear that, for the estimation step, one may write the model (2.13) in the regression form

x_t = φ′z_t + w_t,    (2.22)

where φ = (φ_1, φ_2, ..., φ_p)′ corresponds to β and z_t = (x_{t−1}, x_{t−2}, ..., x_{t−p})′ is the vector of lagged regressors in (2.2). Taking into account the fact that x_t is not observed for t ≤ 0, we may run the regression approach of Section 2.1 for t = p + 1, p + 2, ..., n to get estimators for φ and for σ², the variance of the white noise process. These so-called conditional maximum likelihood estimators are commonly used because the exact maximum likelihood estimators involve solving nonlinear equations.

Example 2.5

We consider the simple problem of modeling the recruit series shown in Figure 1.1 using an autoregressive model. The bottom panel of Figure 1.9 shows the autocorrelation (ACF) and partial autocorrelation (PACF) functions of the recruit series. The PACF has large values for h = 1, 2 and then is essentially zero for higher-order lags. This implies, by Property P2.2 above, that a second-order (p = 2) AR model might provide a good fit. Running the regression program for the model

x_t = β_0 + φ_1x_{t−1} + φ_2x_{t−2} + w_t

leads to the estimators

β̂_0 = 6.74 (1.11),  φ̂_1 = 1.35 (.04),  φ̂_2 = −.46 (.04),  σ̂² = 90.31,

where the estimated standard deviations are given in parentheses. To determine whether the above order is the best choice, we fitted models for p = 1, ..., 10, obtaining corrected AIC_C values of 5.75, 5.52, 5.53, 5.54, 5.54, 5.55, 5.55, 5.56, 5.57 and 5.58, respectively, using (2.10) with K = p. This shows that the minimum AIC_C obtains for p = 2, and we choose the second-order model.
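The order search in this example can be reproduced mechanically: fit each candidate AR(p) by conditional least squares, as in (2.22), and compare the AIC_C values from (2.10). The sketch below assumes NumPy and runs on a simulated AR(2) standing in for the recruit series, which is not reproduced in this text; the AIC_C minimum should land at or near p = 2.

```python
import numpy as np

def aicc_for_ar(x, p):
    """Conditional least squares fit of an AR(p) with intercept; returns AICc (2.10)."""
    n = len(x)
    Z = np.column_stack([np.ones(n - p)] +
                        [x[p - k: n - k] for k in range(1, p + 1)])
    y = x[p:]
    coef = np.linalg.solve(Z.T @ Z, Z.T @ y)
    resid = y - Z @ coef
    sigma2_mle = resid @ resid / len(y)       # ML-style variance estimate
    K = p + 1                                 # AR coefficients plus the intercept
    return np.log(sigma2_mle) + (len(y) + K) / (len(y) - K - 2)

# Simulated AR(2) with coefficients echoing the fitted values quoted above.
rng = np.random.default_rng(3)
n = 450
x = np.zeros(n)
for t in range(2, n):
    x[t] = 1.35 * x[t - 1] - 0.46 * x[t - 2] + rng.normal()

for p in range(1, 11):
    print(p, round(aicc_for_ar(x, p), 3))
```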

Example 2.6

The previous example used various autoregressive models for the recruit series, fitting a second-order regression model. We may also use this regression idea to fit the model to other series, such as a detrended version of the Southern Oscillation Index (SOI) given in previous discussions. We have noted in our discussion of Figure 1.9, from the partial autocorrelation function (PACF), that a plausible model for this series might be a first-order autoregression of the form given above with p = 1. Again, putting the model above into the regression framework (2.2) for a single coefficient leads to the estimators φ̂_1 = .59 with standard error .04, σ̂² = .09218 and AIC_C(1) = −1.375. The ACF of these residuals (not shown), however, will still show cyclical variation, and it is clear that they still have a number of values exceeding the ±1.96/√n threshold (see Equation 1.14). A suggested procedure is to try higher-order autoregressive models; successive models for p = 1, 2, ..., 30 were fitted, and the AIC_C(K) values are plotted in Figure 3.10 of Chapter 3, so we do not repeat them here. There is a clear minimum for a p = 16th-order model. The coefficient vector φ̂ has components .40, .07, .15, .08, −.04, −.08, −.09, −.08, .00, .11, .16, .15, .03, −.20, −.14 and −.06, with σ̂² = .07354.

Finally, we give a general approach to forecasting for any process that can be written in the form (2.16). This includes the AR, MA and ARMA processes. We begin by defining an h-step forecast of the process x_t as

x^t_{t+h} = E[x_{t+h} | x_t, x_{t−1}, ...].    (2.23)

Note that this is not exactly right, because we only have x_1, x_2, ..., x_t available, so that conditioning on the infinite past is only an approximation. From this definition, it is reasonable to intuit that x^t_s = x_s for s ≤ t, and

E[w_s | x_t, x_{t−1}, ...] = E[w_s | w_t, w_{t−1}, ...] = w_s    (2.24)

for s ≤ t. For s > t, use x^t_s and

E[w_s | x_t, x_{t−1}, ...] = E[w_s | w_t, w_{t−1}, ...] = E[w_s] = 0,    (2.25)

since w_s will be independent of past values of w_t. We define the h-step forecast variance as

P^t_{t+h} = E[(x_{t+h} − x^t_{t+h})² | x_t, x_{t−1}, ...].    (2.26)

To develop an expression for this mean square error, note that, with ψ_0 = 1, we can write

x_{t+h} = Σ_{k=0}^{∞} ψ_k w_{t+h−k}.

Then, since w^t_{t+h−k} = 0 for t + h − k > t, i.e., for k < h, we have

x^t_{t+h} = Σ_{k=h}^{∞} ψ_k w_{t+h−k},

so that the residual is

x_{t+h} − x^t_{t+h} = Σ_{k=0}^{h−1} ψ_k w_{t+h−k}.

Hence, the mean square error (2.26) is just the variance of a linear combination of independent zero-mean errors, with common variance σ_w²:

P^t_{t+h} = σ_w² Σ_{k=0}^{h−1} ψ_k².    (2.27)

As an example, we consider forecasting the second order model developed for the recruit series in Example 2.5.

Example 2.7

Consider the one-step forecast x^t_{t+1} first. Writing the defining equation for time t + 1 gives

x_{t+1} = φ_1x_t + φ_2x_{t−1} + w_{t+1},

so that

x^t_{t+1} = φ_1x^t_t + φ_2x^t_{t−1} + w^t_{t+1}
         = φ_1x_t + φ_2x_{t−1} + 0.

Continuing in this vein, we obtain

x^t_{t+2} = φ_1x^t_{t+1} + φ_2x^t_t + w^t_{t+2}
         = φ_1x^t_{t+1} + φ_2x_t + 0.

Then,

x^t_{t+h} = φ_1x^t_{t+h−1} + φ_2x^t_{t+h−2} + w^t_{t+h}
         = φ_1x^t_{t+h−1} + φ_2x^t_{t+h−2} + 0

for h > 2. The forecast variances out to lag h = 4 and beyond, if necessary, can be found by solving (2.18) for ψ_1, ψ_2 and ψ_3 and substituting into (2.27). Equating coefficients of B, B² and B³ in

(1 − φ_1B − φ_2B²)(1 + ψ_1B + ψ_2B² + ψ_3B³ + ···) = 1,

we obtain ψ_1 = φ_1, ψ_2 − φ_1ψ_1 − φ_2 = 0 and ψ_3 − φ_1ψ_2 − φ_2ψ_1 = 0. This gives the coefficients ψ_1 = φ_1, ψ_2 = φ_1² + φ_2 and ψ_3 = φ_1³ + 2φ_1φ_2. From Example 2.5, we have φ̂_1 = 1.35, φ̂_2 = −.46, σ̂_w² = 90.31 and β̂_0 = 6.74. The forecasts are of the form

x^t_{t+h} = 6.74 + 1.35x^t_{t+h−1} − .46x^t_{t+h−2}.

For the forecast variance, we evaluate ψ_1 = 1.35, ψ_2 = 1.36 and ψ_3 = 1.22, leading to the variances 90.31, 90.31(2.82), 90.31(4.68) and 90.31(6.16) for the forecasts at h = 1, 2, 3, 4. The corresponding standard errors of the forecasts are 9.50, 15.97, 20.56 and 23.59. The recruit series values range from about 20 to 100, so the forecast uncertainty will be rather large.
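The forecast recursion and the variance formula (2.27) translate directly into a few lines of code. The sketch below assumes NumPy, uses the coefficient values quoted above (β_0 = 6.74, φ_1 = 1.35, φ_2 = −.46, σ_w² = 90.31), and substitutes two arbitrary terminal observations for the actual recruit values, so only the standard errors, not the point forecasts, should match the example.

```python
import numpy as np

beta0, phi1, phi2, sigma2 = 6.74, 1.35, -0.46, 90.31
x_t, x_tm1 = 60.0, 55.0          # placeholders standing in for the last two observations

# h-step forecasts from the AR(2) recursion
forecasts = []
prev2, prev1 = x_tm1, x_t
for h in range(1, 5):
    fc = beta0 + phi1 * prev1 + phi2 * prev2
    forecasts.append(fc)
    prev2, prev1 = prev1, fc

# psi weights from (2.18) and forecast standard errors from (2.27)
psi = np.zeros(4)
psi[0] = 1.0
for k in range(1, 4):
    psi[k] = phi1 * psi[k - 1] + (phi2 * psi[k - 2] if k >= 2 else 0.0)
pred_sd = [np.sqrt(sigma2 * np.sum(psi[:h] ** 2)) for h in range(1, 5)]

print(np.round(forecasts, 2))
print(np.round(pred_sd, 2))      # approximately 9.50, 15.97, 20.56, 23.59
```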

2.4 Moving Average (MA) Models

We may also consider processes that contain linear combinations of underlying unobserved shocks, say, represented by white noise series wt. These moving average components generate a series of the form

x_t = w_t − θ_1w_{t−1} − θ_2w_{t−2} − ··· − θ_qw_{t−q},    (2.28)

where q denotes the order of the moving average component and θ_1, θ_2, ..., θ_q are parameters to be estimated. Using the shift notation, the above equation can be written in the form

x_t = θ(B)w_t,    (2.29)

where

θ(B) = 1 − θ_1B − θ_2B² − ··· − θ_qB^q    (2.30)

is another polynomial in the shift operator B. It should be noted that the MA process of order q is a linear process of the form considered earlier in Problem 1.4, with ψ_0 = 1, ψ_1 = −θ_1, ..., ψ_q = −θ_q. This implies that the ACF will be zero for lags larger than q, because terms in the form of the covariance function given in Problem 1.4 of Chapter 1 will all be zero. Specifically, the exact forms are

γ_x(0) = σ_w² (1 + Σ_{k=1}^{q} θ_k²)    (2.31)

for h = 0,

γ_x(h) = σ_w² (−θ_h + Σ_{k=1}^{q−h} θ_{k+h}θ_k)    (2.32)

for h = 1, ..., q − 1, and γ_x(q) = −σ_w²θ_q, with γ_x(h) = 0 for h > q. Hence, we will have

Property P2.3: ACF for an MA Series
For a moving average series of order q, the autocorrelation function (ACF) is zero for lags h > q, i.e., ρ_x(h) = 0 for h > q. Such a result enables us to diagnose the order of a moving average component by examining ρ̂_x(h) and choosing q as the value beyond which the coefficients are essentially zero.

Example 2.8

Consider the varve thicknesses in Figure 1.10, which are described in Problem 1.7 of Chapter 1. Figure 2.4 shows the ACF and PACF of the original log-transformed varve series and of the first differences. The ACF of the original series indicates possible non-stationary behavior and suggests taking a first difference, interpreted here as the percentage yearly change in deposition. The ACF of the first difference shows a clear peak at h = 1 and no other significant peaks, suggesting a first-order moving average. Fitting the first-order moving average model x_t = w_t − θ_1w_{t−1} to this data using the Gauss-Newton procedure described next leads to θ̂_1 = .77 and σ̂_w² = .2358.

Figure 2.4 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the log varve series (top panel) and the first difference (bottom panel), showing a peak in the ACF at lag h = 1.

Fitting the pure moving average term turns into a nonlinear problem, as we can see by noting that either maximum likelihood or regression involves solving (2.28) or (2.29) for w_t and minimizing the sum of the squared errors. Suppose that the roots of θ(B) = 0 are all outside the unit circle; then this is possible by solving π(B)θ(B) = 1, so that, for the vector parameter θ = (θ_1, ..., θ_q)′, we may write

w_t(θ) = π(B)x_t    (2.33)

and minimize

SSE(θ) = Σ_{t=q+1}^{n} w_t²(θ)

as a function of the vector parameter θ. We do not really need to find the operator π(B), but can simply solve (2.33) recursively for w_t, with w_1 = w_2 = ··· = w_q = 0 and

w_t(θ) = x_t + Σ_{k=1}^{q} θ_k w_{t−k}

for t = q + 1, ..., n. It is easy to verify that SSE(θ) will be a nonlinear function of θ_1, θ_2, ..., θ_q. However, note that

w_t(θ) ≈ w_t(θ_0) + (∂w_t/∂θ)′ (θ − θ_0),

where the derivative is evaluated at the previous guess θ_0. Rearranging the above equation leads to

w_t(θ_0) ≈ −(∂w_t/∂θ)′ (θ − θ_0) + w_t(θ),    (2.34)

which is just the regression model (2.2). Hence, we can begin with an initial guess θ_0 = (.1, .1, ..., .1)′, say, and successively minimize SSE(θ) until convergence.

In order to forecast a moving average series, note that

x_{t+h} = w_{t+h} − Σ_{k=1}^{q} θ_k w_{t+h−k}.

The results below (2.24) imply that

x^t_{t+h} = − Σ_{k=h}^{q} θ_k w_{t+h−k},

where the w_t values needed for the above are computed recursively as before. Because of (2.17), it is clear that ψ_0 = 1 and ψ_k = −θ_k, k = 1, 2, ..., q, and these values can be substituted directly into the variance formula (2.27).
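The recursion for w_t(θ) and the squared-error criterion above are straightforward to code. The sketch below assumes NumPy and SciPy; instead of the hand-rolled Gauss-Newton step in (2.34), it hands the residual recursion to scipy.optimize.least_squares, which performs an equivalent iterative nonlinear least squares fit. The data are a simulated MA(1) with θ_1 = .77, echoing the varve fit above rather than using the varve data themselves.

```python
import numpy as np
from scipy.optimize import least_squares

def ma_residuals(theta, x):
    """Recover w_t recursively from x_t = w_t - theta_1 w_{t-1}, conditioning on w_0 = 0."""
    w = np.zeros(len(x))
    for t in range(len(x)):
        w[t] = x[t] + theta[0] * (w[t - 1] if t > 0 else 0.0)
    return w

# Simulated MA(1) series with theta_1 = .77
rng = np.random.default_rng(5)
n = 600
e = rng.normal(0, 1, n)
x = e.copy()
x[1:] -= 0.77 * e[:-1]

fit = least_squares(ma_residuals, x0=[0.1], args=(x,))   # start from the guess .1, as suggested above
w_hat = ma_residuals(fit.x, x)
print("theta_1 hat:  ", round(fit.x[0], 3))
print("sigma_w^2 hat:", round(np.mean(w_hat[1:] ** 2), 3))
```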

2.5 Autoregressive Integrated Moving Average (ARIMA) Models

Now combining the autoregressive and moving average components leads to the autoregressive moving average ARMA(p, q) model, written as

φ(B)x_t = θ(B)w_t,    (2.35)

where the polynomials in B are as defined earlier in (2.15) and (2.30), with p autoregressive coefficients and q moving average coefficients. In difference equation form, this becomes

x_t − Σ_{k=1}^{p} φ_k x_{t−k} = w_t − Σ_{k=1}^{q} θ_k w_{t−k}.    (2.36)

The mixed processes do not satisfy Properties P2.1-P2.3 any more, but they tend to behave in approximately the same way, even for the mixed cases. Estimation and forecasting for such problems are treated in essentially the same manner as for the AR and MA processes. We note that we can formally divide both sides of (2.35) by φ(B), and that the usual representation (2.16) holds when

ψ(B)φ(B) = θ(B).    (2.37)

For forecasting, we determine ψ_1, ψ_2, ... by equating coefficients of B, B², B³, ... in (2.37), as before, assuming that all the roots of φ(B) = 0 are greater than one in absolute value. Similarly, we can always solve for the residuals, say

w_t = x_t − Σ_{k=1}^{p} φ_k x_{t−k} + Σ_{k=1}^{q} θ_k w_{t−k},    (2.38)

to get the terms needed for forecasting and estimation.

Example 2.9 Consider the above mixed process with p = q = 1, i.e., ARMA(1, 1). By (2.36), we may write

x_t = φ_1x_{t−1} + w_t − θ_1w_{t−1}.

Now,

x_{t+1} = φ_1x_t + w_{t+1} − θ_1w_t, so that

x^t_{t+1} = φ_1x_t + 0 − θ_1w_t,

and x^t_{t+h} = φ_1x^t_{t+h−1} for h > 1, leading to very simple forecasts in this case. Equating coefficients of B^k in

(1 − φ_1B)(1 + ψ_1B + ψ_2B² + ···) = 1 − θ_1B

leads to

ψ_k = (φ_1 − θ_1)φ_1^{k−1}

for k = 1, 2, .... Using (2.27) leads to the expression

P^t_{t+h} = σ_w² [1 + (φ_1 − θ_1)² Σ_{k=1}^{h−1} φ_1^{2(k−1)}]
         = σ_w² [1 + (φ_1 − θ_1)² (1 − φ_1^{2(h−1)}) / (1 − φ_1²)]

for the forecast variance.
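A small numerical check of the two expressions, assuming NumPy and arbitrary illustrative parameter values, confirms that summing the squared ψ weights and using the closed geometric-sum form give the same forecast variance.

```python
import numpy as np

phi1, theta1, sigma2 = 0.9, 0.4, 1.0    # illustrative values, not taken from any example

def var_from_psi(h):
    """Forecast variance (2.27) with psi_k = (phi1 - theta1) * phi1**(k-1)."""
    psi = np.concatenate(([1.0], (phi1 - theta1) * phi1 ** np.arange(h - 1)))
    return sigma2 * np.sum(psi ** 2)

def var_closed_form(h):
    return sigma2 * (1 + (phi1 - theta1) ** 2 * (1 - phi1 ** (2 * (h - 1))) / (1 - phi1 ** 2))

for h in (1, 2, 5, 10):
    print(h, round(var_from_psi(h), 4), round(var_closed_form(h), 4))
```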

In the first example of this chapter, it was noted that nonstationary pro- cesses are characterized by a slow decay in the ACF as in Figure 2.3. In many of the cases where slow decay is present, the use of a first order difference

∆xt = xt − xt−1

= (1 − B)x_t

will reduce the nonstationary process x_t to a stationary series ∆x_t. One can check to see whether the slow decay has been eliminated in the ACF of the transformed series. Higher-order differences, ∆^d x_t = ∆∆^{d−1} x_t, are possible, and we call a process whose dth difference is an ARMA series an ARIMA(p, d, q) series, where p is the order of the autoregressive component, d is the order of differencing needed and q is the order of the moving average component. Symbolically, the form is

φ(B)∆^d x_t = θ(B)w_t.    (2.39)

The principles of model selection for ARIMA(p, d, q) series are obtained using the extensions of (2.8)-(2.10) which replace K by K = p + q the total number of ARMA parameters.

2.6 Seasonal ARIMA Models

When autoregressive, differencing, or moving average behavior seems to occur at multiples of some underlying period s, a seasonal ARIMA series may result. The seasonal nonstationarity is characterized by slow decay at multiples of s and can often be eliminated by a seasonal differencing operator of the form

∇_s^D x_t = (1 − B^s)^D x_t.

For example, when we have monthly data, it is reasonable that a yearly phenomenon will induce s = 12, and the ACF will be characterized by slowly decaying spikes at 12, 24, 36, 48, ...; we can obtain a stationary series by transforming with the operator (1 − B^{12})x_t = x_t − x_{t−12}, which is the difference between the current month and the value one year, or 12 months, ago. If the autoregressive or moving average behavior is seasonal at period s, we define formally the operators

Φ(B^s) = 1 − Φ_1B^s − Φ_2B^{2s} − ··· − Φ_P B^{Ps}    (2.40)

and

Θ(B^s) = 1 − Θ_1B^s − Θ_2B^{2s} − ··· − Θ_Q B^{Qs}.    (2.41)

The final form of the ARIMA(p, d, q) × (P, D, Q)_s model is

Φ(B^s)φ(B) ∇_s^D ∆^d x_t = Θ(B^s)θ(B) w_t.    (2.42)

We may also note the properties below corresponding to P2.1-P2.3

Property P2.1': ACF and PACF of a seasonally non-stationary time series
The ACF of a seasonally non-stationary time series decays very slowly at lag multiples s, 2s, 3s, ..., with zeros in between, where s denotes a seasonal period, usually 12. The PACF of a seasonally non-stationary time series tends to have a peak very near unity at lag s.

Property P2.2’: PACF for Seasonal AR Series

The partial autocorrelation function φhh as a function of lag h has nonzero values at s, 2s, 3s, . . . , P s, with zeros in between, and is zero for h > P s, the order of the seasonal autoregressive process. There should be some exponential decay.

Property P2.3': ACF for a Seasonal MA Series
For a seasonal moving average series of order Q, the autocorrelation function (ACF) has nonzero values at s, 2s, 3s, ..., Qs and is zero for h > Qs.

Example 2.10: We illustrate by fitting the monthly birth series from 1948-1979 shown in Figure 2.5. The period encompasses the boom that followed the Second World War, and there is the expected rise, which persists for about 13 years, followed by a decline to around 1974. The series appears to have long-term swings, with seasonal effects superimposed. The long-term swings indicate possible non-stationarity, and we verify that this is the case by checking the ACF and PACF shown in the top panels of Figure 2.6. Note that, by Property P2.1, slow decay of the ACF indicates non-stationarity, and we respond by taking a first difference. The results shown in the second panel of Figure 2.5 indicate that the first difference has eliminated the strong low-frequency swing. The ACF, shown in the second panel from the top in Figure 2.6, shows peaks at 12, 24, 36, 48, ..., with no decay. This behavior implies seasonal non-stationarity, by Property P2.1' above, with s = 12. Taking the seasonal difference of the first difference gives a series that looks stationary and whose ACF and PACF, shown in Figure 2.6, are of the kind we expect for stationary series, with ACF peaks at lags 1 and 12 and a PACF with a substantial peak at 12 and lesser peaks at 24, 36, .... This suggests trying either a first-order moving average term, by Property P2.3, or a first-order

Figure 2.5 Number of live births 1948(1)-1979(1) and residuals from models with a first difference, a first difference and a seasonal difference of order 12, and a fitted ARIMA(0, 1, 1) × (0, 1, 1)_{12} model.

Figure 2.6 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the birth series (top two panels), the first difference (second two panels), a first and seasonal difference, ARIMA(0, 1, 0) × (0, 1, 0)_{12} (third two panels), an ARIMA(0, 1, 0) × (0, 1, 1)_{12} model (fourth two panels) and an ARIMA(0, 1, 1) × (0, 1, 1)_{12} model (last two panels).

seasonal moving average term with s = 12, by Property P2.3' above. We choose to eliminate the largest peak first by applying a first-order seasonal moving average model with s = 12. The ACF and PACF of the residual series from this model, i.e., from ARIMA(0, 1, 0) × (0, 1, 1)_{12}, written as

(1 − B)(1 − B^{12})x_t = (1 − Θ_1B^{12})w_t,

Figure 2.7 A 36-month forecast, 1979(2)-1982(1), for the birth series, with 95% uncertainty limits.

is shown in the fourth panel from the top in Figure 2.6. We note that the peak at lag one is still there, with attending exponential decay in the PACF. This can be eliminated by fitting a first-order moving average term and we consider the model ARIMA(0, 1, 1) × (0, 1, 1)12, written as

(1 − B)(1 − B^{12})x_t = (1 − θ_1B)(1 − Θ_1B^{12})w_t.

The ACF of the residuals from this model is relatively well behaved, with only a small number of peaks near or exceeding the 95% threshold of the test of no correlation. Fitting this final ARIMA(0, 1, 1) × (0, 1, 1)_{12} model leads to the fitted model

(1 − B)(1 − B^{12})x_t = (1 − .4896B)(1 − .6844B^{12})w_t,

with AIC_C = 4.95, R² = .9804² = .961 and p-values of .000 and .000, where R² is computed by saving the predicted values and then plotting them against the observed values using the 2-D plot option. The format in which ASTSA puts out these results is shown below.

ARIMA(0,1,1)x(0,1,1)x12 from U.S. Births
AICc = 4.94684   variance = 51.1906   d.f. = 358   Start values = .1

predictor   coef     st. error   t-ratio    p-value
MA(1)       .4896    .04620      10.5966    .000
SMA(1)      .6844    .04013      17.0541    .000

(D1) (D(12)1) x(t) = (1 -.49B1) (1 -.68B12) w(t)

The ARIMA search in ASTSA leads to the model

(1 − .0578B^{12})(1 − B)(1 − B^{12})x_t = (1 − .4119B − .1515B²)(1 − .8136B^{12})w_t,

with AIC_C = 4.8526, somewhat lower than that of the previous model. The seasonal autoregressive coefficient is not statistically significant, however, and should probably be omitted from the model. The new model becomes

(1 − B)(1 − B^{12})x_t = (1 − .4088B − .1645B²)(1 − .6990B^{12})w_t,

yielding AIC_C = 4.92 and R² = .981² = .962, slightly better than the ARIMA(0, 1, 1) × (0, 1, 1)_{12} model. Evaluating these latter models leads to the conclusion that the extra parameters do not add a practically substantial amount to the predictability. For forecasting, the ARIMA(0, 1, 1) × (0, 1, 1)_{12} model is expanded as

(1 − B)(1 − B^{12})x_t = (1 − θ_1B)(1 − Θ_1B^{12})w_t
(1 − B − B^{12} + B^{13})x_t = (1 − θ_1B − Θ_1B^{12} + θ_1Θ_1B^{13})w_t

so that

xt − xt−1 − xt−12 + xt−13 = wt − θ1wt−1 − Θ1wt−12 + θ1Θ1wt−13

or

xt = xt−1 + xt−12 − xt−13 + wt − θ1wt−1 − Θ1wt−12 + θ1Θ1wt−13

The forecast is

x^t_{t+1} = x_t + x_{t−11} − x_{t−12} − θ_1w_t − Θ_1w_{t−11} + θ_1Θ_1w_{t−12},

x^t_{t+2} = x^t_{t+1} + x_{t−10} − x_{t−11} − Θ_1w_{t−10} + θ_1Θ_1w_{t−11}.

Continuing in the same manner, we obtain

x^t_{t+12} = x^t_{t+11} + x_t − x_{t−1} − Θ_1w_t + θ_1Θ_1w_{t−1}

for the 12-month forecast.

The forecast limits are quite variable, with a standard error that rises to 20% of the mean by the end of the forecast period. The plot shows that the general trend is upward, rising from about 250,000 to about 290,000 births per month. One could check the actual records from the years 1979-1982. The direction is not certain because of the large uncertainty. One could compute the probability

P(B_{t+47} ≤ 250,000) = Φ((250 − 290)/60) = .25,

so there is a 75% chance of increase. A website where the forecasts can be compared on a yearly basis is http://www.cdc.gov/nccdphp/drh/pdf/nvs/nvs48 tb1.pdf
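Readers without ASTSA can fit the same ARIMA(0, 1, 1) × (0, 1, 1)_{12} structure in other software. The sketch below assumes the Python statsmodels package (its SARIMAX class) and, since the birth data are not distributed with this text, runs on a simulated monthly series with trend and an annual cycle; the printed parameters and forecast limits therefore illustrate the workflow rather than the values reported above.

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Simulated monthly stand-in for the birth series: trend + annual cycle + noise.
rng = np.random.default_rng(6)
n = 372
t = np.arange(n)
births = 300 + 0.1 * t + 20 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 5, n)

model = SARIMAX(births, order=(0, 1, 1), seasonal_order=(0, 1, 1, 12))
result = model.fit(disp=False)
print(result.params)                      # MA(1), seasonal MA(1) and variance estimates

forecast = result.get_forecast(steps=36)  # 36-month forecast, as in Figure 2.7
print(forecast.predicted_mean[:6])        # first six forecast values
print(forecast.conf_int()[:6])            # 95% limits
```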

Example 2.11: Figure 2.8 shows the autocorrelation function of the log-transformed J&J earnings series that is plotted in Figure 1.4, and we note the slow decay indicating the nonstationarity that was already obvious in the Chapter 1 discussion. We may also compare the ACF with that of a random walk, shown in Figure 2.3, and note the close similarity. The partial autocorrelation function is very high at lag one, which, under ordinary circumstances, would indicate a first-order autoregressive AR(1) model, except that, in this case, the value is close to unity, indicating a root close to the unit circle. The only question would be whether differencing or detrending is the better transformation to stationarity. Following the Box-Jenkins tradition, differencing leads to the ACF and PACF shown in the second panel, and no simple structure is apparent. To force a next step, we interpret the peaks at 4, 8, 12, 16, ... as contributing to a possible seasonal autoregressive term, leading to a possible ARIMA(0, 1, 0) × (1, 0, 0)_4, and we simply fit this model and look at the ACF and PACF of the residuals, shown in the third two panels. The fit improves somewhat, with significant peaks still remaining at lag 1 in both the ACF and PACF. The peak in the ACF seems more isolated, and there remains some exponentially decaying behavior in the PACF, so we try a model with a first-order moving average. The bottom two panels show the ACF and PACF of the resulting ARIMA(0, 1, 1) × (1, 0, 0)_4, and we note only relatively minor excursions above and below the 95% intervals computed under the white noise assumption. The final model suggested is (y_t = log x_t)

(1 − Φ_1B^4)(1 − B)y_t = (1 − θ_1B)w_t,

where Φ̂_1 = .820 (.058), θ̂_1 = .508 (.098) and σ̂_w² = .0086. The model can be written in forecast form as

y_t = y_{t−1} + Φ_1(y_{t−4} − y_{t−5}) + w_t − θ_1w_{t−1}.

To forecast the original series for, say 4 quarters, we compute the forecast limits for yt = log xt and then exponentiate, i.e.

x^t_{t+h} = exp{y^t_{t+h}}.

We note the large limits on the forecast values in Figure 2.9 and mention that the situation can be improved by the regression approach in the next section

2.7 Regression Models With Correlated Errors

The standard method for dealing with correlated errors e_t in the regression model

y_t = β′z_t + e_t    (2.2)

is to try to transform the errors e_t into uncorrelated ones and then apply the standard least squares approach to the transformed observations. For example, let P be an n × n matrix that transforms the vector e = (e_1, ..., e_n)′ into a set of independent, identically distributed variables with variance σ². Then, transform the matrix version (2.4) to

Py = PZβ + Pe

and proceed as before. Of course, the major problem is deciding what to choose for P; but in the time series case, happily, there is a reasonable solution, based again on time series ARMA models. Suppose that we can find a reasonable ARMA model for the residuals, for example the ARIMA(p, 0, 0) model

e_t = Σ_{k=1}^{p} φ_k e_{t−k} + w_t,

which defines a linear transformation of the correlated e_t to a sequence of uncorrelated w_t. We can ignore the problems near the beginning of the series by starting at t = p. In the ARMA notation, using the backshift operator B, we may write

φ(B)e_t = w_t,    (2.43)

where

φ(B) = 1 − Σ_{k=1}^{p} φ_k B^k,    (2.44)

and applying the operator to both sides of (2.2) leads to the model

φ(B)y_t = β′φ(B)z_t + w_t,    (2.45)

Figure 2.8 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the log J&J earnings series (top two panels), the first difference (second two panels), the ARIMA(0, 1, 0) × (1, 0, 0)_4 residuals (third two panels) and the ARIMA(0, 1, 1) × (1, 0, 0)_4 residuals (bottom two panels).

Figure 2.9 Observed and predicted values for the Johnson and Johnson earnings series, with forecast values for the next four quarters, using the ARIMA(0, 1, 1) × (1, 0, 0)_4 model for the log-transformed data.

where the w_t now satisfy the independence assumption. Doing ordinary least squares on the transformed model is the same as doing weighted least squares on the untransformed model. The only problem is that we do not know the values of the coefficients φ_k, k = 1, ..., p, in the transformation (2.43). However, if we knew the residuals e_t, it would be easy to estimate the coefficients, since (2.43) can be written in the form

e_t = φ′e_{t−1} + w_t,    (2.46)

which is exactly the usual regression model (2.2), with φ = (φ_1, ..., φ_p)′ replacing β and e_{t−1} = (e_{t−1}, e_{t−2}, ..., e_{t−p})′ replacing z_t. The above comments suggest a general approach, known as the Cochrane-Orcutt procedure (Cochrane and Orcutt, 1949), for dealing with the problem of correlated errors in the time series context; a code sketch of the iteration is given after the steps below.

1. Begin by fitting the original regression model (2.2) by least squares, obtaining β̂ and the residuals ê_t = y_t − β̂′z_t.

2. Fit an ARMA to the estimated residuals, say

φ(B)ê_t = θ(B)w_t.

3. Apply the ARMA transformation found to both sides of the regression equation (2.2)’ to obtain

(φ(B)/θ(B)) y_t = β′(φ(B)/θ(B)) z_t + w_t.

4. Run an ordinary least squares regression on the transformed values to obtain the new β̂.

5. Return to step 2 if desired. Often, one iteration is enough to develop the estimators under a reasonable correlation structure. In general, the Cochrane-Orcutt procedure converges to the maximum likelihood or weighted least squares estimators.
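The iteration is short enough to code directly. The sketch below assumes NumPy and specializes to AR(1) errors; it fits the AR coefficient to the residuals by a lag-one regression, transforms both sides, and refits, in the spirit of steps 1-4. The data are simulated, standing in for any real regression with correlated errors.

```python
import numpy as np

def cochrane_orcutt(y, Z, n_iter=2):
    """Cochrane-Orcutt iteration with AR(1) errors: returns (beta_hat, phi_hat)."""
    beta = np.linalg.solve(Z.T @ Z, Z.T @ y)                  # step 1: ordinary least squares
    phi = 0.0
    for _ in range(n_iter):
        e = y - Z @ beta                                      # residuals
        phi = np.dot(e[1:], e[:-1]) / np.dot(e[:-1], e[:-1])  # step 2: AR(1) coefficient
        y_star = y[1:] - phi * y[:-1]                         # step 3: transform both sides
        Z_star = Z[1:] - phi * Z[:-1]
        beta = np.linalg.solve(Z_star.T @ Z_star, Z_star.T @ y_star)   # step 4: refit
    return beta, phi

# Simulated regression with a linear trend and AR(1) errors (phi = .7).
rng = np.random.default_rng(7)
n = 200
t = np.arange(1, n + 1)
e = np.zeros(n)
for i in range(1, n):
    e[i] = 0.7 * e[i - 1] + rng.normal()
y = 1.0 + 0.05 * t + e
Z = np.column_stack([np.ones(n), t])
print(cochrane_orcutt(y, Z))
```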

Figure 2.10 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the detrended log J&J earnings series (top two panels) and for the residuals from the fitted ARIMA(0, 0, 0) × (1, 0, 0)_4 error model (bottom two panels).

Figure 2.11 Observed and predicted values for the Johnson and Johnson earnings series, with forecast values for the next four quarters, using the correlated regression model for the log-transformed data.

Example 2.12:

We might consider an alternative approach to treating the Johnson and Johnson Earnings Series, assuming that

yt = log xt = β1 + β2t + et

In order to analyze the data with this approach, we first fit the model above, obtaining β̂_1 = −.6678 (.0349) and β̂_2 = .0417 (.0071). The residuals ê_t = y_t − β̂_1 − β̂_2 t are easily computed, and their ACF and PACF are shown in the top two panels of Figure 2.10. Note that the ACF and PACF suggest that a seasonal AR series will fit well, and we show the ACF and PACF of the residuals from that fit in the bottom panels of Figure 2.10. The seasonal AR model is of the form

e_t = Φ_1e_{t−4} + w_t,

and we obtain Φ̂_1 = .7614 (.0639), with σ̂_w² = .00779. Using these values, we transform y_t to

y_t − Φ̂_1y_{t−4} = β_1(1 − Φ̂_1) + β_2[t − Φ̂_1(t − 4)] + w_t,

using the estimated value Φ̂_1 = .7614. With this transformed regression, we obtain the new estimators β̂_1 = −.7488 (.1105) and β̂_2 = .0424 (.0018). The new estimators have the advantage of being unbiased and having a smaller generalized variance.

To forecast, we consider the original model with the newly estimated β̂_1 and β̂_2. We obtain the approximate forecast

y^t_{t+h} = β̂_1 + β̂_2(t + h) + ê^t_{t+h}

for the log-transformed series, along with upper and lower limits depending on an estimated variance that only incorporates the prediction variance of ê^t_{t+h}, considering the trend and seasonal autoregressive parameters as fixed. The narrower upper and lower limits shown in Figure 2.11 are mainly a reflection of a slightly better fit to the residuals and the ability of the trend model to take care of the nonstationarity.

2.8 Chapter 2 Problems

2.1 Consider the regression model

yt = β1yt−1 + et

2 where et is white noise with zero-mean and variance σe . Assume that we observe y1, y2, . . . , yn and consider the model above for t = 2, 3, . . . , n. Show that the least squares estimator of β1 is

β̂_1 = Σ_{t=2}^{n} y_t y_{t−1} / Σ_{t=2}^{n} y_{t−1}².

If we pretend that the y_{t−1} are fixed, show that

var{β̂_1} = σ_e² / Σ_{t=2}^{n} y_{t−1}².

Relate your answer to a method for fitting a first-order AR model to the data y_t.

2.2 Consider the autoregressive model (2.13) for p = 1, i.e.

xt − φ1xt−1 = wt

(a) Show that the condition on the roots given below (2.15) implies |φ_1| < 1.

(b) Show that

x_t = Σ_{k=0}^{∞} φ_1^k w_{t−k}

is the form of (2.16) in this case.

(c) Show that E[w_t x_t] = σ_w² and E[w_t x_{t−1}] = 0, so that future errors are uncorrelated with past data.

2.3 The autocovariance and autocorrelation functions for AR processes are often derived from the Yule-Walker equations, obtained by multiplying both sides of the defining equation, successively by xt, xt−1, xt−2,..., using the result (2.16).

(a) Derive the Yule-Walker equations

γ_x(h) − φ_1γ_x(h − 1) = σ_w²  for h = 0,
γ_x(h) − φ_1γ_x(h − 1) = 0    for h > 0.

(b) Use the Yule-Walker equations to show that

ρ_x(h) = φ_1^{|h|}

for the first-order AR.

2.4 For an ARMA series we define the optimal forecast based on xt, xt−1,... as the conditional expectation

x^t_{t+h} = E[x_{t+h} | x_t, x_{t−1}, ...]

for h = 1, 2, 3,....

(a) Show, for the general ARMA model, that

E[w_{t+h} | x_t, x_{t−1}, ...] = 0  for h > 0,  and  = w_{t+h}  for h ≤ 0.

(b) For the first-order AR model, show that the optimal forecast is

x^t_{t+h} = φ_1x_t  for h = 1,  and  x^t_{t+h} = φ_1x^t_{t+h−1}  for h > 1.

(c) Show that E[(x_{t+1} − x^t_{t+1})²] = σ_w² is the prediction error variance of the one-step forecast.

2.5 Suppose we have the simple linear trend model

y_t = β_1t + x_t,  t = 1, 2, ..., n,  where x_t = φ_1x_{t−1} + w_t.

Give the exact form of the equations that you would use for estimating β_1, φ_1 and σ_w² using the Cochrane-Orcutt procedure of Section 2.7.

Figure 2.12 Los Angeles cardiovascular mortality, temperature and particulate levels (6-day increments).

2.6 Consider the file la regr.dat, in the syllabus, which contains cardiovascular mortality, temperature values and particulate levels over 6-day periods from Los Angeles County (1970-1979). The file also contains two dummy variables for regression purposes: a column of ones for the constant term and a time index. The order is as follows: Column 1: 508 cardiovascular mortality values (6-day increments), Column 2: 508 ones, Column 3: the integers 1, 2, ..., 508, Column 4: temperature in degrees F, and Column 5: particulate levels. A reference is Shumway et al (1988). The point here is to examine possible relations between the temperature and mortality in the presence of a time trend in cardiovascular mortality.

(a) Use scatter diagrams to argue that particulate level may be linearly related to mortality and that temperature has either a linear or quadratic relation. Check for lagged relations using the cross correlation function.

(b) Adjust temperature for its mean value, using the Scale option and fit the model

M_t = β_0 + β_1(T_t − T̄) + β_2(T_t − T̄)² + β_3P_t + e_t,

where M_t, T_t and P_t denote the mortality, temperature and particulate pollution series. You can use Columns 2 and 3 as inputs for the trend terms and run the regression using the 'without constant' option. Note that you need to transform temperature first. Retain the residuals for the next part of the problem.

(c) Plot the residuals and compute the autocorrelation (ACF) and partial autocorrelation (PACF) functions. Do the residuals appear to be white? Suggest an ARIMA model for the residuals and fit it to the residuals. The simple ARIMA(2, 0, 0) model is a good compromise.

(d) Apply the ARIMA model obtained in part (c) to all of the input variables and to cardiovascular mortality using the ARIMA transformation option. Retain the forecast values for the transformed mortality, say m̂_t = M_t − φ̂_1M_{t−1} − φ̂_2M_{t−2}.

2.7 Generate 10 realizations (n = 200 points each) of a series from an ARIMA(1, 0, 1) model with φ_1 = .90, θ_1 = .20 and σ² = .25. Fit the ARIMA model to each of the series and compare the estimators to the true values by computing the average of the estimators and their standard deviations.

2.8 Consider the bivariate time series record containing monthly U.S. Pro- duction as measured monthly by the Federal Reserve Board Production Index and unemployment as given in the file frb.asd. The file contains n = 372 monthly values for each series. Before you begin, be sure to plot the series. Fit a seasonal ARIMA model of your choice to the Federal Reserve Production Index. Develop a 12 month forecast using the model.

2.9 The file labeled clim-hyd.asd has 454 months of measured values for the climatic variables Air Temperature, Dew Point, Cloud Cover, Wind Speed, Precipitation, and Inflow at Shasta Lake. We would like to look at possible relations among the weather factors and between the weather factors and the inflow to Shasta Lake.

(a) Fit the ARIMA(0, 0, 0) × (0, 1, 1)_{12} model to the transformed precipitation P_t = √p_t and to the transformed inflow log i_t. Save the residuals for transformed precipitation for use in part (b).

(b) Apply the ARIMA model fitted in part (a) for transformed precipitation to the flow series. Compute the cross correlation between the flow residuals based on the precipitation ARIMA model and the precipitation residuals based on the precipitation model, and interpret.

Figure 2.13 Federal Reserve Board Production Index and monthly unemployment for Problem 2.8.

Use the coefficients from the ARIMA model in the transform option in the main menu to construct the transformed flow residuals. Suggest two possible models for relating the two series. More analysis can be done using the transfer function models of Chapter 4.

2.9 Chapter 2 ASTSA Notes

8. Regression Analysis →Multiple Regression Model (without constant):

yt = β1zt1 + β2zt2 + ... + βqztq + et

Model (with constant):

yt = β0 + β1zt1 + β2zt2 + ... + βqztq + et

Series (dependent): y_t
No. of independent series: q
series 1: z_{t−h1,1}
lag: h1 (often zero)
···
series q: z_{t−hq,q}
lag: hq (often zero)
forecasts: 0

constant(y/n):

selector(AIC,AICc, BIC, FPEL, AICL): AICc

Save →Residuals Save →Predicted

9. Fit ARIMA(p, d, q) × (P,D,Q)s Time Domain →ARIMA

Series:

p: AR order

d: Difference

q: MA order

P: SAR order

D: Seasonal Difference

Q: SMA order

season: s

forecasts: h

use .1 guess(y/n): y

selector(AIC,AICc, BIC, FPEL, AICL): AICc

Save →Residuals Save →Predicted

10. ARIMA Transformation Transform →Transform →ARIMA Residual

Series:

p: AR order

d: Difference

q: MA order

P: SAR order

D: Seasonal Difference

Q: SMA order

season: s