CHAPTER 2

Univariate Time Series Models

2.1 Least Squares Regression

We begin our discussion of univariate and multivariate methods by considering the idea of a simple regression model, which we have met before in other contexts. All of the multivariate methods follow, in some sense, from the ideas involved in simple univariate regression. In this case, we assume that there is some collection of fixed known functions of time, say z_{t1}, z_{t2}, ..., z_{tq}, that are influencing our output y_t, which we know to be random. We express this relation between the inputs and outputs as

y_t = β_1 z_{t1} + β_2 z_{t2} + ··· + β_q z_{tq} + e_t    (2.1)

at the time points t = 1, 2, ..., n, where β_1, ..., β_q are unknown fixed regression coefficients and e_t is a random error or noise term, assumed to be white noise; this means that the errors have zero means, equal variances σ² and are independent. We traditionally assume also that the white noise series, e_t, is Gaussian or normally distributed.

Example 2.1: We have assumed implicitly that the model

yt = β1 + β2t + et

is reasonable in our discussion of detrending in Chapter 1. This is in the form of the regression model (2.1) when one makes the identification z_{t1} = 1, z_{t2} = t. The problem in detrending is to estimate the coefficients β_1 and β_2 in the above equation and to detrend by constructing the estimated residual series ê_t. We discuss the precise way in which this is accomplished below.

The linear regression model described by Equation (2.1) can be conveniently written in slightly more general matrix notation by defining the column vectors z_t = (z_{t1}, ..., z_{tq})′ and β = (β_1, ..., β_q)′, so that we write (2.1) in the alternate form

y_t = β′z_t + e_t.    (2.2)

To find estimators for β and σ², it is natural to determine the coefficient vector β minimizing Σ e_t² with respect to β. This yields the least squares or maximum likelihood estimator β̂ and the maximum likelihood estimator for σ², which is proportional to the unbiased estimator

σ̂² = (1/(n − q)) Σ_{t=1}^{n} (y_t − β̂′z_t)².    (2.3)

An alternate way of writing the model (2.2) is as

y = Zβ + e,    (2.4)

where Z = (z_1, z_2, ..., z_n)′ is the n × q matrix composed of the values of the input variables at the observed time points, y = (y_1, y_2, ..., y_n)′ is the vector of observed outputs, and the errors are stacked in the vector e = (e_1, e_2, ..., e_n)′. The estimator β̂ is the solution to the normal equations Z′Zβ̂ = Z′y. You need not be concerned with how the above equation is solved in practice, as all computer packages have efficient software for inverting the q × q matrix Z′Z to obtain

β̂ = (Z′Z)^{−1}Z′y.    (2.5)

An important quantity that all software produces is a measure of uncertainty for the estimated regression coefficients, say

ĉov(β̂) = σ̂² (Z′Z)^{−1}.    (2.6)

If c_{ij} denotes an element of C = (Z′Z)^{−1}, then cov(β̂_i, β̂_j) = σ² c_{ij}, and a 100(1 − α)% confidence interval for β_i is

β̂_i ± t_{n−q}(α/2) σ̂ √c_{ii},    (2.7)

where t_{df}(α/2) denotes the upper 100(α/2)% point of a t distribution with df degrees of freedom.
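The computations in (2.3) and (2.5)-(2.7) are easy to reproduce numerically. The sketch below assumes NumPy and SciPy are available and uses a simulated trend-plus-noise series as a stand-in for any real data set, so none of the numbers it prints correspond to examples in this chapter.

```python
import numpy as np
from scipy import stats

# Simulated stand-in data: y_t = 2 + 0.5 t + white noise, so Z has columns (1, t).
rng = np.random.default_rng(0)
n = 100
t = np.arange(1, n + 1)
Z = np.column_stack([np.ones(n), t])            # n x q design matrix
y = 2.0 + 0.5 * t + rng.normal(0, 1, n)

q = Z.shape[1]
beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)    # (2.5): solve the normal equations
resid = y - Z @ beta_hat
sigma2_hat = resid @ resid / (n - q)            # (2.3): unbiased error variance
cov_beta = sigma2_hat * np.linalg.inv(Z.T @ Z)  # (2.6)

# 95% confidence intervals for the coefficients, as in (2.7)
tcrit = stats.t.ppf(0.975, df=n - q)
for i in range(q):
    se = np.sqrt(cov_beta[i, i])
    print(f"beta[{i}] = {beta_hat[i]:.4f} +/- {tcrit * se:.4f}")
```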

Example 2.2: Consider estimating the possible global warming trend alluded to in Section 1.1.2. The global temperature series, shown previously in Figure 1.3, suggests the possibility of a gradually increasing temperature over the 123-year period covered by the land-based series. If we fit the model in Example 2.1, replacing t by t/100 to convert to a 100-year base, so that the increase will be in degrees per 100 years, we obtain β̂_1 = 38.72 and β̂_2 = .9501 using (2.5). The error variance, from (2.3), is .0752, with q = 2 and n = 123. Then (2.6) yields

ĉov(β̂_1, β̂_2) = [  1.8272   −.0941 ]
                 [  −.0941    .0048 ],

leading to an estimated standard error for the slope of √.0048 = .0696. The value t_{121}(.025), with n − q = 123 − 2 = 121 degrees of freedom, is about 1.98, leading to a narrow confidence interval of .95 ± .138 for the slope and hence to a confidence interval on the one-hundred-year increase of about .81 to 1.09 degrees. We would conclude from this analysis that there is a substantial increase in global temperature, amounting to roughly one degree F per 100 years.

Figure 2.1 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the detrended (top panel) and differenced (bottom panel) global temperature series.

If the model is reasonable, the residuals ê_t = y_t − β̂_1 − β̂_2 t should be essentially independent and identically distributed, with no correlation evident. The plot that we have made in Figure 1.3 of the detrended global temperature series shows that this is probably not the case, because of the long low-frequency swings in the observed residuals. However, the differenced series, also shown in Figure 1.3 (second panel), appears to be more independent, suggesting that perhaps the apparent global warming is more consistent with a long-term swing in an underlying random walk than with a fixed 100-year trend. If we check the autocorrelation function of the regression residuals, shown here in Figure 2.1, it is clear that the significant values at higher lags imply that there is significant correlation in the residuals. Such correlation can be important, since the estimated standard errors of the coefficients computed under the assumption that the least squares residuals are uncorrelated are often too small. We can partially repair the damage caused by the correlated residuals by looking at a model with correlated errors. The procedure and techniques for dealing with correlated errors are based on the autoregressive moving average (ARMA) models to be considered in the next sections. Another method of reducing correlation is to apply a first difference ∆x_t = x_t − x_{t−1} to the global temperature series. The ACF of the differenced series, also shown in Figure 2.1, seems to have lower correlations at the higher lags. Figure 1.3 shows qualitatively that this transformation also eliminates the trend in the original series.

Since we have again made some rather arbitrary-looking specifications for the configuration of explanatory variables in the above regression examples, the reader may wonder how to select among various plausible models. We mention two criteria that reward reducing the squared error and penalize additional parameters: the Akaike Information Criterion

AIC(K) = log σ̂² + 2K/n    (2.8)

and the Schwarz Information Criterion

SIC(K) = log σ̂² + (K log n)/n,    (2.9)

(Schwarz, 1978), where K is the number of parameters fitted (exclusive of variance parameters) and σ̂² is the maximum likelihood estimator for the variance. The latter is sometimes termed the Bayesian Information Criterion, BIC, and will often yield models with fewer parameters than the other selection methods. A modification of AIC(K) that is particularly well suited for small samples was suggested by Hurvich and Tsai (1989). This is the corrected AIC, given by

AIC_C(K) = log σ̂² + (n + K)/(n − K − 2).    (2.10)

The rule for all three measures above is to choose the value of K leading to the smallest value of AIC(K), SIC(K) or AIC_C(K). We will give an example later comparing the above simple least squares model with a model in which the errors have a time series correlation structure.

The organization of this chapter is patterned after the landmark approach to developing models for time series data pioneered by Box and Jenkins (see Box et al, 1994). This approach assumes that there will be a representation of time series data in terms of a difference equation that relates the current value to its past. Such models should be flexible enough to include non-stationary realizations like the random walk given above, as well as seasonal behavior, where the current value is related to past values at multiples of an underlying season; a common one might be multiples of 12 months (1 year) for monthly data. The models are constructed from difference equations driven by random input shocks and are labeled, in the most general formulation, as ARIMA, i.e., AutoRegressive Integrated Moving Average, processes. The analogies with differential equations, which model many physical processes, are obvious.

For clarity, we develop the separate components of the model sequentially, considering the integrated, autoregressive and moving average parts in that order, followed by the seasonal modification. The Box-Jenkins approach suggests three steps in a procedure summarized as identification, estimation and forecasting. Identification combines the ACF and PACF as diagnostics with the versions of the AIC given above to find a parsimonious (simple) model for the data. Estimation of the parameters in the model is the next step. Statistical techniques based on maximum likelihood and least squares are paramount for this stage and will only be sketched in this course. Finally, forecasting of the time series based on the estimated parameters, with sensible estimates of uncertainty, is the bottom line for any assumed model.
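Since the criteria (2.8)-(2.10) are simple functions of the residual variance, the sample size and the parameter count, they are easy to compute once a model has been fitted. A minimal helper, assuming NumPy and that the variance supplied is the maximum likelihood estimate (residual sum of squares divided by n):

```python
import numpy as np

def info_criteria(sigma2_mle, n, K):
    """AIC, SIC (BIC) and corrected AICc, as defined in (2.8)-(2.10).

    sigma2_mle : maximum likelihood variance estimate (SSE / n)
    n          : number of observations used in the fit
    K          : number of fitted parameters, excluding the variance
    """
    aic = np.log(sigma2_mle) + 2 * K / n
    sic = np.log(sigma2_mle) + K * np.log(n) / n
    aicc = np.log(sigma2_mle) + (n + K) / (n - K - 2)
    return aic, sic, aicc
```

Comparing candidate models then amounts to calling this helper for each fit and keeping the model with the smallest value.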

2.2 Integrated (I) Models

We begin our study of time correlation by mentioning a simple model that will introduce strong correlations over time. This is the random walk model which defines the current value of the time series as just the immediately preceding value with additive noise. The model forms the basis, for example, of the random walk theory of stock price behavior. In this model we define

xt = xt−1 + wt, (2.11)

where w_t is a white noise series with zero mean and variance σ². Figure 2.2 shows a typical realization of such a series, and we observe that it bears a passing resemblance to the global temperature series. Appealing to (2.11), the best prediction of the current value would be expected to be given by its immediately preceding value. The model is, in a sense, unsatisfactory, because one would think that better results would be possible by a more efficient use of the past. The ACF of the original series, shown in Figure 2.3, exhibits a slow decay as lags increase. In order to model such a series without knowing that it is necessarily generated by (2.11), one might try looking at a first difference and comparing the result to a white noise or completely independent process. It is

Figure 2.2 A typical realization of the random walk series (top panel) and the first difference of the series (bottom panel).

clear from (2.11) that the first difference would be ∆x_t = x_t − x_{t−1} = w_t, which is just white noise. The ACF of the differenced process, in this case, would be expected to be zero at all lags h ≠ 0, and the sample ACF should reflect this behavior. The first difference of the random walk is also shown in Figure 2.2, and we note that it appears to be much more random. Its sample ACF, shown in Figure 2.3, reflects this predicted behavior, with no significant values at lags other than zero. It is clear that (2.11) is a reasonable model for this data. The original series is nonstationary, with an autocorrelation function that depends on time, of the form

ρ(x_{t+h}, x_t) = √( t/(t+h) )  for h ≥ 0,   and   ρ(x_{t+h}, x_t) = √( (t+h)/t )  for h < 0.

Figure 2.3 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the random walk (top panel) and the first difference (bottom panel) series.
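The behavior summarized in Figures 2.2 and 2.3 is easy to reproduce by simulation. The sketch below, assuming NumPy, generates a random walk via (2.11), differences it, and computes sample ACF values; the walk's ACF stays near one at small lags while the difference's ACF is near zero away from lag 0.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
w = rng.normal(0, 1, n)
x = np.cumsum(w)                 # random walk: x_t = x_{t-1} + w_t
dx = np.diff(x)                  # first difference recovers the white noise w_t

def sample_acf(z, max_lag):
    """Sample autocorrelations of z at lags 0, 1, ..., max_lag."""
    z = z - z.mean()
    c0 = np.dot(z, z) / len(z)
    return np.array([np.dot(z[:len(z) - h], z[h:]) / (len(z) * c0)
                     for h in range(max_lag + 1)])

print("ACF of the walk, lags 0-5:      ", np.round(sample_acf(x, 5), 2))
print("ACF of the difference, lags 0-5:", np.round(sample_acf(dx, 5), 2))
```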

The above example, using a difference transformation to make a random walk stationary, shows a very particular case of the model identification pro- cedure advocated by Box et al (1994). Namely, we seek a linearly filtered transformation of the original series, based strictly on the past values, that will reduce it to completely random white noise. This gives a model that enables prediction to be done with a residual noise that satisfies the usual statistical assumptions about model error. We will introduce, in the following discussion, more general versions of this simple model that are useful for modeling and forecasting series with observations that are correlated in time. The notation and terminology were introduced in the landmark work by Box and Jenkins (1970) (see Box et al, 1994). A requirement for the ARMA model of Box and Jenkins is that the underlying process be stationary. Clearly the first difference of the random walk is stationary but the ACF of the first difference shows relatively little dependence on the past, meaning that the differenced process is not predictable in terms of its past behavior. To introduce a notation that has advantages for treating more general mod- els, define the backshift operator B as the result of shifting the series back by one time unit, i.e.

Bx_t = x_{t−1},    (2.12)

and, applying successively higher powers, B^k x_t = x_{t−k}. The operator has many of the usual algebraic properties and allows, for example, writing the random walk model (2.11) as (1 − B)x_t = w_t. Note that the difference operator discussed previously in Section 1.2.2 is just ∇ = 1 − B. Identifying nonstationarity is an important first step in the Box-Jenkins procedure. From the above discussion, we note that the ACF of a nonstationary process will tend to decay rather slowly as a function of lag h. For example, a straight line would be perfectly correlated, regardless of lag. Based on this observation, we mention the following property, which aids in identifying non-stationarity.

Property P2.1: ACF and PACF of a non-stationary time series The ACF of a non-stationary time series decays very slowly as a function of lag h. The PACF of a non-stationary time series tends to have a peak very near unity at lag 1, with other values less than the significance level.

2.3 Autoregressive (AR) Models

Now, extending the notions above to more general linear combinations of past values might suggest writing

x_t = φ_1x_{t−1} + φ_2x_{t−2} + ··· + φ_px_{t−p} + w_t    (2.13)

as a function of p past values and an additive noise component w_t. The model given by (2.13) is called an autoregressive model of order p, since it is assumed that one needs p past values to predict x_t. The coefficients φ_1, φ_2, ..., φ_p are autoregressive coefficients, chosen to produce a good fit between the observed x_t and its prediction based on x_{t−1}, x_{t−2}, ..., x_{t−p}. It is convenient to rewrite (2.13), using the backshift operator, as

φ(B)x_t = w_t,    (2.14)

where

φ(B) = 1 − φ_1B − φ_2B² − ··· − φ_pB^p    (2.15)

is a polynomial in B with roots (solutions of φ(B) = 0) outside the unit circle, i.e., |B_k| > 1 for each root B_k. This restriction is necessary for expressing the solution x_t of (2.14) in terms of present and past values of w_t. That solution has the form

x_t = ψ(B)w_t,    (2.16)

where

ψ(B) = Σ_{k=0}^{∞} ψ_k B^k    (2.17)

is an infinite polynomial (ψ_0 = 1), with coefficients determined by equating coefficients of B in

ψ(B)φ(B) = 1.    (2.18)

Equation (2.16) can be obtained formally by choosing ψ(B) to satisfy (2.18) and multiplying both sides of (2.14) by ψ(B). It is clear that the random walk has root B_1 = 1, which does not satisfy the restriction, so the process is nonstationary.

Example 2.3 Suppose that we have an autoregressive model (2.13) with p = 1, i.e., x_t − φ_1x_{t−1} = (1 − φ_1B)x_t = w_t. Then (2.18) becomes

(1 + ψ_1B + ψ_2B² + ···)(1 − φ_1B) = 1.

Equating coefficients of B implies that ψ_1 − φ_1 = 0, or ψ_1 = φ_1. For B², we would get ψ_2 − φ_1ψ_1 = 0, or ψ_2 = φ_1². Continuing, we obtain ψ_k = φ_1^k, and the representation is

ψ(B) = 1 + Σ_{k=1}^{∞} φ_1^k B^k,

and we have

x_t = Σ_{k=0}^{∞} φ_1^k w_{t−k}.    (2.19)

The representation (2.16) is fundamental for developing approximate forecasts and also exhibits the series as a linear process of the form considered in Problem 1.4.
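The coefficient matching in (2.18) can also be carried out numerically for any AR polynomial. The sketch below, assuming NumPy, implements the general recursion ψ_k = Σ_j φ_j ψ_{k−j} and checks it against the closed form ψ_k = φ_1^k derived in the example above.

```python
import numpy as np

def ar_psi_weights(phi, n_weights):
    """psi weights from psi(B) phi(B) = 1, where phi = [phi_1, ..., phi_p]."""
    p = len(phi)
    psi = np.zeros(n_weights + 1)
    psi[0] = 1.0
    for k in range(1, n_weights + 1):
        psi[k] = sum(phi[j] * psi[k - 1 - j] for j in range(min(p, k)))
    return psi

phi1 = 0.8
print(ar_psi_weights([phi1], 5))       # [1, .8, .64, .512, .4096, .32768]
print(phi1 ** np.arange(6))            # agrees with psi_k = phi_1^k
```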

For data involving such autoregressive (AR) models as defined above, the main selection problems are deciding that the autoregressive structure is appropriate and then determining the value of p for the model. The ACF of the process is a potential aid for determining the order of the process, as are the model selection measures (2.8)-(2.10). To determine the ACF of the pth order AR in (2.13), write the equation as

x_t − Σ_{k=1}^{p} φ_k x_{t−k} = w_t

and multiply both sides by x_{t−h}, h = 0, 1, 2, .... Assuming that the mean E(x_t) = 0, and using the definition of the autocovariance function (1.2), leads to the equation

E[(x_t − Σ_{k=1}^{p} φ_k x_{t−k}) x_{t−h}] = E[w_t x_{t−h}].

The left-hand side immediately becomes

γ_x(h) − Σ_{k=1}^{p} φ_k γ_x(h − k).

The representation (2.16) implies that

E[w_t x_{t−h}] = E[w_t (w_{t−h} + ψ_1 w_{t−h−1} + ψ_2 w_{t−h−2} + ···)].

For h = 0, we get σ_w². For all other h, the fact that the w_t are independent implies that the right-hand side will be zero. Hence, we may write the equations for determining γ_x(h) as

γ_x(0) − Σ_{k=1}^{p} φ_k γ_x(−k) = σ_w²    (2.20)

and

γ_x(h) − Σ_{k=1}^{p} φ_k γ_x(h − k) = 0    (2.21)

for h = 1, 2, 3, .... Note that one will need the property γ_x(−h) = γ_x(h) in solving these equations. Equations (2.20) and (2.21) are called the Yule-Walker equations (see Yule, 1927; Walker, 1931).

Example 2.4

Consider finding the ACF of the first-order autoregressive model. First, (2.20) implies that γ_x(0) − φ_1γ_x(1) = σ_w². For h = 1, 2, ..., (2.21) gives γ_x(h) − φ_1γ_x(h − 1) = 0. Solving these successively gives

γ_x(h) = γ_x(0) φ_1^h.

Combining with (2.20) yields

γ_x(0) = σ_w² / (1 − φ_1²).

It follows that the autocovariance function is

γ_x(h) = (σ_w² / (1 − φ_1²)) φ_1^h.

Taking into account that γ_x(−h) = γ_x(h) and using (1.3), we obtain

ρ_x(h) = φ_1^{|h|}

for h = 0, ±1, ±2,....
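A quick simulation check of this result is easy to write, assuming NumPy; with φ_1 = .7, the sample ACF of a long simulated AR(1) path should land close to .7^h. The series here is purely synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
phi1, n = 0.7, 5000
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi1 * x[t - 1] + rng.normal()   # AR(1) recursion

x = x - x.mean()
c0 = np.dot(x, x) / n
acf = [np.dot(x[:n - h], x[h:]) / (n * c0) for h in range(6)]
print(np.round(acf, 2))                     # sample ACF, lags 0-5
print(np.round(phi1 ** np.arange(6), 2))    # theoretical rho(h) = phi1^h
```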

The exponential decay is typical of autoregressive behavior and there may also be some periodic structure. However, the most effective diagnostic of AR structure is in the PACF and is summarized by the following identification property:

Property P2.2: PACF for AR Process

The partial autocorrelation function φhh as a function of lag h is zero for h > p, the order of the autoregressive process. This enables one to make a preliminary identification of the order p of the process using the partial autocorrelation function PACF. Simply choose the order beyond which most of the sample values of the PACF are approximately zero.

To verify the above, note that the PACF (see Section 1.3.3) is basically the last coefficient obtained when minimizing the squared error

MSE = E[(x_{t+h} − Σ_{k=1}^{h} a_k x_{t+h−k})²].

Setting the derivatives with respect to a_j equal to zero leads to the equations

E[(x_{t+h} − Σ_{k=1}^{h} a_k x_{t+h−k}) x_{t+h−j}] = 0.

This can be written as

γ_x(j) − Σ_{k=1}^{h} a_k γ_x(j − k) = 0

for j = 1, 2, ..., h. Now, from Equations (2.20) and (2.21), it is clear that, for an AR(p), we may take a_k = φ_k for k ≤ p and a_k = 0 for k > p to get a solution to the above equations. This implies Property P2.2 above.

Having decided on the order p of the model, it is clear that, for the estimation step, one may write the model (2.13) in the regression form

x_t = φ′z_t + w_t,    (2.22)

where φ = (φ_1, φ_2, ..., φ_p)′ corresponds to β and z_t = (x_{t−1}, x_{t−2}, ..., x_{t−p})′ is the vector of lagged regressors in (2.2). Taking into account the fact that x_t is not observed for t ≤ 0, we may run the regression approach of Section 2.1 for t = p + 1, p + 2, ..., n to get estimators for φ and for σ², the variance of the white noise process. These so-called conditional maximum likelihood estimators are commonly used because the exact maximum likelihood estimators involve solving nonlinear equations.

Example 2.5

We consider the simple problem of modeling the recruit series shown in Figure 1.1 using an autoregressive model. The bottom panel of Figure 1.9 shows the autocorrelation (ACF) and partial autocorrelation (PACF) functions of the recruit series. The PACF has large values for h = 1, 2 and then is essentially zero for higher-order lags. This implies, by Property P2.2 above, that a second-order (p = 2) AR model might provide a good fit. Running the regression program for the model

x_t = β_0 + φ_1x_{t−1} + φ_2x_{t−2} + w_t

leads to the estimators

β̂_0 = 6.74 (1.11),  φ̂_1 = 1.35 (.04),  φ̂_2 = −.46 (.04),  σ̂² = 90.31,

where the estimated standard deviations are given in parentheses. To determine whether the above order is the best choice, we fitted models for p = 1, ..., 10, obtaining corrected AIC_C values of 5.75, 5.52, 5.53, 5.54, 5.54, 5.55, 5.55, 5.56, 5.57 and 5.58, respectively, using (2.10) with K = p. This shows that the minimum AIC_C obtains for p = 2, and we choose the second-order model.
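The order search in this example can be reproduced mechanically: fit each candidate AR(p) by conditional least squares, as in (2.22), and compare the AIC_C values from (2.10). The sketch below assumes NumPy and runs on a simulated AR(2) standing in for the recruit series, which is not reproduced in this text; the AIC_C minimum should land at or near p = 2.

```python
import numpy as np

def aicc_for_ar(x, p):
    """Conditional least squares fit of an AR(p) with intercept; returns AICc (2.10)."""
    n = len(x)
    Z = np.column_stack([np.ones(n - p)] +
                        [x[p - k: n - k] for k in range(1, p + 1)])
    y = x[p:]
    coef = np.linalg.solve(Z.T @ Z, Z.T @ y)
    resid = y - Z @ coef
    sigma2_mle = resid @ resid / len(y)       # ML-style variance estimate
    K = p + 1                                 # AR coefficients plus the intercept
    return np.log(sigma2_mle) + (len(y) + K) / (len(y) - K - 2)

# Simulated AR(2) with coefficients echoing the fitted values quoted above.
rng = np.random.default_rng(3)
n = 450
x = np.zeros(n)
for t in range(2, n):
    x[t] = 1.35 * x[t - 1] - 0.46 * x[t - 2] + rng.normal()

for p in range(1, 11):
    print(p, round(aicc_for_ar(x, p), 3))
```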

Example 2.6

The previous example used various autoregressive models for the recruit series, fitting a second-order regression model. We may also use this regression idea to fit the model to other series, such as a detrended version of the Southern Oscillation Index (SOI) given in previous discussions. We have noted in our discussion of Figure 1.9, from the partial autocorrelation function (PACF), that a plausible model for this series might be a first-order autoregression of the form given above with p = 1. Again, putting the model above into the regression framework (2.2) for a single coefficient leads to the estimators φ̂_1 = .59 with standard error .04, σ̂² = .09218 and AIC_C(1) = −1.375. The ACF of these residuals (not shown), however, will still show cyclical variation, and it is clear that they still have a number of values exceeding the ±1.96/√n threshold (see Equation 1.14). A suggested procedure is to try higher-order autoregressive models; successive models for p = 1, 2, ..., 30 were fitted, and the AIC_C(K) values are plotted in Figure 3.10 of Chapter 3, so we do not repeat them here. There is a clear minimum for a p = 16th-order model. The coefficient vector φ̂ has components .40, .07, .15, .08, −.04, −.08, −.09, −.08, .00, .11, .16, .15, .03, −.20, −.14 and −.06, with σ̂² = .07354.

Finally, we give a general approach to forecasting for any process that can be written in the form (2.16). This includes the AR, MA and ARMA processes. We begin by defining an h-step forecast of the process x_t as

x^t_{t+h} = E[x_{t+h} | x_t, x_{t−1}, ...].    (2.23)

Note that this is not exactly right, because we only have x_1, x_2, ..., x_t available, so that conditioning on the infinite past is only an approximation. From this definition, it is reasonable to intuit that x^t_s = x_s for s ≤ t, and

E[w_s | x_t, x_{t−1}, ...] = E[w_s | w_t, w_{t−1}, ...] = w_s    (2.24)

for s ≤ t. For s > t, use x^t_s and

E[w_s | x_t, x_{t−1}, ...] = E[w_s | w_t, w_{t−1}, ...] = E[w_s] = 0,    (2.25)

since w_s will be independent of past values of w_t. We define the h-step forecast variance as

P^t_{t+h} = E[(x_{t+h} − x^t_{t+h})² | x_t, x_{t−1}, ...].    (2.26)

To develop an expression for this mean square error, note that, with ψ_0 = 1, we can write

x_{t+h} = Σ_{k=0}^{∞} ψ_k w_{t+h−k}.

Then, since w^t_{t+h−k} = 0 for t + h − k > t, i.e., for k < h, we have

x^t_{t+h} = Σ_{k=h}^{∞} ψ_k w_{t+h−k},

so that the residual is

x_{t+h} − x^t_{t+h} = Σ_{k=0}^{h−1} ψ_k w_{t+h−k}.

Hence, the mean square error (2.26) is just the variance of a linear combination of independent zero-mean errors, with common variance σ_w²:

P^t_{t+h} = σ_w² Σ_{k=0}^{h−1} ψ_k².    (2.27)

As an example, we consider forecasting the second order model developed for the recruit series in Example 2.5.

Example 2.7

Consider the one-step forecast x^t_{t+1} first. Writing the defining equation for time t + 1 gives

x_{t+1} = φ_1x_t + φ_2x_{t−1} + w_{t+1},

so that

x^t_{t+1} = φ_1x^t_t + φ_2x^t_{t−1} + w^t_{t+1}
         = φ_1x_t + φ_2x_{t−1} + 0.

Continuing in this vein, we obtain

x^t_{t+2} = φ_1x^t_{t+1} + φ_2x^t_t + w^t_{t+2}
         = φ_1x^t_{t+1} + φ_2x_t + 0.

Then,

x^t_{t+h} = φ_1x^t_{t+h−1} + φ_2x^t_{t+h−2} + w^t_{t+h}
         = φ_1x^t_{t+h−1} + φ_2x^t_{t+h−2} + 0

for h > 2. The forecast variances out to lag h = 4 and beyond, if necessary, can be found by solving (2.18) for ψ_1, ψ_2 and ψ_3 and substituting into (2.27). Equating coefficients of B, B² and B³ in

(1 − φ_1B − φ_2B²)(1 + ψ_1B + ψ_2B² + ψ_3B³ + ···) = 1,

we obtain ψ_1 = φ_1, ψ_2 − φ_1ψ_1 − φ_2 = 0 and ψ_3 − φ_1ψ_2 − φ_2ψ_1 = 0. This gives the coefficients ψ_1 = φ_1, ψ_2 = φ_1² + φ_2 and ψ_3 = φ_1³ + 2φ_1φ_2. From Example 2.5, we have φ̂_1 = 1.35, φ̂_2 = −.46, σ̂_w² = 90.31 and β̂_0 = 6.74. The forecasts are of the form

x^t_{t+h} = 6.74 + 1.35x^t_{t+h−1} − .46x^t_{t+h−2}.

For the forecast variance, we evaluate ψ_1 = 1.35, ψ_2 = 1.36 and ψ_3 = 1.22, leading to the variances 90.31, 90.31(2.82), 90.31(4.68) and 90.31(6.16) for the forecasts at h = 1, 2, 3, 4. The corresponding standard errors of the forecasts are 9.50, 15.97, 20.56 and 23.59. The recruit series values range from about 20 to 100, so the forecast uncertainty will be rather large.
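The forecast recursion and the variance formula (2.27) translate directly into a few lines of code. The sketch below assumes NumPy, uses the coefficient values quoted above (β_0 = 6.74, φ_1 = 1.35, φ_2 = −.46, σ_w² = 90.31), and substitutes two arbitrary terminal observations for the actual recruit values, so only the standard errors, not the point forecasts, should match the example.

```python
import numpy as np

beta0, phi1, phi2, sigma2 = 6.74, 1.35, -0.46, 90.31
x_t, x_tm1 = 60.0, 55.0          # placeholders standing in for the last two observations

# h-step forecasts from the AR(2) recursion
forecasts = []
prev2, prev1 = x_tm1, x_t
for h in range(1, 5):
    fc = beta0 + phi1 * prev1 + phi2 * prev2
    forecasts.append(fc)
    prev2, prev1 = prev1, fc

# psi weights from (2.18) and forecast standard errors from (2.27)
psi = np.zeros(4)
psi[0] = 1.0
for k in range(1, 4):
    psi[k] = phi1 * psi[k - 1] + (phi2 * psi[k - 2] if k >= 2 else 0.0)
pred_sd = [np.sqrt(sigma2 * np.sum(psi[:h] ** 2)) for h in range(1, 5)]

print(np.round(forecasts, 2))
print(np.round(pred_sd, 2))      # approximately 9.50, 15.97, 20.56, 23.59
```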

2.4 Moving Average (MA) Models

We may also consider processes that contain linear combinations of underlying unobserved shocks, say, represented by white noise series wt. These moving average components generate a series of the form

x_t = w_t − θ_1w_{t−1} − θ_2w_{t−2} − ··· − θ_qw_{t−q},    (2.28)

where q denotes the order of the moving average component and θ_1, θ_2, ..., θ_q are parameters to be estimated. Using the shift notation, the above equation can be written in the form

x_t = θ(B)w_t,    (2.29)

where

θ(B) = 1 − θ_1B − θ_2B² − ··· − θ_qB^q    (2.30)

is another polynomial in the shift operator B. It should be noted that the MA process of order q is a linear process of the form considered earlier in Problem 1.4, with ψ_0 = 1, ψ_1 = −θ_1, ..., ψ_q = −θ_q. This implies that the ACF will be zero for lags larger than q, because terms in the form of the covariance function given in Problem 1.4 of Chapter 1 will all be zero. Specifically, the exact forms are

γ_x(0) = σ_w² (1 + Σ_{k=1}^{q} θ_k²)    (2.31)

for h = 0,

γ_x(h) = σ_w² (−θ_h + Σ_{k=1}^{q−h} θ_{k+h}θ_k)    (2.32)

for h = 1, ..., q − 1, and γ_x(q) = −σ_w²θ_q, with γ_x(h) = 0 for h > q. Hence, we will have

Property P2.3: ACF for an MA Series
For a moving average series of order q, the autocorrelation function (ACF) is zero for lags h > q, i.e., ρ_x(h) = 0 for h > q. Such a result enables us to diagnose the order of a moving average component by examining ρ̂_x(h) and choosing q as the value beyond which the coefficients are essentially zero.

Example 2.8

Consider the varve thicknesses in Figure 1.10, which are described in Problem 1.7 of Chapter 1. Figure 2.4 shows the ACF and PACF of the original log-transformed varve series and of the first differences. The ACF of the original series indicates possible non-stationary behavior and suggests taking a first difference, interpreted here as the percentage yearly change in deposition. The ACF of the first difference shows a clear peak at h = 1 and no other significant peaks, suggesting a first-order moving average. Fitting the first-order moving average model x_t = w_t − θ_1w_{t−1} to this data using the Gauss-Newton procedure described next leads to θ̂_1 = .77 and σ̂_w² = .2358.

Figure 2.4 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the log varve series (top panel) and the first difference (bottom panel), showing a peak in the ACF at lag h = 1.

Fitting the pure moving average term turns into a nonlinear problem, as we can see by noting that either maximum likelihood or regression involves solving (2.28) or (2.29) for w_t and minimizing the sum of the squared errors. Suppose that the roots of θ(B) = 0 are all outside the unit circle; then this is possible by solving π(B)θ(B) = 1, so that, for the vector parameter θ = (θ_1, ..., θ_q)′, we may write

w_t(θ) = π(B)x_t    (2.33)

and minimize

SSE(θ) = Σ_{t=q+1}^{n} w_t²(θ)

as a function of the vector parameter θ. We do not really need to find the operator π(B), but can simply solve (2.33) recursively for w_t, with w_1 = w_2 = ··· = w_q = 0 and

w_t(θ) = x_t + Σ_{k=1}^{q} θ_k w_{t−k}

for t = q + 1, ..., n. It is easy to verify that SSE(θ) will be a nonlinear function of θ_1, θ_2, ..., θ_q. However, note that

w_t(θ) ≈ w_t(θ_0) + (∂w_t/∂θ)′ (θ − θ_0),

where the derivative is evaluated at the previous guess θ_0. Rearranging the above equation leads to

w_t(θ_0) ≈ −(∂w_t/∂θ)′ (θ − θ_0) + w_t(θ),    (2.34)

which is just the regression model (2.2). Hence, we can begin with an initial guess θ_0 = (.1, .1, ..., .1)′, say, and successively minimize SSE(θ) until convergence.

In order to forecast a moving average series, note that

x_{t+h} = w_{t+h} − Σ_{k=1}^{q} θ_k w_{t+h−k}.

The results below (2.24) imply that

x^t_{t+h} = − Σ_{k=h}^{q} θ_k w_{t+h−k},

where the w_t values needed for the above are computed recursively as before. Because of (2.17), it is clear that ψ_0 = 1 and ψ_k = −θ_k, k = 1, 2, ..., q, and these values can be substituted directly into the variance formula (2.27).
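The recursion for w_t(θ) and the squared-error criterion above are straightforward to code. The sketch below assumes NumPy and SciPy; instead of the hand-rolled Gauss-Newton step in (2.34), it hands the residual recursion to scipy.optimize.least_squares, which performs an equivalent iterative nonlinear least squares fit. The data are a simulated MA(1) with θ_1 = .77, echoing the varve fit above rather than using the varve data themselves.

```python
import numpy as np
from scipy.optimize import least_squares

def ma_residuals(theta, x):
    """Recover w_t recursively from x_t = w_t - theta_1 w_{t-1}, conditioning on w_0 = 0."""
    w = np.zeros(len(x))
    for t in range(len(x)):
        w[t] = x[t] + theta[0] * (w[t - 1] if t > 0 else 0.0)
    return w

# Simulated MA(1) series with theta_1 = .77
rng = np.random.default_rng(5)
n = 600
e = rng.normal(0, 1, n)
x = e.copy()
x[1:] -= 0.77 * e[:-1]

fit = least_squares(ma_residuals, x0=[0.1], args=(x,))   # start from the guess .1, as suggested above
w_hat = ma_residuals(fit.x, x)
print("theta_1 hat:  ", round(fit.x[0], 3))
print("sigma_w^2 hat:", round(np.mean(w_hat[1:] ** 2), 3))
```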

2.5 Autoregressive Integrated Moving Average (ARIMA) Models

Now combining the autoregressive and moving average components leads to the autoregressive moving average ARMA(p, q) model, written as

φ(B)x_t = θ(B)w_t,    (2.35)

where the polynomials in B are as defined earlier in (2.15) and (2.30), with p autoregressive coefficients and q moving average coefficients. In difference equation form, this becomes

x_t − Σ_{k=1}^{p} φ_k x_{t−k} = w_t − Σ_{k=1}^{q} θ_k w_{t−k}.    (2.36)

The mixed processes do not satisfy Properties P2.1-P2.3 any more, but they tend to behave in approximately the same way, even for the mixed cases. Estimation and forecasting for such problems are treated in essentially the same manner as for the AR and MA processes. We note that we can formally divide both sides of (2.35) by φ(B), and that the usual representation (2.16) holds when

ψ(B)φ(B) = θ(B).    (2.37)

For forecasting, we determine ψ_1, ψ_2, ... by equating coefficients of B, B², B³, ... in (2.37), as before, assuming that all the roots of φ(B) = 0 are greater than one in absolute value. Similarly, we can always solve for the residuals, say

w_t = x_t − Σ_{k=1}^{p} φ_k x_{t−k} + Σ_{k=1}^{q} θ_k w_{t−k},    (2.38)

to get the terms needed for forecasting and estimation.

Example 2.9 Consider the above mixed process with p = q = 1, i.e., ARMA(1, 1). By (2.36), we may write

x_t = φ_1x_{t−1} + w_t − θ_1w_{t−1}.

Now,

x_{t+1} = φ_1x_t + w_{t+1} − θ_1w_t, so that

x^t_{t+1} = φ_1x_t + 0 − θ_1w_t,

and x^t_{t+h} = φ_1x^t_{t+h−1} for h > 1, leading to very simple forecasts in this case. Equating coefficients of B^k in

(1 − φ_1B)(1 + ψ_1B + ψ_2B² + ···) = 1 − θ_1B

leads to

ψ_k = (φ_1 − θ_1)φ_1^{k−1}

for k = 1, 2, .... Using (2.27) leads to the expression

P^t_{t+h} = σ_w² [1 + (φ_1 − θ_1)² Σ_{k=1}^{h−1} φ_1^{2(k−1)}]
         = σ_w² [1 + (φ_1 − θ_1)² (1 − φ_1^{2(h−1)}) / (1 − φ_1²)]

for the forecast variance.
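A small numerical check of the two expressions, assuming NumPy and arbitrary illustrative parameter values, confirms that summing the squared ψ weights and using the closed geometric-sum form give the same forecast variance.

```python
import numpy as np

phi1, theta1, sigma2 = 0.9, 0.4, 1.0    # illustrative values, not taken from any example

def var_from_psi(h):
    """Forecast variance (2.27) with psi_k = (phi1 - theta1) * phi1**(k-1)."""
    psi = np.concatenate(([1.0], (phi1 - theta1) * phi1 ** np.arange(h - 1)))
    return sigma2 * np.sum(psi ** 2)

def var_closed_form(h):
    return sigma2 * (1 + (phi1 - theta1) ** 2 * (1 - phi1 ** (2 * (h - 1))) / (1 - phi1 ** 2))

for h in (1, 2, 5, 10):
    print(h, round(var_from_psi(h), 4), round(var_closed_form(h), 4))
```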

In the first example of this chapter, it was noted that nonstationary pro- cesses are characterized by a slow decay in the ACF as in Figure 2.3. In many of the cases where slow decay is present, the use of a first order difference

∆xt = xt − xt−1

= (1 − B)x_t

will reduce the nonstationary process x_t to a stationary series ∆x_t. One can check to see whether the slow decay has been eliminated in the ACF of the transformed series. Higher-order differences, ∆^d x_t = ∆∆^{d−1} x_t, are possible, and we call a process whose dth difference is an ARMA series an ARIMA(p, d, q) series, where p is the order of the autoregressive component, d is the order of differencing needed and q is the order of the moving average component. Symbolically, the form is

φ(B)∆^d x_t = θ(B)w_t.    (2.39)

The principles of model selection for ARIMA(p, d, q) series are obtained using the extensions of (2.8)-(2.10) which replace K by K = p + q the total number of ARMA parameters.

2.6 Seasonal ARIMA Models

When autoregressive, differencing, or moving average behavior seems to occur at multiples of some underlying period s, a seasonal ARIMA series may result. The seasonal nonstationarity is characterized by slow decay at multiples of s and can often be eliminated by a seasonal differencing operator of the form

∇_s^D x_t = (1 − B^s)^D x_t.

For example, when we have monthly data, it is reasonable that a yearly phenomenon will induce s = 12, and the ACF will be characterized by slowly decaying spikes at 12, 24, 36, 48, ...; we can obtain a stationary series by transforming with the operator (1 − B^{12})x_t = x_t − x_{t−12}, which is the difference between the current month and the value one year, or 12 months, ago. If the autoregressive or moving average behavior is seasonal at period s, we define formally the operators

Φ(B^s) = 1 − Φ_1B^s − Φ_2B^{2s} − ··· − Φ_P B^{Ps}    (2.40)

and

Θ(B^s) = 1 − Θ_1B^s − Θ_2B^{2s} − ··· − Θ_Q B^{Qs}.    (2.41)

The final form of the ARIMA(p, d, q) × (P, D, Q)_s model is

Φ(B^s)φ(B) ∇_s^D ∆^d x_t = Θ(B^s)θ(B) w_t.    (2.42)

We may also note the properties below corresponding to P2.1-P2.3

Property P2.1': ACF and PACF of a seasonally non-stationary time series
The ACF of a seasonally non-stationary time series decays very slowly at lag multiples s, 2s, 3s, ..., with zeros in between, where s denotes a seasonal period, usually 12. The PACF of a seasonally non-stationary time series tends to have a peak very near unity at lag s.

Property P2.2’: PACF for Seasonal AR Series

The partial autocorrelation function φhh as a function of lag h has nonzero values at s, 2s, 3s, . . . , P s, with zeros in between, and is zero for h > P s, the order of the seasonal autoregressive process. There should be some exponential decay.

Property P2.3': ACF for a Seasonal MA Series
For a seasonal moving average series of order Q, the autocorrelation function (ACF) has nonzero values at s, 2s, 3s, ..., Qs and is zero for h > Qs.

Example 2.10: We illustrate by fitting the monthly birth series from 1948-1979 shown in Figure 2.5. The period encompasses the boom that followed the Second World War, and there is the expected rise, which persists for about 13 years, followed by a decline to around 1974. The series appears to have long-term swings, with seasonal effects superimposed. The long-term swings indicate possible non-stationarity, and we verify that this is the case by checking the ACF and PACF shown in the top panels of Figure 2.6. Note that, by Property P2.1, slow decay of the ACF indicates non-stationarity, and we respond by taking a first difference. The results shown in the second panel of Figure 2.5 indicate that the first difference has eliminated the strong low-frequency swing. The ACF, shown in the second panel from the top in Figure 2.6, shows peaks at 12, 24, 36, 48, ..., with no decay. This behavior implies seasonal non-stationarity, by Property P2.1' above, with s = 12. Taking the seasonal difference of the first difference gives a series that looks stationary and whose ACF and PACF, shown in Figure 2.6, are of the kind we expect for stationary series, with ACF peaks at lags 1 and 12 and a PACF with a substantial peak at 12 and lesser peaks at 24, 36, .... This suggests trying either a first-order moving average term, by Property P2.3, or a first-order

Figure 2.5 Number of live births 1948(1)-1979(1) and residuals from models with a first difference, a first difference and a seasonal difference of order 12, and a fitted ARIMA(0, 1, 1) × (0, 1, 1)_{12} model.

Figure 2.6 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the birth series (top two panels), the first difference (second two panels), a first and seasonal difference, ARIMA(0, 1, 0) × (0, 1, 0)_{12} (third two panels), an ARIMA(0, 1, 0) × (0, 1, 1)_{12} model (fourth two panels) and an ARIMA(0, 1, 1) × (0, 1, 1)_{12} model (last two panels).

seasonal moving average term with s = 12, by Property P2.3' above. We choose to eliminate the largest peak first by applying a first-order seasonal moving average model with s = 12. The ACF and PACF of the residual series from this model, i.e., from ARIMA(0, 1, 0) × (0, 1, 1)_{12}, written as

(1 − B)(1 − B^{12})x_t = (1 − Θ_1B^{12})w_t,

Figure 2.7 A 36-month forecast, 1979(2)-1982(1), for the birth series, with 95% uncertainty limits.

is shown in the fourth panel from the top in Figure 2.6. We note that the peak at lag one is still there, with attending exponential decay in the PACF. This can be eliminated by fitting a first-order moving average term and we consider the model ARIMA(0, 1, 1) × (0, 1, 1)12, written as

(1 − B)(1 − B^{12})x_t = (1 − θ_1B)(1 − Θ_1B^{12})w_t.

The ACF of the residuals from this model is relatively well behaved, with only a small number of peaks near or exceeding the 95% threshold of the test of no correlation. Fitting this final ARIMA(0, 1, 1) × (0, 1, 1)_{12} model leads to the fitted model

(1 − B)(1 − B^{12})x_t = (1 − .4896B)(1 − .6844B^{12})w_t,

with AIC_C = 4.95, R² = .9804² = .961 and p-values of .000 and .000, where R² is computed by saving the predicted values and then plotting them against the observed values using the 2-D plot option. The format in which ASTSA puts out these results is shown below.

ARIMA(0,1,1)x(0,1,1)x12 from U.S. Births
AICc = 4.94684   variance = 51.1906   d.f. = 358   Start values = .1

predictor   coef     st. error   t-ratio    p-value
MA(1)       .4896    .04620      10.5966    .000
SMA(1)      .6844    .04013      17.0541    .000

(D1) (D(12)1) x(t) = (1 -.49B1) (1 -.68B12) w(t)

The ARIMA search in ASTSA leads to the model

(1 − .0578B^{12})(1 − B)(1 − B^{12})x_t = (1 − .4119B − .1515B²)(1 − .8136B^{12})w_t,

with AIC_C = 4.8526, somewhat lower than that of the previous model. The seasonal autoregressive coefficient is not statistically significant, however, and should probably be omitted from the model. The new model becomes

(1 − B)(1 − B^{12})x_t = (1 − .4088B − .1645B²)(1 − .6990B^{12})w_t,

yielding AIC_C = 4.92 and R² = .981² = .962, slightly better than the ARIMA(0, 1, 1) × (0, 1, 1)_{12} model. Evaluating these latter models leads to the conclusion that the extra parameters do not add a practically substantial amount to the predictability. For forecasting, the ARIMA(0, 1, 1) × (0, 1, 1)_{12} model is expanded as

(1 − B)(1 − B^{12})x_t = (1 − θ_1B)(1 − Θ_1B^{12})w_t
(1 − B − B^{12} + B^{13})x_t = (1 − θ_1B − Θ_1B^{12} + θ_1Θ_1B^{13})w_t

so that

xt − xt−1 − xt−12 + xt−13 = wt − θ1wt−1 − Θ1wt−12 + θ1Θ1wt−13

or

xt = xt−1 + xt−12 − xt−13 + wt − θ1wt−1 − Θ1wt−12 + θ1Θ1wt−13

The forecast is

x^t_{t+1} = x_t + x_{t−11} − x_{t−12} − θ_1w_t − Θ_1w_{t−11} + θ_1Θ_1w_{t−12},

x^t_{t+2} = x^t_{t+1} + x_{t−10} − x_{t−11} − Θ_1w_{t−10} + θ_1Θ_1w_{t−11}.

Continuing in the same manner, we obtain

x^t_{t+12} = x^t_{t+11} + x_t − x_{t−1} − Θ_1w_t + θ_1Θ_1w_{t−1}

for the 12-month forecast.

The forecast limits are quite variable, with a standard error that rises to 20% of the mean by the end of the forecast period. The plot shows that the general trend is upward, rising from about 250,000 to about 290,000 births per month. One could check the actual records from the years 1979-1982. The direction is not certain because of the large uncertainty. One could compute the probability

P(B_{t+47} ≤ 250,000) = Φ((250 − 290)/60) = .25,

so there is a 75% chance of increase. A website where the forecasts can be compared on a yearly basis is http://www.cdc.gov/nccdphp/drh/pdf/nvs/nvs48 tb1.pdf
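Readers without ASTSA can fit the same ARIMA(0, 1, 1) × (0, 1, 1)_{12} structure in other software. The sketch below assumes the Python statsmodels package (its SARIMAX class) and, since the birth data are not distributed with this text, runs on a simulated monthly series with trend and an annual cycle; the printed parameters and forecast limits therefore illustrate the workflow rather than the values reported above.

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Simulated monthly stand-in for the birth series: trend + annual cycle + noise.
rng = np.random.default_rng(6)
n = 372
t = np.arange(n)
births = 300 + 0.1 * t + 20 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 5, n)

model = SARIMAX(births, order=(0, 1, 1), seasonal_order=(0, 1, 1, 12))
result = model.fit(disp=False)
print(result.params)                      # MA(1), seasonal MA(1) and variance estimates

forecast = result.get_forecast(steps=36)  # 36-month forecast, as in Figure 2.7
print(forecast.predicted_mean[:6])        # first six forecast values
print(forecast.conf_int()[:6])            # 95% limits
```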

Example 2.11: Figure 2.8 shows the autocorrelation function of the log-transformed J&J earnings series that is plotted in Figure 1.4, and we note the slow decay indicating the nonstationarity that was already obvious in the Chapter 1 discussion. We may also compare the ACF with that of a random walk, shown in Figure 2.3, and note the close similarity. The partial autocorrelation function is very high at lag one, which, under ordinary circumstances, would indicate a first-order autoregressive AR(1) model, except that, in this case, the value is close to unity, indicating a root close to the unit circle. The only question would be whether differencing or detrending is the better transformation to stationarity. Following the Box-Jenkins tradition, differencing leads to the ACF and PACF shown in the second panel, and no simple structure is apparent. To force a next step, we interpret the peaks at 4, 8, 12, 16, ... as contributing to a possible seasonal autoregressive term, leading to a possible ARIMA(0, 1, 0) × (1, 0, 0)_4, and we simply fit this model and look at the ACF and PACF of the residuals, shown in the third two panels. The fit improves somewhat, with significant peaks still remaining at lag 1 in both the ACF and PACF. The peak in the ACF seems more isolated, and there remains some exponentially decaying behavior in the PACF, so we try a model with a first-order moving average. The bottom two panels show the ACF and PACF of the resulting ARIMA(0, 1, 1) × (1, 0, 0)_4, and we note only relatively minor excursions above and below the 95% intervals computed under the white noise assumption. The final model suggested is (y_t = log x_t)

(1 − Φ_1B^4)(1 − B)y_t = (1 − θ_1B)w_t,

where Φ̂_1 = .820 (.058), θ̂_1 = .508 (.098) and σ̂_w² = .0086. The model can be written in forecast form as

y_t = y_{t−1} + Φ_1(y_{t−4} − y_{t−5}) + w_t − θ_1w_{t−1}.

To forecast the original series for, say 4 quarters, we compute the forecast limits for yt = log xt and then exponentiate, i.e.

x^t_{t+h} = exp{y^t_{t+h}}.

We note the large limits on the forecast values in Figure 2.9 and mention that the situation can be improved by the regression approach in the next section

2.7 Regression Models With Correlated Errors

The standard method for dealing with correlated errors e_t in the regression model

y_t = β′z_t + e_t    (2.2)

is to try to transform the errors e_t into uncorrelated ones and then apply the standard least squares approach to the transformed observations. For example, let P be an n × n matrix that transforms the vector e = (e_1, ..., e_n)′ into a set of independent, identically distributed variables with variance σ². Then, transform the matrix version (2.4) to

Py = PZβ + Pe

and proceed as before. Of course, the major problem is deciding what to choose for P; but in the time series case, happily, there is a reasonable solution, based again on time series ARMA models. Suppose that we can find a reasonable ARMA model for the residuals, for example the ARIMA(p, 0, 0) model

e_t = Σ_{k=1}^{p} φ_k e_{t−k} + w_t,

which defines a linear transformation of the correlated e_t to a sequence of uncorrelated w_t. We can ignore the problems near the beginning of the series by starting at t = p. In the ARMA notation, using the backshift operator B, we may write

φ(B)e_t = w_t,    (2.43)

where

φ(B) = 1 − Σ_{k=1}^{p} φ_k B^k,    (2.44)

and applying the operator to both sides of (2.2) leads to the model

φ(B)y_t = β′φ(B)z_t + w_t,    (2.45)

Figure 2.8 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the log J&J earnings series (top two panels), the first difference (second two panels), the ARIMA(0, 1, 0) × (1, 0, 0)_4 residuals (third two panels) and the ARIMA(0, 1, 1) × (1, 0, 0)_4 residuals (bottom two panels).

Figure 2.9 Observed and predicted values for the Johnson and Johnson earnings series, with forecast values for the next four quarters, using the ARIMA(0, 1, 1) × (1, 0, 0)_4 model for the log-transformed data.

where the w_t now satisfy the independence assumption. Doing ordinary least squares on the transformed model is the same as doing weighted least squares on the untransformed model. The only problem is that we do not know the values of the coefficients φ_k, k = 1, ..., p, in the transformation (2.43). However, if we knew the residuals e_t, it would be easy to estimate the coefficients, since (2.43) can be written in the form

e_t = φ′e_{t−1} + w_t,    (2.46)

which is exactly the usual regression model (2.2), with φ = (φ_1, ..., φ_p)′ replacing β and e_{t−1} = (e_{t−1}, e_{t−2}, ..., e_{t−p})′ replacing z_t. The above comments suggest a general approach, known as the Cochrane-Orcutt procedure (Cochrane and Orcutt, 1949), for dealing with the problem of correlated errors in the time series context; a code sketch of the iteration is given after the steps below.

1. Begin by fitting the original regression model (2.2) by least squares, obtaining β̂ and the residuals ê_t = y_t − β̂′z_t.

2. Fit an ARMA to the estimated residuals, say

φ(B)ê_t = θ(B)w_t.

3. Apply the ARMA transformation found to both sides of the regression equation (2.2)’ to obtain

(φ(B)/θ(B)) y_t = β′(φ(B)/θ(B)) z_t + w_t.

4. Run an ordinary least squares regression on the transformed values to obtain the new β̂.

5. Return to step 2 if desired. Often, one iteration is enough to develop the estimators under a reasonable correlation structure. In general, the Cochrane-Orcutt procedure converges to the maximum likelihood or weighted least squares estimators.
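The iteration is short enough to code directly. The sketch below assumes NumPy and specializes to AR(1) errors; it fits the AR coefficient to the residuals by a lag-one regression, transforms both sides, and refits, in the spirit of steps 1-4. The data are simulated, standing in for any real regression with correlated errors.

```python
import numpy as np

def cochrane_orcutt(y, Z, n_iter=2):
    """Cochrane-Orcutt iteration with AR(1) errors: returns (beta_hat, phi_hat)."""
    beta = np.linalg.solve(Z.T @ Z, Z.T @ y)                  # step 1: ordinary least squares
    phi = 0.0
    for _ in range(n_iter):
        e = y - Z @ beta                                      # residuals
        phi = np.dot(e[1:], e[:-1]) / np.dot(e[:-1], e[:-1])  # step 2: AR(1) coefficient
        y_star = y[1:] - phi * y[:-1]                         # step 3: transform both sides
        Z_star = Z[1:] - phi * Z[:-1]
        beta = np.linalg.solve(Z_star.T @ Z_star, Z_star.T @ y_star)   # step 4: refit
    return beta, phi

# Simulated regression with a linear trend and AR(1) errors (phi = .7).
rng = np.random.default_rng(7)
n = 200
t = np.arange(1, n + 1)
e = np.zeros(n)
for i in range(1, n):
    e[i] = 0.7 * e[i - 1] + rng.normal()
y = 1.0 + 0.05 * t + e
Z = np.column_stack([np.ones(n), t])
print(cochrane_orcutt(y, Z))
```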

Figure 2.10 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the detrended log J&J earnings series (top two panels) and for the residuals from the fitted ARIMA(0, 0, 0) × (1, 0, 0)_4 error model (bottom two panels).

Figure 2.11 Observed and predicted values for the Johnson and Johnson earnings series, with forecast values for the next four quarters, using the correlated regression model for the log-transformed data.

Example 2.12:

We might consider an alternative approach to treating the Johnson and Johnson Earnings Series, assuming that

yt = log xt = β1 + β2t + et

In order to analyze the data with this approach, we first fit the model above, obtaining β̂_1 = −.6678 (.0349) and β̂_2 = .0417 (.0071). The residuals ê_t = y_t − β̂_1 − β̂_2 t are easily computed, and their ACF and PACF are shown in the top two panels of Figure 2.10. Note that the ACF and PACF suggest that a seasonal AR series will fit well, and we show the ACF and PACF of the residuals from that fit in the bottom panels of Figure 2.10. The seasonal AR model is of the form

e_t = Φ_1e_{t−4} + w_t,

and we obtain Φ̂_1 = .7614 (.0639), with σ̂_w² = .00779. Using these values, we transform y_t to

y_t − Φ̂_1y_{t−4} = β_1(1 − Φ̂_1) + β_2[t − Φ̂_1(t − 4)] + w_t,

using the estimated value Φ̂_1 = .7614. With this transformed regression, we obtain the new estimators β̂_1 = −.7488 (.1105) and β̂_2 = .0424 (.0018). The new estimators have the advantage of being unbiased and having a smaller generalized variance.

To forecast, we consider the original model with the newly estimated β̂_1 and β̂_2. We obtain the approximate forecast

y^t_{t+h} = β̂_1 + β̂_2(t + h) + ê^t_{t+h}

for the log-transformed series, along with upper and lower limits depending on an estimated variance that only incorporates the prediction variance of ê^t_{t+h}, considering the trend and seasonal autoregressive parameters as fixed. The narrower upper and lower limits shown in Figure 2.11 are mainly a reflection of a slightly better fit to the residuals and the ability of the trend model to take care of the nonstationarity.

2.8 Chapter 2 Problems

2.1 Consider the regression model

yt = β1yt−1 + et

2 where et is white noise with zero-mean and variance σe . Assume that we observe y1, y2, . . . , yn and consider the model above for t = 2, 3, . . . , n. Show that the least squares estimator of β1 is

β̂_1 = Σ_{t=2}^{n} y_t y_{t−1} / Σ_{t=2}^{n} y_{t−1}².

If we pretend that the y_{t−1} are fixed, show that

var{β̂_1} = σ_e² / Σ_{t=2}^{n} y_{t−1}².

Relate your answer to a method for fitting a first-order AR model to the data y_t.

2.2 Consider the autoregressive model (2.13) for p = 1, i.e.

xt − φ1xt−1 = wt

(a) Show that the condition on the roots given below (2.15) implies |φ_1| < 1.

(b) Show that

x_t = Σ_{k=0}^{∞} φ_1^k w_{t−k}

is the form of (2.16) in this case.

(c) Show that E[w_t x_t] = σ_w² and E[w_t x_{t−1}] = 0, so that future errors are uncorrelated with past data.

2.3 The autocovariance and autocorrelation functions for AR processes are often derived from the Yule-Walker equations, obtained by multiplying both sides of the defining equation, successively by xt, xt−1, xt−2,..., using the result (2.16).

(a) Derive the Yule-Walker equations

γ_x(h) − φ_1γ_x(h − 1) = σ_w²  for h = 0,
γ_x(h) − φ_1γ_x(h − 1) = 0    for h > 0.

(b) Use the Yule-Walker equations to show that

ρ_x(h) = φ_1^{|h|}

for the first-order AR.

2.4 For an ARMA series we define the optimal forecast based on xt, xt−1,... as the conditional expectation

x^t_{t+h} = E[x_{t+h} | x_t, x_{t−1}, ...]

for h = 1, 2, 3,....

(a) Show, for the general ARMA model, that

E[w_{t+h} | x_t, x_{t−1}, ...] = 0  for h > 0,  and  = w_{t+h}  for h ≤ 0.

(b) For the first-order AR model, show that the optimal forecast is

x^t_{t+h} = φ_1x_t  for h = 1,  and  x^t_{t+h} = φ_1x^t_{t+h−1}  for h > 1.

(c) Show that E[(x_{t+1} − x^t_{t+1})²] = σ_w² is the prediction error variance of the one-step forecast.

2.5 Suppose we have the simple linear trend model

y_t = β_1t + x_t,  t = 1, 2, ..., n,  where x_t = φ_1x_{t−1} + w_t.

Give the exact form of the equations that you would use for estimating β_1, φ_1 and σ_w² using the Cochrane-Orcutt procedure of Section 2.7.

Figure 2.12 Los Angeles cardiovascular mortality, temperature and particulate levels (6-day increments).

2.6 Consider the file la regr.dat, in the syllabus, which contains cardiovascular mortality, temperature values and particulate levels over 6-day periods from Los Angeles County (1970-1979). The file also contains two dummy variables for regression purposes: a column of ones for the constant term and a time index. The order is as follows: Column 1: 508 cardiovascular mortality values (6-day increments), Column 2: 508 ones, Column 3: the integers 1, 2, ..., 508, Column 4: temperature in degrees F, and Column 5: particulate levels. A reference is Shumway et al (1988). The point here is to examine possible relations between the temperature and mortality in the presence of a time trend in cardiovascular mortality.

(a) Use scatter diagrams to argue that particulate level may be linearly related to mortality and that temperature has either a linear or quadratic relation. Check for lagged relations using the cross correlation function.

(b) Adjust temperature for its mean value, using the Scale option and fit the model

M_t = β_0 + β_1(T_t − T̄) + β_2(T_t − T̄)² + β_3P_t + e_t,

where M_t, T_t and P_t denote the mortality, temperature and particulate pollution series. You can use Columns 2 and 3 as inputs for the trend terms and run the regression using the 'without constant' option. Note that you need to transform temperature first. Retain the residuals for the next part of the problem.

(c) Plot the residuals and compute the autocorrelation (ACF) and partial autocorrelation (PACF) functions. Do the residuals appear to be white? Suggest an ARIMA model for the residuals and fit it to the residuals. The simple ARIMA(2, 0, 0) model is a good compromise.

(d) Apply the ARIMA model obtained in part (c) to all of the input variables and to cardiovascular mortality using the ARIMA transformation option. Retain the forecast values for the transformed mortality, say m̂_t = M_t − φ̂_1M_{t−1} − φ̂_2M_{t−2}.

2.7 Generate 10 realizations (n = 200 points each) of a series from an ARIMA(1, 0, 1) model with φ_1 = .90, θ_1 = .20 and σ² = .25. Fit the ARIMA model to each of the series and compare the estimators to the true values by computing the average of the estimators and their standard deviations.

2.8 Consider the bivariate time series record containing monthly U.S. Pro- duction as measured monthly by the Federal Reserve Board Production Index and unemployment as given in the file frb.asd. The file contains n = 372 monthly values for each series. Before you begin, be sure to plot the series. Fit a seasonal ARIMA model of your choice to the Federal Reserve Production Index. Develop a 12 month forecast using the model.

2.9 The file labeled clim-hyd.asd has 454 months of measured values for the climatic variables Air Temperature, Dew Point, Cloud Cover, Wind Speed, Precipitation, and Inflow at Shasta Lake. We would like to look at possible relations among the weather factors and between the weather factors and the inflow to Shasta Lake.

(a) Fit the ARIMA(0, 0, 0) × (0, 1, 1)_{12} model to the transformed precipitation P_t = √p_t and to the transformed inflow log i_t. Save the residuals for transformed precipitation for use in part (b).

(b) Apply the ARIMA model fitted in part (a) for transformed precipitation to the flow series. Compute the cross correlation between the flow residuals based on the precipitation ARIMA model and the precipitation residuals based on the precipitation model, and interpret.

Figure 2.13 Federal Reserve Board Production Index and monthly unemployment for Problem 2.8.

Use the coefficients from the ARIMA model in the transform option in the main menu to construct the transformed flow residuals. Suggest two possible models for relating the two series. More analysis can be done using the transfer function models of Chapter 4.

2.9 Chapter 2 ASTSA Notes

8. Regression Analysis →Multiple Regression Model (without constant):

yt = β1zt1 + β2zt2 + ... + βqztq + et

Model (with constant):

yt = β0 + β1zt1 + β2zt2 + ... + βqztq + et

Series (dependent): y_t
No. of independent series: q
series 1: z_{t−h1,1}
lag: h1 (often zero)
···
series q: z_{t−hq,q}
lag: hq (often zero)
forecasts: 0

constant(y/n):

selector(AIC,AICc, BIC, FPEL, AICL): AICc

Save →Residuals Save →Predicted

9. Fit ARIMA(p, d, q) × (P,D,Q)s Time Domain →ARIMA

Series:

p: AR order

d: Difference

q: MA order

P: SAR order

D: Seasonal Difference

Q: SMA order

season: s

forecasts: h

use .1 guess(y/n): y

selector(AIC,AICc, BIC, FPEL, AICL): AICc

Save →Residuals Save →Predicted

10. ARIMA Transformation Transform →Transform →ARIMA Residual

Series:

p: AR order

d: Difference

q: MA order

P: SAR order

D: Seasonal Difference

Q: SMA order

season: s