
UC San Diego Recent Work

Title: Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions

Permalink: https://escholarship.org/uc/item/67h5s74t

Authors: Pan, Li; Politis, Dimitris N.

Publication Date: 2014

License: https://creativecommons.org/licenses/by/4.0/

Peer reviewed


Bootstrap prediction intervals for linear, nonlinear and nonparametric autoregressions

Li Pan and Dimitris N. Politis∗

Li Pan
Department of Mathematics
University of California–San Diego
La Jolla, CA 92093-0112, USA
e-mail: [email protected]

Dimitris N. Politis
Department of Mathematics
University of California–San Diego
La Jolla, CA 92093-0112, USA
e-mail: [email protected]

Abstract: In order to construct prediction intervals without the cumbersome—and typically unjustifiable—assumption of Gaussianity, some form of resampling is necessary. The regression set-up has been well-studied in the literature but prediction faces additional difficulties. The paper at hand focuses on time series that can be modeled as linear, nonlinear or nonparametric autoregressions, and develops a coherent methodology for the construction of bootstrap prediction intervals. Forward and backward bootstrap methods using predictive and fitted residuals are introduced and compared. We present detailed algorithms for these different models and show that the bootstrap intervals manage to capture both sources of variability, namely the innovation error as well as the estimation error. In simulations, we compare the prediction intervals associated with different methods in terms of their achieved coverage level and length of interval.

Keywords and phrases: Confidence intervals, time series.

1. Introduction

Statistical inference is not considered complete if it is not accompanied by a measure of its inherent accuracy. With point estimators, the accuracy is measured either by a standard error or a confidence interval. With (point) predictors, the accuracy is measured either by the predictor error or by a prediction interval. In the setting of an i.i.d. (independent and identically distributed) sample, the problem of prediction is not interesting. However, when the i.i.d. assumption no longer holds, the prediction problem is both important and intriguing; see Geisser (1993)[20] for an introduction. Typical situations where the i.i.d. assumption breaks down include regression and time series. The literature on predictive intervals in regression is not large; see e.g. Carroll and Ruppert (1991)[12], Patel (1989)[32], Schmoyer (1992)[36] and the references therein. Note that to avoid the cumbersome (and typically unjustifiable) assumption of Gaussianity, some form of resampling is necessary. The residual-based bootstrap in regression is able to capture the predictor variability due to errors in model estimation. Nevertheless, bootstrap prediction intervals in regression are often characterized by finite-sample undercoverage. As a remedy, Stine (1985)[37] suggested resampling the studentized residuals but this modification does not fully correct the problem; see the discussion in Olive (2007)[30]. Politis (2013)[33] recently proposed the use of predictive (as opposed to fitted) residuals in resampling, which greatly alleviates the finite-sample undercoverage.

∗Research partially supported by NSF grants DMS 13-08319 and DMS 12-23137.

Autoregressive (AR) time series models, be it linear, nonlinear, or nonparametric, have a formal resemblance to the analogous regression models. Indeed, AR models can typically be successfully fitted by the same methods used to estimate a regression, e.g., ordinary Least Squares (LS) regression methods for parametric models, and scatterplot smoothing for nonparametric ones. The practitioner has only to be careful regarding the standard errors of the regression estimates but the model-based, i.e., residual-based, bootstrap should in principle be able to capture those. Therefore, it is not surprising that model-based resampling for regression can be extended to model-based resampling for autoregression. Indeed, standard errors and confidence intervals based on resampling the residuals from a fitted AR model have been among the first bootstrap approaches for time series; cf. Freedman (1984)[19], Efron and Tibshirani (1986)[15], and Bose (1988)[6]. However, the situation as regards prediction intervals is not as clear; for example, the conditional nature of predictive inference in time series poses a difficulty. There are several papers on prediction intervals for linear AR models but the literature seems scattered and there are many open questions: (a) how to implement the model-based bootstrap for prediction, i.e., how to generate bootstrap series; (b) how to construct prediction intervals given the availability of many bootstrap series already generated; and lastly (c) how to evaluate the asymptotic validity of a prediction interval. In addition, little seems to be known regarding prediction intervals for nonlinear and nonparametric autoregressions. In the paper at hand we attempt to give answers to the above, and provide a comprehensive approach towards bootstrap prediction intervals for linear, nonlinear, or nonparametric autoregressions. The models we will consider are of the general form:

• AR model with homoscedastic errors

$X_t = m(X_{t-1}, \ldots, X_{t-p}) + \epsilon_t \qquad (1.1)$

• AR model with heteroscedastic errors

$X_t = m(X_{t-1}, \ldots, X_{t-p}) + \sigma(X_{t-1}, \ldots, X_{t-p})\,\epsilon_t. \qquad (1.2)$

In the above, $m(\cdot)$ and $\sigma(\cdot)$ are unknown; if they are assumed to belong to a finite-dimensional, parametric family of functions, then the above describe a linear or nonlinear AR model. If $m(\cdot)$ and $\sigma(\cdot)$ are only assumed to belong to a smoothness class, then the above models describe a nonparametric autoregression. Regarding the errors, the following assumption is made:

$\epsilon_1, \epsilon_2, \cdots$ are i.i.d. $(0, \sigma^2)$, and such that $\epsilon_t$ is independent from $\{X_s, s < t\}$ for all $t$; $\qquad (1.3)$

in conjunction with model (1.2), we must further assume that $\sigma^2 = 1$ for identifiability. Note that under either model (1.1) or (1.2), the causality assumption (1.3) ensures that $E(X_t | \{X_s, s < t\}) = m(X_{t-1}, \ldots, X_{t-p})$ gives the optimal predictor of $X_t$ given $\{X_s, s < t\}$; here optimality is with respect to Mean Squared Error (MSE) of prediction. Section 2 describes the foundations of our approach. Pseudo-series can be generated by either a forward or backward bootstrap, using either fitted or predictive residuals—see Section 2.1 for a discussion. Predictive roots are defined in Section 2.2 while Sections 2.3 and 2.4 discuss notions of asymptotic validity. Section 3 goes in depth as regards bootstrap prediction intervals for linear AR models. Section 4 addresses the nonlinear case using two popular nonlinear models as concrete examples. Finally, Section 5 introduces bootstrap prediction intervals for nonparametric autoregressions. Appendix A contains some technical proofs. A short conclusions section recapitulates the main findings, making the point that the forward bootstrap with fitted or predictive residuals serves as the unifying principle across all types of AR models: linear, nonlinear or nonparametric.

2. Bootstrap prediction intervals: laying the foundation

2.1. Forward and backward bootstrap for prediction

As previously mentioned, an autoregression can be formally viewed as regression. However, in prediction with an AR(p) model, linear or nonlinear, an additional difficulty is that the one-step-ahead prediction is done conditionally on the last $p$ observed values that are themselves random. To fix ideas, suppose $X_1, \cdots, X_n$ are from the linear AR(1) model: $X_t = \phi_1 X_{t-1} + \epsilon_t$ where $|\phi_1| < 1$ and the $\epsilon_t$ are i.i.d. with mean zero. Given the data, the MSE-optimal predictor of $X_{n+1}$ is $\phi_1 X_n$, which is approximated in practice by plugging in an estimator, say $\hat\phi_1$, for $\phi_1$. Generating bootstrap series $X_1^*, X_2^*, \cdots$ from the fitted AR model enables us to capture the variability of $\hat\phi_1$ when the latter is re-estimated from bootstrap datasets such as $X_1^*, \cdots, X_n^*$.

For the application to prediction intervals, note that the bootstrap also allows us to generate $X_{n+1}^*$ so that the statistical accuracy of the predictor $\hat\phi_1 X_n$ can be gauged. However, none of these bootstrap series will have their last value $X_n^*$ exactly equal to the original value $X_n$ as needed for prediction purposes. Herein lies the problem, since the behavior of the predictor $\hat\phi_1 X_n$ needs to be captured conditionally on the original value $X_n$.

To avoid this difficulty, Thombs and Schucany (1990)[39] proposed to generate the bootstrap data $X_1^*, \cdots, X_n^*$ going backwards from the last value that is fixed at $X_n^* = X_n$. This is the backward bootstrap method that was revisited by Breidt, Davis and Dunsmuir (1995)[10] who gave the correct algorithm for finding the backward errors. Note that the generation of $X_{n+1}^*$ is still done in a forward fashion using the fitted AR model conditionally on the value $X_n$.

Nevertheless, the natural way autoregressions evolve is forward in time, i.e., given $X_{t-1}$, the next observation is generated as $X_t = \phi_1 X_{t-1} + \epsilon_t$, and so on. Thus, it is intuitive to construct bootstrap procedures that run forward in time, i.e., given $X_{t-1}^*$, the next bootstrap observation is generated as

$X_t^* = \hat\phi_1 X_{t-1}^* + \epsilon_t^*, \qquad (2.1)$

and so on. Indeed, most (if not all) of the literature on bootstrap confidence intervals for AR models uses the natural time order to generate bootstrap series. It would be nice to be able to build upon this large body of work in order to construct prediction intervals. However, recall that predictive inference is to be conducted conditionally on the last value $X_n$ in order to be able to place prediction bounds around the point predictor $\hat\phi_1 X_n$. So how can one ensure that $X_n^* = X_n$ so that $X_{n+1}^* = \hat\phi_1 X_n + \epsilon_{n+1}^*$?

Aided by the additive structure of the AR model, it is possible to "have our cake and eat it too", i.e., generate bootstrap series forward in time but also ensure that $X_{n+1}^*$ is constructed correctly. This procedure will be called the forward bootstrap method for prediction intervals, and comprises two steps:

A. Choose a starting value $X_0^*$ appropriately, e.g., choose it at random from one of the original data $X_1, \cdots, X_n$. Then, use recursion (2.1) for $t = 1, 2, \ldots, n$ in order to generate bootstrap data $X_1^*, \cdots, X_n^*$. Re-compute the statistic of interest (in this case $\hat\phi_1$) from the bootstrap data $X_1^*, \cdots, X_n^*$ to obtain the bootstrap statistic $\hat\phi_1^*$.

B. Re-define the last value in the bootstrap world, i.e., let $X_n^* = X_n$. Compute the one-step ahead bootstrap predictor $\hat X_{n+1}^* = \hat\phi_1^* X_n$, and also generate the future bootstrap observation $X_{n+1}^* = \hat\phi_1 X_n + \epsilon_{n+1}^*$.

The above algorithm works because the two constituents of the prediction error $X_{n+1} - \hat X_{n+1} = (\phi_1 X_n - \hat\phi_1 X_n) + \epsilon_{n+1}$, i.e., the estimation error $(\phi_1 X_n - \hat\phi_1 X_n)$ and the innovation error $\epsilon_{n+1}$, are independent, and the same is true in the bootstrap world.
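To make the two-step procedure concrete, here is a minimal Python sketch of the forward bootstrap for the AR(1) example above, using fitted residuals and unstudentized predictive roots. The function and variable names are ours, the intercept is omitted as in the example, and this is only an illustration of the scheme, not the authors' own implementation.

```python
import numpy as np

def forward_bootstrap_ar1_interval(x, h=1, B=1000, alpha=0.05, rng=None):
    """Minimal sketch of the forward bootstrap (steps A and B) for an AR(1),
    using centered fitted residuals and the predictive-root method."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = len(x)
    # LS estimate of phi_1 from the scatterplot of x_t on x_{t-1}
    phi_hat = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)
    resid = x[1:] - phi_hat * x[:-1]
    resid = resid - resid.mean()                  # centered fitted residuals
    x_pred = phi_hat ** h * x[-1]                 # point predictor of X_{n+h}
    roots = np.empty(B)
    for b in range(B):
        # Step A: generate a bootstrap series forward in time from a random start
        xb = np.empty(n)
        xb[0] = rng.choice(x)
        eps = rng.choice(resid, size=n)
        for t in range(1, n):
            xb[t] = phi_hat * xb[t - 1] + eps[t]
        phi_star = np.sum(xb[1:] * xb[:-1]) / np.sum(xb[:-1] ** 2)
        # Step B: re-define the last value to the observed x_n, then go h steps ahead
        x_fut = x[-1]
        for _ in range(h):
            x_fut = phi_hat * x_fut + rng.choice(resid)
        roots[b] = x_fut - phi_star ** h * x[-1]  # bootstrap predictive root
    q_lo, q_hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    return x_pred + q_lo, x_pred + q_hi
```

The returned pair is the equal-tailed interval based on the quantiles of the bootstrap predictive roots; this construction is formalized for general AR(p) models in Section 3.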

As stated above, the algorithm is specific to an AR(1) model but its extension to higher-order models is straightforward and will be given in the sequel. Indeed, the forward bootstrap is the method that can be immediately generalized to apply to nonlinear and nonparametric autoregressions as well, thus forming a unifying principle for treating all AR models. The forward bootstrap idea has been previously used for prediction intervals in linear AR models by Masarotto (1990)[26] and Pascual et al. (2004)[31] but with some important differences; for example, Masarotto (1990)[26] omits the important step B above—see Section 3.8 for a discussion.

Remark 2.1. Both aforementioned bootstrap ideas, backward and forward, hinge on an i.i.d. resampling of the residuals obtained from the fitted model. In the AR(1) case, the fitted residuals are obtained as $\hat\epsilon_t = X_t - \hat\phi_1 X_{t-1}$ for $t = 2, 3, \cdots, n$. Nevertheless, Politis (2013)[33] made a strong case that resampling the predictive residuals gives more accurate prediction intervals in regression, be it linear or nonparametric. Section 3 defines a particular notion of predictive residuals in autoregression, and shows their potential benefit in constructing bootstrap prediction intervals.

2.2. Predictive roots and h-step ahead optimal prediction

Given the ability to generate bootstrap datasets using a valid resampling procedure, the question arises as to how to actually construct the prediction interval. Notably, in the related problem of confidence interval construction there are two main approaches: (a) the percentile approach along with the associated bias correction and acceleration expounded upon in Efron and Tibshirani (1994)[16]; and (b) the approach based on pivots and roots as in Bickel and Freedman (1981)[3], Beran (1984)[2], Hall (1992)[22], and Politis, Romano and Wolf (1999)[34]. Both approaches are popular although the latter is more conducive to theoretical analysis. Politis (2013)[33] gave the definition of 'predictive roots' to be used in order to construct prediction intervals in regression. We will extend this idea to autoregression.

Let $X_1, \cdots, X_n$ be an observed stretch of a time series that follows a stationary autoregressive model of order $p$, i.e., model (1.1) or (1.2); the autoregression can be linear, nonlinear or nonparametric. The objective is a prediction interval for the h-step ahead value $X_{n+h}$ for some integer $h \ge 1$; the one-step ahead case is, of course, the most basic.

Denote by $\hat X_{n+h}$ the point predictor of $X_{n+h}$ based on the data $X_1, \cdots, X_n$; since $\hat X_{n+h}$ is a function of the data, we can write $\hat X_{n+h} = \Pi(X_1, \cdots, X_n)$. Let $\hat V_n^2$ be an estimate of $Var(X_{n+h} - \hat X_{n+h} \mid X_1, \cdots, X_n)$, which is the conditional variance in h-step ahead prediction; since $\hat V_n$ is a function of the data, we denote $\hat V_n = V(X_1, \cdots, X_n)$.

Definition 2.1. Predictive root and studentized predictive root. The h-step ahead predictive root is defined as $X_{n+h} - \hat X_{n+h}$, i.e., it is the error in the h-step ahead prediction. The studentized predictive root is $(X_{n+h} - \hat X_{n+h})/\hat V_n$.

Given a bootstrap pseudo-series $X_1^*, \cdots, X_n^*$, analogs of the aforementioned quantities can be defined, i.e., $\hat X_{n+h}^* = \Pi(X_1^*, \cdots, X_n^*)$ and $\hat V_n^* = V(X_1^*, \cdots, X_n^*)$. We can then similarly define the bootstrap predictive roots.

Definition 2.2. Bootstrap predictive root and studentized bootstrap predictive root. The bootstrap predictive root is defined as $X_{n+h}^* - \hat X_{n+h}^*$. The studentized bootstrap predictive root is $(X_{n+h}^* - \hat X_{n+h}^*)/\hat V_n^*$.

2.3. Prediction intervals and asymptotic validity

Given the data $X_1, \cdots, X_n$, our goal is to construct a prediction interval that will contain the future value $X_{n+h}$ with a prespecified coverage probability. With an AR(p) model, linear or nonlinear, the predictor will be a function of the last $p$ data points, i.e., $X_{n-p+1}, \cdots, X_n$. Hence the prediction interval's coverage probability should be interpreted as conditional probability given $X_{n-p+1}, \cdots, X_n$.

Definition 2.3. Asymptotic validity of prediction intervals. Let $L_n$, $U_n$ be functions of the data $X_1, \cdots, X_n$. The interval $[L_n, U_n]$ will be called a $(1-\alpha)100\%$ asymptotically valid prediction interval for $X_{n+h}$ given $X_{n-p+1}, \cdots, X_n$ if

$P(L_n \le X_{n+h} \le U_n) \to 1 - \alpha \ \text{ as } n \to \infty \qquad (2.2)$

for all $(X_{n-p+1}, \cdots, X_n)$ in a set that has (unconditional) probability equal to one.

The probability $P$ in (2.2) should be interpreted as conditional probability given $X_{n-p+1}, \cdots, X_n$ although it is not explicitly denoted; hence, Definition 2.3 indicates conditional validity of the prediction interval $[L_n, U_n]$.

The salient point in all bootstrap algorithms that will be discussed is to use the bootstrap distribution of the (potentially studentized) bootstrap predictive root to estimate the true distribution of the (potentially studentized) predictive root. Bootstrap probabilities and expectations are usually denoted by $P^*$ and $E^*$, and they are understood to be conditional on the original data $X_1 = x_1, \cdots, X_n = x_n$. Since Definition 2.3 involves conditional validity, we will understand that $P^*$ and $E^*$ are also conditional on $X_{n-p+1}^* = x_{n-p+1}, \cdots, X_n^* = x_n$ when they are applied to 'future' events in the bootstrap world, i.e., events determined by $\{X_s^* \text{ for } s > n\}$; this is not restrictive since we will ensure that our bootstrap algorithms satisfy this requirement. For instance, both $P$ and $P^*$ in Remark 2.2 below represent probabilities conditional on $X_{n-p+1} = x_{n-p+1}, \cdots, X_n = x_n$ and $X_{n-p+1}^* = x_{n-p+1}, \cdots, X_n^* = x_n$ respectively.

Remark 2.2. Suppose the (conditional) probability $P(X_{n+h} - \hat X_{n+h} \le a)$ is a continuous function of $a$ in the limit as $n \to \infty$. If one can show that

$\sup_a \big| P(X_{n+h} - \hat X_{n+h} \le a) - P^*(X_{n+h}^* - \hat X_{n+h}^* \le a) \big| \xrightarrow{P} 0,$

then standard results imply that the quantiles of $P^*(X_{n+h}^* - \hat X_{n+h}^* \le a)$ can be used to consistently estimate the quantiles of $P(X_{n+h} - \hat X_{n+h} \le a)$, thus leading to asymptotically valid prediction intervals. Similarly, if one wants to construct asymptotically valid bootstrap prediction intervals based on studentized predictive roots, it suffices to show that

$\sup_a \left| P\!\left( \frac{X_{n+h} - \hat X_{n+h}}{\hat V_n} \le a \right) - P^*\!\left( \frac{X_{n+h}^* - \hat X_{n+h}^*}{\hat V_n^*} \le a \right) \right| \xrightarrow{P} 0.$

2.4. Asymptotic pertinence of bootstrap prediction intervals

Asymptotic validity is a fundamental property but it does not tell the whole story. Prediction intervals are particularly useful if they can also capture the uncertainty involved in model estimation although the latter is asymptotically negligible.

To give a concrete example, consider the simple case where $X_1, X_2, \cdots$ are i.i.d. $N(\mu, \sigma^2)$; this is a special case of an AR model with no dependence present. Given the data $X_1, \cdots, X_n$, we estimate the unknown $\mu, \sigma^2$ by the sample mean and variance $\hat\mu, \hat\sigma^2$ respectively. Then, the exact normal-theory $(1-\alpha)100\%$ prediction interval for $X_{n+h}$ is given by

$\hat\mu \pm t_{n-1}(\alpha/2)\,\hat\sigma\,\sqrt{1 + n^{-1}}. \qquad (2.3)$

One could use the standard normal quantile $z(\alpha/2)$ instead of $t_{n-1}(\alpha/2)$, i.e., construct the prediction interval:

$\hat\mu \pm z(\alpha/2)\,\hat\sigma\,\sqrt{1 + n^{-1}}. \qquad (2.4)$

Since $\sqrt{1 + n^{-1}} \approx 1$ for large $n$, an even simpler prediction interval is available:

$\hat\mu \pm z(\alpha/2)\,\hat\sigma. \qquad (2.5)$
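As a small numerical illustration of the three candidates (a sketch assuming numpy and scipy are available; the function name is ours), one can compute (2.3)–(2.5) side by side and observe that the naive interval (2.5) is the shortest:

```python
import numpy as np
from scipy import stats

def normal_theory_intervals(x, alpha=0.05):
    """Compute the exact interval (2.3), the 'pertinent' interval (2.4) and the
    naive interval (2.5) for i.i.d. N(mu, sigma^2) data x."""
    n = len(x)
    mu_hat = np.mean(x)
    sigma_hat = np.std(x, ddof=1)                 # sample standard deviation
    t_q = stats.t.ppf(1 - alpha / 2, df=n - 1)    # t_{n-1}(alpha/2)
    z_q = stats.norm.ppf(1 - alpha / 2)           # z(alpha/2)
    w = np.sqrt(1 + 1 / n)                        # sqrt(1 + n^{-1})
    exact     = (mu_hat - t_q * sigma_hat * w, mu_hat + t_q * sigma_hat * w)  # (2.3)
    pertinent = (mu_hat - z_q * sigma_hat * w, mu_hat + z_q * sigma_hat * w)  # (2.4)
    naive     = (mu_hat - z_q * sigma_hat,     mu_hat + z_q * sigma_hat)      # (2.5)
    return exact, pertinent, naive
```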

Notably, all three above prediction intervals are asymptotically valid in the sense of Definition 2.3. Nevertheless, as discussed in Politis (2013)[33], interval (2.5) can be called naive since it fails to take into account the variability that results from the error in estimating the theoretical predictor $\mu$ by $\hat\mu$. The result is that, although asymptotically valid, interval (2.5) will be characterized by under-coverage in finite samples; see Geisser (1993) for an in-depth discussion.

By contrast, interval (2.4) does take into account the variability resulting from estimating the theoretical predictor. Therefore, interval (2.4) deserves to be called something stronger than asymptotically valid; we will call it pertinent to indicate that it asymptotically captures all three elements of the exact interval (2.3), namely:
(i) the quantile $t_{n-1}(\alpha/2)$ associated with the studentized root;
(ii) the error variance $\sigma^2$; and
(iii) the variability associated with the estimated parameters, i.e., the factor $\sqrt{1 + n^{-1}}$.

In general, an exact interval analogous to (2.3) will not be available because of non-normality of the errors and/or nonlinearity of the optimal predictor. A 'pertinent' interval such as (2.4) would be something to strive for. Notably, the bootstrap is an attempt to create prediction intervals that are asymptotically pertinent in that (a) they are able to capture the variability due to the estimated quantities—note that in AR(p) models the correction term inside the square root of (2.3) would be $O(p/n)$, not just $1/n$, and in nonparametric AR models it would be $O(1/(hn))$ with a bandwidth $h \to 0$ as $n \to \infty$, i.e., this correction is not so trivial; and (b) they are able to approximate well the necessary quantiles.

Interestingly, while interval (2.3) is based on the distribution of the studentized predictive root, the bootstrap can also work with nonstudentized roots; in this case, the bootstrap would attempt to estimate the product $t_{n-1}(\alpha/2)\,\hat\sigma$ as a whole instead of breaking it up into its two constituent pieces. Nevertheless, it may be the case that the studentized bootstrap leads to better approximations, and therefore more accurate prediction intervals, although the phenomenon is not as clear-cut as in the case of bootstrap confidence intervals. Finally, note that bootstrap prediction intervals are not restricted to be symmetric around the predictor like (2.3); thus, they may also capture the asymmetry of the predictive distribution, which is valuable in its own right.

To formally define the notion of pertinence, consider the homoscedastic model (1.1), and recall that eq. (1.3) implies that the MSE-optimal predictor of $X_{n+1}$ given $X_1 = x_1, \ldots, X_n = x_n$ is $m(x_n, \ldots, x_{n-p+1})$. Hence we set $\hat X_{n+1} = \hat m(x_n, \ldots, x_{n-p+1})$ where $\hat m(\cdot)$ is a consistent estimator of $m(\cdot)$; without loss of generality, assume that $\hat m(\cdot)$ has rate of convergence $a_n$, i.e., $a_n(\hat m(\cdot) - m(\cdot))$ has a well-defined, non-trivial asymptotic distribution where $a_n \to \infty$ as $n \to \infty$. Then, the predictive root can be written as

$X_{n+1} - \hat X_{n+1} = \epsilon_{n+1} + A_m \qquad (2.6)$

where $A_m = m(x_n, \ldots, x_{n-p+1}) - \hat m(x_n, \ldots, x_{n-p+1}) = O_p(1/a_n)$ represents the estimation error.

Similarly, the bootstrap predictive root can be written as

$X_{n+1}^* - \hat X_{n+1}^* = \epsilon_{n+1}^* + A_m^* \qquad (2.7)$

where $A_m^* = \hat m(x_n, \ldots, x_{n-p+1}) - \hat m^*(x_n, \ldots, x_{n-p+1})$. By construction, the model-based bootstrap should, in principle, be capable of asymptotically capturing both the pure prediction error, i.e., the distribution of $\epsilon_{n+1}$, as well as the estimation error. We are then led to the following definition.

Definition 2.4. Asymptotic pertinence of bootstrap prediction intervals under model (1.1). Consider a bootstrap prediction interval for $X_{n+1}$ that is based on approximating the distribution of the predictive root $X_{n+1} - \hat X_{n+1}$ of eq. (2.6) by the distribution of the bootstrap predictive root $X_{n+1}^* - \hat X_{n+1}^*$ of eq. (2.7). The interval will be called asymptotically pertinent provided the bootstrap satisfies the following three conditions as $n \to \infty$, conditionally on $X_{n-p+1} = x_{n-p+1}, \cdots, X_n = x_n$.
(i) $\sup_a |P(\epsilon_{n+1} \le a) - P^*(\epsilon_{n+1}^* \le a)| \xrightarrow{P} 0$, presupposing that the error distribution is continuous.
(ii) $|P(a_n A_m \le a) - P^*(a_n A_m^* \le a)| \xrightarrow{P} 0$ for some sequence $a_n \to \infty$, and for all points $a$ where the assumed nontrivial limit of $P(a_n A_m \le a)$ is continuous.
(iii) $\epsilon_{n+1}^*$ and $A_m^*$ are independent in the bootstrap world—as their analogs are in the real world due to the causality assumption (1.3).
Furthermore, the bootstrap prediction interval for $X_{n+1}$ that is based on approximating the distribution of the studentized predictive root $(X_{n+1} - \hat X_{n+1})/\hat V_n$ by the distribution of the bootstrap studentized predictive root $(X_{n+1}^* - \hat X_{n+1}^*)/\hat V_n^*$ will be called asymptotically pertinent if, in addition to (i)–(iii) above, the following also holds:
(iv) $\hat V_n / \hat V_n^* \xrightarrow{P} 1$.
For concreteness, the above focuses on one-step ahead prediction but analogous definitions can be constructed for h-step ahead prediction intervals using studentized or unstudentized predictive roots.

Remark 2.3. Note that asymptotic pertinence is a stronger property than asymptotic validity. In fact, under model (1.1), just part (i) of Definition 2.4 together with the consistency of $\hat m(\cdot)$ and $\hat m^*(\cdot)$, i.e., the fact that both $A_m$ and $A_m^*$ are $o_p(1)$ due to $a_n \to \infty$, is enough to imply asymptotic validity of the bootstrap prediction interval. Also note that part (ii) of Definition 2.4 is the condition needed in order to show that the bootstrap can yield asymptotically valid confidence intervals for the conditional mean $m(\cdot)$. In many cases in the literature, this condition has already been established; we can build upon this for the purpose of constructing pertinent prediction intervals.

Consider now the heteroscedastic model (1.2). Much of the above discussion carries over verbatim; for example, our predictor of $X_{n+1}$ given $X_1 = x_1, \ldots, X_n = x_n$ is still $\hat X_{n+1} = \hat m(x_n, \ldots, x_{n-p+1})$. The only difference is that the predictive root is

$X_{n+1} - \hat X_{n+1} = \sigma(x_n, \ldots, x_{n-p+1})\,\epsilon_{n+1} + A_m \qquad (2.8)$

and the bootstrap predictive root is

$X_{n+1}^* - \hat X_{n+1}^* = \hat\sigma(x_n, \ldots, x_{n-p+1})\,\epsilon_{n+1}^* + A_m^* \qquad (2.9)$

where $\hat\sigma(\cdot)$ is a (consistent) estimator of $\sigma(\cdot)$ that is employed in the bootstrap data generation mechanism. Hence, the following definition is immediate.

Definition 2.5. Asymptotic pertinence of bootstrap prediction intervals under model (1.2). Consider a bootstrap prediction interval for $X_{n+1}$ that is based on approximating the distribution of the predictive root $X_{n+1} - \hat X_{n+1}$ of eq. (2.8) by the distribution of the bootstrap predictive root $X_{n+1}^* - \hat X_{n+1}^*$ of eq. (2.9). The interval will be called asymptotically pertinent provided the bootstrap satisfies conditions (i)–(iii) of Definition 2.4 together with the additional consistency requirement:
(iv′) $\sigma(x_n, \ldots, x_{n-p+1}) - \hat\sigma(x_n, \ldots, x_{n-p+1}) \xrightarrow{P} 0$.
Furthermore, the bootstrap prediction interval for $X_{n+1}$ that is based on approximating the distribution of the studentized predictive root $(X_{n+1} - \hat X_{n+1})/\hat V_n$ by the distribution of the bootstrap studentized predictive root $(X_{n+1}^* - \hat X_{n+1}^*)/\hat V_n^*$ will be called asymptotically pertinent if, in addition, condition (iv) of Definition 2.4 also holds.

Remark 2.4. Taking into account that $A_m = o_p(1)$ as $n \to \infty$, a simple estimator for the (conditional) variance of the predictive root $X_{n+1} - \hat X_{n+1}$ under model (1.2) is $\hat V_n = \hat\sigma(x_n, \ldots, x_{n-p+1})$. Thus, in the case of one-step ahead prediction, condition (iv) of Definition 2.4 can be re-written as $\hat\sigma(x_n, \ldots, x_{n-p+1}) - \hat\sigma^*(x_n, \ldots, x_{n-p+1}) \xrightarrow{P} 0$, i.e., it is just a bootstrap version of condition (iv′) of Definition 2.5. As a matter of fact, resampling in the heteroscedastic model (1.2) entails using studentized residuals. In this case, the predictive root method becomes tantamount to the studentized predictive root method when the simple estimator $\hat V_n = \hat\sigma(x_n, \ldots, x_{n-p+1})$ is used; see Section 5.2 for more discussion.

2.5. Prediction interval accuracy: asymptotics vs. finite samples

The notion of 'pertinence' was defined in order to identify a prediction interval that has good finite-sample performance as it imitates all the constituent parts of an 'ideal' interval. Inevitably, the notion of 'pertinence' was defined via asymptotic requirements. Nevertheless, the asymptotics alone are still not very informative in terms of the interval's finite-sample accuracy, i.e., the attained coverage level.

Going back to the simple interval (2.3), note that we typically have $\hat\sigma^2 = \sigma^2 + O_p(1/\sqrt n)$ while the approximations $z(\alpha/2) \approx t_{n-1}(\alpha/2)$ and $\sqrt{1 + n^{-1}} \approx 1$ have a smaller error of order $O(1/n)$. So the determining factor for the asymptotics is the accuracy of the estimator $\hat\sigma^2$. However, there are many $\sqrt n$-convergent estimators of $\sigma^2$ having potentially very different finite-sample behavior. The default estimator is the sample variance of the fitted residuals but this tends to be downwardly biased; the predictive residuals—although asymptotically equivalent to the fitted residuals—lead to improved finite-sample performance as alluded to in Remark 2.1.

Similarly, the quantity in part (i) of Definition 2.4 is typically of order $O_p(1/\sqrt n)$. In general, this rate of convergence cannot be improved, and it dictates the rate of convergence for the coverage probability of eq. (2.2) under the homoscedastic model (1.1). Once again, using the empirical distribution of the predictive—as opposed to the fitted—residuals in order to generate the $\epsilon_t^*$ makes a huge practical difference in finite samples despite the fact that the two are asymptotically equivalent.

The most challenging situation in practice occurs in the set-up of the heteroscedastic model (1.2) with the conditional standard deviation $\sigma(\cdot)$ being unknown but assumed smooth. Here the rate of convergence of the coverage probability of eq. (2.2) will be dictated by the nonparametric rate of estimating $\sigma(\cdot)$, i.e., the rate of convergence of the quantity appearing in condition (iv′) of Definition 2.5; see Section 5.2 for more details. This is perhaps the only case where the asymptotics are informative as regards the finite-sample performance of the prediction interval, i.e., coverage levels are attained less accurately in the presence of heteroscedasticity of unknown functional form.

3. Bootstrap Prediction Intervals for Linear Autoregressions

Consider the strictly stationary, causal AR(p) model defined by the recursion

$X_t = \phi_0 + \sum_{j=1}^{p} \phi_j X_{t-j} + \epsilon_t \qquad (3.1)$

which is a special case of model (1.1) with the $\epsilon_t$ being i.i.d. with mean zero, variance $\sigma^2$ and distribution $F_\epsilon$. The assumed causality condition (1.3) is now tantamount to $\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p \ne 0$ for $|z| \le 1$. Denote by $\phi = (\phi_0, \phi_1, \phi_2, \cdots, \phi_p)'$ the vector of autoregressive parameters, and by $\hat\phi = (\hat\phi_0, \hat\phi_1, \cdots, \hat\phi_p)'$ and $\hat\phi(z) = 1 - \hat\phi_1 z - \cdots - \hat\phi_p z^p$ the respective estimates. Let $\hat X_t$ be the fitted value of $X_t$, i.e., $\hat X_t = \hat\phi_0 + \sum_{j=1}^{p} \hat\phi_j X_{t-j}$. Finally, let $Y_t = (X_t, X_{t-1}, \cdots, X_{t-p+1})'$ be the vector of the last $p$ observations up to $X_t$.

Stine (1987)[38] used a bootstrap method to estimate the prediction mean squared error of the estimated linear predictor of an AR(p) model with i.i.d. Gaussian errors. Relaxing the assumption of Gaussian errors, Thombs and Schucany (1990)[39] proposed a backward bootstrap method to find prediction intervals for linear autoregressions conditioned on the last $p$ observations; their method was described in Section 2.1. The backward bootstrap method was revisited by Breidt, Davis and Dunsmuir (1995)[10] who gave the correct algorithm for finding the backward errors. Masarotto (1990)[26] proposed a forward bootstrap method based on the studentized predictive root to obtain prediction intervals for AR(p) models. Notably, his method omits the crucial step B of the forward bootstrap method defined in Section 2.1. As a result, his intervals are not asymptotically pertinent since the basic premise of Definition 2.4 regarding the construction of the interval is not satisfied; however, his intervals are asymptotically valid because the omitted/distorted term has to do with the estimation error which vanishes asymptotically. Finally, Pascual et al. (2004)[31] proposed another forward bootstrap method and applied it to prediction intervals for both autoregressive as well as ARMA models; their intervals are constructed via an analog of the percentile method without considering predictive roots—see Section 3.8 for more discussion.

In the present section, we first give the detailed algorithms for constructing forward bootstrap prediction intervals using fitted and/or predictive residuals, and then prove the consistency of the predictive root method for prediction intervals. We then study the corresponding backward methods. We show how both backward and forward methods can be improved by introducing the predictive residuals. In simulations, we will see that the methods with predictive residuals have improved coverage levels compared to the methods with fitted residuals; this result is not unexpected since a similar phenomenon occurs in regression—cf. Politis (2013)[33]. In Section 3.8, we review alternative approaches to construct bootstrap prediction intervals, and compare them with ours.
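The causality condition $\phi(z) \ne 0$ for $|z| \le 1$ can be checked numerically by examining the roots of the fitted AR polynomial. The helper below is a minimal sketch (function name ours) of the kind of check that can be used, e.g., to discard bootstrap pseudo-series that lead to a non-causal $\hat\phi^*$, as mentioned in Remark 3.1 below.

```python
import numpy as np

def is_causal(phi):
    """Check phi(z) = 1 - phi_1 z - ... - phi_p z^p != 0 for |z| <= 1,
    where phi = (phi_1, ..., phi_p); the intercept phi_0 plays no role."""
    # numpy.roots expects coefficients from the highest power down to the constant
    coeffs = np.concatenate(([1.0], -np.asarray(phi, dtype=float)))[::-1]
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

# Example: is_causal([0.9]) -> True, is_causal([1.1]) -> False
```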

3.1. Forward Bootstrap Algorithm

As described in Section 2.1, the idea of the forward bootstrap method is that, given observations $X_1 = x_1, \cdots, X_n = x_n$, we can use the fitted AR recursion to generate bootstrap series "forward" in time starting from some initial conditions. This recursion stops when $n$ bootstrap data have been generated; to generate the $(n+1)$-th bootstrap point (and beyond), the recursion has to be re-started with different initial values that are fixed to be the last $p$ original observations. The details for estimating the coefficients, generating the bootstrap pseudo-data and constructing the prediction intervals using both fitted and predictive residuals are given below in Sections 3.1.1 and 3.1.2.

3.1.1. Forward Bootstrap with Fitted Residuals

Given a sample $\{x_1, \cdots, x_n\}$ from (3.1), the following are the steps needed to construct the prediction interval for the future value $X_{n+h}$ based on the predictive root method.

Algorithm 3.1. Forward bootstrap with fitted residuals (Ff)
1. Use all observations $x_1, \cdots, x_n$ to obtain the Least Squares (LS) estimators $\hat\phi = (\hat\phi_0, \hat\phi_1, \cdots, \hat\phi_p)'$ by fitting the following regression model:

$\begin{pmatrix} x_n \\ x_{n-1} \\ \vdots \\ x_{p+1} \end{pmatrix} = \begin{pmatrix} 1 & x_{n-1} & \cdots & x_{n-p} \\ 1 & x_{n-2} & \cdots & x_{n-p-1} \\ \vdots & \vdots & & \vdots \\ 1 & x_p & \cdots & x_1 \end{pmatrix} \begin{pmatrix} \phi_0 \\ \phi_1 \\ \vdots \\ \phi_p \end{pmatrix} + \begin{pmatrix} \epsilon_n \\ \epsilon_{n-1} \\ \vdots \\ \epsilon_{p+1} \end{pmatrix}. \qquad (3.2)$

2. For t = p + 1, ··· , n, compute the fitted value and fitted residuals:

$\hat x_t = \hat\phi_0 + \sum_{j=1}^{p} \hat\phi_j x_{t-j}, \quad \text{and} \quad \hat\epsilon_t = x_t - \hat x_t.$

3. Center the fitted residuals: let $r_t = \hat\epsilon_t - \bar{\hat\epsilon}$ for $t = p+1, \cdots, n$, where $\bar{\hat\epsilon} = (n-p)^{-1}\sum_{t=p+1}^{n}\hat\epsilon_t$; let the empirical distribution of the $r_t$ be denoted by $\hat F_n$.
(a) Draw bootstrap pseudo-residuals $\{\epsilon_t^*, t \ge 1\}$ i.i.d. from $\hat F_n$.
(b) To ensure stationarity of the bootstrap series, generate $n+m$ pseudo-data for some large positive $m$ and then discard the first $m$ data. Let $(u_1^*, \cdots, u_p^*)$ be chosen at random from the set of $p$-tuplets $\{(x_k, \cdots, x_{k+p-1})$ for $k = 1, \cdots, n-p+1\}$; then generate $\{u_t^*, t \ge p+1\}$ by the recursion:

$u_t^* = \hat\phi_0 + \sum_{j=1}^{p} \hat\phi_j u_{t-j}^* + \epsilon_t^*, \quad \text{for } t = p+1, \cdots, n+m.$

Then define $x_t^* = u_{m+t}^*$ for $t = 1, 2, \cdots, n$.
(c) Based on the pseudo-data $\{x_1^*, \cdots, x_n^*\}$, re-estimate the coefficients $\phi$ by the LS estimator $\hat\phi^* = (\hat\phi_0^*, \hat\phi_1^*, \cdots, \hat\phi_p^*)'$ as in step 1. Then compute the future bootstrap predicted values $\hat x_{n+1}^*, \cdots, \hat x_{n+h}^*$ by the recursion:

$\hat x_{n+t}^* = \hat\phi_0^* + \sum_{j=1}^{p} \hat\phi_j^* \hat x_{n+t-j}^* \quad \text{for } t = 1, \cdots, h$

where $\hat x_{n+t-j}^* = x_{n+t-j}$ when $t \le j$.
(d) In order to conduct conditionally valid predictive inference, re-define the last $p$ observations to match the original observed values, i.e., let $x_{n-p+1}^* = x_{n-p+1}, \cdots, x_n^* = x_n$. Then, generate the future bootstrap observations $x_{n+1}^*, x_{n+2}^*, \cdots, x_{n+h}^*$ by the recursion:

$x_{n+t}^* = \hat\phi_0 + \sum_{j=1}^{p} \hat\phi_j x_{n+t-j}^* + \epsilon_{n+t}^*, \quad \text{for } t = 1, 2, \cdots, h.$

(e) Calculate a bootstrap root replicate as $x_{n+h}^* - \hat x_{n+h}^*$.
4. Steps (a)–(e) above are repeated B times, and the B bootstrap replicates are collected in the form of an empirical distribution whose α-quantile is denoted $q(\alpha)$.
5. Compute the predicted future values $\hat x_{n+1}, \cdots, \hat x_{n+h}$ by the following recursion:

$\hat x_{n+t} = \hat\phi_0 + \sum_{j=1}^{p} \hat\phi_j \hat x_{n+t-j} \quad \text{for } t = 1, \cdots, h$

where $\hat x_{n+t-j} = x_{n+t-j}$ for $t \le j$.
6. Construct the $(1-\alpha)100\%$ equal-tailed prediction interval for $X_{n+h}$ as

$[\hat x_{n+h} + q(\alpha/2),\ \hat x_{n+h} + q(1-\alpha/2)]. \qquad (3.3)$
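A condensed Python sketch of Algorithm 3.1 follows. The helper names (fit_ar_ls, ar_forecast) are ours, input validation and causality checks are omitted, and the burn-in length m is a user choice; this is an illustration of the steps above rather than a reference implementation.

```python
import numpy as np

def fit_ar_ls(x, p):
    """LS fit of the AR(p) regression (3.2); returns (phi_hat, fitted residuals)."""
    n = len(x)
    X = np.column_stack([np.ones(n - p)] + [x[p - j:n - j] for j in range(1, p + 1)])
    y = x[p:]
    phi = np.linalg.lstsq(X, y, rcond=None)[0]      # (phi_0, phi_1, ..., phi_p)
    return phi, y - X @ phi

def ar_forecast(phi, history, h):
    """Iterate the fitted AR recursion h steps ahead with zero future errors."""
    hist = list(history)                            # last p values, most recent last
    p = len(phi) - 1
    for _ in range(h):
        hist.append(phi[0] + np.dot(phi[1:], hist[::-1][:p]))
    return hist[-1]                                 # h-step ahead point prediction

def forward_bootstrap_ff(x, p, h=1, B=1000, alpha=0.05, m=100, rng=None):
    """Sketch of Algorithm 3.1: forward bootstrap with fitted residuals (Ff)."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = len(x)
    phi_hat, resid = fit_ar_ls(x, p)                # steps 1-2
    r = resid - resid.mean()                        # step 3: centered fitted residuals
    x_pred = ar_forecast(phi_hat, x[-p:], h)        # step 5: point predictor of X_{n+h}
    roots = np.empty(B)
    for b in range(B):
        # 3(a)-(b): generate n + m pseudo-data forward in time, keep the last n
        k = rng.integers(0, n - p + 1)
        series = list(x[k:k + p])
        for _ in range(n + m - p):
            series.append(phi_hat[0] + np.dot(phi_hat[1:], series[::-1][:p])
                          + rng.choice(r))
        xb = np.array(series[-n:])
        # 3(c): re-estimate the coefficients and compute the bootstrap predictor
        phi_star, _ = fit_ar_ls(xb, p)
        x_pred_star = ar_forecast(phi_star, x[-p:], h)
        # 3(d): last p values re-set to the observed ones; generate x*_{n+h}
        fut = list(x[-p:])
        for _ in range(h):
            fut.append(phi_hat[0] + np.dot(phi_hat[1:], fut[::-1][:p])
                       + rng.choice(r))
        roots[b] = fut[-1] - x_pred_star            # 3(e): bootstrap predictive root
    q_lo, q_hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])   # step 4
    return x_pred + q_lo, x_pred + q_hi             # step 6: interval (3.3)
```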

3.1.2. Forward Bootstrap with Predictive Residuals

Motivated by Politis (2013)[33], we consider using predictive, as opposed to fitted, residuals for the bootstrap. We define the predictive residuals in the AR context as $\hat\epsilon_t^{(t)} = x_t - \hat x_t^{(t)}$ where $\hat x_t^{(t)}$ is computed from the delete-$x_t$ data set, i.e., the available data for the scatterplot of $x_k$ vs. $\{x_{k-p}, \cdots, x_{k-1}\}$ over which the LS fitting takes place excludes the single point that corresponds to $k = t$. The forward bootstrap procedure with predictive residuals is similar to Algorithm 3.1; the only difference is in step 2.

Algorithm 3.2. Forward bootstrap with predictive residuals (Fp)
1 Same as step 1 in Algorithm 3.1.
2 Use the delete-$x_t$ dataset to compute the LS estimator

$\hat\phi^{(t)} = (\hat\phi_0^{(t)}, \hat\phi_1^{(t)}, \cdots, \hat\phi_p^{(t)})'$

as in step 1, i.e., compute $\hat\phi^{(t)}$ by changing regression model (3.2) as follows: delete the row containing $x_t$ on the left hand side of (3.2), delete the row $(1, x_{t-1}, \cdots, x_{t-p})$ from the design matrix, delete $\epsilon_t$ from the vector of errors, and change $\phi$ to $\phi^{(t)}$ on the right hand side. Then, calculate the delete-$x_t$ fitted values:

$\hat x_t^{(t)} = \hat\phi_0^{(t)} + \sum_{j=1}^{p} \hat\phi_j^{(t)} x_{t-j}, \quad \text{for } t = p+1, \cdots, n$

and the predictive residuals: $\hat\epsilon_t^{(t)} = x_t - \hat x_t^{(t)}$ for $t = p+1, \cdots, n$ (a short code sketch of this step is given after Remark 3.1 below).
3-6 Change the $\hat\epsilon_t$ into $\hat\epsilon_t^{(t)}$; the rest is the same as in Algorithm 3.1.

Remark 3.1. The LS estimator $\hat\phi$ is asymptotically equivalent to the popular Yule-Walker (YW) estimators for fitting AR models. The advantage of the YW estimators is that they almost surely lead to a causal fitted model. By contrast, the LS estimator $\hat\phi$ is only asymptotically causal but it is completely scatterplot-based, and thus convenient in terms of our notion of predictive residuals. Indeed, for any bootstrap method using fitted residuals (studentized or not), e.g., the forward Algorithm 3.1 or the backward Algorithm 3.5 in the sequel, we could equally employ the Yule-Walker instead of the LS estimators. But for methods using our notion of predictive residuals, it is most convenient to be able to employ the LS estimators. To elaborate, consider the possibility that for our given dataset the LS estimator $\hat\phi$ turns out to be not causal; we then have two options:

• Use a bootstrap algorithm with fitted residuals based on the YW estimator $\tilde\phi = \hat\Gamma_p^{-1}\hat\gamma_p$ where $\hat\Gamma_p$ is the $p \times p$ matrix with $ij$-th element $\hat\gamma(i-j)$, $\hat\gamma_p$ is a vector with $i$-th element $\hat\gamma(i) = n^{-1}\sum_{k=1}^{n-|i|}(X_k - \bar X)(X_{k+|i|} - \bar X)$, and $\bar X = n^{-1}\sum_{k=1}^{n} X_k$.
• To use the bootstrap with predictive residuals, we can define the delete-$x_t$ estimates of the autocovariance as $\hat\gamma^{(t)}(i) = n^{-1}\sum_k (X_k - \bar X^{(t)})(X_{k+|i|} - \bar X^{(t)})$ where the summation runs over $k = 1, \ldots, n-|i|$ but with $k \ne t$ and $k \ne t-|i|$, and $\bar X^{(t)} = (n-1)^{-1}\sum_{k \ne t} X_k$. Then define the delete-$x_t$ YW estimator $\tilde\phi^{(t)} = (\tilde\phi_0^{(t)}, \tilde\phi_1^{(t)}, \cdots, \tilde\phi_p^{(t)})' = (\hat\Gamma_p^{(t)})^{-1}\hat\gamma_p^{(t)}$ where the matrix $\hat\Gamma_p^{(t)}$ has $ij$-th element $\hat\gamma^{(t)}(i-j)$, and the vector $\hat\gamma_p^{(t)}$ has $i$-th element $\hat\gamma^{(t)}(i)$. Now, barring positive definiteness issues, we can carry out Algorithm 3.2 with $\tilde\phi^{(t)}$, $\tilde x_t^{(t)} = \tilde\phi_0^{(t)} + \sum_{j=1}^{p}\tilde\phi_j^{(t)} x_{t-j}$ and $\tilde\epsilon_t^{(t)} = x_t - \tilde x_t^{(t)}$ instead of $\hat\phi^{(t)}$, $\hat x_t^{(t)}$ and $\hat\epsilon_t^{(t)}$ respectively.

Note that if the LS estimator $\hat\phi$ is causal—as is hopefully the case—we can use either fitted or predictive residuals but will need to discard all bootstrap pseudo-series that lead to a non-causal $\hat\phi^*$; this is equally important for the Backward Bootstrap methods discussed in Section 3.4.
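Returning to the LS-based predictive residuals of Algorithm 3.2, step 2 amounts to $n-p$ delete-one least-squares fits. A minimal sketch (function name ours) is:

```python
import numpy as np

def predictive_residuals(x, p):
    """Sketch of step 2 of Algorithm 3.2: delete-x_t LS fits and predictive
    residuals hat-epsilon_t^{(t)} for t = p+1, ..., n."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    X = np.column_stack([np.ones(n - p)] + [x[p - j:n - j] for j in range(1, p + 1)])
    y = x[p:]
    out = np.empty(n - p)
    for i in range(n - p):
        keep = np.arange(n - p) != i                 # delete the row of x_t (t = p+1+i)
        phi_t = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        out[i] = y[i] - X[i] @ phi_t                 # predictive residual at time t
    return out
```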

3.2. Forward Studentized Bootstrap with Fitted Residuals

In the previous two subsections we have described the forward bootstrap based on predictive roots. However, as already mentioned, we can use studentized predictive roots instead; see Definition 2.1 and Remark 2.2. The forward bootstrap procedure with fitted and/or predictive residuals is similar to Algorithm 3.1; the only differences are in steps 3(e) and 6.

To describe it, let $\psi_j$ for $j = 0, 1, \cdots$ be the MA(∞) coefficients of the AR(p) model, i.e., $\psi_j$ is the coefficient associated with $z^j$ in the power series expansion of $\phi^{-1}(z)$ for $|z| \le 1$, defined by $1/\phi(z) = \psi_0 + \psi_1 z + \cdots \equiv \psi(z)$; the power series expansion is guaranteed by the causality assumption (1.3). It is then easy to see that the variance of the h-step ahead predictive root $X_{n+h} - \hat X_{n+h}$ is $\sigma^2 \sum_{j=0}^{h-1}\psi_j^2$. The latter can be interpreted as either conditional or unconditional variance since the two coincide in a linear AR(p) model.

Similarly, let $1/\hat\phi(z) = \hat\psi_0 + \hat\psi_1 z + \cdots \equiv \hat\psi(z)$, and $1/\hat\phi^*(z) = \hat\psi_0^* + \hat\psi_1^* z + \cdots \equiv \hat\psi^*(z)$. Denote by $\hat\sigma^2$ and $\hat\sigma^{*2}$ the sample variances of the fitted residuals and the bootstrap fitted residuals respectively; the latter are defined as $x_t^* - \hat x_t^*$ for $t = p+1, \cdots, n$.

Algorithm 3.3. Forward Studentized bootstrap with fitted residuals (FSf)
The algorithm is the same as Algorithm 3.1 except for steps 3(e) and 6 that should be replaced by the following steps:
3(e) Calculate a studentized bootstrap root replicate as

$\dfrac{x_{n+h}^* - \hat x_{n+h}^*}{\hat\sigma^* \big(\sum_{j=0}^{h-1}\hat\psi_j^{*2}\big)^{1/2}}.$

6 Construct the (1 − α)100% equal-tailed predictive interval for Xn+h as

$\Big[\hat x_{n+h} + \hat\sigma\big(\textstyle\sum_{j=0}^{h-1}\hat\psi_j^2\big)^{1/2} q(\alpha/2),\ \ \hat x_{n+h} + \hat\sigma\big(\textstyle\sum_{j=0}^{h-1}\hat\psi_j^2\big)^{1/2} q(1-\alpha/2)\Big] \qquad (3.4)$

where $q(\alpha)$ is the α-quantile of the empirical distribution of the B collected studentized bootstrap roots.
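The coefficients $\hat\psi_j$ (and $\hat\psi_j^*$) needed in step 3(e) and in interval (3.4) can be obtained recursively from the fitted AR coefficients via $\psi_0 = 1$ and $\psi_j = \sum_{k=1}^{\min(j,p)} \phi_k \psi_{j-k}$, which follows from $\psi(z)\phi(z) = 1$. A short sketch (helper names ours) is:

```python
import numpy as np

def psi_coefficients(phi, h):
    """psi_0, ..., psi_{h-1} from AR coefficients phi = (phi_1, ..., phi_p),
    i.e., the power-series expansion 1/phi(z) = psi_0 + psi_1 z + ..."""
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    psi = np.zeros(h)
    psi[0] = 1.0
    for j in range(1, h):
        psi[j] = sum(phi[k - 1] * psi[j - k] for k in range(1, min(j, p) + 1))
    return psi

def prediction_sd(phi, sigma_hat, h):
    """Standardizing factor sigma_hat * sqrt(sum_{j=0}^{h-1} psi_j^2)."""
    return sigma_hat * np.sqrt(np.sum(psi_coefficients(phi, h) ** 2))
```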

Remark 3.2. For all the algorithms introduced above, in step 3(d) we redefine the last $p$ values of the bootstrap pseudo-series to match the observed values in order to generate out-of-sample bootstrap data and/or predictors. If we calculate the future bootstrap predicted values and observations without fixing the last $p$ values of the bootstrap pseudo-series, i.e., if we omit step 3(d), then Algorithm 3.3 becomes identical to the method proposed by Masarotto (1990)[26].

3.2.1. Forward Studentized Bootstrap with Predictive Residuals

As mentioned before, we can resample the predictive—as opposed to the fitted—residuals; the algorithm is as follows.
Algorithm 3.4. Forward Studentized bootstrap with predictive residuals (FSp)

1 Same as step 1 in Algorithm 3.1.
2 Same as step 2 in Algorithm 3.2.
3-6 Change the $\hat\epsilon_t$ into $\hat\epsilon_t^{(t)}$; the rest is the same as in Algorithm 3.3.

Remark 3.3. As in the regression case discussed in Politis (2013)[33], the Fp method yields improved coverage as compared to the Ff method since predictive residuals are inflated as compared to fitted residuals. Interestingly, the FSp method is not much better than the FSf method in finite samples. The reason is that when we studentize the predictive residuals, the aforementioned inflation effect is offset by the simultaneously inflated bootstrap estimator $\hat\sigma^*$ in the denominator. In the Monte Carlo simulations of Section 3.7, we will see that the Fp, FSf and FSp methods have similarly good performance while the Ff method is the worst, exhibiting pronounced undercoverage.

3.3. Asymptotic Properties of Forward Bootstrap

We now discuss the asymptotic validity of the aforementioned forward bootstrap methods. First note that step 3(c) of the algorithm, which concerns the construction of $\hat\phi^*$, is identical to the related construction of the bootstrap statistic routinely used to derive confidence intervals for $\phi$; see e.g. Freedman (1984)[19].

Theorem 3.1 (Freedman (1984)[19]). Let $\{X_t\}$ be the causal AR(p) process (3.1) with $E\epsilon_t = 0$, $\mathrm{var}(\epsilon_t) = \sigma^2 > 0$ and $E|\epsilon_t|^4 < \infty$. Let $\{x_1, \cdots, x_n\}$ denote a realization from $\{X_t\}$. Then as $n \to \infty$,

$d_0\Big(\mathcal L\big(\sqrt n(\hat\phi - \phi)\big),\ \mathcal L^*\big(\sqrt n(\hat\phi^* - \hat\phi)\big)\Big) \xrightarrow{P} 0 \qquad (3.5)$

where $\mathcal L$ and $\mathcal L^*$ denote probability law in the real and the bootstrap world respectively, and $d_0$ is the Kolmogorov distance. For the next theorem, continuity (and twice differentiability) of the error distribution are assumed.

Theorem 3.2 (Boldin (1982)[4]). Let $\hat F_n$ be the empirical distribution of the fitted residuals $\{\hat\epsilon_t\}$ centered to mean zero. Let $F_\epsilon$ be the distribution of the errors $\epsilon_t$ satisfying the assumptions of Theorem 3.1 and $\sup_x |F_\epsilon''(x)| < \infty$. Then, for any integer $h \ge 1$,

$\sup_x |\hat F_n(x) - F_\epsilon(x)| = O_p(1/\sqrt n). \qquad (3.6)$

Recall the notation $Y_t = (X_t, X_{t-1}, \cdots, X_{t-p+1})'$. Then,

$X_{n+1}^* = (1,\ Y_n')\,\hat\phi + \epsilon_{n+1}^*$

$X_{n+1} = (1,\ Y_n')\,\phi + \epsilon_{n+1}.$

Using eq. (3.6) and Slutsky's Lemma (together with induction on h) shows that

$d_0\big(\mathcal L^*(X_{n+h}^*),\ \mathcal L(X_{n+h})\big) \xrightarrow{P} 0,$

from which it follows that the Ff prediction interval (3.3) is asymptotically valid. In view of Theorem 3.1, the stronger property of asymptotic pertinence also holds true.

Corollary 3.3. Under the assumptions of Theorem 3.1 and Theorem 3.2, the Ff prediction interval (3.3) is asymptotically pertinent.

We now move on to the Fp interval that is based on predictive residuals. The following lemma shows that the difference between fitted and predictive residuals is negligible asymptotically; still, the difference may be important in small samples.

Lemma 3.4. Under the assumptions of Theorem 3.1, $\hat\epsilon_t^{(t)} - \hat\epsilon_t = O_p(1/n)$.

The proof of Lemma 3.4 is given in the Appendix. The asymptotic equivalence of the fitted and predictive residuals immediately implies that the Fp prediction interval is also asymptotically pertinent.

Corollary 3.5. Under the assumptions of Theorem 3.1 and Theorem 3.2, the Fp prediction interval of Algorithm 3.2 is asymptotically pertinent.

3.4. Backward Bootstrap: Definition and Asymptotic Properties

The difference between the backward bootstrap and the forward bootstrap lies in the way they generate the bootstrap pseudo-data $X_1^*, \cdots, X_n^*$. The idea of starting from the last $p$ observations (that are given) and generating the bootstrap pseudo-data $\{X_{n-p}^*, \cdots, X_1^*\}$ backward in time using the backward representation

$\phi(B^{-1}) X_t = \phi_0 + w_t$

was first proposed by Thombs and Schucany (1990)[39] and improved/corrected by Breidt, Davis and Dunsmuir (1995)[10]; here, $B$ is the backward shift operator: $B^k X_t = X_{t-k}$, and $\{w_t\}$ is the backward noise defined by

$w_t = \dfrac{\phi(B^{-1})}{\phi(B)}\,\epsilon_t. \qquad (3.7)$

Thombs and Schucany (1990)[39] generated the fitted backward residuals $\hat w_t$ as $\hat w_t = x_t - \hat\phi_0 - \hat\phi_1 x_{t+1} - \cdots - \hat\phi_p x_{t+p}$, for $t = 1, 2, \cdots, n-p$. Then they fixed the last $p$ values of the data, and generated the pseudo-series backwards through the following backwards recursion: $x_t^* = \hat\phi_0 + \hat\phi_1 x_{t+1}^* + \cdots + \hat\phi_p x_{t+p}^* + w_t^*$, for $t = n-p, n-p-1, \cdots, 1$, with the $w_t^*$ being generated i.i.d. from $\hat F_w$, the empirical distribution of the (centered) $\hat w_t$'s.

However, as pointed out by Breidt et al. (1995)[10], although the backward errors $w_t$ are uncorrelated, they are dependent. So it is not advisable to resample the $\{w_t^*\}$ as i.i.d. from $\hat F_w$. Nevertheless, the forward errors $\epsilon_t$ are independent; so we can generate $\epsilon_t^*$ i.i.d. from $\hat F_n$. After obtaining the $\epsilon_t^*$'s, we are then able to generate the bootstrapped backward noise $w_t^*$ using the bootstrap analog of (3.7), i.e.,

$w_t^* = \dfrac{\hat\phi(B^{-1})}{\hat\phi(B)}\,\epsilon_t^*.$

3.4.1. Algorithm for Backward Bootstrap with Fitted Residuals

Our algorithm for the backward bootstrap with fitted residuals is exactly the same as that of Breidt et al. (1995)[10]. However, we also propose the backward bootstrap with predictive residuals, which has better finite-sample properties. In addition, we address the construction of prediction intervals via either unstudentized or studentized predictive roots.

Algorithm 3.5. Backward bootstrap with fitted residuals (Bf)
1-2. Same as the corresponding steps in Algorithm 3.1.
3. Center the fitted residuals: let $r_t = \hat\epsilon_t - \bar{\hat\epsilon}$ for $t = p+1, \cdots, n$, where $\bar{\hat\epsilon} = (n-p)^{-1}\sum_{t=p+1}^{n}\hat\epsilon_t$; the empirical distribution of the $r_t$ is denoted by $\hat F_n$.
(a) Choose a large positive integer M and create the independent bootstrap pseudo-noise $\epsilon_{-M}^*, \cdots, \epsilon_n^*, \epsilon_{n+1}^*, \cdots$ from $\hat F_n$; then generate the bootstrap backward noises $\{w_t^*, t = -M, \cdots, n\}$ recursively as follows:

$w_t^* = \begin{cases} 0, & t < -M \\ \hat\phi_1 w_{t-1}^* + \cdots + \hat\phi_p w_{t-p}^* + \epsilon_t^* - \hat\phi_1 \epsilon_{t+1}^* - \cdots - \hat\phi_p \epsilon_{t+p}^*, & t \ge -M \end{cases}$

(b) Fix the last $p$ values, i.e., $x_n^* = x_n, \cdots, x_{n-p+1}^* = x_{n-p+1}$, and then generate a bootstrap realization $\{x_t^*\}$ by the backward recursion:

$x_t^* = \begin{cases} \hat\phi_0 + \hat\phi_1 x_{t+1}^* + \cdots + \hat\phi_p x_{t+p}^* + w_t^*, & t = n-p, n-p-1, \cdots, 1 \\ x_t, & t = n, n-1, \cdots, n-p+1. \end{cases}$

(c) Based on the pseudo-data $\{x_1^*, \cdots, x_n^*\}$, re-estimate the coefficients $\phi$ by the LS estimator $\hat\phi^* = (\hat\phi_0^*, \hat\phi_1^*, \cdots, \hat\phi_p^*)'$ as in step 1. Then compute the future bootstrap predicted values $\hat x_{n+1}^*, \cdots, \hat x_{n+h}^*$ via:

$\hat x_{n+t}^* = \hat\phi_0^* + \sum_{j=1}^{p} \hat\phi_j^* \hat x_{n+t-j}^* \quad \text{for } t = 1, \cdots, h$

where $\hat x_{n+t-j}^* = x_{n+t-j}$ when $t \le j$.
(d) Compute the future bootstrap observations $x_{n+1}^*, x_{n+2}^*, \cdots, x_{n+h}^*$ through the last $p$ observations by the forward recursion:

$x_{n+t}^* = \hat\phi_0 + \sum_{j=1}^{p} \hat\phi_j x_{n+t-j}^* + \epsilon_{n+t}^* \quad \text{for } t = 1, 2, \cdots, h.$

(e) Calculate a bootstrap root replicate as

$x_{n+h}^* - \hat x_{n+h}^*.$

4. Steps (a)-(e) in the above are repeated B times, and the B bootstrap replicates are collected in the form of an empirical distribution whose α-quantile is denoted q(α). L. Pan and D. Politis/Bootstrap prediction intervals for autoregressions 16

5. Compute the predicted future values $\hat x_{n+1}, \cdots, \hat x_{n+h}$ by the following recursion:

$\hat x_{n+t} = \hat\phi_0 + \sum_{j=1}^{p} \hat\phi_j \hat x_{n+t-j} \quad \text{for } t = 1, \cdots, h.$

Note that when $t \le j$, $\hat x_{n+t-j} = x_{n+t-j}$.
6. Construct the $(1-\alpha)100\%$ equal-tailed predictive interval for $X_{n+h}$ as

$[\hat x_{n+h} + q(\alpha/2),\ \hat x_{n+h} + q(1-\alpha/2)]. \qquad (3.8)$
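The only genuinely new ingredients relative to the forward algorithm are steps 3(a)–(b), i.e., the generation of the backward noises $w_t^*$ and of the backward pseudo-series. A minimal sketch is given below, assuming the LS coefficients phi_hat $= (\hat\phi_0, \ldots, \hat\phi_p)$ and the fitted (or predictive) residuals resid are already available from steps 1–3; the remaining steps (re-estimation, prediction, roots, quantiles) proceed as in the forward sketch after Algorithm 3.1.

```python
import numpy as np

def backward_pseudo_series(x, phi_hat, resid, M=100, rng=None):
    """Sketch of steps 3(a)-(b) of Algorithm 3.5: one backward bootstrap
    pseudo-series whose last p values equal the observed ones."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n, p = len(x), len(phi_hat) - 1
    r = np.asarray(resid) - np.mean(resid)
    # 3(a): i.i.d. forward pseudo-noise eps*_{-M}, ..., eps*_{n+p}, then backward noises w*_t
    eps = dict(zip(range(-M, n + p + 1), rng.choice(r, size=n + p + M + 1)))
    w = {t: 0.0 for t in range(-M - p, -M)}          # w*_t = 0 for t < -M
    for t in range(-M, n + 1):
        w[t] = (sum(phi_hat[j] * w[t - j] for j in range(1, p + 1))
                + eps[t] - sum(phi_hat[j] * eps[t + j] for j in range(1, p + 1)))
    # 3(b): fix the last p values and recurse backward in time
    xb = np.empty(n)
    xb[n - p:] = x[n - p:]
    for t in range(n - p, 0, -1):                    # 1-indexed times t = n-p, ..., 1
        xb[t - 1] = (phi_hat[0]
                     + sum(phi_hat[j] * xb[t - 1 + j] for j in range(1, p + 1))
                     + w[t])
    return xb
```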

3.4.2. Algorithm for Backward Bootstrap with Predictive Residuals

Algorithm 3.6. Backward bootstrap with predictive residuals (Bp)
1-2. Same as steps 1-2 in Algorithm 3.2.
3-6. Change the $\hat\epsilon_t$ into $\hat\epsilon_t^{(t)}$, the predictive residuals defined in step 2 of Algorithm 3.2; the rest is the same as in Algorithm 3.5.

3.4.3. Algorithm for Backward Studentized Bootstrap with Fitted Residuals

Algorithm 3.7. Backward studentized bootstrap with fitted residuals (BSf)
This algorithm is the same as Algorithm 3.5 except for steps 3(e) and 6, which should be taken as steps 3(e) and 6 of Algorithm 3.3.

3.4.4. Algorithm for Backward Studentized Bootstrap with Predictive Residuals

Algorithm 3.8. Backward studentized bootstrap with predictive residuals (BSp)
Change the $\hat\epsilon_t$ into $\hat\epsilon_t^{(t)}$, the predictive residuals defined in step 2 of Algorithm 3.2; the rest is the same as in Algorithm 3.7.

Remark 3.4. The asymptotic validity of the backward bootstrap prediction interval with fitted residuals, i.e., interval (3.8), has been proven by Breidt et al. (1995)[10]; it is not hard to see that the property of asymptotic pertinence also holds true here. In view of Lemma 3.4, the backward bootstrap prediction interval with predictive residuals is also asymptotically pertinent, and the same is true for the studentized methods.

Remark 3.5. Similarly to the forward bootstrap methods—see Remark 3.3—the Bp and BSf methods give improved coverage compared to the Bf method, but the technique of predictive residuals does not add much in the studentized case (BSp); see the simulations of Section 3.7.

3.5. Generalized Bootstrap Prediction Intervals

Chatterjee and Bose (2005)[14] introduced the generalized bootstrap method for estimators obtained by solving estimating equations. The LS estimator of the AR coefficients is a special case. With bootstrapped weights $(w_{n1}, \cdots, w_{nn})$ in the estimating equations, the generalized bootstrapped estimators are obtained simply by solving the bootstrapped estimating equations. The generalized bootstrap method is computationally fast because we do not need to generate the pseudo-series; instead we just resample the weights $(w_{n1}, \cdots, w_{nn})$ from some distribution, e.g., Multinomial$(n; 1/n, \cdots, 1/n)$. Inspired by the idea of the generalized bootstrap, we now propose a new approach for bootstrap prediction intervals in linear AR models.

Algorithm 3.9. Generalized bootstrap with fitted residuals (Gf)
1-2. Same as the corresponding steps in Algorithm 3.1.
3. (a) Calculate the bootstrapped estimator of the coefficients

$\hat\phi^* = (X'WX)^{-1}X'WY,$

where

$X = \begin{pmatrix} 1 & x_{n-1} & \cdots & x_{n-p} \\ 1 & x_{n-2} & \cdots & x_{n-p-1} \\ \vdots & \vdots & & \vdots \\ 1 & x_p & \cdots & x_1 \end{pmatrix}, \qquad Y = \begin{pmatrix} x_n \\ x_{n-1} \\ \vdots \\ x_{p+1} \end{pmatrix},$

and $W$ is a diagonal matrix whose diagonal elements $(w_1, \cdots, w_{n-p})$ are sampled from Multinomial$(n-p;\ 1/(n-p), \cdots, 1/(n-p))$.
(b) Compute the future bootstrap predicted values $\hat X_{n+1}^*, \cdots, \hat X_{n+h}^*$ by the recursion:

$\hat X_{n+t}^* = \hat\phi_0^* + \sum_{j=1}^{p} \hat\phi_j^* \hat X_{n+t-j}^* \quad \text{for } t = 1, \cdots, h,$

and the future bootstrap observations $X_{n+1}^*, X_{n+2}^*, \cdots, X_{n+h}^*$ by the recursion:

$X_{n+t}^* = \hat\phi_0 + \sum_{j=1}^{p} \hat\phi_j X_{n+t-j}^* + \epsilon_{n+t}^*, \quad \text{for } t = 1, 2, \cdots, h;$

as usual, $\hat X_{n+t-j}^* = X_{n+t-j}^* = x_{n+t-j}$ when $t \le j$, and $\epsilon_{n+1}^*, \ldots, \epsilon_{n+h}^*$ are sampled i.i.d. from the empirical distribution of the (centered) fitted residuals. Finally, calculate the bootstrap predictive root replicate as $X_{n+h}^* - \hat X_{n+h}^*$.
4-6. Same as the corresponding steps from Algorithm 3.1.
The generalized bootstrap can also be performed using the predictive residuals.

Algorithm 3.10. Generalized bootstrap with predictive residuals (Gp)
The algorithm is identical to Algorithm 3.9 with the following changes: replace step 2 of Algorithm 3.9 with step 2 of Algorithm 3.2, and use the predictive residuals instead of the fitted residuals in step 3(b) of Algorithm 3.9.

Under regularity conditions, Chatterjee and Bose (2005)[14] proved the consistency of the generalized bootstrap in estimating the distribution of $\sqrt n(\hat\phi - \phi)$, i.e., equation (3.5). Using Theorem 3.2 and Lemma 3.4, it then follows that both the Gf and Gp prediction intervals are asymptotically pertinent.

Remark 3.6. Chatterjee and Bose (2005)[14] show that the resampling conditional variance of $\sqrt n\,\sigma_n^{-1}(\hat\phi^* - \hat\phi)$ is consistent for the asymptotic variance of $\sqrt n(\hat\phi - \phi)$. Note that there is a $\sigma_n^{-1}$ in the first expression, where $\sigma_n^2$ is the variance of $w_i$. If we sample the weights $(w_1, \cdots, w_n)$ from a multinomial distribution and their variance is $1 - 1/(n-p)$, the difference between the conditional variances of $(\hat\phi^* - \hat\phi)$ and $\sigma_n^{-1}(\hat\phi^* - \hat\phi)$ is of order $O(1/n^2)$, which is negligible.
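Step 3(a) of Algorithm 3.9 reduces to one weighted least-squares solve per bootstrap replication. A minimal sketch (function name ours) is given below; the prediction steps 3(b) and 4–6 then proceed as in Algorithm 3.1, with no bootstrap pseudo-series needing to be generated.

```python
import numpy as np

def generalized_bootstrap_coefficients(x, p, rng=None):
    """Sketch of step 3(a) of Algorithm 3.9: one generalized-bootstrap replicate
    of the AR coefficients via multinomially weighted least squares."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = len(x)
    X = np.column_stack([np.ones(n - p)] + [x[p - j:n - j] for j in range(1, p + 1)])
    Y = x[p:]
    # diagonal of W: Multinomial(n - p; 1/(n-p), ..., 1/(n-p)) weights
    w = rng.multinomial(n - p, np.full(n - p, 1.0 / (n - p))).astype(float)
    W = np.diag(w)
    # (X'WX)^{-1} X'WY; a pseudo-inverse could be used if X'WX happens to be singular
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)
```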

3.6. Joint Prediction Intervals

Having observed the time series stretch $\{X_1, \cdots, X_n\}$, we may wish to construct joint, i.e., simultaneous, prediction intervals for $\{X_{n+1}, \cdots, X_{n+H}\}$ for some $H \ge 1$. Let $h$ be a positive integer with $h \le H$, and denote by $g(Y_n; \phi)$ the theoretical MSE-optimal predictor of $X_{n+h}$ based on $Y_n = (X_n, \cdots, X_{n-p+1})$.

The true value $X_{n+h}$, the practical predictor $\hat X_{n+h}$, and the predictive root are given as follows:

$X_{n+h} = g(Y_n; \phi) + \sum_{j=0}^{h-1} \psi_j \epsilon_{n+h-j} \qquad (3.9)$

$\hat X_{n+h} = g(Y_n; \hat\phi) \qquad (3.10)$

$X_{n+h} - \hat X_{n+h} = \big(g(Y_n; \phi) - g(Y_n; \hat\phi)\big) + \sum_{j=0}^{h-1} \psi_j \epsilon_{n+h-j}. \qquad (3.11)$

Under the linear AR model (3.1), $g(\cdot)$ is a linear function of $Y_n$ whose coefficients only depend on $\phi$; similarly, the $\psi_j$ are the coefficients of the power series expansion of $\phi^{-1}(z)$, i.e., they only depend on $\phi$.

Resampling is extremely useful for the construction of univariate prediction intervals but it is absolutely indispensable for joint prediction intervals. To see why, note that the objective is to estimate the distribution of the predictive root (3.11). The first difficulty is in capturing the distribution of the first term $g(Y_n; \phi) - g(Y_n; \hat\phi)$ in (3.11); this term is quite small compared to the second term $U_h = \sum_{j=0}^{h-1}\psi_j\epsilon_{n+h-j}$. Nevertheless, even if we ignore the first term, the major difficulty is that the random variables $U_1, \ldots, U_H$ are dependent. If we assume the $\{\epsilon_t\}$ are i.i.d. $N(0, \sigma^2)$, then $(U_1, \ldots, U_H)$ has a multivariate normal distribution given $Y_n = y$. One could estimate its covariance matrix, and form the joint prediction intervals for $\{X_{n+1}, \cdots, X_{n+H}\}$ based on normal theory. However, this method not only ignores the variability from estimating $\phi$ by $\hat\phi$, i.e., the first term $g(Y_n; \phi) - g(Y_n; \hat\phi)$ in (3.11), but it also relies on the assumption of normal errors that is nowadays unrealistic.

Nevertheless, the bootstrap can construct joint/simultaneous prediction intervals in a straightforward manner (and without resorting to unrealistic assumptions) since the bootstrap can mimic a multivariate distribution as easily as a univariate one. In our case, we use the bootstrap to mimic the multivariate distribution of a collection of predictive roots or studentized predictive roots. To construct the joint prediction intervals using one of the bootstrap methods based on predictive roots, the easiest procedure is to approximate the distribution of the maximum predictive root $M_H = \max_{h=1,\ldots,H} |X_{n+h} - \hat X_{n+h}|$ by that of its bootstrap analog.

Algorithm 3.11. Joint prediction intervals based on maximum predictive root
1. Choose any one of the aforementioned bootstrap methods, i.e., forward, backward or generalized, with fitted or predictive residuals.
2. For each of the B bootstrap replications, construct all H bootstrap predictive roots $X_{n+h}^* - \hat X_{n+h}^*$ for $h = 1, \ldots, H$, and let $M_H^* = \max_{h=1,\ldots,H} |X_{n+h}^* - \hat X_{n+h}^*|$.
3. Collect the B replicates of $M_H^*$ in the form of an empirical distribution whose α-quantile is denoted $q_H(\alpha)$.
4. Construct the H intervals

$[\hat X_{n+h} - q_H(1-\alpha),\ \hat X_{n+h} + q_H(1-\alpha)] \quad \text{for } h = 1, \ldots, H \qquad (3.12)$

where the h-th interval is a prediction interval for $X_{n+h}$; the above H intervals have joint/simultaneous coverage of $(1-\alpha)100\%$ nominally.

5. Under the necessary regularity conditions that would render each individual prediction interval asymptotically valid and/or pertinent, the H simultaneous intervals (3.12) would likewise be asymptotically valid and/or pertinent.

Recall that the prediction error variance, i.e., the conditional/unconditional variance of X_{n+h} − X̂_{n+h}, equals σ² Σ_{j=0}^{h−1} ψ_j², i.e., it is an increasing function of h. Since the intervals (3.12) are of the type 'plus/minus the same constant for all h', it follows that the intervals (3.12) are unbalanced in the sense that the interval for h = h_1 would have bigger (individual) coverage than the interval for h = h_2 when h_2 > h_1. In order to construct balanced prediction intervals, the concept of studentized predictive roots comes in handy; a worked sketch of the studentizing constants follows the algorithm below.

Algorithm 3.12. Joint prediction intervals based on maximum studentized predictive root
1. Choose any one of the aforementioned bootstrap methods, i.e., forward, backward or generalized, with fitted or predictive residuals.
2. For each of the B bootstrap replications, construct all H studentized bootstrap predictive roots

(X*_{n+h} − X̂*_{n+h}) / [σ̂* (Σ_{j=0}^{h−1} ψ̂*_j²)^{1/2}]   for h = 1, ..., H,

and let

M̄*_H = max_{h=1,...,H} |X*_{n+h} − X̂*_{n+h}| / [σ̂* (Σ_{j=0}^{h−1} ψ̂*_j²)^{1/2}].

3. Collect the B replicates of M̄*_H in the form of an empirical distribution whose α-quantile is denoted q̄_H(α).
4. Construct the H intervals

[X̂_{n+h} − σ̂ (Σ_{j=0}^{h−1} ψ̂_j²)^{1/2} q̄_H(1 − α),  X̂_{n+h} + σ̂ (Σ_{j=0}^{h−1} ψ̂_j²)^{1/2} q̄_H(1 − α)]   for h = 1, ..., H;     (3.13)

the above H intervals are asymptotically balanced, and have joint/simultaneous coverage of (1 − α)100% nominally.
5. Under the necessary regularity conditions that would render each individual prediction interval asymptotically valid and/or pertinent, the H simultaneous intervals (3.13) would be likewise asymptotically valid and/or pertinent.
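The studentizing constants σ̂ (Σ_{j=0}^{h−1} ψ̂_j²)^{1/2} only require the MA(∞) coefficients ψ_j implied by the fitted AR polynomial. A minimal sketch of this computation is given below, assuming a causal fit; the function names are illustrative.

import numpy as np

def psi_weights(phi, h):
    """ψ_0, ..., ψ_{h-1} from the AR(p) coefficients φ_1, ..., φ_p
    via the recursion ψ_j = Σ_{k=1}^{min(j,p)} φ_k ψ_{j-k}, with ψ_0 = 1."""
    p = len(phi)
    psi = np.zeros(h)
    psi[0] = 1.0
    for j in range(1, h):
        psi[j] = sum(phi[k - 1] * psi[j - k] for k in range(1, min(j, p) + 1))
    return psi

def studentizing_constants(phi_hat, sigma_hat, H):
    """σ̂ (Σ_{j=0}^{h-1} ψ̂_j²)^{1/2} for h = 1, ..., H."""
    psi = psi_weights(np.asarray(phi_hat, float), H)
    return sigma_hat * np.sqrt(np.cumsum(psi ** 2))

Dividing the bootstrap predictive roots of step 2 by these constants (with φ̂*, σ̂* replacing φ̂, σ̂) and rescaling the quantile q̄_H(1 − α) as in (3.13) yields the balanced intervals.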

3.7. Monte Carlo Studies

In this section, we evaluate the performance of all 10 aforementioned bootstrap methods, i.e., the four forward methods with fitted or predictive residuals using nonstudentized or studentized predictive roots (Ff, Fp, FSf and FSp), the four corresponding backward methods (Bf, Bp, BSf and BSp) and the two generalized bootstrap methods (Gf and Gp), in the case of an AR(1) model: X_{t+1} = φ_1 X_t + ε_{t+1} with

(1) φ_1 = 0.9 or 0.5;
(2) errors ε_t i.i.d. from the N(0,1) or the two-sided exponential (Laplace) distribution rescaled to unit variance;
(3) 500 'true' datasets, each of size n = 50 or 100, and for each 'true' dataset creating B = 1000 bootstrap pseudo-series;
(4) prediction intervals with nominal coverage levels of 95% and 90%.

For the i-th 'true' dataset, we use one of the bootstrap methods to create B = 1000 bootstrap sample paths (step 4 of the algorithms), and construct the prediction interval [L_i, U_i] (step 6 of the algorithms). To assess the corresponding empirical coverage level (CVR) and average length (LEN) of the constructed interval, we also generate 1000 one-step-ahead future values Y_{n+1,j} = φ̂_1 x_{ni} + ε_j for j = 1, 2, ···, 1000, where φ̂_1 is the estimate from the i-th 'true' dataset and x_{ni} is the i-th dataset's last value. Then, the empirical coverage level and length from the i-th dataset are given by

CVR_i = (1/1000) Σ_{j=1}^{1000} 1_{[L_i, U_i]}(Y_{n+1,j})   and   LEN_i = U_i − L_i,

where 1_A(x) is the indicator function of the set A. Note that the ability to generate the future values Y_{n+1,j} independently from the bootstrap datasets allows us to estimate CVR_i in a more refined way as opposed to the usual 0–1 coverage. Finally, the coverage level and length for each bootstrap method are calculated as the averages of {CVR_i} and {LEN_i} over the 500 'true' datasets, i.e.

CVR = (1/500) Σ_{i=1}^{500} CVR_i   and   LEN = (1/500) Σ_{i=1}^{500} LEN_i.

Note, however, that the value of the last observation xni is different from dataset to dataset; hence the coverage CVR represents an unconditional coverage probability, i.e., an average of the conditional coverage probability discussed in the context of asymptotic validity.
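A minimal sketch of this evaluation loop for a single 'true' dataset is given below; the interval construction itself is delegated to whichever bootstrap method is being assessed, so the function passed in as make_interval is an assumed placeholder.

import numpy as np

def evaluate_dataset(x, phi1_hat, make_interval, rng, n_future=1000, alpha=0.05):
    """CVR_i and LEN_i for one simulated AR(1) dataset x."""
    L, U = make_interval(x, alpha)             # interval [L_i, U_i] from a chosen bootstrap method
    eps = rng.standard_normal(n_future)        # ε_j from the true innovation law (normal case; Laplace analogous)
    y_future = phi1_hat * x[-1] + eps          # Y_{n+1,j} = φ̂_1 x_{ni} + ε_j
    cvr = np.mean((y_future >= L) & (y_future <= U))
    return cvr, U - L

Averaging the returned pairs over the 500 'true' datasets gives the CVR and LEN entries reported in the tables below.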

normal, φ1 = 0.5      nominal coverage 95%        nominal coverage 90%
         CVR     LEN     st.err     CVR     LEN     st.err
n = 50
Ff       0.930   3.848   0.490      0.881   3.267   0.386
Fp       0.940   4.011   0.506      0.895   3.405   0.406
Bf       0.929   3.834   0.500      0.880   3.261   0.393
Bp       0.941   4.017   0.521      0.896   3.410   0.410
FSf      0.942   4.036   0.501      0.894   3.391   0.395
FSp      0.941   4.028   0.493      0.894   3.393   0.399
BSf      0.941   4.016   0.514      0.894   3.388   0.402
BSp      0.942   4.033   0.500      0.896   3.402   0.398
Gf       0.930   3.847   0.483      0.881   3.264   0.389
Gp       0.940   4.007   0.502      0.895   3.402   0.399
n = 100
Ff       0.940   3.895   0.357      0.892   3.294   0.283
Fp       0.945   3.968   0.377      0.899   3.355   0.281
Bf       0.940   3.895   0.371      0.892   3.286   0.275
Bp       0.945   3.971   0.375      0.899   3.360   0.289
FSf      0.946   3.981   0.358      0.899   3.355   0.282
FSp      0.945   3.977   0.370      0.899   3.350   0.277
BSf      0.945   3.978   0.366      0.898   3.349   0.275
BSp      0.946   3.978   0.366      0.898   3.352   0.283
Gf       0.940   3.891   0.359      0.891   3.289   0.275
Gp       0.944   3.969   0.383      0.897   3.350   0.284

Table 1: Simulation Results of AR(1) with normal innovations and φ1 = 0.5

normal, φ1 = 0.9      nominal coverage 95%        nominal coverage 90%
         CVR     LEN     st.err     CVR     LEN     st.err
n = 50
Ff       0.933   3.906   0.489      0.884   3.306   0.395
Fp       0.943   4.063   0.513      0.898   3.443   0.411
Bf       0.927   3.824   0.485      0.878   3.255   0.388
Bp       0.939   4.007   0.516      0.893   3.402   0.408
FSf      0.945   4.107   0.515      0.898   3.450   0.411
FSp      0.945   4.099   0.506      0.899   3.444   0.408
BSf      0.939   4.018   0.502      0.892   3.397   0.403
BSp      0.940   4.032   0.501      0.894   3.406   0.399
Gf       0.928   3.838   0.490      0.878   3.261   0.394
Gp       0.938   3.999   0.505      0.892   3.393   0.406
n = 100
Ff       0.941   3.915   0.355      0.893   3.306   0.282
Fp       0.945   3.989   0.371      0.899   3.368   0.282
Bf       0.939   3.892   0.363      0.891   3.284   0.273
Bp       0.945   3.976   0.372      0.898   3.356   0.287
FSf      0.946   4.005   0.355      0.900   3.373   0.286
FSp      0.946   3.997   0.365      0.899   3.369   0.279
BSf      0.945   3.981   0.362      0.898   3.352   0.274
BSp      0.945   3.982   0.355      0.898   3.355   0.282
Gf       0.939   3.890   0.355      0.890   3.286   0.275
Gp       0.944   3.967   0.381      0.897   3.353   0.285

Table 2: Simulation Results of AR(1) with normal innovations and φ1 = 0.9

Laplace, φ1 = 0.5     nominal coverage 95%        nominal coverage 90%
         CVR     LEN     st.err     CVR     LEN     st.err
n = 50
Ff       0.930   4.175   0.804      0.881   3.270   0.570
Fp       0.937   4.376   0.828      0.892   3.420   0.597
Bf       0.929   4.176   0.815      0.881   3.267   0.571
Bp       0.937   4.376   0.882      0.892   3.415   0.600
FSf      0.940   4.176   0.873      0.894   3.438   0.578
FSp      0.941   4.376   0.851      0.894   3.452   0.583
BSf      0.939   4.457   0.862      0.893   3.436   0.587
BSp      0.941   4.462   0.875      0.895   3.443   0.583
Gf       0.930   4.177   0.774      0.881   3.274   0.577
Gp       0.937   4.367   0.864      0.892   3.420   0.611
n = 100
Ff       0.939   4.208   0.612      0.891   3.274   0.431
Fp       0.943   4.302   0.638      0.897   3.344   0.439
Bf       0.940   4.220   0.616      0.892   3.274   0.429
Bp       0.943   4.290   0.618      0.896   3.340   0.431
FSf      0.945   4.343   0.622      0.898   3.363   0.431
FSp      0.945   4.349   0.629      0.898   3.362   0.429
BSf      0.945   4.338   0.618      0.898   3.362   0.435
BSp      0.945   4.340   0.615      0.898   3.357   0.424
Gf       0.940   4.238   0.627      0.892   3.285   0.424
Gp       0.943   4.305   0.638      0.897   3.355   0.439

Table 3: Simulation Results of AR(1) with Laplace innovations and φ1 = 0.5

Laplace, φ1 = 0.9     nominal coverage 95%        nominal coverage 90%
         CVR     LEN     st.err     CVR     LEN     st.err
n = 50
Ff       0.930   4.236   0.826      0.886   3.317   0.574
Fp       0.937   4.417   0.833      0.896   3.462   0.597
Bf       0.929   4.179   0.894      0.881   3.280   0.645
Bp       0.937   4.380   0.977      0.892   3.426   0.681
FSf      0.943   4.520   0.885      0.898   3.502   0.595
FSp      0.943   4.510   0.866      0.899   3.510   0.601
BSf      0.939   4.451   0.963      0.894   3.458   0.671
BSp      0.940   4.481   0.987      0.896   3.470   0.685
Gf       0.929   4.159   0.780      0.880   3.260   0.576
Gp       0.936   4.343   0.864      0.891   3.411   0.597
n = 100
Ff       0.940   4.229   0.620      0.892   3.286   0.429
Fp       0.944   4.321   0.636      0.897   3.358   0.428
Bf       0.940   4.225   0.618      0.891   3.270   0.423
Bp       0.943   4.302   0.626      0.896   3.336   0.427
FSf      0.946   4.367   0.620      0.899   3.382   0.428
FSp      0.946   4.365   0.627      0.899   3.380   0.423
BSf      0.945   4.356   0.624      0.898   3.365   0.431
BSp      0.945   4.353   0.617      0.898   3.356   0.416
Gf       0.940   4.233   0.620      0.892   3.285   0.418
Gp       0.943   4.316   0.641      0.896   3.348   0.436

Table 4: Simulation Results of AR(1) with Laplace innovations and φ1 = 0.9

Tables 1, 2, 3 and 4 summarize the findings of our simulation; the entry st.err is the standard error associated with each average length. Some important features are as follows:
• As expected, all bootstrap prediction intervals considered are characterized by some degree of under-coverage. It is encouraging that the use of predictive residuals appears to partially correct the under-coverage problem in linear autoregression, as was the case in linear regression; see Politis (2013)[33].
• The Fp, Bp and Gp methods using predictive residuals have uniformly improved CVRs as compared to Ff, Bf and Gf using fitted residuals. The reason is that the finite-sample empirical distribution of the predictive residuals is very much like a re-scaled (inflated) version of the empirical distribution of the fitted residuals.
• The price to pay for using predictive residuals is the increased variability of the interval length in all unstudentized methods. However, this is a finite-sample effect since asymptotically the omission of a finite number of points from the scatterplot makes little difference; see Lemma 3.4.
• The four studentized methods have similar performance to the respective unstudentized methods using predictive residuals. Thus, using predictive residuals is not deemed necessary for the studentized methods, although it does not seem to hurt; see also Remark 3.3.
• The coverages of the Gf intervals resemble those of the Ff and Bf intervals. Similarly, the coverages of the Gp intervals resemble those of the Fp and Bp intervals.

3.8. Alternative Approaches to Bootstrap Prediction Intervals for Linear Autoregression Model

In this section, we discuss other existing methods for constructing prediction intervals for linear autoregressions, and compare them in simulation with the methods previously proposed in this paper.

3.8.1. Bootstrap Prediction Intervals Based on Studentized Predictive Roots

Box and Jenkins (1976)[9] proposed a widely used prediction interval for an AR(p) model as

[x̂_{n+h} + z_{α/2} σ̂ (Σ_{j=0}^{h−1} ψ̂_j²)^{1/2},  x̂_{n+h} + z_{1−α/2} σ̂ (Σ_{j=0}^{h−1} ψ̂_j²)^{1/2}],     (3.14)

where the ψ̂_j, j = 0, 1, ···, are the coefficients of the power series ψ̂(B) = φ̂^{−1}(B), z_α is the α-quantile of a standard normal variate, and σ̂ is an estimate of σ, the standard deviation of the innovations {ε_t}. This prediction interval only takes into account the variability from the errors but does not account for the variability from the estimation of the model, thus it is not asymptotically pertinent; in fact, it is the analog of the naive interval (2.5). Furthermore, this interval is asymptotically valid only under the assumption of Gaussian errors, which is nowadays unrealistic.

To relax the Gaussianity assumption, and to capture the variability from both the errors and the model estimation, Masarotto (1990)[26] proposed a bootstrap method to construct the prediction interval as follows: for each pseudo-series x*_1, ···, x*_n, ···, x*_{n+h}, generate the studentized bootstrap predictive root

r* = (x*_{n+h} − x̂*_{n+h}) / [σ̂* (Σ_{j=0}^{h−1} ψ̂*_j²)^{1/2}],     (3.15)

where σ̂* and ψ̂*_j are obtained from the pseudo-series in the same way that σ̂ and ψ̂_j were obtained from the true series. Suppose we generate B values of r*, and order them as (r*_1, ···, r*_B). Letting k = ⌊Bα⌋, the (1 − α)100% prediction interval is

[x̂_{n+h} + r*_k σ̂ (Σ_{j=0}^{h−1} ψ̂_j²)^{1/2},  x̂_{n+h} + r*_{B−k} σ̂ (Σ_{j=0}^{h−1} ψ̂_j²)^{1/2}].     (3.16)

The main difference of the above from our studentized prediction intervals is that we make it a point, whether using the Backward or the Forward bootstrap, to fix the last p bootstrap pseudo-values to the values present in the original series when generating out-of-sample bootstrap data and/or predictors. For example, we obtain the bootstrap predicted value x̂*_{n+h} and future value x*_{n+h} in Algorithm 3.1, steps 3(c) and 3(d), using the original datapoints x_{n−p+1}, ..., x_n, thus ensuring the property of asymptotic pertinence, i.e., capturing the estimation error. As already mentioned, Masarotto's interval (3.16) is not asymptotically pertinent. A computationally more efficient version of Masarotto's method was proposed by Grigoletto (1998)[21].

3.8.2. Bootstrap Prediction Intervals Based on Percentile Methods

By contrast to the root/predictive root methods adopted in this paper, some authors have chosen to construct bootstrap prediction intervals via a percentile method reminiscent of Efron's [16] percentile method for confidence intervals. To elaborate, the percentile method uses the bootstrap distribution of X*_{n+h} to estimate the distribution of the future value X_{n+h}, while we use the distribution of the bootstrap predictive root (studentized or not) to estimate the distribution of the true predictive root. The methods in this subsection are all based on the percentile method.

Cao et al. (1997)[11] proposed a computationally fast bootstrap method in order to relax the Gaussianity assumption implicit in the Box/Jenkins interval (3.14). Conditionally on the last p observations, they generate only the future bootstrap observations instead of generating a whole bootstrap series up to x*_{n+h}, i.e., they define x*_s = x_s for s = n − p + 1, ···, n, and then compute the future pseudo-data by the recursion:

x*_t = φ̂_0 + φ̂_1 x*_{t−1} + ··· + φ̂_p x*_{t−p} + ε̂*_t   for t = n + 1, ···, n + h.     (3.17)

As was the case with the Box/Jenkins interval (3.14), the prediction interval of Cao et al. (1997)[11] does not make any attempt to capture the variability stemming from model estimation. Alonso, Peña and Romo (2002)[1] and Pascual, Romo and Ruiz (2004)[31] used a different way to generate the future bootstrap values; they used the recursion

x*_s = φ̂*_0 + φ̂*_1 x*_{s−1} + ··· + φ̂*_p x*_{s−p} + ε̂*_s   for s = n + 1, ···, n + h,     (3.18)

where x*_s = x_s for s = n − p + 1, ···, n. Notably, recursion (3.18) generates the future pseudo-values using the parameters φ̂* instead of φ̂ as is customary; e.g., compare with recursion (3.17). We will call the percentile interval based on (3.18) the APR/PRR bootstrap method; note that the APR/PRR interval does consider the variability from the model estimation, albeit in a slightly different than usual fashion; the sketch below contrasts the two recursions. Alonso, Peña and Romo (2002)[1] also considered the possibility that the order p is not fixed but allowed to increase with the sample size, i.e., the well-known AR-sieve bootstrap. Mukhopadhyay and Samaranayake (2010)[28] also used the APR/PRR method in an AR-sieve context, together with an ad hoc inflation of the scale of the bootstrap errors in order to improve coverage.
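Here is a minimal sketch of the two forward recursions (3.17) and (3.18) for generating future pseudo-values; phi_hat and phi_star denote φ̂ and a refitted bootstrap estimate φ̂*, both assumed available, and resid is the pool of centered residuals. All names are illustrative assumptions.

import numpy as np

def future_pseudo_values(x_last_p, phi, resid, h, rng):
    """Generate x*_{n+1}, ..., x*_{n+h} from an AR(p) recursion started at the observed
    last p values; passing phi = phi_hat gives (3.17), passing phi = phi_star gives (3.18)."""
    intercept, coefs = phi[0], np.asarray(phi[1:], float)
    p = len(coefs)
    path = list(np.asarray(x_last_p, float)[-p:])   # x_{n-p+1}, ..., x_n
    out = []
    for _ in range(h):
        eps = rng.choice(resid)                     # ε̂*_t drawn from the residual pool
        new = intercept + np.dot(coefs, path[::-1]) + eps
        out.append(new)
        path = path[1:] + [new]
    return np.array(out)

The Cao et al. interval then takes percentiles of these future values computed with phi_hat, while APR/PRR refits φ̂* on each full pseudo-series before applying the recursion.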

3.8.3. Monte Carlo Studies

In the following two tables, we provide simulation results for an AR(1) model with φ_1 = 0.5 for the aforementioned methods: Box/Jenkins (BJ), Cao et al. (1997), APR/PRR and Masarotto (M). These should be compared to our 10 methods presented in Tables 1 and 3.

normal, φ1 = 0.5      nominal coverage 95%        nominal coverage 90%
           CVR     LEN     st.err     CVR     LEN     st.err
n = 50
BJ         0.934   3.832   0.402      0.880   3.216   0.338
M          0.946   4.510   0.599      0.898   3.792   0.493
Cao        0.917   3.720   0.532      0.871   3.199   0.417
APR/PRR    0.930   3.858   0.498      0.880   3.268   0.390
n = 100
BJ         0.943   3.887   0.275      0.892   3.262   0.231
M          0.948   4.514   0.430      0.898   3.793   0.348
Cao        0.936   3.853   0.392      0.888   3.262   0.291
APR/PRR    0.939   3.893   0.368      0.891   3.283   0.283

Table 5: Simulation Results of AR(1) with normal innovations and φ1 = 0.5

Laplace, φ1 = 0.5     nominal coverage 95%        nominal coverage 90%
           CVR     LEN     st.err     CVR     LEN     st.err
n = 50
BJ         0.923   3.812   0.603      0.885   3.199   0.506
M          0.942   4.827   0.960      0.897   3.817   0.692
Cao        0.921   4.065   0.863      0.873   3.197   0.605
APR/PRR    0.930   4.211   0.832      0.882   3.279   0.573
n = 100
BJ         0.931   3.877   0.456      0.894   3.254   0.383
M          0.946   4.802   0.668      0.897   3.789   0.479
Cao        0.938   4.198   0.650      0.888   3.245   0.452
APR/PRR    0.940   4.226   0.628      0.892   3.282   0.434

Table 6: Simulation Results of AR(1) with Laplace innovations and φ1 = 0.5

To summarize our empirical findings:
• The BJ method has similar coverage rates to APR/PRR and our Ff method when the errors are normal. However, when the errors have a Laplace distribution, the BJ method performs very poorly.
• Our forward and backward methods with fitted residuals (Ff and Bf) outperform both the Cao and APR/PRR methods. This conclusion is expected and consistent with the discussion in the previous sections.
• Our methods with predictive residuals (Fp and Bp) and the studentized methods (FSf, FSp, BSf, BSp) are the best performing in terms of coverage.
• Masarotto's (M) method has similar performance to our FSf method; this was somewhat expected in view of Remark 3.2, but it is also deserving of further discussion; see Remark 3.7 in what follows.

3.8.4. More Monte Carlo: AR(2) model

In this subsection, we provide simulation results based on an AR(2) model:

X_t = 1.55 X_{t−1} − 0.6 X_{t−2} + ε_t.

Tables 7 and 8 present the coverages using both normal and Laplace innovations; for each type of innovation we generate 500 data sets, each of size n = 50 or 100; as before, for each data set we generate B = 1000 pseudo-series. The results are qualitatively similar to the AR(1) simulations.

normal errors         nominal coverage 95%        nominal coverage 90%
           CVR     LEN     st.err     CVR     LEN     st.err
n = 50
BJ         0.926   3.771   0.402      0.868   3.165   0.337
M          0.942   4.098   0.515      0.894   3.455   0.413
Cao        0.905   3.640   0.560      0.859   3.150   0.424
APR/PRR    0.926   3.931   0.523      0.875   3.324   0.410
Ff         0.931   3.933   0.521      0.883   3.328   0.417
Fp         0.946   4.171   0.543      0.902   3.527   0.434
Bf         0.926   3.850   0.499      0.874   3.254   0.394
Bp         0.942   4.100   0.527      0.897   3.471   0.424
FSf        0.946   4.185   0.547      0.901   3.521   0.441
FSp        0.945   4.159   0.537      0.899   3.496   0.431
BSf        0.941   4.082   0.517      0.893   3.429   0.408
BSp        0.941   4.085   0.513      0.894   3.443   0.409
Gf         0.926   3.858   0.524      0.875   3.270   0.416
Gp         0.941   4.098   0.541      0.895   3.470   0.430
n = 100
BJ         0.939   3.862   0.274      0.885   3.241   0.230
M          0.945   4.002   0.361      0.897   3.370   0.281
Cao        0.931   3.816   0.393      0.879   3.218   0.286
APR/PRR    0.937   3.904   0.361      0.887   3.296   0.282
Ff         0.939   3.908   0.358      0.889   3.295   0.280
Fp         0.945   4.026   0.366      0.899   3.399   0.283
Bf         0.938   3.893   0.367      0.889   3.294   0.277
Bp         0.946   4.026   0.377      0.898   3.384   0.286
FSf        0.946   3.020   0.362      0.898   3.384   0.286
FSp        0.945   4.014   0.360      0.897   3.381   0.278
BSf        0.945   4.003   0.368      0.897   3.379   0.286
BSp        0.945   4.016   0.368      0.897   3.371   0.279
Ff         0.938   3.895   0.369      0.888   3.284   0.280
Fp         0.945   4.007   0.369      0.899   3.389   0.284

Table 7: Simulation Results of the AR(2) model X_t = 1.55 X_{t−1} − 0.6 X_{t−2} + ε_t with normal innovations

Laplace errors        nominal coverage 95%        nominal coverage 90%
           CVR     LEN     st.err     CVR     LEN     st.err
n = 50
BJ         0.918   3.755   0.580      0.877   3.151   0.487
M          0.940   4.488   0.839      0.894   3.489   0.576
Cao        0.915   3.994   0.841      0.864   3.151   0.599
APR/PRR    0.929   4.235   0.800      0.880   3.335   0.596
Ff         0.931   4.252   0.828      0.883   3.322   0.582
Fp         0.942   4.529   0.873      0.898   3.537   0.613
Bf         0.929   4.200   0.814      0.878   3.266   0.570
Bp         0.939   4.448   0.859      0.893   3.469   0.601
FSf        0.943   4.569   0.858      0.899   3.563   0.611
FSp        0.943   4.563   0.871      0.899   3.564   0.620
BSf        0.940   4.507   0.881      0.894   3.493   0.596
BSp        0.940   4.483   0.832      0.894   3.479   0.586
Gf         0.928   4.201   0.825      0.878   3.279   0.592
Gp         0.939   4.456   0.874      0.893   3.486   0.606
n = 100
BJ         0.929   3.842   0.440      0.890   3.224   0.369
M          0.944   4.358   0.596      0.896   3.371   0.420
Cao        0.934   4.151   0.644      0.883   3.213   0.443
APR/PRR    0.938   4.219   0.615      0.887   3.256   0.419
Ff         0.939   4.206   0.566      0.889   3.271   0.410
Fp         0.944   4.357   0.645      0.897   3.387   0.430
Bf         0.937   4.194   0.600      0.888   3.259   0.421
Bp         0.943   4.343   0.653      0.896   3.367   0.438
FSf        0.945   4.374   0.591      0.897   3.388   0.421
FSp        0.944   4.377   0.620      0.898   3.393   0.424
BSf        0.944   4.363   0.623      0.896   3.372   0.429
BSp        0.944   4.369   0.637      0.897   3.377   0.432
Gf         0.938   4.199   0.604      0.888   3.258   0.417
Gp         0.944   4.344   0.630      0.896   3.366   0.427

Table 8: Simulation Results of the AR(2) model X_t = 1.55 X_{t−1} − 0.6 X_{t−2} + ε_t with Laplace innovations

3.8.5. More Monte Carlo: conditional coverages in an AR(1) model

We now revert to the AR(1) model in order to investigate the conditional coverage of some intervals of interest. Recall that the nature of the previous simulations resulted in average coverages since each of the 500 'true' datasets had a different last value. In order to investigate conditional coverages, we now fix the last value of each of the 'true' datasets to some chosen value; the 'true' datasets are then generated using the backward bootstrap with X_n fixed. Tables 9 and 10 compare the Masarotto (M) method with our four forward methods. As discussed in Section 3.8.1, Masarotto's method is a forward studentized method; the only difference between Masarotto's method and our FSf method is that Masarotto does not fix the last p values of the bootstrap series to match the ones from the original series.

normal errors         nominal coverage 95%        nominal coverage 90%
         CVR     LEN     st.err     CVR     LEN     st.err
Xn = 3
M        0.947   4.108   0.368      0.899   3.450   0.290
FSf      0.951   4.216   0.355      0.905   3.540   0.272
FSp      0.951   4.222   0.337      0.905   3.535   0.264
Ff       0.946   4.125   0.342      0.898   3.466   0.269
Fp       0.951   4.217   0.341      0.904   3.537   0.265
Xn = 2
M        0.945   4.002   0.362      0.899   3.384   0.283
FSf      0.947   4.045   0.357      0.902   3.417   0.274
FSp      0.947   4.049   0.349      0.902   3.413   0.263
Ff       0.943   3.959   0.350      0.895   3.350   0.270
Fp       0.947   4.047   0.358      0.902   3.415   0.270
Xn = 1
M        0.944   3.960   0.364      0.897   3.340   0.282
FSf      0.944   3.957   0.369      0.897   3.336   0.279
FSp      0.945   3.968   0.366      0.897   3.335   0.269
Ff       0.939   3.877   0.370      0.891   3.273   0.275
Fp       0.944   3.966   0.380      0.898   3.340   0.269
Xn = 0
M        0.945   3.956   0.366      0.897   3.329   0.283
FSf      0.944   3.937   0.371      0.895   3.313   0.281
FSp      0.944   3.949   0.374      0.895   3.312   0.272
Ff       0.939   3.861   0.379      0.889   3.252   0.281
Fp       0.943   3.944   0.389      0.896   3.318   0.273

Table 9: Simulation Results of the AR(1) model X_t = 0.5 X_{t−1} + ε_t with normal innovations when n = 100.

Laplace errors        nominal coverage 95%        nominal coverage 90%
         CVR     LEN     st.err     CVR     LEN     st.err
Xn = 3
M        0.946   4.475   0.594      0.897   3.473   0.422
FSf      0.948   4.562   0.572      0.901   3.557   0.401
FSp      0.948   4.568   0.581      0.902   3.563   0.410
Ff       0.944   4.424   0.558      0.895   3.465   0.398
Fp       0.947   4.521   0.578      0.900   3.537   0.410
Xn = 2
M        0.944   4.366   0.602      0.897   3.392   0.412
FSf      0.946   4.407   0.592      0.899   3.426   0.404
FSp      0.946   4.403   0.597      0.900   3.434   0.407
Ff       0.940   4.265   0.579      0.893   3.338   0.397
Fp       0.944   4.359   0.593      0.898   3.410   0.407
Xn = 1
M        0.943   4.326   0.614      0.896   3.343   0.426
FSf      0.944   4.329   0.613      0.896   3.339   0.424
FSp      0.944   4.333   0.607      0.897   3.352   0.422
Ff       0.938   4.188   0.601      0.889   3.259   0.420
Fp       0.942   4.286   0.610      0.894   3.331   0.427
Xn = 0
M        0.944   4.324   0.613      0.896   3.336   0.429
FSf      0.943   4.316   0.622      0.895   3.321   0.429
FSp      0.944   4.310   0.606      0.896   3.334   0.431
Ff       0.938   4.172   0.608      0.889   3.238   0.432
Fp       0.941   4.267   0.609      0.894   2.316   0.439

Table 10: Simulation Results of the AR(1) model X_t = 0.5 X_{t−1} + ε_t with Laplace innovations when n = 100.

The CVRs in Tables 9 and 10 represent conditional coverages given the chosen value for X_n. As expected, the worst coverage is associated with the Ff method. Our other three methods, FSf, FSp and Fp, all have very accurate CVRs.

Remark 3.7. Recall that Masarotto's intervals are asymptotically valid but not pertinent. However, they appear to have accurate conditional coverages. To further shed light on this phenomenon, recall that the distribution of the bootstrap predictive root depends on X_n = x_n since

X*_{n+1} − X̂*_{n+1} = (φ̂ − φ̂*) x_n + ε*_{n+1}.     (3.19)

Since φ̂ − φ̂* = O_p(1/√n), it is apparent that the term (φ̂ − φ̂*) x_n is small compared to the error term ε*_{n+1}; this is why using the 'wrong' x_n, as Masarotto's method does, can still yield accurate coverages. The situation is similar for studentized bootstrap roots since the first term of the numerator contains a term including x_n. Nevertheless, there is no reason to forego using the correct x_n in the bootstrap predictive root (3.19). To elaborate, Masarotto replaces the term (φ̂ − φ̂*) x_n in (3.19) with (φ̂ − φ̂*) X*_n where X*_n is random (with mean 0). If x_n is near zero and X*_n happens to be near its mean, then the terms match well; but there is an issue of unnecessary variability here that is manifested in higher standard errors of the lengths of the Masarotto intervals, and also in inflated CVRs; this inflation, however, is due to a fluke, not a bona fide capturing of the predictor variability. Now if x_n is large (in absolute value), there is an issue of bias in the centering of the Masarotto intervals, but this is again masked by the unnecessary/excess variability of the term (φ̂ − φ̂*) X*_n. Thus, adjusting the last p values of the bootstrap series to match the original ones is highly advisable in an AR(p) model. Furthermore, it becomes crucial in situations with heteroscedastic errors as in eq. (1.2), where the scale of the error also depends on these last p values.

4. Bootstrap prediction intervals for nonlinear AR models

The linear AR model (3.1) is, of course, the simplest special case of the additive model (1.1). Nevertheless, there are situations where the autoregression function m(·) is nonlinear. Furthermore, the errors could have (conditional) heteroscedasticity, which would bring us to the more general model (1.2). As it is cumbersome to provide general theory covering all nonlinear AR models, we will address in detail two prominent examples. Section 4.1 focuses on the Threshold AutoRegressive (TAR) models in which the autoregression function m(·) is only piecewise linear. Section 4.3 discusses the Autoregressive Conditional Heteroskedasticity (ARCH) models in which the variance of the error ε_t conditional on X_{t−1}, ..., X_{t−p} is σ²(X_{t−1}, ..., X_{t−p}) as in (1.2).

The predictive analysis of nonlinear AR models, including the nonparametric AR models of Section 5, differs from the analysis of linear AR models in two fundamental ways:
• There is no immediate way of formulating a Backward Bootstrap procedure in the nonlinear/nonparametric AR case; this is due to the difficulty in propagating the error backwards via the nonlinear autoregression function. However, the Forward Bootstrap applies verbatim, including the possibility of resampling the predictive residuals.
• It is not easy to derive the h-step ahead optimal predictors in general when h > 1.

To elaborate on the last point, note that models (1.1) and (1.2) are tailor-made for one-step ahead prediction; under the causality assumption (1.3), the quantity m(X_n, ..., X_{n−p+1}) appearing there is nothing other than the conditional mean E(X_{n+1} | X_n, ..., X_{n−p+1}), which is the MSE-optimal predictor of X_{n+1} given {X_s, s ≤ n}. For h > 1 one might consider iterating the one-step ahead optimal predictor in order to recursively impute the X_{n+1}, ···, X_{n+h−1} required to finally predict X_{n+h}. In a (causal) linear AR model, this imputation procedure indeed leads to the optimal h-step ahead predictor; however, there are no guarantees that iteration will give an optimal, or even reasonable, predictor in the nonlinear case.

Thus, in what follows we focus on the h = 1 case, paired with the following concrete recommendation: if h-step ahead prediction intervals are indeed desired with h > 1, then work directly with the model

X_t = m(X_{t−h}, ..., X_{t−h−p+1}) + σ(X_{t−h}, ..., X_{t−h−p+1}) ε_t     (4.1)

instead of model (1.2), whether σ(·) is constant or not. Notably, all the procedures discussed in this paper are scatterplot-based, so they immediately extend to cover the scatterplot of X_t vs. (X_{t−h}, ..., X_{t−h−p+1}) that is associated with model (4.1).

4.1. Bootstrap prediction intervals for TAR models

Threshold autoregressive (TAR) models were introduced by H. Tong more than 30 years ago; see Tong (2011) [40] for a review. A TAR(p) model is a special case of the additive model (1.1) where the autoregression function m(·) is piecewise linear. For example, a two-regime TAR(p) is defined by (1.1) letting

m(X_{t−1}, ···, X_{t−p}) = (φ^L_0 + φ^L_1 X_{t−1} + ··· + φ^L_p X_{t−p}) 1{X_{t−d} < C}
                         + (φ^R_0 + φ^R_1 X_{t−1} + ··· + φ^R_p X_{t−p}) 1{X_{t−d} ≥ C},     (4.2)

where d is an integer in [1, p], C is the threshold separating the two regimes, and both AR models φ^L and φ^R are assumed causal; a multiple-regime TAR model is defined analogously.

TAR models can be estimated in a straightforward way from the scatterplot; see e.g. Chan (1993)[13]. For example, in the TAR(p) model (4.2), one can estimate φ^L_0, φ^L_1, ···, φ^L_p by Least Squares (LS) using only the points of the scatterplot of X_t vs. (X_{t−1}, ···, X_{t−p}) that correspond to X_{t−d} < C; similarly, one can estimate φ^R_0, φ^R_1, ..., φ^R_p by Least Squares using only the points that correspond to X_{t−d} ≥ C; a code sketch of this regime-wise fitting follows below. The asymptotic theory is immediate as long as the number of scatterplot points in either regime increases in proportion to the sample size. If the threshold C is unknown, it can be estimated (also via LS) at a rate that is faster than √n, so that the limit distribution of the LS estimators remains unaffected; see Li and Ling (2012)[25] and the references therein.

The Algorithms for the four Forward bootstrap prediction interval methods (Ff, Fp, FSf, and FSp) under model (4.2) are identical to the corresponding ones from Section 3, with the understanding that LS estimation, both in the real and in the bootstrap world, is performed as described above, i.e., using only the points of the scatterplot that correspond to the relevant regime. It is also immediate to show the asymptotic pertinence of all four Forward bootstrap prediction intervals under standard conditions. Sufficient conditions are Conditions 1–3 of Chan (1993)[13], i.e., that X_t satisfying (4.2) is a stationary ergodic Markov process with finite 4th moments, and that the AR innovations ε_t possess a uniformly continuous and strictly positive density function.
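A minimal sketch of the regime-wise LS fit for a two-regime TAR(1) with d = 1 and a given threshold C; the helper name and the simple design construction are illustrative assumptions, not the authors' code.

import numpy as np

def fit_tar1(x, C):
    """LS estimates (φ^L_0, φ^L_1) and (φ^R_0, φ^R_1) for a two-regime TAR(1)."""
    x = np.asarray(x, float)
    x_lag, x_resp = x[:-1], x[1:]
    phis = []
    for mask in (x_lag < C, x_lag >= C):
        X = np.column_stack([np.ones(mask.sum()), x_lag[mask]])
        phis.append(np.linalg.lstsq(X, x_resp[mask], rcond=None)[0])
    return phis[0], phis[1]       # left-regime and right-regime coefficients

Given these estimates, the fitted and predictive residuals, and hence the Ff, Fp, FSf and FSp intervals, are obtained exactly as in Section 3, with each bootstrap pseudo-series refitted regime by regime.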

4.2. Monte Carlo studies: TAR(1) case

We now present some simulation results using the simple TAR(1) model: X_t = m(X_{t−1}) + ε_t where m(x) = 0.5x if x < 0 but m(x) = 0.9x if x ≥ 0. The value of the threshold C = 0 was treated as unknown; it was estimated from the data by minimizing the Residual Sum of Squares over the range from the 15th to the 85th percentile of the data; a sketch of this grid search is given below. If the LS estimates of either φ^L or φ^R turned out to be non-causal, then the simulation reverted to fitting a linear AR model covering both regimes. The construction of the simulation parallels the ones in Section 3, and the results are qualitatively similar, with Ff being inferior to the other three: Fp, FSf, and FSp; see Tables 11 and 12.
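A minimal sketch of the threshold search by RSS minimization, restricting candidate thresholds to the central percentiles of the data as described above; names and tolerances are illustrative assumptions.

import numpy as np

def estimate_threshold(x, lo_pct=15, hi_pct=85):
    """Grid-search the threshold C minimizing the total residual sum of squares of a TAR(1) fit."""
    x = np.asarray(x, float)
    x_lag, x_resp = x[:-1], x[1:]
    lo, hi = np.percentile(x, lo_pct), np.percentile(x, hi_pct)
    best_C, best_rss = None, np.inf
    for C in np.unique(x_lag[(x_lag >= lo) & (x_lag <= hi)]):
        rss = 0.0
        for mask in (x_lag < C, x_lag >= C):
            if mask.sum() < 2:
                break                               # too few points in a regime: skip candidate
            X = np.column_stack([np.ones(mask.sum()), x_lag[mask]])
            beta, *_ = np.linalg.lstsq(X, x_resp[mask], rcond=None)
            rss += np.sum((x_resp[mask] - X @ beta) ** 2)
        else:
            if rss < best_rss:
                best_C, best_rss = C, rss
    return best_C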

Normal errors         nominal coverage 95%        nominal coverage 90%
         CVR     LEN     st.err     CVR     LEN     st.err
n = 50
Ff       0.917   4.061   0.630      0.861   3.403   0.504
Fp       0.937   4.354   0.630      0.889   3.668   0.513
FSf      0.936   4.398   0.717      0.885   3.658   0.563
FSp      0.935   4.332   0.637      0.884   3.614   0.514
n = 100
Ff       0.930   3.957   0.409      0.876   3.334   0.310
Fp       0.940   4.117   0.387      0.890   3.472   0.304
FSf      0.939   4.112   0.425      0.889   3.458   0.326
FSp      0.939   4.112   0.389      0.888   3.451   0.304

Table 11: Simulation Results of TAR(1) with normal innovations when threshold is unknown.

Laplace errors        nominal coverage 95%        nominal coverage 90%
         CVR     LEN     st.err     CVR     LEN     st.err
n = 50
Ff       0.925   4.332   0.940      0.874   3.420   0.669
Fp       0.940   4.689   0.999      0.895   3.686   0.672
FSf      0.940   4.775   1.088      0.895   3.744   0.756
FSp      0.939   4.721   1.059      0.894   3.689   0.718
n = 100
Ff       0.935   4.227   0.623      0.884   3.304   0.456
Fp       0.943   4.425   0.624      0.896   3.460   0.457
FSf      0.943   4.446   0.672      0.895   3.457   0.485
FSp      0.943   4.457   0.653      0.896   3.469   0.467

Table 12: Simulation Results of TAR(1) with Laplace innovations when threshold is unknown.

4.3. Bootstrap prediction intervals for ARCH models

Autoregressive Conditional Heteroskedasticity (ARCH) models were introduced by Engle (1982)[4] in an effort to model financial returns and the phenomenon of 'volatility clustering'. In an ARCH(p) model, the variance of the error ε_t conditional on X_{t−1}, ..., X_{t−p} is a function of (X_{t−1}, ..., X_{t−p}) as in (1.2), so there is an interesting structure in the conditional variance of X_t given (X_{t−1}, ..., X_{t−p}). By contrast, in ARCH modeling it is customarily assumed that the conditional mean m(·) ≡ 0; in practice this means that the data have had their conditional mean estimated and removed in a preliminary step. Thus, in this subsection, we consider data from a stationary and ergodic process {X_t} that satisfies the ARCH(p) model:

X_t = σ_{t−1}(β) ε_t   with   σ²_{t−1}(β) = β_0 + β_1 X²_{t−1} + ··· + β_p X²_{t−p}.     (4.3)

In the above, β = (β_0, β_1, ···, β_p)' are the unknown parameters to be estimated, which are assumed nonnegative, and the errors ε_t are i.i.d. (0,1) with finite 4th moment, and independent of {X_s, s < t}. ARCH models are typically estimated using quasi-maximum likelihood estimation (QMLE); see e.g. Weiss (1986)[41], Bollerslev and Wooldridge (1992)[5], and Francq and Zakoian (2010)[17]. The bootstrap prediction intervals of Reeves (2000)[35], Olave Robio (1999)[29] and Miguel and Olave (1999)[27] are all based on QMLE; they are of the 'percentile' type as the APR/PRR method discussed in Section 3.8.2, and, similarly to Cao et al. (1997)[11], do not attempt to capture the estimation error.

Note that eq. (4.3) can be considered as a model with multiplicative i.i.d. error structure. Model-based resampling can be defined in an analogous way using multiplicative errors instead of additive ones. Interestingly, model (4.3) also implies an additive model for the squared data, namely:

X²_t = β_0 + β_1 X²_{t−1} + ··· + β_p X²_{t−p} + τ(X_{t−1}, ..., X_{t−p}) ξ_t,     (4.4)

where ξ_t is a martingale difference and τ(·) an appropriate function; for details, see Kokoszka and Politis (2011)[24] and the references therein. Eq. (4.4) suggests that it may be possible to estimate the ARCH parameters by Least Squares on the scatterplot of X²_t vs. (X²_{t−1}, ···, X²_{t−p}); a minimal sketch of this LS fit is given below. Indeed, this is possible (and consistent) but not optimal. Instead, Bose and Mukherjee (2003) proposed a linear estimator of the ARCH parameters obtained by solving two sets of linear equations. This method does not involve nonlinear optimization and gives a closed-form expression, so it is computationally easier to obtain than the QMLE. Simulation results in Bose and Mukherjee (2003)[7] also show that the proposed estimator performs better than the QMLE even for small sample sizes such as n = 30. Bose and Mukherjee (2009)[8] further proposed a weighted linear estimator (WLE) of the ARCH parameters, and a corresponding bootstrap weighted linear estimator (BWLE) that is asymptotically valid. In the next subsection, we extend the method of Bose and Mukherjee (2009)[8], and propose an algorithm for bootstrap prediction intervals for ARCH models based on the BWLE.
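The squared-data regression implied by (4.4) can be written down directly; the sketch below fits β by ordinary LS on the scatterplot of X_t² versus its lags, only as a simple consistent (though not optimal) baseline. Names are illustrative assumptions.

import numpy as np

def arch_ls_fit(x, p):
    """OLS of X_t^2 on (1, X_{t-1}^2, ..., X_{t-p}^2), cf. eq. (4.4)."""
    y = np.asarray(x, float) ** 2
    n = len(y)
    Z = np.column_stack([np.ones(n - p)] + [y[p - j - 1:n - j - 1] for j in range(p)])
    beta, *_ = np.linalg.lstsq(Z, y[p:], rcond=None)
    return beta          # estimates of (β_0, β_1, ..., β_p)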

4.3.1. Bootstrap Algorithm Based on BWLE with Fitted Residuals

Let {x_1, ···, x_n} be the observations from model (4.3), let y_i = x²_i, z_i = (1, y_i, y_{i−1}, ···, y_{i−p+1})', and set

Z = (z_p, z_{p+1}, ···, z_{n−1})'   and   Y = (y_{p+1}, y_{p+2}, ···, y_n)',

i.e., Z is the (n − p) × (p + 1) matrix whose rows are z'_p, ..., z'_{n−1}, and Y is the corresponding vector of responses. Below is the algorithm for constructing bootstrap prediction intervals for the ARCH(p) model based on the BWLE with fitted residuals (BWLEf).

Algorithm 4.1. Bootstrap algorithm based on BWLE with fitted residuals (BWLEf)
(1) Compute the preliminary weighted least squares estimator (PWLS) as β̂_pr = (Z'UZ)^{−1} Z'UY, where U is an (n − p) × (n − p) diagonal matrix whose i-th diagonal term is

u_i = 1 / [(1 + y_i) ··· (1 + y_{i+p−1})].

And then compute the weighted linear estimator (WLE) of β as

β̂_n = [Σ_{i=p}^{n−1} v_{i−p+1} z_i z'_i / (z'_i β̂_pr)²]^{−1} [Σ_{i=p}^{n−1} v_{i−p+1} z_i y_{i+1} / (z'_i β̂_pr)²],

where v_i = u_i for i = 1, 2, ···, n − p.
(2) Compute the residuals as ε̂_t = x_t / (β̂'_n z_{t−1})^{1/2}. Then center the residuals: r̂_t = ε̂_t − (n − p)^{−1} Σ_{i=p+1}^n ε̂_i for t = p + 1, ···, n. Denote the empirical distribution of the r̂_t by F̂.

(3) (a) Generate an (n − p) × (n − p) diagonal matrix W whose diagonal elements (w_1, ···, w_{n−p}) are a sample from a multinomial(n − p, 1/(n−p), ···, 1/(n−p)) distribution.
    (b) Compute the bootstrapped preliminary weighted least squares estimator (BPWLS) as β̂*_pr = (Z'WUZ)^{−1} Z'WUY and the bootstrapped weighted linear estimator (BWLE) β̂*_n as

β̂*_n = [Σ_{i=p}^{n−1} w_{i−p+1} v_{i−p+1} z_i z'_i / (z'_i β̂*_pr)²]^{−1} [Σ_{i=p}^{n−1} w_{i−p+1} v_{i−p+1} z_i y_{i+1} / (z'_i β̂*_pr)²].

    (c) Compute H²_n = var(w_i) = 1 − 1/(n − p) and

x*_{n+1} = σ_n(β̂_n + (β̂_n − β̂*_n)/H_n) ε*_{n+1},

where σ_n(β) = (β_0 + β_1 X²_n + ··· + β_p X²_{n−p+1})^{1/2}, and ε*_{n+1} is generated from the F̂ of step (2).
(4) Repeat step (3) B times and collect x*_{n+1,1}, ···, x*_{n+1,B} in the form of an empirical distribution whose α-quantile is denoted q(α). Then the (1 − α)100% equal-tailed predictive interval for X_{n+1} is given by

[q(α/2), q(1 − α/2)].     (4.5)

Remark 4.1. Note that under the ARCH model (4.3), E(X_{n+1} | X_s, s ≤ n) = 0 by construction, so the interval (4.5) is always an interval around zero. However, the width of the interval crucially depends on the last p values X_n, ···, X_{n−p+1}, thereby capturing the 'volatility' of the process. In addition, interval (4.5) can also capture potential asymmetry/skewness of the process as it will not be exactly centered at zero.

Remark 4.2. There are other choices available for the preliminary weight U, the bootstrapped preliminary weight W and the bootstrap weight V, as long as they satisfy assumption (W1) and equations (11), (12), (17)–(19), (21) of Bose and Mukherjee (2009)[8]; see also Chatterjee and Bose (2005)[14] for more examples of the bootstrapped preliminary weight W.
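For concreteness, here is a minimal sketch of steps (1)–(2) of Algorithm 4.1 (the PWLS and WLE computations and the centered fitted residuals) for an ARCH(p); the array layout follows the definitions of Z, Y and u_i above, but the function names and the small numerical guards are illustrative assumptions.

import numpy as np

def wle_arch(x, p):
    """PWLS, WLE and centered fitted residuals for the ARCH(p) model (4.3)."""
    x = np.asarray(x, float)
    y = x ** 2
    n = len(x)
    # rows (1, y_{t-1}, ..., y_{t-p}) paired with response y_t, i.e. the matrix Z and vector Y
    lags = np.column_stack([y[p - 1 - j:n - 1 - j] for j in range(p)])
    Z = np.column_stack([np.ones(n - p), lags])
    Yv = y[p:]
    # preliminary weights u_i = 1 / prod(1 + lagged squares in that row)
    u = 1.0 / np.prod(1.0 + lags, axis=1)
    ZU = Z * u[:, None]
    beta_pr = np.linalg.solve(Z.T @ ZU, ZU.T @ Yv)          # PWLS = (Z'UZ)^{-1} Z'UY
    # WLE: reweight by v_i / (z_i'β̂_pr)^2 with v_i = u_i
    denom = np.maximum((Z @ beta_pr) ** 2, 1e-12)           # guard against tiny fitted values
    w_wle = u / denom
    ZW = Z * w_wle[:, None]
    beta_n = np.linalg.solve(Z.T @ ZW, ZW.T @ Yv)
    # fitted residuals ε̂_t = x_t / sqrt(β̂_n' z_{t-1}), then centered
    sigma2 = np.maximum(Z @ beta_n, 1e-12)
    resid = x[p:] / np.sqrt(sigma2)
    resid = resid - resid.mean()
    return beta_pr, beta_n, resid

Step (3) then redraws the multinomial weights w_i and recomputes the same two fits with u_i replaced by w_i u_i, exactly as written in the algorithm.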

4.3.2. Bootstrap Algorithm Based on BWLE with Predictive Residuals

An advantage of using scatterplot-based estimators is that we can obtain the predictive residuals ε̂^{(t)}_t by deleting one data point from the scatterplot. To get the predictive residual ε̂^{(t)}_t, we exclude the pair (z_{t−1}, y_t) from the scatterplot of y_k vs. z_{k−1} in Algorithm 4.1.

Algorithm 4.2. Bootstrap algorithm based on BWLE with predictive residuals (BWLEp)
To get the bootstrap algorithm based on BWLE with predictive residuals, we only need to substitute {ε̂^{(t)}_t, t = p + 1, ···, n} for {ε̂_t, t = p + 1, ···, n} in step (2) of Algorithm 4.1; the rest is the same. The following steps describe in detail how to get the predictive residuals {ε̂^{(t)}_t, t = p + 1, ···, n}.
1. Let Z^{(t)} be the matrix Z excluding the row z'_{t−1}, Y^{(t)} be the vector Y excluding y_t, and U^{(t)} be the diagonal matrix U excluding the diagonal element u_{t−p}.
2. Compute β̂^{(t)}_pr = (Z^{(t)}' U^{(t)} Z^{(t)})^{−1} Z^{(t)}' U^{(t)} Y^{(t)}, and

β̂^{(t)}_n = [Σ_{i=p, i≠t−1}^{n−1} u_{i−p+1} z_i z'_i / (z'_i β̂^{(t)}_pr)²]^{−1} [Σ_{i=p, i≠t−1}^{n−1} u_{i−p+1} z_i y_{i+1} / (z'_i β̂^{(t)}_pr)²].

3. Finally, compute the predictive residual ε̂^{(t)}_t = x_t / (β̂^{(t)}_n' z_{t−1})^{1/2}.

4.3.3. Asymptotic Properties of BWLEf and BWLEp

Assume that the process {X_t, t ≥ 1} is stationary and ergodic, satisfies (4.3), and has E(ε_t⁴) < ∞. We further assume that the assumptions for all the weights V, U and W mentioned in Remark 4.2 are satisfied and that the assumptions of Lemma 2 of Bose and Mukherjee (2009) hold. Under these assumptions, Theorem 2 of Bose and Mukherjee (2009)[8] showed that

sup{|F*_n(x) − F_n(x)|; x ∈ R^{p+1}} = o_p(1),     (4.6)

where F_n and F*_n denote the cumulative distribution functions of √n(β̂_n − β) and H_n^{−1}√n(β̂*_n − β̂_n) in the real and bootstrap world respectively.

Recall that in step (3)(c) of Algorithms 4.1 and 4.2, we employed eq. (4.6) to approximate the distribution of β − β̂_n by the distribution of (β̂_n − β̂*_n)/H_n. The result is that the prediction intervals from Algorithms 4.1 and 4.2 are both asymptotically valid; they can also be called asymptotically pertinent with an appropriate modification of the definition of pertinence to account for the multiplicative model (4.3).

4.3.4. Bootstrap Algorithm Based on QMLE

Bootstrap prediction intervals for ARCH models based on QMLE were proposed by Reeves (2000)[35], Olave Robio (1999)[29] and Miguel and Olave (1999)[27]; for easy reference, their algorithm can be described as follows.

Algorithm 4.3. Bootstrap algorithm based on QMLE

1. Fit an ARCH(p) model to the data. Let β̂ = (β̂_0, β̂_1, ···, β̂_p)' denote the QMLE estimates.
2. Calculate the residuals ε̂_t = x_t/σ_{t−1}(β̂) for t = p + 1, ···, n, and then center them: r_t = ε̂_t − ε̄ for t = p + 1, ···, n, where ε̄ = (n − p)^{−1} Σ_{t=p+1}^n ε̂_t.
3. (a) Use β̂ and the residuals {ε̂_t}, along with initial conditions u*_1 = x_1, ···, u*_p = x_p, to generate {u*_t, t ≥ p + 1} by the recursion:

u*_t = (β̂_0 + β̂_1 u*²_{t−1} + ··· + β̂_p u*²_{t−p})^{1/2} ε*_t,

where ε*_t is a random draw from the pool of centered residuals {r_t, t = p + 1, ···, n}. To ensure stationarity of the pseudo-series, generate n + m pseudo-data for some large positive m, i.e. {u*_1, ···, u*_n, u*_{n+1}, ···, u*_{n+m}}, and discard the first m data.
   (b) Fit an ARCH(p) model to the pseudo-data {x*_t = u*_{t+m}, t = 1, 2, ···, n} and re-estimate the QMLE β̂* = (β̂*_0, β̂*_1, ···, β̂*_p)'.
   (c) Fix the last p pseudo-data to the true data: x*_{n−p+1} = x_{n−p+1}, x*_{n−p+2} = x_{n−p+2}, ···, x*_n = x_n, and generate the future bootstrap values {x*_{n+t}, t ≥ 1} by the following recursion:

x*_{n+t} = (β̂*_0 + β̂*_1 x*²_{n+t−1} + ··· + β̂*_p x*²_{n+t−p})^{1/2} ε*_{n+t},

where ε*_{n+t} is a random draw from the centered residuals.
4. Repeat steps 3(a)–(c) B times and collect the B bootstrap h-step ahead future values in the form of an empirical distribution whose α-quantile is denoted q(α). Construct the (1 − α)100% equal-tailed prediction interval for X_{n+h} as [q(α/2), q(1 − α/2)].

The above prediction interval is of the 'percentile' type as discussed in Section 3.8.2.

Remark 4.3. Our bootstrap algorithms BWLEf and BWLEp are computationally faster and more stable compared to the bootstrap method based on QMLE for two reasons. First, BWLEf and BWLEp have closed-form expressions, namely the solutions of the two sets of linear equations involved, while QMLE requires numerical optimization. Secondly, there is no need to create pseudo-series through recursion in the algorithms for BWLEf and BWLEp.

4.4. Monte Carlo Studies

We use Monte Carlo simulations to assess the performance of our two methods, BWLEf and BWLEp from Algorithms 4.1 and 4.2, and of the bootstrap method based on QMLE of Reeves (2000)[35], Olave Robio (1999)[29] and Miguel and Olave (1999)[27]. We create 500 data sets for each of the following scenarios: sample size n = 50, 100 or 200; innovations from the standard normal or the Laplace (rescaled to unit variance) distribution; data generated from Model 1 or Model 2 listed below.

• Model 1: ARCH(1), X_t = (0.5 + 0.25 X²_{t−1})^{1/2} ε_t
• Model 2: ARCH(2), X_t = (0.1 + 0.2 X²_{t−2})^{1/2} ε_t

normal errors         nominal coverage 95%        nominal coverage 90%
          CVR     LEN     st.err     CVR     LEN     st.err
n = 50
BWLEf     0.927   3.040   0.762      0.877   2.610   0.608
BWLEp     0.941   3.269   0.883      0.890   2.675   0.654
QMLE      0.924   2.964   0.657      0.873   3.507   0.523
n = 100
BWLEf     0.940   3.098   0.607      0.890   2.607   0.491
BWLEp     0.946   3.203   0.631      0.896   2.655   0.496
QMLE      0.937   3.055   0.562      0.888   2.582   0.448
n = 200
BWLEf     0.945   3.190   0.720      0.896   2.680   0.587
BWLEp     0.948   3.230   0.716      0.898   2.700   0.590
QMLE      0.943   3.152   0.650      0.894   2.661   0.545

Table 13: Simulation Results of ARCH(1) model 1 with normal innovations

Laplace errors        nominal coverage 95%        nominal coverage 90%
          CVR     LEN     st.err     CVR     LEN     st.err
n = 50
BWLEf     0.931   3.428   1.457      0.883   2.645   1.080
BWLEp     0.943   3.811   1.735      0.892   2.773   1.128
QMLE      0.928   3.279   1.211      0.877   2.542   0.892
n = 100
BWLEf     0.941   3.466   1.503      0.894   2.662   1.086
BWLEp     0.947   3.615   1.500      0.897   2.717   1.101
QMLE      0.937   3.334   1.228      0.887   2.575   0.919
n = 200
BWLEf     0.941   3.320   0.882      0.892   2.559   0.644
BWLEp     0.944   3.378   0.865      0.893   2.577   0.631
QMLE      0.940   3.272   0.778      0.889   2.531   0.569

Table 14: Simulation Results of ARCH(1) model 1 with Laplace innovations

normal errors         nominal coverage 95%        nominal coverage 90%
          CVR     LEN     st.err     CVR     LEN     st.err
n = 50
BWLEf     0.926   1.339   0.337      0.875   1.124   0.275
BWLEp     0.943   1.477   0.407      0.894   1.193   0.295
QMLE      0.926   1.304   0.225      0.875   1.099   0.179
n = 100
BWLEf     0.936   1.351   0.264      0.886   1.135   0.212
BWLEp     0.946   1.414   0.277      0.895   1.169   0.220
QMLE      0.936   1.332   0.189      0.887   1.125   0.151
n = 200
BWLEf     0.942   1.374   0.237      0.892   1.154   0.198
BWLEp     0.948   1.407   0.252      0.898   1.173   0.202
QMLE      0.943   1.366   0.183      0.892   1.148   0.147

Table 15: Simulation Results of ARCH(2) model 2 with normal innovations

Laplace errors        nominal coverage 95%        nominal coverage 90%
          CVR     LEN     st.err     CVR     LEN     st.err
n = 50
BWLEf     0.930   1.599   1.485      0.883   1.242   1.105
BWLEp     0.944   1.852   2.051      0.896   1.324   1.130
QMLE      0.928   1.537   2.224      0.879   1.134   0.982
n = 100
BWLEf     0.941   1.454   0.357      0.891   1.116   0.257
BWLEp     0.948   1.549   0.405      0.898   1.156   0.271
QMLE      0.939   1.434   0.419      0.890   1.103   0.245
n = 200
BWLEf     0.939   1.486   0.509      0.890   1.145   0.365
BWLEp     0.943   1.523   0.500      0.893   1.165   0.380
QMLE      0.941   1.455   0.391      0.890   1.122   0.212

Table 16: Simulation Results of ARCH(2) model 2 with Laplace innovations

We compare the coverage levels (CVR) and lengths (LEN) of all the intervals constructed by the three methods mentioned above with nominal coverage levels of 95% and 90%. Tables 13, 14, 15 and 16 show that the BWLEp method outperforms both BWLEf and QMLE with respect to the coverage level, but the variability of the interval length is increased as a price to pay for using predictive residuals. Interestingly, BWLEf and QMLE have similar coverage levels but QMLE has the smallest variance of the interval length.

5. Bootstrap Prediction Intervals for Nonparametric Autoregression

In this section, we construct bootstrap prediction intervals in a general nonparametric autoregression model fitted via kernel smoothing. As previously mentioned, the forward bootstrap method, in all its variations, is the unifying principle for bootstrapping all AR models, linear or nonlinear. Thus, for nonparametric AR models we also employ the forward bootstrap method with fitted or predictive residuals as in Sections 3 and 4, and show that it properly estimates the distribution of the future value, capturing both the variability of the kernel estimator and the variability of the innovations from the autoregression model.

In Section 5.1, we give the bootstrap procedure for a nonparametric autoregression with i.i.d. errors, i.e., model (1.1). In Section 5.2, we extend the model allowing heteroscedastic errors, i.e., model (1.2). In either case, the functions m(·) and σ(·) are unknown but assumed smooth, and ε_t ∼ i.i.d. (0, σ²); in (1.2), we further assume σ² = 1. Monte Carlo simulations are given in Section 5.3.

5.1. Nonparametric Autoregression with i.i.d. Innovations

In this subsection, we consider a stationary and geometrically ergodic process of the form (1.1) with the conditional mean function m(·) being unknown but assumed smooth. We first give the algorithms using fitted residuals and predictive residuals and then discuss their asymptotic validity.

5.1.1. Forward Bootstrap Algorithm with Fitted and Predictive Residuals

Given a sample {x_1, x_2, ···, x_n}, let y_t = (x_t, x_{t−1}, ···, x_{t−p+1})' as before. The resampling algorithms for the predictive distribution of the future value X_{n+1} are as follows.

Algorithm 5.1. Forward Bootstrap with Fitted Residuals

(1) For y ∈ R^p, construct the Nadaraya–Watson kernel estimator m̂(·) as

m̂(y) = [Σ_{t=p}^{n−1} K(‖y − y_t‖/h) x_{t+1}] / [Σ_{t=p}^{n−1} K(‖y − y_t‖/h)],     (5.1)

where ‖·‖ is a norm in R^p, K(·) is a compactly supported, symmetric density function on R with bounded derivative satisfying ∫K(v)dv = 1, and the bandwidth satisfies h → 0 but hn → ∞.
(2) Compute the fitted residuals: ε̂_i = x_i − m̂(y_{i−1}) for i = p + 1, ···, n.
(3) Center the residuals: r̂_i = ε̂_i − (n − p)^{−1} Σ_{t=p+1}^n ε̂_t, for i = p + 1, ···, n.

    (a) Sample randomly (with replacement) from the values r̂_{p+1}, ···, r̂_n to create bootstrap pseudo-errors ε*_i, i = −M + p, ···, n + 1, for some large positive M.
    (b) Set (x*_{−M}, x*_{−M+1}, ···, x*_{−M+p−1}) equal to p consecutive values drawn from {x_1, ···, x_n}. Then generate x*_i by the recursion:

x*_i = m̂(y*_{i−1}) + ε*_i   for i = −M + p, ···, n.

    (c) Drop the first M 'burn-in' observations to make sure that the starting values have an insignificant effect. Then construct the kernel estimator m̂*(·) from the bootstrap series {x*_1, ···, x*_n}, i.e., let

m̂*(y) = [Σ_{i=p}^{n−1} K(‖y − y*_i‖/h) x*_{i+1}] / [Σ_{i=p}^{n−1} K(‖y − y*_i‖/h)],     (5.2)

where y*_t = (x*_t, x*_{t−1}, ···, x*_{t−p+1})'.
    (d) Now fix the last p pseudo-values to be the true observations, i.e., redefine y*_n = y_n, and then calculate the bootstrap predictor

X̂*_{n+1} = m̂*(y*_n) = m̂*(y_n)

and the future bootstrap observation

X*_{n+1} = m̂(y*_n) + ε*_{n+1} = m̂(y_n) + ε*_{n+1}.

    (e) Calculate the bootstrap predictive root replicate as X*_{n+1} − X̂*_{n+1}.
(4) Steps (a)-(e) above are repeated B times, and the B bootstrap predictive root replicates are collected in the form of an empirical distribution whose α-quantile is denoted q(α).

(5) Then, a (1 − α)100% equal-tailed predictive interval for Xn+1 is given by

[m̂(y_n) + q(α/2), m̂(y_n) + q(1 − α/2)].     (5.3)

Estimating m(·) in the above could in principle be done via different smoothing methods, e.g., local polynomials, splines, etc. We employ the Nadaraya–Watson kernel estimator m̂(·) just for simplicity and concreteness. To define the predictive residuals, however, recall that the chosen estimator must be scatterplot-based.

Algorithm 5.2. Forward Bootstrap with Predictive Residuals
(1) Same as step (1) of Algorithm 5.1.
(2) Use the delete-x_t dataset as described in Section 3.1.2 to compute the delete-one kernel estimator

m̂^{(t)}(y) = [Σ_{i=p+1, i≠t}^{n} K(‖y − y_{i−1}‖/h) x_i] / [Σ_{i=p+1, i≠t}^{n} K(‖y − y_{i−1}‖/h)]   for t = p + 1, ···, n.     (5.4)

Then calculate the predictive residuals: ε̂^{(t)}_t = x_t − m̂^{(t)}(y_{t−1}) for t = p + 1, ···, n.
(3)-(5) Replace ε̂_t by ε̂^{(t)}_t in Algorithm 5.1; the remaining steps are the same.

The studentized versions of Algorithms 5.1 and 5.2 are defined analogously to the ones in Section 3.

Algorithm 5.3. Forward Studentized bootstrap with fitted residuals (FSf) or predictive residuals (FSp)
For FSf, define σ̂ and σ̂* to be the sample standard deviations of the fitted residuals ε̂_t and the bootstrap residuals ε̂*_t respectively. For FSp, define σ̂ and σ̂* to be the sample standard deviations of the predictive residuals ε̂^{(t)}_t and their bootstrap analogs ε̂^{*(t)}_t respectively. Then, replace steps 3(e) and 6 of Algorithm 5.1 and/or 5.2 by the following steps:
3(e) Calculate a studentized bootstrap root replicate as (X*_{n+1} − X̂*_{n+1})/σ̂*.
(6) Construct the (1 − α)100% equal-tailed predictive interval for X_{n+1} as

[X̂_{n+1} + σ̂ q(α/2), X̂_{n+1} + σ̂ q(1 − α/2)],     (5.5)

where q(α) is the α-quantile of the empirical distribution of the B collected studentized bootstrap roots.
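A minimal end-to-end sketch of Algorithm 5.1 for p = 1 (Nadaraya–Watson estimate, fitted residuals, forward bootstrap generation conditioned on the true last value, and the predictive-root interval). The kernel choice, bandwidth, B and burn-in length are illustrative assumptions, and the code is written for clarity rather than speed.

import numpy as np

def nw_estimate(y0, x_lag, x_resp, h):
    """Nadaraya-Watson estimate m̂(y0) with the Epanechnikov kernel, cf. (5.1)."""
    u = (y0 - x_lag) / h
    w = np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)
    if w.sum() == 0.0:
        return x_resp.mean()                  # crude guard far from the data
    return np.sum(w * x_resp) / np.sum(w)

def forward_bootstrap_np(x, h, B=1000, M=100, alpha=0.05, rng=None):
    """Prediction interval (5.3) for X_{n+1} in model (1.1) with p = 1."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = np.asarray(x, float)
    x_lag, x_resp = x[:-1], x[1:]
    m_hat_xn = nw_estimate(x[-1], x_lag, x_resp, h)
    # fitted residuals ε̂_i = x_i - m̂(y_{i-1}), centered
    resid = np.array([x_resp[i] - nw_estimate(x_lag[i], x_lag, x_resp, h)
                      for i in range(len(x_lag))])
    resid -= resid.mean()
    roots = np.empty(B)
    for b in range(B):
        # steps 3(a)-(c): generate a bootstrap series of length n after M burn-in values
        eps = rng.choice(resid, size=M + len(x), replace=True)
        xb = np.empty(M + len(x))
        xb[0] = rng.choice(x)
        for i in range(1, len(xb)):
            xb[i] = nw_estimate(xb[i - 1], x_lag, x_resp, h) + eps[i]
        xb = xb[M:]
        xb_lag, xb_resp = xb[:-1], xb[1:]
        # step 3(d): predictor and future value both conditioned on the true last value x_n
        pred_star = nw_estimate(x[-1], xb_lag, xb_resp, h)    # m̂*(y_n)
        future_star = m_hat_xn + rng.choice(resid)            # m̂(y_n) + ε*_{n+1}
        roots[b] = future_star - pred_star                    # bootstrap predictive root
    q = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    return m_hat_xn + q[0], m_hat_xn + q[1]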

5.1.2. Asymptotic Properties

In this subsection, we focus on a stationary and geometrically ergodic process of order p = 1; the general case of order p is similar. Then, the nonparametric autoregression model takes the simple form

X_t = m(X_{t−1}) + ε_t,     (5.6)

where the innovations {ε_t} are i.i.d. with mean zero, variance σ², and distribution F that is continuous with density f that is strictly positive; as always, we assume the causality assumption (1.3). To ensure that {X_t} is geometrically ergodic, the following condition is sufficient:

(A) {X_t} obeys (5.6) with |m(x)| ≤ C_1 + C_2|x| for all x and some C_1 < ∞, C_2 < 1.

In a very important work, Franke, Kreiss and Mammen (2002) [18] showed the consistency of the bootstrap in constructing confidence bands for the autoregression function in the nonparametric model (5.6).

Theorem 5.1 (Franke, Kreiss and Mammen (2002) [18]). Consider a dataset X_1 = x_1, ..., X_n = x_n from model (5.6). Assume assumption (A) given above, as well as assumptions (AB1)–(AB10) of Franke, Kreiss and Mammen (2002) [18]. Also assume h → 0 but hn → ∞. If h = O(n^{−1/5}), then

d_0(F, F̂_n) → 0 in probability as n → ∞,

where d_0 is the Kolmogorov distance, and F̂_n is the empirical distribution of the fitted residuals centered at mean zero. Furthermore, if h = o(n^{−1/5}), then

d_0( L*(√(nh){m̂*(x_n) − m̂(x_n)}), L(√(nh){m(x_n) − m̂(x_n)}) ) → 0 in probability.     (5.7)

Below is the analog of Lemma 3.4 in the nonparametric AR case; its proof is in the Appendix.

Lemma 5.2. Under the assumptions of Theorem 5.1 with h = O(n^{−1/5}), we have ε̂^{(t)}_t − ε̂_t = O_p(1/n) as n → ∞.

From the above results, the following corollary is immediate.

Corollary 5.3. Under the assumptions of Theorem 5.1 with h satisfying h n^{1/5} → c ≥ 0, we have: If c > 0, then the prediction interval (5.3) is asymptotically valid, and the same is true for its analog using predictive residuals, i.e., the interval of Algorithm 5.2. Similarly, the two studentized intervals of Algorithm 5.3 (based on fitted or predictive residuals) are asymptotically valid. If c = 0, the four intervals mentioned above are also asymptotically pertinent.

Remark 5.1. The condition h n^{1/5} → c > 0 leads to optimal smoothing in that the large-sample MSE of m̂(x_n) is minimized. In this case, however, the bias of m̂(x_n) becomes of exact order O(1/√(hn)), which is the order of its standard deviation, and (5.7) fails because the bootstrap cannot capture the bias term exactly. This is of course important for confidence interval construction, for which (5.7) was originally developed, and is routinely solved via one of three approaches: (a) plugging in explicit estimates of bias in the two distributions appearing in (5.7); (b) using a bandwidth satisfying h n^{1/5} → 0, leading to under-smoothing, i.e., making the bias of m̂(x_n) negligible compared to the standard deviation; or (c) using the optimal bandwidth h ∼ c n^{−1/5} with c > 0 but resampling based on an over-smoothed estimator. Any of these approaches works, the simplest being under-smoothing, but note that the problem is not as crucial for prediction intervals, which remain asymptotically valid in both cases c > 0 and c = 0. Furthermore, using the optimal bandwidth, the quantity appearing in part (ii) of Definition 2.4 would be O_p(1) instead of o_p(1), so the four intervals mentioned in Corollary 5.3 could be called 'almost' pertinent in the sense that they capture correctly the order of magnitude of the estimation error, which is O(1/√(hn)). Resampling based on an over-smoothed estimator will be further explored in the next section.

5.2. Nonparametric Autoregression with Heteroscedastic Innovations

We now consider the nonparametric autoregression model (1.2). Similarly to Section 5.1, we use Nadaraya–Watson estimators to estimate the unknown smooth functions m and σ. In particular, m̂(y) is exactly as given in (5.1) while σ̂²(y) is defined as

σ̂²(y) = [Σ_{t=p}^{n−1} K(‖y − y_t‖/h) (x_{t+1} − m̂(y_t))²] / [Σ_{t=p}^{n−1} K(‖y − y_t‖/h)],     (5.8)

where, for simplicity, we use the same bandwidth h as the one used for m̂(y).

Remark 5.2. As mentioned in Remark 5.1, in generating the bootstrap pseudo-series it may be advantageous to use over-smoothed estimators of m and σ, which will be denoted by m̂_g and σ̂_g respectively; these are computed in exactly the same way as m̂ and σ̂ but using an over-smoothed bandwidth g, instead of h, that satisfies

g/h → ∞ with h ∼ c n^{−1/5} for some c > 0.     (5.9)

Such over-smoothing was originally proposed for bootstrap confidence intervals in regression by Härdle and Marron (1991)[23]. It can also be useful in the nonparametric AR model (1.1) with i.i.d. innovations, but it is particularly helpful in the heteroscedastic model (1.2).

5.2.1. Forward Bootstrap Algorithm with Fitted Residuals

Given a stationary sample {x_1, x_2, ···, x_n}, let y_t = (x_t, ···, x_{t−p+1})'. The resampling algorithm for the predictive distribution of the future value X_{n+1} is as follows:

Algorithm 5.4.
(1) Construct the estimates m̂(·) and σ̂²(·) by formulas (5.1) and (5.8).
(2) Compute the residuals:

ε̂_i = (x_i − m̂(y_{i−1})) / σ̂(y_{i−1})   for i = p + 1, ···, n.     (5.10)

(3) Center the residuals: r̂_i = ε̂_i − (n − p)^{−1} Σ_{t=p+1}^n ε̂_t, for i = p + 1, ···, n.

    (a) Sample randomly (with replacement) from the values r̂_{p+1}, ···, r̂_n to create bootstrap pseudo-errors ε*_i for i = −M + p, ···, n + 1, where M is some large positive integer.
    (b) Set (x*_{−M}, x*_{−M+1}, ···, x*_{−M+p−1}) equal to p consecutive values from {x_1, ···, x_n}, and then generate x*_i by the recursion:

$$x^*_i = \hat m_g(y^*_{i-1}) + \hat\sigma_g(y^*_{i-1})\,\epsilon^*_i \quad \text{for } i = -M+p, \cdots, n. \qquad (5.11)$$

(c) Drop the first $M$ 'burn-in' observations to make sure that the starting values have an insignificant effect, and then construct the kernel estimator $\hat m^*$ from the bootstrap series $\{x^*_1, \cdots, x^*_n\}$ as in (5.2).
(d) Re-define the last $p$ pseudo-values: $X^*_n = x_n, \cdots, X^*_{n-p+1} = x_{n-p+1}$, i.e., $y^*_n = y_n$ where $y^*_i = (x^*_i, x^*_{i-1}, \cdots, x^*_{i-p+1})'$. Then, compute the bootstrap root replicate as $X^*_{n+1} - \hat X^*_{n+1}$ where $\hat X^*_{n+1} = \hat m^*(y^*_n) = \hat m^*(y_n)$; recall that $\hat m^*$ uses bandwidth $h$ as the original estimator $\hat m$. Also let
$$X^*_{n+1} = \hat m_g(y^*_n) + \hat\sigma_g(y^*_n)\,\epsilon^*_{n+1} = \hat m_g(y_n) + \hat\sigma_g(y_n)\,\epsilon^*_{n+1}.$$
(4) Steps (a)-(d) in the above are repeated $B$ times, and the $B$ bootstrap root replicates are collected in the form of an empirical distribution whose $\alpha$-quantile is denoted $q(\alpha)$.

(5) Then, a $(1-\alpha)100\%$ equal-tailed predictive interval for $X_{n+1}$ is given by

$$\left[\hat m(y_n) + q(\alpha/2),\ \hat m(y_n) + q(1-\alpha/2)\right]. \qquad (5.12)$$
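The following Python sketch illustrates Algorithm 5.4 for $p = 1$ under stated assumptions: the Gaussian kernel, the bandwidth values, the burn-in length $M$, the number of replicates $B$, and the toy data-generating model are all illustrative choices, not prescriptions from the paper.

import numpy as np

rng = np.random.default_rng(1)

def nw(y, centers, targets, bw):
    # generic Nadaraya-Watson smoother of `targets` on `centers`, evaluated at y
    w = np.exp(-0.5 * ((y - centers) / bw) ** 2)
    return np.sum(w * targets) / np.sum(w)

# toy data from a heteroscedastic AR(1), standing in for a sample from model (1.2)
n = 200
x = np.zeros(n)
for t in range(1, n):
    x[t] = np.sin(x[t - 1]) + np.sqrt(0.5 + 0.25 * x[t - 1] ** 2) * rng.normal()
x_lag, x_next = x[:-1], x[1:]

h, g = 0.4, 0.8                                   # estimation and over-smoothed bandwidths (g = 2h)
m_h = lambda y: nw(y, x_lag, x_next, h)           # \hat m
m_g = lambda y: nw(y, x_lag, x_next, g)           # \hat m_g
m_h_fit = np.array([m_h(u) for u in x_lag])
m_g_fit = np.array([m_g(u) for u in x_lag])
s_h = lambda y: np.sqrt(nw(y, x_lag, (x_next - m_h_fit) ** 2, h))   # \hat sigma
s_g = lambda y: np.sqrt(nw(y, x_lag, (x_next - m_g_fit) ** 2, g))   # \hat sigma_g

# steps (2)-(3): fitted residuals (5.10), centered
eps = (x_next - m_h_fit) / np.array([s_h(u) for u in x_lag])
r = eps - eps.mean()

B, M, alpha = 200, 50, 0.05
roots = np.empty(B)
for b in range(B):
    # (a)-(b): bootstrap pseudo-errors and the recursion (5.11), started from an observed value
    e_star = rng.choice(r, size=M + n + 1, replace=True)
    xb = np.empty(M + n)
    xb[0] = rng.choice(x)
    for i in range(1, M + n):
        xb[i] = m_g(xb[i - 1]) + s_g(xb[i - 1]) * e_star[i]
    xb = xb[M:]                                   # (c): drop the burn-in part
    xb_lag, xb_next = xb[:-1], xb[1:]
    m_star = lambda y: nw(y, xb_lag, xb_next, h)  # \hat m^*, same bandwidth h as \hat m
    # (d): condition on the observed y_n = x_n and form the bootstrap root
    x_star_future = m_g(x[-1]) + s_g(x[-1]) * e_star[-1]     # X^*_{n+1}
    roots[b] = x_star_future - m_star(x[-1])                 # X^*_{n+1} - \hat X^*_{n+1}

# steps (4)-(5): quantiles of the roots give the interval (5.12)
q_lo, q_hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
interval = (m_h(x[-1]) + q_lo, m_h(x[-1]) + q_hi)
print(interval)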

The corresponding Bootstrap Algorithm with Predictive Residuals is as follows.

5.2.2. Forward Bootstrap Algorithm with Predictive Residuals

Algorithm 5.5.
(1) Same as step (1) of Algorithm 5.4.
(2) Use the delete-$x_t$ dataset to compute the delete-one kernel estimators $\hat m^{(t)}$ by (5.4) and $\hat\sigma^{(t)}$ by

$$\hat\sigma^{(t)\,2}(y) = \frac{\sum_{i=p+1,\, i\neq t}^{n} K\left(\frac{\|y-y_{i-1}\|}{h}\right)\left(x_i - \hat m^{(t)}(y_{i-1})\right)^2}{\sum_{i=p+1,\, i\neq t}^{n} K\left(\frac{\|y-y_{i-1}\|}{h}\right)}. \qquad (5.13)$$

Then, calculate the predictive residuals:

$$\hat\epsilon^{(t)}_t = \frac{x_t - \hat m^{(t)}(y_{t-1})}{\hat\sigma^{(t)}(y_{t-1})} \quad \text{for } t = p+1, \cdots, n. \qquad (5.14)$$

(3)-(5) Replace $\hat\epsilon_t$ by $\hat\epsilon^{(t)}_t$ in Algorithm 5.4; the remaining steps are the same.

Remark 5.3. As in all nonparametric methods, $\hat m(y)$ and $\hat\sigma(y)$ will only be accurate when $y$ is in a dense area of the $x_t$ vs. $y_{t-1}$ scatterplot, so that local averaging is effective. Computing these estimates in a sparse area of the scatterplot leads to inaccuracies in general, but the problem is compounded when computing predictive residuals. To see why, note that $\sum_{i=p+1,\, i\neq t}^{n} K\left(\frac{\|y_{t-1}-y_{i-1}\|}{h}\right)$ in the denominator of (5.4) and (5.13) could be zero or close to zero if $y_{t-1}$ is far from $\{y_i, i \neq t-1\}$, and the estimates $\hat m^{(t)}$ and $\hat\sigma^{(t)}$ may be ill-defined. Consider the case where $K\left(\frac{\|y_{t-1}-y_{j-1}\|}{h}\right) \neq 0$ for some $j$, and $K\left(\frac{\|y_{t-1}-y_{i-1}\|}{h}\right)$ is close to 0 for all $i \neq t, j$; then $\hat m^{(t)}(y_{t-1}) \approx x_j$ and $\hat\sigma^{(t)}(y_{t-1}) \approx x_j - \hat m^{(t)}(y_{j-1}) \approx 0$. In this case, the denominator of (5.14) is close to zero while $\hat m^{(t)}(y_{j-1})$ is close to $x_j$, and $\hat\epsilon^{(t)}_t$ will be extremely large. To avoid these situations, we delete any $\hat\epsilon^{(t)}_t$ that is ill-defined or extremely large from the pool of predictive residuals; a small sketch of this safeguard is given below.
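The sketch below computes the predictive residuals (5.14) for $p = 1$ with the safeguard just described; the near-zero tolerance and the cutoff defining "extremely large" are arbitrary illustrative choices.

import numpy as np

def predictive_residuals(x, h, cutoff=10.0):
    # delete-one residuals (5.14) for the p = 1 heteroscedastic model, Gaussian kernel
    x_lag, x_next = x[:-1], x[1:]
    W = np.exp(-0.5 * ((x_lag[:, None] - x_lag[None, :]) / h) ** 2)   # W[i, j] = K((y_i - y_j)/h)
    num, den = W @ x_next, W.sum(axis=1)
    out = []
    for t in range(len(x_lag)):
        den_t = den - W[:, t]                        # delete-x_t denominators at every point
        if den_t[t] < 1e-10:
            continue                                  # \hat m^{(t)}, \hat sigma^{(t)} ill-defined here
        # \hat m^{(t)}(y_{i-1}) for all i, obtained by removing the t-th pair from the sums
        m_t_all = (num - W[:, t] * x_next[t]) / np.maximum(den_t, 1e-10)
        w_t = W[t, :].copy()
        w_t[t] = 0.0
        s2_t = np.sum(w_t * (x_next - m_t_all) ** 2) / np.sum(w_t)    # as in (5.13)
        if s2_t < 1e-10:
            continue                                  # near-zero denominator in (5.14)
        e_t = (x_next[t] - m_t_all[t]) / np.sqrt(s2_t)
        if np.isfinite(e_t) and abs(e_t) <= cutoff:   # drop 'extremely large' residuals
            out.append(e_t)
    return np.asarray(out)

# usage on a toy heteroscedastic series
rng = np.random.default_rng(2)
x = np.zeros(300)
for t in range(1, 300):
    x[t] = np.sin(x[t - 1]) + np.sqrt(0.5 + 0.25 * x[t - 1] ** 2) * rng.normal()
eps_pred = predictive_residuals(x, h=0.4)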

5.2.3. Asymptotic Properties

For simplicity, we again focus on a stationary and geometrically ergodic process of order 1. Then the nonparametric autoregression model with heteroscedastic innovations takes the simple form,

$$X_t = m(X_{t-1}) + \sigma(X_{t-1})\,\epsilon_t \qquad (5.15)$$

where the innovations $\{\epsilon_t\}$ are i.i.d. (0,1) and satisfy the causality condition (1.3).

Theorem 5.4 (Franke, Kreiss and Mammen (2002) [18]). Consider a dataset $X_1 = x_1, \ldots, X_n = x_n$ from model (5.15), and choose the bandwidths $h, g$ to satisfy (5.9). Under assumptions (AB1)-(AB12) of Franke, Kreiss and Mammen (2002) [18] we have:
$$d_0\left(\mathcal{L}^*\left(\sqrt{nh}\,\{\hat m_g(x_n) - \hat m^*(x_n)\}\right),\ \mathcal{L}\left(\sqrt{nh}\,\{m(x_n) - \hat m(x_n)\}\right)\right) \xrightarrow{P} 0,$$
$$d_0\left(\mathcal{L}^*\left(\sqrt{nh}\,\{\hat\sigma_g(x_n) - \hat\sigma^*(x_n)\}\right),\ \mathcal{L}\left(\sqrt{nh}\,\{\sigma(x_n) - \hat\sigma(x_n)\}\right)\right) \xrightarrow{P} 0,$$
and
$$d_0(F, \hat F_n) \xrightarrow{P} 0 \quad \text{as } n \to \infty,$$
where $d_0$ is the Kolmogorov distance, and $\hat F_n$ is the empirical distribution of the fitted residuals centered to mean zero.

As before, we also have:

Lemma 5.5. Under the assumptions of Theorem 5.4, $\hat\epsilon^{(t)}_t - \hat\epsilon_t = O_p(\frac{1}{n})$ as $n \to \infty$.

From Theorem 5.4 and Lemma 5.5, the following corollary is immediate.

Corollary 5.6. Under the assumptions of Theorem 5.4, the prediction interval (5.12) is asymptotically pertinent, and the same is true for its analog using predictive residuals, i.e., the interval of Algorithm 5.5.

Note that we can also define intervals based on studentized predictive roots here as well; however, as mentioned in Remark 2.4, this offers little advantage on top of using studentized residuals.

Remark 5.4. To elaborate on the last point, consider the case of fitted residuals; the case of predictive residuals is similar. The predictive root is

$$X_{n+1} - \hat X_{n+1} = m(x_n) - \hat m(x_n) + \sigma(x_n)\,\epsilon_{n+1}.$$

As in Remark 2.4, in order to get a simple estimate of the variance of $X_{n+1} - \hat X_{n+1}$, it is convenient to omit the term $m(x_n) - \hat m(x_n)$, which is asymptotically negligible. This leads to the approximation

$$X_{n+1} - \hat X_{n+1} \simeq \sigma(x_n)\,\epsilon_{n+1},$$

and the corresponding prediction variance estimate $\hat V_n^2 = \hat\sigma^2(x_n)$. In the bootstrap world, we have
$$X^*_{n+1} - \hat X^*_{n+1} = \hat m_g(x_n) - \hat m^*(x_n) + \hat\sigma_g(x_n)\,\epsilon^*_{n+1} \simeq \hat\sigma_g(x_n)\,\epsilon^*_{n+1}.$$

Thus, the interval based on the predictive root method would be based on the approximation

$$P\left(X_{n+1} - \hat X_{n+1} \le a\right) \simeq P^*\left(X^*_{n+1} - \hat X^*_{n+1} \le a\right) \quad \text{for (almost) all } a,$$
which is approximately equivalent to

$$P\left(\sigma(x_n)\,\epsilon_{n+1} \le a\right) \simeq P^*\left(\hat\sigma_g(x_n)\,\epsilon^*_{n+1} \le a\right). \qquad (5.16)$$

By contrast, the studentized predictive roots, real-world and bootstrap, are given by

$$\frac{X_{n+1} - \hat X_{n+1}}{\hat V_n} = \epsilon_{n+1} \quad \text{and} \quad \frac{X^*_{n+1} - \hat X^*_{n+1}}{\hat V^*_n} = \epsilon^*_{n+1}.$$
So, the interval based on the studentized predictive root method would be based on the approximation
$$P\left(\frac{X_{n+1} - \hat X_{n+1}}{\hat V_n} \le a\right) \simeq P^*\left(\frac{X^*_{n+1} - \hat X^*_{n+1}}{\hat V^*_n} \le a\right)$$
which is equivalent to
$$P\left(\frac{\sigma(x_n)}{\hat\sigma(x_n)}\,\epsilon_{n+1} \le a\right) \simeq P^*\left(\frac{\hat\sigma_g(x_n)}{\hat\sigma^*(x_n)}\,\epsilon^*_{n+1} \le a\right). \qquad (5.17)$$
In the above, the error in the estimates $\hat\sigma(x_n)$ and $\hat\sigma^*(x_n)$ is $O_p(1/\sqrt{hn}) = O_p(n^{-2/5})$, while the error in the estimate $\hat\sigma_g(x_n)$ is $O_p(g^2)$, which is of bigger order because of the suboptimal smoothing employed in constructing $\hat\sigma_g(\cdot)$. It is this bigger term of order $O_p(g^2)$ that determines the accuracy of approximation (5.17), making it approximately equivalent to approximation (5.16). Hence, the studentized root method does not promise to offer an advantage here. In fact, as the Monte Carlo simulations of Section 5.3 show, the studentized root intervals FSf and FSp have practically identical performance to their unstudentized counterparts Ff and Fp.
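To make the order comparison above explicit, a back-of-the-envelope calculation using only the bandwidth conditions (5.9) confirms that the $O_p(g^2)$ term indeed dominates:
$$h \sim c\,n^{-1/5} \ \Longrightarrow\ \frac{1}{\sqrt{nh}} \sim \frac{n^{-2/5}}{\sqrt{c}}, \qquad \frac{g^2}{1/\sqrt{nh}} \sim \sqrt{c}\; n^{2/5} g^2 = c^{5/2}\left(\frac{g}{h}\right)^{2} \longrightarrow \infty \quad \text{since } g/h \to \infty,$$
so the bias of $\hat\sigma_g(\cdot)$ is indeed the dominant term in (5.17).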

5.3. Monte Carlo Studies

5.3.1. Simulation Results for Nonparametric Autoregression with i.i.d. Errors

We present Monte Carlo simulations to assess the performance of the bootstrap methods with fitted and predictive residuals through average coverage level (CVR) and length (LEN) as defined in Section 3.7. To evaluate the performance of the bootstrap methods for nonparametric autoregression with i.i.d. innovations, we use the following models, all of order p = 1.

• Model 1: $X_{t+1} = \sin(X_t) + \epsilon_{t+1}$
• Model 2: $X_{t+1} = 0.8\,\log(3X_t^2 + 1) + \epsilon_{t+1}$
• Model 3: $X_{t+1} = -0.5\,\exp(-50X_t^2)\,X_t + \epsilon_{t+1}$
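A brief data-generation sketch for these three recursions is given below; the functional forms follow the displays above, the innovation choices (standard normal, or Laplace rescaled to unit variance) anticipate the description given after Table 17, and the burn-in length is an arbitrary choice.

import numpy as np

def simulate(model, n, innovations="normal", burn_in=200, seed=0):
    # generate a length-n series from Model 1, 2 or 3 after discarding a burn-in stretch
    rng = np.random.default_rng(seed)
    if innovations == "normal":
        eps = rng.normal(size=n + burn_in)
    else:                                             # Laplace rescaled to unit variance
        eps = rng.laplace(scale=1 / np.sqrt(2), size=n + burn_in)
    f = {1: lambda u: np.sin(u),
         2: lambda u: 0.8 * np.log(3 * u ** 2 + 1),
         3: lambda u: -0.5 * np.exp(-50 * u ** 2) * u}[model]
    x = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        x[t] = f(x[t - 1]) + eps[t]
    return x[burn_in:]

x1 = simulate(1, n=100)                               # e.g. a length-100 series from Model 1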

Normal innovations        nominal coverage 95%            nominal coverage 90%
n = 100                   CVR      LEN      st.err        CVR      LEN      st.err
  Ff                      0.927    3.860    0.393         0.873    3.255    0.310
  Fp                      0.943    4.099    0.402         0.894    3.456    0.317
  FSf                     0.938    4.020    0.403         0.887    3.387    0.314
  FSp                     0.939    4.030    0.405         0.888    3.390    0.313
n = 200
  Ff                      0.938    3.868    0.272         0.886    3.263    0.219
  Fp                      0.948    4.012    0.283         0.899    3.385    0.231
  FSf                     0.945    3.966    0.280         0.894    3.339    0.222
  FSp                     0.945    3.970    0.282         0.895    3.344    0.228
Laplace innovations       nominal coverage 95%            nominal coverage 90%
n = 100                   CVR      LEN      st.err        CVR      LEN      st.err
  Ff                      0.933    4.161    0.648         0.879    3.218    0.452
  Fp                      0.944    4.430    0.658         0.896    3.445    0.470
  FSf                     0.942    4.388    0.675         0.892    3.386    0.466
  FSp                     0.942    4.364    0.641         0.892    3.386    0.465
n = 200
  Ff                      0.937    4.122    0.460         0.885    3.198    0.329
  Fp                      0.943    4.275    0.455         0.895    3.341    0.341
  FSf                     0.943    4.250    0.473         0.893    3.293    0.333
  FSp                     0.941    4.234    0.447         0.893    3.299    0.327

Table 17: Nonparametric autoregression with i.i.d. innovations, Model 1

As before, $\{\epsilon_t\}$ are i.i.d. N(0,1) or Laplace rescaled to unit variance; the kernel $K(\cdot)$ was the normal density with bandwidth $h$ chosen by cross-validation (a sketch of such a bandwidth selection is given below). Note that the smaller sample size considered here was $n = 100$ due to the reduced rate of convergence in nonparametric estimation. Tables 17, 18 and 19 summarize the simulation results for each of the four Forward methods (Ff, Fp, FSf, and FSp) under each model. The conclusions are similar to those in the parametric cases, namely that Fp, FSf, and FSp are all better than Ff. As before, using predictive residuals is important in the unstudentized case but not so important in the studentized case, as FSf and FSp have very similar performance.
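For concreteness, the following sketch shows one standard way to pick the Nadaraya-Watson bandwidth by leave-one-out cross-validation on the $(X_t, X_{t+1})$ scatterplot; the candidate grid is an illustrative assumption, and the exact cross-validation criterion used for the reported simulations may differ.

import numpy as np

def cv_bandwidth(x, grid):
    # leave-one-out cross-validation score for the Nadaraya-Watson autoregression fit
    x_lag, x_next = x[:-1], x[1:]
    d2 = (x_lag[:, None] - x_lag[None, :]) ** 2
    best_h, best_score = grid[0], np.inf
    for h in grid:
        W = np.exp(-0.5 * d2 / h ** 2)                # Gaussian kernel weights
        np.fill_diagonal(W, 0.0)                      # leave the own pair out
        m_loo = (W @ x_next) / np.maximum(W.sum(axis=1), 1e-12)
        score = np.mean((x_next - m_loo) ** 2)        # out-of-sample squared error
        if score < best_score:
            best_h, best_score = h, score
    return best_h

rng = np.random.default_rng(3)
x = np.zeros(200)                                     # toy series from Model 1
for t in range(1, 200):
    x[t] = np.sin(x[t - 1]) + rng.normal()
h = cv_bandwidth(x, grid=np.linspace(0.1, 1.5, 15))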

Normal innovations        nominal coverage 95%            nominal coverage 90%
n = 100                   CVR      LEN      st.err        CVR      LEN      st.err
  Ff                      0.928    3.870    0.388         0.875    3.260    0.308
  Fp                      0.946    4.143    0.393         0.900    3.495    0.305
  FSf                     0.940    4.051    0.393         0.890    3.407    0.309
  FSp                     0.941    4.049    0.387         0.891    3.407    0.306
n = 200
  Ff                      0.936    3.877    0.297         0.883    3.270    0.238
  Fp                      0.948    4.053    0.295         0.899    3.418    0.238
  FSf                     0.943    3.992    0.302         0.893    3.359    0.241
  FSp                     0.944    3.999    0.292         0.894    3.364    0.234
Laplace innovations       nominal coverage 95%            nominal coverage 90%
n = 100                   CVR      LEN      st.err        CVR      LEN      st.err
  Ff                      0.932    4.175    0.639         0.880    3.237    0.449
  Fp                      0.944    4.458    0.649         0.898    3.472    0.464
  FSf                     0.942    4.414    0.667         0.893    3.415    0.466
  FSp                     0.941    4.380    0.631         0.893    3.406    0.452
n = 200
  Ff                      0.937    4.110    0.455         0.884    3.192    0.322
  Fp                      0.944    4.284    0.452         0.895    3.347    0.336
  FSf                     0.942    4.247    0.467         0.892    3.296    0.328
  FSp                     0.942    4.234    0.453         0.892    3.297    0.327

Table 18: Nonparametric autoregression with i.i.d. innovations, Model 2

Normal innovations        nominal coverage 95%            nominal coverage 90%
n = 100                   CVR      LEN      st.err        CVR      LEN      st.err
  Ff                      0.932    3.832    0.369         0.881    3.236    0.282
  Fp                      0.940    3.946    0.380         0.892    3.332    0.296
  FSf                     0.940    3.945    0.372         0.891    3.323    0.280
  FSp                     0.940    3.944    0.376         0.891    3.322    0.288
n = 200
  Ff                      0.941    3.862    0.265         0.890    3.257    0.210
  Fp                      0.945    3.926    0.274         0.896    3.312    0.214
  FSf                     0.944    3.921    0.263         0.895    3.302    0.206
  FSp                     0.945    3.929    0.271         0.896    3.309    0.210
Laplace innovations       nominal coverage 95%            nominal coverage 90%
n = 100                   CVR      LEN      st.err        CVR      LEN      st.err
  Ff                      0.934    4.192    0.636         0.882    3.228    0.447
  Fp                      0.938    4.295    0.640         0.890    3.322    0.455
  FSf                     0.941    4.361    0.647         0.891    3.350    0.454
  FSp                     0.940    4.317    0.622         0.891    3.335    0.443
n = 200
  Ff                      0.939    4.143    0.478         0.889    3.205    0.327
  Fp                      0.941    4.190    0.467         0.893    3.269    0.337
  FSf                     0.943    4.226    0.471         0.894    3.268    0.324
  FSp                     0.942    4.206    0.460         0.894    3.273    0.329

Table 19: Nonparametric autoregression with i.i.d. innovations, Model 3

5.3.2. Simulation Results for Nonparametric Autoregression with Heteroscedastic Errors

To evaluate the performance of the bootstrap methods for nonparametric autoregression with het- eroscedastic innovations, we employed the following two models:

• Model 4: $X_t = \sin(X_{t-1}) + \sqrt{0.5 + 0.25X_{t-1}^2}\;\epsilon_t$
• Model 5: $X_{t+1} = 0.75X_t + 0.15X_t\epsilon_{t+1} + \epsilon_{t+1}$

Tables 20 and 21 summarize the simulation results for Model 4 and Model 5 using an over-smoothed resampling bandwidth, i.e., letting $g = 2h$ where $h$ is chosen by cross-validation as in the previous subsection. Doubling the original bandwidth $h$ is a simple rule-of-thumb used in previous work in nonparametric regression. The main points from the simulation are as follows:
• As alluded to at the end of Section 5.2.3, the studentized root intervals FSf and FSp have practically identical performance to their unstudentized counterparts Ff and Fp.
• As in the case of nonparametric regression with heteroscedasticity treated in Politis (2013) [33], resampling predictive residuals gives improved coverage levels as compared to using the fitted residuals.
• It is apparent that the coverages are not as accurate as in the previously considered cases of time series without heteroscedasticity of nonparametric form. Still, the over-smoothing trick seems to be a big part of rendering the CVRs associated with the Fp (or FSp) method reasonable.
To confirm the last point above, we repeat the simulation without an over-smoothed resampling bandwidth, i.e., letting $g = h$; we omit the FSf and FSp entries as they are indistinguishable from their Ff and Fp counterparts. Tables 22 and 23 show the results, which are characterized by unacceptably extreme undercoverage. Using an over-smoothed resampling bandwidth appears to be a sine qua non in the presence of conditional heteroscedasticity whose functional form is unknown.
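The CVR and LEN columns in Tables 20-23 are Monte Carlo averages of exactly this kind of experiment. The schematic below indicates how they can be tallied for Model 4; the helper `forward_interval` is an assumption standing in for an implementation of Algorithm 5.4 or 5.5 (such as the sketch given after (5.12)), and the replication count is illustrative.

import numpy as np

def monte_carlo_cvr_len(forward_interval, n=100, n_rep=500, alpha=0.05, seed=0):
    # tally empirical coverage (CVR) and average length (LEN) of one-step prediction intervals
    rng = np.random.default_rng(seed)
    cover = np.zeros(n_rep, dtype=bool)
    length = np.zeros(n_rep)
    for r in range(n_rep):
        x = np.zeros(n + 1)                            # Model 4, with X_{n+1} held out
        for t in range(1, n + 1):
            x[t] = np.sin(x[t - 1]) + np.sqrt(0.5 + 0.25 * x[t - 1] ** 2) * rng.normal()
        lo, hi = forward_interval(x[:-1], alpha=alpha)  # interval built from the first n values only
        cover[r] = lo <= x[-1] <= hi
        length[r] = hi - lo
    return cover.mean(), length.mean()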

g = 2h                    nominal coverage 95%            nominal coverage 90%
Normal innovations        CVR      LEN      st.err        CVR      LEN      st.err
n = 100
  Ff                      0.894    3.015    0.926         0.843    2.566    0.783
  Fp                      0.922    3.318    1.003         0.868    2.744    0.826
  FSf                     0.894    3.018    0.934         0.843    2.569    0.790
  FSp                     0.923    3.337    1.017         0.869    2.761    0.839
n = 200
  Ff                      0.903    2.903    0.774         0.848    2.537    0.647
  Fp                      0.921    3.164    0.789         0.863    2.636    0.654
  FSf                     0.903    2.986    0.779         0.847    2.534    0.652
  FSp                     0.921    3.168    0.796         0.863    2.638    0.657
Laplace innovations       CVR      LEN      st.err        CVR      LEN      st.err
n = 100
  Ff                      0.895    3.197    1.270         0.843    2.521    0.909
  Fp                      0.921    3.662    1.515         0.866    2.740    0.967
  FSf                     0.894    3.200    1.300         0.843    2.523    0.930
  FSp                     0.922    3.691    1.553         0.866    2.762    0.989
n = 200
  Ff                      0.905    3.028    0.955         0.851    2.395    0.747
  Fp                      0.921    3.285    1.029         0.864    2.514    0.776
  FSf                     0.904    3.026    0.972         0.850    2.392    0.757
  FSp                     0.921    3.294    1.041         0.864    2.520    0.783

Table 20: Heteroscedastic model 4 with g = 2h

g = 2h                    nominal coverage 95%            nominal coverage 90%
Normal innovations        CVR      LEN      st.err        CVR      LEN      st.err
n = 100
  Ff                      0.931    4.063    1.453         0.888    3.465    1.228
  Fp                      0.950    4.672    1.621         0.908    3.704    1.321
  FSf                     0.930    4.054    1.462         0.887    3.457    1.232
  FSp                     0.950    4.481    1.639         0.908    3.711    1.332
n = 200
  Ff                      0.941    4.034    1.322         0.898    3.427    1.113
  Fp                      0.952    4.262    1.404         0.909    3.556    1.173
  FSf                     0.940    4.029    1.326         0.898    3.422    1.117
  FSp                     0.953    4.262    1.411         0.910    3.557    1.178
Laplace innovations       CVR      LEN      st.err        CVR      LEN      st.err
n = 100
  Ff                      0.923    4.414    2.283         0.881    3.512    1.811
  Fp                      0.943    5.081    2.865         0.899    3.827    2.001
  FSf                     0.922    4.384    2.256         0.880    3.490    1.784
  FSp                     0.943    5.103    2.951         0.900    3.844    2.076
n = 200
  Ff                      0.933    4.222    1.484         0.888    3.338    1.151
  Fp                      0.945    4.594    1.656         0.900    3.525    1.249
  FSf                     0.932    4.204    1.475         0.887    3.323    1.149
  FSp                     0.945    4.597    1.661         0.900    3.530    1.255

Table 21: Heteroscedastic model 5 with g = 2h

heteroscedastic           nominal coverage 95%            nominal coverage 90%
Normal innovations        CVR      LEN      st.err        CVR      LEN      st.err
n = 100
  Ff                      0.838    2.630    1.093         0.776    2.238    0.930
  Fp                      0.871    2.889    1.193         0.804    2.391    0.989
n = 200
  Ff                      0.864    2.733    1.007         0.801    2.317    0.845
  Fp                      0.885    2.888    1.024         0.817    2.405    0.846
Laplace innovations       CVR      LEN      st.err        CVR      LEN      st.err
n = 100
  Ff                      0.847    2.719    1.349         0.785    2.147    1.000
  Fp                      0.878    3.094    1.571         0.812    2.330    1.063
n = 200
  Ff                      0.871    2.687    1.098         0.810    2.128    0.873
  Fp                      0.890    2.914    1.207         0.824    2.233    0.921

Table 22: Heteroscedastic model 4 without over-smoothed bandwidth

heteroscedastic           nominal coverage 95%            nominal coverage 90%
Normal innovations        CVR      LEN      st.err        CVR      LEN      st.err
n = 100
  Ff                      0.884    3.619    1.575         0.831    3.079    1.317
  Fp                      0.911    3.969    1.732         0.856    3.290    1.414
n = 200
  Ff                      0.904    3.706    1.490         0.854    3.146    1.255
  Fp                      0.919    3.916    1.602         0.866    3.264    1.323
Laplace innovations       CVR      LEN      st.err        CVR      LEN      st.err
n = 100
  Ff                      0.880    3.886    2.351         0.829    3.085    1.841
  Fp                      0.906    4.452    3.852         0.850    3.354    2.066
n = 200
  Ff                      0.900    3.803    1.787         0.846    3.005    1.405
  Fp                      0.915    4.129    1.962         0.860    3.172    1.485

Table 23: Heteroscedastic model 5 without over-smoothed bandwidth

6. Conclusions

In the paper at hand, we present a comprehensive approach for the construction of prediction intervals in AR models. The construction is based on predictive roots, studentized or not, and notions of validity were defined and discussed. In addition, the usage of predictive residuals in model-based resampling is proposed, and shown to improve coverage levels in finite samples.

There is a lot of previous work in the special case of linear AR models but the literature has been lacking a unifying methodology. We survey the existing approaches and bring them under two umbrellas: Backward vs. Forward bootstrap. The Backward bootstrap has been the most well-known in the literature; we develop further the idea of the Forward bootstrap for prediction intervals, and add the necessary steps needed for it to achieve conditional validity.

To date, little seems to be known concerning prediction intervals for nonlinear and/or nonparametric autoregressions. We show that the Forward bootstrap can be equally applied to such models with some care as regards the particulars; for example, bandwidth considerations are important in the nonparametric case. All in all, it is apparent that the Forward bootstrap with fitted or predictive residuals may serve as the unifying principle for prediction intervals across all types of AR models: linear, nonlinear, or nonparametric.

Appendix A: Technical proofs.

Proof of Lemma 3.4.

Proof. Let
$$M = \begin{pmatrix} 1 & X_{n-1} & \cdots & X_{n-p} \\ 1 & X_{n-2} & \cdots & X_{n-p-1} \\ \vdots & \vdots & & \vdots \\ 1 & X_p & \cdots & X_1 \end{pmatrix}, \qquad N = \begin{pmatrix} X_n \\ X_{n-1} \\ \vdots \\ X_{p+1} \end{pmatrix};$$
also let $M_t$ be the matrix after deleting the row $Z_t = (1, X_{t-1}, \cdots, X_{t-p})'$ in $M$, and $N_t$ be the vector after deleting $X_t$ in $N$. Then

$$\begin{aligned}
\hat\phi &= (M'M)^{-1}M'N \\
&= \left[(M_t'M_t)^{-1}Z_tZ_t' + I_{p+1}\right]^{-1}(M_t'M_t)^{-1}\left(M_t'N_t + Z_tX_t\right) \\
&= \left[I_{p+1}\left(1 + O_p\left(\tfrac{1}{n}\right)\right)\right]^{-1}(M_t'M_t)^{-1}M_t'N_t\left(1 + O_p\left(\tfrac{1}{n}\right)\right) \\
&= \hat\phi^{(t)} + O_p\left(\tfrac{1}{n}\right).
\end{aligned}$$

Here $X_t = O_p(1)$, and $Z_tZ_t'$ is a $(p+1)\times(p+1)$ matrix whose entries are $O_p(1)$, while $M_t'M_t$ is a $(p+1)\times(p+1)$ matrix whose entries are sums of $n$ terms that are bounded in probability; hence, $Z_tZ_t' = M_t'M_t \cdot O_p(\frac{1}{n})$. Similarly, $Z_tX_t = M_t'N_t \cdot O_p(\frac{1}{n})$. Thus, $\hat\epsilon_t^{(t)} - \hat\epsilon_t = Z_t'(\hat\phi - \hat\phi^{(t)}) = O_p(\frac{1}{n})$.
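As a quick numerical illustration of the rate claimed in Lemma 3.4 (and not part of the proof), one can compare the full-sample and delete-one least-squares fits on simulated AR(1) data; the coefficient 0.5, the deleted index, and the sample sizes below are arbitrary choices.

import numpy as np

def ls_fit(x, p=1, skip=None):
    # OLS fit of an AR(p) with intercept; optionally delete one regression row (index `skip`)
    Z = np.column_stack([np.ones(len(x) - p)] +
                        [x[p - j - 1: len(x) - j - 1] for j in range(p)])
    y = x[p:]
    if skip is not None:
        Z, y = np.delete(Z, skip, axis=0), np.delete(y, skip)
    return np.linalg.lstsq(Z, y, rcond=None)[0]

rng = np.random.default_rng(4)
for n in (200, 400, 800, 1600):
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = 0.5 * x[t - 1] + rng.normal()
    diff = np.linalg.norm(ls_fit(x) - ls_fit(x, skip=n // 2))
    print(n, n * diff)      # n * ||phi_hat - phi_hat^{(t)}|| should remain bounded as n grows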

Proof of Lemma 5.2. Proof.

$$\hat m(X_{t-1}) = \frac{\sum_{i=1}^{n-1} K\left(\frac{X_{t-1}-X_i}{h}\right)X_{i+1}}{\sum_{i=1}^{n-1} K\left(\frac{X_{t-1}-X_i}{h}\right)}$$
$$\hat m^{(t)}(X_{t-1}) = \frac{\sum_{i=1}^{n-1} K\left(\frac{X_{t-1}-X_i}{h}\right)X_{i+1} - K\left(\frac{X_{t-1}-X_{t-1}}{h}\right)X_t}{\sum_{i=1}^{n-1} K\left(\frac{X_{t-1}-X_i}{h}\right) - K\left(\frac{X_{t-1}-X_{t-1}}{h}\right)} = \frac{\sum_{i=1}^{n-1} K\left(\frac{X_{t-1}-X_i}{h}\right)X_{i+1} - K(0)X_t}{\sum_{i=1}^{n-1} K\left(\frac{X_{t-1}-X_i}{h}\right) - K(0)}$$

Let $a_n = \sum_{i=1}^{n-1} K\left(\frac{X_{t-1}-X_i}{h}\right)X_{i+1}$, $\delta_a = K(0)X_t$, $b_n = \sum_{i=1}^{n-1} K\left(\frac{X_{t-1}-X_i}{h}\right)$, and $\delta_b = K(0)$. Then

$$\begin{aligned}
\hat m^{(t)}(X_{t-1}) &= \frac{a_n - \delta_a}{b_n - \delta_b} = \frac{a_n}{b_n}\left(\frac{1 - \delta_a/a_n}{1 - \delta_b/b_n}\right) \\
&= \frac{a_n}{b_n}\left(1 - \delta_a/a_n\right)\left(1 + \delta_b/b_n + (\delta_b/b_n)^2 + o\left((\delta_b/b_n)^2\right)\right) \\
&= \frac{a_n}{b_n} + O_p(1/n) = \hat m(X_{t-1}) + O_p(1/n).
\end{aligned}$$

It follows that $\hat\epsilon_t^{(t)} - \hat\epsilon_t = X_t - \hat m^{(t)}(X_{t-1}) - \left(X_t - \hat m(X_{t-1})\right) = O_p(1/n)$.

Proof of Lemma 5.5.

Proof. In a similar way as in Lemma 5.2, we can show that $\hat m(x) - \hat m^{(t)}(x) = O_p(1/n)$ and $\hat\sigma(x) - \hat\sigma^{(t)}(x) = O_p(1/n)$. Then,

$$\begin{aligned}
\hat\epsilon_t - \hat\epsilon_t^{(t)} &= \frac{X_t - \hat m(X_{t-1})}{\hat\sigma(X_{t-1})} - \frac{X_t - \hat m^{(t)}(X_{t-1})}{\hat\sigma^{(t)}(X_{t-1})} \\
&= \frac{X_t - \hat m(X_{t-1}) - \left(X_t - \hat m^{(t)}(X_{t-1})\right)}{\hat\sigma(X_{t-1})}\left(1 + O_p(1/n)\right) \\
&= O_p(1/n)\left(1 + O_p(1/n)\right) = O_p(1/n).
\end{aligned}$$

References

[1] Andrés M. Alonso, Daniel Peña, and Juan Romo. Forecasting time series with sieve bootstrap. Journal of Statistical Planning and Inference, 100(1):1–11, 2002.
[2] R. Beran. Bootstrap methods in statistics. Jahresber. Deutsch. Math.-Verein., 86:14–30, 1984.
[3] Peter Bickel and D.A. Freedman. Some asymptotic theory for the bootstrap. The Annals of Statistics, 9:1196–1217, 1981.
[4] Mixail V. Boldin. The estimate of the distribution of noise in autoregressive scheme. Teoriya Veroyatnostei i ee Primeneniya, 27(4):805–810, 1982.
[5] Tim Bollerslev and Jeffrey M. Wooldridge. Quasi-maximum likelihood estimation and inference in dynamic models with time-varying covariances. Econometric Reviews, 11(2):143–172, 1992.
[6] Arup Bose. Edgeworth correction by bootstrap in autoregressions. The Annals of Statistics, 16:1345–1741, 1988.
[7] Arup Bose and Kanchan Mukherjee. Estimating the ARCH parameters by solving linear equations. Journal of Time Series Analysis, 24(2):127–136, 2003.
[8] Arup Bose and Kanchan Mukherjee. Bootstrapping a weighted linear estimator of the ARCH parameters. Journal of Time Series Analysis, 30(3):315–331, 2009.
[9] George E.P. Box and Gwilym M. Jenkins. Time Series Analysis: Forecasting and Control. San Francisco, CA: Holden Day, 1976.
[10] F. Jay Breidt, Richard A. Davis, and William Dunsmuir. Improved bootstrap prediction intervals for autoregressions. Journal of Time Series Analysis, 16(2):177–200, 1995.
[11] R. Cao, M. Febrero-Bande, W. González-Manteiga, J.M. Prada-Sánchez, and I. García-Jurado. Saving computer time in constructing consistent bootstrap prediction intervals for autoregressive processes. Communications in Statistics – Simulation and Computation, 26(3):961–978, 1997.
[12] R.J. Carroll and D. Ruppert. Transformations and Weighting in Regression. New York: Chapman and Hall, 1988.
[13] K.-S. Chan. Consistency and limiting distribution of the least squares estimator of a threshold autoregressive model. The Annals of Statistics, 21:520–533, 1993.
[14] Snigdhansu Chatterjee and Arup Bose. Generalized bootstrap for estimating equations. The Annals of Statistics, pages 414–436, 2005.
[15] B. Efron and R. Tibshirani. Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science, 1:1–154, 1986.
[16] B. Efron and R. Tibshirani. An Introduction to the Bootstrap. Chapman and Hall/CRC Press, 1994.

[17] C. Francq and J.-M. Zakoian. GARCH Models: Structure, Statistical Inference and Financial Applications. New York: Wiley, 2010.
[18] Jürgen Franke, Jens-Peter Kreiss, and Enno Mammen. Bootstrap of kernel smoothing in nonlinear time series. Bernoulli, 8(1):1–37, 2002.
[19] David Freedman. On bootstrapping two-stage least-squares estimates in stationary linear models. The Annals of Statistics, 12:827–842, 1984.
[20] S. Geisser. Predictive Inference: An Introduction. New York: Chapman and Hall, 1993.
[21] Matteo Grigoletto. Bootstrap prediction intervals for autoregressions: some alternatives. International Journal of Forecasting, 14(4):447–456, 1998.
[22] P. Hall. The Bootstrap and Edgeworth Expansion. New York: Springer, 1992.
[23] W. Härdle and J.S. Marron. Bootstrap simultaneous error bars for nonparametric regression. The Annals of Statistics, pages 778–796, 1991.
[24] P. Kokoszka and D.N. Politis. Nonlinearity of ARCH and stochastic volatility models and Bartlett's formula. Probability and Mathematical Statistics, 31:47–59, 2011.
[25] D. Li and S. Ling. On the least squares estimation of multiple-regime threshold autoregressive models. Journal of Econometrics, 167:240–253, 2012.
[26] Guido Masarotto. Bootstrap prediction intervals for autoregressions. International Journal of Forecasting, 6(2):229–239, 1990.
[27] J.A. Miguel and Pilar Olave. Bootstrapping forecast intervals in ARCH models. Test, 8(2):345–364, 1999.
[28] Purna Mukhopadhyay and V.A. Samaranayake. Prediction intervals for time series: a modified sieve bootstrap approach. Communications in Statistics – Simulation and Computation, 39(3):517–538, 2010.
[29] Pilar Olave Robio. Forecast intervals in ARCH models: bootstrap versus parametric methods. Applied Economics Letters, 6:323–327, 1999.
[30] D.J. Olive. Prediction intervals for regression models. Comput. Statist. and Data Anal., 51:3115–3122, 2007.
[31] Lorenzo Pascual, Juan Romo, and Esther Ruiz. Bootstrap predictive inference for ARIMA processes. Journal of Time Series Analysis, 25(4):449–465, 2004.
[32] J.K. Patel. Prediction intervals: a review. Comm. Statist. Theory Meth., 18:2393–2465, 1989.
[33] Dimitris N. Politis. Model-free model-fitting and predictive distributions (with discussion). Test, 22(2):183–250, 2013.
[34] D.N. Politis, J.P. Romano, and M. Wolf. Subsampling. New York: Springer, 1999.
[35] Jonathan J. Reeves. Bootstrap prediction intervals for ARCH models. International Journal of Forecasting, 21:237–248, 2000.
[36] R.L. Schmoyer. Asymptotically valid prediction intervals for linear models. The Annals of Statistics, 34:399–408, 1992.
[37] R.A. Stine. Bootstrap prediction intervals for regression. J. Amer. Statist. Assoc., 80:1026–1031, 1985.
[38] Robert A. Stine. Estimating properties of autoregressive forecasts. Journal of the American Statistical Association, 82(400):1072–1078, 1987.
[39] Lori A. Thombs and William R. Schucany. Bootstrap prediction intervals for autoregression. Journal of the American Statistical Association, 85(410):486–492, 1990.
[40] H. Tong. Threshold models in time series analysis–30 years on. Statistics and its Interface, 4:107–118, 2011.
[41] Andrew A. Weiss. Asymptotic theory for ARCH models: estimation and testing. Econometric Theory, 2(1):107–131, 1986.