Title: Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions
Permalink: https://escholarship.org/uc/item/67h5s74t
Authors: Li Pan, Dimitris N. Politis
Publication Date: 2014
License: https://creativecommons.org/licenses/by/4.0/
Bootstrap prediction intervals for linear, nonlinear and nonparametric autoregressions
Li Pan and Dimitris N. Politis∗

Li Pan, Department of Mathematics, University of California–San Diego, La Jolla, CA 92093-0112, USA; e-mail: [email protected]

Dimitris N. Politis, Department of Mathematics, University of California–San Diego, La Jolla, CA 92093-0112, USA; e-mail: [email protected]
Abstract: In order to construct prediction intervals without the cumbersome (and typically unjustifiable) assumption of Gaussianity, some form of resampling is necessary. The regression set-up has been well studied in the literature, but time series prediction faces additional difficulties. The paper at hand focuses on time series that can be modeled as linear, nonlinear or nonparametric autoregressions, and develops a coherent methodology for the construction of bootstrap prediction intervals. Forward and backward bootstrap methods using predictive and fitted residuals are introduced and compared. We present detailed algorithms for these different models and show that the bootstrap intervals manage to capture both sources of variability, namely the innovation error as well as the estimation error. In simulations, we compare the prediction intervals associated with different methods in terms of their achieved coverage level and length of interval.
Keywords and phrases: Confidence intervals, forecasting, time series.
1. Introduction
Statistical inference is not considered complete if it is not accompanied by a measure of its inherent accuracy. With point estimators, the accuracy is measured either by a standard error or by a confidence interval. With (point) predictors, the accuracy is measured either by the predictor error variance or by a prediction interval.

In the setting of an i.i.d. (independent and identically distributed) sample, the problem of prediction is not interesting. However, when the i.i.d. assumption no longer holds, the prediction problem is both important and intriguing; see Geisser (1993)[20] for an introduction. Typical situations where the i.i.d. assumption breaks down include regression and time series.

The literature on prediction intervals in regression is not large; see e.g. Carroll and Ruppert (1991)[12], Patel (1989)[32], Schmoyer (1992)[36] and the references therein. Note that to avoid the cumbersome (and typically unjustifiable) assumption of Gaussianity, some form of resampling is necessary. The residual-based bootstrap in regression is able to capture the predictor variability due to errors in model estimation. Nevertheless, bootstrap prediction intervals in regression are often characterized by finite-sample undercoverage. As a remedy, Stine (1985)[37] suggested resampling the studentized residuals, but this modification does not fully correct the problem; see the discussion
∗Research partially supported by NSF grants DMS 13-08319 and DMS 12-23137.
in Olive (2007)[30]. Politis (2013)[33] recently proposed the use of predictive (as opposed to fitted) residuals in resampling, which greatly alleviates the finite-sample undercoverage.

Autoregressive (AR) time series models, be it linear, nonlinear, or nonparametric, have a formal resemblance to the analogous regression models. Indeed, AR models can typically be successfully fitted by the same methods used to estimate a regression, e.g., ordinary Least Squares (LS) regression methods for parametric models, and scatterplot smoothing for nonparametric ones. The practitioner has only to be careful regarding the standard errors of the regression estimates, but the model-based, i.e., residual-based, bootstrap should in principle be able to capture those. Therefore, it is not surprising that model-based resampling for regression can be extended to model-based resampling for autoregression. Indeed, standard errors and confidence intervals based on resampling the residuals from a fitted AR model constituted one of the first bootstrap approaches for time series; cf. Freedman (1984)[19], Efron and Tibshirani (1986)[15], and Bose (1988)[6].

However, the situation as regards prediction intervals is not as clear; for example, the conditional nature of predictive inference in time series poses a difficulty. There are several papers on prediction intervals for linear AR models, but the literature seems scattered and there are many open questions: (a) how to implement the model-based bootstrap for prediction, i.e., how to generate bootstrap series; (b) how to construct prediction intervals given the availability of many bootstrap series already generated; and lastly (c) how to evaluate asymptotic validity of a prediction interval. In addition, little seems to be known regarding prediction intervals for nonlinear and nonparametric autoregressions.

In the paper at hand we attempt to give answers to the above, and provide a comprehensive approach towards bootstrap prediction intervals for linear, nonlinear, or nonparametric autoregressions. The models we will consider are of the general form:

• AR model with homoscedastic errors
$$X_t = m(X_{t-1}, \ldots, X_{t-p}) + \epsilon_t \qquad (1.1)$$

• AR model with heteroscedastic errors
$$X_t = m(X_{t-1}, \ldots, X_{t-p}) + \sigma(X_{t-1}, \ldots, X_{t-p})\,\epsilon_t. \qquad (1.2)$$

In the above, $m(\cdot)$ and $\sigma(\cdot)$ are unknown; if they are assumed to belong to a finite-dimensional, parametric family of functions, then the above describe a linear or nonlinear AR model. If $m(\cdot)$ and $\sigma(\cdot)$ are only assumed to belong to a smoothness class, then the above models describe a nonparametric autoregression. Regarding the errors, the following assumption is made:

$$\epsilon_1, \epsilon_2, \cdots \text{ are i.i.d. } (0, \sigma^2), \text{ and such that } \epsilon_t \text{ is independent of } \{X_s, s < t\} \text{ for all } t; \qquad (1.3)$$

in conjunction with model (1.2), we must further assume that $\sigma^2 = 1$ for identifiability. Note that, under either model (1.1) or (1.2), the causality assumption (1.3) ensures that $E(X_t \mid \{X_s, s < t\}) = m(X_{t-1}, \ldots, X_{t-p})$ gives the optimal predictor of $X_t$ given $\{X_s, s < t\}$; here optimality is with respect to Mean Squared Error (MSE) of prediction.

Section 2 describes the foundations of our approach. Pseudo-series can be generated by either a forward or backward bootstrap, using either fitted or predictive residuals; see Section 2.1 for a discussion. Predictive roots are defined in Section 2.2, while Sections 2.3 and 2.4 discuss notions of asymptotic validity. Section 3 goes in depth as regards bootstrap prediction intervals for linear AR models. Section 4 addresses the nonlinear case using two popular nonlinear models as concrete examples. Finally, Section 5 introduces bootstrap prediction intervals for nonparametric autoregressions. Appendix A contains some technical proofs. A short conclusions section recapitulates the main findings, making the point that the forward bootstrap with fitted or predictive residuals serves as the unifying principle across all types of AR models: linear, nonlinear or nonparametric.
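To fix ideas numerically, data from models (1.1) and (1.2) can be simulated as follows. This is a minimal sketch (our own illustration); the particular choices of $m(\cdot)$ and $\sigma(\cdot)$ below are assumptions made purely for the example.

```python
import numpy as np

def simulate_ar(n, m, sigma=None, p=1, burn=500, rng=None):
    """Simulate n observations from X_t = m(X_{t-1},...,X_{t-p}) + sigma(.)*eps_t.

    m and sigma map the lag vector (X_{t-1}, ..., X_{t-p}) to a scalar;
    sigma=None gives the homoscedastic model (1.1).  A burn-in stretch is
    discarded so that the retained series is approximately stationary.
    """
    rng = np.random.default_rng(rng)
    x = np.zeros(n + burn + p)
    for t in range(p, len(x)):
        past = x[t - p:t][::-1]                  # (X_{t-1}, ..., X_{t-p})
        scale = 1.0 if sigma is None else sigma(past)
        x[t] = m(past) + scale * rng.standard_normal()
    return x[-n:]

# Illustrative (assumed) choices: a smooth nonlinear m and an ARCH-type sigma.
m_fun = lambda past: 0.7 * past[0] / (1.0 + past[0] ** 2)
s_fun = lambda past: np.sqrt(0.5 + 0.3 * past[0] ** 2)
x_hom = simulate_ar(200, m_fun)                  # model (1.1)
x_het = simulate_ar(200, m_fun, s_fun)           # model (1.2)
```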
2. Bootstrap prediction intervals: laying the foundation
2.1. Forward and backward bootstrap for prediction
As previously mentioned, an autoregression can be formally viewed as a regression. However, in prediction with an AR(p) model, linear or nonlinear, an additional difficulty is that the one-step-ahead prediction is done conditionally on the last p observed values, which are themselves random. To fix ideas, suppose $X_1, \cdots, X_n$ are data from the linear AR(1) model: $X_t = \phi_1 X_{t-1} + \epsilon_t$, where $|\phi_1| < 1$ and the $\epsilon_t$ are i.i.d. with mean zero. Given the data, the MSE-optimal predictor of $X_{n+1}$ is $\phi_1 X_n$, which is approximated in practice by plugging in an estimator, say $\hat\phi_1$, for $\phi_1$. Generating bootstrap series $X^*_1, X^*_2, \cdots$ from the fitted AR model enables us to capture the variability of $\hat\phi_1$ when the latter is re-estimated from bootstrap datasets such as $X^*_1, \cdots, X^*_n$.

For the application to prediction intervals, note that the bootstrap also allows us to generate $X^*_{n+1}$ so that the statistical accuracy of the predictor $\hat\phi_1 X_n$ can be gauged. However, none of these bootstrap series will have their last value $X^*_n$ exactly equal to the original value $X_n$ as needed for prediction purposes. Herein lies the problem, since the behavior of the predictor $\hat\phi_1 X_n$ needs to be captured conditionally on the original value $X_n$.

To avoid this difficulty, Thombs and Schucany (1990)[39] proposed to generate the bootstrap data $X^*_1, \cdots, X^*_n$ going backwards from the last value that is fixed at $X^*_n = X_n$. This is the backward bootstrap method that was revisited by Breidt, Davis and Dunsmuir (1995)[10], who gave the correct algorithm for finding the backward errors. Note that the generation of $X^*_{n+1}$ is still done in a forward fashion using the fitted AR model conditionally on the value $X_n$.

Nevertheless, the natural way autoregressions evolve is forward in time, i.e., given $X_{t-1}$, the next observation is generated as $X_t = \phi_1 X_{t-1} + \epsilon_t$, and so on. Thus, it is intuitive to construct bootstrap procedures that run forward in time, i.e., given $X^*_{t-1}$, the next bootstrap observation is generated as

$$X^*_t = \hat\phi_1 X^*_{t-1} + \epsilon^*_t, \qquad (2.1)$$

and so on. Indeed, most (if not all) of the literature on bootstrap confidence intervals for AR models uses the natural time order to generate bootstrap series. It would be nice to be able to build upon this large body of work in order to construct prediction intervals. However, recall that predictive inference is to be conducted conditionally on the last value $X_n$ in order to be able to place prediction bounds around the point predictor $\hat\phi_1 X_n$. So how can one ensure that $X^*_n = X_n$ so that $X^*_{n+1} = \hat\phi_1 X_n + \epsilon^*_{n+1}$?

Aided by the additive structure of the AR model, it is possible to "have our cake and eat it too", i.e., generate bootstrap series forward in time but also ensure that $X^*_{n+1}$ is constructed correctly. This procedure will be called the forward bootstrap method for prediction intervals, and comprises two steps:

A. Choose a starting value $X^*_0$ appropriately, e.g., choose it at random from the original data $X_1, \cdots, X_n$. Then, use recursion (2.1) for $t = 1, 2, \ldots, n$ in order to generate bootstrap data $X^*_1, \cdots, X^*_n$. Re-compute the statistic of interest (in this case $\hat\phi_1$) from the bootstrap data $X^*_1, \cdots, X^*_n$ to obtain the bootstrap statistic $\hat\phi^*_1$.

B. Re-define the last value in the bootstrap world, i.e., let $X^*_n = X_n$. Compute the one-step ahead bootstrap predictor $\hat{X}^*_{n+1} = \hat\phi^*_1 X_n$, and also generate the future bootstrap observation $X^*_{n+1} = \hat\phi_1 X_n + \epsilon^*_{n+1}$.
The above algorithm works because the two constituents of the prediction error $X_{n+1} - \hat{X}_{n+1} = (\phi_1 X_n - \hat\phi_1 X_n) + \epsilon_{n+1}$, i.e., the estimation error $(\phi_1 X_n - \hat\phi_1 X_n)$ and the innovation error $\epsilon_{n+1}$, are independent, and the same is true in the bootstrap world.
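Before generalizing, a minimal Python sketch of steps A and B for this AR(1) example may help fix ideas. This is our own illustration (not code from the paper); it uses fitted residuals and the equal-tailed predictive-root interval that is formalized in Sections 2.2 and 3.1 below.

```python
import numpy as np

def forward_bootstrap_ar1(x, alpha=0.05, B=1000, rng=None):
    """Forward bootstrap interval for X_{n+1} in the zero-mean AR(1) example:
    series are generated forward in time (step A), then the last value is
    re-set to the observed X_n before generating X*_{n+1} (step B)."""
    rng = np.random.default_rng(rng)
    n = len(x)
    phi = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)    # LS estimate of phi_1
    res = x[1:] - phi * x[:-1]                            # fitted residuals
    res = res - res.mean()                                # centered
    roots = np.empty(B)
    for b in range(B):
        eps = rng.choice(res, size=n + 1, replace=True)
        xb = np.empty(n)
        xb[0] = rng.choice(x)                 # step A: random starting value
        for t in range(1, n):
            xb[t] = phi * xb[t - 1] + eps[t]  # recursion (2.1)
        phi_b = np.sum(xb[1:] * xb[:-1]) / np.sum(xb[:-1] ** 2)
        # step B: condition on the observed X_n
        x_future = phi * x[-1] + eps[n]       # X*_{n+1}
        x_pred = phi_b * x[-1]                # hat X*_{n+1}
        roots[b] = x_future - x_pred          # bootstrap predictive root
    pred = phi * x[-1]
    lo, hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    return pred + lo, pred + hi
```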
As stated above, the algorithm is specific to an AR(1) model, but its extension to higher-order models is straightforward and will be given in the sequel. Indeed, the forward bootstrap is the method that can be immediately generalized to apply to nonlinear and nonparametric autoregressions as well, thus forming a unifying principle for treating all AR models. The forward bootstrap idea has been previously used for prediction intervals in linear AR models by Masarotto (1990)[26] and Pascual et al. (2004)[31], but with some important differences; for example, Masarotto (1990)[26] omits the important step B above; see Section 3.8 for a discussion.

Remark 2.1. Both aforementioned bootstrap ideas, backward and forward, hinge on an i.i.d. resampling of the residuals obtained from the fitted model. In the AR(1) case, the fitted residuals are obtained as $\hat\epsilon_t = X_t - \hat\phi_1 X_{t-1}$ for $t = 2, 3, \cdots, n$. Nevertheless, Politis (2013)[33] made a strong case that resampling the predictive residuals gives more accurate prediction intervals in regression, be it linear or nonparametric. Section 3 defines a particular notion of predictive residuals in autoregression, and shows their potential benefit in constructing bootstrap prediction intervals.
2.2. Predictive roots and h-step ahead optimal prediction
Given the ability to generate bootstrap datasets using a valid resampling procedure, the question arises as to how to actually construct the prediction interval. Notably, in the related problem of confidence interval construction there are two main approaches: (a) the percentile approach, along with the associated bias correction and acceleration, expounded upon in Efron and Tibshirani (1994)[16]; and (b) the approach based on pivots and roots as in Bickel and Freedman (1981)[3], Beran (1984)[2], Hall (1992)[22], and Politis, Romano and Wolf (1999)[34]. Both approaches are popular, although the latter is more conducive to theoretical analysis. Politis (2013)[33] gave the definition of 'predictive roots' to be used in order to construct prediction intervals in regression. We will extend this idea to autoregression.

Let $X_1, \cdots, X_n$ be an observed stretch of a time series that follows a stationary autoregressive model of order p, i.e., model (1.1) or (1.2); the autoregression can be linear, nonlinear or nonparametric. The objective is a prediction interval for the h-step ahead value $X_{n+h}$ for some integer $h \geq 1$; the one-step ahead case is, of course, the most basic.

Denote by $\hat{X}_{n+h}$ the point predictor of $X_{n+h}$ based on the data $X_1, \cdots, X_n$; since $\hat{X}_{n+h}$ is a function of the data, we can write $\hat{X}_{n+h} = \Pi(X_1, \cdots, X_n)$. Let $\hat{V}_n^2$ be an estimate of $\mathrm{Var}(X_{n+h} - \hat{X}_{n+h} \mid X_1, \cdots, X_n)$, which is the conditional variance in h-step ahead prediction; since $\hat{V}_n$ is a function of the data, we denote $\hat{V}_n = V(X_1, \cdots, X_n)$.

Definition 2.1. Predictive root and studentized predictive root. The h-step ahead predictive root is defined as $X_{n+h} - \hat{X}_{n+h}$, i.e., it is the error in the h-step ahead prediction. The studentized predictive root is $(X_{n+h} - \hat{X}_{n+h})/\hat{V}_n$.

Given a bootstrap pseudo-series $X^*_1, \cdots, X^*_n$, analogs of the aforementioned quantities can be defined, i.e., $\hat{X}^*_{n+h} = \Pi(X^*_1, \cdots, X^*_n)$ and $\hat{V}^*_n = V(X^*_1, \cdots, X^*_n)$. We can then similarly define the bootstrap predictive roots.

Definition 2.2. Bootstrap predictive root and studentized bootstrap predictive root. The bootstrap predictive root is defined as $X^*_{n+h} - \hat{X}^*_{n+h}$. The studentized bootstrap predictive root is $(X^*_{n+h} - \hat{X}^*_{n+h})/\hat{V}^*_n$.
2.3. Prediction intervals and asymptotic validity
Given the data $X_1, \cdots, X_n$, our goal is to construct a prediction interval that will contain the future value $X_{n+h}$ with a prespecified coverage probability. With an AR(p) model, linear or nonlinear, the predictor will be a function of the last p data points, i.e., $X_{n-p+1}, \cdots, X_n$. Hence the prediction interval's coverage probability should be interpreted as conditional probability given $X_{n-p+1}, \cdots, X_n$.

Definition 2.3. Asymptotic validity of prediction intervals. Let $L_n, U_n$ be functions of the data $X_1, \cdots, X_n$. The interval $[L_n, U_n]$ will be called a $(1-\alpha)100\%$ asymptotically valid prediction interval for $X_{n+h}$ given $X_{n-p+1}, \cdots, X_n$ if

$$P(L_n \leq X_{n+h} \leq U_n) \to 1 - \alpha \quad \text{as } n \to \infty \qquad (2.2)$$

for all $(X_{n-p+1}, \cdots, X_n)$ in a set that has (unconditional) probability equal to one.
The probability $P$ in (2.2) should be interpreted as conditional probability given $X_{n-p+1}, \cdots, X_n$, although it is not explicitly denoted; hence, Definition 2.3 indicates conditional validity of the prediction interval $[L_n, U_n]$.

The salient point in all bootstrap algorithms that will be discussed is to use the bootstrap distribution of the (potentially studentized) bootstrap predictive root to estimate the true distribution of the (potentially studentized) predictive root. Bootstrap probabilities and expectations are usually denoted by $P^*$ and $E^*$, and they are understood to be conditional on the original data $X_1 = x_1, \cdots, X_n = x_n$. Since Definition 2.3 involves conditional validity, we will understand that $P^*$ and $E^*$ are also conditional on $X^*_{n-p+1} = x_{n-p+1}, \cdots, X^*_n = x_n$ when they are applied to 'future' events in the bootstrap world, i.e., events determined by $\{X^*_s \text{ for } s > n\}$; this is not restrictive since we will ensure that our bootstrap algorithms satisfy this requirement. For instance, both $P$ and $P^*$ in Remark 2.2 below represent probabilities conditional on $X_{n-p+1} = x_{n-p+1}, \cdots, X_n = x_n$ and $X^*_{n-p+1} = x_{n-p+1}, \cdots, X^*_n = x_n$ respectively.
Remark 2.2. Suppose the (conditional) probability $P(X_{n+h} - \hat{X}_{n+h} \leq a)$ is a continuous function of $a$ in the limit as $n \to \infty$. If one can show that

$$\sup_a |P(X_{n+h} - \hat{X}_{n+h} \leq a) - P^*(X^*_{n+h} - \hat{X}^*_{n+h} \leq a)| \stackrel{P}{\longrightarrow} 0,$$

then standard results imply that the quantiles of $P^*(X^*_{n+h} - \hat{X}^*_{n+h} \leq a)$ can be used to consistently estimate the quantiles of $P(X_{n+h} - \hat{X}_{n+h} \leq a)$, thus leading to asymptotically valid prediction intervals. Similarly, if one wants to construct asymptotically valid bootstrap prediction intervals based on studentized predictive roots, it suffices to show that

$$\sup_a \left| P\!\left(\frac{X_{n+h} - \hat{X}_{n+h}}{\hat{V}_n} \leq a\right) - P^*\!\left(\frac{X^*_{n+h} - \hat{X}^*_{n+h}}{\hat{V}^*_n} \leq a\right) \right| \stackrel{P}{\longrightarrow} 0.$$
2.4. Asymptotic pertinence of bootstrap prediction intervals
Asymptotic validity is a fundamental property, but it does not tell the whole story. Prediction intervals are particularly useful if they can also capture the uncertainty involved in model estimation, although the latter is asymptotically negligible.

To give a concrete example, consider the simple case where $X_1, X_2, \cdots$ are i.i.d. $N(\mu, \sigma^2)$; this is a special case of an AR model with no dependence present. Given the data $X_1, \cdots, X_n$, we estimate the unknown $\mu, \sigma^2$ by the sample mean and variance $\hat\mu, \hat\sigma^2$ respectively. Then, the exact Normal-theory $(1-\alpha)100\%$ prediction interval for $X_{n+h}$ is given by
$$\hat\mu \pm t_{n-1}(\alpha/2)\,\hat\sigma\sqrt{1 + n^{-1}}. \qquad (2.3)$$

One could use the standard normal quantile $z(\alpha/2)$ instead of $t_{n-1}(\alpha/2)$, i.e., construct the prediction interval:

$$\hat\mu \pm z(\alpha/2)\,\hat\sigma\sqrt{1 + n^{-1}}. \qquad (2.4)$$

Since $1 + n^{-1} \approx 1$ for large n, an even simpler prediction interval is available:

$$\hat\mu \pm z(\alpha/2)\,\hat\sigma. \qquad (2.5)$$
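For concreteness, the three intervals can be computed side by side as follows; this small sketch is our own illustration, with the sample size and parameters chosen arbitrarily.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=50)   # illustrative i.i.d. N(mu, sigma^2) data
n, alpha = len(x), 0.05
mu_hat, s_hat = x.mean(), x.std(ddof=1)

t_q = stats.t.ppf(1 - alpha / 2, df=n - 1)    # t_{n-1}(alpha/2)
z_q = stats.norm.ppf(1 - alpha / 2)           # z(alpha/2)
infl = np.sqrt(1 + 1 / n)                     # sqrt(1 + n^{-1})

exact = (mu_hat - t_q * s_hat * infl, mu_hat + t_q * s_hat * infl)       # (2.3)
pertinent = (mu_hat - z_q * s_hat * infl, mu_hat + z_q * s_hat * infl)   # (2.4)
naive = (mu_hat - z_q * s_hat, mu_hat + z_q * s_hat)                     # (2.5): no 1+1/n factor
```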
Notably, all three of the above prediction intervals are asymptotically valid in the sense of Definition 2.3. Nevertheless, as discussed in Politis (2013)[33], interval (2.5) can be called naive since it fails to take into account the variability that results from the error in estimating the theoretical predictor $\mu$ by $\hat\mu$. The result is that, although asymptotically valid, interval (2.5) will be characterized by under-coverage in finite samples; see Geisser (1993)[20] for an in-depth discussion.

By contrast, interval (2.4) does take into account the variability resulting from estimating the theoretical predictor. Therefore, interval (2.4) deserves to be called something stronger than asymptotically valid; we will call it pertinent to indicate that it asymptotically captures all three elements of the exact interval (2.3), namely:
(i) the quantile $t_{n-1}(\alpha/2)$ associated with the studentized root;
(ii) the error variance $\sigma^2$; and
(iii) the variability associated with the estimated parameters, i.e., the factor $\sqrt{1 + n^{-1}}$.

In general, an exact interval analogous to (2.3) will not be available because of non-normality of the errors and/or nonlinearity of the optimal predictor. A 'pertinent' interval such as (2.4) would be something to strive for. Notably, the bootstrap is an attempt to create prediction intervals that are asymptotically pertinent in that (a) they are able to capture the variability due to the estimated quantities (note that in AR(p) models the correction term inside the square root of (2.3) would be $O(p/n)$, not just $1/n$, and in nonparametric AR models it would be $O(\frac{1}{hn})$ with $h \to 0$ as $n \to \infty$, i.e., this correction is not so trivial); and (b) they are able to approximate well the necessary quantiles.

Interestingly, while interval (2.3) is based on the distribution of the studentized predictive root, the bootstrap can also work with nonstudentized roots; in this case, the bootstrap would attempt to estimate the product $t_{n-1}(\alpha/2)\,\hat\sigma$ as a whole instead of breaking it up into its two constituent pieces. Nevertheless, it may be the case that the studentized bootstrap leads to better approximations, and therefore more accurate prediction intervals, although the phenomenon is not as clear-cut as in the case of bootstrap confidence intervals. Finally, note that bootstrap prediction intervals are not restricted to be symmetric around the predictor like (2.3); thus, they may also capture the skewness of the predictive distribution, which is valuable in its own right.

To formally define the notion of pertinence, consider the homoscedastic model (1.1), and recall that eq. (1.3) implies that the MSE-optimal predictor of $X_{n+1}$ given $X_1 = x_1, \ldots, X_n = x_n$ is $m(x_n, \ldots, x_{n-p+1})$. Hence we set $\hat{X}_{n+1} = \hat{m}(x_n, \ldots, x_{n-p+1})$, where $\hat{m}(\cdot)$ is a consistent estimator of $m(\cdot)$; without loss of generality, assume that $\hat{m}(\cdot)$ has rate of convergence $a_n$, i.e., $a_n(\hat{m}(\cdot) - m(\cdot))$ has a well-defined, non-trivial asymptotic distribution, where $a_n \to \infty$ as $n \to \infty$. Then, the predictive root can be written as

$$X_{n+1} - \hat{X}_{n+1} = \epsilon_{n+1} + A_m \qquad (2.6)$$

where $A_m = m(x_n, \ldots, x_{n-p+1}) - \hat{m}(x_n, \ldots, x_{n-p+1}) = O_p(1/a_n)$ represents the estimation error.
Similarly, the bootstrap predictive root can be written as
$$X^*_{n+1} - \hat{X}^*_{n+1} = \epsilon^*_{n+1} + A^*_m \qquad (2.7)$$
where $A^*_m = \hat{m}(x_n, \ldots, x_{n-p+1}) - \hat{m}^*(x_n, \ldots, x_{n-p+1})$, with $\hat{m}^*(\cdot)$ denoting the estimator of $m(\cdot)$ re-computed from the bootstrap data. By construction, the model-based bootstrap should, in principle, be capable of asymptotically capturing both the pure prediction error, i.e., the distribution of $\epsilon_{n+1}$, as well as the estimation error. We are then led to the following definition.

Definition 2.4. Asymptotic pertinence of bootstrap prediction intervals under model (1.1). Consider a bootstrap prediction interval for $X_{n+1}$ that is based on approximating the distribution of the predictive root $X_{n+1} - \hat{X}_{n+1}$ of eq. (2.6) by the distribution of the bootstrap predictive root $X^*_{n+1} - \hat{X}^*_{n+1}$ of eq. (2.7). The interval will be called asymptotically pertinent provided the bootstrap satisfies the following three conditions as $n \to \infty$, conditionally on $X_{n-p+1} = x_{n-p+1}, \cdots, X_n = x_n$.

(i) $\sup_a |P(\epsilon_{n+1} \leq a) - P^*(\epsilon^*_{n+1} \leq a)| \stackrel{P}{\longrightarrow} 0$, presupposing that the error distribution is continuous.

(ii) $|P(a_n A_m \leq a) - P^*(a_n A^*_m \leq a)| \stackrel{P}{\longrightarrow} 0$ for some sequence $a_n \to \infty$, and for all points $a$ where the assumed nontrivial limit of $P(a_n A_m \leq a)$ is continuous.

(iii) $\epsilon^*_{n+1}$ and $A^*_m$ are independent in the bootstrap world, as their analogs are in the real world due to the causality assumption (1.3).

Furthermore, the bootstrap prediction interval for $X_{n+1}$ that is based on approximating the distribution of the studentized predictive root $(X_{n+1} - \hat{X}_{n+1})/\hat{V}_n$ by the distribution of the studentized bootstrap predictive root $(X^*_{n+1} - \hat{X}^*_{n+1})/\hat{V}^*_n$ will be called asymptotically pertinent if, in addition to (i)-(iii) above, the following also holds:

(iv) $\hat{V}_n / \hat{V}^*_n \stackrel{P}{\longrightarrow} 1$.

For concreteness, the above focuses on one-step ahead prediction, but analogous definitions can be constructed for h-step ahead prediction intervals using studentized or unstudentized predictive roots.

Remark 2.3. Note that asymptotic pertinence is a stronger property than asymptotic validity. In fact, under model (1.1), just part (i) of Definition 2.4, together with the consistency of $\hat{m}(\cdot)$ and $\hat{m}^*(\cdot)$, i.e., the fact that both $A_m$ and $A^*_m$ are $o_p(1)$ due to $a_n \to \infty$, is enough to imply asymptotic validity of the bootstrap prediction interval. Also note that part (ii) of Definition 2.4 is the condition needed in order to show that the bootstrap can yield asymptotically valid confidence intervals for the conditional mean $m(\cdot)$. In many cases in the literature, this condition has already been established; we can build upon this for the purpose of constructing pertinent prediction intervals.

Consider now the heteroscedastic model (1.2). Much of the above discussion carries over verbatim; for example, our predictor of $X_{n+1}$ given $X_1 = x_1, \ldots, X_n = x_n$ is still $\hat{X}_{n+1} = \hat{m}(x_n, \ldots, x_{n-p+1})$. The only difference is that the predictive root is
$$X_{n+1} - \hat{X}_{n+1} = \sigma(x_n, \ldots, x_{n-p+1})\,\epsilon_{n+1} + A_m \qquad (2.8)$$

and the bootstrap predictive root is
$$X^*_{n+1} - \hat{X}^*_{n+1} = \hat\sigma(x_n, \ldots, x_{n-p+1})\,\epsilon^*_{n+1} + A^*_m \qquad (2.9)$$

where $\hat\sigma(\cdot)$ is a (consistent) estimator of $\sigma(\cdot)$ that is employed in the bootstrap data generation mechanism. Hence, the following definition is immediate.

Definition 2.5. Asymptotic pertinence of bootstrap prediction intervals under model (1.2). Consider a bootstrap prediction interval for $X_{n+1}$ that is based on approximating the distribution of the predictive root $X_{n+1} - \hat{X}_{n+1}$ of eq. (2.8) by the distribution of the bootstrap predictive root $X^*_{n+1} - \hat{X}^*_{n+1}$ of eq. (2.9). The interval will be called asymptotically pertinent provided the bootstrap satisfies conditions (i)-(iii) of Definition 2.4 together with the additional consistency requirement:

(iv′) $\sigma(x_n, \ldots, x_{n-p+1}) - \hat\sigma(x_n, \ldots, x_{n-p+1}) \stackrel{P}{\longrightarrow} 0$.

Furthermore, the bootstrap prediction interval for $X_{n+1}$ that is based on approximating the distribution of the studentized predictive root $(X_{n+1} - \hat{X}_{n+1})/\hat{V}_n$ by the distribution of the studentized bootstrap predictive root $(X^*_{n+1} - \hat{X}^*_{n+1})/\hat{V}^*_n$ will be called asymptotically pertinent if, in addition, condition (iv) of Definition 2.4 also holds.
Remark 2.4. Taking into account that $A_m = o_p(1)$ as $n \to \infty$, a simple estimator for the (conditional) variance of the predictive root $X_{n+1} - \hat{X}_{n+1}$ under model (1.2) is $\hat{V}_n = \hat\sigma(x_n, \ldots, x_{n-p+1})$. Thus, in the case of one-step ahead prediction, condition (iv) of Definition 2.4 can be re-written as $\hat\sigma(x_n, \ldots, x_{n-p+1}) - \hat\sigma^*(x_n, \ldots, x_{n-p+1}) \stackrel{P}{\longrightarrow} 0$, i.e., it is just a bootstrap version of condition (iv′) of Definition 2.5. As a matter of fact, resampling in the heteroscedastic model (1.2) entails using studentized residuals. In this case, the predictive root method becomes tantamount to the studentized predictive root method when the simple estimator $\hat{V}_n = \hat\sigma(x_n, \ldots, x_{n-p+1})$ is used; see Section 5.2 for more discussion.
2.5. Prediction interval accuracy: asymptotics vs. finite samples
The notion of 'pertinence' was defined in order to identify a prediction interval that has good finite-sample performance as it imitates all the constituent parts of an 'ideal' interval. Inadvertently, the notion of 'pertinence' was defined via asymptotic requirements. Nevertheless, the asymptotics alone are still not very informative in terms of the interval's finite-sample accuracy, i.e., the attained coverage level.

Going back to the simple interval (2.3), note that we typically have $\hat\sigma^2 = \sigma^2 + O_p(1/\sqrt{n})$, while the approximations $z(\alpha/2) \approx t_{n-1}(\alpha/2)$ and $\sqrt{1 + n^{-1}} \approx 1$ have a smaller error of order $O(1/n)$. So the determining factor for the asymptotics is the accuracy of the estimator $\hat\sigma^2$. However, there are many $\sqrt{n}$-convergent estimators of $\sigma^2$ having potentially very different finite-sample behavior. The default estimator is the sample variance of the fitted residuals, but this tends to be downwardly biased; the predictive residuals, although asymptotically equivalent to the fitted residuals, lead to improved finite-sample performance as alluded to in Remark 2.1.

Similarly, the quantity in (i) of Definition 2.4 is typically of order $O_p(1/\sqrt{n})$. In general, this rate of convergence cannot be improved, and it dictates the rate of convergence for the coverage probability of eq. (2.2) under the homoscedastic model (1.1). Once again, using the empirical distribution of the predictive, as opposed to the fitted, residuals in order to generate $\epsilon^*_t$ makes a huge practical difference in finite samples despite the fact that the two are asymptotically equivalent.

The most challenging situation in practice occurs in the set-up of the heteroscedastic model (1.2) with the conditional variance function $\sigma(\cdot)$ being unknown but assumed smooth. Here the rate of convergence of the coverage probability of eq. (2.2) will be dictated by the nonparametric rate of estimating $\sigma(\cdot)$, i.e., the rate of convergence of the quantity appearing in condition (iv′) of Definition 2.5; see Section 5.2 for more details. This is perhaps the only case where the asymptotics are informative as regards the finite-sample performance of the prediction interval, i.e., the less accurately attained coverage levels in the presence of heteroscedasticity of unknown functional form.
3. Bootstrap Prediction Intervals for Linear Autoregressions
Consider the strictly stationary, causal AR(p) model defined by the recursion
$$X_t = \phi_0 + \sum_{j=1}^{p} \phi_j X_{t-j} + \epsilon_t \qquad (3.1)$$
which is a special case of model (1.1), with the $\epsilon_t$ being i.i.d. with mean zero, variance $\sigma^2$ and distribution $F$. The assumed causality condition (1.3) is now tantamount to $\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p \neq 0$ for $|z| \leq 1$. Denote by $\phi = (\phi_0, \phi_1, \phi_2, \cdots, \phi_p)'$ the vector of autoregressive parameters, and by $\hat\phi = (\hat\phi_0, \hat\phi_1, \cdots, \hat\phi_p)'$ and $\hat\phi(z) = 1 - \hat\phi_1 z - \cdots - \hat\phi_p z^p$ the respective estimates. Let $\hat{X}_t$ be the fitted value of $X_t$, i.e., $\hat{X}_t = \hat\phi_0 + \sum_{j=1}^{p} \hat\phi_j X_{t-j}$. Finally, let $Y_t = (X_t, X_{t-1}, \cdots, X_{t-p+1})'$ be the vector of the last p observations up to $X_t$.

Stine (1987)[38] used a bootstrap method to estimate the prediction mean squared error of the estimated linear predictor of an AR(p) model with i.i.d. Gaussian errors. Relaxing the assumption of Gaussian errors, Thombs and Schucany (1990)[39] proposed a backward bootstrap method to find prediction intervals for linear autoregressions conditioned on the last p observations; their method was described in Section 2.1. The backward bootstrap method was revisited by Breidt, Davis and Dunsmuir (1995)[10], who gave the correct algorithm for finding the backward errors.

Masarotto (1990)[26] proposed a forward bootstrap method based on the studentized predictive root to obtain prediction intervals for AR(p) models. Notably, his method omits the crucial step B of the forward bootstrap method defined in Section 2.1. As a result, his intervals are not asymptotically pertinent since the basic premise of Definition 2.4 regarding the construction of the interval is not satisfied; however, his intervals are asymptotically valid because the omitted/distorted term has to do with the estimation error, which vanishes asymptotically. Finally, Pascual et al. (2004)[31] proposed another forward bootstrap method and applied it to prediction intervals for both autoregressive and ARMA models; their intervals are constructed via an analog of the percentile method without considering predictive roots; see Section 3.8 for more discussion.

In the present section, we first give the detailed algorithms for constructing forward bootstrap prediction intervals using fitted and/or predictive residuals, and then prove the consistency of the predictive root method for prediction intervals. We then study the corresponding backward methods. We show how both backward and forward methods can be improved by introducing the predictive residuals. In simulations, we will see that the methods with predictive residuals have improved coverage level compared to the methods with fitted residuals; this result is not unexpected since a similar phenomenon occurs in linear regression; cf. Politis (2013)[33]. In Section 3.8, we review alternative approaches to construct bootstrap prediction intervals and compare them with ours.
3.1. Forward Bootstrap Algorithm
As described in Section 2.1, the idea of the forward bootstrap method is that, given observations $X_1 = x_1, \cdots, X_n = x_n$, we can use the fitted AR recursion to generate bootstrap series "forward" in time starting from some initial conditions. This recursion stops when n bootstrap data have been generated; to generate the (n+1)-th bootstrap point (and beyond), the recursion has to be re-started with different initial values that are fixed to be the last p original observations. The details for estimating the coefficients, generating the bootstrap pseudo-data and constructing the prediction intervals using both fitted and predictive residuals are given below in Sections 3.1.1 and 3.1.2.
3.1.1. Forward Bootstrap with Fitted Residuals
Given a sample $\{x_1, \cdots, x_n\}$ from (3.1), the following are the steps needed to construct the prediction interval for the future value $X_{n+h}$ based on the predictive root method.

Algorithm 3.1. Forward bootstrap with fitted residuals (Ff)

1. Use all observations $x_1, \cdots, x_n$ to obtain the Least Squares (LS) estimator $\hat\phi = (\hat\phi_0, \hat\phi_1, \cdots, \hat\phi_p)'$ by fitting the following linear model:

$$\begin{pmatrix} x_n \\ x_{n-1} \\ \vdots \\ x_{p+1} \end{pmatrix} = \begin{pmatrix} 1 & x_{n-1} & \cdots & x_{n-p} \\ 1 & x_{n-2} & \cdots & x_{n-p-1} \\ \vdots & \vdots & & \vdots \\ 1 & x_p & \cdots & x_1 \end{pmatrix} \begin{pmatrix} \phi_0 \\ \phi_1 \\ \vdots \\ \phi_p \end{pmatrix} + \begin{pmatrix} \epsilon_n \\ \epsilon_{n-1} \\ \vdots \\ \epsilon_{p+1} \end{pmatrix}. \qquad (3.2)$$
2. For $t = p+1, \cdots, n$, compute the fitted values and the fitted residuals:

$$\hat{x}_t = \hat\phi_0 + \sum_{j=1}^{p} \hat\phi_j x_{t-j}, \quad \text{and} \quad \hat\epsilon_t = x_t - \hat{x}_t.$$
3. Center the fitted residuals: let $r_t = \hat\epsilon_t - \bar{\hat\epsilon}$ for $t = p+1, \cdots, n$, where $\bar{\hat\epsilon} = (n-p)^{-1}\sum_{t=p+1}^{n} \hat\epsilon_t$; let the empirical distribution of the $r_t$ be denoted by $\hat{F}_n$.

(a) Draw bootstrap pseudo-residuals $\{\epsilon^*_t, t \geq 1\}$ i.i.d. from $\hat{F}_n$.

(b) To ensure stationarity of the bootstrap series, generate $n + m$ pseudo-data for some large positive $m$ and then discard the first $m$ data. Let $(u^*_1, \cdots, u^*_p)$ be chosen at random from the set of p-tuplets $\{(x_k, \cdots, x_{k+p-1}) \text{ for } k = 1, \cdots, n-p+1\}$; then generate $\{u^*_t, t \geq p+1\}$ by the recursion:
$$u^*_t = \hat\phi_0 + \sum_{j=1}^{p} \hat\phi_j u^*_{t-j} + \epsilon^*_t, \quad \text{for } t = p+1, \cdots, n+m.$$
Then define $x^*_t = u^*_{m+t}$ for $t = 1, 2, \cdots, n$.

(c) Based on the pseudo-data $\{x^*_1, \cdots, x^*_n\}$, re-estimate the coefficients $\phi$ by the LS estimator $\hat\phi^* = (\hat\phi^*_0, \hat\phi^*_1, \cdots, \hat\phi^*_p)'$ as in step 1. Then compute the future bootstrap predicted values $\hat{x}^*_{n+1}, \cdots, \hat{x}^*_{n+h}$ by the recursion:
$$\hat{x}^*_{n+t} = \hat\phi^*_0 + \sum_{j=1}^{p} \hat\phi^*_j \hat{x}^*_{n+t-j} \quad \text{for } t = 1, \cdots, h$$
where $\hat{x}^*_{n+t-j} = x_{n+t-j}$ when $t \leq j$.

(d) In order to conduct conditionally valid predictive inference, re-define the last p observations to match the original observed values, i.e., let $x^*_{n-p+1} = x_{n-p+1}, \cdots, x^*_n = x_n$. Then, generate the future bootstrap observations $x^*_{n+1}, x^*_{n+2}, \cdots, x^*_{n+h}$ by the recursion:
$$x^*_{n+t} = \hat\phi_0 + \sum_{j=1}^{p} \hat\phi_j x^*_{n+t-j} + \epsilon^*_{n+t}, \quad \text{for } t = 1, 2, \cdots, h.$$
(e) Calculate a bootstrap root replicate as $x^*_{n+h} - \hat{x}^*_{n+h}$.

4. Steps (a)-(e) above are repeated B times, and the B bootstrap replicates are collected in the form of an empirical distribution whose α-quantile is denoted $q(\alpha)$.

5. Compute the predicted future values $\hat{x}_{n+1}, \cdots, \hat{x}_{n+h}$ by the following recursion:

$$\hat{x}_{n+t} = \hat\phi_0 + \sum_{j=1}^{p} \hat\phi_j \hat{x}_{n+t-j} \quad \text{for } t = 1, \cdots, h$$
where $\hat{x}_{n+t-j} = x_{n+t-j}$ for $t \leq j$.

6. Construct the $(1-\alpha)100\%$ equal-tailed prediction interval for $X_{n+h}$ as
$$[\hat{x}_{n+h} + q(\alpha/2),\ \hat{x}_{n+h} + q(1-\alpha/2)]. \qquad (3.3)$$
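For reference, a compact Python sketch of Algorithm 3.1 follows (our own illustration; the helper names, the burn-in length m, and the number of replicates B are our choices).

```python
import numpy as np

def fit_ar_ls(x, p):
    """Step 1: LS fit of model (3.2); returns phi = (phi_0, phi_1, ..., phi_p)."""
    n = len(x)
    X = np.column_stack([np.ones(n - p)] +
                        [x[p - j:n - j] for j in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return phi

def ar_path(phi, history, h, eps=None):
    """Iterate the fitted AR(p) recursion h steps ahead of `history` (oldest
    first); innovations `eps` are added if supplied, else point predictions."""
    p = len(phi) - 1
    vals = list(history[-p:])
    for t in range(h):
        nxt = phi[0] + np.dot(phi[1:], vals[-1:-p - 1:-1])
        if eps is not None:
            nxt += eps[t]
        vals.append(nxt)
    return vals[p:]

def ff_interval(x, p, h, alpha=0.05, B=1000, m=500, rng=None):
    """Algorithm 3.1 (Ff): forward bootstrap interval for X_{n+h}."""
    rng = np.random.default_rng(rng)
    n = len(x)
    phi = fit_ar_ls(x, p)
    res = np.array([x[t] - (phi[0] + np.dot(phi[1:], x[t - p:t][::-1]))
                    for t in range(p, n)])          # step 2: fitted residuals
    res = res - res.mean()                          # step 3: centering
    roots = np.empty(B)
    for b in range(B):
        # steps (a)-(b): forward-generate n + m values, discard the first m
        k = rng.integers(0, n - p + 1)
        u = list(x[k:k + p])                        # random p-tuple start
        eps = rng.choice(res, size=n + m, replace=True)
        for t in range(n + m - p):
            u.append(phi[0] + np.dot(phi[1:], u[-1:-p - 1:-1]) + eps[t])
        xb = np.array(u[-n:])
        phi_b = fit_ar_ls(xb, p)                    # step (c): re-estimate phi
        pred_b = ar_path(phi_b, x, h)[-1]           # hat x*_{n+h}, seeded by the data
        # step (d): future bootstrap observations conditional on the last p data
        fut = ar_path(phi, x, h, eps=rng.choice(res, size=h, replace=True))[-1]
        roots[b] = fut - pred_b                     # step (e): bootstrap root
    q_lo, q_hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    pred = ar_path(phi, x, h)[-1]                   # step 5: point prediction
    return pred + q_lo, pred + q_hi                 # step 6: interval (3.3)
```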
3.1.2. Forward Bootstrap with Predictive Residuals
Motivated by Politis (2013)[33], we consider using predictive, as opposed to fitted, residuals for the bootstrap. We define the predictive residuals in the AR context as $\hat\epsilon^{(t)}_t = x_t - \hat{x}^{(t)}_t$, where $\hat{x}^{(t)}_t$ is computed from the delete-$x_t$ data set, i.e., the available data for the scatterplot of $x_k$ vs. $\{x_{k-p}, \cdots, x_{k-1}\}$ over which the LS fitting takes place excludes the single point that corresponds to $k = t$. The forward bootstrap procedure with predictive residuals is similar to Algorithm 3.1; the only difference is in step 2.

Algorithm 3.2. Forward bootstrap with predictive residuals (Fp)

1. Same as step 1 in Algorithm 3.1.

2. Use the delete-$x_t$ dataset to compute the LS estimator
$$\hat\phi^{(t)} = (\hat\phi^{(t)}_0, \hat\phi^{(t)}_1, \cdots, \hat\phi^{(t)}_p)'$$
as in step 1, i.e., compute $\hat\phi^{(t)}$ by changing the regression model (3.2) as follows: delete the row of $x_t$ on the left-hand side of (3.2), delete the row $(1, x_{t-1}, \cdots, x_{t-p})$ from the design matrix, delete $\epsilon_t$ from the vector of errors, and change $\phi$ to $\phi^{(t)}$ on the right-hand side. Then, calculate the delete-$x_t$ fitted values:

$$\hat{x}^{(t)}_t = \hat\phi^{(t)}_0 + \sum_{j=1}^{p} \hat\phi^{(t)}_j x_{t-j}, \quad \text{for } t = p+1, \cdots, n$$
and the predictive residuals: $\hat\epsilon^{(t)}_t = x_t - \hat{x}^{(t)}_t$ for $t = p+1, \cdots, n$; a code sketch of this delete-one computation is given after Remark 3.1 below.

3-6. Change the $\hat\epsilon_t$ into $\hat\epsilon^{(t)}_t$; the rest is the same as in Algorithm 3.1.

Remark 3.1. The LS estimator $\hat\phi$ is asymptotically equivalent to the popular Yule-Walker (YW) estimators for fitting AR models. The advantage of the YW estimators is that they almost surely lead to a causal fitted model. By contrast, the LS estimator $\hat\phi$ is only asymptotically causal, but it is completely scatterplot-based, and thus convenient in terms of our notion of predictive residuals. Indeed, for any bootstrap method using fitted residuals (studentized or not), e.g., the forward Algorithm 3.1 or the backward Algorithm 3.5 in the sequel, we could equally employ the Yule-Walker instead of the LS estimators. But for methods using our notion of predictive residuals, it is most convenient to be able to employ the LS estimators. To elaborate, consider the possibility that for our given dataset the LS estimator $\hat\phi$ turns out to be non-causal; we then have two options:
• Use a bootstrap algorithm with fitted residuals based on the YW estimator $\tilde\phi = \hat\Gamma_p^{-1}\hat\gamma_p$, where $\hat\Gamma_p$ is the $p \times p$ matrix with $ij$-th element $\hat\gamma(i-j)$, $\hat\gamma_p$ is a vector with $i$-th element $\hat\gamma(i) = n^{-1}\sum_{k=1}^{n-|i|}(X_k - \bar{X})(X_{k+|i|} - \bar{X})$, and $\bar{X} = n^{-1}\sum_{k=1}^{n} X_k$.

• To use the bootstrap with predictive residuals, we can define the delete-$x_t$ estimates of the autocovariance as $\hat\gamma^{(t)}(i) = n^{-1}\sum_k (X_k - \bar{X}^{(t)})(X_{k+|i|} - \bar{X}^{(t)})$, where the summation runs over $k = 1, \ldots, n-|i|$ but with $k \neq t$ and $k \neq t - |i|$, and $\bar{X}^{(t)} = (n-1)^{-1}\sum_{k \neq t} X_k$. Then define the delete-$x_t$ YW estimator $\tilde\phi^{(t)} = (\tilde\phi^{(t)}_0, \tilde\phi^{(t)}_1, \cdots, \tilde\phi^{(t)}_p)' = (\hat\Gamma^{(t)}_p)^{-1}\hat\gamma^{(t)}_p$, where the matrix $\hat\Gamma^{(t)}_p$ has $ij$-th element $\hat\gamma^{(t)}(i-j)$, and the vector $\hat\gamma^{(t)}_p$ has $i$-th element $\hat\gamma^{(t)}(i)$. Now, barring positive definiteness issues, we can carry out Algorithm 3.2 with $\tilde\phi^{(t)}$, $\tilde{x}^{(t)}_t = \tilde\phi^{(t)}_0 + \sum_{j=1}^{p} \tilde\phi^{(t)}_j x_{t-j}$ and $\tilde\epsilon^{(t)}_t = x_t - \tilde{x}^{(t)}_t$ instead of $\hat\phi^{(t)}$, $\hat{x}^{(t)}_t$ and $\hat\epsilon^{(t)}_t$ respectively.

Note that if the LS estimator $\hat\phi$ is causal, as is hopefully the case, we can use either fitted or predictive residuals but will need to discard all bootstrap pseudo-series that lead to a non-causal $\hat\phi^*$; this is equally important for the backward bootstrap methods discussed in Section 3.4.
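As referenced in step 2 of Algorithm 3.2, the following sketch computes the delete-$x_t$ predictive residuals based on the LS estimator. This is our own illustration, with a naive refit per deleted point; it reuses the design-matrix construction from the previous sketch.

```python
import numpy as np

def predictive_residuals(x, p):
    """Delete-x_t predictive residuals: for each t, refit the AR(p) by LS with
    the single scatterplot point (x_t; x_{t-1}, ..., x_{t-p}) removed, then
    predict the held-out x_t.  Naive O(n) refits; fine for moderate n."""
    n = len(x)
    X = np.column_stack([np.ones(n - p)] +
                        [x[p - j:n - j] for j in range(1, p + 1)])
    y = x[p:]
    out = np.empty(n - p)
    for i in range(n - p):                 # row i corresponds to time t = p + 1 + i
        keep = np.arange(n - p) != i       # delete the single row for this t
        phi_t, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        out[i] = y[i] - X[i] @ phi_t       # predictive residual at t
    return out
```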
3.2. Forward Studentized Bootstrap with Fitted Residuals
In the previous two subsections we described the forward bootstrap based on predictive roots. However, as already mentioned, we can use studentized predictive roots instead; see Definition 2.1 and Remark 2.2. The forward bootstrap procedure with fitted and/or predictive residuals is similar to Algorithm 3.1; the only differences are in steps 3(e) and 6.

To describe it, let $\psi_j$ for $j = 0, 1, \cdots$ be the MA(∞) coefficients of the AR(p) model, i.e., $\psi_j$ is the coefficient associated with $z^j$ in the power series expansion of $\phi^{-1}(z)$ for $|z| \leq 1$, defined by $1/\phi(z) = \psi_0 + \psi_1 z + \cdots \equiv \psi(z)$; the power series expansion is guaranteed by the causality assumption (1.3). It is then easy to see that the variance of the h-step ahead predictive root $X_{n+h} - \hat{X}_{n+h}$ is $\sigma^2 \sum_{j=0}^{h-1} \psi_j^2$. The latter can be interpreted as either a conditional or an unconditional variance, since the two coincide in a linear AR(p) model.

Similarly, let $1/\hat\phi(z) = \hat\psi_0 + \hat\psi_1 z + \cdots \equiv \hat\psi(z)$, and $1/\hat\phi^*(z) = \hat\psi^*_0 + \hat\psi^*_1 z + \cdots \equiv \hat\psi^*(z)$. Denote by $\hat\sigma^2$ and $\hat\sigma^{*2}$ the sample variances of the fitted residuals and of the bootstrap fitted residuals respectively; the latter are defined as $x^*_t - \hat{x}^*_t$ for $t = p+1, \cdots, n$.

Algorithm 3.3. Forward Studentized bootstrap with fitted residuals (FSf)

The algorithm is the same as Algorithm 3.1 except for steps 3(e) and 6, which should be replaced by the following:

3(e) Calculate a studentized bootstrap root replicate as

$$\frac{x^*_{n+h} - \hat{x}^*_{n+h}}{\hat\sigma^* \big(\sum_{j=0}^{h-1} \hat\psi^{*2}_j\big)^{1/2}}.$$
6. Construct the $(1-\alpha)100\%$ equal-tailed prediction interval for $X_{n+h}$ as

$$\Big[\hat{x}_{n+h} + \hat\sigma\big(\textstyle\sum_{j=0}^{h-1} \hat\psi_j^2\big)^{1/2}\, q(\alpha/2),\ \hat{x}_{n+h} + \hat\sigma\big(\textstyle\sum_{j=0}^{h-1} \hat\psi_j^2\big)^{1/2}\, q(1-\alpha/2)\Big] \qquad (3.4)$$

where $q(\alpha)$ is the α-quantile of the empirical distribution of the B collected studentized bootstrap roots.
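The MA(∞) coefficients appearing in step 3(e) and in (3.4) can be computed by matching coefficients in $\psi(z)\phi(z) = 1$, which gives the recursion $\psi_0 = 1$ and $\psi_j = \sum_{k=1}^{\min(j,p)} \phi_k \psi_{j-k}$; a short sketch (our code):

```python
import numpy as np

def ma_coefficients(phi_ar, h):
    """psi_0, ..., psi_{h-1} from 1/phi(z) = psi(z), via the recursion
    psi_0 = 1, psi_j = sum_{k=1}^{min(j,p)} phi_k * psi_{j-k}.

    `phi_ar` holds only the slope coefficients (phi_1, ..., phi_p)."""
    p = len(phi_ar)
    psi = np.zeros(h)
    psi[0] = 1.0
    for j in range(1, h):
        for k in range(1, min(j, p) + 1):
            psi[j] += phi_ar[k - 1] * psi[j - k]
    return psi

# The studentizing factor in (3.4) is then, e.g.:
# scale = sigma_hat * np.sqrt(np.sum(ma_coefficients(phi_hat[1:], h) ** 2))
```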
Remark 3.2. For all the algorithms introduced above, in step 3(d) we redefine the last p values of the bootstrap pseudo-series to match the observed values in order to generate out-of-sample bootstrap data and/or predictors. If we calculate the future bootstrap predicted values and observations without fixing the last p values of the bootstrap pseudo-series, i.e., if we omit step 3(d), then Algorithm 3.3 becomes identical to the method proposed by Masarotto (1990)[26].
3.2.1. Forward Studentized Bootstrap with Predictive Residuals
As mentioned before, we can resample the predictive, as opposed to the fitted, residuals; the algorithm is as follows.

Algorithm 3.4. Forward Studentized bootstrap with predictive residuals (FSp)
1. Same as step 1 in Algorithm 3.1.

2. Same as step 2 in Algorithm 3.2.

3-6. Change the $\hat\epsilon_t$ into $\hat\epsilon^{(t)}_t$; the rest is the same as in Algorithm 3.3.

Remark 3.3. As in the regression case discussed in Politis (2013)[33], the Fp method yields improved coverage as compared to the Ff method since predictive residuals are inflated as compared to fitted residuals. Interestingly, the FSp method is not much better than the FSf method in finite samples. The reason is that when we studentize the predictive residuals, the aforementioned inflation effect is offset by the simultaneously inflated bootstrap estimator $\hat\sigma^*$ in the denominator. In the Monte Carlo simulations of Section 3.7, we will see that the Fp, FSf and FSp methods have similarly good performance, while the Ff method is the worst, exhibiting pronounced undercoverage.
3.3. Asymptotic Properties of Forward Bootstrap
We now discuss the asymptotic validity of the aforementioned forward bootstrap methods. First note that step 3(c) of the algorithm, which concerns the construction of $\hat\phi^*$, is identical to the related construction of the bootstrap statistic routinely used to derive confidence intervals for $\phi$; see e.g. Freedman (1984)[19].
Theorem 3.1 (Freedman (1984)[19]). Let $\{X_t\}$ be the causal AR(p) process (3.1) with $E\epsilon_t = 0$, $\mathrm{var}(\epsilon_t) = \sigma^2 > 0$ and $E|\epsilon_t|^4 < \infty$. Let $\{x_1, \cdots, x_n\}$ denote a realization from $\{X_t\}$. Then, as $n \to \infty$,

$$d_0\big(\mathcal{L}(\sqrt{n}(\hat\phi - \phi)),\ \mathcal{L}^*(\sqrt{n}(\hat\phi^* - \hat\phi))\big) \stackrel{P}{\longrightarrow} 0 \qquad (3.5)$$
where $\mathcal{L}$, $\mathcal{L}^*$ denote probability law in the real and in the bootstrap world respectively, and $d_0$ is the Kolmogorov distance. For the next theorem, continuity (and twice differentiability) of the error distribution is assumed.
Theorem 3.2 (Boldin (1982)[4]). Let $\hat{F}_n$ be the empirical distribution of the fitted residuals $\{\hat\epsilon_t\}$ centered to mean zero. Let $F$ be the distribution of the errors, satisfying the assumptions of Theorem 3.1 and $\sup_x |F''(x)| < \infty$. Then, for any integer $h \geq 1$,

$$\sup_x |\hat{F}_n(x) - F(x)| = O_p(1/\sqrt{n}). \qquad (3.6)$$
Recall the notation $Y_t = (X_t, X_{t-1}, \cdots, X_{t-p+1})'$. Then,
$$X^*_{n+1} = (1, Y_n')\,\hat\phi + \epsilon^*_{n+1}$$