
A New Test In A Predictive Regression with Structural Breaks∗

Zongwu Cai† (University of Kansas)    Seong Yeon Chang‡ (Xiamen University)

December 5, 2018

Abstract

This paper considers predictive regressions in which a structural break is allowed at some unknown date. We establish novel procedures for testing predictability via empirical likelihood methods based on some weighted score equations. The theoretical results are useful in practice because we adopt a unified framework under which it is unnecessary to distinguish whether the predictor variables are stationary or nonstationary. In particular, nonstationary predictor variables are allowed to be (nearly) integrated or to exhibit a structural change at some unknown date. Simulations show that the empirical likelihood-based tests perform well in terms of size and power in finite samples. As an empirical analysis, we test asset return predictability using various predictor variables.

JEL Classification: C12, C14, C32, G12

Keywords: Autoregressive process; Empirical likelihood; Structural change; Weighted estimation

∗We are grateful to Dukpa Kim, Yuichi Kitamura, Whitney Newey, Youngki Shin, Liangjun Su, Yanping Yi, Fukang Zhu and participants at Symposium on Modern (Xiamen University, 2015), International Conference on (Korea University, 2015), Korea's Allied Economic Associations Annual Meeting (Seoul National University, 2016), Sogang University, the 2016 North American Meeting of the Econometric Society, Academy of Mathematics and Systems Science (Chinese Academy of Sciences, 2016), and the 2017 Asian Meeting of the Econometric Society for helpful comments and discussions. Cai's research was partly supported by the National Natural Science Foundation of China Grants No. 71131008 (Key Project) and No. 71631004 (Key Project). Chang's research was partly supported by the Fundamental Research Funds for the Central Universities Grant No. 20720151146.
†Department of Economics, University of Kansas, Lawrence, KS 66045 USA (Email: [email protected])
‡Corresponding author, Wang Yanan Institute for Studies in Economics & Department of Statistics, Xiamen University, 422 Ximing South Road, Xiamen, China 361005, and MOE Key Laboratory of Econometrics, and Fujian Key Laboratory of Statistical Science (Email: [email protected])

1 Introduction

The predictability of asset returns has been studied for decades as a cornerstone research topic in economics and finance. It is widely examined in many financial applications such as mutual fund performance, conditional capital asset pricing, and optimal asset allocation. Dealing with asset return predictability has two major aspects: the first is to check whether the return series is autocorrelated, a random walk, or a martingale difference sequence (MDS), and the second is to use financial (state) variables as predictors to see if they can predict asset returns. A vast literature is devoted to testing whether asset returns are autocorrelated, a random walk, an MDS, or exhibit other types of dependence structures; see the book by Campbell, Lo, and Mackinlay (1997) and the references therein. Recently, numerous empirical studies have documented the predictability of asset returns using various lagged financial or state variables, such as the log dividend-price ratio, the log earnings-price ratio, the log book-to-market ratio, the dividend yield, the term spread, the default premium, and interest rates, as well as other economic variables. Although a large body of work has accumulated, the empirical evidence is still inconclusive.

Predictive regression is a conventional method to check whether some financial variables have explanatory power for stock return predictability. The classical predictive regression is the structural system
\[
\begin{cases}
y_t = \alpha + \beta x_{t-1} + u_t, \\
x_t = \theta + \phi x_{t-1} + v_t,
\end{cases} \tag{1}
\]
where $|\phi| < 1$, $u_t = \rho v_t + \varepsilon_t$, $(\varepsilon_t, v_t)' \sim N(0, \mathrm{diag}(\sigma_\varepsilon^2, \sigma_v^2))$ is an independent and identically distributed (i.i.d.) series, and $x_0 \sim N(\theta(1-\phi)^{-1}, \sigma_v^2(1-\phi^2)^{-1})$. In predictive regression, future values of some scalar $y_t$ can be predicted from lagged values of a financial variable $x_{t-1}$, so the null hypothesis is no predictability, that is, $H_0: \beta = 0$. Note that the above normality assumption may not be needed in this article.

It is noteworthy that predictive regressions involve some econometric issues which have crucial effects on testing predictability. Campbell (2008), Phillips and Lee (2013), and Phillips (2015) gave an overview of econometric issues and remedies in predictive regressions. Here, we review the argument briefly for completeness. First, the correlation between $u_t$ and $v_t$ plays an important role in many applications (see Table 4 in Campbell and Yogo, 2006), which creates the so-called "embedded endogeneity". Stambaugh (1999) showed that the ordinary least squares (OLS) estimator for $\beta$ in (1) is biased in finite samples due to the correlation

between $u_t$ and $v_t$ under normality and with a stationary regressor ($|\phi| < 1$). More precisely, the bias of the OLS estimator $\hat{\beta}$ can be represented as
\[
E[\hat{\beta} - \beta] = \rho\, E[\hat{\phi} - \phi], \tag{2}
\]
where $\rho = \mathrm{cov}(u_t, v_t)/\mathrm{var}(v_t)$. The autoregressive bias function $E[\hat{\phi} - \phi]$ depends only on $\phi$ and the sample size $T$, so that $\hat{\phi}$ is biased downward by about $-(1+3\phi)/T$ and the predictive slope estimate $\hat{\beta}$ is biased upward when $\rho < 0$. Stambaugh (1999) suggested a first-order bias-corrected estimator, while Amihud and Hurvich (2004) considered a linear projection of $u_t$ on $v_t$ as $u_t = \rho v_t + \varepsilon_t$ and then regressed $y_t$ on $\hat{v}_t$ and $x_{t-1}$ with an intercept, that is,
\[
y_t = \alpha + \beta x_{t-1} + \rho \hat{v}_t + \varepsilon_t, \tag{3}
\]
where $\hat{v}_t$ is obtained from the second equation of (1). The OLS estimator of $\beta$ in (3), denoted by $\hat{\beta}$, is the so-called two-stage approach. However, Amihud and Hurvich (2004) assumed that $x_t$ is stationary.
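The two-stage procedure described above can be sketched in a few lines. The snippet below is an illustrative implementation under our own naming and array conventions (it is not the authors' code): stage one fits the AR(1) equation for $x_t$ by OLS to obtain residuals $\hat{v}_t$, and stage two regresses $y_t$ on $x_{t-1}$ and $\hat{v}_t$ with an intercept, as in (3).

```python
import numpy as np

def two_stage_beta(y, x):
    """Illustrative two-stage estimate of the predictive slope.

    y : array of length T holding y_1, ..., y_T
    x : array of length T+1 holding x_0, ..., x_T
    """
    T = len(y)
    # Stage 1: OLS of x_t on (1, x_{t-1}) gives residuals v_hat_t
    X1 = np.column_stack([np.ones(T), x[:-1]])
    coef1, *_ = np.linalg.lstsq(X1, x[1:], rcond=None)
    v_hat = x[1:] - X1 @ coef1
    # Stage 2: OLS of y_t on (1, x_{t-1}, v_hat_t); the coefficient on
    # x_{t-1} is the two-stage estimate of beta, as in (3)
    X2 = np.column_stack([np.ones(T), x[:-1], v_hat])
    coef2, *_ = np.linalg.lstsq(X2, y, rcond=None)
    return coef2[1]
```

Because $\hat{v}_t$ absorbs the component of $u_t$ correlated with the innovation $v_t$, the stage-two slope is less affected by the embedded endogeneity; as noted above, this construction presumes a stationary $x_t$.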

Second, the autoregressive parameter $\phi$ in $x_t$ is assumed to be very persistent, which is crucial for inference on $\beta$. Important contributions on nearly integrated or integrated regressors include, to name just a few, Phillips (1987), Elliott and Stock (1994), Cavanagh, Elliott, and Stock (1995), Lewellen (2004), Torous, Valkanov, and Yan (2005), Campbell and Yogo (2006), Jansson and Moreira (2006), Amihud, Hurvich, and Wang (2009), Chen and Deo (2009), Cai and Wang (2014), and the references therein. In the literature, a persistent regressor $x_t$ is represented in a local-to-unity framework, that is, $\phi = 1 - c/T$, $c \geq 0$. Indeed, Cai and Wang (2014) showed that $\hat{\beta}$ in (3) has the asymptotic distribution $T(\hat{\beta} - \beta) \xrightarrow{d} \xi_c$, where $\xi_c$ is a random quantity involving the integral of a geometric Brownian motion and "$\xrightarrow{d}$" denotes convergence in distribution; see Cai and Wang (2014) for details. Therefore, the asymptotic results, in particular the limiting distribution, depend on $c$, which cannot be estimated consistently although its estimate has a limiting distribution.

Recently, a series of papers has considered uniform inference for predictive regressions, in the sense that the testing procedure for predictability is robust to general time series characteristics of the regressor and errors. These include, but are not limited to, Campbell and Yogo (2006), Phillips and Magdalinos (2007, 2009), Chen and Deo (2009), Phillips and Lee (2013), Zhu, Cai, and Peng (2014), Lee (2016), and Liu, Yang, Cai, and Peng (2016). In particular, Campbell and Yogo (2006) proposed a new method, the so-called Q-test, based on the Bonferroni idea to construct a confidence interval for $\beta$ for each $\phi$. Chen and Deo (2009) found that the intercept parameter in predictive regressions with persistent covariates makes inference difficult and proposed the restricted likelihood method, which is free of such a nuisance intercept parameter. More importantly, the bias of the restricted maximum likelihood estimates is much smaller than that of the OLS estimates near the unit root, without loss of efficiency. Phillips and Magdalinos (2009) introduced a filtering procedure called IVX estimation, which restricts the degree of persistence of the data-filtered IVX instruments within the class of near-stationary processes defined in Phillips and Magdalinos (2007). A standard instrumental variable estimation with the constructed instruments is robust to general time series characteristics of the regressors in the sense that the derived estimator converges in distribution to a mixed normal limit, and hence the corresponding Wald statistic asymptotically follows the chi-square distribution under the null. Phillips and Lee (2013) extended IVX estimation to long-horizon predictive regressions with persistent covariates, while Lee (2016) extended the IVX filtering method to predictive quantile regression. Zhu, Cai, and Peng (2014) proposed an empirical likelihood (EL) approach together with a weighted least squares idea to construct a confidence interval for $\beta$.

In this line of work, predictive regression models are assumed to be stable; that is, no structural break is allowed. As addressed by Stock and Watson (1996) and Lettau and Van Nieuwerburgh (2008), economic and financial variables are subject to smooth or abrupt structural changes, which makes it reasonable to allow for the possibility of structural changes in predictive regression models. In subsequent research, structural breaks in predictive regressions have been formally considered. For example, Viceira (1997), Paye and Timmermann (2006), and Rapach and Wohar (2006) tested for structural breaks and found strong evidence of instability in predictive regression models. Lettau and Van Nieuwerburgh (2008) focused on level shifts in the predictor variables and explained that the predictive relationship may be unstable unless such shifts are included in the analysis. Cai, Wang, and Wang (2015) considered a model with coefficients changing smoothly over time and proposed a nonparametric procedure to test whether the coefficients are indeed time-varying; they found that the coefficients are unstable. On the other hand, Gonzalo and Pitarakis (2012) considered predictive regression models with threshold effects, which allow the parameters of the model to vary across regimes driven by an external variable, and established the limiting distributions of Wald-type test statistics equipped with IVX estimation. More recently, Gonzalo and Pitarakis (2017) developed testing methodology without restricting the remaining model parameters in predictive threshold models. A practical question is how to specify the form of the time-varying coefficients, that is, how the coefficients change with time. In this article, we assume that the coefficients are piecewise constant, i.e., subject to structural changes. Since the work by Perron (1989), it has been well known that structural changes in the data generating process (DGP) should be taken into account in an appropriate way so as to make statistical inference reliable. In contrast to the large literature on classical predictive regression models, work on testing and estimating predictability allowing for structural changes is scarcer.

The main contributions of this article are twofold. First, we consider predictive regressions whose model parameters exhibit a structural break at some unknown date. When the true

break date is unknown, we estimate it and propose testing procedures for predictability based on the consistent estimate of the break fraction, which sharply contrasts with conventional predictability tests. On testing for a structural break, or instability of the parameters, important contributions include Andrews (1993) and Andrews and Ploberger (1994). Bai (1994, 1997) showed that the break fraction can be estimated consistently by minimizing the sum of squared residuals from the unrestricted model and derived the limiting distribution of the estimate of the break date, which can be applied to constructing confidence intervals for the true break date. Bai and Perron (1998, 2003) considered statistical inference related to multiple structural changes under general conditions. Elliott and Müller (2006) considered the problem of testing for general types of parameter variation, including infrequent breaks, and established a partial-sums-type test based on the residuals obtained from the restricted model; the proposed tests are optimal in the sense that they nearly attain some local Gaussian power envelope. For the estimation of the break date, we refer to the literature, for instance, Bai (1994, 1997), Bai and Perron (1998), Bai, Lumsdaine, and Stock (1998), and Kurozumi and Arai (2006). Second, we propose empirical likelihood methods based on some weighted score equations, which were first introduced by Zhu et al. (2014) without allowing for a structural break. Weighted estimation procedures have been considered in the literature, and a normal limiting distribution is obtained (see, e.g., Ling, 2005; Chan and Peng, 2005). Remarkably, Chan, Li, and Peng (2012) applied a weighted estimation method to the first-order autoregression, denoted AR(1), to estimate the autoregressive parameter and showed that the estimate still has a normal limit regardless of whether the autoregressive process is stationary, (nearly) integrated, or even explosive. They suggested applying the empirical likelihood method to the weighted score equation of the weighted least squares estimate to construct confidence intervals for all values of the AR parameter, and showed that confidence intervals obtained by the EL method perform better in finite samples than those constructed by the weighted least squares approach proposed by So and Shin (1999). Recently, Li, Chan, and Peng (2014) established EL tests for causality of bivariate first-order autoregressive processes. Zhu et al. (2014), most relevant to this article, considered predictive regressions, applied EL methods to test the null hypothesis of $\beta = 0$, and constructed confidence intervals for $\beta$. Chan et al. (2012) and Zhu et al. (2014) have shed new light on predictive regressions in the sense that one can avoid estimating the autoregressive parameter $\phi$ and can test for predictability based on the EL method, which performs well in finite samples. In this article, we extend the analysis of Zhu et al. (2014) in practical directions: (i) a structural break, that is, instability of the model parameters, is allowed in predictive regressions; (ii) the i.i.d. assumption on the errors is relaxed to a martingale difference sequence; (iii) a structural break in the predictor variable is

explicitly introduced. We show that the empirical likelihood methods based on some weighted score equations work well under such general circumstances. Simulation results support that the proposed empirical likelihood tests have good finite sample properties in terms of both size and power. To the best of our knowledge, this article is the first that incorporates the estimate of the break date and adopts a unified framework in predictive regressions.

The structure of this paper is as follows. Section 2 introduces predictive regression models that exhibit a structural break at some unknown date; the empirical likelihood-based methodologies are considered and some useful asymptotic results are presented. Section 3 provides simulation results to support the usefulness of the proposed empirical likelihood methods. In Section 4, the techniques are applied to test the predictability of stock returns using a variety of predictors. Section 5 provides brief concluding remarks. All technical derivations are relegated to the Appendix.

2 Econometric Approaches and Related Theories

We consider a linear predictive regression model that experiences a structural change at some unknown date. The standard model (1) is now modified to allow for a structural change at the date $T_1^0$ as follows: for $t = 1, \ldots, T$,
\[
\begin{cases}
y_t = (\alpha_1 + \beta_1 x_{t-1})\, 1_{t \le T_1^0} + (\alpha_2 + \beta_2 x_{t-1})\, 1_{t > T_1^0} + u_t, \\
x_t = \theta + \phi x_{t-1} + v_t.
\end{cases} \tag{4}
\]
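As a concrete illustration, model (4) can be simulated directly. The following sketch (the function name, parameter values, and error design are our own illustrative choices, not taken from the paper) generates a persistent AR(1) predictor with contemporaneously correlated errors and a one-time shift in the intercept and slope at date $T_1^0 = [T\lambda^0]$:

```python
import numpy as np

def simulate_break_dgp(T=400, lam0=0.5, alpha=(0.0, 0.1), beta=(0.2, 0.8),
                       theta=0.0, phi=0.95, rho=-0.9, seed=0):
    """Simulate model (4): y_t switches regime at T1 = [T*lam0],
    x_t follows an AR(1), and corr(u_t, v_t) != 0 via u = rho*v + eps."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T)
    v = rng.standard_normal(T)
    u = rho * v + eps                       # contemporaneous correlation
    x = np.zeros(T + 1)                     # x[0] plays the role of x_0
    for t in range(T):
        x[t + 1] = theta + phi * x[t] + v[t]
    T1 = int(T * lam0)
    post = np.arange(1, T + 1) > T1         # indicator 1_{t > T1}
    y = np.where(post, alpha[1] + beta[1] * x[:-1],
                 alpha[0] + beta[0] * x[:-1]) + u
    return y, x, T1
```

Fitting OLS separately on the two subsamples of such simulated data recovers slopes near the two regime values, which is the instability that the procedures below are designed to detect.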

The following assumption replaces the usual i.i.d. assumption on the noise components $\{u_t, v_t\}$ with the less restrictive martingale difference sequence assumption.

Assumption 1. Let $U_t \equiv (u_t, v_t)'$.

(i) $E[U_t \mid \mathcal{F}_{t-1}] = 0$ almost surely, where $\mathcal{F}_t$ is the $\sigma$-field generated by $\{u_{t-s}, v_{t-s} \mid s \ge 0\}$, satisfying
\[
E[U_t U_t' \mid \mathcal{F}_{t-1}] =
\begin{pmatrix}
\sigma_u^2 & \sigma_{uv} \\
\sigma_{vu} & \sigma_v^2
\end{pmatrix}
\equiv \Sigma < \infty. \tag{5}
\]

(ii) $E[|u_t|^{2+q_2} \mid \mathcal{F}_{t-1}] < M_2$ for some $q_2 > 0$ and $M_2 < \infty$.

Part (i) incorporates the contemporaneous correlation between the errors $u_t$ and $v_t$, while Part (ii) is necessary to derive the asymptotic results for the EL methods. Assumption 1 allows $u_t$ to be conditionally heteroskedastic with $E[u_t^2 \mid \mathcal{F}_{t-1}] \neq E[u_t^2]$, as will often be relevant when the observed time series are asset returns. Assumption 1 may cover a variety of conditionally heteroskedastic models such as ARCH, GARCH, EGARCH, and stochastic volatility models (see, e.g., Clark and West, 2006; Phillips, 2015).

The aim of this article is to test the null hypothesis of no predictability allowing for a structural change at some unknown date. To this end, we specify the predictive regression model as
\[
y_t = z_{t-1}' \gamma_1\, 1_{t \le T_1^0} + z_{t-1}' \gamma_2\, 1_{t > T_1^0} + u_t, \quad t = 1, \ldots, T, \tag{6}
\]
and employ the following assumptions. Note that in (6), $z_{t-1} = x_{t-1}$ and $\gamma_i = \beta_i$ for $i = 1, 2$ if the constant term $\alpha$ is zero, and $z_{t-1} = (1, x_{t-1})'$ and $\gamma_i = (\alpha_i, \beta_i)'$ for $i = 1, 2$ otherwise. The magnitude of the change is denoted by $\|\gamma_2 - \gamma_1\| = \|\delta\| \neq 0$, where $\|\cdot\|$ denotes the Euclidean norm, that is, $\|k\| = (\sum_{i=1}^m k_i^2)^{1/2}$ for $k \in \mathbb{R}^m$. The true values of the unknown coefficients are denoted with a $0$ superscript, i.e., $\gamma_i^0 = (\alpha_i^0, \beta_i^0)'$ for $i = 1, 2$, $T_1^0$, and $\lambda^0 = T_1^0/T$. Let "$\xrightarrow{p}$" denote convergence in probability as $T \to \infty$ and let $[\cdot]$ be the greatest integer (floor) function. Assumption 2 pertains to the case with a stationary predictor variable.

Assumption 2. (i) $\beta_1^0 \neq \beta_2^0$ and $|\phi| < 1$.

(ii) $T_1^0 = [T\lambda^0]$, where $\lambda^0$ is constant and $0 < \lambda^0 < 1$.

(iii) $T^{-1} \sum_{t=1}^{[Ts]} z_{t-1} z_{t-1}' \xrightarrow{p} s\,\Sigma_{z1}$ uniformly in $0 \le s \le \lambda^0$, and $T^{-1} \sum_{t=T_1^0+1}^{[Ts]} z_{t-1} z_{t-1}' \xrightarrow{p} (s - \lambda^0)\,\Sigma_{z2}$ uniformly in $\lambda^0 \le s \le 1$, where $\Sigma_{z1}$ and $\Sigma_{z2}$ are of full rank.

(iv) The matrices $j^{-1} \sum_{t=1}^{j} z_{t-1} z_{t-1}'$, $j^{-1} \sum_{t=T-j+1}^{T} z_{t-1} z_{t-1}'$, $j^{-1} \sum_{t=T_1^0-j+1}^{T_1^0} z_{t-1} z_{t-1}'$, and $j^{-1} \sum_{t=T_1^0+1}^{T_1^0+j} z_{t-1} z_{t-1}'$ have minimum eigenvalues bounded away from zero in probability for all $j \ge 2$. In addition, these four matrices have stochastically bounded norms uniformly in $j$ for $j \ge 2$.

(v) $\sup_t E\|z_{t-1}\|^{4+q_1} \le M_1$ for some $q_1 > 0$ and $M_1 < \infty$.

Part (i) guarantees that there is a one-time break in the predictive regression model and that the predictor variable $x_t$ is stationary. Parts (ii) and (iii) correspond to Condition 1 in Elliott and Müller (2007). Part (ii) is standard for a structural break model because it ensures that the pre-break and post-break subsamples are large enough to obtain consistent estimates of the unknown parameters. Parts (iv) and (v) follow Assumptions A3 and A5 in Bai (1997). Note that Part (iv) rules out a unit root regressor. On the other hand, Assumption 3 (see below) pertains to the case with an integrated regressor.

Assumption 3. (i) $\beta_1^0 \neq \beta_2^0$ and $\phi = 1$.

(ii) $T_1^0 = [T\lambda^0]$, where $\lambda^0$ is constant and $0 < \lambda^0 < 1$.

(iii) $x_0$ is a fixed or random variable with $E[x_0^2] < \infty$ and independent of $T$.

(iv) $(\alpha_2^0, \beta_2^0)' = T^{-v}(\alpha_{2o}, T^{-1/2}\beta_{2o})'$, where $\alpha_{2o}$ and $\beta_{2o}$ are fixed parameters and $0 \le v < 1/2$.

Assumption 3 is indeed similar to Assumption 1 in Kurozumi and Arai (2006). Part (iii) imposes the initial value condition under which $x_0$ has no effect on the asymptotic results, in particular on the consistency of the estimate of the break fraction. Part (iv) adopts the so-called shrinking shift framework, which is useful for deriving the asymptotic results for the estimator. We refer to Kurozumi and Arai (2006) for detailed explanations. In matrix notation, we can rewrite the predictive regression as follows:

\[
Y = \bar{X}_{T_1} \gamma + U,
\]
where $Y = [y_1, \ldots, y_T]'$, $U = [u_1, \ldots, u_T]'$, $\gamma = (\gamma_1', \gamma_2')'$, and $\bar{X}_{T_1} = [X_1(\lambda), X_2(\lambda)]$, where the $t$th row of $X_1(\lambda)$ is $z_{t-1}'$ if $t \le T_1$ and $0$ otherwise, while the $t$th row of $X_2(\lambda)$ is $z_{t-1}'$ if $t > T_1$ and $0$ otherwise. The break date can be estimated by using a global least squares criterion:
\[
\hat{T}_1 = \arg\min_{T_1 \in \Lambda} Y'(I - P_{T_1})Y, \tag{7}
\]
where $P_{T_1}$ is the matrix that projects onto the column space of $\bar{X}_{T_1}$, i.e., $P_{T_1} = \bar{X}_{T_1}(\bar{X}_{T_1}'\bar{X}_{T_1})^{-1}\bar{X}_{T_1}'$, and $\Lambda = [\pi T, (1-\pi)T]$ with $0 < \pi < 1/2$. The OLS estimate of the regression coefficient $\gamma$ is $\hat{\gamma} = (\bar{X}_{\hat{T}_1}'\bar{X}_{\hat{T}_1})^{-1}\bar{X}_{\hat{T}_1}'Y$, where $\bar{X}_{\hat{T}_1}$ is constructed with the estimate of the break date $\hat{T}_1$. The consistency of the estimate of the break fraction $\hat{\lambda}$ is well established in the literature.
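Criterion (7) amounts to a one-dimensional grid search: for each admissible date, fit separate OLS coefficients on the two subsamples and record the total sum of squared residuals. A minimal sketch (our own illustrative code; the trimming value and names are assumptions) is:

```python
import numpy as np

def estimate_break_date(y, Z, trim=0.15):
    """Global least-squares break-date estimate, as in (7).

    y : response vector of length T
    Z : T x k regressor matrix whose t-th row is z_{t-1} (e.g. [1, x_{t-1}])
    Searches T1 over the trimmed range [trim*T, (1-trim)*T].
    """
    T = len(y)
    lo, hi = int(np.floor(trim * T)), int(np.ceil((1.0 - trim) * T))
    best_T1, best_ssr = lo, np.inf
    for T1 in range(lo, hi):
        ssr = 0.0
        for sl in (slice(0, T1), slice(T1, T)):   # pre- / post-break OLS
            coef, *_ = np.linalg.lstsq(Z[sl], y[sl], rcond=None)
            resid = y[sl] - Z[sl] @ coef
            ssr += float(resid @ resid)
        if ssr < best_ssr:
            best_T1, best_ssr = T1, ssr
    return best_T1          # observations 1..T1 form the first regime
```

With a sizeable parameter shift, the minimizer settles near the true date, in line with the consistency results stated below.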

Proposition 1. Suppose $\delta$ is a fixed parameter and Assumption 1 holds. Let $\hat{T}_1$ be estimated by (7). Then $\hat{T}_1 - T_1^0 = O_p(\|\delta\|^{-2})$, where $\|\delta\| = \|\gamma_2 - \gamma_1\|$.

Bai, Lumsdaine, and Stock (1998) and Kurozumi and Arai (2006) established the consistency and the limiting distribution of $\hat{\lambda}$ when the regressor $x_t$ is integrated, that is, an I(1) process.

Proposition 2. Suppose Assumption 3 holds. Let $\hat{T}_1$ be estimated by (7). Then $T^{1-2v}(\hat{\lambda} - \lambda^0) = O_p(1)$.

Note that the convergence rate of $\hat{\lambda}$ may vary depending on the rate $v$ of the break magnitude because the shrinking shift framework is adopted in Assumption 3.

Remark 1. Allowing more than one break in predictive regression models is not difficult, but because our main interest is to construct empirical likelihood-based tests for predictability under general assumptions on the predictor variable and the errors, the single break model suffices to delineate the results.

Remark 2. One might be concerned about estimating a spurious break that does not exist in the DGP. Nunes, Kuan, and Newbold (1995) and Bai (1998) showed that the OLS estimate of the break date $\hat{T}_1$ is a spurious one when the error is an I(1) process. Kuan and Hsu (1998) and Hsu and Kuan (2008) considered a change-in-mean model for a fractionally integrated process and found that a spurious break can be estimated if the memory parameter $d^* \in (0, 1.5)$. Recently, Chang and Perron (2016) considered a linear trend model with a change in slope, with or without a concurrent level shift, and showed that a spurious break is likely to be estimated when fractionally integrated errors have memory parameters in the interval $(0, 1.5)$, excluding the boundary case $0.5$. In this article, the noise component $U_t = (u_t, v_t)'$ is assumed to be an MDS, which excludes the risk of estimating a spurious break. Further, Propositions 1 and 2 confirm that the estimate of the break fraction is consistent regardless of whether the regressor $x_t$ is I(0) or I(1).

2.1 Empirical Likelihood Method When α is Known

To simplify the subsequent arguments, we assume in this subsection that the value of $\alpha = (\alpha_1, \alpha_2)'$ is known. This is equivalent to assuming that the value of $\alpha$ is zero or that the univariate time series $y_t$ has been adjusted by the nonzero value of $\alpha$. The presence of an intercept term $\alpha$ has a crucial implication for the EL method, which will be explained later in detail.

Remark 3. As noted in Chen and Deo (2009), the usual likelihood ratio test for $\beta$ has inflated size when there is an intercept term in predictive regressions. As a remedy, the restricted likelihood is considered. The assumption of no intercept term may help concentrate on the slope $\beta$; see also Zhu et al. (2014) and Choi, Jacewitz, and Park (2016).

Specifically, if $\alpha_1 = \alpha_2 = 0$, then the predictive regression model (4) simplifies as follows: for $t = 1, \ldots, T$,
\[
\begin{cases}
y_t = \beta_1 x_{t-1}\, 1_{t \le T_1^0} + \beta_2 x_{t-1}\, 1_{t > T_1^0} + u_t, \\
x_t = \theta + \phi x_{t-1} + v_t,
\end{cases} \tag{8}
\]
where $(y_t, x_t)$ are observable and $(\beta_1, \beta_2, \theta, \phi)$ are unknown parameters. For the break date $T_1^0$, we consider two possible cases.

First, consider the case where the true break date $T_1^0$ is known a priori. Under this circumstance, we can divide the whole sample into two parts, the pre-break and post-break subsamples. In each subsample, the empirical likelihood method in Zhu et al. (2014) can be implemented without any modification; see Theorem 1 in Zhu et al. (2014).

Second, consider the case in which the true break date $T_1^0$ is unknown, which is common in practice. We can estimate the break date $\hat{T}_1$ that minimizes the sum of squared residuals from (8), replacing $T_1^0$ with admissible break dates $T_1 \in \Lambda = [\pi T, (1-\pi)T]$, $\pi \in (0, 1/2)$, as addressed in (7). Because the weighted least squares estimate of $\beta_1$ is the solution to the score equation $\sum_{t=1}^{\hat{T}_1} (y_t - \beta_1 x_{t-1})\, x_{t-1}/\sqrt{1 + x_{t-1}^2} = 0$, the empirical likelihood function in the pre-break subsample can be defined as follows:
\[
L_1(\beta_1) = \sup\Big\{ \prod_{t=1}^{\hat{T}_1} (\hat{T}_1 p_t) : p_1 \ge 0, \ldots, p_{\hat{T}_1} \ge 0,\ \sum_{t=1}^{\hat{T}_1} p_t = 1,\ \sum_{t=1}^{\hat{T}_1} p_t H_t(\beta_1) = 0 \Big\},
\]
where $H_t(\beta_1) = (y_t - \beta_1 x_{t-1})\, x_{t-1}/\sqrt{1 + x_{t-1}^2}$. After applying the Lagrange multiplier technique, we have
\[
l_1(\beta_1) = -2 \log L_1(\beta_1) = 2 \sum_{t=1}^{\hat{T}_1} \log\{1 + \lambda_1 H_t(\beta_1)\},
\]
where $\lambda_1 = \lambda_1(\beta_1)$ satisfies
\[
\sum_{t=1}^{\hat{T}_1} \frac{H_t(\beta_1)}{1 + \lambda_1 H_t(\beta_1)} = 0.
\]
By the same token, in the post-break subsample,

\[
L_2(\beta_2) = \sup\Big\{ \prod_{t=\hat{T}_1+1}^{T} ((T - \hat{T}_1) p_t) : p_{\hat{T}_1+1} \ge 0, \ldots, p_T \ge 0,\ \sum_{t=\hat{T}_1+1}^{T} p_t = 1,\ \sum_{t=\hat{T}_1+1}^{T} p_t H_t(\beta_2) = 0 \Big\}, \tag{9}
\]
where $H_t(\beta_2) = (y_t - \beta_2 x_{t-1})\, x_{t-1}/\sqrt{1 + x_{t-1}^2}$, and
\[
l_2(\beta_2) = -2 \log L_2(\beta_2) = 2 \sum_{t=\hat{T}_1+1}^{T} \log\{1 + \lambda_2 H_t(\beta_2)\},
\]
where $\lambda_2 = \lambda_2(\beta_2)$ satisfies
\[
\sum_{t=\hat{T}_1+1}^{T} \frac{H_t(\beta_2)}{1 + \lambda_2 H_t(\beta_2)} = 0.
\]
The following theorem shows that Wilks' theorem holds for the proposed empirical likelihood method.

Theorem 1. Suppose that model (8) holds under Assumption 1 along with either Assumption 2 or Assumption 3. Then $l_1(\beta_1^0)$ and $l_2(\beta_2^0)$ converge in distribution to a chi-square limit with one degree of freedom as $T \to \infty$.

Confidence intervals are frequently used in conjunction with point estimates to convey information about the uncertainty of the estimates. Based on Theorem 1, empirical likelihood confidence intervals for $\beta_1^0$ and $\beta_2^0$ with level $\tau$ can be obtained as
\[
CI_{1,\tau} = \{\beta_1 : l_1(\beta_1) \le \chi^2_{1,\tau}\} \quad \text{and} \quad CI_{2,\tau} = \{\beta_2 : l_2(\beta_2) \le \chi^2_{1,\tau}\}, \tag{10}
\]
respectively, where $\chi^2_{1,\tau}$ denotes the $\tau$th quantile of the chi-square distribution with one degree of freedom. Therefore, the implementation for constructing confidence intervals is straightforward and does not require estimating any additional quantity. Indeed, the R package "emplik" (see Zhou, 2015) can be employed to compute $l_1(\beta_1)$ and $l_2(\beta_2)$, as we do in the simulation study below.
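For readers who prefer not to use R, the profile computation behind (10) is short enough to code directly. The sketch below (our own illustrative Python, not the authors' implementation) solves the Lagrange multiplier equation by bisection, since $\sum_t H_t/(1+\lambda H_t)$ is strictly decreasing in $\lambda$ on the interval where all $1 + \lambda H_t > 0$, and returns $l(\beta) = 2\sum_t \log\{1 + \lambda H_t(\beta)\}$:

```python
import numpy as np

def el_log_ratio(H):
    """Empirical log-likelihood ratio  l = 2 * sum log(1 + lam * H_t)
    for the moment condition E[H_t] = 0; the multiplier lam solves
    sum H_t / (1 + lam * H_t) = 0 and is found by bisection."""
    H = np.asarray(H, dtype=float)
    if H.min() >= 0 or H.max() <= 0:      # 0 outside the convex hull
        return np.inf
    eps = 1e-10
    lo = -1.0 / H.max() + eps             # keep 1 + lam*H_t > 0
    hi = -1.0 / H.min() - eps
    g = lambda lam: np.sum(H / (1.0 + lam * H))
    for _ in range(200):                  # g is strictly decreasing
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * np.sum(np.log(1.0 + lam * H))

def el_stat(y, x_lag, beta):
    """l(beta) from the weighted score H_t = (y_t - beta*x_{t-1})*w(x_{t-1})."""
    H = (y - beta * x_lag) * x_lag / np.sqrt(1.0 + x_lag ** 2)
    return el_log_ratio(H)
```

A level-$\tau$ confidence interval then collects the $\beta$ values with $l(\beta) \le \chi^2_{1,\tau}$ (e.g. $\le 3.841$ for $\tau = 0.95$), which can be found by scanning a grid of candidate $\beta$ values.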

Remark 4. The Chow (1960) test, which is an F test, can be used to determine whether a regression function differs across two groups. In linear time series regressions, one important requirement of the Chow test is that the break date be known. As a special case, the Chow test can be combined with predictability, so that the null hypothesis is either $\alpha_1 = \alpha_2, \beta_1 = \beta_2 = 0$, or simply $\beta_1 = \beta_2 = 0$. If the Chow test rejects the null hypothesis, it implies that either $\beta_1 \neq 0$ or $\beta_2 \neq 0$, that is, the predictor variable has some predictive power for asset returns. The asymptotic behavior of the Chow test with nonstationary regressors is of independent interest and is warranted as a future research topic.

2.2 Empirical Likelihood Method When α is Unknown

In this subsection, we relax the unrealistic assumption that the intercept term $\alpha$ in predictive regressions is known and establish the EL method to test asset return predictability. Now consider the predictive regression defined in (4), that is, for $t = 1, \ldots, T$,
\[
\begin{cases}
y_t = (\alpha_1 + \beta_1 x_{t-1})\, 1_{t \le T_1^0} + (\alpha_2 + \beta_2 x_{t-1})\, 1_{t > T_1^0} + u_t, \\
x_t = \theta + \phi x_{t-1} + v_t.
\end{cases}
\]
Similar to the previous arguments, we first consider the case where the true break date $T_1^0$ is known, and then the case where $T_1^0$ is unknown.

2.2.1 The Case with a Known Break Date

First, we assume that the true break date $T_1^0$ is known a priori. Based on the true break date $T_1^0$, we can construct the pre-break and post-break subsamples. In both subsamples, however, the intercepts $\alpha_1$ and $\alpha_2$ are unknown, and thereby the empirical likelihood method may fail; see, for example, Chan et al. (2012) and Zhu et al. (2014) for a heuristic explanation of this issue. To apply the empirical likelihood method, we consider the following weighted score equations:
\[
\sum_{t=1}^{T_1^0} (y_t - \alpha_1 - \beta_1 x_{t-1}) = 0, \qquad \sum_{t=1}^{T_1^0} (y_t - \alpha_1 - \beta_1 x_{t-1})\, w(x_{t-1}) = 0, \tag{11}
\]
and
\[
\sum_{t=T_1^0+1}^{T} (y_t - \alpha_2 - \beta_2 x_{t-1}) = 0, \qquad \sum_{t=T_1^0+1}^{T} (y_t - \alpha_2 - \beta_2 x_{t-1})\, w(x_{t-1}) = 0, \tag{12}
\]
where the weight $w(x_{t-1}) \equiv x_{t-1}/\sqrt{1 + x_{t-1}^2}$. Solving equations (11) and (12) gives the weighted estimates of $\alpha_j$ and $\beta_j$ for $j = 1, 2$. When $x_t$ is integrated or nearly integrated, $(T_1^0)^{-1} \sum_{t=1}^{T_1^0} u_t\, w(x_{t-1})$ does not converge in probability to a constant, but converges in distribution to some random variable as $T \to \infty$ because of the intercept term (see, e.g., Chan and Wei, 1987; Chan et al., 2012). This implies that the joint limit of $(T_1^0)^{-1/2} \sum_{t=1}^{T_1^0} (y_t - \alpha_1 - \beta_1 x_{t-1})$ and $(T_1^0)^{-1/2} \sum_{t=1}^{T_1^0} (y_t - \alpha_1 - \beta_1 x_{t-1})\, w(x_{t-1})$ cannot be the bivariate normal distribution. By the same token, the joint limit of $(T - T_1^0)^{-1/2} \sum_{t=T_1^0+1}^{T} (y_t - \alpha_2 - \beta_2 x_{t-1})$ and $(T - T_1^0)^{-1/2} \sum_{t=T_1^0+1}^{T} (y_t - \alpha_2 - \beta_2 x_{t-1})\, w(x_{t-1})$ cannot be the bivariate normal distribution, either. Hence, when the predictor variable $x_t$ is nonstationary, the empirical likelihood method based on the weighted score equations fails in the sense that Wilks' theorem does not hold.

Zhu et al. (2014) suggested an avenue to avoid this problem. Here we review the argument

in brief. We can get rid of $\alpha_j$, $j = 1, 2$, by taking first differences. This comes, however, with a cost. When $\phi = 1$, that is, when $x_t$ is a unit root process, the sequence $\{x_t - x_{t-1}\}$ turns out to be a stationary process. Inference on $\beta_j$, $j = 1, 2$, becomes less efficient with first differences in the sense that the rate of convergence is $T^{1/2}$ rather than $T$ under a unit root process $x_t$. Further, the noise components $\{u_t - u_{t-1}\}$ are not a martingale difference sequence. To accommodate these difficulties, Zhu et al. (2014) took differences at a large lag. Let $m = [n/2]$, where $n$ is the number of observations in a sample. The observables $\{y_t, x_t\}$ are differenced at the $m$-horizon: we have $\tilde{y}_t = y_t - y_{t+m}$, $\tilde{x}_t = x_t - x_{t+m}$, and $\tilde{u}_t = u_t - u_{t+m}$ for $t = 1, 2, \ldots, m$. The empirical likelihood function for $\beta_j$, $j = 1, 2$, can then be constructed from $\tilde{y}_t$ and $\tilde{x}_t$.

Because the true break date $T_1^0$ is assumed to be known, we can adopt the differencing procedure with minor modification. Let $m_1^0 = [T_1^0/2]$ and $m_2^0 = [(T - T_1^0)/2]$ in the pre-break and post-break subsamples, respectively. In the pre-break subsample, we have $\tilde{y}_t = y_t - y_{t+m_1^0}$, $\tilde{x}_t = x_t - x_{t+m_1^0}$, and $\tilde{u}_t = u_t - u_{t+m_1^0}$ for $t = 1, 2, \ldots, m_1^0$. Similarly, in the post-break subsample, we have $\tilde{y}_t = y_t - y_{t+m_2^0}$, $\tilde{x}_t = x_t - x_{t+m_2^0}$, and $\tilde{u}_t = u_t - u_{t+m_2^0}$ for $t = T_1^0 + 1, \ldots, T_1^0 + m_2^0$. Hence, we have
\[
\tilde{y}_t =
\begin{cases}
\beta_1 \tilde{x}_{t-1} + \tilde{u}_t & \text{for } t = 1, \ldots, m_1^0, \\
\beta_2 \tilde{x}_{t-1} + \tilde{u}_t & \text{for } t = T_1^0 + 1, \ldots, T_1^0 + m_2^0.
\end{cases}
\]
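The subsample differencing just described is easy to get wrong by one index, so a small sketch may help (illustrative code under our own conventions: y holds $y_1, \ldots, y_T$, x holds $x_0, \ldots, x_T$, and the break date T1 is treated as known):

```python
import numpy as np

def m_horizon_differences(y, x, T1):
    """m-horizon differences within each subsample, removing the
    intercepts: y_tilde_t = y_t - y_{t+m}, paired with the lagged
    regressor difference x_tilde_{t-1} = x_{t-1} - x_{t+m-1}."""
    T = len(y)
    m1 = T1 // 2                              # m_1^0 = [T_1^0 / 2]
    ytil1 = y[:m1] - y[m1:2 * m1]             # t = 1, ..., m_1^0
    xtil1 = x[:m1] - x[m1:2 * m1]             # lagged x differences
    m2 = (T - T1) // 2                        # m_2^0 = [(T - T_1^0) / 2]
    ytil2 = y[T1:T1 + m2] - y[T1 + m2:T1 + 2 * m2]
    xtil2 = x[T1:T1 + m2] - x[T1 + m2:T1 + 2 * m2]
    return (ytil1, xtil1), (ytil2, xtil2)
```

In each subsample the intercept cancels exactly, leaving $\tilde{y}_t = \beta_j \tilde{x}_{t-1} + \tilde{u}_t$ as in the display above, with both $t$ and $t + m_j^0$ falling inside the same regime.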

The empirical likelihood function is defined as follows:

\[
\tilde{L}_1(\beta_1) = \sup\Big\{ \prod_{t=1}^{m_1^0} (m_1^0 p_t) : p_1 \ge 0, \ldots, p_{m_1^0} \ge 0,\ \sum_{t=1}^{m_1^0} p_t = 1,\ \sum_{t=1}^{m_1^0} p_t \tilde{H}_t(\beta_1) = 0 \Big\},
\]
where $\tilde{H}_t(\beta_1) = (\tilde{y}_t - \beta_1 \tilde{x}_{t-1})\, \tilde{x}_{t-1}/\sqrt{1 + \tilde{x}_{t-1}^2}$. After applying the Lagrange multiplier technique, we have
\[
\tilde{l}_1(\beta_1) = -2 \log \tilde{L}_1(\beta_1) = 2 \sum_{t=1}^{m_1^0} \log\{1 + \tilde{\lambda}_1 \tilde{H}_t(\beta_1)\},
\]
where $\tilde{\lambda}_1 = \tilde{\lambda}_1(\beta_1)$ satisfies the equation
\[
\sum_{t=1}^{m_1^0} \frac{\tilde{H}_t(\beta_1)}{1 + \tilde{\lambda}_1 \tilde{H}_t(\beta_1)} = 0.
\]
Similarly,
\[
\tilde{L}_2(\beta_2) = \sup\Big\{ \prod_{t=T_1^0+1}^{T_1^0+m_2^0} (m_2^0 p_t) : p_{T_1^0+1} \ge 0, \ldots, p_{T_1^0+m_2^0} \ge 0,\ \sum_{t=T_1^0+1}^{T_1^0+m_2^0} p_t = 1,\ \sum_{t=T_1^0+1}^{T_1^0+m_2^0} p_t \tilde{H}_t(\beta_2) = 0 \Big\},
\]
where $\tilde{H}_t(\beta_2) = (\tilde{y}_t - \beta_2 \tilde{x}_{t-1})\, \tilde{x}_{t-1}/\sqrt{1 + \tilde{x}_{t-1}^2}$, and
\[
\tilde{l}_2(\beta_2) = -2 \log \tilde{L}_2(\beta_2) = 2 \sum_{t=T_1^0+1}^{T_1^0+m_2^0} \log\{1 + \tilde{\lambda}_2 \tilde{H}_t(\beta_2)\},
\]
where $\tilde{\lambda}_2 = \tilde{\lambda}_2(\beta_2)$ satisfies the equation
\[
\sum_{t=T_1^0+1}^{T_1^0+m_2^0} \frac{\tilde{H}_t(\beta_2)}{1 + \tilde{\lambda}_2 \tilde{H}_t(\beta_2)} = 0.
\]
The following theorem shows that Wilks' theorem holds for the proposed empirical likelihood method.

Theorem 2. Suppose that model (4) holds under Assumption 1 along with either Assumption 0 ˜ 0 ˜ 0 2 or Assumption 3. If the true break date T1 is known a priori, l1(β1 ) and l2(β2 ) converge in distribution to a chi-square limit with one degree of freedom as T → ∞.

Since the true break date $T_1^0$ is assumed to be known, it is unnecessary to estimate the break date in Theorem 2.

2.2.2 The Case with an Unknown Break Date

The true break date $T_1^0$ is unknown in general. Here we consider the most relevant case in practice, where both the true break date and the intercept term are unknown. We now use the consistent estimates to apply the empirical likelihood method. Without loss of generality, it is assumed that $\hat{T}_1 < T_1^0$. Let $\hat{m}_1 = [\hat{T}_1/2]$ and $\hat{m}_2 = [(T - \hat{T}_1)/2]$. The difference series $\{\ddot{y}_t\}$ is obtained as follows in each subsample: (i) In the pre-$\hat{T}_1$ subsample, for $t = 1, 2, \ldots, \hat{m}_1$, we define $\ddot{y}_t = y_t - y_{t+\hat{m}_1}$, $\ddot{x}_t = x_t - x_{t+\hat{m}_1}$, and $\ddot{u}_t = u_t - u_{t+\hat{m}_1}$, thereby
$$
\ddot{y}_t = \beta_1 \ddot{x}_{t-1} + \ddot{u}_t.
$$
(ii) In the post-$\hat{T}_1$ subsample, for $t = \hat{T}_1 + 1, \ldots, \hat{T}_1 + \hat{m}_2$, we define $\ddot{y}_t = y_t - y_{t+\hat{m}_2}$, $\ddot{x}_t = x_t - x_{t+\hat{m}_2}$, and $\ddot{u}_t = u_t - u_{t+\hat{m}_2}$. It is noteworthy that the post-$\hat{T}_1$ subsample should be divided into two parts if the estimate of the break date is different from the true break date, that is, $\hat{T}_1 < T_1^0$. More precisely,
$$
y_t = \begin{cases} \alpha_1 + \beta_1 x_{t-1} + u_t, & t = \hat{T}_1 + 1, \ldots, T_1^0, \\ \alpha_2 + \beta_2 x_{t-1} + u_t, & t = T_1^0 + 1, \ldots, T, \end{cases}
$$
and
$$
y_{t+\hat{m}_2} = \alpha_2 + \beta_2 x_{t+\hat{m}_2-1} + u_{t+\hat{m}_2}, \qquad t = \hat{T}_1 + 1, \ldots, \hat{T}_1 + \hat{m}_2,
$$
from Propositions 1 and 2. Hence, we have
$$
\ddot{y}_t = \beta_2 \ddot{x}_{t-1} + \ddot{u}_t,
$$
where
$$
\ddot{u}_t = \begin{cases} (u_t - u_{t+\hat{m}_2}) + (\alpha_1 - \alpha_2) + (\beta_1 - \beta_2) x_{t-1}, & t = \hat{T}_1 + 1, \ldots, T_1^0, \\ u_t - u_{t+\hat{m}_2}, & t = T_1^0 + 1, \ldots, \hat{T}_1 + \hat{m}_2. \end{cases}
$$
Based on the preceding equations, the empirical likelihood function is defined as follows:

$$
\ddot{L}_1(\beta_1) = \sup\left\{ \prod_{t=1}^{\hat{m}_1} (\hat{m}_1 p_t) : p_1 \geq 0, \ldots, p_{\hat{m}_1} \geq 0, \ \sum_{t=1}^{\hat{m}_1} p_t = 1, \ \sum_{t=1}^{\hat{m}_1} p_t \ddot{H}_t(\beta_1) = 0 \right\},
$$
where $\ddot{H}_t(\beta_1) = (\ddot{y}_t - \beta_1 \ddot{x}_{t-1})\, \ddot{x}_{t-1} / \sqrt{1 + \ddot{x}_{t-1}^2}$. After applying the Lagrange multiplier technique, we have
$$
\ddot{l}_1(\beta_1) = -2 \log \ddot{L}_1(\beta_1) = 2 \sum_{t=1}^{\hat{m}_1} \log\{1 + \ddot{\lambda}_1 \ddot{H}_t(\beta_1)\},
$$
where $\ddot{\lambda}_1 = \ddot{\lambda}_1(\beta_1)$ satisfies
$$
\sum_{t=1}^{\hat{m}_1} \frac{\ddot{H}_t(\beta_1)}{1 + \ddot{\lambda}_1 \ddot{H}_t(\beta_1)} = 0.
$$
Similarly,
$$
\ddot{L}_2(\beta_2) = \sup\left\{ \prod_{t=\hat{T}_1+1}^{\hat{T}_1+\hat{m}_2} (\hat{m}_2 p_t) : p_{\hat{T}_1+1} \geq 0, \ldots, p_{\hat{T}_1+\hat{m}_2} \geq 0, \ \sum_{t=\hat{T}_1+1}^{\hat{T}_1+\hat{m}_2} p_t = 1, \ \sum_{t=\hat{T}_1+1}^{\hat{T}_1+\hat{m}_2} p_t \ddot{H}_t(\beta_2) = 0 \right\}, \qquad (13)
$$
where $\ddot{H}_t(\beta_2) = (\ddot{y}_t - \beta_2 \ddot{x}_{t-1})\, \ddot{x}_{t-1} / \sqrt{1 + \ddot{x}_{t-1}^2}$, and
$$
\ddot{l}_2(\beta_2) = -2 \log \ddot{L}_2(\beta_2) = 2 \sum_{t=\hat{T}_1+1}^{\hat{T}_1+\hat{m}_2} \log\{1 + \ddot{\lambda}_2 \ddot{H}_t(\beta_2)\},
$$
where $\ddot{\lambda}_2 = \ddot{\lambda}_2(\beta_2)$ satisfies
$$
\sum_{t=\hat{T}_1+1}^{\hat{T}_1+\hat{m}_2} \frac{\ddot{H}_t(\beta_2)}{1 + \ddot{\lambda}_2 \ddot{H}_t(\beta_2)} = 0.
$$
The following theorem shows that Wilks' theorem holds for the proposed empirical likelihood method. Propositions 1 and 2 are crucial to derive the asymptotic results. Theoretical details are deferred to the Appendix.

Theorem 3. Suppose that model (4) holds under Assumption 1 along with either Assumption 2 or Assumption 3. If the true break date $T_1^0$ is unknown, $\ddot{l}_1(\beta_1^0)$ and $\ddot{l}_2(\beta_2^0)$ converge in distribution to a chi-square limit with one degree of freedom as $T \to \infty$.

Although we use the estimated break date to construct the EL test, the limiting distribution obtained is identical to that of the EL test based on the true break date (Theorem 2). From Theorem 3, similar to (10), we can construct confidence intervals for $\beta_j^0$, $j = 1, 2$, with level $\tau$ as follows:

$$
CI_{1,\tau} = \{\beta_1 : \ddot{l}_1(\beta_1) \leq \chi^2_{1,\tau}\} \quad \text{and} \quad CI_{2,\tau} = \{\beta_2 : \ddot{l}_2(\beta_2) \leq \chi^2_{1,\tau}\}, \qquad (14)
$$
where $\chi^2_{1,\tau}$ denotes the $\tau$th quantile of a chi-square distribution with one degree of freedom.
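Inverting the test into the interval (14) is a one-dimensional scan. A sketch with a hypothetical helper, where `l_fn` stands for any of the EL log-ratio statistics and 3.841 is the 95% $\chi^2_1$ critical value:

```python
def el_confidence_interval(l_fn, grid, crit=3.841):
    """Collect every candidate beta whose EL log-ratio stays at or below the
    chi-square(1) critical value; report the endpoints of the kept set."""
    kept = [b for b in grid if l_fn(b) <= crit]
    return (min(kept), max(kept)) if kept else None
```

For a statistic that is convex in $\beta$, the kept set is an interval, so the two endpoints summarize it fully.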

Remark 5. It would be more coherent if we could also test for structural breaks and estimate the break dates based on EL methods. Consistent estimates of the break fractions, however, have not been established in the literature (see Baragona, Battaglia, and Cucina, 2013). Hence, we rely solely on the consistent estimate of the break fraction obtained by minimizing the sum of squared residuals in the unrestricted regression model.
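The SSR-minimizing break date used throughout can be sketched as follows (the helper name `estimate_break_date` is ours; for brevity it regresses $y_t$ on a generic regressor rather than the lagged predictor used in the paper):

```python
def estimate_break_date(y, x, trim=0.15):
    """Least-squares break dating: for each admissible split point, fit an
    (intercept, slope) pair by OLS on each subsample and pick the split that
    minimizes the total sum of squared residuals. Trimming keeps both
    subsamples away from degenerate lengths."""
    def ssr(ys, xs):
        n = len(ys)
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((a - mx) ** 2 for a in xs)
        b = sum((a - mx) * (c - my) for a, c in zip(xs, ys)) / sxx if sxx > 0 else 0.0
        a0 = my - b * mx
        return sum((c - a0 - b * a) ** 2 for a, c in zip(xs, ys))
    T = len(y)
    lo, hi = int(trim * T), int((1 - trim) * T)
    return min(range(lo, hi), key=lambda k: ssr(y[:k], x[:k]) + ssr(y[k:], x[k:]))
```

The returned index divides the sample into the two regimes on which the EL statistics are then built.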

2.3 Empirical Likelihood Method with a Structural Change in the Predictor Variable $x_t$

In contrast to the preceding arguments, we consider the case in which the predictor variable exhibits a structural change in mean or slope. More precisely, the predictor variable $x_t$ can be specified as follows:
$$
x_t = \theta_1 1_{t \leq T_b} + \theta_b 1_{t > T_b} + \phi x_{t-1} + v_t, \qquad (15)
$$
$$
x_t = \theta_1 + \phi_1 x_{t-1} 1_{t \leq T_b} + \phi_b x_{t-1} 1_{t > T_b} + v_t, \qquad (16)
$$
and
$$
x_t = (\theta_1 + \phi_1 x_{t-1}) 1_{t \leq T_b} + (\theta_b + \phi_b x_{t-1}) 1_{t > T_b} + v_t. \qquad (17)
$$
The following assumption is standard to ensure that there is a single structural change in

the marginal distribution of the predictor variable $x_t$.

Assumption 4. $\theta_1 \neq \theta_b$, $-1 < \phi_1, \phi_b \leq 1$, $\phi_1 \neq \phi_b$, and $T_b = [\lambda_b T]$, where $\lambda_b$ is constant and $\lambda_b \in (0, 1)$.

If the true break date $T_1^0$ is unknown, we may estimate it by minimizing the sum of squared residuals (SSR) in the unrestricted model. The consistency of the estimate $\hat{T}_1$, however, has not been established in the literature when the predictor variable $x_t$ exhibits a structural change in mean or slope. In the following theorem, we derive the limiting distributions of the EL tests under the assumption that the true break date $T_1^0$ is known.

Theorem 4. Suppose that model (4), with the specification of $x_t$ replaced by one of (15)-(17), holds under Assumptions 1 and 4 along with either Assumption 2 or Assumption 3. If the true break date $T_1^0$ is known a priori, $\tilde{l}_1(\beta_1^0)$ and $\tilde{l}_2(\beta_2^0)$ with an unknown $\alpha$ converge in distribution to a chi-square limit with one degree of freedom as $T \to \infty$.

Note that it is unnecessary to estimate the break date in the predictor variable, $T_b$, in Theorem 4. This implies that the EL methods are robust to a structural change occurring in the predictor variable.

3 Monte Carlo Simulation

In this section, we assess the finite sample properties of the procedure. The data are generated by the following processes:
$$
y_t = \beta_1 x_{t-1} 1_{t \leq T_1^0} + \beta_2 x_{t-1} 1_{t > T_1^0} + u_t,
$$
with the predictor variable $x_t$ specified by
$$
x_t = \theta + \phi x_{t-1} + v_t, \quad \text{(no structural change)} \qquad (18)
$$
$$
x_t = \theta_1 1_{t \leq T_b} + \theta_b 1_{t > T_b} + \phi x_{t-1} + v_t, \quad \text{(a level shift)} \qquad (19)
$$
or
$$
x_t = (\theta_1 + \phi_1 x_{t-1}) 1_{t \leq T_b} + (\theta_b + \phi_b x_{t-1}) 1_{t > T_b} + v_t, \quad \text{(a level shift and a slope change)} \qquad (20)
$$
where the noise components $(u_t, v_t)$ are contemporaneously correlated as follows:
$$
u_t = \rho v_t + \sqrt{1 - \rho^2}\, \eta_t, \qquad \rho \in \{0, -0.95\}. \qquad (21)
$$
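The simulation design in (18)-(21) with a level shift in the predictor can be sketched as follows; `simulate_dgp` is our name for the helper, and the defaults mirror the parameter values used later in this section:

```python
import math
import random

def simulate_dgp(T, beta1, beta2, frac_break=0.5, rho=-0.95,
                 theta1=0.0, theta_b=-0.8, phi=0.99, frac_shift=0.5, seed=0):
    """Sketch of the simulation design: AR(1) predictor with a level shift at
    [frac_shift*T], correlated errors u_t = rho*v_t + sqrt(1-rho^2)*eta_t, and
    a break in the predictive slope at [frac_break*T]. Returns (y, x)."""
    rng = random.Random(seed)
    T1, Tb = int(frac_break * T), int(frac_shift * T)
    x, y = [0.0], []                                    # initialized at x_0 = 0
    for t in range(1, T + 1):
        v, eta = rng.gauss(0, 1), rng.gauss(0, 1)
        u = rho * v + math.sqrt(1 - rho ** 2) * eta     # embedded endogeneity
        theta = theta1 if t <= Tb else theta_b          # level shift in x
        x.append(theta + phi * x[-1] + v)
        beta = beta1 if t <= T1 else beta2              # break in the slope
        y.append(beta * x[-2] + u)                      # y_t = beta x_{t-1} + u_t
    return y, x
```

Swapping the error draws for $t(3)$ variates or a GARCH recursion yields the heavy-tailed and conditionally heteroskedastic designs discussed below.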

For the distributions of $(v_t, \eta_t)$ in (21), we consider normal and $t(3)$ distributions. Moreover, we allow for GARCH(1,1) errors of the form $\eta_t = \sqrt{h_t}\, \varsigma_t$, where $\varsigma_t$ is i.i.d. $N(0,1)$, $h_t = 0.05 + 0.1\, \eta_{t-1}^2 + 0.85\, h_{t-1}$, $t = 1, \ldots, T$, and $v_t$ is i.i.d. $N(0,1)$. We set the various parameters at the following values: $\beta_1 = d\, T^{-1/2}$ and $\beta_2 = 0$, $\lambda^0 \in \{0.5, 0.7\}$, $\lambda_b = 0.5$, $\theta = \theta_1 = 0$, $\theta_b = -0.8$, $\phi_1 = 0.5$, and $\phi, \phi_b \in \{0.9, 0.99, 1\}$. Under this configuration, the size is the rejection probability of the null hypothesis $\beta_2^0 = 0$, while the power is that of the null hypothesis $\beta_1^0 = 0$. We set $d = 8$ and the sample size $T = 100, 250, 500, 1000$, which implies that $\beta_1 = 0.8, 0.51, 0.36, 0.25$, respectively. This magnitude is adequate for medium-sized structural changes and widely used in the literature (see, e.g., Elliott and Müller, 2007). The system is initialized at $x_0 = 0$. The number of simulation replications is 5000 for each combination of parameters. We use the R package "emplik" in Zhou (2015) to compute the empirical likelihood function.

Tables 1-3 present the size properties of the EL methods with a nominal size of 5%. Since we

set α1 = α2 = 0 in the DGP, the empirical likelihood method in (9) is well defined (labeled EL1), while the empirical likelihood method in (13) allows for an irrelevant intercept (labeled EL2). In addition, to investigate the effect of the embedded endogeneity on the EL tests, we

consider the case in which the errors $u_t$ and $v_t$ are contemporaneously uncorrelated, that is, $\rho = 0$. It is noteworthy that we estimate the unknown break date $T_1^0$ in order to implement the proposed EL methods only when the predictor variable exhibits no structural change, as in (18). The consistency of the estimate of the break fraction has not been established in the literature when the predictor variable experiences a structural change at an unknown date, for instance, as in (19) and (20), in which case the true break date $T_1^0$ is assumed to be known.

The results regarding the empirical size in the case of normally distributed errors $(v_t, \eta_t)$ are presented in Table 1. In column (a), we consider the case of (18), that is, no level shift in the predictor variable. We observe that for sample sizes $T \geq 250$ the EL tests have excellent size control across all values of $\phi$ and $\rho$. For $T = 100$, EL2 appears to be oversized when a structural change occurs near the boundary ($\lambda^0 = 0.7$). For the other combinations of $\phi$ and $\rho$, both EL1 and EL2 have correct size. In general, EL1 performs better than EL2. This is an expected consequence because EL2 is somewhat inefficient, taking a first difference with a large lag to remove an irrelevant intercept term, but the difference between EL1 and EL2 is minor in large samples. In column (b), we consider the case of (19), that is, a level shift in the predictor variable. Since the level shift in the predictor variable is set to be located at

the center of the sample ($\lambda_b = 0.5$), it may occur either at the same time as the structural break in the predictive regression ($\lambda^0 = 0.5$) or at a different time ($\lambda^0 = 0.7$). In column (c), we consider the case of (20), that is, a slope change in the predictor variable concurrent with a level shift. In both cases, the EL methods control the empirical size quite well, which supports the claim that the proposed EL methods are robust to the existence and the timing of a structural change embedded in the predictor variable. Furthermore, the values of $\rho$ that capture the

correlation between ut and vt have little effect on the size properties of the EL methods, whereby the proposed EL tests may resolve the so-called embedded endogeneity problem.

The results regarding the empirical size in the case of $t(3)$ errors $(v_t, \eta_t)$ are presented in Table 2. We find that the EL statistic shows size close to the nominal rate 5%, apart from some mild oversizing for $T = 100$ and $\lambda^0 = 0.7$, particularly so for EL2. This confirms that errors with heavier tails have an effect on the rejection probability of the EL tests, but this effect is alleviated as the sample size increases. Since the asymptotic results for the EL methods are robust under conditional heteroskedasticity, we employ GARCH(1,1) errors whose specification has been described previously. We examine the empirical sizes of the EL methods in Table 3. As in the cases of normally and $t(3)$ distributed errors, the EL methods maintain empirical size close to the nominal rate 5%, while they show somewhat severe size distortions for $T = 100$ and $\lambda^0 = 0.7$. It is also noteworthy that the empirical sizes show patterns similar to the case of normally distributed errors regardless of whether there is a structural change in the predictor variable.

On the other hand, the powers of the EL tests turn out to be very close to 1 across all configurations considered.¹ This occurs because the magnitude of the structural change in the predictive regression, $\beta_1$, is large enough or the true break date $T_1^0$ is known a priori. This result highlights that the magnitude of the change in the predictive regression plays an important role in the finite sample properties of the proposed EL tests. To investigate the

finite sample performance of the EL tests, we allow the value of β1 to range from -1 to 1 in

increments of 0.1. For other parameters, we set β2 = 0, ρ = −0.95, θ = 0, φ = 0.99, and

T = 150. The noise components (vt, ηt) ∼ i.i.d. N (0, 1). The break date can be estimated

by minimizing the SSR as described in (7), where the predictor variable xt has an AR(1)

structure defined in (18). For sizes of the EL methods, we test the null of $\beta_2 = 0$, while testing the null of $\beta_1 = 0$ for powers. We simulate the following statistics: EL1 with $\hat{T}_1$, EL1 with $T_1^0$, EL2 with $\hat{T}_1$, and EL2 with $T_1^0$. The results are presented in Figure 1, where sizes are reported in the top panel and powers in the bottom panel. The tests EL1 with $\hat{T}_1$ and EL2 with $\hat{T}_1$ suffer from some liberal size distortions unless $|\beta_1|$ is large. This implies that the estimates $\hat{T}_1$ are quite variable when $|\beta_1|$ is not large enough, which translates into distributions of the EL tests constructed with $\hat{T}_1$ that are far from the true break date case, so liberal size distortions occur. In comparison to the test EL2 with $\hat{T}_1$, the test EL1 with $\hat{T}_1$, which uses the fact that the intercept terms $\alpha_1, \alpha_2$ are known, is less affected by this problem, though some size distortions still remain. Regarding powers,

¹Power results are available upon request.

power reduction of the empirical likelihood test with unknown $\alpha$ (EL2) is inevitable due to the demeaning procedure that is unnecessary for the EL test with known $\alpha$ (EL1). The power of EL1 with $\hat{T}_1$ is close to that of EL1 with $T_1^0$. It is noteworthy, however, that the power of EL1 with $\hat{T}_1$ is falsely exaggerated compared to that of EL1 with $T_1^0$ when there is no structural break, which emphasizes the cost of estimating a spurious break date. The power difference between EL2 with $\hat{T}_1$ and EL2 with $T_1^0$ is somewhat greater than that between the EL1 tests. The finite sample properties of the EL methods emphasize that testing for and estimating the break date in the predictive regression is crucial for the EL tests to be valid. Simulation results evidently confirm that the proposed EL methods have good finite sample properties in terms of both size and power. The results shed new light on testing for stock return predictability because the EL methods can help deal with various time series characteristics of the predictor variable and incorporate a structural break at some unknown date in predictive regressions. Note that the choice of model specification between EL1 and EL2 still matters substantially for the power of the EL tests. Moreover, simulations are devised to compare the finite sample performance of the EL tests with existing methods when a level shift in the predictor variable is ignored. For a fair comparison, we consider a conventional predictive regression model with a predictor variable having a level shift as in (19). We report the finite sample rejection frequencies of four predictability tests: the conventional t-test, the Bonferroni t-test by Cavanagh, Elliott, and Stock (1995), the Bonferroni Q-test by Campbell and Yogo (2006), and the sup-bound Q-test by Lewellen (2004). In Tables 4-5, we present the size properties of the aforementioned tests. In Table 4,

the DGP is specified as follows: $y_t = u_t$, $x_t = \theta_b 1_{t > [\lambda_b T]} + \phi x_{t-1} + v_t$ with $\theta_b = -0.8$

and $\lambda_b \in \{0.25, 0.5\}$. For the noise components $(v_t, \eta_t)$, $N(0,1)$ and $t(3)$ distributions are considered. The sample size is $T = 100$. We test the null hypothesis of no predictability, that is, that the regression coefficient on $x_{t-1}$, $\beta^0$, is zero. The proposed EL methods have empirical sizes close to the nominal level 5%. As expected, the conventional t-test shows severe size distortions when the noise components have a negative correlation, $\rho = -0.95$. It is interesting to observe the finite sample performance of the Bonferroni type tests. If the errors

$(u_t, v_t)$ are uncorrelated, the Bonferroni type tests control size quite well. This is not the case with $\rho = -0.95$: the Bonferroni t-test and the Bonferroni Q-test can be somewhat liberal in the sense that the finite sample rejection frequencies are above the nominal level 5%. We find that the size distortions for the conventional t-test and the Bonferroni type tests worsen considerably when the errors follow the $t(3)$ distribution. On the other hand, the sup-bound Q-test tends to be conservative (the finite sample rejection frequencies are below the nominal rate 5%). Overall, the EL methods are remarkable in terms of size control, which confirms the validity of the EL methods allowing a level shift in the predictor variable.

In Table 5, we examine the power properties of the aforementioned tests without size adjustment, as there is no oversizing in the proposed EL tests. Our simulation study computes power with respect to the alternatives specified by $y_t = \beta x_{t-1} + u_t$ and $\beta = 0.8\, T^{-1/2}$ for the sample size $T = 100$, that is, $\beta = 0.08$. To our surprise, the t-test, the Bonferroni type tests, and the sup-bound Q-test show virtually no power against the alternative of predictability. This highlights that those test statistics may not be reliable when there exists even a single level shift in the predictor variable. For the EL methods, we observe that (i) they have significant power, and (ii) the proposed EL method with known $\alpha$ (EL1) is much more powerful than the one with unknown $\alpha$ (EL2), especially for $t(3)$ errors.

In simulation experiments, we consider predictive regressions under practical assumptions on the predictor variable and the errors: (i) there exists a structural break in the predictive regression model, (ii) the predictor variable can be stationary or nonstationary, and (iii) the

errors $(u_t, v_t)$ are contemporaneously correlated. Simulation results strongly support that the proposed empirical likelihood methods have good finite sample properties in terms of both size and power. Because the Bonferroni type tests require complicated implementation and, more importantly, provide no theoretical justification for a structural change embedded in the predictor variable, the proposed EL method can be a useful complement for testing asset return predictability.

4 Empirical Analysis

We apply the empirical likelihood method to test stock return predictability in this section.

More specifically, the predictable variable yt is the monthly S&P 500 excess return, and

the predictor variable $x_t$ includes the log dividend-price (d/p) ratio, the log earnings-price (e/p) ratio, the book-to-market (b/m) ratio, Treasury-bill rates (tbl), the default yield spread (dfy), the log dividend payout (d/e) ratio, the net equity expansion (ntis), and the term spread (tms). The data set contains monthly data covering 1927:01–2005:12, hence the sample size is $T = 948$. The monthly returns are computed as the difference between the S&P 500 index return including dividends and the one-month Treasury bill rate. The data from Goyal and Welch (2008) have been widely used in the predictive regression literature, for instance, in Cenesizoglu and Timmermann (2008), Maynard, Shimotsu, and Wang (2011), and Lee (2016) in a quantile regression framework. Distinguished from existing methodologies such as Lewellen (2004), Campbell and Yogo (2006), Cai and Wang (2014), and many others,

it is unnecessary for the EL methods to evaluate how persistent the predictor variable $x_t$ is or to test whether the predictor variable experiences a structural break. In fact, based on unit root tests, both the d/p and e/p ratios are unit root or local-to-unity processes; see Cai, Wang, and Wang (2015) for details.

We first test for structural instability of the parameters in the predictive regression model. Following Bai and Perron (1998, 2006), we consider the UDmax or WDmax test to check whether at least one structural break is present in the models.² The number of breaks can then be

determined based on examination of the $\sup F_T(l+1|l)$ statistic, a test for $l$ versus $l+1$ breaks, constructed using estimates of the break dates obtained from a global minimization of the sum of squared residuals. The test results indicate that there exists a single break in all predictive regression models considered, so we estimate the break dates. For testing return predictability, we apply the EL method based on the estimate of the break date. Here, we shall clarify two issues. First, the EL method in (13) is implemented for testing stock return predictability; that is, we exclude the possibility that the intercept term $\alpha$ is zero. Second, we use the estimate of the break date, while precedent works consider various subsamples based on arbitrarily chosen dates. In Table 6, we report the EL test results, where p-values (%) are rounded to two decimal places for exposition. Results shown in bold imply rejection of the null hypothesis of no predictability at the 10% significance level. The estimated break date, $\hat{T}_1$, in a univariate predictive regression model appears in the column labeled "Break Date". When the log dividend-price (d/p) ratio is the predictor variable, we find that the estimated break date in the predictive regression is 1993:10 and the null of no predictability can be rejected in regime 2 (1993:11–2005:12). For the log earnings-price (e/p) ratio, the EL test results confirm significant predictive ability in both regimes. Strong evidence of predictability is also provided for the Treasury-bill rate (tbl) in regime 2 (1942:05–2005:12), while there is little evidence of predictability in regime 1 (1927:01–1942:04). To the contrary, there is no evidence of significant predictive ability for b/m, dfy, d/e, ntis, and tms. In Table 7, we apply the EL test in (13) to revisit the data set analyzed by Campbell and Yogo (2006).
This data set is useful for arguing that the proposed EL method can provide more accurate statistical inference on return predictability than Campbell and Yogo (2006), given the time series characteristics of the predictor variables. More specifically, the

predictable variable yt is the monthly CRSP value-weighted index data (1926:12–2002:12)

from the Center for Research in Security Prices, and the predictor variable xt is either the log dividend-price (d/p) ratio or the log earnings-price (e/p) ratio. The total number of observations in the sample is 913. The detailed description of this data set can be found in

²As addressed in Cai, Wang, and Wang (2015), one might suspect that the coefficients should be time-varying in a predictive regression model. Indeed, this can be supported by empirical evidence. We next apply the nonparametric test procedure proposed in Cai, Wang, and Wang (2015) to test whether the coefficient vector is constant or not. The test concludes that the coefficients are changing over time, which provides evidence to address this suspicion; see Cai, Wang, and Wang (2015) for details.

Campbell and Yogo (2006). First, for the log dividend-price (d/p) ratio, the proposed EL method rejects the null hypothesis of no predictability at the 5% significance level in the pre-break subsample (p-value = 3.49%). On the other hand, in the post-break subsample, we cannot reject the null of no predictability using the EL test. It is an interesting feature that the log dividend-price ratio shows significant predictive power before May 1940, while it cannot predict the CRSP value-weighted index from 1940:06 to 2002:12. The 90% Bonferroni confidence interval (Panel B in Table 5, Campbell and Yogo, 2006), however, contains zero in the full sample, which implies that the log dividend-price ratio has no predictability. Second, for the log earnings-price (e/p) ratio, the EL method shows that the null hypothesis of no predictability cannot be rejected at the 10% significance level. This is in striking contrast with the result in Campbell and Yogo (2006) that the Bonferroni confidence interval is located above zero, whereby the Bonferroni Q-test rejects the null of no predictability for the earnings-price ratio when the monthly data span 1926:12 to 2002:12 (Panel B in Table 5, Campbell and Yogo, 2006). The predictability of the log earnings-price ratio is reversed when we consider a structural break in the predictive regression model. Note that the Bonferroni type tests are valid if the predictor variable is as persistent as a local-to-unity process under strong assumptions on the innovations, for instance, normality. Hence, a further investigation is necessary to find the source of this contradictory finding. The empirical findings reveal the merits of the EL methods, which complement the existing methodologies. Before closing this section, we briefly discuss the stationarity assumption on the predictor variables. To understand the implications of the assumption, we examine it from two different aspects.
From an economic point of view, the predictor variables should be stationary, that is, $|\phi| < 1$. Otherwise, there may exist an explosive bubble in stock prices, which remains a controversial topic in empirical studies. As noted in Lettau and Van Nieuwerburgh (2008), a level shift in the predictor variable is a form of nonstationarity as well. From a statistical point of view, it is preferable to make tests reliable regardless of whether the predictor variables are stationary or nonstationary. Test results, however, cannot pin down the time series characteristics of the predictor variable. Since it makes little sense to predict asset returns with an integrated process, a further investigation of the predictor variable might be required.

5 Concluding Remarks

In this article, we considered predictive regression models where model parameters exhibit a structural break at an unknown date and the predictor variable is allowed to be either a stationary or a nonstationary process. As a special case of nonstationary processes, we considered the possibility that there exists a structural break in the predictor variable. As addressed by Viceira (1997), Paye and Timmermann (2006), and Rapach and Wohar (2006),

instability in predictive regression models is a long-standing problem. The main contribution of this article is in providing a unified approach that is valid regardless of the time series characteristics of the predictor variable when model parameters are unstable in predictive regressions. We established novel testing procedures for asset return predictability via empirical likelihood methods based on some weighted score equations and studied their asymptotic distributions. The proposed methods are particularly useful because they require no prior knowledge as to (i) whether the predictor variable is stationary or nonstationary, where (nearly) integrated processes with or without a structural break are included as nonstationary processes, and (ii) whether the noise components are contemporaneously correlated or not, both common cases in practice. Simulations have demonstrated the EL methods' usefulness and clear improvement over existing tests. Nevertheless, devising predictability tests with good size and power requires information about the presence or absence of a change and the date of the structural change if it exists. A generalized Chow test with nonstationary regressors can deliver such information to some extent. Furthermore, an extension of practical interest is to generalize our model for the purpose of analyzing a large global data set, which is useful for capturing the predictable component of stock returns (see, e.g., Paye and Timmermann, 2006; Hjalmarsson, 2010). Such investigations, and others, are the objects of ongoing research.

References

Amihud, Y. and C. Hurvich (2004). Predictive regressions: A reduced-bias estimation method. Journal of Financial and Quantitative Analysis 39, 813–841.

Amihud, Y., C. Hurvich, and Y. Wang (2009). Multiple-predictor regressions: Hypothesis testing. Review of Financial Studies 22, 413–434.

Andrews, D. W. K. (1993). Tests for parameter instability and structural change with unknown change point. Econometrica 61, 821–856.

Andrews, D. W. K. and W. Ploberger (1994). Optimal tests when a nuisance parameter is present only under the alternative. Econometrica 62, 1383–1414.

Bai, J. (1994). Least squares estimation of a shift in linear processes. Journal of Time Series Analysis 15, 453–472.

Bai, J. (1997). Estimation of a change point in multiple regression models. Review of Economics and Statistics 79, 551–563.

Bai, J. (1998). A note on spurious break. Econometric Theory 14, 663–669.

Bai, J., R. L. Lumsdaine, and J. H. Stock (1998). Testing for and dating common breaks in multivariate time series. Review of Economic Studies 65, 395–432.

Bai, J. and P. Perron (1998). Estimating and testing linear models with multiple structural changes. Econometrica 66, 47–78.

Bai, J. and P. Perron (2003). Computation and analysis of multiple structural change models. Journal of Applied Econometrics 18, 1–22.

Bai, J. and P. Perron (2006). Multiple structural change models: A simulation analysis. In: Corbae, D., Durlauf, S. N., and Hansen, B. E. (Eds.), Econometric Theory and Practice: Frontiers of Analysis and Applied Research, 212–237. Cambridge University Press.

Baragona, R., F. Battaglia, and D. Cucina (2013). Empirical likelihood for break detection in time series. Electronic Journal of Statistics 7, 3089–3123.

Cai, Z. and Y. Wang (2014). Testing predictive regression models with nonstationary regressors. Journal of Econometrics 178, 4–14. Corrigendum, Journal of Econometrics 181, 194.

Cai, Z., Y. Wang, and Y. Wang (2015). Testing instability in a predictive regression model with nonstationary regressors. Econometric Theory 31, 953–980.

Campbell, J. Y. (2008). Viewpoint: Estimating the equity premium. Canadian Journal of Economics 41, 1–21.

Campbell, J. Y., A. W. Lo, and A. C. MacKinlay (1997). The Econometrics of Financial Markets. Princeton University Press.

Campbell, J. Y. and M. Yogo (2006). Efficient tests of stock return predictability. Journal of Financial Economics 81, 27–60.

Cavanagh, C. L., G. Elliott, and J. H. Stock (1995). Inference in models with nearly integrated regressors. Econometric Theory 11, 1131–1147.

Cenesizoglu, T. and A. Timmermann (2008). Is the distribution of stock returns predictable? Unpublished Working Paper. HEC Montreal and UCSD.

Chan, N. H., D. Li, and L. Peng (2012). Toward a unified interval estimation of autoregressions. Econometric Theory 28, 705–717.

Chan, N. H. and L. Peng (2005). Weighted least absolute deviations estimation for an AR(1) process with ARCH(1) errors. Biometrika 92, 477–484.

Chan, N. H. and C. Z. Wei (1987). Asymptotic inference for nearly nonstationary AR(1) processes. Annals of Statistics 15, 1050–1063.

Chang, S. Y. and P. Perron (2016). Inference on a structural break in trend with fractionally integrated errors. Journal of Time Series Analysis 37, 555–574.

Chen, W. W. and R. S. Deo (2009). Bias reduction and likelihood-based almost exactly sized hypothesis testing in predictive regressions using the restricted likelihood. Econometric Theory 25, 1143–1179.

Choi, Y., S. Jacewitz, and J. Y. Park (2016). A reexamination of stock return predictability. Journal of Econometrics 192, 168–189.

Chow, G. C. (1960). Tests of equality between sets of coefficients in two linear regressions. Econometrica 28, 591–605.

Clark, T. E. and K. D. West (2006). Using out-of-sample mean squared prediction errors to test the martingale difference hypothesis. Journal of Econometrics 135, 155–186.

Elliott, G. and U. K. Müller (2006). Efficient tests for general persistent time variation in regression coefficients. Review of Economic Studies 73, 907–940.

Elliott, G. and U. K. Müller (2007). Confidence sets for the date of a single break in linear time series regressions. Journal of Econometrics 141, 1196–1218.

Elliott, G. and J. H. Stock (1994). Inference in time series regression when the order of integration of a regressor is unknown. Econometric Theory 10, 672–700.

Gonzalo, J. and J.-Y. Pitarakis (2012). Regime-specific predictability in predictive regressions. Journal of Business & Economic Statistics 30, 229–241.

Gonzalo, J. and J.-Y. Pitarakis (2017). Inferring the predictability induced by a persistent regressor in a predictive threshold model. Journal of Business & Economic Statistics 35, 202–217.

Goyal, A. and I. Welch (2008). A comprehensive look at the empirical performance of equity premium prediction. Review of Financial Studies 21, 1455–1508.

Hall, P. and C. C. Heyde (1980). Martingale Limit Theory and Its Applications. Academic Press, New York.

Hjalmarsson, E. (2010). Predicting global stock returns. Journal of Financial and Quantitative Analysis 45, 49–80.

Hsu, Y.-C. and C.-M. Kuan (2008). Change-point estimation of nonstationary I(d) processes. Economics Letters 98, 115–121.

Jansson, M. and M. J. Moreira (2006). Optimal inference in regression models with nearly integrated regressors. Econometrica 74, 681–714.

Kuan, C.-M. and C.-C. Hsu (1998). Change-point estimation of fractionally integrated processes. Journal of Time Series Analysis 19, 693–708.

Kurozumi, E. and Y. Arai (2006). Efficient estimation and inference in cointegrating regressions with structural change. Journal of Time Series Analysis 28, 545–575.

Lee, J. H. (2016). Predictive quantile regression with persistent covariates: IVX-QR approach. Journal of Econometrics 192, 105–118.

Lettau, M. and S. Van Nieuwerburgh (2008). Reconciling the return predictability evidence. Review of Financial Studies 21, 1607–1652.

Lewellen, J. (2004). Predicting returns with financial ratios. Journal of Financial Economics 74, 209–235.

Li, D., N. H. Chan, and L. Peng (2014). Empirical likelihood test for causality of bivariate AR(1) processes. Econometric Theory 30, 357–371.

Ling, S. (2005). Self-weighted least absolute deviation estimation for infinite autoregressive models. Journal of the Royal Statistical Society: Series B 67, 381–393.

Liu, X., B. Yang, Z. Cai, and L. Peng (2016). A unified test for predictability of asset returns. Unpublished Working Paper, Department of Economics, University of Kansas.

Maynard, A., K. Shimotsu, and Y. Wang (2011). Inference in predictive quantile regressions. Unpublished Working Paper. University of Guelph.

Nunes, L. C., C.-M. Kuan, and P. Newbold (1995). Spurious break. Econometric Theory 11, 736–749.

Owen, A. B. (2001). Empirical Likelihood. Chapman & Hall, New York.

Paye, B. S. and A. Timmermann (2006). Instability of return prediction models. Journal of Empirical Finance 13, 274–315.

Perron, P. (1989). The great crash, the oil price shock, and the unit root hypothesis. Econometrica 57, 1361–1401.

Phillips, P. C. B. (1987). Towards a unified asymptotic theory for autoregression. Biometrika 74, 535–547.

Phillips, P. C. B. (2015). Halbert White Jr. memorial JFEC lecture: Pitfalls and possibilities in predictive regression. Journal of Financial Econometrics 13, 521–555.

Phillips, P. C. B. and J. H. Lee (2013). Predictive regression under various degrees of persistence and robust long-horizon regression. Journal of Econometrics 177, 250–264.

Phillips, P. C. B. and T. Magdalinos (2007). Limit theory for moderate deviations from a unit root. Journal of Econometrics 136, 115–130.

Phillips, P. C. B. and T. Magdalinos (2009). Econometric inference in the vicinity of unity. CoFiE Working Paper No. 7, Singapore Management University.

Rapach, D. E. and M. E. Wohar (2006). Structural breaks and predictive regression models of aggregate U.S. stock returns. Journal of Financial Econometrics 4, 238–274.

So, B. S. and D. W. Shin (1999). Cauchy estimators for autoregressive processes with applications to unit root tests and confidence intervals. Econometric Theory 15, 165–176.

Stambaugh, R. (1999). Predictive regressions. Journal of Financial Economics 54, 375–421.

Stock, J. H. and M. W. Watson (1996). Evidence on structural instability in macroeconomic time series relations. Journal of Business & Economic Statistics 14, 11–30.

Torous, W., R. Valkanov, and S. Yan (2005). On predicting stock returns with nearly integrated explanatory variables. Journal of Business 77, 937–966.

Viceira, L. M. (1997). Testing for structural change in the predictability of asset returns. Unpublished Working Paper. Harvard University.

Zhou, M. (2015). Empirical likelihood ratio for censored/truncated data. R package version 1.0, http://CRAN.R-project.org/package=emplik.

Zhu, F., Z. Cai, and L. Peng (2014). Predictive regressions for macroeconomic data. Annals of Applied Statistics 8, 577–594.

Appendix

Note that all limit statements are taken as T → ∞ whenever there is no confusion.

Proof of Proposition 1. For a proof, see Proposition 1 in Bai (1997). 
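Propositions 1 and 2 concern the global least squares estimate of the break date. As a rough illustration only (the helper below is our own sketch, with our own function and variable names, not the paper's implementation or its criterion (7) in every detail), the estimate can be computed by a grid search minimizing the full-sample sum of squared residuals, with separate OLS fits on the two subsamples:

```python
import numpy as np

def estimate_break_date(y, xlag, trim=0.15):
    """Grid-search least squares break-date estimate for
    y_t = (a1 + b1*x_{t-1})1{t<=T1} + (a2 + b2*x_{t-1})1{t>T1} + u_t.
    Returns the candidate date minimizing the total SSR (trimmed grid)."""
    T = len(y)
    lo, hi = int(trim * T), int((1.0 - trim) * T)
    best_ssr, best_k = np.inf, lo
    for k in range(lo, hi):
        ssr = 0.0
        for ys, xs in ((y[:k], xlag[:k]), (y[k:], xlag[k:])):
            X = np.column_stack([np.ones(len(xs)), xs])  # intercept + lagged x
            coef, *_ = np.linalg.lstsq(X, ys, rcond=None)
            ssr += float(np.sum((ys - X @ coef) ** 2))
        if ssr < best_ssr:
            best_ssr, best_k = ssr, k
    return best_k

# Toy data: a large slope change at t = 100 with an integrated predictor.
rng = np.random.default_rng(0)
T, k0 = 200, 100
x = np.cumsum(rng.standard_normal(T))
xlag = np.concatenate([[0.0], x[:-1]])
y = np.where(np.arange(T) < k0, 1.0, -1.0) * xlag + rng.standard_normal(T)
k_hat = estimate_break_date(y, xlag)
```

With a break this pronounced, the SSR minimizer typically lands near the true date; the trimming fraction and the handling of the intercept are our assumptions for the sketch.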

Proof of Proposition 2. For a proof, see Proposition 2 in Kurozumi and Arai (2006). □

Proof of Theorem 1. In Theorem 1, the intercept term $\alpha$ in the predictive regression model is assumed to be known. Since this proof is simpler, the reader is referred to that of Theorem 3, where $\alpha$ is an unknown parameter. □

Proof of Theorem 2. In Theorem 2, the true break date $T_1^0$ is assumed to be known. The proof follows from that of Theorem 3 with the estimate $\hat{T}_1$ replaced by $T_1^0$, and hence is omitted. □

Proof of Theorem 3. We consider the following predictive regression model with a structural break at some unknown date $T_1^0$: for $t = 1, \dots, T$,
\[
y_t = (\alpha_1 + \beta_1 x_{t-1})\,\mathbf{1}_{\{t \le T_1^0\}} + (\alpha_2 + \beta_2 x_{t-1})\,\mathbf{1}_{\{t > T_1^0\}} + u_t, \qquad
x_t = \theta + \phi\, x_{t-1} + v_t.
\]

The estimate of the break date, $\hat{T}_1$, can be obtained by using a global least squares criterion, as explained in (7). Here, we consider the case in which $\hat{T}_1 < T_1^0$, that is, the estimated break date is located in the pre-$T_1^0$ subsample. By symmetry, the same arguments apply to the case in which $\hat{T}_1 > T_1^0$. Henceforth, we consider two subsamples: the pre-$\hat{T}_1$ subsample (Case 1) and the post-$\hat{T}_1$ subsample (Case 2). Define two sets as follows:

\[
V_0(\varepsilon_0) = \{T_1 : |T_1 - T_1^0| < \varepsilon_0\} \;\;\forall\, \varepsilon_0 > 0, \qquad
V_1(\varepsilon_1) = \{T_1 : |T_1 - T_1^0| < \varepsilon_1 T^{2v}\} \;\;\forall\, \varepsilon_1 > 0.
\]

From the results of Propositions 1 and 2, $\Pr(\hat{T}_1 \in V_0(\varepsilon_0)) \to 1$ when $x_t$ is stationary, and $\Pr(\hat{T}_1 \in V_1(\varepsilon_1)) \to 1$ when $x_t$ is integrated. Hence, it suffices to consider the behavior of the empirical likelihood (EL) methods for all $T_1 \in V_j(\varepsilon_j)$, $j = 0, 1$. Let $m_1 \equiv [T_1/2]$ and $m_2 \equiv [(T - T_1)/2]$, and let $\mathcal{F}_t$ denote the $\sigma$-field generated by $\{\ddot{u}_{t-s}, \ddot{v}_{t-s} \mid s \ge 0\}$. We establish the EL methods in each subsample below.

Case 1: For $t = 1, \dots, m_1$, after some algebra, we have

\[
x_t = \sum_{j=1}^{t} \phi^{t-j} v_j + \frac{1-\phi^{t}}{1-\phi}\,\theta + \phi^{t} x_0,
\]
and
\[
\begin{aligned}
x_{t+m_1} &= \sum_{j=1}^{t+m_1} \phi^{t+m_1-j} v_j + \frac{1-\phi^{t+m_1}}{1-\phi}\,\theta + \phi^{t+m_1} x_0 \\
&= \sum_{j=1}^{m_1} \phi^{t+m_1-j} v_j + \sum_{j=1}^{t} \phi^{t-j} v_{j+m_1} + \frac{1-\phi^{t}}{1-\phi}\,\theta + \frac{\phi^{t}-\phi^{t+m_1}}{1-\phi}\,\theta + \phi^{t} x_0 + (\phi^{t+m_1}-\phi^{t})\, x_0.
\end{aligned}
\]
Hence, we have
\[
\ddot{x}_t \equiv x_t - x_{t+m_1}
= \sum_{j=1}^{t} \phi^{t-j} \ddot{v}_j - \sum_{j=1}^{m_1} \phi^{t+m_1-j} v_j - \frac{\phi^{t}-\phi^{t+m_1}}{1-\phi}\,\theta - (\phi^{t+m_1}-\phi^{t})\, x_0
= w_{1,t} - w_{2,t},
\]
where $w_{1,t} \equiv \sum_{j=1}^{t} \phi^{t-j} \ddot{v}_j$, $w_{2,t} \equiv \sum_{j=1}^{m_1} \phi^{t+m_1-j} v_j + \frac{\phi^{t}-\phi^{t+m_1}}{1-\phi}\,\theta + (\phi^{t+m_1}-\phi^{t})\, x_0$, and $\ddot{v}_j = v_j - v_{j+m_1}$ for $j = 1, \dots, m_1$. Furthermore, we define $\ddot{y}_t = y_t - y_{t+m_1}$ and $\ddot{u}_t = u_t - u_{t+m_1}$, so that $\ddot{y}_t = \beta_1\, \ddot{x}_{t-1} + \ddot{u}_t$. When $|\phi| < 1$, it follows that

\[
m_1^{-1} \sum_{t=1}^{m_1} \frac{w_{1,t-1}^2}{1+w_{1,t-1}^2}
= m_1^{-1} \sum_{t=1}^{m_1} \frac{\bigl(\sum_{j=1}^{t} \phi^{t-j}\ddot{v}_j\bigr)^2}{1+\bigl(\sum_{j=1}^{t} \phi^{t-j}\ddot{v}_j\bigr)^2}
= \lim_{t\to\infty} E\!\left[\frac{\bigl(\sum_{j=1}^{t} \phi^{t-j}\ddot{v}_j\bigr)^2}{1+\bigl(\sum_{j=1}^{t} \phi^{t-j}\ddot{v}_j\bigr)^2}\right] + o_p(1)
\equiv \sigma_{0,1}^2 + o_p(1). \tag{22}
\]

When $\phi = 1$, we have
\[
|w_{1,t}| \xrightarrow{\;p\;} \infty, \qquad |w_{1,t}| = O_p(t^{1/2}), \qquad \text{and} \qquad \frac{|w_{1,t}|}{t^{1/2-\kappa}} \xrightarrow{\;p\;} \infty \tag{23}
\]
for any $\kappa > 0$ as $t \to \infty$. Hence,
\[
\frac{w_{1,t-1}^2}{1+w_{1,t-1}^2} \xrightarrow{\;p\;} 1 \quad \text{as } t \to \infty,
\]
that is,
\[
m_1^{-1} \sum_{t=1}^{m_1} \frac{w_{1,t-1}^2}{1+w_{1,t-1}^2} \xrightarrow{\;p\;} 1. \tag{24}
\]
Due to (22) and (24), we have

\[
m_1^{-1} \sum_{t=1}^{m_1} E\!\left[\frac{\ddot{u}_t^2\, w_{1,t-1}^2}{1+w_{1,t-1}^2} \,\Big|\, \mathcal{F}_{t-1}\right]
= m_1^{-1} \sum_{t=1}^{m_1} \frac{w_{1,t-1}^2}{1+w_{1,t-1}^2}\, E[\ddot{u}_t^2 \mid \mathcal{F}_{t-1}]
\xrightarrow{\;p\;}
\begin{cases}
\sigma_{0,1}^2\, \ddot{\Sigma}_u & \text{if } |\phi| < 1,\\
\ddot{\Sigma}_u & \text{if } \phi = 1,
\end{cases}
\]
since $E[\ddot{u}_t^2 \mid \mathcal{F}_{t-1}] \equiv \ddot{\Sigma}_u < \infty$. Moreover, for any $c > 0$,
\[
\begin{aligned}
m_1^{-1} \sum_{t=1}^{m_1} E\!\left[\frac{\ddot{u}_t^2\, w_{1,t-1}^2}{1+w_{1,t-1}^2}\,
\mathbf{1}_{\left\{\ddot{u}_t^2 w_{1,t-1}^2/(1+w_{1,t-1}^2) > c^2 m_1\right\}} \,\Big|\, \mathcal{F}_{t-1}\right]
&\le \frac{1}{(c\sqrt{m_1})^{q_2}}\, m_1^{-1} \sum_{t=1}^{m_1} E\!\left[\left|\frac{\ddot{u}_t\, w_{1,t-1}}{\sqrt{1+w_{1,t-1}^2}}\right|^{2+q_2} \Big|\, \mathcal{F}_{t-1}\right] \\
&= \frac{1}{(c\sqrt{m_1})^{q_2}}\, m_1^{-1} \sum_{t=1}^{m_1} \frac{|w_{1,t-1}|^{2+q_2}}{(1+w_{1,t-1}^2)^{(2+q_2)/2}}\, E[|\ddot{u}_t|^{2+q_2} \mid \mathcal{F}_{t-1}]
\xrightarrow{\;p\;} 0
\end{aligned}
\]
since $E[|\ddot{u}_t|^{2+q_2} \mid \mathcal{F}_{t-1}] < \ddot{M}$ for some $q_2 > 0$ and $\ddot{M} < \infty$. By Corollary 3.1 of Hall and Heyde (1980), we have
\[
m_1^{-1/2} \sum_{t=1}^{m_1} \frac{\ddot{u}_t\, w_{1,t-1}}{\sqrt{1+w_{1,t-1}^2}}
\xrightarrow{\;d\;}
\begin{cases}
\mathcal{N}(0,\, \sigma_{0,1}^2\, \ddot{\Sigma}_u) & \text{if } |\phi| < 1,\\
\mathcal{N}(0,\, \ddot{\Sigma}_u) & \text{if } \phi = 1.
\end{cases} \tag{25}
\]
On the other hand, it is easy to check that
\[
|\phi^{-t} w_{2,t-1}| =
\begin{cases}
O_p(1) & \text{if } |\phi| < 1,\\
O_p(m_1^{1/2}) & \text{if } \phi = 1,
\end{cases} \tag{26}
\]

and
\[
\frac{|w_{2,t-1}|}{t^{1/2-\kappa}} \xrightarrow{\;p\;} \infty \tag{27}
\]
for any $\kappa > 0$ as $t \le m_1$ goes to infinity. Using (23), (26), and (27), a Taylor series expansion shows that
\[
\begin{aligned}
\frac{1}{\sqrt{m_1}} \sum_{t=1}^{m_1} \left( \frac{\ddot{u}_t\, \ddot{x}_{t-1}}{\sqrt{1+\ddot{x}_{t-1}^2}} - \frac{\ddot{u}_t\, w_{1,t-1}}{\sqrt{1+w_{1,t-1}^2}} \right)
&= -\frac{1}{\sqrt{m_1}} \sum_{t=1}^{m_1} \ddot{u}_t\, \frac{w_{2,t-1}}{\{1+(a_{t-1}\ddot{x}_{t-1} + (1-a_{t-1})w_{1,t-1})^2\}^{3/2}} \\
&= -\frac{1}{\sqrt{m_1}} \sum_{t=1}^{m_1} \ddot{u}_t\, \frac{w_{2,t-1}}{\{1+(w_{1,t-1} - a_{t-1}w_{2,t-1})^2\}^{3/2}} \\
&= \begin{cases}
O_p\!\left(\frac{1}{\sqrt{m_1}} \sum_{t=1}^{m_1} \phi^{t}\, |\ddot{u}_t|\right) & \text{if } |\phi| < 1,\\[4pt]
O_p\!\left(\frac{1}{\sqrt{m_1}} \sum_{t=1}^{m_1} |\ddot{u}_t|\, \frac{m_1^{1/2+\kappa}}{t^{3(1/2-\kappa)}}\right) & \text{if } \phi = 1,
\end{cases} \\
&= o_p(1),
\end{aligned} \tag{28}
\]
where $a_{t-1} \in [0,1]$ may depend on $\ddot{x}_{t-1}$ and $w_{1,t-1}$, and $\kappa > 0$ is sufficiently small. From (25) and (28), we have
\[
m_1^{-1/2} \sum_{t=1}^{m_1} \ddot{H}_t(\beta_1^0)
\xrightarrow{\;d\;}
\begin{cases}
\mathcal{N}(0,\, \sigma_{0,1}^2\, \ddot{\Sigma}_u) & \text{if } |\phi| < 1,\\
\mathcal{N}(0,\, \ddot{\Sigma}_u) & \text{if } \phi = 1.
\end{cases}
\]
In a similar way, we can show that
\[
m_1^{-1} \sum_{t=1}^{m_1} [\ddot{H}_t(\beta_1^0)]^2
\xrightarrow{\;p\;}
\begin{cases}
\sigma_{0,1}^2\, \ddot{\Sigma}_u & \text{if } |\phi| < 1,\\
\ddot{\Sigma}_u & \text{if } \phi = 1.
\end{cases}
\]
The rest follows from the standard arguments in the proof of the empirical likelihood method (see Chapter 11 of Owen, 2001).
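The self-normalizing weight $w_{1,t-1}^2/(1+w_{1,t-1}^2)$ in (24) averages to one under a unit root. A quick Monte Carlo sketch (our own illustration, with $\ddot{v}_j$ built from i.i.d. standard normals as in the proof; it is not part of the paper's argument) makes this concrete:

```python
import numpy as np

# Build w_{1,t} = sum_{j<=t} vdd_j for phi = 1, where vdd_j = v_j - v_{j+m1}.
rng = np.random.default_rng(1)
m1 = 5000
v = rng.standard_normal(2 * m1)
vdd = v[:m1] - v[m1:]              # differenced innovations, as in the proof
w1 = np.cumsum(vdd)                # unit-root case: phi = 1
# Sample analogue of (24): should be close to 1 for large m1.
ratio = float(np.mean(w1 ** 2 / (1.0 + w1 ** 2)))
```

The early terms (where $w_{1,t}$ is still small) pull the average slightly below one, which vanishes as $m_1$ grows, mirroring the limit in (24).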

Case 2: As addressed previously, it suffices to consider all $T_1$ in $V_j(\varepsilon_j)$ for $j = 0, 1$. For $t = T_1+1, \dots, T_1+m_2$, we have
\[
x_t = \sum_{j=T_1+1}^{t} \phi^{t-j} v_j + \frac{1-\phi^{t-T_1}}{1-\phi}\,\theta + \phi^{t-T_1} x_{T_1},
\]
and
\[
\begin{aligned}
x_{t+m_2} &= \sum_{j=T_1+1}^{t+m_2} \phi^{t+m_2-j} v_j + \frac{1-\phi^{t-T_1+m_2}}{1-\phi}\,\theta + \phi^{t-T_1+m_2} x_{T_1} \\
&= \sum_{j=T_1+1}^{T_1+m_2} \phi^{t+m_2-j} v_j + \sum_{j=T_1+1}^{t} \phi^{t-j} v_{j+m_2}
+ \frac{1-\phi^{t-T_1}}{1-\phi}\,\theta + \frac{\phi^{t-T_1}-\phi^{t-T_1+m_2}}{1-\phi}\,\theta
+ \phi^{t-T_1} x_{T_1} + (\phi^{t-T_1+m_2}-\phi^{t-T_1})\, x_{T_1}.
\end{aligned}
\]
Hence, we have
\[
\ddot{x}_t = \sum_{j=T_1+1}^{t} \phi^{t-j} \ddot{v}_j - \sum_{j=T_1+1}^{T_1+m_2} \phi^{t+m_2-j} v_j
- \frac{\phi^{t-T_1}-\phi^{t-T_1+m_2}}{1-\phi}\,\theta - (\phi^{t-T_1+m_2}-\phi^{t-T_1})\, x_{T_1}
= w_{1,t} - w_{2,t},
\]
where $w_{1,t} \equiv \sum_{j=T_1+1}^{t} \phi^{t-j} \ddot{v}_j$, $w_{2,t} \equiv \sum_{j=T_1+1}^{T_1+m_2} \phi^{t+m_2-j} v_j + \frac{\phi^{t-T_1}-\phi^{t-T_1+m_2}}{1-\phi}\,\theta + (\phi^{t-T_1+m_2}-\phi^{t-T_1})\, x_{T_1}$, and $\ddot{v}_j = v_j - v_{j+m_2}$ for $j = T_1+1, \dots, T_1+m_2$. Moreover, we define $\ddot{y}_t = y_t - y_{t+m_2}$, $\ddot{x}_t = x_t - x_{t+m_2}$, and $\ddot{u}_t = u_t - u_{t+m_2}$. The post-$T_1$ subsample is divided into two parts around $T_1^0$. More precisely,
\[
y_t = \begin{cases}
\alpha_1 + \beta_1\, x_{t-1} + u_t, & \text{for } t = T_1+1, \dots, T_1^0,\\
\alpha_2 + \beta_2\, x_{t-1} + u_t, & \text{for } t = T_1^0+1, \dots, T,
\end{cases}
\]
and
\[
y_{t+m_2} = \alpha_2 + \beta_2\, x_{t+m_2-1} + u_{t+m_2}, \qquad t = T_1+1, \dots, T_1+m_2.
\]
On the set $V_j(\varepsilon_j)$, $j = 0, 1$, it holds that $(t+m_2) > T_1^0$ for $t = T_1+1, \dots, T_1+m_2$. Hence, $\ddot{y}_t = \beta_2\, \ddot{x}_{t-1} + \ddot{u}_t$, where
\[
\ddot{u}_t = \begin{cases}
(u_t - u_{t+m_2}) + (\alpha_1 - \alpha_2) + (\beta_1 - \beta_2)\, x_{t-1}, & \text{for } t = T_1+1, \dots, T_1^0,\\
(u_t - u_{t+m_2}), & \text{for } t = T_1^0+1, \dots, T_1+m_2.
\end{cases} \tag{29}
\]

Consider first the case in which |φ| < 1. We have

\[
m_2^{-1} \sum_{t=T_1+1}^{T_1+m_2} \frac{w_{1,t-1}^2}{1+w_{1,t-1}^2}
= m_2^{-1} \sum_{t=T_1+1}^{T_1+m_2} \frac{\bigl(\sum_{j=T_1+1}^{t} \phi^{t-j}\ddot{v}_j\bigr)^2}{1+\bigl(\sum_{j=T_1+1}^{t} \phi^{t-j}\ddot{v}_j\bigr)^2}
= \lim_{t\to\infty} E\!\left[\frac{\bigl(\sum_{j=T_1+1}^{t} \phi^{t-j}\ddot{v}_j\bigr)^2}{1+\bigl(\sum_{j=T_1+1}^{t} \phi^{t-j}\ddot{v}_j\bigr)^2}\right] + o_p(1)
\equiv \sigma_{0,2}^2 + o_p(1).
\]

On the other hand, when φ = 1, due to (23),

\[
m_2^{-1} \sum_{t=T_1+1}^{T_1+m_2} \frac{w_{1,t-1}^2}{1+w_{1,t-1}^2} \xrightarrow{\;p\;} 1.
\]

Since $\ddot{u}_t$ is defined differently around $T_1^0$, as in (29), we have
\[
\begin{aligned}
m_2^{-1} \sum_{t=T_1+1}^{T_1+m_2} E\!\left[\frac{\ddot{u}_t^2\, w_{1,t-1}^2}{1+w_{1,t-1}^2} \,\Big|\, \mathcal{F}_{t-1}\right]
&= m_2^{-1} \Biggl( \sum_{t=T_1+1}^{T_1+m_2} \frac{w_{1,t-1}^2}{1+w_{1,t-1}^2}\, E[\ddot{u}_t^2 \mid \mathcal{F}_{t-1}] \\
&\qquad + \sum_{t=T_1+1}^{T_1^0} [(\alpha_1-\alpha_2)+(\beta_1-\beta_2)\, x_{t-1}]^2\, \frac{w_{1,t-1}^2}{1+w_{1,t-1}^2} \Biggr) \\
&\xrightarrow{\;p\;}
\begin{cases}
\sigma_{0,2}^2\, \ddot{\Sigma}_u & \text{if } |\phi| < 1,\\
\ddot{\Sigma}_u & \text{if } \phi = 1,
\end{cases}
\end{aligned}
\]
where $T_1$ is in the set $V_j(\varepsilon_j)$, $j = 0, 1$, with probability approaching one, so that the number of terms in the summation $\sum_{t=T_1+1}^{T_1^0}$ is $O_p(\|\delta\|^{-2})$ or $O_p(T^{2v})$, $0 \le v < 1/2$, depending on whether $x_t$ is stationary or integrated (see Propositions 1 and 2). Regardless of whether $x_t$ is stationary or integrated, it is straightforward to show that $m_2^{-1} \sum_{t=T_1+1}^{T_1^0} [(\alpha_1-\alpha_2)+(\beta_1-\beta_2)\, x_{t-1}]^2\, w_{1,t-1}^2/(1+w_{1,t-1}^2) \xrightarrow{\;p\;} 0$ because $m_2 = [(T-T_1)/2]$.

Furthermore,

\[
\begin{aligned}
m_2^{-1} \sum_{t=T_1+1}^{T_1+m_2} E\!\left[\frac{\ddot{u}_t^2\, w_{1,t-1}^2}{1+w_{1,t-1}^2}\,
\mathbf{1}_{\left\{\ddot{u}_t^2 w_{1,t-1}^2/(1+w_{1,t-1}^2) > c^2 m_2\right\}} \,\Big|\, \mathcal{F}_{t-1}\right]
&\le \frac{1}{(c\sqrt{m_2})^{q_2}}\, m_2^{-1} \sum_{t=T_1+1}^{T_1+m_2} E\!\left[\left|\frac{\ddot{u}_t\, w_{1,t-1}}{\sqrt{1+w_{1,t-1}^2}}\right|^{2+q_2} \Big|\, \mathcal{F}_{t-1}\right] \\
&= \frac{1}{(c\sqrt{m_2})^{q_2}}\, m_2^{-1} \Biggl( \sum_{t=T_1+1}^{T_1+m_2} \frac{|w_{1,t-1}|^{2+q_2}}{(1+w_{1,t-1}^2)^{(2+q_2)/2}}\, E[|\ddot{u}_t|^{2+q_2} \mid \mathcal{F}_{t-1}] \\
&\qquad + \sum_{t=T_1+1}^{T_1^0} [(\alpha_1-\alpha_2)+(\beta_1-\beta_2)\, x_{t-1}]^2\, \frac{|w_{1,t-1}|^{2+q_2}}{(1+w_{1,t-1}^2)^{(2+q_2)/2}} \Biggr) \\
&\xrightarrow{\;p\;} 0.
\end{aligned}
\]

By Corollary 3.1 of Hall and Heyde (1980), we have

\[
m_2^{-1/2} \sum_{t=T_1+1}^{T_1+m_2} \frac{\ddot{u}_t\, w_{1,t-1}}{\sqrt{1+w_{1,t-1}^2}}
\xrightarrow{\;d\;}
\begin{cases}
\mathcal{N}(0,\, \sigma_{0,2}^2\, \ddot{\Sigma}_u) & \text{if } |\phi| < 1,\\
\mathcal{N}(0,\, \ddot{\Sigma}_u) & \text{if } \phi = 1.
\end{cases} \tag{30}
\]

Similar to Case 1, it is straightforward to show by a Taylor series expansion that
\[
\begin{aligned}
\frac{1}{\sqrt{m_2}} \sum_{t=T_1+1}^{T_1+m_2} \left( \frac{\ddot{u}_t\, \ddot{x}_{t-1}}{\sqrt{1+\ddot{x}_{t-1}^2}} - \frac{\ddot{u}_t\, w_{1,t-1}}{\sqrt{1+w_{1,t-1}^2}} \right)
&= -\frac{1}{\sqrt{m_2}} \sum_{t=T_1+1}^{T_1+m_2} \ddot{u}_t\, \frac{w_{2,t-1}}{\{1+(b_{t-1}\ddot{x}_{t-1} + (1-b_{t-1})w_{1,t-1})^2\}^{3/2}} \\
&= -\frac{1}{\sqrt{m_2}} \sum_{t=T_1+1}^{T_1+m_2} \ddot{u}_t\, \frac{w_{2,t-1}}{\{1+(w_{1,t-1} - b_{t-1}w_{2,t-1})^2\}^{3/2}} \\
&= \begin{cases}
O_p\!\left(\frac{1}{\sqrt{m_2}} \sum_{t=T_1^0+1}^{T_1^0+m_2} \phi^{t}\, |\ddot{u}_t|\right) & \text{if } |\phi| < 1,\\[4pt]
O_p\!\left(\frac{1}{\sqrt{m_2}} \sum_{t=T_1^0+1}^{T_1^0+m_2} |\ddot{u}_t|\, \frac{m_2^{1/2+\kappa}}{t^{3(1/2-\kappa)}}\right) & \text{if } \phi = 1,
\end{cases} \\
&= o_p(1),
\end{aligned} \tag{31}
\]
where $b_{t-1} \in [0,1]$ may depend on $\ddot{x}_{t-1}$ and $w_{1,t-1}$, and $\kappa > 0$ is sufficiently small. From (30) and (31),

\[
m_2^{-1/2} \sum_{t=T_1+1}^{T_1+m_2} \ddot{H}_t(\beta_2^0)
\xrightarrow{\;d\;}
\begin{cases}
\mathcal{N}(0,\, \sigma_{0,2}^2\, \ddot{\Sigma}_u) & \text{if } |\phi| < 1,\\
\mathcal{N}(0,\, \ddot{\Sigma}_u) & \text{if } \phi = 1,
\end{cases}
\]
and
\[
m_2^{-1} \sum_{t=T_1+1}^{T_1+m_2} [\ddot{H}_t(\beta_2^0)]^2
\xrightarrow{\;p\;}
\begin{cases}
\sigma_{0,2}^2\, \ddot{\Sigma}_u & \text{if } |\phi| < 1,\\
\ddot{\Sigma}_u & \text{if } \phi = 1.
\end{cases}
\]
This, together with the standard arguments in the proof of the empirical likelihood method, completes the proof. □
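The "standard arguments" invoked here profile out the Lagrange multiplier of the EL criterion (Owen, 2001, Chapter 11). For a scalar estimating function such as $\ddot{H}_t(\beta)$, the log EL ratio can be computed by Newton's method on the dual problem; the sketch below is a generic illustration for mean-zero moment conditions (our own code, not the paper's implementation):

```python
import numpy as np

def el_log_ratio(g, tol=1e-10, max_iter=200):
    """Return -2 log R: the empirical likelihood ratio statistic for
    H0: E[g_t] = 0, with implied weights p_t = 1/(n(1 + lam*g_t))
    at the optimal Lagrange multiplier lam."""
    g = np.asarray(g, dtype=float)
    lam = 0.0
    for _ in range(max_iter):
        denom = 1.0 + lam * g
        grad = np.sum(g / denom)            # d/dlam of sum log(1 + lam*g)
        hess = -np.sum((g / denom) ** 2)    # second derivative, always < 0
        new = lam - grad / hess             # Newton step on the concave dual
        while np.any(1.0 + new * g <= 0.0): # keep implied weights positive
            new = 0.5 * (lam + new)
        if abs(new - lam) < tol:
            lam = new
            break
        lam = new
    return 2.0 * np.sum(np.log(1.0 + lam * g))

rng = np.random.default_rng(2)
stat = el_log_ratio(rng.standard_normal(500))  # H0 true here
```

Under the null the statistic is asymptotically $\chi^2(1)$, which is how the limit results above translate into tests and confidence intervals.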

Proof of Theorem 4. Here we only prove the case in which the intercept term $\alpha$ is unknown and the predictor variable $x_t$ exhibits a slope change concurrent with a level shift. Moreover, we assume that the true break date $T_1^0$ is known, because the consistency of the estimate of the break fraction has not been established in the literature when $x_t$ has structural changes. Note again that it is unnecessary to estimate the date of a structural change, $T_b$, in the predictor variable. We consider the case where $T_1^0 \ne T_b$; otherwise, it is trivial to derive the asymptotic results. Without loss of generality, we assume that $T_b < m_1^0 \equiv [T_1^0/2] < T_1^0$, and define $\tilde{y}_t = y_t - y_{t+m_1^0}$, $\tilde{x}_t = x_t - x_{t+m_1^0}$, $\tilde{u}_t = u_t - u_{t+m_1^0}$, and $\tilde{v}_t = v_t - v_{t+m_1^0}$ for $t = 1, 2, \dots, m_1^0$. For $t = 1, 2, \dots, T_b$,
\[
x_t = \sum_{j=1}^{t} \phi_1^{t-j} v_j + \frac{1-\phi_1^{t}}{1-\phi_1}\,\theta_1 + \phi_1^{t} x_0,
\]
and
\[
\begin{aligned}
x_{t+m_1^0} &= \sum_{j=T_b+1}^{t+m_1^0} \phi_b^{t+m_1^0-j} v_j + \frac{1-\phi_b^{t+m_1^0-T_b}}{1-\phi_b}\,\theta_b + \phi_b^{t+m_1^0-T_b} x_{T_b} \\
&= \sum_{j=T_b+1}^{m_1^0} \phi_b^{t+m_1^0-j} v_j + \sum_{j=1}^{t} \phi_b^{t-j} v_{j+m_1^0} + \phi_b^{t+m_1^0-T_b} \sum_{j=1}^{T_b} \phi_1^{T_b-j} v_j \\
&\qquad + \phi_b^{t+m_1^0-T_b}\, \frac{1-\phi_1^{T_b}}{1-\phi_1}\,\theta_1 + \frac{1-\phi_b^{t+m_1^0-T_b}}{1-\phi_b}\,\theta_b + \phi_b^{t+m_1^0-T_b}\, \phi_1^{T_b} x_0.
\end{aligned}
\]
Hence, we have
\[
\begin{aligned}
\tilde{x}_t \equiv x_t - x_{t+m_1^0}
&= \sum_{j=1}^{t} \phi_1^{t-j} \tilde{v}_j + \sum_{j=1}^{t} (\phi_1^{t-j}-\phi_b^{t-j})\, v_{j+m_1^0}
+ \frac{1-\phi_1^{t}}{1-\phi_1}\,\theta_1 - \phi_b^{t+m_1^0-T_b}\, \frac{1-\phi_1^{T_b}}{1-\phi_1}\,\theta_1 + \phi_1^{t} x_0 \\
&\qquad - \phi_b^{t+m_1^0-T_b}\, \phi_1^{T_b} x_0 - \phi_b^{t+m_1^0-T_b} \sum_{j=1}^{T_b} \phi_1^{T_b-j} v_j - \sum_{j=T_b+1}^{m_1^0} \phi_b^{t+m_1^0-j} v_j - \frac{1-\phi_b^{t+m_1^0-T_b}}{1-\phi_b}\,\theta_b \\
&= w_{1,t} - w_{2,t},
\end{aligned}
\]
where $w_{1,t} \equiv \sum_{j=1}^{t} \phi_1^{t-j} \tilde{v}_j$ and
\[
w_{2,t} \equiv \sum_{j=1}^{t} (\phi_b^{t-j}-\phi_1^{t-j})\, v_{j+m_1^0} - \frac{1-\phi_1^{t}}{1-\phi_1}\,\theta_1 + \phi_b^{t+m_1^0-T_b}\, \frac{1-\phi_1^{T_b}}{1-\phi_1}\,\theta_1 - \phi_1^{t} x_0 + \phi_b^{t+m_1^0-T_b}\, \phi_1^{T_b} x_0 + \phi_b^{t+m_1^0-T_b} \sum_{j=1}^{T_b} \phi_1^{T_b-j} v_j + \sum_{j=T_b+1}^{m_1^0} \phi_b^{t+m_1^0-j} v_j + \frac{1-\phi_b^{t+m_1^0-T_b}}{1-\phi_b}\,\theta_b.
\]
On the other hand, for $t = T_b+1, \dots, m_1^0$,

\[
x_t = \sum_{j=T_b+1}^{t} \phi_b^{t-j} v_j + \frac{1-\phi_b^{t-T_b}}{1-\phi_b}\,\theta_b + \phi_b^{t-T_b} x_{T_b},
\]
and
\[
x_{t+m_1^0} = \sum_{j=T_b+1}^{m_1^0} \phi_b^{t+m_1^0-j} v_j + \sum_{j=1}^{t} \phi_b^{t-j} v_{j+m_1^0} + \frac{1-\phi_b^{t+m_1^0-T_b}}{1-\phi_b}\,\theta_b + \phi_b^{t+m_1^0-T_b} x_{T_b},
\]
where $x_{T_b} = \frac{1-\phi_1^{T_b}}{1-\phi_1}\,\theta_1 + \sum_{j=1}^{T_b} \phi_1^{T_b-j} v_j + \phi_1^{T_b} x_0$. Then, we have
\[
\tilde{x}_t = \sum_{j=T_b+1}^{t} \phi_b^{t-j} \tilde{v}_j - \sum_{j=1}^{T_b} \phi_b^{t-j} v_{j+m_1^0} - \sum_{j=T_b+1}^{m_1^0} \phi_b^{t+m_1^0-j} v_j - \frac{\phi_b^{t-T_b}-\phi_b^{t+m_1^0-T_b}}{1-\phi_b}\,\theta_b - (\phi_b^{t+m_1^0-T_b}-\phi_b^{t-T_b})\, x_{T_b}
= w_{1,t} - w_{2,t},
\]
where $w_{1,t} \equiv \sum_{j=T_b+1}^{t} \phi_b^{t-j} \tilde{v}_j$ and $w_{2,t} \equiv \sum_{j=1}^{T_b} \phi_b^{t-j} v_{j+m_1^0} + \sum_{j=T_b+1}^{m_1^0} \phi_b^{t+m_1^0-j} v_j + \frac{\phi_b^{t-T_b}-\phi_b^{t+m_1^0-T_b}}{1-\phi_b}\,\theta_b + (\phi_b^{t+m_1^0-T_b}-\phi_b^{t-T_b})\, x_{T_b}$. In the $T_1^0$-subsample, $\tilde{y}_t = \beta_1\, \tilde{x}_{t-1} + \tilde{u}_t$ for $t = 1, \dots, T_1^0$. From the previous arguments, it follows that

\[
(m_1^0)^{-1} \sum_{t=1}^{m_1^0} \frac{w_{1,t-1}^2}{1+w_{1,t-1}^2}
= (m_1^0)^{-1} \left( \sum_{t=1}^{T_b} \frac{\bigl(\sum_{j=1}^{t} \phi_1^{t-j}\tilde{v}_j\bigr)^2}{1+\bigl(\sum_{j=1}^{t} \phi_1^{t-j}\tilde{v}_j\bigr)^2}
+ \sum_{t=T_b+1}^{m_1^0} \frac{\bigl(\sum_{j=T_b+1}^{t} \phi_b^{t-j}\tilde{v}_j\bigr)^2}{1+\bigl(\sum_{j=T_b+1}^{t} \phi_b^{t-j}\tilde{v}_j\bigr)^2} \right)
\xrightarrow{\;p\;} \xi,
\]
where the constant $\xi$ depends on the values of $\phi_1$ and $\phi_b$ (see (22) and (24)). Hence, we have

\[
(m_1^0)^{-1} \sum_{t=1}^{m_1^0} E\!\left[\frac{\tilde{u}_t^2\, w_{1,t-1}^2}{1+w_{1,t-1}^2} \,\Big|\, \mathcal{F}_{t-1}\right]
= (m_1^0)^{-1} \sum_{t=1}^{m_1^0} \frac{w_{1,t-1}^2}{1+w_{1,t-1}^2}\, E[\tilde{u}_t^2 \mid \mathcal{F}_{t-1}]
\xrightarrow{\;p\;} \xi\, \tilde{\Sigma}_u
\]
since $E[\tilde{u}_t^2 \mid \mathcal{F}_{t-1}] \equiv \tilde{\Sigma}_u < \infty$. Moreover, for any $c > 0$,

\[
\begin{aligned}
(m_1^0)^{-1} \sum_{t=1}^{m_1^0} E\!\left[\frac{\tilde{u}_t^2\, w_{1,t-1}^2}{1+w_{1,t-1}^2}\,
\mathbf{1}_{\left\{\tilde{u}_t^2 w_{1,t-1}^2/(1+w_{1,t-1}^2) > c^2 m_1^0\right\}} \,\Big|\, \mathcal{F}_{t-1}\right]
&\le \frac{1}{(c\sqrt{m_1^0})^{q_2}}\, (m_1^0)^{-1} \sum_{t=1}^{m_1^0} E\!\left[\left|\frac{\tilde{u}_t\, w_{1,t-1}}{\sqrt{1+w_{1,t-1}^2}}\right|^{2+q_2} \Big|\, \mathcal{F}_{t-1}\right] \\
&= \frac{E|\tilde{u}_1|^{2+q_2}}{(c\sqrt{m_1^0})^{q_2}}\, (m_1^0)^{-1} \sum_{t=1}^{m_1^0} \frac{|w_{1,t-1}|^{2+q_2}}{(1+w_{1,t-1}^2)^{(2+q_2)/2}}
\xrightarrow{\;p\;} 0.
\end{aligned}
\]
By Corollary 3.1 of Hall and Heyde (1980), we have

\[
(m_1^0)^{-1/2} \sum_{t=1}^{m_1^0} \frac{\tilde{u}_t\, w_{1,t-1}}{\sqrt{1+w_{1,t-1}^2}}
\xrightarrow{\;d\;} \mathcal{N}(0,\, \xi\, \tilde{\Sigma}_u). \tag{32}
\]

It also follows from a Taylor series expansion that

\[
\begin{aligned}
\frac{1}{\sqrt{m_1^0}} \sum_{t=1}^{m_1^0} \left( \frac{\tilde{u}_t\, \tilde{x}_{t-1}}{\sqrt{1+\tilde{x}_{t-1}^2}} - \frac{\tilde{u}_t\, w_{1,t-1}}{\sqrt{1+w_{1,t-1}^2}} \right)
&= -\frac{1}{\sqrt{m_1^0}} \sum_{t=1}^{m_1^0} \tilde{u}_t\, \frac{w_{2,t-1}}{\{1+(a_{t-1}\tilde{x}_{t-1} + (1-a_{t-1})w_{1,t-1})^2\}^{3/2}} \\
&= -\frac{1}{\sqrt{m_1^0}} \sum_{t=1}^{m_1^0} \tilde{u}_t\, \frac{w_{2,t-1}}{\{1+(w_{1,t-1} - a_{t-1}w_{2,t-1})^2\}^{3/2}} \\
&= o_p(1),
\end{aligned} \tag{33}
\]
where $a_{t-1} \in [0,1]$ may depend on $\tilde{x}_{t-1}$ and $w_{1,t-1}$. From (32) and (33), we have

\[
(m_1^0)^{-1/2} \sum_{t=1}^{m_1^0} \tilde{H}_t(\beta_1^0) \xrightarrow{\;d\;} \mathcal{N}(0,\, \xi\, \tilde{\Sigma}_u). \tag{34}
\]

In a similar way, we can show that

\[
(m_1^0)^{-1} \sum_{t=1}^{m_1^0} [\tilde{H}_t(\beta_1^0)]^2 \xrightarrow{\;p\;} \xi\, \tilde{\Sigma}_u. \tag{35}
\]
The rest follows from the standard arguments in the proof of the empirical likelihood method.

For completeness, we consider the case in which $m_1^0 < T_b < T_1^0$. For $t = 1, 2, \dots, T_b - m_1^0$,
\[
x_t = \sum_{j=1}^{t} \phi_1^{t-j} v_j + \frac{1-\phi_1^{t}}{1-\phi_1}\,\theta_1 + \phi_1^{t} x_0,
\]
and
\[
\begin{aligned}
x_{t+m_1^0} &= \sum_{j=1}^{t+m_1^0} \phi_1^{t+m_1^0-j} v_j + \frac{1-\phi_1^{t+m_1^0}}{1-\phi_1}\,\theta_1 + \phi_1^{t+m_1^0} x_0 \\
&= \sum_{j=1}^{m_1^0} \phi_1^{t+m_1^0-j} v_j + \sum_{j=1}^{t} \phi_1^{t-j} v_{j+m_1^0} + \frac{1-\phi_1^{t}}{1-\phi_1}\,\theta_1 + \frac{\phi_1^{t}-\phi_1^{t+m_1^0}}{1-\phi_1}\,\theta_1 + \phi_1^{t} x_0 + (\phi_1^{t+m_1^0}-\phi_1^{t})\, x_0.
\end{aligned}
\]
Hence, we have
\[
\tilde{x}_t = \sum_{j=1}^{t} \phi_1^{t-j} \tilde{v}_j - \sum_{j=1}^{m_1^0} \phi_1^{t+m_1^0-j} v_j - \frac{\phi_1^{t}-\phi_1^{t+m_1^0}}{1-\phi_1}\,\theta_1 - (\phi_1^{t+m_1^0}-\phi_1^{t})\, x_0 = w_{1,t} - w_{2,t},
\]
where $w_{1,t} \equiv \sum_{j=1}^{t} \phi_1^{t-j} \tilde{v}_j$ and $w_{2,t} \equiv \sum_{j=1}^{m_1^0} \phi_1^{t+m_1^0-j} v_j + \frac{\phi_1^{t}-\phi_1^{t+m_1^0}}{1-\phi_1}\,\theta_1 + (\phi_1^{t+m_1^0}-\phi_1^{t})\, x_0$.

On the other hand, for $t = T_b - m_1^0 + 1, \dots, m_1^0$, $x_t = \sum_{j=1}^{t} \phi_1^{t-j} v_j + \frac{1-\phi_1^{t}}{1-\phi_1}\,\theta_1 + \phi_1^{t} x_0$, and
\[
x_{t+m_1^0} = \sum_{j=T_b+1}^{m_1^0} \phi_b^{t+m_1^0-j} v_j + \sum_{j=1}^{t} (\phi_b^{t-j}-\phi_1^{t-j})\, v_{j+m_1^0} + \sum_{j=1}^{t} \phi_1^{t-j} v_{j+m_1^0} + \frac{1-\phi_b^{t+m_1^0-T_b}}{1-\phi_b}\,\theta_b + \phi_b^{t+m_1^0-T_b} x_{T_b},
\]
where $x_{T_b} = \frac{1-\phi_1^{T_b}}{1-\phi_1}\,\theta_1 + \sum_{j=1}^{T_b} \phi_1^{T_b-j} v_j + \phi_1^{T_b} x_0$. Then, we have
\[
\begin{aligned}
\tilde{x}_t &= \sum_{j=1}^{t} \phi_1^{t-j} \tilde{v}_j - \sum_{j=T_b+1}^{m_1^0} \phi_b^{t+m_1^0-j} v_j - \sum_{j=1}^{t} (\phi_b^{t-j}-\phi_1^{t-j})\, v_{j+m_1^0} \\
&\qquad + \frac{1-\phi_1^{t}}{1-\phi_1}\,\theta_1 + \phi_1^{t} x_0 - \frac{1-\phi_b^{t+m_1^0-T_b}}{1-\phi_b}\,\theta_b - \phi_b^{t+m_1^0-T_b} x_{T_b} \\
&= w_{1,t} - w_{2,t},
\end{aligned}
\]
where $w_{1,t} \equiv \sum_{j=1}^{t} \phi_1^{t-j} \tilde{v}_j$ and $w_{2,t} \equiv \sum_{j=T_b+1}^{m_1^0} \phi_b^{t+m_1^0-j} v_j + \sum_{j=1}^{t} (\phi_b^{t-j}-\phi_1^{t-j})\, v_{j+m_1^0} - \frac{1-\phi_1^{t}}{1-\phi_1}\,\theta_1 - \phi_1^{t} x_0 + \frac{1-\phi_b^{t+m_1^0-T_b}}{1-\phi_b}\,\theta_b + \phi_b^{t+m_1^0-T_b} x_{T_b}$. Although both $w_{1,t}$ and $w_{2,t}$ are newly defined, it is straightforward to show that the asymptotic results (34) and (35) still hold.

Lastly, given the symmetric nature of the problem, it is straightforward to show that the aforementioned results remain valid for the case in which the true break date $T_1^0$ is known and $T_1^0 < T_b$; hence, the proof is omitted. □

Table 1: Finite sample sizes with normally distributed errors

(a) No structural change (b) Change in mean only (c) Change in mean & slope λ0 = 0.5 λ0 = 0.7 λ0 = 0.5 λ0 = 0.7 λ0 = 0.5 λ0 = 0.7 EL1 EL2 EL1 EL2 EL1 EL2 EL1 EL2 EL1 EL2 EL1 EL2

T = 100 (0.9, 0) 0.066 0.074 0.081 0.100 0.050 0.067 0.058 0.083 0.052 0.067 0.058 0.084 (0.99, 0) 0.057 0.068 0.065 0.093 0.055 0.058 0.059 0.079 0.056 0.058 0.064 0.079 (φ, ρ) (1, 0) 0.056 0.069 0.067 0.088 0.057 0.065 0.066 0.076 0.058 0.065 0.058 0.078 (0.9, -0.95) 0.055 0.070 0.074 0.092 0.056 0.072 0.060 0.069 0.056 0.071 0.060 0.070 (0.99, -0.95) 0.060 0.076 0.069 0.109 0.057 0.068 0.066 0.078 0.056 0.068 0.067 0.078 (1, -0.95) 0.059 0.083 0.068 0.111 0.051 0.059 0.056 0.079 0.052 0.059 0.056 0.079 T = 250 (0.9, 0) 0.059 0.064 0.065 0.070 0.052 0.053 0.050 0.060 0.048 0.058 0.056 0.059 (0.99, 0) 0.050 0.051 0.060 0.060 0.053 0.054 0.052 0.058 0.056 0.051 0.053 0.055 (φ, ρ) (1, 0) 0.048 0.054 0.050 0.061 0.053 0.054 0.053 0.054 0.051 0.052 0.048 0.050 (0.9, -0.95) 0.051 0.055 0.054 0.061 0.053 0.053 0.054 0.054 0.056 0.058 0.054 0.051 (0.99, -0.95) 0.053 0.050 0.056 0.075 0.049 0.053 0.056 0.069 0.047 0.060 0.051 0.064 (1, -0.95) 0.056 0.068 0.057 0.079 0.053 0.052 0.053 0.060 0.052 0.055 0.052 0.055 T = 500 (0.9, 0) 0.053 0.058 0.067 0.062 0.048 0.055 0.056 0.056 0.052 0.047 0.052 0.054 (0.99, 0) 0.049 0.053 0.058 0.058 0.052 0.052 0.050 0.049 0.050 0.049 0.052 0.052 (φ, ρ) (1, 0) 0.046 0.058 0.047 0.053 0.051 0.057 0.052 0.053 0.048 0.043 0 051 0.052 (0.9, -0.95) 0.045 0.053 0.046 0.051 0.051 0.054 0.047 0.058 0.051 0.052 0.051 0.048 (0.99, -0.95) 0.053 0.037 0.054 0.049 0.047 0.054 0.048 0.059 0.050 0.062 0.048 0.065 (1, -0.95) 0.058 0.064 0.057 0.069 0.051 0.053 0.052 0.052 0.044 0.049 0.044 0.053 T = 1000 (0.9, 0) 0.055 0.049 0.056 0.052 0.048 0.049 0.052 0.049 0.050 0.053 0.046 0.053 (0.99, 0) 0.051 0.053 0.056 0.053 0.052 0.049 0.050 0.058 0.049 0.050 0.047 0.056 (φ, ρ) (1, 0) 0.051 0.050 0.047 0.053 0.052 0.051 0.056 0.053 0.050 0.053 0.049 0.052 (0.9, -0.95) 0.049 0.056 0.051 0.057 0.054 0.052 0.053 0.057 0.050 0.053 0.055 0.048 (0.99, -0.95) 0.050 0.041 0.049 0.040 0.050 0.049 0.050 0.038 0.048 0.056 0 048 0.043 (1, -0.95) 0.051 0.053 0.050 0.062 
0.047 0.052 0.052 0.054 0.056 0.052 0.053 0.048

Note: The model is $y_t = \beta_1 x_{t-1}\mathbf{1}_{\{t \le [\lambda_0 T]\}} + u_t$, where (a) $x_t = \phi\, x_{t-1} + v_t$, (b) $x_t = -0.8\cdot\mathbf{1}_{\{t > [0.5T]\}} + \phi\, x_{t-1} + v_t$, (c) $x_t = 0.5\, x_{t-1}\mathbf{1}_{\{t \le [0.5T]\}} + (-0.8 + \phi\, x_{t-1})\cdot\mathbf{1}_{\{t > [0.5T]\}} + v_t$, and $\beta_1 = d\, T^{-1/2}$. The magnitude of the break is set to $d = 8$; hence $\beta_1 = 0.8, 0.51, 0.36, 0.25$ for $T = 100, 250, 500, 1000$, respectively. For the noise components, $v_t \sim$ i.i.d. $\mathcal{N}(0,1)$ and $u_t = \rho\, v_t + \sqrt{1-\rho^2}\,\eta_t$ with $\eta_t \sim$ i.i.d. $\mathcal{N}(0,1)$. Rejection frequencies are reported for testing $H_0: \beta_2 = 0$ against $H_1: \beta_2 \ne 0$ at the 5% level for the proposed empirical likelihood test in (9) with known $\alpha_1$ and $\alpha_2$ (EL1) and for the proposed empirical likelihood test in (13) with unknown $\alpha_1$ and $\alpha_2$ (EL2).
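For replication purposes, design (a) of the note can be simulated directly. The sketch below is our own code (function and variable names are not from the paper) drawing one sample from the no-structural-change design with a local-to-zero slope $\beta_1 = d/\sqrt{T}$ before the break:

```python
import numpy as np

def simulate_design_a(T=250, phi=0.99, rho=-0.95, lam0=0.5, d=8.0, seed=0):
    """One draw from design (a): y_t = b1*x_{t-1}*1{t <= [lam0*T]} + u_t,
    x_t = phi*x_{t-1} + v_t, u_t = rho*v_t + sqrt(1-rho^2)*eta_t."""
    rng = np.random.default_rng(seed)
    b1 = d / np.sqrt(T)
    v = rng.standard_normal(T)
    eta = rng.standard_normal(T)
    u = rho * v + np.sqrt(1.0 - rho ** 2) * eta   # correlated innovations
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = phi * x[t - 1] + v[t]              # (nearly) integrated predictor
    xlag = np.concatenate([[0.0], x[:-1]])
    ind = (np.arange(1, T + 1) <= int(lam0 * T)).astype(float)
    y = b1 * xlag * ind + u
    return y, xlag

y, xlag = simulate_design_a()
```

Feeding such draws through the EL tests and averaging rejections over replications reproduces the size experiments; the zero initialization of $x_0$ is our assumption.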

Table 2: Finite sample sizes with t(3) errors

(a) No structural change (b) Change in mean only (c) Change in mean & slope λ0 = 0.5 λ0 = 0.7 λ0 = 0.5 λ0 = 0.7 λ0 = 0.5 λ0 = 0.7 EL1 EL2 EL1 EL2 EL1 EL2 EL1 EL2 EL1 EL2 EL1 EL2

T = 100 (0.9, 0) 0.083 0.087 0.111 0.126 0.073 0.080 0.073 0.068 0.073 0.074 0.082 0.099 (0.99, 0) 0.073 0.086 0.085 0.110 0.079 0.080 0.077 0.084 0.079 0.077 0.088 0.100 (φ, ρ) (1, 0) 0.072 0.076 0.082 0.110 0.076 0.078 0.063 0.070 0.074 0.078 0.084 0.099 (0.9, -0.95) 0.075 0.077 0.104 0.116 0.073 0.068 0.088 0.070 0.074 0.069 0.088 0.070 (0.99, -0.95) 0.080 0.094 0.093 0.124 0.077 0.084 0.078 0.104 0.077 0.083 0.078 0.108 (1, -0.95) 0.073 0.098 0.094 0.131 0.063 0.070 0.073 0.103 0.065 0.070 0.071 0.103 T = 250 (0.9, 0) 0.073 0.069 0.082 0.082 0.061 0.070 0.066 0.076 0.061 0.070 0.065 0.064 (0.99, 0) 0.058 0.065 0.061 0.076 0.063 0.062 0.065 0.072 0.061 0.062 0.055 0.072 (φ, ρ) (1, 0) 0.060 0.062 0.066 0.072 0.062 0.067 0.064 0.074 0.061 0.067 0.059 0.070 (0.9, -0.95) 0.068 0.069 0.073 0.076 0.061 0.065 0.065 0.066 0.065 0.064 0.065 0.066 (0.99, -0.95) 0.062 0.060 0.065 0.076 0.055 0.072 0.056 0.070 0.055 0.072 0.056 0.069 (1, -0.95) 0.062 0.074 0.070 0.088 0.060 0.070 0.061 0.073 0.059 0.070 0.062 0.073 T = 500 (0.9, 0) 0.064 0.059 0.071 0.070 0.058 0.060 0.061 0.061 0.059 0.061 0.061 0.061 (0.99, 0) 0.057 0.059 0.059 0.062 0.062 0.062 0.066 0.069 0.060 0.063 0.066 0.067 (φ, ρ) (1, 0) 0.061 0.054 0.062 0.073 0.056 0.062 0.059 0.063 0.056 0.062 0.058 0.063 (0.9, -0.95) 0.062 0.056 0.062 0.064 0.058 0.057 0.062 0.072 0.059 0.056 0.062 0.072 (0.99, -0.95) 0.055 0.045 0.061 0.054 0.053 0.063 0.058 0.062 0.050 0.065 0.058 0.062 (1, -0.95) 0.059 0.062 0.065 0.072 0.054 0.057 0.056 0.065 0.062 0.057 0.056 0.065 T = 1000 (0.9, 0) 0.066 0.065 0.064 0.063 0.056 0.055 0.057 0.055 0.056 0.054 0.057 0.055 (0.99, 0) 0.054 0.055 0.057 0.061 0.058 0.055 0.057 0.060 0.058 0.056 0.057 0.060 (φ, ρ) (1, 0) 0.051 0.054 0.057 0.067 0.052 0.059 0.053 0.055 0.051 0.059 0.054 0.055 (0.9, -0.95) 0.055 0.056 0.059 0.058 0.052 0.056 0.056 0.057 0.052 0.055 0.056 0.057 (0.99, -0.95) 0.053 0.037 0.057 0.047 0.059 0.057 0.059 0.045 0.059 0.058 0.059 0.049 (1, -0.95) 0.054 0.061 0.058 0.063 
0.055 0.057 0.051 0.056 0.054 0.057 0.051 0.056

Note: For the noise components, $v_t \sim$ i.i.d. $t(3)$ and $u_t = \rho\, v_t + (1-\rho^2)^{1/2}\eta_t$ with $\eta_t \sim$ i.i.d. $t(3)$. See notes to Table 1.

Table 3: Finite sample sizes with GARCH(1,1) errors

(a) No structural change (b) Change in mean only (c) Change in mean & slope λ0 = 0.5 λ0 = 0.7 λ0 = 0.5 λ0 = 0.7 λ0 = 0.5 λ0 = 0.7 EL1 EL2 EL1 EL2 EL1 EL2 EL1 EL2 EL1 EL2 EL1 EL2

T = 100 (0.9, 0) 0.072 0.074 0.084 0.108 0.054 0.065 0.059 0.079 0.054 0.064 0.059 0.080 (0.99, 0) 0.055 0.071 0.065 0.099 0.053 0.057 0.058 0.075 0.055 0.057 0.058 0.073 (φ, ρ) (1, 0) 0.058 0.070 0.065 0.086 0.059 0.069 0.067 0.077 0.057 0.069 0.064 0.077 (0.9, -0.95) 0.084 0.099 0.140 0.162 0.058 0.070 0.060 0.076 0.057 0.071 0.060 0.077 (0.99, -0.95) 0.063 0.083 0.087 0.130 0.053 0.076 0.064 0.076 0.055 0.063 0.063 0.076 (1, -0.95) 0.064 0.086 0.073 0.111 0.053 0.067 0.057 0.080 0.053 0.067 0.057 0.080 T = 250 (0.9, 0) 0.058 0.058 0.063 0.074 0.056 0.075 0.053 0.059 0.051 0.054 0.053 0.060 (0.99, 0) 0.050 0.052 0.059 0.058 0.056 0.062 0.053 0.056 0.055 0.055 0.053 0.057 (φ, ρ) (1, 0) 0.048 0.056 0.052 0.064 0.050 0.057 0.052 0.050 0.052 0.055 0.052 0.050 (0.9, -0.95) 0.075 0.071 0.101 0.103 0.057 0.077 0.051 0.062 0.053 0.055 0.051 0.058 (0.99, -0.95) 0.058 0.057 0.061 0.075 0.052 0.062 0.049 0.063 0.050 0.056 0.050 0.062 (1, -0.95) 0.050 0.063 0.051 0.073 0.049 0.061 0.053 0.056 0.052 0.056 0.053 0.056 T = 500 (0.9, 0) 0.055 0.055 0.063 0.059 0.047 0.072 0.054 0.057 0.047 0.053 0.053 0.057 (0.99, 0) 0.051 0.058 0.054 0.058 0.050 0.053 0.052 0.048 0.052 0.051 0.052 0.050 (φ, ρ) (1, 0) 0.050 0.056 0.050 0.056 0.051 0.058 0.050 0.052 0.050 0.057 0.050 0.053 (0.9, -0.95) 0.069 0.067 0.099 0.093 0.050 0.062 0.048 0.059 0.046 0.050 0.048 0.060 (0.99, -0.95) 0.054 0.051 0.055 0.054 0.047 0.054 0.046 0.053 0.045 0.050 0.046 0.053 (1, -0.95) 0.051 0.063 0.055 0.069 0.053 0.056 0.054 0.051 0.053 0.053 0.054 0.051 T = 1000 (0.9, 0) 0.056 0.049 0.059 0.053 0.052 0.065 0.054 0.050 0.051 0.050 0.053 0.053 (0.99, 0) 0.050 0.054 0.056 0.051 0.049 0.056 0.048 0.058 0.048 0.053 0.049 0.058 (φ, ρ) (1, 0) 0.054 0.049 0.046 0.055 0.053 0.050 0.051 0.054 0.051 0.048 0.052 0.054 (0.9, -0.95) 0.071 0.067 0.084 0.074 0.055 0.058 0.053 0.057 0.052 0.054 0.052 0.056 (0.99, -0.95) 0.050 0.050 0.054 0.053 0.051 0.054 0.053 0.046 0.051 0.052 0.053 0.044 (1, -0.95) 0.043 0.048 0.045 0.061 
0.047 0.045 0.052 0.051 0.044 0.043 0.052 0.051

Note: For the noise components, $v_t \sim$ i.i.d. $\mathcal{N}(0,1)$ and $u_t = \rho\, v_t + (1-\rho^2)^{1/2}\eta_t$, where $\eta_t = h_t^{1/2}\varsigma_t$, $h_t = \omega + \alpha\, \eta_{t-1}^2 + \beta\, h_{t-1}$, and $\varsigma_t \sim$ i.i.d. $\mathcal{N}(0,1)$. The GARCH parameters are set to $\omega = 0.05$, $\alpha = 0.1$, and $\beta = 0.85$. See notes to Table 1.
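The GARCH(1,1) recursion in the note can be simulated as follows. This is a generic sketch with the stated parameters; initializing $h_0$ at the unconditional variance is our choice, not something specified in the paper:

```python
import numpy as np

def garch11_errors(T, omega=0.05, alpha=0.1, beta=0.85, seed=0):
    """eta_t = sqrt(h_t)*zeta_t with h_t = omega + alpha*eta_{t-1}^2 + beta*h_{t-1},
    zeta_t ~ i.i.d. N(0,1); parameters as in Table 3."""
    rng = np.random.default_rng(seed)
    zeta = rng.standard_normal(T)
    h = np.empty(T)
    eta = np.empty(T)
    h[0] = omega / (1.0 - alpha - beta)   # unconditional variance (= 1 here)
    eta[0] = np.sqrt(h[0]) * zeta[0]
    for t in range(1, T):
        h[t] = omega + alpha * eta[t - 1] ** 2 + beta * h[t - 1]
        eta[t] = np.sqrt(h[t]) * zeta[t]
    return eta

eta = garch11_errors(2000)
var_hat = float(np.var(eta))  # close to the unconditional variance of 1
```

With $\alpha + \beta = 0.95$ the process is highly persistent but covariance stationary, so the sample variance hovers near one while volatility clusters, which is the feature the robustness experiment targets.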

Table 4: Comparison of finite sample sizes when there is a level shift in the predictor variable

t-test Bonf. t-test Bonf. Q-test Sup Q-test EL1 EL2

Normally distributed errors

(i) The break fraction λb = 0.5 (0.9, 0) 0.054 0.050 0.050 0.020 0.055 0.058 (0.99, 0) 0.048 0.044 0.046 0.031 0.051 0.056 (1, 0) 0.055 0.050 0.051 0.041 0.055 0.058 (φ, ρ) (0.9, -0.95) 0.105 0.102 0.102 0.038 0.054 0.056 (0.99, -0.95) 0.105 0.100 0.101 0.052 0.050 0.045 (1, -0.95) 0.105 0.099 0.100 0.059 0.054 0.040

(ii) The break fraction λb = 0.25 (0.9, 0) 0.057 0.054 0.056 0.023 0.052 0.055 (0.99, 0) 0.050 0.047 0.049 0.027 0.051 0.055 (1, 0) 0.051 0.047 0.048 0.039 0.053 0.056 (φ, ρ) (0.9, -0.95) 0.110 0.107 0.110 0.038 0.055 0.060 (0.99, -0.95) 0.090 0.086 0.094 0.038 0.052 0.055 (1, -0.95) 0.087 0.083 0.092 0.042 0.054 0.056 t(3) errors

(i) The break fraction λb = 0.5 (0.9, 0) 0.054 0.050 0.051 0.021 0.068 0.068 (0.99, 0) 0.052 0.049 0.050 0.029 0.062 0.065 (1, 0) 0.054 0.050 0.050 0.033 0.060 0.065 (φ, ρ) (0.9, -0.95) 0.112 0.108 0.111 0.035 0.066 0.066 (0.99, -0.95) 0.154 0.151 0.154 0.063 0.064 0.053 (1, -0.95) 0.153 0.149 0.150 0.064 0.063 0.046

(ii) The break fraction λb = 0.25 (0.9, 0) 0.053 0.050 0.050 0.019 0.068 0.068 (0.99, 0) 0.053 0.048 0.050 0.027 0.062 0.064 (1, 0) 0.053 0.048 0.049 0.032 0.062 0.067 (φ, ρ) (0.9, -0.95) 0.116 0.112 0.119 0.036 0.068 0.069 (0.99, -0.95) 0.121 0.117 0.121 0.048 0.066 0.067 (1, -0.95) 0.111 0.108 0.113 0.045 0.066 0.060

Note: The model is $y_t = u_t$, $x_t = \theta_b\mathbf{1}_{\{t > [\lambda_b T]\}} + \phi\, x_{t-1} + v_t$, $\theta_b = -0.8$, and $T = 100$. For the noise components, $u_t = \rho\, v_t + (1-\rho^2)^{1/2}\eta_t$, where $(v_t, \eta_t)' \sim$ i.i.d. $F_U(0_{2\times 1}, \mathrm{diag}\{1,1\})$. For the distribution $F_U$, the $\mathcal{N}(0,1)$ and $t(3)$ distributions are considered. Rejection frequencies are reported for testing $H_0: \beta = 0$ against $H_1: \beta \ne 0$ at the 5% level.

Table 5: Comparison of finite sample powers when there is a level shift in the predictor variable

t-test Bonf. t-test Bonf. Q-test Sup Q-test EL1 EL2 Normally distributed errors

(i) The break fraction λb = 0.5 (0.9, 0) 0.025 0.022 0.023 0.010 0.078 0.069 (0.99, 0) 0.003 0.002 0.003 0.002 0.226 0.150 (1, 0) 0.002 0.002 0.002 0.003 0.310 0.189 (φ, ρ) (0.9, -0.95) 0.066 0.062 0.063 0.024 0.088 0.084 (0.99, -0.95) 0.018 0.017 0.018 0.008 0.243 0.142 (1, -0.95) 0.008 0.007 0.008 0.004 0.315 0.143

(ii) The break fraction λb = 0.25 (0.9, 0) 0.025 0.024 0.024 0.010 0.094 0.065 (0.99, 0) 0.001 0.001 0.001 0.001 0.496 0.270 (1, 0) 0.000 0.000 0.000 0.000 0.661 0.421 (φ, ρ) (0.9, -0.95) 0.069 0.066 0.068 0.024 0.107 0.079 (0.99, -0.95) 0.006 0.006 0.007 0.002 0.496 0.283 (1, -0.95) 0.000 0.000 0.000 0.000 0.634 0.426 t(3) errors

(i) The break fraction λb = 0.5 (0.9, 0) 0.029 0.026 0.025 0.012 0.082 0.075 (0.99, 0) 0.011 0.010 0.010 0.006 0.165 0.110 (1, 0) 0.009 0.007 0.008 0.005 0.232 0.129 (φ, ρ) (0.9, -0.95) 0.079 0.075 0.078 0.025 0.092 0.089 (0.99, -0.95) 0.063 0.062 0.063 0.024 0.189 0.092 (1, -0.95) 0.048 0.046 0.047 0.019 0.256 0.080

(ii) The break fraction λb = 0.25 (0.9, 0) 0.029 0.027 0.027 0.010 0.090 0.075 (0.99, 0) 0.007 0.007 0.008 0.004 0.276 0.159 (1, 0) 0.002 0.003 0.003 0.002 0.388 0.225 (φ, ρ) (0.9, -0.95) 0.079 0.076 0.082 0.025 0.106 0.089 (0.99, -0.95) 0.033 0.032 0.034 0.012 0.303 0.170 (1, -0.95) 0.014 0.013 0.013 0.005 0.393 0.215

Note: The model is $y_t = \beta\, x_{t-1} + u_t$, $x_t = \theta_b\mathbf{1}_{\{t > T_b\}} + \phi\, x_{t-1} + v_t$, $\theta_b = -0.8$, and $\beta = d\, T^{-1/2}$, where $d = 0.8$ and $T = 100$. See notes to Table 4.

Table 6: p-values (%) of predictability tests on the S&P 500 excess returns (1927:01 - 2005:12)

Break Date Regime 1 Regime 2

d/p 1993:10 37.61 8.97 e/p 1993:10 5.08 5.99 b/m 1954:10 30.61 39.69 tbl 1942:04 54.99 3.24 dfy 1942:04 48.27 79.91 d/e 1956:03 69.21 94.10 ntis 1965:01 20.04 16.48 tms 1942:04 65.38 19.54

Note: This table reports p-values of EL tests for the null of no predictability. The predictor variables are the log dividend-price ratio (d/p), the log earnings-price ratio (e/p), the book-to-market ratio (b/m), the Treasury-bill rate (tbl), the default yield spread (dfy), the log dividend-payout ratio (d/e), the net equity expansion (ntis), and the term spread (tms). We consider a univariate predictive regression model allowing for a structural break at some unknown date. The estimated break date in each univariate predictive regression model appears in the column labeled "Break Date". In each subsample, we apply the EL method to test predictability. In particular, p-values less than 10% are in bold.

Table 7: p-values (%) of predictability tests on the CRSP value-weighted index (1926:12 - 2002:12)

Break Date Regime 1 Regime 2

d/p 1940:05 3.49 69.95 e/p 1959:07 12.92 41.99

Note: The data set is obtained from Campbell and Yogo (2006). See notes to Table 6.

Figure 1: Null Rejection Probabilities for the Proposed EL Tests

[Figure: two panels plotting rejection frequency (vertical axis, 0 to 1) against $\beta_1 \in [-1, 1]$; the top panel reports size and the bottom panel reports power, each for EL1 and EL2 evaluated at the estimated break date $\hat{T}_1$ and at the true break date $T_1^0$.]

Note: The model is $y_t = \beta_1 x_{t-1} + u_t$, $x_t = 0.99\, x_{t-1} + v_t$, $u_t = \rho\, v_t + \sqrt{1-\rho^2}\,\eta_t$, $\rho = -0.95$, $v_t, \eta_t \sim$ i.i.d. $\mathcal{N}(0,1)$, and $T = 150$. For size, we test the null of $\beta_2 = 0$, while for power we test the null of $\beta_1 = 0$.