arXiv:1306.4864v1 [math.ST] 20 Jun 2013

The Annals of Statistics
2013, Vol. 41, No. 3, 1166–1203
DOI: 10.1214/13-AOS1099
© Institute of Mathematical Statistics, 2013

A LOSS FUNCTION APPROACH TO MODEL SPECIFICATION TESTING AND ITS RELATIVE EFFICIENCY

By Yongmiao Hong and Yoon-Jin Lee

Cornell University and Xiamen University, and Indiana University

The generalized likelihood ratio (GLR) test proposed by Fan, Zhang and Zhang [Ann. Statist. 29 (2001) 153–193] and Fan and Yao [Nonlinear Time Series: Nonparametric and Parametric Methods (2003) Springer] is a generally applicable nonparametric inference procedure. In this paper, we show that although it inherits many advantages of the parametric maximum likelihood ratio (LR) test, the GLR test does not have the optimal power property. We propose a generally applicable test based on loss functions, which measure discrepancies between the null and nonparametric alternative models and are more relevant to decision-making under uncertainty. The new test is asymptotically more powerful than the GLR test in terms of Pitman's efficiency criterion. This efficiency gain holds no matter what smoothing parameter and kernel function are used, and it holds even when the true likelihood function is available for the GLR test.

Received January 2012; revised October 2012.
AMS 2000 subject classifications. 62G10.
Key words and phrases. Efficiency, generalized likelihood ratio test, loss function, local alternative, kernel, Pitman efficiency, smoothing parameter.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2013, Vol. 41, No. 3, 1166–1203. This reprint differs from the original in pagination and typographic detail.

1. Introduction. The likelihood ratio (LR) principle is a generally applicable approach to parametric hypothesis testing [e.g., Vuong (1989)]. The maximum LR test compares the best explanation of the alternative with the best explanation under the null hypothesis. It is well known from the Neyman–Pearson lemma that the maximum LR test has asymptotically optimal power. Moreover, the LR statistic follows an asymptotic $\chi^2$ distribution with a known number of degrees of freedom, enjoying the so-called Wilks phenomena that its asymptotic distribution is free of nuisance parameters. In parametric hypothesis testing, however, it is implicitly assumed that the family of alternative likelihood models contains the true model. When this is not the case, one may fail to reject the null hypothesis erroneously.

In many testing problems in practice, while the null hypothesis is well formulated, the alternative is vague. Over the last two decades or so, there has been a growing interest in nonparametric inference, namely, inference for hypotheses on parametric, semiparametric and nonparametric models against a nonparametric alternative. The nonparametric alternative is very useful when there is no prior information about the true model. Because the nonparametric alternative contains the true model at least for large samples, it ensures the consistency of a test. Nevertheless, there have been few generally applicable nonparametric inference principles. One naive extension would be to develop a nonparametric maximum LR test similar to the parametric maximum LR test. However, the nonparametric maximum likelihood estimator (MLE) usually does not exist, due to the well-known infinite dimensional parameter problem [Bahadur (1958), Le Cam (1990)]. Even if it exists, it may be difficult to compute, and the resulting nonparametric maximum LR test is not asymptotically optimal. This is because the nonparametric MLE chooses the smoothing parameter automatically, which limits the choice of the smoothing parameter and renders it impossible for the test to be optimal.

Fan, Zhang and Zhang (2001) and Fan and Yao (2003) proposed a generalized likelihood ratio (GLR) test by replacing the nonparametric MLE with a reasonable nonparametric estimator, attenuating the difficulty of the nonparametric maximum LR test and enhancing the flexibility of the test by allowing for a range of smoothing parameters. The GLR test maintains the intuitive feature of the parametric LR test because it is based on the likelihoods of generating the observed sample under the null and alternative hypotheses. It is generally applicable to various hypotheses involving a parametric, semiparametric or nonparametric null model against a nonparametric alternative.
By a proper choice of the smoothing parameter, the GLR test can achieve the asymptotically optimal rate of convergence in the sense of Ingster (1993a, 1993b, 1993c) and Lepski and Spokoiny (1999). Moreover, it enjoys the appealing Wilks phenomena that its asymptotic null distribution is free of nuisance parameters and nuisance functions.

The GLR test is a nonparametric inference procedure based on the empirical Kullback–Leibler information criterion (KLIC) between the null model and a nonparametric alternative model. This measure can capture any discrepancy between the null and alternative models, ensuring the consistency of the GLR test. As Fan, Zhang and Zhang (2001) and Fan and Jiang (2007) point out, it holds an advantage over many discrepancy measures such as the $L_2$ and $L_\infty$ measures commonly used in the literature, because for the latter the choices of measures and weight functions are often arbitrary, and the null distributions of the test statistics are unknown and generally depend on nuisance parameters. We note that Robinson (1991) developed a nonparametric KLIC test for serial independence, and White [(1982), page 17] also suggested a nonparametric KLIC test for parametric likelihood models.

The GLR test assumes that stochastic errors follow some parametric distribution which need not contain the true distribution. It is essentially a nonparametric pseudo-LR test. Azzalini, Bowman and Härdle (1989), Azzalini and Bowman (1990) and Cai, Fan and Yao (2000) also proposed a nonparametric pseudo-LR test for the validity of parametric regression models. In this paper, we show that despite its general nature and appealing features, the GLR test does not have the optimal power property of the classical LR test. We first propose a generally applicable nonparametric inference procedure based on loss functions and show that it is asymptotically more powerful than the GLR test in terms of Pitman's efficiency criterion. Loss functions are often used in estimation and prediction [e.g., Zellner (1986), Phillips (1996), Weiss (1996), Christoffersen and Diebold (1997), Giacomini and White (2006)], but not in testing. A loss function compares the models under the null and alternative hypotheses by specifying a penalty for the discrepancy between the two models. The use of a loss function is often more relevant to decision-making under uncertainty because one can choose a loss function to mimic the objective of the decision maker. In inflation forecasting, for example, central banks may have asymmetric preferences which affect their optimal policies [Peel and Nobay (1998)]. They may be more concerned with underprediction than overprediction of inflation rates. In financial risk management, regulators may be more concerned with the left-tailed distribution of portfolio returns than the rest of the distribution. In these circumstances, it is more appropriate to choose an asymmetric loss function to validate an inflation rate model and an asset return distribution model. The admissible class of loss functions for our approach is large, including quadratic, truncated quadratic and asymmetric linex loss functions [Varian (1975), Zellner (1986)].
These loss function tests do not require any knowledge of the true likelihood, do not involve any choice of weights, and enjoy the Wilks phenomena that their asymptotic distributions are free of nuisance parameters and nuisance functions. Most importantly, the loss function test is asymptotically more powerful than the GLR test in terms of Pitman's efficiency criterion, regardless of the choice of the smoothing parameter and the kernel function. This efficiency gain holds even when the true likelihood is available for the GLR test. Interestingly, all admissible loss functions are asymptotically equally efficient under a general class of local alternatives.

The paper is planned as follows. Section 2 introduces the framework and the GLR principle. Section 3 proposes a class of loss function-based tests. For concreteness, we focus on specification testing for regression models, although our approach is applicable to other nonparametric testing problems. Section 4 derives the asymptotic distributions of the loss function test and the GLR test. Section 5 compares their relative efficiency under a class of local alternatives. In Section 6, a simulation study compares the performance between the two competing tests in finite samples. Section 7 concludes the paper. All mathematical proofs are collected in an Appendix and supplementary material [Hong and Lee (2013)].

2. Generalized likelihood ratio test. Maximum LR tests are a generally applicable and powerful inference method for most parametric testing problems. However, the classical LR principle implicitly assumes that the alternative model contains the true data generating process (DGP). This is not always the case in practice. To ensure that the alternative model contains the true DGP, one can use a nonparametric alternative model. Recognizing the fact that the nonparametric MLE may not exist and so cannot be a generally applicable method, Fan, Zhang and Zhang (2001) and Fan and Yao (2003) proposed the GLR principle as a generally applicable method for nonparametric inference. The idea is to compare a suitable nonparametric estimator with a restricted estimator under the null hypothesis via a LR statistic. Specifically, suppose one is interested in whether a parametric likelihood model $f_\theta$ is correctly specified for the unknown density $f$ of the DGP, where $\theta$ is a finite-dimensional parameter. The null hypothesis of interest is

(2.1) $H_0\colon f = f_{\theta_0}$ for some $\theta_0 \in \Theta$,

where $\Theta$ is a parameter space. The alternative hypothesis is

(2.2) $H_A\colon f \neq f_\theta$ for all $\theta \in \Theta$.

In testing $H_0$ versus $H_A$, a nonparametric model for $f$ can be used as an alternative, as also suggested in White [(1982), page 17]. Suppose the log-likelihood function of a random sample is $l(f, \eta)$, where $\eta$ is a nuisance parameter. Under $H_0$, one can obtain the MLE $(\hat\theta_0, \hat\eta_0)$ by maximizing the model likelihood $l(f_\theta, \eta)$. Under the alternative $H_A$, given $\eta$, one can obtain a reasonable smoothed nonparametric estimator $\hat f_\eta$ of $f$. The nuisance parameter $\eta$ can then be estimated by the profile likelihood; that is, one finds $\eta$ to maximize $l(\hat f_\eta, \eta)$. This gives the maximum profile likelihood $l(\hat f_{\hat\eta}, \hat\eta)$. The GLR test statistic is then defined as

(2.3) $\lambda_n = l(\hat f_{\hat\eta}, \hat\eta) - l(f_{\hat\theta_0}, \hat\eta_0)$.

This is the difference of the log-likelihoods of generating the observed sample under the alternative and null models. A large value of $\lambda_n$ is evidence against $H_0$, since the alternative family of nonparametric models is then far more likely to generate the observed data. The GLR test does not require knowing the true likelihood. This is appealing since nonparametric testing problems do not assume that the underlying distribution is known. For example, in a regression setting one usually does not know the error distribution. Here, one can estimate model parameters by using a quasi-likelihood function $q(f_\theta, \eta)$. The resulting GLR test statistic is then defined as

(2.4) $\lambda_n = q(\hat f_{\hat\eta}, \hat\eta) - q(f_{\hat\theta_0}, \hat\eta_0)$.

The GLR approach is also applicable to the cases with unknown nuisance functions. This can arise (e.g.) when one is interested in testing whether a function has an additive form which itself is still nonparametric. In this case, one can replace $f_{\hat\theta_0}$ by a nonparametric estimator under the null hypothesis of additivity. Robinson (1991) considers such a case in testing serial independence. As a generally applicable nonparametric inference procedure, the GLR principle has been used to test a variety of models, including univariate regression models [Fan, Zhang and Zhang (2001)], functional coefficient regression models [Cai, Fan and Yao (2000)], spectral density models [Fan and Zhang (2004)], varying-coefficient partly linear models [Fan and Huang (2005)], additive models [Fan and Jiang (2005)], diffusion models [Fan and Zhang (2003)] and partly linear additive models [Fan and Yao (2003)]. Analogous to the classical LR test statistic, which follows an asymptotic null $\chi^2$ distribution with a known number of degrees of freedom, the asymptotic distribution of the GLR statistic $\lambda_n$ is also a $\chi^2$ with a known large number of degrees of freedom, in the sense that

$r\lambda_n \simeq \chi^2_{\mu_n}$

for a sequence of constants $\mu_n \to \infty$ and some constant $r > 0$; namely,

$\frac{r\lambda_n - \mu_n}{\sqrt{2\mu_n}} \overset{d}{\to} N(0, 1)$,

where $\mu_n$ and $r$ are free of nuisance parameters and nuisance functions, although they may depend on the methods of nonparametric estimation and smoothing parameters. Therefore, the asymptotic distribution of $\lambda_n$ is free of nuisance parameters and nuisance functions. One can use $\lambda_n$ to make inference based on the known distribution of $N(\mu_n, 2\mu_n)$ or $\chi^2_{\mu_n}$ in large samples. Alternatively, one can simulate the null distribution of $\lambda_n$ by setting nuisance parameters at any reasonable values, such as the MLE $\hat\eta_0$ or the maximum profile likelihood estimator $\hat\eta$ in (2.3).
The GLR test is powerful under a class of contiguous local alternatives,

$H_{an}\colon f = f_{\theta_0} + n^{-\gamma} g_n$,

where $\gamma > 0$ is a constant and $g_n$ is an unspecified sequence of smooth functions in a large class of function space. It has been shown [Fan, Zhang and Zhang (2001)] that when a local linear smoother is used to estimate $f$ and the bandwidth is of order $n^{-2/9}$, the GLR test can detect local alternatives with rate $\gamma = 4/9$, which is optimal according to Ingster (1993a, 1993b, 1993c).

3. A loss function approach. In this paper, we will show that while the GLR test enjoys many appealing features of the classical LR test, it does not have the optimal power property of the classical LR test. We will propose a class of loss function-based tests and show that they are asymptotically more powerful than the GLR test under a class of local alternatives. Loss functions measure discrepancies between the null and alternative models and are more relevant to decision making under uncertainty, because the loss function can be chosen to mimic the objective function of the decision maker. The admissible loss functions include but are not restricted to quadratic, truncated quadratic and asymmetric linex loss functions. Like the GLR test, our tests are generally applicable to various nonparametric inference problems, do not involve choosing any weight function, and their null asymptotic distributions do not depend on nuisance parameters and nuisance functions.

For concreteness, we focus on specification testing for time series regression models. Regression modeling is one of the most important statistical problems, and has been exhaustively studied, particularly in the i.i.d. contexts [e.g., Härdle and Mammen (1993)]. Focusing on testing regression models will provide deep insight into our approach and allow us to provide primitive regularity conditions for formal results. Extension to time series contexts also allows us to expand the scope of applicability of our tests and the GLR test. We emphasize that our approach is applicable to many other nonparametric testing problems, such as testing parametric density models.

Suppose $\{X_t, Y_t\} \in \mathbb{R}^{p+1}$ is a stationary time series with finite second moments, where $Y_t$ is a scalar, $p \in \mathbb{N}$ is the dimension of the vector $X_t$, and $X_t$ may contain exogenous and/or lagged dependent variables. Then we can write

(3.1) $Y_t = g_0(X_t) + \varepsilon_t$,

where $g_0(X_t) = E(Y_t \mid X_t)$ and $E(\varepsilon_t \mid X_t) = 0$. The condition $E(\varepsilon_t \mid X_t) = 0$ does not imply that $\{\varepsilon_t\}$ is a martingale difference sequence. In a time series context, $\varepsilon_t$ is often assumed to be i.i.d. $(0, \sigma^2)$ and independent of $X_t$ [e.g., Gao and Gijbels (2008)]. This implies $E(\varepsilon_t \mid X_t) = 0$ but not vice versa, and so it is overly restrictive from a practical point of view. For example, $\varepsilon_t$ may display volatility clustering [e.g., Engle (1982)],

$\varepsilon_t = z_t \sqrt{h_t}$,

where $h_t = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2$. Here, we have $E(\varepsilon_t \mid X_t) = 0$ but $\{\varepsilon_t\}$ is not i.i.d. We will allow such an important feature, which is an empirical stylized fact for high-frequency financial time series. In practice, a parametric model is often used to approximate the unknown function $g_0(X_t)$. We are interested in testing the validity of a parametric model $g(X_t, \theta)$, where $g(\cdot, \cdot)$ has a known functional form, and $\theta \in \Theta$ is an unknown finite-dimensional parameter. The null hypothesis is

$H_0\colon \Pr[g_0(X_t) = g(X_t, \theta_0)] = 1$ for some $\theta_0 \in \Theta$

versus the alternative hypothesis

$H_A\colon \Pr[g_0(X_t) = g(X_t, \theta)] < 1$ for all $\theta \in \Theta$.

An important example is a linear time series model

$g(X_t, \theta) = X_t'\theta$.

This is called linearity testing in the time series literature [Granger and Teräsvirta (1993)]. Under $H_A$, there exists neglected nonlinearity in the conditional mean. For discussion on testing linearity in a time series context, see Granger and Teräsvirta (1993), Lee, White and Granger (1993), Hansen (1999), Hjellvik and Tjøstheim (1996) and Hong and Lee (2005). Because there are many possibilities for departures from a specific functional form, and practitioners usually have no information about the true alternative, it is desirable to construct a test of $H_0$ against a nonparametric alternative, which contains the true function $g_0(\cdot)$ and thus ensures the consistency of the test against $H_A$. For this reason, the GLR test is attractive. Suppose we have a random sample $\{Y_t, X_t\}_{t=1}^n$ of size $n \in \mathbb{N}$. Assuming that the error $\varepsilon_t$ is i.i.d. $N(0, \sigma^2)$, we obtain the conditional quasi-log-likelihood function of $Y_t$ given $X_t$ as follows:

(3.2) $\hat l(g, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=1}^n [Y_t - g(X_t)]^2$.

Let $\hat g(x)$ be a consistent local smoother for $g_0(x)$. Examples of $\hat g(x)$ include the Nadaraya–Watson estimator [Härdle (1990), Li and Racine (2007), Pagan and Ullah (1999)] and the local linear estimator [Fan and Yao (2003)].
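The ARCH(1) error above is easy to simulate; the sketch below (with $z_t$ taken i.i.d. $N(0,1)$, a normalization the text leaves implicit) illustrates the distinction being made: the levels of $\varepsilon_t$ are serially uncorrelated, consistent with $E(\varepsilon_t \mid X_t) = 0$, while the squares are autocorrelated, so $\{\varepsilon_t\}$ is not i.i.d.

```python
import numpy as np

def simulate_arch1(n, alpha0=0.2, alpha1=0.5, seed=0):
    """Simulate eps_t = z_t * sqrt(h_t) with h_t = alpha0 + alpha1 * eps_{t-1}^2.
    z_t is taken i.i.d. N(0, 1) here (an assumed normalization)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    eps = np.empty(n)
    h = alpha0 / (1.0 - alpha1)  # start h_t at its unconditional mean
    for t in range(n):
        eps[t] = z[t] * np.sqrt(h)
        h = alpha0 + alpha1 * eps[t] ** 2
    return eps

eps = simulate_arch1(20000)
# Levels are serially uncorrelated, consistent with E(eps_t | X_t) = 0 ...
corr_lvl = np.corrcoef(eps[:-1], eps[1:])[0, 1]
# ... but squares are autocorrelated (volatility clustering), so {eps_t} is not i.i.d.
corr_sq = np.corrcoef(eps[:-1] ** 2, eps[1:] ** 2)[0, 1]
```

With $\alpha_1 = 0.5$ the first-order autocorrelation of $\varepsilon_t^2$ is close to $\alpha_1$, while that of $\varepsilon_t$ itself is close to zero.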
Substituting $\hat g(X_t)$ into (3.2), one obtains the likelihood of generating the observed sample $\{Y_t, X_t\}_{t=1}^n$ under $H_A$,

(3.3) $\hat l(\hat g, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\,\mathrm{SSR}_1$,

where $\mathrm{SSR}_1$ is the sum of squared residuals of the nonparametric model; namely,

$\mathrm{SSR}_1 = \sum_{t=1}^n [Y_t - \hat g(X_t)]^2$.

Maximizing the likelihood in (3.3) with respect to the nuisance parameter $\sigma^2$ yields $\hat\sigma^2 = n^{-1}\mathrm{SSR}_1$. Substituting this estimator in (3.3) yields the following likelihood:

(3.4) $\hat l(\hat g, \hat\sigma^2) = -\frac{n}{2}\ln(\mathrm{SSR}_1) - \frac{n}{2}[1 + \ln(2\pi/n)]$.

Using a similar argument and maximizing the model quasi-likelihood function with respect to $\theta$ and $\sigma^2$ simultaneously, we can obtain the parametric maximum quasi-likelihood under $H_0$,

(3.5) $\hat l(g_{\hat\theta_0}, \hat\sigma_0^2) = -\frac{n}{2}\ln(\mathrm{SSR}_0) - \frac{n}{2}[1 + \ln(2\pi/n)]$,

where $(\hat\theta_0, \hat\sigma_0^2)$ are the MLE under $H_0$, and $\mathrm{SSR}_0$ is the sum of squared residuals of the parametric regression model, namely,

$\mathrm{SSR}_0 = \sum_{t=1}^n [Y_t - g(X_t, \hat\theta_0)]^2$.

Given the i.i.d. $N(0, \sigma^2)$ assumption for $\varepsilon_t$, $\hat\theta_0$ is the estimator that minimizes $\mathrm{SSR}_0$. Thus, the GLR statistic is defined as

(3.6) $\lambda_n = \hat l(\hat g, \hat\sigma^2) - \hat l(g_{\hat\theta_0}, \hat\sigma_0^2) = \frac{n}{2}\ln(\mathrm{SSR}_0 / \mathrm{SSR}_1)$.

Under the i.i.d. $N(0, \sigma^2)$ assumption for $\varepsilon_t$, $\lambda_n$ is asymptotically equivalent to the F test statistic

(3.7) $F = \frac{\mathrm{SSR}_0 - \mathrm{SSR}_1}{\mathrm{SSR}_1}$.

The latter has been proposed by Azzalini, Bowman and Härdle (1989), Azzalini and Bowman (1993), Hong and White (1995) and Fan and Li (2002) in i.i.d. contexts. The asymptotic equivalence between the GLR and F tests can be seen from a Taylor series expansion of $\lambda_n$,

$\lambda_n = \frac{n}{2} \cdot F + \text{Remainder}$.
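Both statistics are mechanical to compute once $\mathrm{SSR}_0$ and $\mathrm{SSR}_1$ are in hand. The sketch below (simulated data, with a Nadaraya–Watson smoother as one admissible choice of $\hat g$) computes $\lambda_n$ and $F$ and checks the exact identity $\lambda_n = \frac{n}{2}\ln(1 + F)$ that underlies the expansion:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
x = rng.uniform(-1.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.standard_normal(n)  # the linear null model holds here

# SSR0: parametric (linear) least-squares fit
X = np.column_stack([np.ones(n), x])
theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
ssr0 = float(np.sum((y - X @ theta_hat) ** 2))

# SSR1: Nadaraya-Watson smoother (one admissible choice of g_hat)
h = n ** (-1.0 / 5.0)
K = lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
W = K((x[:, None] - x[None, :]) / h)
g_hat = (W @ y) / W.sum(axis=1)
ssr1 = float(np.sum((y - g_hat) ** 2))

lam_n = 0.5 * n * np.log(ssr0 / ssr1)  # GLR statistic (3.6)
F = (ssr0 - ssr1) / ssr1               # F statistic (3.7)
# Since SSR0/SSR1 = 1 + F, lam_n = (n/2) log(1 + F) ~ (n/2) F when F is small.
```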

We now propose an alternative approach to testing $H_0$ versus $H_A$ by comparing the null and alternative models via a loss function $D\colon \mathbb{R}^2 \to \mathbb{R}$, which measures the discrepancy between the fitted values $\hat g(X_t)$ and $g(X_t, \hat\theta_0)$,

(3.8) $Q_n = \sum_{t=1}^n D[\hat g(X_t), g(X_t, \hat\theta_0)]$.

Intuitively, the loss function gives a penalty whenever the parametric model overestimates or underestimates the true model. The latter is consistently estimated by a nonparametric method. A specific class of loss functions $D(\cdot, \cdot)$ is given by $D(u, v) = d(u - v)$, where $d(z)$ has a unique minimum at 0, and is monotonically nondecreasing as $|z|$ increases. Suppose $d(\cdot)$ is twice continuously differentiable at 0 with $d(0) = 0$, $d'(0) = 0$ and $0 < d''(0) < \infty$. The condition $d'(0) = 0$ implies that the first-order term in the Taylor expansion of $d(\cdot)$ around 0 vanishes identically. This class of loss functions $d(\cdot)$ has been called a generalized cost-of-error function in the literature [e.g., Pesaran and Skouras (2001),

Granger (1999), Christoffersen and Diebold (1997), Granger and Pesaran (2000), Weiss (1996)]. The loss function is closely related to decision-based evaluation, which assesses the economic value of forecasts to a particular decision maker or group of decision makers. For example, in risk management the extreme values of portfolio returns are of particular interest to regulators, while in macroeconomic management the values of inflation or output growth, in the middle of the distribution, may be of concern to central banks. A suitable choice of loss function can mimic the objective of the decision maker. Infinitely many loss functions $d(\cdot)$ satisfy the aforementioned conditions, although they may have quite different shapes. To illustrate the scope of this class of loss functions, we consider some examples. The first example of $d(\cdot)$ is the popular quadratic loss function

(3.9) $d(z) = z^2$.

This delivers a statistic based on the sum of squared differences between the fitted values of the null and alternative models,

(3.10) $\hat L_n^2 = \sum_{t=1}^n [\hat g(X_t) - g(X_t, \hat\theta_0)]^2$.

This statistic is used in Hong and White (1995) and Horowitz and Spokoiny (2001) in an i.i.d. setup. It is also closely related to the statistics proposed by Härdle and Mammen (1993) and Pan, Wang and Yao (2007); unlike their statistics, however, $\hat L_n^2$ in (3.10) does not involve any weighting, which suffers from the undesirable feature pointed out in Fan and Jiang (2007). A second example of $d(\cdot)$ is the truncated quadratic loss function

(3.11) $d(z) = \begin{cases} \frac{1}{2} z^2, & \text{if } |z| \le c, \\ c|z| - \frac{1}{2} c^2, & \text{if } |z| > c, \end{cases}$

where $c$ is a prespecified constant. This loss function is used in robust M-estimation. It is expected to deliver a test robust to outliers that may cause extreme discrepancies between the two estimators. The quadratic and truncated quadratic loss functions give equal penalty to overestimation and underestimation of the same magnitude. They cannot capture asymmetric loss features that may arise in practice.
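Both examples can be checked directly against the admissibility conditions $d(0) = 0$ and $d'(0) = 0$; a minimal numerical sketch:

```python
import numpy as np

def d_quad(z):
    """Quadratic loss (3.9): d(z) = z^2."""
    return np.asarray(z, dtype=float) ** 2

def d_trunc(z, c=1.5):
    """Truncated quadratic loss (3.11): quadratic on [-c, c], linear beyond."""
    z = np.asarray(z, dtype=float)
    return np.where(np.abs(z) <= c, 0.5 * z ** 2, c * np.abs(z) - 0.5 * c ** 2)

# Both satisfy d(0) = 0 and d'(0) = 0 (central difference as a numerical check).
eps = 1e-6
deriv_quad = float(d_quad(eps) - d_quad(-eps)) / (2 * eps)
deriv_trunc = float(d_trunc(eps) - d_trunc(-eps)) / (2 * eps)

# The truncation caps the penalty growth: an outlier at z = 100 costs
# c|z| - c^2/2 = 148.875 (with c = 1.5) instead of z^2 = 10000.
outlier_cost = float(d_trunc(100.0))
```

Note that (3.11) is continuous at $|z| = c$, where both branches equal $\frac{1}{2}c^2$; beyond $c$ the marginal penalty is the constant $c$, which is the source of the robustness to outliers.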
For example, central banks may be more concerned with underprediction than overprediction of inflation rates. For another example, in providing an estimate of the market value of a property for the owner, a real estate agent's underestimation and overestimation may have different consequences. If the valuation is in preparation for a future sale, underestimation may lead to the owner losing money and overestimation to market resistance [Varian (1975)].

The above examples motivate using an asymmetric loss function for model validation. Examples of asymmetric loss functions are the class of so-called linex functions

(3.12) $d(z) = \frac{\beta}{\alpha^2}[\exp(\alpha z) - (1 + \alpha z)]$.

For each pair of parameters $(\alpha, \beta)$, $d(z)$ is an asymmetric loss function. Here, $\beta$ is a scale factor, and $\alpha$ is a shape parameter. The magnitude of $\alpha$ controls the degree of asymmetry, and the sign of $\alpha$ reflects the direction of asymmetry. When $\alpha < 0$, $d(z)$ increases almost exponentially if $z < 0$, and almost linearly if $z > 0$, and conversely when $\alpha > 0$. Thus, for this loss function, underestimation is more costly than overestimation when $\alpha < 0$, and the reverse is true when $\alpha > 0$. For small values of $|\alpha|$, $d(z)$ is almost symmetric and not far from a quadratic loss function. Indeed, as $\alpha \to 0$, the linex loss function becomes a quadratic loss function:

$d(z) \to \frac{\beta}{2} z^2$.

However, when $|\alpha|$ assumes appreciable values, the linex loss function $d(z)$ will be quite different from a quadratic loss function. Thus, the linex loss function can be viewed as a generalization of the quadratic loss function allowing for asymmetry. This function was first introduced by Varian (1975) for real estate assessment. Zellner (1986) employs it in the analysis of several central statistical estimation and prediction problems in a Bayesian framework. Granger and Pesaran (1999) also use it to evaluate density forecasts, and Christoffersen and Diebold (1997) analyze the optimal prediction problem under this loss function. Figure 1 shows the shapes of the linex function for a variety of choices of $(\alpha, \beta)$.

Our loss function approach is by no means only applicable to regression functions. For example, in such contexts as probability density and spectral density estimation, one may compare two nonnegative density estimators, say $f_{\hat\theta}$ and $\hat f$, using the Hellinger loss function

(3.13) $D(f_{\hat\theta}, \hat f) = (1 - \sqrt{f_{\hat\theta}/\hat f}\,)^2$.

This is expected to deliver a consistent robust test for $H_0$ of (2.1).
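A small sketch of the linex family makes the asymmetry and the quadratic limit concrete (the values of $(\alpha, \beta)$ below are illustrative only):

```python
import numpy as np

def linex(z, alpha, beta=1.0):
    """Linex loss (3.12): d(z) = (beta / alpha^2) * (exp(alpha z) - (1 + alpha z))."""
    z = np.asarray(z, dtype=float)
    return (beta / alpha ** 2) * (np.exp(alpha * z) - (1.0 + alpha * z))

# With alpha < 0, underestimation (z < 0) is penalized almost exponentially,
# overestimation (z > 0) almost linearly:
under = float(linex(-2.0, alpha=-1.0))  # exp(2) - 3
over = float(linex(2.0, alpha=-1.0))    # exp(-2) + 1

# As alpha -> 0, linex approaches the quadratic loss (beta/2) z^2, i.e. 2 at z = 2:
near_quad = float(linex(2.0, alpha=1e-4))
```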
Our approach covers this loss function as well, because when $f_{\hat\theta}$ and $\hat f$ are close under $H_0$ of (2.1), we have

$D(f_{\hat\theta}, \hat f) = \frac{1}{4}\left(\frac{f_{\hat\theta} - \hat f}{\hat f}\right)^2 + \text{Remainder}$,

where the first-order term in the Taylor expansion vanishes identically. Interestingly, our approach does not apply to the KLIC loss function

(3.14) $D(f_{\hat\theta}, \hat f) = -\ln(f_{\hat\theta}/\hat f)$,

Fig. 1. The LINEX loss function $d(z) = \frac{\beta}{\alpha^2}[\exp(\alpha z) - (1 + \alpha z)]$.

which delivers the GLR statistic $\lambda_n$ in (2.3). This is because the Taylor expansion of (3.14) yields

(3.15) $D(f_{\hat\theta}, \hat f) = -\frac{f_{\hat\theta} - \hat f}{\hat f} + \frac{1}{2}\left(\frac{f_{\hat\theta} - \hat f}{\hat f}\right)^2 + \text{Remainder}$,

where the first-order term in the Taylor expansion does not vanish identically. Hence, the first two terms in (3.15) jointly determine the asymptotic distribution of the GLR statistic $\lambda_n$. As will be seen below, the presence of the first-order term in the Taylor expansion of the KLIC loss function in (3.14) leads to an efficiency loss compared to our loss function approach, for which the first-order term of a Taylor expansion is identically 0 under the null.
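The contrast between the Hellinger loss (3.13) and the KLIC loss (3.14) can be seen numerically: writing $x = f_{\hat\theta}/\hat f$ and expanding around $x = 1$ (the null), the odd part of the loss around $x = 1$ isolates the first-order (linear) term. A sketch:

```python
import numpy as np

# Write x = f_theta_hat / f_hat and expand each loss around x = 1 (the null).
def hellinger(x):
    return (1.0 - np.sqrt(x)) ** 2   # ~ (1/4)(x - 1)^2: no first-order term

def klic(x):
    return -np.log(x)                # ~ -(x - 1) + (1/2)(x - 1)^2: first order survives

e = 1e-3
# The odd part of the loss around x = 1 isolates the first-order (linear) term.
hell_odd = float(hellinger(1 + e) - hellinger(1 - e))   # O(e^3): vanishes
klic_odd = float(klic(1 + e) - klic(1 - e))             # ~ -2e: does not vanish
```

The first-order term of the Hellinger loss is numerically zero to third order in $e$, while the KLIC loss keeps a linear term of size $-2e$, which is the source of the efficiency loss discussed above.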

4. Asymptotic null distribution. Using a local fit with kernel $K\colon \mathbb{R} \to \mathbb{R}$ and bandwidth $h \equiv h(n)$, one could obtain a nonparametric estimator $\hat g(\cdot)$ and compare it to the parametric model $g(\cdot, \hat\theta_0)$ via a loss function, where $\hat\theta_0$ is a consistent estimator for $\theta_0$ under $H_0$. To avoid undersmoothing [i.e., to choose $h$ such that the squared bias of $\hat g(\cdot)$ vanishes to 0 faster than the variance of $\hat g(\cdot)$], we estimate the conditional mean of the estimated parametric residual

$\hat\varepsilon_t = Y_t - g(X_t, \hat\theta_0)$,

and compare it to a zero function $E(\varepsilon_t \mid X_t) = 0$ (implied by $H_0$) via a loss function criterion

(4.1) $\hat Q_n = \sum_{t=1}^n D[\hat m_h(X_t), 0] = \sum_{t=1}^n d[\hat m_h(X_t) - 0] = \sum_{t=1}^n d[\hat m_h(X_t)]$,

where $\hat m_h(X_t)$ is a nonparametric estimator for $E(\varepsilon_t \mid X_t)$. This is essentially a bias-reduction device. It is proposed in Härdle and Mammen (1993) and also used in Fan and Jiang (2007) for the GLR test. This device helps remove the bias of nonparametric estimation because there is no bias under $H_0$ when we estimate the conditional mean of the estimated model residuals. We note that the bias-reduction device does not lead to any efficiency gain of the loss function test. The same efficiency gain of the loss function approach over the GLR approach is obtained even when we compare estimators for $E(Y_t \mid X_t)$. In the latter case, however, more restrictive conditions on the bandwidth $h$ are required to ensure that the bias vanishes sufficiently fast under $H_0$. For simplicity, we use the Nadaraya–Watson estimator

(4.2) $\hat m_h(x) = \frac{n^{-1}\sum_{t=1}^n \hat\varepsilon_t K_h(x - X_t)}{n^{-1}\sum_{t=1}^n K_h(x - X_t)}$,

where $X_t = (X_{1t}, \ldots, X_{pt})'$, $x = (x_1, \ldots, x_p)'$, and

$K_h(x - X_t) = h^{-p} \prod_{i=1}^p K[h^{-1}(x_i - X_{it})]$.

We note that a local polynomial estimator could also be used, with the same asymptotic results.
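A sketch of the estimator (4.2) and the criterion (4.1) for $p = 1$, with a Gaussian kernel and the quadratic loss as one admissible choice of $d$ (all data simulated and hypothetical): $\hat Q_n$ stays small when the parametric residuals carry no conditional-mean structure and grows when they do.

```python
import numpy as np

def nw_mhat(x_eval, X, resid, h):
    """Nadaraya-Watson estimator (4.2) of E(eps_t | X_t = x) for scalar X_t
    (p = 1), with a Gaussian kernel; the h^{-p} factors cancel in the ratio."""
    K = lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    W = K((np.asarray(x_eval)[:, None] - X[None, :]) / h)
    return (W @ resid) / W.sum(axis=1)

def loss_stat(X, resid, h, d=lambda z: z ** 2):
    """Q_hat_n of (4.1): sum of d(m_hat_h(X_t)) over the sample (quadratic d here)."""
    return float(np.sum(d(nw_mhat(X, X, resid, h))))

rng = np.random.default_rng(2)
n = 500
X = rng.uniform(-1.0, 1.0, n)
h = n ** (-1.0 / 5.0)
resid_null = rng.standard_normal(n)                   # H0: residuals carry no signal
resid_alt = np.sin(3.0 * X) + rng.standard_normal(n)  # HA: neglected structure
q_null = loss_stat(X, resid_null, h)
q_alt = loss_stat(X, resid_alt, h)
```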

To derive the null limit distributions of the loss function test based on $\hat Q_n$ in (4.1) and the GLR statistic $\lambda_n$ in a time series context, we provide the following regularity conditions:

Assumption A.1. (i) For each $n \in \mathbb{N}$, $\{(Y_t, X_t')' \in \mathbb{R}^{p+1}, t = 1, \ldots, n\}$, $p \in \mathbb{N}$, is a stationary and absolutely regular mixing process with mixing coefficient $\beta(j) \le C\rho^j$ for all $j \ge 0$, where $\rho \in (0, 1)$ and $C \in (0, \infty)$; (ii) $E|Y_t|^{8+\delta} \le C$; …; … for all $j > 0$ and all $x, y \in G$, where $C \in (0, \infty)$ does not depend on $j$; (v) $E(X_{it}^{4(1+\eta)}) \le C$ for some $\eta \in (0, \infty)$, $1 \le i \le p$; (vi) $\mathrm{var}(\varepsilon_t) = \sigma^2$ and $\sigma^2(x) = E(\varepsilon_t^2 \mid X_t = x)$ is continuous on $G$.

Assumption A.2. (i) For each $\theta \in \Theta$, $g(\cdot, \theta)$ is a measurable function of $X_t$; (ii) with probability one, $g(X_t, \cdot)$ is twice continuously differentiable with respect to $\theta \in \Theta$, with $E \sup_{\theta \in \Theta_0} \|\frac{\partial}{\partial\theta} g(X_t, \theta)\|^{4+\delta} \le C$ and $E \sup_{\theta \in \Theta_0} \|\frac{\partial^2}{\partial\theta\,\partial\theta'} g(X_t, \theta)\|^4 \le C$, where $\Theta_0$ is a small neighborhood of $\theta_0$ in $\Theta$.

Assumption A.3. There exists a sequence of constants $\theta_n^* \in \operatorname{int}(\Theta)$ such that $n^{1/2}(\hat\theta_0 - \theta_n^*) = O_p(1)$, where $\theta_n^* = \theta_0$ under $H_0$ for all $n \ge 1$.

Assumption A.4. The kernel $K\colon \mathbb{R} \to [0, 1]$ is a prespecified bounded symmetric probability density which satisfies the Lipschitz condition.

Assumption A.5. $d\colon \mathbb{R} \to \mathbb{R}^+$ has a unique minimum at 0, and $d(z)$ is monotonically nondecreasing as $|z| \to \infty$. Furthermore, $d(z)$ is twice continuously differentiable at 0 with $d(0) = 0$, $d'(0) = 0$, $D \equiv \frac{1}{2} d''(0) \in (0, \infty)$ and $|d''(z) - d''(0)| \le C|z|$ for any $z$ near 0.

Assumptions A.1 and A.2 are conditions on the DGP. For each $t$, we allow $(X_t, Y_t)$ to depend on the sample size $n$. This facilitates local power analysis. For notational simplicity, we have suppressed the dependence of $(X_t, Y_t)$ on $n$. We also allow time series data with weak serial dependence. For the $\beta$-mixing condition, see, for example, Doukhan (1994). The compact support for the regressor $X_t$ is assumed in Fan, Zhang and Zhang (2001) for the GLR test to avoid the awkward problem of tackling the KLIC function.
This assumption allows us to focus on essentials while maintaining a relatively simple treatment. It could be relaxed in several ways. For example, we could impose a weight function 1( X

…sion structure of $\hat\theta_0$, because the variation in $\hat\theta_0$ does not affect the limit distribution of $\hat Q_n$. We can compute $\hat\theta_0$ and proceed as if it were equal to $\theta_0$: the replacement of $\hat\theta_0$ with $\theta_0$ has no impact on the limit distribution of $\hat Q_n$. We first derive the limit distribution of the loss function test statistic.

Theorem 1 (Loss function test). Suppose Assumptions A.1–A.5 hold, $h \propto n^{-\omega}$ for $\omega \in (0, 1/2p)$ and $p < 4$. Define $q_n = \hat Q_n / \hat\sigma_n^2$, where $\hat Q_n$ is given in (4.1) and $\hat\sigma_n^2 = n^{-1}\mathrm{SSR}_1 = n^{-1}\sum_{t=1}^n [\hat\varepsilon_t - \hat m_h(X_t)]^2$. Then (i) under $H_0$, $s(K) q_n \simeq \chi^2_{\nu_n}$ as $n \to \infty$, in the sense that

$\frac{s(K) q_n - \nu_n}{\sqrt{2\nu_n}} \overset{d}{\to} N(0, 1)$,

where $s(K) = \sigma^2 a(K) \int \sigma^2(x)\,dx / [D\, b(K) \int \sigma^4(x)\,dx]$, $\nu_n = a^2(K) [\int \sigma^2(x)\,dx]^2 / [h^p b(K) \int \sigma^4(x)\,dx]$, $a(K) = \int \mathcal{K}^2(\mathbf{u})\,d\mathbf{u}$, $b(K) = \int [\int \mathcal{K}(\mathbf{u} + \mathbf{v}) \mathcal{K}(\mathbf{v})\,d\mathbf{v}]^2\,d\mathbf{u}$, $\mathcal{K}(\mathbf{u}) = \prod_{i=1}^p K(u_i)$, $\mathbf{u} = (u_1, \ldots, u_p)'$.

(ii) Suppose in addition $\mathrm{var}(\varepsilon_t \mid X_t) = \sigma^2$ almost surely. Then $s(K) = a(K)/[D\, b(K)]$ and $\nu_n = \Omega\, a^2(K)/[h^p b(K)]$, where $\Omega$ is the Lebesgue measure of the support of $X_t$.

Theorem 1 shows that under (and only under) conditional homoskedasticity, the factors $s(K)$ and $\nu_n$ do not depend on nuisance parameters and nuisance functions. In this case, the loss function test statistic $q_n$, like the GLR statistic $\lambda_n$, also enjoys the Wilks phenomena that its asymptotic distribution does not depend on nuisance parameters and nuisance functions. This offers great convenience in implementing the loss function test. We note that the condition on the bandwidth $h$ is relatively mild. In particular, no undersmoothing is required. This occurs because we estimate the conditional mean of the residuals of the parametric model $g(X_t, \theta)$. If we directly compared a nonparametric estimator of $E(Y_t \mid X_t)$ with $g(X_t, \theta)$, we could obtain the same asymptotic distribution for $q_n$, but under a more restrictive condition on $h$ in order to remove the effect of the bias. For simplicity, we consider the case with $p < 4$. A higher dimension $p$ for $X_t$ could be allowed by suitably modifying the factors $s(K)$ and $\nu_n$, but with more tedious expressions.

Theorem 1 also holds for the statistic $q_n^0 = \hat Q_n / \hat\sigma_{n,0}^2$, where $\hat\sigma_{n,0}^2 = n^{-1}\mathrm{SSR}_0$, which is expected to have better sizes than $q_n$ in finite samples under $H_0$ when using asymptotic theory.
However, q_n may have better power than q_n^0 because SSR_0 may be substantially larger than SSR_1 under H_A.

To compare the q_n and GLR tests, we have to derive the asymptotic distribution of the GLR statistic λ_n in a time series context, a formal result not available in the previous literature, although the GLR test has been widely applied in the time series context [Fan and Yao (2003)].
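As a numerical aside (our illustration, not part of the paper), the kernel constants in Theorem 1, a(K) = ∫K²(u)du and b(K) = ∫[∫K(u + v)K(v)dv]²du, can be evaluated by simple quadrature. For the uniform kernel K(u) = (1/2)1(|u| ≤ 1), a(K) = 1/2 and b(K) = 1/3:

```python
import numpy as np

def kern(u):
    """Uniform kernel K(u) = (1/2) 1(|u| <= 1)."""
    return 0.5 * (np.abs(u) <= 1.0)

du = 1e-3
u = np.arange(-2.5, 2.5, du)  # covers the support of K and of its self-convolution

# a(K) = \int K^2(u) du
aK = np.sum(kern(u) ** 2) * du

# K*K(u) = \int K(u + v) K(v) dv (self-convolution of the kernel)
conv = np.array([np.sum(kern(x + u) * kern(u)) * du for x in u])

# b(K) = \int [K*K(u)]^2 du
bK = np.sum(conv ** 2) * du

print(round(aK, 3), round(bK, 3))  # approximately 0.5 and 1/3
```

These two functionals are the only kernel inputs needed to compute s(K) and ν_n under conditional homoskedasticity.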

Theorem 2 (GLR test in time series). Suppose Assumptions A.1–A.5 hold, h ∝ n^{−ω} for ω ∈ (0, 1/(2p)), and p < 4. Define λ_n as in (3.6), where SSR_1 = Σ_{t=1}^n [ε̂_t − m̂_h(X_t)]², SSR_0 = Σ_{t=1}^n ε̂_t², and ε̂_t = Y_t − g(X_t, θ̂). Then (i) under H_0, r(K)λ_n ≃ χ²_{μ_n} as n → ∞, in the sense that

    [r(K)λ_n − μ_n] / √(2μ_n) →_d N(0, 1),

where

    r(K) = σ² c(K) ∫σ²(x)dx / [d(K) ∫σ⁴(x)dx],
    μ_n = c²(K) [∫σ²(x)dx]² / [h^p d(K) ∫σ⁴(x)dx],
    c(K) = K(0) − (1/2)∫K²(u)du,    d(K) = ∫[K(u) − (1/2)∫K(u + v)K(v)dv]² du,
    K(u) = ∏_{i=1}^p K(u_i),    u = (u_1, ..., u_p)′.

(ii) Suppose in addition var(ε_t|X_t) = σ² almost surely. Then r(K) = c(K)/d(K) and μ_n = Ω c²(K)/[h^p d(K)], where Ω is the Lebesgue measure of the support of X_t.

Theorem 2 extends the results of Fan, Zhang and Zhang (2001). We allow X_t to be a vector and allow time series data. We do not assume that the error ε_t is independent of X_t or of the past history of {(X_t, Y_t)}, so conditional heteroskedasticity in a time series context is allowed. This is consistent with the empirical stylized fact of volatility clustering for high-frequency financial time series. We note that the proof of the asymptotic normality of the GLR test in a time series context is much more involved than in an i.i.d. context.

It is interesting to observe that the Wilks phenomena do not hold under conditional heteroskedasticity, because the factors r(K) and μ_n involve the nuisance function σ²(X_t) = var(ε_t|X_t), which is unknown under H_0. Conditional homoskedasticity is required to ensure the Wilks phenomena; in this case, r(K) and μ_n are free of nuisance functions. Like the q_n test, we also consider the case of p < 4. A higher dimension p could be allowed by suitably modifying the factors r(K) and μ_n, which would then depend on the unknown density f(x) of X_t and thus would not be free of nuisance functions, even under conditional homoskedasticity.
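The GLR constants c(K) and d(K) in Theorem 2 can be checked by the same kind of quadrature (again our illustration, with the uniform kernel assumed). Here c(K) = K(0) − (1/2)∫K²(u)du = 1/4 and d(K) = 5/24:

```python
import numpy as np

def kern(u):
    """Uniform kernel K(u) = (1/2) 1(|u| <= 1)."""
    return 0.5 * (np.abs(u) <= 1.0)

du = 1e-3
u = np.arange(-2.5, 2.5, du)

# c(K) = K(0) - (1/2) \int K^2(u) du
cK = float(kern(0.0)) - 0.5 * np.sum(kern(u) ** 2) * du

# d(K) = \int [K(u) - (1/2) K*K(u)]^2 du, with K*K the self-convolution
conv = np.array([np.sum(kern(x + u) * kern(u)) * du for x in u])
dK = np.sum((kern(u) - 0.5 * conv) ** 2) * du

print(round(cK, 4), round(dK, 4))  # approximately 0.25 and 5/24 = 0.2083...
```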

5. Relative efficiency. We now compare the relative efficiency of the loss function test q_n and the GLR test λ_n under the class of local alternatives

(5.1)    H_n(a_n): g_0(X_t) = g(X_t, θ_0) + a_n δ(X_t),

where δ: R^p → R is an unknown function with E[δ⁴(X_t)] ≤ C. The term a_n δ(X_t) characterizes the departure of the model g(X_t, θ_0) from the true function g_0(X_t), and the rate a_n is the speed at which the departure vanishes to 0 as the sample size n → ∞. For notational simplicity, we have suppressed the dependence of g_0(X_t) on n here. Without loss of generality, we assume that δ(X_t) is uncorrelated with X_t, namely E[δ(X_t)X_t] = 0.

Theorem 3 (Local power). Suppose Assumptions A.1–A.5 hold, h ∝ n^{−ω} for ω ∈ (0, 1/(2p)), and p < 4. Then (i) under H_n(a_n) with a_n = n^{−1/2}h^{−p/4}, we have

    [s(K)q_n − ν_n] / √(2ν_n) →_d N(ψ, 1)  as n → ∞,

where ψ = σ² E[δ²(X_t)] / √(2b(K)∫σ⁴(x)dx), and s(K) and ν_n are as in Theorem 1. Suppose in addition var(ε_t|X_t) = σ² almost surely. Then ψ = E[δ²(X_t)] / √(2b(K)Ω).

(ii) Under H_n(a_n) with a_n = n^{−1/2}h^{−p/4}, we have

    [r(K)λ_n − μ_n] / √(2μ_n) →_d N(ξ, 1)  as n → ∞,

where ξ = σ² E[δ²(X_t)] / [2√(2d(K)∫σ⁴(x)dx)], and r(K) and μ_n are as in Theorem 2. Suppose in addition var(ε_t|X_t) = σ² almost surely. Then ξ = E[δ²(X_t)] / [2√(2d(K)Ω)].

When X_t is a scalar (i.e., p = 1) and h = n^{−2/9}, the rate a_n = n^{−1/2}h^{−p/4} = n^{−4/9} achieves the optimal rate in the sense of Ingster (1993a, 1993b, 1993c). Following reasoning similar to that of Fan, Zhang and Zhang (2001), we can show that the q_n test can also detect local alternatives with the optimal rate n^{−2k/(4k+p)} in the sense of Ingster (1993a, 1993b, 1993c) for the function space F_k = {δ ∈ L²: ∫δ^{(k)}(x)² dx ≤ C}. For p = 1 and k = 2, this is achieved by setting h = n^{−2/9}.

It is interesting to note that the noncentrality parameter ψ of the q_n test is independent of the curvature parameter D = d″(0)/2 of the loss function d(·). This implies that all loss functions satisfying Assumption A.5 are asymptotically equally efficient under H_n(a_n) in terms of Pitman's efficiency criterion [Pitman (1979), Chapter 7], although their shapes may be different.

While the q_n and λ_n tests achieve the same optimal rate of convergence in the sense of Ingster (1993a, 1993b, 1993c), Theorem 4 below shows that, under the same set of regularity conditions [including the same bandwidth h and the same kernel K(·) for both tests], q_n is asymptotically more efficient than λ_n under H_n(a_n).
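The exponent arithmetic in the first paragraph above can be checked exactly with rational arithmetic (a check of the rates only, nothing more):

```python
from fractions import Fraction

p, k = 1, 2
omega = Fraction(2, 9)                 # h is proportional to n^{-2/9}

# a_n = n^{-1/2} h^{-p/4} = n^{-1/2 + p*omega/4}
rate = Fraction(-1, 2) + p * omega / 4
print(rate)                            # -4/9

# Ingster's optimal rate n^{-2k/(4k+p)} for k = 2, p = 1
ingster = Fraction(-2 * k, 4 * k + p)
print(ingster)                         # -4/9, so the two rates coincide
```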

Theorem 4 (Relative efficiency). Suppose Assumptions A.1–A.5 hold, h ∝ n^{−ω} for ω ∈ (0, 1/(2p)) and p < 4. Then Pitman's relative efficiency of the q_n test over the GLR λ_n test under H_n(a_n) with a_n = n^{−1/2}h^{−p/4} is given by

(5.2)    ARE(q_n : λ_n) = { ∫[2K(u) − ∫K(u + v)K(v)dv]² du / ∫[∫K(u + v)K(v)dv]² du }^{1/(2−pω)},

where K(u) = ∏_{i=1}^p K(u_i) and u = (u_1, ..., u_p)′. The asymptotic relative efficiency ARE(q_n : λ_n) is larger than 1 for any kernel satisfying Assumption A.4 and the condition that K(·) ≤ 1.

Theorem 4 holds under both conditional heteroskedasticity and conditional homoskedasticity. It suggests that although the GLR λ_n test is a natural extension of the classical parametric LR test and is a generally applicable nonparametric inference procedure with many appealing features, it does not have the optimal power property of the classical LR test. In particular, the GLR test is always asymptotically less efficient than the loss function test under H_n(a_n) whenever the two tests use the same kernel K(·) and the same bandwidth h, including the optimal kernel and the optimal bandwidth (if any) for the GLR test. The relative efficiency gain of the loss function test over the GLR test holds even if the GLR test λ_n uses the true likelihood function. This result is in sharp contrast to the classical LR test in a parametric setup, which is asymptotically most powerful according to the Neyman–Pearson lemma.

Insight into the relative efficiency between q_n and λ_n can be obtained by a Taylor expansion of the λ_n statistic,

(5.3)    λ_n = (1/2) (SSR_0 − SSR_1)/(SSR_1/n) + Remainder,

where the remainder term is an asymptotically negligible higher order term under H_n(a_n). This is equivalent to use of the loss function

(5.4)    D(g, g_θ) = [Y_t − g(X_t, θ)]² − [Y_t − g(X_t)]².

When g(X_t) is close to g(X_t, θ_0), the first-order term in a Taylor expansion of D(g, g_θ) around g_{θ_0} does not vanish to 0 under H_0. More specifically, the asymptotic distribution of λ_n is determined by the dominant term,

(5.5)    (1/2)[SSR_0 − SSR_1] = (1/2){ Σ_{t=1}^n ε̂_t² − Σ_{t=1}^n [ε̂_t − m̂_h(X_t)]² }
                              = Σ_{t=1}^n ε̂_t m̂_h(X_t) − (1/2) Σ_{t=1}^n m̂_h²(X_t).

The first term in (5.5) corresponds to the first-order term of a Taylor expansion of (5.4). It is a second-order V-statistic [Serfling (1980)], and after demeaning, it can be approximated by a second-order degenerate U-statistic. The second term in (5.5) corresponds to the second-order term of a Taylor expansion of (5.4). It is a third-order V-statistic and can be approximated by a second-order degenerate U-statistic after demeaning. These two degenerate U-statistics are of the same order of magnitude and jointly determine the asymptotic distribution of λ_n.
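The second equality in (5.5) is an exact algebraic identity, (1/2)[ε̂² − (ε̂ − m̂)²] = ε̂m̂ − (1/2)m̂² termwise, which can be confirmed with arbitrary vectors (our illustration; the arrays below are placeholders, not model output):

```python
import numpy as np

rng = np.random.default_rng(0)
eps_hat = rng.normal(size=200)   # stand-in for the residuals eps-hat_t
m_hat = rng.normal(size=200)     # stand-in for m-hat_h(X_t)

lhs = 0.5 * (np.sum(eps_hat ** 2) - np.sum((eps_hat - m_hat) ** 2))
rhs = np.sum(eps_hat * m_hat) - 0.5 * np.sum(m_hat ** 2)
print(bool(np.isclose(lhs, rhs)))  # True: the identity is exact
```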

Table 1 Asymptotic relative efficiency of the loss function test over the GLR test

          Uniform            Epanechnikov             Biweight                   Triweight
K(u)   (1/2)1(|u| ≤ 1)   (3/4)(1 − u²)1(|u| ≤ 1)   (15/16)(1 − u²)²1(|u| ≤ 1)   (35/32)(1 − u²)³1(|u| ≤ 1)
ARE1      2.80               2.04                     1.99                        1.98
ARE2      2.84               2.06                     2.01                        1.99

Note: ARE denotes Pitman's asymptotic relative efficiency of the loss function q_n test to the GLR λ_n test. ARE1 is for h = cn^{−1/5} and ARE2 is for h = cn^{−2/9}, for 0 < c < ∞.
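The kernel ratio entering (5.2) can be reproduced by quadrature. For the uniform kernel, ∫[2K(u) − ∫K(u + v)K(v)dv]²du = 5/6 and ∫[∫K(u + v)K(v)dv]²du = 1/3, so the ratio is 5/2 before raising it to the power 1/(2 − pω) (our sketch; the ARE values in Table 1 are the paper's own):

```python
import numpy as np

def kern(u):
    """Uniform kernel K(u) = (1/2) 1(|u| <= 1)."""
    return 0.5 * (np.abs(u) <= 1.0)

du = 1e-3
u = np.arange(-2.5, 2.5, du)
conv = np.array([np.sum(kern(x + u) * kern(u)) * du for x in u])  # K*K

num = np.sum((2.0 * kern(u) - conv) ** 2) * du  # \int [2K - K*K]^2 du
den = np.sum(conv ** 2) * du                    # \int [K*K]^2 du
print(round(num / den, 2))                      # 2.5 for the uniform kernel
```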

6. Monte Carlo evidence. We now compare the finite sample performance of the loss function q_n test and the GLR λ_n test. To examine the sizes of the tests, we consider the following null linear regression model in a time series context:

DGP 0 (Linear regression).

    Y_t = 1 + X_t + ε_t,
    X_t = 0.5X_{t−1} + v_t,
    v_t ~ i.i.d. N(0, 1).

Here, X_t is truncated within its two standard deviations. To examine the robustness of the tests, we consider a variety of distributions for the error ε_t: (i) ε_t ~ i.i.d. N(0, 1), (ii) ε_t ~ i.i.d. Student-t_5, (iii) ε_t ~ i.i.d. U[0, 1], (iv) ε_t ~ i.i.d. ln N(0, 1) and (v) ε_t ~ i.i.d. χ²_1, where the ε_t in (iii)–(v) have been scaled to have mean 0 and variance 1.
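A minimal simulator for DGP 0 (our sketch; the truncation rule below, clipping X_t at two stationary standard deviations, is one reading of "truncated within its two standard deviations"):

```python
import numpy as np

def simulate_dgp0(n, rng):
    """DGP 0: Y_t = 1 + X_t + eps_t, X_t = 0.5 X_{t-1} + v_t, v_t ~ i.i.d. N(0,1)."""
    sd_x = np.sqrt(1.0 / (1.0 - 0.5 ** 2))   # stationary std. dev. of the AR(1) process
    x = np.zeros(n)
    v = rng.normal(size=n)
    for t in range(1, n):
        x[t] = 0.5 * x[t - 1] + v[t]
    x = np.clip(x, -2.0 * sd_x, 2.0 * sd_x)  # truncate at two standard deviations
    eps = rng.normal(size=n)                 # error case (i); swap in t_5, U[0,1], etc.
    return x, 1.0 + x + eps

x, y = simulate_dgp0(500, np.random.default_rng(42))
print(x.shape, y.shape)
```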

Because the asymptotic normal approximation for the q_n and λ_n tests might not perform well in finite samples, we also use a conditional bootstrap procedure based on the Wilks phenomena:

Step 1: Obtain the parameter estimator θ̂_0 (e.g., OLS) of the null linear regression model, and the nonparametric estimator ĝ(X_t).
Step 2: Compute the q_n statistic and the residual ε̂_t = Y_t − ĝ(X_t) from the nonparametric model.
Step 3: Conditionally on each X_t, draw a bootstrap error ε*_t from the centered empirical distribution of ε̂_t, and compute Y*_t = X_t′θ̂_0 + ε*_t. This forms a conditional bootstrap sample {X_t, Y*_t}_{t=1}^n.
Step 4: Use the conditional bootstrap sample {X_t, Y*_t}_{t=1}^n to compute a bootstrap statistic q*_n, using the same kernel K(·) and the same bandwidth h as in Step 2.
Step 5: Repeat Steps 3 and 4 a total of B times, where B is a large number. We then obtain a collection of bootstrap test statistics, {q*_{nl}}_{l=1}^B.
Step 6: Compute the bootstrap P value P* = B^{−1} Σ_{l=1}^B 1(q*_{nl} > q_n).
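Steps 1–6 can be sketched as follows (schematic only, not the authors' code: `loss_stat` is a hypothetical placeholder for whatever routine computes q_n from a sample, and `nw_fit` is a minimal Nadaraya–Watson smoother standing in for ĝ):

```python
import numpy as np

def nw_fit(x, y, h):
    """Nadaraya-Watson estimate of E(Y|X) at the sample points, uniform kernel."""
    w = 0.5 * (np.abs((x[:, None] - x[None, :]) / h) <= 1.0)
    return (w @ y) / np.maximum(w.sum(axis=1), 1e-12)

def conditional_bootstrap_pvalue(x, y, loss_stat, h, B=499, rng=None):
    """Bootstrap P value for a test statistic, following steps 1-6 in the text."""
    if rng is None:
        rng = np.random.default_rng()
    # Step 1: OLS estimator of the null linear model and nonparametric estimator.
    X = np.column_stack([np.ones_like(x), x])
    theta0, *_ = np.linalg.lstsq(X, y, rcond=None)
    g_hat = nw_fit(x, y, h)
    # Step 2: test statistic and centered nonparametric residuals.
    q_n = loss_stat(x, y)
    resid = y - g_hat
    resid = resid - resid.mean()
    # Steps 3-5: resample errors, rebuild Y*, recompute the statistic B times.
    q_star = np.empty(B)
    for b in range(B):
        eps_star = rng.choice(resid, size=len(y), replace=True)
        q_star[b] = loss_stat(x, X @ theta0 + eps_star)
    # Step 6: bootstrap P value = fraction of bootstrap statistics exceeding q_n.
    return float(np.mean(q_star > q_n))
```

Any statistic with the `loss_stat(x, y)` signature can be plugged in; rejecting at level α corresponds to a bootstrap P value below α.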

DGP 1 (Quadratic regression).

    Y_t = 1 + X_t + θX_t² + ε_t.

DGP 2 (Threshold regression).

    Y_t = 1 + X_t 1(X_t > 0) + (1 + θ)X_t 1(X_t ≤ 0) + ε_t.

DGP 3 (Smooth transition regression).

    Y_t = 1 + X_t + [1 − θF(X_t)]X_t + ε_t,    where F(X_t) = [1 + exp(−X_t)]^{−1}.

We consider various values for θ in each DGP to examine how the power of the tests changes as the value of θ changes.

To examine the sensitivity of all tests to the choice of h, we consider h = S_X n^{−ω} for ω = 2/9 and 1/5, respectively, where S_X is the sample standard deviation of {X_t}_{t=1}^n. These correspond to the optimal rate of convergence in the sense of Ingster (1993a, 1993b, 1993c) and the optimal rate of estimation in terms of mean squared errors, respectively. The results are similar. Here, we focus our discussion on the results with h = S_X n^{−2/9}, as reported in Tables 2–6. The results with h = S_X n^{−1/5} are reported in Tables S.1–S.5 of the supplementary material. We use the uniform kernel K(z) = (1/2)1(|z| ≤ 1) for all tests. We have also used the biweight kernel, and the results are very similar (so they are not reported here).

Tables 2 and 3 report the empirical rejection rates of the tests under H_0 (DGP 0) at the 10% and 5% levels, using asymptotic and bootstrap critical values, respectively. We first examine the size of the tests using asymptotic critical values, with n = 100, 250 and 500, respectively.
Table 2
Empirical sizes of tests using asymptotic critical values

[Entries: empirical rejection rates (in percent) of the GLR, q_n and q_n^0 tests at the 10% and 5% levels, for n = 100, 250 and 500, under DGPs S.1–S.5 and linex parameters (α, β) = (0.0, 1), (0.2, 1), (0.5, 1) and (1.0, 1).]

Notes: (i) 1000 iterations; (ii) GLR, the generalized likelihood ratio test; q_n and q_n^0, loss function-based tests; (iii) SSR_0, the sum of squared residuals of the null linear model; (iv) SSR_1, the sum of squared residuals of the nonparametric regression estimates; q_n is standardized by SSR_1 and q_n^0 is standardized by SSR_0; (v) the q_n and q_n^0 tests are based on the linex loss function d(z) = β[exp(αz) − 1 − αz]; (vi) the bandwidth h = S_X n^{−2/9}, where S_X is the sample standard deviation of {X_t}_{t=1}^n; the uniform kernel is used for GLR, q_n and q_n^0. DGP S.1: i.i.d. normal errors; DGP S.2: i.i.d. Student-t_5 errors; DGP S.3: i.i.d. uniform errors; DGP S.4: i.i.d. log-normal errors; DGP S.5: i.i.d. chi-square errors.

Table 3
Empirical sizes of tests using bootstrap critical values

[Same design and notes as Table 2, with bootstrap rather than asymptotic critical values.]

Table 2 shows that all tests, λ_n, q_n and q_n^0, have reasonable sizes in finite samples, and they are robust to the various error distributions, but they all show some underrejection, particularly at the 10% level. The q_n and λ_n tests have similar sizes in most cases, whereas q_n^0 shows a bit more underrejection. Overall, the sizes of the q_n, q_n^0 and λ_n tests display some underrejection in most cases in finite samples, but they are not unreasonable.

Next, we examine the size of the tests based on the bootstrap. Table 3 shows that, overall, the rejection rates of all tests based on the bootstrap are close to the significance levels (10% and 5%), indicating the gain from using the bootstrap in finite samples. The sizes of all tests are robust to a variety of error distributions, confirming the Wilks phenomena that the asymptotic distributions of both the q_n and λ_n tests are distribution-free. For the loss function tests q_n and q_n^0, the sizes are very similar for different choices of the parameters (α, β) governing the shape of the linex loss function. We note that when asymptotic critical values are used, the sizes of the tests with h = S_X n^{−2/9} are slightly better than with h = S_X n^{−1/5}. When bootstrap critical values are used, however, the sizes of all tests with h = S_X n^{−2/9} and h = S_X n^{−1/5}, respectively, are very similar.

Next, we turn to the powers of the tests under H_A. Since the sizes of the tests using asymptotic critical values are different in finite samples, we use the bootstrap procedure only, which delivers similar sizes close to the significance levels and thus provides a fair ground for comparison. Tables 4–6 report the empirical rejection rates of the tests under DGP 1 (quadratic regression), DGP 2 (threshold regression) and DGP 3 (smooth transition regression), respectively. For all DGPs, the loss function tests q_n and q_n^0 are more powerful than the GLR test, confirming our asymptotic efficiency analysis.
Interestingly, for the two loss function tests, q_n, which is standardized by the nonparametric SSR_1, is roughly equally powerful to q_n^0, which is standardized by the parametric SSR_0, although asymptotic analysis suggests that q_n should be more powerful than q_n^0 under H_A, because SSR_0 is significantly larger than SSR_1 under H_A. Obviously, this is due to the use of the bootstrap. Since the bootstrap statistics q*_n and q_n^{0*} are standardized by SSR*_1 and SSR*_0, respectively, where SSR*_1 < SSR*_0, the ranking between q*_n and q_n remains more or less similar to that between q_n^{0*} and q_n^0, and therefore the bootstrap q_n and q_n^0 tests have similar power.

Under DGP 1, the powers of q_n and q_n^0 increase as the degree of asymmetry of the linex loss function, which is indexed by α, increases. When α = 1, the powers of q_n and q_n^0 are substantially higher than that of the GLR test. Under DGP 2, there is some tendency for the powers of q_n and q_n^0 to increase in α for θ < 0, whereas they decrease in α for θ > 0. When θ is close to 0, q_n and q_n^0 have power similar to the GLR test, but as |θ| increases, they are more powerful than the GLR test. Similarly, under DGP 3, the powers of q_n and q_n^0 increase in α for θ < 0, whereas they decrease in α for θ > 0. Nevertheless, by and large, the
Table 4
Empirical powers of tests using bootstrap critical values
DGP P.1: Quadratic regression, Y_t = 1 + X_t + θX_t² + ε_t

[Entries: empirical rejection rates (in percent) of the GLR, q_n and q_n^0 tests at the 10% and 5% levels, for n = 100, 250 and 500, θ = 0.1, 0.2, 0.3, 0.5 and 1.0, and linex parameters (α, β) = (0.0, 1), (0.2, 1), (0.5, 1) and (1.0, 1).]

Table 5
Empirical powers of tests using bootstrap critical values
DGP P.2: Threshold regression, Y_t = 1 + X_t 1(X_t > 0) + (1 + θ)X_t 1(X_t ≤ 0) + ε_t

[Entries: empirical rejection rates for both negative and positive values of θ, same design as Table 4.]

Table 6
Empirical powers of tests
DGP P.3: Smooth transition regression, Y_t = 1 + X_t + [1 − θF(X_t)]X_t + ε_t, F(X_t) = [1 + exp(−X_t)]^{−1}

[Entries: empirical rejection rates for both negative and positive values of θ, same design as Table 4.]

Notes (Tables 4–6): (i) 1000 iterations; (ii) GLR, the generalized likelihood ratio test; q_n and q_n^0, loss function-based tests; (iii) q_n is standardized by SSR_1, the sum of squared residuals of the nonparametric regression estimates, and q_n^0 is standardized by SSR_0, the sum of squared residuals of the null linear model; (iv) the q_n and q_n^0 tests are based on the linex loss function with parameters (α, β); (v) the uniform kernel and the bandwidth h = S_X n^{−2/9} are used, where S_X is the sample standard deviation of {X_t}_{t=1}^n; (vi) {ε_t} ~ i.i.d. N(0, 1).

powers of both q_n and q_n^0 do not change much across the different choices of the parameters (α, β) governing the shape of the linex loss function. Although the shape of the loss function changes dramatically when α changes from 0 to 1, the powers of q_n and q_n^0 remain relatively robust. All tests become more powerful as the departure from linearity increases (as characterized by the value of θ in each DGP) and as the sample size n increases.

7. Conclusion. The GLR test has been proposed as a generally applicable method for nonparametric testing problems. It inherits many advantages of the maximum LR test for parametric models. In this paper, we have shown that despite its general nature and many appealing features, the GLR test does not have the optimal power property of the classical LR test. We propose a loss function test in a time series context. The new test enjoys the same appealing features as the GLR test but is more powerful in terms of Pitman's asymptotic efficiency. This holds no matter what kernel and bandwidth are used, and even when the true likelihood function is available for the GLR test. The efficiency gain, together with the greater relevance of loss functions to decision making under uncertainty, suggests that the loss function approach can be a generally applicable and powerful nonparametric inference procedure alternative to the GLR principle.

MATHEMATICAL APPENDIX

Throughout the appendix, we let m̃_h(x) be defined in the same way as m̂_h(x) in (4.2), with {ε_t = Y_t − g_0(X_t)}_{t=1}^n replacing {ε̂_t = Y_t − g(X_t, θ̂_0)}_{t=1}^n. Also, C ∈ (1, ∞) denotes a generic bounded constant. This appendix provides the structure of our proof strategy. We leave the detailed proofs of most technical lemmas and propositions to the supplementary material.

Proof of Theorem 1. Theorem 1 follows as a special case of Theorem 3(i) with δ(X_t) = 0.

Proof of Theorem 2. Theorem 2 follows as a special case of Theorem 3(ii) with $\delta(X_t) = 0$. $\Box$

Proof of Theorem 3(i). We shall first derive the asymptotic distribution of $q_n$ under $\mathbb{H}_n(a_n)$. From Lemmas A.1 and A.2 and Propositions A.1 and A.2 below, we can obtain
$$
\frac{h^{p/2} q_n - h^{-p/2} D^{-1} \sigma^{-2} \int K^2(u)\,du \int \sigma^2(x)\,dx}
{\sqrt{2\sigma^{-4} \int [\int K(u)K(u+v)\,du]^2\,dv \int \sigma^4(x)\,dx}}
\stackrel{d}{\longrightarrow} N(\psi, 1).
$$
The desired result of Theorem 3(i) then follows immediately. $\Box$
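The centering and scaling of $q_n$ depend on the kernel only through the constants $\int K^2(u)\,du$ and $\int [\int K(u)K(u+v)\,du]^2\,dv$. As an illustration (ours, not part of the paper), these can be evaluated by grid integration for the uniform kernel $K(u) = \tfrac12 \mathbf{1}(|u| \le 1)$ used in the simulations, for which they equal $1/2$ and $1/3$, respectively:

```python
import numpy as np

# Uniform kernel K(u) = (1/2) 1{|u| <= 1}, as in the paper's simulations.
K = lambda u: 0.5 * (np.abs(u) <= 1.0)

u = np.linspace(-3.0, 3.0, 6001)     # grid covering supp(K) and supp(K*K)
du = u[1] - u[0]

int_K2 = np.sum(K(u) ** 2) * du      # int K^2(u) du  (= 1/2 here)

# b(K)-type constant: int [ int K(u)K(u+v) du ]^2 dv  (= 1/3 here)
conv = np.array([np.sum(K(u) * K(u + v)) * du for v in u])
b_K = np.sum(conv ** 2) * du
```

The grid width and spacing are conveniences; any quadrature rule covering the supports of $K$ and its self-convolution gives the same constants.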

Lemma A.1. Under the conditions of Theorem 3, $\hat Q_n = D \sum_{t=1}^n \hat m_h^2(X_t) + o_p(h^{-p/2})$.

Lemma A.2. Under the conditions of Theorem 3, $\sum_{t=1}^n \hat m_h^2(X_t) = n \int \hat m_h^2(x) f(x)\,dx + o_p(h^{-p/2})$.

Proposition A.1. Under the conditions of Theorem 3, $n \int \hat m_h^2(x) f(x)\,dx = n \int \tilde m_h^2(x) f(x)\,dx + h^{-p/2} E[\delta^2(X_t)] + o_p(h^{-p/2})$.

Proposition A.2. Under the conditions of Theorem 3, and under $\mathbb{H}_n(a_n)$ with $a_n = n^{-1/2} h^{-p/4}$,

$$
\left( nh^{p/2} \int \tilde m_h^2(x) f(x)\,dx - h^{-p/2} a(K) \int \sigma^2(x)\,dx \right) \Big/ \sqrt{2 b(K) \int \sigma^4(x)\,dx}
\stackrel{d}{\longrightarrow} N(\psi, 1).
$$

Proof of Lemma A.1. Given in the supplementary material. $\Box$

Proof of Lemma A.2. Given in the supplementary material. 

Proof of Proposition A.1. Given in the supplementary material. 

Proof of Proposition A.2. Proposition A.2 follows from Lemmas A.3 and A.4 below. 

Lemma A.3. Put $\hat H_q = n^{-1} \sum_{t=2}^n \sum_{s=1}^{t-1} H_n(Z_t, Z_s)$, where $Z_t = (\varepsilon_t, X_t')'$, $H_n(Z_t, Z_s) = 2\varepsilon_t \varepsilon_s W_h(X_t, X_s)$, and
$$
W_h(X_t, X_s) = \int \frac{K_h(X_t - x) K_h(X_s - x)}{f(x)}\,dx.
$$
Suppose Assumptions A.1 and A.4 hold, $h \propto n^{-\omega}$ for $\omega \in (0, 1/2p)$ and $p < 4$. Then
$$
n \int \tilde m_h^2(x) f(x)\,dx = h^{-p} \int K^2(u)\,du \int \sigma^2(x)\,dx + \hat H_q + o_p(h^{-p/2}).
$$

Lemma A.4. Suppose Assumptions A.1 and A.4 hold, and $h \propto n^{-\omega}$ for $\omega \in (0, 1/2p)$. Define $V_q = 2 \int [\int K(v) K(u+v)\,dv]^2\,du \int \sigma^4(x)\,dx$. Then
$$
V_q^{-1/2} h^{p/2} \hat H_q \stackrel{d}{\longrightarrow} N(\psi, 1).
$$

Since Lemmas A.3 and A.4 are the key results for deriving the asymptotic distribution of the proposed $q_n$ statistic when $\{\varepsilon_t\}$ may be neither an i.i.d. sequence nor a martingale difference sequence, we provide detailed proofs for them below.

Proof of Lemma A.3. Let $\hat F_n(x)$ be the empirical distribution function of $\{X_t\}_{t=1}^n$. We have
$$
\begin{aligned}
n \int \tilde m_h^2(x) f(x)\,dx
&= n \int \frac{[n^{-1}\sum_{s=1}^n \varepsilon_s K_h(x - X_s)]^2}{f(x)}\,dx \\
&\quad + n \int \Big[ n^{-1}\sum_{s=1}^n \varepsilon_s K_h(x - X_s) \Big]^2
\Big[ \frac{1}{\hat f^2(x)} - \frac{1}{f^2(x)} \Big] f(x)\,dx \\
&= n^{-1} \sum_{t=1}^n \sum_{s=1}^n \varepsilon_t \varepsilon_s \int \frac{K_h(X_t - x) K_h(X_s - x)}{f(x)}\,dx
+ O_p(n^{-1} h^{-p}) O_p(n^{-1/2} h^{-p/2} \ln n + h^2) \\
&= n^{-1} \sum_{t=1}^n \varepsilon_t^2 \int \frac{K_h^2(X_t - x)}{f(x)}\,dx
+ n^{-1} \sum_{1 \le s < t \le n} 2\varepsilon_t \varepsilon_s \int \frac{K_h(X_t - x) K_h(X_s - x)}{f(x)}\,dx + o_p(h^{-p/2}) \\
&= \hat C_q + \hat H_q + o_p(h^{-p/2}),
\end{aligned}
\tag{A.1}
$$
where we have made use of the fact that $\sup_{x \in G} |\hat f(x) - f(x)| = O_p(n^{-1/2} h^{-p/2} \ln n + h^2)$ given Assumption A.2, and $h \propto n^{-\omega}$ for $\omega \in (0, 1/2p)$.

By change of variable, the law of iterated expectations and Assumption A.1, we can obtain
$$
E(\hat C_q) = \iint \sigma^2(x) \frac{K_h^2(y - x)}{f(x)} f(y)\,dx\,dy
= h^{-p} \int K^2(u)\,du \int \sigma^2(x)\,dx\,[1 + O(h^2)].
\tag{A.2}
$$
On the other hand, by Chebyshev's inequality and the fact that $E(\hat C_q - E\hat C_q)^2 = O(n^{-1} h^{-2p})$ given Assumption A.1, we have
$$
\hat C_q = E(\hat C_q) + O_p(n^{-1/2} h^{-p}).
\tag{A.3}
$$
Combining (A.1)–(A.3) and $p < 4$ then yields the desired result of Lemma A.3. $\Box$

Proof of Lemma A.4. Because $E[H_n(Z_t, z)] = E[H_n(z', Z_s)] = 0$ for all $z, z'$, $\hat H_q \equiv n^{-1} \sum_{1 \le s < t \le n} H_n(Z_t, Z_s)$ is a degenerate U-statistic.

Applying a central limit theorem for degenerate U-statistics in a time series context [cf. Tenreiro (1997)], we have
$$
\Big[ n^{-2} \sum_{1 \le s < t \le n} h^p E H_n^2(Z_t, Z_s) \Big]^{-1/2} h^{p/2} \hat H_q \stackrel{d}{\longrightarrow} N(\psi, 1),
$$
provided the following conditions hold for some constants $\delta_0 > 0$, $\gamma_0 < 2$ and $\gamma_1 > 0$: (i) $u_n(4 + \delta_0) = O(n^{\gamma_0})$; (ii) $v_n(2) = o(1)$; (iii) $w_n(2 + \frac{\delta_0}{2}) = o(n^{1/2})$; and (iv) $z_n(2) n^{\gamma_1} = O(1)$, where
$$
\begin{aligned}
u_n(r) &= h^{p/2} \max_{1 \le t \le n} \max\{ \|H_n(Z_t, Z_0)\|_r, \|H_n(Z_0, \bar Z_0)\|_r \}, \\
v_n(r) &= h^p \max_{1 \le t \le n} \max\{ \|G_{n0}(Z_t, Z_0)\|_r, \|G_{n0}(Z_0, \bar Z_0)\|_r \}, \\
w_n(r) &= h^p \|G_{n0}(Z_0, Z_0)\|_r, \\
z_n(r) &= h^p \max_{0 \le t \le n, 1 \le s \le n} \max\{ \|G_{ns}(Z_t, Z_0)\|_r, \|G_{ns}(Z_0, Z_t)\|_r, \|G_{ns}(Z_0, \bar Z_0)\|_r \},
\end{aligned}
$$
$G_{ns}(u, v) = E[H_n(Z_s, u) H_n(Z_0, v)]$ for $s \in \mathbb{N}$ and $u, v \in \mathbb{R}^{p+1}$, $\bar Z_0$ is an independent copy of $Z_0$, and $\|\xi\|_r = [E|\xi|^r]^{1/r}$.

We first show $n^{-2} \sum_{1 \le s < t \le n} h^p E[H_n^2(Z_t, Z_s)] \to V_q$ as $n \to \infty$; this follows by change of variable and Assumption A.1.

To verify condition (i), note that
$$
\begin{aligned}
E|h^{p/2} H_n(Z_t, Z_0)|^r
&= 2^r h^{(p/2)r} E|\varepsilon_t^r \varepsilon_0^r W_h^r(X_t, X_0)| \\
&\le 2^r h^{(p/2)r} (E\varepsilon_0^{2cr})^{1/c} (E|W_h(X_t, X_0)|^{cr})^{1/c} \\
&\le C h^{(p/2)r} \Big( \int |W_h(x, x_0)|^{cr} f_{X_t, X_0}(x, x_0)\,dx\,dx_0 \Big)^{1/c} \\
&\le C h^{(p/2)r} (h^{-pcr} h^p)^{1/c} \le C h^{-(r/2)p + p/c}
\end{aligned}
$$
for all $c > 1$, given $E(\varepsilon_t^{8+\delta}) \le C$. We obtain $\|h^{p/2} H_n(Z_t, Z_0)\|_r = (C h^{-(r/2)p + p/c})^{1/r} \le C h^{-p/2 + p/(cr)}$. Given $h \propto n^{-\omega}$ for $\omega \in (0, 1/2p)$, we have $\|h^{p/2} H_n(Z_t, Z_0)\|_r \le C n^{\omega p(1/2 - 2/(8+\delta))}$, with $c = \frac{8+\delta}{2r}$ and if $r < 4 + \frac{\delta}{2}$. By a similar argument, replacing $f_{X_t, X_0}(x, x_0)$ with $f(x) f(x_0)$, we can obtain the same order of magnitude for $\|h^{p/2} H_n(Z_0, \bar Z_0)\|_r$. Hence, we obtain $u_n(r) \le C n^{\omega p(1/2 - 2/(8+\delta))}$, and condition (i) holds by setting $\gamma_0 = \omega p (\frac{1}{2} - \frac{2}{8+\delta})$.

Now we verify condition (ii). Note that for all $s \ge 0$, we have
$$
G_{ns}(z, z') = E[H_n(Z_s, z) H_n(Z_0, z')]
= 4 E[\varepsilon_s \varepsilon W_h(X_s, x)\,\varepsilon_0 \varepsilon' W_h(X_0, x')]
= 4 \varepsilon \varepsilon'\, E[\varepsilon_s \varepsilon_0 W_h(X_s, x) W_h(X_0, x')],
$$
where $z = (\varepsilon, x')'$ and $z' = (\varepsilon', x'^{\prime})'$. To compute the order of magnitude of $v_n(r)$, we first consider the case $s = 0$. We have
$$
G_{n0}(z, z') = 4 \varepsilon \varepsilon'\, E_0[\bar\varepsilon_0^2 W_h(\bar X_0, x) W_h(\bar X_0, x')]
= 4 \varepsilon \varepsilon'\, E_0[\sigma^2(\bar X_0) W_h(\bar X_0, x) W_h(\bar X_0, x')],
$$
where $E_0(\cdot)$ is an expectation taken over $(\bar X_0, \bar\varepsilon_0)$. By the Cauchy–Schwarz inequality and change of variables, we have
$$
\begin{aligned}
E|h^p G_{n0}(Z_t, Z_0)|^2
&= 16 E|h^{2p} \varepsilon_t^2 \varepsilon_0^2 E_0^2[\sigma^2(\bar X_0) W_h(\bar X_0, X_t) W_h(\bar X_0, X_0)]| \\
&\le 16 h^{2p} E|\varepsilon_t^2 \varepsilon_0^2 [E_0 \sigma^{2c}(\bar X_0)]^{2/c} [E_0 W_h^c(\bar X_0, X_t) W_h^c(\bar X_0, X_0)]^{2/c}| \\
&\le 16 h^{2p} C [E|\varepsilon_t^4 \varepsilon_0^4|]^{1/2} \{ E|E_0(W_h^c(\bar X_0, X_t) W_h^c(\bar X_0, X_0))|^{4/c} \}^{1/2} \\
&\le C h^{2p} \{ E|h^{-2cp+p} A_{c,h}(X_t, X_0)|^{4/c} \}^{1/2}
= O(h^{2p} [h^{(-2cp+p)(4/c)} h^p]^{1/2}) = O(h^{(2/c - 3/2)p})
\end{aligned}
$$
for any $c > 1$, where

$$
E_0[W_h^c(\bar X_0, X_t) W_h^c(\bar X_0, X_0)]
= \int W_h^c(\bar x_0, X_t) W_h^c(\bar x_0, X_0) f(\bar x_0)\,d\bar x_0
= h^{-2cp+p} A_{c,h}(X_t, X_0)
$$
by change of variable, where $A_{c,h}(X_t, X_0)$ is a function similar to $K_h(X_t - X_0)$. Thus, we obtain $\|h^p G_{n0}(Z_t, Z_0)\|_2 \le C h^{(1/c - 3/4)p}$. By a similar argument, we obtain the same order of magnitude for $\|h^p G_{n0}(Z_t, \bar Z_0)\|_2$. Thus, we have $v_n(2) \le C h^{(1/c - 3/4)p}$, and condition (ii) holds, that is, $v_n(2) = o(1)$, with $1 < c < \frac{4}{3}$.

Next, we verify condition (iii). By a similar argument, $E_0[W_h^{2c}(\bar X_0, X_0)] = \int W_h^{2c}(\bar x_0, X_0) f_{\bar X_0}(\bar x_0)\,d\bar x_0 = O(h^{(1-2c)p})$ by change of variable. Thus, we obtain $\|h^p G_{n0}(Z_0, Z_0)\|_r \le C h^{p(1/c - 1)} = C n^{\omega p(1 - 1/c)}$ given $h \propto n^{-\omega}$. Thus condition (iii) holds by choosing $c$ sufficiently small subject to the constraint $c > 1$.

Finally, we verify condition (iv). We first consider the case with $t = 0$ and $s \ne 0$. We have, by the Cauchy–Schwarz inequality and change of variables,
$$
\begin{aligned}
E|h^p G_{ns}(Z_0, Z_0)|^2
&= 16 E|h^{2p} \varepsilon_0^2 \varepsilon_0^2 E_0^2[\bar\varepsilon_0 \bar\varepsilon_s W_h(\bar X_s, X_0) W_h(\bar X_0, X_0)]| \\
&\le 16 h^{2p} E|\varepsilon_0^4 (E_0 \bar\varepsilon_s^c \bar\varepsilon_0^c)^{2/c} [E_0 W_h^c(\bar X_s, X_0) W_h^c(\bar X_0, X_0)]^{2/c}| \\
&\le 16 h^{2p} (E|\varepsilon_0|^8)^{1/2} [E|E_0^{4/c} W_h^c(\bar X_s, X_0) W_h^c(\bar X_0, X_0)|]^{1/2} \\
&= O(h^{2p} [h^{2(1-c)p \cdot (4/c)}]^{1/2}) = O(h^{2(2/c - 1)p}),
\end{aligned}
$$
where $E_0[W_h^c(\bar X_s, X_0) W_h^c(\bar X_0, X_0)] = \iint W_h^c(\bar x, X_0) W_h^c(\bar x_0, X_0) f_{\bar X_s \bar X_0}(\bar x, \bar x_0)\,d\bar x\,d\bar x_0 = O(h^{2(1-c)p})$ by change of variable. Thus, we have $\|h^p G_{ns}(Z_0, Z_0)\|_2 \le [C h^{2(2/c-1)p}]^{1/2} = C h^{(2/c-1)p}$, and so $n^{\gamma_1} \|h^p G_{ns}(Z_0, Z_0)\|_2 = n^{(1 - 2/c)\omega p + \gamma_1}$ if $h = O(n^{-\omega})$. Therefore, we obtain $\|h^p G_{ns}(Z_0, Z_0)\|_2 = O(n^{-\gamma_1})$ with $\gamma_1 = (\frac{2}{c} - 1)\omega p$, if we choose $c$ small enough for $1 < c < 2$. The remaining terms in $z_n(2)$ are of the same order by similar arguments, so condition (iv) holds. This completes the proof. $\Box$

Proof of Theorem 3(ii). We shall now derive the asymptotic distribution of $\lambda_n$ under $\mathbb{H}_n(a_n)$. From Lemmas A.5 and A.6 and Propositions A.3 and A.4 below, we have under $\mathbb{H}_n(a_n)$,

$$
\left( \lambda_n - h^{-p} \sigma^{-2} c(K) \int \sigma^2(x)\,dx \right) \Big/ \sqrt{2\sigma^{-4} d(K) \int \sigma^4(x)\,dx}
\stackrel{d}{\longrightarrow} N(\xi, 1).
$$

Lemma A.5. Under the conditions of Theorem 3, $\lambda_n = \frac{n}{2}\,\frac{SSR_0 - SSR_1}{SSR_1} + o_p(h^{-p/2})$ under $\mathbb{H}_n(a_n)$ with $a_n = n^{-1/2} h^{-p/4}$.

Lemma A.6. Under the conditions of Theorem 3, $\hat\sigma_n^2 \equiv n^{-1} SSR_1 = \sigma^2 + O_p(n^{-1/2})$ under $\mathbb{H}_n(a_n)$ with $a_n = n^{-1/2} h^{-p/4}$.

Proposition A.3. Let $\widetilde{SSR}_0$ and $\widetilde{SSR}_1$ be defined in the same way as $SSR_0$ and $SSR_1$, respectively, with $\{\varepsilon_t\}_{t=1}^n$ replacing $\{\hat\varepsilon_t\}_{t=1}^n$. Then under the conditions of Theorem 3, $SSR_0 - SSR_1 = \widetilde{SSR}_0 - \widetilde{SSR}_1 + h^{-p/2} E[\delta^2(X_t)] + o_p(h^{-p/2})$ under $\mathbb{H}_n(a_n)$ with $a_n = n^{-1/2} h^{-p/4}$.

Proposition A.4. Under the conditions of Theorem 3 and $\mathbb{H}_n(a_n)$ with $a_n = n^{-1/2} h^{-p/4}$,

$$
\left( \frac{\widetilde{SSR}_0 - \widetilde{SSR}_1}{2\sigma^2} - h^{-p} \sigma^{-2} c(K) \int \sigma^2(x)\,dx \right)
\Big/ \sqrt{2\sigma^{-4} d(K) \int \sigma^4(x)\,dx}
\stackrel{d}{\longrightarrow} N(\xi, 1).
$$

Proof of Lemma A.5. Given in the supplementary material. $\Box$

Proof of Lemma A.6. Given in the supplementary material. 

Proof of Proposition A.3. Given in the supplementary material. 

Proof of Proposition A.4. Proposition A.4 follows from Lemmas A.7 and A.8 below. 

Lemma A.7. Put $\hat H_\lambda = n^{-1} \sum_{t=2}^n \sum_{s=1}^{t-1} H_n(Z_t, Z_s)$, where $Z_t = (\varepsilon_t, X_t')'$, $H_n(Z_t, Z_s) = \varepsilon_t \varepsilon_s W_h(X_t, X_s)$ and
$$
W_h(X_t, X_s) = \left[ \frac{1}{f(X_t)} + \frac{1}{f(X_s)} \right] K_h(X_t - X_s)
- \int \frac{K_h(X_t - x) K_h(X_s - x)}{f(x)}\,dx.
$$
Suppose Assumptions A.1 and A.4 hold, $h \propto n^{-\omega}$ for $\omega \in (0, 1/2p)$ and $p < 4$. Then

$$
\widetilde{SSR}_0 - \widetilde{SSR}_1 = h^{-p} \left[ 2K(0) - \int K^2(u)\,du \right] \int \sigma^2(x)\,dx + \hat H_\lambda + o_p(h^{-p/2}).
$$

Lemma A.8. Suppose Assumptions A.1 and A.4 hold, and $h \propto n^{-\omega}$ for $\omega \in (0, 1/2p)$. Define
$$
V_\lambda = 2 \int \left[ K(u) - \frac{1}{2} \int K(v) K(u+v)\,dv \right]^2 du \int \sigma^4(x)\,dx.
$$
Then $V_\lambda^{-1/2} h^{p/2} \hat H_\lambda \stackrel{d}{\longrightarrow} N(\xi, 1)$.

Proof of Lemma A.7. Given in the supplementary material. $\Box$

Proof of Lemma A.8. Given in the supplementary material. 

Proof of Theorem 4. Pitman's asymptotic relative efficiency of the $q_n$ test over the $\lambda_n$ test is the limit of the ratio of the sample sizes required by the two tests to have the same asymptotic power at the same significance level, under the same local alternative; see Pitman (1979), Chapter 7.

Suppose $n_1$ and $n_2$ are the sample sizes required for the $q_n$ and $\lambda_n$ tests, respectively. Then Pitman's asymptotic relative efficiency of $q_n$ to $\lambda_n$ is defined as
$$
\text{ARE}(q_n : \lambda_n) = \lim_{n_1, n_2 \to \infty} \frac{n_1}{n_2}
\tag{A.5}
$$
under the condition that $\lambda_n$ and $q_n$ have the same asymptotic power under the same local alternatives $n_1^{-1/2} h_1^{-p/4} \delta_1(x) \sim n_2^{-1/2} h_2^{-p/4} \delta_2(x)$, in the sense that
$$
\lim_{n_1, n_2 \to \infty} \frac{n_1^{-1/2} h_1^{-p/4} \delta_1(x)}{n_2^{-1/2} h_2^{-p/4} \delta_2(x)} = 1.
$$
Given $h_i = c n_i^{-\omega}$, $i = 1, 2$, we have $n_1^{-2\gamma} E[\delta_1^2(X_t)] \sim n_2^{-2\gamma} E[\delta_2^2(X_t)]$, where $\gamma = \frac{2 - \omega p}{4}$. Hence,

$$
\lim_{n_1, n_2 \to \infty} \left( \frac{n_1}{n_2} \right)^{2\gamma} = \frac{E[\delta_2^2(X_t)]}{E[\delta_1^2(X_t)]}.
\tag{A.6}
$$
On the other hand, from Theorem 3(ii), we have

$$
\frac{\gamma(K)\lambda_{n_1} - \mu_{n_1}}{\sqrt{2\mu_{n_1}}} \stackrel{d}{\longrightarrow} N(\xi, 1)
$$
under $\mathbb{H}_{n_1}(a_{n_1}): g_0(X_t) = g(X_t, \theta_0) + n_1^{-1/2} h_1^{-p/4} \delta_1(X_t)$, where $\xi = E[\delta_1^2(X_t)] / [2\sigma^2 \sqrt{2 d(K) \int \sigma^4(x)\,dx}]$. Also, from Theorem 3(i), we have
$$
\frac{q_{n_2} - \nu_{n_2}}{\sqrt{2\nu_{n_2}}} \stackrel{d}{\longrightarrow} N(\psi, 1)
$$
under $\mathbb{H}_{n_2}(a_{n_2}): g_0(X_t) = g(X_t, \theta_0) + n_2^{-1/2} h_2^{-p/4} \delta_2(X_t)$, where $\psi = E[\delta_2^2(X_t)] / [\sigma^2 \sqrt{2 b(K) \int \sigma^4(x)\,dx}]$. To have the same asymptotic power, the noncentrality parameters must be equal, namely $\xi = \psi$, or
$$
\frac{E[\delta_1^2(X_t)]}{2\sqrt{2 d(K) \int \sigma^4(x)\,dx}} = \frac{E[\delta_2^2(X_t)]}{\sqrt{2 b(K) \int \sigma^4(x)\,dx}}.
\tag{A.7}
$$
Combining (A.5)–(A.7) yields

$$
\text{ARE}(q_n, \lambda_n)
= \left[ \frac{2\sqrt{d(K)}}{\sqrt{b(K)}} \right]^{1/(2\gamma)}
= \left[ \frac{4 d(K)}{b(K)} \right]^{1/(4\gamma)}
= \left[ \frac{\int (2K(v) - \int K(u)K(u+v)\,du)^2\,dv}{\int (\int K(u)K(u+v)\,du)^2\,dv} \right]^{1/(2 - \omega p)}.
$$

Finally, we show $\text{ARE}(q_n : \lambda_n) \ge 1$ for any positive kernel with $K(\cdot) \le 1$. For this purpose, it suffices to show
$$
\int \left[ 2K(v) - \int K(u)K(u+v)\,du \right]^2 dv \ge \int \left[ \int K(u)K(u+v)\,du \right]^2 dv,
$$
or equivalently,

$$
\int K^2(v)\,dv \ge \iint K(u)K(v)K(u+v)\,du\,dv.
$$
This last inequality follows from Zhang and Dette [(2004), Lemma 2]. This completes the proof. $\Box$
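As a numerical illustration (ours, not part of the paper), both the last inequality and the kernel ratio entering the ARE formula can be checked by grid integration for the uniform kernel $K(u) = \tfrac12 \mathbf{1}(|u| \le 1)$: for this kernel $\int K^2(v)\,dv = 1/2 \ge 3/8 = \iint K(u)K(v)K(u+v)\,du\,dv$, and the ratio works out to $(5/6)/(1/3) = 2.5 > 1$, consistent with $\text{ARE} \ge 1$.

```python
import numpy as np

K = lambda u: 0.5 * (np.abs(u) <= 1.0)   # uniform kernel, as in the simulations

u = np.linspace(-3.0, 3.0, 3001)
du = u[1] - u[0]

# kernel self-convolution L(v) = int K(u)K(u+v) du
L = np.array([np.sum(K(u) * K(u + v)) * du for v in u])

lhs = np.sum(K(u) ** 2) * du     # int K^2(v) dv                 (= 1/2 here)
rhs = np.sum(K(u) * L) * du      # int int K(u)K(v)K(u+v) du dv  (= 3/8 here)

# bracketed kernel ratio in the ARE formula; > 1 implies ARE > 1
ratio = np.sum((2.0 * K(u) - L) ** 2) / np.sum(L ** 2)
```

The same check can be run for any other positive kernel bounded by 1 (e.g., Epanechnikov) by swapping the definition of `K`.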

Acknowledgments. We would like to thank the Editor, an Associate Editor and two anonymous referees for insightful comments and suggestions which significantly improved our paper. We also thank participants of the 2008 International Symposium on Recent Developments in Time Series Econometrics, Xiamen, China, and the 2008 Symposium on Econometric Theory and Applications (SETA), Seoul, Korea, the 2009 Time Series Conference, CIREQ, Montreal, Canada, the Midwest Group Meeting, Purdue, and seminar participants at Cornell University, Hong Kong University of Science and Technology, Korea University, Ohio State University, Texas A&M and UC Riverside for their helpful comments and discussions.

SUPPLEMENTARY MATERIAL

Supplementary material for "A loss function approach to model specification testing and its relative efficiency" (DOI: 10.1214/13-AOS1099SUPP; .pdf). In this supplement, we present the detailed proofs of Theorems 1–4 and report the simulation results with the bandwidth $h = S_X n^{-1/5}$.

REFERENCES

Azzalini, A., Bowman, A. W. and Härdle, W. (1989). On the use of nonparametric regression for model checking. Biometrika 76 1–11. MR0991417
Azzalini, A. and Bowman, A. (1990). A look at some data on the Old Faithful geyser. Appl. Statist. 39 357–365.
Azzalini, A. and Bowman, A. (1993). On the use of nonparametric regression for checking linear relationships. J. Roy. Statist. Soc. Ser. B 55 549–557. MR1224417
Bahadur, R. R. (1958). Examples of inconsistency of maximum likelihood estimates. Sankhyā 20 207–210. MR0107331
Cai, Z., Fan, J. and Yao, Q. (2000). Functional-coefficient regression models for nonlinear time series. J. Amer. Statist. Assoc. 95 941–956. MR1804449
Christoffersen, P. F. and Diebold, F. X. (1997). Optimal prediction under asymmetric loss. Econometric Theory 13 808–817. MR1610075
Doukhan, P. (1994). Mixing: Properties and Examples. Lecture Notes in Statistics 85. Springer, New York. MR1312160

Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50 987–1007. MR0666121
Fan, J. and Huang, T. (2005). Profile likelihood inferences on semiparametric varying-coefficient partially linear models. Bernoulli 11 1031–1057. MR2189080
Fan, J. and Jiang, J. (2005). Nonparametric inferences for additive models. J. Amer. Statist. Assoc. 100 890–907. MR2201017
Fan, J. and Jiang, J. (2007). Nonparametric inference with generalized likelihood ratio tests. TEST 16 409–444. MR2365172
Fan, Y. and Li, Q. (2002). A consistent model specification test based on the kernel sum of squares of residuals. Econometric Rev. 21 337–352. MR1944979
Fan, J. and Yao, Q. (2003). Nonlinear Time Series: Nonparametric and Parametric Methods. Springer, New York. MR1964455
Fan, J., Zhang, C. and Zhang, J. (2001). Generalized likelihood ratio statistics and Wilks phenomenon. Ann. Statist. 29 153–193. MR1833962
Fan, J. and Zhang, C. (2003). A reexamination of diffusion estimators with applications to financial model validation. J. Amer. Statist. Assoc. 98 118–134. MR1965679
Fan, J. and Zhang, W. (2004). Generalised likelihood ratio tests for spectral density. Biometrika 91 195–209. MR2050469
Franke, J., Kreiss, J.-P. and Mammen, E. (2002). Bootstrap of kernel smoothing in nonlinear time series. Bernoulli 8 1–37. MR1884156
Gao, J. and Gijbels, I. (2008). Bandwidth selection in nonparametric kernel testing. J. Amer. Statist. Assoc. 103 1584–1594. MR2504206
Giacomini, R. and White, H. (2006). Tests of conditional predictive ability. Econometrica 74 1545–1578. MR2268409
Granger, C. W. J. (1999). Outline of forecast theory using generalized cost functions. Spanish Economic Review 1 161–173.
Granger, C. W. J. and Pesaran, M. H. (1999). Economic and statistical measures of forecast accuracy. Cambridge Working Papers in Economics 9910, Faculty of Economics, Univ. Cambridge.
Granger, C. W. J. and Pesaran, M. H. (2000). Economic and statistical measures of forecast accuracy. Journal of Forecasting 19 537–560.
Granger, C. W. J. and Teräsvirta, T. (1993). Modelling Nonlinear Economic Relationships. Oxford Univ. Press, New York.
Hansen, B. (1999). Testing for linearity. Journal of Economic Surveys 13 551–576.
Härdle, W. (1990). Applied Nonparametric Regression. Econometric Society Monographs 19. Cambridge Univ. Press, Cambridge. MR1161622
Härdle, W. and Mammen, E. (1993). Comparing nonparametric versus parametric regression fits. Ann. Statist. 21 1926–1947. MR1245774
Hjellvik, V. and Tjøstheim, D. (1996). Nonparametric statistics for testing of linearity and serial independence. J. Nonparametr. Stat. 6 223–251. MR1383053
Hong, Y. and Lee, Y.-J. (2005). Generalized spectral tests for conditional mean models in time series with conditional heteroscedasticity of unknown form. Rev. Econom. Stud. 72 499–541. MR2129829
Hong, Y. and Lee, Y. (2013). Supplement to "A loss function approach to model specification testing and its relative efficiency." DOI:10.1214/13-AOS1099SUPP.
Hong, Y. and White, H. (1995). Consistent specification testing via nonparametric series regression. Econometrica 63 1133–1159. MR1348516
Horowitz, J. L. and Spokoiny, V. G. (2001). An adaptive, rate-optimal test of a parametric mean-regression model against a nonparametric alternative. Econometrica 69 599–631. MR1828537

Ingster, Y. I. (1993a). Asymptotically minimax hypothesis testing for nonparametric alternatives. I. Math. Methods Statist. 2 85–114. MR1257978
Ingster, Y. I. (1993b). Asymptotically minimax hypothesis testing for nonparametric alternatives. II. Math. Methods Statist. 2 171–189.
Ingster, Y. I. (1993c). Asymptotically minimax hypothesis testing for nonparametric alternatives. III. Math. Methods Statist. 2 249–268. MR1259685
Le Cam, L. (1990). Maximum likelihood—An introduction. Internat. Statist. Rev. 58 153–171.
Lee, T.-H., White, H. and Granger, C. W. J. (1993). Testing for neglected nonlinearity in time series models: A comparison of neural network methods and alternative tests. J. Econometrics 56 269–290. MR1219165
Lepski, O. V. and Spokoiny, V. G. (1999). Minimax nonparametric hypothesis testing: The case of an inhomogeneous alternative. Bernoulli 5 333–358. MR1681702
Li, Q. and Racine, J. S. (2007). Nonparametric Econometrics: Theory and Practice. Princeton Univ. Press, Princeton, NJ. MR2283034
Pagan, A. and Ullah, A. (1999). Nonparametric Econometrics. Cambridge Univ. Press, Cambridge. MR1699703
Pan, J., Wang, H. and Yao, Q. (2007). Weighted least absolute deviations estimation for ARMA models with infinite variance. Econometric Theory 23 852–879. MR2395837
Peel, D. A. and Nobay, A. R. (1998). Optimal monetary policy in a model of asymmetric central bank preferences. FMG Discussion Paper 0306.
Pesaran, M. H. and Skouras, S. (2001). Decision based methods for forecast evaluation. In A Companion to Economic Forecasting (M. P. Clements and D. F. Hendry, eds.). Blackwell, Oxford.
Phillips, P. C. B. (1996). Econometric model determination. Econometrica 64 763–812. MR1399218
Pitman, E. J. G. (1979). Some Basic Theory for Statistical Inference. Chapman & Hall, London. MR0549771
Robinson, P. M. (1991). Consistent nonparametric entropy-based testing. Rev. Econom. Stud. 58 437–453. MR1108130
Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York. MR0595165
Sun, Y., Phillips, P. C. B. and Jin, S. (2008). Optimal bandwidth selection in heteroskedasticity–autocorrelation robust testing. Econometrica 76 175–194. MR2374985
Tenreiro, C. (1997). Loi asymptotique des erreurs quadratiques intégrées des estimateurs à noyau de la densité et de la régression sous des conditions de dépendance. Port. Math. 54 187–213. MR1467201
Varian, H. (1975). A Bayesian approach to real estate assessment. In Studies in Bayesian Econometrics and Statistics in Honor of Leonard J. Savage (S. E. Fienberg and A. Zellner, eds.). North-Holland, Amsterdam.
Vuong, Q. H. (1989). Likelihood ratio tests for model selection and nonnested hypotheses. Econometrica 57 307–333. MR0996939
Weiss, A. A. (1996). Estimating time series models using the relevant cost function. J. Appl. Econometrics 11 539–560.
White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica 50 1–25. MR0640163
Zellner, A. (1986). Bayesian estimation and prediction using asymmetric loss functions. J. Amer. Statist. Assoc. 81 446–451. MR0845882
Zhang, C. and Dette, H. (2004). A power comparison between nonparametric regression tests. Statist. Probab. Lett. 66 289–301. MR2045474

Department of Economics
and Department of Statistical Science
Cornell University
Uris Hall
Ithaca, New York 14850
USA
and
Wang Yanan Institute for Studies in Economics
and MOE Key Laboratory of Econometrics
Xiamen University
Xiamen 361005, Fujian
China
E-mail: [email protected]

Department of Economics
Indiana University
Wylie Hall
Bloomington, Indiana 47405
USA
E-mail: [email protected]