AN ALTERNATIVE ASYMPTOTIC ANALYSIS OF RESIDUAL-BASED STATISTICS
Elena Andreou and Bas J. M. Werker*

Abstract. This paper presents an alternative method to derive the limiting distribution of residual-based statistics. Our method does not impose an explicit assumption of (asymptotic) smoothness of the statistic of interest with respect to the model's parameters and thus is especially useful in cases where such smoothness is difficult to establish. Instead, we use a locally uniform convergence in distribution condition, which is automatically satisfied by residual-based specification test statistics. To illustrate, we derive the limiting distribution of a new functional form specification test for discrete choice models, as well as a runs-based test for conditional symmetry in dynamic volatility models.

Received for publication November 20, 2009. Revision accepted for publication August 6, 2010.
* Andreou: University of Cyprus; Werker: Tilburg University.
Part of this research was completed when E.A. held a Marie Curie fellowship at Tilburg University (MCFI-2000-01645) and while both authors were visiting the Statistical and Applied Mathematical Sciences Institute. E.A. acknowledges support of the European Research Council under the European Community FP7/2008-2012 ERC grant 209116. B.W. also acknowledges support from Mik. Comments by two anonymous referees, Anil Bera, Christel Bouquiaux, Rob Engle, Eric Ghysels, Lajos Horváth, Nour Meddahi, Bertrand Melenberg, Werner Ploberger, Eric Renault, Enrique Sentana, conference participants at the ESEM 2004, MEG 2004, and NBER time 2004 conferences, and seminar participants at the London School of Economics, Université de Montréal, and Tilburg University are kindly acknowledged.

I. Introduction

RESIDUAL-BASED tests are generally used for diagnostic checking of a proposed statistical model. Such specification tests are covered in many textbooks and remain of interest in ongoing research. Similarly, residual-based estimators, often referred to as two-step estimators, are widely applied in econometric work. Traditionally the asymptotic distribution of residual-based statistics (be it tests or estimators) is derived using a particular model specification, some more or less stringent assumptions about the statistic, and conditions on the first-step estimator employed. A key assumption is some form of (asymptotic) smoothness of the statistic with respect to the parameter to be estimated, as formalized first in Pierce (1982) and Randles (1982). Since then, this approach has been significantly extended (for example, Pollard, 1989; Newey & McFadden, 1994; Andrews, 1994).

We present a new and alternative approach that does not involve explicit smoothness conditions for the statistic of interest. Instead, we rely on a locally uniform weak convergence assumption shown to be generally (automatically) satisfied by residual-based statistics. Our approach offers a useful and unifying alternative, especially when smoothness conditions are nontrivial to establish or require additional regularity. Examples of such statistics are rank-based statistics (Hallin & Puri, 1991) and statistics based on nondifferentiable forecast error loss functions (McCracken, 2000). Abadie and Imbens (2009) present an application of our method to derive the asymptotic distribution theory of matching estimators based on the estimated propensity score that can be a nonsmooth function of the estimated parameters and for which standard bootstrap inference is often not valid (Abadie & Imbens, 2008). In applications where the statistic of interest is smooth, our conditions can be checked along the traditional lines. In order to illustrate our approach, we derive the limiting distribution of a new test based on Kendall's tau for omitted variables in binary choice models and a runs-based test for conditional symmetry in dynamic volatility models.

Our proposed method applies to general model specifications as long as they satisfy the uniform local asymptotic normality (ULAN) condition. Most of the standard econometric models satisfy this condition (see section IIIA for a more detailed discussion). The ULAN condition is central in Hájek and Le Cam's theory of asymptotic statistics (Bickel et al., 1993; Le Cam & Yang, 1990; Pollard, 2004; van der Vaart, 1998). We use this theory to derive our results. Other advances in econometric theory using the LAN approach can be found in Abadir and Distaso (2007), Jeganathan (1995), and Ploberger (2004). For ULAN models, our results offer a simple yet general method to derive the asymptotic distribution of residual-based statistics using initial $\sqrt{n}$-consistent estimators. Under the conditions imposed, our main theorem, theorem 3.1, asserts that the residual-based statistic is asymptotically normally distributed with a variance that is a simple function of the limiting variances and covariances of the innovation-based statistic, the central sequence (the ULAN equivalent of the derivative of the log likelihood), and the estimator.1 Using this approach, we can readily obtain the local power of such residual-based tests, which can also be interpreted in terms of specification tests with locally misspecified alternatives such as in Bera and Yoon (1993). In particular, this allows us to assess in which situations the local power of the residual-based test exceeds, falls below, or equals that of the innovation-based test.

To illustrate our method, we consider two applications. First, we derive the asymptotic distribution of a new nonparametric test for omitted variables in a binary choice model. Second, we discuss a runs-based test for conditional symmetry in dynamic volatility models. These applications purposely focus on nonparametric statistics as these are usually defined in terms of inherently nonsmooth statistics like ranks, signs, and runs. For these applications, an appropriate form of asymptotic smoothness can probably be established, but our technique offers a useful alternative for which this is not necessary. We introduce our applications in section II. A number of additional applications of our method can be found in Andreou and Werker (2009).

1 Throughout the paper, we use the term innovation-based statistic for the statistic applied to the true innovations in the model, that is, the statistic obtained if the true value of the model parameters were used.

The Review of Economics and Statistics, February 2012, 94(1): 88–99 © 2011 by the President and Fellows of Harvard College and the Massachusetts Institute of Technology

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/REST_a_00151 by guest on 25 September 2021 AN ALTERNATIVE ASYMPTOTIC ANALYSIS OF RESIDUAL-BASED STATISTICS 89

Although this paper deals mainly with residual-based testing, the results can be applied directly in the area of two-step estimation when assessing the estimation error in a second-step estimator calculated from the residuals of a model estimated in a first step. This problem has received considerable attention in the econometrics literature (see Murphy & Topel, 1985, 2002; Pagan, 1986). In the notation below, this would merely mean that the statistic $T_n$ should be taken as the second-step estimation error.

The rest of this paper is organized as follows. The next section introduces the applications we use to illustrate the scope of our technique. Section IIIA then presents the conditions we need to derive the limiting distribution of a residual-based statistic. Our main result is stated and discussed in section IIIB. Section IIIC uses our main theorem to derive the (local) power of residual-based tests and compares this with the local power of the underlying innovation-based tests. We indicate that a technical issue arises when making our ideas rigorous. Section IV addresses this by discretization, and we provide a formal proof of our main result. Section V concludes, and the appendix contains the proofs and some auxiliary results.

II. Two Motivational Applications

A. Omitted Variable Test for the Binary Choice Model

Consider the binary choice model,

$$P\{Y = 1 \mid X\} = F(X^T\theta), \tag{1}$$

where $Y$ denotes a binary response variable, $X$ some exogenous explanatory variables, and $F$ a given probability distribution function. We assume that the distribution function $F$ admits a continuous density $f$ and that the Fisher information matrix,

$$I_F(\theta) = E\left[\frac{f(X^T\theta)^2}{F(X^T\theta)\,(1 - F(X^T\theta))}\, XX^T\right], \tag{2}$$

exists and is continuous in $\theta$. For inference, an i.i.d. sample of observations $(Y_i, X_i)$, $i = 1, \ldots, n$, is available.

The generalized residuals, for given parameter value $\theta$, are defined as

$$\varepsilon_i^G(\theta) = \frac{Y_i - F(X_i^T\theta)}{F(X_i^T\theta)\,(1 - F(X_i^T\theta))}\, f(X_i^T\theta). \tag{3}$$

The classical test for functional specification checks for a possibly omitted variable $Z_i$ using the statistic

$$T_n(\theta) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \varepsilon_i^G(\theta)\, Z_i. \tag{4}$$

The statistic $T_n(\theta)$ is innovation based, as it depends on the unknown true value of the parameter $\theta$. The limiting distribution of this innovation-based statistic follows immediately from the classical central limit theorem as soon as $\varepsilon_i^G(\theta) Z_i$ has finite variance and zero mean.

In applications, the unknown parameter $\theta$ is replaced by an estimator $\hat\theta_n$, for instance, the maximum likelihood estimator $\hat\theta_n^{(ML)}$. This leads to the residual-based statistic $T_n(\hat\theta_n)$. The traditional way of deriving the limiting distribution of $T_n(\hat\theta_n)$ relies on linearizing the statistic $T_n(\theta)$ (see Pagan & Vella, 1989). This approach leads to

$$T_n(\hat\theta_n) \xrightarrow{L} N\left(0,\; E[WZZ^T] - E[WZX^T]\,(E[WXX^T])^{-1}\, E[WXZ^T]\right), \tag{5}$$

as $n \to \infty$, with

$$W = \frac{f(X^T\theta)^2}{F(X^T\theta)\,(1 - F(X^T\theta))}.$$

The test statistic, (4), checks for linear correlation between the generalized residuals and the possibly omitted variable $Z$. One could also be interested in a test with power against nonlinear forms of dependence based on Kendall's tau applied to the pairs $(\varepsilon_i^G(\theta), Z_i)$. For simplicity, we consider the case where the possibly omitted variables are univariate: $Z_i \in \mathbb{R}$. Recall that the population version of Kendall's tau is defined as

$$\tau = 4\,P\{\varepsilon_i^G(\theta) < \varepsilon_j^G(\theta),\; Z_i < Z_j\} - 1, \quad i \neq j. \tag{6}$$

An appropriately scaled innovation-based version of Kendall's tau is the U-statistic,

$$T_n^{\tau}(\theta) = \sqrt{n}\, \binom{n}{2}^{-1} \sum_{i=1}^{n} \sum_{j=1}^{i-1} \left[ 4\, I\{\varepsilon_i^G(\theta) < \varepsilon_j^G(\theta),\; Z_i < Z_j\} - 1 \right] \xrightarrow{L} N\left(0, \tfrac{4}{9}\right).$$

This limiting distribution, under the null hypothesis of independent $\varepsilon_i^G$ and $Z_i$, can be obtained using the projection theorem for U-statistics, for example, theorem 12.3 in van der Vaart (1998).

Deriving the limiting distribution of the residual-based statistic $T_n^{\tau}(\hat\theta_n)$ using linearization is less obvious due to the inherent nondifferentiability of the indicator functions in $T_n^{\tau}(\theta)$. Our approach to residual-based statistics gives this limiting distribution at about the same effort as for the smooth classical statistic $T_n(\theta)$. More precisely, using our technique we show, in section III,

$$T_n^{\tau}\big(\hat\theta_n^{(ML)}\big) \xrightarrow{L} N\left(0;\; \tfrac{4}{9} - \alpha^T I_F(\theta)\, \alpha\right), \tag{7}$$

with $\alpha$ defined in equation (14). This is not only a useful result that shows how the asymptotic distribution of Kendall's tau test statistic differs when applied to residuals (instead of innovations), but also a practical result given that the asymptotic variance in equation (7) can easily be estimated consistently (see section III for details). This test complements existing tests in the literature.
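The following is a minimal numerical sketch of the two ingredients described above: the generalized residuals of equation (3) and a $\sqrt{n}$-scaled sample Kendall's tau for the pairs (residual, $Z_i$). It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, NumPy is assumed to be available, and the sample tau uses the standard symmetric concordance form (the average of $\mathrm{sign}(e_i - e_j)\,\mathrm{sign}(z_i - z_j)$ over pairs), which estimates the population quantity in equation (6) when there are no ties. The variance correction in equation (7) is not computed here.

```python
import numpy as np


def generalized_residuals(y, X, theta, cdf, pdf):
    """Generalized residuals of equation (3) for a binary choice model.

    y     : (n,) array of 0/1 responses
    X     : (n, k) array of regressors
    theta : (k,) parameter value (true value or an estimate)
    cdf   : callable F, applied elementwise to the index X @ theta
    pdf   : callable f, the density of F
    """
    index = X @ theta
    p = cdf(index)
    return (y - p) * pdf(index) / (p * (1.0 - p))


def scaled_kendall_tau(e, z):
    """sqrt(n) times the sample Kendall's tau of the pairs (e_i, z_i).

    Uses the symmetric concordance kernel: +1 for a concordant pair,
    -1 for a discordant pair, 0 for a tie (an assumption of this sketch).
    """
    e = np.asarray(e, dtype=float)
    z = np.asarray(z, dtype=float)
    n = e.shape[0]
    sign_e = np.sign(e[:, None] - e[None, :])
    sign_z = np.sign(z[:, None] - z[None, :])
    upper = np.triu_indices(n, k=1)          # each unordered pair once
    return np.sqrt(n) * (sign_e * sign_z)[upper].mean()


def logistic_cdf(u):
    """Logistic distribution function (logit model)."""
    return 1.0 / (1.0 + np.exp(-u))


def logistic_pdf(u):
    """Logistic density, f = F (1 - F)."""
    p = logistic_cdf(u)
    return p * (1.0 - p)
```

In a logit model the density satisfies $f = F(1 - F)$, so the generalized residual reduces to $Y_i - F(X_i^T\theta)$. A residual-based statistic is obtained by passing an estimate of $\theta$ (for example, a maximum likelihood estimate computed elsewhere) as `theta` and feeding the resulting residuals, together with the candidate omitted variable, to `scaled_kendall_tau`.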

B. Runs Test for Symmetry in Dynamic Volatility Models

Consider the following time series model,

$$Y_t = \sigma_{t-1}(\theta)\,\varepsilon_t, \quad t = 1, \ldots, n, \tag{8}$$

where $\sigma_{t-1}(\theta)$ depends on past values $Y_{t-1}, Y_{t-2}, \ldots$ and $\{\varepsilon_t\}$ is a sequence of i.i.d. innovations. Assume that these innovations $\varepsilon_t$ have an absolutely continuous density $f$ with finite Fisher information for location and scale: $I_l := \int (f'(x)/f(x))^2 f(x)\,dx < \infty$ and $I_s := \int (1 + x f'(x)/f(x))^2 f(x)\,dx < \infty$. Finally, impose $E\varepsilon_t = 0$, $E\varepsilon_t^2 = 1$, and $\kappa_\varepsilon := E\varepsilon_t^4 < \infty$ and assume that a stationary and ergodic solution to equation (8) exists. These are standard assumptions for most GARCH-type models.

Many specification tests concerning the innovations in stochastic volatility models have been introduced and studied in the literature. We consider a nonparametric test of conditional symmetry based on Wald-Wolfowitz runs. One advantage of such a test is that it does not require the existence of any higher-order moments of the innovation distribution and thus can be considered robust to different distributions and outliers. This is particularly relevant given that there is no consensus in the empirical literature as to the form of heavy-tailed distributions in, for example, financial time series. This test for conditional symmetry counts the number of runs of all negative or all positive residuals. Formally, defining $I_t(\theta) = I\{\varepsilon_t(\theta) < 0\}$, the test statistic becomes

$$T_n(\theta) = \frac{1}{\sqrt{n}} \sum_{t=2}^{n} \left( [I_t(\theta) - I_{t-1}(\theta)]^2 - \frac{1}{2} \right). \tag{9}$$

This is a simple nonparametric test that complements existing tests for symmetry such as, for instance, in Bai and Ng (2001) and Bera and Premarantne (2005, 2009).

Using standard central limit results, one easily finds that the limiting null distribution of the innovation-based statistic $T_n(\theta)$ is $N(0, 1/4)$. Our detailed results in section III show that this asymptotic null distribution need not be adapted when applied to residuals of dynamic volatility models. The results in section IIIC furthermore show that the asymptotic local power of this runs test is the same whether applied to innovations or residuals.

III. Main Results

Our results are derived using the Hájek and Le Cam techniques of asymptotic statistics. We first introduce the assumptions needed. Subsequently we derive the asymptotic size and local power of residual-based tests.

A. Assumptions

Let us formally introduce the models we consider in this paper. Let $\mathcal{E}^{(n)}$ denote a sequence of experiments defined on a common parameter set $\Theta \subset \mathbb{R}^k$:

$$\mathcal{E}^{(n)} = \left( \mathcal{X}^{(n)}, \mathcal{A}^{(n)}, \mathcal{P}^{(n)} = \{ P_\theta^{(n)} : \theta \in \Theta \} \right),$$

where $(\mathcal{X}^{(n)}, \mathcal{A}^{(n)})$ is a sequence of measurable spaces and, for each $n$ and $\theta \in \Theta$, $P_\theta^{(n)}$ is a probability measure on $(\mathcal{X}^{(n)}, \mathcal{A}^{(n)})$. We assume throughout this paper that $\Theta$ is a subset of $\mathbb{R}^k$ so that we consider the effect of preestimating a Euclidean parameter. We also assume that pertinent asymptotics in the sequence of experiments take place at the usual $\sqrt{n}$ rate. Other rates, as occur, for instance, in nonstationary time series, can easily be adopted at the cost of more cumbersome notation only.

Our analysis is based on two assumptions. The first is condition (ULAN), which imposes regularity on the model at hand. This condition involves neither the statistic $T_n$ nor the estimator $\hat\theta_n$. During the past thirty years, the ULAN condition has been established for most standard cross-section and time series models. To introduce the condition, let $\theta_0 \in \Theta$ denote a fixed value of the Euclidean parameter, and let $(\theta_n)$ and $(\theta_n')$ denote sequences contiguous to $\theta_0$; that is, $\delta_n = \sqrt{n}(\theta_n - \theta_0)$ and $\delta_n' = \sqrt{n}(\theta_n' - \theta_0)$ are bounded in $\mathbb{R}^k$. Write $\Lambda^{(n)}(\theta_n' \mid \theta_n) = \log\big(dP^{(n)}_{\theta_n'} / dP^{(n)}_{\theta_n}\big)$ for the log likelihood of $P^{(n)}_{\theta_n'}$ with respect to $P^{(n)}_{\theta_n}$. In case $P^{(n)}_{\theta_n'}$ is not dominated by $P^{(n)}_{\theta_n}$, we mean the Radon-Nikodym derivative of the absolutely continuous part in the Lebesgue decomposition of $P^{(n)}_{\theta_n'}$ with respect to $P^{(n)}_{\theta_n}$ (see Strasser, 1985, definition 1.3).

Condition (ULAN). The sequence of experiments $\mathcal{E}^{(n)}$ is uniformly locally asymptotically normal (ULAN) in the sense that there exists a sequence of random variables $\Delta^{(n)}(\theta)$ (the central sequence) such that for all sequences $\theta_n$ and $\theta_n'$ contiguous to $\theta_0$, we have

$$\begin{aligned}
\Lambda^{(n)}(\theta_n' \mid \theta_n) &= (\delta_n' - \delta_n)^T \Delta^{(n)}(\theta_0 + \delta_n/\sqrt{n}) - \tfrac{1}{2} (\delta_n' - \delta_n)^T I_F\, (\delta_n' - \delta_n) + o_P(1) \\
&= (\delta_n' - \delta_n)^T \Delta^{(n)}(\theta_0 + \delta_n'/\sqrt{n}) + \tfrac{1}{2} (\delta_n' - \delta_n)^T I_F\, (\delta_n' - \delta_n) + o_P(1),
\end{aligned}$$

where $\theta_n = \theta_0 + \delta_n/\sqrt{n}$, $\theta_n' = \theta_0 + \delta_n'/\sqrt{n}$, and both $\delta_n$ and $\delta_n'$ are bounded sequences. The central sequence $\Delta^{(n)}(\theta_n)$ is asymptotically normally distributed with zero mean and variance $I_F(\theta_0)$, that is, $\Delta^{(n)}(\theta_n) \xrightarrow{L} N(0, I_F)$, under $P^{(n)}_{\theta_n}$, as $n \to \infty$. Here, $I_F(\theta_0)$ is called the Fisher information matrix (at $\theta = \theta_0$).

Remark 1. The central sequence $\Delta^{(n)}(\theta_n)$ is the ULAN equivalent of the derivative of the log-likelihood function. The formulation in condition (ULAN) allows situations where the log likelihood is not point-wise differentiable, for example, when using double-exponential densities. The "uniformity" in the ULAN condition lies in the use of contiguous alternatives $\theta_n$ in the denominator of the log likelihood $\Lambda^{(n)}(\theta_n' \mid \theta_n)$.


In case the likelihood expansion is required only for $\theta_n = \theta_0$, the condition is called local asymptotic normality (LAN). The ULAN condition can usually be established under the same conditions as LAN. We need the uniform version in the proof of our main result essentially due to the fact that residual-based statistics are calculated using residuals based on (random) local alternatives $\hat\theta_n$. Note that the ULAN condition is simply an assumption relating to the model and does not affect the (non)smoothness assumption of the statistic one wishes to consider.

Remark 2. The ULAN condition presents a prime example in the theory of convergence of statistical experiments. The quadratic expansion of the log-likelihood ratio in the local parameter $\delta_n' - \delta_n$ is equal to the log-likelihood ratio in the gaussian shift model $\{N(I_F(\theta_0)^{-1}\delta,\, I_F(\theta_0)^{-1}) : \delta \in \mathbb{R}^k\}$. This can be shown to imply that the sequence of localized experiments $\{P^{(n)}_{\theta_n + \delta/\sqrt{n}} : \delta \in \mathbb{R}^k\}$ converges, in an appropriate sense, to the gaussian shift experiment. This in turn implies that asymptotic analysis in the original experiments can be based on properties of the limiting gaussian shift model. A useful consequence of the ULAN, or even LAN, property is that the sequences $P^{(n)}_{\theta_n}$ and $P^{(n)}_{\theta_n'}$ are contiguous (Le Cam & Yang, 1990, or van der Vaart, 1998). As a result, convergence in probability under $P^{(n)}_{\theta_n}$ is equivalent to convergence under $P^{(n)}_{\theta_n'}$. In particular, any $o_P(1)$-terms in condition (ULAN), and in the remainder of this paper, hold simultaneously under $P^{(n)}_{\theta_0}$, $P^{(n)}_{\theta_n}$, and $P^{(n)}_{\theta_n'}$.

Both the binary choice and the dynamic volatility model introduced in section II satisfy the ULAN condition. This is discussed below.

Condition (ULAN) for the binary choice model. The log-likelihood function for the binary choice model, equation (1), is given by

$$\log L(\theta) = \sum_{i=1}^{n} \left[ Y_i \log F(X_i^T\theta) + (1 - Y_i) \log\big(1 - F(X_i^T\theta)\big) \right]. \tag{10}$$

Under the assumption imposed on the Fisher information matrix, equation (2), condition (ULAN) is easily seen to be satisfied as proposition 2.1.1 in Bickel et al. (1993) applies. This proposition establishes ULAN for models with i.i.d. observations using the so-called differentiability in quadratic mean (DQM) condition that is obviously satisfied in the binary choice model. The central sequence is given by

$$\Delta^{(n)}(\theta) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{Y_i - F(X_i^T\theta)}{F(X_i^T\theta)\,(1 - F(X_i^T\theta))}\, f(X_i^T\theta)\, X_i = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \varepsilon_i^G(\theta)\, X_i. \tag{11}$$

The corresponding Fisher information matrix, at $\theta$, is indeed $I_F(\theta)$, as follows from the central limit theorem.

Condition (ULAN) for the dynamic volatility model. For various specifications of the conditional volatility $\sigma_{t-1}(\theta)$, the induced parametric model satisfies condition (ULAN). In particular, this holds for the classic GARCH(1,1) model where $\sigma_t^2(\theta) = \omega + \alpha Y_{t-1}^2 + \beta \sigma_{t-1}^2(\theta)$ with $\theta = (\omega, \alpha, \beta) \in \mathbb{R}_+^3$ under the Nelson (1990) condition for strict stationarity: $E \log(\beta + \alpha \varepsilon_t^2) < 0$ (theorem 2.1 in Drost & Klaassen, 1997). Note that IGARCH(1,1) models, for which $\alpha + \beta = 1$, are not ruled out. Moreover, condition (ULAN) holds for (G)ARCH-in-mean models (Linton, 1993; Drost & Klaassen, 1997) and the asymmetric GARCH model (Sun & Stengos, 2006). In all cases, the central sequence for $\theta$ is given by

$$\Delta^{(n)}(\theta) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} -\frac{1}{2} \left( 1 + \varepsilon_t(\theta)\, \frac{f'(\varepsilon_t(\theta))}{f(\varepsilon_t(\theta))} \right) \frac{\partial}{\partial\theta} \log \sigma_{t-1}^2(\theta),$$

where $\varepsilon_t(\theta) := Y_t / \sigma_{t-1}(\theta)$. The Fisher information is given by $I_F = I_s A(\theta)$ with

$$A(\theta) = \lim_{n \to \infty} \frac{1}{4n} \sum_{t=1}^{n} \frac{\partial}{\partial\theta} \log \sigma_{t-1}^2(\theta)\, \frac{\partial}{\partial\theta^T} \log \sigma_{t-1}^2(\theta),$$

where the limit is taken in probability.

Our second assumption is about the statistic of interest $T_n(\theta)$ and the (first-step) estimator $\hat\theta_n$ used for $\theta$. Formally we are interested in some innovation-based statistic $T_n(\theta)$, depending on the unknown model parameter $\theta$. Generally the asymptotic behavior of this innovation-based statistic follows easily from classical limit arguments. The focus of interest of this paper is the asymptotic behavior of the residual-based statistic $T_n(\hat\theta_n)$ obtained by replacing the true value of $\theta$ by some estimate $\hat\theta_n$.

Traditionally, following Pierce (1982) and Randles (1982), several papers analyze the asymptotic behavior of the residual-based statistic relying on a condition like

$$T_n(\theta_0 + \delta_n/\sqrt{n}) = T_n(\theta_0) - c^T \delta_n + o_P(1), \tag{12}$$

under $P^{(n)}_{\theta_0}$, for bounded sequences $\delta_n$. Condition (12) is sometimes reinforced to hold for random sequences $\hat\delta_n = O_P(1)$. However, this reinforcement is generally not required if one resorts to discretized estimators as in section IV. In the ubiquitous case that the statistic $T_n(\theta)$ is, up to $o_P(1)$-terms, some average $n^{-1/2} \sum_{t=1}^{n} m(X_t; \theta)$ of expectation-zero functions $m$ of observations $X_1, \ldots, X_n$, condition (12) is verified if $m$ is differentiable in $\theta$ (a route, for instance, followed in Newey & McFadden, 1994) or if $E\,m(X_t; \theta)$ is differentiable in $\theta$, as in the empirical process approach of Andrews (1994).
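As a worked illustration of the second route, the vector $c$ in condition (12) can be computed for the classical statistic (4) by differentiating the mean of its summand. The sketch below is not taken from the paper: it assumes a univariate $Z$, the null hypothesis $E[Y \mid X, Z] = F(X^T\theta_0)$, and enough regularity to differentiate under the expectation; $W$ denotes $f(X^T\theta_0)^2 / \{F(X^T\theta_0)(1 - F(X^T\theta_0))\}$.

```latex
\begin{align*}
E_{\theta_0}\!\left[\varepsilon^G(\theta)\, Z\right]
  &= E\!\left[\bigl(F(X^T\theta_0) - F(X^T\theta)\bigr)\,
     \frac{f(X^T\theta)}{F(X^T\theta)\,\bigl(1 - F(X^T\theta)\bigr)}\, Z\right], \\
\left.\frac{\partial}{\partial\theta}\,
  E_{\theta_0}\!\left[\varepsilon^G(\theta)\, Z\right]\right|_{\theta = \theta_0}
  &= -\,E\!\left[\frac{f(X^T\theta_0)^2}{F(X^T\theta_0)\,\bigl(1 - F(X^T\theta_0)\bigr)}\, X Z\right]
   = -\,E[W X Z], \\
T_n(\theta_0 + \delta_n/\sqrt{n})
  &= T_n(\theta_0) - E[W Z X^T]\,\delta_n + o_P(1),
  \qquad\text{so that } c = E[W X Z].
\end{align*}
```

Under these assumptions $E[(\varepsilon^G(\theta_0))^2 \mid X, Z] = W$, so the same vector $c = E[WXZ]$ is also the covariance between the summand $\varepsilon_i^G(\theta_0) Z_i$ of the statistic and the summand $\varepsilon_i^G(\theta_0) X_i$ of the central sequence in equation (11).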

Given equation (12), the limiting distribution of $T_n(\hat\theta_n)$ is subsequently obtained, under $\theta_0$, from

$$T_n(\hat\theta_n) = T_n\big(\theta_0 + \sqrt{n}(\hat\theta_n - \theta_0)/\sqrt{n}\big) = T_n(\theta_0) - c^T \sqrt{n}(\hat\theta_n - \theta_0) + o_P(1). \tag{13}$$

Given this expansion, the asymptotic distribution of $T_n(\hat\theta_n)$ follows immediately from the joint limiting distribution of the innovation-based statistic $T_n(\theta_0)$ and the estimation error $\sqrt{n}(\hat\theta_n - \theta_0)$, combined with the knowledge of $c$. Instead of the smoothness in equation (12), we impose joint asymptotic normality (AN) on the estimator $\hat\theta_n$, the innovation-based statistic $T_n(\theta)$, and the central sequence $\Delta^{(n)}(\theta)$.

Condition (AN). Consider a sequence $\theta_n$ contiguous to $\theta_0$. The innovation-based test statistic $T_n(\theta_n)$, the central sequence $\Delta^{(n)}(\theta_n)$, and the estimation error $\sqrt{n}(\hat\theta_n - \theta_n)$ are jointly asymptotically normally distributed, under $P^{(n)}_{\theta_n}$, as $n \to \infty$ and as $\delta_n \to \delta$. More precisely,

$$\begin{bmatrix} T_n(\theta_n) \\ \Lambda^{(n)}(\theta_n \mid \theta_0) \\ \sqrt{n}(\hat\theta_n - \theta_n) \end{bmatrix}
\xrightarrow{L}
\begin{bmatrix} T \\ \tfrac{1}{2}\delta^T I_F \delta + \delta^T \Delta \\ Z \end{bmatrix}
\sim N\left( \begin{bmatrix} 0 \\ \tfrac{1}{2}\delta^T I_F \delta \\ 0 \end{bmatrix};\;
\begin{bmatrix} \tau^2 & c^T\delta & \alpha^T \\ \delta^T c & \delta^T I_F \delta & \delta^T \\ \alpha & \delta & \Gamma \end{bmatrix} \right).$$

Remark 3. Observe that the use of the notation $c$ in the derivative in equation (12) is consistent with the use of $c$ in condition (AN). This can be seen as follows. As $\tau^2$ denotes the limiting variance of $T_n(\theta_0)$ under $P^{(n)}_{\theta_0}$, the limiting distribution of $T_n(\theta_n)$ with $\theta_n = \theta_0 + \delta/\sqrt{n}$, under $P^{(n)}_{\theta_0}$, follows from equation (12) as $N(-c^T\delta, \tau^2)$. However, as in the proof of theorem 1 below, it also follows from Le Cam's third lemma, as recalled in appendix V, as $N(-c^T\delta, \tau^2)$.

Condition (AN) requires a locally uniform version of the central limit theorem as we consider convergence, under $P^{(n)}_{\theta_n}$, of statistics evaluated at $\theta = \theta_n$. Such a condition is clearly stronger than convergence at and under a single fixed $\theta_0$. However, Appendix V gives two results that effectively reduce the technical burden to an analysis of $T_n(\theta_0)$ and $\sqrt{n}(\hat\theta_n - \theta_0)$ under $P^{(n)}_{\theta_0}$ only. These results are useful in both applications we consider.

Condition (AN) for the binary choice model. We consider the situation where $\theta$ is estimated using maximum likelihood, so that $\sqrt{n}(\hat\theta_n - \theta_0) = I_F(\theta_0)^{-1} \Delta^{(n)}(\theta_0) + o_P(1)$. Using proposition B2, condition (AN) follows from condition (ULAN) as far as $\Lambda^{(n)}(\theta_n \mid \theta_0)$ and $\sqrt{n}(\hat\theta_n - \theta_n)$ are concerned. Clearly, we have $\Gamma = I_F^{-1}$ and $\alpha = \Gamma c$. Concerning the innovation-based statistic, proposition B1 applies. Using maximum likelihood implies $\Gamma = I_F^{-1}$ and $\Gamma c = \alpha$. Finally, $\alpha$ itself follows from

$$\begin{aligned}
\alpha &= \lim_{n \to \infty} \mathrm{Cov}_{\theta_0}\left( \sqrt{n}(\hat\theta_n - \theta_0),\; T_n^{\tau}(\theta_0) \right) \qquad (14) \\
&= I_F^{-1} \lim_{n \to \infty} \mathrm{Cov}\left( \frac{1}{\sqrt{n}} \sum_{l=1}^{n} \varepsilon_l^G X_l,\; \sqrt{n}\, \binom{n}{2}^{-1} \sum_{i=1}^{n} \sum_{j=1}^{i-1} \left[ 4\, I\{\varepsilon_i^G(\theta) < \varepsilon_j^G(\theta),\; Z_i < Z_j\} - 1 \right] \right) \\
&= 4\, I_F^{-1} \lim_{n \to \infty} \binom{n}{2}^{-1} \sum_{i=1}^{n} \sum_{j=1}^{i-1} E\left[ \left( \varepsilon_i^G X_i + \varepsilon_j^G X_j \right) I\{\varepsilon_i^G(\theta) < \varepsilon_j^G(\theta),\; Z_i < Z_j\} \right].
\end{aligned}$$

Further simplification of this expression is not necessary as it is easily estimated consistently.

Condition (AN) for the dynamic volatility model. For conditionally heteroskedastic models, the QMLE estimator $\hat\theta_n$ based on an assumed Gaussian distribution for the innovations $\varepsilon_t$ is a popular choice. For the GARCH(1,1) model, Lumsdaine (1996) establishes consistency and asymptotic normality of this estimator under conditions implying the ones imposed above. Her results essentially also establish equation (15). Berkes and Horváth (2004) have improved on these results, showing, for GARCH(p, q) processes, an asymptotically linear representation (B4) for given $\theta$:

$$\sqrt{n}(\hat\theta_n - \theta) = -A(\theta)^{-1} \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \frac{1}{4} \left( 1 - \frac{Y_t^2}{\sigma_{t-1}^2(\theta)} \right) \frac{\partial}{\partial\theta} \log \sigma_{t-1}^2(\theta) + o_P(1), \tag{15}$$

under $P^{(n)}_{\theta}$. More precisely, equation (15) follows from their result, equation (4.18), which, as noted in the proof of their theorem 2.1, is also valid for $\theta_n$, applied to their example 2.1. From equation (15), one finds the asymptotic variance of the QMLE estimator as

$$\Gamma = \Gamma(\theta) = \frac{\kappa_\varepsilon - 1}{4}\, A(\theta)^{-1}, \tag{16}$$

with $\kappa_\varepsilon = E\varepsilon_t^4$ (compare theorem 1.2 in Berkes & Horváth, 2004).

These results can be reinforced to obtain an estimator that actually satisfies the local uniformity in condition (AN) if equation (B5) holds. While neither Lumsdaine (1996) nor Berkes and Horváth (2004) explicitly mentions equation (B5), their results allow us to invoke proposition B2, as the following lemma shows:

Lemma 1. For the GARCH(p, q) model, equation (B5) holds for the gaussian QMLE estimator with

$$\psi_t(\theta) = -A(\theta)^{-1}\, \frac{1 - \varepsilon_t^2(\theta)}{4}\, \frac{\partial}{\partial\theta} \log \sigma_{t-1}^2(\theta).$$


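Proof. First, note that $A(\theta)$ is invertible and continuous by applying the mean-value theorem and using lemma 3.6 in Berkes and Horváth (2004). As a result, it suffices to study, for local alternatives $\theta_n$ to $\theta_0$,

$$\frac{1}{\sqrt{n}} \sum_{t=1}^{n} \left[ \frac{1 - \varepsilon_t^2(\theta_n)}{4}\, \frac{\partial}{\partial\theta} \log \sigma_{t-1}^2(\theta_n) - \frac{1 - \varepsilon_t^2(\theta_0)}{4}\, \frac{\partial}{\partial\theta} \log \sigma_{t-1}^2(\theta_0) \right], \tag{17}$$

under $P^{(n)}_{\theta_0}$. Consider a given element $\theta^{(j)}$ of the vector $\theta$. Applying Taylor's theorem to equation (17), we find, for the element corresponding to $\theta^{(j)}$,

$$\left( \frac{1}{n} \sum_{t=1}^{n} \left[ \frac{\varepsilon_t^2(\bar\theta_n)}{4}\, \frac{\partial}{\partial\theta^{(j)}} \log \sigma_{t-1}^2(\bar\theta_n)\, \frac{\partial}{\partial\theta^T} \log \sigma_{t-1}^2(\bar\theta_n) + \frac{1 - \varepsilon_t^2(\bar\theta_n)}{4}\, \frac{\partial^2}{\partial\theta^{(j)} \partial\theta^T} \log \sigma_{t-1}^2(\bar\theta_n) \right] \right) \times \sqrt{n}(\theta_n - \theta_0),$$

with $\bar\theta_n$ on the line segment from $\theta_0$ to $\theta_n$. Given the boundedness of $\sqrt{n}(\theta_n - \theta_0)$, it suffices to show that the term in parentheses converges to the corresponding row of $A(\theta_0)$. This, however, follows from lemma 4.4 in Berkes and Horváth (2004).

This shows that with respect to the initial estimator $\hat\theta_n$, condition (AN) is indeed satisfied. With respect to the log likelihood, condition (AN) follows immediately from condition (ULAN), and with respect to the runs statistic, proposition B1 applies once more.

B. Size of Residual-Based Tests

We now state and prove, in an informal way, the main result of the paper. All statements will be made precise in section IV.

Theorem 1. Under conditions (ULAN) and (AN) and in a way that will be made precise in section IV, we have, for the residual-based statistic $T_n(\hat\theta_n)$,

$$T_n(\hat\theta_n) \xrightarrow{L} N\left(0,\; \tau^2 + (\alpha - \Gamma c)^T \Gamma^{-1} (\alpha - \Gamma c) - \alpha^T \Gamma^{-1} \alpha\right) = N\left(0,\; \tau^2 + c^T \Gamma c - 2\alpha^T c\right), \tag{18}$$

under $P^{(n)}_{\theta_n}$, as $n \to \infty$.

Proof. (Intuition) Introduce the distribution

$$\begin{bmatrix} T \\ \Delta \\ Z \end{bmatrix} \sim N\left( 0,\; \begin{bmatrix} \tau^2 & c^T & \alpha^T \\ c & I_F & I_k \\ \alpha & I_k & \Gamma \end{bmatrix} \right), \tag{19}$$

where $I_k$ denotes the $k \times k$ identity matrix. From condition (AN), applied with $\theta_n$ replaced by $\theta_n + \delta/\sqrt{n}$ and using condition (ULAN), we have for all $\delta \in \mathbb{R}^k$, under $P^{(n)}_{\theta_n + \delta/\sqrt{n}}$ and as $n \to \infty$,

$$\begin{bmatrix} T_n(\theta_n + \delta/\sqrt{n}) \\ \Lambda^{(n)}(\theta_n \mid \theta_n + \delta/\sqrt{n}) \\ \sqrt{n}(\hat\theta_n - \theta_n - \delta/\sqrt{n}) \end{bmatrix}
\xrightarrow{L}
\begin{bmatrix} T \\ -\tfrac{1}{2}\delta^T I_F \delta - \delta^T \Delta \\ Z \end{bmatrix},$$

while, as a consequence of Le Cam's third lemma, the same vector converges under $P^{(n)}_{\theta_n}$ to

$$\begin{bmatrix} T - c^T\delta \\ +\tfrac{1}{2}\delta^T I_F \delta - \delta^T \Delta \\ Z - \delta \end{bmatrix}.$$

The quantity of interest can now be written, for $t \in \mathbb{R}$, as

$$\begin{aligned}
P^{(n)}_{\theta_n}\{T_n(\hat\theta_n) \le t\}
&= \int_{\delta \in \mathbb{R}^k} P^{(n)}_{\theta_n}\{T_n(\hat\theta_n) \le t \mid \hat\theta_n = \theta_n + \delta/\sqrt{n}\}\; dP^{(n)}_{\theta_n}\{\hat\theta_n \le \theta_n + \delta/\sqrt{n}\} \\
&= \int_{\delta \in \mathbb{R}^k} P^{(n)}_{\theta_n}\{T_n(\theta_n + \delta/\sqrt{n}) \le t \mid \sqrt{n}(\hat\theta_n - \theta_n) = \delta\}\; dP^{(n)}_{\theta_n}\{\sqrt{n}(\hat\theta_n - \theta_n) \le \delta\} \\
&\to \int_{\delta \in \mathbb{R}^k} P\{T - c^T\delta \le t \mid Z = \delta\}\; dP\{Z \le \delta\} \\
&= \int_{\delta \in \mathbb{R}^k} \Phi\left( \frac{t + (c - \Gamma^{-1}\alpha)^T \delta}{\sqrt{\tau^2 - \alpha^T \Gamma^{-1} \alpha}} \right) dP\{Z \le \delta\},
\end{aligned}$$

where $\Phi$ denotes the cumulative distribution function of the standard normal distribution and we used the result that, conditional on $Z = z$, $T$ has an $N(\alpha^T \Gamma^{-1} z,\; \tau^2 - \alpha^T \Gamma^{-1} \alpha)$ distribution. Observe that if we introduce the distribution

$$\begin{bmatrix} X \\ Z \end{bmatrix} \sim N\left( 0,\; \begin{bmatrix} \tau^2 - \alpha^T \Gamma^{-1} \alpha + (\alpha - \Gamma c)^T \Gamma^{-1} (\alpha - \Gamma c) & (\alpha - \Gamma c)^T \\ \alpha - \Gamma c & \Gamma \end{bmatrix} \right),$$

the distribution of $X$ conditionally on $Z = \delta$ is $N(-(c - \Gamma^{-1}\alpha)^T \delta,\; \tau^2 - \alpha^T \Gamma^{-1} \alpha)$. Consequently, the limit of $P^{(n)}_{\theta_n}\{T_n(\hat\theta_n) \le t\}$ can be written as

$$\int_{\delta \in \mathbb{R}^k} P\{X \le t \mid Z = \delta\}\; dP\{Z \le \delta\} = P\{X \le t\},$$

from which equation (18) follows.

Remark 4. Note that the limiting distribution, equation (18), of the residual-based statistic does not depend on the actual sequence $\theta_n$, but only on its limit $\theta_0$ (through the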

covariances $\tau^2$, $c$, $\alpha$, and $\Gamma$). In particular this limiting distribution is also valid for $\theta_n = \theta_0$. Local parameter changes within the model specified thus do not affect the limiting distribution of the residual-based statistic. The local power of $T_n(\hat\theta_n)$ for alternatives outside the model specified is studied in section IIIC.

Remark 5. In the above informal proof for theorem 1, the convergence of the conditional distribution $P^{(n)}_{\theta_n}\{T_n(\theta_n + \delta/\sqrt{n}) \le t \mid \sqrt{n}(\hat\theta_n - \theta_n) = \delta\}$ to the limit $P\{T - c^T\delta \le t \mid Z = \delta\}$ is the most delicate part, since the convergence takes place in the conditioning event as well. A formalization of such a convergence would require conditions under which a conditional probability or expectation is continuous with respect to the conditioning event. This question has been studied in the literature by introducing various topologies on the space of conditioning $\sigma$-fields. A good reference is Cotter (1986), which compares some topologies. From our point of interest, Cotter essentially shows that the required continuity property holds only for discrete probability distributions. Indeed, we formalize theorem 1 by discretizing the estimator $\hat\theta_n$ appropriately. (See section IV for details.)

Remark 6. Theorem 1 has been stated for univariate statistics $T_n(\theta)$, but can easily be extended to the multivariate case using the Cramér-Wold device. In such a multivariate setting, $\tau^2$, $c$, and $\alpha$ in condition (AN) are matrices. By taking arbitrary linear combinations of the components of $T_n$ and applying the univariate version of theorem 1, we find that the same limiting distribution in equation (18) holds, with $\tau^2$ replaced by the limiting variance matrix of $T_n$, $c$ the limiting covariance matrix between the statistic and the central sequence, and $\alpha$ the limiting covariance matrix between the statistic and the estimator used.

Size of Kendall's tau for the binary choice model. Recall that the limiting variance of the innovation-based Kendall's tau is $\tau^2 = 4/9$. In order to derive the appropriate variance correction when calculating Kendall's tau using generalized residuals (calculated on the basis of the maximum likelihood estimator $\hat\theta_n^{(ML)}$), we apply theorem 1. As $\alpha = \Gamma c$, we obtain, under the null hypothesis of no omitted variables,

$$T_n^{\tau}\big(\hat\theta_n^{(ML)}\big) \xrightarrow{L} N\left(0;\; \tfrac{4}{9} - \alpha^T I_F\, \alpha\right).$$

This nonparametric test for omitted variables in the binary choice model has not been considered before in the literature. We provide this application to show that its limiting distribution is easily derived in the framework we propose.

Size of runs test for the dynamic volatility model. Without going through all the calculations in detail, observe that in this case, $c = 0$ since

$$E\left[ I\{\varepsilon_t < 0\} \left( 1 + \varepsilon_t\, \frac{f'(\varepsilon_t)}{f(\varepsilon_t)} \right) \right] = \frac{1}{2} + \int_{x=-\infty}^{0} x f'(x)\,dx = \frac{1}{2} - \int_{x=-\infty}^{0} f(x)\,dx = 0,$$

which implies

$$E_{t-1}\left[ \big[ I\{\varepsilon_t < 0\} - I\{\varepsilon_{t-1} < 0\} \big]^2 \left( 1 + \varepsilon_t\, \frac{f'(\varepsilon_t)}{f(\varepsilon_t)} \right) \right] = 0.$$

Consequently, when theorem 1 is used, the asymptotic variance of the Wald-Wolfowitz runs test for symmetry need not be adapted to estimation error when applied to the residuals of dynamic volatility models.

If we think of canonical applications, $T_n(\theta)$ represents a test statistic for distributional or time series properties of some innovations in the model, while $T_n(\hat\theta_n)$ denotes the same statistic applied to estimated residuals in the model. Theorem 1 shows that replacing innovations by residuals may leave the asymptotic variance of the test statistic unchanged, increase it, or decrease it, depending on the value of $(\alpha - \Gamma c)^T \Gamma^{-1} (\alpha - \Gamma c)$ as compared to $\alpha^T \Gamma^{-1} \alpha$. Several special cases can occur.

First, if $c = 0$, the residual-based statistic has the same asymptotic variance as the statistic based on the true innovations. In particular, no adaptation is necessary in critical values in order to guarantee the appropriate asymptotic size of the test when applied to estimated residuals. However, the power of the residual-based test $T_n(\hat\theta_n)$ may be different from that of the innovation-based test $T_n(\theta_n)$ (see section IIIC for details). Recall that under $c = 0$, the test statistic and the central sequence are asymptotically independent. As a result, the distribution of the test statistic $T_n(\theta_0)$ is insensitive to local changes in the parameter $\theta$. In particular, the asymptotic distribution of $T_n(\theta_0)$ is the same under all probability distributions $P^{(n)}_{\theta_n}$, whatever the local parameter sequence $\theta_n$. Or, equivalently in our setup, the asymptotic distribution of $T_n(\theta_n)$ under $P^{(n)}_{\theta_0}$ is the same for each local sequence $\theta_n$. As estimated parameter values $\hat\theta$ differ from $\theta_0$ in the order of magnitude of $1/\sqrt{n}$, these remarks apparently carry over to the residual-based statistic. Our runs-based test for symmetry in the dynamic volatility model falls under this scheme.

A second special case occurs if $\alpha = \Gamma c$. This happens, for instance, when the initial estimator $\hat\theta_n$ is efficient (as in the case $\Gamma = I_F^{-1}$ and $\alpha = I_F^{-1} c$). In such a situation, the limiting variance of the residual-based statistic is smaller than the limiting variance of the statistic applied to the true innovations, with strict inequality if $\alpha \neq 0$. Pierce (1982) restricts attention to this efficient initial estimator case and, imposing a differentiability condition on $T_n(\theta)$, finds the same reduction in the limiting variance. This occurs, for instance, in our application of omitted variables tests in the binary choice model.

Finally, it might be that $\alpha = 0$. In this case, the limiting variance of the residual statistic becomes $\tau^2 + c^T \Gamma c \ge \tau^2$. When $\alpha = 0$, the test statistic $T_n(\theta)$ is asymptotically independent from the estimator $\hat\theta_n$ and a test based on estimated residuals always has a larger asymptotic variance than the


same test applied to the actual innovations (unless $c = 0$, in which case both variances are equal). The asymptotic independence of the statistic $T_n(\theta)$ and the estimator $\hat\theta_n$ implies that the residual-based statistic $T_n(\hat\theta_n)$ essentially behaves as a mixture over various values of $\theta$. Such a mixture distribution clearly has a larger variance than the distribution of $T_n(\theta)$ with $\theta$ fixed.

C. Power of Residual-Based Tests

A question that arises naturally at this point is the effect on the power of a test when applied to residuals rather than innovations. First, note that the limiting distribution of the residual-based test statistic in equation (10) clearly does not depend on the local parameter sequence $\theta_n$. This implies that the statistic's distribution is insensitive to local changes in the underlying parameter $\theta$. Consequently, the test statistic $T_n(\hat\theta_n)$ is valid for the model at hand. However, with the application of specification testing in mind, one may be interested in the local power with respect to other parameters, for example, of the innovation distribution (like skewness) or omitted variables.

Consider the case where there is an additional parameter $\psi$ in the model and we are interested in the (local) power of the residual-based statistic $T_n(\hat\theta_n)$ with respect to this parameter. The model now consists of a set of probability measures $\{P^{(n)}_{\theta,\psi} : \theta \in \Theta, \psi \in \Psi\}$. For ease of notation, we assume that the original model is obtained by setting $\psi = 0$, that is, $P^{(n)}_{\theta,0} = P^{(n)}_{\theta}$. As before, fix $\theta_0 \in \Theta$ and consider the local parameterization $(\theta_n, \psi_n) = (\theta_0 + \delta/\sqrt{n},\ 0 + \eta/\sqrt{n})$. Introduce the log likelihood

$$\tilde\Lambda^{(n)}(\psi_n|0) = \log \frac{dP^{(n)}_{\theta_0,\psi_n}}{dP^{(n)}_{\theta_0,0}},$$

with respect to the parameter $\psi$. We are interested in the behavior of our test statistic $T_n(\hat\theta_n)$ under $P^{(n)}_{\theta_0,\psi_n}$. Assume that condition (ULAN) is satisfied jointly in $\theta$ and $\psi$. Moreover, assume the equivalent of condition (AN) under $\psi = 0$, that is, under $P^{(n)}_{\theta_n,0}$ and as $n \to \infty$,

$$\begin{bmatrix} T_n(\theta_n) \\ \Lambda^{(n)}(\theta_n|\theta_0) \\ \sqrt{n}(\hat\theta_n - \theta_n) \\ \tilde\Lambda^{(n)}(\psi_n|0) \end{bmatrix} \xrightarrow{\mathcal{L}} \begin{bmatrix} T \\ \tfrac{1}{2}\delta^T I_F \delta + \delta^T \Delta \\ Z \\ -\tfrac{1}{2}\eta^T I_P \eta + \delta^T I_{FP} \eta + \eta^T \tilde\Delta \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ \tfrac{1}{2}\delta^T I_F \delta \\ 0 \\ -\tfrac{1}{2}\eta^T I_P \eta + \delta^T I_{FP} \eta \end{bmatrix}, \begin{bmatrix} \tau^2 & c^T\delta & \alpha^T & d^T\eta \\ \delta^T c & \delta^T I_F \delta & \delta^T & \delta^T I_{FP}\eta \\ \alpha & \delta & \Gamma & B^T\eta \\ \eta^T d & \eta^T I_{FP}^T \delta & \eta^T B & \eta^T I_P \eta \end{bmatrix} \right). \quad (20)$$

Here $I_P$ denotes the Fisher information for the parameter $\psi$, while $I_{FP}$ denotes the cross-Fisher information between $\theta$ and $\psi$.

The matrix $B$ measures the covariance between the log-likelihood ratio with respect to $\psi$ and the estimator for $\theta_n$. Consequently, this matrix measures the bias in $\hat\theta_n$ that occurs due to possible local changes in $\psi$. The special case $B = 0$ refers to the situation where $\hat\theta_n$ is insensitive to local changes in $\psi$. This occurs, for example, if $\hat\theta_n$ is an efficient estimator for $\theta$ in a model where $\psi$ is considered a nuisance parameter. The asymptotic mean of $\tilde\Lambda^{(n)}(\psi_n|0)$ in equation (20) is a direct consequence of the fact that the limiting distribution is studied under $(\theta, \psi) = (\theta_n, 0)$. Note that local uniformity with respect to $\psi$ is not required.

The derivations leading to theorem 1 remain valid and can be carried out while taking into account the joint behavior of $T_n(\hat\theta_n)$ and $\tilde\Lambda^{(n)}(\psi_n|0)$. Under $P^{(n)}_{\theta_n,0}$, one easily verifies for $(T_n(\hat\theta_n), \tilde\Lambda^{(n)}(\psi_n|0))$ the following limiting distribution:

$$N\left( \begin{bmatrix} 0 \\ -\tfrac{1}{2}\eta^T I_P \eta + \delta^T I_{FP} \eta \end{bmatrix}, \begin{bmatrix} \tau^2 + (\alpha - \Gamma c)^T \Gamma^{-1} (\alpha - \Gamma c) - \alpha^T \Gamma^{-1} \alpha & \eta^T (d - Bc) \\ (d - Bc)^T \eta & \eta^T I_P \eta \end{bmatrix} \right).$$

Applying Le Cam's third lemma once more, we see that the shift in the innovation-based statistic $T_n(\theta)$ due to local changes in $\psi$ is given by $d^T\eta$, while the same local change in $\psi$ induces a shift of size $(d - Bc)^T \eta$ in the residual-based statistic $T_n(\hat\theta)$. Consequently, while the local power of the innovation-based statistic $T_n(\theta)$ is determined by $d/\tau$, that of the residual-based statistic $T_n(\hat\theta_n)$ is determined by

$$(d - Bc) \Big/ \sqrt{\tau^2 + (\alpha - \Gamma c)^T \Gamma^{-1} (\alpha - \Gamma c) - \alpha^T \Gamma^{-1} \alpha}. \quad (21)$$

In case $c = 0$, we find that not only the size of the residual-based statistic is unaltered, but also that its power equals that of the innovation-based statistic. In the special case that $B = 0$, we thus find that the power against local changes in $\psi$ in the residual-based statistic decreases, remains unchanged, or increases as the limiting variance of the residual-based statistic increases, remains unchanged, or decreases, respectively. It may thus very well be the case that residual-based statistics have more power against certain local alternatives than the same statistic applied to actual innovations.

Alternatively, the results in this section can be interpreted in terms of specification testing with locally misspecified alternatives much in the same spirit as Bera and Yoon (1993). Bera and Yoon derive a correction to standard LM tests, which makes them insensitive to local misspecification. Not surprisingly, this correction contains exactly the covariance term $B$, which is $J_{\psi\phi}$ in their equation (11).

Power of Kendall's tau for the binary choice model. For this application, let $\psi$ denote the coefficient of the possibly omitted variable $Z$. Using equation (2), we find, at $\psi = 0$, $B = E\,\varepsilon_i^G(\theta)^2 XZ$ and $d = E\,\varepsilon_i^G(\theta)^2 Z^2$. Since we use the maximum


likelihood estimator $\hat\theta_n^{(ML)}$, we have $\alpha = \Gamma c$ and $\Gamma = I_F^{-1}$. An expression for $\alpha$ was given in equation (14). Consequently, the local power of the residual-based test is determined by the shift

$$\frac{d - B I_F \alpha}{\sqrt{4/9 - \alpha^T I_F \alpha}}.$$

Power of runs test for the dynamic volatility model. Recall that in this application, we have $c = 0$. As a result, the (local) power of the residual-based runs test for conditional symmetry equals that of the innovations-based test.

IV. Main Result: Formalization

The problem with studying the asymptotic behavior of $T_n(\hat\theta_n)$ is that arbitrary estimators $\hat\theta_n$ (even if they are regular) can pick out very special points of the parameter space. Without strong uniformity conditions on the behavior of $T_n(\theta)$ as a function of $\theta$, the residual statistic $T_n(\hat\theta_n)$ could behave in an erratic way. We solve this problem by discretizing the estimator $\hat\theta_n$. This is a well-known technical trick due to Le Cam. We introduce this approach now and study the behavior of the statistic based on the discretized estimated parameter.

The discretized estimator $\bar\theta_n$ is obtained by rounding the original estimator $\hat\theta_n$ to the nearest midpoint in a regular grid of cubes. To be precise, consider a grid of cubes in $\mathbb{R}^k$ with sides of length $d/\sqrt{n}$. We call $d$ the discretization constant. Then $\bar\theta_n$ is the estimator obtained by taking the midpoint of the cube to which $\hat\theta_n$ belongs. To formalize this even further, introduce the function $r : \mathbb{R}^k \to \mathbb{Z}^k$, which arithmetically rounds each of the components of the input vector to the nearest integer. Then we may write $\bar\theta_n = d\, r(\sqrt{n}\,\hat\theta_n/d)/\sqrt{n}$. Our interest lies in the asymptotic behavior of $T_n(\bar\theta_n)$ for small $d$. We first study the behavior of $\bar\theta_n$ in the following lemma:

Lemma 2. Let the discretization constant $d > 0$ be given. Define the "discretized truth" $\theta_n = d\, r(\sqrt{n}\,\theta_0/d)/\sqrt{n}$. Then $\sqrt{n}(\bar\theta_n - \theta_n)$ is degenerated on $\{dj : j \in \mathbb{Z}^k\}$. Moreover, for $\delta_n \to \delta$ as $n \to \infty$, we have

$$P^{(n)}_{\theta_n + \delta_n/\sqrt{n}}\{\sqrt{n}(\bar\theta_n - \theta_n) = dj\} \to P\Big\{N(\delta - dj, \Gamma) \in \Big(-\frac{d}{2}\iota, \frac{d}{2}\iota\Big]\Big\}, \quad (22)$$

where $\iota = (1, 1, \ldots, 1)^T \in \mathbb{Z}^k$.

The above lemma is basic to our formal main result that now can be stated. Both proofs can be found in Appendix C.

Theorem 2. With the notation introduced above and under conditions (ULAN) and (AN), we have for $\delta_n \to \delta$ and as $n \to \infty$,

$$\lim_{d \downarrow 0} \lim_{n \to \infty} P^{(n)}_{\theta_n + \delta_n/\sqrt{n}}\{T_n(\bar\theta_n) \le t\} = P\{X \le t\}, \quad (23)$$

where

$$X \sim N\big(0,\ \tau^2 + (\alpha - \Gamma c)^T \Gamma^{-1} (\alpha - \Gamma c) - \alpha^T \Gamma^{-1} \alpha\big).$$

Remark 7. In typical applications of the discretization trick, the end result is a statistic whose first-order asymptotics do not depend on the discretization constant $d$ used. Clearly, in such cases, taking the limit for $d \downarrow 0$ in equation (23) is not needed. Our assumptions precisely avoid smoothness of the statistic of interest and therefore do not allow us to make this claim in general. When viewing asymptotic results as an approximation to finite sample distributions, this may be less of an issue because the discretization constant is an auxiliary variable that one prefers to be close to 0 in the first place.²

² We thank an anonymous referee for pointing this out.

Remark 8. As for the informal derivations in section III, the formal proof in Appendix C is strongly based on a conditioning argument with respect to the value of the estimator $\hat\theta_n$, or, more precisely, that of the local estimation error $\sqrt{n}(\hat\theta_n - \theta_n)$. This leads one to believe that it is possible to derive LAN conditions for conditional distributions where the conditioning event is the value of some estimation error. However, we have not seen any results in this direction.

V. Conclusion

This paper introduces a novel asymptotic analysis of residuals-based statistics in a gaussian limiting framework: The models under consideration are assumed to be locally asymptotically normal (LAN), the statistics being studied have limiting normal distributions, and the estimators under consideration are $\sqrt{n}$-consistent and asymptotically normal. In our approach, we do not explicitly require any smoothness of the statistics of interest with respect to the nuisance parameter, but we do impose a locally uniform convergence condition that is satisfied for residual-based statistics. We apply this method to derive several new results. For example, we present a new omitted variable specification test for limited dependent variable models and provide its asymptotic distribution. Our method is also useful for deriving the asymptotic distribution of two-step estimators and nonparametric tests.

The method proposed in the paper can be extended in several interesting directions. First, while the gaussian context has many applications for residual-based testing in econometric models, our method essentially builds on Le Cam's third lemma, which is not restricted to gaussian situations. In particular, limiting $\chi^2$ distributions can be handled easily using the same techniques. Also, in case the model of interest is not specified in terms of likelihoods but in terms of moments (like in GMM settings), the same ideas can be applied (see Andreou & Werker, 2009, for further details). Local deviations of the moment conditions can be cast in likelihood terms such that our method still applies. Finally, our approach may


also represent the foundations for an alternative to the derivation of asymptotic distributions of nongaussian statistics as, for instance, in locally asymptotically mixed normal (LAMN) models.

REFERENCES

Abadie, A., and G. W. Imbens, "On the Failure of the Bootstrap for Matching Estimators," Econometrica 76 (2008), 1537–1558.
Abadie, A., and G. W. Imbens, "Matching on the Estimated Propensity Score," NBER working paper no. 15301 (2009).
Abadir, K., and W. Distaso, "Testing Joint Hypotheses When One of the Alternatives is One-Sided," Journal of Econometrics 140 (2007), 695–718.
Andreou, E., and B. J. M. Werker, "An Alternative Asymptotic Analysis of Residual-Based Statistics," Tilburg University working paper (2009).
Andrews, D. W. K., "Empirical Process Methods in Econometrics," Handbook of Econometrics 37 (1994), 2247–2294.
Bai, J., and S. Ng, "A Consistent Test for Conditional Symmetry in Time Series Models," Journal of Econometrics 103 (2001), 225–258.
Bera, A. K., and G. Premaratne, "A Test for Symmetry with Leptokurtic Financial Data," Journal of Financial Econometrics 3 (2005), 169–187.
Bera, A. K., and G. Premaratne, "Adjusting the Tests for Skewness and Kurtosis for Distributional Misspecifications," University of Illinois at Urbana-Champaign working paper (2009).
Bera, A. K., and M. J. Yoon, "Specification Testing with Misspecified Local Alternatives," Econometric Theory 9 (1993), 649–658.
Berkes, I., and L. Horváth, "The Efficiency of the Estimator of the Parameters in GARCH Processes," Annals of Statistics 32 (2004), 633–655.
Bickel, P. J., C. A. J. Klaassen, Y. Ritov, and J. A. Wellner, Efficient and Adaptive Statistical Inference for Semiparametric Models (Baltimore, MD: Johns Hopkins University Press, 1993).
Cotter, K. D., "Similarity of Information and Behavior with a Pointwise Convergence Topology," Journal of Mathematical Economics 15 (1986), 25–38.
Drost, F. C., and C. A. J. Klaassen, "Efficient Estimation in Semiparametric GARCH Models," Journal of Econometrics 81 (1997), 193–221.
Hájek, J., and Z. Šidák, Theory of Rank Tests (Orlando, FL: Academic Press, 1967).
Hallin, M., and M. L. Puri, "Time Series Analysis via Rank-Order Theory: Signed-Rank Tests for ARMA models," Journal of Multivariate Analysis 39 (1991), 175–237.
Jeganathan, P., "Some Aspects of Asymptotic Theory with Applications to Time Series Models," Econometric Theory 11 (1995), 818–887.
Le Cam, L., and G. L. Yang, Asymptotics in Statistics: Some Basic Concepts (Berlin: Springer, 1990).
Linton, O., "Adaptive Estimation in ARCH Models," Econometric Theory 9 (1993), 539–569.
Lumsdaine, R., "Consistency and Asymptotic Normality of the Quasi-Maximum Likelihood Estimator in IGARCH(1,1) and Covariance Stationary GARCH(1,1) Models," Econometrica 64 (1996), 575–596.
McCracken, M. W., "Robust Out-of-Sample Inference," Journal of Econometrics 98 (2000), 195–223.
Murphy, K., and R. Topel, "Estimation and Inference in Two-Step Econometric Models," Journal of Business and Economic Statistics 3 (1985), 370–379.
Murphy, K., and R. Topel, "Estimation and Inference in Two-Step Econometric Models," Journal of Business and Economic Statistics 20 (2002), 88–97.
Nelson, D., "Stationarity and Persistence in the GARCH(1,1) Model," Econometric Theory 6 (1990), 318–334.
Newey, W., and D. McFadden, "Large Sample Estimation and Hypothesis Testing," in R. Engle and D. McFadden (Eds.), Handbook of Econometrics (Amsterdam: Elsevier, 1994).
Pagan, A., "Two Stage and Related Estimators and Their Applications," Review of Economic Studies 53 (1986), 517–538.
Pagan, A., and F. Vella, "Diagnostic Tests for Models Based on Individual Data: A Survey," Journal of Applied Econometrics 4 (1989), S29–S59.
Pierce, D. A., "The Asymptotic Effect of Substituting Estimators for Parameters in Certain Types of Statistics," Annals of Statistics 10 (1982), 475–478.
Ploberger, W., "A Complete Class of Tests When the Likelihood Is Locally Asymptotically Quadratic," Journal of Econometrics 118 (2004), 67–94.
Pollard, D., "Asymptotics via Empirical Processes," Statistical Science 4 (1989), 341–366.
Pollard, D., "Asymptopia," Yale University unpublished manuscript (2004).
Randles, R. H., "On the Asymptotic Normality of Statistics with Estimated Parameters," Annals of Statistics 10 (1982), 462–474.
Strasser, H., Mathematical Theory of Statistics (New York: Walter de Gruyter, 1985).
Sun, Y., and T. Stengos, "Semiparametric Efficient Adaptive Estimation of Asymmetric GARCH Models," Journal of Econometrics 133 (2006), 373–386.
van der Vaart, A., Asymptotic Statistics (Cambridge: Cambridge University Press, 1998).

APPENDIX A

Le Cam's Third Lemma

Le Cam's third lemma is discussed in several modern books on asymptotic statistics (Hájek & Šidák, 1967; Le Cam & Yang, 1990; Bickel et al., 1993; van der Vaart, 1998). We recall it in its best-known form, for asymptotically normal distributions. Consider two sequences of probability measures $(Q^{(n)})_{n=1}^{\infty}$ and $(P^{(n)})_{n=1}^{\infty}$ defined on the common measurable spaces $(X^{(n)}, \mathcal{X}^{(n)})_{n=1}^{\infty}$. Assume that the log-likelihood ratios $\Lambda_n = \log dQ^{(n)}/dP^{(n)}$ satisfy, jointly with some statistic $T_n$, under $P^{(n)}$,

$$\begin{bmatrix} T_n \\ \Lambda_n \end{bmatrix} \xrightarrow{\mathcal{L}} \begin{bmatrix} T \\ \Lambda \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ -\tfrac{\sigma^2}{2} \end{bmatrix}, \begin{bmatrix} \tau^2 & c \\ c & \sigma^2 \end{bmatrix} \right), \quad (A1)$$

as $n \to \infty$. Le Cam's third lemma then gives the limiting behavior of the statistic $T_n$ under $Q^{(n)}$. More precisely it states, under $Q^{(n)}$,

$$T_n \xrightarrow{\mathcal{L}} N(c, \tau^2),$$

as $n \to \infty$. The intuition for this result is based on the fact that a statistic $T$, which is jointly normally distributed with some log-likelihood ratio $\Lambda$ as in equation (A1), has $N(c, \tau^2)$ distribution under the alternative measure. This nonasymptotic version follows trivially from writing down the appropriate densities and likelihood ratios. Le Cam's third lemma takes this result to the limit.

APPENDIX B

Sufficient Conditions for Condition (AN)

Condition (AN) is required to hold locally uniform, that is, under local alternatives $\theta_n = \theta_0 + O(1/\sqrt{n})$. In this appendix, we show that for residual-based statistics, convergence under fixed $\theta_0 \in \Theta$ generally implies this local uniform convergence. We discuss the three components $T_n$, $\Lambda^{(n)}$, and $\sqrt{n}(\hat\theta_n - \theta_0)$ in condition (AN) separately, but, using the Cramér-Wold device, the arguments can easily be combined to prove locally uniform joint convergence.

First, consider the test statistic $T_n$. Recall that in our framework, $T_n$ refers to a residual-based statistic used for specification testing. In such a case, we can generally write

$$T_n(\theta) = T(\varepsilon_1(\theta), \ldots, \varepsilon_n(\theta)), \quad (B1)$$

where the innovations $\varepsilon_t(\theta)$ are i.i.d. under $P^{(n)}_{\theta}$ and $T$ is some given function. It is not excluded that $T_n$ depends on some exogenous variables as well. When appropriate centering and scaling are used, such statistics often satisfy an asymptotic representation, for given $\theta = \theta_0$,

$$T_n(\theta_0) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \tau(\varepsilon_t(\theta_0), \ldots, \varepsilon_{t-l}(\theta_0)) + o_p(1), \quad (B2)$$


under $P^{(n)}_{\theta_0}$, for some function $\tau : \mathbb{R}^{l+1} \to \mathbb{R}$ such that a ($(l+1)$-dependent) central limit theorem can be applied to equation (B2). We now have the following simple but useful result, which is invoked in all applications mentioned in this paper.

Proposition B1. Suppose that the statistic $T_n(\theta)$ can be written as in equation (B1) and satisfies equation (B2). Then, for local alternatives $\theta_n$ to $\theta_0$, we have, under $P^{(n)}_{\theta_n}$,

$$T_n(\theta_n) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \tau(\varepsilon_t(\theta_n), \ldots, \varepsilon_{t-l}(\theta_n)) + o_p(1). \quad (B3)$$

Moreover, the limiting distribution of $T_n(\theta_n)$ under $P^{(n)}_{\theta_n}$ does not depend on the local alternatives $\theta_n$ and, thus, equals the limiting distribution of $T_n(\theta_0)$ under $P^{(n)}_{\theta_0}$. Thus, condition (AN) holds with respect to $T_n(\theta_n)$.

Proof. The result is immediate upon noting that

$$\mathcal{L}_{\theta_n}\left( T_n(\theta_n) - \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \tau(\varepsilon_t(\theta_n), \ldots, \varepsilon_{t-l}(\theta_n)) \right) = \mathcal{L}_{\theta_0}\left( T_n(\theta_0) - \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \tau(\varepsilon_t(\theta_0), \ldots, \varepsilon_{t-l}(\theta_0)) \right),$$

where $\mathcal{L}_{\theta_n}$ denotes the distribution under $P^{(n)}_{\theta_n}$. Observe that contiguity of $\theta_n$ to $\theta_0$ is actually not even needed for this result.

Remark B1. The zero mean condition on the limiting distribution of $T$ in condition (AN) holds for residual-based statistics discussed in proposition B1 but is indeed specific to this area of applications. The condition can easily be relaxed. Suppose that the mean of the limiting distribution of $T_n(\theta_n)$ would be $a^T\delta$. Consider, for given $\theta_0$, the auxiliary statistic $\tilde T_n(\theta) = T_n(\theta) - a^T \sqrt{n}(\theta - \theta_0)$. The statistic $\tilde T_n$, by construction, satisfies condition (AN), and our main theorem 1 can be invoked. This idea could also be applied in the analysis of generated or estimated regressors.

Concerning the locally uniform convergence of the likelihood ratio in condition (AN), we observe that this is immediate given the required local uniformity in condition (ULAN). Merely imposing a LAN condition would, together with the local uniformity required in condition (AN), essentially imply ULAN. In order to be precise about the scope of our results, we impose condition ULAN from the start.

Concerning the estimator $\hat\theta_n$, condition (AN) imposes that the limiting distribution, in particular its mean, of $\sqrt{n}(\hat\theta_n - \theta_n)$, under $P^{(n)}_{\theta_n}$, does not depend on the local alternatives $\theta_n$ and, hence, equals the limit of $\sqrt{n}(\hat\theta_n - \theta_0)$, under $P^{(n)}_{\theta_0}$. This is to say that the estimator is regular in the sense of Bickel et al. (1993) or van der Vaart (1998, p. 115). Regularity also implies that the asymptotic covariance between the estimator and the central sequence is the $k \times k$ identity matrix $I_k$. In particular, the convolution theorem stating that asymptotic variances of estimators are always larger than the inverse of the Fisher information applies only to regular estimators; regularity is used to rule out superefficient estimators. Estimators that satisfy an asymptotically linear representation can often be transformed into regular estimators. This is formalized in the proposition below, whose proof follows easily along the lines of, say, van der Vaart (1998).

Proposition B2. Maintaining the condition (ULAN), consider an estimator $\hat\theta_n$ that satisfies, under $P^{(n)}_{\theta_0}$, the asymptotic representation

$$\sqrt{n}(\hat\theta_n - \theta_0) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \psi_t(\theta_0) + o_p(1), \quad (B4)$$

with $\psi_t$ some influence function that satisfies, still under $P^{(n)}_{\theta_0}$,

$$\frac{1}{\sqrt{n}} \sum_{t=1}^{n} \psi_t(\theta_0) \xrightarrow{\mathcal{L}} N(0, \Gamma).$$

Moreover, assume, for local alternatives $\theta_n$ to $\theta_0$,

$$\frac{1}{\sqrt{n}} \sum_{t=1}^{n} \psi_t(\theta_n) - \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \psi_t(\theta_0) = -\sqrt{n}(\theta_n - \theta_0) + o_p(1), \quad (B5)$$

under $P^{(n)}_{\theta_0}$. Then, we can construct an estimator $\tilde\theta_n$ such that, under $P^{(n)}_{\theta_n}$,

$$\sqrt{n}(\tilde\theta_n - \theta_n) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \psi_t(\theta_n) + o_p(1), \quad (B6)$$

and, under $P^{(n)}_{\theta_0}$, $\sqrt{n}(\tilde\theta_n - \hat\theta_n) = o_p(1)$.

Proof. The construction follows the ideas in van der Vaart (1998). First, discretize the estimator $\hat\theta_n$ as described in section IV. Denote this estimator $\bar\theta_n$. Define the estimator

$$\tilde\theta_n := \bar\theta_n + \frac{1}{n} \sum_{t=1}^{n} \psi_t(\bar\theta_n). \quad (B7)$$

Observe, for local alternatives $\theta_n$ and by applying equation (B5) twice,

$$\frac{1}{\sqrt{n}} \sum_{t=1}^{n} \psi_t(\bar\theta_n) - \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \psi_t(\theta_n) = -\sqrt{n}(\bar\theta_n - \theta_n) + o_p(1). \quad (B8)$$

Combining equation (B7) with (B8) gives

$$\sqrt{n}(\tilde\theta_n - \theta_n) = \sqrt{n}(\bar\theta_n - \theta_n) + \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \psi_t(\bar\theta_n) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \psi_t(\theta_n) + o_p(1),$$

where the fact that $\bar\theta_n$ may be considered deterministic follows from its discreteness (see van der Vaart, 1998, sect. 5.7, or section IV in this paper for details). The statement about $\sqrt{n}(\tilde\theta_n - \hat\theta_n)$ follows from (B4) and (B6) for $\theta_n = \theta_0$.

If we forget about the discretization, the estimator introduced in equation (B7) actually equals the original estimator $\hat\theta_n$ in case

$$\frac{1}{\sqrt{n}} \sum_{t=1}^{n} \psi_t(\hat\theta_n) = 0.$$

This is obviously the case for any estimator that exactly solves some appropriate score equations, that is, a Z-estimator. In that respect, modulo the technical discretization issue, any Z-estimator satisfies condition (AN) as long as equation (B5) holds. Condition (B5) usually follows from the observation that the influence function of an estimator generally satisfies $\frac{\partial}{\partial\theta^T} E_\theta \psi_t(\theta) = -I_k$, with $I_k$ the identity matrix of dimension $k = \dim(\theta)$. Applying a Taylor expansion to equation (B5) then motivates the condition. In the case of ML estimation, condition (B5) follows immediately from condition (ULAN). Although papers in the econometrics or statistics literature do not always explicitly state or check the regularity of proposed estimators, the additional step of proving equation (B5) in our case is usually not very complicated and generally does not impose additional regularity conditions.

APPENDIX C

Proofs of Main Results

We first recall the so-called Le Cam's third lemma. For a more detailed discussion we refer to van der Vaart (1998). Le Cam's third lemma essentially states that the operations of taking the limit (as $n \to \infty$) and changing the underlying probability measure from the null to a (local) alternative can be interchanged. More precisely, consider the situation of condition (AN) with respect to the statistic of interest $T_n(\theta)$ and the log-likelihood ratio $\Lambda^{(n)}(\theta_n|\theta)$. Le Cam's third lemma asserts that the limiting distribution of $T_n(\theta)$ under the local alternatives $\theta_n = \theta + \delta/\sqrt{n}$ is the same as the distribution of $T$ under the change-of-measure induced by the log-likelihood ratio


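$\tfrac{1}{2}\delta^T I_F \delta + \delta^T \Delta$ in the limit distribution in condition (AN). In the ubiquitous case of a gaussian limit distribution, the resulting limit is well-known to be gaussian with the same variance $\tau^2$, but with a shift in the mean of size $-c^T\delta$.

Proof of Lemma 2. The fact that $\sqrt{n}(\bar\theta_n - \theta_n)$ is degenerated on $\{dj : j \in \mathbb{Z}^k\}$ follows easily from $\sqrt{n}(\bar\theta_n - \theta_n) = d\,r(\sqrt{n}\,\hat\theta_n/d) - d\,r(\sqrt{n}\,\theta_0/d)$. To deduce its limiting distribution, observe the following equalities of events, for fixed $j \in \mathbb{Z}^k$:

$$\begin{aligned} \{\sqrt{n}(\bar\theta_n - \theta_n) = dj\} &= \{d\,r(\sqrt{n}\,\hat\theta_n/d) - d\,r(\sqrt{n}\,\theta_0/d) = dj\} \\ &= \{r(\sqrt{n}\,\hat\theta_n/d) - r(\sqrt{n}\,\theta_0/d) = j\} \\ &= \Big\{ j - \frac{1}{2}\iota < \frac{\sqrt{n}(\hat\theta_n - \theta_n)}{d} \le j + \frac{1}{2}\iota \Big\} \\ &= \Big\{ -\frac{d}{2}\iota < \sqrt{n}(\hat\theta_n - \theta_n) - dj \le \frac{d}{2}\iota \Big\}. \end{aligned}$$

From the conditions (ULAN) and (AN), we find, under $P^{(n)}_{\theta_n + dj/\sqrt{n}}$, as $\delta_n \to \delta$, and as $n \to \infty$,

$$\begin{bmatrix} \Lambda^{(n)}(\theta_n + \delta_n/\sqrt{n} \mid \theta_n + dj/\sqrt{n}) \\ \sqrt{n}(\hat\theta_n - (\theta_n + dj/\sqrt{n})) \end{bmatrix} \xrightarrow{\mathcal{L}} \begin{bmatrix} -\tfrac{1}{2}(\delta - dj)^T I_F (\delta - dj) + (\delta - dj)^T \Delta \\ Z \end{bmatrix},$$

with $[Z^T, \Delta^T]^T$ as in (19). From Le Cam's third lemma, this implies, under $P^{(n)}_{\theta_n + \delta_n/\sqrt{n}}$ and as $n \to \infty$, $\sqrt{n}(\hat\theta_n - (\theta_n + dj/\sqrt{n})) = \sqrt{n}(\hat\theta_n - \theta_n) - dj \xrightarrow{\mathcal{L}} N(\delta - dj, \Gamma)$. Together with the above result on the event $\{\sqrt{n}(\bar\theta_n - \theta_n) = dj\}$, the lemma now follows.

Proof of Theorem 2. From the proof of lemma 2, we know

$$\{\sqrt{n}(\bar\theta_n - \theta_n) = dj\} = \Big\{ -\frac{d}{2}\iota < \sqrt{n}(\hat\theta_n - \theta_n) - dj \le \frac{d}{2}\iota \Big\}.$$

Moreover, applying Le Cam's third lemma as in the proof of lemma 2, we find under $P^{(n)}_{\theta_n + \delta_n/\sqrt{n}}$ and as $n \to \infty$,

$$\begin{bmatrix} T_n(\theta_n + dj/\sqrt{n}) \\ \sqrt{n}(\hat\theta_n - \theta_n) - dj \end{bmatrix} \xrightarrow{\mathcal{L}} N\left( \begin{bmatrix} (\delta - dj)^T c \\ \delta - dj \end{bmatrix}, \begin{bmatrix} \tau^2 & \alpha^T \\ \alpha & \Gamma \end{bmatrix} \right).$$

Taking these two results together, we get, for all $j \in \mathbb{Z}^k$ and with the distribution (19),

$$P^{(n)}_{\theta_n + \delta_n/\sqrt{n}}\{T_n(\theta_n + dj/\sqrt{n}) \le t \text{ and } \sqrt{n}(\bar\theta_n - \theta_n) = dj\} \to P\Big\{T + (\delta - dj)^T c \le t \text{ and } -\frac{d}{2}\iota < Z + (\delta - dj) \le \frac{d}{2}\iota\Big\}.$$

The number of values that $\sqrt{n}(\bar\theta_n - \theta_n)$ takes in a bounded set is finite. Consequently, we may write for each $M > 0$,

$$\begin{aligned} &P^{(n)}_{\theta_n + \delta_n/\sqrt{n}}\{T_n(\bar\theta_n) \le t \text{ and } |\sqrt{n}(\bar\theta_n - \theta_n)| \le M\} \\ &\quad = \sum_{j \in \mathbb{Z}^k,\ d|j| \le M} P^{(n)}_{\theta_n + \delta_n/\sqrt{n}}\{T_n(\theta_n + dj/\sqrt{n}) \le t \text{ and } \sqrt{n}(\bar\theta_n - \theta_n) = dj\} \\ &\quad \to \sum_{j \in \mathbb{Z}^k,\ d|j| \le M} P\Big\{T + (\delta - dj)^T c \le t \text{ and } -\frac{d}{2}\iota < Z + (\delta - dj) \le \frac{d}{2}\iota\Big\}, \end{aligned}$$

as $n \to \infty$. Since $\limsup_{n \to \infty} P^{(n)}_{\theta_n + \delta_n/\sqrt{n}}\{|\sqrt{n}(\bar\theta_n - \theta_n)| > M\} \to 0$ as $M \to \infty$, we obtain

$$P^{(n)}_{\theta_n + \delta_n/\sqrt{n}}\{T_n(\bar\theta_n) \le t\} \to \sum_{j \in \mathbb{Z}^k} P\Big\{T \le t - (\delta - dj)^T c \text{ and } -\frac{d}{2}\iota < Z + (\delta - dj) \le \frac{d}{2}\iota\Big\},$$

as $n \to \infty$. Let $\varphi_{TZ}$ denote the probability density function of $[T, Z^T]^T$ and $\varphi_Z$ that of $Z$. Observe that, conditionally on $Z = z$, $T \sim N(\alpha^T \Gamma^{-1} z,\ \tau^2 - \alpha^T \Gamma^{-1} \alpha)$. Consequently,

$$\begin{aligned} &\sum_{j \in \mathbb{Z}^k} P\Big\{T \le t - (\delta - dj)^T c \text{ and } -\frac{d}{2}\iota < Z + (\delta - dj) \le \frac{d}{2}\iota\Big\} \\ &\quad = \sum_{j \in \mathbb{Z}^k} \int_{z = -(\delta - dj) - \frac{d}{2}\iota}^{-(\delta - dj) + \frac{d}{2}\iota} \int_{x = -\infty}^{t - (\delta - dj)^T c} \varphi_{TZ}(x, z)\,dx\,dz \\ &\quad = \sum_{j \in \mathbb{Z}^k} \int_{z = -(\delta - dj) - \frac{d}{2}\iota}^{-(\delta - dj) + \frac{d}{2}\iota} \Phi\left( \frac{t - (\delta - dj)^T c - \alpha^T \Gamma^{-1} z}{\sqrt{\tau^2 - \alpha^T \Gamma^{-1} \alpha}} \right) \varphi_Z(z)\,dz \\ &\quad = \sum_{j \in \mathbb{Z}^k} \int_{z = -(\delta - dj) - \frac{d}{2}\iota}^{-(\delta - dj) + \frac{d}{2}\iota} \Phi\left( \frac{t - (\alpha - \Gamma c)^T \Gamma^{-1} z}{\sqrt{\tau^2 - \alpha^T \Gamma^{-1} \alpha}} \right) \varphi_Z(z)\,dz + O(d) \\ &\quad = \int_{z \in \mathbb{R}^k} \Phi\left( \frac{t - (\alpha - \Gamma c)^T \Gamma^{-1} z}{\sqrt{\tau^2 - \alpha^T \Gamma^{-1} \alpha}} \right) \varphi_Z(z)\,dz + O(d) \\ &\quad = \int_{z \in \mathbb{R}^k} P\{X \le t \mid Z = z\}\,\varphi_Z(z)\,dz + O(d) \\ &\quad \to P\{X \le t\}, \end{aligned}$$

as $d \downarrow 0$, with

$$\begin{bmatrix} X \\ Z \end{bmatrix} \sim N\left( 0, \begin{bmatrix} \tau^2 + (\alpha - \Gamma c)^T \Gamma^{-1} (\alpha - \Gamma c) - \alpha^T \Gamma^{-1} \alpha & (\alpha - \Gamma c)^T \\ \alpha - \Gamma c & \Gamma \end{bmatrix} \right).$$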
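The discretization used in Lemma 2 and in the proofs above can be made concrete with a short sketch. The rounding rule (ties rounded up), the helper names, and the numerical values below are illustrative assumptions, not taken from the paper; the code only evaluates the rounding map $d\,r(\sqrt{n}\,\theta/d)/\sqrt{n}$ componentwise and recovers the grid index $j$.

```python
import math


def arithmetic_round(x):
    """Round to the nearest integer; ties go up (an assumed convention for r)."""
    return math.floor(x + 0.5)


def discretize(theta, n, d):
    """Componentwise d * r(sqrt(n) * theta / d) / sqrt(n)."""
    root_n = math.sqrt(n)
    return [d * arithmetic_round(root_n * component / d) / root_n
            for component in theta]


def grid_index(theta_bar, theta_n, n, d):
    """Integer vector j such that sqrt(n) * (theta_bar - theta_n) = d * j."""
    root_n = math.sqrt(n)
    return [round(root_n * (a - b) / d) for a, b in zip(theta_bar, theta_n)]


# Hypothetical numbers, chosen only for illustration.
n, d = 100, 0.5
theta_hat = [0.123, -0.456]   # original estimate
theta_0 = [0.1, -0.5]         # true parameter value

theta_bar = discretize(theta_hat, n, d)   # discretized estimator
theta_n = discretize(theta_0, n, d)       # "discretized truth"
j = grid_index(theta_bar, theta_n, n, d)
print(theta_bar, theta_n, j)
```

With these inputs the local estimation error of the discretized estimator sits exactly on the grid $\{dj\}$, and each component of the discretized estimator differs from the original estimate by at most $d/(2\sqrt{n})$.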

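As a rough numerical illustration of Lemma 2 and of the last steps in the proof of Theorem 2, the sketch below treats the scalar case $k = 1$. It evaluates the cell probabilities in equation (22), and it approximates the sum over grid cells by a midpoint rule to compare it with $P\{X \le t\}$ for a small discretization constant. All parameter values, the truncation `span`, and the function names are assumptions made for this example only.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(z, var):
    """Density of N(0, var) at z."""
    return math.exp(-0.5 * z * z / var) / math.sqrt(2.0 * math.pi * var)


def cell_probability(j, d, delta, gamma):
    """Limit in (22) for k = 1: P{N(delta - d*j, gamma) in (-d/2, d/2]}."""
    mean = delta - d * j
    sd = math.sqrt(gamma)
    return norm_cdf((d / 2 - mean) / sd) - norm_cdf((-d / 2 - mean) / sd)


def discretized_limit_cdf(t, d, delta, tau2, c, alpha, gamma, span=8.0, sub=20):
    """Sum over cells j of P{T <= t - (delta - d*j)*c, Z + (delta - d*j) in (-d/2, d/2]}."""
    s = math.sqrt(tau2 - alpha * alpha / gamma)   # sd of T given Z = z
    half_width = span * math.sqrt(gamma)          # truncate the range of Z
    j_lo = math.floor((delta - half_width) / d)
    j_hi = math.ceil((delta + half_width) / d)
    h = d / sub
    total = 0.0
    for j in range(j_lo, j_hi + 1):
        m = delta - d * j
        for i in range(sub):
            z = -m - d / 2 + (i + 0.5) * h        # midpoint inside cell j
            total += norm_cdf((t - m * c - alpha * z / gamma) / s) * norm_pdf(z, gamma) * h
    return total


def limit_cdf(t, tau2, c, alpha, gamma):
    """P{X <= t} with Var X = tau2 + (alpha - gamma*c)^2/gamma - alpha^2/gamma."""
    var_x = tau2 + (alpha - gamma * c) ** 2 / gamma - alpha ** 2 / gamma
    return norm_cdf(t / math.sqrt(var_x))


# Hypothetical scalar parameters.
tau2, c, alpha, gamma, delta, t = 1.0, 0.5, 0.3, 1.5, 0.7, 0.4
total_mass = sum(cell_probability(j, 0.5, delta, gamma) for j in range(-200, 201))
approx = discretized_limit_cdf(t, 0.1, delta, tau2, c, alpha, gamma)
exact = limit_cdf(t, tau2, c, alpha, gamma)
print(total_mass, approx, exact)
```

The cell probabilities telescope to one, and for a small $d$ the discretized sum is close to the normal limit, in line with the $O(d)$ remainder in the proof.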
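The limiting variance of $X$ in Theorem 2 can also be checked numerically. One way to read the joint law of $(X, Z)$ above is that $X$ has the same distribution as $T - c^T Z$, whose variance is $\tau^2 - 2c^T\alpha + c^T\Gamma c$; expanding the quadratic form shows that this equals the expression in the theorem. The sketch below, which assumes NumPy is available and uses made-up parameter values, verifies this identity and the three special cases discussed in the text ($c = 0$, $\alpha = \Gamma c$, $\alpha = 0$).

```python
import numpy as np


def residual_limit_variance(tau2, c, alpha, gamma):
    """tau^2 + (alpha - Gamma c)' Gamma^{-1} (alpha - Gamma c) - alpha' Gamma^{-1} alpha."""
    c = np.asarray(c, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    shifted = alpha - gamma @ c
    return float(tau2
                 + shifted @ np.linalg.solve(gamma, shifted)
                 - alpha @ np.linalg.solve(gamma, alpha))


def variance_of_t_minus_cz(tau2, c, alpha, gamma):
    """Var(T - c'Z) when Var T = tau2, Cov(T, Z) = alpha, Var Z = Gamma."""
    c = np.asarray(c, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    return float(tau2 - 2.0 * c @ alpha + c @ gamma @ c)


# Hypothetical parameter values for a two-dimensional theta.
tau2 = 1.0
gamma = np.array([[2.0, 0.5], [0.5, 1.0]])
alpha = np.array([0.3, -0.2])
c = np.array([0.4, 0.1])

general = residual_limit_variance(tau2, c, alpha, gamma)
no_correlation = residual_limit_variance(tau2, np.zeros(2), alpha, gamma)   # c = 0
efficient = residual_limit_variance(tau2, c, gamma @ c, gamma)              # alpha = Gamma c
independent = residual_limit_variance(tau2, c, np.zeros(2), gamma)          # alpha = 0
print(general, no_correlation, efficient, independent)
```

In this example the variance is unchanged when $c = 0$, falls below $\tau^2$ when $\alpha = \Gamma c$, and rises above $\tau^2$ when $\alpha = 0$, matching the three cases described in section III.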