POWER MAXIMIZATION AND SIZE CONTROL IN HETEROSKEDASTICITY AND AUTOCORRELATION ROBUST TESTS WITH EXPONENTIATED KERNELS

By

Yixiao Sun, Peter C. B. Phillips and Sainan Jin

January 2010

COWLES FOUNDATION DISCUSSION PAPER NO. 1749

COWLES FOUNDATION FOR RESEARCH IN ECONOMICS YALE UNIVERSITY Box 208281 New Haven, Connecticut 06520-8281

Power Maximization and Size Control in Heteroskedasticity and Autocorrelation Robust Tests with Exponentiated Kernels

Yixiao Sun Department of Economics University of California, San Diego

Peter C. B. Phillips Yale University, University of Auckland, Singapore Management University & University of Southampton

Sainan Jin School of Economics Singapore Management University

August 14, 2009

The authors gratefully acknowledge partial research support from the NSF under Grant No. SES-0752443 (Sun) and SES 06-47086 (Phillips), and from the NSFC under Grants No. 70501001 and 70601001 (Jin).

ABSTRACT

Using the power kernels of Phillips, Sun and Jin (2006, 2007), we examine the large sample asymptotic properties of the t-test for different choices of the power parameter (ρ). We show that the nonstandard fixed-ρ limit distributions of the t-statistic provide more accurate approximations to the finite sample distributions than the conventional large-ρ limit distribution. We prove that the second-order corrected critical value based on an asymptotic expansion of the nonstandard limit distribution is also second-order correct under the large-ρ asymptotics. As a further contribution, we propose a new practical procedure for selecting the test-optimal power parameter that addresses the central concern of hypothesis testing: the selected power parameter is test-optimal in the sense that it minimizes the type II error while controlling for the type I error. A plug-in procedure for implementing the test-optimal power parameter is suggested. Simulations indicate that the new test is as accurate in size as the nonstandard test of Kiefer and Vogelsang (2002a, 2002b; KV), and yet it does not incur the power loss that often hurts the performance of the latter test. The new test therefore combines the advantages of the KV test and the standard (MSE optimal) HAC test while avoiding their main disadvantages (power loss and size distortion, respectively). The results complement recent work by Sun, Phillips and Jin (2008) on conventional and bT HAC testing.

JEL Classification: C13; C14; C22; C51

Keywords: Asymptotic expansion, HAC estimation, Long run variance, Optimal smoothing parameter, Power kernel, Power maximization, Size control, Type I error, Type II error.

1 Introduction

Seeking to robustify inference, many practical methods in econometrics now make use of heteroskedasticity and autocorrelation consistent (HAC) covariance matrix estimates. Most commonly used HAC estimates are formulated using conventional kernel smoothing techniques (for an overview, see den Haan and Levin (1997)), although quite different approaches like wavelets (Hong and Lee (2001)) and direct regression methods (Phillips (2005)) have recently been explored. While appealing in terms of their asymptotic properties, consistent HAC estimates provide only asymptotic robustness in econometric testing, and finite sample performance is known to be unsatisfactory in many cases, especially when there is strong autocorrelation in the data. HAC estimates are then biased downwards and the associated tests are liberal-biased. These size distortions in testing are often substantial and have been discussed extensively in recent work (e.g., Kiefer and Vogelsang (2005, hereafter KV (2005)) and Sul, Phillips and Choi (2005)).

To address the size distortion problem, Kiefer, Vogelsang and Bunzel (2000, hereafter KVB), Kiefer and Vogelsang (2002a, 2002b, hereafter KV) and Vogelsang (2003) suggested setting the bandwidth equal to the sample size (i.e. M = T) in the construction of the long run variance (LRV) estimate. While the resulting LRV estimate is inconsistent, the associated test statistic is asymptotically nuisance parameter free and critical values can be simulated from its nonstandard asymptotic distribution. KV show by simulation that the nonstandard test has better size properties than the conventional asymptotic normal or chi-squared test. However, the size improvement comes at the cost of a clear power reduction. In order to reduce the power loss while maintaining the good size properties of the KVB test, KV (2005) set the bandwidth to be proportional to the sample size (T), i.e. M = bT for some b ∈ (0, 1]. Their approach is equivalent to contracting traditional kernels k(·) to get k_b(x) = k(x/b) and using the contracted kernels k_b(·) in the LRV estimation without truncation. In other work, Phillips, Sun and Jin (2006, 2007; hereafter PSJ) obtain a new class of kernels by exponentiating the conventional kernels, using k_ρ(·) = (k(·))^ρ, where ρ is a power exponent parameter. For finite b and ρ, both contracted and exponentiated kernels are designed to reduce, but not totally eliminate, the randomness of the denominator in the t-ratio, and in doing so help to improve the power of the t-test.

The parameter b in the KV approach and ρ in the PSJ approach are smoothing parameters that play an important role in balancing size distortion and potential power gain. In other words, the smoothing parameters entail some inherent tradeoff between type I and type II errors. Both KV and PSJ suggest plug-in procedures that rely on conventional asymptotic theories to select b or ρ. More specifically, the plug-in selection of b or ρ suggested in these papers minimizes the asymptotic mean squared error (AMSE) of the underlying LRV estimator. In theory, such formulations ensure that the selected values b → 0 and ρ → ∞ as T → ∞, and thus the fixed-b or fixed-ρ asymptotic theory is not applicable. However, to maintain good size properties or smaller type I errors in practical testing, KV and PSJ propose using critical values from the fixed-b or fixed-ρ asymptotic distributions, which are nonstandard, and treating the estimated b or ρ as fixed even though they are delivered by asymptotic

formulae. PSJ justify this hybrid procedure on the basis of simulations that show the resulting tests have better size properties than tests that use standard normal asymptotic critical values. However, there are two remaining problems with this procedure. First, the rationale for good test size based on such plug-in values of ρ is not rigorously established. Second, the AMSE-optimal choice of ρ is not necessarily optimal in the context of hypothesis testing. A primary contribution of the present paper is to provide analytic solutions to both of these problems by means of asymptotic expansions that provide higher order information about the type I and type II errors. This approach provides a rigorous justification for the recommended procedure. We consider both first order and second order power kernels from PSJ (2006, 2007), which have been found to work very well in simulations and empirical applications (see Ray and Savin (2008), Ray, Savin, and Tiwari (2009)).

To investigate the size properties of the PSJ test, we consider the Gaussian location model, which is used also in Jansson (2004) and SPJ (2008). Asymptotic expansions developed here reveal that the PSJ statistic is closer to its limit distribution when ρ is fixed than when ρ increases with T. More specifically, the error in rejection probability (ERP) of the t-test based on the nonstandard limiting distribution is of order O(1/T), while that based on the standard normal is O(T^{-q/(q+1)}). This result relates to similar results in Jansson and SPJ, who showed that the ERP of the KVB test is of order O(log T/T) and O(1/T), respectively, while the ERP of the conventional test using the Bartlett kernel is at most O(1/√T), as shown in Velasco and Robinson (2001). These findings therefore provide theoretical support for the simulation results reported in PSJ (2006, 2007), Ray and Savin (2008), and Ray, Savin, and Tiwari (2009).
The PSJ test, which is based on the nonstandard limiting distribution, is not very convenient to use in practice as the critical values have to be simulated. To design an easy-to-implement test, we develop an expansion of the nonstandard limiting distribution about its limiting chi-squared distribution. A Cornish-Fisher type expansion then leads to second-order corrected critical values. We find that the corrected critical values provide good approximations to the actual critical values of the nonstandard distribution. The PSJ test based on the corrected critical values has the advantage of being easily implemented and does not require the use of tables of nonstandard distributions.

To show that the hybrid PSJ test using a plug-in exponent and nonstandard critical values is generally less size-distorted than the conventional test, we develop a higher order asymptotic expansion of the finite sample distribution of the t-statistic as T and ρ go to infinity simultaneously. It is shown that the corrected critical values obtained from the asymptotic expansion of the nonstandard distribution are also second order correct under the conventional limiting theory. This finding provides a theoretical explanation for the size improvement of the hybrid PSJ test compared to the conventional test.

Combining the standard t-statistic and the higher order corrected critical values, we obtain a new t-test whose type I and type II errors can be approximately measured

using the above asymptotic expansions. The type I and type II errors depend on the power parameter used in the HAC estimation. Following Sun (2009), we propose to choose the power parameter to minimize the type II error subject to the constraint that the type I error is bounded. The bound is defined to be κα, where α is the nominal type I error and κ > 1 is a parameter that captures the user's tolerance for the discrepancy between the nominal and true type I errors. The parameter κ is allowed to be sample size dependent. For a smaller sample size, we may have a higher tolerance, while for larger sample sizes we may have a lower tolerance. The new procedure addresses the central concern of classical hypothesis testing, viz. maximizing power subject to controlling size. For convenience, we refer to the resulting ρ as the test-optimal ρ.

The test-optimal ρ is fundamentally different from the MSE-optimal ρ that applies when minimizing the asymptotic mean squared error of the corresponding HAC variance estimate (c.f., PSJ (2006, 2007)). When the tolerance factor κ is small enough, the test-optimal ρ is of smaller order than the MSE-optimal ρ. The test-optimal ρ can even be O(1) for certain choices of the tolerance factor κ. The theory thus provides some theoretical justification for the use of fixed-ρ rules in econometric testing. For typical economic time series and with a high tolerance for the type I error, the test-optimal ρ can also be considerably larger than the MSE-optimal ρ. Simulation results show that the new plug-in procedure suggested in the present paper works remarkably well in finite samples.

In related work, SPJ (2008) considered conventional kernels and selected the bandwidth to minimize a loss function formed from a weighted average of type I and type II errors. The present paper differs from SPJ in two aspects. First, while SPJ used the contracted kernel method, an exponentiated kernel approach is used here.
An earlier version of the present paper followed SPJ and employed a loss function methodology to select the power parameter, finding that for both LRV estimation and hypothesis testing, the finite sample performance of the exponentiated kernel method is similar to, and sometimes better than, that of the contracted kernel method. So exponentiated kernels appear to have some natural advantages. Second, the procedure for selecting the smoothing parameter is different in the present paper. While SPJ selected the smoothing parameter to minimize loss based on a weighted average of type I and type II errors, we minimize the type II error after controlling for the type I error. In effect, the loss function here is implicitly defined with an endogenous weight given by the Lagrange multiplier, while the loss function in SPJ is explicitly defined and thus requires a user-chosen weight. This requirement can be regarded as a drawback of the explicit loss function approach, especially when it is hard to evaluate the relative importance of type I and type II errors. Furthermore, the implicit loss function approach used here is more in line with the standard econometric testing literature, where size control is often a top priority.

The rest of the paper is organized as follows. Section 2 overviews the class of power kernels that are used in the present paper and reviews some first order limit theory for Wald type tests as T → ∞ with the power parameter ρ fixed and as ρ → ∞. Section 3 derives an exact distribution theory using operational techniques.

Section 4 develops an asymptotic expansion of the nonstandard limit distribution under both the null and alternative hypotheses as the power parameter ρ → ∞. Section 5 develops comparable finite sample expansions of the statistic as T → ∞ for a fixed ρ and as both T → ∞ and ρ → ∞. Section 6 proposes a selection rule for ρ that is suitable for implementation in semiparametric testing. Section 7 reports some simulation evidence on the performance of the new procedure. Section 8 concludes. Proofs and additional technical results are in the Appendix.

2 HAR Inference for the Mean

Consider the location model:

y_t = μ + u_t,  t = 1, 2, ..., T,  (1)

with u_t autocorrelated and possibly heteroskedastic, and E(u_t) = 0. To test a hypothesis about μ, we consider the OLS estimator μ̂ = ȳ = T^{-1} Σ_{t=1}^T y_t. Recentering and normalizing gives us

√T(μ̂ − μ) = T^{-1/2} S_T,  (2)

where S_t = Σ_{τ=1}^t u_τ. We impose the following convenient high level condition (e.g., KVB, PSJ (2006, 2007), and Jansson (2004)).

Assumption A1 The partial sum process S_{[Tr]} satisfies the functional law T^{-1/2} S_{[Tr]} ⇒ ωW(r), r ∈ [0, 1], where ω² is the long run variance of u_t and W(r) is standard Brownian motion.

Thus, √T(μ̂ − μ) ⇒ ωW(1) = N(0, ω²). Let û_t = y_t − μ̂ be the demeaned time series and define the corresponding partial sum process Ŝ_t = Σ_{τ=1}^t û_τ. Under A1, we have T^{-1/2} Ŝ_{[Tr]} ⇒ ωV(r), r ∈ [0, 1], where V is a standard Brownian bridge process. When u_t is stationary, the long run variance of u_t is ω² = γ(0) + 2 Σ_{j=1}^∞ γ(j), where γ(j) = E(u_t u_{t−j}). The conventional approach to estimating ω² typically involves smoothing and truncating the sample autocovariances using kernel-based nonparametric HAC estimators. HAC estimates of ω² typically have the form

ω̂²(M) = Σ_{j=−T+1}^{T−1} k(j/M) γ̂(j),  γ̂(j) = T^{-1} Σ_{t=1}^{T−j} û_{t+j} û_t for j ≥ 0, and γ̂(j) = T^{-1} Σ_{t=−j+1}^{T} û_{t+j} û_t for j < 0,  (3)

where γ̂(j) are sample autocovariances, k(·) is some kernel function, and M is a bandwidth parameter. Consistency of ω̂²(M) requires that M grows with the sample size T but at a slower rate, so that M = o(T) (e.g. Andrews (1991), Andrews and Monahan (1992), Hansen (1992), Newey and West (1987, 1994), de Jong and Davidson (2000)). Jansson (2002) provides a recent overview and weak conditions for consistency of such estimates.

To test the null H₀: μ = μ₀ against H₁: μ ≠ μ₀, the standard nonparametrically studentized t-ratio statistic is of the form

t_{ω̂(M)} = √T(μ̂ − μ₀)/ω̂(M),  (4)

which is asymptotically standard normal. Tests based on t_{ω̂(M)} and critical values from the standard normal are subject to size distortion, especially when there is strong autocorrelation in the time series. In a series of papers, KVB and KV propose the use of kernel-based estimators of ω² in which M is set equal to the sample size T or proportional to T, taking the form M = bT. These (so-called fixed-b) estimates are inconsistent and tend to random quantities instead of ω², so the limit distribution of (4) is no longer standard normal. Nonetheless, use of these estimates results in valid asymptotically similar tests.

In related work, PSJ (2006, 2007) propose the use of estimates of ω² based on power kernels without truncation. The power kernels were constructed by taking an arbitrary integer power ρ ≥ 1 of conventional kernels. In this paper, we consider the power kernels k_ρ(x) = (k_BART(x))^ρ, (k_PR(x))^ρ, or (k_QS(x))^ρ, where

k_BART(x) = 1 − |x| for |x| ≤ 1, and 0 for |x| > 1;

k_PR(x) = 1 − 6x² + 6|x|³ for 0 ≤ |x| ≤ 1/2;  2(1 − |x|)³ for 1/2 ≤ |x| ≤ 1;  0 otherwise;

k_QS(x) = (25/(12π²x²)) [ sin(6πx/5)/(6πx/5) − cos(6πx/5) ]

are the Bartlett, Parzen and Quadratic Spectral (QS) kernels, respectively. These kernels have a linear or quadratic expansion at the origin:

k(x) = 1 − g|x|^q + o(|x|^q), as x → 0⁺,  (5)

where g = 1, q = 1 for the Bartlett kernel; g = 6, q = 2 for the Parzen kernel; and g = 18π²/125, q = 2 for the QS kernel. For convenience, we call the exponentiated Bartlett kernels the first order power kernels and the exponentiated Parzen and QS kernels the second order power kernels. Using k_ρ in (3) and letting M = T gives HAC estimates of the form

ω̂²_ρ = Σ_{j=−T+1}^{T−1} k_ρ(j/T) γ̂(j).  (6)

The associated t-statistic is given by

t_ρ := t(ω̂_ρ) = √T(μ̂ − μ₀)/ω̂_ρ.  (7)

When the power parameter ρ is fixed as T → ∞, PSJ (2006, 2007) show that under A1, ω̂²_ρ ⇒ ω² Ξ_ρ, where Ξ_ρ = ∫₀¹∫₀¹ k_ρ(r − s) dV(r) dV(s). The associated t-statistic has the nonstandard limit distribution

t(ω̂_ρ) ⇒ W(1) Ξ_ρ^{−1/2}  (8)

under the null and

t(ω̂_ρ) ⇒ (δ + W(1)) Ξ_ρ^{−1/2}  (9)

under the local alternative H₁: μ = μ₀ + cT^{−1/2}, where δ = c/ω. When ρ goes to ∞ at a certain rate, PSJ (2006, 2007) further show that ω̂_ρ is consistent. In this case, the t-statistic has conventional normal limits: under the null t(ω̂_ρ) ⇒ W(1) =d N(0, 1), and under the local alternative t(ω̂_ρ) ⇒ δ + W(1).

Thus, the t-statistic has nonstandard limit distributions arising from the random limit of the HAC estimate ω̂_ρ when ρ is fixed as T → ∞, just as the KVB and KV tests do. However, as ρ increases, the effect of this randomness diminishes, and when ρ → ∞ the limit distributions approach those of standard regression tests with consistent HAC estimates.

The mechanism we develop for making improvements in size without sacrificing much power is to use a test statistic constructed with ω̂_ρ and to employ critical values that are second-order corrected. The correction is first obtained using an accurate but simple asymptotic expansion of the nonstandard distribution about its limiting chi-squared distribution that applies as ρ → ∞. This expansion is developed in Section 4. The correction is further justified by an asymptotic expansion of the finite sample distribution in Section 5.
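To make the construction in (6) and (7) concrete, here is a minimal numerical sketch in Python. It is our own scaffolding, not code from the paper: the function names are ours, and ρ should be a positive integer as in the paper (the QS kernel takes negative values, so non-integer powers are not well defined).

```python
import numpy as np

def k_bartlett(x):
    x = np.abs(np.asarray(x, dtype=float))
    return np.where(x <= 1.0, 1.0 - x, 0.0)

def k_parzen(x):
    x = np.abs(np.asarray(x, dtype=float))
    out = np.zeros_like(x)
    lo = x <= 0.5
    out[lo] = 1.0 - 6.0 * x[lo] ** 2 + 6.0 * x[lo] ** 3
    hi = (x > 0.5) & (x <= 1.0)
    out[hi] = 2.0 * (1.0 - x[hi]) ** 3
    return out

def k_qs(x):
    x = np.asarray(x, dtype=float)
    out = np.ones_like(x)          # k_QS(0) = 1 by continuity
    nz = x != 0.0
    a = 6.0 * np.pi * x[nz] / 5.0
    out[nz] = 25.0 / (12.0 * np.pi ** 2 * x[nz] ** 2) * (np.sin(a) / a - np.cos(a))
    return out

def omega2_rho(u_hat, kernel, rho):
    """Power-kernel LRV estimate (6): bandwidth M = T, weights k(j/T)^rho."""
    T = len(u_hat)
    gam = np.array([u_hat[j:] @ u_hat[:T - j] / T for j in range(T)])  # gamma_hat(j), j >= 0
    w = kernel(np.arange(T) / T) ** rho
    return float(w[0] * gam[0] + 2.0 * np.sum(w[1:] * gam[1:]))        # gamma_hat(-j) = gamma_hat(j)

def t_rho(y, mu0, kernel, rho):
    """The studentized statistic (7) in the location model."""
    T = len(y)
    u_hat = y - y.mean()
    return float(np.sqrt(T) * (y.mean() - mu0) / np.sqrt(omega2_rho(u_hat, kernel, rho)))
```

Because the exponentiated kernels are used without truncation (M = T), ω̂²_ρ is a positive semi-definite quadratic form in the demeaned data, so the square root above is always well defined.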

3 Probability Densities of the Nonstandard Limit Distribution and the Finite Sample Distribution

This section develops some useful formulae for the probability densities of the fixed-ρ limit theory and the exact distribution of the test statistic. First note that in the fixed-ρ limit theory of the t-ratio test, W(1) is independent of Ξ_ρ, so the conditional distribution of W(1) Ξ_ρ^{−1/2} given Ξ_ρ is normal with zero mean and variance Ξ_ρ^{−1}. We can write Ξ_ρ = Ξ_ρ(V), where the process V has probability measure P(·). The pdf of t_ρ = W(1) Ξ_ρ^{−1/2} can then be written in the mixed normal form as

p_{t_ρ}(z) = ∫_{Ξ_ρ(V)>0} N(0; Ξ_ρ^{−1}(V)) dP(V).  (10)

For the finite sample distribution of t_T = t(ω̂_ρ), we assume that u_t is a Gaussian process. Since u_t is in general autocorrelated, √T(μ̂ − μ) and ω̂_ρ are statistically dependent. To find the exact finite sample distribution of the t-statistic, we decompose μ̂ and ω̂_ρ into statistically independent components. Let u = (u₁, ..., u_T)′, y = (y₁, ..., y_T)′, l_T = (1, ..., 1)′ and Ω_T = var(u). Then the GLS estimator of μ is μ̃ = (l_T′ Ω_T^{−1} l_T)^{−1} l_T′ Ω_T^{−1} y and

μ̂ = μ̃ + (l_T′ l_T)^{−1} l_T′ ũ,  (11)

where ũ = (I − l_T (l_T′ Ω_T^{−1} l_T)^{−1} l_T′ Ω_T^{−1}) u, which is statistically independent of μ̃. Therefore the t-statistic can be written as

t_T = √T(μ̃ − μ₀)/ω̂(û) + l_T′ ũ/(√T ω̂(ũ)).  (12)

It is easy to see that û = (I − l_T (l_T′ l_T)^{−1} l_T′) u = (I − l_T (l_T′ l_T)^{−1} l_T′) ũ. In consequence, the conditional distribution of t_T given ũ is

N( l_T′ ũ/(√T ω̂(ũ)),  (T^{−1} l_T′ Ω_T^{−1} l_T)^{−1} (ω̂(ũ))^{−2} ).  (13)

Letting P(ũ) be the probability measure of ũ, we deduce that the probability density of t_T is

p_{t_T}(z) = ∫ N( l_T′ ũ/(√T ω̂(ũ));  (T^{−1} l_T′ Ω_T^{−1} l_T)^{−1} (ω̂(ũ))^{−2} ) dP(ũ)
          = E{ N( l_T′ ũ/(√T ω̂(ũ));  (T^{−1} l_T′ Ω_T^{−1} l_T)^{−1} (ω̂(ũ))^{−2} ) },  (14)

which is a mean and variance mixture of normal distributions. Using ũ ~ N(0, Ω_T − l_T (l_T′ Ω_T^{−1} l_T)^{−1} l_T′) and employing operational techniques along the lines developed in Phillips (1993), we can write expression (14) in the form

p_{t_T}(z) = [ N( l_T′ ∂_q/(√T ω̂(∂_q));  (T^{−1} l_T′ Ω_T^{−1} l_T)^{−1} (ω̂(∂_q))^{−2} ) ∫ e^{q′ũ} dP(ũ) ]_{q=0}  (15)
           = [ N( l_T′ ∂_q/(√T ω̂(∂_q));  (T^{−1} l_T′ Ω_T^{−1} l_T)^{−1} (ω̂(∂_q))^{−2} ) exp{ (1/2) q′ (Ω_T − l_T (l_T′ Ω_T^{−1} l_T)^{−1} l_T′) q } ]_{q=0}.

This provides a general expression for the finite sample distribution of the test statistic t_T under Gaussianity.
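The mixture representation (14) also lends itself to direct Monte Carlo evaluation: draw ũ from its (singular) Gaussian law and average the resulting conditional normal densities. The sketch below is our own illustration, with an AR(1) choice of Ω_T and the exponentiated Bartlett kernel as illustrative assumptions; none of the names come from the paper.

```python
import numpy as np
from math import sqrt, pi

def ar1_cov(T, phi, sigma2=1.0):
    # Omega_T for an AR(1) process u_t = phi * u_{t-1} + e_t (illustrative choice)
    h = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    return sigma2 / (1.0 - phi ** 2) * phi ** h

def density_tT(z_grid, T=50, phi=0.5, rho=8, n_draws=2000, seed=0):
    """Monte Carlo evaluation of the mixture density (14) under the null."""
    rng = np.random.default_rng(seed)
    l = np.ones(T)
    Om = ar1_cov(T, phi)
    a = l @ np.linalg.solve(Om, l)        # l' Om^{-1} l
    V = Om - np.outer(l, l) / a           # var(u_tilde), singular
    w, Q = np.linalg.eigh(V)              # draw u_tilde via eigen-decomposition
    L = Q * np.sqrt(np.clip(w, 0.0, None))
    cond_var_num = T / a                  # (T^{-1} l' Om^{-1} l)^{-1}
    kw = np.maximum(1.0 - np.arange(T) / T, 0.0) ** rho   # Bartlett^rho weights
    dens = np.zeros_like(z_grid, dtype=float)
    for _ in range(n_draws):
        ut = L @ rng.standard_normal(T)
        uh = ut - ut.mean()               # u_hat = (I - l l'/T) u_tilde, cf. the text
        gam = np.array([uh[j:] @ uh[:T - j] / T for j in range(T)])
        om2 = kw[0] * gam[0] + 2.0 * (kw[1:] * gam[1:]).sum()
        m = (l @ ut) / (sqrt(T) * sqrt(om2))          # conditional mean in (13)
        v = cond_var_num / om2                        # conditional variance in (13)
        dens += np.exp(-(z_grid - m) ** 2 / (2.0 * v)) / sqrt(2.0 * pi * v)
    return dens / n_draws
```

Averaging normal densities rather than simulating t_T directly exploits the conditional normality in (13), which reduces the simulation noise for a given number of draws.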

4 Expansion of the Nonstandard Limit Theory

Asymptotic expansions of the limit distributions given in (8) and (9) can be obtained as the power parameter ρ → ∞. Moreover, these expansions may be developed for both the central and noncentral chi-squared limit distributions that apply as ρ → ∞, corresponding to the null and alternative hypotheses. The approach adopted here is to work with the expansion of the noncentral distribution and is therefore closely related to the approach used in SPJ (2008) for studying standard and bT type tests.

Let G(·) = G(·; δ²) be the cdf of a noncentral χ²₁(δ²) variate with noncentrality parameter δ². Then P{ (δ + W(1)) Ξ_ρ^{−1/2} ≤ z } = P{ (δ + W(1))² ≤ z² Ξ_ρ } = E{ G(z² Ξ_ρ) }. An expansion of E{ G(z² Ξ_ρ) } can be developed in terms of the moments of Ξ_ρ − μ_ρ, where μ_ρ = E(Ξ_ρ). In particular, we have

E G(z² Ξ_ρ) = G(z² μ_ρ) + (1/2) G′′(z² μ_ρ) E(Ξ_ρ − μ_ρ)² z⁴ + (1/6) E{ G′′′(z² μ_ρ)(Ξ_ρ − μ_ρ)³ } z⁶ + O( E(Ξ_ρ − μ_ρ)⁴ ),  (16)

as ρ → ∞, where the O(·) term holds uniformly for any z ∈ [M_l, M_u] ⊂ ℝ⁺, and M_l and M_u may be chosen arbitrarily small and large, respectively.

It is easy to see that Ξ_ρ = ∫₀¹∫₀¹ k*_ρ(r, s) dW(r) dW(s), where k*_ρ(r, s) is defined by

k*_ρ(r, s) = k_ρ(r − s) − ∫₀¹ k_ρ(r − t) dt − ∫₀¹ k_ρ(τ − s) dτ + ∫₀¹∫₀¹ k_ρ(t − τ) dt dτ.

Since k*_ρ(r, s) is a positive semi-definite kernel (see Sun (2004) for a proof), it can be represented as k*_ρ(r, s) = Σ_{n=1}^∞ λ_n f_n(r) f_n(s) by Mercer's theorem, where λ_n > 0 are the eigenvalues of the kernel and f_n(r) are the corresponding eigenfunctions, i.e. λ_n f_n(s) = ∫₀¹ k*_ρ(r, s) f_n(r) dr. Using this representation, we can write Ξ_ρ = Σ_{n=1}^∞ λ_n Z_n², where Z_n ~ iid N(0, 1). Therefore, the characteristic function of Ξ_ρ − μ_ρ is given by ψ(t) = E exp{ it(Ξ_ρ − μ_ρ) } = e^{−itμ_ρ} Π_{n=1}^∞ (1 − 2iλ_n t)^{−1/2}.

Let κ₁, κ₂, κ₃, ... be the cumulants of Ξ_ρ − μ_ρ. Then

κ₁ = 0,  κ_m = 2^{m−1}(m − 1)! ∫₀¹ ··· ∫₀¹ [ Π_{j=1}^m k*_ρ(τ_j, τ_{j+1}) ] dτ₁ ··· dτ_m for m ≥ 2,  (17)

where τ₁ = τ_{m+1}. These calculations enable us to develop an asymptotic expansion of E G(z² Ξ_ρ) as the power parameter ρ → ∞. A full series expansion is possible using this method, but we only require the leading term in the expansion in what follows. Ignoring the technical details, we have, up to smaller order terms,

P{ (δ + W(1)) Ξ_ρ^{−1/2} ≤ z } = G(z²) − z² G′(z²) ∫₀¹∫₀¹ k_ρ(r − s) dr ds + z⁴ G′′(z²) ∫₀¹∫₀¹ k²_ρ(r − s) dr ds.  (18)

Depending on the value of q, the integral ∫₀¹∫₀¹ k_ρ(r − s) dr ds has different expansions as ρ → ∞. For first order power kernels, direct calculation yields

∫₀¹∫₀¹ k_ρ(r − s) dr ds = 2/ρ + O(1/ρ²).

For second order power kernels, we use the Laplace approximation and obtain

∫₀¹∫₀¹ k_ρ(r − s) dr ds = (π/(gρ))^{1/2} + O(1/ρ).

It is clear that the rate of decay of the integral depends on the local behavior of the kernel function at the origin. Plugging these expressions into (18), we obtain the following theorem.

Theorem 1 Let F_ρ(z) := P{ (δ + W(1)) Ξ_ρ^{−1/2} ≤ z } be the nonstandard limiting distribution. Then as ρ → ∞,

F_ρ(z) = G(z²) + ρ^{−1/q} c_q L_q(G, z) + O(ρ^{−2/q}),

where

L_q(G, z) = { G′′(z²) z⁴ − 2 G′(z²) z² } I(q = 1) + { G′′(z²) z⁴ − √2 G′(z²) z² } I(q = 2),

c_q = g I(q = 1) + (π/(2g))^{1/2} I(q = 2),

and the remainder O(·) term holds uniformly for any z ∈ [M_l, M_u] with 0 < M_l < M_u < ∞.

When δ = 0, we have

F_{ρ,0}(z) = D(z²) + c_q ρ^{−1/q} L_q(D, z) + O(ρ^{−2/q}),  (19)

where D(·) := G(·; 0) is the CDF of the χ²₁ distribution. For any α ∈ (0, 1), let z_{ρ,α} ∈ ℝ⁺ and z_α ∈ ℝ⁺ be such that F_{ρ,0}(z_{ρ,α}) = 1 − α and D(z_α²) = 1 − α. Then, using a Cornish-Fisher type expansion, we obtain the following corollary.

Corollary 2 Second order corrected critical values based on the expansion (19) are as follows:
(i) For the first order power kernels,

z_{ρ,α} = z_α + (5z_α + z_α³)/(4ρ) + O(1/ρ²);  (20)

(ii) For the second order power kernels,

z_{ρ,α} = z_α + (1/2)(π/(gρ))^{1/2} { (1 + √2/4) z_α + (√2/4) z_α³ } + O(1/ρ),  (21)

where z_α is the nominal critical value from the standard normal distribution.
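The kernel-integral rates behind c_q, namely 2/ρ for first order power kernels and (π/(gρ))^{1/2} for second order power kernels, can be checked numerically with a simple discretization. The sketch below is our own construction and exploits the Toeplitz structure of the discretized double integral.

```python
import numpy as np

def k_bart(x):
    x = np.abs(np.asarray(x, dtype=float))
    return np.where(x <= 1.0, 1.0 - x, 0.0)

def k_parzen(x):
    x = np.abs(np.asarray(x, dtype=float))
    out = np.zeros_like(x)
    lo = x <= 0.5
    out[lo] = 1.0 - 6.0 * x[lo] ** 2 + 6.0 * x[lo] ** 3
    hi = (x > 0.5) & (x <= 1.0)
    out[hi] = 2.0 * (1.0 - x[hi]) ** 3
    return out

def double_integral(kernel, rho, n=20000):
    # int_0^1 int_0^1 kernel(r - s)^rho dr ds on an n-point grid,
    # using sum_{i,j} f((i - j)/n) = sum_h (n - |h|) f(h/n)
    h = np.arange(-(n - 1), n)
    return float(np.sum((n - np.abs(h)) * kernel(h / n) ** rho)) / n ** 2

rho = 200
print(double_integral(k_bart, rho), 2.0 / rho)                        # first order rate
print(double_integral(k_parzen, rho), (np.pi / (6.0 * rho)) ** 0.5)   # second order rate, g = 6
```

For ρ = 200 the Bartlett integral is close to 2/ρ and the Parzen integral is close to (π/(6ρ))^{1/2}, with the remaining discrepancies of the stated O(1/ρ²) and O(1/ρ) orders.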

Consider as an example the case where α = 0.05; then z_α = 1.96 and P(W(1)² ≤ (1.96)²) = 0.95. Thus, for a two-sided t(ω̂_ρ) test, the corrected critical values at the 5% level for the Bartlett, Parzen and QS kernels are

z^{BART}_{ρ,α} = 1.96 + 4.3325/ρ,  z^{PAR}_{ρ,α} = 1.96 + 1.9230/√ρ,  z^{QS}_{ρ,α} = 1.96 + 3.951/√ρ,  (22)

respectively. These are also the critical values for the one-sided test (>) at the 2.5% level. Similarly, the corrected critical values for α = 0.10 are given by

z^{BART}_{ρ,α} = 1.645 + 3.1693/ρ,  z^{PAR}_{ρ,α} = 1.645 + 1.3750/√ρ,  z^{QS}_{ρ,α} = 1.645 + 2.8252/√ρ.  (23)
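The formulas in Corollary 2 are straightforward to compute; a small sketch (the function name is ours) that recovers the 5% corrections in (22):

```python
import numpy as np

def corrected_cv(z_a, rho, kernel="bartlett"):
    """Second-order corrected critical values from Corollary 2."""
    if kernel == "bartlett":                                    # q = 1, g = 1, eq. (20)
        return z_a + (5.0 * z_a + z_a ** 3) / (4.0 * rho)
    g = {"parzen": 6.0, "qs": 18.0 * np.pi ** 2 / 125.0}[kernel]  # q = 2, eq. (21)
    adj = 0.5 * np.sqrt(np.pi / (g * rho)) * ((1.0 + np.sqrt(2.0) / 4.0) * z_a
                                              + (np.sqrt(2.0) / 4.0) * z_a ** 3)
    return z_a + adj

# coefficients on 1/rho (Bartlett) or rho^{-1/2} (Parzen, QS), cf. (22)
for k in ("bartlett", "parzen", "qs"):
    print(k, corrected_cv(1.96, 1.0, k) - 1.96)
```

Evaluating the adjustment at ρ = 1 isolates the coefficient in front of 1/ρ or ρ^{-1/2}, which is how the printed constants in (22) and (23) can be reproduced.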

We can evaluate the accuracy of the approximate critical values by comparing them with the exact ones obtained via simulations. In all three cases, the second order corrected critical values are remarkably close to the exact ones. Details are available upon request.

Since the limiting distributions (8) and (9) are valid for general regression models under certain conditions on the regressors (see PSJ (2006, 2007)), the corrected critical value z_{ρ,α} may be used for hypothesis testing in a general regression framework.

Define

K_δ(z) = e^{−δ²/2} Σ_{j=0}^∞ (δ²/2)^j z^{j−1/2} e^{−z/2} j / [ j! Γ(j + 1/2) 2^{j+1/2} z ],

which is positive for all z > 0 and δ ≠ 0. When δ ≠ 0 and the corrected critical values are used, we can establish the local asymptotic power in the following corollary.

Corollary 3 The local asymptotic power satisfies

P{ (δ + W(1)) Ξ_ρ^{−1/2} > z_{ρ,α} } = 1 − G(z_α²) − c_q z_α⁴ K_δ(z_α²) ρ^{−1/q} + O(ρ^{−2/q}).  (24)

Corollary 3 shows that the asymptotic test power increases monotonically with ρ when ρ is large. Fig. 1 graphs the function f(z_α, δ) = z_α⁴ K_δ(z_α²) against different values of z_α and δ. For any given local alternative, it is apparent that the function f(z_α, δ) is monotonically increasing in z_α. Thus, the power increase due to the choice of a large ρ increases with the confidence level 1 − α. Further, the function attains its maximum around δ = 2 for a given critical value, implying that the power improvement from choosing a large ρ is greatest when the local alternative is in an intermediate neighborhood of the null.
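The function K_δ(z) is easy to evaluate by truncating its series; the following sketch (truncation level and names are our choices) works in logs for numerical stability and also illustrates the interior maximum in δ described above.

```python
from math import exp, lgamma, log

def K_delta(z, delta, terms=120):
    """Truncated series for K_delta(z), z > 0.
    The j = 0 term vanishes because of the factor j, so K_0(z) = 0."""
    if delta == 0.0:
        return 0.0
    s = 0.0
    for j in range(1, terms):
        s += exp(j * log(delta * delta / 2.0) - lgamma(j + 1)
                 + (j - 1.5) * log(z) - z / 2.0
                 - lgamma(j + 0.5) - (j + 0.5) * log(2.0) + log(j))
    return exp(-delta * delta / 2.0) * s

# f(z_a, delta) = z_a^4 * K_delta(z_a^2) at the 5% two-sided critical value
z_a = 1.96
for d in (0.5, 2.0, 4.0):
    print(d, z_a ** 4 * K_delta(z_a ** 2, d))
```

The printed values rise and then fall in δ, consistent with the statement that the power gain from a large ρ peaks for intermediate local alternatives.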

5 Expansions of the Finite Sample Distribution

Following SPJ (2008), we now proceed to develop an asymptotic expansion of the finite sample distribution of the t-statistic in the simple location model. We start with the following weak dependence condition.

Assumption A2 u_t is a mean zero stationary Gaussian process with Σ_{h=1}^∞ h² |γ(h)| < ∞, where γ(h) = E(u_t u_{t−h}).

In what follows, we develop an asymptotic expansion of P{ √T(μ̂ − μ₀)/ω̂ ≤ z } for ω̂ = ω̂_ρ and for local alternatives of the form μ = μ₀ + c/√T. A complicating factor in the development is that √T(μ̂ − μ) and ω̂_ρ are in general statistically dependent due to the autocorrelation structure of u_t. To overcome this difficulty, we decompose μ̂ and ω̂_ρ into statistically independent components as in Section 3. After some manipulation, we obtain

F_{T,ρ}(z) := P{ √T(μ̂ − μ₀)/ω̂_ρ ≤ z } = E G(z² ζ_T) + O(T^{−1}),  (25)

Figure 1: The graph of f(z_α, δ) = z_α⁴ K_δ(z_α²) as a function of z_α and δ.

uniformly over z ∈ ℝ⁺, where ζ_T := (ω̂_ρ/ω_T)² converges weakly to Ξ_ρ, and ω²_T := var(√T(μ̂ − μ)).

Since ω̂²_ρ = T^{−1} û′ W_ρ û = T^{−1} u′ A_T W_ρ A_T u, where W_ρ is the T × T matrix with (j, s)-th element k_ρ((j − s)/T) and A_T = I_T − l_T l_T′/T, ζ_T is a quadratic form in a Gaussian vector. To evaluate E G(z² ζ_T), we proceed to compute the cumulants of ζ_T − μ_{T,ρ} for μ_{T,ρ} := E ζ_T. It is easy to show that the characteristic function of ζ_T − μ_{T,ρ} is given by

ψ_T(t) = det[ I_T − 2it Ω_T A_T W_ρ A_T/(T ω²_T) ]^{−1/2} exp{ −it μ_{T,ρ} },

where Ω_T = E(uu′), and the cumulant generating function is

ln ψ_T(t) = −(1/2) log det[ I_T − 2it Ω_T A_T W_ρ A_T/(T ω²_T) ] − it μ_{T,ρ} := Σ_{m=1}^∞ κ_{m,T} (it)^m/m!,  (26)

where the κ_{m,T} are the cumulants of ζ_T − μ_{T,ρ}. It follows from (26) that κ_{1,T} = 0 and

κ_{m,T} = 2^{m−1}(m − 1)! T^{−m} ω_T^{−2m} Trace[ (Ω_T A_T W_ρ A_T)^m ] for m ≥ 2.  (27)

By proving that κ_{m,T} is close to κ_m in the precise sense given in Lemma 9 in the Appendix, we can establish the following theorem, which gives the order of magnitude of the error in the nonstandard limit distribution of the t-statistic as T → ∞ with fixed ρ.

Theorem 4 Let A2 hold. If ∫₀¹ k_ρ(v) dv < 1/(16z²), then

F_{T,ρ}(z) = F_ρ(z) + O(T^{−1}),  (28)

as T → ∞ with fixed ρ.

The requirement ∫₀¹ k_ρ(v) dv < 1/(16z²) on ρ is a technical condition used in the proof. It can be relaxed, but at the cost of more tedious calculations. 1 − F_{T,ρ}(z) gives the power of the test under the alternative hypothesis H₁: μ ≠ μ₀. Theorem 4 indicates that when ρ is fixed the power of the test can be approximated by 1 − F_ρ(z) with an error of order O(1/T). Under the null hypothesis H₀: μ = μ₀, δ = 0, and Theorem 4 shows that for fixed ρ the ERP of tests using critical values obtained from the nonstandard limit distribution of W(1) Ξ_ρ^{−1/2} is O(T^{−1}). This rate is faster than the corresponding rate for conventional tests based on consistent HAC estimates.

Combined with Theorem 1, Theorem 4 characterizes the size and power properties of the test under the sequential limit in which T goes to infinity first for a fixed ρ and then ρ goes to infinity. Under this sequential limit theory, the size distortion of the t-test based on the corrected critical values is

P{ √T(μ̂ − μ₀)/ω̂_ρ > z_{ρ,α} } − α = O(ρ^{−2/q}) + O(T^{−1}),

and the corresponding local asymptotic power is

P{ √T(μ̂ − μ₀)/ω̂_ρ > z_{ρ,α} } = 1 − G(z_α²) − c_q z_α⁴ K_δ(z_α²) ρ^{−1/q} + O(ρ^{−2/q}) + O(T^{−1}).

To evaluate the order of the size distortion, we have to compare the orders of magnitude of ρ^{−2/q} and 1/T. Such a comparison is not compatible with the sequential nature of the limits and calls for a higher order approximation that allows T → ∞ and ρ → ∞ simultaneously. A corollary of this observation is that fixed-ρ asymptotics do not provide an internally consistent framework for selecting the optimal power parameter. The next theorem establishes the required higher order expansion.

Theorem 5 Let A2 hold. If 1/ρ + ρ/T^q → 0 as T → ∞, then

F_{T,ρ}(z) = G(z²) + c_q ρ^{−1/q} L_q(G, z) − g d_{ρT} G′(z²) z² (ρ^{1/q}/T)^q + o(ρT^{−q}) + O(T^{−1} + ρ^{−2/q}),  (29)

where d_{ρT} = ω_T^{−2} Σ_{h=−T+1}^{T−1} |h|^q γ(h).

Under the local alternative hypothesis H₁: |μ − μ₀| = c/√T, the power of the test based on the corrected critical values is 1 − F_{T,ρ}(z_{ρ,α}). Theorem 5 shows that F_{T,ρ}(z_{ρ,α}) can be approximated by G(z_α²) + c_q ρ^{−1/q} L_q(G, z_α) − g d_{ρT} G′(z_α²) z_α² (ρ^{1/q}/T)^q, and the approximation error is of order o(ρT^{−q}) + O(T^{−1} + ρ^{−2/q}). Under the null hypothesis, δ = 0 and G(·) = D(·), so

F_{T,ρ,0}(z) = D(z²) + c_q ρ^{−1/q} L_q(D, z) − g d_{ρT} D′(z²) z² (ρ^{1/q}/T)^q + o(ρT^{−q}) + O(T^{−1} + ρ^{−2/q}).  (30)

 2  1=q  Importantly, the leading term D(z ) + cq Lq (D; z) in this expansion is the same as the corresponding expansion of the limit distribution F0(z) given in (19). A direct implication is that the corrected critical values from the nonstandard limiting distrib- ution are also second order correct under the conventional large  asymptotics. More 2 2 q speci…cally, up to smaller order terms, FT;0(z ;) = 1 gd T D0(z )z (T ) : 1=q By adjusting the critical values, we eliminate the term cqq L (D; z), which re- ‡ects the randomness of the estimator. So if the remaining term 2 2 q gd T D0(z )z (T ) is of smaller order than the eliminated term, the use of the corrected critical values given in Corollary 2 should reduce the size distortion. When 2 2 q  increases with T; the remaining term gd T D0(z )z (T ) approximately measures the size distortion in tests based on the corrected critical values, or equivalently those 1=q based on the nonstandard limit theory, at least to order O( ). If critical values from the standard normal are used, then the ERP is given by 1=q q the O  and O(T ) terms. To obtain the best rate of convergence of the ERP to zero, we set  = O(T q2=q+1) to balance these two terms and the resulting  q=q+1 rate of convergence is of order O(T ): As we discussed before, this rate is larger than that of the nonstandard test by an order of magnitude. One might argue that this comparison is not meaningful as the respective orders of the ERP are obtained under di¤erent asymptotic speci…cations of  : one for growing  and the other one for …xed : To sharpen the comparison, we can compare the ERP under the same asymptotic speci…cation as in SPJ (2008). When  is …xed, the ERP of the standard normal test contains an O(1) term, which makes the ERP larger by an order of magnitude than the ERP of the nonstandard test. When  grows with T; the ERP of the nonstandard test contains fewer terms than that of the standard normal test. 
This is usually regarded as an advantage in the econometrics literature. If we set $\rho = O(T^{2q^2/(2q+1)})$, which is the AMSE-optimal rate for the exponent, then the ERP of the standard normal test is of order $O(\rho T^{-q}) = O(T^{-q/(2q+1)})$. In this case, the ERP of the nonstandard test, or of the test using the second-order corrected critical values, is also of order $O(T^{-q/(2q+1)})$. Therefore, compared with the standard normal test, the hybrid procedure suggested in PSJ (2006, 2007) does not reduce the ERP by an order of magnitude. However, the hybrid procedure eliminates the $O(\rho^{-1/q})$ term from the ERP and thus reduces the over-rejection that is commonly seen in economic applications. This explains the better finite sample performance found in the simulation studies of PSJ (2006, 2007), Ray and Savin (2008), and Ray, Savin, and Tiwari (2009).
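The rate calculus above can be illustrated numerically. The following sketch (our own illustration, not part of the original derivation) checks that the balancing choice $\rho = T^{q^2/(q+1)}$ equates the orders of the two ERP terms $\rho^{-1/q}$ and $\rho T^{-q}$, and that the test-optimal growth exponent is smaller than the AMSE exponent $2q^2/(2q+1)$:

```python
import math

def erp_terms(T, q):
    # Balancing choice of the exponent: rho = T^(q^2/(q+1))
    rho = T ** (q * q / (q + 1))
    term_variance = rho ** (-1.0 / q)  # the O(rho^{-1/q}) ERP term
    term_bias = rho / T ** q           # the O(rho T^{-q}) ERP term
    return term_variance, term_bias

T = 10_000
for q in (1, 2):
    a, b = erp_terms(T, q)
    # Both terms reduce to the same order, T^{-q/(q+1)}
    assert math.isclose(a, b, rel_tol=1e-9)
    assert math.isclose(a, T ** (-q / (q + 1)), rel_tol=1e-9)
    # The test-optimal rate q^2/(q+1) is slower than the AMSE rate 2q^2/(2q+1)
    assert q * q / (q + 1) < 2 * q * q / (2 * q + 1)
```

For $q = 2$ and $T = 10{,}000$, both terms equal $T^{-2/3}$, matching the $O(T^{-q/(q+1)})$ rate stated above.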

For convenience, we refer to the test based on the t-statistic $t_\rho$ and the second-order corrected critical values as the $t_\rho$-test. We formalize the results on the size distortion and the local power expansion of the $t_\rho$-test in the following corollary.

Corollary 6 Let A2 hold. If $1/\rho + \rho/T^q \to 0$ as $T \to \infty$, then
(a) the size distortion of the $t_\rho$-test is

$$g\, d_T\, D'(z_\alpha^2)\, z_\alpha^2\, \rho T^{-q} + o(\rho T^{-q}) + O(T^{-1} + \rho^{-2/q}); \qquad (31)$$
(b) under the local alternative, the power of the $t_\rho$-test is

$$1 - G_\delta(z_\alpha^2) - c_q\,\delta^2 z_\alpha^4\, K(z_\alpha^2)\,\rho^{-1/q} + g\, d_T\, G'_\delta(z_\alpha^2)\, z_\alpha^2\, \rho T^{-q} + o(\rho T^{-q}) + O(T^{-1} + \rho^{-2/q}). \qquad (32)$$

Just as with the standard limit theory, the nonstandard limit theory does not address the bias problem arising from nonparametric smoothing in LRV estimation. Comparing (32) with (24), we get an additional term reflecting the asymptotic bias of the LRV estimator. For economic time series, it is typical that $d_T > 0$, so this additional term also increases monotonically with $\rho$. Of course, size distortion tends to increase with $\rho$ as well. In the next section, we propose a procedure for choosing $\rho$ that maximizes the local asymptotic power while controlling the size.

6 Optimal Exponent Choice

When estimating the long run variance, PSJ (2006, 2007) show that there is an optimal choice of $\rho$ which minimizes the asymptotic mean squared error of the estimator, and they give an optimal expansion rate of $O(T^{2q^2/(2q+1)})$ for $\rho$ in terms of the sample size $T$. The present paper provides a new approach to exponent selection that addresses the central concern of classical hypothesis testing, which can be expressed as maximizing power subject to controlling size. Using (31) and ignoring the higher order terms, the type I error of the $t_\rho$-test can be measured by
$$e^{I} = \alpha + g\, d_T\, D'(z_\alpha^2)\, z_\alpha^2\, \rho T^{-q}.$$

Similarly, from (32), the type II error of the $t_\rho$-test can be measured by

$$e^{II} = G_\delta(z_\alpha^2) + c_q\,\delta^2 z_\alpha^4\, K(z_\alpha^2)\,\rho^{-1/q} - g\, d_T\, G'_\delta(z_\alpha^2)\, z_\alpha^2\, \rho T^{-q}.$$
We choose $\rho$ to minimize the type II error while controlling for the type I error. More specifically, we solve
$$\min_\rho e^{II} \quad \text{s.t.} \quad e^{I} \le \tau\alpha,$$
where $\tau$ is a constant greater than 1. Ideally, the type I error would be less than or equal to the nominal type I error $\alpha$. In finite samples there is always some approximation error, and we allow for some discrepancy by introducing the tolerance factor $\tau$. For example, when $\alpha = 5\%$ and $\tau = 1.2$, we aim to control the type I error such that it

is not greater than 6%. Note that $\tau$ may depend on the sample size $T$: for a larger sample size, we may require $\tau$ to take smaller values. The solution to the minimization problem depends on the sign of $d_T$. When $d_T < 0$, the constraint $e^{I} \le \tau\alpha$ is not binding, and we have the unconstrained minimization problem $\rho_{opt} = \arg\min_\rho e^{II}$, whose solution for the optimal $\rho$ is

$$\rho_{opt} = \left(\frac{c_q\,\delta^2 z_\alpha^4\, K(z_\alpha^2)}{-q\, g\, d_T\, G'_\delta(z_\alpha^2)\, z_\alpha^2}\right)^{\frac{q}{q+1}} T^{\frac{q^2}{q+1}}. \qquad (33)$$

When $d_T > 0$, the constraint $e^{I} \le \tau\alpha$ may be binding, and we have to use the Kuhn-Tucker theorem to search for the optimum. Let $\lambda$ be the Lagrange multiplier, and define

$$L(\rho, \lambda) = G_\delta(z_\alpha^2) + c_q\,\delta^2 z_\alpha^4\, K(z_\alpha^2)\,\rho^{-1/q} - g\, d_T\, G'_\delta(z_\alpha^2)\, z_\alpha^2\, \rho T^{-q} \qquad (34)$$
$$\qquad\qquad + \lambda\left[\alpha + g\, d_T\, D'(z_\alpha^2)\, z_\alpha^2\, \rho T^{-q} - \tau\alpha\right].$$
It is easy to show that at the optimal $\rho$, the constraint $e^{I} \le \tau\alpha$ is indeed binding and $\lambda > 0$. Hence, the optimal $\rho$ is

$$\rho_{opt} = \frac{(\tau - 1)\,\alpha}{g\, d_T\, D'(z_\alpha^2)\, z_\alpha^2}\, T^{q}, \qquad (35)$$
and the corresponding Lagrange multiplier is

$$\lambda_{opt} = \frac{G'_\delta(z_\alpha^2)}{D'(z_\alpha^2)} + \frac{c_q\,\delta^2\, K(z_\alpha^2)\left[g\, d_T\, D'(z_\alpha^2)\right]^{1/q}}{q\left[(\tau - 1)\alpha\right]^{1+1/q}}\, z_\alpha^{2/q + 2}\, T^{-1}.$$

$$\rho_{opt} = \left(\frac{c_q\,\delta^2 z_\alpha^4\, K(z_\alpha^2)}{q\, g\, d_T\left[\lambda_{opt}\, D'(z_\alpha^2) - G'_\delta(z_\alpha^2)\right] z_\alpha^2}\right)^{\frac{q}{q+1}} T^{\frac{q^2}{q+1}},$$
where

$$\lambda_{opt} = \begin{cases} 0, & \text{if } d_T < 0, \\[6pt] \dfrac{G'_\delta(z_\alpha^2)}{D'(z_\alpha^2)} + \dfrac{c_q\,\delta^2\, K(z_\alpha^2)\left[g\, d_T\, D'(z_\alpha^2)\right]^{1/q}}{q\left[(\tau-1)\alpha\right]^{1+1/q}}\, z_\alpha^{2/q+2}\, T^{-1}, & \text{if } d_T > 0. \end{cases} \qquad (36)$$

If we ignore the nonessential constant, the function $L(\rho, \lambda)$ is a weighted sum of the type I and type II errors, with the weight given by the optimal Lagrange multiplier. When $d_T < 0$, the type I error is expected to be capped by the nominal type I error. As a result, the optimal Lagrange multiplier is zero and we assign all weight to the type II error. This weighting scheme can be justified by the argument that it is worthwhile to take advantage of the extra reduction in the type II error without inflating the type I error. When $d_T > 0$, the type I error is expected to be larger than the nominal type I error. The constraint on the type I error is binding

and the Lagrange multiplier is positive. In this case, the loss function is a genuine weighted sum of type I and type II errors. As the tolerance parameter $\tau$ decreases toward 1, the weight attached to the type I error increases. When the nonparametric bias, as measured by $d_T$, is negative, the optimal $\rho$ grows with $T$ at the rate $T^{q^2/(q+1)}$, which is slower than $T^{2q^2/(2q+1)}$, the MSE-optimal expansion rate. When the nonparametric bias is positive, which is typical for economic time series, the expansion rate of the optimal $\rho$ depends on the rate at which $\tau$ approaches 1. When $\tau \to 1$ at a rate faster than $T^{-q}$, the Lagrange multiplier $\lambda_{opt}$ increases with the sample size at a rate faster than $T^{q}$. In this case, the optimal $\rho$ is bounded. Fixed $\rho$ rules may then be interpreted as assigning increasingly larger weight to the type I error. This gives us a practical interpretation of fixed $\rho$ rules in terms of the permitted tolerance for the type I error. When $\tau \to 1$ at a rate slower than $T^{-q/(q+1)}$, the Lagrange multiplier $\lambda_{opt}$ is bounded, and the optimal $\rho$ expands with $T$ at a rate faster than $T^{q^2/(q+1)}$. In particular, when $\tau \to 1$ at the rate $T^{-q/(2q+1)}$, the optimal $\rho$ expands with $T$ at the MSE-optimal rate $T^{2q^2/(2q+1)}$. So when $\tau \to 1$ at a rate faster than $T^{-q/(2q+1)}$, the optimal $\rho$ has a smaller order of magnitude than the MSE-optimal $\rho$, regardless of the direction of the nonparametric bias. When $\rho_{opt} = O\left((T/\lambda_{opt}^{1/q})^{q^2/(q+1)}\right)$, the size distortion of the $t_\rho$-test is of order $O\left((T\lambda_{opt})^{-q/(q+1)}\right)$. It is apparent that the size distortion becomes smaller if a larger weight is implicitly assigned to the type I error. In particular, when $\lambda_{opt}$ is finite, the size distortion is of order $O(T^{-q/(q+1)})$, which is larger than $O(T^{-q})$, the size distortion for the case $\lambda_{opt} \sim T^{q}$. It is obvious that the use of $\rho_{opt}$ involves some tradeoff between the two elements in the loss function (34). Note that even when $\lambda_{opt}$ is
Note that even when opt is q=(2q+1) …nite, the size distortion is smaller than O(T ); which is the size distortion for the conventional t-test using the AMSE optimal ; that is when  is set to be O(T 2q2=(2q+1)): The formula for opt involves the unknown parameter d T ; which could be es- timated nonparametrically or by a standard plug-in procedure based on a simple model like an AR(1). See Andrews (1991) and Newey and West (1994). Both meth- ods achieve a valid order of magnitude and the procedure is obviously analogous to conventional data-driven methods for HAC estimation. To sum up, the test-optimal  that maximizes the local asymptotic power while preserving size in large samples is fundamentally di¤erent from the MSE-optimal : The test-optimal  depends on the sign of the nonparametric bias and the permitted tolerance for the type I error while the MSE-optimal  does not. When the permitted tolerance becomes su¢ ciently small, the test-optimal  is of smaller order than the MSE-optimal :

7 Simulation Evidence

This section presents some simulation evidence on the performance of the t-test based on the plug-in implementation of the test-optimal power parameter. We consider the

simple location model with Gaussian ARMA(1,1) errors:

$$y_t = \mu + c/\sqrt{T} + u_t, \quad \text{where } u_t = \phi u_{t-1} + \varepsilon_t + \theta\varepsilon_{t-1}, \quad \varepsilon_t \sim iid\ N(0,1). \qquad (37)$$
Note that the long run variance of $u_t$ is $(1+\theta)^2/(1-\phi)^2$. On the basis of the long run variance, we consider the values $c = \delta_0 (1+\theta)/(1-\phi)$ for $\delta_0 \in [0, 5]$. We consider three sets of parameter configurations for $\phi$ and $\theta$:

AR(1): $(\phi, \theta)$ = (0.9, 0), (0.6, 0), (0.3, 0), (-0.3, 0), (-0.6, 0), (-0.9, 0)
MA(1): $(\phi, \theta)$ = (0, 0.9), (0, 0.6), (0, 0.3), (0, -0.3), (0, -0.6), (0, -0.9)
ARMA(1,1): $(\phi, \theta)$ = (-0.6, 0.3), (0.3, -0.6), (0.3, 0.3), (0, 0), (0.6, -0.3), (-0.3, 0.6)

and write $u_t \sim$ ARMA$[\phi, \theta]$. We consider three sample sizes: $T$ = 100, 200 and 500. For each data generating process, we obtain an estimate $\hat\phi$ of the AR coefficient by fitting an AR(1) model to the demeaned time series. Given the estimate $\hat\phi$, $d_T$ can be estimated by $\hat d = 2\hat\phi/(1-\hat\phi)^2$ when $q = 2$ and $\hat d = 2\hat\phi/(1-\hat\phi^2)$ when $q = 1$. We consider the tolerance factors $\tau = 1.1$ and $1.2$. We set the significance level to $\alpha = 10\%$; the corresponding nominal critical value for the two-sided test is $z_\alpha = 1.645$. To compute the test-optimal power parameter, we need to choose $\delta$ for the local alternative hypothesis. In principle, we could estimate $\delta_0$ by OLS, but the OLS estimator is not consistent. Accordingly, we follow standard practice in the optimal testing literature and select $\delta$ such that the first-order power is 75%; that is, $\delta$ solves $1 - G_\delta(z_\alpha^2) = 75\%$. The solution is $\delta = 2.3192$, which lies in the middle of the range of alternative hypotheses presented in the figures below. Although this choice of $\delta$ may not match the true $\delta_0$ under the local alternative, Monte Carlo results show that it delivers a test with good size and power properties. For each choice of $\delta$, we obtain $\hat\rho_{opt}$ and use it to construct the LRV estimate and the corresponding t-statistic. We reject the null hypothesis if $|t_\rho|$ is larger than the corrected critical value given in (23). Using 10,000 replications, we compute the empirical type I error (when $\delta_0 = 0$ and $c = 0$). For comparative purposes, we also compute the empirical type I error when the power parameter is the 'optimal' one that minimizes the asymptotic mean squared error of the LRV estimate.
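The calibration $\delta = 2.3192$ can be reproduced in a few lines, using the fact that $G_\delta$ is the distribution function of $(Z+\delta)^2$ for $Z \sim N(0,1)$, so that the first-order power $1 - G_\delta(z_\alpha^2)$ equals the two-sided normal power $P(|Z + \delta| > z_\alpha)$ (a sketch we add for illustration):

```python
from statistics import NormalDist

Phi = NormalDist().cdf
z = 1.645  # two-sided 10% nominal critical value

def power(delta):
    # 1 - G_delta(z^2) = P(|Z + delta| > z), Z ~ N(0,1)
    return 1.0 - Phi(z - delta) + Phi(-z - delta)

# Solve 1 - G_delta(z^2) = 0.75 by bisection (power is increasing in delta)
lo, hi = 0.0, 10.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if power(mid) < 0.75:
        lo = mid
    else:
        hi = mid
delta = 0.5 * (lo + hi)

assert abs(delta - 2.3192) < 1e-3
assert abs(power(delta) - 0.75) < 1e-9
```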
The formulae for this power parameter are given in PSJ (2006, 2007), and the plug-in versions are
$$\hat\rho_{MSE} = \left(\frac{1}{16}\sqrt{\frac{\pi}{g}}\,\frac{1}{\hat d^{2}}\right)^{2/5} T^{8/5}$$
for the second-order power kernels and

$$\hat\rho_{MSE} = \left(\frac{(1-\hat\phi^2)^2}{4\hat\phi^2}\right)^{1/3} T^{2/3}$$
for the first-order power kernels. In addition, we include the nonstandard test proposed by KV (2002a), which sets $\rho$ to be one and uses nonstandard critical values. We refer to the three testing procedures as 'OPT', 'MSE' and 'KV', respectively. Tables 1-3 report the empirical type I error for the ARMA(1,1) errors with sample size $T = 100$, tolerance parameter $\tau = 1.1$ and significance level $\alpha = 10\%$. Several patterns emerge. First, when the empirical type I error is around or greater than the nominal type I error, the new plug-in procedure incurs a significantly smaller type I error than the conventional plug-in procedure. In other cases, the type I errors are more or less the same for the two plug-in procedures. Second, compared with the KV procedure, the new $t_\rho$-test has similar size distortion except when the error process is highly persistent and second-order kernels are used. Since the bandwidth is set equal to the sample size, the KV procedure is designed to achieve the smallest possible size distortion. Given this observation, we can conclude that the new $t_\rho$-test succeeds in controlling the type I error. Third, the new $t_\rho$-test based on the exponentiated Bartlett kernel has the smallest size distortion in an overall sense. This is true even when the error autocorrelation is very high. The above qualitative observations remain valid for other configurations, such as different sample sizes and different values of $\tau$. All else being equal, the size distortion of the new $t_\rho$-test for $\tau = 1.2$ is slightly larger than that for $\tau = 1.1$. This is expected, as we have a higher tolerance for the type I error when the value of $\tau$ is larger. Figures 2-4 present finite sample power under AR(1) errors for the three kernels considered. We compute power using the 10% empirical finite sample critical values obtained from the null distribution, so the finite sample power is size-adjusted and power comparisons are meaningful.
The parameter configurations are the same as those for Tables 1-3, except that the DGP is generated under local alternatives. Two observations can be drawn from these figures. First, the power of the new $t_\rho$-test is consistently higher than that of the KV test. The power difference is larger for the second-order kernels than for the first-order kernels. Second, the new $t_\rho$-test with the test-optimal exponent is as powerful as the conventional MSE-based test. For a given power parameter $\rho$, using nonstandard critical values or second-order corrected critical values improves the size accuracy of the test, but at the cost of a clear power reduction. Figures 2-4 show that we can employ the test-optimal $\rho$ to compensate for the power loss. To sum up, the new $t_\rho$-test achieves the same degree of size accuracy as the nonstandard KV test and yet maintains the power of the conventional t-test. Rather than reporting all of the remaining figures for other configurations, we present two representative figures. Figure 5 presents the power curves under MA(1) errors with the exponentiated Bartlett kernel, while Figure 6 presents the power curves under ARMA(1,1) errors with the exponentiated Parzen kernel. The basic qualitative observations remain the same: the new $t_\rho$-test is as powerful as the standard t-test and much more powerful than the nonstandard KV test. It is important to point out that the KV test considered here does not use any smoothing parameter. The power of the KV test may be improved if the bandwidth is set proportional to the sample size (KV 2005). We may use the ideas presented here and the asymptotic expansions in SPJ (2008) to develop a test-optimal procedure to select

the proportionality factor. This extension is straightforward, and we do not pursue it here.
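The ARMA(1,1) long run variance formula and the AR(1) plug-in bias parameter quoted above can be verified numerically, using the standard ARMA autocovariance recursions (our own sanity check, not part of the original simulations):

```python
import math

# Long run variance of u_t = phi u_{t-1} + e_t + theta e_{t-1}, var(e) = 1:
# omega^2 = (1 + theta)^2 / (1 - phi)^2, obtained by summing autocovariances.
phi, theta = 0.3, 0.3
gamma0 = (1 + 2 * phi * theta + theta * theta) / (1 - phi * phi)
gamma1 = phi * gamma0 + theta        # gamma_h = phi * gamma_{h-1} for h >= 2
lrv = gamma0 + 2 * sum(gamma1 * phi ** (h - 1) for h in range(1, 200))
assert math.isclose(lrv, (1 + theta) ** 2 / (1 - phi) ** 2, rel_tol=1e-8)

# For a pure AR(1) (theta = 0), the q = 1 bias parameter
# d = sum |h| gamma(h) / omega^2 reduces to 2 phi / (1 - phi^2).
p = 0.5
d = 2 * sum(h * p ** h / (1 - p * p) for h in range(1, 400)) / (1 / (1 - p) ** 2)
assert math.isclose(d, 2 * p / (1 - p * p), rel_tol=1e-8)
```

Both identities match the formulas used for $\hat d$ in Section 7, and $d > 0$ for $\phi > 0$, consistent with the claim that positive nonparametric bias is typical for economic time series.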

8 Conclusion

Pursuing the line of research taken in SPJ (2008) to improve econometric testing when there is nonparametric studentization, the present paper employs the power kernel approach of Phillips, Sun and Jin (2006, 2007) and proposes a power parameter choice that maximizes the local asymptotic power while controlling the asymptotic size of the test. This new selection criterion is fundamentally different from the MSE criterion for the estimation of the long run variance. Depending on the permitted tolerance for the type I error, the expansion rate of the test-optimal power parameter may be larger or smaller than that of the MSE-optimal power parameter. The fixed power parameter rule can be interpreted as exerting increasingly tight control over the type I error. Monte Carlo experiments show that the size of the new $t_\rho$-test is as accurate as that of the nonstandard KV test with the bandwidth equal to the sample size. On the other hand, unlike the KV test, which is generally less powerful than the standard t-test, the $t_\rho$-test based on the test-optimal power parameter is as powerful as the standard t-test. We have therefore introduced a new test that combines the good elements of the two existing tests without inheriting their main drawbacks.

Table 1: Finite Sample Sizes of Different Testing Procedures Under AR(1) Errors (T = 100, τ = 1.1, α = 0.10)

[φ, θ]            [0.9,0]  [0.6,0]  [0.3,0]  [-0.3,0]  [-0.6,0]  [-0.9,0]
Bartlett  OPT     0.1399   0.1059   0.1044   0.0752    0.0583    0.0220
          MSE     0.3675   0.1981   0.1538   0.0866    0.0847    0.0805
          KV      0.1961   0.1159   0.1038   0.0893    0.0794    0.0426
Parzen    OPT     0.2085   0.1171   0.1069   0.0891    0.0891    0.0881
          MSE     0.3296   0.1707   0.1360   0.0866    0.0846    0.0815
          KV      0.1326   0.0985   0.0955   0.0944    0.0928    0.0858
QS        OPT     0.2051   0.1152   0.1067   0.0890    0.0890    0.0882
          MSE     0.3277   0.1704   0.1356   0.0867    0.0847    0.0841
          KV      0.1228   0.1019   0.0991   0.0991    0.0955    0.0862

Note: OPT: $t_\rho$-test with the test-optimal $\rho$; MSE: $t_\rho$-test with the MSE-optimal $\rho$; KV: t-test with $\rho = 1$ and nonstandard critical values from KV (2002).

Table 2: Finite Sample Sizes of Different Testing Procedures Under MA(1) Errors (T = 100, τ = 1.1, α = 0.10)

[φ, θ]            [0,0.9]  [0,0.6]  [0,0.3]  [0,-0.3]  [0,-0.6]  [0,-0.9]
Bartlett  OPT     0.0917   0.0934   0.0979   0.0638    0.0166    0.0000
          MSE     0.1472   0.1444   0.1346   0.0671    0.0175    0.0000
          KV      0.1015   0.1021   0.0995   0.0852    0.0477    0.0001
Parzen    OPT     0.1004   0.0996   0.0982   0.0762    0.0345    0.0000
          MSE     0.1324   0.1273   0.1209   0.0630    0.0117    0.0000
          KV      0.0954   0.0964   0.0952   0.0932    0.0804    0.0192
QS        OPT     0.0989   0.0986   0.0977   0.0756    0.0343    0.0000
          MSE     0.1315   0.1263   0.1205   0.0629    0.0117    0.0000
          KV      0.0995   0.0989   0.1002   0.0960    0.0806    0.0207

Note: OPT: $t_\rho$-test with the test-optimal $\rho$; MSE: $t_\rho$-test with the MSE-optimal $\rho$; KV: t-test with $\rho = 1$ and nonstandard critical values from KV (2002).

Table 3: Finite Sample Sizes of Different Testing Procedures Under ARMA(1,1) Errors (T = 100, τ = 1.1, α = 0.10)

[φ, θ]          [-0.6,0.3]  [0.3,-0.6]  [0.3,0.3]  [0,0]   [0.6,-0.3]  [-0.3,0.6]
Bartlett  OPT    0.0804      0.0364      0.0954     0.0976  0.1245      0.0953
          MSE    0.0978      0.0342      0.1620     0.1060  0.1933      0.1253
          KV     0.0907      0.0673      0.1055     0.0955  0.1112      0.0986
Parzen    OPT    0.0933      0.0448      0.1054     0.0972  0.1317      0.0929
          MSE    0.0987      0.0276      0.1425     0.1040  0.1803      0.1125
          KV     0.0944      0.0850      0.0973     0.0952  0.0961      0.0947
QS        OPT    0.0933      0.0440      0.1039     0.0971  0.1314      0.0925
          MSE    0.0988      0.0271      0.1418     0.1041  0.1804      0.1121
          KV     0.0983      0.0856      0.1017     0.0989  0.1012      0.0994

Note: OPT: $t_\rho$-test with the test-optimal $\rho$; MSE: $t_\rho$-test with the MSE-optimal $\rho$; KV: t-test with $\rho = 1$ and nonstandard critical values from KV (2002).

[Figures 2-6 are power-curve plots; in each panel the horizontal axis is $\delta_0 \in [0, 5]$, the vertical axis is power, and the three curves are OPT, MSE and KV.]

Figure 2: Size-adjusted power for different testing procedures under AR(1) errors with the exponentiated Bartlett kernel. Panels (a)-(f): ARMA[0.9,0], [0.6,0], [0.3,0], [-0.3,0], [-0.6,0], [-0.9,0].

Figure 3: Size-adjusted power for different testing procedures under AR(1) errors with the exponentiated Parzen kernel. Panels (a)-(f) as in Figure 2.

Figure 4: Size-adjusted power for different testing procedures under AR(1) errors with the exponentiated QS kernel. Panels (a)-(f) as in Figure 2.

Figure 5: Size-adjusted power for different testing procedures under MA(1) errors with the exponentiated Bartlett kernel. Panels (a)-(f): ARMA[0,0.9], [0,0.6], [0,0.3], [0,-0.3], [0,-0.6], [0,-0.9].

Figure 6: Size-adjusted power for different testing procedures under ARMA(1,1) errors with the exponentiated Parzen kernel. Panels (a)-(f): ARMA[-0.6,0.3], [0.3,-0.6], [0.3,0.3], [0,0], [0.6,-0.3], [-0.3,0.6].

9 Appendix

9.1 Technical Lemmas and Supplements

Lemma 7 For the quadratic power kernels, as $\rho \to \infty$, we have
(a) $\int_0^1 k^\rho(x)\,dx = O(\rho^{-1/2})$;
(b) $\int_0^1 (1-x)k^\rho(x)\,dx = \frac{1}{2}\left(\frac{\pi}{g\rho}\right)^{1/2} + O\left(\frac{1}{\rho}\right)$.

Proof of Lemma 7. We prove part (b) only, as part (a) follows from similar but easier arguments. For both the Parzen and QS kernels, for any $\delta > 0$ there exists $\eta := \eta(\delta) > 0$ such that $-\log k(x) \ge \eta(\delta)$ for $\delta \le x \le 1$. Therefore, the contribution of the interval $\delta \le x \le 1$ satisfies
$$\int_\delta^1 (1-x)k^\rho(x)\,dx = \int_\delta^1 \exp\{\rho\log k(x) + \log(1-x)\}\,dx \le \exp[-(\rho-1)\eta(\delta)]\int_0^1 k(x)\,dx \le \exp[-(\rho-1)\eta(\delta)]. \qquad (38)$$
We now deal with the integral from $0$ to $\delta$. Both the Parzen and QS kernels exhibit quadratic behavior around the origin in the sense that

$$k(x) = 1 - gx^2 + o(x^2) \quad \text{as } x \to 0, \ \text{for some } g > 0,$$
which implies $\log k(x) = -gx^2 + o(x^2)$. So, for any given $\varepsilon > 0$, we can determine $\delta > 0$ such that
$$\left|\log k(x) + gx^2\right| \le \varepsilon x^2 \quad \text{for } 0 \le x \le \delta.$$
In consequence,

$$\int_0^\delta (1-x)e^{-\rho(g+\varepsilon)x^2}\,dx \le \int_0^\delta (1-x)k^\rho(x)\,dx \le \int_0^\delta (1-x)e^{-\rho(g-\varepsilon)x^2}\,dx.$$
Note that
$$\int_0^\delta (1-x)e^{-\rho(g+\varepsilon)x^2}\,dx = \int_0^\infty (1-x)e^{-\rho(g+\varepsilon)x^2}\,dx - \int_\delta^\infty (1-x)e^{-\rho(g+\varepsilon)x^2}\,dx.$$
In view of
$$\int_0^\infty (1-x)e^{-\rho(g+\varepsilon)x^2}\,dx = \frac{1}{2}\sqrt{\frac{\pi}{\rho(g+\varepsilon)}} - \frac{1}{2\rho(g+\varepsilon)}$$
and

$$\left|\int_\delta^\infty (1-x)e^{-\rho(g+\varepsilon)x^2}\,dx\right| \le e^{-(\rho-1)(g+\varepsilon)\delta^2}\int_\delta^\infty |1-x|\,e^{-(g+\varepsilon)x^2}\,dx \le e^{-(\rho-1)(g+\varepsilon)\delta^2}\int_0^\infty (1+|x|)\,e^{-(g+\varepsilon)x^2}\,dx = O\left(e^{-(\rho-1)g\delta^2}\right),$$
we have
$$\int_0^\delta (1-x)e^{-\rho(g+\varepsilon)x^2}\,dx = \frac{1}{2}\sqrt{\frac{\pi}{\rho(g+\varepsilon)}} + O\left(\frac{1}{\rho}\right).$$
Similarly,
$$\int_0^\delta (1-x)e^{-\rho(g-\varepsilon)x^2}\,dx = \frac{1}{2}\sqrt{\frac{\pi}{\rho(g-\varepsilon)}} + O\left(\frac{1}{\rho}\right).$$
Since $\varepsilon > 0$ is arbitrary, it follows that
$$\int_0^\delta (1-x)k^\rho(x)\,dx = \frac{1}{2}\left(\frac{\pi}{g\rho}\right)^{1/2} + O\left(\frac{1}{\rho}\right). \qquad (39)$$
Combining (38) and (39) yields

$$\int_0^1 (1-x)k^\rho(x)\,dx = \frac{1}{2}\left(\frac{\pi}{g\rho}\right)^{1/2} + O\left(\frac{1}{\rho}\right). \qquad (40)$$
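Lemma 7(b) is a Laplace approximation, and it is easy to verify numerically. The sketch below (our own illustration) uses the Gaussian envelope $e^{-g\rho x^2}$ with $g = 1$ as a stand-in for $k^\rho(x)$ near the origin; for the Parzen kernel, for example, $g = 6$. The approximation error is $O(1/\rho)$, which the assertion bounds explicitly:

```python
import math

def integral(rho, n=200_000):
    # Midpoint-rule evaluation of int_0^1 (1-x) exp(-rho x^2) dx, with g = 1
    h = 1.0 / n
    s = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        s += (1.0 - x) * math.exp(-rho * x * x)
    return s * h

for rho in (100.0, 1000.0):
    approx = 0.5 * math.sqrt(math.pi / rho)
    # The exact value differs from the Laplace term by roughly 1/(2 rho)
    assert abs(integral(rho) - approx) <= 2.0 / rho
```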

Lemma 8 The cumulants $\kappa_{m,\rho}$ of the limit random variable $\Xi_\rho$ satisfy
$$|\kappa_{m,\rho}| \le 2^{3m-3}(m-1)!\left[\int_0^1 k^\rho(v)\,dv\right]^{m-1}, \quad m \ge 2, \qquad (41)$$
and the central moments $\mu_{m,\rho} = E(\Xi_\rho - E\,\Xi_\rho)^m$ satisfy
$$|\mu_{m,\rho}| \le 2^{4m-4}(m-1)!\left[\int_0^1 k^\rho(v)\,dv\right]^{m-1}, \quad m \ge 2. \qquad (42)$$
Proof of Lemma 8. Note that

$$\left|\int_0^1\cdots\int_0^1 \prod_{j=1}^m k_\rho^*(\tau_j, \tau_{j+1})\,d\tau_1\cdots d\tau_m\right|$$
$$\le \int_0^1\cdots\int_0^1 \left|k_\rho^*(\tau_1,\tau_2)\,k_\rho^*(\tau_2,\tau_3)\cdots k_\rho^*(\tau_{m-1},\tau_m)\right|\left|k_\rho^*(\tau_m,\tau_1)\right|d\tau_1\cdots d\tau_m$$
$$\le \int_0^1\cdots\int_0^1 \left|k_\rho^*(\tau_1,\tau_2)\,k_\rho^*(\tau_2,\tau_3)\cdots k_\rho^*(\tau_{m-1},\tau_m)\right|d\tau_1\cdots d\tau_m$$
$$\le \left[\sup_{\tau_2}\int_0^1\left|k_\rho^*(\tau_1,\tau_2)\right|d\tau_1\right]\int_0^1\cdots\int_0^1\left|k_\rho^*(\tau_2,\tau_3)\cdots k_\rho^*(\tau_{m-1},\tau_m)\right|d\tau_2\cdots d\tau_m$$
$$\le \left[\sup_{\tau_2}\int_0^1\left|k_\rho^*(\tau_1,\tau_2)\right|d\tau_1\right]\left[\sup_{\tau_3}\int_0^1\left|k_\rho^*(\tau_2,\tau_3)\right|d\tau_2\right]\cdots\left[\sup_{\tau_m}\int_0^1\left|k_\rho^*(\tau_{m-1},\tau_m)\right|d\tau_{m-1}\right]$$
$$= \left[\sup_s \int_0^1 \left|k_\rho^*(r,s)\right|dr\right]^{m-1}, \qquad (43)$$
where $\tau_{m+1} := \tau_1$.

Let $\delta = \delta(r) > 0$ be such that $k_\rho(\delta) = \int_0^1 k_\rho(r-p)\,dp$. Then, in view of the definition
$$k_\rho^*(r,s) = k_\rho(r-s) - \int_0^1 k_\rho(r-p)\,dp - \int_0^1 k_\rho(s-q)\,dq + \int_0^1\int_0^1 k_\rho(p-q)\,dp\,dq,$$
we have
$$\sup_s \int_0^1 \left|k_\rho^*(r,s)\right|dr \le 2\sup_s \int_0^1 \left|k_\rho(r-s) - \int_0^1 k_\rho(r-p)\,dp\right|dr$$
$$= 2\sup_s \int_{|r-s|\le\delta}\left(k_\rho(r-s) - \int_0^1 k_\rho(r-p)\,dp\right)dr + 2\sup_s \int_{|r-s|>\delta}\left(\int_0^1 k_\rho(r-p)\,dp - k_\rho(r-s)\right)dr$$
$$\le 2\sup_s \int_{|r-s|\le\delta} k_\rho(r-s)\,dr + 2\sup_s \int_{|r-s|>\delta}\int_0^1 k_\rho(r-p)\,dp\,dr$$
$$\le 2\sup_s \sup_p \int_{|r-s|\le\delta} k_\rho(r-p)\,dr + 2\sup_s \sup_p \int_{|r-s|>\delta} k_\rho(r-p)\,dr = 2\sup_p \int_0^1 k_\rho(r-p)\,dr.$$
Therefore
$$\sup_s \int_0^1 \left|k_\rho^*(r,s)\right|dr \le 2\sup_s \int_0^1 k_\rho(r-s)\,dr = 2\sup_{s\in[0,1]}\int_{-s}^{1-s} k_\rho(v)\,dv \le 4\int_0^1 k_\rho(v)\,dv.$$
As a result,

$$\left|\int_0^1\cdots\int_0^1 \prod_{j=1}^m k_\rho^*(\tau_j,\tau_{j+1})\,d\tau_1\cdots d\tau_m\right| \le \left[4\int_0^1 k_\rho(v)\,dv\right]^{m-1}, \qquad (44)$$
and
$$|\kappa_{m,\rho}| \le 2^{3m-3}(m-1)!\left[\int_0^1 k_\rho(v)\,dv\right]^{m-1}. \qquad (45)$$
Note that the moments $\{\mu_j\}$ and the cumulants $\{\kappa_j\}$ satisfy the recursive relationship
$$\mu_1 = \kappa_1, \qquad \mu_m = \sum_{j=0}^{m-1}\binom{m-1}{j}\kappa_{j+1}\,\mu_{m-1-j}. \qquad (46)$$
It follows easily by induction from (45), (46) and the identity

$$\sum_{j=0}^{m-1}\binom{m-1}{j} = 2^{m-1} \qquad (47)$$
that
$$|\mu_{m,\rho}| \le 2^{4m-4}(m-1)!\left[\int_0^1 k_\rho(v)\,dv\right]^{m-1}. \qquad (48)$$

Lemma 9 Let A2 hold. When $T \to \infty$ for a fixed $\rho$, we have
(a)
$$\mu_T = \mu_\rho + O\left(\frac{1}{T}\right). \qquad (49)$$
(b)
$$\kappa_{m,T} = \kappa_{m,\rho} + O\left\{\frac{m!\,2^{m-1}}{T^2}\left[4\int_0^1 k_\rho(v)\,dv\right]^{m-2}\right\}, \qquad (50)$$
uniformly over $m \ge 1$.
(c)

$$\mu_{m,T} = E(\varsigma_T - \mu_T)^m = \mu_{m,\rho} + O\left\{\frac{2^{2m-1}m!}{T^2}\left[4\int_0^1 k_\rho(v)\,dv\right]^{m-2}\right\}, \qquad (51)$$
uniformly over $m \ge 1$.

Proof of Lemma 9. We first calculate $\mu_T = T^{-1}\omega_T^{-2}\,\mathrm{Trace}(\Omega_T A_T W_\rho A_T)$. Let $\tilde W_\rho = A_T W_\rho A_T$; then the $(i,j)$-th element of $\tilde W_\rho$ is
$$\tilde k_\rho\left(\frac{i}{T}, \frac{j}{T}\right) = k_\rho\left(\frac{i-j}{T}\right) - \frac{1}{T}\sum_{m=1}^T k_\rho\left(\frac{i-m}{T}\right) - \frac{1}{T}\sum_{k=1}^T k_\rho\left(\frac{k-j}{T}\right) + \frac{1}{T^2}\sum_{k=1}^T\sum_{m=1}^T k_\rho\left(\frac{k-m}{T}\right). \qquad (52)$$
So

$$\mathrm{Trace}(\Omega_T A_T W_\rho A_T) = \mathrm{Trace}(\Omega_T \tilde W_\rho)$$

$$= \sum_{r_1, r_2=1}^{T}\gamma(r_1 - r_2)\,\tilde k_\rho\left(\frac{r_1}{T}, \frac{r_2}{T}\right) = \sum_{r_2=1}^{T}\sum_{h_1=1-r_2}^{T-r_2}\gamma(h_1)\,\tilde k_\rho\left(\frac{r_2+h_1}{T}, \frac{r_2}{T}\right)$$
$$= \left(\sum_{h_1=1}^{T-1}\sum_{r_2=1}^{T-h_1} + \sum_{h_1=-T+1}^{0}\sum_{r_2=1-h_1}^{T}\right)\gamma(h_1)\,\tilde k_\rho\left(\frac{r_2+h_1}{T}, \frac{r_2}{T}\right). \qquad (53)$$

But

expanding $\tilde k_\rho$ according to (52) and rearranging the sums, we obtain
$$\sum_{r_2=1}^{T-h_1}\tilde k_\rho\left(\frac{r_2+h_1}{T}, \frac{r_2}{T}\right) = \sum_{r_2=1}^{T}\tilde k_\rho\left(\frac{r_2}{T}, \frac{r_2}{T}\right) + T\left[k_\rho\left(\frac{h_1}{T}\right) - k_\rho(0)\right] + C(h_1), \qquad (54)$$
where $C(h_1)$ is a function of $h_1$ satisfying $|C(h_1)| \le c\,|h_1|$ for some constant $c$. Similarly,
$$\sum_{r_2=1-h_1}^{T}\tilde k_\rho\left(\frac{r_2+h_1}{T}, \frac{r_2}{T}\right) = \sum_{r_2=1}^{T}\tilde k_\rho\left(\frac{r_2}{T}, \frac{r_2}{T}\right) + T\left[k_\rho\left(\frac{h_1}{T}\right) - k_\rho(0)\right] + C(h_1). \qquad (55)$$

Therefore, $\mathrm{Trace}(\Omega_T A_T W_\rho A_T)$ equals

$$\sum_{h_1=-T+1}^{T-1}\gamma(h_1)\sum_{r_2=1}^{T}\tilde k_\rho\left(\frac{r_2}{T}, \frac{r_2}{T}\right) + T\sum_{h_1=-T+1}^{T-1}\gamma(h_1)\left[k_\rho\left(\frac{|h_1|}{T}\right) - k_\rho(0)\right] + O(1)$$
$$= \sum_{h_1=-T+1}^{T-1}\gamma(h_1)\sum_{r_2=1}^{T}\tilde k_\rho\left(\frac{r_2}{T}, \frac{r_2}{T}\right) - \frac{g\rho}{T^{q-1}}\sum_{h_1=-T+1}^{T-1}|h_1|^q\,\gamma(h_1)\,(1+o(1)) + O(1), \qquad (56)$$
where we have used the second-order Taylor expansion

$$k_\rho\left(\frac{|h_1|}{T}\right) - k_\rho(0) = -g\rho\,\frac{|h_1|^q}{T^q} + o\left(\frac{\rho}{T^q}\right). \qquad (57)$$
Using
$$\sum_{h_1=-T+1}^{T-1}\gamma(h_1) = \omega^2\left(1 + O\left(\frac{1}{T}\right)\right) \qquad (58)$$
and
$$\frac{1}{T}\sum_{r_2=1}^{T}\tilde k_\rho\left(\frac{r_2}{T}, \frac{r_2}{T}\right) = \int_0^1 k_\rho^*(r,r)\,dr + O\left(\frac{1}{T}\right), \qquad (59)$$
we now have

$$\mu_T = \int_0^1 k_\rho^*(r,r)\,dr - \frac{g\rho}{T^q\,\omega^2}\sum_{h=-T+1}^{T-1}|h|^q\,\gamma(h)\,(1+o(1)) + O\left(\frac{1}{T}\right). \qquad (60)$$
By definition, $\mu_\rho = E\,\Xi_\rho = \int_0^1 k_\rho^*(r,r)\,dr$, and thus $\mu_T = \mu_\rho + O(T^{-1})$ for fixed $\rho$, as desired. We next approximate $\mathrm{Trace}[(\Omega_T A_T W_\rho A_T)^m]$ for $m > 1$. The approach is similar to the case $m = 1$ but notationally more complicated. Let $r_{2m+1} = r_1$, $r_{2m+2} = r_2$, and $h_{m+1} = h_1$. Then

$$\mathrm{Trace}[(\Omega_T A_T W_\rho A_T)^m] = \sum_{r_1, \ldots, r_{2m}=1}^{T}\prod_{j=1}^m \gamma(r_{2j-1} - r_{2j})\,\tilde k_\rho\left(\frac{r_{2j}}{T}, \frac{r_{2j+1}}{T}\right)$$
$$= \sum_{r_2, r_4, \ldots, r_{2m}=1}^{T}\ \sum_{h_1=1-r_2}^{T-r_2}\cdots\sum_{h_m=1-r_{2m}}^{T-r_{2m}}\ \prod_{j=1}^m \gamma(h_j)\,\tilde k_\rho\left(\frac{r_{2j}}{T}, \frac{r_{2j+2}+h_{j+1}}{T}\right) = I + II, \qquad (61)$$
where

$$I = \left(\sum_{h_1=1}^{T-1}\sum_{r_2=1}^{T-h_1} + \sum_{h_1=-T+1}^{0}\sum_{r_2=1-h_1}^{T}\right)\cdots\left(\sum_{h_m=1}^{T-1}\sum_{r_{2m}=1}^{T-h_m} + \sum_{h_m=-T+1}^{0}\sum_{r_{2m}=1-h_m}^{T}\right)\prod_{j=1}^m \gamma(h_j)\,\tilde k_\rho\left(\frac{r_{2j}}{T}, \frac{r_{2j+2}}{T}\right), \qquad (62)$$
and

$$II = O\left\{\left(\sum_{h_1=1}^{T-1}\sum_{r_2=1}^{T-h_1} + \sum_{h_1=-T+1}^{0}\sum_{r_2=1-h_1}^{T}\right)\cdots\left(\sum_{h_m=1}^{T-1}\sum_{r_{2m}=1}^{T-h_m} + \sum_{h_m=-T+1}^{0}\sum_{r_{2m}=1-h_m}^{T}\right)\prod_{j=1}^m \frac{\rho|h_{j+1}|}{T}\,|\gamma(h_j)|\right\}. \qquad (63)$$
Here we have used

$$\tilde k_\rho\left(\frac{r_{2j}}{T}, \frac{r_{2j+2}+h_{j+1}}{T}\right) - \tilde k_\rho\left(\frac{r_{2j}}{T}, \frac{r_{2j+2}}{T}\right) = O\left(\frac{\rho|h_{j+1}|}{T}\right). \qquad (64)$$

A similar result is given and proved in (77) below. The first term $I$ can be written as

$$I = \sum_{h_1, r_2}\cdots\sum_{h_m, r_{2m}}\ \prod_{j=1}^m \gamma(h_j)\,\tilde k_\rho\left(\frac{r_{2j}}{T}, \frac{r_{2j+2}}{T}\right), \qquad (65)$$
where each $\sum_{h_j, r_{2j}}$ denotes one of the three choices $\sum_{h_j=1}^{T-1}\sum_{r_{2j}=1}^{T}$, $-\sum_{h_j=1}^{T-1}\sum_{r_{2j}=T-h_j+1}^{T}$ and $\sum_{h_j=-T+1}^{0}\sum_{r_{2j}=1}^{T}$ (with the analogous boundary correction for negative $h_j$), and the outer summation runs over all possible combinations $\sum_{h_1,r_2}\cdots\sum_{h_m,r_{2m}}$. The $3^m$ summands in (65) can be divided into two groups, the first group consisting of the summands all of whose $r$ indices run from 1 to $T$, and the second group consisting of the rest. It is obvious that the first group can be written as
$$\left(\sum_{h}\prod_{j=1}^m \gamma(h_j)\right)\sum_{r}\prod_{j=1}^m \tilde k_\rho\left(\frac{r_{2j}}{T}, \frac{r_{2j+2}}{T}\right).$$
The dominating terms in the second group are of the forms

$$\sum_{h_j=-T+1}^{T-1}\sum_{r_2=1}^{T}\cdots\sum_{h_k=-T+1}^{T-1}\sum_{r_{2k}=T-h_k+1}^{T}\cdots\sum_{h_m=-T+1}^{T-1}\sum_{r_{2m}=1}^{T}\ \prod_{j=1}^m \gamma(h_j)\,\tilde k_\rho\left(\frac{r_{2j}}{T}, \frac{r_{2j+2}}{T}\right),$$
or

$$\sum_{h_j=-T+1}^{T-1}\sum_{r_2=1}^{T}\cdots\sum_{h_k=-T+1}^{T-1}\sum_{r_{2k}=T-h_k+1}^{T}\cdots\sum_{h_m=-T+1}^{0}\sum_{r_{2m}=1}^{-h_m}\ \prod_{j=1}^m \gamma(h_j)\,\tilde k_\rho\left(\frac{r_{2j}}{T}, \frac{r_{2j+2}}{T}\right),$$
both of which are bounded by

$$\left[\sup_{r_2}\sum_{r_4=1}^{T}\left|\tilde k_\rho\left(\frac{r_2}{T}, \frac{r_4}{T}\right)\right|\right]^{m-2}\left(\sum_{h}|\gamma(h)|\right)^{m-1}\sum_{h_k}|\gamma(h_k)|\,|h_k|,$$
using the same approach as in (43). Approximating the sums by integrals, and noting that the second group contains $O(m)$ terms of the same order of magnitude as the above typical dominating terms, we conclude that the second group is

of order $O\left(m\,T^{m-2}\left[4\int_0^1 k_\rho(v)\,dv\right]^{m-2}\right)$ uniformly over $m$. As a consequence,
$$I = \left(\sum_{h}\prod_{j=1}^m \gamma(h_j)\right)\sum_{r}\prod_{j=1}^m \tilde k_\rho\left(\frac{r_{2j}}{T}, \frac{r_{2j+2}}{T}\right) + O\left(m\,T^{m-2}\left[4\int_0^1 k_\rho(v)\,dv\right]^{m-2}\right), \qquad (66)$$
uniformly over $m$. The second term $II$ is easily shown to be of smaller order than the first term $I$. Therefore

$$\mathrm{Trace}[(\Omega_T A_T W_\rho A_T)^m] = \left(\sum_{h}\prod_{j=1}^m \gamma(h_j)\right)\sum_{r}\prod_{j=1}^m \tilde k_\rho\left(\frac{r_{2j}}{T}, \frac{r_{2j+2}}{T}\right) + O\left(m\,T^{m-2}\left[4\int_0^1 k_\rho(v)\,dv\right]^{m-2}\right), \qquad (67)$$
and

$$\kappa_{m,T} = 2^{m-1}(m-1)!\,T^{-m}\omega_T^{-2m}\,\mathrm{Trace}[(\Omega_T A_T W_\rho A_T)^m]$$
$$= 2^{m-1}(m-1)!\left\{T^{-m}\sum_{r}\prod_{j=1}^m \tilde k_\rho\left(\frac{r_{2j}}{T}, \frac{r_{2j+2}}{T}\right) + O\left(\frac{1}{T^2}\left[4\int_0^1 k_\rho(v)\,dv\right]^{m-2}\right)\right\} \quad \text{(using (58))}$$
$$= 2^{m-1}(m-1)!\int_0^1\cdots\int_0^1 \prod_{j=1}^m k_\rho^*(\tau_j, \tau_{j+1})\,d\tau_j + O\left\{\frac{m!\,2^{m-1}}{T^2}\left[4\int_0^1 k_\rho(v)\,dv\right]^{m-2}\right\} = \kappa_{m,\rho} + O\left\{\frac{m!\,2^{m-1}}{T^2}\left[4\int_0^1 k_\rho(v)\,dv\right]^{m-2}\right\},$$
which proves part (b). For part (c), recall that each central moment can be written as a sum, over partitions of the index set, of products of cumulants; a generic partition has the form

$$\pi = [\,\underbrace{j_1, \ldots, j_1}_{m_1 \text{ times}},\ \underbrace{j_2, \ldots, j_2}_{m_2 \text{ times}},\ \ldots,\ \underbrace{j_k, \ldots, j_k}_{m_k \text{ times}}\,] \qquad (71)$$
for some integer $k$ and a sequence $\{j_i\}$ such that $j_1 > j_2 > \cdots > j_k$ and $m = \sum_{i=1}^k m_i j_i$. Combining the preceding formula with part (b) gives

$$\mu_{m,T} = \mu_{m,\rho} + O\left\{\sum_\pi \frac{m!}{m_1!\,m_2!\cdots m_k!}\,\frac{2^{m-1}}{T^2}\left[4\int_0^1 k_\rho(v)\,dv\right]^{m-2}\right\} = \mu_{m,\rho} + O\left\{\frac{2^{2m-1}m!}{T^2}\left[4\int_0^1 k_\rho(v)\,dv\right]^{m-2}\right\} \qquad (72)$$
uniformly over $m$, where the last equality follows because $\sum_\pi \frac{1}{m_1!\,m_2!\cdots m_k!} < 2^m$.

Lemma 10 Let A2 hold. If $\rho \to \infty$ and $T \to \infty$ such that $\rho/T^q \to 0$, then
(a)

$$\mu_T = \mu_\rho - \frac{g\rho}{T^q\,\omega^2}\sum_{h=-T+1}^{T-1}|h|^q\,\gamma(h)\,(1+o(1)) + O\left(\frac{1}{T}\right) + o\left(\frac{\rho}{T^q}\right); \qquad (73)$$
(b)
$$\kappa_{2,T} = 2\int_0^1\int_0^1 \left[k_\rho^*(r,s)\right]^2 dr\,ds + O\left(\frac{\rho}{T}\right); \qquad (74)$$
(c) for $m = 3$ and $4$,

$$\kappa_{m,T} = O\left(\rho^{-(m-1)/q}\right) + O\left(\frac{\rho}{T}\right). \qquad (75)$$

Proof of Lemma 10. We have proved (73) in the proof of Lemma 9, as equation (60) holds both for fixed and for increasing $\rho$. It remains to consider $\kappa_{m,T}$ for $m = 2, 3$ and $4$. We first consider $\kappa_{2,T} = 2T^{-2}\omega_T^{-4}\,\mathrm{Trace}\left[(\Omega_T A_T W_\rho A_T)^2\right]$. As a first step, we have
$$\mathrm{Trace}\left[(\Omega_T A_T W_\rho A_T)^2\right] = \sum_{r_1, r_2, r_3, r_4=1}^{T}\tilde k_\rho\left(\frac{r_2}{T}, \frac{r_3}{T}\right)\tilde k_\rho\left(\frac{r_4}{T}, \frac{r_1}{T}\right)\gamma(r_1 - r_2)\,\gamma(r_3 - r_4)$$
$$= \sum_{r_2=1}^{T}\sum_{h_1=1-r_2}^{T-r_2}\sum_{r_4=1}^{T}\sum_{h_2=1-r_4}^{T-r_4}\tilde k_\rho\left(\frac{r_2}{T}, \frac{r_4+h_2}{T}\right)\tilde k_\rho\left(\frac{r_4}{T}, \frac{r_2+h_1}{T}\right)\gamma(h_1)\,\gamma(h_2)$$
$$= \left(\sum_{h_1=1}^{T-1}\sum_{r_2=1}^{T-h_1} + \sum_{h_1=-T+1}^{0}\sum_{r_2=1-h_1}^{T}\right)\left(\sum_{h_2=1}^{T-1}\sum_{r_4=1}^{T-h_2} + \sum_{h_2=-T+1}^{0}\sum_{r_4=1-h_2}^{T}\right)\tilde k_\rho\left(\frac{r_2}{T}, \frac{r_4+h_2}{T}\right)\tilde k_\rho\left(\frac{r_4}{T}, \frac{r_2+h_1}{T}\right)\gamma(h_1)\,\gamma(h_2)$$
$$:= I_1 + I_2 + I_3 + I_4. \qquad (76)$$

where

$$I_1 = \sum_{h_1=1}^{T-1}\sum_{h_2=1}^{T-1}\sum_{r_2=1}^{T-h_1}\sum_{r_4=1}^{T-h_2}\tilde k_\rho\left(\frac{r_2}{T}, \frac{r_4+h_2}{T}\right)\tilde k_\rho\left(\frac{r_4}{T}, \frac{r_2+h_1}{T}\right)\gamma(h_1)\,\gamma(h_2),$$

$$I_2 = \sum_{h_1=1}^{T-1}\sum_{r_2=1}^{T-h_1}\sum_{h_2=-T+1}^{0}\sum_{r_4=1-h_2}^{T}\tilde k_\rho\left(\frac{r_2}{T}, \frac{r_4+h_2}{T}\right)\tilde k_\rho\left(\frac{r_4}{T}, \frac{r_2+h_1}{T}\right)\gamma(h_1)\,\gamma(h_2),$$
$$I_3 = \sum_{h_1=-T+1}^{0}\sum_{r_2=1-h_1}^{T}\sum_{h_2=1}^{T-1}\sum_{r_4=1}^{T-h_2}\tilde k_\rho\left(\frac{r_2}{T}, \frac{r_4+h_2}{T}\right)\tilde k_\rho\left(\frac{r_4}{T}, \frac{r_2+h_1}{T}\right)\gamma(h_1)\,\gamma(h_2),$$
and

$$I_4 = \sum_{h_1=-T+1}^{0}\sum_{r_2=1-h_1}^{T}\sum_{h_2=-T+1}^{0}\sum_{r_4=1-h_2}^{T}\tilde k_\rho\left(\frac{r_2}{T}, \frac{r_4+h_2}{T}\right)\tilde k_\rho\left(\frac{r_4}{T}, \frac{r_2+h_1}{T}\right)\gamma(h_1)\,\gamma(h_2).$$
We now consider each term in turn. Note that

$$\frac{1}{T}\sum_{k=1}^{T}k_\rho\left(\frac{k - r_4 - h_2}{T}\right) = \frac{1}{T}\sum_{k=1-h_2}^{T-h_2}k_\rho\left(\frac{k - r_4}{T}\right) = \frac{1}{T}\sum_{k=1}^{T}k_\rho\left(\frac{k - r_4}{T}\right) + O\left(\frac{|h_2|}{T}\right)$$
and
$$k_\rho\left(\frac{r_2 - r_4 - h_2}{T}\right) - k_\rho\left(\frac{r_2 - r_4}{T}\right) = O\left(\frac{\rho|h_2|}{T}\right),$$
we have

$$\tilde k_\rho\left(\frac{r_2}{T}, \frac{r_4+h_2}{T}\right) = \tilde k_\rho\left(\frac{r_2}{T}, \frac{r_4}{T}\right) + O\left(\frac{\rho|h_2|}{T}\right). \qquad (77)$$
Similarly,
$$\tilde k_\rho\left(\frac{r_4}{T}, \frac{r_2+h_1}{T}\right) = \tilde k_\rho\left(\frac{r_4}{T}, \frac{r_2}{T}\right) + O\left(\frac{\rho|h_1|}{T}\right). \qquad (78)$$

It follows from (77) and (78) that

$$I_1 = \sum_{h_1=1}^{T-1}\sum_{h_2=1}^{T-1}\sum_{r_2=1}^{T-1}\sum_{r_4=1}^{T-1}\tilde k_\rho\left(\frac{r_2}{T}, \frac{r_4}{T}\right)\tilde k_\rho\left(\frac{r_4}{T}, \frac{r_2}{T}\right)\gamma(h_1)\,\gamma(h_2) + O(\rho T),$$
where the remainder terms, which involve factors of order $\rho(|h_1|+|h_2|)/T$ per summand, add up to $O(\rho T)$ under A2.

T 1 T 0 T r r r r I = k~ 2 ; 4 k~ 4 ; 2 (h ) (h ) + O (T ) ; 2  T T  T T 1 2 h1=1 r2=1 h2=1 T r4=1 X X X X n    o 0 T T T r r r r I = k~ 2 ; 4 k~ 4 ; 2 (h ) (h ) + O (T ) ; 3  T T  T T 1 2 h1=1 T r2=1 r2=1 r4=1 X X X X n    o and

\[
I_4 = \sum_{h_1=1-T}^{0}\sum_{r_2=1}^{T}\sum_{h_2=1-T}^{0}\sum_{r_4=1}^{T} \Big\{\tilde k_\rho\Big(\frac{r_2}{T},\frac{r_4}{T}\Big)\,\tilde k_\rho\Big(\frac{r_4}{T},\frac{r_2}{T}\Big)\Big\}\,\gamma(h_1)\,\gamma(h_2) + O(\rho T).
\]
In consequence,

\[
\mathrm{Trace}\big[(\Omega_T A_T W A_T)^2\big] = \sum_{r_2,r_4}\Big\{\tilde k_\rho\Big(\frac{r_2}{T},\frac{r_4}{T}\Big)\Big\}^2 \left(\sum_{h=1-T}^{T-1}\gamma(h)\right)^2 + O(\rho T), \tag{79}
\]
and

\[
\kappa_{2,T} = 2T^{-2}\omega_T^{-4}\,\mathrm{Trace}\big[(\Omega_T A_T W A_T)^2\big] = 2\int_0^1\!\!\int_0^1 \tilde k_\rho^2(r,s)\,dr\,ds + O\Big(\frac{\rho}{T}\Big). \tag{80}
\]
So for the first order power kernels,
\[
\kappa_{2,T} = \frac{2}{\rho+1} + O\Big(\frac{\rho}{T}\Big),
\]
and for the second order power kernels,

\[
\kappa_{2,T} = 2\Big(\frac{\pi}{2g\rho}\Big)^{1/2} + O\Big(\frac{1}{\rho} + \frac{\rho}{T}\Big),
\]
where the last line uses equation (83) below. The proof for $\kappa_{m,T}$ with $m = 3, 4$ is essentially the same, except that we use Lemma 7(a), or $\int_0^1 (1-x)^\rho\,dx = O(1/\rho)$, and Lemma 8 to obtain a leading term of order $O\big(\rho^{-(m-1)/q}\big)$. Details are omitted.

9.2 Proofs of the Main Results

Proof of Theorem 1. First we consider the second order power kernels. Combining Lemma 7(a) with Lemma 8, we have

\[
|\kappa_m| = O\big(\rho^{-(m-1)/q}\big).
\]
As a consequence,

\[
F_\rho(z) = P\Big\{\big|\Xi_\rho^{-1/2}\,(W(1)+\delta)\big| < z\Big\}
= G_\delta(\mu_\rho z^2) + \frac{\sigma_\rho^2}{2}\,G''_\delta(\mu_\rho z^2)\,\mu_\rho^2\,z^4 + O\big(\rho^{-2/q}\big), \tag{81}
\]
where
\[
\mu_\rho = E\,\Xi_\rho = \int_0^1 \tilde k_\rho(r,r)\,dr = 1 - \int_0^1\!\!\int_0^1 k_\rho(r-s)\,dr\,ds,
\]
and

\[
\sigma_\rho^2 = 2\left[\int_0^1\!\!\int_0^1 k_\rho(r-s)\,dr\,ds\right]^2 - 4\int_0^1\!\!\int_0^1\!\!\int_0^1 k_\rho(r-s)\,k_\rho(r-q)\,ds\,dq\,dr + 2\int_0^1\!\!\int_0^1 k_\rho^2(r-s)\,dr\,ds.
\]
We proceed to approximate $\mu_\rho$ and $\sigma_\rho^2$ as $\rho\to\infty$. First, we have
\begin{align*}
\int_0^1\!\!\int_0^1 k_\rho(r-s)\,dr\,ds
&= 2\int_0^1\left[\int_0^r k_\rho(\tau)\,d\tau\right]dr = 2\int_0^1\left[\int_\tau^1 dr\right]k_\rho(\tau)\,d\tau \\
&= 2\int_0^1 (1-\tau)\,k_\rho(\tau)\,d\tau = \Big(\frac{\pi}{g\rho}\Big)^{1/2} + O(1/\rho),
\end{align*}

by Lemma 7(b). Second, it follows from Lemma 7(a) that

\[
\int_0^1\left[\int_0^1\!\!\int_0^1 k_\rho(r-s)\,k_\rho(r-q)\,ds\,dq\right]dr = \int_0^1\left[\int_{r-1}^{r} k_\rho(s)\,ds\right]^2 dr \le \left[\int_{-1}^{1} k_\rho(s)\,ds\right]^2 = O(1/\rho). \tag{82}
\]
As a consequence,

\begin{align}
\sigma_\rho^2 &= 2\int_0^1\!\!\int_0^1 k_\rho^2(r-s)\,dr\,ds + O\Big(\frac{1}{\rho}\Big) \tag{83}\\
&= 2\Big(\frac{\pi}{2g\rho}\Big)^{1/2} + O\Big(\frac{1}{\rho}\Big). \tag{84}
\end{align}
Then

\begin{align}
F_\rho(z) &= G_\delta(z^2) + G'_\delta(z^2)\,(\mu_\rho - 1)\,z^2 + \frac12\,G''_\delta(z^2)\,z^4\,\sigma_\rho^2 + O\Big(\frac{1}{\rho}\Big) \nonumber\\
&= G_\delta(z^2) - \Big(\frac{\pi}{g\rho}\Big)^{1/2} G'_\delta(z^2)\,z^2 + \Big(\frac{\pi}{2g\rho}\Big)^{1/2} G''_\delta(z^2)\,z^4 + O\Big(\frac{1}{\rho}\Big) \nonumber\\
&= G_\delta(z^2) + \Big(\frac{\pi}{2g\rho}\Big)^{1/2}\Big[G''_\delta(z^2)\,z^4 - \sqrt2\,G'_\delta(z^2)\,z^2\Big] + O\Big(\frac{1}{\rho}\Big) \nonumber\\
&= G_\delta(z^2) + c_q\,\rho^{-1/q}\,\mathcal L_q(G_\delta, z) + O\big(\rho^{-2/q}\big), \tag{85}
\end{align}
where the $O(\cdot)$ term holds uniformly for any $z \in [M_l, M_u]$ with $0 < M_l < M_u < \infty$. For the case with the first order power kernels, we have, after some brute force calculations,

\[
\mu_\rho = 1 - \int_0^1\!\!\int_0^1 k_\rho(r-s)\,dr\,ds = \frac{\rho}{\rho+2}, \qquad \sigma_\rho^2 = \frac{2}{\rho+1} + O\Big(\frac{1}{\rho^2}\Big).
\]
So
\begin{align}
F_\rho(z) &= G_\delta(\mu_\rho z^2) + \frac{1}{\rho+1}\,G''_\delta(\mu_\rho z^2)\,z^4 + O\Big(\frac{1}{\rho^2}\Big) \nonumber\\
&= G_\delta(z^2) - \frac{2}{\rho+2}\,G'_\delta(z^2)\,z^2 + \frac{1}{\rho+1}\,G''_\delta(z^2)\,z^4 + O\Big(\frac{1}{\rho^2}\Big) \nonumber\\
&= G_\delta(z^2) + \frac{1}{\rho}\Big[G''_\delta(z^2)\,z^4 - 2\,G'_\delta(z^2)\,z^2\Big] + O\Big(\frac{1}{\rho^2}\Big) \nonumber\\
&= G_\delta(z^2) + c_q\,\rho^{-1/q}\,\mathcal L_q(G_\delta, z) + O\big(\rho^{-2/q}\big), \tag{86}
\end{align}
as desired.
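The closed-form moments used above are easy to verify numerically. The sketch below is purely illustrative (it is not part of the proof): it assumes the exponentiated Bartlett kernel $k_\rho(x) = (1-|x|)^\rho$ as the first order power kernel and the exponentiated Parzen kernel (for which $g = 6$) as a second order example, and checks $\int_0^1\int_0^1 k_\rho(r-s)\,dr\,ds = 2/(\rho+2)$, $2\int_0^1\int_0^1 k_\rho^2(r-s)\,dr\,ds = 2/(\rho+1)$, and the Laplace-type approximation $\int_0^1\int_0^1 k^\rho(r-s)\,dr\,ds \approx (\pi/(g\rho))^{1/2}$.

```python
import numpy as np

def double_integral(k, n=4000):
    """Integrate k(r - s) over the unit square via 2 * int_0^1 (1 - t) k(t) dt."""
    t = (np.arange(n) + 0.5) / n                      # midpoint grid on [0, 1]
    return 2.0 * np.mean((1.0 - t) * k(t))

rho = 20
bartlett_rho = lambda x: (1.0 - np.abs(x)) ** rho     # first order power kernel

# exact closed forms for the Bartlett power kernel: 2/(rho+2) and 2/(rho+1)
I1 = double_integral(bartlett_rho)
I2 = 2.0 * double_integral(lambda x: bartlett_rho(x) ** 2)
assert abs(I1 - 2.0 / (rho + 2)) < 1e-5
assert abs(I2 - 2.0 / (rho + 1)) < 1e-5

def parzen(x):
    """Parzen kernel: 1 - 6x^2 + 6|x|^3 for |x| <= 1/2, 2(1-|x|)^3 for 1/2 < |x| <= 1."""
    x = np.abs(x)
    return np.where(x <= 0.5, 1 - 6 * x**2 + 6 * x**3, 2 * np.clip(1 - x, 0, None) ** 3)

g = 6.0                                               # k(x) = 1 - g x^2 + o(x^2) for Parzen
for rho2 in (50, 200, 800):
    I = double_integral(lambda x: parzen(x) ** rho2, n=200000)
    target = np.sqrt(np.pi / (g * rho2))
    assert abs(I - target) < 0.1 * target             # O(1/rho) relative accuracy
```

The approximation error in the last check visibly shrinks as $\rho$ grows, in line with the $O(1/\rho)$ remainder in the text.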

Proof of Corollary 2. We focus on the case with the second order power kernels, as the proof for the first order power kernels is similar. Using a Taylor series expansion, we have

\begin{align}
F_\rho(z_{\alpha,\rho})
&= D(z_{\alpha,\rho}^2) + \Big(\frac{\pi}{g\rho}\Big)^{1/2}\Big[\frac{1}{\sqrt2}\,D''(z_{\alpha,\rho}^2)\,z_{\alpha,\rho}^4 - D'(z_{\alpha,\rho}^2)\,z_{\alpha,\rho}^2\Big] + O\Big(\frac{1}{\rho}\Big) \nonumber\\
&= D(z_\alpha^2) + \Big(\frac{\pi}{g\rho}\Big)^{1/2}\Big[\frac{1}{\sqrt2}\,D''(z_\alpha^2)\,z_\alpha^4 - D'(z_\alpha^2)\,z_\alpha^2\Big] + D'(z_\alpha^2)\big(z_{\alpha,\rho}^2 - z_\alpha^2\big) + O\Big(\frac{1}{\rho}\Big), \tag{87}
\end{align}
that is, using $F_\rho(z_{\alpha,\rho}) = 1 - \alpha = D(z_\alpha^2)$,

\[
0 = \Big(\frac{\pi}{g\rho}\Big)^{1/2}\Big[\frac{1}{\sqrt2}\,D''(z_\alpha^2)\,z_\alpha^4 - D'(z_\alpha^2)\,z_\alpha^2\Big] + D'(z_\alpha^2)\big(z_{\alpha,\rho}^2 - z_\alpha^2\big) + O\Big(\frac{1}{\rho}\Big). \tag{88}
\]
So

\[
z_{\alpha,\rho}^2 = z_\alpha^2 - \Big(\frac{\pi}{g\rho}\Big)^{1/2}\big[D'(z_\alpha^2)\big]^{-1}\Big[\frac{1}{\sqrt2}\,D''(z_\alpha^2)\,z_\alpha^4 - D'(z_\alpha^2)\,z_\alpha^2\Big] + O\Big(\frac{1}{\rho}\Big). \tag{89}
\]
Now
\[
D'(z) = \frac{z^{-1/2}e^{-z/2}}{\Gamma(1/2)\sqrt2}, \qquad D''(z) = -\frac{(1+z)\,z^{-3/2}e^{-z/2}}{2\,\Gamma(1/2)\sqrt2}, \tag{90}
\]
and thus

\[
\frac{D''(z)}{D'(z)} = -\frac{1}{2}\Big(\frac{1}{z} + 1\Big) = -\frac{1+z}{2z}. \tag{91}
\]
Hence

\begin{align}
z_{\alpha,\rho}^2 &= z_\alpha^2 + \Big(\frac{\pi}{g\rho}\Big)^{1/2}\Big[\frac{1+z_\alpha^2}{2\sqrt2}\,z_\alpha^2 + z_\alpha^2\Big] + O\Big(\frac{1}{\rho}\Big) \nonumber\\
&= z_\alpha^2 + \Big(\frac{\pi}{g\rho}\Big)^{1/2}\Big[\Big(1 + \frac{\sqrt2}{4}\Big)z_\alpha^2 + \frac{\sqrt2}{4}\,z_\alpha^4\Big] + O\Big(\frac{1}{\rho}\Big), \tag{92}
\end{align}

from which we get
\begin{align}
z_{\alpha,\rho} &= z_\alpha\left\{1 + \Big(\frac{\pi}{g\rho}\Big)^{1/2}\Big[\Big(1 + \frac{\sqrt2}{4}\Big) + \frac{\sqrt2}{4}\,z_\alpha^2\Big] + O\Big(\frac{1}{\rho}\Big)\right\}^{1/2} \nonumber\\
&= z_\alpha + \frac12\Big(\frac{\pi}{g\rho}\Big)^{1/2}\Big[\Big(1 + \frac{\sqrt2}{4}\Big)z_\alpha + \frac{\sqrt2}{4}\,z_\alpha^3\Big] + O\Big(\frac{1}{\rho}\Big), \tag{93}
\end{align}
as stated.
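Equations (90)–(93) involve only elementary calculus and can be checked numerically. The sketch below is illustrative only: it uses the closed form $D'(w) = (2\pi w)^{-1/2}e^{-w/2}$ of the $\chi_1^2$ density, a finite-difference approximation to $D''$, and verifies that the bracket in (89) reduces to the polynomial appearing in (92).

```python
import math

def chi1_pdf(w):
    """Density D'(w) of the chi-square distribution with one degree of freedom."""
    return math.exp(-w / 2) / math.sqrt(2 * math.pi * w)

for w in (1.0, 2.706, 3.841, 6.635):       # typical values of z_alpha^2
    eps = 1e-6
    D1 = chi1_pdf(w)
    D2 = (chi1_pdf(w + eps) - chi1_pdf(w - eps)) / (2 * eps)   # numerical D''(w)
    # equation (91): D''(w)/D'(w) = -(1 + w)/(2w)
    assert abs(D2 / D1 + (1 + w) / (2 * w)) < 1e-4
    # bracket in (89): -(1/D')[D'' w^2 / sqrt(2) - D' w] equals the polynomial in (92)
    lhs = -(D2 / D1 * w**2 / math.sqrt(2) - w)
    rhs = (1 + math.sqrt(2) / 4) * w + math.sqrt(2) / 4 * w**2
    assert abs(lhs - rhs) < 1e-3
```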

Proof of Corollary 3. Again, we focus on the case with the second order power kernels, as the proof for the first order power kernels is similar. For notational convenience, let
\[
f(z_\alpha^2) = \Big(\frac{\pi}{g}\Big)^{1/2}\Big[\Big(1 + \frac{\sqrt2}{4}\Big)z_\alpha^2 + \frac{\sqrt2}{4}\,z_\alpha^4\Big],
\]
so that
\[
z_{\alpha,\rho}^2 = z_\alpha^2 + \frac{f(z_\alpha^2)}{\sqrt\rho} + O\Big(\frac{1}{\rho}\Big). \tag{94}
\]
We have
\begin{align}
1 - E\,G_\delta\big(z_{\alpha,\rho}^2\,\Xi_\rho\big)
&= 1 - G_\delta\Big(z_\alpha^2 + \frac{f(z_\alpha^2)}{\sqrt\rho}\Big) - \Big(\frac{\pi}{g\rho}\Big)^{1/2}\Big[\frac{1}{\sqrt2}\,G''_\delta(z_\alpha^2)\,z_\alpha^4 - G'_\delta(z_\alpha^2)\,z_\alpha^2\Big] + O\Big(\frac{1}{\rho}\Big) \nonumber\\
&= 1 - G_\delta(z_\alpha^2) - \frac{1}{\sqrt\rho}\,G'_\delta(z_\alpha^2)\,f(z_\alpha^2) - \Big(\frac{\pi}{g\rho}\Big)^{1/2}\Big[\frac{1}{\sqrt2}\,G''_\delta(z_\alpha^2)\,z_\alpha^4 - G'_\delta(z_\alpha^2)\,z_\alpha^2\Big] + O\Big(\frac{1}{\rho}\Big) \nonumber\\
&= 1 - G_\delta(z_\alpha^2) - \Big(\frac{\pi}{g\rho}\Big)^{1/2}\Big[\frac{\sqrt2}{4}\,G'_\delta(z_\alpha^2)\,z_\alpha^2 + \frac{\sqrt2}{4}\,G'_\delta(z_\alpha^2)\,z_\alpha^4 + \frac{1}{\sqrt2}\,G''_\delta(z_\alpha^2)\,z_\alpha^4\Big] + O\Big(\frac{1}{\rho}\Big) \nonumber\\
&= 1 - G_\delta(z_\alpha^2) - \Big(\frac{\pi}{g\rho}\Big)^{1/2}\frac{\sqrt2}{2}\Big[\frac12\,G'_\delta(z_\alpha^2)\,z_\alpha^4 + G''_\delta(z_\alpha^2)\,z_\alpha^4 + \frac12\,G'_\delta(z_\alpha^2)\,z_\alpha^2\Big] + O\Big(\frac{1}{\rho}\Big). \tag{95}
\end{align}

Note that
\[
G'_\delta(z) = e^{-\delta^2/2}\sum_{j=0}^{\infty} \frac{(\delta^2/2)^j}{j!}\,\frac{z^{j-1/2}\,e^{-z/2}}{\Gamma(j+1/2)\,2^{j+1/2}}, \tag{96}
\]
and

\begin{align}
G''_\delta(z) &= e^{-\delta^2/2}\sum_{j=0}^{\infty} \frac{(\delta^2/2)^j}{j!}\Big(\frac{j-1/2}{z} - \frac12\Big)\frac{z^{j-1/2}\,e^{-z/2}}{\Gamma(j+1/2)\,2^{j+1/2}} \nonumber\\
&= -\frac12\Big(1 + \frac{1}{z}\Big)G'_\delta(z) + \frac{1}{z}\,K_\delta(z), \tag{97}
\end{align}
where
\[
K_\delta(z) = e^{-\delta^2/2}\sum_{j=0}^{\infty} \frac{(\delta^2/2)^j}{j!}\, j\,\frac{z^{j-1/2}\,e^{-z/2}}{\Gamma(j+1/2)\,2^{j+1/2}},
\]
so
\[
\frac12\,G'_\delta(z_\alpha^2)\,z_\alpha^4 + G''_\delta(z_\alpha^2)\,z_\alpha^4 + \frac12\,G'_\delta(z_\alpha^2)\,z_\alpha^2 = z_\alpha^2\,K_\delta(z_\alpha^2), \tag{98}
\]
and
\[
1 - E\,G_\delta\big(z_{\alpha,\rho}^2\,\Xi_\rho\big) = 1 - G_\delta(z_\alpha^2) - c_q\,\rho^{-1/q}\,z_\alpha^2\,K_\delta(z_\alpha^2) + O\Big(\frac{1}{\rho}\Big), \tag{99}
\]
completing the proof of the corollary.
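The series manipulations in (96)–(98) can likewise be verified term by term. The following self-contained sketch is illustrative only: it builds $G'_\delta$ and $K_\delta$ directly from the series above, cross-checks (96) against the elementary closed form of the noncentral $\chi_1^2$ density, and confirms the identity (98) at a few points.

```python
import math

def Gp(z, d, terms=60):
    """G'_delta(z): noncentral chi-square (1 df, noncentrality d^2) density, series (96)."""
    s = sum((d**2 / 2) ** j / math.factorial(j)
            * z ** (j - 0.5) * math.exp(-z / 2) / (math.gamma(j + 0.5) * 2 ** (j + 0.5))
            for j in range(terms))
    return math.exp(-d**2 / 2) * s

def Kd(z, d, terms=60):
    """K_delta(z): the same series with an extra factor j in each term."""
    s = sum((d**2 / 2) ** j / math.factorial(j) * j
            * z ** (j - 0.5) * math.exp(-z / 2) / (math.gamma(j + 0.5) * 2 ** (j + 0.5))
            for j in range(terms))
    return math.exp(-d**2 / 2) * s

for d in (0.5, 1.5, 3.0):
    for w in (1.0, 3.841, 6.635):
        # (96) agrees with the density of (Z + d)^2, Z standard normal
        closed = (math.exp(-(math.sqrt(w) - d) ** 2 / 2)
                  + math.exp(-(math.sqrt(w) + d) ** 2 / 2)) / (2 * math.sqrt(2 * math.pi * w))
        assert abs(Gp(w, d) - closed) < 1e-9
        eps = 1e-5
        G1 = Gp(w, d)
        G2 = (Gp(w + eps, d) - Gp(w - eps, d)) / (2 * eps)     # numerical G''_delta(w)
        # (97): G'' = -(1/2)(1 + 1/w) G' + K_delta(w)/w
        assert abs(G2 - (-0.5 * (1 + 1 / w) * G1 + Kd(w, d) / w)) < 1e-6
        # (98): (1/2) G' w^2 + G'' w^2 + (1/2) G' w = w K_delta(w)
        assert abs(0.5 * G1 * w**2 + G2 * w**2 + 0.5 * G1 * w - w * Kd(w, d)) < 1e-4
```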

Proof of Theorem 4. First, since $G_\delta(\cdot)$ is a bounded function, we can rewrite (16) as

\begin{align}
P\Big\{\big|\Xi_\rho^{-1/2}\,(W(1)+\delta)\big| \le z\Big\}
&= \lim_{B\to\infty} E\,G_\delta(z^2\Xi_\rho)\,\mathbf 1\{|\Xi_\rho| \le B\} \nonumber\\
&= \lim_{B\to\infty} E\sum_{m=0}^{\infty} \frac{1}{m!}\,G^{(m)}_\delta(z^2)\,(\Xi_\rho - 1)^m\,z^{2m}\,\mathbf 1\{|\Xi_\rho| \le B\} \nonumber\\
&= \lim_{B\to\infty}\sum_{m=0}^{\infty} \frac{1}{m!}\,G^{(m)}_\delta(z^2)\,E\big[(\Xi_\rho - 1)^m\,\mathbf 1\{|\Xi_\rho| \le B\}\big]\,z^{2m}, \tag{100}
\end{align}
where the last line follows because the infinite series $\sum_{m=0}^{\infty}\frac{1}{m!}G^{(m)}_\delta(z^2)(\Xi_\rho - 1)^m z^{2m}$ converges uniformly to $G_\delta(z^2\Xi_\rho)$ when $|\Xi_\rho| \le B$. The uniformity holds because $G_\delta(\cdot)$ is infinitely differentiable with bounded derivatives. Using Lemma 8, we have, for some constant $C > 0$,

\begin{align}
\sum_{m=1}^{\infty} \frac{1}{m!}\Big|G^{(m)}_\delta(z^2)\Big|\,|\kappa_m|\,z^{2m}
&\le C\sum_{m=1}^{\infty} \frac{1}{m!}\,z^{2m}\,2^{2m-1}\,(m-1)!\left(4\int_0^1 k_\rho(v)\,dv\right)^{m-1} \nonumber\\
&= 2Cz^2\sum_{m=1}^{\infty} \frac{1}{m}\left(16z^2\int_0^1 k_\rho(v)\,dv\right)^{m-1} < \infty, \tag{101}
\end{align}

provided that $\int_0^1 k_\rho(v)\,dv < 1/(16z^2)$. As a consequence, the limit $\lim_{B\to\infty}$ can be moved inside the summation sign in (100), giving
\[
P\Big\{\big|\Xi_\rho^{-1/2}\,(W(1)+\delta)\big| \le z\Big\} = \sum_{m=0}^{\infty} \frac{1}{m!}\,G^{(m)}_\delta(z^2)\,\kappa_m\,z^{2m}, \tag{102}
\]
when $\int_0^1 k_\rho(v)\,dv < 1/(16z^2)$. Second, it follows from (25) that
\[
P\Big\{\big|\sqrt T\,\big(\hat\beta - \beta_0\big)/\hat\omega\big| \le z\Big\} = E\,G_\delta(z^2\varsigma_T) + O(1/T). \tag{103}
\]
But
\[
E\,G_\delta(z^2\varsigma_T) = \sum_{m=0}^{\infty} \frac{1}{m!}\,G^{(m)}_\delta(\mu_T z^2)\,\kappa_{m,T}\,z^{2m}, \tag{104}
\]
where the right hand side converges uniformly over $T$ because (i)
\[
\kappa_{m,T} = \kappa_m + O\left(\frac{2^{2m-1}\,m!}{T^2}\Big\{4\int_0^1 k_\rho^2(v)\,dv\Big\}^m\right)
\]
uniformly over $m$ by Lemma 9, (ii) $G^{(m)}_\delta(\cdot)$ is a bounded function, and (iii)
\[
\sum_{m=1}^{\infty} \frac{1}{m!}\,\frac{2^{2m-1}\,m!}{T^2}\Big\{4\int_0^1 k_\rho^2(v)\,dv\Big\}^m z^{2m}
= \frac{1}{2T^2}\sum_{m=1}^{\infty}\Big(16z^2\int_0^1 k_\rho^2(v)\,dv\Big)^m < \infty \tag{105}
\]
when $\int_0^1 k_\rho^2(v)\,dv < 1/(16z^2)$. Therefore

\[
P\Big\{\big|\sqrt T\,\big(\hat\beta - \beta_0\big)/\hat\omega\big| \le z\Big\} = \sum_{m=0}^{\infty} \frac{1}{m!}\,G^{(m)}_\delta(\mu_T z^2)\,\kappa_{m,T}\,z^{2m} + O\Big(\frac{1}{T}\Big). \tag{106}
\]
It follows from (102) and (106) that
\begin{align}
&P\Big\{\big|\sqrt T\,\big(\hat\beta - \beta_0\big)/\hat\omega\big| \le z\Big\} - P\Big\{\big|\Xi_\rho^{-1/2}\,(W(1)+\delta)\big| < z\Big\} \nonumber\\
&\quad= \sum_{m=0}^{\infty} \frac{1}{m!}\,G^{(m)}_\delta(\mu_T z^2)\,\kappa_{m,T}\,z^{2m} - \sum_{m=0}^{\infty} \frac{1}{m!}\,G^{(m)}_\delta(z^2)\,\kappa_m\,z^{2m} + O\Big(\frac{1}{T}\Big) \nonumber\\
&\quad= \sum_{m=0}^{\infty} \frac{1}{m!}\,G^{(m)}_\delta(z^2)\,\kappa_{m,T}\,z^{2m} - \sum_{m=0}^{\infty} \frac{1}{m!}\,G^{(m)}_\delta(z^2)\,\kappa_m\,z^{2m} + O\Big(\frac{1}{T}\Big) \tag{107}\\
&\quad= \sum_{m=0}^{\infty} \frac{1}{m!}\,G^{(m)}_\delta(z^2)\,\big(\kappa_{m,T} - \kappa_m\big)\,z^{2m} + O\Big(\frac{1}{T}\Big) \nonumber\\
&\quad= O\left(\frac{1}{T^2}\sum_{m=1}^{\infty} 2^{2m-1}\Big\{4\int_0^1 k_\rho^2(v)\,dv\Big\}^m z^{2m}\right) + O\Big(\frac{1}{T}\Big) = O\Big(\frac{1}{T}\Big), \nonumber
\end{align}

where the second equality holds because $G^{(j)}_\delta(\mu_T z^2) = G^{(j)}_\delta(z^2) + O(T^{-1})$, Lemma 9 holds, and $\sum_{m=1}^{\infty}\frac{1}{m!}\big|G^{(m)}_\delta(z^2)\big|\,|\kappa_{m,T}|\,z^{2m} < \infty$ uniformly over $T$; the last equality follows from (105). This completes the proof of Theorem 4.

Proof of Theorem 5. It follows from Lemma 10 that, as $\rho \to \infty$,
\begin{align}
\kappa_{2,T} &= 2c_q\,\rho^{-1/q} + O\big(\rho^{-2/q} + \rho\,T^{-1}\big), \tag{108}\\
\kappa_{3,T} &= O\big(\rho^{-2/q}\big) + O\big(\rho\,T^{-1}\big), \tag{109}
\end{align}

\[
\kappa_{4,T} - 3\kappa_{2,T}^2 = O\big(\rho^{-2/q}\big) + O\big(\rho\,T^{-1}\big), \tag{110}
\]
and
\[
\mu_T = \mu_\rho - g\,\rho\,T^{-q}\,\omega^{-2}\sum_{h=-T+1}^{T-1}|h|^q\,\gamma(h)\,(1+o(1)) + O\big(T^{-1}\big). \tag{111}
\]
Thus, as $\rho \to \infty$,
\begin{align}
F_{T,\rho}(z) &= P\Big\{\big|\sqrt T\,\big(\hat\beta - \beta_0\big)/\hat\omega\big| \le z\Big\} = E\,G_\delta(z^2\varsigma_T) + O\big(T^{-1}\big) \nonumber\\
&= G_\delta(\mu_T z^2) + c_q\,\rho^{-1/q}\,G''_\delta(\mu_T z^2)\,z^4 + O\big(\rho^{-2/q}\big) + O\big(T^{-1}\big) \nonumber\\
&= G_\delta(\mu_\rho z^2) + G'_\delta(\mu_\rho z^2)\,z^2\,(\mu_T - \mu_\rho) + c_q\,\rho^{-1/q}\,G''_\delta(\mu_\rho z^2)\,z^4 + O\big(\rho^{-2/q}\big) + O\big(T^{-1}\big), \tag{112}
\end{align}
using (108) to (111). But

\[
G_\delta(\mu_\rho z^2) = G_\delta(z^2) + G'_\delta(z^2)\,z^2\,(\mu_\rho - 1) + O\big((\mu_\rho - 1)^2\big) = G_\delta(z^2) - \lambda_q\,c_q\,\rho^{-1/q}\,G'_\delta(z^2)\,z^2 + O\big(\rho^{-2/q}\big), \tag{113}
\]
where $\lambda_q$ is the constant in the expansion $\mu_\rho - 1 = -\lambda_q c_q \rho^{-1/q} + O(\rho^{-2/q})$ ($\lambda_1 = 2$ for the first order power kernels and $\lambda_2 = \sqrt2$ for the second order power kernels), and
\begin{align}
G'_\delta(\mu_\rho z^2)\,z^2\,(\mu_T - \mu_\rho)
&= \Big[G'_\delta(z^2) + O\big(\rho^{-1/q}\big)\Big]\,z^2\Big[-g\,\rho\,T^{-q}\,\omega^{-2}\sum_{h=-T+1}^{T-1}|h|^q\,\gamma(h)\,(1+o(1)) + O\big(T^{-1}\big)\Big] \nonumber\\
&= -g\,\rho\,T^{-q}\,\omega^{-2}\sum_{h=-T+1}^{T-1}|h|^q\,\gamma(h)\,G'_\delta(z^2)\,z^2\,(1+o(1)) + O\big(T^{-1}\big), \tag{114}
\end{align}
so that
\begin{align}
F_{T,\rho}(z) &= G_\delta(z^2) + c_q\,\rho^{-1/q}\,\mathcal L_q(G_\delta, z) - g\,d\,\rho\,T^{-q}\,G'_\delta(z^2)\,z^2 \nonumber\\
&\qquad + O\big(\rho^{-2/q}\big) + O\big(T^{-1}\big) + o\big(\rho\,T^{-q}\big), \tag{115}
\end{align}
as desired, where $d = \omega^{-2}\sum_{h=-\infty}^{\infty}|h|^q\,\gamma(h)$.
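The leading bias term in (111) is driven by $1 - k_\rho(h/T) \approx g\rho\,|h/T|^q$. A minimal numerical illustration (assuming the exponentiated Bartlett kernel, for which $q = 1$ and $g = 1$, and an AR(1)-type autocovariance $\gamma(h) = \phi^{|h|}$, both purely illustrative choices) compares the exact weighted sum with the approximation $g\,\rho\,T^{-q}\sum_h |h|^q\gamma(h)$.

```python
phi, rho, T = 0.5, 20, 10_000
H = 200                                    # gamma(h) = phi^|h| is negligible beyond this lag
hs = range(-H, H + 1)
gamma = {h: phi ** abs(h) for h in hs}

# exact bias-type sum: sum_h (1 - k_rho(h/T)) gamma(h), with k_rho(x) = (1 - |x|)^rho
exact = sum((1 - (1 - abs(h) / T) ** rho) * gamma[h] for h in hs)
# leading term from (111): g * rho * T^{-q} * sum_h |h|^q gamma(h), here with g = q = 1
approx = rho / T * sum(abs(h) * gamma[h] for h in hs)

assert approx > 0
assert abs(exact - approx) < 0.05 * approx   # leading term dominates when rho H / T is small
```

For $\rho H/T$ small the two quantities agree to well under one percent, which is the sense in which the $(1+o(1))$ factor in (111) operates.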

Proof of Corollary 6. Part (a) Using Theorem 5 with $\delta = 0$ (so that $G_\delta = D$), we have, as $1/\rho + 1/T + \rho/T^q \to 0$,
\begin{align}
F_{T,\rho,0}(z_{\alpha,\rho}) &= D(z_{\alpha,\rho}^2) + c_q\,\rho^{-1/q}\Big[D''(z_{\alpha,\rho}^2)\,z_{\alpha,\rho}^4 - \lambda_q\,D'(z_{\alpha,\rho}^2)\,z_{\alpha,\rho}^2\Big] \nonumber\\
&\qquad - g\,d\,\rho\,T^{-q}\,D'(z_{\alpha,\rho}^2)\,z_{\alpha,\rho}^2 + O\big(T^{-1} + \rho^{-2/q}\big) + o\big(\rho\,T^{-q}\big) \nonumber\\
&= F_\rho(z_{\alpha,\rho}) - g\,d\,\rho\,T^{-q}\,D'(z_\alpha^2)\,z_\alpha^2 + O\big(T^{-1} + \rho^{-2/q}\big) + o\big(\rho\,T^{-q}\big) \nonumber\\
&= 1 - \alpha - g\,d\,\rho\,T^{-q}\,D'(z_\alpha^2)\,z_\alpha^2 + O\big(T^{-1} + \rho^{-2/q}\big) + o\big(\rho\,T^{-q}\big). \tag{116}
\end{align}
Re-arranging the above equation gives

\[
1 - F_{T,\rho,0}(z_{\alpha,\rho}) = \alpha + g\,d\,\rho\,T^{-q}\,D'(z_\alpha^2)\,z_\alpha^2 + O\big(T^{-1} + \rho^{-2/q}\big) + o\big(\rho\,T^{-q}\big). \tag{117}
\]
Part (b) Plugging $z_{\alpha,\rho}$ into (29) yields

\begin{align}
&P\left\{\left|\frac{\sqrt T\,\big(\hat\beta - \beta_0\big)}{\hat\omega}\right| \ge z_{\alpha,\rho}\right\} \nonumber\\
&\quad= 1 - G_\delta(z_{\alpha,\rho}^2) - c_q\,\rho^{-1/q}\,\mathcal L_q(G_\delta, z_{\alpha,\rho}) + g\,d\,\rho\,T^{-q}\,G'_\delta(z_{\alpha,\rho}^2)\,z_{\alpha,\rho}^2 + O\big(T^{-1} + \rho^{-2/q}\big) + o\big(\rho\,T^{-q}\big) \nonumber\\
&\quad= 1 - G_\delta(z_\alpha^2) - c_q\,\rho^{-1/q}\,z_\alpha^2\,K_\delta(z_\alpha^2) + g\,d\,\rho\,T^{-q}\,G'_\delta(z_\alpha^2)\,z_\alpha^2 + O\big(T^{-1} + \rho^{-2/q}\big) + o\big(\rho\,T^{-q}\big), \tag{118}
\end{align}
where the last equality follows from the same proof as Corollary 3.
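To see what the expansion (117) delivers in practice, the following sketch (purely illustrative: it assumes $q = g = 1$, as for the exponentiated Bartlett kernel, an AR(1)-type autocovariance with $\phi = 0.5$, and a two-sided test at $\alpha = 0.05$) evaluates the predicted type I error $\alpha + g\,d\,\rho\,T^{-q}\,D'(z_\alpha^2)\,z_\alpha^2$ and checks its qualitative behavior in $\rho$ and $T$.

```python
import math

phi, alpha = 0.5, 0.05
z2 = 1.96 ** 2                              # z_alpha^2 for a two-sided 5% test
Dprime = math.exp(-z2 / 2) / math.sqrt(2 * math.pi * z2)   # chi_1^2 density D'(z_alpha^2)

omega2 = (1 + phi) / (1 - phi)              # long run variance sum_h gamma(h), gamma(h) = phi^|h|
d = (2 * phi / (1 - phi) ** 2) / omega2     # d = omega^{-2} sum_h |h| gamma(h)

def type1(rho, T, q=1, g=1):
    """Predicted type I error from (117): alpha + g d rho T^{-q} D'(z^2) z^2."""
    return alpha + g * d * rho * T ** (-q) * Dprime * z2

# the predicted size distortion is positive, grows with rho, and shrinks with T
assert type1(16, 200) > alpha
assert type1(32, 200) > type1(16, 200)
assert type1(16, 400) < type1(16, 200)
```

This positive $\rho T^{-q}$ distortion term is precisely the force that the test-optimal choice of $\rho$ trades off against the $\rho^{-1/q}$ power-related term.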

References

[1] Andrews, D. W. K. (1991): “Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation,” Econometrica, 59, 817–854.

[2] Andrews, D. W. K. and J. C. Monahan (1992): “An Improved Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimator,” Econometrica, 60, 953–966.

[3] de Jong, R. M. and J. Davidson (2000): “Consistency of Kernel Estimators of Heteroskedastic and Autocorrelated Covariance Matrices,” Econometrica, 68, 407–424.

[4] den Haan, W. J. and A. Levin (1997): “A Practitioner's Guide to Robust Covariance Matrix Estimation,” in G. Maddala and C. Rao (eds), Handbook of Statistics: Robust Inference, Volume 15, Elsevier, New York.

[5] Hansen, B. E. (1992): “Consistent Covariance Matrix Estimation for Dependent Heterogeneous Processes,” Econometrica, 60, 967–972.

[6] Hong, Y. and J. Lee (2001): “Wavelet-based Estimation for Heteroskedastic and Autocorrelation Consistent Variance-covariance Matrices,” Working paper, Cornell University.

[7] Jansson, M. (2002): “Consistent Covariance Matrix Estimation for Linear Processes,” Econometric Theory, 18, 1449–1459.

[8] — — — (2004): “On the Error of Rejection Probability in Simple Autocorrelation Robust Tests,” Econometrica, 72, 937–946.

[9] Kiefer, N. M., T. J. Vogelsang and H. Bunzel (2000): “Simple Robust Testing of Regression Hypotheses,” Econometrica, 68, 695–714.

[10] Kiefer, N. M. and T. J. Vogelsang (2002a): “Heteroskedasticity-autocorrelation Robust Testing Using Bandwidth Equal to Sample Size,” Econometric Theory, 18, 1350–1366.

[11] — — — (2002b): “Heteroskedasticity-autocorrelation Robust Standard Errors Using the Bartlett Kernel without Truncation,” Econometrica, 70, 2093–2095.

[12] — — — (2005): “A New Asymptotic Theory for Heteroskedasticity-Autocorrelation Robust Tests,” Econometric Theory, 21, 1130–1164.

[13] Newey, W. K. and K. D. West (1987): “A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix,” Econometrica, 55, 703–708.

[14] — — — (1994): “Automatic Lag Selection in Covariance Estimation,” Review of Economic Studies, 61, 631–654.

[15] Phillips, P. C. B. (1993): “Operational Algebra and Regression t-Tests,” in Peter C. B. Phillips (ed.), Models, Methods, and Applications of Econometrics. Basil Blackwell.

[16] Phillips, P. C. B. (2005): “HAC Estimation by Automated Regression,” Econometric Theory, 21, 116–142.

[17] Phillips, P. C. B., Y. Sun and S. Jin (2006): “Spectral Density Estimation and Robust Hypothesis Testing Using Steep Origin Kernels Without Truncation,” International Economic Review, 47, 837–894.

[18] — — — (2007): “Long-Run Variance Estimation and Robust Regression Testing Using Sharp Origin Kernels with No Truncation,” Journal of Statistical Planning and Inference, 137, 985–1023.

[19] Ray, S. and N. E. Savin (2008): “The Performance of Heteroskedasticity and Autocorrelation Robust Tests: A Monte Carlo Study with an Application to the Three-factor Fama-French Asset-pricing Model,” Journal of Applied Econometrics, 23, 91–109.

[20] Ray, S., N. E. Savin, and A. Tiwari (2009): “Testing the CAPM revisited,” Working paper, Department of Finance, University of Iowa.

[21] Sul, D., P. C. B. Phillips and C-Y. Choi (2005): “Prewhitening Bias in HAC Estimation,” Oxford Bulletin of Economics and Statistics, 67, 517–546.

[22] Sun, Y., P. C. B. Phillips, and S. Jin (2008): “Optimal Bandwidth Selection in Heteroskedasticity-Autocorrelation Robust Testing,” Econometrica, 76, 175–194.

[23] Sun, Y. (2004): “Estimation of the Long-run Average Relationship in Nonstationary Panel Time Series,” Econometric Theory, 20, 1227–1260.

[24] Sun, Y. (2009): “Robust Multivariate Trend Inference in the Presence of Nonparametric Autocorrelation,” Working paper, Department of Economics, UC San Diego.

[25] Velasco, C. and P. M. Robinson (2001): “Edgeworth Expansions for Spectral Density Estimates and Studentized Sample Mean,” Econometric Theory, 17, 497–539.

[26] Vogelsang, T. J. (2003): “Testing in GMM Models Without Truncation,” Chapter 10 of Advances in Econometrics Volume 17, Maximum Likelihood Estimation of Misspecified Models: Twenty Years Later, ed. by T. B. Fomby and R. C. Hill, Elsevier Science, 199–233.
