
Goodness-of-Fit Tests for Parametric Regression Models

Jianqing Fan and Li-Shan Huang

Several new tests are proposed for examining the adequacy of a family of parametric models against large nonparametric alternatives. These tests formally check if the bias vector of residuals from parametric fits is negligible, by using the adaptive Neyman test and other methods. The testing procedures formalize the traditional model diagnostic tools based on residual plots. We examine the rates of contiguous alternatives that can be detected consistently by the adaptive Neyman test. Applications of the procedures to the partially linear models are thoroughly discussed. Our simulation studies show that the new testing procedures are indeed powerful and omnibus. The power of the proposed tests is comparable to the F-test even in the situations where the F-test is known to be suitable, and can be far more powerful than the F-test statistic in other situations. An application to testing linear models versus additive models is also discussed.

KEY WORDS: Adaptive Neyman test; Contiguous alternatives; Partial linear model; Power; Wavelet thresholding.

1. INTRODUCTION

Parametric linear models are frequently used to describe the association between a response variable and its predictors. The question of the adequacy of such parametric fits often arises. Conventional methods rely on residual plots against fitted values or a covariate variable to detect if there are any systematic departures from zero in the residuals. One drawback of the conventional methods is that a systematic departure that is smaller than the noise level cannot be observed easily. Recently many articles have presented investigations of the use of nonparametric techniques for model diagnostics. Most of them focused on one-dimensional problems, including Dette (1999), Dette and Munk (1998), and Kim (2000). The book by Hart (1997) gave an extensive overview and useful references. Chapter 5 of Bowman and Azzalini (1997) outlined the work by Azzalini, Bowman, and Härdle (1989) and Azzalini and Bowman (1993). Although a lot of work has focused on the univariate case, there is relatively little work on the multiple regression setting.

Hart (1997, section 9.3) considered two approaches: one is to regress residuals on a scalar function of the covariates and then apply one-dimensional goodness-of-fit tests; the second approach is to explicitly take into account the multivariate nature. The test by Härdle and Mammen (1993) was based on the L2 error criterion in d dimensions, Stute, González Manteiga, and Presedo Quindimil (1998) investigated a marked empirical process based on the residuals, and Aerts, Claeskens, and Hart (1999) constructed tests based on orthogonal series that involved choosing a nested model sequence in the bivariate regression.

In this article, a completely different approach is proposed and studied. The basic idea is that if a parametric family of models fits adequately, then the residuals should have nearly zero bias. More formally, let (x_1, Y_1), ..., (x_n, Y_n) be independent observations from a population,

    Y = m(x) + ε,  ε ~ N(0, σ²),    (1.1)

where x is a p-dimensional vector and m(·) is a smooth regression surface. Let f(·, θ) be a given parametric family. The null hypothesis is

    H_0: m(·) = f(·, θ_0)  for some θ_0.    (1.2)

Examples of f(·, θ) include the widely used linear model f(x, θ) = xᵀθ and a logistic model f(x, θ) = θ_0/{1 + θ_1 exp(θ_2ᵀ x)}.

The alternative hypothesis is often vague. Depending on the situations of applications, one possible choice is the saturated nonparametric alternative

    H_1: m(·) ≠ f(·, θ)  for all θ,    (1.3)

and another possible choice is the partially linear models (see Green and Silverman 1994 and the references therein)

    H_1: m(x) = f(x_1) + x_2ᵀ β_2,  with f(·) nonlinear,    (1.4)

where x_1 and x_2 are the covariates. Traditional model diagnostic techniques involve plotting residuals against each covariate to examine if there is any systematic departure, against the index sequence to check if there is any serial correlation, and against fitted values to detect possible heteroscedasticity, among others. This amounts to informally checking if biases are negligible in the presence of large stochastic noises. These techniques can be formalized as follows. Let θ̂ be an estimate under the null hypothesis and let ε̂ = (ε̂_1, ..., ε̂_n)ᵀ be the resulting residuals with ε̂_i = Y_i − f(x_i, θ̂). Usually, θ̂ converges to some vector θ_0. Then under model (1.1), conditioned on {x_i, i = 1, ..., n}, ε̂ is nearly independently and normally distributed with mean vector η = (η_1, ..., η_n)ᵀ, where η_i = m(x_i) − f(x_i, θ_0). Namely, any finite-dimensional components of ε̂ are asymptotically independent and normally distributed with mean being a subvector of η. Thus, the problem becomes

    H_0: η = 0  versus  H_1: η ≠ 0,    (1.5)

based on the observations ε̂. Note that the ε̂_i will not be independent in general after estimating θ, but the dependence is practically negligible, as will be demonstrated. Thus, the techniques for independent samples such as the adaptive Neyman test in Fan (1996) continue to apply. Plotting a covariate X_j against the residuals ε̂ aims at testing if the biases in ε̂ along the direction X_j are negligible, namely whether the sequence {(X_j, η_j)} is negligible in the presence of the noise.

In testing the significance of the high-dimensional problem (1.5), the conventional likelihood ratio statistic is not powerful, due to noise accumulation in the n-dimensional space, as demonstrated in Fan (1996). An innovative idea proposed by Neyman (1937) is to test only the first m-dimensional subproblem if there is a prior that most of the nonzero elements lie in the first m dimensions. To obtain such a qualitative prior, a Fourier transform is applied to the residuals in an attempt to compress nonzero signals into lower frequencies. In testing goodness of fit for a distribution function, Fan (1996) proposed a simple data-driven approach to choosing m based on power considerations. This results in an adaptive Neyman test. See Rayner and Best (1989) for more discussions on the Neyman test. Some other recent work motivated by the Neyman test includes Eubank and Hart (1992), Eubank and LaRiccia (1992), Kim (2000), Inglot, Kallenberg, and Ledwina (1994), Ledwina (1994), Kuchibhatla and Hart (1996), and Lee and Hart (1998), among others. Also see Bickel and Ritov (1992) and Inglot and Ledwina (1996) for illuminating insights into nonparametric tests. In this article, we adapt the adaptive Neyman test proposed by Fan (1996) to the testing problem (1.5). It is worth noting that the test statistic studied by Kuchibhatla and Hart (1996) looks similar to that of Fan (1996), but they are decidedly different; see the second paragraph of Section 3 for further discussion. The adaptive Neyman test of Kuchibhatla and Hart (1996) also will be implemented to demonstrate the versatility of our proposed idea. We would note that the adaptive Neyman tests in Fan (1996) and Kuchibhatla and Hart (1996) were proposed for independent samples. Herein, the adaptive Neyman test of Fan (1996) will be justified to remain applicable for weakly dependent ε̂ in the current regression setting; see Theorem 1.

The power of the adaptive Neyman test depends on the smoothness of {η_i, i = 1, ..., n} as a function of i. Let us call this function η(·), namely η(i/n) = η_i. The smoother the function η(·), the more significant are the Fourier coefficients at the low frequencies and hence the more powerful the test will be; see Theorems 2 and 3. When m(·) is completely unknown, such as in (1.3), there is no information on how to make the function η(·) smoother by ordering the residuals. Intuitively, the closer the two consecutive covariate vectors x_i and x_{i+1}, the smaller the difference between m(x_i) − f(x_i, θ_0) and m(x_{i+1}) − f(x_{i+1}, θ_0), and hence the smoother the sequence {m(x_i) − f(x_i, θ_0)} indexed by i. Thus, a good proxy is to order the residuals {ε̂_i} in a way that the corresponding x_i's are close as a sequence. Note that when there is only one predictor variable (i.e., p = 1), such ordering is straightforward. However, it is quite challenging to order a multivariate vector. A method based on principal component analysis is described in Section 2.2.

When there is some information about the alternative hypothesis, such as the partially linear model in (1.4), it is sensible to order the residuals according to the covariate x_1. Indeed, our simulation studies show that this improves a great deal the power of the generic ordering procedure outlined in the last paragraph. The parameters β_2 in model (1.4) can be estimated easily at the n^{-1/2} rate via a difference-based estimator, for example, that of Yatchew (1997). Then the problem is, heuristically, reduced to the one-dimensional nonparametric setting; see Section 2.4 for details.

The wavelet transform is another popular family of orthogonal transforms. It can be applied to the testing problem (1.5). For comparison purposes, wavelet thresholding tests are also included in our simulation studies. See Fan (1996) and Spokoiny (1996) for various discussions on the properties of the wavelet thresholding tests.

This article is organized as follows. In Section 2, we propose a few new tests for the testing problems (1.2) versus (1.3) and (1.2) versus (1.4). These tests include the adaptive Neyman test and the wavelet thresholding tests in Sections 2.1 and 2.3, respectively. Ordering of multivariate vectors is discussed in Section 2.2. Applications of the test statistics in the partially linear models are discussed in Section 2.4, and their extension to additive models is outlined briefly in Section 2.5. Section 2.6 outlines simple methods for estimating the residual variance σ² in the high-dimensional setting. In Section 3, we carry out a number of numerical studies to illustrate the power of our testing procedures. Technical proofs are relegated to the Appendix.

Jianqing Fan is Professor, Department of Statistics, Chinese University of Hong Kong, and Professor, Department of Statistics, University of North Carolina, Chapel Hill, NC 27599-3260 (E-mail: [email protected]). Li-Shan Huang is Assistant Professor, Department of Biostatistics, University of Rochester Medical Center, Rochester, NY 14642 (E-mail: [email protected]). Fan's research was partially supported by NSA 96-1-0015 and NSF grant DMS-98-0-3200. Huang's work was partially supported by NSF grant DMS-97-10084. The authors thank the referee and the associate editor for their constructive and detailed suggestions that led to significant improvement in the presentation of the article.

© 2001 American Statistical Association, Journal of the American Statistical Association, June 2001, Vol. 96, No. 454, Theory and Methods.

2. METHODS AND RESULTS

As mentioned in the Introduction, the hypothesis testing problem (1.2) can be reduced to the high-dimensional problem (1.5) based on the weakly dependent Gaussian random vector ε̂. The adaptive Neyman test and the wavelet thresholding test proposed in Fan (1996) are adapted for the current regression setting. The case of a multiple linear model, f(x, θ) = xᵀθ, is addressed here, and its extension to a general parametric model can be treated similarly.

2.1 The Adaptive Neyman Test

Assume a parametric linear model under the null hypothesis,

    Y = xᵀθ + ε,  ε ~ N(0, σ²),

and assume ε̂ is the resulting residual vector from a least-squares fit. Let ε̂* = (ε̂*_1, ..., ε̂*_n)ᵀ be the discrete Fourier transform of the residual vector ε̂. More precisely, we define

    ε̂*_{2j-1} = (2/n)^{1/2} Σ_{i=1}^n cos(2πij/n) ε̂_i,
    ε̂*_{2j}   = (2/n)^{1/2} Σ_{i=1}^n sin(2πij/n) ε̂_i,   j = 1, ..., [n/2].

When n is odd, an additional term ε̂*_n = n^{-1/2} Σ_{i=1}^n ε̂_i is needed. Note that for linear regression with an intercept, this term is simply zero and hence can be ignored. As mentioned in the Introduction, the purpose of the Fourier transform is to compress useful signals into low frequencies so that the power of the adaptive Neyman test can be enhanced.

Let σ̂_1 be an n^{-1/2}-consistent estimate of σ under both the null and the alternative hypotheses. Methods for constructing such an estimate are outlined in Section 2.6. The adaptive Neyman test statistic is defined as

    T*_{AN,1} = max_{1≤m≤n} (2m σ̂_1⁴)^{-1/2} Σ_{i=1}^m (ε̂*_i² − σ̂_1²),    (2.1)

and the null hypothesis is rejected when T*_{AN,1} is large. See Fan (1996) for motivations and some optimal properties of the test statistic. Note that the factor 2σ⁴ in (2.1) is the variance of ε² with ε ~ N(0, σ²). Thus, the null distribution of the testing procedure depends on the normality assumption. For nonnormal noises, a reasonable method is to replace the factor 2σ̂_1⁴ by a consistent estimate σ̂_2² of Var(ε_i²). This leads to the test statistic

    T*_{AN,2} = max_{1≤m≤n} (m σ̂_2²)^{-1/2} Σ_{i=1}^m (ε̂*_i² − σ̂_1²).    (2.2)

The null distribution of T*_{AN,2} is expected to be more robust against the normality assumption. Following Fan (1996), we normalize the test statistics as

    T_{AN,j} = (2 log log n)^{1/2} T*_{AN,j} − {2 log log n + 0.5 log log log n − 0.5 log(4π)},  for j = 1, 2.    (2.3)

Under the null hypothesis (1.5) and the independent normal model with a known variance, it can be shown that

    P(T_{AN,j} < x) → exp{−exp(−x)}  as n → ∞.    (2.4)

See Darling and Erdős (1956) and Fan (1996). However, the approximation (2.4) is not so good. Let us call the exact null distribution of T_{AN,1} under the independent normal model with a known variance J_n. Based on a million simulations, the distribution J_n was tabulated in Fan and Lin (1998). We have excerpted some of their results; they are presented in Table 1.

We now show that the approximation (2.4) holds for testing linear models where the {ε̂_i} are weakly dependent. For convenience of technical proofs, we modify the range of maximization over m to [1, n/(log log n)⁴]. This hardly affects our theoretical understanding of the test statistics and has little impact on practical implementations of the test statistics.

Theorem 1. Suppose that Conditions A1–A3 in the Appendix hold. Then, under the null hypothesis of (1.2) with f(x, θ) = xᵀθ, we have

    P(T_{AN,1} < x) → exp{−exp(−x)}.

The proof is given in the Appendix. As a consequence of Theorem 1, the critical region

    T_{AN,j} > −log{−log(1 − α)}  (j = 1, 2)    (2.5)

has an asymptotic significance level α.

Assume that θ̂ converges to θ_0 as n → ∞. Then f(·, θ_0) is generally the best parametric approximation to the underlying regression function m(·) among the candidate models {f(·, θ)}. Let η_i = m(x_i) − f(x_i, θ_0) and let η* be the Fourier transform of the vector η = (η_1, ..., η_n)ᵀ. Then we have the following power result.

Theorem 2. Under the assumptions in Theorem 1 and Condition A4 in the Appendix, if

    (log log n)^{-1/2} max_{1≤m≤n} m^{-1/2} Σ_{i=1}^m η*_i² → ∞,    (2.6)

then the critical regions (2.5) have an asymptotic power 1.

We now give an implication of condition (2.6). Note that by Parseval's identity,

    m^{-1/2} Σ_{i=1}^m η*_i² = m^{-1/2} ( Σ_{i=1}^n η_i² − Σ_{i=m+1}^n η*_i² ).    (2.7)

Suppose that the sequence {η_i} is smooth so that

    n^{-1} Σ_{i=m+1}^n η*_i² = O(m^{-2s})    (2.8)

for some s > 0. Then the maximum value of (2.7) over m is of order

    n ( n^{-1} Σ_{i=1}^n η_i² )^{(4s+1)/(4s)}.

For random designs with a density h,

    Σ_{i=1}^n η_i² = Σ_{i=1}^n {m(x_i) − f(x_i, θ_0)}² ≈ n ∫ {m(x) − f(x, θ_0)}² h(x) dx.

By Theorem 2, we have the following result.

Theorem 3. Under the conditions of Theorem 2, if the smoothness condition (2.8) holds, then the adaptive Neyman test has an asymptotic power 1 for the contiguous alternative with

    ∫ {m(x) − f(x, θ_0)}² h(x) dx = n^{-4s/(4s+1)} (log log n)^{2s/(4s+1)} c_n

for some sequence c_n with lim inf c_n = ∞.

The significance of the preceding theoretical result is its adaptivity. The adaptive Neyman test does not depend at all on the smoothness assumption (2.8). Nevertheless, it can adaptively detect alternatives with rates O(n^{-2s/(4s+1)} (log log n)^{s/(4s+1)}), which is the optimal rate of adaptive testing (see Spokoiny 1996). In this sense, the adaptive Neyman test is truly adaptive.
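As a concrete illustration, the statistics (2.1) and (2.3) take only a few lines to compute. The following is a minimal numpy sketch under our notation, not the code used for the reported simulations: the function name is ours, the maximization runs over the full range 1, ..., n rather than [1, n/(log log n)⁴], and a variance estimate is passed in rather than computed internally.

```python
import numpy as np

def adaptive_neyman(resid, sigma):
    """Sketch of the adaptive Neyman statistic: T*_AN,1 of (2.1), returned
    after the normalization (2.3).  `resid` are the ordered residuals and
    `sigma` an estimate of the noise standard deviation (Section 2.6)."""
    n = len(resid)
    i = np.arange(1, n + 1)
    # Discrete Fourier transform of the residuals (cosine/sine pairs),
    # as defined in Section 2.1; low frequencies come first.
    coefs = []
    for j in range(1, n // 2 + 1):
        arg = 2 * np.pi * i * j / n
        coefs.append(np.sqrt(2.0 / n) * np.sum(np.cos(arg) * resid))
        coefs.append(np.sqrt(2.0 / n) * np.sum(np.sin(arg) * resid))
    eps_star = np.array(coefs)
    m = np.arange(1, len(eps_star) + 1)
    # T*_AN,1: maximum of the normalized partial sums of eps*_i^2 - sigma^2.
    t_star = np.max(np.cumsum(eps_star ** 2 - sigma ** 2)
                    / np.sqrt(2.0 * m * sigma ** 4))
    # Normalization (2.3); reject for large values, cf. (2.4) and Table 1.
    lln = np.log(np.log(n))
    return np.sqrt(2 * lln) * t_star - (2 * lln + 0.5 * np.log(lln)
                                        - 0.5 * np.log(4 * np.pi))
```

A smooth bias in the residuals loads the low-frequency coefficients and inflates the early partial sums, which is exactly what the maximum over m picks up.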

Table 1. Upper Quantile of the Distribution J_n

    α \ n    10    20    30    40    60    80   100   120   140   160   180   200
    0.01   5.78  6.07  6.18  6.22  6.32  6.37  6.41  6.41  6.42  6.42  6.47  6.43
    0.025  4.57  4.75  4.82  4.87  4.91  4.93  4.95  4.95  4.95  4.95  4.95  4.95
    0.05   3.67  3.77  3.83  3.85  3.88  3.89  3.90  3.90  3.90  3.91  3.90  3.89
    0.10   2.74  2.78  2.81  2.82  2.85  2.85  2.86  2.87  2.86  2.87  2.87  2.86

2.2 Ordering Multivariate Vectors

The adaptive Neyman test statistics depend on the order of the residuals. Thus, we need to order the residuals first before using the adaptive Neyman test. By Theorem 2, the power of the adaptive Neyman tests depends on the smoothness of the sequence {m(x_i) − f(x_i, θ_0)} indexed by i. A good ordering should make the sequence {m(x_i) − f(x_i, θ_0)} as smooth as possible for the given function m, so that its large Fourier coefficients are concentrated on the low frequencies.

When the alternative is the partial linear model (1.4), we can order the residuals according to the covariate X_1 so that the sequence {m(x_i) − f(x_i, θ_0)} is smooth for given m. For the saturated nonparametric alternative (1.3), there is little useful information on how to order the sequence. As discussed in the Introduction, a good strategy is to order the covariates {x_i} so that they are close to each other consecutively. This easily can be done for the univariate case (p = 1). However, for the case p > 1, there are limited discussions on ordering multivariate observations. Barnett (1976) gave many useful ideas and suggestions. One possible approach is to first assign a score s_i to the ith observation and then order the observations according to the rank of s_i. Barnett (1976) called this "reduced ordering."

What could be a reasonable score {s_i}? We may consider this problem from the viewpoint of principal component (PC) analysis (see, e.g., Jolliffe 1986). Let S be the sample covariance matrix of the covariate vectors {x_i, i = 1, ..., n}. Denote by λ_1, ..., λ_p the ordered eigenvalues of S with corresponding eigenvectors γ_1, ..., γ_p. Then z_{i,k} = γ_kᵀ x_i is the score for the ith observation on the kth sample PC, and λ_k can be interpreted as the sample variance of {z_{1,k}, ..., z_{n,k}}. Note that x_i − x̄ = z_{i,1} γ_1 + ··· + z_{i,p} γ_p, where x̄ is the sample average. Thus a measure of variation of x_i may be formed by taking

    s_i = Σ_{k=1}^p z_{i,k}² / λ_k.

We call s_i the sample score of variation. Also see Gnanadesikan and Kettenring (1972) for an interpretation of s_i as a measure of the degree of influence of the ith observation on the orientation and scale of the PCs. A different ordering scheme was given in Fan (1997).

Ordering according to a certain covariate is another simple and viable method. It focuses particularly on testing whether the departure from linearity in a particular covariate can be explained by chance. It formalizes the traditional scatterplot techniques for model diagnostics. The approach can be powerful when the alternative is the additive models or partially linear models. See Section 2.5 for further discussions.

In the case of two covariates, Aerts et al. (1999) constructed a test statistic, the power of which depends also on the ordering of a sequence of models. It seems that ordering is needed in the multiple regression setting, whether one chooses an ordering of residuals or a model sequence.

2.3 Hard-Thresholding Test

The Fourier transform is used in the adaptive Neyman test. We may naturally ask how other families of orthogonal transforms behave. For comparison purposes, we apply the wavelet hard-thresholding test of Fan (1996), which is based on the "extremal phase" family of wavelets. See Daubechies (1992) for details on the construction of the discrete wavelet transform with various wavelets. The Splus WaveThresh package includes many useful routines for wavelets.

Assume that the residuals are ordered. We first standardize the residuals as ε̂_{S,i} = ε̂_i/σ̂_3, where σ̂_3 is some estimate of σ. Let {ε̂_{W,i}} be the empirical wavelet transform of the standardized residuals, arranged in such a way that the coefficients at low resolution levels correspond to small indexes. The thresholding test statistic [equation (17) in Fan 1996] is defined as

    T_H = Σ_{i=1}^{J_0} ε̂²_{W,i} + Σ_{i=J_0+1}^n ε̂²_{W,i} I(|ε̂_{W,i}| > δ)

for some given J_0 and δ = {2 log(n a_n)}^{1/2} with a_n = log^{-2}(n − J_0). As in Fan (1996), J_0 = 3 is used, which keeps the wavelet coefficients in the first two resolution levels intact. Then it can be shown analogously that T_H has an asymptotic normal distribution under the null hypothesis in Theorem 1, and the critical region is given by

    σ_H^{-1}(T_H − μ_H) > Φ^{-1}(1 − α),

where μ_H = J_0 + (2/π)^{1/2} a_n^{-1} δ(1 + δ^{-2}) and σ_H² = 2J_0 + (2/π)^{1/2} a_n^{-1} δ³(1 + 3δ^{-2}). The power of the wavelet thresholding test can be improved further by using the following technique owing to Fan (1996). Under the null hypothesis (1.5), the maximum of n independent Gaussian noises is of order (2 log n)^{1/2}. Thus, replacing a_n in the thresholding parameter δ by min{(max_i ε̂_{W,i})^{-4}, log^{-2}(n − J_0)} does not alter the asymptotic null distribution. Under the alternative hypothesis, the latter quantity is smaller than a_n, and hence the procedure uses a smaller thresholding parameter automatically, leading to better power of the wavelet thresholding test.
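The reduced ordering of Section 2.2 based on the sample score of variation can be sketched as follows. This is a numpy illustration, not the code used in our simulations; it centers the covariates when computing the PC scores z_{i,k} and assumes the sample covariance matrix S is nonsingular.

```python
import numpy as np

def variation_scores(X):
    """Sample score of variation s_i = sum_k z_{i,k}^2 / lambda_k (Section 2.2).
    X is the (n, p) matrix of covariate vectors."""
    Xc = X - X.mean(axis=0)              # center, so x_i - xbar = sum_k z_{i,k} gamma_k
    S = np.cov(X, rowvar=False)          # sample covariance matrix of the x_i
    lam, gamma = np.linalg.eigh(S)       # eigenvalues (ascending) and eigenvectors
    Z = Xc @ gamma                       # z_{i,k}: scores on the sample PCs
    return np.sum(Z ** 2 / lam, axis=1)  # assumes all lambda_k > 0

def order_by_scores(X, resid):
    """Reduced ordering: rank the residuals by s_i before the Fourier transform."""
    idx = np.argsort(variation_scores(X))
    return resid[idx]
```

The ordered residuals can then be fed directly to the adaptive Neyman or wavelet thresholding statistics.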

2.4 Linear Versus Partially Linear Models

Suppose that we are interested in testing the linear model against the partial linear model (1.4). The generic methods outlined in Sections 2.1–2.3 continue to be applicable. In this setting, a more sensible ordering scheme is to use the order of X_1 instead of the score s_i given in Section 2.2. Our simulation studies show that this improves a great deal on the power of the generic methods. Nevertheless, the power can be improved further when the partially linear model structure is fully exploited.

The basic idea is to estimate the coefficient vector β_2 first and then to compute the partial residuals z_i = Y_i − x_{i,2}ᵀ β̂_2. Based on the data {(x_{i,1}, z_i)}, the problem reduces to testing the one-dimensional linear model, but with slightly dependent data. Theoretically, this dependence structure is asymptotically negligible, following the same proof as that of Theorem 1, as long as β̂_2 is estimated at the root-n rate. We can therefore apply the adaptive Neyman and other tests to the data {(x_{i,1}, z_i)}.

There is a large literature on efficient estimation of the coefficients β_2. See, for example, Wahba (1984), Speckman (1988), Cuzick (1992), and Green and Silverman (1994). Most of the methods proposed in the literature involve choices of smoothing parameters. For our purpose, a root-n consistent estimator of β_2 suffices. This can be much more easily constructed than the semiparametric efficient estimator. The basic idea is as follows.

First the data are arranged according to the corresponding order of {X_1}, yielding the sorted data {(x_{(i)}, Y_{(i)}), i = 1, ..., n}. Note that for a random sample, x_{(i+1),1} − x_{(i),1} is O_P(1/n). Thus, for the differentiable function f in (1.4), we have

    Y_{(i+1)} − Y_{(i)} = (x_{(i+1),2} − x_{(i),2})ᵀ β_2 + e_i + O_P(n^{-1}),  i = 1, ..., n − 1,    (2.9)

where the {e_i} are correlated stochastic errors with e_i = ε_{(i+1)} − ε_{(i)}. Thus, β_2 can be estimated using the ordinary least-squares estimator from the approximate linear model (2.9). This idea was proposed by Yatchew (1997) (and was independently proposed by us around the same time with the following improvements). Yatchew (1997) showed that such an estimate β̂_2 is root-n consistent. To correct the bias in the preceding approximation at finite samples, particularly for those x_{(i),1} at the tails (where the spacing can be wide), we fit the model

    Y_{(i+1)} − Y_{(i)} = (x_{(i+1),1} − x_{(i),1}) β_1 + (x_{(i+1),2} − x_{(i),2})ᵀ β_2 + e_i

in our implementation by using the ordinary least-squares method. This improves the performance of the estimator based on the model (2.9).

2.5 Linear Models Versus Additive Models

As will be demonstrated in Section 3.3, when the data are properly ordered, the proposed tests are quite powerful. To test linear versus partially linear models, we show in Section 3.3 that the adaptive Neyman tests based on residuals ordered according to the variable X_1 for model (1.4) are nearly as powerful as the tests in Section 2.4, where the structure of the partially linear model is taken into account. This encouraged us to extend the idea to testing a linear model against the additive model

    Y = f_1(X_1) + f_2(X_2) + ··· + f_p(X_p) + ε.

Let T_j be the normalized test statistic (2.3) when the residuals are ordered according to variable X_j. Let T̂ = max_{1≤j≤p} T_j. Reject the null hypothesis when T̂ > J_n(α/p), with J_n(α/p) the α/p upper quantile of the distribution J_n in Table 1. This is simply the Bonferroni adjustment applied to the combined test statistics.

2.6 Estimation of Residual Variance

The implementations of the adaptive Neyman test and other tests depend on a good estimate of the residual variance σ². This estimator should be good under both the null and alternative hypotheses, because an overestimate (caused by biases) of σ² under the alternative hypothesis will significantly deteriorate the power of the tests. A root-n consistent estimator can be constructed by using the residuals of a nonparametric kernel regression fit. However, this theoretically satisfactory method encounters the "curse of dimensionality" in practical implementations.

In the case of a single predictor (p = 1), there are many possible estimators for σ². A simple and useful technique is given in Hall, Kay, and Titterington (1990). An alternative estimator can be constructed based on the Fourier transform, which constitutes the raw material of the adaptive Neyman test. If the signals {η_i} are smooth as a function of i, then the high-frequency components are basically noise even under the alternative hypothesis; namely, the discrete Fourier transform ε̂*_j for large j has a mean approximately equal to 0 and variance σ². This gives us a simple and effective way to estimate σ², by using the sample variance of {ε̂*_i, i = I_n + 1, ..., n}:

    σ̂_1² = (n − I_n)^{-1} Σ_{i=I_n+1}^n { ε̂*_i − (n − I_n)^{-1} Σ_{i=I_n+1}^n ε̂*_i }²,    (2.10)

for some given I_n (= [n/4], say). Under some mild conditions on the smoothness of {η_i}, this estimator can be shown to be root-n consistent even under the alternative hypothesis. Hart (1997) suggested taking I_n = m̂ + 1, where m̂ is the number of Fourier coefficients chosen based on some data-driven criteria.

Similarly, an estimate σ̂_2² of Var(ε_i²) can be formed by taking the sample variance of {ε̂*_i², i = I_n + 1, ..., n}:

    σ̂_2² = (n − I_n)^{-1} Σ_{i=I_n+1}^n ε̂*_i⁴ − { (n − I_n)^{-1} Σ_{i=I_n+1}^n ε̂*_i² }².    (2.11)

In the case of multiple regression (p > 1), the foregoing method is still applicable. However, as indicated in Section 2.2, it is hard to order the residuals in a way that the resulting biases {η_i} are smooth, so that E(ε̂*_j) ≈ 0 for large j. Thus, the estimator (2.10) can possess substantial bias. Indeed, it is difficult to find a practically satisfactory estimate for large p. This problem is beyond the scope of this article.
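Two ingredients of the preceding sections, the difference-based estimator of β_2 from (2.9) with its spacing correction, and the Fourier-based variance estimate (2.10), can be sketched as follows. This is a numpy illustration for a single ordering covariate; the function names are ours and details such as the data-driven choice of I_n are simplified.

```python
import numpy as np

def beta2_diff(x1, X2, y):
    """Difference-based estimator of beta_2 in model (1.4) (Section 2.4),
    including the first-order spacing-correction term in x1."""
    idx = np.argsort(x1)                          # sort by the nonparametric covariate
    dy = np.diff(y[idx])                          # Y_(i+1) - Y_(i)
    D = np.column_stack([np.diff(x1[idx]),        # x_(i+1),1 - x_(i),1 (bias correction)
                         np.diff(X2[idx], axis=0)])  # x_(i+1),2 - x_(i),2
    coef, *_ = np.linalg.lstsq(D, dy, rcond=None)
    return coef[1:]                               # the beta_2 part; coef[0] absorbs f'

def sigma1_sq(resid, I_n=None):
    """Fourier-based variance estimate (2.10): sample variance of the
    high-frequency Fourier coefficients eps*_i, i = I_n + 1, ..., n."""
    n = len(resid)
    I_n = n // 4 if I_n is None else I_n
    i = np.arange(1, n + 1)
    coefs = []
    for j in range(1, n // 2 + 1):
        arg = 2 * np.pi * i * j / n
        coefs.append(np.sqrt(2.0 / n) * np.sum(np.cos(arg) * resid))
        coefs.append(np.sqrt(2.0 / n) * np.sum(np.sin(arg) * resid))
    tail = np.array(coefs)[I_n:]                  # drop the low frequencies
    return np.mean((tail - tail.mean()) ** 2)
```

The partial residuals z_i = Y_i − x_{i,2}ᵀ β̂_2 can then be tested against the one-dimensional linear model as in Section 2.4.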

covariates, X11 X21 X3, and X4, respectively. An estimate ‘ ` 3.1 for can be obtained by Ž tting linear splines with four knots anOd In this section, we study the power of the adaptive Neyman the terms between continuous predictors, test and other tests for univariate regression problems:

4 3 4 H0 2 m4x5  ‚x versus H1 2 m4x5  ‚x0 ˆ0 ˆiXi ˆi1 j 4Xi ti1 j 5 D C 6D C C C i 1 C i 1 j 1 ƒ D D D The sample size is 64 and the residuals ˜ are ordered by X XX Oi ƒ1X1X2 ƒ2X1X3 ƒ3X2X31 their corresponding covariate. The power of each test is eval- C C C uated at a sequence of alternatives given in Examples 1–3. where t denotes the 420j5th , j 11 21 31 4, of the i1 j The empirical critical values for TAN1 1 and TAN1 2 are 4.62 and ith covariate, i 11 21 3 (continuous covariateD s only). In this D 4.74, respectively, and 3.41 for the KH test when ‘ is esti- model, there are 20 parameters and the residual variance is mated. For comparison purposes, we also include the paramet- estimated by ric F test for the linear model against the quadratic regression 2 ‚0 ‚1x ‚2x . ‘ 2 residual sum of squared errors=4n 2050 C C O` D ƒ Example 1. The covariate X1 is sampled from uniform 3. SIMULATIONS 4 21 25, and the response variable is drawn from ƒ We now study the power of the adaptive Neyman tests 2 Y 1 ˆX1 ˜1 ˜ N 401 151 (3.1) (TAN1 1 and TAN1 2) and the wavelet thresholding test TH via D C C simulations. The simulated examples consist of testing the for each given value of ˆ. The power function for each test goodness of Ž t of the linear models in the situations of one is evaluated under the alternative model (3.1) with a given predictor, partially linear models, and multiple regression. The ˆ. This is a quadratic regression model where the F test is results are based on 400 simulations and the signiŽ cance level derived. Nevertheless, the adaptive Neyman tests and the KH is taken to be 5%. Thus, the power under the null hypoth- test perform close to the F test in this ideal setting, whereas esis should be around 5%, with a Monte Carlo error of the wavelet thresholding test falls behind the other three tests; p0005 0095=400 1% or so. For each testing procedure, we ü see Figure 1, (a) and (b). 
examine how well it performs when the noise levels are estimated and when the noise levels are known. The latter mimics the situations where the variance can be well estimated with little bias. An example of this is when repeated measurements are available at some design points. Another reason for using the known variance is that we intend to separate the performance of the adaptive Neyman test from the performance of the variance estimation. For the wavelet thresholding test, the asymptotic critical value 1.645 is used. The simulated critical values in table 1 of Fan and Lin (1998) are used for the adaptive Neyman test if σ is known; empirical critical values are taken when σ̂₁ and/or σ̂₂ are estimated. For simplicity of presentation, we report only the results of the wavelet thresholding test when the residuals are given.

To demonstrate the versatility of our proposed testing scheme, we also include the test proposed by Kuchibhatla and Hart (1996) (abbreviated as the KH test),

    S_n = \max_{1 \le m \le n-p} \frac{1}{m} \sum_{j=1}^{m} \frac{2n \hat{\varepsilon}_j^{*2}}{\hat{\sigma}^2},

where p is the number of parameters fitted in the null model. Compared to the adaptive Neyman test, this test tends to select a smaller dimension m; namely, the maximization in S_n is achieved at a smaller m than that of the adaptive Neyman test. Therefore, the KH test will be somewhat more powerful than the adaptive Neyman test when the alternative is very smooth and will be less powerful than the adaptive Neyman test when the alternative is not as smooth. Again, the results of S_n are given in cases when the variance is known and when the variance is estimated. The same variance estimator as in the adaptive Neyman tests is applied.

In fact, the wavelet tests are not specifically designed for testing this kind of very smooth alternatives. This example is also designed to show how large a price the adaptive Neyman and the KH tests have to pay to be more omnibus. Surprisingly, Figure 1 demonstrates that both procedures paid very little price.

Example 2. Let X₁ be N(0, 1) and

    Y = 1 + \cos(\theta X_1) + \varepsilon, \qquad \varepsilon \sim N(0, 1). \tag{3.2}

This example examines how powerful each testing procedure is for detecting alternatives with different frequency components. The result is presented in Figure 1, (c) and (d). Clearly, the adaptive Neyman and KH tests outperform the F test when σ is given, and they lose some power when σ is estimated. The loss is due to the excessive biases in the estimation of σ when θ is large. This problem can be resolved by setting a larger value of Iₙ in the high-frequency cases. The wavelet thresholding test performs nicely too. As anticipated, the adaptive Neyman test is more powerful than the KH test for detecting a high-frequency component.

Example 3. We now evaluate the power of each testing procedure at a sequence of models,

    Y = \frac{10}{100 + \theta \exp(-2X_1)} + \varepsilon, \qquad \varepsilon \sim N(0, 1), \tag{3.3}

where X₁ ~ N(0, 1). Figure 1, (e) and (f), depicts the results. The adaptive Neyman and KH tests far outperform the F test, which is clearly not omnibus. The wavelet thresholding test is also better than the F test, whereas it is dominated by the adaptive Neyman and KH tests.

646 Journal of the American Statistical Association, June 2001
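A minimal sketch of the KH statistic S_n described above, assuming the ε̂*_j are real Fourier coefficients of the residuals scaled so that each term 2n ε̂*_j²/σ̂² has mean roughly 1 under the null; the exact normalization is an assumption, not the paper's implementation.

```python
import numpy as np

def kh_statistic(residuals, p, sigma2):
    """Sketch of S_n = max_{1<=m<=n-p} (1/m) * sum_{j=1}^m 2n*eps_j^2 / sigma2,
    where eps_j are Fourier coefficients of the residuals (an assumed scaling)."""
    n = len(residuals)
    # Real and imaginary DFT coefficients, scaled by 1/n so that
    # 2n * coef**2 / sigma2 has mean about 1 under the null.
    fft = np.fft.rfft(residuals) / n
    coefs = np.concatenate([fft.real[1:], fft.imag[1:]])[: n - p]
    cum = np.cumsum(2.0 * n * coefs**2 / sigma2)
    m = np.arange(1, len(cum) + 1)
    return np.max(cum / m)

rng = np.random.default_rng(0)
s = kh_statistic(rng.standard_normal(64), p=2, sigma2=1.0)
```

Under the null, the running averages hover near 1, so large values of S_n signal energy concentrated in the leading frequencies.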

Figure 1. Power of the Adaptive Neyman Tests, the KH Test, the Wavelet Thresholding Test, and the Parametric F Test for Examples 1–3 With n = 64. Key: A, the adaptive Neyman test with known variance; B, the adaptive Neyman test with estimated variance (2.10); F, the parametric F-test statistic; H, the wavelet thresholding test with known variance; R, the robust version of the adaptive Neyman test with estimated variance (2.10) and (2.11); S, the KH test with known variance; T, the KH test with estimated variance (2.10).

Fan and Huang: Regression Goodness-of-Fit 647

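Power curves like those in Figure 1 can be estimated by straightforward Monte Carlo: simulate the null to calibrate a critical value, then simulate the alternative and record the rejection rate. The sketch below does this for the parametric F test at model (3.2); the linear null model and the simulation sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def f_stat(x, y):
    """F statistic comparing a linear fit with a quadratic fit."""
    X1 = np.column_stack([np.ones_like(x), x])
    X2 = np.column_stack([np.ones_like(x), x, x**2])
    rss = lambda X: np.sum((y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2)
    r1, r2 = rss(X1), rss(X2)
    return (r1 - r2) / (r2 / (len(y) - 3))

def mc_power(theta, n=64, nsim=200, alpha=0.05):
    """Monte Carlo power of the F test against model (3.2)."""
    null_stats, alt_stats = [], []
    for _ in range(nsim):
        x = rng.standard_normal(n)
        eps = rng.standard_normal(n)
        null_stats.append(f_stat(x, 1.0 + x + eps))                  # a linear null
        alt_stats.append(f_stat(x, 1.0 + np.cos(theta * x) + eps))   # model (3.2)
    crit = np.quantile(null_stats, 1.0 - alpha)   # simulated 5% critical value
    return np.mean(np.array(alt_stats) > crit)

pw = mc_power(theta=1.0)
```

Sweeping `theta` over a grid and plotting `mc_power(theta)` reproduces the shape of a power curve such as those in Figure 1.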

3.2 Testing Linearity in Partially Linear Models

In this section, we test the linear model versus the partially linear model

    m(X) = \theta_0 + \theta_1 X_1 + f(X_2) + \theta_3 X_3 + \theta_4 X_4,

where X₁, …, X₄ are covariates. The covariates X₁, X₂, and X₃ are normally distributed with mean 0 and variance 1. Furthermore, the correlation coefficients among these three random variables are .5. The covariate X₄ is binary, independent of X₁, X₂, and X₃, with

    P(X_4 = 1) = .4 \quad \text{and} \quad P(X_4 = 0) = .6.

The techniques proposed in Section 2.4 are employed here for the adaptive Neyman and KH tests, and the same simulated critical values for n = 64 as in the case of the simple linear model are used. For comparison, we include the parametric F test for testing the linear model against the quadratic model

    \beta_0 + \sum_{i=1}^{4} \beta_i X_i + \sum_{1 \le i \le j \le 3} \beta_{ij} X_i X_j.

We evaluate the power of each testing procedure at two particular partially linear models as follows.

Example 4. The dependent variable is generated from the quadratic regression model

    Y = X_1 + \theta X_2^2 + 2X_4 + \varepsilon, \qquad \varepsilon \sim N(0, 1), \tag{3.4}

for each given θ. Although the true function involves a quadratic of X₂, the adaptive Neyman and KH tests do slightly better than the F test, as shown in Figure 2, (a) and (b). This is due partially to the fact that an overparameterized full quadratic model is used in the F test. Again, the wavelet thresholding test does not perform as well for this kind of smooth alternatives. Note that this model is very similar to that in Example 1, and the power of the adaptive Neyman and KH tests behaves analogously to that in Example 1. This verifies our claim that the parametric components are effectively estimated by using our new estimator for β₂ in Section 2.4.

Example 5. We simulate the response variable from

    Y = X_1 + \cos(\theta X_2) + 2X_4 + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2). \tag{3.5}

The results are depicted in Figure 2, (c) and (d). The adaptive Neyman, wavelet thresholding, and KH tests perform well when σ is known. The F test gives low power for θ ≥ 1.5, which indicates that it is not omnibus. Similar remarks to those given at the end of Example 4 apply. Note that when θ is large, the residual variance cannot be estimated well by any saturated nonparametric method. This is why the power of each test is reduced so dramatically when σ is estimated by using our nonparametric estimate of σ. This also demonstrates that to have a powerful test, σ should be estimated well under both the null and the alternative hypotheses.

3.3 Testing for a Multiple Linear Model

Two models identical to those in Examples 4 and 5 are used to investigate the empirical power for testing linearity when there is no knowledge about the alternative; namely, the alternative hypothesis is given by (1.3). The sample size 128 is used. If we know that the nonlinearity is likely to occur in the X₂ direction, we would order the residuals according to X₂ instead of using the generic method in Section 2.2. Thus,
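The data-generating designs of Examples 4 and 5 can be sketched directly from their descriptions; `sigma=1.0` in Example 5 is an illustrative assumption, since the text leaves σ generic.

```python
import numpy as np

rng = np.random.default_rng(3)

def covariates(n, rng):
    """X1, X2, X3 ~ N(0,1) with pairwise correlation .5; X4 ~ Bernoulli(.4)."""
    cov = np.full((3, 3), 0.5)
    np.fill_diagonal(cov, 1.0)
    x123 = rng.multivariate_normal(np.zeros(3), cov, size=n)
    x4 = (rng.random(n) < 0.4).astype(float)
    return x123[:, 0], x123[:, 1], x123[:, 2], x4

def example4(n, theta, rng):
    """Model (3.4): Y = X1 + theta*X2^2 + 2*X4 + eps, eps ~ N(0,1)."""
    x1, x2, x3, x4 = covariates(n, rng)
    return x1 + theta * x2**2 + 2.0 * x4 + rng.standard_normal(n)

def example5(n, theta, rng, sigma=1.0):
    """Model (3.5): Y = X1 + cos(theta*X2) + 2*X4 + eps, eps ~ N(0, sigma^2)."""
    x1, x2, x3, x4 = covariates(n, rng)
    return x1 + np.cos(theta * x2) + 2.0 * x4 + sigma * rng.standard_normal(n)

y4 = example4(64, theta=1.0, rng=rng)
y5 = example5(64, theta=1.5, rng=rng)
```

Feeding these samples to any of the tests compared in Section 3 reproduces the simulation setting of Figure 2.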

Figure 2. Power of the Adaptive Neyman Tests, the KH Test, the Wavelet Thresholding Test, and the Parametric F Test for Examples 4 and 5 With Partially Linear Models as Alternative and n = 64. Key: A, the adaptive Neyman test with known variance; B, the adaptive Neyman test with estimated variance (2.10); F, the parametric F-test statistic; H, the wavelet thresholding test with known variance; R, the robust version of the adaptive Neyman test with estimated variance (2.10) and (2.11); S, the KH test with known variance; T, the KH test with estimated variance (2.10).

for the adaptive Neyman test, we study the following four versions, depending on our knowledge of the model:

1. ordering according to X₂ and using the known variance (c = 3.017, where c is the 5% empirical critical value)
2. ordering by X₂ and using σ̂ℓ (c = 3.061)
3. ordering according to sᵢ in Section 2.2 and using the known variance (c = 3.006)
4. ordering by sᵢ and using σ̂ℓ (c = 3.054)

To demonstrate the versatility of our proposal, we also applied the KH test in the following ways:

1. ordering according to X₂ and using the known variance (c = 2.33)
2. ordering by X₂ and using σ̂ℓ (c = 2.34)
3. ordering according to the first principal component and using the known variance (c = 2.54)
4. ordering by the first principal component and using σ̂ℓ (c = 2.38)

Hart (1997, section 9.3) proposed performing a one-dimensional test a couple of times along the first couple of principal components and adjusting the significance level accordingly. For simplicity, we use only one principal direction in our implementation [see versions (3) and (4) of the KH test]. The results of the wavelet thresholding test with the ideal ordering (by X₂) and known variance are reported. The results of the conventional F test against the quadratic models are also included. The simulation results are reported in Figure 3.

First of all, we note that ordering according to X₂ is more powerful than ordering by the generic method or by the first principal direction. When ordering by X₂, the nonparametric procedures with estimated variance perform closely to the corresponding procedures with a known variance, except for the KH test at the alternative models (3.5) with high frequency. This in turn suggests that our generic variance estimator performs reasonably well. The F test in Example 4 performs comparably to the adaptive Neyman test with ideal ordering, whereas for Example 5, the F test fails to detect high-frequency components in the model. The results are encouraging. Correctly ordering the covariate can produce a test that is nearly as powerful as knowing the alternative model.

4. SUMMARY AND CONCLUSION

The adaptive Neyman test is a powerful omnibus test. It is powerful against a wide class of alternatives. Indeed, as shown in Theorem 3, the adaptive Neyman test adapts automatically to a large class of functions with unknown degrees of smoothness. However, its power in the multiple regression setting depends on how the residuals are ordered. When the residuals are properly ordered, it can be very powerful, as demonstrated in Section 3.3. This observation can be very useful for testing the linear model against the additive model. We just need to order the data according to the most sensible covariate. Estimation of the residual variance also plays a critical role for the adaptive Neyman test. With a proper estimate of the residual variance, the adaptive Neyman test can be nearly as powerful as in the case where the variance is known.

APPENDIX

In this Appendix, we establish the asymptotic distribution given in Theorem 1 and the asymptotic power expression given in Theorem 2. We first introduce some necessary notation for the linear model. We use the notation β instead of θ to denote the unknown parameters under the null hypothesis. Under the null hypothesis, we assume that

    Y_i = x_i^T \beta + \varepsilon_i, \qquad \varepsilon \sim N(0, \sigma^2),

where β is a p-dimensional unknown vector. Let X = (x_{ij}) be the design matrix of the linear model. Let x₁*, …, x_p* be, respectively, the discrete Fourier transforms of the first, …, the pth column of the design matrix X. We impose the following technical conditions.

Conditions

A1. There exists a positive definite matrix A such that n^{-1} X^T X \to A.
A2. There exists an integer n₀ such that

    \{n/(\log\log n)^4\}^{-1} \sum_{i=n_0}^{n/(\log\log n)^4} x_{ij}^{*4} = O(1), \qquad j = 1, \ldots, p,

where x_{ij}^* is the jth element of the vector x_i^*.
A3. \hat{\sigma}_1^2 = \sigma^2 + O_P(n^{-1/2}) and \hat{\sigma}_2^4 = 2\sigma^4 + O_P\{(\log n)^{-1}\}.
A4. n^{-1} \sum_{i=1}^{n} m(x_i) x_i \to b for some vector b.

Condition A1 is a standard condition for the least-squares estimator β̂ to be root-n consistent. It holds almost surely for designs that are generated from a random sample of a population with a finite second moment. By Parseval's identity,

    \frac{1}{n} \sum_{i=1}^{n} x_{ij}^{*2} = \frac{1}{n} \sum_{i=1}^{n} x_{ij}^{2} = O(1).

Thus, Condition A2 is a very mild condition. It holds almost surely for designs that are generated from a random sample. Condition A4 implies that under the alternative hypothesis, the least-squares estimator β̂ converges in mean square error to \beta_0 = A^{-1} b.

Proof of Theorem 1

Let Γ be the n × n orthonormal matrix generated by the discrete Fourier transform. Denote

    X^* = \Gamma X \quad \text{and} \quad Y = (Y_1, \ldots, Y_n)^T.

Under the null hypothesis, our model can be written as

    Y = X\beta + \varepsilon. \tag{A.1}

Then the least-squares estimate is given by \hat{\beta} = (X^T X)^{-1} X^T Y. Denote z = \Gamma\varepsilon and o = X^*(\hat{\beta} - \beta). Then,

    \hat{\varepsilon}^* = \Gamma Y - \Gamma X \hat{\beta} = z - o \quad \text{and} \quad z \sim N(0, \sigma^2 I_n).

Let z_i and o_i be the ith components of the vectors z and o, respectively. Then

    \sum_{i=1}^{m} \hat{\varepsilon}_i^{*2} = \sum_{i=1}^{m} (z_i^2 - 2 z_i o_i + o_i^2). \tag{A.2}

We first evaluate the small-order term o_i^2. By the Cauchy–Schwarz inequality,

    o_i^2 = \Big\{\sum_{j=1}^{p} x_{ij}^* (\hat{\beta}_j - \beta_j)\Big\}^2 \le \|\hat{\beta} - \beta\|^2 \sum_{j=1}^{p} x_{ij}^{*2}, \tag{A.3}
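The proof transforms the residuals by an orthonormal matrix Γ generated by the discrete Fourier transform, so that z = Γε keeps the N(0, σ²Iₙ) distribution and Parseval's identity preserves squared norms. A small numerical check of both facts; the particular real Fourier basis used to build Γ here is an illustrative assumption.

```python
import numpy as np

def real_dft_matrix(n):
    """A real orthonormal matrix built from discrete Fourier cosines/sines,
    playing the role of Gamma in (A.1)."""
    t = np.arange(n)
    rows = [np.ones(n) / np.sqrt(n)]                       # constant row
    for k in range(1, (n - 1) // 2 + 1):
        rows.append(np.sqrt(2.0 / n) * np.cos(2 * np.pi * k * t / n))
        rows.append(np.sqrt(2.0 / n) * np.sin(2 * np.pi * k * t / n))
    if n % 2 == 0:
        rows.append(np.cos(np.pi * t) / np.sqrt(n))        # alternating +/-1 row
    return np.vstack(rows)

n = 16
G = real_dft_matrix(n)
eps = np.random.default_rng(4).standard_normal(n)
z = G @ eps
# Orthonormality: G G^T = I; hence Parseval: ||z||^2 = ||eps||^2.
ok = np.allclose(G @ G.T, np.eye(n)) and np.isclose(z @ z, eps @ eps)
```

Because Γ is orthonormal, any quadratic functional of the residuals, such as the partial sums in (A.2), can be studied equivalently in the Fourier domain.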

Figure 3. Power of the Adaptive Neyman Tests, the KH Test, the Wavelet Thresholding Test, and the Parametric F Test for Examples 4 and 5: When the Alternative Is Fully Nonparametric and n = 128. Key: A, the adaptive Neyman test (ANT) ordered according to X₂ and using the known variance; B, the ANT ordered according to sᵢ and using the known variance; C, the ANT ordered by X₂ and using σ̂ℓ; D, the ANT ordered by sᵢ and using σ̂ℓ; F, the parametric F-test statistic; H, the wavelet thresholding test ordered by X₂ and using the known variance; S, the KH test with known variance and ordered by X₂; T, the KH test with estimated variance σ̂ℓ and ordered by X₂; U, the KH test with known variance and ordered by the first PC; V, the KH test with estimated variance σ̂ℓ and ordered by the first PC.

where \hat{\beta}_j and \beta_j are the jth components of the vectors β̂ and β, respectively. By the linear model (A.1) and Condition A1, we have

    E(\hat{\beta} - \beta)(\hat{\beta} - \beta)^T = \sigma^2 (X^T X)^{-1} = n^{-1} A^{-1} \sigma^2 \{1 + o(1)\}.

This and Condition A2 together yield

    \sum_{i=n_0}^{n/(\log\log n)^4} o_i^2 \le \|\hat{\beta} - \beta\|^2 \sum_{j=1}^{p} \sum_{i=n_0}^{n/(\log\log n)^4} x_{ij}^{*2} = O_P\{(\log\log n)^{-4}\}. \tag{A.4}

We now deal with the second term in (A.2). Observe that

    m^{-1} \sum_{i=1}^{m} z_i^2 \le \sigma^2 + \max_{1 \le m \le n} m^{-1} \sum_{i=1}^{m} (z_i^2 - \sigma^2) = \sigma^2 + o_P(\log\log n). \tag{A.5}

The last equality in (A.5) follows from (A.8). By the Cauchy–Schwarz inequality, (A.4), and (A.5), we have

    m^{-1/2} \sum_{i=n_0}^{m} |o_i z_i| \le \Big\{\sum_{i=n_0}^{m} o_i^2\Big\}^{1/2} \Big\{m^{-1} \sum_{i=n_0}^{m} z_i^2\Big\}^{1/2} = o_P\{(\log\log n)^{-3/2}\}.

This together with (A.2) and (A.4) entails

    m^{-1/2} \sum_{i=n_0}^{m} \hat{\varepsilon}_i^{*2} = m^{-1/2} \sum_{i=n_0}^{m} z_i^2 + o_P\{(\log\log n)^{-3/2}\}. \tag{A.6}

Let

    T_n^* = \max_{1 \le m \le n} (2m\sigma^4)^{-1/2} \sum_{i=1}^{m} (z_i^2 - \sigma^2).

Then, by theorem 1 of Darling and Erdős (1956), we have

    P\big[\sqrt{2\log\log n}\, T_n^* - \{2\log\log n + 0.5 \log\log\log n - 0.5 \log(4\pi)\} \le x\big] \to \exp(-\exp(-x)). \tag{A.7}

Hence,

    T_n^* = \{2\log\log n\}^{1/2} \{1 + o_P(1)\} \tag{A.8}

and

    T_{\log n}^* = \{2\log\log\log n\}^{1/2} \{1 + o_P(1)\}.

This implies that the maximum of T_n^* cannot be achieved at m < \log n. Thus, by (2.1),

    T_{AN,1}^* = \max_{1 \le m \le n/(\log\log n)^4} \frac{1}{\sqrt{2m\sigma^4}} \sum_{i=1}^{m} (\hat{\varepsilon}_i^{*2} - \sigma^2) + o_p\{(\log\log n)^{-3/2}\}.

Combination of this and (A.6) entails

    T_{AN,1}^* = T_n^* + O_p\{(\log\log n)^{-3/2}\}.

The conclusion for T_{AN,1} follows from (A.7), and the conclusion for T_{AN,2} follows from the same arguments.

Proof of Theorem 2

Let m_0^* be the index at which

    \xi_n^* = \max_{1 \le m \le n} (2\sigma^4 m)^{-1/2} \sum_{i=1}^{m} \xi_i^{*2}

is achieved. Denote c_\alpha = -\log\{-\log(1-\alpha)\}. Using arguments similar to but more tedious than those in the proof of Theorem 1, we can show that under the alternative hypothesis,

    T_{AN,j}^* = T_{AN}^* + o_p\{(\log\log n)^{-3/2} + (\log\log n)^{-3/2} (\xi_n^*)^{1/2}\},

where z in T_{AN}^* is distributed as N(\xi^*, \sigma^2 I_n). Therefore, the power can be expressed as

    P\{T_{AN,j} > c_\alpha\} = P\big[T_{AN}^* > (\log\log n)^{1/2} \{1 + o(1)\} + o\{(\xi_n^*)^{1/2}\}\big]. \tag{A.9}

Then, by (A.9), we have

    P\{T_{AN,j} > c_\alpha\} \ge P\Big[(2 m_0^* \sigma^4)^{-1/2} \sum_{i=1}^{m_0^*} (z_i^2 - \sigma^2) > 2(\log\log n)^{1/2}\Big]
    = P\Big[(2 m_0^* \sigma^4)^{-1/2} \sum_{i=1}^{m_0^*} (z_i^2 - \xi_i^{*2} - \sigma^2) > 2(\log\log n)^{1/2} - \xi_n^*\Big]. \tag{A.10}

The sequence of random variables

    \Big\{2m\sigma^4 + 4\sigma^2 \sum_{i=1}^{m} \xi_i^{*2}\Big\}^{-1/2} \sum_{i=1}^{m} (z_i^2 - \xi_i^{*2} - \sigma^2)

is tight, because they have mean zero and variance 1. It follows from (A.10) and the assumption (2.8) that the power of the critical regions in (2.5) tends to 1.

[Received April 1999. Revised June 2000.]

REFERENCES

Aerts, M., Claeskens, G., and Hart, J. D. (1999), "Testing Lack of Fit in Multiple Regression," Journal of the American Statistical Association, 94, 869–879.
Azzalini, A., and Bowman, A. N. (1993), "On the Use of Nonparametric Regression for Checking Linear Relationships," Journal of the Royal Statistical Society, Ser. B, 55, 549–557.
Azzalini, A., Bowman, A. N., and Härdle, W. (1989), "On the Use of Nonparametric Regression for Model Checking," Biometrika, 76, 1–11.
Barnett, V. (1976), "The Ordering of Multivariate Data" (with Discussion), Journal of the Royal Statistical Society, Ser. A, 139, 318–355.
Bickel, P. J., and Ritov, Y. (1992), "Testing for Goodness of Fit: A New Approach," in Nonparametric Statistics and Related Topics, ed. A. K. Md. E. Saleh, New York: North-Holland, pp. 51–57.
Bowman, A. W., and Azzalini, A. (1997), Applied Smoothing Techniques for Data Analysis, London: Oxford University Press.
Cuzick, J. (1992), "Semiparametric Additive Regression," Journal of the Royal Statistical Society, Ser. B, 54, 831–843.
Darling, D. A., and Erdős, P. (1956), "A Limit Theorem for the Maximum of Normalized Sums of Independent Random Variables," Duke Mathematical Journal, 23, 143–155.
Daubechies, I. (1992), Ten Lectures on Wavelets, Philadelphia: SIAM.
Dette, H. (1999), "A Consistent Test for the Functional Form of a Regression Based on a Difference of Variance Estimators," The Annals of Statistics, 27, 1012–1040.
Dette, H., and Munk, A. (1998), "Validation of Linear Regression Models," The Annals of Statistics, 26, 778–800.
Eubank, R. L., and Hart, J. D. (1992), "Testing Goodness-of-Fit in Regression via Order Selection Criteria," The Annals of Statistics, 20, 1412–1425.
Eubank, R. L., and LaRiccia, V. M. (1992), "Asymptotic Comparison of Cramér–von Mises and Nonparametric Function Estimation Techniques for Testing Goodness-of-Fit," The Annals of Statistics, 20, 2071–2086.
Fan, J. (1996), "Test of Significance Based on Wavelet Thresholding and Neyman's Truncation," Journal of the American Statistical Association, 91, 674–688.
Fan, J. (1997), Comments on "Polynomial Splines and Their Tensor Products in Extended Linear Modeling," The Annals of Statistics, 25, 1425–1432.
Fan, J., and Lin, S. K. (1998), "Test of Significance When Data Are Curves," Journal of the American Statistical Association, 93, 1007–1021.

Gnanadesikan, R., and Kettenring, J. R. (1972), "Robust Estimates, Residuals, and Outlier Detection with Multiresponse Data," Biometrics, 28, 81–124.
Kuchibhatla, M., and Hart, J. D. (1996), "Smoothing-Based Lack-of-Fit Tests: Variations on a Theme," Journal of Nonparametric Statistics, 7, 1–22.

Green, P. J., and Silverman, B. W. (1994), Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach, London: Chapman and Hall.
Hall, P., Kay, J. W., and Titterington, D. M. (1990), "Asymptotically Optimal Difference-Based Estimation of Variance in Nonparametric Regression," Biometrika, 77, 521–528.
Härdle, W., and Mammen, E. (1993), "Comparing Nonparametric Versus Parametric Regression Fits," The Annals of Statistics, 21, 1926–1947.
Hart, J. D. (1997), Nonparametric Smoothing and Lack-of-Fit Tests, New York: Springer-Verlag.
Inglot, T., Kallenberg, W. C. M., and Ledwina, T. (1994), "Power Approximations to and Power Comparison of Smooth Goodness-of-Fit Tests," Scandinavian Journal of Statistics, 21, 131–145.
Inglot, T., and Ledwina, T. (1996), "Asymptotic Optimality of Data-Driven Neyman Tests for Uniformity," The Annals of Statistics, 24, 1982–2019.
Jolliffe, I. T. (1986), Principal Component Analysis, New York: Springer-Verlag.
Kim, J. (2000), "An Order Selection Criteria for Testing Goodness of Fit," Journal of the American Statistical Association, 95, 829–835.
Ledwina, T. (1994), "Data-Driven Version of Neyman's Smooth Test of Fit," Journal of the American Statistical Association, 89, 1000–1005.
Lee, G., and Hart, J. D. (1998), "An L2 Error Test with Order Selection and Thresholding," Statistics and Probability Letters, 39, 61–72.
Neyman, J. (1937), "Smooth Test for Goodness of Fit," Skandinavisk Aktuarietidskrift, 20, 149–199.
Rayner, J. C. W., and Best, D. J. (1989), Smooth Tests of Goodness of Fit, London: Oxford University Press.
Speckman, P. (1988), "Kernel Smoothing in Partial Linear Models," Journal of the Royal Statistical Society, Ser. B, 50, 413–436.
Spokoiny, V. G. (1996), "Adaptive Hypothesis Testing Using Wavelets," The Annals of Statistics, 24, 2477–2498.
Stute, W., González Manteiga, W., and Presedo Quindimil, M. (1998), "Bootstrap Approximations in Model Checks for Regression," Journal of the American Statistical Association, 93, 141–149.
Wahba, G. (1984), "Partial Spline Models for the Semiparametric Estimation of Functions of Several Variables," in Analyses for Time Series, Tokyo: Institute of Statistical Mathematics, pp. 319–329.
Yatchew, A. (1997), "An Elementary Estimator of the Partial Linear Model," Economics Letters, 57, 135–143.