Testing for Covariate Effects in the Fully Nonparametric Analysis of Covariance Model
Lan WANG and Michael G. AKRITAS
Traditional inference questions in the analysis of covariance mainly focus on comparing different factor levels by adjusting for the continu- ous covariates, which are believed to also exert a significant effect on the outcome variable. Testing hypotheses about the covariate effects, although of substantial interest in many applications, has received relatively limited study in the semiparametric/nonparametric setting. In the context of the fully nonparametric analysis of covariance model of Akritas et al., we propose methods to test for covariate main effects and covariateÐfactor interaction effects. The idea underlying the proposed procedures is that covariates can be thought of as factors with many levels. The test statistics are closely related to some recent developments in the asymptotic theory for analysis of variance when the number of factor levels is large. The limiting normal distributions are established under the null hypotheses and local alternatives by asymptotically approximating a new class of quadratic forms. The test statistics bear similar forms to the classical F-test statistics and thus are convenient for computation. We demonstrate the methods and their properties on simulated and real data. KEY WORDS: Covariate effects; Fully nonparametric model; Heteroscedasticity; Interaction effects; Nearest-neighborhood windows; Nonparametric hypotheses.
1. INTRODUCTION parametric assumptions and have not considered the type of hy- potheses that we discuss in this article. In many scientific fields, researchers frequently face the Closer to the spirit of the present article is recent work problem of analyzing a response variable at the presence of on testing the equality of two or more mean regression func- both factors (categorical explanatory variables) and covariates tions (see Härdle and Marron 1990; Hall and Hart 1990; King, (continuous explanatory variables). Traditional inference fo- Hart, and Wehrly 1991; Delgado 1993; Kulasekera 1995; Hall, cuses mainly on covariate-adjusted factor effects. Testing for Huber, and Speckman 1997; Dette and Munk 1998; Dette and covariates effects, although of substantial interest in itself, has Neumeyer 2001; Koul and Schick 2003, among others). The received relatively limited attention, especially in a semipara- null hypothesis considered by these authors amounts to the hy- metric/nonparametric setting. In this article we discuss formu- pothesis of no factor-simple effects (i.e., no factor main effects lating and testing appropriate hypotheses for covariate effects and no factorÐcovariate interaction effects). These tests are de- in a completely nonparametric fashion. veloped under the assumption of a location-scale model; as our The classical analysis of covariance model imposes a set simulation (Sec. 4) suggests, this is not a benign assumption. of stringent assumptions including linearity, parallelism, ho- Moreover, it is not immediately clear how the hypotheses of moscedasticity, and normality. In this framework, testing the no covariate effects or no factorÐcovariate interaction effects significance of covariate effects is equivalent to checking can be directly formulated and tested with their approaches. whether the regression coefficients are zero, and testing for the One exception is the work of Young and Bowman (1995), who presence of factorÐcovariate interaction effects amounts to in- considered testing the parallelism of several mean regression vestigating whether the regression lines have the same slope. functions. They used a partial linear model, which provides one Due to its simplicity, the classical approach remains the most way to formulate the hypothesis of no factorÐcovariate inter- popular tool in practice. action effects. They did not consider any hypothesis related to Early work to relax the restrictive assumptions of the classi- main covariate effects; for their method to work, the random er- cal ANCOVA model involves the use of rank-based methods. rors need to have homoscedastic normal distributions. Finally, This literature is very extensive, but it is fair to state that al- it also is not clear how the aforementioned methods can readily though successful in some respects, the rank-based methods are accommodate more than one factor. not totally satisfactory. For example, the test of Quade (1967) To overcome the aforementioned difficulties, this article ex- can be applied only to test for covariate-adjusted treatment ef- tends the state of art of nonparametric analysis of covariance fects; it avoids assuming normality and linearity but requires in a number of ways. The new methodology applies to test- that the covariate have the same distribution across all treat- ing all common hypotheses, allows for the presence of several ment levels. The approach of McKean and Schrader (1980) factors, includes both continuous and discrete ordinal response does not need to assume normality but is based on paramet- variable, permits heteroscedastic errors, and does not require ric modeling of the covariate effect. Another line of research the covariate to be equally spaced or to have the same distrib- yields interesting alternative models, including the partial lin- ution at different treatment level combinations. The main vehi- ear model (Speckman 1988; Hastie and Tibshirani 1990), the cle for achieving this extension is the nonparametric model of varying-coefficient model (Hastie and Tibshirani 1993), and the Akritas, Arnold, and Du (2000), which generalizes beyond the model of Sen (1996). But these approaches still require certain common location-scale model. To introduce the fully nonparametric model, let (Xij, Yij), i = = Lan Wang is Assistant Professor, School of Statistics, University of Min- 1,...,k, j 1,...,ni, denote the observable variables. More nesota, Minneapolis, MN 55455 (E-mail: [email protected]). Michael G. Akritas is Professor, Department of Statistics, Pennsylvania State University, University Park, PA 16802 (E-mail: [email protected]). The authors thank the © 2006 American Statistical Association associate editor, an anonymous referee, and the editor for comments that sig- Journal of the American Statistical Association nificantly improved the article. This work was supported in part by National June 2006, Vol. 101, No. 474, Theory and Methods Science Foundation grant SES-0318200. DOI 10.1198/016214505000001276
722 Wang and Akritas: Testing for Covariate Effects 723 specifically, i enumerates the cells (or factor-level combina- An important property of the nonparametric hypotheses is that tions) and j denotes the observations within each cell. This no- they are invariant to monotone transformations of the response. tation is typical for one-way analysis of covariance and also can This property is particularly desirable for ordinal response vari- be used for higher-way designs, with the understanding that the ables because they can be coded using different scales by dif- single index i enumerates all factor-level combinations; Xij may ferent researchers. also represent a vector of covariates if needed. More discussion The use of mid-rank procedures for testing (2) was consid- on higher-way designs and multiple covariates is given in Sec- ered in Akritas et al. (2000). In this article we develop pro- tion 6. The fully nonparametric model of Akritas et al. (2000) cedures for testing the hypotheses (3) and (4). Procedures for specifies only that testing (5) and (6) can be constructed similarly. Note that (4) and (5) generalize the hypotheses of parallelism and equality of Y |X = x ∼ F ; (1) ij ij ix regression lines, which were considered by Young and Bowman that is, given Xij = x, the distribution of Yij depends on i and x. (1995). Development of the present tests requires new types of Thus (1) does not specify how this conditional distribution statistics and different techniques from those for testing (2). changes with the cell or covariate value. In this sense it is com- The basic idea for constructing the test procedures is to treat pletely nonparametric (also nonlinear and nonadditive). Choose the continuous covariate as a factor with many levels and bor- a distribution function G(x) and let row suitable test statistics from the ANOVA. The methods are k therefore motivated by recent advances in the asymptotic theory G 1 for heteroscedastic ANOVA with a large number of factor lev- F · (y) = F (y) dG(x) and F· (y) = F (y). i ix x k ix i=1 els (Akritas and Papadatos 2004; Wang and Akritas 2002). The technical derivation involves the asymptotic approximation of a If the X ’s are random, then G can be taken as their overall ij new class of quadratic forms (Sec. 3.2). Some new asymptotic distribution function. Therefore, if the covariate has the same tools are developed for this purpose. distribution in all groups, then FG(y) is the marginal distribu- i· In the sequel, we use N(µ, σ 2) to represent a normal distri- Y . tion function of ij The hypotheses of interest in model (1) for bution with mean µ and variance σ 2 and let a ∧ b = min(a, b) the one-way design are as follows: and a ∨ b = max(a, b).Welet1d denote the d × 1 column vec- G ; = No main group effect, or Fi· does not depend on i (2) torof1’s,Jd 1d1d, Id represent the d-dimensional identity matrix, and ⊕ and ⊗ denote the operations of Kronecker sum No main covariate effect, or F· (y) does not depend on x; x and Kronecker product. (3) The rest of the article is organized as follows. In the next section we introduce the test statistics. In Section 3 we present F (y) = FG(y) + K (y); No group-covariate interaction, or ix i· x the asymptotic techniques and main theoretical properties, and (4) in Section 4 we report results from several simulation studies. In Section 5 we apply the tests to two real datasets: the White F (y) i; No simple group effect, or ix does not depend on (5) Spanish Onion data and the North Carolina Rain data. We pro- and vide some generalizations and discussions in Section 6 and give proofs of the main theorems and some auxiliary technical re- No simple covariate effect, or F (y) does not depend on x, ix sults in two appendixes. (6) 2. TEST STATISTICS where in (4) K (y) is some function independent of i. Effects x We adopt the convention that in each category i, the observed for these hypotheses are defined in terms of an ANOVA-type pairs (X , Y ), i = 1,...,k, j = 1,...,n , are enumerated in as- decomposition, ij ij i cending order of the covariate values; thus Xi1 < Xi2 < ···< = + + + Fix(y) M(y) Ai(y) Dx(y) Cix(y), (7) Xin holds for each i. Also, we let X1 < X2 < ··· < XN de- i k = note the pooled covariate values Xij’s in ascending order, where where, under the side conditions i=1 Ai(y) 0 for all y, k k N = = ni is the total number of observations. Dx (y) dG(x) = 0 for all y, = Cix(y) = 0 for all x and y, i 1 i 1 In the context of the nonparametric ANCOVA model (1), and Cix(y) dG(x) = 0 for all i and y, the effects are given there is no conceptual distinction between the ANCOVA and by M(y) = k−1 k FG(y), A (y) = FG(y) − M(y), D (y) = i=1 i· i i· x ANOVA designs, because the effect of the covariate is not mod- F· (y) − M(y), and C (y) = F (y) − M(y) − A (y) − D (y). x ix ix i x eled. The idea, then, for constructing the test statistics is to treat Borrowing terminology from the classical ANCOVA, A is the i the ANCOVA design as an ANOVA design where one factor, covariate-adjusted main effect of cell i, D is the main effect of x corresponding to the covariate, has levels X , X ,...,X . Thus the covariate value x, and C is the interaction effect between 1 2 N ix recent advances in testing hypotheses in ANOVA with one fac- cell i and covariate value x. The nonparametric hypotheses sim- tor having many levels become relevant for the present testing ply state that the corresponding nonparametric effects are zero. problems. Such methods are not directly applicable, however, In particular, the hypotheses (3) and (4) can be restated as because of the sparseness issue (lack of replications in the cells) H0(D) : Dx(y) = 0 for all x and all y caused by the continuity of the covariate. Our proposal for deal- ing with the sparseness in the ANOVA design is to use the and fact that the covariate factor is actually a regressor and to use H0(C) : Cix(y) = 0 for all i, all x, and all y. smoothness assumptions. 724 Journal of the American Statistical Association, June 2006
Basically, a smoothness assumption stipulates that the con- FD and FC are suitable for testing the nonparametric hypothe- ditional distribution of the response changes smoothly with ses (3) and (4) is that the aforementioned expectations tend to 0 the covariate value. Different versions of smoothness assump- under their respective null hypotheses and smoothness assump- tions are ubiquitous in nonparametric regression. They are used tions; see Lemma B.2 in Appendix B. for enhancing or augmenting the available information about For the ANOVA with independent observations and a large the conditional distribution of the response at a given x-value number of factor levels, asymptotic distributions are obtained 1/2 1/2 by also considering the responses at covariate values that are by considering N (FD − 1) and N (FC − 1) (see Wang and “close” to x or belong in a local “window” around x. In our Akritas 2002). But these results do not apply here, due to the case we need to augment the available information for each Fix, dependence of the observations in the augmented layout. In = where i 1,...,k and x takes any of the covariate values Section 3.2 a new set of asymptotic approximation techniques X1, X2,...,XN . Because the present hypothesis testing objec- is developed that is suitable for our purpose. Due to the local tive is different than that of nonparametric estimation, the win- smoothing, the parametric convergence rate N1/2 is impossible dow size n is required to increase with N at a rate only faster to achieve. The next section reveals that an appropriate scaling than log N [see Assumption A1(b)]. This allows the use of fairly − constant should be of order N1/2n 1/2. Because under weak small local window sizes even for large N, and simulations sug- regularity conditions, the MSE tends to a constant in probabil- gest that the actual choice of the local window size does not ity, by Slutsky’s theorem, it suffices to study the asymptotic dis- seem to have a large influence on the performance of the testing tribution of procedures. Augmenting the information that is available for Fix means, 1/2 −1/2 1/2 −1/2 N n TD = N n (MSTD − MSE) (8) in terms of the ANOVA design, augmenting the information available in the corresponding cell (i, x). In particular, each and cell (i, Xr) is augmented to include the n responses from cat- 1/2 −1/2 1/2 −1/2 egory i whose corresponding covariate values Xij are nearest N n TC = N n (MSTC − MSE), (9) to Xr in the sense that which we use as the test statistics for H (D) and H (C),from n − 1 0 0 |Gi(Xij) − Gi(Xr)|≤ , now on. 2ni = −1 ni ≤ where Gi(x) ni j=1 I(Xij x) is the empirical distribution 3. ASYMPTOTIC DISTRIBUTIONS OF function of the covariate in group i. We use the notation Wir THE TEST STATISTICS to denote either the set of responses or the set of indices of 3.1 Conventions and Assumptions the observations in the augmented cell (i, Xr); the meaning should be clear from the context. To distinguish the observa- The test statistics in (8) and (9) can be expressed as TD = tions in the augmented ANOVA design from those of the orig- Z ADZ and TC = Z ACZ, where Z = (Z111, Z112,...,ZkNn) is inal observations, the tth observation in cell ( i, Xr) is denoted the vector of all observations in the augmented design and = ni ≤ by Zirt. Specifically, Zirt Yij if and only if l=1 I(Xil xij, ∈ = N Yil Wir) t. 1 1 Now let F and F denote the classical F statistics in bal- AD = Jkn − JNkn D C (N − 1)kn N(N − 1)kn anced two-way ANOVA for testing the null hypotheses of no r=1 main column effects and no rowÐcolumn interaction effects, N k 1 1 evaluated on the augmented data. Then FD and FC have the − INkn + Jn Nk(n − 1) Nk(n − 1)n expressions r=1 i=1 MSTD FD = and MSE N k − −1 N − 2 1 kn(N 1) r= (Z·r· Z···) = 1 AC = 1n − −1 k N n − 2 (N − 1)(k − 1)n (Nk(n 1)) i=1 r=1 t=1(Zirt Zir·) r=1 i=1 k N and 1 1 1 MST × INk − JN ⊗ − Jk + JNk = C N k Nk FC i=1 r=1 MSE −1 N k N k n{(N − 1)(k − 1)} 1 1 = × 1 − INkn − Jn . (Nk(n − 1))−1 n Nk(n − 1) n r=1 i=1 r=1 i=1 k N (Z · − Z· · − Z ·· + Z···)2 × i= 1 r =1 ir r i For later reference, we also define two other quadratic forms, k N n . − · 2 i=1 r=1 t=1(Zirt Zir ) Z BDZ and Z BCZ, where − = Unlike in the classical ANOVA model, E(Zirt Zir·) 0is N k N no longer true. Similarly, E(Z· · − Z···) = 0 under H (D) and 1 1 1 r 0 BD = Jkn − INkn + Jn · − · · − ·· + ··· = Nkn Nk(n − 1) Nk(n − 1)n E(Zir Z r Zi Z ) 0 under H0(C). The reason why r=1 i=1 r=1 Wang and Akritas: Testing for Covariate Effects 725 is block diagonal with block size kn × kn and a vector of covariates, and its coordinates are correlated due to the local augmentation. To our knowledge, no asymptotic k N 1 method in the literature is directly applicable. Z BCZ = [Z JnZir − Z Zir] Nk(n − 1) ir ir i=1 r=1 Our approach for obtaining the asymptotic distribution of this class of quadratic forms consists of four steps. First, it k N 1 can be assumed without loss of generality that the observa- − Z JnZi r, Nk(k − 1)n i1r 2 tions are conditionally centered under the null hypothesis; this i =i r=1 1 2 is done in Proposition 1. Second, the test statistics are approx- where Zir = (Zir1,...,Zirn) is a vector of all observations in imated by simpler quadratic forms obtained by an application the augmented cell (i, Xr). Moreover, we set K(u) = I(|u|≤1), of Akritas and Papadatos’s (2004) variant of Hájek’s projec- G(x) for the overall distribution function of X, Gi for the dis- tion method; this is done in Proposition 2 (although the pro- tribution function of X in the ith group, and hi = (n − 1)/(2ni), jection calculations, which are quite tedious, are not shown). and define Third, the simpler quadratic forms are further approximated by − the clean generalized quadratic forms (de Jong 1987), which in- = Gi(X1) Gi(X2) Ki,G(X1, X2) K . (10) volve only independent variables; this is done in Proposition 3. hi Finally, the conditions for establishing the asymptotic normal- The asymptotic distributions of TD and TC are derived under ity of the clean generalized quadratic forms (proposition 3.2 of the following assumptions: de Jong 1987) are verified, which proves the asymptotic nor- Assumption A1. (a) For each i, ni/N → λi ∈ (0, 1) as mality of the proposed test statistics; the last step is presented ni →∞. in the next section. The new techniques developed in this arti- (b) ln(Nn−1) = o(n),ln(Nn−1)/ ln ln N →∞, N−3n5 = cle are of independent interest and are likely to be useful for o(1) as N →∞. asymptotic analysis of other similar statistics.
Assumption A2. (a) The covariate Xi is a continuous random Proposition 1. Assume that Assumptions A1 and A2 hold. variable, i = 1,...,k. Then the following results apply: (b) The X ’s have common bounded support S, i = 1,...,k. i 1/2 −1/2[ − − | × (c) The density g of X is bounded away from 0 and ∞ on (a) Under H0(D), N n Z ADZ (Z E(Z X)) i i − | ]→ S and is differentiable, i = 1,...,k. AD(Z E(Z X)) 0 in probability. 1/2 −1/2[ − − | × (d) ydF(y|x) are uniformly Lipschitz continuous of order (b) Under H0(C), N n Z ACZ (Z E(Z X)) i − | ]→ 1 in the following sense: There exists some positive constant c AC(Z E(Z X)) 0 in probability. such that Proposition 2. Assume that Assumptions A1ÐA3 hold. Then the following results apply: ydFi(y|x1) − ydFi(y|x2) ≤ c|x1 − x2| for all i, x1, x2. 1/2 −1/2 (a) Under H0(D), N n (Z ADZ − Z BDZ) → 0in 4| = Assumption A3. E(Yij Xij x) are uniformly bounded probability. 1/2 −1/2 in i, j, x. (b) Under H0(C), N n (Z ACZ − Z BCZ) → 0in probability. The restriction that Assumption A1(b) puts on n is very weak; for example, n is allowed to go to infinity at the Proposition 3. Assume that Assumptions A1ÐA3 hold. Then rate log N. This assumption also implies that n = o(N3/5). the following results apply: We find that a typical order of N1/2 often works well in prac- 1/2 −1/2 − → tice. Our simulation results indicate that the nonparametric test (a) Under H0(D), N n (Z BDZ T) 0 in probabil- = + is usually valid for a wide range of choices of n. Assump- ity, where T T1 T2, with tions A2 and A3 impose weak smoothness and moment con- k n 1 i straints. T = Yil Yil 1 k(n − 1) 1 2 i=1 l1 =l2 3.2 A New Class of Quadratic Forms and Asymptotic Approximations × Ki,G Xil1 , x Ki,G Xil2 , x dG(x) The test statistics that we are dealing with are quadratic forms of the type Z MZ, where M is a symmetric matrix and and Z is a random column vector whose dimension goes to infinity. n n k i1 i2 When the coordinates of Z are independent and homoscedas- 1 T = Y Y tic random variables, the asymptotic distribution of this type of 2 kn i1l1 i2l2 i1 =i2 l1=1 l2=1 quadratic form was studied by Boos and Brownie (1995), Jiang (1996), and Akritas and Arnold (2000); when the coordinates × K X , x K X , x dG(x). of Z are independent but heteroscedastic, it was investigated by i1,G i1l1 i2,G i2l2 Akritas and Papadatos (2004), who applied a novel variant of 1/2 −1/2 − → Hájek’s projection method. In our setting, besides the fact that (b) Under H0(C), N n (Z BCZ T) 0 in probabil- −1 Z is nonnormal and heteroscedastic, its distribution depends on ity, where T = T1 − (k − 1) T2,forT1 and T2 defined earlier. 726 Journal of the American Statistical Association, June 2006 2 3.3 Asymptotic Distribution Under the Null y dRix(y) are uniformly bounded for all i and x.Itisalsoas- sumed that for some positive constant C, |R (y) − R (y)|≤ The two theorems that follow provide the asymptotic dis- ix1 ix2 C|x − x |, for all i, y. Define tributions of the test statistics under the null hypotheses of 1 2 no nonparametric main covariate effects and no nonparametric k 2 1 factorÐcovariate interaction effects. θ = ydR (y) dG(x) D k ix Theorem 1. Assume that Assumptions A1ÐA3 hold. Then, i=1 under H (D), k 2 0 1 − ydRix(y) dG(x) 1/2 −1/2 4 4 4 k N n TD → N 0, (ξ + η ) , = 3k2 i 1 and where, setting σ 2(x) = var(Y |X = x), ρ (t) = λ g (t), g(t) = i ij ij i i i k = ∧ − ∧ 2 k k i=1 ρi(t), and τi1,i2 (t) 3(ρi1(t) ρi2(t)) (ρi1(t) ρi2(t)) / 1 ∨ θC = ydRix(y) − ydRjx(y) (ρi1(t) ρi2(t)), k i=1 j=1 k g2(t) 4 = 4 k 2 ξ σi (t) dt and 1 = ρi(t) − ydR (y) dG(t) + ydR (y) dG(t) dG(x). i 1 it k jt j=1 k g2(t) η4 = σ 2 (t)σ 2 (t) τ (t) dt. i1 i2 i1,i2 The next lemma plays a central role in the derivation of ρi1(t)ρi2(t) i1