Journal of Modern Applied Statistical Methods
May 2006, Vol. 5, No. 1, 14-21
Copyright © 2006 JMASM, Inc. 1538-9472/06/$95.00
DOI: 10.22237/jmasm/1146456120

ANCOVA: A Robust Omnibus Test Based On Selected Design Points

Rand R. Wilcox
Department of Psychology, University of Southern California, Los Angeles
Email: [email protected]

Many robust analogs of the classic ANCOVA method have been proposed. One approach, when comparing two independent groups, uses selected design points and then compares the groups at each design point using some robust method for comparing measures of location. So, if K design points are of interest, K tests are performed. There are rather obvious ways of instead performing an omnibus test of the hypothesis that, at all K points, no differences between the groups exist. One of the main results here is that several variations of these methods can perform very poorly in simulations. An alternative approach, based in part on the usual sample median, is suggested and found to perform reasonably well in simulations. It is noted that when using other robust measures of location, the method can be unsatisfactory.

Key words: ANCOVA, bootstrap methods, measures of depth, smoothers

Introduction

The analysis of covariance (ANCOVA) problem is to compare two independent groups based on some outcome of interest, Y, in a manner that takes into account some covariate, X. A classic and well-known approach assumes that the error term of the usual model is homoscedastic and has a normal distribution, the regression lines associated with each group are parallel, and the variances associated with the error terms for each group are assumed to be identical. More formally, if for the jth group (j = 1, 2) there are n_j randomly sampled pairs of observations, say (X_ij, Y_ij), i = 1, ..., n_j, the classic assumption is that for the jth group,

Y_ij = β_0j + β_1 X_ij + ε_ij,  (1)

where ε_ij has variance σ_j², σ_1² = σ_2², and ε_ij is independent of X_ij. So, by implication, for each group the conditional variance of Y, given X, does not vary with X, and each group has the same slope.

It is known that violating one or more of these assumptions can result in serious practical problems. Concerns about the robustness of the method date back to at least Atiqullah (1964), who concluded that non-normality is a practical problem. Another obvious concern is the assumption that the regression lines are parallel. There are several robust methods for testing this assumption (e.g., Wilcox, 2003, 2005), but it remains unclear when such tests have enough power to detect situations where having non-parallel lines is a practical concern. Yet another concern about equation (1) is the assumption that the association between Y and X is linear.


Of course, in some situations this is a reasonable approximation, but this is not always the case. Many alternative methods have been derived that eliminate the assumption that the association is linear (e.g., Bowman & Young, 1996; Delgado, 1993; Dette & Neumeyer, 2001; Hall, Huber, & Speckman, 1997; Kulasekera, 1995; Kulasekera & Wang, 1997; Munk & Dette, 1998; Neumeyer & Dette, 2003; Young & Bowman, 1995; Wilcox, 2003). However, some of these methods require homoscedasticity, and for most there are few if any simulation results that support their use with small to moderate sample sizes.

A simple and very flexible approach to ANCOVA is described in Wilcox (2003, section 14.8). It allows the regression lines to be nonlinear, it allows heteroscedasticity, it performs well in simulations, and in the event standard assumptions are met, all indications are that it has nearly the same amount of power as the classic ANCOVA method (e.g., Wilcox, 2005, p. 526). Roughly, the method is based on multiple comparisons. Examination of the method suggests a simple and rather obvious approach to performing an omnibus test instead. But results reported here make it clear that several variations of this approach perform very poorly in simulations. (Details are given later in the article.) The main result in this article is that an alternative approach, based in part on the usual sample median and the depth of the null vector in a bootstrap cloud, nearly eliminates this problem. The main exception is a situation where, simultaneously, the conditional distribution of Y is discrete, skewed, and the possible values for Y are relatively small in number.

Considered and Discarded Methods

It helps to describe the first general method that was considered and discarded and then suggest a related approach that gives more satisfactory results. It is assumed that for the jth group, Y and X are related through some unknown function, m_j. More formally, it is assumed that

Y_ij = m_j(X_ij) + ε_ij,

where ε_ij has a median of zero, variance σ_ij², and is independent of X_ij. Let m_j(x) be the population median of Y for the jth group, given that the covariate of the jth group is X = x. (Comments on using other location estimators are given later in the article.) Let x_1, ..., x_K be K values of X that are of interest. The method in Wilcox (2003, section 14.8) includes as a special case the problem of testing

H_0: m_1(x_k) = m_2(x_k), k = 1, ..., K,

for each k. That is, K tests are to be performed. Let δ(x_k) = m_1(x_k) − m_2(x_k). The goal here is to test

H_0: δ(x_1) = ⋯ = δ(x_K) = 0.  (2)

Here, it is assumed that K = 5 and that the choices for x_1, ..., x_5 are made empirically in a manner about to be described. Of course, it is not being suggested that other choices for the design points or K are inappropriate. For example, a researcher might have interest in K specific design points, rather than points determined as is done here. The idea is to provide a data-driven method for checking whether the regression lines differ, paying particular attention to design points where valid inferences about the medians of the Y values can be made.

The choice of the five design points stems in part from what is called a running interval smoother. To describe the details, attention is temporarily focused on a single group of subjects. The basic strategy is to find all X_i values close to x and estimate m(x) with the median of the corresponding Y values. The method begins by computing the median absolute deviation statistic

MAD = median{|X_1 − M|, ..., |X_n − M|},

where M is the usual sample median of the X values. Let MADN = MAD/0.6745.

The only reason for rescaling MAD is that, under normality, MADN estimates σ. This rescaling helps describe the running interval smoother in terms of familiar concepts, but ultimately it is not important. Then X_i is said to be close to x if

|X_i − x| ≤ f × MADN,

where f is some constant, called the span. Here, following Wilcox (2003), f = 1 is used.

Let m̄_j = Σ_k m_j(x_k)/K. A seemingly natural alternative to (2) is to test

H_0: m̄_1 = m̄_2.  (3)

That is, view the problem in the context of a 2-by-K ANOVA and test the hypothesis that there is no main effect for the first factor. Many robust methods for testing this hypothesis have been proposed (Wilcox, 2005), which include various bootstrap techniques. But when checking the ability of this approach to control the probability of a Type I error for the problem at hand, poor results were obtained in situations described later in the article. Included were non-bootstrap methods for 20% trimmed means and medians (Wilcox, 2003, sections 10.3 & 10.5) plus bootstrap variations of these methods described in Wilcox (2005). In particular, it was found that in some situations, when testing at the .05 level, the actual Type I error probability was estimated to exceed .2.

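As a concrete illustration of the running interval smoother just described, here is a minimal Python (NumPy) sketch; the function names and toy data are illustrative and not part of the original article. The estimate of m(x) is simply the median of the Y values whose X values lie within f × MADN of x, with span f = 1.

```python
import numpy as np

def madn(x):
    """MAD rescaled by 0.6745 so that it estimates sigma under normality."""
    return np.median(np.abs(x - np.median(x))) / 0.6745

def running_median(x_data, y_data, x, f=1.0):
    """Running interval smoother: median of the Y values whose X is close to x."""
    close = np.abs(x_data - x) <= f * madn(x_data)
    return np.median(y_data[close]) if close.any() else np.nan

# Toy illustration (not data from the article)
rng = np.random.default_rng(0)
x_data = rng.normal(size=50)
y_data = x_data**2 + rng.normal(size=50)
print(running_median(x_data, y_data, x=0.0))
```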
Description of the Recommended Method

The one method that performed well in simulations is based on testing (2) rather than (3). The general strategy is to generate bootstrap samples, yielding bootstrap estimates of δ_k, and then determine how deeply the null vector is nested within this bootstrap cloud. Two approaches to measuring the depth of the null vector are considered. General theoretical results related to this approach are reported in Liu and Singh (1997).

To elaborate, momentarily assume that the x_k values have been chosen and let Y_ijk (i = 1, ..., n_jk; k = 1, ..., K) be the Y_ij values such that

|X_ij − x_k| ≤ f × MADN.  (4)

For fixed k and j, generate a bootstrap sample by randomly sampling with replacement n_jk values from the Y_ijk, yielding Y*_ijk (i = 1, ..., n_jk). Let M*_jk be the usual sample median based on the Y*_ijk values and let δ*_k = M*_1k − M*_2k. Repeat this process B times, yielding δ*_bk, b = 1, ..., B. So there are B vectors of bootstrap δ* values, each vector having length K. Then, roughly, the null hypothesis is rejected depending on how deeply the null vector (0, ..., 0) is nested within this bootstrap cloud.

The problem of choosing the x_k values is approached as follows. Let N_j(x) be the number of points in the jth group that are considered close to x based on (4). For notational convenience, assume that for fixed j, the X_ij values are in ascending order. That is, X_1j ≤ ⋯ ≤ X_{n_j j}. The regression lines are said to be comparable at x if simultaneously N_j(x) ≥ 12 for both j = 1 and 2. The value 12 is chosen simply to reflect a sample of points large enough so as to expect reasonable control over the probability of a Type I error, but obviously some other (larger) value could be used if desired.

Suppose x_1 is taken to be the smallest X_i1 value for which the regression lines are comparable. That is, search the first group for the smallest X_i1 such that N_1(X_i1) ≥ 12. If N_2(X_i1) ≥ 12, the two regression lines are considered comparable at X_i1 and x_1 = X_i1 is set. If N_2(X_i1) < 12, consider the next largest X_i1 value and continue until it is simultaneously true that N_1(X_i1) ≥ 12 and N_2(X_i1) ≥ 12. K = 5 is used, but again some other value is certainly reasonable. Let x_5 be the largest X_i1 value in the first group for which the regression lines are comparable. That is, x_5 is the largest X_i1 value such that N_1(x_5) ≥ 12 and N_2(x_5) ≥ 12. Let i_1 and i_5 be the corresponding values of i. The other three design points are chosen as follows. Let i_3 = (i_1 + i_5)/2, i_2 = (i_1 + i_3)/2, and i_4 = (i_3 + i_5)/2. Round i_2, i_3, and i_4 down to the nearest integer and set x_2 = X_{i_2 1}, x_3 = X_{i_3 1}, and x_4 = X_{i_4 1}.

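Here is a minimal, illustrative Python sketch of the data-driven choice of the five design points described above. It assumes each group's own MADN is used in criterion (4) and takes one particular reading of how i_2, i_3, and i_4 are rounded; these details, along with the function names and toy data, are assumptions rather than specifications from the article.

```python
import numpy as np

def madn(x):
    return np.median(np.abs(x - np.median(x))) / 0.6745

def n_close(x_group, x, f=1.0):
    """N_j(x): number of points in a group within f*MADN of x (criterion (4))."""
    return int(np.sum(np.abs(x_group - x) <= f * madn(x_group)))

def pick_design_points(x1, x2, f=1.0, min_n=12):
    """Choose x_1, ..., x_5 from the sorted X values of the first group."""
    xs = np.sort(x1)
    # Positions where the regression lines are comparable: N_j(x) >= 12 for both groups.
    ok = [i for i, x in enumerate(xs)
          if n_close(x1, x, f) >= min_n and n_close(x2, x, f) >= min_n]
    if not ok:
        raise ValueError("regression lines are not comparable at any design point")
    i1, i5 = ok[0], ok[-1]
    i3 = (i1 + i5) / 2
    i2, i4 = int((i1 + i3) / 2), int((i3 + i5) / 2)   # round down
    return xs[[i1, i2, int(i3), i4, i5]]

# Toy illustration
rng = np.random.default_rng(1)
print(pick_design_points(rng.normal(size=40), rng.normal(size=40)))
```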
There are various ways of measuring how deeply a point is nested within a multivariate cloud of data (e.g., Liu & Singh, 1997; Wilcox, 2005). The simplest is based on Mahalanobis distances and is the first of the two methods considered here. However, the most obvious estimate of the covariance matrix associated with the bootstrap vectors is not used. Rather, it is estimated with

s_km = (1/(B − 1)) Σ_{b=1}^{B} (δ*_bk − δ_k)(δ*_bm − δ_m).

That is, for fixed k, rather than use Σ_b δ*_bk / B as the estimate of the center of the bootstrap cloud, use δ_k instead. Put another way, there is no need to estimate the center of the bootstrap cloud; it is already known and given by the vector (δ_1, ..., δ_K). Indeed, if it is estimated with Σ_b δ*_bk / B, control over the probability of a Type I error deteriorates, consistent with a variety of other methods surveyed by Wilcox (2005). Let S = (s_km) be the corresponding covariance matrix, in which case the distance of the bth bootstrap vector from the center is given by

d_b = (δ*_b1 − δ_1, ..., δ*_bK − δ_K) S^{-1} (δ*_b1 − δ_1, ..., δ*_bK − δ_K)′.

Let

D = (δ_1 − 0, ..., δ_K − 0) S^{-1} (δ_1 − 0, ..., δ_K − 0)′,

which is the distance of the null vector from the center of the bootstrap cloud. The (generalized) p-value is

p* = (1/B) Σ_b I(D ≤ d_b),

where I(D ≤ d_b) = 1 if D ≤ d_b and I(D ≤ d_b) = 0 if D > d_b. This will be called method M.

The second method considered here for measuring the depth of a point in the bootstrap cloud is a projection-type method given in Wilcox (2005, section 6.2.5); it represents a slight variation of a method discussed by Donoho and Gasko (1992) and has been found to perform well in connection with other methods described in Wilcox (2005). The computational details are relegated to an appendix. This will be called method P.

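To make the computations behind method M concrete, the following minimal sketch builds the bootstrap cloud of δ* vectors, forms the covariance estimate centered at (δ_1, ..., δ_K), and returns the generalized p-value. It is illustrative only: the toy data and function names are not from the article, and it takes each δ_k to be the difference between the original-sample medians at x_k, which is one reading of how the centering vector is defined.

```python
import numpy as np

def method_m(y1_by_point, y2_by_point, B=600, seed=0):
    """Method M (sketch): generalized p-value from Mahalanobis-type distances.

    y1_by_point, y2_by_point: lists of length K; entry k holds the Y values of
    each group whose X values are close to design point x_k (criterion (4)).
    """
    rng = np.random.default_rng(seed)
    K = len(y1_by_point)
    # delta_k taken as the difference between the original-sample medians at x_k.
    delta = np.array([np.median(y1_by_point[k]) - np.median(y2_by_point[k])
                      for k in range(K)])
    boot = np.empty((B, K))                        # bootstrap cloud of delta*_{bk}
    for b in range(B):
        for k in range(K):
            m1 = np.median(rng.choice(y1_by_point[k], size=len(y1_by_point[k])))
            m2 = np.median(rng.choice(y2_by_point[k], size=len(y2_by_point[k])))
            boot[b, k] = m1 - m2
    dev = boot - delta                             # center at (delta_1, ..., delta_K)
    S = dev.T @ dev / (B - 1)                      # covariance estimate s_km
    Sinv = np.linalg.inv(S)
    d = np.einsum('bk,km,bm->b', dev, Sinv, dev)   # distances d_b
    D = delta @ Sinv @ delta                       # distance of the null vector
    return np.mean(D <= d)                         # generalized p-value

# Toy illustration: identical regression lines, so the null hypothesis is true
rng = np.random.default_rng(2)
y1 = [rng.normal(size=20) for _ in range(5)]
y2 = [rng.normal(size=20) for _ in range(5)]
print(method_m(y1, y2, B=300))
```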
A Simulation Study

Simulations were used to assess the small-sample properties of the method just described. Observations were generated according to the models

Y = ε,

Y = X + ε,

and

Y = X² + ε,

where X has a standard normal distribution and ε has one of four g-and-h distributions (Hoaglin, 1985), which contain the standard normal distribution as a special case. If Z has a standard normal distribution, then

W = ((exp(gZ) − 1)/g) exp(hZ²/2), if g > 0,
W = Z exp(hZ²/2), if g = 0,

has a g-and-h distribution, where g and h are parameters that determine the first four moments.
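A small, illustrative sketch of how error terms having a g-and-h distribution can be generated for such a simulation (this is not the article's own code; names are assumed for illustration):

```python
import numpy as np

def gh_errors(n, g=0.0, h=0.0, rng=None):
    """Generate n observations from the g-and-h distribution of Hoaglin (1985)."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.normal(size=n)
    if g > 0:
        return (np.expm1(g * z) / g) * np.exp(h * z**2 / 2)
    return z * np.exp(h * z**2 / 2)

# One replication of the model Y = X**2 + eps with skewed, heavy-tailed errors
rng = np.random.default_rng(3)
x = rng.normal(size=40)
y = x**2 + gh_errors(40, g=0.2, h=0.2, rng=rng)
```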


Table 1: Some properties of the g-and-h distribution.

g     h     κ1 (skewness)   κ2 (kurtosis)
0.0   0.0   0.00            3.0
0.0   0.2   0.00            21.46
0.2   0.0   1.75            8.9
0.2   0.2   2.81            155.99

The four distributions used here were the standard normal (g = h = 0.0), a symmetric heavy-tailed distribution (h = 0.2, g = 0.0), an asymmetric distribution with relatively light tails (h = 0.0, g = 0.2), and a symmetric distribution with heavy tails (g = h = 0.2). In Table 1, the theoretical skewness and kurtosis for each distribution are reported. Additional properties of the g-and-h distribution are summarized by Hoaglin (1985).

A general concern about methods aimed at comparing population medians, based on the usual sample median, is that for discrete data where tied values can occur, control over the probability of a Type I error can be poor. This is the case when using the method proposed by Bonett and Price (2002) as well as a related method in Wilcox (2003, section 8.7.1). In a paper submitted for publication, the author has found that certain bootstrap methods correct this problem while others do not. The main point here is that considering discrete distributions where tied values are likely is crucial for the problem at hand. Accordingly, additional simulations were run by generating ε from a beta-binomial distribution:

P(X = x) = B(m − x + r, x + s) / [(m + 1) B(m − x + 1, x + 1) B(r, s)],

where B is the complete beta function. Here m = 10, 12, and 20 were considered. With m = 12, for example, the possible values for X are the integers 0, 1, ..., 12. The values for r and s were taken to be r = s = 4, as well as r = 1 and s = 9. For r = s = 4 the distribution is bell-shaped and symmetric with mean m/2. In Figure 1, the probability function when r = 1, s = 9, and m = 12 is exhibited.

In Table 2, the estimated probability of a Type I error when testing at the .05 level with n_1 = n_2 = 40 is shown. The estimates are based on 1,000 replications with B = 600. (From Robey & Barcikowski (1992), 1,000 replications is sufficient from a power point of view. More specifically, if the hypothesis that the actual Type I error rate is .05 is tested, and if power is to be .9 when testing at the .05 level and the true value of α differs from .05 by .025, then 976 replications are required.) The results for Y = X + ε did not reveal any new insights, and so for brevity they are not reported. To get some idea of the effect of heteroscedasticity, additional simulations were run where values in the first group were multiplied by σ_1 = 4. The g-and-h distribution has a median of zero, so the null hypothesis remains true. For the beta-binomial distributions, the data were shifted to have a median of zero before multiplying by σ_1 = 4. The top portion of Table 2 gives the results when there is homoscedasticity (σ_1 = 1).
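A minimal, illustrative sketch of how the discrete error term can be generated: it evaluates the beta-binomial probability function exactly as written above and samples from it, shifting by the median as described in the text. The function names are assumptions, not the article's.

```python
import numpy as np
from math import lgamma

def log_beta(a, b):
    """Logarithm of the complete beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(m, r, s):
    """P(X = x) for x = 0, 1, ..., m, exactly as parameterized above."""
    x = np.arange(m + 1)
    logp = [log_beta(m - xi + r, xi + s) - np.log(m + 1)
            - log_beta(m - xi + 1, xi + 1) - log_beta(r, s) for xi in x]
    return np.exp(np.array(logp))

# Discrete errors with m = 12, r = 1, s = 9, shifted to have median zero
rng = np.random.default_rng(4)
m, r, s = 12, 1, 9
p = beta_binomial_pmf(m, r, s)
vals = np.arange(m + 1)
med = vals[np.searchsorted(np.cumsum(p), 0.5)]
eps = rng.choice(vals, size=40, p=p) - med
# For the heteroscedastic case, multiply the first group's values by sigma_1 = 4.
```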


Figure 1: The beta-binomial probability function with m = 12, r = 1, and s = 9.

Table 2: Estimated Type I error probabilities


First, consider the homoscedastic case with continuous g-and-h distributions. Both methods P and M perform reasonably well. To avoid an estimated Type I error probability greater than .07, method P is preferable. Under heteroscedasticity, method M can be unsatisfactory, with estimates exceeding .08, while again method P gives fairly satisfactory results. But when tied values occur, method P can be disastrous and should not be used. Method M now performs well under homoscedasticity (σ_1 = 1), but under heteroscedasticity, it breaks down as well, with estimates exceeding .1. All simulations were repeated with n_1 = n_2 = 60; no new insights were found, so the results are not reported.

Conclusion

A positive result is that when tied values occur with probability zero, method P performs fairly well in terms of Type I errors, even when there is heteroscedasticity. However, when tied values are likely, it can be unsatisfactory. If tied values are likely and there is homoscedasticity, method M performs reasonably well, but it can break down when there is heteroscedasticity. So a possible argument in favor of method M is that when the (conditional) distributions of Y do not differ, it provides good control over the probability of a Type I error. But a negative feature is that it is sensitive to more than one feature of the data. That is, it does not isolate the reason for rejecting, which could be due to differences between medians or heteroscedasticity.

Some additional simulations were run with m = 20, r = 2, and s = 9. The ability of method P to control the probability of a Type I error improved substantially versus the situation where r = 1, but the estimated probability of a Type I error for the model Y = ε was .099. So it seems that some tied values can probably be tolerated when using method P, but it is difficult to know when this is the case.

A criticism of the sample median is that under normality, or when sampling from a light-tailed distribution, it is relatively inefficient. By trimming less, say 20%, good efficiency is obtained under normality and some protection against low efficiency due to heavy-tailed distributions is obtained. (Note that the usual sample median belongs to the class of trimmed means with the maximum amount of trimming.) However, when replacing the usual sample median with a 20% trimmed mean, the methods studied here are unsatisfactory in terms of estimated Type I errors, at least for the situations considered. Consideration was also given to estimating the population median with the Harrell and Davis (1982) estimator, with the goal of achieving better efficiency under normality, but again control over the probability of a Type I error was no longer satisfactory.

References

Atiqullah, M. (1964). The robustness of the covariance analysis of a one-way classification. Biometrika, 51, 365–372.

Bonett, D. G. & Price, R. M. (2002). Statistical inference for a linear function of medians: Confidence intervals, hypothesis testing, and sample size requirements. Psychological Methods, 7, 370–383.

Bowman, A. & Young, S. (1996). Graphical comparison of nonparametric curves. Applied Statistics, 45, 83–98.

Delgado, M. A. (1993). Testing the equality of nonparametric regression curves. Statistics and Probability Letters, 17, 199–204.

Dette, H. (1999). A consistent test for the functional form of a regression based on a difference of variance estimators. Annals of Statistics, 27, 1012–1040.

Dette, H. & Neumeyer, N. (2001). Nonparametric analysis of covariance. Annals of Statistics, 29, 1361–1400.

Donoho, D. L. & Gasko, M. (1992). Breakdown properties of the location estimates based on halfspace depth and projected outlyingness. Annals of Statistics, 20, 1803–1827.

Hall, P., Huber, C., & Speckman, P. L. (1997). Covariate-matched one-sided tests for the difference between functional means. Journal of the American Statistical Association, 92, 1074–1083.

Harrell, F. E. & Davis, C. E. (1982). A new distribution-free quantile estimator. Biometrika, 69, 635–640.

Hoaglin, D. C. (1985). Summarizing shape numerically: The g-and-h distribution. In D. Hoaglin, F. Mosteller & J. Tukey (Eds.), Exploring Data Tables, Trends, and Shapes. New York: Wiley.

Kulasekera, K. B. (1995). Comparison of regression curves using quasi-residuals. Journal of the American Statistical Association, 90, 1085–1093.

Kulasekera, K. B. & Wang, J. (1997). Smoothing parameter selection for power optimality in testing of regression curves. Journal of the American Statistical Association, 92, 500–511.

Liu, R. G. & Singh, K. (1997). Notions of limiting P values based on data depth and bootstrap. Journal of the American Statistical Association, 92, 266–277.

Munk, A. & Dette, H. (1998). Nonparametric comparison of several regression functions: Exact and asymptotic theory. Annals of Statistics, 26, 2339–2368.

Neumeyer, N. & Dette, H. (2003). Nonparametric comparison of regression curves: An empirical process approach. Annals of Statistics, 31, 880–920.

Robey, R. R. & Barcikowski, R. S. (1992). Type I error and the number of iterations in Monte Carlo studies of robustness. British Journal of Mathematical and Statistical Psychology, 45, 283–288.

Wilcox, R. R. (2003). Applying Contemporary Statistical Techniques. San Diego, CA: Academic Press.

Wilcox, R. R. (2005). Introduction to Robust Estimation and Hypothesis Testing (2nd ed.). San Diego, CA: Academic Press.

Young, S. G. & Bowman, A. W. (1995). Nonparametric analysis of covariance. Biometrics, 51, 920–931.

Appendix

For notational convenience, projection distance is described in terms of a sample of n vectors from some multivariate distribution. The sample is denoted by X_i, i = 1, ..., n. Let ξ be some multivariate measure of location. Here, ξ is taken to be the W-estimator stemming from the minimum volume ellipsoid estimator. (For a detailed discussion of the minimum volume ellipsoid estimator, see Rousseeuw & Leroy, 1987.) The outlier detection method in Rousseeuw and van Zomeren (1990) is applied, any points flagged as outliers are removed, and ξ is taken to be the mean of the remaining vectors. For any i, let

U_i = X_i − ξ,

B_i = U_i′ U_i = Σ_{k=1}^{p} U_ik²,

and for any j (j = 1, ..., n) let

W_ij = Σ_{k=1}^{p} U_ik U_jk,

and

T_ij = (W_ij / B_i)(U_i1, ..., U_ip).  (5)

The distance between ξ and the projection of X_j (when projecting onto the line connecting X_i and ξ) is

V_ij = ||T_ij||,

where ||T_ij|| is the Euclidean norm associated with the vector T_ij. Let

d_ij = V_ij / (q_2 − q_1),  (6)

where for fixed i, q_2 and q_1 are estimates of the upper and lower quartiles, respectively, of the V_ij values. (Here, the ideal fourths based on the values V_i1, ..., V_in were used; see, for example, Wilcox, 2004.) The projection distance associated with X_j, say D_j, is the maximum value of d_ij, the maximum being taken over i = 1, ..., n.
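To illustrate the computations in (5) and (6), here is a minimal Python sketch. For simplicity it takes ξ to be the coordinate-wise median and uses ordinary sample quartiles in place of the ideal fourths, so it is a simplified stand-in for the W-estimator-based version described above rather than a reimplementation of method P.

```python
import numpy as np

def projection_distances(X):
    """Projection distance D_j for each row of the n-by-p data matrix X."""
    n, p = X.shape
    xi = np.median(X, axis=0)          # simplification: coordinate-wise median as xi
    U = X - xi                         # U_i = X_i - xi
    B = np.sum(U**2, axis=1)           # B_i = U_i' U_i
    W = U @ U.T                        # W_ij = sum_k U_ik U_jk
    # V_ij = ||T_ij|| with T_ij = (W_ij / B_i) U_i, so ||T_ij|| = |W_ij| / sqrt(B_i)
    V = np.abs(W) / np.sqrt(B)[:, None]
    q1, q2 = np.percentile(V, [25, 75], axis=1)   # quartiles of V_i1, ..., V_in
    d = V / (q2 - q1)[:, None]         # d_ij, equation (6)
    return d.max(axis=0)               # D_j = max over i of d_ij

# Toy illustration
rng = np.random.default_rng(5)
print(projection_distances(rng.normal(size=(30, 5))))
```

Roughly, method P uses distances of this kind, rather than the Mahalanobis-type distances of method M, to judge how deeply the null vector is nested within the bootstrap cloud of δ* vectors.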