
Quantile Regression for Correlated Observations

Li Chen1, Lee-Jen Wei2, and Michael I. Parzen3

1 Division of Biostatistics, University of Minnesota, Minneapolis, MN 55414 2 Department of Biostatistics, Harvard University, Boston, MA 02115 3 Graduate School of Business, University of Chicago, Chicago, IL 60637

Abstract

In this paper, we consider the problem of regression analysis for data that consist of a large number of independent small groups or clusters of correlated observations. Instead of using the standard mean regression, we regress various percentiles of each marginal response variable on its covariates to obtain a more accurate assessment of the covariate effect. Our inference procedures are derived using the generalized estimating equations approach. The new proposal is robust and can be easily implemented. Graphical and numerical methods for checking the adequacy of the fitted model are also proposed. The new methods are illustrated with an animal study in toxicology.

Key Words: Estimating equations; Gaussian process; Linear programming; Omnibus test; Resampling method

1 Introduction

Although quite a few useful parametric and semi-parametric regression methods are available for analyzing correlated observations, they can only be used to evaluate the covariate effect on the mean of the response variable (Laird and Ware, 1982; Liang and Zeger, 1986). To obtain a global picture of the covariate effect on the distribution of the response variable, one may use the quantile regression model. Specifically, let $\tau$ be a constant between 0 and 1, $Y$ be the response variable and $x$ be the corresponding $(p+1) \times 1$ covariate vector. Given $x$, let the $100\tau$th percentile of $Y$ be $\beta_\tau' x$, where $\beta_\tau$ is an unknown $(p+1) \times 1$ parameter vector and may depend on $\tau$. Inference procedures for $\beta_\tau$ with a set of properly chosen $\tau$'s would provide much more information about the effect of $x$ on $Y$ than their counterparts based on the usual mean regression model (Mosteller and Tukey, 1977).

For independent observations, inference procedures for $\beta_\tau$ have been proposed, for example, by Bassett and Koenker (1978, 1982), Koenker and Bassett (1978, 1982) and Parzen et al. (1994). When $\tau = 1/2$, which corresponds to the median regression model, the celebrated $L_1$ estimator, which minimizes the sum of the absolute residuals, is consistent for $\beta_{0.5}$ (Bloomfield and Steiger, 1983).

Recently, Jung (1996) proposed an interesting quasi-likelihood equation approach for median regression models with dependent observations. However, his method assumes a known relationship between the median and the density function of the response variable. The variance estimate of his estimator for the regression parameter appears to be rather sensitive to this assumption. Moreover, Jung’s optimal estimating equations may have multiple roots and, therefore, the estimator for $\beta_\tau$ may not be well-defined.

In this paper, we present a simple and robust procedure to make inferences about $\beta_\tau$ without imposing any parametric assumption on the density function of the response variable or on the dependence structure among the correlated observations. Furthermore, our estimating functions are monotonic component-wise, and the resulting estimator for the regression parameter can be easily obtained through well-established linear programming techniques. The new proposal is illustrated with an animal study in toxicology.

2 Inferences for Regression Parameters

In this section, we derive regression methods for analyzing data that consist of a large number of independent small groups or clusters of correlated observations. Let $Y_{ij}$ be the continuous response variable for the $j$th measurement in the $i$th cluster, where $i = 1, \ldots, n$; $j = 1, \ldots, K_i$, and $K_i$ is relatively small with respect to $n$. Let $x_{ij}$ be the corresponding covariate vector. Furthermore, assume that the $100\tau$th percentile of $Y_{ij}$ is $\beta_\tau' x_{ij}$. The observations within each cluster may be dependent, but $(Y_{ij}, x_{ij})$ and $(Y_{i'j'}, x_{i'j'})$ are independent when $i \neq i'$. Note that the distribution function $F_{\tau ij}(\cdot)$ of the error term $(Y_{ij} - \beta_\tau' x_{ij})$ is completely unspecified and may involve $x_{ij}$. Suppose that we are interested in $\beta_\tau$ for a particular $\tau$. If all the observations $\{(Y_{ij}, x_{ij})\}$ are mutually independent, the following estimating functions are often used to make inferences about $\beta_\tau$:

$$W_\tau(\beta) = n^{-1/2} \sum_{i=1}^{n} \sum_{j=1}^{K_i} x_{ij} \{ I(Y_{ij} - \beta' x_{ij} \le 0) - \tau \}, \qquad (1)$$

where $I(\cdot)$ is the indicator function. For the aforementioned correlated observations, (1) are estimating functions based on the “independence working model” (Liang and Zeger, 1986) and the expected value of $W_\tau(\beta_\tau)$ is 0. Therefore, a solution $\hat\beta_\tau$ to the equations $W_\tau(\beta) = 0$ would be a reasonable estimate for $\beta_\tau$. The consistency of $\hat\beta_\tau$ can be easily established using arguments similar to those for the case of independent observations. In practice, $\hat\beta_\tau$ can be obtained by minimizing

$$\sum_{i=1}^{n} \sum_{j=1}^{K_i} \rho_\tau(Y_{ij} - \beta' x_{ij}), \qquad (2)$$

where $\rho_\tau(v)$ is $\tau v$ if $v > 0$, and $(\tau - 1)v$ if $v \le 0$ (Koenker and Bassett, 1978). This optimization problem can be handled by linear programming techniques (Barrodale and Roberts, 1973). An efficient algorithm developed by Koenker and D’Orey (1987) is available in S-PLUS to obtain a minimizer $\hat\beta_\tau$ for (2).

Using an argument similar to that given in Chamberlain (1994) for the case of independent observations, one can show that for the present case, the distribution of $n^{1/2}(\hat\beta_\tau - \beta_\tau)$ goes to a normal distribution as $n \to \infty$. The corresponding covariance matrix is $A_\tau^{-1}(\beta_\tau)\,\mathrm{var}\{W_\tau(\beta_\tau)\}\,\{A_\tau^{-1}(\beta_\tau)\}^T$, where $A_\tau(\beta)$ is the expected value of the derivative of $W_\tau(\beta)$ with respect to $\beta$. For the heteroskedastic quantile regression model considered here, it is difficult to estimate the covariance matrix because $A_\tau(\beta)$ may involve the unknown underlying density functions. Complicated and subjective nonparametric functional estimates are needed to estimate the variance directly.

Recently, Parzen et al. (1994) developed a general resampling method which can be used to approximate the distribution of $(\hat\beta_\tau - \beta_\tau)$ without involving any complicated and subjective nonparametric functional estimation. To apply this resampling method to the case with correlated observations, let

$$U_\tau = n^{-1/2} \sum_{i=1}^{n} \Big[ \sum_{j=1}^{K_i} x_{ij} \{ I(y_{ij} - \tilde\beta_\tau' x_{ij} \le 0) - \tau \} \Big] Z_i,$$

where $\{Z_i, i = 1, \ldots, n\}$ is a random sample from the standard normal population, and $y$ and $\tilde\beta_\tau$ are the observed values of $Y$ and $\hat\beta_\tau$, respectively. Note that the only component that is random in $U_\tau$ is $Z_i$. It is straightforward to show that the unconditional distribution of $W_\tau(\beta_\tau)$ and the conditional distribution of $U_\tau$ converge to the same limiting distribution. Let $w_\tau(\beta)$ be the observed $W_\tau(\beta)$. Define a random vector $\beta^*_\tau$ such that $w_\tau(\beta^*_\tau) = -U_\tau$. Then, the unconditional distribution of $(\hat\beta_\tau - \beta_\tau)$ can be approximated by the conditional distribution of $(\beta^*_\tau - \tilde\beta_\tau)$.
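As a small illustration of the objective (2), the estimating function (1) and the perturbation $U_\tau$ defined above, the following sketch evaluates all three on made-up data. The function names (`rho`, `objective`, `cluster_scores`, `w_stat`, `u_draw`) and the toy clusters are assumptions introduced here for illustration; they are not from the original paper, and no linear programming solver is involved.

```python
import math
import random

def rho(tau, v):
    """Check loss rho_tau(v): tau*v if v > 0, (tau - 1)*v otherwise."""
    return tau * v if v > 0 else (tau - 1.0) * v

def fitted(beta, x):
    """Linear predictor beta'x."""
    return sum(b * xk for b, xk in zip(beta, x))

def objective(tau, beta, clusters):
    """Objective (2): check loss summed over every member of every cluster."""
    return sum(rho(tau, y - fitted(beta, x)) for c in clusters for y, x in c)

def cluster_scores(tau, beta, clusters):
    """For each cluster i: sum_j x_ij * {I(y_ij - beta'x_ij <= 0) - tau}."""
    scores = []
    for c in clusters:
        s = [0.0] * len(beta)
        for y, x in c:
            e = (1.0 if y - fitted(beta, x) <= 0 else 0.0) - tau
            for k, xk in enumerate(x):
                s[k] += xk * e
        scores.append(s)
    return scores

def w_stat(tau, beta, clusters):
    """Estimating function (1): n^{-1/2} times the sum of the cluster scores."""
    scores = cluster_scores(tau, beta, clusters)
    root_n = math.sqrt(len(clusters))
    return [sum(s[k] for s in scores) / root_n for k in range(len(beta))]

def u_draw(tau, beta_tilde, clusters, rng):
    """One draw of U_tau: each cluster score is multiplied by its own N(0,1) Z_i."""
    scores = cluster_scores(tau, beta_tilde, clusters)
    root_n = math.sqrt(len(clusters))
    z = [rng.gauss(0.0, 1.0) for _ in clusters]
    return [sum(zi * s[k] for zi, s in zip(z, scores)) / root_n
            for k in range(len(beta_tilde))]

# Toy data (made up): 5 clusters of size 2, intercept-only model, responses 1..10.
clusters = [[(float(2 * i + 1), (1.0,)), (float(2 * i + 2), (1.0,))] for i in range(5)]
tau = 0.25
losses = {b: objective(tau, [b], clusters) for b in (2.0, 3.0, 4.0)}
w_at_3 = w_stat(tau, [3.0], clusters)
u_example = u_draw(tau, [3.0], clusters, random.Random(0))
```

In this toy case the objective is smallest at 3 among the three candidates, which is where 25% of the responses lie strictly below and 70% strictly above. Because one multiplier $Z_i$ is shared by all members of cluster $i$, the within-cluster dependence is carried into the draws of $U_\tau$ without being modeled.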
The adequacy of using the distribution of $(\beta^*_\tau - \tilde\beta_\tau)$ to approximate the unconditional distribution of $(\hat\beta_\tau - \beta_\tau)$ has been addressed by Parzen et al. (1994) through extensive simulation studies. Furthermore, the distribution of $\beta^*_\tau$ can be estimated using a large random sample $\{u_{\tau m}, m = 1, \ldots, M\}$ generated from $U_\tau$. For each realized $u_{\tau m}$, we obtain a solution $\beta^*_{\tau m}$ by solving the equation $w_\tau(\beta^*_{\tau m}) = -u_{\tau m}$, $m = 1, \ldots, M$. The covariance matrix of $\hat\beta_\tau$ can then be estimated by the empirical distribution function based on $\{\beta^*_{\tau m}, m = 1, \ldots, M\}$, for example, by $\sum_{m=1}^{M} (\beta^*_{\tau m} - \tilde\beta_\tau)(\beta^*_{\tau m} - \tilde\beta_\tau)^T / M$. The standard bootstrap method can also be used for estimating the variance of the regression parameters. However, as far as we know, there is no analytical proof that the bootstrap method is valid for the general quantile regression model with correlated observations.

In order to use existing statistical software (for example, Koenker and D’Orey, 1987) to solve an equation of the form $w_\tau(\beta) = u$ (in the resampling step, $u = -u_{\tau m}$), one may artificially create an extra data point $(y^*, x^*)$, where $x^*$ is $n^{1/2} u / \tau$ and $y^*$ is an extremely large number such that $I(y^* - \beta' x^* \le 0)$ is always 0. Let $w^*_\tau(\beta) = w_\tau(\beta) + n^{-1/2} x^* \{ I(y^* - \beta' x^* \le 0) - \tau \}$. Then, solving the equation $w_\tau(\beta) = u$ is equivalent to solving the equation $w^*_\tau(\beta) = 0$.

To illustrate the above method, we use an animal study in developmental toxicity evaluation of dietary di(2-ethylhexyl)phthalate (DEHP), a widely used plasticizing agent, in timed-pregnant mice (Tyl et al., 1988). DEHP was administered in the diet on days 6 through 15 of gestation with dose levels of 0, 44, 91, 191 and 292 (mg/kg/day). On the 17th gestational day, the maternal animals were sacrificed and all the fetuses were examined. One of the major outcomes of the study is the fetal body weight. The investigators would like to know whether DEHP has a negative effect on the fetal body weight.
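Returning briefly to the extra-data-point device described earlier in this section, the identity behind it can be checked numerically. The sketch below is only an illustration on made-up numbers; the helper names `w_obs` and `augment` and the value `big = 1e12` standing in for "an extremely large number" are assumptions, not part of the original paper.

```python
import math

def w_obs(tau, beta, data, n):
    """n^{-1/2} * sum over observations of x * {I(y - beta'x <= 0) - tau}."""
    out = [0.0] * len(beta)
    for y, x in data:
        fit = sum(b * xk for b, xk in zip(beta, x))
        e = (1.0 if y - fit <= 0 else 0.0) - tau
        for k, xk in enumerate(x):
            out[k] += xk * e
    return [v / math.sqrt(n) for v in out]

def augment(data, u, tau, n, big=1e12):
    """Append the artificial point (y*, x*): x* = n^{1/2} u / tau, y* very large."""
    x_star = tuple(math.sqrt(n) * uk / tau for uk in u)
    return data + [(big, x_star)]

# Made-up data: n = 4 independent observations, intercept plus one covariate.
tau, n = 0.5, 4
data = [(1.0, (1.0, 0.0)), (2.0, (1.0, 1.0)), (0.5, (1.0, 2.0)), (3.0, (1.0, 3.0))]
u = [0.3, -0.2]
beta = [1.0, 0.5]

w = w_obs(tau, beta, data, n)
w_star = w_obs(tau, beta, augment(data, u, tau, n), n)
diff = [a - b for a, b in zip(w_star, w)]  # equals -u up to rounding
```

The difference `w_star - w` equals `-u` at any `beta` for which the artificial response stays above its fitted value, so a root of the augmented function is a root of $w_\tau(\beta) = u$. In practice `big` has to exceed $\beta' x^*$ for every $\beta$ the solver may visit, which is why the text asks for an extremely large $y^*$.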
Since the sex of the fetus is expected to be correlated with the weight, an adjustment for this covariate in the analysis is needed. Here, the litter is the cluster and each live fetus is a member of the cluster. Furthermore, $Y_{ij}$ is the weight and $x_{ij}$ is a $3 \times 1$ vector, where the first component is one, the second is the dose level, and the third is the sex indicator for the fetus. For the animal study data, there are a total of 108 clusters, and the cluster sizes range from 2 to 16. With the aforementioned quantile regression, estimates for $\beta_\tau$ and the corresponding estimated standard errors obtained based on the estimating functions (1) are reported in the third and fourth columns in Table 1. The estimated standard error is obtained using the new method with 500 resampling samples. The results indicate that DEHP tends to have a greater impact on a light fetus than on a heavy one.

For comparison, Table 1 also gives the estimated standard errors from two heteroskedastic bootstrap procedures. The first procedure is the paired bootstrap (denoted as paired-BS) method (Efron, 1982, p. 36), where the $(x_{ij}, y_{ij})$ pair is resampled. Specifically, we resampled the clusters to accommodate the dependency and heteroskedasticity. The second procedure is the empirical quantile function bootstrap (denoted as Heqf-BS) method proposed by Koenker (1994), where the full quantile regression process $\hat\beta_\tau$ is resampled. Specifically, for each bootstrap realization of $n$ observations, $n$ vectors of dimension $p$ are drawn from the estimated regression quantile process $\hat\beta_\tau$. The bootstrapped observation $y_{ij}$ is then the inner product of the design row $x_{ij}$ and the corresponding $i$th draw from the regression quantile process. This procedure again accommodates certain forms of dependency and heteroskedasticity. The standard errors estimated by the bootstrap methods are based on 500 bootstrap samples. The results from the paired-BS are similar to those obtained from our resampling procedure.
The results from the Heqf-BS are smaller than those obtained from our resampling procedure.

For any given set of percentiles, say, $\{\tau_k, k = 1, \ldots, K\}$, one may obtain a simultaneous confidence interval for a particular component $\eta_{\tau_k}$ of $\beta_{\tau_k}$, $k = 1, \ldots, K$. More specifically, consider a class of estimating functions $\{W_{\tau_k}(\beta_{\tau_k}), k = 1, \ldots, K\}$ and the corresponding $\{U_{\tau_k}, k = 1, \ldots, K\}$, where the random sample $\{Z_i, i = 1, \ldots, n\}$ is now shared by all the $U_{\tau_k}$’s. Let

Table 1. Estimates for quantile regression for the DEHP study

                                           Standard Error
Quantile  Coefficient  Estimate   New Method  Paired-BS  Heqf-BS
0.05      Intercept     0.80      0.049       0.047      0.049
          Dose*        -0.048     0.015       0.014      0.013
          Sex          -0.019     0.028       0.027      0.027

0.25      Intercept     0.97      0.023       0.021      0.017
          Dose*        -0.048     0.013       0.012      0.007
          Sex          -0.038     0.010       0.010      0.011

0.50      Intercept     1.03      0.021       0.022      0.015
          Dose*        -0.039     0.012       0.012      0.005
          Sex          -0.036     0.094       0.091      0.082

0.75      Intercept     1.10      0.026       0.026      0.015
          Dose*        -0.028     0.014       0.013      0.007
          Sex          -0.045     0.011       0.010      0.008

0.95      Intercept     1.20      0.027       0.023      0.018
          Dose*        -0.030     0.013       0.012      0.006
          Sex          -0.047     0.012       0.009      0.010

* Estimates for dose effect are per 100 unit increase.

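$\{\beta^*_{\tau_k}, k = 1, \ldots, K\}$ be the solutions to the simultaneous equations $\{w_{\tau_k}(\beta^*_{\tau_k}) = -U_{\tau_k}, k = 1, \ldots, K\}$. Then, the joint distribution of $\{(\hat\beta_{\tau_k} - \beta_{\tau_k}), k = 1, \ldots, K\}$ can be approximated by that of $\{(\beta^*_{\tau_k} - \tilde\beta_{\tau_k}), k = 1, \ldots, K\}$. To obtain a $(1 - \alpha)$ confidence band for $\eta_{\tau_k}$, we first find a critical value $c_\alpha$ such that

$$\Pr\Big( \sup_k \{ \hat\sigma_{\tau_k}^{-1} | \eta^*_{\tau_k} - \tilde\eta_{\tau_k} | \} \le c_\alpha \Big) = 1 - \alpha,$$

where $\eta^*_{\tau_k}$ and $\tilde\eta_{\tau_k}$ are the corresponding components of $\beta^*_{\tau_k}$ and $\tilde\beta_{\tau_k}$, respectively; $\hat\sigma^2_{\tau_k}$ is the variance estimate of $\hat\eta_{\tau_k}$, obtained through the above resampling method. A confidence band of $\{\eta_{\tau_k}, k = 1, \ldots, K\}$ is then

$$\hat\eta_{\tau_k} \pm c_\alpha \hat\sigma_{\tau_k}, \quad k = 1, \ldots, K.$$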

For the animal study example, if we let $K = 5$ with $\tau_1 = 0.05$, $\tau_2 = 0.25$, $\tau_3 = 0.5$, $\tau_4 = 0.75$, and $\tau_5 = 0.95$, then a 95% confidence band, displayed by the dashed lines, for the dose effect is given in Figure 1. The corresponding critical value is 2.38 based on 1000 simulations. For comparison, we also provide the corresponding pointwise confidence intervals, displayed by the solid lines. The simultaneous confidence intervals in the figure are not too different from their pointwise counterparts. Naturally, the confidence band would become wider if $K$ gets larger.
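The construction of the band can be sketched in a few lines once the resampled components $\eta^*_{\tau_k}$ are available. In the sketch below, `critical_value` and `band` are hypothetical helper names, the small `draws` matrix is made up, and taking an order statistic as the empirical quantile is one convention among several; none of this is stated in the original paper.

```python
import math

def critical_value(draws, center, sigma, alpha):
    """Empirical (1 - alpha) quantile of max_k |eta*_k - eta~_k| / sigma_k.

    draws  : list of M rows, each holding eta*_{tau_k} for k = 1..K
    center : the K observed estimates eta~_{tau_k}
    sigma  : the K resampling standard errors
    """
    sups = sorted(
        max(abs(d - c) / s for d, c, s in zip(row, center, sigma))
        for row in draws
    )
    # Smallest order statistic with at least a (1 - alpha) share of draws at or below it.
    idx = max(0, math.ceil((1.0 - alpha) * len(sups)) - 1)
    return sups[idx]

def band(estimates, sigma, c):
    """Band limits eta^_k -/+ c * sigma_k for each k."""
    return [(e - c * s, e + c * s) for e, s in zip(estimates, sigma)]

# Made-up draws for K = 2 quantiles, M = 4 resamples (illustration only).
draws = [[1.0, 0.0], [0.0, 4.0], [-3.0, 1.0], [0.5, 0.5]]
c_toy = critical_value(draws, center=[0.0, 0.0], sigma=[1.0, 2.0], alpha=0.25)

# Median dose row of Table 1 with the reported critical value 2.38.
median_dose_band = band([-0.039], [0.012], 2.38)
```

With the reported critical value, the band limit for the median dose effect works out to $-0.039 \pm 2.38 \times 0.012$, that is, roughly $(-0.068, -0.010)$ per 100 unit increase.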

[Figure 1: dose effect (per 100 unit increase) plotted against quantiles of fetal body weight, showing the estimate, the pointwise confidence intervals and the confidence band.]

Fig. 1. Confidence intervals and band for dose effect

A very useful application of the above flexible modeling is to predict the $\tau$th quantiles of the distribution of the fetal body weight. Figure 2(a) shows the point estimates of fetal body weight for males with various dosing levels, and Figure 2(b) gives the corresponding predicted weights for females. These plots are quite informative. For example, one may readily conclude that, with a 20% chance, the weight of a male whose mother was treated with the highest dose is less than 0.77 grams. On the other hand, if the mother were not exposed to DEHP, the corresponding weight would be 0.91 grams.
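As a rough worked illustration of how such predictions follow from the fitted coefficients, take the $\tau = 0.25$ row of Table 1, where the dose coefficient is expressed per 100 units. The coding of the sex indicator is not stated in the text, so it is left as a symbol $s \in \{0, 1\}$ here; and since Table 1 does not report $\tau = 0.20$, these numbers are not expected to reproduce the 0.77 and 0.91 grams quoted above.

$$\hat{Q}_{0.25}(\text{dose}, s) = 0.97 - 0.048 \times \frac{\text{dose}}{100} - 0.038\, s$$

$$\hat{Q}_{0.25}(292, s) = 0.97 - 0.048 \times 2.92 - 0.038\, s \approx 0.830 - 0.038\, s, \qquad \hat{Q}_{0.25}(0, s) = 0.97 - 0.038\, s$$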

If the effect of a particular covariate, say $\xi_k$ of $\beta_{\tau_k}$, is about the same across the set of quantiles $\tau_k$, $k = 1, \ldots, K$, one may want to combine the $\hat\xi_k$’s, obtained from the $\hat\beta_{\tau_k}$’s, to make inferences about the common parameter $\xi$. To this end, consider an “optimal” linear combination $\hat\xi = \sum_{k=1}^{K} a_k \hat\xi_k$, where $a = (a_1, \ldots, a_K)' = \Gamma^{-1} e / \{ e' \Gamma^{-1} e \}$, $e = (1, \ldots, 1)'$ is a $K$-dimensional vector and $\Gamma$ is the $K \times K$ covariance matrix of the $\hat\xi_k$’s. Note that asymptotically, $\hat\xi$ has the smallest variance among all the linear combinations of the $\hat\xi_k$’s (Wei and Johnson, 1985). Note also that even if the covariate effects $\xi_k$ are unequal across different quantiles, in practice one may still combine the $\hat\xi_k$’s to draw a conclusion about the “average effect” of the covariate, provided that there are no qualitative differences among the $\hat\xi_k$’s. For the DEHP study example, if the dose effects are about the same for $\tau_1$ through $\tau_5$, then the common dose effect (per 100 unit increase) is $\hat\xi = -0.041$ with an estimated standard error of 0.01.
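The weights $a = \Gamma^{-1} e / \{e' \Gamma^{-1} e\}$ are straightforward to compute once an estimate of $\Gamma$ is in hand. The following sketch uses a small hand-written linear solver to stay dependency-free; the function names and the two example matrices are made up for illustration, since the estimated $\Gamma$ for the DEHP study is not reported in the text.

```python
def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def optimal_weights(gamma):
    """a = Gamma^{-1} e / (e' Gamma^{-1} e), with e a vector of ones."""
    z = solve(gamma, [1.0] * len(gamma))
    total = sum(z)
    return [zi / total for zi in z]

def combine(xi_hat, gamma):
    """Combined estimate sum_k a_k xi_k and its variance a' Gamma a."""
    a = optimal_weights(gamma)
    k = len(a)
    est = sum(ai * xi for ai, xi in zip(a, xi_hat))
    var = sum(a[i] * gamma[i][j] * a[j] for i in range(k) for j in range(k))
    return est, var

# Made-up covariance matrices (illustration only).
a_diag = optimal_weights([[1.0, 0.0], [0.0, 4.0]])    # more weight on the precise estimate
est, var = combine([1.0, 2.0], [[1.0, 0.0], [0.0, 4.0]])
a_corr = optimal_weights([[2.0, 1.0], [1.0, 2.0]])    # equal variances give equal weights
```

For uncorrelated estimates the weights reduce to inverse-variance weights, and the variance of the combination equals $1 / (e' \Gamma^{-1} e)$, which is never larger than the smallest diagonal entry of $\Gamma$.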

[Figure 2: predicted fetal body weight plotted against quantile for the control group and for doses 44, 91, 191 and 292; panel (a) males, panel (b) females.]

Fig. 2. Predicting fetal body weight for the DEHP study with various dosing levels

3 Simulation Studies

To examine the performance of the proposed resampling method, we conducted simulation studies for median regression. In the simulation studies, we generated 500 samples $\{(Y_{ij}, x_{ij}), i = 1, \ldots, 50; j = 1, 2\}$ from the following linear model: $Y_{ij} = \beta_{0.5} x_{ij} + e_i + \varepsilon_{ij}$, where $\beta_{0.5} = 1$, $\{x_{ij}\}$ is a realization of a random sample from the uniform distribution on (0, 1), and $e_i$ is a standard normal variable. Two models for $\varepsilon_{ij}$ are considered: (a) $\varepsilon_{ij}$ is a normal variable with mean 0 and variance 0.5; (b) $\varepsilon_{ij}$ is a normal variable with mean 0 and variance proportional to $x_{ij}$. For each simulated sample, the distribution of $(\beta^*_{0.5} - \tilde\beta_{0.5})$ was estimated based on 500 samples from $U_{0.5}$. The standard and percentile methods (Efron and Tibshirani, 1986, pp. 67-70) were then used to construct confidence intervals for the regression coefficient corresponding to $x_{ij}$. The empirical coverage probabilities and estimated average lengths for these intervals are summarized in Table 2 for the constant variance and Table 3 for the heteroskedastic variance. For comparison, we also report the results based on the paired bootstrap method and the empirical quantile function bootstrap method in the tables.

In general, the resampling procedure performs well. The paired bootstrap method also performs well, but the empirical quantile function bootstrap method has lower coverage probabilities. These findings are consistent with those in the paper by Koenker (1994).

We also performed simulation studies on the performance of the new method with variable cluster size. We generated 500 samples $\{(Y_{ij}, x_{ij}), i = 1, \ldots, 50; j = 1, \ldots, n_i\}$ from the above linear model. The cluster size $n_i$ was chosen at random from the numbers between 2 and 10. Again, for each of the 500 simulations, 500 resampling samples were used. Table 4 displays the empirical coverage probability and the estimated mean length for median regression with Gaussian error of mean 0 and either constant or heteroskedastic variance. As demonstrated in the table, the new method performs well in the case of variable cluster sizes. The computing time for this simulation was 12 minutes on a Sun Solaris II machine.

The set of estimating functions $W_\tau(\beta)$ in (1) is a special case of the functions $\tilde W_\tau$ considered by Jung (1996) for estimating $\beta_\tau$, where

Table 2. Empirical coverage probabilities (ECP) and estimated mean lengths (EML) for Gaussian error with mean 0 and variance 0.5 for median regression

Confidence           New Method     Paired-BS      Heqf-BS
level      Method    ECP    EML     ECP    EML     ECP    EML
0.95       S         0.95   1.28    0.93   1.25    0.88   1.06
           P         0.95   1.23    0.94   1.19    0.91   1.03
0.90       S         0.91   1.06    0.89   1.04    0.84   0.85
           P         0.90   1.01    0.89   0.99    0.85   0.82

S: standard method; P: percentile method

Table 3. Empirical coverage probabilities (ECP) and estimated mean lengths (EML) for Gaussian error with mean 0 and heteroskedastic variance for median regression

Confidence           New Method     Paired-BS      Heqf-BS
level      Method    ECP    EML     ECP    EML     ECP    EML
0.95       S         0.93   1.22    0.92   1.19    0.90   1.08
           P         0.93   1.17    0.92   1.13    0.91   1.04
0.90       S         0.89   1.02    0.89   1.00    0.81   0.81
           P         0.89   0.98    0.90   0.96    0.82   0.78

S: standard method; P: percentile method

Table 4. Empirical coverage probabilities (ECP) and estimated mean lengths (EML) for Gaussian error with mean 0 and either constant or heteroskedastic variance for median regression for variable cluster size

                     Constant       Heteroskedastic
Confidence           variance       variance
level      Method    ECP    EML     ECP    EML
0.95       S         0.94   1.16    0.95   1.19
           P         0.95   1.13    0.95   1.18
0.90       S         0.92   0.99    0.91   0.99
           P         0.90   0.97    0.91   0.98

S: standard method; P: percentile method

$$\tilde W_\tau(\beta) = n^{-1/2} \sum_{i=1}^{n} R_i \begin{pmatrix} I(Y_{i1} - \beta' x_{i1} \le 0) - \tau \\ \vdots \\ I(Y_{iK_i} - \beta' x_{iK_i} \le 0) - \tau \end{pmatrix}$$

and $R_i$ is a $K_i \times K_i$ matrix which may involve $x_{ij}$, $j = 1, \ldots, K_i$, and $\beta$. Under some regularity conditions on $R_i$, a root of the equation $\tilde W_\tau(\beta) = 0$ is consistent (Jung, 1996). In theory, the inclusion of $R_i$, which accounts for the dependence among the correlated measurements, may achieve greater efficiency than the procedures based on (1). However, if $R_i$ depends on $\beta$, the equation $\tilde W_\tau(\beta) = 0$ may have multiple roots. Furthermore, empirically we have found that such efficiency improvement is quite small, if there is any. For example, in the simulation study using the two models mentioned above, the mean squared errors (MSE) are estimated. Jung’s optimal estimating functions given in his Section 6 were used for comparison. Jung’s variance estimate for $\beta_\tau$ is derived by assuming a specific relationship between the median and the dispersion of the response variable. If this assumption is violated, his inference procedure may not be valid.

The MSEs for our estimator and Jung’s are displayed in Table 5 for $\beta_{0.5}$. Also displayed in Table 5 are the simulated MSEs when the covariate $x_{ij}$ is discrete. Specifically, $\{x_{ij}\}$ is a realization of a random sample from the Bernoulli(0.5) distribution. As seen in Table 5, if the variance of $\varepsilon_{ij}$ in the above linear model is proportional to $x_{ij}$, but we assume that the variance of $\varepsilon_{ij}$ is constant for Jung’s method, the loss of efficiency of Jung’s method compared with the independence working model of the new method ranges from 30% to 54%. We were not able to make comparisons for quantiles other than the median because Jung’s method was developed for median regression.

Table 5. Mean MSEs for the new method and Jung’s method

Sample  Distribution                       MSE             MSE ratio
size    xij        ei       εij            Jung    New     (Jung/New)
50      Ber(0.5)   N(0,1)   N(0,.5)        0.050   0.049   1.01
                            N(0,.5)xij     0.065   0.050   1.30
        Uni(0,1)   N(0,1)   N(0,.5)        0.060   0.058   1.02
                            N(0,.5)xij     0.066   0.048   1.34

100     Ber(0.5)   N(0,1)   N(0,.5)        0.033   0.032   1.04
                            N(0,.5)xij     0.040   0.030   1.32
        Uni(0,1)   N(0,1)   N(0,.5)        0.043   0.042   1.03
                            N(0,.5)xij     0.042   0.028   1.54

4 Model Checking

If the deterministic portion of the fitted quantile regression model is correctly specified, the inference procedures discussed in Section 2 are valid even when the error term $(Y - \beta_\tau' x)$ in the model depends on the covariates. For the aforementioned animal study example, the crucial modeling assumption is that the $100\tau$th percentile of the response variable is linearly related to the covariates. The plot, given in Figure 3, of the ordinary residuals against the dose level for median regression offers little information on the adequacy of the fitted quantile regression model. The difficulty with using such plots for model checking is that we simply have no knowledge about the behavior of those individual correlated residuals even under the simple additive quantile regression model.

Consider the random components $\{e_{\tau ij}(\beta), j = 1, \ldots, K_i; i = 1, \ldots, n\}$ in the estimating function (1), where $e_{\tau ij}(\beta) = I(Y_{ij} - \beta' x_{ij} \le 0) - \tau$. The quantities $\{e_{\tau ij}(\hat\beta_\tau)\}$ resemble ordinary residuals in linear models. For example, $\sum_{i=1}^{n} \sum_{j=1}^{K_i} e_{\tau ij}(\hat\beta_\tau) = 0$, and under the fitted model, for large $n$, $E\{e_{\tau ij}(\hat\beta_\tau)\} = 0$. Note that $e_{\tau ij}(\hat\beta_\tau)$ is either $1 - \tau$ or $-\tau$. Therefore, it is difficult to use such individual “quantile residuals” graphically to examine the adequacy of the assumed quantile regression model. Here, we show how to use

[Figure 3: ordinary residuals plotted against dose (0 to 300).]

Fig. 3. Ordinary residuals against dose level for median regression

cumulative sums of quantile residuals to examine the model assumption graphically and numerically (see Lin, Wei and Ying, 2002, for a general review of this approach to model checking). First, consider the following process:

$$V_{\tau l}(\beta; t) = n^{-1/2} \sum_{i=1}^{n} \sum_{j=1}^{K_i} e_{\tau ij}(\beta) I(x_{lij} \le t),$$

where $t \in \mathbb{R}$, $l = 1, \ldots, p$, and $x_{ij} = (1, x_{1ij}, \ldots, x_{pij})'$. If the $100\tau$th percentile of $Y_{ij}$, given $x_{ij}$, is $\beta_\tau' x_{ij}$, we expect $V_{\tau l}(\hat\beta_\tau; t)$ to behave approximately like a Gaussian process. One may plot the observed $v_{\tau l}(\tilde\beta_\tau; t)$ of $V_{\tau l}(\hat\beta_\tau; t)$ to see if it is an unusual realization of a zero-mean normal process. For the animal study, the above process $v_{\tau l}(\cdot)$ is plotted against the dose level $\{x_{1ij}\}$ and is displayed by the solid curves in Figure 4.

Next, the solid curves in Figure 4 are compared with the null distribution of $V_{\tau l}(\hat\beta_\tau; t)$. In the Appendix, we show that this null distribution can be approximated by the conditional distribution of $V^*_{\tau l}(t)$:

$$V^*_{\tau l}(t) = n^{-1/2} \sum_{i=1}^{n} \Big[ \sum_{j=1}^{K_i} \{ I(y_{ij} - \tilde\beta_\tau' x_{ij} \le 0) - \tau \} I(x_{lij} \le t) \Big] Z_i + v_{\tau l}(\beta^*_\tau; t) - v_{\tau l}(\tilde\beta_\tau; t),$$

where $\{Z_i, i = 1, \ldots, n\}$ is the random sample which generates $U_\tau$ in Section 2. Note that the process $\{V^*_{\tau l}(t)\}$ can be easily simulated. First, we generate a random sample $\{Z_i\}$. For this particular sample, we obtain $u_\tau$, $\beta^*_\tau$ and then a realization $\{v^*_{\tau l}(t)\}$. For the animal data, 30 such realizations, displayed by the dotted curves, against the dose level are presented in Figure 4. The solid curves do not seem to be unusual with respect to their dotted counterparts.

One can also plot the cumulative sums of $\{e_{\tau ij}(\hat\beta_\tau)\}$ against the predicted values $\{\hat\beta_\tau' x_{ij}\}$ based on the process:

$$S_\tau(\hat\beta_\tau; t) = n^{-1/2} \sum_{i=1}^{n} \sum_{j=1}^{K_i} e_{\tau ij}(\hat\beta_\tau) I(\hat\beta_\tau' x_{ij} \le t).$$

In the Appendix, we show that the null distribution of the process can be approximated by the conditional distribution of $S^*_\tau(t)$:

$$S^*_\tau(t) = n^{-1/2} \sum_{i=1}^{n} \Big[ \sum_{j=1}^{K_i} \{ I(y_{ij} - \tilde\beta_\tau' x_{ij} \le 0) - \tau \} I(\tilde\beta_\tau' x_{ij} \le t) \Big] Z_i + s_\tau(\beta^*_\tau; t) - s_\tau(\tilde\beta_\tau; t),$$

where $s_\tau$ is the observed $S_\tau$. Thirty realizations generated from $S^*_\tau(t)$, displayed by dotted curves, are given in Figure 5. Again, compared with those dotted curves, the solid curves of the observed $s_\tau(\tilde\beta_\tau; t)$ do not seem atypical.

If we fit the above animal study data with log(dose) instead of the original dose level for the median regression model (1), the curves based on $\{v_{0.5,1}(\tilde\beta_{0.5}; t)\}$ and $\{v^*_{0.5,1}(t)\}$ against dose level are given in Figure 6. This model does not seem to fit the data well.
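The two cumulative-sum processes above differ only in the quantity used to order the observations, so one routine can serve for both. The sketch below computes the observed process and, when multipliers are supplied, the first (multiplier) term of the starred process; the remaining term $v_{\tau l}(\beta^*_\tau; t) - v_{\tau l}(\tilde\beta_\tau; t)$ requires refitting to obtain $\beta^*_\tau$ and is deliberately left out. The function names, the `order_by` argument and the toy data are assumptions made for illustration.

```python
import math

def fitted(beta, x):
    """Linear predictor beta'x."""
    return sum(b * xk for b, xk in zip(beta, x))

def residual(tau, beta, y, x):
    """Quantile residual e = I(y - beta'x <= 0) - tau."""
    return (1.0 if y - fitted(beta, x) <= 0 else 0.0) - tau

def cusum(tau, beta, clusters, order_by, grid, z=None):
    """n^{-1/2} sum_i z_i sum_j e_ij I(order_by(x_ij) <= t), for each t in grid.

    With z=None every z_i is 1, which gives the observed process.
    With N(0,1) multipliers z_i it gives the multiplier term of the starred process.
    """
    n = len(clusters)
    if z is None:
        z = [1.0] * n
    path = []
    for t in grid:
        total = 0.0
        for zi, cluster in zip(z, clusters):
            inner = sum(residual(tau, beta, y, x)
                        for y, x in cluster if order_by(x) <= t)
            total += zi * inner
        path.append(total / math.sqrt(n))
    return path

# Made-up data: 2 clusters of size 2, x = (1, covariate), tau = 0.5.
clusters = [[(1.0, (1.0, 0.0)), (3.0, (1.0, 1.0))],
            [(0.0, (1.0, 2.0)), (5.0, (1.0, 3.0))]]
tau, beta = 0.5, [1.0, 1.0]
grid = [0.0, 1.0, 2.0, 3.0]

v_obs = cusum(tau, beta, clusters, lambda x: x[1], grid)              # against covariate
s_obs = cusum(tau, beta, clusters, lambda x: fitted(beta, x),
              [1.0, 2.0, 3.0, 4.0])                                   # against predicted value
v_mult = cusum(tau, beta, clusters, lambda x: x[1], grid, z=[1.0, -1.0])
g_obs = max(abs(v) for v in v_obs)                                    # sup-type summary
```

Passing a function for `order_by` keeps the covariate version ($V_{\tau l}$) and the predicted-value version ($S_\tau$) in one place; a real analysis would evaluate the path on the sorted distinct values of the ordering variable rather than on a hand-picked grid.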

[Figure 4: cumulative residuals plotted against dose (0 to 300) in three panels: (a) quantile = 0.25, (b) quantile = 0.50, (c) quantile = 0.75.]

Fig. 4. Cumulative sums of residuals against dose level

[Figure 5: cumulative residuals plotted against predicted value in three panels: (a) quantile = 0.25, (b) quantile = 0.50, (c) quantile = 0.75.]

Fig. 5. Cumulative sums of residuals against predicted value

[Figure 6: cumulative residuals plotted against dose (0 to 300).]

Fig. 6. Cumulative sums of residuals against dose level by fitting a median regression model with logarithm of dose level

Figures 4 and 5 provide much more information regarding the adequacy of the fitted additive quantile regression model than the usual residual plot in Figure 3. One may make the above graphical procedures even more objective by supplementing them with some numerical values that measure how extreme the observed $\{v_{\tau l}(t)\}$ and $\{s_\tau(t)\}$ are under the fitted model. For example, let $G_{\tau l} = \sup_t |V_{\tau l}(\hat\beta_\tau; t)|$ and $g_{\tau l}$ be its observed value. Then the probabilities $p_{\tau l} = \Pr(G_{\tau l} \ge g_{\tau l})$ would be reasonable candidates for such numerical measures. These probabilities can be estimated by simulating $\{V^*_{\tau l}(t)\}$. For the solid curves in Figure 4, estimates of such p-values based on 500 realizations of $\sup_t |V^*_{\tau l}(t)|$ are 0.694, 0.346 and 0.906 for $\tau$ being 0.25, 0.50 and 0.75, respectively. For Figure 5, the corresponding p-values are 0.998, 0.676 and 0.994, respectively. For Figure 6, the p-value is 0.056.

One can also plot the partial sums of $\{e_{\tau ij}(\hat\beta_\tau)\}$ against a covariate which is not included in the fitted model to assess whether it is an important predictor. Although the diagnostic plots against individual explanatory variables in Figures 4, 5 and 6 are useful for checking the adequacy of the assumed model, they may not be able to detect, for example, the existence of high-order interaction terms in the model. To tackle this problem, one may consider a high-dimensional residual plot based on the following multi-parameter process:

$$\mathbf{V}_\tau(\hat\beta_\tau; \mathbf{t}) = n^{-1/2} \sum_{i=1}^{n} \sum_{j=1}^{K_i} \{ I(Y_{ij} - \hat\beta_\tau' x_{ij} \le 0) - \tau \} \begin{pmatrix} I(x_{1ij} \le t_1) \\ \vdots \\ I(x_{pij} \le t_p) \end{pmatrix},$$

where $\mathbf{t} = (t_1, \ldots, t_p)'$. If the $100\tau$th percentile of $Y$, given $x$, is $\beta_\tau' x$, we would expect the partial-sum process $\{\mathbf{V}_\tau(\hat\beta_\tau; \mathbf{t})\}$ to fluctuate about 0. Let $\mathbf{v}_\tau(\tilde\beta_\tau; \mathbf{t})$ be the observed $\mathbf{V}_\tau(\hat\beta_\tau; \mathbf{t})$. Using arguments similar to those in the Appendix, one can show that the null distribution of $\mathbf{V}_\tau(\hat\beta_\tau; \mathbf{t})$ may be approximated by that of $\mathbf{V}^*_\tau(\mathbf{t})$, where

$$\mathbf{V}^*_\tau(\mathbf{t}) = n^{-1/2} \sum_{i=1}^{n} \Big[ \sum_{j=1}^{K_i} \{ I(y_{ij} - \tilde\beta_\tau' x_{ij} \le 0) - \tau \} \begin{pmatrix} I(x_{1ij} \le t_1) \\ \vdots \\ I(x_{pij} \le t_p) \end{pmatrix} \Big] Z_i + \mathbf{v}_\tau(\beta^*_\tau; \mathbf{t}) - \mathbf{v}_\tau(\tilde\beta_\tau; \mathbf{t}).$$

Again, the distribution of $\mathbf{V}^*_\tau$ can be easily obtained through simulation. Presumably one may compare the observed $\mathbf{v}_\tau(\tilde\beta_\tau; \mathbf{t})$ with a number of realizations generated from $\mathbf{V}^*_\tau(\mathbf{t})$ in a high-dimensional plot to see if there is a lack of fit of the assumed model. Unfortunately, this may not be feasible if $p > 2$. On the other hand, numerical lack-of-fit tests can be easily constructed based on the process $\mathbf{V}_\tau(\hat\beta_\tau; \mathbf{t})$. For example, a large value of $\sup_{\mathbf{t}} \|\mathbf{V}_\tau(\hat\beta_\tau; \mathbf{t})\|$ suggests that the fitted model may be misspecified. Using the above simple additive quantile regression model to fit the animal study data with three covariates, the p-values of this sup-type test based on 500 realizations from $\mathbf{V}^*_\tau(\mathbf{t})$ are 0.71, 0.34 and 0.93 for $\tau$ being 0.25, 0.5 and 0.75, respectively. It can be shown that this sup-type statistic gives an omnibus test. That is, the test is consistent against a general alternative. An omnibus test, however, may not be very powerful against some particular alternatives. In practice, we recommend that both the numerical and the graphical methods proposed here be used for model checking.
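The simulated p-values reported above are simply the share of simulated paths whose supremum is at least as large as the observed one. A minimal sketch, assuming the observed path and the simulated realizations have already been evaluated on a common grid (the function names and numbers are made up for illustration):

```python
def sup_stat(path):
    """Sup-type statistic of a scalar path: max_t |v(t)| over the grid."""
    return max(abs(v) for v in path)

def sup_p_value(observed_path, simulated_paths):
    """Share of simulated realizations whose sup is >= the observed sup."""
    g = sup_stat(observed_path)
    hits = sum(1 for path in simulated_paths if sup_stat(path) >= g)
    return hits / len(simulated_paths)

# Made-up paths (illustration only).
observed = [0.1, -0.6, 0.2]
simulated = [[0.5, 0.1], [-0.7, 0.2], [0.6, 0.0], [0.3, -0.9]]
p_toy = sup_p_value(observed, simulated)
```

For the vector-valued process one would replace `abs` by a norm of the vector at each grid point; which norm to use is not specified in the text.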

5 Remarks

The new procedure can be used to analyze data comprised of a group of repeated measurements over time. This type of correlated data is often encountered in medical studies. It is important to note that for such repeated measurements, our method is valid when the missing observations are missing completely at random. For the usual mean regression problem, several methods have been proposed to analyze longitudinal data that are subject to informative censoring (Wu and Carroll, 1988; Baker, Wax and Patterson, 1993; Rotnitzky and Robins, 1995). Recently, Lipsitz, Fitzmaurice, Molenberghs and Zhao (1997) proposed a set of weighted estimating equations for quantile regression to handle data whose missing observations are missing at random. Their novel proposal, however, does not have theoretical justification. It would be interesting to investigate whether the resampling method discussed in the present paper is applicable to those weighted estimating functions.

We proposed a graphical method for checking the adequacy of the assumed quantile regression model. More work is clearly needed in this area. Recently, a goodness-of-fit process for quantile regression analogous to the conventional $R^2$ statistic of least squares regression has been introduced for independent observations (Koenker and Machado, 1999). Their related tests are based on some sup-type statistics. They mentioned that it is possible to expand the test to Cramér-von Mises forms, where the test would be based on an integral of the square of the regression quantile process over $\tau$. It would be useful to study whether the methods can be extended to handle quantile regression with correlated observations.

Acknowledgments: The authors are grateful to Dr. Robert Gray for his helpful comments on this paper and to Dr. Paul Catalano for providing the dataset.

Appendix

∗ We will give a heuristic justification of using the distribution of {Vτl(t)} to approximate that of {Vτl(βˆτ ; t)}. With a minor modification of Theorem 1 in Lai and Ying (1988), one can show that there exists a deterministic row vector A(t) such that

1/2 Vτl(βˆτ ; t)=Vτl(βτ ; t)+A(t)n (βˆτ − βτ )+o(1), a.s.

Also,

∗ 1/2 ∗ vτl(βτ ; t)=vτl(β˜τ ; t)+A(t)n (βτ − β˜τ )+o(1). (A.1)

Recall that βˆτ is a solution to the estimating equations: Wτ (β) = 0. It follows 1/2 from the argument in Appendix 2 of Parzen et al. (1994) that n (βˆτ − βτ ) ≈ Cτ Wτ (βτ ), where Cτ is a deterministic matrix. Hence the limiting distribution of {Vτl(βˆτ ; t)} is the same as that of ⎡ ⎤ n Ki −1/2 ⎣ ⎦ n {I(yij − β˜τ xij ≤ 0) − τ}I(xlij ≤ t) Zi + A(t)Cτ Uτ . (A.2) i=1 j=1

On the other hand, the random vector $\beta^*_\tau$ is a solution to $w_\tau(\beta^*_\tau) = -U_\tau$. Again, from the argument in Appendix 2 of Parzen et al. (1994), $n^{1/2}(\beta^*_\tau - \tilde\beta_\tau) = C_\tau U_\tau + o(1)$. This, coupled with (A.1), gives

$$v_{\tau l}(\beta^*_\tau; t) - v_{\tau l}(\tilde\beta_\tau; t) = A(t)\, C_\tau U_\tau + o(1). \tag{A.3}$$

From (A.2) and (A.3), we obtain the desired asymptotic equivalence.

Next, we show that $S_\tau(\hat\beta_\tau; t)$ and $S^*_\tau(t)$ have the same limiting distribution. Taking a linear expansion of $S_\tau(\hat\beta_\tau; t)$ at $\beta_\tau$, we can approximate $S_\tau(\hat\beta_\tau; t)$ by $S_\tau(\beta_\tau; t) + D(t)\, n^{1/2}(\hat\beta_\tau - \beta_\tau)$ with a deterministic row vector $D(t)$. Likewise, we can approximate $s_\tau(\beta^*_\tau; t) - s_\tau(\tilde\beta_\tau; t)$ by $D(t)\, n^{1/2}(\beta^*_\tau - \tilde\beta_\tau)$, where the slope remains the same because $\beta^*_\tau$ and $\tilde\beta_\tau$ are close to $\beta_\tau$. Thus

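$$S_\tau(\hat\beta_\tau; t) = n^{-1/2} \sum_{i=1}^{n} \sum_{j=1}^{K_i} e_{\tau ij}(\beta_\tau)\, I(\beta_\tau' x_{ij} \le t) + D(t)\, C_\tau\, n^{-1/2} \sum_{i=1}^{n} \sum_{j=1}^{K_i} e_{\tau ij}(\beta_\tau)\, x_{ij} + o(1),$$

and

$$S^*_\tau(t) = n^{-1/2} \sum_{i=1}^{n} \left[ \sum_{j=1}^{K_i} e_{\tau ij}(\tilde\beta_\tau)\, I(\tilde\beta_\tau' x_{ij} \le t) \right] Z_i + D(t)\, C_\tau\, n^{-1/2} \sum_{i=1}^{n} \left[ \sum_{j=1}^{K_i} e_{\tau ij}(\tilde\beta_\tau)\, x_{ij} \right] Z_i + o(1).$$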

By comparing the preceding two approximations, it is clear that $S_\tau(\hat\beta_\tau; t)$ and $S^*_\tau(t)$ converge to the same limiting distribution.
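One way such an equivalence could be used in practice is to compare the observed process with a collection of resampled paths. The sketch below assumes a sup-type comparison, which is a common choice but is an assumption here; the function name and the toy arrays are hypothetical and the numbers are not results from the paper.

```python
import numpy as np

def sup_test_p_value(s_obs, s_star):
    """Share of resampled paths whose sup-norm is at least the observed sup-norm.

    s_obs:  observed process on a grid of t values, shape (T,)
    s_star: resampled processes on the same grid, shape (M, T)
    """
    s_obs = np.asarray(s_obs, dtype=float)
    s_star = np.asarray(s_star, dtype=float)
    observed_sup = np.max(np.abs(s_obs))
    resampled_sups = np.max(np.abs(s_star), axis=1)
    return float(np.mean(resampled_sups >= observed_sup))

# Toy input (hypothetical): one observed path and four resampled paths.
s_obs = [0.0, 1.0, -3.0]
s_star = [[0.0, 0.0, 1.0],
          [4.0, 0.0, 0.0],
          [0.0, -3.0, 0.0],
          [2.0, 2.0, 2.0]]
p_value = sup_test_p_value(s_obs, s_star)  # two of the four paths reach a sup of 3 or more
```

In a real analysis the number of resampled paths would be much larger, and each path would be generated with a fresh set of multipliers $Z_i$.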

References

Baker, S. G., Wax, Y. and Patterson, B. (1993). Regression analysis of grouped survival data: Informative censoring and double sampling. Biometrics 49, 379–390.
Barrodale, I. and Roberts, F. (1973). An improved algorithm for discrete L1 linear approximations. SIAM Journal on Numerical Analysis 10, 839–848.
Bassett, G. Jr. and Koenker, R. (1978). Asymptotic theory of least absolute error regression. J. Am. Statist. Assoc. 73, 618–622.
Bassett, G. Jr. and Koenker, R. (1982). An empirical quantile function for linear models with iid errors. J. Am. Statist. Assoc. 77, 407–415.
Bloomfield, P. and Steiger, W. L. (1983). Least Absolute Deviations: Theory, Applications, and Algorithms. Birkhäuser, Boston, Mass.
Chamberlain, G. (1994). Quantile regression, censoring and the structure of wages. In Proceedings of the Sixth World Congress of the Econometrics Society (eds. C. Sims and J. J. Laffont). New York: Cambridge University Press.
Efron, B. and Tibshirani, R. (1986). Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statist. Sci. 1, 54–75.

Jung, S. (1996). Quasi-likelihood for median regression models. J. Am. Statist. Assoc. 91, 251–257.
Koenker, R. (1994). Confidence intervals for regression quantiles. Proc. of the 5th Prague Symp. on Asymptotic Stat., 349–359, Springer-Verlag.
Koenker, R. and Bassett, G. Jr. (1978). Regression quantiles. Econometrica 46, 33–50.
Koenker, R. and Bassett, G. Jr. (1982). Tests of linear hypotheses and L1 estimation. Econometrica 50, 1577–1584.
Koenker, R. and D'Orey, V. (1987). Computing regression quantiles. Applied Statistics 36, 383–393.
Koenker, R. and Machado, J. A. F. (1999). Goodness of fit and related inference processes for quantile regression. J. Am. Statist. Assoc. 94, 1296–1310.
Lai, T. L. and Ying, Z. (1988). Stochastic integrals of empirical-type processes with applications to censored regression. Journal of Multivariate Analysis 27, 334–358.
Laird, N. M. and Ware, J. H. (1982). Random-effects models for longitudinal data. Biometrics 38, 963–974.
Liang, K. Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13–22.
Lin, D. Y., Wei, L. J. and Ying, Z. (2002). Model-checking techniques based on cumulative residuals. Biometrics 58, 1–12.
Lipsitz, S. R., Fitzmaurice, G. M., Molenberghs, G. and Zhao, L. P. (1997). Quantile regression models for longitudinal data with drop-out: Application to CD4 cell counts of patients infected with the human immunodeficiency virus. Applied Statistics 46, 463–476.
Mosteller, F. and Tukey, J. W. (1977). Data Analysis and Regression: A Secondary Course in Statistics. Addison-Wesley.
Parzen, M. I., Wei, L. J. and Ying, Z. (1994). A resampling method based on pivotal estimating functions. Biometrika 81, 341–350.
Portnoy, S. (1992). A regression quantile based statistic for testing non-stationary errors. In Nonparametric Statistics and Related Topics, ed. by A. Saleh, 191–203.
Rotnitzky, A. and Robins, J. M. (1995). Semi-parametric regression estimation in the presence of dependent censoring. Biometrika 82, 805–820.
Tyl, R. W., Price, M. C., Marr, M. C. and Kimmel, C. A. (1988). Developmental toxicity evaluation of dietary di(2-ethylhexyl)phthalate in Fischer 344 rats and CD-1 mice. Fundamental and Applied Toxicology 10, 395–412.
Wei, L. J. and Johnson, W. E. (1985). Combining dependent tests with incomplete repeated measurements. Biometrika 72, 359–364.
Wu, M. C. and Carroll, R. J. (1988). Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics 44, 175–188.