Quantile Regression for Correlated Observations
Li Chen1, Lee-Jen Wei2, and Michael I. Parzen3
1 Division of Biostatistics, University of Minnesota, Minneapolis, MN 55414
2 Department of Biostatistics, Harvard University, Boston, MA 02115
3 Graduate School of Business, University of Chicago, Chicago, IL 60637
Abstract
In this paper, we consider the problem of regression analysis for data that consist of a large number of independent small groups, or clusters, of correlated observations. Instead of using the standard mean regression, we regress various percentiles of each marginal response variable on its covariates to obtain a more accurate assessment of the covariate effect. Our inference procedures are derived using the generalized estimating equations approach. The new proposal is robust and can be easily implemented. Graphical and numerical methods for checking the adequacy of the fitted quantile regression model are also proposed. The new methods are illustrated with an animal study in toxicology.
Key Words: Estimating equations; Gaussian process; Linear programming; Omnibus test; Resampling method
1 Introduction
Although quite a few useful parametric and semi-parametric regression methods are available for analyzing correlated observations, they can only be used to evaluate the covariate effect on the mean of the response variable (Laird and Ware, 1982; Liang and Zeger, 1986). To obtain a global picture of the covariate effect on the distribution of the response variable, one may use the quantile regression model. Specifically, let $\tau$ be a constant between 0 and 1, let $Y$ be the response variable, and let $x$ be the corresponding $(p+1) \times 1$ covariate vector. Given $x$, let the $100\tau$th percentile of $Y$ be $\beta_\tau' x$, where $\beta_\tau$ is an unknown $(p+1) \times 1$ parameter vector that may depend on $\tau$. Inference procedures for $\beta_\tau$ over a set of properly chosen $\tau$'s would provide much more information about the effect of $x$ on $Y$ than their counterparts based on the usual mean regression model (Mosteller and Tukey, 1977). For independent observations, inference procedures for $\beta_\tau$ have been proposed, for example, by Bassett and Koenker (1978, 1982), Koenker and Bassett (1978, 1982) and Parzen et al. (1994). When $\tau = 1/2$, which corresponds to the median regression model, the celebrated $L_1$ estimator, which minimizes the sum of the absolute residuals, is consistent for $\beta_{0.5}$ (Bloomfield and Steiger, 1983).
Recently, Jung (1996) proposed an interesting quasi-likelihood equation approach for median regression models with dependent observations. However, his method assumes a known relationship between the median and the density function of the response variable, and the variance estimate of his estimator for the regression parameter appears to be rather sensitive to this assumption. Moreover, Jung’s optimal estimating equations may have multiple roots, so the estimator for $\beta_\tau$ may not be well defined.
In this paper, we present a simple and robust procedure for making inferences about $\beta_\tau$ without imposing any parametric assumption on the density function of the response variable or on the dependence structure among the correlated observations. Furthermore, our estimating functions are component-wise monotone, and the resulting estimator for the regression parameter can be easily obtained with well-established linear programming techniques. The new proposal is illustrated with an animal study in toxicology.
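For independent observations, the $\tau$th regression quantile of Koenker and Bassett (1978) minimizes $\sum_i \rho_\tau(Y_i - \beta' x_i)$ over $\beta$, where $\rho_\tau(u) = u\{\tau - I(u < 0)\}$ is the check function; at $\tau = 1/2$ the objective is half the sum of absolute residuals, so the minimizer is the $L_1$ estimator mentioned above. Because the problem is a linear program, at least one minimizer passes exactly through $p + 1$ of the observations whenever the design matrix has full column rank. The Python sketch below is an illustration only, not the procedure proposed in this paper: it assumes independent observations and a single covariate ($p = 1$), and it finds the minimizer by brute-force enumeration of lines through pairs of observations instead of calling an LP solver. The function names `check_loss`, `objective`, and `quantile_regression_line` are hypothetical.

```python
from itertools import combinations

# Illustrative sketch with hypothetical helper names. Assumes independent
# observations and one covariate, i.e. the model Q_tau(Y | x) = b0 + b1 * x.


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def objective(b0, b1, xs, ys, tau):
    """Sum of check losses of the residuals y - (b0 + b1 * x)."""
    return sum(check_loss(y - (b0 + b1 * x), tau) for x, y in zip(xs, ys))


def quantile_regression_line(xs, ys, tau):
    """Exact tau-th regression quantile (b0, b1) by enumerating elemental fits.

    Every line through two observations with distinct x values is a candidate.
    Since the problem is a linear program, some optimal solution interpolates
    p + 1 = 2 observations, so the best candidate is a global minimizer.
    Cost is O(n^3), so this is only suitable for tiny illustrative samples.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # no non-vertical line passes through both points
        b1 = (ys[j] - ys[i]) / (xs[j] - xs[i])
        b0 = ys[i] - b1 * xs[i]
        value = objective(b0, b1, xs, ys, tau)
        if best is None or value < best[0]:
            best = (value, b0, b1)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


if __name__ == "__main__":
    # Toy data on the line y = 1 + 2x, except for one gross outlier at x = 4.
    xs = [0, 1, 2, 3, 4]
    ys = [1, 3, 5, 7, 100]
    print(quantile_regression_line(xs, ys, tau=0.5))  # (1.0, 2.0)
```

On the toy data the median ($\tau = 1/2$) fit stays on the line through the four clean points, whereas a least-squares fit would be pulled toward the outlier. The enumeration is only practical for very small samples; in practice the same objective would be handed to a simplex or interior-point linear programming solver.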
2 Inferences for Regression Parameters
In this section, we derive regression methods for analyzing data that consist of a large number of independent small groups, or clusters, of correlated observations. Let $Y_{ij}$ be the continuous response variable for the $j$th measurement in the $i$th cluster, where $i = 1, \ldots, n$ and $j = 1, \ldots, K_i$, with each $K_i$ small relative to $n$. Let $x_{ij}$ be the corresponding covariate vector, and assume that the $100\tau$th percentile of $Y_{ij}$ is $\beta_\tau' x_{ij}$. The observations within each cluster may be dependent, but $(Y_{ij}, x_{ij})$ and $(Y_{i'j'}, x_{i'j'})$ are independent when $i \neq i'$. Note that the distribution function $F_{\tau ij}(\cdot)$ of the error term $Y_{ij} - \beta_\tau' x_{ij}$ is completely unspecified, apart from having its $100\tau$th percentile at zero, and may depend on $x_{ij}$. Suppose that we are interested in $\beta_\tau$ for a particular $\tau$. If all the observations $\{(Y_{ij}, x_{ij})\}$ are mutually independent, the following estimating functions are often used to make inferences about $\beta_\tau$: