A Resampling Method Based on Pivotal Estimating Functions

Biometrika (1994), 81,2, pp. 341-50 Printed in Great Britain


BY M. I. PARZEN Graduate School of Business, University of Chicago, Chicago, Illinois 60637, U.S.A. L. J. WEI Department of Biostatistics, Harvard University, Boston, Massachusetts 02115, U.S.A. AND Z. YING Department of Statistics, University of Illinois, Champaign, Illinois 61820, U.S.A.

SUMMARY

Suppose that, under a semiparametric model setting, one is interested in drawing inferences about a finite-dimensional parameter vector $\beta$ based on an estimating function. Generally a consistent point estimator $\hat\beta$ for $\beta_0$, the true value of $\beta$, can be easily obtained by finding a root of the corresponding estimating equation. Estimating the variance of $\hat\beta$, however, may involve complicated and subjective nonparametric functional estimates. In this paper, a general and simple resampling method for inferences about $\beta_0$ based on pivotal estimating functions is proposed. The new procedure is illustrated with the quantile and rank regression models. For both cases, our proposal can be easily and efficiently implemented with existing statistical software.

Some key words: Bootstrap; Pivot; Quantile regression; Rank regression.

1. INTRODUCTION

Suppose that we are interested in drawing inferences about $\beta$, a $p \times 1$ vector of parameters, based on a random observable quantity $X$ through an estimating function $S_X(\beta)$. If $E\{S_X(\beta_0)\} = 0$, a consistent point estimator $\hat\beta$ for $\beta_0$, the true value of $\beta$, can usually be obtained by solving the equation $S_X(\beta) = 0$. If the function $S_X(\beta)$ is smooth enough in $\beta$, the distribution of $\hat\beta$ is generally approximately normal with mean $\beta_0$. The corresponding variance is simply $A^{-1}(\beta_0)\,\mathrm{var}\{S_X(\beta_0)\}\,\{A^{-1}(\beta_0)\}'$, where $A(\beta)$ is the expected value of the derivative of $S_X(\beta)$ with respect to $\beta$. Inferences about $\beta_0$ may then be made based on these large-sample properties of $\hat\beta$.

Often, under a semiparametric model setting, the estimating function $S$ may not be smooth. Although under some regularity conditions the large-sample distribution of $\hat\beta$ is still normal, the above matrix $A(\beta)$ may involve the unknown underlying density functions. Complicated and subjective nonparametric functional estimates are needed to estimate the matrix $A$. For example, suppose that we are interested in fitting the data $X$, which consist of the responses $\{Y_i, i = 1,\ldots,n\}$ and the covariates $\{z_i, i = 1,\ldots,n\}$, with a simple heteroscedastic median regression model. That is, $Y_i = \beta' z_i + e_i$, where $e_i$ has median 0 for $i = 1,\ldots,n$ (Koenker & Bassett, 1978). The error terms $e_i$ are assumed to be independent, but may not be identically distributed. Conditional on the covariate $z$, the distribution function of $e$ is completely unspecified. A commonly used estimating function $S$ for $\beta$ is

$$S_X(\beta) = n^{-1/2} \sum_{i=1}^{n} z_i \{I(Y_i - \beta' z_i \le 0) - \tfrac{1}{2}\},$$

where $I(\cdot)$ is the indicator function. Note that this estimating function $S$ is not continuous in $\beta$. Recently, in an unpublished paper from Harvard Institute of Economic Research, G. Chamberlain has shown that for large $n$ the distribution of $\hat\beta$ is approximately normal, but the variance of $\hat\beta$ is rather difficult to estimate well directly for the heteroscedastic case.

In this paper, a general and simple resampling method based on pivotal estimating functions $S_X(\beta)$ is proposed for inferences about $\beta_0$. The new procedure, given in § 2, does not involve any complicated and subjective nonparametric functional estimate. Our proposal is illustrated with quantile and rank regression models in §§ 3 and 4. For both cases, we show that the new method can be easily and efficiently implemented with existing statistical software.
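To make the discontinuity of the median estimating function concrete, the following minimal sketch evaluates it on a toy intercept-only data set. The function name `median_score` and the four data values are hypothetical and chosen only for illustration; the normalisation $n^{-1/2}$ follows the estimating function written above.

```python
def median_score(beta, y, z):
    """n**-0.5 * sum_i z_i * (I(y_i - beta'z_i <= 0) - 1/2), as a list of length p."""
    n, p = len(y), len(z[0])
    total = [0.0] * p
    for yi, zi in zip(y, z):
        resid = yi - sum(b * c for b, c in zip(beta, zi))
        weight = (1.0 if resid <= 0 else 0.0) - 0.5
        for k in range(p):
            total[k] += zi[k] * weight
    return [t / n ** 0.5 for t in total]


# Toy location model (hypothetical data): intercept only, four observations.
y = [1.0, 2.0, 3.0, 4.0]
z = [[1.0]] * 4

# The score is a step function of beta: it jumps only when beta crosses a data point.
scores = {b: median_score([b], y, z)[0] for b in (1.9, 2.0, 2.9, 3.0)}
```

In this toy case the score is constant between neighbouring observations and jumps at each one, so in general a root of the estimating equation has to be understood as a point where the function crosses the target value rather than attains it exactly.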

2. A GENERAL RESAMPLING METHOD

Suppose that the random vector $S_X(\beta_0)$ is exactly or asymptotically pivotal. That is, the exact or limiting distribution of $S_X(\beta_0)$ can be generated by a $p \times 1$ random vector $U$ whose distribution function is completely known or can be estimated consistently. For example, $U$ may be a mean 0 Gaussian random vector whose covariance matrix can be estimated consistently from the data. Let $x$ be the observed value of the random quantity $X$. Then, at least in theory, a $1 - \alpha$ confidence region $D$ for $\beta_0$ may be obtained by collecting all the $\beta$'s such that $S_x(\beta) \in C$, where $\mathrm{pr}(U \in C)$ is at least $1 - \alpha$, exactly or asymptotically. In practice, however, it may be difficult to choose an appropriate set $C$ in $R^p$ to construct the desired region $D$, especially when $p$ is large.

Now, let us define a random vector $\beta_U$ which is a solution to the stochastic equation $S_x(\beta_U) = U$. If $S_X(\beta_0)$ is exactly pivotal and $S_x(\beta)$ is a one-to-one function of $\beta$, then $\beta_U$ generates a joint fiducial distribution of $\beta$ (Buehler, 1983, pp. 76-7). When $p = 1$, one may use this distribution to choose a desired $1 - \alpha$ fiducial interval $D$ which, by definition, is also a $1 - \alpha$ confidence interval for $\beta_0$. In the presence of nuisance parameters, however, such an interval $D$ obtained from the marginal fiducial distribution for the parameter of interest may not have the correct coverage probability (Buehler, 1983, pp. 78-9). We show in this paper that, for a rather general estimating function $S_x(\beta)$ which satisfies two mild conditions (A1.1) and (A1.2) in Appendix 1, the above $D$ is a valid confidence interval asymptotically. Specifically, we demonstrate that, for any realization $x$ of $X$, the conditional distribution of $(\hat\beta - \beta_U)$ is asymptotically identical to the unconditional distribution of $(\hat\beta - \beta_0)$ (Appendix 1), where, in the former, $\hat\beta$ is held fixed at its observed value. If the distribution of $\beta_U$ can be easily generated, inferences about any specific component of $\beta_0$ may be made based on the corresponding marginal distribution of $\beta_U$.

In practice, the distribution of $\beta_U$ can be estimated using a resampling method through $U$. First, we generate a large random sample $\{u_j, j = 1,\ldots,M\}$ from $U$. Then, for each realized sample $u_j$, we obtain a solution $\beta_{u_j}$ by solving the equation $S_x(\beta_{u_j}) = u_j$ $(j = 1,\ldots,M)$. The theoretical distribution of $\beta_U$ can be approximated by the usual empirical distribution function based on $\{\beta_{u_j}; j = 1,\ldots,M\}$. It is important to note that, to make the above resampling method feasible for practical usage, efficient numerical algorithms for solving the equation $S_x(\beta_u) = u$ are needed. In the next two sections, we use quantile and rank regression models to illustrate our proposal. For both cases, reliable and easily accessible statistical software exists for solving the above equation.
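The resampling steps above can be sketched for the simplest possible case, the median of a single sample, where the equation $S_x(\beta_u) = u$ has a closed-form solution. This is an illustrative sketch only: the one-sample setting, the function name `resample_median`, the seeds and the simulated data are all assumptions made here, not part of the paper. With $S_x(b) = n^{-1/2}\sum\{I(x_i \le b) - 1/2\}$ and $U = n^{-1/2}\sum(\xi_i - 1/2)$ for independent Bernoulli(1/2) variables $\xi_i$, solving $S_x(b) = u$ amounts to picking the $K$th order statistic, where $K = \sum \xi_i$.

```python
import random
import statistics


def resample_median(x, M=1000, seed=0):
    """Return M draws of beta_U for the one-sample median (illustrative only)."""
    rng = random.Random(seed)
    xs = sorted(x)
    n = len(xs)
    draws = []
    for _ in range(M):
        # K ~ Binomial(n, 1/2) is the number of 'successes' behind one draw of U.
        k = sum(rng.random() < 0.5 for _ in range(n))
        # Clamp so that a data point is always returned (K = 0 has no exact root).
        k = min(max(k, 1), n)
        # S_x(b) = u holds for b in [x_(K), x_(K+1)); take the left endpoint.
        draws.append(xs[k - 1])
    return draws


# Hypothetical data: 51 standard normal observations.
data_rng = random.Random(1)
x = [data_rng.gauss(0.0, 1.0) for _ in range(51)]

beta_hat = statistics.median(x)
beta_u = resample_median(x)

# Standard error estimate from the spread of beta_U around the observed beta_hat.
se = (sum((b - beta_hat) ** 2 for b in beta_u) / len(beta_u)) ** 0.5
```

In regression problems no such closed form is available, which is why the paper stresses the need for an efficient solver of $S_x(\beta_u) = u$.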

3. HETEROSCEDASTIC QUANTILE REGRESSION

Let $\tau$ be a number between 0 and 1. Let $Y_i$ be the $i$th response variable and $z_i$ be the corresponding covariate vector, $i = 1,\ldots,n$. Also, let the $100\tau$th percentile of $Y_i$ be $\beta_0' z_i$. Then, one may use the following estimating function $S_X(\beta)$ to make inferences about $\beta_0$:

$$S_X(\beta) = n^{-1/2} \sum_{i=1}^{n} z_i \{I(Y_i - \beta' z_i \le 0) - \tau\}. \qquad (3.1)$$

A solution $\hat\beta$ to the equation $S_X(\beta) = 0$ is a consistent point estimator of $\beta_0$. In practice, $\hat\beta$ may be obtained by minimizing

$$\sum_{i=1}^{n} \rho_\tau(Y_i - \beta' z_i),$$

where $\rho_\tau(v)$ is $\tau v$ if $v \ge 0$, and is $(\tau - 1)v$ if $v < 0$ (Bassett & Koenker, 1982). This optimization problem can be easily handled by linear programming techniques (Barrodale & Roberts, 1973). An efficient algorithm developed by Koenker & D'Orey (1987) is available in S-PLUS to obtain such an estimate $\hat\beta$ for the quantile regression model.

Note that $S_X(\beta_0)$ is a pivotal quantity. Its distribution can be generated exactly by a random vector $U$ which is a weighted sum of independent and centred Bernoulli variables: $n^{-1/2} \sum z_i(\xi_i - \tau)$, where the sum is over the range $i = 1,\ldots,n$, and where $\{\xi_i\}$ is a random sample from a Bernoulli random variable with 'success' probability $\tau$. In Appendix 2, we show that the estimating function (3.1) satisfies (A1.1) and (A1.2). Therefore, for large $n$, the distribution of $(\hat\beta - \beta_0)$ can be approximated by the conditional distribution of $(\hat\beta - \beta_U)$.

To generate the distribution of $\beta_U$ empirically, we need to know how to solve efficiently the equation $S_x(\beta) = u$ for a given realization $u$ of $U$. To this end, let $y_i$ be the observed value of $Y_i$ $(i = 1,\ldots,n)$. We then artificially create an extra data point $(y_{n+1}, z_{n+1})$, where $z_{n+1}$ is $n^{1/2}u/\tau$ and $y_{n+1}$ is an extremely large number such that $I(y_{n+1} - \beta' z_{n+1} \le 0)$ is always 0. Furthermore, let $S^*(\beta)$ be

$$S^*(\beta) = n^{-1/2} \sum_{i=1}^{n+1} z_i \{I(y_i - \beta' z_i \le 0) - \tau\}.$$

Then, finding a solution to $S_x(\beta) = u$ is equivalent to solving the equation $S^*(\beta) = 0$, because the extra data point contributes $n^{-1/2} z_{n+1}(0 - \tau) = -u$ to the sum, so that $S^*(\beta) = S_x(\beta) - u$. This can be done using existing statistical software for the point estimation of the quantile regression coefficients.

Now, we use an example to illustrate the above procedure. The data were collected on survival times of patients undergoing a particular type of liver operation (Neter, Wasserman & Kutner, 1985, p. 419). The aim of the study was to examine the effects of various prognostic variables on patients' survival. Observations from 54 patients were collected. From each patient record, four potential predictors were extracted from the preoperative evaluation: blood clotting score, prognostic index, enzyme function test score, and liver function test score. With the intercept term, the covariate $z$ is a $5 \times 1$ vector and the response variable is the base 10 logarithm of the patient's survival time. For simplicity, we only analyzed the data with a median regression model. That is, we let $\tau$ be 0.5 in (3.1). The resulting point estimate $\hat\beta$ is reported in Table 1 under the headings 'median regression' and 'new method'. To obtain an estimate of the covariance matrix for $\hat\beta$, we generated 1000 samples $\{u_j\}$ from the random vector $U$. The covariance matrix of $\hat\beta$ was then estimated by

$$\Big\{ \sum_{j=1}^{1000} (\beta_{u_j} - \hat\beta)(\beta_{u_j} - \hat\beta)' \Big\} \Big/ 1000.$$

The corresponding estimated standard errors are reported in Table 1. It appears that only the fourth covariate, 'liver function test score', is not a significant predictor of the patient's survival.

Table 1. Estimation of regression coefficients for the surgical unit example

                       Median regression                  Rank regression    Mean regression
             New method         Bootstrap    STATA        New method         Least squares
Intercept    0.4151 (0.0649)    (0.0671)     (0.0310)     -                  0.4868 (0.0500)
BCS          0.0710 (0.0075)    (0.0071)     (0.0032)     0.0714 (0.0048)    0.0685 (0.0054)
PI           0.0098 (0.0005)    (0.0004)     (0.0002)     0.0094 (0.0004)    0.0092 (0.0004)
EFTS         0.0097 (0.0003)    (0.0003)     (0.0002)     0.0096 (0.0003)    0.0095 (0.0004)
LFTS         0.0029 (0.0092)    (0.0083)     (0.0057)     0.0023 (0.0068)    0.0011 (0.0096)

Estimated standard errors in parentheses. BCS, blood clotting score; PI, prognostic index; EFTS, enzyme function test score; LFTS, liver function test score.

We also analyzed the present data with the heteroscedastic bootstrap method (Efron, 1982, p. 36). That is, we resampled the $(y, z)$ pairs with replacement instead of resampling the residuals from the fitted regression model. For each bootstrap sample, we obtained an estimate $\hat\beta_B$ using the estimating function (3.1). The standard error estimates under the heading 'bootstrap' presented in Table 1 are based on 1000 bootstrap samples. The results from the bootstrap method are very similar to those obtained from our resampling procedure. As far as we know, however, there is no analytical proof that the bootstrap method is valid for the general quantile regression model. In Table 1 we also report the results from an analysis done with the commercial statistical package STATA. The standard error estimates obtained from the bootstrap and the new method are larger than those from STATA. Note that, regardless of the technique used to estimate the standard errors, the point estimates are all the same.
It is also important to note that the variance estimates of the median regression coefficient estimates used in STATA were obtained under a strong assumption, namely that the errors $\{e_i\}$ are independent of the covariates $\{z_i\}$ (Computing Resource Center, 1992, p. 135). For the heteroscedastic case, those variance estimates may not be valid.

For practical sample sizes, it is important to know if it is appropriate to use the resampling distribution of $(\hat\beta - \beta_U)$ to approximate the unconditional distribution of $(\hat\beta - \beta_0)$. To this end, simulation studies were carried out. The results indicate that under the median regression model with the estimating equation (3.1) this approximation is fairly satisfactory even for moderate sample sizes. For example, in one of the numerical studies, we generated 1000 samples $\{(y_i, z_i), i = 1,\ldots,50\}$ with $\beta_0 = (0, 1, 1)'$. For each of those 1000 samples, the first components of $\{z_i\}$ were all 1's and the second components were a realization of a random sample from a Bernoulli population with success probability $\tfrac{1}{2}$. The third components were a realization of a random sample from the standard normal, which was independent of the previous Bernoulli variable. The $\{y_i\}$ were then generated with the errors $\{e_i\}$ being a random sample from various distributions. For each simulated sample $\{(y_i, z_i), i = 1,\ldots,50\}$, the distribution of $(\hat\beta - \beta_U)$ was estimated based on 1000 samples from $U$. The standard, percentile, and bias-corrected methods (Efron & Tibshirani, 1986, pp. 67-70) were then used to construct confidence intervals for the regression coefficient corresponding to the continuous covariate. The empirical coverage probabilities and estimated average lengths for these intervals are summarized in Table 2. In general, we find that the resampling interval procedure with the standard method performs well. The percentile method appears to be slightly conservative, while the bias-corrected method does not appear to offer significant improvement over the standard method. For comparison, we also report in the same table the results based on the bootstrap method and the procedure used in STATA. We find that the intervals obtained from STATA and the bootstrap method are similar to ours in terms of the empirical coverage probability and the average interval length. However, if the variance of $e_i$ is not constant, for example if the variance is proportional to the absolute value of the continuous covariate, the empirical coverage probabilities of confidence intervals obtained from the commercial package STATA can be extremely low; see Table 2(c).

To examine the adequacy of using the distribution of $(\hat\beta - \beta_U)$ to approximate the unconditional distribution of $(\hat\beta - \beta_0)$ globally, 1000 random samples $\{(y_i, z_i), i = 1,\ldots,200\}$ were generated from the above model. The distribution of $(\hat\beta - \beta_0)$ was estimated using the empirical distribution function based on the corresponding 1000 realized $\hat\beta$'s. The resulting distribution function for the third component of $(\hat\beta - \beta_0)$ is given in Fig. 1. We then randomly selected two samples from the above 1000 samples. For each selected sample, the distribution of $(\hat\beta - \beta_U)$ was estimated based on 1000 random samples generated from $U$. The two estimated distribution functions, denoted by the dotted curves, for the third component of $(\hat\beta - \beta_U)$ are also displayed in Fig. 1. These two dotted curves appear to be fairly accurate approximations to the solid curve, which should be very close to the true distribution function of the third component of $(\hat\beta - \beta_0)$.
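The augmented-data device of this section can be sketched as follows for a single covariate without intercept. This is a simplified illustration under stated assumptions, not the authors' implementation: the brute-force search over breakpoints in `quantile_fit` stands in for the linear-programming software mentioned in the text, and the names `check_loss`, `quantile_fit`, `score`, `resample_once`, the value `y_big = 1e6` and the simulated data are all hypothetical.

```python
import random


def check_loss(v, tau):
    """rho_tau(v): tau*v if v >= 0, (tau - 1)*v otherwise."""
    return tau * v if v >= 0 else (tau - 1.0) * v


def quantile_fit(y, z, tau):
    """Minimise sum_i rho_tau(y_i - b*z_i) over scalar b.

    The objective is convex and piecewise linear in b, so a minimiser can be
    found among the breakpoints y_i / z_i.
    """
    def objective(b):
        return sum(check_loss(yi - b * zi, tau) for yi, zi in zip(y, z))

    candidates = [yi / zi for yi, zi in zip(y, z) if zi != 0.0]
    return min(candidates, key=objective)


def score(b, y, z, tau):
    """S_x(b) from (3.1), specialised to a scalar covariate."""
    n = len(y)
    return sum(zi * ((yi - b * zi <= 0) - tau) for yi, zi in zip(y, z)) / n ** 0.5


def resample_once(y, z, tau, rng, y_big=1e6):
    """Draw u from U, then solve S_x(b) = u by refitting with one extra point."""
    n = len(y)
    u = sum(zi * ((rng.random() < tau) - tau) for zi in z) / n ** 0.5
    z_extra = n ** 0.5 * u / tau  # z_{n+1} = n^{1/2} u / tau
    beta_u = quantile_fit(y + [y_big], z + [z_extra], tau)
    return u, beta_u


# Hypothetical data: y_i = z_i + noise, so the true slope is 1.
rng = random.Random(2)
z = [rng.uniform(0.5, 1.5) for _ in range(30)]
y = [zi + rng.gauss(0.0, 0.3) for zi in z]
tau = 0.5

beta_hat = quantile_fit(y, z, tau)
draws = [resample_once(y, z, tau, rng) for _ in range(200)]
se = (sum((b - beta_hat) ** 2 for _, b in draws) / len(draws)) ** 0.5
```

Because $S_x$ is a step function, each $\beta_u$ solves $S_x(\beta_u) = u$ only up to the size of one jump, which is at most $n^{-1/2}\max_i |z_i|$ in this scalar setting.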

[Figure 1, panels (a) and (b): each panel plots the 'True' and 'Resampled' distribution functions (vertical axis, 0.0 to 1.0) against $\hat\beta - \beta_0$ (horizontal axis, -0.10 to 0.10).]

Fig. 1. Comparison of distribution functions for $\hat\beta - \beta_0$ versus $\hat\beta - \beta_U$.

Table 2. Empirical coverage probabilities (ECP) and estimated mean lengths (EML) for various interval procedures

(a) Gaussian error with mean 0 and variance 0.5

Confidence             Resample        Bootstrap         STATA
level       Method    ECP    EML     ECP    EML      ECP    EML
0.95        S         0.95   0.62    0.95   0.59     0.97   0.58
            P         0.98   0.62    0.98   0.59
            B         0.93   0.64    0.94   0.61
0.90        S         0.92   0.51    0.91   0.49     0.94   0.49
            P         0.95   0.51    0.94   0.49
            B         0.90   0.53    0.89   0.51
0.85        S         0.88   0.45    0.86   0.43     0.91   0.43
            P         0.91   0.43    0.89   0.42
            B         0.87   0.47    0.85   0.45

(b) Lognormal error with mean $e^{0.25}$ and variance $(e - e^{0.5})$

Confidence             Resample        Bootstrap         STATA
level       Method    ECP    EML     ECP    EML      ECP    EML
0.95        S         0.97   0.62    0.96   0.60     0.97   0.59
            P         0.98   0.62    0.98   0.60
            B         0.95   0.65    0.93   0.63
0.90        S         0.92   0.52    0.91   0.50     0.95   0.49
            P         0.95   0.51    0.94   0.49
            B         0.88   0.54    0.87   0.52
0.85        S         0.88   0.46    0.87   0.44     0.92   0.43
            P         0.91   0.44    0.89   0.42
            B         0.85   0.47    0.82   0.46

(c) Heteroscedastic error

Confidence             Resample        Bootstrap         STATA
level       Method    ECP    EML     ECP    EML      ECP    EML
0.95        S         0.95   0.66    0.95   0.65     0.60   0.29
            P         0.97   0.65    0.95   0.64
            B         0.94   0.66    0.93   0.64
0.90        S         0.91   0.55    0.90   0.54     0.53   0.24
            P         0.92   0.53    0.91   0.53
            B         0.90   0.55    0.88   0.53
0.85        S         0.87   0.48    0.86   0.47     0.47   0.21
            P         0.87   0.47    0.87   0.46
            B         0.85   0.48    0.83   0.47

S, standard method; P, percentile method; B, bias correction method.
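The three interval constructions compared in Table 2 can be sketched from a set of resampled values as follows. This is a hedged illustration: it assumes that the resampled $\beta_{u_j}$ are treated exactly like bootstrap replications in the standard, percentile and bias-corrected recipes, which the text does not spell out. The helper names `quantile` and `intervals`, the nearest-rank quantile convention and the simulated draws are assumptions made here.

```python
import random
from statistics import NormalDist


def quantile(sorted_vals, q):
    """Empirical q-quantile by nearest rank (one simple convention among several)."""
    m = len(sorted_vals)
    idx = min(max(int(round(q * (m - 1))), 0), m - 1)
    return sorted_vals[idx]


def intervals(beta_hat, draws, level=0.95):
    """Standard (S), percentile (P) and bias-corrected (B) intervals from draws."""
    alpha = 1.0 - level
    norm = NormalDist()
    z_lo, z_hi = norm.inv_cdf(alpha / 2), norm.inv_cdf(1 - alpha / 2)
    m = len(draws)
    s = sorted(draws)

    # S: normal approximation with the resampling standard error.
    se = (sum((b - beta_hat) ** 2 for b in draws) / m) ** 0.5
    standard = (beta_hat + z_lo * se, beta_hat + z_hi * se)

    # P: quantiles of the resampled values themselves.
    percentile = (quantile(s, alpha / 2), quantile(s, 1 - alpha / 2))

    # B: shift the quantile levels by z0 = Phi^{-1}(fraction of draws below beta_hat).
    frac = sum(b < beta_hat for b in draws) / m
    frac = min(max(frac, 0.5 / m), 1 - 0.5 / m)  # keep inv_cdf finite
    z0 = norm.inv_cdf(frac)
    bias_corrected = (
        quantile(s, norm.cdf(2 * z0 + z_lo)),
        quantile(s, norm.cdf(2 * z0 + z_hi)),
    )
    return {"S": standard, "P": percentile, "B": bias_corrected}


# Hypothetical resampled values scattered around an estimate of 1.0.
rng = random.Random(3)
draws = [1.0 + rng.gauss(0.0, 0.1) for _ in range(1000)]
ci = intervals(1.0, draws, level=0.95)
```

When the resampling distribution is roughly symmetric about $\hat\beta$, as in this artificial example, the three intervals nearly coincide; they differ more when the distribution is skewed or shifted.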

4. RANK REGRESSION

Again, let us assume that $Y_i = \beta' z_i + e_i$ $(i = 1,\ldots,n)$, but $\{e_i\}$ are now independent and identically distributed. Note that the vector $\beta$ does not include the intercept term. The estimating function $S_X(\beta)$ based on ranks is

$$S_X(\beta) = \sum_{i=1}^{n} (z_i - \bar z)\,\phi\{R(e_i)\}, \qquad (4.1)$$

where $\phi$ is an increasing function, $R(e_i)$ is the rank of $e_i = Y_i - \beta' z_i$ among $\{e_j, j = 1,\ldots,n\}$ and $\bar z$ is the mean of $\{z_i\}$ (Hettmansperger, 1984, p. 235). Note that the function $\phi$ may depend on $n$. The limiting variance of $n^{1/2}\hat\beta$ depends on the unknown density function of the error term (Hettmansperger, 1984, p. 241). It is easy to see that $\hat\beta$ is a minimizer of the function

$$\sum_{i=1}^{n} (Y_i - \beta' z_i)\Big[\phi\{R(e_i)\} - n^{-1}\sum_{j=1}^{n} \phi(j)\Big],$$

which is a nonnegative, continuous, and convex function of $\beta$. An efficient statistical program called RREGRESSION in MINITAB is available to obtain such an estimate $\hat\beta$.

Note that $S_X(\beta_0)$ is a pivotal quantity. Its distribution can be generated exactly by the random vector $U$: $\sum (z_i - \bar z)\phi(\eta_i)$, where the sum is over $i = 1,\ldots,n$, and where $(\eta_1,\ldots,\eta_n)$ is a random permutation of $(1,\ldots,n)$. If there are ties, we use mid-ranks. In Appendix 3, we show that under the rank regression model the distribution of $(\hat\beta - \beta_0)$ can be approximated by the distribution of $(\beta_U - \hat\beta)$.

Now, given a realization $u$ from $U$, consider the problem of solving the equation $S_x(\beta) = u$. As for quantile regression, we artificially create an extra data point $(y_{n+1}, z_{n+1})$, where $y_{n+1}$ is an extremely large number such that $R(e_{n+1})$ is always $(n + 1)$, and $z_{n+1}$ is determined by $u$ and by $\bar\phi = n^{-1}\sum \phi(i)$, where the sum is over $i = 1,\ldots,n$. Let $S^*(\beta)$ be

$$S^*(\beta) = \sum_{i=1}^{n+1} (z_i - \bar z^*)\,\phi\{R(e_i)\},$$

where $\bar z^*$ is the mean of $z_i$ for $i = 1,\ldots,(n + 1)$. Then, finding $\beta_u$ such that $S_x(\beta_u) = u$ is equivalent to solving the equation $S^*(\beta_u) = 0$, which can be done using the RREGRESSION program in MINITAB.

For the 'surgical unit' example, the rank estimates for the regression coefficients with the identity function, i.e. the Wilcoxon score function, are fairly close to those for the median regression (Table 1). The corresponding estimated standard errors were obtained from 1000 samples generated from $U$. For this example, our rank procedure gives shorter confidence intervals than the median regression counterpart does. However, if the assumption of a location model with a linear regression function is false, the rank interval procedures may not be valid.

For practical sample sizes, the new resampling method for the rank regression performs well if the distribution of the error term is indeed free of the covariates. For example, with the Wilcoxon score and the same set-up used in the previous section, the empirical coverage probabilities of the 0.95, 0.90 and 0.85 rank intervals for the regression coefficient corresponding to the dichotomous covariate are 0.95, 0.91 and 0.85, respectively, when the error term is standard normal and the sample size is 50. On the other hand, if the error term depends on the covariates, then the rank procedure may produce rather liberal intervals.

5. REMARKS

Like the bootstrap method, our proposal is useful when the point estimate $\hat\beta$ for the parameter of interest can be easily obtained, but its variance may be difficult to estimate well by conventional methods.

In simulating the distribution of $\beta_U$, the problem of choosing an appropriate number $M$ of samples $\{u_j, j = 1,\ldots,M\}$ is similar to that of choosing the number of bootstrap samples in the bootstrap method. In practice, we may construct the desired intervals based on, say, 500 resamples. Then, we repeat the same type of analysis with $M = 1000$. If there is not much practical difference with respect to the resulting confidence intervals between these two analyses, further resampling may not be needed.

Recently Keaney & Wei (1994) have successfully applied the resampling method proposed here to an interesting and important problem in monitoring clinical trials based on median survival times.
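Returning to the rank regression of § 4, the resampling scheme can be sketched for a single covariate with Wilcoxon scores, $\phi(r) = r$. This sketch deliberately swaps in a different solver: instead of the extra-data-point device and the MINITAB program described in § 4, it solves $S_x(\beta) = u$ by bisection, which is possible here because the rank statistic is a nonincreasing step function of a scalar $\beta$. The names `rank_score`, `bracket`, `solve`, `draw_u` and the simulated data are hypothetical.

```python
import random


def rank_score(b, y, z):
    """S_x(b) = sum_i (z_i - zbar) * R(y_i - b*z_i), with phi the identity."""
    n = len(y)
    zbar = sum(z) / n
    resid = [yi - b * zi for yi, zi in zip(y, z)]
    order = sorted(range(n), key=resid.__getitem__)
    return sum((z[i] - zbar) * rank for rank, i in enumerate(order, start=1))


def bracket(y, z):
    """Interval containing every jump of S_x: the range of pairwise slopes."""
    n = len(y)
    slopes = [
        (y[i] - y[j]) / (z[i] - z[j])
        for i in range(n) for j in range(i) if z[i] != z[j]
    ]
    return min(slopes) - 1.0, max(slopes) + 1.0


def solve(u, y, z, lo, hi, iters=60):
    """Bisection for the point where the nonincreasing S_x crosses the level u."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rank_score(mid, y, z) > u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def draw_u(z, rng):
    """One realisation of U = sum_i (z_i - zbar) * eta_i, eta a random permutation."""
    n = len(z)
    zbar = sum(z) / n
    eta = list(range(1, n + 1))
    rng.shuffle(eta)
    return sum((zi - zbar) * e for zi, e in zip(z, eta))


# Hypothetical data: y_i = z_i + noise, so the true slope is 1.
rng = random.Random(4)
z = [rng.uniform(0.0, 2.0) for _ in range(30)]
y = [zi + rng.gauss(0.0, 0.3) for zi in z]

lo, hi = bracket(y, z)
beta_hat = solve(0.0, y, z, lo, hi)
beta_u = [solve(draw_u(z, rng), y, z, lo, hi) for _ in range(200)]
se = (sum((b - beta_hat) ** 2 for b in beta_u) / len(beta_u)) ** 0.5
```

Bisection is only practical for one regression coefficient; with several covariates a solver for the convex minimization problem of § 4 is needed, which is what the extra data point makes available through existing software.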

ACKNOWLEDGEMENT The authors would like to thank the editor and the referees for helpful comments on the paper. This research was supported by grants from the U.S. National Institute of Health, the U.S. National Science Foundation and the U.S. National Security Agency.

APPENDIX 1

Distribution of $(\hat\beta - \beta_U)$

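Let $n$ be the sample size for $X$. Suppose that there exist a sequence of constants $c_n$ and a nonsingular matrix $A$ such that

$$\sup \frac{\|S_X(\beta) - S_X(\beta^*) - A n^{1/2}(\beta - \beta^*)\|}{1 + n^{1/2}\|\beta - \beta^*\|} \to 0 \qquad (A1.1)$$

almost surely, where sup is over $\|\beta - \beta_0\| \le c_n$ and $\|\beta^* - \beta_0\| \le c_n$. Furthermore, for $\|\beta - \beta_0\| \ge c_n$,

$$\inf \|S_X(\beta)\| = \gamma_n \to \infty \qquad (A1.2)$$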

almost surely. We will show that the conditional distribution of $n^{1/2}(\hat\beta - \beta_U)$ given $\{X = x\}$ is asymptotically the same as the unconditional distribution of $n^{1/2}(\hat\beta - \beta_0)$. The following elementary fact is needed in the proof. Let $W_n$, $W_n^*$ $(n \ge 1)$ be random vectors. If, for any finite rectangle $G$,

$$\mathrm{pr}(W_n \in G) - \mathrm{pr}(W_n^* \in G) \to 0, \qquad (A1.3)$$

then $W_n$ and $W_n^*$ have the same asymptotic distribution. Now, in view of (A1.1) and (A1.2),

almost surely. Thus (A1.3) is satisfied with

$$W_n = n^{1/2}(\hat\beta - \beta_0), \qquad W_n^* = -A^{-1} S_X(\beta_0).$$

It follows that $n^{1/2}(\hat\beta - \beta_0)$ has the same asymptotic distribution as $-A^{-1}U$. Now, conditional on the observed $x$, let us consider the sequence of random vectors $S_x(\beta_U)\, I(\|S_x(\beta_U)\| < \gamma_n)$. It follows from (A1.1) that

/U || ^ yn) + o{ 1 + «* || ft, - or, equivalently,

Since pr (||S;c(/?t/)||

APPENDIX 2

Quantile regression

Suppose that $(Y_i - \beta_0' z_i)$ has a unique median at 0 and has a continuous density function $f_i$ such that $f_i(0) >$ a positive constant. Let $F_i$ be the distribution function for $f_i$. To check if $S_X(\beta)$ satisfies (A1.1), one can use Theorem 1 of Lai & Ying (1988) to show that, for $\|\beta - \beta_0\| \le n^{-1/3}$ and

sup •0.

It follows from the continuity of $f_i$ that

Therefore, (A1.1) is satisfied with $A$ being the limit of $n^{-1} \sum f_i(0) z_i z_i'$, where the sum is over $i = 1,\ldots,n$. Since $S_X(\beta)$ is a monotonic function in each component of $\beta$, it follows from some elementary calculations that $\inf \|S_X(\beta)\| \to \infty$ with inf being over $\|\beta - \beta_0\| \ge n^{-1/3}$.

APPENDIX 3

Rank regression

For rank statistics, if we let $\beta^*$ be $\beta_0$, then (A1.1) is a slight generalization of the well-known almost sure linearity for linear rank statistics (Sen, 1981, Ch. 4). Rewrite (4.1) as

{Fp(t)} dFfiJt), J — where

FpW^n-^IW-p'z^t), - p'zt < 0-

Theorem 1 of Lai & Ying (1988) implies

sup Sx(P)-Sx(P*)-n* dEFpJt)- dEFf •0,

where sup is over $\|\beta - \beta_0\| \le n^{-1/3}$ and $\|\beta^* - \beta_0\| \le n^{-1/3}$, and

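$$\sup \Big\| S_X(\beta) - n^{1/2} \int_{-\infty}^{\infty} \phi\{EF_\beta(t)\}\, dEF_{\beta z}(t) \Big\| = o(n^{\epsilon}) \qquad (A3.1)$$

for any $\epsilon > 0$, where sup is over any bounded region. Since $\int \phi\{EF_\beta(t)\}\, dEF_{\beta z}(t)$ is a differentiable function of $\beta$, (A1.1) is satisfied with $c_n = n^{-1/3}$ and $A$ being the limit of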

Here $F$ and $f$ are the distribution and density functions of the error term. Furthermore, using the monotonicity property of the rank statistics we can show that the minimum value of

over $\|\beta - \beta_0\| \ge n^{-1/3}$ is at least of the order $n^{1/6}$. By (A3.1), $S_X(\beta)$ has the same order over $\|\beta - \beta_0\| \ge n^{-1/3}$. Therefore, (A1.2) is also satisfied.

REFERENCES

BARRODALE, I. & ROBERTS, F. (1973). An improved algorithm for discrete $l_1$ linear approximations. SIAM J. Numer. Anal. 10, 839-48.
BASSETT, G. JR. & KOENKER, R. (1982). An empirical quantile function for linear models with iid errors. J. Am. Statist. Assoc. 77, 407-15.
BUEHLER, R. J. (1983). Fiducial inference. In Encyclopedia of Statistical Sciences, 3, Ed. S. Kotz and N. L. Johnson, pp. 76-9. New York: Wiley.
COMPUTING RESOURCE CENTER (1992). Stata Reference Manual: Release 3, 3, 5th ed. Santa Monica, CA.
EFRON, B. (1982). The Jackknife, the Bootstrap and Other Resampling Plans. Philadelphia, PA: SIAM.
EFRON, B. & TIBSHIRANI, R. (1986). Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statist. Sci. 1, 54-75.
HETTMANSPERGER, T. P. (1984). Statistical Inference Based on Ranks. New York: John Wiley.
KEANEY, K. M. & WEI, L. J. (1994). Interim analysis based on median survival times. Biometrika 81, 279-86.
KOENKER, R. & BASSETT, G. JR. (1978). Regression quantiles. Econometrica 46, 33-50.
KOENKER, R. & D'OREY, V. (1987). Computing regression quantiles. Appl. Statist. 36, 383-93.
LAI, T. L. & YING, Z. (1988). Stochastic integrals of empirical-type processes with applications to censored regression. J. Mult. Anal. 27, 334-58.
NETER, J., WASSERMAN, W. & KUTNER, M. H. (1985). Applied Linear Statistical Models, 2nd ed. Homewood, Illinois: Richard D. Irwin, Inc.

[Received September 1992. Revised June 1993]