© 1991 American Statistical Association and the American Society for Quality Control
TECHNOMETRICS, MAY 1991, VOL. 33, NO. 2

Prediction and Tolerance Intervals With Transformation and/or Weighting

Raymond J. Carroll
Department of Statistics, Texas A&M University, College Station, TX 77843

David Ruppert
School of Operations Research and Industrial Engineering, Cornell University, Ithaca, NY 14853

We consider estimation of quantiles and construction of prediction and tolerance intervals for a new response following a possibly nonlinear regression fit with transformation and/or weighting. We consider the case of normally distributed errors and, to a lesser extent, the nonparametric case in which the error distribution is unknown. Quantile estimation here follows standard theory, although we introduce a simple computational device for likelihood ratio testing and confidence intervals. Prediction and tolerance intervals are somewhat more difficult to obtain. We show that the effect of estimating parameters when constructing tolerance intervals can be expected to be greater than the effect in the prediction problem. Improved prediction and tolerance intervals are constructed based on resampling techniques. In the tolerance interval case, a simple analytical correction is introduced. We apply these methods to the prediction of automobile stopping distances and salmon production using, respectively, a heteroscedastic regression model and a transformation model.

KEY WORDS: Bootstrap; Heteroscedasticity; Nonadditive errors; Nonlinear regression; Power transformations; Transform-both-sides model; Variance function estimation.

The purpose of this article is to discuss prediction and tolerance intervals and quantile estimation in nonlinear regression when the distribution of the response is skewed or has a nonconstant variance.

An important use of a regression model is the prediction of future responses at known future values of the set of independent variables. For example, Ezekiel and Fox (1959) and others analyzed a set of data in which the independent variable is the speed of an automobile and the response is the stopping distance. The model $E(D_i) = \beta_1 S_i + \beta_2 S_i^2$, where $D$ is distance and $S$ is speed, fits these data rather well. The regression coefficients are probably not of intrinsic interest. Estimation of the mean response might be important, but for purposes such as highway design and setting of speed limits even knowledge of the mean response is inadequate. Instead, one would want estimates of extreme quantiles of $D$ given $S$ or prediction intervals that, with some prescribed probability, cover the actual stopping distance at a given value of $S$. In this data set, there is considerable variation about the mean since the 63 observations come from a variety of automobiles, drivers, and road surfaces. Another important feature of these data is that both the variance and the mean of $D$ increase with $S$.

Prediction intervals are easily constructed for the multiple linear regression model $y_i = x_i^T \beta + \epsilon_i$ $(i = 1, \ldots, n)$, where the $\epsilon_i$ are independent normal random variables with a constant variance $\sigma^2$. Let $X$ be the $n \times p$ matrix whose $i$th row is $x_i^T$. The $(1 - \alpha)100\%$ prediction interval for a future response, say $y_f$, given a set of values, $x_f$, of the independent variables is

$$\hat{y}_f \pm \hat{\sigma}\, t_{n-p,\,1-\alpha/2} \left[ 1 + x_f^T (X^T X)^{-1} x_f \right]^{1/2} \tag{1}$$

(Neter and Wasserman 1974, p. 233). Here $t_{k,u}$ is the $u$th quantile of the $t$ distribution with $k$ df. The second term in brackets is the correction for estimation of $\beta$, and the use of the $t$ rather than the normal quantile compensates for estimation of $\sigma$. Both corrections can be important for small $n$. Nonetheless, the uncertainty in predicting $y_f$ is mostly due to its variability about its mean, not to uncertainty about the parameters. This means that interval (1) can be drastically inaccurate if, as in the stopping distance data, either the assumption of normality or of homoscedasticity is violated.

Suppose, for example, that the response has a constant coefficient of variation and that the ratio of the largest to smallest expected response is 10. Then the ratio of the lengths of the prediction intervals at these two extremes should also be 10, rather than 1 as would happen if (1) were used.

In our experience, both skewed error distributions and nonconstant variance are common in practice. To construct accurate prediction intervals under these circumstances, one must model the skewness and heteroscedasticity of the error distribution, and the transformation and weighting models that were extensively discussed by Carroll and Ruppert (1988) form an effective methodology for doing this.

Carroll and Ruppert (1988) used transformation/weighting models to give point estimates of the conditional quantiles of $y_f$ given $x_f$ and suggested the $(1 - \alpha/2)$ and $\alpha/2$ quantiles as the upper and lower endpoints of a "naive" prediction interval, naive because there is no adjustment for the estimation of parameters. Except where the errors are exactly, or at least nearly, normally distributed with a constant variance, this interval is certainly preferred to (1), since the adjustment for skewness and heteroscedasticity can be much larger than the correction for estimation of parameters, especially for moderately large data sets.

A nonlinear weighting or variance function model is

$$y_i = f(x_i, \beta_0) + g(x_i, \beta_0, \theta_0)\,\epsilon_i, \tag{2}$$

where $g^2$ models the conditional variance of $y_i$ given $x_i$. It is often convenient to write $g(x, \beta, \theta) = \sigma g^*(x, \beta, \eta)$ $(\theta = (\sigma, \eta^T)^T)$, where $\sigma$ is a scale parameter and $g^*$ models the structure of the heteroscedasticity. The parameterization using $\theta$ is notationally simpler and will be used as much as possible, but in certain applications (for example, likelihood ratio (LR) confidence intervals for quantiles in Section 1.3) the parameterization using $(\sigma, \eta^T)^T$ is needed. If $g^*(x, \beta, \eta) = g\{f(x, \beta)\}$, then one has a standard quasi-likelihood model (Wedderburn 1974). Estimation of the variance parameters in (2) was studied by Davidian and Carroll (1987). Weighting models are appropriate when the error distribution is known to be, say, normal but the variance is not constant.

If the errors appear skewed, then an effective modeling strategy is to assume that after an appropriate "symmetrizing" transformation they are normal. Let $h(u, \lambda)$ be a monotone transformation function in $u$ for given $\lambda$. The most popular such function is the power transformation $h(u, \lambda) = (u^\lambda - 1)/\lambda$ $(\lambda \neq 0)$; $= \log(u)$ $(\lambda = 0)$. The Box-Cox (1964) model is

$$h(y_i, \lambda_0) = f(x_i, \beta_0) + \sigma_0 \epsilon_i, \tag{3}$$

where typically $f(x_i, \beta)$ is a linear model. If $y$ is known to fit the mean model $f$, then the Box-Cox model is inappropriate, since the transformation destroys this relationship. Instead, Carroll and Ruppert (1984) suggested the transform-both-sides (TBS) model

$$h(y_i, \lambda_0) = h\{f(x_i, \beta_0), \lambda_0\} + \sigma_0 \epsilon_i. \tag{4}$$

The transformation in Model (4) can often induce normally distributed errors. If the conditional variance of $y$ given $x$ is a function of the conditional mean, then the transformation can also induce homoscedasticity. It may happen, however, that the variance of $y$ depends on $x$ not through the mean of $y$ or that the normalizing transformation and the variance stabilizing transformation differ substantially. Then, Carroll and Ruppert (1988, chap. 5) suggested a transformation/weighting model

$$h(y_i, \lambda_0) = h\{f(x_i, \beta_0), \lambda_0\} + g(x_i, \beta_0, \theta_0)\,\epsilon_i. \tag{5}$$

Transformation/weighting models were applied to fisheries analysis, Michaelis-Menten kinetics, and herbicide dose-response modeling by Ruppert and Carroll (1985), Ruppert, Cressie, and Carroll (1989), and Rudemo, Ruppert, and Streibig (1989), respectively.

Models (3)-(5) are all special cases of the general model

$$h(y_i, \lambda_0) = m(x_i, \beta_0, \lambda_0) + g(x_i, \beta_0, \theta_0)\,\epsilon_i, \tag{6}$$

where the $\epsilon_i$ are iid with distribution $F$. We will consider both the normal case in which $F = \Phi$ and the nonparametric case in which $F$ is unknown. For example, if $m(x, \beta, \lambda)$ equals $h\{f(x, \beta), \lambda\}$ then (6) is the TBS/weighting model (5) or the ordinary TBS model (4) if $g^* = 1$. If $h$ is the identity transformation $(\lambda = 1)$, then (6) is a weighting model (2). Finally, (6) becomes the Box-Cox model (3) if $g^* = 1$ and $m(x, \beta, \lambda) = f(x, \beta)$. In Model (6), the $x_i$'s could be either fixed or random. In the latter case, we condition on the $x_i$'s.

We will develop a prediction interval and tolerance interval methodology for Model (6) that can then be applied to any of these special cases. Because Model (6) is nonlinear in the regression parameters as well as the transformation and variance parameters, exact inference methods do not seem possible. All interval estimates developed in this article are approximate or large-sample in nature, though some are accurate to a higher order than others.

In Section 1, two methods of interval estimation of the quantiles of $y$ are discussed: the delta method and LR testing. We show that computation in the latter case is facilitated by converting a constrained optimization problem into an unconstrained problem.

In Section 2, we discuss prediction intervals when $\epsilon$ is normally distributed as well as when its distribution is unknown. There are two difficulties: nonlinearity of the regression function and the need to estimate the transformation and weighting parameters. For the construction of prediction intervals in nonlinear regression, Seber and Wild (1989, p.

collect a series of data sets and to make one or several predictions from each. In many situations, however, one will have a single data set and will make many future predictions. In such a case one would like to