© 1991 American Statistical Association and the American Society for Quality Control

TECHNOMETRICS, MAY 1991, VOL. 33, NO. 2

Prediction and Tolerance Intervals With Transformation and/or Weighting

Raymond J. Carroll
Department of Statistics
Texas A&M University
College Station, TX 77843

David Ruppert
School of Operations Research and Industrial Engineering
Cornell University
Ithaca, NY 14853

We consider estimation of quantiles and construction of prediction and tolerance intervals for a new response following a possibly nonlinear regression fit with transformation and/or weighting. We consider the case of normally distributed errors and, to a lesser extent, the nonparametric case in which the error distribution is unknown. Quantile estimation here follows standard theory, although we introduce a simple computational device for likelihood ratio testing and confidence intervals. Prediction and tolerance intervals are somewhat more difficult to obtain. We show that the effect of estimating parameters when constructing tolerance intervals can be expected to be greater than the effect in the prediction problem. Improved prediction and tolerance intervals are constructed based on resampling techniques. In the tolerance interval case, a simple analytical correction is introduced. We apply these methods to the prediction of automobile stopping distances and salmon production using, respectively, a heteroscedastic regression model and a transformation model.

KEY WORDS: Bootstrap; Nonadditive errors; Nonlinear regression; Power transformations; Transform-both-sides model; Variance function estimation.

The purpose of this article is to discuss prediction and tolerance intervals and quantile estimation in nonlinear regression when the distribution of the response is skewed or has a nonconstant variance.

An important use of a regression model is the prediction of future responses at known future values of the set of independent variables. For example, Ezekiel and Fox (1959) and others analyzed a set of data in which the independent variable is the speed of an automobile and the response is the stopping distance. The model E(D_i) = β₁S_i + β₂S_i², where D is distance and S is speed, fits these data rather well. The regression coefficients are probably not of intrinsic interest. Estimation of the mean response might be important, but for purposes such as highway design and setting of speed limits even knowledge of the mean response is inadequate. Instead, one would want estimates of extreme quantiles of D given S, or prediction intervals that, with some prescribed probability, cover the actual stopping distance at a given value of S. In this data set, there is considerable variation about the mean since the 63 observations come from a variety of automobiles, drivers, and road surfaces. Another important feature of these data is that both the variance and the mean of D increase with S.

Prediction intervals are easily constructed for the multiple linear regression model y_i = x_i^T β + ε_i (i = 1, ..., n), where the ε_i are independent normal random variables with a constant variance σ². Let X be the n × p matrix whose ith row is x_i^T. The (1 − α)100% prediction interval for a future response, say y_f, given a set of values, x_f, of the independent variables is

  ŷ_f ± σ̂ t_{n−p, 1−α/2} [1 + x_f^T (X^T X)^{−1} x_f]^{1/2}  (1)

(Neter and Wasserman 1974, p. 233). Here t_{k,u} is the uth quantile of the t distribution with k df. The second term in brackets is the correction for estimation of β, and the use of the t rather than the normal quantile compensates for estimation of σ. Both corrections can be important for small n. Nonetheless, the uncertainty in predicting y_f is mostly due to its variability about its mean, not to uncertainty about the parameters. This means that interval (1) can be drastically inaccurate if, as in the stopping distance data, either the assumption of normality or of homoscedasticity is violated.

Suppose, for example, that the response has a constant coefficient of variation and that the ratio of the largest to smallest expected response is 10. Then the ratio of the lengths of the prediction intervals at these two extremes should also be 10, rather than 1 as would happen if (1) were used.
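As an aside for readers implementing (1): the interval is only a few lines in any matrix language. The sketch below is ours, not part of the original paper; it assumes numpy and scipy and hypothetical inputs X (design matrix), y (responses), and x_f (future covariate vector).

```python
import numpy as np
from scipy import stats

def prediction_interval(X, y, x_f, alpha=0.05):
    """(1 - alpha)100% prediction interval (1) for a future response at x_f
    under the homoscedastic normal linear model y = X beta + eps."""
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    sigma_hat = np.sqrt(resid @ resid / (n - p))          # root mean squared error
    lever = x_f @ np.linalg.solve(X.T @ X, x_f)           # x_f'(X'X)^{-1} x_f
    half = sigma_hat * stats.t.ppf(1 - alpha / 2, n - p) * np.sqrt(1 + lever)
    y_hat = x_f @ beta_hat
    return y_hat - half, y_hat + half
```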


In our experience, both skewed error distributions and nonconstant variance are common in practice. To construct accurate prediction intervals under these circumstances, one must model the skewness and heteroscedasticity of the error distribution, and the transformation and weighting models that were extensively discussed by Carroll and Ruppert (1988) form an effective methodology for doing this.

Carroll and Ruppert (1988) used transformation/weighting models to give point estimates of the conditional quantiles of y_f given x_f and suggest the (1 − α/2) and α/2 quantiles as the upper and lower endpoints of a "naive" prediction interval — naive because there is no adjustment for the estimation of parameters. Except where the errors are exactly, or at least nearly, normally distributed with a constant variance, this interval is certainly preferred to (1), since the adjustment for skewness and heteroscedasticity can be much larger than the correction for estimation of parameters, especially for moderately large data sets.

A nonlinear weighting or heteroscedastic regression model is

  y_i = f(x_i, β₀) + g(x_i, β₀, θ₀)ε_i,  (2)

where g² models the conditional variance of y_i given x_i. It is often convenient to write g(x, β, θ) = σg*(x, β, η) (θ = (σ, η^T)^T), where σ is a scale parameter and g* models the structure of the heteroscedasticity. The parameterization using θ is notationally simpler and will be used as much as possible, but in certain applications — for example, likelihood ratio (LR) confidence intervals for quantiles in Section 1.2 — the parameterization using (σ, η^T)^T is needed. If g*(x, β, θ) = g{f(x, β)}, then one has a standard quasi-likelihood model (Wedderburn 1974). Estimation of the variance parameters in (2) was studied by Davidian and Carroll (1987). Weighting models are appropriate when the error distribution is known to be, say, normal but the variance is not constant.

If the errors appear skewed, then an effective modeling strategy is to assume that after an appropriate "symmetrizing" transformation they are normal. Let h(u, λ) be a monotone transformation function in u for given λ. The most popular such function is the power transformation h(u, λ) = (u^λ − 1)/λ (λ ≠ 0); = log(u) (λ = 0). The Box-Cox (1964) model is

  h(y_i, λ₀) = f(x_i, β₀) + σ₀ε_i,  (3)

where typically f(x_i, β) is a linear model. If y is known to fit the mean model f, then the Box-Cox model is inappropriate, since the transformation destroys this relationship. Instead, Carroll and Ruppert (1984) suggested the transform-both-sides (TBS) model

  h(y_i, λ₀) = h{f(x_i, β₀), λ₀} + σ₀ε_i.  (4)

The transformation in Model (4) can often induce normally distributed errors. If the conditional variance of y given x is a function of the conditional mean, then the transformation can also induce homoscedasticity. It may happen, however, that the variance of y depends on x not through the mean of y, or that the normalizing transformation and the variance-stabilizing transformation differ substantially. Then, Carroll and Ruppert (1988, chap. 5) suggested a transformation/weighting model

  h(y_i, λ₀) = h{f(x_i, β₀), λ₀} + σ₀g*(x_i, β₀, η₀)ε_i.  (5)

Transformation/weighting models were applied to fisheries analysis, Michaelis-Menten kinetics, and herbicide dose-response modeling by Ruppert and Carroll (1985), Ruppert, Cressie, and Carroll (1989), and Rudemo, Ruppert, and Streibig (1989), respectively.

Models (3)-(5) are all special cases of the general model

  h(y_i, λ₀) = m(x_i, β₀, λ₀) + g(x_i, β₀, θ₀, λ₀)ε_i,  (6)

where the ε_i are iid with distribution F. We will consider both the normal case, in which F = Φ, and the nonparametric case, in which F is unknown. For example, if m(x, β, λ) equals h{f(x, β), λ}, then (6) is the TBS/weighting model (5), or the ordinary TBS model (4) if g* = 1. If h is the identity transformation (λ = 1), then (6) is a weighting model (2). Finally, (6) becomes the Box-Cox model (3) if g* = 1 and m(x, β, λ) = f(x, β). In model (6), the x_i's could be either fixed or random. In the latter case, we condition on the x_i's.
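To fix ideas, here is a minimal computational sketch of the general model (6) and its special cases. The code is our illustration (the function names are ours, not the paper's) and uses the power transformation throughout.

```python
import numpy as np

def h(u, lam):
    """Power transformation: (u^lam - 1)/lam for lam != 0, log(u) at lam = 0."""
    return np.log(u) if lam == 0 else (u ** lam - 1.0) / lam

def simulate_model6(x, beta, lam, sigma, eps, f, g_star):
    """Draw y from (6) with m(x, beta, lam) = h{f(x, beta), lam}, i.e., the
    TBS/weighting model (5). g_star = 1 gives the TBS model (4); lam = 1
    gives the weighting model (2); m = f and g_star = 1 give Box-Cox (3)."""
    z = h(f(x, beta), lam) + sigma * g_star(x, beta) * eps
    # invert the power transformation; requires 1 + lam*z > 0 when lam != 0
    return np.exp(z) if lam == 0 else (1.0 + lam * z) ** (1.0 / lam)

# example components: Ricker mean function with constant weights
f = lambda x, b: b[0] * x * np.exp(-b[1] * x)
g_star = lambda x, b: np.ones_like(np.asarray(x, dtype=float))
```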

We will develop a prediction-interval and tolerance-interval methodology for Model (6) that can then be applied to any of these special cases. Because Model (6) is nonlinear in the regression parameters as well as the transformation and variance parameters, exact inference methods do not seem possible. All interval estimates developed in this article are approximate or large-sample in nature, though some are accurate to a higher order than others.

In Section 1, two methods of interval estimation of the quantiles of y are discussed: the delta method and LR testing. We show that computation in the latter case is facilitated by converting a constrained optimization problem into an unconstrained problem.

In Section 2, we discuss prediction intervals when ε is normally distributed as well as when its distribution is unknown. There are two difficulties, nonlinearity of the regression function and the need to estimate the transformation and weighting parameters. For the construction of prediction intervals in nonlinear regression, Seber and Wild (1989, p. 193) linearized the regression function, but they did not consider transformation or weighting. As for the transformation and weighting parameters, if any method can be talked of as the "usual" method it would be the following. Recall that θ = (σ, η^T)^T and g(x, β, θ) = σg*(x, β, η). It is standard practice to estimate the transformation and weighting parameters by (θ̂, λ̂) and then assume that these parameters are fixed and known. Next, using the linearization of Seber and Wild, one makes the linear approximation m(x, β, λ̂) ≈ m(x, β₀, λ̂) + m_β(x, β₀, λ̂)(β − β₀), where m_β is the partial derivative of m with respect to β. Ignoring the variability of (θ̂, λ̂) and of β̂ in g(x, β̂, θ̂), and assuming a linear regression with response h(y, λ̂)/g(x, β̂, θ̂) and carrier m_β(x, β̂, λ̂)/g(x, β̂, θ̂), one calculates the usual homoscedastic linear regression prediction interval (1). Although this method of constructing prediction intervals is closely related to the usual method of setting confidence bounds for the mean function, other methods for the latter problem — for example, that of Khorasani and Milliken (1982) — do not generalize to prediction in any obvious manner.

We indicate that the error in the usual intervals, in terms of both the coverage probability and the endpoints of prediction intervals, is of the same order of magnitude, 1/n, as the error incurred when one uses normal rather than t percentage points. The error is due to the estimation of (θ, λ) and the possible nonlinearity of f(x, β) in β. Similar results in different contexts have been obtained by Atwood (1984), Cox (1975), and Beran (1990). Atwood and Cox suggested analytic adjustments to make the intervals correct to a higher order of magnitude, but these could be rather complex in the present situation. We suggest instead an adjustment by resampling, as did Beran (1990). In the usual framework, bootstrap confidence intervals for parameters can require many replications, Efron (1987) suggesting that as many as 1,000 replicates may be needed. This is particularly inconvenient in our case, where parameter estimation is highly nonlinear and thus already computationally intensive. We take advantage of the structure of the parametric prediction problem so that many fewer bootstrap replicates are needed.

At this point we need to look more precisely at what is meant by the coverage probability of a prediction interval. The coverage probability is the unconditional probability that the interval, which is random, will cover a future y_f, which is also random. This probability is meaningful if one is planning to collect a series of data sets and to make one or several predictions from each. In many situations, however, one will have a single data set and will make many future predictions. In such a case one would like to say something about the conditional probability, given the data used to construct the interval, that the interval covers y_f. A (1 − α) content/(1 − γ) confidence tolerance interval is constructed so that this conditional probability is at least (1 − α) with probability (1 − γ). In other words, if one collects many data sets and constructs an interval from each, then approximately (1 − γ) of the intervals will have conditional probability at least (1 − α) of containing the predicted observation. Here we follow other authors — for example, Guttman (1970) and Mee (1989) — and call (1 − α) the content of the interval and (1 − γ) the confidence coefficient. By using a tolerance rather than a prediction interval, we are more confident that our particular data set is one with (1 − α) conditional coverage probability for prediction. Note that terminology for prediction and tolerance intervals has not been standardized; see Mee (1989) for discussion. For example, our prediction and tolerance intervals are sometimes called, respectively, mean content tolerance intervals and guaranteed content tolerance intervals. In this article, conditional coverage probability and content will be used synonymously, as will unconditional coverage probability and mean content.

The distinction between prediction and tolerance intervals does not appear in standard regression textbooks; perhaps the idea of conditional coverage probability is considered too difficult for beginning students. Draper and Smith (1981) referred to our prediction interval as a confidence interval for a new response, which may obscure the fact that prediction of a random quantity is a more subtle issue than estimation of a fixed parameter. It appears to us that tolerance intervals should be more widely understood and used.

In Section 3, we consider tolerance intervals, indicating that the error made in ignoring estimation of parameters requires an adjustment of order n^{−1/2}. Since this is a slower order of convergence than that obtained for the prediction problem, we expect that the effect of estimating parameters will be greater. This is confirmed in Section 4, where we illustrate the methods on the stopping distance data and a data set used by Carroll and Ruppert (1988). We are not aware of other work on tolerance intervals in regression.
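The difference between mean content and guaranteed content is easy to see by simulation. The toy sketch below is ours (a one-sample normal problem, not an example from the paper): because the truth is known, the conditional content of the naive interval can be computed exactly for each simulated data set and compared with the nominal 1 − α.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, alpha, n_datasets = 20, 0.2, 5000
z = stats.norm.ppf(1 - alpha / 2)

content = np.empty(n_datasets)
for k in range(n_datasets):
    y = rng.normal(size=n)                 # data from N(0, 1)
    m, s = y.mean(), y.std(ddof=1)
    # conditional coverage (content) of the naive interval m +/- z*s for a
    # future y_f ~ N(0, 1), computable exactly because the truth is known
    content[k] = stats.norm.cdf(m + z * s) - stats.norm.cdf(m - z * s)

print("mean content:", content.mean())                              # near 1 - alpha
print("Pr(content >= 1 - alpha):", (content >= 1 - alpha).mean())   # well below 1
```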

1. CONFIDENCE INTERVALS FOR QUANTILES OF y

Suppose that the distribution F of ε is the standard normal distribution Φ. For fixed λ, let h(·, λ) have inverse function h^{−1}(·, λ), and write ζ = (β^T, θ^T, λ)^T. Given a known future value x_f of x, we wish to make inference about the resulting response y_f. Problems of interest are (a) confidence intervals for quantiles of y_f and (b) the construction of prediction and tolerance intervals for y_f with prescribed coverage probabilities. This section concerns only the first problem, though the problems are related. In fact, the most important application of confidence intervals for quantiles of y_f uses one-sided intervals: A (1 − γ) upper confidence bound for the (1 − α) quantile of y_f is a (1 − γ) confidence/(1 − α) content upper tolerance bound for y_f. In setting speed limits using the stopping-distance data, this tolerance bound is the type of interval that seems most natural to use. We will only discuss two-sided confidence intervals, since these also generate one-sided confidence intervals.

By Model (6), the γth quantile of y_f is q(γ, ζ) = h^{−1}[{m(x_f, β, λ) + g(x_f, β, θ)Φ^{−1}(γ)}, λ]. This is a standard parametric inference problem, which can be solved by, for example, the delta method, LR testing, or the bootstrap.

1.1 The Delta Method

Let n^{1/2}(ζ̂ − ζ₀) be asymptotically normally distributed with mean 0 and covariance matrix S*(ζ₀). Let q_ζ(γ, ζ) be the partial derivative of q(γ, ζ) with respect to ζ. Then q(γ, ζ̂) is asymptotically normally distributed with mean q(γ, ζ₀) and variance n^{−1}q_ζ(γ, ζ₀)^T S*(ζ₀) q_ζ(γ, ζ₀), which is consistently estimated by n^{−1}q_ζ(γ, ζ̂)^T S*(ζ̂) q_ζ(γ, ζ̂). If ζ̂ is the maximum likelihood estimate (MLE), then S*(ζ̂) can be estimated by the observed Fisher information, and the partial derivative q_ζ can be calculated analytically or numerically. Other estimators ζ̂ and corresponding S* were discussed by Ruppert and Aldershof (1989). Large-sample confidence intervals for q(γ, ζ) can then be constructed in the standard fashion.

1.2 Likelihood Ratio Intervals

Assume as before that g(x, β, θ) = σg*(x, β, η). The log-likelihood (LL) of y₁, ..., y_n, given x₁, ..., x_n, is, apart from an additive constant,

  LL(β, λ, η, σ) = Σ_{i=1}^n [log h_y(y_i, λ) − log{σg*(x_i, β, η)} − (2σ²)^{−1}{h(y_i, λ) − m(x_i, β, λ)}²/g*²(x_i, β, η)],  (7)

where h_y(·, λ) denotes the derivative of h with respect to its first argument, so that the first term is the Jacobian of the transformation. Let (β̂, λ̂, η̂, σ̂) be the MLE, and let LL_max = LL(β̂, λ̂, η̂, σ̂). To perform the LR test of H₀: q(γ, ζ₀) = q₀, one must maximize (7) subject to H₀. Maximizing a nonlinear function subject to a nonlinear equality constraint can be a difficult numerical problem, and software for solving this problem often is not readily available. One simple technique is to convert the constrained problem to an unconstrained problem of lower dimension. This is possible here if γ ≠ .5. Define

  σ*(q₀, x_f, β, λ, η) = {h(q₀, λ) − m(x_f, β, λ)}/{g*(x_f, β, η)Φ^{−1}(γ)}.  (8)

Then H₀ is equivalent to σ₀ = σ*(q₀, x_f, β₀, λ₀, η₀), and maximizing (7) subject to H₀ is equivalent to maximizing

  LL{β, λ, η, σ*(q₀, x_f, β, λ, η)}.  (9)

Let (β̂₀, λ̂₀, η̂₀) maximize (9). Then twice the log-likelihood ratio is

  LR(q₀) = 2[LL_max − LL{β̂₀, λ̂₀, η̂₀, σ*(q₀, x_f, β̂₀, λ̂₀, η̂₀)}].  (10)

To set the confidence interval, one finds the two values of q₀ at which (10) equals χ²(1 − α/2, 1), the 1 − α/2 quantile of the chi-squared distribution with 1 df. To do this, one can use Newton's method starting at each of the endpoints of the delta-method interval. This technique requires repeated calls to a likelihood maximization routine, especially since the derivative of LR(q) is needed, and this is most easily calculated numerically. Newton's method converged rapidly, however, in the examples of Section 4.

If γ = .5, then (8) is not defined, since Φ^{−1}(γ) = 0, but then H₀ is equivalent to

  h^{−1}{m(x_f, β₀, λ₀), λ₀} = q₀.  (11)

In many examples, (11) can be solved for one of the components of β in terms of the others, and then likelihood ratio testing proceeds as previously; see Section 4.

DiCiccio (1988) discussed Bartlett adjustments to LR inference for the linear regression model, including inference for q(γ, ζ). The extension of this work to transformation and weighting models seems complex, however.
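The substitution (8)-(9) is easy to carry out with a general-purpose optimizer. The sketch below is ours: LL, m, g_star, and h are user-supplied functions matching (7), pack splits the free parameter vector into (β, λ, η), and start is a starting value such as the unconstrained MLE of (β, λ, η).

```python
import numpy as np
from scipy import stats, optimize

def constrained_max_loglik(q0, gamma, x_f, LL, m, g_star, h, pack, start):
    """Maximize (7) subject to H0: q(gamma, zeta) = q0 by eliminating sigma
    through (8); valid for gamma != 0.5. Returns the constrained maximum,
    so that LR(q0) = 2*(LL_max - value), as in (10)."""
    z_gamma = stats.norm.ppf(gamma)

    def neg_loglik(free):
        beta, lam, eta = pack(free)
        sigma = (h(q0, lam) - m(x_f, beta, lam)) / (g_star(x_f, beta, eta) * z_gamma)
        if not np.isfinite(sigma) or sigma <= 0:
            return np.inf                      # outside the parameter space
        return -LL(beta, lam, eta, sigma)

    fit = optimize.minimize(neg_loglik, start, method="Nelder-Mead")
    return -fit.fun
```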

2. PARAMETRIC PREDICTION INTERVALS

We now turn attention to parametric prediction intervals, assuming that ε has a standard normal distribution. When λ and θ are known, β̂ has an approximate variance matrix S(ζ₀)^{−1}, where

  S(ζ) = Σ_{i=1}^n m_β(x_i, β, λ) m_β(x_i, β, λ)^T / g²(x_i, β, θ),

which is an analog to X^T X in (1). Now define c²(ζ) = m_β(x_f, β, λ)^T S(ζ)^{−1} m_β(x_f, β, λ) and T_n(ζ) = {g²(x_f, β, θ) + c²(ζ)}^{1/2}. T_n(ζ̂) is an analog to σ̂[1 + x_f^T(X^T X)^{−1}x_f]^{1/2} in (1). The naive interval has endpoints

  h^{−1}{m(x_f, β̂, λ̂) + Φ^{−1}(α/2)T_n(ζ̂), λ̂},  h^{−1}{m(x_f, β̂, λ̂) + Φ^{−1}(1 − α/2)T_n(ζ̂), λ̂}  (12)

and uses Δ_n(y_f) = {h(y_f, λ̂) − m(x_f, β̂, λ̂)}/T_n(ζ̂) as a pivot with an approximate normal distribution. The "usual" delta-method interval, which corrects only for estimation of β and σ, replaces the normal quantiles in (12) by t quantiles, using the t distribution as a better approximation to the distribution of Δ_n(y_f). We will develop more accurate intervals by finding still better approximations to the distribution of Δ_n(y_f). In fact, as shown in the appendix, {y_f ≤ h^{−1}{m(x_f, β̂, λ̂) + uT_n(ζ̂), λ̂}} = {Δ_n(y_f) ≤ u}, so we would like to replace Φ^{−1}(1 − α/2) and Φ^{−1}(α/2) in (12) with v(1 − α/2) and v(α/2), respectively, where v(a) is the ath quantile of Δ_n(y_f).

We will develop several methods for estimating the distribution function of Δ_n(y_f), obtaining estimated quantiles v̂(a) for any a and obtaining a prediction interval by substituting v̂(1 − α/2) and v̂(α/2) into (12). We show in the appendix that, for a certain function H(u, ζ, ζ₀),

  Pr{Δ_n(y_f) ≤ u} = E[Φ[{uT_n(ζ₀) + H(u, ζ̂, ζ₀) − H(u, ζ₀, ζ₀)}/g(x_f, β₀, θ₀)]].  (13)

Using this, in the appendix we show that there is a function V_n such that

  Pr{Δ_n(y_f) ≤ u} = Φ(u) + n^{−1}φ(u)V_n(u, x_f, ζ₀).  (14)

Equation (13) suggests a resampling method for estimating the coverage probability of a prediction interval and adjusting that interval to have correct coverage to a higher order. The bootstrap adjustment we propose is similar to that of Beran (1990) and to the "calibration" of confidence intervals proposed by Loh (1987).

2.1 Bootstrap Prediction Intervals

The observations in a single bootstrap trial are y_i* (i = 1, ..., n) for the sample, or i = f for the "future" observation, in which

  y_i* = h^{−1}{m(x_i, β̂, λ̂) + g(x_i, β̂, θ̂)ε_i*, λ̂}.  (15)

Here (ε₁*, ..., ε_n*, ε_f*) is an independent bootstrap sample of the errors. In the parametric case, these will be randomly generated standard normals. Let ζ̂* be the bootstrap estimate of ζ. Then the bootstrap estimate of coverage is

  Pr*{Δ_n*(y_f*) ≤ u} = E*[Φ[{uT_n(ζ̂) + H(u, ζ̂*, ζ̂) − H(u, ζ̂, ζ̂)}/g(x_f, β̂, θ̂)]].  (16)

Here Δ_n*(·) is Δ_n(·) with ζ̂* replacing ζ̂. In Equation (16), E* refers to the expectation taken over the bootstrap distribution holding ζ̂ fixed. For any value of a one can adjust u in (16) to find v̂_P(a), which is defined as the value of u such that (16) equals a. Then the parametric bootstrap prediction interval is the naive interval (12) with Φ^{−1}(a) replaced by v̂_P(a). The count-up-events bootstrap estimate is the proportion of the M bootstrap trials in which the event on the left side of (16) occurs. Instead of this estimate, we recommend taking the average of the random quantity on the right side of (16) over M bootstrap replicates. Both estimates approximate the "true" bootstrap estimate corresponding to M = ∞, but for a fixed M the latter has greater accuracy.

In the appendix it is shown that the error in using Pr*{Δ_n*(y_f*) ≤ u} as an estimate of Pr{Δ_n(y_f) ≤ u} is of the order n^{−2}, and hence the bootstrap is more accurate than the "usual" method, which does not correct for estimation of the transformation and weighting parameters and which, by (14), is only accurate to order 1/n. This result is similar to Proposition 3B of Beran (1990).

Efron (1987) emphasized that many more bootstrap replicates are needed to estimate a bootstrap probability by counting up events than are needed for estimating expectations of a smooth function. Since the right side of (16) is an expectation of a smooth function, the number of bootstrap replications M needed to estimate it accurately is fairly small. A similar point was made by Stine (1985).
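Schematically, the recommended smooth estimate of (16) looks as follows. This sketch is ours; simulate draws a bootstrap data set via (15), refit returns ζ̂*, and T_n, H, and g_f are user-supplied evaluations of T_n(·), H(·,·,·), and g(x_f, β̂, θ̂).

```python
import numpy as np
from scipy import stats

def smooth_coverage(u_grid, zeta_hat, M, simulate, refit, T_n, H, g_f):
    """Average the right side of (16) over M parametric bootstrap trials,
    giving a smooth Monte Carlo estimate of Pr*{Delta_n*(y_f*) <= u}."""
    total = np.zeros(len(u_grid))
    for _ in range(M):
        zeta_star = refit(simulate(zeta_hat))       # zeta* for this trial
        for j, u in enumerate(u_grid):
            num = (u * T_n(zeta_hat)
                   + H(u, zeta_star, zeta_hat) - H(u, zeta_hat, zeta_hat))
            total[j] += stats.norm.cdf(num / g_f(zeta_hat))
    return total / M   # invert this curve in u to obtain the quantiles v_P(a)
```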


Example 1. We illustrate the savings in computing effort for our method by an example. Consider the one-sample location/scale problem with pivot Δ_n(y_f) = (y_f − μ̂_y)/{σ̂_y(1 + n^{−1})^{1/2}}, where μ̂_y and σ̂_y are the sample mean and standard deviation based on a sample of size n from a normal distribution. We wish to compute the bootstrap coverage estimate by Monte Carlo techniques, using M bootstrap samples. In this simple case, Δ_n(y_f) is an exact pivot with a t distribution on (n − 1) df, so we restrict attention to methods that would yield the exact coverage probability with enough simulation.

Let y_f* be the bootstrap forecast. The count-up-events estimate is the percentage of times Δ_n*(y_f*) ≤ u, and this estimate has approximate variance Φ(u){1 − Φ(u)}/M. Our estimate is the average over M Monte Carlo replications of terms of the form

  Φ{uσ̂_ε*(1 + n^{−1})^{1/2} + μ̂_ε*}.  (17)

If we generate normally distributed bootstrap errors, the variance of this estimate is d²(u, n)/M, where d²(u, n) = var[Φ{uσ̂_ε(1 + n^{−1})^{1/2} + μ̂_ε}] and where μ̂_ε and σ̂_ε are the sample mean and standard deviation from a standard normal sample of size n. Neglecting the term (1 + n^{−1})^{1/2}, a Taylor series shows that Φ(uσ̂_ε + μ̂_ε) − Φ(u) ≈ φ(u){u(σ̂_ε − 1) + μ̂_ε}. Thus d²(u, n) ≈ n^{−1}φ²(u)(1 + u²/2), so the ratio of the variance of the counting estimate to that of our estimate is approximately nΦ(u){1 − Φ(u)}{φ²(u)(1 + u²/2)}^{−1}. For example, if n = 20 and u = 1.96, then the counting method is approximately 50 times more variable than ours. Of course, this example will be more optimistic than the more complex examples considered in this article.

It is interesting to note that the use of antisymmetric variates is not typically helpful. If (μ̂_ε*, σ̂_ε*) is the bootstrap estimate based on n normally distributed errors (ε₁*, ..., ε_n*) and if (μ̂_ε**, σ̂_ε**) is the bootstrap estimate based on (−ε₁*, ..., −ε_n*), then it seems natural to average the two values of (17) based on these antisymmetrically generated estimates with the hope that the induced negative correlation will reduce the variance. Our calculations show, however, that for typical values of u the resulting estimate of (17) will be more variable than the estimate using only independent errors.
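Example 1 can be checked numerically. The sketch below is ours; as M grows, both estimates converge to Pr(t_{n−1} ≤ u), but the smoothed version (17) is far less variable.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, M, u = 20, 500, 1.96
infl = np.sqrt(1 + 1 / n)

eps = rng.normal(size=(M, n))          # parametric bootstrap errors
eps_f = rng.normal(size=M)             # bootstrap "future" errors
mu_star = eps.mean(axis=1)             # bootstrap sample means of the errors
sd_star = eps.std(axis=1, ddof=1)      # bootstrap sample sds of the errors

# Delta_n*(y_f*) <= u is equivalent to eps_f <= mu_star + u*sd_star*infl
count_up = np.mean(eps_f <= mu_star + u * sd_star * infl)       # indicator average
smooth = np.mean(stats.norm.cdf(mu_star + u * sd_star * infl))  # average of (17)
print(count_up, smooth, stats.t.cdf(u, df=n - 1))               # common M -> inf limit
```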

WA&) 5 u 1ylT . . . , YJ Let p and s denote the sample mean and sample standard deviation of these residuals, respectively. = Z(X < O)Z{Q;(r^, -q, UN + GQnG!, -q, UN Because the residuals have neither mean 0 nor vari- x @[{uTn(to) + fJ(u, 6 to) ance 1 in general, define the centered and standard- ized residuals by et = (ri - p)/s. TO generate the - mu, to, &JMX,~/J”nl 441. (18) bootstrap sample, we need terms e: ; see (15). We suggest the use of the balanced bootstrap (Davison, Another calculation shows that as u + CQor u --+ - 33, Hinkley, and Schechtman 1986) applied either to the (18) does not converge to 1 or 0, respectively, be- (e,, . . . , e,) or to the (el, . . . , e,, - el, . . . , -e,), cause the power transformation cannot handle neg- for the asymmetric and symmetric cases, respec- ative y. This loss in probability is generally small and, tively. Let F, be the empirical distribution function

Let F_n be the empirical distribution function of the (e_i); if one assumes symmetry of the errors, one should let F_n be the empirical distribution function of (e₁, ..., e_n, −e₁, ..., −e_n). Then in (16) replace Φ by F_n. The use of the balanced bootstrap is already known to increase computational efficiency. The use of the empirical distribution function on the right side of (16) — rather than evaluating the left side of (16) at a single simulated y_f* — improves efficiency as well, as indicated by Example 1.

This nonparametric method takes a simple form. As in (15), let the bootstrap observations be denoted by y₁*, ..., y_n*. Let y_f(e_i) be given by (15) with the ith residual e_i replacing the bootstrap error and x_i equal to x_f. Then in the case of F symmetric, for a given ζ̂* the estimate of Pr{Δ_n(y_f) ≤ u} is the average over M bootstrap samples of the terms

  (2n)^{−1} Σ_{i=1}^n [I{Δ_n*(y_f(e_i)) ≤ u} + I{Δ_n*(y_f(−e_i)) ≤ u}],  (20)

with obvious modification if F is not symmetric.

This estimate of Pr{Δ_n(y_f) ≤ u} is adjusted for estimation of all parameters but not for the estimation of F by F_n. Thus if v̂_NP(a) is the value of u such that (20) equals a, then the nonparametric bootstrap that uses v̂_NP(·) instead of Φ^{−1}(·) in (12) will have correct coverage of order 1/n but not of order n^{−2}. To account for estimation of F, we suggest letting c(a) be a fixed quantile of F_n. Thus define v̂_FC(a) = F_n^{−1}{c(a)}, where for any a, c(a) is the value of c such that

  E*[(2n)^{−1} Σ_{i=1}^n [I{Δ_n*(y_f(e_i)) ≤ F_n*^{−1}(c)} + I{Δ_n*(y_f(−e_i)) ≤ F_n*^{−1}(c)}]] = a.  (21)

In (21), F_n* is F_n calculated from the bootstrap sample. The fully corrected (FC) nonparametric interval replaces Φ^{−1}(·) in (12) with v̂_FC(·). The fully corrected intervals are computed in the first example of Section 4, and they seem quite reasonable — slightly longer than the intervals that do not correct for F. We conjecture that these intervals are correct to order o(n^{−1}), but more research is needed. One technical difficulty is that F is an infinite-dimensional "parameter." Moreover, the choice between the usual step-function empirical, the piecewise linear version, or a smoothed estimate of F such as an integrated kernel density estimate will have an effect of order 1/n.

3. TOLERANCE INTERVALS

For given (α, γ), the goal of a tolerance interval (Aitchison and Dunsmore 1975; Guttman 1970) is to find random variables (L_n, U_n) based on (y₁, ..., y_n) such that

  Pr[Pr{L_n ≤ y_f ≤ U_n | y₁, ..., y_n} ≥ 1 − α] ≥ 1 − γ.  (22)

Tolerance intervals differ from prediction intervals in that the former have (1 − α)100% content at least (1 − γ)100% of the time rather than on average. We will show that naive tolerance intervals that ignore estimation of the parameters, especially the estimation of (λ, θ), need an adjustment of order n^{−1/2} to satisfy (22). This is an adjustment of bigger order than is necessary for prediction intervals — namely, n^{−1} as in (14) — indicating that it will be more critical to take estimation of parameters into account for tolerance intervals.

3.1 Main Results

For a given c > 0, define u_{cU} = Φ^{−1}(1 − c/2) and u_{cL} = Φ^{−1}(c/2), and U_{nc}(ζ) = h^{−1}{m(x_f, β, λ) + g(x_f, β, θ)u_{cU}, λ}, and define L_{nc}(ζ) similarly, but with u_{cL} replacing u_{cU}. The naive tolerance interval is {L_{nc}(ζ̂), U_{nc}(ζ̂)}. Recall that we have been using the notation that S*(ζ₀) is the asymptotic covariance matrix of n^{1/2}(ζ̂ − ζ₀). In the appendix we show that, for a certain function R_c(ζ),

  c = α + n^{−1/2}Φ^{−1}(γ)φ{Φ^{−1}(c/2)}{R_c(ζ₀)^T S*(ζ₀) R_c(ζ₀)}^{1/2};  (23)

that is, the naive (1 − c)100% prediction interval is, in fact, a tolerance interval with guaranteed content (1 − α) and confidence (1 − γ), where c and (α, γ) are related by (23). For any value of c, there are an infinite number of pairs (α, γ) satisfying (23), because one can decrease α and then compensate by increasing γ. From (23) one sees that the effect of estimating ζ is of the nontrivial order n^{−1/2}.

3.2 Computation

By (23), to construct the desired tolerance interval we need only find c and construct a naive (1 − c)100% interval. A plug-in estimate of c replaces ζ₀ by ζ̂ in (23) and solves the equation. This requires the computation of R_c and S* and the numerical solution of an equation in a single unknown, which are not particularly difficult, in general. One could develop an analog of this method for one-sided tolerance intervals, but this would simply generate the tolerance intervals mentioned in Section 1 that result from confidence intervals for quantiles of y_f.
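The plug-in delta-method computation of c reduces to solving one nonlinear equation. A sketch (ours; Rc and Sstar are user-supplied evaluations of R_c(ζ̂), as a function of c, and S*(ζ̂); γ < .5 is assumed, so that c < α):

```python
import numpy as np
from scipy import stats, optimize

def solve_for_c(alpha, gamma, n, Rc, Sstar):
    """Solve the plug-in version of (23):
    c = alpha + n^{-1/2} Phi^{-1}(gamma) phi{Phi^{-1}(c/2)} (Rc' Sstar Rc)^{1/2}."""
    z_gamma = stats.norm.ppf(gamma)       # negative for gamma < 1/2

    def eqn(c):
        r = Rc(c)                         # R_c depends on c through u_cU and u_cL
        sd = np.sqrt(r @ Sstar @ r)
        return c - alpha - z_gamma * stats.norm.pdf(stats.norm.ppf(c / 2)) * sd / np.sqrt(n)

    return optimize.brentq(eqn, 1e-8, alpha)   # the naive (1 - c)100% interval
                                               # is then the tolerance interval
```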

Besides the delta-method intervals just described, one can construct tolerance intervals with the bootstrap. In the appendix, the intermediate step (A.3) suggests a simple resampling method for choosing c. Use the notation of Section 2 for bootstrap random variables, and define

  K* = Φ[u_{cU} + {H(u_{cU}, ζ̂*, ζ̂) − H(u_{cU}, ζ̂, ζ̂)}/g(x_f, β̂, θ̂)] − Φ[u_{cL} + {H(u_{cL}, ζ̂*, ζ̂) − H(u_{cL}, ζ̂, ζ̂)}/g(x_f, β̂, θ̂)].  (24)

Then, using a count-up-events method, choose c so that Pr*(K* ≥ 1 − α) ≥ 1 − γ. For the power-transformation family, Equations (23), (24), and (A.3) must be modified. In these equations, each term involving Φ must be modified following the right sides of (18) and (19), respectively, keeping in mind that, as discussed in the appendix, in the tolerance interval case T_n(ζ) = g(x_f, β, θ).

The naive interval with c chosen in this manner is the parametric bootstrap tolerance interval. A nonparametric tolerance interval corrected for estimation of ζ but not for estimation of F replaces Φ in (24) by F_n but otherwise is the same as the parametric interval.

Finally, a fully corrected nonparametric tolerance interval replaces Φ by F_n in (24) and in (u_{cL}, u_{cU}) for the original sample, but in the bootstrap samples it replaces Φ by F_n*, where F_n* is the bootstrap version of F_n. Since F_n is the "true" error distribution in the bootstrap sampling, using F_n to construct the tolerance intervals in the bootstrap sampling is really an assumption that the error distribution is known. This is the reason for using F_n* instead.

In the examples of Section 4, we find the delta-method and resampling-method tolerance intervals to be reasonably similar in one case but somewhat dissimilar in the second, in which the sample size is smaller.

4. EXAMPLES

4.1 Stopping Distances

The first example will be the stopping-distance data mentioned in the introduction. There are 63 observations with speed S ranging from 4 to 40 miles per hour and stopping distance D ranging from 4 to 138 feet. We fit the model

  D_i = β₁S_i + β₂S_i² + σ(β₁S_i + β₂S_i²)^{θ/2}ε_i,  (25)

where the ε_i are assumed standard normal. In this model the variance is proportional to the θth power of the mean, with θ = 2 being the constant coefficient of variation model. The quadratic polynomial with zero intercept has been used by others (Ezekiel and Fox 1959; Snee 1986) to model the mean response and seems quite adequate by residual plots. Model (25) was fit by maximum likelihood using a GAUSS program written by us. Plots of the residuals

  r_i = {D_i − (β̂₁S_i + β̂₂S_i²)}/(β̂₁S_i + β̂₂S_i²)^{θ̂/2}

show no evidence of heteroscedasticity. Moreover, the residuals seem symmetric, with tails somewhat lighter than for the normal distribution.

The MLE's and their standard errors from the observed information matrix are β̂₁ = .648 (.120), β̂₂ = .0590 (.0059), and θ̂ = 1.38 (.213). Thus the variance certainly seems nonconstant, but the heteroscedasticity appears less severe than under a constant coefficient of variation. We also fit the TBS model, and that model gave only a slight increase of .7 to the log-likelihood. Both models seem perfectly adequate, and we chose the power-of-the-mean variance model because the TBS model will be illustrated in the following example.

Table 1 gives point estimates and delta-method and LR confidence intervals for selected quantiles of D given S. Only large quantiles (γ ≥ .75) are presented, because these are likely to be of most interest for this set of data. The point estimates show the heteroscedasticity — the .95 quantile is 32.9 feet above the median at S = 40 but only 16.1 feet above the median at S = 22. The heteroscedasticity and "leverage" affect the width of the confidence intervals. The 95% delta-method confidence interval for the .95 quantile has width 30 at the extreme point S = 40 but width only 8.1 at S = 22, in the middle of the observed values of S. The delta-method and the LR intervals are similar if one compares their lengths. The LR intervals are shifted somewhat to the right compared to the delta-method intervals, especially those for q(.95, 40). The same shift is found, but to a much greater extent, in the next example. There small quantiles (γ ≤ .25) are also examined and, as one might expect, the shift is sometimes in the opposite direction. In this example, we recommend the delta method, because it is easy and quick to calculate and appears to have acceptable accuracy.

Table 1. Quantiles of the Conditional Distribution of Stopping Distance (in feet) Given Speed

                        Speed (in mph)
Quantile          8             22            40
.95   MLE         14.4          58.9          153.2
      Delta       12.5, 16.3    54.9, 63.0    138, 168
      LR          12.8, 16.8    55.3, 63.7    140, 171
.90   MLE         13.2          55.4          145.9
      Delta       11.6, 14.9    51.8, 58.9    133, 159
      LR          11.8, 15.2    52.2, 59.5    134, 161
.75   MLE         11.2          49.4          134
      Delta       9.83, 12.6    46.5, 52.3    122, 145
      LR          9.92, 12.7    46.7, 52.6    123, 146

NOTE: Each entry has the point estimate on top, next the 95% delta-method confidence interval, and then the 95% likelihood ratio interval on the bottom.
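The authors fit (25) with a GAUSS program; an equivalent fit is easy with modern tools. The sketch below is ours (hypothetical arrays S and D holding the speeds and distances):

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

def neg_loglik_25(params, S, D):
    """Negative log-likelihood for model (25): the mean is b1*S + b2*S^2 and
    the standard deviation is sigma*(mean)^(theta/2)."""
    b1, b2, theta, log_sigma = params
    mean = b1 * S + b2 * S ** 2
    if np.any(mean <= 0):
        return np.inf                     # the mean must stay positive
    sd = np.exp(log_sigma) * mean ** (theta / 2)
    return -np.sum(stats.norm.logpdf(D, loc=mean, scale=sd))

# fit = minimize(neg_loglik_25, x0=[0.6, 0.06, 1.5, 0.0], args=(S, D),
#                method="Nelder-Mead")
```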


A highway engineer trying to judge if a section of road is safe with a 40-mph speed limit might ask, "What is the longest possible stopping distance that might occur at 40 mph?" Because 168.0 is the upper end of the 95% two-sided confidence interval for q(.95, 40), it is also a 95% content/97.5% confidence upper tolerance bound for D at S = 40. Using 95% confidence intervals for q(.99, 40) and q(.999, 40) (not in the table), we find that the 99% and 99.9% content, 97.5% confidence upper tolerance bounds for D at S = 40 are 185 and 205, respectively. These are probably too pessimistic, since they are based on the normality assumption, and there is evidence that the error distribution has somewhat lighter tails than this. Nonparametric construction of such high-content tolerance intervals is extremely difficult with only 63 observations. One could fit a more flexible parametric model to the error distribution. The symmetric beta distributions, centered at 0, might be appropriate, since there would be a shape as well as a scale parameter to adjust and the error distribution would have bounded support. We will not pursue this idea, since it is somewhat tangential to our main topic. How can one best answer the engineer's question, given the available data? We would say that at speeds not in excess of 40 mph, stopping distances as great as 200 feet might occur on rare occasions, even though the greatest observed distance in the data was only 138 feet.

Table 2 gives 80% mean content prediction intervals and 80% content/80% confidence tolerance intervals for D at S = 8, 22, and 40. The first three prediction intervals are all parametric and differ only in that the first is uncorrected, the second corrects only for estimation of β and σ, and the third corrects for the estimation of all parameters. Notice that the effect of also correcting for θ is larger than that of correcting for β and σ.

Table 2. Prediction and Tolerance Intervals for Stopping Distances (in feet)

                                            Speed (mph)
Interval                              8            22           40

80% Mean Content Prediction Intervals
Uncorrected, parametric               4.69, 13.2   30.2, 55.4   94.7, 146
Corrected for β and σ,                4.56, 13.3   30.0, 55.6   93.4, 147
  parametric delta method
Fully corrected, parametric           4.43, 13.4   29.9, 56.0   93.1, 148
  bootstrap
Corrected for β, σ, and θ,            4.15, 13.7   28.9, 56.8   91.8, 150
  nonparametric bootstrap
Fully corrected, nonparametric        3.90, 13.9   28.1, 57.6   91.2, 151
  bootstrap

80% Content/80% Confidence Tolerance Intervals
Fully corrected, parametric           3.46, 14.5   26.6, 58.9   86.5, 154
  delta method
Fully corrected, nonparametric        3.63, 14.3   27.9, 57.7   87.7, 153
  bootstrap

NOTE: Fully corrected intervals are corrected for the effects of estimating β, σ, θ, and, in the nonparametric case, F.

The first nonparametric interval does not correct for uncertainty about F but is, nonetheless, somewhat longer than the parametric intervals. The light tails of the residual empirical distribution cause moderate quantiles, in particular the 90th percentile, to be larger than the corresponding quantiles of a normal distribution with the same variance. This is the reason for the extra length of the nonparametric intervals. The situation is reversed for 90% or higher intervals, for light tails cause the more extreme quantiles of the residual distribution to be shorter than those of a normal distribution. The fully corrected nonparametric interval is longer still and to us seems the most realistic assessment of uncertainty, since there is no a priori reason for assuming normality.

The tolerance intervals are somewhat longer than the prediction intervals, which indicates the cost of having guaranteed content rather than mean content. The tolerance intervals would, of course, be even longer if one wished to guarantee the 80% content at a higher confidence level. For example, at S = 40, the 80% content/99% confidence parametric delta-method tolerance interval is (79.6, 161).

This is an example in which simple delta-method inference is reasonably close to inference by the more computationally intensive likelihood and resampling methods. In the next example, the sample size is only 27, less than half of the sample size here, and the variation about the mean is much larger. In that example, the delta method agrees far less with the other methods than it does here.

4.2 Salmon Recruitment

We now illustrate some of these methods applied to the TBS model. The Skeena River sockeye-salmon data were analyzed by Ruppert and Carroll (1985) and Carroll and Ruppert (1988, sec. 4.3). The data come from Ricker and Smith (1975) and consist of 28 yearly measurements of the size of the spawning population (x) in a given year and the number of recruits (y), which are spawned fish returning four years later to the river.


Thus the number of spawners in year t, x_t, is y_{t−4} minus the catch in year t. Because of the four-year lag and because of the obvious economic importance of knowing the size of recruitment, this is a natural prediction problem.

The data in thousands of fish were taken from Ricker and Smith (1975) and were given in table 4.1 of Carroll and Ruppert (1988). The variable x ranges from 87 to 1,066, and y ranges from 127 to 3,071. As discussed by Ricker and Smith (1975), the year 1951 is an outlier resulting from a rock slide that interfered with spawning. Carroll and Ruppert (1988) showed that this observation is highly influential. In the present analysis, 1951 is omitted. We will use the Ricker (1954) model f_R(x, β) = β₁x exp(−β₂x) combined with the TBS model, so that if ε ~ Φ,

  h(y, λ) = h{f_R(x, β), λ} + σε.

Uncorrected prediction intervals were given by Carroll and Ruppert (1988, sec. 4.5). As a sensitivity analysis, these intervals were recomputed with λ equal to selected values, all within two standard deviations of λ̂. The lengths of the intervals varied to a disturbing degree with λ, and this was the motivation for this article.

Point estimates and 95% confidence intervals for the quantiles of y_f were constructed as discussed in Section 1, using x_f = 87, 511, and 1,066; see Table 3. In this case, (11) reduces to β₁ = (q₀/x_f) exp(β₂x_f), so LR confidence intervals for the median can be set without special software for constrained optimization.
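The TBS likelihood for the Ricker model includes the Jacobian of the power transformation. The sketch below is ours (hypothetical arrays x and y holding spawners and recruits):

```python
import numpy as np
from scipy import stats

def h(u, lam):
    return np.log(u) if lam == 0 else (u ** lam - 1.0) / lam

def tbs_ricker_neg_loglik(params, x, y):
    """Negative log-likelihood for h(y, lam) = h{b1*x*exp(-b2*x), lam} + sigma*eps
    with eps ~ N(0, 1); (lam - 1)*log(y) is the Jacobian of the transformation."""
    b1, b2, lam, log_sigma = params
    f = b1 * x * np.exp(-b2 * x)
    if np.any(f <= 0):
        return np.inf
    resid = h(y, lam) - h(f, lam)
    return -np.sum(stats.norm.logpdf(resid, scale=np.exp(log_sigma))
                   + (lam - 1.0) * np.log(y))
```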

The delta and LR methods produce similar interval estimates of the median, especially at x_f = 511, which is in the middle of the range of x. In other instances, the LR interval is far from symmetric about the MLE, particularly if γ = .95 and x_f = 1,066, the largest value of x in the data set. It should come as no surprise that with only 27 observations, extreme quantiles (.05 and .95) are difficult to estimate, especially at extreme values of x. Comparison with the LR intervals suggests that the delta-method intervals are optimistic in such situations, and we do not recommend the delta method in this example.

Table 3. Maximum Likelihood Estimate (MLE) and 95% Confidence Intervals for the γ Quantile of Salmon Recruitment (in thousands of fish)

                               x_f
γ             87            511              1,066
.05   MLE     186           633              760
      Delta   84, 288       497, 769         496, 1,024
      LR      21, 279       450, 757         509, 1,087
.10   MLE     206           722              871
      Delta   111, 302      587, 858         567, 1,175
      LR      53, 296       537, 853         600, 1,250
.50   MLE     302           1,186            1,458
      Delta   211, 394      1,000, 1,371     937, 1,979
      LR      217, 419      1,014, 1,404     1,088, 2,103
.90   MLE     457           2,054            2,588
      Delta   256, 659      1,560, 2,549     1,502, 3,674
      LR      317, 805      1,688, 3,054     1,767, 5,084
.95   MLE     517           2,428            3,083
      Delta   258, 777      1,680, 3,175     1,594, 4,572
      LR      344, 970      1,922, 4,679     2,073, 9,064

NOTE: Confidence intervals are by the delta method (Delta) and likelihood-ratio testing (LR).

In Table 4, we list 80% mean content prediction intervals. The uncorrected Gaussian interval ignores the effect of estimating all parameters and is given by h^{−1}[h{f_R(x_f, β̂), λ̂} ± σ̂Φ^{−1}(1 − α/2), λ̂]. The (1 − α)100% interval, corrected for β and σ only, is

  h^{−1}[h{f_R(x_f, β̂), λ̂} ± t_{25,1−α/2}T_n(ζ̂), λ̂].  (26)

The uncorrected nonparametric interval is the same as the uncorrected Gaussian interval except that it replaces the normal distribution function Φ by the empirical distribution function of the residuals. The fully corrected Gaussian interval uses (19) to correct for the estimation of all parameters.

Table 4. Prediction and Tolerance Intervals for Salmon Recruitment (in thousands of fish)

                                    Spawners (in thousands of fish)
Interval                              87          511            1,066

80% Mean Content Prediction Intervals
Uncorrected, parametric               206, 457    722, 2,054     871, 2,588
Uncorrected, nonparametric            217, 430    733, 1,892     934, 2,374
Corrected for β and σ,                197, 482    690, 2,175     830, 2,747
  parametric delta method
Fully corrected, parametric           179, 501    663, 2,049     772, 2,697
  bootstrap
Corrected for β, σ, and λ,            178, 505    658, 2,067     766, 2,741
  nonparametric bootstrap

80% Content/80% Confidence Tolerance Intervals
Corrected for β and σ                 198, 481    684, 2,197     824, 2,777
Fully corrected, parametric           152, 673    508, 3,288     608, 4,203
  delta method
Corrected for β, σ, and λ,            155, 651    663, 2,289     727, 3,267
  nonparametric bootstrap

We created 500 bootstrap samples, and ζ̂₁*, ..., ζ̂₅₀₀* were evaluated and saved. For any value of u, the right side of (19) was approximated by averaging the integrand over the 500 values of ζ̂*. For any γ, the value v̂_P(γ) at which the right side of (19) is equal to γ was found by Newton's method starting at u₀ = Φ^{−1}(γ). Usually only two or three iterations were needed to be within .001 of γ. The fully corrected Gaussian interval is given by (26) with −t_{25,1−α/2} and t_{25,1−α/2} replaced by v̂_P(α/2) and v̂_P(1 − α/2), respectively. As discussed in Section 2.3, the fully corrected nonparametric estimate is a more efficient version of the count-up-events estimate. In the example, we found that the count-up-events estimate was three times more variable than the suggested estimate for the fully corrected Gaussian interval using (16) and 2.6 times more variable than the nonparametric estimate.
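The search for v̂_P(γ) just described amounts to root-finding on a smooth function of u built from the saved bootstrap estimates; a sketch (ours; coverage(u) evaluates the Monte Carlo average of the right side of (19)):

```python
from scipy import stats, optimize

def v_hat_P(gamma, coverage):
    """Find u with coverage(u) = gamma by Newton's method, starting at the
    normal quantile; typically only a few iterations are needed."""
    return optimize.newton(lambda u: coverage(u) - gamma,
                           x0=stats.norm.ppf(gamma), tol=1e-3)
```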

An important question in practice is whether to use parametric or nonparametric intervals. When the parametric and nonparametric intervals do not agree, we favor the nonparametric intervals, since the parametric intervals are quite sensitive to nonnormality. A transformation can frequently induce near normality, however, and when, as in this example, the residuals appear normally distributed, the parametric and nonparametric intervals are similar. Moreover, extreme quantiles require parametric intervals unless the sample size is extremely large.

We also constructed 80% content/80% confidence tolerance intervals for these data; see Table 4. The plug-in delta method described just before Equation (24) and the bootstrap method were used. The naive uncorrected tolerance interval is the same as the uncorrected prediction interval. In Table 4, we exhibit the tolerance interval when one corrects for estimation of β and σ but not λ. Note that there is quite a big difference between the uncorrected and fully corrected intervals; the fully corrected intervals have required a modification of order O_p(n^{−1/2}). The difference between the partially and fully corrected intervals shows that the largest effect is due to estimating λ. The next largest effect is due to estimation of σ. One can show that if only β is unknown, then the delta-method correction of order O_p(n^{−1/2}) is 0, but if σ is also unknown, then the correction of order O_p(n^{−1/2}) is nonzero. Intuitively, the reason is that variation in β̂ shifts the interval but does not change its length, and this has an effect only of order O_p(n^{−1}). Figure 1 illustrates the relative sizes at x_f = 1,066 of three nonparametric intervals — uncorrected, corrected prediction, and corrected tolerance. It is apparent from the figure that the size of the corrections is nontrivial relative to the variability in the data as measured by the uncorrected interval.

[Figure 1. Salmon Recruitment Prediction and Tolerance Intervals When x_f = 1,066. The original figure marks, on a recruitment scale of roughly 600 to 3,000 thousands of fish, the point estimate, the uncorrected interval, the nonparametric prediction interval corrected for estimation of all parameters, and the nonparametric tolerance interval corrected for estimation of all parameters.]

In this example the sample size is moderately small, the variation about the mean is large, and there is considerable heteroscedasticity and some skewness before transformation. The fact that our methods give quite reasonable answers here is encouraging. There is strong evidence, however, that the more refined approximations provided by, for example, resampling are needed here.

5. CONCLUDING REMARKS

Ordinary prediction and tolerance intervals and confidence intervals for quantiles will often be noticeably optimistic if one ignores the estimation of transformation and weighting parameters. In prediction intervals, ignoring the effects of parameter estimation causes errors of order n^{−1}, but these can be large, especially for small sample sizes — for example, the salmon data of Section 4. We have suggested methods that correct for this optimism and take account of estimation of all parameters. These corrections can be important. In the salmon data set, where prediction will be of economic significance, the size of the corrections is larger than the effect of using certain other spawner-recruit models for the median response.

Our results shed light on a controversy in the transformations literature. In the Box-Cox model, β̂ varies greatly because of variation in λ̂. Box and Cox (1982) and Hinkley and Runger (1984) advocated treating λ̂ as fixed because λ̂ determines the scale on which conclusions are made. The problem with this approach is that it does not adequately account for uncertainty when one makes conclusions on the original scale. Unfortunately, this point was not discussed adequately by Box and Cox or by Hinkley and Runger, with the result that their articles can be misleading. For example, Rode and Chinchilli (1988) used Box-Cox transformations to construct multivariate tolerance regions. Although their tolerance regions are on the original scale, they cited Box and Cox (1982) as justification for ignoring variability in λ̂. The results in this article suggest that the Rode and Chinchilli regions may be considerably too small. In the discussion of Hinkley and Runger (1984), Rubin (1984) and Carroll and Ruppert (1984) pointed out the dangers of the Box/Cox/Hinkley/Runger philosophy. But perhaps because it is so tempting to ignore the problem of variability in λ̂, the tendency among those using transformations has been to ignore these dangers. We hope the results in this article will serve to correct this tendency.

As can be seen in the recent survey by Guttman (1988), the literature on tolerance intervals has focused on exact methods for simple models. Compared to parametric estimates, however, tolerance intervals can be much more sensitive to assumptions such as normality or constant variance. Therefore, one should not choose a simple model unless it clearly fits well. When a complex, nonlinear model is needed, the bootstrap seems an excellent way to approximate the distribution of the conditional coverage probability and thereby calibrate a tolerance interval. Beran (1990) independently studied the bootstrap as a means to estimate conditional coverage probabilities, and he had a number of interesting theoretical results. Our work differs from his in that we construct tolerance intervals and we focus on transformation and weighting models. For data sets such as the stopping distances, where n is sufficiently large or σ is relatively small, the less accurate delta method of setting tolerance intervals seems satisfactory.

It should be emphasized again that all of our intervals are approximate. Given enough simulations, the parametric bootstrap can evaluate a coverage probability exactly at ζ = ζ̂, but this will, in general, differ conditionally on ζ̂ by O_p(n^{−3/2}) and unconditionally by O(n^{−2}) from the same probability evaluated at ζ = ζ₀. In certain simple univariate problems, sophisticated methods such as the accelerated, bias-corrected bootstrap will generate exact intervals (Efron 1987), but such refinements are not readily generalized to problems as complex as those studied in this article.

ACKNOWLEDGMENTS

Our research was supported by the Air Force Office of Scientific Research, the National Science Foundation, and the Army Research Office through the Mathematical Sciences Institute at Cornell. We thank two referees and an editor for their many suggestions that greatly improved the exposition. Mary Dowling proofread an earlier manuscript and provided helpful comments. Rudy Beran kindly sent us a preprint of his interesting article. We thank Jeff Wu for the reference to Loh's paper.

APPENDIX: SECOND-ORDER ANALYSIS OF THE BOOTSTRAP

Make the following further definitions: G(u, ζ) = uT_n(ζ) + m(x_f, β, λ), and H(u, ζ, ζ₀) = h[h^{−1}{G(u, ζ), λ}, λ₀]. Let H_ζ and H_ζζ be the first and second derivatives with respect to ζ of the function H(u, ζ, ζ₀) evaluated at ζ₀, and suppose that bias and variance are of the usual orders of magnitude — namely,

  E(ζ̂ − ζ₀) = n^{−1}b(ζ₀);  E(ζ̂ − ζ₀)(ζ̂ − ζ₀)^T = n^{−1}S*(ζ₀) + o(n^{−1}).

We also assume that S*(ζ₀) converges to a positive definite matrix, that the x_i remain in a bounded set, and that f and g are smooth. These regularity conditions could be weakened, but we will not try to do that here.

We find that

  I{Δ_n(y_f) ≤ u} = I{h(y_f, λ₀) ≤ H(u, ζ̂, ζ₀)}
  = I{ε_f g(x_f, β₀, θ₀) ≤ H(u, ζ̂, ζ₀) − m(x_f, β₀, λ₀)}
  = I{ε_f g(x_f, β₀, θ₀) ≤ uT_n(ζ₀) + H(u, ζ̂, ζ₀) − H(u, ζ₀, ζ₀)}
  = I[y_f ≤ h^{−1}{uT_n(ζ̂) + m(x_f, β̂, λ̂), λ̂}].

This means that

  Pr{Δ_n(y_f) ≤ u | y₁, ..., y_n} = Φ[{uT_n(ζ₀) + H(u, ζ̂, ζ₀) − H(u, ζ₀, ζ₀)}/g(x_f, β₀, θ₀)].  (A.1)

Thus

  Pr{Δ_n(y_f) ≤ u} = E[Φ[{uT_n(ζ₀) + H(u, ζ̂, ζ₀) − H(u, ζ₀, ζ₀)}/g(x_f, β₀, θ₀)]]
  = Φ{uT_n(ζ₀)/g(x_f, β₀, θ₀)} + n^{−1}φ(u)V_{1,n}(u, x_f, ζ₀) + n^{−1}V_{2,n}(u, x_f, ζ₀) + O(n^{−2}),  (A.2)

where

  V_{1,n}(u, x_f, ζ₀) = [H_ζ(u)^T b(ζ₀) + 2^{−1} tr{H_ζζ(u)S*(ζ₀)}]/g(x_f, β₀, θ₀) − (u/2) tr{S*(ζ₀)H_ζ(u)H_ζ(u)^T}/g(x_f, β₀, θ₀)²

and V_{2,n} is a smooth function that we will not give explicitly. Here H_ζ and H_ζζ are, respectively, the vector of first and the matrix of second partial derivatives of H(u, ζ, ζ₀) with respect to ζ evaluated at ζ = ζ₀. Equation (A.2) shows that the "usual" interval, which ignores estimation of transformation and weighting parameters, has correct coverage probability to terms of order 1/n.

Notice that V_{1,n} is a smooth function of ζ₀. Moreover, T_n(ζ)/g(x_f, β₀, θ₀) = 1 + n^{−1}W_n(ζ), where W_n is also a smooth function of ζ, (ζ̂ − ζ₀) = O_p(n^{−1/2}), and E(ζ̂ − ζ₀) = O(n^{−1}). Thus, when ζ̂ is substituted for ζ₀ in the right side of (A.2), the error is O_p(n^{−3/2}) conditionally on ζ̂ but O(n^{−2}) unconditionally. The coverage probability of a prediction interval depends on the unconditional error. This verifies the claim beginning the second paragraph of Section 2.1.

We now give the details of the delta method for tolerance intervals. In the preceding arguments, let T_n(ζ) now equal g(x_f, β, θ) and modify G(u, ζ) and H(u, ζ, ζ₀) accordingly. This redefinition of T_n is not necessary, but it simplifies matters. We could have used this simpler choice of T_n to construct the prediction intervals, but there we wanted to start with the "usual" interval to see the effect on this interval of correction for estimation of the transformation and weighting parameters. Define R_c(ζ₀) = {H_ζ(u_{cU}) − H_ζ(u_{cL})}/g(x_f, β₀, θ₀), where H_ζ(u) denotes the derivative for the given value of u. Using (A.1) and noting that φ(u_{cL}) = φ(u_{cU}), we find that, neglecting higher-order terms,

  Pr{L_{nc}(ζ̂) ≤ y_f ≤ U_{nc}(ζ̂) | y₁, ..., y_n}
  = Φ[u_{cU} + {H(u_{cU}, ζ̂, ζ₀) − H(u_{cU}, ζ₀, ζ₀)}/g(x_f, β₀, θ₀)] − Φ[u_{cL} + {H(u_{cL}, ζ̂, ζ₀) − H(u_{cL}, ζ₀, ζ₀)}/g(x_f, β₀, θ₀)]
  = 1 − c + φ{Φ^{−1}(c/2)}R_c(ζ₀)^T(ζ̂ − ζ₀).  (A.3)

In the notation of Section 2, we have that n^{1/2}R_c(ζ₀)^T(ζ̂ − ζ₀) → Normal{0, R_c(ζ₀)^T S*(ζ₀) R_c(ζ₀)}. Thus, to satisfy (22) as n → ∞, choose c > 0 such that 1 − γ ≤ Pr[(1 − c) + φ{Φ^{−1}(c/2)}R_c(ζ₀)^T(ζ̂ − ζ₀) + O_p(n^{−1}) ≥ 1 − α], so that, ignoring terms of order n^{−1},

  c = α + n^{−1/2}Φ^{−1}(γ)φ{Φ^{−1}(c/2)}{R_c(ζ₀)^T S*(ζ₀) R_c(ζ₀)}^{1/2}.

[Received March 1989. Revised August 1990.]

REFERENCES

Aitchison, J., and Dunsmore, I. R. (1975), Statistical Prediction Analysis, Cambridge, U.K.: Cambridge University Press.

Atwood, C. L. (1984), "Approximate Tolerance Intervals, Based on Maximum Likelihood Estimates," Journal of the American Statistical Association, 79, 459-465.

Beran, R. (1990), "Calibrating Prediction Regions," Journal of the American Statistical Association, 85, 715-723.

Box, G. E. P., and Cox, D. R. (1964), "An Analysis of Transformations," Journal of the Royal Statistical Society, Ser. B, 26, 211-252.

——— (1982), "An Analysis of Transformations Revisited, Rebutted," Journal of the American Statistical Association, 77, 209-210.

Carroll, R. J., and Ruppert, D. (1984), Comment on "The Analysis of Transformed Data," by D. Hinkley and G. Runger, Journal of the American Statistical Association, 79, 312-313.

——— (1988), Transformation and Weighting in Regression, New York: Chapman & Hall.

Cox, D. R. (1975), "Prediction Intervals and Empirical Bayes Confidence Intervals," in Perspectives in Probability and Statistics, ed. J. Gani, London: Academic Press, pp. 47-55.


Davidian, M., and Carroll, R. J. (1987), "Variance Function Estimation," Journal of the American Statistical Association, 82, 1079-1091.

Davison, A., Hinkley, D., and Schechtman, E. (1986), "Efficient Bootstrap Simulation," Biometrika, 73, 555-566.

DiCiccio, T. (1988), "Likelihood Inference for Linear Regression Models," Biometrika, 75, 29-34.

Draper, N., and Smith, H. (1981), Applied Regression Analysis (2nd ed.), New York: John Wiley.

Efron, B. (1987), "Better Bootstrap Confidence Intervals," Journal of the American Statistical Association, 82, 171-200.

Ezekiel, M., and Fox, K. A. (1959), Methods of Correlation and Regression Analysis, New York: John Wiley.

Guttman, I. (1970), Statistical Tolerance Regions, London: Charles W. Griffin.

——— (1988), "Tolerance Regions, Statistical," in Encyclopedia of Statistical Sciences (Vol. 9), eds. S. Kotz and N. L. Johnson, New York: John Wiley, pp. 272-287.

Hinkley, D., and Runger, G. (1984), "The Analysis of Transformed Data," Journal of the American Statistical Association, 79, 302-309.

Khorasani, F., and Milliken, G. A. (1982), "Simultaneous Confidence Bands for Nonlinear Regression Models," Communications in Statistics—Theory and Methods, 11, 1241-1253.

Loh, W.-Y. (1987), "Calibrating Confidence Coefficients," Journal of the American Statistical Association, 82, 155-162.

Mee, R. W. (1989), "Normal Distribution Tolerance Limits for Stratified Random Samples," Technometrics, 31, 99-106.

Neter, J., and Wasserman, W. (1974), Applied Linear Regression Models, Homewood, IL: Richard D. Irwin.

Ricker, W. E. (1954), "Stock and Recruitment," Journal of the Fisheries Research Board of Canada, 11, 559-623.

Ricker, W. E., and Smith, H. D. (1975), "A Revised Interpretation of the History of the Skeena River Sockeye Salmon," Journal of the Fisheries Research Board of Canada, 32, 1369-1381.

Rode, R. S., and Chinchilli, V. M. (1988), "The Use of Box-Cox Transformations in the Development of Multivariate Tolerance Regions With Applications to Clinical Chemistry," The American Statistician, 42, 23-30.

Rubin, D. (1984), Comment on "The Analysis of Transformed Data," by D. Hinkley and G. Runger, Journal of the American Statistical Association, 79, 309-312.

Rudemo, M., Ruppert, D., and Streibig, J. (1989), "Random Effect Models in Nonlinear Regression With Applications to Bioassay," Biometrics, 45, 349-362.

Ruppert, D., and Aldershof, B. (1989), "Transformations to Symmetry and Homoscedasticity," Journal of the American Statistical Association, 84, 437-446.

Ruppert, D., and Carroll, R. J. (1985), "Data Transformation in Regression Analysis With Applications to Stock-Recruitment Relationships," in Resource Management (Lecture Notes in Biomathematics, Vol. 61), ed. M. Mangel, New York: Springer.

Ruppert, D., Cressie, N. A. C., and Carroll, R. J. (1989), "A Transformation/Weighting Model for Estimating Michaelis-Menten Parameters," Biometrics, 45, 637-656.

Seber, G. A. F., and Wild, C. J. (1989), Nonlinear Regression, New York: John Wiley.

Snee, R. D. (1986), "An Alternative Approach to Fitting Models When Re-expression of the Response Is Useful," Journal of Quality Technology, 18, 211-225.

Stine, R. A. (1985), "Bootstrap Prediction Intervals for Regression," Journal of the American Statistical Association, 80, 1026-1031.

Wedderburn, R. W. M. (1974), "Quasi-likelihood Functions, Generalized Linear Models and the Gauss-Newton Method," Biometrika, 61, 439-447.
