Model Selection, Transformations and Estimation in Nonlinear Regression

Olaf Bunke¹, Bernd Droge¹ and Jörg Polzehl²
¹ Institut für Mathematik, Humboldt-Universität zu Berlin, PSF 1297, D-10099 Berlin, Germany
² Konrad-Zuse-Zentrum für Informationstechnik, Heilbronner Str. 10, D-10711 Berlin, Germany

Abstract

The results of analyzing experimental data using a parametric model may heavily depend on the chosen model. In this paper we propose procedures for the adequate selection of nonlinear regression models if the intended use of the model is among the following: 1. prediction of future values of the response variable, 2. estimation of the unknown regression function, 3. calibration, or 4. estimation of some parameter with a certain meaning in the corresponding field of application. Moreover, we propose procedures for variance modelling and for selecting an appropriate nonlinear transformation of the observations, which may lead to an improved accuracy. We show how to assess the accuracy of the parameter estimators by an "oriented bootstrap procedure". This procedure may also be used for the construction of confidence, prediction and calibration intervals. Programs written in Splus which realize our strategy for nonlinear regression modelling and parameter estimation are described as well. The performance of the selected model is discussed, and the behaviour of the procedures is illustrated by examples.

Key words: Nonlinear regression, model selection, bootstrap, cross-validation, variable transformation, variance modelling, calibration, squared error for prediction, computing in nonlinear regression. AMS 1991 subject classifications: 62J99, 62J02, 62P10.

1 Selection of regression models

1.1 Preliminary discussion

In many papers and books it is discussed how to analyse experimental data by estimating the parameters in a linear or nonlinear regression model, see e.g. Bunke and Bunke [2], [3], Seber and Wild [15] or Huet, Bouvier, Gruet and Jolivet [11]. In the usual situation it is not known whether a certain regression model describes the unknown true regression function sufficiently well, and the results of the statistical analysis may depend heavily on the chosen model. Therefore the model should be selected carefully, based on the scientific and practical experience in the corresponding field of application and on statistical procedures. Moreover, a nonlinear transformation of the observations or an appropriate model of the variance structure can lead to an improved accuracy.

In this paper we propose procedures for the adequate selection of regression models. This includes choosing suitable variance models and transformations. Additionally, we present methods to assess the accuracy of estimates, calibrations and predictions based on the selected model.

The general framework of this paper is as follows. We consider possibly replicated observations of a response variable Y at fixed values of explanatory variables (or nonrandom design points) x1, ..., xk, which follow the model

$$ Y_{ij} = f(x_i) + \varepsilon_{ij}, \qquad i = 1, \dots, k, \quad j = 1, \dots, n_i, \qquad \sum_{i=1}^{k} n_i = n. \qquad (1.1) $$

In (1.1) the regression function f is unknown and real-valued, and the ε_ij are assumed to be uncorrelated random errors with zero mean and positive variances σ_i² = σ²(x_i). The usual assumption of a homogeneous error variance σ_i² = σ² is unrealistic in many applications, and one is often confronted with heteroscedasticity.

The analysis of the data requires in general to estimate the regression function, which describes the dependence of the response variable on the explanatory variables. This is usually done by assuming that this dependence may be described by a parametric model

$$ \{ f(\cdot, \vartheta) \mid \vartheta \in \Theta \}, \qquad (1.2) $$

where the function f(·, ϑ) is known up to a p-dimensional parameter ϑ ∈ Θ ⊆ R^p, so that the problem reduces to estimating this parameter from the data. As an estimate of the parameter we will use the ordinary least squares estimator (OLSE) ϑ̂, which is the minimizer of the sum of squares

$$ S(\vartheta) = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \bigl( Y_{ij} - f(x_i, \vartheta) \bigr)^2 \qquad (1.3) $$

with respect to ϑ. In practice this is the most popular approach to estimating ϑ. Weighted LSE will be discussed later in Section 2. To simplify the presentation in what follows, we will preliminarily restrict ourselves to the case of real-valued design points, that is, we will assume to have only one explanatory variable. Note that the model (1.2) is called linear if it depends linearly on ϑ; otherwise it is called nonlinear. Our focus will be on the latter case.

A starting point in an approach to model selection is the idea that even if a certain regression model is believed to be convenient for given experimental data, either because of theoretical reasonings or based on past experience with similar data, there is seldom firm evidence of the validity of a model of the form (1.2). Therefore a possible modification of the model could lead to a better fit or to more accurate estimates of the interesting parameters. A parameter with a certain physical or biological meaning may often be represented in different regression models as a function of the corresponding parameters. This will be the case if the parameter γ of interest is a function

$$ \gamma = \gamma[f(x_1), \dots, f(x_k)] \qquad (1.4) $$

of the values of the regression function at the values x1, ..., xk of its argument. Examples are one of these values, e.g. γ = f(x1), the linear slope (or growth rate)

$$ \gamma = \frac{\sum_{i=1}^{k} f(x_i)(x_i - \bar{x})}{\sum_{i=1}^{k} (x_i - \bar{x})^2} \quad \text{with} \quad \bar{x} = \frac{1}{k} \sum_{i=1}^{k} x_i \qquad (1.5) $$

or the approximated area under the curve

$$ \gamma = \frac{1}{2} \sum_{i=1}^{k-1} \bigl( f(x_{i+1}) + f(x_i) \bigr)(x_{i+1} - x_i), \qquad (1.6) $$

which is used to characterize rate and extent of drug absorption in pharmacokinetic studies. Alternatively, theoretical reasonings as well as experience may lead to several models which at first seem to be equally adequate, so that the ultimate choice of a specific model among them is left open for the analysis of the data.
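To make the two example parameters concrete, here is a minimal R sketch (R being a close descendant of the S-Plus dialect in which the programs described later are written). The function names and the toy inputs are our own illustrations, not part of the paper's software:

```r
# linear slope (growth rate), formula (1.5), from values f(x_1),...,f(x_k)
slope_gamma <- function(x, f) {
  xbar <- mean(x)
  sum(f * (x - xbar)) / sum((x - xbar)^2)
}

# trapezoidal approximation to the area under the curve, formula (1.6)
auc_gamma <- function(x, f) {
  k <- length(x)
  0.5 * sum((f[-1] + f[-k]) * (x[-1] - x[-k]))
}

# toy illustration with hypothetical design points and function values
x <- c(1, 2, 4, 7)
f <- c(0.5, 1.1, 1.8, 2.2)
slope_gamma(x, f)  # gamma as in (1.5)
auc_gamma(x, f)    # gamma as in (1.6)
```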

1.2 Examples

The following examples illustrate the above discussion by describing situations with different objectives in analyzing the data.

Example 1 (Pasture regrowth). Ratkowsky [12] considered data (see Table 1) describing the dependence of the pasture regrowth yield on the time since the last grazing, and the objective of the analysis is to model this dependence. A careful investigation of a graph of the data (see Figure 1) suggested that the process produces a sigmoidally shaped curve, as is provided

Table 1: Data of yield of pasture regrowth versus time

Time after pasture:   9     14    21     28     42     57     63     70     79
Yield:                8.93  10.8  18.59  22.33  39.35  56.11  61.73  64.62  67.08

Table 2: Sum of squared errors for alternative sigmoidal models in Example 1

model        (1.7)  (1.8)  (1.9)  (1.10)
n⁻¹ S(ϑ̂)     0.93   4.65   2.42   0.80

by the Weibull model

$$ f(x, \vartheta) = \vartheta_1 - \vartheta_2 \exp(-\vartheta_3 x^{\vartheta_4}). \qquad (1.7) $$

But there are other models of sigmoidal form which could fit data from growth processes as well, e.g.

$$ f(x, \vartheta) = \vartheta_1 \exp[\vartheta_2 / (x + \vartheta_3)] \qquad (1.8) $$

or

$$ f(x, \vartheta) = \vartheta_1 \exp[-\exp(\vartheta_2 - \vartheta_3 x)] \qquad (1.9) $$

[Figure 1: Pasture regrowth: Observed yield and fitted regression curves for models (1.7), (1.8), (1.9) and (1.10); yield plotted against time after pasture.]

(three-parameter Gompertz model) or

$$ f(x, \vartheta) = \vartheta_1 + \frac{\vartheta_2}{1 + \exp(\vartheta_3 - \vartheta_4 x)} \qquad (1.10) $$

(four-parameter logistic model). A listing and a description of even more alternative sigmoidal nonlinear models is given in Ratkowsky [13] (see also Seber and Wild [15] or Ross [14]). The regression curves corresponding to the alternative models (1.8), (1.9) and (1.10) fit the observations differently well, as may be seen from the corresponding values of the sum of squared errors (1.3) in Table 2 or from the plot of the fitted models in Figure 1. Notice that the fitted curves of models (1.7) and (1.10) are hardly distinguishable in Figure 1.

Example 2 (Radioimmunological assay of cortisol: Calibration). In calibration

Table 3: Data for the calibration problem of a RIA of cortisol

Dose in ng/.1 ml    Response in counts per minute
0                   2868 2785 2849 2805 2779 2588 2701 2752
0.02                2615 2651 2506 2498
0.04                2474 2573 2378 2494
0.06                2152 2307 2101 2216
0.08                2114 2052 2016 2030
0.1                 1862 1935 1800 1871
0.2                 1364 1412 1377 1304
0.4                 910 919 855 875
0.6                 702 701 689 696
0.8                 586 596 561 562
1                   501 495 478 493
1.5                 392 358 399 394
2                   330 351 343 333
4                   250 261 244 242
∞                   131 135 134 133

experiments one first takes a training (calibration) sample and fits a model to the data, providing a calibration curve. However, in contrast to Example 1, the real interest lies here in estimating an unknown value of the explanatory variable corresponding to a value of the response variable which is independent of the training sample and may easily be measured. This is usually done by inverting the calibration curve. We will illustrate the procedure by considering the radioimmunological assay (RIA) of cortisol. The corresponding data from the laboratoire de physiologie de la lactation

(INRA) are reported in Huet, Bouvier, Gruet and Jolivet [11] and are reproduced in Table 3. The response variable is the quantity of a bound complex of antibody and radioactive hormone, while the explanatory variable is the dose of some hormone. In practice, the Richards (generalized logistic) model,

$$ f(x, \vartheta) = \vartheta_1 + \frac{\vartheta_2}{[1 + \exp(\vartheta_3 - \vartheta_4 x)]^{\vartheta_5}}, \qquad (1.11) $$

is used for describing the dependence between the response and the logarithm of the dose. Since the aim of the assay is to estimate an unknown dose of hormone for a value y0 of the response, we have to invert the fitted calibration curve in y0, providing

$$ \hat{x}(y_0) = \hat\vartheta_4^{-1} \left( \hat\vartheta_3 - \log\left[ \left( \frac{\hat\vartheta_2}{y_0 - \hat\vartheta_1} \right)^{1/\hat\vartheta_5} - 1 \right] \right). \qquad (1.12) $$

In (1.12) ϑ̂1, ..., ϑ̂5 denote the least squares estimates of the parameters of model (1.11), and x̂(y0) is the estimated log-dose associated with the response y0. Of course, formula (1.12) can only be applied if y0 belongs to the range of model (1.11), that is, if y0 ∈ (ϑ̂1, ϑ̂1 + ϑ̂2). If y0 lies outside of this range, then we have to modify the procedure, e.g. by taking the minimum or maximum value of the observed dose depending on whether y0 is larger or smaller than all possible values of the model function (in case f(x, ϑ̂) is monotonically decreasing).
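The inversion step (1.12), including this modification for responses outside the range of a decreasing fitted curve, can be sketched in R as follows; the function name and its arguments are hypothetical, with `theta` holding the estimates (ϑ̂1, ..., ϑ̂5) and `x_min`, `x_max` the smallest and largest observed log-dose:

```r
invert_richards <- function(y0, theta, x_min, x_max) {
  lo <- theta[1]                # lower asymptote: theta_1
  hi <- theta[1] + theta[2]     # largest model value: theta_1 + theta_2
  if (y0 <= lo) return(x_max)   # y0 below all model values: maximal log-dose
  if (y0 >= hi) return(x_min)   # y0 above all model values: minimal log-dose
  # formula (1.12)
  (theta[3] - log((theta[2] / (y0 - theta[1]))^(1 / theta[5]) - 1)) / theta[4]
}
```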

Example 3 (Length versus age for dugongs: Estimation of growth rate). In many agricultural but also biological applications, growth curves are studied which for large values of the explanatory variable approach an asymptote (similarly to the sigmoidal curves), but lack an inflection point. We have reinvestigated a corresponding data

Table 4: Length versus age of dugongs.

age   length | age   length | age   length | age   length
 1.0  1.80   |  5.0  2.26   | 10.0  2.41   | 15.5  2.47
 1.5  1.85   |  7.0  2.35   | 12.0  2.50   | 16.5  2.64
 1.5  1.87   |  8.0  2.47   | 12.0  2.32   | 17.0  2.56
 1.5  1.77   |  8.5  2.19   | 13.0  2.43   | 22.5  2.70
 2.5  2.02   |  9.0  2.26   | 13.0  2.47   | 29.0  2.72
 4.0  2.27   |  9.5  2.40   | 14.5  2.56   | 31.5  2.57
 5.0  2.15   |  9.5  2.39   | 15.5  2.65   |

set describing the length versus the age of dugongs (see Ratkowsky [12], p. 101, and Table 4), which had been examined by Ratkowsky [12] using the so-called asymptotic regression model (with various parameterizations)

$$ f(x, \vartheta) = \vartheta_1 - \vartheta_2 \vartheta_3^{\,x}. \qquad (1.13) $$

In contrast to Ratkowsky we will suppose, however, that the objective of the analysis is to determine the growth rate γ (given by (1.5)) of the dugongs. An unbiased estimate of γ may be calculated without a model: γ[Ȳ1, ..., Ȳk] = 0.02549, where Ȳ_i := (1/n_i) Σ_{j=1}^{n_i} Y_ij, i = 1, ..., k. The use of a parametric model could lead to a more accurate estimate of γ if the model gives a good approximation to the unknown true regression function.
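For illustration, this model-free estimate can be reproduced from Table 4 with a few lines of R; the variable names are ours, and the sketch assumes the table has been transcribed correctly:

```r
age <- c(1.0, 1.5, 1.5, 1.5, 2.5, 4.0, 5.0, 5.0, 7.0, 8.0, 8.5, 9.0, 9.5, 9.5,
         10.0, 12.0, 12.0, 13.0, 13.0, 14.5, 15.5, 15.5, 16.5, 17.0, 22.5,
         29.0, 31.5)
len <- c(1.80, 1.85, 1.87, 1.77, 2.02, 2.27, 2.15, 2.26, 2.35, 2.47, 2.19,
         2.26, 2.40, 2.39, 2.41, 2.50, 2.32, 2.43, 2.47, 2.56, 2.47, 2.65,
         2.64, 2.56, 2.70, 2.72, 2.57)

x    <- sort(unique(age))                 # distinct design points x_1,...,x_k
ybar <- as.vector(tapply(len, age, mean)) # sample means Ybar_i (sorted by age)

# growth rate (1.5) applied to the sample means
gamma_hat <- sum(ybar * (x - mean(x))) / sum((x - mean(x))^2)
gamma_hat                                 # approximately 0.0255
```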

1.3 A criterion for the model performance

The model fit may appear good (as in Example 1), as judged

(1) visually from a graphical representation of the estimated regression curve together with the observations or

(2) from the numerical values of the sum of squares S(ϑˆ).

The fit may be improved (even up to a vanishing S(ϑ̂)!) by taking models with a large number of parameters. But it is intuitively clear that such "overparametrized" models f(x, ϑ) will lead to large errors in estimating their parameters and consequently also to large errors f(x, ϑ̂) − f(x) in estimating the regression function f. If the objective of the analysis is primarily the estimation of the regression function or curve, that is, of the values of the regression function itself over a region X of interest (and only secondarily the analysis of its properties and the estimation of some of its parameters), then the weighted cross-validation criterion is a convenient criterion characterizing the performance of the model f(x, ϑ):

$$ C = \sum_{i=r}^{s} \sum_{j=1}^{n_i} c_i \, |Y_{ij} - f(x_i, \hat\vartheta^{i,j})|^2. \qquad (1.14) $$

Here ϑ̂^{i,j} denotes the LSE calculated from the n − 1 observations left after deleting the observation (x_i, Y_ij). Its numerical calculation will be easy for well parameterized models when ϑ̂ is used as a starting value. Further we assume in (1.14) that

$$ c_i = (s - r + 1)^{-1} n_i^{-1} \qquad (1.15) $$

and that the values x_r, x_{r+1}, ..., x_s of the independent variable are in X, while the other values are those (if any!) not contained in X. If the independent variable is univariate

and its values x_i are ordered according to their magnitude, and if X = [a, b] is an interval with a ≤ x_r < ··· < x_s ≤ b, then we use

$$ c_i = d_i \, [2 n_i (b - a)]^{-1}, \qquad (1.16) $$

$$ d_i = x_{i+1} - x_{i-1} \quad (r < i < s), \qquad (1.17) $$

$$ d_r = 2(x_{r+1} - x_r), \qquad d_s = 2(x_s - x_{s-1}). \qquad (1.18) $$

If the user does not want to specify the interval [a, b], then a = x1 and b = xk should be the standard values. In the definition of the cross-validation criterion (1.14) we have introduced weights in order to take into account the distances between the different design points as well as the number of replications. A more detailed reasoning for choosing the weights just as in (1.16) is given in Subsection 1.5 and in Bunke, Droge and Polzehl [6], where C is characterized as an estimate of the mean squared error in estimating the values of the regression function. If the values x_i are equidistant and all contained in the interval [a, b], and if there are no replications (n_i = 1), then the weights are identical: c_i = k⁻¹ for i = 1, ..., k.

The criterion (1.14) will also be convenient if the estimated regression function is used to predict, by f(x, ϑ̂), the future values Y*(x) of the dependent variable for given values x in [a, b] of the explanatory variable, assuming that their "distribution" is represented to a certain extent by the design points x1, ..., xk.

For some models the estimates f(x_i, ϑ̂^{i,j}) may not be defined for some i. E.g. in the exponential model (1.8) with ϑ1 > 0 and ϑ2 < 0, the value f(x, ϑ) tends to 0 for x ↓ x0 = −ϑ3 (convergence from the right), while it tends to ∞ for x ↑ x0 (convergence from the left). Such cases are not disturbing if in place of C we always use the following modified cross-validation criterion (full cross-validation, see Bunke, Droge and Polzehl [6] and Droge [10]):

$$ \tilde{C} = \sum_{i=r}^{s} \sum_{j=1}^{n_i} c_i \, |Y_{ij} - \tilde{y}_{ij}|^2, \qquad (1.19) $$

where

$$ \tilde{y}_{ij} = f(x_i, \tilde\vartheta^{i,j}) \qquad (1.20) $$

and where ϑ̃^{i,j} is the OLSE calculated after substituting just the observation Y_ij by Ŷ_i = f(x_i, ϑ̂).
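A hedged R sketch of the criterion (1.14) may clarify the computation; this is our own helper, not the S-Plus programs described in the paper. Each observation is deleted in turn, the model is refitted by nls() with the full-data estimate as starting value (as recommended above), and the weighted squared prediction errors are summed. The response column is assumed to be named `y`:

```r
cv_criterion <- function(formula, data, start, weights) {
  # data: one row per observation (x_i, Y_ij); weights: c_i per observation
  fit_all <- nls(formula, data = data, start = start)
  errs <- sapply(seq_len(nrow(data)), function(l) {
    fit_l <- try(nls(formula, data = data[-l, ], start = coef(fit_all)),
                 silent = TRUE)
    if (inherits(fit_l, "try-error")) return(NA)  # leave-one-out fit failed
    data$y[l] - predict(fit_l, newdata = data[l, , drop = FALSE])
  })
  sum(weights * errs^2)
}

# illustrative call for the logistic model (1.36) on the Table 1 data,
# with equal weights c_i = 1/k (no replications, equidistant-like design):
# past <- data.frame(x = c(9, 14, 21, 28, 42, 57, 63, 70, 79),
#                    y = c(8.93, 10.8, 18.59, 22.33, 39.35, 56.11,
#                          61.73, 64.62, 67.08))
# cv_criterion(y ~ t1 / (1 + exp(t2 - t3 * x)), past,
#              start = list(t1 = 70, t2 = 3, t3 = 0.1),
#              weights = rep(1 / nrow(past), nrow(past)))
```

For the full cross-validation criterion (1.19) one would instead refit after replacing the deleted value Y_ij by the fitted value Ŷ_i, which avoids undefined predictions.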

1.4 Model selection procedure

The model selection could be done in three steps.

Step 1. List alternative models f1(x, ϑ1), ..., fM(x, ϑM) of similar qualitative behaviour corresponding to theoretical or practical experience in the field of application, e.g. models of sigmoidal form as in our Example 1. The books of Ratkowsky [13] (but also of Seber and Wild [15] and Ross [14]) offer a rich selection of alternative models with one to five parameters for each possible type of qualitative behaviour:

• convex or concave curves that are either continuously ascending or descending (without maxima, minima or inflection points),

• sigmoidally shaped curves, i.e. curves possessing inflection points but without maxima or minima (and possibly having asymptotes),

• curves with maxima and minima (and possibly one or more inflection points).

Sometimes the same models are given with different parameterizations, that is, there are models which may be obtained from one another by substituting the parameters by functions of new parameters. Often it is indicated which parameterizations may be favorable with respect to having a comparatively small parameter-effects curvature in the sense of Bates and Watts [1]. Such parameterizations could lead to numerical as well as inferential advantages and should therefore be used.

Step 2. Select among these models f_m (m = 1, ..., M) a model f_m̂ with smallest value of the corresponding cross-validation criterion C_m given by (1.19) or (1.14):

$$ C_{\hat m} = \min\{C_1, \dots, C_M\}. \qquad (1.21) $$

Alternatively, one could select a model f_m̃ which subjectively has an especially appealing form (e.g. a model with interpretable parameters or a model with simple structure and/or few parameters) but with an otherwise small value of the cross-validation criterion. For this, we may use the rule of thumb

$$ C_{\tilde m} \le 1.4\, C_{\hat m} \quad (\text{or } C_{\tilde m} \le 1.2\, C_{\hat m}) \qquad (1.22) $$

(see Examples 1, 2 and 3 treated in Subsection 1.7). Step 3. The data analysis is then done with the estimated regression function

$$ f_{\hat m}(x, \hat\vartheta_{\hat m}) \quad \text{or} \quad f_{\tilde m}(x, \hat\vartheta_{\tilde m}). \qquad (1.23) $$

A parameter like (1.4) would be estimated by

$$ \hat\gamma = \gamma[f_{\hat m}(x_1, \hat\vartheta_{\hat m}), \dots, f_{\hat m}(x_k, \hat\vartheta_{\hat m})], \qquad (1.24) $$

ϑ̂_m being the OLSE of the parameter under the model f_m.

As an alternative to the OLSE, in view of possibly (or most likely) heteroscedastic variances, the estimate in (1.23) or (1.24) could be chosen as a weighted LSE (WLSE). It is defined to be the minimizer of

$$ W(\vartheta) = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \bigl( Y_{ij} - f(x_i, \vartheta) \bigr)^2 / \hat\sigma_i^2 \qquad (1.25) $$

over ϑ ∈ Θ, where σ̂_i² denotes a convenient estimate of the variance σ_i² (see Section 2). This may in some cases increase the precision, provided that the estimates σ̂_i² are sufficiently accurate. But often the WLSE will be less reliable, especially when the differences in the variances σ_i² are not large. Because the regression function and the variances are unknown, it is actually not known whether the OLSE or the WLSE is better. A choice between the OLSE and a WLSE may be performed with a criterion aiming at a maximal accuracy of estimation (or calibration); this is discussed in Section 2.
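As a sketch of how (1.25) is computed with standard tools (R here; the paper's own implementation is in S-Plus): nls() minimizes the weighted residual sum of squares, so supplying weights 1/σ̂_i² yields the WLSE. The data, model and start values below are simulated illustrations, not from the paper:

```r
set.seed(1)
x  <- seq(0, 10, length.out = 30)
s2 <- 1 + x                                    # assumed variances sigma_i^2
y  <- 5 * exp(-0.4 * x) + rnorm(30, sd = sqrt(0.05 * s2))

# OLSE: unweighted fit
olse <- nls(y ~ a * exp(-b * x), start = list(a = 4, b = 0.3))
# WLSE (1.25): nls() minimizes sum(w * residual^2); take w = 1 / sigma_hat_i^2
wlse <- nls(y ~ a * exp(-b * x), start = coef(olse), weights = 1 / s2)
coef(olse)
coef(wlse)
```

Which of the two fits is preferable would then be decided by the data-dependent criteria discussed above and in Section 2, not by assumption.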

1.5 The performance of the selected model

The use of a criterion like (1.14) (or (1.19)) is justified by the fact that it estimates the weighted sum r_y of actual squared prediction errors

$$ r_y := \sum_{i=r}^{s} \xi_i \, E_y |Y^*(x_i) - f(x_i, \hat\vartheta)|^2. \qquad (1.26) $$

Here E_y is the conditional expectation over future values Y*(x_i) of the dependent variable (under the condition of fixed observations Y_ij), and ξ_i = n_i c_i. The weighted sum r_y may be seen as an approximation to the integrated actual squared prediction error over the interval X = [a, b]:

$$ \frac{1}{b - a} \int_a^b E_y |Y^*(x) - f(x, \hat\vartheta)|^2 \, dx. \qquad (1.27) $$

The weighted sum r_y is (up to a model independent term σ²) equivalent to the mean squared error in estimating the regression function, assuming σ² := Σ_{i=r}^{s} ξ_i σ_i² and

$$ r_y = \sigma^2 + \sum_{i=r}^{s} \xi_i \, |f(x_i) - f(x_i, \hat\vartheta)|^2 \approx \sigma^2 + \frac{1}{b - a} \int_a^b |f(x) - f(x, \hat\vartheta)|^2 \, dx. \qquad (1.28) $$

If an estimate of the overall mean squared error

$$ r_y^{(\hat m)} = \sum_{i=r}^{s} \xi_i \, E_y |Y^*(x_i) - f_{\hat m}(x_i, \hat\vartheta_{\hat m})|^2 \qquad (1.29) $$

connected with the selected model f_m̂ and the corresponding (possibly weighted) LSE ϑ̂_m̂ is wanted, an estimate like double cross-validation has to be used. It takes into

account the data dependence of the chosen model f_m̂:

$$ \hat{C} = \sum_{i=r}^{s} \sum_{j=1}^{n_i} c_i \, |Y_{ij} - f_{\hat m^{ij}}(x_i, \hat\vartheta_{\hat m^{ij}})|^2. \qquad (1.30) $$

Here the model m̂^{ij} and the (weighted) LSE ϑ̂_{m̂^{ij}} of the parameter in this model are calculated, following the procedure described in Subsection 1.4, from the n − 1 observations left after deleting the observation (x_i, Y_ij). The calculation of (1.30) will be computationally expensive, because for each of the Σ_{i=r}^{s} n_i observations in (1.30) there must be calculated: (i) cross-validation values for all admitted models, that is, nM LSE's, and possibly (ii) estimates for the variances σ_i², which are needed for the calculation of the weighted LSE ϑ̂_{m̂^{ij}}. A less computationally intensive estimate of r_y^{(m̂)} may be calculated by selecting randomly t (< n) observations (x_{i_τ}, Y_{i_τ j_τ}), τ = 1, ..., t, and using the Monte Carlo approximation

$$ C_t = \frac{1}{t} \sum_{\tau=1}^{t} c_{i_\tau} \, |Y_{i_\tau j_\tau} - f_{\hat m^{i_\tau j_\tau}}(x_{i_\tau}, \hat\vartheta_{\hat m^{i_\tau j_\tau}})|^2 \qquad (1.31) $$

of (1.30).

1.6 Alternative criteria

Cross-validation is an adequate criterion if prediction or regression function estimation is a primary objective of the data analysis. In other situations other criteria should be used. In a calibration problem as in Example 2, a modified cross-validation criterion could be adequate, assuming a calibration is demanded for observations Y of the dependent variable lying in the interval (ã, b̃):

$$ CC = \frac{1}{N} \sum_{i=1}^{k} \sum_{j \in J_i} |x_i - \hat{x}(Y_{ij}, \hat\vartheta^{i,j})|. \qquad (1.32) $$

Here J_i contains all indices j = 1, ..., n_i with ã < Y_ij < b̃, and N denotes the total number of these indices.

If the objective is to estimate a parameter of the form (1.4), then an adequate criterion would be

$$ CG = \frac{1}{n_{rs}} \sum_{i=r}^{s} \sum_{j=1}^{n_i} \bigl| \gamma[\bar{Y}_1, \dots, \bar{Y}_k] - \gamma[f(x_1, \hat\vartheta^{i,j}), \dots, f(x_k, \hat\vartheta^{i,j})] \bigr|, \qquad (1.33) $$

where

$$ \bar{Y}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij} \quad \text{and} \quad n_{rs} = \sum_{i=r}^{s} n_i. \qquad (1.34) $$

(1.33) may be interpreted as a jackknife approximation to the mean absolute error of the estimate γ̂ (see Bunke, Droge and Polzehl [6]). The criterion (1.33) is only sensible if all sizes n_i are large, or otherwise if the parameter (1.4) involves weighted sums of many values f(x_i). This is the case for the linear slope (1.5) and for the area (1.6) if the number k of design points is not small.

Modifications of CC and CG in the sense of full cross-validation (see Subsection 1.3) may be defined using ϑ̃^{i,j} in place of ϑ̂^{i,j} in (1.32) and (1.33), respectively. "Double cross-validation" criteria analogous to (1.30) or its Monte Carlo approximation (1.31) may be formulated corresponding to the "calibration selection criterion" (1.32) and the "estimation selection criterion" (1.33).

1.7 Applications of the model selection procedure

In this subsection we present the results of applying the model selection procedure to the problems introduced in Subsection 1.2, as well as to an additional one.

Example 1 (Pasture regrowth, continued). We reconsider the example of pasture regrowth yield, where its dependence on time is to be modelled. In Subsection 1.2 it was already found that the model candidates should be sigmoidally shaped such as (1.7) - (1.11). However, the class of competing models could be enlarged by various other sigmoidal models given, for example, in chapter 5 of Ratkowsky [13]. In addition to the above mentioned ones we take the following:

$$ f(x, \vartheta) = \vartheta_1 (1 - \exp(-x^{\vartheta_2})) \qquad (1.35) $$

$$ f(x, \vartheta) = \frac{\vartheta_1}{1 + \exp(\vartheta_2 - \vartheta_3 x)} \qquad (1.36) $$

$$ f(x, \vartheta) = \frac{\vartheta_1 \vartheta_2 x^{\vartheta_3}}{1 + \vartheta_2 x^{\vartheta_3}} \qquad (1.37) $$

$$ f(x, \vartheta) = \vartheta_1 (1 - \exp(-\vartheta_2 x^{\vartheta_3})) \qquad (1.38) $$

$$ f(x, \vartheta) = \vartheta_1 (1 - \exp(-\vartheta_2 x))^{\vartheta_3} \qquad (1.39) $$

$$ f(x, \vartheta) = \vartheta_1 (1 - \vartheta_2 \exp(-x^{\vartheta_3})) \qquad (1.40) $$

$$ f(x, \vartheta) = \frac{\vartheta_1 x^{\vartheta_2} + \vartheta_3 \vartheta_4}{\vartheta_4 + x^{\vartheta_2}} \qquad (1.41) $$

$$ f(x, \vartheta) = \frac{\vartheta_1}{(1 + \exp(\vartheta_2 - \vartheta_3 x))^{1/\vartheta_4}} \qquad (1.42) $$

$$ f(x, \vartheta) = \vartheta_1 + \vartheta_2 \exp(-\exp(\vartheta_3 - \vartheta_4 x)) \qquad (1.43) $$

$$ f(x, \vartheta) = \vartheta_1 + \vartheta_2 (1 - \exp(-\vartheta_3 x))^{\vartheta_4}. \qquad (1.44) $$

Table 5: Cross-validation criteria and ranking of models for Example 1.

model    cross-validation     ranking   modified cv          sum of squares
         criterion (1.14)               criterion (1.19)     n⁻¹ × (1.3)
(1.7)     2.86                  5        1.75                 0.93
(1.11)    2.20                  2        1.16                 0.55
(1.8)    14.61                 11        9.39                 4.65
(1.9)     7.70                  9        4.89                 2.42
(1.10)    2.24                  3        1.44                 0.80
(1.36)    2.29                  4        1.63                 0.90
(1.37)   15.91                 12        9.43                 4.90
(1.38)   14.47                 10        7.97                 4.26
(1.39)   15.97                 13        9.30                 4.84
(1.41)    6.14                  7        3.29                 1.51
(1.42)    1.48                  1        1.07                 0.67
(1.43)    5.14                  6        2.89                 1.32
(1.44)    6.67                  8        3.60                 1.59

For this example, the values of the cross-validation criterion (1.14) and the corresponding ranking of the different models are presented in Table 5. Values of the modified cross-validation criterion (1.19) and of the sum of squares (1.3) are given for comparison. Consequently, an application of the cross-validation approach to the pasture regrowth data would lead to the choice of the four-parameter Richards model (1.42). Furthermore, there is no other model fulfilling the rule of thumb (1.22). However, the models (1.11) (which is an extension of (1.42)), (1.10) and (1.36) violate the condition (1.22) only slightly, so that one could select one of these three models as well, e.g. the simple logistic model (1.36), which has only three parameters, that is, one less than the models (1.42), (1.11) or (1.10). We remark that the models (1.35) and (1.40) are not flexible enough to provide reasonable fits to the data, but for other data sets the situation could, of course, be quite different. Figure 2 illustrates the behaviour of the best model (1.42) by plotting the observations and the fitted curve as well as the residuals.

Figure 2: Pasture regrowth: Resulting plots for model (1.42).

[Fitted model f(x, ϑ) = ϑ1/(1 + exp(ϑ2 − ϑ3 x))^(1/ϑ4) with ϑ̂1 = 69.623, ϑ̂2 = 4.255, ϑ̂3 = 0.089, ϑ̂4 = 1.724; RSS = 0.6721, C = 1.476. Left panel: yield and fitted curve versus time after pasture; right panel: residuals versus time after pasture.]

Example 2 (Radioimmunological assay of cortisol, continued). We reconsider the example of the radioimmunological assay of cortisol. A plot of the response versus the

Table 6: Cross-validation criteria and ranking of models for Example 2.

model    criterion (1.32)   ranking   criterion (1.14)   criterion (1.33)
(1.11)    0.0307              1         1565                 157
(1.9)     0.1137              6        10618               17020
(1.10)    0.0362              3         1661                1562
(1.36)    0.0717              5         3526               11970
(1.42)    0.0398              4         1746                8469
(1.43)    0.0328              2         2093                1709

logarithm of the dose (replacing log(0) by −3 and log(∞) by 2) suggests again a sigmoidally shaped dependence, but now the curves should be monotonically decreasing (see Figure 3). Therefore we can use those models of our catalogue of competing models

in Example 1 which are well defined for the given design, i.e. (1.9), (1.10), (1.11), (1.36), (1.42) and (1.43). We have excluded model (1.8), although it is in principle applicable, since it is not flexible enough to fit this data set well. The points ±∞ have been excluded in the cross-validation criteria by choosing a = −2.99 and b = 1.99. The region of interest in the calibration criterion CC (see (1.32)) has been fixed by ã = 200 and b̃ = 2500.

The results are summarized in Table 6, indicating that the Richards model (1.11) should be the first choice. Naturally, one could formulate a rule of thumb for the criterion (1.32) analogously to (1.22), say by considering all models f_m̃ yielding values of (1.32) with

$$ CC_{\tilde m} < 1.2\, CC_{\hat m}, \qquad (1.45) $$

where the model f_m̂ is that with the minimal value of (1.32). Then the logistic model (1.10) would be the only alternative candidate for analyzing the data for calibration purposes, having a simpler structure and one parameter less than the Richards model (1.11). Note that an application of the cross-validation criterion (1.14) would lead to the same ranking of the best three models.

In pharmacokinetics the area under the curve obtained by analyzing radioimmunological assays is used to characterize rate and extent of drug absorption. The integral with respect to the dose (recall that x_i is the logarithm of the dose) can be approximated

[Figure 3: RIA of cortisol: Resulting plots for model (1.11): responses with the fitted calibration curve (left) and residuals (right), both against the log-dose; fitted parameters ϑ̂1 = 133.601, ϑ̂2 = 2628.593, ϑ̂3 = 3.129, ϑ̂4 = −3.215, ϑ̂5 = 0.622, with RSS = 2723, C = 1565 and calibration criterion CC = 0.0307.]
by a parameter

$$ \gamma = \frac{1}{2} \sum_{i=1}^{k-1} \bigl( f(x_{i+1}) + f(x_i) \bigr) \bigl( 10^{x_{i+1}} - 10^{x_i} \bigr). \qquad (1.46) $$

If this parameter is of interest, for instance as a measure to compare different experiments, the cross-validation criterion (1.33) would again suggest using model (1.11). Figure 3 shows the fit of the best model (1.11) to the data and presents the corresponding residual plot, which indicates that the error variances are probably heterogeneous. Therefore, one could try to estimate the error variances, for example on the basis of the replicated observations, and fit the model to the data by the criterion (1.25). However, in calibration problems it seems to be important to approximate the unknown regression function with high accuracy particularly in regions where it is flat, that is, where it has a small derivative. This would suggest the use of a weighted least squares criterion different from (1.25). To avoid such a discussion here, we have used the ordinary nonlinear least squares approach, while a comparison of different weighted least squares fits in this example is left to Section 2. In order to estimate the approximated area under the curve (1.46), it would be important to approximate f more accurately where (10^{x_{i+1}} − 10^{x_i}) is large. This will be reflected in the analyses of Example 2 in Section 3.

Example 3 (Estimation of growth rate, continued). In addition to the model (1.13), we fitted almost all (i.e. more than 20) concave models of chapter 4 in Ratkowsky [13] with one to four parameters to the data. We obtained without numerical difficulties the LSE's and the values of the model choice criteria for many of these models, especially for the models (1.8), (1.13), (1.39) and

$$ f(x, \vartheta) = \vartheta_1 + \vartheta_2 x^{\vartheta_3} \qquad (1.47) $$

$$ f(x, \vartheta) = \vartheta_2 \log(x - \vartheta_1) \qquad (1.48) $$

$$ f(x, \vartheta) = \log(\vartheta_1 + \vartheta_2 x) \qquad (1.49) $$

$$ f(x, \vartheta) = \vartheta_1 (1 - \vartheta_2^{\,x}) \qquad (1.50) $$

$$ f(x, \vartheta) = \vartheta_1 \vartheta_2^{\,x} \qquad (1.51) $$

$$ f(x, \vartheta) = \frac{\vartheta_1 x}{x + \vartheta_2} \qquad (1.52) $$

$$ f(x, \vartheta) = \vartheta_1 (1 + x)^{\vartheta_2} \qquad (1.53) $$

$$ f(x, \vartheta) = \vartheta_1 x^{\vartheta_2 x^{-\vartheta_3}} \qquad (1.54) $$

$$ f(x, \vartheta) = \frac{x}{\vartheta_1 + \vartheta_2 x + \vartheta_3 \sqrt{x}} \qquad (1.55) $$

$$ f(x, \vartheta) = \frac{1}{\vartheta_1 + \vartheta_2 x^{\vartheta_3}} \qquad (1.56) $$

$$ f(x, \vartheta) = \frac{\vartheta_2 + \vartheta_3 x}{1 + \vartheta_1 x} \qquad (1.57) $$

$$ f(x, \vartheta) = \vartheta_1 + \vartheta_2 \log(x + \vartheta_3) \qquad (1.58) $$

$$ f(x, \vartheta) = \frac{\vartheta_1 \vartheta_2 x}{1 + \vartheta_1 x} + \frac{\vartheta_3 \vartheta_4 x}{1 + \vartheta_3 x} \qquad (1.59) $$

$$ f(x, \vartheta) = \frac{\vartheta_1 + \vartheta_3 x}{1 + \vartheta_2 x + \vartheta_4 x^2} \qquad (1.60) $$

$$ f(x, \vartheta) = \vartheta_1 \exp(-\vartheta_2 x) + \vartheta_3 \exp(-\vartheta_4 x) \qquad (1.61) $$

Table 7 contains the results. The model with the minimal value of the criterion (1.33) (for γ given by (1.5)) is (1.13). This model ranks only eighth for prediction purposes, i.e. by the cross-validation criterion (1.14), but still fulfils the rule of thumb (1.22). On the other hand, model (1.47), which minimizes criterion (1.14), ranks only eleventh with respect to criterion (1.33) and exceeds the minimal value of that criterion by more than 80 per cent. Notice that the models providing a value of γ closest to the unbiased estimate γ[Ȳ1, ..., Ȳk] = 0.02549 of γ rank best for criterion (1.33). Figure 4 displays the results for the best model (1.13) according to criterion (1.33).

Table 7: Cross-validation criteria and ranking of models for Example 3.

model    criterion (1.14)   ranking   criterion (1.33)   ranking    γ̂
         ×100                         ×1000
(1.8)     0.88                5        0.4653               2       0.02585
(1.13)    0.94                8        0.4637               1       0.02580
(1.39)    0.95                9        0.7182               6       0.02616
(1.47)    0.76                1        0.8784              11       0.02641
(1.48)    2.01               15        5.6806              16       0.03117
(1.49)    2.41               16        6.0209              17       0.03151
(1.50)    4.31               18        11.507              18       0.01397
(1.51)    4.08               17        1.9054              13       0.02729
(1.52)    1.83               14        5.1464              15       0.02033
(1.53)    1.08               11        3.057               14       0.02854
(1.54)    0.92                7        0.8759              10       0.02634
(1.55)    0.84                2        0.7498               7       0.02623
(1.56)    0.92                6        0.8706               9       0.02633
(1.57)    0.88                4        0.4720               3       0.02587
(1.58)    0.87                3        1.2735              12       0.02674
(1.59)    1.07               10        0.8705               8       0.02632
(1.60)    1.23               12        0.6215               4       0.02599
(1.61)    1.51               13        0.6372               5       0.02598
Figure 4: Estimation of growth rate: Results for model (1.13) and γ given by (1.5).

[Observed lengths with the fitted curve (left) and residuals (right), both against age; fitted parameters ϑ̂1 = 2.67, ϑ̂2 = 0.973, ϑ̂3 = 0.873, with RSS = 0.006917, C = 0.009434, γ̂ = 0.0258 and CG = 0.0004637.]

Example 4 (Bean root cells – simulated data). This example aims at showing the importance of taking into account the intended use of the data analysis when an appropriate model is to be selected.

The growth of bean root cells is a microscopic vegetative process where the dependence of the water content on the distance from the growing tip is of interest. A data set of size 15 was used by Ratkowsky [12] as an illustrative example, and as in Example 1 it can be seen that the process produces a sigmoidally shaped growth curve. Ratkowsky [12] considered five competing models (all of them are among our candidates), but without arriving at a convenient model selection. Applying the cross-validation criterion to the data would suggest the use of model (1.10).

We generated a data set (see Table 8) of size 50, which mimics the original data, as follows: After transforming the x-data to the interval (0, 1), the model (1.11), which is an extension of the best model (1.10), was fitted to the data. The resulting homogeneous variance estimate σ̂² = 0.868 was used to simulate 50 pseudo random normally distributed variables with mean 0 and this variance. For 50 equidistant x-values on (0, 1), the corresponding y-values were obtained by adding the simulated "errors" to the fitted curve.
Table 8: Simulated data of Example 4.

x     y      | x     y       | x     y       | x     y
.01   1.526  | .27   3.014   | .53   17.627  | .79   20.774
.03   1.326  | .29   3.787   | .55   17.012  | .81   22.679
.05   2.410  | .31   5.113   | .57   17.721  | .83   20.736
.07   2.120  | .33   6.506   | .59   16.722  | .85   20.310
.09   1.878  | .35   6.848   | .61   20.687  | .87   21.113
.11  -0.044  | .37   9.099   | .63   19.150  | .89   22.223
.13   0.796  | .39   8.520   | .65   18.590  | .91   21.232
.15   2.260  | .41   10.749  | .67   19.746  | .93   21.751
.17   0.415  | .43   11.322  | .69   18.692  | .95   21.315
.19   1.753  | .45   13.266  | .71   19.311  | .97   20.306
.21   3.477  | .47   12.888  | .73   19.623  | .99   20.954
.23   3.869  | .49   13.305  | .75   20.097  |
.25   4.773  | .51   14.564  | .77   21.726  |
Table 9: Cross-validation criteria and ranking of models for Example 4.

model    criterion (1.19)   ranking   criterion (1.32) ×100   ranking   criterion (1.33) ×100   ranking
(1.7)     0.93                5        6.86                     9        4.683                    2
(1.11)    0.92                2        6.15                     4        5.456                    3
(1.10)    0.92                4        6.49                     6        4.654                    1
(1.36)    0.96                7        6.28                     5        47.56                    9
(1.40)    3.30                9        6.70                     8        39.62                    8
(1.41)    0.90                1        6.03                     1        5.923                    5
(1.42)    0.98                8        6.57                     7        24.12                    7
(1.43)    0.92                3        6.12                     2        6.386                    6
(1.44)    0.93                6        6.15                     3        5.917                    4
We have used the same catalogue of models as in Example 1. Some of the models turned out not to be flexible enough, leading to numerical problems. The results for the different model selection criteria are contained in Table 9, showing the values of the full cross-validation criterion (1.19) as well as the corresponding ranking of the models. We have checked that the values of the cross-validation criterion (1.14) (and also the resulting ranking of the models) differ only slightly from the values of the full cross-validation criterion (1.19).

Cross-validation favors the Morgan-Mercer-Flodin model (1.41) with four parameters, whereas the five-parameter model (1.11), assumed to be the true regression model in the simulated experiment, ranks second, but with nearly the same value of the criterion. Figure 5 shows how model (1.41) fits the data and presents a plot of the resulting residuals.
[Figure 5: Simulated data of Example 4: Results for model (1.41): observed values with the fitted curve (left) and residuals (right), both against x; fitted parameters ϑ̂1 = 22.035, ϑ̂2 = 4.479, ϑ̂3 = 1.569, ϑ̂4 = 0.024, with RSS = 0.7639 and C = 0.901.]

Although in the present example there is no interest in calibration, for the sake of comparison between the criteria we also report the values of the criteria (1.32) and (1.33) and the corresponding rankings of the models in Table 9. In case of (1.33) the parameter of interest was the growth rate (1.5).

The ranking of the models for calibration purposes is quite different from that in the case where prediction or regression function estimation was the primary objective of the data analysis. However, the Morgan-Mercer-Flodin model (1.41) is also the best
for calibration purposes. If the estimation of the growth rate (1.5) were the objective of the data analysis, then the ranking of the models would be completely different from those in the other cases. The first choice would be the logistic model (1.10), which behaves slightly better than the Weibull model (1.7). But obviously three of the models are not appropriate in this case. Even the best model for prediction and calibration purposes slightly violates a rule of thumb accepting models f_m̃ with

$$ CG_{\tilde m} < 1.2\, CG_{\hat m}, $$

which is defined in analogy to (1.45).

2 Selection of variance models and variance estimation

2.1 Selection and fitting of variance models

The observation variances σ_i² may be estimated by the intra-sample estimates

$$ s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} |Y_{ij} - \bar{Y}_i|^2, \qquad (2.1) $$

if there are enough replications (n_i relatively large!). In such an exceptional situation these estimates may be used for calculating weighted LSE's.

An improved variance estimation may be possible by using alternative variance models, taking into consideration that it is possibly not sure that a certain variance model is adequate, and moreover that it is not even sure that a certain structure of the regression function is adequate. Our procedure (see Bunke, Droge and Polzehl [6]) is based on a least squares fitting of alternative variance models to conveniently defined "observations" z1, ..., zk; here the knowledge of adequate regression and variance models (and normality of the observations) is not necessary. The "observations" z_i are defined in such a way that they have (roughly) the variances σ_i² as their expectation, as is exactly the case for the estimates s_i² given by (2.1), and we set z_i = s_i² in the case n_i ≥ 2. Assuming ordered univariate values x1 < ··· < xk, we use in the case n_i = 1

$$ z_i = \tfrac{1}{2} |e_i - e_{i+1}|^2 \qquad (i < k). \qquad (2.2) $$

Here we use the residuals

$$ e_i = Y_{i1} - f_{m^*}(x_i, \hat\vartheta_{m^*}) \qquad (2.3) $$
employing the (best fitting) model f_{m*}. This model is chosen among the admitted models f_m as that with the smallest sum S(ϑ̂_m) of squared errors (see (1.3)). In the case of n_1 = 1 and n_k = 1 we use

$$ z_1 = \tfrac{1}{2} |e_1 - e_2|^2 \quad \text{and} \quad z_k = \tfrac{1}{2} |e_k - e_{k-1}|^2. \qquad (2.4) $$

If the independent variable x is multivariate, then we may use

$$ z_i := \tfrac{1}{2} |e_{j(i)} - e_i|^2 \qquad (2.5) $$

in case of n_i = 1, where the residual e_{j(i)} corresponds to the value x^{j(i)} nearest to x^{i}:

$$ \| x^{j(i)} - x^{i} \| = \min_{j \neq i} \| x^{j} - x^{i} \|. \qquad (2.6) $$

(If there are several such values, we take the value x^{j(i)} with smallest index j(i).) We fit alternative variance models of the form g(x, σ², τ) to the above "observations" z_i by minimizing the sum of squares

$$ S(\sigma^2, \tau) = \sum_{i: n_i \ge 2} (n_i - 1) \, |s_i^2 - g(x_i, \sigma^2, \tau)|^2 + \sum_{i: n_i = 1} |z_i - g(x_i, \sigma^2, \tau)|^2. \qquad (2.7) $$

We propose six alternative variance models, which are especially useful and have been proposed in the literature, see Carroll and Ruppert [8]: (1) The first is the exponential model

$$ g(x, \sigma^2, \tau) = \sigma^2 \, [f_{m_o}(x, \hat\vartheta_{m_o}) + a]^{\tau}, \qquad (2.8) $$

where we use the model f_{m_o} (m_o = m̂ or m̃) chosen by the procedure described in Subsection 1.4. The constant a is chosen as a := 0.1 · max_i f_{m_o}(x_i, ϑ̂_{m_o}) − 1.1 · min_i f_{m_o}(x_i, ϑ̂_{m_o}).

(2) As an alternative model in place of (2.8) one may fit the model
$$ g(x, \sigma^2, \tau) = \sigma^2 \exp[\tau f_{m_o}(x, \hat\vartheta_{m_o})]. \qquad (2.9) $$

(3) Past experience (or the residuals after a least squares fit) may suggest that the variance σ_i² does not vary monotonically with the mean f(x_i), but behaves approximately like a unimodal function of f(x_i), or like the reverse of such a behaviour. Then we could alternatively fit a quadratic variance model

$$ g(x, \sigma^2, \tau) = \sigma^2 + \tau_1 (f_{m_o}(x, \hat\vartheta_{m_o}) - \bar{f}_{m_o}) + \tau_2 (f_{m_o}(x, \hat\vartheta_{m_o}) - \bar{f}_{m_o})^2, \qquad (2.10) $$

where

$$ \bar{f}_{m_o} := \frac{1}{k} \sum_{i=1}^{k} f_{m_o}(x_i, \hat\vartheta_{m_o}). \qquad (2.11) $$

(4) A bell-shaped variance model

$$ g(x, \sigma^2, \tau) = \sigma^2 + \sigma^2 \tau \, \bigl( \bar{f} - f_{m_o}(x, \hat\vartheta_{m_o}) \bigr) \bigl( f_{m_o}(x, \hat\vartheta_{m_o}) - \underline{f} \bigr), \qquad (2.12) $$

with f̲ = min_i f_{m_o}(x_i, ϑ̂_{m_o}) and f̄ = max_i f_{m_o}(x_i, ϑ̂_{m_o}), has fewer parameters and may be useful in some applications.

(5) Sometimes also a simple linear variance model may be useful:

$$ g(x, \sigma^2, \tau) = \sigma^2 + \tau \, (f_{m_o}(x, \hat\vartheta_{m_o}) - \bar{f}_{m_o}). \qquad (2.13) $$

(6) In many cases a homogeneous variance estimate σ̂² will be more accurate than a heteroscedastic estimate σ̂_i² determined by a variance model, especially when the differences between the variances σ_i² are moderate or small. Thus we would fit a constant model g(x, σ²) ≡ σ² to our observations (x_i, z_i) and obtain

$$ \hat\sigma^2 = \frac{1}{n + k - 2q} \Bigl\{ \sum_{i: n_i \ge 2} \sum_{j=1}^{n_i} |Y_{ij} - \bar{Y}_i|^2 + \sum_{i: n_i = 1} z_i \Bigr\}, \qquad (2.14) $$

where q is the number of points x_i with n_i ≥ 2.

Unfortunately, some of the six models have the disadvantage of possibly leading to negative estimates σ̂_i² := g(x_i, σ̂², τ̂) for some design points x_i. We replace the negative (and also very small) estimates by some fixed small positive value, say by σ̂_0² := 0.1 σ̂², where σ̂² is the homogeneous variance estimate (2.14). The variance estimate will then be

$$ \bar\sigma_i^2 := \max\{ \hat\sigma_i^2, \hat\sigma_0^2 \} \quad \text{for } i = 1, \dots, k. \qquad (2.15) $$
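A hedged R sketch of the fitting step may be useful: the criterion (2.7) is minimized over (σ², τ) with a general-purpose optimizer, here for the exponential model (2.8). The function name, argument layout and start values are our own assumptions, and negative or very small fitted values would still need the truncation (2.15) afterwards:

```r
fit_var_model <- function(z, ni, fhat, a) {
  # z: "observations" z_i (equal to s_i^2 where n_i >= 2); ni: replication
  # counts n_i; fhat: fitted regression values f_mo(x_i, theta_hat_mo)
  g <- function(p) p[1] * (fhat + a)^p[2]   # exponential model (2.8)
  S <- function(p) {                        # weighted criterion (2.7)
    w <- ifelse(ni >= 2, ni - 1, 1)         # weights (n_i - 1) resp. 1
    sum(w * (z - g(p))^2)
  }
  optim(c(var(z), 1), S)$par                # start values are illustrative
}
```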

We propose to choose among the six variance estimators σ̂_i² the one with the smallest value of the "cross-validation" criterion

$$ CV := \frac{1}{n - q} \sum_{i=1}^{k} w_i \, |z_i - \hat\sigma_{-i}^2|^2. \qquad (2.16) $$

The weights w_i are either n_i − 1 (for n_i ≥ 2) or 1 (for n_i = 1), whereas the estimates σ̂_{-i}² are (essentially) estimators calculated by fitting the variance model leaving out the observations Y_hl entering the calculation of z_i (for details we refer to Bunke, Droge and Polzehl [6]).

Alternatively, for each estimator σ̂_i² one may use the cross-validation criteria C, C̃ (see (1.14), (1.19)) corresponding to the estimation of the regression parameters ϑ in a regression model f_m(x, ϑ) by a WLSE ϑ̂_m calculated with the estimator σ̂_i², or the criteria CC, CG (see (1.32), (1.33)) corresponding to calibration and to the estimation of some parameter of interest using the WLSE. This also allows choosing between the WLSE and the OLSE in a regression model f_m(x, ϑ) in view of a smallest estimation, calibration or prediction error.
2.2 Examples for variance model selection

Example 5 (Esterase count data). For the sake of illustration we discuss the sta-
Table 10: Esterase count data of Example 5.

x     y    | x     y    | x     y    | x     y
6.4   84   | 10.6  93   | 23.0  409  | 20.8  260
6.7   86   | 10.9  100  | 23.5  381  | 21.2  474
8.0   104  | 11.6  97   | 23.8  529  | 21.8  416
8.1   96   | 11.8  131  | 24.4  566  | 22.1  317
8.4   124  | 12.3  256  | 25.2  418  | 23.2  376
8.6   79   | 12.8  219  | 25.5  435  | 23.7  466
9.0   79   | 13.1  215  | 27.2  208  | 24.2  412
9.5   203  | 13.3  268  | 27.7  412  | 24.6  369
10.5  191  | 13.7  249  | 29.0  474  | 25.2  531
10.8  167  | 13.8  226  | 30.8  438  | 26.9  472
11.1  116  | 14.0  329  | 35.2  416  | 27.4  646
11.6  170  | 14.4  255  | 38.6  717  | 29.0  595
12.1  233  | 14.6  317  | 40.5  695  | 30.3  527
12.6  115  | 14.6  216  | 41.2  597  | 33.6  635
12.8  201  | 15.0  301  | 44.5  718  | 38.2  695
13.1  144  | 15.2  389  | 46.6  599  | 39.1  1042
13.3  139  | 15.8  271  | 52.1  921  | 40.9  239
13.8  154  | 16.0  148  | 14.8  266  | 41.7  1006
13.9  97   | 16.1  315  | 15.2  193  | 45.0  778
14.1  288  | 16.9  126  | 15.2  278  | 52.0  789
14.6  239  | 17.1  340  | 15.9  250  | 52.4  679
6.5   85   | 17.7  276  | 16.0  103  | 6.1   52
7.8   127  | 18.8  262  | 16.4  256  | 3.1   28
8.0   107  | 19.2  336  | 17.0  137  | 11.2  373
8.2   130  | 20.5  393  | 17.5  136  | 5.6   166
8.6   105  | 20.8  270  | 18.1  342  | 11.0  423
8.8   153  | 20.9  343  | 19.0  486  |
9.2   100  | 21.4  270  | 20.5  354  |
10.3  159  | 21.8  296  | 20.8  459  |

Legend: x – amount of esterase, y – observed count
tistical analysis of the esterase count data example in Carroll and Ruppert [8] (Table 2.3, pp. 46-47). In an assay for the concentration of the enzyme esterase, the observed concentration of esterase was recorded, and then in a binding experiment the number of bindings was counted. The data are presented in Table 10; it is seen that there are few replications. In the original treatment of the example as a calibration problem, the objective is to take newly observed counts and infer the corresponding concentration of esterase, using the data in the table. Alternatively, another objective could be to estimate the regression function in order to obtain a good model for the dependence between the esterase concentration and the number of bindings.

We fitted 12 models for the regression function, including the linear model used in Carroll and Ruppert [8], which seems to give a good fit judging from their Figure 2.11; Table 11 reports RSS = n⁻¹ S(ϑ̂_m) for these models. The models are almost the same as in Example 3, with the exception that the linear model

$$ f(x, \vartheta) = \vartheta_1 + \vartheta_2 x \qquad (2.17) $$

has been added, whereas the models (1.48) - (1.52), (1.59) and (1.61) have been excluded because of numerical problems (such as exceeding the maximal number of iterations or providing a singular gradient matrix) in fitting the models or in calculating the cross-validation criteria. Notice that some of these numerical problems could probably be avoided by transforming the design points to another interval, say to (0, 1).

Table 11: Residual sum of squares (RSS) for the models considered in Example 5.
model    RSS     | model    RSS     | model    RSS
(1.8)    10458   | (1.53)   10826   | (1.57)   10539
(1.13)   10532   | (1.54)   10475   | (1.58)   10546
(1.39)   10461   | (1.55)   10427   | (1.60)   10404
(1.47)   10592   | (1.56)   10464   | (2.17)   10752

The model m* with the smallest residual sum of squares S(ϑ̂_m) (RSS) turns out to be model (1.60), see Table 11, where the RSS for all 12 models (calculated under the assumption of a homogeneous variance) are recorded. This model m* will be used as the "reference model" in the calculation of the residuals (2.3), which are needed for the variance estimation. Moreover, for fitting the variance models we need the model m_0 = m̂ with minimal value of the cross-validation criterion C (under the assumption of a homogeneous variance), which is (1.8), see the last column of Table 12.
Table 12: Values of the cross-validation criterion C for all combinations of a regression and a variance model in Example 5.

Regression   Variance model
model        (2.8)   (2.9)   (2.10)   (2.13)   (2.12)   homogeneous
(1.8)        15859   15919   15861    15853    15812    15860
(1.13)       16173   16200   16174    16176    16003    16058
(1.39)       15910   15951   15912    15911    15852    15892
(1.47)       16406   16439   16406    16405    16219    16281
(1.53)       17089   17032   17088    17098    16444    16515
(1.54)       15873   15961   15873    15861    15868    15917
(1.55)       15813   15862   15815    15810    15878    15903
(1.56)       15933   15975   15935    15933    15871    15913
(1.57)       16188   16219   16189    16190    16027    16083
(1.58)       16200   16234   16201    Inf      16049    16105
(1.60)       15997   16070   15998    15988    16274    16278
(2.17)       16469   16560   16471    16437    16380    16435

Table 13: Values of the cross-validation criterion CV and of the parameter estimates for all variance models in Example 5.

Variance      CV-value      Values of the estimates for
model         (2.16)        σ²         τ (τ1)      τ2
(2.8)         432615620     9.79       1.208
(2.9)         458925519     5257.6     0.00237
(2.10)        447790756     12288      39.331      0.010239
(2.13)        412023686     12675      40.986
(2.12)        462674900     10818      0.017683
homogeneous   453078924     12610

After calculating the raw data s_i² (see (2.1)) or z_i (see (2.2)), respectively, for the fitting of the variance models, we fitted for each of the 12 regression models the six alternative variance models discussed in Subsection 2.1 to the "observations" (2.2). The combination of a regression model and a variance model leading to the smallest criterion C is given by the model (1.55) and the linear variance model (2.13) (see Table 12). That is, if an estimation of the regression function is of interest, a WLSE calculated for the model (1.55) with weights given by the variance estimates σ̂_i²
determined by fitting the model (2.13) would be expected to lead to smaller estimation errors (on the average) than the other WLSE's or the OLSE. On the other hand, the estimation of the variances themselves may be of primary interest, e.g. for estimating the variance of the OLSE. Then the linear variance model (2.13) has the smallest cross-validation value CV (cf. (2.16)), see Table 13, where also the fitted parameter values for the six variance models are recorded.

Figure 6 illustrates the results of this example. The left-hand panel depicts the (weighted least squares) data fit of the best model according to the cross-validation criterion C, that is, of model (1.55) with weights calculated using the linear variance model (2.13). The right-hand panel shows the corresponding plot of the residuals as well as a "σ-region" based on the fitted variance model, the boundary of which is defined by the functions ±[g(x, σ̂², τ̂)]^{1/2}. From this graphical representation it is seen that in this example the estimate has been truncated (in a small region) according to (2.15).

[Figure 6: Esterase count data: Illustration of the results. Left panel: observed counts versus the amount of esterase with the weighted least squares fit of model (1.55), f(x, ϑ) = x/(ϑ1 + ϑ2 x + ϑ3 √x), with ϑ̂1 = 0.109, ϑ̂2 = 0.002, ϑ̂3 = −0.019, RSS = 10440 and C = 15810. Right panel: residuals with the ±σ-region from the fitted linear variance model.]

Example 2 (Radioimmunological assay of cortisol, continued). We reconsider once more the example of the radioimmunological assay of cortisol, looking for a possibly heteroscedastic variance estimation and for a possibly better estimation using a WLSE.

The reference model m∗ providing the smallest value of RSS is now (1.11). The same
model is also used as m_0 = m̂ in the definition of the variance models.

Table 14: Values of the cross-validation criteria C, CC and CG for all combinations of a regression and a variance model in Example 2.
Regression   Criterion   Variance model
model                    (2.8)    (2.9)    (2.10)   (2.13)   (2.12)   homogeneous
(1.11)       C           1503     1513.9   1502.3   1506.2   1559     1565.2
             CC          0.0304   0.0304   0.0303   0.0305   0.0307   0.0307
             CG          158.5    153.5    160.5    149.9    243      156.8
(1.9)        C           16285    12082    16498    16498    10107    10618
             CC          0.137    0.1229   0.1381   0.1397   0.1149   0.1137
             CG          12921    13747    12946    12765    17143    17024
(1.10)       C           1804     1759.9   1806.6   1814.6   1617.3   1660.9
             CC          0.0351   0.0358   0.0351   0.0356   0.0354   0.0362
             CG          259.7    255.8    268.5    193.5    2327     1562
(1.36)       C           4211     3597.2   4274.9   4275.4   3423.6   3525.7
             CC          0.0684   0.0665   0.0687   0.0694   0.0727   0.0717
             CG          10196    10497    10218    10095    12229    11969
(1.42)       C           1801.4   1747.7   1799.1   1867.9   1735.4   1746.3
             CC          0.0371   0.0375   0.0371   0.0371   0.0405   0.0398
             CG          7975     8009     7980     7912     8581     8469
(1.43)       C           1976.9   1960.1   1973.3   2071.5   1995.4   2093.5
             CC          0.0314   0.0313   0.0314   0.0312   0.0332   0.0328
             CG          465.4    483.5    474.6    403.3    2361     1709
In Table 14 we record the values of the cross-validation criteria C (for regression function estimation or prediction), CC (for calibration) and CG (for the parameter γ defined in (1.46)) corresponding to the six different regression models. The points ±∞ have again been excluded in the cross-validation criteria by choosing a = −2.99 and b = 1.99. The range of Y's considered in the calibration criterion CC (1.32) is set by ã = 200 and b̃ = 2500.

The model (1.11) together with the quadratic variance model (2.10) leads to the smallest C; that is, a WLSE using this model and the variance estimators calculated by fitting the model (2.10) will be the first choice if estimation of the regression function or prediction is of interest. We see that the criterion C diminishes by 4% if we use the WLSE in place of the OLSE for the model (1.11), which was used in Subsection 1.7. The same combination (1.11), (2.10) of models leads to the smallest value of CC,
Table 15: Values of the cross-validation criterion CV and of the parameter estimates for all variance models in Example 2.
Variance      CV-value      Values of the estimates for
model         (2.16)        σ²         τ (τ1)     τ2
(2.8)         1393684       .000107    2.274
(2.9)         1587828       373.3      0.00112
(2.10)        1446463       1486.6     2.572      .00115
(2.13)        1879743       2542.5     3.033
(2.12)        12662879      4193.6     -.00159
homogeneous   11815005      2911.2
so that the same models are the best when calibration is the objective of the analysis. However, using the WLSE in the model (1.11) for calibration instead of the OLSE, as proposed in Subsection 1.7, leads only to a small improvement of 1.2% in the value of CC. Notice that using model (1.11) together with the exponential variance model (2.8) provides nearly the same results, hence this combination could also be used for both prediction and calibration purposes. If the variance estimation is considered as primary, Table 15 shows that the exponential model (2.8) has the smallest value of CV and should be used for estimating the variance.

3 Variable transformations

3.1 Preliminary discussion

In several situations it may be advantageous to fit a regression model to transformed observations

$$ Y_{ij}^{T} := T[Y_{ij}] \qquad (i = 1, \dots, k;\ j = 1, \dots, n_i) \qquad (3.1) $$

of the dependent variable, e.g. using a logarithmic transformation (T[y] = log y). Such situations may be:

1. It is known that a certain parametric model f(x, ϑ) gives a good fit to the transformed observations Y_ij^T and small residuals Y_ij^T − f(x_i, ϑ̂_T), where ϑ̂_T denotes the ordinary least squares estimate in such a fitting. In this case the inversely transformed model

$$ f_{T^{-1}}(x, \vartheta) := T^{-1}[f(x, \vartheta)] \qquad (3.2) $$

will be used to estimate the regression function of the original observations by

f_{T^{-1}}(x, ϑ̂_T), or

2. The variances of the variables Y_ij are markedly heteroscedastic and the transformation reduces or eliminates the heteroscedasticity, i.e. it leads to small differences Var Y_ij^T − Var Y_hl^T. This is illustrated in the following example.

Example 2 (Radioimmunological assay of cortisol, continued). In Subsection 2.2 we discussed how fitting the model (1.11) using a WLSE of its parameters (based on certain variance estimates) may lead to an improved calibration (in comparison to the fitting of other models using different WLSE's or OLSE's of their parameters). As an alternative procedure for the treatment of a situation with heteroscedastic variances, which seem to be present in this example, we could try to eliminate the heteroscedasticity by a convenient transformation of the dependent observations Y_ij and a fitting of an adequate model to the transformed observations using the OLSE of the parameters. Figure 7 shows the results of fitting the transformed model (1.11) to the transformed observations Y_ij^T = 2(√(s⁻¹(Y_ij + a)) − 1), where s and a are specified in the next subsection. Note that the residuals now show less (or no) heteroscedasticity compared with the residuals from fitting to the original observations (see Figure 3).

We propose to use the OLSE even if the variances of the original or of the transformed observations are heteroscedastic. The basic idea is that the OLSE does not suffer from the possibly pernicious influence of errors in estimating the variances on the otherwise higher efficiency of the WLSE. Therefore it may happen in some cases that the simple calculation of an OLSE for conveniently transformed observations and a correspondingly selected model leads to a more accurate estimation of parameters (or of the regression function) or to a better calibration than using a WLSE together with conveniently selected regression and variance models (see Example 2 in Subsection 3.4). With this transformation approach we have an alternative approach to parameter estimation under heteroscedastic variances, which may (or may not) be better than the WLSE approach discussed in Section 2. A choice between both should be made on the basis of the corresponding cross-validation value, as will be explained in the following subsection. It will always be useful to compare the results of the analysis of the original observations with those of transformed observations and to make a decision about a possible transformation using a data dependent criterion.

3.2 Standard transformations

The selection of a suitable transformation T is simplified, if T is restricted to a sufficiently small but flexible class of transformations. Such a class is the class of (modified) Box-
[Figure 7: The effect of a square root transformation of the observations and of model (1.11) on the heteroscedasticity in Example 2: transformed responses with the fitted transformed curve (left) and residuals under the transformation (right), both against the log-dose; case A, λ = 0.5, RSS = 2724, C = 1512, calibration criterion CC = 0.03054; fitted parameters ϑ̂1 = 132.688, ϑ̂2 = 2628.315, ϑ̂3 = 3.158, ϑ̂4 = −3.23, ϑ̂5 = 0.616.]
Cox transformations

$$ T^{\lambda}(y) = \begin{cases} \{ (s^{-1}[y + a_\lambda])^{\lambda} - 1 \} / \lambda & \text{for } \lambda \neq 0 \\ \ln(s^{-1}[y + a_0]) & \text{for } \lambda = 0, \end{cases} \qquad (3.3) $$

defined for real λ and convenient constants a_λ, where

$$ s^2 := \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} |Y_{ij} - \bar{Y}|^2, \qquad \bar{Y} := \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij}. \qquad (3.4) $$

In many cases it will be sufficient to try some of the transformations given by a finite set of different λ-values (including at least λ = 0). A standard choice is

$$ \lambda = 0,\ 1/2,\ -1/2,\ 1,\ -1,\ 2,\ -2, \qquad (3.5) $$

with the following constants (see Droge [9]), where

$$ \underline{y} = \min_{i,j} Y_{ij}, \qquad \bar{y} = \max_{i,j} Y_{ij}: \qquad (3.6) $$

$$ a_0 = [10(\bar{y} - \underline{y})/2]^{-e} - \underline{y}, \qquad a_{1/2} = a_1 = a_2 = -\underline{y}, $$

$$ a_{-1} = (\bar{y}/9) - (10\underline{y}/9), \qquad (3.7) $$

$$ a_{-1/2} = 0.005\,\bar{y} - 1.005\,\underline{y}, \qquad a_{-2} = 0.3\,\bar{y} - 1.3\,\underline{y}. $$
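A self-contained R sketch of the family (3.3) with the constants (3.7) for λ ≠ 0 may be helpful (the λ = 0 constant a_0 involves the quantity e defined via Droge [9] and is omitted here); the function names are ours:

```r
tlambda <- function(y, lambda, Y) {
  # modified Box-Cox transformation (3.3) for lambda != 0; Y is the full
  # sample used for s (3.4) and the constants (3.7)
  stopifnot(lambda != 0)
  s   <- sqrt(mean((Y - mean(Y))^2))   # s from (3.4)
  ylo <- min(Y); yhi <- max(Y)
  a <- switch(as.character(lambda),    # constants a_lambda from (3.7)
              "0.5" = , "1" = , "2" = -ylo,
              "-0.5" = 0.005 * yhi - 1.005 * ylo,
              "-1"   = yhi / 9 - 10 * ylo / 9,
              "-2"   = 0.3 * yhi - 1.3 * ylo,
              stop("no constant defined for this lambda"))
  (((y + a) / s)^lambda - 1) / lambda
}

# inverse transformation, needed to map fitted values back to the original
# scale; a and s must be the same constants as in the forward transformation
inv_tlambda <- function(z, lambda, a, s) s * (lambda * z + 1)^(1 / lambda) - a
```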
3.3 Selection of variable transformations

The variable transformation T has to be chosen simultaneously with an appropriate regression model f_m(x, ϑ). The criterion C for this choice should correspond to the objective of the statistical analysis; such criteria are discussed in Subsections 1.3 and 1.6. Consequently we have to choose a pair (λ̂, m̂) among the admitted pairs

$$ (\lambda, m) \qquad (\text{e.g. } \lambda = 0, \pm 1/2, \pm 1, \pm 2;\ m = 1, \dots, M) \qquad (3.8) $$

minimizing the criterion C = C(λ, m), which depends on the transformation T and the model f_m. As in Subsection 1.4, one could as well select a pair (λ̃, m̃) with an especially appealing form which fulfills a rule of thumb, say

$$
C(\tilde\lambda, \tilde m) \le 1.4\, C(\hat\lambda, \hat m). \tag{3.9}
$$

If the objective is the estimation of the values of the true regression function f or the prediction of values of the original dependent variable y (and not a good fit or prediction for some transformed dependent variable), then we have to distinguish two cases:

Case A: The admitted models $f_m$ (perhaps only one!) are considered as very good approximations to the regression function.

Case B: It is not known which models are good approximations to the regression function.

The criterion should then be different for the two cases. In case A the fit to the transformed observations $Y^T_{ij}$ uses the transformed regression functions
$$
f_{T,m}(x, \vartheta) := T[f_m(x, \vartheta)] \tag{3.10}
$$
and the corresponding LSE $\hat\vartheta_{T,m}$ minimizing
$$
S^{T,m}(\vartheta) := \sum_{i,j} |Y^T_{ij} - f_{T,m}(x_i, \vartheta)|^2. \tag{3.11}
$$
As the estimation of the original (untransformed) regression function or the prediction of the original dependent variable requires transforming back with the inverse $T^{-1}$ of the transformation T, the criterion for the selection of the pair $(\lambda, m)$ will be

$$
C(\lambda, m) = \sum_{i=r}^{s} \sum_{j=1}^{n_i} c_i \left| Y_{ij} - T_\lambda^{-1}\big[f_{T_\lambda,m}(x_i, \hat\vartheta^{\,ij}_{T_\lambda,m})\big] \right|^2. \tag{3.12}
$$

In case B we have to replace the models $f_{T,m}$ in (3.11) and (3.12) by $f_m$. The alternative criteria discussed in Subsection 1.5 may be adapted in a similar manner to a simultaneous choice of a transformation and a model. The estimate $\hat\vartheta$ has to be chosen as discussed above for cases A and B. Pairs $(\lambda, m)$ for which numerical difficulties arise in the computation should automatically be excluded from the choice.
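The selection step itself is a plain grid search. The following sketch (illustrative Python, not the Splus implementation of [7]; the models are passed as vectorized callables `f_m(x, theta)`, and the weights $c_i$ and the index range $r, \ldots, s$ of (3.12) are simplified to 1 and to all observations) computes a leave-one-out version of the criterion for case A and scans the admitted pairs:

```python
import numpy as np
from scipy.optimize import least_squares

def cv_criterion(x, Y, f_m, theta0, T):
    """Leave-one-out version of (3.12), case A: fit T[f_m] to the transformed
    data with one point removed, predict the removed point on the original
    scale (note that T^{-1}[f_{T,m}] = f_m) and accumulate squared errors."""
    YT = T(Y)
    crit = 0.0
    for leave in range(len(Y)):
        keep = np.arange(len(Y)) != leave
        fit = least_squares(lambda th: T(f_m(x[keep], th)) - YT[keep], theta0)
        crit += (Y[leave] - f_m(x[leave], fit.x)) ** 2
    return crit

def select_pair(x, Y, models, start_values, lambdas, make_T):
    """Grid search over the admitted pairs (lambda, m) of (3.8); pairs causing
    numerical difficulties are excluded from the choice, as suggested above."""
    best, best_pair = np.inf, None
    for lam in lambdas:
        T = make_T(lam)                      # T_lambda from (3.3)
        for m, f_m in enumerate(models):
            try:
                c = cv_criterion(x, Y, f_m, start_values[m], T)
            except Exception:
                continue                     # numerical difficulties: exclude
            if c < best:
                best, best_pair = c, (lam, m)
    return best_pair, best
```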

3.4 An example for the selection of variable transformations

Example 2 (Radioimmunological assay of cortisol, continued). In the treatment of our Example 2 in Subsection 2.2 we arrived at a WLSE in the Richards model (1.11) for the purpose of estimating the regression function itself or prediction, as well as for the purposes of calibration and estimation of the area under the curve (1.46). Now we try to fit transformations of the six models considered in our example to correspondingly transformed observations of the dependent variable, and choose the best pair $(\lambda, m)$ following the procedure described in Subsection 3.3 and using the values given in (3.5), (3.6). We assume case A and take the cross-validation criterion given by (3.12) if regression function estimation or prediction is the objective. If the objective is calibration, we take the criterion given by

$$
CC(\lambda, m) = \frac{1}{N} \sum_{i=1}^{k} \sum_{j \in J_i} \left| x_i - \hat x_m(Y_{ij}, \hat\vartheta^{\,ij}_{T_\lambda,m}) \right|, \tag{3.13}
$$

where $\hat x_m$ is the calibration function corresponding to $f_m$ and $J_i$ is defined as in (1.32). Our criterion for estimation of $\gamma$,

$$
CG(\lambda, m) = \frac{1}{n_{rs}} \sum_{i=r}^{s} \sum_{j=1}^{n_i} \left| \gamma[\bar Y_1, \ldots, \bar Y_k] - \gamma[f_m(x_1, \hat\vartheta^{\,i,j}_{T_\lambda,m}), \ldots, f_m(x_k, \hat\vartheta^{\,i,j}_{T_\lambda,m})] \right|, \tag{3.14}
$$

is a generalization of (1.33). Notice that in case B these formulas have to be modified appropriately, namely by calculating the inverted calibration curve $\hat x$ for each index $ij$ at the transformed observation $T_\lambda(Y_{ij})$ and the corresponding leave-one-out parameter estimate for fitting the (untransformed) model to the transformed observations of the response variable.

Table 16 exhibits the results of the calculations, performed with the tools described in Bunke, Droge and Polzehl [7]. Again $a = -2.99$, $b = 1.99$, $\tilde a = 200$ and $\tilde b = 2500$ are used in the calculations. We see that in the case of regression function estimation or prediction, the transformed Richards model (1.11) fitted to the transformation $T_{-2}(Y_{ij})$ of the observations gives the best combination of transformation and model, and better results than the original fit to the observations $Y_{ij}$ using a WLSE for conveniently selected models.

In the case of calibration, fitting the transformed model (1.11) to the transformed observations again gives the best results, but now the optimal transformation of the observations is $T_{1/2}(Y_{ij})$. The associated value of the criterion CC is almost as good as that for the best possible WLSE obtained in Subsection 2.2 (see also Table 14).

For estimation of $\gamma$ (defined in (1.46)), best results are obtained using either model (1.43) with $\lambda = 0$ or again the Richards model (1.11) with $\lambda = -.5$. Both $\lambda$-values correspond to transformations giving more weight to small values of f. This corresponds to enforcing more accuracy where the differences $10^{x_{i+1}} - 10^{x_i}$ are large, and therefore for $\hat\gamma$ itself.

Table 16: Values of the cross-validation criteria C, CC and CG for transformed models in Example 2.

Regression   Criterion                      λ value in T_λ
model                     2        1        .5       0        -.5      -1       -2
(1.11)       C         1780.1   1565.2   1512.1   1478.8   1460.5   1448.1   1436
             CC        .06095   .0307    .03054   .03092   .03121   .03118   .03105
             CG        5467     156.8    198.1    122.2    91.27    102.8    116.6
(1.9)        C         .        10618    .        32448    34129    22277    87959
             CC        .        .1137    .        .1842    .1593    .1185    .05694
             CG        .        17020    .        32448    7399     7525     7147
(1.10)       C         2143.8   1660.9   1981.3   2610.2   3062.4   2927.8   2788.5
             CC        .1732    .03619   .04054   .04131   .04154   .03925   .03588
             CG        11500    1562     1084     506.6    338.9    323.8    305.2
(1.36)       C         .        3525.7   4557.4   10369    14048    12094    11136
             CC        .        .07169   .07335   .09908   .1008    .08855   .06909
             CG        .        11970    9260     7483     7294     7443     7578
(1.42)       C         2169.3   1746.3   1857     2634.3   3230.1   3063.7   2697.4
             CC        .05699   .03979   .03814   .04049   .044     .04338   .04203
             CG        10240    8469     7545     6946     6913     7039     7185
(1.43)       C         21143    2093.5   2070.3   1987.9   1914.4   1847.7   1743.1
             CC        .1216    .0328    .03127   .03201   .03235   .03256   .03264
             CG        70520    1709     1587     90.67    105.5    124.8    142.7

(A dot marks pairs for which no value was computed.)

4 Accuracy assessments, confidence, prediction and calibration intervals

Parameter estimates, predictions or calibration values are only of value if they are complemented by an assessment of their accuracy. Therefore accuracy characteristics like the standard deviation, the bias or the relative mean absolute error of an estimator, of a predictor or of a calibration value are of interest. Corresponding confidence, prediction and calibration intervals should also be provided. We propose to use bootstrap procedures for these objectives, which will be presented in the following subsections.

We remark that rough preliminary (and sometimes perhaps already satisfying) accuracy assessments may already be taken from the optimal value of the characteristics used to select the regression model (and possibly also a transformation). The relative mean absolute error of a parameter estimate (1.24) would, e.g., be estimated by (1.33) or by its "double resampling" modifications, if the data dependence of the selected model is also to be taken into account.

We assume that a transformation $T = T_\lambda$ of the observations and a model $f_{T,m}$ have been chosen and that a corresponding OLSE $\hat\vartheta := \hat\vartheta_{T,m}$ is used as a parameter estimate. The accuracy assessments are realized for the estimates, predictions or calibrations based on model m and transformation T taken as fixed. The presentation is for case A (see Subsection 3.3); it covers also case B by replacing $f_{T,m}$ by $f_m$ in all expressions.

The presented bootstrap procedures may also be applied when a model $f_m$ for the regression function and a variance model are selected as in Sections 1 and 2, and corresponding WLSE are used for the original observations (without any transformation). In this case we have to use $\lambda = 1$ (identical transformation) and to replace the OLSE $\hat\vartheta_{T,m}$ by the WLSE $\hat\vartheta_m$.

A direct consideration of the dependence of the selected model and transformation on the data requires considerably more effort, being based on obvious "double resampling" modifications of our procedures. These modifications are described in Bunke, Droge and Polzehl [6].

4.1 Bootstrap

The bootstrap is a resampling method that tries to mimic the (unknown) distribution of the observations by generating artificial independent "bootstrap" samples from the fitted model. The estimates are repeatedly calculated for these bootstrap samples, and the accuracy of the original estimate is inferred from the empirical distribution of the bootstrap estimates. To generate the bootstrap samples we use the following moment oriented bootstrap procedure.

We first search for a transformation $T_*$ and a corresponding model $f_{T_*,m_*}$ (see (3.10)) with the aim that the distribution of the transformed observations $Y^{T_*}_{ij}$ be as near as possible to a normal distribution. This is accomplished by a maximum likelihood choice of $T_*, m_*$ following Bunke [4]. We have to maximize a "normal likelihood", or equivalently to minimize
$$
L(T, m) = n \ln\Big[\sum_{i,j} |Y^T_{ij} - f_{T,m}(x_i, \hat\vartheta_{T,m})|^2\Big] + 2n\lambda \ln s - 2(\lambda - 1) \sum_{i,j} \ln(Y_{ij} + a_\lambda): \tag{4.1}
$$

$$
L(T_*, m_*) = \min_{T,m} L(T, m). \tag{4.2}
$$
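Over a finite grid of candidates the minimization (4.2) reduces to evaluating (4.1) for each fitted pair; a minimal sketch under the assumption that the transformed-scale residuals, s from (3.4) and the shift constants are kept from the fits of Subsection 3.3 (illustrative Python, names ours):

```python
import numpy as np

def normal_criterion(Y, resid_T, s, lam, a_lam):
    """The criterion L(T, m) of (4.1): n times the log of the residual sum of
    squares on the transformed scale plus the Jacobian terms of T_lambda."""
    n = len(Y)
    return (n * np.log(np.sum(resid_T ** 2))
            + 2 * n * lam * np.log(s)
            - 2 * (lam - 1) * np.sum(np.log(Y + a_lam)))

def choose_T_star(Y, fits):
    """Minimize (4.2) over fitted candidates; each entry of `fits` is a tuple
    (lam, m, resid_T, s, a_lam) coming from the fits of Subsection 3.3."""
    return min(fits,
               key=lambda f: normal_criterion(Y, f[2], f[3], f[0], f[4]))
```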

We estimate the variances $\operatorname{Var} Y^{T_*}_{ij}$ using the approach of Section 2 for the transformed observations $Y^{T_*}_{ij}$. Let $\bar\sigma_i^2$ be the estimate of the variance $\operatorname{Var} Y^{T_*}_{ij}$ provided by the variance model minimizing the criterion (2.16). We then generate B bootstrap samples

$$
Y^{T_*,b}_{ij} := f_{T_*,m_*}(x_i, \hat\vartheta_{T_*,m_*}) + \varepsilon^b_{ij} \qquad (j = 1,\ldots,n_i;\ i = 1,\ldots,k;\ b = 1,\ldots,B), \tag{4.3}
$$

where the $\varepsilon^b_{ij}$ are independent realizations of a normal distribution with zero mean and variance $\bar\sigma_i^2$. Alternative procedures using higher order moments are discussed in Bunke [5]. If it is clear that $f_{T_*,m_*}$ is, in comparison with the variability of the observations, only a rough approximation to the unknown true regression function, then it makes sense to take the alternative procedure obtained by replacing $f_{T_*,m_*}(x_i, \hat\vartheta_{T_*,m_*})$ by $\bar Y^{T_*}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} Y^{T_*}_{ij}$.

This would avoid biases due to an incorrect model $m_*$. For each bootstrap sample (4.3) we calculate the corresponding OLSE $\hat\vartheta^b_{T,m} =: \hat\vartheta^b$ by minimization of
$$
S^b(\vartheta) := \sum_{i=1}^{k} \sum_{j=1}^{n_i} \big| Y^{T,b}_{ij} - f_{T,m}(x_i, \vartheta) \big|^2. \tag{4.4}
$$
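A sketch of the generation step (4.3) together with the refitting step (4.4) (illustrative Python; `sigma2_bar` stands for the variance estimates $\bar\sigma_i^2$, `f_Tm` for the transformed regression function and `theta_hat` for $\hat\vartheta_{T_*,m_*}$):

```python
import numpy as np
from scipy.optimize import least_squares

def bootstrap_estimates(x, n_i, f_Tm, theta_hat, sigma2_bar, B=500, seed=0):
    """Moment oriented bootstrap: B samples from the fitted transformed model
    as in (4.3), with independent N(0, sigma2_bar_i) errors, and the OLSE
    (4.4) recomputed for each sample."""
    rng = np.random.default_rng(seed)
    xb = np.repeat(x, n_i)                    # one design point per Y_ij
    fitted = f_Tm(xb, theta_hat)              # f_{T*,m*}(x_i, theta_hat)
    sd = np.sqrt(np.repeat(sigma2_bar, n_i))  # error sd per observation
    thetas = []
    for b in range(B):
        Yb = fitted + rng.normal(0.0, sd)     # one bootstrap sample (4.3)
        fit = least_squares(lambda th: f_Tm(xb, th) - Yb, theta_hat)
        thetas.append(fit.x)                  # OLSE (4.4) for this sample
    return np.asarray(thetas)
```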

4.2 Accuracy characteristics and confidence intervals

We assess the accuracy of an estimate $\hat\eta = \eta(\hat\vartheta, f_m(x_1, \hat\vartheta), \ldots, f_m(x_k, \hat\vartheta))$ of a one-dimensional parameter $\eta = \eta(\vartheta, f(x_1), \ldots, f(x_k))$. The variance $\operatorname{Var}\hat\eta$ of $\hat\eta$ is estimated by

$$
\widehat{\operatorname{Var}}\,\hat\eta := \frac{1}{B-1} \sum_{b=1}^{B} \Big( \hat\eta^b - \frac{1}{B} \sum_{d=1}^{B} \hat\eta^d \Big)^2, \tag{4.5}
$$
where $\hat\eta^b = \eta(\hat\vartheta^b, f_m(x_1, \hat\vartheta^b), \ldots, f_m(x_k, \hat\vartheta^b))$, while the standard deviation of $\hat\eta$ is estimated by
$$
\hat\sigma_{\hat\eta} := \big(\widehat{\operatorname{Var}}\,\hat\eta\big)^{1/2}. \tag{4.6}
$$
The bias $\operatorname{Bias}\hat\eta = E\hat\eta - \eta$ of $\hat\eta$ is estimated by

$$
\widehat{\operatorname{Bias}}(\hat\eta) = \frac{1}{B} \sum_{b=1}^{B} \hat\eta^b - \hat\eta, \tag{4.7}
$$
and the mean absolute relative error $\operatorname{MARE} = |\eta|^{-1} E|\hat\eta - \eta|$ of $\hat\eta$ by

$$
\widehat{\operatorname{MARE}}(\hat\eta) = \frac{1}{B} \sum_{b=1}^{B} \frac{|\hat\eta^b - \hat\eta|}{|\hat\eta|}. \tag{4.8}
$$

Simple confidence intervals for $\eta$ of approximate confidence level $1 - \alpha$ are given by
$$
I_B := \big[ b_{\alpha/2},\ b_{1-\alpha/2} \big], \tag{4.9}
$$

where $b_\alpha$ denotes the $\alpha$-quantile of the empirical distribution of the bootstrap estimates $\hat\eta^b$ ($b = 1, \ldots, B$). Such quantiles are easily determined, while the following improved confidence intervals require considerably more computation time, being based on a double bootstrap procedure. We estimate the standard deviation of $\hat\eta$ for each of the bootstrap samples (4.3), applying again a bootstrap procedure as described in Subsection 4.1 together with the formula (4.6). We use these estimates $\hat\sigma^b_{\hat\eta}$ in a pivot statistic S. Let $a_\alpha$ denote the $\alpha$-quantile of the empirical distribution of the pivot values

$$
S^b = \frac{\hat\eta^b - \hat\eta}{\hat\sigma^b_{\hat\eta}}. \tag{4.10}
$$

The improved (1 − α)-confidence intervals are then

$$
\tilde I_B := \big[ \underline\eta,\ \bar\eta \big] = \big[ \hat\eta - a_{1-\alpha/2}\, \hat\sigma_{\hat\eta},\ \hat\eta - a_{\alpha/2}\, \hat\sigma_{\hat\eta} \big]. \tag{4.11}
$$
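The characteristics (4.5)-(4.8), the simple interval (4.9) and the improved interval (4.11) are direct computations on the bootstrap estimates; a minimal sketch (illustrative Python; `eta_boot` collects the $\hat\eta^b$, and `sd_boot` the inner-bootstrap estimates $\hat\sigma^b_{\hat\eta}$ needed for the pivots (4.10)):

```python
import numpy as np

def bootstrap_accuracy(eta_hat, eta_boot, alpha=0.1):
    """Accuracy characteristics (4.5)-(4.8) and the simple percentile
    confidence interval (4.9) of approximate level 1 - alpha."""
    B = len(eta_boot)
    var = np.sum((eta_boot - eta_boot.mean()) ** 2) / (B - 1)    # (4.5)
    sd = np.sqrt(var)                                            # (4.6)
    bias = eta_boot.mean() - eta_hat                             # (4.7)
    mare = np.mean(np.abs(eta_boot - eta_hat)) / abs(eta_hat)    # (4.8)
    lo, hi = np.quantile(eta_boot, [alpha / 2, 1 - alpha / 2])   # (4.9)
    return {"sd": sd, "bias": bias, "mare": mare, "simple_ci": (lo, hi)}

def improved_ci(eta_hat, sd_hat, eta_boot, sd_boot, alpha=0.1):
    """Improved interval (4.11) from the pivot values (4.10)."""
    S = (eta_boot - eta_hat) / sd_boot
    a_lo, a_hi = np.quantile(S, [alpha / 2, 1 - alpha / 2])
    return eta_hat - a_hi * sd_hat, eta_hat - a_lo * sd_hat
```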

4.3 Prediction and calibration intervals

For the construction of prediction and calibration intervals we consider the (future) value Y(x) of the dependent variable for a fixed value x of the explanatory variable. We estimate the variance of the transformed variable $T_*(Y(x))$ by

$$
\hat\sigma^2(x) := g(x, \hat\sigma^2, \hat\tau), \tag{4.12}
$$
where $g(x, \sigma^2, \tau)$ is the variance model minimizing the criterion (2.16) as in Subsection 4.1, again based on the transformed observations $Y^{T_*}_{ij}$. For the bootstrap procedure we need to generate independent standard normally distributed values $u^b_a$ ($a = 1, \ldots, r$; $b = 1, \ldots, B$). A simple prediction interval for the value Y(x), with approximate coverage probability $1 - \alpha$, may be calculated with the $\alpha$-quantiles $c_\alpha$ of the empirical distribution of the pivot values

$$
S^b_a = f_{T_*,m}(x, \hat\vartheta^b) - \hat\sigma(x)\, u^b_a \qquad (a = 1, \ldots, r;\ b = 1, \ldots, B). \tag{4.13}
$$

These $(1-\alpha)$-prediction intervals are
$$
I_P(x) := \big[ T_*^{-1}(c_{\alpha/2}),\ T_*^{-1}(c_{1-\alpha/2}) \big]. \tag{4.14}
$$
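A sketch of the simple interval (4.14) (illustrative Python; `theta_boot` are the bootstrap estimates $\hat\vartheta^b$ from (4.4), `sigma_x` the estimate $\hat\sigma(x)$ of (4.12), and `T_inv` the inverse transformation $T_*^{-1}$):

```python
import numpy as np

def simple_prediction_interval(x, f_Tm, theta_boot, sigma_x, T_inv,
                               r=10, alpha=0.1, seed=1):
    """Simple (1-alpha)-prediction interval (4.14) for Y(x): empirical
    quantiles of the pivot values (4.13), back-transformed by T_*^{-1}."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((r, len(theta_boot)))
    fb = np.array([f_Tm(x, th) for th in theta_boot])   # f_{T*,m}(x, theta^b)
    S = fb[None, :] - sigma_x * u                       # pivots (4.13)
    c_lo, c_hi = np.quantile(S, [alpha / 2, 1 - alpha / 2])
    return T_inv(c_lo), T_inv(c_hi)
```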

Improved intervals are based on a double bootstrap procedure and on the $\alpha$-quantiles $d_\alpha$ of the empirical distribution of the pivot values
$$
S^b_a = \frac{f_{T_*,m}(x, \hat\vartheta^b) - f_{T_*,m}(x, \hat\vartheta) - \hat\sigma(x)\, u^b_a}{\big( \hat\sigma^{2b}(x) + \hat\sigma^{2b}_m(x) \big)^{1/2}}. \tag{4.15}
$$

Here $\hat\sigma^{2b}(x)$ is the estimate of the variance $\operatorname{Var} T_*(Y(x))$ obtained from the transformed observations in the bootstrap sample (4.3) by the method of Subsection 4.1, that is, selecting a variance model which minimizes the criterion (2.16) calculated for the transformed observations in the bootstrap sample. The term $\hat\sigma^{2b}_m(x)$ in (4.15) is the estimate of the variance of the predictor $\hat\eta^b := f_{T_*,m}(x, \hat\vartheta^b)$ obtained from the bootstrap sample (4.3) by the bootstrap method described in Subsection 4.1. The improved $(1-\alpha)$-prediction interval is
$$
\tilde I_P(x) := \big[ \underline I_P,\ \bar I_P \big] = \big[ T_*^{-1}(\underline f(x)),\ T_*^{-1}(\bar f(x)) \big], \tag{4.16}
$$
where

$$
\underline f(x) = f_{T_*,m}(x, \hat\vartheta) + d_{\alpha/2} \big( \hat\sigma^2(x) + \hat\sigma^2_m(x) \big)^{1/2}, \tag{4.17}
$$
$$
\bar f(x) = f_{T_*,m}(x, \hat\vartheta) + d_{1-\alpha/2} \big( \hat\sigma^2(x) + \hat\sigma^2_m(x) \big)^{1/2}, \tag{4.18}
$$

and where $\hat\sigma^2_m(x)$ is the estimate of the variance of the predictor $\hat\eta := f_{T_*,m}(x, \hat\vartheta)$ obtained from the transformed observations $Y^{T_*}_{ij}$ by the method of Subsection 4.2.

For calibration intervals we use the calibration values defined by the inverse $f_m^{-1}(\cdot, \vartheta)$ of the regression function $f_m(\cdot, \vartheta)$; that is, the calibration value corresponding to the value y of the dependent variable would be

$$
\hat x = f_m^{-1}(y, \hat\vartheta). \tag{4.19}
$$

Assuming that $f_m(x, \vartheta)$ is monotonically increasing in x, a simple $(1-\alpha)$-calibration interval is given by
$$
I_C(y) := \big[ f_{T_*,m}^{-1}(g_{\alpha/2}, \hat\vartheta),\ f_{T_*,m}^{-1}(g_{1-\alpha/2}, \hat\vartheta) \big], \tag{4.20}
$$
where $g_\alpha$ denotes the $\alpha$-quantile of the empirical distribution of the pivot values

$$
S^b_a = f_{T_*,m}(\hat x, \hat\vartheta^b) - \hat\sigma(\hat x)\, u^b_a. \tag{4.21}
$$

Improved $(1-\alpha)$-calibration intervals are given by a double bootstrap procedure, namely by
$$
\tilde I_C(y) := \big[ \underline I_C,\ \bar I_C \big] = \Big\{ x : T_*(y) + \tilde h_{\alpha/2} \big\{ \hat\sigma^2(\hat x) + \hat\sigma^2_m(\hat x) \big\}^{1/2} < f_{T_*,m}(x, \hat\vartheta) < T_*(y) + \tilde h_{1-\alpha/2} \big\{ \hat\sigma^2(\hat x) + \hat\sigma^2_m(\hat x) \big\}^{1/2} \Big\}. \tag{4.22}
$$

Here we denote by $\tilde h_\alpha$ the $\alpha$-quantiles of the empirical distribution of the pivot values
$$
S^b_a = \frac{f_{T_*,m}(\hat x, \hat\vartheta^b) - T_*(y) - \hat\sigma(\hat x)\, u^b_a}{\big\{ \hat\sigma^{2b}(\hat x^b_a) + \hat\sigma^{2b}_m(\hat x^b_a) \big\}^{1/2}}, \tag{4.23}
$$

where $\hat x^b_a = f_{T_*,m}^{-1}\big( T_*(y) + \hat\sigma(\hat x)\, u^b_a,\ \hat\vartheta^b \big)$.
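Both the calibration value (4.19) and the simple interval (4.20)-(4.21) only require a numerical inversion of the monotone fitted curve; a sketch under the assumptions above (illustrative Python; the bracketing interval [x_lo, x_hi] for the root search must be supplied by the user):

```python
import numpy as np
from scipy.optimize import brentq

def calibration_value(y, f_m, theta, x_lo, x_hi):
    """Calibration value (4.19): numerical inversion of the monotone
    regression function on a user-supplied bracketing interval."""
    return brentq(lambda x: f_m(x, theta) - y, x_lo, x_hi)

def simple_calibration_interval(y, f_m, f_Tm, theta, theta_boot, sigma_xhat,
                                x_lo, x_hi, r=10, alpha=0.1, seed=2):
    """Simple (1-alpha)-calibration interval (4.20) from the empirical
    quantiles g_alpha of the pivot values (4.21)."""
    rng = np.random.default_rng(seed)
    xhat = calibration_value(y, f_m, theta, x_lo, x_hi)
    u = rng.standard_normal((r, len(theta_boot)))
    fb = np.array([f_Tm(xhat, th) for th in theta_boot])
    S = fb[None, :] - sigma_xhat * u                    # pivots (4.21)
    g_lo, g_hi = np.quantile(S, [alpha / 2, 1 - alpha / 2])
    # map the quantiles back through the inverse of f_{T*,m}(., theta_hat)
    inv = lambda g: brentq(lambda x: f_Tm(x, theta) - g, x_lo, x_hi)
    return inv(g_lo), inv(g_hi)
```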

4.4 Application

Example 2 (Radioimmunological assay of cortisol, continued). We analyze both the best model (1.11) (with the quadratic variance model (2.10)) obtained in Section 2 and the best model using a variable transformation (1.11 with $\lambda = .5$) obtained in Section 3. Although the main interest in this example was calibration, we will use it to illustrate results in terms of prediction intervals and accuracy statistics as well.

Table 17 gives the accuracy statistics and improved approximate .9-confidence intervals, as defined in Subsection 4.2, for the components of the parameter $\vartheta$ and the parameter $\gamma$ defined in (1.46), for the combination of model (1.11) and variance model (2.10). Table 18 contains the corresponding values for the best model with transformation.

Table 17: Example 2: Accuracy statistics and confidence intervals for parameters in model (1.11) (in combination with variance model (2.10)).

η      η (lower)   η̂        η (upper)   MARE(η̂)   Bias(η̂)   σ̂_η̂
ϑ_1    125.11      134.41   141.95      0.027      -0.568     4.708
ϑ_2    2578.9      2629.7   2676.3      0.008      -0.753     28.4
ϑ_3    2.59        3.049    3.599       0.07       0.033      0.275
ϑ_4    -3.544      -3.168   -2.893      0.046      -0.024     0.19
ϑ_5    0.506       0.639    0.718       0.067      -0.003     0.055
γ      19195       20199    20667       0.012      -28.6      294.7

Table 18: Example 2: Accuracy statistics and confidence intervals for parameters in model (1.11) (with transformation λ = .5).

η      η (lower)   η̂        η (upper)   MARE(η̂)   Bias(η̂)   σ̂_η̂
ϑ_1    127.31      132.69   137.72      0.016      -1.262     7.12
ϑ_2    2582.7      2626.4   2675.9      0.008      -0.092     27.8
ϑ_3    2.719       3.165    3.719       0.070      0.035      0.301
ϑ_4    -3.634      -3.237   -2.952      0.046      -0.025     0.202
ϑ_5    0.519       0.614    0.700       0.067      -0.003     0.055
γ      19578       20162    20732       0.011      -77.6      444.0

We calculated the improved prediction and calibration intervals (4.16) and (4.22) for a set of equally spaced points x and y, respectively. Approximate .9-prediction intervals

Table 19: Example 2: Prediction and confidence intervals based on the combination of regression model (1.11) and variance model (2.10).

 x     I_P(x) (lower)   η (lower)   η̂ = f̂_m(x, ϑ̂)   η (upper)   I_P(x) (upper)
-3.0   2605.4           2711.2      2764.1           2806.6      2919.8
-2.5   2596.9           2703.2      2751.3           2792.1      2904.6
-2.0   2553.8           2664.9      2703.3           2739.9      2852.9
-1.5   2359.8           2463.8      2497.8           2533.7      2631.9
-1.0   1782.8           1855.9      1887.2           1919.9      1989.3
-0.5   981.1            1017.7      1038.0           1057.5      1093.5
 0.0   470.8            490.3       498.6            507.1       527.3
 0.5   251.1            263.2       270.0            276.5       287.8
 1.0   166.7            174.4       183.9            190.0       198.2
 1.5   134.5            142.1       152.4            158.5       164.8
 2.0   117.4            125.1       134.4            142.0       147.0

Table 20: Example 2: Prediction and confidence intervals based on model (1.11) (with transformation λ = .5).

 x     I_P(x) (lower)   η (lower)   η̂ = f̂_m(x, ϑ̂)   η (upper)   I_P(x) (upper)
-3.0   2603.2           2715.0      2759.1           2803.2      2914.7
-2.5   2593.5           2706.7      2747.4           2790.0      2899.6
-2.0   2553.3           2667.9      2701.8           2738.1      2850.5
-1.5   2370.5           2469.1      2499.7           2537.2      2634.1
-1.0   1798.4           1863.6      1886.0           1914.4      1976.9
-0.5   993.4            1020.9      1034.7           1050.7      1076.3
 0.0   470.8            492.8       499.0            505.7       527.9
 0.5   247.8            264.2       271.0            277.5       296.2
 1.0   166.9            176.9       184.1            190.5       202.8
 1.5   139.8            145.9       151.7            159.3       165.7
 2.0   131.0            131.1       132.7            135.1       139.3

and .9-confidence intervals for $f_m(x, \vartheta)$ are presented in Table 19 for the combination of regression and variance model (1.11, 2.10). Table 20 gives the corresponding results if the model (1.11) with transformation $\lambda = .5$ is used. Tables 21 and 22 contain calibration values $\hat x$ and calibration intervals for several values of y in the case of variance modelling and transformation, respectively.

Table 21: Example 2: Calibration intervals based on the combination of regression model (1.11) and variance model (2.10).

 y     I_C(y) (lower)   x̂        I_C(y) (upper)
 200   0.757            0.861     1.010
 700   -0.270           -0.233    -0.196
1200   -0.644           -0.604    -0.564
1700   -0.946           -0.891    -0.839
2200   -1.312           -1.208    -1.123
2700   -2.990           -1.983    -1.585

Table 22: Example 2: Calibration intervals based on model (1.11) (with transformation λ = .5).

 y     I_C(y) (lower)   x̂        I_C(y) (upper)
 200   0.740            0.864     1.050
 700   -0.265           -0.234    -0.193
1200   -0.639           -0.606    -0.573
1700   -0.940           -0.893    -0.849
2200   -1.305           -1.207    -1.133
2700   -2.990           -1.990    -1.605

Figures 8 and 9 display graphically the results obtained for model (1.11) in combination with the quadratic variance model (2.10) and for model (1.11) with transformation $\lambda = .5$. Calibration intervals are marked by horizontal bars in the upper display. Confidence intervals for $f_m(\cdot, \vartheta)$ and prediction intervals are illustrated in both graphs by vertical bars (confidence intervals marked by light bars and prediction intervals by dark ones).

[Figure 8: Example 2: Prediction, confidence and calibration intervals for the best combination of regression and variance model (rf denotes the model $f_{m_o}$). Two panels against the log-dose: the response with the fitted curve and intervals, and the residuals. Annotation: model $f(x, \vartheta) = \vartheta_1 + \vartheta_2/(1 + \exp(\vartheta_3 - \vartheta_4 x))^{\vartheta_5}$, RSS = 2725, C = 1502, calibration criterion (CC) = 0.03032; $\hat\vartheta_1 = 134.407$, $\hat\vartheta_2 = 2629.702$, $\hat\vartheta_3 = 3.049$, $\hat\vartheta_4 = -3.168$, $\hat\vartheta_5 = 0.639$; variance model $\sigma^2 + \tau_1(rf - \operatorname{mean}(rf)) + \tau_2(rf - \operatorname{mean}(rf))^2$.]

[Figure 9: Example 2: Prediction, confidence and calibration intervals for the best model with transformation. Two panels against the log-dose: the response with the fitted curve and intervals, and the residuals. Annotation: model $f(x, \vartheta) = \vartheta_1 + \vartheta_2/(1 + \exp(\vartheta_3 - \vartheta_4 x))^{\vartheta_5}$, case A, $\lambda = 0.5$, RSS = 2724, C = 1512, calibration criterion (CC) = 0.03054; $\hat\vartheta_1 = 132.688$, $\hat\vartheta_2 = 2626.375$, $\hat\vartheta_3 = 3.165$, $\hat\vartheta_4 = -3.237$, $\hat\vartheta_5 = 0.614$.]

5 Computational tools

Corresponding to the procedures proposed in the preceding sections, we provide a collection of functions and programs written in Splus, which we intend to submit to StatLib. In fact, all the analyses of the examples used in the preceding sections were done using these implementations. Here we give only a short overview of the range of analyses covered by the programs and of their interdependencies; for a detailed description of how to use the programs we refer to Bunke, Droge and Polzehl [7].

The analyses are based on a description of both the class of models and the data as simple list objects in Splus. The first problem to be solved is to obtain reasonable initial estimates of the parameters of all models under consideration. For this we provide a menu driven interactive function parinit that allows the user to select a subclass of models and to generate initial estimates by stochastic search for the models in this subclass.

Starting from the information obtained by parinit, the menu driven programs selectmv (selection of models with variance estimation) and selectmt (selection of models with transformation) perform all analyses proposed in Sections 1-3, i.e. parameter estimation and model selection based on the criteria (1.14), (1.32) and (1.33). The returned object is a list, ordered by the criteria, containing the information for the single models as its components.

Using the functions confpar (confidence intervals and accuracy statistics), predict (prediction intervals) and calibrate (calibration intervals), the accuracy statistics and the confidence, prediction and calibration intervals introduced in Section 4 can be obtained. These functions take the results obtained for a single model by selectmv or selectmt as an argument. Both simple and double bootstrap intervals can be computed, with the simple ones as the standard because of the much higher computational effort caused by the double bootstrap. Information computed by any program is added to the returned object and employed by the other functions where possible. We provide generic functions to plot and print the information obtained during the analyses.

6 Conclusions: A strategy for nonlinear regression analysis

The procedures and discussions in the preceding Sections 1-4 may be summarized as a useful strategy for regression analysis based on nonlinear models in the usual situation, in which (1) there is uncertainty whether some model for the regression function is true or is a sufficiently good approximation to the true regression function, or

(2) it is not known with certainty that the variances are homogeneous or at least approximately homogeneous, and there is no given variance model which is known to describe the variances sufficiently well.

Our approach allows the application of alternative methods for a nonlinear regression analysis with the objectives of parameter estimation, calibration or prediction. Moreover, the systematic use of cross-validation criteria provides a tool to compare different methods and to select one of them. For this it is also possible to admit alternative methods because of experience in the field of application, or simply because of personal taste. The computational tools described in Bunke, Droge and Polzehl [7] allow an automatic selection of regression and variance models among classes of models which fulfil prescribed desired properties (possibly the classes consist only of a single model, if it is known to be adequate). Moreover, the programs provide an automatic selection of transformations, together with the calculation of parameter estimates, regression functions, calibration curves and assessments of their accuracy characteristics. The programs can be used for a routine analysis, especially when there is no time or special knowledge for additional statistical reasoning leading to a restriction to certain methods. Specific options in the programs can be set to restrict the class of methods, e.g. to a homogeneous variance together with (or without) the use of transformed observations. A short description of the proposed strategy for nonlinear regression analysis is the following:

Case 1: Fixed regression model with homogeneous variance. It is known that a fixed model for the regression function is adequate, that is, approximately or even exactly true, and that the observation variances are homogeneous. Thus we calculate the OLSE for parameter estimation and calibration. Asymptotic results or the bootstrap methods of Section 4 may be used to assess the accuracy of the OLSE and for testing and confidence intervals; the methods of residual analysis may be applied to check the model and the homogeneous variance assumptions (see Huet, Bouvier, Gruet and Jolivet [11] or Seber and Wild [15]). If the diagnostic procedures reveal a misspecification of the regression model, we may try to substitute it by a more appropriate one and repeat the analysis. If the variance homogeneity appears to be misspecified, one could start the analysis as in Case 2.

Case 2: Fixed regression and fixed variance model. In this case there is a fixed model for the regression function and a fixed model for the possibly heteroscedastic variances, both known to be adequate. Then the methods of maximum likelihood (or of three-step alternate least squares; see Carroll and Ruppert [8] or Huet, Bouvier, Gruet and Jolivet [11]) may be applied for variance estimation, and the corresponding WLSE for the estimation of the parameters in the regression model may be calculated.

Case 3: Fixed regression model with heteroscedastic variances. A fixed model for the regression function is known to be adequate, while an adequate model for the heteroscedastic variances is not known; perhaps there is only a (possibly rough) approximate variance model or some qualitative property of it, like monotonicity. Then the method of Section 2 can be applied to select one of six alternative variance models and to estimate the variances. With these variance estimates it is possible to calculate WLSE for the parameters of the regression model.

Case 4: Alternative regression models and homogeneous variance. There is no fixed regression model known to be adequate, while the variance appears to be (approximately) homogeneous. Alternative models for the regression function may be fixed according to known or expected qualitative properties of the regression function as a function of the independent variable (see Subsection 1.4). Then one of these models is selected by the procedure of Subsection 1.4, minimizing a cross-validation criterion ($\tilde C$ if estimation of the regression function or prediction, CC if calibration, CG if parameter estimation is the objective; see Subsection 1.7). This may also be done only approximately, according to the rule of thumb (1.22). The homogeneous variance estimate (2.14) may be used to estimate the variance of the OLSE in the selected model and to construct corresponding tests and confidence intervals as in Case 1.

Case 5: GENERAL CASE (alternative regression models and heteroscedastic variances). In this case there is no fixed regression model and no variance model known to be adequate. Alternative models for the regression function have to be fixed according to known or expected qualitative properties of the regression function as a function of the independent variable (see Subsection 1.4). Then one of these models is selected, together with a variance model, by the procedure of Subsection 2.1, minimizing a cross-validation criterion ($\tilde C$ if estimation of the regression function or prediction, CC if calibration, CG if parameter estimation is the objective; see Subsection 1.7). This may also be done only approximately, according to the rule of thumb (1.22). Alternatively, the procedures of Subsection 3.3 may be applied, reducing or eliminating the heteroscedasticity by transformation. The cross-validation or resampling values $C$, $\tilde C$, $CC$, $\widetilde{CC}$, $CG$ or $\widetilde{CG}$ corresponding to the different objectives may be used for a decision between using a WLSE or an OLSE for transformed observations. A preliminary rough accuracy assessment is, in some cases, given by the criterion used for the selection of the model (and possibly of a transformation). A variety of different accuracy assessments and of confidence, prediction or calibration intervals may be calculated by simple or double bootstrap procedures as described in Section 4.

References

[1] D.M. Bates and D.G. Watts. Nonlinear regression analysis and its applications. Wiley, New York, 1988.

[2] H. Bunke and O. Bunke (Eds.). Statistical Inference in Linear Models. Wiley, Chichester, 1986.

[3] H. Bunke and O. Bunke (Eds.). Nonlinear Regression, Functional Relations and Robust Methods. Wiley, Chichester, 1989.

[4] O. Bunke. The Roles of Transformations and Response Functions chosen by Maximum Likelihood. Math. Operationsforsch. Statist., Ser. Statist., 13, 395-405, 1982.

[5] O. Bunke. Bootstrapping in heteroscedastic regression situations. Discussion paper, Sonderforschungsbereich 373, Humboldt-Universität, Berlin, 1995.

[6] O. Bunke, B. Droge, and J. Polzehl. Selection of regression and variance models in nonlinear regression. Discussion paper, Sonderforschungsbereich 373, Humboldt-Universität, Berlin, 1995.

[7] O. Bunke, B. Droge, and J. Polzehl. Splus tools for model selection in nonlinear regression. Discussion paper 95-73, Sonderforschungsbereich 373, Humboldt-Universität, Berlin, 1995.

[8] R.J. Carroll and D. Ruppert. Transformation and Weighting in Regression. Chapman and Hall, London, 1988.

[9] B. Droge. On a computer program for the selection of variables and models in regression analysis. In V. Fedorov, W.G. Müller, and I.N. Vuchkov, editors, Model-Oriented Data Analysis, pages 181-192, Heidelberg, 1992. Springer-Verlag.

[10] B. Droge. Some comments on cross-validation. Discussion Paper 7, Sonderforschungsbereich 373, Humboldt-Universität, Berlin, 1994.

[11] S. Huet, A. Bouvier, M.A. Gruet, and E. Jolivet. Practical use of statistical tools for parametric nonlinear regression models. Technical report, INRA, Jouy-en-Josas, France, 1995.

[12] D.A. Ratkowsky. Nonlinear Regression Modeling. M. Dekker, New York, 1983.

[13] D.A. Ratkowsky. Handbook of Nonlinear Regression Models. Dekker, New York, 1989.

[14] G.J.S. Ross. Nonlinear Estimation. Springer-Verlag, New York, 1990.

[15] G.A.F. Seber and C.J. Wild. Nonlinear Regression. Wiley, New York, 1989.
