TECHNOMETRICS ©, VOL. 25, NO. 1, FEBRUARY 1983

Diagnostic Regression Analysis and Shifted Power Transformations

A. C. Atkinson, Department of Mathematics, Imperial College of Science and Technology, London SW7 2BZ, England

Diagnostic displays for outlying and influential observations are reviewed. In some examples apparent outliers vanish after a power transformation of the data. Interpretation of the score statistic for transformations as regression on a constructed variable makes diagnostic methods available for detection of the influence of individual observations on the transformation. Methods for the power transformation are exemplified and extended to the power transformation after a shift in location, for which there are two constructed variables. The emphasis is on plots as diagnostic tools.

KEY WORDS: Box and Cox; Cook statistic; Influential observations; Outliers in regression; Power transformations; Regression diagnostics; Shifted power transformation.

1. INTRODUCTION

It is straightforward, with modern computer software, to fit multiple-regression equations to data. It is often, however, less easy to determine whether one, or a few, observations are having a disproportionate effect on the fitted model and hence, more importantly, on the conclusions drawn from the data. Methods for detecting such observations are a main aim of the collection of techniques known as regression diagnostics, many of which are described in the books of Belsley, Kuh, and Welsch (1980) and of Weisberg (1980).

The purpose of the present article is to describe some recently developed diagnostic plots that have been applied both to multiple regression and to the analysis of data transformations. In this article the techniques are illustrated and extended to situations in which a power transformation is appropriate only after a shift in location. It is intended that the plots should be used in a routine manner to display any features that might suggest ways in which the model, or data, are inadequate.

Because any of the data used in multiple regression may be wrong due to measuring, recording, transcription, or keypunching errors, both response and explanatory variables may be in error. To detect these two kinds of error, two different quantities are needed. Some possibilities are discussed in Section 2, and the half-normal plots of Atkinson (1981) are used to investigate a set of data on salinity presented by Ruppert and Carroll (1980).

The presence of one or more outlying observations may indicate not inadequate data, but an inadequate model. In some cases outliers can be reconciled with the body of the data by a transformation of the response. But there is the possibility that evidence for, or against, a transformation may itself be being unduly influenced by one, or a few, observations. In Section 3 the diagnostic plot for influential observations is applied to the score test for transformations, which is interpreted in terms of regression on a constructed variable. The power transformation may, however, be more appropriate after a shift in the location of the observations. This model is presented by Box and Cox (1964), but they do not discuss it in any detail. In Section 4 the methodology is extended to a test for a shift in location, and the related diagnostic plots are presented. The detailed structure of some plots is described in Section 5 and the article concludes, in Section 6, with a discussion of some possible extensions of the methods.

2. GRAPHICAL DISPLAYS FOR REGRESSION DIAGNOSTICS

The data $Y$ are assumed to come from the multiple-regression model $E(Y) = X\beta$, where $X$ is the full-rank $n \times p$ matrix of explanatory variables. The errors are independent and have variance $\sigma^2$. If $\hat\beta$ is the least squares estimate of $\beta$, the predicted response at the $i$th data point is $\hat y_i = x_i^T \hat\beta$, which has variance

$$\mathrm{var}(\hat y_i) = \sigma^2 x_i^T (X^T X)^{-1} x_i = \sigma^2 h_i.$$

The $i$th unstandardized residual is $r_i = y_i - \hat y_i$ and

$$\mathrm{var}(r_i) = \sigma^2 (1 - h_i).$$
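As a hedged illustration of the quantities just defined, the following minimal sketch computes the leverages $h_i$ and the unstandardized residuals $r_i$ with NumPy. The data are synthetic and the function name `leverages_and_residuals` is hypothetical; neither comes from the article.

```python
import numpy as np


def leverages_and_residuals(X, y):
    """Return (h, r) for the least squares fit of y on X.

    h[i] = x_i^T (X^T X)^{-1} x_i is the leverage of observation i and
    r[i] = y_i - yhat_i is the unstandardized residual.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Thin QR factorization: the hat matrix is H = Q Q^T, so h is the
    # vector of squared row norms of Q. This avoids forming (X^T X)^{-1}.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q * Q, axis=1)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta_hat
    return h, r


# Synthetic example (an assumption, not data from the article):
# n = 20 observations, a constant and two explanatory variables, so p = 3.
rng = np.random.default_rng(0)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

h, r = leverages_and_residuals(X, y)
print("sum of leverages:", h.sum())      # the trace of the hat matrix is p
print("largest leverage:", h.max())
```

Since $\mathrm{var}(r_i) = \sigma^2(1 - h_i)$, points with $h_i$ near one have residuals with very small variance, which is why a residual plot alone may not reveal them.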


If $s^2$ is the residual mean square estimate of $\sigma^2$ the standardized residuals are

$$r_i' = r_i / \{s (1 - h_i)^{1/2}\},$$

which are identically distributed.

To find the effect of each observation in turn on the fitted model Hoaglin and Welsch (1978) suggest looking at the prediction at the $i$th observational point when $y_i$ is not used in fitting. This prediction is $\hat y_{(i)} = x_i^T \hat\beta_{(i)}$, where the subscripted $i$ in parentheses is to be read as "with observation $i$ deleted." The $t$ test for agreement between $y_i$ and $\hat y_{(i)}$ is the cross-validatory or jackknife residual $r_i^*$, which the results of Plackett (1950) show has the simple form

$$r_i^* = (y_i - \hat y_i) / \{s_{(i)} (1 - h_i)^{1/2}\}.$$

Atkinson (1981) stresses that the jackknife residuals are a monotone, but nonlinear, function of the standardized residuals, which, however, reflect large values more dramatically: as $r_i'^2 \to n - p$, $r_i^{*2} \to \infty$. If the purpose is graphical detection of an outlying $y$ value, the jackknife residuals are to be preferred. For suppose that observation $i$ has had an unknown amount $\Delta$ added to it. Then the jackknife residual $r_i^*$ will have a noncentral $t$ distribution. All other jackknife residuals will be shrunk due to overestimation of $\sigma$ caused by the outlying observation. This shrinkage will, however, affect all standardized residuals, since a common estimate of $\sigma$ is employed.

An outlying value of one or more explanatory variables may create a point of high influence for which $h_i$ is near one. Such a point may have a large effect on the fitted model, but, since the standardized residuals all have the same variance, a residual plot will not reveal such points. Hoaglin and Welsch (1978) and Obenchain (1977) recommend study of residuals and of the $h_i$ or of some function of them such as $h_i/(1 - h_i)$ in order to examine influential observations. Cook (1977) suggested the statistic

$$D_i = (\hat\beta_{(i)} - \hat\beta)^T X^T X (\hat\beta_{(i)} - \hat\beta) / (p s^2) = r_i'^2 h_i / \{p (1 - h_i)\}.$$

This statistic measures the effect on the parameter estimate of deleting the $i$th observation. A scaled version of $D_i$ is used by Atkinson (1981) to obtain a diagnostic plot. If $\sigma^2$ is estimated not by $s^2$ but by $s_{(i)}^2$, the resulting modified Cook statistic is

$$C_i = \left\{ \frac{n - p}{p} \, \frac{h_i}{1 - h_i} \right\}^{1/2} |r_i^*|.$$

For a D-optimum experimental design all observations have the same leverage and $h_i = p/n$. The effect of the scaling of $D_i$ is to make the plots of $C_i$ and $|r_i^*|$ identical for this most balanced case.

Nomenclature for these quantities is not standardized. The jackknife residuals $r_i^*$ are called RSTUDENT by Belsley, Kuh, and Welsch (1980, Ch. 2). Apart from the factor in $n$ and $p$, $C_i$ is identical to the quantity they call DFFITS$_i$. Both $r_i^*$ and signed values of $C_i$ can be plotted in any of the ways customary for residuals as described, for example, by Weisberg (1980, Ch. 7), or in serial order as does Pregibon (1981) for his logistic regression diagnostics. Atkinson (1981) uses half-normal plots of both $r_i^*$ and $C_i$ together with a form of Monte Carlo testing: with the matrix $X$ of explanatory variables fixed, 19 samples are simulated from the [...] and the envelope of ordered [...] is plotted in addition to the observed values. Because the null shape of the plot of the modified Cook statistics depends on the position of the observational points in $X$ space, that is, on the values of the $h_i$, the envelope makes possible the distinction between an influential point for which the $y$ and $x$ values agree and one for which they do not. Experience suggests that the disagreement at such a point of high influence is usually due to an incorrect $x$ value. The ability to detect such observations is an advantage of the plot of $C_i$ with a simulated envelope over the study of the components of $C_i$, namely $r_i^*$ and $h_i/(1 - h_i)$.

As an example of these techniques we look first at some data on the salinity of water used by Ruppert and Carroll (1980) to demonstrate [...]. There are 28 observations and three explanatory variables. Figure 1a shows the half-normal plot of the jackknife residuals $r_i^*$ when a first-order model is fitted. Apart from the fourth largest value, all observations lie within the envelope and there is no clear pattern. On the other hand, the plot of the modified Cook statistic $C_i$ shown in Figure 1b exhibits a clear pattern. The largest value of $C_i$, 10.2, belonging to observation 16, is well outside the envelope. As a result of this one large value nearly all the other values are shrunk below the envelope, owing to the overestimation of $\sigma^2$.

These figures clearly show there is something strange about observation 16 and suggest that the explanatory variables may be at fault. Inspection of the data shows that for observation 16 the third explanatory variable, water flow, has the value 33.443, whereas all other values lie in the range 20.769 to 29.895, with 26.417 the second highest value. One possibility is to "correct" 33.443 to 23.443, which has the effect of reducing the residual sum of squares from 42.5 to 26.3. The resulting half-normal plots of $r_i^*$ and $C_i$ in Figure 2a and 2b no longer show any unduly large values, although several of the small values of both quantities seem a little too small. We will return to the further analysis of these data in Section 3.

Figure 1. Salinity Data: Half-Normal Plots of (a) Jackknife Residuals and (b) Modified Cook Statistic. ×, Observations; -, Envelope From 19 Simulations. (Horizontal axis: normal scores.)

Figure 2. Salinity Data, Observation 16 "Corrected": Half-Normal Plots of (a) Jackknife Residuals and (b) Modified Cook Statistic. (Horizontal axis: normal scores.)

This example illustrates the usefulness of the plots in calling attention to features of the data that require further investigation. Other examples are given in Atkinson (1980). In all examples consideration is only given to the deletion of one observation at a time. Highly conditional tests for the sequential deletion of single observations are described by Dempster and Gasko-Green (1981). A full discussion of diagnostics resulting from deletion of groups of observations is given by Cook and Weisberg (1980), who also consider the resulting computational problems. Graphical methods for these procedures have not yet been developed.

3. POWER TRANSFORMATIONS AND DIAGNOSTIC PLOTS

The half-normal plots described in the previous section call attention to outlying values of response or explanatory variables. These may occur because the data are wrong or because the model is inadequate. One cause of apparent outliers is that the data are being analyzed in the wrong scale. For example, in an analysis of Brownlee's stack loss data Atkinson (1981) found that observation 21, which gave a value of $C_i$ lying outside the simulated envelope, could be reconciled with the body of the data by use of a log transformation. In this section we use the score test for power transformations, in combination with the diagnostic plots of Section 2, to exhibit the influence of individual observations on the evidence for a transformation.

The approach follows that of Box and Cox (1964) in considering the parametric family of power transformations

$$z^{(\lambda)} = \begin{cases} (y^\lambda - 1) / (\lambda \dot y^{\lambda - 1}) & (\lambda \ne 0) \\ \dot y \log y & (\lambda = 0), \end{cases} \qquad (1)$$

where $\dot y$ is the geometric mean of the $y$'s. It is assumed that, for some $\lambda$, the normal linear model holds to an adequate approximation. For known $\sigma^2$ the log-likelihood of the observations is proportional to minus the residual sum of squares

$$z^T \{I - X (X^T X)^{-1} X^T\} z = z^T A z.$$

When dependence of $z$ on $\lambda$ is important we shall write $z^{(\lambda)}$, but not otherwise.

A full likelihood analysis of the transformation requires calculation of the maximum likelihood estimate $\hat\lambda$. But, in practice, only a rather limited grid of $\lambda$ values is likely to be of interest. To determine the evidence for one of these transformations Atkinson (1973) suggested use of the score statistic $T_p(\lambda_0)$, which is based on the slope of the log-likelihood at the hypothesized value $\lambda_0$. If $w_p = \partial z / \partial \lambda$, the asymptotically normal test statistic is

$$T_p = - \frac{z^T A w_p}{s_w (w_p^T A w_p)^{1/2}}, \qquad (2)$$

where $s_w^2$ is an estimate of the variance of $z$ derived from the residual sum of squares after regression on both $x$ and $w$. The dependence on $\lambda_0$ of $T_p$ and $w_p$ in (2) has again been suppressed.

Comparison of the expression for $T_p$ with standard results in the analysis of covariance shows that $T_p$ can be regarded as the $t$ test for regression on a new explanatory variable $w_p$, adjusted for the explanatory variables $x$. Box (1980) calls $w_p$ a constructed variable. This interpretation, in terms of the univariate regression of the residuals $Az$ on the residual constructed variables $w_p^* = A w_p$, provides an analysis of the contribution of the individual observations to the score statistic. Box gives plots of residuals against residual constructed variables for two examples. Atkinson (1982) uses the diagnostic plots of Section 2 on the univariate regression and shows half-normal plots of the modified Cook statistic $C_i$ that provide an assessment of the influence of each observation on the transformation.

Because interest is in residuals from a linear model that, in all cases to be considered, contains a constant, relatively simple expressions can be obtained for $w_p$ without affecting the residual constructed variable $w_p^*(\lambda)$. For the hypothesis of no transformation, that is $\lambda_0 = 1$,

$$w_p(1) = y \{\log(y / \dot y) - 1\}.$$

Similarly for the hypothesis of a log transformation, that is $\lambda_0 = 0$, the constructed variable is

$$w_p(0) = \dot y \log y \, (\log y / 2 - \log \dot y).$$

Figure 3. Salinity Data, Observation 16 "Corrected": (a) Residuals and Residual Constructed Variable for Power Transformation $w_p^*(1)$; (b) Half-Normal Plot of Modified Cook Statistic for Residual Constructed Variable. (Horizontal axis of panel (b): normal scores.)

As an example we return to the analysis of the salinity data considered in Section 2. For the original set of data there is no evidence of the need for a transformation, with the score statistic at -.0844, close to zero. Once the data have been "corrected" the value increases in magnitude to -1.61. A plot of $w_p^*(1)$, the residual constructed variable for the transformation, is given in Figure 3a. This shows a cloud of points which plausibly have some regression with the exception of observation 3. This point has a $C_i$ value of 6.71. The half-normal plot of the $C_i$ for the constructed variable, shown in Figure 3b, reveals how influential this one observation is in denying the need for a transformation. If observation 3 is deleted the score statistic becomes -2.50. The maximum likelihood estimate of the transformation parameter is -.15 and a log transformation is in agreement with the data.
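The interpretation of the score statistic (2) as a $t$ test on a constructed variable can be sketched numerically. The following is a minimal illustration under stated assumptions: the data are synthetic (a lognormal response about a straight line, so that the log transformation is correct), the function names are hypothetical, and only the two constructed variables given above, for $\lambda_0 = 1$ and $\lambda_0 = 0$, are implemented.

```python
import numpy as np


def geometric_mean(y):
    return np.exp(np.mean(np.log(y)))


def z_power(y, lam):
    """Normalized power transformation, equation (1)."""
    gm = geometric_mean(y)
    if lam == 0:
        return gm * np.log(y)
    return (y**lam - 1.0) / (lam * gm ** (lam - 1.0))


def w_power(y, lam0):
    """Constructed variable w_p for lam0 = 1 or 0 (simplified forms that
    assume the linear model contains a constant)."""
    gm = geometric_mean(y)
    if lam0 == 1:
        return y * (np.log(y / gm) - 1.0)
    if lam0 == 0:
        return gm * np.log(y) * (np.log(y) / 2.0 - np.log(gm))
    raise ValueError("this sketch implements only lam0 = 0 and lam0 = 1")


def residualize(X, v):
    """Return A v, where A = I - X (X^T X)^{-1} X^T, via a thin QR."""
    Q, _ = np.linalg.qr(X)
    return v - Q @ (Q.T @ v)


def score_statistic(X, y, lam0):
    """Score statistic T_p(lam0) of equation (2)."""
    z_res = residualize(X, z_power(y, lam0))   # A z
    w_res = residualize(X, w_power(y, lam0))   # w_p^* = A w_p
    n, p = X.shape
    sww = w_res @ w_res                        # w^T A w
    szw = z_res @ w_res                        # z^T A w
    # Residual sum of squares after regression on both x and w.
    rss_xw = z_res @ z_res - szw**2 / sww
    s_w = np.sqrt(rss_xw / (n - p - 1))
    return -szw / (s_w * np.sqrt(sww))


# Synthetic data (an assumption, not from the article): log y is linear in x.
rng = np.random.default_rng(1)
n = 60
x = rng.uniform(0.0, 2.0, size=n)
X = np.column_stack([np.ones(n), x])
y = np.exp(1.0 + x + 0.1 * rng.normal(size=n))

print("T_p(1):", score_statistic(X, y, 1))   # tests "no transformation"
print("T_p(0):", score_statistic(X, y, 0))   # tests the log transformation
```

With data of this kind one would expect a large negative $T_p(1)$ and a much smaller $T_p(0)$. Plotting `residualize(X, z)` against `residualize(X, w)` gives the residual constructed variable plot to which the diagnostics of Section 2 are applied.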

Ruppert and Carroll do not give the units for the response, which is salinity. If the measurement is of concentration of salt, then a log transformation is not unreasonable. The numerical results leading to this model are summarized in Table 1. Half-normal plots of $C_i$ for the residual constructed variables for $\lambda_0 = 1$ and 0 are shown as Figure 4. By comparison with Figure 3b these show that evidence for the log transformation is not being particularly influenced by any one observation.

Table 1. Salinity Data From Ruppert and Carroll (1980): Evidence of the Need for a Transformation

Data                           $\lambda$    Score statistic $T_p$    Residual sum of squares of $z^{(\lambda)}$
Original data                    1            -0.08                    42.47
Observation 16 "corrected"       1            -1.61                    26.24
Observation 16 "corrected",      1            -2.50                    25.56
  observation 3 deleted          0            -0.35                    20.60
                                -0.15          0                       20.50

Figure 4. Salinity Data, Observation 16 "Corrected", Observation 3 Deleted: Half-Normal Plots of Modified Cook Statistics for Residual Constructed Variables; (a) No Transformation $w_p^*(1)$ and (b) Log Transformation $w_p^*(0)$. (Horizontal axis: normal scores.)

The purpose of this example is to show how diagnostic plots and quantities can be used to examine the effect of individual observations on the evidence for a transformation. Decisions such as those to omit observation 3 and to "correct" observation 16 should ideally be made in collaboration with those who collected and understand the data. In this connection, Professor Carroll tells me that it was his custom when analyzing these data to reduce large values of water flow to 26, which is close to the correction suggested in Section 2, without any knowledge of the data.

4. POWER TRANSFORMATION WITH SHIFTED LOCATION

Sometimes a transformation may be appropriate only after a constant has been added to all observations. The analog of the power transformation (1) is

$$z^{(\lambda)} = \begin{cases} \{(y + \mu)^\lambda - 1\} / [\lambda \{gm(y + \mu)\}^{\lambda - 1}] & (\lambda \ne 0) \\ gm(y + \mu) \log(y + \mu) & (\lambda = 0), \end{cases} \qquad (3)$$

where $gm(y + \mu)$ is the geometric mean of $y + \mu$. Although this transformation is presented by Box and Cox, they do not apply it in detail to any examples. In this section we develop a constructed variable for the shifted transformation (3) analogous to that of the previous section and investigate some properties of the transformation.

A difficulty with the transformation (3) is that as the value of $\mu$ approaches minus the minimum value of $y$, there will be a value of $\lambda < 1$ for which the residual sum of squares of $z^{(\lambda)}$ goes to zero. An example of the way in which simultaneous estimation of $\lambda$ and $\mu$ can lead to this minimizing value is given at the end of this section. The problem is familiar in estimation for the three-parameter lognormal distribution (Cheng and Amin 1981; Voorn 1981). It also occurs in the analysis of survival-time experiments where there may be a latent period before the toxic dose becomes effective (Peto and Lee 1973). Although, for the power transformation with shifted location, joint estimation of $\lambda$ and $\mu$ may be problematic, an estimate for one of the parameters can readily be found if the value of the other is assumed known. In the absence of prior knowledge of $\mu$, we shall evaluate the transformation on a prespecified grid of $\lambda$ values.
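The difficulty with joint estimation described above can be seen in a small numerical sketch. The code below is illustrative only: the responses are invented, the function names are hypothetical, and the model is an arbitrary straight line in the observation index. It evaluates the shifted transformation (3) with $\mu = -(\min y - \epsilon)$ and a fixed $\lambda < 1$, and prints the residual sum of squares as $\epsilon$ shrinks.

```python
import numpy as np


def z_shifted(y, lam, mu):
    """Shifted power transformation, equation (3)."""
    u = y + mu
    gm = np.exp(np.mean(np.log(u)))          # geometric mean of y + mu
    if lam == 0:
        return gm * np.log(u)
    return (u**lam - 1.0) / (lam * gm ** (lam - 1.0))


def residual_sum_of_squares(X, z):
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ coef
    return float(resid @ resid)


# Invented responses (an assumption, not the data analyzed in the article).
y = np.array([2.0, 10.0, 15.0, 22.0, 30.0, 47.0, 60.0, 85.0, 120.0, 200.0, 420.0])
X = np.column_stack([np.ones(y.size), np.arange(y.size, dtype=float)])

lam = 0.25
epsilons = (1e-2, 1e-6, 1e-12)
values = []
for eps in epsilons:
    mu = -(y.min() - eps)                    # shift approaching -min(y)
    values.append(residual_sum_of_squares(X, z_shifted(y, lam, mu)))
    print(f"eps = {eps:g}: RSS = {values[-1]:.4g}")
```

The mechanism is that the geometric mean of $y + \mu$ tends to zero as one shifted observation tends to zero, and for $\lambda < 1$ every transformed value carries the factor $\{gm(y + \mu)\}^{1 - \lambda}$. The whole transformed response is therefore rescaled toward zero, which is why the residual sum of squares alone cannot be used to choose $\lambda$ and $\mu$ jointly.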

With two parameters there are two constructed variables and two score statistics corresponding to univariate regression on each variable. Because the score statistics are proportional to the rate of change of the residual sum of squares of $z^{(\lambda)}$ as one parameter varies with the other fixed, the problem with joint estimation of $\lambda$ and $\mu$ does not arise. A two-variable regression on both residual constructed variables is also possible, leading to an approximate $F$ test of $\lambda$ and $\mu$ with two degrees of freedom for the numerator. In the examples considered here this simultaneous analysis does not prove helpful. The modified Cook statistic for the bivariate regression likewise adds nothing to the results of the univariate analyses. In some cases, one of which is analyzed more fully in Section 5, this is because the score statistics for the two parameters are highly correlated. On the other hand, in the example at the end of this section, the $F$ test for the bivariate regression is much greater than would be expected from the values of the individual score statistics, perhaps reflecting the problem with joint estimation of $\lambda$ and $\mu$.

The score statistic for testing hypotheses about the value of $\mu$ is of the same form as (2) but with the constructed variable replaced by $w_s = \partial z / \partial \mu$. We call this score statistic $T_s(\lambda, \mu)$ and, where necessary, extend the notation of Section 3 to $T_p(\lambda, \mu)$ with the convention that $T_p(\lambda, 0) = T_p(\lambda)$.

Usually we are interested in the hypothesis $\mu = 0$, which was assumed to hold in the earlier examples. Then the constructed variable is

$$w_s(\lambda) = \left. \frac{\partial z^{(\lambda)}}{\partial \mu} \right|_{\mu = 0} = \left( \frac{y}{\dot y} \right)^{\lambda - 1} - \frac{\lambda - 1}{hm(y)} z^{(\lambda)}, \qquad (4)$$

where $hm(y)$ is the harmonic mean of $y$. It is interesting to note that the constructed variable for the power transformation depends on the geometric mean of the observations, whereas the variable for a shift in location introduces a second summary statistic, the harmonic mean.

Two special cases of $w_s(\lambda)$ are of particular interest. If $\lambda = 1$, $w_s(1) = 1$. Thus regression on the constructed variable provides no information about the need for a shift in location in an untransformed model that includes a constant. This reflects the fact that, if the model contains a constant, $\mu$ is not estimable and the likelihood is constant, independent of $\mu$. However, the shift in location is estimable for other transformations. In particular, for the log transformation,

$$w_s(0) = \dot y / y + \dot y \log y / hm(y). \qquad (5)$$

To calibrate the behavior of the test for a change in location we first consider an example with simulated data before looking at a second real example. The simulated data, taken from Atkinson (1981), are the results of a $2^4$ experiment with one center point. The observations were generated by adding pseudorandom normal deviates to a first-order model. For this article we exponentiate these results, thus producing a data set for which it is correct to take logs, that is, the true value of $\lambda$ is zero. For these data the score statistic for no power transformation, $T_p(1)$, has the value -11.8, so that there is overwhelming evidence of the need for a transformation. The maximum likelihood estimate of $\lambda$ is .077, indicating a log transformation. At $\lambda = 0$, $T_p = .71$ and $T_s(0, 0) = .50$. There is thus no evidence that $\lambda$ and $\mu$ are not both zero. The $F$ test for both together is also not significant with a value of .35.

This example shows the tests behaving as required in the null case, that is, when no further transformation is needed. The analysis was repeated with 30 added to each response after exponentiation. The correct transformation is thus to $\log(y - 30)$. The resulting test statistics are shown in Table 2. Evidence of the need for a transformation is still strong, but it is not clear what the transformation should be. Both the log transformation and the hypothesis $\mu = 0$ are rejected by the data, with similar values of the test statistics. Evidence that a shift in location is needed is shown, by Figures 5a and 5b, to be spread throughout the data. The plot for $w_p^*$ is similar. Confronted with this contradictory evidence, one possibility is to keep $\mu = 0$ and to try other values of $\lambda$. The maximum likelihood estimate is $\hat\lambda = -2.09$, an unlikely value, with a residual sum of squares of $z^{(\lambda)}$ equal to 1296.1. Alternatively, if $\lambda$ is kept at zero, the log transformation leads to $\hat\mu = -29.8$ with a residual sum of squares of 900.6.

Table 2. Lognormal Data With Shifted Location: Evidence of the Need for Transformations

$\lambda$    $\mu$       Hypothesis       Score statistic    Residual sum of squares of $z^{(\lambda)}$
 1           -           $\lambda = 1$    $T_p = -8.22$      6654.8
 0           0           $\lambda = 0$    $T_p = -4.65$      2654.2
 0           0           $\mu = 0$        $T_s = -5.44$      2654.2
-2.09*       0                                               1296.1
 0          -29.79+                                           900.6

* : $\lambda$ given $\mu = 0$. + : $\mu$ given $\lambda = 0$.

The procedures developed in this section thus lead to recovery of the correct model in this synthetic example. Do examples really occur in which a transformation should be accompanied by a shift in location? To investigate this we look at a set of data given by Brown and Hollander (1977, p. 257). These are a two-way layout of the time, in minutes, for four

chimpanzees to learn each of 10 words. The response ranges from 2 to 420 minutes. Brown and Hollander give an analysis of variance of the original observations. McCullagh (1980) criticizes this analysis and compares a generalized linear model with an analysis of variance of the logged response. Here we consider only an analysis of transformations.

Figure 5. Simulated Factorial Experiment: (a) Residual $z^{(0)}$ and Residual Constructed Variable for a Shift in Location on the Log Scale $w_s^*(0)$ and (b) Half-Normal Plot of Modified Cook Statistic for Constructed Variable.

Figure 6. Chimpanzee Data: (a) Residuals and Residual Constructed Variable for Power Transformation $w_p^*(1)$ and (b) Half-Normal Plot of Modified Cook Statistic for Residual Constructed Variable.

Table 3. Chimpanzee Data From Brown and Hollander (1977): Evidence of the Need for Transformations

Data                     $\lambda$   $\mu$     Hypothesis             Score statistic            Residual sum of squares of $z^{(\lambda)}$
All observations          1          0         $\lambda = 1$          $T_p = -13.21$             189,079
                          0          0         $\lambda = 0$          $T_p = 1.18$                44,985
                          0          0         $\mu = 0$              $T_s = 1.03$                44,985
Observation 8 deleted     1          0         $\lambda = 1$          $T_p = -12.65$             187,076
                          0          0         $\lambda = 0$          $T_p = -0.50$               41,248
                          0          0         $\mu = 0$              $T_s = -2.64$               41,248
                          0          0         $\lambda = \mu = 0$    $F_{\lambda\mu} = 31.81$    41,248
                          0         -8.59+     -                                                  34,507

+ : $\mu$ given $\lambda = 0$.

The numerical results are summarized in Table 3. It is clear that the data should be transformed, with a value of -13.2 for $T_p(1)$. The plot of residual constructed variables $w_p^*(1)$ in Figure 6a shows this strong relationship. The related plot of the modified Cook statistic $C_i$ in Figure 6b shows that the evidence for this transformation is spread throughout the data. The maximum likelihood estimate of $\lambda$ is .095 and the

two score statistics for $\lambda = 0$ and $\mu = 0$ are both near 1. On the evidence of the score statistics alone there is no reason to believe that the simple log transformation is not adequate. However, the maximum value of $C_i$ for the constructed variable $w_s^*(0, 0)$ is 14.8. The plot of $w_s^*(0, 0)$ shown in Figure 7a exhibits a cluster of points from which observation 8 is clearly distanced. The half-normal plot of the derived value of $C_i$ in Figure 7b emphasizes the effect this one observation is having. It is, in fact, nullifying the evidence from the other observations that a shift in location is required.

Figure 7. Chimpanzee Data: (a) Residual $z^{(0)}$ and Residual Constructed Variable for a Shift in Location on the Log Scale $w_s^*(0)$ and (b) Half-Normal Plot of Modified Cook Statistic for Residual Constructed Variable.

Observation 8 is the smallest observation, equal to 2, with the next smallest observation equal to 10. It will therefore clearly be highly influential in the choice of a value for $\mu$. If this observation is deleted, the effect is appreciable. The evidence for a log transformation remains strong, but now there is evidence, reflected in a value of -2.64 for the score statistic $T_s(0, 0)$, that $\mu$ is not zero. The plot of the residual constructed variable, Figure 8a, and of the associated half-normal plot of the modified Cook statistic, Figure 8b, show that this evidence is spread throughout the data. The maximum likelihood estimate of $\mu$ is -8.59, for which the residual sum of squares of the $z^{(\lambda)}$ is 34,507, compared to 41,248 when both $\mu$ and $\lambda$ are zero.

Figure 8. Chimpanzee Data, Observation 8 Deleted: (a) Residual $z^{(0)}$ and Residual Constructed Variable for a Shift in Location on the Log Scale $w_s^*(0)$ and (b) Half-Normal Plot of the Modified Cook Statistic for Residual Constructed Variable.

There are two points that are raised by this analysis. One is that a model in which the data are logged after 8.59 has been subtracted seems, a priori, unlikely. It may be an indication that the generalized linear model fitted by McCullagh is more appropriate. However, the results of Atkinson (1982) suggest that it may

not be possible to discriminate between these two models for the particular parameter values and sample size of these data. The approach of Prentice (1974), in which both models are special cases of a three-parameter model, could be used to investigate this matter further.

A second point concerns the large $F$ value in Table 3 for regression on both constructed variables. This may reflect the difficulty in simultaneous estimation of $\lambda$ and $\mu$ discussed earlier. To illustrate this point Table 4 gives some values of the residual sum of squares of $z^{(\lambda)}$ as $\mu$ approaches -10. If we put $\mu = -(10 - \epsilon)$, for $\epsilon = 10^{-12}$ and $\lambda = .25$, the residual sum of squares is 1732.2, about one-twentieth of the smallest values given in Table 3. These results indicate that by suitable choice of $\lambda$ as $\epsilon \to 0$ the residual sum of squares can be made as small as desired, subject to the limitations of computer word-length.

Table 4. Chimpanzee Data From Brown and Hollander With Observation 8 Deleted. Residual Sum of Squares of $z^{(\lambda)}$ as $\mu$ Approaches Minus the Smallest Observation; $\mu = -(10 - \epsilon)$

                         $\epsilon$
$\lambda$     $10^{-8}$     $10^{-10}$     $10^{-12}$
0.2           5321.7        3049.7         1742.1
0.25          4981.9        2936.5         1732.2
0.3           5213.0        3177.0         1939.9

5. DETAILED STRUCTURE OF PLOTS OF CONSTRUCTED VARIABLES

The plots of residuals against residual constructed variables shown in this article have tended to look like ordinary regression scatter plots. The appearance of a point separated from the general scatter, combined with the half-normal plot of the modified Cook statistic, has been used to identify influential observations. But the plots need not look at all random. Figure 9a shows the plot of $w_p^*(0)$ for the log of a pseudo-lognormal sample. Figure 9b shows the plot for $w_s^*(0, 0)$. These two plots are clearly parabolic, although the sample has been correctly transformed and the score statistics indicate no significant departures from the models.

To understand the structure it is convenient to work in terms of $z^{(0)} = z = \dot y \log y$. Figure 9a is a plot of residual $z$ against $w_p^*$ for a simple sample. From the results of Section 3 we can rewrite the constructed variable as

$$w_p(0) = z (z / 2\dot y - \log \dot y), \qquad (6)$$

a quadratic in $z$. Similarly the constructed variable for a shift in location (5) can be written as

$$w_s(0) = \dot y \exp(-z / \dot y) + z / hm(y).$$

Expansion of $\exp(-z / \dot y)$ in a Taylor's series yields

$$w_s(0) = \dot y + z \{1 / hm(y) - 1\} + z^2 / 2\dot y + \cdots. \qquad (7)$$

As the plots are of residuals, the constant term in (7) is irrelevant. Comparison of (6) and (7) shows that both will give parabolic plots against $z$ with the same quadratic term. In the null case, that is, in the absence of any effects, there is no linear regression of $z$ on the constructed variables and the plots will be dominated by the quadratic terms. To the extent that the Taylor's series expansion yielding (7) holds, these parabolas will be the same. Figure 9 shows how similar the parabolas are for the present example.

Figure 9. Simulated Lognormal Sample: Residual $z^{(0)}$ and Residual Constructed Variables on the Log Scale (a) for Power Transformation $w_p^*(0)$ and (b) for Shift in Location $w_s^*(0)$.
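The shared quadratic term in (6) and (7) can be checked numerically. The sketch below is an illustration under stated assumptions: the pseudo-lognormal sample is generated here (it is not the sample behind Figure 9), its log-scale spread is chosen small so that the Taylor expansion is reasonable, and all variable names are hypothetical.

```python
import numpy as np

# Hypothetical pseudo-lognormal sample; log y has standard deviation 0.2.
rng = np.random.default_rng(3)
y = np.exp(rng.normal(0.0, 0.2, size=50))

gm = np.exp(np.mean(np.log(y)))      # geometric mean of y
hm = 1.0 / np.mean(1.0 / y)          # harmonic mean of y
z = gm * np.log(y)                   # z = z^(0)

# Constructed variable for the power transformation on the log scale, (6).
w_p = z * (z / (2.0 * gm) - np.log(gm))

# Constructed variable for a shift in location on the log scale:
# the form (5), and the same quantity rewritten in terms of z.
w_s_from_y = gm / y + gm * np.log(y) / hm
w_s = gm * np.exp(-z / gm) + z / hm

# Least squares fit of each constructed variable on (1, z, z^2);
# the last coefficient is the curvature of the parabola.
V = np.column_stack([np.ones_like(z), z, z**2])
quad_p = np.linalg.lstsq(V, w_p, rcond=None)[0][2]
quad_s = np.linalg.lstsq(V, w_s, rcond=None)[0][2]

print("1 / (2 * gm)        :", 1.0 / (2.0 * gm))
print("quadratic term, w_p :", quad_p)
print("quadratic term, w_s :", quad_s)
```

For $w_p(0)$ the quadratic coefficient equals $1/2\dot y$ exactly, because (6) is a quadratic in $z$. For $w_s(0)$ it agrees only approximately, reflecting the higher-order terms omitted from (7). Since the constructed variables are residualized before plotting, the constant terms and, in the null case, the linear terms play no part, leaving the common parabola.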


The parabolic structure is obscured when the plots are of residuals from more complicated linear models. The plot for a two-sample model consists of the superposition of two parabolas, with differing orientations and locations. In more complicated examples, as we have seen, the structure is not visible except as a slight curvature in some plots. Thus nonrandomness in the plots may not be evidence of departure from the model and transformation, but may be an artifact of the constructed variables. This analysis also shows the strong relationship between evidence for a power transformation other than the logarithmic and evidence for a log transformation with shifted location. The close relationship between these two was apparent in the analysis of the chimpanzee data.

6. DISCUSSION

This article describes graphical tools for determining the influence of individual observations on the power transformation with, and without, a shift in location. Many extensions of these techniques are possible. Atkinson (1982) describes methods for a more general family of transformations and gives an example of application to data that arise as percentages. He also discusses the use of the constructed variable to give a quick estimate of the transformation parameter and the relationship between the constructed variable for the power transformation, the exact test for transformations given by Andrews (1971), and Tukey's one degree of freedom for nonadditivity (Tukey 1949). A rather different extension is to the test for the link function in a generalized linear model (Pregibon 1980).

Diagnostic methods for logistic regression are given by Pregibon (1981). These techniques, like those described in this article, are concerned with the effect of the deletion of single observations. The advantages of considering the effect of pairs and larger groups of observations, which were mentioned in Section 2, have been stressed by Andrews and Pregibon (1978). The relationship of their work to that of Cook and Weisberg (1980) is discussed by Draper and John (1981).

The publication, too late for the revision of this paper, of Cook and Weisberg (1982) unifies many results on diagnostic regression analysis and provides much new material. In their Section 2.2.2 the standardized residuals $r_i'$ are called (internally) Studentized to distinguish them from the jackknife residuals $r_i^*$, which they call externally Studentized. The scaled beta distribution of the $r_i'$ is derived on page 19.

In addition to graphical methods Cook and Weisberg describe, in Section 2.2.2, the use of Bonferroni bounds for testing the maximum value of $r_i^*$. A referee reports that, for the residuals of the salinity data plotted in Fig. 1a, the significance level of the bound for $r_{16}^* = \max r_i^*$ is about .02. The nongraphical procedure therefore indicates disagreement between the model and the data.

[Received November 1981. Revised June 1982.]

REFERENCES

ANDREWS, D. F. (1971), “A Note on the Selection of Data Transformations,” Biometrika, 58, 249-254.
ANDREWS, D. F., and PREGIBON, D. (1978), “Finding the Outliers That Matter,” Journal of the Royal Statistical Society, Ser. B, 40, 85-93.
ATKINSON, A. C. (1973), “Testing Transformations to Normality,” Journal of the Royal Statistical Society, Ser. B, 35, 473-479.
ATKINSON, A. C. (1980), “Examples Showing the Use of Two Graphical Displays for the Detection of Influential and Outlying Observations in Regression,” in COMPSTAT 80, eds. M. M. Barritt and D. Wishart, Vienna: Physica Verlag, 276-282.
ATKINSON, A. C. (1981), “Two Graphical Displays for Outlying and Influential Observations in Regression,” Biometrika, 68, 13-20.
ATKINSON, A. C. (1982), “Regression Diagnostics, Transformations and Constructed Variables” (with discussion), Journal of the Royal Statistical Society, Ser. B, 44, 1-36.
BELSLEY, D. A., KUH, E., and WELSCH, R. E. (1980), Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, New York: John Wiley.
BOX, G. E. P. (1980), “Sampling and Bayes’ Inference in Scientific Modelling and Robustness” (with discussion), Journal of the Royal Statistical Society, Ser. A, 143, 383-430.
BOX, G. E. P., and COX, D. R. (1964), “An Analysis of Transformations” (with discussion), Journal of the Royal Statistical Society, Ser. B, 26, 211-252.
BROWN, B. W., and HOLLANDER, M. (1977), Statistics: A Biomedical Introduction, New York: John Wiley.
CHENG, R. C. H., and AMIN, N. A. K. (1981), “Maximum Likelihood Estimation of Parameters in the Inverse Gaussian Distribution, With Unknown Origin,” Technometrics, 23, 257-263.
COOK, R. D. (1977), “Detection of Influential Observations in Linear Regression,” Technometrics, 19, 15-18.
COOK, R. D., and WEISBERG, S. (1980), “Characterizations of an Empirical Influence Function for Detecting Influential Cases in Regression,” Technometrics, 22, 495-508.
COOK, R. D., and WEISBERG, S. (1982), Residuals and Influence in Regression, New York and London: Chapman and Hall.
DEMPSTER, A. P., and GASKO-GREEN, M. (1981), “New Tools for Residual Analysis,” Annals of Statistics, 9, 945-959.
DRAPER, N. R., and JOHN, J. A. (1981), “Influential Observations and Outliers in Regression,” Technometrics, 23, 21-26.
HOAGLIN, D. C., and WELSCH, R. E. (1978), “The Hat Matrix in Regression and ANOVA,” The American Statistician, 32, 17-22.
McCULLAGH, P. (1980), “A Comparison of Transformations of Chimpanzee Learning Data,” GLIM Newsletter No. 2, 14-18.
OBENCHAIN, R. L. (1977), Letter to the editor, Technometrics, 19, 348-349.
PETO, R., and LEE, P. (1973), “Weibull Distributions for Continuous-Carcinogenesis Experiments,” Biometrics, 29, 457-470.
PLACKETT, R. L. (1950), “Some Theorems in Least Squares,” Biometrika, 37, 149-157.
PREGIBON, D. (1980), “Goodness of Link Tests for Generalized Linear Models,” Applied Statistics, 29, 15-23, 14.


PREGIBON, D. (1981), “Logistic Regression Diagnostics,” Annals of Statistics, 9, 705-724.
PRENTICE, R. L. (1974), “A Log Gamma Model and Its Maximum Likelihood Estimation,” Biometrika, 61, 539-544.
RUPPERT, D., and CARROLL, R. J. (1980), “Trimmed Least Squares Estimation in the Linear Model,” Journal of the American Statistical Association, 75, 828-838.
TUKEY, J. W. (1949), “One Degree of Freedom for Non-Additivity,” Biometrics, 5, 232-242.
VOORN, W. J. (1981), “A Class of Variate Transformations Causing Unbounded Likelihood,” Journal of the American Statistical Association, 76, 709-712.
WEISBERG, S. (1980), Applied Linear Regression, New York: John Wiley.
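
The distinction drawn in Section 6 between internally and externally Studentized residuals, and the Bonferroni bound on the largest of the latter, can be illustrated numerically. The following is only a minimal sketch under stated assumptions: the model is a straight line with p = 2 parameters, the ten data values are invented for illustration (they are not the salinity data), and the function names `fit_line`, `studentized`, `t_upper_tail`, and `bonferroni_outlier_p` are hypothetical. The t tail area is obtained by a crude Simpson quadrature, which is adequate for illustration but not for accurate far-tail probabilities.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns residuals and leverages h_i."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Diagonal of the hat matrix for a straight-line model.
    lever = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    return resid, lever


def studentized(resid, lever, p):
    """Internally (r') and externally (r*) Studentized residuals."""
    n = len(resid)
    s2 = sum(e * e for e in resid) / (n - p)
    internal = [e / math.sqrt(s2 * (1.0 - h)) for e, h in zip(resid, lever)]
    # Standard identity linking the two forms; no refitting is needed.
    external = [r * math.sqrt((n - p - 1) / (n - p - r * r)) for r in internal]
    return internal, external


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t, t >= 0, by Simpson's rule (illustrative only)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    c /= math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1.0 + u * u / df) ** (-(df + 1) / 2)

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return max(0.0, 0.5 - total * h / 3.0)


def bonferroni_outlier_p(external, p):
    """Bonferroni bound on the significance level of max |r*_i|."""
    n = len(external)
    largest = max(abs(r) for r in external)
    # Each r*_i has a t distribution on n - p - 1 degrees of freedom.
    return min(1.0, n * 2.0 * t_upper_tail(largest, n - p - 1))


# Invented data: the eighth response is deliberately out of line.
x = [float(i) for i in range(1, 11)]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 20.0, 18.1, 19.9]
resid, lever = fit_line(x, y)
internal, external = studentized(resid, lever, p=2)
print("largest |r*| at observation",
      1 + max(range(len(x)), key=lambda i: abs(external[i])))
print("Bonferroni bound:", bonferroni_outlier_p(external, p=2))
```

The external form is computed from the internal one by the identity in `studentized`, so the sketch avoids refitting the model with each observation deleted; a small bound from `bonferroni_outlier_p` corresponds to the kind of disagreement between model and data that the referee's calculation reports.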
