
MAS/MASOR/MASSM - Statistical Analysis - Autumn Term

9 Polynomial regression

9.1 Example

An experiment in chemical engineering was carried out to obtain information about the effect of temperature on a glass laminate. The following table gives values of a quantity known as “log damping decrement” (y) for ten controlled temperatures (x) in degrees Celsius.

   x      y
 -30   0.053
 -20   0.057
 -10   0.061
   0   0.068
  10   0.072
  20   0.081
  30   0.093
  40   0.105
  50   0.115
  60   0.130

The following output shows the construction of the data frame that contains the data. The plot of the data suggests that there is a curvilinear relationship between y and x, which could be modelled by a polynomial curve. We begin by fitting a straight line to the data. Note that the function coef provides us with estimated regression coefficients to more decimal places than does the function summary. The fitted line,

y = 0.070745455 + 0.000850303x,

explains almost 96% of the variation in the response variable, which looks very impressive, and the F-statistic for the regression is highly significant. However, to check the adequacy of the model, we should inspect the residuals, which we obtain using the function residuals and here store as the vector res1. A plot of the residuals against x exhibits a very strong pattern, which shows that the straight line model is inadequate and indicates that we might fit a quadratic to the data.

Figure 1: Plot of y vs. x

> x <- seq(-30,60,by=10)
> y <- c(0.053,0.057,0.061,0.068,0.072,0.081,0.093,0.105,0.115,0.130)
> glass <- data.frame(y,x)
> plot(x,y)
> glass1.lm <- lm(y~x, data = glass)
> summary(glass1.lm)

Call: lm(formula = y ~ x, data = glass)

Residuals:
       Min        1Q    Median        3Q       Max
 -0.007248 -0.003127   -0.0005   0.00288  0.008236

Coefficients:
                 Value Std. Error t value Pr(>|t|)
(Intercept)     0.0707     0.0020 34.8099   0.0000
          x     0.0009     0.0001 13.5573   0.0000

Residual standard error: 0.005697 on 8 degrees of freedom
Multiple R-Squared: 0.9583
F-statistic: 183.8 on 1 and 8 degrees of freedom, the p-value is 8.418e-007

Correlation of Coefficients:
  (Intercept)
x     -0.4629

> coef(glass1.lm)
(Intercept)           x
 0.07074545 0.000850303
> res1 <- residuals(glass1.lm)
> plot(x,res1)

Figure 2: Plot of res1 vs. x

The general situation which we consider is the one in which there are n observed pairs of values (xi, yi), i=1, . . . , n, of the variables x, y, and we wish to fit a k-th order polynomial relationship of the form

y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_k x_i^k + \epsilon_i,    i = 1, \ldots, n,    (1)

for some k with k < n, where the errors \epsilon_i are assumed to be NID(0, \sigma^2). This is a special case of the multiple linear regression model in which the regressor variables are x, x^2, \ldots, x^k.

• Although we are fitting a non-linear relationship, the model (1) is linear in the parameters \beta_0, \beta_1, \beta_2, \ldots, \beta_k. The design matrix X corresponding to the above model is a special case of the one for multiple linear regression, with

X = \begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^k \\ 1 & x_2 & x_2^2 & \cdots & x_2^k \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^k \end{pmatrix}.

In S-PLUS, given the variables x and y, we can define new variables x^2, \ldots, x^k and then carry out the multiple regression of y on x, x^2, \ldots, x^m for values of m with m ≤ k, thereby fitting a polynomial of order m, i.e., of order k or less.
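To make the connection with the design matrix above concrete, the least squares estimates can be computed directly from X. The following is a minimal sketch in R/S-style syntax, assuming the x and y vectors of the example in Section 9.1; lm performs this computation internally, so the sketch is only a check on the algebra.

# Sketch: explicit design matrices for the straight-line and quadratic fits
x <- seq(-30, 60, by = 10)
y <- c(0.053,0.057,0.061,0.068,0.072,0.081,0.093,0.105,0.115,0.130)
X1 <- cbind(1, x)                      # design matrix for k = 1
solve(t(X1) %*% X1, t(X1) %*% y)       # reproduces coef(glass1.lm)
X2 <- cbind(1, x, x^2)                 # adding the column of squares gives k = 2
solve(t(X2) %*% X2, t(X2) %*% y)       # least squares estimates (X'X)^{-1} X'y for the quadratic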

What degree of polynomial should we fit? In the following output for our example, the variables x2, x3 and x4 are constructed and the data frame is extended to include the newly constructed variables. Next we consider a multiple linear regression of y on x and x2.

> x2 <- x*x
> x3 <- x2*x
> x4 <- x3*x
> glass <- data.frame(y,x,x2,x3,x4)
> rm(x,x2,x3,x4,y)
> attach(glass)
> glass2.lm <- lm(y~x+x2, data = glass)

> summary(glass2.lm)

Call: lm(formula = y ~ x + x2, data = glass)

Residuals:
       Min         1Q     Median         3Q       Max
 -0.001764 -0.0008682 0.00006894  0.0007739  0.001614

Coefficients:
                 Value Std. Error  t value Pr(>|t|)
(Intercept)     0.0666     0.0006 117.9214   0.0000
          x     0.0006     0.0000  29.5321   0.0000
         x2     0.0000     0.0000  12.3261   0.0000

Residual standard error: 0.001278 on 7 degrees of freedom
Multiple R-Squared: 0.9982
F-statistic: 1902 on 2 and 7 degrees of freedom, the p-value is 2.657e-010

Correlation of Coefficients:
   (Intercept)       x
 x      0.2107
x2     -0.5906 -0.7645

> coef(glass2.lm)
(Intercept)            x            x2
 0.06663182 0.0006446212 6.856061e-006
> res2 <- residuals(glass2.lm)
> plot(x,res2)

Figure 3: Plot of res2 vs. x

Note that the values of b0 and b1 have changed from what they were for the straight line fit. We now obtain the fitted equation

y = 0.06663182 + 0.0006446212x + 0.000006856061x^2.

The introduction of the variable x2 gives a significant improvement in the fit. Note firstly the increase from the previous regression in the value of R^2 and the dramatic increase in the value of the F-statistic in the ANOVA.

However, more fundamentally, the t-statistic t = 12.3261 for x2 is highly significant, which shows that the inclusion of the variable x2 achieves a highly significant improvement in fit. An apparently random scatter of points in the plot of the residuals (res2) against x indicates that we now have an adequate model. Nevertheless, we try a further regression, introducing a cubic term x3, to see if we can improve even further on the fit of the model.
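The same conclusion can be reached through an extra-sum-of-squares (partial F) comparison of the straight-line and quadratic fits. The following sketch assumes R-style behaviour of anova when given two nested lm objects (glass1.lm and glass2.lm from the output above); the resulting F statistic equals the square of the t-statistic for x2.

# Sketch: partial F test for the addition of x2 to the straight-line model
anova(glass1.lm, glass2.lm)   # F for the extra term; equal to 12.3261^2, about 151.9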

> glass3.lm <- lm(y~x+x2+x3, data = glass)
> summary(glass3.lm)

Call: lm(formula = y ~ x + x2 + x3, data = glass)

Residuals:
       Min         1Q     Median         3Q      Max
 -0.001762 -0.0008706 0.00007343  0.0007716  0.00161

Coefficients:
                 Value Std. Error t value Pr(>|t|)
(Intercept)     0.0666     0.0008 87.0416   0.0000
          x     0.0006     0.0000 21.0199   0.0000
         x2     0.0000     0.0000  5.4096   0.0016
         x3     0.0000     0.0000 -0.0078   0.9940

Residual standard error: 0.001381 on 6 degrees of freedom
Multiple R-Squared: 0.9982
F-statistic: 1087 on 3 and 6 degrees of freedom, the p-value is 1.355e-008

Correlation of Coefficients:
   (Intercept)       x      x2
 x     -0.2570
x2     -0.7546  0.2853
x3      0.6036 -0.6397 -0.8808

The introduction of the additional regressor variable x3 does not result in a significant improvement in the fit. The value of the t-statistic for x3 is t = -0.0078, which is certainly not significant (p = 0.9940). In fact the value of s^2 = MS_R has increased (s = 0.001381), so that the introduction of the x3 term into the model has been counter-productive.

In looking for an appropriate degree of polynomial to fit, we go through a stepwise procedure of a special kind, where at each stage only the next power term is considered. If we include the regressor variable x^m in the model then it is natural to include all lower powers of x as well. From the following output, which uses the function cor for correlation, we see that the regressor variables x2, x3 and x4 are particularly highly correlated with each other. This illustrates that problems of multicollinearity will arise and become ever more serious as further powers of x are introduced into the regression.

> cor(glass)
           y         x        x2        x3        x4
 y 1.0000000 0.9789228 0.8770847 0.9288534 0.8378634
 x 0.9789228 1.0000000 0.7644708 0.8560000 0.7312607
x2 0.8770847 0.7644708 1.0000000 0.9479438 0.9589104
x3 0.9288534 0.8560000 0.9479438 1.0000000 0.9732258
x4 0.8378634 0.7312607 0.9589104 0.9732258 1.0000000
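One common way of quantifying the severity of the multicollinearity, not shown in the original output, is to examine the eigenvalues of the correlation matrix of the regressors: eigenvalues close to zero, or equivalently a very large ratio of the largest to the smallest eigenvalue, indicate near-linear dependence among the columns of the design matrix. A sketch in R/S-style syntax, assuming the glass data frame above:

# Sketch: eigenvalues of the regressor correlation matrix as a multicollinearity diagnostic
R <- cor(glass[, c("x", "x2", "x3", "x4")])
eigen(R)$values                               # eigenvalues near zero signal near-dependence
max(eigen(R)$values) / min(eigen(R)$values)   # a large ratio indicates serious multicollinearity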

9.2 Orthogonal polynomials

A way of dealing with problems of multicollinearity in polynomial regression is to work with what are known as orthogonal polynomials.

Consider an alternative version of the model (1),

y_i = \alpha_0 + \alpha_1 P_1(x_i) + \alpha_2 P_2(x_i) + \cdots + \alpha_k P_k(x_i) + \epsilon_i,    i = 1, \ldots, n.    (2)

The regressor variables are functions P1(x),P2(x),...,Pk(x) of the variable x, where, for r =1, . . . , k, Pr(x) is a polynomial of order r, to be specified more precisely later.

We thus have a linear model y = X\alpha + \epsilon, with X given by

X = \begin{pmatrix} 1 & P_1(x_1) & P_2(x_1) & \cdots & P_k(x_1) \\ 1 & P_1(x_2) & P_2(x_2) & \cdots & P_k(x_2) \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & P_1(x_n) & P_2(x_n) & \cdots & P_k(x_n) \end{pmatrix}.

We choose the polynomial functions P_1(x), P_2(x), \ldots, P_k(x) in such a way that the design matrix X is orthogonal, i.e., the columns of X are orthogonal to each other. We thus require that

\sum_{i=1}^{n} P_r(x_i) = 0,    r = 1, \ldots, k,    (3)

and

\sum_{i=1}^{n} P_r(x_i) P_s(x_i) = 0,    r = 2, \ldots, k,  s = 1, \ldots, r-1.    (4)

A set of polynomials P_1(x), P_2(x), \ldots, P_k(x) which satisfies the conditions of Equations (3) and (4) is said to be orthogonal. For such a set of polynomials, the matrix X'X is diagonal,

X'X = \mathrm{diag}\left( n, \sum_{i=1}^{n} P_1(x_i)^2, \sum_{i=1}^{n} P_2(x_i)^2, \ldots, \sum_{i=1}^{n} P_k(x_i)^2 \right).

It follows that the inverse (X'X)^{-1} is also a diagonal matrix,

(X'X)^{-1} = \mathrm{diag}\left( \frac{1}{n}, \frac{1}{\sum_{i=1}^{n} P_1(x_i)^2}, \frac{1}{\sum_{i=1}^{n} P_2(x_i)^2}, \ldots, \frac{1}{\sum_{i=1}^{n} P_k(x_i)^2} \right).

Also,

X'y = \left( \sum_{i=1}^{n} y_i, \sum_{i=1}^{n} P_1(x_i) y_i, \sum_{i=1}^{n} P_2(x_i) y_i, \ldots, \sum_{i=1}^{n} P_k(x_i) y_i \right)'.

From Section 7.2, a = (X'X)^{-1} X'y, where a = \hat{\alpha}. Hence a_0 = \bar{y}

and

a_r = \frac{\sum_{i=1}^{n} P_r(x_i) y_i}{\sum_{i=1}^{n} P_r(x_i)^2},    r = 1, \ldots, k.    (5)

• Since the columns of the design matrix X are orthogonal, they are also linearly independent. The problems of multicollinearity are avoided. Note also that, from the construction specified by Equations (3) and (4), the regressor variables P_1(x), P_2(x), \ldots, P_k(x) have zero pairwise sample correlations. (A numerical check of Equation (5) is sketched after this list.)

• From the fact that (X'X)^{-1} is diagonal, it follows that a_0 and the a_r, r = 1, \ldots, k, are all distributed independently of each other.

• From the expression (5) it follows that, as successive terms are added to the model (2) to increase the degree of the fitted polynomial, the value of ar for given r does not change. (In general, in multiple linear regression, as new regressor variables are added to the model, the estimates of the parameters already present are changed.)
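These points are easy to verify numerically for the glass data. The following sketch, in R/S-style syntax, uses the normalized orthogonal polynomials produced by the function poly (discussed in Section 9.3), for which each denominator in Equation (5) equals one; the x and y vectors of Section 9.1 are assumed.

# Sketch: diagonal X'X, the estimates a_r of Equation (5), and their invariance
P  <- poly(x, 4)                   # columns P_1(x_i), ..., P_4(x_i), unit sum of squares
Xo <- cbind(1, P)                  # design matrix with intercept column
round(t(Xo) %*% Xo, 10)            # diagonal, as claimed
a  <- crossprod(P, y)              # Equation (5): here sum P_r(x_i)^2 = 1 for each r
a
coef(lm(y ~ poly(x, 2)))           # a_1, a_2 agree with the first two entries of a
coef(lm(y ~ poly(x, 3)))           # adding P_3 leaves a_1 and a_2 unchanged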

Using the results of Section 8.2, the regression sum of squares from fitting P1(x), P2(x), ..., Pm(x), i.e., from fitting a polynomial of order m, is given by

SS_{Reg}(m) = \sum_{r=1}^{m} a_r \sum_{i=1}^{n} P_r(x_i) y_i.

Substituting from Equation (5) we obtain

SS_{Reg}(m) = \sum_{r=1}^{m} \frac{\left( \sum_{i=1}^{n} P_r(x_i) y_i \right)^2}{\sum_{i=1}^{n} P_r(x_i)^2}.

Equivalently,

SS_{Reg}(m) = \sum_{r=1}^{m} a_r^2 \sum_{i=1}^{n} P_r(x_i)^2.

Using the results of Section 8.2, the sum of squares due to Pr(x) adjusted for the presence of P1(x), P2(x), ..., Pr−1(x) is

SS_{Reg}(r) - SS_{Reg}(r-1) = a_r^2 \sum_{i=1}^{n} P_r(x_i)^2,    r = 1, \ldots, k.    (6)

The expression of Equation (6) is a function of a_r and P_r only, not involving the other parameter estimates. Thus the expressions of Equation (6) are distributed independently of each other for r = 1, \ldots, k, and we may simply refer to the sum of squares SS(P_r) due to P_r,

SS(P_r) = a_r^2 \sum_{i=1}^{n} P_r(x_i)^2 = \frac{\left( \sum_{i=1}^{n} P_r(x_i) y_i \right)^2}{\sum_{i=1}^{n} P_r(x_i)^2},    r = 1, \ldots, k.    (7)

The overall regression sum of squares, SSReg, may be partitioned into k independently distributed components, SS(Pr), r =1, . . . , k, each with one degree of freedom:

SS_{Reg} = \sum_{r=1}^{k} SS(P_r).
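For the glass data this partition can be checked directly from Equation (7). The sketch below, in R/S-style syntax, again uses the unit-normalized polynomials from poly, so each denominator in (7) equals one; x and y are as in Section 9.1.

# Sketch: the one-degree-of-freedom components SS(P_r) and their sum
P   <- poly(x, 4)
SSP <- as.vector(crossprod(P, y))^2    # Equation (7) with sum P_r(x_i)^2 = 1
SSP                                    # the components SS(P_1), ..., SS(P_4)
sum(SSP)                               # their total
fit4 <- lm(y ~ poly(x, 4))
sum((fitted(fit4) - mean(y))^2)        # SSReg from the quartic fit: the same total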

Correspondingly, the following ANOVA table may be constructed.

ANOVA TABLE

Source        DF           SS                 MS
P_1            1          SS(P_1)            SS(P_1)
P_2            1          SS(P_2)            SS(P_2)
...           ...          ...                ...
P_k            1          SS(P_k)            SS(P_k)
Residual    n - k - 1     by subtraction     s^2 ≡ SS_R/(n - k - 1)
Total        n - 1        \sum_{i=1}^{n} (y_i - \bar{y})^2

As we successively consider introducing extra terms into the model, at the m-th step the corresponding test statistic for testing whether the polynomial P_m is redundant is

F = \frac{SS(P_m)}{MS_R}

with 1 and n - m - 1 degrees of freedom, where MS_R is the residual mean square from fitting P_1(x), P_2(x), \ldots, P_m(x), i.e., from fitting a polynomial of order m.

This F-statistic satisfies F = t^2, where t is the value of the t-statistic associated with P_m in the regression output. Both have the same p-value.
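A quick numerical illustration, using the quadratic fit glass2.lm from the output above (the same relationship holds for the orthogonal-polynomial fits of Section 9.4): the F value that anova reports for the last term entered is the square of that term's t-statistic. This is a sketch, assuming R-style behaviour of anova on an lm object.

# Sketch: F = t^2 for the last term entered into the model
anova(glass2.lm)     # the F value for x2 ...
12.3261^2            # ... equals the square of its t-statistic, about 151.9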

9.3 Expressions and tables for orthogonal polynomials

Given x_1, x_2, \ldots, x_n, there remains the problem of finding a set of orthogonal polynomials, i.e., polynomials which satisfy Equations (3) and (4). This could be done mathematically from first principles. Often, the x_i are equally spaced, in which case there are tables available which give the polynomials and the values of the P_r(x_i) and \sum_i P_r(x_i)^2. The estimates a_r and the sums of squares in the ANOVA are then readily calculated, using Equations (5) and (7), even if computational facilities are not available. In S-PLUS, whether or not the x_i are equally spaced, values of the orthogonal polynomials are easily calculated using the function poly.

Assume for the present that the x_i are equally spaced and define

\xi = \frac{x - \bar{x}}{d}, \qquad \xi_i = \frac{x_i - \bar{x}}{d}, \quad i = 1, \ldots, n,

where d = x_i - x_{i-1}. It follows that \xi_i - \xi_{i-1} = 1, i = 2, \ldots, n, and \sum \xi_i = 0. It is easier to express the orthogonal polynomials in terms of \xi.

It may be shown that

P_1(x) = \lambda_1 \xi

P_2(x) = \lambda_2 \left( \xi^2 - \frac{n^2 - 1}{12} \right)

P_3(x) = \lambda_3 \left( \xi^3 - \frac{3n^2 - 7}{20}\,\xi \right)

P_4(x) = \lambda_4 \left( \xi^4 - \frac{3n^2 - 13}{14}\,\xi^2 + \frac{3(n^2 - 1)(n^2 - 9)}{560} \right)

P_5(x) = \lambda_5 \left( \xi^5 - \frac{5(n^2 - 7)}{18}\,\xi^3 + \frac{15n^4 - 230n^2 + 407}{1008}\,\xi \right)

...

The λr are arbitrary constants of proportionality, but in tables they are usually chosen to be the smallest numbers such that the polynomials have integer values at all the points xi, i=1, . . . , n. The λr then depend on n and are tabulated together with the coefficients Pr(xi), for example, in Montgomery.

In S-PLUS, as we shall see, the above approach to obtaining orthogonal polynomials is unnecessary. If we wish to see the values of the coefficients displayed, the function contr.poly (for polynomial contrasts) will produce a matrix whose columns, for equally spaced values of the predictor variable, are the values of the orthogonal polynomials, but normalized in such a way that \sum_{i=1}^{n} P_r(x_i)^2 = 1, so that the coefficients do not take integer values. This normalization makes the coefficients less readable but has the advantage of simplifying some of the formulae in Section 9.2. For example, we now have the simple expression that SS(P_r) = a_r^2.
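This identity is easily checked for the glass data. The following sketch, in R/S-style syntax with x and y as in Section 9.1, compares a_2^2 from the quadratic fit with the reduction in the residual sum of squares achieved by adding P_2(x), which by Equation (6) is SS(P_2).

# Sketch: with the unit normalization, SS(P_2) = a_2^2
fit1 <- lm(y ~ poly(x, 1))
fit2 <- lm(y ~ poly(x, 2))
coef(fit2)[3]^2                                     # a_2^2 from the quadratic fit
sum(residuals(fit1)^2) - sum(residuals(fit2)^2)     # SS(P_2) as a reduction in residual SS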

9.4 Example (continued)

For our example, the values of the first four orthogonal polynomials are given in the following table, scaled in the conventional way so that all the coefficients take integer values.

 i   P_1(x_i)   P_2(x_i)   P_3(x_i)   P_4(x_i)
 1      -9          6        -42         18
 2      -7          2         14        -22
 3      -5         -1         35        -17
 4      -3         -3         31          3
 5      -1         -4         12         18
 6       1         -4        -12         18
 7       3         -3        -31          3
 8       5         -1        -35        -17
 9       7          2        -14        -22
10       9          6         42         18

In the following piece of output, the function contr.poly with argument 10 produces a matrix with 10 rows, corresponding to our 10 equally spaced observations, and with 9 columns, corresponding to a set of 9 orthogonal polynomials, up to order 9. The argument [, 1:4] restricts the output to the first four columns, which give the coefficients of the polynomials up to order 4, as in the table above, but here scaled so that the sum of squares of each column is equal to one.

> contr.poly(10)[,1:4]
               .L          .Q         .C          ^4
 [1,] -0.49543369  0.52223297 -0.4534252  0.33658092
 [2,] -0.38533732  0.17407766  0.1511417 -0.41137668
 [3,] -0.27524094 -0.08703883  0.3778543 -0.31788198
 [4,] -0.16514456 -0.26111648  0.3346710  0.05609682
 [5,] -0.05504819 -0.34815531  0.1295501  0.33658092
 [6,]  0.05504819 -0.34815531 -0.1295501  0.33658092
 [7,]  0.16514456 -0.26111648 -0.3346710  0.05609682
 [8,]  0.27524094 -0.08703883 -0.3778543 -0.31788198
 [9,]  0.38533732  0.17407766 -0.1511417 -0.41137668
[10,]  0.49543369  0.52223297  0.4534252  0.33658092
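As a check on the link between this output and the formulas of Section 9.3, the sketch below evaluates P_1 and P_2 for n = 10 with λ_1 = 2 and λ_2 = 1/2 (values consistent with the integer table above; standard tables list the λ_r for each n), and then rescales the integer columns to unit sum of squares, recovering the .L and .Q columns of contr.poly(10). R/S-style syntax is assumed.

# Sketch: from the formulas of Section 9.3 to the tabulated integers and the normalized columns
n  <- 10
xi <- (1:n) - (n + 1)/2            # xi_i = (x_i - xbar)/d = -4.5, -3.5, ..., 4.5
P1 <- 2   * xi                     # lambda_1 = 2 gives -9, -7, ..., 9
P2 <- 0.5 * (xi^2 - (n^2 - 1)/12)  # lambda_2 = 1/2 gives 6, 2, -1, -3, -4, ...
cbind(P1, P2)                      # the first two columns of the integer table
P1 / sqrt(sum(P1^2))               # rescaled to unit sum of squares: the .L column
P2 / sqrt(sum(P2^2))               # likewise the .Q column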

In S-PLUS it is very easy to fit models with orthogonal polynomials, using the function poly in the model specification. The poly function produces regressor variables whose values are the same as those exhibited by the contr.poly function for the case of equally spaced predictor variables; but, more generally, it also produces orthogonal polynomials as regressor variables for underlying predictor variables that are not equally spaced. In the following output, the argument (x,2) refers to the fact that x is the underlying predictor variable and that we wish to fit a second-order, i.e., quadratic curve. (The simple command poly(x,2) would display the first two columns of the matrix contr.poly(10).) The resulting regression coefficients are the fitted regression coefficients for the orthogonal polynomials, and the corresponding fitted regression equation is

y = 0.0835 + 0.0772P1(x) + 0.0158P2(x).

Note that the correlation matrix confirms that the pairwise correlations among the estimated regression coefficients for the orthogonal polynomials are zero.

The function poly.transform is used to convert these regression coefficients to the corresponding ones for the powers of x. Thus we recapture the fitted equation

y = 0.06663182 + 0.0006446212x + 0.000006856061x^2

that we saw earlier.

> glass2o.lm <- lm(y~poly(x,2))
> summary(glass2o.lm)

Call: lm(formula = y ~ poly(x, 2))

Residuals:
       Min         1Q     Median         3Q       Max
 -0.001764 -0.0008682 0.00006894  0.0007739  0.001614

Coefficients:
                Value Std. Error  t value Pr(>|t|)
(Intercept)    0.0835     0.0004 206.5952   0.0000
poly(x, 2)1    0.0772     0.0013  60.4275   0.0000
poly(x, 2)2    0.0158     0.0013  12.3261   0.0000

Residual standard error: 0.001278 on 7 degrees of freedom
Multiple R-Squared: 0.9982
F-statistic: 1902 on 2 and 7 degrees of freedom, the p-value is 2.657e-010

Correlation of Coefficients:
            (Intercept) poly(x, 2)1
poly(x, 2)1           0
poly(x, 2)2           0           0

> poly.transform(poly(x,2), coef(glass2o.lm))
        x^0          x^1           x^2
 0.06663182 0.0006446212 6.856061e-006
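The function poly.transform is specific to S-PLUS. Where it is unavailable (for example in R), the same coefficients on the powers of x can be obtained by refitting with the raw powers, and the equivalence of the two parameterizations can be checked through the fitted values. A brief sketch, assuming the fitted objects above:

# Sketch: the orthogonal-polynomial fit and the raw-power fit are the same model
all.equal(fitted(glass2o.lm), fitted(glass2.lm))   # identical fitted values
coef(lm(y ~ x + I(x^2), data = glass))             # raw-power coefficients obtained directly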
