PubH 7405:

REGRESSION IN MATRIX TERMS

A "matrix" is a display of numbers or numerical quantities laid out in a rectangular array of rows and columns. The array, or two-way table of numbers, could be rectangular or square; it could be just one row (a row matrix or row vector) or one column (a column matrix or column vector). When it is square, it could be symmetric or a "diagonal matrix" (the only non-zero entries are on the main diagonal). The numbers of rows and of columns form the "dimension" of a matrix; for example, a "3x2" matrix has three rows and two columns. An "entry" or "element" of a matrix needs two subscripts for identification: the first for the row number and the second for the column number:

A = [a_{ij}]

For example, in the following matrix we have a_{11} = 16,000 and a_{32} = 35:

A = \begin{bmatrix} 16,000 & 23 \\ 33,000 & 47 \\ 21,000 & 35 \end{bmatrix}

BASIC SL REGRESSION MATRICES

Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} \qquad X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}

X is special; it is called the "Design Matrix". X, the Design Matrix for MLR:

X_{n \times (k+1)} = \begin{bmatrix} 1 & x_{11} & x_{21} & \cdots & x_{k1} \\ 1 & x_{12} & x_{22} & \cdots & x_{k2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{1n} & x_{2n} & \cdots & x_{kn} \end{bmatrix}

First subscript: variable (one column for each predictor); second subscript: subject (one row for each subject). The dimension of the Design Matrix X is changed to handle more predictors: one column for each predictor (the number of rows is still the sample size). The first column (filled with "1") is still "optional"; it is not included when doing "Regression through the origin" (i.e. no intercept). X is called the Design Matrix. There are two reasons for the name: (1) by the model, the values of the X's (columns) are under the control of the investigators: entries are fixed/designed; (2) the design/choice is consequential: the larger the variation in the x's in each column, the more precise the estimate of the slope.

TRANSPOSE

The transpose of a matrix A is another matrix, denoted by A' (or A^T), that is obtained by interchanging the columns and the rows of the matrix A; that is:

If A = [a_{ij}], then A' = [a_{ji}]. For example:

A_{3 \times 2} = \begin{bmatrix} 2 & 5 \\ 7 & 10 \\ 3 & 4 \end{bmatrix} \qquad A'_{2 \times 3} = \begin{bmatrix} 2 & 7 & 3 \\ 5 & 10 & 4 \end{bmatrix}

The roles of rows and columns are swapped; if A is a symmetric matrix (a_{ij} = a_{ji}), we have A = A'. The transpose of a column vector is a row vector:

C = \begin{bmatrix} 3 \\ -2 \\ 0 \\ 12 \end{bmatrix} \qquad C' = \begin{bmatrix} 3 & -2 & 0 & 12 \end{bmatrix}

MULTIPLICATION by a Scalar

• Again, a "scalar" is an (ordinary) number.
• In multiplying a matrix by a scalar, every element of the matrix is multiplied by that scalar.
• The result is a new matrix with the same dimension:

A = [a_{ij}]

kA = [k\,a_{ij}]

MULTIPLICATION of Matrices

• Multiplication of a matrix by another matrix is much more complicated.
• First, the "order" is important; "AxB" is said to be "A is post-multiplied by B" or "B is pre-multiplied by A". In general, AxB ≠ BxA.
• There is a strong requirement on the dimensions: AxB is defined only if the number of columns of A is equal to the number of rows of B.
• The product AxB has, as its dimension, the number of rows of A and the number of columns of B.

MULTIPLICATION FORMULA

A_{r \times c} B_{c \times s} = (AB)_{r \times s}

[a_{ij}]_{r \times c} \, [b_{ij}]_{c \times s} = \left[ \sum_{k=1}^{c} a_{ik} b_{kj} \right]_{r \times s}
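As a quick numerical check of this formula, here is a minimal numpy sketch; the matrices A and B are made up purely for illustration:

```python
import numpy as np

# Entry (i,j) of AB is the sum over k of a[i,k]*b[k,j].
# A is 2x3, B is 3x2, so the product AB is 2x2.
A = np.array([[1., 2., 3.],
              [4., 5., 6.]])
B = np.array([[7., 8.],
              [9., 10.],
              [11., 12.]])

r, c = A.shape           # r x c
c2, s = B.shape          # c x s; columns of A must equal rows of B
assert c == c2

AB = np.zeros((r, s))
for i in range(r):
    for j in range(s):
        AB[i, j] = sum(A[i, k] * B[k, j] for k in range(c))

print(AB)                # element-by-element formula
print(A @ B)             # numpy's built-in product gives the same matrix
```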

For entry (i,j) of AxB, we multiply row “i” of A by column “j” of B; That’s why the number of columns of A should be the same as the number of rows of B. EXAMPLES

\begin{bmatrix} 4 & 2 \\ 5 & 8 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} 4a_1 + 2a_2 \\ 5a_1 + 8a_2 \end{bmatrix}

\begin{bmatrix} 2 & 3 & 5 \end{bmatrix} \begin{bmatrix} 2 \\ 3 \\ 5 \end{bmatrix} = [2^2 + 3^2 + 5^2] = [38]

\begin{bmatrix} 2 & 5 \\ 4 & 1 \end{bmatrix} \begin{bmatrix} 4 & 6 \\ 5 & 8 \end{bmatrix} = \begin{bmatrix} 33 & 52 \\ 21 & 32 \end{bmatrix}

OPERATION ON SLR BASIC MATRICES

Y'Y = \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \left[ \sum y_i^2 \right]

X'Y = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}

"Order" is important; we cannot form YX'.

OPERATION ON MLR BASIC DATA MATRICES

Y'Y = \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \left[ \sum y_i^2 \right]

X'Y = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ x_{11} & x_{12} & x_{13} & \cdots & x_{1n} \\ \vdots & \vdots & \vdots & & \vdots \\ x_{k1} & x_{k2} & x_{k3} & \cdots & x_{kn} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_{1i} y_i \\ \vdots \\ \sum x_{ki} y_i \end{bmatrix}

MORE REGRESSION EXAMPLE

X'X = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{bmatrix} \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} = \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}
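These products are easy to verify numerically. A small sketch, using hypothetical x and y values, checks X'X, X'Y, and Y'Y against the summation formulas:

```python
import numpy as np

# Hypothetical SLR data (any small x, y vectors will do)
x = np.array([1., 2., 3., 4., 5.])
y = np.array([2., 4., 5., 4., 6.])
n = len(x)

X = np.column_stack([np.ones(n), x])   # design matrix: column of 1's, then x

print(X.T @ X)                          # should equal [[n, sum x], [sum x, sum x^2]]
print(np.array([[n, x.sum()], [x.sum(), (x**2).sum()]]))

print(X.T @ y)                          # should equal [sum y, sum xy]
print(np.array([y.sum(), (x*y).sum()]))

print(y @ y, (y**2).sum())              # Y'Y = sum y^2
```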

Again, X is referred to as the "Design Matrix".

X'X = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_{11} & x_{12} & \cdots & x_{1n} \\ \vdots & \vdots & & \vdots \\ x_{k1} & x_{k2} & \cdots & x_{kn} \end{bmatrix} \begin{bmatrix} 1 & x_{11} & \cdots & x_{k1} \\ 1 & x_{12} & \cdots & x_{k2} \\ \vdots & \vdots & & \vdots \\ 1 & x_{1n} & \cdots & x_{kn} \end{bmatrix} = \begin{bmatrix} n & \sum x_{1i} & \cdots & \sum x_{ki} \\ \sum x_{1i} & \sum x_{1i}^2 & \cdots & \sum x_{1i} x_{ki} \\ \vdots & \vdots & & \vdots \\ \sum x_{ki} & \sum x_{ki} x_{1i} & \cdots & \sum x_{ki}^2 \end{bmatrix}

X'X is a square matrix filled with sums of squares and sums of products; we can form XX', but it is a different n-by-n matrix which we do not need.

SIMPLE MODEL

Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i; \quad i = 1, 2, \ldots, n

\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
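To make the matrix form concrete, one can simulate data from this model. The parameter values in the sketch below are hypothetical, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter values for illustration
n, beta0, beta1, sigma = 20, 10.0, 2.0, 1.5

x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])      # n x 2 design matrix
beta = np.array([beta0, beta1])           # 2 x 1 coefficient vector
eps = rng.normal(0.0, sigma, size=n)      # n x 1 error vector

# The model Y = X*beta + epsilon, row by row: Y_i = beta0 + beta1*x_i + eps_i
Y = X @ beta + eps
print(Y[:5])
```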

Y_{n \times 1} = X_{n \times 2}\,\beta_{2 \times 1} + \varepsilon_{n \times 1}; or Y = X\beta + \varepsilon

MULTIPLE LINEAR REGRESSION MODEL

Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \ldots + \beta_k X_{ik} + \varepsilon_i; \quad i = 1, 2, \ldots, n

\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{21} & \cdots & x_{k1} \\ 1 & x_{12} & x_{22} & \cdots & x_{k2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{1n} & x_{2n} & \cdots & x_{kn} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}

MLR MODEL IN MATRIX TERMS

Y_{n \times 1} = X_{n \times (k+1)}\,\beta_{(k+1) \times 1} + \varepsilon_{n \times 1}
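In practice the design matrix is assembled by stacking a column of 1's next to the predictor columns. A minimal sketch (the predictor values below are made up):

```python
import numpy as np

# Hypothetical predictors for n = 6 subjects and k = 2 variables
x1 = np.array([42., 46., 42., 71., 80., 74.])
x2 = np.array([ 1.,  0.,  1.,  0.,  1.,  0.])

# Design matrix: first a column of 1's (for the intercept), then one column per predictor
X = np.column_stack([np.ones(len(x1)), x1, x2])
print(X.shape)    # (6, 3) = n x (k+1)
print(X)

# For regression through the origin, simply omit the column of 1's:
X0 = np.column_stack([x1, x2])
```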

E(Y)_{n \times 1} = X\beta
\sigma^2(Y)_{n \times n} = \sigma^2 I

OBSERVATIONS & ERRORS

Y_{n \times 1} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} \qquad \varepsilon_{n \times 1} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}

β: Regression Coefficients (a column vector of parameters)

\beta_{(k+1) \times 1} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}

LINEAR DEPENDENCE

Consider a set of c column vectors C_1, C_2, ..., C_c in an r x c matrix. If we can find c scalars k_1, k_2, ..., k_c, not all zero, so that k_1 C_1 + k_2 C_2 + ... + k_c C_c = 0, the c column vectors are said to be "linearly dependent". If the only set of scalars for which the above equation holds is all zeros (k_1 = ... = k_c = 0), the c column vectors are linearly independent.

EXAMPLE

1 2 5 1   A = 2 2 10 6 3 4 15 1 Since : 1 2  5  1 0           (5)2 + (0)2 + (−1)10 + (0)6 = 0 3 4 15 1 0 The four column vec tors are linearly dependent RANK OF A MATRIX • In the previous example, the rank of A<4 • Since the first, second and fourth columns are linearly independent (no scalars can be found so that k1C1 + k2C2 + k4C4 = 0); the rank of that matrix A is 3. • The rank of a matrix is defined as “the maximum number of linearly independent columns in the matrix. SINGULAR/NON-SINGULAR MATRICES

• If the rank of a square r x r matrix A is r, then matrix A is said to be nonsingular or of "full rank".
• An r x r matrix with rank less than r is said to be singular, or not of full rank.

INVERSE OF A MATRIX

• The inverse of a matrix A is another matrix A^{-1} such that A^{-1}A = AA^{-1} = I, where I is the identity or unit matrix.
• An inverse of a matrix is defined only for square matrices.
• Many matrices do not have an inverse; a singular matrix does not have an inverse.
• If a square matrix has an inverse, the inverse is unique; the inverse of a nonsingular or full-rank matrix is also nonsingular and has the same rank.

INVERSE OF A 2x2 MATRIX

a b A =   c d  d − b   A−1 = D D − c a     D D  D = (ad − bc) is the "determinan t" of A; or | A | A singular matrix does not have an inverse because its “determinants” is zero: 2 6  A =   7 21 We have : 2  6  0 (3)  + (−1)  =   7 21 0 and : D = (2)(21) - (6)(7) = 0 A SYSTEM OF EQUATIONS

It is extremely simple to write a system of equations in matrix form, especially with many equations:

3x + 4y - 10z = 0
-2x - 5y + 21z = 14
x + 29y - 2z = -3

\begin{bmatrix} 3 & 4 & -10 \\ -2 & -5 & 21 \\ 1 & 29 & -2 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 14 \\ -3 \end{bmatrix}

A SIMPLE APPLICATION

Consider a system of two equations :

2y_1 + 4y_2 = 20
3y_1 + y_2 = 10

which is written in matrix notation:

\begin{bmatrix} 2 & 4 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 20 \\ 10 \end{bmatrix}

Solution:

\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 2 & 4 \\ 3 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 20 \\ 10 \end{bmatrix} = \begin{bmatrix} -.1 & .4 \\ .3 & -.2 \end{bmatrix} \begin{bmatrix} 20 \\ 10 \end{bmatrix} = \begin{bmatrix} 2 \\ 4 \end{bmatrix}

y_1 = 2 and y_2 = 4

ANOTHER EXAMPLE

 3 4 −10 x  0       − 2 − 5 21y = 14   1 29 − 2 z − 3 −1 x  3 4 −10   0        y = − 2 − 5 21 14  z  1 29 − 2  − 3 REGRESSION EXAMPLE

X'X = \begin{bmatrix} n & \sum x \\ \sum x & \sum x^2 \end{bmatrix}

D = n\sum x^2 - (\sum x)(\sum x) = n\sum (x - \bar{x})^2

We can easily see that D ≠ 0 (unless all the x values are equal). This property would not be true if X had more columns (Multiple Regression) and the columns were not linearly independent. If the "columns" (i.e. predictors/factors) are highly related, the Design Matrix approaches singularity and the regression fails!

REGRESSION EXAMPLE

X'X = \begin{bmatrix} n & \sum x \\ \sum x & \sum x^2 \end{bmatrix} \qquad D = n\sum (x - \bar{x})^2

(X'X)^{-1} = \begin{bmatrix} \dfrac{\sum x^2}{n\sum (x - \bar{x})^2} & \dfrac{-\sum x}{n\sum (x - \bar{x})^2} \\ \dfrac{-\sum x}{n\sum (x - \bar{x})^2} & \dfrac{n}{n\sum (x - \bar{x})^2} \end{bmatrix} = \begin{bmatrix} \dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum (x - \bar{x})^2} & \dfrac{-\bar{x}}{\sum (x - \bar{x})^2} \\ \dfrac{-\bar{x}}{\sum (x - \bar{x})^2} & \dfrac{1}{\sum (x - \bar{x})^2} \end{bmatrix}

FOUR IMPORTANT MATRICES IN REGRESSION ANALYSIS

Y'Y = \left[ \sum y_i^2 \right]

X'Y = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}

X'X = \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}

(X'X)^{-1} = \begin{bmatrix} \dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum (x - \bar{x})^2} & \dfrac{-\bar{x}}{\sum (x - \bar{x})^2} \\ \dfrac{-\bar{x}}{\sum (x - \bar{x})^2} & \dfrac{1}{\sum (x - \bar{x})^2} \end{bmatrix}

LEAST SQUARE METHOD SLR

Data: \{(x_i, y_i)\}_{i=1}^{n}

Q = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2

\frac{\partial Q}{\partial \beta_0} = -2\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0

\frac{\partial Q}{\partial \beta_1} = -2\sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) = 0

LEAST SQUARE METHOD MLR

Data: \{(x_{i1}, x_{i2}, \ldots, x_{ik}, y_i)\}_{i=1}^{n}

Q = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{1i} - \beta_2 x_{2i} - \ldots - \beta_k x_{ki})^2

We solve the equations:

\frac{\partial Q}{\partial \beta_0} = -2\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{1i} - \beta_2 x_{2i} - \ldots - \beta_k x_{ki}) = 0

\frac{\partial Q}{\partial \beta_1} = -2\sum_{i=1}^{n} x_{1i} (y_i - \beta_0 - \beta_1 x_{1i} - \beta_2 x_{2i} - \ldots - \beta_k x_{ki}) = 0

\ldots

\frac{\partial Q}{\partial \beta_k} = -2\sum_{i=1}^{n} x_{ki} (y_i - \beta_0 - \beta_1 x_{1i} - \beta_2 x_{2i} - \ldots - \beta_k x_{ki}) = 0

NORMAL EQUATIONS

\frac{\partial Q}{\partial \beta_0} = \frac{\partial Q}{\partial \beta_1} = 0

Normal Equations:

\sum y_i = n b_0 + b_1 \sum x_i

\sum x_i y_i = b_0 \sum x_i + b_1 \sum x_i^2

In matrix notation:

\begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}
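A quick numerical check that the fitted coefficients satisfy these two normal equations (hypothetical data; np.linalg.solve is used to solve (X'X)b = X'Y):

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5.])
y = np.array([2., 4., 5., 4., 6.])
n = len(x)
X = np.column_stack([np.ones(n), x])

b = np.linalg.solve(X.T @ X, X.T @ y)   # solve the normal equations (X'X) b = X'Y
b0, b1 = b

# The two normal equations, written out; both pairs should match
print(y.sum(),     n*b0 + b1*x.sum())                 # sum y  = n*b0 + b1*sum x
print((x*y).sum(), b0*x.sum() + b1*(x**2).sum())      # sum xy = b0*sum x + b1*sum x^2
```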

X'X_{2 \times 2}\, b_{2 \times 1} = X'Y_{2 \times 1}

NORMAL EQUATIONS IN MLR

\frac{\partial Q}{\partial \beta_0} = -2\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{1i} - \beta_2 x_{2i} - \ldots - \beta_k x_{ki}) = 0 \;\Rightarrow\; \sum_{i=1}^{n} e_i = 0

\frac{\partial Q}{\partial \beta_1} = -2\sum_{i=1}^{n} x_{1i} (y_i - \beta_0 - \beta_1 x_{1i} - \beta_2 x_{2i} - \ldots - \beta_k x_{ki}) = 0 \;\Rightarrow\; \sum_{i=1}^{n} x_{1i} e_i = 0

\ldots

\frac{\partial Q}{\partial \beta_k} = -2\sum_{i=1}^{n} x_{ki} (y_i - \beta_0 - \beta_1 x_{1i} - \beta_2 x_{2i} - \ldots - \beta_k x_{ki}) = 0 \;\Rightarrow\; \sum_{i=1}^{n} x_{ki} e_i = 0

In MLR, these normal equations look more complicated, but they lead to the same result in matrix terms:

(X'X)b = X'Y

LEAST SQUARE ESTIMATES

Normal Equations :

\sum y_i = n b_0 + b_1 \sum x_i

\sum x_i y_i = b_0 \sum x_i + b_1 \sum x_i^2

In matrix notation:

\begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}

(X'X)b = X'Y
b = (X'X)^{-1}(X'Y)

SUMMARY REGRESSION RESULTS

Normal Equations: (X'X)b = X'Y
Least Square Estimates: b = (X'X)^{-1}(X'Y)

WE PROVE THE SAME RESULTS

X'Y = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}

(X'X)^{-1} = \begin{bmatrix} \dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum (x - \bar{x})^2} & \dfrac{-\bar{x}}{\sum (x - \bar{x})^2} \\ \dfrac{-\bar{x}}{\sum (x - \bar{x})^2} & \dfrac{1}{\sum (x - \bar{x})^2} \end{bmatrix}

b = (X'X)^{-1} X'Y = \begin{bmatrix} \dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum (x - \bar{x})^2} & \dfrac{-\bar{x}}{\sum (x - \bar{x})^2} \\ \dfrac{-\bar{x}}{\sum (x - \bar{x})^2} & \dfrac{1}{\sum (x - \bar{x})^2} \end{bmatrix} \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}
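The closed-form (X'X)^{-1} above can be checked against a numerical inverse; a short sketch with made-up x values:

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5.])
n = len(x)
X = np.column_stack([np.ones(n), x])

xbar = x.mean()
Sxx = ((x - xbar)**2).sum()            # sum of (x - xbar)^2

closed_form = np.array([[1/n + xbar**2/Sxx, -xbar/Sxx],
                        [-xbar/Sxx,          1/Sxx   ]])

print(np.linalg.inv(X.T @ X))          # numerical inverse of X'X
print(closed_form)                     # should agree with the expression above
```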

b_1 = \frac{\sum x_i y_i - (\sum x_i)(\sum y_i)/n}{\sum (x - \bar{x})^2}

MORE DIRECT APPROACH

Instead of the normal equations, we could start earlier, with the Sum of Squared Errors (SSE):

Q = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{1i} - \beta_2 x_{2i} - \ldots - \beta_k x_{ki})^2

In matrix notation:

Q = (Y - X\beta)'(Y - X\beta) = (Y' - \beta'X')(Y - X\beta) = Y'Y - 2\beta'X'Y + \beta'X'X\beta

We can prove the "normal equations" by taking the derivative of this SSE with respect to β. To do that, we need two facts: (1) the derivative of \beta'(X'Y) with respect to β is X'Y, and (2) the derivative of \beta'(X'X)\beta is 2(X'X)\beta. Putting it together:

\frac{\partial}{\partial \beta}(Q) = \begin{bmatrix} \partial Q/\partial \beta_0 \\ \partial Q/\partial \beta_1 \\ \vdots \\ \partial Q/\partial \beta_k \end{bmatrix} = -2X'Y + 2X'X\beta

Setting this equal to 0 gives (X'X)b = X'Y.

LEAST SQUARE ESTIMATE

(X'X)b = X'Y
b = (X'X)^{-1}(X'Y)
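A sketch showing that the matrix formula, numpy's built-in least-squares routine, and the familiar scalar formula for the slope all give the same fit (the data are hypothetical):

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5.])
y = np.array([2., 4., 5., 4., 6.])
n = len(x)
X = np.column_stack([np.ones(n), x])

# Matrix solution of the normal equations: b = (X'X)^{-1} X'Y
b = np.linalg.inv(X.T @ X) @ (X.T @ y)

# The same fit from numpy's least-squares routine
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Scalar formula for the slope: b1 = (sum xy - (sum x)(sum y)/n) / sum (x - xbar)^2
b1 = ((x*y).sum() - x.sum()*y.sum()/n) / ((x - x.mean())**2).sum()

print(b, b_lstsq, b1)   # all three agree on the slope
```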

Note the error in equation (6.25) of the text.

THE HAT MATRIX

\hat{Y} = Xb = X[(X'X)^{-1}X'Y] = [X(X'X)^{-1}X']Y = HY

H = X(X'X)^{-1}X' is called the "Hat Matrix".

IDEMPOTENCY

The Hat Matrix H = X(X'X)^{-1}X' is "idempotent": HH = H.

RANDOM VECTORS & MATRICES

A random vector or a random matrix contains elements which are random variables.

RANDOM VECTORS IN SLR

Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} \quad \text{and} \quad \varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}

EXPECTED VALUES

Y = E(Y) + ε

E(Y1) 0     E(Y ) 0 E(Y) =  2 ;E(ε) =    M  M     E(Yn ) 0 VARIANCE-

The variances of the elements of a random vector, and the covariances between any two of its elements, are assembled in the variance-covariance matrix, denoted by Var(Y), σ²(Y), or Σ.

EXAMPLE: A (BIVARIATE) RANDOM VECTOR

Variable: Y = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix}

Mean: \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}

Variance-Covariance Matrix: \Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix}

We have, for random variables:

E(aY) = aE(Y)
Var(aY) = a^2\,Var(Y)

What about random vectors? Say:

A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}

Y = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} \qquad AY = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix}

= \begin{bmatrix} a_{11}Y_1 + a_{12}Y_2 \\ a_{21}Y_1 + a_{22}Y_2 \end{bmatrix}

E(AY) = \begin{bmatrix} E(a_{11}Y_1 + a_{12}Y_2) \\ E(a_{21}Y_1 + a_{22}Y_2) \end{bmatrix}

= \begin{bmatrix} a_{11}E(Y_1) + a_{12}E(Y_2) \\ a_{21}E(Y_1) + a_{22}E(Y_2) \end{bmatrix} = A\,E(Y)

AY = \begin{bmatrix} a_{11}Y_1 + a_{12}Y_2 \\ a_{21}Y_1 + a_{22}Y_2 \end{bmatrix}

\sigma^2(AY) = \begin{bmatrix} \sigma^2(a_{11}Y_1 + a_{12}Y_2) & \sigma(a_{11}Y_1 + a_{12}Y_2,\; a_{21}Y_1 + a_{22}Y_2) \\ \sigma(a_{11}Y_1 + a_{12}Y_2,\; a_{21}Y_1 + a_{22}Y_2) & \sigma^2(a_{21}Y_1 + a_{22}Y_2) \end{bmatrix}

= \cdots = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} \sigma^2(Y_1) & \sigma(Y_1, Y_2) \\ \sigma(Y_1, Y_2) & \sigma^2(Y_2) \end{bmatrix} \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{bmatrix}

= A\,\sigma^2(Y)\,A'
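These two rules, E(AY) = AE(Y) and σ²(AY) = Aσ²(Y)A', can also be checked by simulation. A sketch using a hypothetical A, mean vector, and covariance matrix; with a large number of draws the sample moments should be close to the theoretical ones:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 2x2 transformation and a bivariate Y with known mean and covariance
A = np.array([[1., 2.],
              [3., -1.]])
mu = np.array([5., -2.])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

Y = rng.multivariate_normal(mu, Sigma, size=200000)   # each row is one draw of Y
AY = Y @ A.T                                          # apply A to every draw

print(AY.mean(axis=0))           # approximately A @ mu
print(A @ mu)

print(np.cov(AY, rowvar=False))  # approximately A @ Sigma @ A'
print(A @ Sigma @ A.T)
```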

We can verify this backward too, starting with A\sigma^2(Y)A'.

REGRESSION EXAMPLES

Y = E(Y) + \varepsilon, \qquad Var(\varepsilon) = \sigma^2 I

Var(Y) = \begin{bmatrix} \sigma^2(Y_1) & \sigma(Y_1, Y_2) & \cdots & \sigma(Y_1, Y_n) \\ \sigma(Y_2, Y_1) & \sigma^2(Y_2) & \cdots & \sigma(Y_2, Y_n) \\ \vdots & \vdots & & \vdots \\ \sigma(Y_n, Y_1) & \sigma(Y_n, Y_2) & \cdots & \sigma^2(Y_n) \end{bmatrix} = \sigma^2 I

FITTED VALUES

\hat{Y} = \begin{bmatrix} b_0 + b_1 x_1 \\ b_0 + b_1 x_2 \\ \vdots \\ b_0 + b_1 x_n \end{bmatrix}

= \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = Xb

RESIDUALS

Model :

Y_{n \times 1} = X_{n \times 2}\,\beta_{2 \times 1} + \varepsilon_{n \times 1}

Fitted values: \hat{Y} = Xb

Residuals: e = Y - \hat{Y} = Y - Xb = Y - HY = (I - H)Y
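A short numerical sketch (hypothetical data) computing the hat matrix, checking the idempotency stated earlier, and confirming that (I - H)Y reproduces the residuals Y - Xb:

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5.])
y = np.array([2., 4., 5., 4., 6.])
X = np.column_stack([np.ones(len(x)), x])

H = X @ np.linalg.inv(X.T @ X) @ X.T     # the hat matrix
b = np.linalg.inv(X.T @ X) @ (X.T @ y)   # least squares estimates

print(np.allclose(H @ H, H))             # H is idempotent: HH = H
print(np.allclose(H, H.T))               # and symmetric

e = (np.eye(len(x)) - H) @ y             # residuals as (I - H) Y
print(np.allclose(e, y - X @ b))         # same as Y - Xb
```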

Like the Hat Matrix H, (I - H) is symmetric and idempotent.

VARIANCE OF RESIDUALS

e = (I - H)Y

\sigma^2(e) = (I - H)\,\sigma^2(Y)\,(I - H)' = (I - H)(\sigma^2 I)(I - H)' = \sigma^2 (I - H)(I - H)' = \sigma^2 (I - H)(I - H) = \sigma^2 (I - H)

s^2(e) = MSE\,(I - H)

REGRESSION COEFFICIENTS

\frac{\partial}{\partial \beta}(Q) = -2X'Y + 2X'X\beta = 0 \;\Leftrightarrow\; X'Y = X'Xb

b = (X'X)^{-1}X'Y = [(X'X)^{-1}X']Y = AY, \quad \text{where } A = (X'X)^{-1}X' \text{ and } A' = X(X'X)^{-1}

VARIANCE OF REGRESSION COEFFICIENTS

b = [(X'X)^{-1}X']Y = AY

\sigma^2(b) = A\,\sigma^2(Y)\,A' = A(\sigma^2 I)A' = (X'X)^{-1}X'\,\sigma^2 I\,X(X'X)^{-1} = \sigma^2 (X'X)^{-1}

s^2(b) = MSE\,(X'X)^{-1}

b = [(X'X)^{-1}X']Y = (X'X)^{-1}(X'Y) = \begin{bmatrix} \dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum (x - \bar{x})^2} & \dfrac{-\bar{x}}{\sum (x - \bar{x})^2} \\ \dfrac{-\bar{x}}{\sum (x - \bar{x})^2} & \dfrac{1}{\sum (x - \bar{x})^2} \end{bmatrix} \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}
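A sketch checking s²(b) = MSE (X'X)^{-1} numerically and comparing its slope entry with the scalar formula derived on the next slides (hypothetical data; MSE uses n - 2 degrees of freedom for SLR):

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5.])
y = np.array([2., 4., 5., 4., 6.])
n = len(x)
X = np.column_stack([np.ones(n), x])

b = np.linalg.inv(X.T @ X) @ (X.T @ y)
e = y - X @ b
MSE = (e @ e) / (n - 2)                  # SSE / (n - 2) for simple linear regression

s2_b = MSE * np.linalg.inv(X.T @ X)      # estimated variance-covariance matrix of b
print(s2_b)

# The (2,2) entry should equal MSE / sum (x - xbar)^2, the scalar formula for s^2(b1)
Sxx = ((x - x.mean())**2).sum()
print(MSE / Sxx, s2_b[1, 1])
```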

We can verify that this gives the same results, e.g.

b = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix}, \qquad b_1 = \frac{-\bar{x}\sum y + \sum xy}{\sum (x - \bar{x})^2} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}

EXAMPLE: SL REGRESSION

s^2(b) = MSE\,(X'X)^{-1}

(X'X)^{-1} = \begin{bmatrix} \dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum (x - \bar{x})^2} & \dfrac{-\bar{x}}{\sum (x - \bar{x})^2} \\ \dfrac{-\bar{x}}{\sum (x - \bar{x})^2} & \dfrac{1}{\sum (x - \bar{x})^2} \end{bmatrix}

s^2(b) = MSE \begin{bmatrix} \dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum (x - \bar{x})^2} & \dfrac{-\bar{x}}{\sum (x - \bar{x})^2} \\ \dfrac{-\bar{x}}{\sum (x - \bar{x})^2} & \dfrac{1}{\sum (x - \bar{x})^2} \end{bmatrix}

s^2(b_0) = MSE\left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum (x - \bar{x})^2} \right]

s^2(b_1) = \frac{MSE}{\sum (x - \bar{x})^2}

s(b_0, b_1) = \frac{-MSE\,\bar{x}}{\sum (x - \bar{x})^2}

Same results for the variances; the covariance is new. Only the mean and variance of X are needed.

THE MEAN RESPONSE

Let X = x_h denote the level of X for which we wish to estimate the mean response, i.e. E(Y|X = x_h). The only thing new is that X and x_h are vectors; x_h = (x_{1h}, x_{2h}, ..., x_{kh}). The point estimate of the mean response is:

\hat{E}(Y | X = x_h) = \hat{Y}_h

= b_0 + b_1 x_{1h} + b_2 x_{2h} + \ldots + b_k x_{kh}

In matrix terms:

\hat{Y}_h = X_h' b

\sigma^2(\hat{Y}_h) = X_h'\,\sigma^2(b)\,X_h = \sigma^2\, X_h'(X'X)^{-1}X_h

s^2(\hat{Y}_h) = MSE\,[X_h'(X'X)^{-1}X_h]

PREDICTION OF NEW OBSERVATION

Let X = x_h denote the level of X under investigation, at which the mean response is E(Y|X = x_h). Let Y_{h(new)} be the value of the new individual response of interest. The point estimate is still the same E(Y|X = x_h):

\hat{Y}_{h(new)} = b_0 + b_1 x_{1h} + b_2 x_{2h} + \ldots + b_k x_{kh} = X_h' b

Var(\hat{Y}_{h(new)}) = \sigma^2\{1 + X_h'(X'X)^{-1}X_h\}

s^2(\hat{Y}_{h(new)}) = MSE\{1 + X_h'(X'X)^{-1}X_h\}

The matrix notation and derivations may be deceiving because they hide enormous computational complexities. To find the inverse of a 10x10 matrix, for example X'X with k = 9, requires a tremendous amount of computation. However, the actual computations will be done by computer; hence it does not matter to us whether X'X represents a 2x2 or a 6x6 matrix.

Example: SBP versus AGE

Age (x)   SBP (y)   x-sq     y-sq      xy
42        130       1764     16900     5460
46        115       2116     13225     5290
42        148       1764     21904     6216
71        100       5041     10000     7100
80        156       6400     24336     12480
74        162       5476     26244     11988
70        151       4900     22801     10570
80        156       6400     24336     12480
85        162       7225     26244     13770
72        158       5184     24964     11376
64        155       4096     24025     9920
81        160       6561     25600     12960
41        125       1681     15625     5125
61        150       3721     22500     9150
75        165       5625     27225     12375
Totals: 984   2193   67954   325929   146260

X'X = \begin{bmatrix} n & \sum x \\ \sum x & \sum x^2 \end{bmatrix} = \begin{bmatrix} 15 & 984 \\ 984 & 67{,}954 \end{bmatrix}

D = (15)(67{,}954) - (984)^2 = 51{,}054

(X'X)^{-1} = \begin{bmatrix} \dfrac{67{,}954}{51{,}054} & \dfrac{-984}{51{,}054} \\ \dfrac{-984}{51{,}054} & \dfrac{15}{51{,}054} \end{bmatrix} = \begin{bmatrix} 1.3310 & -.0193 \\ -.0193 & .0003 \end{bmatrix}
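As a cross-check of this hand computation, and of the inverse and estimates on the next slides, a short numpy sketch using the Age/SBP data from the table:

```python
import numpy as np

age = np.array([42, 46, 42, 71, 80, 74, 70, 80, 85, 72, 64, 81, 41, 61, 75], float)
sbp = np.array([130, 115, 148, 100, 156, 162, 151, 156, 162, 158, 155, 160,
                125, 150, 165], float)
n = len(age)
X = np.column_stack([np.ones(n), age])

XtX = X.T @ X
print(XtX)                        # [[15, 984], [984, 67954]] as in the table
print(np.linalg.det(XtX))         # D = 51,054 (up to rounding)

XtX_inv = np.linalg.inv(XtX)
print(XtX_inv)                    # approximately [[1.3310, -.0193], [-.0193, .0003]]

XtY = X.T @ sbp
print(XtY)                        # [2193, 146260]

b = XtX_inv @ XtY                 # least squares estimates b0, b1
print(b)                          # roughly [100.0, 0.70]: SBP = b0 + b1 * Age
```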

−1 1.3310 −.0193 (X' X) =   −.0193 .0003   2,193  X' Y =   146,260 b = (X' X)−1(X' Y) 1.3310 −.0193 2,193  =    −.0193 .0003 146,260 VARIANCE OF REGRESSION COEFFICIENTS

b = (X'X)^{-1}(X'Y)

s^2(b) = MSE\,(X'X)^{-1} = MSE \begin{bmatrix} 1.3310 & -.0193 \\ -.0193 & .0003 \end{bmatrix}

SUMMARIES

• All results can be put in matrix form.
• If we can invert a matrix and can multiply two matrices, we can get all numerical results, even without a packaged computer program.
• In their matrix forms, the results are easily generalized; the only change needed is in the Design Matrix (and its dimension), so as to handle more than one predictor.

Readings & Exercises

• Readings: A thorough reading of the text's sections 5.1-5.13 (pp. 176-209) and sections 6.2-6.9 (pp. 222-247) is highly recommended.
• Exercises: The following exercises are good for practice, all from chapter 5 of the text: 5.1, 5.2, 5.7-5.11, and 5.24-5.26; plus these from chapter 6 of the text: 6.5(b-d), 6.7, 6.10(a-d), and 6.15(a-f).

Due As Homework

#12.1 The following data were collected during an experiment in which 10 laboratory animals were inoculated with a pathogen. The variables are Time after inoculation (X, in minutes) and Temperature (Y, in Celsius degrees). For the regression of Y (as dependent variable) on X (as sole predictor), form these matrices: Y'Y, X'Y, and X'X.

X     Y
24    38.8
28    39.5
32    40.3
36    40.7
40    41.0
44    41.1
48    41.4
52    41.6
56    41.8
60    41.9

#12.2 Solve the following system of equations:

7x - 6y = 12
3x + 9y = 25