APPENDIX A Vector Spaces

This appendix reviews some of the basic definitions and properties of vector spaces. It is presumed that, with the possible exception of Theorem A.14, all of the material presented here is familiar to the reader.

Definition A.1. A set ℳ ⊂ Rⁿ is a vector space if, for any x, y, z ∈ ℳ and scalars α, β, operations of vector addition and scalar multiplication are defined such that:

(1) (x + y) + z = x + (y + z).
(2) x + y = y + x.
(3) There exists a vector 0 ∈ ℳ such that x + 0 = x = 0 + x.
(4) For any x ∈ ℳ there exists y = -x such that x + y = 0 = y + x.
(5) α(x + y) = αx + αy.
(6) (α + β)x = αx + βx.
(7) (αβ)x = α(βx).
(8) 1 · x = x.

Definition A.2. Let ℳ be a vector space and let 𝒩 be a set with 𝒩 ⊂ ℳ. 𝒩 is a subspace of ℳ if and only if 𝒩 is a vector space.

Theorem A.3. Let ℳ be a vector space and let 𝒩 be a nonempty subset of ℳ. If 𝒩 is closed under addition and scalar multiplication, then 𝒩 is a subspace of ℳ.

Theorem A.4. Let ℳ be a vector space and let x_1, ..., x_r be in ℳ. The set of all linear combinations of x_1, ..., x_r is a subspace of ℳ.

Definition A.5. The set of all linear combinations of x_1, ..., x_r is called the space spanned by x_1, ..., x_r. If 𝒩 is a subspace of ℳ and 𝒩 equals the space spanned by x_1, ..., x_r, then {x_1, ..., x_r} is called a spanning set for 𝒩.

Vectors in Rⁿ will be considered as n × 1 matrices. The 0 vector referred to in Definition A.1 is just an n × 1 matrix of 0's. Let A be an n × p matrix. Each column of A is a vector in Rⁿ. The space spanned by the columns of A is called the column space of A and written C(A). If B is an n × r matrix, then C(A, B) is the space spanned by the columns of A and B.

Definition A.6. Let x_1, ..., x_r be vectors in ℳ. If there exist scalars α_1, ..., α_r, not all zero, so that Σ α_i x_i = 0, then x_1, ..., x_r are linearly dependent. If such α_i's do not exist, x_1, ..., x_r are linearly independent.

Definition A.7. If 𝒩 is a subspace of ℳ and if {x_1, ..., x_r} is a linearly independent spanning set for 𝒩, then {x_1, ..., x_r} is called a basis for 𝒩.

Theorem A.8. If 𝒩 is a subspace of ℳ, all bases for 𝒩 have the same number of vectors.

Theorem A.9. If v_1, ..., v_r is a basis for 𝒩 and x ∈ 𝒩, then the characterization x = Σ_{i=1}^r α_i v_i is unique.

PROOF. Suppose x = Σ_{i=1}^r α_i v_i and x = Σ_{i=1}^r β_i v_i. Then 0 = Σ_{i=1}^r (α_i - β_i)v_i. Since the vectors v_i are linearly independent, α_i - β_i = 0 for all i. □

Definition A.10. The rank of a subspace 𝒩 is the number of elements in a basis of 𝒩. The rank is written r(𝒩). If A is a matrix, the rank of C(A) is called the rank of A and is written r(A).

Definition A.11. Two vectors x and y are orthogonal (written x ⊥ y) if x'y = 0. (x' is the transpose of x.) Two subspaces 𝒩_1 and 𝒩_2 are orthogonal if x ∈ 𝒩_1 and y ∈ 𝒩_2 implies x'y = 0. {x_1, ..., x_r} is an orthogonal basis for a space 𝒩 if {x_1, ..., x_r} is a basis for 𝒩 and for i ≠ j, x_i'x_j = 0. {x_1, ..., x_r} is an orthonormal basis for 𝒩 if {x_1, ..., x_r} is an orthogonal basis and x_i'x_i = 1 for i = 1, ..., r. The terms orthogonal and perpendicular are used interchangeably.

Our emphasis on orthogonality and our need to find orthogonal projection matrices make both the following theorem and its proof fundamental tools in linear model theory.

Theorem A.12 (Gram-Schmidt Theorem). Let 𝒩 be a space with basis {x_1, ..., x_r}. There exists an orthonormal basis for 𝒩, {y_1, ..., y_r}, with y_s in the space spanned by x_1, ..., x_s, s = 1, ..., r.

PROOF. Define the y_i's inductively:

y_1 = (x_1'x_1)^{-1/2} x_1,

w_s = x_s - Σ_{i=1}^{s-1} (x_s'y_i) y_i,

y_s = (w_s'w_s)^{-1/2} w_s.

See Exercise A.1. □
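The inductive construction above translates directly into a computation. The following is a minimal numerical sketch in Python (assuming NumPy is available); the 4 × 3 matrix used at the end is made up purely for illustration.

import numpy as np

def gram_schmidt(X):
    """Orthonormalize the columns of X by the construction in Theorem A.12.

    Column s of the result lies in the span of the first s columns of X.
    Assumes the columns of X are linearly independent.
    """
    X = np.asarray(X, dtype=float)
    Y = np.zeros_like(X)
    for s in range(X.shape[1]):
        # w_s = x_s minus its projections on the y_i already constructed
        w = X[:, s] - Y[:, :s] @ (Y[:, :s].T @ X[:, s])
        Y[:, s] = w / np.sqrt(w @ w)   # normalize: y_s = (w_s'w_s)^{-1/2} w_s
    return Y

# Example: an orthonormal basis for the column space of a made-up 4 x 3 matrix.
X = np.array([[1., 1., 0.], [1., 0., 1.], [1., 1., 1.], [1., 0., 0.]])
Y = gram_schmidt(X)
print(np.round(Y.T @ Y, 10))   # prints the 3 x 3 identity

Each column y_s of the output depends only on the first s columns of the input, exactly as the theorem requires.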

Definition A.13. Let 𝒩⊥ = {y ∈ ℳ | y ⊥ 𝒩}. 𝒩⊥ is called the orthogonal complement of 𝒩 with respect to ℳ. If ℳ is taken as Rⁿ, then 𝒩⊥ is referred to as just the orthogonal complement of 𝒩.

Theorem A.14. Let ℳ be a vector space and let 𝒩 be a subspace of ℳ. The orthogonal complement of 𝒩 with respect to ℳ is a subspace of ℳ, and if x ∈ ℳ, x can be written uniquely as x = x_0 + x_1 with x_0 ∈ 𝒩 and x_1 ∈ 𝒩⊥. The ranks of these spaces satisfy the relation r(ℳ) = r(𝒩) + r(𝒩⊥).

PROOF. It is easily seen that 𝒩⊥ is a subspace by checking Theorem A.3. Let r(ℳ) = n and r(𝒩) = r. Let v_1, ..., v_r be a basis for 𝒩 and extend this with w_1, ..., w_{n-r} to a basis for ℳ. Apply Gram-Schmidt to get v_1*, ..., v_r*, w_1*, ..., w_{n-r}*, an orthonormal basis for ℳ with v_1*, ..., v_r* an orthonormal basis for 𝒩. If x ∈ ℳ, then

x = Σ_{i=1}^r α_i v_i* + Σ_{j=1}^{n-r} β_j w_j*.

Let x_0 = Σ_{i=1}^r α_i v_i* and x_1 = Σ_{j=1}^{n-r} β_j w_j*; then x_0 ∈ 𝒩, x_1 ∈ 𝒩⊥, and x = x_0 + x_1. To establish the uniqueness of the representation and the rank relationship, we need to establish that {w_1*, ..., w_{n-r}*} is a basis for 𝒩⊥. Since, by construction, the w_j*'s are linearly independent and w_j* ∈ 𝒩⊥, j = 1, ..., n - r, it suffices to show that {w_1*, ..., w_{n-r}*} is a spanning set for 𝒩⊥. If x ∈ 𝒩⊥, write

x = Σ_{i=1}^r α_i v_i* + Σ_{j=1}^{n-r} β_j w_j*.

However, since x'v_i* = 0, we have α_i = 0 for i = 1, ..., r and x = Σ_{j=1}^{n-r} β_j w_j*; thus {w_1*, ..., w_{n-r}*} is a spanning set and a basis for 𝒩⊥. To establish uniqueness, let x = y_0 + y_1 with y_0 ∈ 𝒩 and y_1 ∈ 𝒩⊥. Then y_0 = Σ_{i=1}^r γ_i v_i* and y_1 = Σ_{j=1}^{n-r} δ_j w_j*, so x = Σ_{i=1}^r γ_i v_i* + Σ_{j=1}^{n-r} δ_j w_j*. By the uniqueness of the representation of x under any basis, γ_i = α_i and δ_j = β_j for all i and j; thus x_0 = y_0 and x_1 = y_1. Since a basis has been found for each of ℳ, 𝒩, and 𝒩⊥, we have r(ℳ) = n, r(𝒩) = r, and r(𝒩⊥) = n - r. Thus, r(ℳ) = r(𝒩) + r(𝒩⊥). □
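As a small numerical illustration of the decomposition in Theorem A.14, the sketch below takes ℳ = R⁴ and 𝒩 = C(A) for a made-up matrix A, obtains x_0 by least squares, and checks that x_1 = x - x_0 is orthogonal to 𝒩 and that the ranks add up.

import numpy as np

A = np.array([[1., 0.], [1., 1.], [0., 1.], [2., 1.]])   # N = C(A), a subspace of R^4
x = np.array([3., 1., 4., 1.])

# x0 is the part of x lying in C(A); least squares gives the coefficients.
b, *_ = np.linalg.lstsq(A, x, rcond=None)
x0 = A @ b          # x0 in N
x1 = x - x0         # x1 in the orthogonal complement of N

print(np.round(A.T @ x1, 10))             # zero vector: x1 is orthogonal to C(A)
r_N = np.linalg.matrix_rank(A)
r_Nperp = 4 - r_N                          # r(M) = r(N) + r(N-perp) with M = R^4
print(r_N, r_Nperp, r_N + r_Nperp)         # 2 2 4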

Definition A.15. Let 𝒩_1 and 𝒩_2 be vector spaces. Then the sum of 𝒩_1 and 𝒩_2 is 𝒩_1 + 𝒩_2 = {x | x = x_1 + x_2, x_1 ∈ 𝒩_1, x_2 ∈ 𝒩_2}.

Exercises

EXERCISE A.1 Give a detailed proof of the Gram-Schmidt theorem.

Questions A.2 through A.13 involve the following matrices:

[The ten matrices A, B, D, E, F, G, H, K, L, and N are displayed here.]

EXERCISE A.2 Is the space spanned by the columns of A the same as the space spanned by the columns of B? How about the spaces spanned by the columns of K, L, F, D, and G?

EXERCISE A.3 Give a matrix whose column space contains C(A).

EXERCISE A.4 Give two matrices whose column spaces contain C(B).

EXERCISE A.5 Which of the following equalities are valid: C(A) = C(A, D), C(D) = C(A,B), C(A,N) = C(A), C(N) = C(A), C(A) = C(F), C(A) = C(G), C(A) = C(H), C(A) = C(D)?

EXERCISE A.6 Which of the following matrices have linearly independent columns: A,B,D,N,F,H,G?

EXERCISE A.7 Give a basis for the space spanned by the columns of each of the matrices listed: A, B, D, N, F, H, G.

EXERCISE A.8 Give the ranks of A, B, D, E, F, G, H, K, L, and N.

EXERCISE A.9 Which of the following matrices have columns that are mutually orthogonal: B, A, D?

EXERCISE A.10 Give an orthogonal basis for the space spanned by the columns of each of the matrices listed: A, D, N, K, H, G.

EXERCISE A.11 Find C(A)⊥ and C(B)⊥ (with respect to Rⁿ).

EXERCISE A.12 Find two linearly independent vectors in the orthogonal complement of C(D) (with respect to Rⁿ).

EXERCISE A.13 Find a vector in the orthogonal complement of C(D) with respect to C(A).

EXERCISE A.14 Find an orthogonal basis for the space spanned by the columns of

[The matrix X is displayed here.]

EXERCISE A.15 Find two linearly independent vectors in the orthogonal complement of C(X) (with respect to Rⁿ).

EXERCISE A.16 Let X be an n × p matrix. Prove or disprove the following statement: every vector in Rⁿ is either a vector in C(X), C(X)⊥, or both.

EXERCISE A.17 Prove that C(A) and the null space of A' are orthogonal complements.

APPENDIX B Matrices

This appendix reviews standard ideas in matrix theory, with emphasis given to important results that are less commonly taught in a junior-senior level linear algebra course. The appendix begins with basic definitions and results. A subsection devoted to eigenvalues and their applications follows. This subsection contains a number of standard definitions, but it also contains a number of very specific results that are unlikely to be familiar to people with only an undergraduate background in linear algebra. The penultimate subsection is devoted to an intensive (brief but detailed) examination of projections and their properties. The appendix closes with some miscellaneous results.

Basic Ideas

Definition B.1. Any matrix with the same number of rows and columns is called a square matrix.

Definition B.2. Let A = [aij] be a matrix. The transpose of A, written A', is the matrix A' = [aji].

Definition B.3. If A = A', then A is called symmetric. Note that only square matrices can be symmetric.

Definition B.4. If A = [a_ij] is a square matrix and a_ij = 0 for i ≠ j, then A is a diagonal matrix. If λ_1, ..., λ_n are scalars, then D(λ_i) and Diag(λ_i) are used to indicate an n × n matrix D = [d_ij] with d_ij = 0 for i ≠ j and d_ii = λ_i. A diagonal matrix with all 1's on the diagonal is called an identity matrix and is denoted by I.

If A = [a_ij] is n × p and B = [b_ij] is n × q, we can write an n × (p + q) matrix C = [A, B], where c_ij = a_ij, i = 1, ..., n, j = 1, ..., p, and c_ij = b_i(j-p), i = 1, ..., n, j = p + 1, ..., p + q. This notation can be extended in obvious ways, e.g.,

C' = [A']
     [B'].

Definition B.5. Let A = [a_ij] be an r × c matrix and let B be an s × d matrix. The Kronecker product of A and B, written A ⊗ B, is an r × c matrix of s × d matrices. The matrix in the i'th row and j'th column is a_ij B. In total, A ⊗ B is an rs × cd matrix.

Definition B.6. Let A be an r × c matrix and write A = [A_1, A_2, ..., A_c], where A_i is the i'th column of A. Then the vec operator stacks the columns of A into an rc × 1 vector; thus,

[Vec(A)]' = [A_1', A_2', ..., A_c'].

EXAMPLE B.7.

A = [1  4]       B = [1  3]
    [2  5],          [0  4],

        [1·B  4·B]   [1  3   4  12]
A ⊗ B = [         ] = [0  4   0  16]
        [2·B  5·B]   [2  6   5  15]
                     [0  8   0  20],

Vec(A) = [1, 2, 4, 5]'.
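The computations in Example B.7 are easy to reproduce; a brief sketch using NumPy's kron and a column-major reshape for Vec:

import numpy as np

A = np.array([[1, 4], [2, 5]])
B = np.array([[1, 3], [0, 4]])

print(np.kron(A, B))
# [[ 1  3  4 12]
#  [ 0  4  0 16]
#  [ 2  6  5 15]
#  [ 0  8  0 20]]

# Vec stacks the columns of A; order='F' gives column-major (Fortran) order.
print(A.reshape(-1, order='F'))   # [1 2 4 5]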

Definition B.8. Let A be an n × n matrix. A is nonsingular if there exists a matrix A^{-1} such that A^{-1}A = I = AA^{-1}. If no such matrix exists, then A is singular. If A^{-1} exists, it is called the inverse of A.

Theorem B.9. An n × n matrix A is nonsingular if and only if r(A) = n; i.e., the columns of A form a basis for Rⁿ.

Corollary B.10. If A is singular, then there exists x nonzero such that Ax = O.

The set of all x such that Ax = 0 is easily seen to be a vector space.

Definition B.11. The set of all x such that Ax = 0 is called the null space of A.

Theorem B.12. If r(A) = r, then the rank of the null space of A is n - r.

Eigenvalues and Related Results

The material in this subsection deals with eigenvalues and eigenvectors either in the statements of the results or in their proofs. Again, this is meant to be a brief review of important concepts, but in addition, there are a number of specific results that may be unfamiliar.

Definition B.13. The scalar λ is an eigenvalue of A if A - λI is singular. λ is an eigenvalue of multiplicity s if the rank of the null space of A - λI is s. A nonzero vector x is an eigenvector of A corresponding to the eigenvalue λ if x is in the null space of A - λI; i.e., Ax = λx.

Note that if λ is an eigenvalue of A, the eigenvectors corresponding to λ (with the vector 0) form a subspace of C(A). If A is a symmetric matrix and γ and λ are distinct eigenvalues, then the eigenvectors corresponding to λ and γ are orthogonal. To see this, let x be an eigenvector for λ and y an eigenvector for γ. Then λx'y = x'Ay = γx'y, which can happen only if λ = γ or if x'y = 0. Since λ and γ are distinct, we have x'y = 0.

Let λ_1, ..., λ_r be the distinct nonzero eigenvalues of a symmetric matrix A with respective multiplicities s(1), ..., s(r). Let v_{i1}, ..., v_{is(i)} be a basis for the space of eigenvectors of λ_i. We want to show that v_{11}, v_{12}, ..., v_{rs(r)} is a basis for C(A). Suppose v_{11}, v_{12}, ..., v_{rs(r)} is not a basis. Since v_{ij} ∈ C(A) and the v_{ij}'s are linearly independent, we can pick x ∈ C(A) with x ⊥ v_{ij} for all i and j. Note that since Av_{ij} = λ_i v_{ij}, we have (A)^p v_{ij} = (λ_i)^p v_{ij}. In particular, x'(A)^p v_{ij} = x'λ_i^p v_{ij} = λ_i^p x'v_{ij} = 0, so A^p x ⊥ v_{ij} for any i, j, and p. The vectors x, Ax, A²x, ... cannot all be linearly independent, so there exists a smallest value k ≤ n so that

A^k x + b_{k-1} A^{k-1} x + ... + b_0 x = 0.

Since there is a solution to this, for some real number μ we can write the equation as

(A - μI)(A^{k-1} x + γ_{k-2} A^{k-2} x + ... + γ_0 x) = 0,

and μ is an eigenvalue. (See Exercise B.1.) An eigenvector for μ is y = A^{k-1} x + ... + γ_0 x. Clearly, y ⊥ v_{ij} for any i and j. Since k was chosen as the smallest value to get linear dependence, we have y ≠ 0. If μ ≠ 0, y is an eigenvector that does not correspond to any of λ_1, ..., λ_r, a contradiction. If μ = 0, we have Ay = 0, and since A is symmetric, y is a vector in C(A) that is orthogonal to every other vector in C(A); i.e., y = 0, a contradiction. We have proven

Theorem B.14. If A is a symmetric matrix, then there exists a basis for C(A) consisting of eigenvectors of nonzero eigenvalues. If λ is a nonzero eigenvalue of multiplicity s, then the basis will contain s eigenvectors for λ.

If λ is an eigenvalue of A with multiplicity s, then we can think of λ as being an eigenvalue s times. With this convention, the rank of A is the number of nonzero eigenvalues. The total number of eigenvalues is n if A is an n × n matrix. For a symmetric matrix A, if we use eigenvectors corresponding to the zero eigenvalue, we can get a basis for Rⁿ consisting of eigenvectors. We already have a basis for C(A), and the eigenvectors of 0 are the null space of A. For A symmetric, C(A) and the null space of A are orthogonal complements. Let λ_1, ..., λ_n be the eigenvalues of a symmetric matrix A. Let v_1, ..., v_n denote a basis of eigenvectors for Rⁿ, with v_i being an eigenvector for λ_i for any i.

Theorem B.15. There exists an orthonormal basis for Rⁿ consisting of eigenvectors of A.

PROOF. Assume λ_{i1} = ... = λ_{ik} are all the λ_i's equal to any particular value λ, and let v_{i1}, ..., v_{ik} be a basis for the space of eigenvectors for λ. By Gram-Schmidt there exists an orthonormal basis w_{i1}, ..., w_{ik} for the space of eigenvectors corresponding to λ. If we do this for each distinct eigenvalue, we get a collection of orthonormal sets that form a basis for Rⁿ. Since, as we have seen, for λ_i ≠ λ_j any eigenvector for λ_i is orthogonal to any eigenvector for λ_j, the basis is orthonormal. □

Definition B.16. A square matrix P is orthogonal if P' = P^{-1}. Note that if P is orthogonal, so is P'.

Theorem B.17. P (n × n) is orthogonal if and only if the columns of P form an orthonormal basis for Rⁿ.

PROOF. Necessity: It is clear that if the columns of P form an orthonormal basis for Rⁿ, then P'P = I. Sufficiency: Since P is nonsingular, the columns of P form a basis for Rⁿ. Since P'P = I, the basis is orthonormal. □

Corollary B.18. P (n × n) is orthogonal if and only if the rows of P form an orthonormal basis for Rⁿ.

PROOF. P is orthogonal if and only if P' is orthogonal, if and only if the columns of P' are an orthonormal basis, if and only if the rows of P are an orthonormal basis. □

Theorem B.19. If A is an n × n symmetric matrix, then there exists an orthogonal matrix P such that P'AP = Diag(λ_i), where λ_1, λ_2, ..., λ_n are the eigenvalues of A.

PROOF. Let v_1, v_2, ..., v_n be an orthonormal set of eigenvectors of A corresponding respectively to λ_1, λ_2, ..., λ_n. Let P = [v_1, ..., v_n]; then

P'AP = [v_1, ..., v_n]'[Av_1, ..., Av_n]
     = [v_1, ..., v_n]'[λ_1 v_1, ..., λ_n v_n]
     = [λ_j v_i'v_j]
     = Diag(λ_i),

since v_i'v_j = 1 for i = j and v_i'v_j = 0 for i ≠ j. □

Corollary B.20. A = P D(λ_i) P'.
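Theorem B.19 and Corollary B.20 can be checked numerically with a symmetric eigensolver. A minimal sketch, using an arbitrary symmetric matrix chosen only for illustration:

import numpy as np

A = np.array([[4., 1., 0.], [1., 3., 1.], [0., 1., 2.]])   # any symmetric matrix

lam, P = np.linalg.eigh(A)     # eigenvalues and an orthonormal set of eigenvectors
print(np.round(P.T @ P, 10))                    # P'P = I, so P is orthogonal
print(np.round(P.T @ A @ P, 10))                # P'AP = Diag(lambda_i)  (Theorem B.19)
print(np.allclose(A, P @ np.diag(lam) @ P.T))   # A = P D(lambda_i) P'   (Corollary B.20)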

Definition B.21. A symmetric matrix A is positive (nonnegative) definite if for any nonzero vector v ∈ Rⁿ, v'Av is positive (nonnegative).

Theorem B.22. A is nonnegative definite if and only if there exists a square matrix Q such that A = QQ'.

PROOF. Necessity: We know that there exists P, orthogonal, with P'AP = Diag(λ_i). The λ_i's must all be nonnegative, because if e_i = (0, ..., 0, 1, 0, ..., 0)' with the 1 in the i'th place and we let v = Pe_i, then 0 ≤ v'Av = e_i' Diag(λ_i) e_i = λ_i. Let Q = P Diag(λ_i^{1/2}); then, since P Diag(λ_i) P' = A, we have

QQ' = P Diag(λ_i^{1/2}) Diag(λ_i^{1/2}) P' = P Diag(λ_i) P' = A.

Sufficiency: If A = QQ', then v'Av = (Q'v)'(Q'v) ≥ 0. □
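The necessity argument is constructive: Q = P Diag(λ_i^{1/2}) does the job. A short numerical sketch, with a nonnegative definite matrix manufactured as R'R so that its eigenvalues are guaranteed nonnegative:

import numpy as np

# A nonnegative definite matrix built as R'R.
R = np.array([[1., 2., 0.], [0., 1., 1.]])
A = R.T @ R

lam, P = np.linalg.eigh(A)
lam = np.clip(lam, 0.0, None)          # guard against tiny negative rounding error
Q = P @ np.diag(np.sqrt(lam))          # Q = P Diag(sqrt(lambda_i))
print(np.allclose(A, Q @ Q.T))         # A = QQ'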

Corollary B.23. A is positive definite if and only if Q is nonsingular for any choice of Q.

PROOF. There exists v ≠ 0 such that v'Av = 0 if and only if there exists v ≠ 0 such that Q'v = 0, which occurs if and only if Q' is singular. The contrapositive of this is that v'Av > 0 for all v ≠ 0 if and only if Q' is nonsingular. □

Theorem B.24. If A is an n × n nonnegative definite matrix with nonzero eigenvalues λ_1, ..., λ_r, then there exists an n × r matrix Q = Q_1 Q_2^{-1} such that Q_1 has orthonormal columns, C(Q_1) = C(A), Q_2 is diagonal and nonsingular, and Q'AQ = I.

PROOF. Let v_1, ..., v_n be an orthonormal basis of eigenvectors with v_1, ..., v_r corresponding to λ_1, ..., λ_r. Let Q_1 = [v_1, ..., v_r]. By an argument similar to that in Theorem B.19, Q_1'AQ_1 = Diag(λ_i), i = 1, ..., r. Now take Q_2 = Diag(λ_i^{1/2}) and Q = Q_1 Q_2^{-1}. □

Corollary B.25. Let W = Q_1 Q_2; then WW' = A.

PROOF. Since Q'AQ = Q_2^{-1} Q_1'AQ_1 Q_2^{-1} = I and Q_2 is symmetric, Q_1'AQ_1 = Q_2 Q_2'. Multiplying gives

Q_1 Q_1'AQ_1 Q_1' = (Q_1 Q_2)(Q_2'Q_1') = WW',

but Q_1 Q_1' is the perpendicular projection matrix onto C(A), so Q_1 Q_1'AQ_1 Q_1' = A (cf. Definition B.31 and Theorem B.35). □

Corollary B.26. AQQ'A = A and QQ'AQQ' = QQ'.

PROOF. AQQ'A = WW'QQ'WW' = W Q_2 Q_1'Q_1 Q_2^{-1} Q_2^{-1} Q_1'Q_1 Q_2 W' = WW' = A.
QQ'AQQ' = QQ'WW'QQ' = Q Q_2^{-1} Q_1'Q_1 Q_2 Q_2 Q_1'Q_1 Q_2^{-1} Q' = QQ'. □

Definition B.27. Let A = [a_ij] be an n × n matrix. The trace of A is tr(A) = Σ_{i=1}^n a_ii.

Theorem B.28. For matrices A (r × s) and B (s × r), tr(AB) = tr(BA).

Theorem B.29. If A is a symmetric matrix, tr(A) = Σ_{i=1}^n λ_i, where λ_1, ..., λ_n are the eigenvalues of A.

PROOF. A = P D(λ_i) P' with P orthogonal, so

tr(A) = tr(P D(λ_i) P') = tr(D(λ_i) P'P) = tr(D(λ_i)) = Σ_{i=1}^n λ_i. □

In fact, a stronger result is true. We give it without proof.

Theorem B.30. tr(A) = Σ_{i=1}^n λ_i, where λ_1, ..., λ_n are the eigenvalues of A.

Projections

This subsection is devoted primarily to a discussion of perpendicular projection operators. It begins with their definition, some basic properties, and two important characterizations, Theorems B.33 and B.35. A third important characterization, Theorem B.44, involves generalized inverses. Generalized inverses are defined, briefly studied, and applied to projection operators. The subsection continues with the examination of the relationships between two perpendicular projection operators and closes with discussions of the Gram-Schmidt theorem, eigenvalues of projection operators, and oblique (nonperpendicular) projection operators. We begin by defining a perpendicular projection operator onto an arbitrary space. To be consistent with later usage we denote the arbitrary space as C(X) for some matrix X.

Definition B.31. M is a perpendicular projection operator (matrix) on C(X) if and only if:

(i) v ∈ C(X) implies Mv = v (projection).
(ii) w ⊥ C(X) implies Mw = 0 (perpendicularity).

Proposition B.32. If M is a perpendicular projection operator onto C(X), then C(M) = C(X).

PROOF. See Exercise B.2. D

Theorem B.33. M is a perpendicular projection operator on C(M) if and only if M2 = M and M' = M.

PROOF. Sufficiency: To show that M² = M, it is enough to show that M²v = Mv for any v ∈ Rⁿ. Write v = v_1 + v_2, where v_1 ∈ C(M) and v_2 ⊥ C(M); then, using Definition B.31, M²v = M²v_1 + M²v_2 = M²v_1 = Mv_1 = Mv_1 + Mv_2 = Mv. To see that M' = M, let w = w_1 + w_2 with w_1 ∈ C(M) and w_2 ⊥ C(M). Since (I - M)v = (I - M)v_2 = v_2, we get

w'M'(I - M)v = w'M'(I - M)v_2 = w_1'v_2 = 0.

This is true for any v and w, so we have M'(I - M) = 0 or M' = M'M. Since M'M is symmetric, M' must also be symmetric. Necessity: If M² = M and v ∈ C(M), then since v = Mb we have Mv = MMb = Mb = v. If M' = M and w ⊥ C(M), then Mw = M'w = 0 because the columns of M are in C(M). □

Proposition B.34. Perpendicular projection operators are unique.

PROOF. Let M and P be perpendicular projection operators onto some space ℳ. Let v ∈ Rⁿ and write v = v_1 + v_2, with v_1 ∈ ℳ and v_2 ⊥ ℳ. Since v is arbitrary and Mv = v_1 = Pv, we have M = P. □

For any matrix X, we will now find two ways to characterize the perpendicular projection operator onto C(X). The first method depends on the Gram-Schmidt theorem; the second depends on the concept of a generalized inverse.

Theorem B.35. Let o_1, ..., o_r be an orthonormal basis for C(X), and let O = [o_1, ..., o_r]. Then OO' = Σ_{i=1}^r o_i o_i' is the perpendicular projection operator onto C(X).

PROOF. OO' is symmetric and OO'OO' = O I_r O' = OO', so by Theorem B.33 it only remains to show that C(OO') = C(X). Clearly C(OO') ⊂ C(O) = C(X). On the other hand, if v ∈ C(O), then v = Ob for some vector b in R^r and v = Ob = O I_r b = OO'Ob, so clearly v ∈ C(OO'). □
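Theorem B.35 gives a direct recipe for the perpendicular projection operator: orthonormalize the columns of X and form OO'. In the sketch below the orthonormal basis is taken from a QR factorization rather than hand-coded Gram-Schmidt; the matrix X is made up for the example.

import numpy as np

X = np.array([[1., 1.], [1., 2.], [1., 3.], [1., 4.]])

O, _ = np.linalg.qr(X)        # columns of O: an orthonormal basis for C(X)
M = O @ O.T                   # perpendicular projection operator onto C(X)

print(np.allclose(M, M.T), np.allclose(M @ M, M))   # symmetric and idempotent (Theorem B.33)
print(np.allclose(M @ X, X))                        # Mv = v for v in C(X)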

One use of Theorem B.35 is that, given a matrix X, one can use the Gram-Schmidt theorem to get an orthonormal basis for C(X) and thus obtain the perpendicular projection operator. We now examine properties of generalized inverses. Generalized inverses are a generalization of the concept of the inverse of a matrix. Although the most common use of generalized inverses is in solving systems of linear equations, our interest lies primarily in their relationship to projection operators. The discussion below is given for an arbitrary matrix A.

Definition B.36. A generalized inverse of a matrix A is any matrix G such that AGA = A. The notation A^- is used to indicate a generalized inverse of A.

Theorem B.37. If A is nonsingular, the unique generalized inverse of A is A^{-1}.

PROOF. AA^{-1}A = IA = A, so A^{-1} is a generalized inverse. If AA^-A = A, then AA^- = AA^-AA^{-1} = AA^{-1} = I, so A^- is the inverse of A. □

Theorem B.38. For any symmetric matrix A, there exists a generalized inverse of A.

PROOF. There exists P orthogonal so that PAP' = D(λ_i) and A = P'D(λ_i)P. Let

γ_i = 1/λ_i  if λ_i ≠ 0,   γ_i = 0  if λ_i = 0,

and G = P'D(γ_i)P. We now show that G is a generalized inverse of A. P is orthogonal, so PP' = I, and

AGA = P'D(λ_i)PP'D(γ_i)PP'D(λ_i)P = P'D(λ_i)D(γ_i)D(λ_i)P = P'D(λ_i)P = A. □

Although this is the only existence result we really need, we will later show that generalized inverses exist for arbitrary matrices.
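The construction in Theorem B.38 is easy to carry out numerically: diagonalize, invert the nonzero eigenvalues, and transform back. A minimal sketch (for symmetric A this produces the Moore-Penrose inverse, which is one particular generalized inverse):

import numpy as np

def symmetric_ginv(A, tol=1e-12):
    # A generalized inverse of a symmetric matrix A, built as in Theorem B.38.
    # numpy's eigh returns A = P D(lambda_i) P', the transpose of the convention
    # in the proof; this does not affect the result.
    lam, P = np.linalg.eigh(A)
    gamma = np.array([1.0 / l if abs(l) > tol else 0.0 for l in lam])
    return P @ np.diag(gamma) @ P.T          # G = P D(gamma_i) P'

A = np.array([[2., 1., 3.],
              [1., 1., 2.],
              [3., 2., 5.]])                 # symmetric and singular (col 3 = col 1 + col 2)
G = symmetric_ginv(A)
print(np.allclose(A @ G @ A, A))             # AGA = A, so G is a generalized inverse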

Theorem B.39. If G1 and G2 are generalized inverses of A, then so is G1 AG2 .

PROOF. A(G1 AG2 )A = (AG1 A)G2 A = AG2 A = A. D

For A symmetric, A^- need not be symmetric.

EXAMPLE B.40. Consider the matrix

[a    b   ]
[b  b^2/a].

It has a generalized inverse

[1/a  0]
[ 0   0],

and in fact, by considering the equation

[a    b   ] [r  s] [a    b   ]   [a    b   ]
[b  b^2/a] [t  u] [b  b^2/a] = [b  b^2/a],

it can be shown that if r = 1/a, then any solution of at + as + bu = 0 gives a generalized inverse.

Corollary B.41. For A symmetric, there exists A^- such that A^-AA^- = A^- and (A^-)' = A^-.

PROOF. Take A^- as the generalized inverse in Theorem B.38. Clearly, A^- = P'D(γ_i)P is symmetric, and

A^-AA^- = P'D(γ_i)PP'D(λ_i)PP'D(γ_i)P = P'D(γ_i)D(λ_i)D(γ_i)P = P'D(γ_i)P = A^-. □

Definition B.42. A generalized inverse A^- for a matrix A that has the property A^-AA^- = A^- is said to be reflexive.

Corollary B.41 establishes the existence of a reflexive generalized inverse for any symmetric matrix. Note that Corollary B.26 previously established the existence of a reflexive generalized inverse for any nonnegative definite matrix. Generalized inverses are of interest in that they provide an alternative to the characterization of perpendicular projection matrices given in Theorem B.35. The two results immediately below characterize the perpendicular projection matrix onto C(X).

Lemma B.43. If G and H are generalized inverses of (X'X), then
(i) XGX'X = XHX'X = X.
(ii) XGX' = XHX'.

PROOF. For v ∈ Rⁿ, let v = v_1 + v_2 with v_1 ∈ C(X) and v_2 ⊥ C(X). Also let v_1 = Xb for some vector b. Then

v'XGX'X = v_1'XGX'X = b'(X'X)G(X'X) = b'(X'X) = v'X.

Since v and G are arbitrary, we have shown (i).

To see (ii), observe that for the arbitrary vector v above,

XGX'v = XGX'Xb = XHX'Xb = XHX'v. □

Since X'X is symmetric, there exists a generalized inverse (X'X)^- that is symmetric, so X(X'X)^-X' is symmetric. By the above lemma, X(X'X)^-X' must be symmetric for any choice of (X'X)^-.

Theorem B.44. X(X'X)^-X' is the perpendicular projection operator onto C(X).

PROOF. We need to establish conditions (i) and (ii) of Definition B.31. (i) For v ∈ C(X), write v = Xb, so by Lemma B.43, X(X'X)^-X'v = X(X'X)^-X'Xb = Xb = v. (ii) If w ⊥ C(X), then X(X'X)^-X'w = 0. □
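Theorem B.44 can be verified numerically even when X has linearly dependent columns, since by Lemma B.43 any choice of (X'X)^- gives the same projection. A sketch using the Moore-Penrose inverse as the particular (X'X)^-; the matrix X is made up, with its third column equal to the sum of the first two:

import numpy as np

X = np.array([[1., 0., 1.], [1., 1., 2.], [1., 2., 3.], [1., 3., 4.]])

G = np.linalg.pinv(X.T @ X)        # one choice of (X'X)^-
M = X @ G @ X.T                    # perpendicular projection operator onto C(X)

print(np.allclose(M, M.T), np.allclose(M @ M, M))   # symmetric, idempotent
print(np.allclose(M @ X, X))                        # projects C(X) onto itself
print(np.linalg.matrix_rank(M))                     # 2 = r(X)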

The next five results examine the relationships between two perpendicular projection matrices.

Theorem B.45. Let M_1 and M_2 be perpendicular projection matrices on Rⁿ. (M_1 + M_2) is the perpendicular projection matrix onto C(M_1, M_2) if and only if C(M_1) ⊥ C(M_2).

PROOF.
(a) If C(M_1) ⊥ C(M_2), then M_1M_2 = M_2M_1 = 0. Because

(M_1 + M_2)² = M_1² + M_2² + M_1M_2 + M_2M_1 = M_1² + M_2² = M_1 + M_2

and

(M_1 + M_2)' = M_1' + M_2' = M_1 + M_2,

M_1 + M_2 is the perpendicular projection matrix onto C(M_1 + M_2). Clearly C(M_1 + M_2) ⊂ C(M_1, M_2). To see that C(M_1, M_2) ⊂ C(M_1 + M_2), write v = M_1b_1 + M_2b_2. Then, because M_1M_2 = M_2M_1 = 0, (M_1 + M_2)v = v. Thus, C(M_1, M_2) = C(M_1 + M_2).
(b) If M_1 + M_2 is a perpendicular projection matrix, then

(M_1 + M_2) = (M_1 + M_2)² = M_1² + M_2² + M_1M_2 + M_2M_1 = M_1 + M_2 + M_1M_2 + M_2M_1.

Thus, M_1M_2 + M_2M_1 = 0. Multiplying by M_1 gives 0 = M_1²M_2 + M_1M_2M_1 = M_1M_2 + M_1M_2M_1, and -M_1M_2M_1 = M_1M_2. Since -M_1M_2M_1 is symmetric, so is M_1M_2. This gives M_1M_2 = (M_1M_2)' = M_2M_1, so the condition M_1M_2 + M_2M_1 = 0 becomes 2(M_1M_2) = 0 or M_1M_2 = 0. By symmetry, this says that the columns of M_1 are orthogonal to the columns of M_2. □

Theorem B.46. If M_1 and M_2 are symmetric, C(M_1) ⊥ C(M_2), and (M_1 + M_2) is a perpendicular projection matrix, then M_1 and M_2 are perpendicular projection matrices.

PROOF.

(M_1 + M_2) = (M_1 + M_2)² = M_1² + M_2² + M_1M_2 + M_2M_1.

Since M_1 and M_2 are symmetric with C(M_1) ⊥ C(M_2), we have M_1M_2 = M_2M_1 = 0, and M_1 + M_2 = M_1² + M_2². Rearranging gives M_2 - M_2² = M_1² - M_1, so C(M_2 - M_2²) = C(M_1² - M_1). Now C(M_2 - M_2²) ⊂ C(M_2) and C(M_1² - M_1) ⊂ C(M_1), so C(M_2 - M_2²) ⊥ C(M_1² - M_1). The only way a vector space can be orthogonal to itself is if it consists only of the zero vector. Thus, M_2 - M_2² = M_1² - M_1 = 0, so M_2 = M_2² and M_1 = M_1². □

Theorem B.47. Let M and M_0 be perpendicular projection matrices with C(M_0) ⊂ C(M); then M - M_0 is a perpendicular projection matrix.

PROOF. Since C(M_0) ⊂ C(M), MM_0 = M_0 and by symmetry M_0M = M_0. Checking the conditions of Theorem B.33, we see that (M - M_0)² = M² - MM_0 - M_0M + M_0² = M - M_0 - M_0 + M_0 = M - M_0, and (M - M_0)' = M - M_0. □

Theorem B.48. Let M and M_0 be perpendicular projection matrices with C(M_0) ⊂ C(M); then C(M - M_0) is the orthogonal complement of C(M_0) with respect to C(M).

PROOF. C(M - M_0) ⊥ C(M_0) because (M - M_0)M_0 = MM_0 - M_0² = M_0 - M_0 = 0. Thus, C(M - M_0) is contained in the orthogonal complement of C(M_0) with respect to C(M). If x ∈ C(M) and x ⊥ C(M_0), then x = Mx = (M - M_0)x + M_0x = (M - M_0)x. Thus, x ∈ C(M - M_0), and the orthogonal complement of C(M_0) with respect to C(M) is contained in C(M - M_0). □

Corollary B.49. r(M) = r(Mo) + r(M - Mo).

At this point, we will examine the relationship between perpendicular projection operators and the Gram-Schmidt Theorem A.12. Recall that in the Gram-Schmidt theorem x_1, ..., x_r denotes the original basis and y_1, ..., y_r denotes the orthonormalized basis. Let

M_s = Σ_{i=1}^s y_i y_i'.

By Theorem B.35, M_s is the perpendicular projection operator onto C(x_1, ..., x_s). Now define

w_{s+1} = (I - M_s) x_{s+1}.

Thus, w_{s+1} is the projection of x_{s+1} onto the orthogonal complement of C(x_1, ..., x_s). Finally, y_{s+1} is just w_{s+1} normalized.

We now take a brief look at the eigenvalues of a perpendicular projection operator M. Let v_1, ..., v_r be a basis for C(M); then Mv_i = v_i, so v_i is an eigenvector of M with eigenvalue one. In fact, one is an eigenvalue of M with multiplicity r. Now, let w_1, ..., w_{n-r} be a basis for C(M)⊥. It follows that zero is an eigenvalue of M with multiplicity n - r. We have now completely characterized the n eigenvalues of M. Since tr(M) equals the sum of the eigenvalues, we have that tr(M) = r(M). In fact, for any n × n matrix A with A² = A, any basis for C(A) is a basis for the space of eigenvectors for the eigenvalue 1. The null space of A is the space of eigenvectors for the eigenvalue 0. The rank of A and the rank of the null space of A add to n, and A has n eigenvalues, so all the eigenvalues are accounted for. Again, tr(A) = r(A).

Definition B.50.
(a) If the square matrix A has the property A² = A, then A is called idempotent.
(b) If the square matrix A has the property that Av = v for any v ∈ C(A), then A is called the projection operator (matrix) onto C(A) along C(A')⊥. (Note that C(A')⊥ is the null space of A.)

Any projection operator that is not a perpendicular projection operator is referred to as an oblique projection operator. Any idempotent matrix A is a projection operator (either perpendicular or oblique) onto C(A), because if v ∈ C(A) we can write v = Ab, so that Av = AAb = Ab = v. To show that a matrix A is a projection operator onto an arbitrary space, say C(X), it is necessary to show that C(A) = C(X) and that for x ∈ C(X), Ax = x. A typical proof runs in the following pattern. First show that Ax = x for any x ∈ C(X). This also establishes that C(X) ⊂ C(A). To finish the proof, it suffices to show that Av ∈ C(X) for any v ∈ Rⁿ, because this implies that C(A) ⊂ C(X).

In this book, our use of the word perpendicular is based on using the standard inner product, which defines Euclidean distance. In other words, for two vectors x and y their inner product is x'y. By definition, the vectors x and y are orthogonal if their inner product is zero. The length of a vector x is defined as the square root of the inner product of x with itself, i.e., [x'x]^{1/2}. These concepts can be generalized. For a positive definite matrix B we can define an inner product between x and y as x'By. As before, x and y are orthogonal if their inner product is zero, and the length of x is the square root of its inner product with itself (now [x'Bx]^{1/2}). As argued above, any idempotent matrix is always a projection operator, but which one is the perpendicular projection operator depends on the inner product. As can be seen from Proposition 2.7.2 and Exercise 2.5, the matrix X(X'BX)^-X'B is an oblique projection onto C(X) for the standard inner product, while it is the perpendicular projection operator onto C(X) with the inner product defined using the matrix B.
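The distinction drawn in the last paragraph can be checked directly. The sketch below forms A = X(X'BX)^-X'B for a made-up X and positive definite B: A is idempotent with C(A) = C(X) but not symmetric, so it is an oblique projection for the standard inner product, while BA is symmetric, reflecting perpendicularity with respect to the inner product x'By.

import numpy as np

X = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 3.]])
B = np.diag([1., 2., 3., 4.])                      # a positive definite weight matrix

A = X @ np.linalg.pinv(X.T @ B @ X) @ X.T @ B      # X(X'BX)^- X'B

print(np.allclose(A @ A, A))          # True: idempotent, so a projection operator
print(np.allclose(A, A.T))            # False: not perpendicular for the standard inner product
print(np.allclose(A @ X, X))          # True: projects onto C(X)
print(np.allclose(B @ A, (B @ A).T))  # True: self-adjoint for the x'By inner product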

Miscellaneous Results

Proposition B.51. For any matrix X, C(XX') = C(X).

PROOF. Clearly C(XX') ⊂ C(X), so we need to show that C(X) ⊂ C(XX'). Let x ∈ C(X); then x = Xb for some b. Write b = b_0 + b_1, where b_0 ∈ C(X') and b_1 ⊥ C(X'). Clearly Xb_1 = 0, so we have x = Xb_0. But b_0 = X'd for some d, so x = Xb_0 = XX'd and x ∈ C(XX'). □

Corollary B.52. For any matrix X, r(XX') = r(X).

PROOF. See Exercise B.4. o

Corollary B.53. If X (n × p) has r(X) = p, then the p × p matrix X'X is nonsingular.

PROOF. See Exercise B.5. o

We can now easily show that generalized inverses always exist.

Theorem B.54. For any matrix X, there exists a generalized inverse X^-.

PROOF. We know (X'X)^- exists. Set X^- = (X'X)^-X'; then XX^-X = X(X'X)^-X'X = X because X(X'X)^-X' is a projection matrix onto C(X). □

When we consider applications of linear models, we will frequently need to refer to matrices and vectors that consist entirely of 1's. Such matrices are denoted by the letter J with various subscripts and superscripts to specify their dimensions. J_r^c is an r × c matrix of 1's. The subscript indicates the number of rows and the superscript indicates the number of columns. If there is only one column, the superscript may be suppressed, e.g., J_r = J_r^1. In a context where we are dealing with vectors in Rⁿ, the subscript may also be suppressed, e.g., J = J_n = J_n^1. A matrix of zeros will always be denoted by 0.

Exercises

EXERCISE B.1
(a) Show that

A^k x + b_{k-1} A^{k-1} x + ... + b_0 x = (A - μI)(A^{k-1} x + γ_{k-2} A^{k-2} x + ... + γ_0 x) = 0,

where μ is any nonzero solution of b_0 + b_1 w + ... + b_k w^k = 0 (b_k = 1), and

γ_j = -(b_0 + b_1 μ + ... + b_j μ^j)/μ^{j+1},   j = 0, ..., k.

(b) Show that if the only root of b_0 + b_1 w + ... + b_k w^k is zero, then the factorization in (a) still holds.
(c) The solution μ used in (a) need not be a real number, in which case μ is a complex eigenvalue and the γ_j's are complex, so the eigenvector is complex. Show that with A symmetric, μ must be real because the eigenvalues of A must be real. In particular, assume

A(y + iz) = (λ + iγ)(y + iz),

set Ay = λy - γz, Az = γy + λz, and examine z'Ay = y'Az.

EXERCISE B.2 Prove Proposition B.32.

EXERCISE B.3 Show that any symmetric matrix A can be written as A = PDP', where C(A) = C(P), P'P = I, and D is nonsingular.

EXERCISE B.4 Prove Corollary B.52.

EXERCISE B.5 Prove Corollary B.53.

EXERCISE B.6 Show tr(cI) = nc.

EXERCISE B.7 Let a, b, c, and d be real numbers. If ad - bc ≠ 0, find the inverse of

[a  b]
[c  d].

EXERCISE B.8 Prove Theorem B.28 (i.e., let A be an r × s matrix, let B be an s × r matrix, and show that tr(AB) = tr(BA)).

EXERCISE B.9 Determine whether the matrices given below are positive definite, nonnegative definite, or neither.

[Three symmetric matrices are displayed here.]

EXERCISE B.10 Show that the matrix B given below is positive definite and find a matrix Q such that B = QQ'.

B = [ 2  -1   1]
    [-1   1   0]
    [ 1   0   2].

EXERCISE B.11 Let A, B, and C be the following matrices.

[The matrices A, B, and C are displayed here.]

Use Theorem B.35 to find the perpendicular projection operator onto the column space of each matrix.

EXERCISE B.12 Show that for a perpendicular projection matrix M,

Σ_i Σ_j m_ij² = r(M).

EXERCISE B.13 Prove that if M = M'M, then M = M' and M = M2.

EXERCISE B.14 Let M_1 and M_2 be perpendicular projection matrices and let M_0 be the perpendicular projection operator onto C(M_1) ∩ C(M_2). Show that the following are equivalent:

(a) M_1M_2 = M_2M_1;
(b) M_1M_2 = M_0;
(c) C(M_1) ∩ [C(M_1) ∩ C(M_2)]⊥ ⊥ C(M_2) ∩ [C(M_1) ∩ C(M_2)]⊥.

Hints: (i) Show that M_1M_2 is a projection operator, (ii) show that M_1M_2 is symmetric, (iii) note that C(M_1) ∩ [C(M_1) ∩ C(M_2)]⊥ = C(M_1 - M_0).

EXERCISE B.15 Let M_1 and M_2 be perpendicular projection matrices. Show that
(a) the eigenvalues of M_1M_2 are no greater than 1 in absolute value;
(b) tr(M_1M_2) ≤ r(M_1M_2).

Hint: for part (a), show that x'Mx ≤ x'x for any perpendicular projection operator M.

EXERCISE B.16 Let M_x = x(x'x)^{-1}x' and M_y = y(y'y)^{-1}y'. Show that M_xM_y = M_yM_x if and only if C(x) = C(y) or x ⊥ y.

EXERCISE B.17 Consider the matrix

[The matrix A is displayed here.]

(a) Show that A is a projection matrix.
(b) Is A a perpendicular projection matrix? Why?
(c) Describe the space that A projects on and the space that A projects along. Sketch these spaces.
(d) Find another projection operator onto the space that A projects on.

EXERCISE B.18 Let A be an arbitrary projection matrix. Show that C(I - A) = C(A')⊥. Hints: Recall that C(A')⊥ is the null space of A. Show that (I - A) is a projection matrix.

EXERCISE B.19 Show that if A^- is a generalized inverse of A, then so is

G = A^-AA^- + (I - A^-A)B_1 + B_2(I - AA^-)

for any choices of B_1 and B_2 with conformable dimensions.

EXERCISE B.20 Let A be positive definite with eigenvalues λ_1, ..., λ_n. Show that A^{-1} has roots 1/λ_1, ..., 1/λ_n and the same eigenvectors as A.

EXERCISE B.21 Let

A = [A_11  A_12]
    [A_21  A_22],

and let A_{1·2} = A_11 - A_12 A_22^{-1} A_21. Show that if all inverses exist,

(a) A^{-1} = [ A_{1·2}^{-1}                   -A_{1·2}^{-1} A_12 A_22^{-1}                              ]
             [ -A_22^{-1} A_21 A_{1·2}^{-1}    A_22^{-1} + A_22^{-1} A_21 A_{1·2}^{-1} A_12 A_22^{-1} ],

(b) A^{-1} = [ A_{1·2}^{-1}                   -A_{1·2}^{-1} A_12 A_22^{-1} ]
             [ -A_{2·1}^{-1} A_21 A_11^{-1}    A_{2·1}^{-1}                ],

where A_{2·1} = A_22 - A_21 A_11^{-1} A_12.

APPENDIX C Some Univariate Distributions

The tests and confidence intervals presented in this book rely almost exclusively on the χ², t, and F distributions. This appendix defines each of the distributions.

Definition C.1. Let Z_1, ..., Z_n be independent with Z_i ~ N(μ_i, 1); then

W = Σ_{i=1}^n Z_i²

has a noncentral chi-square distribution with n degrees of freedom and noncentrality parameter γ = Σ_{i=1}^n μ_i²/2. Write W ~ χ²(n, γ).

It is evident from the definition that if X ~ χ²(r, γ) and Y ~ χ²(s, δ) with X and Y independent, then (X + Y) ~ χ²(r + s, γ + δ). A central χ² distribution is a distribution with a noncentrality parameter of zero, i.e., χ²(r, 0). We will use χ²(r) to denote a χ²(r, 0) distribution. The 100α'th percentile of a χ²(r) distribution is the point χ²(α, r) that satisfies the equation

Pr[χ²(r) ≤ χ²(α, r)] = α.

Definition C.2. Let X ~ N(μ, 1) and Y ~ χ²(n) with X and Y independent; then

W = X / [Y/n]^{1/2}

has a noncentral t distribution with n degrees of freedom and noncentrality parameter μ. Write W ~ t(n, μ). If μ = 0, we say that the distribution is a central t distribution and write W ~ t(n). The 100α'th percentile of t(n) is denoted by t(α, n).

Definition C.3. Let X ~ χ²(r, γ) and Y ~ χ²(s, 0) with X and Y independent; then

W = (X/r) / (Y/s)

has a noncentral F distribution with r and s degrees of freedom and noncentrality parameter γ. Write W ~ F(r, s, γ). If γ = 0, write W ~ F(r, s) for the central F distribution. The 100α'th percentile of F(r, s) is denoted by F(α, r, s).

As indicated, if the noncentrality parameter of any of these distributions is zero, the distribution is referred to as a central distribution (e.g., central F distribution). The central distributions are those commonly used in statistical methods courses. If any of these distributions is not specifically identified as a noncentral distribution, it should be assumed to be a central distribution.

It is easily seen from Definition C.1 that any noncentral chi-square distribution will tend to be larger than the central chi-square distribution with the same number of degrees of freedom. Similarly, from Definition C.3, a noncentral F will tend to be larger than the corresponding central F distribution. (These ideas are made rigorous in Exercise C.1.) The fact that the noncentral F distribution tends to be larger than the corresponding central F distribution is the basis for many of the tests used in linear models. Typically, tests are used that have a central F distribution if the null hypothesis is true and a noncentral F distribution if the null hypothesis is not true. Since the noncentral F distribution tends to be larger, large values of the test statistic are consistent with the alternative hypothesis. Thus, the form of an appropriate rejection region is to reject the null hypothesis for large values of the test statistic. The power of these tests is simply a function of the noncentrality parameter. Given a value for the noncentrality parameter, there is no theoretical difficulty in finding the power of an F test. The power simply involves computing the probability of the rejection region when the test statistic is a noncentral F. Unfortunately, computing probabilities from a noncentral F is (from my admittedly biased point of view) quite a nasty task. We now prove a theorem about central F distributions that will be useful in Chapter V.
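In current software the computation is routine. A brief sketch using scipy.stats, with arbitrary degrees of freedom and noncentrality chosen only for illustration; note that SciPy parameterizes noncentrality as Σμ_i², which is twice the γ defined in Definition C.1.

from scipy import stats

r, s = 3, 20          # numerator and denominator degrees of freedom
gamma = 4.0           # noncentrality parameter, in this book's parameterization
crit = stats.f.ppf(0.95, r, s)               # F(0.95, r, s): the alpha = .05 critical value

# Power of the alpha = .05 F test = Pr[noncentral F > critical value].
# SciPy's nc argument is sum(mu_i^2) = 2 * gamma.
power = stats.ncf.sf(crit, r, s, 2 * gamma)
print(round(crit, 3), round(power, 3))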

Theorem C.4. If s > t, then sF(1 - α, s, v) ≥ tF(1 - α, t, v).

PROOF. Let X ~ χ²(s), Y ~ χ²(t), and Z ~ χ²(v). Let Z be independent of X and Y. Note that (X/s)/(Z/v) has an F(s, v) distribution, so sF(1 - α, s, v) is the 100(1 - α)'th percentile of the distribution of X/(Z/v). Similarly, tF(1 - α, t, v) is the 100(1 - α)'th percentile of the distribution of Y/(Z/v). We will first argue that to prove the theorem, it is enough to show that

Pr[X ≤ d] ≤ Pr[Y ≤ d]        (1)

for all real numbers d. We will then show that (1) is true.

If (1) is true, if c is any real number, and if Z = z, we have by independence

Pr[X ≤ cz/v] = Pr[X ≤ cz/v | Z = z] ≤ Pr[Y ≤ cz/v | Z = z] = Pr[Y ≤ cz/v].

Taking expectations with respect to Z,

Pr[X/(Z/v) ≤ c] = E(Pr[X ≤ cz/v | Z = z]) ≤ E(Pr[Y ≤ cz/v | Z = z]) = Pr[Y/(Z/v) ≤ c].

Since the c.d.f. for X/(Z/v) is always no greater than the c.d.f. for Y/(Z/v), the point at which a probability of 1 - α is attained for X/(Z/v) must be no less than the similar point for Y/(Z/v). Therefore,

sF(1 - α, s, v) ≥ tF(1 - α, t, v).

To see that (1) holds, let Q be independent of Y with Q ~ χ²(s - t). Then, because Q is nonnegative,

Pr[X ≤ d] = Pr[Y + Q ≤ d] ≤ Pr[Y ≤ d]. □

Exercise

Definition C.5. Consider two random variables W_1 and W_2. W_2 is said to be stochastically larger than W_1 if for every real number w,

Pr[W_1 > w] ≤ Pr[W_2 > w].

If for some random variables W1 and W2 , W2 is stochastically larger than W1 , then we also say that the distribution of W2 is stochastically larger than the distribution of W1 •

EXERCISE C.1 Show that a noncentral chi-square distribution is stochastically larger than the central chi-square distribution with the same degrees of freedom. Show that a noncentral F distribution is stochastically larger than the corresponding central F distribution.

APPENDIX D Multivariate Distributions

To be consistent with Section VI.3, we will discuss multivariate distributions in terms of row vectors of random variables. The discussion does not depend on whether the vectors are considered as row vectors or column vectors. Thus, the discussion applies equally to Section I.2, in which the distributions of column vectors are considered. Let (x_1, ..., x_n) be a random row vector. The joint cumulative distribution function (c.d.f.) of (x_1, ..., x_n) is

F(u_1, ..., u_n) = Pr[x_1 ≤ u_1, ..., x_n ≤ u_n].

If F(u_1, ..., u_n) is the c.d.f. of a discrete random vector, we can define a (joint) probability mass function

f(u_1, ..., u_n) = Pr[x_1 = u_1, ..., x_n = u_n].

If F(u_1, ..., u_n) admits the n'th order mixed partial derivative, then we can define a (joint) density function

f(u_1, ..., u_n) = ∂ⁿ F(u_1, ..., u_n) / (∂u_1 ··· ∂u_n).

The c.d.f. can be recovered from the density,

F(u_1, ..., u_n) = ∫_{-∞}^{u_1} ··· ∫_{-∞}^{u_n} f(w_1, ..., w_n) dw_1 ··· dw_n.

The expected value of any function of (x_1, ..., x_n) into R, say g(x_1, ..., x_n), is defined as

E[g(x_1, ..., x_n)] = ∫_{-∞}^{∞} ··· ∫_{-∞}^{∞} g(u_1, ..., u_n) f(u_1, ..., u_n) du_1 ··· du_n.

We now consider relationships between two random vectors, say x = (x_1, ..., x_n) and y = (y_1, ..., y_m). We will assume that the joint vector (x, y) = (x_1, ..., x_n, y_1, ..., y_m) has a density function

f_{x,y}(u, v) = f_{x,y}(u_1, ..., u_n, v_1, ..., v_m).

Similar definitions and results hold if (x, y) has a probability mass function or even if one random vector has a density and the other has a probability mass function. The distribution of one random vector, say x, ignoring the other vector, y, is called the marginal distribution of x. The marginal c.d.f. of x can be obtained by substituting the value +∞ into the joint c.d.f. for all of the y variables:

F_x(u) = F_{x,y}(u_1, ..., u_n, +∞, ..., +∞).

The marginal density can be obtained by integrating the joint density over the y variables:

f_x(u) = ∫_{-∞}^{∞} ··· ∫_{-∞}^{∞} f_{x,y}(u_1, ..., u_n, v_1, ..., v_m) dv_1 ··· dv_m.

The conditional density of a vector, say x, given the value of the other vector, say y = v, is obtained by dividing the density of (x, y) by the density of y evaluated at v, i.e.,

f_{x|y}(u|v) = f_{x,y}(u, v)/f_y(v).

The conditional density is a well defined density, so expectations with respect to it are well defined. Let g be a function from Rⁿ into R;

E[g(x)|y = v] = ∫_{-∞}^{∞} ··· ∫_{-∞}^{∞} g(u) f_{x|y}(u|v) du,

where du = du_1 du_2 ··· du_n. The standard properties of expectations hold for conditional expectations. For example, with a and b real,

E[a g_1(x) + b g_2(x)|y = v] = aE[g_1(x)|y = v] + bE[g_2(x)|y = v].

The conditional expectation E[g(x)|y = v] is a function of the value v. Since y is random, we can consider E[g(x)|y = v] as a random variable. In this context we will write E[g(x)|y]. An important property of conditional expectations follows:

E[g(x)] = E[E[g(x)|y]].

To see this, note that f_{x|y}(u|v)f_y(v) = f_{x,y}(u, v) and

E[E[g(x)|y]] = ∫_{-∞}^{∞} ··· ∫_{-∞}^{∞} E[g(x)|y = v] f_y(v) dv
             = ∫_{-∞}^{∞} ··· ∫_{-∞}^{∞} [∫_{-∞}^{∞} ··· ∫_{-∞}^{∞} g(u) f_{x|y}(u|v) du] f_y(v) dv
             = ∫_{-∞}^{∞} ··· ∫_{-∞}^{∞} g(u) f_{x|y}(u|v) f_y(v) du dv
             = ∫_{-∞}^{∞} ··· ∫_{-∞}^{∞} g(u) f_{x,y}(u, v) du dv
             = E[g(x)].

In fact, the notion of conditional expectation and this result can be generalized. Consider a function g(x, y) from R^{n+m} into R; if y = v we can define E[g(x, v)|y = v] in a natural manner. If we consider y as random, we write E[g(x, y)|y]. It can be easily shown that

E[g(x, y)] = E[E[g(x, y)|y]].

Note that a function of x or y alone can be considered as a function from R^{n+m} into R. A second important property of conditional expectations is that if h(y) is a function from R^m into R, we have

E[h(y)g(x, y)|y] = h(y)E[g(x, y)|y].

This follows because if y = v,

E[h(y)g(x, y)|y] = E[h(v)g(x, v)|y = v]
                 = ∫_{-∞}^{∞} ··· ∫_{-∞}^{∞} h(v) g(u, v) f_{x|y}(u|v) du
                 = h(v) ∫_{-∞}^{∞} ··· ∫_{-∞}^{∞} g(u, v) f_{x|y}(u|v) du
                 = h(v)E[g(x, v)|y = v].

The last term is an alternate notation for h(y)E[g(x, y)|y]. In particular, if g(x, y) = 1 we have E[h(y)|y] = h(y). Finally, we can extend the idea of conditional expectation to a function g(x, y) from R^{n+m} into R^s. Write g(x, y) = [g_1(x, y), ..., g_s(x, y)]; then define

E[g(x, y)|y] = (E[g_1(x, y)|y], ..., E[g_s(x, y)|y]).

Two random vectors are independent if and only if their joint density is equal to the product of their marginal densities, i.e., x and y are independent if and only if

f_{x,y}(u, v) = f_x(u)f_y(v).

If the random vectors are independent, then any vector valued functions of them, say g(x) and h(y), are also independent. Note that if x and y are independent,

f_{x|y}(u|v) = f_x(u).

The characteristic function of a random vector x = (x_1, ..., x_n) is a function from Rⁿ to C, the complex numbers. It is defined by

φ_x(t_1, ..., t_n) = ∫ exp[i Σ_{i=1}^n t_i u_i] f_x(u_1, ..., u_n) du_1 ··· du_n.

We are interested in characteristic functions because if x = (x_1, ..., x_n) and y = (y_1, ..., y_n) are random vectors and if φ_x(t_1, ..., t_n) = φ_y(t_1, ..., t_n) for all t_1, ..., t_n, then x and y have the same distribution. For convenience, we have assumed the existence of densities. With minor modifications the definitions and results hold for any probability defined on Rⁿ.

Exercise

EXERCISE D.1 Let x and y be independent. Show that

(a) E[g(x)|y] = E[g(x)];
(b) E[g(x)h(y)] = E[g(x)]E[h(y)].

APPENDIX E Tests and Confidence Intervals for Some One Parameter Problems

Many statistical tests and confidence intervals (C.I.'s) are applications of a single theory. (Tests and C.I.'s about variances are an exception.) To use this theory you need to know four things: (1) the parameter of interest (Par), (2) an estimate of the parameter (Est), (3) the standard error of the estimate (S.E.(Est)) [the S.E.(Est) can either be known or estimated], and (4) the appropriate distribution. Specifically, what we need to know is the distribution of

[Est - Par]/S.E.(Est).

This must have a distribution that is symmetric about zero, and percentiles of the distribution must be available. If the S.E.(Est) is estimated, this distribution will usually be the t distribution with some known number of degrees of freedom. If the S.E.(Est) is known, then the distribution will usually be the standard normal. In some problems (e.g., problems involving the binomial distribution) large sample results are used to get an approximate distribution, and then the technique proceeds as if the approximate distribution were correct. When appealing to large sample results, the known distribution in (4) is the standard normal.

We need notation for the percentiles of the distribution. The 1 - α percentile is the point that cuts off the top α of the distribution. We denote this by Q(1 - α). Formally we can write

Pr[(Est - Par)/S.E.(Est) > Q(1 - α)] = α.

By the symmetry of the distribution about zero, we also have

Pr[(Est - Par)/S.E.(Est) < -Q(1 - α)] = α.

In practice, the value of Q(1 - α) will generally be found from either a normal table or a t table.

Confidence Intervals

A (1 - α)100% C.I. for Par is based on the following probability equalities:

1 - α = Pr[-Q(1 - α/2) ≤ (Est - Par)/S.E.(Est) ≤ Q(1 - α/2)]
      = Pr[Est - Q(1 - α/2)S.E.(Est) ≤ Par ≤ Est + Q(1 - α/2)S.E.(Est)].

The statements within the square brackets are algebraically equivalent. The argument runs as follows:

-Q(1 - α/2) ≤ (Est - Par)/S.E.(Est) ≤ Q(1 - α/2)

if and only if -Q(1 - α/2)S.E.(Est) ≤ Est - Par ≤ Q(1 - α/2)S.E.(Est);
if and only if Q(1 - α/2)S.E.(Est) ≥ -Est + Par ≥ -Q(1 - α/2)S.E.(Est);
if and only if Est + Q(1 - α/2)S.E.(Est) ≥ Par ≥ Est - Q(1 - α/2)S.E.(Est);
if and only if Est - Q(1 - α/2)S.E.(Est) ≤ Par ≤ Est + Q(1 - α/2)S.E.(Est).

As mentioned, a C.I. for Par is based on the probability statement

1 - α = Pr[Est - Q(1 - α/2)S.E.(Est) ≤ Par ≤ Est + Q(1 - α/2)S.E.(Est)].

The (1 - α)100% C.I. consists of all the points between Est - Q(1 - α/2)S.E.(Est) and Est + Q(1 - α/2)S.E.(Est), where the observed values of Est and S.E.(Est) are used. The confidence coefficient is defined to be (1 - α)100%. The endpoints of the C.I. can be written

Est ± Q(1 - α/2)S.E.(Est).

The interpretation of the C.I. is as follows: "The probability that you are going to get a C.I. that covers Par (what you are trying to estimate) is 1 - α." We did not indicate that the probability that the observed interval covers Par is 1 - α. The particular interval that you get will either cover Par or not. There is no probability associated with it. For this reason the following terminology was coined for describing the results of a confidence interval: "We are (1 - α)100% confident that the true value of Par is in the interval." I doubt that anybody has a good definition of what the word "confident" means in the previous sentence.

EXAMPLE E.1. We have ten observations from a normal population with variance six. x̄ is observed to be seventeen. Find a 95% C.I. for μ, the mean of the population.

(1) Est = x̄.
(2) Par = μ.
(3) S.E.(Est) = [6/10]^{1/2}. In this case S.E.(Est) is known and not estimated.
(4) (Est - Par)/S.E.(Est) = [x̄ - μ]/[6/10]^{1/2} ~ N(0, 1).

The confidence coefficient is 95% = (1 - α)100%, so 1 - α = .95 and α = 0.05. The percentage point of the distribution that we require is Q(1 - α/2) = z(.975) = 1.96. The limits of the 95% C.I. are, in general,

x̄ ± 1.96 [6/10]^{1/2},

or, since x̄ = 17,

17 ± 1.96 [6/10]^{1/2}.
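The computation in Example E.1 is short enough to script. A minimal sketch, using scipy.stats only for the normal percentile:

from scipy import stats
import math

xbar, var, n = 17.0, 6.0, 10       # data summary from Example E.1
alpha = 0.05

se = math.sqrt(var / n)                     # S.E.(Est), known here
q = stats.norm.ppf(1 - alpha / 2)           # Q(1 - alpha/2) = z(.975) = 1.96
lower, upper = xbar - q * se, xbar + q * se
print(round(lower, 2), round(upper, 2))     # roughly (15.48, 18.52)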

Hypothesis Testing

We want to test

H_0: Par = m versus H_A: Par ≠ m,

where m is some known number. The test is based on the assumption that H_0 is true, and we are checking to see if the data are inconsistent with that assumption. If H_0 is true, then the test statistic

[Est - m]/S.E.(Est)

has a known distribution symmetric about zero. What kind of data are inconsistent with the assumption that Par = m? What happens when Par ≠ m? Well, Est is always estimating Par, so if Par > m, then [Est - m]/S.E.(Est) will tend to be big (bigger than it would be if H_0 were true). On the other hand, if Par < m, then [Est - m]/S.E.(Est) will tend to be a large negative number. Therefore, data that are inconsistent with the null hypothesis are large positive and large negative values of [Est - m]/S.E.(Est).

Before we can state the test we need to consider one other concept. Even if H_0 is true, it is possible (not probable but possible) to get any value at all for [Est - m]/S.E.(Est). For that reason, no matter what you conclude about the null hypothesis, there is a possibility of error. A test of hypothesis is based on controlling the probability of making an error when the null hypothesis is true. We define the α level of the test (also called the probability of a type I error) as the probability of rejecting the null hypothesis (concluding that H_0 is false) when the null hypothesis is in fact true. The α level test for H_0: Par = m versus H_A: Par ≠ m is to reject H_0 if

[Est - m]/S.E.(Est) > Q(1 - α/2)

or if

[Est - m]/S.E.(Est) < -Q(1 - α/2).

This is equivalent to rejecting H_0 if

|Est - m|/S.E.(Est) > Q(1 - α/2).

If H_0 is true, the distribution of [Est - m]/S.E.(Est) is the known distribution from which Q(1 - α/2) was obtained; thus, if H_0 is true, we will reject H_0 with probability α/2 + α/2 = α. Another thing to note is that H_0 is rejected for those values of [Est - m]/S.E.(Est) that are most inconsistent with H_0 (see the discussion above on what is inconsistent with H_0). One sided tests can be obtained in a similar manner. The α level test for

H_0: Par ≤ m versus H_A: Par > m

is to reject H_0 if

[Est - m]/S.E.(Est) > Q(1 - α).

The α level test for H_0: Par ≥ m versus H_A: Par < m is to reject H_0 if

[Est - m]/S.E.(Est) < -Q(1 - α).

Note that in both cases we are rejecting H_0 for the data that are most inconsistent with H_0, and in both cases if Par = m, then the probability of making a mistake is α.

EXAMPLE E.2. Suppose sixteen observations are taken from a normal population. Test H_0: μ = 20 versus H_A: μ ≠ 20 with α level .01. The observed values of x̄ and s² are 19.78 and .25, respectively.

(1) Est = x̄.
(2) Par = μ.
(3) S.E.(Est) = [s²/16]^{1/2}.

In this case the S.E.(Est) is estimated.

(4) (Est - Par)/S.E.(Est) = [x̄ - μ]/[s²/16]^{1/2} has a t(15) distribution.

The α = .01 test is to reject H_0 if

|x̄ - 20|/[s/4] > 2.947 = t(.995, 15).

Note that m = 20 and Q(1 - α/2) = Q(.995) = t(.995, 15). Since x̄ = 19.78 and s² = .25, we reject H_0 if

|19.78 - 20| / [.25/16]^{1/2} > 2.947.

Since [19.78 - 20]/[.25/16]^{1/2} = -1.76 is greater than -2.947, we do not reject the null hypothesis at the α = .01 level.
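A sketch of the same test in code; the critical value and the test statistic match the hand computation above, and the last line gives the two-sided P value discussed next.

from scipy import stats
import math

xbar, s2, n, m = 19.78, 0.25, 16, 20.0      # summary statistics from Example E.2
alpha = 0.01

se = math.sqrt(s2 / n)
tstat = (xbar - m) / se                     # (Est - m)/S.E.(Est) = -1.76
crit = stats.t.ppf(1 - alpha / 2, n - 1)    # t(.995, 15) = 2.947
print(round(tstat, 2), round(crit, 3), abs(tstat) > crit)   # -1.76 2.947 False
print(round(2 * stats.t.sf(abs(tstat), n - 1), 3))          # two-sided P value, about 0.10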

The P value of a test is the α level at which the test would just barely be rejected. In the example above, the value of the test statistic is -1.76. Since t(.95, 15) = 1.75, the P value of the test is approximately 0.10. (An α = 0.10 test would use the t(.95, 15) value.) The P value is a measure of the evidence against the null hypothesis. The smaller the P value, the more evidence against H_0. To keep this discussion as simple as possible, the examples have been restricted to one sample normal theory. However, the results apply to two sample problems, testing contrasts in analysis of variance, testing coefficients in regression, and in general, testing estimable parametric functions in arbitrary linear models.

APPENDIX F Approximate Methods for Unbalanced ANOVA's

This appendix presents approximate methods for the analysis of unbalanced ANOVA models. The author does not recommend these methods. They are presented in the fear that the reader may one day be confronted with them. Because of the difficulties (both computational and conceptual) in analyzing a two-way ANOVA with unequal numbers, various approximate procedures have been proposed. We will discuss three of these: the approximate proportional numbers analysis, the weighted means analysis, and the unweighted means analysis. All three of the methods assume that n_ij ≠ 0 for all i and j.

Proportional Numbers

Snedecor and Cochran (1980, Section 20.5) suggest that if the n_ij's are not too far from being proportional numbers, then the analysis can proceed based on the methods developed for proportional numbers. The first problem is in determining what it means to be close to having proportional numbers. If we think of the n_ij's as a two-way table of counts, we can test the table for independence. (Snedecor and Cochran, Chapter 11, give an introduction to testing for independence. Chapter XV of this book presents a general theory that includes testing for independence as a special case.) Under independence, the estimated expected count in the ij cell is

m̂_ij = n_i. n_.j / n_.. .

mij = ni.n.)n .. To measure the deviation from independence, there are two commonly used test statistics. The Pearson test statistic (PTS) is

PTS = Σ_{i=1}^a Σ_{j=1}^b (n_ij - m̂_ij)²/m̂_ij.

The likelihood ratio test statistic (LRTS) is

LRTS = 2 Σ_{i=1}^a Σ_{j=1}^b n_ij log(n_ij/m̂_ij).

If the n_ij's are all large and independence holds, then the approximations PTS ~ χ²[(a - 1)(b - 1)] and LRTS ~ χ²[(a - 1)(b - 1)] are valid. By Lemma 7.4.1, we have proportional numbers if, for all i and j,

n_ij = m̂_ij.

We can use the PTS and LRTS as measures of how far the n_ij's are from proportional numbers. Test statistics of zero indicate proportional numbers, and small test statistics are close to proportional numbers. Having chosen a test statistic, if for some small number α the test statistic is below the lower α percentile of the chi-squared distribution, then approximating the unequal numbers analysis with the proportional numbers analysis should give satisfactory results. For example, if

PTS < χ²(α, (a - 1)(b - 1)),

then the approximation should be reasonable. How small does α have to be? The smaller the better, but a value of α = 0.2 may be reasonable.

The actual method of approximating the unequal numbers analysis with the proportional numbers technique is quite simple. The SSE is taken from the exact analysis that includes interaction, i.e.,

SSE = Σ_i Σ_j Σ_k (y_ijk - ȳ_ij.)².

The dfE is (n_.. - ab). The analysis of the treatment effects is based on the proportional numbers model, the m̂_ij's, and the ȳ_ij.'s. The ȳ_ij.'s are computed using the n_ij's, i.e.,

ȳ_ij. = Σ_k y_ijk / n_ij.

The rest of the analysis of treatment effects is conducted using the m̂_ij's and is handled exactly as if the numbers really were proportional. For example,

ȳ_i· = Σ_{j=1}^{b} m̂_ij ȳ_ij. / m̂_i· ,

ȳ = Σ_{i=1}^{a} m̂_i· ȳ_i· / m̂_·· = Σ_{i=1}^{a} Σ_{j=1}^{b} m̂_ij ȳ_ij. / m̂_·· ,

SS(α) = Σ_{i=1}^{a} m̂_i· (ȳ_i· − ȳ)².

It should be noted that m̂_i· = n_i· and m̂_·j = n_·j.
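As a concrete illustration of the screening step, the following sketch computes PTS and LRTS for a hypothetical table of cell counts and compares PTS with the lower δ percentile of the chi-squared distribution. The counts n_ij and the choice δ = 0.2 are illustrative only; NumPy and SciPy are assumed.

```python
# Check how far a hypothetical table of cell counts is from proportional numbers.
import numpy as np
from scipy import stats

nij = np.array([[8, 4, 6],
                [4, 3, 2]], dtype=float)            # hypothetical n_ij's
a, b = nij.shape
mhat = np.outer(nij.sum(axis=1), nij.sum(axis=0)) / nij.sum()  # m-hat_ij = n_i. n_.j / n_..

PTS = ((nij - mhat) ** 2 / mhat).sum()              # Pearson test statistic
LRTS = 2 * (nij * np.log(nij / mhat)).sum()         # likelihood ratio test statistic

delta = 0.2
cutoff = stats.chi2.ppf(delta, (a - 1) * (b - 1))   # lower delta percentile
print(PTS, LRTS, PTS < cutoff)                      # small values favor the approximation
```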

Weighted Means

Although the proportional numbers analysis is really an analysis of weighted means, the method commonly known as the analysis of weighted means is somewhat different. With ȳ_ij. taken as above, the analysis of weighted means takes

ȳ_i = b^{-1} Σ_{j=1}^{b} ȳ_ij. .

The weights to be used are

w_i = σ² / Var(ȳ_i) = b² [ Σ_{j=1}^{b} 1/n_ij ]^{-1}.

With these weights, define a version of the grand mean as

ȳ_[α] = Σ_{i=1}^{a} w_i ȳ_i / w_· .

With different weights we would get a different version of the grand mean. The sum of squares for testing the α effects is

SS(α) = Σ_{i=1}^{a} w_i (ȳ_i − ȳ_[α])²,

with

E[MS(α)] = σ² + (a − 1)^{-1} Σ_{i=1}^{a} w_i [(α_i + γ̄_i·) − (ᾱ_[α] + γ̄_[α])]²,

where γ̄_i· is the simple average of the γ_ij's with i fixed, and ᾱ_[α] and γ̄_[α] are weighted averages. The calculations are exactly like the corresponding calculations for the ȳ_i's. An F test of the hypothesis that the (α_i + γ̄_i·)'s are all equal is provided by MS(α)/MSE, where MSE is from the exact analysis with interaction. Similar definitions apply to ȳ_j, ȳ_[η], and SS(η), using weights v_j based on Var(ȳ_j). Note that the estimate of the grand mean is different when computing SS(η). The test for interactions is performed as in the unweighted means analysis described below.
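The weighted means computations for the α effects are easy to code. The sketch below uses hypothetical cell counts n_ij and cell means ȳ_ij.; MS(α) = SS(α)/(a − 1) would then be compared with the MSE from the exact interaction analysis. NumPy is assumed.

```python
# Weighted means analysis for the row (alpha) effects, hypothetical data.
import numpy as np

nij = np.array([[5, 3, 4],
                [2, 6, 3]], dtype=float)            # cell counts
ybar = np.array([[10.1, 11.4, 9.8],
                 [12.0, 11.1, 10.5]])               # cell means ybar_ij.
a, b = nij.shape

ybar_i = ybar.mean(axis=1)                          # ybar_i = (1/b) sum_j ybar_ij.
w = b**2 / (1.0 / nij).sum(axis=1)                  # w_i = b^2 [sum_j 1/n_ij]^(-1)
grand = (w * ybar_i).sum() / w.sum()                # ybar_[alpha]
SS_alpha = (w * (ybar_i - grand) ** 2).sum()        # SS(alpha)
MS_alpha = SS_alpha / (a - 1)                       # compare MS(alpha)/MSE with an F
print(w, grand, SS_alpha, MS_alpha)
```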

Unweighted Means

The analysis of unweighted means consists of analyzing the model

ȳ_ij. = μ + α_i + η_j + γ_ij + e_ij,   i = 1, ..., a,  j = 1, ..., b,

with the e_ij's independent N(0, Kσ²), where

K = (ab)^{-1} Σ_{i=1}^{a} Σ_{j=1}^{b} 1/n_ij.

The variance Kσ² is used because it is the average of the variances of the ȳ_ij.'s. The variance of ȳ_ij. is σ²/n_ij, and the average of these quantities is

(ab)^{-1} Σ_{i=1}^{a} Σ_{j=1}^{b} σ²/n_ij = Kσ².

Note that the model does not have enough observations for an estimate of error; however, K is a known quantity and σ² can be estimated from the exact analysis as indicated above. Snedecor and Cochran (1980, Section 20.4) suggest that if max(n_ij / n_i'j') < 2, i.e., if the largest cell count is less than twice the smallest, then the unweighted means model gives an adequate approximation for the purposes of testing interaction and contrasts. Gosslee and Lucas (1965) indicate that testing for main effects can be based on the approximation MS(α)/MSE ~ F(f_α, dfE), where

f_α = (a − 1)² (Σ_{i=1}^{a} w_i)² / [ (Σ_{i=1}^{a} w_i)² + (a − 2) a Σ_{i=1}^{a} w_i² ],

and w_i is defined as in the analysis of weighted means. A similar test can be performed for the η's, using the v_j weights to determine the appropriate number of degrees of freedom.
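A sketch of the unweighted means quantities follows, using the same hypothetical n_ij's as above. The formula for f_α is coded exactly as reconstructed in the display above (it reduces to a − 1 when all of the w_i's are equal, which serves as a check); NumPy is assumed.

```python
# Unweighted means analysis: the constant K and the approximate df f_alpha.
import numpy as np

nij = np.array([[5, 3, 4],
                [2, 6, 3]], dtype=float)
a, b = nij.shape

K = (1.0 / nij).mean()                              # K = (ab)^(-1) sum_ij 1/n_ij
w = b**2 / (1.0 / nij).sum(axis=1)                  # weights from the weighted means analysis
f_alpha = ((a - 1)**2 * w.sum()**2) / (w.sum()**2 + (a - 2) * a * (w**2).sum())
print(K, f_alpha)                                   # MS(alpha)/MSE is referred to F(f_alpha, dfE)
```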

Exercise

EXERCISE F.1. Analyze the data of Exercise 7.6 using an unweighted means analysis and also a weighted means analysis. Should you be using an unweighted means analysis on these data? Check whether an approximate proportional numbers analysis would be appropriate. Regardless of whether you consider it appropriate, perform an approximate proportional numbers analysis. Compare the results of all of these analyses to your analysis from Exercise 7.6.

APPENDIX G. Randomization Theory Models

The division of labor in statistics has traditionally designated randomization theory as an area of nonparametric statistics. Randomization theory is also of special interest in the theory of experimental design because randomization is used to justify the analysis of designed experiments. It can be argued that the linear models given in Chapter VIII are merely good approximations to the more appropriate randomization theory models. One aspect of this argument is that the F tests based on the theory of normal errors are a good approximation to randomization (permutation) tests. Investigating this is beyond the scope of a linear models book (cf. Kempthorne (1952) and Puri and Sen (1971)). Another aspect of the approximation argument is that the BLUEs under randomization theory are precisely the least squares estimates. By Theorem 10.4.5 we need to show that C(VX) ⊂ C(X) for the model

Y = Xβ + e,   E(e) = 0,   Cov(e) = V,

where V is the covariance matrix under randomization theory. This argument will be examined here for two experimental design models: the model for a completely randomized design and the model for a randomized complete block design. First, we introduce the subject with a discussion of simple random sampling.

Simple Random Sample

Randomization theory for a simple random sample assumes that observations y_i are picked at random (without replacement) from a larger finite population. Suppose the elements of the population are s_1, s_2, ..., s_N. We can define elementary sampling random variables, for i = 1, ..., n and j = 1, ..., N,

δ^i_j = 1 if y_i = s_j, and δ^i_j = 0 otherwise.

Under simple random sampling without replacement,

E(δ^i_j) = Pr(δ^i_j = 1) = 1/N.

E(δ^i_j δ^{i'}_{j'}) = Pr(δ^i_j δ^{i'}_{j'} = 1) = 1/N if (i, j) = (i', j'),
  = 1/[N(N − 1)] if i ≠ i' and j ≠ j',
  = 0 otherwise.

If we write μ = Σ_{j=1}^{N} s_j / N and σ² = Σ_{j=1}^{N} (s_j − μ)² / N, then

y_i = Σ_{j=1}^{N} δ^i_j s_j = μ + Σ_{j=1}^{N} δ^i_j (s_j − μ).

Letting e_i = Σ_{j=1}^{N} δ^i_j (s_j − μ) gives the linear model

y_i = μ + e_i.

The population mean μ is a fixed unknown constant. The e_i's have the properties

E[e_i] = E[ Σ_{j=1}^{N} δ^i_j (s_j − μ) ] = Σ_{j=1}^{N} (s_j − μ)/N = 0;

Var(e_i) = E[e_i²] = Σ_{j=1}^{N} Σ_{j'=1}^{N} (s_j − μ)(s_{j'} − μ) E[δ^i_j δ^i_{j'}] = Σ_{j=1}^{N} (s_j − μ)²/N = σ².

For i ≠ i',

Cov(y_i, y_{i'}) = E(e_i e_{i'}) = Σ_{j=1}^{N} Σ_{j'=1}^{N} (s_j − μ)(s_{j'} − μ) E[δ^i_j δ^{i'}_{j'}]
  = [N(N − 1)]^{-1} Σ_{j ≠ j'} (s_j − μ)(s_{j'} − μ)
  = [N(N − 1)]^{-1} ( [ Σ_{j=1}^{N} (s_j − μ) ]² − Σ_{j=1}^{N} (s_j − μ)² )
  = −σ²/(N − 1).
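These moments are easy to confirm by simulation. The sketch below draws many simple random samples without replacement from a small hypothetical population and checks that the observations have variance near σ² and pairwise covariance near −σ²/(N − 1). NumPy is assumed.

```python
# Monte Carlo check of the randomization moments for simple random sampling.
import numpy as np

rng = np.random.default_rng(0)
s = np.array([3.0, 5.0, 7.0, 11.0, 13.0, 17.0])     # hypothetical population s_1, ..., s_N
N, n = len(s), 3
sigma2 = s.var()                                    # sigma^2 = sum (s_j - mu)^2 / N

draws = np.array([rng.choice(s, size=n, replace=False) for _ in range(100_000)])
print(draws[:, 0].var(), sigma2)                    # Var(y_i) is near sigma^2
print(np.cov(draws[:, 0], draws[:, 1])[0, 1],       # Cov(y_i, y_i') is near -sigma^2/(N - 1)
      -sigma2 / (N - 1))
```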

In matrix terms the linear model can be written

Y = Jμ + e,   E(e) = 0,   Cov(e) = σ²V,

where V is the n × n matrix with 1's on the diagonal and −(N − 1)^{-1} in every off-diagonal position:

V = [ 1              −(N − 1)^{-1}   ...   −(N − 1)^{-1} ]
    [ −(N − 1)^{-1}   1              ...   −(N − 1)^{-1} ]
    [ ...                                                ]
    [ −(N − 1)^{-1}   −(N − 1)^{-1}  ...    1            ].

Clearly VJ = [(N − n)/(N − 1)]J, so the BLUE of μ is ȳ.
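The claim that VJ is a multiple of J is easy to verify numerically. Here is a small sketch with hypothetical values n = 4 and N = 10; NumPy is assumed.

```python
# Build V for a simple random sample and confirm VJ = [(N - n)/(N - 1)]J,
# so C(VJ) is contained in C(J) and the least squares estimate ybar is the BLUE of mu.
import numpy as np

n, N = 4, 10
V = (1 + 1/(N - 1)) * np.eye(n) - (1/(N - 1)) * np.ones((n, n))  # 1's on the diagonal, -1/(N-1) off it
J = np.ones((n, 1))
print(V @ J)                          # every entry equals (N - n)/(N - 1) = 2/3
print((N - n) / (N - 1))
```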

Completely Randomized Designs

Suppose that there are t treatments, each to be randomly assigned to N experimental units. A one-way ANOVA model for this design is

y_ij = μ_i + e_ij,   (1)

i = 1, ..., t, j = 1, ..., N. Suppose further that the ith treatment has an effect τ_i and that the experimental units without treatments would have readings s_1, ..., s_tN. The elementary sampling random variables are

δ^{ij}_k = 1 if the jth replication of treatment i is assigned to the kth unit, and 0 otherwise.

With this restricted random sampling and n = tN,

E(δ^{ij}_k) = Pr(δ^{ij}_k = 1) = 1/n.

E(δ^{ij}_k δ^{i'j'}_{k'}) = Pr(δ^{ij}_k δ^{i'j'}_{k'} = 1) = 1/n if (i, j, k) = (i', j', k'),
  = 1/[n(n − 1)] if k ≠ k' and (i, j) ≠ (i', j'),
  = 0 otherwise.

We can write

y_ij = τ_i + Σ_{k=1}^{n} δ^{ij}_k s_k.

Taking μ = Σ_{k=1}^{n} s_k / n and μ_i = μ + τ_i gives

y_ij = μ_i + Σ_{k=1}^{n} δ^{ij}_k (s_k − μ).

Letting e_ij = Σ_{k=1}^{n} δ^{ij}_k (s_k − μ) gives the linear model (1). Write σ² = (1/n) Σ_{k=1}^{n} (s_k − μ)²; then

E[e_ij] = E[ Σ_{k=1}^{n} δ^{ij}_k (s_k − μ) ] = Σ_{k=1}^{n} (s_k − μ)/n = 0.

Var(y_ij) = E(e_ij²) = Σ_{k=1}^{n} Σ_{k'=1}^{n} (s_k − μ)(s_{k'} − μ) E[δ^{ij}_k δ^{ij}_{k'}] = Σ_{k=1}^{n} (s_k − μ)²/n = σ².

For (i, j) ≠ (i', j'),

Cov(y_ij, y_{i'j'}) = E[e_ij e_{i'j'}]

  = Σ_{k=1}^{n} Σ_{k'=1}^{n} (s_k − μ)(s_{k'} − μ) E[δ^{ij}_k δ^{i'j'}_{k'}]
  = [n(n − 1)]^{-1} Σ_{k ≠ k'} (s_k − μ)(s_{k'} − μ)

  = [n(n − 1)]^{-1} ( [ Σ_{k=1}^{n} (s_k − μ) ]² − Σ_{k=1}^{n} (s_k − μ)² )

  = −σ²/(n − 1).

In matrix terms, writing Y = (y_11, y_12, ..., y_tN)', we get

Y = X(μ_1, ..., μ_t)' + e,   E(e) = 0,   Cov(e) = σ²V,

where V again has 1's on the diagonal and −1/(n − 1) in every off-diagonal position, i.e.,

V = [n/(n − 1)] I_n − [1/(n − 1)] J_n^n.

The columns of X consist of N ones and the rest zeros, so

VX = [n/(n − 1)] X − [N/(n − 1)] J_n^t.

Since J ∈ C(X), C(VX) ⊂ C(X), and least squares estimates are BLUEs. Standard errors for estimable functions can be found as in Section XI.1 using the fact that this model involves only one cluster.
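The column space condition can also be checked numerically for a small completely randomized design. The sketch below takes hypothetical values t = 3 and N = 4 and verifies the identity for VX displayed above; NumPy is assumed.

```python
# CRD check that VX = [n/(n-1)]X - [N/(n-1)]J_n^t, so C(VX) is contained in C(X).
import numpy as np

t, N = 3, 4
n = t * N
X = np.kron(np.eye(t), np.ones((N, 1)))            # each column: N ones for one treatment
V = (1 + 1/(n - 1)) * np.eye(n) - (1/(n - 1)) * np.ones((n, n))

rhs = (n / (n - 1)) * X - (N / (n - 1)) * np.ones((n, t))
print(np.allclose(V @ X, rhs))                     # True
# The column of ones is in C(X), so C(VX) is in C(X) and least squares estimates are BLUEs.
```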

EXERCISE G.1. Show whether least squares estimates are BLUEs in a completely randomized design with unequal numbers of observations on the treatments.

Randomized Complete Block Designs

Suppose there are a treatments and b blocks. The experimental units must be in b blocks, each of a units. Let the experimental unit effects be s_kj, k = 1, ..., a, j = 1, ..., b. Treatments are assigned at random to the a units in each block. The elementary sampling random variables are

δ^i_{kj} = 1 if treatment i is assigned to unit k in block j, and 0 otherwise.

E(δ^i_{kj}) = Pr(δ^i_{kj} = 1) = 1/a.

E(δ^i_{kj} δ^{i'}_{k'j'}) = Pr(δ^i_{kj} δ^{i'}_{k'j'} = 1) = 1/a if (i, j, k) = (i', j', k'),
  = 1/a² if j ≠ j',
  = 1/[a(a − 1)] if j = j', k ≠ k', i ≠ i',
  = 0 otherwise.

If α_i is the additive effect of the ith treatment and β_j = s̄_·j, then

y_ij = α_i + β_j + Σ_{k=1}^{a} δ^i_{kj} (s_kj − β_j).

Letting e_ij = Σ_{k=1}^{a} δ^i_{kj} (s_kj − β_j) gives the linear model

y_ij = α_i + β_j + e_ij.   (2)

The column space of the design matrix for this model is precisely that of the model considered in Section VIII.3. Let σ_j² = Σ_{k=1}^{a} (s_kj − β_j)²/a; then

E[e_ij] = Σ_{k=1}^{a} (s_kj − β_j)/a = 0.

Var(e_ij) = Σ_{k=1}^{a} Σ_{k'=1}^{a} (s_kj − β_j)(s_{k'j} − β_j) E[δ^i_{kj} δ^i_{k'j}] = Σ_{k=1}^{a} (s_kj − β_j)²/a = σ_j².

For j ≠ j',

Cov(e_ij, e_{i'j'}) = Σ_{k=1}^{a} Σ_{k'=1}^{a} (s_kj − β_j)(s_{k'j'} − β_{j'}) E[δ^i_{kj} δ^{i'}_{k'j'}]
  = a^{-2} Σ_{k=1}^{a} (s_kj − β_j) Σ_{k'=1}^{a} (s_{k'j'} − β_{j'})
  = 0.

For j = j', i ≠ i',

Cov(e_ij, e_{i'j}) = Σ_{k=1}^{a} Σ_{k'=1}^{a} (s_kj − β_j)(s_{k'j} − β_j) E[δ^i_{kj} δ^{i'}_{k'j}]
  = Σ_{k ≠ k'} (s_kj − β_j)(s_{k'j} − β_j) / [a(a − 1)]

  = [a(a − 1)]^{-1} ( [ Σ_{k=1}^{a} (s_kj − β_j) ]² − Σ_{k=1}^{a} (s_kj − β_j)² )
  = −σ_j²/(a − 1).

Before proceeding, note that although the terms β_j are not known, the differences among them are known constants under randomization theory:

ȳ_·j = a^{-1} Σ_{i=1}^{a} [ α_i + β_j + Σ_{k=1}^{a} δ^i_{kj} (s_kj − β_j) ]
  = a^{-1} [ Σ_{i=1}^{a} α_i + aβ_j ]
  = ᾱ_· + β_j.

Therefore, ȳ_·j − ȳ_·j' = β_j − β_j' = s̄_·j − s̄_·j'. Since these quantities are fixed and known, there is no basis for a test of H0: β_1 = ··· = β_b. In fact the linear model is not just model (2), but model (2) subject to these constraints on the β's. To get best linear unbiased estimates we need to assume that σ_1² = σ_2² = ··· = σ_b² = σ². We can now write the linear model in matrix form and establish that least squares estimates of treatment means and contrasts in the α_i's are BLUEs. In the discussion that follows, we use notation from Section VII.1. Model (2) can be rewritten as

Y = Xη + e,   E(e) = 0,   Cov(e) = V,   (3)

where η = [μ, α_1, ..., α_a, β_1, ..., β_b]'. If we let X_2 be the columns of X corresponding to β_1, ..., β_b, then (cf. Section XI.1)

V = σ²[a/(a − 1)] [I − (1/a) X_2 X_2']
  = σ²[a/(a − 1)] [I − M_μ − M_β].

If model (3) were the appropriate model, checking that C(VX) ⊂ C(X) would be trivial, based on the fact that C(X_2) ⊂ C(X). However, we must account for the constraints on the model. In particular,

M_β Xη = [t_ij], where t_ij = β_j − β̄_· = ȳ_·j − ȳ_·· = s̄_·j − s̄_·· .

This is a fixed, known quantity. Proceeding as in Section III.3, the model is subject to the constraint M_β Xη = M_β Y. Because M_β Y = Xb for some b and C(MM_β) = C(M_β), the constrained model is equivalent to

(Y − M_β Y) = (I − M_β)Xγ + e.   (4)

We want to show that least squares estimates of contrasts in the α's based on Y are BLUEs with respect to this model. First we show that least squares estimates based on (Y − M_β Y) = (I − M_β)Y are BLUEs. We need to show that

C(V(I − M_β)X) = C[(I − M_μ − M_β)(I − M_β)X] ⊂ C[(I − M_β)X].

Because (I − M_μ − M_β)(I − M_β) = (I − M_μ − M_β), we have

C(V(I − M_β)X) = C[(I − M_μ − M_β)X],

and because C(I − M_μ − M_β) ⊂ C(I − M_β) we have

C[(I − M_μ − M_β)X] ⊂ C[(I − M_β)X].

To finish the proof that least squares estimates based on Y are BLUEs, note that the estimation space for model (4) is C[(I − M_β)X] = C(M_μ + M_α). BLUEs are based on (M_μ + M_α)(I − M_β)Y = (M_μ + M_α)Y. Thus, any linear parametric function in model (3) that generates a constraint on C(M_μ + M_α) has a BLUE based on (M_μ + M_α)Y (cf. Exercise 3.9.5). In particular, this is true for contrasts in the α's. Standard errors for estimable functions are found in a manner analogous to Section XI.1. This is true even though model (4) is not of the form considered in XI.1; it is a result of the orthogonality relationships that are present.

The assumption that σ_1² = ··· = σ_b² is a substantial one. Least squares estimates without this assumption are unbiased, but may be far from optimal. It is important, when choosing blocks, to choose them so that their variances are approximately equal.
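The column space condition in this argument can also be verified numerically. The sketch below builds X, X_2, M_β, and V (with σ² taken to be 1) for hypothetical values a = 3 and b = 4 and checks that V(I − M_β)X lies in C[(I − M_β)X]; NumPy is assumed.

```python
# RCB check that C(V(I - M_beta)X) is contained in C[(I - M_beta)X].
import numpy as np

a, b = 3, 4
n = a * b                                           # observations ordered (i, j)
X_alpha = np.kron(np.eye(a), np.ones((b, 1)))       # treatment columns
X2 = np.kron(np.ones((a, 1)), np.eye(b))            # block columns
X = np.hstack([np.ones((n, 1)), X_alpha, X2])

V = (a / (a - 1)) * (np.eye(n) - X2 @ X2.T / a)     # sigma^2 taken to be 1

def ppo(A):
    """Perpendicular projection operator onto C(A)."""
    return A @ np.linalg.pinv(A)

M_beta = ppo(X2) - np.ones((n, n)) / n              # block space adjusted for the grand mean
W = (np.eye(n) - M_beta) @ X
print(np.allclose(ppo(W) @ (V @ W), V @ W))         # True: VW already lies in C(W)
```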

EXERCISE G.2. Find the standard error for a contrast in the α_i's of model (2).

References

Aitchison, J. and Dunsmore, I. R. (1975). Statistical Prediction Analysis. Cambridge University Press, Cambridge.
Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis, Second Edition. John Wiley and Sons, New York.
Andrews, D. F. (1974). A Robust Method for Multiple Regression. Technometrics, 16, 523-531.
Arnold, Steven F. (1981). The Theory of Linear Models and Multivariate Analysis. John Wiley and Sons, New York.
Atkinson, A. C. (1985). Plots, Transformations, and Regression. University Press, Oxford.
Atwood, C. L. and Ryan, T. A., Jr. (1977). A class of tests for lack of fit to a regression model. Unpublished manuscript.
Bailey, D. W. (1953). The Inheritance of Maternal Influences on the Growth of the Rat. Ph.D. Thesis, University of California.
Bartle, Robert G. (1964). The Elements of Real Analysis. John Wiley and Sons, New York.
Belsley, D. A., Kuh, E., and Welsch, R. E. (1980). Regression Diagnostics. John Wiley and Sons, New York.
Benedetti, Jacqueline K. and Brown, Morton B. (1978). Strategies for the Selection of Log-Linear Models. Biometrics, 34, 680-686.
Bishop, Yvonne M. M., Fienberg, Stephen E., and Holland, Paul W. (1975). Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge.
Blom, G. (1958). Statistical Estimates and Transformed Beta Variates. John Wiley and Sons, New York.
Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations (with discussion). Journal of the Royal Statistical Society, Series B, 26, 211-246.
Box, G. E. P., Hunter, W. G., and Hunter, J. S. (1978). Statistics for Experimenters. John Wiley and Sons, New York.
Brownlee, K. A. (1965). Statistical Theory and Methodology in Science and Engineering. John Wiley and Sons, New York.
Christensen, Ronald (1984). A Note on Ordinary Least Squares Methods for Two-Stage Sampling. Journal of the American Statistical Association, 79, 720-721.

Christensen, Ronald (1986). Methods for the Analysis of Cluster Sampling Models. Department of Mathematical Sciences, Montana State University, Technical Report #7-22-86.
Christensen, Ronald (1987). The Analysis of Two-Stage Sampling Data by Ordinary Least Squares. Journal of the American Statistical Association, to appear.
Cochran, William G. and Cox, Gertrude M. (1957). Experimental Designs. John Wiley and Sons, New York.
Cook, R. D. (1977). Detection of influential observations in linear regression. Technometrics, 19, 15-18.
Cook, R. Dennis and Weisberg, Sanford (1982). Residuals and Influence in Regression. Chapman and Hall, New York.
Cox, D. R. (1958). Planning of Experiments. John Wiley and Sons, New York.
Cox, D. R. and Hinkley, D. V. (1974). Theoretical Statistics. Chapman and Hall, London.
Doob, J. L. (1953). Stochastic Processes. John Wiley and Sons, New York.
Daniel, Cuthbert and Wood, Fred S. (1980). Fitting Equations to Data, Second Edition. John Wiley and Sons, New York.
Draper, Norman and Smith, Harry (1981). Applied Regression Analysis, Second Edition. John Wiley and Sons, New York.
Durbin, J. and Watson, G. S. (1951). Testing for serial correlation in least squares regression II. Biometrika, 38, 159-178.
Eaton, Morris L. (1983). Multivariate Statistics: A Vector Space Approach. John Wiley and Sons, New York.
Eaton, Morris L. (1985). The Gauss-Markov Theorem in Multivariate Analysis. In Multivariate Analysis-VI, edited by P. R. Krishnaiah. Elsevier Science Publishers B. V.
Ferguson, Thomas S. (1967). Mathematical Statistics: A Decision Theoretic Approach. Academic Press, New York.
Fienberg, Stephen E. (1980). The Analysis of Cross-Classified Categorical Data. MIT Press, Cambridge.
Fisher, Sir Ronald A. (1935). The Design of Experiments. Hafner Press, New York.
Fuller, Wayne A. (1976). Introduction to Statistical Time Series. John Wiley and Sons, New York.
Furnival, G. M. and Wilson, R. W. (1974). Regression by leaps and bounds. Technometrics, 16, 499-511.
Geisser, Seymour (1971). The Inferential Use of Predictive Distributions. In Foundations of Statistical Inference, edited by V. P. Godambe and D. A. Sprott. Holt, Rinehart, and Winston, Toronto.
Gnanadesikan, R. (1977). Methods for Statistical Data Analysis of Multivariate Observations. John Wiley and Sons, New York.
Goldstein, M. and Smith, A. F. M. (1974). Ridge-type estimators for regression analysis. Journal of the Royal Statistical Society, Series B, 26, 284-291.
Gosslee, D. G. and Lucas, H. L. (1965). Analysis of variance of disproportionate data when interaction is present. Biometrics, 21, 115-133.
Graybill, Franklin A. (1976). Theory and Application of the Linear Model. Duxbury Press, North Scituate, MA.
Grizzle, J. E., Starmer, C. F., and Koch, G. G. (1969). Analysis of Categorical Data by Linear Models. Biometrics, 25, 489-504.
Guttman, I. (1970). Statistical Tolerance Regions: Classical and Bayesian. Hafner Press, New York.
Haberman, Shelby J. (1974). The Analysis of Frequency Data. The University of Chicago Press, Chicago.
Hartigan, J. A. (1969). Linear Bayesian Methods. Journal of the Royal Statistical Society, Series B, 31, 446-454.

Henderson, C. R. (1953). Estimation of variance and covariance components. Biometrics, 9, 226-252.
Hinkley, David V. (1969). Inference about the Intersection in Two-phase Regression. Biometrika, 56, 495-504.
Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression: biased estimation for non-orthogonal problems. Technometrics, 12, 55-67.
Journel, A. G. and Huijbregts, Ch. J. (1978). Mining Geostatistics. Academic Press, New York.
Kempthorne, Oscar (1952). Design and Analysis of Experiments. Krieger, Huntington.
Lehmann, E. L. (1959). Testing Statistical Hypotheses. John Wiley and Sons, New York.
Marquardt, D. W. (1970). Generalized inverses, ridge regression, biased linear estimation, and nonlinear estimation. Technometrics, 12, 591-612.
Miller, Rupert G., Jr. (1981). Simultaneous Statistical Inference, Second Edition. Springer-Verlag, New York.
Morrison, Donald F. (1976). Multivariate Statistical Methods. McGraw-Hill, New York.
Mosteller, Frederick and Tukey, John W. (1977). Data Analysis and Regression. Addison-Wesley, Reading.
Myers, Raymond H. (1976). Response Surface Methodology. Published privately.
Neill, J. W. and Johnson, D. E. (1984). Testing for Lack of Fit in Regression - A Review. Communications in Statistics - Theory and Methods, 13, 485-511.
Neter, John, Wasserman, William, and Kutner, Michael H. (1985). Applied Linear Statistical Models. Richard D. Irwin Inc., Homewood.
Puri, Madan Lal and Sen, Pranab Kumar (1971). Nonparametric Methods in Multivariate Analysis. John Wiley and Sons, New York.
Rao, C. R. (1971). Estimation of variance and covariance components - MINQUE theory. Journal of Multivariate Analysis, 1, 257-275.
Rao, C. Radhakrishna (1973). Linear Statistical Inference and Its Applications. John Wiley and Sons, New York.
Schatzoff, M., Tsao, R., and Fienberg, S. (1968). Efficient calculations of all possible regressions. Technometrics, 10, 768-779.
Scheffe, Henry (1959). The Analysis of Variance. John Wiley and Sons, New York.
Searle, S. R. (1970). Large sample variances of maximum likelihood estimators of variance components. Biometrics, 26, 505-524.
Searle, S. R. (1971). Linear Models. John Wiley and Sons, New York.
Searle, S. R. (1979). Notes on Variance Component Estimation: A Detailed Account of Maximum Likelihood and Kindred Methodology. Biometrics Unit, Cornell University, Ithaca.
Searle, S. R. and Pukelsheim, F. (1987). Estimation of the Mean Vector in Linear Models. Biometrics Unit, Cornell University, Technical Report BU-912-M.
Seber, G. A. F. (1966). The Linear Hypothesis: A General Theory. Griffin, London.
Seber, G. A. F. (1977). Linear Regression Analysis. John Wiley and Sons, New York.
Shapiro, S. S. and Francia, R. S. (1972). An approximate analysis of variance test for normality. Journal of the American Statistical Association, 67, 215-216.
Shapiro, S. S. and Wilk, M. B. (1965). An analysis of variance test for normality (complete samples). Biometrika, 52, 591-611.
Smith, Adrian F. M. (1986). Comment on an Article by B. Efron. The American Statistician, 40, 10.
Snedecor, George W. and Cochran, William G. (1980). Statistical Methods. Iowa State University Press, Ames.
Sulzberger, P. H. (1953). The effects of temperature on the strength of wood, plywood and glued joints. Aeronautical Research Consultative Committee, Australia, Department of Supply, Report ACA-46.

Utts, J. (1982). The rainbow test for lack of fit in regression. Communications in Statistics - Theory and Methods, 11, 2801-2815.
Weisberg, S. (1974). An empirical comparison of the cumulative distributions of W and W'. Biometrika, 61, 644-646.
Weisberg, S. (1985). Applied Linear Regression, Second Edition. John Wiley and Sons, New York.
Wermuth, Nanny (1976). Model search among multiplicative models. Biometrics, 32, 253-264.
Williams, E. J. (1959). Regression Analysis. John Wiley and Sons, New York.
