
Chapter 1 Some Results on Linear Algebra, Matrix Theory and Distributions

We need some basic knowledge of linear algebra, matrix theory and distributions to understand the topics in the analysis of variance.

Vectors: A vector $Y$ is an ordered $n$-tuple of real numbers. A vector can be expressed as a row vector or a column vector:

$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$$

is a column vector of order $n \times 1$, and $Y' = (y_1, y_2, \ldots, y_n)$ is a row vector of order $1 \times n$.

If $y_i = 0$ for all $i = 1, 2, \ldots, n$, then $Y' = (0, 0, \ldots, 0)$ is called the null vector.

If
$$X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad Z = \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{pmatrix},$$
then
$$X + Y = \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{pmatrix}, \quad kY = \begin{pmatrix} ky_1 \\ ky_2 \\ \vdots \\ ky_n \end{pmatrix},$$
and
$$X + (Y + Z) = (X + Y) + Z,$$
$$X'(Y + Z) = X'Y + X'Z,$$
$$k(X'Y) = (kX)'Y = X'(kY),$$
$$k(X + Y) = kX + kY,$$
$$X'Y = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n,$$
where $k$ is a scalar.

Orthogonal vectors: Two vectors $X$ and $Y$ are said to be orthogonal if $X'Y = Y'X = 0$.

The null vector is orthogonal to every vector X and is the only such vector.
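These operations are easy to check numerically. The following is a minimal numpy sketch (the vectors and the scalar are arbitrary illustrative choices):

```python
# Vector addition, scalar multiplication, inner product, and the
# orthogonality condition X'Y = 0, checked with numpy.
import numpy as np

X = np.array([1.0, 2.0, 3.0])
Y = np.array([4.0, -2.0, 0.0])
k = 2.5

print(X + Y)   # elementwise sum (x_i + y_i)
print(k * Y)   # scalar multiple (k * y_i)
print(X @ Y)   # X'Y = 1*4 + 2*(-2) + 3*0 = 0, so X and Y are orthogonal
```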

Linear combination:

If $x_1, x_2, \ldots, x_m$ are $m$ vectors and $k_1, k_2, \ldots, k_m$ are $m$ scalars, then

$$t = \sum_{i=1}^{m} k_i x_i$$

is called a linear combination of $x_1, x_2, \ldots, x_m$.

Linear independence

If $X_1, X_2, \ldots, X_m$ are $m$ vectors, then they are said to be linearly independent if

$$\sum_{i=1}^{m} k_i X_i = 0 \;\Rightarrow\; k_i = 0 \text{ for all } i = 1, 2, \ldots, m.$$

If there exist scalars $k_1, k_2, \ldots, k_m$, with at least one $k_i$ nonzero, such that $\sum_{i=1}^{m} k_i X_i = 0$, then $X_1, X_2, \ldots, X_m$ are said to be linearly dependent.
• Any set of vectors containing the null vector is linearly dependent.
• Any set of non-null pairwise orthogonal vectors is linearly independent.
• If $m > 1$ vectors are linearly dependent, it is always possible to express at least one of them as a linear combination of the others.
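These criteria can be checked numerically: vectors are linearly independent exactly when the matrix having them as columns has rank equal to the number of vectors. A small numpy sketch with arbitrarily chosen vectors:

```python
# Linear independence via matrix rank: stack the vectors as columns and
# compare the rank of the resulting matrix with the number of vectors.
import numpy as np

x1 = np.array([1.0, 0.0, 1.0])
x2 = np.array([0.0, 1.0, 1.0])
x3 = x1 + 2 * x2            # deliberately dependent on x1 and x2

print(np.linalg.matrix_rank(np.column_stack([x1, x2])))       # 2 -> independent
print(np.linalg.matrix_rank(np.column_stack([x1, x2, x3])))   # 2 < 3 -> dependent
```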

Linear function:

Let K  (kk12 , ,..., km )' be a m 1 vector of scalars and Xxxx (12 , ,...,m ) be a m  1 vector of variables,

m then K 'Yky  ii is called a linear function or linear form. The vector K is called the coefficient i1 vector. For example, the mean of x12,xx ,..., m can be expressed as

x1  11m x 1 2 ' x  xXim(1,1,...,1)  1 mmi1  m  xm

' where 1m is a m 1 vector of all elements unity. Analysis of Variance | Chapter 1 | Linear Algebra, Matrix Theory and Dist. | Shalabh, IIT Kanpur 2 Contrast:

The linear function $K'X = \sum_{i=1}^{m} k_i x_i$ is called a contrast in $x_1, x_2, \ldots, x_m$ if $\sum_{i=1}^{m} k_i = 0$. For example, the linear functions
$$x_1 - x_2, \quad x_1 + x_2 - 2x_3, \quad x_1 - \frac{x_2 + x_3}{2}$$
are contrasts.

m  A linear function KX' is a contrast if and only if it is orthogonal to a linear function  xi or to i 1 m the linear function x   x.i . m i1

 Contrasts x1213xx, x ,..., x 1  xj are linearly independent for all j  2,3,...,m .

• Every contrast in $x_1, x_2, \ldots, x_m$ can be written as a linear combination of the $(m - 1)$ contrasts $x_1 - x_2, x_1 - x_3, \ldots, x_1 - x_m$.

Matrix: A matrix is a rectangular array of real numbers. For example

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$
is a matrix of order $m \times n$ with $m$ rows and $n$ columns.
• If $m = n$, then $A$ is called a square matrix.

• If $a_{ij} = 0$ for all $i \neq j$, then $A$ is a diagonal matrix and is denoted as $A = \mathrm{diag}(a_{11}, a_{22}, \ldots, a_{mm})$.

• If $m = n$ (square matrix) and $a_{ij} = 0$ for $i > j$, then $A$ is called an upper triangular matrix. On the other hand, if $m = n$ and $a_{ij} = 0$ for $i < j$, then $A$ is called a lower triangular matrix.

• If $A$ is an $m \times n$ matrix, then the matrix obtained by writing the rows of $A$ as the columns and the columns of $A$ as the rows is called the transpose of $A$ and is denoted as $A'$.
• If $A = A'$, then $A$ is a symmetric matrix.
• If $A = -A'$, then $A$ is a skew-symmetric matrix.
• A matrix whose elements are all equal to zero is called a null matrix.

• An identity matrix is a square matrix of order $p$ whose diagonal elements are unity (ones) and all off-diagonal elements are zero. It is denoted as $I_p$.

• If $A$ and $B$ are matrices of order $m \times n$, then $(A + B)' = A' + B'$.
• If $A$ and $B$ are matrices of order $m \times n$ and $n \times p$ respectively, and $k$ is any scalar, then $(AB)' = B'A'$ and $(kA)B = A(kB) = k(AB) = kAB$.
• If the orders of the matrices $A$, $B$ and $C$ are $m \times n$, $n \times p$ and $n \times p$ respectively, then $A(B + C) = AB + AC$.
• If the orders of the matrices $A$, $B$ and $C$ are $m \times n$, $n \times p$ and $p \times q$ respectively, then $(AB)C = A(BC)$.
• If $A$ is a matrix of order $m \times n$, then $I_m A = A I_n = A$.

Trace of a matrix:
The trace of an $n \times n$ matrix $A$, denoted $\mathrm{tr}(A)$ or $\mathrm{trace}(A)$, is defined to be the sum of all the diagonal elements of $A$, i.e.,
$$\mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii}.$$

• If $A$ is of order $m \times n$ and $B$ is of order $n \times m$, then $\mathrm{tr}(AB) = \mathrm{tr}(BA)$.

 If A is nn matrix and P is any nonsingular nn matrix then tr() A tr ( P1 AP ). If P is an than tr() A tr (' P AP ).

• If $A$ and $B$ are $n \times n$ matrices and $a$ and $b$ are scalars, then $\mathrm{tr}(aA + bB) = a\,\mathrm{tr}(A) + b\,\mathrm{tr}(B)$.

 If A is an mn matrix, then

nn 2 tr(') A A tr ( AA ')  aij ji11 and tr(') A A tr ( AA ')0 if and only if A  0.

 If A is nn matrix then tr(') A trA.

Rank of a matrix:
The rank of a matrix $A$ of order $m \times n$ is the number of linearly independent rows in $A$. Let $B$ be any other matrix of order $n \times q$.
• A square matrix of order $m$ is called non-singular if it has full rank.
• $\mathrm{rank}(AB) \leq \min(\mathrm{rank}(A), \mathrm{rank}(B))$.
• $\mathrm{rank}(A + B) \leq \mathrm{rank}(A) + \mathrm{rank}(B)$.
• The rank of $A$ is equal to the maximum order of all nonsingular square sub-matrices of $A$.
• $\mathrm{rank}(A'A) = \mathrm{rank}(AA') = \mathrm{rank}(A) = \mathrm{rank}(A')$.
• $A$ is of full row rank if $\mathrm{rank}(A) = m < n$.
• $A$ is of full column rank if $\mathrm{rank}(A) = n < m$.
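A numpy sketch spot-checking the rank facts above on a deliberately rank-deficient matrix (the sizes and seed are arbitrary choices):

```python
# rank(AB) <= min(rank(A), rank(B)) and rank(A'A) = rank(A),
# checked on a matrix of rank 2 built by construction.
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 2)) @ rng.normal(size=(2, 5))   # rank 2 by construction
B = rng.normal(size=(5, 3))

rA = np.linalg.matrix_rank(A)
print(rA)                                                               # 2
print(np.linalg.matrix_rank(A @ B) <= min(rA, np.linalg.matrix_rank(B)))  # True
print(np.linalg.matrix_rank(A.T @ A) == rA)                             # True
```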

Inverse of a matrix:
The inverse of a square matrix $A$ of order $m$ is a square matrix of order $m$, denoted $A^{-1}$, such that
$$A^{-1}A = AA^{-1} = I_m.$$
The inverse of $A$ exists if and only if $A$ is non-singular.
• $(A^{-1})^{-1} = A$.
• If $A$ is non-singular, then $(A')^{-1} = (A^{-1})'$.
• If $A$ and $B$ are non-singular matrices of the same order, then their product is also non-singular and $(AB)^{-1} = B^{-1}A^{-1}$.

Idempotent matrix:
A square matrix $A$ is called idempotent if $A^2 = AA = A$.

If $A$ is an $n \times n$ idempotent matrix with $\mathrm{rank}(A) = r \leq n$, then
• the eigenvalues of $A$ are 1 or 0;
• $\mathrm{trace}(A) = \mathrm{rank}(A) = r$.
• If $A$ is of full rank $n$, then $A = I_n$.
• If $A$ and $B$ are idempotent and $AB = BA$, then $AB$ is also idempotent.
• If $A$ is idempotent, then $(I - A)$ is also idempotent and $A(I - A) = (I - A)A = 0$.
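A standard example of an idempotent matrix in the analysis of variance is the hat matrix $H = X(X'X)^{-1}X'$ of least squares. The sketch below, with a randomly generated $X$, verifies idempotency, $\mathrm{trace}(H) = \mathrm{rank}(H)$, the 0/1 eigenvalues, and $H(I - H) = 0$:

```python
# The hat matrix H = X (X'X)^{-1} X' is symmetric and idempotent.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 3))                  # full column rank (with prob. 1)
H = X @ np.linalg.inv(X.T @ X) @ X.T

print(np.allclose(H @ H, H))                                # idempotent
print(np.isclose(np.trace(H), np.linalg.matrix_rank(H)))    # trace = rank = 3
print(np.allclose(np.round(np.linalg.eigvalsh(H)), [0, 0, 0, 1, 1, 1]))
print(np.allclose(H @ (np.eye(6) - H), 0))                  # A(I - A) = 0
```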

Quadratic forms:
If $A$ is a given matrix of order $m \times n$, and $X$ and $Y$ are two given vectors of order $m \times 1$ and $n \times 1$ respectively, then
$$X'AY = \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij} x_i y_j,$$
where the $a_{ij}$ are the nonstochastic elements of $A$.

If $A$ is a square matrix of order $m$ and $X = Y$, then
$$X'AX = a_{11}x_1^2 + \cdots + a_{mm}x_m^2 + (a_{12} + a_{21})x_1 x_2 + \cdots + (a_{m-1,m} + a_{m,m-1})x_{m-1}x_m.$$

If $A$ is also symmetric, then
$$X'AX = a_{11}x_1^2 + \cdots + a_{mm}x_m^2 + 2a_{12}x_1 x_2 + \cdots + 2a_{m-1,m}x_{m-1}x_m = \sum_{i=1}^{m}\sum_{j=1}^{m} a_{ij} x_i x_j$$
is called a quadratic form in the $m$ variables $x_1, x_2, \ldots, x_m$, or a quadratic form in $X$.
• To every quadratic form corresponds a symmetric matrix and vice versa.
• The matrix $A$ is called the matrix of the quadratic form.
• The quadratic form $X'AX$ and the matrix $A$ of the form are called
  • positive definite if $X'AX > 0$ for all $X \neq 0$;
  • positive semidefinite if $X'AX \geq 0$ for all $X \neq 0$;
  • negative definite if $X'AX < 0$ for all $X \neq 0$;
  • negative semidefinite if $X'AX \leq 0$ for all $X \neq 0$.

• If $A$ is positive semidefinite, then $a_{ii} \geq 0$; and if $a_{ii} = 0$, then $a_{ij} = 0$ and $a_{ji} = 0$ for all $j$.
• If $P$ is any nonsingular matrix and $A$ is any positive definite (or positive semidefinite) matrix, then $P'AP$ is also positive definite (or positive semidefinite).
• A matrix $A$ is positive definite if and only if there exists a non-singular matrix $P$ such that $A = PP'$.
• A positive definite matrix is nonsingular.

 If A is mn matrix and rank A m n then AA' is positive definite and A' A is positive

semidefinite.  If A mn matrix and rank A k m n , then both A' Aand AA' are positive semidefinite.
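Definiteness is conveniently checked through eigenvalues: a symmetric matrix is positive definite when all its eigenvalues are positive, and positive semidefinite when they are all nonnegative. A numpy sketch (the tolerance and test matrices are arbitrary choices):

```python
# Classify a symmetric matrix by the signs of its eigenvalues and
# verify that A'A is positive semidefinite for a rank-deficient A.
import numpy as np

def classify(S, tol=1e-10):
    ev = np.linalg.eigvalsh(S)               # S assumed symmetric
    if np.all(ev > tol):
        return "positive definite"
    if np.all(ev >= -tol):
        return "positive semidefinite"
    return "indefinite or negative"

rng = np.random.default_rng(3)
A = rng.normal(size=(5, 3)) @ rng.normal(size=(3, 4))   # 5x4 matrix of rank 3
print(classify(A.T @ A))                     # positive semidefinite (rank 3 < 4)
print(classify(np.eye(4)))                   # positive definite
```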

Simultaneous linear equations

The set of $m$ linear equations in $n$ unknowns $x_1, x_2, \ldots, x_n$, with known scalars $a_{ij}$ and $b_i$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$, of the form
$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}$$
can be formulated as $AX = b$, where
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$
is an $m \times n$ real matrix called the coefficient matrix,
$$X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$
is an $n \times 1$ vector of variables, and
$$b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}$$
is an $m \times 1$ real vector of known scalars.
• If $A$ is an $n \times n$ nonsingular matrix, then $AX = b$ has a unique solution.

• Let $B = [A, b]$ be the augmented matrix. A solution to $AX = b$ exists if and only if $\mathrm{rank}(A) = \mathrm{rank}(B)$.
• If $A$ is an $m \times n$ matrix of rank $m$, then $AX = b$ has a solution.
• The linear homogeneous system $AX = 0$ has a solution other than $X = 0$ if and only if $\mathrm{rank}(A) < n$.
• If $AX = b$ is consistent, then $AX = b$ has a unique solution if and only if $\mathrm{rank}(A) = n$.
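The consistency criterion and the unique-solution case can be illustrated numerically; a short numpy sketch with an arbitrarily chosen nonsingular system:

```python
# Consistency of AX = b via rank(A) vs rank([A, b]), then solving a
# nonsingular system with numpy.
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])

B = np.column_stack([A, b])                  # augmented matrix [A, b]
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(B))   # True -> consistent
print(np.linalg.solve(A, b))                 # unique solution [1., 3.]
```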

th  If aii is the i diagonal element of an orthogonal matrix, then 11aii .

 Let the nn matrix be partitioned as A [aa12 , ,..., an ]where ai is an n 1 vector of the elements of ith column of A. A necessary and sufficient condition that A is an orthogonal matrix is given by the following:

' (iaa ) ii 1 for i 1,2,..., n ' ()ii aij a 0 for i j 1,2,...,. n

Analysis of Variance | Chapter 1 | Linear Algebra, Matrix Theory and Dist. | Shalabh, IIT Kanpur 7

Orthogonal matrix A square matrix A is called an orthogonal matrix if A''AAAI  or equivalently if A1  A'.

 An orthogonal matrix is non-singular.  If A is orthogonal, then AA' is also orthogonal.

 If A is an nn matrix and let P is an nn orthogonal matrix, then the determinants of A and PAP' are the same.

Random vectors:

Let $Y_1, Y_2, \ldots, Y_n$ be $n$ random variables; then $Y = (Y_1, Y_2, \ldots, Y_n)'$ is called a random vector.
• The mean vector of $Y$ is
$$E(Y) = (E(Y_1), E(Y_2), \ldots, E(Y_n))'.$$
• The covariance or dispersion matrix of $Y$ is
$$\mathrm{Var}(Y) = \begin{pmatrix} \mathrm{Var}(Y_1) & \mathrm{Cov}(Y_1, Y_2) & \cdots & \mathrm{Cov}(Y_1, Y_n) \\ \mathrm{Cov}(Y_2, Y_1) & \mathrm{Var}(Y_2) & \cdots & \mathrm{Cov}(Y_2, Y_n) \\ \vdots & \vdots & & \vdots \\ \mathrm{Cov}(Y_n, Y_1) & \mathrm{Cov}(Y_n, Y_2) & \cdots & \mathrm{Var}(Y_n) \end{pmatrix},$$
which is a symmetric matrix.
• If $Y_1, Y_2, \ldots, Y_n$ are independently distributed, then the covariance matrix is a diagonal matrix.
• If $\mathrm{Var}(Y_i) = \sigma^2$ for all $i = 1, 2, \ldots, n$, then $\mathrm{Var}(Y) = \sigma^2 I_n$.

Linear functions of random variables:

If $Y_1, Y_2, \ldots, Y_n$ are $n$ random variables and $k_1, k_2, \ldots, k_n$ are scalars, then $\sum_{i=1}^{n} k_i Y_i$ is called a linear function of the random variables $Y_1, Y_2, \ldots, Y_n$.

If $Y = (Y_1, Y_2, \ldots, Y_n)'$ and $K = (k_1, k_2, \ldots, k_n)'$, then $K'Y = \sum_{i=1}^{n} k_i Y_i$, and
• the mean of $K'Y$ is $E(K'Y) = K'E(Y) = \sum_{i=1}^{n} k_i E(Y_i)$, and
• the variance of $K'Y$ is $\mathrm{Var}(K'Y) = K'\,\mathrm{Var}(Y)\,K$.


Multivariate normal distribution

A random vector $Y = (Y_1, Y_2, \ldots, Y_n)'$ has a multivariate normal distribution with mean vector $\mu = (\mu_1, \mu_2, \ldots, \mu_n)'$ and dispersion matrix $\Sigma$ if its probability density function is
$$f(Y \mid \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(Y - \mu)'\Sigma^{-1}(Y - \mu)\right],$$
assuming $\Sigma$ is a nonsingular matrix.

Chi-square distribution

• If $Y_1, Y_2, \ldots, Y_k$ are independently distributed random variables following the normal distribution with common mean 0 and common variance 1, then the distribution of $\sum_{i=1}^{k} Y_i^2$ is called the $\chi^2$-distribution with $k$ degrees of freedom.
• The probability density function of the $\chi^2$-distribution with $k$ degrees of freedom is
$$f(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{\frac{k}{2}-1} \exp\left(-\frac{x}{2}\right); \quad 0 < x < \infty.$$

• If $Y_1, Y_2, \ldots, Y_k$ are independently distributed following the normal distribution with common mean 0 and common variance $\sigma^2$, then $\frac{1}{\sigma^2}\sum_{i=1}^{k} Y_i^2$ has a $\chi^2$-distribution with $k$ degrees of freedom.

• If the random variables $Y_1, Y_2, \ldots, Y_k$ are normally distributed with non-null means $\mu_1, \mu_2, \ldots, \mu_k$ but common variance 1, then the distribution of $\sum_{i=1}^{k} Y_i^2$ is the noncentral $\chi^2$-distribution with $k$ degrees of freedom and noncentrality parameter $\lambda = \sum_{i=1}^{k} \mu_i^2$.

• If $Y_1, Y_2, \ldots, Y_k$ are independently and normally distributed with means $\mu_1, \mu_2, \ldots, \mu_k$ but common variance $\sigma^2$, then $\frac{1}{\sigma^2}\sum_{i=1}^{k} Y_i^2$ has a noncentral $\chi^2$-distribution with $k$ degrees of freedom and noncentrality parameter $\lambda = \frac{1}{\sigma^2}\sum_{i=1}^{k} \mu_i^2$.
• If $U$ has a chi-square distribution with $k$ degrees of freedom, then $E(U) = k$ and $\mathrm{Var}(U) = 2k$.
• If $U$ has a noncentral chi-square distribution with $k$ degrees of freedom and noncentrality parameter $\lambda$, then $E(U) = k + \lambda$ and $\mathrm{Var}(U) = 2k + 4\lambda$.

• If $U_1, U_2, \ldots, U_k$ are independently distributed random variables, with each $U_i$ having a noncentral chi-square distribution with $n_i$ degrees of freedom and noncentrality parameter $\lambda_i$, $i = 1, 2, \ldots, k$, then $\sum_{i=1}^{k} U_i$ has a noncentral chi-square distribution with $\sum_{i=1}^{k} n_i$ degrees of freedom and noncentrality parameter $\sum_{i=1}^{k} \lambda_i$.
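These moment formulas are easy to verify by simulation. The sketch below sums squared normal variables and compares the empirical mean and variance with $k$, $2k$ and, in the noncentral case, $k + \lambda$, $2k + 4\lambda$ (the means and simulation size are arbitrary choices):

```python
# Central and noncentral chi-square moments via Monte Carlo.
import numpy as np

rng = np.random.default_rng(5)
k, n_sim = 4, 500_000
mu = np.array([0.5, 1.0, 0.0, -0.5])
lam = np.sum(mu ** 2)                       # noncentrality parameter = 1.5

U0 = np.sum(rng.normal(size=(n_sim, k)) ** 2, axis=1)           # central
U1 = np.sum((rng.normal(size=(n_sim, k)) + mu) ** 2, axis=1)    # noncentral
print(U0.mean(), U0.var())                  # approx k = 4 and 2k = 8
print(U1.mean(), U1.var())                  # approx k + lam and 2k + 4*lam
```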

 Let XXXX (12 , ,...,n )' has a multivariate distribution with mean vector  and positive

definite covariance matrix  . Then X ' AX is distributed as noncentral  2 with k degrees of freedom if and only if A is an idempotent matrix of rank k.

 Let XXXX (,12 ,...,)n has a multivariate normal distribution with mean vector  and positive definite covariance matrix  . Let the two quadratic forms-

2  XAX' 1 is distributed as  with n1 degrees of freedom and noncentrality parameter

 ' A1 and

2  XAX' 2 is distributed as  with 2 degrees of freedom and noncentrality parameter

 ' A2 .

Then XAX'and'12 XAX are independently distributed if AA12  0 t-distribution  If  X has a normal distribution with mean 0 and variance 1,  Y has a  2 distribution with n degrees of freedom, and  X and Y are independent random variables, X then the distribution of the statistic T  is called the t-distribution with n degrees of Yn/ freedom. The probability density function of t is

n 1 n1  2  2 t 2 ftT ( ) 1 ; - t n n n 2 X  If the mean of X is nonzero then the distribution of is called the noncentral t- Yn/ distribution with n degrees of freedom and noncentrality parameter  . Analysis of Variance | Chapter 1 | Linear Algebra, Matrix Theory and Dist. | Shalabh, IIT Kanpur 10

F-distribution
• If $X$ and $Y$ are independent random variables having $\chi^2$-distributions with $m$ and $n$ degrees of freedom respectively, then the distribution of the statistic $F = \dfrac{X/m}{Y/n}$ is called the F-distribution with $m$ and $n$ degrees of freedom. The probability density function of $F$ is
$$f_F(f) = \frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)}\left(\frac{m}{n}\right)^{m/2} f^{\frac{m}{2}-1}\left(1 + \frac{m}{n}f\right)^{-\frac{m+n}{2}}; \quad 0 < f < \infty.$$
• If $X$ has a noncentral chi-square distribution with $m$ degrees of freedom and noncentrality parameter $\lambda$, $Y$ has a $\chi^2$-distribution with $n$ degrees of freedom, and $X$ and $Y$ are independent random variables, then the distribution of $F = \dfrac{X/m}{Y/n}$ is the noncentral F-distribution with $m$ and $n$ degrees of freedom and noncentrality parameter $\lambda$.

Linear model:
Suppose there are $n$ observations. In the linear model, we assume that these observations are the values taken by $n$ random variables $Y_1, Y_2, \ldots, Y_n$ satisfying the following conditions:

1. $E(Y_i)$ is a linear combination of $p$ unknown parameters $\beta_1, \beta_2, \ldots, \beta_p$,
$$E(Y_i) = x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{ip}\beta_p, \quad i = 1, 2, \ldots, n,$$
where the $x_{ij}$'s are known constants.

2. $Y_1, Y_2, \ldots, Y_n$ are uncorrelated and normally distributed with variance $\mathrm{Var}(Y_i) = \sigma^2$.

The linear model can be rewritten by introducing independent normal random variables $\epsilon_i$ following $N(0, \sigma^2)$ as
$$Y_i = x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{ip}\beta_p + \epsilon_i, \quad i = 1, 2, \ldots, n.$$

These equations can be written using matrix notation as
$$Y = X\beta + \epsilon,$$
where $Y$ is an $n \times 1$ vector of observations, $X$ is an $n \times p$ matrix of $n$ observations on each of the variables $X_1, X_2, \ldots, X_p$, $\beta$ is a $p \times 1$ vector of parameters and $\epsilon$ is an $n \times 1$ vector of random error components with $\epsilon \sim N(0, \sigma^2 I_n)$. Here $Y$ is called the study or dependent variable, $X_1, X_2, \ldots, X_p$ are called explanatory or independent variables, and $\beta_1, \beta_2, \ldots, \beta_p$ are called regression coefficients.

Alternatively, since $Y \sim N(X\beta, \sigma^2 I)$, the linear model can also be expressed in expectation form as a normal random variable $Y$ with
$$E(Y) = X\beta, \quad \mathrm{Var}(Y) = \sigma^2 I.$$

Note that $\beta$ and $\sigma^2$ are unknown but $X$ is known.

Estimable functions:
A linear parametric function $\lambda'\beta$ of the parameters is said to be an estimable parametric function, or simply estimable, if there exists a linear function $\ell'Y$ of the random variables $Y = (Y_1, Y_2, \ldots, Y_n)'$ such that $E(\ell'Y) = \lambda'\beta$, where $\ell = (\ell_1, \ell_2, \ldots, \ell_n)'$ and $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_p)'$ are vectors of known scalars.

Best linear unbiased estimates (BLUE):
The minimum variance unbiased linear estimate $\ell'Y$ of an estimable function $\lambda'\beta$ is called the best linear unbiased estimate of $\lambda'\beta$.

• Suppose $\ell_1'Y$ and $\ell_2'Y$ are the BLUEs of $\lambda_1'\beta$ and $\lambda_2'\beta$ respectively. Then $(a_1\ell_1 + a_2\ell_2)'Y$ is the BLUE of $(a_1\lambda_1 + a_2\lambda_2)'\beta$.
• If $\lambda'\beta$ is estimable, its best estimate is $\lambda'\hat{\beta}$, where $\hat{\beta}$ is any solution of the normal equations $X'X\beta = X'Y$.

Least squares estimation:
The least-squares estimate of $\beta$ in $Y = X\beta + \epsilon$ is the value of $\beta$ which minimizes the error sum of squares $\epsilon'\epsilon$. Let
$$S = \epsilon'\epsilon = (Y - X\beta)'(Y - X\beta) = Y'Y - 2\beta'X'Y + \beta'X'X\beta.$$

Minimizing $S$ with respect to $\beta$ gives
$$\frac{\partial S}{\partial \beta} = 0 \;\Rightarrow\; X'X\beta = X'Y,$$
which is termed the normal equation. Assuming $\mathrm{rank}(X) = p$, the normal equation has the unique solution
$$\hat{\beta} = (X'X)^{-1}X'Y.$$
Note that $\dfrac{\partial^2 S}{\partial \beta\,\partial \beta'} = 2X'X$ is a positive definite matrix, so $\hat{\beta} = (X'X)^{-1}X'Y$ is the value of $\beta$ which minimizes $\epsilon'\epsilon$ and is termed the least squares estimator of $\beta$.
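The normal equations translate directly into code. The numpy sketch below simulates data from $Y = X\beta + \epsilon$ (with arbitrarily chosen $\beta$ and error scale), solves $X'X\beta = X'Y$, and checks that the residual vector is orthogonal to the columns of $X$:

```python
# Ordinary least squares via the normal equations X'X b = X'Y.
import numpy as np

rng = np.random.default_rng(7)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])
Y = X @ beta + rng.normal(scale=0.1, size=n)         # Y = X beta + eps

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)         # (X'X)^{-1} X'Y
print(beta_hat)                                      # close to beta
residual = Y - X @ beta_hat
print(np.allclose(X.T @ residual, 0, atol=1e-8))     # X'(Y - X beta_hat) = 0
```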

• In this case, $\beta_1, \beta_2, \ldots, \beta_p$ are estimable and consequently all linear parametric functions are estimable.
• $E(\hat{\beta}) = (X'X)^{-1}X'E(Y) = (X'X)^{-1}X'X\beta = \beta$.
• $\mathrm{Var}(\hat{\beta}) = (X'X)^{-1}X'\,\mathrm{Var}(Y)\,X(X'X)^{-1} = \sigma^2(X'X)^{-1}$.
• If $\lambda_1'\hat{\beta}$ and $\lambda_2'\hat{\beta}$ are the estimates of $\lambda_1'\beta$ and $\lambda_2'\beta$ respectively, then
  • $\mathrm{Var}(\lambda_1'\hat{\beta}) = \sigma^2[\lambda_1'(X'X)^{-1}\lambda_1]$,
  • $\mathrm{Cov}(\lambda_1'\hat{\beta}, \lambda_2'\hat{\beta}) = \sigma^2[\lambda_1'(X'X)^{-1}\lambda_2]$.
• $Y - X\hat{\beta}$ is called the residual vector.
• $E(Y - X\hat{\beta}) = 0$.

Linear model with correlated observations:
In the linear model $Y = X\beta + \epsilon$ with $E(\epsilon) = 0$, $\mathrm{Var}(\epsilon) = \sigma^2\Sigma$ and $\epsilon$ normally distributed, we find
$$E(Y) = X\beta, \quad \mathrm{Var}(Y) = \sigma^2\Sigma.$$

Assuming $\Sigma$ to be positive definite, we can write $\Sigma^{-1} = P'P$, where $P$ is a nonsingular matrix. Premultiplying $Y = X\beta + \epsilon$ by $P$ gives
$$PY = PX\beta + P\epsilon \quad \text{or} \quad Y^* = X^*\beta + \epsilon^*,$$
where $Y^* = PY$, $X^* = PX$ and $\epsilon^* = P\epsilon$.

Note that in this model $E(\epsilon^*) = 0$ and $\mathrm{Var}(\epsilon^*) = \sigma^2 I$.
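In practice the factor $P$ can be obtained from a Cholesky factorization of $\Sigma^{-1}$. A minimal numpy sketch, with an arbitrary $2 \times 2$ $\Sigma$, checking that $P\Sigma P' = I$ so the transformed errors are uncorrelated with equal variances:

```python
# Factor Sigma^{-1} = P'P via Cholesky and verify P Sigma P' = I.
import numpy as np

Sigma = np.array([[1.0, 0.6], [0.6, 2.0]])
L = np.linalg.cholesky(np.linalg.inv(Sigma))   # Sigma^{-1} = L L' (L lower)
P = L.T                                        # so Sigma^{-1} = P'P
print(np.allclose(P @ Sigma @ P.T, np.eye(2))) # Var(P eps) proportional to I
```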


Distribution of $\ell'Y$:
In the linear model $Y = X\beta + \epsilon$, $\epsilon \sim N(0, \sigma^2 I)$, consider a linear function $\ell'Y$, which is normally distributed with
$$E(\ell'Y) = \ell'X\beta, \quad \mathrm{Var}(\ell'Y) = \sigma^2(\ell'\ell).$$
Then
$$\frac{\ell'Y}{\sigma\sqrt{\ell'\ell}} \sim N\left(\frac{\ell'X\beta}{\sigma\sqrt{\ell'\ell}},\, 1\right).$$

Further, $\dfrac{(\ell'Y)^2}{\sigma^2\,\ell'\ell}$ has a noncentral chi-square distribution with one degree of freedom and noncentrality parameter $\dfrac{(\ell'X\beta)^2}{\sigma^2\,\ell'\ell}$.

Degrees of freedom:
A linear function $\ell'Y$ of the observations ($\ell \neq 0$) is said to carry one degree of freedom. A set of linear functions $L'Y$, where $L$ is an $r \times n$ matrix, is said to have $M$ degrees of freedom if the set contains $M$ linearly independent functions and no more. Alternatively, the degrees of freedom carried by the set $L'Y$ equal $\mathrm{rank}(L)$. When the functions $L'Y$ are the estimates of $\Lambda'\beta$, the degrees of freedom of the set $L'Y$ will also be called the degrees of freedom for the estimates of $\Lambda'\beta$.

Sum of squares:
If $\ell'Y$ is a linear function of the observations, then the projection of $Y$ on $\ell$ is the vector $\dfrac{\ell'Y}{\ell'\ell}\,\ell$. The squared length of this projection, $\dfrac{(\ell'Y)^2}{\ell'\ell}$, is called the sum of squares (SS) due to $\ell'Y$. Since $\ell'Y$ carries one degree of freedom, the SS due to $\ell'Y$ has one degree of freedom.

The sums of squares and the degrees of freedom arising out of mutually orthogonal sets of functions can be added together to give the sum of squares and the degrees of freedom for the set of all the functions taken together, and vice versa.
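A small numpy sketch of the projection computation; with $\ell = (1, 1, \ldots, 1)'$ the SS due to $\ell'Y$ equals $n\bar{Y}^2$, the familiar SS due to the mean (the data values are arbitrary):

```python
# SS due to l'Y as the squared length of the projection of Y on l.
import numpy as np

Y = np.array([3.0, 1.0, 4.0, 1.0])
l = np.ones(4)                                 # coefficient vector of the total

proj = (l @ Y) / (l @ l) * l                   # projection of Y on l
print(np.isclose(proj @ proj, (l @ Y) ** 2 / (l @ l)))   # True
print((l @ Y) ** 2 / (l @ l))                  # 20.25 = n * Ybar^2 here
```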


Let XXXX (,12 ,...,)n has a multivariate normal distribution with mean vector  and positive definite covariance matrix . Let the two quadratic forms.

2  X ',AX is distribution  with n1 degrees of freedom and noncentrality parameter  ' A1 and

2  XAX' 2 is distributed as  with n2 degrees of freedom and noncentrality parameter  ' A2 .

Then X 'and'AX12 X AX are independently distributed if AA12  0 .

Fisher-Cochran theorem

If X  (XX12 , ,..., Xn ) has a multivariate normal distribution with mean vector  and positive definite covariance matrix  and let

1 XXQQ'...12 Qk where QXAXii ' with rank(ANiii ) , 1,2,...,. k Then Qsi ' are independently distributed noncentral Chi-square distribution with Ni degrees of freedom and noncentrality parameter  ' Ai  if

k and only if  NNi  is which case i1

k 1  ''. Ai i1

Derivatives of quadratic and linear forms:

Let X  (,xx12 ,...,)' xn and f(X) be any function of n independent variables x12,xx ,..., n , then

f ()X x 1 f ()X fX()   x X 2 .    f ()X  xn If Kkkk (, ,...,)'is a vector of constants, then 12 n KX'  K X If A is a mn matrix, then XAX' 2(A AX ') . X


Independence of linear and quadratic forms:

 Let Y be an n1 vector having multivariate normal distribution NI(,) and B be a mn matrix. Then the m1 vector linear form BY is independent of the quadratic form YAY' if BA = 0 where A is a symmetric matrix of known elements.

 Let Y be a n1 vector having multivariate normal distribution N(,)  with rank() n . If BA0, then the quadratic form YAY' is independent of linear form BY where B is a mn matrix.
