STAT 802: Multivariate Analysis Course Outline
Total Page:16
File Type:pdf, Size:1020Kb
STAT 802: Multivariate Analysis Course outline: Multivariate Distributions. • The Multivariate Normal Distribution. • The 1 sample problem. • Paired comparisons. • Repeated measures: 1 sample. • One way MANOVA. • Two way MANOVA. • Profile Analysis. • 1 Multivariate Multiple Regression. • Discriminant Analysis. • Clustering. • Principal Components. • Factor analysis. • Canonical Correlations. • 2 Basic structure of typical multivariate data set: Case by variables: data in matrix. Each row is a case, each column is a variable. Example: Fisher's iris data: 5 rows of 150 by 5 matrix: Case Sepal Sepal Petal Petal # Variety Length Width Length Width 1 Setosa 5.1 3.5 1.4 0.2 2 Setosa 4.9 3.0 1.4 0.2 . 51 Versicolor 7.0 3.2 4.7 1.4 . 3 Usual model: rows of data matrix are indepen- dent random variables. Vector valued random variable: function X : p T Ω R such that, writing X = (X ; : : : ; Xp) , 7! 1 P (X x ; : : : ; Xp xp) 1 ≤ 1 ≤ defined for any const's (x1; : : : ; xp). Cumulative Distribution Function (CDF) p of X: function FX on R defined by F (x ; : : : ; xp) = P (X x ; : : : ; Xp xp) : X 1 1 ≤ 1 ≤ 4 Defn: Distribution of rv X is absolutely con- tinuous if there is a function f such that P (X A) = f(x)dx (1) 2 ZA for any (Borel) set A. This is a p dimensional integral in general. Equivalently F (x1; : : : ; xp) = x1 xp f(y ; : : : ; yp) dyp; : : : ; dy : · · · 1 1 Z−∞ Z−∞ Defn: Any f satisfying (1) is a density of X. For most x F is differentiable at x and @pF (x) = f(x) : @x @xp 1 · · · 5 Building Multivariate Models Basic tactic: specify density of T X = (X1; : : : ; Xp) : Tools: marginal densities, conditional densi- ties, independence, transformation. Marginalization: Simplest multivariate prob- lem X = (X1; : : : ; Xp); Y = X1 (or in general Y is any Xj). Theorem 1 If X has density f(x1; : : : ; xp) and q < p then Y = (X1; : : : ; Xq) has density fY(x1; : : : ; xq) = 1 1 f(x ; : : : ; xp) dx : : : dxp · · · 1 q+1 Z−∞ Z−∞ fX1;:::;Xq is the marginal density of X1; : : : ; Xq and fX the joint density of X but they are both just densities. \Marginal" just to distinguish from the joint density of X. 6 Independence, conditional distributions Def'n: Events A and B are independent if P (AB) = P (A)P (B) : (Notation: AB is the event that both A and B happen, also written A B.) \ Def'n: Ai, i = 1; : : : ; p are independent if r P (A A ) = P (A ) i1 · · · ir ij jY=1 for any 1 i < < ir p. ≤ 1 · · · ≤ Def'n: X and Y are independent if P (X A; Y B) = P (X A)P (Y B) 2 2 2 2 for all A and B. Def'n: Rvs X1; : : : ; Xp independent: P (X A ; ; Xp Ap) = P (X A ) 1 2 1 · · · 2 i 2 i Y for any A1; : : : ; Ap. 7 Theorem: 1. If X and Y are independent with joint den- sity fX;Y(x; y) then X and Y have densities fX and fY, and fX;Y(x; y) = fX(x)fY(y) : 2. If X and Y independent with marginal den- sities fX and fY then (X; Y) has joint den- sity fX;Y(x; y) = fX(x)fY(y) : 3. If (X; Y) has density f(x; y) and there ex- ist g(x) and h(y) st f(x; y) = g(x)h(y) for (almost) all (x; y) then X and Y are inde- pendent with densities given by 1 fX(x) = g(x)= g(u)du Z−∞ 1 fY(y) = h(y)= h(u)du : Z−∞ 8 Theorem: If X1; : : : ; Xp are independent and Yi = gi(Xi) then Y1; : : : ; Yp are independent. Moreover, (X1; : : : ; Xq) and (Xq+1; : : : ; Xp) are independent. Conditional densities Conditional density of Y given X = x: fY X(y x) = fX;Y(x; y)=fX(x) ; j j in words \conditional = joint/marginal". 9 Change of Variables Suppose Y = g(X) Rp with X Rp having 2 2 density fX. Assume g is a one to one (\in- jective") map, i.e., g(x1) = g(x2) if and only if x1 = x2. Find fY: 1 Step 1: Solve for x in terms of y: x = g− (y). Step 2: Use basic equation: fY(y)dy = fX(x)dx and rewrite it in the form 1 dx fY(y) = fX(g− (y)) dy dx Interpretation of derivative dy when p > 1: dx @x = det i dy @y ! j which is the so called Jacobian . 10 Equivalent formula inverts the matrix: f (g 1(y)) ( ) = X − fY y dy : dx This notation means @y1 @y1 @y1 dy @x1 @x2 · · · @xp 2 . 3 = det . dx 6 @yp @yp @yp 7 6 @x @x @x 7 6 1 2 · · · p 7 4 5 but with x replaced by the corresponding value 1 of y, that is, replace x by g− (y). Example: The density 2 2 1 x1 + x2 fX(x1; x2) = exp 2π (− 2 ) is the standard bivariate normal density. Let Y = (Y ; Y ) where Y = X2 + X2 and 0 1 2 1 1 2 ≤ Y2 < 2π is angle from the pqositive x axis to the ray from the origin to the point (X1; X2). I.e., Y is X in polar co-ordinates. 11 Solve for x in terms of y: X1 = Y1 cos(Y2) X2 = Y1 sin(Y2) so that g(x1; x2) = (g1(x1; x2); g2(x1; x2)) 2 2 = ( x1 + x2; argument(x1; x2)) q 1 1 1 g− (y1; y2) = (g1− (y1; y2); g2− (y1; y2)) = (y1 cos(y2); y1 sin(y2)) cos(y ) y sin(y ) dx 2 − 1 2 = det 0 1 dy sin(y2) y1 cos(y2) B C @ A = y1 : It follows that 2 1 y1 fY(y1; y2) = exp y1 2π (− 2 ) × 1(0 y < )1(0 y < 2π) : ≤ 1 1 ≤ 2 12 Next: marginal densities of Y1, Y2? Factor fY as fY(y1; y2) = h1(y1)h2(y2) where 2 h (y ) = y e y1=21(0 y < ) 1 1 1 − ≤ 1 1 and h (y ) = 1(0 y < 2π)=(2π) : 2 2 ≤ 2 Then 1 fY1(y1) = h1(y1)h2(y2) dy2 Z−∞ 1 = h1(y1) h2(y2) dy2 Z−∞ so marginal density of Y1 is a multiple of h1. Multiplier makes fY1 = 1 but in this case R 2π 1 1 h2(y2) dy2 = (2π)− dy2 = 1 0 Z−∞ Z so that 2 f (y ) = y e y1=21(0 y < ) : Y1 1 1 − ≤ 1 1 (Special Weibull or Rayleigh distribution.) 13 Similarly f (y ) = 1(0 y < 2π)=(2π) Y2 2 ≤ 2 which is the Uniform(0; 2π) density. Exercise: 2 W = Y1 =2 has standard exponential distribu- 2 2 tion. Recall: by definition U = Y1 has a χ distribution on 2 degrees of freedom. Exer- 2 cise: find χ2 density. y2=2 Remark: easy to check 01 ye− dy = 1. R Thus: have proved original bivariate normal density integrates to 1. x2=2 Put I = 1 e− dx. Get −∞ R 2 1 x2=2 1 y2=2 I = e− dx e− dy Z−∞ Z−∞ 1 1 (x2+y2)=2 = e− dydx Z−∞ Z−∞ = 2π: So I = p2π. 14 Linear Algebra Review Notation: Vectors x Rn are column vectors • 2 x1 . x = 2 . 3 xn 6 7 4 5 An m n matrix A has m rows, n columns • × and entries Aij. Matrix and vector addition defined compo- • nentwise: (A+B)ij = Aij +Bij; (x+y)i = xi +yi If A is m n and B is n r then AB is the • × × m r matrix × n (AB)ij = AikBkj kX=1 15 The matrix I or sometimes In n which is • × an n n matrix with I = 1 for all i and × ii I = 0 for any pair i = j is called the n n ij 6 × identity matrix. The span of a set of vectors x ; : : : ; xp • f 1 g is the set of all vectors x of the form x = cixi. It is a vector space. The column Pspace of a matrix, A, is the span of the set of columns of A. The row space is the span of the set of rows. A set of vectors x ; : : : ; xp is linearly in- • f 1 g dependent if cixi = 0 implies ci = 0 for all i. The dimensionP of a vector space is the cardinality of the largest possible set of linearly independent vectors. 16 Defn: The transpose, AT , of an m n matrix × A is the n m matrix whose entries are given × by T (A )ij = Aji so that AT is n m. We have × (A + B)T = AT + BT and (AB)T = BT AT : Defn: rank of matrix A, rank(A): # of linearly independent columns of A. We have rank(A) = dim(column space of A) = dim(row space of A) = rank(AT ) If A is m n then rank(A) min(m; n). × ≤ 17 Matrix inverses For now: all matrices square n n. × If there is a matrix B such that BA = In n × then we call B the inverse of A. If B exists it is unique and AB = I and we write B = 1 A− . The matrix A has an inverse if and only if rank(A) = n. Inverses have the following properties: 1 1 1 (AB)− = B− A− (if one side exists then so does the other) and T 1 1 T (A )− = (A− ) 18 Determinants Again A is n n. The determinant if a function × on the set of n n matrices such that: × 1.