STAT 802: Multivariate Analysis

Course outline:

• Multivariate Distributions.
• The Multivariate Normal Distribution.
• The 1 sample problem.
• Paired comparisons.
• Repeated measures: 1 sample.
• One way MANOVA.
• Two way MANOVA.
• Profile Analysis.
• Multivariate Multiple Regression.
• Discriminant Analysis.
• Clustering.
• Principal Components.
• Factor analysis.
• Canonical Correlations.

Basic structure of a typical multivariate data set:

Case by variables: data in an $n \times p$ matrix. Each row is a case, each column is a variable.

Example: Fisher's iris data: rows of the 150 by 5 matrix:

Case #   Variety      Sepal Length   Sepal Width   Petal Length   Petal Width
  1      Setosa            5.1           3.5            1.4           0.2
  2      Setosa            4.9           3.0            1.4           0.2
 ...
 51      Versicolor        7.0           3.2            4.7           1.4
 ...

Usual model: rows of the data matrix are independent random variables.

Vector valued random variable: a function $X: \Omega \mapsto R^p$ such that, writing $X = (X_1, \ldots, X_p)^T$, $P(X_1 \le x_1, \ldots, X_p \le x_p)$ is defined for any constants $(x_1, \ldots, x_p)$.

Cumulative Distribution Function (CDF) of $X$: the function $F_X$ on $R^p$ defined by

$$F_X(x_1, \ldots, x_p) = P(X_1 \le x_1, \ldots, X_p \le x_p) .$$

Defn: The distribution of a rv $X$ is absolutely continuous if there is a function $f$ such that

$$P(X \in A) = \int_A f(x)\,dx \qquad (1)$$

for any (Borel) set $A$. This is a $p$ dimensional integral in general. Equivalently

$$F(x_1, \ldots, x_p) = \int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_p} f(y_1, \ldots, y_p)\,dy_p\cdots dy_1 .$$

Defn: Any f satisfying (1) is a density of X.

For most $x$, $F$ is differentiable at $x$ and
$$\frac{\partial^p F(x)}{\partial x_1\cdots\partial x_p} = f(x) .$$

Building Multivariate Models

Basic tactic: specify the density of $X = (X_1, \ldots, X_p)^T$.

Tools: marginal densities, conditional densi- ties, independence, transformation.

Marginalization: Simplest multivariate prob- lem

X = (X1, . . . , Xp), Y = X1

(or in general Y is any Xj).

Theorem 1 If X has density f(x1, . . . , xp) and q < p then Y = (X1, . . . , Xq) has density

$$f_Y(x_1, \ldots, x_q) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f(x_1, \ldots, x_p)\,dx_{q+1}\ldots dx_p$$
$f_{X_1,\ldots,X_q}$ is the marginal density of $X_1, \ldots, X_q$ and $f_X$ the joint density of $X$, but they are both just densities. "Marginal" is just to distinguish it from the joint density of $X$.

Independence, conditional distributions

Def’n: Events A and B are independent if

$$P(AB) = P(A)P(B) .$$
(Notation: $AB$ is the event that both $A$ and $B$ happen, also written $A \cap B$.)

Def'n: $A_i$, $i = 1, \ldots, p$ are independent if
$$P(A_{i_1}\cdots A_{i_r}) = \prod_{j=1}^r P(A_{i_j})$$
for any $1 \le i_1 < \cdots < i_r \le p$.

Def'n: X and Y are independent if

$$P(X \in A;\ Y \in B) = P(X \in A)P(Y \in B)$$
for all $A$ and $B$.

Def'n: Rvs $X_1, \ldots, X_p$ are independent if

$$P(X_1 \in A_1, \cdots, X_p \in A_p) = \prod_i P(X_i \in A_i)$$
for any $A_1, \ldots, A_p$.

Theorem:

1. If $X$ and $Y$ are independent with joint density $f_{X,Y}(x, y)$ then $X$ and $Y$ have densities $f_X$ and $f_Y$, and

$$f_{X,Y}(x, y) = f_X(x)f_Y(y) .$$

2. If $X$ and $Y$ are independent with marginal densities $f_X$ and $f_Y$ then $(X, Y)$ has joint density

$$f_{X,Y}(x, y) = f_X(x)f_Y(y) .$$

3. If $(X, Y)$ has density $f(x, y)$ and there exist $g(x)$ and $h(y)$ st $f(x, y) = g(x)h(y)$ for (almost) all $(x, y)$ then $X$ and $Y$ are independent with densities given by

$$f_X(x) = g(x)\Big/\int_{-\infty}^{\infty}g(u)\,du \qquad f_Y(y) = h(y)\Big/\int_{-\infty}^{\infty}h(u)\,du .$$

8 Theorem: If X1, . . . , Xp are independent and Yi = gi(Xi) then Y1, . . . , Yp are independent. Moreover, (X1, . . . , Xq) and (Xq+1, . . . , Xp) are independent.

Conditional densities

Conditional density of Y given X = x:

$$f_{Y|X}(y|x) = f_{X,Y}(x, y)/f_X(x) ;$$
in words "conditional = joint/marginal".

Change of Variables

Suppose $Y = g(X) \in R^p$ with $X \in R^p$ having density $f_X$. Assume $g$ is a one to one ("injective") map, i.e., $g(x_1) = g(x_2)$ if and only if $x_1 = x_2$. Find $f_Y$:

Step 1: Solve for $x$ in terms of $y$: $x = g^{-1}(y)$.

Step 2: Use the basic equation:

$$f_Y(y)\,dy = f_X(x)\,dx$$
and rewrite it in the form

$$f_Y(y) = f_X(g^{-1}(y))\left|\frac{dx}{dy}\right|$$
Interpretation of the derivative $dx/dy$ when $p > 1$:
$$\frac{dx}{dy} = \det\left(\frac{\partial x_i}{\partial y_j}\right)$$
which is the so called Jacobian.

An equivalent formula inverts the matrix:
$$f_Y(y) = \frac{f_X(g^{-1}(y))}{\left|\frac{dy}{dx}\right|} .$$

This notation means

$$\frac{dy}{dx} = \det\begin{pmatrix}\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_p}\\ \vdots & & & \vdots\\ \frac{\partial y_p}{\partial x_1} & \frac{\partial y_p}{\partial x_2} & \cdots & \frac{\partial y_p}{\partial x_p}\end{pmatrix}$$
but with $x$ replaced by the corresponding value of $y$, that is, replace $x$ by $g^{-1}(y)$.

Example: The density
$$f_X(x_1, x_2) = \frac{1}{2\pi}\exp\left\{-\frac{x_1^2 + x_2^2}{2}\right\}$$
is the standard bivariate normal density. Let $Y = (Y_1, Y_2)$ where $Y_1 = \sqrt{X_1^2 + X_2^2}$ and $0 \le Y_2 < 2\pi$ is the angle from the positive $x$ axis to the ray from the origin to the point $(X_1, X_2)$. I.e., $Y$ is $X$ in polar co-ordinates.

Solve for $x$ in terms of $y$:
$$X_1 = Y_1\cos(Y_2) \qquad X_2 = Y_1\sin(Y_2)$$
so that
$$g(x_1, x_2) = (g_1(x_1, x_2), g_2(x_1, x_2)) = \left(\sqrt{x_1^2 + x_2^2},\ \mathrm{argument}(x_1, x_2)\right)$$
$$g^{-1}(y_1, y_2) = (g_1^{-1}(y_1, y_2), g_2^{-1}(y_1, y_2)) = (y_1\cos(y_2),\ y_1\sin(y_2))$$
$$\frac{dx}{dy} = \det\begin{pmatrix}\cos(y_2) & -y_1\sin(y_2)\\ \sin(y_2) & y_1\cos(y_2)\end{pmatrix} = y_1 .$$
It follows that
$$f_Y(y_1, y_2) = \frac{y_1}{2\pi}\exp\left\{-\frac{y_1^2}{2}\right\}\times 1(0 \le y_1 < \infty)\,1(0 \le y_2 < 2\pi) .$$

Next: marginal densities of $Y_1$, $Y_2$?

Factor $f_Y$ as $f_Y(y_1, y_2) = h_1(y_1)h_2(y_2)$ where
$$h_1(y_1) = y_1e^{-y_1^2/2}\,1(0 \le y_1 < \infty)$$
and
$$h_2(y_2) = 1(0 \le y_2 < 2\pi)/(2\pi) .$$

Then
$$f_{Y_1}(y_1) = \int_{-\infty}^{\infty}h_1(y_1)h_2(y_2)\,dy_2 = h_1(y_1)\int_{-\infty}^{\infty}h_2(y_2)\,dy_2$$
so the marginal density of $Y_1$ is a multiple of $h_1$.

The multiplier makes $\int f_{Y_1} = 1$, but in this case
$$\int_{-\infty}^{\infty}h_2(y_2)\,dy_2 = \int_0^{2\pi}(2\pi)^{-1}\,dy_2 = 1$$
so that
$$f_{Y_1}(y_1) = y_1e^{-y_1^2/2}\,1(0 \le y_1 < \infty) .$$
(A special Weibull, also known as the Rayleigh distribution.)
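A quick Monte Carlo check of this polar-coordinate calculation, written as a sketch in modern R (the course software is S-Plus 3.4; the syntax below is ordinary base R and the sample size is just illustrative):

```r
# Simulate standard bivariate normal points, convert to polar coordinates,
# and compare the empirical distribution of the radius with the Rayleigh
# CDF P(Y1 <= y) = 1 - exp(-y^2/2) implied by the density above.
set.seed(1)
n  <- 100000
x1 <- rnorm(n); x2 <- rnorm(n)
y1 <- sqrt(x1^2 + x2^2)          # radius
y2 <- atan2(x2, x1) %% (2 * pi)  # angle in [0, 2*pi)

mean(y1 <= 1); 1 - exp(-1/2)     # both approximately 0.393
mean(y2); pi                     # angle roughly Uniform(0, 2*pi), mean about pi
```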

Similarly
$$f_{Y_2}(y_2) = 1(0 \le y_2 < 2\pi)/(2\pi)$$
which is the Uniform$(0, 2\pi)$ density. Exercise: $W = Y_1^2/2$ has a standard exponential distribution. Recall: by definition $U = Y_1^2$ has a $\chi^2$ distribution on 2 degrees of freedom. Exercise: find the $\chi^2_2$ density.

Remark: it is easy to check that $\int_0^{\infty}ye^{-y^2/2}\,dy = 1$. Thus: we have proved the original bivariate normal density integrates to 1.

Put $I = \int_{-\infty}^{\infty}e^{-x^2/2}\,dx$. Get
$$I^2 = \int_{-\infty}^{\infty}e^{-x^2/2}\,dx\int_{-\infty}^{\infty}e^{-y^2/2}\,dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-(x^2+y^2)/2}\,dy\,dx = 2\pi .$$
So $I = \sqrt{2\pi}$.

Linear Algebra Review

Notation:

• Vectors $x \in R^n$ are column vectors
$$x = \begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}$$

• An $m \times n$ matrix $A$ has $m$ rows, $n$ columns and entries $A_{ij}$.

• Matrix and vector addition are defined componentwise:

$$(A+B)_{ij} = A_{ij} + B_{ij}; \qquad (x+y)_i = x_i + y_i$$

• If $A$ is $m \times n$ and $B$ is $n \times r$ then $AB$ is the $m \times r$ matrix
$$(AB)_{ij} = \sum_{k=1}^n A_{ik}B_{kj}$$

• The matrix $I$, or sometimes $I_{n\times n}$, which is an $n \times n$ matrix with $I_{ii} = 1$ for all $i$ and $I_{ij} = 0$ for any pair $i \ne j$, is called the $n \times n$ identity matrix.

• The span of a set of vectors $\{x_1, \ldots, x_p\}$ is the set of all vectors $x$ of the form $x = \sum c_ix_i$. It is a vector space. The column space of a matrix $A$ is the span of the set of columns of $A$. The row space is the span of the set of rows.

• A set of vectors $\{x_1, \ldots, x_p\}$ is linearly independent if $\sum c_ix_i = 0$ implies $c_i = 0$ for all $i$. The dimension of a vector space is the cardinality of the largest possible set of linearly independent vectors.

Defn: The transpose, $A^T$, of an $m \times n$ matrix $A$ is the $n \times m$ matrix whose entries are given by $(A^T)_{ij} = A_{ji}$, so that $A^T$ is $n \times m$. We have
$$(A + B)^T = A^T + B^T \quad\text{and}\quad (AB)^T = B^TA^T .$$

Defn: The rank of a matrix $A$, rank$(A)$, is the number of linearly independent columns of $A$. We have

$$\mathrm{rank}(A) = \dim(\text{column space of } A) = \dim(\text{row space of } A) = \mathrm{rank}(A^T)$$

If $A$ is $m \times n$ then rank$(A) \le \min(m, n)$.

Matrix inverses

For now: all matrices are square, $n \times n$.

If there is a matrix $B$ such that $BA = I_{n\times n}$ then we call $B$ the inverse of $A$. If $B$ exists it is unique, $AB = I$, and we write $B = A^{-1}$. The matrix $A$ has an inverse if and only if rank$(A) = n$.

Inverses have the following properties:

$$(AB)^{-1} = B^{-1}A^{-1}$$
(if one side exists then so does the other) and

$$(A^T)^{-1} = (A^{-1})^T$$

Determinants

Again $A$ is $n \times n$. The determinant is a function on the set of $n \times n$ matrices such that:

1. $\det(I) = 1$.

2. If $A'$ is the matrix $A$ with two columns interchanged then

$$\det(A') = -\det(A) .$$
(So: two equal columns implies $\det(A) = 0$.)

3. $\det(A)$ is a linear function of each column of $A$. If $A = (a_1, \ldots, a_n)$ with $a_i$ denoting the $i$th column of the matrix then

$$\det(a_1, \ldots, a_i + b_i, \ldots, a_n) = \det(a_1, \ldots, a_i, \ldots, a_n) + \det(a_1, \ldots, b_i, \ldots, a_n)$$

Here are some properties of the determinant:

1. $\det(A^T) = \det(A)$.

2. $\det(AB) = \det(A)\det(B)$.

3. $\det(A^{-1}) = 1/\det(A)$.

4. $A$ is invertible if and only if $\det(A) \ne 0$ if and only if rank$(A) = n$.

5. Determinants can be computed (slowly) by expansion by minors.

Special Kinds of Matrices

1. $A$ is symmetric if $A^T = A$.

2. $A$ is orthogonal if $A^T = A^{-1}$ (or $AA^T = A^TA = I$).

3. $A$ is idempotent if $AA \equiv A^2 = A$.

4. $A$ is diagonal if $i \ne j$ implies $A_{ij} = 0$.

Inner Products, orthogonal, orthonormal vectors

Defn: Two vectors $x$ and $y$ are orthogonal if $x^Ty = \sum x_iy_i = 0$.

Defn: The inner product or dot product of $x$ and $y$ is

$$\langle x, y\rangle = x^Ty = \sum x_iy_i$$

Defn: The norm (or length) of $x$ is $\|x\| = (x^Tx)^{1/2} = (\sum x_i^2)^{1/2}$.

A matrix $A$ is orthogonal if each column of $A$ has length 1 and is orthogonal to each other column of $A$.

Quadratic Forms

Suppose $A$ is an $n \times n$ matrix. The function $g(x) = x^TAx$ is called a quadratic form. Now

$$g(x) = \sum_{ij}A_{ij}x_ix_j = \sum_i A_{ii}x_i^2 + \sum_{i<j}(A_{ij} + A_{ji})x_ix_j$$

Eigenvalues and eigenvectors

If $A$ is $n \times n$, $v \ne 0 \in R^n$ and $\lambda \in R$ are such that $Av = \lambda v$ then $\lambda$ is an eigenvalue (or characteristic or latent value) of $A$; $v$ is a corresponding eigenvector. Since $Av - \lambda v = (A - \lambda I)v = 0$ the matrix $A - \lambda I$ is singular.

Therefore $\det(A - \lambda I) = 0$.

Conversely: if $A - \lambda I$ is singular then there is $v \ne 0$ such that $(A - \lambda I)v = 0$.

Fact: $\det(A - \lambda I)$ is a polynomial in $\lambda$ of degree $n$.

Each root is an eigenvalue.

For general $A$ the roots could be multiple roots or complex valued.

Diagonalization

Matrix $A$ is diagonalized by a non-singular matrix $P$ if $P^{-1}AP \equiv D$ is diagonal.

If so then $AP = PD$, so each column of $P$ is an eigenvector of $A$ with the $i$th column having eigenvalue $D_{ii}$.

Thus to be diagonalizable $A$ must have $n$ linearly independent eigenvectors.

Symmetric Matrices

If $A$ is symmetric then

1. Every eigenvalue of $A$ is real (not complex).

2. $A$ is diagonalizable; the columns of $P$ may be taken of unit length and mutually orthogonal: $A$ is diagonalizable by an orthogonal matrix $P$; in symbols $P^TAP = D$.

3. The diagonal entries in $D$ are the eigenvalues of $A$.

4. If $\lambda_1 \ne \lambda_2$ are two eigenvalues of $A$ and $v_1$ and $v_2$ are corresponding eigenvectors then
$$v_1^TAv_2 = v_1^T\lambda_2v_2 = \lambda_2v_1^Tv_2$$
and
$$(v_1^TAv_2) = (v_1^TAv_2)^T = v_2^TA^Tv_1 = v_2^TAv_1 = v_2^T\lambda_1v_1 = \lambda_1v_2^Tv_1$$
Since $(\lambda_1 - \lambda_2)v_1^Tv_2 = 0$ and $\lambda_1 \ne \lambda_2$ we see $v_1^Tv_2 = 0$. Eigenvectors corresponding to distinct eigenvalues are orthogonal.

Positive Definite Matrices

Defn: A symmetric matrix $A$ is non-negative definite if $x^TAx \ge 0$ for all $x$. It is positive definite if in addition $x^TAx = 0$ implies $x = 0$.

$A$ is non-negative definite iff all its eigenvalues are non-negative.

$A$ is positive definite iff all its eigenvalues are positive.

A non-negative definite matrix has a symmetric non-negative definite square root. If $PDP^T = A$ for $P$ orthogonal and $D$ diagonal then $A^{1/2} = PD^{1/2}P^T$ is symmetric, non-negative definite and $A^{1/2}A^{1/2} = A$. Here $D^{1/2}$ is diagonal with $(D^{1/2})_{ii} = (D_{ii})^{1/2}$. Many other square roots are possible: if $AA^T = M$ and $P$ is orthogonal and $A^* = AP$ then $A^*(A^*)^T = M$.
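A minimal sketch of the symmetric square root in modern R (not the course's S-Plus session; Sigma below is just an illustrative positive definite matrix):

```r
# A^{1/2} = P D^{1/2} P^T built from the spectral decomposition of Sigma.
Sigma <- matrix(c(4, 2, 2, 3), 2, 2)
e     <- eigen(Sigma, symmetric = TRUE)
Ahalf <- e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)
max(abs(Ahalf %*% Ahalf - Sigma))   # numerically zero: Ahalf is a square root
```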

Orthogonal Projections

Suppose $S$ is a vector subspace of $R^n$ and $a_1, \ldots, a_m$ a basis for $S$. Given any $x \in R^n$ there is a unique $y \in S$ which is closest to $x$; $y$ minimizes
$$(x - y)^T(x - y)$$
over $y \in S$. Any $y$ in $S$ is of the form
$$y = c_1a_1 + \cdots + c_ma_m = Ac$$
with $A$ the $n \times m$ matrix with columns $a_1, \ldots, a_m$ and $c$ the column vector with $i$th entry $c_i$. Define
$$Q = A(A^TA)^{-1}A^T$$
($A$ has rank $m$ so $A^TA$ is invertible.) Then

$$(x - Ac)^T(x - Ac) = (x - Qx + Qx - Ac)^T(x - Qx + Qx - Ac)$$
$$= (x - Qx)^T(x - Qx) + (Qx - Ac)^T(x - Qx) + (x - Qx)^T(Qx - Ac) + (Qx - Ac)^T(Qx - Ac)$$

Note that $x - Qx = (I - Q)x$ and that
$$QAc = A(A^TA)^{-1}A^TAc = Ac$$
so that $Qx - Ac = Q(x - Ac)$. Then

$$(Qx - Ac)^T(x - Qx) = (x - Ac)^TQ^T(I - Q)x$$

Since $Q^T = Q$ we see that

$$Q^T(I - Q) = Q(I - Q) = Q - Q^2 = Q - A(A^TA)^{-1}A^TA(A^TA)^{-1}A^T = Q - Q = 0$$

This shows that

$$(x - Ac)^T(x - Ac) = (x - Qx)^T(x - Qx) + (Qx - Ac)^T(Qx - Ac)$$

Choose $Ac$ to minimize: minimize the second term.

This is achieved by making $Qx = Ac$.

Since $Qx = A(A^TA)^{-1}A^Tx$ we can take $c = (A^TA)^{-1}A^Tx$.

Summary: the closest point $y$ in $S$ is

$$y = Qx = A(A^TA)^{-1}A^Tx ;$$
call $y$ the orthogonal projection of $x$ onto $S$.

Notice that the matrix $Q$ is idempotent:

$$Q^2 = Q$$
We call $Qx$ the orthogonal projection of $x$ on $S$ because $Qx$ is perpendicular to the residual $x - Qx = (I - Q)x$.
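A small sketch of this projection in modern R (A is an arbitrary full-rank illustrative matrix, not anything from the notes):

```r
# Q = A (A'A)^{-1} A' projects onto the column space of A.
set.seed(2)
A <- matrix(rnorm(5 * 2), 5, 2)
Q <- A %*% solve(t(A) %*% A) %*% t(A)
x <- rnorm(5)
y <- Q %*% x                 # closest point to x in span(A)
max(abs(Q %*% Q - Q))        # idempotent: essentially 0
t(x - y) %*% y               # residual is orthogonal to the projection
```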

Partitioned Matrices

Suppose $A_{11}$ is a $p \times r$ matrix, $A_{12}$ is $p \times s$, $A_{21}$ is $q \times r$ and $A_{22}$ is $q \times s$. Make a $(p + q) \times (r + s)$ matrix by putting the $A_{ij}$ in a 2 by 2 pattern:
$$A = \begin{pmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}\end{pmatrix}$$
For instance if
$$A_{11} = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix},\quad A_{12} = \begin{pmatrix}2\\ 3\end{pmatrix},\quad A_{21} = \begin{pmatrix}4 & 5\end{pmatrix},\quad A_{22} = \begin{pmatrix}6\end{pmatrix}$$
then
$$A = \begin{pmatrix}1 & 0 & 2\\ 0 & 1 & 3\\ 4 & 5 & 6\end{pmatrix}$$
Lines indicate the partitioning.

We can work with partitioned matrices just like ordinary matrices, always making sure that in products we never change the order of multiplication of things.

$$\begin{pmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}\end{pmatrix} + \begin{pmatrix}B_{11} & B_{12}\\ B_{21} & B_{22}\end{pmatrix} = \begin{pmatrix}A_{11}+B_{11} & A_{12}+B_{12}\\ A_{21}+B_{21} & A_{22}+B_{22}\end{pmatrix}$$
and
$$\begin{pmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}\end{pmatrix}\begin{pmatrix}B_{11} & B_{12}\\ B_{21} & B_{22}\end{pmatrix} = \begin{pmatrix}A_{11}B_{11}+A_{12}B_{21} & A_{11}B_{12}+A_{12}B_{22}\\ A_{21}B_{11}+A_{22}B_{21} & A_{21}B_{12}+A_{22}B_{22}\end{pmatrix}$$

Note that the partitioning of $A$ and $B$ must match.

Addition: dimensions of Aij and Bij must be the same.

Multiplication formula A12 must have as many columns as B21 has rows, etc.

In general: need AijBjk to make sense for each i, j, k.

Works with more than a 2 by 2 partitioning.

Defn: block diagonal matrix: a partitioned matrix $A$ for which $A_{ij} = 0$ if $i \ne j$. If
$$A = \begin{pmatrix}A_{11} & 0\\ 0 & A_{22}\end{pmatrix}$$
then $A$ is invertible iff each $A_{ii}$ is invertible, and then
$$A^{-1} = \begin{pmatrix}A_{11}^{-1} & 0\\ 0 & A_{22}^{-1}\end{pmatrix}$$
Moreover $\det(A) = \det(A_{11})\det(A_{22})$. Similar formulas work for larger partitioned matrices.

Partitioned inverses. Suppose $A$, $C$ are symmetric positive definite. Look for the inverse of
$$\begin{pmatrix}A & B\\ B^T & C\end{pmatrix}$$
of the form
$$\begin{pmatrix}E & F\\ F^T & G\end{pmatrix}$$
Multiply to get the equations

$$AE + BF^T = I \qquad AF + BG = 0$$
$$B^TE + CF^T = 0 \qquad B^TF + CG = I$$

Solve to get

$$F^T = -C^{-1}B^TE$$
$$AE - BC^{-1}B^TE = I$$
$$E = (A - BC^{-1}B^T)^{-1}$$
$$G = (C - B^TA^{-1}B)^{-1}$$

The Multivariate Normal Distribution

Defn: $Z \in R^1 \sim N(0, 1)$ iff
$$f_Z(z) = \frac{1}{\sqrt{2\pi}}e^{-z^2/2} .$$

Defn: $Z \in R^p \sim MVN_p(0, I)$ if and only if $Z = (Z_1, \ldots, Z_p)^T$ with the $Z_i$ independent and each $Z_i \sim N(0, 1)$.

In this case according to our theorem

$$f_Z(z_1, \ldots, z_p) = \prod_i\frac{1}{\sqrt{2\pi}}e^{-z_i^2/2} = (2\pi)^{-p/2}\exp\{-z^Tz/2\} ;$$
superscript $T$ denotes matrix transpose.

Defn: $X \in R^p$ has a multivariate normal distribution if it has the same distribution as $AZ + \mu$ for some $\mu \in R^p$, some $p \times q$ matrix of constants $A$, and $Z \sim MVN_q(0, I)$.

$p = q$, $A$ singular: $X$ does not have a density.

$A$ invertible: derive the multivariate normal density by change of variables:
$$X = AZ + \mu \Leftrightarrow Z = A^{-1}(X - \mu)$$
$$\frac{\partial X}{\partial Z} = A \qquad \frac{\partial Z}{\partial X} = A^{-1} .$$
So
$$f_X(x) = f_Z(A^{-1}(x - \mu))\,|\det(A^{-1})| = \frac{\exp\{-(x - \mu)^T(A^{-1})^TA^{-1}(x - \mu)/2\}}{(2\pi)^{p/2}|\det A|} .$$
Now define $\Sigma = AA^T$ and notice that
$$\Sigma^{-1} = (A^T)^{-1}A^{-1} = (A^{-1})^TA^{-1}$$
and
$$\det\Sigma = \det A\det A^T = (\det A)^2 .$$

Thus $f_X$ is
$$\frac{\exp\{-(x - \mu)^T\Sigma^{-1}(x - \mu)/2\}}{(2\pi)^{p/2}(\det\Sigma)^{1/2}} ;$$
the $MVN(\mu, \Sigma)$ density. Note the density is the same for all $A$ such that $AA^T = \Sigma$. This justifies the notation $MVN(\mu, \Sigma)$.

For which $\mu$, $\Sigma$ is this a density?

Any $\mu$; but if $x \in R^p$ then
$$x^T\Sigma x = x^TAA^Tx = (A^Tx)^T(A^Tx) = \sum_1^p y_i^2 \ge 0$$
where $y = A^Tx$. The inequality is strict except for $y = 0$, which is equivalent to $x = 0$. Thus $\Sigma$ is a positive definite symmetric matrix.

Conversely, if $\Sigma$ is a positive definite symmetric matrix then there is a square invertible matrix $A$ such that $AA^T = \Sigma$, so that there is a $MVN(\mu, \Sigma)$ distribution. ($A$ can be found via the Cholesky decomposition, e.g.)
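A short sketch of this construction in modern R (the chol() call and the illustrative mu and Sigma are assumptions for the example, not part of the notes):

```r
# Generate MVN(mu, Sigma) draws as A Z + mu, where A A^T = Sigma.
# chol() returns an upper triangular U with U^T U = Sigma, so take A = t(U).
mu    <- c(1, 2)
Sigma <- matrix(c(2, 1, 1, 2), 2, 2)
A     <- t(chol(Sigma))
set.seed(3)
Z <- matrix(rnorm(2 * 10000), nrow = 2)   # columns are iid MVN(0, I)
X <- A %*% Z + mu                         # columns are MVN(mu, Sigma)
rowMeans(X)                               # close to mu
var(t(X))                                 # close to Sigma
```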

When $A$ is singular $X$ will not have a density: $\exists\, a$ such that $P(a^TX = a^T\mu) = 1$; $X$ is confined to a hyperplane.

Still true: the distribution of $X$ depends only on $\Sigma = AA^T$: if $AA^T = BB^T$ then $AZ + \mu$ and $BZ + \mu$ have the same distribution.

Expectation, moments

Defn: If $X \in R^p$ has density $f$ then
$$E(g(X)) = \int g(x)f(x)\,dx$$
for any $g$ from $R^p$ to $R$.

FACT: if $Y = g(X)$ for a smooth $g$ (mapping $R \to R$)

$$E(Y) = \int yf_Y(y)\,dy = \int g(x)f_Y(g(x))g'(x)\,dx = E(g(X))$$
by the change of variables formula for integration. This is good because otherwise we might have two different values for $E(e^X)$.

Linearity: E(aX + bY ) = aE(X) + bE(Y ) for real X and Y .

Defn: The $r$th moment (about the origin) of a real rv $X$ is $\mu_r' = E(X^r)$ (provided it exists). We generally use $\mu$ for $E(X)$.

Defn: The $r$th central moment is

$$\mu_r = E[(X - \mu)^r]$$
We call $\sigma^2 = \mu_2$ the variance.

Defn: For an $R^p$ valued random vector $X$

$$\mu_X = E(X)$$
is the vector whose $i$th entry is $E(X_i)$ (provided all entries exist).

Fact: the same idea is used for random matrices.

Defn: The ($p \times p$) variance covariance matrix of $X$ is

$$\mathrm{Var}(X) = E\left[(X - \mu)(X - \mu)^T\right]$$
which exists provided each component $X_i$ has a finite second moment.

Example moments: If $Z \sim N(0, 1)$ then
$$E(Z) = \int_{-\infty}^{\infty}ze^{-z^2/2}\,dz/\sqrt{2\pi} = \left.\frac{-e^{-z^2/2}}{\sqrt{2\pi}}\right|_{-\infty}^{\infty} = 0$$
and (integrating by parts)
$$E(Z^r) = \int_{-\infty}^{\infty}z^re^{-z^2/2}\,dz/\sqrt{2\pi} = \left.\frac{-z^{r-1}e^{-z^2/2}}{\sqrt{2\pi}}\right|_{-\infty}^{\infty} + (r-1)\int_{-\infty}^{\infty}z^{r-2}e^{-z^2/2}\,dz/\sqrt{2\pi}$$
so that
$$\mu_r = (r-1)\mu_{r-2}$$
for $r \ge 2$. Remembering that $\mu_1 = 0$ and
$$\mu_0 = \int_{-\infty}^{\infty}z^0e^{-z^2/2}\,dz/\sqrt{2\pi} = 1$$
we find that
$$\mu_r = \begin{cases}0 & r\text{ odd}\\ (r-1)(r-3)\cdots1 & r\text{ even}\end{cases}$$

If now $X \sim N(\mu, \sigma^2)$, that is, $X \sim \sigma Z + \mu$, then $E(X) = \sigma E(Z) + \mu = \mu$ and

$$\mu_r(X) = E[(X - \mu)^r] = \sigma^rE(Z^r)$$
In particular, we see that our choice of notation $N(\mu, \sigma^2)$ for the distribution of $\sigma Z + \mu$ is justified; $\sigma^2$ is indeed the variance.

Similarly for $X \sim MVN(\mu, \Sigma)$ we have $X = AZ + \mu$ with $Z \sim MVN(0, I)$, $E(X) = \mu$ and

$$\mathrm{Var}(X) = E\left\{(X - \mu)(X - \mu)^T\right\} = E\left\{AZ(AZ)^T\right\} = AE(ZZ^T)A^T = AIA^T = \Sigma .$$
Note the use of the easy calculation: $E(Z) = 0$ and

$$\mathrm{Var}(Z) = E(ZZ^T) = I .$$

Moments and independence

Theorem: If $X_1, \ldots, X_p$ are independent and each $X_i$ is integrable then $X = X_1\cdots X_p$ is integrable and

$$E(X_1\cdots X_p) = E(X_1)\cdots E(X_p) .$$

Moment Generating Functions

Defn: The moment generating function of a real valued $X$ is

$$M_X(t) = E(e^{tX})$$
defined for those real $t$ for which the expected value is finite.

Defn: The moment generating function of $X \in R^p$ is
$$M_X(u) = E[e^{u^TX}]$$
defined for those vectors $u$ for which the expected value is finite.

Example: If $Z \sim N(0, 1)$ then
$$M_Z(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{tz - z^2/2}\,dz = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-(z-t)^2/2 + t^2/2}\,dz = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-u^2/2 + t^2/2}\,du = e^{t^2/2}$$

Theorem: ($p = 1$) If $M$ is finite for all $t$ in a neighbourhood of 0 then

1. Every moment of X is finite.

2. $M$ is $C^\infty$ (in fact $M$ is analytic).

3. $\mu_k' = \frac{d^k}{dt^k}M_X(0)$.

Note: $C^\infty$ means having continuous derivatives of all orders. Analytic means having a convergent power series expansion in a neighbourhood of each $t \in (-\epsilon, \epsilon)$. The proof, and many other facts about mgfs, rely on techniques of complex variables.

Characterization & MGFs

Theorem: Suppose X and Y are Rp valued random vectors such that

MX(u) = MY(u) for u in some open neighbourhood of 0 in Rp. Then X and Y have the same distribution.

The proof relies on techniques of complex vari- ables.

44 MGFs and Sums

If $X_1, \ldots, X_p$ are independent and $Y = \sum X_i$ then the mgf of $Y$ is the product of the mgfs of the individual $X_i$:
$$E(e^{tY}) = \prod_i E(e^{tX_i})$$
or $M_Y = \prod M_{X_i}$. (Also for multivariate $X_i$.)

Example: If $Z_1, \ldots, Z_p$ are independent $N(0, 1)$ then
$$E(e^{\sum a_iZ_i}) = \prod_i E(e^{a_iZ_i}) = \prod_i e^{a_i^2/2} = \exp(\textstyle\sum a_i^2/2)$$
Conclusion: If $Z \sim MVN_p(0, I)$ then
$$M_Z(u) = \exp(\textstyle\sum u_i^2/2) = \exp(u^Tu/2).$$
Example: If $X \sim N(\mu, \sigma^2)$ then $X = \sigma Z + \mu$ and
$$M_X(t) = E(e^{t(\sigma Z + \mu)}) = e^{t\mu}e^{\sigma^2t^2/2} .$$

Theorem: Suppose $X = AZ + \mu$ and $Y = A^*Z^* + \mu^*$ where $Z \sim MVN_p(0, I)$ and $Z^* \sim MVN_q(0, I)$. Then $X$ and $Y$ have the same distribution if and only if the following two conditions hold:

1. $\mu = \mu^*$.

2. $AA^T = A^*(A^*)^T$.

Alternatively: if $X$, $Y$ are each MVN then $E(X) = E(Y)$ and $\mathrm{Var}(X) = \mathrm{Var}(Y)$ imply that $X$ and $Y$ have the same distribution.

Proof: If 1 and 2 hold the mgf of $X$ is

$$E\left(e^{t^TX}\right) = E\left(e^{t^T(AZ+\mu)}\right) = e^{t^T\mu}E\left(e^{(A^Tt)^TZ}\right) = e^{t^T\mu + (A^Tt)^T(A^Tt)/2} = e^{t^T\mu + t^T\Sigma t/2}$$

Thus $M_X = M_Y$. Conversely if $X$ and $Y$ have the same distribution then they have the same mean and variance.

Thus the mgf is determined by $\mu$ and $\Sigma$.

Theorem: If $X \sim MVN_p(\mu, \Sigma)$ then there is a $p \times p$ matrix $A$ such that $X$ has the same distribution as $AZ + \mu$ for $Z \sim MVN_p(0, I)$.

We may assume that $A$ is symmetric and non-negative definite, or that $A$ is upper triangular, or that $A$ is lower triangular.

Proof: Pick any $A$ such that $AA^T = \Sigma$, such as $PD^{1/2}P^T$ from the spectral decomposition. Then $AZ + \mu \sim MVN_p(\mu, \Sigma)$.

From the symmetric square root we can produce an upper triangular square root by the Gram Schmidt process: if $A$ has rows $a_1^T, \ldots, a_p^T$ then let $v_p$ be $a_p/\sqrt{a_p^Ta_p}$. Choose $v_{p-1}$ proportional to $a_{p-1} - bv_p$ where $b = a_{p-1}^Tv_p$, so that $v_{p-1}$ has unit length. Continue in this way; you automatically get $a_j^Tv_k = 0$ if $j < k$. If $P$ has columns $v_1, \ldots, v_p$ then $P$ is orthogonal and $AP$ is an upper triangular square root of $\Sigma$.

Covariances, Correlations

Defn: The covariance between X and Y is

$$\mathrm{Cov}(X, Y) = E\left\{(X - \mu_X)(Y - \mu_Y)^T\right\}$$
This is a matrix.

Properties:

• Cov(X, X) = Var(X).

• Cov is bilinear:
$$\mathrm{Cov}(AX + BW, Y) = A\mathrm{Cov}(X, Y) + B\mathrm{Cov}(W, Y)$$
and

$$\mathrm{Cov}(X, CY + DZ) = \mathrm{Cov}(X, Y)C^T + \mathrm{Cov}(X, Z)D^T$$

Properties of the MVN distribution

1: All margins are multivariate normal: if
$$X = \begin{pmatrix}X_1\\ X_2\end{pmatrix},\quad \mu = \begin{pmatrix}\mu_1\\ \mu_2\end{pmatrix}\quad\text{and}\quad \Sigma = \begin{pmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{pmatrix}$$
then $X \sim MVN(\mu, \Sigma) \Rightarrow X_1 \sim MVN(\mu_1, \Sigma_{11})$.

2: $MX + \nu \sim MVN(M\mu + \nu, M\Sigma M^T)$: an affine transformation of a MVN is normal.

3: If

$$\Sigma_{12} = \mathrm{Cov}(X_1, X_2) = 0$$
then $X_1$ and $X_2$ are independent.

4: All conditionals are normal: the conditional distribution of $X_1$ given $X_2 = x_2$ is $MVN(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})$.

Proof of (1): If $X = AZ + \mu$ then

$$X_1 = [I\,|\,0]\,X$$
for $I$ the identity matrix of the correct dimension.

So
$$X_1 = ([I\,|\,0]\,A)\,Z + [I\,|\,0]\,\mu$$

Compute the mean and variance to check the rest.

Proof of (2): If $X = AZ + \mu$ then

$$MX + \nu = MAZ + \nu + M\mu$$

Proof of (3): If
$$u = \begin{pmatrix}u_1\\ u_2\end{pmatrix}$$
then
$$M_X(u) = M_{X_1}(u_1)\,M_{X_2}(u_2)$$

Proof of (4): first case: assume $\Sigma_{22}$ has an inverse.

Define
$$W = X_1 - \Sigma_{12}\Sigma_{22}^{-1}X_2$$
Then
$$\begin{pmatrix}W\\ X_2\end{pmatrix} = \begin{pmatrix}I & -\Sigma_{12}\Sigma_{22}^{-1}\\ 0 & I\end{pmatrix}\begin{pmatrix}X_1\\ X_2\end{pmatrix}$$
Thus $(W, X_2)^T$ is MVN with mean $(\mu_1 - \Sigma_{12}\Sigma_{22}^{-1}\mu_2,\ \mu_2)^T$ and variance $\Sigma^*$ where
$$\Sigma^* = \begin{pmatrix}\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} & 0\\ 0 & \Sigma_{22}\end{pmatrix}$$

Now the joint density of $W$ and $X_2$ factors:

$$f_{W,X_2}(w, x_2) = f_W(w)f_{X_2}(x_2)$$
By change of variables the joint density of $X$ is

$$f_{X_1,X_2}(x_1, x_2) = cf_W(x_1 - Mx_2)f_{X_2}(x_2)$$
where $c = 1$ is the constant Jacobian of the linear transformation from $(W, X_2)$ to $(X_1, X_2)$ and
$$M = \Sigma_{12}\Sigma_{22}^{-1}$$

Thus the conditional density of $X_1$ given $X_2 = x_2$ is

$$\frac{f_W(x_1 - Mx_2)f_{X_2}(x_2)}{f_{X_2}(x_2)} = f_W(x_1 - Mx_2)$$
As a function of $x_1$ this density has the form of the advertised multivariate normal density.
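A minimal sketch of the conditional mean and variance formulas in modern R (the partitioned covariance blocks below are purely illustrative numbers, not data from the notes):

```r
# Conditional distribution of X1 given X2 = x2 for a partitioned MVN:
# mean = mu1 + S12 S22^{-1} (x2 - mu2),  variance = S11 - S12 S22^{-1} S21.
mu1 <- c(0, 0); mu2 <- 1
S11 <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
S12 <- matrix(c(0.8, 0.3), 2, 1)
S22 <- matrix(1.5, 1, 1)
x2  <- 2
cond.mean <- mu1 + S12 %*% solve(S22) %*% (x2 - mu2)
cond.var  <- S11 - S12 %*% solve(S22) %*% t(S12)
cond.mean; cond.var
```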

Specialization to the bivariate case:

Write
$$\Sigma = \begin{pmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2\\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{pmatrix}$$
where we define
$$\rho = \frac{\mathrm{Cov}(X_1, X_2)}{\sqrt{\mathrm{Var}(X_1)\mathrm{Var}(X_2)}}$$
Note that $\sigma_i^2 = \mathrm{Var}(X_i)$.

Then
$$W = X_1 - \rho\frac{\sigma_1}{\sigma_2}X_2$$
is independent of $X_2$. The marginal distribution of $W$ is $N(\mu_1 - \rho\sigma_1\mu_2/\sigma_2,\ \tau^2)$ where
$$\tau^2 = \mathrm{Var}(X_1) - 2\rho\frac{\sigma_1}{\sigma_2}\mathrm{Cov}(X_1, X_2) + \rho^2\frac{\sigma_1^2}{\sigma_2^2}\mathrm{Var}(X_2)$$

This simplifies to
$$\sigma_1^2(1 - \rho^2)$$

Notice that it follows that
$$-1 \le \rho \le 1$$

More generally: for any $X$ and $Y$:

$$0 \le \mathrm{Var}(X - \lambda Y) = \mathrm{Var}(X) - 2\lambda\mathrm{Cov}(X, Y) + \lambda^2\mathrm{Var}(Y)$$
The RHS is minimized at
$$\lambda = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(Y)}$$
The minimum value is

$$0 \le \mathrm{Var}(X)(1 - \rho^2_{XY})$$
where
$$\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}}$$
defines the correlation between $X$ and $Y$.

Multiple Correlation

Now suppose $X_2$ is scalar but $X_1$ is a vector.

Defn: The multiple correlation between $X_1$ and $X_2$ is
$$R^2(X_1, X_2) = \max \rho^2_{a^TX_1, X_2}$$
over all $a \ne 0$.

Thus: maximize
$$\frac{\mathrm{Cov}^2(a^TX_1, X_2)}{\mathrm{Var}(a^TX_1)\mathrm{Var}(X_2)} = \frac{a^T\Sigma_{12}\Sigma_{21}a}{a^T\Sigma_{11}a\,\Sigma_{22}}$$
Put $b = \Sigma_{11}^{1/2}a$. For $\Sigma_{11}$ invertible the problem is equivalent to maximizing
$$\frac{b^TQb}{b^Tb}$$
where
$$Q = \Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{21}\Sigma_{11}^{-1/2}$$
Solution: find the largest eigenvalue of $Q$.

Note $Q = vv^T$ where
$$v = \Sigma_{11}^{-1/2}\Sigma_{12}$$
is a vector. Set $vv^Tx = \lambda x$ and multiply by $v^T$ to get $v^Tx = 0$ or $\lambda = v^Tv$. If $v^Tx = 0$ then we see $\lambda = 0$, so the largest eigenvalue is $v^Tv$.

Summary: the maximum squared correlation is
$$R^2(X_1, X_2) = \frac{v^Tv}{\Sigma_{22}} = \frac{\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}}{\Sigma_{22}}$$
Achieved when the eigenvector is $x = v = b$, so
$$a = \Sigma_{11}^{-1/2}\Sigma_{11}^{-1/2}\Sigma_{12} = \Sigma_{11}^{-1}\Sigma_{12}$$

Notice: since $R^2$ is the squared correlation between two scalars ($a^TX_1$ and $X_2$) we have
$$0 \le R^2 \le 1$$
It equals 1 iff $X_2$ is a linear combination of $X_1$.
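A small sketch of these formulas in modern R (the covariance blocks are illustrative numbers chosen for the example, with $X_2$ scalar):

```r
# Multiple correlation R^2(X1, X2) = Sigma21 Sigma11^{-1} Sigma12 / Sigma22,
# achieved by the coefficient vector a = Sigma11^{-1} Sigma12.
S11 <- matrix(c(1, 0.4, 0.4, 1), 2, 2)  # Var(X1)
S12 <- c(0.6, 0.3)                       # Cov(X1, X2)
S22 <- 1                                 # Var(X2)
a   <- solve(S11, S12)                   # maximizing coefficients
R2  <- sum(S12 * a) / S22                # maximum squared correlation
a; R2
```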

Correlation matrices, partial correlations:

The correlation between two scalars $X$ and $Y$ is
$$\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}}$$
If $X$ has variance $\Sigma$ then the correlation matrix of $X$ is $R_X$ with entries

$$R_{ij} = \frac{\mathrm{Cov}(X_i, X_j)}{\sqrt{\mathrm{Var}(X_i)\mathrm{Var}(X_j)}} = \frac{\Sigma_{ij}}{\sqrt{\Sigma_{ii}\Sigma_{jj}}}$$

If $X_1$, $X_2$ are MVN with the usual partitioned variance covariance matrix then the conditional variance of $X_1$ given $X_2$ is
$$\Sigma_{11\cdot2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$$

From this define the partial correlation matrix

$$(R_{11\cdot2})_{ij} = \frac{(\Sigma_{11\cdot2})_{ij}}{\sqrt{(\Sigma_{11\cdot2})_{ii}(\Sigma_{11\cdot2})_{jj}}}$$

Note: these are used even when $X_1$, $X_2$ are NOT MVN.

Likelihood Methods of Inference

Given data $X$ with model $\{f_\theta(x);\ \theta \in \Theta\}$:

Definition: The likelihood function is the map $L$ with domain $\Theta$ and values given by

$$L(\theta) = f_\theta(X)$$

Key Point: think about how the density de- pends on θ not about how it depends on X.

Notice: X, observed value of the data, has been plugged into the formula for density.

We use likelihood for most inference problems:

58 1. Point estimation: we must compute an es- timate θˆ = θˆ(X) which lies in Θ. The max- imum likelihood estimate (MLE) of θ is the value θˆ which maximizes L(θ) over θ Θ if such a θˆ exists. ∈

2. Point estimation of a function of θ: we must compute an estimate φˆ = φˆ(X) of φ = g(θ). We use φˆ = g(θˆ) where θˆ is the MLE of θ.

3. Interval (or set) estimation. We must compute a set $C = C(X)$ in $\Theta$ which we think will contain $\theta_0$. We will use $\{\theta \in \Theta : L(\theta) > c\}$ for a suitable $c$.

4. Hypothesis testing: decide whether or not $\theta_0 \in \Theta_0$ where $\Theta_0 \subset \Theta$. We base our decision on the likelihood ratio
$$\frac{\sup\{L(\theta);\ \theta \in \Theta_0\}}{\sup\{L(\theta);\ \theta \in \Theta \setminus \Theta_0\}}$$

59 Maximum Likelihood Estimation

To find MLE maximize L.

Typical function maximization problem:

Set gradient of L equal to 0

Check root is maximum, not minimum or sad- dle point.

Often L is product of n terms (given n inde- pendent observations).

Much easier to work with logarithm of L: log of product is sum and logarithm is monotone increasing.

Definition: The Log Likelihood function is

$$\ell(\theta) = \log\{L(\theta)\} .$$

Samples from a MVN Population

Simplest problem: collect replicate measurements $X_1, \ldots, X_n$ from a single population.

Model: the $X_i$ are iid $MVN_p(\mu, \Sigma)$.

Parameters ($\theta$): $(\mu, \Sigma)$. Parameter space: $\mu \in R^p$ and $\Sigma$ is some positive definite $p \times p$ matrix.

The log likelihood is

$$\ell(\mu, \Sigma) = -np\log(2\pi)/2 - n\log\det\Sigma/2 - \sum(X_i - \mu)^T\Sigma^{-1}(X_i - \mu)/2$$
Take derivatives.

$$\frac{\partial\ell}{\partial\mu} = \sum\Sigma^{-1}(X_i - \mu) = n\Sigma^{-1}(\bar{X} - \mu)$$
where $\bar{X} = \sum X_i/n$.

The second derivative wrt $\mu$ is a matrix:

$$-n\Sigma^{-1}$$
Fact: if the second derivative matrix is negative definite at a critical point then the critical point is a maximum.

Fact: if the second derivative matrix is negative definite everywhere then the function is concave; there is no more than 1 critical point.

Summary: $\ell$ is maximized at $\hat\mu = \bar{X}$ (regardless of the choice of $\Sigma$).

More difficult: differentiate $\ell$ wrt $\Sigma$.

Somewhat simpler: set $D = \Sigma^{-1}$.

The first derivative wrt $D$ is the matrix with entries $\partial\ell/\partial D_{ij}$. Warning: the method used ignores the symmetry of $\Sigma$.

Need: the derivatives of two functions:

$$\frac{\partial\log\det A}{\partial A} = A^{-1} \quad\text{and}\quad \frac{\partial x^TAx}{\partial A} = xx^T$$

Fact: the $ij$th entry of $A^{-1}$ is
$$(-1)^{i+j}\frac{\det(A^{(ij)})}{\det A}$$
where $A^{(ij)}$ denotes the matrix obtained from $A$ by removing column $j$ and row $i$.

Fact: $\det(A) = \sum_k(-1)^{i+k}A_{ik}\det(A^{(ik)})$; expansion by minors.

Conclusion:

$$\frac{\partial\log\det A}{\partial A_{ij}} = (A^{-1})_{ij} \quad\text{and}\quad \frac{\partial\log\det A^{-1}}{\partial A_{ij}} = -(A^{-1})_{ij}$$

Implication:
$$\frac{\partial\ell}{\partial D} = n\Sigma/2 - \sum_i(X_i - \mu)(X_i - \mu)^T/2$$

Set this equal to 0 and find that the only critical point is

$$\hat\Sigma = \sum_i(X_i - \bar{X})(X_i - \bar{X})^T/n$$

The usual sample covariance matrix is

$$S = \sum_i(X_i - \bar{X})(X_i - \bar{X})^T/(n - 1)$$
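A short sketch of these two estimators in modern R (the data matrix X is simulated only so there is something to compute with; var() is the base function, as in S-Plus):

```r
# MLE Sigma-hat (divisor n) versus the usual sample covariance S (divisor n-1)
# computed from an n x p data matrix X.
set.seed(4)
n <- 20; p <- 3
X <- matrix(rnorm(n * p), n, p)
xbar     <- colMeans(X)
centred  <- sweep(X, 2, xbar)
SigmaHat <- crossprod(centred) / n         # MLE
S        <- crossprod(centred) / (n - 1)   # unbiased estimate
max(abs(S - var(X)))                       # agrees with var(X): essentially 0
```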

Properties of MLEs:

1) $\bar{X} \sim MVN_p(\mu, n^{-1}\Sigma)$

2) E(S) = Σ.

Distribution of S? Joint distribution of X¯ and S?

Univariate Normal samples: Distribution Theory

Theorem: Suppose $X_1, \ldots, X_n$ are independent $N(\mu, \sigma^2)$ random variables. Then

1. $\bar{X}$ (sample mean) and $s^2$ (sample variance) are independent.

2. $n^{1/2}(\bar{X} - \mu)/\sigma \sim N(0, 1)$.

3. $(n - 1)s^2/\sigma^2 \sim \chi^2_{n-1}$.

4. $n^{1/2}(\bar{X} - \mu)/s \sim t_{n-1}$.

Proof: Let $Z_i = (X_i - \mu)/\sigma$.

Then $Z_1, \ldots, Z_n$ are independent $N(0, 1)$.

So $Z = (Z_1, \ldots, Z_n)^T$ is multivariate standard normal.

Note that $\bar{X} = \sigma\bar{Z} + \mu$ and $s^2 = \sum(X_i - \bar{X})^2/(n-1) = \sigma^2\sum(Z_i - \bar{Z})^2/(n-1)$. Thus
$$\frac{n^{1/2}(\bar{X} - \mu)}{\sigma} = n^{1/2}\bar{Z}, \qquad \frac{(n-1)s^2}{\sigma^2} = \sum(Z_i - \bar{Z})^2$$
and
$$T = \frac{n^{1/2}(\bar{X} - \mu)}{s} = \frac{n^{1/2}\bar{Z}}{s_Z}$$
where $(n-1)s_Z^2 = \sum(Z_i - \bar{Z})^2$.

So: we have reduced to $\mu = 0$ and $\sigma = 1$.

Step 1: Define
$$Y = (\sqrt{n}\bar{Z},\ Z_1 - \bar{Z},\ \ldots,\ Z_n - \bar{Z})^T .$$
(So $Y$ has dimension $n + 1$.) Now
$$Y = \begin{pmatrix}\frac{1}{\sqrt{n}} & \frac{1}{\sqrt{n}} & \cdots & \frac{1}{\sqrt{n}}\\ 1 - \frac{1}{n} & -\frac{1}{n} & \cdots & -\frac{1}{n}\\ -\frac{1}{n} & 1 - \frac{1}{n} & \cdots & -\frac{1}{n}\\ \vdots & \vdots & \ddots & \vdots\end{pmatrix}\begin{pmatrix}Z_1\\ Z_2\\ \vdots\\ Z_n\end{pmatrix}$$
or, letting $M$ denote the matrix,

$$Y = MZ .$$
It follows that $Y \sim MVN(0, MM^T)$ so we need to compute $MM^T$:
$$MM^T = \begin{pmatrix}1 & 0 & \cdots & 0\\ 0 & 1 - \frac{1}{n} & \cdots & -\frac{1}{n}\\ \vdots & \vdots & \ddots & \vdots\\ 0 & -\frac{1}{n} & \cdots & 1 - \frac{1}{n}\end{pmatrix} = \begin{pmatrix}1 & 0\\ 0 & Q\end{pmatrix} .$$

Put $Y_2 = (Y_2, \ldots, Y_{n+1})$. Since

$$\mathrm{Cov}(Y_1, Y_2) = 0$$
conclude that $Y_1$ and $Y_2$ are independent and each is normal.

Thus $\sqrt{n}\bar{Z}$ is independent of $Z_1 - \bar{Z}, \ldots, Z_n - \bar{Z}$. Since $s_Z^2$ is a function of $Z_1 - \bar{Z}, \ldots, Z_n - \bar{Z}$ we see that $\sqrt{n}\bar{Z}$ and $s_Z^2$ are independent.

Also, $\sqrt{n}\bar{Z} \sim N(0, 1)$.

First 2 parts done.

Consider $(n-1)s^2/\sigma^2 = Y_2^TY_2$. Note that $Y_2 \sim MVN(0, Q)$.

Now: the distribution of quadratic forms:

Suppose $Z \sim MVN(0, I)$ and $A$ is symmetric. Put $A = PDP^T$ for $D$ diagonal, $P$ orthogonal.

Then
$$Z^TAZ = (Z^*)^TDZ^*$$
where $Z^* = P^TZ$. But $Z^* \sim MVN(0, P^TP = I)$ is standard multivariate normal.

So: $Z^TAZ$ has the same distribution as
$$\sum_i\lambda_iZ_i^2$$
where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$.

Special case: if all $\lambda_i$ are either 0 or 1 then $Z^TAZ$ has a chi-squared distribution with df = number of $\lambda_i$ equal to 1.

When are the eigenvalues all 1 or 0?

Answer: if and only if $A$ is idempotent.

1) If $A$ is idempotent and $\lambda, x$ is an eigenpair then

$$Ax = \lambda x \quad\text{and}\quad Ax = AAx = \lambda Ax = \lambda^2x$$
so $(\lambda - \lambda^2)x = 0$, proving $\lambda$ is 0 or 1.

2) Conversely if all eigenvalues of $A$ are 0 or 1 then $D$ has 1s and 0s on the diagonal, so

$$D^2 = D$$
and

$$AA = PDP^TPDP^T = PD^2P^T = PDP^T = A$$

Next case: $X \sim MVN_p(0, \Sigma)$. Then $X = AZ$ with $AA^T = \Sigma$.

Since $X^TX = Z^TA^TAZ$ it has the law

$$\sum\lambda_iZ_i^2$$
where the $\lambda_i$ are the eigenvalues of $A^TA$. But $A^TAx = \lambda x$ implies
$$AA^TAx = \Sigma Ax = \lambda Ax$$

So the eigenvalues are those of $\Sigma$, and $X^TX$ is $\chi^2_\nu$ iff $\Sigma$ is idempotent and trace$(\Sigma) = \nu$.

Our case: $A = Q = I - 11^T/n$. Check $Q^2 = Q$. How many degrees of freedom: trace$(D)$.

Defn: The trace of a square matrix $A$ is

$$\mathrm{trace}(A) = \sum A_{ii}$$
Property: trace$(AB)$ = trace$(BA)$.

So:

$$\mathrm{trace}(A) = \mathrm{trace}(PDP^T) = \mathrm{trace}(DP^TP) = \mathrm{trace}(D)$$

Conclusion: the df for $(n-1)s^2/\sigma^2$ is
$$\mathrm{trace}(I - 11^T/n) = n - 1 .$$

Derivation of the $\chi^2$ density:

Suppose $Z_1, \ldots, Z_n$ are independent $N(0, 1)$. Define the $\chi^2_n$ distribution to be that of $U = Z_1^2 + \cdots + Z_n^2$. Define angles $\theta_1, \ldots, \theta_{n-1}$ by
$$Z_1 = U^{1/2}\cos\theta_1$$
$$Z_2 = U^{1/2}\sin\theta_1\cos\theta_2$$
$$\vdots$$
$$Z_{n-1} = U^{1/2}\sin\theta_1\cdots\sin\theta_{n-2}\cos\theta_{n-1}$$
$$Z_n = U^{1/2}\sin\theta_1\cdots\sin\theta_{n-1} .$$
(Spherical co-ordinates in $n$ dimensions. The $\theta$ values run from 0 to $\pi$ except the last $\theta$, which runs from 0 to $2\pi$.) Derivative formulas:

$$\frac{\partial Z_i}{\partial U} = \frac{Z_i}{2U}$$
and
$$\frac{\partial Z_i}{\partial\theta_j} = \begin{cases}0 & j > i\\ -Z_i\tan\theta_i & j = i\\ Z_i\cot\theta_j & j < i .\end{cases}$$

Fix $n = 3$ to clarify the formulas. Use the shorthand $R = \sqrt{U}$.

The matrix of partial derivatives is

$$\begin{pmatrix}\frac{\cos\theta_1}{2R} & -R\sin\theta_1 & 0\\ \frac{\sin\theta_1\cos\theta_2}{2R} & R\cos\theta_1\cos\theta_2 & -R\sin\theta_1\sin\theta_2\\ \frac{\sin\theta_1\sin\theta_2}{2R} & R\cos\theta_1\sin\theta_2 & R\sin\theta_1\cos\theta_2\end{pmatrix} .$$
Find the determinant:

$$U^{1/2}\sin(\theta_1)/2$$

(non-negative for all $U$ and $\theta_1$).

General $n$: every term in the first column contains a factor $U^{-1/2}/2$ while every other entry has a factor $U^{1/2}$.

FACT: multiplying a column in a matrix by $c$ multiplies the determinant by $c$.

SO: the Jacobian of the transformation is

$$u^{(n-2)/2}\,h(\theta_1, \ldots, \theta_{n-1})/2$$
for some function $h$ which depends only on the angles.

Thus the joint density of $U, \theta_1, \ldots, \theta_{n-1}$ is
$$(2\pi)^{-n/2}\exp(-u/2)u^{(n-2)/2}h(\theta_1, \cdots, \theta_{n-1})/2 .$$
To compute the density of $U$ we must do an $n - 1$ dimensional multiple integral $d\theta_{n-1}\cdots d\theta_1$.

The answer has the form

$$c\,u^{(n-2)/2}\exp(-u/2)$$
for some $c$.

Evaluate $c$ by making
$$\int f_U(u)\,du = c\int_0^\infty u^{(n-2)/2}\exp(-u/2)\,du = 1 .$$
Substitute $y = u/2$, $du = 2\,dy$ to see that

$$c\,2^{n/2}\int_0^\infty y^{(n-2)/2}e^{-y}\,dy = c\,2^{n/2}\Gamma(n/2) = 1 .$$
CONCLUSION: the $\chi^2_n$ density is
$$\frac{1}{2\Gamma(n/2)}\left(\frac{u}{2}\right)^{(n-2)/2}e^{-u/2}\,1(u > 0) .$$

Fourth part: a consequence of the first 3 parts and the definition of the $t_\nu$ distribution.

Defn: $T \sim t_\nu$ if $T$ has the same distribution as
$$Z/\sqrt{U/\nu}$$
for $Z \sim N(0, 1)$, $U \sim \chi^2_\nu$ and $Z$, $U$ independent.

Derive the density of $T$ in this definition:

$$P(T \le t) = P(Z \le t\sqrt{U/\nu}) = \int_0^\infty\int_{-\infty}^{t\sqrt{u/\nu}}f_Z(z)f_U(u)\,dz\,du$$
Differentiate wrt $t$ by differentiating the inner integral:
$$\frac{\partial}{\partial t}\int_{at}^{bt}f(x)\,dx = bf(bt) - af(at)$$
by the fundamental theorem of calculus. Hence
$$\frac{d}{dt}P(T \le t) = \int_0^\infty\frac{f_U(u)}{\sqrt{2\pi}}\sqrt{\frac{u}{\nu}}\exp\left(-\frac{t^2u}{2\nu}\right)du .$$

Plug in
$$f_U(u) = \frac{1}{2\Gamma(\nu/2)}(u/2)^{(\nu-2)/2}e^{-u/2}$$
to get

$$f_T(t) = \frac{\int_0^\infty(u/2)^{(\nu-1)/2}e^{-u(1+t^2/\nu)/2}\,du}{2\sqrt{\pi\nu}\,\Gamma(\nu/2)} .$$
Substitute $y = u(1 + t^2/\nu)/2$ to get

$$dy = (1 + t^2/\nu)\,du/2$$

$$(u/2)^{(\nu-1)/2} = [y/(1 + t^2/\nu)]^{(\nu-1)/2}$$
leading to
$$f_T(t) = \frac{(1 + t^2/\nu)^{-(\nu+1)/2}}{\sqrt{\pi\nu}\,\Gamma(\nu/2)}\int_0^\infty y^{(\nu-1)/2}e^{-y}\,dy$$
or
$$f_T(t) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\pi\nu}\,\Gamma(\nu/2)}\frac{1}{(1 + t^2/\nu)^{(\nu+1)/2}} .$$

Multivariate Normal samples: Distribution Theory

Theorem: Suppose $X_1, \ldots, X_n$ are independent $MVN_p(\mu, \Sigma)$ random variables. Then

1. $\bar{X}$ (sample mean) and $S$ (sample variance-covariance matrix) are independent.

2. $n^{1/2}(\bar{X} - \mu) \sim MVN(0, \Sigma)$.

3. $(n - 1)S \sim \mathrm{Wishart}_p(n - 1, \Sigma)$.

4. $T^2 = n(\bar{X} - \mu)^TS^{-1}(\bar{X} - \mu)$ is Hotelling's $T^2$. $(n - p)T^2/(p(n - 1))$ has an $F_{p, n-p}$ distribution.

79 T Proof: Let Xi = AZi + µ where AA = Σ and Z1, . . . , Zp are independent MV N(0, I).

T T T So Z = (Z , . . . , Z ) MV Np(0, I). 1 p ∼

Note that $\bar{X} = A\bar{Z} + \mu$ and

$$(n-1)S = \sum(X_i - \bar{X})(X_i - \bar{X})^T = A\sum(Z_i - \bar{Z})(Z_i - \bar{Z})^TA^T$$
Thus
$$n^{1/2}(\bar{X} - \mu) = An^{1/2}\bar{Z}$$
and
$$T^2 = \left(n^{1/2}\bar{Z}\right)^TS_Z^{-1}\left(n^{1/2}\bar{Z}\right)$$
where

$$S_Z = \sum(Z_i - \bar{Z})(Z_i - \bar{Z})^T/(n - 1).$$
Consequences. In 1, 2 and 4 we can assume $\mu = 0$ and $\Sigma = I$. In 3 we can take $\mu = 0$.

Step 1: Do general $\Sigma$. Define

$$Y = (\sqrt{n}\bar{Z}^T,\ Z_1^T - \bar{Z}^T,\ \ldots,\ Z_n^T - \bar{Z}^T)^T .$$
(So $Y$ has dimension $p(n + 1)$.) Clearly $Y$ is MVN with mean 0.

Compute the variance covariance matrix

$$\begin{pmatrix}I_{p\times p} & 0\\ 0 & Q^*\end{pmatrix}$$
where $Q^*$ has a pattern. It is an $n \times n$ array of $p \times p$ blocks with block $ij$ being

$$\mathrm{Cov}(Z_i - \bar{Z}, Z_j - \bar{Z}) = \begin{cases}-\Sigma/n & i \ne j\\ (n-1)\Sigma/n & i = j\end{cases} = Q_{ij}\Sigma$$

Kronecker Products

Defn: If $A$ is $p \times q$ and $B$ is $r \times s$ then $A \otimes B$ is the $pr \times qs$ matrix with the pattern
$$\begin{pmatrix}A_{11}B & A_{12}B & \cdots & A_{1q}B\\ \vdots & & & \vdots\\ A_{p1}B & A_{p2}B & \cdots & A_{pq}B\end{pmatrix}$$
So our variance covariance matrix is

$$Q^* = Q \otimes \Sigma$$

Conclusions so far:

1) $\bar{X}$ and $S$ are independent.

2) $\sqrt{n}(\bar{X} - \mu) \sim MVN(0, \Sigma)$

Next: the Wishart law.

Defn: The Wishart$_p(n, \Sigma)$ distribution is the distribution of
$$\sum_1^n Z_iZ_i^T$$
where $Z_1, \ldots, Z_n$ are iid $MVN_p(0, \Sigma)$.
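A small simulation sketch of this definition in modern R (rWishart() is an R-specific convenience that was not available in the S-Plus used in the notes; the Sigma and n below are purely illustrative):

```r
# Build a Wishart draw as a sum of Z_i Z_i^T and check E(W) = n * Sigma.
set.seed(5)
Sigma <- matrix(c(2, 1, 1, 2), 2, 2)
A <- t(chol(Sigma)); n <- 5
one.wishart <- function() {
  Z <- A %*% matrix(rnorm(2 * n), nrow = 2)  # n iid MVN(0, Sigma) columns
  Z %*% t(Z)                                 # sum of Z_i Z_i^T
}
W <- replicate(2000, one.wishart())
apply(W, c(1, 2), mean)                      # close to n * Sigma
n * Sigma
```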

Properties of the Wishart.

1) If $AA^T = \Sigma$ then

$$\mathrm{Wishart}_p(n, \Sigma) = A\,\mathrm{Wishart}_p(n, I)\,A^T$$

2) If $W_i$, $i = 1, 2$ are independent Wishart$_p(n_i, \Sigma)$ then

$$W_1 + W_2 \sim \mathrm{Wishart}_p(n_1 + n_2, \Sigma).$$

Proof of part 3: rewrite

$$\sum(Z_i - \bar{Z})(Z_i - \bar{Z})^T$$
in the form
$$\sum_{i=1}^{n-1}U_iU_i^T$$
for $U_i$ iid $MVN_p(0, \Sigma)$. Put $Z_1, \ldots, Z_n$ as the columns of a matrix $Z$ which is $p \times n$. Then check that
$$ZQZ^T = \sum(Z_i - \bar{Z})(Z_i - \bar{Z})^T$$
Write $Q = \sum_iv_iv_i^T$ for $n - 1$ orthogonal unit vectors $v_1, \ldots, v_{n-1}$. Define
$$U_i = Zv_i$$
and compute covariances to check that the $U_i$ are iid $MVN_p(0, \Sigma)$. Then check that
$$ZQZ^T = \sum U_iU_i^T$$

Proof of 4: it suffices to take $\Sigma = I$.

This uses further properties of the Wishart distribution.

3: If $W \sim \mathrm{Wishart}_p(n, \Sigma)$ and $a \in R^p$ then
$$\frac{a^TWa}{a^T\Sigma a} \sim \chi^2_n$$

4: If $W \sim \mathrm{Wishart}_p(n, \Sigma)$ and $n \ge p$ then
$$\frac{a^T\Sigma^{-1}a}{a^TW^{-1}a} \sim \chi^2_{n-p+1}$$

5: If $W \sim \mathrm{Wishart}_p(n, \Sigma)$ then
$$\mathrm{trace}(\Sigma^{-1}W) \sim \chi^2_{np}$$

6: If $W \sim \mathrm{Wishart}_{p+q}(n, \Sigma)$ is partitioned into components then

$$W_{11} - W_{12}W_{22}^{-1}W_{21} \sim \mathrm{Wishart}_p(n - q, \Sigma_{11\cdot2})$$

One sample tests on mean vectors

Given data $X_1, \ldots, X_n$ iid $MVN(\mu, \Sigma)$, test

$$H_o: \mu = \mu_o$$
by computing

$$T^2 = n(\bar{X} - \mu_o)^TS^{-1}(\bar{X} - \mu_o)$$
and getting $P$-values from the $F$ distribution using the theorem.

Example: no realistic ones. This hypothesis is not intrinsically useful. However: other tests can sometimes be reduced to it.

Example: Eleven water samples are split in half. One half of each goes to each of two labs. Measure biological oxygen demand (BOD) and suspended solids (SS). For sample $i$ let $X_{i1}$ be BOD for lab A, $X_{i2}$ be SS for lab A, $X_{i3}$ be BOD for lab B and $X_{i4}$ be SS for lab B. Question: are the labs measuring the same thing? Is there bias in one or the other?

Notation: $X_i$ is the vector of 4 measurements on sample $i$.

Data:

                Lab A         Lab B
Sample      BOD     SS    BOD     SS
   1          6     27     25     15
   2          6     23     28     13
   3         18     64     36     22
   4          8     44     35     29
   5         11     30     15     31
   6         34     75     44     64
   7         28     26     42     30
   8         71    124     54     64
   9         43     54     34     56
  10         33     30     29     20
  11         20     14     39     21

Model: X1, . . . , X11 are iid MV N4(µ, Σ).

Multivariate problem because: not able to as- sume independence between any two measure- ments on same sample.

Potential sub-model: each measurement is true measurement + lab bias + measurement error.

The model for the measurement error vector $U_i$ is multivariate normal with mean 0 and diagonal covariance matrix $\Sigma_U$.

Lab bias is an unknown vector $\beta$.

The true measurement should be the same for both labs so it has the form

$$[Y_{i1}, Y_{i2}, Y_{i1}, Y_{i2}]$$
where $(Y_{i1}, Y_{i2})$ are iid bivariate normal with unknown means $\theta_1, \theta_2$ and unknown $2\times2$ variance covariance matrix $\Sigma_Y$.

This would give the structured model

$$X_i = CY_i + \beta + U_i$$
where
$$C = \begin{pmatrix}1 & 0\\ 0 & 1\\ 1 & 0\\ 0 & 1\end{pmatrix}$$

This model has variance covariance matrix

$$\Sigma_X = C\Sigma_YC^T + \Sigma_U$$
Notice that this matrix has only 7 parameters: four for the diagonal entries in $\Sigma_U$ and 3 for the entries in $\Sigma_Y$.

We skip this model and let ΣX be unrestricted.

Question of interest:

Ho : µ1 = µ3 and µ2 = µ4

Reduction: partition $X_i$ as

$$X_i = \begin{pmatrix}U_i\\ V_i\end{pmatrix}$$
where $U_i$ and $V_i$ each have two components.

Define $W_i = U_i - V_i$. Then our model makes the $W_i$ iid $MVN_2(\mu_W, \Sigma_W)$. Our hypothesis is

Ho : µW = 0

89 Carrying out our test in SPlus:

Working on CSS unix workstation:

Start SPlus then read in, print out data:

[61]ehlehl% mkdir .Data
[62]ehlehl% Splus
S-PLUS : Copyright (c) 1988, 1996 MathSoft, Inc.
S : Copyright AT&T.
Version 3.4 Release 1 for Sun SPARC, SunOS 5.3 : 1996
Working data will be in .Data
> # Read in and print out data
> eff <- read.table("effluent.dat",header=T)
> eff
   BODLabA SSLabA BODLabB SSLabB
1        6     27      25     15
2        6     23      28     13
3       18     64      36     22
4        8     44      35     29
5       11     30      15     31
6       34     75      44     64
7       28     26      42     30
8       71    124      54     64
9       43     54      34     56
10      33     30      29     20
11      20     14      39     21

90 Do some graphical preliminary analysis.

Look for non-normality, non-linearity, outliers.

Make plots on screen or saved in file.

> # Make pairwise scatterplots on screen using
> # motif graphics device and then in a postscript
> # file.
> motif()
> pairs(eff)
> postscript("pairs.ps",horizontal=F,
+            height=6,width=6)
> pairs(eff)
> dev.off()
Generated postscript file "pairs.ps".
motif
 2

91 20 60 100 20 40 60 · · · 70

· · · 50 BODLabA · · · · · ·

· · · 30 · · · · ·· · · · ·· · · · · ·· · 10 · · · 100 · · · · SSLabA · · 60 · · · · · · · · · · · · · · · · · ·· · 20 · · · · · · 50 · · · · · ·

· · · 40 · · · · · · BODLabB · · ·

· · · · · · 30 · · · 20 · · · · · · · · ·

60 · · ·

40 SSLabB ·· · ·· · · · · ·· · · · · · · · 20 · ·· · · 10 30 50 70 20 30 40 50

Examine the correlations:

> cor(eff)
          BODLabA    SSLabA   BODLabB    SSLabB
BODLabA 0.9999999 0.7807413 0.7228161 0.7886035
SSLabA  0.7807413 1.0000000 0.6771183 0.7896656
BODLabB 0.7228161 0.6771183 1.0000001 0.6038079
SSLabB  0.7886035 0.7896656 0.6038079 1.0000001

Notice high correlations.

Mostly caused by variation in true levels from sample to sample.

Get partial correlations.

Adjust for the overall BOD and SS content of the sample.

> aug <- cbind(eff,(eff[,1]+eff[,3])/2,
+              (eff[,2]+eff[,4])/2)
> aug
   BODLabA SSLabA BODLabB SSLabB   X2   X3
1        6     27      25     15 15.5 21.0
2        6     23      28     13 17.0 18.0
3       18     64      36     22 27.0 43.0
4        8     44      35     29 21.5 36.5
5       11     30      15     31 13.0 30.5
6       34     75      44     64 39.0 69.5
7       28     26      42     30 35.0 28.0
8       71    124      54     64 62.5 94.0
9       43     54      34     56 38.5 55.0
10      33     30      29     20 31.0 25.0
11      20     14      39     21 29.5 17.5
> bigS <- var(aug)

Now compute the partial correlations for the first four variables given the means of BOD and SS:

> S11 <- bigS[1:4,1:4]
> S12 <- bigS[1:4,5:6]
> S21 <- bigS[5:6,1:4]
> S22 <- bigS[5:6,5:6]
> S11dot2 <- S11 - S12 %*% solve(S22,S21)
> S11dot2
           BODLabA     SSLabA    BODLabB     SSLabB
BODLabA  24.804665  -7.418491 -24.804665   7.418491
SSLabA   -7.418491  59.142084   7.418491 -59.142084
BODLabB -24.804665   7.418491  24.804665  -7.418491
SSLabB    7.418491 -59.142084  -7.418491  59.142084
> S11dot2SD <- diag(sqrt(diag(S11dot2)))
> S11dot2SD
         [,1]     [,2]     [,3]     [,4]
[1,] 4.980428 0.000000 0.000000 0.000000
[2,] 0.000000 7.690389 0.000000 0.000000
[3,] 0.000000 0.000000 4.980428 0.000000
[4,] 0.000000 0.000000 0.000000 7.690389
> R11dot2 <- solve(S11dot2SD)%*%
+            S11dot2%*%solve(S11dot2SD)
> R11dot2
          [,1]      [,2]      [,3]      [,4]
[1,]  1.000000 -0.193687 -1.000000  0.193687
[2,] -0.193687  1.000000  0.193687 -1.000000
[3,] -1.000000  0.193687  1.000000 -0.193687
[4,]  0.193687 -1.000000 -0.193687  1.000000

Notice little residual correlation.

Carry out Hotelling's $T^2$ test of $H_0: \mu_W = 0$.

> w <- eff[,1:2]-eff[3:4]
> dimnames(w)<-list(NULL,c("BODdiff","SSdiff"))
> w
      BODdiff SSdiff
 [1,]     -19     12
 [2,]     -22     10
 etc
 [8,]      17     60
 etc
> Sw <- var(w)
> cor(w)
          BODdiff    SSdiff
BODdiff 1.0000001 0.3057682
SSdiff  0.3057682 1.0000000
> mw <- apply(w,2,mean)
> mw
  BODdiff   SSdiff
-9.363636 13.27273
> Tsq <- 11*mw%*%solve(Sw,mw)
> Tsq
         [,1]
[1,] 13.63931
> FfromTsq <- (11-2)*Tsq/(2*(11-1))
> FfromTsq
        [,1]
[1,] 6.13769
> 1-pf(FfromTsq,2,9)
[1] 0.02082779

Conclusion: Pretty clear evidence of a difference in mean level between the labs.

Which measurement causes the difference?

> TBOD <- sqrt(11)*mw[1]/sqrt(Sw[1,1])
> TBOD
  BODdiff
-2.200071
> 2*pt(TBOD,1)
  BODdiff
0.2715917
> 2*pt(TBOD,10)
   BODdiff
0.05243474
> TSS <- sqrt(11)*mw[2]/sqrt(Sw[2,2])
> TSS
 SSdiff
2.15153
> 2*pt(-TSS,10)
    SSdiff
0.05691733
> postscript("differences.ps",
+     horizontal=F,height=6,width=6)
> plot(w)
> abline(h=0)
> abline(v=0)
> dev.off()

Conclusion? Neither? Not a problem – the analysis summarizes the evidence!

Problem: several tests at level 0.05 on the same data. Simultaneous or Multiple comparisons.

Problem: several tests at level 0.05 on same data. Simultaneous or Multiple compar- isons. 97 · 60

· 40 SSdiff 20 · · · · ·

0 · · · ·

-20 -10 0 10

BODdiff

In general we can test the hypothesis $H_o: C\mu = 0$ by computing $Y_i = CX_i$ and then testing $H_o: \mu_Y = 0$ using Hotelling's $T^2$.

98 Simultaneous confidence intervals

Confidence interval for $a^T\mu$:

$$a^T\bar{X} \pm t_{n-1,\alpha/2}\sqrt{a^TSa/n}$$
Based on the $t$ distribution.

This gives coverage intervals for the 6 parameters of interest: the 4 entries in $\mu$, and $\mu_1 - \mu_3$ and $\mu_2 - \mu_4$:

$\mu_1$:           $25.27 \pm 2.23 \times 19.68/\sqrt{11}$
$\mu_2$:           $46.45 \pm 2.23 \times 31.84/\sqrt{11}$
$\mu_3$:           $34.64 \pm 2.23 \times 10.45/\sqrt{11}$
$\mu_4$:           $33.18 \pm 2.23 \times 19.07/\sqrt{11}$
$\mu_1 - \mu_3$:   $-9.36 \pm 2.23 \times 14.12/\sqrt{11}$
$\mu_2 - \mu_4$:   $13.27 \pm 2.23 \times 20.46/\sqrt{11}$

Problem: each confidence interval has a 5% error rate. Pick out the last interval (on the basis of looking most interesting) and ask about its error rate?

Solution: adjust 2.23, the $t$ multiplier, to get $P(\text{all intervals cover the truth}) \ge 0.95$.

Rao or Scheffé type intervals

Based on the inequality:
$$\left(a^Tb\right)^2 \le a^TMa\;b^TM^{-1}b$$
for any symmetric non-singular matrix $M$.

Proof by Cauchy Schwarz: the inner product of the vectors $M^{1/2}a$ and $M^{-1/2}b$.

Put $b = n^{1/2}(\bar{X} - \mu)$ and $M = S$ to get
$$\left(n^{1/2}(a^T\bar{X} - a^T\mu)\right)^2 \le a^TSa\;T^2$$

This inequality is true for all $a$. Thus the event that there is any $a$ such that
$$\frac{n(a^T\bar{X} - a^T\mu)^2}{a^TSa} > c$$
is a subset of the event

$$T^2 > c$$

Adjust $c$ to make the latter event have probability $\alpha$ by taking
$$c = \frac{p(n-1)}{n-p}F_{p,n-p,\alpha} .$$
Then the probability that every one of the uncountably many confidence intervals

$$a^T\bar{X} \pm \sqrt{c}\sqrt{a^TSa/n}$$
covers the corresponding true parameter value is at least $1 - \alpha$.

In fact the probability of this happening is exactly equal to $1 - \alpha$ because for each data set the supremum of
$$\frac{n(a^T\bar{X} - a^T\mu)^2}{a^TSa}$$
over all $a$ is $T^2$.

In our case
$$\sqrt{c} = \sqrt{\frac{4(10)}{7}F_{4,7,0.05}} = 4.85$$

101 Coverage probability of single interval using √c4.85? From t distribution: 99.93%

Probability that all 6 intervals would cover using $\sqrt{c} = 4.85$?

Use the Bonferroni inequality: $P(\cup A_i) \le \sum P(A_i)$. The simultaneous coverage probability of the 6 intervals using $\sqrt{c} = 4.85$ is
$$\ge 1 - 6(1 - 0.9993) = 99.59\% .$$

Usually we just use

$$\sqrt{c} = t_{n-1,\alpha/12} = 3.28$$

General Bonferroni strategy. If we want intervals for $\theta_1, \ldots, \theta_k$, get an interval for $\theta_i$ at level $1 - \alpha/k$. The simultaneous coverage probability is at least $1 - \alpha$. Notice that Bonferroni is narrower in our example unless $0.0007k = 0.5$, giving $k > 71$.
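The multipliers quoted above can be reproduced with one-line quantile calls; here is a minimal sketch in modern R (n, p and alpha come from the effluent example; qf/qt are standard R/S functions):

```r
# Scheffe and Bonferroni multipliers for the effluent example
# (n = 11 samples, p = 4 variables, 6 intervals, alpha = 0.05).
n <- 11; p <- 4; alpha <- 0.05
sqrt(p * (n - 1) / (n - p) * qf(1 - alpha, p, n - p))  # Scheffe: about 4.85
qt(1 - alpha / 2, n - 1)                               # per-interval t: 2.23
qt(1 - alpha / 12, n - 1)                              # Bonferroni (6 intervals): 3.28
2 * pt(-4.85, n - 1)                                   # error rate at 4.85: about 0.0007
```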

Motivations for $T^2$:

1: The hypothesis $H_o: \mu = 0$ is true iff all the hypotheses $H_{oa}: a^T\mu = 0$ are true. The natural test for $H_{oa}$ rejects if
$$t(a) = \frac{n^{1/2}a^T(\bar{X} - \mu)}{\sqrt{a^TSa}}$$
is large. So take the largest test statistic.

Fact: $\sup_a t^2(a) = T^2$

Proof: like the calculation of the maximal correlation.

2: The likelihood ratio method.

Compute

$$\ell(\hat\mu, \hat\Sigma) - \ell(\hat\mu_o, \hat\Sigma_o)$$
where the subscript $o$ indicates estimation assuming $H_o$.

103 In our case to test Ho : µ = 0 find ˆ T µˆo = 0 Σo = XiXi /n and X

$$\ell(\hat\mu, \hat\Sigma) - \ell(\hat\mu_o, \hat\Sigma_o) = n\log\{\det(\hat\Sigma_0)/\det(\hat\Sigma)\}/2$$

Now write

$$\sum X_iX_i^T = \sum(X_i - \bar{X})(X_i - \bar{X})^T + n\bar{X}\bar{X}^T$$
Use the formula:

$$\det(A + vv^T) = \det(A)(1 + v^TA^{-1}v)$$
to get

$$\det(n\hat\Sigma_0) = \det(n\hat\Sigma)\left(1 + n\bar{X}^T(n\hat\Sigma)^{-1}\bar{X}\right)$$
so that the ratio of determinants is a monotone increasing function of $T^2$.

Again conclude: likelihood ratio test rejects for T 2 > c where c chosen to make level α.

3: Compare estimates of $\Sigma$.

In univariate regression, $F$ tests to compare a restricted model with a full model have the form
$$\frac{ESS_{\text{Restricted}} - ESS_{\text{Full}}}{ESS_{\text{Full}}}\cdot\frac{df_{\text{Error}}}{df_{\text{difference}}}$$
This is a monotone function of
$$\frac{\hat\sigma^2_{\text{Restricted}}}{\hat\sigma^2_{\text{Full}}}$$
where $ESS$ denotes an error sum of squares and $\hat\sigma^2$ an estimate of the residual variance, $ESS/df$.

Here: substitute matrices.

The analogue of ESS for the full model: $E$.

The analogue of ESS for the reduced model: $E + H$. (This defines $H$ to be the change in the Sum of Squares and Cross Products matrix.)

In the 1 sample example:

$$E = \sum(X_i - \bar{X})(X_i - \bar{X})^T$$
and
$$E + H = \sum X_iX_i^T$$
The test of $\mu = 0$ is based on comparing

$$H = \sum X_iX_i^T - \sum(X_i - \bar{X})(X_i - \bar{X})^T = n\bar{X}\bar{X}^T$$
to

$$E = \sum(X_i - \bar{X})(X_i - \bar{X})^T = (n-1)S$$

To make the comparison: if the null is true

$$E(n\bar{X}\bar{X}^T) = \Sigma \quad\text{and}\quad E(S) = \Sigma$$
so try tests based on the closeness of

$$S^{-1}n\bar{X}\bar{X}^T$$
to the identity matrix.

Measures of size are based on the eigenvalues of

$$S^{-1}n\bar{X}\bar{X}^T$$
which are the same as the eigenvalues of

$$S^{-1/2}n\bar{X}\bar{X}^TS^{-1/2}$$

Suggested size measures for $A - I$:

• trace$(A - I)$ (= sum of eigenvalues).

• det$(A - I)$ (= product of eigenvalues).

• maximum eigenvalue of $A - I$.

For our matrix: the eigenvalues are all 0 except for one. (So really the matrix is not close to $I$.)

The largest eigenvalue is

$$T^2 = n\bar{X}^TS^{-1}\bar{X}$$

But: see two sample problem for precise tests based on suggestions.

Two sample problem

Given data $X_{ij}$; $j = 1, \ldots, n_i$; $i = 1, 2$. Model: $X_{ij} \sim MVN_p(\mu_i, \Sigma_i)$, independent.

Test $H_o: \mu_1 = \mu_2$.

Case 1: for motivation only. $\Sigma_i$ known, $i = 1, 2$.

Natural test statistic: based on

$$D = \bar{X}_1 - \bar{X}_2$$
which has a $MVN(\mu_D, \Sigma_D)$ distribution where $\mu_D = \mu_1 - \mu_2$ and
$$\Sigma_D = n_1^{-1}\Sigma_1 + n_2^{-1}\Sigma_2$$
So
$$D^T(\Sigma_1/n_1 + \Sigma_2/n_2)^{-1}D$$
has a $\chi^2_p$ distribution if the null is true.

If the $\Sigma_i$ are not known they must be estimated. There is no universally agreed best procedure (even for $p = 1$ — called the Behrens-Fisher problem).

Usually: assume $\Sigma_1 = \Sigma_2$.

If so: the MLE of $\mu_i$ is $\bar{X}_i$ and of $\Sigma$ is
$$\frac{\sum_{ij}(X_{ij} - \bar{X}_i)(X_{ij} - \bar{X}_i)^T}{n_1 + n_2}$$
The usual estimate of $\Sigma$ is
$$S_{\text{Pooled}} = \frac{\sum_{ij}(X_{ij} - \bar{X}_i)(X_{ij} - \bar{X}_i)^T}{n_1 + n_2 - 2}$$
which is unbiased.

Possible test developments:

1) By analogy with the 1 sample case:
$$T^2 = \frac{n_1n_2}{n_1 + n_2}D^TS_{\text{Pooled}}^{-1}D$$
which has the distribution
$$\frac{n_1 + n_2 - 1 - p}{p(n_1 + n_2 - 2)}T^2 \sim F_{p,\,n_1+n_2-1-p}$$

2) Union-intersection: a test of $a^T(\mu_1 - \mu_2) = 0$ based on
$$t_a = \sqrt{\frac{n_1n_2}{n_1 + n_2}}\,\frac{a^TD}{\sqrt{a^TSa}}$$
Maximize $t_a^2$ over all $a$.

Get
$$T^2 = \frac{n_1n_2}{n_1 + n_2}D^TS^{-1}D$$

3) Likelihood ratio: the MLE of $\Sigma$ for the unrestricted model is
$$\frac{n_1 + n_2 - 2}{n_1 + n_2}S$$
Under the hypothesis $\mu_1 = \mu_2$ the MLE of $\Sigma$ is
$$\frac{\sum_{ij}(X_{ij} - \bar{X})(X_{ij} - \bar{X})^T}{n_1 + n_2}$$
where
$$\bar{X} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2}$$

This simplifies to
$$\frac{E + H}{n_1 + n_2}$$
The log-likelihood ratio is a multiple of

$$\log\det\hat\Sigma_{\text{Full}} - \log\det\hat\Sigma_{\text{Restricted}}$$
which is a function of

$$\log\{\det E/\det(E + H)\}$$
or equivalently a function of Wilks' $\Lambda$:
$$\Lambda = \frac{\det E}{\det(E + H)} = \frac{1}{\det(HE^{-1} + I)}$$
Compute the determinant: multiply together the eigenvalues.

If $\lambda_i$ are the eigenvalues of $HE^{-1}$ then
$$\Lambda = \prod\frac{1}{1 + \lambda_i}$$
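A minimal sketch of the two-sample $T^2$ as an R function (the function name hotelling2 and the overall wrapper are my own illustration, not the SAS route taken below; X1 and X2 are assumed to be n1 x p and n2 x p data matrices):

```r
# Two-sample Hotelling T^2 under equal covariance, with the F conversion above.
hotelling2 <- function(X1, X2) {
  n1 <- nrow(X1); n2 <- nrow(X2); p <- ncol(X1)
  D  <- colMeans(X1) - colMeans(X2)
  Sp <- ((n1 - 1) * var(X1) + (n2 - 1) * var(X2)) / (n1 + n2 - 2)  # pooled S
  T2 <- (n1 * n2 / (n1 + n2)) * drop(t(D) %*% solve(Sp, D))
  Fstat <- (n1 + n2 - 1 - p) * T2 / (p * (n1 + n2 - 2))
  pval  <- 1 - pf(Fstat, p, n1 + n2 - 1 - p)
  list(T2 = T2, F = Fstat, p.value = pval)
}
```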

Two sample analysis in SAS on the css network

• Type sas to start the system.

• Several windows open. Go to the Program Editor.

• Under the file menu open the file with the sas code.

Contents of sas2sample.sas:

data long;
  infile 'tab57sh';
  input group a b c;
run;
proc print;
run;
proc glm;
  class group;
  model a b c = group;
  manova h=group / printh printe;
run;

Notes:

1) First 4 lines form DATA step: a) creates data set named long by reading in 4 columns of data from file named tab57sh stored in same directory as I was in when I typed sas. b) Calls variables group (=1 or 2 as label for the two groups) and a, b, c which are names for the 3 test scores for each subject.

2) Next two lines: print out data: result is (slightly edited)

Obs group  a  b  c
  1     1 19 20 18
  2     1 20 21 19
  3     1 19 22 22
 etc till
 11     2 15 17 15
 12     2 13 14 14
 13     2 14 16 13

3) Then use proc glm to do the analysis: a) class group declares that the variable group defines levels of a categorical variable. b) The model statement says to regress the variables a, b, c on the variable group. c) The manova statement says to do both the 3 univariate regressions and a multivariate regression, and to print out the H and E matrices, where H is the matrix corresponding to the presence of the factor group in the model.

115 Output of MANOVA: First univariate results

The GLM Procedure Class Level Information Class Levels Values group 2 1 2 Number of observations 13 Dependent Variable: a Sum of Source DF Squares Mean Square F Value Pr > F Model 1 54.276923 54.276923 19.38 0.0011 Error 11 30.800000 2.800000 Corrd Tot 12 85.076923 R-Square Coeff Var Root MSE a Mean 0.637975 10.21275 1.673320 16.38462 Source DF Type ISS Mean Square F Value Pr > F group 1 54.276923 54.276923 19.38 0.0011 Source DF TypeIIISS Mean Square F Value Pr > F group 1 54.276923 54.276923 19.38 0.0011

Dependent Variable: b Sum of Source DF Squares Mean Square F Value Pr > F Model 1 70.892308 70.892308 34.20 0.0001 Error 11 22.800000 2.072727 Corrd Tot 12 93.692308

Dependent Variable: c Sum of Source DF Squares Mean Square F Value Pr > F Model 1 94.77692 94.77692 39.64 <.0001 Error 11 26.30000 2.39090 Corrd Tot 12 121.07692

116 The matrices E and H.

E = Error SSCP Matrix a b c a 30.8 12.2 10.2 b 12.2 22.8 3.8 c 10.2 3.8 26.3 Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r| DF = 11 a b c a 1.000000 0.460381 0.358383 0.1320 0.2527 b 0.460381 1.000000 0.155181 0.1320 0.6301 c 0.358383 0.155181 1.000000 0.2527 0.6301 H = Type III SSCP Matrix for group a b c a 54.276923077 62.030769231 71.723076923 b 62.030769231 70.892307692 81.969230769 c 71.723076923 81.969230769 94.776923077

117 1 The eigenvalues of E− H. Characteristic Roots and Vectors of: E Inverse * H H = Type III SSCP Matrix for group E = Error SSCP Matrix Characteristic Characteristic Vector V’EV=1 Root Percent a b c 5.816159 100.00 0.00403434 0.12874606 0.13332232 0.000000 0.00 -0.09464169 -0.10311602 0.16080216 0.000000 0.00 -0.19278508 0.16868694 0.00000000 MANOVA Test Criteria and Exact F for the Hypothesis of No Overall group Effect H = Type III SSCP Matrix for group E = Error SSCP Matrix S=1 M=0.5 N=3.5 Statistic Value F NumDF DenDF Pr > F Wilks’ Lambda 0.1467 17.45 3 9 0.0004 Pillai’s Trace 0.8533 17.45 3 9 0.0004 Hotelling-Lawley Tr 5.8162 17.45 3 9 0.0004 Roy’s Greatest Root 5.8162 17.45 3 9 0.0004

118 Things to notice:

1. The conclusion is clear. The mean vectors for the two groups are not the same.

2. The four statistics have the following definitions in terms of the eigenvalues $\lambda_i$ of $E^{-1}H$:

Wilks' Lambda:
$$\prod\frac{1}{1 + \lambda_i} = \frac{1}{6.816}$$
Pillai's trace:
$$\mathrm{trace}(H(H + E)^{-1}) = \sum\frac{\lambda_i}{1 + \lambda_i} = \frac{5.816}{6.816}$$
Hotelling-Lawley trace:
$$\mathrm{trace}(HE^{-1}) = \sum\lambda_i = 5.816$$
Roy's greatest root:
$$\max\{\lambda_i\} = 5.816$$
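These four criteria can be recomputed directly from the printed E and H matrices; here is a sketch in modern R using the (rounded) SAS output above:

```r
# Recover the SAS MANOVA criteria from the eigenvalues of E^{-1} H.
E <- matrix(c(30.8, 12.2, 10.2,
              12.2, 22.8,  3.8,
              10.2,  3.8, 26.3), 3, 3, byrow = TRUE)
H <- matrix(c(54.2769, 62.0308, 71.7231,
              62.0308, 70.8923, 81.9692,
              71.7231, 81.9692, 94.7769), 3, 3, byrow = TRUE)
lam <- Re(eigen(solve(E) %*% H)$values)   # approximately 5.816, 0, 0
c(wilks     = prod(1 / (1 + lam)),        # about 0.1467
  pillai    = sum(lam / (1 + lam)),       # about 0.8533
  hotelling = sum(lam),                   # about 5.816
  roy       = max(lam))                   # about 5.816
```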

1 way layout

Also called the m sample problem.

Data $X_{ij}$, $j = 1, \ldots, n_i$; $i = 1, \ldots, m$.

Model: $X_{ij}$ independent $MVN_p(\mu_i, \Sigma)$.

First problem of interest: test

$$H_o: \mu_1 = \cdots = \mu_m$$

Based on $E$ and $H$. The MLE of $\mu_i$ is $\bar{X}_i$.
$$E = \sum_{ij}(X_{ij} - \bar{X}_i)(X_{ij} - \bar{X}_i)^T$$
Under $H_o$ the MLE of $\mu$, the common value of the $\mu_i$, is
$$\bar{X} = \frac{\sum_{ij}X_{ij}}{\sum_i n_i}$$
So
$$E + H = \sum_{ij}(X_{ij} - \bar{X})(X_{ij} - \bar{X})^T$$

This makes

$$H = \sum_{ij}(\bar{X}_i - \bar{X})(\bar{X}_i - \bar{X})^T$$
Notice we can do the sum over $j$ to get a factor of $n_i$:
$$H = \sum_i n_i(\bar{X}_i - \bar{X})(\bar{X}_i - \bar{X})^T$$
Note: the rank of $H$ is the minimum of $p$ and $m - 1$.

The data:

1 19 20 18
1 20 21 19
1 19 22 22
1 18 19 21
1 16 18 20
1 17 22 19
1 20 19 20
1 15 19 19
2 12 14 12
2 15 15 17
2 15 17 15
2 13 14 14
2 14 16 13
3 15 14 17
3 13 14 15
3 12 15 15
3 12 13 13
4  8  9 10
4 10 10 12
4 11 10 10
4 11  7 12

Code

data three;
  infile 'tab57for3sams';
  input group a b c;
run;
proc print;
run;
proc glm;
  class group;
  model a b c = group;
  manova h=group / printh printe;
run;
data four;
  infile 'table5.7';
  input group a b c;
run;
proc print;
run;
proc glm;
  class group;
  model a b c = group;
  manova h=group / printh printe;
run;

122 Pieces of output: first set of code does first 3 groups.

So: H has rank 2. Characteristic Roots & Vectors of: E Inverse * H Characteristic Characteristic Vector V’EV=1 Root Percent a b c 6.90568180 96.94 0.01115 0.14375 0.08795 0.21795125 3.06 -0.07763 -0.09587 0.16926 0.00000000 0.00 -0.18231 0.13542 0.02083 S=2 M=0 N=5 Statistic Value F NumDF Den DF Pr > F Wilks’ 0.1039 8.41 6 24 <.0001 Pillai’s 1.0525 4.81 6 26 0.0020 Hotelling-Lawley 7.1236 13.79 6 14.353 <.0001 Roy’s 6.9057 29.92 3 13 <.0001 NOTE: F Statistic for Roy’s is an upper bound. NOTE: F Statistic for Wilks’is exact.

Notice two eigenvalues not 0. Note that ex- act distribution for Wilk’s Lambda is available. Now 4 groups Root Percent a b c 15.3752900 98.30 0.01128 0.13817 0.08126 0.2307260 1.48 -0.04456 -0.09323 0.15451 0.0356937 0.23 -0.17289 0.09020 0.04777 S=3 M=-0.5 N=6.5 Statistic Value F NumDF Den DF Pr > F Wilks’ 0.04790913 10.12 9 36.657 <.0001 Pillai’s 1.16086747 3.58 9 51 0.0016 Hot’ng-Lawley 15.64170973 25.02 9 20.608 <.0001 Roy’s 15.37528995 87.13 3 17 <.0001 NOTE: F Statistic for Roy’s is an upper bound.

Other Hypotheses

How do the mean vectors differ? One possibility:

$$\mu_{ik} - \mu_{jk} = c_i - c_j$$
for constants $c_i$ and $c_j$ which do not depend on $k$. This is an additive model for the means.

Test?

Define $\alpha = \sum_{ij}\mu_{ij}/(pk)$. Then put
$$\beta_i = \sum_j\mu_{ij}/p - \alpha,$$
$$\gamma_j = \sum_i\mu_{ij}/k - \alpha,$$
$$\tau_{ij} = \mu_{ij} - \beta_i - \gamma_j - \alpha .$$
If the $\tau_{ij}$ are all 0 then
$$\mu_{ik} - \mu_{jk} = \beta_i - \beta_j$$
so we test the hypothesis that all $\tau_{ij}$ are 0.

Univariate Two Way Anova

Data $Y_{ijk}$, $k = 1, \ldots, n_{ij}$; $j = 1, \ldots, p$; $i = 1, \ldots, m$.

Model: independent, $Y_{ijk} \sim N(\mu_{ij}, \sigma^2)$. Note: this is the fixed effects model.

Usual approach: define the grand mean, main effects, interactions:

$$\mu = \sum_{ijk}\mu_{ij}\Big/\sum_{ij}n_{ij}$$
$$\alpha_i = \sum_{jk}\mu_{ij}\Big/\sum_j n_{ij} - \mu$$
$$\beta_j = \sum_{ik}\mu_{ij}\Big/\sum_i n_{ij} - \mu$$
$$\gamma_{ij} = \mu_{ij} - (\mu + \alpha_i + \beta_j)$$

Test for additive effects: $\gamma_{ij} = 0$ for all $i, j$.

The usual test is based on the ANOVA:

Stack observations Yijk into vector Y, say.

Estimate µ, αi, etc by least squares.

Form vectors with entries µˆ, αˆi etc.

Write
$$Y = \hat\mu + \hat\alpha + \hat\beta + \hat\gamma + \hat\epsilon$$
This defines the vector of fitted residuals $\hat\epsilon$.

Fact: all the vectors on the RHS are independent and orthogonal. So:

$$\|Y\|^2 = \|\hat\mu\|^2 + \|\hat\alpha\|^2 + \|\hat\beta\|^2 + \|\hat\gamma\|^2 + \|\hat\epsilon\|^2$$
This is the ANOVA table. Usually we define the corrected total sum of squares to be

$$\|Y\|^2 - \|\hat\mu\|^2$$

126 Our problem is like this one BUT the errors are not modeled as independent.

In the analogy: i labels group. j labels the columns: ie j is a, b, c. k runs from 1 to nij = ni.

But

$$\mathrm{Cov}(Y_{ijk}, Y_{i'j'k'}) = \begin{cases}\Sigma_{jj'} & i = i',\ k = k'\\ 0 & \text{otherwise}\end{cases}$$
Now do the analysis in SAS.

Tell SAS that the variables A, B and C are repeated measurements of the same quantity.

proc glm;
  class group;
  model a b c = group;
  repeated scale;
run;

The results are as follows:

General Linear Models Procedure Repeated Measures Analysis of Variance Repeated Measures Level Information Dependent Variable A B C Level of SCALE 1 2 3

Manova Test Criteria and Exact F Statistics for the Hypothesis of no SCALE Effect H = Type III SS&CP Matrix for SCALE E = Error SS&CP Matrix S=1 M=0 N=7 Statistic Value F NumDF DenDF Pr > F Wilks’ Lambda 0.56373 6.1912 2 16 0.0102 Pillai’s Trace 0.43627 6.1912 2 16 0.0102 Hotelling-Lawley 0.77390 6.1912 2 16 0.0102 Roy’s 0.77390 6.1912 2 16 0.0102

Note: should look at interactions first.

128 Manova Test Criteria and F Approximations for the Hypothesis of no SCALE*GROUP Effect S=2 M=0 N=7 Statistic Value F NumDF DenDF Pr > F Wilks’ Lambda 0.56333 1.7725 6 32 0.1364 Pillai’s Trace 0.48726 1.8253 6 34 0.1234 Hotelling-Lawley 0.68534 1.7134 6 30 0.1522 Roy’s 0.50885 2.8835 3 17 0.0662 NOTE: F Statistic for Roy’s Greatest Root is an upper bound. NOTE: F Statistic for Wilks’ Lambda is exact. Only weak evidence of interaction. Repeated statement: univariate anova. Results: Repeated Measures Analysis of Variance Tests of Hypotheses for Between Subjects Effects Source DF Type III SS Mean Square F Pr > F GROUP 3 743.900000 247.966667 70.93 0.0001 Error 17 59.433333 3.496078 Repeated Measures Analysis of Variance Univariate Tests of Hypotheses for Within Subject Effects Source: SCALE Adj Pr > F DF TypeIIISS MS F Pr > F G - G H - F 2 16.624 8.312 5.39 0.0093 0.0101 0.0093 Source: SCALE*GROUP DF TypeIII MS F Pr > F G - G H - F 6 18.9619 3.160 2.05 0.0860 0.0889 0.0860 Source: Error(SCALE) DF TypeIII SS Mean Square 34 52.4667 1.54313725 Greenhouse-Geisser Epsilon = 0.9664 Huynh-Feldt Epsilon = 1.2806

129 Greenhouse-Geisser, Huynh-Feldt test to see if Σ has certain structure.

Return to 2 way anova model. Express as:

$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}$$

For the fixed effects model the $\epsilon_{ijk}$ are iid $N(0, \sigma^2)$.

For the MANOVA model the vector of $\epsilon_{ijk}$ is MVN but with covariance as for $Y$.

Intermediate model: put in a subject effect.

Assume

$$\epsilon_{ijk} = \delta_{ik} + u_{ijk}$$
where the $u_{ijk}$ are iid $N(0, \sigma^2)$ and the $\delta_{ik}$ are iid $N(0, \tau^2)$. Then

$$\mathrm{Cov}(Y_{ijk}, Y_{i'j'k'}) = \begin{cases}\sigma^2 + \tau^2 & i = i',\ j = j',\ k = k'\\ \tau^2 & i = i',\ k = k',\ j \ne j'\\ 0 & \text{otherwise}\end{cases}$$

This model is usually not fitted by maximum likelihood but by analyzing the behaviour of the ANOVA table under this model.

Essentially the model says $\Sigma = \tau^211^T + \sigma^2I$. The GG and HF statistics test for a slightly more general pattern for $\Sigma$.

Do univariate anova: The data reordered: 1 1 1 19 1 1 2 20 1 1 3 18 2 1 1 20 2 1 2 21 2 1 3 19 et cetera 2 4 2 10 2 4 3 12 3 4 1 11 3 4 2 10 3 4 3 10 4 4 1 11 4 4 2 7 4 4 3 12

The four columns are now labels for subject number, group, scale (a, b or c) and the re- sponse. 131 The sas commands:

data long; infile ’table5.7uni’; input subject group scale score; run; proc print; run; proc glm; class group; class scale; class subject; model score =group subject(group) scale group*scale; random subject(group) ; run;

132 Some of the output:

Dependent Variable: SCORE Sum of Mean Source DF Squares Square F Pr > F Model 28 843.5333 30.126 19.52 0.0001 Error 34 52.4667 1.543 Total 62 896.0000 Root MSE SCORE Mean 1.242231 15.33333

Source DF TypeISS MS F Pr > F

GROUP 3 743.9000 247.9667 160.69 0.0001 SUBJECT(GROUP) 17 59.4333 3.4961 2.27 0.0208 SCALE 2 21.2381 10.6190 6.88 0.0031 GROUP*SCALE 6 18.9620 3.1603 2.05 0.0860

Source DF TypeIIISS MS F Pr > F GROUP 3 743.9000 247.9667 160.69 0.0001 SUBJECT(GROUP) 17 59.4333 3.4961 2.27 0.0208 SCALE 2 16.6242 8.3121 5.39 0.0093 GROUP*SCALE 6 18.9619 3.1603 2.05 0.0860

Source Type III Expected Mean Square

GROUP Var(Error) + 3 Var(SUBJECT(GROUP)) + Q(GROUP,GROUP*SCALE) SUBJECT(GROUP) Var(Error) + 3 Var(SUBJECT(GROUP)) SCALE Var(Error) + Q(SCALE,GROUP*SCALE) GROUP*SCALE Var(Error) + Q(GROUP*SCALE)

Type I Sums of Squares:

• Sequential sums of squares.

• Each line is a sum of squares comparing the model with the effects listed above it to one with one extra effect.

• They depend on the order the terms are listed in the model.

Type III Sums of Squares:

• Roughly: each line compares the model with all other effects in the model.

• In unbalanced designs be careful about the differences between Types II, III and IV.

134 Notice hypothesis of no group by scale inter- action is acceptable.

Under the assumption of no such group by scale interaction the hypothesis of no group effect is tested by dividing group ms by sub- ject(group) ms.

Value is 70.9 on 3,17 degrees of freedom.

This is NOT the F value in the table above since the table above is for FIXED effects.

Notice that the sums of squares in this table match those produced in the repeated mea- sures ANOVA. This is not accidental.
