Statistical Inference for Astronomers: Multivariate Analysis

Summer School in Statistics for Astronomers & Physicists
June 5-10, 2005
Center for Astrostatistics, Pennsylvania State University

Thriyambakam Krishnan
Systat Software Asia-Pacific Limited, Bangalore, India


Multivariate Statistical Analysis

- Statistical theory, methods, algorithms, etc., for the simultaneous study of more than one variable
- Descriptive statistics and graphical representation
- Inference problems similar to the univariate ones, based on the multivariate normal
- Study of relationships between variables and finding structure
- Problems of combining variables and dimensionality reduction


Multivariate Normal Distribution

Notation: $X \sim N(\mu, \sigma^2)$ denotes a univariate normal; for a p-column vector $X$, $X \sim N_p(\mu, \Sigma)$ denotes a multivariate normal.

Reasons for studying the multivariate normal:
1. p-variate generalization of the univariate normal;
2. same reasons as for the univariate normal in univariate analysis;
3. multivariate central limit theorem;
4. robustness of some procedures;
5. theory and methods analogous to the univariate case based on $N$, like $t$ and Hotelling's $T^2$, ANOVA and MANOVA;
6. not many other multivariate models;
7. mathematically tractable and elegant;
8. similar parameters: mean vector $\mu$, covariance matrix $\Sigma$.


Bivariate Normal

Let $X = (X_1, X_2)^T$, $X \sim N_2(\mu, \Sigma)$, with

  $\mu = (\mu_1, \mu_2)^T, \qquad \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}$

and $\sigma_{12} = \sigma_{21}$. $\Sigma$ is non-negative definite. Let $\sigma_1^2 = \sigma_{11}$, $\sigma_2^2 = \sigma_{22}$.

Correlation coefficient: $\rho = \sigma_{12} / \sqrt{\sigma_{11}\sigma_{22}}$

Density (if $\Sigma$ is p.d.), for $(x_1, x_2) \in \mathbb{R}^2$:

  $f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - 2\rho \left(\frac{x_1-\mu_1}{\sigma_1}\right)\left(\frac{x_2-\mu_2}{\sigma_2}\right) + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2 \right] \right\}$

If $(X_1, X_2) \sim N_2$, then $X_1, X_2$ are independent $\iff \rho = 0$.
In general, $\rho = 0$ does not imply independence.


Bivariate Normal Densities

[Figure: surface plots of bivariate normal densities with $\mu_1 = \mu_2 = 0$, $\sigma_{11} = \sigma_{22} = 1$ and $\rho = 0.8,\ 0.5,\ 0,\ -0.8,\ -0.5$.]


In the $N_2$ density, the term inside the exponential is

  $-\tfrac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)$

and the constant is $\frac{1}{(\sqrt{2\pi})^p |\Sigma|^{1/2}}$, where $p = 2$.

This is the form of the $N_p$ density:

  $f(x) = \frac{1}{(\sqrt{2\pi})^p |\Sigma|^{1/2}} \exp\left\{ -\tfrac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu) \right\}$

if $\Sigma$ is strictly p.d. (it has in any case to be n.n.d., being a covariance matrix). Here $x$ and $\mu$ are p-vectors and $\Sigma$ is a nonsingular (symmetric) $p \times p$ matrix. The term

  $Q = (x-\mu)^T \Sigma^{-1} (x-\mu)$

is a positive-definite quadratic form.

- $Q$ is the covariance-matrix adjusted distance of $x$ from $\mu$
- the larger this distance, the smaller the probability density
- the density decreases exponentially with the square of the distance

You can define $N_p$ by this density and investigate its properties (see the numerical sketch below). An alternative and elegant way is to use the following definition:

A random p-vector $X$ is said to be multivariate normally distributed if for all p-vectors $\ell$, $\ell^T X$ has a univariate normal distribution (or is a constant).

This definition makes sense even if $\Sigma$ is singular.
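The following is a minimal numerical sketch in Python of the $N_p$ density written in terms of the quadratic form $Q$; the mean vector, covariance matrix, and evaluation point are assumed illustration values, and the hand-coded result is cross-checked against scipy.stats.multivariate_normal.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_density(x, mu, Sigma):
    """N_p(mu, Sigma) density at x via the quadratic form
    Q = (x - mu)^T Sigma^{-1} (x - mu); assumes Sigma is positive definite."""
    p = len(mu)
    diff = x - mu
    Q = diff @ np.linalg.solve(Sigma, diff)               # covariance-adjusted squared distance
    const = (2.0 * np.pi) ** (-p / 2) / np.sqrt(np.linalg.det(Sigma))
    return const * np.exp(-0.5 * Q)

# Illustrative bivariate case: zero means, unit variances, correlation rho = 0.8
mu = np.array([0.0, 0.0])
rho = 0.8
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])
x = np.array([0.5, -0.3])

print(mvn_density(x, mu, Sigma))                           # hand-coded density
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))      # scipy reference value
```

Using np.linalg.solve rather than an explicit matrix inverse evaluates the quadratic form without forming $\Sigma^{-1}$, which is the usual numerically preferable choice.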
Properties of $N_p$:

1. $\mu$ is the vector of means of $X_1, X_2, \ldots, X_p$.

2. $\Sigma$ is the (symmetric) matrix of variances and covariances of $X_1, X_2, \ldots, X_p$.

3. If the variance-covariance matrix (also called simply the covariance matrix or dispersion matrix) is singular, the above density does not hold, but the alternative definition still holds. For instance, if $X \sim N(0, 4)$, then $(X, 3X + 2)$ has covariance matrix $\begin{pmatrix} 4 & 12 \\ 12 & 36 \end{pmatrix}$, which is singular, but all linear combinations are of the form $a + bX$ for constants $a, b$ and hence are univariate normal. So $(X, 3X + 2)$ is bivariate normal by the alternative definition.

4. The covariance matrix is singular (multivariate normal or not), with the linear dependence of its columns given by $\Sigma \ell = 0$, iff $X^T \ell$ is a constant (a degenerate random variable), i.e., there is a deterministic linear dependence among the variables. In such cases, by removing the deterministically dependent components, the $\Sigma$ of the remaining components can be made nonsingular. Near-singularity of the covariance matrix is a computational and conceptual problem; some exploratory methods detect it, and some methods (e.g., ridge regression) overcome it.

5. Let us deal only with nonsingular $\Sigma$.

6. Let $X \sim N_p(\mu, \Sigma)$, $A$ a $k \times p$ matrix, $c \in \mathbb{R}^k$. Then
   $Y = AX + c \sim N_k(A\mu + c, A \Sigma A^T)$.
   [If $k > p$, then $A \Sigma A^T$ is singular.]

7. $\Sigma$ diagonal means $X_1, X_2, \ldots, X_p$ are independent random variables.

8. $X \sim N_p(0, I_p)$ means $X_1, X_2, \ldots, X_p$ are independent standard normal variables.

9. Let $X \sim N_p(\mu, \Sigma)$. Then
   $\Sigma^{-1/2} X \sim N_p(\Sigma^{-1/2} \mu, I_p)$ and $Y = \Sigma^{-1/2}(X - \mu) \sim N_p(0, I_p)$.

10. Marginal distributions: all marginals (1-dimensional and $q < p$-dimensional) are (multivariate) normal. That is, if you partition
    $X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$
    analogously into $q$- and $(p-q)$-dimensional vectors and matrix blocks (note that $\Sigma_{21} = \Sigma_{12}^T$), then
    $X_1 \sim N_q(\mu_1, \Sigma_{11}), \qquad X_2 \sim N_{p-q}(\mu_2, \Sigma_{22}).$

11. Under the above (multivariate normal) set-up, $X_1$ and $X_2$ are independent iff $\Sigma_{12} = 0$, that is, all cross-covariances are zero.


Conditional Distributions and Regression

12. $(X_2 \mid X_1 = x_1) \sim N(\mu_{2 \cdot 1}, \Sigma_{22 \cdot 1})$, where
    $\mu_{2 \cdot 1} = \mu_2 + \Sigma_{21} \Sigma_{11}^{-1} (x_1 - \mu_1), \qquad \Sigma_{22 \cdot 1} = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}.$
    [Notation: $A \mid B$ stands for event $A$ conditional on event $B$; also used as $X \mid Y = y$ for variable $X$ given variable $Y = y$.] These conditional-mean and conditional-covariance formulas are illustrated in the numerical sketch at the end of this section.
    (a) If $\Sigma$ is nonsingular, so is $\Sigma_{11}$.
    (b) This conditional expectation is linear in $x_1$.
    (c) Regression being defined as conditional expectation, this shows that under multivariate normality, the (multiple) regression of any subset of variables on the others is linear.
    (d) This linear regression formula is exactly the same as what you obtain by the least-squares criterion.
    (e) For $p = 2$: $(X_2 \mid X_1 = x_1) \sim N(\beta_0 + \beta_1 x_1,\ \sigma_2^2 (1 - \rho^2))$, where $\beta_1 = \sigma_{21}/\sigma_{11}$ and $\beta_0 = \mu_2 - \beta_1 \mu_1$, the well-known formulas for (least-squares) simple linear regression.
    (f) The conditional covariance matrix does not depend on $x_1$.
    (g) These results justify the linearity and homoscedasticity (common variance) assumptions in the multiple linear regression model.

[Figure: conditional distributions of $X_2$ given $X_1 = x_1$; the conditional means lie on a straight line and the conditional variances are the same.]


More Properties

1. [We know: if $X \sim N(0, 1)$, then $X^2 \sim \chi^2(1)$.]
   $\Delta^2(X) = (X - \mu)^T \Sigma^{-1} (X - \mu) = Y^T Y \sim \chi^2(p)$, being a sum of squares of $p$ independent $N(0,1)$'s by (9) above.

2. The sample (of size $n$) mean vector $\bar X$ and the sample sum of squares and products matrix $S$ are independently distributed.

3. $\bar X \sim N(\mu, \tfrac{1}{n}\Sigma)$.

4. $S \sim W_p(n-1, \Sigma)$, the Wishart distribution, the multivariate analog of the $\chi^2$ distribution; we shall not discuss it here.

5. For $N(\mu, \sigma^2)$, Student's $t$ statistic based on a sample of size $n$ with mean $\bar X$ and sample (mean-corrected) sum of squares $S$ is
   $t = \frac{\bar X - \mu}{\sqrt{S / (n(n-1))}},$
   extended to Hotelling's
   $T^2 = n(n-1)\, (\bar X - \mu)^T S^{-1} (\bar X - \mu).$

6. Analysis of Variance (ANOVA), which decomposes observed variation into its components, is analogously extended to Multivariate Analysis of Variance (MANOVA).
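Below is a minimal Python sketch of the conditional-distribution formulas in (12). The mean vector, covariance matrix, partition, and conditioning value are assumed illustration values, and a crude Monte Carlo slice is used only as a sanity check.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed illustrative trivariate normal, partitioned as X1 = (first component),
# X2 = (remaining two components).
mu = np.array([1.0, 2.0, -1.0])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.4],
                  [0.3, 0.4, 1.5]])

q = 1                                        # dimension of X1
mu1, mu2 = mu[:q], mu[q:]
S11, S12 = Sigma[:q, :q], Sigma[:q, q:]
S21, S22 = Sigma[q:, :q], Sigma[q:, q:]

x1 = np.array([1.8])                         # conditioning value X1 = x1

# Conditional mean and covariance of X2 | X1 = x1 (property 12)
mu_2_1 = mu2 + S21 @ np.linalg.solve(S11, x1 - mu1)
Sigma_22_1 = S22 - S21 @ np.linalg.solve(S11, S12)

# Monte Carlo sanity check: keep draws whose X1 lies in a thin slab around x1
draws = rng.multivariate_normal(mu, Sigma, size=200_000)
near = draws[np.abs(draws[:, 0] - x1[0]) < 0.05]

print(mu_2_1, near[:, q:].mean(axis=0))                    # conditional mean vs empirical
print(Sigma_22_1, np.cov(near[:, q:], rowvar=False))       # conditional covariance vs empirical
```

Because conditioning is approximated by a thin slab around $x_1$, the empirical slice statistics agree with the closed-form values only up to Monte Carlo error.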
Estimation of $N_p(\mu, \Sigma)$ Parameters

Random sample $X_1, X_2, \ldots, X_n$ from $N_p$; observed values $x_1, x_2, \ldots, x_n$.

Data matrix: the $n \times p$ matrix $U$ with rows indexed by the observations $X_1, \ldots, X_n$ and columns indexed by the variables $Y_1, \ldots, Y_p$:

  $U = \begin{pmatrix} X_{11} & X_{12} & \cdots & X_{1p} \\ X_{21} & X_{22} & \cdots & X_{2p} \\ \vdots & \vdots & & \vdots \\ X_{n1} & X_{n2} & \cdots & X_{np} \end{pmatrix}$

$\bar X = (\bar X_1, \bar X_2, \ldots, \bar X_p)$: sample mean vector.
$S = ((S_{ij}))$: sample (mean-corrected) sum of squares and products matrix,

  $S_{ij} = \sum_{\ell=1}^{n} (X_{\ell i} - \bar X_i)(X_{\ell j} - \bar X_j) = Y_i^T Y_j - n \bar X_i \bar X_j, \qquad i, j = 1, 2, \ldots, p,$

or, in matrix form,

  $S = \sum_{\ell=1}^{n} (X_\ell - \bar X)(X_\ell - \bar X)^T = \sum_{\ell=1}^{n} X_\ell X_\ell^T - n \bar X \bar X^T = U^T U - n \bar X \bar X^T.$

Analogs of the univariate normal case:

- $\bar X$: unbiased estimate of $\mu$
- $\frac{1}{n-1} S$: unbiased estimate of $\Sigma$
- $\bar X, S$: sufficient statistics for $\mu, \Sigma$ [in a sense, these statistics contain all the information in the sample $X_1, X_2, \ldots, X_n$ in respect of $\mu, \Sigma$]


Maximum Likelihood Estimation of $\mu, \Sigma$: Rao (1973)

Density:

  $(2\pi)^{-p/2} |\Sigma|^{-1/2} \exp\left[ -\tfrac{1}{2} \mathrm{tr}\{ \Sigma^{-1} (x - \mu)(x - \mu)^T \} \right]$

The joint density of the observations (but for a constant not involving the parameters), which is the likelihood function in terms of the parameters, is

  $L = |\Sigma^{-1}|^{n/2} \exp\left[ -\tfrac{1}{2} \sum_{\ell=1}^{n} \mathrm{tr}\{ \Sigma^{-1} (x_\ell - \mu)(x_\ell - \mu)^T \} \right].$

Now

  $\sum_{\ell=1}^{n} (x_\ell - \mu)(x_\ell - \mu)^T = \sum_{\ell=1}^{n} (x_\ell - \bar x)(x_\ell - \bar x)^T + n (\bar x - \mu)(\bar x - \mu)^T = S + n (\bar x - \mu)(\bar x - \mu)^T,$

so that

  $\mathrm{tr}\Big\{ \Sigma^{-1} \sum_{\ell=1}^{n} (x_\ell - \mu)(x_\ell - \mu)^T \Big\} = \mathrm{tr}\{ \Sigma^{-1} S \} + n\, \mathrm{tr}\{ \Sigma^{-1} (\bar x - \mu)(\bar x - \mu)^T \} = \sum_{i=1}^{p} \sum_{j=1}^{p} \sigma^{ij} S_{ij} + n (\bar x - \mu)^T \Sigma^{-1} (\bar x - \mu),$

where $\Sigma^{-1} = ((\sigma^{ij}))$. Hence

  $\log L = \frac{n}{2} \log |\Sigma^{-1}| - \frac{1}{2} \sum_{i=1}^{p} \sum_{j=1}^{p} \sigma^{ij} S_{ij} - \frac{n}{2} (\bar x - \mu)^T \Sigma^{-1} (\bar x - \mu) \qquad (A)$

  $\phantom{\log L} = \frac{n}{2} \log |\Sigma^{-1}| - \frac{1}{2} \sum_{i=1}^{p} \sum_{j=1}^{p} \sigma^{ij} \left[ S_{ij} + n (\bar x_i - \mu_i)(\bar x_j - \mu_j) \right]. \qquad (B)$

Differentiating (A) w.r.t. $\mu$ leads to

  $\Sigma^{-1} (\bar x - \mu) = 0 \;\Rightarrow\; \bar x = \mu \;\Rightarrow\; \hat\mu = \bar x. \qquad (C)$

Differentiating (B) w.r.t. $\sigma^{ij}$ leads to

  $\frac{n}{|\Sigma^{-1}|} \frac{\partial |\Sigma^{-1}|}{\partial \sigma^{ij}} = S_{ij} + n (\bar x_i - \mu_i)(\bar x_j - \mu_j). \qquad (D)$

Since $\partial |\Sigma^{-1}| / \partial \sigma^{ij}$ is the cofactor of $\sigma^{ij}$ in $|\Sigma^{-1}|$,

  $\frac{n}{|\Sigma^{-1}|} \frac{\partial |\Sigma^{-1}|}{\partial \sigma^{ij}} = n \sigma_{ij}. \qquad (E)$

Equations (C), (D) and (E) lead to

  $\hat\Sigma = \frac{1}{n} S,$

a slightly biased estimate, as in the univariate case.
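The sketch below (Python, with made-up values for $\mu$ and $\Sigma$) computes the sample mean vector, the mean-corrected SSP matrix $S = U^T U - n \bar X \bar X^T$, and compares the maximum likelihood estimate $S/n$ with the unbiased estimate $S/(n-1)$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data matrix U (n observations x p variables); the "true" parameters
# below are arbitrary illustration values.
mu_true = np.array([0.0, 5.0, -2.0])
Sigma_true = np.array([[1.0, 0.3, 0.0],
                       [0.3, 2.0, 0.5],
                       [0.0, 0.5, 1.5]])
n = 500
U = rng.multivariate_normal(mu_true, Sigma_true, size=n)

xbar = U.mean(axis=0)                        # sample mean vector (unbiased for mu)
S = U.T @ U - n * np.outer(xbar, xbar)       # mean-corrected SSP matrix

Sigma_mle = S / n                            # maximum likelihood estimate (slightly biased)
Sigma_unbiased = S / (n - 1)                 # unbiased estimate of Sigma

print(xbar)
print(Sigma_mle)
print(Sigma_unbiased)
```

For moderate $n$ the two covariance estimates differ only by the factor $n/(n-1)$, mirroring the univariate distinction between the MLE and the unbiased sample variance.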
