Properties of the Multivariate Normal
SAS Programming, February 4, 2015

We can write that a vector is multivariate normal as $y \sim N_p(\mu, \Sigma)$. Some important properties of multivariate normal distributions include the following.

1. Linear combinations of the variables $y_1, \ldots, y_p$ are also normal: for a constant vector $a$, $a'y \sim N(a'\mu, a'\Sigma a)$.

Note that for some distributions, such as the Poisson, sums of independent (but not necessarily identically distributed) random variables stay within the same family of distributions. For other distributions, such as the exponential, they do not. However, it is not clear that the sum of two correlated Poissons will still be Poisson, and differences between Poisson random variables are not Poisson. The normal has the nice property that even if the components of a multivariate normal vector are correlated, any linear combination of them is still normal.

2. If $A$ is constant (its entries are not random variables) and is $q \times p$ with rank $q \le p$, then
$$Ay \sim N_q(A\mu, A\Sigma A').$$
What happens if $q > p$?

3. A vector $y$ can be standardized using either
$$z = (T')^{-1}(y - \mu),$$
where $T$ is obtained using the Cholesky decomposition so that $T'T = \Sigma$, or
$$z = (\Sigma^{1/2})^{-1}(y - \mu),$$
where $\Sigma^{1/2}$ is the symmetric square root of $\Sigma$. This standardization is similar to the idea of z-scores; however, just taking the usual z-scores of the individual variables in $y$ will still leave the variables correlated. The standardizations above result in
$$z \sim N_p(0, I).$$

4. Sums of squares of $p$ independent standard normal random variables have a $\chi^2$ distribution with $p$ degrees of freedom. Therefore, if $y \sim N_p(\mu, \Sigma)$, then
$$
\begin{aligned}
(y - \mu)'\Sigma^{-1}(y - \mu) &= (y - \mu)'\left(\Sigma^{1/2}\Sigma^{1/2}\right)^{-1}(y - \mu) \\
&= (y - \mu)'\left(\Sigma^{1/2}\right)^{-1}\left(\Sigma^{1/2}\right)^{-1}(y - \mu) \\
&= \left[\left(\Sigma^{1/2}\right)^{-1}(y - \mu)\right]'\left[\left(\Sigma^{1/2}\right)^{-1}(y - \mu)\right] \\
&= z'z.
\end{aligned}
$$
The third equality uses the symmetry of $\Sigma^{1/2}$. Here the vector $z$ consists of i.i.d.
standard normal random variables according to property 3, so $z'z$ is a sum of squared i.i.d. standard normals, which is known to have a $\chi^2$ distribution. Therefore
$$(y - \mu)'\Sigma^{-1}(y - \mu) \sim \chi^2_p.$$

5. Normality of marginal distributions. If $y$ has $p$ random variables and is multivariate normal, then any subset $y_{i_1}, \ldots, y_{i_r}$, $r < p$, is also multivariate normal. We can assume that the $r$ variables of interest are listed first, so that (writing subvectors in bold)
$$\mathbf{y}_1 = (y_1, \ldots, y_r)', \quad \mathbf{y}_2 = (y_{r+1}, \ldots, y_p)'.$$
Then we have
$$
y = \begin{pmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \end{pmatrix}, \quad
\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad
\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}
$$
and
$$\mathbf{y}_1 \sim N_r(\mu_1, \Sigma_{11}).$$

Note that if two scalar random variables $y_1$ and $y_2$ are each normal, it doesn't follow that $y = (y_1, y_2)'$ is multivariate normal. For an extreme example, let $y_1 \sim N(0, 1)$, let $z \sim \text{Bernoulli}(1/2)$ independently of $y_1$, and let $y_2 = y_1 \cdot I(z = 1) - y_1 \cdot I(z = 0)$. In other words, with probability 1/2, $y_2 = y_1$, and with probability 1/2, $y_2 = -y_1$. Then $y_2$ is normal, yet $(y_1, y_2)'$ is not multivariate normal. What does the distribution of $(y_1, y_2)'$ look like?

> y1 <- rnorm(1000)            # standard normal draws
> z <- rbinom(1000,1,.5)       # fair coin flips, independent of y1
> y2 <- y1*(z==1)-y1*(z==0)    # y2 = y1 if z = 1, y2 = -y1 if z = 0
> shapiro.test(y2) # quick test of normality of a vector

        Shapiro-Wilk normality test

data:  y2
W = 0.9989, p-value = 0.8234

To check more theoretically that $y_2$ is normal, we use the fact that for a standard normal $y_1$, the variables $y_1$ and $-y_1$ have the same distribution:
$$
\begin{aligned}
P(y_2 \le x) &= P(y_2 \le x \mid z = 1)P(z = 1) + P(y_2 \le x \mid z = 0)P(z = 0) \\
&= P(y_1 \le x)(1/2) + P(-y_1 \le x)(1/2) \\
&= P(y_1 \le x)(1/2) + P(y_1 \le x)(1/2) \\
&= P(y_1 \le x).
\end{aligned}
$$
Therefore $y_2$ has the same CDF (cumulative distribution function) as $y_1$, so they have the same distribution. This shows more than that $y_2$ is normal: it is also standard normal, just like $y_1$.
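One way to see that $(y_1, y_2)'$ cannot be multivariate normal is to look at a linear combination, since property 1 says every linear combination of a multivariate normal vector is normal. The following is a minimal sketch, not part of the original slides: the seed and the names `n`, `s`, and `prop_zero` are illustrative assumptions. It checks that the sum $y_1 + y_2$ is exactly 0 whenever $z = 0$, so the sum has a point mass at 0 and cannot be normal.

```r
set.seed(1)                      # arbitrary seed (an assumption), for reproducibility only
n  <- 1000                       # sample size, matching the example in the text
y1 <- rnorm(n)                   # standard normal draws
z  <- rbinom(n, 1, 0.5)          # fair coin flips, independent of y1
y2 <- y1 * (z == 1) - y1 * (z == 0)

s <- y1 + y2                     # equals 2*y1 when z == 1, and exactly 0 when z == 0
prop_zero <- mean(s == 0)        # proportion of exact zeros, roughly 1/2
print(prop_zero)

print(all(abs(y1) == abs(y2)))   # y2 always has the same magnitude as y1
print(shapiro.test(s)$p.value)   # normality of the sum is strongly rejected
```

Each margin passes a normality check, but the sum does not, which is enough to rule out joint normality.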
Marginal distributions

If you have a collection of random variables, and you ignore some of them, the distribution of the remaining variables is a marginal distribution. For a bivariate random variable $y = (y_1, y_2)'$, the distribution of $y_1$ is a marginal distribution of the distribution of $y$.

In non-vector notation, the joint density for two random variables is often written $f_{12}(y_1, y_2)$, and the marginal density can be obtained by
$$f_1(y_1) = \int_{-\infty}^{\infty} f_{12}(y_1, y_2)\, dy_2.$$
More generally, the joint density for the first $r$ variables $y_1, \ldots, y_r$ is
$$f_{12\cdots r}(y_1, \ldots, y_r) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{1\cdots p}(y_1, \ldots, y_p)\, dy_{r+1} \cdots dy_p.$$
The other variables are integrated out, just as row and column totals are written in the margins of a table of joint probabilities, and this is why it is called a marginal density.

Plotting marginal densities in R

> install.packages("ade4")   # one-time install
> library(ade4)
> x <- rnorm(100)
> y <- x + rnorm(100)
> d <- data.frame(x,y)
> s.hist(d)                  # scatterplot with marginal histograms

For the weird example with $y_2 = y_1 \cdot I(z = 1) - y_1 \cdot I(z = 0)$:

[Figure: s.hist output for this example; the plotted points are labeled by observation number.]
Properties of the multivariate normal distribution

Let the observation vector (rows of the data matrix) be partitioned into $y$ and $x$ with
$$
E\begin{pmatrix} y \\ x \end{pmatrix} = \begin{pmatrix} \mu_y \\ \mu_x \end{pmatrix}, \quad
\operatorname{cov}\begin{pmatrix} y \\ x \end{pmatrix} = \begin{pmatrix} \Sigma_{yy} & \Sigma_{yx} \\ \Sigma_{xy} & \Sigma_{xx} \end{pmatrix},
$$
and
$$
\begin{pmatrix} y \\ x \end{pmatrix} \sim N_{p+q}\left[ \begin{pmatrix} \mu_y \\ \mu_x \end{pmatrix}, \begin{pmatrix} \Sigma_{yy} & \Sigma_{yx} \\ \Sigma_{xy} & \Sigma_{xx} \end{pmatrix} \right].
$$

5. (a) The subvectors $y$ and $x$ are independent if $\Sigma_{yx} = 0$.
(b) $y_i$ and $y_j$ (or $y_i$ and $x_j$) are independent if $\sigma_{ij} = 0$.

These properties do not always hold if the distribution is not multivariate normal. In particular, for the weird example where $y_2 = y_1 \cdot I(z = 1) - y_1 \cdot I(z = 0)$, what can you say about the correlation between $y_1$ and $y_2$, and what can you say about their independence?

It is often easier to show that two variables are uncorrelated than that they are independent. So this property of the multivariate normal, that no correlation implies independence, is quite useful.

Conditional distributions

Given two or more random variables with a joint distribution, we can condition on some random variables to get the conditional distribution of the remaining variables. For example, with the heights and ages of couples example, we could look