
Notes 11: The bivariate and multivariate normal distributions.

Definition. Two r.v.'s (X, Y) have a bivariate normal distribution N(µ1, µ2, σ1², σ2², ρ) if their joint p.d.f. is

\[
f_{X,Y}(x,y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}
\exp\left\{ \frac{-1}{2(1-\rho^2)} \left[ \left(\frac{x-\mu_1}{\sigma_1}\right)^2
- 2\rho\left(\frac{x-\mu_1}{\sigma_1}\right)\left(\frac{y-\mu_2}{\sigma_2}\right)
+ \left(\frac{y-\mu_2}{\sigma_2}\right)^2 \right] \right\}
\tag{1}
\]

for all x, y. Here µ1, µ2 may be any real numbers, σ1 > 0, σ2 > 0, and −1 < ρ < 1.
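As a quick sanity check, the density (1) can be evaluated numerically and verified to integrate to 1 over the plane. This is only a sketch: the parameter values below are arbitrary illustrative choices, and only NumPy is assumed.

```python
import numpy as np

# Illustrative parameter choices (any real mu1, mu2; sigma1, sigma2 > 0; -1 < rho < 1)
mu1, mu2, s1, s2, rho = 0.5, -1.0, 1.0, 2.0, 0.6

def bvn_pdf(x, y):
    # Joint p.d.f. of N(mu1, mu2, sigma1^2, sigma2^2, rho), equation (1)
    q = (((x - mu1) / s1) ** 2
         - 2 * rho * ((x - mu1) / s1) * ((y - mu2) / s2)
         + ((y - mu2) / s2) ** 2)
    c = 1.0 / (2 * np.pi * s1 * s2 * np.sqrt(1 - rho ** 2))
    return c * np.exp(-q / (2 * (1 - rho ** 2)))

# Riemann-sum approximation of the integral of the density over the plane
xs = np.linspace(mu1 - 8 * s1, mu1 + 8 * s1, 801)
ys = np.linspace(mu2 - 8 * s2, mu2 + 8 * s2, 801)
X, Y = np.meshgrid(xs, ys)
total = bvn_pdf(X, Y).sum() * (xs[1] - xs[0]) * (ys[1] - ys[0])
print(total)  # should be very close to 1
```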

It is convenient to rewrite (1) in the form

\[
f_{X,Y}(x,y) = c\,e^{-\frac{1}{2}Q(x,y)}, \quad\text{where}\quad
c = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}
\]
and
\[
Q(x,y) = (1-\rho^2)^{-1}\left[ \left(\frac{x-\mu_1}{\sigma_1}\right)^2
- 2\rho\left(\frac{x-\mu_1}{\sigma_1}\right)\left(\frac{y-\mu_2}{\sigma_2}\right)
+ \left(\frac{y-\mu_2}{\sigma_2}\right)^2 \right]
\tag{2}
\]

Statement. The marginal distributions of N(µ1, µ2, σ1², σ2², ρ) are normal, with the r.v.'s X and Y having density functions

\[
f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}}, \qquad
f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma_2} e^{-\frac{(y-\mu_2)^2}{2\sigma_2^2}}.
\]

Proof. The expression (2) for Q(x,y) can be rearranged as follows:

\[
Q(x,y) = \frac{1}{1-\rho^2}\left[ \left(\frac{x-\mu_1}{\sigma_1} - \rho\,\frac{y-\mu_2}{\sigma_2}\right)^2
+ (1-\rho^2)\left(\frac{y-\mu_2}{\sigma_2}\right)^2 \right]
= \frac{(x-a)^2}{(1-\rho^2)\sigma_1^2} + \frac{(y-\mu_2)^2}{\sigma_2^2},
\tag{3}
\]

where a = a(y) = µ1 + ρ(σ1/σ2)(y − µ2). Hence

\[
f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx
= c\,e^{-\frac{(y-\mu_2)^2}{2\sigma_2^2}} \times \int_{-\infty}^{\infty} e^{-\frac{(x-a)^2}{2(1-\rho^2)\sigma_1^2}}\,dx
= \frac{1}{\sqrt{2\pi}\,\sigma_2} e^{-\frac{(y-\mu_2)^2}{2\sigma_2^2}},
\]

where the last step makes use of the formula \(\int_{-\infty}^{\infty} e^{-\frac{(x-a)^2}{2\sigma^2}}\,dx = \sqrt{2\pi}\,\sigma\) with σ = σ1√(1 − ρ²).
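The integration step above can be checked numerically: integrating the joint density over x at a fixed y should reproduce the claimed N(µ2, σ2²) marginal density. Again a sketch with illustrative parameter values; only NumPy is assumed.

```python
import numpy as np

# Illustrative parameters
mu1, mu2, s1, s2, rho = 0.0, 1.0, 1.0, 1.5, -0.4

def bvn_pdf(x, y):
    # Joint p.d.f., equation (1)
    q = (((x - mu1) / s1) ** 2
         - 2 * rho * ((x - mu1) / s1) * ((y - mu2) / s2)
         + ((y - mu2) / s2) ** 2)
    c = 1.0 / (2 * np.pi * s1 * s2 * np.sqrt(1 - rho ** 2))
    return c * np.exp(-q / (2 * (1 - rho ** 2)))

def fY(y):
    # Claimed marginal density of Y
    return np.exp(-(y - mu2) ** 2 / (2 * s2 ** 2)) / (np.sqrt(2 * np.pi) * s2)

y0 = 2.3  # an arbitrary fixed value of y
xs = np.linspace(mu1 - 12 * s1, mu1 + 12 * s1, 8001)
marginal = bvn_pdf(xs, y0).sum() * (xs[1] - xs[0])  # integrate out x
print(marginal, fY(y0))  # the two values should agree
```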

Exercise. Derive the formula for fX(x).

Corollaries.

1. Since X ∼ N(µ1, σ1²) and Y ∼ N(µ2, σ2²), we know the meaning of four of the parameters involved in the definition of the bivariate normal distribution, namely

E(X) = µ1, Var(X) = σ1², E(Y) = µ2, Var(Y) = σ2².

2. X|(Y = y) is a normal r.v. To verify this statement we substitute the necessary ingredients into the formula defining the relevant conditional density:

\[
f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}
= \frac{1}{\sqrt{2\pi(1-\rho^2)}\,\sigma_1} e^{-\frac{(x-a(y))^2}{2\sigma_1^2(1-\rho^2)}}.
\]

In other words, X|(Y = y) ∼ N(a(y), (1 − ρ²)σ1²). Hence:

3. E(X|Y = y) = a(y) or, equivalently, E(X|Y) = µ1 + ρ(σ1/σ2)(Y − µ2). In particular, we see that E(X|Y) is a linear function of Y.
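The conditional-mean formula can be sanity-checked by numerically integrating x against the conditional density f(x|y) = fX,Y(x,y)/fY(y). A sketch with illustrative parameter values; only NumPy is assumed.

```python
import numpy as np

# Illustrative parameters
mu1, mu2, s1, s2, rho = 1.0, -2.0, 2.0, 1.0, 0.7

def bvn_pdf(x, y):
    # Joint p.d.f., equation (1)
    q = (((x - mu1) / s1) ** 2
         - 2 * rho * ((x - mu1) / s1) * ((y - mu2) / s2)
         + ((y - mu2) / s2) ** 2)
    c = 1.0 / (2 * np.pi * s1 * s2 * np.sqrt(1 - rho ** 2))
    return c * np.exp(-q / (2 * (1 - rho ** 2)))

y0 = 0.5
xs = np.linspace(-80.0, 80.0, 40001)
dx = xs[1] - xs[0]
fy0 = bvn_pdf(xs, y0).sum() * dx                     # f_Y(y0), integrating out x
cond_mean = (xs * bvn_pdf(xs, y0)).sum() * dx / fy0  # E(X | Y = y0)
a = mu1 + rho * (s1 / s2) * (y0 - mu2)               # a(y0) from the formula
print(cond_mean, a)  # the two values should agree
```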

4. E(XY) = σ1σ2ρ + µ1µ2.

Proof.
\[
E(XY) = E[E(XY|Y)] = E[Y\,E(X|Y)] = E\left[Y\left(\mu_1 + \rho\frac{\sigma_1}{\sigma_2}(Y-\mu_2)\right)\right]
= \mu_1 E(Y) + \rho\frac{\sigma_1}{\sigma_2}\left[E(Y^2) - \mu_2 E(Y)\right]
\]
\[
= \mu_1\mu_2 + \rho\frac{\sigma_1}{\sigma_2}\left[E(Y^2) - \mu_2^2\right]
= \mu_1\mu_2 + \rho\frac{\sigma_1}{\sigma_2}\,\mathrm{Var}(Y)
= \sigma_1\sigma_2\rho + \mu_1\mu_2.
\]

5. Cov(X,Y) = σ1σ2ρ. This follows from Corollary 4 and the formula Cov(X,Y) = E(XY) − E(X)E(Y).
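Corollary 5 is easy to check by simulation, using the standard construction of a bivariate normal pair from two independent standard normals Z1, Z2: X = µ1 + σ1 Z1 and Y = µ2 + σ2(ρ Z1 + √(1 − ρ²) Z2). The sample covariance should then be close to σ1σ2ρ. Parameter values and sample size are illustrative; only NumPy is assumed.

```python
import numpy as np

# Illustrative parameters and sample size
mu1, mu2, s1, s2, rho = 3.0, -1.0, 1.5, 0.5, -0.8
n = 400_000
rng = np.random.default_rng(42)

# Build (X, Y) from independent standard normals
z1 = rng.standard_normal(n)
z2 = rng.standard_normal(n)
x = mu1 + s1 * z1
y = mu2 + s2 * (rho * z1 + np.sqrt(1 - rho ** 2) * z2)

sample_cov = np.cov(x, y)[0, 1]
print(sample_cov, s1 * s2 * rho)  # sample covariance vs sigma1*sigma2*rho
```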

6. ρ(X,Y) = ρ. In words: ρ is the correlation coefficient of X and Y. This is now obvious from the definition ρ(X,Y) = Cov(X,Y)/√(Var(X)Var(Y)).

Exercise. Show that X and Y are independent iff ρ = 0. (We proved this in the lecture; it is easily seen from either the joint p.d.f. or the joint m.g.f. below.)

Remark. It is possible to show that the m.g.f. of (X, Y) is

\[
M_{X,Y}(t_1,t_2) = e^{\mu_1 t_1 + \mu_2 t_2 + \frac{1}{2}\left(\sigma_1^2 t_1^2 + 2\rho\sigma_1\sigma_2 t_1 t_2 + \sigma_2^2 t_2^2\right)}.
\]

Many of the above statements follow from it. (To actually do this is a very useful exercise.)

The Multivariate Normal Distribution.

Using vector and matrix notation. To study the joint normal distributions of more than two r.v.’s, it is convenient to use vectors and matrices. But let us first introduce these notations for the case of two normal r.v.’s X1,X2. We set

\[
X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}; \quad
x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}; \quad
t = \begin{pmatrix} t_1 \\ t_2 \end{pmatrix}; \quad
m = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}; \quad
V = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}
\]

Then m is the vector of means and V is the variance-covariance matrix. Note that |V| = σ1²σ2²(1 − ρ²) and

à 1 −ρ ! 1 2 σ σ −1 σ1 1 2 V = 2 −ρ 1 (1 − ρ ) σ σ 2 1 2 σ2

Hence
\[
f_X(x) = \frac{1}{(2\pi)^{2/2}|V|^{1/2}} e^{-\frac{1}{2}(x-m)^T V^{-1}(x-m)}
\]
for all x. Also
\[
M_X(t) = e^{t^T m + \frac{1}{2} t^T V t}.
\]
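To see that the matrix form agrees with equation (1), the sketch below evaluates both expressions at a few random points; they should coincide. Parameter values are illustrative; only NumPy is assumed.

```python
import numpy as np

# Illustrative parameters
mu1, mu2, s1, s2, rho = 0.3, -0.7, 1.2, 0.8, 0.5
m = np.array([mu1, mu2])
V = np.array([[s1 ** 2, rho * s1 * s2],
              [rho * s1 * s2, s2 ** 2]])
Vinv = np.linalg.inv(V)

def f_matrix(x):
    # Density in matrix form: (2*pi)^(-1) |V|^(-1/2) exp(-(x-m)^T V^{-1} (x-m)/2)
    d = x - m
    return np.exp(-0.5 * d @ Vinv @ d) / (2 * np.pi * np.sqrt(np.linalg.det(V)))

def f_scalar(x1, x2):
    # Density in the form of equation (1)
    q = (((x1 - mu1) / s1) ** 2
         - 2 * rho * ((x1 - mu1) / s1) * ((x2 - mu2) / s2)
         + ((x2 - mu2) / s2) ** 2) / (1 - rho ** 2)
    return np.exp(-0.5 * q) / (2 * np.pi * s1 * s2 * np.sqrt(1 - rho ** 2))

for p in np.random.default_rng(0).normal(size=(5, 2)):
    print(f_matrix(p), f_scalar(p[0], p[1]))  # each pair should match
```

The constants agree because |V| = σ1²σ2²(1 − ρ²), as noted above.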

We again use matrix and vector notation, but now there are n random variables, so that X, x, t and m are n-vectors with i-th entries Xi, xi, ti and µi, and V is the n × n matrix with ii-th entry σi² and ij-th entry (for i ≠ j) σij. Note that V is symmetric, so that V^T = V.

The joint p.d.f. is
\[
f_X(x) = \frac{1}{(2\pi)^{n/2}|V|^{1/2}} e^{-\frac{1}{2}(x-m)^T V^{-1}(x-m)}
\]
for all x. We say that X ∼ N(m, V).

We can find the joint m.g.f. quite easily.

\[
M_X(t) = E\left[e^{\sum_{j=1}^n t_j X_j}\right] = E\left[e^{t^T X}\right]
= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}
\frac{1}{(2\pi)^{n/2}|V|^{1/2}} e^{-\frac{1}{2}\left((x-m)^T V^{-1}(x-m) - 2t^T x\right)}\,dx_1 \ldots dx_n
\]

We do the equivalent of completing the square, i.e. we write

(x − m)T V−1(x − m) − 2tT x = (x − m − a)T V−1(x − m − a) + b

for a suitable choice of the n-vector a of constants and a constant b. Then

\[
M_X(t) = e^{-b/2} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}
\frac{1}{(2\pi)^{n/2}|V|^{1/2}} e^{-\frac{1}{2}(x-m-a)^T V^{-1}(x-m-a)}\,dx_1 \ldots dx_n = e^{-b/2},
\]

since the remaining integrand is the p.d.f. of N(m + a, V) and so integrates to 1.

We just need to find a and b. Expanding we have

\[
((x-m)-a)^T V^{-1}((x-m)-a) + b
= (x-m)^T V^{-1}(x-m) - 2a^T V^{-1}(x-m) + a^T V^{-1}a + b
\]
\[
= (x-m)^T V^{-1}(x-m) - 2a^T V^{-1}x + \left[2a^T V^{-1}m + a^T V^{-1}a + b\right]
\]

This has to equal (x − m)^T V^{-1}(x − m) − 2t^T x for all x. Hence we need a^T V^{-1} = t^T and b = −[2a^T V^{-1}m + a^T V^{-1}a]. Hence a = Vt and b = −[2t^T m + t^T Vt]. Therefore

\[
M_X(t) = e^{-b/2} = e^{t^T m + \frac{1}{2} t^T V t}
\]
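The m.g.f. formula can be sanity-checked in the bivariate case by computing E[e^{t^T X}] as a numerical double integral and comparing it with e^{t^T m + t^T V t / 2}. All numeric values below are illustrative; only NumPy is assumed.

```python
import numpy as np

# Illustrative bivariate parameters; t is an arbitrary small vector
m = np.array([0.5, -0.2])
V = np.array([[1.0, 0.3],
              [0.3, 0.8]])
t = np.array([0.3, -0.2])
Vinv = np.linalg.inv(V)

def pdf(x1, x2):
    # N(m, V) density in matrix form, vectorised over grids
    d = np.stack([x1 - m[0], x2 - m[1]], axis=-1)
    q = np.einsum('...i,ij,...j->...', d, Vinv, d)
    return np.exp(-0.5 * q) / (2 * np.pi * np.sqrt(np.linalg.det(V)))

# Riemann sum for E[exp(t^T X)] over a wide grid
xs = np.linspace(-12.0, 13.0, 1001)
X1, X2 = np.meshgrid(xs, xs)
dx = xs[1] - xs[0]
mgf_numeric = (np.exp(t[0] * X1 + t[1] * X2) * pdf(X1, X2)).sum() * dx * dx
mgf_formula = np.exp(t @ m + 0.5 * t @ V @ t)
print(mgf_numeric, mgf_formula)  # the two values should agree
```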

Results obtained using the m.g.f.

1. Any (non-empty) subset of multivariate normal r.v.'s is multivariate normal. Simply put tj = 0 for all j for which Xj is not in the subset. For example,
\[
M_{X_1}(t_1) = M_{X_1,\ldots,X_n}(t_1, 0, \ldots, 0) = e^{t_1\mu_1 + t_1^2\sigma_1^2/2}.
\]
Hence X1 ∼ N(µ1, σ1²). A similar result holds for each Xi. This identifies the parameters µi and σi² as the mean and variance of Xi. Also

\[
M_{X_1,X_2}(t_1,t_2) = M_{X_1,\ldots,X_n}(t_1,t_2,0,\ldots,0)
= e^{t_1\mu_1 + t_2\mu_2 + \frac{1}{2}\left(\sigma_1^2 t_1^2 + 2\sigma_{12} t_1 t_2 + \sigma_2^2 t_2^2\right)}
\]

Hence X1 and X2 have a bivariate normal distribution with σ12 = Cov(X1, X2). A similar result holds for the joint distribution of Xi and Xj for i ≠ j. This identifies V as the variance-covariance matrix for X1, ..., Xn.

2. X is a vector of independent random variables iff V is diagonal (i.e. all off-diagonal entries are zero, so that σij = 0 for i ≠ j).

Proof. From result 1, if the X's are independent then σij = Cov(Xi, Xj) = 0 for all i ≠ j, so that V is diagonal.

If V is diagonal then t^T Vt = ∑_{j=1}^n σj² tj² and hence

\[
M_X(t) = e^{t^T m + \frac{1}{2} t^T V t}
= \prod_{j=1}^n e^{\mu_j t_j + \frac{1}{2}\sigma_j^2 t_j^2}
= \prod_{j=1}^n M_{X_j}(t_j)
\]

By the uniqueness of the joint m.g.f., X1,...,Xn are independent.
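The factorisation step in the proof can be checked numerically for a diagonal V: the joint m.g.f. computed from the matrix formula should equal the product of the univariate normal m.g.f.'s. Numeric values are illustrative; only NumPy is assumed.

```python
import numpy as np

# Illustrative means, variances and argument t; V is diagonal
mu = np.array([1.0, -2.0, 0.5])
sig2 = np.array([0.5, 2.0, 1.0])
V = np.diag(sig2)
t = np.array([0.4, -0.1, 0.25])

joint = np.exp(t @ mu + 0.5 * t @ V @ t)                 # e^{t^T m + t^T V t / 2}
product = np.prod(np.exp(mu * t + 0.5 * sig2 * t ** 2))  # product of M_{X_j}(t_j)
print(joint, product)  # should be equal (up to rounding)
```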

3. Linearly independent linear functions of multivariate normal random variables are multivariate normal random variables. If Y = AX + b, where A is an n × n non-singular matrix and b is a (column) n-vector of constants, then Y ∼ N(Am + b, AVA^T).

Proof. Use the joint m.g.f.

\[
M_Y(t) = E\left[e^{t^T Y}\right] = E\left[e^{t^T (AX+b)}\right]
= e^{t^T b}\, E\left[e^{(A^T t)^T X}\right] = e^{t^T b}\, M_X(A^T t)
\]
\[
= e^{t^T b}\, e^{(A^T t)^T m + \frac{1}{2}(A^T t)^T V (A^T t)}
= e^{t^T (Am+b) + \frac{1}{2} t^T (A V A^T) t}
\]

This is just the m.g.f. for the multivariate normal distribution with vector of means Am + b and variance-covariance matrix AVAT . Hence, from the uniqueness of the joint m.g.f, Y ∼ N(Am + b,AVAT ).
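Result 3 can be illustrated by simulation: sample X ∼ N(m, V) (e.g. via the Cholesky factor of V), apply Y = AX + b, and compare the sample mean and covariance of Y with Am + b and AVA^T. All numeric values below are illustrative; only NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative m, V, A, b (A non-singular)
m = np.array([1.0, -1.0])
V = np.array([[2.0, 0.5],
              [0.5, 1.0]])
A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
b = np.array([0.5, -2.0])

# Sample X ~ N(m, V) as m + L Z, with L the Cholesky factor of V
L = np.linalg.cholesky(V)
Z = rng.standard_normal((500_000, 2))
X = m + Z @ L.T
Y = X @ A.T + b  # Y = A X + b, applied row-wise

print(Y.mean(axis=0), A @ m + b)  # sample mean vs A m + b
print(np.cov(Y.T))                # should be close to A V A^T
```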

Note that, from result 1, a subset of the Y's is multivariate normal.

NOTE. The results concerning the vector of means and variance-covariance matrix for linear functions of random variables hold regardless of the joint distribution of X1,...,Xn.

We define the expectation of a vector of random variables X, E[X], to be the vector of the expectations, and the expectation of a matrix of random variables Y, E[Y], to be the matrix of the expectations. Then the variance-covariance matrix of X is just E[(X − E[X])(X − E[X])^T].

The following results are easily obtained:

(i) Let A be an m × n matrix of constants, B be an m × k matrix of constants and Y be an n × k matrix of random variables. Then E[AY + B] = AE[Y] + B.

Proof. The ij-th entry of E[AY + B] is
\[
E\left[\sum_{r=1}^n A_{ir} Y_{rj} + B_{ij}\right] = \sum_{r=1}^n A_{ir} E[Y_{rj}] + B_{ij},
\]
which is the ij-th entry of AE[Y] + B. The result is then immediate.

(ii) Let C be a k × m matrix of constants and Y be an n × k matrix of random variables. Then E[YC] = E[Y]C.

Proof. Just transpose the equation. The result then follows from (i).

Hence if Z = AX + b, where A is an m × n matrix of constants, b is an m-vector of constants and X is an n-vector of random variables with E[X] = µ and variance-covariance matrix V, then

E[Z] = E[AX + b] = AE[X] + b = Aµ + b

Also the variance-covariance matrix for Z is just

\[
E\left[(Z - E[Z])(Z - E[Z])^T\right] = E\left[A(X-\mu)(X-\mu)^T A^T\right]
= A\,E\left[(X-\mu)(X-\mu)^T\right] A^T = A V A^T
\]

Example. Suppose that E[X1] = 1, E[X2] = 0, Var(X1) = 2, Var(X2) = 4 and Cov(X1, X2) = 1. Let Y1 = X1 + X2 and Y2 = X1 + aX2. Find the means, variances and covariance, and hence find a so that Y1 and Y2 are uncorrelated.

Writing in vector and matrix notation we have E[Y] = Am and the variance-covariance matrix for Y is just AVAT where

\[
m = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \qquad
V = \begin{pmatrix} 2 & 1 \\ 1 & 4 \end{pmatrix} \qquad
A = \begin{pmatrix} 1 & 1 \\ 1 & a \end{pmatrix}
\]

Therefore

\[
Am = \begin{pmatrix} 1 & 1 \\ 1 & a \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix}
= \begin{pmatrix} 1 \\ 1 \end{pmatrix}
\]

\[
A V A^T = \begin{pmatrix} 1 & 1 \\ 1 & a \end{pmatrix}
\begin{pmatrix} 2 & 1 \\ 1 & 4 \end{pmatrix}
\begin{pmatrix} 1 & 1 \\ 1 & a \end{pmatrix}
= \begin{pmatrix} 8 & 3+5a \\ 3+5a & 2+2a+4a^2 \end{pmatrix}
\]

Hence Y1 and Y2 have means 1 and 1, variances 8 and 2 + 2a + 4a², and covariance 3 + 5a. They are therefore uncorrelated if 3 + 5a = 0, i.e. if a = −3/5.
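The example's arithmetic is easy to confirm with NumPy (a sketch that simply redoes the matrix products above at the value of a just found):

```python
import numpy as np

# m, V and A from the example; a = -3/5 is the value found above
m = np.array([1.0, 0.0])
V = np.array([[2.0, 1.0],
              [1.0, 4.0]])
a = -3 / 5
A = np.array([[1.0, 1.0],
              [1.0, a]])

means = A @ m
cov = A @ V @ A.T
print(means)      # means of Y1, Y2
print(cov[0, 1])  # covariance 3 + 5a, which vanishes at a = -3/5
```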
