Lecture 9: Joint Distributions

Ma 3/103: Introduction to Probability and Statistics, Winter 2017
KC Border, Department of Mathematics

Relevant textbook passages: Pitman [3]: Chapter 5, Sections 6.4–6.5; Larsen–Marx [2]: Sections 3.7, 3.8, 3.9, 3.11.
9.1 Random vectors and joint distributions

Recall that a random variable $X$ is a real-valued function on the sample space $(S, \mathcal{E}, P)$, where $P$ is a probability measure on $S$, and that it induces a probability measure $P_X$ on $\mathbf{R}$, called the distribution of $X$, given by
\[
P_X(B) = P(X \in B) = P\bigl(\{s \in S : X(s) \in B\}\bigr),
\]
for every (Borel) subset $B$ of $\mathbf{R}$. The distribution is enough to calculate the expectation of any (Borel) function of $X$.

Now suppose I have more than one random variable on the same sample space. Then I can consider the random vector $(X, Y)$, or $X = (X_1, \dots, X_n)$.

• A random vector $X$ defines a probability $P_X$ on $\mathbf{R}^n$, called the distribution of $X$, via
\[
P_X(B) = P(X \in B) = P\bigl(\{s \in S : (X_1(s), \dots, X_n(s)) \in B\}\bigr),
\]
for every (Borel) subset $B$ of $\mathbf{R}^n$. This distribution is also called the joint distribution of $X_1, \dots, X_n$.

• We can use this to define a joint cumulative distribution function, denoted $F_X$, by
\[
F_X(x_1, \dots, x_n) = P(X_i \le x_i \text{ for all } i = 1, \dots, n).
\]

• N.B. While the subscript $X,Y$ or $X$ is often used to identify a joint distribution or cumulative distribution function, it is also frequently omitted. You are supposed to figure out the domain of the function by inspecting its arguments.

9.1.1 Example Let $S = \{SS, SF, FS, FF\}$ and let $P$ be the probability measure on $S$ defined by
\[
P(SS) = \tfrac{7}{12}, \qquad P(SF) = \tfrac{3}{12}, \qquad P(FS) = \tfrac{1}{12}, \qquad P(FF) = \tfrac{1}{12}.
\]
Define the random variables $X$ and $Y$ by
\[
X(SS) = 1, \quad X(SF) = 1, \quad X(FS) = 0, \quad X(FF) = 0,
\]
\[
Y(SS) = 1, \quad Y(SF) = 0, \quad Y(FS) = 1, \quad Y(FF) = 0.
\]
That is, $X$ and $Y$ indicate Success or Failure on two different experiments, but the experiments are not necessarily independent. Then
\[
P_X(1) = \tfrac{10}{12}, \qquad P_X(0) = \tfrac{2}{12},
\]
\[
P_Y(1) = \tfrac{8}{12}, \qquad P_Y(0) = \tfrac{4}{12},
\]
\[
P_{X,Y}(1,1) = \tfrac{7}{12}, \quad P_{X,Y}(1,0) = \tfrac{3}{12}, \quad P_{X,Y}(0,1) = \tfrac{1}{12}, \quad P_{X,Y}(0,0) = \tfrac{1}{12}. \qquad \Box
\]

9.2 Joint PMFs

(Pitman [3]: Section 3.1, p. 348; Larsen–Marx [2]: Section 3.7)

A random vector $X$ on a probability space $(S, \mathcal{E}, P)$ is discrete if you can enumerate its range. When $X_1, \dots, X_n$ are discrete, the joint probability mass function of the random vector $X = (X_1, \dots, X_n)$ is usually denoted $p_X$, and is given by
\[
p_X(x_1, x_2, \dots, x_n) = P(X_1 = x_1 \text{ and } X_2 = x_2 \text{ and } \cdots \text{ and } X_n = x_n).
\]
If $X$ and $Y$ are independent random variables, then $p_{X,Y}(x, y) = p_X(x)\, p_Y(y)$.

For a function $g$ of $X$ and $Y$ we have
\[
E\, g(X, Y) = \sum_x \sum_y g(x, y)\, p_{X,Y}(x, y).
\]

9.3 Joint densities

(Pitman [3]: Section 5.1; Larsen–Marx [2]: Section 3.7)

Let $X$ and $Y$ be random variables on a probability space $(S, \mathcal{E}, P)$. The random vector $(X, Y)$ has a joint density $f_{X,Y}(x, y)$ if for every (Borel) subset $B \subset \mathbf{R}^2$,
\[
P\bigl((X, Y) \in B\bigr) = \iint_B f_{X,Y}(x, y)\, dx\, dy.
\]
If $X$ and $Y$ are independent, then $f_{X,Y}(x, y) = f_X(x) f_Y(y)$.

For example,
\[
P(X > Y) = \int_{-\infty}^{\infty} \int_{y}^{\infty} f_{X,Y}(x, y)\, dx\, dy.
\]
For a function $g$ of $X$ and $Y$ we have
\[
E\, g(X, Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, f_{X,Y}(x, y)\, dx\, dy.
\]
Again, the subscript $X,Y$ or $X$ on a density is frequently omitted.
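To make the bookkeeping in Example 9.1.1 and the double-sum formula for $E\, g(X, Y)$ concrete, here is a minimal Python sketch (the code is illustrative, not part of the original notes) that tabulates the joint pmf from the outcome probabilities, recovers the marginals, and checks independence, using exact rational arithmetic:

```python
from fractions import Fraction as F

# Outcome probabilities and indicator variables from Example 9.1.1.
P = {"SS": F(7, 12), "SF": F(3, 12), "FS": F(1, 12), "FF": F(1, 12)}
X = {"SS": 1, "SF": 1, "FS": 0, "FF": 0}
Y = {"SS": 1, "SF": 0, "FS": 1, "FF": 0}

# Joint pmf: p_{X,Y}(x, y) = P(X = x and Y = y).
joint = {}
for s, p in P.items():
    joint[(X[s], Y[s])] = joint.get((X[s], Y[s]), F(0)) + p
print(joint)  # {(1, 1): 7/12, (1, 0): 1/4 (= 3/12), (0, 1): 1/12, (0, 0): 1/12}

# Marginals, by summing the joint pmf over the other variable.
pX = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
pY = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)}
print(pX)  # {0: 1/6, 1: 5/6}, i.e. 2/12 and 10/12
print(pY)  # {0: 1/3, 1: 2/3}, i.e. 4/12 and 8/12

# X and Y are not independent: p_{X,Y}(1, 1) != p_X(1) p_Y(1).
print(joint[(1, 1)], pX[1] * pY[1])  # 7/12 vs 5/9

# E g(X, Y) as a double sum; e.g. g(x, y) = x * y gives E(XY) = 7/12.
print(sum(x * y * p for (x, y), p in joint.items()))
```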
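The double integral for $P(X > Y)$ is also easy to sanity-check numerically. A minimal sketch, assuming (purely for illustration; this density is not from the notes) that $X$ and $Y$ are independent Exponential(1), so that $f_{X,Y}(x, y) = e^{-x} e^{-y}$ and $P(X > Y) = 1/2$ by symmetry:

```python
import numpy as np

# Assumed joint density: X, Y independent Exponential(1) on the positive quadrant.
def f(x, y):
    return np.exp(-x) * np.exp(-y)

# Midpoint Riemann sum for P(X > Y), the double integral of f over {x > y},
# truncating the (negligible) tail beyond 10.
h = 0.01
g = np.arange(0.0, 10.0, h) + h / 2
xx, yy = np.meshgrid(g, g, indexing="ij")
mass = np.where(xx > yy, f(xx, yy), 0.0).sum() * h * h
print(mass)  # ≈ 0.497, close to the exact value 1/2
```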
9.4 Recovering marginal distributions from joint distributions

So now we have random variables $X$ and $Y$, and the random vector $(X, Y)$. They have distributions $P_X$, $P_Y$, and $P_{X,Y}$. How are they related?

The marginal distribution of $X$ is just the distribution $P_X$ of $X$ alone. We can recover its probability mass function from the joint probability mass function $p_{X,Y}$ as follows. In the discrete case:
\[
p_X(x) = P(X = x) = \sum_y p_{X,Y}(x, y).
\]
Likewise
\[
p_Y(y) = P(Y = y) = \sum_x p_{X,Y}(x, y).
\]
Recall that if $X$ and $Y$ are independent random variables, then $p_{X,Y}(x, y) = p_X(x)\, p_Y(y)$. (Larsen–Marx [2]: p. 169)

For the density case, the marginal density of $X$, denoted $f_X$, is given by
\[
f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy,
\]
and the marginal density $f_Y$ of $Y$ is given by
\[
f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx.
\]
The recovery of a marginal density of $X$ from a joint density of $X$ and $Y$ is sometimes described as "integrating out" $y$.

9.5 ⋆ Expectation of a random vector

Since random vectors are just vector-valued functions on a sample space $S$, we can add them and multiply them just like any other functions. For example, the sum of random vectors $X = (X_1, \dots, X_n)$ and $Y = (Y_1, \dots, Y_n)$ is given by
\[
(X + Y)(s) = X(s) + Y(s) = \bigl(X_1(s), \dots, X_n(s)\bigr) + \bigl(Y_1(s), \dots, Y_n(s)\bigr).
\]
Thus the set of random vectors is a vector space. In fact, the subset of random vectors whose components have a finite expectation is also a vector subspace of the vector space of all random vectors.

If $X = (X_1, \dots, X_n)$ is a random vector, and each $X_i$ has expectation $E X_i$, the expectation of $X$ is defined to be
\[
E X = (E X_1, \dots, E X_n).
\]

• Expectation is a linear operator on the space of random vectors. This means that $E(aX + bY) = a\, E X + b\, E Y$.

• Expectation is a positive operator on the space of random vectors. For vectors $x = (x_1, \dots, x_n)$, define $x \ge 0$ if $x_i \ge 0$ for each $i = 1, \dots, n$. Then
\[
X \ge 0 \implies E X \ge 0.
\]

9.6 The distribution of a sum

(Pitman [3]: p. 147; Larsen–Marx [2]: p. 178ff.)

We already know how to calculate the expectation of a sum of random variables: since expectation is a linear operator, the expectation of a sum is the sum of the expectations. We are now in a position to describe the distribution of the sum of two random variables. Let $Z = X + Y$.

Discrete case:
\[
P(Z = z) = \sum_{(x,y)\,:\,x+y=z} p_{X,Y}(x, y) = \sum_{\text{all } x} p_{X,Y}(x, z - x).
\]

9.6.1 Density of a sum

(Pitman [3]: pp. 372–373)

If $(X, Y)$ has joint density $f_{X,Y}(x, y)$, what is the density of $X + Y$? Recall that the density is the derivative of the cdf, so
\begin{align*}
f_{X+Y}(t) &= \frac{d}{dt} P(X + Y \le t)
= \frac{d}{dt} \iint_{\{(x,y)\,:\,x \le t - y\}} f_{X,Y}(x, y)\, dx\, dy \\
&= \frac{d}{dt} \int_{-\infty}^{\infty} \left( \int_{-\infty}^{t-y} f_{X,Y}(x, y)\, dx \right) dy \\
&= \int_{-\infty}^{\infty} \frac{d}{dt} \left( \int_{-\infty}^{t-y} f_{X,Y}(x, y)\, dx \right) dy \\
&= \int_{-\infty}^{\infty} f_{X,Y}(t - y, y)\, dy.
\end{align*}
So if $X$ and $Y$ are independent, we get the convolution
\[
f_{X+Y}(t) = \int_{-\infty}^{\infty} f_X(t - y)\, f_Y(y)\, dy.
\]
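As a computational illustration of the discrete case (the dice are an assumed example, not from the notes), the following sketch computes the pmf of $Z = X + Y$ directly from a joint pmf by grouping $p_{X,Y}(x, y)$ over the pairs with $x + y = z$:

```python
from fractions import Fraction as F

def pmf_of_sum(joint):
    """pmf of Z = X + Y from a joint pmf {(x, y): prob}, via
    P(Z = z) = sum of p_{X,Y}(x, y) over pairs with x + y = z."""
    pZ = {}
    for (x, y), p in joint.items():
        pZ[x + y] = pZ.get(x + y, F(0)) + p
    return pZ

# Assumed example: two independent fair dice, so p_{X,Y}(x, y) = p_X(x) p_Y(y).
die = {k: F(1, 6) for k in range(1, 7)}
joint = {(x, y): die[x] * die[y] for x in die for y in die}
pZ = pmf_of_sum(joint)
print(pZ[2], pZ[7], pZ[12])  # 1/36, 1/6, 1/36
```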
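The convolution formula can likewise be checked numerically. A minimal sketch, assuming (again for illustration, not from the notes) that $X$ and $Y$ are independent Uniform(0, 1), in which case $f_{X+Y}$ is the triangular density $\min(t, 2 - t)$ on $[0, 2]$:

```python
import numpy as np

# Assumed marginal density: Uniform(0, 1).
def f_uniform(u):
    return np.where((u >= 0) & (u <= 1), 1.0, 0.0)

def f_sum(t, h=1e-3):
    """Riemann-sum evaluation of the convolution
    f_{X+Y}(t) = integral of f_X(t - y) f_Y(y) dy."""
    y = np.arange(0.0, 1.0, h) + h / 2  # midpoints over the support of f_Y
    return (f_uniform(t - y) * f_uniform(y)).sum() * h

for t in (0.5, 1.0, 1.5):
    print(t, f_sum(t))  # ≈ 0.5, 1.0, 0.5: the triangular density min(t, 2 - t)
```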
9.7 Covariance

(Pitman [3]: Section 6.4, p. 430)

When $X$ and $Y$ are independent, we proved
\[
\operatorname{Var}(X + Y) = \operatorname{Var} X + \operatorname{Var} Y.
\]
More generally, however, since expectation is a linear operator,
\begin{align*}
\operatorname{Var}(X + Y) &= E\bigl((X + Y) - E(X + Y)\bigr)^2 \\
&= E\bigl((X - E X) + (Y - E Y)\bigr)^2 \\
&= E\bigl((X - E X)^2 + 2(X - E X)(Y - E Y) + (Y - E Y)^2\bigr) \\
&= \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\, E(X - E X)(Y - E Y).
\end{align*}

9.7.1 Definition The covariance of $X$ and $Y$ is defined to be
\[
\operatorname{Cov}(X, Y) = E(X - E X)(Y - E Y). \tag{1}
\]
In general,
\[
\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2 \operatorname{Cov}(X, Y).
\]

There is another way to write the covariance:
\[
\operatorname{Cov}(X, Y) = E(XY) - E(X)\, E(Y). \tag{2}
\]

Proof: Since expectation is a linear operator,
\begin{align*}
\operatorname{Cov}(X, Y) &= E\bigl(X - E(X)\bigr)\bigl(Y - E(Y)\bigr) \\
&= E\bigl(XY - X E(Y) - Y E(X) + E(X) E(Y)\bigr) \\
&= E(XY) - E(X) E(Y) - E(X) E(Y) + E(X) E(Y) \\
&= E(XY) - E(X) E(Y).
\end{align*}

9.7.2 Remark It follows that for any random variable $X$, $\operatorname{Cov}(X, X) = \operatorname{Var} X$.

If $X$ and $Y$ are independent, then $\operatorname{Cov}(X, Y) = 0$. The converse is not true.

9.7.3 Example (Covariance = 0, but variables are not independent) (cf. Feller [1, p. 236]) Let $X$ be a random variable that assumes the values $\pm 1$ and $\pm 2$, each with probability $1/4$ (so $E X = 0$). Define $Y = X^2$, and let $\bar{Y} = E Y = 2.5$. Then
\begin{align*}
\operatorname{Cov}(X, Y) &= E\bigl(X(Y - \bar{Y})\bigr) \\
&= \tfrac{1}{4}\, 1\,(1 - \bar{Y}) + \tfrac{1}{4}\,(-1)(1 - \bar{Y}) + \tfrac{1}{4}\, 2\,(4 - \bar{Y}) + \tfrac{1}{4}\,(-2)(4 - \bar{Y}) \\
&= 0.
\end{align*}
But $X$ and $Y$ are not independent: since $X = 1$ implies $Y = 1$, we have $P(X = 1 \,\&\, Y = 1) = P(X = 1) = 1/4$, while $P(X = 1) \cdot P(Y = 1) = \tfrac{1}{4} \cdot \tfrac{1}{2} = \tfrac{1}{8}$. □
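As a sanity check on Example 9.7.3 (the script is illustrative, not part of the notes), exact arithmetic confirms both that $\operatorname{Cov}(X, Y) = 0$, via formula (2), and that $X$ and $Y$ are dependent:

```python
from fractions import Fraction as F

# Example 9.7.3: X takes the values ±1, ±2, each with probability 1/4; Y = X².
pmf_X = {x: F(1, 4) for x in (-2, -1, 1, 2)}

EX = sum(x * p for x, p in pmf_X.items())          # 0
EY = sum(x**2 * p for x, p in pmf_X.items())       # 5/2, i.e. 2.5
EXY = sum(x * x**2 * p for x, p in pmf_X.items())  # E(XY) = E(X³) = 0
print("Cov(X, Y) =", EXY - EX * EY)                # 0, by formula (2)

# ...yet X and Y are not independent:
p_X1_Y1 = pmf_X[1]              # X = 1 forces Y = 1, so this is P(X=1 & Y=1)
p_Y1 = pmf_X[1] + pmf_X[-1]     # Y = 1 iff X = ±1
print(p_X1_Y1, "vs", pmf_X[1] * p_Y1)  # 1/4 vs 1/8
```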

9.7.4 Example (Covariance = 0, but variables are not independent) Let $U$, $V$ be independent and identically distributed random variables with $E U = E V = 0$.
