
Department of Mathematics

Ma 3/103: Introduction to Probability and Statistics, KC Border, Winter 2017

Lecture 9: Joint Distributions

Relevant textbook passages: Pitman [3]: Chapter 5, Sections 6.4–6.5; Larsen–Marx [2]: Sections 3.7, 3.8, 3.9, 3.11.

9.1 Random vectors and joint distributions

Recall that a random variable X is a real-valued function on the sample space (S, E, P), where P is a probability measure on S, and that it induces a probability measure PX on R, called the distribution of X, given by

PX(B) = P(X ∈ B) = P({s ∈ S : X(s) ∈ B}),

for every (Borel) subset B of R. The distribution is enough to calculate the expectation of any (Borel) function of X. Now suppose I have more than one random variable on the same sample space. Then I can consider the random vector (X, Y), or X = (X1, . . . , Xn).

• A random vector X defines a probability measure PX on Rⁿ, called the distribution of X, via

PX(B) = P(X ∈ B) = P({s ∈ S : (X1(s), . . . , Xn(s)) ∈ B}),

for every (Borel) subset B of Rⁿ. This distribution is also called the joint distribution of X1, . . . , Xn.

• We can use this to define a joint cumulative distribution function, denoted FX , by

FX (x1, . . . , xn) = P (Xi ⩽ xi, for all i = 1, . . . , n)

• N.B. While the subscript X,Y or X is often used to identify a joint distribution or cumulative distribution function, it is also frequently omitted. You are supposed to figure out the domain of the function by inspecting its arguments.

9.1.1 Example Let S = {SS, SF, FS, FF} and let P be the probability measure on S defined by

P(SS) = 7/12, P(SF) = 3/12, P(FS) = 1/12, P(FF) = 1/12.

Define the random variables X and Y by

X(SS) = 1, X(SF) = 1, X(FS) = 0, X(FF) = 0,
Y(SS) = 1, Y(SF) = 0, Y(FS) = 1, Y(FF) = 0.

That is, X and Y indicate Success or Failure on two different experiments, but the experiments are not necessarily independent.


Then

PX(1) = 10/12, PX(0) = 2/12,
PY(1) = 8/12, PY(0) = 4/12,
PX,Y(1, 1) = 7/12, PX,Y(1, 0) = 3/12, PX,Y(0, 1) = 1/12, PX,Y(0, 0) = 1/12. □
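The following short Python sketch (an addition, not part of the original notes) tabulates this joint pmf exactly and recovers the marginals above; it also confirms that X and Y are not independent, since PX,Y(1, 1) = 7/12 differs from PX(1)PY(1).

from fractions import Fraction

# Joint pmf of (X, Y) from Example 9.1.1
pXY = {(1, 1): Fraction(7, 12), (1, 0): Fraction(3, 12),
       (0, 1): Fraction(1, 12), (0, 0): Fraction(1, 12)}

# Marginals: sum the joint pmf over the other coordinate
pX = {x: sum(p for (xx, y), p in pXY.items() if xx == x) for x in (0, 1)}
pY = {y: sum(p for (x, yy), p in pXY.items() if yy == y) for y in (0, 1)}

print(pX[1], pX[0])                  # 5/6, 1/6  (that is, 10/12 and 2/12)
print(pY[1], pY[0])                  # 2/3, 1/3  (that is, 8/12 and 4/12)
print(pXY[(1, 1)] == pX[1] * pY[1])  # False: X and Y are not independent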

9.2 Joint PMFs

A random vector X on a probability space (S, E, P) is discrete if you can enumerate its range. (Pitman [3]: Section 3.1, p. 348; Larsen–Marx [2]: Section 3.7.) When X1, . . . , Xn are discrete, the joint probability mass function of the random vector X = (X1, . . . , Xn) is usually denoted pX, and is given by

pX(x1, x2, . . . , xn) = P(X1 = x1 and X2 = x2 and ··· and Xn = xn).

If X and Y are independent random variables, then pX,Y(x, y) = pX(x)pY(y). For a function g of X and Y we have

E g(X, Y) = ∑_x ∑_y g(x, y) pX,Y(x, y).
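As a concrete illustration (not from the original notes), here is a minimal Python sketch that applies this formula to two independent fair dice with g(x, y) = max(x, y); the joint pmf is the product of the marginals because the dice are assumed independent.

from fractions import Fraction

# Joint pmf of two independent fair dice: pX,Y(x, y) = pX(x) pY(y) = 1/36
pXY = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}

# E g(X, Y) = sum over x and y of g(x, y) pX,Y(x, y), here with g = max
Eg = sum(max(x, y) * p for (x, y), p in pXY.items())
print(Eg)   # 161/36, about 4.47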

9.3 Joint densities

Let X and Y be random variables on a probability space (S, E, P). The random vector (X, Y) has a joint density fX,Y(x, y) if for every (Borel) subset B ⊂ R² (Pitman [3]: Chapter 5.1; Larsen–Marx [2]: Section 3.7),

P((X, Y) ∈ B) = ∫∫_B fX,Y(x, y) dx dy.

If X and Y are independent, then fX,Y(x, y) = fX(x)fY(y). For example,

P(X ⩾ Y) = ∫_{−∞}^{∞} ∫_{y}^{∞} fX,Y(x, y) dx dy.

For a function g of X and Y we have

E g(X, Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) fX,Y(x, y) dx dy.

Again, the subscript X,Y or X on a density is frequently omitted.
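For a numerical illustration (an addition to the notes), the double integral for P(X ⩾ Y) can be evaluated with SciPy; the joint density below is that of two independent standard normals, an illustrative assumption, so by symmetry the answer should be 1/2.

import numpy as np
from scipy import integrate
from scipy.stats import norm

# Joint density of two independent standard normals (illustrative choice)
def f(x, y):
    return norm.pdf(x) * norm.pdf(y)

# P(X >= Y) = integral over y in (-inf, inf) of integral over x in (y, inf) of f(x, y)
# dblquad treats the integrand's first argument as the inner variable (limits gfun, hfun)
# and the second argument as the outer variable (limits a, b).
prob, err = integrate.dblquad(f,                 # integrand f(x, y); x is the inner variable
                              -np.inf, np.inf,   # outer variable y over (-inf, inf)
                              lambda y: y,       # inner variable x from y ...
                              lambda y: np.inf)  # ... to infinity
print(prob)   # ~ 0.5, as symmetry demands

A Monte Carlo estimate (drawing many (X, Y) pairs and averaging the indicator of X ⩾ Y) would do just as well and avoids the quadrature machinery.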

9.4 Recovering marginal distributions from joint distributions

So now we have random variables X and Y, and the random vector (X, Y). They have distributions PX, PY, and PX,Y. How are they related? The marginal distribution of X is just the distribution PX of X alone. We can recover its probability mass function from the joint probability mass function pX,Y as follows.


In the discrete case:

pX(x) = P(X = x) = ∑_y pX,Y(x, y).

Likewise

pY(y) = P(Y = y) = ∑_x pX,Y(x, y).

If X and Y are independent random variables, then pX,Y (x, y) = pX (x)pY (y).

For the density case, the marginal density of X, denoted fX, is given by (Larsen–Marx [2]: p. 169)

fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy,

and the marginal density fY of Y is given by

fY(y) = ∫_{−∞}^{∞} fX,Y(x, y) dx.

The recovery of a marginal density of X from a joint density of X and Y is sometimes described as “integrating out” y.
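As a numerical check (an addition, not in the original notes), one can “integrate out” y on a computer. Below, the joint density is a standard bivariate normal with correlation ρ = 0.6, an illustrative assumption, whose X-marginal is standard normal.

import numpy as np
from scipy import integrate
from scipy.stats import norm

rho = 0.6   # illustrative correlation

def f_xy(x, y):
    # standard bivariate normal density with correlation rho
    q = (x**2 - 2 * rho * x * y + y**2) / (1 - rho**2)
    return np.exp(-q / 2) / (2 * np.pi * np.sqrt(1 - rho**2))

# Marginal density of X at x0: integrate the joint density over y
x0 = 1.3
fX_x0, _ = integrate.quad(lambda y: f_xy(x0, y), -np.inf, np.inf)
print(fX_x0, norm.pdf(x0))   # both ~ 0.1714: the marginal is standard normal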

9.5 ⋆ Expectation of a random vector

Since random vectors are just vector-valued functions on a sample space S, we can add them and multiply them just like any other functions. For example, the sum of the random vectors X = (X1, . . . , Xn) and Y = (Y1, . . . , Yn) is given by

(X + Y)(s) = X(s) + Y(s) = (X1(s), . . . , Xn(s)) + (Y1(s), . . . , Yn(s)).

Thus the set of random vectors is a vector space. In fact, the subset of random vectors whose components have a finite expectation is also a vector subspace of the vector space of all random vectors. If X = (X1, . . . , Xn) is a random vector, and each Xi has expectation E Xi, the expectation of X is defined to be

EX = (E X1, . . . , E Xn).

• Expectation is a linear operator on the space of random vectors. This means that

E(aX + bY ) = a EX + b EY .

• Expectation is a positive operator on the space of random vectors. For vectors x = (x1, . . . , xn), define x ≧ 0 if xi ⩾ 0 for each i = 1, . . . , n. Then

X ≧ 0 =⇒ EX ≧ 0.

9.6 The distribution of a sum

We already know how to calculate the expectation of a sum of random variables: since expectation is a linear operator, the expectation of a sum is the sum of the expectations.


We are now in a position to describe the distribution of the sum of two random variables. Let Z = X + Y. (Pitman [3]: p. 147; Larsen–Marx [2]: p. 178ff.)

Discrete case:

P(Z = z) = ∑_{(x,y) : x+y=z} pX,Y(x, y) = ∑_{all x} pX,Y(x, z − x).
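A quick Python sketch (an addition to the notes) applies this formula to the sum of two independent fair dice; independence lets us write pX,Y(x, z − x) = pX(x) pY(z − x).

from fractions import Fraction

# pmfs of two independent fair dice
pX = {k: Fraction(1, 6) for k in range(1, 7)}
pY = dict(pX)

# P(Z = z) = sum over x of pX,Y(x, z - x) = sum over x of pX(x) pY(z - x)
pZ = {}
for z in range(2, 13):
    pZ[z] = sum(pX[x] * pY[z - x] for x in pX if (z - x) in pY)

print(pZ[7], pZ[2])   # 1/6 and 1/36, the familiar answers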

9.6.1 Density of a sum

If (X, Y) has joint density fX,Y(x, y), what is the density of X + Y? Recall that the density is the derivative of the cdf, so (Pitman [3]: pp. 372–373)

fX+Y(t) = d/dt P(X + Y ⩽ t)
        = d/dt ∫∫_{(x,y) : x ⩽ t−y} fX,Y(x, y) dx dy
        = d/dt ∫_{−∞}^{∞} ( ∫_{−∞}^{t−y} fX,Y(x, y) dx ) dy
        = ∫_{−∞}^{∞} d/dt ( ∫_{−∞}^{t−y} fX,Y(x, y) dx ) dy
        = ∫_{−∞}^{∞} fX,Y(t − y, y) dy.

So if X and Y are independent, we get the convolution

fX+Y(t) = ∫_{−∞}^{∞} fX(t − y) fY(y) dy.
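To see the convolution formula in action (an addition to the notes), here is a grid approximation for X, Y independent Uniform(0, 1), chosen because the answer, the triangular density on [0, 2], is known in closed form.

import numpy as np

dx = 0.001
grid = np.arange(0.0, 1.0, dx)
fX = np.ones_like(grid)        # Uniform(0,1) density sampled on the grid
fY = np.ones_like(grid)

# Discretized convolution: fZ(t) is approximately sum over y of fX(t - y) fY(y) dx
fZ = np.convolve(fX, fY) * dx
t = np.arange(fZ.size) * dx

# The triangular density is t on [0,1] and 2 - t on [1,2]
print(fZ[500], t[500])         # ~ 0.5 at t ~ 0.5
print(fZ.max())                # ~ 1.0, the peak at t = 1

A finer grid spacing dx trades runtime for accuracy; the shape of the triangle is already clear at this resolution.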

9.7 Covariance

When X and Y are independent, we proved that (Pitman [3]: § 6.4, p. 430)

Var(X + Y) = Var X + Var Y.

More generally, however, since expectation is a positive linear operator,

Var(X + Y) = E((X + Y) − E(X + Y))²
           = E((X − E X) + (Y − E Y))²
           = E((X − E X)² + 2(X − E X)(Y − E Y) + (Y − E Y)²)
           = Var(X) + Var(Y) + 2 E(X − E X)(Y − E Y).

9.7.1 Definition The covariance of X and Y is defined to be

Cov(X,Y ) = E(X − E X)(Y − E Y ). (1)

In general Var(X + Y ) = Var(X) + Var(Y ) + 2 Cov(X,Y ).
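A Monte Carlo sanity check of this identity (not in the original notes), using an arbitrarily constructed correlated pair:

import numpy as np

rng = np.random.default_rng(0)
U = rng.standard_normal(200_000)
V = rng.standard_normal(200_000)
X, Y = U, 0.5 * U + V          # a correlated pair with Cov(X, Y) = 0.5

lhs = np.var(X + Y)
rhs = np.var(X) + np.var(Y) + 2 * np.cov(X, Y)[0, 1]
print(lhs, rhs)                # both ~ 1 + 1.25 + 2(0.5) = 3.25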


There is another way to write the covariance:

Cov(X,Y ) = E(XY ) − E(X) E(Y ). (2)

Proof: Since expectation is a positive linear operator,

Cov(X, Y) = E((X − E(X))(Y − E(Y)))
          = E(XY − X E(Y) − Y E(X) + E(X) E(Y))
          = E(XY) − E(X) E(Y) − E(X) E(Y) + E(X) E(Y)
          = E(XY) − E(X) E(Y).

9.7.2 Remark It follows that for any random variable X,

Cov(X,X) = Var X.

If X and Y are independent, then Cov(X,Y ) = 0.

The converse is not true.

9.7.3 Example (Covariance = 0, but variables are not independent) (cf. Feller [1, p. 236]) Let X be a random variable that assumes the values ±1 and ±2, each with probability 1/4. (So E X = 0.) Define Y = X², and let Ȳ = E Y (= 2.5). Then

Cov(X, Y) = E(X(Y − Ȳ))
          = (1/4)·1·(1 − Ȳ) + (1/4)·(−1)·(1 − Ȳ) + (1/4)·2·(4 − Ȳ) + (1/4)·(−2)·(4 − Ȳ)
          = 0.

But X and Y are not independent:

P(X = 1 & Y = 1) = P(X = 1) = 1/4,

but P(X = 1) = 1/4 and P(Y = 1) = 1/2, so

P(X = 1) · P(Y = 1) = 1/8 ≠ 1/4 = P(X = 1 & Y = 1).
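The computation can be checked mechanically; the following sketch (an addition to the notes) works the example out in exact arithmetic.

from fractions import Fraction

# X takes the values -2, -1, 1, 2, each with probability 1/4; Y = X^2
pX = {x: Fraction(1, 4) for x in (-2, -1, 1, 2)}

EX  = sum(x * p for x, p in pX.items())       # 0
EY  = sum(x**2 * p for x, p in pX.items())    # 5/2
EXY = sum(x**3 * p for x, p in pX.items())    # E(XY) = E(X^3) = 0
print(EXY - EX * EY)                          # Cov(X, Y) = 0

# Yet X and Y are not independent:
PXY11 = pX[1]                                 # P(X = 1 and Y = 1) = P(X = 1) = 1/4
PX1_PY1 = pX[1] * (pX[1] + pX[-1])            # P(X = 1) P(Y = 1) = 1/4 * 1/2 = 1/8
print(PXY11, PX1_PY1)                         # 1/4 vs 1/8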

9.7.4 Example (Covariance = 0, but variables are not independent) Let U, V be independent and identically distributed random variables with E U = E V = 0. Define

X = U + V, Y = U − V.

Since E X = E Y = 0,

Cov(X, Y) = E(XY) = E((U + V)(U − V)) = E(U² − V²) = E U² − E V² = 0,

since U and V have the same distribution. But are X and Y independent?


If U and V are integer-valued, then X and Y are also integer-valued, but more importantly they have the same parity. That is, X is odd if and only if Y is odd. (This is a handy fact for KenKen solvers.) So let U and V be independent and assume the values ±1 and ±2, each with probability 1/4. (Then E U = E V = 0.) Then

P(X is odd) = P(X is even) = P(Y is odd) = P(Y is even) = 1/2,

but

P(X is even and Y is odd) = 0 ≠ 1/4 = P(X is even) P(Y is odd),

so X and Y are not independent. □

9.7.5 Remark The product (X − E X)(Y − E Y) is positive at outcomes s where X(s) and Y(s) are either both above or both below their means, and negative when one is above and the other is below. So one very loose interpretation of positive covariance is that the random variables tend to be both above average or both below average, rather than on opposite sides of their means. Of course this is just a tendency. (Pitman [3]: p. 432)

9.7.6 Example (The effects of covariance) For mean zero random variables that have a positive covariance, the joint density tends to concentrate on the diagonal. Figure 9.1 shows the joint density of two standard normals with various covariances. Figure 9.3 shows random samples from these distributions. □

9.8 A covariance menagerie

Recall that for independent random variables X and Y, Var(X + Y) = Var X + Var Y, and Cov(X, Y) = 0. For any random variable X with finite variance, Var X = E(X²) − (E X)², so E(X²) = Var X + (E X)². Also, if E X = 0, then Cov(X, Y) = E(XY). (Why?)

9.8.1 Theorem (A Covariance Menagerie) Let X1, . . . , Xn be independent and identically distributed random variables with common mean µ and variance σ². Define

S = ∑_{i=1}^{n} Xi and X̄ = S/n, and let

Di = Xi − X̄ (i = 1, . . . , n)

be the deviation of Xi from X̄. Then

1. E(XiXj) = (E Xi)(E Xj) = µ², for i ≠ j (by independence).

2. E(Xi²) = σ² + µ².

3. E(XiS) = ∑_{j≠i} E(XiXj) + E(Xi²) = (n − 1)µ² + σ² + µ² = σ² + nµ².

4. E(XiX̄) = (σ²/n) + µ².

5. E(S) = nµ.

6. Var(S) = nσ².


Figure 9.1. Joint density of standard normals, as covariance changes.


Figure 9.2. Contours of the joint density of standard normals, as covariance changes.



Figure 9.3. Random samples from standard normals, as covariance changes.

7. E(S²) = nσ² + n²µ².

8. E(X̄) = µ.

9. Var(X̄) = σ²/n.

10. E(X̄²) = (σ²/n) + µ².

11. E(Di) = 0, i = 1, . . . , n.

12. Var(Di) = E(Di²) = (n − 1)σ²/n.

Var(Di) = E(Xi − X̄)² = E(Xi²) − 2 E(XiX̄) + E(X̄²)
        = (σ² + µ²) − 2((σ²/n) + µ²) + ((σ²/n) + µ²) = (1 − 1/n)σ².

13. Cov(Di, Dj) = E(DiDj) = −σ²/n, for i ≠ j.

E(DiDj) = E((Xi − X̄)(Xj − X̄)) = E(XiXj) − E(XiX̄) − E(XjX̄) + E(X̄²)
        = µ² − [(σ²/n) + µ²] − [(σ²/n) + µ²] + [(σ²/n) + µ²] = −σ²/n.

Note that this means that deviations from the mean are negatively correlated. This makes sense, because if one variate is bigger than the mean, another must be smaller to offset the difference.

14. Cov(Di, S) = E(DiS) = 0.

E(DiS) = E((Xi − (S/n))S) = E(XiS) − E(S²/n) = (σ² + nµ²) − (nσ² + n²µ²)/n = 0.

15. Cov(Di, X̄) = E(DiX̄) = E(DiS)/n = 0.

The proof of each item is a straightforward plug-and-chug calculation. The only reason for writing this as a theorem is to be able to refer to it easily.
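A Monte Carlo spot check of a few of these items (an addition to the notes), with illustrative parameters n = 5, µ = 3, σ = 2:

import numpy as np

rng = np.random.default_rng(1)
n, reps = 5, 200_000
mu, sigma = 3.0, 2.0                                   # illustrative parameters

X = rng.normal(mu, sigma, size=(reps, n))              # each row: X_1, ..., X_n i.i.d.
Xbar = X.mean(axis=1)
D = X - Xbar[:, None]                                  # deviations D_i = X_i - Xbar

print(Xbar.var(), sigma**2 / n)                        # item 9:  Var(Xbar) = sigma^2/n      (~ 0.8)
print(D[:, 0].var(), (n - 1) * sigma**2 / n)           # item 12: Var(D_i) = (n-1)sigma^2/n  (~ 3.2)
print(np.cov(D[:, 0], D[:, 1])[0, 1], -sigma**2 / n)   # item 13: Cov(D_i, D_j) = -sigma^2/n (~ -0.8)
print(np.cov(D[:, 0], Xbar)[0, 1])                     # item 15: Cov(D_i, Xbar) = 0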

9.9 ⋆ Covariance matrix of a random vector

In general, we define the covariance matrix of a random vector X = (X1, . . . , Xn) to be the n × n matrix whose (i, j) entry is Cov(Xi, Xj):

Var X = [E(Xi − E Xi)(Xj − E Xj)]_{i,j} = [Cov(Xi, Xj)]_{i,j}.

9.10 ⋆ Variance of a linear combination of random variables

9.10.1 Proposition Let X = (X1, . . . , Xn) be a random vector with covariance matrix

Σ = [Cov(Xi, Xj)]_{i,j},

and let a = (a1, . . . , an). The random variable

Z = a · X = ∑_{i=1}^{n} aiXi

has variance given by

Var Z = aΣa′ = ∑_{i=1}^{n} ∑_{j=1}^{n} Cov(Xi, Xj) ai aj,

where a is treated as a row vector, and a′ is its transpose, a column vector.

Proof: This just uses the fact that expectation is a positive linear operator. Since adding constants doesn't change variance, we may subtract means and assume that each E Xi = 0. Then Cov(Xi, Xj) = E(XiXj), and Z has mean 0, so

Var Z = E Z² = E( (∑_{i=1}^{n} aiXi)(∑_{j=1}^{n} ajXj) )
      = E ∑_{i=1}^{n} ∑_{j=1}^{n} XiXj ai aj
      = ∑_{i=1}^{n} ∑_{j=1}^{n} E(XiXj) ai aj
      = ∑_{i=1}^{n} ∑_{j=1}^{n} Cov(Xi, Xj) ai aj.

Since Var Z ⩾ 0 and a is arbitrary, we see that Var X is a positive semidefinite matrix.
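The formula is easy to verify numerically; the covariance matrix and weights below are arbitrary illustrative values (an addition, not from the notes).

import numpy as np

Sigma = np.array([[ 2.0, 0.3, -0.5],
                  [ 0.3, 1.0,  0.2],
                  [-0.5, 0.2,  1.5]])      # an illustrative covariance matrix
a = np.array([1.0, -2.0, 0.5])             # weights

var_Z = a @ Sigma @ a                      # Var(a . X) = a Sigma a'
print(var_Z)                               # 4.275

# Monte Carlo check with a mean-zero normal vector X having covariance Sigma
rng = np.random.default_rng(2)
X = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)
print(np.var(X @ a))                       # ~ var_Z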

9.11 ⋆ An inner product for random variables

Since random variables are just functions on the probability space (S, E,P ), the set of random variables is a vector space under the usual operations of addition of functions and multiplication by scalars. The collection of random variables that have finite variance is a linear subspace, often denoted L2(P ), and it has a natural inner product.

9.11.1 Fact (Inner product on the space L2(P) of random variables) Let L2(P) denote the linear space of random variables that have finite variance. Then

(X,Y ) = E XY

is a real inner product on L2(P). The proof of this is straightforward, and is essentially the same as the proof that the Euclidean inner product on Rᵐ is an inner product.¹ The next result is just the Cauchy–Schwartz Inequality for this inner product, but I've written out a self-contained proof for you.

9.11.2 Cauchy–Schwartz Inequality

E(XY)² ⩽ (E X²)(E Y²), (3)

with equality only if X and Y are linearly dependent, that is, there exist a, b, not both zero, such that aX + bY = 0 a.s.

¹ As usual, there is a caveat for infinite sample spaces, namely that equality is replaced by equality almost surely.


Proof: If X or Y is zero (almost surely), then we have equality, so assume X, Y are nonzero. Define the quadratic polynomial Q: R → R by

Q(λ) = E((λX + Y)²) ⩾ 0.

Since expectation is a positive linear operator,

Q(λ) = λ² E(X²) + 2λ E(XY) + E(Y²).

Therefore the discriminant of the quadratic Q is nonpositive,² that is, 4 E(XY)² − 4 E(X²) E(Y²) ⩽ 0, or E(XY)² ⩽ E(X²) E(Y²). Equality in (3) can occur only if the discriminant is zero, in which case Q has a real root. That is, there is some λ for which Q(λ) = E((λX + Y)²) = 0. But this implies that λX + Y = 0 (almost surely).

9.11.3 Corollary |Cov(X, Y)|² ⩽ Var X Var Y. (4)

Proof: Apply the Cauchy–Schwartz inequality to the random variables X − E X and Y − E Y, and then take square roots.

9.12 Covariance is bilinear

Covariance is bilinear in each argument: for random variables X, Y, Z and constants a, b,

Cov(aX + bY, Z) = a Cov(X, Z) + b Cov(Y, Z),

and similarly in the second argument. This follows from formula (2) and the linearity of expectation.

9.13 Correlation

9.13.1 Definition The correlation between X and Y is defined to be

Corr(X, Y) = Cov(X, Y) / ((SD X)(SD Y)).

It is also equal to

Corr(X, Y) = Cov(X∗, Y∗) = E(X∗Y∗),

where X∗ and Y∗ are the standardizations of X and Y.

Let X have mean µX and standard deviation σX, and ditto for Y. Recall that

X∗ = (X − µX)/σX

has mean 0 and standard deviation 1. Thus by the alternate formula for covariance,

Cov(X∗, Y∗) = E(X∗Y∗) − E(X∗) E(Y∗) = E(X∗Y∗),

since E(X∗) = E(Y∗) = 0. Now

E(X∗Y∗) = E( ((X − µX)/σX) ((Y − µY)/σY) )
        = (E(XY) − E(X) E(Y)) / (σX σY)
        = Corr(X, Y).

Corollary 9.11.3 (the Cauchy–Schwartz Inequality) implies (Pitman [3]: p. 433):

−1 ⩽ Corr(X,Y ) ⩽ 1

If the correlation between X and Y is zero, then X − E X and Y − E Y are orthogonal with respect to our inner product.
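A short numerical illustration (an addition to the notes): standardizing and averaging the product reproduces the correlation. The pair X, Y below is an arbitrary construction with Corr(X, Y) = 0.8.

import numpy as np

rng = np.random.default_rng(3)
U = rng.standard_normal(200_000)
X = U + 0.5 * rng.standard_normal(200_000)   # Var X = 1.25
Y = U + 0.5 * rng.standard_normal(200_000)   # Var Y = 1.25, Cov(X, Y) = 1

Xs = (X - X.mean()) / X.std()                # standardizations X*, Y*
Ys = (Y - Y.mean()) / Y.std()

print(np.mean(Xs * Ys))                      # Corr(X, Y) = E(X* Y*) ~ 1/1.25 = 0.8
print(np.corrcoef(X, Y)[0, 1])               # agrees with NumPy's built-in correlation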

Bibliography

[1] W. Feller. 1968. An introduction to probability theory and its applications, 3rd ed., volume 1. New York: Wiley.

[2] R. J. Larsen and M. L. Marx. 2012. An introduction to mathematical statistics and its applications, 5th ed. Boston: Prentice Hall.

[3] J. Pitman. 1993. Probability. Springer Texts in Statistics. New York, Berlin, and Heidelberg: Springer.

² In case you have forgotten how you derived the quadratic formula in Algebra I, rewrite the polynomial as f(z) = αz² + βz + γ = (1/α)(αz + β/2)² − (β² − 4αγ)/(4α), and note that the only way to guarantee that f(z) ⩾ 0 for all z is to have α > 0 and β² − 4αγ ⩽ 0.
