
Lecture 21: Conditional Distributions and Correlation
Statistics 104
Colin Rundel
April 9, 2012

6.3, 6.4 Conditional Distributions

Conditional Distributions

Let X and Y be random variables, then

P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)

f(x|y) = f(x, y) / fY(y)

Product rule:

P(X = x, Y = y) = P(X = x | Y = y) P(Y = y)

f(x, y) = f(x|y) fY(y)

Bayes' Theorem and the Law of Total Probability

Let X and Y be random variables, then

Bayes' Theorem:

f(x|y) = f(x, y) / fY(y) = f(y|x) fX(x) / fY(y)

f(y|x) = f(x, y) / fX(x) = f(x|y) fY(y) / fX(x)

Law of Total Probability:

fX(x) = ∫ f(x, y) dy = ∫ f(x|y) fY(y) dy

fY(y) = ∫ f(x, y) dx = ∫ f(y|x) fX(x) dx

(all integrals are over the whole real line, −∞ to ∞)

Example - Conditional Uniforms

Let X ∼ Unif(0, 1) and Y|X ∼ Unif(x, 1). Find f(x, y), fY(y), and f(x|y).
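A sketch of the solution (the slide leaves space for this to be worked in lecture): since Y|X ∼ Unif(x, 1), the conditional density is f(y|x) = 1/(1 − x) for x < y < 1, so

f(x, y) = fX(x) f(y|x) = 1/(1 − x)   for 0 < x < y < 1

fY(y) = ∫_0^y 1/(1 − x) dx = −ln(1 − y)   for 0 < y < 1

f(x|y) = f(x, y) / fY(y) = −1 / [(1 − x) ln(1 − y)]   for 0 < x < y < 1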

Independence and Conditional Distributions

If X and Y are independent random variables then

f(x|y) = fX(x)

Justification for this is straightforward:

f(x|y) = f(x, y) / fY(y) = fX(x) fY(y) / fY(y) = fX(x)

6.3, 6.4 Conditional Expectation

Conditional Expectation

If X and Y are random variables then we define the conditional expectation as follows

E(X|Y = y) = Σ_x x f(x|y)   (discrete case, sum over all values of x)

E(X|Y = y) = ∫ x f(x|y) dx   (continuous case, integral over −∞ < x < ∞)

Example - Family Cars (Example 4.7.2 deGroot)

Let X be the number of members in a randomly selected household and Y be the number of cars owned by that household. The 250 surveyed households are all equally likely to be selected, so the joint p.f. f(x, y) = P(X = x, Y = y) is the count of households with x members and y cars divided by 250.

Table 4.1 Reported numbers of household members (x) and automobiles (y)

y \ x    1    2    3    4    5    6    7    8
0       10    7    3    2    2    1    0    0
1       12   21   25   30   25   15    5    1
2        1    5   10   15   20   11    5    3
3        0    2    3    5    5    3    2    1

Table 4.2 Joint p.f. f(x, y) of X and Y together with marginal p.f.'s f1(x) and f2(y)

y \ x       1      2      3      4      5      6      7      8    f2(y)
0       0.040  0.028  0.012  0.008  0.008  0.004  0      0      0.100
1       0.048  0.084  0.100  0.120  0.100  0.060  0.020  0.004  0.536
2       0.004  0.020  0.040  0.060  0.080  0.044  0.020  0.012  0.280
3       0      0.008  0.012  0.020  0.020  0.012  0.008  0.004  0.084
f1(x)   0.092  0.140  0.164  0.208  0.208  0.120  0.048  0.020

We can then calculate the expected number of cars in a household with 4 members as follows

E(Y|X = 4) = Σ_{y=0}^{3} y · P(Y = y|X = 4) = Σ_{y=0}^{3} y · P(X = 4, Y = y) / P(X = 4)
           = 0 × 0.0385 + 1 × 0.5769 + 2 × 0.2885 + 3 × 0.0962 = 1.442

Similarly, we can calculate E(Y|X = x) for the other seven values of x:

x           1      2      3      4      5      6      7      8
E(Y|X = x)  0.609  1.057  1.317  1.442  1.538  1.533  1.75   2

Conditional Expectation as a Random Variable

Based on the previous example we can see that the value of E(Y|X) changes depending on the value of x.

As such we can think of the conditional expectation as being a function of the random variable X, thereby making E(Y|X) itself a random variable, which can be manipulated like any other random variable.

Law of Total Probability for Expectations:

E[E(Y|X)] = ∫ E(Y|X = x) fX(x) dx = ∫ ∫ y f(y|x) fX(x) dy dx
          = ∫ ∫ y [f(x, y) / fX(x)] fX(x) dy dx = ∫ ∫ y f(x, y) dy dx
          = ∫ y [∫ f(x, y) dx] dy = ∫ y fY(y) dy = E(Y)

Example - Family Cars, cont. (Example 4.7.2 deGroot)

We can confirm the Law of Total Probability for Expectations using the data from the previous example.

Using the marginal p.f.'s f1(x) = fX(x) and f2(y) = fY(y) from Table 4.2, together with the conditional means E(Y|X = x) computed above:

x           1      2      3      4      5      6      7      8
E(Y|X = x)  0.609  1.057  1.317  1.442  1.538  1.533  1.75   2

E(Y) = Σ_{y=0}^{3} y fY(y) = 0 × 0.100 + 1 × 0.536 + 2 × 0.280 + 3 × 0.084 = 1.348

E(Y) = E[E(Y|X)] = Σ_{x=1}^{8} E(Y|X = x) fX(x) = 0.609 × 0.092 + ··· + 2 × 0.020 = 1.348
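The same bookkeeping can be checked numerically. This is a minimal sketch (not part of the original slides) that recomputes both quantities from the joint p.f. in Table 4.2 using numpy; the variable names are just for illustration.

import numpy as np

# Joint p.f. from Table 4.2: rows are y = 0..3 (cars), columns are x = 1..8 (members).
f = np.array([
    [0.040, 0.028, 0.012, 0.008, 0.008, 0.004, 0.000, 0.000],
    [0.048, 0.084, 0.100, 0.120, 0.100, 0.060, 0.020, 0.004],
    [0.004, 0.020, 0.040, 0.060, 0.080, 0.044, 0.020, 0.012],
    [0.000, 0.008, 0.012, 0.020, 0.020, 0.012, 0.008, 0.004],
])
y = np.arange(4)                                  # possible numbers of cars
f_X = f.sum(axis=0)                               # marginal p.f. f1(x), column sums
f_Y = f.sum(axis=1)                               # marginal p.f. f2(y), row sums

E_Y_direct = (y * f_Y).sum()                      # E(Y) from the marginal of Y
E_Y_given_X = (y[:, None] * f / f_X).sum(axis=0)  # E(Y | X = x) for x = 1, ..., 8
E_Y_iterated = (E_Y_given_X * f_X).sum()          # E[E(Y | X)]

print(E_Y_given_X[3])                             # E(Y | X = 4), approx. 1.442
print(E_Y_direct, E_Y_iterated)                   # both approx. 1.348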

Example - Conditional Uniforms, cont.

Let X ∼ Unif(0, 1) and Y|X ∼ Unif(x, 1). Find E(Y).
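A sketch of the solution (left for board work on the original slide): since Y|X ∼ Unif(x, 1), we have E(Y|X) = (X + 1)/2, so by the Law of Total Probability for Expectations

E(Y) = E[E(Y|X)] = E[(X + 1)/2] = (E(X) + 1)/2 = (1/2 + 1)/2 = 3/4

The same value comes from the marginal density found earlier: E(Y) = ∫_0^1 y · (−ln(1 − y)) dy = 3/4.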

Example - Deriving Useful Identities

Using conditional expectations in this way is useful as it allows us to treat the random variable we are conditioning on as constant, which can make some problems much simpler.

Let E(Y|X) = aX + b for some constants a and b. What is E(XY)?
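A sketch of the answer (left blank on the slide for in-class work): condition on X and use the Law of Total Probability for Expectations. Given X, the factor X can be pulled out of the conditional expectation:

E(XY) = E[E(XY|X)] = E[X E(Y|X)] = E[X(aX + b)] = a E(X²) + b E(X)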

Conditional Variance

Similar to conditional expectation, we can also define a conditional variance:

Var(Y|X) = E[(Y − E(Y|X))² | X] = E(Y²|X) − E(Y|X)²

There is also an analogue of the Law of Total Probability for Expectations, the Law of Total Probability for Variance:

Var(Y) = E[Var(Y|X)] + Var[E(Y|X)]
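The identity is stated without proof on the slide; a short derivation using the definitions above:

Var(Y) = E(Y²) − E(Y)²
       = E[E(Y²|X)] − E[E(Y|X)]²
       = E[Var(Y|X) + E(Y|X)²] − E[E(Y|X)]²
       = E[Var(Y|X)] + { E[E(Y|X)²] − E[E(Y|X)]² }
       = E[Var(Y|X)] + Var[E(Y|X)]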

Example - Joint Distribution (Exercise 4.7.7 deGroot)

Suppose that X and Y have a continuous joint distribution for which the joint pdf is as follows:

f(x, y) = x + y   for 0 < x < 1 and 0 < y < 1
f(x, y) = 0       otherwise

Find E(Y|X) and Var(Y|X).
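A sketch of the solution (the slide leaves room for it to be worked in lecture):

fX(x) = ∫_0^1 (x + y) dy = x + 1/2   for 0 < x < 1

f(y|x) = f(x, y) / fX(x) = (x + y) / (x + 1/2)   for 0 < y < 1

E(Y|X = x) = ∫_0^1 y (x + y) / (x + 1/2) dy = (x/2 + 1/3) / (x + 1/2) = (3x + 2) / (6x + 3)

E(Y²|X = x) = ∫_0^1 y² (x + y) / (x + 1/2) dy = (x/3 + 1/4) / (x + 1/2) = (4x + 3) / (12x + 6)

Var(Y|X = x) = E(Y²|X = x) − E(Y|X = x)² = (6x² + 6x + 1) / [18 (2x + 1)²]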

6.3, 6.4 Covariance and Correlation

Covariance

We have previously discussed covariance in relation to the variance of the sum of two random variables (Review Lecture 8):

Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)

Specifically, covariance is defined as

Cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y)

This is a generalization of variance to two random variables and generally measures the degree to which X and Y tend to be large (or small) at the same time, or the degree to which one tends to be large while the other is small.
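The second equality is used throughout but not expanded on the slide; it follows from linearity of expectation:

E[(X − E(X))(Y − E(Y))] = E[XY − X E(Y) − Y E(X) + E(X)E(Y)]
                        = E(XY) − E(X)E(Y) − E(Y)E(X) + E(X)E(Y)
                        = E(XY) − E(X)E(Y)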


Covariance, cont.

Cov(X, Y) = E[(X − µX)(Y − µY)] = E(XY) − µX µY

The magnitude of the covariance is not usually informative since it is affected by the magnitudes of both X and Y. However, the sign of the covariance tells us something useful about the relationship between X and Y.

Consider the following conditions:

If X > µX and Y > µY then (X − µX)(Y − µY) will be positive.
If X < µX and Y < µY then (X − µX)(Y − µY) will be positive.
If X > µX and Y < µY then (X − µX)(Y − µY) will be negative.
If X < µX and Y > µY then (X − µX)(Y − µY) will be negative.

Properties of Covariance

Cov(X, Y) = Cov(Y, X)
Cov(X, Y) = 0 if X and Y are independent
Cov(X, c) = 0
Cov(X, X) = Var(X)
Cov(aX, bY) = ab Cov(X, Y)
Cov(X + a, Y + b) = Cov(X, Y)
Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z)
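The properties are listed without proof; the scaling and shift rules, for example, follow directly from the definition (a quick check, not from the slides):

Cov(aX, bY) = E[(aX − a µX)(bY − b µY)] = ab E[(X − µX)(Y − µY)] = ab Cov(X, Y)

Cov(X + a, Y + b) = E[((X + a) − (µX + a))((Y + b) − (µY + b))] = E[(X − µX)(Y − µY)] = Cov(X, Y)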

Correlation

Since Cov(X, Y) depends on the magnitudes of X and Y, we would prefer to have a measure of association that is not affected by arbitrary changes in the scales of the random variables.

The most common measure of linear association is correlation, which is defined as

ρ(X, Y) = Cov(X, Y) / (σX σY)

−1 ≤ ρ(X, Y) ≤ 1

The magnitude of the correlation measures the strength of the linear association and the sign determines whether the relationship is positive or negative.
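The scale invariance is not demonstrated on the slide, but it follows from the covariance properties above: for constants a, c > 0,

ρ(aX + b, cY + d) = Cov(aX + b, cY + d) / (a σX · c σY) = ac Cov(X, Y) / (ac σX σY) = ρ(X, Y)

so rescaling or shifting either variable leaves the correlation unchanged (a < 0 or c < 0 only flips the sign).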


Correlation and Independence

Given random variables X and Y:

X and Y are independent ⇒ Cov(X, Y) = ρ(X, Y) = 0

Cov(X, Y) = ρ(X, Y) = 0 does not imply that X and Y are independent

Example - Let X = −1, 0, 1 with equal probability and Y = X². Clearly X and Y are not independent random variables.

E(X) = −1 × 1/3 + 0 × 1/3 + 1 × 1/3 = 0

E(Y) = E(X²) = 1 × 1/3 + 0 × 1/3 + 1 × 1/3 = 2/3

E(XY) = E(X³) = −1 × 1/3 + 0 × 1/3 + 1 × 1/3 = 0

Cov(X, Y) = E(XY) − E(X)E(Y) = 0

Example - Linear Dependence

Let X ∼ Unif(0, 1) and Y = aX + b for constants a and b. Find Cov(X, Y) and ρ(X, Y).
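A sketch of the solution (assuming a ≠ 0 so that ρ is defined; the slide leaves space for board work):

Cov(X, Y) = Cov(X, aX + b) = a Cov(X, X) = a Var(X) = a/12

σY = |a| σX, so

ρ(X, Y) = a Var(X) / (σX · |a| σX) = a / |a| = +1 if a > 0 and −1 if a < 0

A perfect linear relationship therefore attains the extreme values ρ = ±1.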

Example - Joint Distribution

Suppose that X and Y have a continuous joint distribution for which the joint pdf is as follows:

f(x, y) = x + y   for 0 < x < 1 and 0 < y < 1
f(x, y) = 0       otherwise

Find Cov(X, Y) and ρ(X, Y).
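A sketch of the computation (the continuation slide is left blank for this to be worked in lecture):

fX(x) = ∫_0^1 (x + y) dy = x + 1/2, and by symmetry fY(y) = y + 1/2

E(X) = E(Y) = ∫_0^1 x (x + 1/2) dx = 1/3 + 1/4 = 7/12

E(X²) = E(Y²) = ∫_0^1 x² (x + 1/2) dx = 1/4 + 1/6 = 5/12, so Var(X) = Var(Y) = 5/12 − (7/12)² = 11/144

E(XY) = ∫_0^1 ∫_0^1 xy (x + y) dx dy = 1/6 + 1/6 = 1/3

Cov(X, Y) = E(XY) − E(X)E(Y) = 1/3 − 49/144 = −1/144

ρ(X, Y) = Cov(X, Y) / (σX σY) = (−1/144) / (11/144) = −1/11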


Multinomial Distribution

Let X1, X2, ..., Xk be the k random variables that count the number of outcomes belonging to categories 1, ..., k in n trials, with the probability of an outcome falling in category i being pi:

X1, ..., Xk ∼ Multinom(n, p1, ..., pk)

P(X1 = x1, ..., Xk = xk) = f(x1, ..., xk | n, p1, ..., pk) = [n! / (x1! ··· xk!)] p1^x1 ··· pk^xk

where Σ_{i=1}^{k} xi = n and Σ_{i=1}^{k} pi = 1

Example - Covariance of Multinomial Distribution

Marginal distribution of Xi - consider category i a success and all other categories a failure. Then in the n trials there are Xi successes and n − Xi failures, with probability of success pi and probability of failure 1 − pi, which means Xi has a Binomial distribution:

Xi ∼ Binom(n, pi)

Similarly, the distribution of Xi + Xj will also follow a Binomial distribution, with probability of success pi + pj:

Xi + Xj ∼ Binom(n, pi + pj)

E(Xi) = n pi

Var(Xi) = n pi (1 − pi)

Cov(Xi, Xj) = −n pi pj
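The covariance is stated without derivation; a standard argument (not shown on the slide) uses the Binomial variance of Xi + Xj together with the variance-of-a-sum identity:

Var(Xi + Xj) = Var(Xi) + Var(Xj) + 2 Cov(Xi, Xj)

n (pi + pj)(1 − pi − pj) = n pi (1 − pi) + n pj (1 − pj) + 2 Cov(Xi, Xj)

Solving for the covariance, the n pi and n pj terms cancel and

2 Cov(Xi, Xj) = −n [(pi + pj)² − pi² − pj²] = −2 n pi pj,   i.e.,   Cov(Xi, Xj) = −n pi pj

The negative sign makes sense: with n fixed, a larger count in category i forces the counts in the other categories to be smaller.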

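The result can also be checked by simulation. A minimal numpy sketch (not from the slides; the particular n and p values are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
n, p = 10, [0.2, 0.3, 0.5]
draws = rng.multinomial(n, p, size=200_000)     # each row is one draw (X1, X2, X3)

print(np.cov(draws[:, 0], draws[:, 1])[0, 1])   # sample covariance of X1 and X2
print(-n * p[0] * p[1])                          # theoretical value: -n p1 p2 = -0.6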