Gov 2000: 3. Multiple Random Variables

Matthew Blackwell Fall 2016

1. Distributions of Multiple Random Variables

2. Properties of Joint Distributions

3. Conditional Distributions

4. Wrap-up

Where are we? Where are we going?

• Distributions of one variable: how to describe and summarize uncertainty about one variable.
• Today: distributions of multiple variables to describe relationships between variables.
• Later: use data to learn about probability distributions.

Why multiple random variables?

[Scatter plot: log settler mortality (x-axis) vs. log GDP per capita (y-axis) across countries]

1. How do we summarize the relationship between two variables, $X$ and $Y$?
2. What if we have many observations of the same variable, $X_1, X_2, \dots, X_n$?

1/ Distributions of Multiple Random Variables

Joint distributions

[Three scatter plots: joint distributions of $(x, y)$ with different shapes and degrees of dependence]

• The joint distribution of two r.v.s, $X$ and $Y$, describes which pairs of observations, $(x, y)$, are more likely than others.
  ▶ Settler mortality ($X$) and GDP per capita ($Y$) for the same country.
• The shape of the joint distribution now includes the relationship between $X$ and $Y$.

Discrete r.v.s

Joint probability mass function
The joint p.m.f. of a pair of discrete r.v.s, $(X, Y)$, describes the probability of any pair of values:

$$f_{X,Y}(x, y) = \mathbb{P}(X = x, Y = y)$$

• Properties of a joint p.m.f.:
  ▶ $f_{X,Y}(x, y) \geq 0$ (probabilities can't be negative)
  ▶ $\sum_x \sum_y f_{X,Y}(x, y) = 1$ (something must happen)
  ▶ $\sum_x$ is shorthand for the sum over all possible values of $X$

Example: Gay marriage and gender

                 Favor Gay Marriage   Oppose Gay Marriage
                      (Y = 1)              (Y = 0)
Female (X = 1)         0.30                 0.21
Male (X = 0)           0.22                 0.27

• Joint p.m.f. can be summarized in a cross-tab:
  ▶ Each cell is the probability of that combination, $f_{X,Y}(x, y)$
• Probability that we randomly select a woman who favors gay marriage?

$$f_{X,Y}(1, 1) = \mathbb{P}(X = 1, Y = 1) = 0.3$$
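A quick sketch of this cross-tab in Python with numpy (not part of the original slides): the joint p.m.f. is just a small array indexed by $(x, y)$.

```python
import numpy as np

# The cross-tab as a joint p.m.f.: rows index x, columns index y.
joint = np.array([
    [0.27, 0.22],  # x = 0 (male):   [f(0, y=0), f(0, y=1)]
    [0.21, 0.30],  # x = 1 (female): [f(1, y=0), f(1, y=1)]
])
print(joint[1, 1])   # 0.3, the woman-who-favors cell
print(joint.sum())   # 1.0: the joint p.m.f. sums to one
```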

Marginal distributions

• Often we need to figure out the distribution of just one of the r.v.s
  ▶ Called the marginal distribution in this context.
• Computing marginals from the joint p.m.f.:

$$f_Y(y) = \mathbb{P}(Y = y) = \sum_x f_{X,Y}(x, y)$$

• Intuition: sum the probability that $Y = y$ over all possible values of $x$.
  ▶ Works because these are mutually exclusive events that partition the space of $X$.

Example: marginals for gay marriage

                 Favor Gay Marriage   Oppose Gay Marriage
                      (Y = 1)              (Y = 0)        Marginal
Female (X = 1)         0.30                 0.21            0.51
Male (X = 0)           0.22                 0.27            0.49
Marginal               0.52                 0.48

• What is $f_Y(1) = \mathbb{P}(Y = 1)$?
  ▶ The probability that a man favors gay marriage plus the probability that a woman favors gay marriage.

$$f_Y(1) = f_{X,Y}(1, 1) + f_{X,Y}(0, 1) = 0.3 + 0.22 = 0.52$$

• Works for all marginals.
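As a sketch (again Python/numpy, using the same cross-tab array as above), the marginals are just row and column sums:

```python
import numpy as np

joint = np.array([
    [0.27, 0.22],  # x = 0 (male)
    [0.21, 0.30],  # x = 1 (female)
])
# Marginal of Y: sum over x (down the rows); marginal of X: sum over y.
f_Y = joint.sum(axis=0)  # [0.48, 0.52]
f_X = joint.sum(axis=1)  # [0.49, 0.51]
print(f_Y[1])            # P(Y = 1) = 0.52
```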

Continuous r.v.s

[Diagram: a region $A$ in the $xy$-plane]

• We will focus on getting the probability of being in some subset of the 2-dimensional plane.

Continuous joint p.d.f.

Continuous joint distribution
Two continuous r.v.s $X$ and $Y$ have a continuous joint distribution if there is a nonnegative function $f_{X,Y}(x, y)$ such that for any subset $A$ of the $xy$-plane,

$$\mathbb{P}((X, Y) \in A) = \iint_{(x,y) \in A} f_{X,Y}(x, y)\, dx\, dy.$$

• $f_{X,Y}(x, y)$ is the joint probability density function.
• $\{(x, y) : f_{X,Y}(x, y) > 0\}$ is called the support of the distribution.
• A joint p.d.f. must meet the following conditions:
  1. $f_{X,Y}(x, y) \geq 0$ for all values of $(x, y)$ (nonnegative)
  2. $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx\, dy = 1$ (probabilities "sum" to 1)
• $\mathbb{P}(X = x, Y = y) = 0$ for similar reasons as with single r.v.s.

Joint densities are 3D

[3D plot: the joint density surface $f_{X,Y}(x, y)$ over the $xy$-plane]

• The $X$ and $Y$ axes are on the "floor"; the height is the value of $f_{X,Y}(x, y)$.
• Remember: $f_{X,Y}(x, y) \neq \mathbb{P}(X = x, Y = y)$.

Probability = volume

[3D plot: the same joint density, with the volume above a region $A$ highlighted]

• $\mathbb{P}((X, Y) \in A) = \iint_{(x,y) \in A} f_{X,Y}(x, y)\, dx\, dy$
• Probability = the volume under the density surface, above a specific region.

Working with joint p.d.f.s

• Suppose we have the following form of a joint p.d.f.:

$$f_{X,Y}(x, y) = \begin{cases} c(x + y) & \text{for } 0 < x < 2 \text{ and } 0 < y < 2 \\ 0 & \text{otherwise} \end{cases}$$

• What does $c$ have to be for this to be a valid p.d.f.?

$$\begin{aligned}
1 &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx\, dy \\
&= \int_0^2 \int_0^2 c(x + y)\, dx\, dy \\
&= c \int_0^2 \left( \frac{x^2}{2} + xy \right) \Big|_{x=0}^{x=2}\, dy \\
&= c \int_0^2 (2 + 2y)\, dy \\
&= \left( 2cy + cy^2 \right) \Big|_0^2 = 8c
\end{aligned}$$

• Thus, to be a valid p.d.f., $c = 1/8$.
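A numerical check of this normalization, as a sketch assuming scipy is available (note that `dblquad` integrates a function of $(y, x)$):

```python
from scipy.integrate import dblquad

# Integrate c*(x + y) over (0, 2) x (0, 2) and confirm it equals one.
c = 1 / 8
total, _ = dblquad(lambda y, x: c * (x + y), 0, 2, lambda x: 0, lambda x: 2)
print(total)  # ~1.0, so c = 1/8 yields a valid p.d.f.
```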

Example continuous distribution

[Plots: perspective and contour views of the example joint density]

$$f_{X,Y}(x, y) = \begin{cases} (x + y)/8 & \text{for } 0 < x < 2 \text{ and } 0 < y < 2 \\ 0 & \text{otherwise} \end{cases}$$

Continuous marginal distributions

• We can recover the marginal p.d.f. of one of the variables by integrating over the distribution of the other variable:

$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx$$

• Works for either variable:

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy$$

Visualizing continuous marginals

[3D plot: the joint density being collapsed onto a single axis to form a marginal]

• The marginal integrates (sums, basically) over the other r.v.: $f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx$
• Pile up/flatten all of the joint density onto a single dimension.

Deriving continuous marginals

$$f_{X,Y}(x, y) = \begin{cases} (x + y)/8 & \text{for } 0 < x < 2 \text{ and } 0 < y < 2 \\ 0 & \text{otherwise} \end{cases}$$

• Let's calculate the marginals for this p.d.f.:

$$\begin{aligned}
f_X(x) &= \int_0^2 \frac{1}{8}(x + y)\, dy \\
&= \left( \frac{xy}{8} + \frac{y^2}{16} \right) \Big|_{y=0}^{y=2} \\
&= \frac{x}{4} + \frac{1}{4} = \frac{x + 1}{4}
\end{aligned}$$

• By symmetry, we have the same for $Y$:

$$f_Y(y) = (y + 1)/4$$
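A quick numerical sanity check of the derived marginal, sketched in Python with scipy; the evaluation point $x_0 = 0.5$ is an arbitrary choice:

```python
from scipy.integrate import quad

# Integrate the joint over y at a fixed x0 and compare with (x0 + 1)/4.
x0 = 0.5
numeric, _ = quad(lambda y: (x0 + y) / 8, 0, 2)
print(numeric, (x0 + 1) / 4)  # both 0.375
```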

Joint c.d.f.s

Joint cumulative distribution function
For two r.v.s $X$ and $Y$, the joint cumulative distribution function, or joint c.d.f., $F_{X,Y}(x, y)$ is a function such that for finite values $x$ and $y$, $F_{X,Y}(x, y) = \mathbb{P}(X \leq x, Y \leq y)$.

• Deriving the p.d.f. from the c.d.f.: $f_{X,Y}(x, y) = \dfrac{\partial^2 F_{X,Y}(x, y)}{\partial x\, \partial y}$
• Deriving the c.d.f. from the p.d.f.: $F_{X,Y}(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(r, s)\, dr\, ds$

2/ Properties of Joint Distributions

Properties of joint distributions

• Single r.v.: we summarized $f_X(x)$ with $\mathbb{E}[X]$ and $\mathbb{V}[X]$.
• With 2 r.v.s, we can additionally measure how strong the dependence is between the variables.
• First: expectations over joint distributions and independence.

Expectations over multiple r.v.s

• 2-d LOTUS: take expectations over the joint distribution.
• With discrete $X$ and $Y$:

$$\mathbb{E}[g(X, Y)] = \sum_x \sum_y g(x, y)\, f_{X,Y}(x, y)$$

• With continuous $X$ and $Y$:

$$\mathbb{E}[g(X, Y)] = \int_x \int_y g(x, y)\, f_{X,Y}(x, y)\, dx\, dy$$

• Marginal expectations:

$$\mathbb{E}[Y] = \sum_x \sum_y y\, f_{X,Y}(x, y)$$

• Example: expectation of the product:

$$\mathbb{E}[XY] = \sum_x \sum_y xy\, f_{X,Y}(x, y)$$

Marginal expectations from joint

$$f_{X,Y}(x, y) = \begin{cases} (x + y)/8 & \text{for } 0 < x < 2 \text{ and } 0 < y < 2 \\ 0 & \text{otherwise} \end{cases}$$

• Marginal expectation of $Y$:

$$\begin{aligned}
\mathbb{E}[Y] &= \int_0^2 \int_0^2 y\, \frac{1}{8}(x + y)\, dx\, dy \\
&= \int_0^2 y \left( \int_0^2 \frac{1}{8}(x + y)\, dx \right) dy \\
&= \int_0^2 y\, \frac{1}{4}(y + 1)\, dy \\
&= \left( \frac{y^3}{12} + \frac{y^2}{8} \right) \Big|_0^2 = \frac{2}{3} + \frac{1}{2} = \frac{7}{6}
\end{aligned}$$

• By symmetry, $\mathbb{E}[X] = \mathbb{E}[Y] = 7/6$.
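A numerical check of this marginal expectation (a scipy sketch; `dblquad` passes its integrand $(y, x)$):

```python
from scipy.integrate import dblquad

# E[Y] = integral of y * f(x, y) over the support (0, 2) x (0, 2).
ey, _ = dblquad(lambda y, x: y * (x + y) / 8, 0, 2, lambda x: 0, lambda x: 2)
print(ey, 7 / 6)  # both ~1.1667
```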

Independence

Independence
Two r.v.s $X$ and $Y$ are independent (written $X \perp\!\!\!\perp Y$) if for all sets $A$ and $B$:

$$\mathbb{P}(X \in A, Y \in B) = \mathbb{P}(X \in A)\, \mathbb{P}(Y \in B).$$

• Knowing the value of $X$ gives us no information about the value of $Y$.
• If $X$ and $Y$ are independent, then:
  ▶ $f_{X,Y}(x, y) = f_X(x)\, f_Y(y)$ (the joint is the product of the marginals)
  ▶ $F_{X,Y}(x, y) = F_X(x)\, F_Y(y)$
  ▶ $h(X) \perp\!\!\!\perp g(Y)$ for any functions $h(\cdot)$ and $g(\cdot)$ (functions of independent r.v.s are independent)

Key properties of independent r.v.s

• Theorem: If $X$ and $Y$ are independent r.v.s, then

$$\mathbb{E}[XY] = \mathbb{E}[X]\, \mathbb{E}[Y].$$

• Proof for discrete $X$ and $Y$:

$$\begin{aligned}
\mathbb{E}[XY] &= \sum_x \sum_y xy\, f_{X,Y}(x, y) \\
&= \sum_x \sum_y xy\, f_X(x)\, f_Y(y) \\
&= \left( \sum_x x\, f_X(x) \right) \left( \sum_y y\, f_Y(y) \right) \\
&= \mathbb{E}[X]\, \mathbb{E}[Y]
\end{aligned}$$
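A simulation sketch of this theorem; the normal distributions, their means, and the seed are arbitrary choices, not from the slides:

```python
import numpy as np

# Draw X and Y independently and compare E[XY] with E[X]E[Y].
rng = np.random.default_rng(0)
n = 1_000_000
x = rng.normal(1.0, 1.0, n)
y = rng.normal(2.0, 1.0, n)

print(np.mean(x * y))           # ~2.0
print(np.mean(x) * np.mean(y))  # ~2.0, matching E[XY] = E[X]E[Y]
```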

Why independence?

• Independence assumptions are everywhere in theoretical and applied statistics.
  ▶ Each response in a poll is considered independent of all other responses.
  ▶ In a randomized control trial, treatment assignment is independent of background characteristics.
• Lack of independence is a blessing or a curse:
  ▶ Two variables not independent ⇝ potentially interesting relationship.
  ▶ In observational studies, treatment assignment is usually not independent of background characteristics.

Covariance

• If two variables are not independent, how do we measure the strength of their dependence?

  ▶ Covariance
  ▶ Correlation
• Covariance: how do two r.v.s vary together?
  ▶ How often do high values of $X$ occur with high values of $Y$?

Defining covariance

• If two variables are not independent, how do we measure the strength of their dependence?

Covariance
The covariance between two r.v.s, $X$ and $Y$, is defined as:

$$\text{Cov}[X, Y] = \mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])]$$

• How often do high values of $X$ occur with high values of $Y$?
• Properties of covariances:
  ▶ $\text{Cov}[X, Y] = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]$
  ▶ If $X \perp\!\!\!\perp Y$, then $\text{Cov}[X, Y] = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] = \mathbb{E}[X]\mathbb{E}[Y] - \mathbb{E}[X]\mathbb{E}[Y] = 0$

Covariance intuition

[Scatter plot of $(x, y)$ with a vertical line at $\mathbb{E}[X]$ and a horizontal line at $\mathbb{E}[Y]$]

Covariance intuition

[The same scatter plot, divided into four quadrants by $\mathbb{E}[X]$ and $\mathbb{E}[Y]$, labeling where each variable is above or below its mean]

• Large values of $X$ tend to occur with large values of $Y$:
  ▶ $(X - \mathbb{E}[X])(Y - \mathbb{E}[Y]) = (\text{pos. num.}) \times (\text{pos. num.}) = +$
• Small values of $X$ tend to occur with small values of $Y$:
  ▶ $(X - \mathbb{E}[X])(Y - \mathbb{E}[Y]) = (\text{neg. num.}) \times (\text{neg. num.}) = +$
• If these dominate ⇝ positive covariance.

Covariance intuition

[Same quadrant plot as above]

• Large values of $X$ tend to occur with small values of $Y$:
  ▶ $(X - \mathbb{E}[X])(Y - \mathbb{E}[Y]) = (\text{pos. num.}) \times (\text{neg. num.}) = -$
• Small values of $X$ tend to occur with large values of $Y$:
  ▶ $(X - \mathbb{E}[X])(Y - \mathbb{E}[Y]) = (\text{neg. num.}) \times (\text{pos. num.}) = -$
• If these dominate ⇝ negative covariance.

Covariance from joint p.d.f.

• Using our running example of $f_{X,Y}(x, y) = (x + y)/8$.
• From earlier: $\mathbb{E}[X] = \mathbb{E}[Y] = 7/6$.
• Expectation of the product:

$$\begin{aligned}
\mathbb{E}[XY] &= \int_0^2 \int_0^2 xy\, \frac{1}{8}(x + y)\, dx\, dy \\
&= \int_0^2 \int_0^2 \frac{1}{8}(x^2 y + x y^2)\, dx\, dy \\
&= \int_0^2 \left( \frac{x^3 y}{24} + \frac{x^2 y^2}{16} \right) \Big|_{x=0}^{x=2}\, dy \\
&= \int_0^2 \left( \frac{y}{3} + \frac{y^2}{4} \right) dy \\
&= \left( \frac{y^2}{6} + \frac{y^3}{12} \right) \Big|_0^2 = \frac{2}{3} + \frac{2}{3} = \frac{4}{3}
\end{aligned}$$

• Covariance:

$$\text{Cov}[X, Y] = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] = \frac{4}{3} - \left( \frac{7}{6} \right)^2 = -\frac{1}{36}$$
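A numerical check of this covariance (a scipy sketch, not from the slides):

```python
from scipy.integrate import dblquad

# Cov[X, Y] = E[XY] - E[X]E[Y] for the running example f(x, y) = (x + y)/8.
f = lambda y, x: (x + y) / 8
exy, _ = dblquad(lambda y, x: x * y * f(y, x), 0, 2, lambda x: 0, lambda x: 2)
ex, _ = dblquad(lambda y, x: x * f(y, x), 0, 2, lambda x: 0, lambda x: 2)
ey, _ = dblquad(lambda y, x: y * f(y, x), 0, 2, lambda x: 0, lambda x: 2)
print(exy - ex * ey, -1 / 36)  # both ~-0.0278
```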

Zero covariance doesn't imply independence

• We saw that $X \perp\!\!\!\perp Y$ ⇝ $\text{Cov}[X, Y] = 0$.
• Does $\text{Cov}[X, Y] = 0$ imply that $X \perp\!\!\!\perp Y$? No!
• Counterexample: $X \in \{-1, 0, 1\}$ with equal probability and $Y = X^2$.
• Covariance is a measure of linear dependence, so it can miss non-linear dependence.
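The counterexample is easy to verify directly; here is a sketch in Python/numpy:

```python
import numpy as np

# X uniform on {-1, 0, 1}, Y = X^2: dependent, but zero covariance.
x = np.array([-1, 0, 1])
p = np.array([1, 1, 1]) / 3
y = x**2

exy = np.sum(p * x * y)  # E[XY] = E[X^3] = 0
ex = np.sum(p * x)       # E[X] = 0
ey = np.sum(p * y)       # E[Y] = 2/3
print(exy - ex * ey)     # 0.0, yet Y is a deterministic function of X
```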

Properties of variances and covariances

• Properties of covariances:

  1. $\text{Cov}[aX + b, cY + d] = ac\, \text{Cov}[X, Y]$
  2. $\text{Cov}[X, X] = \mathbb{V}[X]$

• Properties of variances that we can state now that we know covariance:

  1. $\mathbb{V}[aX + bY + c] = a^2\, \mathbb{V}[X] + b^2\, \mathbb{V}[Y] + 2ab\, \text{Cov}[X, Y]$
  2. If $X$ and $Y$ are independent, $\mathbb{V}[X + Y] = \mathbb{V}[X] + \mathbb{V}[Y]$.

Using properties of covariance

• Rescale our running example: $Z = 2X$, $W = 2Y$.
• What's the covariance of $(Z, W)$?
  ▶ Ugh, let's avoid more integrals.
• Use the properties of covariances:

$$\text{Cov}[Z, W] = \text{Cov}[2X, 2Y] = 2 \times 2 \times \text{Cov}[X, Y] = -\frac{1}{9}$$

Correlation

• Covariance is not scale-free: $\text{Cov}[2X, Y] = 2\, \text{Cov}[X, Y]$
  ▶ ⇝ hard to compare covariances across different r.v.s
  ▶ Is a relationship stronger? Or just due to rescaling?
• Correlation is a scale-free measure of linear dependence.

Correlation
The correlation between two r.v.s $X$ and $Y$ is defined as:

$$\rho = \rho(X, Y) = \frac{\text{Cov}[X, Y]}{\sqrt{\mathbb{V}[X]\, \mathbb{V}[Y]}}$$

• Covariance after dividing out the scales of the respective variables.
• Correlation properties:
  ▶ $-1 \leq \rho \leq 1$
  ▶ If $|\rho(X, Y)| = 1$, then $X$ and $Y$ are perfectly correlated, with a deterministic linear relationship: $Y = a + bX$.
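For the running example, this works out numerically to $\rho = -1/11 \approx -0.09$ (a value computed here as a check, not stated on the slides):

```python
from scipy.integrate import dblquad
import numpy as np

# E() computes the expectation of any g(x, y) over the running joint density.
f = lambda y, x: (x + y) / 8
E = lambda g: dblquad(lambda y, x: g(x, y) * f(y, x), 0, 2,
                      lambda x: 0, lambda x: 2)[0]

cov = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)
vx = E(lambda x, y: x**2) - E(lambda x, y: x)**2
vy = E(lambda x, y: y**2) - E(lambda x, y: y)**2
print(cov / np.sqrt(vx * vy))  # ~-0.0909, i.e., rho = -1/11
```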

3/ Conditional Distributions

Conditional distributions

• Conditional distribution: the distribution of $Y$ if we know $X = x$.

Conditional probability mass function
The conditional probability mass function, or conditional p.m.f., of $Y$ conditional on $X$ is

$$f_{Y|X}(y|x) = \frac{\mathbb{P}(X = x, Y = y)}{\mathbb{P}(X = x)} = \frac{f_{X,Y}(x, y)}{f_X(x)}$$

• Intuitive definition:

$$f_{Y|X}(y|x) = \frac{\text{Probability that } X = x \text{ and } Y = y}{\text{Probability that } X = x}$$

• This is a valid p.m.f.!
  ▶ $f_{Y|X}(y|x) \geq 0$ and $\sum_y f_{Y|X}(y|x) = 1$
• If $X \perp\!\!\!\perp Y$, then $f_{Y|X}(y|x) = f_Y(y)$ (the conditional is the marginal)

Example: conditionals for gay marriage

                 Favor Gay Marriage   Oppose Gay Marriage
                      (Y = 1)              (Y = 0)        Marginal
Female (X = 1)         0.30                 0.21            0.51
Male (X = 0)           0.22                 0.27            0.49
Marginal               0.52                 0.48

• Probability of favoring gay marriage conditional on being a man?

$$f_{Y|X}(y = 1 \mid x = 0) = \frac{\mathbb{P}(X = 0, Y = 1)}{\mathbb{P}(X = 0)} = \frac{0.22}{0.22 + 0.27} \approx 0.449$$
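The same conditional p.m.f. computed from the cross-tab in Python (a numpy sketch): divide each row of the joint by its row marginal.

```python
import numpy as np

joint = np.array([
    [0.27, 0.22],  # x = 0 (male):   [f(0, y=0), f(0, y=1)]
    [0.21, 0.30],  # x = 1 (female): [f(1, y=0), f(1, y=1)]
])
# f(y | x) = f(x, y) / f_X(x), row by row.
cond = joint / joint.sum(axis=1, keepdims=True)
print(cond[0, 1])  # P(Y = 1 | X = 0) ~ 0.449
```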

Example: conditionals for gay marriage

[Bar plots: the conditional distribution of gay marriage support $Y$ for men and for women]

• Two values of $X$ ⇝ two univariate conditional distributions of $Y$

Continuous conditional distributions

Conditional probability density function
The conditional p.d.f. of $Y$ given a continuous $X$ is

$$f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)},$$

assuming that $f_X(x) > 0$.

• Implies that

$$\mathbb{P}(a < Y < b \mid X = x) = \int_a^b f_{Y|X}(y|x)\, dy.$$

• Based on the definition of the conditional p.m.f./p.d.f., we have the following factorization:

$$f_{X,Y}(x, y) = f_{Y|X}(y|x)\, f_X(x)$$

Conditional distributions as slices

• $f_{Y|X}(y|x_0)$ is the conditional p.d.f. of $Y$ when $X = x_0$.
• $f_{Y|X}(y|x_0)$ is proportional to the joint p.d.f. along the slice $X = x_0$: $f_{X,Y}(x_0, y)$.
• Normalize by dividing by $f_X(x_0)$ to ensure a proper p.d.f.

Continuous conditional example

• Using our running example of $f_{X,Y}(x, y) = (x + y)/8$.
• Earlier we calculated $f_X(x) = (x + 1)/4$.
• Calculate the conditional:

$$f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)} = \frac{(x + y)/8}{(x + 1)/4} = \frac{x + y}{2(x + 1)}$$

• Remember the limits: this holds for $0 < y < 2$; the density is 0 otherwise.
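A quick check that this conditional density integrates to one at an arbitrary point (here $x_0 = 0.5$, an assumption for illustration):

```python
from scipy.integrate import quad

# Integrate f(y | x0) = (x0 + y) / (2 * (x0 + 1)) over y in (0, 2).
x0 = 0.5
total, _ = quad(lambda y: (x0 + y) / (2 * (x0 + 1)), 0, 2)
print(total)  # 1.0: a valid conditional p.d.f.
```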

Conditional Independence

Conditional independence
Two r.v.s $X$ and $Y$ are conditionally independent given $Z$ (written $X \perp\!\!\!\perp Y \mid Z$) if

$$f_{X,Y|Z}(x, y|z) = f_{X|Z}(x|z)\, f_{Y|Z}(y|z).$$

• $X$ and $Y$ are independent within levels of $Z$.
• Massively important for regression, causal inference.
• Example:
  ▶ $X$ = swimming accidents, $Y$ = number of ice cream cones sold.
  ▶ In general, dependent.
  ▶ Conditional on $Z$ = temperature, independent.

Summarizing conditional distributions

[Plot: two conditional densities of $y$, $f(y|0)$ and $f(y|1)$]

• Conditional distributions are also univariate distributions, so we can summarize each with its mean and variance.
• Gives us insight into a key question:
  ▶ How does the mean of $Y$ change as we change $X$?

Defining conditional expectations

Conditional expectation
The conditional expectation of $Y$ conditional on $X = x$ is:

$$\mathbb{E}[Y \mid X = x] = \begin{cases} \sum_y y\, f_{Y|X}(y|x) & \text{discrete } Y \\ \int_{-\infty}^{\infty} y\, f_{Y|X}(y|x)\, dy & \text{continuous } Y \end{cases}$$

• Intuition: exactly the same definition as the expected value, with $f_{Y|X}(y|x)$ in place of $f_Y(y)$.
• The expected value of the (univariate) conditional distribution.
• This is a function of $x$!

Calculating conditional expectations

                 Favor Gay Marriage   Oppose Gay Marriage
                      (Y = 1)              (Y = 0)        Marginal
Female (X = 1)         0.30                 0.21            0.51
Male (X = 0)           0.22                 0.27            0.49
Marginal               0.52                 0.48

• What's the conditional expectation of support for gay marriage, $Y$, given that someone is a man, $X = 0$?

$$\begin{aligned}
\mathbb{E}[Y \mid X = 0] &= \sum_y y\, f_{Y|X}(y|x = 0) \\
&= 0 \times f_{Y|X}(y = 0|x = 0) + 1 \times f_{Y|X}(y = 1|x = 0) \\
&= 1 \times \frac{0.22}{0.22 + 0.27} \approx 0.449
\end{aligned}$$

Conditional expectations are random variables

• For a particular $x$, $\mathbb{E}[Y|X = x]$ is a number.
• But $X$ takes on many possible values with uncertainty ⇝ $\mathbb{E}[Y|X]$ takes on many possible values with uncertainty.
• ⇝ Conditional expectations are random variables!
• Binary $X$:

$$\mathbb{E}[Y|X] = \begin{cases} \mathbb{E}[Y|X = 0] & \text{with prob. } \mathbb{P}(X = 0) \\ \mathbb{E}[Y|X = 1] & \text{with prob. } \mathbb{P}(X = 1) \end{cases}$$

• Has an expectation, $\mathbb{E}[\mathbb{E}[Y|X]]$, and a variance, $\mathbb{V}[\mathbb{E}[Y|X]]$.

Law of iterated expectations

• Average/mean of the conditional expectations: $\mathbb{E}[\mathbb{E}[Y|X]]$.
  ▶ Can we connect this to the marginal (overall) expectation?
• Theorem (Law of Iterated Expectations): If the expectations exist, then for discrete $X$,

$$\mathbb{E}[Y] = \mathbb{E}[\mathbb{E}[Y|X]] = \sum_x \mathbb{E}[Y \mid X = x]\, f_X(x)$$

Example: law of iterated expectations

                 Favor Gay Marriage   Oppose Gay Marriage
                      (Y = 1)              (Y = 0)        Marginal
Female (X = 1)         0.30                 0.21            0.51
Male (X = 0)           0.22                 0.27            0.49
Marginal               0.52                 0.48            1.00

• $\mathbb{E}[Y|X = 1] = 0.3/0.51 \approx 0.588$ and $\mathbb{E}[Y|X = 0] = 0.22/0.49 \approx 0.449$.
• $f_X(1) = 0.51$ (female) and $f_X(0) = 0.49$ (male).
• Plug into the law of iterated expectations:

$$\begin{aligned}
\mathbb{E}[\mathbb{E}[Y|X]] &= \mathbb{E}[Y|X = 0]\, f_X(0) + \mathbb{E}[Y|X = 1]\, f_X(1) \\
&= 0.449 \times 0.49 + 0.588 \times 0.51 \\
&= 0.22 + 0.30 = 0.52 = \mathbb{E}[Y]
\end{aligned}$$
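The same computation in Python/numpy, weighting the conditional means by the marginal of $X$ (a sketch):

```python
import numpy as np

joint = np.array([
    [0.27, 0.22],  # x = 0 (male)
    [0.21, 0.30],  # x = 1 (female)
])
f_X = joint.sum(axis=1)          # [0.49, 0.51]
# For binary Y, E[Y | X = x] is just P(Y = 1 | X = x).
e_y_given_x = joint[:, 1] / f_X  # [0.449, 0.588]

print(np.sum(e_y_given_x * f_X)) # 0.52 = E[Y]
```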

Properties of conditional expectations

1. $\mathbb{E}[c(X)|X] = c(X)$ for any function $c(X)$.
   ▶ Example: $\mathbb{E}[X^2|X] = X^2$ (if we know $X$, then we also know $X^2$)
2. If $X$ and $Y$ are independent r.v.s, then

$$\mathbb{E}[Y \mid X = x] = \mathbb{E}[Y].$$

3. If $X \perp\!\!\!\perp Y \mid Z$, then

$$\mathbb{E}[Y \mid X = x, Z = z] = \mathbb{E}[Y \mid Z = z].$$

Conditional Variance

Conditional variance
The conditional variance of $Y$ given $X = x$ is defined as:

$$\mathbb{V}[Y \mid X = x] = \mathbb{E}\left[ (Y - \mathbb{E}[Y \mid X = x])^2 \mid X = x \right]$$

• Conditional variance describes the spread of the conditional distribution around the conditional expectation.
• The usual variance formula applied to the conditional distribution.
• Using LOTUS:
  ▶ Discrete $Y$:

$$\mathbb{V}[Y \mid X = x] = \sum_y (y - \mathbb{E}[Y \mid X = x])^2\, f_{Y|X}(y|x)$$

  ▶ Continuous $Y$:

$$\mathbb{V}[Y \mid X = x] = \int_y (y - \mathbb{E}[Y \mid X = x])^2\, f_{Y|X}(y|x)\, dy$$

Conditional variance is a random variable

• Again, $\mathbb{V}[Y|X]$ is a random variable and a function of $X$, just like $\mathbb{E}[Y|X]$. With a binary $X$:

$$\mathbb{V}[Y|X] = \begin{cases} \mathbb{V}[Y|X = 0] & \text{with prob. } \mathbb{P}(X = 0) \\ \mathbb{V}[Y|X = 1] & \text{with prob. } \mathbb{P}(X = 1) \end{cases}$$

Law of total variance

• We can also relate the marginal variance to the conditional variance and the conditional expectation.
• Theorem (Law of Total Variance / EVE's law):

$$\mathbb{V}[Y] = \mathbb{E}[\mathbb{V}[Y|X]] + \mathbb{V}[\mathbb{E}[Y|X]]$$

• The total variance can be decomposed into:
  1. the average of the within-group variance ($\mathbb{E}[\mathbb{V}[Y|X]]$), and
  2. how much the average varies between groups ($\mathbb{V}[\mathbb{E}[Y|X]]$).
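A numerical check of EVE's law on the gay-marriage cross-tab (a numpy sketch; within each level of $X$, $Y$ is Bernoulli, so its conditional variance is $p(1-p)$):

```python
import numpy as np

joint = np.array([
    [0.27, 0.22],  # x = 0 (male)
    [0.21, 0.30],  # x = 1 (female)
])
f_X = joint.sum(axis=1)
p1 = joint[:, 1] / f_X            # E[Y|X=x] = P(Y = 1 | X = x)

within = np.sum(p1 * (1 - p1) * f_X)                # E[V[Y|X]]
between = np.sum((p1 - np.sum(p1 * f_X))**2 * f_X)  # V[E[Y|X]]
print(within + between, 0.52 * 0.48)                # both 0.2496 = V[Y]
```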

4/ Wrap-up

Review

• Multiple r.v.s require joint p.m.f.s and joint p.d.f.s.
• Multiple r.v.s can have distributions that exhibit dependence, as measured by covariance and correlation.
• The conditional expectation of one variable given another is an important quantity that we'll see over and over again.
