Gov 2000: 3. Multiple Random Variables
Matthew Blackwell Fall 2016
1. Distributions of Multiple Random Variables
2. Properties of Joint Distributions
3. Conditional Distributions
4. Wrap-up
Where are we? Where are we going?

• Distributions of one variable: how to describe and summarize uncertainty about one variable.
• Today: distributions of multiple variables to describe relationships between variables.
• Later: use data to learn about probability distributions.
Why multiple random variables?

[Scatterplot: Log Settler Mortality (x-axis) vs. Log GDP/pop (y-axis), one point per country]

1. How do we summarize the relationship between two variables, 푋 and 푌?
2. What if we have many observations of the same variable, 푋1, 푋2, … , 푋푛?
1/ Distributions of Multiple Random Variables

Joint distributions
[Three scatterplots of (푥, 푦) pairs illustrating different joint distributions]

• The joint distribution of two r.v.s, 푋 and 푌, describes which pairs of observations, (푥, 푦), are more likely than others.
▶ Example: settler mortality (푋) and GDP per capita (푌) for the same country.
• The shape of the joint distribution now includes the relationship between 푋 and 푌.
Discrete r.v.s
Joint probability mass function The joint p.m.f. of a pair of discrete r.v.s, (푋, 푌) describes the probability of any pair of values:
푓푋,푌 (푥, 푦) = ℙ(푋 = 푥, 푌 = 푦)
• Properties of a joint p.m.f.:
▶ 푓푋,푌 (푥, 푦) ≥ 0 (probabilities can’t be negative)
▶ ∑푥 ∑푦 푓푋,푌 (푥, 푦) = 1 (something must happen)
▶ ∑푥 is shorthand for the sum over all possible values of 푋
Example: Gay marriage and gender

                  Favor (푌 = 1)   Oppose (푌 = 0)
Female (푋 = 1)        0.30             0.21
Male (푋 = 0)          0.22             0.27
• Joint p.m.f. can be summarized in a cross-tab:
▶ Each cell is the probability of that combination, 푓푋,푌 (푥, 푦)
• Probability that we randomly select a woman who favors gay marriage?
푓푋,푌 (1, 1) = ℙ(푋 = 1, 푌 = 1) = 0.3
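The cross-tab above can be sketched in code (not part of the original slides; the dict layout is my own choice): a joint p.m.f. is just a map from (푥, 푦) pairs to probabilities.

```python
# A sketch of the gay-marriage cross-tab as a joint p.m.f.: a dict
# mapping (x, y) pairs to probabilities.
joint = {
    (1, 1): 0.30,  # female, favors
    (1, 0): 0.21,  # female, opposes
    (0, 1): 0.22,  # male, favors
    (0, 0): 0.27,  # male, opposes
}

# Check the two p.m.f. properties: nonnegative, sums to 1.
assert all(p >= 0 for p in joint.values())
assert abs(sum(joint.values()) - 1.0) < 1e-9

# P(X = 1, Y = 1): randomly selecting a woman who favors gay marriage.
print(joint[(1, 1)])  # 0.3
```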
Marginal distributions

• Often we need the distribution of just one of the r.v.s.
▶ In this context it is called the marginal distribution.
• Computing marginals from the joint p.m.f.:

푓푌 (푦) = ℙ(푌 = 푦) = ∑푥 푓푋,푌 (푥, 푦)

• Intuition: sum the probability that 푌 = 푦 over all possible values of 푥.
▶ Works because the events {푋 = 푥} are mutually exclusive and partition the sample space.
Example: marginals for gay marriage

                  Favor (푌 = 1)   Oppose (푌 = 0)   Marginal
Female (푋 = 1)        0.30             0.21           0.51
Male (푋 = 0)          0.22             0.27           0.49
Marginal              0.52             0.48
• What’s 푓푌 (1) = ℙ(푌 = 1)?
▶ The probability that a man favors gay marriage plus the probability that a woman favors gay marriage.
푓푌 (1) = 푓푋,푌 (1, 1) + 푓푋,푌 (0, 1) = 0.3 + 0.22 = 0.52
• Works for all marginals.
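The marginalization rule is easy to check mechanically (a sketch, not from the slides; function names are my own): sum the joint p.m.f. over the other variable.

```python
# Marginalization sketch: sum the joint p.m.f. over the other variable.
# The dict mirrors the cross-tab above.
joint = {(1, 1): 0.30, (1, 0): 0.21, (0, 1): 0.22, (0, 0): 0.27}

def marginal_Y(joint, y):
    """f_Y(y): sum f_{X,Y}(x, y) over all values of x."""
    return sum(p for (x, yy), p in joint.items() if yy == y)

def marginal_X(joint, x):
    """f_X(x): sum f_{X,Y}(x, y) over all values of y."""
    return sum(p for (xx, y), p in joint.items() if xx == x)

print(round(marginal_Y(joint, 1), 2))  # 0.3 + 0.22 = 0.52
print(round(marginal_X(joint, 1), 2))  # 0.3 + 0.21 = 0.51
```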
Continuous r.v.s

[Diagram: a region 퐴 inside the (푋, 푌) plane]

• We will focus on getting the probability that (푋, 푌) falls in some subset 퐴 of the 2-dimensional plane.
Continuous joint p.d.f.
Continuous joint distribution Two continuous r.v.s 푋 and 푌 have a continuous joint distribution if there is a nonnegative function 푓푋,푌 (푥, 푦) such that for any subset 퐴 of the 푥푦-plane,
ℙ((푋, 푌) ∈ 퐴) = ∬(푥,푦)∈퐴 푓푋,푌 (푥, 푦) 푑푥 푑푦.
• 푓푋,푌 (푥, 푦) is the joint probability density function. • {(푥, 푦) ∶ 푓푋,푌 (푥, 푦) > 0} is called the support of the distribution. • Joint p.d.f. must meet the following conditions:
1. 푓푋,푌 (푥, 푦) ≥ 0 for all values of (푥, 푦) (nonnegative)
2. ∫−∞^∞ ∫−∞^∞ 푓푋,푌 (푥, 푦) 푑푥 푑푦 = 1 (probabilities “sum” to 1)

• ℙ(푋 = 푥, 푌 = 푦) = 0 for similar reasons as with single r.v.s.
Joint densities are 3D

[3D surface plot: a joint density 푓푋,푌 (푥, 푦) over the (푥, 푦) plane]
• 푋 and 푌 axes are on the “floor,” height is the value of 푓푋,푌 (푥, 푦). • Remember 푓푋,푌 (푥, 푦) ≠ ℙ(푋 = 푥, 푌 = 푦).
Probability = volume

[Same 3D density, with the volume above a region of the (푥, 푦) plane highlighted]
• ℙ((푋, 푌) ∈ 퐴) = ∬(푥,푦)∈퐴 푓푋,푌 (푥, 푦)푑푥푑푦 • Probability = volume above a specific region.
Working with joint p.d.f.s
• Suppose we have the following form of a joint p.d.f.:
푓푋,푌 (푥, 푦) = 푐(푥 + 푦) for 0 < 푥 < 2 and 0 < 푦 < 2, and 0 otherwise

• What does 푐 have to be for this to be a valid p.d.f.?

1 = ∫−∞^∞ ∫−∞^∞ 푓푋,푌 (푥, 푦) 푑푥 푑푦
  = ∫0^2 ∫0^2 푐(푥 + 푦) 푑푥 푑푦
  = 푐 ∫0^2 (푥²/2 + 푥푦)|푥=0^푥=2 푑푦
  = 푐 ∫0^2 (2 + 2푦) 푑푦
  = (2푐푦 + 푐푦²)|0^2 = 8푐
• Thus to be a valid p.d.f., 푐 = 1/8
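The value 푐 = 1/8 can be sanity-checked numerically (my own sketch, not in the slides) with a midpoint Riemann sum over the support:

```python
# Numeric check: with c = 1/8 the density should integrate to 1
# over the square (0, 2) x (0, 2). Midpoint Riemann sum on a grid.
c = 1 / 8
n = 200
h = 2 / n  # cell width

total = 0.0
for i in range(n):
    for j in range(n):
        x = (i + 0.5) * h  # midpoint of cell (i, j)
        y = (j + 0.5) * h
        total += c * (x + y) * h * h

print(round(total, 6))  # 1.0
```

Because the integrand is linear, the midpoint rule here is exact up to floating-point error.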
Example continuous distribution

[Two panels: a contour plot and a 3D surface of the density below]

푓푋,푌 (푥, 푦) = (푥 + 푦)/8 for 0 < 푥 < 2 and 0 < 푦 < 2, and 0 otherwise
Continuous marginal distributions
• We can recover the marginal PDF of one of the variables by integrating over the distribution of the other variable:
푓푌 (푦) = ∫−∞^∞ 푓푋,푌 (푥, 푦) 푑푥

• Works for either variable:

푓푋 (푥) = ∫−∞^∞ 푓푋,푌 (푥, 푦) 푑푦
Visualizing continuous marginals

[3D joint density with the marginal density flattened onto one axis]

• The marginal integrates (sums, basically) over the other r.v.: 푓푌 (푦) = ∫−∞^∞ 푓푋,푌 (푥, 푦) 푑푥
• Pile up/flatten all of the joint density onto a single dimension.
Deriving continuous marginals

푓푋,푌 (푥, 푦) = (푥 + 푦)/8 for 0 < 푥 < 2 and 0 < 푦 < 2, and 0 otherwise

• Let’s calculate the marginals for this p.d.f.:

푓푋 (푥) = ∫0^2 (1/8)(푥 + 푦) 푑푦
      = (푥푦/8 + 푦²/16)|푦=0^푦=2
      = 푥/4 + 1/4 = (푥 + 1)/4
• By symmetry we have the same for 푦:
푓푌 (푦) = (푦 + 1)/4
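The marginal can be verified numerically (a sketch of my own, not from the slides): integrate the joint density over 푦 at a fixed 푥 and compare with (푥 + 1)/4.

```python
# Numeric check of the marginal: integrate the joint density over y
# at a fixed x and compare with the closed form (x + 1)/4.
def f_joint(x, y):
    return (x + y) / 8 if 0 < x < 2 and 0 < y < 2 else 0.0

def f_X_numeric(x, n=1000):
    h = 2 / n  # midpoint rule over 0 < y < 2
    return sum(f_joint(x, (j + 0.5) * h) for j in range(n)) * h

for x in (0.5, 1.0, 1.5):
    print(round(f_X_numeric(x), 6), (x + 1) / 4)  # the two columns match
```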
Joint c.d.f.s
Joint cumulative distribution function For two r.v.s 푋 and 푌, the joint cumulative distribution function or joint c.d.f. 퐹푋,푌 (푥, 푦) is a function such that for finite values 푥 and 푦, 퐹푋,푌 (푥, 푦) = ℙ(푋 ≤ 푥, 푌 ≤ 푦).
• Deriving the p.d.f. from the c.d.f.: 푓푋,푌 (푥, 푦) = ∂²퐹푋,푌 (푥, 푦)/∂푥∂푦
• Deriving the c.d.f. from the p.d.f.: 퐹푋,푌 (푥, 푦) = ∫−∞^푦 ∫−∞^푥 푓푋,푌 (푟, 푠) 푑푟 푑푠
2/ Properties of Joint Distributions

Properties of joint distributions

• Single r.v.: we summarized 푓푋(푥) with 피[푋] and 핍[푋]
• With 2 r.v.s, we can additionally measure how strong the dependence is between the variables.
• First: expectations over joint distributions and independence
Expectations over multiple r.v.s

• 2-d LOTUS: take expectations over the joint distribution.
• With discrete 푋 and 푌:

피[푔(푋, 푌)] = ∑푥 ∑푦 푔(푥, 푦) 푓푋,푌 (푥, 푦)

• With continuous 푋 and 푌:

피[푔(푋, 푌)] = ∫푥 ∫푦 푔(푥, 푦) 푓푋,푌 (푥, 푦) 푑푥 푑푦

• Marginal expectations:

피[푌] = ∑푥 ∑푦 푦 푓푋,푌 (푥, 푦)

• Example: expectation of the product:

피[푋푌] = ∑푥 ∑푦 푥푦 푓푋,푌 (푥, 푦)
Marginal expectations from joint

푓푋,푌 (푥, 푦) = (푥 + 푦)/8 for 0 < 푥 < 2 and 0 < 푦 < 2, and 0 otherwise

• Marginal expectation of 푌:

피[푌] = ∫0^2 ∫0^2 푦 · (1/8)(푥 + 푦) 푑푥 푑푦
     = ∫0^2 푦 [∫0^2 (1/8)(푥 + 푦) 푑푥] 푑푦
     = ∫0^2 푦 · (1/4)(푦 + 1) 푑푦
     = (푦³/12 + 푦²/8)|0^2
     = 2/3 + 1/2 = 7/6

• By symmetry, 피[푋] = 피[푌] = 7/6
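The value 피[푌] = 7/6 can be checked numerically (my own sketch, not in the slides) with a 2-d midpoint Riemann sum of 푦 · 푓(푥, 푦) over the support:

```python
# Numeric check of E[Y] = 7/6 for f(x, y) = (x + y)/8 on (0,2) x (0,2).
n = 200
h = 2 / n

E_Y = 0.0
for i in range(n):
    for j in range(n):
        x = (i + 0.5) * h
        y = (j + 0.5) * h
        E_Y += y * (x + y) / 8 * h * h  # y times density times cell area

print(round(E_Y, 4))  # approx 1.1667 = 7/6
```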
Independence
Independence Two r.v.s 푌 and 푋 are independent (which we write 푋 ⟂⟂푌) if for all sets 퐴 and 퐵:
ℙ(푋 ∈ 퐴, 푌 ∈ 퐵) = ℙ(푋 ∈ 퐴)ℙ(푌 ∈ 퐵).
• Knowing the value of 푋 gives us no information about the value of 푌. • If 푋 and 푌 are independent, then:
▶ 푓푋,푌 (푥, 푦) = 푓푋 (푥)푓푌 (푦) (joint is the product of marginals)
▶ 퐹푋,푌 (푥, 푦) = 퐹푋 (푥)퐹푌 (푦)
▶ ℎ(푋) ⟂⟂ 푔(푌) for any functions ℎ() and 푔() (functions of independent r.v.s are independent)
Key properties of independent r.v.s

• Theorem: If 푋 and 푌 are independent r.v.s, then
피[푋푌] = 피[푋]피[푌].
• Proof for discrete 푋 and 푌:
피[푋푌] = ∑푥 ∑푦 푥푦 푓푋,푌 (푥, 푦)
      = ∑푥 ∑푦 푥푦 푓푋(푥)푓푌 (푦)
      = (∑푥 푥 푓푋(푥)) (∑푦 푦 푓푌 (푦))
      = 피[푋]피[푌]
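The theorem is easy to illustrate with made-up marginals (the numbers below are my own, not from the slides): build an independent joint p.m.f. as the product of marginals and compare 피[푋푌] with 피[푋]피[푌].

```python
# Independence illustration: construct the joint p.m.f. as the
# product of two (arbitrary) marginals, then check E[XY] = E[X]E[Y].
f_x = {0: 0.4, 1: 0.6}
f_y = {0: 0.3, 1: 0.7}
joint = {(x, y): f_x[x] * f_y[y] for x in f_x for y in f_y}

E_XY = sum(x * y * p for (x, y), p in joint.items())
E_X = sum(x * p for x, p in f_x.items())
E_Y = sum(y * p for y, p in f_y.items())

assert abs(E_XY - E_X * E_Y) < 1e-12  # equal, as the theorem says
```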
Why independence?
• Independence assumptions are everywhere in theoretical and applied statistics.
▶ Each response in a poll is considered independent of all other responses.
▶ In a randomized control trial, treatment assignment is independent of background characteristics.
• Lack of independence is a blessing or a curse:
▶ Two variables that are not independent ⇝ a potentially interesting relationship.
▶ In observational studies, treatment assignment is usually not independent of background characteristics.
Covariance
• If two variables are not independent, how do we measure the strength of their dependence?
▶ Covariance
▶ Correlation
• Covariance: how do two r.v.s vary together?
▶ How often do high values of 푋 occur with high values of 푌?
Defining covariance
• If two variables are not independent, how do we measure the strength of their dependence?
Covariance The covariance between two r.v.s, 푋 and 푌 is defined as:
Cov[푋, 푌] = 피[(푋 − 피[푋])(푌 − 피[푌])]
• How often do high values of 푋 occur with high values of 푌?
• Properties of covariances:
▶ Cov[푋, 푌] = 피[푋푌] − 피[푋]피[푌]
▶ If 푋 ⟂⟂ 푌, then Cov[푋, 푌] = 피[푋푌] − 피[푋]피[푌] = 피[푋]피[푌] − 피[푋]피[푌] = 0
Covariance intuition

[Scatterplot of (푥, 푦) points with a vertical line at 피[푋] and a horizontal line at 피[푌]]
Covariance intuition

[Same scatterplot, split into four quadrants by the lines 푥 = 피[푋] and 푦 = 피[푌]]

• Large values of 푋 tend to occur with large values of 푌:
▶ (푋 − 피[푋])(푌 − 피[푌]) = (pos. num.) × (pos. num.) = +
• Small values of 푋 tend to occur with small values of 푌:
▶ (푋 − 피[푋])(푌 − 피[푌]) = (neg. num.) × (neg. num.) = +
• If these dominate ⇝ positive covariance.
Covariance intuition

[Same quadrant plot]

• Large values of 푋 tend to occur with small values of 푌:
▶ (푋 − 피[푋])(푌 − 피[푌]) = (pos. num.) × (neg. num.) = −
• Small values of 푋 tend to occur with large values of 푌:
▶ (푋 − 피[푋])(푌 − 피[푌]) = (neg. num.) × (pos. num.) = −
• If these dominate ⇝ negative covariance.
Covariance from joint p.d.f.

• Using our running example of 푓푋,푌 (푥, 푦) = (푥 + 푦)/8
• From earlier: 피[푋] = 피[푌] = 7/6
• Expectation of the product:

피[푋푌] = ∫0^2 ∫0^2 푥푦 · (1/8)(푥 + 푦) 푑푥 푑푦
      = ∫0^2 ∫0^2 (1/8)(푥²푦 + 푥푦²) 푑푥 푑푦
      = ∫0^2 (푥³푦/24 + 푥²푦²/16)|푥=0^푥=2 푑푦
      = ∫0^2 (푦/3 + 푦²/4) 푑푦
      = (푦²/6 + 푦³/12)|0^2 = 2/3 + 2/3 = 4/3

• Covariance:

Cov[푋, 푌] = 피[푋푌] − 피[푋]피[푌] = 4/3 − (7/6)² = −1/36

Zero covariance doesn’t imply independence
• We saw that 푋 ⟂⟂ 푌 ⇝ Cov[푋, 푌] = 0.
• Does Cov[푋, 푌] = 0 imply 푋 ⟂⟂ 푌? No!
• Counterexample: 푋 ∈ {−1, 0, 1} with equal probability and 푌 = 푋².
• Covariance is a measure of linear dependence, so it can miss non-linear dependence.
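The counterexample takes three lines to verify in code (a sketch, not in the original slides):

```python
# Counterexample: X uniform on {-1, 0, 1} and Y = X^2.
support = [-1, 0, 1]
p = 1 / 3  # equal probability on each point

E_X = sum(x * p for x in support)             # 0
E_Y = sum(x ** 2 * p for x in support)        # 2/3
E_XY = sum(x * x ** 2 * p for x in support)   # E[X^3] = 0 by symmetry
cov = E_XY - E_X * E_Y
print(cov)  # 0.0

# Yet Y is a deterministic function of X: P(Y = 1 | X = 1) = 1
# while P(Y = 1) = 2/3, so X and Y are clearly not independent.
```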
Properties of variances and covariances
• Properties of covariances:
1. Cov[푎푋 + 푏, 푐푌 + 푑] = 푎푐 Cov[푋, 푌].
2. Cov[푋, 푋] = 핍[푋]
• Properties of variances that we can state now that we know covariance:
1. 핍[푎푋 + 푏푌 + 푐] = 푎²핍[푋] + 푏²핍[푌] + 2푎푏 Cov[푋, 푌]
2. If 푋 and 푌 are independent, 핍[푋 + 푌] = 핍[푋] + 핍[푌].
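The first variance identity can be verified on the gay-marriage cross-tab (a sketch of my own; the constants 푎, 푏, 푐 are arbitrary choices):

```python
# Check V[aX + bY + c] = a^2 V[X] + b^2 V[Y] + 2ab Cov[X, Y]
# on the cross-tab joint p.m.f.
joint = {(1, 1): 0.30, (1, 0): 0.21, (0, 1): 0.22, (0, 0): 0.27}

def E(g):
    """Expectation of g(X, Y) under the joint p.m.f. (2-d LOTUS)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

V_X = E(lambda x, y: x ** 2) - E(lambda x, y: x) ** 2
V_Y = E(lambda x, y: y ** 2) - E(lambda x, y: y) ** 2
cov = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)

a, b, c = 2.0, -3.0, 5.0  # arbitrary constants
lhs = E(lambda x, y: (a * x + b * y + c) ** 2) - E(lambda x, y: a * x + b * y + c) ** 2
rhs = a ** 2 * V_X + b ** 2 * V_Y + 2 * a * b * cov
assert abs(lhs - rhs) < 1e-9  # identity holds
```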
Using properties of covariance
• Rescale our running example: 푍 = 2푋, 푊 = 2푌. • What’s the covariance of (푍, 푊)?
▶ Ugh, let’s avoid more integrals.
• Use properties of covariances:

Cov[푍, 푊] = Cov[2푋, 2푌] = 2 × 2 × Cov[푋, 푌] = −1/9
Correlation

• Covariance is not scale-free: Cov[2푋, 푌] = 2 Cov[푋, 푌]
▶ ⇝ hard to compare covariances across different r.v.s
▶ Is a relationship stronger? Or is it just due to rescaling?
• Correlation is a scale-free measure of linear dependence.

Correlation
The correlation between two r.v.s 푋 and 푌 is defined as:

휌 = 휌(푋, 푌) = Cov[푋, 푌] / √(핍[푋]핍[푌])

• Covariance after dividing out the scales of the respective variables.
• Correlation properties:
▶ −1 ≤ 휌 ≤ 1
▶ If |휌(푋, 푌)| = 1, then 푋 and 푌 are perfectly correlated, with a deterministic linear relationship: 푌 = 푎 + 푏푋.
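For the running density 푓(푥, 푦) = (푥 + 푦)/8, the pieces computed so far give Cov[푋, 푌] = −1/36 and, by the same kind of integral, 핍[푋] = 핍[푌] = 11/36, so 휌 = −1/11. A numeric sketch (my own, not in the slides) confirms this:

```python
# Numeric correlation for f(x, y) = (x + y)/8 on (0, 2) x (0, 2).
# Exact values: Cov = -1/36, V[X] = V[Y] = 11/36, rho = -1/11.
n = 400
h = 2 / n

E_X = E_X2 = E_XY = 0.0
for i in range(n):
    for j in range(n):
        x = (i + 0.5) * h
        y = (j + 0.5) * h
        w = (x + y) / 8 * h * h  # density times cell area
        E_X += x * w
        E_X2 += x ** 2 * w
        E_XY += x * y * w

V_X = E_X2 - E_X ** 2   # V[Y] = V[X] by symmetry
cov = E_XY - E_X ** 2   # E[Y] = E[X] by symmetry
rho = cov / V_X         # sqrt(V[X] V[Y]) = V[X] here
print(round(rho, 4))    # approx -0.0909 (= -1/11)
```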
3/ Conditional Distributions

Conditional distributions
• Conditional distribution: the distribution of 푌 if we know 푋 = 푥.

Conditional probability mass function
The conditional probability mass function or conditional p.m.f. of 푌 conditional on 푋 is
푓푌|푋(푦|푥) = ℙ(푋 = 푥, 푌 = 푦) / ℙ(푋 = 푥) = 푓푋,푌 (푥, 푦) / 푓푋(푥)
• Intuitive definition:

푓푌|푋 (푦|푥) = (probability that 푋 = 푥 and 푌 = 푦) / (probability that 푋 = 푥)

• This is a valid univariate probability distribution!
▶ 푓푌|푋 (푦|푥) ≥ 0 and ∑푦 푓푌|푋 (푦|푥) = 1
• If 푋 ⟂⟂푌 then 푓푌|푋 (푦|푥) = 푓푌 (푦) (conditional is the marginal)
Example: conditionals for gay marriage

                  Favor (푌 = 1)   Oppose (푌 = 0)   Marginal
Female (푋 = 1)        0.30             0.21           0.51
Male (푋 = 0)          0.22             0.27           0.49
Marginal              0.52             0.48
• Probability of favoring gay marriage conditional on being a man?

푓푌|푋 (푦 = 1|푥 = 0) = ℙ(푋 = 0, 푌 = 1) / ℙ(푋 = 0) = 0.22 / (0.22 + 0.27) ≈ 0.449
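This conditional probability takes one division in code (a sketch, with the cross-tab stored as a dict of my own design):

```python
# Conditional p.m.f. from the cross-tab: divide the joint cell
# by the conditioning marginal.
joint = {(1, 1): 0.30, (1, 0): 0.21, (0, 1): 0.22, (0, 0): 0.27}

p_X0 = joint[(0, 1)] + joint[(0, 0)]       # P(X = 0) = 0.49
p_favor_given_man = joint[(0, 1)] / p_X0   # P(Y = 1 | X = 0)

print(round(p_favor_given_man, 3))  # 0.449
```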
Example: conditionals for gay marriage

[Two bar charts: the conditional p.m.f. of gay marriage support (푌) among men and among women]
• Two values of 푋 ⇝ two univariate conditional distributions of 푌
Continuous conditional distributions

Conditional probability density function
The conditional p.d.f. of a continuous random variable is
푓푌|푋 (푦|푥) = 푓푋,푌 (푥, 푦) / 푓푋(푥)
assuming that 푓푋(푥) > 0.
• Implies:

ℙ(푎 < 푌 < 푏|푋 = 푥) = ∫푎^푏 푓푌|푋 (푦|푥) 푑푦.

• Based on the definition of the conditional p.m.f./p.d.f., we have the following factorization:
푓푋,푌 (푥, 푦) = 푓푌|푋 (푦|푥)푓푋(푥)
Conditional distributions as slices

• 푓푌|푋 (푦|푥0) is the conditional p.d.f. of 푌 when 푋 = 푥0
• 푓푌|푋 (푦|푥0) is proportional to the joint p.d.f. along the slice 푥 = 푥0: 푓푋,푌 (푥0, 푦)
• Normalize by dividing by 푓푋(푥0) to ensure a proper p.d.f.
Continuous conditional example
• Using our running example of 푓푋,푌 (푥, 푦) = (푥 + 푦)/8
• Earlier we calculated 푓푋(푥) = (푥 + 1)/4 • Calculate conditional:
푓푌|푋 (푦|푥) = 푓푋,푌 (푥, 푦) / 푓푋(푥)
          = [(푥 + 푦)/8] / [(푥 + 1)/4]
          = (푥 + 푦) / (2(푥 + 1))

• Remember the limits: this holds for 0 < 푦 < 2; 푓푌|푋 (푦|푥) = 0 otherwise.
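Since a conditional p.d.f. is a proper univariate density, it should integrate to 1 over 푦 for any fixed 푥 in the support. A quick numeric check (my own sketch, not from the slides):

```python
# For any fixed x in (0, 2), the conditional density
# (x + y) / (2 (x + 1)) should integrate to 1 over 0 < y < 2.
def f_cond(y, x):
    return (x + y) / (2 * (x + 1)) if 0 < y < 2 else 0.0

def integral_over_y(x, n=1000):
    h = 2 / n  # midpoint rule
    return sum(f_cond((j + 0.5) * h, x) for j in range(n)) * h

for x in (0.5, 1.0, 1.5):
    print(round(integral_over_y(x), 6))  # 1.0 each time
```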
Conditional Independence

Conditional independence
Two r.v.s 푋 and 푌 are conditionally independent given 푍 (written 푋 ⟂⟂ 푌|푍) if

푓푋,푌|푍 (푥, 푦|푧) = 푓푋|푍 (푥|푧)푓푌|푍 (푦|푧).
• 푋 and 푌 are independent within levels of 푍.
• Massively important for regression and causal inference.
• Example:
▶ 푋 = swimming accidents, 푌 = number of ice cream cones sold.
▶ In general, dependent.
▶ Conditional on 푍 = temperature, independent.
Summarizing conditional distributions

[Plot of two conditional densities, 푓(푦|0) and 푓(푦|1), over 푦]

• Conditional distributions are also univariate distributions, so we can summarize each with its mean and variance.
• Gives us insight into a key question:
▶ How does the mean of 푌 change as we change 푋?
Defining conditional expectations

Conditional expectation
The conditional expectation of 푌 conditional on 푋 = 푥 is:
피[푌|푋 = 푥] = ∑푦 푦 푓푌|푋 (푦|푥) for discrete 푌, and ∫−∞^∞ 푦 푓푌|푋 (푦|푥) 푑푦 for continuous 푌
• Intuition: exactly the same definition of the expected value with 푓푌|푋 (푦|푥) in place of 푓푌 (푦) • The expected value of the (univariate) conditional distribution. • This is a function of 푥!
Calculating conditional expectations

                  Favor (푌 = 1)   Oppose (푌 = 0)   Marginal
Female (푋 = 1)        0.30             0.21           0.51
Male (푋 = 0)          0.22             0.27           0.49
Marginal              0.52             0.48
• What’s the conditional expectation of support for gay marriage 푌 given someone is a man 푋 = 0?
피[푌|푋 = 0] = ∑푦 푦 푓푌|푋 (푦|푥 = 0)
           = 0 × 푓푌|푋 (0|푥 = 0) + 1 × 푓푌|푋 (1|푥 = 0)
           = 0.22 / (0.22 + 0.27) ≈ 0.449

Conditional expectations are random variables
• For a particular 푥, 피[푌|푋 = 푥] is a number.
• But 푋 takes on many possible values with uncertainty ⇝ 피[푌|푋] takes on many possible values with uncertainty.
• ⇝ Conditional expectations are random variables!
• Binary 푋:
피[푌|푋] = 피[푌|푋 = 0] with prob. ℙ(푋 = 0), and 피[푌|푋 = 1] with prob. ℙ(푋 = 1)

• Has an expectation, 피[피[푌|푋]], and a variance, 핍[피[푌|푋]].
Law of iterated expectations
• Average/mean of the conditional expectations: 피[피[푌|푋]].
▶ Can we connect this to the marginal (overall) expectation?
• Theorem (The Law of Iterated Expectations): if the expectations exist, then for discrete 푋,
피[푌] = 피[피[푌|푋]] = ∑푥 피[푌|푋 = 푥]푓푋(푥)
Example: law of iterated expectations

                  Favor (푌 = 1)   Oppose (푌 = 0)   Marginal
Female (푋 = 1)        0.30             0.21           0.51
Male (푋 = 0)          0.22             0.27           0.49
Marginal              0.52             0.48           1
• 피[푌|푋 = 1] = 0.30/0.51 ≈ 0.588 and 피[푌|푋 = 0] = 0.22/0.49 ≈ 0.449.
• 푓푋(1) = 0.51 (female) and 푓푋(0) = 0.49 (male).
• Plug into the iterated expectations:
피[피[푌|푋]] = 피[푌|푋 = 0]푓푋(0) + 피[푌|푋 = 1]푓푋(1)
          = 0.449 × 0.49 + 0.588 × 0.51
          ≈ 0.52 = 피[푌]
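The law of iterated expectations can be checked on the cross-tab directly (a sketch with the table stored as a dict; layout is my own):

```python
# Law of iterated expectations on the cross-tab.
joint = {(1, 1): 0.30, (1, 0): 0.21, (0, 1): 0.22, (0, 0): 0.27}
f_X = {x: joint[(x, 0)] + joint[(x, 1)] for x in (0, 1)}

# E[Y | X = x] for each x (Y is binary), then average over f_X.
E_Y_given_X = {x: joint[(x, 1)] / f_X[x] for x in (0, 1)}
E_Y_iterated = sum(E_Y_given_X[x] * f_X[x] for x in (0, 1))

# Compare with the marginal expectation computed directly.
E_Y = sum(y * p for (x, y), p in joint.items())
print(round(E_Y_iterated, 6), round(E_Y, 6))  # both 0.52
```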
Properties of conditional expectations
1. 피[푐(푋)|푋] = 푐(푋) for any function 푐(푋).
▶ Example: 피[푋²|푋] = 푋² (if we know 푋, then we also know 푋²)
2. If 푋 and 푌 are independent r.v.s, then
피[푌|푋 = 푥] = 피[푌].
3. If 푋 ⟂⟂푌|푍, then
피[푌|푋 = 푥, 푍 = 푧] = 피[푌|푍 = 푧].
Conditional variance

Conditional variance
The conditional variance of 푌 given 푋 = 푥 is defined as:
핍[푌|푋 = 푥] = 피 [(푌 − 피[푌|푋 = 푥])2|푋 = 푥]
• Conditional variance describes the spread of the conditional distribution around the conditional expectation. • Usual variance formula applied to conditional distribution. • Using LOTUS: ▶ Discrete 푌:
핍[푌|푋 = 푥] = ∑푦 (푦 − 피[푌|푋 = 푥])² 푓푌|푋 (푦|푥)
▶ Continuous 푌:

핍[푌|푋 = 푥] = ∫ (푦 − 피[푌|푋 = 푥])² 푓푌|푋 (푦|푥) 푑푦
Conditional variance is a random variable
• Again, 핍[푌|푋] is a random variable and a function of 푋, just like 피[푌|푋]. With a binary 푋:
핍[푌|푋] = 핍[푌|푋 = 0] with prob. ℙ(푋 = 0), and 핍[푌|푋 = 1] with prob. ℙ(푋 = 1)
Law of total variance
• We can also relate the marginal variance to the conditional variance and the conditional expectation. • Theorem (Law of Total Variance/EVE’s law):
핍[푌] = 피[핍[푌|푋]] + 핍[피[푌|푋]]
• The total variance can be decomposed into:
1. the average of the within-group variances (피[핍[푌|푋]]), and
2. how much the group averages vary between groups (핍[피[푌|푋]]).
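EVE's law can also be verified on the cross-tab (my own sketch; it uses the fact that a binary r.v. with success probability 푝 has variance 푝(1 − 푝)):

```python
# Law of total variance on the cross-tab:
# V[Y] = E[V[Y|X]] + V[E[Y|X]].
joint = {(1, 1): 0.30, (1, 0): 0.21, (0, 1): 0.22, (0, 0): 0.27}
f_X = {x: joint[(x, 0)] + joint[(x, 1)] for x in (0, 1)}

E_Y_given_X = {x: joint[(x, 1)] / f_X[x] for x in (0, 1)}  # binary Y
# Conditional variance of a binary r.v. is p (1 - p).
V_Y_given_X = {x: E_Y_given_X[x] * (1 - E_Y_given_X[x]) for x in (0, 1)}

within = sum(V_Y_given_X[x] * f_X[x] for x in (0, 1))   # E[V[Y|X]]
E_Y = sum(E_Y_given_X[x] * f_X[x] for x in (0, 1))
between = sum((E_Y_given_X[x] - E_Y) ** 2 * f_X[x] for x in (0, 1))  # V[E[Y|X]]

V_Y = E_Y * (1 - E_Y)  # marginal Y is also binary
assert abs(V_Y - (within + between)) < 1e-9  # EVE's law holds
```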
4/ Wrap-up

Review

• Multiple r.v.s require joint p.m.f.s and joint p.d.f.s.
• Multiple r.v.s can have distributions that exhibit dependence, as measured by covariance and correlation.
• The conditional expectation of one variable given the other is an important quantity that we’ll see over and over again.