
Lecture 10: Joint Distributions and the Law of Large Numbers


Statistics 104
Colin Rundel
February 20, 2012
Chapter 3.1, 3.3, 3.4

Midterm #1

- The exam will be passed back at the end of class.
- The exam was hard; on the whole the class did well: 75, 81, SD: 21.8, Max: 105.
- Final grades will be curved; midterm grades will be posted this week.


Joint Distributions

Joint Distributions - Example

Draw two socks at random, without replacement, from a drawer full of twelve colored socks: 6 black, 4 white, 2 purple.

Let B be the number of black socks and W the number of white socks drawn; then the distributions of B and W are given by:

            k = 0                    k = 1                     k = 2
P(B = k)    (6·5)/(12·11) = 15/66    (2·6·6)/(12·11) = 36/66   (6·5)/(12·11) = 15/66
P(W = k)    (8·7)/(12·11) = 28/66    (2·4·8)/(12·11) = 32/66   (4·3)/(12·11) = 6/66

Note - writing C(n, k) for the binomial coefficient, B ∼ HyperGeo(12, 6, 2), i.e. P(B = k) = C(6, k) C(6, 2−k) / C(12, 2), and W ∼ HyperGeo(12, 4, 2), i.e. P(W = k) = C(4, k) C(8, 2−k) / C(12, 2).

Joint Distributions - Example, cont.

The joint distribution of B and W is given by:

            W = 0    W = 1    W = 2   | P(B = b)
B = 0        1/66     8/66     6/66   |  15/66
B = 1       12/66    24/66        0   |  36/66
B = 2       15/66        0        0   |  15/66
P(W = w)    28/66    32/66     6/66   |

P(B = b, W = w) = C(6, b) C(4, w) C(2, 2−b−w) / C(12, 2)
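As a sanity check, the joint table above can be generated directly from this counting formula; here is a minimal Python sketch (the function name p_joint is mine, not from the lecture):

```python
from math import comb

# Drawer: 6 black, 4 white, 2 purple socks; draw 2 without replacement.
TOTAL = comb(12, 2)  # 66 equally likely pairs

def p_joint(b, w):
    """P(B = b, W = w) = C(6,b) * C(4,w) * C(2, 2-b-w) / C(12,2)."""
    purple = 2 - b - w
    if purple < 0:
        return 0.0
    return comb(6, b) * comb(4, w) * comb(2, purple) / TOTAL

# Print the table as numerators over 66: rows are b = 0, 1, 2, columns w = 0, 1, 2.
for b in range(3):
    print([round(p_joint(b, w) * 66) for w in range(3)])
# [1, 8, 6]
# [12, 24, 0]
# [15, 0, 0]
```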

Marginal Distributions

Note that the column and row sums are the distributions of B and W respectively:

P(B = b) = P(B = b, W = 0) + P(B = b, W = 1) + P(B = b, W = 2)
P(W = w) = P(B = 0, W = w) + P(B = 1, W = w) + P(B = 2, W = w)

These are the marginal distributions of B and W. In general,

P(X = x) = Σ_y P(X = x, Y = y) = Σ_y P(X = x | Y = y) P(Y = y)

Conditional Distribution

Conditional distributions are defined as we have seen previously:

P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y) = joint pmf / marginal pmf

Therefore the pmf for white socks given that no black socks were drawn is

P(W = w | B = 0) = P(W = w, B = 0) / P(B = 0) =
  (1/66) / (15/66) = 1/15   if w = 0
  (8/66) / (15/66) = 8/15   if w = 1
  (6/66) / (15/66) = 6/15   if w = 2
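The marginal and conditional pmfs can be recovered from the same joint pmf by summing and normalizing; a short sketch, reusing p_joint from the previous snippet:

```python
# Marginal of B: sum the joint pmf over w (row sums of the table).
p_B = [sum(p_joint(b, w) for w in range(3)) for b in range(3)]
print([round(p * 66) for p in p_B])            # [15, 36, 15]

# Conditional pmf of W given B = 0: joint pmf divided by the marginal pmf.
p_W_given_B0 = [p_joint(0, w) / p_B[0] for w in range(3)]
print([round(p * 15) for p in p_W_given_B0])   # [1, 8, 6], i.e. 1/15, 8/15, 6/15
```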


Expectation of Joint Distributions

E[g(X, Y)] = Σ_x Σ_y g(x, y) P(X = x, Y = y)

For example, if we define g(x, y) = x · y then

E(BW) = (0 · 0 · 1/66) + (0 · 1 · 8/66) + (0 · 2 · 6/66)
      + (1 · 0 · 12/66) + (1 · 1 · 24/66) + (1 · 2 · 0/66)
      + (2 · 0 · 15/66) + (2 · 1 · 0/66) + (2 · 2 · 0/66)
      = 24/66 = 4/11

Note that E(BW) ≠ E(B)E(W), since

E(B)E(W) = (0 · 15/66 + 1 · 36/66 + 2 · 15/66) × (0 · 28/66 + 1 · 32/66 + 2 · 6/66)
         = 66/66 × 44/66 = 2/3

This implies that B and W are not independent.

Independence, cont.

Remember that Cov(X, Y) = 0 when X and Y are independent.

Cov(B, W) = E[(B − E[B])(W − E[W])]
          = E(BW) − E(B)E(W)
          = 4/11 − 2/3 = −10/33 = −0.30303
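Both E(BW) and the covariance can be checked by summing g(b, w) P(B = b, W = w) over the joint table; a brief sketch, again reusing p_joint:

```python
# E[g(B, W)] = sum_b sum_w g(b, w) * P(B = b, W = w)
E_BW = sum(b * w * p_joint(b, w) for b in range(3) for w in range(3))
E_B  = sum(b * p_joint(b, w) for b in range(3) for w in range(3))
E_W  = sum(w * p_joint(b, w) for b in range(3) for w in range(3))

print(E_BW)              # 0.3636... = 4/11
print(E_B * E_W)         # 0.6666... = 2/3
print(E_BW - E_B * E_W)  # -0.3030... = -10/33, so B and W are not independent
```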

Expectation of Conditional Distributions

Conditional expectation works like any other expectation:

E(X | Y = y) = Σ_x x P(X = x | Y = y)

Therefore we can calculate things like the conditional mean and variance:

E(W | B = 0)  = 0 · 1/15 + 1 · 8/15 + 2 · 6/15 = 20/15 = 1.333
E(W² | B = 0) = 0² · 1/15 + 1² · 8/15 + 2² · 6/15 = 32/15 = 2.1333

Var(W | B = 0) = E(W² | B = 0) − E(W | B = 0)²
               = 32/15 − (4/3)² = 16/45 = 0.3556

Multinomial Distribution

Let X1, X2, ···, Xk be the k random variables that count the number of outcomes belonging to each of k categories in n trials, where the probability of category i is pi: (X1, ···, Xk) ∼ Multinom(n, p1, ···, pk). Then

P(X1 = x1, ···, Xk = xk) = f(x1, ···, xk | n, p1, ···, pk) = [n! / (x1! ··· xk!)] p1^x1 ··· pk^xk

where Σ_{i=1}^k xi = n and Σ_{i=1}^k pi = 1.

E(Xi) = n pi
Var(Xi) = n pi (1 − pi)
Cov(Xi, Xj) = −n pi pj   (for i ≠ j)
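A quick numeric check of the conditional mean and variance above, plus a simulation check of the multinomial moment formulas (the n = 10 and p = (0.2, 0.3, 0.5) values are arbitrary choices of mine, not from the lecture):

```python
import numpy as np

# Conditional mean and variance of W given B = 0, from the conditional pmf.
pmf = {0: 1/15, 1: 8/15, 2: 6/15}
E_W  = sum(w * p for w, p in pmf.items())       # 20/15 = 1.333...
E_W2 = sum(w * w * p for w, p in pmf.items())   # 32/15 = 2.133...
print(E_W, E_W2 - E_W**2)                       # variance = 16/45 = 0.3555...

# Multinomial moments checked by simulation.
n, p = 10, np.array([0.2, 0.3, 0.5])
rng = np.random.default_rng(0)
draws = rng.multinomial(n, p, size=200_000)
print(draws.mean(axis=0))      # ~ n * p_i             = [2, 3, 5]
print(draws.var(axis=0))       # ~ n * p_i * (1 - p_i) = [1.6, 2.1, 2.5]
print(np.cov(draws.T)[0, 1])   # ~ -n * p_1 * p_2      = -0.6
```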


Multinomial Example

Some regions of DNA have an elevated amount of GC relative to AT base pairs. In a normal region of DNA we expect equal amounts of A, C, G, and T, while a GC-rich region has twice as much GC as AT. If we observe the following sequence,

ACTGACTTGGACCCGACGGA

what is the probability that it came from a normal region versus a GC-rich region?

Law of Large Numbers

Markov's Inequality

For any X ≥ 0 and constant a > 0,

P(X ≥ a) ≤ E(X) / a

Corollary - Chebyshev's Inequality:

P(|X − E(X)| ≥ a) ≤ Var(X) / a²
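One way to approach this is to compute the multinomial likelihood of the observed base counts under each model. The sketch below assumes P(A) = P(C) = P(G) = P(T) = 1/4 in a normal region and P(G) = P(C) = 1/3, P(A) = P(T) = 1/6 in a GC-rich region (my reading of "twice as much GC as AT"), and it stops at the likelihoods; turning these into posterior probabilities would also require prior weights on the two region types.

```python
from math import factorial, prod
from collections import Counter

seq = "ACTGACTTGGACCCGACGGA"
counts = Counter(seq)        # A: 5, C: 6, T: 3, G: 6
n = len(seq)                 # 20

def multinomial_likelihood(probs):
    """P(observed base counts | base probabilities) under a multinomial model."""
    coef = factorial(n) / prod(factorial(x) for x in counts.values())
    return coef * prod(probs[b] ** x for b, x in counts.items())

normal  = {b: 1/4 for b in "ACGT"}
gc_rich = {"G": 1/3, "C": 1/3, "A": 1/6, "T": 1/6}

print(multinomial_likelihood(normal))    # likelihood under the normal model
print(multinomial_likelihood(gc_rich))   # likelihood under the GC-rich model
```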

Derivation of Markov's Inequality

Let X be a random variable such that X ≥ 0, and define the indicator

I_{X ≥ a} = 1 if X ≥ a, 0 if X < a

Then

a · I_{X ≥ a} ≤ X
E(a · I_{X ≥ a}) ≤ E(X)
a · E(I_{X ≥ a}) ≤ E(X)
P(X ≥ a) ≤ E(X) / a

Derivation of Chebyshev's Inequality

Proposition - for a non-decreasing f(x),

P(X ≥ a) = P(f(X) ≥ f(a)) ≤ E(f(X)) / f(a)

If we take the positive-valued random variable to be |X − E(X)| and f(x) = x², then

P(|X − E(X)| ≥ a) = P((X − E(X))² ≥ a²) ≤ E[(X − E(X))²] / a² = Var(X) / a²

If we define a = kσ where σ = √Var(X), then

P(|X − E(X)| ≥ kσ) ≤ Var(X) / (k²σ²) = 1/k²
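Both bounds are easy to illustrate numerically; a sketch using an Exponential(1) random variable (an arbitrary nonnegative choice of mine), comparing empirical tail probabilities to the bounds:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=1_000_000)   # E(X) = 1, Var(X) = 1

# Markov: P(X >= a) <= E(X) / a
a = 3.0
print((x >= a).mean(), x.mean() / a)             # ~0.0498 vs bound 0.333...

# Chebyshev: P(|X - E(X)| >= k*sigma) <= 1 / k^2
k = 2.0
print((np.abs(x - x.mean()) >= k * x.std()).mean(), 1 / k**2)   # ~0.05 vs bound 0.25
```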


Independent and Identically Distributed (iid)

A collection of random variables is iid if they all share the same probability distribution and are mutually independent.

Example: If X ∼ Binom(n, p) then X = Σ_{i=1}^n Yi, where Y1, ···, Yn are iid Bern(p).

Sums of iid Random Variables

Let X1, X2, ···, Xn ∼ D be iid, where D is some probability distribution with E(Xi) = µ and Var(Xi) = σ². We define Sn = X1 + X2 + ··· + Xn. Then

E(Sn) = E(X1 + X2 + ··· + Xn)
      = E(X1) + E(X2) + ··· + E(Xn)
      = µ + µ + ··· + µ = nµ

Var(Sn) = E[((X1 + X2 + ··· + Xn) − (µ + µ + ··· + µ))²]
        = E[((X1 − µ) + (X2 − µ) + ··· + (Xn − µ))²]
        = Σ_{i=1}^n E[(Xi − µ)²] + Σ_{i=1}^n Σ_{j≠i} E[(Xi − µ)(Xj − µ)]
        = Σ_{i=1}^n Var(Xi) + Σ_{i=1}^n Σ_{j≠i} Cov(Xi, Xj) = nσ²

(the covariance terms are all zero by independence)
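A quick simulation check of E(Sn) = nµ and Var(Sn) = nσ², using Bernoulli(p) summands so that Sn ∼ Binom(n, p) (the values n = 30, p = 0.4 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 0.4                                           # mu = p, sigma^2 = p(1 - p)
S = rng.binomial(1, p, size=(100_000, n)).sum(axis=1)    # 100,000 replicates of S_n

print(S.mean(), n * p)              # E(S_n) = n*mu        = 12
print(S.var(),  n * p * (1 - p))    # Var(S_n) = n*sigma^2 = 7.2
```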

Means of iid Random Variables

Let X1, X2, ···, Xn ∼ D be iid, where D is some probability distribution with E(Xi) = µ and Var(Xi) = σ². We define X̄n = (X1 + X2 + ··· + Xn)/n = Sn/n. Then

E(X̄n) = E(Sn/n) = E(Sn)/n = µ

Var(X̄n) = Var(Sn/n) = (1/n²) Var(Sn) = nσ²/n² = σ²/n

Weak Law of Large Numbers

Based on these results and Markov's Inequality we can show the following:

P(|X̄n − µ| ≥ ε) = P(|Sn − nµ| ≥ nε)
                = P((Sn − nµ)² ≥ n²ε²)
                ≤ E[(Sn − nµ)²] / (n²ε²)
                = nσ² / (n²ε²) = σ² / (nε²)

Therefore, given σ² < ∞,

lim_{n→∞} P(|X̄n − µ| ≥ ε) = 0
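The σ²/(nε²) bound, and the convergence it implies, can be seen by simulation; a minimal sketch using Uniform(0, 1) summands (µ = 1/2, σ² = 1/12) and an arbitrary ε = 0.05:

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma2, eps = 0.5, 1 / 12, 0.05

for n in (10, 100, 1000):
    xbar = rng.uniform(0, 1, size=(5_000, n)).mean(axis=1)   # 5,000 replicates of X-bar_n
    p_hat = np.mean(np.abs(xbar - mu) >= eps)
    print(n, p_hat, sigma2 / (n * eps**2))   # empirical P(|X-bar_n - mu| >= eps) vs the bound
```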


Law of Large Numbers

Weak Law of Large Numbers (X̄n converges in probability to µ):

lim_{n→∞} P(|X̄n − µ| > ε) = 0

Strong Law of Large Numbers (X̄n converges to µ almost surely):

P( lim_{n→∞} X̄n = µ ) = 1

The Strong LLN is a more powerful result (the Strong LLN implies the Weak LLN), but its proof is more complicated.

These results justify the long-run frequency definition of probability.

LLN and CLT

The law of large numbers shows us that

lim_{n→∞} (Sn − nµ)/n = lim_{n→∞} (X̄n − µ) → 0

which shows that n grows much faster than Sn − nµ.

What happens if we divide by something that grows more slowly than n, like √n?

lim_{n→∞} (Sn − nµ)/√n = lim_{n→∞} √n (X̄n − µ) → N(0, σ²) in distribution

This is the Central Limit Theorem, of which the DeMoivre-Laplace theorem for the normal approximation to the binomial is a special case. Hopefully by the end of this class we will have the tools to prove this.
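A short simulation of the CLT scaling: with Uniform(0, 1) summands (σ² = 1/12), √n(X̄n − µ) should have mean ≈ 0 and variance ≈ σ² for large n (the specific n and replicate counts below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma2, n = 0.5, 1 / 12, 1_000

xbar = rng.uniform(0, 1, size=(20_000, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - mu)        # should be approximately N(0, sigma^2)
print(z.mean(), z.var(), sigma2)    # ~0, ~0.0833, 0.0833
```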
