
Lecture 10: Joint Distributions and the Law of Large Numbers


Statistics 104
Colin Rundel
February 20, 2012
Chapter 3.1, 3.3, 3.4

Midterm #1

- The exam will be passed back at the end of class.
- The exam was hard; on the whole the class did well: 75, 81, SD: 21.8, Max: 105.
- Final grades will be curved; midterm grades will be posted this week.


Joint Distributions

Joint Distributions - Example

Draw two socks at random, without replacement, from a drawer full of twelve colored socks: 6 black, 4 white, 2 purple.

Let B be the number of black socks and W the number of white socks drawn; then the distributions of B and W are given by:

            k = 0                    k = 1                     k = 2
P(B = k)    (6·5)/(12·11) = 15/66    (2·6·6)/(12·11) = 36/66   (6·5)/(12·11) = 15/66
P(W = k)    (8·7)/(12·11) = 28/66    (2·4·8)/(12·11) = 32/66   (4·3)/(12·11) = 6/66

Note - writing C(n, k) for the binomial coefficient, B ∼ HyperGeo(12, 6, 2), i.e. P(B = k) = C(6, k) C(6, 2−k) / C(12, 2), and W ∼ HyperGeo(12, 4, 2), i.e. P(W = k) = C(4, k) C(8, 2−k) / C(12, 2).

Joint Distributions - Example, cont.

The joint distribution of B and W is given by:

            W = 0    W = 1    W = 2   | P(B = b)
B = 0        1/66     8/66     6/66   |  15/66
B = 1       12/66    24/66        0   |  36/66
B = 2       15/66        0        0   |  15/66
P(W = w)    28/66    32/66     6/66   |

P(B = b, W = w) = C(6, b) C(4, w) C(2, 2−b−w) / C(12, 2)
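As a sanity check, the joint table above can be generated directly from this counting formula; here is a minimal Python sketch (the function name p_joint is mine, not from the lecture):

```python
from math import comb

# Drawer: 6 black, 4 white, 2 purple socks; draw 2 without replacement.
TOTAL = comb(12, 2)  # 66 equally likely pairs

def p_joint(b, w):
    """P(B = b, W = w) = C(6,b) * C(4,w) * C(2, 2-b-w) / C(12,2)."""
    purple = 2 - b - w
    if purple < 0:
        return 0.0
    return comb(6, b) * comb(4, w) * comb(2, purple) / TOTAL

# Print the table as numerators over 66: rows are b = 0, 1, 2, columns w = 0, 1, 2.
for b in range(3):
    print([round(p_joint(b, w) * 66) for w in range(3)])
# [1, 8, 6]
# [12, 24, 0]
# [15, 0, 0]
```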

Marginal Distributions

Note that the column and row sums are the distributions of B and W respectively:

P(B = b) = P(B = b, W = 0) + P(B = b, W = 1) + P(B = b, W = 2)
P(W = w) = P(B = 0, W = w) + P(B = 1, W = w) + P(B = 2, W = w)

These are the marginal distributions of B and W. In general,

P(X = x) = Σ_y P(X = x, Y = y) = Σ_y P(X = x | Y = y) P(Y = y)

Conditional Distribution

Conditional distributions are defined as we have seen previously:

P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y) = joint pmf / marginal pmf

Therefore the pmf for white socks given that no black socks were drawn is

P(W = w | B = 0) = P(W = w, B = 0) / P(B = 0) =
  (1/66) / (15/66) = 1/15   if w = 0
  (8/66) / (15/66) = 8/15   if w = 1
  (6/66) / (15/66) = 6/15   if w = 2
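The marginal and conditional pmfs can be recovered from the same joint pmf by summing and normalizing; a short sketch, reusing p_joint from the previous snippet:

```python
# Marginal of B: sum the joint pmf over w (row sums of the table).
p_B = [sum(p_joint(b, w) for w in range(3)) for b in range(3)]
print([round(p * 66) for p in p_B])            # [15, 36, 15]

# Conditional pmf of W given B = 0: joint pmf divided by the marginal pmf.
p_W_given_B0 = [p_joint(0, w) / p_B[0] for w in range(3)]
print([round(p * 15) for p in p_W_given_B0])   # [1, 8, 6], i.e. 1/15, 8/15, 6/15
```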


Expectation of Joint Distributions

E[g(X, Y)] = Σ_x Σ_y g(x, y) P(X = x, Y = y)

For example, if we define g(x, y) = x · y then

E(BW) = (0 · 0 · 1/66) + (0 · 1 · 8/66) + (0 · 2 · 6/66)
      + (1 · 0 · 12/66) + (1 · 1 · 24/66) + (1 · 2 · 0/66)
      + (2 · 0 · 15/66) + (2 · 1 · 0/66) + (2 · 2 · 0/66)
      = 24/66 = 4/11

Note that E(BW) ≠ E(B)E(W), since

E(B)E(W) = (0 · 15/66 + 1 · 36/66 + 2 · 15/66) × (0 · 28/66 + 1 · 32/66 + 2 · 6/66)
         = 66/66 × 44/66 = 2/3

This implies that B and W are not independent.

Independence, cont.

Remember that Cov(X, Y) = 0 when X and Y are independent.

Cov(B, W) = E[(B − E[B])(W − E[W])]
          = E(BW) − E(B)E(W)
          = 4/11 − 2/3 = −10/33 = −0.30303
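Both E(BW) and the covariance can be checked by summing g(b, w) P(B = b, W = w) over the joint table; a brief sketch, again reusing p_joint:

```python
# E[g(B, W)] = sum_b sum_w g(b, w) * P(B = b, W = w)
E_BW = sum(b * w * p_joint(b, w) for b in range(3) for w in range(3))
E_B  = sum(b * p_joint(b, w) for b in range(3) for w in range(3))
E_W  = sum(w * p_joint(b, w) for b in range(3) for w in range(3))

print(E_BW)              # 0.3636... = 4/11
print(E_B * E_W)         # 0.6666... = 2/3
print(E_BW - E_B * E_W)  # -0.3030... = -10/33, so B and W are not independent
```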

Expectation of Conditional Distributions

Conditional expectation works like any other expectation:

E(X | Y = y) = Σ_x x P(X = x | Y = y)

Therefore we can calculate things like the conditional mean and variance:

E(W | B = 0)  = 0 · 1/15 + 1 · 8/15 + 2 · 6/15 = 20/15 = 1.333
E(W² | B = 0) = 0² · 1/15 + 1² · 8/15 + 2² · 6/15 = 32/15 = 2.1333

Var(W | B = 0) = E(W² | B = 0) − E(W | B = 0)²
               = 32/15 − (4/3)² = 16/45 = 0.3556

Multinomial Distribution

Let X1, X2, ···, Xk be the k random variables that count the number of outcomes belonging to each of k categories in n trials, where the probability of category i is pi: (X1, ···, Xk) ∼ Multinom(n, p1, ···, pk). Then

P(X1 = x1, ···, Xk = xk) = f(x1, ···, xk | n, p1, ···, pk) = [n! / (x1! ··· xk!)] p1^x1 ··· pk^xk

where Σ_{i=1}^k xi = n and Σ_{i=1}^k pi = 1.

E(Xi) = n pi
Var(Xi) = n pi (1 − pi)
Cov(Xi, Xj) = −n pi pj   (for i ≠ j)
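A quick numeric check of the conditional mean and variance above, plus a simulation check of the multinomial moment formulas (the n = 10 and p = (0.2, 0.3, 0.5) values are arbitrary choices of mine, not from the lecture):

```python
import numpy as np

# Conditional mean and variance of W given B = 0, from the conditional pmf.
pmf = {0: 1/15, 1: 8/15, 2: 6/15}
E_W  = sum(w * p for w, p in pmf.items())       # 20/15 = 1.333...
E_W2 = sum(w * w * p for w, p in pmf.items())   # 32/15 = 2.133...
print(E_W, E_W2 - E_W**2)                       # variance = 16/45 = 0.3555...

# Multinomial moments checked by simulation.
n, p = 10, np.array([0.2, 0.3, 0.5])
rng = np.random.default_rng(0)
draws = rng.multinomial(n, p, size=200_000)
print(draws.mean(axis=0))      # ~ n * p_i             = [2, 3, 5]
print(draws.var(axis=0))       # ~ n * p_i * (1 - p_i) = [1.6, 2.1, 2.5]
print(np.cov(draws.T)[0, 1])   # ~ -n * p_1 * p_2      = -0.6
```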


Multinomial Example

Some regions of DNA have an elevated amount of GC relative to AT base pairs. In a normal region of DNA we expect equal amounts of A, C, G, and T, while a GC-rich region has twice as much GC as AT. If we observe the following sequence,

ACTGACTTGGACCCGACGGA

what is the probability that it came from a normal region versus a GC-rich region?

Law of Large Numbers

Markov's Inequality

For any X ≥ 0 and constant a > 0,

P(X ≥ a) ≤ E(X) / a

Corollary - Chebyshev's Inequality:

P(|X − E(X)| ≥ a) ≤ Var(X) / a²
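One way to approach this is to compute the multinomial likelihood of the observed base counts under each model. The sketch below assumes P(A) = P(C) = P(G) = P(T) = 1/4 in a normal region and P(G) = P(C) = 1/3, P(A) = P(T) = 1/6 in a GC-rich region (my reading of "twice as much GC as AT"), and it stops at the likelihoods; turning these into posterior probabilities would also require prior weights on the two region types.

```python
from math import factorial, prod
from collections import Counter

seq = "ACTGACTTGGACCCGACGGA"
counts = Counter(seq)        # A: 5, C: 6, T: 3, G: 6
n = len(seq)                 # 20

def multinomial_likelihood(probs):
    """P(observed base counts | base probabilities) under a multinomial model."""
    coef = factorial(n) / prod(factorial(x) for x in counts.values())
    return coef * prod(probs[b] ** x for b, x in counts.items())

normal  = {b: 1/4 for b in "ACGT"}
gc_rich = {"G": 1/3, "C": 1/3, "A": 1/6, "T": 1/6}

print(multinomial_likelihood(normal))    # likelihood under the normal model
print(multinomial_likelihood(gc_rich))   # likelihood under the GC-rich model
```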

Derivation of Markov's Inequality

Let X be a random variable such that X ≥ 0, and define the indicator

I_{X ≥ a} = 1 if X ≥ a, 0 if X < a

Then

a · I_{X ≥ a} ≤ X
E(a · I_{X ≥ a}) ≤ E(X)
a · E(I_{X ≥ a}) ≤ E(X)
P(X ≥ a) ≤ E(X) / a

Derivation of Chebyshev's Inequality

Proposition - for a non-decreasing f(x),

P(X ≥ a) = P(f(X) ≥ f(a)) ≤ E(f(X)) / f(a)

If we take the positive-valued random variable to be |X − E(X)| and f(x) = x², then

P(|X − E(X)| ≥ a) = P((X − E(X))² ≥ a²) ≤ E[(X − E(X))²] / a² = Var(X) / a²

If we define a = kσ where σ = √Var(X), then

P(|X − E(X)| ≥ kσ) ≤ Var(X) / (k²σ²) = 1/k²
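Both bounds are easy to illustrate numerically; a sketch using an Exponential(1) random variable (an arbitrary nonnegative choice of mine), comparing empirical tail probabilities to the bounds:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=1_000_000)   # E(X) = 1, Var(X) = 1

# Markov: P(X >= a) <= E(X) / a
a = 3.0
print((x >= a).mean(), x.mean() / a)             # ~0.0498 vs bound 0.333...

# Chebyshev: P(|X - E(X)| >= k*sigma) <= 1 / k^2
k = 2.0
print((np.abs(x - x.mean()) >= k * x.std()).mean(), 1 / k**2)   # ~0.05 vs bound 0.25
```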


Independent and Identically Distributed (iid)

A collection of random variables is iid if they all share the same probability distribution and are mutually independent.

Example: If X ∼ Binom(n, p) then X = Σ_{i=1}^n Yi, where Y1, ···, Yn are iid Bern(p).

Sums of iid Random Variables

Let X1, X2, ···, Xn ∼ D be iid, where D is some probability distribution with E(Xi) = µ and Var(Xi) = σ². We define Sn = X1 + X2 + ··· + Xn. Then

E(Sn) = E(X1 + X2 + ··· + Xn)
      = E(X1) + E(X2) + ··· + E(Xn)
      = µ + µ + ··· + µ = nµ

Var(Sn) = E[((X1 + X2 + ··· + Xn) − (µ + µ + ··· + µ))²]
        = E[((X1 − µ) + (X2 − µ) + ··· + (Xn − µ))²]
        = Σ_{i=1}^n E[(Xi − µ)²] + Σ_{i=1}^n Σ_{j≠i} E[(Xi − µ)(Xj − µ)]
        = Σ_{i=1}^n Var(Xi) + Σ_{i=1}^n Σ_{j≠i} Cov(Xi, Xj) = nσ²

(the covariance terms are all zero by independence)
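A quick simulation check of E(Sn) = nµ and Var(Sn) = nσ², using Bernoulli(p) summands so that Sn ∼ Binom(n, p) (the values n = 30, p = 0.4 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 0.4                                           # mu = p, sigma^2 = p(1 - p)
S = rng.binomial(1, p, size=(100_000, n)).sum(axis=1)    # 100,000 replicates of S_n

print(S.mean(), n * p)              # E(S_n) = n*mu        = 12
print(S.var(),  n * p * (1 - p))    # Var(S_n) = n*sigma^2 = 7.2
```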

Means of iid Random Variables

Let X1, X2, ···, Xn ∼ D be iid, where D is some probability distribution with E(Xi) = µ and Var(Xi) = σ². We define X̄n = (X1 + X2 + ··· + Xn)/n = Sn/n. Then

E(X̄n) = E(Sn/n) = E(Sn)/n = µ

Var(X̄n) = Var(Sn/n) = (1/n²) Var(Sn) = nσ²/n² = σ²/n

Weak Law of Large Numbers

Based on these results and Markov's Inequality we can show the following:

P(|X̄n − µ| ≥ ε) = P(|Sn − nµ| ≥ nε)
                = P((Sn − nµ)² ≥ n²ε²)
                ≤ E[(Sn − nµ)²] / (n²ε²)
                = nσ² / (n²ε²) = σ² / (nε²)

Therefore, given σ² < ∞,

lim_{n→∞} P(|X̄n − µ| ≥ ε) = 0
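The σ²/(nε²) bound, and the convergence it implies, can be seen by simulation; a minimal sketch using Uniform(0, 1) summands (µ = 1/2, σ² = 1/12) and an arbitrary ε = 0.05:

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma2, eps = 0.5, 1 / 12, 0.05

for n in (10, 100, 1000):
    xbar = rng.uniform(0, 1, size=(5_000, n)).mean(axis=1)   # 5,000 replicates of X-bar_n
    p_hat = np.mean(np.abs(xbar - mu) >= eps)
    print(n, p_hat, sigma2 / (n * eps**2))   # empirical P(|X-bar_n - mu| >= eps) vs the bound
```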


Law of Large Numbers

Weak Law of Large Numbers (X̄n converges in probability to µ):

lim_{n→∞} P(|X̄n − µ| > ε) = 0

Strong Law of Large Numbers (X̄n converges to µ almost surely):

P( lim_{n→∞} X̄n = µ ) = 1

The Strong LLN is a more powerful result (the Strong LLN implies the Weak LLN), but its proof is more complicated.

These results justify the long-run frequency definition of probability.

LLN and CLT

The law of large numbers shows us that

lim_{n→∞} (Sn − nµ)/n = lim_{n→∞} (X̄n − µ) → 0

which shows that n grows much faster than Sn − nµ.

What happens if we divide by something that grows more slowly than n, like √n?

lim_{n→∞} (Sn − nµ)/√n = lim_{n→∞} √n (X̄n − µ) → N(0, σ²) in distribution

This is the Central Limit Theorem, of which the DeMoivre-Laplace theorem for the normal approximation to the binomial is a special case. Hopefully by the end of this class we will have the tools to prove this.
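A short simulation of the CLT scaling: with Uniform(0, 1) summands (σ² = 1/12), √n(X̄n − µ) should have mean ≈ 0 and variance ≈ σ² for large n (the specific n and replicate counts below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma2, n = 0.5, 1 / 12, 1_000

xbar = rng.uniform(0, 1, size=(20_000, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - mu)        # should be approximately N(0, sigma^2)
print(z.mean(), z.var(), sigma2)    # ~0, ~0.0833, 0.0833
```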
