
Lecture 7: Joint Distributions and the Law of Large Numbers
Sta230/Mth230, Colin Rundel, February 7, 2014
Chapters 3.1, 3.3, 3.4

A Little More E(X): Practice Problem - Skewness of the Bernoulli

Let $X \sim \text{Bern}(p)$. We have shown that

$$E(X) = p \qquad \text{Var}(X) = p(1-p)$$

Find the skewness of $X$, where skewness is defined as

$$E\left[\left(\frac{X - E(X)}{\text{SD}(X)}\right)^3\right] = \frac{E[(X - \mu)^3]}{\sigma^3}$$
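A minimal numerical sketch of this definition in Python (the helper name `bernoulli_skewness` is ours, not from the slides): it evaluates $E[(X-\mu)^3]/\sigma^3$ by enumerating the two outcomes of a Bernoulli.

```python
import math

def bernoulli_skewness(p):
    """Skewness of X ~ Bern(p) by direct enumeration of the two outcomes."""
    mu = p                             # E(X) = p
    sigma = math.sqrt(p * (1 - p))     # SD(X) = sqrt(p(1-p))
    # E[(X - mu)^3] = (0 - mu)^3 P(X=0) + (1 - mu)^3 P(X=1)
    third_central = (0 - mu) ** 3 * (1 - p) + (1 - mu) ** 3 * p
    return third_central / sigma ** 3

for p in (0.1, 0.5, 0.9):
    print(f"p = {p}: skewness = {bernoulli_skewness(p):.4f}")
```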

Joint Distributions

Joint Distributions - Example

Draw two socks at random, without replacement, from a drawer full of twelve colored socks: 6 black, 4 white, 2 purple.

Let $B$ be the number of black socks and $W$ the number of white socks drawn. Then the marginal distributions of $B$ and $W$ are given by:

$$P(B = k) = \frac{\binom{6}{k}\binom{6}{2-k}}{\binom{12}{2}}: \quad P(B = 0) = \frac{6 \cdot 5}{12 \cdot 11} = \frac{15}{66}, \quad P(B = 1) = 2\,\frac{6 \cdot 6}{12 \cdot 11} = \frac{36}{66}, \quad P(B = 2) = \frac{6 \cdot 5}{12 \cdot 11} = \frac{15}{66}$$

$$P(W = k) = \frac{\binom{4}{k}\binom{8}{2-k}}{\binom{12}{2}}: \quad P(W = 0) = \frac{8 \cdot 7}{12 \cdot 11} = \frac{28}{66}, \quad P(W = 1) = 2\,\frac{4 \cdot 8}{12 \cdot 11} = \frac{32}{66}, \quad P(W = 2) = \frac{4 \cdot 3}{12 \cdot 11} = \frac{6}{66}$$

Note - $B \sim \text{HyperGeo}(12, 6, 2)$ and $W \sim \text{HyperGeo}(12, 4, 2)$.

Joint Distributions - Example, cont.

The joint distribution of $B$ and $W$ is given by

$$P(B = b, W = w) = \frac{\binom{6}{b}\binom{4}{w}\binom{2}{2-b-w}}{\binom{12}{2}}$$

which produces the following table (marginals in the last row and column):

              W = 0    W = 1    W = 2  |  P(B = b)
    B = 0      1/66     8/66     6/66  |   15/66
    B = 1     12/66    24/66     0     |   36/66
    B = 2     15/66     0        0     |   15/66
    ----------------------------------------------
    P(W = w)  28/66    32/66     6/66  |
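A short sketch that rebuilds this table from the counting formula (the helper name `p_joint` is ours); it reproduces the joint entries and checks that the row and column sums match the marginals above.

```python
from math import comb

# Joint pmf P(B = b, W = w) = C(6,b) C(4,w) C(2, 2-b-w) / C(12,2)
def p_joint(b, w):
    if b + w > 2:
        return 0.0
    return comb(6, b) * comb(4, w) * comb(2, 2 - b - w) / comb(12, 2)

# Reproduce the table; all entries have denominator 66
for b in range(3):
    print([round(p_joint(b, w) * 66) for w in range(3)], "/ 66")

# Row and column sums give the marginals of B and W
print("P(B=k):", [round(sum(p_joint(b, w) for w in range(3)) * 66) for b in range(3)], "/ 66")
print("P(W=k):", [round(sum(p_joint(b, w) for b in range(3)) * 66) for w in range(3)], "/ 66")
```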

Marginal Distributions

Note that the row and column sums of the joint table are the distributions of $B$ and $W$ respectively:

$$P(B = b) = P(B = b, W = 0) + P(B = b, W = 1) + P(B = b, W = 2)$$
$$P(W = w) = P(B = 0, W = w) + P(B = 1, W = w) + P(B = 2, W = w)$$

These are the marginal distributions of $B$ and $W$. In general,

$$P(X = x) = \sum_{\text{all } y} P(X = x, Y = y) = \sum_{\text{all } y} P(X = x \mid Y = y)\,P(Y = y)$$
$$\phantom{P(X = x)} = \int_{\text{all } y} P(X = x, Y = y)\,dy = \int_{\text{all } y} P(X = x \mid Y = y)\,P(Y = y)\,dy$$

Conditional Distribution

Conditional distributions are defined as we have seen previously:

$$P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{\text{joint}}{\text{marginal}}$$

Therefore the pmf for white socks given that no black socks were drawn is

$$P(W = w \mid B = 0) = \frac{P(W = w, B = 0)}{P(B = 0)} =
\begin{cases}
\frac{1/66}{15/66} = \frac{1}{15} & \text{if } w = 0 \\
\frac{8/66}{15/66} = \frac{8}{15} & \text{if } w = 1 \\
\frac{6/66}{15/66} = \frac{6}{15} & \text{if } w = 2
\end{cases}$$
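Conditioning is just the joint divided by the marginal; a two-line continuation of the sketch above (reusing its `p_joint` helper):

```python
# Conditional pmf of W given B = 0: joint / marginal,
# reusing the p_joint() helper defined in the previous sketch.
p_b0 = sum(p_joint(0, w) for w in range(3))           # P(B = 0) = 15/66
cond = {w: p_joint(0, w) / p_b0 for w in range(3)}
print(cond)  # {0: 0.0666..., 1: 0.5333..., 2: 0.4}, i.e. 1/15, 8/15, 6/15
```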


Expectation of Discrete Joint Distributions

$$E[g(X, Y)] = \sum_x \sum_y g(x, y)\,P(X = x, Y = y)$$

For example, we can define $g(x, y) = x \cdot y$; then

$$\begin{aligned}
E(BW) &= (0 \cdot 0 \cdot 1/66) + (0 \cdot 1 \cdot 8/66) + (0 \cdot 2 \cdot 6/66) \\
      &\quad + (1 \cdot 0 \cdot 12/66) + (1 \cdot 1 \cdot 24/66) + (1 \cdot 2 \cdot 0/66) \\
      &\quad + (2 \cdot 0 \cdot 15/66) + (2 \cdot 1 \cdot 0/66) + (2 \cdot 2 \cdot 0/66) \\
      &= 24/66 = 4/11
\end{aligned}$$

Note that $E(BW) \ne E(B)E(W)$, since

$$E(B)E(W) = \left(0 \cdot \tfrac{15}{66} + 1 \cdot \tfrac{36}{66} + 2 \cdot \tfrac{15}{66}\right) \times \left(0 \cdot \tfrac{28}{66} + 1 \cdot \tfrac{32}{66} + 2 \cdot \tfrac{6}{66}\right) = \frac{66}{66} \times \frac{44}{66} = \frac{2}{3}$$

This implies that $B$ and $W$ are not independent and $\text{Cov}(B, W) \ne 0$.

Expectation of Discrete Conditional Distributions

A conditional distribution works like any other discrete distribution:

$$E(X \mid Y = y) = \sum_x x\,P(X = x \mid Y = y)$$

Therefore we can calculate things like conditional means and variances:

$$E(W \mid B = 0) = 0 \cdot \tfrac{1}{15} + 1 \cdot \tfrac{8}{15} + 2 \cdot \tfrac{6}{15} = \tfrac{20}{15} = \tfrac{4}{3} \approx 1.333$$

$$E(W^2 \mid B = 0) = 0^2 \cdot \tfrac{1}{15} + 1^2 \cdot \tfrac{8}{15} + 2^2 \cdot \tfrac{6}{15} = \tfrac{32}{15} \approx 2.133$$

$$\text{Var}(W \mid B = 0) = E(W^2 \mid B = 0) - E(W \mid B = 0)^2 = \tfrac{32}{15} - \left(\tfrac{4}{3}\right)^2 = \tfrac{16}{45} \approx 0.356$$
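A small check of these calculations, again reusing the `p_joint` helper from the earlier sketch:

```python
# Check E(BW) against E(B)E(W), and the conditional mean/variance of W | B = 0,
# reusing p_joint() from the earlier sketch.
E_BW = sum(b * w * p_joint(b, w) for b in range(3) for w in range(3))
E_B = sum(b * p_joint(b, w) for b in range(3) for w in range(3))
E_W = sum(w * p_joint(b, w) for b in range(3) for w in range(3))
print(E_BW, E_B * E_W)          # 4/11 = 0.3636... vs 2/3 = 0.6666... -> dependent

p_b0 = sum(p_joint(0, w) for w in range(3))
E_W_b0  = sum(w * p_joint(0, w) for w in range(3)) / p_b0        # 4/3
E_W2_b0 = sum(w**2 * p_joint(0, w) for w in range(3)) / p_b0     # 32/15
print(E_W_b0, E_W2_b0 - E_W_b0**2)                               # 1.333..., 0.3555...
```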

Joint Distribution - Example

Suppose that $X$ and $Y$ have a discrete joint distribution for which the joint pmf is defined as follows:

$$f(x, y) = \begin{cases} c\,|x + y| & \text{for } x, y \in \{-2, -1, 0, 1, 2\} \\ 0 & \text{otherwise} \end{cases}$$

a) What is the value of the constant $c$?
b) $P(X = 0 \text{ and } Y = -2)$
c) $P(X = 1)$
d) $P(X = -1 \mid Y = 0)$
e) $P(|X - Y| \le 1)$

From DeGroot and Schervish (2011)

Joint Distribution - Example

Suppose that $X$ and $Y$ have a discrete joint distribution for which the joint pmf is defined as follows:

$$f(x, y) = \begin{cases} \frac{1}{30}(x + y) & \text{for } x = 0, 1, 2 \text{ and } y = 0, 1, 2, 3 \\ 0 & \text{otherwise} \end{cases}$$

a) Determine the marginal pmf's of $X$ and $Y$.
b) Are $X$ and $Y$ independent?

From DeGroot and Schervish (2011)
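For part (a) of the first problem, the normalizing constant just makes the pmf sum to 1 over the support; a minimal enumeration sketch (using `fractions` for an exact answer), with the remaining parts answerable by similar sums:

```python
from fractions import Fraction

# Find c so that the sum of c|x + y| over x, y in {-2, ..., 2} equals 1
support = range(-2, 3)
total = sum(abs(x + y) for x in support for y in support)
c = Fraction(1, total)
print(c)  # the normalizing constant
```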


Multinomial Distribution

Let $X_1, X_2, \ldots, X_k$ be the random variables counting the number of outcomes belonging to each of categories $1, \ldots, k$ in $n$ trials, with the probability of success for category $i$ being $p_i$. Then $(X_1, \ldots, X_k) \sim \text{Multinom}(n, p_1, \ldots, p_k)$ with

$$P(X_1 = x_1, \ldots, X_k = x_k) = f(x_1, \ldots, x_k \mid n, p_1, \ldots, p_k) = \frac{n!}{x_1! \cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k}$$

where $\sum_{i=1}^k x_i = n$ and $\sum_{i=1}^k p_i = 1$.

$$E(X_i) = np_i$$

$$\text{Var}(X_i) = np_i(1 - p_i)$$

$$\text{Cov}(X_i, X_j) = -np_i p_j \quad \text{for } i \ne j$$

Multinomial Example

Some regions of DNA have an elevated amount of GC relative to AT base pairs. In a normal region of DNA we expect equal amounts of A, C, G, and T, while a GC-rich region has twice as much GC as AT. If we observe the sequence ACTGACTTGGACCCGACGGA, what is the probability that it came from a normal region versus a GC-rich region?
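A sketch of the likelihood comparison. The GC-rich probabilities below are one reading of "twice as much GC as AT" ($p_G = p_C = 1/3$, $p_A = p_T = 1/6$, assuming symmetry within each pair); that modeling choice is ours, not stated on the slide.

```python
from collections import Counter
from math import factorial, prod

seq = "ACTGACTTGGACCCGACGGA"
counts = Counter(seq)                 # observed base counts
x = [counts[b] for b in "ACGT"]

def multinom_pmf(x, p):
    """Multinomial pmf: n!/(x1!...xk!) * p1^x1 * ... * pk^xk."""
    coef = factorial(sum(x))
    for xi in x:
        coef //= factorial(xi)
    return coef * prod(pi ** xi for pi, xi in zip(p, x))

p_normal = [0.25, 0.25, 0.25, 0.25]
# One reading of "twice as much GC as AT": pG = pC = 1/3, pA = pT = 1/6
p_gc_rich = [1/6, 1/3, 1/3, 1/6]      # order A, C, G, T

print("normal :", multinom_pmf(x, p_normal))
print("GC-rich:", multinom_pmf(x, p_gc_rich))
```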

Law of Large Numbers

Markov's Inequality

For any random variable $X \ge 0$ and constant $a > 0$,

$$P(X \ge a) \le \frac{E(X)}{a}$$

Derivation of Markov's Inequality

Let $X$ be a random variable such that $X \ge 0$. Then (in the continuous case)

$$E(X) = \int_0^\infty x\,f(x)\,dx \ge \int_a^\infty x\,f(x)\,dx \ge \int_a^\infty a\,f(x)\,dx = a\,P(X \ge a)$$

and dividing both sides by $a$ gives the inequality.

Corollary - Chebyshev's Inequality:

$$P(|X - E(X)| \ge a) \le \frac{\text{Var}(X)}{a^2}$$

"The inequality says that the probability that X is far away from its mean is bounded by a quantity that increases as Var(X) increases."
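A quick simulation check of Markov's bound; the Exponential(1) example (so $E(X) = 1$) is our illustrative choice, not from the slides.

```python
import random

# Check Markov's inequality P(X >= a) <= E(X)/a on a nonnegative example,
# here X ~ Exponential(1) so E(X) = 1.
random.seed(1)
xs = [random.expovariate(1.0) for _ in range(100_000)]
for a in (1, 2, 4):
    empirical = sum(x >= a for x in xs) / len(xs)
    print(f"a = {a}: P(X >= a) ~ {empirical:.4f} <= bound {1 / a:.4f}")
```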


Derivation of Chebyshev's Inequality

Proposition - if $f(x)$ is a non-decreasing function, then

$$P(X \ge a) = P\big(f(X) \ge f(a)\big) \le \frac{E[f(X)]}{f(a)}$$

If we take the positive-valued random variable to be $|X - E(X)|$ and $f(x) = x^2$, and define $a = k\sigma$ where $\sigma = \sqrt{\text{Var}(X)}$, then

$$P(|X - E(X)| \ge k\sigma) \le \frac{E[(X - E(X))^2]}{k^2\sigma^2} = \frac{\text{Var}(X)}{k^2\sigma^2} = \frac{1}{k^2}$$

Chebyshev's Inequality - Example

Use Chebyshev's inequality to make a statement about the bounds for the probability of being within 1, 2, or 3 standard deviations of the mean for all random variables.
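Equivalently, at least $1 - 1/k^2$ of any distribution's mass lies within $k$ standard deviations of its mean. A simulation sketch comparing that bound with an (illustrative, our choice) Exponential(1), where $\mu = \sigma = 1$:

```python
import random

# Chebyshev: P(|X - mu| >= k*sigma) <= 1/k^2, i.e. at least 1 - 1/k^2 of the
# mass lies within k standard deviations, for ANY distribution.
random.seed(1)
xs = [random.expovariate(1.0) for _ in range(100_000)]
mu, sigma = 1.0, 1.0
for k in (1, 2, 3):
    within = sum(abs(x - mu) < k * sigma for x in xs) / len(xs)
    print(f"k = {k}: within ~ {within:.4f} >= bound {1 - 1 / k**2:.4f}")
```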

Independent and Identically Distributed (iid)

A collection of random variables is iid if they all share the same probability distribution and are mutually independent.

Example: if $X \sim \text{Binom}(n, p)$ then $X = \sum_{i=1}^n Y_i$ where $Y_1, \ldots, Y_n \overset{iid}{\sim} \text{Bern}(p)$.

Sums of iid Random Variables

Let $X_1, X_2, \ldots, X_n \overset{iid}{\sim} D$ where $D$ is some probability distribution with $E(X_i) = \mu$ and $\text{Var}(X_i) = \sigma^2$. Define $S_n = X_1 + X_2 + \cdots + X_n$. Then

$$E(S_n) = E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n) = \mu + \mu + \cdots + \mu = n\mu$$

$$\begin{aligned}
\text{Var}(S_n) &= E\big[\big((X_1 + X_2 + \cdots + X_n) - (\mu + \mu + \cdots + \mu)\big)^2\big] \\
&= E\big[\big((X_1 - \mu) + (X_2 - \mu) + \cdots + (X_n - \mu)\big)^2\big] \\
&= \sum_{i=1}^n E[(X_i - \mu)^2] + \sum_{i=1}^n \sum_{\substack{j=1 \\ j \ne i}}^n E[(X_i - \mu)(X_j - \mu)] \\
&= \sum_{i=1}^n \text{Var}(X_i) + \sum_{i=1}^n \sum_{\substack{j=1 \\ j \ne i}}^n \text{Cov}(X_i, X_j) = n\sigma^2
\end{aligned}$$

where the covariance terms vanish because the $X_i$ are independent.
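A simulation sketch of these two results for an illustrative iid Bernoulli sample, where $\mu = p$ and $\sigma^2 = p(1-p)$; the setup is ours, not from the slides.

```python
import random
from statistics import mean, pvariance

# Simulate S_n = X_1 + ... + X_n for X_i iid Bern(p):
# expect E(S_n) = n*mu = np and Var(S_n) = n*sigma^2 = np(1-p).
random.seed(1)
n, p, reps = 30, 0.4, 50_000
sums = [sum(random.random() < p for _ in range(n)) for _ in range(reps)]
print("E(S_n):", mean(sums), "vs", n * p)                     # ~ 12
print("Var(S_n):", pvariance(sums), "vs", n * p * (1 - p))    # ~ 7.2
```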


Mean of iid Random Variables

Let $X_1, X_2, \ldots, X_n \overset{iid}{\sim} D$ where $D$ is some probability distribution with $E(X_i) = \mu$ and $\text{Var}(X_i) = \sigma^2$. Define $\bar{X}_n = (X_1 + X_2 + \cdots + X_n)/n$. Then

$$E(\bar{X}_n) = E(S_n/n) = E(S_n)/n = \mu$$

$$\text{Var}(\bar{X}_n) = \text{Var}(S_n/n) = \frac{1}{n^2}\text{Var}(S_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$

Weak Law of Large Numbers

Based on these results and Chebyshev's Inequality (itself a corollary of Markov's) we can show the following: for any $\epsilon > 0$,

$$P(|\bar{X}_n - \mu| \ge \epsilon) \le \frac{\text{Var}(\bar{X}_n)}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2}$$

Therefore, as long as $\sigma^2 < \infty$,

$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| \ge \epsilon) = 0 \quad \Rightarrow \quad \lim_{n \to \infty} P(|\bar{X}_n - \mu| < \epsilon) = 1$$
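A sketch showing $\text{Var}(\bar{X}_n)$ shrinking like $\sigma^2/n$, using Uniform(0, 1) draws ($\sigma^2 = 1/12$) as an illustrative choice:

```python
import random
from statistics import pvariance

# Var(X_bar_n) = sigma^2 / n: simulate sample means of Uniform(0,1) draws
# (sigma^2 = 1/12) for growing n.
random.seed(1)
for n in (10, 100, 1000):
    means = [sum(random.random() for _ in range(n)) / n for _ in range(5000)]
    print(f"n = {n:4d}: Var(X_bar) ~ {pvariance(means):.6f} vs {1/12/n:.6f}")
```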

Law of Large Numbers

Weak Law of Large Numbers ($\bar{X}_n$ converges in probability to $\mu$):

$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| > \epsilon) = 0$$

Strong Law of Large Numbers ($\bar{X}_n$ converges almost surely to $\mu$):

$$P\left(\lim_{n \to \infty} \bar{X}_n = \mu\right) = 1$$

The Strong LLN is the more powerful result (the Strong LLN implies the Weak LLN), but its proof is more complicated.

LLN - Example

How large a random sample must be taken from a given distribution in order for the probability to be at least 0.99 that the sample mean will be within 2 standard deviations of the mean of the distribution? What about a 0.95 probability of being within 1 standard deviation of the mean?
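A sketch of the Chebyshev calculation behind this example (the helper name `chebyshev_n` is ours): $P(|\bar{X}_n - \mu| \ge k\sigma) \le (\sigma^2/n)/(k\sigma)^2 = 1/(nk^2)$, so requiring at least the target probability within $k$ standard deviations gives $n \ge 1/(k^2(1 - \text{target}))$.

```python
import math

# Chebyshev: P(within k sds) >= 1 - 1/(n k^2) >= target
#   =>  n >= 1 / (k^2 * (1 - target))
def chebyshev_n(k, target):
    return math.ceil(1 / (k**2 * (1 - target)))

print(chebyshev_n(2, 0.99))   # sample size for >= 0.99 within 2 sds
print(chebyshev_n(1, 0.95))   # sample size for >= 0.95 within 1 sd
```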


LLN and CLT

The law of large numbers shows us that

$$\lim_{n \to \infty} \frac{S_n - n\mu}{n} = \lim_{n \to \infty} (\bar{X}_n - \mu) \to 0$$

which shows that for large $n$, $S_n \approx n\mu$.

What happens if we instead divide by something that grows more slowly than $n$, like $\sqrt{n}$?

$$\lim_{n \to \infty} \frac{S_n - n\mu}{\sqrt{n}} = \lim_{n \to \infty} \sqrt{n}\,(\bar{X}_n - \mu) \overset{d}{\to} N(0, \sigma^2)$$

This is the Central Limit Theorem, of which the DeMoivre-Laplace theorem for the normal approximation to the binomial is a special case. Hopefully by the end of this class we will have the tools to prove this.
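A simulation sketch of this statement, using Uniform(0, 1) draws ($\mu = 1/2$, $\sigma^2 = 1/12$) as an illustrative choice: $\sqrt{n}(\bar{X}_n - \mu)$ should have mean near 0, variance near $\sigma^2$, and roughly normal spread.

```python
import random
from statistics import mean, pvariance

# CLT sketch: sqrt(n)*(X_bar_n - mu) should be approximately N(0, sigma^2).
random.seed(1)
n, reps = 400, 20_000
z = [n**0.5 * (sum(random.random() for _ in range(n)) / n - 0.5)
     for _ in range(reps)]
print("mean ~ 0:", mean(z))
print("variance ~ 1/12:", pvariance(z), 1 / 12)
# Rough normality check: ~68% of draws should fall within one sd of 0
sd = (1 / 12) ** 0.5
print("within 1 sd:", sum(abs(v) < sd for v in z) / reps)
```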
