
Lecture 7: Joint Distributions and the Law of Large Numbers
Sta230/Mth230, Colin Rundel, February 7, 2014
Chapters 3.1, 3.3, 3.4

A Little More E(X): Practice Problem - Skewness of the Bernoulli

Let $X \sim \text{Bern}(p)$. We have shown that

$$E(X) = p \qquad \text{Var}(X) = p(1-p)$$

Find the skewness of $X$, where skewness is defined as

$$E\left[\left(\frac{X - E(X)}{\text{SD}(X)}\right)^3\right] = \frac{E[(X - \mu)^3]}{\sigma^3}$$
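A minimal numerical sketch of this definition in Python (the helper name `bernoulli_skewness` is ours, not from the slides): it evaluates $E[(X-\mu)^3]/\sigma^3$ by enumerating the two outcomes of a Bernoulli.

```python
import math

def bernoulli_skewness(p):
    """Skewness of X ~ Bern(p) by direct enumeration of the two outcomes."""
    mu = p                             # E(X) = p
    sigma = math.sqrt(p * (1 - p))     # SD(X) = sqrt(p(1-p))
    # E[(X - mu)^3] = (0 - mu)^3 P(X=0) + (1 - mu)^3 P(X=1)
    third_central = (0 - mu) ** 3 * (1 - p) + (1 - mu) ** 3 * p
    return third_central / sigma ** 3

for p in (0.1, 0.5, 0.9):
    print(f"p = {p}: skewness = {bernoulli_skewness(p):.4f}")
```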

Joint Distributions

Joint Distributions - Example

Draw two socks at random, without replacement, from a drawer full of twelve colored socks: 6 black, 4 white, 2 purple.

Let $B$ be the number of black socks and $W$ the number of white socks drawn. Then the marginal distributions of $B$ and $W$ are given by:

$$P(B = k) = \frac{\binom{6}{k}\binom{6}{2-k}}{\binom{12}{2}}: \quad P(B = 0) = \frac{6 \cdot 5}{12 \cdot 11} = \frac{15}{66}, \quad P(B = 1) = 2\,\frac{6 \cdot 6}{12 \cdot 11} = \frac{36}{66}, \quad P(B = 2) = \frac{6 \cdot 5}{12 \cdot 11} = \frac{15}{66}$$

$$P(W = k) = \frac{\binom{4}{k}\binom{8}{2-k}}{\binom{12}{2}}: \quad P(W = 0) = \frac{8 \cdot 7}{12 \cdot 11} = \frac{28}{66}, \quad P(W = 1) = 2\,\frac{4 \cdot 8}{12 \cdot 11} = \frac{32}{66}, \quad P(W = 2) = \frac{4 \cdot 3}{12 \cdot 11} = \frac{6}{66}$$

Note - $B \sim \text{HyperGeo}(12, 6, 2)$ and $W \sim \text{HyperGeo}(12, 4, 2)$.

Joint Distributions - Example, cont.

The joint distribution of $B$ and $W$ is given by

$$P(B = b, W = w) = \frac{\binom{6}{b}\binom{4}{w}\binom{2}{2-b-w}}{\binom{12}{2}}$$

which produces the following table (marginals in the last row and column):

              W = 0    W = 1    W = 2  |  P(B = b)
    B = 0      1/66     8/66     6/66  |   15/66
    B = 1     12/66    24/66     0     |   36/66
    B = 2     15/66     0        0     |   15/66
    ----------------------------------------------
    P(W = w)  28/66    32/66     6/66  |
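A short sketch that rebuilds this table from the counting formula (the helper name `p_joint` is ours); it reproduces the joint entries and checks that the row and column sums match the marginals above.

```python
from math import comb

# Joint pmf P(B = b, W = w) = C(6,b) C(4,w) C(2, 2-b-w) / C(12,2)
def p_joint(b, w):
    if b + w > 2:
        return 0.0
    return comb(6, b) * comb(4, w) * comb(2, 2 - b - w) / comb(12, 2)

# Reproduce the table; all entries have denominator 66
for b in range(3):
    print([round(p_joint(b, w) * 66) for w in range(3)], "/ 66")

# Row and column sums give the marginals of B and W
print("P(B=k):", [round(sum(p_joint(b, w) for w in range(3)) * 66) for b in range(3)], "/ 66")
print("P(W=k):", [round(sum(p_joint(b, w) for b in range(3)) * 66) for w in range(3)], "/ 66")
```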

Marginal Distributions

Note that the row and column sums of the joint table are the distributions of $B$ and $W$ respectively:

$$P(B = b) = P(B = b, W = 0) + P(B = b, W = 1) + P(B = b, W = 2)$$
$$P(W = w) = P(B = 0, W = w) + P(B = 1, W = w) + P(B = 2, W = w)$$

These are the marginal distributions of $B$ and $W$. In general,

$$P(X = x) = \sum_{\text{all } y} P(X = x, Y = y) = \sum_{\text{all } y} P(X = x \mid Y = y)\,P(Y = y)$$
$$\phantom{P(X = x)} = \int_{\text{all } y} P(X = x, Y = y)\,dy = \int_{\text{all } y} P(X = x \mid Y = y)\,P(Y = y)\,dy$$

Conditional Distribution

Conditional distributions are defined as we have seen previously:

$$P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{\text{joint}}{\text{marginal}}$$

Therefore the pmf for white socks given that no black socks were drawn is

$$P(W = w \mid B = 0) = \frac{P(W = w, B = 0)}{P(B = 0)} =
\begin{cases}
\frac{1/66}{15/66} = \frac{1}{15} & \text{if } w = 0 \\
\frac{8/66}{15/66} = \frac{8}{15} & \text{if } w = 1 \\
\frac{6/66}{15/66} = \frac{6}{15} & \text{if } w = 2
\end{cases}$$
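Conditioning is just the joint divided by the marginal; a two-line continuation of the sketch above (reusing its `p_joint` helper):

```python
# Conditional pmf of W given B = 0: joint / marginal,
# reusing the p_joint() helper defined in the previous sketch.
p_b0 = sum(p_joint(0, w) for w in range(3))           # P(B = 0) = 15/66
cond = {w: p_joint(0, w) / p_b0 for w in range(3)}
print(cond)  # {0: 0.0666..., 1: 0.5333..., 2: 0.4}, i.e. 1/15, 8/15, 6/15
```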


Expectation of Discrete Joint Distributions

$$E[g(X, Y)] = \sum_x \sum_y g(x, y)\,P(X = x, Y = y)$$

For example, we can define $g(x, y) = x \cdot y$; then

$$\begin{aligned}
E(BW) &= (0 \cdot 0 \cdot 1/66) + (0 \cdot 1 \cdot 8/66) + (0 \cdot 2 \cdot 6/66) \\
      &\quad + (1 \cdot 0 \cdot 12/66) + (1 \cdot 1 \cdot 24/66) + (1 \cdot 2 \cdot 0/66) \\
      &\quad + (2 \cdot 0 \cdot 15/66) + (2 \cdot 1 \cdot 0/66) + (2 \cdot 2 \cdot 0/66) \\
      &= 24/66 = 4/11
\end{aligned}$$

Note that $E(BW) \ne E(B)E(W)$, since

$$E(B)E(W) = \left(0 \cdot \tfrac{15}{66} + 1 \cdot \tfrac{36}{66} + 2 \cdot \tfrac{15}{66}\right) \times \left(0 \cdot \tfrac{28}{66} + 1 \cdot \tfrac{32}{66} + 2 \cdot \tfrac{6}{66}\right) = \frac{66}{66} \times \frac{44}{66} = \frac{2}{3}$$

This implies that $B$ and $W$ are not independent and $\text{Cov}(B, W) \ne 0$.

Expectation of Discrete Conditional Distributions

A conditional distribution works like any other discrete distribution:

$$E(X \mid Y = y) = \sum_x x\,P(X = x \mid Y = y)$$

Therefore we can calculate things like conditional means and variances:

$$E(W \mid B = 0) = 0 \cdot \tfrac{1}{15} + 1 \cdot \tfrac{8}{15} + 2 \cdot \tfrac{6}{15} = \tfrac{20}{15} = \tfrac{4}{3} \approx 1.333$$

$$E(W^2 \mid B = 0) = 0^2 \cdot \tfrac{1}{15} + 1^2 \cdot \tfrac{8}{15} + 2^2 \cdot \tfrac{6}{15} = \tfrac{32}{15} \approx 2.133$$

$$\text{Var}(W \mid B = 0) = E(W^2 \mid B = 0) - E(W \mid B = 0)^2 = \tfrac{32}{15} - \left(\tfrac{4}{3}\right)^2 = \tfrac{16}{45} \approx 0.356$$
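A small check of these calculations, again reusing the `p_joint` helper from the earlier sketch:

```python
# Check E(BW) against E(B)E(W), and the conditional mean/variance of W | B = 0,
# reusing p_joint() from the earlier sketch.
E_BW = sum(b * w * p_joint(b, w) for b in range(3) for w in range(3))
E_B = sum(b * p_joint(b, w) for b in range(3) for w in range(3))
E_W = sum(w * p_joint(b, w) for b in range(3) for w in range(3))
print(E_BW, E_B * E_W)          # 4/11 = 0.3636... vs 2/3 = 0.6666... -> dependent

p_b0 = sum(p_joint(0, w) for w in range(3))
E_W_b0  = sum(w * p_joint(0, w) for w in range(3)) / p_b0        # 4/3
E_W2_b0 = sum(w**2 * p_joint(0, w) for w in range(3)) / p_b0     # 32/15
print(E_W_b0, E_W2_b0 - E_W_b0**2)                               # 1.333..., 0.3555...
```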

Joint Distribution - Example

Suppose that $X$ and $Y$ have a discrete joint distribution for which the joint pmf is defined as follows:

$$f(x, y) = \begin{cases} c\,|x + y| & \text{for } x, y \in \{-2, -1, 0, 1, 2\} \\ 0 & \text{otherwise} \end{cases}$$

a) What is the value of the constant $c$?
b) $P(X = 0 \text{ and } Y = -2)$
c) $P(X = 1)$
d) $P(X = -1 \mid Y = 0)$
e) $P(|X - Y| \le 1)$

From DeGroot and Schervish (2011)

Joint Distribution - Example

Suppose that $X$ and $Y$ have a discrete joint distribution for which the joint pmf is defined as follows:

$$f(x, y) = \begin{cases} \frac{1}{30}(x + y) & \text{for } x = 0, 1, 2 \text{ and } y = 0, 1, 2, 3 \\ 0 & \text{otherwise} \end{cases}$$

a) Determine the marginal pmf's of $X$ and $Y$.
b) Are $X$ and $Y$ independent?

From DeGroot and Schervish (2011)
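For part (a) of the first problem, the normalizing constant just makes the pmf sum to 1 over the support; a minimal enumeration sketch (using `fractions` for an exact answer), with the remaining parts answerable by similar sums:

```python
from fractions import Fraction

# Find c so that the sum of c|x + y| over x, y in {-2, ..., 2} equals 1
support = range(-2, 3)
total = sum(abs(x + y) for x in support for y in support)
c = Fraction(1, total)
print(c)  # the normalizing constant
```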


Multinomial Distribution

Let $X_1, X_2, \ldots, X_k$ be the random variables counting the number of outcomes belonging to each of categories $1, \ldots, k$ in $n$ trials, with the probability of success for category $i$ being $p_i$. Then $(X_1, \ldots, X_k) \sim \text{Multinom}(n, p_1, \ldots, p_k)$ with

$$P(X_1 = x_1, \ldots, X_k = x_k) = f(x_1, \ldots, x_k \mid n, p_1, \ldots, p_k) = \frac{n!}{x_1! \cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k}$$

where $\sum_{i=1}^k x_i = n$ and $\sum_{i=1}^k p_i = 1$.

$$E(X_i) = np_i$$

$$\text{Var}(X_i) = np_i(1 - p_i)$$

$$\text{Cov}(X_i, X_j) = -np_i p_j \quad \text{for } i \ne j$$

Multinomial Example

Some regions of DNA have an elevated amount of GC relative to AT base pairs. In a normal region of DNA we expect equal amounts of A, C, G, and T, while a GC-rich region has twice as much GC as AT. If we observe the sequence ACTGACTTGGACCCGACGGA, what is the probability that it came from a normal region versus a GC-rich region?
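A sketch of the likelihood comparison. The GC-rich probabilities below are one reading of "twice as much GC as AT" ($p_G = p_C = 1/3$, $p_A = p_T = 1/6$, assuming symmetry within each pair); that modeling choice is ours, not stated on the slide.

```python
from collections import Counter
from math import factorial, prod

seq = "ACTGACTTGGACCCGACGGA"
counts = Counter(seq)                 # observed base counts
x = [counts[b] for b in "ACGT"]

def multinom_pmf(x, p):
    """Multinomial pmf: n!/(x1!...xk!) * p1^x1 * ... * pk^xk."""
    coef = factorial(sum(x))
    for xi in x:
        coef //= factorial(xi)
    return coef * prod(pi ** xi for pi, xi in zip(p, x))

p_normal = [0.25, 0.25, 0.25, 0.25]
# One reading of "twice as much GC as AT": pG = pC = 1/3, pA = pT = 1/6
p_gc_rich = [1/6, 1/3, 1/3, 1/6]      # order A, C, G, T

print("normal :", multinom_pmf(x, p_normal))
print("GC-rich:", multinom_pmf(x, p_gc_rich))
```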

Law of Large Numbers

Markov's Inequality

For any random variable $X \ge 0$ and constant $a > 0$,

$$P(X \ge a) \le \frac{E(X)}{a}$$

Derivation of Markov's Inequality

Let $X$ be a random variable such that $X \ge 0$. Then (in the continuous case)

$$E(X) = \int_0^\infty x\,f(x)\,dx \ge \int_a^\infty x\,f(x)\,dx \ge \int_a^\infty a\,f(x)\,dx = a\,P(X \ge a)$$

and dividing both sides by $a$ gives the inequality.

Corollary - Chebyshev's Inequality:

$$P(|X - E(X)| \ge a) \le \frac{\text{Var}(X)}{a^2}$$

"The inequality says that the probability that X is far away from its mean is bounded by a quantity that increases as Var(X) increases."
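A quick simulation check of Markov's bound; the Exponential(1) example (so $E(X) = 1$) is our illustrative choice, not from the slides.

```python
import random

# Check Markov's inequality P(X >= a) <= E(X)/a on a nonnegative example,
# here X ~ Exponential(1) so E(X) = 1.
random.seed(1)
xs = [random.expovariate(1.0) for _ in range(100_000)]
for a in (1, 2, 4):
    empirical = sum(x >= a for x in xs) / len(xs)
    print(f"a = {a}: P(X >= a) ~ {empirical:.4f} <= bound {1 / a:.4f}")
```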


Derivation of Chebyshev's Inequality

Proposition - if $f(x)$ is a non-decreasing function, then

$$P(X \ge a) = P\big(f(X) \ge f(a)\big) \le \frac{E[f(X)]}{f(a)}$$

If we take the positive-valued random variable to be $|X - E(X)|$ and $f(x) = x^2$, and define $a = k\sigma$ where $\sigma = \sqrt{\text{Var}(X)}$, then

$$P(|X - E(X)| \ge k\sigma) \le \frac{E[(X - E(X))^2]}{k^2\sigma^2} = \frac{\text{Var}(X)}{k^2\sigma^2} = \frac{1}{k^2}$$

Chebyshev's Inequality - Example

Use Chebyshev's inequality to make a statement about the bounds for the probability of being within 1, 2, or 3 standard deviations of the mean for all random variables.
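Equivalently, at least $1 - 1/k^2$ of any distribution's mass lies within $k$ standard deviations of its mean. A simulation sketch comparing that bound with an (illustrative, our choice) Exponential(1), where $\mu = \sigma = 1$:

```python
import random

# Chebyshev: P(|X - mu| >= k*sigma) <= 1/k^2, i.e. at least 1 - 1/k^2 of the
# mass lies within k standard deviations, for ANY distribution.
random.seed(1)
xs = [random.expovariate(1.0) for _ in range(100_000)]
mu, sigma = 1.0, 1.0
for k in (1, 2, 3):
    within = sum(abs(x - mu) < k * sigma for x in xs) / len(xs)
    print(f"k = {k}: within ~ {within:.4f} >= bound {1 - 1 / k**2:.4f}")
```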

Independent and Identically Distributed (iid)

A collection of random variables is iid if they all share the same probability distribution and are mutually independent.

Example: if $X \sim \text{Binom}(n, p)$ then $X = \sum_{i=1}^n Y_i$ where $Y_1, \ldots, Y_n \overset{iid}{\sim} \text{Bern}(p)$.

Sums of iid Random Variables

Let $X_1, X_2, \ldots, X_n \overset{iid}{\sim} D$ where $D$ is some probability distribution with $E(X_i) = \mu$ and $\text{Var}(X_i) = \sigma^2$. Define $S_n = X_1 + X_2 + \cdots + X_n$. Then

$$E(S_n) = E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n) = \mu + \mu + \cdots + \mu = n\mu$$

$$\begin{aligned}
\text{Var}(S_n) &= E\big[\big((X_1 + X_2 + \cdots + X_n) - (\mu + \mu + \cdots + \mu)\big)^2\big] \\
&= E\big[\big((X_1 - \mu) + (X_2 - \mu) + \cdots + (X_n - \mu)\big)^2\big] \\
&= \sum_{i=1}^n E[(X_i - \mu)^2] + \sum_{i=1}^n \sum_{\substack{j=1 \\ j \ne i}}^n E[(X_i - \mu)(X_j - \mu)] \\
&= \sum_{i=1}^n \text{Var}(X_i) + \sum_{i=1}^n \sum_{\substack{j=1 \\ j \ne i}}^n \text{Cov}(X_i, X_j) = n\sigma^2
\end{aligned}$$

where the covariance terms vanish because the $X_i$ are independent.
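A simulation sketch of these two results for an illustrative iid Bernoulli sample, where $\mu = p$ and $\sigma^2 = p(1-p)$; the setup is ours, not from the slides.

```python
import random
from statistics import mean, pvariance

# Simulate S_n = X_1 + ... + X_n for X_i iid Bern(p):
# expect E(S_n) = n*mu = np and Var(S_n) = n*sigma^2 = np(1-p).
random.seed(1)
n, p, reps = 30, 0.4, 50_000
sums = [sum(random.random() < p for _ in range(n)) for _ in range(reps)]
print("E(S_n):", mean(sums), "vs", n * p)                     # ~ 12
print("Var(S_n):", pvariance(sums), "vs", n * p * (1 - p))    # ~ 7.2
```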


Mean of iid Random Variables

Let $X_1, X_2, \ldots, X_n \overset{iid}{\sim} D$ where $D$ is some probability distribution with $E(X_i) = \mu$ and $\text{Var}(X_i) = \sigma^2$. Define $\bar{X}_n = (X_1 + X_2 + \cdots + X_n)/n$. Then

$$E(\bar{X}_n) = E(S_n/n) = E(S_n)/n = \mu$$

$$\text{Var}(\bar{X}_n) = \text{Var}(S_n/n) = \frac{1}{n^2}\text{Var}(S_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$

Weak Law of Large Numbers

Based on these results and Chebyshev's Inequality (itself a corollary of Markov's) we can show the following: for any $\epsilon > 0$,

$$P(|\bar{X}_n - \mu| \ge \epsilon) \le \frac{\text{Var}(\bar{X}_n)}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2}$$

Therefore, as long as $\sigma^2 < \infty$,

$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| \ge \epsilon) = 0 \quad \Rightarrow \quad \lim_{n \to \infty} P(|\bar{X}_n - \mu| < \epsilon) = 1$$
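A sketch showing $\text{Var}(\bar{X}_n)$ shrinking like $\sigma^2/n$, using Uniform(0, 1) draws ($\sigma^2 = 1/12$) as an illustrative choice:

```python
import random
from statistics import pvariance

# Var(X_bar_n) = sigma^2 / n: simulate sample means of Uniform(0,1) draws
# (sigma^2 = 1/12) for growing n.
random.seed(1)
for n in (10, 100, 1000):
    means = [sum(random.random() for _ in range(n)) / n for _ in range(5000)]
    print(f"n = {n:4d}: Var(X_bar) ~ {pvariance(means):.6f} vs {1/12/n:.6f}")
```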

Law of Large Numbers

Weak Law of Large Numbers ($\bar{X}_n$ converges in probability to $\mu$):

$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| > \epsilon) = 0$$

Strong Law of Large Numbers ($\bar{X}_n$ converges almost surely to $\mu$):

$$P\left(\lim_{n \to \infty} \bar{X}_n = \mu\right) = 1$$

The Strong LLN is the more powerful result (the Strong LLN implies the Weak LLN), but its proof is more complicated.

LLN - Example

How large a random sample must be taken from a given distribution in order for the probability to be at least 0.99 that the sample mean will be within 2 standard deviations of the mean of the distribution? What about a 0.95 probability of being within 1 standard deviation of the mean?
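A sketch of the Chebyshev calculation behind this example (the helper name `chebyshev_n` is ours): $P(|\bar{X}_n - \mu| \ge k\sigma) \le (\sigma^2/n)/(k\sigma)^2 = 1/(nk^2)$, so requiring at least the target probability within $k$ standard deviations gives $n \ge 1/(k^2(1 - \text{target}))$.

```python
import math

# Chebyshev: P(within k sds) >= 1 - 1/(n k^2) >= target
#   =>  n >= 1 / (k^2 * (1 - target))
def chebyshev_n(k, target):
    return math.ceil(1 / (k**2 * (1 - target)))

print(chebyshev_n(2, 0.99))   # sample size for >= 0.99 within 2 sds
print(chebyshev_n(1, 0.95))   # sample size for >= 0.95 within 1 sd
```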


LLN and CLT

The law of large numbers shows us that

$$\lim_{n \to \infty} \frac{S_n - n\mu}{n} = \lim_{n \to \infty} (\bar{X}_n - \mu) \to 0$$

which shows that for large $n$, $S_n \approx n\mu$.

What happens if we instead divide by something that grows more slowly than $n$, like $\sqrt{n}$?

$$\lim_{n \to \infty} \frac{S_n - n\mu}{\sqrt{n}} = \lim_{n \to \infty} \sqrt{n}\,(\bar{X}_n - \mu) \overset{d}{\to} N(0, \sigma^2)$$

This is the Central Limit Theorem, of which the DeMoivre-Laplace theorem for the normal approximation to the binomial is a special case. Hopefully by the end of this class we will have the tools to prove this.
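A simulation sketch of this statement, using Uniform(0, 1) draws ($\mu = 1/2$, $\sigma^2 = 1/12$) as an illustrative choice: $\sqrt{n}(\bar{X}_n - \mu)$ should have mean near 0, variance near $\sigma^2$, and roughly normal spread.

```python
import random
from statistics import mean, pvariance

# CLT sketch: sqrt(n)*(X_bar_n - mu) should be approximately N(0, sigma^2).
random.seed(1)
n, reps = 400, 20_000
z = [n**0.5 * (sum(random.random() for _ in range(n)) / n - 0.5)
     for _ in range(reps)]
print("mean ~ 0:", mean(z))
print("variance ~ 1/12:", pvariance(z), 1 / 12)
# Rough normality check: ~68% of draws should fall within one sd of 0
sd = (1 / 12) ** 0.5
print("within 1 sd:", sum(abs(v) < sd for v in z) / reps)
```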
