
Chapter 5 sections

Discrete distributions:
- 5.2 Bernoulli and Binomial distributions
- 5.3 Hypergeometric distributions (just skim)
- 5.4 Poisson distributions
- 5.5 Negative Binomial distributions (just skim)

Continuous univariate distributions:
- 5.6 Normal distributions
- 5.7 Gamma distributions
- 5.8 Beta distributions (just skim)

Multivariate distributions:
- 5.9 Multinomial distributions (just skim)
- 5.10 Bivariate normal distributions

5.1 Introduction: Families of distributions

How:
- Parameter space
- pf/pdf and cdf - new notation: f(x|θ) - and the m.g.f. ψ(t)
- Features, connections to other distributions, approximation

Why: Reasoning behind a distribution
- Natural justification for certain experiments
- A model for the uncertainty in an experiment

“All models are wrong, but some are useful” – George Box

Bernoulli distributions

Def: Bernoulli distributions – Bernoulli(p)
A r.v. X has the Bernoulli distribution with parameter p if P(X = 1) = p and P(X = 0) = 1 − p. The pf of X is

f(x|p) = p^x (1 − p)^(1−x) for x = 0, 1; 0 otherwise

Parameter space: p ∈ [0, 1]

In an experiment with only two possible outcomes, “success” and “failure”, let X = number of successes. Then X ∼ Bernoulli(p), where p is the probability of success.

E(X) = p, Var(X) = p(1 − p) and ψ(t) = E(e^(tX)) = pe^t + (1 − p)

The cdf is F(x|p) = 0 for x < 0; 1 − p for 0 ≤ x < 1; 1 for x ≥ 1

Binomial distributions

Def: Binomial distributions – Binomial(n, p)
A r.v. X has the Binomial distribution with parameters n and p if X has the pf

f(x|n, p) = (n choose x) p^x (1 − p)^(n−x) for x = 0, 1, ..., n; 0 otherwise

Parameter space: n is a positive integer and p ∈ [0, 1]

If X is the number of “successes” in n independent tries where the prob. of success is p each time, then X ∼ Binomial(n, p)

Theorem 5.2.1

If X1, X2,..., Xn form n Bernoulli trials with parameter p (i.e. are i.i.d. Bernoulli(p)) then X = X1 + ··· + Xn ∼ Binomial(n, p)

Binomial distributions

Let X ∼ Binomial(n, p) E(X) = np, Var(X) = np(1 − p)

To find the m.g.f. of X, write X = X1 + ··· + Xn where the Xi's are i.i.d. Bernoulli(p). Then ψi(t) = pe^t + 1 − p and we get

ψ(t) = ∏_{i=1}^{n} ψi(t) = ∏_{i=1}^{n} (pe^t + 1 − p) = (pe^t + 1 − p)^n

cdf: F(x|n, p) = Σ_{t=0}^{x} (n choose t) p^t (1 − p)^(n−t) = yikes!
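In practice this cdf is simply evaluated numerically. A minimal sketch in Python using scipy.stats (a tool assumed here, not part of the slides), with illustrative values n = 10 and p = 0.3:

    from scipy.stats import binom

    n, p = 10, 0.3
    X = binom(n, p)

    print(X.pmf(3))           # P(X = 3)
    print(X.cdf(4))           # P(X <= 4): the sum above, evaluated numerically
    print(X.mean(), X.var())  # np = 3.0 and np(1 - p) = 2.1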

Theorem 5.2.2

If Xi ∼ Binomial(ni, p), i = 1, ..., k, and the Xi's are independent, then X = X1 + ··· + Xk ∼ Binomial(Σ_{i=1}^{k} ni, p)

Example: Blood testing (Example 5.2.7)

The setup: 1000 people need to be tested for a disease that affects 0.2% of all people. The test is guaranteed to detect the disease if it is present in a blood sample.

Task: Find all the people that have the disease.
Strategy: Test all 1000 samples.
- What’s the expected number of people that have the disease?
- Any assumptions you need to make?
Strategy (611): Divide the people into 10 groups of 100. For each group take a portion of each of the 100 blood samples and combine into one sample. Then test the combined blood samples (10 tests).

Example: Blood testing (Example 5.2.7) – continued

Strategy (611):
- If all of these tests are negative, then none of the 1000 people have the disease. Total number of tests needed: 10.
- If one of these tests is positive, then we test each of the 100 people in that group. Total number of tests needed: 110.
- ...
- If all 10 of these tests are positive, we end up having to do 1010 tests.
Is this strategy better? What is the expected number of tests needed? When does this strategy lose?

Example: Blood testing (Example 5.2.7) – continued

Let Yi = 1 if test for group i is positive and Yi = 0 otherwise

Let Y = Y1 + ··· + Y10 = the number of groups where every individual has to be tested. Total number of tests needed: T = 10 + 100Y .

Let Zi = number of people in group i that have the disease, i = 1, ..., 10. Then Zi ∼ Binomial(100, 0.002), and Yi is a Bernoulli(p) r.v. where

p = P(Yi = 1) = P(Zi > 0) = 1 − P(Zi = 0) = 1 − (100 choose 0) 0.002^0 (1 − 0.002)^100 = 1 − 0.998^100 = 0.181

Then Y ∼ Binomial(10, 0.181) and E(T) = E(10 + 100Y) = 10 + 100E(Y) = 10 + 100(10 × 0.181) = 191
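A quick numerical check of these calculations (a sketch in Python with scipy.stats assumed, not part of the slides):

    from scipy.stats import binom

    q = 0.002                          # disease prevalence
    p = 1 - (1 - q) ** 100             # P(a pooled group of 100 tests positive) ≈ 0.181
    Y = binom(10, p)                   # number of groups that must be individually retested
    print(p, 10 + 100 * Y.mean())      # ≈ 0.181 and E(T) = 10 + 100 E(Y) ≈ 191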

Example: Blood testing (Example 5.2.7) – continued

When does this strategy (611) lose? Worst case scenario:
P(T ≥ 1000) = P(Y ≥ 9.9) = P(Y = 10) = (10 choose 10) 0.181^10 × 0.819^0 ≈ 3.8 × 10^−8

Question: Can we go further, to a “611-A” strategy? Any further improvement?

5.3 Hypergeometric distributions

Def: Hypergeometric distributions
A random variable X has the Hypergeometric distribution with parameters N, M and n if it has the pf

f(x|N, M, n) = (N choose x)(M choose n−x) / (N+M choose n)

Parameter space: N, M and n are nonnegative integers with n ≤ N + M

Reasoning: Say we have a finite population with N items of type I and M items of type II. Let X be the number of items of type I when we take n samples without replacement from that population. Then X has the hypergeometric distribution.

Hypergeometric distributions

Binomial: sample with replacement (effectively infinite population)
Hypergeometric: sample without replacement from a finite population
You can also think of the Hypergeometric distribution as a sum of dependent Bernoulli trials.
Limiting situation – Theorem 5.3.4: If the sample size n is much smaller than the total population N + M, then the Hypergeometric distribution with parameters N, M and n will be nearly the same as the Binomial distribution with parameters

n and p = N / (N + M)
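A small sketch of this approximation (Python with scipy assumed; the population sizes below are illustrative). Note that scipy's own parameter order differs from the (N, M, n) used on these slides:

    from scipy.stats import hypergeom, binom

    N, M, n = 50, 950, 10               # N type-I items, M type-II items, sample size n
    # scipy parameterizes hypergeom as (population size, number of type-I items, sample size)
    hg = hypergeom(N + M, N, n)
    bi = binom(n, N / (N + M))          # approximating Binomial(n, N/(N+M))

    for x in range(4):
        print(x, hg.pmf(x), bi.pmf(x))  # the two pmfs are close since n << N + M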

5.4 Poisson distributions

Def: Poisson distributions – Poisson(λ)
A r.v. X has the Poisson distribution with mean λ if it has the pf
f(x|λ) = e^(−λ) λ^x / x! for x = 0, 1, 2, ...; 0 otherwise
Parameter space: λ > 0

Show that f(x|λ) is a pf.
E(X) = λ, Var(X) = λ, ψ(t) = e^(λ(e^t − 1))

The cdf: F(x|λ) = Σ_{k=0}^{x} e^(−λ) λ^k / k! = yikes.

Why Poisson?

The Poisson distribution is useful for modeling uncertainty in counts / arrivals.
Examples:
- How many calls arrive at a switchboard in one hour?
- How many buses pass while you wait at the bus stop for 10 min?
- How many bird nests are there in a certain area?
Under certain conditions (the Poisson postulates) the Poisson distribution can be shown to be the distribution of the number of arrivals (Poisson process). However, the Poisson distribution is often used as a model for uncertainty of counts in other types of experiments. The Poisson distribution can also be used as an approximation to the Binomial(n, p) distribution when n is large and p is small.

Poisson Postulates

For t ≥ 0, let Xt be a random variable with possible values in N0 (Think: Xt = number of arrivals from time 0 to time t)

(i) Start with no arrivals: X0 = 0

(ii) Arrivals in disjoint time periods are independent: Xs and Xt − Xs are independent if s < t

(iii) Number of arrivals depends only on the period length: Xs and Xt+s − Xt are identically distributed

(iv) Arrival probability is proportional to period length, if the length is small: lim_{t→0} P(Xt = 1)/t = λ

(v) No simultaneous arrivals: lim_{t→0} P(Xt > 1)/t = 0

If (i)–(v) hold, then for any nonnegative integer n, P(Xt = n) = e^(−λt) (λt)^n / n!, that is, Xt ∼ Poisson(λt)

Can be defined in terms of spatial areas too.

Properties of the Poisson Distributions

Useful recursive property: P(X = x) = (λ/x) P(X = x − 1) for x ≥ 1

Theorem 5.4.4: Sum of Poissons is a Poisson

If X1, ..., Xk are independent r.v. and Xi ∼ Poisson(λi) for all i, then X1 + ··· + Xk ∼ Poisson(Σ_{i=1}^{k} λi)

Theorem 5.4.5: Approximation to Binomial
Let Xn ∼ Binomial(n, pn), where 0 < pn < 1 for all n and {pn}_{n=1}^{∞} is a sequence such that lim_{n→∞} n·pn = λ. Then

lim_{n→∞} f_{Xn}(x|n, pn) = e^(−λ) λ^x / x! = f^Poisson(x|λ) for all x = 0, 1, 2, ...

Example: Poisson as approximation to Binomial

Recall the disease testing example. We had X = Σ_{i=1}^{1000} Xi ∼ Binomial(1000, 0.002) and Y ∼ Binomial(100, 0.181)
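A sketch (Python with scipy assumed) comparing the exact Binomial(1000, 0.002) pmf with its Poisson approximation, λ = np = 2:

    from scipy.stats import binom, poisson

    X = binom(1000, 0.002)             # number of diseased people among the 1000
    approx = poisson(2)                # λ = np = 1000 × 0.002

    for x in range(5):
        print(x, X.pmf(x), approx.pmf(x))   # very close: n is large and p is small

    # For Y ~ Binomial(100, 0.181), a Poisson(18.1) approximation is much cruder,
    # since p = 0.181 is not small.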

Geometric distributions

Def: Geometric distributions – Geometric(p)
A random variable X has the Geometric distribution with parameter p if it has the pf

f(x|p) = p(1 − p)^x for x = 0, 1, 2, ...; 0 otherwise

Parameter space: 0 < p < 1

Say we have an infinite sequence of Bernoulli trials with parameter p, and let X = number of “failures” before the first “success”. Then X ∼ Geometric(p)

5.5 Negative Binomial distributions

Def: Negative Binomial distributions – NegBinomial(r, p) A random variable X has the Negative Binomial distribution with parameters r and p if it has the pf

f(x|r, p) = (r+x−1 choose x) p^r (1 − p)^x for x = 0, 1, 2, ...; 0 otherwise

Parameter space: 0 < p < 1 and r positive integer.

Say we have an infinite sequence of Bernoulli trials with parameter p, and let X = number of “failures” before the r-th “success”. Then X ∼ NegBinomial(r, p)

Geometric(p) = NegBinomial(1, p)

Theorem 5.5.2: If X1, ..., Xr are i.i.d. Geometric(p) then X = X1 + ··· + Xr ∼ NegBinomial(r, p)
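A simulation sketch of Theorem 5.5.2 (Python with numpy/scipy assumed; r = 3 and p = 0.4 are illustrative values):

    import numpy as np
    from scipy.stats import nbinom

    rng = np.random.default_rng(0)
    r, p = 3, 0.4

    # numpy's geometric counts the number of trials up to and including the first
    # success, so subtract 1 to get "failures before the first success" as above
    samples = rng.geometric(p, size=(100_000, r)) - 1
    X = samples.sum(axis=1)                       # sum of r i.i.d. Geometric(p)

    for x in range(5):
        print(x, (X == x).mean(), nbinom(r, p).pmf(x))   # empirical vs. NegBinomial(r, p)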

Chapter 5 sections

Discrete univariate distributions:
- 5.2 Bernoulli and Binomial distributions
- 5.3 Hypergeometric distributions (just skim)
- 5.4 Poisson distributions
- 5.5 Negative Binomial distributions (just skim)

Continuous univariate distributions:
- 5.6 Normal distributions
- 5.7 Gamma distributions
- 5.8 Beta distributions (just skim)

Multivariate distributions:
- 5.9 Multinomial distributions (just skim)
- 5.10 Bivariate normal distributions

5.7 Gamma distributions

The Gamma function: Γ(α) = ∫_0^∞ x^(α−1) e^(−x) dx
Γ(1) = 1 and Γ(0.5) = √π
Γ(α) = (α − 1)Γ(α − 1) if α > 1

Def: Gamma distributions – Gamma(α, β)
A continuous r.v. X has the Gamma distribution with parameters α and β if it has the pdf

f(x|α, β) = (β^α / Γ(α)) x^(α−1) e^(−βx) for x > 0; 0 otherwise

Parameter space: α > 0 and β > 0

Gamma(1, β) is the same as the exponential distribution with parameter β, Expo(β)

Properties of the gamma distributions

ψ(t) = (β/(β − t))^α, for t < β.
E(X) = α/β and Var(X) = α/β²
If X1, ..., Xk are independent Gamma(αi, β) r.v. then

X1 + ··· + Xk ∼ Gamma(Σ_{i=1}^{k} αi, β)
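A simulation sketch of this additivity property (Python with numpy/scipy assumed; the αi and β below are illustrative). Note these libraries parameterize the Gamma by a scale parameter equal to 1/β:

    import numpy as np
    from scipy.stats import gamma

    rng = np.random.default_rng(1)
    alphas, beta = [0.5, 1.0, 2.5], 2.0

    # scale = 1/β for the rate parameter β used on these slides
    total = sum(rng.gamma(shape=a, scale=1/beta, size=200_000) for a in alphas)
    target = gamma(a=sum(alphas), scale=1/beta)   # Gamma(Σ αi, β)

    print(total.mean(), target.mean())   # both ≈ (Σ αi)/β = 2.0
    print(total.var(), target.var())     # both ≈ (Σ αi)/β² = 1.0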

Properties of the gamma distributions

Theorem 5.7.9: Exponential distribution is memoryless Let X ∼ Expo(β) and let t > 0. Then for any h > 0

P(X ≥ t + h|X ≥ t) = P(X ≥ h)

Theorem 5.7.12: Times between arrivals in a Poisson process
Let Zk be the time until the k-th arrival in a Poisson process with rate β. Let Y1 = Z1 and Yk = Zk − Zk−1 for k ≥ 2. Then Y1, Y2, Y3, ... are i.i.d. with the exponential distribution with parameter β.
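A simulation sketch tying Theorem 5.7.12 back to the Poisson postulates (Python with numpy/scipy assumed; β, t and the cap of 40 arrivals are illustrative choices): generating arrivals from i.i.d. Expo(β) interarrival times makes the count in [0, t] behave like Poisson(βt).

    import numpy as np
    from scipy.stats import poisson

    rng = np.random.default_rng(2)
    beta, t, reps = 3.0, 2.0, 100_000

    # interarrival times are i.i.d. Expo(β); arrival times are their cumulative sums
    gaps = rng.exponential(scale=1/beta, size=(reps, 40))   # 40 arrivals is plenty for βt = 6
    arrivals = gaps.cumsum(axis=1)
    counts = (arrivals <= t).sum(axis=1)                    # number of arrivals in [0, t]

    for n in range(4):
        print(n, (counts == n).mean(), poisson(beta * t).pmf(n))   # counts ≈ Poisson(βt)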

5.8 Beta distributions

Def: Beta distributions – Beta(α, β)
A continuous r.v. X has the Beta distribution with parameters α and β if it has the pdf
f(x|α, β) = (Γ(α+β) / (Γ(α)Γ(β))) x^(α−1) (1 − x)^(β−1) for 0 < x < 1; 0 otherwise

Parameter space: α > 0 and β > 0

Beta(1, 1) = Uniform(0, 1)
Used to model a r.v. that takes values between 0 and 1.
The Beta distributions are often used as prior distributions for probability parameters, e.g. the p in the Binomial distribution.
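As a small illustration of this "prior for p" use, here is a hedged sketch of the standard Beta-Binomial conjugate update (a well-known result, not derived on these slides; the prior parameters and data below are illustrative): if p ∼ Beta(α, β) a priori and we observe x successes in n Bernoulli trials, the posterior is Beta(α + x, β + n − x).

    from scipy.stats import beta

    a0, b0 = 2, 2                 # Beta(2, 2) prior on a success probability p
    n, x = 10, 7                  # observe 7 successes in 10 Bernoulli(p) trials

    posterior = beta(a0 + x, b0 + n - x)   # standard Beta-Binomial conjugate update
    print(posterior.mean())                # posterior mean of p = 9/14 ≈ 0.643
    print(posterior.interval(0.95))        # central 95% posterior interval for p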

Beta distributions

Chapter 5 sections

Discrete univariate distributions:
- 5.2 Bernoulli and Binomial distributions
- 5.3 Hypergeometric distributions (just skim)
- 5.4 Poisson distributions
- 5.5 Negative Binomial distributions (just skim)

Continuous univariate distributions:
- 5.6 Normal distributions
- 5.7 Gamma distributions
- 5.8 Beta distributions (just skim)

Multivariate distributions:
- 5.9 Multinomial distributions (just skim)
- 5.10 Bivariate normal distributions

Why Normal?
- Works well in practice. Many physical experiments have distributions that are approximately normal.
- Central Limit Theorem: the sum of many i.i.d. random variables is approximately normally distributed.
- Mathematically convenient – especially the multivariate normal. We can explicitly obtain the distribution of many functions of a normally distributed random variable.
- Marginal and conditional distributions of a multivariate normal are also normal (multivariate or univariate).

Developed by Gauss and then Laplace in the early 1800s. Also known as the Gaussian distributions.

5.6 Normal distributions

Def: Normal distributions – N(µ, σ²)
A continuous r.v. X has the normal distribution with mean µ and variance σ² if it has the pdf

f(x|µ, σ²) = (1/(√(2π) σ)) exp(−(x − µ)²/(2σ²)), −∞ < x < ∞

Parameter space: µ ∈ R and σ² > 0
Show:
ψ(t) = exp(µt + σ²t²/2)
E(X) = µ, Var(X) = σ²


Standard normal

Standard normal distribution: N(0, 1)
The normal distribution with µ = 0 and σ² = 1 is called the standard normal distribution, and its pdf and cdf are denoted φ(x) and Φ(x).

The cdf for a normal distribution cannot be expressed in closed form and is evaluated using numerical approximations.
- Φ(x) is tabulated in the back of the book.
- Many calculators and programs such as R, Matlab, Excel, etc. can calculate Φ(x).
- Φ(−x) = 1 − Φ(x)
- Φ⁻¹(p) = −Φ⁻¹(1 − p)
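For instance, in Python (scipy assumed) Φ and Φ⁻¹ are norm.cdf and norm.ppf, and the two symmetry identities can be checked directly:

    from scipy.stats import norm

    x, p = 1.3, 0.9
    print(norm.cdf(x))                      # Φ(1.3) ≈ 0.9032
    print(norm.cdf(-x), 1 - norm.cdf(x))    # Φ(−x) = 1 − Φ(x)
    print(norm.ppf(p), -norm.ppf(1 - p))    # Φ⁻¹(p) = −Φ⁻¹(1 − p)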

Properties of the normal distributions

Theorem 5.6.4: Linear transformation of a normal is still normal
If X ∼ N(µ, σ²) and Y = aX + b, where a and b are constants and a ≠ 0, then Y ∼ N(aµ + b, a²σ²)

Let F be the cdf of X, where X ∼ N(µ, σ²). Then

F(x) = Φ((x − µ)/σ) and F⁻¹(p) = µ + σΦ⁻¹(p)

Example: Measured Voltage

Suppose the measured voltage, X, in a certain electric circuit has the normal distribution with mean 120 and standard deviation 2.
1. What is the probability that the measured voltage is between 118 and 122?
2. Below what value will 95% of the measurements be?
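A sketch of both parts in Python (scipy assumed, not part of the slides):

    from scipy.stats import norm

    X = norm(loc=120, scale=2)          # mean 120, standard deviation 2

    print(X.cdf(122) - X.cdf(118))      # 1. P(118 < X < 122) = Φ(1) − Φ(−1) ≈ 0.683
    print(X.ppf(0.95))                  # 2. 95th percentile: 120 + 2·Φ⁻¹(0.95) ≈ 123.3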

Properties of the normal distributions

Theorem 5.6.7: Sum of ind. normals is a normal
Let X1, ..., Xk be independent r.v. with Xi ∼ N(µi, σi²) for i = 1, ..., k. Then
X1 + ··· + Xk ∼ N(µ1 + ··· + µk, σ1² + ··· + σk²)

Also, if a1,..., ak and b are constants where at least one ai is not zero:

a1X1 + ··· + akXk + b ∼ N(b + Σ_{i=1}^{k} ai µi, Σ_{i=1}^{k} ai² σi²)

In particular:
1. The sample mean: X̄n = (1/n) Σ_{i=1}^{n} Xi
2. If X1, ..., Xn are a random sample from a N(µ, σ²), what is the distribution of the sample mean?

Example: Measured voltage – continued

Suppose the measured voltage, X, in a certain electric circuit has the normal distribution with mean 120 and standard deviation 2. If three independent measurements of the voltage are made, what is the probability that the sample mean X̄3 will lie between 118 and 120?

Find x that satisfies P(|X̄3 − 120| ≤ x) = 0.95
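A sketch of both parts (Python with scipy assumed), using the fact from Theorem 5.6.7 that X̄3 ∼ N(120, 2²/3) = N(120, 4/3):

    import math
    from scipy.stats import norm

    Xbar = norm(loc=120, scale=2 / math.sqrt(3))    # X̄3 ~ N(120, 4/3)

    print(Xbar.cdf(120) - Xbar.cdf(118))            # P(118 ≤ X̄3 ≤ 120) ≈ 0.458
    half_width = norm.ppf(0.975) * 2 / math.sqrt(3) # solves P(|X̄3 − 120| ≤ x) = 0.95
    print(half_width)                               # ≈ 2.26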

Area under the curve

Lognormal distributions

Def: Lognormal distributions If log(X) ∼ N(µ, σ2) then we say that X has the Lognormal distribution with parameters µ and σ2.

The support of the lognormal distribution is (0, ∞). Often used to model time before failure.

Example: Let X and Y be independent random variables such that log(X) ∼ N(1.6, 4.5) and log(Y ) ∼ N(3, 6). What is the distribution of the product XY ?
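By Theorem 5.6.7, log(XY) = log(X) + log(Y) ∼ N(1.6 + 3, 4.5 + 6) = N(4.6, 10.5), so XY is lognormal with parameters 4.6 and 10.5. A simulation sketch confirming this (Python with numpy assumed):

    import numpy as np

    rng = np.random.default_rng(3)
    logX = rng.normal(1.6, np.sqrt(4.5), 500_000)   # note: numpy takes the standard deviation
    logY = rng.normal(3.0, np.sqrt(6.0), 500_000)

    log_prod = logX + logY                          # log(XY) = log(X) + log(Y)
    print(log_prod.mean(), log_prod.var())          # ≈ 4.6 and ≈ 10.5: XY is Lognormal(4.6, 10.5)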

5.10 Bivariate normal distributions

Def: Bivariate normal

Two continuous r.v. X1 and X2 have the bivariate normal distribution with means µ1 and µ2, variances σ1² and σ2², and correlation ρ if they have the joint pdf

f(x1, x2) = 1 / (2π (1 − ρ²)^(1/2) σ1 σ2) ×
  exp( −(1/(2(1 − ρ²))) [ (x1 − µ1)²/σ1² − 2ρ ((x1 − µ1)/σ1)((x2 − µ2)/σ2) + (x2 − µ2)²/σ2² ] )    (1)

Parameter space: µi ∈ R, σi² > 0 for i = 1, 2 and −1 ≤ ρ ≤ 1

Bivariate normal pdf

Bivariate normal pdf with different ρ:

Contours:

Bivariate normal as linear combination

Theorem 5.10.1: Bivariate normal from two ind. standard normals

Let Z1 ∼ N(0, 1) and Z2 ∼ N(0, 1) be independent. Let µi ∈ R, σi² > 0 for i = 1, 2 and −1 ≤ ρ ≤ 1, and let

X1 = σ1 Z1 + µ1
X2 = σ2(ρ Z1 + √(1 − ρ²) Z2) + µ2    (2)

Then the joint distribution of X1 and X2 is bivariate normal with parameters µ1, µ2, σ1², σ2² and ρ
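A simulation sketch of this construction (Python with numpy assumed; the parameter values below are illustrative):

    import numpy as np

    rng = np.random.default_rng(4)
    mu1, mu2, s1, s2, rho = 3.0, 5.0, 2.0, 3.0, 0.6
    n = 200_000

    Z1, Z2 = rng.standard_normal(n), rng.standard_normal(n)
    X1 = s1 * Z1 + mu1
    X2 = s2 * (rho * Z1 + np.sqrt(1 - rho**2) * Z2) + mu2

    print(X1.mean(), X1.var(), X2.mean(), X2.var())   # ≈ µ1, σ1², µ2, σ2²
    print(np.corrcoef(X1, X2)[0, 1])                  # ≈ ρ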

Theorem 5.10.2 (part 1) – the other way

Let X1 and X2 have the pdf in (1). Then there exist independent standard normal r.v. Z1 and Z2 so that (2) holds.

Properties of a bivariate normal

Theorem 5.10.2 (part 2)

Let X1 and X2 have the pdf in (1). Then the marginal distributions are

X1 ∼ N(µ1, σ1²) and X2 ∼ N(µ2, σ2²)

And the correlation between X1 and X2 is ρ

Theorem 5.10.4: The conditional is normal

Let X1 and X2 have the pdf in (1). Then the conditional distribution of X2 given that X1 = x1 is (univariate) normal with

E(X2|X1 = x1) = µ2 + ρσ2(x1 − µ1)/σ1 and
Var(X2|X1 = x1) = (1 − ρ²)σ2²

Properties of a bivariate normal

Theorem 5.10.3: Uncorrelated ⇒ Independent

Let X1 and X2 have the bivariate normal distribution. Then X1 and X2 are independent if and only if they are uncorrelated.

Only holds for the multivariate normal distribution.
One of the very convenient properties of the normal distribution.

Theorem 5.10.5: Linear combinations are normal

Let X1 and X2 have the pdf in (1) and let a1, a2 and b be constants. Then Y = a1X1 + a2X2 + b is normally distributed with

E(Y) = a1µ1 + a2µ2 + b and
Var(Y) = a1²σ1² + a2²σ2² + 2a1a2ρσ1σ2

This extends what we already had for independent normals

Example

Let X1 and X2 have the bivariate normal distribution with means µ1 = 3, µ2 = 5, variances σ1² = 4, σ2² = 9 and correlation ρ = 0.6.
a) Find the distribution of X2 − 2X1.
b) What is the conditional distribution of X2, given that we observed X1 = 2?
c) What is the probability that X1 > X2?
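A worked sketch of a)–c) using Theorems 5.10.4 and 5.10.5 (Python with scipy assumed):

    import math
    from scipy.stats import norm

    mu1, mu2, var1, var2, rho = 3, 5, 4, 9, 0.6
    s1, s2 = math.sqrt(var1), math.sqrt(var2)

    # a) X2 − 2X1 is normal (Theorem 5.10.5), with a1 = −2, a2 = 1, b = 0
    print(mu2 - 2 * mu1, 4 * var1 + var2 + 2 * (-2) * 1 * rho * s1 * s2)   # N(−1, 10.6)

    # b) conditional distribution of X2 given X1 = 2 (Theorem 5.10.4)
    print(mu2 + rho * s2 * (2 - mu1) / s1, (1 - rho**2) * var2)            # N(4.1, 5.76)

    # c) P(X1 > X2) = P(X2 − X1 < 0), where X2 − X1 ~ N(2, 5.8) by Theorem 5.10.5
    print(norm.cdf(0, loc=2, scale=math.sqrt(5.8)))                        # ≈ 0.203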

Multivariate normal – Matrix notation

The pdf of an n-dimensional normal distribution, X ∼ N(µ, Σ):

f(x) = (1 / ((2π)^(n/2) |Σ|^(1/2))) exp(−(1/2)(x − µ)ᵀ Σ⁻¹ (x − µ))

where

 2      σ1 σ1,2 σ1,3 ··· σ1,n µ1 x1 2 σ2,1 σ2 σ2,3 ··· σ2,n µ2 x2  2  µ =   , x =   and Σ = σ3,1 σ3,2 σ ··· σ3,n  .   .   3   .   .   ......   . . . . .  µn xn 2 σn,1 σn,2 σn,3 ··· σn µ is the mean vector and Σ is called the variance-covariance matrix.

Multivariate normal – Matrix notation

The same things hold for the multivariate normal distribution as for the bivariate. Let X ∼ N(µ, Σ).
- Linear combinations of X are normal: AX + b is (multivariate) normal for a fixed matrix A and vector b.

- The marginal distribution of Xi is normal with mean µi and variance σi².
- The off-diagonal elements of Σ are the covariances between individual elements of X, i.e. Cov(Xi, Xj) = σi,j.
- The joint marginal distributions are also normal, where the mean and covariance matrix are found by picking the corresponding elements from µ and rows and columns of Σ.
- The conditional distributions are also normal (multivariate or univariate).
