Module 3: Distributions and the CLT (OA3102)

Professor Ron Fricker, Naval Postgraduate School, Monterey, California
Reading assignment: WM&S chapters 7.1-7.3 and 7.5

Revision: 1-12

Goals for this Module

• Statistics and their distributions
– Deriving a sampling distribution
• Analytically
• Using simulation
• Sampling distributions
– Distribution of the sample mean
– Distributions related to the normal
• Central Limit Theorem
• Normal approximation to the binomial

Definition:

A statistic is a function of observable random variables in a sample and known constants

Statistics and Their Distributions (1)

• Remember, we denote random variables with upper case Roman letters

– E.g., Y1, Y2, Y3, …
– They represent “placeholders” for the actual values once we observe them
• We use lower case Roman letters to denote the observed values: y1, y2, y3, …
• Thus:

– Y1, Y2, Y3,… are random quantities and thus are described by probability distributions

– y1, y2, y3, … are just numbers

Statistics and Their Distributions (2)

• Since Y1, Y2, Y3, … are random variables, so is any function of them
– E.g., the sample mean $\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$ is a statistic
• It’s the mean of n random variables before we observe their values
• Thus, statistics of random variables are random variables themselves
– So, they have their own probability distribution
– It’s called the sampling distribution

Definition: Sampling Distribution

A sampling distribution is the probability distribution of a statistic

Illustrating Random Statistics

Consider drawing samples from a Weibull distribution with a=2 and b=5 (so that the mean is μ = E(X) = 4.43, the median is 4.16, and σ = 2.32)

Six samples of size n=10 drawn from a Weibull distribution

Note that the sample means, medians, and standard deviations are all different – statistics are random variables!

* Figure and table from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.

Demonstrating Randomness

This is a demonstration showing that statistics (i.e., functions of random variables) are random variables too.

TO DEMO

Applets created by Prof Gary McClelland, University of Colorado, Boulder You can access them at www.thomsonedu.com/statistics/book_content/0495110817_wackerly/applets/seeingstats/index.html

Simple Random Sampling (1)

• The sampling distribution of a statistic depends on the:
– Population distribution
– Sample size
– Method of sampling
• For this class, we will always assume simple random sampling (SRS)
– Each X (or Y) in the sample comes from the same distribution and is independent of the other Xs
– Shorthand: they’re independent and identically distributed (iid)

Simple Random Sampling (2)

• In this class, we will be thinking of iid random variables from a probability distribution
– It’s an idealized “model” of the real world
– Implies that the population is infinite in size
• In the real world, populations are often finite
– If we sample with replacement, then SRS still holds
– If we sample without replacement but take less than 5 percent of the population, SRS is a close-enough approximation

Example (Review)

• A balanced (i.e., “fair”) die is tossed three times. Let Y1, Y2, and Y3 be the outcomes, and denote the average of the three outcomes by $\bar{Y}$ (“Y-bar”)
• Find the mean and standard deviation of $\bar{Y}$
– That is, find $\mu_{\bar{Y}}$ and $\sigma_{\bar{Y}}$
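The answer can be checked by brute force, since three rolls have only 6³ = 216 equally likely outcomes. A quick sketch in Python (the course uses R; this is just an illustration using the standard library):

```python
from itertools import product

# Enumerate all 6^3 = 216 equally likely outcomes of three fair-die rolls
outcomes = list(product(range(1, 7), repeat=3))
ybars = [sum(o) / 3 for o in outcomes]

mean_ybar = sum(ybars) / len(ybars)
var_ybar = sum((y - mean_ybar) ** 2 for y in ybars) / len(ybars)

print(mean_ybar)   # 3.5, the same as E(Y) for a single roll
print(var_ybar)    # 35/36, i.e. Var(Y)/n = (35/12)/3
```

The variance of the average is one-third of the single-roll variance, consistent with the iid results later in the module.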

Example (Review)

Analytically Deriving a Sampling Distribution

• Consider the following problem
– The NEX automobile service center charges $40, $45, or $50 for a tune-up on 4, 6, and 8 cylinder cars, respectively
– The pmf of revenue for a random car, X, is

– So, μ = 46.5 and σ² = 15.25
• What’s the distribution of the average revenue from two tune-ups, (X1+X2)/2, assuming the two are independent?

Analytically Deriving a Sampling Distribution, cont’d

• Tabulating all outcomes, associated probabilities, and statistics gives
• Thus, we calculate:
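The tabulation can be reproduced programmatically. A Python sketch; note the pmf values p(40)=0.2, p(45)=0.3, p(50)=0.5 below are the unique probabilities consistent with the stated μ = 46.5 and σ² = 15.25:

```python
from itertools import product

pmf = {40: 0.2, 45: 0.3, 50: 0.5}   # unique pmf matching mu=46.5, var=15.25

# Tabulate all (x1, x2) outcomes and the induced distribution of the average
avg_pmf = {}
for (x1, p1), (x2, p2) in product(pmf.items(), repeat=2):
    xbar = (x1 + x2) / 2
    avg_pmf[xbar] = avg_pmf.get(xbar, 0.0) + p1 * p2

mean = sum(x * q for x, q in avg_pmf.items())
var = sum((x - mean) ** 2 * q for x, q in avg_pmf.items())

print(sorted(avg_pmf))   # support: 40.0, 42.5, 45.0, 47.5, 50.0
print(mean, var)         # 46.5 and 15.25/2 = 7.625
```

The mean matches μ and the variance is σ²/2, anticipating the general results for the sample mean.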

* Table from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.

Picturing the Sampling Distribution

• The two distributions look like this:

Distribution of X | Sampling distribution of (X1+X2)/2
• Note that the:
– Means of the two distributions look to be the same
– Variability of the sampling distribution looks smaller
• This is not an accident…

* Figures from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.

Another Sampling Distribution

• Consider the same service center, but now calculate the sampling distribution of the average revenue from four (independent) tune-ups:

$\bar{X} = \frac{1}{4} \sum_{i=1}^{4} X_i$

• The sampling distribution looks like this

* Figure from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.

Back to the Die Example

• We could do the same thing to derive the sampling distribution for the mean of three rolls of the die
• E.g., we know:
– The outcomes range from $\bar{Y} = 1$ (roll three ones) to $\bar{Y} = 6$ (roll three sixes)
– There are 6³ = 216 possible outcomes of the three rolls, but not all translate into unique $\bar{Y}$ values
– The specific values the sampling distribution can take on are 3/3, 4/3, 5/3, 6/3, 7/3, …, 17/3, 18/3

Example: Analytically Calculating the Sampling Distribution

• Calculate $\Pr(\bar{Y} = 1)$:

• Now calculate $\Pr(\bar{Y} = 4/3)$:

Example: Analytically Calculating the Sampling Distribution

• And now calculate $\Pr(\bar{Y} = 5/3)$:

• Etc…
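The remaining probabilities follow the same counting argument, which is easy to automate. A Python sketch enumerating all 216 outcomes and counting how many give each sum:

```python
from itertools import product
from fractions import Fraction

# Count how many of the 216 outcomes produce each sum of three rolls
counts = {}
for rolls in product(range(1, 7), repeat=3):
    s = sum(rolls)
    counts[s] = counts.get(s, 0) + 1

# Pr(Ybar = s/3) = counts[s] / 216, kept as exact fractions
pr = {s: Fraction(c, 216) for s, c in counts.items()}

print(pr[3], pr[4], pr[5])   # 1/216, 1/72 (=3/216), 1/36 (=6/216)
```

These match the hand calculations: one way to roll three ones, three ways to total 4, six ways to total 5.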

Using Simulation to Approximate the Sampling Distribution

• These calculations are tedious
• We can use R to simulate approximate results instead
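The slides do this in R; an equivalent simulation sketch in Python, repeatedly drawing three rolls and averaging:

```python
import random
from collections import Counter

random.seed(1)
reps = 100_000

# Simulate the mean of three fair-die rolls many times
means = [sum(random.randint(1, 6) for _ in range(3)) / 3 for _ in range(reps)]
freq = Counter(means)

# Empirical Pr(Ybar = 1) should be near the exact value 1/216 (about 0.0046)
print(freq[1.0] / reps)
```

With 100,000 replications the empirical frequencies are close to the exact sampling distribution computed analytically above.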

Now, Fancier…

• The previous plot shows raw frequencies…
• Let’s do some more calculations and clean things up

[R output: the simulated (approximate) sampling distribution, checked against the exact answer]

So, Here’s a Nicer Plot

Simulation

• As we’ve seen, we can use simulation to empirically estimate sampling distributions
– Can be useful when analytical derivation is hard or impossible
• Need to specify:
– Statistic of interest
– Population distribution
– Sample size
– Number of replications

Example

• Statistic: sample mean
• Population distribution: N(8.25, 0.75²)

• Sample size: (a) n=5, (b) n=10, (c) n=20, (d) n=30
• Number of replications: 500 each

* Figures from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.

Another Example

• Statistic: sample mean
• Population distribution: LN(3, 0.16)

• Sample size: (a) n=5, (b) n=10, (c) n=20, (d) n=30
• Number of replications: 500 each

* Figures from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.

Sampling Distributions Related to the Normal

• Distribution of the sample mean (when the population is normally distributed)
• Chi-squared (χ²) distribution
– Sums of squared normally distributed r.v.s
• t distribution
– Ratio of a standard normal r.v. to a function of a chi-squared random variable
• F distribution
– Ratio of (functions of) chi-squared r.v.s

Why Should We Care???

• Eventually we will be doing hypothesis tests and constructing confidence intervals • Important statistics that we will want to test have these sampling distributions • So, it seems pretty esoteric here, but all of these distributions will play important roles in practical, real-world problems

Remember Linear Combinations of Random Variables (see Theorem 5.12)

• Given a collection of n random variables Y1, Y2, …, Yn and n numerical constants a1, a2, …, an, the random variable

$X = \sum_{i=1}^{n} a_i Y_i = a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n$

is called a linear combination of the Yi's
• Note that we get the:
– Total, X = T0, if a1 = a2 = … = an = 1
– Sample mean, X = $\bar{Y}$, if a1 = a2 = … = an = 1/n
• But also note the Yi's are not necessarily iid

Some Useful Facts (1)

• Let Y1, Y2, …, Yn have mean values μ1, μ2, …, μn, respectively, and variances σ1², σ2², …, σn², respectively

1. Whether or not the Yi's are independent,

$E(a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n) = a_1 E(Y_1) + a_2 E(Y_2) + \cdots + a_n E(Y_n) = a_1 \mu_1 + a_2 \mu_2 + \cdots + a_n \mu_n = \sum_{i=1}^{n} a_i \mu_i$

Some Useful Facts (2)

2. If Y1, Y2, …, Yn are independent,

$Var(a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n) = a_1^2 Var(Y_1) + a_2^2 Var(Y_2) + \cdots + a_n^2 Var(Y_n) = a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2 + \cdots + a_n^2 \sigma_n^2$

so that $\sigma^2_{a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n} = a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2 + \cdots + a_n^2 \sigma_n^2$

3. For any Y1, Y2, …, Yn,

$Var(a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j \, Cov(Y_i, Y_j)$

Sampling Distribution of the Sample Mean (Population Normally Dist’d)

• Theorem 7.1: Let Y1, Y2, …, Yn be a random sample of size n from a normal distribution with mean μY and standard deviation σY
• Then

$\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i \sim N\!\left(\mu_Y, \, \sigma_Y^2 / n\right)$

• In particular, note that:
– The sample mean of normally distributed random variables is normally distributed
– Also, $\mu_{\bar{Y}} = \mu_Y$ and $\sigma_{\bar{Y}}^2 = \sigma_Y^2 / n$
– This is true for any sample size n

Proof
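As a numerical sanity check of Theorem 7.1 (complementing the algebraic proof), a Python simulation with assumed values μY = 4, σY = 2, n = 16:

```python
import random

random.seed(7)
mu, sigma, n, reps = 4.0, 2.0, 16, 50_000

# Draw many samples of size n from N(mu, sigma^2) and record each sample mean
ybars = [sum(random.gauss(mu, sigma) for _ in range(n)) / n
         for _ in range(reps)]

m = sum(ybars) / reps
s2 = sum((y - m) ** 2 for y in ybars) / reps

print(m, s2)   # close to mu = 4 and sigma^2/n = 4/16 = 0.25
```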

Proof (continued)

Proof (continued)

Example 7.2

• The amount dispensed (in ounces) by a beer bottling machine is normally distributed with σ² = 1.0. For a sample of size n=9, find the probability that the sample mean is within 0.3 ounces of the true mean μ.
• Solution:
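A sketch of the calculation using Python's standard-library statistics.NormalDist in place of the normal table:

```python
from statistics import NormalDist

sigma, n = 1.0, 9
se = sigma / n ** 0.5                 # standard error of the mean = 1/3
z = 0.3 / se                          # 0.9
# P(|Ybar - mu| <= 0.3) = P(-0.9 <= Z <= 0.9)
prob = 2 * NormalDist().cdf(z) - 1

print(round(prob, 4))                 # 0.6319
```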

Example 7.2 (continued)

Table 4, Appendix 3

Example 7.3

• In Example 7.2, how big a sample size do we need if we want the sample mean to be within 0.3 ounces of μ with probability 0.95?
• Solution:
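A sketch of the sample-size calculation, again using the standard library rather than the table:

```python
from math import ceil
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)    # 1.96: leaves 0.025 in each tail
sigma, tol = 1.0, 0.3

# Require z * sigma / sqrt(n) <= 0.3, i.e. n >= (z * sigma / 0.3)^2
n = ceil((z * sigma / tol) ** 2)

print(n)                           # 43
```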

Example 7.3 (continued)

Sampling Distribution of the Sum of Squared Standard Normal R.V.s

• Theorem 7.2: Let Y1, Y2, …, Yn be defined as in Theorem 7.1. Then

$Z_i = (Y_i - \mu_Y)/\sigma_Y$

are iid standard normal r.v.s and

$\sum_{i=1}^{n} Z_i^2 = \sum_{i=1}^{n} \left(\frac{Y_i - \mu_Y}{\sigma_Y}\right)^2 \sim \chi^2(n)$

where $\chi^2(n)$ denotes a chi-square distribution with n degrees of freedom
• Proof is based on a theorem from Chapter 6, so we’ll skip it

The Chi-squared Distribution

• The chi-squared distribution has one parameter ν
– ν can take on values 1, 2, 3, …
– The distribution is very skewed for lower values of ν
– f(x; ν) is positive only for values of x > 0
• Graphs of three χ² density functions:

* Figure from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.

Looking Up Chi-squared Quantiles

• Can look up in WM&S Table 6
– Note that, because the distribution is not symmetric, we must look up each tail separately
– The table gives the probability in the right tail:

* Figures from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.

WM&S Table 6

Example 7.4

• Let Z1, Z2, …, Z6 be a random sample from the standard normal distribution. Find the number b such that

$\Pr\left(\sum_{i=1}^{6} Z_i^2 \le b\right) = 0.95$

• Solution:
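Without the chi-square table (or R's qchisq), b can be approximated by simulating sums of six squared standard normals and taking the empirical 95th percentile; the tabled value is 12.59:

```python
import random

random.seed(42)
reps = 200_000

# Each draw is a chi-square(6) realization: the sum of 6 squared N(0,1) draws
sums = sorted(sum(random.gauss(0, 1) ** 2 for _ in range(6))
              for _ in range(reps))

b = sums[int(0.95 * reps)]   # empirical 95th percentile
print(b)                     # close to the tabled value 12.59
```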

Sampling Distribution: Ratio of Sample to Population Variance

• Theorem 7.3: Let Y1, Y2, …, Yn be an iid sample from a normal distribution with mean μY and standard deviation σY. Then

$\frac{(n-1)S^2}{\sigma_Y^2} = \frac{1}{\sigma_Y^2} \sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^2 \sim \chi^2(n-1)$

where $\chi^2(n-1)$ denotes a chi-square distribution with n-1 degrees of freedom
• Also, $\bar{Y}$ and S² are independent random variables

Proof (for n=2)

Proof (continued)

Example 7.5

• In Example 7.2, the amount dispensed (in ounces) is normally distributed with σ² = 1.0. For a sample of size n=10, find b1 and b2 such that $\Pr(b_1 \le S^2 \le b_2) = 0.90$.
• Solution:

Example 7.5 (continued)

Sampling Distribution: Sample Mean (Popul’n Normally Dist’d, σ Unknown)

• Definition 7.2: Let Z be a standard normal r.v. and let W be a chi-square distributed r.v. with ν degrees of freedom. Then, if Z and W are independent,

$T = \frac{Z}{\sqrt{W/\nu}} \sim t(\nu)$

where t(ν) is the t distribution with ν dfs
• In particular, note that

$\frac{\bar{Y} - \mu_Y}{S/\sqrt{n}} = \frac{(\bar{Y} - \mu_Y)\big/(\sigma/\sqrt{n})}{\sqrt{\left[(n-1)S^2/\sigma^2\right]\big/(n-1)}} = \frac{Z}{\sqrt{W/(n-1)}} \sim t(n-1)$

Illustrating the t Distribution

[Figure: density curves of the standard normal and the t distribution with 3, 10, and 100 df, plotted on x from -4 to 4]

WM&S Table (Inside Front Cover)

Example 7.6

• The tensile strength of a type of wire is normally distributed with unknown mean μ and variance σ²
– Six pieces are randomly selected from a large roll
– Tensile strength will be measured (Y1, …, Y6)
– We usually use $\bar{Y}$ to estimate μ and S² for σ², so it’s reasonable to estimate $\sigma_{\bar{Y}} = \sigma/\sqrt{n}$ with $S/\sqrt{n}$
• So, find the probability that $\bar{Y}$ will be within $2S/\sqrt{n}$ of the true population mean μ

Example 7.6 Solution

Sampling Distribution: Ratio of Chi-Squared RVs (and Their DFs)

• Definition 7.3: Let W1 and W2 be independent chi-square distributed r.v.s with ν1 and ν2 dfs, respectively. Then

$F = \frac{W_1/\nu_1}{W_2/\nu_2} \sim F(\nu_1, \nu_2)$

where F(ν1, ν2) is the F distribution with ν1 and ν2 dfs
• In particular, note that

$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{\left[(n_1-1)S_1^2/\sigma_1^2\right]\big/(n_1-1)}{\left[(n_2-1)S_2^2/\sigma_2^2\right]\big/(n_2-1)} = \frac{W_1/(n_1-1)}{W_2/(n_2-1)} \sim F(n_1-1, n_2-1)$

The F Distribution

• The F distribution is specified by its two degrees of freedom, ν1 and ν2
• We will often be interested in right-tail probabilities
– Notation: $F_{\alpha, \nu_1, \nu_2}$
• That’s how WM&S Table 7 is set up (next slide)
– For left-tail probabilities, use $F_{1-\alpha, \nu_1, \nu_2} = 1/F_{\alpha, \nu_2, \nu_1}$

* Figure from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.

WM&S Table 7

Example 7.7

• If we take independent samples of size n1=6 and n2=10 from two normal populations with equal population variances, find the number b such that

$\Pr\!\left(\frac{S_1^2}{S_2^2} \le b\right) = 0.95$

• Solution:

Exercise 7.7: Table 7 Excerpt

Finding Probabilities and Quantiles Using R

• R functions:
– Note: the functions are based on cumulative probabilities (i.e., the left tails), not the right tails
– To do calculations like those in the tables, either:
• Use the lower.tail=FALSE option (so p=α), or
• Use the function as is, but remember p=1-α

Back to the Examples

• Example 7.2:
• Example 7.3:

• Example 7.4:

• Example 7.6:

• Example 7.7:

The Central Limit Theorem (CLT)

• The Central Limit Theorem says that, for sufficiently large n[1], sums of iid r.v.s are approximately normally distributed
– As n gets bigger, the approximation gets better
• More precisely, as n→∞, the distribution of

$Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$

converges to a standard normal distribution
– Where E(Y) = μ and Var(Y) = σ²

[1] A generally conservative rule of thumb is n > 30

CLT (continued)

• So, let Y1, Y2, …, Yn be a random sample from any distribution with mean μY and standard deviation σY
• Then, if n is sufficiently large, $\bar{Y}$ has an approximate normal distribution with $\mu_{\bar{Y}} = \mu_Y$ and $\sigma_{\bar{Y}}^2 = \sigma_Y^2/n$
• Similarly, if n is sufficiently large, then

$Z = \frac{\bar{Y} - \mu_Y}{\sigma_Y/\sqrt{n}}$

has an approximate standard normal distribution

Example: Sums of Dice Rolls

[Figure: frequency histograms of the sum of 1, 2, 5, and 10 fair-die rolls; as the number of dice grows, the histogram of the sum becomes increasingly bell-shaped]

Demonstrating the CLT

This is a simulation demonstrating the Central Limit Theorem.

TO DEMO

Applets created by Prof Gary McClelland, University of Colorado, Boulder You can access them at www.thomsonedu.com/statistics/book_content/0495110817_wackerly/applets/seeingstats/index.html

Illustrating the CLT in R

> m <- matrix(data=runif(10000*100), nrow=10000, ncol=100)
> avg1col <- m[,1]
> avg2col <- apply(m[,1:2], 1, mean)
> avg3col <- apply(m[,1:3], 1, mean)
> avg4col <- apply(m[,1:4], 1, mean)
> avg5col <- apply(m[,1:5], 1, mean)
> avg10col <- apply(m[,1:10], 1, mean)
> avg20col <- apply(m[,1:20], 1, mean)
> avg50col <- apply(m[,1:50], 1, mean)
> avg100col <- apply(m[,1:100], 1, mean)
> par(mfrow=c(3,3))
> hist(avg1col, prob=TRUE, xlim=c(0,1))
> curve(dnorm(x, .5, sqrt(1/12)), lwd=2, col="red", add=TRUE)
> hist(avg2col, prob=TRUE, xlim=c(0,1))
> curve(dnorm(x, .5, sqrt(1/(12*2))), lwd=2, col="red", add=TRUE)
> hist(avg3col, prob=TRUE, xlim=c(0,1))
> curve(dnorm(x, .5, sqrt(1/(12*3))), lwd=2, col="red", add=TRUE)
> hist(avg4col, prob=TRUE, xlim=c(0,1))
> curve(dnorm(x, .5, sqrt(1/(12*4))), lwd=2, col="red", add=TRUE)
> hist(avg5col, prob=TRUE, xlim=c(0,1))
> curve(dnorm(x, .5, sqrt(1/(12*5))), lwd=2, col="red", add=TRUE)
> hist(avg10col, prob=TRUE, xlim=c(0,1))
> curve(dnorm(x, .5, sqrt(1/(12*10))), lwd=2, col="red", add=TRUE)
> hist(avg20col, prob=TRUE, xlim=c(0,1))
> curve(dnorm(x, .5, sqrt(1/(12*20))), lwd=2, col="red", add=TRUE)
> hist(avg50col, prob=TRUE, xlim=c(0,1))
> curve(dnorm(x, .5, sqrt(1/(12*50))), lwd=2, col="red", add=TRUE)
> hist(avg100col, prob=TRUE, xlim=c(0,1))
> curve(dnorm(x, .5, sqrt(1/(12*100))), lwd=2, col="red", add=TRUE)

The CLT More Formally

• Theorem 7.4: Let Y1, Y2, …, Yn be iid r.v.s with mean $E(Y_i) = \mu_Y$ and variance $Var(Y_i) = \sigma_Y^2$
• Define

$U_n = \frac{\bar{Y} - \mu_Y}{\sigma_Y/\sqrt{n}} = \frac{\sum_{i=1}^{n} Y_i - n\mu_Y}{\sigma_Y \sqrt{n}}$

• Then, as n→∞, the distribution function of Un converges to the standard normal distribution:

$\lim_{n \to \infty} \Pr(U_n \le u) = \int_{-\infty}^{u} \frac{1}{\sqrt{2\pi}} e^{-t^2/2} \, dt \quad \text{for all } u$

Example 7.8

• For the whole population, achievement scores on a certain test have mean μY = 60 and σY = 8. For a random sample of n=100 scores from students at one school, the average score is 58.
• Is there evidence to suggest this school is inferior? That is, what’s the probability of seeing an average score as low as 58 if the true school average matches the population?

Example 7.8

• Solution:

Example 7.9

• The service times for customers coming through a Navy Exchange checkout counter are iid with μY = 1.5 and σY = 1.0. Approximate the probability that n=100 customers can be served in less than 2 hours.
• Solution:
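A sketch of the calculation, assuming (as the numbers suggest) that the service-time units are minutes, so 2 hours = 120 minutes:

```python
from statistics import NormalDist

mu, sigma, n = 1.5, 1.0, 100          # assumed units: minutes per customer
total_mean = n * mu                    # 150 minutes
total_sd = sigma * n ** 0.5            # 10 minutes
z = (120 - total_mean) / total_sd      # -3
p = NormalDist().cdf(z)                # P(total service time < 120 minutes)

print(round(p, 5))                     # 0.00135
```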

Example 7.9

Checking the Solutions in R

• Example 7.8

• Example 7.9

Normal Approximation to the Binomial

• A r.v. Y ~ Bin(n, p) is the number of “successes” out of n independent trials with probability of success p for each trial
• Define indicator variables X1, X2, …, Xn as

$X_i = \begin{cases} 1 & \text{if the } i\text{th trial is a success} \\ 0 & \text{if the } i\text{th trial is a failure} \end{cases}$

• So, X1, X2, …, Xn are iid Bernoulli r.v.s and we have Y = X1 + X2 + … + Xn
– That is, Y is a sum of iid random variables, so for large enough n the CLT applies

Exercise 7.10

• Candidate A believes she can win an election if she can get 55% of the votes in precinct 1. Assuming 50% of the precinct 1 voters favor her and n=100 random voters show up, what is the (approximate) probability she will receive at least 55% of their votes? • Solution:
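The CLT approximation can be compared against the exact binomial probability, which is directly computable with math.comb:

```python
from math import comb
from statistics import NormalDist

n, p = 100, 0.5

# Exact: P(Y >= 55) for Y ~ Bin(100, 0.5)
exact = sum(comb(n, k) for k in range(55, n + 1)) / 2 ** n

# CLT approximation: Y/n is approximately N(p, pq/n)
se = (p * (1 - p) / n) ** 0.5              # 0.05
approx = 1 - NormalDist().cdf((0.55 - p) / se)

print(exact, approx)   # roughly 0.184 exact vs 0.159 approximate
```

The gap between the two shrinks once the continuity correction (covered at the end of this module) is applied.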

Exercise 7.10

When to Use the Approximation?

• Y and Y/n have an approximate normal distribution for large enough n, but “large enough n” depends on p
• Rule of thumb: the approximation works well when $p \pm 3\sqrt{pq/n}$ lies in the interval (0, 1)
• An equivalent criterion is

$n > 9 \cdot \frac{\max(p, q)}{\min(p, q)}$

– See extra credit Exercise 7.70

Exercise 7.11

• Suppose Y has a binomial distribution with n=25 and p=0.4. Find the exact probabilities that Y ≤ 8 and Y = 8 and compare these with the corresponding values from the normal approximation.
• Exact solutions:
– Table 1 in Appendix 3 gives $\Pr(Y \le 8) = 0.274$ and $\Pr(Y = 8) = \Pr(Y \le 8) - \Pr(Y \le 7) = 0.274 - 0.154 = 0.120$
– In R:
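The same exact values can be computed without tables; a Python sketch of the binomial pmf (the slides do this in R):

```python
from math import comb

n, p = 25, 0.4

def pmf(k):
    # Binomial pmf: C(n, k) * p^k * (1-p)^(n-k)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

p_le_8 = sum(pmf(k) for k in range(9))   # P(Y <= 8)
p_eq_8 = pmf(8)                          # P(Y = 8)

print(round(p_le_8, 3), round(p_eq_8, 3))   # 0.274 0.12
```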

Exercise 7.11

• Solution:

The Continuity Correction

• The issue is that we are approximating a discrete distribution with a continuous one
• So, to improve the approximation, rather than use the discrete value directly, use the value of the continuous distribution halfway between the two discrete values
• In other words:
– Add 0.5 to the value we’re approximating for $\Pr(Y \le y)$ calculations
– Subtract 0.5 from the value we’re approximating for $\Pr(Y \ge y)$ calculations

Exercise 7.11
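Applying this to Exercise 7.11 shows the payoff: the corrected approximation of P(Y ≤ 8) lands much closer to the exact 0.274. A Python sketch:

```python
from statistics import NormalDist

n, p = 25, 0.4
mu = n * p                        # 10
sd = (n * p * (1 - p)) ** 0.5     # sqrt(6), about 2.449
Z = NormalDist()

no_corr = Z.cdf((8 - mu) / sd)      # approx P(Y <= 8), no correction: ~0.207
with_corr = Z.cdf((8.5 - mu) / sd)  # with the +0.5 correction: ~0.270

print(no_corr, with_corr)
```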

• Solution with continuity correction:

What We Covered in this Module

• Statistics and their distributions
– Deriving a sampling distribution
• Analytically
• Using simulation
• Sampling distributions
– Distribution of the sample mean
– Distributions related to the normal
• Central Limit Theorem
• Normal approximation to the binomial

Homework

• WM&S chapter 7
– Required exercises: 1, 2, 9, 25, 31a-c, 48, 49, 72, 73
– Extra credit: 15a&b, 70
• Useful hints:
– Problem 7.1: Get to the applet more directly at www.thomsonedu.com/statistics/book_content/0495110817_wackerly/applets/seeingstats/index.html. Click on “7. Distributions to the Normal” > DiceSample
– Problem 7.25 part b: Use R, not the applet. The relevant R function is qt(p, df, lower.tail=FALSE)
– Problem 7.31: Solutions in the back of the book are wrong.
