Statistics Sampling Distribution Note
9/30/2015
MATH/STAT360
CH5: Statistics and their distributions

Statistics
• A statistic is any quantity whose value can be calculated from sample data.
• A statistic can be thought of as a random variable.

Sampling Distribution
• Any statistic, being a random variable, has a probability distribution.
• The probability distribution of a statistic is sometimes referred to as its sampling distribution.

Note
• In this group of notes we will look at examples where we know the population and its parameters.
• This is to give us insight into how to proceed when we have large populations with unknown parameters (which is the more typical scenario).

The “Meta-Experiment”
• The “Meta-Experiment” consists of indefinitely many repetitions of the same experiment.
• If the experiment is taking a sample of 100 items from a population, the meta-experiment is to repeatedly take samples of 100 items from the population.
• This is a theoretical construct to help us understand the probabilities involved in our experiment.

Sample Statistics
[Diagram: within the meta-experiment, each experiment draws a sample of n from the population and computes a sample statistic; this is repeated indefinitely.]

Distribution of the Sample Mean
Let X1, X2,…,Xn be a random sample from a distribution with mean value µ and standard deviation σ. Then
1. $E(\bar{X}) = \mu_{\bar{X}} = \mu$
2. $V(\bar{X}) = \sigma_{\bar{X}}^2 = \sigma^2 / n$, so $SD(\bar{X}) = \sigma_{\bar{X}} = \sigma / \sqrt{n}$

Example: Random Rectangles
100 Rectangles with µ=7.42 and σ=5.26.
[Figure: Histogram of Areas; x-axis: Areas, y-axis: frequency]

Based on 68 random samples of size 5:
Mean of the sample means=7.33
SD of the sample means=1.88.
[Figure: Histogram of Sample Means; x-axis: Means, y-axis: frequency]

So, the distribution of the sample mean, based on samples of size 5, should have
1. $E(\bar{X}) = 7.42$
2. $SD(\bar{X}) = 5.26 / \sqrt{5} \approx 2.35$

Normal Distributions
• Let X1, X2,…,Xn be a random sample from a normal distribution with mean value µ and standard deviation σ.
• Then for any n, $\bar{X}$ is normally distributed (with mean µ and standard deviation $\sigma / \sqrt{n}$).

Example: Women’s Heights
• It is known that women’s heights are normally distributed with population mean 64.5 inches and population standard deviation 2.5 inches.
• We will look at the distribution of sample means for various sample sizes.
• Since the population follows a normal distribution, the sampling distribution of $\bar{X}$ is also normal regardless of sample size.

For n=9, the sample means will be normally distributed with mean=64.5 and standard deviation $2.5 / \sqrt{9} \approx 0.83$.
[Figure: Distribution of Sample Means (n=9); x-axis: Sample Mean, y-axis: Height of Curve]

For n=25, the sample means will be normally distributed with mean=64.5 and standard deviation $2.5 / \sqrt{25} = 0.5$.
[Figure: Distribution of Sample Means (n=25); x-axis: Sample Mean, y-axis: Height of Curve]

For n=100, the sample means will be normally distributed with mean=64.5 and standard deviation $2.5 / \sqrt{100} = 0.25$.
[Figure: Distribution of Sample Means (n=100); x-axis: Sample Mean, y-axis: Height of Curve]

The Central Limit Theorem
• Let X1, X2,…,Xn be a random sample from a distribution with mean value µ and standard deviation σ.
• Then if n is sufficiently large, $\bar{X}$ has approximately a normal distribution with $E(\bar{X}) = \mu$ and $V(\bar{X}) = \sigma^2 / n$.

Rule of Thumb
• If n>30, the Central Limit Theorem can be used.
• For highly skewed data or data with extreme outliers, it may take samples of 40+ before the CLT starts “working”.

Sampling Distribution Simulation
http://onlinestatbook.com/stat_sim/index.html
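The meta-experiment and the σ/√n formula in these notes can also be tried out in code. The following is a minimal sketch, not part of the original slides: the function names `standard_error` and `simulate_sample_means` are made up for illustration, and the exponential population (mean 1, standard deviation 1) is a hypothetical stand-in for a skewed population. It uses only the Python standard library.

```python
import math
import random
import statistics


def standard_error(sigma, n):
    """SD of the sample mean for samples of size n: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def simulate_sample_means(draw, n, reps, seed=0):
    """Meta-experiment: repeat 'draw a sample of size n and average it' reps times.

    draw is a function taking a random.Random and returning one observation.
    """
    rng = random.Random(seed)
    return [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(reps)]


# Worked values from the notes.
se_rectangles = standard_error(5.26, 5)   # 5.26 / sqrt(5), about 2.35
se_heights = {n: standard_error(2.5, n) for n in (9, 25, 100)}

# Hypothetical skewed population (an assumption, not from the notes):
# exponential with mean 1 and SD 1, sampled with n = 40.
means = simulate_sample_means(lambda rng: rng.expovariate(1.0), n=40, reps=5000)
sim_mean = statistics.fmean(means)   # should land near mu = 1
sim_sd = statistics.stdev(means)     # should land near 1 / sqrt(40)

print(f"rectangles: SD of sample mean = {se_rectangles:.2f}")
for n, se in se_heights.items():
    print(f"heights, n={n}: SD of sample mean = {se:.2f}")
print(f"simulated: mean of means = {sim_mean:.3f}, SD of means = {sim_sd:.3f}")
print(f"theory:    mean = 1.000, SD = {standard_error(1.0, 40):.3f}")
```

A histogram of `means` would play the role of the "Histogram of Sample Means" in the rectangles example. Changing `n` shows how the spread shrinks as the sample size grows.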

Why is the Central Limit Theorem Important?
• Every different type of population needs a different set of procedures (i.e. probability tables) to answer questions about probability.
• The CLT shows us that we can use the same procedure (normal probability procedures) for questions about probability and the sample mean, regardless of the shape of the original population.
• The only requirement is a “large” sample.

Dependence on Sample Size
The sampling distribution of $\bar{X}$ depends on the sample size (n) in two ways:
1. The standard deviation is $\sigma_{\bar{X}} = \sigma / \sqrt{n}$, which is inversely proportional to $\sqrt{n}$.
2. If the population distribution is not normal, then the shape of the sampling distribution of $\bar{X}$ depends on n, being more nearly normal for larger n.

Proportions as Means
• Recall that a binomial RV X is the number of successes in an experiment consisting of n independent success/failure trials.
• Let
$$X_i = \begin{cases} 1 & \text{if the } i\text{th trial results in a success} \\ 0 & \text{if the } i\text{th trial results in a failure} \end{cases}$$
• Then the sample proportion can be expressed as
$$\hat{p} = \frac{\#\text{successes}}{n} = \frac{\sum X_i}{n} = \bar{X}$$

Distribution of Sample Proportion
• The CLT implies that if n is sufficiently large, then $\hat{p}$ has approximately a normal distribution with
$$E(\hat{p}) = p \quad \text{and} \quad V(\hat{p}) = \frac{p(1-p)}{n}$$
where p is the true population proportion.
• In order for the approximation to hold we need np≥10 and n(1-p)≥10.

Example: Suppose we took samples of size 20 from a population where p=0.5.
The sampling distribution of $\hat{p}$ is approximately Normal with mean = 0.5 and variance $= p(1-p)/n = 0.5 \times 0.5 / 20 = 0.0125$.
[Figure: sampling distribution of p-hat; x-axis: p-hat, y-axis: Probability]
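
The sample-proportion example can be checked the same way. This is a rough sketch under stated assumptions rather than anything from the original slides: the helper names `phat_mean_var`, `normal_approx_ok`, and `simulate_phat` are hypothetical, and the simulation treats each trial as an independent 0/1 indicator so that p-hat is the mean of the indicators.

```python
import random
import statistics


def phat_mean_var(p, n):
    """Mean and variance of the sample proportion: p and p(1 - p) / n."""
    return p, p * (1 - p) / n


def normal_approx_ok(p, n, threshold=10):
    """Rule from the notes: need n*p >= 10 and n*(1 - p) >= 10."""
    return n * p >= threshold and n * (1 - p) >= threshold


def simulate_phat(p, n, reps, seed=0):
    """Repeat 'run n success/failure trials and record the proportion' reps times."""
    rng = random.Random(seed)
    return [
        sum(1 if rng.random() < p else 0 for _ in range(n)) / n
        for _ in range(reps)
    ]


# Example from the notes: n = 20, p = 0.5.
mean, var = phat_mean_var(0.5, 20)   # mean 0.5, variance 0.0125
ok = normal_approx_ok(0.5, 20)       # True, since 20 * 0.5 = 10 meets the rule

phats = simulate_phat(0.5, 20, reps=10000)
sim_mean = statistics.fmean(phats)       # should land near 0.5
sim_var = statistics.variance(phats)     # should land near 0.0125

print(f"theory:    mean = {mean}, variance = {var}")
print(f"simulated: mean = {sim_mean:.4f}, variance = {sim_var:.4f}")
print(f"normal approximation rule satisfied: {ok}")
```

Note that n = 20 with p = 0.5 sits exactly on the boundary of the np ≥ 10 rule; calling `normal_approx_ok(0.5, 10)` returns `False` because np = 5.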