Module 3: Distributions and the CLT (OA3102)

Professor Ron Fricker, Naval Postgraduate School, Monterey, California
Reading assignment: WM&S chapters 7.1-7.3 and 7.5

Revision: 1-12

Goals for this Module

• Statistics and their distributions
– Deriving a sampling distribution
• Analytically
• Using simulation
• Sampling distributions
– Distribution of the sample mean
– Distributions related to the normal
• Central Limit Theorem
• Normal approximation to the binomial

Definition:

A statistic is a function of observable random variables in a sample and known constants

Statistics and Their Distributions (1)

• Remember, we denote random variables with upper case Roman letters

– E.g., Y1, Y2, Y3, …
– They represent “placeholders” for the actual values once we observe them
• We use lower case Roman letters to denote the observed values: y1, y2, y3, …
• Thus:

– Y1, Y2, Y3,… are random quantities and thus are described by probability distributions

– y1, y2, y3, … are just numbers

Statistics and Their Distributions (2)

• Since Y1, Y2, Y3, … are random variables, so is any function of them
– E.g., the sample mean $\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$ is a statistic
• It’s the mean of n random variables before we observe their values
• Thus, statistics of random variables are random variables themselves
– So, they have their own probability distribution
– It’s called the sampling distribution

Definition: Sampling Distribution

A sampling distribution is the probability distribution of a statistic

Illustrating Random Statistics

Consider drawing samples from a Weibull distribution with a=2 and b=5 (so that the mean is μ = E(X) = 4.43, the median is 4.16, and σ = 2.32)

Six samples of size n=10 drawn from a Weibull distribution

Note that the sample means, medians, and standard deviations are all different – statistics are random variables!

* Figure and table from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.

Demonstrating Randomness

This is a demonstration showing that statistics (i.e., functions of random variables) are random variables too.

TO DEMO

Applets created by Prof Gary McClelland, University of Colorado, Boulder You can access them at www.thomsonedu.com/statistics/book_content/0495110817_wackerly/applets/seeingstats/index.html

Simple Random Sampling (1)

• The sampling distribution of a statistic depends on the:
– Population distribution
– Sample size
– Method of sampling
• For this class, we will always assume simple random sampling (SRS)
– Each X (or Y) in the sample comes from the same distribution and is independent of the other Xs
– Shorthand: they’re independent and identically distributed (iid)

Simple Random Sampling (2)

• In this class, we will be thinking of iid random variables from a probability distribution
– It’s an idealized “model” of the real world
– Implies that the population is infinite in size
• In the real world, populations are often finite
– If we sample with replacement, then SRS still holds
– If we sample without replacement but take less than 5 percent of the population, SRS is a close-enough approximation

Example (Review)

• A balanced (i.e., “fair”) die is tossed three times. Let Y1, Y2, and Y3 be the outcomes, and denote the average of the three outcomes by $\bar{Y}$ (“Y-bar”)
• Find the mean and standard deviation of $\bar{Y}$
– That is, find $\mu_{\bar{Y}}$ and $\sigma_{\bar{Y}}$
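The answer can be checked by brute force, since three rolls have only 6³ = 216 equally likely outcomes. A quick sketch in Python (the course uses R; this is just an illustration using the standard library):

```python
from itertools import product

# Enumerate all 6^3 = 216 equally likely outcomes of three fair-die rolls
outcomes = list(product(range(1, 7), repeat=3))
ybars = [sum(o) / 3 for o in outcomes]

mean_ybar = sum(ybars) / len(ybars)
var_ybar = sum((y - mean_ybar) ** 2 for y in ybars) / len(ybars)

print(mean_ybar)   # 3.5, the same as E(Y) for a single roll
print(var_ybar)    # 35/36, i.e. Var(Y)/n = (35/12)/3
```

The variance of the average is one-third of the single-roll variance, consistent with the iid results later in the module.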

Example (Review)

Analytically Deriving a Sampling Distribution

• Consider the following problem
– The NEX automobile service center charges $40, $45, or $50 for a tune-up on 4, 6, and 8 cylinder cars, respectively
– The pmf of revenue for a random car, X, is

– So, μ = 46.5 and σ² = 15.25
• What’s the distribution of the average revenue from two tune-ups, (X1+X2)/2, assuming the two are independent?

Analytically Deriving a Sampling Distribution, cont’d

• Tabulating all outcomes, associated probabilities, and statistics gives
• Thus, we calculate:
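The tabulation can be reproduced programmatically. A Python sketch; note the pmf values p(40)=0.2, p(45)=0.3, p(50)=0.5 below are the unique probabilities consistent with the stated μ = 46.5 and σ² = 15.25:

```python
from itertools import product

pmf = {40: 0.2, 45: 0.3, 50: 0.5}   # unique pmf matching mu=46.5, var=15.25

# Tabulate all (x1, x2) outcomes and the induced distribution of the average
avg_pmf = {}
for (x1, p1), (x2, p2) in product(pmf.items(), repeat=2):
    xbar = (x1 + x2) / 2
    avg_pmf[xbar] = avg_pmf.get(xbar, 0.0) + p1 * p2

mean = sum(x * q for x, q in avg_pmf.items())
var = sum((x - mean) ** 2 * q for x, q in avg_pmf.items())

print(sorted(avg_pmf))   # support: 40.0, 42.5, 45.0, 47.5, 50.0
print(mean, var)         # 46.5 and 15.25/2 = 7.625
```

The mean matches μ and the variance is σ²/2, anticipating the general results for the sample mean.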

* Table from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.

Picturing the Sampling Distribution

• The two distributions look like this:

Distribution of X | Sampling distribution of (X1+X2)/2
• Note that the:
– Means of the two distributions look to be the same
– Variability of the sampling distribution looks smaller
• This is not an accident…

* Figures from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.

Another Sampling Distribution

• Consider the same service center, but now calculate the sampling distribution of the average revenue from four (independent) tune-ups:

$\bar{X} = \frac{1}{4} \sum_{i=1}^{4} X_i$

• The sampling distribution looks like this

* Figure from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.

Back to the Die Example

• We could do the same thing to derive the sampling distribution for the mean of three rolls of the die
• E.g., we know:
– The outcomes range from $\bar{Y} = 1$ (roll three ones) to $\bar{Y} = 6$ (roll three sixes)
– There are 6³ = 216 possible outcomes of the three rolls, but not all translate into unique $\bar{Y}$ values
– The specific values the sampling distribution can take on are 3/3, 4/3, 5/3, 6/3, 7/3, …, 17/3, 18/3

Example: Analytically Calculating the Sampling Distribution

• Calculate $\Pr(\bar{Y} = 1)$:

• Now calculate $\Pr(\bar{Y} = 4/3)$:

Example: Analytically Calculating the Sampling Distribution

• And now calculate $\Pr(\bar{Y} = 5/3)$:

• Etc…
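The remaining probabilities follow the same counting argument, which is easy to automate. A Python sketch enumerating all 216 outcomes and counting how many give each sum:

```python
from itertools import product
from fractions import Fraction

# Count how many of the 216 outcomes produce each sum of three rolls
counts = {}
for rolls in product(range(1, 7), repeat=3):
    s = sum(rolls)
    counts[s] = counts.get(s, 0) + 1

# Pr(Ybar = s/3) = counts[s] / 216, kept as exact fractions
pr = {s: Fraction(c, 216) for s, c in counts.items()}

print(pr[3], pr[4], pr[5])   # 1/216, 1/72 (=3/216), 1/36 (=6/216)
```

These match the hand calculations: one way to roll three ones, three ways to total 4, six ways to total 5.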

Using Simulation to Approximate the Sampling Distribution

• These calculations are tedious
• We can use R to simulate approximate results instead
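The slides do this in R; an equivalent simulation sketch in Python, repeatedly drawing three rolls and averaging:

```python
import random
from collections import Counter

random.seed(1)
reps = 100_000

# Simulate the mean of three fair-die rolls many times
means = [sum(random.randint(1, 6) for _ in range(3)) / 3 for _ in range(reps)]
freq = Counter(means)

# Empirical Pr(Ybar = 1) should be near the exact value 1/216 (about 0.0046)
print(freq[1.0] / reps)
```

With 100,000 replications the empirical frequencies are close to the exact sampling distribution computed analytically above.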

Now, Fancier…

• The previous plot shows raw frequencies…
• Let’s do some more calculations and clean things up

[R output: the simulated (approximate) sampling distribution, checked against the exact answer]

So, Here’s a Nicer Plot

Simulation

• As we’ve seen, we can use simulation to empirically estimate sampling distributions
– Can be useful when analytical derivation is hard or impossible
• Need to specify:
– Statistic of interest
– Population distribution
– Sample size
– Number of replications

Example

• Statistic: sample mean
• Population distribution: N(8.25, 0.75²)

• Sample size: (a) n=5, (b) n=10, (c) n=20, (d) n=30
• Number of replications: 500 each

* Figures from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.

Another Example

• Statistic: sample mean
• Population distribution: LN(3, 0.16)

• Sample size: (a) n=5, (b) n=10, (c) n=20, (d) n=30
• Number of replications: 500 each

* Figures from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.

Sampling Distributions Related to the Normal

• Distribution of the sample mean (when the population is normally distributed)
• Chi-squared (χ²) distribution
– Sums of squared normally distributed r.v.s
• t distribution
– Ratio of a standard normal r.v. to a function of a chi-squared random variable
• F distribution
– Ratio of (functions of) chi-squared r.v.s

Why Should We Care???

• Eventually we will be doing hypothesis tests and constructing confidence intervals • Important statistics that we will want to test have these sampling distributions • So, it seems pretty esoteric here, but all of these distributions will play important roles in practical, real-world problems

Remember Linear Combinations of Random Variables (see Theorem 5.12)

• Given a collection of n random variables Y1, Y2, …, Yn and n numerical constants a1, a2, …, an, the random variable

$X = \sum_{i=1}^{n} a_i Y_i = a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n$

is called a linear combination of the Yi's
• Note that we get the:
– Total, X = T0, if a1 = a2 = … = an = 1
– Sample mean, X = $\bar{Y}$, if a1 = a2 = … = an = 1/n
• But also note the Yi's are not necessarily iid

Some Useful Facts (1)

• Let Y1, Y2, …, Yn have mean values μ1, μ2, …, μn, respectively, and variances σ1², σ2², …, σn², respectively

1. Whether or not the Yi's are independent,

$E(a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n) = a_1 E(Y_1) + a_2 E(Y_2) + \cdots + a_n E(Y_n) = a_1 \mu_1 + a_2 \mu_2 + \cdots + a_n \mu_n = \sum_{i=1}^{n} a_i \mu_i$

Some Useful Facts (2)

2. If Y1, Y2, …, Yn are independent,

$Var(a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n) = a_1^2 Var(Y_1) + a_2^2 Var(Y_2) + \cdots + a_n^2 Var(Y_n) = a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2 + \cdots + a_n^2 \sigma_n^2$

so that $\sigma^2_{a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n} = a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2 + \cdots + a_n^2 \sigma_n^2$

3. For any Y1, Y2, …, Yn,

$Var(a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j \, Cov(Y_i, Y_j)$

Sampling Distribution of the Sample Mean (Population Normally Dist’d)

• Theorem 7.1: Let Y1, Y2, …, Yn be a random sample of size n from a normal distribution with mean μY and standard deviation σY
• Then

$\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i \sim N\!\left(\mu_Y, \, \sigma_Y^2 / n\right)$

• In particular, note that:
– The sample mean of normally distributed random variables is normally distributed
– Also, $\mu_{\bar{Y}} = \mu_Y$ and $\sigma_{\bar{Y}}^2 = \sigma_Y^2 / n$
– This is true for any sample size n

Proof
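As a numerical sanity check of Theorem 7.1 (complementing the algebraic proof), a Python simulation with assumed values μY = 4, σY = 2, n = 16:

```python
import random

random.seed(7)
mu, sigma, n, reps = 4.0, 2.0, 16, 50_000

# Draw many samples of size n from N(mu, sigma^2) and record each sample mean
ybars = [sum(random.gauss(mu, sigma) for _ in range(n)) / n
         for _ in range(reps)]

m = sum(ybars) / reps
s2 = sum((y - m) ** 2 for y in ybars) / reps

print(m, s2)   # close to mu = 4 and sigma^2/n = 4/16 = 0.25
```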

Proof (continued)

Proof (continued)

Example 7.2

• The amount dispensed (in ounces) by a beer bottling machine is normally distributed with σ² = 1.0. For a sample of size n=9, find the probability that the sample mean is within 0.3 ounces of the true mean μ.
• Solution:
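A sketch of the calculation using Python's standard-library statistics.NormalDist in place of the normal table:

```python
from statistics import NormalDist

sigma, n = 1.0, 9
se = sigma / n ** 0.5                 # standard error of the mean = 1/3
z = 0.3 / se                          # 0.9
# P(|Ybar - mu| <= 0.3) = P(-0.9 <= Z <= 0.9)
prob = 2 * NormalDist().cdf(z) - 1

print(round(prob, 4))                 # 0.6319
```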

Example 7.2 (continued)

Table 4, Appendix 3

Example 7.3

• In Example 7.2, how big a sample size do we need if we want the sample mean to be within 0.3 ounces of μ with probability 0.95?
• Solution:
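A sketch of the sample-size calculation, again using the standard library rather than the table:

```python
from math import ceil
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)    # 1.96: leaves 0.025 in each tail
sigma, tol = 1.0, 0.3

# Require z * sigma / sqrt(n) <= 0.3, i.e. n >= (z * sigma / 0.3)^2
n = ceil((z * sigma / tol) ** 2)

print(n)                           # 43
```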

Example 7.3 (continued)

Sampling Distribution of the Sum of Squared Standard Normal R.V.s

• Theorem 7.2: Let Y1, Y2, …, Yn be defined as in Theorem 7.1. Then

$Z_i = (Y_i - \mu_Y)/\sigma_Y$

are iid standard normal r.v.s and

$\sum_{i=1}^{n} Z_i^2 = \sum_{i=1}^{n} \left(\frac{Y_i - \mu_Y}{\sigma_Y}\right)^2 \sim \chi^2(n)$

where $\chi^2(n)$ denotes a chi-square distribution with n degrees of freedom
• Proof is based on a theorem from Chapter 6, so we’ll skip it

The Chi-squared Distribution

• The chi-squared distribution has one parameter ν
– ν can take on values 1, 2, 3, …
– The distribution is very skewed for lower values of ν
– f(x; ν) is positive only for values of x > 0
• Graphs of three χ² density functions:

* Figure from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.

Looking Up Chi-squared Quantiles

• Can look up in WM&S Table 6
– Note that, because the distribution is not symmetric, we must look up each tail separately
– The table gives the probability in the right tail:

* Figures from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.

WM&S Table 6

Example 7.4

• Let Z1, Z2, …, Z6 be a random sample from the standard normal distribution. Find the number b such that

$\Pr\left(\sum_{i=1}^{6} Z_i^2 \le b\right) = 0.95$

• Solution:
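Without the chi-square table (or R's qchisq), b can be approximated by simulating sums of six squared standard normals and taking the empirical 95th percentile; the tabled value is 12.59:

```python
import random

random.seed(42)
reps = 200_000

# Each draw is a chi-square(6) realization: the sum of 6 squared N(0,1) draws
sums = sorted(sum(random.gauss(0, 1) ** 2 for _ in range(6))
              for _ in range(reps))

b = sums[int(0.95 * reps)]   # empirical 95th percentile
print(b)                     # close to the tabled value 12.59
```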

Sampling Distribution: Ratio of Sample to Population Variance

• Theorem 7.3: Let Y1, Y2, …, Yn be an iid sample from a normal distribution with mean μY and standard deviation σY. Then

$\frac{(n-1)S^2}{\sigma_Y^2} = \frac{1}{\sigma_Y^2} \sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^2 \sim \chi^2(n-1)$

where $\chi^2(n-1)$ denotes a chi-square distribution with n-1 degrees of freedom
• Also, $\bar{Y}$ and S² are independent random variables

Proof (for n=2)

Proof (continued)

Example 7.5

• In Example 7.2, the amount dispensed (in ounces) is normally distributed with σ² = 1.0. For a sample of size n=10, find b1 and b2 such that $\Pr(b_1 \le S^2 \le b_2) = 0.90$.
• Solution:

Example 7.5 (continued)

Sampling Distribution: Sample Mean (Popul’n Normally Dist’d, σ Unknown)

• Definition 7.2: Let Z be a standard normal r.v. and let W be a chi-square distributed r.v. with ν degrees of freedom. Then, if Z and W are independent,

$T = \frac{Z}{\sqrt{W/\nu}} \sim t(\nu)$

where t(ν) is the t distribution with ν dfs
• In particular, note that

$\frac{\bar{Y} - \mu_Y}{S/\sqrt{n}} = \frac{(\bar{Y} - \mu_Y)\big/(\sigma/\sqrt{n})}{\sqrt{\left[(n-1)S^2/\sigma^2\right]\big/(n-1)}} = \frac{Z}{\sqrt{W/(n-1)}} \sim t(n-1)$

Illustrating the t Distribution

[Figure: density curves of the standard normal and the t distribution with 3, 10, and 100 df, plotted on x from -4 to 4]

WM&S Table (Inside Front Cover)

Example 7.6

• The tensile strength of a type of wire is normally distributed with unknown mean μ and variance σ²
– Six pieces are randomly selected from a large roll
– Tensile strength will be measured (Y1, …, Y6)
– We usually use $\bar{Y}$ to estimate μ and S² for σ², so it’s reasonable to estimate $\sigma_{\bar{Y}} = \sigma/\sqrt{n}$ with $S/\sqrt{n}$
• So, find the probability that $\bar{Y}$ will be within $2S/\sqrt{n}$ of the true population mean μ

Example 7.6 Solution

Sampling Distribution: Ratio of Chi-Squared RVs (and Their DFs)

• Definition 7.3: Let W1 and W2 be independent chi-square distributed r.v.s with ν1 and ν2 dfs, respectively. Then

$F = \frac{W_1/\nu_1}{W_2/\nu_2} \sim F(\nu_1, \nu_2)$

where F(ν1, ν2) is the F distribution with ν1 and ν2 dfs
• In particular, note that

$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{\left[(n_1-1)S_1^2/\sigma_1^2\right]\big/(n_1-1)}{\left[(n_2-1)S_2^2/\sigma_2^2\right]\big/(n_2-1)} = \frac{W_1/(n_1-1)}{W_2/(n_2-1)} \sim F(n_1-1, n_2-1)$

The F Distribution

• The F distribution is specified by its two degrees of freedom, ν1 and ν2
• We will often be interested in right-tail probabilities
– Notation: $F_{\alpha, \nu_1, \nu_2}$
• That’s how WM&S Table 7 is set up (next slide)
– For left-tail probabilities, use $F_{1-\alpha, \nu_1, \nu_2} = 1/F_{\alpha, \nu_2, \nu_1}$

* Figure from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.

WM&S Table 7

Example 7.7

• If we take independent samples of size n1=6 and n2=10 from two normal populations with equal population variances, find the number b such that

$\Pr\!\left(\frac{S_1^2}{S_2^2} \le b\right) = 0.95$

• Solution:

Exercise 7.7: Table 7 Excerpt

Finding Probabilities and Quantiles Using R

• R functions:
– Note: the functions are based on cumulative probabilities (i.e., the left tails), not the right tails
– To do calculations like those in the tables, either:
• Use the lower.tail=FALSE option (so p=α), or
• Use the function as is, but remember p=1-α

Back to the Examples

• Example 7.2:
• Example 7.3:

• Example 7.4:

• Example 7.6:

• Example 7.7:

The Central Limit Theorem (CLT)

• The Central Limit Theorem says that, for sufficiently large n[1], sums of iid r.v.s are approximately normally distributed
– As n gets bigger, the approximation gets better
• More precisely, as n→∞, the distribution of

$Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$

converges to a standard normal distribution
– Where E(Y) = μ and Var(Y) = σ²

[1] A generally conservative rule of thumb is n > 30

CLT (continued)

• So, let Y1, Y2, …, Yn be a random sample from any distribution with mean μY and standard deviation σY
• Then, if n is sufficiently large, $\bar{Y}$ has an approximate normal distribution with $\mu_{\bar{Y}} = \mu_Y$ and $\sigma_{\bar{Y}}^2 = \sigma_Y^2/n$
• Similarly, if n is sufficiently large, then

$Z = \frac{\bar{Y} - \mu_Y}{\sigma_Y/\sqrt{n}}$

has an approximate standard normal distribution

Example: Sums of Dice Rolls

[Figure: frequency histograms of the sum of 1, 2, 5, and 10 fair-die rolls; as the number of dice grows, the histogram of the sum becomes increasingly bell-shaped]

Demonstrating the CLT

This is a simulation demonstrating the Central Limit Theorem.

TO DEMO

Applets created by Prof Gary McClelland, University of Colorado, Boulder You can access them at www.thomsonedu.com/statistics/book_content/0495110817_wackerly/applets/seeingstats/index.html

Illustrating the CLT in R

> m <- matrix(data=runif(10000*100), nrow=10000, ncol=100)
> avg1col <- m[,1]
> avg2col <- apply(m[,1:2], 1, mean)
> avg3col <- apply(m[,1:3], 1, mean)
> avg4col <- apply(m[,1:4], 1, mean)
> avg5col <- apply(m[,1:5], 1, mean)
> avg10col <- apply(m[,1:10], 1, mean)
> avg20col <- apply(m[,1:20], 1, mean)
> avg50col <- apply(m[,1:50], 1, mean)
> avg100col <- apply(m[,1:100], 1, mean)
> par(mfrow=c(3,3))
> hist(avg1col, prob=TRUE, xlim=c(0,1))
> curve(dnorm(x, .5, sqrt(1/12)), lwd=2, col="red", add=TRUE)
> hist(avg2col, prob=TRUE, xlim=c(0,1))
> curve(dnorm(x, .5, sqrt(1/(12*2))), lwd=2, col="red", add=TRUE)
> hist(avg3col, prob=TRUE, xlim=c(0,1))
> curve(dnorm(x, .5, sqrt(1/(12*3))), lwd=2, col="red", add=TRUE)
> hist(avg4col, prob=TRUE, xlim=c(0,1))
> curve(dnorm(x, .5, sqrt(1/(12*4))), lwd=2, col="red", add=TRUE)
> hist(avg5col, prob=TRUE, xlim=c(0,1))
> curve(dnorm(x, .5, sqrt(1/(12*5))), lwd=2, col="red", add=TRUE)
> hist(avg10col, prob=TRUE, xlim=c(0,1))
> curve(dnorm(x, .5, sqrt(1/(12*10))), lwd=2, col="red", add=TRUE)
> hist(avg20col, prob=TRUE, xlim=c(0,1))
> curve(dnorm(x, .5, sqrt(1/(12*20))), lwd=2, col="red", add=TRUE)
> hist(avg50col, prob=TRUE, xlim=c(0,1))
> curve(dnorm(x, .5, sqrt(1/(12*50))), lwd=2, col="red", add=TRUE)
> hist(avg100col, prob=TRUE, xlim=c(0,1))
> curve(dnorm(x, .5, sqrt(1/(12*100))), lwd=2, col="red", add=TRUE)

The CLT More Formally

• Theorem 7.4: Let Y1, Y2, …, Yn be iid r.v.s with mean $E(Y_i) = \mu_Y$ and variance $Var(Y_i) = \sigma_Y^2$
• Define

$U_n = \frac{\bar{Y} - \mu_Y}{\sigma_Y/\sqrt{n}} = \frac{\sum_{i=1}^{n} Y_i - n\mu_Y}{\sigma_Y \sqrt{n}}$

• Then, as n→∞, the distribution function of Un converges to the standard normal distribution:

$\lim_{n \to \infty} \Pr(U_n \le u) = \int_{-\infty}^{u} \frac{1}{\sqrt{2\pi}} e^{-t^2/2} \, dt \quad \text{for all } u$

Example 7.8

• For the whole population, achievement scores on a certain test have mean μY = 60 and σY = 8. For a random sample of n=100 scores from students at one school, the average score is 58.
• Is there evidence to suggest this school is inferior? That is, what’s the probability of seeing an average score as low as 58 if the true school average matches the population?

Example 7.8

• Solution:

Example 7.9

• The service times for customers coming through a Navy Exchange checkout counter are iid with μY = 1.5 and σY = 1.0. Approximate the probability that n=100 customers can be served in less than 2 hours.
• Solution:
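A sketch of the calculation, assuming (as the numbers suggest) that the service-time units are minutes, so 2 hours = 120 minutes:

```python
from statistics import NormalDist

mu, sigma, n = 1.5, 1.0, 100          # assumed units: minutes per customer
total_mean = n * mu                    # 150 minutes
total_sd = sigma * n ** 0.5            # 10 minutes
z = (120 - total_mean) / total_sd      # -3
p = NormalDist().cdf(z)                # P(total service time < 120 minutes)

print(round(p, 5))                     # 0.00135
```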

Example 7.9

Checking the Solutions in R

• Example 7.8

• Example 7.9

Normal Approximation to the Binomial

• A r.v. Y ~ Bin(n, p) is the number of “successes” out of n independent trials with probability of success p for each trial
• Define indicator variables X1, X2, …, Xn as

$X_i = \begin{cases} 1 & \text{if the } i\text{th trial is a success} \\ 0 & \text{if the } i\text{th trial is a failure} \end{cases}$

• So, X1, X2, …, Xn are iid Bernoulli r.v.s and we have Y = X1 + X2 + … + Xn
– That is, Y is a sum of iid random variables, so for large enough n the CLT applies

Exercise 7.10

• Candidate A believes she can win an election if she can get 55% of the votes in precinct 1. Assuming 50% of the precinct 1 voters favor her and n=100 random voters show up, what is the (approximate) probability she will receive at least 55% of their votes? • Solution:
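The CLT approximation can be compared against the exact binomial probability, which is directly computable with math.comb:

```python
from math import comb
from statistics import NormalDist

n, p = 100, 0.5

# Exact: P(Y >= 55) for Y ~ Bin(100, 0.5)
exact = sum(comb(n, k) for k in range(55, n + 1)) / 2 ** n

# CLT approximation: Y/n is approximately N(p, pq/n)
se = (p * (1 - p) / n) ** 0.5              # 0.05
approx = 1 - NormalDist().cdf((0.55 - p) / se)

print(exact, approx)   # roughly 0.184 exact vs 0.159 approximate
```

The gap between the two shrinks once the continuity correction (covered at the end of this module) is applied.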

Exercise 7.10

When to Use the Approximation?

• Y and Y/n have an approximate normal distribution for large enough n, but “large enough n” depends on p
• Rule of thumb: the approximation works well when $p \pm 3\sqrt{pq/n}$ lies in the interval (0, 1)
• An equivalent criterion is

$n > 9 \cdot \frac{\max(p, q)}{\min(p, q)}$

– See extra credit Exercise 7.70

Exercise 7.11

• Suppose Y has a binomial distribution with n=25 and p=0.4. Find the exact probabilities that Y ≤ 8 and Y = 8 and compare these with the corresponding values from the normal approximation.
• Exact solutions:
– Table 1 in Appendix 3 gives $\Pr(Y \le 8) = 0.274$ and $\Pr(Y = 8) = \Pr(Y \le 8) - \Pr(Y \le 7) = 0.274 - 0.154 = 0.120$
– In R:
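The same exact values can be computed without tables; a Python sketch of the binomial pmf (the slides do this in R):

```python
from math import comb

n, p = 25, 0.4

def pmf(k):
    # Binomial pmf: C(n, k) * p^k * (1-p)^(n-k)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

p_le_8 = sum(pmf(k) for k in range(9))   # P(Y <= 8)
p_eq_8 = pmf(8)                          # P(Y = 8)

print(round(p_le_8, 3), round(p_eq_8, 3))   # 0.274 0.12
```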

Exercise 7.11

• Solution:

The Continuity Correction

• The issue is that we are approximating a discrete distribution with a continuous one
• So, to improve the approximation, rather than use the discrete value directly, use the value of the continuous distribution halfway between the two discrete values
• In other words:
– Add 0.5 to the value we’re approximating for $\Pr(Y \le y)$ calculations
– Subtract 0.5 from the value we’re approximating for $\Pr(Y \ge y)$ calculations

Exercise 7.11
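Applying this to Exercise 7.11 shows the payoff: the corrected approximation of P(Y ≤ 8) lands much closer to the exact 0.274. A Python sketch:

```python
from statistics import NormalDist

n, p = 25, 0.4
mu = n * p                        # 10
sd = (n * p * (1 - p)) ** 0.5     # sqrt(6), about 2.449
Z = NormalDist()

no_corr = Z.cdf((8 - mu) / sd)      # approx P(Y <= 8), no correction: ~0.207
with_corr = Z.cdf((8.5 - mu) / sd)  # with the +0.5 correction: ~0.270

print(no_corr, with_corr)
```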

• Solution with continuity correction:

What We Covered in this Module

• Statistics and their distributions
– Deriving a sampling distribution
• Analytically
• Using simulation
• Sampling distributions
– Distribution of the sample mean
– Distributions related to the normal
• Central Limit Theorem
• Normal approximation to the binomial

Homework

• WM&S chapter 7
– Required exercises: 1, 2, 9, 25, 31a-c, 48, 49, 72, 73
– Extra credit: 15a&b, 70
• Useful hints:
– Problem 7.1: Get to the applet more directly at www.thomsonedu.com/statistics/book_content/0495110817_wackerly/applets/seeingstats/index.html. Click on “7. Distributions to the Normal” > DiceSample
– Problem 7.25 part b: Use R, not the applet. The relevant R function is qt(p, df, lower.tail=FALSE)
– Problem 7.31: Solutions in the back of the book are wrong.
