Estimating the Mean and Variance of a Normal Distribution
Learning Objectives
After completing this module, the student will be able to
• explain the value of repeating experiments
• explain the role of the law of large numbers in estimating population means
• describe the effect of increasing the sample size or reducing measurement errors or other sources of variability

Knowledge and Skills
• Properties of the arithmetic mean
• Estimating the mean of a normal distribution
• Law of Large Numbers
• Estimating the variance of a normal distribution
• Generating random variates in EXCEL

Prerequisites
1. Calculating the sample mean and arithmetic average
2. Calculating the sample variance and standard deviation
3. Normal distribution

Citation: Neuhauser, C. Estimating the Mean and Variance of a Normal Distribution. Created: September 9, 2009 Revisions:
Copyright: © 2009 Neuhauser. This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial Share Alike License, which permits unrestricted use, distribution, and reproduction in any medium, and allows others to translate, make remixes, and produce new stories based on this work, provided the original author and source are credited and the new work will carry the same license.
Funding: This work was partially supported by an HHMI Professors grant from the Howard Hughes Medical Institute.

Pretest
1. Laura and Hamid are late for Chemistry lab. The lab manual asks them to determine the density of solid platinum by repeating the measurement three times. To save time, they decide to measure the density only once. Explain the consequences of this shortcut.
2. Tom and Bao Yu measured the density of solid platinum three times: 19.8, 21.4, and 21.9 g/cm³. Determine the arithmetic average of these three measurements accurate to three decimal places.
3. The following graphs show densities of probability distributions. Which of them represent the density of a normal distribution?
[Figure: three probability density plots, panels (a), (b), and (c), each drawn against t for 0 ≤ t ≤ 6]
4. Which two parameters are typically used to describe the normal distribution?
a. Median
b. Variance
c. Standard deviation
d. Mean
5. Suppose X is normally distributed with mean 3 and standard deviation 1, that is, $X \sim N(3, 1)$. Use EXCEL to (a) find $P(X > 3)$, (b) find $P(1 < X < 4)$, and (c) determine $a$ so that $P(X > a) = 0.74$.

Estimating the Mean of a Normally Distributed Population
Suppose an experiment is repeated n times under identical conditions. Denote by $x_i$, $i = 1, 2, \ldots, n$, the outcome of the $i$th experiment. The arithmetic average $\bar{x}_n$ is calculated as
$$\bar{x}_n = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
When the outcomes are not all distinct, we can count the number of times each value occurs. Suppose again that an experiment is repeated n times under identical conditions, but now assume that there are only k distinct values $x_j$, $j = 1, 2, \ldots, k$, and that $x_j$ occurs $f_j$ times, so that $f_1 + f_2 + \cdots + f_k = n$. Then the arithmetic average $\bar{x}_n$ is calculated as
$$\bar{x}_n = \frac{1}{n}\left(x_1 f_1 + x_2 f_2 + \cdots + x_k f_k\right) = \frac{1}{n}\sum_{j=1}^{k} x_j f_j$$
Example
Suppose that the following data represent the ages of patients in a study: 17, 19, 19, 20, 21, 24, 26, 26, 26, and 27.
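Before working through this example by hand, both formulas for the arithmetic average can be checked on these ages with a minimal Python sketch. The function names are hypothetical, chosen only for this illustration; `collections.Counter` plays the role of the frequency distribution, mapping each distinct value $x_j$ to its count $f_j$.

```python
from collections import Counter

def arithmetic_average(xs):
    """Direct formula: (x_1 + x_2 + ... + x_n) / n."""
    return sum(xs) / len(xs)

def frequency_average(freq):
    """Frequency formula: (1/n) * sum of x_j * f_j, with n = f_1 + ... + f_k."""
    n = sum(freq.values())
    return sum(x * f for x, f in freq.items()) / n

ages = [17, 19, 19, 20, 21, 24, 26, 26, 26, 27]
freq = Counter(ages)  # counts: 19 -> 2, 26 -> 3, every other age -> 1

print(arithmetic_average(ages))  # 22.5
print(frequency_average(freq))   # 22.5
```

Both functions add up the same 225 years of age; the frequency version simply groups repeated values before multiplying, which saves work when many outcomes coincide.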
We find for the arithmetic average
$$\bar{x} = \frac{17 + 19 + 19 + 20 + 21 + 24 + 26 + 26 + 26 + 27}{10} = \frac{225}{10} = 22.5$$
Since some of the values occur more than once, we can also use the frequency distribution:

Value x_j:      17  19  20  21  24  26  27
Frequency f_j:   1   2   1   1   1   3   1

For the arithmetic average we find
$$\bar{x} = \frac{1}{10}\left[(17)(1) + (19)(2) + (20)(1) + (21)(1) + (24)(1) + (26)(3) + (27)(1)\right] = \frac{225}{10} = 22.5$$

In-class Activity
We will explore the properties of the arithmetic mean when measurements are taken from a normal distribution. Open the first tab (Explore 1) of the accompanying spreadsheet. Column B contains 100 random variates from a normal distribution with mean 3 and variance 1. Recall that the function "=NORMINV(probability,mean,standard_dev)" returns the inverse of the normal cumulative distribution function for the specified mean and standard deviation. Column C calculates the cumulative sum, and Column D contains the corresponding arithmetic averages. The figure plots Column D against Column A. Use the F9 key to explore the arithmetic average. What do you observe?

Theory
In Explore 1, you observed that the arithmetic mean stabilizes around the mean of the normal distribution, regardless of the variance, as you increase the sample size. This is a consequence of the Law of Large Numbers.
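The Explore 1 experiment, and the stabilization of the running average it reveals, can also be reproduced outside EXCEL. The Python sketch below assumes that Column B is filled with =NORMINV(RAND(), 3, 1), i.e. inverse-transform sampling; `statistics.NormalDist.inv_cdf` (Python 3.8 or later) is the analogue of NORMINV. The names `col_b`, `col_c`, `col_d` are hypothetical labels mirroring the spreadsheet columns, and the seed is arbitrary.

```python
import random
from itertools import accumulate
from statistics import NormalDist

def normal_variates(n, mean, sd, rng):
    """Inverse-transform sampling, the analogue of =NORMINV(RAND(), mean, sd)."""
    dist = NormalDist(mean, sd)
    out = []
    while len(out) < n:
        u = rng.random()          # uniform on [0, 1), like RAND()
        if u > 0.0:               # inv_cdf requires 0 < u < 1
            out.append(dist.inv_cdf(u))
    return out

rng = random.Random(2009)         # change the seed to mimic pressing F9
col_b = normal_variates(100, mean=3.0, sd=1.0, rng=rng)   # random variates
col_c = list(accumulate(col_b))                           # cumulative sums
col_d = [s / k for k, s in enumerate(col_c, start=1)]     # running averages

for k in (1, 10, 50, 100):
    print(k, round(col_d[k - 1], 3))
```

Early entries of `col_d` can wander far from 3, while the later ones typically settle close to it; rerunning with different seeds plays the same role as pressing F9.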
While we do not yet have the background to fully understand the mathematical formulation of the Law of Large Numbers, we state it here anyway so that you can see how a mathematical result expressing this property is formulated. We will return to this result later in the course, when we have more background.

Law of Large Numbers
If $X_1, X_2, \ldots, X_n$ are independent and identically distributed with $E|X_i| < \infty$, then as $n \to \infty$, the average
$$\bar{X}_n = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
converges to $E X_1$ in probability.

Problems
1. A random variate is a particular outcome of a random variable. Assume that random variates are drawn repeatedly from a normal distribution with mean 4 and variance 9. If you calculated the arithmetic average of a large number of variates from this distribution, what would you expect the arithmetic average to be close to?
2. The Law of Large Numbers holds quite generally. Without going more deeply into the theory, can you guess the answer to the following problem? Suppose you repeatedly toss a biased coin that comes up heads with probability 0.2. What percentage of the tosses would you expect to come up heads?

Based on our observations in Explore 1, we conclude that the mean of a normal distribution can be estimated by repeatedly sampling from the normal distribution and calculating the arithmetic average of the sample. This arithmetic average serves as an estimate of the mean of the normal distribution.
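"Converges in probability" means that for any fixed tolerance ε > 0, the probability that the average of n variates differs from the mean by more than ε tends to 0 as n grows. The Python sketch below estimates that probability by simulation for normal variates with mean 3 and standard deviation 1 and ε = 0.1. The function name, the tolerance, the 500 repetitions, and the seed are arbitrary choices for illustration, not part of the theorem.

```python
import random

def exceedance_probability(n, eps, mean=3.0, sd=1.0, reps=500, seed=1):
    """Monte Carlo estimate of P(|average of n variates - mean| > eps)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # One "experiment": average n normal variates
        avg = sum(rng.gauss(mean, sd) for _ in range(n)) / n
        if abs(avg - mean) > eps:
            hits += 1
    return hits / reps

probs = {n: exceedance_probability(n, eps=0.1) for n in (10, 100, 1000)}
for n, p in probs.items():
    print(f"n = {n:4d}: estimated P(|average - 3| > 0.1) = {p:.3f}")
```

The estimated probabilities should shrink toward 0 as n increases; their exact values depend on the seed.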
Properties of the Arithmetic Average

Explore 2
When you compare the arithmetic averages of 100 random variates in Explore 1, you will notice that different runs of the simulation result in slightly different averages. Arithmetic averages are themselves random variables, and we will explore their distribution as a function of the sample size. Again, we will use normally distributed random variables. The tab Explore 2 of the spreadsheet contains a simulation of arithmetic averages of normally distributed random variables for several different sample sizes. Details are explained in the spreadsheet. Use the F9 key to explore the effect of the sample size on the arithmetic average. What do you observe?

Explore 3
The variation in the arithmetic mean comes from the fact that the random variates in each sample vary from run to run. The more the random variates vary, the more the arithmetic mean varies. The degree of variation is described by the standard deviation. To explore the effect of this variation, the tab Explore 3 of the spreadsheet simulates arithmetic means for two scenarios: in the first, the random variates are normally distributed with mean 3 and standard deviation 1; in the second, they are normally distributed with mean 3 and standard deviation 0.5. Details are explained in the spreadsheet. Use the F9 key to explore the effect of the standard deviation on the arithmetic average. What do you observe?

Problems (cont.)
3. Based on your observations in Explore 2 and 3, what is the effect on the arithmetic mean when you (a) increase the sample size and (b) reduce the variation? What does this imply for experiments?
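The Explore 2 and Explore 3 simulations can be approximated in Python by repeating the "average n normal variates" experiment many times and measuring how much the resulting averages spread out, using their sample standard deviation. The sample sizes 5, 20, and 100, the 2000 repetitions, the seed, and the function name are assumptions made for this sketch; the spreadsheet's actual settings are described only in the spreadsheet itself.

```python
import random
from statistics import stdev

def simulated_averages(n, sd, mu=3.0, reps=2000, seed=0):
    """Return `reps` arithmetic averages, each of n variates from N(mu, sd^2)."""
    rng = random.Random(seed)
    return [sum(rng.gauss(mu, sd) for _ in range(n)) / n for _ in range(reps)]

# Explore 2: same standard deviation (1), different sample sizes
spread_by_n = {n: stdev(simulated_averages(n, sd=1.0)) for n in (5, 20, 100)}

# Explore 3: same sample size (20), standard deviation 1 versus 0.5
spread_by_sd = {sd: stdev(simulated_averages(20, sd=sd)) for sd in (1.0, 0.5)}

print("sample size -> spread of averages:", spread_by_n)
print("standard deviation -> spread of averages:", spread_by_sd)
```

Because every call reuses the same seed, the two scenarios in Explore 3 are driven by the same underlying random numbers, so any difference in spread comes from the standard deviation alone.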