Sampling and Sampling Distributions

Chapter 7 Sampling and sampling distributions

One of the reasons for taking a sample is to try to understand how the population is distributed using a sample rather than a census. It probably appeals to most people that you could gain a better understanding with a large sample than with a small one (more information is better than less). So you might get a better understanding with n=10000 than with n=100, and n=100 might be better than n=1. In this chapter we will study sampling from a known population. Nobody would do this in practice (why sample when you already know what the population looks like?), but it lets us compare the results of the sample with something that we already know. This will allow us to determine how much better off we would be if we increased the sample size. It will also allow us to develop some rules to use when we consider sampling from an unknown population.

What we want to do in this chapter is to
- determine whether large samples are better than small samples,
- determine how much better large samples are than small samples,
- construct some rules that we can use when sampling from an unknown population.

Sampling from a known normal distribution

Suppose we took a sample of size n from a normal distribution with \( \mu = 100 \) and \( \sigma = 10 \) and then computed the sample mean. Now repeat this a large number of times and plot the histogram of the resulting sample means. Figures 1, 2, and 3 show histograms of sample means for sample sizes n=1, n=10, and n=100.

Figure 1. The histogram of the means of 1000 samples of size n=1 taken from a normal distribution

Here, even with samples of size n=1, the distribution of sample means appears to be normally distributed. The bins represent values of the sample mean \( \bar{X} \).

Figure 2. The histogram of the means of 1000 samples of size n=10 taken from a normal distribution

Figure 2 also suggests that the distribution of sample means for samples of size n=10 is normally distributed.
However, the range of values of \( \bar{X} \) is much smaller than it was for samples of size n=1.

Figure 3. The histogram of the means of 1000 samples of size n=100 taken from a normal distribution

The results for Figure 3 also indicate a normal distribution. The range of values of \( \bar{X} \) is smaller still than for samples of size n=10. Note that the spread (standard deviation) of the distribution of sample means decreases as n gets larger.

The sampling distribution when sampling from a normal distribution

- The distribution of sample means, \( \bar{X} \), will be normally distributed.
- This distribution will have a mean \( \mu_{\bar{X}} = \mu \).
- This distribution will have a standard deviation \( \sigma_{\bar{X}} = \sigma / \sqrt{n} \).

Suppose that you have a normal distribution with \( \mu = 100 \) and \( \sigma = 10 \). Find the probability that a single observation taken from this distribution will be between 99 and 101. That is, find \( P(99 < X < 101) \).

\[ Z = \frac{X - \mu}{\sigma} = \frac{99 - 100}{10} = \frac{-1}{10} = -0.1 \]
\[ Z = \frac{X - \mu}{\sigma} = \frac{101 - 100}{10} = \frac{1}{10} = 0.1 \]
\[ P(99 < X < 101) = P(-0.1 < Z < 0.1) = 0.5398 - 0.4602 = 0.0796 \]

Suppose now that we take a sample of size n=100 from this distribution. Find the probability that the sample mean will be in [99,101]. That is, find \( P(99 < \bar{X} < 101) \).

\[ \mu_{\bar{X}} = \mu = 100, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{100}} = 1 \]
\[ Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{99 - 100}{1} = -1 \]
\[ Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{101 - 100}{1} = 1 \]
\[ P(99 < \bar{X} < 101) = P(-1 < Z < 1) = 0.8413 - 0.1587 = 0.6826 \]

Suppose now that we take a sample of size n=1000 from this distribution. Find the probability that the sample mean will be in [99,101]. That is, find \( P(99 < \bar{X} < 101) \).

\[ \mu_{\bar{X}} = \mu = 100, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{1000}} = 0.316 \]
\[ Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{99 - 100}{0.316} = -3.16 \]
\[ Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{101 - 100}{0.316} = 3.16 \]
\[ P(99 < \bar{X} < 101) = P(-3.16 < Z < 3.16) = 0.9992 - 0.0008 = 0.9984 \]

Of course, we can also use Excel for these calculations.

Example 7.1 Suppose that an auditing team examines accounts receivable for a certain firm.
Unknown to the auditors, the mean and standard deviation of these accounts are \( \mu = 1332.52 \) and \( \sigma = 237.55 \) dollars (these are population values). The auditing team takes a sample of n=36 accounts. Find the probability that the resulting sample mean will exceed $1350. Find the probability that the sample mean will be less than $1300. Find the probability that the sample mean will be between $1310 and $1360. Assume that accounts receivable can be described by a normal distribution.

We have

\[ \mu_{\bar{X}} = \mu = 1332.52, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{237.55}{\sqrt{36}} = 39.592 \]

a) Find \( P(\bar{X} > 1350) \)

\[ Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{1350 - 1332.52}{39.592} = 0.44 \]
\[ P(\bar{X} > 1350) = P(Z > 0.44) = 1 - P(Z < 0.44) = 1 - 0.6700 = 0.3300 \]

b) Find \( P(\bar{X} < 1300) \)

\[ Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{1300 - 1332.52}{39.592} = -0.82 \]
\[ P(\bar{X} < 1300) = P(Z < -0.82) = 0.2061 \]

c) Find \( P(1310 < \bar{X} < 1360) \)

\[ Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{1310 - 1332.52}{39.592} = -0.57 \]
\[ Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{1360 - 1332.52}{39.592} = 0.69 \]
\[ P(1310 < \bar{X} < 1360) = P(-0.57 < Z < 0.69) = 0.7549 - 0.2843 = 0.4706 \]

The same results can be obtained in Excel.

Example 7.2 Suppose that the time it takes to fabricate a central processor chip for a computer can be described by a normal distribution with a mean of 35 minutes and a standard deviation of 5 minutes. A time management team is studying the process in hopes of improving it. The management team does not know what the mean fabrication time is, so they take a sample of n=100 time histories to try to get an estimate of the true but unknown mean.
Find the probability that the sample mean time is less than 34 minutes, the probability that it is greater than 36.3 minutes, and the probability that it will be between 34 and 35.7 minutes.

\[ \mu_{\bar{X}} = \mu = 35, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{100}} = 0.5 \]

a) Find \( P(\bar{X} < 34) \)

\[ Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{34 - 35}{0.5} = -2.0 \]
\[ P(\bar{X} < 34) = P(Z < -2.0) = 0.0228 \]

b) Find \( P(\bar{X} > 36.3) \)

\[ Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{36.3 - 35}{0.5} = 2.6 \]
\[ P(\bar{X} > 36.3) = P(Z > 2.6) = 1 - P(Z < 2.6) = 1 - 0.9953 = 0.0047 \]

c) Find \( P(34 < \bar{X} < 35.7) \)

\[ Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{34 - 35}{0.5} = -2.0 \]
\[ Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{35.7 - 35}{0.5} = 1.4 \]
\[ P(34 < \bar{X} < 35.7) = P(-2.0 < Z < 1.4) = 0.9192 - 0.0228 = 0.8964 \]

The same results can be obtained in Excel.

The central limit theorem

Figure 4. The normal distribution between 90 and 110

It might not be too surprising that the sample means taken from a normal distribution would be normal, but let's consider sampling from a uniform distribution where the samples must be in the range [90,110]. The results of such samples are shown in Figures 5, 6, and 7.

Figure 5. The histogram of 1000 sample means of size n=1 taken from a uniform distribution on [90,110]

Note that for a sample of size n=1, we are just sampling the distribution, and so the distribution of sample means just reproduces the distribution from which the sample was taken. The results here are not normal, but are those of the uniform distribution.

Figure 6. The histogram of 1000 sample means of size n=10 taken from a uniform distribution on [90,110]

Figure 7. The histogram of 1000 sample means of size n=100 taken from a uniform distribution on [90,110]

If the sample size is increased to n=10, the distribution of sample means starts to look like a normal distribution (see the graph in Fig. 6). Fig. 7 shows the histogram of 1000 sample means of size n=100. The histogram of these 1000 means looks quite normal.

It appears here that even with a sample size as small as n=10, the resulting sampling distribution is approximately normally distributed.
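The experiment behind Figures 5 to 7 can be reproduced with a short simulation. What follows is a minimal sketch in Python using only the standard library, not the procedure used to produce the figures. The function name `simulate_sample_means`, the fixed seed, and the default of 1000 repetitions are illustrative assumptions.

```python
import random
import statistics


def simulate_sample_means(n, reps=1000, low=90.0, high=110.0, seed=0):
    """Return `reps` sample means, each computed from n uniform draws on [low, high]."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so runs are repeatable
    return [
        statistics.fmean(rng.uniform(low, high) for _ in range(n))
        for _ in range(reps)
    ]


if __name__ == "__main__":
    for n in (1, 10, 100):
        means = simulate_sample_means(n)
        print(
            f"n={n:>3}: mean of sample means = {statistics.fmean(means):.2f}, "
            f"standard deviation = {statistics.stdev(means):.3f}"
        )
```

With n=1 the standard deviation of the sample means should be near 5.77, the standard deviation of a uniform distribution on [90,110] (which is \( 20/\sqrt{12} \)). It should shrink by roughly a factor of \( \sqrt{10} \) for each tenfold increase in n, while the mean of the sample means stays near 100.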
In fact, if we make the sample size large enough, the distribution of sample means will be approximately normally distributed. As a rule of thumb, if the sample size is on the order of n=30, then the sampling distribution can be treated as normally distributed. These results are among the most important in statistics and are called the Central Limit Theorem (CLT).

The Central Limit Theorem (CLT)

Regardless of the nature of the distribution from which a sample is taken, if the sample size is large enough (rule of thumb: n=30 is large enough), then

- The distribution of sample means, \( \bar{X} \), will be approximately normally distributed.
- This distribution will have a mean \( \mu_{\bar{X}} = \mu \).
- This distribution will have a standard deviation \( \sigma_{\bar{X}} = \sigma / \sqrt{n} \).

Note: if the sample comes from a population that is normally distributed, these results hold for a sample of size n=1 or larger. In most practical applications where the population distribution is unknown, people will say the CLT holds for samples of size n=30 or larger.

The sampling distribution of the binomial distribution ( \( \hat{p} \) )

Here we will vary a little from our rule of thumb of n=30 being large enough for the CLT to hold. We know a good deal more about the binomial; it is not just an unknown distribution. The CLT will hold for the binomial when

\[ np \ge 5 \quad \text{and} \quad nq \ge 5, \]

where q = 1 - p. So the CLT will hold for the binomial when the normal approximation to the binomial distribution is good.

We do make a change here. We will find it convenient to look at binomial problems in terms of the proportion of successes out of n trials rather than the number of successes. For the binomial distribution,

\[ \mu = E(X) = np \]

where E(X) indicates the "expected value" of the distribution. It is another term for the mean of a distribution. The expected value is the value you would expect to get for the average result of performing an experiment a large number of times. Suppose that you flipped a coin ten times, where p=0.5. Call getting a head a success (S) and record X, the number of S's. Repeat this a large number of times and average the values of X.
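The coin-flip experiment just described can be tried numerically. The following is a minimal sketch in Python, assuming only the standard library; the helper names `count_heads` and `average_heads`, the seed, and the 10,000 repetitions are hypothetical choices made for illustration.

```python
import random
import statistics


def count_heads(n_flips, p, rng):
    """Number of successes (heads) in n_flips independent trials with success probability p."""
    return sum(rng.random() < p for _ in range(n_flips))


def average_heads(n_flips=10, p=0.5, reps=10_000, seed=1):
    """Average of X (the number of heads) over `reps` repetitions of the experiment."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so runs are repeatable
    return statistics.fmean(count_heads(n_flips, p, rng) for _ in range(reps))


if __name__ == "__main__":
    print(f"average number of heads in 10 flips: {average_heads():.3f}")
```

The printed average will vary with the seed and the number of repetitions, but it should settle close to np as the number of repetitions grows.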
You would expect this average to be five. So for n=10, p=0.5,

\[ E(X) = np = 5 \]

Now define the proportion of successes in n trials to be

\[ \hat{p} = \frac{X}{n} \]

so that the mean or expected value of the proportion of successes in n trials for the binomial is

\[ E(\hat{p}) = \frac{E(X)}{n} = \frac{np}{n} = p. \]

The mean is just p, the probability of a success in any trial.

The standard deviation of the distribution in terms of \( \hat{p} \) is

\[ \sigma_{\hat{p}} = \frac{\sigma_X}{n} = \frac{\sqrt{npq}}{n} = \sqrt{pq/n} \]

For the binomial distribution, the sampling distribution of \( \hat{p} \) will be approximately normally distributed with

\[ \mu_{\hat{p}} = \frac{\mu_X}{n} = p, \qquad \sigma_{\hat{p}} = \frac{\sigma_X}{n} = \sqrt{pq/n} \]

if \( np \ge 5 \) and \( nq \ge 5 \), and the Z score is

\[ Z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} \]

Example 7.3 Polls are almost always reported in terms of proportions (the percentage of respondents that favor something) rather than in terms of X (the number of respondents that favor it). Suppose that a poll has been commissioned in an election contest between A and B. Consider a response for A to be a success. Suppose that 55% of all voters actually favor A. The size of the poll is n=1200 voters. What is the probability that the response for A will be in the range [52%,58%]?

In this problem p=0.55 and n=1200. This gives

\[ \sigma_{\hat{p}} = \sqrt{pq/n} = \sqrt{(0.55)(0.45)/1200} = 0.0144 \]
\[ Z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{0.52 - 0.55}{0.0144} = -2.08 \]
\[ Z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{0.58 - 0.55}{0.0144} = 2.08 \]
\[ P(0.52 < \hat{p} < 0.58) = P(-2.08 < Z < 2.08) = 0.9812 - 0.0188 = 0.9624 \]

In Excel

Quantity              Value    Formula
n=                    1200     1200
p=                    0.55     0.55
q=                    0.45     0.45
sigma-p               0.0144   =SQRT(0.55*0.45/1200)
P(0.52<p-hat<0.58)    0.9628   =NORMDIST(0.58,0.55,0.0144,TRUE)-NORMDIST(0.52,0.55,0.0144,TRUE)

The Excel result differs slightly from 0.9624 because the table calculation rounds Z to two decimal places.

Example 7.4 A market survey is taken of n=1000 potential buyers to see how they like a test product. Suppose that 10% of the population likes the product. What is the probability that 12% or more of the test group will indicate that they like the product?

Does the CLT hold for the problem? np = 1000 × 0.1 = 100 and nq = 1000 × 0.9 = 900, so the CLT holds for this problem.
The standard deviation of the sampling distribution is

\[ \sigma_{\hat{p}} = \sqrt{pq/n} = \sqrt{(0.1)(0.9)/1000} = 0.0095 \]

So

\[ Z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{0.12 - 0.10}{0.0095} \approx 2.10 \]

and

\[ P(\hat{p} > 0.12) = P(Z > 2.10) = 1 - P(Z < 2.10) = 1 - 0.9821 = 0.0179 \]

So there is less than a 2% chance of getting a sample proportion greater than 12% if the true population proportion is 10%.

Problems

7.1 Suppose a sample of size n=10 is taken from a normal distribution with \( \mu = 150 \) and \( \sigma = 12 \). Find
a. \( P(\bar{X} > 153) \)
b. \( P(148 < \bar{X} < 151) \)
c. \( P(\bar{X} < 148) \)

7.2 Repeat problem 7.1 using a sample size of n=100.

7.3 A sample of size n=10 is taken from a population which is not normal, but has \( \mu = 100 \) and \( \sigma = 10 \). Does the Central Limit Theorem hold? Can you find a probability such as \( P(\bar{X} > 101) \) using the Central Limit Theorem?

7.4 Suppose household incomes in Flagstaff are normally distributed with \( \mu = 22,000 \) and \( \sigma = 2,000 \). A sample of n=10 households is taken. Find
a. \( P(\bar{X} > 21,000) \)
b. \( P(21,599 < \bar{X} < 22,500) \)
c. \( P(\bar{X} < 21,500) \)

7.5 Suppose a machine is producing defective items at a 10% rate. One thousand items of the machine's output are inspected. What is the probability that between 9% and 11% of the inspected items will be defective?

7.6 Repeat problem 7.5 to find the probability that between 8% and 12% of the inspected items will be defective.

7.7 Suppose that 52% of the registered voters are in favor of a certain proposition placed on an upcoming Arizona election ballot. A sample of n=1200 voters is selected at random. What is the probability that
a. between 49% and 53% of the sampled voters will favor the proposition?
b. a majority of the voters in the sample will favor the proposition?
c. more than 55% of the voters in the sample will favor the proposition?

7.8 A machine is producing ball bearings with an average diameter of 101 cm and with a standard deviation of 8 cm. A sample of n=49 ball bearings is taken.
What is the probability that the sample mean will be between 98 cm and 100 cm?

Answers

7.1 a) 0.2148, b) 0.3045, c) 0.2981
7.2 a) 0.0062, b) 0.7492, c) 0.0475
7.3
7.4 a) 0.9429, b) 0.5209, c) 0.2148
7.5 \( P(0.09 < \hat{p} < 0.11) = P(-1.05 < Z < 1.05) = 0.7062 \)
7.6 0.9652
7.7 a) 0.7361, b) 0.9177, c) 0.0188
7.8 0.1878
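The table-based calculations in the worked examples of this chapter can be cross-checked numerically. The following is a minimal sketch in Python, assuming the standard library's `statistics.NormalDist`; the helper names `prob_mean_between` and `prob_proportion_between` are hypothetical and are not part of the chapter's Excel workflow. Because the code does not round Z to two decimal places, its results can differ from the table answers in the third or fourth decimal.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_between(lo, hi, mu, sigma, n):
    """P(lo < sample mean < hi), treating the sample mean as normal(mu, sigma/sqrt(n))."""
    dist = NormalDist(mu, sigma / sqrt(n))
    return dist.cdf(hi) - dist.cdf(lo)


def prob_proportion_between(lo, hi, p, n):
    """P(lo < p-hat < hi), treating p-hat as normal(p, sqrt(pq/n)) with q = 1 - p."""
    dist = NormalDist(p, sqrt(p * (1 - p) / n))
    return dist.cdf(hi) - dist.cdf(lo)


if __name__ == "__main__":
    # P(99 < mean < 101) with mu=100, sigma=10 for n = 1, 100, 1000
    for n in (1, 100, 1000):
        print(f"n={n}: {prob_mean_between(99, 101, 100, 10, n):.4f}")
    # Example 7.1 c): mean between 1310 and 1360 for n=36 accounts
    print(f"Example 7.1 c): {prob_mean_between(1310, 1360, 1332.52, 237.55, 36):.4f}")
    # Example 7.3: poll proportion between 52% and 58% with p=0.55, n=1200
    print(f"Example 7.3: {prob_proportion_between(0.52, 0.58, 0.55, 1200):.4f}")
```

The same two helpers can be pointed at Problems 7.1 to 7.8 by changing the arguments. A one-sided probability follows from the complement, for example 1 minus the value of `dist.cdf(x)` for an upper tail.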
