M09_LEVI5199_06_OM_C09.QXD 2/4/10 10:57 AM Page 1

9.6 The Power of a Test

Section 9.1 defined Type I and Type II errors and their associated risks. Recall that $\alpha$ represents the probability that you reject the null hypothesis when it is true and should not be rejected, and $\beta$ represents the probability that you do not reject the null hypothesis when it is false and should be rejected. The power of the test, $1 - \beta$, is the probability that you correctly reject a false null hypothesis. This probability depends on how different the actual population parameter is from the value being hypothesized (under $H_0$), the value of $\alpha$ used, and the sample size.

If there is a large difference between the population parameter and the hypothesized value, the power of the test will be much greater than if the difference is small. Selecting a larger value of $\alpha$ makes it easier to reject $H_0$ and therefore increases the power of a test. Increasing the sample size increases the precision of the estimates, which increases the ability to detect differences in the parameters and therefore increases the power of a test.

The power of a statistical test can be illustrated by using the Oxford Cereal Company scenario. The filling process is subject to periodic inspection by a representative of the consumer affairs office. The representative's job is to detect the possible "short weighting" of boxes, which means that cereal boxes having less than the specified 368 grams are sold. Thus, the representative is interested in determining whether there is evidence that the cereal boxes have a mean weight that is less than 368 grams. The null and alternative hypotheses are as follows:

$$H_0: \mu \ge 368 \quad \text{(filling process is working properly)}$$
$$H_1: \mu < 368 \quad \text{(filling process is not working properly)}$$

The representative is willing to accept the company's claim that the population standard deviation, $\sigma$, equals 15 grams. Therefore, you can use the Z test.
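Using Equation (9.1) on page 302, with $\bar{X}_L$ (the lower critical value of the sample mean) substituted for $\bar{X}$, you can find the value of $\bar{X}$ that enables you to reject the null hypothesis:

$$Z_{\alpha} = \frac{\bar{X}_L - \mu}{\sigma/\sqrt{n}}$$

$$Z_{\alpha}\,\frac{\sigma}{\sqrt{n}} = \bar{X}_L - \mu$$

$$\bar{X}_L = \mu + Z_{\alpha}\,\frac{\sigma}{\sqrt{n}}$$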

Because you have a one-tail test with a level of significance of 0.05, the value of $Z_{\alpha}$ is equal to $-1.645$ (see Figure 9.16). The sample size is $n = 25$. Therefore,

$$\bar{X}_L = 368 + (-1.645)\,\frac{15}{\sqrt{25}} = 368 - 4.935 = 363.065$$

The decision rule for this one-tail test is

Reject $H_0$ if $\bar{X} < 363.065$;
otherwise, do not reject $H_0$.

FIGURE 9.16 Determining the lower critical value for a one-tail Z test for a population mean at the 0.05 level of significance (region of rejection, area 0.05, below $\bar{X}_L = 363.065$, that is, $Z_L = -1.645$; region of nonrejection, area 0.95, above it; $\mu = 368$ corresponds to $Z = 0$)
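The critical-value step can be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later (for `statistics.NormalDist`); the function name `lower_critical_mean` is a hypothetical illustration, not something defined in the text. It computes $\bar{X}_L = \mu + Z_{\alpha}\,\sigma/\sqrt{n}$, taking $Z_{\alpha}$ from the inverse normal cumulative distribution function rather than from a table.

```python
from math import sqrt
from statistics import NormalDist


def lower_critical_mean(mu0: float, sigma: float, n: int, alpha: float) -> float:
    """Lower critical value of the sample mean for H1: mu < mu0 (sigma known)."""
    z_alpha = NormalDist().inv_cdf(alpha)  # lower-tail critical Z, about -1.645 when alpha = 0.05
    return mu0 + z_alpha * sigma / sqrt(n)


x_l = lower_critical_mean(mu0=368, sigma=15, n=25, alpha=0.05)
print(f"Reject H0 if the sample mean is below {x_l:.3f} grams")
```

Because the inverse CDF gives $Z_{0.05} \approx -1.64485$ rather than the rounded table value $-1.645$, the unrounded result is about 363.0654 grams, which agrees with the text to three decimal places.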

2 CHAPTER 9 Fundamentals of Hypothesis Testing

The decision rule states that if, in a random sample of 25 boxes, the sample mean is less than 363.065 grams, you reject the null hypothesis, and the representative concludes that the process is not working properly. The power of the test measures the probability of concluding that the process is not working properly for differing values of the true population mean. What is the power of the test if the actual population mean is 360 grams? To determine the chance of rejecting the null hypothesis when the population mean is 360 grams, you need to determine the area under the normal curve below $\bar{X}_L = 363.065$ grams. Using Equation (9.1), with the population mean $\mu = 360$,

$$Z_{\text{STAT}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{363.065 - 360}{15/\sqrt{25}} = 1.02$$

From Table E.2, there is an 84.61% chance that the Z value is less than +1.02. This is the power of the test when $\mu = 360$ is the actual population mean (see Figure 9.17). The probability ($\beta$) that you will not reject the null hypothesis ($\mu = 368$) is $1 - 0.8461 = 0.1539$. Thus, the probability of committing a Type II error is 15.39%.

FIGURE 9.17 Determining the power of the test and the probability of a Type II error when $\mu = 360$ grams (power = 0.8461 below $\bar{X}_L = 363.065$, that is, $Z = +1.02$; $\beta = 0.1539$ above it)
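The power computation for $\mu = 360$ can be sketched as a small function. This is a minimal sketch, assuming Python 3.8 or later; `power_lower_tail` is a hypothetical helper, not part of the text. It evaluates the area below $\bar{X}_L$ under a normal curve centered at the actual mean, which is the quantity the text reads from Table E.2.

```python
from math import sqrt
from statistics import NormalDist


def power_lower_tail(mu_actual: float, mu0: float = 368.0, sigma: float = 15.0,
                     n: int = 25, alpha: float = 0.05) -> float:
    """P(reject H0: mu >= mu0) when the true population mean is mu_actual."""
    se = sigma / sqrt(n)                              # standard error of the mean (3 grams here)
    x_l = mu0 + NormalDist().inv_cdf(alpha) * se      # lower critical value, about 363.065
    z = (x_l - mu_actual) / se                        # X-bar_L in Z units when mu = mu_actual
    return NormalDist().cdf(z)                        # area below X-bar_L


power = power_lower_tail(360)
beta = 1 - power                                      # probability of a Type II error
print(f"power = {power:.4f}, beta = {beta:.4f}")
```

The result is roughly 0.8466, slightly above the table-based 0.8461, because the sketch does not round Z to 1.02 before looking up the probability.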

Now that you have determined the power of the test when the population mean equals 360, you can calculate the power for any other value of $\mu$. For example, what is the power of the test if the population mean is 352 grams? Assuming the same standard deviation, sample size, and level of significance, the decision rule is

Reject $H_0$ if $\bar{X} < 363.065$;
otherwise, do not reject $H_0$.

Once again, because you are testing a hypothesis for a mean, from Equation (9.1),

$$Z_{\text{STAT}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

If the population mean shifts down to 352 grams (see Figure 9.18), then

$$Z_{\text{STAT}} = \frac{363.065 - 352}{15/\sqrt{25}} = 3.69$$


FIGURE 9.18 Determining the power of the test and the probability of a Type II error when $\mu = 352$ grams (power = 0.99989 below $\bar{X}_L = 363.065$, that is, $Z = +3.69$; $\beta = 0.00011$ above it)

From Table E.2, there is a 99.989% chance that the Z value is less than +3.69. This is the power of the test when the population mean is 352. The probability ($\beta$) that you will not reject the null hypothesis ($\mu = 368$) is $1 - 0.99989 = 0.00011$. Thus, the probability of committing a Type II error is only 0.011%.

In the preceding two examples, the power of the test is high, and the chance of committing a Type II error is low. In the next example, you compute the power of the test when the population mean is equal to 367 grams, a value that is very close to the hypothesized mean of 368 grams. Once again, from Equation (9.1),

$$Z_{\text{STAT}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

If the population mean is equal to 367 grams (see Figure 9.19), then

$$Z_{\text{STAT}} = \frac{363.065 - 367}{15/\sqrt{25}} = -1.31$$

FIGURE 9.19 Determining the power of the test and the probability of a Type II error when $\mu = 367$ grams (power = 0.0951 below $\bar{X}_L = 363.065$, that is, $Z = -1.31$; $\beta = 0.9049$ above it)

From Table E.2, the probability that Z is less than $-1.31$ is 0.0951 (or 9.51%). Because the rejection region is in the lower tail of the distribution, the power of the test is 9.51%, and the chance of making a Type II error is 90.49%.

Figure 9.20 illustrates the power of the test for various possible values of $\mu$ (including the three values examined). This graph is called a power curve.
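The three calculations follow one pattern, which can be written as a single power function of the actual mean $\mu_1$. This worked equation is a restatement of the steps above, not a formula quoted from the text; $\Phi$ denotes the standard normal cumulative distribution function that Table E.2 tabulates, and the last step substitutes $\bar{X}_L = 368 + Z_{\alpha}\,\sigma/\sqrt{n}$:

$$1 - \beta = P(\bar{X} < \bar{X}_L \mid \mu = \mu_1) = \Phi\!\left(\frac{\bar{X}_L - \mu_1}{\sigma/\sqrt{n}}\right) = \Phi\!\left(Z_{\alpha} + \frac{368 - \mu_1}{\sigma/\sqrt{n}}\right)$$

With $Z_{\alpha} = -1.645$ and $\sigma/\sqrt{n} = 15/\sqrt{25} = 3$:

$$\mu_1 = 360:\quad \Phi\!\left(-1.645 + \tfrac{8}{3}\right) = \Phi(1.02) = 0.8461$$

$$\mu_1 = 368:\quad \Phi(-1.645 + 0) = 0.05 = \alpha$$

The second line shows why the power of this test equals the level of significance when the actual mean sits exactly at the hypothesized 368 grams.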


FIGURE 9.20 Power curve of the cereal-box-filling process for $H_1: \mu < 368$ grams (vertical axis: power, 0.00 to 1.00; horizontal axis: possible values for $\mu$, 352 to 368 grams; power falls from 0.99989 at $\mu = 352$ to 0.0500 at $\mu = 368$)
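The power curve can be regenerated by evaluating the power at each whole-gram value of $\mu$ from 352 to 368. This is a minimal sketch, assuming Python 3.8 or later; the names `power_at` and `curve` are hypothetical. It uses unrounded Z values, so individual entries can differ from the figure's table-based labels in the third or fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

MU0, SIGMA, N, ALPHA = 368, 15, 25, 0.05          # cereal-filling example
SE = SIGMA / sqrt(N)                              # standard error: 3 grams
X_L = MU0 + NormalDist().inv_cdf(ALPHA) * SE      # lower critical value, about 363.065


def power_at(mu: float) -> float:
    """Probability that the sample mean falls below X_L when the true mean is mu."""
    return NormalDist(mu, SE).cdf(X_L)


# Power for each whole-gram value of mu shown on the horizontal axis of the power curve
curve = {mu: power_at(mu) for mu in range(352, MU0 + 1)}
for mu, p in curve.items():
    print(f"mu = {mu} g   power = {p:.4f}   beta = {1 - p:.4f}")
```

Plotting `curve` with any charting tool gives the same shape as Figure 9.20: steep in the middle, flattening toward 1 on the left and toward $\alpha = 0.05$ on the right.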

From Figure 9.20, you can see that the power of this one-tail test increases sharply (and approaches 100%) as the population mean takes on values farther below the hypothesized mean of 368 grams. Clearly, for this one-tail test, the smaller the actual mean $\mu$, the greater the power to detect this difference.¹ For values of $\mu$ close to 368 grams, the power is small because the test cannot effectively detect small differences between the actual population mean and the hypothesized value of 368 grams. As the population mean approaches 368 grams, the power of the test approaches $\alpha$, the level of significance (which is 0.05 in this example).

¹ For situations involving one-tail tests in which the actual mean, $\mu_1$, exceeds the hypothesized mean, the converse would be true: the larger the actual mean, $\mu_1$, compared with the hypothesized mean, the greater the power. For two-tail tests, the greater the distance between the actual mean, $\mu_1$, and the hypothesized mean, the greater the power of the test.

Figure 9.21 summarizes the computations for the three cases. You can see the drastic changes in the power of the test for different values of the actual population mean by reviewing the different panels of Figure 9.21. From Panels A and B, you can see that when the population mean does not greatly differ from 368 grams, the chance of rejecting the null hypothesis, based on the decision rule involved, is not large. However, when the population mean shifts substantially below the hypothesized 368 grams, the power of the test greatly increases, approaching its maximum value of 1 (or 100%).

In the above discussion, a one-tail test with $\alpha = 0.05$ and $n = 25$ was used. The type of statistical test (one-tail vs. two-tail), the level of significance, and the sample size all affect the power. Three basic conclusions regarding the power of the test are summarized below:

1. A one-tail test is more powerful than a two-tail test (when the actual mean lies in the direction specified by $H_1$).
2. An increase in the level of significance, $\alpha$, results in an increase in power. A decrease in $\alpha$ results in a decrease in power.
3. An increase in the sample size, $n$, results in an increase in power. A decrease in the sample size, $n$, results in a decrease in power.
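The three conclusions can be checked numerically at a single actual mean, for instance $\mu = 360$ grams. This is a minimal sketch, assuming Python 3.8 or later and a known $\sigma$; the `power` helper and its `two_tail` flag are hypothetical. For the two-tail case it assumes the usual setup that rejects when $Z_{\text{STAT}}$ falls below $-Z_{\alpha/2}$ or above $+Z_{\alpha/2}$ (critical values of about $\pm 1.96$ for $\alpha = 0.05$).

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def power(mu_actual: float, mu0: float = 368.0, sigma: float = 15.0, n: int = 25,
          alpha: float = 0.05, two_tail: bool = False) -> float:
    """Power of a Z test for a mean with known sigma (one-tail means H1: mu < mu0)."""
    se = sigma / sqrt(n)
    shift = (mu0 - mu_actual) / se            # distance of the true mean below mu0, in standard errors
    if not two_tail:
        # Reject when Z_STAT < Z_alpha; under mu_actual, Z_STAT ~ N(-shift, 1)
        return STD.cdf(STD.inv_cdf(alpha) + shift)
    z_crit = STD.inv_cdf(1 - alpha / 2)       # upper critical value, about 1.96 for alpha = 0.05
    # Reject when Z_STAT < -z_crit or Z_STAT > +z_crit
    return STD.cdf(-z_crit + shift) + (1 - STD.cdf(z_crit + shift))


# Compare the power at an actual mean of 360 grams under different settings
scenarios = {
    "one-tail, alpha=0.05, n=25": power(360),
    "two-tail, alpha=0.05, n=25": power(360, two_tail=True),
    "one-tail, alpha=0.01, n=25": power(360, alpha=0.01),
    "one-tail, alpha=0.05, n=100": power(360, n=100),
}
for label, p in scenarios.items():
    print(f"{label}: power = {p:.4f}")
```

With these settings the one-tail power is about 0.85, the two-tail power about 0.76, the one-tail power at $\alpha = 0.01$ about 0.63, and the one-tail power with $n = 100$ about 0.9999. If the actual mean were above 368 grams, `shift` would be negative and the lower-tail power would drop below $\alpha$, which is why the first conclusion depends on the direction of $H_1$.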


FIGURE 9.21 Determining statistical power for varying values of the population mean
Given: $\alpha = 0.05$, $\sigma = 15$, $n = 25$, one-tail test, $H_0: \mu = 368$; $\bar{X}_L = 368 - 1.645\,(15/\sqrt{25}) = 363.065$; decision rule: reject $H_0$ if $\bar{X} < 363.065$, otherwise do not reject $H_0$.
Panel A: $\mu = 368$ (null hypothesis is true); $1 - \alpha = 0.95$, $\alpha = 0.05$
Panel B: $\mu = 367$ (true mean shifts to 367 grams); $Z_{\text{STAT}} = (363.065 - 367)/3 = -1.31$; power = 0.0951, $\beta = 0.9049$
Panel C: $\mu = 360$ (true mean shifts to 360 grams); $Z_{\text{STAT}} = (363.065 - 360)/3 = +1.02$; power = 0.8461, $\beta = 0.1539$
Panel D: $\mu = 352$ (true mean shifts to 352 grams); $Z_{\text{STAT}} = (363.065 - 352)/3 = +3.69$; power = 0.99989, $\beta = 0.00011$
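The panel values can also be checked by simulation, which makes the meaning of power concrete: it is the long-run fraction of samples that lead to rejecting $H_0$. Monte Carlo simulation is a swapped-in technique, not the method used in the text. This is a minimal sketch assuming Python 3.8 or later, normally distributed fill weights, and a hypothetical helper `simulated_power`; the estimates carry simulation error, so they match the analytic panel values only approximately.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulated_power(mu_actual: float, mu0: float = 368.0, sigma: float = 15.0, n: int = 25,
                    alpha: float = 0.05, trials: int = 10_000, seed: int = 1) -> float:
    """Fraction of simulated samples of n boxes whose mean falls below the critical value."""
    rng = random.Random(seed)                                    # seeded for reproducibility
    x_l = mu0 + NormalDist().inv_cdf(alpha) * sigma / sqrt(n)    # lower critical value, about 363.065
    rejections = 0
    for _ in range(trials):
        # Simulate the weights of n boxes from N(mu_actual, sigma) and apply the decision rule
        sample_mean = sum(rng.gauss(mu_actual, sigma) for _ in range(n)) / n
        if sample_mean < x_l:
            rejections += 1
    return rejections / trials


# One estimate per panel: mu = 368, 367, 360, 352
results = {mu: simulated_power(mu) for mu in (368, 367, 360, 352)}
for mu, p in results.items():
    print(f"mu = {mu} g   simulated power = {p:.4f}")
```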


Problems for Section 9.6

APPLYING THE CONCEPTS

9.78 A coin-operated soft-drink machine is designed to discharge at least 7 ounces of beverage per cup, with a standard deviation of 0.2 ounce. If you select a random sample of 16 cups and you are willing to have an $\alpha = 0.05$ risk of committing a Type I error, compute the power of the test and the probability of a Type II error ($\beta$) if the population mean amount dispensed is actually
a. 6.9 ounces per cup.
b. 6.8 ounces per cup.

9.79 Refer to Problem 9.78. If you are willing to have an $\alpha = 0.01$ risk of committing a Type I error, compute the power of the test and the probability of a Type II error ($\beta$) if the population mean amount dispensed is actually
a. 6.9 ounces per cup.
b. 6.8 ounces per cup.
c. Compare the results in (a) and (b) of this problem and in Problem 9.78. What conclusion can you reach?

9.80 Refer to Problem 9.78. If you select a random sample of 25 cups and are willing to have an $\alpha = 0.05$ risk of committing a Type I error, compute the power of the test and the probability of a Type II error ($\beta$) if the population mean amount dispensed is actually
a. 6.9 ounces per cup.
b. 6.8 ounces per cup.
c. Compare the results in (a) and (b) of this problem and in Problem 9.78. What conclusion can you reach?

9.81 A tire manufacturer produces tires that have a mean life of at least 25,000 miles when the production process is working properly. Based on past experience, the standard deviation of the tires is 3,500 miles. The operations manager stops the production process if there is evidence that the mean tire life is below 25,000 miles. If you select a random sample of 100 tires (to be subjected to destructive testing) and you are willing to have an $\alpha = 0.05$ risk of committing a Type I error, compute the power of the test and the probability of a Type II error ($\beta$) if the population mean life is actually
a. 24,000 miles.
b. 24,900 miles.

9.82 Refer to Problem 9.81. If you are willing to have an $\alpha = 0.01$ risk of committing a Type I error, compute the power of the test and the probability of a Type II error ($\beta$) if the population mean life is actually
a. 24,000 miles.
b. 24,900 miles.
c. Compare the results in (a) and (b) of this problem and (a) and (b) in Problem 9.81. What conclusion can you reach?

9.83 Refer to Problem 9.81. If you select a random sample of 25 tires and are willing to have an $\alpha = 0.05$ risk of committing a Type I error, compute the power of the test and the probability of a Type II error ($\beta$) if the population mean life is actually
a. 24,000 miles.
b. 24,900 miles.
c. Compare the results in (a) and (b) of this problem and (a) and (b) in Problem 9.81. What conclusion can you reach?

9.84 Refer to Problem 9.81. If the operations manager stops the process when there is evidence that the mean life is different from 25,000 miles (either less than or greater than) and a random sample of 100 tires is selected, along with a level of significance of 0.05, compute the power of the test and the probability of a Type II error ($\beta$) if the population mean life is actually
a. 24,000 miles.
b. 24,900 miles.
c. Compare the results in (a) and (b) of this problem and (a) and (b) in Problem 9.81. What conclusion can you reach?