
252y0333 11/25/03 ECO252 QBA2 Name: KEY (Page layout view!)
THIRD HOUR EXAM Nov 25, 2003. Hour of Class Registered (Circle)

I. (30+ points) Do all the following (2 points each unless noted otherwise).

TABLE 11-0

Shiffler and Adams present the partially complete ANOVA table below that resulted from the analysis of a problem with 3 rows and 3 columns.

ANOVA

Source of Variation       SS   df   MS   F   F.05
Columns                   18
Rows                      40
Interaction
Within (Error)           208
Total                    296   62

1. Complete the table. Assume a 5% significance level. You may not be able to get exactly the degrees of freedom you are looking for, but you should be able to come close. (4)

Solution: The SS column must add up, which accounts for the 30 for Interaction. For c columns the df must be c − 1 = 2. For r rows, df = r − 1 = 2. For interaction, df = (r − 1)(c − 1) = (2)(2) = 4. The df for Within is 62 − 2 − 2 − 4 = 54. The MS column is the SS column divided by the df column. The F column is the MS column divided by MSW = 3.852. F.05 comes from the F table.

ANOVA

Source of Variation       SS   df      MS       F       F.05
Columns                   18    2     9.0    2.336 ns    3.17
Rows                      40    2    20.0    5.192 s     3.17
Interaction               30    4     7.5    1.947 ns    2.54
Within (Error)           208   54    3.852
Total                    296   62

2. Is there significant interaction? Explain your answer.

Solution: Since the computed F is less than the table F, we do not reject the null hypothesis of no interaction. This is shown by "ns" for 'not significant,' though I need to know why it is not significant. Remember that the SS and df columns must add up.
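Though no software was required here, the table completion can be cross-checked in a few lines of Python (a minimal sketch, assuming SciPy is available for the F-table lookups; the dictionary layout and variable names are mine):

from scipy import stats

# Known pieces of Table 11-0: r = 3 rows, c = 3 columns, total df = 62
ss = {"Columns": 18.0, "Rows": 40.0, "Within": 208.0, "Total": 296.0}
df = {"Columns": 2, "Rows": 2, "Interaction": 2 * 2}          # c-1, r-1, (r-1)(c-1)
df["Within"] = 62 - df["Columns"] - df["Rows"] - df["Interaction"]   # 54

# The SS column must add up, which gives the interaction SS
ss["Interaction"] = ss["Total"] - ss["Columns"] - ss["Rows"] - ss["Within"]  # 30.0

msw = ss["Within"] / df["Within"]                             # 3.852
for source in ("Columns", "Rows", "Interaction"):
    ms = ss[source] / df[source]
    f = ms / msw
    f_crit = stats.f.ppf(0.95, df[source], df["Within"])      # F.05 from the table
    flag = "s" if f > f_crit else "ns"
    print(f"{source:12s} MS = {ms:6.3f}  F = {f:5.3f}  F.05 = {f_crit:4.2f}  {flag}")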

TABLE 13-6

The following Minitab table (with many parts deleted) was obtained when "Score received on an exam (measured in percentage points)" (Y) is regressed on "percentage attendance" (X) for 22 students in a Statistics for Business and Economics course.

Regression Analysis: Orders versus Weight

The regression equation is Score = …… + ….. Attendance

Predictor        Coef   SE Coef        T        P
Constant      39.3927   37.2435   1.0576   0.3028
Attendance    0.34058   0.52852   0.6444   0.5266

S = 20.2598 R-Sq = 2.034% R-Sq(adj) = -2.864%

Analysis of Variance

Source           DF       SS   MS   F   P
Regression        1    0.523
Residual Error   20
Total            21

3. Referring to Table 13-6, which of the following statements is true?
a) -2.86% of the total variability in score received can be explained by percentage attendance.
b) -2.86% of the total variability in percentage attendance can be explained by score received.
c) *2% of the total variability in score received can be explained by percentage attendance.
d) 2% of the total variability in percentage attendance can be explained by score received.
Explanation: R² is the proportion of the total variability in Y that is explained by the regression line.

4. Referring to Table 13-6, which of the following statements is true?
a) If attendance increases by 0.341%, the estimated average score received will increase by 1 percentage point.
b) If attendance increases by 1%, the estimated average score received will increase by 39.39 percentage points.
c) *If attendance increases by 1%, the estimated average score received will increase by 0.341 percentage points.
d) If the score received increases by 39.39%, the estimated average attendance will go up by 1%.
Explanation: The equation is Score = 39.3927 + 0.34058(Attendance). If Attendance goes up by 1, Score goes up by 0.34058.


5. (Text CD problem 12.51) The manager of a commercial mortgage department has collected data over 104 weeks concerning the number of mortgages approved. The data are the x and O columns below (x is the number of mortgages approved and O is the number of weeks that happened; for example, there were 32 weeks in which 2 mortgages were approved), and the problem asks if it follows a Poisson distribution.

Row    x    O         E
  1    0   13   12.7355
  2    1   25   26.7445
  3    2   32   28.0817
  4    3   17   19.6572
  5    4    9   10.3200
  6    5    6    4.3344
  7    6    1    1.5170
  8    7    1    0.4551
  9    8    0    0.1195
 10    9    0    0.0279
 11   10    0    0.0059
 12   11    0    0.0011
 13   12    0    0.0002

Since we have no guide as to what the parameter of the distribution is, the x and O columns were multiplied together to tell us that there were 219 mortgages approved over 104 weeks, giving us an average of 2.1 mortgages per week. The E above is the computer-generated Poisson distribution multiplied by 104. In a Kolmogorov–Smirnov procedure we make the O and E into cumulative distributions and compare them, as is done below.

Row      Fo        Fe          D
  1   0.12500   0.12246   0.0025435
  2   0.36538   0.37962   0.0142304
  3   0.67308   0.64963   0.0234453
  4   0.83654   0.83864   0.0021047
  5   0.92308   0.93787   0.0147973
  6   0.98077   0.97955   0.0012180
  7   0.99038   0.99414   0.0037536
  8   1.00000   0.99851   0.0014857
  9   1.00000   0.99966   0.0003369
 10   1.00000   0.99993   0.0000689
 11   1.00000   0.99999   0.0000126
 12   1.00000   1.00000   0.0000019
 13   1.00000   1.00000   0.0000000

Assume this is correct and explain how you would finish this analysis and why you would or would not reject the null hypothesis. (4)

Solution: The null hypothesis is H0: Poisson. n = ΣO = 104. We use the K-S table and find that the critical value for the maximum discrepancy is 1.36/√n = 1.36/√104 = 0.1334. Since the maximum of the D column is less than the critical value, we do not reject the null hypothesis.
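The whole K-S computation above can be reproduced in a few lines (a sketch assuming NumPy and SciPy, neither of which was part of the course software; stats.poisson supplies the expected frequencies):

import numpy as np
from scipy import stats

x = np.arange(13)                                  # 0 through 12 mortgages
O = np.array([13, 25, 32, 17, 9, 6, 1, 1, 0, 0, 0, 0, 0])
n = O.sum()                                        # 104 weeks
lam = (x * O).sum() / n                            # 219/104, about 2.1 per week

E = stats.poisson.pmf(x, lam) * n                  # expected frequencies
Fo = np.cumsum(O) / n                              # observed cumulative distribution
Fe = np.cumsum(E) / n                              # expected cumulative distribution
D = np.abs(Fo - Fe)

crit = 1.36 / np.sqrt(n)                           # 5% K-S critical value, 0.1334
print(f"max D = {D.max():.5f}  vs  critical value {crit:.4f}")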

6. Referring to the previous problem, a more direct method of comparing the observed and expected data is below. Answer the following questions.
a) What method is being used? (1) Answer: Chi-squared.
b) How many degrees of freedom do we have? (1) Answer: Normally, df would be the number of rows minus 1, but here we have estimated a parameter from the data and lost a degree of freedom: 7 − 1 − 1 = 5.


c) Why are the columns shorter here than in Problem 5? (1) Answer: The text says that E cannot be below 1. The Es that are below 1 and the corresponding Os are added together to form a new 7th line.
d) Do we reject our null hypothesis? Why? (3) Solution: To get our computed chi-squared, sum the O²/E column and subtract n = 104. The result, 1.8415, must be compared with the 5% value of chi-squared from the table. The value from the table is 11.0705, so we do not reject the null hypothesis.

Row     O          E       O²/E
  1    13    12.7355    13.2700
  2    25    26.7445    23.3693
  3    32    28.0817    36.4650
  4    17    19.6572    14.7020
  5     9    10.3200     7.8488
  6     6     4.3344     8.3056
  7     2     2.1267     1.8808
      104   104.0000   105.8415
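As a cross-check of the chi-squared arithmetic (a sketch assuming NumPy and SciPy; this is an illustrative aside, not the method you would use on the exam):

import numpy as np
from scipy import stats

# Categories 0 through 5, plus the combined "6 or more" line
O = np.array([13.0, 25, 32, 17, 9, 6, 2])
E = np.array([12.7355, 26.7445, 28.0817, 19.6572, 10.3200, 4.3344, 2.1267])

chi2 = (O ** 2 / E).sum() - O.sum()     # shortcut form: sum(O^2/E) - n = 1.8415
# equivalent long form: chi2 = ((O - E) ** 2 / E).sum()
dof = len(O) - 1 - 1                    # one extra df lost to the estimated mean
crit = stats.chi2.ppf(0.95, dof)        # 11.0705 for df = 5
print(f"chi-squared = {chi2:.4f}, 5% critical value = {crit:.4f}")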

7. In problems 5 and 6, one of the methods was used improperly. Why? Answer: The K-S method was wrong because we did not completely know the distribution we were testing for and had to estimate the mean from the data.

8. Random samples of salaries (in thousands) for lawyers in 3 cities are presented by Dummeldinger. They are repeated in the three left columns.

     Atlanta     DC      LA   rank-At   rank-DC   rank-LA
1      45.5    41.5    52.0     12.0       7.0      17.5
2      47.9    40.1    72.0     13.0       5.0      21.0
3      43.1    39.0    41.0     11.0       3.5       6.0
4      42.0    56.5    54.0      8.5      20.0      19.0
5      49.0    37.0    33.0     14.5       2.0       1.0
6      52.0    49.0    42.0     17.5      14.5       8.5
7      39.0    43.0    50.0      3.5      10.0      16.0
                                80.0      62.0      89.0

You are asked to analyze them, which you do using a Kruskal–Wallis procedure. You are aware that the tables you have are only appropriate for columns with 5 or fewer items in them, so you drop the last two items in each column and, after ranking the items from 1 to 15, get a Kruskal–Wallis H of 1.82. If you use the tables, what did you test and what is the conclusion? (3) Answer: The table gives a number of p-values, but none for 1.82. However, it should be plain from the values given that 1.82 has a p-value above .102. We cannot reject the null hypothesis, which is equal medians, because the p-value is above any significance level we are likely to use. Another way to do this is to note from the Kruskal–Wallis table that the p-value for H = 5.66 is .057 and the p-value for H = 5.78 is .049, so that H.05 ≈ 5.8. Since 1.82 is considerably below 5.8, do not reject the null hypothesis.

9. You remember how to work with column sizes that are too large for the table. You rank the data as it appears in the three right columns above. Compute the Kruskal–Wallis H and use it to test your null hypothesis at the 5% significance level. (3)

Solution: Compute the Kruskal–Wallis statistic
H = [12/(n(n + 1))] Σ(SRi²/ni) − 3(n + 1)
  = [12/((21)(22))] (80²/7 + 62²/7 + 89²/7) − 3(22)
  = .02597 (18165/7) − 66 = 67.40 − 66 = 1.4025.
Since the 5% chi-square for 2 degrees of freedom is 5.9915, we do not reject our null hypothesis.

Note: In spite of my warnings, most of you changed the 12 to something else, most likely n. If this worked, the formula would be H = [1/(n + 1)] Σ(SRi²/ni) − 3(n + 1). Unless I was lying, this is not the case.
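Both the hand formula and a library version can be run as a cross-check (a sketch assuming NumPy and SciPy; note that scipy.stats.kruskal ranks the raw data itself and applies a tie correction, so its H can differ very slightly from the hand value):

import numpy as np
from scipy import stats

# Hand formula with the full columns of Problem 8: n = 21, rank sums 80, 62, 89
sr = np.array([80.0, 62.0, 89.0])
n = 21
H = 12 / (n * (n + 1)) * (sr ** 2 / 7).sum() - 3 * (n + 1)   # about 1.40

# SciPy version on the raw salaries
atlanta = [45.5, 47.9, 43.1, 42.0, 49.0, 52.0, 39.0]
dc      = [41.5, 40.1, 39.0, 56.5, 37.0, 49.0, 43.0]
la      = [52.0, 72.0, 41.0, 54.0, 33.0, 42.0, 50.0]
H_sp, p = stats.kruskal(atlanta, dc, la)
print(f"hand H = {H:.4f}, SciPy H = {H_sp:.4f}, p = {p:.3f}")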

10. The Kruskal–Wallis test above was done on the assumption that the underlying data did not follow the Normal distribution. Let's assume that you found out that the underlying distributions were Normal and had a common variance. The method to use would be:
a) Friedman test
b) Chi-squared test
c) *One-way ANOVA
d) Two-way ANOVA

TABLE 13-8

The regression equation is GPA = 0.5681 + .1021 ACT

Predictor     Coef   SE Coef        T        P
Constant     .5681    0.9284   0.6119   0.5630
ACT          .1021    0.0356   2.8633   0.0286

S = 0.2691 R-Sq = 57.74% R-Sq(adj) = 50.69%

Analysis of Variance

Source           DF       SS       MS        F       P
Regression        1   0.5940   0.5940   8.1986   .0287
Residual Error    6   0.4347   0.0724
Total             7   1.0287

It is believed that GPA (grade point average, based on a four point scale) should have a positive linear relationship with ACT scores. Given below is the Excel output from regressing GPA on ACT scores using a data set of 8 randomly chosen students from a Big Ten university.

11. Referring to Table 13-8, the interpretation of the coefficient of determination in this regression is that
a) *57.74% of the total variation of ACT scores can be explained by GPA.
b) ACT scores account for 57.74% of the total fluctuation in GPA.
c) GPA accounts for 57.74% of the variability of ACT scores.
d) *none of the above (error)

12. Referring to Table 13-8, the value of the measured test statistic to test whether there is any linear relationship between GPA and ACT is
a) 0.0356.
b) 0.1021.
c) 0.7598.
d) *2.8633. (The t statistic.)


13. Referring to Table 13-8, what is the predicted average value of GPA when ACT = 20?
a) *2.61 (.5681 + .1021(20) = 2.6101)
b) 2.66
c) 2.80
d) 3.12

14. Referring to Table 13-8, what are the decision and conclusion on testing whether there is any linear relationship at the 1% level of significance between GPA and ACT scores?
a) *Do not reject the null hypothesis; hence, there is not sufficient evidence to show that ACT scores and GPA are linearly related.
b) Reject the null hypothesis; hence, there is not sufficient evidence to show that ACT scores and GPA are linearly related.
c) Do not reject the null hypothesis; hence, there is sufficient evidence to show that ACT scores and GPA are linearly related.
d) Reject the null hypothesis; hence, there is sufficient evidence to show that ACT scores and GPA are linearly related.
Answer: The p-value (.0286) is above the significance level (.01), so do not reject the null hypothesis. The null hypothesis is usually that a relationship is not significant.

252y0333 11/25/03 ECO252 QBA2 THIRD EXAM Nov 25, 2003
TAKE HOME SECTION
Name: ______ Social Security Number: ______

Please Note: computer problems 2 and 3 should be turned in with the exam. In problem 2, the 2-way ANOVA table should be completed. The three F tests should be done with a 5% significance level, and you should note whether there was (i) a significant difference between drivers, (ii) a significant difference between cars and (iii) significant interaction. In problem 3, you should show on your third graph where the regression line is.

II. Do the following: (23+ points). Assume a 5% significance level. Show your work!

1. Assume that each column below represents a random sample of sales of the popular cereal brand 'Whee!' as it was moved from shelf 1 (lowest) to shelf 4 (highest) of a group of supermarkets. Assume that the underlying distribution is Normal and test the hypothesis μ1 = μ2 = μ3 = μ4.
a) Before you start, add the second-to-last digit of your Social Security number to the 451 in column 4 and find the sample variance of sales from shelf 4. For example, Seymour Butz's SS number is 123456789 and he will change 451 to 459. This should not change the results by much. (2)
b) Test the hypothesis. (6) Show your work; it is legitimate to check your results by running these problems on the computer, but I expect to see hand computations for every part of them.
c) Compare means two by two, using any one appropriate statistical method, to find out which shelves are significantly better than others. (3)
d) (Extra Credit) What if you found out that each row represented one store? If this changes your analysis, redo the analysis. (5)
e) (Extra Credit) What if you found out that each row represented one store and that the underlying distribution was not Normal? If this changes your analysis, redo the analysis. (5)
f) I did some subsequent analysis on this problem. The output, in part, said

Levene's Test (any continuous distribution)

Test Statistic: 0.609    P-Value: 0.613

What was I testing for and what should my conclusion be? (2)

Sales of 'Whee' Cereal

        Shelf
Row     1     2     3     4
 1    336   440   464   354       Sum of shelf 1 = 3590.0
 2    417   277   479   423       Sum of squares of shelf 1 = 1362860
 3    208   374   492   321       Sum of shelf 2 = 3954.0
 4    420   421   456   424       Sum of squares of shelf 2 = 1602366
 5    366   481   338   518       Sum of shelf 3 = 4586.0
 6    227   349   413   451       Sum of squares of shelf 3 = 2140264
 7    357   328   383   311
 8    353   449   554   462
 9    518   462   497   339
10    388   373   510   202


The original data is presented in my format for 1-way ANOVA

 Row     x1      x2      x3      x4
  1     336     440     464     354
  2     417     277     479     423
  3     208     374     492     321
  4     420     421     456     424
  5     366     481     338     518
  6     227     349     413     451
  7     357     328     383     311
  8     353     449     554     462
  9     518     462     497     339
 10     388     373     510     202
 Sum   3590   +3954   +4586   +3805  = 15935
 nj      10     +10     +10     +10  = 40
 x̄j   359.0   395.4   458.6   380.5   (x̄ = 398.375)
 SS  1362860 +1602366 +2140264 +1524677 = 6630167   (most of this line was done for you)
 x̄j² 128881 +156341.16 +210313.96 +144780.25 = 640316.37

From the above, Σx = 15935, n = 40, Σxij² = 6630167, Σx̄j² = 640316.37, and x̄ = Σx/n = 15935/40 = 398.375.
SST = Σxij² − n x̄² = 6630167 − 40(398.375)² = 6630167 − 6348105.625 = 282061.375

a) s4² = (Σx4² − n4 x̄4²)/(n4 − 1) = (1524677 − 10(380.5)²)/9 = 8541.6111
b) We are testing H0: μ1 = μ2 = μ3 = μ4.
SSB = Σ nj x̄j² − n x̄² = 10(640316.37) − 40(398.375)² = 6403163.7 − 6348105.625 = 55058.075

ANOVA

Source of Variation       SS   df      MS       F      F.05
Between (shelves)      55058    3   18353    2.91 s   F.05(3,36) = 2.87
Within (Error)        227003   36    6306
Total                 282061   39

Since our computed F exceeds the table F, we reject the null hypothesis.
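For comparison, the same one-way ANOVA can be cross-checked with SciPy (a sketch; the shelf lists use the original data, with shelf 4 shown before the Social Security digit change, so your own numbers will differ slightly):

from scipy import stats

shelf = [
    [336, 417, 208, 420, 366, 227, 357, 353, 518, 388],   # shelf 1
    [440, 277, 374, 421, 481, 349, 328, 449, 462, 373],   # shelf 2
    [464, 479, 492, 456, 338, 413, 383, 554, 497, 510],   # shelf 3
    [354, 423, 321, 424, 518, 451, 311, 462, 339, 202],   # shelf 4, before the digit change
]
F, p = stats.f_oneway(*shelf)
print(f"F = {F:.2f}, p = {p:.4f}")    # F is about 2.91; p < .05, so reject H0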

c) Recall the following from the outline.

Scheffé Confidence Interval: If we desire intervals that will simultaneously be valid for a given confidence level for all possible intervals between column means, use
μ1 − μ2 = (x̄.1 − x̄.2) ± √((m − 1) F(m−1, n−m)) · s √(1/n1 + 1/n2).

Tukey Confidence Interval: This also applies to all possible differences:
μ1 − μ2 = (x̄.1 − x̄.2) ± q(m, n−m) (s/√2) √(1/n1 + 1/n2).
This gives rise to Tukey's HSD (Honestly Significant Difference) procedure: two sample means x̄.1 and x̄.2 are significantly different if |x̄.1 − x̄.2| is greater than q(m, n−m) (s/√2) √(1/n1 + 1/n2).

From the solution to Graded Assignment 3: if you used a Scheffé interval, there are m = 4 columns, and the error side is
√((m − 1) F.05(m−1, n−m)) · s √(1/n1 + 1/n2) = √(3(2.87)) √(6306(1/10 + 1/10)) = 104.206.
For the Tukey comparison, we have an error side with q.05(m, n−m) = q.05(4, 36) = 3.81:
q.05(4, 36) (s/√2) √(1/n1 + 1/n2) = 3.81 √((6306/2)(1/10 + 1/10)) = 95.676.

We have the following differences:

x̄.1 − x̄.2 = 359.0 − 395.4 = −36.4      x̄.2 − x̄.3 = 395.4 − 458.6 = −63.2
x̄.1 − x̄.3 = 359.0 − 458.6 = −99.6      x̄.2 − x̄.4 = 395.4 − 380.5 = 14.9
x̄.1 − x̄.4 = 359.0 − 380.5 = −21.5      x̄.3 − x̄.4 = 458.6 − 380.5 = 78.1

For the Tukey comparison only, |x̄.1 − x̄.3| = |359.0 − 458.6| = 99.6 is larger than 95.676, so we say that there is a significant difference between those shelves. The more conservative Scheffé interval gives us nothing.
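Recent versions of SciPy (1.8 or later) include a Tukey HSD routine that can confirm this comparison (a sketch reusing the shelf lists from the one-way ANOVA sketch above):

from scipy import stats

# `shelf` is the list of four columns from the one-way ANOVA sketch above
res = stats.tukey_hsd(*shelf)
print(res)    # only the shelf 1 vs. shelf 3 interval should exclude zero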

d) 2-way ANOVA. You have already done SSC.

        x1    x2    x3    x4    Sum  ni    x̄i.       SS         x̄i.²
  1    336   440   464   354   1594   4   398.50   647108   158802.2500
  2    417   277   479   423   1596   4   399.00   658988   159201.0000
  3    208   374   492   321   1395   4   348.75   528245   121626.5625
  4    420   421   456   424   1721   4   430.25   741353   185115.0625
  5    366   481   338   518   1703   4   425.75   747885   181263.0625
  6    227   349   413   451   1440   4   360.00   547300   129600.0000
  7    357   328   383   311   1379   4   344.75   478443   118852.5625
  8    353   449   554   462   1818   4   454.50   846570   206570.2500
  9    518   462   497   339   1816   4   454.00   843898   206116.0000
 10    388   373   510   202   1473   4   368.25   590577   135608.0625
Sum   3590 +3954 +4586 +3805 =15935   40  398.375  6630167  1602754.8125
 nj     10   +10   +10   +10 = 40
 x̄j  359.0  395.4  458.6  380.5   (x̄ = 398.375)
 SS  1362860 +1602366 +2140264 +1524677 = 6630167
 x̄j² 128881 +156341.16 +210313.96 +144780.25 = 640316.37


From the above, Σx = 15935, n = 40, Σxij² = 6630167, Σx̄i.² = 1602754.8125, Σx̄j² = 640316.37, and x̄ = Σx/n = 15935/40 = 398.375.
SST = Σxij² − n x̄² = 6630167 − 40(398.375)² = 6630167 − 6348105.625 = 282061.375
SSC = Σ nj x̄j² − n x̄² = 10(640316.37) − 40(398.375)² = 6403163.7 − 6348105.625 = 55058.075
SSR = Σ ni x̄i.² − n x̄² = 4(1602754.8125) − 40(398.375)² = 6411019 − 6348105 = 62914

ANOVA

Source of Variation       SS   df      MS       F       F.05
Columns (shelves)      55058    3   18353    3.02 s   F.05(3,27) = 2.96
Rows (stores)          62914    9    6990    1.15 ns  F.05(9,27) = 2.25
Within (Error)        164090   27    6077
Total                 282061   39
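This randomized-block table can be reproduced in software (a sketch assuming pandas and statsmodels are installed, reusing the shelf lists from the one-way ANOVA sketch above; anova_lm prints sums of squares, df, and F values close to the hand computations):

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# `shelf` is the list of four columns from the one-way ANOVA sketch above
rows = []
for store, sales in enumerate(zip(*shelf), start=1):    # transpose: one tuple per store
    for shelf_no, y in enumerate(sales, start=1):
        rows.append({"store": store, "shelf": shelf_no, "sales": y})
df = pd.DataFrame(rows)

model = smf.ols("sales ~ C(shelf) + C(store)", data=df).fit()
print(sm.stats.anova_lm(model))    # SS, df and F should be close to the table above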

The null hypotheses are no significant difference between store means and no significant difference between shelf means. We reject the hypothesis that shelf means are equal, but we cannot reject the hypothesis that store means are equal.

e) In general, if the parent distribution is Normal, use ANOVA; if it's not Normal, use Friedman or Kruskal-Wallis. If the samples are independent random samples, use 1-way ANOVA or Kruskal-Wallis. If they are cross-classified, use Friedman or 2-way ANOVA. So the other method that allows for cross-classification is Friedman, and we use it if the underlying distribution is not Normal.

The null hypothesis is H0: columns come from the same distribution (equal medians). We use a Friedman test because the data is cross-classified by store. This time we rank our data only within rows. There are c = 4 columns and r = 10 rows.

        x1    x2    x3    x4    r1   r2   r3   r4
  1    336   440   464   354     1    3    4    2
  2    417   277   479   423     2    1    4    3
  3    208   374   492   321     1    3    4    2
  4    420   421   456   424     1    2    4    3
  5    366   481   338   518     2    3    1    4
  6    227   349   413   451     1    2    3    4
  7    357   328   383   311     3    2    4    1
  8    353   449   554   462     1    2    4    3
  9    518   462   497   339     4    2    3    1
 10    388   373   510   202     3    2    4    1
Sum   3590 +3954 +4586 +3805    19   22   35   24

To check the ranking, note that the sum of the four rank sums is 19 + 22 + 35 + 24 = 100, and that the sum of the rank sums should be Σ SRi = rc(c + 1)/2 = (10)(4)(5)/2 = 100.

Now compute the Friedman statistic
χF² = [12/(rc(c + 1))] Σ SRi² − 3r(c + 1)
    = [12/((10)(4)(5))] (19² + 22² + 35² + 24²) − 3(10)(5)
    = 0.06 (361 + 484 + 1225 + 576) − 150 = 158.76 − 150 = 8.76.
If we check the Friedman table for c = 4 and r = 10, we find that the problem is too large for the table. So we look up the 5% value of χ² with c − 1 = 3 degrees of freedom. It is 7.8147. Since the computed statistic exceeds that value, we reject the null hypothesis.

f) The Levene test is a test for equality of variances. A high p-value indicates that the null hypothesis of the columns' coming from populations with equal variance cannot be rejected.

Your Solution: Go to the document 252y0333TH pp 2-32 and find Results for: 2x0333-0j, where j is the second to last digit of your SS number. The first things that appear are the sums and sums of squares for all the columns, followed by a printout of column 4. The one-way ANOVA had to be run twice, the first time in the form that you would use and the second time in the form that you used in Computer Assignment 2. The second version prints out Tukey confidence intervals. Minitab then prints the results of the 2-way ANOVA and the Friedman test.


2. A company, operating in 12 regions, gives us its advertising expenses as a percent of those of its leading competitor, and its sales as a percent of those of its leading competitor.

Row    Ad   Sales
 1      77     85        Sum of Ad = 1134.0
 2     110    103        Sum of squares of Ad = 108258
 3     110    102        Sum of Sales = 1168.0
 4      93    109        Sum of squares of Sales = 114750
 5      90     85
 6      95    103        Note that the sum and sum of squares
 7     100    110        of Sales can't be used directly, but
 8      85     86        they should help you to get the
 9      96     92        corrected numbers.
10      83     87
11     100     98
12      95    108

Change the 103 in the 'Sales' column by adding the second-to-last digit of your Social Security number to it. For example, Seymour Butz's SS number is 123456789 and he will change 103 to 112. This should not change the results by much. The question is whether our relative advertising expenses affect our relative sales, so 'Sales' should be your dependent variable and 'Ad' should be your independent variable. Show your work; it is legitimate to check your results by running the problem on the computer, but I expect to see hand computations that show clearly where you got your numbers for every part of this problem.
a. Compute the regression equation Ŷ = b0 + b1x to predict 'Sales' on the basis of 'Ad'. (2)
b. Compute R². (2)
c. Compute se. (2)
d. Compute sb0 and do a significance test on b0. (2)
e. Do an ANOVA table for the regression. What conclusion can you draw from this table about the relationship between advertising expenditures and sales? Why? (2)
f. It is proposed to raise our expenditures to 110% of our competitors' in every region. Use this to find a predicted value for sales and to create a confidence interval for sales. Explain the difference between this and a prediction interval and when the prediction interval would be more useful. (3)

Solution: Working with the original data, we get the following table. The x and x² columns and their sums were not actually needed, since they were done for you.

Row     x     y      x²      xy      y²
 1     77    85    5929    6545    7225
 2    110   103   12100   11330   10609
 3    110   102   12100   11220   10404
 4     93   109    8649   10137   11881
 5     90    85    8100    7650    7225
 6     95   103    9025    9785   10609
 7    100   110   10000   11000   12100
 8     85    86    7225    7310    7396
 9     96    92    9216    8832    8464
10     83    87    6889    7221    7569
11    100    98   10000    9800    9604
12     95   108    9025   10260   11664
Sum  1134  1168  108258  111090  114750

n  12 ,  x  1134 ,  y  1168 ,  x 2  108258 ,  xy  111090 and  y 2  114750 . Spare Parts Computation: SSx   x 2  nx 2  108258 1294.52  1095.00 x 1134 x     94.5 Sxy  xy  nx y  111090 1294.597.3333 n 12   714.00 y  1168 2 y    97.3333 SSy  y 2  ny  114750 1297.33332 n 12   1064.6667  SST Sxy  xy  nxy 714 a) b1     0.652055 SSx  x 2  nx 2 1095

b0 = ȳ − b1 x̄ = 97.3333 − 0.652055(94.5) = 35.7141
So Ŷ = b0 + b1x becomes Ŷ = 35.7141 + 0.6521x.

b) SSR = b1 Sxy = 0.6521(714.00) = 465.60, so R² = SSR/SST = 465/1064.6667 = .4368, or
R² = Sxy²/(SSx · SSy) = 714²/[(1095)(1064.6667)] = .4372.
c) SSE = SST − SSR = 1064.6667 − 465 = 599.6667, so se² = SSE/(n − 2) = 599.6667/10 = 59.96667,
or se² = (1 − R²) SST/(n − 2) = (1 − .4372)(1064.6667)/10 = 59.9194.
So se = √59.9194 = 7.7408 (se is always positive!)
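The spare-parts arithmetic in parts a) through c) is easy to verify in plain Python (a sketch; the variable names are mine):

n = 12
sum_x, sum_y = 1134.0, 1168.0
sum_x2, sum_xy, sum_y2 = 108258.0, 111090.0, 114750.0

xbar, ybar = sum_x / n, sum_y / n            # 94.5 and 97.3333
SSx = sum_x2 - n * xbar ** 2                 # 1095.00
Sxy = sum_xy - n * xbar * ybar               # 714.00
SSy = sum_y2 - n * ybar ** 2                 # 1064.6667 = SST

b1 = Sxy / SSx                               # 0.652055
b0 = ybar - b1 * xbar                        # 35.7141
R2 = Sxy ** 2 / (SSx * SSy)                  # 0.4372
se = ((1 - R2) * SSy / (n - 2)) ** 0.5       # 7.7408
print(b0, b1, R2, se)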

 2  2 2 2 1 X  1 94.5  d) sb  se     59.9194    59.9194.01077  8.15548  489.69 0 n X 2  nX 2  12 1095      

s  489.69  22.12 . Formulas for sb and sb appear in the outline. b0 0 1 252y0333 11/25/03

The outline says that to test H0: β0 = β00 against H1: β0 ≠ β00, use t = (b0 − β00)/sb0. Remember that β00 is most often zero, and if the null hypothesis is false, we say that β0 is significant. So t = (b0 − β00)/sb0 = 35.7141/22.12 = 1.615. Make a diagram of an almost Normal curve with zero in the middle; if α = .05, the rejection zones are above t.025(10) = 2.228 and below −t.025(10) = −2.228. Since our computed t-ratio, at 1.615, is not in a rejection zone, we do not reject the null hypothesis that the coefficient is zero, and we say that b0 is not significant.

e) Note that the F test is the equivalent of a test on b1.

Source             SS       DF      MS     F      F.05
Regression       465.0000    1   465.00   7.75 s  F.05(1,10) = 4.96
Error (Within)   599.6667   10    59.97
Total           1064.6667   11

f) Our equation says that Ŷ = 35.7141 + 0.6521x, so, if x = 110, Ŷ = 35.7141 + 0.6521(110) = 107.45. The confidence interval is Y0 = Ŷ ± t sŶ, where
sŶ² = se² (1/n + (X0 − x̄)²/SSx) = 59.9194 (1/12 + (110 − 94.5)²/1095) = 18.140 and sŶ = √18.140 = 4.259,
so that the confidence interval is Y0 = Ŷ ± t sŶ = 107.45 ± 2.228(4.259) = 107.45 ± 9.49, or 97.96 to 116.94. The confidence interval for Y gives an average value for many regions in which the ad budget was 110, so it is not appropriate for predicting sales in one region. However, since the firm is making the ad budget uniform over all regions, it may be quite appropriate for projecting total sales.
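The confidence interval, and the prediction interval discussed above, can both be computed from the same spare parts (a sketch assuming SciPy and the variables from the previous sketch; the prediction-interval half-width of about 19.68 agrees with Minitab's 87.76 to 127.12 below):

from scipy import stats

# b0, b1, n, xbar, SSx and se come from the spare-parts sketch above
x0 = 110.0
y_hat = b0 + b1 * x0                          # about 107.45
t = stats.t.ppf(0.975, n - 2)                 # 2.228

s_mean = se * (1 / n + (x0 - xbar) ** 2 / SSx) ** 0.5       # for the mean (CI)
s_one  = se * (1 + 1 / n + (x0 - xbar) ** 2 / SSx) ** 0.5   # for one region (PI)
print(f"CI: {y_hat:.2f} +/- {t * s_mean:.2f}")   # 107.45 +/- 9.49
print(f"PI: {y_hat:.2f} +/- {t * s_one:.2f}")    # 107.45 +/- 19.68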


Minitab output follows.

Results for: 2x0333-10.mtw

MTB > Name c15 = 'CLIM5' c16 = 'CLIM6' c17 = 'PLIM5' c18 = 'PLIM6'
MTB > Regress c2 1 c1;
SUBC>   Constant;
SUBC>   Predict c6;
SUBC>   CLimits 'CLIM5'-'CLIM6';
SUBC>   PLimits 'PLIM5'-'PLIM6';
SUBC>   Brief 2.

Regression Analysis: Sales versus Ad

The regression equation is Sales = 35.7 + 0.652 Ad

Predictor      Coef   SE Coef      T      P
Constant      35.71     22.22   1.61  0.139
Ad           0.6521    0.2339   2.79  0.019

S = 7.740 R-Sq = 43.7% R-Sq(adj) = 38.1%

Analysis of Variance

Source           DF       SS      MS     F      P
Regression        1   465.57  465.57  7.77  0.019
Residual Error   10   599.10   59.91
Total            11  1064.67

Predicted Values for New Observations

New Obs      Fit   SE Fit        95.0% CI             95.0% PI
      1   107.44     4.26   ( 97.95, 116.93)   ( 87.76, 127.12)

Values of Predictors for New Observations

New Obs    Ad
      1   110

Your Solution: Go to the document 252y0333TH pp 32-44 and find Results for: 2x0333-1j, where j is the second-to-last digit of your SS number. The first things that appear are the results of the regression with the correct prediction and confidence intervals. The next regression printout is wrong but useful. It computes the regression with Sales as the independent variable (X) and Ad as the dependent variable (Y). This routine is set up to compute all the columns and column sums that you needed. However, because of the way the data was arranged, it has things reversed. So n = n, sumy = Σx, sumx = Σy, smysq = Σx², smxy = Σxy, smxsq = Σy², ybar = x̄, xbar = ȳ, the printout's SSy = our SSx = Σx² − n x̄², its Sxy = our Sxy = Σxy − n x̄ȳ, and its SSx = our SSy = Σy² − n ȳ² = SST. Sorry about that, but the alternative was a load of new programming.
