
252x0631 11/28/06 (Page layout view!)

ECO252 QBA2                     Name ______
THIRD HOUR EXAM                 Hour of Class Registered (Circle)
Nov 30 and Dec 1, 2006          MWF 1    MWF 2    TR 12:30    TR 2

I. (8 points) Do all the following (2 points each unless noted otherwise). Make diagrams! Show your work!

x ~ N15, 9.3

1. Px  20

2. P0  x  14

3. P16  x  16

4. x.42 (Find z.42 first)
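As a hedged illustration of the standardization these items call for, the sketch below evaluates item 2, the probability that x lies between 0 and 14. It assumes that the 9.3 in the distribution statement is the standard deviation rather than the variance; the helper name `normal_cdf` is ours and is not part of the exam. Because the distribution is continuous, strict and non-strict inequalities give the same answer.

```python
import math

def normal_cdf(x, mean, sd):
    """P(X <= x) for X ~ Normal(mean, sd), computed with the error function."""
    z = (x - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

MEAN = 15.0
SD = 9.3  # assumption: 9.3 is read as the standard deviation

# Standardize both endpoints, then take the difference of the two CDF values.
z_low = (0 - MEAN) / SD
z_high = (14 - MEAN) / SD
p_between = normal_cdf(14, MEAN, SD) - normal_cdf(0, MEAN, SD)

print(f"z runs from {z_low:.2f} to {z_high:.2f}; probability = {p_between:.4f}")
```

The same two steps (compute z, then look up or compute the cumulative probability) apply to the other items once the interval is written down.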


II. (22+ points) Do all the following (2 points each unless noted otherwise). Do not answer a question ‘yes’ or ‘no’ without giving reasons. Show your work in questions that are not multiple choice. Look them over first. Note the following:
1. This test is normed on 50 points, but there are more points possible including the take-home. You are unlikely to finish the exam and might want to skip some questions.
2. If you answer ‘None of the above’ in any question, you should provide an alternative answer and explain why. You may receive credit for this even if you are wrong.
3. Use a 5% significance level unless the question says otherwise.
4. Read problems carefully. A problem that looks like a problem on another exam may be quite different.

1. Turn in your computer problems 2 and 3 marked to show the following: (5 points, 2-point penalty for not doing.)
a) In computer problem 2, what is tested and what are the results?
b) In computer problem 3, what coefficients are significant? What is your evidence?
c) In the last graph in computer problem 3, where is the regression line? [5]

2. (Abronovic) The distance that a baseball travels after being hit is a function of the velocity (in mph) of the pitched ball. A ball is pitched to a batter with a 35 inch, 32 oz bat that is swung at 70 mph from the waist and at an angle of 35%. The experiment is repeated 9 times. A partial Minitab printout appears below. Use α = .01 throughout this problem.

DIST = ……… + ……… VELOC

Predictor       Coef     StDev   t-ratio   p
Constant      31.311     0.999   ……………     …………
VELOC        0.74667   0.01529   ……………     …………

s = 1.185    R-sq = ………    R-sq(adj) = ………

Analysis of Variance
SOURCE        DF       SS       MS     F      p
Regression     1   3345.1   3345.1   ………   …………
Error          7      9.8      1.4
Total          8   3354.9

a) The fastest pitchers can throw at about 100 mph. How far will such a pitch be hit? (2)

b) What is the value of R-squared? (2)

c) Fill in the F space in the ANOVA and explain specifically what is tested and what are the conclusions. (3)

d) Is the constant (31.311) significant? Why? (2) [14]
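The blanks in the printout can be filled from the numbers that are shown. The sketch below is only an illustration of that arithmetic: every value is copied from the printout, the variable names are ours, and the p-values are left to be judged against t and F tables at the 1% level.

```python
# Values copied from the partial Minitab printout.
b0, b1 = 31.311, 0.74667          # constant and VELOC coefficient
se_b0 = 0.999                     # standard deviation of the constant
ss_regression, ss_total = 3345.1, 3354.9
ms_regression, ms_error = 3345.1, 1.4

predicted_dist = b0 + b1 * 100              # fitted distance for a 100 mph pitch
r_squared = ss_regression / ss_total        # share of total variation explained
f_ratio = ms_regression / ms_error          # tests H0: slope = 0
t_constant = b0 / se_b0                     # tests H0: constant = 0

print(f"Predicted distance at 100 mph: {predicted_dist:.3f}")
print(f"R-squared: {r_squared:.4f}")
print(f"F: {f_ratio:.1f}")
print(f"t-ratio for the constant: {t_constant:.2f}")
```

With one independent variable the F test and the t test on the slope test the same hypothesis, so F should be close to the square of the slope's t-ratio, apart from rounding in the printout.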

3. (Render) A firm renovates homes in upstate New York. Sales and payroll for the region for a random sample of years are given below. Sales are in units of $100,000 and payroll is in units of $0.1 billion. Current sales are $210,000, and because of the opening of several new plants, payroll is anticipated to be about 0.6 billion dollars for the foreseeable future. Will average sales be significantly different from current sales? To answer this question:
a) Complete the XY column. (1)
b) Find the regression equation. (4)
c) Find an appropriate interval for sales. (3)
d) State and justify your conclusion. (2) [24]

Y – Sales   X – Payroll     X²       Y²      XY
   2.0           1           1      4
   3.0           3           9      9
   2.5           4          16      6.25
   2.0           2           4      4
   2.0           1           1      4
   3.5           7          49     12.25
  15.0          18          80     39.50
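The XY column and the least-squares coefficients follow mechanically from the table. The sketch below is one way to check hand work, using the usual computational formulas for the slope and intercept; the variable names are ours.

```python
sales = [2.0, 3.0, 2.5, 2.0, 2.0, 3.5]   # Y, in units of $100,000
payroll = [1, 3, 4, 2, 1, 7]             # X, in units of $0.1 billion
n = len(sales)

# Missing XY column and its total.
xy = [x * y for x, y in zip(payroll, sales)]
sum_xy = sum(xy)

# Column totals already given in the table.
sum_x, sum_y = sum(payroll), sum(sales)
sum_x2 = sum(x * x for x in payroll)

x_bar, y_bar = sum_x / n, sum_y / n

# Slope and intercept from the computational formulas.
b1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
b0 = y_bar - b1 * x_bar

print("XY column:", xy, "total:", sum_xy)
print(f"Y-hat = {b0:.2f} + {b1:.2f} X")
```

The interval in part c) still has to be built by hand from the standard error; the code stops at the regression equation.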

4. The following data may look familiar. It is the service method data from the last exam. 11 service methods were tried to see which is the fastest. I ran a multiple comparison of the 11 methods. Data were stacked as ‘time’ in column 12 with the column labels ‘tel’ in column 13. This was run on Minitab in three different ways, as a one-way ANOVA, using the Kruskal-Wallis test and using the Mood median test. I have no idea how to do a Mood median test but I know this much. The null hypothesis is equal medians. The assumptions are comparable to the Kruskal-Wallis test. The Mood median test is less powerful than the Kruskal-Wallis test but is less affected by outliers. The Mood median test produces a chi-squared statistic with degrees of freedom equal to the number of columns less 1.

Row     1     2     3     4     5     6     7     8     9    10    11
  1   2.9   2.8   2.6   7.7   2.4   6.6   3.5   3.4   3.5   2.3   3.4
  2   3.9   2.6   2.7   2.9  13.4   3.7   8.4   8.3   3.4   6.9   4.4
  3   2.6   2.6   3.2   4.3   5.8   9.7   4.3   4.2   3.8   3.3   3.1
  4   3.1   2.9   2.8   2.7   1.5   1.9   3.3   3.2   3.4   5.3   3.6
  5   3.9   2.9   3.6   3.4   9.8  10.1  11.9  11.0   3.6   3.0   4.4
  6   2.6   2.8   2.1   4.4   2.7   4.5   3.7   3.6   3.5   3.3   3.1
  7   3.3   2.3   2.3   5.5   2.7   2.9   3.0   2.9   4.8   6.1   3.8
  8   3.0   2.4   2.6   3.4   4.5   9.9   2.9   2.8   3.5   3.1   3.5
  9   3.5   2.0   2.6   3.4   2.3   3.0   3.6   3.5   5.3   2.6   4.0
 10   3.1   2.5   2.9   3.5   5.8  31.5   5.4   5.3   3.7   4.4   3.6
 11   3.2   2.4   2.4   4.0   4.8   3.5   4.4   4.3   3.4  15.0   3.7
 12   2.4   2.0   2.3   3.4   4.2   5.3   3.0   2.9   3.6   6.9   2.9
 13   4.0   4.1   0.5   4.1   5.8   9.8   4.3   4.2   3.8   2.1   4.5
 14   4.3   4.3   4.7   3.8   6.1   5.3   5.4   5.3   3.8  10.4   4.8

The edited Minitab output follows. Note that the p-values are missing; they are for you to think about.

Results for: 252x06031-01.MTW

MTB > Name c72 "RESI1"    #Residuals are stored for Normality test.
MTB > Oneway c12 c13;
SUBC>   Residuals 'RESI1';
SUBC>   GNormalplot;
SUBC>   NoDGraphs.

One-way ANOVA: time versus tel

Source    DF        SS      MS     F      P
tel       10    284.43   28.44   ………   …………
Error    143   1228.86    8.59
Total    153   1513.28

S = 2.931    R-Sq = 18.80%    R-Sq(adj) = 13.12%

Individual 95% CIs For Mean Based on Pooled StDev
(Character plot of the intervals not reproduced; its axis ran from 2.5 to 10.0.)

Level      N    Mean   StDev
tel 1     14   3.271   0.580
tel 10    14   5.336   3.630
tel 11    14   3.771   0.580
tel 2     14   2.757   0.677
tel 3     14   2.664   0.909
tel 4     14   4.036   1.265
tel 5     14   5.129   3.214
tel 6     14   7.693   7.446
tel 7     14   4.793   2.503
tel 8     14   4.636   2.331
tel 9     14   3.793   0.561

Pooled StDev = 2.931

Normplot of Residuals for time (This plot was identical to the one shown below)

MTB > Kruskal-Wallis c12 c13.

Kruskal-Wallis Test: time versus tel
Kruskal-Wallis Test on time

tel         N   Median   Ave Rank       Z
tel 1      14    3.150       60.1   -1.53
tel 10     14    3.850       86.8    0.82
tel 11     14    3.650       85.6    0.72
tel 2      14    2.600       33.3   -3.89
tel 3      14    2.600       33.5   -3.87
tel 4      14    3.650       85.4    0.70
tel 5      14    4.650       90.3    1.12
tel 6      14    5.300      107.2    2.61
tel 7      14    4.000       94.7    1.51
tel 8      14    3.900       89.5    1.06
tel 9      14    3.600       86.1    0.75
Overall   154                77.5

H = 41.97    #H is the Kruskal-Wallis statistic that I taught you about.

MTB > Mood c12 c13.

Mood Median Test: time versus tel
Mood median test for time

Chi-Square = 23.34    DF = 10    P = ………

Individual 95.0% CIs
(Character plot of the intervals not reproduced; its axis ran from 2.5 to 10.0.)

tel       N<=   N>   Median   Q3-Q1
tel 1      10    4     3.15    1.08
tel 10      7    7     3.85    4.00
tel 11      5    9     3.65    1.08
tel 2      12    2     2.60    0.53
tel 3      12    2     2.60    0.68
tel 4       7    7     3.65    0.93
tel 5       5    9     4.65    3.25
tel 6       4   10     5.30    6.45
tel 7       5    9     4.00    2.18
tel 8       6    8     3.90    2.18
tel 9       6    8     3.60    0.33

Overall median = 3.50

MTB > NormTest c72;
SUBC>   KSTest.

Probability Plot of RESI1    #This is a plot of the differences between the mean of each column and the actual data.

MTB > Vartest c12 c13;
SUBC>   Confidence 95.0.

Test for Equal Variances: time versus tel (This plot was not needed.)

Bartlett's Test (normal distribution)
Test statistic = 190.25, p-value = 0.000

Levene's Test (any continuous distribution)
Test statistic = 3.29, p-value = 0.001

a) Are the means significantly different? Identify the test that answers this question, complete it and answer the question. Explain! (3)

b) Are the medians significantly different? Complete one or both of the remaining tests and answer the question. Explain! (4)

c) At the end of the printout there are a K-S (actually Lilliefors) test, a Bartlett and a Levene test. What do they tell us about the applicability of the ANOVA, Kruskal-Wallis or Mood test to the problem? What conclusion is the most reliable? (3)

d) On the basis of all this, which of the service methods are best? (1) [35]
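The missing F and the two chi-squared comparisons can be sketched as follows. This is an illustration under stated assumptions, not the official solution: the mean squares, H and the Mood chi-square are copied from the printout, 18.307 is the tabled upper 5% point of chi-squared with 10 degrees of freedom, and H is compared with chi-squared on the usual large-sample approximation with (number of groups - 1) degrees of freedom. The F critical value is left to the F table.

```python
# One-way ANOVA: F is the ratio of the two mean squares in the printout.
ms_tel, ms_error = 28.44, 8.59
f_ratio = ms_tel / ms_error

# Rank-based tests: both statistics are referred to chi-squared with 11 - 1 = 10 df.
CHI2_CRIT_10DF_05 = 18.307      # tabled value, upper 5% point, 10 df
h_kruskal_wallis = 41.97
chi2_mood = 23.34

reject_kw = h_kruskal_wallis > CHI2_CRIT_10DF_05
reject_mood = chi2_mood > CHI2_CRIT_10DF_05

print(f"F = {f_ratio:.2f} with 10 and 143 degrees of freedom")
print(f"Kruskal-Wallis: H = {h_kruskal_wallis} > {CHI2_CRIT_10DF_05}? {reject_kw}")
print(f"Mood median:  chi-square = {chi2_mood} > {CHI2_CRIT_10DF_05}? {reject_mood}")
```

Which of these conclusions deserves the most trust is the subject of part c), and depends on what the normality and equal-variance tests say about each method's assumptions.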

5. Four experts rated 4 brands of Colombian coffee. By adding together ratings on a seven-point scale for taste, aroma, richness and acidity, each coffee is given a rating on a 28-point scale. The following table gives these summed ratings. Assume that the underlying distribution is not Normal and test for a difference in the ratings of the four brands. Note: α = .10

Row   Brand A   Brand B   Brand C   Brand D
         x1        x2        x3        x4
 1       24        26        25        22
 2       27        27        26        24
 3       19        22        20        16
 4       24        27        25        23
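One rank-based approach that fits this layout is the Friedman test. The sketch below assumes that each row holds one expert's ratings, so that rows act as blocks and the brands are ranked within each row; it uses the basic Friedman statistic without a tie correction. The function name `average_ranks` is ours. Whether to compare the result with a chi-squared table (3 degrees of freedom) or a small-sample Friedman table is left to the reader.

```python
ratings = [            # rows = experts (assumed), columns = brands A, B, C, D
    [24, 26, 25, 22],
    [27, 27, 26, 24],
    [19, 22, 20, 16],
    [24, 27, 25, 23],
]

def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1      # rank position of the first tied value
        count = ordered.count(v)          # number of values tied at v
        ranks.append(first + (count - 1) / 2)
    return ranks

ranked = [average_ranks(row) for row in ratings]
rank_sums = [sum(col) for col in zip(*ranked)]   # one rank sum per brand

r, c = len(ratings), len(ratings[0])
friedman = 12 / (r * c * (c + 1)) * sum(s ** 2 for s in rank_sums) - 3 * r * (c + 1)

print("Rank sums by brand:", rank_sums)
print(f"Friedman statistic: {friedman:.3f}")
```

A quick check on the ranking step is that the rank sums must add to r·c·(c+1)/2, which is 40 here.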

ECO 252 QBA2 THIRD EXAM Nov 30 and Dec 1, 2006
TAKE HOME SECTION
Name: ______
Student Number: ______
Class days and time: ______

Please Note: Computer problems 2 and 3 should be turned in with the exam (2). In problem 2, the 2 way ANOVA table should be checked. The three F tests should be done with a 5% significance level and you should note whether there was (i) a significant difference between drivers, (ii) a significant difference between cars and (iii) significant interaction. In problem 3, you should show on your third graph where the regression line is. Check what your text says about normal probability plots and analyze the plot you did. Explain the results of the t and F tests using a 5% significance level. (2)

III. Do the following. Note: Look at 252thngs on the syllabus supplement part of the website before you start (and before you take exams). Show your work! State H0 and H1 where appropriate. You have not done a hypothesis test unless you have stated your hypotheses, run the numbers and stated your conclusion. (Use a 95% confidence level unless another level is specified.) Answers without reasons or accompanying calculations usually are not acceptable. Neatness and clarity of explanation are expected. This must be turned in when you take the in-class exam. Note that from now on neatness means paper neatly trimmed on the left side if it has been torn, multiple pages stapled and paper written on only one side. Because so much of this exam is based on student numbers, there is a penalty for failing to state your correct student number.

1) Bassett et al. give the following numbers for the year and the number of pensioners in the United Kingdom. Pensioners are in millions. The 2000 number is a bit shaky, so subtract the last digit of your student number divided by 10 from the 12.00 that you see there. Label your answer to this problem with a version number. (Example: Good ol’ Seymour’s student number is 123456, so the 12.000 becomes 12.000 - 0.6 = 11.400 and he labels it Version 6.) 'Pensioners' is the dependent variable and 'Year' is the independent variable, so what you are going to get is a trend line. If you don’t know what dependent and independent variables are, stop work until you find out.

Row   Year   Pensioners
 1    1966      6.679
 2    1971      7.677
 3    1975      8.321
 4    1976      8.510
 5    1977      8.637
 6    1978      8.785
 7    1979      8.937
 8    2000     12.000

Bassett et al. strongly suggest that you change the base year to something other than the year zero. They recommend that you subtract 1970 from every number in the ‘Year’ column, so that 1966 becomes -4 and 2000 becomes 30. This will make your computations easier.
a. Compute the regression equation Ŷ = b0 + b1 x to predict the number of pensioners in each year. (3) You may check your results on the computer, but let me see real or simulated hand calculations.
b. Compute R². (2)
c. Compute s_e. (2)
d. Compute s_b1 and do a significance test on b1. (2)
e. Use your equation to predict the number of pensioners in 2005 and 2006. Using the 2006 number, create a prediction interval for the number of pensioners for that year. Explain why a confidence interval for the number of pensioners is inappropriate. (3)
f. Make a graph of the data. Show the trend line clearly. If you are not willing to do this neatly and accurately, don’t bother. (2)

g. What percent rise in pensioners did the equation predict for 2006? What percent rise does it predict for 2050? The population of the United Kingdom grew at roughly 0.31% a year over the last quarter of the 20th century. Can you intelligently guess what is wrong? (1) [15]
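For checking hand calculations on the trend line, a sketch along the following lines may help. It takes the last digit of the student number as a parameter, recodes the year by subtracting 1970 as Bassett et al. recommend, and uses the computational formulas for the sums of squares. The function name `trend_line` is ours, and the call shown uses Seymour's Version 6 from the example, not anyone's real version.

```python
def trend_line(last_digit):
    """Return (b0, b1, r_squared) for the pensioner data, x = year - 1970."""
    years = [1966, 1971, 1975, 1976, 1977, 1978, 1979, 2000]
    pensioners = [6.679, 7.677, 8.321, 8.510, 8.637, 8.785, 8.937,
                  12.000 - last_digit / 10]      # personalized 2000 value
    x = [yr - 1970 for yr in years]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(pensioners) / n

    # Corrected sums of squares and cross products.
    sxx = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    syy = sum(yi ** 2 for yi in pensioners) - n * y_bar ** 2
    sxy = sum(xi * yi for xi, yi in zip(x, pensioners)) - n * x_bar * y_bar

    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return b0, b1, r_squared

b0, b1, r_squared = trend_line(6)            # Seymour's Version 6
pred_2006 = b0 + b1 * (2006 - 1970)          # point prediction for 2006

print(f"Y-hat = {b0:.4f} + {b1:.5f} x,  R-squared = {r_squared:.4f}")
print(f"Predicted pensioners in 2006 (millions): {pred_2006:.3f}")
```

The standard errors and the prediction interval are not computed here; they follow from the same sums of squares.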

2) The Lees in their text ask whether experience makes a difference in student earnings and present the following data for student earnings versus years of work experience. To personalize these data, take the second to last digit of your student number and call it a. Clearly label the problem with a version number based on your student number. Then take your a, multiply it by 0.5 and add it to the 13 in the lower left corner. (Example: Good ol’ Seymour’s student number is 123456, so the 13 becomes 13 + 0.5(5) = 13 + 2.5 = 15.5 and he labels it Version 5.) Each column is to be regarded as an independent random sample.

Years of Work Experience
  1     2     3
 16    19    24
 21    20    21
 18    21    22
 13    20    25

a) State your null hypothesis and test it by doing a 1-way ANOVA on these data and explain whether the test tells us that experience matters or not. (4)
b) Using your results from a) present two different confidence intervals for the difference between earnings for those with 1 and 3 years experience. Explain (i) under what circumstances you would use each interval and (ii) whether the intervals show a significant difference. (2)
c) What other method could we use on these data to see if years of experience make a difference? Under what circumstances would we use it? Try it and tell what it tests and what it shows. (3) [24]
d) (Extra Credit) Do a Levene test on these data and explain what it tests and shows. (4)
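The sums of squares for the one-way ANOVA can be checked with a short sketch like the one below. It treats each column as an independent sample, takes the personalization digit a as a parameter, and is run here for a = 0 (the table as printed), which is not necessarily anyone's actual version. The function name `one_way_anova` is ours.

```python
def one_way_anova(a):
    """Return (SS between, SS within, F) for the earnings data with digit a."""
    columns = [
        [16, 21, 18, 13 + 0.5 * a],   # 1 year of experience (personalized cell)
        [19, 20, 21, 20],             # 2 years
        [24, 21, 22, 25],             # 3 years
    ]
    n_total = sum(len(col) for col in columns)
    grand_mean = sum(sum(col) for col in columns) / n_total
    means = [sum(col) / len(col) for col in columns]

    # Between-group and within-group sums of squares.
    ss_between = sum(len(col) * (m - grand_mean) ** 2
                     for col, m in zip(columns, means))
    ss_within = sum((v - m) ** 2
                    for col, m in zip(columns, means) for v in col)

    df_between = len(columns) - 1
    df_within = n_total - len(columns)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_ratio

ss_between, ss_within, f_ratio = one_way_anova(0)
print(f"SSB = {ss_between}, SSW = {ss_within}, F(2, 9) = {f_ratio:.4f}")
```

The F value is then compared with the tabled value for 2 and 9 degrees of freedom at the 5% level to decide whether experience matters.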

3) (Abronovic) A group of 4 workers produces defective pieces at the rates shown below during different times of the day. Personalize the data by subtracting the last digit of your student number from the 14 in the lower right corner. Use the number subtracted to label this as a version number. (Example: Good ol’ Seymour’s student number is 123456, so the 14 becomes 14 - 6 = 8 and he labels it Version 6.)

                        Worker’s Name
Time               Apple   Plum   Pear   Melon
Early Morning         10     11      8      12
Late Morning           9      8      7      10
Early Afternoon       12     13     11      11
Late Afternoon        13     14     10      14

Sum of Row 1 = 41, SSQ of Row 1 = 429,    Sum of Column 1 = 44, SSQ of Column 1 = 494,
Sum of Row 2 = 34, SSQ of Row 2 = 294,    Sum of Column 2 = 46, SSQ of Column 2 = 550,
Sum of Row 3 = 47, SSQ of Row 3 = 555,    Sum of Column 3 = 36, SSQ of Column 3 = 334.

a) Do a 2-way ANOVA on these data and explain what hypotheses you test and what the conclusions are. (6)
b) Using your results from a) present two different confidence intervals for the difference between numbers of defects for the best and worst worker and for the defects from the best and second best times. Explain which of the intervals show a significant difference and why. (3)
c) What other method could we use on these data to see if time of day makes a difference while allowing for cross-classification? Under what circumstances would we use it? Try it and tell what it tests and what it shows. (3) [36]
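With one observation per cell, the two-way ANOVA splits the total sum of squares into a row (time) part, a column (worker) part and an error part. The sketch below shows that arithmetic; the function name `two_way_anova` is ours, the last digit of the student number is a parameter, and the call uses digit 0 (the table as printed) only as an example.

```python
def two_way_anova(last_digit):
    """Two-way ANOVA without replication: rows = times, columns = workers."""
    table = [
        [10, 11, 8, 12],                 # Early Morning
        [9, 8, 7, 10],                   # Late Morning
        [12, 13, 11, 11],                # Early Afternoon
        [13, 14, 10, 14 - last_digit],   # Late Afternoon (personalized cell)
    ]
    r, c = len(table), len(table[0])
    n = r * c
    total = sum(sum(row) for row in table)
    correction = total ** 2 / n          # n times the squared grand mean

    ss_total = sum(v * v for row in table for v in row) - correction
    ss_rows = sum(sum(row) ** 2 for row in table) / c - correction
    ss_cols = sum(sum(col) ** 2 for col in zip(*table)) / r - correction
    ss_error = ss_total - ss_rows - ss_cols

    ms_error = ss_error / ((r - 1) * (c - 1))
    f_rows = (ss_rows / (r - 1)) / ms_error
    f_cols = (ss_cols / (c - 1)) / ms_error
    return {"SSR": ss_rows, "SSC": ss_cols, "SSE": ss_error,
            "F_rows": f_rows, "F_cols": f_cols}

result = two_way_anova(0)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Both F ratios have 3 and 9 degrees of freedom here, so a single tabled value serves for the test on times and the test on workers.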
