STAT 211 EXAM 4- FORM A FALL 2004

The heights (in inches) and weights (in lb) of randomly selected supermodels were obtained. Because weight is treated as a function of height, weight was regressed on height, which gave the following Minitab output.

Simple Linear Regression Model: $\text{weight}_i = \beta_0 + \beta_1\,\text{height}_i + e_i$, $i = 1, \dots, 9$.

Predictor     Coef  SE Coef      T      P
Constant   -151.70    78.50  -1.93  0.095
height       3.883    1.118   3.47  0.010

S = 4.879 R-Sq = 63.3% R-Sq(adj) = 58.1%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       1  287.37  287.37  12.07  0.010
Residual Error   7  166.63   23.80
Total            8  454.00

Obs  height  weight     Fit  SE Fit  Residual  St Resid
  1    71.0  125.00  124.02    1.84      0.98      0.22
  2    70.5  119.00  122.08    1.66     -3.08     -0.67
  3    71.0  128.00  124.02    1.84      3.98      0.88
  4    72.0  128.00  127.90    2.57      0.10      0.02
  5    70.0  119.00  120.14    1.65     -1.14     -0.25
  6    70.0  127.00  120.14    1.65      6.86      1.49
  7    66.5  105.00  106.55    4.47     -1.55     -0.79 X
  8    70.0  123.00  120.14    1.65      2.86      0.62
  9    71.0  115.00  124.02    1.84     -9.02     -2.00

X denotes an observation whose X value gives it large influence.

Assume all assumptions for the simple linear regression are satisfied unless an assumption is specifically asked about. Answer questions 1 to 9 using the above information.

1. Which of the following is Pearson's correlation between height and weight for the complete data? (a) -0.796 (b) -0.633 (c) 0.401 (d) 0.633 (e) 0.796

$r = \sqrt{R^2} = \sqrt{0.633} \approx 0.796$, taken as positive because the estimated slope is positive.

2. Assume there is a correlation between height and weight of randomly selected supermodels. Which of the following test statistics would you use to test whether there is a correlation between height and weight of all adult women? (a) -1.93 (b) 3.47 (c) 12.07

$t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \dfrac{\sqrt{0.633(7)}}{\sqrt{1-0.633}} \approx 3.47$

3. Should we use Spearman's or Pearson's correlation to show the relationship between the height and the weight? (a) Spearman's because there is an influential data point. (b) Pearson's because there is an influential data point.
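The answer notes for questions 1 and 2 can be checked numerically. The following is a minimal Python sketch that assumes only the values printed in the Minitab output; the variable names are illustrative.

```python
import math

# Values read from the Minitab output above.
r_squared = 0.633   # R-Sq = 63.3%
slope = 3.883       # estimated coefficient of height
n = 9               # number of observations

# Pearson's r has the same sign as the estimated slope.
r = math.copysign(math.sqrt(r_squared), slope)

# t statistic for testing zero correlation, with n - 2 degrees of freedom.
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(round(r, 3), round(t, 2))
```

In simple linear regression this t agrees with the T value reported for the slope, and its square agrees with the F statistic in the ANOVA table, up to rounding.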

4. I deleted the 7th observation and reran the regression. The $R^2$ for this new model was 8.5%. Which of the following should be done? (a) Use the first model and predict the future values. (b) Use the new model and predict the future values. (c) The $R^2$ for the first model is not very high even though there is a linear relationship between those two random variables at the 0.05 significance level. We should search for better models and, if we cannot find any other, use this first model to predict the future values.
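The effect of the 7th observation can be seen by fitting the line with and without it. This is a sketch using the nine (height, weight) pairs from the output; `fit_line` is a hypothetical helper written for this illustration, not part of Minitab.

```python
heights = [71.0, 70.5, 71.0, 72.0, 70.0, 70.0, 66.5, 70.0, 71.0]
weights = [125, 119, 128, 128, 119, 127, 105, 123, 115]

def fit_line(x, y):
    """Return (intercept, slope, r_squared) of the least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope, sxy ** 2 / (sxx * syy)

# Fit with all nine observations.
full = fit_line(heights, weights)

# Drop observation 7 (index 6), the point Minitab flags with X.
reduced = fit_line(heights[:6] + heights[7:], weights[:6] + weights[7:])

print("all 9 observations:", [round(v, 3) for v in full])
print("without obs 7:     ", [round(v, 3) for v in reduced])
```

The drop in R-squared between the two fits shows how much of the apparent linear relationship rests on that single point.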

5. Which of the following is the point estimate for the constant variance in this analysis? (a) 4.879 (b) 23.80 (c) 166.63 (d) 287.37

$S^2$ or MSE
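As a quick check, MSE and S can be recomputed from the ANOVA table. The sketch below assumes only the printed Residual Error sum of squares and the sample size.

```python
import math

sse = 166.63        # Residual Error SS from the ANOVA table
n = 9               # number of observations
df_error = n - 2    # two regression parameters are estimated

mse = sse / df_error    # point estimate of the constant variance
s = math.sqrt(mse)      # the S reported by Minitab

print(round(mse, 2), round(s, 3))
```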

6. I claim that the average change in the weight is 4.55 lb when the supermodel is 1 inch taller. To see if the data support this, the 95% confidence interval for the slope is computed as (1.2389, 6.5271). Do the data support my claim at the 0.05 significance level? (a) We do not have enough information to answer this question (b) Yes, 4.55 falls in the confidence interval (c) No
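The interval quoted in question 6 can be rebuilt from the slope row of the output. In this sketch the critical value 2.365 is an assumption taken from a t table (upper 0.025 point, 7 degrees of freedom); it does not appear in the Minitab output.

```python
slope = 3.883      # Coef for height
se_slope = 1.118   # SE Coef for height
t_crit = 2.365     # assumed t-table value for 95% confidence, 7 df
claimed = 4.55     # claimed change in weight per inch

lower = slope - t_crit * se_slope
upper = slope + t_crit * se_slope

# The claim is consistent with the data if it lies inside the interval.
supported = lower <= claimed <= upper

print(round(lower, 4), round(upper, 4), supported)
```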

7. If the height of the randomly selected supermodel is 70.5 inches, which of the following is the corresponding residual in this simple linear regression model? (a) -3.08 (b) -0.67 (c) 1.66 (d) need more information to answer this question

Either read the residual from the output, or compute it: for height = 70.5 the estimated weight is -151.70 + 3.883(70.5) = 122.0515, and the observed weight is 119. The residual = observed weight - estimated weight = 119 - 122.0515 = -3.0515.

8. Which of the following is the estimated average weight when the supermodel is 70 inches tall? (a) 119.00 (b) 120.14 (c) 121.06 (d) 123.00 (e) 127.00

Estimated average weight = -151.70 + 3.883(70) = 120.11
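The hand computations for questions 7 and 8 use the same fitted line. A small sketch, assuming the rounded coefficients printed in the output (`fitted` is an illustrative helper name):

```python
b0 = -151.70   # Constant
b1 = 3.883     # Coef for height

def fitted(height):
    """Estimated mean weight (lb) at the given height (inches)."""
    return b0 + b1 * height

# Question 7: observation 2 has height 70.5 and observed weight 119.
residual = 119 - fitted(70.5)

# Question 8: estimated average weight at 70 inches.
mean_at_70 = fitted(70)

print(round(residual, 4), round(mean_at_70, 2))
```

These values differ slightly from the Fit and Residual columns (122.08, -3.08, 120.14) because the printed coefficients are rounded.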

9. Which of the following is the estimated slope in the simple regression model? (a) -151.700 (b) -4.981 (c) 1.118 (d) 3.883 (e) 3.47

In a study of 1700 teens aged 15-19, half were given written surveys and half were given surveys using an anonymous computer program. Among those given the written surveys, 69 said that they carried a gun within the last 30 days. Among those given the computer surveys, 108 said that they carried a gun within the last 30 days. Let X be the number of teens who carried a gun within the last 30 days.

Sample                 X    n  Sample p
1: written surveys    69  850  0.081176
2: computer survey   108  850  0.127059

Estimate for p(1) - p(2): -0.0458824
95% CI for p(1) - p(2): (-0.0748366, -0.0169281)
Test for p(1) - p(2) = 0 (vs not = 0): Z = -3.11 P-Value = 0.002

Answer questions 10 to 13 using this information.

10. The sample percentages of the two groups of teens are obviously different. Why can't we decide whether the true percentages of teens who carried a gun within the last 30 days differ simply by comparing the sample percentages? (a) You are right. That is what we should do. (b) This is just a sample from the population, and we need to test the difference using the sample information because simply comparing those two numbers lacks precision and reliability in estimation.

11. Which of the following is the test statistic for comparing the true percentage of teens given written surveys with the true percentage of teens given computer surveys? (a) -3.11 (b) -0.05 (c) 0.02

12. If you are testing whether the true percentage for written surveys is lower than that for computer surveys, which of the following is the P-value? (a) 0.002 (b) 0.001 (c) 0.01 (d) 0.002 (e) There is no way to answer without using the standard normal table

Lower-tailed test: P-value = P(z < -3.11) = P(z > 3.11) = 0.002/2 = 0.001
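The halving argument can also be checked without a normal table. This sketch builds the standard normal CDF from `math.erfc` in the Python standard library and uses the Z value printed in the output; `normal_cdf` is an illustrative helper name.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

z = -3.11   # Z from the Minitab output

p_lower = normal_cdf(z)                  # lower-tailed P-value
p_two_sided = 2 * normal_cdf(-abs(z))    # two-sided P-value

print(round(p_lower, 3), round(p_two_sided, 3))
```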

13. Which of the following is the point estimate for the difference between the true percentages of teens given written surveys and the true percentages of teens given computer surveys? (a) -0.0748 (b) -0.0459 (c) -0.0169 (d) 0.02
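The estimate, interval, and Z statistic in the output can be reproduced from the two counts. The sketch below assumes an unpooled standard error, which is an assumption on my part that happens to match the printed values; a pooled standard error would give a slightly different Z.

```python
import math

x1, n1 = 69, 850     # written surveys
x2, n2 = 108, 850    # computer surveys

p1 = x1 / n1
p2 = x2 / n2
diff = p1 - p2       # point estimate of p(1) - p(2)

# Unpooled standard error of the difference (assumed, see lead-in).
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

z = diff / se
z_crit = 1.96        # standard normal value for 95% confidence
ci = (diff - z_crit * se, diff + z_crit * se)

print(round(diff, 4), round(z, 2), [round(v, 4) for v in ci])
```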

The head injury data for four different types of car (subcompact, compact, midsize, full-size) are given below.

Subcompact  681  428  917  898  420
Compact     643  655  442  514  525
Midsize     469  727  525  454  259
Full-size   384  656  602  687  360

[Figure: Normal Probability Plot of the Residuals (response is injury). Vertical axis: Normal Score; horizontal axis: Residual.]

Analysis of Variance for injury

Source    DF      SS     MS     F      P
Category   3   88425  29475  0.99  0.422
Error     16  475323
Total         563748

Level       n   Mean  StDev
compact     5  555.8   91.0
full-size   5  537.8  154.6
midsize     5  486.8  167.7
subcompact  5  668.8  242.0

Pooled StDev = 172.4

Tukey's pairwise comparisons

Family error rate = 0.0500
Individual error rate = 0.0113

Critical value = 4.05

Intervals for (column level mean) - (row level mean)

              compact  full-size  midsize
full-size        -294
                  330
midsize          -243       -261
                  381        363
subcompact       -425       -443     -494
                  199        181      130

Bartlett's Test (normal distribution)
Test Statistic: 3.161
P-Value       : 0.367

Assume all assumptions for the single factor ANOVA model are satisfied unless an assumption is specifically asked about. Answer questions 14 to 20 using this information.

14. Which of the following is the MSE in this single factor ANOVA model? (a) 29475 (b) 29708 (c) 475323 (d) 563748

15. Which of the following is the total degrees of freedom in this single factor ANOVA model? (a) 3 (b) 16 (c) 17 (d) 19 (e) 20
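The quantities asked for in questions 14 and 15 follow from the usual single factor ANOVA decomposition. The sketch below recomputes the table from the raw injury data listed earlier; the variable names are illustrative.

```python
injury = {
    "subcompact": [681, 428, 917, 898, 420],
    "compact":    [643, 655, 442, 514, 525],
    "midsize":    [469, 727, 525, 454, 259],
    "full-size":  [384, 656, 602, 687, 360],
}

all_values = [v for group in injury.values() for v in group]
n_total = len(all_values)
k = len(injury)
grand_mean = sum(all_values) / n_total

# Between-group (Category) and within-group (Error) sums of squares.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                 for g in injury.values())
ss_error = sum((v - sum(g) / len(g)) ** 2
               for g in injury.values() for v in g)

df_between = k - 1
df_error = n_total - k
df_total = n_total - 1

ms_between = ss_between / df_between
mse = ss_error / df_error
f_stat = ms_between / mse

print(df_between, df_error, df_total)
print(round(ss_between), round(ss_error), round(mse), round(f_stat, 2))
```

MSE is the Error sum of squares divided by its degrees of freedom, and the total degrees of freedom are the number of observations minus one.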

16. Do we have any information to support the errors being normally distributed in this single factor ANOVA model? (a) No. We need the boxplot of the residuals (b) Yes. We have the ANOVA table (c) Yes. We have Bartlett's test (d) Yes. We have the Tukey-Kramer simultaneous confidence intervals (e) Yes. We have the normal probability plot of residuals

17. Use a 0.05 significance level and test the null hypothesis that different categories of cars have the same true mean. Which of the following decisions and conclusions do you reach? (a) Fail to reject H0 and conclude that there are no differences between the true means of different categories of cars.

(b) Reject H0 and conclude that there are no differences between the true means of different categories of cars.

(c) Fail to reject H0 and conclude that there are differences between the true means of different categories of cars.

(d) Reject H0 and conclude that there are differences between the true means of different categories of cars.

18. Do the data suggest that larger cars are safer? (Hint: In answering this question, consider precision and reliability instead of just comparing the numbers.) (a) No (b) Yes
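The hint in question 18 points at the Tukey intervals. The following sketch rebuilds them from the level means, assuming equal group sizes and taking the critical value 4.05 directly from the output rather than computing it.

```python
import math
from itertools import combinations

means = {"compact": 555.8, "full-size": 537.8,
         "midsize": 486.8, "subcompact": 668.8}
n_per_group = 5
mse = 475323 / 16   # Error SS divided by Error DF
q_crit = 4.05       # "Critical value" printed in the Tukey output

# With equal group sizes, every interval has the same half-width.
half_width = q_crit * math.sqrt(mse / n_per_group)

intervals = {}
for col, row in combinations(means, 2):
    d = means[col] - means[row]   # (column level mean) - (row level mean)
    intervals[(col, row)] = (d - half_width, d + half_width)

# An interval that contains 0 shows no significant pairwise difference.
all_contain_zero = all(lo < 0 < hi for lo, hi in intervals.values())

for pair, (lo, hi) in intervals.items():
    print(pair, round(lo), round(hi))
print(all_contain_zero)
```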

19. Is the constant variance assumption for this single factor ANOVA model satisfied? (a) No (b) Yes (c) do not have enough information to answer this question

20. Assuming the constant variance assumption for the analysis of variance is satisfied, which of the following is the estimated constant standard deviation? (a) 172.4 (b) 29708 (c) 475323 (d) do not have enough information to answer this question
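The pooled standard deviation in the output can be recovered in two ways. This sketch assumes equal group sizes, so the pooled variance is the plain average of the four sample variances.

```python
import math

stdevs = [91.0, 154.6, 167.7, 242.0]   # StDev column, one value per level

# Equal group sizes: pooled variance is the average of the sample variances.
pooled_var = sum(s ** 2 for s in stdevs) / len(stdevs)
pooled_sd = math.sqrt(pooled_var)

# Equivalent route through the ANOVA table: square root of Error SS / Error DF.
pooled_sd_from_anova = math.sqrt(475323 / 16)

print(round(pooled_sd, 1), round(pooled_sd_from_anova, 1))
```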

Refer to the data in the following table, which lists SAT scores before and after a sample of 10 students took a preparatory course.

Student    A    B    C    D    E    F    G     H    I     J
Before   700  840  830  860  840  690  830  1180  930  1070
After    720  840  820  900  870  700  800  1200  950  1080

             n    Mean  StDev  SE Mean
before      10     877    151       48
after       10     888    155       49
Difference  10  -11.00  20.25     6.40

Difference = mu before - mu after

Two-sample T for before vs after
Estimate for difference: -11.0
95% CI for difference: (-155.5, 133.5)
T-Test of difference = 0 (vs not =): T-Value = -0.16 P-Value = 0.874 DF = 17

Two-sample T for before vs after
Estimate for difference: -11.0
95% CI for difference: (-154.9, 132.9)
T-Test of difference = 0 (vs not =): T-Value = -0.16 P-Value = 0.874 DF = 18
Both use Pooled StDev = 153

Paired T for before - after
95% CI for mean difference: (-25.48, 3.48)
T-Test of mean difference = 0 (vs not = 0): T-Value = -1.72 P-Value = 0.120

Assume all assumptions for the above tests are satisfied unless an assumption is specifically asked about. Answer questions 21 to 25 using this information.

21. Is there sufficient evidence to conclude that the preparatory course is effective in raising scores? (a) Yes (b) No

Because H0 is not rejected at the 0.05 significance level, the data do not show a difference between the true averages before and after the preparatory course.

22. Which of the following is the point estimate for $\mu_{\text{before}} - \mu_{\text{after}}$? (Hint: $\mu_i$ is the $i$th true mean) (a) -11 (b) 11 (c) 877 (d) 888 (e) 1765
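The paired output can be reproduced from the ten score pairs. In this sketch the critical value 2.262 is an assumption taken from a t table (upper 0.025 point, 9 degrees of freedom); everything else comes from the data table.

```python
import math
import statistics

before = [700, 840, 830, 860, 840, 690, 830, 1180, 930, 1070]
after = [720, 840, 820, 900, 870, 700, 800, 1200, 950, 1080]

# Paired analysis works on the per-student differences (before - after).
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)

mean_diff = statistics.mean(diffs)    # point estimate of the mean difference
sd_diff = statistics.stdev(diffs)     # sample standard deviation
se = sd_diff / math.sqrt(n)           # standard error of the mean difference

t_stat = mean_diff / se
t_crit = 2.262                        # assumed t-table value, 9 df
ci = (mean_diff - t_crit * se, mean_diff + t_crit * se)

print(mean_diff, round(sd_diff, 2), round(se, 2), round(t_stat, 2))
print([round(v, 2) for v in ci])
```

The interval contains 0, which is consistent with failing to reject H0 at the 0.05 level.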

23. Which of the following is the estimated variance of $\bar{x}_{\text{before}} - \bar{x}_{\text{after}}$? (Hint: $\bar{x}_i$ is the $i$th sample mean) (a) 6.40 (b) 20.25 (c) 40.96 (d) 410.06 (e) 4682.60

$\mathrm{Var}(\bar{x}_{\text{before}} - \bar{x}_{\text{after}}) = \dfrac{\sigma^2_{\text{before}}}{m} + \dfrac{\sigma^2_{\text{after}}}{n}$, so the estimate is $\dfrac{s^2_{\text{before}}}{10} + \dfrac{s^2_{\text{after}}}{10}$.

24. Which of the following is the test statistic for comparing the variances of the SAT scores before and after the preparatory course? (a) -1.72 (b) -0.16 (c) 0.97 (d) 0.95 (e) 20.25

$F = \dfrac{s^2_{\text{before}}}{s^2_{\text{after}}}$ when testing $H_0: \sigma^2_{\text{before}} = \sigma^2_{\text{after}}$.

25. Which of the following will be the degrees of freedom for testing the difference between the true means of SAT scores before and after the preparatory course? (Hint: Use the fact of the samples being independent or dependent to answer the question) (a) 9 (b) 10 (c) 17 (d) 18 (e) 20

With dependent samples (the same 10 students took the SAT before and after), the degrees of freedom are n - 1 = 10 - 1 = 9.
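The formulas in the answer notes for questions 23 and 24 can be evaluated from the summary table. This sketch plugs in the printed standard deviations exactly as those notes are written, so the results carry the rounding of the table.

```python
s_before = 151   # StDev of the before scores
s_after = 155    # StDev of the after scores
m = n = 10       # sample sizes

# Question 23 note: s_before^2 / m + s_after^2 / n
var_diff_means = s_before ** 2 / m + s_after ** 2 / n

# Question 24 note: F = s_before^2 / s_after^2
f_ratio = s_before ** 2 / s_after ** 2

print(round(var_diff_means, 2), round(f_ratio, 2))
```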