Coin-Tossing Lab

SOLUTIONS TO FINAL EXAM VERSION 1

1) A) It appears that there is a tendency for violations to decrease as turnover increases, as it says in the article.

B) Since the estimated regression coefficient for turnover is negative, as predicted, we may halve the two-tailed p-value: the left-tailed p-value for the null hypothesis of no relationship is .088/2 = .044, which is less than .05. So the answer to the question is yes.

C) The relationship is fairly weak, with R-squared=16.1%, indicating that turnover explains only 16% of the variation in violations.

D) Careful examination of the scatterplot shows that the apparent relationship between X and Y is mostly due to the two outliers in the lower right. They perhaps drag the fitted line towards them, so in the residual plot they show up as one positive and one negative residual. But the residuals for these points are not large, perhaps because of the masking effect. Beyond this, the residual plot shows no obvious problem.

2)

A) The CI is $-0.03035 \pm t_{\alpha/2}(0.01680)$. Using df = 17 and $\alpha = .05$, we get $t_{\alpha/2} = 2.110$. The CI is therefore $(-0.065798,\ 0.005098)$. An interval constructed this way would contain the true slope (the coefficient of turnover in the true regression line) in 95% of all random samples of US airports that could be collected.
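As a quick arithmetic check on part A, here is a minimal Python sketch of the "estimate plus or minus critical value times standard error" interval. The function name `slope_ci` is hypothetical, and the critical value 2.110 is taken from the t table as in the solution rather than computed.

```python
# Hypothetical helper, used only to check the arithmetic in question 2A.
def slope_ci(estimate: float, std_error: float, t_crit: float) -> tuple[float, float]:
    """Return (lower, upper) for estimate +/- t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin


# Coefficient and standard error of turnover, with t_{alpha/2} = 2.110 for df = 17.
lower, upper = slope_ci(-0.03035, 0.01680, 2.110)
print(f"({lower:.6f}, {upper:.6f})")  # (-0.065798, 0.005098)
```

Because the interval contains 0, it is consistent with the two-tailed p-value of .088 in question 1 being above .05.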

B) We cannot say what would happen to the detected violation rate, since it is a random variable. But we estimate that its expected value would go up by $(-100)(-0.03035) = 3.035$, that is, by 3.035 violations per million passengers.

C) No, since as we move away from the mean value of X, both the CI and PI become wider.

3) A) The boxplots indicate that the detected violation rate is far higher at coastal airports. The p-value for a left-tailed test (that $\mu_{\text{non-coastal}} - \mu_{\text{coastal}}$ is negative) is .001/2 = .0005, so the mean for coastal airports is significantly higher than that at non-coastal airports, at the 5% level of significance. (Note that .0005 is less than .05.)

B) The answer is the p-value above, .0005, that is, 0.05%.

C) The boxplots do not show a clear difference in level. The two-tailed t-test gives a p-value of .094, which is not less than .05, so we do not have evidence of a difference in turnover rates at the 5% level of significance.

D) If coastal airports have a higher rate of actual violations, then they will presumably also have a higher rate of detected violations. This would explain why the rate of detected violations is higher for coastal airports. This phenomenon may be completely unrelated to turnover rate, and indeed the mean turnover rate is not significantly different for coastal and non-coastal airports.

4)

A) Since the p-value for the coefficient of turnover is very high, there is no significant relationship between turnover and violations in the multiple regression model.

B) Coast seems to be much more important, with a (two-tailed) p-value of .001.

C) The two groups correspond to the non-coastal and coastal airports. Since the coefficient for Coast is so large, the coastal airports receive a much higher predicted violation rate (fitted value).

D) The residual DF is 16 in the multiple regression, and $\alpha/2 = .025$. The absolute value of the t-statistic would need to exceed $t_{\alpha/2} = 2.120$.

E) The F-statistic is 11.06, with a corresponding p-value of .001. Thus, there is strong evidence that at least one of the true coefficients in the model is nonzero, so the model seems to be useful. This does not imply that all coefficients are useful (nonzero), so it does not say that turnover is useful.

5)

A) The p-value for turnover is large (and the R-squared is low). There is no significant relationship between turnover and violations here.

B) Individually, the points were not influential, as evidenced by their small Cook’s D. Considered as a pair, they were influential, at least in the sense that when they were omitted the p-value for the coefficient of turnover went way up (although the estimated slope actually didn’t change much). So it seems this pair of points has more influence on the standard error than on the fit.

C) Overall, non-coastal airports tend to have a lower violation rate. It is not too surprising that the two non-coastal airports with the highest turnover rate have a low rate of detected violations, even if there is no real relationship between turnover rate and detected violations.

D) No, turnover only seemed important in the original simple regression because of the two outliers. From part C) above, we see that we can even explain these outliers without requiring a relationship between turnover and detected violations. Location of the airport (coast) seems to be a much more important predictor of detected violations than the turnover rate.

6) From the first row of the Minitab two-sample T output for violations, the sample size is n = 10, so DF = n - 1 = 9, and the sample standard deviation is s = 3.66. The CI for the mean is $\bar{x} \pm t_{\alpha/2}\, s/\sqrt{n}$, so its width is $2t_{\alpha/2}\, s/\sqrt{n} = 2(2.262)(3.66)/\sqrt{10} = 5.236$. Answer is C.

7) The t-statistic is $(\bar{x} - 10)/(s/\sqrt{n}) = (11.83 - 10)/(3.66/\sqrt{10}) = 1.581$. Since this does not exceed $t_{\alpha} = 1.833$ (DF = 9), we do not reject the null hypothesis. Answer is B.
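The arithmetic in questions 6 and 7 can be checked with a short Python sketch. The function names are hypothetical, and the critical values 2.262 and 1.833 are taken from the t table as in the solutions rather than computed here.

```python
import math


# Hypothetical helpers for checking questions 6 and 7.
def ci_width(s: float, n: int, t_crit: float) -> float:
    """Width of the interval xbar +/- t_crit * s / sqrt(n), i.e. 2 * t_crit * s / sqrt(n)."""
    return 2 * t_crit * s / math.sqrt(n)


def one_sample_t(xbar: float, mu0: float, s: float, n: int) -> float:
    """One-sample t-statistic (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / math.sqrt(n))


width = ci_width(s=3.66, n=10, t_crit=2.262)             # question 6
t_stat = one_sample_t(xbar=11.83, mu0=10, s=3.66, n=10)  # question 7
reject = t_stat > 1.833  # one-tailed comparison with t_alpha = 1.833 (DF = 9), as in the solution

print(round(width, 3), round(t_stat, 3), reject)  # 5.236 1.581 False
```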

8) The critical value is 1.833, as seen above. Answer is D.
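The t-table values used in these solutions (1.833 and 2.262 for DF = 9, 2.110 for DF = 17, 2.120 for DF = 16) can be reproduced without a table. The sketch below is a stand-in for a statistics library routine such as SciPy's `scipy.stats.t.ppf`: it assumes Simpson's rule is accurate enough to integrate the Student t density, and inverts the resulting CDF by bisection. All function names are hypothetical.

```python
import math


# Hypothetical helpers: numerical Student t CDF and upper-tail critical value.
def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x), integrating the density over [0, |x|] with Simpson's rule."""
    b = abs(x)
    if b == 0.0:
        return 0.5
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |x|); the density is symmetric about 0
    return 0.5 + area if x > 0 else 0.5 - area


def t_critical(upper_tail: float, df: int) -> float:
    """Return t with P(T > t) = upper_tail (0 < upper_tail < 0.5), by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > upper_tail:
            lo = mid  # too much probability beyond mid: move right
        else:
            hi = mid
    return (lo + hi) / 2


for tail, df in [(0.05, 9), (0.025, 9), (0.025, 17), (0.025, 16)]:
    print(f"tail={tail}, df={df}: {t_critical(tail, df):.3f}")
# Prints 1.833, 2.262, 2.110 and 2.120 respectively.
```

A one-tailed test at level $\alpha$ uses `t_critical(alpha, df)`; a two-tailed test uses `t_critical(alpha / 2, df)`.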

9) Answer is B (False), since we cannot simply divide the two-tailed p-value by 2 if the estimated coefficient does not come out in the predicted direction. For example, if we are doing a left-tailed test and the estimated regression coefficient is positive, then we cannot just divide the two-tailed p-value by 2.
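The rule in question 9 (also used in 1B and 3A) can be made explicit. A minimal Python sketch, assuming a symmetric test statistic such as t; the function name and the "left"/"right" convention are hypothetical. When the estimate has the wrong sign, the one-tailed p-value is 1 minus half the two-tailed p-value, so it is at least .5.

```python
# Hypothetical helper: one-tailed p-value from a two-tailed one (symmetric statistic).
def one_tailed_p(two_tailed_p: float, estimate: float, direction: str) -> float:
    """direction is "left" (alternative: true value < 0) or "right" (true value > 0)."""
    if direction not in ("left", "right"):
        raise ValueError("direction must be 'left' or 'right'")
    in_predicted_direction = estimate < 0 if direction == "left" else estimate > 0
    if in_predicted_direction:
        return two_tailed_p / 2
    # Estimate points the wrong way: halving would understate the p-value.
    return 1 - two_tailed_p / 2


print(one_tailed_p(0.088, -0.03035, "left"))  # 0.044, as in question 1B
print(one_tailed_p(0.088, 0.03035, "left"))   # about 0.956 if the sign had come out positive
```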

10) We use the confidence interval for a binomial proportion, $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$. We have $\hat{p} = 12/20 = .6$, so $\hat{q} = .4$. The CI becomes $.6 \pm 1.96\sqrt{(.6)(.4)/20} = (.3853,\ .8147)$. Answer is B.
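The interval in question 10 can be reproduced with a few lines of Python. This is a sketch of the large-sample (normal approximation) interval only; the function name `proportion_ci` is hypothetical.

```python
import math


# Hypothetical helper for the large-sample binomial proportion interval.
def proportion_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Interval p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    margin = z * math.sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin


lower, upper = proportion_ci(12, 20)
print(f"({lower:.4f}, {upper:.4f})")  # (0.3853, 0.8147)
print(lower < 0.5 < upper)            # True: .5 lies inside, as used in question 11
```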

11) Since the CI above contains .5, and we are doing a 2-tailed test here, we cannot reject the null hypothesis that the probability of the person making a correct prediction is 1/2. Answer is B.

12) Suppose that $\mu$ and $\sigma$ are the unknown mean and standard deviation of X. Since the 95th percentile of a standard normal is 1.645 and the 97.5th percentile of a standard normal is 1.96, converting to z-scores yields the two equations $(6.29-\mu)/\sigma = 1.645$ and $(6.92-\mu)/\sigma = 1.96$. Subtracting the first equation from the second yields $(6.92-6.29)/\sigma = 1.96-1.645$, in other words, $.63/\sigma = .315$. Solving for $\sigma$ yields $\sigma = .63/.315 = 2$. Answer is D.
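The two-equation solve in question 12 can be sketched in Python, using the table z-values 1.645 and 1.96 as the solution does. The same equations also give $\mu = 6.29 - 1.645(2) = 3$. The helper name is hypothetical; `statistics.NormalDist` is from the Python standard library and is used only as a cross-check.

```python
from statistics import NormalDist


# Hypothetical helper: solve (x1 - mu)/sigma = z1 and (x2 - mu)/sigma = z2.
def solve_mean_sd(x1: float, z1: float, x2: float, z2: float) -> tuple[float, float]:
    sigma = (x2 - x1) / (z2 - z1)  # subtract the first equation from the second
    mu = x1 - z1 * sigma           # back-substitute into the first equation
    return mu, sigma


mu, sigma = solve_mean_sd(6.29, 1.645, 6.92, 1.96)
print(round(mu, 6), round(sigma, 6))  # 3.0 2.0

# Cross-check with exact normal percentiles (the table z-values are rounded).
X = NormalDist(mu, sigma)
print(round(X.inv_cdf(0.95), 2), round(X.inv_cdf(0.975), 2))  # 6.29 6.92
```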

13) This is a right-tailed test. The t-statistic is $(\bar{x}-20)/(s/\sqrt{n}) = (21-20)/(25/\sqrt{35}) = .2366$. We round this to .24. Since the sample size is greater than 30, we can assume that if the null hypothesis were true, the t-statistic would have approximately a standard normal distribution. The p-value is then just the probability that a standard normal exceeds .24, which is .5 - .0948 = .4052. Answer is A.

14) The events {X>4} and {Y>5} are independent, so the probability that they both happen is just the product of their probabilities, (.3)(.2) = .06. Answer is C.
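The question 13 calculation can be reproduced in Python with the standard library's `statistics.NormalDist`, keeping the same large-sample normal approximation as the solution. The helper name is hypothetical. Rounding t to .24 before the lookup (as with a printed z-table) gives .4052; using the unrounded t gives about .4065, and either way the answer is the same.

```python
import math
from statistics import NormalDist


# Hypothetical helper: right-tailed one-sample test with a normal approximation.
def large_sample_right_tailed(xbar: float, mu0: float, s: float, n: int) -> tuple[float, float]:
    """Return (t, p) where p = P(Z > t) for a standard normal Z."""
    t = (xbar - mu0) / (s / math.sqrt(n))
    return t, 1 - NormalDist().cdf(t)


t, p = large_sample_right_tailed(xbar=21, mu0=20, s=25, n=35)
print(round(t, 4))                                  # 0.2366
print(round(p, 4))                                  # 0.4065 (t not rounded)
print(round(1 - NormalDist().cdf(round(t, 2)), 4))  # 0.4052 (t rounded to .24, as in the table lookup)
```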

15) There are 36 possibilities for the outcome of tossing two dice.

The 5 outcomes that yield event A are: (2,6),(6,2),(3,5),(5,3),(4,4).

The 11 outcomes that yield event B are: (1,6),(2,6),(3,6),(4,6),(5,6), (6,6),(6,1),(6,2),(6,3),(6,4),(6,5).

The event AB consists of the 2 outcomes (2,6),(6,2). We can now compute P(A)=5/36, P(B)=11/36, P(AB)=2/36. Since P(AB) is not equal to P(A)P(B), the events are not independent. Therefore they are dependent. Answer is C. Note also that A and B are not mutually exclusive, since AB is not empty. Finally, note that A and B are not complements of each other, since to be complementary events they would need to be mutually exclusive and to comprise the entire 36 possibilities when taken together.
