
STAT500 HW#5_solutions

STAT500 HW # 5 Solutions

1) Chicago Title Company problem

Let π0 = 0.831. Because nπ0 = 2544(0.831) = 2114.1 > 5 and n(1 − π0) = 2544(1 − 0.831) = 429.9 > 5, the one-proportion z-test can be used.

Ho: π = 0.831; Ha: π ≠ 0.831; α = 0.024 (two-sided test).

Sample proportion: π̂ = 2081/2544 = 0.818.

Test statistic: $z = \frac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}} = \frac{0.818 - 0.831}{\sqrt{(0.831)(0.169)/2544}} = -1.749$

P-value = 2·P(Z > |−1.749|) = 2(1 − 0.9599) = 2(0.0401) = 0.0802 > α = 0.024.

We fail to reject Ho (since the p-value exceeds α = 0.024). Based on the observed data, there is not sufficient evidence to conclude that this year's percentage of home buyers purchasing single-family houses differs from the 1995 figure of 83.1%.
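The arithmetic above can be checked with a minimal Python sketch that uses only the standard library. The helper names `normal_cdf` and `one_proportion_z_test` are illustrative (not from the source), and the normal CDF is written through `math.erf`. Because the sketch uses the unrounded sample proportion, its trailing digits may differ slightly from the hand calculation.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def one_proportion_z_test(successes: int, n: int, pi0: float) -> tuple:
    """Return (z, two-sided p-value) for Ho: pi = pi0."""
    pi_hat = successes / n
    se = math.sqrt(pi0 * (1.0 - pi0) / n)  # standard error computed under Ho
    z = (pi_hat - pi0) / se
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p_value


# Chicago Title data: 2081 of 2544 buyers purchased single-family houses.
z, p = one_proportion_z_test(2081, 2544, 0.831)
print(f"z = {z:.3f}, p-value = {p:.4f}")
```

The p-value is then compared with α = 0.024 exactly as in the text.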

2) a. Because n = 18 < 30, we use a normal probability plot to check whether the data are normally distributed.

[Figure: Normal Probability Plot for Volume (cu. ft), ML estimates: Mean 1718.33, StDev 133.884]

As the plot suggests, all points fall inside the confidence bands, so there is no evidence that the data do not come from a normal distribution. We can therefore use a one-sample t-test.

Ho: µ ≤ 1600; Ha: µ > 1600; α = 0.05 (one-sided, right-tailed test).

ȳ = 1718.3, s = 137.8, n = 18

$t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}} = \frac{1718.3 - 1600}{137.8/\sqrt{18}} = 3.64$

P-value = P(T > 3.64) = 1 − P(T ≤ 3.64) = 1 − 0.999 = 0.001.

Cumulative Distribution Function

Student's t distribution with 17 DF

      t   P(T <= t)
 3.6400      0.9990

Since P-value = 0.001 < α = 0.05, we reject Ho at level 0.05. Based on the observed data, there is sufficient evidence to conclude that the average amount of recycled paper is greater than 1600 cubic feet per 2-week period.
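A standard-library sketch of the right-tailed one-sample t-test follows. Python's `math` module has no t-distribution CDF, so the hypothetical helper `t_cdf` integrates the t density numerically with Simpson's rule; this stands in for the Minitab lookup shown above and is not how Minitab itself computes it.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1.0 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): 0.5 plus a Simpson's-rule integral of the density from 0 to x."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3.0


ybar, s, n, mu0 = 1718.3, 137.8, 18, 1600.0
t_stat = (ybar - mu0) / (s / math.sqrt(n))
p_value = 1.0 - t_cdf(t_stat, n - 1)  # right-tailed: P(T > t)
print(f"t = {t_stat:.2f}, df = {n - 1}, p-value = {p_value:.4f}")
```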

b. A 95% confidence interval for µ is given by

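$\bar{y} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} = 1718.3 \pm 2.11 \cdot \frac{137.8}{\sqrt{18}} = 1718.3 \pm 68.5 \approx (1649.8,\ 1786.8)$, where $t_{0.025,\,17} = 2.11$.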
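The interval arithmetic in part b can be reproduced in a few lines. The critical value 2.11 (t with 17 df, upper 0.025 tail) is hard-coded from a t table, as in the text, rather than computed.

```python
import math

ybar, s, n = 1718.3, 137.8, 18
t_crit = 2.11  # t_{0.025, 17} from a t table (value used in the text)

half_width = t_crit * s / math.sqrt(n)  # margin of error
lower, upper = ybar - half_width, ybar + half_width
print(f"95% CI: ({lower:.1f}, {upper:.1f})")
```

Rounding the standard error s/√n (about 32.48) before multiplying shifts the endpoints by a few hundredths, so hand-computed intervals can differ slightly in the last digit.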

3) We have independent samples. Furthermore, the two populations (Type I devices and Type II devices) appear to be normally distributed (see below):

[Figure: Normal Probability Plot for Type I devices...Type II devices, ML estimates with 95% CI; Goodness of Fit AD*: Type I 1.311, Type II 1.259]

Also, the sample standard deviations differ considerably (s1 is about twice s2: s1/s2 = 0.06360/0.03057 = 2.08). Therefore, we use a two-sample t-test with non-pooled (unequal) variances in Minitab.

Two Sample T-Test and Confidence Interval (99% CI)

Two sample T for Type I vs Type II

            N     Mean    StDev   SE Mean
Type I     10   1.2360   0.0636     0.020
Type II    10   0.9970   0.0306    0.0097

99% CI for mu Type I - mu Type II: (0.171, 0.3072)
T-Test mu Type I = mu Type II (vs >): T = 10.71  P = 0.0000  DF = 12

We want to test the following hypotheses (at α = 0.01):

Ho: μI − μII = 0; Ha: μI − μII > 0. From Minitab, the P-value ≈ 0.000 (t = 10.71, df = 12) < α = 0.01, so we reject Ho and conclude that the mean emission level for Type I devices (μI) is greater than that for Type II devices (μII).

4) a. The value of the pooled-variance t-statistic is t = −4.04.

b. The value of the non-pooled-variance t′-statistic is t′ = −3.90.

5) a. The samples are paired, since each farm grows both variety A and variety B. Because the number of pairs (n = 7) is less than 30, we need to check whether the differences are normally distributed.

[Figure: Probability Plot of d, Normal - 95% CI; Mean 4.429, StDev 2.387, N 7, AD 0.839, P-Value 0.015]

The normal probability plot above indicates that the differences between the two varieties may NOT come from a normal population, since the p-value of the normality test (0.015) is less than 0.05.

Paired T-Test and Confidence Interval

Paired T for Variety A - Variety B

              N    Mean   StDev   SE Mean
Variety A     7   48.01    4.59      1.74
Variety B     7   43.59    3.53      1.34
Difference    7   4.429   2.387     0.902

95% CI for mean difference: (2.220, 6.637)
T-Test of mean difference = 0 (vs not = 0): T-Value = 4.91  P-Value = 0.003

The hypotheses are Ho: μd = 0, Ha: μd ≠ 0 (two-tailed test) at α = 0.05. From Minitab, because the P-value = 0.003 < α = 0.05, we reject Ho and conclude that there is sufficient evidence that the mean yields of the two varieties of corn are different. However, since the normality assumption is in question, we may want to consider alternative (e.g., nonparametric) methods that do not involve a test of means.

b. The mean difference in yields between the two varieties is 4.429 bushels, with 95% CI (2.220, 6.637). That is, we estimate that the mean yield of variety A exceeds that of variety B by between 2.220 and 6.637 bushels.
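Because a paired t-procedure is a one-sample t-procedure on the differences, the Minitab results can be recovered from the `Difference` summary line (mean 4.429, StDev 2.387, n = 7). A minimal sketch, assuming the tabled critical value t(0.025, 6) = 2.447:

```python
import math

d_bar, s_d, n = 4.429, 2.387, 7  # summary of differences (Variety A - Variety B)
t_crit = 2.447  # t_{0.025, 6} from a t table (assumed, not from the output)

se = s_d / math.sqrt(n)  # standard error of the mean difference
t_stat = d_bar / se
ci = (d_bar - t_crit * se, d_bar + t_crit * se)
print(f"SE = {se:.3f}, t = {t_stat:.2f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```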

6) The data are paired, since twins are “naturally” paired, so the paired t-test is used.

a. The hypotheses are Ho: a - n = 0, Ha: a – n ≠ 0 (two-tailed test). From the

output, we can reject Ho (t = 4.95; P-value = 0) at  = 0.05, and conclude that the mean final grades are different for academic versus non-academic emphasis. b. The mean difference between final grades of the students in academic and non- academic home environments is (from the output) 3.800. The 95% confidence interval is (2.230, 5.370). The size difference in the mean final grades is estimated to be between 2.230 and 5.370. c. The conditions appear to be satisfied for the use of a paired t-procedure. For example, the normality plot and boxplot for the differences suggest the differences are normally distributed. d. It appears that using twins in this study to control for variation in the final scores was effective as compared to taking a random sample of 30 students in both types of home environments. Justification is provided by rejection of the null hypothesis in the paired-procedure (controlling for variation) and a failure to reject the same hypothesis in an independent two-sample t-procedure.

7)

For program A: ȳA = 51.50, SA = 10.46, nA = 10.

For program B: ȳB = 38.00, SB = 8.67, nB = 16.

a) $S_p = \sqrt{\frac{(n_A - 1)S_A^2 + (n_B - 1)S_B^2}{n_A + n_B - 2}} \approx 9.38$; df = nA + nB − 2 = 24, α = 0.01.

b) The 99% CI for μA − μB is

$(\bar{y}_A - \bar{y}_B) \pm t_{df,\alpha/2}\, S_p \sqrt{\frac{1}{n_A} + \frac{1}{n_B}} = (51.50 - 38.00) \pm 2.797 \cdot 9.38 \sqrt{\frac{1}{10} + \frac{1}{16}} = 13.50 \pm 10.58 = (2.92,\ 24.08)$,

where $t_{df,\alpha/2} = t_{24,\,0.005} = 2.797$.

c) Using the interval from part (b) to test Ho: μA = μB against Ha: μA ≠ μB, we see that zero is not contained in the 99% CI. This implies that we can reject Ho (equal means) at the α = 0.01 significance level.
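The pooled standard deviation and the 99% interval for problem 7 can be checked as follows; the critical value 2.797 is taken from the text rather than computed, and the final check mirrors the interval-based test in part (c).

```python
import math

ya, sa, na = 51.50, 10.46, 10  # program A
yb, sb, nb = 38.00, 8.67, 16   # program B
t_crit = 2.797  # t_{24, 0.005}, as given in the text

df = na + nb - 2
sp = math.sqrt(((na - 1) * sa ** 2 + (nb - 1) * sb ** 2) / df)  # pooled SD
margin = t_crit * sp * math.sqrt(1 / na + 1 / nb)
diff = ya - yb
lower, upper = diff - margin, diff + margin
print(f"Sp = {sp:.2f}, 99% CI = ({lower:.2f}, {upper:.2f}), "
      f"contains 0: {lower <= 0 <= upper}")
```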

8) a. Part i): The null hypothesis is “There is no difference in the amount spent on campaigns between males and females”, which means the difference between female campaign expenditures and male expenditures is equal to 0. The alternative hypothesis is “Female candidates spend less on their campaigns than male candidates”, i.e. the difference between female campaign expenditures and male expenditures is less than 0.

Part ii): Ho: μf μm  0, Hα: μf μm < 0 (one-sided left-tailed test). b. We have independent samples because the samples were randomly selected, not related. Furthermore, According to the normality plot, there is no evidence that the samples are not drawn from a normal distribution (see chart below). Normal Probability Plot for Female...Male ML Estimates - 95% CI

Fem ale 99 M ale

95 Goodness of Fit A D* 90 0.894 80 0.711

t 70

n 60 e

c 50 r

e 40

P 30 20

10 5

1

100 200 300 400 500 Data

In addition, to check for similar variances: sf/sm = 52.0/61.9 ≈ 0.84 > 0.707 (equivalently, the larger sample variance is less than twice the smaller), so sf ≈ sm and a pooled variance can be used. Therefore, we use the pooled t-test procedure (in Minitab).

Two sample T for Female vs Male

          N    Mean   StDev   SE Mean
Female   20   245.3    52.0        12
Male     20   351.0    61.9        14

95% CI for mu Female - mu Male: (-142, -69)
T-Test mu Female = mu Male (vs <): T = -5.85  P = 0.0000  DF = 38
Both use Pooled StDev = 57.2

Mean difference = 245.3 − 351.0 = −105.7 (thousand dollars), i.e., −$105,700, with 95% CI (−$142,000, −$69,000). We are 95% confident that female candidates spend between $69,000 and $142,000 less than male candidates on their campaigns for public office.
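The Minitab pooled-t output for problem 8 can be recovered from the summary statistics (in thousands of dollars). A minimal sketch; the critical value 2.024 for t with 38 df is an approximate table value assumed here, not taken from the source.

```python
import math

n_f, mean_f, sd_f = 20, 245.3, 52.0  # female candidates ($ thousands)
n_m, mean_m, sd_m = 20, 351.0, 61.9  # male candidates ($ thousands)
t_crit = 2.024  # approx. t_{0.025, 38}; assumed table value

df = n_f + n_m - 2
sp = math.sqrt(((n_f - 1) * sd_f ** 2 + (n_m - 1) * sd_m ** 2) / df)  # pooled SD
se = sp * math.sqrt(1 / n_f + 1 / n_m)  # SE of the difference in means
diff = mean_f - mean_m
t_stat = diff / se
ci = (diff - t_crit * se, diff + t_crit * se)
print(f"Sp = {sp:.1f}, t = {t_stat:.2f}, 95% CI = ({ci[0]:.0f}, {ci[1]:.0f})")
```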

c. Yes. At α = 0.05, the P-value of the pooled t-statistic (t = −5.85) is less than 0.05 (P = 0.000 from the output), so we reject the null hypothesis and the difference is statistically significant. We conclude that there is significant evidence that the mean campaign expenditure of female candidates is less than that of male candidates.

d. Yes, this difference is practically significant: the 95% CI indicates that female candidates spend at least about $69,000, and possibly as much as $142,000, less than male candidates.

9) Current Population Report problem

Since the data come from couples, they are paired sample data. The sample size is small, so we need to check the normality assumption for the differences.

[Figure: Normal Probability Plot for diff, ML estimates with 95% CI; Mean 3.3, StDev 4.47325, Goodness of Fit AD* 1.36]

This plot suggests that the differences (husband − wife) are normally distributed. We can therefore proceed with a paired t-test, i.e., a one-sample t-test on the differences d. We want to test Ho: μd = 0, Ha: μd > 0 at α = 0.05, where μd is the mean of (husband's age − wife's age). (Note: this is a one-sided, right-tailed test.)

Paired T for Husbands - Wives

              N    Mean   StDev   SE Mean
Husbands     10   49.40   19.20      6.07
Wives        10   46.10   17.69      5.59
Difference   10    3.30    4.72      1.49

95% CI for mean difference: (-0.07, 6.67)
T-Test of mean difference = 0 (vs > 0): T-Value = 2.21  P-Value = 0.027

From the Minitab output, we reject Ho (t = 2.21; df = 9; P-value = 0.027 < α = 0.05) and conclude that the mean age of married men is higher than the mean age of their wives.
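For problem 9, the paired t statistic and its one-sided p-value can be sketched from the `Difference` summary line. The helper `t_cdf` is hypothetical: it integrates the t density with Simpson's rule in place of a library CDF, so the p-value agrees with Minitab only to about three decimals.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1.0 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): 0.5 plus a Simpson's-rule integral of the density from 0 to x."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3.0


d_bar, s_d, n = 3.30, 4.72, 10  # differences: husband age - wife age
t_stat = d_bar / (s_d / math.sqrt(n))
p_value = 1.0 - t_cdf(t_stat, n - 1)  # one-sided: Ha is mu_d > 0
print(f"t = {t_stat:.2f}, df = {n - 1}, p-value = {p_value:.3f}")
```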

10) a. S1 = 3.60 and S2 = 5.96, where 1 = portfolio 1 and 2 = portfolio 2. Descriptively, portfolio 2 appears to have a higher risk (larger standard deviation) than portfolio 1.

To test this formally, we conduct the following hypothesis test: Ho: σ²1 = σ²2, Ha: σ²1 < σ²2 (one-sided, left-tailed test), using α = 0.05.

Homogeneity of Variance

Response   Returns
Factors    Portfolio
ConfLvl    95.0000

Bonferroni confidence intervals for standard deviations

  Lower     Sigma     Upper    N   Factor Levels
2.35242   3.59629    7.2417   10   Portfolio 1
3.89800   5.95912   11.9996   10   Portfolio 2

F-Test (normal distribution)
Test Statistic: 2.746   P-Value: 0.148

Levene's Test (any continuous distribution)
Test Statistic: 3.512   P-Value: 0.077

Assuming normality, inference based on the F-test is appropriate. Here we cannot reject Ho (F = s1²/s2² = 0.364; p = 0.074) at the α = 0.05 level. Minitab's test statistic 2.746 is s2²/s1² = 1/0.364, and its P-value of 0.148 is two-sided; because the observed ratio s1²/s2² < 1 lies in the direction of Ha, the one-sided p-value is 0.148/2 = 0.074. Therefore, there is not enough evidence to conclude that portfolio 2 has a higher risk than portfolio 1.

Note: if the normality assumption were unreasonable (see part (c)), we should use Levene's test instead.
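The one-sided F-test can be sketched with the standard library from the two reported standard deviations. Since `math` has no F-distribution CDF, the hypothetical helper `f_cdf` integrates the F density numerically with Simpson's rule; it stands in for the Minitab computation rather than reproducing its method.

```python
import math


def f_pdf(x: float, d1: int, d2: int) -> float:
    """Density of the F distribution with (d1, d2) degrees of freedom."""
    if x <= 0:
        return 0.0
    log_coef = (math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2) - math.lgamma(d2 / 2)
                + (d1 / 2) * math.log(d1 / d2))
    return math.exp(log_coef + (d1 / 2 - 1) * math.log(x)
                    - ((d1 + d2) / 2) * math.log1p(d1 * x / d2))


def f_cdf(x: float, d1: int, d2: int, steps: int = 2000) -> float:
    """P(F <= x) by Simpson's rule on [0, x]; the density is 0 at x = 0 when d1 > 2."""
    h = x / steps  # steps must be even
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return total * h / 3.0


s1, s2, n1, n2 = 3.59629, 5.95912, 10, 10
F = s1 ** 2 / s2 ** 2  # a small ratio supports Ha: sigma1^2 < sigma2^2
p_one_sided = f_cdf(F, n1 - 1, n2 - 1)  # left-tail probability
print(f"F = {F:.3f}, 1/F = {1 / F:.3f}, one-sided p = {p_one_sided:.3f}")
```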

c. The normal probability plots show no evidence of non-normality in either sample, so the conditions required for these inferences have been met.

[Figure: Normal Probability Plot for Portfolio 1...Portfolio 2]

11)

a. The expected counts are given in the following table.

           0 – 4.5   4.5 – 9   9 – 13.5    13.5+
0 – 2.5     17.902    20.971     23.784   26.342
2.5 – 5     24.138    28.276     32.069   35.517
5 – 7.5     16.695    19.557     22.181   24.566
7.5+        11.264    13.195     14.966   16.575

b. The hypotheses are Ho: “length of time on first job” is unrelated to “amount of education”; Ha: “length of time on first job” is related to “amount of education”. We use the chi-square statistic to test these hypotheses.

Chi-Square Test

Expected counts are printed below observed counts

          0-4.5   4.5-9   9-13.5   13.5+   Total
0-2.5         5      21       30      33      89
          17.90   20.97    23.78   26.34
2.5-5        15      35       40      30     120
          24.14   28.28    32.07   35.52
5-7.5        22      16       15      30      83
          16.70   19.56    22.18   24.57
7.5+         28      10        8      10      56
          11.26   13.20    14.97   16.57
Total        70      82       93     103     348

Chi-Sq =  9.299 + 0.000 + 1.624 + 1.683 +
          3.459 + 1.599 + 1.961 + 0.857 +
          1.685 + 0.647 + 2.325 + 1.202 +
         24.864 + 0.774 + 3.242 + 2.608 = 57.830

DF = 9, P-Value = 0.000

With χ² = 57.83 on 9 d.f., the p-value is approximately 0.000. Since the p-value is very small, we reject Ho and conclude that “length of time on first job” is related to “amount of education”.
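The expected counts and the chi-square statistic for problem 11 follow directly from the row and column totals (expected = row total × column total / n). A standard-library sketch; the upper-tail p-value uses the power series for the regularized lower incomplete gamma function, an illustrative substitute for Minitab's computation. The helper names are hypothetical.

```python
import math

observed = [
    [5, 21, 30, 33],   # row 0-2.5
    [15, 35, 40, 30],  # row 2.5-5
    [22, 16, 15, 30],  # row 5-7.5
    [28, 10, 8, 10],   # row 7.5+
]


def chi_square_independence(table):
    """Return (chi-square statistic, df, expected counts) for a two-way table."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    expected = [[r * c / n for c in col_tot] for r in row_tot]
    chi2 = sum((o - e) ** 2 / e
               for o_row, e_row in zip(table, expected)
               for o, e in zip(o_row, e_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected


def chi2_sf(x, df, terms=500):
    """Upper-tail P(X > x) for a chi-square variable, via the gamma-function series."""
    s, z = df / 2.0, x / 2.0
    term = total = 1.0 / s
    for k in range(1, terms):
        term *= z / (s + k)
        total += term
    lower = total * math.exp(-z + s * math.log(z) - math.lgamma(s))  # P(X <= x)
    return max(0.0, 1.0 - lower)


chi2, df, expected = chi_square_independence(observed)
p = chi2_sf(chi2, df)
print(f"chi2 = {chi2:.3f}, df = {df}, p-value = {p:.2e}")
```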

c. The observed significance level of the test is the p-value from part (b), approximately 0.000.

d. Using α = 0.05, we again reject Ho (p ≈ 0.000) and reach the conclusion drawn in part (b).

12) Learning at Home problem

Observed counts (expected counts in parentheses):

Social status   A few      Some         Lots         Total
Middle          4 (4.36)   13 (11.64)   15 (16.00)   32
Working         5 (4.64)   11 (12.36)   18 (17.00)   34
Total           9          24           33           66

Chi-Square Test

Expected counts are printed below observed counts

       A few    Some    Lots   Total
1          4      13      15      32
        4.36   11.64   16.00
2          5      11      18      34
        4.64   12.36   17.00
Total      9      24      33      66

Chi-Sq = 0.030 + 0.160 + 0.062 + 0.029 + 0.150 + 0.059 = 0.490
DF = 2, P-Value = 0.783
2 cells with expected counts less than 5.0

Checking the condition (no more than 20% of cells should have expected counts below 5.0): 2 of the 6 cells have expected counts below 5, and 2/6 = 0.33 = 33% > 20%, so we cannot trust the result.

Conclusion: At α = 0.05, we fail to reject H0 (P-value = 0.783). However, this result cannot be trusted, because 33% of the cells have expected counts below 5.0.
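The Learning at Home test, including the expected-count condition, can be sketched in a few lines. With df = 2 the chi-square upper tail has the closed form exp(−χ²/2), which is used here instead of a general CDF; this shortcut is valid only for 2 degrees of freedom.

```python
import math

observed = [
    [4, 13, 15],  # middle class: a few, some, lots
    [5, 11, 18],  # working class
]

row_tot = [sum(r) for r in observed]
col_tot = [sum(c) for c in zip(*observed)]
n = sum(row_tot)
expected = [[r * c / n for c in col_tot] for r in row_tot]

chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (2 - 1) * (3 - 1) = 2
p_value = math.exp(-chi2 / 2)  # closed-form upper tail, valid only for df = 2

cells = [e for row in expected for e in row]
small = sum(e < 5 for e in cells)  # cells violating the expected-count rule
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.3f}, "
      f"{small}/{len(cells)} cells ({small / len(cells):.0%}) have expected < 5")
```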