BE540 Introductory Biostatistics 2005 Take Home Examination III

BE540 Introductory Biostatistics Take Home Examination III Units 6 and 7 – Estimation and Testing SOLUTIONS

1. (10 points total) Laboratory tests of bacterial counts are often used for declaring a water source “polluted”. Suppose that the distribution of bacterial counts in a sample taken from Lake Quinsigamond is normally distributed with a variance of 9,000,000.

a. (5 points) Suppose 25 samples were taken over the course of July 2004 and yielded a mean count of 11,500. Construct an 80% confidence interval estimate of the unknown mean bacterial count in this lake at this time.

Answer: (10,730.8, 12,269.2)

Solution:

Estimate $= \bar{x} = 11,500$

$SE(\text{Estimate}) = \sqrt{\sigma^2 / n} = \sqrt{9,000,000 / 25} = 600$

Multiplier $= z_{.90} = 1.282$

$CI = \bar{x} \pm (z_{.90})\, SE(\bar{x})$

$= 11,500 \pm (1.282)(600)$

$= (10,730.8,\ 12,269.2)$
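The interval arithmetic can be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later and its standard-library `statistics.NormalDist`; the helper name `z_interval` is illustrative and not part of the course materials. Because it uses the unrounded multiplier rather than 1.282, the endpoints differ from the hand calculation in the first decimal place.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, variance, n, confidence):
    """Two-sided confidence interval for a normal mean with known variance."""
    se = sqrt(variance / n)
    # An 80% interval leaves 10% in each tail, so the multiplier is z_.90.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return mean - z * se, mean + z * se

low, high = z_interval(mean=11_500, variance=9_000_000, n=25, confidence=0.80)
print(round(low, 1), round(high, 1))
```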

b. (5 points) One year later, in July 2005, the Massachusetts Department of Environmental Quality Engineering (DEQE) took another sample and noted a bacterial count of 15,000. In your opinion, is this evidence of a pollution effect?

Answer: No

Solution: No, because the associated significance level is 0.12.

Let X be the July 2005 bacterial count.

$H_0: \mu = 11,500$

$H_A: \mu > 11,500$ (one sided)

p-value $= \text{Prob}[X \ge 15,000 \mid H_0 \text{ is true}]$

$= \text{Prob}\left[\text{Normal}(0,1) \ge \dfrac{15,000 - 11,500}{\sqrt{9,000,000}}\right]$

$= \text{Prob}[\text{Normal}(0,1) \ge 1.167]$

$= 0.1217$

The single reading of 15,000 has a sufficiently high likelihood of occurring if the true mean level is 11,500. There is not enough evidence to suspect a pollution effect. Obviously a sample size > 1 is needed to investigate this question.
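The closing remark about sample size can be illustrated with a short sketch, assuming Python 3.8 or later. The function `upper_tail_p` is a hypothetical helper, and the four-reading scenario is invented purely for illustration; only the single reading comes from the problem.

```python
from math import sqrt
from statistics import NormalDist

NULL_MEAN = 11_500       # mean under the null hypothesis
VARIANCE = 9_000_000     # known variance from the problem statement

def upper_tail_p(sample_mean, n):
    """One-sided p-value for the mean of n readings under the null."""
    z = (sample_mean - NULL_MEAN) / sqrt(VARIANCE / n)
    return 1 - NormalDist().cdf(z)

p_single = upper_tail_p(15_000, n=1)   # the actual July 2005 reading
p_four = upper_tail_p(15_000, n=4)     # hypothetical: same mean from 4 readings
print(round(p_single, 4), round(p_four, 4))
```

With the same observed mean, a larger sample shrinks the standard error and therefore the p-value, which is why one reading alone cannot settle the question.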

2. (10 points total). a. (2 points) True or False. Consider the construction of a 95% confidence interval. Suppose one repeats the process indefinitely. Suppose further that, for each sample drawn, a new 95% confidence interval calculation is performed. If for each sample, the investigator claims that the actual parameter value is contained in the interval, about 95% of his or her statements will be correct.

Answer: TRUE Explanation: By definition, a 95% confidence interval is generated by a process that is correct 95% of the time.
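The long-run interpretation can be seen in a small simulation. This is a sketch under stated assumptions: the population (mean 100, standard deviation 15), the sample size, the number of repetitions, and the seed are all arbitrary choices made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(540)                               # arbitrary seed, for reproducibility
mu, sigma, n, trials = 100.0, 15.0, 25, 4000   # hypothetical population and design
z = NormalDist().inv_cdf(0.975)                # multiplier for a 95% interval
se = sigma / sqrt(n)

covered = 0
for _ in range(trials):
    xbar = mean(random.gauss(mu, sigma) for _ in range(n))
    # Count the intervals that actually contain the true mean.
    if xbar - z * se <= mu <= xbar + z * se:
        covered += 1

coverage = covered / trials
print(coverage)
```

The printed proportion should land close to 0.95, with some simulation noise.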

b. (2 points) True or False. A hypothesis test for which the type I error occurs with probability α has probability of type II error equal to (1 - α).

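Answer: FALSE Explanation: Type I error = Pr[reject $H_0$ when $H_0$ is true], whereas Type II error = Pr[accept $H_0$ when $H_A$ is true]. The two probabilities are computed under different hypotheses, so they need not sum to 1.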

c. (2 points) True or False. If a one sided test indicates that the null hypothesis can be rejected at the 5% level, then a two sided test performed on the same set of data is necessarily significant at the 5% level.

Answer: FALSE Explanation: If a one-sided test has an achieved significance level of .05, then the same test, carried out as two-sided, has an achieved level of significance of (2)(.05) = .10.

d. (2 points) True or False. For a given sample variance $s^2$ and sample mean $\bar{X}$, a 90% confidence interval for an unknown mean $\mu$ is narrower than a 99% confidence interval.

Answer: TRUE Explanation: This is true because the magnitude of the SE multiplier in a 90% confidence interval is smaller than the magnitude of the SE multiplier in a 99% confidence interval. Hint: Recall that the multiplier in a 100% confidence interval would have to be infinity!
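A quick way to see the ordering of the multipliers is to tabulate them. The sketch below uses Normal(0,1) multipliers as a stand-in (an assumption made here for simplicity; with an estimated variance the t multipliers would be somewhat larger but follow the same ordering), and `z_multiplier` is an illustrative name.

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Two-sided Normal(0,1) multiplier for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# The multiplier grows with the confidence level, so the interval widens.
for level in (0.90, 0.95, 0.99, 0.999):
    print(level, round(z_multiplier(level), 3))
```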

e. (2 points) True or False. An investigator who is performing a t-test for which the assumptions are satisfied could, in the absence of Student’s t-distribution tables, use a Normal(0,1) probability, provided the degrees of freedom is sufficiently large.

Answer: TRUE Explanation: As the degrees of freedom increase (beyond about 30), the t-distribution becomes more and more similar to the Normal(0,1).

3. (10 points total) In the planning of a large scale study, pilot data, comprising a sample of size 24, are drawn from a normal distribution with known variance $\sigma^2 = 6.4512$ and unknown mean $\mu$. The observed mean was $\bar{X}_{n=24} = 2.66$. In mounting the large scale study, how large a sample size would have to be drawn to estimate the unknown mean to within plus or minus .14 with 90% confidence? Note: “Plus or minus .14” is interpreted to mean that the total width of the confidence interval is 2 x .14.

Answer: 891

Solution: Because $z_{.95} = 1.645$, the width of a 90% CI is given by

Width = (upper end value) - (lower end value)

$= [\bar{x} + (1.645)\,\sigma/\sqrt{n}] - [\bar{x} - (1.645)\,\sigma/\sqrt{n}]$

$= (2)(1.645)\,\sigma/\sqrt{n}$

Since the total width is $(2)(0.14)$, we have

$(2)(0.14) = (2)(1.645)\,\sigma/\sqrt{n}$

$\Rightarrow n = (1.645)^2 \sigma^2 / (0.14)^2$

$\Rightarrow n = (138.0625)\,\sigma^2$

Since $\sigma^2 = 6.4512$,

$n = (138.0625)(6.4512) = 890.67$, which rounds up to 891.
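The sample size formula can be wrapped in a small function. This is a sketch assuming Python 3.8 or later; `sample_size` is an illustrative name, and it uses the unrounded normal multiplier rather than 1.645, which still rounds up to the same answer here.

```python
from math import ceil
from statistics import NormalDist

def sample_size(variance, half_width, confidence):
    """Smallest n whose two-sided z-interval has the requested half-width."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # n = z^2 * sigma^2 / half_width^2, rounded up to a whole observation.
    return ceil(z ** 2 * variance / half_width ** 2)

n_required = sample_size(variance=6.4512, half_width=0.14, confidence=0.90)
print(n_required)
```

Rounding up rather than to the nearest integer keeps the achieved half-width at or below the target.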

4. (10 points total) According to the theory of plate tectonics, the California coast (including Los Angeles) is carried on one plate and the continental United States (including Sacramento) is carried on another plate. These two cities are moving relative to one another in such a way that Los Angeles gets about an inch closer to Sacramento every year. A hypothetical study is underway to check this theory, by measuring the distance between Los Angeles and Sacramento once a year for 50 years. The first 25 measurements have an average of 31,996,832 inches with a standard deviation of 40 inches. The next 25 measurements average out to be 31,996,806 inches with a standard deviation of 40 inches. Carry out the appropriate hypothesis test to check the proposed plate tectonics theory. You may assume normality.

Answer: A one sided two sample t-test (t = 2.30, df = 48) provides statistically significant evidence that Los Angeles is getting closer to Sacramento over time.
Achieved level of significance (p-value) = 0.013
95% CI of 25 year change in distance (inches) = (3.26, 48.7)

Solution:

Let $\bar{x}$ = average of the first 25 measurements and $\bar{y}$ = average of the second 25 measurements.

Assumptions: Since the observed SDs are equal, we can assume equality of variance. We will also assume normality, so that $\bar{x} \sim \text{Normal}(\mu_x, \sigma^2/25)$ and $\bar{y} \sim \text{Normal}(\mu_y, \sigma^2/25)$.

Ho and Ha: $H_0: \mu_x \le \mu_y$ versus $H_A: \mu_x > \mu_y$ (one sided)

Preliminary: $S^2_{pool} = \dfrac{(24)S_1^2 + (24)S_2^2}{48} = (40)^2$

Test: $t = \dfrac{(\bar{X} - \bar{Y}) - (0)}{SE(\bar{X} - \bar{Y})}$ with df = 48

$= \dfrac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{S_p^2}{25} + \dfrac{S_p^2}{25}}} = \dfrac{31,996,832 - 31,996,806}{\sqrt{40^2\left(\dfrac{1}{25} + \dfrac{1}{25}\right)}} = 2.30$

p-value $= \text{Prob}[t_{df=48} \ge 2.30] = 0.0129$

Statistical decision: A p-value of 0.013 is not considered consistent with Ho. Reject Ho.

95% CI for $\mu_x - \mu_y$: $(\bar{X} - \bar{Y}) \pm (t_{.975;\,df=48})\, SE(\bar{X} - \bar{Y})$

$= (26) \pm (2.01)(11.313709) = (3.259446,\ 48.740554)$
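The pooled two-sample calculation can be reproduced from the summary statistics alone. This is a minimal sketch; `pooled_two_sample` is a hypothetical helper, and the multiplier 2.01 is simply the table value quoted in the solution rather than something the code computes.

```python
from math import sqrt

def pooled_two_sample(mean1, sd1, n1, mean2, sd2, n2):
    """Difference, pooled standard error, t statistic, and degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff, se, diff / se, df

diff, se, t_stat, df = pooled_two_sample(31_996_832, 40, 25, 31_996_806, 40, 25)
t_mult = 2.01   # t_.975 with df = 48, taken from the solution above
ci = (diff - t_mult * se, diff + t_mult * se)
print(round(t_stat, 2), df, tuple(round(v, 2) for v in ci))
```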

5. (10 points total) An investigator performed 40 tests on data from nine patients who received treatment A and eight patients who received treatment B. Each of the 40 tests was a Student’s t-test and was independent of the other tests. Three of the 40 t-statistic values were greater than $t_{0.025;\,df=15} = 2.13$. How many significant results would you expect to see if the type I error for each test was 0.05 and if, in truth, there are no differences between the two treatments?

Answer: 2

Solution: If the 40 tests are mutually independent, all with $\alpha = 0.05$, and if Ho is true in every instance, then the expected number of significant results = (40)(0.05) = 2.
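The expected count follows from treating the number of significant results as Binomial(40, 0.05). The sketch below, assuming Python 3.8 or later for `math.comb`, also computes the chance of seeing three or more significant results under the null. That second quantity goes slightly beyond what the problem asks and is included only to put the observed three in context.

```python
from math import comb

n_tests, alpha = 40, 0.05
expected_significant = n_tests * alpha   # mean of a Binomial(40, 0.05)

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1 - sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k))

p_three_or_more = prob_at_least(3, n_tests, alpha)
print(expected_significant, round(p_three_or_more, 3))
```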

______

6. (10 points total) The objective of an experiment by Buckner et al was to study the effect of pancuronium-induced muscle relaxation on circulating plasma volume. Subjects were newborn infants weighing more than 1700 grams who required respiratory assistance within 24 hours of birth and met other criteria. Five infants paralyzed with pancuronium and seven nonparalyzed infants yielded the following statistics on the second of three measurements of plasma volume (ml) made during mechanical ventilation.

Group           n    Sample Mean    Sample Standard Deviation
Paralyzed       5    48.0           8.1
Non-paralyzed   7    56.7           8.1

Compute a 95% confidence interval for $\mu_1 - \mu_2$. State all necessary assumptions and interpret your results.

Answer: (-19.27, +1.87)
Assumptions: (1) normality and (2) common variance
Interpretation: The confidence interval includes zero. With 95% confidence, the data are consistent with the inference of no difference in mean plasma volume at 12-24 hours between the two groups.

Solution:

Point estimate of $\mu_1 - \mu_2$: (48.0 - 56.7) = -8.7

Point estimate of $\sigma^2$: $\hat{\sigma}^2 = \dfrac{(5-1)(8.1)^2 + (7-1)(8.1)^2}{(5-1) + (7-1)} = (8.1)^2$

Estimated SE of point estimate of $\mu_1 - \mu_2$: $\sqrt{\dfrac{(8.1)^2}{5} + \dfrac{(8.1)^2}{7}} = 4.7429$

Confidence coefficient $= t_{.975;\,df=10} = 2.2281$

95% confidence interval for $\mu_1 - \mu_2$:

$(\bar{X}_1 - \bar{X}_2) \pm t_{.975;\,df=10}\; se(\bar{X}_1 - \bar{X}_2) = -8.7 \pm (2.2281)(4.7429) = (-19.27,\ 1.87)$
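As a numerical check of the interval and its interpretation, here is a short sketch working from the summary table. The multiplier 2.2281 is the table value from the solution, and the variable names are illustrative.

```python
from math import sqrt

n1, mean1, sd1 = 5, 48.0, 8.1   # paralyzed infants
n2, mean2, sd2 = 7, 56.7, 8.1   # non-paralyzed infants
t_mult = 2.2281                 # t_.975 with df = 10, from the solution

df = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
se = sqrt(pooled_var * (1 / n1 + 1 / n2))
diff = mean1 - mean2
ci = (diff - t_mult * se, diff + t_mult * se)

# The interpretation hinges on whether zero lies inside the interval.
contains_zero = ci[0] <= 0 <= ci[1]
print(tuple(round(v, 2) for v in ci), contains_zero)
```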

7. (10 points total) Researchers at Wake Forest Medical Center are interested in assessing the effect of chronic lecithin use on total blood cholesterol. A pilot study is performed. Nine volunteers, all middle aged men with hypercholesterolemia, agree to have their total blood cholesterol measured prior to and following a six month regimen of lecithin intake. The results for total blood cholesterol (CH) level in mg/dl are:

Volunteer ID       1    2    3    4    5    6    7    8    9
Pre-Regimen CH   239  245  234  253  267  239  248  242  233
Post-Regimen CH  232  254  221  263  259  238  243  249  236

Assume that total blood cholesterol values follow a normal distribution. Construct a 95% confidence interval estimate of the unknown mean change in blood cholesterol that might be attributed to prolonged lecithin use and comment on your results.

Answer: (-6.91, +5.80)
Comment: As the interval contains zero, the data do not provide statistically significant evidence, at the 5% level, of an association between lecithin use and cholesterol lowering.

Solution: The setting is one population, paired data.

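Let $d_i$ = (post-regimen CH) - (pre-regimen CH) for volunteer i.

$\bar{d} = \dfrac{\sum d_i}{n} = \dfrac{-5}{9} = -0.556$

$s_d^2 = \dfrac{\sum (d_i - \bar{d})^2}{n - 1} = \dfrac{544.222}{8} = 68.03$

Assume (1) normality and (2) common variance

Point estimate of $\mu_d$: $\bar{d} = -0.556$

Estimated SE of point estimate of $\mu_d$: $\sqrt{\dfrac{s_d^2}{n}} = \sqrt{\dfrac{68.03}{9}} = 2.7493$

Confidence coefficient $= t_{.975;\,df=8} = 2.31$

95% confidence interval for $\mu_d$:

$\bar{d} \pm t_{.975;\,df=8}\; se(\bar{d}) = -0.556 \pm (2.31)(2.7493) = (-6.91,\ 5.80)$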
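The paired analysis can be reproduced directly from the raw cholesterol values. This sketch takes each difference as post minus pre, which is the direction that yields the sum of -5 used in the solution, and borrows the multiplier 2.31 from the solution rather than computing it.

```python
from math import sqrt
from statistics import mean, variance

pre = [239, 245, 234, 253, 267, 239, 248, 242, 233]
post = [232, 254, 221, 263, 259, 238, 243, 249, 236]
diffs = [b - a for a, b in zip(pre, post)]   # post minus pre, one per volunteer

d_bar = mean(diffs)
s2_d = variance(diffs)            # sample variance, divisor n - 1
se = sqrt(s2_d / len(diffs))
t_mult = 2.31                     # t_.975 with df = 8, from the solution
ci = (d_bar - t_mult * se, d_bar + t_mult * se)
print(round(d_bar, 3), round(s2_d, 2), tuple(round(v, 2) for v in ci))
```

Reducing the two columns to a single column of differences is what turns the problem into a one-sample setting.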

8. (10 points total) It has been hypothesized that the levels of creatinine phosphokinase (CPK) tend to be higher in men with large muscle mass. To address this question, the following values of CPK were obtained from 10 healthy college age football players. These values are measured in international units (IU). You may assume that this sample is a sample from a Normal distribution.

Player ID     1    2    3    4    5    6    7    8    9   10
CPK Value    94  146   42   89  149  158  128  136  115  108

(a) 5 points The sample mean and sample standard deviation are 116.5 IU and 35.0 IU, respectively. Construct a 95% confidence interval of the unknown true mean CPK value, based on these data.

Answer: ( 91.46, 141.54)

Solution: The setting is one population, variance not known.

$\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{1165}{10} = 116.5$

$s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1} = \dfrac{11,025}{9} = 1225$

Point estimate of $\mu$: $\bar{x} = 116.5$

Estimated SE of point estimate of $\mu$: $\sqrt{\dfrac{s^2}{n}} = \sqrt{\dfrac{1225}{10}} = 11.07$

Confidence coefficient $= t_{.975;\,df=9} = 2.262$

95% confidence interval for $\mu$:

$\bar{x} \pm t_{.975;\,df=9}\; se(\bar{x}) = 116.5 \pm (2.262)(11.07) = (91.46,\ 141.54)$
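The summary statistics quoted in the problem can be recovered from the raw CPK values. In the sketch below, the unrounded sample standard deviation comes out slightly above 35.0, so the interval is computed with the rounded value that the problem and solution use. The multiplier 2.262 is the table value from the solution.

```python
from math import sqrt
from statistics import mean, stdev

cpk = [94, 146, 42, 89, 149, 158, 128, 136, 115, 108]
x_bar = mean(cpk)
s = stdev(cpk)          # unrounded sample SD; the problem reports 35.0

s_rounded = 35.0        # value stated in the problem
se = s_rounded / sqrt(len(cpk))
t_mult = 2.262          # t_.975 with df = 9, from the solution
ci = (x_bar - t_mult * se, x_bar + t_mult * se)
print(x_bar, round(s, 2), tuple(round(v, 2) for v in ci))
```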

(b) 5 points Suppose it is hypothesized that the “normal” range of CPK levels in adult men this age is 5 to 50 IU. State the null and alternative hypotheses that are required for testing the consistency of the observed sample mean of 116.5 IU with the upper value of this range. Then carry out a statistical test of this hypothesis with an appropriate statistic and interpret your findings.

Answer:
Null: $\mu \le 50$ IU
Alternative: $\mu > 50$ IU, ONE SIDED
A one sided one sample t-test (df = 9) yields observed t-value = 6.0072
Achieved level of significance (p-value) = .0001
Interpretation: The null hypothesis is rejected. These data provide evidence that CPK levels are higher in men with relatively greater muscle mass.

Solution: The setting is one population, variance not known.

From part (a) we already have $\bar{x} = 116.5$

$SE(\bar{x}) = 11.07$

Test statistic is $t_{df=9} = \dfrac{\bar{x} - \mu_{Null}}{se(\bar{x})} = \dfrac{116.5 - 50}{11.07} = 6.0072$

Achieved significance level is the p-value:

p-value $= \text{Probability}[t_{df=9} \ge 6.0072] = .0001$
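Without t tables or a statistics package, the upper-tail probability can be approximated by integrating the Student's t density numerically. The sketch below uses Simpson's rule with the standard library only; `t_pdf` and `t_upper_tail` are illustrative helpers written for this note, not functions from any library, and the step count is an arbitrary choice.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0, via Simpson's rule on [0, t]; steps must be even."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half the mass lies above zero; subtract the part between 0 and t.
    return 0.5 - total * h / 3

x_bar, mu_null, se = 116.5, 50.0, 11.07   # values from part (a) and the null
t_stat = (x_bar - mu_null) / se
p_value = t_upper_tail(t_stat, df=9)
print(round(t_stat, 4), p_value)
```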

9. (10 points total) It is known that IQ’s are normally distributed in the general population with a mean of 100 and a standard deviation of 15. Suppose you identify a large group of children who received special education at a very young age and suspect that the IQ’s of these children are higher. You obtain a random sample of 25 of these children, measure their IQ’s, and observe a sample mean of 106. What is the p-value associated with this mean? What test did you perform and what assumptions were necessary in carrying out this test?

Answer:
Achieved level of significance (p-value) = .0228
Test performed: One sample z-test, one sided
Assumptions: (1) Normality (2) Known variance $\sigma^2 = 15^2$
Null: $\mu \le 100$ in population of children receiving special education
Alternative: $\mu > 100$, ONE SIDED
z-score = 2
Interpretation: Reject the null hypothesis. Conclude there is evidence that the mean IQ in the population of children who receive special education is greater than 100.

Solution: $\bar{x} = 106$

$SE(\bar{x}) = 15/\sqrt{25} = 3$

$z = \dfrac{\bar{x} - \mu_{Null}}{se(\bar{x})} = \dfrac{106 - 100}{3} = 2$

p-value $= \text{Probability}[\text{z-score} \ge 2] = .0228$
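For completeness, the z-test arithmetic can be checked in a few lines, assuming Python 3.8 or later. The function name `one_sided_z_test` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_z_test(sample_mean, null_mean, sigma, n):
    """z statistic and upper-tail p-value when sigma is known."""
    z = (sample_mean - null_mean) / (sigma / sqrt(n))
    return z, 1 - NormalDist().cdf(z)

z, p_value = one_sided_z_test(sample_mean=106, null_mean=100, sigma=15, n=25)
print(z, round(p_value, 4))
```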

10. (10 points total)

(a) 5 points A report from a clinical trial concludes that the “difference between treatments is not statistically significant, p > .05”. What caveat/s must accompany this conclusion? Why is it potentially wrong to interpret this as meaning that there is no clinically important difference between treatments?

Answer: Computation of the corresponding 95% confidence interval estimate might reveal a wide interval that includes clinically important differences as well as non-clinically important differences. One reason for an uninformatively wide confidence interval is a study sample size that is too small.

(b) 5 points Interpret p-value < .05 and p-value < .01. Given identical trial size, which p-value gives stronger evidence against the null hypothesis? Explain your answer.

Answer: The p < .01 is less consistent with the assumption of the null hypothesis than the p < .05. Thus, the p < .01 gives stronger statistical evidence against the null hypothesis.
