MATH 376 – Final Exam Sample Solutions May 11, 2018

The final exam will be close to the length of this sample. It will contain problems from all five chapters we covered. Your "cheat sheet" can be up to three sheets of 8.5 x 11 paper with notes on both sides. The test will come with the tables as we've done previously.

1. Suppose that $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ are independent random samples, with the variables $X_i$ normally distributed with mean $\mu_1$ and variance $\sigma_1^2$, and the variables $Y_i$ normally distributed with mean $\mu_2$ and variance $\sigma_2^2$.

(a) What is the distribution of $\bar{X} - \bar{Y}$?

The difference in sample means is a linear combination of the $X_i$ and $Y_j$, which are independent normally distributed random variables, so it is also normally distributed.

(b) Find $E(\bar{X} - \bar{Y})$.

The expected value is
\[ E(\bar{X} - \bar{Y}) = \frac{1}{m}\sum_{i=1}^{m} \mu_1 - \frac{1}{n}\sum_{j=1}^{n} \mu_2 = \mu_1 - \mu_2 \]

(c) Find $V(\bar{X} - \bar{Y})$.

The similar calculation for variance, using independence, shows that
\[ V(\bar{X} - \bar{Y}) = \sum_{i=1}^{m} \frac{1}{m^2}\sigma_1^2 + \sum_{j=1}^{n} \frac{1}{n^2}\sigma_2^2 = \frac{1}{m}\sigma_1^2 + \frac{1}{n}\sigma_2^2 \]

(d) Suppose that $\sigma_1^2 = 4$, $\sigma_2^2 = 3.2$ and $m = n$. Find the sample size so that $\bar{X} - \bar{Y}$ will be within 0.5 units of $\mu_1 - \mu_2$ with probability 0.90.

With these values, $V(\bar{X} - \bar{Y}) = \frac{1}{n}(4 + 3.2) = \frac{7.2}{n}$. We want to find $n$ so that
\[ P\left( \frac{-0.5}{\sqrt{7.2/n}} \le \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{7.2/n}} \le \frac{0.5}{\sqrt{7.2/n}} \right) = 0.90 \]
The Z-score for the two-sided probability of 0.90 is $z = 1.645$. Solve
\[ \frac{0.5}{\sqrt{7.2/n}} = 1.645 \]
for $n$. Since
\[ \left( \frac{1.645\sqrt{7.2}}{0.5} \right)^2 = 77.93, \]
use $n = 78$.

2. Environmental Protection Agency air quality standards for carbon monoxide are 35 parts per million (ppm) averaged over one hour. In a large city the average value for carbon monoxide is 15 ppm with a standard deviation of 10.

(a) Do you think that carbon monoxide concentrations in air samples from this city are normally distributed? Why or why not?

For data to be normally distributed, we expect almost all of the data to fall within two standard deviations of the mean and to have a mound-shaped distribution that is symmetric about the mean.
In this case, the values cannot be negative and 0 is only 1.5 standard deviations below the mean, so the data cannot be symmetrically distributed about the mean. We conclude the data are not normally distributed.

(b) The EPA guidelines say that the air quality standard of 35 ppm should be reached no more than once a year. Find the probability that the 35 ppm threshold will be exceeded in 100 randomly selected samples.

Calculate the Z-score for the mean of the 100 samples:
\[ z = \frac{35 - 15}{\sigma/\sqrt{n}} = \frac{20}{10/10} = 20. \]
The probability of this occurring is essentially 0.

3. Let $Y_1, \ldots, Y_n$ denote a random sample of size $n$ from a population with a uniform distribution on the interval $(0, \theta)$. Let $Y_{(n)} = \max\{Y_1, Y_2, \ldots, Y_n\}$ and $U = \frac{1}{\theta} Y_{(n)}$.

(a) Show that $U$ has distribution function
\[ F_U(u) = \begin{cases} 0, & u < 0 \\ u^n, & 0 \le u \le 1 \\ 1, & u > 1 \end{cases} \]

The distribution function for the order statistic $Y_{(n)}$, which is the maximum of the $Y_i$, is the $n$th power of the distribution function for $Y_i$. For a uniform distribution on the interval $(0, \theta)$, the density function is $f_Y(y) = \frac{1}{\theta}$ for $0 < y < \theta$. The distribution function $F_Y$ is
\[ F_Y(y) = \begin{cases} 0, & y < 0 \\ y/\theta, & 0 \le y \le \theta \\ 1, & y > \theta \end{cases} \]
The distribution function for $Y_{(n)}$ will be the $n$th power of $F_Y$,
\[ G_n(y) = \begin{cases} 0, & y < 0 \\ (y/\theta)^n, & 0 \le y \le \theta \\ 1, & y > \theta \end{cases} \]
Then the distribution function for $U = \frac{1}{\theta} Y_{(n)}$ will be
\[ F_U(u) = P(U \le u) = P(Y_{(n)} \le \theta u) = G_n(\theta u) = u^n \]
for $0 \le u \le 1$.

(b) Is $U$ a pivotal quantity for $\theta$?

Since $U$ is a function of the sample and $\theta$ only, and the distribution of $U$ does not depend on $\theta$, it is a pivotal quantity for $\theta$.

(c) Find a 95% lower confidence bound for $\theta$.

We want a value for $a$ so that
\[ P\left( \frac{Y_{(n)}}{\theta} \le a \right) = F_U(a) = 0.95 \]
Thus $a^n = 0.95$, or $a = 0.95^{1/n}$. Since $Y_{(n)}/\theta \le a$ is equivalent to $\theta \ge Y_{(n)}/a$, the lower confidence bound is $Y_{(n)} \cdot 0.95^{-1/n}$.

4. Two new drugs were given to patients with hypertension. The first drug lowered the blood pressure of 16 patients an average of 11 points, with a standard deviation of 6 points. The second drug lowered the blood pressure of 20 other patients an average of 12 points, with a standard deviation of 8 points.
Determine a 95% confidence interval for the difference in the mean reductions in blood pressure. Assume the measurements are normally distributed with equal variances.

Use a pooled sample variance:
\[ s_p^2 = \frac{15 \cdot 6^2 + 19 \cdot 8^2}{16 + 20 - 2} = 51.65. \]
The Z-score for 0.95 is 1.96, so the interval is
\[ \bar{y}_1 - \bar{y}_2 \pm 1.96 \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} = 11 - 12 \pm 1.96 \sqrt{51.65 \left( \frac{1}{16} + \frac{1}{20} \right)} = -1 \pm 4.72 \]
Or as an interval, $(-5.72, 3.72)$. (Strictly, with a pooled variance on 34 degrees of freedom the critical value comes from the t distribution, $t_{0.025, 34} \approx 2.03$, which gives a slightly wider interval; the normal value 1.96 is used here as an approximation.)

5. A precision instrument is guaranteed to read accurately to within 2 units. A sample of four instrument readings on the same object yielded the measurements 353, 351, 351, and 355. Find a 90% confidence interval for the population variance. What assumptions are necessary? Does the guarantee seem reasonable?

With $n = 4$ and measurements 353, 351, 351, and 355, the sample mean is 352.5 and the sample variance is $s^2 = 3.67$. For a 90% confidence interval, we need to look up $\chi^2_{0.95}$ and $\chi^2_{0.05}$ for three degrees of freedom: $\chi^2_{0.95} = 0.351846$ and $\chi^2_{0.05} = 7.81473$. The 90% confidence interval is:
\[ \left( \frac{(n-1)s^2}{\chi^2_{0.05}}, \frac{(n-1)s^2}{\chi^2_{0.95}} \right) = \left( \frac{3 \cdot 3.67}{7.81473}, \frac{3 \cdot 3.67}{0.351846} \right) \approx (1.4, 31.3) \]
Of course, to do this we must assume that the measurements were independent and normally distributed. The interval is wide enough that the variance could be larger than 25, in which case the standard deviation would be larger than 5. So it is possible that the instrument's error is larger than two units.

6. Suppose that $Y_1, \ldots, Y_n$ is a random sample from a probability density function in the (one-parameter) exponential family. That is,
\[ f(y \mid \theta) = \begin{cases} a(\theta)\, b(y)\, e^{-c(\theta) d(y)}, & a \le y \le b \\ 0, & \text{otherwise} \end{cases} \]
where the endpoints $a$ and $b$ do not depend on $\theta$. Show that
\[ \sum_{i=1}^{n} d(Y_i) \]
is sufficient for $\theta$.

We use the method of factoring the likelihood function $L(\theta) = L(y_1, \ldots, y_n \mid \theta)$.
\[
\begin{aligned}
L(\theta) &= \prod_{i=1}^{n} f(y_i \mid \theta) \\
&= \prod_{i=1}^{n} a(\theta)\, b(y_i)\, e^{-c(\theta) d(y_i)} \\
&= a(\theta)^n \left( \prod_{i=1}^{n} b(y_i) \right) e^{-c(\theta) \sum_{i=1}^{n} d(y_i)} \\
&= a(\theta)^n e^{-c(\theta) u} \prod_{i=1}^{n} b(y_i) \\
&= g(u, \theta)\, h(y_1, \ldots, y_n)
\end{aligned}
\]
where $u = \sum_{i=1}^{n} d(y_i)$, $g(u, \theta) = a(\theta)^n e^{-c(\theta) u}$ and $h(y_1, \ldots, y_n) = \prod_{i=1}^{n} b(y_i)$. By the factorization theorem, $u$ is sufficient for $\theta$.

7. A binomial experiment consisting of $n$ trials resulted in observations $y_1, \ldots, y_n$, where $y_i = 1$ if the $i$th trial was a success and $y_i = 0$ otherwise.

(a) What is the likelihood function $L(p)$ of the observed sample?

The probability for a single trial is $p^{y_i}(1-p)^{1-y_i}$, so the likelihood function for $n$ trials is
\[ L(p) = L(y_1, \ldots, y_n \mid p) = \prod_{i=1}^{n} p^{y_i}(1-p)^{1-y_i} = p^{\sum_{i=1}^{n} y_i}(1-p)^{n - \sum_{i=1}^{n} y_i} = p^{y}(1-p)^{n-y} \]
where $y = \sum_{i=1}^{n} y_i$.

(b) What are the possible values of $y = \sum_{i=1}^{n} y_i$?

The values are $0, 1, \ldots, n$.

(c) Find the value of $p$ that maximizes $L(p)$. (Hint: Consider the extreme cases of $y$ separately.)

The extreme cases are $y = 0$ and $y = n$. If $y = 0$, then $L(p) = (1-p)^n$, and if $y = n$, then $L(p) = p^n$. In the first case $L(p)$ is maximized if $p = 0$ and in the second it is maximized if $p = 1$.

For other values of $y$, differentiate the logarithm of $L(p) = p^y(1-p)^{n-y}$ with respect to $p$. We have
\[ \frac{d \ln(L(p))}{dp} = \frac{d}{dp}\left( y \ln(p) + (n-y)\ln(1-p) \right) = \frac{y}{p} - \frac{n-y}{1-p} = 0 \]
Solve for $p$ in terms of $y$ to get $\hat{p} = \frac{y}{n}$. For $y = 0$ and $y = n$, the maxima found above ($p = 0$ and $p = 1$) also equal $\frac{y}{n}$. So the maximum likelihood estimator is $\hat{p} = \frac{Y}{n}$.

8. Two different companies have applied to provide cable television service in Worcester. Let $p$ denote the proportion of all potential subscribers who favor the first company over the second. Consider testing $H_0 : p = 0.5$ versus $H_a : p \ne 0.5$ based on a random sample of 25 individuals. Let the random variable $X$ denote the number in the sample who favor the first company and $x$ represent the observed value of $X$.

(a) Which of the following rejection regions is most appropriate and why?

$R_1 = \{x : x \le 7 \text{ or } x \ge 18\}$
$R_2 = \{x : x \le 8\}$
$R_3 = \{x : x \ge 17\}$

$R_1$ is most appropriate since the alternative is two-sided and $R_1$ tests for a preference in either direction.
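The choice of $R_1$ in Problem 8(a) can be illustrated numerically. The sketch below is not part of the original solution: it computes the probability of each candidate region under $H_0$ (with $n = 25$, $p = 0.5$) and, as an assumption made only for illustration, under the two alternatives $p = 0.3$ and $p = 0.7$. The helper names `binom_pmf` and `region_probability` are hypothetical, and only the Python standard library is used.

```python
from math import comb

N_TRIALS = 25   # sample size from the problem statement
P_NULL = 0.5    # value of p under H0


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def region_probability(region, p, n=N_TRIALS):
    """Probability that X lands in the given collection of counts."""
    return sum(binom_pmf(k, n, p) for k in region)


# Candidate rejection regions from part (a).
counts = range(N_TRIALS + 1)
R1 = [x for x in counts if x <= 7 or x >= 18]
R2 = [x for x in counts if x <= 8]
R3 = [x for x in counts if x >= 17]

# Type I error probability of each region under H0.
alpha = {name: region_probability(r, P_NULL)
         for name, r in (("R1", R1), ("R2", R2), ("R3", R3))}

# Rejection probability under two illustrative alternatives
# (p = 0.3 and p = 0.7 are assumed values, not from the exam).
power_low = {name: region_probability(r, 0.3)
             for name, r in (("R1", R1), ("R2", R2), ("R3", R3))}
power_high = {name: region_probability(r, 0.7)
              for name, r in (("R1", R1), ("R2", R2), ("R3", R3))}

for name in ("R1", "R2", "R3"):
    print(f"{name}: alpha={alpha[name]:.4f}  "
          f"power(p=0.3)={power_low[name]:.4f}  "
          f"power(p=0.7)={power_high[name]:.4f}")
```

Under these assumptions, the one-sided regions $R_2$ and $R_3$ each have almost no chance of rejecting when the true preference lies in the opposite direction, while $R_1$ keeps roughly the same rejection probability for either direction, which is the behavior wanted for a two-sided alternative.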
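The arithmetic in Problems 1(d), 4, and 5 can be re-checked with a short script. This is a minimal sketch rather than part of the original solutions: the critical values (1.645, 1.96, and the two chi-square values) are copied from the solutions above rather than computed, and all variable names are hypothetical.

```python
from math import ceil, sqrt

# Problem 1(d): sample size so that the difference in sample means is
# within 0.5 of mu_1 - mu_2 with probability 0.90.
z_90 = 1.645               # two-sided 0.90 critical value from the solution
variance_sum = 4 + 3.2     # sigma_1^2 + sigma_2^2
half_width = 0.5
n_exact = (z_90 / half_width) ** 2 * variance_sum
n_required = ceil(n_exact)

# Problem 4: pooled variance and the interval for the difference in means,
# using the normal critical value 1.96 as in the solution.
n1, mean1, sd1 = 16, 11, 6
n2, mean2, sd2 = 20, 12, 8
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
margin = 1.96 * sqrt(pooled_var * (1 / n1 + 1 / n2))
diff_interval = (mean1 - mean2 - margin, mean1 - mean2 + margin)

# Problem 5: sample variance and 90% interval for the population variance,
# with chi-square values for 3 degrees of freedom taken from the solution.
readings = [353, 351, 351, 355]
n = len(readings)
sample_mean = sum(readings) / n
sample_var = sum((y - sample_mean) ** 2 for y in readings) / (n - 1)
chi2_upper = 7.81473       # chi^2_{0.05} with 3 df
chi2_lower = 0.351846      # chi^2_{0.95} with 3 df
var_interval = ((n - 1) * sample_var / chi2_upper,
                (n - 1) * sample_var / chi2_lower)

print(f"1(d): n_exact={n_exact:.2f}, n_required={n_required}")
print(f"4: pooled_var={pooled_var:.2f}, margin={margin:.2f}, "
      f"interval=({diff_interval[0]:.2f}, {diff_interval[1]:.2f})")
print(f"5: mean={sample_mean}, s^2={sample_var:.2f}, "
      f"interval=({var_interval[0]:.1f}, {var_interval[1]:.1f})")
```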
