MATH 10B DISCUSSION SECTION PROBLEMS 4/11 - SOLUTIONS
JAMES ROWAN

1. Review of normal distributions

(1) True/False practice:

(a) $f(x) = \frac{1}{3\sqrt{2\pi}} e^{-(x-4)^2/18}$ is a PDF for a random variable with mean 4 and standard deviation 3.

True. Recall that a normal random variable with mean $\mu$ and standard deviation $\sigma$ has PDF $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$. Here $\mu = 4$ and $2\sigma^2 = 18$, so $\sigma = 3$.

(2) (Stewart/Day 12.5.73) The normal distribution models dispersion along a one-dimensional habitat like a coastline. Suppose the mean dispersal distance is 0 meters and the variance is 2 square meters.

(a) What is the probability that an individual disperses more than 2 meters?

We have a normal random variable $X$ with mean 0 and variance 2, and want to find the probability $P(|X| \geq 2)$. By transforming $X$ into a standard normal variable (i.e. one with mean 0 and variance 1), we can use a z-score table to look up probabilities. We have that $Z = X/\sqrt{2}$ is a standard normal random variable, so
$P(|X| \geq 2) = P(|Z| \geq \sqrt{2}) = 1 - 2P(0 \leq Z \leq \sqrt{2})$.
Looking at our z-score table, we find that $P(0 \leq Z \leq \sqrt{2}) \approx 0.42$, so that $P(|X| \geq 2) \approx 0.16$.

(b) What is the probability that an individual disperses less than 1 meter?

We again have a normal random variable $X$ with mean 0 and variance 2, and now want to find the probability $P(|X| \leq 1)$. Using the same standard normal variable $Z = X/\sqrt{2}$,
$P(|X| \leq 1) = P\left(|Z| \leq \frac{\sqrt{2}}{2}\right) = 2P\left(0 \leq Z \leq \frac{\sqrt{2}}{2}\right)$.
Looking at our z-score table, we find that $P\left(0 \leq Z \leq \frac{\sqrt{2}}{2}\right) \approx 0.26$, so that $P(|X| \leq 1) \approx 0.52$.

2. Sampling

(3) True/False practice:

(a) If $X_1, X_2, \ldots, X_n$ are a sequence of i.i.d. random variables, then $g(x_1, x_2, \ldots, x_n) = x_1 - x_2 - \cdots \pm x_n$ is a statistic on these random variables.

True. A statistic is just a random variable defined in terms of the random variables $X_1, X_2, \ldots, X_n$.
This isn't a particularly interesting statistic, though, the way something like $\frac{X_1 + X_2 + \cdots + X_n}{n}$ would be.

(b) An unbiased statistic for estimating the variance of the i.i.d. random variables $X_1, X_2, \ldots, X_n$ is $\frac{1}{n}\sum_{k=1}^{n} (X_k - \bar{X})^2$, where $\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$.

False. This is a biased statistic; the unbiased statistic for estimating the variance of the i.i.d. random variables $X_1, X_2, \ldots, X_n$ is $\frac{1}{n-1}\sum_{k=1}^{n} (X_k - \bar{X})^2$, where $\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$.

(c) If $\hat{\mu}$ and $\hat{\sigma}$ are estimates, after sampling $n$ times, for the mean and standard deviation, respectively, of a random variable $X$ with unknown mean $\mu$, then we are 95% confident that $\mu$ lies in the interval $\left(\hat{\mu} - 2\frac{\hat{\sigma}}{\sqrt{n}}, \hat{\mu} + 2\frac{\hat{\sigma}}{\sqrt{n}}\right)$.

True. This is how we find a 95% confidence interval. This formula is derived from the fact that $(\bar{X} - \mu)/(\sigma/\sqrt{n})$ is approximately a standard normal random variable by the central limit theorem, and a standard normal $Z$ satisfies $P(|Z| \leq 2) \approx 0.95$. Note that we interpret a 95% confidence interval as "we are 95% sure that $\mu$ lies in this interval" and not "there is a 95% chance that $\mu$ lies in this interval", since $\mu$ is not random but is some fixed unknown quantity.

(4) (Stewart/Day 13.1.9) The age of onset of MS is well described by a normal random variable with unknown mean and standard deviation 7.6 years. The age of onset is measured for 32 individuals. Find the probability that the sample mean falls within 2 years of the true population mean.

We know that $\frac{\bar{X} - \mu}{7.6/\sqrt{32}}$ is approximately distributed like a standard normal random variable. We want to find $P(|\bar{X} - \mu| \leq 2)$, or, equivalently, since $2/(7.6/\sqrt{32}) \approx 1.49$,
$P\left(\left|\frac{\bar{X} - \mu}{7.6/\sqrt{32}}\right| \leq 1.49\right)$.
Looking at a table of z-scores, we see that $P\left(0 \leq \frac{\bar{X} - \mu}{7.6/\sqrt{32}} \leq 1.49\right) \approx 0.4319$, so $P(|\bar{X} - \mu| \leq 2) \approx 2(0.4319) = 0.8638$.

(5) (Stewart/Day 13.2.13) A sample of 13 female and 17 male volunteers had their HDL cholesterol levels measured, giving the following data:

Female: 46, 48, 49, 50, 51, 53, 53, 54, 54, 55, 56, 58, 60
Male: 29, 29, 30, 30, 31, 32, 34, 36, 41, 42, 43, 44, 46, 51, 52, 52, 55

Assume the HDL cholesterol levels for both groups are normally distributed.
Calculate the 95% confidence intervals for both groups and plot them on the same graph.

We calculate sample means of 52.85 for the female volunteers and 39.82 for the male volunteers. We calculate sample standard deviations $s^*$ of 3.84 for the female volunteers and 8.85 for the male volunteers. Since we have $n = 13$ female volunteers and $n = 17$ male volunteers, our 95% confidence interval for the HDL level of the female population is
$\left(52.85 - 2\frac{3.84}{\sqrt{13}}, 52.85 + 2\frac{3.84}{\sqrt{13}}\right) \approx (50.72, 54.98)$
and our 95% confidence interval for the HDL level of the male population is
$\left(39.82 - 2\frac{8.85}{\sqrt{17}}, 39.82 + 2\frac{8.85}{\sqrt{17}}\right) \approx (35.53, 44.11)$.
A plot of these two intervals would show that they do not overlap. We interpret the fact that these two 95% confidence intervals do not overlap as telling us that males and females probably have different average HDL cholesterol levels.

(6) (historical, challenge problem) Suppose Kiwibots have sequential serial numbers $1, 2, 3, \ldots, N$, where there are $N$ total Kiwibots in existence.

(a) Suppose that all $N$ Kiwibots are equally likely to be the one that delivers you food. Let $X$ be a random variable with outcome space the set of all Kiwibots, with $X(\omega)$ = the serial number of the Kiwibot $\omega$. What kind of random variable is $X$?

$X$ is a discrete uniform random variable on $\{1, 2, \ldots, N\}$.

(b) Suppose you get a food order from a Kiwibot and see that its serial number is 25. Estimate the mean of the random variable $X$ using this single sample $x_1 = 25$.

We can estimate that the Kiwibot we saw was the median of all Kiwibots, and since the median and the mean of a uniform distribution agree, we estimate the mean to be 25.

(c) Using your answer to part (b), estimate $N$ and explain why your estimator is good.

The mean of a discrete uniform distribution on $\{1, 2, \ldots, N\}$ is $\frac{N+1}{2}$. If we estimate this mean to be 25, then we estimate that $\frac{N+1}{2} = 25$, so we estimate that $N = 49$.

(d) Suppose now that you get a second food order from a Kiwibot and the two serial numbers you've seen are $x_1 = 25$ and $x_2 = 17$.
Estimate the mean of $X$ and then estimate $N$.

Using the idea from part (c), we estimate that the mean is $\frac{N+1}{2} = \frac{25 + 17}{2} = 21$, so we estimate that $N = 41$.

(e) (extra challenge) Suppose you've seen $n$ Kiwibots with serial numbers $x_1 < x_2 < \cdots < x_n$. Find a good estimator for $N$.

We could use $2 \cdot \frac{x_1 + x_2 + \cdots + x_n}{n} - 1$ as in parts (c) and (d) above, but this might give us an estimated $N$ smaller than $x_n$. The best estimator, in some sense, would be $(1 + n^{-1})x_n$, as it guarantees we estimate an $N$ that is bigger than $x_n$. For more information about this problem, search for the "German Tank Problem"; this approach was used during WWII to estimate German tank production from the serial numbers of captured German tanks.

3. Acknowledgments

All problems labeled "Stewart/Day" are taken from Day, Troy and Stewart, James. Biocalculus: Calculus, Probability, and Statistics for the Life Sciences. Cengage Learning, 2019.
All problems labeled "Rosen" are taken from Rosen, Kenneth H. Discrete Mathematics and its Applications. Eighth edition. McGraw Hill, 2019.
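As a numerical check on the table-based answers to problems (2), (4), (5), and (6), here is a minimal Python sketch. It assumes Python 3.8 or later and uses the standard library's `statistics.NormalDist` in place of a z-score table. Because its probabilities come from the exact normal CDF, they can differ from table lookups in the last digit; for problem (4), for example, it gives about 0.863. The helper names `ci95`, `n_from_mean`, and `n_from_max` are hypothetical, introduced only for this illustration. The confidence intervals reuse the sample means, sample standard deviations, and the factor of 2 quoted in problem (5).

```python
import math
from statistics import NormalDist  # Python 3.8+

# Standard normal distribution, used here instead of a z-score table.
Z = NormalDist(mu=0.0, sigma=1.0)

# Problem (2): X is normal with mean 0 and variance 2, so Z = X / sqrt(2).
sigma = math.sqrt(2)
p_more_than_2 = 2 * (1 - Z.cdf(2 / sigma))  # P(|X| >= 2)
p_less_than_1 = 2 * Z.cdf(1 / sigma) - 1    # P(|X| <= 1)

# Problem (4): the sample mean of 32 draws has standard error 7.6 / sqrt(32).
standard_error = 7.6 / math.sqrt(32)
p_within_2_years = 2 * Z.cdf(2 / standard_error) - 1  # P(|Xbar - mu| <= 2)

# Helper names below are illustrative (hypothetical), not taken from the handout.
def ci95(mean, sd, n):
    """Approximate 95% confidence interval mean +/- 2 * sd / sqrt(n), rounded to 2 places."""
    half_width = 2 * sd / math.sqrt(n)
    return (round(mean - half_width, 2), round(mean + half_width, 2))

# Problem (5): summary statistics as quoted in the solution.
female_ci = ci95(52.85, 3.84, 13)
male_ci = ci95(39.82, 8.85, 17)

def n_from_mean(serials):
    """Solve (N + 1) / 2 = sample mean for N, as in parts (c) and (d) of problem (6)."""
    return 2 * sum(serials) / len(serials) - 1

def n_from_max(serials):
    """Estimate N as (1 + 1/n) * x_n, where x_n is the largest serial number seen (part (e))."""
    return (1 + 1 / len(serials)) * max(serials)

print(f"(2a) {p_more_than_2:.2f}  (2b) {p_less_than_1:.2f}")
print(f"(4) {p_within_2_years:.4f}")
print(f"(5) female {female_ci}, male {male_ci}")
print(f"(6) N from mean: {n_from_mean([25])}, {n_from_mean([25, 17])}; N from max: {n_from_max([25, 17])}")
```

The two estimators in problem (6) can disagree in the way part (e) warns about. With serial numbers 1, 2, and 30, the mean-based estimate is $2 \cdot 11 - 1 = 21$, which is smaller than the largest serial number actually seen, while $(1 + 1/3) \cdot 30 = 40$ is not.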