Midterm 2 Review: Extra Problems

1 Check Your Understanding

a. (True/False) Using the same random sample from a population, you compute both a 95% confidence interval and a 99% confidence interval for a population proportion. The 99% confidence interval will always be wider than the 95% confidence interval.

b. (True/False) A 90% CI for a proportion is (.442, .542). The sample proportion is .482.

c. (True/False) A 90% CI for a proportion is (.442, .542). The margin of error is .10.

d. (True/False) A 90% CI for a proportion is (.442, .542). The 90 percent confidence level means that if the experiment or study were repeated many times and we constructed many 90 percent CIs, we would expect 90 percent of the generated 90% CIs to contain the population proportion.

e. (True/False) Every has a distribution.

f. (True/False) A t-statistic for a test for one mean tells you how far the sample mean is from the hypothesized mean value in terms of standard errors of the sample mean.

g. (True/False) A p-value is the probability the null hypothesis is true.

h. (True/False) The significance level of a hypothesis test is the probability of incorrectly rejecting the null hypothesis.

i. (True/False) The power can be increased by either increasing the sample size or decreasing the significance level or both.

j. (True/False) A null hypothesis is rejected only when the p-value is less than the significance level.

k. (True/False) The p-value for a one-sided test (either direction) can be found based on a two-sided p-value.

l. (True/False) The test statistic for testing for a difference in two proportions has a normal distribution.

m. (True/False) The test statistic for testing for a difference in two proportions involves an estimate of the common proportion, which is formed by pooling together the two sample proportions.

n. (True/False) The standard error of the sample mean estimates roughly how far the observed observations will differ from the sample mean for repeated samples of the same size.

o. (True/False) The test statistic for testing about one population mean has a .
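Items a through d all depend on how a large-sample CI for a proportion is built. The sketch below assumes the usual Wald interval p̂ ± z*·√(p̂(1 − p̂)/n). The population proportion 0.5, the sample sizes, and the counts used in the usage lines are hypothetical and chosen only for illustration; they are not data from this review. The code does three things:

- recovers the point estimate and margin of error from an interval's endpoints;
- compares 95% and 99% widths computed from a single sample;
- estimates by simulation how often 90% intervals capture the true proportion.

```python
import math
import random
from statistics import NormalDist

def wald_ci(successes, n, conf):
    """Large-sample CI for a proportion: p_hat +/- z* sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # e.g. about 1.645 for 90%
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def estimate_and_margin(lower, upper):
    """Point estimate (midpoint) and margin of error (half-width) of a symmetric CI."""
    return (lower + upper) / 2, (upper - lower) / 2

def coverage(p_true, n, conf, reps, seed=1):
    """Simulate repeated studies; return the fraction of CIs that contain p_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        successes = sum(rng.random() < p_true for _ in range(n))
        lower, upper = wald_ci(successes, n, conf)
        hits += lower <= p_true <= upper
    return hits / reps

estimate, margin = estimate_and_margin(0.442, 0.542)
print(f"estimate = {estimate:.3f}, margin of error = {margin:.3f}")

# Same (hypothetical) sample, two confidence levels: compare the widths
lo95, hi95 = wald_ci(150, 300, 0.95)
lo99, hi99 = wald_ci(150, 300, 0.99)
print(f"95% width = {hi95 - lo95:.3f}, 99% width = {hi99 - lo99:.3f}")

# Long-run behavior of 90% intervals for a hypothetical population with p = 0.5
cov90 = coverage(0.5, 400, 0.90, 2000)
print(f"fraction of simulated 90% CIs containing p: {cov90:.3f}")
```

The simulated fraction should land near, but not exactly at, 0.90. This is the long-run sense in which "90% confidence" is meant.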

2 Name That Scenario

One of the primary challenges in hypothesis testing and making confidence intervals is determining the parameter of interest in a study or experiment. So far, we have covered five scenarios. The corresponding parameters are p, µ, µ1 − µ2, µd, and p1 − p2. A researcher has decided to investigate attributes of Amherst students for a study on liberal arts college students. He has identified the following possible questions for his study, but doesn't know the relevant parameters or how to set up his hypotheses. For each, determine the appropriate parameter and set up hypotheses (there is enough information in each case to form hypotheses).

1. Looking at spending habits, how much money do students spend on average each week on entertainment? Is it more than $30 a week on average?

2. Looking at average spending in a week, do males spend more on entertainment than females?

3. Do more than 40 percent of Amherst students own an iPod (or other MP3 player)?

4. In an average week, how many hours of TV/movies/online “stuff”, etc. do students watch? Is it different from 10 hours?

5. Does the amount of TV/movie/online watching decrease during time at Amherst? Assume you obtain a random sample of seniors and they estimate the average number of hours of TV/movies/online “stuff” watched per week during their freshman year and senior year.

6. Do upperclassmen and lowerclassmen use Facebook at equal rates? (Assume the question you ask is: “Do you use Facebook more than 5 times a day?”)

3 Hypothesis Test or Confidence Interval

Once a parameter has been identified, not every situation calls for a hypothesis test. In fact, some argue you should always report a confidence interval, even with hypothesis tests. In the cases above, the questions were phrased so that hypothesis testing was warranted. However, CIs are useful estimation tools. Determine the relevant parameter, choose CI or hypothesis test as most appropriate, and justify your choice for each of the following situations.

1. How much more money per week do males spend on food compared to females on average?

2. Do more than 60 percent of Amherst students own a personal computer?

3. Do students get more than 8 hours of sleep a night on average in their freshman year?

4. What percentage of Amherst students are internet addicts?

4 Review Short Answer

1. What is the general form for a confidence interval?

2. In a hypothesis test for a population mean, you compute the standard error of x̄ as 4.83 for a sample of 236. Interpret the standard error.

3. A two-sided t-test for µ = 150 results in a negative test statistic and a p-value of .16. What would the p-value have been if:

a. The alternative had been one-sided looking for µ greater than 150?

b. The alternative had been one-sided looking for µ less than 150?

4. How are t distributions similar to or different from normal distributions? (List at least 3 similarities or differences.)

5. Quick answer:

a. What is the name of the rule used to determine whether the null hypothesis is rejected when comparing the p-value to a significance level?

b. What confidence level is equivalent to a one-sided hypothesis test at a significance level of .03?

c. When you do not reject the null hypothesis, can you conclude that your result was statistically significant?

d. Give an example where it would NOT be appropriate to use statistical inference (a test or CI) to generalize results back to the population.

e. Under what pseudonym did Gosset publish the article discussing the t distribution?
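Item 3 of this list depends on the symmetry of the null distribution. A two-sided p-value splits equally between the two tails, so a one-sided p-value is either half of it or one minus that half. Which one applies depends on whether the statistic falls on the side the alternative points to. Below is a minimal sketch that assumes a null distribution symmetric about 0, as with t and standard normal statistics. The function name `one_sided_p` is illustrative and is not from any library.

```python
def one_sided_p(two_sided_p, test_stat, alternative):
    """Convert a two-sided p-value into a one-sided p-value.

    Assumes the null distribution of the statistic is symmetric about 0
    (t or standard normal). `alternative` is "greater" or "less".
    """
    if alternative not in ("greater", "less"):
        raise ValueError('alternative must be "greater" or "less"')
    half = two_sided_p / 2
    # Does the observed statistic fall on the side the alternative points to?
    points_toward_ha = test_stat > 0 if alternative == "greater" else test_stat < 0
    # If so, the one-sided p-value is one tail; otherwise it is everything else.
    return half if points_toward_ha else 1 - half

# Only the sign of the statistic matters, so any negative value works here.
p_greater = one_sided_p(0.16, -1.0, "greater")
p_less = one_sided_p(0.16, -1.0, "less")
print(p_greater, p_less)
```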

5 Air Pollution Reduction - from Moore, McCabe, and Craig

Recall that we have already considered some data from this study. Study details: residents in two areas had their wheezing symptoms compared after a bypass was constructed in one of the two areas to remove congestion. The data were collected some time after the bypass was constructed, in order to assess the impact of the reduction in air pollution that resulted from the bypass. The wheezing data collected were the one-year change in symptoms: residents reported the symptoms they had a year earlier, and the number who had improved conditions in the year since was recorded. When trying to determine whether the bypass resulted in a reduction of wheezing symptoms, one proportion to consider is simply the proportion of people who reported an improvement (i.e., a reduction) in the number of wheezing attacks they suffered. In the bypass area, 45 out of 282 people reported an improvement, and in the congested area, 21 out of 163 people reported an improvement. For the purposes of this exercise, assume the samples taken were random samples. Is there significant evidence to conclude that the bypass area had an improvement in number of wheezing attacks relative to the congested area? Perform a hypothesis test and report your conclusion, as well as an interpretation of your p-value.
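Here is a minimal sketch of the pooled two-proportion z-test (the pooling described in item m of Section 1). It takes group 1 as the bypass area and group 2 as the congested area, with the one-sided alternative p1 > p2, meaning a larger proportion improved in the bypass area. It assumes the large-sample normal approximation is reasonable. Under a common rule of thumb it is: the success and failure counts (45, 237, 21, 142) are all at least 10. The helper name `two_prop_ztest` is illustrative.

```python
import math
from statistics import NormalDist

def two_prop_ztest(x1, n1, x2, n2):
    """Pooled two-proportion z-test of H0: p1 = p2 against Ha: p1 > p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # common proportion estimated under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 1 - NormalDist().cdf(z)  # upper-tail area for Ha: p1 > p2
    return z, p_value

# Group 1 = bypass area, group 2 = congested area (counts from the study description)
z, p = two_prop_ztest(45, 282, 21, 163)
print(f"z = {z:.3f}, one-sided p-value = {p:.3f}")
```

The p-value is the probability of seeing a difference in sample proportions at least this large if the two population proportions were really equal.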

6 Fumonisin - Toxin in Corn (data from FDA)

Fumonisins are toxins that come from mold and have been found in corn and associated corn products. Two different corn meal types had their fumonisin levels compared. For partially degermed corn meal, the mean fumonisin level was .59 with a standard deviation of 1.01. For not degermed corn meal, the mean fumonisin level was 1.21 with a standard deviation of 1.71. Assume the samples of each corn meal type were random samples of size 50.

a. For partially degermed corn meal, assume the population mean fumonisin level is really .5 with a population standard deviation of 1. What is the probability of obtaining a sample mean of .59 for a random sample of size 50? What result was useful in determining this probability?

b. What is the standard error for the sample mean for partially degermed corn meal? Provide an interpretation of this standard error.

c. Using the sample data from both the partially degermed and not degermed corn meal, provide a 95% confidence interval for the difference in population mean amounts of fumonisin in each corn meal type. Be sure to comment on any assumptions/conditions. (df=79)

d. Interpret your confidence interval in the context of the problem.

e. What is the probability that your interval contains the difference in sample means?

f. What is the probability that your interval contains the difference in population means?

g. Are you able to conclude that partially degermed corn meal has a mean fumonisin level that is .3 below the mean fumonisin level of not degermed corn meal? Explain. Set this up as a hypothesis test, but use your confidence interval to perform the test. Be sure to report the significance level for which your conclusion is valid.
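The calculations behind parts (a), (b), (c), and (g) can be sketched in Python with only the standard library. Several assumptions are built in:

- Part (a) is read as P(x̄ ≥ .59). A continuous sampling distribution gives probability zero to any single exact value, so an exact-value reading would not be meaningful.
- The interval uses the unpooled (Welch) standard error. This is consistent with the stated df = 79, since the Welch formula gives about 79.5, which rounds down to 79.
- The critical value 1.990 is an approximate t-table entry rather than a computed quantile.
- Part (g) relies on the usual duality between a 95% interval and a two-sided test at α = .05.

```python
import math
from statistics import NormalDist

# Summary statistics from the problem statement
n = 50
m_partial, s_partial = 0.59, 1.01   # partially degermed
m_not, s_not = 1.21, 1.71           # not degermed

# (a) Assumed population values mu = 0.5, sigma = 1. By the Central Limit
# Theorem, x-bar is approximately Normal(mu, sigma / sqrt(n)) for n = 50.
mu, sigma = 0.5, 1.0
sd_xbar = sigma / math.sqrt(n)
p_at_least = 1 - NormalDist(mu, sd_xbar).cdf(m_partial)   # P(x-bar >= 0.59)

# (b) Estimated standard error of the partially degermed sample mean
se_partial = s_partial / math.sqrt(n)

# (c) Unpooled (Welch) 95% CI for mu_partial - mu_not
v1, v2 = s_partial**2 / n, s_not**2 / n
se_diff = math.sqrt(v1 + v2)
welch_df = (v1 + v2) ** 2 / (v1**2 / (n - 1) + v2**2 / (n - 1))
T_STAR = 1.990   # approximate 97.5th percentile of t with 79 df (t table)
diff = m_partial - m_not
ci = (diff - T_STAR * se_diff, diff + T_STAR * se_diff)

# (g) Two-sided test of H0: mu_partial - mu_not = -0.3 at alpha = 0.05,
# carried out by checking whether -0.3 lies inside the 95% interval
null_value = -0.3
reject_h0 = not (ci[0] <= null_value <= ci[1])

print(f"(a) P(x-bar >= 0.59) = {p_at_least:.3f}")
print(f"(b) SE = {se_partial:.4f}")
print(f"(c) Welch df = {welch_df:.1f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"(g) reject H0 at alpha = 0.05? {reject_h0}")
```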

7 Water Temperatures - For Fish

Water temperatures are known to be especially important for fish species and their reproduction. One particular species that is very susceptible to temperature changes is the bull trout (found in Idaho). The EPA has issued special temperature regulations for bull trout breeding and juvenile grounds, which require the maximum temperature to be no higher than 10 degrees C (cold compared to most other fish species). Because attaining the 10 degree C restriction imposes extreme constraints, some research has attempted to show that juvenile bull trout actually do well at higher temperatures. A report by the Idaho Division of Environmental Quality states that a better temperature might be 12-14 degrees C. Suppose you are interested in investigating this and take a field trip to Idaho to find juvenile bull trout and determine the water temperature at their locations. You manage to find 40 juvenile bull trout (assume it is a random sample) and take 40 corresponding water temperatures. The average water temperature is 12.3 degrees C with a standard deviation of .7 degrees C. Is there evidence that the average temperature of juvenile bull trout locations is greater than 11 degrees C?
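Here is a sketch of the one-sample t-test for the bull trout question. It assumes the 40 readings are a random sample and uses the one-sided alternative µ > 11. The critical value is an approximate t-table entry, which is an assumption of this sketch. With a statistic this large, the p-value is far below any usual significance level, so the exact tabled value matters little.

```python
import math

# Summary statistics from the field trip (assumed random sample)
n, xbar, s = 40, 12.3, 0.7
mu0 = 11.0                      # H0: mu = 11 versus Ha: mu > 11

se = s / math.sqrt(n)           # estimated standard error of x-bar
t_stat = (xbar - mu0) / se      # roughly 11.7 standard errors above 11
df = n - 1

T_CRIT = 1.685                  # approximate upper 5% point of t with 39 df (t table)
reject_h0 = t_stat > T_CRIT
print(f"t = {t_stat:.2f} on {df} df; reject H0 at alpha = 0.05: {reject_h0}")
```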

Do your findings imply juvenile bull trout do better at higher temperatures? How could you determine the temperatures at which bull trout do better, and what issues might you encounter performing that research?

8 Inositol or Placebo

A 1995 article in the American Journal of Psychiatry (Benjamin, J. et al.) described a double-blind experiment in which 21 patients with panic disorder were treated for one week with a placebo and for one week with the drug Inositol. Each patient recorded their number of panic attacks for each week. The data are listed in Inositol.txt online.

a. What does it mean to say this was a double-blind experiment, and why is that important in this application?

b. Is the data paired or two independent samples? How can you tell?

c. Perform a preliminary data analysis and describe your findings.

d. The researchers want to know whether or not Inositol decreased the number of panic attacks suffered by the patients on average. Determine (and define) an appropriate parameter and set hypotheses.

e. What assumptions need to hold in order for the test corresponding to your hypotheses to be valid? Check your assumptions and comment on their validity.

f. Perform the appropriate test. What are your test statistic and p-value?

g. Interpret your p-value in the context of the problem.

h. What conclusion do you reach about the effectiveness of Inositol in decreasing panic attack occurrences on average compared to placebo? Are there concerns about the validity of the conclusion based on assumption violations?
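For parts (d) through (f), a paired t statistic can be computed as sketched below. The layout of Inositol.txt is not described here, so the sketch assumes the two weekly counts are already in two Python lists in matching patient order. Two further choices are specific to this sketch: the function name `paired_t`, and the direction of the difference. Differences are taken as placebo minus Inositol, so a positive mean difference means fewer attacks on Inositol. The p-value is left to a t table or statistical software, using n − 1 degrees of freedom.

```python
import math
from statistics import mean, stdev

def paired_t(placebo, inositol):
    """Paired t statistic for d = placebo - inositol, testing H0: mu_d = 0.

    Returns (t statistic, degrees of freedom). Under this difference
    direction, "Inositol reduces attacks" corresponds to Ha: mu_d > 0.
    """
    if len(placebo) != len(inositol):
        raise ValueError("paired data need equal-length lists")
    diffs = [p - i for p, i in zip(placebo, inositol)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1
```

For example, `t_stat, df = paired_t(placebo_counts, inositol_counts)` after loading the 21 pairs. The names `placebo_counts` and `inositol_counts` are hypothetical. Working with the differences is what makes this a one-sample problem on µd.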
