Biometry 333 Final Exam Spring 2005

Show all work for full credit. Ask the instructor if any question is unclear. You are allowed one page of notes.

(1)(5pts) Which best describes a 95% confidence interval? (i) An interval calculated from sampled data such that it contains 95% of the sampled data. (ii) An interval calculated from sampled data such that it contains 95% of the population values. (iii) An interval calculated from the data such that if the process were done repeatedly, about 95% of the intervals would capture the parameter’s (population’s) true value. (iv) An interval calculated from the data such that if the process were done repeatedly, about 95% of them would reject the null hypothesis.

(2)(5pts) Which best describes the key part of the central limit theorem? (i) As the sample size increases, the population becomes approximately normal. (ii) As the sample size increases, the distribution of the sample means becomes more normally distributed. (iii) As the sample size increases, the distribution of the sample becomes approximately normal. (iv) As the sample size increases, the population variance decreases.

(3)(5pts) P-value is short for “probability value”. Which statement best describes what probability the p-value is describing? (i) It is the probability of the null hypothesis being true. (ii) It is the probability of getting a test statistic as extreme or more extreme than the test statistic obtained from the data if the null hypothesis were true. (iii) It is the probability of getting the test statistic that we got from the data. (iv) It is the probability of a type I error.

(4) Let X be a normal random variable with mean 50 and standard deviation 5. Suppose 16 independent values of X are averaged to get $\bar{X} = \frac{1}{16}\sum_{i=1}^{16} X_i$. Show all work or describe the Minitab/R commands used.

(4a. 2pts) Find $P(45 < X < 47.5)$.

(4b. 2pts) Find the 75th percentile of X; i.e., $P(X < ?) = 0.75$.

(4c. 2pts) Find $P(X < 47.5)$.
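A minimal sketch of the question (4) calculations using Python's standard-library `statistics.NormalDist`. Because the extraction dropped any overbars, it is not certain whether each part refers to a single X or to the mean $\bar{X}$ of 16 values, so the sketch evaluates both: X ~ N(50, 5) and $\bar{X}$ ~ N(50, 5/√16 = 1.25). In R, `pnorm(47.5, 50, 5)` and `qnorm(0.75, 50, 5)` play the same roles (swap 5 for 1.25 for the mean).

```python
from statistics import NormalDist

# Question (4): X ~ Normal(mean 50, sd 5). The average of n = 16 independent
# values has the same mean and standard deviation 5 / sqrt(16) = 1.25.
mu, sigma, n = 50.0, 5.0, 16
dists = {"X": NormalDist(mu, sigma), "Xbar": NormalDist(mu, sigma / n ** 0.5)}

results = {}
for label, dist in dists.items():
    p_4a = dist.cdf(47.5) - dist.cdf(45.0)   # P(45 < . < 47.5)
    q_4b = dist.inv_cdf(0.75)                # 75th percentile
    p_4c = dist.cdf(47.5)                    # P(. < 47.5)
    results[label] = (p_4a, q_4b, p_4c)
    print(f"{label:4s}  4a: {p_4a:.4f}  4b: {q_4b:.3f}  4c: {p_4c:.4f}")
```

The only difference between the two rows is the standard deviation used, which is the point of averaging.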

(5) Below is an investigation of the student data to see if gender and handedness (left or right) are independent.

Chi-Square Test: left, right

Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

         left   right   Total
female      4      29      33
         BBBB   xxxxx
        0.174   0.031

male        6     AAA      34
         yyyy   28.93
        0.169   0.030

Total      10      57      67

Chi-Sq = 0.403, DF = 1, P-Value = ????
1 cells with expected counts less than 5.

(5a. 2pts) What was the observed number of right handed males (AAA)?

(5b. 2pts) What was the expected number of left handed females (BBBB)?

(5c. 2pts) Below “BBBB” is 0.174, the chi-square contribution for that cell. Show how 0.174 was calculated.

(5d. 2pts) Use the chi-square statistic of 0.403 to find the p-value (????).
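A short sketch of the arithmetic behind (5a) to (5d), in Python with the standard library only: each expected count is row total × column total / grand total, each cell's contribution is (O - E)²/E, and because a chi-square variable with 1 degree of freedom is the square of a standard normal, its upper-tail probability is erfc(√(x/2)). The unprinted AAA is derived here from the column total rather than read from the output. In R, `pchisq(0.403, df = 1, lower.tail = FALSE)` should give the same p-value.

```python
import math

# Observed counts from question (5). AAA (right-handed males) is not printed,
# so it is recovered from the right-hand column total: 57 - 29.
observed = {
    ("female", "left"): 4, ("female", "right"): 29,
    ("male", "left"): 6, ("male", "right"): 57 - 29,
}
row_totals = {"female": 33, "male": 34}
col_totals = {"left": 10, "right": 57}
grand_total = 67

expected, contributions = {}, {}
for (row, col), obs in observed.items():
    exp_count = row_totals[row] * col_totals[col] / grand_total  # E = row * column / n
    expected[(row, col)] = exp_count
    contributions[(row, col)] = (obs - exp_count) ** 2 / exp_count  # (O - E)^2 / E

chi_sq = sum(contributions.values())
# Chi-square with 1 df is Z^2, so P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_sq / 2))

for cell in observed:
    print(cell, f"E = {expected[cell]:.3f}", f"contribution = {contributions[cell]:.3f}")
print(f"Chi-Sq = {chi_sq:.3f}, P-Value = {p_value:.3f}")
```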

(6. 6pts) Six right handed men lifted weights (pounds) with their left and right arms. Perform a statistical test to see if the right arm is any stronger than the left arm. Assume normality. (Hint: This is Biom109 review material.) Show all work or describe Minitab/R commands. Give a test statistic, p-value, and conclusion as your answer.

left  right
  38     41
  55     63
  37     41
  25     25
  70     68
  61     71
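One way to carry out this test, sketched in Python under the question's normality assumption, is a one-sided paired t-test on the differences right - left: the same six men used both arms, so the two columns are not independent samples. Python's standard library has no t distribution, so the helper `t_sf` (our own hypothetical name, not a library routine) integrates the t density with Simpson's rule. In R, `t.test(right, left, paired = TRUE, alternative = "greater")` would be the usual one-liner.

```python
import math
from statistics import mean, stdev

def t_sf(t, df, steps=2000):
    """P(T > t) for Student's t with df degrees of freedom (hand-rolled).
    Simpson's rule on the density between 0 and t; steps must be even."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    def density(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)
    h = t / steps
    total = density(0.0) + density(t)
    total += sum((4 if k % 2 else 2) * density(k * h) for k in range(1, steps))
    return 0.5 - total * h / 3

left = [38, 55, 37, 25, 70, 61]
right = [41, 63, 41, 25, 68, 71]

# Paired design: analyze the within-man differences right - left.
diffs = [r - l for l, r in zip(left, right)]
n = len(diffs)
t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))

# H0: mean difference = 0 versus H1: mean difference > 0 (right arm stronger).
p_one_sided = t_sf(t_stat, n - 1)
print(f"mean diff = {mean(diffs):.3f}, t = {t_stat:.3f}, df = {n - 1}, p = {p_one_sided:.4f}")
```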

(7) The following is a residual plot for the model y = b0 + b1 x. The regression equation is y = -6.24 + 10.9 x.

[Figure: Residuals Versus the Fitted Values (response is y); horizontal axis: Fitted Value; vertical axis: Residual]

(7a. 4pts) What is the fitted value and squared residual for the (x,y) point (4.45, 63.11)?
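The arithmetic for (7a), written out as a small Python snippet: the fitted value comes from plugging x = 4.45 into the reported regression equation, and the residual is the observed y minus that fitted value.

```python
# Question (7a): plug x into the reported fit y-hat = -6.24 + 10.9 x.
b0, b1 = -6.24, 10.9
x, y = 4.45, 63.11

fitted = b0 + b1 * x          # predicted y at x = 4.45
residual = y - fitted         # observed minus fitted
squared_residual = residual ** 2
print(f"fitted = {fitted:.3f}, residual = {residual:.3f}, squared = {squared_residual:.2f}")
```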

(7b. 4pts) What model would you suggest be tried next? Why?

(7c. 4pts) Name 3 assumptions about the distribution of the errors (residuals?)

(8) The continuous variables x1, x2, and x3 were investigated for their association with the y response variable. Use the following Minitab ANOVA table for the general linear model y = b0 + b1x1 + b2x2 + b3x3 to answer the questions.

General Linear Model: y versus

Factor  Type  Levels  Values

Analysis of Variance for y, using Adjusted SS for Tests

Source  DF   Seq SS   Adj SS   Adj MS     F     P
x1       1   112.89   229.83   229.83  XXXX  YYYY
x2       1    19.77     4.49     4.49  ----  ----
x3       1   705.17   705.17   705.17  ----  ----
Error   37  1853.01  1853.01    50.08
Total   40  2690.84

S = ZZZZZZ R-Sq = 31.14% R-Sq(adj) = 25.55%

(8a. 8pts) Use the above table to calculate the residual sum of squares for the following four models. Show your work or explain what part of the table gave you the result. (Hint: this is the reverse of the quiz problem; use the sequential sums of squares.)

Model                          residual SS
y = b0                         ? =
y = b0 + b1x1                  ? =
y = b0 + b1x1 + b2x2           ? =
y = b0 + b1x1 + b2x2 + b3x3    ? =

(8b. 4pts) Calculate the F-statistic (XXXX) and p-value (YYYY) for the x1 variable. Use the adjusted sum of squares information. Show your work and explain which commands gave you the p-value.

XXXX =
YYYY =

(8c. 4pts) Which variable is responsible for the adjusted R-square being smaller than the R-square statistic? Explain why you chose that variable.

(8d. 4pts) Estimate the standard deviation for the noise term. ZZZZZZ=?
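A sketch of the bookkeeping behind (8a), (8b), and (8d) in Python. It assumes the sequential sums of squares were accumulated in the listed order x1, x2, x3, so each added term lowers the residual SS by its Seq SS, starting from the total SS of the intercept-only model. For (8b), an F statistic with 1 numerator degree of freedom is the square of a t statistic, so the p-value is computed as a two-sided t tail area through a hand-rolled Simpson-rule helper `t_sf` (a hypothetical name, not a library function); in R, `pf(F, 1, 37, lower.tail = FALSE)` would do this directly.

```python
import math

def t_sf(t, df, steps=2000):
    """P(T > t) for Student's t with df degrees of freedom (hand-rolled).
    Simpson's rule on the density between 0 and t; steps must be even."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    def density(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)
    h = t / steps
    total = density(0.0) + density(t)
    total += sum((4 if k % 2 else 2) * density(k * h) for k in range(1, steps))
    return 0.5 - total * h / 3

# Values read from the Minitab table in question (8).
total_ss = 2690.84                                   # residual SS of y = b0
seq_ss = [("x1", 112.89), ("x2", 19.77), ("x3", 705.17)]
adj_ms_x1, mse, error_df = 229.83, 50.08, 37

# (8a) Adding each term, in order, removes its sequential SS from the residual SS.
rss = [total_ss]
for _, ss in seq_ss:
    rss.append(rss[-1] - ss)

# (8b) F = Adj MS / MSE on (1, 37) df; F(1, df) = t(df)^2, so the
# upper-tail F probability equals a two-sided t tail area.
f_x1 = adj_ms_x1 / mse
p_x1 = 2 * t_sf(math.sqrt(f_x1), error_df)

# (8d) The noise standard deviation is estimated by S = sqrt(MSE).
s = math.sqrt(mse)

for label, value in zip(["y = b0", "+ b1x1", "+ b2x2", "+ b3x3"], rss):
    print(f"{label:8s} residual SS = {value:.2f}")
print(f"F = {f_x1:.3f}, p = {p_x1:.4f}, S = {s:.3f}")
```

The last residual SS should reproduce the Error line of the table, which is a useful check on the subtraction.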

(9) The student data (from Mark Rizzardi’s data web page) were analyzed to explain height using sex and shoe size. Shoe size was treated as a continuous variable and an interaction was included in the model.

General Linear Model: height versus sex

Factor  Type   Levels  Values
sex     fixed       2  female, male

Analysis of Variance for height, using Sequential SS for Tests

Source    DF   Seq SS   Adj SS   Seq MS       F      P
sex        1   508.55     4.04   508.55  105.77  0.000
shoe       1   281.66   277.35   281.66   58.58  0.000
sex*shoe   1    11.85    11.85    11.85    2.47  0.121
Error     65   312.54   312.54     4.81
Total     68  1114.61

S = 2.19278 R-Sq = 71.96% R-Sq(adj) = 70.67%

Term                 Coef  SE Coef      T      P
Constant           55.441    1.722  32.19  0.000
sex female          1.578    1.722   0.92  0.363
shoe               1.3633   0.1795   XXXX   YYYY
shoe*sex female   -0.2818   0.1795  -1.57  0.121

(9a. 4pts) Why would it not be sensible to use the adjusted sum of squares to calculate the F-statistics for the individual variables?

(9b. 4pts) Is the interaction between shoe size and sex statistically significant? Explain how you reached your conclusion.

(9c. 4pts) Write the equation for the line which describes the expected height of a female for a given shoe size. Show your work and use the equation form: y= a + bx.

(9d. 3pts) Write the equation for the line which describes the expected height of a male for a given shoe size. Show your work and use the equation form: y= a + bx.

(9e. 4pts) Calculate the t-statistic and p-value for the shoe main effect variable. XXXX= YYYY=
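A sketch for (9c) to (9e) in Python, assuming Minitab's -1/+1 (effects) coding for the sex factor, so a female gets +Coef and a male gets -Coef for both the sex and shoe*sex terms. The equal standard errors reported for Constant and sex female (1.722), and for shoe and shoe*sex female (0.1795), are consistent with that coding, but check the parameterization your own software uses. The t-statistic is Coef / SE Coef on the 65 error degrees of freedom; the two-sided p-value uses the same hand-rolled Simpson helper `t_sf` (a hypothetical name), where R users would write `2 * pt(-abs(t), 65)`.

```python
import math

def t_sf(t, df, steps=2000):
    """P(T > t) for Student's t with df degrees of freedom (hand-rolled).
    Simpson's rule on the density between 0 and t; steps must be even."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    def density(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)
    h = t / steps
    total = density(0.0) + density(t)
    total += sum((4 if k % 2 else 2) * density(k * h) for k in range(1, steps))
    return 0.5 - total * h / 3

# Coefficients from the Minitab output in question (9).
const, sex_female = 55.441, 1.578
shoe, shoe_sex_female = 1.3633, -0.2818
se_shoe, error_df = 0.1795, 65

# Assumed effects coding: sex = +1 for female, -1 for male.
female_a, female_b = const + sex_female, shoe + shoe_sex_female
male_a, male_b = const - sex_female, shoe - shoe_sex_female

# (9e) t = Coef / SE Coef, two-sided p-value on the error df.
t_shoe = shoe / se_shoe
p_shoe = 2 * t_sf(abs(t_shoe), error_df)

print(f"female: height = {female_a:.3f} + {female_b:.4f} * shoe")
print(f"male:   height = {male_a:.3f} + {male_b:.4f} * shoe")
print(f"shoe: t = {t_shoe:.2f}, p = {p_shoe:.3g}")
```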

(10. 5pts) Suppose you are going to model the thickness of fur (mm) of gray foxes using the variables height (continuous), sex (2-level factor), and season (4-level factor). You will not fit any interactions and you will assume normally distributed errors. Using Greek letters (alphas, betas, gammas, epsilons, etc.) along with “fur” (response), “height”, “sex”, and “season”, write a model describing fur thickness. Include an error term. Use [ ] brackets for the factors. (Nothing tricky here! You should know how to do this from seeing this done frequently on the white board to describe a model.)
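One way the requested model might be written, sketched in the bracket notation the question describes (the particular Greek letters are our choice): α is the intercept, β the slope for height, γ[sex] takes one of 2 values and δ[season] one of 4, and ε is the normal error term. In practice one level of each factor is constrained (for example, effects summing to zero) so that the parameters are identifiable.

```latex
\text{fur} = \alpha + \beta\,\text{height} + \gamma[\text{sex}] + \delta[\text{season}] + \varepsilon,
\qquad \varepsilon \sim N(0, \sigma^{2})
```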

(11) An experiment was carried out where 20 amateur male runners were randomly selected from a list of 1000 runners who had registered to race in a 10 kilometer running event. The 20 runners were broken down into two groups of 10. One group was given VitaminX to take for the next year and the other group was given VitaminZ. The runners then were to record their times during six 10k races on a track over the course of a year. The data were then analyzed to determine which Vitamin was better.

(11a. 2pts) What is the fixed factor, if any, for this experiment? Explain why it is fixed.

(11b. 3pts) What is the random factor, if any, for this experiment? Explain why it is random.

(12. 4pts) Suppose your response variables are either proportions or yes/no outcomes. You are interested in how some explanatory variables (x1, x2, etc) are associated with the outcomes. Specifically suggest a type of regression that would be very appropriate for such data.
