Regression Analysis: Adjusted Versus Unadjusted

COR1-GB.1305.03
FINAL EXAM

This is the question sheet. There are 10 questions, each worth 10 points. Please write all answers in the answer book, and justify your answers. Good Luck!

For questions 1 to 4, we consider data on state reading/math tests for the 10 most populous states in the US in 2013, based on 4th and 8th grades, combined.

The explanatory variable (x) is “Unadjusted”, the number of months ahead of the national average for the given state based on raw (unadjusted) scores on NAEP tests (National Assessment of Educational Progress). The states with the highest unadjusted scores are Ohio and Pennsylvania.

The response variable (y) is “Adjusted”, the number of months ahead of the national average for the given state based on test scores that are adjusted for student demographic factors such as poverty and share of students in special education.

1) Figure 1 below provides a scatterplot of Adjusted vs. Unadjusted, followed by the Minitab output from a simple regression of Adjusted on Unadjusted.

Regression Analysis: Adjusted versus Unadjusted

Analysis of Variance

Source      DF      SS      MS  F-Value  P-Value
Regression   1   55.27  55.269     5.82    0.042
Error        8   75.92   9.490
Total        9  131.19

Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
3.08050  42.13%     34.90%       8.18%

Coefficients

Term         Coef  SE Coef  T-Value  P-Value
Constant    0.502    0.974     0.52    0.620
Unadjusted  0.746    0.309     2.41    0.042

Regression Equation

Adjusted = 0.502 + 0.746 Unadjusted
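The entries in the output above are linked by standard identities, which is useful when only part of an output is given. The following is a minimal sketch in Python (the variable names are illustrative and not part of Minitab) that recomputes S, R-sq, R-sq(adj), the F-value and the slope T-value from the sums of squares and coefficient table. Results match the printed output only up to rounding.

```python
import math

# Values copied from the Minitab output for the simple regression
ss_reg, ss_err, ss_tot = 55.27, 75.92, 131.19
df_reg, df_err = 1, 8
df_tot = df_reg + df_err
coef, se_coef = 0.746, 0.309  # slope estimate and its standard error

ms_reg = ss_reg / df_reg          # mean square for regression
ms_err = ss_err / df_err          # mean square error
s = math.sqrt(ms_err)             # S, the residual standard error
r_sq = ss_reg / ss_tot            # R-sq as a proportion
r_sq_adj = 1 - ms_err / (ss_tot / df_tot)  # R-sq(adj)
f_value = ms_reg / ms_err         # overall F statistic
t_value = coef / se_coef          # T-Value for the slope

print(f"S = {s:.3f}, R-sq = {r_sq:.2%}, R-sq(adj) = {r_sq_adj:.2%}")
print(f"F = {f_value:.2f}, T = {t_value:.2f}")
```

In simple regression the F-value is the square of the slope T-value, which is why both carry the same P-value of 0.042.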

A) Give an interpretation of the intercept of the fitted model, in practical terms. (2 points).

B) Is there evidence of a positive linear relationship between Unadjusted and Adjusted? (3 points).

C) California had an Unadjusted score of −6.4 and an Adjusted score of −6.3. Is this data point above the fitted line? Justify your answer. (3 points).

D) Do you think that the true regression line goes through the origin, (0,0)? (2 points)
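For part C, a point lies above the fitted line exactly when its residual (observed minus fitted) is positive. A minimal sketch of that check, using the rounded coefficients from the regression equation and hypothetical variable names:

```python
# Rounded coefficients from the fitted equation: Adjusted = 0.502 + 0.746 Unadjusted
b0, b1 = 0.502, 0.746

# California's scores as stated in part C
x_ca, y_ca = -6.4, -6.3

fitted = b0 + b1 * x_ca       # height of the fitted line at x = -6.4
residual = y_ca - fitted      # observed minus fitted
above_line = residual > 0     # True only if the point is above the line

print(f"fitted = {fitted:.3f}, residual = {residual:.3f}, above line: {above_line}")
```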

2) Consider the same simple regression as in Problem 1.

A) Test the null hypothesis that the true slope is 1, at the 5% level of significance. (5 points).

B) Construct a 95% confidence interval for the true slope. (5 points).
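Both parts of this problem rest on the same ingredients: the slope estimate, its standard error, and a t critical value with n − 2 = 8 degrees of freedom. The sketch below assumes the usual table value 2.306 for the two-sided 5% critical point with 8 degrees of freedom, hard-coded to keep the example free of external libraries; the variable names are illustrative.

```python
coef, se_coef = 0.746, 0.309   # slope and standard error from the output
null_slope = 1.0               # hypothesized slope in part A
t_crit = 2.306                 # assumed table value: t(0.025) with 8 df

# Part A: t statistic for H0: slope = 1 (not the default H0: slope = 0)
t_stat = (coef - null_slope) / se_coef
reject_null = abs(t_stat) > t_crit

# Part B: 95% confidence interval for the true slope
margin = t_crit * se_coef
ci_low, ci_high = coef - margin, coef + margin

print(f"t = {t_stat:.3f}, reject H0 at 5%: {reject_null}")
print(f"95% CI for slope: ({ci_low:.3f}, {ci_high:.3f})")
```

The two parts are consistent by construction: a two-sided 5% test rejects a hypothesized slope exactly when that value falls outside the 95% interval.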

3) Now, we introduce a second explanatory variable: the population of the 10 states, in Millions. Figure 2 gives a fitted line plot for the simple regression of Adjusted on Population, followed by the Minitab output for the multiple regression of Adjusted on Unadjusted and Population.

Regression Analysis: Adjusted versus Unadjusted, Population

Analysis of Variance

Source      DF       SS      MS  F-Value  P-Value
Regression   2   61.473  30.736     3.09    0.109
Error        7   69.712   9.959
Total        9  131.185

Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
3.15577  46.86%     31.68%       0.00%

Coefficients

Term         Coef  SE Coef  T-Value  P-Value
Constant    -1.36     2.56    -0.53    0.612
Unadjusted  0.926    0.391     2.37    0.049
Population  0.109    0.137     0.79    0.456

Regression Equation

Adjusted = -1.36 + 0.926 Unadjusted + 0.109 Population

A) Does Population seem to be a useful variable for explaining the Adjusted score? (2 points).

B) Did the introduction of Population as an explanatory variable weaken or strengthen the overall regression model, compared to the simple regression on Unadjusted? (5 points).

C) Use the AICC to select between the simple regression model on Unadjusted, and the multiple regression on both Unadjusted and Population. (3 points).
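The AICC comparison in part C can be computed directly from the error sums of squares in the two outputs. The sketch below assumes one common least-squares form, AICC = n ln(SSE/n) + 2kn/(n − k − 1), where k counts the regression coefficients (including the intercept) plus the error variance. The course may use a form that differs by an additive constant or in how parameters are counted, so treat the function name and the exact formula as assumptions. The model with the smaller value is preferred.

```python
import math

def aicc(sse, n, n_coefs):
    """Corrected AIC for a least-squares fit (assumed form, see lead-in).

    n_coefs counts regression coefficients including the intercept;
    one more parameter is added for the error variance.
    """
    k = n_coefs + 1
    return n * math.log(sse / n) + 2 * k * n / (n - k - 1)

n = 10  # ten states

# Error SS taken from the two Minitab outputs
aicc_simple = aicc(sse=75.92, n=n, n_coefs=2)     # intercept + Unadjusted
aicc_multiple = aicc(sse=69.712, n=n, n_coefs=3)  # intercept + Unadjusted + Population

print(f"AICC simple   = {aicc_simple:.2f}")
print(f"AICC multiple = {aicc_multiple:.2f}")
print("Preferred:", "simple" if aicc_simple < aicc_multiple else "multiple")
```

With only ten observations the penalty term grows quickly with each added parameter, so a small reduction in the error sum of squares may not be enough to offset it.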

4) Figure 3 below gives the plot of residuals vs. fitted values for the multiple regression of Adjusted on Unadjusted and Population. Does this plot reveal any potential problems with the model?

5) Weekly wages for individuals working in Manhattan have a skewed (asymmetrical) distribution with a population mean of $2,749. The right tail of this distribution is much heavier (longer) than the left tail. You are going to select a random sample of size 20 from this population and compute the sample mean weekly wage, x̄. Given this situation, is the expected value of the sample mean weekly wage greater than $2,749, less than $2,749, or equal to $2,749? Select one of these three scenarios, and defend your selection.

6) In simple linear regression, if the least-squares line has a positive slope, is it possible to have R² = 0? Justify your answer.

7) Suppose you have a simple regression data set, where the least-squares line has a slope of 2.1. Consider any particular data point in the scatterplot of y versus x, and move this data point by one unit to the right. What will be the resulting change in the total sum of squares, SST? Give a numerical answer, if possible.
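The total sum of squares is computed from the y values alone, as the sum of squared deviations of y from its mean. The sketch below illustrates this with a small hypothetical data set (invented for illustration, chosen so that the least-squares slope is 2.1): it shifts one point one unit to the right and recomputes both the slope and SST.

```python
def mean(values):
    return sum(values) / len(values)

def sst(ys):
    """Total sum of squares: depends on the y values only."""
    y_bar = mean(ys)
    return sum((y - y_bar) ** 2 for y in ys)

def ls_slope(xs, ys):
    """Least-squares slope Sxy / Sxx."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data, not from the exam
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.5, 6.0, 8.5, 10.5]

slope_before, sst_before = ls_slope(xs, ys), sst(ys)

xs_moved = list(xs)
xs_moved[2] += 1.0  # move the third point one unit to the right; y is unchanged

slope_after, sst_after = ls_slope(xs_moved, ys), sst(ys)

print(f"slope: {slope_before:.4f} -> {slope_after:.4f}")
print(f"SST:   {sst_before:.4f} -> {sst_after:.4f}")
```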

8) In simple linear regression, based on a data set with n = 50, if R² = 0, is it possible for the estimated slope to be statistically significantly different from zero at the 5% level of significance? Justify your answer.
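In simple linear regression the slope t statistic and R² are tied together by the identity t² = (n − 2) R² / (1 − R²), which bears on questions 6 and 8. A minimal sketch, with a hypothetical function name, that evaluates the magnitude of t from R² and n:

```python
import math

def abs_slope_t_from_r_sq(r_sq, n):
    """|t| for the slope in simple regression, from R-sq (as a proportion) and n.

    Uses t^2 = (n - 2) * R^2 / (1 - R^2); requires 0 <= r_sq < 1 and n > 2.
    """
    return math.sqrt((n - 2) * r_sq / (1 - r_sq))

# Check against Problem 1: R-sq = 42.13%, n = 10, printed T-Value = 2.41
print(f"{abs_slope_t_from_r_sq(0.4213, 10):.2f}")

# The situation described in question 8: R-sq = 0 with n = 50
print(abs_slope_t_from_r_sq(0.0, 50))
```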

9) If a discrete random variable has a standard deviation of zero, must it have a mean of zero? Justify your answer.

10) In testing versus based on a sample of size n = 250, suppose you get a p-value of .006. If the sample standard deviation is 3.1, what is the value of the sample mean?