Multiple Linear Regression - Solutions

1 Relationship Between Eighth Grade IQ, Eighth Grade Abstract Reasoning and Ninth Grade Math Score

For a statistics class project, students examined the relationship between x1 = 8th grade IQ, x2 = 8th grade Abstract Reasoning and y = 9th grade math scores for 20 students. The data are displayed below.

Student  Math Score  IQ   Abstract Reas
1        33          95   28
2        31          100  24
3        35          100  29
4        38          102  30
5        41          103  33
6        37          105  32
7        37          106  34
8        39          106  36
9        43          106  38
10       40          109  39
11       41          110  40
12       44          110  43
13       40          111  41
14       45          112  42
15       48          112  46
16       45          114  44
17       31          114  41
18       47          115  47
19       43          117  42
20       48          118  49

Use Minitab on the dataset Finals found in the Datasets folder in ANGEL. Choose Stat > Regression > Regression, enter the variable math score in the Response window, and enter IQ and Abstract_Reas in the Predictors window. Click ‘Storage’ and then check ‘Residuals’ and ‘Fits’. These will be stored in columns C4 and C5 and named RESI1 and FITS1. Your output should look as follows:

Regression Analysis: Math Score versus IQ, Abstract_Reas

The regression equation is Math Score = 54.1 - 0.484 IQ + 1.02 Abstract_Reas

Predictor      Coef     SE Coef  T      P
Constant       54.05    22.99    2.35   0.031
IQ             -0.4836  0.2955   -1.64  0.120
Abstract_Reas  1.0185   0.2656   3.84   0.001

S = 3.00271 R-Sq = 70.5% R-Sq(adj) = 67.1%

Analysis of Variance

Source          DF  SS      MS      F      P
Regression      2   366.92  183.46  20.35  0.000
Residual Error  17  153.28  9.02
Total           19  520.20

a. What is the regression equation? Provide an interpretation of each slope in terms of the change in Y per unit change in X.

Math Score = 54.1 - 0.484 IQ + 1.02 Abstract_Reas
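The equation can also be checked outside Minitab. The following is a minimal sketch in plain Python, assuming the 20 rows of the data table above are the same values stored in the Finals dataset. The helper names (`cross_dev`, `fit_two_predictors`) are illustrative and not part of the assignment; the fit uses the closed-form least-squares solution for two predictors.

```python
# Data transcribed from the table of 20 students.
math_score = [33, 31, 35, 38, 41, 37, 37, 39, 43, 40,
              41, 44, 40, 45, 48, 45, 31, 47, 43, 48]
iq = [95, 100, 100, 102, 103, 105, 106, 106, 106, 109,
      110, 110, 111, 112, 112, 114, 114, 115, 117, 118]
abstract_reas = [28, 24, 29, 30, 33, 32, 34, 36, 38, 39,
                 40, 43, 41, 42, 46, 44, 41, 47, 42, 49]


def mean(values):
    return sum(values) / len(values)


def cross_dev(u, v):
    """Sum of products of deviations of u and v from their means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def fit_two_predictors(y, x1, x2):
    """Least-squares estimates (b0, b1, b2) for y = b0 + b1*x1 + b2*x2."""
    s11, s22, s12 = cross_dev(x1, x1), cross_dev(x2, x2), cross_dev(x1, x2)
    s1y, s2y = cross_dev(x1, y), cross_dev(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return b0, b1, b2


b0, b1, b2 = fit_two_predictors(math_score, iq, abstract_reas)
print(f"Math Score = {b0:.2f} + ({b1:.4f}) IQ + ({b2:.4f}) Abstract_Reas")
```

The printed coefficients should agree with the Coef column of the Minitab output up to rounding.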

In multiple linear regression, the slope for Xi is interpreted as follows: for a unit change in Xi, while holding the other predictors constant (i.e. not changing), Y will change by the amount and in the direction of the slope for Xi. So here, holding Abstract Reasoning constant, a 1 unit increase in IQ decreases the predicted math score by 0.484 points; holding IQ constant, a 1 unit increase in Abstract Reasoning increases the predicted math score by 1.02 points.

b. Create two scatter plots of the measurements using Graph > Scatter Plot > Simple. Select math score as the response (y-variable) with IQ as the predictor (x-variable), then enter math score again as a y-variable with Abstract_Reas as the x-variable. Select Multiple Graphs and click the radio button for “In separate panels of the same graph”. Describe the relationship between math score, abstract reasoning and IQ.

[Figure: Scatterplot of Math Score vs IQ, Abstract_Reas. Two panels in one graph, with Math Score on the y-axis and IQ or Abstract_Reas on the x-axis.]
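As a numeric companion to the two panels, the sketch below computes the simple correlations among the three variables and the partial correlation between math score and IQ after adjusting for Abstract Reasoning. It assumes the data table above matches the Finals dataset; the function names are illustrative, and the partial correlation uses the standard first-order formula rather than anything Minitab reports in this output.

```python
import math

# Data transcribed from the table of 20 students.
math_score = [33, 31, 35, 38, 41, 37, 37, 39, 43, 40,
              41, 44, 40, 45, 48, 45, 31, 47, 43, 48]
iq = [95, 100, 100, 102, 103, 105, 106, 106, 106, 109,
      110, 110, 111, 112, 112, 114, 114, 115, 117, 118]
abstract_reas = [28, 24, 29, 30, 33, 32, 34, 36, 38, 39,
                 40, 43, 41, 42, 46, 44, 41, 47, 42, 49]


def cross_dev(u, v):
    """Sum of products of deviations of u and v from their means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def corr(u, v):
    """Pearson correlation between u and v."""
    return cross_dev(u, v) / math.sqrt(cross_dev(u, u) * cross_dev(v, v))


r_y_iq = corr(math_score, iq)
r_y_ar = corr(math_score, abstract_reas)
r_iq_ar = corr(iq, abstract_reas)

# Correlation of math score with IQ once Abstract Reasoning is adjusted for.
partial_y_iq = (r_y_iq - r_y_ar * r_iq_ar) / math.sqrt(
    (1 - r_y_ar ** 2) * (1 - r_iq_ar ** 2)
)

print(f"r(math, IQ)            = {r_y_iq:.3f}")
print(f"r(math, Abstract_Reas) = {r_y_ar:.3f}")
print(f"r(IQ, Abstract_Reas)   = {r_iq_ar:.3f}")
print(f"partial r(math, IQ | Abstract_Reas) = {partial_y_iq:.3f}")
```

On this data the simple correlations are all positive (about 0.67, 0.81 and 0.93), while the partial correlation for IQ is negative (about -0.37), matching the sign of the IQ coefficient in the fitted model.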

There is a positive relationship between the response variable (math score) and each of the explanatory variables (IQ and Abstract Reasoning). However, the slope coefficient for IQ in the regression model is negative! This happens because of how the coefficients are calculated in multiple regression. In simple linear regression, the sign of the slope estimate follows the sign of the correlation between X and Y. In multiple linear regression, that simple correlation no longer determines the slope. Instead, a partial correlation comes into play: the coefficient for IQ reflects the relationship between IQ and math score after adjusting for Abstract Reasoning.

c. Based on the output, what is the test of each slope for this regression equation? That is, provide the null and alternative hypotheses, the test statistic, the p-value of the test, and state your decision and conclusion.

Ho: B1 = 0   Ha: B1 ≠ 0
The test statistic is -1.64 with a p-value of 0.120. Since this p-value is greater than 0.05, we would NOT reject Ho. This means that, when Abstract Reasoning is already in the model, IQ is not a statistically significant linear predictor of ninth grade math scores.
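The test statistic for a slope is the coefficient divided by its standard error, referred to a t distribution with the residual degrees of freedom (17 here). The sketch below recomputes the IQ test from the values printed in the Minitab coefficient table. The p-value is obtained by numerically integrating the t density with Simpson's rule, which is a stand-in chosen to keep the example dependency-free, not how Minitab computes it; the function names are illustrative.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * central_area


# Values read from the Minitab output for the IQ row.
coef_iq, se_iq = -0.4836, 0.2955
df_error = 17  # Residual Error DF from the ANOVA table

t_iq = coef_iq / se_iq
p_iq = two_sided_p(t_iq, df_error)
print(f"t = {t_iq:.2f}, two-sided p = {p_iq:.3f}")
```

The same two functions apply to any other row of the coefficient table by substituting its Coef and SE Coef values; small differences from Minitab can arise because the inputs here are already rounded.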

Ho: B2 = 0   Ha: B2 ≠ 0
The test statistic is 3.84 with a p-value of 0.001. Since this p-value is less than 0.05, we would REJECT Ho. This means that, when IQ is already in the model, Abstract Reasoning is a statistically significant linear predictor of ninth grade math scores.

d. From the output, what is the meaning of the ANOVA F-test? Provide the two hypothesis statements, the decision and the conclusion.

Ho: B1 = B2 = 0 and Ha: at least one of these slopes does not equal zero.
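The F statistic for these hypotheses is the regression mean square divided by the residual mean square. A small sketch using the sums of squares printed in the ANOVA table follows. The p-value line relies on a closed form for the upper tail of the F distribution that holds only when the numerator has exactly 2 degrees of freedom, as it does here; with a different number of predictors a statistics library would be needed instead.

```python
# Values read from the Minitab ANOVA table.
ss_regression, df_regression = 366.92, 2
ss_error, df_error = 153.28, 17

ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error
f_stat = ms_regression / ms_error

# Upper-tail probability of F(2, df_error); valid only for 2 numerator df.
p_value = (1 + 2 * f_stat / df_error) ** (-df_error / 2)

# R-Sq is the share of the total sum of squares explained by the regression.
r_squared = ss_regression / (ss_regression + ss_error)

print(f"F = {f_stat:.2f}, p = {p_value:.6f}, R-Sq = {100 * r_squared:.1f}%")
```

The p-value is far below 0.0005, which is why Minitab displays it as 0.000.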

With a test statistic of 20.35 and a p-value of 0.000, we reject Ho and conclude that at least one of the slopes does not equal zero. NOTE: this rejection does not tell us which slope(s) is/are significant, only that at least one is.

e. Check the assumptions of constant variance and normality. For constant variance, create a Scatterplot (under Graphs) of the residuals versus each of the predictor variables. For normality, use Graphs > Probability Plot > Single and graph the residuals. What are your conclusions based on these graphs?

Both scatterplots give an indication of an outlier (bottom right of each panel). The probability plot tests the null hypothesis that the residuals come from a normal distribution, and this hypothesis is rejected (p-value less than 0.005). Together, the plots give evidence that the data do not satisfy the assumptions of normality and constant variance. Handling possible outlier(s) in multiple linear regression is analogous to the methods used in simple linear regression.
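To see which observation produces the outlier, the residuals can be recomputed directly. The sketch below assumes the data table above matches the Finals dataset, refits the model with the closed-form two-predictor solution, and reports the student with the largest absolute residual. The variable and function names are illustrative.

```python
import math

# Data transcribed from the table of 20 students.
math_score = [33, 31, 35, 38, 41, 37, 37, 39, 43, 40,
              41, 44, 40, 45, 48, 45, 31, 47, 43, 48]
iq = [95, 100, 100, 102, 103, 105, 106, 106, 106, 109,
      110, 110, 111, 112, 112, 114, 114, 115, 117, 118]
abstract_reas = [28, 24, 29, 30, 33, 32, 34, 36, 38, 39,
                 40, 43, 41, 42, 46, 44, 41, 47, 42, 49]


def mean(values):
    return sum(values) / len(values)


def cross_dev(u, v):
    """Sum of products of deviations of u and v from their means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


# Closed-form least-squares fit for two predictors.
s11, s22 = cross_dev(iq, iq), cross_dev(abstract_reas, abstract_reas)
s12 = cross_dev(iq, abstract_reas)
s1y, s2y = cross_dev(iq, math_score), cross_dev(abstract_reas, math_score)
det = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = mean(math_score) - b1 * mean(iq) - b2 * mean(abstract_reas)

# Residual = observed minus fitted (the values Minitab stores as RESI1).
residuals = [y - (b0 + b1 * x1 + b2 * x2)
             for y, x1, x2 in zip(math_score, iq, abstract_reas)]

# S from the output: square root of SSE over (n - 3) degrees of freedom.
s = math.sqrt(sum(r * r for r in residuals) / (len(residuals) - 3))

worst = max(range(len(residuals)), key=lambda i: abs(residuals[i]))
student = worst + 1  # students are numbered from 1 in the table
print(f"S = {s:.3f}; largest residual: student {student}, "
      f"residual = {residuals[worst]:.2f}")
```

On this data the flagged observation is student 17 (math score 31 with IQ 114 and Abstract Reasoning 41), whose residual of roughly -9.7 is more than three times S in magnitude.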

[Figure: Scatterplot of RESI1 vs IQ, Abstract_Reas. Two panels in one graph, with RESI1 on the y-axis and IQ or Abstract_Reas on the x-axis.]

[Figure: Probability Plot of RESI1, Normal - 95% CI. Percent on the y-axis and RESI1 on the x-axis. Legend: Mean -1.77636E-14, StDev 2.840, N 20, AD 1.170, P-Value <0.005.]
