Chapter 15 Building Multiple Regression Models
LEARNING OBJECTIVES
This chapter presents the potential of multiple regression analysis as a tool in business decision making and its applications, enabling you to:
1. Analyze and interpret nonlinear variables in multiple regression analysis.
2. Understand the role of qualitative variables and how to use them in multiple regression analysis.
3. Learn how to build and evaluate multiple regression models.
4. Learn how to detect influential observations in regression analysis.
CHAPTER OUTLINE
15.1 Nonlinear Models: Mathematical Transformation
     Polynomial Regression
     Tukey's Ladder of Transformations
     Regression Models with Interaction
     Model Transformation
15.2 Indicator (Dummy) Variables
15.3 Model-Building: Search Procedures
     Search Procedures
     All Possible Regressions
     Stepwise Regression
     Forward Selection
     Backward Elimination
15.4 Multicollinearity
KEY WORDS
all possible regressions
backward elimination
dummy variable
forward selection
indicator variable
multicollinearity
quadratic model
qualitative variable
search procedures
stepwise regression
Tukey's four-quadrant approach
Tukey's ladder of transformations
variance inflation factor
STUDY QUESTIONS
1. Another name for an indicator variable is a ______ variable. These variables are ______, as opposed to quantitative variables.
2. Indicator variables are coded using ______and ______.
3. Suppose an indicator variable has four categories. In coding this into variables for multiple regression analysis, there should be ______variables.
4. Regression models in which the highest power of any predictor variable is one and in which there are no interaction terms are referred to as ______models.
5. The interaction of two variables can be studied in multiple regression using the ______terms.
6. Suppose a researcher wants to analyze a set of data using the model: ŷ = b0·b1^x. The model would be transformed by taking the ______ of both sides of the equation.
7. Perhaps the most widely known and used of the multiple regression search procedures is ______regression.
8. One multiple regression search procedure is Forward Selection. Forward selection is essentially the same as stepwise regression except that ______.
9. Backward elimination is a step-by-step process that begins with the ______model.
10. A search procedure that computes all the possible linear multiple regression models from the data using all variables is called ______.
11. When two or more of the independent variables of a multiple regression model are highly correlated, it is referred to as ______. This condition causes several other problems, including:

(1) Difficulty in interpreting ______.

(2) Inordinately small ______ for the regression coefficients may result.

(3) The standard deviations of regression coefficients are ______.

(4) The ______ of estimated regression coefficients may be the opposite of what would be expected for a particular predictor variable.
ANSWERS TO STUDY QUESTIONS
1. Dummy, Qualitative
2. 0, 1
3. 3
4. First-Order
5. x1·x2 or Cross Product
6. Logarithm
7. Stepwise
8. Once a Variable is Entered Into the Process, It is Never Removed
9. Full
10. All Possible Regressions
11. Multicollinearity, the Estimates of the Regression Coefficients, t Values, Overestimated, Algebraic Sign
SOLUTIONS TO ODD-NUMBERED PROBLEMS IN CHAPTER 15
15.1 Simple Regression Model:
ŷ = –147.27 + 27.128 x

F = 229.67 with p = .000, se = 27.27, R² = .97, adjusted R² = .966, and t = 15.15 with p = .000. This is a very strong simple regression model.
Quadratic Model (using both x and x²):

ŷ = –22.01 + 3.385 x + 0.9373 x²

F = 578.76 with p = .000, se = 12.3, R² = .995, adjusted R² = .993; for x: t = 0.75 with p = .483, and for x²: t = 5.33 with p = .002. The quadratic model is also very strong, with an even higher R² value. However, in this model only the x² term is a significant predictor.
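Both fits can be reproduced in software. Below is a minimal Python sketch using statsmodels; the x and y arrays are hypothetical stand-ins rather than the textbook data, so the statistics will differ from those reported above.

```python
import numpy as np
import statsmodels.api as sm

x = np.array([4, 7, 11, 16, 21, 27, 34, 42], dtype=float)        # hypothetical data
y = np.array([-50, 60, 180, 330, 560, 840, 1210, 1690], dtype=float)

# Simple model: y-hat = b0 + b1*x
simple = sm.OLS(y, sm.add_constant(x)).fit()

# Quadratic model: y-hat = b0 + b1*x + b2*x^2
quad = sm.OLS(y, sm.add_constant(np.column_stack([x, x**2]))).fit()

print(simple.rsquared, simple.rsquared_adj)   # fit of the simple model
print(quad.rsquared, quad.rsquared_adj)       # fit of the quadratic model
print(quad.pvalues)                           # t-test p-values: const, x, x^2
```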
15.3 Simple regression model:
ŷ = –1456.6 + 71.017 x

R² = .928 and adjusted R² = .910. t = 7.17 with p = .002.
Quadratic regression model:
ŷ = 1012 – 14.06 x + 0.6115 x²

R² = .947 but adjusted R² = .911. The t ratio for the x term is t = –0.17 with p = .876. The t ratio for the x² term is t = 1.03 with p = .377.
Neither predictor is significant in the quadratic model. Also, the adjusted R² for this model is virtually identical to that of the simple regression model: the quadratic model adds virtually no predictability that the simple regression model does not already have. The scatter plot of the data follows:
[Scatter plot: Ad Exp (vertical axis, 1000 to 7000) versus Eq & Sup Exp (horizontal axis, 30 to 110)]
15.5 The regression model is:
ŷ = –28.61 – 2.68 x1 + 18.25 x2 – 0.2135 x1² – 1.533 x2² + 1.226 x1·x2
F = 63.43 with p = .000, significant at α = .001. se = 4.669, R² = .958, and adjusted R² = .943.
None of the t ratios for this model are significant: t(x1) = –0.25 with p = .805, t(x2) = 0.91 with p = .378, t(x1²) = –0.33 with p = .745, t(x2²) = –0.68 with p = .506, and t(x1·x2) = 0.52 with p = .613. This model has a high R², yet none of the predictors are individually significant.
The same thing occurs when the interaction term is not in the model: none of the t tests are significant. The R² remains high at .957, indicating that dropping the interaction term costs virtually no predictability.
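A full second-order model with interaction, like the one in this problem, can be specified with statsmodels' formula interface. This is a sketch on a hypothetical data frame, not the textbook data set.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({                         # hypothetical data
    "y":  [12, 15, 20, 23, 29, 31, 36, 40, 44, 49],
    "x1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "x2": [2, 3, 2, 4, 3, 5, 4, 6, 5, 6],
})

# I(x1**2) and I(x2**2) add the squared terms; x1:x2 adds the interaction term.
full = smf.ols("y ~ x1 + x2 + I(x1**2) + I(x2**2) + x1:x2", data=df).fit()

# Dropping the interaction term allows the R-squared comparison made above.
reduced = smf.ols("y ~ x1 + x2 + I(x1**2) + I(x2**2)", data=df).fit()

print(full.rsquared, reduced.rsquared)
print(full.pvalues)                         # individual t-test p-values
```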
15.7 The regression equation is:
ŷ = 13.619 – 0.01201 x1 + 2.998 x2
The overall F = 8.43 is significant at α = .01 (p = .009).
se = 1.245, R² = .652, adjusted R² = .575.
The t ratio for the x1 variable is only t = –0.14 with p = .893. However, the t ratio for the dummy variable x2 is t = 3.88 with p = .004. The indicator variable is the significant predictor in this regression model, which has some predictability (adjusted R² = .575).
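A model of this form, one quantitative predictor plus one 0/1 indicator, can be sketched as follows. The data are hypothetical; the point is the 0/1 coding of x2 and the interpretation of its coefficient.

```python
import numpy as np
import statsmodels.api as sm

x1 = np.array([110, 95, 120, 88, 105, 92, 130, 101, 115, 97], dtype=float)  # hypothetical
x2 = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0], dtype=float)   # indicator, coded 0 or 1
y  = np.array([18.2, 14.1, 17.9, 13.8, 17.5, 14.4, 18.8, 14.0, 18.1, 14.6])

fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

# The coefficient on x2 estimates the shift in y between the two categories,
# holding x1 constant.
print(fit.params)
print(fit.tvalues, fit.pvalues)
```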
15.9 This regression model has relatively strong predictability, as indicated by R² = .795. Of the three predictor variables, only x1 and x2 have significant t ratios (using α = .05). x3, a nonindicator variable, is not a significant predictor. x1, the indicator variable, plays a significant role in this model along with x2.
15.11 The regression equation is:
Price = 7.066 – 0.0855 Hours + 9.614 ProbSeat + 10.507 FQ
The overall F = 6.80 with p = .009, which is significant at α = .01. se = 4.02, R² = .671, and adjusted R² = .573. The difference between R² and adjusted R² indicates that there are some nonsignificant predictors in the model. The t ratios of Hours and Probability of Being Seated, t = –0.56 with p = .587 and t = 1.37 with p = .202, are nonsignificant at α = .05. The only significant predictor is the dummy variable, French Quarter or not, which has a t ratio of 3.97 with p = .003, significant at α = .01. The positive coefficient on this variable indicates that being in the French Quarter adds to the price of a meal.
15.13 Stepwise Regression:
Step 1: x2 enters the model with t = –7.35 and R² = .794.
The model is ŷ = 36.15 – 0.146 x2

Step 2: x3 enters the model and x2 remains in the model.
t for x2 is –4.60, t for x3 is 2.93. R² = .876.

The model is ŷ = 26.40 – 0.101 x2 + 0.116 x3

The variable x1 never enters the procedure.
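Stepwise regression is not built into statsmodels, but its simpler variant, forward selection, can be sketched in a few lines; full stepwise would additionally re-test the variables already in the model for removal at each step. The function below is a hypothetical illustration using a p-value-to-enter criterion.

```python
import pandas as pd
import statsmodels.api as sm

def forward_select(y, X, alpha_enter=0.05):
    """Greedy forward selection: at each step, add the candidate predictor
    with the smallest t-test p-value, stopping when none is below alpha."""
    chosen, remaining = [], list(X.columns)
    while remaining:
        pvals = {}
        for var in remaining:
            fit = sm.OLS(y, sm.add_constant(X[chosen + [var]])).fit()
            pvals[var] = fit.pvalues[var]
        best = min(pvals, key=pvals.get)
        if pvals[best] > alpha_enter:
            break                        # no remaining variable qualifies
        chosen.append(best)
        remaining.remove(best)
    return sm.OLS(y, sm.add_constant(X[chosen])).fit()

# Usage (hypothetical data frame with columns y, x1, x2, x3):
# final = forward_select(df["y"], df[["x1", "x2", "x3"]])
# print(final.model.exog_names, final.rsquared)
```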
15.15 The output shows that the final model had four predictor variables: x4, x2, x3, and x7. The variables x5 and x6 did not enter the stepwise analysis. The procedure took four steps. The final model was:
ŷ = –5.00 x4 + 3.22 x2 + 1.78 x3 + 1.56 x7
The R² for this model was .5929, and se was 3.36. The t ratios were: t(x4) = 3.07, t(x2) = 2.05, t(x3) = 2.02, and t(x7) = 1.98.
15.17 The output indicates that the procedure went through two steps. At step 1, dividends entered the process, yielding an R² of .833 by itself. The t value was 6.69 and the model was ŷ = –11.062 + 61.1 x1. At step 2, net income entered the procedure and dividends remained in the model. The R² for this two-predictor model was .897, which is a modest increase over the simple regression model shown in step one. The step 2 model was:
Premiums earned = –3.726 + 45.2 dividends + 3.6 net income
For step 2, t(dividends) = 4.36 (p-value = .002) and t(net income) = 2.24 (p-value = .056).
15.19 The correlation matrix is:

            y       x1      x2      x3
   y        –     –.653   –.891    .821
   x1    –.653      –      .650   –.615
   x2    –.891    .650      –     –.688
   x3     .821   –.615   –.688      –
There appears to be some correlation between all pairs of the predictor variables x1, x2, and x3: all pairwise correlations between the independent variables fall between .600 and .700 in absolute value.
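An intercorrelation matrix like the one above can be produced directly with pandas. The data frame here is a hypothetical stand-in for the problem's data.

```python
import pandas as pd

df = pd.DataFrame({                 # hypothetical data
    "y":  [10, 14, 9, 17, 12, 16],
    "x1": [5, 3, 6, 2, 4, 3],
    "x2": [8, 6, 9, 4, 7, 5],
    "x3": [1, 3, 1, 4, 2, 4],
})

# Pearson correlation for every pair of columns, as in the matrix above.
print(df.corr().round(3))
```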
15.21 The stepwise regression analysis of Problem 15.17 resulted in two of the three predictor variables being included in the model. The simple regression model yielded an R² of .833, jumping to .897 with the two predictors. The predictor intercorrelations are:
               Net Income   Dividends   Gain/Loss
   Net Income      –           .682        .092
   Dividends     .682           –         –.522
   Gain/Loss     .092         –.522         –
An examination of the predictor intercorrelations reveals that Gain/Loss and Net Income have very little correlation, but Net Income and Dividends have a correlation of .682 and Dividends and Gain/Loss have a correlation of –.522. These correlations might suggest multicollinearity.
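A more formal check than eyeballing the intercorrelations is the variance inflation factor (VIF), one of this chapter's key terms. The sketch below uses hypothetical values for the three predictors.

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

predictors = pd.DataFrame({         # hypothetical predictor values
    "net_income": [2.1, 3.4, 1.8, 4.0, 2.9, 3.7, 2.4, 3.1],
    "dividends":  [0.8, 1.5, 0.7, 1.9, 1.2, 1.6, 0.9, 1.3],
    "gain_loss":  [0.1, -0.3, 0.2, -0.1, 0.4, -0.2, 0.3, 0.0],
})

X = add_constant(predictors)        # VIFs are computed with an intercept present
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, i))

# A common rule of thumb treats VIF > 10 as evidence of serious multicollinearity.
```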
15.23 The regression model is:
ŷ = 564 – 27.99 x1 – 6.155 x2 – 15.90 x3
F = 11.32 with p = .003, se = 42.88, R² = .809, adjusted R² = .738. For x1, t = –0.92 with p = .384; for x2, t = –4.34 with p = .002; for x3, t = –0.71 with p = .497. Thus, only one of the three predictors, x2, is a significant predictor in this model. This model has very good predictability (R² = .809). The gap between R² and adjusted R² underscores the fact that there are two nonsignificant predictors in this model; x1 is a nonsignificant indicator variable.
15.25 In this model with x1 and the log of x1 as predictors, only log x1 was a significant predictor of y. The stepwise procedure went only to step 1. The regression model was:
ŷ = –13.20 + 11.64 log x1. R² = .9617 and the t ratio of log x1 was 17.36. This model has very strong predictability using only the log of the x1 variable.
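Tukey's ladder of transformations, of which this log model is one rung, is easy to try in software. A minimal sketch on hypothetical data:

```python
import numpy as np
import statsmodels.api as sm

x1 = np.array([3, 7, 12, 20, 35, 60, 100, 170], dtype=float)   # hypothetical
y  = np.array([-8, -3, 0, 2, 5, 8, 10, 13], dtype=float)

# Regress y on log(x1) rather than x1 itself (base-10 log, matching "log x1").
fit = sm.OLS(y, sm.add_constant(np.log10(x1))).fit()
print(fit.params, fit.rsquared, fit.tvalues)
```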
15.27 The stepwise regression procedure used only two steps. At step 1, Silver was the lone predictor; the value of R² was .5244. At step 2, Aluminum entered the model and Silver remained in the model. The R² jumped to .8204. The final model at step 2 was:
Gold = –50.19 + 18.9 Silver + 3.59 Aluminum
The t values were: t(Silver) = 5.43 and t(Aluminum) = 3.85.
Copper did not enter into the process at all.
15.29 There were four predictor variables. The stepwise regression procedure went three steps. The predictor apparel never entered the stepwise process. At step 1, food entered the procedure, producing a model with an R² of .84. At step 2, fuel oil entered and food remained. The R² increased to .95. At step 3, shelter entered the procedure, and both fuel oil and food remained in the model. The R² at this step was .96. The final model was:
All = –1.0615 + 0.474 Food + 0.269 Fuel Oil + 0.249 Shelter
The t ratios were: t(food) = 8.32, t(fuel oil) = 2.81, t(shelter) = 2.56.
15.31 The regression model was:
Grocery = 76.23 + 0.08592 Housing + 0.16767 Utility + 0.0284 Transportation – 0.0659 Healthcare
F = 2.29 with p = .095, which is not significant at α = .05.
se = 4.416, R² = .315, and adjusted R² = .177.
Only one of the four predictors has a significant t ratio: Utility, with t = 2.57 and p = .018. The other t ratios and their respective probabilities are: t(housing) = 1.68 with p = .109, t(transportation) = 0.17 with p = .87, and t(healthcare) = –0.64 with p = .53.
This model is very weak. Only the predictor, Utility, shows much promise in accounting for the grocery variability.
15.33 Of the three predictors, x2 is an indicator variable. An examination of the stepwise regression output reveals that there were three steps and that all three predictors end up in the final model. Variable x3 is the strongest individual predictor of y; it entered at step 1, resulting in an R² of .8124. At step 2, x2 entered the process and variable x3 remained in the model. The R² at this step was .8782. At step 3, variable x1 entered the procedure; variables x3 and x2 remained in the model. The final R² was .9407. The final model was:
ŷ = 87.89 + 0.071 x3 – 2.71 x2 – 0.256 x1