Chapter 15 Building Multiple Regression Models

LEARNING OBJECTIVES

This chapter presents the potential of multiple regression analysis as a tool in business decision making, along with several of its applications, thereby enabling you to:

1. Analyze and interpret nonlinear variables in multiple regression analysis.
2. Understand the role of qualitative variables and how to use them in multiple regression analysis.
3. Learn how to build and evaluate multiple regression models.
4. Learn how to detect influential observations in regression analysis.

CHAPTER OUTLINE

15.1 Nonlinear Models: Mathematical Transformation
       Polynomial Regression
       Tukey's Ladder of Transformations
       Regression Models with Interaction
       Model Transformation

15.2 Indicator (Dummy) Variables

15.3 Model-Building: Search Procedures
       Search Procedures
       All Possible Regressions
       Stepwise Regression
       Forward Selection
       Backward Elimination

15.4 Multicollinearity

KEY WORDS

all possible regressions
backward elimination
dummy variable
forward selection
indicator variable
multicollinearity
quadratic model
qualitative variable
search procedures
stepwise regression
Tukey's four-quadrant approach
Tukey's ladder of transformations
variance inflation factor


STUDY QUESTIONS

1. Another name for an indicator variable is a ______ variable. These variables are ______ as opposed to quantitative variables.

2. Indicator variables are coded using ______and ______.

3. Suppose an indicator variable has four categories. In coding this into variables for multiple regression analysis, there should be ______ variables.

4. Regression models in which the highest power of any predictor variable is one and in which there are no interaction terms are referred to as ______models.

5. The interaction of two variables can be studied in multiple regression using the ______terms.

6. Suppose a researcher wants to analyze a set of data using the model: ŷ = b0·b1^x. The model would be transformed by taking the ______ of both sides of the equation.

7. Perhaps the most widely known and used of the multiple regression search procedures is ______regression.

8. One multiple regression search procedure is Forward Selection. Forward selection is essentially the same as stepwise regression except that ______.

9. Backward elimination is a step-by-step process that begins with the ______model.

10. A search procedure that computes all the possible linear multiple regression models from the data using all variables is called ______.

11. When two or more of the independent variables of a multiple regression model are highly correlated, it is referred to as ______. This condition causes several other problems to occur, including:

(1) difficulty in interpreting ______.

(2) Inordinately small ______for the regression coefficients may result.

(3) The standard deviations of regression coefficients are ______.

(4) The ______ of estimated regression coefficients may be the opposite of what would be expected for a particular predictor variable.

ANSWERS TO STUDY QUESTIONS

1. Dummy, Qualitative

2. 0, 1

3. 3

4. First-Order

5. x1 · x2 or Cross Product

6. Logarithm
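(Taking the logarithm of both sides of ŷ = b0·b1^x gives log ŷ = log b0 + x log b1, which is linear in x and can then be fit by ordinary least squares.)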

7. Stepwise

8. Once a Variable is Entered Into the Process, It is Never Removed

9. Full

10. All Possible Regressions

11. Multicollinearity, the Estimates of the Regression Coefficients, t Values, Overestimated, Algebraic Sign

SOLUTIONS TO ODD-NUMBERED PROBLEMS IN CHAPTER 15

15.1 Simple Regression Model:

ŷ = –147.27 + 27.128 x

F = 229.67 with p = .000, se = 27.27, R² = .97, adjusted R² = .966, and t = 15.15 with p = .000. This is a very strong simple regression model.

Quadratic Model (using both x and x²):

ŷ = –22.01 + 3.385 x + 0.9373 x²

F = 578.76 with p = .000, se = 12.3, R² = .995, adjusted R² = .993; for x: t = 0.75 with p = .483, and for x²: t = 5.33 with p = .002. The quadratic model is also very strong, with an even higher R² value. However, in this model only the x² term is a significant predictor.
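For readers who want to reproduce this kind of simple-versus-quadratic comparison in software, here is a minimal sketch using Python's statsmodels. The data arrays are placeholders standing in for the problem's (x, y) values, not the textbook data.

```python
import numpy as np
import statsmodels.api as sm

# Placeholder data -- substitute the (x, y) pairs given in the problem.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 4.3, 9.8, 16.2, 25.4, 36.1, 49.3, 63.8])

# Quadratic model: regress y on x and x^2, with an intercept.
X = sm.add_constant(np.column_stack([x, x ** 2]))
fit = sm.OLS(y, X).fit()

print(fit.params)                 # b0, b1 (x term), b2 (x^2 term)
print(fit.rsquared, fit.rsquared_adj)
print(fit.tvalues, fit.pvalues)   # individual t ratios and p-values
```

Fitting the simple model the same way (dropping the x² column) lets you compare R², adjusted R², and the individual t ratios as in the solution above.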

15.3 Simple regression model:

ŷ = –1456.6 + 71.017 x

R² = .928 and adjusted R² = .910; t = 7.17 with p = .002.

Quadratic regression model:

ŷ = 1012 – 14.06 x + 0.6115 x²

R² = .947 but adjusted R² = .911. The t ratio for the x term is t = –0.17 with p = .876. The t ratio for the x² term is t = 1.03 with p = .377.

Neither predictor is significant in the quadratic model. Also, the adjusted R² for this model is virtually identical to that of the simple regression model. The quadratic model adds virtually no predictability that the simple regression model does not already have. The scatter plot of the data follows:

[Scatter plot: Ad Exp on the vertical axis (1000 to 7000) versus Eq & Sup Exp on the horizontal axis (30 to 110).]

15.5 The regression model is:

ŷ = –28.61 – 2.68 x1 + 18.25 x2 – 0.2135 x1² – 1.533 x2² + 1.226 x1·x2

F = 63.43 with p = .000, significant at α = .001. se = 4.669, R² = .958, and adjusted R² = .943.

None of the t ratios for this model are significant: t(x1) = –0.25 with p = .805, t(x2) = 0.91 with p = .378, t(x1²) = –0.33 with p = .745, t(x2²) = –0.68 with p = .506, and t(x1·x2) = 0.52 with p = .613. This model has a high R², yet none of the predictors are individually significant.

The same thing occurs when the interaction term is not in the model: none of the t tests are significant. The R² remains high at .957, indicating that the loss of the interaction term was insignificant.
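A minimal sketch of fitting such a full second-order model with statsmodels follows. The generated data are placeholders, and the comparison of R² with and without the cross-product column mirrors the check described above.

```python
import numpy as np
import statsmodels.api as sm

# Placeholder data -- substitute the problem's x1, x2, and y values.
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, 25)
x2 = rng.uniform(0, 10, 25)
y = 5 + 2 * x1 + 3 * x2 + 0.5 * x1 * x2 + rng.normal(0, 2, 25)

# Full second-order model: x1, x2, x1^2, x2^2, and the x1*x2 cross product.
X_full = sm.add_constant(np.column_stack([x1, x2, x1**2, x2**2, x1 * x2]))
full = sm.OLS(y, X_full).fit()

# Same model without the interaction term, for comparison of R^2.
X_red = sm.add_constant(np.column_stack([x1, x2, x1**2, x2**2]))
reduced = sm.OLS(y, X_red).fit()

print(full.rsquared, reduced.rsquared)  # a small drop means the interaction adds little
```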

15.7 The regression equation is:

ŷ = 13.619 – 0.01201 x1 + 2.998 x2

The overall F = 8.43 is significant at α = .01 (p = .009).

se = 1.245, R² = .652, adjusted R² = .575

The t ratio for the x1 variable is only t = –0.14 with p = .893. However, the t ratio for the dummy variable, x2, is t = 3.88 with p = .004. The indicator variable is the significant predictor in this regression model, which has some predictability (adjusted R² = .575).
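The sketch below shows how a 0/1 indicator enters an OLS fit exactly like a quantitative predictor; the arrays are placeholders, not the problem's data.

```python
import numpy as np
import statsmodels.api as sm

# Placeholder data -- x1 is quantitative, x2 is an indicator coded 0/1.
x1 = np.array([12.0, 15.0,  9.0, 20.0, 14.0, 18.0, 11.0, 16.0, 13.0, 17.0])
x2 = np.array([   0,    1,    0,    1,    0,    1,    0,    1,    1,    0])
y  = np.array([14.1, 18.9, 13.2, 19.5, 14.8, 18.1, 13.6, 18.8, 17.9, 14.5])

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

# The coefficient on x2 estimates the shift in y between the two categories,
# holding x1 constant; its t ratio tests whether that shift is significant.
print(fit.params)
print(fit.tvalues, fit.pvalues)
```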

15.9 This regression model has relatively strong predictability as indicated by R² = .795. Of the three predictor variables, only x1 and x2 have significant t ratios (using α = .05). x3 (a non-indicator variable) is not a significant predictor. x1, the indicator variable, plays a significant role in this model along with x2.

15.11 The regression equation is:

Price = 7.066 – 0.0855 Hours + 9.614 ProbSeat + 10.507 FQ

The overall F = 6.80 with p = .009, which is significant at α = .01. se = 4.02, R² = .671, and adjusted R² = .573. The difference between R² and adjusted R² indicates that there are some nonsignificant predictors in the model. The t ratios of Hours (t = –0.56 with p = .587) and Probability of Being Seated (t = 1.37 with p = .202) are nonsignificant at α = .05. The only significant predictor is the dummy variable, French Quarter or not, which has a t ratio of 3.97 with p = .003, significant at α = .01. The positive coefficient on this variable indicates that being in the French Quarter adds to the price of a meal.

15.13 Stepwise Regression:

Step 1: x2 enters the model, t = –7.35 and R² = .794. The model is ŷ = 36.15 – 0.146 x2

Step 2: x3 enters the model and x2 remains in the model. t for x2 is –4.60, t for x3 is 2.93. R² = .876.

The model is ŷ = 26.40 – 0.101 x2 + 0.116 x3

The variable, x1, never enters the procedure.
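statsmodels has no built-in stepwise routine, so the following is a hand-rolled forward-selection sketch, which (per study question 8) differs from stepwise only in never removing an entered variable. The function name and the placeholder data are assumptions for illustration, not the problem's values.

```python
import numpy as np
import statsmodels.api as sm

def forward_selection(y, candidates, alpha_enter=0.05):
    """candidates maps predictor name -> 1-D array. At each step, enter the
    candidate whose t ratio has the smallest p-value; stop when no remaining
    candidate is significant at alpha_enter. (Full stepwise regression would
    also re-test already-entered variables for possible removal.)"""
    selected = []
    remaining = set(candidates)
    while remaining:
        best, best_p = None, alpha_enter
        for name in remaining:
            cols = [candidates[n] for n in selected] + [candidates[name]]
            fit = sm.OLS(y, sm.add_constant(np.column_stack(cols))).fit()
            p = fit.pvalues[-1]        # p-value of the candidate just tried
            if p < best_p:
                best, best_p = name, p
        if best is None:
            break
        selected.append(best)
        remaining.remove(best)
    return selected

# Placeholder data -- substitute the problem's x1, x2, x3, and y.
rng = np.random.default_rng(1)
x1, x2, x3 = rng.normal(size=(3, 15))
y = 36 - 5.0 * x2 + 2.0 * x3 + rng.normal(scale=1.5, size=15)
print(forward_selection(y, {"x1": x1, "x2": x2, "x3": x3}))  # e.g. ['x2', 'x3']
```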

15.15 The output shows that the final model had four predictor variables: x4, x2, x3, and x7. The variables x5 and x6 did not enter the stepwise analysis. The procedure took four steps. The final model was:

ŷ = –5.00 x4 + 3.22 x2 + 1.78 x3 + 1.56 x7

The R² for this model was .5929, and se was 3.36. The t ratios were: t(x4) = 3.07, t(x2) = 2.05, t(x3) = 2.02, and t(x7) = 1.98.

15.17 The output indicates that the procedure went through two steps. At step 1, dividends entered the process, yielding an R² of .833 by itself. The t value was 6.69 and the model was ŷ = –11.062 + 61.1 x1. At step 2, net income entered the procedure and dividends remained in the model. The R² for this two-predictor model was .897, a modest increase from the simple regression model shown in step one. The step 2 model was:

Premiums earned = –3.726 + 45.2 dividends + 3.6 net income

For step 2, t(dividends) = 4.36 (p = .002) and t(net income) = 2.24 (p = .056).

15.19 The correlation matrix is:

             y        x1       x2       x3
      y      –      –.653    –.891     .821
      x1   –.653      –       .650    –.615
      x2   –.891     .650      –      –.688
      x3    .821    –.615    –.688      –

There appears to be some correlation between all pairs of the predictor variables, x1, x2, and x3. All pairwise correlations between independent variables are in the .600 to .700 range in absolute value.
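A correlation matrix like the one above can be computed directly with NumPy. The columns below are placeholders chosen only to mimic the problem's correlation pattern.

```python
import numpy as np

# Placeholder columns -- substitute the problem's y, x1, x2, x3 data.
rng = np.random.default_rng(2)
x2 = rng.normal(size=20)
x1 = 0.65 * x2 + rng.normal(scale=0.75, size=20)
x3 = -0.68 * x2 + rng.normal(scale=0.7, size=20)
y = -0.9 * x2 + rng.normal(scale=0.4, size=20)

# np.corrcoef treats each ROW as a variable, so stack the columns row-wise.
r = np.corrcoef(np.vstack([y, x1, x2, x3]))
print(np.round(r, 3))  # large off-diagonal entries among x1, x2, x3 flag collinearity
```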

15.21 The stepwise regression analysis of problem 15.17 resulted in two of the three predictor variables being included in the model. The simple regression model yielded an R² of .833, jumping to .897 with the two predictors. The predictor intercorrelations are:

                   Net Income   Dividends   Gain/Loss
      Net Income       –           .682        .092
      Dividends       .682          –         –.522
      Gain/Loss       .092        –.522         –

An examination of the predictor intercorrelations reveals that Gain/Loss and Net Income have very little correlation, but Net Income and Dividends have a correlation of .682 and Dividends and Gain/Loss have a correlation of –.522. These correlations might suggest multicollinearity.
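Beyond eyeballing intercorrelations, variance inflation factors (one of this chapter's key words) quantify multicollinearity. A sketch with statsmodels follows; the columns are placeholders named after this problem's predictors, not the real data.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Placeholder predictor columns -- substitute net income, dividends, gain/loss.
rng = np.random.default_rng(3)
net_income = rng.normal(size=18)
dividends = 0.7 * net_income + rng.normal(scale=0.7, size=18)
gain_loss = -0.5 * dividends + rng.normal(scale=0.8, size=18)

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on
# the other predictors; values well above about 10 are a common warning sign.
X = sm.add_constant(np.column_stack([net_income, dividends, gain_loss]))
for j, name in enumerate(["net income", "dividends", "gain/loss"], start=1):
    print(name, round(variance_inflation_factor(X, j), 2))  # index 0 is the constant
```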

15.23 The regression model is:

ŷ = 564 – 27.99 x1 – 6.155 x2 – 15.90 x3

F = 11.32 with p = .003, se = 42.88, R² = .809, adjusted R² = .738. For x1, t = –0.92 with p = .384; for x2, t = –4.34 with p = .002; for x3, t = –0.71 with p = .497. Thus, only one of the three predictors, x2, is a significant predictor in this model. This model has very good predictability (R² = .809). The gap between R² and adjusted R² underscores the fact that there are two nonsignificant predictors in this model. x1 is a nonsignificant indicator variable.

15.25 In this model with x1 and the log of x1 as predictors, only log x1 was a significant predictor of y. The stepwise procedure only went to step 1. The regression model was:

ŷ = –13.20 + 11.64 log x1. R² = .9617, and the t ratio of log x1 was 17.36. This model has very strong predictability using only the log of the x1 variable.
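A sketch of this log-transform fit with statsmodels follows. The data are placeholders chosen only to mimic the flattening growth pattern that makes Tukey's ladder point toward log x.

```python
import numpy as np
import statsmodels.api as sm

# Placeholder data -- substitute the problem's x1 and y values.
x1 = np.array([2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0, 256.0])
y  = np.array([-9.5, -6.2, -2.7, 0.9, 4.3, 7.8, 11.2, 14.7])

# Tukey's ladder suggests moving down the ladder (e.g. log x) when y
# flattens out as x grows; here y is regressed on log10(x1) alone.
X = sm.add_constant(np.log10(x1))
fit = sm.OLS(y, X).fit()
print(fit.params, fit.rsquared, fit.tvalues)
```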

15.27 The stepwise regression procedure used only two steps. At step 1, Silver was the lone predictor; the value of R² was .5244. At step 2, Aluminum entered the model and Silver remained in the model. The R² jumped to .8204. The final model at step 2 was:

Gold = –50.19 + 18.9 Silver + 3.59 Aluminum

The t values were: t(Silver) = 5.43 and t(Aluminum) = 3.85.

Copper did not enter into the process at all.

15.29 There were four predictor variables. The stepwise regression procedure went three steps. The predictor, apparel, never entered in the stepwise process. At step 1, food entered the procedure, producing a model with an R² of .84. At step 2, fuel oil entered and food remained. The R² increased to .95. At step 3, shelter entered the procedure and both fuel oil and food remained in the model. The R² at this step was .96. The final model was:

All = –1.0615 + 0.474 Food + 0.269 Fuel Oil + 0.249 Shelter

The t ratios were: t(food) = 8.32, t(fuel oil) = 2.81, t(shelter) = 2.56.

15.31 The regression model was:

Grocery = 76.23 + 0.08592 Housing + 0.16767 Utility + 0.0284 Transportation – 0.0659 Healthcare

F = 2.29 with p = .095, which is not significant at α = .05.

se = 4.416, R² = .315, and adjusted R² = .177.

Only one of the four predictors has a significant t ratio: Utility, with t = 2.57 and p = .018. The remaining t ratios and their respective probabilities are:

t(housing) = 1.68 with p = .109, t(transportation) = 0.17 with p = .87, and t(healthcare) = –0.64 with p = .53.

This model is very weak. Only the predictor, Utility, shows much promise in accounting for the grocery variability.

15.33 Of the three predictors, x2 is an indicator variable. An examination of the stepwise regression output reveals that there were three steps and that all three predictors end up in the final model. Variable x3 is the strongest individual predictor of y and entered at step 1, resulting in an R² of .8124. At step 2, x2 entered the process and variable x3 remained in the model. The R² at this step was .8782. At step 3, variable x1 entered the procedure. Variables x3 and x2 remained in the model. The final R² was .9407. The final model was:

ŷ = 87.89 + 0.071 x3 – 2.71 x2 – 0.256 x1