Student’s Solutions Manual and Study Guide: Chapter 14

Chapter 14 Building Multiple Regression Models

LEARNING OBJECTIVES

This chapter presents several advanced topics in multiple regression analysis enabling you to:

1. Generalize linear regression models as polynomial regression models using model transformation and Tukey’s ladder of transformation, accounting for possible interaction among the independent variables.

2. Examine the role of indicator, or dummy, variables as predictors or independent variables in multiple regression analysis.

3. Use all possible regressions, stepwise regression, forward selection, and backward elimination search procedures to develop regression models that account for the most variation in the dependent variable and are parsimonious.

4. Recognize when multicollinearity is present, understanding general techniques for preventing and controlling it.

5. Explain when to use logistic regression, and interpret its results.

CHAPTER OUTLINE

14.1 Nonlinear Models: Mathematical Transformation
     Polynomial Regression
     Tukey’s Ladder of Transformations
     Regression Models with Interaction
     Model Transformation

14.2 Indicator (Dummy) Variables

14.3 Model-Building: Search Procedures
     Search Procedures
     All Possible Regressions
     Stepwise Regression
     Forward Selection
     Backward Elimination

Black, Chakrapani, Castillo: Business Statistics, Second Canadian Edition

14.4 Multicollinearity

14.5 Logistic Regression

KEY TERMS

All Possible Regressions
Backward Elimination
Dummy Variable
Forward Selection
Indicator Variable
Multicollinearity
Quadratic Regression Model
Qualitative Variable
Search Procedures
Stepwise Regression
Tukey’s Four-quadrant Approach
Tukey’s Ladder of Transformations
Variance Inflation Factor (VIF)

STUDY QUESTIONS

1. Another name for an indicator variable is a ______ variable. These variables are ______, as opposed to quantitative variables.

2. Indicator variables are coded using ______and ______.

3. Suppose an indicator variable has four categories. In coding this into variables for multiple regression analysis, there should be ______variables.

4. Regression models in which the highest power of any predictor variable is one and in which there are no interaction terms are referred to as ______models.

5. The interaction of two variables can be studied in multiple regression using the ______terms.

6. Suppose a researcher wants to analyze a set of data using the model: ŷ = b0·b1^x. The model would be transformed by taking the ______ of both sides of the equation.

7. Perhaps the most widely known and used of the multiple regression search procedures is ______regression.


8. One multiple regression search procedure is Forward Selection. Forward selection is essentially the same as stepwise regression except that ______.

9. Backward elimination is a step-by-step process that begins with the ______model.

10. A search procedure that computes all the possible linear multiple regression models from the data using all variables is called ______.

11. When two or more of the independent variables of a multiple regression model are highly correlated, it is referred to as ______. This condition causes several other problems to occur, including:

(1) difficulty in interpreting ______.

(2) Inordinately small ______for the regression coefficients may result.

(3) The standard deviations of regression coefficients are ______.

(4) The ______of estimated regression coefficients may be the opposite of what would be expected for a particular predictor variable.


ANSWERS TO STUDY QUESTIONS

1. Dummy, Qualitative

2. 0, 1

3. 3

4. First-Order

5. x1 × x2 or Cross Product

6. Logarithm

7. Stepwise

8. Once a variable is entered into the process, it is never removed

9. Full

10. All Possible Regressions

11. Multicollinearity, the Estimates of the Regression Coefficients, t Values, Overestimated, Algebraic Sign
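Study questions 1–3 above concern coding a qualitative variable into indicator columns. A minimal sketch, using hypothetical category labels, of coding a four-category variable into c – 1 = 3 dummy columns:

```python
# Hedged sketch: coding a 4-category qualitative variable (hypothetical
# "region" labels) into c - 1 = 3 indicator (dummy) columns, as in
# study question 3.  The omitted category ("East") is the baseline and
# codes as all zeros.
categories = ["East", "North", "South", "West"]  # hypothetical labels
baseline = categories[0]
dummy_names = [c for c in categories if c != baseline]

def dummy_code(value):
    """Return the 0/1 indicator row for one observation."""
    return [1 if value == name else 0 for name in dummy_names]

observations = ["North", "East", "West", "South"]
coded = [dummy_code(v) for v in observations]
# Each non-baseline category gets a 1 in exactly one of the 3 columns.
```

Using all four columns instead of three would make the columns sum to a constant and create perfect multicollinearity with the intercept, which is why c – 1 columns are used.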


SOLUTIONS TO PROBLEMS IN CHAPTER 14

14.1 Simple Regression Model:

ŷ = -147.27 + 27.128x

F = 229.67 with p = .000, se = 27.27, R² = .97, adjusted R² = .966, and t = 15.15 (for x) with p = .000. This is a very strong simple regression model.

Quadratic Model (Using both x and x²):

ŷ = -22.01 + 3.385x + 0.9373x²

F = 578.76 with p = .000, se = 12.3, R² = .995, adjusted R² = .993; for x: t = 0.75 with p = .483, and for x²: t = 5.33 with p = .002. The quadratic model is also very strong, with an even higher R² value. However, in this model only the x² term is a significant predictor.
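A quadratic (second-order) model like the one above can be fit by ordinary least squares on x and x². A minimal sketch on hypothetical data generated near the fitted curve (the data themselves are invented for illustration; the problem's actual data are not reproduced here):

```python
import numpy as np

# Hedged sketch: fitting a quadratic model by least squares.
# The generating coefficients below mimic the fitted equation
# y-hat = -22.01 + 3.385x + 0.9373x^2, but the data are synthetic.
rng = np.random.default_rng(0)
x = np.linspace(1, 10, 30)
y = -22.0 + 3.4 * x + 0.94 * x**2 + rng.normal(0, 0.5, x.size)

# np.polyfit returns coefficients from highest power down.
c2, c1, c0 = np.polyfit(x, y, deg=2)
# With modest noise, the fit recovers the generating coefficients.
```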

14.3 Simple regression model:

ŷ = -1,456.6 + 71.017x

R² = .928 and adjusted R² = .910. t = 7.17 (for x) with p = .002.

Quadratic regression model:

ŷ = 1,012 - 14.06x + 0.6115x²

R² = .947 but adjusted R² = .911. The t statistic for the x term is t = -0.17 with p = .876. The t statistic for the x² term is t = 1.03 with p = .377. Neither predictor is significant in the quadratic model. Also, the adjusted R² for this model is virtually identical to that of the simple regression model. The quadratic model adds virtually no predictability that the simple regression model does not already have. The scatter plot of the data follows:


[Scatter plot of the data: Ad Exp (vertical axis, 1,000 to 7,000) versus Eq & Sup Exp (horizontal axis, 30 to 110)]
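The comparison above between the two models rests on adjusted R². A minimal sketch of that arithmetic, where n = 6 observations is inferred from the reported R²/adjusted R² pairs rather than stated in the problem:

```python
# Hedged sketch: the adjusted R^2 arithmetic behind problem 14.3.
# n = 6 is an inference from the reported values, not given data.
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

adj_simple = adjusted_r2(0.928, n=6, k=1)  # simple model
adj_quad = adjusted_r2(0.947, n=6, k=2)    # quadratic model
# Both land near .910-.912: the x^2 term adds essentially nothing
# once the penalty for an extra predictor is applied.
```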

14.5 The regression model is:

ŷ = -28.61 - 2.68x1 + 18.25x2 - 0.2135x1² - 1.533x2² + 1.226x1x2

F = 63.43 with p = .000, significant at α = .001. se = 4.669, R² = .958, and adjusted R² = .943.

None of the t statistics for this model are significant: t(x1) = -0.25 with p = .805, t(x2) = 0.91 with p = .378, t(x1²) = -0.33 with p = .745, t(x2²) = -0.68 with p = .506, and t(x1x2) = 0.52 with p = .613. This model has a high R², yet none of the predictors are individually significant.

The same thing occurs when the interaction term is not in the model. None of the t statistics are significant. The R2 remains high at .957 indicating that the loss of the interaction term was insignificant.
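The squared and cross-product (interaction) terms in the model above are simply derived columns in the design matrix. A minimal numpy sketch with hypothetical predictor values:

```python
import numpy as np

# Hedged sketch: building the full second-order design matrix used in
# problem 14.5 (x1, x2, x1^2, x2^2, and the x1*x2 cross-product term)
# from two hypothetical predictor columns.
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0])

X = np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])
# Column 4 (0-indexed) is the interaction term x1*x2; fitting then
# proceeds as ordinary multiple regression on these five columns.
```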

14.7 The regression equation is:

yˆ = 13.619 - 0.01201 x1 + 2.998 x2

The overall F = 8.43 is significant at α = .01 (p = .009).


se = 1.245, R² = .652, adjusted R² = .575

The t statistic for the x1 variable is only t = -0.14 with p = .893. However, the t statistic for the dummy variable, x2, is t = 3.88 with p = .004. The indicator variable is the significant predictor in this regression model, which has some predictability (adjusted R² = .575).
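A quick way to see the role of the indicator variable: holding x1 fixed, switching x2 from 0 to 1 moves the prediction by exactly the dummy coefficient. A minimal sketch using the fitted coefficients above (the x1 value chosen is arbitrary):

```python
# Hedged sketch: evaluating the fitted equation from problem 14.7.
def y_hat(x1, x2):
    return 13.619 - 0.01201 * x1 + 2.998 * x2

# With x1 held at any fixed value, the dummy shifts the prediction
# by its coefficient, 2.998.
shift = y_hat(50, 1) - y_hat(50, 0)
```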

14.9 This regression model has relatively strong predictability, as indicated by R² = .795. Of the three predictor variables, only x1 and x2 have significant t statistics (using α = .05). x3 (a non-indicator variable) is not a significant predictor. x1, the indicator variable, plays a significant role in this model along with x2.

14.11 The regression equation is:

Price = 3.4394 - 0.0195 Hours + 9.113 ProbSeat + 10.528 Downtown

The overall F = 6.58 with p = .0099, which is significant at α = .01. se = 3.94, R² = .664, and adjusted R² = .563. The difference between R² and adjusted R² indicates that there are some non-significant predictors in the model. The t statistics of Hours (t = -0.13 with p = .901) and Probability of Being Seated (t = 1.34 with p = .209) are non-significant at α = .05. The only significant predictor is the dummy variable, Downtown location or not, which has a t statistic of 3.95 with p = .003, significant at α = .01. The positive coefficient on this variable indicates that a downtown location adds to the price of a meal.

14.13 Stepwise Regression:

Step 1: After developing a simple regression model for each independent variable, we select the model with x2, which has t = -7.35 and R² = .794. The model is ŷ = 36.15 - 0.146x2.

Step 2: x3 enters the model and x2 remains in the model. t for x2 is -4.60; t for x3 is 2.93. R² = .876.

The model is ŷ = 26.40 - 0.101x2 + 0.116x3.

Step 3: The regression model is explored that contains x1 in


addition to x2 and x3. This model does not produce any significant results, so no new variable is added to the model produced in Step 2. Note that at every step of the procedure, the variable x1 appears to be non-significant.
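The stepwise logic of problems like 14.13 can be sketched as a simple forward-selection loop. This is an illustrative simplification on synthetic data: real stepwise procedures test t (or F) statistics rather than a raw R² cutoff, and stepwise regression, unlike pure forward selection, can also remove variables at later steps.

```python
import numpy as np

# Hedged sketch: forward selection by greatest R^2 gain, on synthetic
# data where y truly depends on columns 1 and 2 but not column 0.
def r_squared(X, y):
    """R^2 of an OLS fit of y on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def forward_select(X, y, min_gain=0.05):
    """Add the predictor that raises R^2 most; stop when the gain is small."""
    selected, best_r2 = [], 0.0
    while True:
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        if not remaining:
            return selected
        scores = {j: r_squared(X[:, selected + [j]], y) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best_r2 < min_gain:
            return selected
        selected.append(j_best)
        best_r2 = scores[j_best]

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = 4.0 * X[:, 1] - 2.0 * X[:, 2] + rng.normal(0, 0.5, 40)  # col 0 is noise
chosen = forward_select(X, y)
# The strongest predictor (column 1) enters first, then column 2;
# the noise column never enters.
```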

14.15 The output shows that the final model had four predictor variables: x3, x1, x2, and x6. The variables x4 and x5 did not enter the stepwise analysis. The procedure took four steps. The final model was:

ŷ = 5.96 – 5.00x3 + 3.22x1 + 1.78x2 + 1.56x6

The R² for this model is .5929, and se is 3.36. The t ratios are: x3: t = 3.07; x1: t = 2.05; x2: t = 2.02; and x6: t = 1.98.

14.17 Stepwise Regression:

Step 1: After developing a simple regression model for each independent variable, we select the model for Durability, with t = 3.32. For this model, R² = .379 and se = 15.48. The regression equation is: Amount Spent = 17.093 + 7.135 Durability

Step 2: Regression models are explored that contain Value or Service in addition to Durability. In each case the t value of the new regression coefficient is not significant, so no new variable is added to the model produced in Step 1.

14.19 The correlation matrix is:

          y       x1      x2      x3
  y      1      -.653   -.891    .821
  x1    -.653    1       .650   -.615
  x2    -.891    .650    1      -.688
  x3     .821   -.615   -.688    1


There appears to be some correlation between all pairs of the predictor variables x1, x2, and x3. All pairwise correlations between the independent variables are in the .600 to .700 range in absolute value.

14.21 The predictor intercorrelations are:

              Value   Durability   Service
  Value        1        .559        .533
  Durability   .559    1            .364
  Service      .533     .364       1

An examination of the predictor intercorrelations reveals that Service and Durability have very little correlation, but Value and Durability have a correlation of .559 and Value and Service a correlation of .533. These correlations might suggest multicollinearity.
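One diagnostic the chapter's key terms name is the variance inflation factor (VIF). A minimal sketch, assuming the pairwise correlations above: for standardized predictors, the diagonal of the inverse correlation matrix gives each predictor's VIF.

```python
import numpy as np

# Hedged sketch: VIFs from the predictor intercorrelation matrix of
# problem 14.21 (diagonal of the inverse correlation matrix).
R = np.array([[1.000, 0.559, 0.533],   # Value
              [0.559, 1.000, 0.364],   # Durability
              [0.533, 0.364, 1.000]])  # Service

vif = np.diag(np.linalg.inv(R))
# Value carries the largest VIF (about 1.8), consistent with its
# correlations with both other predictors; all three are well below
# the common trouble thresholds of 5 to 10.
```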

14.23 The log of the odds ratio or logit equation is:

ln(S) = -0.932546 - 0.0000323 Payroll Expenditures.

The G statistic is 11.175, which with one degree of freedom has a p-value of 0.001. Thus, there is overall significance in this model. The predictor, Payroll Expenditures, is significant at α = .01 because the associated p-value of 0.008 is less than α = .01. If the payroll expenditures are $80,000, then

ln(S) = -0.932546 - 0.0000323(80,000) = -3.516546

S = e^(-3.516546) = 0.0297

From this, the probability that the hospital with the $80,000 payroll expenditure is a psychiatric hospital can be determined:

p = S / (1 + S) = 0.0297 / 1.0297 = 0.0288, or 2.88%.
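The odds-to-probability arithmetic above can be checked in a few lines. A minimal sketch using the fitted logit coefficients from the problem:

```python
import math

# Hedged sketch: reproducing problem 14.23's arithmetic from the
# fitted logit ln(S) = -0.932546 - 0.0000323 * Payroll.
logit = -0.932546 - 0.0000323 * 80_000
S = math.exp(logit)   # odds, about 0.0297
p = S / (1 + S)       # probability, about 0.0288 (2.88%)
```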


14.25 The log of the odds ratio or logit equation is:

ln(S) = -3.07942 + 0.0544532 Number of Production Workers.

The G statistic is 97.492, which with one degree of freedom has a p-value of 0.000. Thus, there is overall significance in this model. The p-value associated with the predictor variable, Number of Production Workers, is 0.000. This indicates that Number of Production Workers is a significant predictor in the model at α = .001. If the number of production workers is 30, then

ln(S) = -3.07942 + 0.0544532(30) = -1.445824

S = e^(-1.445824) = 0.23555

From this, the probability that a company with 30 production workers has a large value of industrial shipments can be determined:

p = S / (1 + S) = 0.23555 / 1.23555 = 0.1906, or 19.06%.

14.27 The regression model is:

ŷ = 564.2 - 27.99x1 - 6.155x2 - 15.90x3

F = 11.32 with p = .003, se = 42.88, R² = .809, adjusted R² = .738. Thus, overall, there is statistical significance at α = .01. For x1, t = -0.92 with p = .384; for x2, t = -4.34 with p = .002; for x3, t = -0.71 with p = .497. Thus, only one of the three predictors, x2, is a significant predictor in this model. This model has very good predictability (R² = .809). The gap between R² and adjusted R² underscores the fact that there are two non-significant predictors in the model. x1 is a non-significant indicator variable.

14.29 Stepwise Regression:

Step 1: After developing a simple regression model for each independent variable (x1, Log x1), we select the model for Log x1 because it has the larger absolute value of t = 17.36 (p-value of 0.000). For this model, R² = .9617. The model appears in the form:


ŷ = -13.20 + 11.64 Log x1.

Step 2: The regression model with two predictors is explored that contains x1 in addition to Log x1. At this step, the t ratio for x1 is 0.90 with a p-value of 0.386, indicating that the predictor x1 is non-significant. No new variable is added to the model produced in Step 1.

14.31 Stepwise Regression:

Step 1: After developing a simple regression model for each independent variable (Copper, Silver, Aluminum), we select the model with Silver because it has the largest absolute t statistic: tSilver = 3.32 (p-value of 0.007). The predictor Silver is significant at α = .01. For this model, R² = .5244. The regression equation is: Gold = 233.4 + 17.74 Silver.

Step 2: Regression models with two predictors are explored that contain Copper (or Aluminum) in addition to Silver. At this step, analysis of the t statistics shows the best model: Gold = -50.07 + 18.86 Silver + 3.587 Aluminum. The R² at this step is .8204; the t ratio for Silver is 5.43 with p = .0004, and the t ratio for Aluminum is 3.85 with p = .004.

Step 3: A search is made to determine whether the variable Copper, in conjunction with Silver and Aluminum, produces a significant t value in the model. The model does not produce a significant result. No new variable is added to the model produced in Step 2.

14.33 Let Beef = x1, Chicken = x2, Eggs = x3, Bread = x4, Coffee = x5, and Price Index = y. Stepwise Regression:

Step 1: Using graphs and Tukey’s ladder of transformations, we develop a simple regression model for each independent variable (x1, Log x2, x3, x4, x5). We select the model for x1 because it has the largest absolute value of t = 13.67. For this model, R² = .8696. The model appears

in the form ŷ = 93.62 + 0.2080x1.


Step 2: Regression models with two predictors are explored that contain Log x2 (or x3, x4, x5) in addition to x1. At this step, analysis of the t statistics shows the best model:

ŷ = 86.96 + 0.1427x1 + 0.08561x4. The R² at this step is .9033; the t ratio for x1 is 5.67 with p = .000, and the t ratio for x4 is 3.06 with p = .005 (significant at α = .01).

Step 3: A search is made to determine which of the remaining independent variables in conjunction with x1 and x4 produces the largest significant absolute t value in the model. None of the models produce significant results. No new variables are added to the model produced in Step 2.

14.35 Stepwise Regression:

Step 1: After developing a simple regression model for each independent variable (Familiarity, Satisfaction, Proximity), we select the model with Familiarity because it has the largest absolute t statistic: tFamiliarity = 6.71 (p-value of 0.000). The predictor Familiarity is significant at α = .001. For this model, R² = .6167. The regression equation is: Number of Visits = 0.05488 + 1.0915 Familiarity.

Step 2: A search is made to determine whether Satisfaction or Proximity, in conjunction with Familiarity, produces a significant t value in the model. None of the models produce significant results. No new variables are added to the model produced in Step 1.

14.37 The output shows that the stepwise regression procedure stopped at Step 3. At Step 1, the model with x3 is selected; R² = .8124, and the t statistic for x3 is t = 6.90. The regression equation is ŷ = 74.81 + 0.099x3.

At Step 2, x2 is entered into the model along with x3. The regression equation is ŷ = 82.18 + 0.067x3 - 2.26x2. The t statistics are t = 3.65 for x3 and t = -2.32 for x2. The R² for this model is .8782. At Step 3, x1 is

entered into the model along with x3 and x2. The procedure stops


here with a final model of: ŷ = 87.89 + 0.071x3 - 2.71x2 - 0.256x1. The t statistics are t = 5.22 for x3, t = -3.71 for x2, and t = -3.08 for x1. The R² for this model is .9407, indicating very strong predictability.

14.39 The log of the odds ratio or logit equation is:

ln(S) = -3.94828 + 1.36988 Number of kilometres.

The G statistic is 100.537 with a p-value of 0.000. Thus, the model is significant overall. The degrees of freedom equal 1. The p-value associated with the predictor variable, Number of kilometres, is 0.000. This indicates that Number of kilometres is a significant predictor in the model at α = .001. If a shopper drives 5 kilometres to get to the store, then

ln(S) = -3.94828 + 1.36988(5) = 2.90112

S = e^(2.90112) = 18.1945

From this, the probability that a person would purchase something can be determined:

p = S / (1 + S) = 18.1945 / 19.1945 = 0.948, or about 95%.

This indicates that there is a very high probability that a person who drives 5 kilometres would purchase something. For 4 kilometres, the probability drops to .822; for 3 kilometres, to .540 (almost a coin toss); for 2 kilometres, to .230; and for 1 kilometre, to .071.
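The distance-by-distance probabilities above all follow from the same logit. A minimal sketch using the fitted coefficients from the problem:

```python
import math

# Hedged sketch: purchase probabilities from problem 14.39's logit,
# ln(S) = -3.94828 + 1.36988 * kilometres.
def purchase_prob(km):
    s = math.exp(-3.94828 + 1.36988 * km)  # odds
    return s / (1 + s)                     # probability

probs = {km: round(purchase_prob(km), 3) for km in (1, 2, 3, 4, 5)}
# Matches the text: .071, .230, .540, .822, .948 for 1 through 5 km.
```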


Legal Notice

Copyright

Copyright © 2014 by John Wiley & Sons Canada, Ltd. or related companies. All rights reserved.

The data contained in these files are protected by copyright. This manual is furnished under licence and may be used only in accordance with the terms of such licence.

The material provided herein may not be downloaded, reproduced, stored in a retrieval system, modified, made available on a network, used to create derivative works, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise without the prior written permission of John Wiley & Sons Canada, Ltd.


