
Lecture 14: Multiple and Logistic Regression

Fall 2013. Prof. Yao Xie, [email protected], H. Milton Stewart School of Industrial and Systems Engineering, Georgia Tech

1 Outline

• Multiple regression
• Logistic regression

2 Simple linear regression

Based on the scatter diagram, it is probably reasonable to assume that the mean of the random variable Y is related to X by the following simple linear regression model:

Yi = β0 + β1 Xi + εi,   i = 1, 2, ..., n,   εi ∼ N(0, σ²)

Here Y is the response, X is the regressor (predictor) variable, β0 is the intercept, β1 is the slope, and εi is the random error,

where the slope and intercept of the line are called regression coefficients.

• The case of simple linear regression considers a single regressor or predictor x and a dependent or response variable Y.
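A minimal sketch (not part of the original slides, assuming NumPy is available) of fitting this simple linear regression model to data; np.polyfit returns the least squares estimates of the slope β1 and intercept β0:

    import numpy as np

    # Synthetic (x, y) data, for illustration only
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=30)
    y = 1.0 + 2.5 * x + rng.normal(scale=1.0, size=30)

    # Least squares fit of y = b0 + b1*x; polyfit returns [b1, b0]
    b1, b0 = np.polyfit(x, y, deg=1)
    print(b0, b1)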

3 Multiple linear regression

• Simple linear regression: one predictor variable x
• Multiple linear regression: multiple predictor variables x1, x2, …, xk
• Example:
  – simple linear regression: property tax = a*house price + b
  – multiple linear regression: property tax = a1*house price + a2*house size + b
• Question: how do we fit a multiple linear regression model? (see the sketch below; the details are developed via least squares in what follows)
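A minimal preview sketch (not from the original slides, assuming NumPy) of what "fitting" means for the property-tax example; the data are synthetic stand-ins, and the least squares machinery behind np.linalg.lstsq is developed in the rest of the lecture:

    import numpy as np

    # Synthetic data for illustration: house price, house size, and property tax
    rng = np.random.default_rng(0)
    price = rng.uniform(100, 500, size=40)
    size = rng.uniform(50, 250, size=40)
    tax = 5.0 + 0.02 * price + 0.01 * size + rng.normal(scale=0.5, size=40)

    # Design matrix [1, price, size]; lstsq returns least squares estimates [b, a1, a2]
    X = np.column_stack([np.ones_like(price), price, size])
    coef, *_ = np.linalg.lstsq(X, tax, rcond=None)
    print(coef)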

4 Chapter 12: Multiple Linear Regression (overview)

Chapter 12 outline:
12-1 Multiple Linear Regression Model
12-2 Hypothesis Tests in Multiple Linear Regression (12-2.1 Test for Significance of Regression; 12-2.2 Tests on Individual Regression Coefficients and Subsets of Coefficients)
12-3 Confidence Intervals in Multiple Linear Regression (12-3.1 Confidence Intervals on Individual Regression Coefficients; 12-3.2 Confidence Interval on the Mean Response)
12-4 Prediction of New Observations
12-5 Model Adequacy Checking (12-5.1 Residual Analysis; 12-5.2 Influential Observations)
12-6 Aspects of Multiple Regression Modeling (12-6.1 Polynomial Regression Models; 12-6.2 Categorical Regressors and Indicator Variables; 12-6.3 Selection of Variables and Model Building; 12-6.4 Multicollinearity)

LEARNING OBJECTIVES

After careful study of this chapter you should be able to do the following:
1. Use multiple regression techniques to build empirical models of engineering and scientific data
2. Understand how the method of least squares extends to fitting multiple regression models
3. Assess regression model adequacy
4. Test hypotheses and construct confidence intervals on the regression coefficients
5. Use the regression model to estimate the mean response and to make predictions and to construct confidence intervals and prediction intervals
6. Build regression models with polynomial terms
7. Use indicator variables to model categorical regressors
8. Use stepwise regression and other model building techniques to select the appropriate set of variables for a regression model

5 Multiple linear regression model

12-1 MULTIPLE LINEAR REGRESSION MODEL

12-1.1 Introduction

Many applications of regression analysis involve situations in which there is more than one regressor or predictor variable. A regression model that contains more than one regressor variable is called a multiple regression model.

As an example, suppose that the effective life of a cutting tool depends on the cutting speed and the tool angle. A multiple linear regression model with two regressors (predictor variables) that might describe this relationship is

Y = β0 + β1 x1 + β2 x2 + ε     (12-1)

where Y represents the tool life, x1 represents the cutting speed, x2 represents the tool angle, and ε is a random error term. The term linear is used because Equation 12-1 is a linear function of the unknown parameters β0, β1, and β2.

The regression model in Equation 12-1 describes a plane in the three-dimensional space of Y, x1, and x2. Figure 12-1(a) shows this plane for the regression model

E(Y) = 50 + 10 x1 + 7 x2

where we have assumed that the expected value of the error term is zero, that is, E(ε) = 0. The parameter β0 is the intercept of the plane. We sometimes call β1 and β2 partial regression coefficients, because β1 measures the expected change in Y per unit change in x1 when x2 is held constant, and β2 measures the expected change in Y per unit change in x2 when x1 is held constant. Figure 12-1(b) shows a contour plot of the regression model, that is, lines of constant E(Y) as a function of x1 and x2. Notice that the contour lines in this plot are straight lines.

[Figure 12-1: (a) the regression plane for the model E(Y) = 50 + 10x1 + 7x2; (b) the corresponding contour plot.]

In general, the dependent variable or response Y may be related to k independent or regressor variables. The model

Y = β0 + β1 x1 + β2 x2 + ... + βk xk + ε     (12-2)

is called a multiple linear regression model with k regressor variables. The parameters βj, j = 0, 1, ..., k, are called the regression coefficients. This model describes a hyperplane in the k-dimensional space of the regressor variables {xj}. The parameter βj represents the expected change in response Y per unit change in xj when all the remaining regressors xi (i ≠ j) are held constant.

Multiple linear regression models are often used as approximating functions. That is, the true functional relationship between Y and x1, x2, ..., xk is unknown, but over certain ranges of the independent variables the linear regression model is an adequate approximation.

Models that are more complex in structure than Equation 12-2 may often still be analyzed by multiple linear regression techniques. For example, consider the cubic polynomial model in one regressor variable

Y = β0 + β1 x + β2 x² + β3 x³ + ε     (12-3)

If we let x1 = x, x2 = x², x3 = x³, Equation 12-3 can be written as

Y = β0 + β1 x1 + β2 x2 + β3 x3 + ε     (12-4)

which is a multiple linear regression model with three regressor variables.
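As an illustration of the change of variables in Equations 12-3 and 12-4, here is a minimal sketch (not part of the original slides, assuming NumPy and synthetic data) that fits a cubic polynomial by ordinary multiple linear regression:

    import numpy as np

    # Synthetic one-regressor data (illustration only)
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 5, size=50)
    y = 2.0 + 1.5 * x - 0.8 * x**2 + 0.1 * x**3 + rng.normal(scale=0.5, size=50)

    # Rewrite the cubic model as a multiple linear regression with
    # regressors x1 = x, x2 = x^2, x3 = x^3 (Equation 12-4)
    X = np.column_stack([np.ones_like(x), x, x**2, x**3])

    # Ordinary least squares fit of the four coefficients
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(beta_hat)  # estimates of β0, β1, β2, β3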


6 More complex models can still be analyzed using multiple linear regression

• Cubic polynomial (Equations 12-3 and 12-4 above)
• Interaction effect

Models that include interaction effects may also be analyzed by multiple linear regression methods. An interaction between two variables can be represented by a cross-product term in the model, such as

Y = β0 + β1 x1 + β2 x2 + β12 x1 x2 + ε     (12-5)

If we let x3 = x1 x2 and β3 = β12, Equation 12-5 can be written as

Y = β0 + β1 x1 + β2 x2 + β3 x3 + ε

which is a linear regression model.

Figure 12-2(a) and (b) shows the three-dimensional plot and the corresponding two-dimensional contour plot of the regression model

Y = 50 + 10 x1 + 7 x2 + 5 x1 x2

Notice that, although this model is a linear regression model, the shape of the surface that is generated by the model is not linear. In general, any regression model that is linear in the parameters (the β's) is a linear regression model, regardless of the shape of the surface that it generates.

Figure 12-2 provides a nice graphical interpretation of an interaction. Generally, interaction implies that the effect produced by changing one variable (x1, say) depends on the level of the other variable (x2). For example, Fig. 12-2 shows that changing x1 from 2 to 8 produces a much smaller change in E(Y) when x2 = 2 than when x2 = 10. Interaction effects occur frequently in the study and analysis of real-world systems, and regression methods are one of the techniques that we can use to describe them.

As a final example, consider the second-order model with interaction

Y = β0 + β1 x1 + β2 x2 + β11 x1² + β22 x2² + β12 x1 x2 + ε     (12-6)

If we let x3 = x1², x4 = x2², x5 = x1 x2, β3 = β11, β4 = β22, and β5 = β12, Equation 12-6 can be written as a multiple linear regression model as follows:

Y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4 + β5 x5 + ε
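A tiny sketch (added here, not in the original slides) that checks the interaction interpretation above for the model of Figure 12-2, E(Y) = 50 + 10x1 + 7x2 + 5x1x2: the effect of changing x1 from 2 to 8 depends on the level of x2.

    # Interaction model from Figure 12-2: E(Y) = 50 + 10*x1 + 7*x2 + 5*x1*x2
    def ey(x1, x2):
        return 50 + 10 * x1 + 7 * x2 + 5 * x1 * x2

    # Change in E(Y) when x1 goes from 2 to 8, at two levels of x2
    print(ey(8, 2) - ey(2, 2))    # 120  (small change when x2 = 2)
    print(ey(8, 10) - ey(2, 10))  # 360  (much larger change when x2 = 10)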

Figure 12-3(a) and (b) show the three-dimensional plot and the corresponding contour plot for

E(Y) = 800 + 10 x1 + 7 x2 − 8.5 x1² − 5 x2² + 4 x1 x2

These plots indicate that the expected change in Y when x1 is changed by one unit (say) is a function of both x1 and x2. The quadratic and interaction terms in this model produce a mound-shaped function. Depending on the values of the regression coefficients, the second-order model with interaction is capable of assuming a wide variety of shapes; thus, it is a very flexible regression model.

[Figure 12-2: surface and contour plots of E(Y) = 50 + 10x1 + 7x2 + 5x1x2. Figure 12-3: surface and contour plots of E(Y) = 800 + 10x1 + 7x2 − 8.5x1² − 5x2² + 4x1x2.]

12-1.2 Least Squares Estimation of the Parameters

The method of least squares may be used to estimate the regression coefficients in the multiple regression model, Equation 12-2.

8 Data for multiple regression

Suppose that n > k observations are available, and let xij denote the ith observation or level of variable xj. The observations are

(xi1, xi2, ..., xik, yi),   i = 1, 2, ..., n,   with n > k

It is customary to present the data for multiple regression in a table such as Table 12-1. Each observation (xi1, xi2, ..., xik, yi) satisfies the model in Equation 12-2, or

yi = β0 + β1 xi1 + β2 xi2 + ... + βk xik + εi
   = β0 + Σ_{j=1}^{k} βj xij + εi,   i = 1, 2, ..., n     (12-7)

Table 12-1  Data for Multiple Linear Regression

    y     x1     x2     ...   xk
    y1    x11    x12    ...   x1k
    y2    x21    x22    ...   x2k
    ⋮     ⋮      ⋮            ⋮
    yn    xn1    xn2    ...   xnk


9 Least square estimate of coefficients

The least squares function is

L = Σ_{i=1}^{n} εi² = Σ_{i=1}^{n} ( yi − β0 − Σ_{j=1}^{k} βj xij )²     (12-8)

We want to minimize L with respect to β0, β1, ..., βk. The least squares estimates of β0, β1, ..., βk must satisfy (set the derivatives to 0):

∂L/∂β0 = −2 Σ_{i=1}^{n} ( yi − β̂0 − Σ_{j=1}^{k} β̂j xij ) = 0     (12-9a)

∂L/∂βj = −2 Σ_{i=1}^{n} ( yi − β̂0 − Σ_{j=1}^{k} β̂j xij ) xij = 0,   j = 1, 2, ..., k     (12-9b)

Simplifying Equation 12-9, we obtain the least squares normal equations (all sums run over i = 1, ..., n):

n β̂0      + β̂1 Σ xi1      + β̂2 Σ xi2      + ... + β̂k Σ xik      = Σ yi
β̂0 Σ xi1  + β̂1 Σ xi1²     + β̂2 Σ xi1 xi2  + ... + β̂k Σ xi1 xik  = Σ xi1 yi
  ⋮
β̂0 Σ xik  + β̂1 Σ xik xi1  + β̂2 Σ xik xi2  + ... + β̂k Σ xik²     = Σ xik yi     (12-10)

Note that there are p = k + 1 normal equations, one for each of the k + 1 unknown regression coefficients, so the coefficients can be uniquely determined. The solution to the normal equations gives the least squares estimators of the regression coefficients β̂0, β̂1, ..., β̂k. The normal equations can be solved by any method appropriate for solving a system of linear equations.

EXAMPLE 12-1  Wire Bond Strength

In Chapter 1, we used data on pull strength of a wire bond in a semiconductor manufacturing process, wire length, and die height to illustrate building an empirical model. We will use the same data, repeated for convenience in Table 12-2, and show the details of estimating the model parameters. A three-dimensional scatter plot of the data is presented in Fig. 1-15. Figure 12-4 shows a matrix of two-dimensional scatter plots of the data. These displays can be helpful in visualizing the relationships among variables in a multivariable data set. For example, the plot indicates that there is a strong linear relationship between strength and wire length.

Specifically, we will fit the multiple linear regression model

Y = β0 + β1 x1 + β2 x2 + ε

where Y = pull strength, x1 = wire length, and x2 = die height. From the data in Table 12-2 we calculate

n = 25,   Σ yi = 725.82,   Σ xi1 = 206,   Σ xi2 = 8,294,
Σ xi1² = 2,396,   Σ xi2² = 3,531,848,   Σ xi1 xi2 = 77,177,
Σ xi1 yi = 8,008.47,   Σ xi2 yi = 274,816.71
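A minimal sketch (not part of the original example, assuming NumPy) that assembles the normal equations (12-10) for this model from the sums above and solves them; it should reproduce the coefficient estimates reported below:

    import numpy as np

    # X'X and X'y built from the sums computed in Example 12-1 (n = 25)
    XtX = np.array([[25.0,   206.0,    8294.0],
                    [206.0,  2396.0,   77177.0],
                    [8294.0, 77177.0,  3531848.0]])
    Xty = np.array([725.82, 8008.47, 274816.71])

    # Solve the normal equations X'X * beta = X'y
    beta_hat = np.linalg.solve(XtX, Xty)
    print(beta_hat)  # approximately [2.26379, 2.74427, 0.01253]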

The solution to this set of equations is

β̂0 = 2.26379,   β̂1 = 2.74427,   β̂2 = 0.01253

Therefore, the fitted regression equation is

ŷ = 2.26379 + 2.74427 x1 + 0.01253 x2

Practical Interpretation: This equation can be used to predict pull strength for pairs of values of the regressor variables wire length (x1) and die height (x2). This is essentially the same regression model given in Section 1-3. Figure 1-16 shows a three-dimensional plot of the plane of predicted values ŷ generated from this equation.

10 Matrix form for multiple linear regression

12-1.3 Matrix Approach to Multiple Linear Regression

In fitting a multiple regression model, it is much more convenient to express the mathematical operations using matrix notation. Suppose that there are k regressor variables and n observations, (xi1, xi2, ..., xik, yi), i = 1, 2, ..., n, and that the model relating the regressors to the response is

yi = β0 + β1 xi1 + β2 xi2 + ... + βk xik + εi,   i = 1, 2, ..., n

This model is a system of n equations that can be expressed in matrix notation as

y = Xβ + ε     (12-11)

where y = (y1, y2, ..., yn)' is the vector of observations, β = (β0, β1, ..., βk)' is the vector of regression coefficients, ε = (ε1, ε2, ..., εn)' is the vector of random errors, and the model matrix is

X = [ 1   x11   x12   ...   x1k ]
    [ 1   x21   x22   ...   x2k ]
    [ ⋮    ⋮     ⋮           ⋮  ]
    [ 1   xn1   xn2   ...   xnk ]

In general, y is an (n × 1) vector of the observations, X is an (n × p) matrix of the levels of the independent variables (the intercept is always multiplied by a constant value, unity), β is a (p × 1) vector of the regression coefficients, and ε is an (n × 1) vector of random errors. The X matrix is often called the model matrix.

We wish to find the vector of least squares estimators, β̂, that minimizes the least squares function

L = Σ_{i=1}^{n} εi² = ε'ε = (y − Xβ)'(y − Xβ)

The least squares estimator β̂ is the solution for β in the equations

∂L/∂β = 0

We will not give the details of taking the derivatives; the resulting equations that must be solved are the normal equations

X'X β̂ = X'y     (12-12)

11 Matrix normal equations and least squares estimate of β

Equations 12-12 are the least squares normal equations in matrix form. They are identical to the scalar form of the normal equations given earlier in Equations 12-10. To solve the normal equations, multiply both sides of Equations 12-12 by the inverse of X'X. Therefore, the least squares estimate of β is

β̂ = (X'X)⁻¹ X'y     (12-13)

Note that there are p = k + 1 normal equations in p = k + 1 unknowns (the values of β̂0, β̂1, ..., β̂k). Furthermore, the matrix X'X is always nonsingular, as was assumed above, so the methods described in textbooks on determinants and matrices for inverting these matrices can be used to find (X'X)⁻¹. In practice, multiple regression calculations are almost always performed using a computer.

It is easy to see that the matrix form of the normal equations is identical to the scalar form. Writing out Equation 12-12 in detail (all sums over i = 1, ..., n), we obtain

[ n         Σ xi1       Σ xi2       ...   Σ xik     ] [ β̂0 ]   [ Σ yi     ]
[ Σ xi1     Σ xi1²      Σ xi1 xi2   ...   Σ xi1 xik ] [ β̂1 ] = [ Σ xi1 yi ]
[ ⋮          ⋮           ⋮                 ⋮        ] [ ⋮  ]   [ ⋮        ]
[ Σ xik     Σ xik xi1   Σ xik xi2   ...   Σ xik²    ] [ β̂k ]   [ Σ xik yi ]

If the indicated matrix multiplication is performed, the scalar form of the normal equations (that is, Equation 12-10) will result. In this form it is easy to see that X'X is a (p × p) symmetric matrix and X'y is a (p × 1) column vector. Note the special structure of the X'X matrix: the diagonal elements of X'X are the sums of squares of the elements in the columns of X, and the off-diagonal elements are the sums of cross-products of the elements in the columns of X. Furthermore, note that the elements of X'y are the sums of cross-products of the columns of X and the observations yi.

The fitted regression model is

ŷi = β̂0 + Σ_{j=1}^{k} β̂j xij,   i = 1, 2, ..., n     (12-14)

In matrix notation, the fitted model is

ŷ = X β̂

The difference between the observation yi and the fitted value ŷi is a residual, say, ei = yi − ŷi. The (n × 1) vector of residuals is denoted by

e = y − ŷ     (12-15)
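A minimal NumPy sketch (not in the original slides) of Equations 12-13 through 12-15: build the model matrix X, solve the normal equations for β̂, and compute fitted values and residuals. The data here are synthetic placeholders.

    import numpy as np

    def fit_mlr(regressors, y):
        """Least squares fit via the matrix normal equations (12-12)/(12-13)."""
        n = len(y)
        # Model matrix: a column of ones for the intercept, then the regressors
        X = np.column_stack([np.ones(n), regressors])
        beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # beta_hat = (X'X)^(-1) X'y
        y_hat = X @ beta_hat                          # fitted values, Equation 12-14
        e = y - y_hat                                 # residuals, Equation 12-15
        return beta_hat, y_hat, e

    # Synthetic example with two regressors (illustration only)
    rng = np.random.default_rng(1)
    Z = rng.uniform(0, 10, size=(30, 2))
    y = 5 + 2.0 * Z[:, 0] + 0.5 * Z[:, 1] + rng.normal(scale=1.0, size=30)
    beta_hat, y_hat, e = fit_mlr(Z, y)
    print(beta_hat)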

In matrix notation, the fitted model is

yˆ ϭ X␤ˆ

The difference between the observation yi and the fitted value yˆi is a residual, say, ei ϭ yi Ϫ yˆi. The (n ϫ 1) vector of residuals is denoted by

e ϭ y Ϫ yˆ (12-15) JWCL232_c12_449-512.qxd 1/15/10 10:07 PM Page 457
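To make the matrix formulas concrete, here is a minimal R sketch (hypothetical data generated only for illustration; none of these objects appear in the lecture) that solves the normal equations directly and checks the result against lm():

set.seed(1)
n  <- 20
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 5)
y  <- 3 + 2*x1 - 1.5*x2 + rnorm(n)          # made-up "true" coefficients, for illustration only

X <- cbind(1, x1, x2)                        # model matrix: first column of ones for the intercept
beta.hat <- solve(t(X) %*% X, t(X) %*% y)    # beta-hat = (X'X)^(-1) X'y   (Equation 12-13)
beta.hat

fit <- lm(y ~ x1 + x2)                       # lm() produces the same estimates
coef(fit)

y.hat <- as.vector(X %*% beta.hat)           # fitted values, y-hat = X beta-hat   (Equation 12-14)
e     <- y - y.hat                           # residuals, e = y - y-hat            (Equation 12-15)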


Table 12-3 Observations, Fitted Values, and Residuals for Example 12-2

 Obs    yi      ŷi     ei = yi − ŷi     Obs    yi      ŷi     ei = yi − ŷi
  1     9.95    8.38      1.57          14    11.66   12.26     −0.60
  2    24.45   25.60     −1.15          15    21.65   15.81      5.84
  3    31.75   33.95     −2.20          16    17.89   18.25     −0.36
  4    35.00   36.60     −1.60          17    69.00   64.67      4.33
  5    25.02   27.91     −2.89          18    10.30   12.34     −2.04
  6    16.86   15.75      1.11          19    34.93   36.47     −1.54
  7    14.38   12.45      1.93          20    46.59   46.56      0.03
  8     9.60    8.40      1.20          21    44.88   47.06     −2.18
  9    24.35   28.21     −3.86          22    54.12   52.56      1.56
 10    27.50   27.98     −0.48          23    56.63   56.31      0.32
 11    17.08   18.40     −1.32          24    22.13   19.98      2.15
 12    37.00   37.46     −0.46          25    21.15   21.00      0.15
 13    41.95   41.46      0.49

Computers are almost always used in fitting multiple regression models. Table 12-4 presents some annotated output from Minitab for the least squares regression model for the wire bond pull strength data. The upper part of the table contains the numerical estimates of the regression coefficients. The computer also calculates several other quantities that reflect important information about the regression model. In subsequent sections, we will define and explain the quantities in this output.

Estimating σ²

Just as in simple linear regression, it is important to estimate σ², the variance of the error term ε, in a multiple regression model. Recall that in simple linear regression the estimate of σ² was obtained by dividing the sum of the squared residuals by n − 2. Now there are two parameters in the simple linear regression model, so in multiple linear regression with p parameters a logical estimator for σ² is

• Estimator of variance

    σ̂² = (Σi ei²) / (n − p) = SSE / (n − p)    (12-16)

This is an unbiased estimator of σ². Just as in simple linear regression, the estimate of σ² is usually obtained from the analysis of variance for the regression model. The numerator of Equation 12-16 is called the error or residual sum of squares, and the denominator n − p is called the error or residual degrees of freedom.

We can find a computing formula for SSE as follows:

    SSE = Σi (yi − ŷi)² = Σi ei² = e′e

Substituting e = y − ŷ = y − Xβ̂ into the above, we obtain

    SSE = y′y − β̂′X′y = 27,178.5316 − 27,063.3581 = 115.174    (12-17)

(the numerical values refer to the wire bond pull strength example).
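Continuing the sketch above (still hypothetical data), SSE and the variance estimate of Equation 12-16 can be computed from the fit object and compared with the value lm() reports:

SSE        <- sum(resid(fit)^2)          # error (residual) sum of squares
sigma2.hat <- SSE / df.residual(fit)     # SSE/(n - p), Equation 12-16
sigma2.hat
summary(fit)$sigma^2                     # lm() reports sigma = sqrt(SSE/(n - p))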

Multiple Linear Regression

A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is now modeled as a function of several explanatory variables. The function lm can be used to perform multiple linear regression in R and much of the syntax is the same as that used for fitting simple linear regression models. To perform multiple linear regression with p explanatory variables use the command:

lm(response ~ explanatory_1 + explanatory_2 + … + explanatory_p)

Here the terms response and explanatory_i in the function should be replaced by the names of the response and explanatory variables, respectively, used in the analysis.

Ex. Data was collected on 100 houses recently sold in a city. It consisted of the sales price (in $), house size (in square feet), the number of bedrooms, the number of bathrooms, the lot size (in square feet) and the annual real estate tax (in $).

• Read data

The following program reads in the data.

> Housing = read.table("C:/Users/Martin/Documents/W2024/housing.txt", header=TRUE)
> Housing
    Taxes Bedrooms Baths  Price Size   Lot
1    1360        3   2.0 145000 1240 18000
2    1050        1   1.0  68000  370 25000
…..
99   1770        3   2.0  88400 1560 12000
100  1430        3   2.0 127200 1340 18000

Suppose we are only interested in working with a subset of the variables (e.g., “Price”, “Size” and “Lot”). It is possible (but not necessary) to construct a new data frame consisting solely of these values using the commands:

> myvars = c("Price", "Size", "Lot")
> Housing2 = Housing[myvars]
> Housing2
     Price Size   Lot
1   145000 1240 18000
2    68000  370 25000
……..
99   88400 1560 12000
100 127200 1340 18000

We want to fit a linear regression model with
• response variable: price
• predictor variables: size, lot


• Create multiple scatter plot

Before fitting our regression model we want to investigate how the variables are related to one another. We can do this graphically by constructing scatter plots of all pair-wise combinations of variables in the data frame. This can be done by typing:

> plot(Housing2)
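For a data frame with several columns, plot() produces the same pair-wise scatter-plot matrix as pairs(), so (as an aside, not from the lecture) an equivalent command is:

> pairs(Housing2)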


To fit a multiple linear regression model with price as the response variable and size and lot as the explanatory variables, use the command:

> results = lm(Price ~ Size + Lot, data=Housing)
> results

Call:
lm(formula = Price ~ Size + Lot, data = Housing)

Coefficients:
(Intercept)         Size          Lot
 -10535.951       53.779        2.840

This output indicates that the fitted value is given by ŷ = −10536 + 53.8x1 + 2.8x2, where x1 = Size and x2 = Lot.
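As a quick use of the fitted model (the house below is purely hypothetical, not one of the 100 in the data set), predictions can be obtained with predict():

> predict(results, newdata = data.frame(Size = 1500, Lot = 20000))

This returns about 126,933, i.e. −10535.951 + 53.779(1500) + 2.840(20000), the estimated mean price of such a house.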


• Inference

Inference in the multiple regression setting is typically performed in a number of steps. We begin by testing whether the explanatory variables collectively have an effect on the response variable, i.e.

H0: β1 = β2 = … = βp = 0

If we can reject this hypothesis, we continue by testing whether the individual regression coefficients are significant while controlling for the other variables in the model.

We can access the results of each test by typing:

> summary(results)

Call:
lm(formula = Price ~ Size + Lot, data = Housing)

Residuals:
   Min     1Q Median     3Q    Max
-81681 -19926   2530  17972  84978

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.054e+04  9.436e+03  -1.117    0.267
Size         5.378e+01  6.529e+00   8.237 8.39e-13 ***
Lot          2.840e+00  4.267e-01   6.656 1.68e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 30590 on 97 degrees of freedom
Multiple R-squared: 0.7114, Adjusted R-squared: 0.7054
F-statistic: 119.5 on 2 and 97 DF, p-value: < 2.2e-16

The output shows that F = 119.5 (p-value < 2.2e-16), so we reject H0 and conclude that the explanatory variables Size and Lot collectively have a significant effect on Price. The individual t-tests then show that both Size (p = 8.39e-13) and Lot (p = 1.68e-09) are significant after controlling for the other variable.

Sometimes we are interested in simultaneously testing whether a certain subset of the coefficients are equal to 0 (e.g., β3 = β4 = 0). We can do this using a partial F-test. This test involves comparing the SSE from a reduced model (excluding the parameters we hypothesize are equal to zero) with the SSE from the full model (including all of the parameters).
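In R, a partial F-test of this kind can be carried out with anova() by comparing a reduced and a full model; the sketch below (with an illustrative choice of which coefficients to set to zero) uses the other housing variables:

> full    = lm(Price ~ Size + Lot + Bedrooms + Baths, data=Housing)
> reduced = lm(Price ~ Size + Lot, data=Housing)   # hypothesizes the Bedrooms and Baths coefficients are 0
> anova(reduced, full)                             # partial F-test based on the two SSEs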


11-10 LOGISTIC REGRESSION

Introduction to logistic regression
• linear regression: response variable y is quantitative (real valued)
• logistic regression: response variable y only takes two values, {0, 1}

Linear regression often works very well when the response variable is quantitative. We now consider the situation where the response variable takes on only two possible values, 0 and 1. These could be arbitrary assignments resulting from observing a qualitative response. For example, the response could be the outcome of a functional electrical test on a semiconductor device for which the result is either a “success,” which means the device works properly, or a “failure,” which could be due to a short, an open, or some other functional problem.

Suppose that the model has the form

    Yi = β0 + β1xi + εi    (11-51)

and the response variable Yi takes on the values either 0 or 1. We will assume that the response variable Yi is a Bernoulli random variable with probability distribution as follows:

    Yi    Probability
    1     P(Yi = 1) = πi
    0     P(Yi = 0) = 1 − πi

Now since E(εi) = 0, the expected value of the response variable is

    E(Yi) = 1(πi) + 0(1 − πi) = πi

This implies that

    E(Yi) = β0 + β1xi = πi


This means that the expected response given by the response function E(Yi) = β0 + β1xi is just the probability that the response variable takes on the value 1.

There are some substantive problems with the regression model in Equation 11-51. First, note that if the response is binary, the error terms εi can only take on two values, namely,

    εi = 1 − (β0 + β1xi)   when Yi = 1
    εi = −(β0 + β1xi)      when Yi = 0

Consequently, the errors in this model cannot possibly be normal. Second, the error variance is not constant, since

    σ²Yi = E[Yi − E(Yi)]² = (1 − πi)²πi + (0 − πi)²(1 − πi) = πi(1 − πi)

Notice that this last expression is just

    σ²Yi = E(Yi)[1 − E(Yi)]

since E(Yi) = β0 + β1xi = πi. This indicates that the variance of the observations (which is the same as the variance of the errors because εi = Yi − πi, and πi is a constant) is a function of the mean. Finally, there is a constraint on the response function, because

    0 ≤ E(Yi) = πi ≤ 1

This restriction can cause serious problems with the choice of a linear response function, as we have initially assumed in Equation 11-51. It would be possible to fit a model to the data for which the predicted values of the response lie outside the 0, 1 interval.

Generally, when the response variable is binary, there is considerable empirical evidence indicating that the shape of the response function should be nonlinear. A monotonically increasing (or decreasing) S-shaped (or reverse S-shaped) function, such as shown in Figure 11-19, is usually employed. This function is called the logit response function, and has the form

    E(Y) = exp(β0 + β1x) / [1 + exp(β0 + β1x)]    (11-52)

or equivalently,

    E(Y) = 1 / [1 + exp(−(β0 + β1x))]    (11-53)

[Figure 11-19: Examples of the logistic response function, one monotonically increasing and one monotonically decreasing S-shaped curve of E(Y) versus x.]
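As a side illustration (not part of the lecture), S-shaped logit response functions like those in Figure 11-19 can be sketched in R with curve(); the coefficient values below are chosen arbitrarily to give one increasing and one decreasing curve:

> curve(1/(1 + exp(-(-6 + 1*x))), from=0, to=14, ylab="E(Y)")                    # increasing S-shape (beta0 = -6, beta1 = 1)
> curve(1/(1 + exp(-( 6 - 1*x))), from=0, to=14, ylab="E(Y)", add=TRUE, lty=2)   # decreasing S-shape (beta0 = 6, beta1 = -1)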
In logistic regression we assume that E(Y) is related to x by the logit function. It is easy to show that

    E(Y) / [1 − E(Y)] = exp(β0 + β1x)    (11-54)

The quantity exp(β0 + β1x) on the right-hand side of Equation 11-54 is called the odds ratio. It has a straightforward interpretation: if the odds ratio is 2 for a particular value of x, it means that a success is twice as likely as a failure at that value of the regressor x. Notice that the natural logarithm of the odds ratio is a linear function of the regressor variable. Therefore the slope β1 is the change in the log odds that results from a one-unit increase in x. This means that the odds ratio changes by e^β1 when x increases by one unit.

The parameters in this logistic regression model are usually estimated by the method of maximum likelihood. For details of the procedure, see Montgomery, Peck, and Vining (2006). Minitab will fit logistic regression models and provide useful information on the quality of the fit.

Example
• O-ring failure vs. launch temperature

We will illustrate logistic regression using the data on launch temperature and O-ring failure for the 24 space shuttle launches prior to the Challenger disaster of January 1986. There are six O-rings used to seal field joints on the rocket motor assembly. The table below presents the launch temperatures. A 1 in the “O-Ring Failure” column indicates that at least one O-ring failure had occurred on that launch.

Temperature  O-Ring Failure    Temperature  O-Ring Failure    Temperature  O-Ring Failure
    53             1               68             0               75             0
    56             1               69             0               75             1
    57             1               70             0               76             0
    63             0               70             1               76             0
    66             0               70             1               78             0
    67             0               70             1               79             0
    67             0               72             0               80             0
    67             0               73             0               81             0

Figure 11-20 is a scatter plot of the data. Note that failures tend to occur at lower temperatures. The logistic regression model fit to this data from Minitab is shown in the following boxed display. The fitted logistic regression model is

    ŷ = 1 / (1 + exp[−(10.875 − 0.17132x)])


Binary Logistic Regression: O-Ring Failure versus Temperature

Link Function: Logit

Response Information
Variable    Value    Count
O-Ring F    1            7  (Event)
            0           17
            Total       24

Logistic Regression Table
                                                Odds     95% CI
Predictor      Coef    SE Coef      Z      P    Ratio   Lower  Upper
Constant     10.875      5.703   1.91  0.057
Temperat   -0.17132    0.08344  -2.05  0.040     0.84    0.72   0.99

Log-Likelihood = -11.515
Test that all slopes are zero: G = 5.944, DF = 1, P-Value = 0.015
The standard error of the slope β̂1 is se(β̂1) = 0.08344. For large samples, β̂1 has an approximate normal distribution, and so β̂1/se(β̂1) can be compared to the standard normal distribution to test H0: β1 = 0. Minitab performs this test. The P-value is 0.04, indicating that temperature has a significant effect on the probability of O-ring failure. The odds ratio is 0.84, so every one-degree increase in temperature multiplies the odds of failure by 0.84. Figure 11-21 shows the fitted logistic regression model. The sharp increase in the probability of O-ring failure at low temperatures is very evident in this graph. The actual temperature at the Challenger launch was 31 °F. This is well outside the range of other launch temperatures, so our logistic regression model is not likely to provide highly accurate predictions at that temperature, but it is clear that a launch at 31 °F is almost certainly going to result in O-ring failure.

It is interesting to note that all of these data were available prior to launch. However, engineers were unable to effectively analyze the data and use them to provide a convincing argument against launching the Challenger to NASA managers. Yet a simple regression analysis of these data would have provided such an argument.

[Figure 11-20: Scatter plot of O-ring failures versus launch temperature for 24 space shuttle flights. Figure 11-21: Probability of O-ring failure versus launch temperature, based on the fitted logistic regression model.]
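The lecture's output comes from Minitab, but the same model can be fit in R with glm() and the binomial family. A minimal sketch using the O-ring data from the table above (entered by hand here):

> temp = c(53,56,57,63,66,67,67,67, 68,69,70,70,70,70,72,73, 75,75,76,76,78,79,80,81)
> fail = c( 1, 1, 1, 0, 0, 0, 0, 0,  0, 0, 0, 1, 1, 1, 0, 0,  0, 1, 0, 0, 0, 0, 0, 0)
> oring = glm(fail ~ temp, family=binomial)
> summary(oring)                             # intercept and slope close to the Minitab fit (10.875, -0.17132)
> exp(coef(oring)["temp"])                   # odds ratio, about 0.84
> predict(oring, newdata=data.frame(temp=31), type="response")   # estimated failure probability at 31 degrees F (close to 1)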