
Lecture 14: Multiple and Logistic Regression

Fall 2013. Prof. Yao Xie, [email protected], H. Milton Stewart School of Industrial and Systems Engineering, Georgia Tech

1 Outline

• Multiple regression
• Logistic regression

2 Simple linear regression

Based on the scatter diagram, it is probably reasonable to assume that the mean of the random variable Y is related to X by the following simple linear regression model:

Yi = β0 + β1 Xi + εi,   i = 1, 2, ..., n,   εi ∼ N(0, σ²)

Here Y is the response, X is the regressor (predictor) variable, β0 is the intercept, β1 is the slope, and εi is the random error,

where the slope and intercept of the line are called regression coefficients.

• The case of simple linear regression considers a single regressor or predictor x and a dependent or response variable Y.
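A minimal sketch (not part of the original slides, assuming NumPy is available) of fitting this simple linear regression model to data; np.polyfit returns the least squares estimates of the slope β1 and intercept β0:

    import numpy as np

    # Synthetic (x, y) data, for illustration only
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=30)
    y = 1.0 + 2.5 * x + rng.normal(scale=1.0, size=30)

    # Least squares fit of y = b0 + b1*x; polyfit returns [b1, b0]
    b1, b0 = np.polyfit(x, y, deg=1)
    print(b0, b1)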

3 Multiple linear regression

• Simple linear regression: one predictor variable x
• Multiple linear regression: multiple predictor variables x1, x2, …, xk
• Example:
  – simple linear regression: property tax = a*house price + b
  – multiple linear regression: property tax = a1*house price + a2*house size + b
• Question: how do we fit a multiple linear regression model? (see the sketch below; the details are developed via least squares in what follows)
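A minimal preview sketch (not from the original slides, assuming NumPy) of what "fitting" means for the property-tax example; the data are synthetic stand-ins, and the least squares machinery behind np.linalg.lstsq is developed in the rest of the lecture:

    import numpy as np

    # Synthetic data for illustration: house price, house size, and property tax
    rng = np.random.default_rng(0)
    price = rng.uniform(100, 500, size=40)
    size = rng.uniform(50, 250, size=40)
    tax = 5.0 + 0.02 * price + 0.01 * size + rng.normal(scale=0.5, size=40)

    # Design matrix [1, price, size]; lstsq returns least squares estimates [b, a1, a2]
    X = np.column_stack([np.ones_like(price), price, size])
    coef, *_ = np.linalg.lstsq(X, tax, rcond=None)
    print(coef)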

4 Chapter 12: Multiple Linear Regression (overview)

Chapter 12 outline:
12-1 Multiple Linear Regression Model
12-2 Hypothesis Tests in Multiple Linear Regression (12-2.1 Test for Significance of Regression; 12-2.2 Tests on Individual Regression Coefficients and Subsets of Coefficients)
12-3 Confidence Intervals in Multiple Linear Regression (12-3.1 Confidence Intervals on Individual Regression Coefficients; 12-3.2 Confidence Interval on the Mean Response)
12-4 Prediction of New Observations
12-5 Model Adequacy Checking (12-5.1 Residual Analysis; 12-5.2 Influential Observations)
12-6 Aspects of Multiple Regression Modeling (12-6.1 Polynomial Regression Models; 12-6.2 Categorical Regressors and Indicator Variables; 12-6.3 Selection of Variables and Model Building; 12-6.4 Multicollinearity)

LEARNING OBJECTIVES

After careful study of this chapter you should be able to do the following:
1. Use multiple regression techniques to build empirical models of engineering and scientific data
2. Understand how the method of least squares extends to fitting multiple regression models
3. Assess regression model adequacy
4. Test hypotheses and construct confidence intervals on the regression coefficients
5. Use the regression model to estimate the mean response and to make predictions and to construct confidence intervals and prediction intervals
6. Build regression models with polynomial terms
7. Use indicator variables to model categorical regressors
8. Use stepwise regression and other model building techniques to select the appropriate set of variables for a regression model

5 Multiple linear regression model

12-1 MULTIPLE LINEAR REGRESSION MODEL

12-1.1 Introduction

Many applications of regression analysis involve situations in which there is more than one regressor or predictor variable. A regression model that contains more than one regressor variable is called a multiple regression model.

As an example, suppose that the effective life of a cutting tool depends on the cutting speed and the tool angle. A multiple linear regression model with two regressors (predictor variables) that might describe this relationship is

Y = β0 + β1 x1 + β2 x2 + ε     (12-1)

where Y represents the tool life, x1 represents the cutting speed, x2 represents the tool angle, and ε is a random error term. The term linear is used because Equation 12-1 is a linear function of the unknown parameters β0, β1, and β2.

The regression model in Equation 12-1 describes a plane in the three-dimensional space of Y, x1, and x2. Figure 12-1(a) shows this plane for the regression model

E(Y) = 50 + 10 x1 + 7 x2

where we have assumed that the expected value of the error term is zero, that is, E(ε) = 0. The parameter β0 is the intercept of the plane. We sometimes call β1 and β2 partial regression coefficients, because β1 measures the expected change in Y per unit change in x1 when x2 is held constant, and β2 measures the expected change in Y per unit change in x2 when x1 is held constant. Figure 12-1(b) shows a contour plot of the regression model, that is, lines of constant E(Y) as a function of x1 and x2. Notice that the contour lines in this plot are straight lines.

[Figure 12-1: (a) the regression plane for the model E(Y) = 50 + 10x1 + 7x2; (b) the corresponding contour plot.]

In general, the dependent variable or response Y may be related to k independent or regressor variables. The model

Y = β0 + β1 x1 + β2 x2 + ... + βk xk + ε     (12-2)

is called a multiple linear regression model with k regressor variables. The parameters βj, j = 0, 1, ..., k, are called the regression coefficients. This model describes a hyperplane in the k-dimensional space of the regressor variables {xj}. The parameter βj represents the expected change in response Y per unit change in xj when all the remaining regressors xi (i ≠ j) are held constant.

Multiple linear regression models are often used as approximating functions. That is, the true functional relationship between Y and x1, x2, ..., xk is unknown, but over certain ranges of the independent variables the linear regression model is an adequate approximation.

Models that are more complex in structure than Equation 12-2 may often still be analyzed by multiple linear regression techniques. For example, consider the cubic polynomial model in one regressor variable

Y = β0 + β1 x + β2 x² + β3 x³ + ε     (12-3)

If we let x1 = x, x2 = x², x3 = x³, Equation 12-3 can be written as

Y = β0 + β1 x1 + β2 x2 + β3 x3 + ε     (12-4)

which is a multiple linear regression model with three regressor variables.
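As an illustration of the change of variables in Equations 12-3 and 12-4, here is a minimal sketch (not part of the original slides, assuming NumPy and synthetic data) that fits a cubic polynomial by ordinary multiple linear regression:

    import numpy as np

    # Synthetic one-regressor data (illustration only)
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 5, size=50)
    y = 2.0 + 1.5 * x - 0.8 * x**2 + 0.1 * x**3 + rng.normal(scale=0.5, size=50)

    # Rewrite the cubic model as a multiple linear regression with
    # regressors x1 = x, x2 = x^2, x3 = x^3 (Equation 12-4)
    X = np.column_stack([np.ones_like(x), x, x**2, x**3])

    # Ordinary least squares fit of the four coefficients
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(beta_hat)  # estimates of β0, β1, β2, β3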


6 More complex models can still be analyzed using multiple linear regression

• Cubic polynomial (Equations 12-3 and 12-4 above)
• Interaction effect

Models that include interaction effects may also be analyzed by multiple linear regression methods. An interaction between two variables can be represented by a cross-product term in the model, such as

Y = β0 + β1 x1 + β2 x2 + β12 x1 x2 + ε     (12-5)

If we let x3 = x1 x2 and β3 = β12, Equation 12-5 can be written as

Y = β0 + β1 x1 + β2 x2 + β3 x3 + ε

which is a linear regression model.

Figure 12-2(a) and (b) shows the three-dimensional plot and the corresponding two-dimensional contour plot of the regression model

Y = 50 + 10 x1 + 7 x2 + 5 x1 x2

Notice that, although this model is a linear regression model, the shape of the surface that is generated by the model is not linear. In general, any regression model that is linear in the parameters (the β's) is a linear regression model, regardless of the shape of the surface that it generates.

Figure 12-2 provides a nice graphical interpretation of an interaction. Generally, interaction implies that the effect produced by changing one variable (x1, say) depends on the level of the other variable (x2). For example, Fig. 12-2 shows that changing x1 from 2 to 8 produces a much smaller change in E(Y) when x2 = 2 than when x2 = 10. Interaction effects occur frequently in the study and analysis of real-world systems, and regression methods are one of the techniques that we can use to describe them.

As a final example, consider the second-order model with interaction

Y = β0 + β1 x1 + β2 x2 + β11 x1² + β22 x2² + β12 x1 x2 + ε     (12-6)

If we let x3 = x1², x4 = x2², x5 = x1 x2, β3 = β11, β4 = β22, and β5 = β12, Equation 12-6 can be written as a multiple linear regression model as follows:

Y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4 + β5 x5 + ε
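A tiny sketch (added here, not in the original slides) that checks the interaction interpretation above for the model of Figure 12-2, E(Y) = 50 + 10x1 + 7x2 + 5x1x2: the effect of changing x1 from 2 to 8 depends on the level of x2.

    # Interaction model from Figure 12-2: E(Y) = 50 + 10*x1 + 7*x2 + 5*x1*x2
    def ey(x1, x2):
        return 50 + 10 * x1 + 7 * x2 + 5 * x1 * x2

    # Change in E(Y) when x1 goes from 2 to 8, at two levels of x2
    print(ey(8, 2) - ey(2, 2))    # 120  (small change when x2 = 2)
    print(ey(8, 10) - ey(2, 10))  # 360  (much larger change when x2 = 10)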

Figure 12-3(a) and (b) show the three-dimensional plot and the corresponding contour plot for

E(Y) = 800 + 10 x1 + 7 x2 − 8.5 x1² − 5 x2² + 4 x1 x2

These plots indicate that the expected change in Y when x1 is changed by one unit (say) is a function of both x1 and x2. The quadratic and interaction terms in this model produce a mound-shaped function. Depending on the values of the regression coefficients, the second-order model with interaction is capable of assuming a wide variety of shapes; thus, it is a very flexible regression model.

[Figure 12-2: surface and contour plots of E(Y) = 50 + 10x1 + 7x2 + 5x1x2. Figure 12-3: surface and contour plots of E(Y) = 800 + 10x1 + 7x2 − 8.5x1² − 5x2² + 4x1x2.]

12-1.2 Least Squares Estimation of the Parameters

The method of least squares may be used to estimate the regression coefficients in the multiple regression model, Equation 12-2.

8 Data for multiple regression

Suppose that n > k observations are available, and let xij denote the ith observation or level of variable xj. The observations are

(xi1, xi2, ..., xik, yi),   i = 1, 2, ..., n,   with n > k

It is customary to present the data for multiple regression in a table such as Table 12-1. Each observation (xi1, xi2, ..., xik, yi) satisfies the model in Equation 12-2, or

yi = β0 + β1 xi1 + β2 xi2 + ... + βk xik + εi
   = β0 + Σ_{j=1}^{k} βj xij + εi,   i = 1, 2, ..., n     (12-7)

Table 12-1  Data for Multiple Linear Regression

    y     x1     x2     ...   xk
    y1    x11    x12    ...   x1k
    y2    x21    x22    ...   x2k
    ⋮     ⋮      ⋮            ⋮
    yn    xn1    xn2    ...   xnk


9 Least square estimate of coefficients

The least squares function is

L = Σ_{i=1}^{n} εi² = Σ_{i=1}^{n} ( yi − β0 − Σ_{j=1}^{k} βj xij )²     (12-8)

We want to minimize L with respect to β0, β1, ..., βk. The least squares estimates of β0, β1, ..., βk must satisfy (set the derivatives to 0):

∂L/∂β0 = −2 Σ_{i=1}^{n} ( yi − β̂0 − Σ_{j=1}^{k} β̂j xij ) = 0     (12-9a)

∂L/∂βj = −2 Σ_{i=1}^{n} ( yi − β̂0 − Σ_{j=1}^{k} β̂j xij ) xij = 0,   j = 1, 2, ..., k     (12-9b)

Simplifying Equation 12-9, we obtain the least squares normal equations (all sums run over i = 1, ..., n):

n β̂0      + β̂1 Σ xi1      + β̂2 Σ xi2      + ... + β̂k Σ xik      = Σ yi
β̂0 Σ xi1  + β̂1 Σ xi1²     + β̂2 Σ xi1 xi2  + ... + β̂k Σ xi1 xik  = Σ xi1 yi
  ⋮
β̂0 Σ xik  + β̂1 Σ xik xi1  + β̂2 Σ xik xi2  + ... + β̂k Σ xik²     = Σ xik yi     (12-10)

Note that there are p = k + 1 normal equations, one for each of the k + 1 unknown regression coefficients, so the coefficients can be uniquely determined. The solution to the normal equations gives the least squares estimators of the regression coefficients β̂0, β̂1, ..., β̂k. The normal equations can be solved by any method appropriate for solving a system of linear equations.

EXAMPLE 12-1  Wire Bond Strength

In Chapter 1, we used data on pull strength of a wire bond in a semiconductor manufacturing process, wire length, and die height to illustrate building an empirical model. We will use the same data, repeated for convenience in Table 12-2, and show the details of estimating the model parameters. A three-dimensional scatter plot of the data is presented in Fig. 1-15. Figure 12-4 shows a matrix of two-dimensional scatter plots of the data. These displays can be helpful in visualizing the relationships among variables in a multivariable data set. For example, the plot indicates that there is a strong linear relationship between strength and wire length.

Specifically, we will fit the multiple linear regression model

Y = β0 + β1 x1 + β2 x2 + ε

where Y = pull strength, x1 = wire length, and x2 = die height. From the data in Table 12-2 we calculate

n = 25,   Σ yi = 725.82,   Σ xi1 = 206,   Σ xi2 = 8,294,
Σ xi1² = 2,396,   Σ xi2² = 3,531,848,   Σ xi1 xi2 = 77,177,
Σ xi1 yi = 8,008.47,   Σ xi2 yi = 274,816.71
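A minimal sketch (not part of the original example, assuming NumPy) that assembles the normal equations (12-10) for this model from the sums above and solves them; it should reproduce the coefficient estimates reported below:

    import numpy as np

    # X'X and X'y built from the sums computed in Example 12-1 (n = 25)
    XtX = np.array([[25.0,   206.0,    8294.0],
                    [206.0,  2396.0,   77177.0],
                    [8294.0, 77177.0,  3531848.0]])
    Xty = np.array([725.82, 8008.47, 274816.71])

    # Solve the normal equations X'X * beta = X'y
    beta_hat = np.linalg.solve(XtX, Xty)
    print(beta_hat)  # approximately [2.26379, 2.74427, 0.01253]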

The solution to this set of equations is

β̂0 = 2.26379,   β̂1 = 2.74427,   β̂2 = 0.01253

Therefore, the fitted regression equation is

ŷ = 2.26379 + 2.74427 x1 + 0.01253 x2

Practical Interpretation: This equation can be used to predict pull strength for pairs of values of the regressor variables wire length (x1) and die height (x2). This is essentially the same regression model given in Section 1-3. Figure 1-16 shows a three-dimensional plot of the plane of predicted values ŷ generated from this equation.

10 Matrix form for multiple linear regression

12-1.3 Matrix Approach to Multiple Linear Regression

In fitting a multiple regression model, it is much more convenient to express the mathematical operations using matrix notation. Suppose that there are k regressor variables and n observations, (xi1, xi2, ..., xik, yi), i = 1, 2, ..., n, and that the model relating the regressors to the response is

yi = β0 + β1 xi1 + β2 xi2 + ... + βk xik + εi,   i = 1, 2, ..., n

This model is a system of n equations that can be expressed in matrix notation as

y = Xβ + ε     (12-11)

where y = (y1, y2, ..., yn)' is the vector of observations, β = (β0, β1, ..., βk)' is the vector of regression coefficients, ε = (ε1, ε2, ..., εn)' is the vector of random errors, and the model matrix is

X = [ 1   x11   x12   ...   x1k ]
    [ 1   x21   x22   ...   x2k ]
    [ ⋮    ⋮     ⋮           ⋮  ]
    [ 1   xn1   xn2   ...   xnk ]

In general, y is an (n × 1) vector of the observations, X is an (n × p) matrix of the levels of the independent variables (the intercept is always multiplied by a constant value, unity), β is a (p × 1) vector of the regression coefficients, and ε is an (n × 1) vector of random errors. The X matrix is often called the model matrix.

We wish to find the vector of least squares estimators, β̂, that minimizes the least squares function

L = Σ_{i=1}^{n} εi² = ε'ε = (y − Xβ)'(y − Xβ)

The least squares estimator β̂ is the solution for β in the equations

∂L/∂β = 0

We will not give the details of taking the derivatives; the resulting equations that must be solved are the normal equations

X'X β̂ = X'y     (12-12)

11 Matrix normal equations and least squares estimate of β

Equations 12-12 are the least squares normal equations in matrix form. They are identical to the scalar form of the normal equations given earlier in Equations 12-10. To solve the normal equations, multiply both sides of Equations 12-12 by the inverse of X'X. Therefore, the least squares estimate of β is

β̂ = (X'X)⁻¹ X'y     (12-13)

Note that there are p = k + 1 normal equations in p = k + 1 unknowns (the values of β̂0, β̂1, ..., β̂k). Furthermore, the matrix X'X is always nonsingular, as was assumed above, so the methods described in textbooks on determinants and matrices for inverting these matrices can be used to find (X'X)⁻¹. In practice, multiple regression calculations are almost always performed using a computer.

It is easy to see that the matrix form of the normal equations is identical to the scalar form. Writing out Equation 12-12 in detail (all sums over i = 1, ..., n), we obtain

[ n         Σ xi1       Σ xi2       ...   Σ xik     ] [ β̂0 ]   [ Σ yi     ]
[ Σ xi1     Σ xi1²      Σ xi1 xi2   ...   Σ xi1 xik ] [ β̂1 ] = [ Σ xi1 yi ]
[ ⋮          ⋮           ⋮                 ⋮        ] [ ⋮  ]   [ ⋮        ]
[ Σ xik     Σ xik xi1   Σ xik xi2   ...   Σ xik²    ] [ β̂k ]   [ Σ xik yi ]

If the indicated matrix multiplication is performed, the scalar form of the normal equations (that is, Equation 12-10) will result. In this form it is easy to see that X'X is a (p × p) symmetric matrix and X'y is a (p × 1) column vector. Note the special structure of the X'X matrix: the diagonal elements of X'X are the sums of squares of the elements in the columns of X, and the off-diagonal elements are the sums of cross-products of the elements in the columns of X. Furthermore, note that the elements of X'y are the sums of cross-products of the columns of X and the observations yi.

The fitted regression model is

ŷi = β̂0 + Σ_{j=1}^{k} β̂j xij,   i = 1, 2, ..., n     (12-14)

In matrix notation, the fitted model is

ŷ = X β̂

The difference between the observation yi and the fitted value ŷi is a residual, say, ei = yi − ŷi. The (n × 1) vector of residuals is denoted by

e = y − ŷ     (12-15)
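A minimal NumPy sketch (not in the original slides) of Equations 12-13 through 12-15: build the model matrix X, solve the normal equations for β̂, and compute fitted values and residuals. The data here are synthetic placeholders.

    import numpy as np

    def fit_mlr(regressors, y):
        """Least squares fit via the matrix normal equations (12-12)/(12-13)."""
        n = len(y)
        # Model matrix: a column of ones for the intercept, then the regressors
        X = np.column_stack([np.ones(n), regressors])
        beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # beta_hat = (X'X)^(-1) X'y
        y_hat = X @ beta_hat                          # fitted values, Equation 12-14
        e = y - y_hat                                 # residuals, Equation 12-15
        return beta_hat, y_hat, e

    # Synthetic example with two regressors (illustration only)
    rng = np.random.default_rng(1)
    Z = rng.uniform(0, 10, size=(30, 2))
    y = 5 + 2.0 * Z[:, 0] + 0.5 * Z[:, 1] + rng.normal(scale=1.0, size=30)
    beta_hat, y_hat, e = fit_mlr(Z, y)
    print(beta_hat)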

In matrix notation, the fitted model is

yˆ ϭ X␤ˆ

The difference between the observation yi and the fitted value yˆi is a residual, say, ei ϭ yi Ϫ yˆi. The (n ϫ 1) vector of residuals is denoted by

e ϭ y Ϫ yˆ (12-15) JWCL232_c12_449-512.qxd 1/15/10 10:07 PM Page 457
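To make the matrix formulas concrete, here is a minimal R sketch (hypothetical data generated only for illustration; none of these objects appear in the lecture) that solves the normal equations directly and checks the result against lm():

set.seed(1)
n  <- 20
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 5)
y  <- 3 + 2*x1 - 1.5*x2 + rnorm(n)          # made-up "true" coefficients, for illustration only

X <- cbind(1, x1, x2)                        # model matrix: first column of ones for the intercept
beta.hat <- solve(t(X) %*% X, t(X) %*% y)    # beta-hat = (X'X)^(-1) X'y   (Equation 12-13)
beta.hat

fit <- lm(y ~ x1 + x2)                       # lm() produces the same estimates
coef(fit)

y.hat <- as.vector(X %*% beta.hat)           # fitted values, y-hat = X beta-hat   (Equation 12-14)
e     <- y - y.hat                           # residuals, e = y - y-hat            (Equation 12-15)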


Table 12-3 Observations, Fitted Values, and Residuals for Example 12-2

 Obs    yi      ŷi     ei = yi − ŷi     Obs    yi      ŷi     ei = yi − ŷi
  1     9.95    8.38      1.57          14    11.66   12.26     −0.60
  2    24.45   25.60     −1.15          15    21.65   15.81      5.84
  3    31.75   33.95     −2.20          16    17.89   18.25     −0.36
  4    35.00   36.60     −1.60          17    69.00   64.67      4.33
  5    25.02   27.91     −2.89          18    10.30   12.34     −2.04
  6    16.86   15.75      1.11          19    34.93   36.47     −1.54
  7    14.38   12.45      1.93          20    46.59   46.56      0.03
  8     9.60    8.40      1.20          21    44.88   47.06     −2.18
  9    24.35   28.21     −3.86          22    54.12   52.56      1.56
 10    27.50   27.98     −0.48          23    56.63   56.31      0.32
 11    17.08   18.40     −1.32          24    22.13   19.98      2.15
 12    37.00   37.46     −0.46          25    21.15   21.00      0.15
 13    41.95   41.46      0.49

Computers are almost always used in fitting multiple regression models. Table 12-4 presents some annotated output from Minitab for the least squares regression model for the wire bond pull strength data. The upper part of the table contains the numerical estimates of the regression coefficients. The computer also calculates several other quantities that reflect important information about the regression model. In subsequent sections, we will define and explain the quantities in this output.

Estimating σ²

Just as in simple linear regression, it is important to estimate σ², the variance of the error term ε, in a multiple regression model. Recall that in simple linear regression the estimate of σ² was obtained by dividing the sum of the squared residuals by n − 2. Now there are two parameters in the simple linear regression model, so in multiple linear regression with p parameters a logical estimator for σ² is

• Estimator of variance

    σ̂² = (Σi ei²) / (n − p) = SSE / (n − p)    (12-16)

This is an unbiased estimator of σ². Just as in simple linear regression, the estimate of σ² is usually obtained from the analysis of variance for the regression model. The numerator of Equation 12-16 is called the error or residual sum of squares, and the denominator n − p is called the error or residual degrees of freedom.

We can find a computing formula for SSE as follows:

    SSE = Σi (yi − ŷi)² = Σi ei² = e′e

Substituting e = y − ŷ = y − Xβ̂ into the above, we obtain

    SSE = y′y − β̂′X′y = 27,178.5316 − 27,063.3581 = 115.174    (12-17)

(the numerical values refer to the wire bond pull strength example).
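Continuing the sketch above (still hypothetical data), SSE and the variance estimate of Equation 12-16 can be computed from the fit object and compared with the value lm() reports:

SSE        <- sum(resid(fit)^2)          # error (residual) sum of squares
sigma2.hat <- SSE / df.residual(fit)     # SSE/(n - p), Equation 12-16
sigma2.hat
summary(fit)$sigma^2                     # lm() reports sigma = sqrt(SSE/(n - p))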

Multiple Linear Regression

A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is now modeled as a function of several explanatory variables. The function lm can be used to perform multiple linear regression in R and much of the syntax is the same as that used for fitting simple linear regression models. To perform multiple linear regression with p explanatory variables use the command:

lm(response ~ explanatory_1 + explanatory_2 + … + explanatory_p)

Here the terms response and explanatory_i in the function should be replaced by the names of the response and explanatory variables, respectively, used in the analysis.

Ex. Data was collected on 100 houses recently sold in a city. It consisted of the sales price (in $), house size (in square feet), the number of bedrooms, the number of bathrooms, the lot size (in square feet) and the annual real estate tax (in $).

• Read data

The following program reads in the data.

> Housing = read.table("C:/Users/Martin/Documents/W2024/housing.txt", header=TRUE)
> Housing
    Taxes Bedrooms Baths  Price Size   Lot
1    1360        3   2.0 145000 1240 18000
2    1050        1   1.0  68000  370 25000
…..
99   1770        3   2.0  88400 1560 12000
100  1430        3   2.0 127200 1340 18000

Suppose we are only interested in working with a subset of the variables (e.g., “Price”, “Size” and “Lot”). It is possible (but not necessary) to construct a new data frame consisting solely of these values using the commands:

> myvars = c("Price", "Size", "Lot")
> Housing2 = Housing[myvars]
> Housing2
     Price Size   Lot
1   145000 1240 18000
2    68000  370 25000
……..
99   88400 1560 12000
100 127200 1340 18000

We want to fit a linear regression model with
• response variable: price
• predictor variables: size, lot


• Create multiple scatter plot

Before fitting our regression model we want to investigate how the variables are related to one another. We can do this graphically by constructing scatter plots of all pair-wise combinations of variables in the data frame. This can be done by typing:

> plot(Housing2)
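For a data frame with several columns, plot() produces the same pair-wise scatter-plot matrix as pairs(), so (as an aside, not from the lecture) an equivalent command is:

> pairs(Housing2)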


To fit a multiple linear regression model with price as the response variable and size and lot as the explanatory variables, use the command:

> results = lm(Price ~ Size + Lot, data=Housing)
> results

Call:
lm(formula = Price ~ Size + Lot, data = Housing)

Coefficients:
(Intercept)         Size          Lot
 -10535.951       53.779        2.840

This output indicates that the fitted value is given by ŷ = −10536 + 53.8x1 + 2.8x2, where x1 = Size and x2 = Lot.
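As a quick use of the fitted model (the house below is purely hypothetical, not one of the 100 in the data set), predictions can be obtained with predict():

> predict(results, newdata = data.frame(Size = 1500, Lot = 20000))

This returns about 126,933, i.e. −10535.951 + 53.779(1500) + 2.840(20000), the estimated mean price of such a house.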


• Inference

Inference in the multiple regression setting is typically performed in a number of steps. We begin by testing whether the explanatory variables collectively have an effect on the response variable, i.e.

H0: β1 = β2 = … = βp = 0

If we can reject this hypothesis, we continue by testing whether the individual regression coefficients are significant while controlling for the other variables in the model.

We can access the results of each test by typing:

> summary(results)

Call:
lm(formula = Price ~ Size + Lot, data = Housing)

Residuals:
   Min     1Q Median     3Q    Max
-81681 -19926   2530  17972  84978

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.054e+04  9.436e+03  -1.117    0.267
Size         5.378e+01  6.529e+00   8.237 8.39e-13 ***
Lot          2.840e+00  4.267e-01   6.656 1.68e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 30590 on 97 degrees of freedom
Multiple R-squared: 0.7114, Adjusted R-squared: 0.7054
F-statistic: 119.5 on 2 and 97 DF, p-value: < 2.2e-16

The output shows that F = 119.5 (p-value < 2.2e-16), so we reject H0 and conclude that the explanatory variables Size and Lot collectively have a significant effect on Price. The individual t-tests then show that both Size (p = 8.39e-13) and Lot (p = 1.68e-09) are significant after controlling for the other variable.

Sometimes we are interested in simultaneously testing whether a certain subset of the coefficients are equal to 0 (e.g., β3 = β4 = 0). We can do this using a partial F-test. This test involves comparing the SSE from a reduced model (excluding the parameters we hypothesize are equal to zero) with the SSE from the full model (including all of the parameters).
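In R, a partial F-test of this kind can be carried out with anova() by comparing a reduced and a full model; the sketch below (with an illustrative choice of which coefficients to set to zero) uses the other housing variables:

> full    = lm(Price ~ Size + Lot + Bedrooms + Baths, data=Housing)
> reduced = lm(Price ~ Size + Lot, data=Housing)   # hypothesizes the Bedrooms and Baths coefficients are 0
> anova(reduced, full)                             # partial F-test based on the two SSEs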


11-10 LOGISTIC REGRESSION

Introduction to logistic regression
• linear regression: response variable y is quantitative (real valued)
• logistic regression: response variable y only takes two values, {0, 1}

Linear regression often works very well when the response variable is quantitative. We now consider the situation where the response variable takes on only two possible values, 0 and 1. These could be arbitrary assignments resulting from observing a qualitative response. For example, the response could be the outcome of a functional electrical test on a semiconductor device for which the result is either a “success,” which means the device works properly, or a “failure,” which could be due to a short, an open, or some other functional problem.

Suppose that the model has the form

    Yi = β0 + β1xi + εi    (11-51)

and the response variable Yi takes on the values either 0 or 1. We will assume that the response variable Yi is a Bernoulli random variable with probability distribution as follows:

    Yi    Probability
    1     P(Yi = 1) = πi
    0     P(Yi = 0) = 1 − πi

Now since E(εi) = 0, the expected value of the response variable is

    E(Yi) = 1(πi) + 0(1 − πi) = πi

This implies that

    E(Yi) = β0 + β1xi = πi


This means that the expected response given by the response function E(Yi) = β0 + β1xi is just the probability that the response variable takes on the value 1.

There are some substantive problems with the regression model in Equation 11-51. First, note that if the response is binary, the error terms εi can only take on two values, namely,

    εi = 1 − (β0 + β1xi)   when Yi = 1
    εi = −(β0 + β1xi)      when Yi = 0

Consequently, the errors in this model cannot possibly be normal. Second, the error variance is not constant, since

    σ²Yi = E[Yi − E(Yi)]² = (1 − πi)²πi + (0 − πi)²(1 − πi) = πi(1 − πi)

Notice that this last expression is just

    σ²Yi = E(Yi)[1 − E(Yi)]

since E(Yi) = β0 + β1xi = πi. This indicates that the variance of the observations (which is the same as the variance of the errors because εi = Yi − πi, and πi is a constant) is a function of the mean. Finally, there is a constraint on the response function, because

    0 ≤ E(Yi) = πi ≤ 1

This restriction can cause serious problems with the choice of a linear response function, as we have initially assumed in Equation 11-51. It would be possible to fit a model to the data for which the predicted values of the response lie outside the 0, 1 interval.

Generally, when the response variable is binary, there is considerable empirical evidence indicating that the shape of the response function should be nonlinear. A monotonically increasing (or decreasing) S-shaped (or reverse S-shaped) function, such as shown in Figure 11-19, is usually employed. This function is called the logit response function, and has the form

    E(Y) = exp(β0 + β1x) / [1 + exp(β0 + β1x)]    (11-52)

or equivalently,

    E(Y) = 1 / [1 + exp(−(β0 + β1x))]    (11-53)

[Figure 11-19: Examples of the logistic response function, one monotonically increasing and one monotonically decreasing S-shaped curve of E(Y) versus x.]
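As a side illustration (not part of the lecture), S-shaped logit response functions like those in Figure 11-19 can be sketched in R with curve(); the coefficient values below are chosen arbitrarily to give one increasing and one decreasing curve:

> curve(1/(1 + exp(-(-6 + 1*x))), from=0, to=14, ylab="E(Y)")                    # increasing S-shape (beta0 = -6, beta1 = 1)
> curve(1/(1 + exp(-( 6 - 1*x))), from=0, to=14, ylab="E(Y)", add=TRUE, lty=2)   # decreasing S-shape (beta0 = 6, beta1 = -1)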
In logistic regression we assume that E(Y) is related to x by the logit function. It is easy to show that

    E(Y) / [1 − E(Y)] = exp(β0 + β1x)    (11-54)

The quantity exp(β0 + β1x) on the right-hand side of Equation 11-54 is called the odds ratio. It has a straightforward interpretation: if the odds ratio is 2 for a particular value of x, it means that a success is twice as likely as a failure at that value of the regressor x. Notice that the natural logarithm of the odds ratio is a linear function of the regressor variable. Therefore the slope β1 is the change in the log odds that results from a one-unit increase in x. This means that the odds ratio changes by e^β1 when x increases by one unit.

The parameters in this logistic regression model are usually estimated by the method of maximum likelihood. For details of the procedure, see Montgomery, Peck, and Vining (2006). Minitab will fit logistic regression models and provide useful information on the quality of the fit.

Example
• O-ring failure vs. launch temperature

We will illustrate logistic regression using the data on launch temperature and O-ring failure for the 24 space shuttle launches prior to the Challenger disaster of January 1986. There are six O-rings used to seal field joints on the rocket motor assembly. The table below presents the launch temperatures. A 1 in the “O-Ring Failure” column indicates that at least one O-ring failure had occurred on that launch.

Temperature  O-Ring Failure    Temperature  O-Ring Failure    Temperature  O-Ring Failure
    53             1               68             0               75             0
    56             1               69             0               75             1
    57             1               70             0               76             0
    63             0               70             1               76             0
    66             0               70             1               78             0
    67             0               70             1               79             0
    67             0               72             0               80             0
    67             0               73             0               81             0

Figure 11-20 is a scatter plot of the data. Note that failures tend to occur at lower temperatures. The logistic regression model fit to this data from Minitab is shown in the following boxed display. The fitted logistic regression model is

    ŷ = 1 / (1 + exp[−(10.875 − 0.17132x)])


Binary Logistic Regression: O-Ring Failure versus Temperature

Link Function: Logit

Response Information
Variable    Value    Count
O-Ring F    1            7  (Event)
            0           17
            Total       24

Logistic Regression Table
                                                Odds     95% CI
Predictor      Coef    SE Coef      Z      P    Ratio   Lower  Upper
Constant     10.875      5.703   1.91  0.057
Temperat   -0.17132    0.08344  -2.05  0.040     0.84    0.72   0.99

Log-Likelihood = -11.515
Test that all slopes are zero: G = 5.944, DF = 1, P-Value = 0.015
The standard error of the slope β̂1 is se(β̂1) = 0.08344. For large samples, β̂1 has an approximate normal distribution, and so β̂1/se(β̂1) can be compared to the standard normal distribution to test H0: β1 = 0. Minitab performs this test. The P-value is 0.04, indicating that temperature has a significant effect on the probability of O-ring failure. The odds ratio is 0.84, so every one-degree increase in temperature multiplies the odds of failure by 0.84. Figure 11-21 shows the fitted logistic regression model. The sharp increase in the probability of O-ring failure at low temperatures is very evident in this graph. The actual temperature at the Challenger launch was 31 °F. This is well outside the range of other launch temperatures, so our logistic regression model is not likely to provide highly accurate predictions at that temperature, but it is clear that a launch at 31 °F is almost certainly going to result in O-ring failure.

It is interesting to note that all of these data were available prior to launch. However, engineers were unable to effectively analyze the data and use them to provide a convincing argument against launching the Challenger to NASA managers. Yet a simple regression analysis of these data would have provided such an argument.

[Figure 11-20: Scatter plot of O-ring failures versus launch temperature for 24 space shuttle flights. Figure 11-21: Probability of O-ring failure versus launch temperature, based on the fitted logistic regression model.]
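The lecture's output comes from Minitab, but the same model can be fit in R with glm() and the binomial family. A minimal sketch using the O-ring data from the table above (entered by hand here):

> temp = c(53,56,57,63,66,67,67,67, 68,69,70,70,70,70,72,73, 75,75,76,76,78,79,80,81)
> fail = c( 1, 1, 1, 0, 0, 0, 0, 0,  0, 0, 0, 1, 1, 1, 0, 0,  0, 1, 0, 0, 0, 0, 0, 0)
> oring = glm(fail ~ temp, family=binomial)
> summary(oring)                             # intercept and slope close to the Minitab fit (10.875, -0.17132)
> exp(coef(oring)["temp"])                   # odds ratio, about 0.84
> predict(oring, newdata=data.frame(temp=31), type="response")   # estimated failure probability at 31 degrees F (close to 1)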