CS 147: Computer Systems Performance Analysis Multiple and Categorical Regression 2015-06-15
Total Page:16
File Type:pdf, Size:1020Kb
CS147 CS 147: Computer Systems Performance Analysis Multiple and Categorical Regression 2015-06-15 CS 147: Computer Systems Performance Analysis Multiple and Categorical Regression 1 / 36 Overview CS147 Overview Multiple Linear Regression Basic Formulas Example Quality of the Example Overview Categorical Models 2015-06-15 Multiple Linear Regression Basic Formulas Example Quality of the Example Categorical Models 2 / 36 Multiple Linear Regression Multiple Linear Regression CS147 Multiple Linear Regression Multiple Linear Regression I Develops models with more than one predictor variable I But each predictor variable has linear relationship to response variable I Conceptually, plotting a regression line in n-dimensional Multiple Linear Regression space, instead of 2-dimensional 2015-06-15 I Develops models with more than one predictor variable I But each predictor variable has linear relationship to response variable I Conceptually, plotting a regression line in n-dimensional space, instead of 2-dimensional 3 / 36 Multiple Linear Regression Basic Formulas Basic Multiple Linear Regression Formula CS147 Basic Multiple Linear Regression Formula Multiple Linear Regression Response y is a function of k predictor variables x1; x2;:::; xk Basic Formulas y = b0 + b1x1 + b2x2 + ··· + bk xk + e Basic Multiple Linear Regression Formula 2015-06-15 Response y is a function of k predictor variables x1; x2;:::; xk y = b0 + b1x1 + b2x2 + ··· + bk xk + e 4 / 36 Multiple Linear Regression Basic Formulas CS147 A Multiple Linear Regression Model A Multiple Linear Regression Model Given sample of n observations Multiple Linear Regression f(x11; x21;:::; xk1; y1);:::; (x1n; x2n;:::; xkn; yn)g model consists of n equations (note possible + vs. − typo in Basic Formulas book): y1 = b0 + b1x11 + b2x21 + ··· + bk xk1 + e1 y2 = b0 + b1x12 + b2x22 + ··· + bk xk2 + e2 . A Multiple Linear Regression Model . yn = b0 + b1x1n + b2x2n + ··· + bk xkn + en Given sample of n observations 2015-06-15 f(x11; x21;:::; xk1; y1);:::; (x1n; x2n;:::; xkn; yn)g model consists of n equations (note possible + vs. − typo in book): y1 = b0 + b1x11 + b2x21 + ··· + bk xk1 + e1 y2 = b0 + b1x12 + b2x22 + ··· + bk xk2 + e2 . yn = b0 + b1x1n + b2x2n + ··· + bk xkn + en 5 / 36 Multiple Linear Regression Basic Formulas Looks Like It’s Matrix Arithmetic Time CS147 Looks Like It’s Matrix Arithmetic Time Multiple Linear Regression y = Xb + e 2 3 2 3 2 3 2 3 y1 1 x11 x21 ::: xk1 b0 e0 6y27 61 x12 x22 ::: xk27 6b17 6e17 6 7 = 6 7 6 7 + 6 7 6 . 7 6 . 7 6 . 7 6 . 7 Basic Formulas 4 . 5 4 . 5 4 . 5 4 . 5 yn 1 x1n x2n ::: xkn bk en Note that: Looks Like It’s Matrix Arithmetic Time I y and e have n elements I b has k + 1 2015-06-15 I x is k by n y = Xb + e 2 3 2 3 2 3 2 3 y1 1 x11 x21 ::: xk1 b0 e0 6y27 61 x12 x22 ::: xk27 6b17 6e17 6 7 = 6 7 6 7 + 6 7 6 . 7 6 . 7 6 . 7 6 . 7 4 . 5 4 . 5 4 . 5 4 . 5 yn 1 x1n x2n ::: xkn bk en Note that: I y and e have n elements I b has k + 1 I x is k by n 6 / 36 Multiple Linear Regression Basic Formulas Analysis of Multiple Linear Regression CS147 Analysis of Multiple Linear Regression Multiple Linear Regression I Listed in box 15.1 of Jain I Not terribly important (for our purposes) how they were derived Basic Formulas I This isn’t a class on statistics I But you need to know how to use them Analysis of Multiple Linear Regression I Mostly matrix analogs to simple linear regression results 2015-06-15 I Listed in box 15.1 of Jain I Not terribly important (for our purposes) how they were derived I This isn’t a class on statistics I But you need to know how to use them I Mostly matrix analogs to simple linear regression results 7 / 36 Multiple Linear Regression Example Example of Multiple Linear Regression CS147 Example of Multiple Linear Regression Multiple Linear Regression I IMDB keeps numerical popularity ratings of movies I Postulate popularity of Academy Award-winning films is based on two factors: I Year made Example I Running time I Produce a regression Example of Multiple Linear Regression rating = b0 + b1(year) + b2(length) 2015-06-15 I IMDB keeps numerical popularity ratings of movies I Postulate popularity of Academy Award-winning films is based on two factors: I Year made I Running time I Produce a regression rating = b0 + b1(year) + b2(length) 8 / 36 Multiple Linear Regression Example Some Sample Data CS147 Some Sample Data Multiple Linear Regression Title Year Length Rating Silence of the Lambs 1991 118 8.1 Terms of Endearment 1983 132 6.8 Rocky 1976 119 7.0 Example Oliver! 1968 153 7.4 Marty 1955 91 7.7 Gentleman’s Agreement 1947 118 7.5 Mutiny on the Bounty 1935 132 7.6 Some Sample Data It Happened One Night 1934 105 8.0 2015-06-15 Title Year Length Rating Silence of the Lambs 1991 118 8.1 Terms of Endearment 1983 132 6.8 Rocky 1976 119 7.0 Oliver! 1968 153 7.4 Marty 1955 91 7.7 Gentleman’s Agreement 1947 118 7.5 Mutiny on the Bounty 1935 132 7.6 It Happened One Night 1934 105 8.0 9 / 36 Multiple Linear Regression Example Now for Some Tedious Matrix Arithmetic CS147 Now for Some Tedious Matrix Arithmetic Multiple Linear Regression T T T −1 T I We need to calculate X, X , X X, (X X) , and X y T −1 T I Because b = (X X) (X y) I We will see that b = (18:5430; −0:0051; −0:0086) Example I Meaning the regression predicts: Now for Some Tedious Matrix Arithmetic rating = 18:5430 − 0:0051(year) − 0:0086(length) 2015-06-15 T T T −1 T I We need to calculate X, X , X X, (X X) , and X y T −1 T I Because b = (X X) (X y) I We will see that b = (18:5430; −0:0051; −0:0086) I Meaning the regression predicts: rating = 18:5430 − 0:0051(year) − 0:0086(length) 10 / 36 Multiple Linear Regression Example X Matrix for Example CS147 X Matrix for Example Multiple Linear Regression 2 1 1991 118 3 6 1 1983 132 7 6 7 6 1 1976 119 7 6 7 6 1 1968 153 7 X = 6 7 Example 6 1 1955 91 7 6 7 6 1 1947 118 7 6 7 4 1 1935 132 5 X Matrix for Example 1 1934 105 2015-06-15 2 1 1991 118 3 6 1 1983 132 7 6 7 6 1 1976 119 7 6 7 6 1 1968 153 7 X = 6 7 6 1 1955 91 7 6 7 6 1 1947 118 7 6 7 4 1 1935 132 5 1 1934 105 11 / 36 Multiple Linear Regression Example Transpose to Get XT CS147 Transpose to Get XT Multiple Linear Regression 2 1 1 1 1 1 1 1 1 3 T Example X = 41991 1983 1976 1968 1955 1947 1935 1934 5 118 132 119 153 91 118 132 105 Transpose to Get XT 2015-06-15 2 1 1 1 1 1 1 1 1 3 T X = 41991 1983 1976 1968 1955 1947 1935 1934 5 118 132 119 153 91 118 132 105 12 / 36 Multiple Linear Regression Example Multiply To Get XTX CS147 Multiply To Get XTX Multiple Linear Regression 2 8 15689 968 3 T X X = 4 15689 30771385 1899083 5 Example 968 1899083 119572 Multiply To Get XTX 2015-06-15 2 8 15689 968 3 T X X = 4 15689 30771385 1899083 5 968 1899083 119572 13 / 36 Multiple Linear Regression Example Invert to Get C = (XTX)−1 CS147 Invert to Get C = (XTX)−1 Multiple Linear Regression 2 1207.7585 -0.6240 0.1328 3 T −1 C = (X X) = 4 -0.6240 0.0003 -0.0001 5 Example 0.1328 -0.0001 0.0004 Invert to Get C = (XTX)−1 2015-06-15 2 1207.7585 -0.6240 0.1328 3 T −1 C = (X X) = 4 -0.6240 0.0003 -0.0001 5 0.1328 -0.0001 0.0004 14 / 36 Multiple Linear Regression Example Multiply to Get XTy CS147 Multiply to Get XTy Multiple Linear Regression 2 60.1 3 T X y = 4 117840.7 5 Example 7247.5 Multiply to Get XTy 2015-06-15 2 60.1 3 T X y = 4 117840.7 5 7247.5 15 / 36 Multiple Linear Regression Example Multiply (XTX)−1(XTy) to Get b CS147 Multiply (XTX)−1(XTy) to Get b Multiple Linear Regression 2 18.5430 3 b = 4 -0.0051 5 Example -0.0086 Multiply (XTX)−1(XTy) to Get b 2015-06-15 2 18.5430 3 b = 4 -0.0051 5 -0.0086 16 / 36 Multiple Linear Regression Quality of the Example How Good Is This Regression Model? CS147 How Good Is This Regression Model? Multiple Linear Regression I How accurately does model predict film rating based on age and running time? I Best way to determine this analytically is to calculate errors: Quality of the Example SSE = yTy − bTXTy or X 2 How Good Is This Regression Model? SSE = ei 2015-06-15 I How accurately does model predict film rating based on age and running time? I Best way to determine this analytically is to calculate errors: SSE = yTy − bTXTy or X 2 SSE = ei 17 / 36 Multiple Linear Regression Quality of the Example Calculating the Errors CS147 Calculating the Errors Estimated 2 Multiple Linear Regression Year Length Rating Rating ei e i 1991 118 8.1 7.4 −0:71 0.51 1983 132 6.8 7.30 :51 0.26 1976 119 7.0 7.50 :45 0.21 Quality of the Example 1968 153 7.4 7.2 −0:20 0.04 1955 91 7.7 7.80 :10 0.01 1947 118 7.5 7.60 :11 0.01 1935 132 7.6 7.6 −0:05 0.00 Calculating the Errors 1934 105 8.0 7.8 −0:21 0.04 2015-06-15 Estimated 2 Year Length Rating Rating ei ei 1991 118 8.1 7.4 −0:71 0.51 1983 132 6.8 7.30 :51 0.26 1976 119 7.0 7.50 :45 0.21 1968 153 7.4 7.2 −0:20 0.04 1955 91 7.7 7.80 :10 0.01 1947 118 7.5 7.60 :11 0.01 1935 132 7.6 7.6 −0:05 0.00 1934 105 8.0 7.8 −0:21 0.04 18 / 36 Multiple Linear Regression Quality of the Example Calculating the Errors, Continued CS147 Calculating the Errors, Continued Multiple Linear Regression I SSE = 1:08 P 2 I SSY = yi = 452:91 2 I SS0 = ny = 451:5 Quality of the Example I SST = SSY − SS0 = 452:9 − 451:5 = 1:4 I SSR = SST − SSE = 0:33 2 SSR 0:33 I R = = = 0:23 SST 1:41 Calculating the Errors, Continued I In other words, this regression stinks 2015-06-15 I SSE = 1:08 P 2 I SSY = yi = 452:91 2 I SS0 = ny = 451:5 I SST = SSY − SS0 = 452:9 − 451:5 = 1:4 I SSR = SST − SSE = 0:33 2 SSR 0:33 I R = = = 0:23 SST 1:41 I In other words, this regression stinks 19 / 36 Multiple Linear Regression Quality of the Example CS147 Why Does It Stink? Why Does It Stink? I Let’s