
Which marketing action does it? Data inspection, a little something about R, linear regression and problems with multicollinearities Inholland University of Applied Sciences International Week 2014 Stefan Etschberger Augsburg University of Applied Sciences Who is talking to you? Stefan Etschberger University degree in mathematics and physics Worked as an engineer in semiconductor industry Back to university as a researcher: doctoral degree in economic science Research focus: marketing research using data analysis Professor of Mathematics and Statistics since 2006 at University of Applied Sciences Augsburg since 2012 Where am I from? City of Augsburg Almost (OK, 2nd place) oldest city in Germany (15 b.C.) Famous for its renaissance architecture and the oldest social housing project in the world (1521) A lot of university students (25.000) And a business school at the Augsburg University of Applied Science Data analysis, Regression and Beyond: Table of Contents 1 Introduction 2 R and RStudio 3 Revision: Simple linear regression 4 Multicollinearity in Regression 1 Introduction Mr. Maier and his cheese Mr. Maier and his data Data analysis, Introduction Regression and Beyond Stefan Etschberger Introduction Mr. Maier and his cheese Mr. Maier and his data R and RStudio Simple linear regression Multicollinearity Supplementary slides After his bachelor’s degree in marketing Mr. Maier took over a respectable cheese dairy in Bavaria Regularly he does marketing focused on distinct towns He uses the phone, e-mail, mail and small gifts for his key customers And he collected data about his spendings per marketing action and his revenues for 30 days after the action took place 5 Data analysis, Mr. Maier’s data Regression and Beyond Stefan Etschberger action revenue telephone e-mail mail gift 1 10193.70 186.20 158.60 26.90 11.10 Introduction Mr. Maier and his cheese 2 4828.20 470.30 55.00 14.40 20.30 Mr. Maier and his data 3 11139.30 41.80 154.70 20.90 12.40 R and RStudio 4 5030.10 530.10 79.80 21.70 17.00 Simple linear regression . Multicollinearity Supplementary slides Goal: Getting to know interesting structure hidden inside data Maybe: Forecast of his revenue as a model dependent of the spendings for his marketing actions Data has been sent the data from his external advertising service provider inside an Excel-file. Mr. Maier runs his data analysis software.... 6 Data analysis, Regression and Beyond: Table of Contents 1 Introduction 2 R and RStudio 3 Revision: Simple linear regression 4 Multicollinearity in Regression 2 R and RStudio What is R? What is RStudio? First steps Data analysis, What is R and why R? Regression and Beyond Stefan Etschberger R is a free Data Analysis Software R is very powerful and widely used Introduction in science and industry R and RStudio What is R? (in fact far more widely than SPSS) What is RStudio? Created in 1993 at the University of First steps Simple linear Auckland regression by Ross Ihaka and Robert Multicollinearity Gentleman Supplementary slides Since then: A lot of people improved the software and wrote thousands of packages for lots of applications source: http://goo.gl/axhGhh Drawback (at first glance): No point and click tool Major advantage (at second thought): No point and click tool 8 graphics source: http://goo.gl/W70kms Data analysis, What is RStudio? Regression and Beyond Stefan Etschberger RStudio is a Integrated Development Environment (IDE) Introduction R and RStudio for using R. What is R? What is RStudio? Works on OSX, Linux First steps and Windows Simple linear regression It’s free as well Multicollinearity Still: You have to write Supplementary slides commands But: RStudio supports you a lot 9 Data analysis, First steps Regression and Beyond Stefan Etschberger Getting to know Introduction RStudio R and RStudio What is R? Code What is RStudio? First steps Console Simple linear Workspace regression Multicollinearity History Supplementary slides Files Plots Packages Help Auto- Completion Data Import 10 Data analysis, data inspection Regression and Beyond Stefan Etschberger # read in data from comma-seperated list MyCheeseData = read.csv(file="Cheese.csv", header=TRUE) # show first few lines of data matrix head(MyCheeseData) Introduction ## phone gift email mail revenue R and RStudio ## 1 29.36 146.1 10.32 13.36 3138 What is R? ## 2 8.75 125.8 11.27 14.72 3728 What is RStudio? ## 3 36.15 124.5 8.45 17.72 3085 First steps ## 4 51.20 129.4 10.27 39.59 4668 Simple linear ## 5 51.36 163.4 8.19 7.57 2286 regression ## 6 34.65 110.0 7.89 21.68 4148 Multicollinearity Supplementary slides # make MyCheeseData the default dataset attach(MyCheeseData) # how many customer data objects do we have? length(revenue) ## [1] 80 # mean, median and standard deviation of revenue data.frame(mean=mean(revenue), median=median(revenue), sd=sd(revenue)) ## mean median sd ## 1 3075 3086 903.4 11 Data analysis, data inspection Regression and Beyond Stefan Etschberger Overview over all variables Introduction R and RStudio summary(MyCheeseData) What is R? What is RStudio? ## phone gift email First steps ## Min. : 0.09 Min. : 32.9 Min. : 0.11 Simple linear ## 1st Qu.:19.41 1st Qu.: 92.1 1st Qu.: 6.62 regression ## Median :32.16 Median :112.4 Median : 8.48 Multicollinearity ## Mean :32.72 Mean :114.7 Mean : 8.40 ## 3rd Qu.:48.23 3rd Qu.:134.2 3rd Qu.:10.43 Supplementary slides ## Max. :73.59 Max. :183.4 Max. :16.93 ## mail revenue ## Min. : 1.82 Min. : 831 ## 1st Qu.:12.68 1st Qu.:2326 ## Median :19.89 Median :3086 ## Mean :19.60 Mean :3075 ## 3rd Qu.:25.55 3rd Qu.:3671 ## Max. :47.47 Max. :4740 12 Data analysis, data inspection Regression and Beyond Stefan Etschberger Boxplots Introduction names=names(MyCheeseData) R and RStudio for(i in 1:5) { What is R? boxplot(MyCheeseData[,i], col="lightblue", lwd=3, main=names[i], cex=1 ) What is RStudio? } First steps Simple linear regression phone gift email mail revenue Multicollinearity Supplementary slides 15 40 60 4000 150 30 10 40 3000 100 20 5 2000 20 10 50 1000 0 0 0 13 Data analysis, Data inspection Regression and Beyond Stefan Etschberger Visualize pairs plot(MyCheeseData, pch=19, col="#8090ADa0") Introduction 50 100 150 0 10 20 30 40 R and RStudio 60 What is R? phone 40 What is RStudio? 20 First steps 0 Simple linear regression 150 gift Multicollinearity 100 Supplementary slides 50 15 email 10 5 0 30 mail 10 0 revenue 3000 1000 0 20 40 60 0 5 10 15 1000 2000 3000 4000 14 Data analysis, data inspection Regression and Beyond Stefan Etschberger Introduction R and RStudio List all Correlations What is R? What is RStudio? First steps cor.MyCheeseData = cor(MyCheeseData) Simple linear cor.MyCheeseData regression Multicollinearity ## phone gift email mail revenue ## phone 1.00000 0.1863 -0.5230 0.09869 -0.2273 Supplementary slides ## gift 0.18630 1.0000 0.5682 -0.11034 0.3220 ## email -0.52299 0.5682 1.0000 0.36645 0.7408 ## mail 0.09869 -0.1103 0.3665 1.00000 0.6508 ## revenue -0.22732 0.3220 0.7408 0.65076 1.0000 15 Data analysis, data inspection Regression and Beyond Stefan Etschberger Visualize correlation require(corrplot) corrplot(cor.MyCheeseData) Introduction corrplot(cor.MyCheeseData, method="number", order ="AOE", tl.pos="d", type="upper") R and RStudio What is R? What is RStudio? First steps Simple linear 1 regression phone gift email mail revenue 1 phone1 0.19 −0.23 −0.52 0.1 0.8 Multicollinearity phone 0.8 0.6 Supplementary slides 0.6 gift1 0.32 0.57 −0.11 0.4 gift 0.4 0.2 0.2 revenue1 0.74 0.65 0 email 0 −0.2 −0.2 email1 0.37 −0.4 mail −0.4 −0.6 −0.6 mail1 −0.8 revenue −0.8 −1 −1 16 Data analysis, Regression and Beyond: Table of Contents 1 Introduction 2 R and RStudio 3 Revision: Simple linear regression 4 Multicollinearity in Regression 3 Revision: Simple linear regression Example set of data Trend as a linear model Least squares Best solution Variance and information Coefficient of determination R2 is not perfect! Residual analysis Data analysis, Data Regression and Beyond Stefan Etschberger Premier German Soccer League 2008/2009 Etat Punkte FC Bayern 80 67 Given: data for all 18 VfL Wolfsburg 60 69 Introduction clubs in the German SV Werder Bremen 48 45 R and RStudio FC Schalke 04 48 50 Premier Soccer Simple linear VfB Stuttgart 38 64 regression League in the season Hamburger SV 35 61 Example set of data Trend as a linear model 2008/09 Bayer 04 Leverkusen 35 49 Least squares Bor. Dortmund 32 59 Best solution variables: Budget for Hertha BSC Berlin 31 63 Variance and information Coefficient of determination season (only direct 1. FC Köln 28 39 R2 is not perfect! salaries for players) Bor. Mönchengladbach 27 31 Residual analysis TSG Hoffenheim 26 55 Multicollinearity and: resulting table Eintracht Frankfurt 25 33 Supplementary slides points at the end of Hannover 96 24 40 the season Energie Cottbus 23 30 VfL Bochum 17 32 Karlsruher SC 17 29 Arminia Bielefeld 15 28 (Source: Welt) 18 Data analysis, data in scatter plot Regression and Beyond Stefan Etschberger Bundesliga 2008/09 70 VfL Wolfsburg FC Bayern Introduction VfB Stuttgart R and RStudio Hertha BSC Berlin Hamburger SV Simple linear 60 regression Bor. Dortmund Example set of data Trend as a linear model TSG Hoffenheim Least squares Best solution Variance and information FC Schalke 04 50 Bayer 04 Leverkusen Coefficient of determination R2 is not perfect! Punkte Residual analysis SV Werder Bremen Multicollinearity Supplementary slides Hannover 96 40 1.
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages48 Page
-
File Size-