Regression analysis: An evaluation of the influences behind the pricing of beer
Regressionsanalys: En utvärdering av influenserna bakom prissättningen av öl
Sara Eriksson and Jonas Häggmark
Spring semester 2017

1 Preface

This project is a bachelor thesis in multiple linear regression written by Sara Eriksson and Jonas Häggmark during the spring semester of 2017. We wish to thank our mentor, Pierre Nyquist, for all his advice and guidance throughout the project.

2 Abstract

This bachelor thesis in applied mathematics analyzes which factors affect the pricing of beer on the Swedish market. A multiple linear regression model is built in the statistical programming language R by studying the influence of several explanatory variables. These variables include, for example, country of origin, beer style, volume sold and a Bayesian weighted mean rating from RateBeer, a popular website for beer enthusiasts. The main goal of the project is to find significant factors and, as a direct consequence, a significant model free from the influence of multicollinearity.

The regression analysis is based on a data set of 1413 observations, each representing a beer that sold more than 1000 liters and met some further restrictions. The data set is compiled from Systembolaget's sales statistics for 2016 and from RateBeer. These observations represent 43% of Systembolaget's total assortment of beer.

The model is developed through a thorough residual analysis, transformations of variables, determination of multicollinearity and a validation of the absence of outliers and high leverage points. All of these steps aim at significance at the 95% level. In addition to the regression model, two submodels with associated box plots are created for the variable groups Country of Origin and Beer Style, in order to analyze the relative importance of the variables within each group.
A k-fold cross-validation study and three different variable selection procedures are carried out as further adequacy checks; these are also given as recommendations for continued analysis.

The result shows that several different factors affect the pricing of beer. For example, higher alcohol by volume, sour beers and beers from New Zealand yield a higher price, while beers with high sales, lagers and Austrian beers show a negative tendency for the price. The result can be used as an example of the influences behind the pricing of beer in Sweden. The first model in the analysis has 41 explanatory variables; in the final model the number is reduced to 20, all of which are significant.

3 Sammanfattning

This bachelor thesis in applied mathematics is an analysis of which factors affect the pricing of beer on the Swedish market. A multiple regression model has been created with the statistical programming language R through a study of the influences of a number of regressor variables. These variables include, among others, country of origin, beer style, volume sold and a Bayesian weighted mean from RateBeer, which is a popular website for beer enthusiasts. The main goal of the project is to find significant factors and, as then follows, a significant model without any influence of multicollinearity.

The regression analysis is based on a data set of 1413 observations representing the beers that sold more than 1000 liters, among further restrictions, and is created from Systembolaget's sales statistics for 2016 and RateBeer. This number of observations represents 43% of Systembolaget's total beer assortment.

The model is developed through a thorough residual analysis, transformations of variables, determination of multicollinearity and a validation of the absence of outliers and high leverage points, all in order to reach a significance level of 95%. In addition to the regression model, two submodels with associated box plots are created for the variable groups Country of Origin and Beer Style in order to analyze the importance of these variables relative to each other. A k-fold cross-validation study and three different variable selections are carried out as further adequacy checks, and these are also given as recommendations for continued analysis.

The result shows that several different factors affect the pricing of beer. For example, higher alcohol content, sour beers and beers from New Zealand give a higher price, while high sales, lagers and Austrian beers show a negative tendency for the price. The result can be used as an example of the influences behind the pricing of beer in Sweden. The first model in the analysis has 41 regressors, and in the final model the number of regressors has been reduced to 20, all of which are significant.

List of Figures

1 Monks have a great historical influence on the art of brewing.
2 Survey carried out by SurveyMonkey.
3 Summary of the final cost of a craft beer due to each expense.
4 Normal probability and histogram plot for the residuals.
5 Ordinary and scaled residuals against the predicted values.
6 The model residuals versus the fitted values for the original model.
7 Normal Q-Q plot and histogram for the original model.
8 Scale-location plot for the original model.
9 Leverage plot for the original model.
10 Logarithmic transformation of the response variable and the variable representing item price.
11 Logarithmic transformation of the response variable and the variable representing volume in ml.
12 Second model, developed with logarithmic transformations.
13 Model developed through multicollinearity analysis.
14 Model B, developed through analysis of dummy variables.
15 Outliers and high leverage points.
16 Predicted price against actual price, original model.
17 Predicted price against actual price, final model.
18 Submodel country of origin.
19 Normal probability plots.
20 Submodel beer styles.
21 Countries of origin.
22 Beer styles.
23 Package.
24 Organic.
25 In stock.
26 Rated.
27 Final model.

Contents

1 Preface
2 Abstract
3 Sammanfattning
4 Introduction
  4.1 Background
    4.1.1 The History of Beer
  4.2 Purpose
  4.3 Problem Definition
  4.4 Data Set
  4.5 Problem Restrictions
  4.6 Literature Analysis
5 Mathematical Theory
  5.1 Multiple Linear Regression
    5.1.1 Heteroscedasticity
  5.2 Residual Analysis
  5.3 Model Adequacy
    5.3.1 Residual Plots
  5.4 Outliers and High Leverage Points
  5.5 Transformations of Variables
  5.6 Multicollinearity
  5.7 Variable Selection
  5.8 Cross Validation
6 Data Set
  6.1 Restrictions in Data
  6.2 List of Variables
  6.3 About RateBeer
7 Analysis and Model Development
  7.1 Residual Analysis
  7.2 Transformations of Variables
  7.3 Multicollinearity
  7.4 Analysis of Significance
  7.5 Outliers and High Leverage Points
  7.6 Cross Validation
  7.7 Variable Selection
8 Submodels
  8.1 Country of Origin
  8.2 Beer Styles
  8.3 Box Plots
9 Results
  9.1 Final Model
  9.2 Submodels
  9.3 Calculation Example
10 Discussion
  10.1 Analysis of Variables
    10.1.1 Quantitative Variables
    10.1.2 Dummy Variables Except Beer Styles and Countries of Origin
    10.1.3 Beer Styles
    10.1.4 Countries of Origin
  10.2 Looking Back at the Literature Analysis
  10.3 Recommendations
11 Appendix
  11.1 Variable Selection Tables
    11.1.1 Best Subset Selection
    11.1.2 Forward Subset Selection
    11.1.3 Backward Subset Selection

4 Introduction

This project studies how different factors affected the pricing of beer at Systembolaget during 2016. The study analyzes a multiple linear regression model with several influential parameters.
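As a point of reference, the general form of a multiple linear regression model can be sketched as follows. The notation is an assumption made here for illustration and is not taken from the thesis: y_i denotes the response for observation i (here the price of a beer, possibly after a transformation), x_{ij} the value of explanatory variable j for that observation, \beta_j the regression coefficients and \varepsilon_i a random error term.

```latex
% Generic multiple linear regression model (illustrative notation, assumed here)
% y_i           : response for observation i, e.g. the price of beer i
% x_{ij}        : value of explanatory variable j for observation i
% \beta_j       : regression coefficient of variable j (\beta_0 is the intercept)
% \varepsilon_i : random error term
y_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \varepsilon_i,
\qquad i = 1, \dots, n
```

With the figures quoted in the abstract, n = 1413 observations, with k = 41 explanatory variables in the first model and k = 20 in the final one. Categorical factors such as country of origin or beer style would typically enter as dummy variables, that is, with x_{ij} taking the value 0 or 1.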
4.1 Background

The Swedish government holds a monopoly on alcohol sales through Systembolaget. This means that Systembolaget is the only store in the country allowed to sell alcoholic beverages above 3.5% alcohol by volume. Since the beer industry is immensely trendy at the time of writing, a closer look at this sector is of interest. The trend is strongly influenced by the sharp increase in craft beers, and the number of breweries in Sweden is currently the highest in history. These breweries vary greatly in size: the smallest have just a few employees, while the biggest companies operate on an international scale.

The beer market has expanded considerably over the last ten years and continues to expand due to demand and growing interest. The trend suggests that this expansion will continue for at least another decade, so an analysis of which parameters actually matter for the pricing of beer can be helpful when examining the market. To see which factors determine the pricing of beer in Sweden today, as well as their level of influence, a multiple regression model will be created and analyzed.

4.1.1 The History of Beer

About 10.