Regression Analysis: An Evaluation of the Influences Behind the Pricing of Beer
Regressionsanalys: En utvärdering av influenserna bakom prissättningen av öl

Sara Eriksson and Jonas Häggmark
Spring semester 2017

1 Preface
This project is a bachelor thesis in multiple linear regression, written by Sara Eriksson and Jonas Häggmark during the spring semester of 2017. We wish to thank our mentor, Pierre Nyquist, for all of his advice and guidance throughout the project.

2 Abstract
This bachelor thesis in applied mathematics analyzes which factors affect the pricing of beer on the Swedish market. A multiple linear regression model is created with the statistical programming language R through a study of the influence of several explanatory variables. These variables include, for example, country of origin, beer style, volume sold, and a Bayesian weighted mean rating from RateBeer, a popular website for beer enthusiasts. The main goal of the project is to find significant factors and, as a direct consequence, a significant model without any influence of multicollinearity. The regression analysis is based on a data set of 1413 observations, compiled from Systembolaget's sales statistics for 2016 and from RateBeer, representing beers that sold more than 1000 liters (among further restrictions). These observations represent 43% of Systembolaget's total beer assortment. The model is developed through a thorough residual analysis, transformations of variables, determination of multicollinearity, and a validation of the absence of outliers and high leverage points, all with the aim of significance at the 95% confidence level. In addition to the regression model, two submodels with associated box plots are created for the variable groups Country of Origin and Beer Style, in order to analyze the relative importance of the variables within each group.
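Two of the diagnostics named above can be illustrated briefly: multicollinearity, measured by the variance inflation factor VIF_j = 1/(1 - R_j^2), and high leverage points, measured by the diagonal h_ii of the hat matrix. The following is a minimal sketch, assuming NumPy and invented toy numbers rather than the thesis data (the thesis itself worked in R). The helper names vif and leverage are hypothetical. The cut-off h_ii > 2p/n in the code and the common "VIF above 10" guideline are textbook rules of thumb and are not necessarily the thresholds used in this thesis.

```python
import numpy as np

def vif(Z):
    """Variance inflation factor for each regressor column of Z.

    Z holds the regressors only (no intercept column). VIF_j = 1 / (1 - R_j^2),
    where R_j^2 comes from regressing column j on the other columns plus an intercept.
    """
    n, p = Z.shape
    factors = np.empty(p)
    for j in range(p):
        target = Z[:, j]
        others = np.column_stack([np.ones(n), np.delete(Z, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        factors[j] = 1.0 / (1.0 - r2)
    return factors

def leverage(X):
    """Diagonal h_ii of the hat matrix H = X (X^T X)^(-1) X^T.

    With a reduced QR decomposition X = QR we have H = Q Q^T, so h_ii is the
    squared norm of row i of Q. X must include the intercept column and have
    full column rank.
    """
    Q, _ = np.linalg.qr(X)
    return np.sum(Q ** 2, axis=1)

# Hypothetical regressors: x2 is almost exactly 2 * x1, x3 is unrelated to both.
x1 = np.arange(1.0, 9.0)
x2 = 2.0 * x1 + 0.1 * np.array([1, -1, 1, -1, 1, -1, 1, -1])
x3 = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
vifs = vif(np.column_stack([x1, x2, x3]))
print("VIF:", np.round(vifs, 1))  # x1 and x2 get very large VIFs, x3 stays small

# Hypothetical single regressor where the last observation lies far from the rest.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 30.0])
X = np.column_stack([np.ones_like(x), x])
h = leverage(X)
n, p = X.shape
# Common rule of thumb: flag observations with h_ii > 2p/n.
print("high-leverage rows:", np.flatnonzero(h > 2 * p / n))
```

The QR route avoids forming (X^T X)^(-1) explicitly. For a single regressor it agrees with the closed form h_ii = 1/n + (x_i - x̄)^2 / S_xx.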
A k-fold cross validation study and three different variable selections are carried out for further adequacy checking; these are also given as recommendations for continued analysis. The results show that several different factors affect the pricing of beer. For example, higher alcohol by volume, sour beers, and beers from New Zealand yield a higher price, while beers with high sales, lagers, and Austrian beers show a negative tendency for the price. The results can be used as an example of the influences behind the pricing of beer in Sweden. The first model in the analysis has 41 explanatory variables; in the final model the number of explanatory variables is reduced to 20, all of which are significant.

3 Sammanfattning (Summary)
This bachelor thesis in applied mathematics is an analysis of which factors affect the pricing of beer on the Swedish market. A multiple regression model has been created with the statistical programming language R through a study of the influences of a number of regressor variables. Among others, these variables include country of origin, beer style, volume sold, and a Bayesian weighted mean from RateBeer, a popular website for beer enthusiasts. The main goal of the project is to find significant factors and, as a consequence, a significant model without any influence of multicollinearity. The regression analysis is based on a data set of 1413 observations representing the beers that sold more than 1000 liters, among further restrictions, and is compiled from Systembolaget's sales statistics for 2016 and from RateBeer. This number of observations represents 43% of Systembolaget's total beer assortment. The model is developed through a thorough residual analysis, transformations of variables, determination of multicollinearity, and a validation of the absence of outliers and points with high influence, all in order to reach significance at the 95% confidence level. In addition to the regression model, two submodels with associated box plots are created for the variable groups Country of Origin and Beer Style, in order to analyze the importance of these variables relative to each other. A k-fold cross validation study and three different variable selections are carried out for further adequacy checking, and these are also given as recommendations for continued analysis. The results show that several different factors affect the pricing of beer. For example, increased alcohol content, sour beers, and beers from New Zealand give a higher price, while high sales, lagers, and Austrian beers show a negative tendency for the price. The results can be used as an example of the influences behind the pricing of beer in Sweden. The first model in the analysis has 41 regressors, and in the final model the number of regressors has been reduced to 20, all of which are significant.

List of Figures
1 Monks have had a great historical influence on the art of brewing.
2 Survey carried out by SurveyMonkey.
3 Summary of the final cost of a craft beer due to each expense.
4 Normal probability plot and histogram of the residuals.
5 Ordinary and scaled residuals against the predicted values.
6 The model residuals versus the fitted values for the original model.
7 Normal Q-Q plot and histogram for the original model.
8 Scale-location plot for the original model.
9 Leverage plot for the original model.
10 Logarithmic transformation of the response variable and the variable representing item price.
11 Logarithmic transformation of the response variable and the variable representing volume in ml.
12 Second model, developed with logarithmic transformations.
13 Model developed through multicollinearity analysis.
14 Model B, developed through analysis of dummy variables.
15 Outliers and high leverage points.
16 Predicted price against actual price, original model.
17 Predicted price against actual price, final model.
18 Submodel country of origin.
19 Normal probability plots.
20 Submodel beer styles.
21 Countries of origin.
22 Beer styles.
23 Package.
24 Organic.
25 In stock.
26 Rated.
27 Final model.

Contents
1 Preface
2 Abstract
3 Sammanfattning
4 Introduction
4.1 Background
4.1.1 The History of Beer
4.2 Purpose
4.3 Problem Definition
4.4 Data Set
4.5 Problem Restrictions
4.6 Literature Analysis
5 Mathematical Theory
5.1 Multiple Linear Regression
5.1.1 Heteroscedasticity
5.2 Residual Analysis
5.3 Model Adequacy
5.3.1 Residual Plots
5.4 Outliers and High Leverage Points
5.5 Transformations of Variables
5.6 Multicollinearity
5.7 Variable Selection
5.8 Cross Validation
6 Data Set
6.1 Restrictions in Data
6.2 List of Variables
6.3 About RateBeer
7 Analysis and Model Development
7.1 Residual Analysis
7.2 Transformations of Variables
7.3 Multicollinearity
7.4 Analysis of Significance
7.5 Outliers and High Leverage Points
7.6 Cross Validation
7.7 Variable Selection
8 Submodels
8.1 Country of Origin
8.2 Beer Styles
8.3 Box Plots
9 Results
9.1 Final Model
9.2 Submodels
9.3 Calculation Example
10 Discussion
10.1 Analysis of Variables
10.1.1 Quantitative Variables
10.1.2 Dummy Variables Except Beer Styles and Countries of Origin
10.1.3 Beer Styles
10.1.4 Countries of Origin
10.2 Looking Back at the Literature Analysis
10.3 Recommendations
11 Appendix
11.1 Variable Selection Tables
11.1.1 Best Subset Selection
11.1.2 Forward Subset Selection
11.1.3 Backward Subset Selection

4 Introduction
In this project, a study of how different factors affect the pricing of beer at Systembolaget during 2016 will be carried out. The study will analyze a multiple linear regression model with several explanatory variables.
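A short sketch can show what such a model looks like in practice. It is a minimal illustration, assuming NumPy and a log-transformed price as the response (the List of Figures mentions a logarithmic transformation of the response variable). The helpers dummy_columns and fit_ols and all numbers are hypothetical and are not taken from the Systembolaget or RateBeer data. A categorical variable such as beer style enters the model as 0/1 dummy columns, with one reference level left out. Because the response is log(price), exp(b) - 1 can be read as a relative price change.

```python
import numpy as np

def dummy_columns(labels, reference):
    """Return 0/1 columns for every level except `reference`, plus the level names.

    Leaving out one reference level avoids perfect collinearity with the
    intercept (the "dummy variable trap").
    """
    levels = sorted(set(labels) - {reference})
    cols = np.column_stack([[1.0 if lab == lev else 0.0 for lab in labels] for lev in levels])
    return cols, levels

def fit_ols(X, y):
    """Ordinary least squares: minimise ||y - X @ beta||^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Hypothetical toy data (NOT from Systembolaget): alcohol by volume in percent
# and beer style for eight beers.
abv = np.array([4.5, 5.0, 5.2, 6.5, 7.0, 8.5, 9.0, 10.0])
style = ["lager", "lager", "lager", "ipa", "ipa", "stout", "sour", "sour"]

# Noise-free prices from an assumed log-linear model, so the fit can be
# checked against known coefficients.
true_effect = {"lager": 0.0, "ipa": 0.05, "sour": 0.30, "stout": 0.15}
price = np.exp(3.0 + 0.10 * abv + np.array([true_effect[s] for s in style]))

# Design matrix: intercept, ABV, and style dummies with "lager" as reference.
style_cols, style_levels = dummy_columns(style, reference="lager")
X = np.column_stack([np.ones_like(abv), abv, style_cols])
names = ["intercept", "abv"] + style_levels

beta = fit_ols(X, np.log(price))
print(f"intercept: {beta[0]:+.3f}")
for name, b in zip(names[1:], beta[1:]):
    # exp(b) - 1: relative price change per percentage point of ABV, or
    # relative to a lager for a style dummy, other regressors held fixed.
    print(f"{name:>9}: beta = {b:+.3f}, price effect = {np.expm1(b):+.1%}")
```

In R, a factor variable in a model formula is expanded into equivalent dummy columns automatically by the default treatment contrasts, with the first factor level as the reference. The manual dummy_columns step is therefore only needed outside R.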
4.1 Background
The Swedish government has a monopoly on alcohol sales through Systembolaget. This means that Systembolaget is the only retailer in the country allowed to sell alcoholic beverages above 3.5% alcohol by volume. Since beer is highly fashionable at the time of writing, a closer look at this industry is of interest. The trend is largely driven by the strong growth of craft beer, and the number of breweries in Sweden is currently the highest in history. These breweries vary greatly in size: the smallest have just a few employees, while the largest companies operate on an international scale. The beer market has expanded considerably over the last ten years and is still expanding due to demand and growing interest. The trend suggests that this expansion will continue for at least another decade, so an analysis of which parameters actually matter for the pricing of beer can be helpful when examining the market. To see which factors determine the pricing of beer in Sweden today, as well as their level of influence, a multiple linear regression model will be created and analyzed.

4.1.1 The History of Beer
About 10.
