DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS STOCKHOLM, 2016

Analyzing the Factors that Drive Wine Sales in Sweden: A Regression Analysis

IDA PALMGREN

EMMA SOC DESCHAECK

KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF ENGINEERING SCIENCES

Analyzing the Factors that Drive Wine Sales in Sweden

A Regression Analysis

IDA PALMGREN EMMA SOC DESCHAECK

Degree Project in Applied Mathematics and Industrial Economics (15 credits)
Degree Programme in Industrial Engineering and Management (300 credits)
Royal Institute of Technology, 2016
Supervisors at KTH: Thomas Önskog, Jonatan Freilich
Examiner: Henrik Hult

TRITA-MAT-K 2016:27 ISRN-KTH/MAT/K--16/27--SE

Royal Institute of Technology SCI School of Engineering Sciences

KTH SCI SE-100 44 Stockholm, Sweden

URL: www.kth.se/sci

Summary (Sammanfattning) This bachelor’s thesis in applied mathematical statistics and industrial economics has analyzed which factors increase the profitability of wine producers in the Swedish wine market. The report applies both a quantitative and a qualitative approach to answer the research questions posed. To find the significant factors that affect wine sales, a linear regression model has been applied. The regression is based on data taken from Systembolaget’s sales statistics for 2015. Furthermore, an analysis has been carried out from an economic perspective, based primarily on the theories of Humphrey and Kotler, to identify the optimal marketing strategy that can increase the profitability of wine producers. The results of the quantitative analysis show that many different factors affect the sales of red and white wine. For red wines, the most prominent factors that increase sales are taste types and certain grapes. For white wines, the factors are taste types and country of origin. Common to both wine types is that the number of stores increases sales, while price affects sales negatively. The results of the qualitative analysis highlight the importance of understanding the market in which one trades and the tools at one’s disposal for marketing. The use of theories such as SWOT analysis and the 4Ps is an important part of being able to increase sales and profitability. In conclusion, a combination of the quantitative and the qualitative analysis can offer wine producers great insight into the Swedish wine market, as well as the right tools to increase their sales.

Abstract This thesis is a combined study of applied mathematical statistics and industrial engineering and management, carried out to analyze and determine the success factors in the Swedish wine market in order to increase the profitability of wine producers. To address this aim, the thesis applies both a quantitative and a qualitative approach. From a mathematical perspective, a linear regression model was applied to red and white wine to identify how wine characteristics affect the number of bottles sold. The regression is based on data obtained from Systembolaget, consisting of the sales statistics from 2015. From an industrial engineering and management perspective, the theories of Humphrey and of Kotler and Armstrong are applied to the wine market to identify further sources of profitability for wine producers. The results of the quantitative analysis showed that different factors affect the sales of red and white wine. For red wine, the most important positive factors were taste category and certain grapes. For white wine, the success factors were taste category and country of origin. Common to both red and white wine was that the number of stores increased sales, while the price affected sales negatively. The results of the qualitative analysis revealed the importance of understanding the target market and customer as well as having a distinct value proposition. Furthermore, the use of theories such as SWOT analysis and the 4Ps demonstrated the significance of using different types of tools to increase sales and profitability. To conclude, the combination of the quantitative and qualitative analyses presented in this thesis may offer wine producers greater insight into the Swedish market, as well as the proper tools to increase their sales.

Contents

1 Introduction
1.1 Background
1.2 Aim
1.3 Research Questions

2 Mathematical Theory
2.1 The Multiple Regression Analysis Model
2.1.1 Definition and Terminology
2.1.2 Dummy Variables
2.1.3 Important Assumptions
2.1.4 Ordinary Least Squares Estimation
2.2 Reducing the Model
2.2.1 Akaike Information Criterion
2.3 Model Validation
2.3.1 Linearity
2.3.2 Normality
2.3.3 Heteroscedasticity
2.3.4 R² and Adjusted R²
2.3.5 Hypothesis Testing
2.3.5.1 F-Statistic and p-value
2.4 Errors
2.4.1 Multicollinearity
2.4.2 Endogeneity

3 Method
3.1 Data Pre-Processing
3.2 Variable Selection
3.3 The Initial Model: Red Wine
3.3.1 Model Validation
3.3.2 Evaluating the R², F-statistic and p-value
3.3.3 Reducing the Model
3.4 The Initial Model: White Wine
3.4.1 Model Validation
3.4.2 Evaluating the R², F-statistic and p-value
3.4.3 Reducing the Model

4 Results
4.1 The Final Model: Red Wine
4.1.1 Model Validation
4.1.2 Evaluating the R², F-statistic and p-value
4.2 The Final Model: White Wine
4.2.1 Model Validation
4.2.2 Evaluating the R², F-statistic and p-value

5 Discussion
5.1 Analysis of Results
5.1.1 Red Wine
5.1.2 White Wine
5.2 Limitations

6 Case Study
6.1 Introduction
6.1.1 Aim and Purpose
6.1.2 Research Question
6.2 SWOT Analysis
6.3 The Marketing Mix - The 4Ps
6.4 Recommendations
6.5 Criticism
6.5.1 The Social Consequences of Marketing of Alcohol

7 Conclusion

1 Introduction

1.1 Background

Wine has quickly become an important part of Swedish society, where many see wine as an essential part of cuisine, culture and social gatherings. This wine culture has resulted in a continuous increase in the amount of wine sold in Sweden over the last couple of decades. This growth has been most significant for wines above the retail price of 99 SEK (Röttörp, 2014). Due to the high demand, the number of wines on the market has increased, creating a more competitive environment for wine producers. It is therefore imperative for wine producers to enhance their knowledge of what makes a particular wine popular. Dr. Steve Goodman from the University of Adelaide stated that it is not ultimately about a specific preference of wine, but what wine is readily available when the consumer demands it (Hebbeln, 2014). However, other studies stress that it is the combination of the wine’s characteristics, such as aroma and taste, together with the bottle’s design and label that are important for a consumer when choosing a particular wine (Bisson, 2002). It is therefore interesting to determine what factors are significant for the consumer.

Even with the best knowledge it is hard for wine producers to affect the sales in Sweden since the market is regulated with only one retail outlet, known as Systembolaget. Systembolaget is a government monopoly for off-premise retail of all alcoholic beverages containing more than 3.5 % alcohol by volume. There are approximately 400 Systembolaget stores and the legal age limit for buying alcoholic beverages is 20 years. The purpose of this monopoly is not to increase sales, but to limit the social consequences associated with alcohol, such as health issues and violence (Systembolaget, 2016b).

Along with this regulated market, Sweden has had a strict marketing policy on alcohol. For many years before 2003, the marketing of alcohol was completely banned. In 2003 the policy was changed to a more liberal one, but it still includes many restrictions (IOGT-NTO, 2016). For example, the advertising of alcohol may not encourage people to drink and it cannot be directed at or depict people below 25 years of age (Riksdagen, 2010). Even with a limited allowance of advertisement, studies show that advertising can increase sales of alcohol by between 5 and 8 % (Holder, 2008). Therefore it is interesting to analyze how wine producers can further increase the sales of their wines by applying an analysis of the market and marketing tools.

1.2 Aim

The aim of this thesis is to analyze what factors drive the sales of red and white wine in Sweden. This will be done by obtaining a regression model that can predict the sales of a particular wine using its characteristics. These sales are restricted to the ones from Systembolaget and consist of red and white wine. Furthermore, only wines above the retail price of 99 SEK are included in this analysis. This model will be relevant for potential new competitors as well as already established wine producers on the market. Recommendations and suggestions will be provided to wine producers who sell their wine in Sweden and question which factors are important and affect the amount of wine sold. Consequently, the prediction model obtained will serve as a guideline for wine producers on the market on how to create wines to optimize their future sales.

Furthermore, a market analysis will be done and marketing theories will be presented to offer an insight into potential strategies for wine producers to further increase their profitability. As such, the aim of this thesis is to give wine producers a complete overview of the market in which they operate to give them a competitive edge.

1.3 Research Questions

To answer the aims of this thesis, the research questions that will be addressed are:

• What factors drive the sales of red and white wine in Sweden?

• Do these factors differ between red and white wine?

• How can the wine producers increase their sales and profitability by better understanding their market and using the right marketing tools?

2 Mathematical Theory

This thesis will use regression analysis to identify the impact of different factors on the wine sales in Sweden. These factors will be referred to as covariates. The analysis will be used to create a predictive linear regression model for how much a certain wine can be expected to sell according to its specifications (Lang, 2015, p. 4). For this model to be valid, there are five assumptions that need to be met. If these are met, the Ordinary Least Squares estimator will give the best estimate for the regression (Kennedy, 2008, p. 40).

2.1 The Multiple Regression Analysis Model

2.1.1 Definition and Terminology

The linear regression model has the following specification:

$$y_i = \sum_{j=0}^{k} x_{ij}\beta_j + e_i, \qquad i = 1, \dots, n \tag{1}$$

In this equation, $y_i$ is an observation of the dependent random variable $y$, the $x_{ij}$ are the values of the covariates (with $x_{i0} = 1$ for the intercept), and the last term $e_i$ is the residual (Lang, 2015, p. 3). The unknown coefficients $\beta_j$ are estimated from the data in the regression.

When handling a multiple regression, it is often more convenient to employ the matrix notation of equation 1:

$$Y = X\beta + e$$

Y is an n × 1-vector of random variables:

$$Y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}$$

X is an n × (k + 1)-matrix consisting of the covariates:

$$X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1k} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{pmatrix}$$

β is a (k + 1) × 1-vector of unknown coefficients:

$$\beta = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_k \end{pmatrix}$$

Finally, e is an n × 1-vector of random variables:

$$e = \begin{pmatrix} e_1 \\ \vdots \\ e_n \end{pmatrix}$$

2.1.2 Dummy Variables

When a covariate is qualitative, e.g., the type of grape, one must use a proxy to represent it in the regression. This is done by using a dummy variable, an artificial variable that is constructed such that it takes on the value 1 if true and 0 otherwise (Kennedy, 2008, p. 232). It is important to note that one of the dummies must be omitted from the regression to leave one category against which the other categories are compared. This is done to avoid perfect multicollinearity. This omitted dummy variable is known as the benchmark of the regression (Kennedy, 2008, p. 234).
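The dummy construction described above can be sketched in a few lines of Python. This is an illustration only: the function name `dummy_encode` and the five country values are hypothetical and are not taken from the thesis data.

```python
def dummy_encode(values, benchmark):
    """Return one 0/1 column per category, leaving out the benchmark category."""
    categories = sorted(set(values) - {benchmark})
    return {c: [1 if v == c else 0 for v in values] for c in categories}

# Hypothetical country-of-origin values for five wines.
countries = ["Italy", "France", "Others", "Italy", "Spain"]
dummies = dummy_encode(countries, benchmark="Others")

for name, column in dummies.items():
    print(name, column)
```

A wine from the benchmark category has 0 in every dummy column, so each estimated dummy coefficient is read as a difference relative to that benchmark.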

2.1.3 Important Assumptions

In "A Guide to Econometrics", Kennedy (2008, p. 41-42) states that the linear regression model rests on five basic assumptions about how the data is generated.

1. The first assumption of the linear regression model is that the dependent variable, Y , can be written as a linear combination of the independent variables, i.e. the covariates, plus a residual term e.

2. The second assumption is that E(e) = 0, i.e. the mean of the distribution from which the error term is drawn is zero. If this assumption is not met, the estimator of β is biased.

3. The third assumption is that $E(e_i^2) = \sigma^2$, i.e. all the error terms have the same variance. This is known as a homoscedastic regression. If this is not true, the regression is heteroscedastic and the error terms do not all have the same variance. Another violation is autocorrelated errors, which occur when the disturbances are correlated with one another.

4. The fourth assumption is that the covariates are regarded as fixed in repeated samples, i.e. deterministic. A violation of this assumption is errors in variables, i.e. errors in measuring the covariates.

5. The fifth assumption is that there cannot be fewer observations than the number of covariates and that there is no exact linear relationship between the covariates. If this is not true, multicollinearity is present in the regression.

2.1.4 Ordinary Least Squares Estimation

If the five assumptions mentioned above are met, the Ordinary Least Squares (OLS) estimator is the optimal estimator, in the sense of being the best linear unbiased estimator (Kennedy, 2008, p. 40). The OLS estimator of $\beta$ is denoted by $\hat\beta$ and it is the value that minimizes the sum of squares $\hat e^T \hat e = |\hat e|^2$ of the residuals, $\hat e = Y - X\hat\beta$ (Lang, 2015, p. 5). Here, $\hat e$ is the estimate of the error term $e$ in the regression.

The OLS estimate of $\beta$ is obtained by solving for $\hat\beta$ in the normal equations $X^T \hat e = 0$. This gives the estimator (Lang, 2015, p. 5):

$$\hat\beta = (X^T X)^{-1} X^T Y \tag{2}$$

This estimate of $\beta$ is unbiased under the assumptions of the linear regression model, i.e. $E(\hat\beta) = \beta$ (Lang, 2015, p. 5).

Finally, an expression for the covariance matrix of the estimator of $\beta$ is given as:

$$\mathrm{Cov}(\hat\beta) = (X^T X)^{-1}\sigma^2 \tag{3}$$
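As a minimal sketch of equation 2, the following Python code solves the normal equations for a toy data set using only the standard library. The helper names (`transpose`, `matmul`, `solve`, `ols`) and the four observations are assumptions made for illustration; the thesis itself performs its regressions in R.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def solve(a, b):
    """Solve a z = b for square a by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Return beta_hat solving (X^T X) beta_hat = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(x * v for x, v in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)

# Toy data: an intercept column of ones and one covariate.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.1, 2.9, 5.2, 6.8]

beta_hat = ols(X, y)
residuals = [yi - sum(b * x for b, x in zip(beta_hat, row)) for row, yi in zip(X, y)]
# The normal equations require X^T e_hat = 0 (up to rounding error).
normal_eq = [sum(x * e for x, e in zip(col, residuals)) for col in transpose(X)]
print(beta_hat, normal_eq)
```

Solving the linear system directly, rather than forming the inverse of $X^T X$, gives the same estimate and is the usual choice numerically.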

2.2 Reducing the Model

2.2.1 Akaike Information Criterion

When choosing which covariates should be part of the model, one can use the Akaike Information Criterion (AIC) (Lang, 2015, p. 21). Unlike hypothesis testing, the AIC does not test a model against a null hypothesis. Instead, it indicates which approximate model best minimizes the information loss relative to the "true model", and therefore gives the most accurate model (Lang, 2015, p. 22). There are different ways to use AIC, but in this thesis an approach using partial eta-squared, $\eta^2$, will be used. According to the AIC, a covariate should be deleted if the corresponding $\eta^2$ is less than $1 - e^{-2r/n}$ (Lang, 2016, p. 2). Here, n is the number of observations, and r is the number of restrictions. The process can be broken down into the following steps:

1. Calculate $1 - e^{-2r/n}$ for the full model with r = 1.

2. Delete the covariate if $\eta^2 < 1 - e^{-2r/n}$. If this is true for more than one covariate, delete the one with the smallest $\eta^2$.

3. Redo the regression without this covariate and calculate each covariate’s $\eta^2$.

4. Repeat until there is no $\eta^2$ for which the criterion is true.
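Steps 1 and 2 of the procedure above can be sketched as follows. The function names and the partial eta-squared values are hypothetical; only n = 287, the number of red wine observations reported in Section 3.1, comes from the thesis. Refitting the regression (step 3) is left out of this sketch.

```python
import math

def aic_threshold(n, r=1):
    """Deletion threshold 1 - exp(-2r/n) for n observations and r restrictions."""
    return 1.0 - math.exp(-2.0 * r / n)

def covariate_to_delete(eta_squared, n):
    """Return the covariate with the smallest eta^2 below the threshold, or None."""
    limit = aic_threshold(n)
    below = {name: value for name, value in eta_squared.items() if value < limit}
    if not below:
        return None
    return min(below, key=below.get)

n_red = 287  # red wine observations (Section 3.1)
threshold = aic_threshold(n_red)

# Hypothetical partial eta-squared values for three covariates.
eta_sq = {"Price": 0.210, "Alcohol Content": 0.0056, "Pinotage": 0.0001}
drop = covariate_to_delete(eta_sq, n_red)
print(round(threshold, 6), drop)
```

With n = 287 the threshold is roughly 0.0069, so in this made-up example two covariates fall below it and the one with the smaller $\eta^2$ is removed first.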

2.3 Model Validation

An important part of creating an accurate prediction model is model validation. Firstly, this type of analysis verifies that the assumptions of the multiple regression model are met. Furthermore, it also includes different statistical tools to examine how well the model actually represents the data. This is done to evaluate to what extent the model can be used to predict the wine sales, and hence be considered an accurate model.

2.3.1 Linearity

Violations of the first assumption arise when relevant variables are omitted, when irrelevant variables are included, or when the relationship between the dependent variable and the covariates is not linear. To verify this assumption, one can plot the residuals of the regression model against the fitted values. This plot is called Residuals vs Fitted. The trend in this plot should resemble a straight horizontal line across the observations, so that there is no systematic relationship between the residuals and the fitted values. This means that the model efficiently captures the systematic variance present in the data and hence linearity between the dependent variable and the covariates is confirmed (Kabacoff, 2011, p. 190).

2.3.2 Normality

In order to verify the second assumption, i.e. the error terms are normally distributed with a mean of zero, one can use a probability plot of the standardized residuals against the theoretical quantiles. This plot is called a Normal Q-Q plot. If the assumption is true, the points should be along a straight 45-degree line (Kabacoff, 2011, p. 190).

2.3.3 Heteroscedasticity

Heteroscedasticity is a violation of the third assumption. It occurs if the error terms do not all have the same variance, in contrast to a homoscedastic regression. The OLS estimator will still be unbiased, but there are other consequences of heteroscedasticity that need to be considered.

The most important consequence is that the usual estimate of the covariance matrix of $\hat\beta$ is inconsistent and, as a result, hypothesis testing using the F-test becomes invalid (Lang, 2015, p. 17). A remedy to this problem is using a robust covariance matrix, known as White’s consistent variance estimator (Lang, 2015, p. 18). This eliminates the asymptotic bias and hence, hypothesis testing can once again be considered valid. Furthermore, another consequence of heteroscedasticity is that the OLS estimator is no longer the most efficient estimator.
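For reference, the form of White’s estimator most commonly given in the literature (often labeled HC0) is sketched below. Whether Lang (2015) uses exactly this variant or one with a finite-sample correction is an assumption here. In the expression, $x_i^T$ denotes the i-th row of $X$ and $\hat e_i$ the i-th OLS residual.

```latex
\widehat{\mathrm{Cov}}(\hat\beta)
  = (X^T X)^{-1}
    \left( \sum_{i=1}^{n} \hat e_i^{\,2} \, x_i x_i^T \right)
    (X^T X)^{-1}
```

If all error terms share the variance $\sigma^2$, the middle factor has expectation close to $\sigma^2 X^T X$ and the expression reduces to equation 3.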

One can detect heteroscedasticity by looking at the covariance matrix of the error vector $e$, as each diagonal term corresponds to the variance of one error term (Kennedy, 2008, p. 112). Another way to detect heteroscedasticity is to look at the scatter plot of the residuals against each covariate. This is known as the eyeball test (Kennedy, 2008, p. 116). If the absolute magnitude of the residuals appears to be on average the same along the axis, heteroscedasticity is most likely not present. Below are two graphs showing the difference between a homoscedastic and a heteroscedastic regression.

Figure 1: Eyeball test. (a) Homoscedasticity (b) Heteroscedasticity

In this thesis, a Scale-Location plot will be used to detect potential heteroscedasticity. There should be no discernible pattern in the plot, i.e. the variability of the residuals should not change along the range of the dependent variable. This is similar to the eyeball test mentioned above, but the difference is that one does not have to plot every covariate separately.

2.3.4 R² and Adjusted R²

When using OLS as the estimator for the regression, the sum of the squared deviations of the dependent variable about its mean, i.e. the total variation $|\hat e_*|^2$, can be broken into two parts, the "explained" variation and the "unexplained" variation. The "explained" variation is the sum of squared deviations of the estimated values of the dependent variable around their mean. The "unexplained" variation is the sum of squared residuals, $|\hat e|^2$. This is meaningful when one wants to use the coefficient of determination $R^2$, as it represents the proportion of the variation in the dependent variable "explained" by the variation in the independent variables (Kennedy, 2008, p. 13-14). The coefficient of determination has the following equation (Lang, 2015, p. 8):

$$R^2 = \frac{|\hat e_*|^2 - |\hat e|^2}{|\hat e_*|^2} \tag{4}$$

However, using $R^2$ can be problematic when determining the number of covariates that should be included in the regression, as adding a covariate can never cause the $R^2$ to fall. This could lead to adding too many covariates to the regression that do not actually contribute to the accuracy of the regression. Therefore, one needs to adjust the $R^2$ for the degrees of freedom to obtain the adjusted $R^2$, denoted $\bar R^2$, to avoid this problem (Kennedy, 2008, p. 79).

$$\bar R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1} \tag{5}$$

Here, n is the number of observations and p is the number of covariates in the regression. When using $\bar R^2$, one should only consider including a covariate if it causes $\bar R^2$ to increase (Kennedy, 2008, p. 80).
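Equations 4 and 5 can be illustrated with a short Python sketch. The observations and fitted values below are made-up toy numbers, and the function names are hypothetical.

```python
def r_squared(y, fitted):
    """Equation 4: (total variation - unexplained variation) / total variation."""
    mean_y = sum(y) / len(y)
    total = sum((v - mean_y) ** 2 for v in y)
    unexplained = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return (total - unexplained) / total

def adjusted_r_squared(r2, n, p):
    """Equation 5: R^2 adjusted for n observations and p covariates."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Toy observations and fitted values from a regression with one covariate.
y = [1.1, 2.9, 5.2, 6.8]
fitted = [1.09, 3.03, 4.97, 6.91]

r2 = r_squared(y, fitted)
adj_r2 = adjusted_r_squared(r2, n=len(y), p=1)
print(r2, adj_r2)
```

The adjusted value is always at most the unadjusted one when p is at least 1, and the gap widens as more covariates are added relative to the number of observations.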

2.3.5 Hypothesis Testing

Hypothesis testing is employed to determine whether there is enough evidence in a data sample to infer that a certain hypothesis is true for the entire population (Minitab, 2016). A hypothesis test consists of two opposing hypotheses, the null hypothesis, $H_0$, and the alternative hypothesis, $H_1$. After formulating the test, one must choose a test statistic to employ. A test statistic is a standardized value calculated from the data and is used to determine whether to reject the null hypothesis (Minitab, 2016). There are several different test statistics that can be used; this thesis will use the F-statistic.

2.3.5.1 F -Statistic and p-value

The F-statistic is an aid for calculating the significance of each specific covariate. The null hypothesis tested is that a coefficient is equal to zero, i.e. $\beta_j = 0$. The p-value generated under this null hypothesis reveals the significance of each covariate in the regression. The equation for the F-statistic reads as follows (Lang, 2015, p. 8), where $\beta_j$ is the value of the coefficient under the null hypothesis, here zero:

$$F = \left( \frac{\hat\beta_j - \beta_j}{\mathrm{Std.Error}(\hat\beta_j)} \right)^2 \tag{6}$$

Consequently, from this the p-value for the hypothesis is:

$$\Pr\big(F(1, n - k - 1) > F\big) = p\text{-value} \tag{7}$$

In this thesis, a confidence level of 90 percent is chosen, i.e. a significance level of 0.1. This means that any covariate with a p-value higher than 0.1 will be removed, as the null hypothesis fails to be rejected.
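Equations 6 and 7 can be sketched in Python as follows. The p-value here uses a large-sample approximation (the F(1, m) distribution approaches a chi-square distribution with one degree of freedom as m grows), which is an assumption of this sketch; the exact F distribution would normally be evaluated by the statistical software. The input values are the estimate and standard error reported for Austria in Table 1, and the function names are hypothetical.

```python
import math

def f_statistic(estimate, std_error, null_value=0.0):
    """Equation 6: squared standardized distance from the null value."""
    return ((estimate - null_value) / std_error) ** 2

def approx_p_value(f_stat):
    """Approximate Pr(F(1, m) > f_stat) for large m via the standard normal tail."""
    return math.erfc(math.sqrt(f_stat / 2.0))

# Estimate and standard error for Austria as reported in Table 1.
f_austria = f_statistic(-1.7890, 0.5163)
p_austria = approx_p_value(f_austria)

# Decision rule of the thesis: remove the covariate if the p-value exceeds 0.1.
remove = p_austria > 0.1
print(f_austria, p_austria, remove)
```

The approximate p-value is slightly smaller than the exact one because the F distribution with finite degrees of freedom has heavier tails.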

The F -statistic can also be calculated for the total model to see its overall significance. It is executed in a similar manner, except the null hypothesis now includes r estimators equal to zero. If the null hypothesis is rejected, the corresponding covariates are implied to be significant. This result is found when the F -statistic is large and hence, the overall p-value is low (Lang, 2015, p. 10).

The formula for the F -statistic for the total model is:

$$F = \frac{1}{r}\, \hat\beta^T \hat V^{-1} \hat\beta \tag{8}$$

where $\hat\beta$ is a column vector consisting of the estimators tested for zero and $\hat V$ is the covariance matrix for these estimators, see section 2.1.4 (Lang, 2015, p. 10).

2.4 Errors

2.4.1 Multicollinearity

If there is a linear relationship between the covariates, multicollinearity occurs and the fifth assumption of the linear regression model is violated. The major consequence of multicollinearity is that the variances of the OLS estimates become large (Kennedy, 2008, p. 192-193). These large variances mean that the coefficient estimates are imprecise and, as a result, hypothesis testing becomes unreliable (Kennedy, 2008, p. 194).

There are multiple ways to detect multicollinearity. One way is to notice when a covariate’s sign is not the same as what was hypothesized. Another way is to use the correlation matrix available in most statistical software. Furthermore, it can also be detected by using the variance inflation factor (VIF), which is the method that will be used in this thesis (Kabacoff, 2011, p. 200). The VIF of a covariate can be calculated using the following formula, where $R^2$ is the coefficient of determination obtained when that covariate is regressed on the remaining covariates:

$$\mathrm{VIF} = \frac{1}{1 - R^2} \tag{9}$$

As a general rule of thumb, VIF > 10 indicates a multicollinearity problem (Lang, 2015, p. 55).

However, if the regression is intended for prediction purposes one needs to remember that multicollinearity may not be of concern (Kennedy, 2008, p. 196). This will be further discussed throughout the thesis while checking for multicollinearity.
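The VIF rule of thumb can be sketched in Python as below. The auxiliary $R^2$ values (from regressing each covariate on the others) are hypothetical numbers chosen for illustration, as is the function name.

```python
def vif(aux_r_squared):
    """Equation 9: variance inflation factor from an auxiliary R^2 in [0, 1)."""
    if not 0.0 <= aux_r_squared < 1.0:
        raise ValueError("auxiliary R^2 must lie in [0, 1)")
    return 1.0 / (1.0 - aux_r_squared)

# Hypothetical auxiliary R^2 values for two covariates.
aux_r2 = {"Fullness": 0.92, "Price": 0.35}

vifs = {name: vif(value) for name, value in aux_r2.items()}
flagged = {name: value > 10 for name, value in vifs.items()}
print(vifs, flagged)
```

An auxiliary $R^2$ above 0.9 is what pushes the VIF past the threshold of 10, so the rule flags covariates that are almost fully explained by the others.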

2.4.2 Endogeneity

Endogeneity occurs when one of the covariates is correlated with the error term in the regression model. Expressed mathematically, this means that the assumption $E(e_i) = 0$ is violated (Lang, 2015, p. 25). However, this problem only arises under a structural interpretation of the model. As this thesis uses the model for prediction, endogeneity is not relevant here.

3 Method

The method in this thesis is based on three important steps. First, the data is pre-processed and narrowed down to only include data relevant to the aim. The data has been collected from Systembolaget and consists of the sales statistics of wine in Sweden from 2015 (Systembolaget, 2015). This data is regarded as deterministic, hence the fourth assumption has been met. Second, two regression analyses are performed using the computer software R to analyze relevant relationships and correlations between the dependent variable and the covariates. Separate regressions are run for red and white wine since many of the factors differ between them. Lastly, each model is checked to see whether it violates any of the remaining four basic linear regression assumptions.

3.1 Data Pre-Processing

The data was collected from the published sales data denoted "Försäljningsstatistik 2015" from Systembolaget (2015) for both red and white wine. Common to both types of wine is that the increase in wine sales has been most significant for wines above the retail price of 99 SEK (Röttörp, 2014). Therefore all wines below 100 SEK were removed from the collected dataset. Furthermore, all wines that did not sell at least one bottle in 2015 were removed. Some of the wines were also missing important information such as taste notes and year produced, and when possible these were added from the online database of Systembolaget (2016b). If data was still missing after this step, the wines were removed. For red wine, the data was thereby narrowed down from 563 to 287 observations. For white wine, the data was narrowed down from 219 to 139 observations. The main reasons for this decrease in observations were that many wines lacked a proper taste definition and that they were not available in any store.

3.2 Variable Selection

After processing the raw data, the relevant covariates for the regression were chosen. Below the dependent variable and the covariates are described in more detail.

Dependent Variable The dependent variable, y, in the regression was the number of bottles sold in year 2015. As this thesis attempts to answer what the important factors that drive wine sales are, this was deemed a relevant response variable. Furthermore, since the number of bottles sold is a positive number for all the observations, as well as the fact that it spans over a large range of values, the dependent variable was re-expressed as the logarithmic function of the number of bottles sold.
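The effect of the log transformation can be sketched as follows. The sketch assumes the natural logarithm (the thesis does not state the base), and the sales figures and the coefficient value 0.5 are hypothetical. Under that assumption, a dummy coefficient b in the model corresponds to multiplying the predicted number of bottles by exp(b).

```python
import math

# Hypothetical annual sales (bottles) for three wines spanning a wide range.
bottles_sold = [120, 4500, 98000]
log_sales = [math.log(b) for b in bottles_sold]

def percent_change(coefficient):
    """Percentage change in predicted bottles sold implied by a coefficient
    on the natural-log scale."""
    return (math.exp(coefficient) - 1.0) * 100.0

spread_raw = max(bottles_sold) / min(bottles_sold)
spread_log = max(log_sales) / min(log_sales)
print(log_sales, spread_raw, spread_log, percent_change(0.5))
```

The transformation compresses the range of the response considerably, which is the stated motivation for re-expressing the dependent variable.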

Covariates

• Alcohol Content: The first covariate chosen was the alcohol content. This was deemed relevant as a customer might choose a wine due to its percentage of alcohol.

• Country of Origin: The next covariate chosen was the country of origin of the wine. This was included as it can be deemed to influence how much of a certain wine is sold. This covariate was treated as a dummy.

• Ethical: The next covariate chosen was whether or not the wine was ethical. This means that the production of the wine has been analyzed to make sure that it is done in a socially sustainable manner, i.e. working conditions and salaries meet specified standards (Systembolaget, 2016b). This might influence a customer’s choice and was therefore included. This covariate was also treated as a dummy.

• Grapes: The type of grape or types of grapes used in a certain wine was also included as dummy variables. This is done to analyse whether or not certain types of grapes are likely to sell more.

• Organic: The next covariate included was a dummy variable for whether or not the wine was organic. To be marked organic the wine must meet strict standards concerning the way it is produced and its impact on the environment (Systembolaget, 2016b). This was deemed to have potential relevance for the predictive model as many customers are increasingly aware of these issues and might choose one wine over another based on this factor alone.

• Number of Stores: The number of stores was included as a covariate as the number of bottles sold of a certain wine is reasonably affected by its availability.

• Price: Price was included as a covariate as it influences a customer’s choice when buying a wine, and is therefore important to evaluate.

• Taste Category: The taste category attributed to a specific wine describes its dominating attributes. These covariates were treated as dummies. For red wine, there are four taste categories (Systembolaget, 2016b):

– Soft and Berry

– Rough and Nuanced

– Well-Seasoned and Spicy

– Fruity and Flavorsome

For white wine, there are three categories (Systembolaget, 2016b):

– Fresh and Fruity

– Rich and Flavorsome

– Flowery

• Taste Notes: The taste notes rate the wine on a scale from 1-12 in three categories. For red wine, these are fullness, roughness and fruitiness. For white wine, these are sweetness, fullness and fruitiness (Systembolaget, 2016b).

• Year: Lastly, year was included as a covariate as some wines develop over time and are therefore more attractive as they become older.

3.3 The Initial Model: Red Wine

First, a regression on red wine was computed. This was done to analyze how different covariates influence the wine sales and thereby obtain a regression model for predicting future sales of red wine. The initial model for the regression of red wine includes all the covariates that were deemed relevant in finding a suitable prediction model. Below follows a table of the estimates obtained for the covariates.

Table 1: Table of the Covariates for the Initial Model of Red Wine.

Covariate                   Estimate   Std.Error  p-value  Unit        Benchmark
Alcohol Content             -0.1169    0.1030     0.2576   Percentage  -
Country of Origin
  Argentina                 0.3175     0.4209     0.4514   Dummy       -
  Australia                 -0.4012    0.5358     0.4548   Dummy       -
  Austria                   -1.7890    0.5163     0.0006   Dummy       -
  Chile                     -1.774     0.4950     0.0004   Dummy       -
  France                    -0.1253    0.4889     0.7980   Dummy       -
  Germany                   -2.065     0.4888     0.0000   Dummy       -
  Greece                    0.3810     0.5492     0.4900   Dummy       -
  Italy                     -0.3827    0.4576     0.4039   Dummy       -
  Lebanon                   -0.3479    0.6915     0.6154   Dummy       -
  New Zealand               -0.5035    0.5528     0.3633   Dummy       -
  Others                    -          -          -        -           Benchmark
  Portugal                  -0.1510    0.4950     0.7606   Dummy       -
  South Africa              -0.2867    0.4709     0.5433   Dummy       -
  Spain                     -1.3213    0.6333     0.0380   Dummy       -
  USA                       -0.3422    0.5048     0.4986   Dummy       -
Grapes
  Barbera                   0.4350     0.3902     0.2661   Dummy       -
  [name missing]            0.1630     0.2249     0.1469   Dummy       -
  Cabernet Sauvignon        0.2150     0.1898     0.2585   Dummy       -
  Carinena                  1.1553     0.3275     0.0005   Dummy       -
  Carmenere                 0.3291     0.3488     0.3268   Dummy       -
  Cinsault                  -0.3133    0.4112     0.4470   Dummy       -
  Counoise                  0.2442     0.4090     0.5511   Dummy       -
  Corvina                   0.8314     0.3836     0.0312   Dummy       -
  Graciano                  -0.4512    0.6187     0.4665   Dummy       -
  Grenache                  0.3682     0.2125     0.0845   Dummy       -
  Malbec                    -0.9319    0.2919     0.0016   Dummy       -
  Mazuelo                   0.9366     0.6578     0.1558   Dummy       -
  Mencia                    0.1743     0.6143     0.7768   Dummy       -
  [name missing]            -0.0116    0.2073     0.9553   Dummy       -
  Molinara                  0.4672     0.1837     0.0116   Dummy       -
  Montepulciano             0.01911    0.5174     0.9706   Dummy       -
  Mourvedre                 -0.3835    0.2223     0.0862   Dummy       -
  Nebbiolo                  0.6966     0.3000     0.0211   Dummy       -
  Oseleta                   0.2771     0.1856     0.1367   Dummy       -
  Petit Verdot              -0.3911    0.2307     0.0914   Dummy       -
  [name missing]            0.1188     0.2499     0.6349   Dummy       -
  Pinotage                  0.0358     0.2619     0.8914   Dummy       -
  Rondinella                -0.5197    0.3410     0.1289   Dummy       -
  Sangiovese                0.3671     0.1987     0.0660   Dummy       -
  Shiraz                    0.1397     0.1503     0.3535   Dummy       -
  Tannat                    0.6873     0.2876     0.0177   Dummy       -
  Tempranillo               0.9652     0.4564     0.0355   Dummy       -
  Touriga Nacional          0.2358     0.2496     0.0346   Dummy       -
  Viognier                  -1.184     0.5293     0.0262   Dummy       -
  Zinfandel                 0.01715    0.3414     0.9600   Dummy       -
Organic
  Yes                       -0.5118    0.1855     0.0063   Dummy       -
  No                        -          -          -        Dummy       Benchmark
Number of Stores            0.007440   0.0004044  0.000    Quantity    -
Price                       -0.005401  0.008932   0.0000   SEK         -
Taste Category
  Soft and Berry            0.6429     0.3731     0.0862   Dummy       -
  Rough and Nuanced         0.5847     0.2417     0.0163   Dummy       -
  Well-Seasoned and Spicy   0.8184     0.1738     0.0000   Dummy       -
  Fruity and Flavorsome     0.9936     0.2096     0.0000   Dummy       -
  Missing category          -          -          -        Dummy       Benchmark
Taste Notes
  Fullness                  0.02003    0.08777    0.8197   1-12        -
  Dryness                   -0.0417    0.08396    0.6201   1-12        -
  Fruitiness                -0.3120    0.01156    0.0075   1-12        -
Year                        0.04509    0.2639     0.0889   Number      -

3.3.1 Model Validation

Before the model was reduced, the initial model was validated. Diagnostic plots of the initial model were used to assess whether the key assumptions of linear regression are met.

Figure 2: Residual vs. Fitted Plot for the Initial Model of Red Wine.

As mentioned in section 2.3.1, the plot above is used to verify assumption 1 of the linear regression model. Since the residuals show no clear curved pattern along the fitted values, the relationship between the dependent variable and the covariates appears to be linear. Hence, assumption 1 is considered to be met.

Figure 3: Normal Q-Q Plot for the Initial Model of Red Wine.

To verify the second assumption of normality, the graph above is used. Most of the observations appear to fall on the 45-degree straight line. Therefore, the initial model is deemed to meet the second assumption.

Figure 4: Scale-Location Plot for the Initial Model of Red Wine.

This last plot is used to confirm homoscedasticity in the regression, which is the third assumption of the linear regression model. As can be seen above, the variation in the residuals is constant along the range of the fitted values. Hence, homoscedasticity is confirmed and the third assumption is met for the initial model of red wine.

3.3.2 Evaluating the R², F-statistic and p-value

Table 2: Table of R², F-statistic and p-value for the Initial Model for Red Wine.

R²            0.8123
Adjusted R²   0.7666
F-statistic   17.78 on 56 and 230 DF
p-value       0.0000

The R² is 0.8123, meaning that the model explains 81.23 % of the variation in the data. The adjusted R², which takes the degrees of freedom into account, is 0.7666. The F-statistic for the overall initial model of red wine is fairly high at 17.78 on 56 and 230 degrees of freedom. Together with the low p-value, this indicates that the model as a whole is statistically significant.
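The figures in Table 2 can be cross-checked with the standard textbook formulas for the adjusted R² and the overall F-statistic. The sketch below is only a consistency check, not the authors' own computation; the sample size n = 287 is an assumption inferred from the reported degrees of freedom (56 + 230 + 1), and the function names are hypothetical.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for a model with p covariates plus an intercept, fitted to n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def overall_f(r2: float, n: int, p: int) -> float:
    """Overall F-statistic on p and n - p - 1 degrees of freedom."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


# Values from Table 2. n is an assumption inferred from the degrees of freedom.
p = 56
n = 230 + p + 1

print(round(adjusted_r2(0.8123, n, p), 4))  # should agree with the reported 0.7666
print(round(overall_f(0.8123, n, p), 2))    # should be close to the reported 17.78
```

Small differences in the last digit of the F-statistic are to be expected, since the reported R² is itself rounded to four decimals.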

3.3.3 Reducing the Model

The next step in obtaining a suitable predictive model is to use the Akaike Information Criterion (AIC) to evaluate which covariates can be removed from the regression model to obtain a more accurate model. Following the steps described in section 2.2.1, the procedure was repeated 33 times, which means that 33 covariates were removed from the initial model. The value against which every η² was compared was:

$$1 - e^{-2r/n} = 0.006944 \tag{10}$$

After using AIC to reduce the model, covariates with a p-value above the predefined accepted level of 0.1 were removed. The table below lists the covariates in the order in which they were removed, according to the criterion that $\eta^2 < 1 - e^{-2r/n}$ as well as a p-value > 0.1.

Table 3: Table of the Removed Covariates for Red Wine.

No.  Covariate            η²       p-value
1    Montepulciano        0.00001  0.9706
2    Zinfandel            0.00001  0.9615
3    Pinotage             0.00001  0.9040
4    Merlot               0.00002  0.9544
5    Portugal             0.00005  0.7633
6    France               0.00003  0.8453
7    Fullness             0.00025  0.8127
8    Lebanon              0.00024  0.5979
9    Mencia               0.00030  0.7902
10   Touriga Nacional     0.00076  0.1614
11   Pinot Noir           0.00067  0.6756
12   Counoise             0.00088  0.4913
13   Carmenere            0.00147  0.2815
14   South Africa         0.00158  0.4992
15   Greece               0.00236  0.0250
16   Dryness              0.00265  0.4471
17   Shiraz               0.00219  0.4792
18   Australia            0.00219  0.5312
19   USA                  0.00138  0.5764
20   New Zealand          0.00291  0.4346
21   Oseleta              0.00385  0.0957
22   Italy                0.00362  0.3617
23   Barbera              0.00284  0.4320
24   Graciano             0.00381  0.5215
25   Cinsault             0.00436  0.3655
26   Rondinella           0.00594  0.0445
27   Molinara             0.00498  0.1515
28   Cabernet Sauvignon   0.00546  0.2569
29   Petit Verdot         0.00481  0.2600
30   Sangiovese           0.00528  0.2304
31   Cabernet Franc       0.00520  0.2663
32   Argentina            0.00642  0.0096
33   Nebbiolo             0.00672  0.2245
34   Year                 0.01211  0.1135
35   Mazuelo              0.00815  0.1103
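The threshold in equation (10) and its use on the η² values in Table 3 can be illustrated with a short sketch. It assumes r = 1 (one covariate tested per step) and n = 287 (inferred from the degrees of freedom in Table 2); neither value is stated explicitly in this section, and the function name is hypothetical. Only a handful of rows from Table 3 are included.

```python
import math


def aic_threshold(n: int, r: int = 1) -> float:
    """Threshold 1 - exp(-2r/n) against which eta^2 is compared."""
    return 1.0 - math.exp(-2.0 * r / n)


# Assumed values: r = 1 and n = 287 (56 + 230 + 1 from Table 2).
threshold = aic_threshold(n=287)

# A few eta^2 values copied from Table 3.
eta_squared = {
    "Montepulciano": 0.00001,
    "Nebbiolo": 0.00672,
    "Year": 0.01211,
    "Mazuelo": 0.00815,
}

# Covariates whose eta^2 falls below the threshold qualify for removal by AIC.
removed_by_aic = [name for name, eta2 in eta_squared.items() if eta2 < threshold]
print(f"threshold = {threshold:.6f}")
print("below threshold:", removed_by_aic)
```

Under these assumptions, the first 33 rows of Table 3 fall below the threshold, while Year and Mazuelo lie above it, which is consistent with those two being removed by the p-value criterion instead.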

3.4 The Initial Model: White Wine

A second regression was computed using the data for white wine. This was done to analyze how different covariates influence the sales of white wine and thereby obtain a regression model for predicting future sales of white wine. The initial model for white wine had similar covariates to the red wine regression, except for the grapes and countries. The table below describes the covariates in more detail, together with the output of the initial regression.

Table 4: Table of the Covariates for the Initial Model of White Wine.

Covariate                   Estimate   Std.Error  p-value  Unit        Benchmark
Alcohol Content             0.07485    0.1310     0.5690   Percentage  -
Country of Origin
  Argentina                 0.7210     0.2594     0.0066   Dummy       -
  Australia                 -0.07209   -0.5932    0.9035   Dummy       -
  Austria                   1.462      0.4390     0.0012   Dummy       -
  France                    1.426      0.3333     0.0000   Dummy       -
  Germany                   0.9582     0.5415     0.0800   Dummy       -
  Italy                     0.6756     0.4854     0.1672   Dummy       -
  New Zealand               1.194      0.3469     0.0009   Dummy       -
  Others                    -          -          -        -           Benchmark
  Portugal                  3.295      0.6769     0.0000   Dummy       -
  Slovenia                  0.1604     0.4038     0.6921   Dummy       -
  South Africa              1.005      0.3602     0.0064   Dummy       -
  Spain                     1.546      0.7058     0.0310   Dummy       -
  USA                       1.020      0.2981     0.0009   Dummy       -
Ethical
  Yes                       -3.093     0.2614     0.0000   Dummy       -
  No                        -          -          -        Dummy       Benchmark
Grapes
  Albarino                  -1.296     0.2669     0.0000   Dummy       -
  Aligote                   1.094      0.6782     0.1100   Dummy       -
  Bourboulenc               -0.5664    0.5238     0.2823   Dummy       -
  Carricante                -0.2721    0.8101     0.7377   Dummy       -
  Chardonnay                0.8861     0.6779     0.1944   Dummy       -
  Chenin Blanc              0.5956     0.6854     0.3870   Dummy       -
  Gewurztraminer            -0.3323    0.4078     0.4172   Dummy       -
  Gros Manseng              0.03229    0.5935     0.9567   Dummy       -
  Gruner Veltliner          0.1996     0.7421     0.7886   Dummy       -
  Muskat                    -2.0142    0.6288     0.0019   Dummy       -
  Nerello Mascalese         0.2451     0.7864     0.7560   Dummy       -
  Pinot Blanc               1.985      1.302      0.1307   Dummy       -
  Pinot Gris                -0.4560    0.3661     0.2159   Dummy       -
  Riesling                  0.7833     0.6758     0.2494   Dummy       -
  Rousanne                  1.265      0.6959     0.0723   Dummy       -
  Sauvignon Blanc           0.5874     0.6667     0.3806   Dummy       -
  Semillon                  -1.108     0.4585     0.0176   Dummy       -
  Verdicchio                2.193      0.7568     0.0047   Dummy       -
  Vermentino                0.9506     0.7278     0.1947   Dummy       -
  Viognier                  0.01923    0.4424     0.9654   Dummy       -
Organic
  Yes                       -0.1285    0.1705     0.4529   Dummy       -
  No                        -          -          -        Dummy       Benchmark
Number of Stores            0.006459   0.0005797  0.000    Quantity    -
Price                       -0.008354  0.002392   0.0007   SEK         -
Taste Category
  Fresh and Fruity          0.1684     0.4059     0.6792   Dummy       -
  Rich and Flavorsome       0.5823     0.4726     0.2209   Dummy       -
  Flowery                   1.177      0.4210     0.0063   Dummy       -
  Missing category          -          -          -        Dummy       Benchmark
Taste Notes
  Sweetness                 0.09224    0.1057     0.3852   1-12        -
  Fullness                  -0.1708    0.1331     0.2024   1-12        -
  Fruitiness                -0.0356    0.1653     0.8299   1-12        -
Year                        0.2792     0.1206     0.0228   Number      -

3.4.1 Model Validation

As with red wine, plots of the initial model were analyzed to check the key assumptions of the regression model. As mentioned before, this is done to confirm that the tools used to evaluate and interpret the model are valid.

Figure 5: Residual vs. Fitted Plot for the Initial Model of White Wine.

The plot above is used to verify assumption 1 of the linear regression model. There seems to be a linear relationship between the dependent variable and the covariates, since the graph shows an approximately straight line. Hence, one can conclude that assumption 1 has been met, much like for the initial model of red wine.

Figure 6: Normal Q-Q Plot for the Initial Model of White Wine.

The graph above is used to check the condition of normality. Since almost all of the observations fall on the 45-degree straight line, one can conclude that the initial model meets the second assumption.

Figure 7: Scale-Location Plot for the Initial Model of White Wine.

Lastly, this plot confirms homoscedasticity in the regression. The variation in the residuals is constant along the range of the fitted values. Hence, homoscedasticity is confirmed and the third assumption is met for the initial model of white wine.

3.4.2 Evaluating the R², F-statistic and p-value

Table 5: Table of R², F-statistic and p-value for the Initial Model for White Wine.

R²            0.8226
Adjusted R²   0.7395
F-statistic   9.903 on 44 and 94 DF
p-value       0.0000

The R² is 0.8226, showing that the model explains 82.26 % of the variation in the data. The adjusted R² is 0.7395. The F-statistic for the overall initial model of white wine, 9.903 on 44 and 94 degrees of freedom, is adequate as well. Together with the low p-value, this indicates that the model as a whole is statistically significant.

3.4.3 Reducing the Model

As in the regression for red wine, AIC was used to reduce the prediction model to obtain a more accurate model. In this case, the benchmark against which each η² was compared evaluated to:

$$1 - e^{-2r/n} = 0.01429 \tag{11}$$

Steps 2-4 described in section 2.2.1 were repeated 19 times, i.e. 19 covariates were removed from the initial model. After using AIC to reduce the model, covariates with a p-value above the predefined accepted level of 0.1 were removed. The table below lists the covariates in the order in which they were removed, according to the criterion that $\eta^2 < 1 - e^{-2r/n}$ as well as a p-value > 0.1.

Table 6: Table of the Removed Covariates for White Wine.

No.  Covariate            η²       p-value
1    Viognier             0.00001  0.9654
2    Gros Manseng         0.00001  0.9595
3    Australia            0.00009  0.9031
4    Gruner Veltliner     0.00041  0.7431
5    Fruitiness           0.00057  0.8260
6    Nerello Mascalese    0.00055  0.7573
7    Fresh and Fruity     0.00070  0.6226
8    Slovenia             0.00078  0.6188
9    Carricante           0.00229  0.4042
10   Bourboulenc          0.00378  0.2143
11   Gewurztraminer       0.00413  0.3336
12   Alcohol Content      0.00242  0.5805
13   Organic              0.00284  0.5024
14   Pinot Gris           0.00586  0.2223
15   Sweetness            0.00907  0.1937
16   Argentina            0.01296  0.0060
17   Chenin Blanc         0.01400  0.0798
18   Aligote              0.01343  0.0001
19   Italy                0.01158  0.2867
20   Sauvignon Blanc      0.00776  0.2835
21   Riesling             0.00821  0.3096

4 Results

4.1 The Final Model: Red Wine

After reducing the model using AIC and p-value, the final model was attained. The final model contains 21 covariates listed in the table below with the relevant regression outputs.

Table 7: Table of the Covariates for the Final Model of Red Wine.

Covariate                   Estimate   Std.Error  p-value  Unit        Benchmark
Alcohol Content             -0.1560    0.07410    0.0362   Percentage  -
Country of Origin
  Austria                   -1.458     0.09684    0.0000   Dummy       -
  Chile                     -1.525     0.2802     0.0000   Dummy       -
  Germany                   -1.810     0.1149     0.0000   Dummy       -
  Others                    -          -          -        -           Benchmark
  Spain                     -1.052     0.2790     0.0002   Dummy       -
Grapes
  Carinena                  0.8458     0.2268     0.0002   Dummy       -
  Corvina                   0.3630     0.1388     0.0094   Dummy       -
  Grenache                  0.4983     0.2094     0.0180   Dummy       -
  Malbec                    -0.5312    0.2020     0.0090   Dummy       -
  Mourvedre                 -0.4421    0.2038     0.0309   Dummy       -
  Tannat                    0.8527     0.1063     0.0000   Dummy       -
  Tempranillo               0.7780     0.3275     0.0182   Dummy       -
  Viognier                  -1.2156    0.4079     0.0032   Dummy       -
Organic
  Yes                       -0.5192    0.1671     0.0021   Dummy       -
  No                        -          -          -        Dummy       Benchmark
Number of Stores            0.007599   0.0003643  0.0000   Quantity    -
Price                       -0.005665  0.0007614  0.0000   SEK         -
Taste Category
  Soft and Berry            0.6279     0.3033     0.0394   Dummy       -
  Rough and Nuanced         0.5771     0.1968     0.0037   Dummy       -
  Well-Seasoned and Spicy   0.7463     0.1664     0.0000   Dummy       -
  Fruity and Flavorsome     0.8886     0.1799     0.0000   Dummy       -
  Missing category          -          -          -        Dummy       Benchmark
Taste Notes
  Fruitiness                -0.2255    0.09012    0.0130   1-12        -

This results in a final predictive model for red wine with the following regression equation:

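$$
\begin{aligned}
\log(\text{Number of Bottles Sold}) ={} & 13.10 - 0.1560 \times \text{Alcohol Content} - 1.458 \times \text{Austria} - 1.525 \times \text{Chile} \\
& - 1.810 \times \text{Germany} - 1.052 \times \text{Spain} + 0.8458 \times \text{Carinena} \\
& + 0.3630 \times \text{Corvina} + 0.4983 \times \text{Grenache} - 0.5312 \times \text{Malbec} \\
& - 0.4421 \times \text{Mourvedre} + 0.8527 \times \text{Tannat} \\
& + 0.7780 \times \text{Tempranillo} - 1.2156 \times \text{Viognier} - 0.5192 \times \text{Organic} \\
& + 0.007599 \times \text{Number of Stores} - 0.005665 \times \text{Price} \\
& + 0.6279 \times \text{Soft and Berry} + 0.5771 \times \text{Rough and Nuanced} \\
& + 0.7463 \times \text{Well-Seasoned and Spicy} + 0.8886 \times \text{Fruity and Flavorsome} \\
& - 0.2255 \times \text{Fruitiness}
\end{aligned}
$$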
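To show how the regression equation for red wine could be applied, the following sketch evaluates it for a single wine. The coefficients are copied from Table 7. The example wine, the function name and the dictionary layout are hypothetical, and the conversion back to bottles assumes that log denotes the natural logarithm, which this section does not state explicitly.

```python
import math

INTERCEPT = 13.10
COEFFICIENTS = {
    "Alcohol Content": -0.1560,
    "Austria": -1.458, "Chile": -1.525, "Germany": -1.810, "Spain": -1.052,
    "Carinena": 0.8458, "Corvina": 0.3630, "Grenache": 0.4983,
    "Malbec": -0.5312, "Mourvedre": -0.4421, "Tannat": 0.8527,
    "Tempranillo": 0.7780, "Viognier": -1.2156,
    "Organic": -0.5192,
    "Number of Stores": 0.007599,
    "Price": -0.005665,
    "Soft and Berry": 0.6279, "Rough and Nuanced": 0.5771,
    "Well-Seasoned and Spicy": 0.7463, "Fruity and Flavorsome": 0.8886,
    "Fruitiness": -0.2255,
}


def predict_log_sales(wine: dict) -> float:
    """Evaluate the regression equation; covariates not listed in `wine` count as 0."""
    unknown = set(wine) - set(COEFFICIENTS)
    if unknown:
        raise KeyError(f"covariates not in the final model: {sorted(unknown)}")
    return INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in wine.items())


# Hypothetical wine: a Spanish Tempranillo at 99 SEK, sold in 200 stores.
example = {
    "Alcohol Content": 13.5, "Spain": 1, "Tempranillo": 1,
    "Number of Stores": 200, "Price": 99,
    "Fruity and Flavorsome": 1, "Fruitiness": 8,
}

log_sales = predict_log_sales(example)
bottles = math.exp(log_sales)  # assumes the natural logarithm
print(f"log(sales) = {log_sales:.4f}, predicted bottles = {bottles:.0f}")
```

Dummy covariates are passed as 1 when they apply and simply omitted otherwise, which mirrors the benchmark coding used in Table 7.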

4.1.1 Model Validation

To evaluate to what extent the final model can be used to predict the wine sales and be considered an accurate model, a final verification of the important assumptions of the linear regression model is done below.

Figure 8: Residual vs. Fitted Plot for the Final Model of Red Wine.

This plot is very similar to the one for the initial model. Therefore, with the same reasoning, one can deduce from the plot that the relationship between the dependent variable and the covariates is linear. Hence, one can conclude that assumption 1 has been met.

Figure 9: Normal Q-Q Plot for the Final Model of Red Wine.

The normal Q-Q plot has clearly improved after arriving at the final model, since more of the observations fall on the 45-degree straight line. Therefore, assumption 2 of normality is considered to be met.

Figure 10: Scale-Location Plot for the Final Model of Red Wine.

Similar to the initial model of red wine, the final model has a constant variation in the residuals along the range of the fitted values. Hence, homoscedasticity is confirmed and the third assumption is met. This means that the F-statistic is valid for the model.

Finally, to check whether the final assumption, the absence of harmful multicollinearity, is met, the VIF-statistic is used. This is only done for the final model to address any concerns regarding multicollinearity, even though it is mostly not a concern for a prediction model. Below follows a table with the results from this test, where FALSE indicates a covariate that has met the VIF-criterion and hence does not exhibit potentially harmful multicollinearity.

Table 8: Table of the VIF-test for Red Wine.

Covariate                   VIF     VIF > 10
Alcohol Content             1.658   FALSE
Country of Origin
  Austria                   1.017   FALSE
  Chile                     1.297   FALSE
  Germany                   1.021   FALSE
  Spain                     3.918   FALSE
Grapes
  Carinena                  1.402   FALSE
  Corvina                   1.817   FALSE
  Grenache                  2.331   FALSE
  Malbec                    1.129   FALSE
  Mourvedre                 2.028   FALSE
  Tannat                    1.056   FALSE
  Tempranillo               3.832   FALSE
  Viognier                  1.124   FALSE
Organic
  Yes                       1.162   FALSE
Number of Stores            1.397   FALSE
Price                       1.516   FALSE
Taste Category
  Soft and Berry            2.357   FALSE
  Rough and Nuanced         11.50   TRUE
  Well-Seasoned and Spicy   15.75   TRUE
  Fruity and Flavorsome     13.63   TRUE
Taste Notes
  Fruitiness                1.211   FALSE

As seen above, there are three covariates with possibly harmful multicollinearity. However, this can be explained by the choice of benchmark for the taste category, which is the "missing category". This benchmark has a small number of cases in proportion to the other categories. This causes the dummies of the taste category to have high VIFs, even though they are not correlated with any other variable in the regression model (Allison, 2012). Therefore, this does not need to be of concern in terms of harmful multicollinearity.
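The effect described above can be reproduced on synthetic data. The sketch below computes the VIF of a covariate as 1 / (1 − R²), where R² comes from regressing that covariate on the remaining ones plus an intercept. The category counts are made up for illustration and are not taken from the thesis data; the helper names are hypothetical, and the small least-squares solver is only there to keep the example dependency-free.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(y, columns):
    """R^2 of an ordinary least squares fit of y on the columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(columns, j):
    """Variance inflation factor of column j given all other columns."""
    others = [col for i, col in enumerate(columns) if i != j]
    return 1.0 / (1.0 - r_squared(columns[j], others))


def dummy_columns(counts, benchmark):
    """One 0/1 column per category except the benchmark category."""
    labels = [name for name, k in counts.items() for _ in range(k)]
    return [[1 if lab == name else 0 for lab in labels]
            for name in counts if name != benchmark]


# Synthetic counts: a tiny benchmark category versus a balanced design.
small_benchmark = dummy_columns({"missing": 2, "A": 30, "B": 40, "C": 28}, "missing")
balanced = dummy_columns({"missing": 25, "A": 25, "B": 25, "C": 25}, "missing")

print(vif(small_benchmark, 0))  # large, although no other covariate is involved
print(vif(balanced, 0))         # small
```

With only two observations in the benchmark category, the three dummies almost sum to one for every observation, so each is nearly a linear combination of the others and the intercept. This is the mechanism referred to via Allison (2012).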

4.1.2 Evaluating the R², F-statistic and p-value

Table 9: Table of R², F-statistic and p-value for the Final Model for Red Wine.

R²            0.7918
Adjusted R²   0.7753
F-statistic   47.99 on 21 and 265 DF
p-value       0.0000

The R² for the final model is 0.7918, meaning that the model explains 79.18 % of the variation in the data. The adjusted R² is 0.7753, which is slightly larger than for the initial model (0.7666). Furthermore, the gap between the R² and the adjusted R² has decreased. The F-statistic for the final model of red wine is 47.99 on 21 and 265 degrees of freedom, which is much higher than for the initial model (17.78). The overall p-value is still low, and in combination with the higher F-statistic, this indicates that the overall significance of the model has improved after the reduction.

4.2 The Final Model: White Wine

As for the red wine model, a final model was obtained for white wine after using AIC. This final model includes 23 covariates described in the table below with the relevant data from the regression.

Table 10: Table of the Covariates for the Final Model of White Wine.

Covariate                   Estimate   Std.Error  p-value  Unit        Benchmark
Country of Origin
  Austria                   0.9875     0.3036     0.0015   Dummy       -
  France                    1.183      0.2567     0.0000   Dummy       -
  Germany                   0.9506     0.3349     0.0054   Dummy       -
  New Zealand               0.9309     0.2901     0.0017   Dummy       -
  Others                    -          -          -        -           Benchmark
  Portugal                  2.589      0.2676     0.0000   Dummy       -
  South Africa              0.9099     0.3520     0.0110   Dummy       -
  Spain                     0.7671     0.2683     0.0050   Dummy       -
  USA                       0.9384     0.3515     0.0087   Dummy       -
Ethical
  Yes                       -2.454     2.966      0.0000   Dummy       -
  No                        -          -          -        Dummy       Benchmark
Grapes
  Albarino                  -1.365     0.2057     0.0000   Dummy       -
  Chardonnay                0.2785     0.1527     0.0707   Dummy       -
  Muskat                    -1.871     0.2813     0.0000   Dummy       -
  Pinot Blanc               1.822      0.2378     0.0000   Dummy       -
  Rousanne                  0.7665     0.3567     0.0338   Dummy       -
  Semillon                  -1.242     0.3682     0.0010   Dummy       -
  Verdicchio                2.099      0.2923     0.0000   Dummy       -
  Vermentino                0.8849     0.2687     0.0013   Dummy       -
Number of Stores            0.006458   0.0005900  0.000    Quantity    -
Price                       -0.008522  0.002024   0.0001   SEK         -
Taste Category
  Rich and Flavorsome       0.4375     0.2091     0.0386   Dummy       -
  Flowery                   0.4487     0.1953     0.0234   Dummy       -
  Missing category          -          -          -        Dummy       Benchmark
Taste Notes
  Fullness                  -0.1714    0.09845    0.0844   1-12        -
Year                        0.2762     0.1016     0.0076   Number      -

This results in a final predictive model with the following regression equation:

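$$
\begin{aligned}
\log(\text{Number of Bottles Sold}) ={} & -0.005468 + 0.9875 \times \text{Austria} + 1.183 \times \text{France} + 0.9506 \times \text{Germany} \\
& + 0.9309 \times \text{New Zealand} + 2.589 \times \text{Portugal} + 0.9099 \times \text{South Africa} \\
& + 0.7671 \times \text{Spain} + 0.9384 \times \text{USA} - 2.454 \times \text{Ethical} \\
& - 1.365 \times \text{Albarino} + 0.2785 \times \text{Chardonnay} - 1.871 \times \text{Muskat} \\
& + 1.822 \times \text{Pinot Blanc} + 0.7665 \times \text{Rousanne} \\
& - 1.242 \times \text{Semillon} + 2.099 \times \text{Verdicchio} \\
& + 0.8849 \times \text{Vermentino} + 0.006458 \times \text{Number of Stores} \\
& - 0.008522 \times \text{Price} + 0.4375 \times \text{Rich and Flavorsome} + 0.4487 \times \text{Flowery} \\
& - 0.1714 \times \text{Fullness} + 0.2762 \times \text{Year}
\end{aligned}
$$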

4.2.1 Model Validation

Once again, a final verification of the important assumptions of the linear regression model is done below.

Figure 11: Residual vs. Fitted Plot for the Final Model of White Wine.

Similar to the model of red wine, this plot has not altered much after arriving at the final model. Therefore, one can still conclude that the dependent variable and the covariates have a linear relationship. Hence, assumption 1 is met for the final model of white wine.

Figure 12: Normal Q-Q Plot for the Final Model of White Wine.

Although not all observations fall on the straight 45-degree line, they do not deviate enough to cause any concern. Therefore, one can conclude that assumption 2 has been met for the final model of white wine, i.e. the residuals are normally distributed with a mean of zero.

Figure 13: Scale-Location Plot for the Final Model of White Wine.

As before, the variation in the residuals is constant along the fitted values. Hence, one can conclude that the regression is homoscedastic and the third assumption has been met. Finally, this means that the F-statistic is valid for the model.

As for the final model of red wine, the VIF-statistic is used to validate whether the final assumption regarding multicollinearity is met. Below follows a table with the results from this test, where FALSE indicates a covariate that has met the VIF-criterion and hence does not have harmful multicollinearity.

Table 11: Table of the VIF-test for White Wine.

Covariate                   VIF     VIF > 10
Country of Origin
  Austria                   1.629   FALSE
  France                    3.527   FALSE
  Germany                   1.952   FALSE
  New Zealand               1.938   FALSE
  Portugal                  3.103   FALSE
  South Africa              1.874   FALSE
  Spain                     2.172   FALSE
  USA                       1.833   FALSE
Ethical
  Yes                       1.133   FALSE
Grapes
  Albarino                  4.094   FALSE
  Chardonnay                2.306   FALSE
  Muskat                    2.673   FALSE
  Pinot Blanc               2.230   FALSE
  Rousanne                  1.325   FALSE
  Semillon                  1.248   FALSE
  Verdicchio                1.125   FALSE
  Vermentino                1.094   FALSE
Number of Stores            1.470   FALSE
Price                       1.350   FALSE
Taste Category
  Rich and Flavorsome       3.369   FALSE
  Flowery                   1.472   FALSE
Taste Notes
  Fullness                  3.571   FALSE
Year                        1.368   FALSE

For the white wine regression model, there is no covariate that exhibits any harmful multicollinearity.

4.2.2 Evaluating the R², F-statistic and p-value

Table 12: Table of R², F-statistic and p-value for the Final Model for White Wine.

R²            0.8035
Adjusted R²   0.7642
F-statistic   20.45 on 23 and 115 DF
p-value       0.0000

The R² for the final model is 0.8035, meaning that the model explains 80.35 % of the variation in the data. The adjusted R² has increased slightly, from 0.7395 to 0.7642. Furthermore, the F-statistic for the final model of white wine is considerably higher than for the initial model, with a value of 20.45 on 23 and 115 degrees of freedom. The overall p-value is still low and, as for the red wine model, this in combination with a high F-statistic means that the overall significance of the model has improved.
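As a quick consistency check on Table 12, the adjusted R² can be recomputed from the R² with the standard formula, where p is the number of covariates. The sample size n = 139 is an assumption inferred from the reported degrees of freedom (23 + 115 + 1) rather than a figure stated in this section.

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
          = 1 - (1 - 0.8035)\,\frac{138}{115}
          = 1 - 0.1965 \times 1.2
          = 0.7642
```

This matches the reported adjusted value, which supports reading 0.8035 as the unadjusted R² of the final model.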

5 Discussion

This section will discuss the final results of this study, and analyze the covariates that were included in the final models. Furthermore, the relevance of the results will be investigated by taking into consideration the limitations of the data.

5.1 Analysis of Results

5.1.1 Red Wine

The final regression model for red wine included 21 covariates. Below follows an analysis of the different covariates and their estimates.

Alcohol Content

According to the obtained model, the estimate for alcohol content is negative, suggesting that a higher alcohol content has a negative effect on the number of bottles sold. This may be explained by consumers preferring an alcohol content of at most 14 %, as that has historically been the maximum level possible. New wines with a higher alcohol content, which has only recently become possible due to more resilient yeast in the production, may therefore sell less because consumers are not yet accustomed to them (Puckette, 2015).

Country of Origin

The estimates for the countries of origin were all negative, which means that, according to the model, these countries of origin have a negative effect on sales relative to the benchmark. A possible explanation is that the countries left after reducing the model are not necessarily the most prestigious wine-producing countries, and may therefore influence sales negatively.

Grapes

The estimates of the different grapes vary between being positive and negative. It is difficult to say why some grapes perform better than others, except for the obvious reason that consumers consider one to taste better than the other. Furthermore, some grape combinations are more famous than others and might therefore increase the sales of a wine containing one of those grapes. For example, the grape Corvina is part of the famous blend Amarone, a very popular type of wine in Sweden that makes up approximately 45 % of the sales of bottles above 150 SEK (Montelius, 2015). This might be why that particular grape has a positive estimate.

Organic

The negative estimate indicates that customers tend to choose wines that are not classified as organic. This might seem counter-intuitive, since people are becoming more aware of the consequences of not purchasing organic goods, and this awareness should in theory also be reflected in the sales of wine. However, organic goods tend to be pricier, which may explain why the estimate is negative. Furthermore, the negative value may also be due to there being too few organic wines in the data to obtain a reliable estimate.

Number of Stores

The estimate for the number of stores is positive with a very low standard error and p-value. This suggests that the covariate is highly significant in the model. This is to be expected as a consumer will ultimately base their purchase on what is readily available (Hebbeln, 2014).

Price

The estimate for price is negative, with a very low standard error and p-value. The negative estimate suggests that it is disadvantageous for the wines to be more expensive since they sell less, according to the model. This is not surprising since the rational consumer with scarce resources always chooses the good with the lowest price, ceteris paribus (Jain and Ohri, 2014, p. 4).

Taste Category

The different taste categories all have positive estimates. This can be due to the fact that, compared to the benchmark, the consumer has more information when purchasing the wine. Therefore, the consumer can make a more informed choice and, hence, might favor a wine with a defined taste category.

Taste Notes

After reducing the model, only fruitiness was left as a covariate under taste notes. This covariate has a negative estimate. It is hard to draw any conclusions as to why, but it may be because many consumers confuse fruitiness with sweetness, which might not be what one associates with red wine. Therefore, the consumer may be averse to choosing a wine with high fruitiness.
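Because the dependent variable is the logarithm of the number of bottles sold, each estimate discussed above can be read as an approximate relative change in sales. The following sketch converts an estimate into a percentage effect, holding all other covariates fixed. It assumes that the natural logarithm was used, which is not stated explicitly here; the function name is hypothetical and the coefficients are taken from Table 7.

```python
import math


def relative_change(beta: float, delta: float = 1.0) -> float:
    """Relative change in predicted sales when a covariate increases by `delta`.

    Assumes a model of the form ln(sales) = ... + beta * covariate.
    """
    return math.exp(beta * delta) - 1.0


# Coefficients from the final red wine model (Table 7).
price_effect = relative_change(-0.005665, delta=10)  # price raised by 10 SEK
store_effect = relative_change(0.007599, delta=1)    # one additional store
organic_effect = relative_change(-0.5192)            # organic versus non-organic

for label, effect in [("+10 SEK", price_effect),
                      ("+1 store", store_effect),
                      ("organic", organic_effect)]:
    print(f"{label}: {effect:+.1%}")
```

For small estimates such as price and number of stores, the relative change is close to the estimate itself multiplied by the change in the covariate, whereas for larger dummy estimates the exponential form matters.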

5.1.2 White Wine

The final regression model for white wine included 23 covariates. Below follows an analysis of the different covariates and their estimates.

Country of Origin

The estimates for the countries of origin were all positive, which means that the model, unlike that for red wine, suggests that the countries of origin have a positive effect on sales. This contradicts the argument stated for red wine. Why this difference occurs is difficult to account for. However, it may partly be explained by the fact that fewer country covariates were removed from the initial regression, suggesting that countries are more important when choosing a white wine.

Ethical

The dummy ethical has a negative estimate, but also a relatively high standard error. Therefore, one could argue that the estimate is quite uncertain. This uncertainty may be because there were very few wines that qualified as ethical. It is counter-intuitive that an ethical wine would sell less, as consumers are becoming more and more aware of their choices, as with organic wines. However, in this case, it may be because Systembolaget does not offer many ethical wines in general, which causes consumers to choose something else due to their unavailability.

Grapes

As for red wine, the grapes in the final white wine model vary between having positive and negative estimates. Some of the grape varieties have higher significance than others, for example Pinot Blanc. It closely resembles Chardonnay, a very popular grape variety, and it is quite fruity, making it popular in a white wine (Robinson, 2016). This may explain its large estimate and high significance in the model.

Number of Stores

The estimate for the number of stores is, as for red wine, positive with a low standard error and p-value. This is, as mentioned before, not surprising since availability is a key factor when choosing a wine.

Price

The estimate for the price is negative, as it was in the red wine model. Again, with the same logic, this is due to a rational consumer always choosing the cheaper wine, ceteris paribus.

Taste Category

Two dummies remain under the covariate taste category after reducing the model. Both of these have positive estimates. Once again, this is probably due to the fact that they are compared to the benchmark, which is the wines that lack a category. Consumers who want to make a well-informed decision, as a rational consumer will do, are therefore likely to favor a wine with more information, which gives these categories positive estimates.

Taste Notes

After reducing the model, the only covariate left under taste notes was fullness. This covariate has a negative estimate. This indicates that the consumer is more inclined to choose a lighter wine.

Year

The covariate year has a positive estimate, suggesting that it increases the sales of a particular wine. Many wines benefit from becoming older, which may be why this has a positive estimate.

5.2 Limitations

The two models obtained have an overall high significance and a high adjusted R², suggesting that they are able to predict the sales of wine fairly well. It is, however, important to remember that some very important factors were not taken into consideration. Some of the most important are the various nuances of customer behavior. Most of these behaviors are unquantifiable and it is hard to obtain reliable data on them, which is why they have not been included in this thesis. Still, some of them would probably have a significant impact on the sales of a specific wine. These factors include returning customers choosing the same wine, recommendations from a reliable source, shelf position, and marketing. These types of data are based on personal preferences, as well as word of mouth, which makes them hard for a wine producer to affect. This is not true, however, for marketing, which is why its impact will be analyzed further in the case study to help wine producers sell more wine.

Furthermore, this study chose to exclude brand names and their own specific popularity. Many consumers will probably choose a wine based on the fact that they recognize the wine name. However, this was excluded, as this thesis attempts to help all wine producers create wines that sell more, not only those that are already well established and popular due to brand recognition. This may be a quite large limitation to the study, but in order to properly answer the question at hand, it was deemed a necessary restriction.

Along with excluding brand names, this thesis chose to exclude the different regions that the wines were produced in. This was deemed a relevant restriction to the analysis as the average consumer has no knowledge of different regions. More consumers will know the location of the country than the location of a specific region. Therefore, country of origin was seen as a satisfactory measure of the geographical location of the wine.

6 Case Study

6.1 Introduction

The first part of this thesis was aimed at creating a prediction model for wine producers to increase the sales of their wines by finding important factors that influence a consumer's choice. However, to only focus on selling is not enough in a competitive market as buyers are becoming increasingly informed. Therefore, wine producers have to become a part of a value-delivery process that places marketing at the beginning of planning to meet individual wants, perceptions, preferences and buying criteria (Kotler and Armstrong, 2014, pp. 33-34). This means that wine producers need to reach their consumers early in their decision process. Hence, it is imperative that wine producers use a broader perspective consisting of a market analysis and a marketing mix to differentiate themselves on the market. Therefore, the second part of this thesis will focus on how existing wine producers can increase the sales of their wine by using the theories of SWOT analysis developed by Albert Humphrey and the 4Ps presented by Kotler and Armstrong in "Principles of Marketing". These theories will be applied to a hypothetical company to provide it with recommendations on how to survive in a changing, volatile market. A combination of these two theories not only gives the producer insight into the market in which they operate, but also an opportunity to identify which factors of their marketing mix generate the highest profitability. Lastly, some criticism of the theories used will be presented along with a brief discussion on the social effects of marketing of alcohol in Sweden.

6.1.1 Aim and Purpose

The purpose of this analysis is to obtain a recommendation for well-established wine producers on how to survive and keep their profitability on the Swedish market. This analysis is based on a medium-sized wine producer operating from one of the bigger wine producing countries, for example France or Italy (Castaldi, 2006). The company is well-established and produces both red and white wine but has not yet become the market leader. The company has also had consistent sales for the past years. This limitation was chosen due to the fact that these wine producers are the most vulnerable since new wine producers from untraditional regions are penetrating the market (Castaldi, 2006). The aim is therefore to provide this hypothetical company with concrete recommendations and tools to excel on the Swedish market.

6.1.2 Research Question

To answer the aim and purpose of this second part of this thesis, the following research question has been formulated:

• How can the wine producers increase their sales and profitability by better understanding their market and using the right marketing tools?

6.2 SWOT Analysis

First, to be able to provide this hypothetical wine producer with concrete recommendations on how to increase their profitability using the 4Ps, it is important to understand the market in which they operate. This can be done by applying a SWOT analysis, which identifies the company’s internal strengths and weaknesses in order to seize possible opportunities and successfully avoid external threats (Panagiotou, 2003). In this thesis, the SWOT analysis is therefore implemented as an industry analysis to “contextualize market opportunities” for this medium-sized wine producer on the Swedish market (Leigh, 2010, p. 122).

Strengths

A key internal strength of a medium-sized wine producer is its relatively large wine production, which brings benefits such as economies of scale and allows it to keep a competitive price. It also means that the company produces enough to keep its wines highly available to the consumer. Furthermore, since it is already well established, in the sense that it has been around for many years, it has a large amount of experience with all aspects of wine production and distribution. This also translates into valuable human capital. Moreover, being well established means that it probably has a stable brand that is recognized among consumers, since it has been able to survive for many years. This also means that the company has a loyal customer base. Lastly, being a wine producer from a large wine-producing country may limit the exposure to uncertain environmental conditions.

Weaknesses

The internal weakness of this hypothetical wine producer is that, having been around for many years, it may stagnate and fail to endure in this new, competitive market because of its traditional ways. It is common for old-fashioned companies to fall behind because they do not adapt fast enough to new trends and demands. This is a weakness for any company acting in a market where new competitors are feverishly entering and trends are constantly changing.

Opportunities

The opportunities for the wine producer are limited to the ones identified on the Swedish market. Firstly, one can see a higher awareness among consumers regarding the social responsibility of the company. This means that consumers are more sensitive to the impact that the company has on both the environment and society, for example in terms of working conditions. Therefore, there is an opportunity for wine producers operating on the Swedish market to get their wines branded as organic and ethical, which is a clear way to differentiate themselves at Systembolaget. Even though the regression analysis does not support that this would increase sales, recent studies suggest that it will play a much larger role in the future (Bisson, 2002). Another opportunity to gain a larger market share on the Swedish market, and on other markets, is to merge with other wineries in order to further their economies of scale and increase brand recognition. This could help ensure their survival, as a good support network makes it easier for the company to make big adjustments, knowing that it has enough capital to fall back on during the start-up phase when revenues within the company may be limited (Castaldi, 2006, p. 8).

Threats

As mentioned before, the threats on the Swedish market are the new wine producers entering the market. They have a clear competitive edge on some fronts since they are less rigid and traditional. These new entrants are more likely to have a differentiated approach to both design and name, making them stand out at Systembolaget. Furthermore, newer companies may be better adapted to technological advances and may better realize the importance of using social media to gain recognition.

6.3 The Marketing Mix - The 4Ps

To further create a competitive advantage, the wine producers need to establish how they will deliver value to their customers. This is done by using the company’s marketing mix. The term marketing mix describes the tactical tools that a marketer can use to increase their competitive edge on the market and, as such, increase market response (Kotler and Armstrong, 2014, p. 34). The most common type of marketing mix tool is known as the 4Ps, which classifies the mix into four subgroups: product, price, place and promotion. A short description of each subgroup follows, along with how it can be used specifically on the wine market in Sweden (Kotler and Armstrong, 2014, p. 34).

Product

The product is the physical object, service, person or idea that is delivered to the market for use and consumption. It is created to satisfy a demand on the market. In this case, the product is clearly a perishable good in the form of red and white wine. As the regression analysis in the previous part of this thesis determined, there are several factors regarding the product that aid in selling the wine. Most importantly, one can see that it is important for the consumer to have as much information as possible when choosing the wine. This is apparent from the positive estimates for the taste categories. It is therefore important that the wine producer makes sure their wine is tried and tested by Systembolaget, as this ensures it gets both a taste category and a classification of its taste notes.

However, there are also other aspects of the product that are important to take into consideration and that the regression analysis did not include. These are relevant because it is important for a wine to differentiate itself from other wines and to stand out. This is primarily done by creating a strong brand, in terms of name, design and the actual physical appearance of the product (Bisson, 2002). This becomes especially important in the physical stores of Systembolaget, where the assortments are usually very large and may be overwhelming. It matters even more because the wine producer cannot influence shelf position or additional promotion in stores: the goal of Systembolaget is not to create more sales, and it will therefore not accept such strategies. Creating a wine bottle whose appearance stands out is therefore the most important complement to the wine’s intrinsic qualities in creating a desirable product, as this may be what a consumer ultimately bases their decision on when choosing between two wines that are otherwise identical (Bisson, 2002). Studies of the most popular designs are inconclusive, as it ultimately comes down to personal taste when the consumer bases their decision on the design (Bonafede, 2010). Nevertheless, this research still demonstrates the importance of putting effort into a design that clearly shows what the brand stands for, as it can persuade a consumer to choose that wine over another.

Another important part of the design of the bottle is the choice between using a cork or a screw cap to seal the bottle. Recently, the screw cap has become more popular as more and more studies show that the choice between the two has little to no impact on the wine’s quality (Puckette, 2014). Depending on the target customer, one may also argue for the use of a screw cap, as it is easier to open and, as such, more convenient for the consumer. Therefore, the hypothetical wine producer should carefully consider at whom their wine is targeted and choose the type of cork or screw cap accordingly. The choice can also vary across the range of wines that the wine producer offers, as more expensive wines are normally expected to have a cork.

Price

After choosing the product that will be delivered, the price for that particular product must be decided upon, as well as any discounts and terms of payment that will be offered. Deciding upon a price has very much to do with the target market that the producer wants to reach. For example, more expensive wines attract wealthier customers and, perhaps, those with a greater knowledge of wine. In Sweden specifically, the price is of utmost importance, because if the product does not sell, the producer will not be able to use discounts to sell more. It is therefore imperative to decide on the right price from the beginning, as discounts cannot be relied on. As was seen from the regression done in the first part of this thesis, price has a negative impact on sales. This is not surprising and once again underlines the importance of setting the right price. For the hypothetical, well-established wine producer this analysis focuses on, pricing can be seen as one of its strengths, since it has operated on the market for a long while and therefore probably knows the right price for its wine. This is even more so because it may benefit from the economies of scale mentioned in the SWOT analysis.

Place

Next, the company must decide where its product will be available to the target customer. In Sweden, this hypothetical wine producer has a very limited ability to affect this. As Systembolaget has a monopoly on the retail sales of wine, it is the only physical store where the wine can be offered. As the regression analysis showed, availability is one of the most important factors for increasing sales, and it is therefore imperative that the wine producer satisfies this criterion. Furthermore, it is important to note that wines are becoming more and more available at online stores and through subscription groups. It is therefore important to make sure the wine producer’s products are available at these types of retail outlets as well.

Promotion

Promotion covers all the activities that communicate the value of the product to the target customer and convince the buyer to choose that particular product. These activities include advertising, campaigns, publicity and social media communication. In Sweden, there are some restrictions on the types of advertisement that may be used to promote alcohol. However, there are ways to get around these restrictions, such as advertising on TV through foreign channels, advertising on the Internet, through social media such as Twitter and Instagram, and through celebrities. As such, there are still many ways to use promotion to increase customer-perceived value and hence increase the sales of a particular wine. One example is using advertising to create a feeling or a sense of fulfillment, so that the customer believes that drinking that particular wine will bring them the same satisfaction as promoted. This is not allowed on the Swedish market but, as mentioned, the restriction can be avoided by advertising through foreign TV channels (Creutzer, 2012).

Furthermore, the wine producer can use the Internet to successfully market their wine. As mentioned before, some of the restrictions that apply to advertising do not apply on the Internet. Most notably, the ad no longer needs to contain a warning text (Konsumentverket, 2015). Figure 14 shows two hypothetical advertisements that illustrate the difference between a published ad and an Internet ad.

Figure 14: Two types of allowed advertisements in Sweden. (a) Published advertisement. (b) Internet advertisement.

For this reason, the Internet is a very powerful medium through which to advertise. Research suggests that so-called banner ads are likely to increase the purchase probability of a consumer (Manchanda et al., 2004). Furthermore, the study shows that consistent ads targeted at a specific consumer also increase the likelihood of a purchase (Manchanda et al., 2004). This means that data can be used to reach specific consumers with the banner ad, for example those who recently visited Systembolaget’s website or looked up a recipe. This could then potentially increase the sales of that specific wine.

Another way that the hypothetical wine producer can use promotion is through unofficial channels where indirect marketing is done. Examples are wine reviews in magazines and wine tastings on TV, which are not labeled as advertising and are therefore not subject to the restrictions that other advertisements are (Creutzer, 2012). Finally, one can conclude that the immense increase in investments in advertising of alcohol in Sweden, which rose by 127 % between 2008 and 2011, shows that many believe it will have a positive effect on sales (Creutzer, 2012). This is consistent with Holder (2008), who suggested that advertising may increase sales by 5 to 8 %. It is therefore imperative that this hypothetical medium-sized wine producer uses promotion carefully to remain a current force in the market.

6.4 Recommendations

Finally, the most important success factors for this hypothetical medium-sized wine producer are summarized below as recommendations:

• Making the brand both ethical and organic to meet the increasing trend of consumers being more aware of the social consequences of their choices.

• Merging with a larger wine producer to gain financial security and hence, more room to take risks.

• Having its wine tasted by Systembolaget so that the wine is given a taste category and taste notes, which gives the consumer more information about the product and allows them to make a more informed decision.

• Differentiating its wine bottles by carefully choosing its design and label so that it appeals to the consumer and stands out.

• Choosing a cork or a screw cap depending on who the wine is targeted at. Generally, more expensive wines use corks as this is often associated with exclusivity, while screw caps are more convenient.

• Selling their wine to online subscription groups and online retail stores to increase availability of their wine.

• Using targeted banner ads to increase purchase probability of a consumer online.

• Using indirect marketing through including the wine in TV tastings and wine reviews.

It is important to realize that some of these recommendations do not work for all types of wine producers; as stated before, this analysis is aimed at a well-established, medium-sized wine producer.

6.5 Criticism

The literature used to examine the Swedish wine market and the wine producer’s opportunities also has its limitations. Hill and Westbrook (1997) criticized the SWOT analysis for being too shallow and not providing enough support to actually act upon. Panagiotou (2003) furthered this criticism by stating that although the SWOT analysis has many strengths, its unstructured nature may leave its users without a clear understanding of how to incorporate its findings. Furthermore, the subjectivity of this type of analysis has been questioned (Hindle, 2009). Therefore, the SWOT analysis in this thesis is used merely as a tool to analyze the wine producer and its market. When it comes to the actual actions that should be implemented, the marketing mix is used as a complement to give further structure to the analysis.

However, this definition of a marketing mix is also heavily debated, and many regard the 4Ps as old-fashioned. Popovic (2006) criticized it for being too product-oriented in its approach to marketing. This is a particular drawback in today’s market environment, where consumers essentially have the advantage due to the abundance of information and choice available. It may therefore be useful to use the 4Cs, another marketing mix, as a complement to the analysis in this thesis. The 4Cs are customer-oriented and look at the customer solution, cost to the customer, convenience and communication (Lauterborn, 1990). This theory was not included in the analysis because the aim was specifically to demonstrate to a wine producer how to increase sales from the product-oriented perspective, not that of the customer. The product-oriented perspective is also more in line with the regression analysis, which focuses on the wine’s qualities. However, to widen the analysis, an analysis using the 4Cs can be recommended for further studies.

6.5.1 The Social Consequences of Marketing of Alcohol

As mentioned before, the goal of marketing is to increase the sales of wine. This contradicts the goal of Systembolaget. As a state-appointed monopoly, it wants to limit the sales of alcohol in order to curb the social consequences associated with its consumption. There is therefore a discrepancy between these two aspects of alcohol policy on the Swedish market. Although the exact relationship between alcohol and its social consequences is yet to be quantified, there is a clear relationship between alcohol consumption and alcohol-related crime, traffic accidents, family dysfunction, and problems in the workplace (Babor et al., 2010, p. 60). Studies suggest that an increase in alcohol consumption negatively affects public health, as exemplified by the changes in the mortality rate in Russia during the later part of the 20th century (Babor et al., 2010, p. 63). When the government enforced an anti-alcohol campaign, the mortality rate decreased significantly. However, when it lost control of the alcohol market, the mortality rate increased once again. This example clearly shows the benefits to public health of limiting the alcohol supply. Furthermore, studies show that a government monopoly on retail sales is an effective way to limit alcohol consumption and the harm associated with it (Babor et al., 2010, p. 244). In Sweden, the government monopoly on retail sales of alcohol has reduced alcohol-related consequences significantly. According to the Swedish Institute of Social Research, Systembolaget contributes to a reduction in violence, sick leave, deaths and traffic injuries (Systembolaget, 2016a). Furthermore, Sweden once tried introducing some alcoholic beverages with an alcohol content above 3.5 % in grocery stores. This resulted in a 15 % increase in alcohol consumption, most notably among young people, along with an equal increase in the number of traffic incidents at the time (Noval and Nilsson, 1984). This clearly shows the positive effect of the government monopoly in limiting Swedish alcohol consumption.

In contrast, removing the ban on the marketing of alcohol in Sweden is believed to increase the sales of alcohol. Holder (2008) showed that removing the ban increases alcohol consumption by 5 to 8 %, even if it is only partially removed. His studies also suggest that a partial restriction on marketing has no effect on limiting alcohol consumption. Not only does marketing increase alcohol consumption, but the constant reminder of alcohol also makes it difficult for those who are trying to quit or to decrease their consumption (Thomson et al., 1997). This clear indication that the advertising of alcohol encourages more drinking contradicts the aim of Systembolaget. It is therefore important to remember that even though the wine producers, as the drivers of advertising, want to sell more, they also have a social responsibility. They should keep in mind that they are part of a bigger picture in which alcohol is often related to negative consequences. Therefore, marketing to increase sales should be used in a conscientious way.

7 Conclusion

The results of this thesis show that the factors which drive the sales of wine vary between red and white. The regression analysis for red wine concluded that some grapes and all taste categories are positive in the regression model. In contrast, the alcohol content, countries of origin, organic and taste notes have negative estimates. For the white wine model, the positive estimates are the countries of origin, some grapes, taste categories and year. The covariates with negative estimates are ethical and taste notes.

Furthermore, one can conclude that for both regressions, availability, which is measured by the covariate number of stores, is an important factor. This result supports Dr. Goodman’s statement regarding the importance of the availability of the wine for the consumer (Hebbeln, 2014). The estimate for the covariate price also had the same sign in both regression models, but it was negative. This is also to be expected, as a rational consumer with scarce resources will choose the cheaper wine, ceteris paribus (Jain and Ohri, 2014, p. 4). Finally, one needs to remember that these are only models used to approximate reality and that they cannot be the only tool used to achieve higher sales. Even though the regressions have a high adjusted R², the models are at best a guideline to proceed from and not an absolute truth. However, these factors are still vital for survival in this competitive market.
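As a reminder of why a high adjusted R² on its own does not make a model an absolute truth, the following minimal sketch computes the standard adjusted R² for a linear model with an intercept. The function name and all numbers are hypothetical illustrations and are not taken from the regressions in this thesis.

```python
def adjusted_r2(r2: float, n_obs: int, n_covariates: int) -> float:
    """Adjusted R^2 for a linear model with an intercept.

    r2           -- ordinary coefficient of determination
    n_obs        -- number of observations (here: wines)
    n_covariates -- number of covariates, excluding the intercept
    """
    dof = n_obs - n_covariates - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof


# Hypothetical values, NOT results from the thesis: the same R^2 is
# penalized more heavily when the model uses more covariates.
print(round(adjusted_r2(0.90, n_obs=101, n_covariates=10), 4))  # 0.8889
print(round(adjusted_r2(0.90, n_obs=101, n_covariates=40), 4))  # 0.8333
```

The measure only describes the fit to the observed data after penalizing the number of covariates; it says nothing about factors, such as bottle design or promotion, that were never included in the model.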

Therefore, the wine producers need to look at profitability from a broader perspective to achieve success. Hence, the second part of this thesis, regarding the market analysis and marketing tools, should be seen as a complement that shows how the wine producers can further increase their sales. First of all, it is imperative that the wine producer knows and understands the market in which they operate, as well as the potential opportunities available on the market. After understanding this properly, they need to use their marketing mix efficiently. Tools such as the 4Ps can be used to increase the wine producer’s competitive edge on the market and to identify possible strategies to increase sales. Using these steps, the wine producer can maximize the value captured from their customers to ensure higher sales and, as such, higher profitability for their business. Although this last part of the thesis attempts to find opportunities for more sales, one needs to bear in mind the social consequences associated with alcohol consumption. The wine producers have a responsibility towards their consumers and society, since alcohol is no ordinary commodity.

List of Tables

1 Table of the Covariates for the Initial Model of Red Wine
2 Table of R², F-statistic and p-value for the Initial Model for Red Wine
3 Table of the Removed Covariates for Red Wine
4 Table of the Covariates for the Initial Model of White Wine
5 Table of R², F-statistic and p-value for the Initial Model for White Wine
6 Table of the Removed Covariates for White Wine
7 Table of the Covariates for the Final Model of Red Wine
8 Table of the VIF-test for Red Wine
9 Table of R², F-statistic and p-value for the Final Model for Red Wine
10 Table of the Covariates for the Final Model of White Wine
11 Table of the VIF-test for White Wine
12 Table of R², F-statistic and p-value for the Final Model for White Wine

List of Figures

1 Eyeball test
2 Residual vs. Fitted Plot for the Initial Model of Red Wine
3 Normal Q-Q Plot for the Initial Model of Red Wine
4 Scale-Location Plot for the Initial Model of Red Wine
5 Residual vs. Fitted Plot for the Initial Model of White Wine
6 Normal Q-Q Plot for the Initial Model of White Wine
7 Scale-Location Plot for the Initial Model of White Wine
8 Residual vs. Fitted Plot for the Final Model of Red Wine
9 Normal Q-Q Plot for the Final Model of Red Wine
10 Scale-Location Plot for the Final Model of Red Wine
11 Residual vs. Fitted Plot for the Final Model of White Wine
12 Normal Q-Q Plot for the Final Model of White Wine
13 Scale-Location Plot for the Final Model of White Wine
14 Two types of allowed advertisements in Sweden

References

Allison, P. (2012). When can you safely ignore multicollinearity? http://statisticalhorizons.com/multicollinearity. Visited 2016-04-20.

Babor, T., Caetano, R., Casswell, S., and Edwards, G. (2010). Alcohol: No Ordinary Commodity, research and public policy, volume 2. Oxford University Press.

Bisson, L. F. (2002). The present and future of the international wine industry. http://www.nature.com/nature/journal/v418/n6898/full/nature01018.html. Visited 2016-05-02.

Bonafede, V. (2010). Analysis of design aesthetics and the correlation to price. http://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=1031&context=agbsp. Visited 2016-05-19.

Castaldi, R. (2006). A country-level analysis of competitive advantage in the wine industry. http://academyofwinebusiness.com/wp-content/uploads/2010/05/Castaldi.pdf. Visited 2016-05-18.

Creutzer, M. (2012). Alkoholreklamen har ökat med 127 procent på tre år. http://www.can.se/Tidskriften-AoN/Alkoholreklam/Alkoholreklamen-har-okat-med-127-procent-pa-tre-ar/. Visited 2016-05-18.

Hebbeln, S. (2014). Study: What drives wine purchases? http://www.winedirect.com/?method=blog.blogDrilldown&blogEntryID=9F22E755-DF83-B266-D53A-0F0201522AB0&originalMarketingURL=blog/Study--What-drives-wine-purchases-. Visited 2016-04-25.

Hill, T. and Westbrook, R. (1997). SWOT Analysis: It’s Time for a Product Recall. Long Range Planning.

Hindle, T. (2009). SWOT analysis. http://www.economist.com/node/14301503/. Visited 2016-05-19.

Holder, H. (2008). Alcohol monopoly and public health: Potential effects of privatization of the Swedish alcohol retail monopoly. https://www.folkhalsomyndigheten.se/pagefiles/21546/R200827_Alkoholmonopol_eng_0809.pdf. Visited 2016-05-02.

IOGT-NTO (2016). Alkoholreklam i Sverige. http://iogt.se/kampanjer/slack-ner/alkoholreklam-i-sv/. Visited 2016-05-02.

Jain, T. R. and Ohri, V. K. (2014). Introductory Microeconomics. VK Global Publications.

Kabacoff, R. (2011). R in Action: Data Analysis and Graphics with R. Manning Publications Co.

Kennedy, P. (2008). A Guide to Econometrics, volume 6. Blackwell Pub.

Konsumentverket (2015). Marknadsföring av alkohol på internet. http://www.konsumentverket.se/Foretag/Regler-per-omrade-och-bransch/Alkohol-och-tobak/Marknadsforing-av-alkohol-pa-internet/. Visited 2016-05-19.

Kotler, P. and Armstrong, G. (2014). Principles of Marketing, volume 15. Pearson Education Limited.

Lang, H. (2015). Elements of Regression Analysis. Stockholm: KTH.

Lang, H. (2016). More Exercises in Regression Analysis. Stockholm: KTH.

Lauterborn, B. (1990). New Marketing Litany: Four Ps Passé: C-Words Take Over, volume 61. Advertising Age.

Leigh, D. (2010). SWOT Analysis. International Society for Performance Improvement.

Manchanda, P., Dubé, J.-P., Goh, K. Y., and Chintagunta, P. K. (2004). The effect of banner advertising on internet purchasing. https://www.researchgate.net/profile/Jean-Pierre_Dube/publication/228643899_The_Effect_of_Banner_Advertising_on_Internet_Purchasing/links/02e7e521232f1a51f6000000.pdf. Visited 2016-05-19.

Minitab (2016). What is a hypothesis test? http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/basics/what-is-a-hypothesis-test/. Visited 2016-03-21.

Montelius, F. (2015). Hur populärt är amarone? http://www.clubamarone.se/2015/01/30/hur-populart-ar-amarone/. Visited 2016-04-25.

Noval, S. and Nilsson, T. (1984). Mellanölets effekt på konsumtionsnivån och tillväxten hos den totala alkoholkonsumtionen. Linköping University.

Panagiotou, G. (2003). Bringing SWOT into focus, volume 14. Business Strategy Review.

Popovic, D. (2006). Modelling the Marketing of High-Tech Start-Ups, volume 14. Journal of Targeting, Measurement and Analysis for Marketing.

Puckette, M. (2014). Corks vs screw caps. http://winefolly.com/tutorial/corks-vs-screw-caps/. Visited 2016-05-19.

Puckette, M. (2015). Wine: From the lightest to the strongest. http://winefolly.com/tutorial/the-lightest-to-the-strongest-wine/. Visited 2016-04-25.

Riksdagen (2010). Alkohollag. http://www.riksdagen.se/sv/dokument-lagar/dokument/svensk-forfattningssamling/alkohollag-20101622_sfs-2010-1622. Visited 2016-05-02.

Robinson, J. (2016). Pinot blanc. http://www.jancisrobinson.com/learn/grape-varieties/white/pinot-blanc. Visited 2016-05-02.

Röttörp, A. (2014). Italien och ekoviner ökar mest. http://vinbanken.se/2014/02/01/2013-var-italiens-och-frankrikes-vinar-i-sverige/. Visited 2016-04-19.

Systembolaget (2015). Försäljningsstatistik per artikel och försäljning för de största märkena per varugrupp. http://www.systembolaget.se/om-systembolaget/om-foretaget/forsaljningsstatistik. Visited 2016-03-20.

Systembolaget (2016a). Varför systembolaget. http://www.varforsystembolaget.se/. Visited 2016-05-19.

Systembolaget (2016b). Website of Systembolaget. http://www.systembolaget.se. Visited 2016-03-21.

Thomson, A., Bradley, E., Casswell, S., and Wyllie, A. (1997). A qualitative investigation of the responses of in treatment and recovering heavy drinkers to alcohol advertising on New Zealand television. Contemporary Drug Problems.
