Eindhoven University of Technology
MASTER
Improving the promotion forecasting accuracy at Unilever Netherlands
van der Poel, M.J.
Award date: 2010
Eindhoven, August 2010
Improving the promotion forecasting accuracy at Unilever Netherlands
by M.J. (Thijs) van der Poel
BSc Industrial Engineering and Management Science Student identity number 0550934
in partial fulfilment of the requirements for the degree of
Master of Science in Operations Management and Logistics
Supervisors TU/e: dr. K.H. van Donselaar dr. J.J.L. Schepers
Supervisor Unilever: dr. P.D.J. van Balkom
TUE. School of Industrial Engineering. Series Master Theses Operations Management and Logistics
Subject headings: sales forecasting, promotions, retail trade, consumer goods
Abstract
This master thesis describes how the forecasting accuracy of promotions can be improved at Unilever Netherlands. Currently, employees within the organization forecast promotions in a highly judgemental way. This research develops the forecasting process further by introducing a more mathematical forecasting model. Consumer demand and retailer orders are forecasted with multiple linear regression, and the difference between forecasting consumer demand and forecasting retailer orders is analyzed. The effect sizes of the 21 independent variables on the promotional demand are discussed and the most important ones are used to formulate a reduced model. It is concluded that consumer demand can be forecasted quite accurately; however, the forecasting accuracy drops substantially for retailer orders. Multiple disturbing factors between consumer demand and retailer orders apparently increase the variability of the retailer orders. Therefore, this research advises Unilever to cooperate more extensively with its retailers to investigate the disturbing factors and to develop an integrated forecasting approach.
Management summary
Problem introduction This research is performed at Unilever Netherlands in Rotterdam and is directed at the forecasting process for promotions. In the last two decades the promotional pressure has increased in the Fast Moving Consumer Goods market in which Unilever operates. This holds especially in the Netherlands, where the competition is fierce and multiple price wars have decreased the price level. Therefore, with a current promotional pressure of around 40%, Unilever has identified the forecasting process of these promotions as an area for development. An earlier internal project indicated that the forecast accuracy on promotion or range level is quite good; however, on product level the forecast accuracy drops dramatically. Since the Unilever plants have to produce product-specific items and the stock levels are product-specific, the goal is to increase the forecasting accuracy on SKU level.
Problem definition The main research question is: What are the causes for the low forecasting accuracy of the promotion forecasting process and how can the forecasting accuracy be improved?
A first analysis of the problem resulted in five problem areas. This research mainly focussed on the problem area Poor database usage: Within Unilever different data sources have to be consulted manually for each promotion. This is a time-consuming, user-unfriendly process, which does not encourage the usage of data and thus does not help the forecast accuracy. Furthermore, no model is provided to calculate the sales of a new promotion. Therefore, an employee has to search for and analyze all the information himself or herself.
The research covers four retailers in the Netherlands (Albert Heijn, C1000, Kruidvat and Plus) and 86 different products. The promotions of these products are analyzed for the period January 2009 up to March 2010. In addition, some practical requirements are defined to make a forecasting model work in practice: the forecasting model should be easy to use for Unilever employees, it should work with data which is available within the organization, and, to enhance its usability, it should forecast the consumer demand and use this as a basis to arrive at a retailer order forecast.
Research design The research design depicts which method should be used and which variables are included in the model. Multiple linear regression is chosen as the most suitable method for a forecasting model. In this method one dependent variable is predicted with multiple independent variables. As dependent variable the Lift Factor of a promotion is forecasted. This is the promotional sales divided by the weekly base line sales of a product. The independent variables are divided among the groups promotion, retailer and brand, as depicted in the figure below. The research will test the effect of the different independent variables on the dependent variable, will reduce the number of variables and correct for data availability.
[Figure: the independent variables, grouped into promotion variables, retailer variables and brand variables. Labels in the figure: Promotional sales; Display; Folder advertising; TV; Holiday; Length promo; Price decrease (absolute, percentual); Promo mechanism; # of products in promotions; Retailer; # of selling points; Repeat buyers; Promotion pressure; Lift factor former promotions; Market penetration; Preservability; Size of product; Susceptibility to stockpiling; Frequency of purchase; Product category; Weather (summer products, winter products).]
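The definition of the Lift Factor given above can be written down as a minimal sketch. The function name and the example numbers are hypothetical and only serve as illustration; the summary does not state here whether the promotional sales are taken per week or over the whole promotion, so the sketch simply divides the two quantities it is given.

```python
def lift_factor(promotional_sales: float, weekly_baseline_sales: float) -> float:
    """Lift Factor: promotional sales divided by the weekly base line sales."""
    if weekly_baseline_sales <= 0:
        # A lift factor is undefined without positive base line sales.
        raise ValueError("weekly base line sales must be positive")
    return promotional_sales / weekly_baseline_sales


# Hypothetical example: 4500 consumer units sold in promotion
# against a weekly base line of 900 consumer units.
example_lf = lift_factor(promotional_sales=4500.0, weekly_baseline_sales=900.0)
print(example_lf)
```

A Lift Factor above 1 means that the promotion sold more than a regular week of base line sales.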
Results The first step in developing a forecasting model for Unilever is to test the model performance of the full model, in which all variables in the figure above are included, on the consumer demand. The consumer demand is the actual number of products which are scanned at the registers of the retailer stores during a promotion. The effect size and direction of the independent variables are depicted in the table below, where two plus or minus signs indicate a strong effect of the variable on the promotional sales. The adjusted R square of the model is quite high with a value of 0.700. This indicates a good model fit, where 70% of the variance of the promotional demand is explained by the model. Furthermore, the model results are robust when the model is applied to promotions other than those with which it was calibrated.
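For readers less familiar with the measure, the sketch below shows the standard textbook relation between the R square and the adjusted R square, which penalizes the number of independent variables. The function name is hypothetical, and the sample size and R square in the example are assumptions chosen for illustration only; they are not taken from this research.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R square: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    degrees_of_freedom = n_obs - n_predictors - 1
    if degrees_of_freedom <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / degrees_of_freedom


# Hypothetical example: 500 promotions, 21 independent variables, R^2 = 0.712.
example_adj = adjusted_r_squared(r_squared=0.712, n_obs=500, n_predictors=21)
print(round(example_adj, 3))
```

The adjusted value is always at most the unadjusted R square, and the gap grows when many variables are fitted on few promotions.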
Besides the fact that the variables with a large effect size are more important to include in a forecasting model, the effect size of a variable can also be used to drive marketing decisions. The first marketing implication is that a display (second placement) of a promotion in a retailer store is far more important than folder advertisement and TV advertisement. Hence, when the marketing budget is allocated, investments in display should have priority over investments in folder advertisement, and both should have priority over investments in TV advertisement. The second implication is that the promotion mechanism where a consumer has to buy four or more products to get the promotional discount results in the highest promotional demand. Surprisingly, a Single Price Off (SPO), where a consumer only has to buy one product, leads to a higher promotional demand than a promotion where a consumer has to buy two or three products. A promotion where the consumer gets a free product or premiaat has the lowest promotional demand, although the success of such a promotion really depends on the type of free product or premiaat. The last important implication is that marketing can increase the promotional sales by making sure that the promotion is sold in all stores of a retailer. This variable is especially important if the product is not sold in (almost) all stores in base line sales. For these products there are a lot of extra promotional sales to gain. One way of boosting the number of stores is by advertising the promotion in the folder, since all stores are expected to have the folder promotions available. So when a product is not sold in all stores it is more interesting for Unilever to invest in folder advertisement.
Variable                            Effect size
Display                             ++
Folder                              +
TV_support                          n.e. / +
Holiday_products                    n.e.
Promo_length                        ++
Percentual_discount                 ++
SPO (a)                             -
Two_for (a)                         -
Three_for (a)                       -
Free_product (a)                    n.e.
Premiaat (a)                        n.e.
Number_of_products_in_promotion     -
C1000 (b)                           +
Plus (b)                            -
Kruidvat (b)                        --
log_growth_number_selling_points    ++
Percentage_repeat_buyers            n.e.
Promotion_pressure                  n.e.
ln_LF_former_promotions_EAN         ++
Market_penetration                  n.e.
Preservability                      +
log_size_of_product                 n.e.
Frequency_of_purchase               -
Personalcare (c)                    n.e.
Ice_and_beverages (c)               -
SCC_and_vitality_shots (c)          +
Savoury_and_dressings (c)           n.e.
winter_products_temp                n.e.
summer_products_temp                n.e.
n.e. = no effect on the promotional sales
(a) The baseline group for the different promotion mechanisms is the mechanism “Four_or_five_for”
(b) The baseline group for the different retailers is the retailer “Albert Heijn”
(c) The baseline group for the different product groups is the product group “Homecare”
To increase the usability of the model in practice, a model is constructed with the most important variables listed above. The model fit of this model with a limited number of variables is still surprisingly high and almost equal to the model fit of the full model. However, not all variables have data availability at Unilever, since Unilever as a manufacturer is dependent on a retailer for information on upcoming promotions. For two variables in the adapted model Unilever has no data availability. These are the percentage of shops with a second placement and the extra number of shops where the product is sold in promotion. To analyze the effect of this lack of data on the forecast accuracy, a new model without these variables is tested. The model fit decreases to an adjusted R square of around 0.500, indicating that the exclusion of the two variables substantially worsens the performance of the forecasting model. Thus it is important for Unilever to gain data availability on these variables.
Unilever not only wants to know how much a promotion sells on the shopping floor, but also how much of a product a retailer orders. Therefore, the model results for the consumer demand are adapted to retailer orders. The retailers included in the research order on average between 39% and 85% more than is sold during the promotion. The forecasts for the consumer demand are raised by this difference. The model performance decreases substantially because of the extra variance in the retailer orders. The adjusted R square for the Non Food data set decreases to 0.103, meaning that the predictive power of the model is almost absent. For the Food data set the adjusted R square is 0.392. So, the variability in the retailer orders is a lot higher for Non Food products than for Food products. Forecasting retailer orders for Non Food products seems to have little to no benefit, whereas forecasting retailer orders for Food products has more practical value.
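The adaptation from consumer demand to retailer orders described above amounts to raising the consumer demand forecast by an average surplus per retailer. The sketch below illustrates this under the assumption that the surplus is applied as a simple multiplicative factor; the function name and the forecast of 10,000 consumer units are hypothetical, and only the range of 39% to 85% comes from this research.

```python
def retailer_order_forecast(consumer_demand_forecast: float, order_surplus: float) -> float:
    """Raise a consumer demand forecast by the average share a retailer orders extra.

    order_surplus is a fraction, e.g. 0.39 when a retailer orders on average
    39% more than is sold during the promotion.
    """
    if order_surplus < 0:
        raise ValueError("order surplus is expected to be non-negative")
    return consumer_demand_forecast * (1.0 + order_surplus)


# Hypothetical consumer demand forecast of 10,000 consumer units,
# evaluated at both ends of the reported range.
lower_bound = retailer_order_forecast(10_000, 0.39)
upper_bound = retailer_order_forecast(10_000, 0.85)
print(round(lower_bound), round(upper_bound))
```

Because a single average factor cannot capture promotion-specific effects, the extra variance in the retailer orders remains unexplained, which is in line with the drop in model fit reported above.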
Implementation & conclusions The different model adaptations in this research show that if the right information is available Unilever is very well capable of accurately predicting the consumer demand. Unilever has an advantage over the retailer because of their larger data pool of promotions over all retailers which can be used to forecast upcoming promotions. Hence, with this skill Unilever is able to take the lead in establishing a collaboration with retailers and increasing the forecast accuracy.
However, two aspects decrease the forecast accuracy of a manufacturer. First, a manufacturer has less data availability than a retailer and thus important variables cannot be used to forecast the promotional demand. Second, forecasting retailer orders has turned out to be far more difficult than consumer demand, especially for Non Food products. The bullwhip effect leads to a substantial deviation between retailer orders and consumer demand. As a result, Unilever should first increase their data availability on promotions by closer collaboration with retailers and better database management. Thereafter, in order to be able to accurately forecast retailer orders, the disturbing factors behind the bullwhip effect should be analyzed. Close collaboration with a retailer is needed to successfully analyze these disturbances. When the disturbing factors are successfully analyzed, a promotion forecasting model which forecasts the consumer demand and corrects for the disturbing factors should be formulated and employed together with the retailer.
Concluding, close collaboration and information sharing are needed, where in the end Unilever and the retailer together use one forecasting approach. Concepts like Vendor Managed Inventory (VMI), Continuous Replenishment Program (CRP) and Collaborative Planning, Forecasting and Replenishment (CPFR) can be used to increase the collaboration between Unilever and a retailer, where VMI is the most basic concept and CPFR is the most advanced concept.
Preface
This master thesis is the result of the final part of my study Industrial Engineering and Management at Eindhoven University of Technology. The master thesis project was executed at Unilever Netherlands in Rotterdam from the beginning of 2010 up to the end of the summer.
When I started my master thesis I had just come back from an international semester in Hong Kong. Life over there had been eye-opening and really interesting, but also relaxing and a lot of fun in one way or another. Therefore, starting my master thesis in Rotterdam really pushed me back into normal hard-working life. I have to say that I still feel lucky that an opportunity for my master thesis presented itself at Unilever, since the working atmosphere at the Unilever headquarters in Rotterdam is really good. Luckily the master thesis did not feel like a burden at all, so I can look confidently into the future, where a real job is waiting for me.
I would like to grab the opportunity to express my gratitude towards a few people. First of all, I would like to thank Patrick van Balkom, my supervisor at Unilever. His guidance and comments provided very useful insights and shed light on my path the moments I needed it. I really enjoyed working with him.
Second, I would like to thank my first supervisor at the TU/e, Karel van Donselaar. His thorough knowledge on the subject led to some very good discussions. And without his efforts of finding an internship I would not have had the opportunity at Unilever. Third, I would like to thank my second supervisor at the TU/e, Jeroen Schepers. The feedback he gave on my work provided new insights and improved the quality of my work.
Lastly, I would like to thank my girlfriend for supporting me during the project.
Thijs van der Poel Rotterdam, August 2010
Index
Abstract
Management summary
Preface
Index

Part 1: Project definition
1 Introduction of research
  1.1 Structure of report
  1.2 Company description
  1.3 Problem introduction
  1.4 Overview literature
  1.5 Gaps in literature
2 Problem definition
  2.1 Problem formulation
  2.2 Problem decomposition
  2.3 Research questions
  2.4 Practical requirements research
3 Scope of research
  3.1 Region
  3.2 Retailers
  3.3 Time horizon
  3.4 Products (SKU’s)

Part 2: Research design
4 Method of research
5 Dependent and independent variables
  5.1 Dependent variable
    5.1.1 Lift factor as dependent variable
  5.2 Independent variables
  5.3 Transformations of (in)dependent variables
  5.4 Assign baseline dummy variables
6 Different data sets & hypotheses
  6.1 Sample size
  6.2 Data set split and reduction
  6.3 Hypotheses effect size variables and data sets
  6.4 Measurement indicators hypotheses

Part 3: Results full model
7 Regression analyses full model
  7.1 Overview most important dependent and independent variables
  7.2 Checking the assumptions underlying multiple linear regression
  7.3 Results full model
  7.4 Validation full model
8 Generalizability of model results
  8.1 Generalizability of sample size
  8.2 Comparison with other research in the field

Part 4: Model adaptation
9 Adaptations to increase the usability and check for data availability
  9.1 Adaptation 1: Increase the usability by reducing the number of variables
  9.2 Adaptation 2: Increase the usability by checking for data availability
  9.3 Comparison of the different adaptations with the full model
10 Model adaptation 3: From consumer demand to retailer orders
  10.1 Calculation retailer orders
  10.2 Model fit on retailer orders
  10.3 Difference between retailer orders and consumer demand

Part 5: Implementation and conclusions
11 Implementation
  11.1 Final model for implementation
    11.1.2 Results retailer orders (model adaption 1 as basis)
    11.1.3 Results retailer orders (model adaption 2 as basis)
    11.1.4 Conclusion results retailer orders based on model adaptation 1 & 2
    11.1.5 Actions needed to overcome current problems
  11.2 Implementation plan
12 Conclusions
  12.1 Ideal model
  12.2 Adaptations needed on ideal model
  12.3 Future steps to increase the forecast accuracy
  12.4 Contribution to literature

References
Appendices
Part 1: Project definition
1 Introduction of research
1.1 Structure of report
The report is structured in five parts. The parts are based on the regulative cycle of Van Strien (1979). In the first part the motivation for this research is discussed, resulting in the Needs of Unilever (Figure 1-1). This part discusses the exact problem Unilever is experiencing and the possibilities of dealing with the problem, which results in the starting point for the rest of the research. The second part translates the company needs into a research design. The research design depicts how the needs can be investigated and translated into methods to research the problem. The third part of this research discusses the model results. The model results improve the understanding of the problem. The results need to be adapted to be applicable in practice. This is done in the model adaptation in part four. The last part, the implementation & conclusions, discusses how this research can be implemented in order to fulfil the needs which are distinguished in part one. Throughout the different parts, the research contributes to the existing literature as described in paragraph 1.5. After reading the first part it should be clear what problem Unilever is encountering and what the scope of this research is.
Figure 1-1: The different project parts of the research (a cycle of Project definition (Needs), Research design (Methods), Model results, Model adaptation, and Implementation & conclusions)
1.2 Company description
Unilever is a global manufacturing company operating in the Fast Moving Consumer Goods (FMCG) industry. The company is specialized in Food, Home and Personal care products and employs around 163,000 people worldwide. The Unilever portfolio of 400 brands, of which Dove, Knorr, Lipton and Omo are some of the largest, is sold in over 100 countries. These brands contributed to a worldwide turnover of 39.8 billion euros in 2009 (www.unilever.com). Most of Unilever’s products are manufactured in its 264 self-owned plants. This research focuses on Unilever Benelux, of which the main office is located in Rotterdam. Moreover, the emphasis is placed on the Dutch market and thus on the Dutch part of the Unilever Benelux organization. This decision will be clarified in paragraph 3.1. In the Netherlands Unilever is split up into 5 product categories and 4 customer teams. The different product categories are:
• Home Care (HC)
• Personal Care (PC)
• Savoury & Dressings (S&D)
• Vitality shots & Spreads/Cooking Category (SCC)
• Icecream & Beverages (I&B)
The different customer teams are:
• Albert Heijn (including Etos)
• Bijeen (C1000 & Jumbo), Super de Boer and Makro
• Drugteam (different drugstores, e.g. Kruidvat, DA)
• Superunie (16 smaller retailers, e.g. Sligro, Plus)
Unilever is organized in a matrix organization around the above product categories and customer teams (see Figure 1-2). Alongside the customer teams, interdisciplinary Customer Development Teams meet once a month to discuss the more tactical issues. The Customer Development Teams are responsible for the planning horizon of 0-6 months. The product categories are overseen by Category Brand Teams, which have a longer planning horizon of 3-24 months. Hence, the customer teams have a more operational focus than the product category teams.
Concluding, in this paragraph the organization structure has been explained in a simplified way to make it easier to understand. The product categories and some of the named retailers will be used in the remainder of the research. Hence, the reader is able to position the research within the Unilever organization.
Figure 1-2: Organization matrix Unilever Netherlands. The customer teams (Albert Heijn; Bijeen, Super de Boer, Makro; Drugstores; Superunie) are covered by the Customer Development Teams (CDT: Sales, Customer Development, Customer Service; planning horizon of 0-6 months). The product categories (Home Care, Personal Care, Icecream & Beverages, SCC & Vitality, Savoury & Dressings) are covered by the Category Brand Teams (CBT: Marketing, Sales, Planning, Finance; planning horizon of 3-24 months).
1.3 Problem introduction
Within the Dutch FMCG (Fast Moving Consumer Goods) market, as well as in foreign markets, the promotional share of the total volume has increased in the last decades. In the Netherlands the promotion pressure has increased in the last couple of years due to multiple price wars, to a promotional volume of around 40% of the total volume. Because of that, Unilever noticed that its promotion forecasting process became more and more important over the years and needed improvement. Promotion forecasting has received increased attention at Unilever Benelux since mid-2008. The forecasting accuracy at that moment left room for improvement: the case fill, which is a service level measure, was at best 95% (the current case fill target is 98.5%). Besides the low case fill, Unilever overforecasted its promotions by 30% on average, which resulted in high numbers of obsoletes. Furthermore, the employees who produced the promotion forecast were not able to put a lot of time into a promotion forecast, although they stressed the importance of accurate forecasting in interviews at the time.
Hence, the objective formulated was to reduce the overforecast while increasing the case fill. A program within the company was directed at reporting and evaluation possibilities for promotion forecasting and at the training of the employees involved. The trainings further increased the awareness of the importance of an accurate promotion forecast and improved the creation process of a promotion forecast. Currently the forecasting accuracy is analyzed on SKU level (SKU stands for Stock Keeping Unit and is defined as a unique product), range level (i.e. different variants of one product together, for example different DOVE spray deodorants) and promotion level (all products within one promotion, for example all products in the DOVE line). Before 2008 the promotion accuracy was not yet analyzed on SKU level. The forecast accuracy on range and promotion level seemed to be quite good; however, on SKU level the forecast accuracy was considerably worse. This occurred because the forecast errors of the different SKUs levelled each other out. So when one SKU was overforecasted and another SKU in the promotion was underforecasted, the forecast errors of the two SKUs cancelled out against each other.
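The levelling out of SKU errors on promotion level can be illustrated with a small numerical sketch. The SKU names and volumes are hypothetical, and the absolute percentage error is used here only as an assumed accuracy measure; the measure Unilever actually applies is not specified in this paragraph.

```python
def absolute_percentage_error(forecast: float, actual: float) -> float:
    """Absolute forecast error as a fraction of the actual sales."""
    return abs(forecast - actual) / actual


# Hypothetical promotion with two SKUs: one overforecasted, one underforecasted.
forecast = {"SKU_A": 1300, "SKU_B": 700}
actual = {"SKU_A": 1000, "SKU_B": 1000}

# Error per SKU: both SKUs are 30% off.
sku_errors = {
    sku: absolute_percentage_error(forecast[sku], actual[sku]) for sku in forecast
}

# Error on promotion level: the totals are identical, so the error vanishes.
promotion_error = absolute_percentage_error(
    sum(forecast.values()), sum(actual.values())
)

print(sku_errors, promotion_error)
```

In this example the promotion as a whole appears perfectly forecasted, while each individual SKU deviates substantially, which is exactly the situation that stays hidden when accuracy is only measured on range or promotion level.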
Figure 1-3 depicts a simplified overview of the supply chain of the market in which Unilever operates. In this figure Unilever is the manufacturer, Albert Heijn is an example of a retailer and the shoppers in the retailer stores are the consumers. In this supply chain there are two different demand origins: the demand from the consumers at the retailers and the demand from the retailers at the manufacturer. Both demand origins can be forecasted with a model, and the remainder of this paragraph explains which of the two will be forecasted. The consumer demand is a more direct and accurate representation of the promotional sales, whereas the retailer orders are more indirect and contain more variation. This increased variation is caused mainly by the forward buying of a retailer, the available stock at a retailer before a promotion takes place, the inaccuracy in the promotional forecast of a retailer and the pipeline fill. These cause a variation which is difficult to explain, especially since the stock levels of a retailer are largely unknown at Unilever. Besides that, modelling the promotional sales requires modelling the promotion mechanism underlying the sales, and this mechanism takes effect on the shopping floor. Hence, a model based on consumer demand is able to capture the factors behind promotional sales accurately. Furthermore, the On Shelf Availability in the retailer shops is regarded as a more important measure than the case fill at the retailer, since the products are sold in the shops and not in the warehouse of a retailer. Lastly, when the expected level of promotional sales is discussed with the retailer, the consumer demand is the foundation for this discussion. Hence, it is preferred to base the forecast on the consumer demand. This forecast will be corrected for the disturbing factors between consumer demand and retailer orders. So, first a model will be developed which forecasts the consumer demand. The consumer demand forecast generated by the model has to be adapted to retailer orders afterwards.
Because consumer demand is used as basis, the research will measure the promotional demand in consumer units. This measure can be adapted to the more widely used case pack size measure within Unilever (a case pack contains a certain number of consumer units).
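The conversion from consumer units to case packs mentioned above is a simple division by the number of consumer units per case pack. The sketch below assumes that partial case packs are rounded up to whole case packs, which is an assumption made for illustration; the function name and the case pack size of 12 are hypothetical.

```python
def consumer_units_to_case_packs(consumer_units: int, units_per_case: int) -> int:
    """Number of whole case packs needed to cover a demand in consumer units."""
    if units_per_case <= 0:
        raise ValueError("a case pack must contain at least one consumer unit")
    if consumer_units < 0:
        raise ValueError("demand in consumer units cannot be negative")
    # Integer ceiling division: round up to the next whole case pack.
    return -(-consumer_units // units_per_case)


# Hypothetical example: a forecast of 1000 consumer units
# for a product packed with 12 consumer units per case.
example_cases = consumer_units_to_case_packs(1000, 12)
print(example_cases)
```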
Figure 1-3: Overview of the demand in the FMCG supply chain (goods flow from manufacturer to retailer to consumer; demand flows back as consumer demand and as retailer orders, with an adaptation step between the two)
This paragraph aimed to give an introduction to the promotion forecasting problem within Unilever. It depicted the background of the problem and specified the different demand origins in the supply chain in which Unilever operates. In the next chapter the problem definition will be discussed in more detail. First an overview of the available relevant literature about promotion forecasting is given.
1.4 Overview literature
In this chapter the relevant literature for the research field of this master thesis will be summarized and linked with the situation of the company. An extensive literature study about promotion forecasting can be found in Van der Poel (2010a) on which this summary is based.
The importance of promotions within the FMCG sector has grown substantially over the last 20 years (Blattberg, 1995). The market share of promotions has increased likewise in the Dutch FMCG market. With the increase of promotion pressure, the instability of the demand has increased simultaneously. Promotions are responsible for large volumes, typically between 4 and 8 times the base line sales (Buckers, 2010, Van den Heuvel, 2009). Hence, the importance of accurate promotion forecasting has increased as well. On the one hand, a low case fill, because of underforecasting, results in Out Of Stocks in retailer stores and is harmful for the sales and the retailer relationship. On the other hand, overforecasting results in extra stock (costs) and potential obsoletes.
There are different types of promotions. At the moment, almost all promotions in the Dutch FMCG sector are price promotions, where the consumer gets a reduced price in one form or another. Likewise, the price promotion is the largest single category in the marketing budget of American FMCG companies (Silva Riso, 1999). Besides price promotions, occasionally a coupon promotion or a promotion with a premiaat or free product (e.g. a gadget, a discount on a theme park ticket or a free (new) product) is offered. The success of the different types of promotions is influenced by numerous variables. In Van der Poel (2010a) 53 variables with a possible influence are listed. Which variables are perceived as important will be discussed in paragraph 5.2. These variables have to be fitted in a model. The most widely used method found in literature is a multiple linear regression analysis (Van Loo, 2006, Van den Heuvel, 2009, De Schrijver, 2009, Cooper et al, 1999, Wittink et al, 1988). In such an analysis multiple independent variables predict one dependent variable. Interaction effects between independent variables can be incorporated when the interacting variables are continuous. Furthermore, the (in)dependent variables can be included in their linear and logarithmic form as long as they are metric.
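How linear terms, logarithmic terms and an interaction effect enter a multiple linear regression can be sketched as follows. This is only an illustration under stated assumptions: the three regressors, the choice to model the natural logarithm of the lift factor, and all function names are hypothetical and do not represent the model specification developed later in this research.

```python
import math
from typing import List, Sequence


def build_feature_row(discount_pct: float, promo_length_weeks: float,
                      display_share: float) -> List[float]:
    """Regressors for one promotion: intercept, linear, logarithmic and interaction terms."""
    return [
        1.0,                            # intercept
        discount_pct,                   # metric variable in linear form
        math.log(promo_length_weeks),   # metric variable in logarithmic form (must be > 0)
        display_share,                  # metric variable in linear form
        discount_pct * display_share,   # interaction between two continuous variables
    ]


def predict_lift_factor(coefficients: Sequence[float], features: Sequence[float]) -> float:
    """Linear prediction of ln(LF), transformed back to the lift factor with exp."""
    if len(coefficients) != len(features):
        raise ValueError("one coefficient is needed per regressor")
    ln_lf = sum(b * x for b, x in zip(coefficients, features))
    return math.exp(ln_lf)


# Hypothetical promotion: 25% discount, 1 week long, display in half of the stores.
row = build_feature_row(discount_pct=25.0, promo_length_weeks=1.0, display_share=0.5)

# With only an intercept of ln(2), the model predicts a lift factor of 2 for any promotion.
intercept_only = [math.log(2.0), 0.0, 0.0, 0.0, 0.0]
print(predict_lift_factor(intercept_only, row))
```

In practice the coefficients would be estimated from historical promotions with ordinary least squares rather than set by hand as in this sketch.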
The literature described will be useful in the development of a promotion forecasting model on manufacturer level. The same variables have an impact on the promotional volume for retailers and manufacturers. However, the data availability will differ and manufacturers are dependent on retailers for the data of an upcoming promotion. No research is available on the effect of unknown stock levels. As mentioned this is only important if a manufacturer wants to predict retailer orders.
1.5 Gaps in literature
In the literature study of Van der Poel (2010a) multiple gaps in the literature on promotion forecasting were discussed. This paragraph indicates the gaps this research will address:
1. The dependent variable. This can be the lift factor (hereafter abbreviated as LF) over the base line sales or the absolute promotional sales. Furthermore, the LF as dependent variable can be transformed in multiple ways. There is no conclusive research on the performance of the different forms of the dependent variable.
2. Manufacturer based model. All relevant promotion forecasting models found in literature are retailer based. It is unclear on what aspects a retailer model and a manufacturer model differ and how this could have an impact on the performance. This gap also relates to the following gap, which discusses whether it is an advantage or disadvantage to be a retailer.
3. Advantage or disadvantage of being a retailer. It is interesting to investigate whether a manufacturer based model has a different performance than a retailer based model. A factor which might cause this difference is the dependency on the retailer for information. A second factor is that the products of a manufacturer might be more homogeneous than the products of a retailer. Moreover, a manufacturer has more promotions of the same product than a retailer, because the product is sold at multiple retailers. Literature on promotion forecasting does not specify whether there is a difference in performance between a manufacturer and a retailer based model and which factors cause this difference.
2 Problem definition
This chapter specifies the aim of the research. First, the overall problem formulation is depicted. Second, an initial analysis of the problem context is depicted. Third, the corresponding research questions are formulated. Lastly, the practical requirements of the research for Unilever are stated.
2.1 Problem formulation
Although some steps have been made in the last two years (as mentioned in paragraph 1.3), the promotion forecasting process still leaves considerable room for improvement. In this paragraph the problems related to promotion forecasting are used to arrive at the problem formulation and the accompanying research question with sub questions. The purpose of this research is to analyze the inaccuracy of the promotion forecasts. Hence, the following problem formulation is stated.
Problem formulation: The forecasting accuracy of the current promotion forecasting process is too low
2.2 Problem decomposition
A first analysis of the overall problem context is shown in the Fishbone diagram in Figure 2-1. The goal of this analysis is to investigate which general problem areas have an impact on the forecast accuracy and to choose the scope this research will focus on. The forecast inaccuracy of a promotion forecast is regarded as the main problem. A high forecast inaccuracy results in more obsoletes, higher stock costs and a lower case fill. The causes of forecast inaccuracy can be divided into five general problem areas:
Phasing of promotions: For the measurement of the forecast accuracy it is important that the deliveries of promotional volumes are planned in the correct weeks. This is mainly a measurement issue, but it can cause problems if volumes have to be delivered earlier than planned. Promotions are typically delivered one or two weeks before the promotion takes place.
Sales oriented organization: Unilever is a sales oriented organization where logistic issues typically have less priority. The sales department wants to be able to deliver the products to the retailer at all costs, i.e. it pays less attention to logistic costs and operations. This mentality can result in a tendency to overforecast.
Customer team deviation: There are four customer teams which work quite independently from each other. Information sharing and learning from other customer teams is not common practice. Furthermore, ways of working differ substantially between the customers within one customer team.
Retailer dependency: Retailers have the power to change promotions and can decide not to share all relevant information with Unilever. It is common practice that retailers are not willing to share information, mainly because of data sensitivity. Furthermore, similar promotions at other retailers can result in last minute changes when the discount of a promotion at another retailer is higher.
Poor database usage: The different data sources have to be consulted manually for each promotion. This is a time consuming, user unfriendly process, which does not encourage the usage of data and thus does not improve the forecast accuracy. Furthermore, no model is provided to calculate the sales of a new promotion. Therefore, an employee has to search for and analyze all the information himself or herself.
[Figure 2-1: Fishbone diagram with Forecast inaccuracy as the main problem and obsoletes, a low service level (case fill) and stock costs as its effects. The five branches are Phasing of promotions, Sales oriented organization, Customer team deviation, Retailer dependency and Poor database usage. Legible sub-causes include forward buying, last minute changes, no stock data, limited information sharing, lack of a tool, the number of databases to extract data from and time consuming operations.]
Figure 2-1: Problem areas and scope of the research
The blue zone indicates the main scope of the research, which is to enhance the database usage within Unilever. The grey zones are areas which will benefit from this research as well, because of more standardization and more clarity on the information needed from a retailer. Regarding the main scope, employees currently need to search the different databases themselves in order to create an accurate forecast. Furthermore, no tool is provided to calculate a promotion forecast. Hence, an employee has to make his own assumptions and calculations with the limited information available. In conclusion, this process is complex, time consuming and not standardized; a forecasting tool which is used throughout the Unilever organization will improve these aspects.
2.3 Research questions
Figure 2-1 shows that a low forecasting accuracy results in more obsoletes, a lower service level and higher stock costs. The forecasting accuracy should be increased to reduce these effects. Consequently, the main research question is: What are the causes of the low forecasting accuracy of the promotion forecasting process and how can the forecasting accuracy be improved?
Most of the issues can be improved with a suitable demand forecasting model for promotions. Such a forecasting model can reduce the poor database usage, standardize the processes among different customer teams, serve as an argument towards retailers to legitimize the request for data and serve as a tool to strengthen the logistic voice in the sales oriented organization of Unilever. Only the phasing of the orders of a promotion is not expected to improve directly. Hence, the research will be focussed on the development and implementation of a promotion forecasting model. The following sub questions can be used to answer the main research question:
1. What are the functional requirements to make a forecasting model work within Unilever?
2. Which products, retailers, time horizon and region should be included in the analysis?
3. Which prediction method is most suitable for promotion forecasting?
4. Which independent and dependent variables should be included in a forecasting model?
5. How should a model generate a forecast for retailer orders?
6. What is the impact of being a manufacturer on the performance of the forecasting model?
2.4 Practical requirements research
Besides the scientific nature of this research, the practical goals should be defined as well. The overall goal is to improve the promotion forecast accuracy of Unilever. A couple of sub goals should be formulated to reach this overall goal. The sub goals formulated are practical requirements that a potential solution should meet in order to work in practice. These are:
1. Ease of use
2. Data availability
3. Consumer demand as basis

(1) Ease of use: A forecasting model should be simple to use. Hence, it should not cost a Unilever employee too much effort to use the model, the interface should be very simple and the output should be understandable. Regarding the number of variables in the model, interviews within Unilever indicated that a practically useful model should contain at most 10 variables and preferably fewer. Furthermore, the result the model generates should be understandable and the model itself should not be seen as a black box, since a black box would decrease the acceptance of the forecast of the model.
(2) Data availability: The model should work with data which is readily available to the Unilever employees who have to work with the model.
(3) Consumer demand as basis: As reasoned in paragraph 1.3, the consumer demand will be the starting point for developing a forecasting model. Later on this forecast will be adapted to retailer orders.
Summarizing, to meet the practical requirements the model needs to be understandable, have a high usability, work with available data and focus on consumer demand. These requirements will be taken into account in the model building process.
3 Scope of research
In this chapter the scope of the research will be determined. The region, retailers, time horizon and products which together form the research sample are discussed successively.
3.1 Region
Unilever Benelux consists of The Netherlands and Belgium. An analysis has been done to judge the comparability of the Dutch and Belgian markets. If the markets and promotion processes had been comparable, the research would have focussed on both markets. However, the Belgian market differs too much from the Dutch market on a couple of aspects. A vast part of the promotions on the Belgian market are coupon promotions, while these promotions are rare in the Dutch market. Moreover, most promotions are promoted on special cardboard displays and multiple items of a SKU are bundled together in a repack. Often different SKU’s are even bundled together in one repack. Lastly, the retailer Colruyt in Belgium has a lowest price guarantee for all its products. Therefore, it matches promotions of all other retailers on the Belgian market on products it offers in store. As a result, other retailers try to come up with promotions that Colruyt does not need to match, which brings a high variety of promotions to the Belgian market.
In conclusion, the above factors are very likely to cause a different promotion mechanism on the Belgian and Dutch markets. Because the markets are very different, the model will not be able to benefit from the larger pool of data. Therefore, one of the two markets needs to be chosen for the research. Since the research is performed from the office in Rotterdam, data collection will be easier for the Dutch market; therefore, the Dutch market is chosen as the scope for this research.
3.2 Retailers
Next, the research needs to be focussed on certain retailers in the market, since inclusion of all retailers would lead to extensive data gathering and would decrease the quality of the analysis. The following criteria are used to select four retailers in the market.
Size of the retailer: How large is the retailer compared to other retailers. This indicates the importance of a retailer. A large retailer is more important than a small retailer, because promotions of a large retailer have a higher impact on the safety stock and lead to out of stocks more quickly. Therefore, large retailers are preferred.
Promotion pressure: How much of the total volume of a retailer originates from promotional volume.
Possibility of collaboration with a retailer: Whether the retailer is likely to cooperate, or already cooperates, with Unilever to enhance the accuracy of the promotion forecasting process.
Data availability: In paragraph 5.2 the variables that will be included in the forecasting model are discussed. However, to include a variable in the model, data is needed. The data availability differs for each retailer.
Duration of the promotion: The duration of a promotion in the retail sector is normally one week; however, some retailers have a different promotion period (e.g. Kruidvat, Jumbo, Makro, Sligro). It is preferred to perform the analysis on retailers with a promotion period of 1 week, to enhance the comparability between promotions.
Based on these criteria the retailers AH, C1000, Kruidvat and Plus are included in the research. AH and C1000 are the largest retailers and Kruidvat is the largest drugstore in the Netherlands. Therefore, including them is a logical choice, although Kruidvat has promotions with a duration of 1 and 2 weeks; hence, the duration has to be included as an independent variable in the research. Plus is a smaller retailer; however, it is included because it is open for collaboration and the data availability on promotions at Plus is good.
3.3 Time horizon
A time horizon of at least 1 year is desirable, to overcome potential seasonal effects. Therefore, a time horizon from the beginning of 2009 up to week 13 of 2010 is chosen for this research. To be able to cross validate the model the sample is split. The promotions in 2009 are used to calibrate the model and the promotions in the first quarter of 2010 are used to validate the model results (see Figure 3-1). With this approach the results of the model can be tested on their robustness (Miles et al., 2001).
Figure 3-1: Time horizon and data split research
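The split described above can be sketched as follows. This is a minimal illustration only: the `Promotion` record, its field names and the example promotions are assumptions made for the sketch and do not come from the Unilever data.

```python
from dataclasses import dataclass


@dataclass
class Promotion:
    # Hypothetical record layout, chosen for illustration only.
    sku: str
    retailer: str
    year: int
    week: int  # week number in which the promotion takes place


def split_sample(promotions):
    """Return (calibration, validation): 2009 versus weeks 1-13 of 2010."""
    calibration = [p for p in promotions if p.year == 2009]
    validation = [p for p in promotions if p.year == 2010 and p.week <= 13]
    return calibration, validation


# Made-up promotions to show the behaviour of the split.
promos = [
    Promotion("sku_a", "AH", 2009, 5),
    Promotion("sku_a", "Plus", 2009, 40),
    Promotion("sku_b", "C1000", 2010, 13),
    Promotion("sku_b", "Kruidvat", 2010, 14),  # outside the time horizon
]
calibration, validation = split_sample(promos)
```

Promotions after week 13 of 2010 fall outside the time horizon and end up in neither set.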
3.4 Products (SKU’s)
Unilever Netherlands has around 2500 SKU’s which are sold at any given point in time. Quite a few of these products are offered only once and/or have a very low volume. The SKU’s are divided among five product categories and these five product categories are divided into subcategories. The 5 product categories are named in chapter 1; the subcategories are depicted in the bottom layer of Figure 3-2.
[Figure 3-2: Tree diagram. Top layer: All products, split into Food and Non-Food. Middle layer: the five product categories Ice & Beverage; Vitality shots, Spreads & Cooking; Savoury & Dressings; Home care; Personal care. Bottom layer: the 13 subcategories Vitality shots, Spreads & cooking, Household care, Laundry, Ice cream, Tea & fruit beverages, Savoury, Dressings, Other foods, Other bakery, Skin, Hair care, Deo & grooming.]
Figure 3-2: The product categories of Unilever
Since data gathering is a time consuming process, a sample of the total population will be taken. The sample needs to be representative for the whole set of products. Hence, the selection of SKU’s for the research sample is based on the following criteria:
1. At least 10% and at most 90% of the total volume of a SKU originates from promotional volume. Otherwise, a product has almost no promotions or almost no base line sales.
2. The focus will be on the more important high volume SKU’s (A and B SKU’s), although some low volume SKU’s will be included in the sample as well (C SKU’s).
3. The SKU is sold in promotion at at least 2 of the 4 retailers included in the analysis.
4. The total number of promotions of a SKU at the 4 retailers in the analysis during the time horizon of the research should be at least 4.
5. Each SKU is part of a broader range of Unilever products (e.g. the product “Kip Siam wereldgerecht” is part of the “Knorr Wereldgerechten” range). At most three variants of a range will be included, to assure the diversity of the sample.
6. The product should have sales from January 2009 up to March 2010, since this is the time horizon for the research.
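The hard criteria above (1, 3, 4, 5 and 6) could be applied as a simple filter, sketched below. The `Sku` record, its field names and the example values are hypothetical. Criterion 2 (the focus on A and B SKU’s) is a preference rather than a cut-off and is not modelled here, and taking the first three eligible variants of a range in input order is an assumption of the sketch.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sku:
    # Hypothetical fields, named for this sketch only.
    name: str
    product_range: str
    promo_share: float    # fraction of total volume sold in promotion
    promo_retailers: int  # how many of the 4 retailers sell it in promotion
    promo_count: int      # promotions at the 4 retailers in the time horizon
    full_history: bool    # sales from January 2009 up to March 2010


def eligible(sku):
    """Criteria 1, 3, 4 and 6."""
    return (0.10 <= sku.promo_share <= 0.90
            and sku.promo_retailers >= 2
            and sku.promo_count >= 4
            and sku.full_history)


def select_skus(skus, max_per_range=3):
    """Keep eligible SKU's, at most `max_per_range` variants per range (criterion 5)."""
    chosen, per_range = [], defaultdict(int)
    for sku in skus:
        if eligible(sku) and per_range[sku.product_range] < max_per_range:
            chosen.append(sku)
            per_range[sku.product_range] += 1
    return chosen


# Made-up candidates.
candidates = [
    Sku("v1", "range_a", 0.40, 3, 6, True),
    Sku("v2", "range_a", 0.50, 2, 4, True),
    Sku("v3", "range_a", 0.30, 4, 8, True),
    Sku("v4", "range_a", 0.60, 2, 5, True),   # fourth variant of the range
    Sku("x1", "range_b", 0.05, 3, 6, True),   # too little promotional volume
    Sku("x2", "range_b", 0.50, 1, 6, True),   # promoted at one retailer only
    Sku("x3", "range_b", 0.50, 2, 4, False),  # incomplete sales history
]
selected = [s.name for s in select_skus(candidates)]
```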
Because there are 13 categories in total and the total sample size should be around 100 products, the aim is to select between 5 and 15 SKU’s for each subcategory, depending on the category size. The resulting sample of 86 products is depicted in Appendix 1. For the category Savoury 19 SKU’s are selected because of the large number of SKU’s in this category (Table 3-1). For the categories Dressings, Other Foods and Tea, Soy & Fruit Beverages respectively 3, 4 and 3 products are selected. This is less than the goal of 5, because not enough products fulfilled the criteria. In the Dressings category and in the Tea, Soy & Fruit Beverages category a lot of products have been innovated or relaunched in the time horizon of the research. In the Other Foods category, the number of products which are sold in promotion by at least two retailers is very limited. The category Vitality shots hardly has any SKU’s left with substantial sales. The sample accounts for 1238 promotions in the time horizon of the research. The total number of promotions in this time horizon is 15283, meaning that the sample contains 8.1% of the total promotions. This percentage of the promotions combined with the selection criteria should provide a representative sample. This will be checked in chapter 8.
Category | Number of SKU’s
Deo & Grooming | 9
Dressings | 3
Hair care | 8
Household care | 6
Ice cream | 9
Laundry | 6
Other foods | 4
Other bakery | 0
Savoury | 19
Skin | 10
Spreads and cooking products | 8
Tea and soy & fruit beverages | 3
Vitality shots | 0
Total | 86
Table 3-1: Number of SKU's per category
Conclusion Part 1: This part resulted in a clear problem formulation, the scope of the research and the requirements which the research should fulfil in practice (sub research question 1). The research will be focused on improving the forecast accuracy of Unilever by taking the consumer demand as a starting point and later adjusting this consumer demand to retailer orders. Chapter 3 described the retailers, SKU’s, time horizon and region which form the sample of the research (sub research question 2).
Part 2: Research design
The first part defined the needs within the Unilever organization, stated the problem formulation and limited the scope of the research. In order to be able to produce an accurate forecasting model, this part will discuss which methods are most suitable, which variables should be included and what effects are expected in the results of the model (hypotheses). Basically, this part forms the framework for the research.
4 Method of research
In this chapter the method to be used for the research will be discussed. According to Makridakis et al (1988), three families of forecasting models can be distinguished, namely judgmental methods, time series analysis and explanatory methods. Judgmental forecasting is currently used within Unilever. The aim is to arrive at a more sophisticated, quantitative model. Time series analysis requires lengthy time series for the prediction of the upcoming period(s). However, promotions are events which occur on an infrequent basis. Therefore, time series cannot be used to analyze the promotional volume. Lastly, explanatory methods aim to forecast the promotional volume as a dependent variable by means of independent variable(s). Each independent variable needs to have an explanatory relationship with the dependent variable. In conclusion, of the two quantitative approaches, only explanatory models are suitable for forecasting promotional volumes. Judgmental analysis will always co-exist, since the forecast a model provides needs to be verified with common sense.
Next, the different forecasting methods within the explanatory family will be analyzed and a choice is made for one method. Van Loo (2006) analyzed the four most important forecasting techniques on criteria applicable to the situation at Unilever (see Table 4-1), where a 4 indicates that the method performs best relative to the other methods and a 1 indicates that a method performs worst relative to the other methods. Van Loo concluded that it is not even certain that simpler models are outperformed by more complex models. Therefore, the scoring on the accuracy criterion is questionable. Still, single equation models are perceived as most suitable, especially since Unilever demands a model which is relatively flexible and easy to use and interpret. The single equation models can be further divided into simple and multiple linear regression models. Since simple linear regression models can only include one independent variable and the promotional volume is dependent on more than one independent variable, multiple regression is chosen as the most appropriate method. This is consistent with the analysis of Van der Poel (2010b), which concludes that multiple linear regression is the most widely used method in literature.
Criteria | Single equation (single and multiple linear regression) | Multiple equation | Econometric models | Artificial Neural Networks (ANN)
Accuracy | 1 | 2 | 3 | 4
Costs | 4 | 3 | 2 | 1
Complexity | 4 | 3 | 2 | 1
Data need | 4 | 3 | 2 | 1
Ease of interpretation | 4 | 3 | 2 | 1
Ease of Use | 3 | 2 | 1 | 4
Total | 20 | 16 | 12 | 12
Table 4-1: Performance forecasting techniques (Van Loo, 2006)
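As a check on the table, the totals can be recomputed from the individual scores. The sketch below uses the scores exactly as listed in Table 4-1; the shortened method names are labels chosen for the sketch. It also recomputes the totals without the accuracy row, since the text calls that scoring questionable.

```python
# Scores from Table 4-1, one list entry per method, in this column order.
methods = ["Single equation", "Multiple equation", "Econometric models", "ANN"]
scores = {
    "Accuracy":               [1, 2, 3, 4],
    "Costs":                  [4, 3, 2, 1],
    "Complexity":             [4, 3, 2, 1],
    "Data need":              [4, 3, 2, 1],
    "Ease of interpretation": [4, 3, 2, 1],
    "Ease of use":            [3, 2, 1, 4],
}


def totals(score_table):
    """Sum the scores per method over all criteria in the table."""
    return {m: sum(row[i] for row in score_table.values())
            for i, m in enumerate(methods)}


all_criteria = totals(scores)
without_accuracy = totals({c: r for c, r in scores.items() if c != "Accuracy"})
best = max(all_criteria, key=all_criteria.get)
```

Under these scores the single equation models rank first with or without the accuracy criterion, which is in line with the choice made in the text.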
This chapter concluded that multiple linear regression is the most suitable method for a promotion forecasting model. Consequently, this research will make use of multiple linear regression to analyze the promotions of Unilever.
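For reference, the general textbook form of a multiple linear regression model is given below. The symbols are chosen here for illustration and are not taken from the thesis: $y_i$ is the dependent variable for promotion $i$, $x_{ik}$ is the value of independent variable $k$, $\beta_k$ are the coefficients to be estimated and $\varepsilon_i$ is the error term.

```latex
y_i = \beta_0 + \sum_{k=1}^{K} \beta_k \, x_{ik} + \varepsilon_i
```

The specific dependent variable and the set of independent variables are chosen in chapter 5.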
5 Dependent and independent variables
In this chapter the dependent and independent variables that will be included in the model are discussed. The choice of the variables and the form of the variables have an important effect on the model results later on. A poor choice of variables results in a low model fit and thus in an inaccurate forecasting model for Unilever. Therefore, an adequate analysis will be made in this chapter to select the variables.
5.1 Dependent variable
The dependent variable of the model is the sales volume of a promotion. As concluded in paragraph 2.4, the consumer demand will be forecasted as dependent variable. However, this variable can be predicted in numerous forms. Below, the LF as a form of the promotional consumer demand is discussed.
5.1.1 Lift factor as dependent variable
In paragraph 2.4 consumer demand was chosen as the dependent variable. The promotional consumer demand as dependent variable can still be forecasted in multiple ways. A distinction is made between the absolute sales of a promotion and the LF of a promotion (Cooper et al, 1999, Wittink et al, 1988). The LF is defined as follows:
$$\text{Lift Factor} = \frac{\text{Promotional sales}}{\text{Base line sales}} \qquad \text{(Formula 5-1)}$$
The advantage of working with an LF as dependent variable is that the promotional sales volume is standardized against the base volume. As a result, the influence of the absolute sales level of a promotion is removed from the model equation. The promotional sales is a given fact, but the way the base line sales is calculated is less straightforward. In this research the base line sales is calculated by averaging the base line sales of the 5 weeks before a promotion (consistent with Van den Heuvel, 2009). A time period of 5 weeks has been chosen to reduce the effect of irregularities in the base line sales. Furthermore, when a promotion occurs in these 5 weeks, which happens only occasionally, these promotional sales are not included in the base line sales; instead, a substitute base line is calculated for the week in which that promotion takes place. No other corrections are made to the base line sales. Because the above approach works with a base line, the seasonality and trend effects are included, since the base line sales is already subject to these effects. Therefore, seasonality and trend effects will not have to be included as independent variables in the model.¹ Summarizing, the LF of a promotion is preferred over the absolute sales of a promotion, since the absolute promotional sales does not provide a forecasting model with a clear reference (i.e. the absolute sales constantly differs because different products and retailers have different sales levels).
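The calculation of the LF with a 5 week base line can be sketched as follows. The function names and the example sales figures are invented for illustration. The text does not specify how the substitute base line for a promotional week inside the window is computed, so the sketch simply averages the remaining non-promotional weeks, which is an assumption.

```python
def baseline_sales(weekly_sales, promo_weeks, promo_week, window=5):
    """Average sales over the `window` weeks before `promo_week`.

    Weeks that contain a promotion are skipped (assumed substitute rule).
    `weekly_sales` is a list indexed by week, `promo_weeks` a set of indices.
    """
    if promo_week < window:
        raise ValueError("not enough history before the promotion")
    prior = [weekly_sales[w]
             for w in range(promo_week - window, promo_week)
             if w not in promo_weeks]
    if not prior:
        raise ValueError("no non-promotional weeks in the base line window")
    return sum(prior) / len(prior)


def lift_factor(weekly_sales, promo_weeks, promo_week, window=5):
    """Formula 5-1: promotional sales divided by base line sales."""
    base = baseline_sales(weekly_sales, promo_weeks, promo_week, window)
    return weekly_sales[promo_week] / base


# Made-up weekly sales; weeks 3 and 6 are promotion weeks.
sales = [100, 110, 90, 300, 100, 100, 400]
lf = lift_factor(sales, promo_weeks={3, 6}, promo_week=6)
```

In this example the base line for week 6 is the average of weeks 1, 2, 4 and 5, because week 3 is itself a promotion week.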
5.2 Independent variables
Van der Poel (2010a) published a list of 53 variables with a possible influence on promotional demand. Of these 53 variables a selection of 21 variables is made which are taken into account in this research. The other variables are excluded because of a lack of data, complexity issues, irrelevance given the supply chain perspective taken, or a limited expected influence. The most important variables omitted are (1) the percentage of products which is on promotion within the category, (2) the percentage of products which was on promotion within the category last week, (3) promotions of competitors and (4) the price discount of the last promotion. The first three variables are excluded because of a lack of data and the last variable is excluded because of complexity issues. Figure 5-1 categorizes the 21 variables taken into account among the clusters Promotion, Retailer and Brand. The Promotion cluster is perceived as the most important, followed by the Brand cluster and the Retailer cluster. Figure 5-1 forms the backbone of this research. The split between the clusters is made to create a better overview and to gain insight into the effect sizes of promotion related variables, retailer related variables and brand related variables.
¹ The model will use the last 5 weeks before a promotion to calculate the average base line sales. In practice a promotion has to be forecasted between 13 and 4 weeks in advance, when the base line sales is not yet available for the 5 weeks before a promotion. Then the base line sales will have to be calculated for the upcoming weeks with a simple trend and seasonal model.
[Figure 5-1: Diagram of the 21 variables, grouped into the clusters Promotion variables, Retailer variables and Brand variables, all pointing to Promotional sales. Each variable is labelled with its data source: n = data available in Nielsen, p = data available in promoplanner, m = data available at marketing, s = SAP data, g = generally available data. The individual variables are described in Table 5-1.]
Figure 5-1: Variables with a likely influence on the promotional sales
The variables in Figure 5-1 are discussed in more detail in Table 5-1. Under promo mechanism, the individual promotion mechanisms are described. The minimum and maximum measurement values are shown in the third column and the scale of a variable is given in the fourth column of the table. The variables as described in the table will be tested on their effect on promotional sales. The effect expected for each variable is discussed in paragraph 6.3.
Variable | Description | Measurement | Scale

Promotion variables
Display | The percentage of the selling stores in which the promotion is placed on a display (kopstelling). The variable is not available for Kruidvat and will be replaced with the average over the other observations (Cooper et al, 2003). | (0, 100) % | Scale
Folder | Depicts if the promotion is shown in the folder of the retailer. | (0, 1) | Nominal
TV support | This variable states if the promotion is shown on television. Because of low data availability this variable is only available for the retailer Albert Heijn. | (0, 1) | Nominal
Holiday products | The interaction effect between the holiday weeks (New year, Easter, Whitsunday, Christmas) and products which have higher sales during holiday weeks (luxury ice cream). | (0, 1) | Nominal
Promo length | Length of the promotion in weeks (1 or 2 weeks). | (1, 2) | Scale
Absolute discount | The absolute price decrease of a promotion measured per product. | (0, 3.53) € | Scale
Percentual discount | The percentual price decrease of a promotion. | (0, 100) % | Scale
Promo mechanism: the different promo mechanisms will be coded with dummy variables:
SPO | Single price off, the consumer receives discount when he buys at least one promotion product. | (0, 1) | Nominal
two for X | The consumer receives discount when he buys at least two promotion products. | (0, 1) | Nominal
three for X | The consumer receives discount when he buys at least three promotion products. | (0, 1) | Nominal
four or five for X | The consumer receives discount when he buys at least four or five promotion products. | (0, 1) | Nominal
Premiaat | The consumer receives a free non-Unilever item with the promotion product(s). | (0, 1) | Nominal
Free product | The consumer receives a free Unilever product with the promotion. The free product is mostly a new Unilever product and cannot be compared with, for example, a 2+1 promotion, since the consumer is not able to choose the product he gets for free. | (0, 1) | Nominal
Number of products in promotion | The number of SKU’s which are sold in the same promotion. | (1, 366) | Scale

Retailer variables
Retailer | The retailer (Albert Heijn, Kruidvat, C1000, Plus) where the promotion is sold. | (0, 1) | Nominal
Growth # of selling points | The number of selling points where the promotion is sold divided by the average number of selling points in the 5 weeks before the promotion period. | (-47, 162) % | Scale

Brand variables
Repeat buyers | The percentage of repeat buyers of the product in a quarter of a year. | (0, 100) % | Scale
Promo pressure | The percentage of the total sales of the product which is sold in promotions. | (0, 100) % | Scale
LF former promotions SKU | The natural logarithm of the average LF of historical promotions of the product. | (0.32, 2.97) | Scale
Market penetration | The percentage of consumers who buy the product. | (0, 100) % | Scale
Preservability | The preservability of a product in days with a maximum of 730 days. | (84, 730) | Scale
Size of product | The size of a product in cubic centimetres. | (194, 3444) | Scale
Frequency of purchase | The number of times a product is bought by consumers on average in a quarter of a year. | (1.5, 7.6) | Scale
Product category | The category to which the product belongs (Ice & beverages, Savoury & dressings, Spreads & Cooking, Home Care, Personal Care). | (0, 1) | Nominal
Winter products temperature | The interaction effect between the average weekly temperature and products which report higher sales during cold weather. | (-2.91, 18.33) | Scale
Summer products temperature | The interaction effect between the average weekly temperature and products which report higher sales during warm weather. | (0, 20.80) | Scale

Table 5-1: Overview of the independent variables with a likely influence on promotional sales
5.3 Transformations of (in)dependent variables
In this paragraph the variables which should be considered for transformation are discussed. Variables included in a linear regression analysis should meet the assumptions of parametric data (Field, 2005). For the dependent variable it is most important that these assumptions are met. The assumptions for parametric data are:
1. Normally distributed data
2. Interval data
3. Independent of other variables in or outside the model
4. Homogeneity of variance
In order to judge normality in large samples, with more than 200 cases, one should look at the histogram of a variable and the values of the skewness and kurtosis instead of calculating their significance (Field, 2005). In appendix 2 the histogram of the variable “Lift Factor” shows obvious signs of non-normality. Therefore, one of the transformations advised by Field (2005) is applied to this variable. The natural logarithm of the LF seems to meet the normality requirements. Furthermore, the variable fulfils the interval data assumption. The assumptions of homogeneity of variance and independence will be tested for the regression model as a whole in paragraph 7.2. The research of Van Loo (2006) uses a different transformation of the LF, which will be discussed in paragraph 8.2.
Next, the independent variables will be tested on the above assumptions. Most of the independent variables are coded with dummy variables and thus do not qualify for transformation. To the variable LF former promotions SKU the same transformation is applied as above (see appendix 2), with similar results. Of the other independent variables which have interval data characteristics, “growth number of shops”, “absolute discount” and “size of product” are positively skewed and “percentage of repeat buyers” is negatively skewed. Therefore, a log(10) transformation is applied to these variables (Field, 2005).² According to the statistics in appendix 2, the normality of three variables (growth number of shops, absolute discount and size of product) has improved. Therefore, these variables are included in the analysis in their logarithmic form.
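The effect of a logarithmic transformation on skewness can be illustrated with a small sketch. The data values below are made up and are not taken from appendix 2. The skewness measure used is the plain moment-based one, which may differ slightly from the statistic reported by a statistics package.

```python
import math


def skewness(values):
    """Moment-based skewness: third central moment over cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up, positively skewed lift factors.
lift_factors = [1.2, 1.5, 1.8, 2.0, 2.5, 3.0, 4.0, 6.0, 12.0]

ln_lift = [math.log(lf) for lf in lift_factors]        # natural logarithm
log10_lift = [math.log10(lf) for lf in lift_factors]   # log(10) variant

raw_skew = skewness(lift_factors)
ln_skew = skewness(ln_lift)
log10_skew = skewness(log10_lift)
```

For this made-up sample the skewness of the transformed values is clearly closer to zero than that of the raw values. The natural and base 10 logarithms differ only by a constant factor, so they give the same skewness.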
5.4 Assign baseline dummy variables
According to Field (2005), a baseline group should be chosen when a characteristic is coded with dummy variables. The effect of the other dummy groups is measured against the baseline group. Three characteristics are coded with dummy variables and need a baseline group: Retailer, Product category and Promo mechanism. Albert Heijn is chosen as the baseline for Retailer, Homecare as the baseline for Product category and “4 or 5 for X” as the baseline for Promo mechanism. Regarding the variable promo mechanism, 4.6% of the promotions have a double promo mechanism (e.g. three for 5 euro plus a free gadget). Because this percentage is fairly small, the impact of double promo mechanisms on the choice of a baseline will be absent or minor.
² For the dependent variable the natural logarithm performed slightly better than the log10 values. Therefore, the ln values are used. For the independent variables the log10 transformation is used.
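The baseline coding can be sketched in Python as follows. This is a minimal illustration rather than the coding procedure actually used in this research: the helper name dummy_code is hypothetical, the level names are taken from the promo mechanisms listed in this thesis, and promotions with a double promo mechanism are not handled.

```python
def dummy_code(value, levels, baseline):
    """Return one 0/1 indicator per non-baseline level.

    The baseline level gets no indicator of its own, so a case in the
    baseline group is represented by zeros on all indicators.
    """
    if baseline not in levels:
        raise ValueError(f"unknown baseline: {baseline!r}")
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return {level: int(value == level) for level in levels if level != baseline}


PROMO_MECHANISMS = [
    "4 or 5 for X",
    "3 for X",
    "2 for X",
    "SPO",
    "free product",
    "premiaat",
]
BASELINE = "4 or 5 for X"

two_for_x = dummy_code("2 for X", PROMO_MECHANISMS, BASELINE)
baseline_case = dummy_code(BASELINE, PROMO_MECHANISMS, BASELINE)

print(two_for_x)
print(baseline_case)
```

With six levels this yields five dummy variables, and the regression coefficient of each dummy is then read as the difference relative to the baseline group.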
In conclusion, the dependent and independent variables have been chosen and the form in which they should be included in the model has been discussed. Chapter 4 already identified multiple linear regression as the most suitable method. Together, these chapters provide the fundamentals for the construction of a forecasting model. The next chapter will discuss the hypotheses and the different data sets that will be tested with the model.
6 Different data sets & hypotheses
In this chapter the data sets that will be tested with the research model are discussed, and hypotheses are formulated for the direction of the variables and the performance of the data sets. The purpose of the chapter is to construct theoretical expectations (hypotheses) which will be tested in the model results. The data sets differ from each other in the product categories that they include. The data set of promotions cannot be broken down into subsets indefinitely, because a minimum sample size is required. Therefore, the minimum sample size is discussed first, the different data sets second, the hypotheses third and the measurement indicators for the model performance fourth.
6.1 Sample size
According to Green (1991), the minimum acceptable sample size for testing the overall fit of a model can be determined with the formula 50 + 8k, where k is the number of predictors. Table 5-1 discussed 33³ predictor variables which will be included in the model. This results in a minimum sample size of 314 cases. This rule of thumb is very useful but oversimplifies the issue. As a final check, the difference between the R² and the adjusted R² should be analyzed. When the difference between those two measures is minor, the variance explained by the regression model is more likely to generalize to other data sets. The R² and adjusted R² are compared in the model results.
³ This is more than the 21 variables named earlier in this research, because some of those 21 variables need to be coded with multiple dummy variables (e.g. Retailer).
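Written out, the rule of thumb of Green (1991) for the 33 predictors gives the following minimum sample size. The symbol N_min is introduced here for illustration only and is not used elsewhere in this research.

```latex
N_{\min} = 50 + 8k = 50 + 8 \cdot 33 = 314
```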
6.2 Data set split and reduction
Hence, the data set should not be split in such a way that the minimum acceptable sample size is violated. Besides the data set split, this paragraph also discusses an analysis of the outliers. The most obvious split in the data set is between Food and Non-food (Homecare and Personal care products, hereafter named HPC) SKU’s. Food SKU’s follow a different sales pattern than HPC SKU’s: HPC SKU’s are slower-moving products and thus have much lower sales. Other potential data set splits are on retailer, product category and promo mechanism level. However, these splits result in sample sizes that are too small and/or in subsets whose demand patterns do not clearly differ, whereas Food and HPC products clearly differ in demand level and pattern. Hence the sample is split on this level, which results in three models: All categories, Food categories and HPC categories.
Next, when exploring the outliers of a regression in which all cases are included, it appears that most of the outliers originate from the Magnum products (see Appendix 3). More specifically, 15 out of the 24 outliers originate from the Magnum products. On closer inspection, the Magnum products have very high LF’s. The first two bars in Figure 6-1 are the LF’s of two out of the three Magnum products in the sample and are considerably higher than the LF’s of the other products in the sample. Since the Magnum SKU’s are responsible for over half of the outliers and have a very high LF, the different data sets will be tested with and without the three Magnum SKU’s. The high Lift Factor of the Magnum products is probably the result of a combination of facts: Magnum is a very strong brand, Magnum is an expensive brand with high absolute discounts in promotion, and Magnum is not often in promotion. However, the high average LF of Magnum is still remarkable compared with the average LF of the other Unilever brands. Besides the Magnum products, three other promotions have been deleted because their standardized residual exceeded 3.5 (Appendix 3).
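The outlier screening on standardized residuals can be sketched in Python as follows. This is an illustration on hypothetical residuals, not the analysis of Appendix 3. It assumes that a standardized residual is the raw residual divided by the sample standard deviation of all residuals; statistical packages may scale residuals slightly differently. The function name flag_outliers is hypothetical.

```python
import math


def flag_outliers(residuals, threshold=3.5):
    """Return the indices of cases whose standardized residual exceeds
    the threshold in absolute value."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Sample standard deviation of the residuals (n - 1 in the denominator).
    sd = math.sqrt(sum((r - mean) ** 2 for r in residuals) / (n - 1))
    return [i for i, r in enumerate(residuals) if abs(r / sd) > threshold]


# Hypothetical residuals (observed minus predicted value) for 30 promotions:
# 29 small residuals and one promotion that is forecast far too low.
residuals = [0.1 if i % 2 == 0 else -0.1 for i in range(29)] + [2.0]

outliers = flag_outliers(residuals)
print(outliers)  # [29]
```

Cases flagged in this way would then be inspected individually before deciding whether to exclude them, as was done for the Magnum products.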
[Figure: bar chart "Average Lift factor sample SKU's"; vertical axis: average Lift Factor (0 to 60); horizontal axis: SKU (1 to 85)]
Figure 6-1: Lift Factors of the SKU’s in the sample
As a result, the five data sets in Table 6-1 will be tested, where a first split is made between Food and HPC categories and a second split is made by including or excluding the Magnum products.
Categories        Data set number   Number of total cases   Cases calibration period   Cases validation period
All               Data set 1        1235                    989                        246
Food              Data set 2        482                     388                        94
HPC               Data set 3        753                     601                        152
All w/o Magnum    Data set 4        1211                    968                        243
Food w/o Magnum   Data set 5        458                     367                        91

Table 6-1: Data sets to be tested to determine the best performing data set
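As a cross-check, the sizes in Table 6-1 can be compared with the minimum sample size of paragraph 6.1. The Python sketch below assumes that the same 33 predictors apply to every data set, which need not hold once categories are excluded, and it checks the total and the calibration counts separately because the text does not state which of the two should meet the minimum. The names used are illustrative.

```python
def green_minimum_sample_size(num_predictors):
    """Rule of thumb of Green (1991) for testing overall model fit: 50 + 8k."""
    return 50 + 8 * num_predictors


NUM_PREDICTORS = 33  # assumption: identical for all data sets

# Data set number -> (categories, total cases, calibration cases, validation cases),
# copied from Table 6-1.
DATA_SETS = {
    1: ("All", 1235, 989, 246),
    2: ("Food", 482, 388, 94),
    3: ("HPC", 753, 601, 152),
    4: ("All w/o Magnum", 1211, 968, 243),
    5: ("Food w/o Magnum", 458, 367, 91),
}

minimum = green_minimum_sample_size(NUM_PREDICTORS)

total_ok = {nr: row[1] >= minimum for nr, row in DATA_SETS.items()}
calibration_ok = {nr: row[2] >= minimum for nr, row in DATA_SETS.items()}

for nr, (categories, total, calibration, validation) in DATA_SETS.items():
    print(f"Data set {nr} ({categories}): total={total}, calibration={calibration}, "
          f"minimum={minimum}, calibration large enough={calibration_ok[nr]}")
```

Under these assumptions all five data sets exceed the minimum, also when only the calibration period is counted.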
6.3 Hypotheses effect size variables and data sets
In Table 6-2 below, the hypotheses are formulated for the different independent variables and for the performance of the data sets relative to each other. These hypotheses will be checked using the results of the model on the different data sets. The number of plus or minus signs indicates the expected size of the effect on the promotional sales. All variables originate from literature; the source of each variable is given in Van der Poel (2010a).
Hyp.  Variable                          Effect  Explanation

• Hypotheses Promotion variables
H1    Display                           ++      A promotion placed on a display will have higher promotional sales.
H2    Folder                            ++      A promotion depicted in the folder will have higher promotional sales.
H3    TV support                        ++      A promotion shown on TV will have higher promotional sales.
H4    Holiday products                  +       Products which are expected to sell better in holiday weeks are expected to have higher promotional sales in a holiday week.
H5    Promo length                      ++      The longer a promotion, the higher the promotional sales.
H6    Absolute discount                 ++      A higher absolute discount will result in higher promotional sales.
H7    Percentual discount               ++      A higher percentual discount will result in higher promotional sales.
H8    Promo mechanism                           Ranked from the expected most positive to the most negative effect on the promotional sales: four or five for X, three for X, two for X, SPO, free product, premiaat.
H9    Number of products in promotion   -       A promotion with more products in the same promotion will result in lower promotional sales per SKU.

• Hypotheses Retailer variables
H10   Retailer                                  Unknown which retailer will have a positive or negative effect.
H11   Growth # of selling points        ++      More extra selling points will result in higher promotional sales.

• Hypotheses Brand variables
H12   Repeat buyers                     -       A higher percentage of repeat buyers indicates a larger group of loyal consumers and likely a lower LF.
H13   Promo pressure                    +       A high promo pressure means relatively low base sales. Hence, promotional pressure increases the LF, since this measure is dependent on the base.
H14   LF former promotions SKU          ++      Higher historical LF’s of a SKU indicate higher promotional sales.
H15   Market penetration                -       When a higher number of consumers already buys the product, there will be fewer consumers who switch to this product in promotion.
H16   Preservability                    +       Products with a longer preservability will have higher promotional sales.
H17   Size of product                   -       The more space a product consumes, the lower the susceptibility to stockpiling is, which is likely to result in a lower LF.
H18   Frequency of purchase             -       The higher the frequency of purchase, the lower the susceptibility to stockpiling is, which is likely to result in a lower LF.
H19   Product category                          Unknown which product category will have a positive or negative effect.
H20   Winter products temp.             -       Promotions in weeks with a low temperature will have a higher LF for “winter” products.
H21   Summer products temp.             +       Promotions in weeks with a high temperature will have a higher LF for “summer” products.

• Hypotheses different datasets
H22   Dataset 1 & 2 vs. dataset 4 & 5           The exclusion of the Magnum products will increase the model fit for dataset 4 & 5.
H23   Dataset 3 & 5 vs. dataset 4               Breaking down the data set in Food and HPC categories makes the data sets more specific and will result in a higher model fit for dataset 3 & 5.

Table 6-2: Hypotheses on the effect sizes of the variables and the performance of the different data sets
6.4 Measurement indicators hypotheses
Before the results are discussed in the next chapter, this paragraph specifies generally accepted measurement indicators to test the hypotheses. The measurement indicators can be divided into indicators of model performance and indicators of variable performance. The model performance is tested with two measurement indicators, the (adjusted) R square and the MAPE. The R square and adjusted R square are calculated according to Formulas 6-1 and 6-2. The (adjusted) R square is a widely used measure of the goodness of fit of a linear regression model and is used in other research on promotion forecasting as well (Van Loo (2006), Van den Heuvel (2009), Wittink et al. (1988)). For the validation period of the models in this research only the R square is reported, because the adjusted R square has no meaning when a model from a calibration period is fitted on a validation period.

$$R^2 = \frac{SS_{model}}{SS_{total}}$$   Formula 6-1

$$\text{Adjusted } R^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}$$   Formula 6-2
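The two goodness-of-fit measures can be sketched in Python as follows. The sketch takes R square as the share of the total sum of squares that is explained by the model, which for a least-squares fit with an intercept equals one minus the residual share. It reads n as the number of cases and p as the number of predictors; this reading, the function names and the example values (including the R square of 0.80) are assumptions for illustration and not results of this research.

```python
def r_squared(observed, predicted):
    """Share of the total sum of squares explained by the model.

    Assumes a least-squares model with an intercept, so that
    SS_model = SS_total - SS_residual.
    """
    mean = sum(observed) / len(observed)
    ss_total = sum((y - mean) ** 2 for y in observed)
    ss_residual = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    ss_model = ss_total - ss_residual
    return ss_model / ss_total


def adjusted_r_squared(r2, n, p):
    """Adjust R square for the number of predictors p, given n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Small hypothetical example of observed and fitted values.
r2_example = r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])

# Hypothetical R square of 0.80 with the calibration size of data set 1
# (989 cases) and 33 predictors.
adj_example = adjusted_r_squared(0.80, n=989, p=33)

print(f"R square: {r2_example:.3f}")
print(f"Adjusted R square: {adj_example:.3f}")
```

With many cases relative to the number of predictors the two measures lie close together, which is the situation paragraph 6.1 describes as favourable for generalization to other data sets.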