Eindhoven University of Technology

MASTER

Improving the promotion forecasting accuracy at Unilever Netherlands

van der Poel, M.J.

Award date: 2010



Eindhoven, August 2010

Improving the promotion forecasting accuracy at Unilever Netherlands

by M.J. (Thijs) van der Poel

BSc Industrial Engineering and Management Science Student identity number 0550934

in partial fulfilment of the requirements for the degree of

Master of Science in Operations Management and Logistics

Supervisors TU/e: dr. K.H. van Donselaar dr. J.J.L. Schepers

Supervisor Unilever: dr. P.D.J. van Balkom

Improving the promotion forecasting accuracy at Unilever Netherlands

TUE. School of Industrial Engineering. Series Master Theses Operations Management and Logistics

Subject headings: sales forecasting, promotions, retail trade, consumer goods

Page II Improving the promotion forecasting accuracy at Unilever Netherlands

Abstract

This master thesis describes how the forecasting accuracy of promotions can be improved at Unilever Netherlands. Currently, employees within the organization forecast promotions in a highly judgemental way. This research develops the forecasting process by introducing a more mathematical forecasting model. Consumer demand and retailer orders are forecasted with multiple linear regression, and the difference between forecasting consumer demand and forecasting retailer orders is analyzed. The effect sizes of the 21 independent variables on the promotional demand are discussed, and the most important variables are used to formulate a reduced model. It is concluded that consumer demand can be forecasted quite accurately; however, the forecasting accuracy drops substantially for retailer orders. Multiple disturbing factors between consumer demand and retailer orders apparently increase the variability of the retailer orders. Therefore, this research advises Unilever to cooperate more extensively with its retailers to investigate the disturbing factors and develop an integrated forecasting approach.


Management summary

Problem introduction

This research is performed at Unilever Netherlands in Rotterdam and is directed at the forecasting process for promotions. In the last two decades the promotional pressure has increased in the Fast Moving Consumer Goods market in which Unilever operates. This holds especially in the Netherlands, where the competition is fierce and multiple price wars have decreased the price level. Therefore, with a current promotional pressure of around 40%, Unilever has identified the forecasting process of these promotions as an area for improvement. An earlier internal project indicated that the forecast accuracy on promotion or range level is quite good; however, on product level the forecast accuracy drops dramatically. Since the Unilever plants have to produce product specific items and the stock levels are product specific, the goal is to increase the forecasting accuracy on SKU level.

Problem definition

The main research question is: What are the causes for the low forecasting accuracy of the promotion forecasting process and how can the forecasting accuracy be improved?

A first analysis of the problem resulted in five problem areas. This research mainly focussed on the problem area Poor database usage: within Unilever, different data sources have to be consulted manually for each promotion. This is a time-consuming, user-unfriendly process, which does not encourage the use of data and thus does not help the forecast accuracy. Furthermore, no model is provided to calculate the sales of a new promotion. Therefore, an employee has to search for and analyze all the information himself or herself.

The research covers four retailers in the Netherlands (Albert Heijn, C1000, Kruidvat and Plus) and 86 different products. The promotions of these products are analyzed for the period January 2009 up to March 2010. In addition, some practical requirements are defined to make a forecasting model work in practice: the forecasting model should be easy to use for Unilever employees, it should work with data which is available within the organization, and, to enhance the usability of the model, it should forecast the consumer demand and use this as a basis to arrive at a retailer order forecast.

Research design

The research design specifies which method is used and which variables are included in the model. Multiple linear regression is chosen as the most suitable method for a forecasting model. In this method one dependent variable is predicted with multiple independent variables. The dependent variable to be forecasted is the Lift Factor of a promotion: the promotional sales divided by the weekly base line sales of a product. The independent variables are divided among the groups promotion, retailer and brand, as depicted in the figure below. The research tests the effect of the different independent variables on the dependent variable, reduces the number of variables and corrects for data availability.

[Figure: overview of the independent variables that explain the promotional sales, grouped into promotion variables, retailer variables and brand variables. Variables shown: display, folder and TV advertising, holiday, promotion length, retailer, absolute number of selling points, percentual price decrease, promotion mechanism, number of products in promotion, repeat buyers, promotion pressure, lift factor of former promotions, market penetration, preservability, size of product, susceptibility to stockpiling, frequency of purchase, product category, summer products, winter products, weather.]
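The lift factor and the regression set-up described in the research design can be illustrated with a minimal sketch. The promotion data, the two regressors (display and discount) and the helper names `lift_factor`, `solve` and `fit_ols` are assumptions made for illustration only; the sketch fits an untransformed lift factor by ordinary least squares and does not reproduce the model estimated in this thesis.

```python
def lift_factor(promo_sales, baseline_weekly_sales):
    """Promotional sales divided by the weekly base line sales."""
    if baseline_weekly_sales <= 0:
        raise ValueError("base line sales must be positive")
    return promo_sales / baseline_weekly_sales


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(xi[p] * xi[q] for xi in x) for q in range(k)] for p in range(k)]
    xty = [sum(xi[p] * yi for xi, yi in zip(x, y)) for p in range(k)]
    return solve(xtx, xty)


# Hypothetical past promotions: (display 0/1, discount fraction), promotional
# sales in consumer units, and a weekly base line of 100 units for every product.
features = [(0, 0.10), (1, 0.10), (0, 0.25), (1, 0.25), (1, 0.40)]
promo_sales = [300, 600, 450, 750, 900]
lift_factors = [lift_factor(s, 100) for s in promo_sales]

intercept, b_display, b_discount = fit_ols(features, lift_factors)

# Forecast for a new promotion: display, 30% discount, base line of 200 units.
forecast_lf = intercept + b_display * 1 + b_discount * 0.30
forecast_units = forecast_lf * 200
```

Because the forecast is expressed as a lift factor, it can be applied to products with very different base line sales by multiplying with the base line of the product concerned.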

Results

The first step in developing a forecasting model for Unilever is to test the performance of the full model, in which all variables in the figure above are included, on the consumer demand. The consumer demand is the actual number of products scanned at the registers of the retailer stores during a promotion. The effect size and direction of the independent variables are shown in the table below, where two plus or minus signs indicate a strong effect of the variable on the promotional sales. The adjusted R-square of the model is quite high with a value of 0.700. This indicates a good model fit, where 70% of the variance of the promotional demand is explained by the model. Furthermore, the model results are robust when used for promotions other than those with which the model was calibrated.
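As a reminder of how the reported fit measure is computed, the sketch below uses the standard definitions of R-square and adjusted R-square. The numbers in the example are invented for illustration and are not taken from the thesis data.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_observations, n_predictors):
    """Penalise R-square for the number of independent variables in the model."""
    return 1.0 - (1.0 - r2) * (n_observations - 1) / (n_observations - n_predictors - 1)


# Hypothetical example: 101 promotions, 20 independent variables, R-square 0.75.
example = adjusted_r_squared(0.75, 101, 20)
```

The adjustment matters when models with different numbers of variables are compared, as happens later when the full model is reduced.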

Besides the fact that the variables with a large effect size are more important to include in a forecasting model, the effect size of a variable can also be used to drive marketing decisions. The first marketing implication is that a display (second placement) of a promotion in a retailer store is far more important than folder advertisement and TV advertisement. Hence, when the marketing budget is allocated, investments in display should have priority over investments in folder advertisement, and both should have priority over investments in TV advertisement. The second implication is that the promotion mechanism where a consumer has to buy four or more products to get the promotional discount results in the highest promotional demand. Surprisingly, a Single Price Off (SPO), where a consumer only has to buy one product, leads to a higher promotional demand than a promotion where a consumer has to buy two or three products. A promotion where the consumer gets a free product or premiaat has the lowest promotional demand, although the success of such a promotion really depends on the type of free product or premiaat. The last important implication is that marketing can increase the promotional sales by making sure that the promotion is sold in all stores of a retailer. This variable is especially important if the product is not sold in (almost) all stores in base line sales. For these products there is a lot of extra promotional sales to gain. One way of boosting the number of stores is by advertising the promotion in the folder, since all stores are expected to have the folder promotions available. So when a product is not sold in all stores, it is more interesting for Unilever to invest in folder advertisement.

Variable                            Effect size   Variable                            Effect size
Display                             ++            log_growth_number_selling_points    ++
Folder                              +             Percentage_repeat_buyers            n.e.
TV_support                          n.e. / +      Promotion_pressure                  n.e.
Holiday_products                    n.e.          ln_LF_former_promotions_EAN         ++
Promo_length                        ++            Market_penetration                  n.e.
Percentual_discount                 ++            Preservability                      +
SPO (a)                             -             log_size_of_product                 n.e.
Two_for (a)                         -             Frequency_of_purchase               -
Three_for (a)                       -             Personalcare (c)                    n.e.
Free_product (a)                    n.e.          Ice_and_beverages (c)               -
Premiaat (a)                        n.e.          SCC_and_vitality_shots (c)          +
Number_of_products_in_promotion     -             Savoury_and_dressings (c)           n.e.
C1000 (b)                           +             winter_products_temp                n.e.
Plus (b)                            -             summer_products_temp                n.e.
Kruidvat (b)                        - -

n.e. = no effect on the promotional sales
(a) The baseline group for the different promotion mechanisms is the mechanism “Four_or_five_for”
(b) The baseline group for the different retailers is the retailer “Albert Heijn”
(c) The baseline group for the different product groups is the product group “Homecare”
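The baseline groups in the table notes follow the usual dummy coding for categorical variables: every category except the baseline gets its own 0/1 variable, and the effect of each dummy is read relative to the baseline. A minimal sketch, in which the helper name `dummy_code` is hypothetical and only the retailer names and baseline come from the table:

```python
def dummy_code(value, levels, baseline):
    """Return 0/1 dummies for every level except the baseline level."""
    if value not in levels:
        raise ValueError(f"unknown level: {value}")
    return {level: int(value == level) for level in levels if level != baseline}


RETAILERS = ["Albert Heijn", "C1000", "Plus", "Kruidvat"]

plus_row = dummy_code("Plus", RETAILERS, baseline="Albert Heijn")
baseline_row = dummy_code("Albert Heijn", RETAILERS, baseline="Albert Heijn")
```

A promotion at the baseline retailer has all dummies equal to zero, so a negative sign for Kruidvat in the table means a lower promotional demand than a comparable promotion at Albert Heijn.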

To increase the usability of the model in practice, a model is constructed with only the most important variables from the table above. The model fit of this model with a limited number of variables is still surprisingly high and almost equal to the model fit of the full model. However, not all variables have data availability at Unilever, since Unilever as a manufacturer depends on a retailer for information about upcoming promotions. For two variables in the adapted model Unilever has no data availability. These are the percentage of shops with a second placement and the extra number of shops where the product is sold in promotion. To analyze the effect of this lack of data on the forecast accuracy of the model, a new model without these variables is tested. The model fit decreases to an adjusted R-square of around 0.500, indicating that the exclusion of the two variables substantially worsens the performance of the forecasting model. Thus it is important for Unilever to gain data availability on these variables.

Unilever not only wants to know how much a promotion sells on the shopping floor, but also how much of a product a retailer orders. Therefore, the model results for the consumer demand are adapted to retailer orders. The retailers included in the research order on average between 39% and 85% more than is sold during the promotion. The forecasts for the consumer demand are raised by this difference. The model performance decreases substantially because of the extra variance in the retailer orders. The adjusted R-square for the NonFood data set decreases to 0.103, meaning that the predictive power of the model is almost absent. For the Food data set the R-square is 0.392. So, the variability in the retailer orders is a lot higher for NonFood products than for Food products. Forecasting retailer orders for NonFood products seems to have little to no benefit, whereas forecasting retailer orders for Food products has more practical value.
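The step from consumer demand to retailer orders can be sketched as a simple uplift on the consumer demand forecast. The way the average difference is estimated here (the mean of the order-to-sales ratios of past promotions), the function names and the history figures are assumptions for illustration; only the 39% to 85% range comes from the research.

```python
def mean_order_uplift(past_orders, past_sales):
    """Average fraction by which retailer orders exceeded consumer sales."""
    ratios = [orders / sales for orders, sales in zip(past_orders, past_sales)]
    return sum(ratios) / len(ratios) - 1.0


def retailer_order_forecast(consumer_forecast, order_uplift):
    """Raise the consumer demand forecast by the retailer's average uplift."""
    return consumer_forecast * (1.0 + order_uplift)


# Hypothetical history for one retailer: orders of 1500 and 2600 units
# against consumer sales of 1000 and 2000 units.
uplift = mean_order_uplift([1500, 2600], [1000, 2000])
orders = retailer_order_forecast(1000, uplift)
```

A constant uplift only shifts the level of the forecast. It cannot explain the promotion-to-promotion variation in retailer orders, which is consistent with the drop in model fit reported above.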

Implementation & conclusions The different model adaptations in this research show that if the right information is available Unilever is very well capable of accurately predicting the consumer demand. Unilever has an advantage over the retailer because of their larger data pool of promotions over all retailers which can be used to forecast upcoming promotions. Hence, with this skill Unilever is able to take the lead in establishing a collaboration with retailers and increasing the forecast accuracy.

However, two aspects decrease the forecast accuracy of a manufacturer. First, a manufacturer has less data availability than a retailer and thus important variables cannot be used to forecast the promotional demand. Second, forecasting retailer orders has turned out to be far more difficult than consumer demand, especially for NonFood products. The bullwhip effect leads to a substantial deviation between retailer orders and consumer demand. As a result, Unilever should first increase their data availability on promotions by closer collaboration with retailers and better database management. Thereafter, in order to be able to accurately forecast retailer orders, the disturbing factors behind the bullwhip effect should be analyzed. Close collaboration with a retailer is needed to successfully analyze these disturbances. When the disturbing factors are successfully analyzed, a promotion forecasting model which forecasts the consumer demand and corrects for the disturbing factors should be formulated and employed together with the retailer.

In conclusion, close collaboration and information sharing are needed, so that in the end Unilever and the retailer together use one forecasting approach. Concepts like Vendor Managed Inventory (VMI), Continuous Replenishment Program (CRP) and Collaborative Planning, Forecasting and Replenishment (CPFR) can be used to increase the collaboration between Unilever and a retailer, where VMI is the most basic concept and CPFR is the most advanced concept.


Preface

This master thesis is the result of the final part of my study Industrial Engineering and Management at Eindhoven University of Technology. The master thesis project was executed at Unilever Netherlands in Rotterdam from the beginning of 2010 up to the end of the summer.

When I started my master thesis I had just come back from an international semester in Hong Kong. Life over there had been eye-opening and really interesting, but also relaxing and a lot of fun. Therefore, starting my master thesis in Rotterdam really pushed me back into normal hard-working life. I have to say that I still feel lucky that an opportunity for my master thesis presented itself at Unilever, since the working atmosphere at the Unilever headquarters in Rotterdam is really good. Luckily the master thesis did not feel like a burden at all, so I can look confidently into the future, where a real job is waiting for me.

I would like to grab the opportunity to express my gratitude towards a few people. First of all, I would like to thank Patrick van Balkom, my supervisor at Unilever. His guidance and comments provided very useful insights and shed light on my path the moments I needed it. I really enjoyed working with him.

Second, I would like to thank my first supervisor at the TU/e, Karel van Donselaar. His thorough knowledge on the subject led to some very good discussions. And without his efforts of finding an internship I would not have had the opportunity at Unilever. Third, I would like to thank my second supervisor at the TU/e, Jeroen Schepers. The feedback he gave on my work provided new insights and improved the quality of my work.

Lastly, I would like to thank my girlfriend for supporting me during the project.

Thijs van der Poel Rotterdam, August 2010


Index

Abstract
Management summary
Preface
Index

Part 1: Project definition
1 Introduction of research
1.1 Structure of report
1.2 Company description
1.3 Problem introduction
1.4 Overview literature
1.5 Gaps in literature
2 Problem definition
2.1 Problem formulation
2.2 Problem decomposition
2.3 Research questions
2.4 Practical requirements research
3 Scope of research
3.1 Region
3.2 Retailers
3.3 Time horizon
3.4 Products (SKU’s)

Part 2: Research design
4 Method of research
5 Dependent and independent variables
5.1 Dependent variable
5.1.1 Lift factor as dependent variable
5.2 Independent variables
5.3 Transformations of (in)dependent variables
5.4 Assign baseline dummy variables
6 Different data sets & hypotheses
6.1 Sample size
6.2 Data set split and reduction
6.3 Hypotheses effect size variables and data sets
6.4 Measurement indicators hypotheses

Part 3: Results full model
7 Regression analyses full model
7.1 Overview most important dependent and independent variables
7.2 Checking the assumptions underlying multiple linear regression
7.3 Results full model
7.4 Validation full model
8 Generalizability of model results
8.1 Generalizability of sample size
8.2 Comparison with other research in the field

Part 4: Model adaptation
9 Adaptations to increase the usability and check for data availability
9.1 Adaptation 1: Increase the usability by reducing the number of variables
9.2 Adaptation 2: Increase the usability by checking for data availability
9.3 Comparison of the different adaptations with the full model
10 Model adaptation 3: From consumer demand to retailer orders
10.1 Calculation retailer orders
10.2 Model fit on retailer orders
10.3 Difference between retailer orders and consumer demand

Part 5: Implementation and conclusions
11 Implementation
11.1 Final model for implementation
11.1.2 Results retailer orders (model adaption 1 as basis)
11.1.3 Results retailer orders (model adaption 2 as basis)
11.1.4 Conclusion results retailer orders based on model adaptation 1 & 2
11.1.5 Actions needed to overcome current problems
11.2 Implementation plan
12 Conclusions
12.1 Ideal model
12.2 Adaptations needed on ideal model
12.3 Future steps to increase the forecast accuracy
12.4 Contribution to literature

References
Appendices


Part 1: Project definition

1 Introduction of research

1.1 Structure of report

The report is structured in five parts. The parts are based on the regulative cycle of Van Strien (1979). In the first part the motivation for this research is discussed, resulting in the Needs of Unilever (Figure 1-1). This part discusses the exact problem Unilever is experiencing and the possibilities of dealing with the problem, which results in the starting point for the rest of the research. The second part translates the company needs into a research design. The research design describes how the needs can be investigated and translated into methods to research the problem. The third part of this research discusses the model results. The model results improve the understanding of the problem. The results need to be adapted to be applicable in practice. This is done in the model adaptation in part four. The last part, the implementation & conclusions, discusses how this research can be implemented in order to fulfil the needs which are distinguished in part one. Throughout the different parts, the research contributes to the existing literature as described in paragraph 1.5. After reading the first part it should be clear what problem Unilever is encountering and what the scope of this research is.

[Figure: cycle of the five project parts: Project definition (Needs), Research design (Methods), Model results, Model adaptation, Implementation & conclusions.]
Figure 1-1: The different project parts of the research

1.2 Company description

Unilever is a global manufacturing company operating in the Fast Moving Consumer Goods (FMCG) industry. The company is specialized in Food, Home and Personal care products. The company employs around 163,000 people worldwide. The Unilever portfolio of 400 brands, of which Omo and several others are among the largest, is sold in over 100 countries. These brands contributed to a worldwide turnover of 39.8 billion euros in 2009 (www.unilever.com). Most of Unilever’s products are manufactured in the 264 self-owned plants. This research focuses on Unilever Benelux, of which the main office is located in Rotterdam. Moreover, the emphasis is placed on the Dutch market and thus on the Dutch part of the Unilever Benelux organization. This decision is clarified in paragraph 3.1. In the Netherlands Unilever is split up into 5 product categories and 4 customer teams. The different product categories are:
• Home care (HC)
• Personal Care (PC)
• Savoury & Dressings (S&D)
• Vitality shots & Spreads/Cooking Category (SCC)
• Icecream & Beverages (I&B)

The different customer teams are:
• Albert Heijn (including Etos)
• Bijeen (C1000 & Jumbo), Super de Boer and Makro
• Drugteam (various drugstores, e.g. Kruidvat, DA)
• Superunie (16 smaller retailers, e.g. Sligro, Plus)

Unilever is organized as a matrix organization around the above product categories and customer teams (see Figure 1-2). Alongside the customer teams, interdisciplinary Customer Development Teams meet once a month to discuss the more tactical issues. The Customer Development Teams are responsible for the planning horizon of 0-6 months. The product categories are overseen by Category Brand Teams, which have a longer planning horizon of 3-24 months. Hence, the customer teams have a more operational focus than the product category teams.

In conclusion, this paragraph has explained the organization structure in a simplified way to make it easier to understand. The product categories and some of the named retailers will be used in the further research. Hence, the reader is able to position the research within the Unilever organization.


[Figure: matrix of customer teams (Albert Heijn; Bijeen, Super de Boer, Makro; Drugstores; Superunie) against product categories (Home Care, Personal Care, Icecream & Beverages, SCC & Vitality, Savoury & Dressings). Customer Development Teams (CDT: Sales, Customer Development, Customer Service) have a planning horizon of 0-6 months; Category Brand Teams (CBT: Marketing, Sales, Planning, Finance) have a planning horizon of 3-24 months.]
Figure 1-2: Organization matrix Unilever Netherlands

1.3 Problem introduction

Within the Dutch FMCG (Fast Moving Consumer Goods) market, as well as in foreign markets, the promotional share of the total volume has increased in the last decades. In the Netherlands the promotion pressure has increased in the last couple of years due to multiple price wars, to around 40% of the total volume. Because of that, Unilever noticed that its promotion forecasting process had become more and more important over the years and needed improvement. Promotion forecasting has received increased attention at Unilever Benelux since mid-2008. The forecasting accuracy at that moment left room for improvement, with a case fill, which is a service level measure, of at best 95% (the current case fill target is 98.5%). Besides the low case fill, Unilever overforecasted its promotions by 30% on average, which resulted in high numbers of obsoletes. Furthermore, the employees who produced the promotion forecast were not able to put a lot of time into a promotion forecast, although they stressed the importance of accurate forecasting in interviews at the time.
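The two measures mentioned above can be made concrete with a small sketch. The thesis does not give the formulas used at Unilever, so both definitions are assumptions: case fill is taken as delivered cases divided by ordered cases, and the overforecast as the average relative difference between forecast and actual volume. All figures are invented.

```python
def case_fill(cases_delivered, cases_ordered):
    """Assumed definition: share of the ordered cases that was delivered."""
    return cases_delivered / cases_ordered


def mean_overforecast(forecasts, actuals):
    """Assumed definition: average of (forecast - actual) / actual."""
    ratios = [(f - a) / a for f, a in zip(forecasts, actuals)]
    return sum(ratios) / len(ratios)


# Hypothetical figures: 950 of 1000 ordered cases delivered, and two
# promotions that were each forecasted 30% above the realised volume.
fill = case_fill(950, 1000)
bias = mean_overforecast([130, 260], [100, 200])
```

The two measures pull in opposite directions: forecasting higher protects the case fill but raises the overforecast, which is why the objective is formulated on both at once.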

Hence, the objective formulated was to reduce the overforecast while increasing the case fill. A program within the company was directed at reporting and evaluation possibilities of promotion forecasting and at the training of the involved employees. The training further increased the awareness of the importance of an accurate promotion forecast and improved the creation process of a promotion forecast. Currently the forecasting accuracy is analyzed on SKU level (Stock Keeping Unit, defined as a unique product), range level (i.e. different variants of one product together, for example different DOVE spray deodorants) and promotion level (all products within one promotion, for example all products in the DOVE line). Before 2008 the promotion accuracy was not yet analyzed on SKU level. The forecast accuracy on range and promotion level seemed to be quite good; however, on SKU level the forecast accuracy was much worse. This occurred because the forecast errors of the different SKU’s levelled each other out: when one SKU was overforecasted and another SKU in the promotion was underforecasted, the forecast errors of the two SKU’s cancelled each other out.
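The cancelling-out effect can be shown with two SKU’s in one promotion. The numbers are invented, and the absolute percentage error is used as an assumed accuracy measure; the thesis does not specify the measure at this point.

```python
def abs_pct_error(forecast, actual):
    """Absolute forecast error as a fraction of the actual volume."""
    return abs(forecast - actual) / actual


# Hypothetical promotion with two SKU's: one underforecasted, one overforecasted.
forecasts = {"SKU A": 100, "SKU B": 100}
actuals = {"SKU A": 150, "SKU B": 50}

sku_errors = {sku: abs_pct_error(forecasts[sku], actuals[sku]) for sku in forecasts}
promotion_error = abs_pct_error(sum(forecasts.values()), sum(actuals.values()))
```

On promotion level the forecast looks perfect, while on SKU level one product is short by a third of its demand and the other is forecasted at twice its demand. Since production and stock are SKU specific, the SKU level is the one that matters.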

Figure 1-3 depicts a simplified overview of the supply chain of the market in which Unilever operates. In this figure Unilever is the manufacturer, Albert Heijn is an example of a retailer, and the shoppers in the retailer stores are the consumers. In this supply chain there are two different demand origins: the demand from the consumers at the retailers and the demand from the retailers at the manufacturer. Both demand origins can be forecasted with a model, and the remainder of this paragraph explains which of the two will be forecasted. The consumer demand is a more direct and accurate representation of the promotional sales, whereas the retailer orders are more indirect and contain more variation. This increased variation is caused mainly by the forward buying of a retailer, the available stock at a retailer before a promotion takes place, the inaccuracy in the promotional forecast of a retailer and the pipeline fill. These cause a variation which is difficult to explain, especially since the stock levels of a retailer are largely unknown at Unilever. Besides that, the promotion mechanism underlying the promotional sales operates on the shopping floor, so modelling the consumer demand enables a model to accurately capture the factors behind the promotional sales. Furthermore, the On Shelf Availability in the retailer shops is regarded as a more important measure than the case fill at the retailer, since the products are sold in the shops and not in the warehouse of a retailer. Lastly, when discussing the level of the expected promotional sales with the retailer, the consumer demand is the foundation for this discussion. Hence, it is preferred to base the forecast on the consumer demand. This forecast will be corrected for the disturbing factors between consumer demand and retailer orders. So, first a model will be developed which forecasts the consumer demand. The consumer demand forecast generated by the model has to be adapted to retailer orders afterwards.

Because consumer demand is used as the basis, the research measures the promotional demand in consumer units. This measure can be converted to the case pack measure that is more widely used within Unilever (a case pack contains a certain number of consumer units).
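Converting a forecast in consumer units to case packs is a simple division by the case pack size. In the sketch below the function name, the case pack size of 12 and the choice to round up to whole case packs are assumptions for illustration.

```python
def consumer_units_to_case_packs(consumer_units, units_per_case_pack):
    """Number of whole case packs needed to cover the consumer units (rounded up)."""
    if units_per_case_pack <= 0:
        raise ValueError("case pack size must be positive")
    return (consumer_units + units_per_case_pack - 1) // units_per_case_pack


# Hypothetical forecast of 1390 consumer units with 12 units per case pack.
case_packs = consumer_units_to_case_packs(1390, 12)
```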

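[Figure: goods flow from the manufacturer via the retailer to the consumer; demand flows in the opposite direction, as consumer demand from consumer to retailer and as retailer orders from retailer to manufacturer, with an adaptation step between the two.]
Figure 1-3: Overview of the demand in the FMCG supply chain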


This paragraph gave an introduction to the promotion forecasting problem within Unilever. It described the background of the problem and specified the different demand origins in the supply chain in which Unilever operates. In the next chapter the problem definition will be discussed in more detail. First an overview of the available relevant literature about promotion forecasting is given.

1.4 Overview literature

In this chapter the relevant literature for the research field of this master thesis will be summarized and linked with the situation of the company. An extensive literature study about promotion forecasting can be found in Van der Poel (2010a) on which this summary is based.

The importance of promotions within the FMCG sector has grown substantially over the last 20 years (Blattberg, 1995). The market share of promotions has increased likewise in the Dutch FMCG market. With the increase of promotion pressure, the instability of the demand has increased as well. Promotions are responsible for large volumes, typically between 4 and 8 times the base line sales (Buckers, 2010; Van den Heuvel, 2009). Hence, the importance of accurate promotion forecasting has increased too. On the one hand, a low case fill because of underforecasting results in Out Of Stocks in retailer stores and is harmful for the sales and the retailer relationship. On the other hand, overforecasting results in extra stock (costs) and potential obsoletes.

There are different types of promotions. At the moment, almost all promotions in the Dutch FMCG sector are price promotions, where the consumer gets a reduced price in one form or another. Likewise, the price promotion is the largest single category in the marketing budget of American FMCG companies (Silva-Riso, 1999). Besides price promotions, occasionally a coupon promotion or a promotion with a premiaat or free product (e.g. a gadget, a discount on a theme park ticket or a free (new) product) is offered. The success of the different types of promotions is influenced by numerous variables. In Van der Poel (2010a) 53 variables with a possible influence are listed. Which variables are perceived as important will be discussed in paragraph 5.2. These variables have to be fitted in a model. The most widely used method found in literature is multiple linear regression analysis (Van Loo, 2006; Van den Heuvel, 2009; De Schrijver, 2009; Cooper et al, 1999; Wittink et al, 1988). In such an analysis multiple independent variables predict one dependent variable. Interaction effects between independent variables can be incorporated when the interacting variables are continuous. Furthermore, the (in)dependent variables can be included in their linear and logarithmic form as long as they are metric.

The literature described will be useful in the development of a promotion forecasting model on manufacturer level. The same variables have an impact on the promotional volume for retailers and manufacturers. However, the data availability differs, and manufacturers depend on retailers for the data of an upcoming promotion. No research is available on the effect of unknown stock levels. As mentioned, this is only important if a manufacturer wants to predict retailer orders.

1.5 Gaps in literature

In the literature study of Van der Poel (2010a) multiple gaps in the literature on promotion forecasting were discussed. This paragraph indicates the gaps this research will address:
1. The dependent variable. This can be the lift factor (hereafter abbreviated to LF) over the base line sales or the absolute promotional sales. Furthermore, the LF as dependent variable can be transformed in multiple ways. There is no conclusive research on the performance of the different forms of the dependent variable.
2. Manufacturer based model. All relevant promotion forecasting models found in literature are retailer based. It is unclear on what aspects a retailer model and a manufacturer model differ and how this could have an impact on the performance. This gap also relates to the following gap, which discusses whether it is an advantage or disadvantage to be a retailer.
3. Advantage or disadvantage of being a retailer. It is interesting to investigate whether a manufacturer based model has a different performance than a retailer based model. A factor which might cause this difference is the dependency on the retailer for information. A second factor is that the products of a manufacturer might be more homogeneous than the products of a retailer. Moreover, a manufacturer has more promotions of the same product than a retailer, because the product is sold at multiple retailers. Literature on promotion forecasting does not specify whether there is a difference in performance between a manufacturer and a retailer based model and which factors cause this difference.

2 Problem definition

This chapter specifies the aim of the research. First, the overall problem formulation is depicted. Second, an initial analysis of the problem context is depicted. Third, the corresponding research questions are formulated. Lastly, the practical requirements of the research for Unilever are stated.

2.1 Problem formulation

Although some steps have been made in the last two years (as mentioned in paragraph 1.3), the promotion forecasting process still leaves considerable room for improvement. In this paragraph the problems related to promotion forecasting are used to arrive at the problem formulation and the accompanying research question with sub questions. The purpose of this research is to analyze the inaccuracy of the promotion forecasts. Hence the following problem formulation is stated.

Problem formulation: The forecasting accuracy of the current promotion forecasting process is too low


2.2 Problem decomposition

A first analysis of the overall problem context is shown in the Fishbone diagram in Figure 2-1. The goal of this analysis is to investigate which general problem areas have an impact on the forecast accuracy and to choose the scope this research will focus on. The forecast inaccuracy of a promotion forecast is regarded as the main problem. A high forecast inaccuracy results in more obsoletes, higher stock costs and a lower case fill. The causes of forecast inaccuracy can be divided into five general problem areas, which are:

• Phasing of promotions: For the measurement of the forecast accuracy it is important that the deliveries of promotional volumes are planned in the correct weeks. This is mainly a measurement issue, but can cause problems if volumes have to be delivered earlier than planned. Promotions are typically delivered one or two weeks before the promotion takes place.
• Sales oriented organization: Unilever is a sales oriented organization where logistic issues typically have less priority. The sales department wants to be able to deliver the products to the retailer at all costs, i.e. they have less of an eye for logistic costs and operations. This mentality can result in a tendency to overforecast.
• Customer team deviation: There are four customer teams which work quite independently from each other. Information sharing and learning from other customer teams is not common practice. Furthermore, ways of working differ substantially between the customers within one customer team.
• Retailer dependency: Retailers have the power to change promotions and can decide not to share all relevant information with Unilever. It is common practice that retailers are not willing to share information, mainly because of data sensitivity. Furthermore, similar promotions at other retailers can result in last minute changes when the discount of a promotion at another retailer is higher.
• Poor database usage: The different data sources have to be consulted manually for each promotion. This is a time consuming, user unfriendly process, which does not encourage the usage of data and thus does not help the forecast accuracy. Furthermore, no model is provided to calculate the sales of a new promotion. Therefore, an employee has to search and analyze all the information him or herself.


[Fishbone diagram. Main problem: forecast inaccuracy, which results in obsoletes, a low service level (case fill) and stock costs. Problem areas: phasing of promotions, sales oriented organization, customer team deviation, retailer dependency and poor database usage.]

Figure 2-1: Problem areas and scope of the research

The blue zone indicates the main scope of the research, which is mainly to enhance the database usage within Unilever. The grey zones are areas which will benefit from this research as well, because of more standardization and more clarity on the information needed from a retailer. Regarding the main scope, employees currently need to research the different databases by themselves in order to create an accurate forecast. Furthermore, no tool is provided to calculate a promotion forecast. Hence, an employee has to make his own assumptions and calculations with limited information available. Concluding, this process is complex, time consuming and not standardized and a forecasting tool which is used throughout the Unilever organization will improve these aspects.

2.3 Research questions

Figure 2-1 shows that a low forecasting accuracy results in more obsoletes, a lower service level and higher stock costs. The forecasting accuracy should be increased to reduce this effect. Consequently, the main research question is: What are the causes for the low forecasting accuracy of the promotion forecasting process and how can the forecasting accuracy be improved?

Most of the issues can be improved with a suitable demand forecasting model for promotions. Such a forecasting model can reduce the poor database usage, standardize the processes among different customer teams, serve as an argument towards retailers to legitimize the request for data and serve as a tool to strengthen the logistic voice in the sales oriented organization of Unilever. Only the phasing of the orders of a promotion is not expected to improve directly. Hence, the research will be focussed on the development and implementation of a promotion forecasting model. The following sub questions can be used to answer the main research question:

1. What are the functional requirements to make a forecasting model work within Unilever?
2. Which products, retailers, time horizon and region should be included in the analysis?
3. Which prediction method is most suitable for promotion forecasting?
4. Which independent and dependent variables should be included in a forecasting model?
5. How should a model generate a forecast for retailer orders?
6. What is the impact of being a manufacturer on the performance of the forecasting model?

2.4 Practical requirements research

Besides the scientific nature of this research, the practical goals should be defined as well. The overall goal is to improve the promotion forecast accuracy of Unilever. A couple of sub goals should be formulated to reach this overall goal. The sub goals formulated are practical requirements that a potential solution should meet in order to work in practice. These are:

1. Ease of use
2. Data availability
3. Consumer demand as basis

(1) Ease of use: A forecasting model should be simple to use. Hence, it should not cost a Unilever employee too much effort to use the model, the interface should be very simple and the output should be understandable. Regarding the number of variables in the model, interviews within Unilever indicated that a practically useful model should contain at most 10 variables and preferably fewer. Furthermore, the result the model generates should be understandable and the model itself should not be seen as a black box, since that would decrease the acceptance of the forecast of the model.

(2) Data availability: The model should work with data which is readily available for the Unilever employees who have to work with the model.

(3) Consumer demand as basis: As reasoned in paragraph 1.3 the consumer demand will be the starting point for developing a forecasting model. Later on this forecast will be adapted to retailer orders.

Summarizing, to meet the practical requirements the model needs to be understandable, have a high usability, work with available data and focus on consumer demand. These requirements will be taken into account in the model building process.


3 Scope of research

In this chapter the scope of the research will be determined. The region, retailers, time horizon and products which will form the sample size are discussed successively.

3.1 Region

Unilever Benelux consists of the Netherlands and Belgium. An analysis has been done to judge the comparability of the Dutch and Belgian markets. If the markets and promotion processes had been comparable, the research would have focussed on both markets. However, the Belgian market differs too much from the Dutch market on a couple of aspects. A vast part of the promotions on the Belgian market are coupon promotions, while these promotions are rare in the Dutch market. Moreover, most promotions are promoted on special cardboard displays and multiple items of a SKU are bundled together in a repack. Often different SKU’s are even bundled together in one repack. Lastly, the retailer Colruyt in Belgium has a lowest price guarantee for all its products. Therefore, it matches promotions of all other retailers on the Belgian market on products it offers in store. As a result, other retailers try to come up with promotions that Colruyt does not need to match, which brings a high variety of promotions to the Belgian market.

Concluding, the above factors are very likely to cause a different promotion mechanism on the Belgian and Dutch markets. Because the markets are very different, the model will not be able to benefit from the larger pool of data. Therefore, one of the two markets needs to be chosen for the research. Since the research is performed from the office in Rotterdam, data collection will be easier for the Dutch market; therefore, the Dutch market is chosen as scope for this research.

3.2 Retailers

Next, the research needs to be focussed on certain retailers in the market, since inclusion of all retailers will lead to extensive data gathering and will decrease the quality of the analysis. The following criteria are used to select four retailers in the market.

• Size of the retailer: How large is the retailer compared to other retailers. This indicates the importance of a retailer. A large retailer is more important than a small retailer, because promotions of a large retailer have a higher impact on the safety stock and lead more quickly to out of stocks. Therefore, large retailers are preferred.
• Promotion pressure: How much of the total volume of a retailer originates from promotional volume.
• Possibility of collaboration with a retailer: Whether the retailer is likely to cooperate, or already cooperates, with Unilever to enhance the accuracy of the promotion forecasting process.
• Data availability: In paragraph 5.2 the variables that will be included in the forecasting model are discussed. However, to include a variable in the model, data is needed. The data availability differs for each retailer.
• Duration promotion: The duration of a promotion in the retail sector is normally one week; however, some retailers have a different promotion period (e.g. Kruidvat, Jumbo, Makro, Sligro). It is preferred to perform the analysis on retailers with a promotion period of 1 week, to enhance the comparability between promotions.

Based on these criteria the retailers AH, C1000, Kruidvat and Plus are included in the research. AH and C1000 are the largest retailers and Kruidvat is the largest drugstore in the Netherlands. Therefore, including them is very logical, although Kruidvat has promotions with a duration of 1 and 2 weeks; hence, the duration has to be included as independent variable in the research. Plus is a smaller retailer; however, it is included because it is open for collaboration and the data availability on promotions of Plus is good.

3.3 Time horizon

A time horizon of at least 1 year is desirable, to overcome potential seasonal effects. Therefore, a time horizon from the beginning of 2009 up to week 13 of 2010 is chosen for this research. To be able to cross validate the model the sample is split. The promotions in 2009 are used to calibrate the model and the promotions in the first quarter of 2010 are used to validate the model results (see Figure 3-1). By this approach the results of the model can be tested on their robustness (Miles et al., 2001).

Figure 3-1: Time horizon and data split research
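The sample split described above can be sketched in a few lines of Python. This is a minimal illustration only: the record layout (promotion id, year, week number) and the example promotions are hypothetical and are not taken from the research data.

```python
# Hypothetical promotion records: (promotion id, year, week of the promotion).
promotions = [
    ("P1", 2009, 3),
    ("P2", 2009, 52),
    ("P3", 2010, 1),
    ("P4", 2010, 13),
    ("P5", 2010, 14),  # after week 13 of 2010, so outside the time horizon
]


def split_sample(records):
    """Split promotions into a calibration set (2009) and a validation set
    (weeks 1 to 13 of 2010). Promotions outside the time horizon are dropped."""
    calibration, validation = [], []
    for promo_id, year, week in records:
        if year == 2009:
            calibration.append(promo_id)
        elif year == 2010 and week <= 13:
            validation.append(promo_id)
    return calibration, validation


calibration_set, validation_set = split_sample(promotions)
```

Splitting on time rather than at random keeps the validation promotions strictly later than the calibration promotions, which resembles how a forecasting model would be used in practice.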

3.4 Products (SKU’s)

Unilever Netherlands has around 2500 SKU’s which are sold at a random point in time. Quite some of these products are offered only once and/or have a very low volume. The SKU’s are divided among five product categories and these five product categories are divided in subcategories. The 5 product categories are named in chapter 1; the subcategories are depicted below, in Figure 3-2, in the bottom layer of the figure.


[Tree diagram. All products are split into Food and Non-Food. Product categories: Ice & Beverage, Savoury & Dressings, Spreads & Cooking, Home care, Personal care. Subcategories (bottom layer): Vitality shots, Ice cream, Tea & fruit beverages, Savoury, Dressings, Other Foods, Spreads & cooking, Other bakery, Household care, Laundry, Skin, Hair care, Deo & grooming.]

Figure 3-2: The product categories of Unilever

Since data gathering is a time consuming process, a sample of the total population will be taken. The sample needs to be representative for the whole set of products. Hence, the selection of SKU’s for the research sample is based on the following criteria:

1. At least 10% and at most 90% of the total volume of a SKU originates from promotional volume. Otherwise, a product has almost no promotions or almost no base line sales.
2. The focus will be on the more important high volume SKU’s (A and B SKU’s), although some low volume SKU’s will be included in the sample as well (C SKU’s).
3. The SKU is sold in promotion in at least 2 of the 4 retailers included in the analysis.
4. The total number of promotions of a SKU at the 4 retailers in the analysis during the time horizon of the research should be at least 4.
5. Each SKU is part of a broader range of Unilever products (e.g. the product “Kip Siam wereldgerecht” is part of the “Knorr Wereldgerechten” range). At most three variants of a range will be included to assure the diversity of the sample.
6. The product should have sales from January 2009 up to March 2010, since this is the time horizon for the research.
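The quantitative criteria above (1, 3, 4, 5 and 6) lend themselves to a simple filter. The sketch below is illustrative only: the `Sku` record, its field names and the example candidates are hypothetical, and criterion 2 (the focus on A and B SKU’s) is left out because it is a matter of emphasis rather than a hard threshold.

```python
from dataclasses import dataclass


@dataclass
class Sku:
    name: str
    product_range: str
    promo_volume_share: float   # fraction of total volume sold in promotion
    retailers_with_promo: int   # out of the 4 retailers in the analysis
    promo_count: int            # promotions at those retailers in the horizon
    sold_whole_horizon: bool    # sales from January 2009 up to March 2010


def meets_criteria(sku):
    """Check criteria 1, 3, 4 and 6 for a single SKU."""
    return (0.10 <= sku.promo_volume_share <= 0.90
            and sku.retailers_with_promo >= 2
            and sku.promo_count >= 4
            and sku.sold_whole_horizon)


def select_skus(candidates, max_per_range=3):
    """Keep qualifying SKU's, with at most `max_per_range` variants per range
    (criterion 5)."""
    selected, per_range = [], {}
    for sku in candidates:
        if not meets_criteria(sku):
            continue
        count = per_range.get(sku.product_range, 0)
        if count >= max_per_range:
            continue
        per_range[sku.product_range] = count + 1
        selected.append(sku)
    return selected


# Hypothetical candidates, for illustration only.
candidates = [
    Sku("Range A variant 1", "Range A", 0.45, 3, 6, True),
    Sku("Range A variant 2", "Range A", 0.95, 4, 9, True),  # fails criterion 1
    Sku("Range B variant 1", "Range B", 0.30, 1, 5, True),  # fails criterion 3
    Sku("Range C variant 1", "Range C", 0.20, 2, 4, True),
]
selected_names = [sku.name for sku in select_skus(candidates)]
```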

Because there are 13 categories in total and the total sample size should be around 100 products, the aim is to select between 5 and 15 SKU’s for each subcategory, depending on the category size. The resulting sample of 86 products is depicted in Appendix 1. For the category Savoury 19 SKU’s are selected because of the large number of SKU’s in this category (Table 3-1). For the categories Dressings, Other Foods and Tea, Soy & Fruit Beverages respectively 3, 4 and 3 products are selected. This is less than the goal of 5, because not enough products fulfilled the criteria. In the Dressings category and in the Tea, Soy & Fruit Beverages category a lot of products have been innovated or relaunched in the time horizon of the research. In the Other Foods category, the number of products which is sold in promotion by at least two retailers is very limited. The category Vitality shots has almost no SKU’s left with respectable sales. The sample taken is responsible for 1238 promotions in the time horizon of the research. The total number of promotions in this time horizon is 15283, meaning that the sample contains 8.1% of the total promotions. This percentage of the promotions combined with the selection criteria should provide a representative sample. This will be checked in chapter 8.

| Category | Number of SKU’s |
|---|---|
| Deo & Grooming | 9 |
| Dressings | 3 |
| Hair care | 8 |
| Household care | 6 |
| Ice cream | 9 |
| Laundry | 6 |
| Other foods | 4 |
| Other bakery | 0 |
| Savoury | 19 |
| Skin | 10 |
| Spreads and cooking products | 8 |
| Tea and soy & fruit beverages | 3 |
| Vitality shots | 0 |
| Total | 86 |

Table 3-1: Number of SKU's per category

Conclusion Part 1: This part resulted in a clear problem formulation, scope of the research and requirements which the research should fulfill in practice (sub research question 1). The research will be focused on improving the forecast accuracy of Unilever by taking the consumer demand as a starting point and later on adjusting this consumer demand to retailer orders. Chapter 3 depicted the retailers, SKU’s, time horizon and region which form the sample of the research (sub research question 2).


Part 2: Research design

The first part defined the needs within the Unilever organization, stated the problem formulation and limited the scope of the research. In order to be able to produce an accurate forecasting model, this part will discuss which methods are most suitable, which variables should be included and what effects are expected in the results of the model (hypotheses). Basically, this part forms the framework for the research.

4 Method of research

In this paragraph the method to be used for the research will be discussed. According to Makridakis et al (1988), three families of forecasting models can be distinguished, namely judgmental methods, time series analysis and explanatory methods. Judgmental forecasting is currently used within Unilever. The aim is to come to a more sophisticated, quantitative model. Time series analysis requires lengthy time series for the prediction of the upcoming period(s). However, promotions are events which occur on an infrequent basis. Therefore, time series cannot be used to analyze the promotional volume. Lastly, explanatory methods aim to forecast the promotional volume as a dependent variable by independent variable(s). Each independent variable needs to have an explanatory relationship with the dependent variable. Concluding, of the two quantitative approaches, only explanatory models are suitable for forecasting promotional volumes. Judgmental analysis will always coexist, since the forecast a model provides needs to be verified with common sense.

Next, the different forecasting methods within the explanatory family will be analyzed and a choice is made for one method. Van Loo (2006) analyzed the four most important forecasting techniques on criteria applicable to the situation at Unilever (see Table 4-1), where a 4 indicates that the method performs the best relative to the other methods and a 1 indicates that a method performs the worst relative to the other methods. Van Loo concluded that it is not even certain that simpler models are outperformed by more complex models. Therefore, the scoring on the accuracy criterion is questionable. Still, single equation models are perceived as most suitable, especially since Unilever demands a model which is relatively flexible and easy to use and interpret. The single equation models can be further divided in simple and multiple linear regression models. Since simple linear regression models can only include one independent variable and the promotional volume is dependent on more than one independent variable, multiple regression is chosen as the most appropriate method. This is consistent with the analysis of Van der Poel (2010b), which concludes that multiple linear regression is the most widely used method in literature.


| Criteria | Single equation (single and multiple linear regression) | Multiple equation | Econometric models | Artificial Neural Networks (ANN) |
|---|---|---|---|---|
| Accuracy | 1 | 2 | 3 | 4 |
| Costs | 4 | 3 | 2 | 1 |
| Complexity | 4 | 3 | 2 | 1 |
| Data need | 4 | 3 | 2 | 1 |
| Ease of interpretation | 4 | 3 | 2 | 1 |
| Ease of Use | 3 | 2 | 1 | 4 |
| Total | 20 | 16 | 12 | 12 |

Table 4-1: Performance forecasting techniques (Van Loo, 2006)

This paragraph concluded that multiple linear regression is the most suitable method for a promotion forecasting model. Consequently, this research will make use of multiple linear regression to analyze the promotions of Unilever.

5 Dependent and independent variables

In this chapter the dependent and independent variables that will be included in the model are discussed. The choice of the variables and the form of the variables have an important effect on the model results later on. A poor choice of variables results in a low model fit and thus in an inaccurate forecasting model for Unilever. Therefore, an adequate analysis will be made in this chapter to select the variables.

5.1 Dependent variable

The dependent variable of the model is the sales height of a promotion. As concluded in paragraph 2.4, the consumer demand will be forecasted as dependent variable. However, this variable can be predicted in numerous forms. Hereunder, the LF as a form of the promotional consumer demand is discussed.

5.1.1 Lift factor as dependent variable

In paragraph 2.4 consumer demand was chosen as dependent variable. The promotional consumer demand as dependent variable can still be forecasted in multiple ways. A distinction is made between the absolute sales of a promotion and the LF of a promotion (Cooper et al, 1999; Wittink et al, 1988). The LF is defined as follows:

$$\text{Lift Factor} = \frac{\text{Promotional sales}}{\text{Base line sales}} \qquad \text{(Formula 5-1)}$$


The advantage of working with a LF as dependent variable is that the promotional sales volume is standardized against the base volume. As a result, the influence of the absolute sales height of a promotion has been removed from the model equation. The promotional sales is a given fact, but the way the base line sales is calculated is less straightforward. In this research the base line sales is calculated by averaging the base line sales of the 5 weeks before a promotion (consistent with Van den Heuvel, 2009). A time period of 5 weeks has been chosen to reduce the effect of irregularities in the base line sales. Furthermore, when a promotion occurs in these 5 weeks, which happens only occasionally, these promotional sales are not included in the base line sales. Instead, a substitute base line is calculated when a promotion takes place. No other corrections are made on the base line sales. Because the above approach works with a base line, the seasonality and trend effects are included, since the base line sales is already subjected to these effects. Therefore, seasonality and trend effects will not have to be included as independent variables in the model (see footnote 1). Summarizing, the LF of a promotion is preferred above the absolute sales of a promotion, since the absolute promotional sales does not provide a forecasting model with a clear reference (i.e. the absolute sales constantly differs because different products and retailers have different height of sales).
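The lift factor calculation with a 5-week base line can be sketched as follows. This is a minimal sketch under stated assumptions: the function names and the weekly sales figures are hypothetical, and promotion weeks inside the 5-week window are simply skipped, which is a simplification because the way the substitute base line is calculated is not specified here.

```python
def baseline_sales(weekly_sales, promo_flags, promo_week, window=5):
    """Average the sales of the `window` weeks before `promo_week`,
    skipping weeks that were themselves promotion weeks (an assumption
    standing in for the substitute base line)."""
    start = max(0, promo_week - window)
    regular = [weekly_sales[week] for week in range(start, promo_week)
               if not promo_flags[week]]
    if not regular:
        raise ValueError("no non-promotional weeks in the base line window")
    return sum(regular) / len(regular)


def lift_factor(weekly_sales, promo_flags, promo_week):
    """Lift factor = promotional sales / base line sales."""
    return weekly_sales[promo_week] / baseline_sales(
        weekly_sales, promo_flags, promo_week)


# Hypothetical weekly sales; the promotion takes place in week index 5.
sales = [100, 110, 90, 105, 95, 400]
flags = [False, False, False, False, False, True]
lf = lift_factor(sales, flags, promo_week=5)

# Same idea with an earlier promotion (week index 1) inside the window.
sales_b = [100, 300, 100, 100, 100, 250]
flags_b = [False, True, False, False, False, True]
lf_b = lift_factor(sales_b, flags_b, promo_week=5)
```

In the first example the base line is the plain average of the five preceding weeks; in the second the earlier promotion week is left out of the average so that it does not inflate the base line.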

5.2 Independent variables

Van der Poel (2010a) published a list of 53 variables with a possible influence on promotional demand. Of these 53 variables a selection of 21 variables is made which are taken into account in this research. The other variables are excluded because of lack of data, complexity issues, irrelevance because of the supply chain perspective taken or a limited expected influence. The most important variables omitted are (1) the percentage of products which is on promotion within the category, (2) the percentage of products which was on promotion within the category last week, (3) promotions of competitors and (4) price discount of last promotion. The first three variables are excluded because of a lack of data and the last variable is excluded because of complexity issues. Figure 5-1 categorizes the 21 variables taken into account among the clusters Promotion, Retailer and Brand. The Promotion cluster is perceived as the most important, before the Brand cluster and the Retailer cluster. Figure 5-1 forms the backbone of this research. The split between the clusters is made to create a better overview and to gain insight in the effect sizes for promotion related variables, retailer related variables and brand related variables.

Footnote 1: The model will use the last 5 weeks before a promotion to calculate the average base line sales. In practice a promotion has to be forecasted between 13 and 4 weeks in advance, when the base line sales is not yet available for the 5 weeks before a promotion. Then the base line sales will have to be calculated for the upcoming weeks with a simple trend and seasonal model.


[Diagram of the variables influencing promotional sales, grouped in three clusters. Promotion variables: Display, Folder, Advertising, TV, Holiday, Length promo, Price decrease (absolute and percentual points), Promo mechanism, # of products in promotions. Retailer variables: Retailer, # of selling points. Brand variables: Repeat buyers, Promotion pressure, Lift factor former promotions, Market penetration, Preservability, Size of product, Susceptibility to stockpiling, Frequency of purchase, Product category, Summer products, Winter products, Weather. Legend of data sources: n = available data in Nielsen, p = data available in promoplanner, m = data available at marketing, s = SAP data, g = general available data.]

Figure 5-1: Variables with a likely influence on the promotional sales

The variables in Figure 5-1 are discussed in more detail in Table 5-1. Under type of promo, multiple promotion mechanisms are discussed in further detail. The minimum and maximum measurement values are shown in the third column and the scale of a variable is depicted in the fourth column of the table. The variables as described in the underlying table will be tested on their effect on promotional sales. Which effect is expected to occur for each variable is depicted in paragraph 6.3.

| Variable | Description | Measurement | Scale |
|---|---|---|---|
| • Promotion variables | | | |
| Display | The percentage of the selling stores in which the promotion is placed on a display (kopstelling). The variable is not available for Kruidvat and will be replaced with the average over the other observations (Cooper et al, 2003). | (0, 100) % | Scale |
| Folder | Depicts if the promotion is shown in the folder of the retailer. | (0, 1) | Nominal |
| TV support | This variable states if the promotion is shown on television. Because of low data availability this variable is only available for the retailer Albert Heijn. | (0, 1) | Nominal |
| Holiday products | The interaction effect between the holiday weeks (New year, Easter, Whitsunday, Christmas) and products which have higher sales during holiday weeks (luxury ice cream). | (0, 1) | Nominal |
| Promolength | Length promotion in weeks (1 or 2 weeks). | (1, 2) | Scale |
| Absolute discount | The absolute price decrease of a promotion measured per product. | (0, 3.53) € | Scale |
| Percentual discount | The percentual price decrease of a promotion. | (0, 100) % | Scale |
| Promo mechanism | The different promo mechanisms will be programmed with dummy variables: | | |
| Promo mechanism: SPO | Single price off, the consumer receives discount when he buys at least one promotion product. | (0, 1) | Nominal |
| Promo mechanism: two for X | The consumer receives discount when he buys at least two promotion products. | (0, 1) | Nominal |
| Promo mechanism: three for X | The consumer receives discount when he buys at least three promotion products. | (0, 1) | Nominal |
| Promo mechanism: four or five for X | The consumer receives discount when he buys at least four or five promotion products. | (0, 1) | Nominal |
| Promo mechanism: Premiaat | The consumer receives a free non-Unilever item with the promotion product(s). | (0, 1) | Nominal |
| Promo mechanism: Free product | The consumer receives a free Unilever product with the promotion. The free product is mostly a new Unilever product and cannot be compared with, for example, a 2+1 promotion, since the consumer is not able to choose the product he gets for free. | (0, 1) | Nominal |
| Number of products in promotion | The number of SKU’s which are sold in the same promotion. | (1, 366) | Scale |
| • Retailer variables | | | |
| Retailer | The retailer (Albert Heijn, Kruidvat, C1000, Plus) where the promotion is sold. | (0, 1) | Nominal |
| Growth # of selling points | The number of selling points where the promotion is sold divided by the average number of selling points in the 5 weeks before the promotion period. | (47, 162) % | Scale |
| • Brand variables | | | |
| Repeat buyers | The percentage of repeat buyers of the product in a quarter of a year. | (0, 100) % | Scale |
| Promo pressure | The percentage of products of the total sales which is sold in promotions. | (0, 100) % | Scale |
| LF former promotions SKU | The natural logarithm of the average LF of historical promotions of the product. | (0.32, 2.97) | Scale |
| Market penetration | The percentage of consumers who buy the product. | (0, 100) % | Scale |
| Preservability | The preservability of a product in days with a maximum of 730 days. | (84, 730) | Scale |
| Size of product | The size of a product in cubic centimetres. | (194, 3444) | Scale |
| Frequency of purchase | The number of times a product is bought by consumers on average in a quarter of a year. | (1.5, 7.6) | Scale |
| Product category | The category to which the product belongs (Ice & beverages, Savoury & dressings, Spreads & Cooking, Home Care, Personal Care). | (0, 1) | Nominal |
| Winter products temperature | The interaction effect between the average weekly temperature and products which report higher sales during cold weather. | (2.91, 18.33) | Scale |
| Summer products temperature | The interaction effect between the average weekly temperature and products which report higher sales during warm weather. | (0, 20.80) | Scale |

Table 5-1: Overview of the independent variables with a likely influence on promotional sales


5.3 Transformations of (in)dependent variables

In this paragraph the variables which should be considered for transformation are discussed. Variables included in a linear regression analysis should meet the assumptions of parametric data (Field, 2005). For the dependent variable it is most important that these assumptions are met. The assumptions for parametric data are:

1. Normally distributed data
2. Interval data
3. Independent of other variables in or outside the model
4. Homogeneity of variance

In order to judge normality in large sample sizes, with more than 200 cases, one should look at the histogram of a variable and the value of the skewness and kurtosis instead of calculating their significance (Field, 2005). In appendix 2 the histogram of the variable “Lift Factor” shows obvious signs of nonnormality. Therefore, one of the advised transformations by Field (2005) is applied on this variable. The natural logarithm of the LF seems to meet the normality requirements. Furthermore, the variable fulfils the interval data assumption. The assumption of homogeneity of variance and independency will be tested for the regression model as a whole in paragraph 7.2. The research of Van Loo (2006) uses a different transformation of the LF, which will be discussed in paragraph 8.2.
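As a sketch of the model form this transformation implies, the regression can be written with the natural logarithm of the LF as dependent variable. The symbols below (index $i$ for a promotion, $x_{ik}$ for the value of independent variable $k$, coefficients $\beta_k$ and error term $\varepsilon_i$) are notation assumed here for illustration:

$$\ln(\mathrm{LF}_i) = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik} + \varepsilon_i$$

Under this assumed notation, a forecast on the original scale follows by taking the exponent of the fitted value and multiplying it by the base line sales, in line with Formula 5-1:

$$\widehat{\mathrm{LF}}_i = \exp\Big(\hat{\beta}_0 + \sum_{k=1}^{K} \hat{\beta}_k x_{ik}\Big), \qquad \widehat{\text{Promotional sales}}_i = \widehat{\mathrm{LF}}_i \times \text{Base line sales}_i$$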

Next, the independent variables will be tested on the above assumptions. Most of the independent variables are coded with dummy variables and thus do not qualify for transformation. On the variable LF former promotions SKU the same transformation is applied as above (see appendix 2), with similar results. Of the other independent variables which have interval data characteristics, “growth number of shops”, “absolute discount” and “size of product” are positively skewed and “percentage of repeat buyers” is negatively skewed. Therefore, a log10 transformation is applied to these variables (Field, 2005; see footnote 2). According to the statistics in appendix 2, the normality of three variables (growth number of shops, absolute discount and size of product) has improved. Therefore, these variables are included in the analysis in their logarithmic form.
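The effect of a logarithmic transformation on a positively skewed variable can be illustrated with a short calculation. The sketch below is illustrative only: the product sizes are hypothetical values chosen within the measurement range of “size of product”, and the skewness is computed with the plain moment formula, which may differ slightly from the statistic reported by a statistical package.

```python
import math


def skewness(values):
    """Moment-based sample skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical product sizes in cubic centimetres (positively skewed).
sizes = [200, 250, 300, 330, 400, 500, 750, 1000, 1500, 3400]

raw_skew = skewness(sizes)
log_skew = skewness([math.log10(v) for v in sizes])
```

For these example values the skewness of the log-transformed sizes is clearly closer to zero than that of the raw sizes, which is the kind of improvement in normality the transformation aims for.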

5.4 Assign baseline dummy variables

According to Field (2005) a baseline group should be chosen, when a characteristic is coded with dummy variables. The effect of the other dummy groups will be measured against the baseline group. There are three characteristics which are coded with dummy variables and need a baseline group. These are Retailer, Product category and promo mechanism. Albert Heijn is chosen as the

2 For the independent variable the natural logarithm performed slightly better than the Log 10 values. Therefore, the ln values are used. For the dependent variables the Log 10 transformation is used.

baseline variable for Retailer, Homecare as the baseline variable for “product category” and “4 or 5 for X” as the baseline variable for promo mechanism. Regarding the variable promo mechanism, 4.6% of the promotions have a double promo mechanism (e.g. three for 5 euro plus a free gadget). Because this percentage of double promo mechanisms is fairly small, its impact on the choice of a baseline variable will be absent or minor.
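As an illustration of dummy coding against a baseline group, the sketch below codes the Retailer characteristic with Albert Heijn as the baseline. The helper name `dummy_code` is hypothetical and the retailer list is the one that appears in the model results; a baseline case gets zeros on every dummy, so the coefficients of the other retailers are measured relative to it.

```python
def dummy_code(value, categories, baseline):
    """Return one 0/1 indicator per non-baseline category."""
    if value not in categories:
        raise ValueError(f"unknown category: {value}")
    return {c: int(value == c) for c in categories if c != baseline}

retailers = ["Albert Heijn", "C1000", "Plus", "Kruidvat"]

# A promotion at the baseline retailer: all dummies are 0.
row_baseline = dummy_code("Albert Heijn", retailers, baseline="Albert Heijn")

# A promotion at Plus: only the Plus dummy is 1.
row_plus = dummy_code("Plus", retailers, baseline="Albert Heijn")

print(row_baseline)
print(row_plus)
```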

Concluding, the dependent and independent variables have been chosen and the form in which they should be included in the model has been discussed as well. In chapter 4, multiple linear regression was already decided to be the most suitable method. Together, these chapters provided the fundamentals for the construction of a forecasting model. The next chapter will discuss the hypotheses and the different datasets that will be tested with the model.

6 Different data sets & hypotheses

In this chapter the different data sets that will be tested with the research model are discussed, and the hypotheses are formulated for the direction of the variables and the performance of the data sets. The purpose of the chapter is to construct theoretical expectations (hypotheses) which will be tested in the model result part. First the data sets will be determined, which differ from each other in the product categories that are included. The data set of promotions cannot be broken down into subsets indefinitely, because a minimum sample size is required. First this minimum sample size is discussed, second the different data sets, third the hypotheses and fourth the measurement indicators for the model performance.

6.1 Sample size

According to Green (1991), the minimum acceptable sample size of a data set, if one wants to test the overall fit of a model, can be determined with the formula 50 + 8k, where k is the number of predictors. Table 5-1 discussed 33 predictor variables³ which will be included in the model. This results in a minimum sample size of 50 + 8 × 33 = 314 cases. This rule of thumb is very useful but oversimplifies the issue. As a final check, the difference between the R-square and the adjusted R-square should be analyzed. When the difference between those two measures is minor, the variance explained by the regression model is more likely to be generalizable to other data sets. The R-square and adjusted R-square are compared in the model results.
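The rule of thumb above is simple arithmetic; a minimal sketch (function names are illustrative only) makes the check explicit for any number of predictors.

```python
def green_minimum_sample_size(k):
    """Minimum sample size for testing overall model fit: 50 + 8k (Green, 1991)."""
    return 50 + 8 * k

def is_large_enough(n_cases, k):
    """True when a data set meets the rule of thumb for k predictors."""
    return n_cases >= green_minimum_sample_size(k)

minimum = green_minimum_sample_size(33)  # 33 predictor variables
print(minimum)
```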

3 This is more than the 21 variables named earlier in this research, because some variables of those 21 need to be coded with multiple dummy variables (e.g. Retailer)


6.2 Data set split and reduction

Hence, the data set should not be split in such a way that the minimum acceptable sample size is violated. Besides the data set split, this paragraph also discusses an analysis of the outliers. The most obvious split in the data set is between Food and Non-food (Homecare and Personal care products, hereafter named HPC) SKU’s. Food SKU’s follow a different sales pattern than HPC SKU’s: HPC SKU’s are slower-moving products and thus have much lower sales. Other potential splits are on retailer, product category and promo mechanism level. However, these splits result in sample sizes that are too small and/or in subsets whose demand patterns do not clearly differ, whereas Food and HPC products clearly differ in demand height and pattern. Hence, the sample is split on this level, which results in three models: All categories, Food categories and HPC categories.

Next, when exploring the outliers of a regression in which all cases are included, most of the outliers appear to originate from a limited number of products (see Appendix 3). More specifically, 15 out of the 24 outliers originate from the Magnum products. On closer inspection, the Magnum products have very high LF’s. The first two bars in Figure 6-1 are the LF’s of two of the three Magnum products in the sample and are considerably higher than the LF’s of the other products in the sample. Since the Magnum SKU’s are responsible for over half of the outliers and have a very high LF, the different data sets will be tested with and without the three Magnum SKU’s. The high Lift Factor of the Magnum products is probably the result of a combination of facts: Magnum is a very strong brand, it is an expensive brand with high absolute discounts in promotion, and it is not often in promotion. Nevertheless, the high average LF of Magnum remains remarkable compared with the average LF of the other Unilever brands. Besides the Magnum products, three other promotions have been deleted because their standardized residual exceeded 3.5 (Appendix 3).
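The residual-based outlier rule can be sketched as follows. This is an assumption-laden illustration: the residuals are hypothetical, and the standardization simply divides each residual by the standard deviation of all residuals, which may differ from the exact definition used by the statistical package behind Appendix 3.

```python
import statistics

def flag_outliers(residuals, threshold=3.5):
    """Return the indices of cases whose standardized residual exceeds the threshold."""
    sd = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r / sd) > threshold]

# Hypothetical regression residuals on the ln(LF) scale: 30 small ones and one large one.
residuals = [0.1, -0.1] * 15 + [2.0]

outlier_indices = flag_outliers(residuals)
print(outlier_indices)
```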


[Figure: bar chart "Average Lift factor sample SKU's"; vertical axis: average Lift Factor (0.00 to 60.00); horizontal axis: SKU (1 to 85)]

Figure 6-1: Lift Factors of the SKU’s in the sample

As a result, the 5 data sets in Table 6-1 will be tested, where a first split is made between Food and HPC categories and a second split is made by the inclusion or exclusion of Magnum products.

Categories        Data set number   Number of total cases   Cases calibration period   Cases validation period
All               Data set 1        1235                    989                        246
Food              Data set 2        482                     388                        94
HPC               Data set 3        753                     601                        152
All w/o Magnum    Data set 4        1211                    968                        243
Food w/o Magnum   Data set 5        458                     367                        91

Table 6-1: Data sets to be checked to find the best performing data set

6.3 Hypotheses effect size variables and data sets

Below, in Table 6-2, the hypotheses are formulated for the different independent variables and for the performance of the data sets relative to each other. These hypotheses will be checked using the results of the model on the different data sets. The number of plus or minus signs indicates the expected size of the effect on the promotional sales. All variables originate from literature; the source of each variable is given in the research of Van der Poel (2010a).


      Variable                          Effect  Explanation
• Hypotheses Promotion variables
H1    Display                           ++      A promotion placed on a display will have higher promotional sales.
H2    Folder                            ++      A promotion depicted in the folder will have higher promotional sales.
H3    TV support                        ++      A promotion shown on TV will have higher promotional sales.
H4    Holiday products                  +       Products which are expected to sell better in holiday weeks are expected to have higher promotional sales in a holiday week.
H5    Promolength                       ++      The longer a promotion, the higher the promotional sales.
H6    Absolute discount                 ++      A higher absolute discount will result in higher promotional sales.
H7    Percentual discount               ++      More percentual discount will result in higher promotional sales.
H8    Promo mechanism                           Ranked from the expected most positive to the most negative effect on the promotional sales: Four or five for X, three for X, two for X, SPO, free product, premiaat.
H9    Number of products in promotion   -       A promotion with more products in the same promotion will result in lower promotional sales per SKU.
• Hypotheses Retailer variables
H10   Retailer                                  Unknown which retailer will have a positive or negative effect.
H11   Growth # of selling points        ++      More extra selling points will result in higher promotional sales.
• Hypotheses Brand variables
H12   Repeat buyers                     -       A higher percentage of repeat buyers indicates a larger group of loyal consumers and likely a lower LF.
H13   Promo pressure                    +       A high promo pressure means relatively low base sales. Hence, promotional pressure increases the LF, since this measure is dependent on the base.
H14   LF former promotions SKU          ++      Higher historical LF’s of a SKU indicate higher promotional sales.
H15   Market penetration                -       When a higher number of consumers already buys the product, there will be fewer consumers who switch to this product in promotion.
H16   Preservability                    +       Products with a longer preservability will have higher promotional sales.
H17   Size of product                   -       The more space a product consumes, the lower the susceptibility to stockpiling, which is likely to result in a lower LF.
H18   Frequency of purchase             -       The higher the frequency of purchase, the lower the susceptibility to stockpiling, which is likely to result in a lower LF.
H19   Product category                          Unknown which product category will have a positive or negative effect.
H20   Winter products temp.             -       Promotions in weeks with a low temperature will have a higher LF for “winter” products.
H21   Summer products temp.             +       Promotions in weeks with a high temperature will have a higher LF for “summer” products.
• Hypotheses different datasets
H22   Dataset 1 & 2 vs. dataset 4 & 5           The exclusion of the Magnum products will increase the model fit for dataset 4 & 5.
H23   Dataset 3 & 5 vs. dataset 4               Breaking down the data set in Food and HPC categories makes the data sets more specific and will result in a higher model fit for dataset 3 & 5.

Table 6-2: Hypotheses on the effect sizes of the variables and the performance of the different data sets


6.4 Measurement indicators hypotheses

Before the results are discussed in the next chapter, this paragraph specifies generally accepted measurement indicators to test the hypotheses. The measurement indicators can be divided into indicators of model performance and indicators of variable performance. The model performance is tested with two measurement indicators, the (adjusted) R-square and the MAPE. The R-square and adjusted R-square are calculated according to formulas 6-1 and 6-2. The (adjusted) R-square is a widely used measure of the goodness of fit of a linear regression model and is used in other research on promotion forecasting as well (Van Loo (2006), Van den Heuvel (2009), Wittink et al. (1988)). For the validation period of the models in this research only the R-square is reported, because the adjusted R-square has no meaning when a model from a calibration period is fitted on a validation period.

$$R^2 = \frac{SS_{\text{model}}}{SS_{\text{total}}} \qquad \text{(Formula 6-1)}$$

$$\text{Adjusted } R^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} \qquad \text{(Formula 6-2)}$$

Here SS_model is the sum of squares explained by the model, SS_total is the total sum of squares of the dependent variable, n is the number of cases and p is the number of predictors included in the model.
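Formula 6-2 can be checked numerically with a short sketch. The function name is illustrative; the input values (R-square 0.635, 989 cases, 21 predictors) are those reported for data set 1 in the calibration period in Table 7-2.

```python
def adjusted_r_square(r_square, n, p):
    """Adjusted R-square for n cases and p predictors (Formula 6-2)."""
    return 1 - (1 - r_square) * (n - 1) / (n - p - 1)

# Data set 1, calibration period: R-square 0.635, n = 989, p = 21.
adjusted = adjusted_r_square(0.635, 989, 21)
print(round(adjusted, 3))
```

The small gap between the R-square and its adjusted value is the quantity referred to in paragraph 6.1 as a final check on generalizability.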

The other measure of model performance is the MAPE (mean absolute percentage error). The MAPE states the absolute percentage error of the forecast for every single promotion and averages this over the total data set. Hence, the MAPE measures the average absolute error of multiple forecasts and thus the inaccuracy of these forecasts. The most widely used MAPE measure uses the actual sales as the basis (formula 6-3). However, within Unilever the MAPE is based on the forecast and the maximum MAPE value is limited to 100% for each individual promotion (formula 6-4). Formula 6-5 transforms the MAPE (i.e. the forecast inaccuracy) into the forecast accuracy, where a lower MAPE percentage corresponds to a higher forecast accuracy.

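$$\text{MAPE}_{\text{actual}} = \frac{1}{n}\sum_{i=1}^{n} \frac{|\text{actual sales}_i - \text{forecast}_i|}{\text{actual sales}_i} \times 100 \qquad \text{(Formula 6-3)}$$

$$\text{MAPE}_{\text{Unilever}} = \frac{1}{n}\sum_{i=1}^{n} \min\left(\frac{|\text{forecast}_i - \text{actual sales}_i|}{\text{forecast}_i},\ 100\%\right) \times 100 \qquad \text{(Formula 6-4)}$$

$$\text{Forecast accuracy} = 100\% - \text{MAPE} \qquad \text{(Formula 6-5)}$$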
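The difference between the two MAPE definitions can be made concrete with a minimal sketch. The function names and the three example promotions are hypothetical; the Unilever variant is written here on the assumption that the cap of 100% applies to each promotion's error before averaging, as described in the text.

```python
def mape_actual(actuals, forecasts):
    """MAPE with the actual sales as the basis, in percent."""
    errors = [abs(a - f) / a for a, f in zip(actuals, forecasts)]
    return 100 * sum(errors) / len(errors)

def mape_unilever(actuals, forecasts):
    """MAPE with the forecast as the basis, each error capped at 100%, in percent."""
    errors = [min(abs(f - a) / f, 1.0) for a, f in zip(actuals, forecasts)]
    return 100 * sum(errors) / len(errors)

def forecast_accuracy(mape):
    """Forecast accuracy in percent: 100% minus the MAPE."""
    return 100 - mape

# Hypothetical promotions: actual sales and forecasts in units.
actuals = [100.0, 200.0, 50.0]
forecasts = [80.0, 250.0, 20.0]

mape_a = mape_actual(actuals, forecasts)
mape_u = mape_unilever(actuals, forecasts)
print(f"MAPE (actuals): {mape_a:.1f}%, MAPE (Unilever): {mape_u:.1f}%")
print(f"Forecast accuracy (Unilever): {forecast_accuracy(mape_u):.1f}%")
```

In the third promotion the forecast-based error is 150%, which the cap reduces to 100%; this shows how the two measures can diverge for strongly under-forecasted promotions.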

To test the variable performance, one would like to know the effect size and direction a single

variable has in the total model. Linear regression models indicate the effect size and direction of a single variable with the Beta coefficients (β), which are placed in front of every variable in the equation. However, the Beta coefficient is an unstandardized measure, since the scale of each variable differs (e.g. the dummy variable for folder versus the LF former promotions SKU). Therefore, the effect size and direction will be judged on the standardized Beta coefficients, which are corrected for the different scale of each variable.
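The relation between an unstandardized and a standardized coefficient can be sketched as below, using the standard identity Beta = B × (standard deviation of the predictor) / (standard deviation of the dependent variable). The inputs are rounded values reported in the results chapter (the standard deviations from Table 7-1 and the coefficient of ln_LF_former_promotions_EAN for data set 4 from Table 7-3), so the outcome is only approximate: the descriptive statistics cover all 1211 cases, while the model is calibrated on a subset.

```python
def standardized_beta(b, sd_x, sd_y):
    """Standardized coefficient: B rescaled by the ratio of standard deviations."""
    return b * sd_x / sd_y

sd_ln_lf = 0.68          # Std. Dev. of ln_LF_promotions (Table 7-1)
sd_ln_lf_former = 0.38   # Std. Dev. of ln_LF_former_promotions_EAN (Table 7-1)
b_ln_lf_former = 0.600   # B for data set 4 (Table 7-3)

beta = standardized_beta(b_ln_lf_former, sd_ln_lf_former, sd_ln_lf)
print(round(beta, 3))
```

The result is close to the standardized coefficient of 0.336 reported for this variable in data set 4.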

Conclusion Part 2: This part depicted how to model the promotions of Unilever. In total 23 independent variables will be used to forecast the natural logarithmic value of the LF of the promotional demand (sub research question 3). Multiple linear regression is chosen as the best method to forecast promotional demand (sub research question 4).


Part 3: Results full model

After defining the goals of the research and specifying the design of the research, this part will test the results of the model design. It will analyze which variables are most important for a promotion forecasting model and what the performance of the model is. This part serves as the starting point for the creation of a model which can be used within the Unilever organization. The number of variables included in the full model in this chapter is large, but the full model points out the variables that should be included in the forecasting model for Unilever.

7 Regression analyses full model

The following chapter discusses the results of the full model for the five different data sets. The model is calibrated with the promotions of 2009 and validated with the promotions occurring in quarter 1 of 2010.

7.1 Overview most important dependent and independent variables

In Table 7-1 the descriptive statistics of the (un)transformed dependent variable and the most important continuous variables of the research are depicted. The second column shows the number of observations (N), which has been reduced from 1238 promotions to 1211, because 3 outliers and 24 Magnum promotions have been excluded for data sets 3, 4 and 5 (paragraph 6.2). Data sets 1 and 2 still include the Magnum promotions. The third column shows the minimum⁴ and the fourth column the maximum. The last three columns show the mean, standard deviation and variance of the variables. The mean over all the LF’s is 6.07, meaning that on average a promotion sells 6.07 times the baseline sales within the sample.

4 The LF of three promotions in the data set is lower than one. An LF lower than one is very uncommon in promotions, since in that situation a promotion would sell less than the normal baseline sales. Therefore, for the three cases where the LF is lower than one, the LF has been changed to one.


Variable                      N     Min   Max    Mean  Std. Dev.  Var.
LF_promotions                 1211  1.00  49.8   6.07  5.16       26.62
ln_LF_promotions              1211  0.00  3.91   1.56  0.68       0.46
ln_LF_former_promotions_EAN   1211  0.32  2.97   1.35  0.38       0.14
Display                       1211  0.00  100.0  52.3  26.4       696
Percentual_discount           1211  0.00  57.0   24.2  16.0       255.7
Absolute_discount             1211  0.00  3.53   0.66  0.64       0.41

Table 7-1: Descriptive statistics most important variables in the model
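The construction of the transformed dependent variable can be sketched as follows, on the assumption that the LF is the promotional sales divided by the baseline sales (as the interpretation of the mean LF suggests). The function name is illustrative; the floor of one mirrors the correction described for the three promotions with an LF below one.

```python
import math

def ln_lift_factor(promo_sales, baseline_sales):
    """Natural logarithm of the Lift Factor, with the LF floored at one."""
    lift_factor = max(promo_sales / baseline_sales, 1.0)
    return math.log(lift_factor)

# A promotion selling less than the baseline is treated as LF = 1, so ln(LF) = 0.
ln_lf_low = ln_lift_factor(80.0, 100.0)

# The largest LF in the sample (49.8) on the logarithmic scale.
ln_lf_max = ln_lift_factor(49.8, 1.0)

print(ln_lf_low, round(ln_lf_max, 2))
```

The two results correspond to the minimum and maximum of ln_LF_promotions in Table 7-1.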

7.2 Checking the assumptions underlying multiple linear regression

Next, the most important assumptions for linear regression are verified (Field, 2005):
• Normality of dependent variable
• Multicollinearity
• Normality of the error distribution
• Homoscedasticity (constant variance) of the errors
• Linearity of the relationship between dependent and independent variables
• Independence of the errors

A verification of the assumptions can be found in Appendix 4. The assumption analysis is performed on data set 1, which includes all promotions of 2009. It is assumed that when the assumptions are met for this data set, they will be met for the other four data sets as well, since each of the other data sets is a large subset of data set 1. The analysis in Appendix 4 shows that all of the assumptions regarding a linear regression analysis are met.

7.3 Results full model

In this paragraph, first the model performance for the different data sets is discussed and second the effect size and direction of the individual variables. Table 7-2 depicts the model summary of the different data sets. The number of predictors varies between 14 and 21. The adjusted R-square values range between 0.575 and 0.704, indicating that the model fit differs between the five data sets. Data sets 1 & 2, with Magnum products included, have a lower model fit than data sets 4 & 5: Table 7-2 shows lower adjusted R-square values and higher MAPE values for data sets 1 & 2 (H22 confirmed). This indicates that the variability of Magnum products worsens the model fit. Furthermore, the adjusted R-square of data sets 3 & 5 is slightly higher than that of data set 4 (0.704 and 0.698 against 0.691). However, the difference is too small to confirm that splitting the cases into a data set for Food and a data set for HPC results in a higher model fit (H23 unconfirmed).


calibration period (2009)

                       All (1)   Food (2)   HPC (3)   All w/o Magnum (4)   Food w/o Magnum (5)
sample size            989       388        601       968                  367
number of predictors   21        16         14        19                   18
R-square               0.635     0.593      0.711     0.697                0.713
adjusted R-square      0.627     0.575      0.704     0.691                0.698
MAPE (actuals)         31.7%     36.3%      27.2%     27.9%                26.3%
MAPE (Unilever)        29.7%     32.5%      26.8%     27.0%                24.9%

Table 7-2: Model summary of the full model

Concluding, the model fit of the models without Magnum products is better than that of the models in which the Magnum products are included. However, splitting up the total data set into a Food and an HPC data set does not result in an obviously better performance.

Besides the performance of the overall model, the cause of the model fit is very interesting as well, i.e. which independent variables in the model are responsible for the model fit. The coefficients of the independent variables (B), accompanied by their significance level and standardized coefficients (Beta), are depicted in Table 7-3.

                                  All (1)             Food (2)            HPC (3)             All w/o Magnum (4)  Food w/o Magnum (5)
                                  B         Beta      B         Beta      B         Beta      B         Beta      B         Beta
(Constant)                        3.697               3.923               2.799               3.401               4.621
Display                           0.007     0.253*    0.004     0.165*    0.009     0.301*    0.008     0.317*    0.007     0.328*
Folder                            0.497     0.179*    0.726     0.299*    0.234     0.071*    0.467     0.180*    0.681     0.343*
TV_support                        a         a         a         a         0.311     0.07*     a         a         a         a
Holiday_products                  a         a         a         a         a         a         a         a         a         a
Promo_length                      0.572     0.353*    0.761     0.087**   0.535     0.405*    0.593     0.398*    1.015     0.146*
log_absolute_discount             0.676     0.146*    a         a         0.473     0.134**   0.562     0.132*    a         a
Percentual_discount               0.021     0.464*    0.014     0.163*    0.023     0.651*    0.022     0.543*    0.022     0.313*
SPO b                             0.153     0.074**   a         a         0.185     0.073**   0.207     0.107*    0.203     0.135**
Two_for b                         0.292     0.202*    0.199     0.129*    0.248     0.169*    0.300     0.224*    0.332     0.264*
Three_for b                       0.258     0.152*    0.152     0.075     0.232     0.154*    0.237     0.152*    0.262     0.160**
Free_product b                    a         a         a         a         a         a         a         a         a         a
Premiaat b                        a         a         a         a         a         a         a         a         a         a
Number_of_products_in_promotion   0.001     0.132*    0.004     0.156*    a         a         0.001     0.105*    0.003     0.162*
C1000 c                           0.090     0.048**   a         a         0.122     0.058**   0.168     0.096*    0.190     0.134*
Plus c                            0.179     0.102*    0.336     0.192*    a         a         0.136     0.083*    0.235     0.166*
Kruidvat c                        0.503     0.327*    a         a         0.507     0.391*    0.509     0.36*     a         a
log_growth_number_selling_points  3.815     0.306*    4.254     0.184*    3.714     0.356*    3.802     0.332*    4.373     0.231*
Percentage_repeat_buyers          a         a         a         a         a         a         a         a         a         a
Promotion_pressure                0.003     0.053     0.007     0.126**   a         a         a         a         a         a
ln_LF_former_promotions_EAN       0.798     0.414*    0.856     0.453*    0.663     0.337*    0.600     0.336*    0.491     0.323*
Market_penetration                a         a         a         a         a         a         a         a         a         a
Preservability                    0.001     0.291*    0.001     0.296*    a         a         0.001     0.168*    0.001     0.247*
log_size_of_product               a         a         a         a         a         a         0.112     0.057     0.362     0.180*
Frequency_of_purchase             0.043     0.059     0.098     0.132*    a         a         0.053     0.079*    0.115     0.193*
Personalcare d                    a         a         a         a         0.086     0.055     a         a         a         a
Ice_and_beverages d               0.310     0.133*    0.776     0.440*    a         a         0.164     0.069*    0.361     0.236*
SCC_and_vitality_shots d          0.594     0.173*    0.454     0.185*    a         a         0.318     0.101*    0.396     0.204*
Savoury_and_dressings d           0.146     0.090**   a         a         a         a         a         a         a         a
winter_products_temp              a         a         a         a         a         a         a         a         a         a
summer_products_temp              0.042     0.306*    0.052     0.519*    a         a         a         a         a         a

* = significant at a 0.01 significance level
** = significant at a 0.05 significance level
a The variable is not significant for this data set.
b The baseline group for the different promo mechanisms is the mechanism “Four_or_five_for”
c The baseline group for the different retailers is the retailer “Albert Heijn”
d The baseline group for the different product groups is the product group “Homecare”

Table 7-3: Unstandardized and standardized Beta coefficients with significance level for all 5 data sets

Based on the Beta coefficients in Table 7-3, the hypotheses drawn in paragraph 6.3 will be discussed. The promotional variables Display, Folder and TV_support were expected to have a very positive effect on the promotional sales. Display indeed has a very positive effect; Folder has a positive effect, although smaller than that of Display; and TV_support only has an effect in the HPC data set (H1 and H2 confirmed and H3 rejected). Because of the unexpected result for the variable TV_support, an extra analysis is performed. Since data on TV_support are only available for promotions at Albert Heijn, it is worth checking whether the variable is significant when it is tested on all promotions in the sample at Albert Heijn. Appendix 5 depicts the results of this simple linear regression model. The standardized Beta coefficient has a value of 0.141 and is significant at a 0.003 level, which is still not very high. Since collinearity with other variables in a full model is likely to decrease this effect size, it is concluded that the effect size is medium to small. Therefore, in this research TV_support is not considered an important variable for the model. However, when full information is available for all retailers, a new analysis is needed to test this conclusion.

The promotion variables Holiday_products and Promo_length were suspected to be positively correlated with the promotional sales. No significant effect is found at all for Holiday products.


Products like luxury ice cream do sell more in holiday periods; however, this effect is apparently lost or hard to find for promotions in holiday periods. The variable Promo_length indeed has a high standardized Beta coefficient, especially for the HPC model and for the data sets in which all promotions are included. The effect in the Food data set is minor, since almost all promotions in this data set have a duration of 1 week (H4 rejected and H5 confirmed).

Regarding the discount on a promotion, both log_absolute_discount and percentual_discount are expected to have a highly positive influence on promotional sales. However, only percentual_discount confirms this hypothesis; log_absolute_discount has no influence in two of the data sets and a significant negative influence in three of them. The significant negative influence of log_absolute_discount on the LF in data sets 1, 3 and 4 contradicts the hypothesis. Since a higher absolute discount is very likely to result in higher promotional demand, this result requires further investigation. When this variable is the only independent variable in the model, the impact becomes positive with a Beta of 0.480 and a significance level of 0.000 (Appendix 6). Hence, correlation effects with other independent variables are responsible for the negative effect in the final model. The highest correlation in the correlation matrix (Appendix 7), 0.759 between the variables percentual_discount and log_absolute_discount, is likely to be responsible for the negative effect in the full model⁵. Therefore, the variables percentual_discount and log_absolute_discount should not be included in the same model. These results are consistent with the results of Van Loo (2006) and Van den Heuvel (2009), where the absolute discount had no impact or a negative impact on the promotional sales. However, so far the absolute discount for a promotion is calculated per product. Another option is to calculate the absolute discount per offer, since a consumer is probably sensitive to the total discount received on an offer. Appendix 6 depicts individual linear regression analyses in which the percentual discount, the absolute discount per offer and the absolute discount per product⁶ are compared. Remarkably, both absolute discount measures result in a higher model fit and have a higher standardized Beta value. However, when the absolute discount per offer is included in the full model, the effect reverses and becomes small (Appendix 6). Furthermore, a threshold effect could occur for the absolute discount per offer, i.e. consumers are only willing to go to a retailer especially for a promotion if the total discount received per offer is high enough. Appendix 6 tests this effect as well, for SPO promotions and for all promotion mechanisms together. However, no threshold effect is discovered. Altogether, the percentual discount might be a better predictor because both absolute discount variables correlate too much with other variables. This is reflected by the last table in Appendix 6, where the full models are fitted with the

5 All the other correlations in the correlation matrix in appendix 7 are below 0.8 as well (according to Field (2005) a correlation of 0.8 or higher indicates a multicollinearity problem). 6 No transformation is applied to the absolute discount per offer and the absolute discount per product, to simplify the comparison and because the log transformation improves the results only slightly.

three different discount variables. The model with percentual discount clearly has a higher model fit than the other two models (0.689 against 0.654 and 0.651). Concluding, in a full model the percentual discount is a better predictor than the absolute discount per offer or per product. The two types of predictor do not function together in one model, because the Beta coefficient of the absolute discount variable turns negative, which makes the model harder to interpret (H6 rejected and H7 confirmed).
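The rule of thumb used to screen the correlation matrix can be sketched as follows. The Pearson correlation is computed by hand to keep the example self-contained; the two short series are hypothetical, and the 0.8 threshold is the one attributed to Field (2005) in the footnote. Note that the discussion above shows the limits of this rule: a correlation of 0.759 stays below the threshold, yet the sign of one coefficient still reverses in the full model.

```python
import math

def pearson_correlation(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def exceeds_threshold(r, threshold=0.8):
    """True when a correlation signals a multicollinearity problem by the 0.8 rule."""
    return abs(r) >= threshold

# Hypothetical predictor values (not thesis data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]

r_example = pearson_correlation(x, y)
print(round(r_example, 3), exceeds_threshold(r_example))

# The reported correlation between percentual_discount and log_absolute_discount.
print(exceeds_threshold(0.759))
```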

For the next hypothesis, on the different promotion mechanisms, the result is less conclusive. The promotion mechanism Four_or_five_for_X functioned as the baseline variable and has the most positive impact on the promotional sales. SPO has a result very similar to Four_or_five_for_X, and the mechanisms Two_for_X and Three_for_X have the most negative result. For the variables Premiaat and Free_product no effect is found, which could indicate that their effect size is similar to that of the baseline group (Four_or_five_for_X) or that their effect is insignificant. One would expect promotions with a price discount to sell better than promotions which offer a premiaat or an (unrelated) free product. It could be that this effect is already captured by the variable percentual_discount, since both Free_product and Premiaat have no percentual discount. This can be tested by running a regression analysis on all promotions of 2009 in which the promo mechanism variables are the only independent variables. The results of this analysis are depicted in Appendix 8, where the variable Four_or_five_for_X is maintained as the baseline variable. Indeed, this confirms the hypothesis that the promo mechanisms Free_product and Premiaat have the most negative/least positive impact on promotional sales. However, an SPO promotion still sells better than a Two_for_X or Three_for_X promotion, which was not expected in advance (H8 partly confirmed). The result for the variable Number_of_products_in_promotion corresponds with the hypothesis that more products in the same promotion negatively affect the promotional sales (H9 confirmed).

Regarding the retailer variables, similar promotions sell better at C1000, average at Albert Heijn and lower at Plus and Kruidvat. However, the retailer dummy variables are not significant for all data sets. C1000 and Plus have a significant effect in four of the five data sets; Kruidvat has a strong significant effect in all data sets that include Non-food products. No clear hypothesis was drawn beforehand for these variables (H10 not tested). Another variable which relates directly to the retailer is the number of selling points at which a promotion is sold. If the number of selling points in a promotion is higher than the usual number of selling points, then the promotional sales are higher. The effect of the extra number of selling points is very strong (H11 confirmed).

Next, the effect of the brand variables will be discussed. Both the percentage_of_repeat_buyers and the promotion_pressure of a SKU have almost no impact on the promotional sales. An explanation might be that the measures are not directly related to a promotion; therefore, clear effects decrease.


(H12 and H13 rejected). The LF_of_former_promotions does have a very strong positive effect on the promotional sales, meaning that when a SKU had high promotional sales in the past, it is more likely to have high promotional sales in the future (H14 confirmed). For the variable market_penetration no effect is found, meaning that the promotional sales are not affected by the penetration a product has in the market (H15 rejected).

The variables preservability, size_of_product and frequency_of_purchase are included in the research to describe the susceptibility to stockpiling. Products with a longer preservability, a smaller size and a lower frequency of purchase are thought to be more susceptible to stockpiling and to have higher promotional sales. Indeed, a longer preservability and a lower frequency of purchase result in higher promotional sales, especially in the Food categories. This might be caused by a more frequent shopping pattern for Food categories, which makes the variable frequency of purchase more important. Also, the fact that preservability is of less importance in the HPC categories explains the lack of effect in the HPC data set (H16 and H18 confirmed). The size of a product positively affects the promotional sales. This is not in line with the hypothesis and could be the result of the higher value of large products, which coincides with a higher absolute discount on the product (H17 rejected).

For the different categories (Homecare, Personalcare, Savoury & Dressings, Ice & Beverages and Spreads & Cooking) no conclusive results are found over the different data sets for the direction and magnitude of the categories. Only for the categories Ice_and_beverages and SCC_and_vitality_shots a medium effect size is found, where Ice_and_beverages has a negative impact on promotional sales and SCC_and_vitality_shots has a positive impact on promotional sales. Again as with the retailer it was unclear in advance which effects should be expected (H19 not tested).

For the variables winter_products_temp and summer_products_temp the effect of the temperature is tested on temperature sensitive products. Temperature is expected to have a positive effect on summer products since sales is expected to be higher at higher temperatures and a negative effect on winter products, since sales is expected to be higher at lower temperatures. However, no effects are found for both variables. A reason could be that seasonality effects are already taken into account and the extra temperature differences per week are not significant enough to be found. A more detailed research on temperatures for temperature sensitive products would most likely find an effect. However, because of the inclusion of all products, temperature does not produce a better forecast (H20 and H21 rejected).

7.4 Validation full model

The results in the previous paragraph sketched the performance of the full model fitted on the promotional data sets of 2009. To test the robustness of the model, the promotions of 2010 are forecasted with the same variables and coefficients as in the model of 2009. Consequently, the sample size and number of predictors in the validation period are equal to the calibration period. The results of this robustness check are depicted in Table 7-4. The R-square 7 of the data sets is slightly higher than the R-square in the calibration period. Hence, it can be concluded that the model fitted on the calibration data sets is robust and generalizable to other data periods. Furthermore, the data sets without the Magnum products have a higher R-square and a lower MAPE (in line with H22). The more specific HPC and Food data sets do not generate better results in both data sets (H23 not confirmed).

Validation period (Q1 2010)

                        All (1)   Food (2)   HPC (3)   All w/o Magnum (4)   Food w/o Magnum (5)
Sample size             246       94         152       243                  91
Number of predictors    21        16         14        19                   18
R-square                0.676     0.574      0.742     0.713                0.711
MAPE (actuals)          33.9%     34.9%      31.8%     31.4%                28.8%
MAPE (Unilever)         34.6%     42.3%      29.1%     31.6%                32.9%

Table 7-4: Model summary validation period for the full model
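The robustness check summarized in the table can be sketched in Python. This is a minimal illustration, not the thesis's implementation: it assumes that MAPE (actuals) divides the absolute error by the actual sales and that MAPE (Unilever) divides it by the forecast (an assumption inferred from how the two measures are characterized in paragraph 8.1). The function names and the sample numbers are hypothetical.

```python
def r_square(actuals, forecasts):
    """Coefficient of determination of fixed forecasts on hold-out data."""
    mean_actual = sum(actuals) / len(actuals)
    ss_res = sum((a - f) ** 2 for a, f in zip(actuals, forecasts))
    ss_tot = sum((a - mean_actual) ** 2 for a in actuals)
    return 1.0 - ss_res / ss_tot


def mape_actuals(actuals, forecasts):
    """Mean absolute error relative to the actual sales."""
    return sum(abs(a - f) / a for a, f in zip(actuals, forecasts)) / len(actuals)


def mape_forecast(actuals, forecasts):
    """Mean absolute error relative to the forecast (assumed 'Unilever' variant)."""
    return sum(abs(a - f) / f for a, f in zip(actuals, forecasts)) / len(actuals)


# Hypothetical validation promotions: actual sales and forecasts produced
# with coefficients that were fitted on the calibration period only.
actuals = [100.0, 200.0, 400.0]
forecasts = [125.0, 160.0, 400.0]
print(round(r_square(actuals, forecasts), 3))
print(round(mape_actuals(actuals, forecasts), 3))
print(round(mape_forecast(actuals, forecasts), 3))
```

The essential point is that no coefficient is re-estimated on the validation data; only the error measures are recomputed.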

8 Generalizability of model results

In this chapter the generalizability of the model results will be discussed. First, the generalizability of the sample size taken within Unilever will be discussed. Second, the research is compared with other research in the field. The goal of this chapter is to check if the results are generalizable within Unilever and if the results are consistent with other research in the field. If not, further investigation will be done.

8.1 Generalizability of sample size

The sample size within Unilever is defined on the dimensions retailer, time, region and products. Of these, the choices for region and time are assumed not to disturb the sample size, since the region is the whole of the Netherlands and the time horizon is longer than 1 year. The retailers in the sample size are all among the larger retailers in the Netherlands. Furthermore, the retailers included in the sample are the most important retailers for Unilever in terms of volume and thus also for the Unilever wide forecast accuracy. Because of their high sales, their impact on the safety stock levels of Unilever is higher than that of smaller retailers. Hence, it is concluded that the retailers in this research form a solid representative base for the sample size.

Lastly, the selection of SKU’s included in the sample size is analyzed. In paragraph 3.4 the criteria for including SKU’s are stated. The sample size selection has been taken over all different product categories of Unilever. However, it would still be possible that the sample size selection is not representative for all products of Unilever. Especially the third criterion in paragraph 3.4, that more high volume SKU’s should be included, could cause an unrepresentative sample size. One way of checking the effect of this assumption is to analyze the sample size on ABC classification. The ABC classification is a method used within Unilever to rank SKU’s on their importance. In this classification A SKU’s are high volume, high turnover and high gross profit SKU’s, and C SKU’s are low volume, low turnover and low gross profit SKU’s 8. The ABC classification is not made over all products of Unilever at once, but separately for the five different categories named in paragraph 1.2. Figure 8-1 shows the normal partition within Unilever and the partition in the sample size. Within the sample size the A SKU’s are overrepresented, and the B SKU’s and the C SKU’s are underrepresented.

7 No adjusted R-square is reported for the validation period, because the adjusted R-square only makes sense in the calibration period.

[Figure: two pie charts of the ABC partition based on volume. Unilever: A-products 20%, B-products 60%, C-products 20%. Sample size: A-products 51%, B-products 42%, C-products 7%.]

Figure 8-1: ABC partition for all products of Unilever and for the sample size (based on volume)

The next step is to analyze what the effect of this deviation from the normal situation is on the performance. Figure 8-2 depicts the full model MAPE values of data set 4 (all promotions without Magnum products) for the A, B and C SKU’s. The C SKU’s perform the worst, the A SKU’s are in the middle and the B SKU’s perform best. One would expect the MAPE values to be lowest for A SKU’s because of the higher sales volumes of these SKU’s. Normally, higher volumes should result in a decrease of variance. Table 8-1 shows the average and the variance of the LF’s for the SKU classification. Interestingly, the variance for A SKU’s is higher than the variance of the other SKU’s, which explains the difference in MAPE values. The difference in variance is very large, meaning that A SKU’s contain a lot more variance than B or C SKU’s. A closer look at the data suggests that the very large LF’s of a SKU contribute heavily to the total variance of that SKU. Table 8-1 indeed shows that A SKU’s contain more very high LF’s (20 or higher) than B and C SKU’s. This could be caused by the fact that A SKU’s are more often heavily promoted and the forecasting model might not be able to adequately forecast such heavy promotions.

Another remarkable issue in Figure 8-2 is that the MAPE (actuals) value is higher for A SKU’s than the MAPE (Unilever) value. For the C SKU’s this is the other way around. This arises from the fact that in the calculation of the MAPE (actuals) overforecasting is punished more heavily and in the calculation of the MAPE (Unilever) underforecasting is punished more severely. This means that A SKU’s tend to be overforecasted and C SKU’s tend to be underforecasted in the model.

8 The ABC classification is based on these three criteria. However, the sales departments have the final call over the ABC classification.

[Figure: bar chart of MAPE (actuals) and MAPE (Unilever) for All, A, B and C SKU’s; vertical axis from 0.0% to 40.0%]

Figure 8-2: Comparison of MAPE values data set 4 over the ABC classification
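The asymmetry between the two MAPE measures can be illustrated with a single promotion. The sketch below assumes, as the text implies but does not define in this chapter, that MAPE (actuals) divides the absolute error by the actual sales and that MAPE (Unilever) divides it by the forecast. The function names and numbers are hypothetical.

```python
def ape_on_actual(actual, forecast):
    """Absolute percentage error relative to the actual sales."""
    return abs(actual - forecast) / actual


def ape_on_forecast(actual, forecast):
    """Absolute percentage error relative to the forecast (assumed definition)."""
    return abs(actual - forecast) / forecast


# Hypothetical promotion with actual sales of 100 units.
over = (ape_on_actual(100, 200), ape_on_forecast(100, 200))   # forecast too high
under = (ape_on_actual(100, 50), ape_on_forecast(100, 50))    # forecast too low
print(over)   # (1.0, 0.5): the actual-based error is the larger one
print(under)  # (0.5, 1.0): the forecast-based error is the larger one
```

Under these assumed definitions, a data set whose actual-based MAPE exceeds its forecast-based MAPE is dominated by forecasts that are too high, which matches the reading given for the A SKU’s.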

SKU type   Average   Variance   Average number of       Number of promotions with
                                promotions per SKU      a LF higher than 20 per SKU
A          7.51      34.95      19.31                   0.938
B          6.37      12.87      13.81                   0.269
C          6.15      16.82      12.80                   0.200

Table 8-1: The average, variance, number of promotions and number of high LF’s on the ABC classification
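The statistics in the table can be derived from a list of promotions as sketched below. The record layout, the use of the sample variance and the inclusive threshold of 20 are assumptions for illustration; the data are hypothetical and not taken from the Unilever data set.

```python
from collections import defaultdict
from statistics import mean, variance


def summarise_by_class(promotions, high_lf=20.0):
    """promotions: iterable of (abc_class, sku_id, lift_factor) tuples."""
    lifts = defaultdict(list)
    skus = defaultdict(set)
    for abc_class, sku_id, lf in promotions:
        lifts[abc_class].append(lf)
        skus[abc_class].add(sku_id)
    summary = {}
    for abc_class, values in lifts.items():
        n_skus = len(skus[abc_class])
        summary[abc_class] = {
            "average_lf": mean(values),
            "variance_lf": variance(values),  # sample variance (assumption)
            "promotions_per_sku": len(values) / n_skus,
            "high_lf_per_sku": sum(lf >= high_lf for lf in values) / n_skus,
        }
    return summary


# Hypothetical promotions: (ABC class, SKU identifier, lift factor).
promotions = [
    ("A", "sku1", 4.0), ("A", "sku1", 24.0), ("A", "sku2", 8.0),
    ("B", "sku3", 5.0), ("B", "sku3", 7.0),
]
for abc_class, stats in summarise_by_class(promotions).items():
    print(abc_class, stats)
```

A single very high LF, such as the 24.0 in the hypothetical A class, is enough to inflate the class variance, which is the mechanism the text describes.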

Concluding, the sample size does deviate from the total Unilever product portfolio on the ABC classification. However, this has no clear implication for the performance and generalizability of the model. Furthermore, it has been reasoned that the choices of time horizon, region and retailer are made in such a way that the sample size is generalizable. One other aspect which could disturb the sample size is the exclusion of SKU’s which have been sold for less than a year. Newer products tend to be more difficult to forecast, because of the lack of stable base line sales and the lack of comparable historical promotions. This will hold for the model as well as for the current promotion forecasting process of Unilever. It is difficult to determine what the impact is on new products. This research is focussed on more stable products, since the effect size of the independent variables is more easily determined for these promotions.

8.2 Comparison with other research in the field

In van der Poel (2010a) the available research on promotion forecasting was split in two parts. The first part covered theoretical research papers and the second part covered more practical master theses. To judge if the results of this research are comparable, on what aspects the research differs and what implications these differences have for the result of the forecasting model, a comparison will be made in this paragraph. Both the theoretical research papers and the practical master theses will be included in this comparison. The advantage of the research papers is that the approach is more scientific and the advantage of the master theses is that the model and model performance have been described more extensively. The following research will be included in the comparison:
• Cooper et al: PromoCast™: A New Forecasting Method for Promotion Planning.
• Wittink et al: SCAN*PRO: the estimation, validation and use of promotional effects based on scanner data (internal paper).
• Van Loo: Out-of-Stock reductie van actieartikelen, Model voor vraagvoorspelling en logistieke aansturing van actieartikelen bij Schuitema/C1000.
• Van den Heuvel: Action products at Jan Linders Supermarkets.

Table 8-2 compares the different research papers on promotion forecasting. All four other papers take a retailer point of view. Furthermore, the SCAN*PRO and Promocast models are directed at the store level of a retailer instead of the supply chain level. All methods use linear regression. Regarding the performance of the models, the paper on the Promocast model does not contain any comparable performance measures, since the authors measure the number of case packs missed. For the other models the performance measures differ considerably. The adjusted R-square of the models of Van Loo and Van den Heuvel is similar, while the adjusted R-square of this research is substantially higher. Regarding the MAPE, the models of Van der Poel and Van Loo perform similarly. However, the MAPE calculation of Van Loo is not based on the absolute sales number but on a transformation of the LF. Since this transformation brings the values of the dependent variable closer together, this measure understates the real MAPE values (based on absolute sales).


                                      Van der Poel        SCAN*PRO-model      Promocast           Van Loo             Van den Heuvel
Year                                  2010                1988                1999                2006                2009
Point of view                         Manufacturer        Retailer            Retailer            Retailer            Retailer
Commercial use                        No                  Yes                 Yes                 No                  No
Aggregation level                     Supply chain        Store level         Store level         Supply chain        Supply chain
Method                                Linear regression   Linear regression   Linear regression   Linear regression   Linear regression
Ln LF as dependent var.               Yes                 Yes                 Yes                 No                  No
Sample size                           1238                20801               n.a.                1556                n.a.
Average LF                            6.08                n.a.                n.a.                9.04                4.59
Variance                              26.95               n.a.                n.a.                29.93               7.84
Standard deviation                    5.19                n.a.                n.a.                5.47                2.80
Minimum                               1.00                n.a.                n.a.                1.13                1.00
Maximum                               49.38               n.a.                n.a.                34.00               14.28
Adjusted R-square                     0.691 a             0.507 b             n.a.                0.45                0.44
MAPE validation period (full model)   31.3%               37.1%               n.a.                31.1% c             n.a.

a: The adjusted R-square of the full model of data set 4 is taken here.
b: MAPE value of SCAN*PRO model in research Van Loo (2006).
c: The MAPE calculation in the research of Van Loo seems to be based on the ln of the LF. This calculation understates the MAPE based on absolute promotional demand.

Table 8-2: Comparison research on promotions forecasting

All models in Table 8-2 differ substantially in performance 9. To investigate where this difference in performance originates, Table 8-3 shows an overview of the most important variables included in the research. The current research is taken as the frame of reference. The SCAN*PRO-model is a concise model, in which only a few important variables are taken into account. The Promocast model is by far the most elaborate model, with 67 independent variables. This model makes extensive use of LF’s of former promotions and, since the model is directed at the store level, the promotion database is a lot larger. The model of Van Loo does not include the variables display and folder, which are included in all other models and are among the most important variables. The model of Van den Heuvel includes the most important variables and contains some interesting research on the effect of other actions in the same product category and the effect of Out-of-Stocks.

9 The performance of the Promocast model is not available for the R-square and MAPE measures. The paper on that model only states the performance in case pack size difference on retailer store level.


The adjusted R-square is known for four of the five models. The performance of the model built in this research is considerably higher than that of the other models. Here, the underlying factors for this difference will be discussed. The adjusted R-square of the model in this research might be higher because Van Loo did not include the critical variables display and folder. Furthermore, Van Loo, the SCAN*PRO-model and Van den Heuvel did not include all of the following variables: promo mechanism, the average LF of former promotions, the number of products in promotion, the growth of the number of selling points, the size of a product, preservability and TV support. Finally, Van Loo did not transform the LF as dependent variable and thus the dependent variable is not normally distributed. This has a very negative impact on the performance of the model. Altogether, the Promocast model is the most sophisticated model regarding the included variables. However, the results of this model cannot be compared and the model is directed at the store level of a retailer.

Van der Poel   SCAN*PRO-model   Promocast   Van Loo   Van den Heuvel
Retailer x n.a. n.a. n.a. n.a.
Product category x x x
LF former promotions SKU x x
Display x x x x
Folder x x x x
Promo-length x x All 1 week All 1 week
Promo mechanism x x x
tv-support x x
Number of products in promotion x
Growth # of selling points x n.a. n.a.
Percentual discount x x x x x
Size of product x
Preservability x x
Number of actions in same product group No data x
More specific data on display location No data x
More specific data on size and place folder advertisement No data x
LF former promotions SKU with matching advertisement and display Not enough data x

n.a.: Not applicable in this model because the model is built at a single retailer or the model is built on store level

Table 8-3: Comparison of the variables included in the different promotion forecasting research

Lastly, Van Loo (2006) used a different dependent variable than the other research. In his research Van Loo fitted a lognormal distribution on the LF’s and then used the cumulative lognormal probability of each LF as dependent variable (P(LF)). Van Loo concluded that this measure gave superior results compared to the LF of a promotion; however, no comparison was made with another dependent variable that is widely used in literature, the ln transformation of the LF. Appendix 9 shows the results if the P(LF) is used in the model of this research instead of the ln(LF). The new dependent variable is tested on the full model on data set 4. The results indicate that the P(LF) indeed gives superior results compared to the LF. However, the P(LF) has a lower model fit than the ln(LF). An explanation might be that the ln(LF) meets the requirements of a normal distribution better than the P(LF), as shown in appendix 9.
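The two candidate dependent variables can be sketched as follows. This is an illustration under assumptions: the lognormal distribution is fitted simply by taking the mean and sample standard deviation of ln(LF), which may differ from the fitting procedure Van Loo used, and the lift factors are hypothetical.

```python
from math import log
from statistics import NormalDist, mean, stdev


def ln_transform(lift_factors):
    """ln(LF), the dependent variable used in this research."""
    return [log(lf) for lf in lift_factors]


def p_lf_transform(lift_factors):
    """P(LF): cumulative lognormal probability of each LF, fitted on the sample."""
    logs = ln_transform(lift_factors)
    # A lognormal LF means ln(LF) is normal, so the lognormal CDF of LF
    # equals the normal CDF of ln(LF) with the fitted parameters.
    fitted = NormalDist(mu=mean(logs), sigma=stdev(logs))
    return [fitted.cdf(x) for x in logs]


lift_factors = [1.0, 2.0, 4.0]  # hypothetical lift factors
print(ln_transform(lift_factors))
print(p_lf_transform(lift_factors))
```

P(LF) is bounded between 0 and 1, whereas ln(LF) is unbounded; that difference in range is one plausible reason why the two transformations behave differently in a linear regression.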

Concluding, this chapter provided insight into the generalizability of the model results by checking the assumptions underlying linear regression, analyzing the generalizability of the sample size and comparing the research with other relevant research in the field. The sample size is regarded as generalizable to the other Unilever SKU’s. Only the introduction of a new SKU will cause a deviation from the current sample size and will quite likely lower the performance. However, forecasting new SKU’s has always been difficult. Regarding the comparison with other research, the performance of the model constructed in this research is comparatively high. The inclusion of important variables in this research, which were not included in the comparable research, is very likely to be responsible for the good model fit.

Conclusion part 3: In this part the results for the full model were depicted. Both the model fit and forecast accuracy values are quite high for the full model. However, the full model on consumer demand level only provides the first part of the total picture. Because of the functional requirements stated in paragraph 2.4, the full model needs to be adapted to the retailer demand level. This will be done in the next part, so the model becomes useful in practice.


Part 4: Model adaptation

The results of the full model provided detailed insights into the effect size of the different variables and the performance of the different data sets. In order to translate these results into a model that can be used in practice, this part decreases the number of variables in the models. Then the models will be evaluated based on their model results. This gives insight into the performance of the reduced model against the full model and thus into the practical usefulness of the forecasting model.

There are three main reasons to adapt the full model of the previous part:
1. To increase the usability of the model in practice.
2. To correct for data availability in practice.
3. To adapt the model of consumer demand to retailer orders.

The first adaptation will result in a model with 5 to 10 variables, since this number of variables is still useful in practice (interviews Unilever). The number of variables has to be limited because an employee of Unilever should be able to work with the model quickly. The variables will be selected on their effect size and direction. The second adaptation will inspect the variables included in the first adaptation on data availability. The variables which are normally not known within Unilever will be deleted from the set of variables. Hence, adaptation two includes the same variables as adaptation one without the variables with low data availability. In the third adaptation, the consumer demand will be adjusted to the retailer orders. The consumer demand serves as the basis for the discussion with the retailer and for the On Shelf Availability of a product, but in the end the retailer orders form the real demand that should be met within Unilever.

The adapted models will be tested on data sets 3, 4 and 5, because the disturbing effect of the Magnum products was too large to include these products in the further analysis.


9 Adaptations to increase the usability and check for data availability

9.1 Adaptation 1: Increase the usability by reducing the number of variables

To reduce the number of variables in the full model some criteria are needed. The goal is to reduce the number of variables to fewer than ten and to analyze the impact of this reduction on the performance of the model. The criteria to select the variables are:
1. A strong effect in the three data sets for the full model, i.e. an average standardized Beta of 0.150 or higher over the three data sets.
2. A persistent effect in the three data sets for the full model, i.e. the standardized Beta does not have an opposing direction in the three data sets.
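The two selection criteria can be expressed as a short filter. The sketch below is illustrative only: it assumes that the strength criterion applies to the magnitude of the average standardized Beta, and the variable names and Beta values are hypothetical rather than taken from the full model.

```python
def select_variables(betas, threshold=0.150):
    """betas: {variable: [standardized Beta in each of the three data sets]}."""
    selected = []
    for name, values in betas.items():
        average = sum(values) / len(values)
        # Criterion 1: strong effect (magnitude of the average Beta; assumption).
        strong = abs(average) >= threshold
        # Criterion 2: persistent effect, i.e. no opposing signs across data sets.
        persistent = all(v >= 0 for v in values) or all(v <= 0 for v in values)
        if strong and persistent:
            selected.append(name)
    return selected


# Hypothetical standardized Betas for data sets 3, 4 and 5.
example = {
    "strong_and_stable": [0.31, 0.31, 0.35],
    "weak": [0.05, 0.02, 0.04],
    "strong_but_unstable": [0.40, -0.30, 0.50],
}
print(select_variables(example))
```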

Analysis of the standardized Beta coefficients of the full model in Table 7-3 leaves nine variables which meet these criteria: Display, Folder, Promo_length, Percentual_discount, Two_for_X, Three_for_X, log_growth_number_selling_points, ln_LF_former_promotions_EAN and Kruidvat. Since SPO is significant as well and falls under the same variable (promo mechanism) as Two_for_X and Three_for_X, this variable is also included. The same argument holds for the retailers C1000 and Plus, which fall under the same variable as Kruidvat, namely retailer. Table 9-1 shows the model results of the calibration and validation period. The number of variables is larger than the functional requirement of 10 variables. However, the variables SPO, Two_for_X and Three_for_X as well as the variables Kruidvat, C1000 and Plus are dummy variables for the variables promo mechanism and retailer, and each group can be regarded as one variable in practice, since an employee only needs to complete one data field. Hence, the number of variables comes to 8.
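The remark that an employee only needs to complete one data field per dummy group can be sketched as follows. The dummy names are taken from the text; the reference levels (the value for which all dummies are zero, here "Albert Heijn" and the hypothetical label "Other") are assumptions made for illustration.

```python
# Dummy columns named in the reduced model. The reference categories
# (the level that gets all zeros) are assumptions made for this sketch.
RETAILER_DUMMIES = ("C1000", "Plus", "Kruidvat")
MECHANISM_DUMMIES = ("SPO", "Two_for_X", "Three_for_X")


def encode_field(value, dummy_names):
    """Turn one categorical data field into 0/1 dummy variables."""
    return {name: int(value == name) for name in dummy_names}


def encode_promotion(retailer, mechanism):
    """Two data fields filled in by the user become six regression inputs."""
    row = encode_field(retailer, RETAILER_DUMMIES)
    row.update(encode_field(mechanism, MECHANISM_DUMMIES))
    return row


print(encode_promotion("Kruidvat", "Two_for_X"))
print(encode_promotion("Albert Heijn", "Other"))  # all zeros: reference levels
```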

                     Calibration period                                       Validation period
                     HPC (3)    All w/o Magnum (4)   Food w/o Magnum (5)      HPC (3)    All w/o Magnum (4)   Food w/o Magnum (5)
Sample size          601        968                  367                      152        243                  91
# of predictors      11 / (8)   12 / (8)             11 / (8)                 11 / (8)   12 / (8)             11 / (8)
R-square             0.704      0.676                0.646                    0.723      0.703                0.663
Adjusted R-square    0.698      0.671                0.635
MAPE (actuals)       27.6%      28.8%                29.7%                    32.8%      31.3%                30.6%
MAPE (Unilever)      27.2%      27.8%                27.8%                    30.2%      31.1%                34.6%

Table 9-1: Summary of the reduced model with a limited number of variables (adaptation 1)

Logically, compared with the full model the adjusted R-square values decrease, since fewer variables are used to fit the data. Furthermore, data set 3 performs slightly better than data set 4 and data set 5; however, this difference is minor, with MAPE values ranging from 27.2% to 29.7% in the calibration period. In the full model the performance of the HPC data set was slightly worse than the performance of the Food data set. This is probably caused by the fact that most promotions of Home and Personal care products occur at Kruidvat, where no information is available for the variable display. Since this information was already unavailable in the full model, the decrease of the model fit in data set 3 is smaller than the decrease in data sets 4 and 5. The B-coefficients and standardized Beta coefficients of the data sets are depicted in Table 9-2. The variables Display, Promo_length, Percentual_discount, Kruidvat, ln_LF_former_promotions_EAN and log_growth_number_selling_points have the largest influence in the model (standardized Beta values of 0.3 or higher in data set 4). Furthermore, all variables are highly significant in all three data sets, except for Kruidvat, which of course has no effect in the Food data set.

                                   HPC (3)             All w/o Magnum (4)  Food w/o Magnum (5)
                                   B        Beta       B        Beta       B        Beta
(Constant)                         2.685               3.162               3.804
Display_1                          0.009    0.309*     0.008    0.311*     0.007    0.354*
Folder                             0.299    0.09*      0.414    0.16*      0.542    0.273*
Promo_length                       0.534    0.404*     0.594    0.399*     1.079    0.155*
Percentual_discount                0.018    0.524*     0.020    0.483*     0.021    0.302*
SPO                                0.201    0.079*     0.243    0.125*     0.188    0.125
Two_for                            0.260    0.178*     0.287    0.214*     0.256    0.204*
Three_for                          0.229    0.152*     0.209    0.134*     0.190    0.116
C1000                              a        a          0.210    0.12*      0.300    0.211*
Plus                               0.099    0.055**    0.102    0.063*     0.126    0.089**
Kruidvat                           0.556    0.428*     0.468    0.331*     a        a
log_growth_number_selling_points   3.714    0.356*     4.145    0.362*     4.062    0.215*
ln_LF_former_promotions_EAN        0.617    0.314*     0.615    0.345*     0.602    0.397*

* = significant with a 0.01 significance level
** = significant with a 0.05 significance level
a The variable is not significant for this data set

Table 9-2: B and standardized Beta coefficients for the reduced model with limited variables

Concluding, in the first adaptation the number of variables has decreased from 19 to 12 (based on data set 4), whilst the model fit has hardly decreased. This is promising news for the implementation phase.


9.2 Adaptation 2: Increase the usability by checking for data availability

In the second adaptation of the full model, the variables will be checked for data availability. In order to use the model in practice, the variables in the model should be readily available to employees of Unilever. If not, the process of using the model will be too time consuming, unclear or not possible. As a starting point the variables included in adaptation 1 are used. For the variables Display and Log_growth_number_of_selling_points Unilever has no or limited information. Regarding the variable Display, Unilever often does not know if and in how many stores a specific product has a second placement. Regarding the growth of the number of selling points, Unilever receives very limited information from a retailer about the number of selling points. The variables included in the model are: Folder, Promo_length, Percentual_discount, SPO, Two_for, Three_for, C1000, Plus, Kruidvat, and ln_LF_former_promotions_EAN. Table 9-3 depicts the model results of the calibration and validation period. Again the dummy variables coding the variables retailer and promo mechanism can be regarded as one in practice. Hence, 6 variables are included in each data set 10.

                     Calibration period                                       Validation period
                     HPC (3)    All w/o Magnum (4)   Food w/o Magnum (5)      HPC (3)    All w/o Magnum (4)   Food w/o Magnum (5)
Sample size          601        968                  367                      152        243                  91
# of predictors      10 / (6)   9 / (6)              7 / (6)                  10 / (6)   9 / (6)              7 / (6)
R-square             0.548      0.505                0.517                    0.505      0.496                0.560
Adjusted R-square    0.540      0.501                0.508
MAPE (actuals)       35.8%      36.8%                35.2%                    49.0%      46.6%                34.4%
MAPE (Unilever)      33.2%      33.0%                31.1%                    36.1%      38.5%                39.2%

Table 9-3: Summary of the reduced model corrected for data availability (adaptation 2)

Again the adjusted R-square decreases, since fewer variables are used to fit the data. As a result, the MAPE values in the calibration period increase. The MAPE values in the validation period show similar results. The different data sets have a very similar model fit in the calibration period, with the Food model performing slightly better. However, in the validation period the Food model performs considerably better than the HPC model on the MAPE (actuals), but not on the MAPE (Unilever). When the MAPE (actuals) value is higher than the MAPE (Unilever) value, this indicates overforecasting, since overforecasting is punished more heavily in the MAPE (actuals) (see paragraph 8.1). When the MAPE (actuals) value is lower than the MAPE (Unilever) value, this indicates underforecasting. In this case the promotions for the HPC data set are slightly overforecasted and the promotions for the Food data set are slightly underforecasted. Overall, the performance of the model decreased considerably because of the exclusion of the variables with limited data availability (Display and Log_growth_number_of_selling_points), and the impact is more severe on the HPC data set. The B-coefficients and standardized Beta coefficients of the data sets are depicted in Table 9-4. The variables Folder, Promo_length, Percentual_discount and Kruidvat have the largest influence in the model (standardized Beta of 0.150 or higher in data set 4).

10 The variables excluded in some of the data sets are dummy variables which fall under retailer or promo mechanism. Therefore, the number of variables can be 6 for each data set.

                                   HPC (3)             All w/o Magnum (4)  Food w/o Magnum (5)
                                   B        Beta       B        Beta       B        Beta
(Constant)                         0.828               1.316               1.845
Folder                             0.436    0.132*     0.652    0.251*     0.789    0.398*
Promo_length                       0.572    0.433*     0.647    0.434*     1.036    0.149*
Percentual_discount                0.022    0.632*     0.023    0.553*     0.019    0.266*
SPO                                0.202    0.08**     0.189    0.097*     0.126    0.084
Two_for                            0.223    0.152*     0.187    0.139*     0.169    0.135*
Three_for                          0.154    0.103      a        a          a        a
C1000                              0.170    0.08**     0.244    0.139*     0.326    0.229*
Plus                               0.261    0.146*     0.111    0.068**    a        a
Kruidvat                           0.377    0.291*     0.268    0.189*     a        a
ln_LF_former_promotions_EAN        0.740    0.376*     0.746    0.419*     0.739    0.487*

* = significant with a 0.01 significance level
** = significant with a 0.05 significance level
a The variable is not significant for this data set

Table 9-4: Beta coefficients for the reduced model with limited information (adaptation 2)

9.3 Comparison of the different adaptations with the full model

This paragraph discusses how the different models in this chapter perform relative to each other and to the full model. In order to compare the different models, the results of the models are depicted in Table 9-5. The first conclusion is that the full model performs best on all measurements in the calibration period, followed by adaptation 1. Adaptation 2 performs the worst of all. In the validation period the full model and the model of adaptation 1 perform very similarly and again the model of adaptation 2 performs far worse. Overall, the exclusion of the less important variables has very limited or no effect at all on the model performance. However, the exclusion of two important variables (because of data availability) does have a substantial effect. Hence, Unilever should focus on obtaining data on all variables in adaptation 1.


                                      Full model   Adaptation 1: decrease   Adaptation 2: adjust for
                                                   number of variables      data availability
Calibration period  adj. R-square     0.691        0.671                    0.501
                    MAPE (actuals)    27.9%        28.8%                    36.8%
                    MAPE (Unilever)   27.0%        27.8%                    33.0%
Validation period   MAPE (actuals)    31.4%        31.3%                    46.6%
                    MAPE (Unilever)   31.6%        31.1%                    38.5%

Table 9-5: Comparison results data set 4 for the full and reduced models predicting consumer demand

10 Model adaptation 3: From consumer demand to retailer orders

The goal of this chapter is to check if the variables used to forecast consumer demand also predict the retailer orders in a satisfactory manner. In order to do so, the variables of the full model will be fitted on the retailer orders to gain insight into the difference between consumer demand and retailer orders. The first paragraph explains the calculation of the retailer orders for a single promotion. Thereafter, the full model of chapter 7 is fitted on the retailer orders to analyze how accurately the variables forecast the retailer orders.

10.1 Calculation retailer orders

The retailer orders connected to a certain promotion are delivered to the distribution centre (DC) of a retailer over multiple weeks. Furthermore, a retailer still orders products for its base demand in the weeks prior to the promotion. This increases the complexity of connecting the retailer orders to a certain promotion on the shopping floor. However, a promo indicator shows which retailer orders can be connected to the promotional sales. Furthermore, almost all promotional orders are delivered in the period from two weeks before the promotion up to the promotion week itself. Therefore, retailer orders for a promotion are defined as orders with a promo indicator in week X-2, X-1 and X, with X as the promotion week. For the promotions of Kruidvat the retailer orders are not available, because no distinction is made between promotion orders and base orders for Kruidvat. For the other retailers the promotion orders are available. Hence, the following analysis will only be done for Albert Heijn, C1000 and Plus and not for Kruidvat.
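The definition of the retailer orders for a promotion can be sketched as a simple filter over order lines. The record layout and the numbers are hypothetical, and weeks are treated as plain integers, so a promotion in the first weeks of a year would need additional handling of the year boundary.

```python
def promotional_orders(orders, promo_week):
    """Sum the orders that carry a promo indicator in weeks X-2, X-1 and X.

    orders: iterable of (week, quantity, has_promo_indicator) tuples.
    promo_week: week number X in which the promotion runs.
    """
    window = {promo_week - 2, promo_week - 1, promo_week}
    return sum(qty for week, qty, promo in orders if promo and week in window)


# Hypothetical order lines of one SKU at one retailer.
orders = [
    (10, 50, False),   # base order, no promo indicator
    (10, 120, True),   # week X-2
    (11, 200, True),   # week X-1
    (12, 80, True),    # promotion week X
    (13, 40, True),    # after the promotion week, outside the window
]
print(promotional_orders(orders, promo_week=12))  # 400
```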

10.2 Model fit on retailer orders

The calculated retailer orders are used to determine the LF of a promotion. The LF is calculated according to formula 10-1, where the base line sales are still based on the consumer demand. However, the numerator of the fraction has changed from consumer demand to retailer orders.


\[ \text{Lift Factor} = \frac{\text{Retailer orders}}{\text{Base line sales}} \qquad \text{(Formula 10-1)} \]
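As an illustration of how this lift factor connects to a forecast in units, the sketch below computes the LF from retailer orders and converts a predicted ln(LF) back into an order quantity. It assumes that the regression predicts ln(LF), as in the consumer demand model; the function names and numbers are hypothetical.

```python
from math import exp, log


def lift_factor(retailer_orders, base_line_sales):
    """LF of a promotion with retailer orders in the numerator."""
    return retailer_orders / base_line_sales


def forecast_orders(predicted_ln_lf, base_line_sales):
    """Convert a predicted ln(LF) back into a forecast in units."""
    return base_line_sales * exp(predicted_ln_lf)


# Hypothetical promotion: 600 units ordered against base line sales of 100.
lf = lift_factor(600.0, 100.0)
print(lf)
print(forecast_orders(log(lf), 100.0))
```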

Table 10-1 depicts the results for the calibration and validation period. Contrary to the models which predicted the consumer demand, the model which predicts the retailer orders is not robust. This is especially true for the HPC data set, where the R-square value decreases from 0.358 in the calibration period to 0.000 in the validation period. This means that taking the average of the dependent variable predicts the promotional sales just as badly as the model does. For the Food data set, the model fit is a lot better, with a MAPE (actuals) in the validation period of 33.5%. Hence, the added variability in the food categories is a lot lower than the added variability in the HPC categories. The causes which increase the variability will be discussed in the next paragraph. Concluding, other methods have to be analyzed for the forecast of retailer orders, since the direct forecast of the retailer orders is too inaccurate, is not robust and does not provide a basis to discuss the expected demand of a promotion with a retailer.

                              Calibration period                  Validation period
                              HPC (3)     All w/o     Food w/o    HPC (3)     All w/o     Food w/o
                                          Magnum (4)  Magnum (5)              Magnum (4)  Magnum (5)
sample size                   289         641         352         82          172         90
# of predictors               15          15          14          15          15          14
R-square                      0.358       0.403       0.538       0.000       0.150       0.577
adjusted R-square             0.323       0.389       0.519
MAPE (actuals)                77.8%       70.1%       48.1%       285.2%      126.7%      33.5%
MAPE (Unilever)               42.2%       36.9%       30.5%       48.9%       40.0%       34.9%

Table 10-1: Model results with retailer orders as dependent variable
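As a consistency check on the table above, the adjusted R-square values can be reproduced from the reported R-square, sample size and number of predictors. The sketch below uses the standard textbook adjustment for a linear model with an intercept; that this research used exactly this formula is an assumption, although the calibration-period numbers agree with it to three decimals. The function name is hypothetical.

```python
def adjusted_r_square(r_square, sample_size, n_predictors):
    """Standard adjusted R-square for a linear model with an intercept."""
    dof = sample_size - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r_square) * (sample_size - 1) / dof


# Calibration-period columns of the table: (R-square, sample size, predictors)
calibration = {
    "HPC (3)": (0.358, 289, 15),
    "All w/o Magnum (4)": (0.403, 641, 15),
    "Food w/o Magnum (5)": (0.538, 352, 14),
}
adjusted = {
    name: round(adjusted_r_square(*values), 3)
    for name, values in calibration.items()
}
# adjusted maps the three data sets to 0.323, 0.389 and 0.519
```

The penalty is largest for the HPC data set, which has the smallest sample relative to its 15 predictors.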

10.3 Difference between retailer orders and consumer demand

In the last paragraph the independent variables were fitted on the retailer orders instead of the consumer demand. It turned out that the model was less capable of predicting the retailer orders than the consumer demand. To gain insight into the difference between retailer orders and consumer demand, this paragraph calculates the percentage difference between both, in the hope of finding an alternative way to adapt the consumer demand to retailer orders. The consumer demand is known and the retailer orders were calculated in the first paragraph of this chapter. Table 10-2 displays the absolute difference and the non-absolute difference, where the non-absolute difference would be zero on average if the retailer orders and consumer demand were similar (see formula 10-2 and 10-3). However, as expected, the retailer orders are larger than the consumer demand. Moreover, the difference between the retailers is quite large, from 39.9% up to 86.8%. Albert Heijn has the lowest difference and Plus the highest.

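\[ \text{Absolute difference} = \frac{\lvert \text{retailer orders} - \text{consumer demand} \rvert}{\text{consumer demand}} \qquad \text{(Formula 10-2)} \]

\[ \text{Difference} = \frac{\text{retailer orders} - \text{consumer demand}}{\text{consumer demand}} \qquad \text{(Formula 10-3)} \]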

                  Absolute difference       Difference
                  consumer demand &         consumer demand &
                  retailer orders           retailer orders
All               67.7%                     53.9%
Albert Heijn      58.0%                     39.9%
Plus              97.5%                     86.8%
C1000             53.6%                     46.0%
Kruidvat          n.a.                      n.a.

Table 10-2: Difference between consumer demand and retailer orders for each retailer
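The two difference measures can be illustrated with a small Python sketch. The promotion figures below are hypothetical, and averaging the per-promotion ratios (rather than taking the ratio of totals) is an assumption, since the aggregation method is not spelled out here.

```python
def absolute_difference(retailer_orders, consumer_demand):
    """Absolute deviation of retailer orders relative to consumer demand."""
    return abs(retailer_orders - consumer_demand) / consumer_demand


def difference(retailer_orders, consumer_demand):
    """Signed deviation of retailer orders relative to consumer demand."""
    return (retailer_orders - consumer_demand) / consumer_demand


def average(values):
    values = list(values)
    return sum(values) / len(values)


# Hypothetical promotions as (retailer orders, consumer demand) pairs.
promotions = [(1500, 1000), (800, 1000), (2400, 2000)]

avg_abs = average(absolute_difference(r, c) for r, c in promotions)
avg_signed = average(difference(r, c) for r, c in promotions)
```

Because over-ordering and under-ordering cancel out in the signed measure but not in the absolute one, the average absolute difference is never smaller than the average signed difference, which matches the pattern in the table.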

But why do the retailer orders differ from the consumer demand? Figure 10-1 depicts the most plausible disturbing factors. First, forward buy could result in higher retailer orders. Retailers invest in forward buy because of the lower purchase price they pay for a product when the product is on promotion. However, most of the retailers included in this research receive their discount on the purchase price on the basis of scanning data (consumer demand). This is the case for Albert Heijn and C1000. Plus still received full discount for all the products they ordered in promotion up to January 2010. This is a reasonable explanation for the large difference between the consumer demand and retailer orders for a promotion at Plus. Second, the DC stock levels and store stock levels have an influence on the retailer orders. When there is a lot of stock available in the stores and/or DC of a retailer, they will order fewer products for an upcoming promotion. Especially when the promotion intensity is high, stock levels can be high as a result of earlier promotions. Third, the consumer sales vary across the different retailer stores. Since a retailer does not want to be out of stock in any of its stores, a safety margin in each retailer store is needed to deal with the variance in sales among the stores. This results in extra retailer orders of approximately 10% to 20% of the consumer demand (interviews Unilever). Fourth, an inaccurate retailer forecast results in a deviation between consumer demand and retailer orders. Retailers will always be inclined to build in extra safety stock, since they are punished more heavily and directly for out-of-stocks than for stock costs. Lastly, the promotional displays are a lot larger than the normal displays and need to be full until the end of the promotion period. Therefore, more products (stock) are needed on the shelf than normal, and this stock needs to be ordered on top of the expected consumer demand.

The influence the disturbing factors have on the promotional orders is described in the literature as the bullwhip effect. Lee et al. (1997) state in their paper about the bullwhip effect that the information transferred in the form of orders tends to be distorted and can misguide upstream members in their inventory and production decisions. In particular, the variance of orders may be larger than that of sales, and the distortion tends to increase as one moves upstream in the supply chain.
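A common way to quantify this effect is the ratio of the variance of orders to the variance of sales, where a value above 1 signals amplification. The sketch below is only an illustration with hypothetical weekly figures; the function name is an assumption and this measure was not computed in this research.

```python
from statistics import pvariance


def bullwhip_ratio(orders, sales):
    """Variance of orders divided by variance of sales (> 1 means amplification)."""
    sales_variance = pvariance(sales)
    if sales_variance == 0:
        raise ValueError("sales must vary to compute a ratio")
    return pvariance(orders) / sales_variance


# Hypothetical weekly consumer sales and the retailer orders placed upstream.
sales = [100, 110, 90, 105, 95]
orders = [100, 130, 70, 115, 85]

ratio = bullwhip_ratio(orders, sales)  # 450 / 50 = 9 for these numbers
```

Both series have the same mean, so the ratio isolates the extra variability introduced by the ordering process rather than any difference in volume.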

[Figure: customer demand is transformed into retailer orders under the influence of the following disturbing factors]
• Forward buy
• DC stock levels
• Store stock levels
• Allocation of products over retailer stores
• Inaccurate retailer forecast
• Larger promotional displays

Figure 10-1: Disturbing factors which cause a difference between consumer demand and retailer orders

The disturbing factors seem to have a larger influence on the HPC promotions than on the Food promotions, since a model to forecast the retailer orders performs a lot worse for the HPC data set (see Figure 10-2). So, the connection between consumer demand and retailer orders is a lot weaker for HPC than for Food promotions. Furthermore, Figure 10-2 depicts the MAPE values for the ABC classification for all promotions of 2009/2010 without Magnum products. The A SKUs clearly perform better than the B SKUs and C SKUs. The B SKUs also perform better than the C SKUs. This is in contrast to the MAPE values of the consumer demand forecast, where the A, B and C SKUs performed very similarly. A likely explanation for the better performance of Food and A SKUs is the law of large numbers. The sales of Food SKUs and A SKUs are far larger than the sales of HPC SKUs, B SKUs and C SKUs (see Figure 10-3). Hence, the variation in the retailer orders is lower, i.e. it is more likely that a retailer places promotional orders which deviate from the forecast when the sales volume is lower.
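One hedged way to make the law-of-large-numbers argument concrete is the following sketch. It assumes, purely for illustration and not as a result of this research, that the promotional volume V of an SKU is the sum of n independent purchases, each with mean \mu and standard deviation \sigma. The relative variability (coefficient of variation, CV) of the volume is then:

```latex
\mathrm{CV}(V)
  = \frac{\sqrt{\operatorname{Var}(V)}}{\mathbb{E}[V]}
  = \frac{\sigma\sqrt{n}}{n\mu}
  = \frac{\sigma}{\mu\sqrt{n}}
```

Under these assumptions the relative variability shrinks with the square root of the volume, which is in line with the lower percentage errors observed for the high-volume Food and A SKUs.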

[Figure: bar chart of MAPE (actuals) and MAPE (Unilever), on a scale from 0% to 180%, for Food, HPC, A, B and C SKUs]

Figure 10-2: MAPE values of the retailer orders for data set 4 for Food, HPC, A, B and C SKUs

[Figure: bar chart of average promotional sales, on a scale from 0 to 60,000, for Food, HPC, A, B and C SKUs]

Figure 10-3: Average consumer demand per promotion for Food, HPC, A, B and C SKUs

This chapter showed that a model which predicts retailer orders directly does not lead to satisfactory results. Especially for HPC SKUs and C SKUs the model performance declines substantially when looking at the MAPE (actuals) values. Hence, directly forecasting retailer orders is concluded not to be an appropriate approach. Therefore, another approach will be taken, where the consumer demand is raised by the average difference between consumer demand and retailer orders. This way the consumer demand is still used as a starting point.

Conclusion part 4: This part adapted the full model in three ways. In adaptation 1, a limited number of variables is included in the model. This adaptation has a model fit which is almost as good as that of the full model. The second adaptation, where two important variables without data availability are removed from the model of the first adaptation, has a substantially worse model fit. In the last adaptation the full model is fitted on retailer orders instead of consumer demand. This resulted in a remarkably lower model fit, especially for HPC SKUs. In conclusion, data availability for the more significant variables of the model is highly important for the performance of the model, and directly forecasting the retailer orders does not give a satisfying result (sub research question 5). Therefore, the next part will adapt the consumer demand to retailer orders in another way.

Part 5: Implementation and conclusions

[Figure: thesis roadmap with the blocks Project definition (Needs), Research Design (Methods), Model results, Model adaptation, and Implementation & Conclusions]

The previous parts laid out the outline for the research, tested which variables have a large effect on promotional demand and tested the effect of the adaptations necessary to use the model within the Unilever organization. In this part the implementation and conclusions are discussed. First, it is discussed what kind of model should be implemented within the organization and which implementation steps are needed to forecast the promotional demand more accurately. Second, the findings of this research, the managerial implications and the contribution of the research to science are discussed.

11 Implementation

This chapter connects the different parts of this research, so that a promotion forecasting model is created which fulfils the practical requirements of Unilever (paragraph 2.4). The first paragraph uses the forecast for consumer demand to arrive at a forecast for retailer orders. Because this forecast approach does not result in a satisfactory forecasting accuracy, alternative steps need to be taken. Hence, the second paragraph discusses the different implementation steps which are needed to reach a higher forecasting accuracy.

11.1 Final model for implementation

The final model is based on the consumer demand and corrected to retailer orders. In this way the good forecast results on the consumer demand level are used as the basis to forecast the retailer orders. The smaller the deviation between the two, the less variation is added by the retailer order process and the better the forecast accuracy for retailer orders will be. In the next two paragraphs both the results for the consumer demand model with limited variables (adaptation 1) and the results for the consumer demand model corrected for data availability (adaptation 2) are used as input to forecast the retailer orders. Directly forecasting the retailer orders did not lead to satisfactory results (10.2). Therefore, the consumer demand is taken as the starting point here and adapted to retailer orders. The correction is made by multiplying the consumer demand by a correction factor based on the average difference between consumer demand and retailer orders, as shown in Figure 11-1. The difference is smallest for Albert Heijn and largest for Plus.

                  HPC       Food
Albert Heijn      1.37      1.25
C1000             1.40      1.49
Plus              1.38      2.06

[Figure: forecast customer demand × correction factor (table above) = forecast retailer orders]

Figure 11-1: Generation of forecast retailer orders based on the consumer demand model
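The correction step in the figure above amounts to a lookup and a multiplication. The sketch below uses the factors reported in the figure; the function name, the dictionary layout and the example forecast of 10,000 units are hypothetical.

```python
# Correction factors per (retailer, category) as reported in the figure above.
CORRECTION_FACTORS = {
    ("Albert Heijn", "HPC"): 1.37,
    ("Albert Heijn", "Food"): 1.25,
    ("C1000", "HPC"): 1.40,
    ("C1000", "Food"): 1.49,
    ("Plus", "HPC"): 1.38,
    ("Plus", "Food"): 2.06,
}


def forecast_retailer_orders(consumer_forecast, retailer, category):
    """Scale a consumer demand forecast to a retailer order forecast."""
    try:
        factor = CORRECTION_FACTORS[(retailer, category)]
    except KeyError:
        raise ValueError(f"no correction factor for {retailer} / {category}") from None
    return consumer_forecast * factor


# Hypothetical consumer demand forecast of 10,000 units for a Food promotion.
ah_food = forecast_retailer_orders(10_000, "Albert Heijn", "Food")
plus_food = forecast_retailer_orders(10_000, "Plus", "Food")
```

A fixed factor per retailer and category only shifts the forecast level. It cannot capture promotion-to-promotion variation in the order process, which is consistent with the limited accuracy reported for HPC promotions in the following paragraphs.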

11.1.2 Results retailer orders (model adaptation 1 as basis)

The results for the retailer orders based on adaptation 1 are shown in Table 11-1. The model results clearly show that the retailer orders for HPC promotions are a lot more difficult to forecast than the retailer orders for Food promotions. The predicting power expressed by the R-square for HPC promotions is low in the calibration period (0.138; adjusted 0.103) and is zero in the validation period. This means that taking the average of all LFs would give a similar result. The predicting power for Food promotions is substantially higher in the calibration and validation period (0.411 and 0.681 respectively). Corresponding to the R-square results, the MAPE values are very high for the HPC promotions, whilst the MAPE values for the Food promotions are still quite good. The table also shows the Case fill without any safety stocks and the average left stock in weeks. A higher average left stock in weeks indicates overforecasting by the model and thus improves the Case fill. Average left stock numbers are again very different for HPC promotions and Food promotions, where the average left stock level of HPC promotions is a lot higher because of the variability in the retailer demand. When a retailer orders less than expected, the left stock in weeks increases, which occurs more often for HPC promotions than for Food promotions. In summary, the forecast accuracy for Food promotions is very acceptable, whilst the forecast accuracy for HPC promotions is not.

                              Calibration period                  Validation period
                              HPC (3)     All w/o     Food w/o    HPC (3)     All w/o     Food w/o
                                          Magnum (4)  Magnum (5)              Magnum (4)  Magnum (5)
R-square                      0.138       0.314       0.411       0.000       0.105       0.681
adjusted R-square             0.103       0.301       0.392
MAPE (actuals)                124.7%      87.1%       62.3%       293.5%      143.7%      28.3%
MAPE (Unilever)               39.5%       36.2%       33.0%       47.2%       38.0%       30.1%
Case fill                     83.6%       82.5%       81.4%       84.0%       82.3%       80.7%
Average left stock in weeks   1.97        1.42        0.99        3.62        2.17        0.94

Table 11-1: Results for retailer order forecast based on consumer demand model adaptation 1

11.1.3 Results retailer orders (model adaptation 2 as basis)

As mentioned before, limited or no information is available at Unilever on the variables Display and number_of_selling_points. Therefore, the forecast of the retailer orders is analyzed for the model where these variables are excluded (model adaptation 2). The results are very similar to those of the last paragraph: the forecast for the HPC promotions performs far worse than the forecast for Food promotions (Table 11-2). Furthermore, the exclusion of two important independent variables results in a worse model fit on all measurements. Hence, the lack of data on these two variables is an important aspect to focus on. This is consistent with the conclusion on consumer demand level, where the impact of the lack of data on the two important predictors was even larger.

                              Calibration period                  Validation period
                              HPC (3)     All w/o     Food w/o    HPC (3)     All w/o     Food w/o
                                          Magnum (4)  Magnum (5)              Magnum (4)  Magnum (5)
R-square                      0.077       0.214       0.333       0.000       0.056       0.623
adjusted R-square             0.043       0.202       0.320
MAPE (actuals)                127.2%      93.0%       67.8%       298.3%      145.6%      31.6%
MAPE (Unilever)               44.1%       40.3%       34.9%       49.8%       40.7%       33.2%
Case fill                     79.0%       80.9%       78.2%       82.6%       80.5%       74.2%
Average left stock in weeks   1.94        1.46        0.95        3.46        2.23        0.96

Table 11-2: Results for retailer order forecast based on consumer demand model adaptation 2

11.1.4 Conclusion results retailer orders based on model adaptation 1 & 2

Overall, the forecast accuracy on retailer order level is disappointing after the good model results on the consumer demand level. The variance caused by the retailer order process has such a disturbing influence that model results on the consumer demand level are of limited use on the retailer order level. This holds especially for the HPC and C SKUs and to a lesser extent for the Food, A and B SKUs (see Figure 11-2).

[Figure: bar chart of MAPE (actuals) and MAPE (Unilever), on a scale from 0% to 180%, for Food, HPC, A, B and C SKUs, each shown for adaptation 1 and adaptation 2]

Figure 11-2: MAPE values of the retailer orders for data set 4 with the consumer demand model adaptation 1 and 2 as basis

The adaptation of consumer demand to retailer orders showed how a model should ideally work in practice. However, the good results on consumer demand level are not reproduced on retailer order level. The effect of the disturbing factors between consumer demand and retailer orders is too large to neglect (sub research question 6). Hence, the performance on retailer order level is not good enough to directly implement the above retailer order models.

11.1.5 Actions needed to overcome current problems

The performance of the models proposed in this chapter to forecast retailer orders indicates that further actions need to be taken to improve the forecast accuracy that can be reached. First, the data management of the important promotion variables used to arrive at a promotion forecast should be improved. These variables should be made easily accessible and usable for analyzing the demand of upcoming promotions. Second, the performance difference between model adaptation 1 and 2 shows that the variables with low data availability, which are not included in model adaptation 2, have a high impact on the forecast accuracy. Hence, data on these variables should be obtained for upcoming promotions at a retailer. Third, the transformation from consumer demand to retailer orders causes a lot of extra variation. The models in this chapter showed that Unilever is currently not able to bridge the gap between consumer demand and retailer orders. Therefore, the factors causing the difference between consumer demand and retailer orders should be analyzed and included in a forecasting model. These future actions will be addressed in the implementation plan in the next paragraph.

11.2 Implementation plan

This paragraph describes which steps should be taken in the future to improve the promotion forecasting process. The steps are based on the results found in this research. The first block in Figure 11-3 depicts the current situation, in which no forecasting model is used for promotions; instead, Unilever employees use their own judgemental forecast. The first and second steps have been covered in this research, whilst the third and fourth steps are the future direction this research indicates to improve the promotion forecasting accuracy in the longer run. Every next step increases the alignment with a retailer on trust, strategy and co-management.

[Figure: staircase diagram with time on the horizontal axis and alignment with the retailer (trust, strategic, co-management) on the vertical axis. Steps (1) and (2) are marked as covered in this research, steps (3) and (4) as future direction given by this research. Preconditions apply before steps (3) and (4).]
(0) Current situation
(1) Ease of use: record important data in Promoplanner and use important data from Nielsen
(2) Data availability: start to collaborate with the retailer to enhance data availability and build trust
(3) Further collaborate to understand disturbing factors on retailer orders
(4) Generate a supply chain forecast which adjusts for disturbing factors on retailer orders

Figure 11-3: Implementation steps to increase the promotion forecast accuracy

The second block in the above figure depicts the first implementation step, which states that the available data within Unilever should be recorded and used better. Currently, promotions are recorded by the logistics employees in a program called Promoplanner. However, during the data gathering phase of this research, the promotional data available in this program turned out to be limited and sometimes incorrect: limited because important variables are not saved in the program, and incorrect because last-minute adaptations of a promotion are not always changed in Promoplanner. Another point is that the important variables Display and LF of former promotions on SKU level, which are used in this research, stem from the marketing database Nielsen. These variables should be linked to the promotions in Promoplanner. Accurate historical data is the basis for a forecasting model and thus is the first step. After this step Unilever is able to forecast the consumer demand according to the results of model adaptation 2, assuming data availability on all variables for upcoming promotions except for the two variables with low data availability (Display and growth number of selling points). The data which needs to be recorded accurately or linked from the marketing database Nielsen is:
1. Promotion mechanism
2. Percentage discount
3. Promotion length
4. Type of folder advertisement (location in folder and size of ad)
5. LF former promotions SKU
6. Display (second placement)
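The six data items above could be captured in a single promotion record, so that missing values are visible before a forecast is made. The sketch below is hypothetical: the class, the field names, the units and the example values are assumptions for illustration and do not describe the actual structure of Promoplanner or the Nielsen database.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class PromotionRecord:
    # Identification of the promotion (hypothetical keys).
    sku: str
    retailer: str
    promotion_week: int
    # The six data items listed above; None means "not recorded".
    promotion_mechanism: Optional[str]
    percentage_discount: Optional[float]
    promotion_length: Optional[int]        # unit assumed to be weeks
    folder_advertisement: Optional[str]    # location in folder and size of ad
    lf_former_promotions: Optional[float]  # linked from the marketing database
    display: Optional[float]               # second placement, linked as well


def missing_fields(record: PromotionRecord) -> List[str]:
    """Return the names of all data items that have not been recorded."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


example = PromotionRecord(
    sku="hypothetical-sku",
    retailer="Plus",
    promotion_week=12,
    promotion_mechanism="Two_for",
    percentage_discount=25.0,
    promotion_length=1,
    folder_advertisement="front page, large",
    lf_former_promotions=3.2,
    display=None,
)
gaps = missing_fields(example)
```

A check such as `missing_fields` makes it explicit which promotions can only be forecast with the reduced model because a variable such as Display is unavailable.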

The second step is to ensure that the important information on upcoming promotions is provided by the retailers in advance. The important promotion data (e.g. the height of the discount, folder, promotion mechanism and the week the promotion is held) should be agreed on with the retailer multiple weeks in advance. Currently, promotions often change or are cancelled altogether, which causes large deviations between the forecasted and actual retailer orders. Furthermore, two important variables to forecast a promotion are not available at all at Unilever (the 1st and 2nd variables below). Both variables have a large impact on the promotional demand and thus on the forecast accuracy of the forecasting model, as shown in chapter 7. Retailers are mostly unwilling to share this information with Unilever because of data sensitivity reasons. Such a lack of important data increases the difficulty of accurate forecasting. After implementing this step Unilever is able to forecast the consumer demand according to model adaptation 1 in this research, where data availability of all important variables is assured. Moreover, this step has to function as the beginning of a good collaboration with the retailer. Trust between the two parties needs to be created, so that the next implementation steps can be taken. This trust can be created by incentive alignment and clear terms of collaboration (Anderson, 2002).
1. Display: the percentage of shops where a promotional product has a second placement
2. Selling points: the percentage of extra selling points a promotion is sold at
3. The percentage of products which is on promotion within the category (see footnote 11)

To go from the second step to the third step some preconditions need to be met. Trust and clear communication should be established between Unilever and the retailer. Also, the value of the project needs to be clear for both Unilever and the retailer. Clear goals, a good project formulation, clear potential gains for all involved parties and honest communication are ways to meet the preconditions. The third step is about bridging the gap between retailer orders and consumer demand. As shown throughout the research, the forecast accuracy of a model on consumer demand level is quite high. However, this forecast accuracy drops substantially when the consumer demand has to be transformed to retailer orders. Especially for HPC products the model fit drops dramatically, whereas for Food products the model fit for retailer orders is considerably better. Figure 10-1 depicted the most likely factors that cause the difference between retailer orders and consumer demand. Clearly the bullwhip effect, which states that orders to the supplier tend to have a larger variance than sales to the buyer (Lee et al., 1997), has its effects in the FMCG market in which Unilever operates. As the most important activities to minimize the bullwhip effect, Lee et al. (1997) name information sharing of Point of Sales data and inventory status data, simplification of the promotional activities of a retailer, and making one member of the supply chain responsible for the forecasting process (e.g. VMI). Disney (2003) confirms that a VMI supply chain performs better than a traditional supply chain. Hence, the third step is to investigate together with the retailer which factors have a disturbing influence on the retailer orders, causing a larger variation in retailer orders than in consumer demand. The goal of this step is to gain insight into these factors, so that the variance in the retailer orders is no mystery but can be explained by the retailer and Unilever. These insights can be used to minimize the negative effect of the variance in the retailer orders on the forecast accuracy.

11 In paragraph 5.2 three important variables were excluded from the analysis because of a lack of data: (1) the percentage of products which is on promotion within the category, (2) the percentage of products which was on promotion within the category last week and (3) promotions of competitors. Van den Heuvel (2006) stated that the first variable indeed has an important contribution and that the second variable is not significant. Concerning the third variable, it would be interesting to include more specific data on promotions of competitors. However, promotion forecasting models in the literature have not been able to include this data, because of complexity issues. In summary, data on the first variable should be obtained.

For step 4 similar preconditions need to be met and the trust between Unilever and the retailer needs to be even higher. Therefore, the third step should be successfully finished and both parties should be willing to collaborate further with each other. The fourth step has to bring the insights of the third step into action and take these insights one step further. As a starting point the forecast for the retailer orders should be used. This forecast has to be adjusted for the disturbing factors analyzed in step 3. When, for example, the forecast for a promotion on the regular jar of Calvé peanut butter at a retailer is 100,000 consumer units, this number should be adjusted for stock left at the retailer, units needed to fill the pipeline, units needed to fill the promotion displays, a safety margin to cover the variance over the different retailer shops and potential other disturbing factors. Since the stock levels at a retailer change continuously, such a model should be updated each week. This results in an accurate promotion forecast on retailer order level. This forecast should be generated in collaboration between the retailer and Unilever. Hence, the two parties should not each produce their own forecast separately, which currently leads to the disturbance in the demand of the supply chain. This can be regarded as the final stage of collaboration between the retailer and Unilever, because processes of both parties need to be integrated. To do so, the confidence between the retailer and Unilever needs to be high, incentives for both parties should be clear and the responsibility for producing a forecast should be assigned to either the retailer or Unilever.
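The adjustment described for step 4 can be sketched as a simple calculation. The additive form, the function name and all input figures except the 100,000 consumer units are assumptions for illustration. The safety margin rate of 15% is chosen within the 10% to 20% range mentioned in the Unilever interviews. A real supply chain forecast would be agreed on with the retailer and updated weekly.

```python
def adjusted_order_forecast(
    consumer_forecast,
    stock_at_retailer,
    pipeline_fill,
    display_fill,
    safety_margin_rate,
):
    """Adjust a consumer demand forecast for the disturbing factors (assumed additive)."""
    safety_margin = consumer_forecast * safety_margin_rate
    needed = (
        consumer_forecast
        + pipeline_fill        # units needed to fill the pipeline
        + display_fill         # units needed to fill the promotion displays
        + safety_margin        # cover for variance over the retailer shops
        - stock_at_retailer    # stock left at the retailer reduces the order
    )
    return max(0.0, needed)    # an order forecast cannot be negative


# Hypothetical inputs around the 100,000 consumer units from the example.
order_forecast = adjusted_order_forecast(
    consumer_forecast=100_000,
    stock_at_retailer=12_000,
    pipeline_fill=5_000,
    display_fill=8_000,
    safety_margin_rate=0.15,
)
```

With these made-up inputs the retailer order forecast ends up roughly 16% above the consumer demand forecast. Because the stock term changes every week, rerunning the calculation weekly follows naturally from this form.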

To support the collaboration between the manufacturer and the retailer, there are already several initiatives in the FMCG industry, like Vendor Managed Inventory (VMI), Continuous Replenishment Program (CRP), Collaborative Planning, Forecasting and Replenishment (CPFR) and RFID-enabled collaborative processes. The order of the collaboration concepts indicates the innovativeness of the concept (Pramatari, 2007). VMI is most likely the first trust-based business link between suppliers and customers (Barrat et al., 2001), where the manufacturer has the responsibility of managing the customer's inventory policy. CRP moves one step ahead of VMI and reveals demand from the retailer stores to the supplier. CPFR can be seen as an evolution of VMI and CRP, where joint demand forecasting and promotion planning are also addressed in the approach (Holmstrom et al., 2002). CPFR is based on extensive information sharing between retailer and manufacturers, including Point of Sales data, forecasts and promotion plans. RFID-enabled collaboration can be applied when each product is tagged with an RFID chip and thus can be tracked through the whole supply chain.

In conclusion, the first part of the chapter showed that the retailer orders cannot be forecasted accurately enough. The second part discussed the future steps that need to be taken to overcome the current problems at Unilever and reach a higher forecast accuracy. At each step the process integration with the retailer becomes higher and more trust is needed between the parties. The end result of the implementation steps is a higher forecast accuracy for retailer orders, a closer collaboration with the retailer and thus more insight into the order process of the retailer. Currently, the different retailers to which Unilever delivers are in different stages of the implementation process. For each retailer value can be added by analyzing their current status and making the next implementation step(s). The successes at Albert Heijn, which is currently the only retailer where VMI is employed, can serve as an example for other retailers.

12 Conclusions

This research analyzed the ability to forecast promotional demand at a manufacturer level. The goal of the research was to increase the forecast accuracy of promotions at Unilever. After an analysis of the problem situation the research focused on the development of a more mathematical forecasting approach, which could support the judgemental forecasting process of the logistics employees at Unilever. The main research question formulated at the beginning of this research was: what are the causes for the low forecasting accuracy and how can this forecasting accuracy be improved? The research started with an ideal situation, which focused on the forecasting of the consumer demand without any hindrances. The ideal situation where all variables are included to forecast the consumer demand is described in paragraph 12.1. However, in practice some limitations obstruct the use of an ideal model. Alternative models to overcome these limitations are described in paragraph 12.2. Finally, paragraph 12.3 describes the steps which should be taken to overcome the limitations of the current situation in order to forecast more accurately.

12.1 Ideal model

The ideal (full) model is a forecasting model which includes all variables and forecasts the consumer demand, which is easier than forecasting the retailer orders. Around half of the variables in the full model are significant, depending on the type of products (data set) on which the model is fitted. The different data sets are HPC products (Non-Food), Food products and a data set with all products.

The adjusted R-square values of all three data sets are around 0.700 in the calibration period of the model. This indicates a good model fit, where 70% of the variance of the promotional demand is explained by the model. In the validation period, where the model of the calibration period is checked on a different data set, the model fit is even slightly higher than 0.700. This indicates that the model results are robust when used on other promotions than the original data set.

The most significant variables in the model are the variables with a double plus or minus sign in Table 12-1. Besides the fact that these variables are more important to include in a forecasting model, the effect size of a variable could also be used to drive marketing decisions. The first marketing implication is that a display (second placement) of a promotion in a retailer store is more important than folder advertisement and TV advertisement. Hence, when the marketing budget is allocated, investments in display should have priority over investments in folder advertisement, and both should have priority over investments in TV advertisement. The second implication is that, of all promo mechanisms, the mechanism where a consumer has to buy four or more products to get the promotional discount results in the highest promotional demand. Surprisingly, a Single Price Off (SPO), where a consumer only has to buy one product to receive the promotional discount, leads to a higher promotional demand than a promotion where a consumer has to buy two or three products. A promotion with a free product or premiaat has the lowest promotional demand, although the success of such a promotion really depends on the type of free product or premiaat. Lastly, marketing can increase the promotional sales by making sure that the promotion is sold in all stores of a retailer. This variable is especially important if the product is not sold in almost all stores in base line sales. For these products there are a lot of extra promotional sales to gain. One way of boosting the number of stores is to advertise the promotion in the folder, since all stores are expected to have the folder promotions available. So, for products which are not sold in all stores it is more interesting for Unilever to invest in folder advertisement.


| Variable | Effect size | Variable | Effect size |
|---|---|---|---|
| Display | ++ | log_growth_number_selling_points | ++ |
| Folder | + | Percentage_repeat_buyers | n.e. |
| TV_support | n.e. / + | Promotion_pressure | n.e. |
| Holiday_products | n.e. | ln_LF_former_promotions_EAN | ++ |
| Promo_length | ++ | Market_penetration | n.e. |
| Percentual_discount | ++ | Preservability | + |
| SPO (a) | - | log_size_of_product | n.e. |
| Two_for (a) | - | Frequency_of_purchase | - |
| Three_for (a) | - | Personalcare (c) | n.e. |
| Free_product (a) | n.e. | Ice_and_beverages (c) | - |
| Premiaat (a) | n.e. | SCC_and_vitality_shots (c) | + |
| Number_of_products_in_promotion | - | Savoury_and_dressings (c) | n.e. |
| C1000 (b) | + | winter_products_temp | n.e. |
| Plus (b) | - | summer_products_temp | n.e. |
| Kruidvat (b) | -- | | |

n.e. = no effect on the promotional sales
a. The baseline group for the different promo mechanisms is the mechanism “Four_or_five_for”
b. The baseline group for the different retailers is the retailer “Albert Heijn”
c. The baseline group for the different product groups is the product group “Homecare”

Table 12-1: Overview of the effect size and direction of the variables on the promotional sales

The ideal model shows that Unilever has the ability to forecast consumer demand. With the right information Unilever is able to forecast the consumer demand at least as well as a retailer. Hence, with this capability Unilever is able to take the lead in establishing a collaboration with retailers and increasing the forecast accuracy. However, because of the practical requirements of a forecasting model, the ideal model as formulated cannot be used in practice within Unilever. First, the model should be easy to use; second, data should be available for the variables used; and third, the retailer orders need to be forecasted. Hence, some adaptations to the full model are needed, which are discussed hereafter.

12.2 Adaptations needed on ideal model

To increase the usability of the forecasting model, only the most important variables are included in an adapted model. The model fit of this model with a limited number of variables is still surprisingly high and almost equal to the model fit of the full model. However, not all variables have data availability at Unilever, since Unilever as a manufacturer depends on the retailers for information on upcoming promotions. For two variables in the limited model Unilever has no data available: the percentage of shops with a second placement and the extra number of shops where the product is sold in promotion. To analyze the effect of this lack of data on the forecast accuracy, a new model without these variables is tested. The model fit decreases to an adjusted R-square of around 0.500, indicating that the exclusion of the two variables substantially worsens the performance of the forecasting model.

Moreover, Unilever needs to forecast retailer orders instead of consumer demand. Therefore, the model results for the consumer demand are adapted to retailer orders. The retailers included in the research order on average between 39% and 85% more than is sold during promotion. The forecasts for the consumer demand are raised by this difference. The model performance decreases substantially because of the extra variance in the retailer orders. The R-square for the HPC data set decreases to 0.138 in the calibration period, meaning that the predictive power of the model is very low. For the Food data set the R-square is 0.411 in the calibration period. So, the variability in the retailer orders is a lot higher for HPC products than for Food products. Forecasting retailer orders for HPC products seems to have little to no benefit, whereas Food orders can be forecasted with a higher accuracy. The difference is partly caused by the sales volume of a promotion. Because Food promotions sell 4 to 5 times more than HPC promotions, the variability in the retailer orders decreases. This reasoning also holds for the A, B and C categorisation, where the A SKUs are the more important high-volume products. And indeed, A SKUs have a substantially higher forecasting accuracy than C SKUs.
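The adaptation from consumer demand to retailer orders described above can be illustrated with a minimal sketch. The function name and the forecast of 1000 units are hypothetical; only the 39% and 85% average order surpluses come from the text, and the sketch assumes the surplus is applied as a simple multiplicative factor.

```python
def retailer_order_forecast(consumer_forecast, order_surplus):
    """Raise a consumer-demand forecast by the retailer's average order surplus.

    order_surplus is the average fraction by which a retailer orders more than
    is sold during promotion (0.39 means 39% more). Hypothetical helper.
    """
    if order_surplus < 0:
        raise ValueError("order_surplus must be non-negative")
    return consumer_forecast * (1.0 + order_surplus)


# Hypothetical consumer-demand forecast of 1000 units for one promotion,
# scaled with the lowest and highest average surplus reported in the text.
low = retailer_order_forecast(1000.0, 0.39)
high = retailer_order_forecast(1000.0, 0.85)
print(round(low), round(high))
```

A single average factor per retailer shifts the level of the forecast but cannot capture the extra variance in the orders, which is consistent with the drop in R-square reported above.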

The adaptations indicated that the reduction in the number of variables in the model does not lead to a lower model performance. But when two of the most important variables are excluded, because of a lack of data at Unilever, the model performance decreases substantially. Furthermore, the transition from consumer demand to retailer orders leads to a high loss of predictive power. To overcome these problems further steps need to be taken.

12.3 Future steps to increase the forecast accuracy

Since a direct forecast of the retailer orders turned out to be inaccurate and data was not available for all variables, future steps need to be taken to deal with the problems which diminish the forecast accuracy (see the implementation plan in paragraph 11.2). The first implementation step focuses on enhancing the data usage within Unilever. Quite some promotion data is available somewhere in the organization; however, the available data on historical and upcoming promotions should be recorded more centrally and made accessible. Then the data can actually be used by the logistic employee to forecast promotions. The second step is to ensure that the important information on upcoming promotions is provided by the retailers in advance. Retailers are hesitant to do so because of the sensitivity of the data. Unilever should win their trust to get hold of the important promotion data. The third step should bring insight into the factors causing the gap between retailer orders and consumer demand. Because of the bullwhip effect a lot of extra variance is added to the retailer orders, especially for HPC products. Unilever should focus on understanding the source of the extra variance together with the retailer. The fourth step has to bring the insights of the third step into action and take these insights one step further. During this process the alignment with the retailer becomes more important as the collaboration becomes more intensive. In the end this will result in a supply chain forecasting model which both the retailer and Unilever make use of, and new technologies like RFID can be used to evolve the forecasting model.

The first two implementation steps will solve the poor database usage within Unilever, the main scope of this research as stated in paragraph 2.2. The problem areas customer (retailer) team deviation and retailer dependency will be influenced by the implementation steps as well. Because a standard way of working is proposed for all retailers, the promotion forecasting process will become more alike across the different retailers. Furthermore, retailers who are not as far along in the implementation steps can learn from Unilever's forecasting process with the more developed retailers. Regarding the retailer dependency, it has become clearer which variables are needed from a retailer to accurately forecast a promotion. In addition, the implementation steps will convert the dependency on a retailer into collaboration with a retailer.

Altogether, this research showed that if the right information is available, Unilever is very well capable of accurately predicting the consumer demand. Unilever has an advantage over the retailers because of its larger data pool of promotions across all retailers, which can be used to forecast upcoming promotions. However, forecasting retailer orders has turned out to be far more difficult than forecasting consumer demand, especially for HPC products. The bullwhip effect leads to a substantial deviation between retailer orders and consumer demand. As a result, in order to accurately forecast retailer orders, the disturbing factors behind the bullwhip effect should be analyzed, which requires close collaboration with the retailer. When the disturbing factors are successfully analyzed, a promotion forecasting model which forecasts the consumer demand and corrects for the disturbing factors should be formulated and employed together with the retailer. Close collaboration and information sharing are needed, so that in the end Unilever and the retailer together use one forecasting approach and the retailer orders can be predicted accurately.

12.4 Contribution to literature

In paragraph 1.5 three gaps in the literature were discussed. The gaps are (1) the choice of the dependent variable to predict the promotional sales, (2) the development of a forecasting model for a manufacturer and (3) whether it is an advantage or a disadvantage to be a manufacturer.

The first gap exists because the promotion forecasting literature gives no clarity on which measure should be used as the dependent variable. Different studies use different dependent variables, namely the LF of the promotional sales, the ln of the LF and the cumulative lognormal distribution of the LF (P(LF)). This research concluded that the LF of the promotional sales is not an adequate measure because of clear signs of non-normality. Both other measures correct for this non-normality, although the cumulative lognormal distribution does so to a lesser extent. The performance of a promotion forecasting model improves substantially for both the P(LF) and the ln LF measure, where the model fit of the ln LF was slightly higher. In conclusion, the natural logarithm of the LF matches the normal distribution best and has the highest model fit.
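The three candidate dependent variables can be sketched as follows. This is a minimal illustration rather than the implementation used in this research: the function name and the sample lift factors are hypothetical, and P(LF) is assumed to be the lognormal cumulative distribution function with its parameters estimated from the sample itself.

```python
import math
from statistics import NormalDist, mean, stdev


def dependent_measures(lift_factors):
    """Return (ln LF, P(LF)) for a list of lift factors (LF > 0).

    P(LF) is computed as the normal CDF of ln LF, which equals the lognormal
    CDF of LF; mu and sigma are estimated from the sample (an assumption).
    """
    ln_lf = [math.log(lf) for lf in lift_factors]
    fitted = NormalDist(mu=mean(ln_lf), sigma=stdev(ln_lf))
    p_lf = [fitted.cdf(value) for value in ln_lf]
    return ln_lf, p_lf


# Hypothetical lift factors; each promotion sells twice as much as the previous.
sample_lf = [1.5, 3.0, 6.0, 12.0]
ln_lf, p_lf = dependent_measures(sample_lf)
for lf, ln_value, p_value in zip(sample_lf, ln_lf, p_lf):
    print(f"LF={lf:5.1f}  ln LF={ln_value:.3f}  P(LF)={p_value:.3f}")
```

Both transformed measures compress the long right tail of the LF; P(LF) additionally bounds the values between 0 and 1.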

Regarding the second gap, the main difference between a retailer and a manufacturer is that a retailer needs to forecast the demand of his shoppers (consumer demand), whereas a manufacturer has to forecast the orders placed by his customers (retailer orders). This research developed both a model which directly predicts the retailer orders and a model which predicts the consumer demand, after which this prediction is adapted to a forecast for the retailer orders. Retailer orders do in fact differ remarkably from consumer demand, by between 39% and 86% for the retailers in this research. Therefore, a model which predicts consumer demand cannot be used at a manufacturer without an adaptation.

Third, it is not clear whether being a manufacturer is an advantage or a disadvantage in producing an accurate promotion forecast. This research built a model which has a high promotion forecast accuracy at consumer demand level. The research made use of promotional data of SKUs for multiple retailers. The fact that the model performance based on this data is quite good indicates that the larger promotional database can act as an advantage for a manufacturer. However, not all variables for the consumer demand forecasting model are available for upcoming promotions at Unilever, because retailers are not willing to share some of the important promotion characteristics with Unilever. This is a major disadvantage: this research indicated that the model performance drops markedly without these variables. The second disadvantage of being a manufacturer is that a manufacturer has to deliver retailer orders instead of consumer demand. This research showed that the variability of retailer orders is higher than that of consumer demand and that the model performance decreases substantially when forecasting retailer orders. Therefore, overall it is concluded that a manufacturer has a disadvantage compared to a retailer.


References

Anderson, E., & Coughlan, A. T. (2002). Channel Management: Structure, Governance and Relationship Management. In B. A. Weitz & R. Wensley (Eds.), Handbook of Marketing (pg. 223-247). London: Sage.

Barratt, M. and Oliveira, A. (2001). Exploring the experience of collaborative planning initiatives. International Journal of Physical Distribution & Logistics Management, Vol. 31, No. 4, pg. 266-89.

Blattberg, R.C., Briesch, R., Fox, E.J. (1995). How Promotions Work. Marketing Science, Vol. 14, No. 3, pg. 122-132.

Buckers, J. (2010). The ordering process of dry food groceries under promotion: a study of order commitment timing and ordering methods. Master Thesis. Eindhoven University of Technology, Eindhoven.

Cooper, L.G., Baron, P., Levy, W., Swisher, M., Gogos, P. (1999). PromoCast™: A New Forecasting Method for Promotion Planning. Marketing Science, Vol. 18, No. 3, pg. 301-316.

Cooper, D. R., Schindler, P. S. (2003). Business Research Methods. Eighth edition. New York: McGraw-Hill/Irwin.

Disney, S.M., Towill, D.R. (2003). The effect of vendor managed inventory (VMI) dynamics on the Bullwhip Effect in supply chains. International Journal of Production Economics, Vol. 85, No. 2, pg. 199-215.

Field, A. (2005). Discovering statistics using SPSS. Third edition. SAGE Publications. London.

Green, S.B. (1991). How Many Subjects Does It Take To Do A Regression Analysis? Multivariate Behavioral Research, Vol. 26, No. 3, pg. 499-510.

Heuvel, F.P. van den (2009). Action products at Jan Linders Supermarkets. Master Thesis, Eindhoven University of Technology, Eindhoven.

Holmstrom, J., Framling, K., Kaipia, R. and Saranen, J. (2002). Collaborative planning forecasting and replenishment: new solutions needed for mass collaboration. Supply Chain Management: An International Journal, Vol. 7, No. 3, pg. 136-45.

Lee, H.L., Padmanabhan, V., Whang, S. (1997). Information distortion in a supply chain: The bullwhip effect. Management Science, Vol. 43, No. 4, pg. 546.

Lee, H.L., Padmanabhan, V., Whang, S. (2004). Comments on "Information Distortion in a Supply Chain: The Bullwhip Effect". Management Science, Vol. 50, No. 12, pg. 1887-1893.

Loo, M. van (2006). Out-of-Stock reductie van actieartikelen, Model voor vraagvoorspelling en logistieke aansturing van actieartikelen bij Schuitema/C1000. Master Thesis, Eindhoven University of Technology, Eindhoven.

Makridakis, S. (1988). Metaforecasting: Ways of Improving Forecasting Accuracy and Usefulness. International Journal of Forecasting, Vol. 4, No. 3, pg. 467-491.

Miles, J., Shevlin, M. (2001). Applying regression and correlation: a guide for students and researchers. SAGE publication. London.

Poel, M.J. van (2010a). A Literature study at promotion forecasting in the Fast Moving Consumer Good sector, literature study performed on promotion forecasting. Literature study performed for this master thesis project.

Poel, M.J. van (2010b). A research proposal for promotion forecasting: how to develop a manufacturer based model? Research proposal for this master thesis project

Pramatari, K., Papakiriakopoulos, D., Poulymenakou, A. and Doukidis, G.I.U. (2002). New forms of CPFR. The ECR Journal - International Business Review, Vol. 2, No. 2, pg. 38-43.

Pramatari, K. (2007). Collaborative supply chain practices and evolving technological approaches. Supply Chain Management: An International Journal, Vol. 12, No. 3, pg. 210-220.

Silva-Risso, J.M., Bucklin, R.E., Morrison, D.G. (1999). A Decision Support System for Planning Manufacturers' Sales Promotion Calendars. Marketing Science, Vol. 18, No. 3, pg. 274-300.

Silver, E. A., Pyke, D. F., & Peterson, R. (1998). Inventory management and production planning and scheduling. Third edition. John Wiley & Sons. New York.

Strien, P.J. van (1997). Towards a methodology of psychological practice. Theory and Psychology, Vol. 7, No. 5, pg. 683-700.

Wittink, D.R., Addona, M.J., Hawkes, W.J., Porter, J.C. (1988). SCAN*PRO: The estimation, validation and use of promotional effects based on scanner data. Internal paper, Cornell University.


Appendices

Appendix 1: Sample size (86 products)

| EAN CE | Material description | Category |
|---|---|---|
| 8717644013045 | Deospray Africa 150ML | DEO & GROOMING |
| 8717644042359 | Axe Deospray Vice 150ML | DEO & GROOMING |
| 50097265 | Axe roll on Africa 50ML | DEO & GROOMING |
| 50096190 | Dove Deo Roll On Original 50ML | DEO & GROOMING |
| 8717163965030 | Dove Deospray Original 150ML | DEO & GROOMING |
| 8717163997345 | Dove Deospray Original 250ML | DEO & GROOMING |
| 8717163964972 | Dove Deospray Sensitive 150ML | DEO & GROOMING |
| 8717163593318 | Deospray Clear Aqua 150ML | DEO & GROOMING |
| 50099214 | Rexona Roll On Nutritive 50ML | DEO & GROOMING |
| 8593838930653 | Calve Dressing Naturel 450ML | DRESSINGS |
| 8593838930523 | Calve Dressing Slasaus Halfvol 450ML | DRESSINGS |
| 8593838930509 | Calve Slasaus Yogomix 450ML | DRESSINGS |
| 8717644278666 | Andrelon Condit. Bruin haar 300ML | HAIR CARE |
| 8717163361252 | Andrelon Conditioner Perf. Krul 300ML | HAIR CARE |
| 8717644341803 | Andrelon Hairspr Fix & Shine 250ML | HAIR CARE |
| 8717644341582 | Andrelon Mousse Volume 200ML | HAIR CARE |
| 8717644393956 | Andrelon Shamp Hair&Body Men 300ML | HAIR CARE |
| 8717163009741 | Andrelon Shampoo Glans 300ML | HAIR CARE |
| 8717163010068 | Andrelon Shampoo Perf. Krul 300ML | HAIR CARE |
| 8717644337615 | Andrelon Shaper 125ML | HAIR CARE |
| 8717163089828 | Spray Badkamer 750ML NEK | HOUSEHOLD CARE |
| 8717163089897 | Cif Spray Keuken 750ML NEK | HOUSEHOLD CARE |
| 8717644961001 | Glorix Bleek Original 750ML | HOUSEHOLD CARE |
| 8717163416976 | Glorix Hyg Doekje Normaal. 60ST | HOUSEHOLD CARE |
| 8717644465394 | Glorix WC Powergel Akalk Lime 750ML | HOUSEHOLD CARE |
| 8717163055946 | Sun Machinereiniger 40G 3ST | HOUSEHOLD CARE |
| 76840600021 | B&J Cookie Dough 500ML | ICE CREAM |
| 8000920580806 | DO 360ML Magnum Snacksize CAW | ICE CREAM |
| 8000920553800 | DO 480ML Ola Magnum Classic 3+1 | ICE CREAM |
| 8000920555705 | DO 480ML Ola Magnum White 3+1 | ICE CREAM |
| 8710447120187 | IJsspecialiteit 3 Chocolades | ICE CREAM |
| 8722700109198 | Hertog IJsspecialiteit Stroopwafel | ICE CREAM |
| 8710447032756 | Ola Festini Peer 600ML 12MP | ICE CREAM |
| 5410148322905 | Ola Raket 440ML 8MP | ICE CREAM |
| 8722700210740 | Vien Caramel Crisp 650ml | ICE CREAM |
| 8717644379264 | Robijn Black Velvet 1,5L 30sc | LAUNDRY |
| 8717644391938 | Robijn K&K Vloeibaar Color 730ml 20sc | LAUNDRY |
| 8717644391907 | Robijn K&K Vloeibaar Wit 730ml 20sc | LAUNDRY |
| 8717644629536 | Robijn Pak Color 1008G 18sc | LAUNDRY |
| 8717163404355 | Robijn Vloeib Fleur&Fijn 1L 16sc | LAUNDRY |
| 8717644374320 | Robijn WVZ Zonnig Geel 750ML | LAUNDRY |
| 8722700227632 | Knaks 200G | OTHER FOODS |
| 8722700227618 | Unox Knaks Runder 200G | OTHER FOODS |
| 8722700335481 | Unox Ragout Kalf 400G | OTHER FOODS |
| 8722700189510 | Unox Rookworst Standaard 275G | OTHER FOODS |
| 8722700233053 | Bertolli Pastasaus Basilicum 400G | SAVOURY |
| 8722700129554 | Bertolli Pastasaus Knoflook 450G | SAVOURY |
| 8722700093602 | Bertolli Pastasaus Kruidig 450G | SAVOURY |
| 8714100050262 | Boemboe Sajoer Boontjes 100G | SAVOURY |
| 8722700208914 | Knorr Chicken Tonight Hawai 490ML | SAVOURY |
| 8722700206712 | Knorr Maaltijdmx Boerenomelet13G | SAVOURY |
| 8722700206354 | Knorr Saus Kerrie 28 | SAVOURY |
| 8722700206361 | Knorr Saus Room 46G | SAVOURY |
| 8722700206507 | Knorr Saus Wit 22G | SAVOURY |
| 8711100069973 | Knorr Wereld Burritos 229G | SAVOURY |
| 8711100069331 | Knorr Wereld Kip Tandoori 292G | SAVOURY |
| 8722700222576 | Knorr Wereld Mex Enchillada 343G | SAVOURY |
| 8722700139355 | Unox CAS Speciaal Romige Mosterd | SAVOURY |
| 8711200189205 | Unox Good Noodles Kip 70G | SAVOURY |
| 8722700214090 | Unox SIZ Soep Bospaddestoelen 570ML | SAVOURY |
| 8722700214076 | Unox Soep Champignon 300ML Doy | SAVOURY |
| 8722700214137 | Unox Soep Romige Tomaat 570ML Doy | SAVOURY |
| 8722700419051 | ZK 60G CNX Kroepoek Bali | SAVOURY |
| 8722700418818 | ZK 60G CNX Kroepoek Klein Nat. | SAVOURY |
| 42153184 | Axe SG Dark Temptation 250ml | SKIN |
| 8717644006481 | Dove Body Cream Oil Pro Age 250ml | SKIN |
| 8717163476789 | Dove Body Voedende Creme 150ML | SKIN |
| 4000388177000 | Dove Cream Wash Liq. soap 250ML | SKIN |
| 8717163611548 | Dove Face Dagcreme 50ML | SKIN |
| 8717644046630 | Dove Pro Age Shower 250ml | SKIN |
| 8717644027462 | Dove Shower Cream Shower 500ML | SKIN |
| 8000700000012 | Dove Wastablet Regular 100gr | SKIN |
| 8717163063606 | Body Lotion Aloe Fresh 400ML | SKIN |
| 8717163066003 | Vaseline Lotion Hand&Nail Tube 75ML | SKIN |
| 8711200189403 | Becel Bak en Braad 500ML | SPREADS AND COOKING PRODUCTS |
| 8722700250494 | BECEL LIGHT LQM 500ML | SPREADS AND COOKING PRODUCTS |
| 8722700092971 | Becel Olijfolie 500ML | SPREADS AND COOKING PRODUCTS |
| 8722700259886 | Becel PA Bloeddruk 250G KP | SPREADS AND COOKING PRODUCTS |
| 8711200134502 | Becel Vlees en Jus 400ML | SPREADS AND COOKING PRODUCTS |
| 8722700191377 | Blue Band Margarine Idee Calc. 500G | SPREADS AND COOKING PRODUCTS |
| 8722700462958 | CALVE PIKA REGULAR IKB 350G JAR | SPREADS AND COOKING PRODUCTS |
| 8711200134403 | CROMA B&B LQM | SPREADS AND COOKING PRODUCTS |
| 8722700359326 | DO 1,5L Lipton Ice T Lemon CAR | TEA AND SOY & FRUIT BEVERAGES |
| 8722700243809 | Lipton Ice Tea Green 1.5L | TEA AND SOY & FRUIT BEVERAGES |
| 8722700056522 | Lipton Ice Tea Sparkling Light 1.5L | TEA AND SOY & FRUIT BEVERAGES |

Appendix 2: Transformation of variables

First, the normality and possible transformations of the dependent variable are analyzed. The left-hand histogram below shows the distribution of the untransformed LF. Since it does not follow a normal distribution, a logarithmic (ln) transformation is applied. The right-hand histogram and the descriptive statistics table illustrate the large improvement in normality. Consequently, the ln of the LF is used as the dependent variable in the further research.

Descriptive Statistics dependent variables

| Statistic | LF_5_weeks_before | ln_LF_5_weeks_before |
|---|---|---|
| Mean | 6.6303 | 1.5832 |
| Std. Error of Mean | .22112 | .02048 |
| Std. Deviation | 7.78001 | .72050 |
| Variance | 60.529 | .519 |
| Skewness | 5.983 | .603 |
| Std. Error of Skewness | .070 | .070 |
| Kurtosis | 54.596 | .901 |
| Std. Error of Kurtosis | .139 | .139 |
| Range | 109.88 | 5.09 |
| Minimum | .69 | -.38 |
| Maximum | 110.57 | 4.71 |
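The effect of the ln transformation on skewness can be reproduced on a small scale with the sketch below. The lift factors are hypothetical values chosen to mimic a long right tail, and the estimator is assumed to be the adjusted Fisher-Pearson coefficient; whether the table above was computed with exactly this estimator is an assumption.

```python
import math
from statistics import mean, stdev


def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (requires at least 3 values)."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three observations are needed")
    centre, spread = mean(values), stdev(values)
    cubed = sum(((x - centre) / spread) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


# Hypothetical lift factors with a few very successful promotions.
lift_factors = [1.2, 1.5, 2.0, 2.5, 3.0, 4.0, 6.0, 10.0, 25.0, 110.0]
skew_raw = sample_skewness(lift_factors)
skew_ln = sample_skewness([math.log(x) for x in lift_factors])
print(f"skewness LF: {skew_raw:.2f}, skewness ln LF: {skew_ln:.2f}")
```

On data of this shape the skewness of the ln-transformed values is clearly lower than that of the raw values, in line with the drop from 5.983 to .603 in the table.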

The next descriptive statistics table and histograms depict the independent variables which are considered for transformation. A logarithmic transformation (Field, 2005) is performed to enhance the normality of the variables. The normality of the variables LF_former_promotions_EAN, Growth_number_of_shops, Absolute_discount, Percentage_repeat_buyers and Size_products has improved substantially. Therefore, these variables are used in their transformed form.


Descriptive Statistics independent variables

| Statistic | LF_former_promotions_EAN | ln_LF_former_promotions_EAN | Growth_number_of_shops_selling_points | log_growth_number_selling_points | Absolute_discount | log_absolute_discount | Size_products | log_size_of_product |
|---|---|---|---|---|---|---|---|---|
| Mean | 4.18 | 1.36 | 1.18 | .522 | .661 | .192 | 1010.1 | 2.88 |
| Std. Error of Mean | .051 | .011 | .0069 | .0016 | .0182 | .0043 | 21.28 | .0094 |
| Std. Deviation | 1.82 | .375 | .244 | .057 | .638 | .152 | 747.9 | .329 |
| Variance | 3.32 | .140 | .059 | .003 | .407 | .023 | 559302 | .108 |
| Skewness | 2.30 | .306 | 2.11 | 1.48 | 1.257 | .510 | 1.077 | .035 |
| Std. Error of Skewness | .070 | .070 | .070 | .070 | .070 | .070 | .070 | .070 |
| Kurtosis | 18.57 | .909 | 5.99 | 3.37 | 1.516 | -.501 | .470 | -1.070 |
| Std. Error of Kurtosis | .14 | .14 | .14 | .139 | .139 | .139 | .139 | .139 |
| Range | 18.20 | 2.65 | 2.09 | .48 | 3.53 | .66 | 3246 | 1.25 |
| Minimum | 1.38 | .32 | .53 | .31 | .00 | .00 | 194 | 2.29 |
| Maximum | 19.58 | 2.97 | 2.62 | .80 | 3.53 | .66 | 3440 | 3.54 |


Appendix 3: Outlier analysis on all cases

Casewise Diagnostics (b)

| Case Number | Status | Std. Residual | ln_LF_5_weeks_before | Predicted Value | Residual |
|---|---|---|---|---|---|
| 82 | X (a) | 4.803 | 3.46 | 1.6928 | 1.76717 |
| 83 | X (a) | 5.237 | 3.39 | 1.4631 | 1.92688 |
| 126 | X (a) | 6.600 | 3.93 | 1.5013 | 2.42866 |
| 127 | X (a) | 8.742 | 4.22 | 1.0034 | 3.21665 |
| 183 | X (a) | 6.836 | 3.97 | 1.4546 | 2.51545 |
| 184 | X (a) | 8.487 | 4.05 | .9269 | 3.12307 |
| 186 | | 3.411 | 2.37 | 1.1149 | 1.25507 |
| 208 | X (a) | 6.970 | 4.71 | 2.1455 | 2.56453 |
| 209 | X (a) | 7.205 | 4.57 | 1.9189 | 2.65113 |
| 213 | | 4.319 | 3.29 | 1.7009 | 1.58912 |
| 215 | | 3.081 | 2.40 | 1.2663 | 1.13369 |
| 219 | | -3.437 | .25 | 1.5147 | -1.26466 |
| 412 | X (a) | 3.488 | 3.14 | 1.8564 | 1.28357 |
| 436 | | 3.287 | 2.57 | 1.3603 | 1.20968 |
| 437 | | 3.096 | 2.10 | .9606 | 1.13935 |
| 498 | X (a) | 6.894 | 4.22 | 1.6833 | 2.53670 |
| 535 | X (a) | 7.159 | 3.93 | 1.2956 | 2.63438 |
| 579 | | -6.006 | .00 | 2.2098 | -2.20983 |
| 643 | | -4.058 | .66 | 2.1530 | -1.49305 |
| 1043 | X (a) | 3.346 | 1.79 | .5588 | 1.23119 |
| 1044 | X (a) | 4.193 | 2.32 | .7772 | 1.54281 |
| 1114 | X (a) | 6.238 | 3.74 | 1.4445 | 2.29551 |
| 1115 | X (a) | 7.391 | 4.22 | 1.5004 | 2.71957 |
| 1127 | | 3.083 | 3.30 | 2.1656 | 1.13440 |

a. X: Magnum cases

Besides the exclusion of all Magnum promotions, due to the large number of outliers originating from the Magnum ice creams, the cases with an absolute standardized residual above 3.5 are also excluded from the analyses. The disturbing effect of these cases on the models is too large. These are cases 213, 579 and 643. The reason for the high standardized residuals is that cases 213 and 643 have very low base sales and case 579 has a very low LF (below one).
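The exclusion rule can be expressed as a short filter. The sketch below is only an illustration: the function name is hypothetical, and the dictionary holds the magnitudes of the standardized residuals of the non-Magnum cases as listed in the casewise diagnostics table.

```python
# Magnitudes of the standardized residuals of the non-Magnum cases
# (case number -> |std. residual|), copied from the casewise diagnostics.
std_residuals = {
    186: 3.411, 213: 4.319, 215: 3.081, 219: 3.437, 436: 3.287,
    437: 3.096, 579: 6.006, 643: 4.058, 1127: 3.083,
}


def flag_outliers(residuals, threshold=3.5):
    """Return the case numbers whose absolute standardized residual exceeds threshold."""
    return sorted(case for case, value in residuals.items() if abs(value) > threshold)


excluded = flag_outliers(std_residuals)
print(excluded)  # [213, 579, 643]
```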


Appendix 4: Assumptions linear regression

Normality of dependent variable

Paragraph 5.5 discussed this property of the dependent variable. After transforming the dependent variable to its ln value, the normality assumption is met.

Multicollinearity

This can be assessed with the VIF statistics of the different variables. If the largest VIF statistic is greater than 10 or the average VIF statistic is substantially greater than 1, then there is cause for concern (Bowerman & O’Connell, 1990). The VIF values range from 1.2 up to 6.4 and the average VIF value is 2.6, which gives no cause for concern.
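As a reminder of what the VIF expresses, the sketch below uses the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one predictor on all other predictors. The function names are hypothetical; only the thresholds and the reported range (1.2 to 6.4) come from the text.

```python
def vif_from_r_squared(r_squared):
    """Variance inflation factor of a predictor, given the R-square obtained
    by regressing that predictor on all other predictors."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared)


def r_squared_from_vif(vif):
    """Inverse relation: share of a predictor's variance explained by the others."""
    if vif < 1.0:
        raise ValueError("a VIF cannot be smaller than 1")
    return 1.0 - 1.0 / vif


def largest_vif_is_concern(vifs, threshold=10.0):
    """First part of the rule of thumb: the largest VIF exceeds the threshold."""
    return max(vifs) > threshold


# The largest reported VIF of 6.4 implies that about 84% of that predictor's
# variance is explained by the other predictors.
print(round(r_squared_from_vif(6.4), 3))
print(largest_vif_is_concern([1.2, 6.4]))
```

The second part of the rule (an average VIF "substantially greater than 1") is a judgement call and is therefore not coded as a hard threshold here.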

Normality of error distribution

The following histogram and Normal P-P Plot of the standardized residuals show that the assumption of normally distributed errors is met.

Homoscedasticity

To check this assumption the scatterplots of the regression standardized residuals and the regression studentized residuals are analysed. Both scatterplots give no cause for concern about heteroscedasticity and show that the assumption is met.


Linearity

Below, the scatterplots of the 12 most important independent variables are depicted. In most of the graphs there is a clear linear relation between the dependent and the independent variable. For some of the scatterplots (Two_for_X, Three_for_X, Preservability, Number_of_products_on_promotion) the linear relationship is unclear, but there is no sign of nonlinearity. The only concern is a limited relation or the lack of one. Hence, this assumption is accepted.


Independence of the errors

The independence of errors assumption means that for any two observations the residual terms should be uncorrelated. The assumption can be tested with the Durbin-Watson test, which checks for serial correlation between errors. The test statistic can vary between 0 and 4, with a value of 2 meaning that the errors are uncorrelated. As a general rule, values lower than 1 or greater than 3 are a cause for concern. The Durbin-Watson statistic of model 1 is equal to 1.386. Therefore, the assumption is accepted.
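The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below shows this standard formula together with the rule of thumb from the text; the function names and the toy residual series are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals in their observation order."""
    if len(residuals) < 2:
        raise ValueError("at least two residuals are needed")
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("residuals must not all be zero")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    return numerator / denominator


def is_cause_for_concern(statistic, lower=1.0, upper=3.0):
    """Rule of thumb from the text: values below 1 or above 3 are a concern."""
    return statistic < lower or statistic > upper


# Toy series: alternating residuals give a high value (negative serial
# correlation), identical residuals give 0 (perfect positive correlation).
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # 0.0
print(is_cause_for_concern(1.386))            # False
```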


Appendix 5: Results linear regression model TV_support

Model Summary

| Model | R (Albert_Heijn = 1, Selected) | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .141 (a) | .020 | .017 | .73869 |

a. Predictors: (Constant), TV_support

ANOVA (b, c)

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 4.718 | 1 | 4.718 | 8.647 | .003 (a) |
| | Residual | 234.087 | 429 | .546 | | |
| | Total | 238.805 | 430 | | | |

a. Predictors: (Constant), TV_support
b. Dependent Variable: ln_LF_promotions
c. Selecting only cases for which Albert_Heijn = 1

Coefficients (a, b)

| Model | | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 1.604 | .038 | | 42.316 | .000 |
| | TV_support | .324 | .110 | .141 | 2.941 | .003 |

a. Dependent Variable: ln_LF_promotions
b. Selecting only cases for which Albert_Heijn = 1
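The summary statistics of this single-predictor model can be recomputed from the ANOVA table, as the sketch below shows. It uses the textbook formulas for R-square, adjusted R-square and F; the variable names are hypothetical. The last line assumes that TV_support is a 0/1 indicator, in which case exp(B) is the factor by which the lift factor is multiplied when TV support is present.

```python
import math

# Sums of squares and degrees of freedom from the ANOVA table.
ss_regression, df_regression = 4.718, 1
ss_residual, df_residual = 234.087, 429
ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual

r_squared = ss_regression / ss_total
adj_r_squared = 1 - (ss_residual / df_residual) / (ss_total / df_total)
f_statistic = (ss_regression / df_regression) / (ss_residual / df_residual)
t_statistic = math.sqrt(f_statistic)  # with one predictor, t squared equals F

# Assumption: TV_support is a dummy variable, so B = .324 on the ln LF scale
# corresponds to a multiplicative effect of exp(.324) on the LF itself.
lf_multiplier = math.exp(0.324)

print(f"R2={r_squared:.3f} adj.R2={adj_r_squared:.3f} "
      f"F={f_statistic:.2f} t={t_statistic:.2f} LF x{lf_multiplier:.2f}")
```

Small differences in the last decimal compared with the table are due to rounding of the printed sums of squares.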


Appendix 6: Further investigation hypothesis on log_absolute_discount

In the model summary below, both absolute discount measures surprisingly have a higher model fit than the percentual discount measure. Below the model summary, the coefficients of the different models are depicted. Again the results are stronger for both absolute discount measures, with standardized Beta coefficients of 0.497 and 0.473 against 0.434 for the percentual discount measure.

Model Summary different single linear regression models

| Model | R (All_2009_w_o_Magnum = 1, Selected) | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| Absolute_discount_per_offer | .497 (a) | .247 | .246 | .56426402 |
| Absolute_discount_per_product | .473 (a) | .224 | .223 | .57267972 |
| Percentual_discount | .434 (a) | .188 | .188 | .58567740 |

a. Predictors: (Constant), Absolute_discount_per_offer

Coefficients (a, b)

| Model | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 1.244 | .023 | | 53.532 | .000 |
| Absolute_discount_per_offer | .180 | .010 | .497 | 17.783 | .000 |
| (Constant) | 1.200 | .026 | | 46.487 | .000 |
| Absolute_discount_per_product | .480 | .029 | .473 | 16.698 | .000 |
| (Constant) | 1.095 | .033 | | 33.087 | .000 |
| Percentual_discount | .018 | .001 | .434 | 14.973 | .000 |

a. Dependent Variable: ln_LF_promotions
b. Selecting only cases for which All_2009_w_o_Magnum = 1

However, when the absolute discount per offer and the percentual discount are included in the same full model, the absolute discount is no longer significant (see the coefficient table below). This is probably due to collinearity between the two discount measures.


Coefficients full model (a, b)

| Model 1 | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 3.424 | .326 | | 10.519 | .000 |
| Procentual_discount | .018 | .002 | .427 | 10.545 | .000 |
| Absolute_discount_per_offer | .005 | .013 | .014 | .392 | .695 |
| C1000 | .178 | .040 | .101 | 4.405 | .000 |
| Plus | .130 | .040 | .079 | 3.281 | .001 |
| Kruidvat | .106 | .068 | .075 | 1.571 | .116 |
| Personalcare | .113 | .060 | .087 | 1.891 | .059 |
| Ice_and_beverages | .140 | .092 | .059 | 1.528 | .127 |
| SCC_and_vitality_shots | .503 | .131 | .160 | 3.847 | .000 |
| Savoury_and_dressings | .119 | .069 | .080 | 1.729 | .084 |
| ln_LF_former_promotions_EAN | .566 | .053 | .318 | 10.699 | .000 |
| Display | .008 | .001 | .448 | 15.019 | .000 |
| Folder | .441 | .055 | .170 | 8.084 | .000 |
| Promo_length | .589 | .060 | .396 | 9.903 | .000 |
| SPO | .258 | .062 | .133 | 4.141 | .000 |
| Two_for | .330 | .057 | .246 | 5.814 | .000 |
| Three_for | .277 | .060 | .177 | 4.648 | .000 |
| Free_product | .026 | .074 | .015 | .356 | .722 |
| Premiaat | .045 | .078 | .021 | .580 | .562 |
| TV_support | .015 | .070 | .004 | .219 | .826 |
| Number_of_products_in_promotion | .001 | .000 | .095 | 2.114 | .035 |
| Promotion_pressure | .001 | .001 | .024 | .856 | .392 |
| log_growth_number_selling_points | 3.697 | .250 | .323 | 14.804 | .000 |
| Market_penetration | .002 | .002 | .047 | 1.142 | .254 |
| Frequency_of_purchase | .094 | .035 | .141 | 2.665 | .008 |
| Percentage_repeat_buyers | .002 | .002 | .028 | .716 | .474 |
| log_size_of_product | .141 | .072 | .072 | 1.970 | .049 |
| Preservability | .001 | .000 | .175 | 4.400 | .000 |
| Holiday_products | .222 | .139 | .035 | 1.597 | .111 |
| winter_products_temp | .001 | .004 | .006 | .273 | .785 |
| summer_products_temp | .004 | .005 | .025 | .702 | .483 |

a. Dependent Variable: ln_LF_promotions
b. Selecting only cases for which All_2009_w_o_Magnum = 1


To investigate whether the absolute discount per offer has a threshold effect, this variable is plotted against the average lift factor for all types of promotions and for SPO promotions. For all types of promotions combined, a clear linear effect is found. For the SPO promotions hardly any linear effect is found. In both graphs no clear threshold effect can be found, i.e. no obviously higher LF is found beyond a certain absolute discount.

[Figure: average lift factor plotted against the absolute discount per offer (€); left panel: all types of promotions; right panel: SPO promotions]

Model Summary

| Model | R (All_2009_w_o_Magnum = 1, Selected) | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| Percentual discount | .834 (l) | 0.695 | 0.689 | 0.362 |
| Absolute discount per product | .813 (i) | 0.662 | 0.654 | 0.382 |
| Absolute discount per offer | .812 (j) | 0.659 | 0.651 | 0.384 |

When the absolute discount per product cannot be used, the non-promo price might be a good replacement predictor to include in the model. The table below shows that the non-promo price is insignificant (Sig. = .085) in a full model from which the absolute discount per product is excluded as a variable. The coefficients shown in the table are for the full model for all promotions of 2009 without Magnum products (data set 4).


Coefficients full model (a, b)

| Model 1 | Unstandardized B | Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 3.270 | .244 | | 13.384 | .000 |
| Display_1 | .008 | .001 | .320 | 14.984 | .000 |
| Folder | .467 | .053 | .180 | 8.849 | .000 |
| Promo_length | .596 | .056 | .400 | 10.560 | .000 |
| Percentual_discount | .018 | .001 | .446 | 15.257 | .000 |
| SPO | .229 | .058 | .118 | 3.923 | .000 |
| Two_for | .313 | .054 | .234 | 5.796 | .000 |
| Three_for | .258 | .056 | .165 | 4.570 | .000 |
| Number_of_products_in_promotion | .001 | .000 | .089 | 2.264 | .024 |
| C1000 | .172 | .039 | .098 | 4.424 | .000 |
| Plus | .137 | .038 | .084 | 3.578 | .000 |
| Kruidvat | .513 | .061 | .363 | 8.412 | .000 |
| log_growth_number_selling_points | 3.771 | .239 | .329 | 15.754 | .000 |
| ln_LF_former_promotions_EAN | .593 | .042 | .333 | 14.128 | .000 |
| Preservability | .001 | .000 | .162 | 5.308 | .000 |
| log_size_of_product | .103 | .059 | .053 | 1.750 | .080 |
| Frequency_of_purchase | .053 | .019 | .079 | 2.767 | .006 |
| Ice_and_beverages | .158 | .048 | .066 | 3.316 | .001 |
| SCC_and_vitality_shots | .308 | .077 | .098 | 3.990 | .000 |
| Non_promo_price | .022 | .011 | .050 | 1.638 | .085 |

a. Dependent Variable: ln_LF_promotions
b. Selecting only cases for which All_2009_w_o_Magnum = 1


Appendix 7: Correlation matrix of full model (data set 4)

| | Promotion_pressure | log_size_of_product | Promo_length | Savoury_and_dressings | Free_product | Percentage_repeat_buyers | Three_for | Ice_and_beverages | Preservability | Market_penetration | Personalcare | Number_of_products_in_promotion | Kruidvat | Percentual_discount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| summer_products_temp | .051 | .117 | .029 | .150 | .009 | .045 | .069 | -.675 | .199 | .046 | .003 | .000 | .013 | .074 |
| Frequency_of_purchase | .021 | .040 | .013 | .202 | .034 | -.604 | .010 | .027 | .001 | .246 | .156 | .026 | .059 | .051 |
| Holiday_products | .007 | .019 | .006 | .027 | .014 | .002 | .025 | .455 | .055 | .025 | .000 | .005 | .000 | .018 |
| Folder | .017 | .052 | .092 | .041 | .062 | .060 | .067 | .030 | .097 | .096 | .091 | .105 | .106 | .044 |
| TV_support | .019 | .016 | .007 | .084 | .065 | .014 | .109 | .015 | .134 | .030 | .122 | .175 | .043 | .000 |
| log_absolute_discount | .292 | .117 | .015 | .396 | .054 | .078 | .143 | .216 | .013 | .260 | .188 | .100 | .128 | -.759 |
| Plus | .015 | .026 | .040 | .041 | .046 | .024 | .019 | .038 | .137 | .034 | .046 | .228 | .178 | .180 |
| Two_for | .032 | .007 | .149 | .078 | .111 | .101 | .625 | .050 | .030 | .134 | .114 | .255 | .165 | .019 |
| Display_1 | .170 | .027 | .043 | .098 | .022 | .038 | .096 | .129 | .079 | .042 | .164 | .111 | .146 | .030 |
| winter_products_temp | .072 | .031 | .018 | .016 | .037 | .172 | .117 | .093 | .215 | .256 | .068 | .013 | .014 | .100 |
| C1000 | .045 | .001 | .031 | .002 | .010 | .033 | .022 | .136 | .131 | .058 | .099 | .134 | .154 | .061 |
| log_growth_number_selling_points | .107 | .031 | .006 | .036 | .064 | .131 | .052 | .001 | .044 | .001 | .145 | .005 | .023 | .012 |
| Premiaat | .070 | .016 | .050 | .067 | .682 | .012 | .164 | .036 | .047 | .019 | .007 | .272 | .227 | .196 |
| SPO | .037 | .011 | .050 | .117 | .094 | .152 | .686 | .164 | .035 | .188 | .160 | .276 | .093 | .137 |
| ln_LF_former_promotions_EAN | .254 | .212 | .048 | .086 | .066 | .013 | .029 | .237 | .059 | .281 | .001 | .030 | .061 | .035 |
| SCC_and_vitality_shots | .066 | .339 | .008 | .678 | .046 | .261 | .029 | .257 | .571 | .100 | .297 | .057 | .011 | .111 |
| Promotion_pressure | 1.000 | .257 | .009 | .132 | .051 | .041 | .010 | .040 | .248 | .190 | .182 | .051 | .019 | .164 |
| log_size_of_product | .257 | 1.000 | .016 | .322 | .024 | .054 | .009 | .137 | .456 | .144 | .486 | .027 | .027 | .153 |
| Promo_length | .009 | .016 | 1.000 | .000 | .211 | .066 | .133 | .014 | .021 | .087 | .012 | .058 | .455 | .016 |
| Savoury_and_dressings | .132 | .322 | .000 | 1.000 | .054 | .085 | .022 | .356 | .516 | .162 | .528 | .008 | .044 | .276 |
| Free_product | .051 | .024 | .211 | .054 | 1.000 | .025 | .055 | .025 | .064 | .009 | .006 | .412 | .142 | .183 |
| Percentage_repeat_buyers | .041 | .054 | .066 | .085 | .025 | 1.000 | .022 | .003 | .108 | .348 | .210 | .009 | .080 | .074 |
| Three_for | .010 | .009 | .133 | .022 | .055 | .022 | 1.000 | .033 | .013 | .130 | .062 | .267 | .130 | .016 |
| Ice_and_beverages | .040 | .137 | .014 | .356 | .025 | .003 | .033 | 1.000 | .125 | .036 | .311 | .027 | .049 | .133 |
| Preservability | .248 | .456 | .021 | .516 | .064 | .108 | .013 | .125 | 1.000 | .157 | .226 | .057 | .025 | .021 |
| Market_penetration | .190 | .144 | .087 | .162 | .009 | .348 | .130 | .036 | .157 | 1.000 | .019 | .023 | .022 | .193 |
| Personalcare | .182 | .486 | .012 | .528 | .006 | .210 | .062 | .311 | .226 | .019 | 1.000 | .090 | .195 | .158 |
| Number_of_products_in_promotion | .051 | .027 | .058 | .008 | .412 | .009 | .267 | .027 | .057 | .023 | .090 | 1.000 | .037 | .023 |
| Kruidvat | .019 | .027 | .455 | .044 | .142 | .080 | .130 | .049 | .025 | .022 | .195 | .037 | 1.000 | .115 |
| Percentual_discount | .164 | .153 | .016 | .276 | .183 | .074 | .016 | .133 | .021 | .193 | .158 | .023 | .115 | 1.000 |


| | summer_products_temp | Frequency_of_purchase | Holiday_products | Folder | TV_support | log_absolute_discount | Plus | Two_for | Display_1 | winter_products_temp | C1000 | log_growth_number_selling_points | Premiaat | SPO | ln_LF_former_promotions_EAN | SCC_and_vitality_shots |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| summer_products_temp | 1.000 | .018 | .364 | .038 | .063 | .052 | .045 | .021 | .084 | .031 | .137 | .036 | .015 | .050 | .308 | .122 |
| Frequency_of_purchase | .018 | 1.000 | .014 | .048 | .024 | .095 | .054 | .034 | .013 | .152 | .043 | .018 | .023 | .033 | .235 | .559 |
| Holiday_products | .364 | .014 | 1.000 | .029 | .056 | .002 | .035 | .021 | .003 | .000 | .069 | .024 | .011 | .002 | .002 | .043 |
| Folder | .038 | .048 | .029 | 1.000 | .081 | .036 | .140 | .084 | .300 | .031 | .216 | .051 | .061 | .031 | .109 | .003 |
| TV_support | .063 | .024 | .056 | .081 | 1.000 | .027 | .190 | .046 | .068 | .019 | .196 | .005 | .035 | .046 | .111 | .036 |
| log_absolute_discount | .052 | .095 | .002 | .036 | .027 | 1.000 | .027 | .118 | .147 | .103 | .096 | .096 | .082 | .231 | .096 | .201 |
| Plus | .045 | .054 | .035 | .140 | .190 | .027 | 1.000 | .059 | .052 | .023 | .449 | .236 | .018 | .020 | .100 | .020 |
| Two_for | .021 | .034 | .021 | .084 | .046 | .118 | .059 | 1.000 | .051 | .060 | .031 | .067 | .210 | .646 | .049 | .021 |
| Display_1 | .084 | .013 | .003 | .300 | .068 | .147 | .052 | .051 | 1.000 | .016 | .006 | .052 | .035 | .008 | .129 | .102 |
| winter_products_temp | .031 | .152 | .000 | .031 | .019 | .103 | .023 | .060 | .016 | 1.000 | .028 | .010 | .041 | .040 | .086 | .212 |
| C1000 | .137 | .043 | .069 | .216 | .196 | .096 | .449 | .031 | .006 | .028 | 1.000 | .024 | .034 | .159 | .056 | .034 |
| log_growth_number_selling_points | .036 | .018 | .024 | .051 | .005 | .096 | .236 | .067 | .052 | .010 | .024 | 1.000 | .055 | .055 | .081 | .037 |
| Premiaat | .015 | .023 | .011 | .061 | .035 | .082 | .018 | .210 | .035 | .041 | .034 | .055 | 1.000 | .181 | .055 | .055 |
| SPO | .050 | .033 | .002 | .031 | .046 | .231 | .020 | .646 | .008 | .040 | .159 | .055 | .181 | 1.000 | .081 | .028 |
| ln_LF_former_promotions_EAN | .308 | .235 | .002 | .109 | .111 | .096 | .100 | .049 | .129 | .086 | .056 | .081 | .055 | .081 | 1.000 | .156 |
| SCC_and_vitality_shots | .122 | .559 | .043 | .003 | .036 | .201 | .020 | .021 | .102 | .212 | .034 | .037 | .055 | .028 | .156 | 1.000 |
| Promotion_pressure | .051 | .021 | .007 | .017 | .019 | .292 | .015 | .032 | .170 | .072 | .045 | .107 | .070 | .037 | .254 | .066 |
| log_size_of_product | .117 | .040 | .019 | .052 | .016 | .117 | .026 | .007 | .027 | .031 | .001 | .031 | .016 | .011 | .212 | .339 |
| Promo_length | .029 | .013 | .006 | .092 | .007 | .015 | .040 | .149 | .043 | .018 | .031 | .006 | .050 | .050 | .048 | .008 |
| Savoury_and_dressings | .150 | .202 | .027 | .041 | .084 | .396 | .041 | .078 | .098 | .016 | .002 | .036 | .067 | .117 | .086 | .678 |
| Free_product | .009 | .034 | .014 | .062 | .065 | .054 | .046 | .111 | .022 | .037 | .010 | .064 | .682 | .094 | .066 | .046 |
| Percentage_repeat_buyers | .045 | -.604 | .002 | .060 | .014 | .078 | .024 | .101 | .038 | .172 | .033 | .131 | .012 | .152 | .013 | .261 |
| Three_for | .069 | .010 | .025 | .067 | .109 | .143 | .019 | .625 | .096 | .117 | .022 | .052 | .164 | .686 | .029 | .029 |
| Ice_and_beverages | -.675 | .027 | .455 | .030 | .015 | .216 | .038 | .050 | .129 | .093 | .136 | .001 | .036 | .164 | .237 | .257 |
| Preservability | .199 | .001 | .055 | .097 | .134 | .013 | .137 | .030 | .079 | .215 | .131 | .044 | .047 | .035 | .059 | .571 |
| Market_penetration | .046 | .246 | .025 | .096 | .030 | .260 | .034 | .134 | .042 | .256 | .058 | .001 | .019 | .188 | .281 | .100 |
| Personalcare | .003 | .156 | .000 | .091 | .122 | .188 | .046 | .114 | .164 | .068 | .099 | .145 | .007 | .160 | .001 | .297 |
| Number_of_products_in_promotion | .000 | .026 | .005 | .105 | .175 | .100 | .228 | .255 | .111 | .013 | .134 | .005 | .272 | .276 | .030 | .057 |
| Kruidvat | .013 | .059 | .000 | .106 | .043 | .128 | .178 | .165 | .146 | .014 | .154 | .023 | .227 | .093 | .061 | .011 |
| Percentual_discount | .074 | .051 | .018 | .044 | .000 | -.759 | .180 | .019 | .030 | .100 | .061 | .012 | .196 | .137 | .035 | .111 |
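One possible way to use the correlation matrix above is to screen it for variable pairs whose correlation is large in absolute value, since such pairs can make the individual regression coefficients harder to interpret. The sketch below does this for a few entries copied from the matrix; the cut-off of 0.6 (`THRESHOLD`) and the function name are assumptions chosen for illustration, not values used in the thesis.

```python
# A few (variable, variable, correlation) entries copied from the matrix above.
PAIRS = [
    ("Percentual_discount", "log_absolute_discount", -0.759),
    ("Three_for", "SPO", 0.686),
    ("Premiaat", "Free_product", 0.682),
    ("SCC_and_vitality_shots", "Savoury_and_dressings", 0.678),
    ("Ice_and_beverages", "summer_products_temp", -0.675),
    ("Two_for", "SPO", 0.646),
    ("Two_for", "Three_for", 0.625),
    ("Frequency_of_purchase", "Percentage_repeat_buyers", -0.604),
    ("SCC_and_vitality_shots", "Preservability", 0.571),
    ("Plus", "C1000", 0.449),
    ("Folder", "Display_1", 0.300),
]

THRESHOLD = 0.6  # hypothetical cut-off for a "strong" correlation


def strongly_correlated(pairs, threshold=THRESHOLD):
    """Return the pairs with |correlation| >= threshold, strongest first."""
    flagged = [pair for pair in pairs if abs(pair[2]) >= threshold]
    return sorted(flagged, key=lambda pair: abs(pair[2]), reverse=True)


flagged = strongly_correlated(PAIRS)
for first, second, corr in flagged:
    print(f"{first} / {second}: {corr:+.3f}")
```

Because the screen works on absolute values, it treats strong positive and strong negative correlations alike.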


Appendix 8: Results linear regression model Promo mechanism

Model Summary

| Model | R (All_2009_w_o_Magnum = 1, Selected) | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .371 (a) | .138 | .134 | .60483142 |

a. Predictors: (Constant), Premiaat, SPO, Free_product, Three_for, Two_for

ANOVA (b, c)

| Model 1 | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 56.338 | 5 | 11.268 | 30.801 | .000 (a) |
| Residual | 351.920 | 962 | .366 | | |
| Total | 408.258 | 967 | | | |

a. Predictors: (Constant), Premiaat, SPO, Free_product, Three_for, Two_for
b. Dependent Variable: ln_LF_promotions
c. Selecting only cases for which All_2009_w_o_Magnum = 1
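The figures in the Model Summary follow directly from the sums of squares and degrees of freedom in the ANOVA table. The sketch below recomputes them with the standard textbook definitions; the function name `model_summary` is chosen for this example only, and small differences in the last digit can arise because the inputs are rounded.

```python
import math

# Sums of squares and degrees of freedom copied from the ANOVA table above.
SS_REGRESSION, DF_REGRESSION = 56.338, 5
SS_RESIDUAL, DF_RESIDUAL = 351.920, 962


def model_summary(ss_reg, df_reg, ss_res, df_res):
    """Derive R, R Square, Adjusted R Square, Std. Error of the Estimate and F."""
    ss_total = ss_reg + ss_res
    df_total = df_reg + df_res            # equals n - 1
    ms_residual = ss_res / df_res         # residual mean square
    r_square = ss_reg / ss_total
    # Adjusted R Square compares residual and total variance per degree of freedom.
    adjusted_r_square = 1.0 - ms_residual / (ss_total / df_total)
    return {
        "R": math.sqrt(r_square),
        "R Square": r_square,
        "Adjusted R Square": adjusted_r_square,
        "Std. Error of the Estimate": math.sqrt(ms_residual),
        "F": (ss_reg / df_reg) / ms_residual,
    }


summary = model_summary(SS_REGRESSION, DF_REGRESSION, SS_RESIDUAL, DF_RESIDUAL)
for label, value in summary.items():
    print(f"{label:28s} {value:.4f}")
```

The low R Square indicates that the promotion mechanism alone explains only a small share of the variance in ln_LF_promotions.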

Coefficients (a, b)

| Model 1 | Unstandardized B | Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 1.660 | .075 | | 22.039 | .000 |
| SPO | .176 | .090 | .091 | 1.970 | .049 |
| Two_for | .167 | .081 | .125 | 2.061 | .040 |
| Three_for | .219 | .083 | .140 | 2.648 | .008 |
| Free_product | .421 | .081 | .234 | 5.178 | .000 |
| Premiaat | .560 | .096 | .260 | 5.805 | .000 |

a. Dependent Variable: ln_LF_promotions
b. Selecting only cases for which All_2009_w_o_Magnum = 1
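Because the dependent variable is the natural logarithm of the lift factor, a fitted value has to be exponentiated to obtain a lift factor. The sketch below does this for the promo mechanism model above. It assumes that each promotion uses at most one of the five mechanisms, so that a promotion with none of them corresponds to the constant alone, and it ignores the retransformation bias that arises when a log-scale prediction is exponentiated. The function name is hypothetical.

```python
import math

# Unstandardized coefficients copied from the promo mechanism model above.
CONSTANT = 1.660
MECHANISM_B = {
    "SPO": 0.176,
    "Two_for": 0.167,
    "Three_for": 0.219,
    "Free_product": 0.421,
    "Premiaat": 0.560,
}


def predicted_lift_factor(mechanism=None):
    """Exponentiate the fitted ln(LF); None means none of the five dummies is set."""
    ln_lf = CONSTANT
    if mechanism is not None:
        ln_lf += MECHANISM_B[mechanism]
    return math.exp(ln_lf)


print(f"{'(none of the five)':20s} LF = {predicted_lift_factor():.2f}")
for name in MECHANISM_B:
    print(f"{name:20s} LF = {predicted_lift_factor(name):.2f}")
```

Each coefficient acts multiplicatively on the lift factor: a dummy with coefficient B scales the baseline prediction by exp(B).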


Appendix 9: Results full linear regression model with P(LF) as dependent variable.

Model Summary (P(LF) as dependent variable)

| Model | R (All_2009_w_o_magnum = 1, Selected) | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .829 (a) | .687 | .677 | .15748 |
| 2 | .829 (b) | .687 | .677 | .15740 |
| 3 | .829 (c) | .687 | .677 | .15732 |
| 4 | .829 (d) | .687 | .678 | .15724 |
| 5 | .829 (e) | .687 | .678 | .15718 |
| 6 | .829 (f) | .686 | .678 | .15712 |
| 7 | .828 (g) | .686 | .678 | .15708 |
| 8 | .828 (h) | .686 | .678 | .15705 |
| 9 | .828 (i) | .686 | .679 | .15702 |
| 10 | .828 (j) | .686 | .679 | .15699 |
| 11 | .828 (k) | .685 | .679 | .15700 |
| 12 | .827 (l) | .685 | .678 | .15707 |
| 13 | .827 (m) | .684 | .678 | .15716 |

Statistics: P(LF)

| Statistic | Value |
|---|---|
| N Valid | 1235 |
| N Missing | 1 |
| Std. Error of Mean | .00830 |
| Std. Deviation | .29156 |
| Variance | .085 |
| Skewness | .122 |
| Std. Error of Skewness | .070 |
| Kurtosis | 1.121 |
| Std. Error of Kurtosis | .139 |
| Range | 1.00 |
| Minimum | .00 |
| Maximum | 1.00 |
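The standard errors in the statistics table above depend only on the number of valid cases and the standard deviation. As a consistency check, the sketch below recomputes them with the usual sample formulas for the standard error of the mean, skewness and kurtosis; the function names are chosen for this example, and it is assumed that the table was produced with these same formulas.

```python
import math

# Values copied from the statistics table above.
N_VALID = 1235
STD_DEVIATION = 0.29156


def se_mean(std_deviation, n):
    """Standard error of the mean."""
    return std_deviation / math.sqrt(n)


def se_skewness(n):
    """Standard error of the sample skewness."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of the sample (excess) kurtosis."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


print(f"Std. Error of Mean     {se_mean(STD_DEVIATION, N_VALID):.5f}")
print(f"Variance               {STD_DEVIATION ** 2:.3f}")
print(f"Std. Error of Skewness {se_skewness(N_VALID):.3f}")
print(f"Std. Error of Kurtosis {se_kurtosis(N_VALID):.3f}")
```

Dividing the skewness and kurtosis by their standard errors gives a rough indication of whether the distribution of P(LF) departs from normality.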
