Machine Learning Based Prediction and Classification for Uplift Modeling

DEGREE PROJECT IN MATHEMATICS, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2020

LOVISA BÖRTHAS
JESSICA KRANGE SJÖLANDER

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ENGINEERING SCIENCES

Degree Projects in Mathematical Statistics (30 ECTS credits)
Degree Programme in Applied and Computational Mathematics (120 credits)
KTH Royal Institute of Technology year 2020
Supervisor at KTH: Tatjana Pavlenko
Examiner at KTH: Tatjana Pavlenko

TRITA-SCI-GRU 2020:002
MAT-E 2020:02

Royal Institute of Technology
School of Engineering Sciences
KTH SCI
SE-100 44 Stockholm, Sweden
URL: www.kth.se/sci

Abstract

The desire to model the true gain from targeting an individual for marketing purposes has led to the common use of uplift modeling. Uplift modeling requires both a treatment group and a control group, and the objective hence becomes estimating the difference between the success probabilities in the two groups. Statistical machine learning methods are efficient for estimating these probabilities. In this project the uplift modeling approaches Subtraction of Two Models, Modeling Uplift Directly and the Class Variable Transformation are investigated. The statistical machine learning methods applied are Random Forests and Neural Networks, along with the standard method Logistic Regression. The data were collected from a well-established retail company, and the purpose of the project is thus to investigate which uplift modeling approach and statistical machine learning method yield the best performance given the data used in this project.
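The Class Variable Transformation named above turns the two-group problem into a single binary classification problem. The block is a minimal sketch under stated assumptions, not the thesis's implementation: treatment and control groups are of equal size (the basic transformation relies on $P(T = 1) = P(T = 0) = 1/2$), the data are a hypothetical toy set with one binary feature, and the per-value empirical mean of $Z$ stands in for the logistic regression classifier the abstract pairs with this approach. The function names are illustrative.

```python
import numpy as np


def transform_class_variable(y, treated):
    """Class Variable Transformation: Z = 1 for treated responders and for
    control non-responders, Z = 0 for everyone else."""
    y = np.asarray(y, dtype=int)
    treated = np.asarray(treated, dtype=bool)
    return np.where(treated, y, 1 - y)


def uplift_from_z_probability(p_z):
    """Convert P(Z = 1 | x) into an uplift estimate.

    Assumes P(T = 1) = P(T = 0) = 1/2, in which case
    P(Y = 1 | x, T = 1) - P(Y = 1 | x, T = 0) = 2 P(Z = 1 | x) - 1.
    """
    return 2.0 * np.asarray(p_z, dtype=float) - 1.0


def build_example():
    """Hypothetical toy data: one binary feature x and equal-sized groups."""
    rows = []  # each row is (x, treated, y)
    # x = 0: 3 of 10 treated and 3 of 10 control customers respond (no uplift)
    rows += [(0, 1, 1)] * 3 + [(0, 1, 0)] * 7 + [(0, 0, 1)] * 3 + [(0, 0, 0)] * 7
    # x = 1: 6 of 10 treated but only 2 of 10 control customers respond
    rows += [(1, 1, 1)] * 6 + [(1, 1, 0)] * 4 + [(1, 0, 1)] * 2 + [(1, 0, 0)] * 8
    x, treated, y = (np.array(col) for col in zip(*rows))
    return x, treated, y


x, treated, y = build_example()
z = transform_class_variable(y, treated)
# Stand-in for a fitted classifier: the empirical P(Z = 1 | x) for each value of x
p_z = np.array([z[x == v].mean() for v in (0, 1)])
print(uplift_from_z_probability(p_z))  # close to [0.0, 0.4]
```

Because $Z$ is a single binary target, any probabilistic classifier can be trained on it in place of the group means, which is how this approach models the uplift with one model instead of two. If the groups differ in size, the formula above does not hold as written; reweighting or resampling the data to balance the groups is one common remedy.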
The variable selection step proved to be a crucial component of the modeling process, as did the amount of control data in each data set. For the uplift modeling to be successful, the method of choice should be either Modeling Uplift Directly using Random Forests or the Class Variable Transformation using Logistic Regression. Neural network-based approaches are sensitive to uneven class distributions and are hence not able to obtain stable models given the data used in this project. Furthermore, the Subtraction of Two Models did not perform well because each model tended to focus too much on modeling the class in each data set separately instead of modeling the difference between the class probabilities. The conclusion is hence to use an approach that models the uplift directly, and to use a large amount of control data in the data sets.

Keywords
Uplift Modeling, Data Pre-Processing, Predictive Modeling, Random Forests, Ensemble Methods, Logistic Regression, Machine Learning, Multi-Layer Perceptron, Neural Networks.

Abstract

The need to model the true gain of targeted marketing has led to the now commonly used method of incremental response analysis. Performing this type of analysis requires an existing test group as well as a control group, and the goal is thus to compute the difference between the positive outcomes in the two groups. The probabilities of the positive outcomes in the two groups can be estimated efficiently with statistical machine learning methods. The incremental response analysis methods investigated in this project are subtraction of two models, modeling the incremental response directly, and a class variable transformation. The statistical machine learning methods applied are random forests and neural networks, along with the standard method logistic regression. The data were collected from a well-established retail company, and the goal is therefore to investigate which incremental response analysis method and machine learning method perform best given the data in this project. The most decisive aspects for obtaining a good result turned out to be the variable selection and the amount of control data in each data set. For a successful result, the machine learning method of choice should be either random forests, used to model the incremental response directly, or logistic regression together with a class variable transformation. Neural network methods are sensitive to uneven class distributions and are therefore unable to obtain stable models with the given data. Furthermore, subtraction of two models performed poorly because each model tended to focus too much on modeling the class in each data set separately, instead of modeling the difference between them. The conclusion is thus that a method that models the incremental response directly, together with a relatively large control group, is preferable for obtaining a stable result.

Acknowledgements

We would like to thank Mattias Andersson at Friends & Insights, the key person who made this project happen in the first place. Many thanks for introducing us to the uplift modeling technique and for suggesting our thesis project to the CRM department at the retail company. We would also like to thank Elin Thiberg at the retail company, who supervised us when needed and gladly answered every question we had about the structure of the different data sets. Sara Grünewald, also at the retail company, supported us and guided us in the right direction, and for that we are truly grateful. Last but not least, we would like to send a big thank you to our examiner and supervisor, Professor Tatjana Pavlenko, for her professional advice and for guiding us during our meetings.
Contents
1 Introduction
1.1 Background
1.2 Problem
1.3 Purpose and Goal
1.3.1 Ethics
1.4 Data
1.5 Methodology
1.6 Delimitations and Challenges
1.7 Outline
2 Theoretical Background and Related Work
3 Data
3.1 Markets and Campaigns
3.2 Variables
4 Methods and Theory
4.1 Data Pre-Processing
4.1.1 Data Cleaning
4.1.2 Variable Selection and Dimension Reduction
4.1.3 Binning of Variables
4.2 Uplift Modeling
4.2.1 Subtraction of Two Models
4.2.2 Modeling Uplift Directly
4.2.3 Class Variable Transformation
4.3 Classification and Prediction
4.3.1 Logistic Regression
4.3.2 Random Forests
4.3.3 Neural Networks
4.3.4 Cross Validation
4.4 Evaluation
4.4.1 ROC Curve
4.4.2 Qini Curve
4.5 Programming Environment of Choice
5 Experiments and Results
5.1 Data Pre-Processing
5.1.1 Data Cleaning
5.2 Uplift Modeling and Classification
5.2.1 Random Forests
5.2.2 Logistic Regression
5.2.3 Neural Networks
5.2.4 Cutoff for Classification of Customers
6 Conclusions
6.1 Discussion
6.2 Future Work
6.3 Final Words
References

1 Introduction

This thesis begins with a general introduction to the area of the degree project, presented in the following subsections.

1.1 Background

In retail and marketing, predictive modeling is a common tool for targeting individuals and evaluating their response when an action is taken. The action normally refers to a campaign or offer sent out to the customers, and the response to model is the likelihood that a specific customer will act on the offer. Put differently, in traditional response models the objective is to predict the conditional class probability $P(Y = 1 \mid X = x)$, where the response $Y \in \{0, 1\}$ reflects whether a customer responded positively to an action (i.e. made a purchase) or not (i.e. did not make a purchase). $X = (X_1, \ldots, X_p)$ denotes the quantitative and qualitative attributes of the customer, and $x$ is one observation. The resulting classifier can then be used to select which customers to target when sending out campaigns or offers for marketing purposes. In reality, this is not always the desirable approach, since the targeted customers are simply those most likely to react positively after the offer has been sent out, whether or not the offer caused that reaction. The solution is thus to use a second-order approach known as uplift modeling. The original idea behind uplift modeling is to use two separate training and test sets, namely one training and test set containing a treatment group and one training and test set containing a control group.
The customers in the treatment group are subject to an action whereas the customers in the control group are not. Uplift modeling thus aims at modeling
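The two separate training sets described in the background map directly onto the Subtraction of Two Models approach listed in the abstract: one response model estimates $P(Y = 1 \mid X = x)$ on the treatment set, another estimates it on the control set, and the uplift is their difference. This is a minimal sketch, not the thesis's implementation. It assumes a single binary feature, hypothetical toy data, and a hand-written unregularized logistic regression trained by batch gradient descent in place of a library implementation; all function names are illustrative.

```python
import numpy as np


def add_intercept(X):
    """Prepend a column of ones so the first weight acts as the intercept."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(len(X)), X])


def fit_logistic_regression(X, y, lr=1.0, n_iter=5000):
    """Unregularized logistic regression fitted by batch gradient descent
    on the mean log-loss."""
    Xb = add_intercept(X)
    y = np.asarray(y, dtype=float)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def predict_proba(w, X):
    """Estimated P(Y = 1 | X = x) for each row of X."""
    return 1.0 / (1.0 + np.exp(-(add_intercept(X) @ w)))


def two_model_uplift(X_t, y_t, X_c, y_c, X_new):
    """Subtraction of Two Models: P_T(Y = 1 | x) - P_C(Y = 1 | x)."""
    w_t = fit_logistic_regression(X_t, y_t)  # response model, treatment set
    w_c = fit_logistic_regression(X_c, y_c)  # response model, control set
    return predict_proba(w_t, X_new) - predict_proba(w_c, X_new)


def toy_group(n0, r0, n1, r1):
    """Hypothetical data with one binary feature: n0 customers with x = 0
    (r0 of whom respond) and n1 customers with x = 1 (r1 respond)."""
    X = np.array([0.0] * n0 + [1.0] * n1).reshape(-1, 1)
    y = np.array([1.0] * r0 + [0.0] * (n0 - r0) + [1.0] * r1 + [0.0] * (n1 - r1))
    return X, y


X_t, y_t = toy_group(10, 3, 10, 6)  # treatment: response rates 0.3 and 0.6
X_c, y_c = toy_group(10, 3, 10, 2)  # control: response rates 0.3 and 0.2
X_new = np.array([[0.0], [1.0]])
print(two_model_uplift(X_t, y_t, X_c, y_c, X_new))  # close to [0.0, 0.4]
```

With one binary feature and an intercept, each fitted model reproduces the observed response rate of its own group, so the predicted uplift is close to the empirical values 0 and 0.4. The sketch also shows the weakness the abstract attributes to this approach: each model is trained to predict the response within its own group, and neither is trained on the difference itself.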