FACULTEIT ECONOMISCHE EN SOCIALE WETENSCHAPPEN & SOLVAY BUSINESS SCHOOL
ES-Working Paper no. 12
THE CASE FOR PRESCRIPTIVE ANALYTICS: A NOVEL MAXIMUM PROFIT MEASURE FOR EVALUATING AND COMPARING CUSTOMER CHURN PREDICTION AND UPLIFT MODELS
Floris Devriendt and Wouter Verbeke
April 30th, 2018
Vrije Universiteit Brussel – Pleinlaan 2, 1050 Brussel – www.vub.be – [email protected] © Vrije Universiteit Brussel
This text may be downloaded for personal research purposes only. Any additional reproduction for other purposes, whether in hard copy or electronically, requires the consent of the author(s), editor(s). If cited or quoted, reference should be made to the full name of the author(s), editor(s), title, the working paper or other series, the year and the publisher.
Printed in Belgium
Vrije Universiteit Brussel
Faculty of Economics, Social Sciences and Solvay Business School
B-1050 Brussel
Belgium www.vub.be
The case for prescriptive analytics: a novel maximum profit measure for evaluating and comparing customer churn prediction and uplift models
Floris Devriendt (a,*), Wouter Verbeke (a)
(a) Data Analytics Laboratory, Faculty of Economic and Social Sciences and Solvay Business School, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussels, Belgium
Abstract
Prescriptive analytics and uplift modeling are receiving increasing attention from the business analytics research community and from industry as an improved alternative to predictive analytics for supporting data-driven decision-making. Although it has been shown in theory that prescriptive analytics improves decision-making more than predictive analytics does, the literature offers no empirical evidence from an elaborated application of both approaches that allows for a fair comparison of predictive and uplift modeling. Such a comparison is in fact hindered by the lack of evaluation measures that can be applied to both predictive and uplift models. Therefore, in this paper, we introduce a novel evaluation metric, the maximum profit uplift measure, which assesses the performance of an uplift model in terms of the maximum potential profit that can be achieved by adopting it. The measure is developed for evaluating customer churn uplift models and extends the existing maximum profit measure for evaluating customer churn prediction models. Both measures are subsequently applied to a case study to assess and compare the performance of customer churn prediction and uplift models. We find that uplift modeling outperforms predictive modeling and allows one to enhance the profitability of retention campaigns. The empirical results indicate that prescriptive analytics are superior to predictive analytics in the development of customer retention campaigns.
Keywords: Analytics, Business applications, Prescriptive analytics, Uplift modeling, Customer churn prediction, Customer retention
*Corresponding author. Email addresses: [email protected] (Floris Devriendt), [email protected] (Wouter Verbeke)
Preprint submitted to European Journal of Information Sciences, April 9, 2018

1. Introduction
The term business analytics is used as a catch-all term covering a wide variety of what are essentially data-processing techniques. In its broadest sense, business analytics strongly overlaps with data science, statistics, and related fields such as artificial intelligence (AI) and machine learning [1]. Analytics serves as a toolbox of instruments and methodologies for analyzing data in support of evidence-based decision-making, with the aim of enhancing efficiency, efficacy and, ultimately, profitability. In increasing order of the value they offer, the types of analytical tools are descriptive, predictive, and prescriptive analytics. While descriptive analytics offer insight into current situations, predictive analytics allow one to explain complex relations between variables and to predict future trends. As such, predictive analytics offer more uses than descriptive analytics. Prescriptive analytics are currently receiving increasing attention from practitioners and scientists because they add further value: they allow one to simulate the future as a function of control variables and thereby to prescribe optimal settings for those variables. At the core of prescriptive analytics is uplift modeling, which is introduced below. In the experiments reported in this article, the use and performance of predictive and prescriptive analytics are thoroughly compared.

Business analytics is being applied to an increasingly diverse range of well-specified tasks across a broad variety of industries. Popular examples include credit scoring [2, 3], fraud detection [4], and customer churn prediction [5, 6], the latter being the application of interest in this article.
Customer churn prediction models are designed to predict which customers are about to churn and to accurately segment a customer base. This allows a company to target the customers who are most likely to churn in a retention marketing campaign, thus making more efficient use of the limited resources available for such a campaign, i.e., improving the return on marketing investment (ROMI), while reducing the costs associated with churning [7]. Generally speaking, customer retention is profitable to a company because (1) attracting new clients costs five to six times more than retaining existing customers [8–11]; (2) long-term customers generate more profits, tend to be less sensitive to competitive marketing activities, tend to be less costly to serve, and may generate new referrals through positive word-of-mouth, whereas dissatisfied customers might spread negative word-of-mouth [12–17]; and (3) losing customers incurs opportunity costs due to a reduction in sales [18]. Therefore, a small improvement in customer retention can lead to a significant increase in profits [19].
However, it has been reported that marketing actions undertaken to retain customers may actually provoke the opposite behavior and cause or motivate a customer to churn. As noted in Radcliffe and Simpson [20], churn risk is highly correlated with customer dissatisfaction, so the goal becomes to prevent a dissatisfied customer from actually leaving. Any attempt to contact a dissatisfied customer with the goal of retaining him or her can hasten the process and provoke the customer to leave earlier than expected [20]. Therefore, it is necessary to evaluate the effectiveness of a retention campaign at the individual customer level. Predictive models fail to differentiate between customers who respond favorably to a campaign (i.e., who do not churn because of it) and customers who respond favorably of their own accord regardless of a campaign (i.e., who would not have churned in any case, even without being targeted).
To address this shortcoming of predictive models, uplift modeling has recently been proposed as a means of identifying customers who are likely to be persuaded by a promotional marketing campaign, rather than predicting whether customers are likely to respond to such a campaign (a response that may or may not be caused by the campaign). Uplift modeling can be applied to identify customers who are likely to be retained through a retention campaign, as an alternative to predicting whether customers are likely to churn [21]. More precisely, uplift modeling aims at establishing the net difference in customer behavior resulting from a specific treatment applied to customers, e.g., the reduction in the likelihood of churning that results from being targeted by a retention campaign.
In this paper, we aim to contrast customer churn prediction (CCP) and customer churn uplift (CCU) modeling for customer retention by comparing their performance in an experimental case study in the financial industry. A fair comparison of these approaches requires a common evaluation procedure. However, because these models produce different forms of output, prediction and uplift models are typically evaluated with different performance measures. Classification models, and CCP models in particular, are typically evaluated using the receiver operating characteristic (ROC) curve or the lift curve, with performance expressed as the area under the ROC curve, the top-decile lift or the (expected) maximum profit. Uplift models are typically evaluated using the Qini curve and uplift-per-decile plots, with performance reported in terms of the Qini index or the top-decile uplift. As the goal of customer churn modeling is to maximize ROMI, Verbeke et al. [22] introduced the maximum profit (MP) measure for evaluating CCP models. The MP measure calculates the profit generated when the optimal fraction of customers ranked highest by the CCP model is included in a retention campaign. It thus allows one to determine both the optimal model and the optimal fraction of customers to include, yielding a significant increase in profitability relative to model selection based on statistical measures [22–24].
In this article, we extend the MP measure to evaluate the performance of CCU models by introducing the maximum profit for uplift (MPU) measure. Both the MP and MPU measures are then used to compare the performance of CCP and CCU logistic regression and random forest models in an experimental case study. Our main contributions are threefold:
1. We introduce an application of uplift modeling for customer retention.
2. We extend the maximum profit measure for evaluating uplift models.
3. We apply and compare CCP and CCU models in an experimental case study in the financial industry.
This paper is structured as follows. In Section 2, we first introduce customer churn prediction modeling before discussing uplift modeling as an alternative approach to predictive modeling. Then in Section 3, the MP measure for CCP models is defined and extended for application to customer churn uplift models. In Section 4, we describe the experimental design of the case study and then discuss the results of our experiments. Finally, in Section 5, conclusions are given.
2. Literature
In Section 2.1, customer churn prediction is introduced along with current standard approaches as described in the literature and adopted in industry. Then in Section 2.2, we describe uplift modeling and discuss the most prominent uplift modeling techniques and performance measures developed for evaluating uplift models.
2.1. Customer Churn Prediction
Customer churning, which is also referred to as customer attrition or customer defection, is defined as the loss or outflow of customers from the customer base [25]. In saturated markets, there are limited opportunities to attract new customers, and hence, retaining existing customers is considered essential to maintaining profitability. In the telecommunications industry, it is estimated that attracting a new customer costs five to six times more than retaining an existing customer [8, 22, 26]. Established customers are more profitable due to the lower costs required to serve them, and a sense of brand loyalty they have developed over time renders them less likely to churn. Loyal customers tend to be satisfied customers who also serve as word-of-mouth advertisers, referring new customers to a given company. In the context of a financial institution as described in the case study given in Section 4, a definition of churning is naturally present in the data, i.e., contract termination.
Churning is typically addressed by developing a prediction model, i.e., a classification model such as a logistic regression or decision tree model, which estimates for each customer the probability of churning during a subsequent period of time. The customers with the highest churn probabilities can then be offered an incentive, e.g., a discount or another promotional offer, to encourage them to extend their contracts or keep their accounts active. In other words, customers who are likely to churn can be targeted through a retention campaign. Accurate predictions are perhaps the most apparent goal of developing a customer churn prediction model, but determining the reasons for (or at least indicators of) churning is also invaluable to a company. Comprehensible models can offer novel insight into correlations between customer behavior and the propensity to churn [7], allowing management to address the factors leading to churning and to target customers before they decide to churn.
Numerous classification techniques have been adopted for churn prediction, including traditional statistical methods such as logistic regression [27, 28], and non-parametric models such as k-nearest neighbors [29], decision trees [30, 31], ensemble methods [5, 32], support vector machines [33–35] and neural networks [22, 36, 37]. Additionally, social network analysis has been successfully adopted to predict customer churning [6, 26, 38], as has survival analysis, which can be used to estimate the timing of churning; such analyses focus on the profitability of a customer's lifetime rather than on a single moment in time [39, 40]. For an extensive literature review on customer churn prediction modeling, one may refer to Verbeke et al. [7]. The results of an extensive benchmarking experiment reported in Verbeke et al. [22] confirm the no-free-lunch theorem in its application to customer churn prediction: no modeling technique consistently wins across the various datasets. Recent work on customer churn prediction is covered in [6, 41–44].
2.2. Uplift Modeling
In Section 2.2.1, a brief introduction of uplift modeling is provided. In Section 2.2.2, an overview of the most prominent uplift modeling techniques is presented. Finally, in Section 2.2.3, evaluation measures for assessing the performance of uplift models are discussed.
2.2.1. Definition
Generally speaking, uplift modeling aims to establish the net effect of applying a treatment on an outcome. When adopted for customer relationship management and, more specifically, for response modeling, uplift models are developed to differentiate between customers who respond favorably as a result of being targeted with a campaign, i.e., being treated, and customers who respond favorably of their own accord, regardless of whether they are targeted with a campaign. Note that the outcome, i.e., the response, may mean that a customer begins or continues to purchase a product or service, in the case of acquisition and retention modeling, respectively, or that a customer purchases more or additional products or services, in the case of up-sell or cross-sell modeling, respectively.
Conceptually, a customer base can be divided into four categories along two dimensions (whether a customer responds when treated and whether he or she responds when not treated), as shown in Figure 1 [1, 45]:
1. Sure Things. Customers who would always respond. Targeting Sure Things does not generate additional returns but does generate additional costs, i.e., the fixed cost of contacting a customer and possibly the cost of a financial incentive offered to targeted customers.
2. Lost Causes. Customers who would never respond, regardless of which campaign is used. Lost Causes do not generate additional revenues, yet they do generate additional costs, although these are lower than the costs of Sure Things: Lost Causes do not take advantage of the financial incentives offered, an additional cost that does apply to Sure Things.
3. Do-Not-Disturbs. Customers who fail to respond only because they are exposed to a campaign. They respond when not targeted but do not respond when they are. For example, customers targeted by retention efforts can react adversely, e.g., by withdrawing from the delivered product or service. Including Do-Not-Disturbs in a campaign thus generates no additional revenues but comes with considerable additional costs.
4. Persuadables. Customers who respond only because they have been exposed to a campaign. They respond only when contacted and, unlike the other types of customers, cause a campaign to generate additional revenues and hence a net profit after the subtraction of costs.
Figure 1: The four theoretical classes.
The aim of uplift modeling is to allow for the targeting of Persuadables while avoiding Do-Not-Disturbs. In the context of a retention campaign, the latter category is sometimes referred to as sleeping dogs since, as long as these customers are not disturbed, they will continue to provide benefits. Note that this classification is campaign dependent. A customer may be a Lost Cause when a campaign offers a 5% discount on a next purchase, whereas that same customer is a Persuadable when a campaign offers a 20% discount. In other words, given that all targeted customers receive the same treatment, the classification depends on that treatment. In general, uplift modeling involves determining the optimal settings of control variables, such as a dummy treatment variable denoting whether a customer is targeted with a campaign, to optimize a result or effect. Although in most, if not all, studies on uplift modeling for marketing applications the control variables are dummy variables indicating whether a customer is targeted or not, these control variables may also be continuous or multivalued categorical variables, e.g., the discount or the contact channel. Clearly, uplift modeling can be applied in various settings and for many different purposes. In this article, we focus on the goal of customer retention.
Uplift modeling for customer retention has been documented in relatively few cases. Radcliffe and Simpson [20] applied uplift modeling to two retention campaigns in telecommunications. One campaign was highly effective and profitable, whereas the other was counter-productive and incurred losses. However, in both cases, uplift modeling improved results in terms of churn reduction. In Guelman et al. [21], the authors applied uplift modeling in an insurance setting. Although the treatment had an almost neutral impact on retention for the entire sample, they found that its impact might have differed for specific subgroups of the customer base. They reported that uplift modeling allowed them to predict the expected change in a customer's probability of switching to another company when targeted by a campaign. To the best of our knowledge, no study in the literature reports on the application of uplift modeling in the context of a financial institution or to churning from financial services.
We assume that a sample of customers is randomly divided into two groups, the treatment group and the control group. A customer is either in the treatment group, i.e., is exposed to the campaign, or in the control group, i.e., is not exposed to the campaign. Formally, let $X = \{X_1, \ldots, X_n\}$ be a vector of inputs or predictor variables, and let $Y \in \{0, 1\}$ be the binary outcome variable indicating whether a customer responds favorably ($Y = 1$) or not ($Y = 0$). Let the treatment variable $T \in \{0, 1\}$ denote whether a customer belongs to the treatment group, $T = 1$, or to the control group, $T = 0$. $P$ denotes the probability as estimated by the model. Uplift is then defined for customer $i$ with characteristics $x_i$ as the probability of responding favorably (i.e., $y_i = 1$) when treated (i.e., $t_i = 1$) minus the probability of responding favorably when not treated (i.e., $t_i = 0$):

$$U(x_i) := P(y_i = 1 \mid x_i; t_i = 1) - P(y_i = 1 \mid x_i; t_i = 0) \qquad (1)$$

In essence, uplift is the difference in outcome, e.g., customer behavior, resulting from a treatment. Uplift modeling aims at estimating uplift as a function of treatment and customer characteristics.
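Because treatment is assigned at random, Eq. (1) can be estimated for any group of customers by the difference between the observed response rates of its treated and control members. The sketch below illustrates this; the function name is ours and hypothetical, and for retention $y = 1$ is assumed to mean "did not churn".

```python
from typing import Sequence


def empirical_uplift(y: Sequence[int], t: Sequence[int]) -> float:
    """Estimate Eq. (1) for one group of customers from a randomized campaign.

    y[i] = 1 if customer i responded favorably, t[i] = 1 if customer i was treated.
    Under random assignment, the treated response rate minus the control response
    rate estimates P(y=1 | t=1) - P(y=1 | t=0) for this group.
    """
    treated = [yi for yi, ti in zip(y, t) if ti == 1]
    control = [yi for yi, ti in zip(y, t) if ti == 0]
    if not treated or not control:
        raise ValueError("the group needs both treated and control customers")
    return sum(treated) / len(treated) - sum(control) / len(control)
```

For example, if 3 of 4 treated customers and 1 of 4 control customers in a segment respond favorably, the estimated uplift of that segment is 0.75 - 0.25 = 0.5. Note that this is a group-level estimate. As discussed in Section 2.2.3, uplift cannot be observed for an individual customer.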
2.2.2. Techniques
Uplift modeling techniques can be grouped into data preprocessing and data processing approaches. The first group adopts traditional predictive analytics in an adapted setup for learning an uplift model, whereas the second group applies adapted predictive analytics in developing uplift models. Table 1 shows the most prominent and frequently adopted approaches to uplift modeling.
Data preprocessing approaches. Data preprocessing approaches include transformation approaches, which redefine the target variable, and approaches that estimate uplift by defining and selecting additional predictor variables.

Category          Approach                       References
Preprocessing     Transformation                 [46, 47]
Preprocessing     Variable Selection Procedure   [48, 49]
Data processing   Two-Model Approach             [50, 51]
Data processing   Direct Estimation              [52–54]

Table 1: Most frequently cited uplift modeling approaches.
The first group of data preprocessing approaches defines a transformed target variable that is then estimated. A customer cannot be assigned to any of the four groups shown in Figure 1, as this information is unavailable and cannot be retrieved. However, we do know whether a customer was part of the treatment or control group and whether the customer responded or not. Hence, customers can be assigned to one of the following four groups: treatment responders, treatment non-responders, control responders and control non-responders. Techniques such as Lai's approach [46, 47] and pessimistic uplift modeling [55] use these four groups to define a transformed target variable and thereby turn the uplift modeling problem into a binary classification problem. Any standard classification technique can then be applied to this problem to yield an uplift model.
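A minimal sketch of such a transformation, assuming (as we understand Lai's approach) that treatment responders and control non-responders form the positive class $Z = 1$. If additionally $P(T = 1) = 0.5$ independently of $x$, then $P(Z = 1 \mid x) = \frac{1}{2}P(Y = 1 \mid x, T = 1) + \frac{1}{2}P(Y = 0 \mid x, T = 0)$, so $2P(Z = 1 \mid x) - 1$ equals the uplift of Eq. (1). The equal-split assumption and the function names are ours, not the cited authors'.

```python
from typing import List, Sequence


def lai_transform(y: Sequence[int], t: Sequence[int]) -> List[int]:
    """Transformed target: 1 for treatment responders and control non-responders, else 0."""
    return [1 if (ti == 1 and yi == 1) or (ti == 0 and yi == 0) else 0
            for yi, ti in zip(y, t)]


def uplift_from_transformed(p_z: float) -> float:
    """Map P(Z = 1 | x) back to uplift; valid only if P(T = 1) = 0.5, independent of x."""
    return 2.0 * p_z - 1.0
```

In practice, $P(Z = 1 \mid x)$ would come from any standard classifier trained on `lai_transform(y, t)`. On a balanced segment where 3 of 4 treated and 1 of 4 control customers respond, 6 of the 8 transformed labels equal 1, and `uplift_from_transformed(0.75)` returns 0.5, matching the direct estimate.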
The second group of data preprocessing approaches extends the set of predictor variables of the model to allow for the estimation of uplift. In Kane et al. [47] and Lo [48], an uplift modeling approach is proposed that pools the treatment and control groups into a single sample for response model estimation. A dummy variable is introduced to denote the group of origin of each customer. A model is then developed from the original predictor variables, the added dummy variable and the interaction variables between the predictor and dummy variables. Any predictive modeling technique can be adopted within this setup to yield an uplift model.
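The setup above can be sketched as follows: each customer's feature vector is extended with the treatment dummy and the predictor-by-dummy interactions, a single model is fitted on the pooled sample, and uplift is obtained by scoring the customer once with the dummy set to 1 and once with it set to 0. The logistic model and its weights below are hypothetical placeholders for whatever fitted model is used.

```python
import math
from typing import Callable, List, Sequence

Predictor = Callable[[List[float]], float]


def augment(x: Sequence[float], t: int) -> List[float]:
    """Original predictors, the treatment dummy, and predictor-by-dummy interactions."""
    return [float(v) for v in x] + [float(t)] + [float(v) * t for v in x]


def interaction_uplift(predict: Predictor, x: Sequence[float]) -> float:
    """Score one customer twice: once as treated (dummy = 1), once as control (dummy = 0)."""
    return predict(augment(x, 1)) - predict(augment(x, 0))


def make_logistic(weights: Sequence[float], bias: float) -> Predictor:
    """Hypothetical fitted logistic regression on the augmented feature vector."""
    def predict(z: List[float]) -> float:
        score = bias + sum(w * zi for w, zi in zip(weights, z))
        return 1.0 / (1.0 + math.exp(-score))
    return predict
```

With two predictors, the augmented vector has five entries. A model whose only non-zero weight is $\ln 3$ on the treatment dummy predicts 0.75 when treated and 0.5 otherwise, so `interaction_uplift` returns 0.25 for every customer. Non-zero interaction weights are what allow the estimated uplift to vary across customers.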
Data processing approaches. Among the data processing approaches, a further distinction can be made between indirect and direct estimation approaches.
Indirect estimation approaches include the two-model or naive approach, a simple and intuitive approach to uplift modeling. Two separate predictive models are built: one for the treatment group, $M_T$, and one for the control group, $M_C$, both estimating the probability of a given response. The aggregated uplift model, $M_U$, then subtracts the response probabilities resulting from both models to obtain the uplift:
$$M_U = M_T - M_C \qquad (2)$$

This approach has the benefit of being straightforward to implement and, similar to both data preprocessing approaches, allows one to adopt standard predictive modeling techniques. However, the approach appears to work well only in the simplest of cases [50, 51]. Its main disadvantage is that the two models are built independently of one another; as such, they are not necessarily aligned in terms of the predictor variables included, and the errors of the independent estimates can reinforce one another, generating significant errors in the uplift estimates [53].
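A minimal sketch of Eq. (2), assuming customers are described by a single categorical segment. The "models" $M_T$ and $M_C$ are deliberately simple stand-ins (observed response rates per segment) for any classifier, such as a logistic regression, fitted separately on each group. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def fit_rate_model(segments: Sequence[Hashable], y: Sequence[int]) -> Dict[Hashable, float]:
    """Stand-in 'predictive model': the observed response rate per customer segment."""
    counts: Dict[Hashable, List[int]] = defaultdict(lambda: [0, 0])  # [responders, customers]
    for s, yi in zip(segments, y):
        counts[s][0] += yi
        counts[s][1] += 1
    return {s: r / n for s, (r, n) in counts.items()}


def two_model_uplift(segments: Sequence[Hashable], y: Sequence[int],
                     t: Sequence[int]) -> Dict[Hashable, float]:
    """Eq. (2): M_U = M_T - M_C, for segments observed in both groups."""
    rows = list(zip(segments, y, t))
    # M_T: fitted on treated customers only; M_C: fitted on control customers only.
    m_t = fit_rate_model([s for s, _, ti in rows if ti == 1], [yi for _, yi, ti in rows if ti == 1])
    m_c = fit_rate_model([s for s, _, ti in rows if ti == 0], [yi for _, yi, ti in rows if ti == 0])
    return {s: m_t[s] - m_c[s] for s in m_t.keys() & m_c.keys()}
```

Because each model is fitted without reference to the other, any estimation error in $M_T$ and $M_C$ passes straight into their difference. This is the weakness noted above, and it becomes more pronounced with small segments or noisy classifiers.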
Alternatively, uplift can be modeled directly. Given the group-based nature of the uplift modeling problem, the most frequently adopted direct estimation approaches are tree-based methods that recursively split the population into smaller segments. Uplift tree approaches are adapted from well-known algorithms such as classification and regression trees (CART) [56] or chi-square automatic interaction detection (CHAID) [57] by applying modified splitting criteria and pruning approaches. Examples of tree-based uplift modeling approaches include the significance-based uplift trees proposed in Radcliffe and Surry [53], the decision trees with information theory-inspired splitting criteria presented in Rzepakowski and Jaroszewicz [54], and the uplift random forests and causal conditional trees introduced in Guelman et al. [58].
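To illustrate what a modified splitting criterion looks like, the sketch below scores a binary split by how much it increases the divergence between the treatment and control response distributions. It uses the Kullback-Leibler divergence in the spirit of the information theory-inspired criteria mentioned above. This is our simplified reading, not the published algorithm: the normalization term is omitted, the clipping constant `EPS` is an added assumption to avoid $\log 0$, and both child nodes are assumed to contain treated and control customers.

```python
import math
from typing import Sequence, Tuple

EPS = 1e-6  # assumed clipping to avoid log(0); not part of the published criterion


def kl_bernoulli(p_t: float, p_c: float) -> float:
    """KL divergence between the treatment and control Bernoulli response distributions."""
    p_t = min(max(p_t, EPS), 1 - EPS)
    p_c = min(max(p_c, EPS), 1 - EPS)
    return p_t * math.log(p_t / p_c) + (1 - p_t) * math.log((1 - p_t) / (1 - p_c))


def response_rates(y: Sequence[int], t: Sequence[int]) -> Tuple[float, float]:
    """Treatment and control response rates; assumes both groups are non-empty."""
    treated = [yi for yi, ti in zip(y, t) if ti == 1]
    control = [yi for yi, ti in zip(y, t) if ti == 0]
    return sum(treated) / len(treated), sum(control) / len(control)


def kl_split_gain(y: Sequence[int], t: Sequence[int], go_left: Sequence[bool]) -> float:
    """Size-weighted divergence of the two child nodes minus the divergence of the parent."""
    n = len(y)
    children = 0.0
    for side in (True, False):
        idx = [i for i in range(n) if go_left[i] == side]
        p_t, p_c = response_rates([y[i] for i in idx], [t[i] for i in idx])
        children += len(idx) / n * kl_bernoulli(p_t, p_c)
    return children - kl_bernoulli(*response_rates(y, t))
```

A split that separates a segment where treatment helps from one where it hurts receives a large gain, even when the two effects cancel out at the parent node. A split whose children mirror the parent's response rates receives a gain of zero.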
2.2.3. Evaluation
Despite its clear potential to improve upon predictive modeling outcomes, uplift modeling suffers from a lack of intuitive evaluation measures for assessing the performance of a model, either in an absolute sense or relative to other models. In the literature on uplift modeling, either charts are used [48, 51] or an adapted version of the Gini coefficient is used, i.e., the Qini coefficient [47, 52].
In predictive modeling, evaluation metrics typically assess the error of the point-wise estimates made by a model for each observation in a hold-out test set, by comparing predicted and actual outcomes and summarizing the observed errors. However, in uplift modeling, the quantity being estimated, i.e., uplift, is unobserved. As a customer cannot belong to both the treatment and the control group, i.e., cannot be treated and not treated simultaneously, uplift (or, as indicated above, the group shown in Figure 1 to which a customer belongs) cannot be observed for an individual customer. Therefore, the evaluation measures adopted in predictive modeling cannot be used. Instead, uplift can be observed, and uplift estimates can be evaluated, by comparing the behavior of equivalent subgroups of the treatment and control groups [53].
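One way to compare equivalent subgroups is the uplift-per-decile evaluation mentioned in the introduction. Customers in a test set are ranked by predicted uplift, the ranking is cut into groups of (nearly) equal size, and the observed uplift is computed within each group as the treated minus the control response rate. A minimal sketch, with a hypothetical function name and `nan` returned for a group that lacks treated or control customers:

```python
from typing import List, Sequence


def uplift_by_group(scores: Sequence[float], y: Sequence[int], t: Sequence[int],
                    n_groups: int = 10) -> List[float]:
    """Observed uplift per group of customers ranked by predicted uplift (deciles by default)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n = len(order)
    result = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        treated = [y[i] for i in idx if t[i] == 1]
        control = [y[i] for i in idx if t[i] == 0]
        if treated and control:
            result.append(sum(treated) / len(treated) - sum(control) / len(control))
        else:
            result.append(float("nan"))  # uplift cannot be observed in this group
    return result
```

For a well-performing model, the observed uplift should decrease from the first group to the last.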
The performance of an uplift model can be visualized by plotting the cumulative difference in response rates between the treatment and control groups as a function of the selected fraction x of customers, ranked by the uplift model from high to low estimated uplift. This curve is referred to as the cumulative uplift curve, the cumulative incremental gains curve, or the Qini curve [52]. The cumulative difference in response is measured as the absolute or relative number of additional favorable responders, i.e., as the number of additional favorable responders or as that number expressed as a fraction of the total population. Note that performance is evaluated by comparing groups of observations rather than individual observations. An example is provided in Figure 2.
Figure 2: Incremental gains or Qini curve.
The Qini metric is a measure related to the Qini curve. It measures the area between the Qini curve of the uplift model and the Qini curve of the baseline random model (see Figure 2). The measure is an adapted version of the Gini metric, which in turn is related to the Gini curve (or the cumulative gains curve) [52].
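The curve and metric can be sketched as follows, under stated assumptions. We use one common definition of the incremental gains curve, $Q(k) = R_T(k) - R_C(k)\,N_T(k)/N_C(k)$. Here $R_T(k)$ and $R_C(k)$ are the treated and control responders among the top $k$ ranked customers, and $N_T(k)$ and $N_C(k)$ are the corresponding group sizes, so control responders are rescaled to the treated group size. The random model is taken as the straight line from $(0, 0)$ to $(N, Q(N))$. Normalization conventions vary across the literature, and the function names are hypothetical.

```python
from typing import List, Sequence


def qini_curve(scores: Sequence[float], y: Sequence[int], t: Sequence[int]) -> List[float]:
    """Incremental responders Q(k) among the top-k customers, for k = 0..N."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    r_t = r_c = n_t = n_c = 0
    curve = [0.0]
    for i in order:
        if t[i] == 1:
            n_t += 1
            r_t += y[i]
        else:
            n_c += 1
            r_c += y[i]
        # Control responders rescaled to the number of treated customers seen so far.
        curve.append(r_t - (r_c * n_t / n_c if n_c else 0.0))
    return curve


def qini_coefficient(curve: Sequence[float]) -> float:
    """Area between the Qini curve and the random-targeting line (trapezoid rule, unit steps)."""
    n = len(curve) - 1
    area_model = sum((curve[k] + curve[k + 1]) / 2 for k in range(n))
    area_random = curve[-1] * n / 2  # straight line from (0, 0) to (N, Q(N))
    return area_model - area_random
```

A positive value means the model ranks incremental responders ahead of what random targeting would achieve. Dividing by the area of an optimal ranking is one way to obtain a normalized Qini index.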
Although uplift models are developed and adopted to enhance the efficiency and returns of retention campaigns, few articles assess the costs and benefits of applying uplift modeling. In Hansotia and Rukstales [59], the authors compute the incremental return on investment at the gross margin level; these gross profits are then considered as a contribution to overhead and to net profits [59]. In Radcliffe [52], the incremental profit is calculated by multiplying the incremental response rate by the total profit. In the next section, we analyze the costs and benefits involved and develop a profit-driven approach to evaluating customer churn uplift models.
3. Maximum Profit Measure
The first part of this section discusses the maximum profit measure as introduced in Verbeke et al. [22]. In the second part, we extend the maximum profit measure for evaluating customer churn uplift models, so that customer churn prediction and uplift models can be compared in Section 4.
3.1. Customer churn prediction models
To maximize the efficiency and returns of a retention campaign, typically only a limited fraction of customers is targeted and given an incentive to remain loyal. Therefore, customer churn prediction models are often evaluated using, for instance, the top-decile lift measure, which only accounts for the performance of the model on the 10% of customers with the highest predicted probabilities of churning. Verbeke et al. [22] recently demonstrated that, from a profit-centric point of view, using the top-decile lift can be expected to result in sub-optimal model selection. Instead, they proposed the maximum profit (MP) measure, which calculates the profit generated when the optimal fraction of customers ranked highest by a model is included in a retention campaign. In essence, this measure evaluates a customer churn prediction model at the cutoff leading to the maximum profit rather than at an arbitrary cutoff such as 10%. Performance is expressed as the profit, in monetary units, that can be achieved by adopting the model to select the customers to be targeted in a retention campaign. As shown by the authors, this can yield a significant increase in profits relative to adopting statistical measures or to selecting a fixed fraction of customers in an arbitrary or expert-based manner [22].
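For contrast with the profit-based view, the top-decile lift can be sketched as the churn rate among the 10% of customers with the highest predicted churn probabilities, divided by the overall churn rate. The function and parameter names are hypothetical; `churned` is assumed to hold observed 0/1 churn labels on a test set, and the fixed `fraction` is the arbitrary cutoff that the MP measure replaces.

```python
from typing import Sequence


def top_decile_lift(churn_prob: Sequence[float], churned: Sequence[int],
                    fraction: float = 0.1) -> float:
    """Churn rate in the top `fraction` of ranked customers divided by the overall churn rate."""
    order = sorted(range(len(churn_prob)), key=lambda i: churn_prob[i], reverse=True)
    k = max(1, int(round(fraction * len(order))))  # number of customers in the top group
    top_rate = sum(churned[i] for i in order[:k]) / k
    base_rate = sum(churned) / len(churned)
    return top_rate / base_rate
```

With 20 customers, 4 of whom churn, a model that ranks 2 churners in its top two positions obtains a top-decile lift of 1.0 / 0.2 = 5. This value says nothing about whether targeting 10%, rather than 5% or 30%, is the most profitable choice.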
To calculate profits generated from a retention campaign, we analyze the dynamic process of customer flows in a company (Figure 3). The process involves customers entering by subscribing to the services of an operator and then leaving by churning. To prevent customers from churning, retention campaigns can be established with the goal of retaining customers.
A customer churn prediction model allows one to rank customers from high to low probability of churning. This subsequently allows one to select the customers with the highest probability of churning and target them in a retention campaign. The profit of a retention campaign can then be formulated as [27]:
$$\Pi = N\alpha\left[\beta\gamma(b - c_{contact} - c_{incentive}) + \beta(1 - \gamma)(-c_{contact}) + (1 - \beta)(-c_{contact} - c_{incentive})\right] - A \qquad (3)$$

with $\Pi$ denoting the profit generated by the campaign, $N$ the number of customers in the customer base, $\alpha$ the fraction of the customer base targeted by the retention campaign and offered an incentive to remain loyal, $\beta$ the fraction of true would-be churners among the customers targeted by the retention campaign, $\gamma$ the fraction of targeted would-be churners deciding to remain due to the incentive (i.e., the success rate of the incentive), $b$ the benefit of a retained customer, $c_{contact}$ the cost of contacting a customer to offer him or her the incentive, $c_{incentive}$ the cost of the incentive to the firm when a customer accepts and stays, and $A$ the fixed administrative costs of running the churn management program.

Figure 3: Visual representation of the formula of Neslin et al. [27]. Colors indicate matching parts of the formula and schematics.
The profit formula can be divided into five parts. We highlight each part below and in the visual representation of the formula given in Figure 3:
(a) $N\alpha$ indicates that the costs and profits of a retention campaign relate solely to the customers targeted by the campaign (with the exception of $A$).
(b) $\beta\gamma(b - c_{contact} - c_{incentive})$ denotes the profits generated by the campaign, i.e., the reduction in lost revenues $b$ minus the cost of the campaign $c_{contact} + c_{incentive}$, obtained by retaining a fraction $\gamma$ of the fraction $\beta$ of correctly identified would-be churners included in the campaign.
(c) $\beta(1 - \gamma)(-c_{contact})$ reflects part of the costs of the campaign, i.e., the cost of including correctly identified would-be churners who were not retained.
(d) $(1 - \beta)(-c_{contact} - c_{incentive})$ reflects part of the costs of the campaign, i.e., the cost resulting from targeting non-churners through the campaign; these customers are expected to take advantage of the incentive offered to them through the retention campaign.
(e) $A$ reflects the fixed administrative cost that reduces the overall profitability of a retention campaign.
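The five parts of Eq. (3) and the idea behind the MP measure can be combined in a short sketch. `campaign_profit` evaluates Eq. (3) for given parameters. `maximum_profit` ranks customers by predicted churn probability and, for each cutoff $k = 1, \ldots, N$, sets $\alpha = k/N$ and $\beta$ to the observed fraction of would-be churners among the top $k$. It then returns the highest profit and the corresponding $\alpha$. Several details are our assumptions rather than the authors' specification: the function names, the use of observed churn labels from a hold-out set of untargeted customers to obtain $\beta$, and user-supplied values for $\gamma$, $b$ and the costs.

```python
from typing import Sequence, Tuple


def campaign_profit(n: int, alpha: float, beta: float, gamma: float, b: float,
                    c_contact: float, c_incentive: float, a_fixed: float) -> float:
    """Eq. (3): profit of a retention campaign."""
    per_targeted = (beta * gamma * (b - c_contact - c_incentive)      # part (b)
                    + beta * (1 - gamma) * (-c_contact)               # part (c)
                    + (1 - beta) * (-c_contact - c_incentive))        # part (d)
    return n * alpha * per_targeted - a_fixed                         # parts (a) and (e)


def maximum_profit(churn_prob: Sequence[float], churned: Sequence[int], gamma: float,
                   b: float, c_contact: float, c_incentive: float,
                   a_fixed: float = 0.0) -> Tuple[float, float]:
    """Scan every cutoff of the ranked list; return (maximum profit, optimal alpha)."""
    n = len(churn_prob)
    order = sorted(range(n), key=lambda i: churn_prob[i], reverse=True)
    best_profit, best_alpha = float("-inf"), 0.0
    churners_in_top = 0
    for k, i in enumerate(order, start=1):
        churners_in_top += churned[i]
        alpha = k / n
        beta = churners_in_top / k  # fraction of would-be churners among those targeted
        profit = campaign_profit(n, alpha, beta, gamma, b, c_contact, c_incentive, a_fixed)
        if profit > best_profit:
            best_profit, best_alpha = profit, alpha
    return best_profit, best_alpha
```

For four customers with predicted probabilities 0.9, 0.8, 0.2 and 0.1, of whom the first two churn, and with $\gamma = 0.5$, $b = 100$, $c_{contact} = 1$ and $c_{incentive} = 10$, the profits at $k = 1, \ldots, 4$ are 44, 88, 77 and 66. The sketch therefore reports a maximum profit of 88 at $\alpha = 0.5$ rather than at a fixed cutoff.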
As noted in Neslin et al. [27], $\beta$ reflects the capacity of the predictive model to identify would-be churners and can be expressed as: