FACULTY OF ECONOMIC AND SOCIAL SCIENCES & SOLVAY BUSINESS SCHOOL

ES-Working Paper no. 12

THE CASE FOR PRESCRIPTIVE ANALYTICS: A NOVEL MAXIMUM PROFIT MEASURE FOR EVALUATING AND COMPARING CUSTOMER CHURN PREDICTION AND UPLIFT MODELS

Floris Devriendt and Wouter Verbeke

April 30th, 2018

Vrije Universiteit Brussel – Pleinlaan 2, 1050 Brussel – www.vub.be – [email protected] © Vrije Universiteit Brussel

This text may be downloaded for personal research purposes only. Any additional reproduction for other purposes, whether in hard copy or electronically, requires the consent of the author(s), editor(s). If cited or quoted, reference should be made to the full name of the author(s), editor(s), title, the working paper or other series, the year and the publisher.



Floris Devriendt* and Wouter Verbeke

Data Analytics Laboratory, Faculty of Economic and Social Sciences and Solvay Business School, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussels, Belgium

Abstract

Prescriptive analytics and uplift modeling are receiving increasing attention from the business analytics research community and from industry as an alternative, improved paradigm of predictive analytics that supports data-driven decision making. Although it has been shown in theory that prescriptive analytics improves decision making more than predictive analytics does, no empirical evidence has been presented in the literature on an elaborated application of both approaches that allows for a fair comparison of predictive and uplift modeling. Such a comparison is in fact prohibited by a lack of evaluation measures that can be applied to both predictive and uplift models. Therefore, in this paper, we introduce a novel evaluation metric called the maximum profit uplift measure, which assesses the performance of an uplift model in terms of the maximum potential profit that can be achieved by adopting that model. The measure is developed for evaluating customer churn uplift models and extends the existing maximum profit measure for evaluating customer churn prediction models. Both measures are subsequently applied in a case study to assess and compare the performance of customer churn prediction and uplift models. We find that uplift modeling outperforms predictive modeling and enhances the profitability of retention campaigns. The empirical results indicate that prescriptive analytics are superior to predictive analytics in the development of customer retention campaigns.

Keywords: Analytics, Business applications, Prescriptive analytics, Uplift modeling, Customer churn prediction, Customer retention

*Corresponding author. Email addresses: [email protected] (Floris Devriendt), [email protected] (Wouter Verbeke)

Preprint submitted to European Journal of Information Sciences, April 9, 2018

1. Introduction

The term business analytics is used as a catch-all term covering a wide variety of what essentially are data-processing techniques. In its broadest sense, business analytics strongly overlaps with data science, statistics, and related fields such as artificial intelligence (AI) [1]. Analytics is used as a toolbox containing a variety of instruments and methodologies that allow one to analyze data in support of evidence-based decision making, with the aim of enhancing efficiency, efficacy, and thus, ultimately, profitability. Types of analytical tools, in increasing order of sophistication, are descriptive, predictive, and prescriptive analytics. While descriptive analytics offer insight into current situations, predictive analytics allow one to explain complex relations between variables and to predict future trends. As such, predictive analytics offer more uses than descriptive analytics. Currently, prescriptive analytics are receiving growing attention from practitioners and scientists, as they add further value by allowing one to simulate the future as a function of control variables and to prescribe optimal settings for those control variables. At the core of prescriptive analytics is uplift modeling, which is introduced below. In the experiments reported in this article, the use and performance of predictive and prescriptive analytics are thoroughly compared.

Business analytics is being applied to an increasingly diverse range of well-specified tasks across a broad variety of industries. Popular examples include tasks related to credit scoring [2, 3], fraud detection [4], and customer churn prediction [5, 6], the latter being the application of interest in this article.

Customer churn prediction models are designed to predict which customers are about to churn and to accurately segment a customer base. This allows a company to target the customers who are most likely to churn in a retention marketing campaign, thus improving the efficient use of the limited resources for such a campaign, i.e., the return on marketing investment (ROMI), while reducing the costs associated with churning [7]. Generally speaking, customer retention is profitable to a company because (1) attracting new clients costs five to six times more than retaining existing customers [8–11]; (2) long-term customers generate more profits, tend to be less sensitive to competitive marketing activities, tend to be less costly to serve, and may generate new referrals through positive word-of-mouth, whereas dissatisfied customers might spread negative word-of-mouth [12–17]; and (3) losing customers incurs opportunity costs due to a reduction in sales [18]. Therefore, a small improvement in customer retention can lead to a significant increase in profits [19].

However, it has been reported that marketing actions undertaken to retain customers may actually provoke the opposite behavior and may cause or motivate a customer to churn. As noted in Radcliffe and Simpson [20], churn risk is highly correlated with customer dissatisfaction, and the goal in turn becomes to prevent a dissatisfied customer from actually leaving. Any attempt to contact a dissatisfied customer with the goal of retaining him or her can actually hasten

the process and provoke the customer to leave earlier than expected [20]. Therefore, it is necessary to evaluate the effectiveness of a retention campaign at the level of the individual customer. Predictive models fail to differentiate between customers who respond favorably to a campaign (i.e., who do not churn) and customers who respond favorably of their own accord regardless of any campaign (i.e., who would not have churned in any case, even if not targeted by a campaign).

To address this shortcoming of predictive models, uplift modeling has recently been proposed as an alternative means of identifying customers who are likely to be persuaded by a promotional marketing campaign, rather than predicting whether customers are likely to respond to a promotional marketing campaign (where the response may or may not be the result of the campaign). Uplift modeling can be applied to identify customers who are likely to be retained through a retention campaign, as an alternative to predicting whether customers are likely to churn [21]. More precisely, uplift modeling aims at establishing the net difference in customer behavior resulting from a specific treatment afforded to customers, e.g., a reduction in the likelihood to churn when targeted by a retention campaign.

In this paper, we aim to contrast customer churn prediction (CCP) and customer churn uplift (CCU) modeling for customer retention by comparing their performance when applied to an experimental case study from the financial industry. To compare the performance of these approaches, a common evaluation procedure is applied. However, given the different forms of output that these models produce, different performance measures are used to evaluate prediction and uplift models. In evaluating classification models and, more specifically, CCP models, the receiver operating characteristic (ROC) curve or the lift curve is typically used. Performance can be expressed as the area under the ROC curve, as the top-decile lift, or as the (expected) maximum profit. In evaluating uplift models, the Qini curve and uplift-per-decile plots are typically used. Performance is typically reported in terms of the Qini index or the top-decile uplift. As the goal of customer churn modeling is to maximize ROMI, Verbeke et al. [22] introduce the maximum profit (MP) measure for evaluating CCP models. The MP measure calculates the profit generated by a retention campaign when the optimal fraction of top-ranked customers according to the CCP model is targeted. The MP measure allows one to determine the optimal model and the optimal fraction of customers to include, yielding a significant increase in profitability relative to that achieved when using statistical measures [22–24].

In this article, we extend the MP measure to evaluate the performance of CCU models, introducing the maximum profit for uplift (MPU) measure. Both the MP and MPU measures are then used to compare the performance of CCP and CCU logistic regression and random forest models through an experimental case study. Our main contributions are threefold:

1. We introduce an application of uplift modeling for customer retention.
2. We extend the maximum profit measure for evaluating uplift models.
3. We apply and compare CCP and CCU models through an experimental case study from the financial industry.

This paper is structured as follows. In Section 2, we first introduce customer churn prediction modeling before discussing uplift modeling as an alternative approach to predictive modeling. Then in Section 3, the MP measure for CCP models is defined and extended for application to customer churn uplift models. In Section 4, we describe the experimental design of the case study and then discuss the results of our experiments. Finally, in Section 5, conclusions are given.

2. Literature

In Section 2.1, customer churn prediction is introduced along with current standard approaches as described in the literature and adopted in industry. Then in Section 2.2, we describe uplift modeling and discuss the most prominent uplift modeling techniques and performance measures developed for evaluating uplift models.

2.1. Customer Churn Prediction

Customer churning, which is also referred to as customer attrition or customer defection, is defined as the loss or outflow of customers from the customer base [25]. In saturated markets, there are limited opportunities to attract new customers, and hence, retaining existing customers is considered essential to maintaining profitability. In the telecommunications industry, it is estimated that attracting a new customer costs five to six times more than retaining an existing customer [8, 22, 26]. Established customers are more profitable due to the lower costs required to serve them, and a sense of brand loyalty they have developed over time renders them less likely to churn. Loyal customers tend to be satisfied customers who also serve as word-of-mouth advertisers, referring new customers to a given company. In the context of a financial institution as described in the case study given in Section 4, a definition of churning is naturally present in the data, i.e., contract termination.

Churning is typically addressed by developing a prediction model, i.e., a classification model such as a logistic regression or decision tree model. Such a model estimates, for each customer, the probability of churning during a subsequent period of time. It is then straightforward to offer the customers with the highest churn probabilities an incentive, e.g., a discount or another promotional offer, to encourage them to extend their contracts or to keep their accounts active. In other words, customers who are susceptible to churn can be targeted through a retention campaign. Accurate predictions are perhaps the most apparent goal of developing a customer churn

prediction model, but determining the reasons for (or at least indicators of) churning is also invaluable to a company. Comprehensible models can offer novel insight into correlations between customer behavior and the propensity to churn [7], allowing management teams to address factors leading to churning and to target the customers before they decide to churn.

Numerous classification techniques have been adopted for churn prediction, including traditional statistical methods, such as logistic regression [27, 28], and non-parametric statistical models, such as k-nearest neighbor models [29], decision trees [30, 31], ensemble methods [5, 32], support vector machines [33–35] and neural networks [22, 36, 37]. Additionally, social network analysis has been successfully adopted to predict customer churning [6, 26, 38], as has survival analysis, which can be used to estimate the timing of customer churning. These analyses focus on the profitability of a customer's lifetime rather than on a single moment in time [39, 40]. For an extensive literature review on customer churn prediction modeling, one may refer to Verbeke et al. [7]. The results of an extensive benchmarking experiment are reported in Verbeke et al. [22], confirming the no-free-lunch theorem in application to customer churn prediction, with no modeling technique consistently winning across the various datasets. Recent work on customer churn prediction is covered in [6, 41–44].

2.2. Uplift Modeling

In Section 2.2.1, a brief introduction of uplift modeling is provided. In Section 2.2.2, an overview of the most prominent uplift modeling techniques is presented. Finally, in Section 2.2.3, evaluation measures for assessing the performance of uplift models are discussed.

2.2.1. Definition

Generally speaking, uplift modeling aims to establish the net effect of applying a treatment on an outcome. When adopted for customer relationship management and, more specifically, for response modeling, uplift models are developed to differentiate between customers who respond favorably as a result of being targeted with a campaign, i.e., being treated, and customers who respond favorably of their own accord regardless of whether they are targeted. Note that the outcome, i.e., the response, may mean that a customer begins or continues to purchase a product or service in the case of acquisition and retention modeling, respectively, or that a customer purchases more or additional products or services in the case of up-sell and cross-sell modeling, respectively.

Conceptually, a customer base can be divided into four categories along two dimensions, as shown in Figure 1 [1, 45]:

1. Sure Things. Customers who would always respond. Targeting Sure Things does not generate additional returns but does generate additional costs, i.e., the fixed cost of contacting a customer and possibly the cost of a financial incentive offered to targeted customers.

2. Lost Causes. Customers who would never respond, regardless of which campaign is used. Lost Causes will not generate additional revenues, yet they do generate additional costs, although these are lower than the costs of Sure Things: Lost Causes do not take advantage of the financial incentives offered, which are an additional cost that we do take into account for Sure Things.

3. Do-Not-Disturbs. Customers who do not respond precisely because they are exposed to a campaign. They will respond when not targeted but will not respond when they are. For example, populations targeted for retention efforts can have an adverse reaction, for instance, withdrawing from the delivered product or service. Including Do-Not-Disturbs in a campaign thus generates no additional revenues but comes with considerable additional costs.

4. Persuadables. Customers who respond only because they have been exposed to a campaign. They respond only when contacted and cause a campaign to generate additional revenues and, as such, a net profit after subtracting the costs of including the other types of customers.

Figure 1: The four theoretical classes.

The aim of uplift modeling is to allow for the targeting of Persuadables while avoiding Do-Not-Disturbs. From the perspective of a retention campaign, the Do-Not-Disturb category is sometimes referred to as sleeping dogs since, as long as these customers are not disturbed, they will continue to provide benefits. Note that this classification is campaign dependent. It is possible for a customer to be a Lost Cause when a campaign offers a 5% discount on a next purchase, whereas that same customer is a Persuadable when a campaign offers a 20% discount. In other words, the classification depends on the treatment that is given, assuming all customers are treated similarly. In general, uplift modeling involves determining optimal settings for control variables, such as a dummy treatment variable denoting whether a customer is targeted with a campaign, in order to optimize a result or effect. Although in most, not to say all, studies on uplift modeling for marketing applications, control variables are typically dummy variables that indicate whether a customer is targeted or not, these

control variables may also be continuous or multivalued categorical variables, e.g., the discount level or the contact channel. Clearly, uplift modeling may be applied in various settings and for many different purposes. In this article, we focus on the goal of customer retention.

Uplift modeling for customer retention has been documented in relatively few cases. Radcliffe and Simpson [20] applied uplift modeling to two retention campaigns in telecommunications. One campaign was highly effective and profitable, whereas the other was counter-productive and incurred losses. However, both campaigns were improved in terms of reducing churn as a result of uplift modeling. In Guelman et al. [21], the authors applied uplift modeling in an insurance setting. Although the treatment had an almost neutral impact on retention for the entire sample, they found that the impact of the treatment might be different for specific subgroups of the customer base. They reported that uplift modeling allowed them to predict the expected change in the probability of a customer switching to another company when targeted by a campaign. To the best of our knowledge, no cases presented in the literature report on the application of uplift modeling in the context of a financial institution and to churning in reference to financial services.

We assume that a sample of customers is randomly divided into two groups, defined as the treatment group and the control group. A customer is either in the treatment group, i.e., is influenced by the campaign, or in the control group, i.e., is not influenced by the campaign. As a formal definition, let X be a vector of inputs or predictor variables, X = {X_1, ..., X_n}, and let Y ∈ {0, 1} be the binary outcome variable denoting whether a customer responds favorably or not. Let the treatment variable T denote whether a customer belongs to the treatment group, T = 1, or to the control group, T = 0. P denotes the probability as estimated by the model. Uplift is then defined for customer i with characteristics x_i as the probability of responding favorably (i.e., y_i = 1) when treated (i.e., for t_i = 1) minus the probability of responding favorably when not treated (i.e., for t_i = 0):

U(x_i) := P(y_i = 1 | x_i; t_i = 1) − P(y_i = 1 | x_i; t_i = 0)    (1)

In essence, uplift is the difference in outcome, e.g., customer behavior, resulting from a treatment. Uplift modeling aims at estimating uplift as a function of the treatment and customer characteristics.

2.2.2. Techniques

Uplift modeling techniques can be grouped into data preprocessing and data processing approaches. The first group adopts traditional predictive analytics in an adapted setup for learning an uplift model, whereas the second group applies adapted predictive analytics in developing uplift models. Table 1 shows the most prominent and frequently adopted approaches to uplift modeling.

Data preprocessing approaches. Data preprocessing approaches include transformation approaches, which redefine a target variable, and approaches that allow one to estimate uplift by defining and selecting additional predictor variables.

Table 1: Most frequently cited uplift modeling approaches.

Preprocessing: Transformation [46, 47]; Variable Selection Procedure [48, 49]
Data processing: Two-Model Approach [50, 51]; Direct Estimation [52–54]

The first group of data preprocessing approaches defines a transformed target variable that is estimated. A customer cannot be assigned to any of the four groups shown in Figure 1, as this information is unavailable and cannot be retrieved. However, we do know whether a customer formed part of the treatment or control group and whether a customer responded or not. Hence, customers can be assigned to any of the following four groups: treatment responders, treatment non-responders, control responders and control non-responders. Techniques such as Lai’s approach [46, 47] and pessimistic uplift modeling [55] make use of these four groups to define a transformed target variable and as such transform the uplift modeling problem into a binary classification problem. Any standard classification technique can be applied to this problem to yield an uplift model.
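A minimal numeric sketch of the transformed-target idea follows, with synthetic data and under the additional assumption of an (approximately) equal treatment/control split, in which case 2·P(z = 1) − 1 recovers the uplift; any standard classifier fitted on z would then yield an uplift model:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
t = rng.integers(0, 2, size=n)               # ~50/50 treatment/control split (assumption)
# Synthetic responses: treated customers respond at 15%, controls at 10%.
p = np.where(t == 1, 0.15, 0.10)
y = (rng.random(n) < p).astype(int)

# Transformed target: 1 for treated responders and for control non-responders.
z = np.where(t == 1, y, 1 - y)

# With an equal split, 2*P(z=1) - 1 recovers the uplift P(y=1|t=1) - P(y=1|t=0).
uplift_from_z = 2 * z.mean() - 1
uplift_direct = y[t == 1].mean() - y[t == 0].mean()
print(uplift_from_z, uplift_direct)
```

Both quantities come out close to the true uplift of 0.05 here, which is the identity that turns the uplift problem into an ordinary binary classification problem on z.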

The second group of data preprocessing approaches extends the set of predictor variables of the model to allow for the estimation of uplift. In Kane et al. [47] and Lo [48], an uplift modeling approach is proposed that merges the treatment and control groups into a single sample for response model estimation. A dummy variable is introduced to denote the group of origin of each customer. A model is then developed from the original predictor variables, the added dummy variable, and interaction variables between the predictor variables and the dummy variable. Subsequently, any predictive modeling approach can be adopted with this setup, yielding an uplift model.
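This variable-extension setup can be sketched as follows, using a linear probability model fitted by least squares as a simple stand-in for whichever classifier one would adopt; the data and effect sizes are synthetic assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=(n, 2))
t = rng.integers(0, 2, size=n).astype(float)
# Synthetic outcome: the treatment effect depends on x[:, 0] (an interaction effect).
p = 0.10 + 0.03 * t + 0.02 * t * (x[:, 0] > 0)
y = (rng.random(n) < p).astype(float)

# Design matrix: intercept, predictors, treatment dummy, and predictor-treatment interactions.
def design(x, t):
    return np.column_stack([np.ones(len(x)), x, t, x * t[:, None]])

# Linear probability model as a simple stand-in for any classification technique.
w, *_ = np.linalg.lstsq(design(x, t), y, rcond=None)

# Uplift estimate: predicted response with t=1 minus predicted response with t=0.
ones, zeros = np.ones(n), np.zeros(n)
uplift = design(x, ones) @ w - design(x, zeros) @ w
print(uplift.mean())
```

Scoring the same customer twice, once with the dummy set to 1 and once with it set to 0, is what converts the single response model into an uplift estimate.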

Data processing approaches. Among the data processing approaches, a further distinction can be made between indirect and direct estimation approaches.

Indirect estimation approaches include the two-model or naive approach, which is a simple and intuitive approach to uplift modeling. Two separate predictive models are built: one for the treatment group, M_T, and one for the control group, M_C, with both estimating the probability of a given response. The aggregated uplift model, M_U, then subtracts the response probabilities resulting from both models to find the uplift:

M_U = M_T − M_C.    (2)

This approach has the benefit of being straightforward to implement, and, similar to the data preprocessing approaches, it allows one to adopt standard predictive modeling techniques. However, the approach only appears to work well in the simplest of cases [50, 51]. The main disadvantage is that the two models are built independently of one another; as such, they are not necessarily aligned in terms of the predictor variables they include, and the errors of the independent estimates can reinforce one another, generating significant errors in uplift estimates [53].
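A toy illustration of the two-model approach, with simple per-segment response rates standing in for the fitted models M_T and M_C; the data are synthetic, with a uniform treatment effect of three percentage points:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
segment = rng.integers(0, 5, size=n)          # a crude 5-bucket "model" of the customer
t = rng.integers(0, 2, size=n)
p = 0.05 + 0.02 * segment + 0.03 * t          # synthetic: uniform 3pp treatment effect
y = (rng.random(n) < p).astype(int)

# Two-model approach: estimate the response probability separately on treatment
# and control data; here per-segment response rates stand in for M_T and M_C.
m_t = np.array([y[(t == 1) & (segment == s)].mean() for s in range(5)])
m_c = np.array([y[(t == 0) & (segment == s)].mean() for s in range(5)])

# M_U = M_T - M_C: the difference of the two response models is the uplift estimate.
m_u = m_t - m_c
print(m_u)
```

Each segment's estimate recovers roughly the 0.03 treatment effect; with two independently fitted flexible models instead of simple rates, the estimation errors of M_T and M_C would not be guaranteed to cancel in the difference, which is precisely the weakness noted above.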

Alternatively, uplift can be modeled directly. Given the group-based nature of the uplift modeling problem, the most frequently adopted direct estimation approaches are tree-based methods that recursively split the population into smaller segments. Uplift tree approaches are adapted from well-known algorithms such as classification and regression trees (CART) [56] or chi-square automatic interaction detection (CHAID) methods [57], applying modified splitting criteria and pruning approaches. Examples of tree-based uplift modeling approaches include the significance-based uplift trees proposed in Radcliffe and Surry [53], decision trees making use of information theory-inspired splitting criteria presented in Rzepakowski and Jaroszewicz [54], and uplift random forests and causal conditional trees introduced in Guelman et al. [58].

2.2.3. Evaluation

Despite its clear potential to improve upon predictive modeling outcomes, uplift modeling suffers from a lack of intuitive evaluation measures for assessing the performance of a model, either in an absolute sense or relative to other models. In the literature on uplift modeling, either charts are used [48, 51] or an adapted version of the Gini coefficient is used, i.e., the Qini coefficient [47, 52].

In predictive modeling, evaluation metrics typically assess the error of the point-wise estimates made by a model on each observation of a hold-out test set by comparing predicted and actual outcomes and by summarizing the observed errors. However, in uplift modeling, the actual outcome that is estimated, i.e., uplift, is unobserved. As a customer cannot occupy both the treatment and the control group, i.e., cannot be treated and not treated simultaneously, uplift (or, as indicated above, the group shown in Figure 1 to which a customer belongs) cannot be observed for an individual customer. Therefore, the evaluation measures adopted in predictive modeling cannot be used. Instead, uplift can be observed, and uplift estimates evaluated, by comparing differences in the behaviors of equivalent subgroups of the treatment and control groups [53].

The performance of an uplift model can be visualized by plotting the cumulative difference in response rates between the treatment and control groups as a function of the selected fraction x of customers ranked by the uplift model from high to low estimated uplift. This curve is referred to as the cumulative uplift curve, the cumulative incremental gains curve, or the Qini curve [52]. The cumulative difference in response rate is measured as the absolute or relative number of additional favorable responders, i.e., expressed respectively in terms of the number of favorable responders or as a fraction of the total population. Note that performance is evaluated by comparing groups of observations rather than individual observations. An example is provided in Figure 2.

Figure 2: Incremental gains or Qini curve.

The Qini metric is a measure related to the Qini curve. It measures the area between the Qini curve of the uplift model and the Qini curve of the baseline random model (see Figure 2). The measure is an adapted version of the Gini metric, which in turn is related to the Gini curve (or the cumulative gains curve) [52].
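The group-comparison logic behind such a curve can be sketched as follows; the data are synthetic, and the control-group responder count is rescaled to the treatment-group size within each selected fraction (one common convention for incremental gains):

```python
import numpy as np

def qini_curve(score, y, t, n_points=101):
    """Cumulative incremental gains when targeting the top fraction x by score."""
    order = np.argsort(score)[::-1]
    y, t = y[order], t[order]
    n = len(y)
    xs = np.linspace(0, 1, n_points)
    gains = []
    for x in xs:
        k = int(x * n)
        yk, tk = y[:k], t[:k]
        n_t, n_c = max(tk.sum(), 1), max((1 - tk).sum(), 1)
        # Treated responders minus control responders, with the control
        # count rescaled to the treatment-group size in the top fraction.
        gains.append(yk[tk == 1].sum() - yk[tk == 0].sum() * n_t / n_c)
    return xs, np.array(gains)

rng = np.random.default_rng(5)
n = 20_000
uplift_true = rng.random(n) * 0.1            # hypothetical per-customer uplift
t = rng.integers(0, 2, size=n)
y = (rng.random(n) < 0.05 + uplift_true * t).astype(int)

xs, gains = qini_curve(uplift_true, y, t)    # scoring with the true uplift
print(gains[-1])                             # total incremental responders
```

A model that ranks customers well produces a curve that rises steeply at small fractions and flattens out, and the Qini coefficient is the area between this curve and the straight line of the random baseline.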

Although uplift models are developed and adopted to enhance the efficiency and returns of retention campaigns, few articles assess the costs and benefits of applying uplift modeling. In Hansotia and Rukstales [59], the authors compute the incremental return on investment at the gross margin level. These gross profits are then considered as a contribution to the overhead and to net profits [59]. In Radcliffe [52], the incremental profit is calculated by multiplying the incremental response rate by the total profit. In the next section, we analyze the costs and benefits involved and develop a profit-driven approach to evaluating customer churn uplift models.

3. Maximum Profit Measure

The first part of this section discusses the Maximum Profit measure, as introduced in Verbeke et al. [22]. In the second part, we extend the Maximum Profit measure for evaluating customer

churn uplift models to compare customer churn prediction and uplift models in Section 4.

3.1. Customer churn prediction models

To maximize the efficiency and returns of a retention campaign, typically only a limited fraction of customers is targeted and given an incentive to remain loyal. Therefore, customer churn prediction models are often evaluated using, for instance, the top-decile lift measure, which only accounts for the performance of the model regarding the top 10% of customers with the highest predicted probabilities of churning. Recently, Verbeke et al. [22] demonstrated that, from a profit-centric point of view, using the top-decile lift can be expected to result in sub-optimal model selection. Instead, the maximum profit (MP) measure is proposed, which calculates the profit generated when the optimal fraction of top-ranked customers according to the model is targeted in a retention campaign. In essence, this measure evaluates a customer churn prediction model at the cutoff leading to the maximum profit rather than at an arbitrary cutoff such as 10%. Performance is expressed as the profit in monetary units that can be achieved by adopting the model for selecting the customers to be targeted in a retention campaign. This, as shown by the authors, can yield a significant increase in profits relative to adopting statistical measures and to selecting a fixed fraction of customers to be targeted in an arbitrary or expert-based manner [22].

To calculate profits generated from a retention campaign, we analyze the dynamic process of customer flows in a company (Figure 3). The process involves customers entering by subscribing to the services of an operator and then leaving by churning. To prevent customers from churning, retention campaigns can be established with the goal of retaining customers.

A customer churn prediction model allows one to rank customers by their probability of churning, from high to low. This subsequently allows one to select the customers with the highest probability of churning and to target them in a retention campaign. The profits of a retention campaign can then be formulated as [27]:

Π = Nα[βγ(b − c_contact − c_incentive) + β(1 − γ)(−c_contact) + (1 − β)(−c_contact − c_incentive)] − A    (3)

with Π denoting the profit generated by the campaign, N denoting the number of customers included in the customer base, α denoting the fraction of the customer base targeted by the retention campaign and offered an incentive to remain loyal, β denoting the fraction of true would-be churners among the customers targeted by the retention campaign, γ denoting the fraction of targeted would-be churners deciding to remain due to the incentive (i.e., the success rate of the incentive), b denoting the benefit of a retained customer, c_contact denoting the cost of contacting a customer to offer him

11 Figure 3: Visual representation of Neslin et al. [27]’s formula. Colors indicate matching parts of the formula and schematics.

or her the incentive, c_incentive denoting the cost of the incentive to the firm when a customer accepts and stays, and A denoting the fixed administrative costs of running the churn management program.

The profit formula can be divided into five parts. We highlight each part below and in the visual representation of the formula given in Figure 3:

(a) Nα denotes that the costs and profits of a retention campaign are solely related to the customers targeted by the campaign (with the exception of A).

(b) βγ(b − c_contact − c_incentive) denotes the profits generated by the campaign, i.e., the reduction in lost revenues minus the cost of the campaign, b − c_contact − c_incentive, obtained by retaining a fraction γ of the fraction β of correctly identified would-be churners included in the campaign.

(c) β(1 − γ)(−c_contact) reflects part of the costs of the campaign, i.e., the cost of including correctly identified would-be churners who were not retained.

(d) (1 − β)(−c_contact − c_incentive) reflects part of the costs of the campaign, i.e., the cost resulting from targeting non-churners through the campaign; these customers are expected to take advantage of the incentive offered to them through the retention campaign.

(e) −A reflects the fixed administrative cost, which reduces the overall profitability of a retention campaign.

As noted in Neslin et al. [27], β reflects the capacity of the predictive model to identify would-be churners and can be expressed as:

β = β₀λ    (4)

with β₀ denoting the fraction of all the operator's customers who will churn and λ denoting the lift, i.e., how much more likely the fraction of customers targeted by the retention campaign is to churn than the operator's customer base as a whole. Rearranging the terms of Equation 3 leads to:

Π = Nα{[γb + c_incentive(1 − γ)]β₀λ − c_incentive − c_contact} − A    (5)

Neslin et al. [27] use the direct link between lift and profitability to motivate the use of lift as a performance measure for evaluating customer churn prediction models. Verbeke et al. [22], however, show that using the lift at an arbitrary cutoff as a performance measure may lead to suboptimal model selection and, from a business perspective, a significant loss of profitability. Therefore, the authors propose a profit-centric performance measure called the maximum profit (MP), defined as:

MP = max_α(Π)    (6)

To calculate the maximum profit measure, a pragmatic approach is typically adopted [23, 60, 61], and two assumptions are made: (1) the retention rate γ is independent of the included fraction of customers α, and (2) the benefit of a retained customer, b, is independent of the included fraction of customers α. These assumptions allow one to use constant values for both γ and b in Equation 5, and given the lift curve of the classification model, which represents the relation between the lift λ and α, the maximum of Equation 5 over α can be calculated in a straightforward manner [22].
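Under these assumptions, finding the MP reduces to a one-dimensional search over α along the model's lift curve. The sketch below is illustrative Python (not the authors' code); the linear lift curve and all parameter values are hypothetical, and the fixed cost A is omitted since it does not depend on α:

```python
def profit_per_customer(alpha, lift, beta0, gamma, b, c_contact, c_incentive):
    """Pi/N from Equation 5, without the fixed cost A (constant in alpha)."""
    return alpha * ((gamma * b + c_incentive * (1 - gamma)) * beta0 * lift
                    - c_incentive - c_contact)

def maximum_profit(lift_curve, beta0, gamma, b, c_contact, c_incentive, steps=100):
    """Sweep the targeted fraction alpha over a grid and return
    (maximum profit per customer, profit-maximizing alpha)."""
    return max(
        (profit_per_customer(a, lift_curve(a), beta0, gamma,
                             b, c_contact, c_incentive), a)
        for a in (i / steps for i in range(1, steps + 1))
    )

# Hypothetical lift curve: lift 5 at the very top of the ranking,
# decreasing linearly to 1 when the whole customer base is selected.
mp, best_alpha = maximum_profit(lambda a: 5 - 4 * a, beta0=0.05, gamma=0.3,
                                b=200, c_contact=1, c_incentive=10)
```

With these (made-up) parameters the profit per customer first rises and then falls with α, so an interior targeted fraction maximizes the campaign's profitability.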

3.2. Customer churn uplift models

None of the existing evaluation metrics for assessing the performance of an uplift model takes into account the costs and benefits of adopting the uplift model or expresses the performance of an uplift model in terms of profitability. To evaluate customer churn uplift models and to compare CCP and CCU models, we apply the profit formula of Equation 3 to the uplift modeling case. First, consider how uplift models differ from their predictive counterparts. Customer churn prediction models only make use of treatment group data to build a model, whereas uplift models consider both treatment and control group data in developing a model. Additionally, in evaluating an uplift model, the profit measure should consider both the treatment and the control group.

Consider the left-hand side of Figure 4. The campaign-targeted population consists of three groups: among the fraction of true would-be churners (β), (1) some accept the offer (γ, the blue part), whereas (2) others do not (1 − γ, the red part). The third group (3) is the fraction of those who will not churn (1 − β, the yellow part) who are erroneously included in the campaign. γ is the campaign retention rate, which is fixed and must be estimated but is in principle unknown.

Figure 4: On the left-hand side is a different visualization of Neslin's formula focusing on the campaign-targeted population. On the right-hand side is a translation to the uplift modeling scenario.

For a translation toward uplift modeling, consider the right-hand side of Figure 4. For the treatment group, the same division of groups applied for CCP was used. The control group not targeted by the campaign includes two groups: the fraction of would-be churners and the fraction who will not churn. Although the addition of a control group generally adds an extra layer of complexity, in terms of the profit formula it also contributes more useful knowledge. The difference β_C − β_T is the value of the uplift, or the reduction in the churn rate. Whereas γ must be estimated through CCP modeling, in CCU modeling it is observed, rendering the formula an instrument that is easier to use and that generates more reliable estimates of profits. In CCU modeling, γ represents the uplift, i.e., the reduction or difference in the churn rate between the two groups (i.e., γ = β_C − β_T). Additionally, we can fine-tune the β parameter of CCP as β_C and β_T. In turn, Equation 3 can be rewritten as follows:

Σ = Nα[(β_C − β_T)(b − c_contact − c_incentive) + β_T(−c_contact) + (1 − β_C)(−c_contact − c_incentive)] − A    (7)

Reformulating the above formula to place more emphasis on costs and benefits leads to:

Σ = Nα[(β_C − β_T) · b − c_contact − (1 − β_T) · c_incentive] − A    (8)

β_C and β_T are the churn rates of the control and treatment groups, respectively, and (1 − β_T) is the non-churn rate of the treatment group. As in CCP modeling, the goal is to maximize Σ, yielding what we denominate the maximum profit uplift (MPU):

MPU = max_α(Σ)    (9)

The MPU measure expresses the performance of a CCU model in terms of the profit generated per customer of the customer base when targeting the optimal fraction of customers ranked according to the estimated uplift of the CCU model.
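To make the MPU computation concrete, the sketch below ranks the treatment and control test observations by a model's estimated uplift, computes the empirical churn rates of the two groups within the targeted fraction, and evaluates the profit of Equation 8 for each fraction (the fixed cost A is omitted). This is illustrative Python with hypothetical toy data, not the authors' implementation:

```python
def mpu_per_customer(treatment, control, b, c_contact, c_incentive):
    """Maximum profit uplift per customer (Equations 8 and 9), ignoring A.

    treatment, control -- lists of (uplift_score, churned) pairs, where
    churned is 1 if the customer churned and 0 otherwise.
    Returns (MPU per customer, profit-maximizing alpha).
    """
    t = sorted(treatment, key=lambda r: -r[0])  # highest estimated uplift first
    c = sorted(control, key=lambda r: -r[0])
    best = (float("-inf"), 0.0)
    for k in range(1, len(t) + 1):              # sweep the targeted fraction
        alpha = k / len(t)
        top_c = c[:max(1, round(alpha * len(c)))]
        beta_t = sum(ch for _, ch in t[:k]) / k            # treatment churn rate
        beta_c = sum(ch for _, ch in top_c) / len(top_c)   # control churn rate
        sigma = alpha * ((beta_c - beta_t) * b
                         - c_contact - (1 - beta_t) * c_incentive)
        best = max(best, (sigma, alpha))
    return best

# Toy data: the model correctly gives high uplift scores to customers who
# stay when treated (churned=0 in treatment) but leave when not treated
# (churned=1 in control), and low scores to the opposite pattern.
treatment = [(s, 0) for s in (10, 9, 8, 7, 6)] + [(s, 1) for s in (5, 4, 3, 2, 1)]
control = [(s, 1) for s in (10, 9, 8, 7, 6)] + [(s, 0) for s in (5, 4, 3, 2, 1)]
mpu, best_alpha = mpu_per_customer(treatment, control,
                                   b=100, c_contact=1, c_incentive=10)
```

In this toy example the profit-maximizing fraction is the top half of the ranking, where the observed control-minus-treatment churn gap is largest.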

4. Experiments

The objective of the experiments presented in this section is to compare and contrast customer churn prediction modeling and customer churn uplift modeling outcomes. In the first part of this section, information on the experimental setup is provided, i.e., the dataset and experimental methodology. In Section 4.2, the results of the experiments are presented, and these results are discussed and analyzed in Section 4.2.3.

4.1. Experimental Design

4.1.1. Dataset

The dataset used to conduct the experiments was obtained from a financial institution. It consists of records containing customer information, including a variable on churning and a variable determining whether a customer was targeted by a retention campaign. Table 2 provides detailed information on the dataset. The retention campaign was targeted at a treatment group, for which, in the subsequent period, a churn rate of 13.25 % was observed. For the control group not targeted by the retention campaign, a significantly higher churn rate of 25.52 % was observed. The overall uplift achieved is thus equal to 12.27 %, showing that the campaign had a significant impact on customer behavior. The dataset includes 162 variables, including socio-demographic information and usage and activity data. Both the treatment and control groups are randomly split into training and test sets, respectively including 2/3 and 1/3 of the records.

The data

Type of organization           Financial institution
Total observations             200 903
Total variables                162
Control group observations     118 809
Control group churn rate       25.52 %
Treatment group observations   82 094
Treatment group churn rate     13.25 %
Overall uplift                 12.27 %

Table 2: Information on the dataset obtained from a private financial institution in Belgium.
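The overall uplift reported in Table 2 is simply the difference between the control and treatment group churn rates, which is easy to verify:

```python
control_churn_rate = 0.2552    # 25.52 %, control group (not targeted)
treatment_churn_rate = 0.1325  # 13.25 %, treatment group (targeted)

overall_uplift = control_churn_rate - treatment_churn_rate  # 0.1227, i.e., 12.27 %
```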

4.1.2. Methodology

Unlike conventional predictive modeling, uplift modeling manages two groups, a treatment group and a control group. In testing such techniques and measures, we consider two scenarios. The first scenario tests the classic profit measure, MP (Equation 3), which considers the population to be part of one group, preferably a group that has had no prior contact with campaigns. Therefore, to test the MP, we only use the test set of the control group. The second scenario assumes the existence of both a treatment group and a control group, and thus the MPU metric is applied to the results of the test sets of both the treatment and control groups. This is also illustrated in Figure 5.

Figure 5: Scenario 1 focuses solely on the control group, whereas Scenario 2 considers both the treatment and control group.

Two modeling techniques are used to develop and compare CCP and CCU models, i.e., logistic regression and random forests. Both techniques can be used in a straightforward manner to develop predictive models and have been adapted for developing uplift models [21, 48]. The use of these two

techniques in our experiments is motivated by their popularity. Logistic regression is the standard predictive modeling approach used in industrial settings across various applications and is a typical benchmark approach in experimental studies and scientific research. Additionally, logistic regression facilitates the interpretation of the resulting model and typically performs well [62, 63]. Random forests are state of the art in the field of business analytics, have broad applications in industry settings and in scientific research, and typically achieve strong outcomes [62, 63]. Note that a full-scale benchmarking study of a broad range of predictive and uplift modeling techniques for various datasets falls beyond the scope of this study.

To execute our experiments, the open-source R environment was used [64]. For CCP, the adopted implementations stem from the R-package caret1. For CCU, adapted implementations were applied to take into account and contrast the customer behaviors of the treatment and control groups, although the underlying learning approach is similar to its counterparts for predictive modeling. For logistic regression, Lo's approach [48] was applied in our experiments to draw comparisons with standard logistic regression, whereas the uplift random forests proposed in Guelman et al. [21] were applied via the 'uplift' R-package2.

4.2. Results and discussion

4.2.1. Scenario 1 - Evaluation with Maximum Profit

In this section, we present the results of our experiments on the first scenario as detailed above, in which the maximum profit measure (Equation 6) is used to evaluate the performance of logistic regression and random forest CCP and CCU models. Figure 6 shows the profit curves generated from the experiments on scenario 1. As no information was provided by the financial institution regarding the actual values of the cost and benefit parameters of the MP measure, three different sets of parameters were used to calculate the MP. The three sets of values are based on values reported in the literature [22, 27, 60, 65] and represent situations of low, medium and high profitability resulting from retaining a customer. A full sensitivity analysis of the impact of the adopted cost and benefit parameters falls beyond the scope of this article and is recognized as a topic for further research. However, the results of the experiments conducted with the three sets of parameters are fully consistent, and thus the conclusions drawn from the experiments appear to hold irrespective of the assumed parameter values.

The profit curves presented in Figure 6 show the profits generated per customer of the customer base for a fraction x of customers targeted by the retention campaign. These values are ranked

1 http://caret.r-forge.r-project.org
2 https://cran.r-project.org/web/packages/uplift/index.html

per the estimated probability of churning for the CCP models (black profit curves) and per the estimated uplift of the CCU models (blue profit curves). Note that the profits generated per customer of the customer base, rather than the total profit, are plotted because the profit generated per customer is independent of the size of the customer base yet still proportional to the total profit. The profit curves thereby indicate the optimal fraction of customers to be targeted by the retention campaign, giving rise to the maximum profit.

4.2.2. Scenario 2 - Evaluation with Maximum Profit Uplift

Figure 7 shows the profit curves generated from the experiments following the second scenario detailed in the previous section, with the results of the CCP and CCU models evaluated using the novel maximum profit uplift (MPU) measure. The MPU measure includes both treatment and control group observations in the test set used for evaluation.

4.2.3. Discussion

The evaluation based on the maximum profit measure for scenario 1 clearly shows that CCP modeling yields higher profits than CCU modeling for both logistic regression and random forests: the profit curves of the CCP models in Figure 6 exceed those of the CCU models. We conclude that CCP models are superior in predicting which customers will churn. This makes sense, as CCP models are trained with the objective of predicting churn patterns, whereas uplift models are designed to predict uplift, i.e., the impact of a retention campaign on the propensity to attrite. Many of the churners predicted by the CCP model may be customers who have made up their minds and decided to churn, and who therefore cannot be retained when targeted by a retention campaign. A successful uplift model will rank these customers at the bottom of the ranking, i.e., will estimate their uplift as close to zero, since the impact of the retention campaign on them will be nil. In other words, many churners identified by a CCP model can be expected to be Sure Things as defined in Section 2.2.1. When using MP as the evaluation measure, it is natural that the measure values CCP more than CCU, because the MP assumes a constant retention rate and as such does not acknowledge the true retention rate that can be observed when a control group is present. The MP measure is additionally linearly related to the lift, i.e., to the number of churners in the fraction of selected customers. As CCP models can be expected to detect more churners than CCU models, this further contributes to the superiority of the MP of CCP models over that of CCU models.

For the results of the experiments based on scenario 2, using the MPU measure to evaluate the performance of the CCP and CCU models, we find that the CCU models outperform the CCP models. This can be attributed to the fact that the uplift models effectively succeed in predicting

[Figure 6 consists of six panels of profit curves, plotting profit per customer against the percentage of customers captured: (a) logistic regression and (b) random forests with b = 200, c_incentive = 10, c_contact = 1; (c) logistic regression and (d) random forests with b = 100, c_incentive = 10, c_contact = 1; (e) logistic regression and (f) random forests with b = 100, c_incentive = 50, c_contact = 1.]

Figure 6: Profit curves for logistic regression (left) and random forest (right) CCP (black curves) and CCU (blue curves) models based on the first scenario using the MP measure with three sets of cost and benefit parameters.

[Figure 7 consists of six panels of profit curves, plotting profit per customer against the percentage of customers captured: (a) logistic regression and (b) random forests with b = 200, c_incentive = 10, c_contact = 1; (c) logistic regression and (d) random forests with b = 100, c_incentive = 10, c_contact = 1; (e) logistic regression and (f) random forests with b = 100, c_incentive = 50, c_contact = 1.]

Figure 7: Profit curves for logistic regression (left) and random forest (right) CCP (black curves) and CCU (blue curves) models for the second scenario using the MPU measure with three sets of cost and benefit parameters.

Figure 8: Churn rate as a function of the selected fraction of customers for CCP and CCU logistic regression (a) and random forest (b) models.

uplift, which is accounted for in the MPU measure as discussed in Section 3.2. When ranking both the treatment and control groups of the test set following the predicted probabilities of churning in evaluating the CCP models, and following the estimated uplift in evaluating the CCU models, the observed reduction in churn rates for the selected fraction x of customers can be used to measure the profits generated from a retention campaign when selecting customers based on the CCP and CCU models. As CCP models rank customers who are likely to churn but who cannot necessarily be retained high on the list (whereas retainability is exactly what the CCU model predicts), CCP models appear to be less profitable than CCU models. The objective of CCU models is to ascribe high scores to customers who are likely to both churn and be retained, and as such, they achieve higher degrees of uplift and profitability.

Note that it is only possible to calculate the MPU measure when both a control group and a treatment group are present, which, in traditional customer churn prediction setups, is not the case. The MP measure still has use in such settings, although uplift modeling is clearly a superior paradigm with respect to developing a data-driven customer retention program.

In addition, although of less importance here, our profit curves show that random forests generally perform better than logistic regression. Random forest models generate higher profits per customer and higher profits from a smaller fraction of customers targeted by a retention campaign. This result is no surprise and is fully in line with the results of benchmarking experiments conducted across various business domains as reported in the literature [2, 22].

To further analyze and gain insight into the results of the experiments, we plot the churn rate as a function of the fraction of customers selected x for the CCP and CCU logistic regression and


Figure 9: Cumulative uplift as a function of the fraction of customers selected for the CCP and CCU logistic regression (a) and random forest (b) models.

random forest models in the left and right panels of Figure 8, respectively. These figures show that the cumulative churn rate for the CCP models always exceeds the churn rate of the CCU models. This indicates that the CCP model captures more churners than the CCU model for the same fraction x of selected customers. We also plot the uplift as a function of the fraction of customers selected x for the CCP and CCU logistic regression and random forest models in the left and right panels of Figure 9, respectively. Here, it can be seen that the CCU model achieves a stronger degree of uplift, i.e., a stronger reduction of the churn rate of the treatment group relative to the control group, than the CCP model.
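Curves such as those in Figures 8 and 9 can be reproduced from any scored test set: rank the customers by the model's score and track the cumulative churn rate among the top fraction; the uplift curve of Figure 9 is then the difference between the control and treatment curves. A minimal sketch with hypothetical data (illustrative Python, not the authors' code):

```python
def churn_rate_curve(scored):
    """Cumulative churn rate among the top fraction of customers ranked by
    model score (a Figure 8-style curve).

    scored -- list of (score, churned) pairs; churned is 1 or 0.
    Returns a list of (fraction selected, cumulative churn rate) points.
    """
    ranked = sorted(scored, key=lambda r: -r[0])
    curve, churners = [], 0
    for k, (_, churned) in enumerate(ranked, start=1):
        churners += churned
        curve.append((k / len(ranked), churners / k))
    return curve

# Hypothetical scores: the top-ranked customers are mostly churners,
# so the curve starts high and decays toward the overall churn rate.
curve = churn_rate_curve([(5, 1), (4, 1), (3, 0), (2, 1), (1, 0)])
```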

Figures 8 and 9 confirm the above analysis and support the conclusion that CCP models tend to detect numerous Sure Things, i.e., customers who have decided to churn and who cannot be retained by a campaign, whereas CCU models aim to and succeed at avoiding targeting Sure Things and instead allow one to treat Persuadables to realize a stronger decrease in the churn rate and yield an increased return. This conclusion holds for both the logistic regression and random forests techniques; for uplift churn prediction modeling, this also seems to be the case. Further research may extend these experiments to the use of alternative predictive and uplift modeling techniques.

A next step in the analysis of our results involves an assessment of the similarities in the rankings of customers when scored using the various models developed. For this purpose, Spearman's rank order correlation and Kendall's tau are calculated for the first and second scenarios and are reported in Tables 3 and 4. We find that, overall, the rankings resulting from the various models substantially differ. For the first scenario, the strongest similarity is found between the logistic regression models of the CCP and CCU setups and between the random forests of the CCP and CCU setups, both presenting

the maximum observed Spearman's rank order correlation of 0.52. The weakest similarities are found between the CCP logistic regression model and the CCU random forest model, with a Spearman's rank order correlation of 0.31 for the first scenario and of only 0.17 for the second scenario. Between the CCP random forest and CCU logistic regression models, we find a Spearman's rank order correlation of 0.23 for the first scenario and of 0.24 for the second scenario. These model setups are the most dissimilar, as they differ both in terms of predictive versus uplift modeling and in terms of logistic regression versus random forests. For the second scenario, which considers the control set, we find that the rankings of the CCP and CCU logistic regression models become more similar, whereas the Spearman's rank order correlation between the rankings of the CCP and CCU random forest models decreases to 0.35, which is equal to the correlation between the CCU logistic regression and CCU random forest models. Overall, these results confirm that CCU and CCP models identify different customers to target through campaigns.

Scenario 1 - Spearman

              SC1.CCP.GLM  SC1.CCP.RF  SC1.CCU.DTA  SC1.CCU.RF
SC1.CCP.GLM   1.00         0.46        0.52         0.31
SC1.CCP.RF    0.46         1.00        0.23         0.52
SC1.CCU.DTA   0.52         0.23        1.00         0.39
SC1.CCU.RF    0.31         0.52        0.39         1.00

Scenario 1 - Kendall's tau

              SC1.CCP.GLM  SC1.CCP.RF  SC1.CCU.DTA  SC1.CCU.RF
SC1.CCP.GLM   1.00         0.32        0.39         0.21
SC1.CCP.RF    0.32         1.00        0.15         0.37
SC1.CCU.DTA   0.39         0.15        1.00         0.27
SC1.CCU.RF    0.21         0.37        0.27         1.00

Table 3: Spearman’s rank order correlation and Kendall’s tau, scenario 1.

Scenario 2 - Spearman

              SC2.CCP.GLM  SC2.CCP.RF  SC2.CCU.DTA  SC2.CCU.RF
SC2.CCP.GLM   1.00         0.47        0.59         0.17
SC2.CCP.RF    0.47         1.00        0.24         0.35
SC2.CCU.DTA   0.59         0.24        1.00         0.35
SC2.CCU.RF    0.17         0.35        0.35         1.00

Scenario 2 - Kendall's tau

              SC2.CCP.GLM  SC2.CCP.RF  SC2.CCU.DTA  SC2.CCU.RF
SC2.CCP.GLM   1.00         0.32        0.44         0.12
SC2.CCP.RF    0.32         1.00        0.16         0.25
SC2.CCU.DTA   0.44         0.16        1.00         0.24
SC2.CCU.RF    0.12         0.25        0.24         1.00

Table 4: Spearman’s rank order correlation and Kendall’s tau, scenario 2.
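Spearman's rank order correlation and Kendall's tau can be computed with any statistics library; for completeness, a pure-Python sketch (assuming no tied scores, and the tau-a variant of Kendall's tau):

```python
def _ranks(xs):
    """Rank of each element, 1 = smallest; assumes no ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0] * len(xs)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def spearman(x, y):
    """Spearman's rho via the classical no-ties formula 1 - 6*sum(d^2)/(n(n^2-1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(_ranks(x), _ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    s = sum(1 if (x[i] - x[j]) * (y[i] - y[j]) > 0 else -1
            for i in range(n) for j in range(i + 1, n))
    return s / (n * (n - 1) / 2)
```

Identical rankings yield 1 and fully reversed rankings yield −1 under both measures; the correlations in Tables 3 and 4 lie in between, reflecting partial agreement between the model rankings.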

The observations of the previous analysis of the similarities in the rankings of customers are confirmed when plotting the overlap in selected customers. Figure 10 shows the percentage of overlap in customers when comparing different cutoffs of the ranking between different techniques and methodologies. For the first scenario, the logistic regression models present an overlap of 0.55 and the random forest models an overlap of 0.46 at a 5% cutoff between the CCP and CCU settings (Figure 10a). In comparing the logistic regression and random forest models within each setting, we find lower overlaps of 0.30 and 0.38 at 5%, respectively (Figure 10b), revealing clear differences between the targeted customers. For the second scenario, the logistic regression models show a 0.52 overlap between the CCP and CCU setups at a cutoff of 5%. The largest difference is found between the random forest models of the CCP and CCU setups, with an overlap of 0.21 at 5% (Figure 10c). This latter observation, combined with the MPU results (Figure 7b), again clearly shows that the CCU setup ranks customers more profitably than the CCP setup. Finally, Figure 10d shows overlaps of 0.31 and 0.29 at 5% for the techniques of the CCP and CCU setups, respectively. This further confirms the presence of a significant difference in rankings when comparing logistic regression and random forest

models.
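The overlap statistic used in Figure 10 can be computed by intersecting the top-ranked customer sets of two models. A small illustrative sketch (the customer ids and scores are hypothetical, and both rankings are assumed to cover the same population):

```python
def overlap_at_cutoff(scores_a, scores_b, cutoff):
    """Fraction of customers in the top `cutoff` share of BOTH rankings.

    scores_a, scores_b -- dicts mapping customer id -> model score,
                          defined over the same customer population.
    cutoff             -- fraction of the ranking considered (e.g., 0.05).
    """
    k = max(1, round(cutoff * len(scores_a)))
    top = lambda scores: set(sorted(scores, key=scores.get, reverse=True)[:k])
    return len(top(scores_a) & top(scores_b)) / k

# Hypothetical example: model B ranks the ten customers in exactly the
# reverse order of model A, so their top halves do not overlap at all.
ids = list("abcdefghij")
model_a = {cid: len(ids) - i for i, cid in enumerate(ids)}  # 'a' ranked first
model_b = {cid: i for i, cid in enumerate(ids)}             # 'j' ranked first
```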

[Figure 10 comprises four panels plotting the overlap in selected customers against the ranking cutoff: (a) overlap comparison across methodologies (CCP vs. CCU) for scenario 1, (b) overlap comparison across techniques for scenario 1, (c) overlap comparison across methodologies for scenario 2, and (d) overlap comparison across techniques for scenario 2.]

Figure 10: The overlap in customers observed when comparing different cutoffs of the ranking of setups (10a and 10c) and techniques (10b and 10d) for scenarios 1 and 2.

In previous studies on uplift modeling, the performance of uplift models has been reported to be unstable, i.e., to vary heavily across test folds when adopting an n-fold cross-validation setup [63]. Therefore, the experiments reported above were repeated five times to assess the impact of randomly splitting the dataset into training and test sets. The results generated across the five repetitions were found to be highly stable, supporting the validity of the presented findings.

5. Conclusions and future research

In this article, we introduce a novel, profit-driven evaluation measure for assessing the performance of customer churn uplift models. The measure extends the maximum profit measure for customer churn prediction models and allows one to compare customer churn prediction and customer churn uplift models. The measure assesses the performance of a customer churn uplift model in terms of the profit per customer of the customer base generated when targeting, for a retention campaign, the optimal fraction of customers with the highest uplift scores. The optimal fraction of customers to be targeted is determined by maximizing the profits generated from a retention campaign and thus depends indirectly on the costs and benefits related to the retention campaign and to retained customers who are about to churn. The results of a real-life case study from the financial industry are presented: an experimental study was developed and conducted to assess the added value of prescriptive over predictive analytics. The results indicate that customer churn uplift models outperform customer churn prediction models. Uplift models appear to be able to identify so-called persuadables and therefore yield higher returns than customer churn prediction models, which top-rank and thus select lost causes, i.e., customers who are about to churn but who will not be retained when targeted through a retention campaign. These results strongly imply that uplift modeling serves as an improved tool for practical customer churn modeling applications. Future studies will focus on generalizing the newly introduced MPU measure, as there is a need for powerful and application-oriented evaluation measures for assessing the performance of uplift models. This study also opens the door to the development of profit-driven uplift modeling approaches that aim at maximizing profitability.

References

[1] W. Verbeke, B. Baesens, C. Bravo, Profit Driven Business Analytics: A Practitioner’s Guide to Transforming Big Data into Added Value, John Wiley & Sons, 2017.

[2] S. Lessmann, B. Baesens, H.-V. Seow, L. C. Thomas, Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research, Eur. J. Oper. Res. 247 (2015) 124–136.

[3] S. Maldonado, J. P´erez, C. Bravo, Cost-based feature selection for support vector machines: An application in credit scoring, Eur. J. Oper. Res. 261 (2017) 656–665.

[4] B. Baesens, V. Van Vlasselaer, W. Verbeke, Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques: A Guide to Data Science for Fraud Detection, John Wiley & Sons, 2015.

[5] K. Coussement, K. W. De Bock, Customer churn prediction in the online gambling industry: The beneficial effect of ensemble learning, J Bus Res 66 (2013) 1629–1636.

[6] M. Óskarsdóttir, C. Bravo, W. Verbeke, C. Sarraute, B. Baesens, J. Vanthienen, Social network analytics for churn prediction in telco: Model building, evaluation and network architecture, Expert Syst. Appl. 85 (2017) 204–220.

[7] W. Verbeke, D. Martens, C. Mues, B. Baesens, Building comprehensible customer churn prediction models with advanced rule induction techniques, Expert Syst. Appl. 38 (2011) 2354–2364.

[8] A. D. Athanassopoulos, Customer satisfaction cues to support market segmentation and explain switching behavior, J Bus Res 47 (2000) 191 – 207.

[9] C. B. Bhattacharya, When customers are members: Customer retention in paid membership contexts, J Acad Market Sci 26 (1998) 31.

[10] M. R. Colgate, P. J. Danaher, Implementing a customer relationship strategy: The asymmetric impact of poor versus excellent execution, J Acad Market Sci 28 (2000) 375–387.

[11] E. Rasmusson, Complaints can build relationships., Sales & Marketing Management 151 (1999) 89–89.

[12] M. Colgate, K. Stewart, R. Kinsella, Customer defection: a study of the student market in ireland, International Journal of Bank Marketing 14 (1996) 23–29.

[13] J. Ganesh, M. J. Arnold, K. E. Reynolds, Understanding the customer base of service providers: An examination of the differences between switchers and stayers, J Mark 64 (2000) 65–87.

[14] R. W. Mizerski, An attribution explanation of the disproportionate influence of unfavorable information, J Consum Res 9 (1982) 301–310.

[15] F. F. Reichheld, Learning from customer defections (1996).

[16] D. L. Stum, A. Thiry, Building customer loyalty, Train Dev J 45 (1991) 34–36.

[17] V. A. Zeithaml, L. L. Berry, A. Parasuraman, The behavioral consequences of service quality, J Mark 60 (1996) 31–46.

[18] R. T. Rust, A. J. Zahorik, Customer satisfaction, customer retention, and market share, J Retailing 69 (1993) 193 – 215.

[19] D. Van den Poel, B. Larivière, Customer attrition analysis for financial services using proportional hazard models, Eur. J. Oper. Res. 157 (2004) 196–217.

[20] N. J. Radcliffe, R. Simpson, Identifying who can be saved and who will be driven away by retention activity, Journal of Telecommunications Management 1 (2008).

[21] L. Guelman, M. Guillen, A. M. Perez-Marin, Random forests for uplift modeling: An insurance customer retention case, in: K. J. Engemann, A. M. Gil-Lafuente, J. Merigo (Eds.), Modeling and Simulation in Engineering, Economics and Management, volume 115 of Lecture Notes in Business Information Processing, Springer Berlin Heidelberg, 2012, pp. 123–133. doi:10.1007/978-3-642-30433-0_13.

[22] W. Verbeke, K. Dejaeger, D. Martens, J. Hur, B. Baesens, New insights into churn prediction in the telecom- munication sector: A profit driven approach, Eur. J. Oper. Res. 218 (2012) 211 – 229.

[23] T. Verbraken, C. Bravo, R. Weber, B. Baesens, Development and application of consumer credit scoring models using profit-based classification measures, Eur. J. Oper. Res. 238 (2014) 505 – 513.

[24] F. Garrido, W. Verbeke, C. Bravo, A robust profit measure for binary classification model evaluation, Expert Syst. Appl. 92 (2018) 154–160.

[25] B. Baesens, Analytics in a big data world: The essential guide to data science and its applications, John Wiley & Sons, 2014.

[26] W. Verbeke, D. Martens, B. Baesens, Social network analysis for customer churn prediction, Appl. Soft Comput. 14 (2014) 431–446.

[27] S. A. Neslin, S. Gupta, W. Kamakura, J. Lu, C. H. Mason, Defection detection: Measuring and understanding the predictive accuracy of customer churn models, Journal of Marketing Research 43 (2006) 204–211.

[28] J. Burez, D. V. den Poel, Handling class imbalance in customer churn prediction, Expert Syst. Appl. 36 (2009) 4626 – 4636.

[29] P. Datta, B. Masand, D. R. Mani, B. Li, Automated cellular modeling and prediction on a large scale, Artificial Intelligence Review 14 (2000) 485–502.

[30] C.-P. Wei, I.-T. Chiu, Turning telecommunications call details to churn prediction: a data mining approach, Expert Syst. Appl. 23 (2002) 103–112.

[31] E. Lima, C. Mues, B. Baesens, Domain knowledge integration in data mining using decision tables: case studies in churn prediction, J Oper Res Soc 60 (2009) 1096–1106.

[32] A. Lemmens, C. Croux, Bagging and boosting classification trees to predict churn, Journal of Marketing Research 43 (2006) 276–286.

[33] S. Lessmann, S. Voß, A reference model for customer-centric data mining with support vector machines, Eur. J. Oper. Res. 199 (2009) 520–530.

[34] Z.-Y. Chen, Z.-P. Fan, M. Sun, A hierarchical multiple kernel support vector machine for customer churn prediction using longitudinal behavioral data, Eur. J. Oper. Res. 223 (2012) 461–472.

[35] J. Moeyersoms, D. Martens, Including high-cardinality attributes in predictive models: A case study in churn prediction in the energy sector, Decis Support Syst 72 (2015) 72–81.

[36] W.-H. Au, K. C. C. Chan, X. Yao, A novel evolutionary data mining algorithm with applications to churn prediction, IEEE Trans. Evol. Comput. 7 (2003) 532–545.

[37] S.-Y. Hung, D. C. Yen, H.-Y. Wang, Applying data mining to telecom churn management, Expert Syst. Appl. 31 (2006) 515–524.

[38] K. Dasgupta, R. Singh, B. Viswanathan, D. Chakraborty, S. Mukherjea, A. Nanavati, A. Joshi, Social ties and their relevance to churn in mobile telecom networks, in: Proceedings of the 11th international conference on Extending Database Technology: Advances in database technology, EDBT ’08, 2008, pp. 697–711.

[39] B. Baesens, T. Van Gestel, M. Stepanova, D. Van den Poel, J. Vanthienen, Neural network survival analysis for personal loan data, J Oper Res Soc 56 (2005) 1089–1098.

[40] A. Backiel, B. Baesens, G. Claeskens, Predicting time-to-churn of prepaid mobile telephone customers using social network analysis, J Oper Res Soc 67 (2016).

[41] A. Keramati, R. Jafari-Marandi, M. Aliannejadi, I. Ahmadian, M. Mozaffari, U. Abbasi, Improved churn prediction in telecommunication industry using data mining techniques, Appl. Soft Comput. 24 (2014) 994–1012.

[42] A. Amin, S. Anwar, A. Adnan, M. Nawaz, K. Alawfi, A. Hussain, K. Huang, Customer churn prediction in the telecommunication sector using a rough set approach, Neurocomputing 237 (2017) 242–254.

[43] K. Coussement, S. Lessmann, G. Verstraeten, A comparative analysis of data preparation algorithms for customer churn prediction: A case study in the telecommunication industry, Decis Support Syst 95 (2017) 27–36.

[44] B. Zhu, B. Baesens, A. Backiel, S. K. L. M. vanden Broucke, Benchmarking sampling techniques for imbalance learning in churn prediction, J Oper Res Soc 69 (2018) 49–65.

[45] N. Radcliffe, Generating incremental sales: Maximizing the incremental impact of cross-selling, up-selling and deep-selling through uplift modelling, Stochastic Solutions Limited (2007).

[46] L. Lai, Influential Marketing: A New Direct Marketing Strategy Addressing the Existence of Voluntary Buyers, Master's thesis, Simon Fraser University, Canada, 2006. URL: https://books.google.be/books?id=5EvSuAAACAAJ.

[47] K. Kane, V. S. Y. Lo, J. Zheng, True-lift modeling: Comparison of methods, J Market Analytics 2 (2014) 218–238.

[48] V. S. Y. Lo, The true lift model: A novel data mining approach to response modeling in database marketing, SIGKDD Explor. Newsl. 4 (2002) 78–86.

[49] K. Larsen, Generalized naive bayes classifiers, SIGKDD Explor. Newsl. 7 (2005) 76–81.

[50] D. M. Chickering, D. Heckerman, A decision theoretic approach to targeted advertising, in: Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, UAI’00, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2000, pp. 82–88. URL: http://dl.acm.org/citation.cfm?id=2073946.2073957.

[51] B. Hansotia, B. Rukstales, Incremental value modeling, Journal of Interactive Marketing 16 (2001) 35–46.

[52] N. J. Radcliffe, Using control groups to target on predicted lift: Building and assessing uplift models, Direct Market J Direct Market Assoc Anal Council 1 (2007) 14–21.

[53] N. J. Radcliffe, P. D. Surry, Real-world uplift modelling with significance-based uplift trees, White Paper TR-2011-1, Stochastic Solutions (2011).

[54] P. Rzepakowski, S. Jaroszewicz, Decision trees for uplift modeling with single and multiple treatments, Knowl Inf Syst 32 (2012) 303–327.

[55] A. Shaar, T. Abdessalem, O. Segard, Pessimistic uplift modeling, ACM SIGKDD (2016).

[56] L. Breiman, J. Friedman, C. J. Stone, R. A. Olshen, Classification and regression trees, CRC press, 1984.

[57] G. V. Kass, An exploratory technique for investigating large quantities of categorical data, Applied statistics (1980) 119–127.

[58] L. Guelman, M. Guillen, A. M. Pérez-Marín, Optimal personalized treatment rules for marketing interventions: A review of methods, a new proposal, and an insurance case study, Working Papers 2014-06, Universitat de Barcelona, UB Riskcenter, 2014. URL: http://ideas.repec.org/p/bak/wpaper/201406.html.

[59] B. Hansotia, B. Rukstales, Direct marketing for multichannel retailers: Issues, challenges and solutions, Journal of Database Marketing 9 (2002) 259–266.

[60] T. Verbraken, W. Verbeke, B. Baesens, A novel profit maximizing metric for measuring classification performance of customer churn prediction models, IEEE Trans Knowl Data Eng 25 (2013) 961–973.

[61] T. Verbraken, W. Verbeke, B. Baesens, Profit optimizing customer churn prediction with bayesian network classifiers, Intell. Data Anal. 18 (2014) 3–24.

[62] K. Dejaeger, W. Verbeke, D. Martens, B. Baesens, Data mining techniques for software effort estimation: A comparative study, IEEE Trans. Softw. Eng. 38 (2012) 375–397.

[63] F. Devriendt, W. Verbeke, A literature survey and experimental evaluation of the state-of-the-art in uplift modeling: a stepping stone towards the development of prescriptive analytics, Big Data (2018). Submitted in December 2017.

[64] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2013. URL: http://www.R-project.org/.

[65] J. Burez, D. Van den Poel, CRM at a pay-TV company: Using analytical models to reduce customer attrition by targeted marketing for subscription services, Expert Syst. Appl. 32 (2007) 277–288.
