IT in Industry, Vol. 9, No.1, 2021 Published Online 18-Feb-2021

AN EFFECTIVE MACHINE LEARNING MODEL FOR CUSTOMER ATTRITION PREDICTION IN MOTOR USING GWO-KELM ALGORITHM

Deepthi Das1, Raju Ramakrishna Gondkar2

1 Research Scholar, Department of Computer Science, CMR University, India; Associate Professor, Department of Computer Science, CHRIST (Deemed to be University), India
2 Professor, Department of Computer Science, CMR University, India

ABSTRACT: The prediction of customers' churn is a challenging task in different industrial sectors, and the motor insurance industry is one of the most prominent among them. Due to the incessant upgradation of insurance policies, the customer retention process plays a significant role for the concern. The main objective of this study is to predict the behavior of customers and to classify churners and non-churners at an early stage. The motor insurance sector dataset consists of 20,000 records with 37 attributes collected from a machine learning repository. The missing values of the records are analyzed and explored via the Expectation-Maximization algorithm, which categorizes the collected data based on the policy renewals. Then, the behavior of the customers is also investigated, so as to ease the construction of the training classifiers. With the help of the Naive Bayes algorithm, the behavior of the customers on the upgraded policies is examined. Depending on the dependency rate of each variable, a hybrid GWO-KELM algorithm is introduced to classify the churners and non-churners by exploring the optimal feature analysis. Experimental results have proved the efficiency of the hybrid algorithm in terms of 95% prediction accuracy, 97% precision, 91% recall and 94% F-score.

Keywords - Customer churn, Customer retention, Extreme Learning Machine (ELM), Grey Wolf Optimizer (GWO), Motor Insurance


1. Introduction

The motor industry is one of the well-known and leading industries that make use of data mining techniques [1]. This is mainly due to the fact that a vast amount of data, with an incessantly increasing number of customers, resides with the insurance companies. It poses a challenging scenario for researchers as well as industrialists. Since customers have the ability to switch over among the available providers, churn prediction in the motor insurance industry is not performing well with respect to customers' requirements. Therefore, it is addressed in terms of churn prediction using machine learning techniques [2-4]. The insurance industry is extensive and changes rapidly according to the global market trends. Developing a trustworthy and reliable Customer Relationship Management (CRM) [5-7] in the motor industry is an interesting study area which stimulates dealing with the real-time challenges. The prediction of churn in the motor insurance industry is a significant research area for fault recognition and classification tasks. The rise of churners in several sectors is due to the inefficient development of tools that fail to predict the risk of customer churn. Generally, each data item in the communication environment carries churn information that can be identified with the help of background information. The churn of a customer is defined as the process of ceasing the customer's connection with the company [8]. In the context of an online environment, customers turn out to be churned when the customer's communication with the service has elapsed. This could bring revenue loss and wasted marketing costs due to improper customer communication. Therefore, customer churn should be minimized, which is the ultimate aim for all online business environments [9]. The main challenges in customer churn prediction systems are that they: i) do not associate with the business objective and ii) rely on feedback loops. These are the two reasons that have increased the count of churners in the business environment. Customers who are willing to churn should be excluded from the campaign, so as to preserve the resources.

The rest of the paper is organized as follows: Section II presents the Related Work; Section III presents the Research Methodology; Section IV presents the Experimental Results and Analysis; and Section V presents the Conclusion.

2. Related work

This section presents reviews of existing machine learning techniques designed for churn prediction. The author in [10] has discussed market segmentation using hierarchical self-organizing maps. It was explored on a real-time market segmentation dataset in Taiwan. The classification expected error and the associated input vectors were organized in a hierarchical way. This helps to build a customer map that effectively brings out the customers under the churning stage. However, the clustering process develops class imbalance issues over the multi-subscription customer data. Detection of potential churners at an earlier stage is a complicated task due to the invasion of new subscription services [11]. It was explored on a European pay-TV company where a high rate of churners was observed. With the help of Markov chain concepts, the reasons behind the customer attrition were found. These were then fed into RF classifiers, so as to effectively classify the churners. Depending on the contact variables, the technical problems were resolved. The author in [12] has discussed the key issues and the future directions of customer retention management. The role of customer retention was to increase the firm value by reducing the attrition rate of subscription-based firms. By estimating the social-connectivity factors, the churns were predicted. Campaign management was the most composed strategy that introduced the win-back scenario on all customer retention. All the data were leveraged based on the retention values.

Intelligent data analysis and its approaches were studied to resolve business issues [13]. Due to the enhancement in the globalization process, an in-depth analysis of churn was carried out in the sense of customer continuity management. It has stated the use and the merits of the prediction models with the business solution. Some recommendations were given according to the scope of the industry. The study has concluded that deep learning and ensemble learning might even predict faster than classical machine learning techniques. In [14], the author has introduced a random forest algorithm along with the use of regression analysis, in order to predict the profitability and retention of the customers. It was explored on 100,000 customers taken from the warehouse of a European company.

The RF algorithm was applied on two aspects: one is to classify the binary data and the second is to leverage the linear dependent variables. Compared to the logistic regression models, the behavior analysis of the customers was done by exploring the buying variables and the profitability variables. Though it has suggested increasing the profitability, the multi-objective based behavior patterns could also be focussed on. The author in [15] has suggested the scope of random forest algorithms in order to enhance uplift modeling on customer retention cases. Here, a customer retention model was designed for the customers who were actively engaged. By the use of a random forest algorithm, the customer attrition rate was leveraged. Strong empirical evidence was provided to retain the customers for the long term. Though it has easily predicted the retained customers, the effects of attrition that lead to customer retention are not addressed.

Biologically inspired approaches were introduced to predict the churn, so as to yield the optimal performance. PSO and its variants [16] were introduced to effectively preprocess the data. With the help of an optimized feature selection process and the simulated annealing process, the rate of false positives and the feature-choice complexity were significantly reduced. Though it has provided an optimal strategy to predict the churn, the time complexity and termination criteria are high, which brings a heavy computational effort. In [17], a random rough subspace based neural ensemble learning process was developed to improve the performance of the neural network classifiers. It was combined with a set of feature reduction processes, so as to increase the effectiveness and the efficiency of the ensemble strategies. However, the data proportion of the classifier misleads the data relationship. One-class SVM with undersampled data [18] was studied to improve the churn prediction along with insurance fraud detection. It is analyzed on the Automobile Insurance Fraud dataset and the Credit Card Customer Churn dataset, significantly improving the AUC performance. Compared to other ML techniques, the OC-SVM has predicted the churners with the least computational steps. However, the class imbalance and false positive rate are higher when the sensitivity learning values increase.

In the context of B2B e-commerce companies, SVM [19] was employed to enhance the AUC parameter-selection technique.


In the perspective of churn prediction, a hyperplane in a high dimensional space was constructed using SVM that optimally discriminates between the churners and non-churners. It also marginalized the different hyperplanes of the data classes. Though this model has effectively predicted the churn and increased the revenue, the subscription renewal service is not focussed on. Subscription services are one of the challenging tasks in dealing with churn prediction. Initially, a churn model was designed on the profile behavior analysis. Grid search is performed on the obtained noisy marketing data. Skewed data is likely to increase the number of support vectors. Thus, optimized learning parameters with SVM achieved a 4.492 decile score, better than logistic regression and random forests. Some variable groups are dynamic over the data. In [20], Support Vector Machine (SVM) was studied to analyze the customer churns in the case of credit cards. In order to increase the churn prediction accuracy, the learning unit in SVM redefines the feature space. It has achieved 94.7% training accuracy and 82.4% testing accuracy, compared to the ANN and BPN. However, the various parameters related to customer satisfaction are not considered.

From the conducted reviews, the research problems pertaining to enhancing the CRM in the motor insurance industry are presented. From the angle of machine learning techniques, the churns are predicted based on the customer association with each subscription service. The problems can be viewed in three forms, namely, the testing phase, the training phase and the prediction phase. From the training phase viewpoint, the labels are assigned based on the available churn data, which develops a class imbalance during the prediction phases, leading to the degradation of classification accuracy. From the prediction phase viewpoint, the possibility of predicting the churners based on the retention policies confuses the classifier in the context of multi-dimensional data variables.

3. Research Methodology

This section presents the proposed classification scheme using GWO-KELM. The proposed phases are:

3.1 Data preprocessing:

It is the first step that cleans the collected data by introducing a sampling process. It contains two steps, namely, the E-Step and the M-Step. The aim of these two steps is to eliminate the irrelevant data so as to develop better prediction model systems.

Here, the concept of the Expectation-Maximization algorithm is employed to estimate the likelihood behavior on the two sets of random variables.

a) E-step: In this step, the unlabeled class data is estimated by deriving the expectation on the missing values. The probability distribution is derived over all the considered attributes.

b) M-Step: It derives the maximum likelihood for all considered attributes.

The following are the steps involved in preprocessing the acquired data:

i) Let the data be labeled as D, the missing values as M, and let θ be the unknown parameter. The complete data is mathematically expressed as:

(D, M) ~ p(D, M | θ) …..(1)

Let L(θ; D, M) be the expression of the likelihood function:

L(θ; D, M) = p(D, M | θ) …..(2)

ii) For the considered variables D, M and θ, the calculation of the maximum likelihood is done through the marginal likelihood on the observed data. The estimation of the maximum likelihood is given as:

L(θ; D) = p(D | θ) = Σ_M p(D, M | θ) …..(3)

iii) Then, the expected value of the log-likelihood function is given as:

Q(θ | θ^(t)) = E_{M | D, θ^(t)} [ log L(θ; D, M) ] …..(4)

iv) Discovering the parameters that maximize the expected log-likelihood is given as:

θ^(t+1) = argmax_θ Q(θ | θ^(t)) …..(5)

The above steps (i) to (iv) are applied iteratively on the acquired data, until the data is eligible to develop prediction model systems.
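The following is a minimal sketch of how such an EM imputation step can look in Python (not the authors' implementation; it assumes the numeric attributes can be approximated by a multivariate Gaussian, and the toy matrix is hypothetical): the E-step fills each missing entry with its conditional expectation given the observed entries, and the M-step re-estimates the mean vector and covariance matrix.

    import numpy as np

    def em_impute(X, n_iter=20):
        """Impute missing values (np.nan) in a numeric matrix X with a Gaussian
        EM loop: E-step fills each missing block with its conditional
        expectation, M-step re-estimates mean and covariance."""
        X = X.astype(float).copy()
        miss = np.isnan(X)
        col_mean = np.nanmean(X, axis=0)            # crude starting point
        X[miss] = np.take(col_mean, np.where(miss)[1])
        for _ in range(n_iter):
            mu = X.mean(axis=0)                      # M-step: mean
            sigma = np.cov(X, rowvar=False)          # M-step: covariance
            for i in np.where(miss.any(axis=1))[0]:  # E-step, record by record
                m, o = miss[i], ~miss[i]
                s_oo = sigma[np.ix_(o, o)] + 1e-6 * np.eye(o.sum())
                s_mo = sigma[np.ix_(m, o)]
                # conditional expectation of the missing block given the observed block
                X[i, m] = mu[m] + s_mo @ np.linalg.solve(s_oo, X[i, o] - mu[o])
        return X

    # toy usage: 5 records, 3 attributes, two missing entries
    demo = np.array([[1.0, 2.0, np.nan],
                     [2.0, np.nan, 6.0],
                     [3.0, 6.0, 9.0],
                     [4.0, 8.0, 12.0],
                     [5.0, 10.0, 15.0]])
    print(em_impute(demo).round(2))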

3.2 Feature extraction & feature selection process:

From the preprocessed data, the behaviors of the customers are analyzed, so as to select the relevant features for constructing efficient classifiers. Here, the Naive Bayes technique is employed to model the behavior of the customers. Let us consider a set of instances with a vector of features x = (x1, x2, ..., xn); the Bayes theorem for the


observed data is expressed as:

P(C | x1, ..., xn) = P(C) · P(x1, ..., xn | C) / P(x1, ..., xn) …..(6)

where the class prior probability is represented as P(C), which is estimated from the training data, and the probability of the class after observing the data is represented as P(C | x1, ..., xn). Then, a data instance is assigned the class that has the highest probability, which is given as:

ŷ = argmax_c P(C = c) · Π_{i=1}^{n} P(xi | C = c) …..(7)
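As an illustration of Eqs. (6)-(7) (a sketch only; the three behavioral attributes and the toy values below are hypothetical, not the actual columns of the insurance dataset), a Gaussian Naive Bayes model can be fitted on the preprocessed behavior variables and the resulting posteriors used to flag likely churners:

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    # hypothetical behavior matrix: [premium, policy_count, years_insured]
    X = np.array([[420.0, 1, 2],
                  [610.0, 3, 7],
                  [380.0, 1, 1],
                  [550.0, 2, 6],
                  [700.0, 4, 9],
                  [400.0, 1, 1]])
    y = np.array([1, 0, 1, 0, 0, 1])   # 1 = churner, 0 = non-churner

    nb = GaussianNB()
    nb.fit(X, y)

    # posterior P(class | x) for a new customer, as in Eq. (6)
    proba = nb.predict_proba([[450.0, 1, 3]])[0]
    print("P(non-churner), P(churner):", proba.round(3))

    # Eq. (7): pick the class with the highest posterior probability
    print("predicted label:", nb.classes_[np.argmax(proba)])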

With the help of the above classification rule, the probability is estimated over the training data. Depending on the obtained probability rates, the relevant features for churn prediction are discovered.

3.3 Classification & prediction process:

In this step, the Grey Wolf Optimization (GWO)-Kernel Extreme Learning Machine (KELM) explains the process of classification and prediction by considering the optimal selection of attributes. The selected features are first optimized using GWO and are then used for classification purposes by introducing the KELM model. The kernel based extreme learning machine consists of three layers, namely, an input layer I, an output layer O and a hidden layer H. Initially, the connection weights to the inputs are arbitrarily selected for the input and hidden layers. During the training process, the threshold of the weights is adjusted in the network. A unique optimal solution is obtained by triggering the hidden layers, which is done by the GWO algorithm. The output of the hidden layer is computed by the activation function, which is expressed as:

f(x) = Σ_{i=1}^{L} βi · g(wi · x + bi) …..(8)

where g(·) is the activation function, wi and bi are the randomly assigned weights and biases of the i-th hidden node, βi is its output weight, and the input nodes together with the L hidden nodes are optimized using GWO.
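A compact way to realise the KELM decision function with a kernel in place of the explicit hidden layer is sketched below (an illustrative formulation, not the authors' code; the RBF width gamma and the regularization coefficient C are placeholder values of the kind GWO would tune). The output weights are obtained in closed form by solving the regularized kernel system.

    import numpy as np

    def rbf_kernel(A, B, gamma=0.1):
        # pairwise squared distances, then the Gaussian kernel
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def kelm_train(X, y, C=10.0, gamma=0.1):
        """Kernel ELM: output weights alpha = (I/C + K)^-1 T."""
        classes = np.unique(y)
        T = np.where(y[:, None] == classes[None, :], 1.0, -1.0)  # one-vs-rest targets
        K = rbf_kernel(X, X, gamma)
        alpha = np.linalg.solve(np.eye(len(X)) / C + K, T)
        return alpha, X, classes, gamma

    def kelm_predict(model, Xnew):
        alpha, Xtr, classes, gamma = model
        scores = rbf_kernel(Xnew, Xtr, gamma) @ alpha
        return classes[np.argmax(scores, axis=1)]

    # toy usage with two separable behavior clusters
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])
    y = np.array([0] * 20 + [1] * 20)
    model = kelm_train(X, y, C=10.0, gamma=0.5)
    print((kelm_predict(model, X) == y).mean())   # training accuracy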


Inspired by the hunting behavior of grey wolves, the GWO algorithm is designed. The proposed steps of the GWO algorithm are:

a) It consists of four groups, namely, Alpha (α), Beta (β), Delta (δ) and Omega (ω), in which the first three groups determine the fitness criteria and the fourth group determines the optimal position of the wolf.

b) The position of the wolf is determined from the below two equations:

D = | C · Xp(t) − X(t) | …..(9)

X(t+1) = Xp(t) − A · D …..(10)

where the difference between the position of the wolf and the prey is represented as D, the current iteration is given as t, the position of the prey is given as Xp, and the position of the wolf is given as X. Above all, the vectors A and C are the most important parameters that determine the accuracy of the optimization process:

A = 2a · r1 − a …..(11)

C = 2 · r2 …..(12)

where the vectors r1 and r2 are random values ranging from 0 to 1, and a is decreased from 2 to 0 over the course of the iterations. According to the position of the α class, the positions of the features are updated iteratively in the position (X, Y). The pseudocode of the GWO algorithm is:

    Declare the population Pi, where i = 1, 2, ..., n
    Initialize a, A and C
    Estimate the fitness of each searching agent:
        Pα → 1st (best) searching agent
        Pβ → 2nd searching agent
        Pδ → 3rd searching agent
    While (iter < stopping criteria)
        For each searching agent
            Alter the present position of the searching agent by Eq. (10)
        End for
        Alter a, A and C
        Compute the fitness values again
        Alter Pα, Pβ and Pδ
        iter = iter + 1
    End while
    Return Pα

Finally, A and C act as the regularization coefficients that trigger the position of the class according to the optimized features.
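The search loop above can be written directly from Eqs. (9)-(12). The sketch below is a generic implementation of this standard GWO update (a toy sphere function stands in for the fitness the paper actually optimizes, i.e., the quality of a candidate feature subset / KELM configuration):

    import numpy as np

    def gwo(fitness, dim, n_agents=10, max_iter=50, lb=-1.0, ub=1.0, seed=0):
        rng = np.random.default_rng(seed)
        P = rng.uniform(lb, ub, (n_agents, dim))     # declare the population P_i
        best = sorted(P, key=fitness)
        alpha, beta, delta = best[0].copy(), best[1].copy(), best[2].copy()

        for it in range(max_iter):
            a = 2 - 2 * it / max_iter                # a decreases from 2 to 0
            for i in range(n_agents):
                X_new = np.zeros(dim)
                for leader in (alpha, beta, delta):
                    r1, r2 = rng.random(dim), rng.random(dim)
                    A = 2 * a * r1 - a               # Eq. (11)
                    C = 2 * r2                       # Eq. (12)
                    D = np.abs(C * leader - P[i])    # Eq. (9)
                    X_new += (leader - A * D) / 3.0  # Eq. (10), averaged over the leaders
                P[i] = np.clip(X_new, lb, ub)
            # re-evaluate fitness and refresh alpha, beta, delta
            best = sorted(P, key=fitness)
            alpha, beta, delta = best[0].copy(), best[1].copy(), best[2].copy()
        return alpha

    # toy usage: minimise a sphere function in 5 dimensions
    print(gwo(lambda x: float((x ** 2).sum()), dim=5).round(3))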

Mostly, small weights are taken for the training procedure and thus, a minimized error function is achieved. Depending on the settings of the parameters, the stability of the churners' behaviors is predicted with the least computational steps.
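To tie the two components together, here is a hedged sketch of one way the hybrid can be wired (purely illustrative: it reuses the gwo, kelm_train and kelm_predict helpers sketched earlier, thresholds the continuous wolf position at 0.5 into a binary feature mask, and uses the simple KELM training error as the fitness; the paper does not prescribe these exact choices):

    import numpy as np

    def feature_mask_fitness(X, y, C=10.0, gamma=0.5):
        """Return a fitness function: KELM error when only the selected
        attributes (position > 0.5) are used."""
        def fitness(position):
            mask = position > 0.5
            if not mask.any():                     # empty selection is penalised
                return 1.0
            model = kelm_train(X[:, mask], y, C=C, gamma=gamma)
            return float((kelm_predict(model, X[:, mask]) != y).mean())
        return fitness

    # toy usage: 6 attributes, only the first two carry the class signal
    rng = np.random.default_rng(1)
    X = rng.normal(0, 1, (60, 6))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    best_position = gwo(feature_mask_fitness(X, y), dim=6, lb=0.0, ub=1.0)
    print("selected attributes:", np.where(best_position > 0.5)[0])

Because training error is used as the fitness here, attributes that neither help nor hurt may also survive the threshold; a validation-based fitness would be the natural refinement.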

4. Experimental results and analysis

This section presents the experimental setup parameters and the performance measures considered in this study. The achieved results are discussed in a detailed manner. The insurance sector database collected from the machine learning repository is used for experimental purposes. The dataset is collected over a period of a year and a half and incorporates the details of each customer recorded throughout the specified time period. It is tested on Python, a high-level programming language. It consists of 37 attributes and 20,000 records, of which only a few are taken for constructing the training classifier. The hybrid algorithm is compared with the classical KELM. The performance metrics analyzed in this study are accuracy, precision, recall and F-score, which are displayed via the confusion matrix in Fig. 1.

Premises:

Labels: Churners and Non-Churners

True Positive (TP) - No. of samples correctly identified as abnormal

False Positive (FP) - No. of samples incorrectly identified as abnormal

True Negative (TN) - No. of samples correctly identified as normal

False Negative (FN) - No. of samples incorrectly identified as normal

a) Recall: It is defined as the proportion of real positive cases that are correctly predicted positive. It is given as:

Recall = TP / (TP + FN) …..(13)

b) Precision: It is defined as the proportion of predicted positives that are correctly real positives. It is given as:

Precision = TP / (TP + FP) …..(14)

c) Prediction Accuracy: It is the most intuitive performance measure, computed as the ratio of correctly predicted observations to the total observations. It defines the ability to distinguish normal and abnormal cases. It is given as:

Accuracy = (TP + TN) / (TP + TN + FP + FN) …..(15)

d) F-score: It conveys the balance between precision and recall, which is given in the


below equation:

F-score = 2 × (Precision × Recall) / (Precision + Recall) …..(16)
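For the two-class setting these four measures follow directly from the confusion-matrix counts; a quick numeric check in Python (the toy label vectors are hypothetical) looks like this:

    import numpy as np
    from sklearn.metrics import confusion_matrix

    y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])   # 1 = churner (abnormal)
    y_pred = np.array([1, 0, 1, 0, 0, 0, 1, 1, 1, 1])

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

    recall    = tp / (tp + fn)                                  # Eq. (13)
    precision = tp / (tp + fp)                                  # Eq. (14)
    accuracy  = (tp + tn) / (tp + tn + fp + fn)                 # Eq. (15)
    f_score   = 2 * precision * recall / (precision + recall)   # Eq. (16)
    print(recall, precision, accuracy, round(f_score, 3))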

Since the proposed objectives consist of 2 classes, a confusion matrix is displayed as follows:

[Figure 1. Confusion matrix between the proposed and existing algorithms: (a) Proposed, (b) Existing]

[Figure 2. ROC analysis between the proposed and existing algorithms: (a) Proposed, (b) Existing]


Table 1. Achieved results - Proposed and Existing

                           Existing                        Proposed
                    Precision  Recall  f1-score    Precision  Recall  f1-score
Churners (0)          0.98      0.11     0.2          1         0.82     0.9
Non-churners (1)      0.81      1        0.89         0.95      1        0.97
Overall accuracy              0.812                           0.95

[Figure 3. Comparative graphs between existing and proposed classifiers: (a) Existing KELM algorithm, (b) Proposed hybrid GWO-KELM algorithm]

In order to obtain high-quality data, the records are analyzed thoroughly using an

EM algorithm. By estimating the probability distribution rate on the set of random variables, it has significantly reduced the redundancies and inconsistencies of the records. It has grouped the policy, policy holder, vehicle, drivers, premiums, claims and claims cost from different sources. In that, missing values are observed on the two important variables, proposal date and date of birth. Based on the computed active years, the

authorization details of the drivers are determined. It is resolved by the computed

E-step and M-step. In general, the prediction

of customer churn turns out to be a complex task, because of the imbalanced data. It is also observed that most of the policy holders stay with a company for the long term, which also poses another complex task in this study. The role of the EM algorithm in providing balanced datasets is thereby achieved. The ROC analysis presents the achievement of balanced data, which is shown in Fig. 2.

Since the analysis of customer behaviors is vague and uncertain, the concept of the NB algorithm is applied to find out the similarities of the churners. It is done by building an association between the random set of variables. These values are then imputed in order to find out the similar behaviors of the customers who are willing to become churners. Each behavior of the customers is then fed into the hybrid GWO-KELM algorithm. Beyond merely choosing the significant features, the most optimal features are selected by using the GWO model. Fig. 3 and Table 1 present the achieved results of the existing KELM and the proposed GWO-KELM. For example, Gender, DOB, sales channel and customer segments are eliminated. Some customers have renewed their policies frequently, and thus the duration of the policies is also discarded. The other features, such as Customer ID, motor policies, vehicle type, policy claims, years insured count and the premium, are considered. These selected features have explored which attrition factors might lead a customer to become a churner. The hybrid algorithm stated that the vehicle type, the policy type and the engine size are the attrition drivers of the churners. Meanwhile, a customer holding more than one policy might also become a churner with respect to the policy renewal. Likewise, the customers who have the highest rate of NCB are less likely to become churners. Compared to the existing KELM classifier, the proposed classifier has obtained 95% prediction accuracy.

5. Conclusion

In this paper, we have introduced a hybrid algorithm, named GWO-KELM, that predicts the churn rate in the motor insurance industry using behavior analysis of the customers. Initially, the insurance sector database is acquired and preprocessed using the Expectation-Maximization algorithm, which explores the maximum likelihood rate among the available features. The behavior of the customers is analyzed using the Naive Bayes algorithm, which associates the relationship between the insurers and the insurance policies. The outcome of the NB

algorithm determines the dependency rate between the churners' behavior and the available policies. The features with a high dependency rate are fed into the GWO-KELM classifier, which extracts the optimal features and finally classifies the churners and non-churners. The experimental study has suggested efficient predictor variables with the least computational time and effort.

References

[1] A. Oyeniyi and A. B. Adeyemo, Customer churn analysis in the banking sector using data mining techniques, African Journal of Computing and ICT, 8(3), 2015, 166-174.
[2] N. Kamalraj and A. Malathi, Applying data mining techniques in telecom churn prediction, International Journal of Advanced Research in Computer Science and Software Engineering, 3(10), 2013, 363-370.
[3] Clara-Cecilie Gunther, Ingunn Fride Tvete, Kjersti Aas, Geir Inge Sandnes, and Ornulf Borgan, Modelling and predicting customer churn from an insurance company, Scandinavian Actuarial Journal, 2014(1), 2014, 58-71.
[4] G. Xia and W. Jin, Model of customer churn prediction on support vector machine, Systems Engineering - Theory & Practice, 28(1), 2008, 71-77.
[5] U. Yabas, H. C. Cankaya and T. Ince, Customer Churn Prediction for Telecom Services, 2012 IEEE 36th Annual Computer Software and Applications Conference, Izmir, Turkey, 2012, 358-359.
[6] V. Yeshwanth, V. V. Raj and S. Mohan, Evolutionary churn prediction in mobile networks using hybrid learning, Proceedings of the 24th International Florida Artificial Intelligence Research Society (FLAIRS), Florida, USA, 2011.
[7] M. J. Kim, S. J. Koh and Y. J. Park, A Study on Retaining Existing Customers in the Korean High-Speed Internet Service Market, 2006 Technology Management for the Global Future - PICMET 2006 Conference, Istanbul, Turkey, 2006, 1970-1976.
[8] J. Cao, H. Zhang and Q. Zheng, Retaining Customers by Data Mining: A Telecommunication Carrier's Case Study in China, 2010 International Conference on E-Business and E-Government, Guangzhou, China, 2010, 3141-3144.

[9] K. Coussement and D. Van den Poel, Churn prediction in subscription services: An application of support vector machines while comparing two parameter-selection techniques, Expert Systems with Applications, 34(1), 2008, 313-327.
[10] C. Hung and C. F. Tsai, Market segmentation based on hierarchical self-organizing map for markets of multimedia on demand, Expert Systems with Applications, 34(1), 2008, 780-787.
[11] J. Burez and D. Van den Poel, CRM at a pay-TV company: Using analytical models to reduce customer attrition by targeted marketing for subscription services, Expert Systems with Applications, 32(2), 2007, 277-288.
[12] E. Ascarza, S. A. Neslin, O. Netzer, Z. Anderson, P. S. Fader, S. Gupta, B. G. S. Hardie, A. Lemmens, B. Libai, D. Neal, F. Provost and R. Schrift, In pursuit of enhanced customer retention management: Review, key issues, and future directions, Customer Needs and Solutions, 5(1-2), 2018, 65-81.
[13] D. L. García, À. Nebot and A. Vellido, Intelligent data analysis approaches to churn as a business problem: a survey, Knowledge and Information Systems, 51(3), 2017, 719-774.
[14] B. Larivière and D. Van den Poel, Predicting customer retention and profitability by using random forests and regression forests techniques, Expert Systems with Applications, 29(2), 2005, 472-484.
[15] L. Guelman, M. Guillén and A. M. Pérez-Marín, Random forests for uplift modeling: an insurance customer retention case, In: Engemann K.J., Gil-Lafuente A.M., Merigó J.M. (eds) Modeling and Simulation in Engineering, Economics and Management, MS 2012, Lecture Notes in Business Information Processing, vol 115, Springer, Berlin, Heidelberg, 2012, 123-133.
[16] J. Vijaya and E. Sivasankar, An efficient system for customer churn prediction through particle swarm optimization based feature selection model with simulated annealing, Cluster Computing, 22, 2019, 10757-10768.


[17] W. Xu, S. Wang, D. Zhang and B. Yang, Random rough subspace based neural network ensemble for insurance fraud detection, 2011 Fourth International Joint Conference on Computational Sciences and Optimization (CSO), Yunnan, China, 2011, 1276-1280.
[18] G. G. Sundarkumar, V. Ravi and V. Siddeshwar, One-class support vector machine based undersampling: Application to churn prediction and insurance fraud detection, 2015 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), Madurai, India, 2015, 1-7.
[19] N. Gordini and V. Veglio, Customers churn prediction and marketing retention strategies. An application of support vector machines based on the AUC parameter-selection technique in B2B e-commerce industry, Industrial Marketing Management, 62, 2017, 100-107.
[20] S. Kim, K. S. Shin and K. Park, An application of support vector machines for customer churn analysis: Credit card case, In: Wang L., Chen K., Ong Y.S. (eds) Advances in Natural Computation, ICNC 2005, Lecture Notes in Computer Science, vol 3611, Springer, Berlin, Heidelberg, 2005, 636-647.
