Credit Scoring Using Genetic Programming
Total Page:16
File Type:pdf, Size:1020Kb
NOVA IMS Credit Scoring Using Genetic Programming David Micha Horn Advisor: Professor Leonardo Vanneschi Internship report presented as partial requirement for obtaining the Master s degree in Advanced Analytics NOVA Information Management School Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa CREDIT SCORING USING GENETIC PROGRAMMING by David Micha Horn Internship report presented as partial requirement for obtaining the Master’s degree in Advanced Analytics. Advisor: Leonardo Vanneschi November 2016 ACKNOWLEDGEMENTS “I am not going to thank anybody, because I did it all on my own” — Spike Milligan, on receiving the British Comedy Award for Lifetime Achievement I would like to thank my advisor Professor Leonardo Vanneschi who sparked my interest in Genetic Programming and gave me important remarks on the thesis. Furthermore, I would like to express my gratitude to Dr. Klaus-Peter Huber who gave me the opportunity todo an internship and write my thesis with Arvato Financial Solutions. My thank also goes to my advisor Sven Wessendorf for giving me guidance and the very necessary suggestions for improvements in this thesis. Special thanks to Dr. Karla Schiller for introducing and explaining the process of credit scoring to me. Additionally, I would like to thank her and Dr. Markus Höchststötter for letting me take part in their project, and thus providing me with a topic for this thesis. I gratefully acknowledge the support given by Achim Krauß and Hannes Klein towards my understanding of the data used in this thesis as well as Isabell Schmitt whose knowledge of administrative mechanisms and resources within Arvato Financial Solutions proved extremely beneficial for my work. Finally, I would like to thank everyone at NOVA IMS and Arvato Financial Solutions I met and worked with. i ABSTRACT Growing numbers in e-commerce orders lead to an increase in risk management to prevent default in payment. Default in payment is the failure of a customer to settle a bill within 90 days upon receipt. Frequently, credit scoring is employed to identify customers’ default probability. Credit scoring has been widely studied and many different methods in different fields of research have been proposed. The primary aim of this work is to develop a credit scoring model as a replacement for the pre risk check of the e-commerce risk management system risk solution services (rss). The pre risk check uses data of the order pro- cess and includes exclusion rules and a generic credit scoring model. The new model is supposed to work as a replacement for the whole pre risk check and has to be able to work in solitary and in unison with the rss main risk check. An application of Genetic Programming to credit scoring is presented. The model is developed on arealworld data set provided by Arvato Financial Solutions. The data set contains order requests processed by rss. Results show that Genetic Programming outperforms the generic credit scoring model of the pre risk check in bothclas- sification accuracy and profit. Compared with Logistic Regression, Support Vector Machines and Boosted Trees, Genetic Programming achieved a similar classificatory accuracy. Furthermore, the Genetic Programming model can be used in combination with the rss main risk check in order to create a model with higher discriminatory power than its individual models. KEYWORDS Risk Management; Credit Scoring; Genetic Programming; Machine Learning; Optimization ii INDEX List of Figures iv List of Tables v 1 Introduction 1 2 Theoretical Framework 4 3 risk solution service 10 4 Research Methodology 14 4.1 Dataset . 17 4.2 Genetic Programming . 21 4.2.1 Initialization . 22 4.2.2 Evaluation Criteria . 23 4.2.3 Selection . 25 4.2.4 Genetic Operators . 26 4.3 Applying Genetic Programming . 28 4.4 Calibration . 31 5 Results 34 5.1 Discriminatory Power of GP and Comparison to other Classifier . 35 5.2 Discriminatory Power of GP in Collaboration with the Credit Agency Score . 38 5.2.1 Varying Threshold . 38 5.2.2 Fixed Threshold . 40 6 Conclusion and Discussion 43 7 Limitations and Recommendations for Future Works 45 8 References 47 9 Appendix 53 iii LIST OF FIGURES 1 Relationship between Default Probability and Scores . 4 2 Example distributions of goods and bads . 4 3 rssSystem......................................... 10 4 rss Risk Management Services . 11 5 Risk Check . 11 6 Prescore Behavior in Score Range . 12 7 Prescore Distribution of Goods and Bads . 13 8 Overall Work Flow . 16 9 Discretization Plot . 20 10 Example Syntax Tree . 21 11 GP Process . 22 12 Confusion Matrix . 23 13 Example ROC curve . 24 14 Genetic Operators . 27 15 Median Fitness over all Runs . 28 16 Model with highest Fitness Value . 30 17 Calibration Plots . 32 18 GP Distribution . 35 19 ROC curve and PR diagram . 36 20 Score Distribution of ABIC and GP . 38 21 ROC curve and PR diagram for ABIC and GP . 39 iv LIST OF TABLES 1 Discretization of Continuous Variables . 18 2 Input Variables . 19 3 GP Parameter Settings . 21 4 Occurrences of Variables . 29 5 Effect of IR on GP, SVM and BT . 36 6 Classifier Accuracies . 40 7 GP Order Value Comparison . 41 8 Deviation to GP . 42 v 1. INTRODUCTION E-commerce vendors in Germany have to deal with a peculiarity. Commonly used payment types like credit cards or PayPal represent a relatively low market share and the majority of orders are processed using open invoice instead. Using open invoice, a vendor bills his customers for goods and services only after delivery of the product. Thus, the vendor grants his customer a credit to the extend of the invoice. Usually, the vendor sends his customers an invoice statement as soon as the products are delivered or provided. The invoice contains a detailed statement of the transaction. Because the customer receives his purchase before payment, it is called ’open’ and once the payment is received the invoice is closed. Around 28% of customers in Germany choose open invoice as payment type, a market share that comes with reservations for the vendors (Frigge, 2016). Customers are used to pay using open invoice inanex- tend that it is a payment type selection requirement. Incidentally, around 30% of customers who abort the purchasing process do so because their desired payment type is unavailable and 68% of customers name open invoice the most desired payment type next to the ones already available (Fittkau & Maaß Consulting, 2014; Wach, 2011). However, open invoice is most prone to payment disruptions. Among the most common reasons vendors find that customers simply forget to settle the bill or delay the pay- ment on purpose but around 53% of vendors named insolvency as one of the most common reasons for payment disruption (Weinfurner et al., 2011). The majority of cases that conclude in default onpayment are assigned to orders with open invoice as payment type, with more than 8% of all orders defaulting (Seidenschwarz et al., 2014). E-commerce vendors find themselves in a conflict - offering open invoice decreases the order abortion rate but increases the default on payment rate. The former has a positive effect on revenue while the latter drives it down. Additionally, default on payment has a negative impact on the profit margin, due to costs arising through provision of services and advance payments to third parties. In order to break through this vicious circle vendors can fall back on a plethora ofmethods. Many tackle this conflict by implementing exclusion rules for customer groups they consider especially default-prone, for instance customers who are unknown to the vendor or whose order values are con- spicuously high. Another approach, used by more than 30% of e-Commerce vendors in Germany, is to fall back on external risk management services (Weinfurner et al., 2011). Risk management applications aim at detecting customers with a high risk of defaulting. Those appli- cations are frequently build using credit scoring models. Credit Scoring analyzes historical data inorder to isolate meaningful characteristics that are used to predict the probability of default (Mester, 1997). However, the probability of default is not an attribute of potential customers but merely a vendors’ assessment if the potential customer is a risk worth taking. Over the years, credit scoring has evolved from a vendors’ gut decision over subjective decision rules to a method based on statistically sound models (Thomas et al., 2002). Among the providers of risk management services in Germany is the risk management division of Arvato Financial Solutions (AFS), which provides a number of services including identification of individuals, evaluation of credit-worthiness and fraud recognition. The AFS databases consist of 21 million solvency observations totaling information of 7 million individuals in Germany, address and change in address information, bank account information as well as phone numbers, email addresses and device infor- mation. The risk management service for e-commerce is called risk solution service (rss). Rsscovers the entire order process and provides a number of services for every stage during the order process. The main services for evaluation of customers default probability is called risk check and split intoapre 1 risk check and a main risk check in order to satisfy different demands in an international environment. The main risk check is based on a credit agency score that uses country specific solvency information on individuals. Hence, the main risk check is inoperable in countries without accessible solvency infor- mation. Contrarily, the pre risk check was designed to be always operable and to ensure that therisk check returns an evaluation of the customers’ default probability. For this purpose, the pre riskcheck uses data transmitted by the customer during the order process. However, the pre risk check isbased on a generic model without statistical sound backup.