A Comparative Performance Assessment of Ensemble Learning for Credit Scoring
mathematics Article

A Comparative Performance Assessment of Ensemble Learning for Credit Scoring

Yiheng Li * and Weidong Chen

College of Management and Economics, Tianjin University, Tianjin 300072, China; [email protected]
* Correspondence: [email protected]

Received: 5 September 2020; Accepted: 29 September 2020; Published: 13 October 2020

Abstract: Extensive research has been performed by organizations and academics on models for credit scoring, an important financial management activity. As novel machine learning models continue to be proposed, ensemble learning has been introduced into the application of credit scoring, and several studies have demonstrated its superiority. In this research, we provide a comparative performance evaluation of ensemble algorithms, i.e., random forest, AdaBoost, XGBoost, LightGBM, and Stacking, in terms of accuracy (ACC), area under the curve (AUC), Kolmogorov–Smirnov statistic (KS), Brier score (BS), and model operating time for credit scoring. Moreover, five popular baseline classifiers, i.e., neural network (NN), decision tree (DT), logistic regression (LR), Naïve Bayes (NB), and support vector machine (SVM), are considered as benchmarks. Experimental findings reveal that ensemble learning performs better than individual learners, except for AdaBoost. In addition, random forest has the best performance on all five metrics, with XGBoost and LightGBM as close challengers. Among the five baseline classifiers, logistic regression outperforms the others on most evaluation metrics. Finally, this study also analyzes the reasons for the poor performance of some algorithms and gives suggestions on the choice of credit scoring models for financial institutions.

Keywords: credit scoring; ensemble learning; baseline classifiers; comparative assessment

1. Introduction

Charging interest on loans is the main source of business income for financial institutions, and at the same time, granting loans contributes greatly to socioeconomic activity. However, unreasonable lending can cause a huge loss given default, and hence large losses at many financial institutions. According to the Global Economic Prospects [1] from the World Bank on 4 June 2019, emerging and developing economies urgently need to strike a careful balance between promoting growth through lending and preventing the credit risk caused by excessive loan release. Credit scoring has been one of the primary approaches for financial institutions to evaluate credit risk and make rational decisions [2]. The purpose of a credit scoring model is to classify loan customers into two categories: good customers (those predicted to repay in full within the specified period) and bad customers (those predicted to default). Good customers are more likely to repay their loans on time, which brings benefits to financial institutions; on the contrary, bad customers lead to financial losses. Therefore, banks and financial institutions pay more and more attention to the construction of credit scoring models, because even a 1% improvement in identifying bad credit applicants would translate into remarkable future savings for financial institutions [3,4]. Since the pioneering work of Beaver [5] and Altman [6], credit scoring has been a major research subject for scholars and financial institutions and has received extensive study over the past 40 years. Subsequently, many types of credit scoring models have been proposed and developed using statistical methods, such as linear discriminant analysis (LDA) and logistic regression (LR) [7–10].
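The good/bad classification task described above, together with the evaluation metrics listed in the abstract (ACC, AUC, KS, BS), can be illustrated with a minimal sketch. The applicant features, synthetic data, and logistic regression baseline below are illustrative assumptions, not the study's actual datasets or pipeline:

```python
# Minimal sketch: credit scoring as binary classification (1 = default),
# evaluated with ACC, AUC, the KS statistic, and the Brier score.
# Features and data-generating process are synthetic illustrations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score, brier_score_loss, roc_curve
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
income = rng.normal(50, 15, n)          # toy feature: annual income (thousands)
dti = rng.uniform(0, 1, n)              # toy feature: debt-to-income ratio
# Synthetic ground truth: higher debt ratio and lower income raise default risk.
p_default = 1 / (1 + np.exp(-(3 * dti - 0.05 * income)))
y = rng.binomial(1, p_default)          # 1 = bad customer (default)
X = np.column_stack([income, dti])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
prob = clf.predict_proba(X_te)[:, 1]
pred = (prob >= 0.5).astype(int)

acc = accuracy_score(y_te, pred)
auc = roc_auc_score(y_te, prob)
bs = brier_score_loss(y_te, prob)
# KS: largest gap between the cumulative score distributions of bad and
# good customers, equivalently max(TPR - FPR) over all thresholds.
fpr, tpr, _ = roc_curve(y_te, prob)
ks = np.max(tpr - fpr)
print(f"ACC={acc:.3f} AUC={auc:.3f} KS={ks:.3f} BS={bs:.3f}")
```

A higher KS indicates better separation between the score distributions of good and bad customers, while a lower Brier score indicates better-calibrated probability estimates.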
Mathematics 2020, 8, 1756; doi:10.3390/math8101756; www.mdpi.com/journal/mathematics

However, data is exploding in today's world, and when it comes to huge quantities of data, the performance of classical statistical analysis models is not very good. As a result, some of the assumptions in these models cannot be established, which in turn affects the accuracy of predictions. With the breakthrough of artificial intelligence (AI) technology, methods such as neural networks (NN) [11,12], support vector machines (SVM), random forests (RF) [13], and Naïve Bayes (NB) [14] can achieve the same or better results compared with the statistical models. However, LDA and LR still have a wide range of applications due to their high accuracy and ease of model operation. Machine learning belongs to the category of AI and is one of the most efficient methods in data mining, providing analysts with more productive insights on the use of big data [15].

Machine learning models are generally subdivided into individual machine learning (NN and LR), ensemble learning (bagging, boosting, and random subspace), and integrated ensemble machine learning (RS-boosting and multi-boosting) [16]. Ensemble learning approaches are considered a state-of-the-art solution for many machine learning tasks [17]. After Chen and Guestrin [18] proposed XGBoost in 2016, ensemble learning methods, i.e., XGBoost and LightGBM, have become a winning strategy in a wide range of Kaggle competitions due to their inspiring success. Accordingly, more and more scholars have introduced ensemble learning models to solve credit scoring problems [19–22]. Ensemble learning is a method that achieves excellent performance by building and combining multiple base learners with certain strategies. It can be used to solve classification problems, regression problems, feature selection, outlier detection, etc. Generally, ensemble learning can be separated into homogeneous and heterogeneous approaches, where the former combines classifiers of the same kind, whereas the latter combines classifiers of different kinds [22]. Boosting, represented by AdaBoost, and bagging, represented by random forest, belong to homogeneous integration in ensemble learning, while Stacking belongs to the heterogeneous kind (Figure 1).

Figure 1. Ensemble machine learning framework.

Scholars have also conducted comparative studies on ensemble learning models: bagging, boosting, and Stacking. Wang et al. [4] investigated the performance of ensemble learning (bagging, boosting, and stacking) based on four individual learners (LR, DT, NN, and SVM) on credit scoring problems across three real-world credit datasets and found that bagging performs better than boosting and stacking, with bagged DT producing the best output. Barboza, Kimura, and Altman [23] compared machine learning methods (SVM, bagging, boosting, and RF) with some traditional methods (discriminant analysis, LR, and NN) for predicting bankruptcy and showed that bagging, boosting, and RF outperform the other algorithms. Ma et al. [15] used data cleaning methods of "multiple observation" and "multiple dimensional" to evaluate the performance of XGBoost and LightGBM
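The homogeneous/heterogeneous distinction above can be sketched with scikit-learn, assuming synthetic data rather than the credit datasets used in the studies cited: random forest and AdaBoost combine many copies of one base learner type (decision trees), while Stacking combines different learner types through a meta-learner.

```python
# Hedged sketch of the ensemble taxonomy: homogeneous (bagging/boosting
# over decision trees) vs heterogeneous (stacking of different learners).
# Data is synthetic and only illustrates the APIs, not the papers' setups.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

models = {
    # Homogeneous: many decision trees, combined by bagging or boosting.
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "adaboost": AdaBoostClassifier(n_estimators=100, random_state=42),
    # Heterogeneous: different base learners, combined by a meta-learner.
    "stacking": StackingClassifier(
        estimators=[("dt", DecisionTreeClassifier(random_state=42)),
                    ("lr", LogisticRegression(max_iter=1000))],
        final_estimator=LogisticRegression(),
    ),
}
for name, model in models.items():
    acc = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: {acc:.3f}")
```

The stacked model's `final_estimator` is trained on out-of-fold predictions of the base learners, which is what makes the combination heterogeneous rather than a simple vote.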