On the Rise of the Fintechs—Credit Scoring Using Digital Footprints
WORKING PAPER SERIES

Tobias Berg, Frankfurt School of Finance & Management
Valentin Burg, Humboldt University Berlin
Ana Gombović, Frankfurt School of Finance & Management
Manju Puri, Duke University, Federal Deposit Insurance Corporation, and National Bureau of Economic Research

September 2018
FDIC CFR WP 2018-04
fdic.gov/cfr

NOTE: Staff working papers are preliminary materials circulated to stimulate discussion and critical comment. The analysis, conclusions, and opinions set forth here are those of the author(s) alone and do not necessarily reflect the views of the Federal Deposit Insurance Corporation. References in publications to this paper (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.

On the Rise of FinTechs – Credit Scoring using Digital Footprints

Tobias Berg†, Valentin Burg‡, Ana Gombović+, Manju Puri*

July 2018

Abstract

We analyze the information content of the digital footprint – information that people leave online simply by accessing or registering on a website – for predicting consumer default. Using more than 250,000 observations, we show that even simple, easily accessible variables from the digital footprint equal or exceed the information content of credit bureau scores. Furthermore, the discriminatory power for unscorable customers is very similar to that of scorable customers. Our results have potentially wide implications for financial intermediaries’ business models, for access to credit for the unbanked, and for the behavior of consumers, firms, and regulators in the digital sphere.
We wish to thank Frank Ecker, Falko Fecht, Christine Laudenbach, Laurence van Lent, Kelly Shue (discussant), Sascha Steffen, as well as participants of the 2018 RFS FinTech Conference, the 2018 Swiss Winter Conference on Financial Intermediation, and research seminars at Duke University, FDIC, and Frankfurt School of Finance & Management for valuable comments and suggestions. This work was supported by a grant from FIRM (Frankfurt Institute for Risk Management and Regulation).

† Frankfurt School of Finance & Management. Email: [email protected]. Phone: +49 69 154008 515.
‡ Humboldt University Berlin. Email: [email protected].
+ Frankfurt School of Finance & Management. Email: [email protected]. Phone: +49 69 154008 830.
* Duke University, FDIC, and NBER. Email: [email protected]. Tel: (919) 660-7657.

1. Introduction

The growth of the internet leaves a trace of simple, easily accessible information about almost every individual worldwide – a trace that we label “digital footprint”. Even without writing text about oneself, uploading financial information, or providing friendship or social network data, the simple act of accessing or registering on a webpage leaves valuable information. As a simple example, every website can effortlessly track whether a customer is using an iOS or an Android device, or whether a customer comes to the website via a search engine or a click on a paid ad. In this project, we seek to understand whether the digital footprint helps augment information traditionally considered to be important for default prediction and whether it can be used for the prediction of consumer payment behavior and defaults.

Understanding the role of digital footprints in consumer lending is important.
A key reason for the existence of financial intermediaries is their superior ability to access and process information relevant for screening and monitoring of borrowers.¹ If digital footprints yield significant information on predicting defaults, then FinTechs – with their superior ability to access and process digital footprints – can threaten the information advantage of financial intermediaries and thereby challenge financial intermediaries’ business models.²

In this paper, we analyze the importance of simple, easily accessible digital footprint variables for default prediction using a comprehensive and unique data set covering approximately 250,000 observations from an E-Commerce company located in Germany. Judging the creditworthiness of its customers is important because goods are shipped first and paid later. The use of digital footprints in similar settings is growing around the world.³ Our data set contains a set of ten digital footprint variables: the device type (for example, tablet or mobile), the operating system (for example, iOS or Android), the channel through which a customer comes to the website (for example, search engine or price comparison site), a do-not-track dummy equal to one if a customer uses settings that do not allow tracking device, operating system and channel information, the time of day of the purchase (for example, morning, afternoon, evening, or night), the email service provider (for example, gmail or yahoo), two pieces of information about the email address chosen by the user (includes first and/or last name and includes a number), a lower case dummy if a user consistently uses lower case when writing, and a dummy for a typing error when entering the email address.

In addition to these digital footprint variables, our data set also contains a credit score from a private credit bureau. We are therefore able to assess the discriminatory ability of the digital footprint variables both separately, vis-à-vis the credit bureau score, and jointly with the credit bureau score.

¹ See in particular Diamond (1984), Boot (1999), and Boot and Thakor (2000) for an overview of the role of banks in overcoming information asymmetries and Berger, Miller, Petersen, Rajan, and Stein (2005) for empirical evidence.
² The digital footprint can also be used by financial intermediaries themselves, but to the extent that it proxies for current relationship-specific information it reduces the gap between traditional banks and those firms more prone to technology innovation.
³ In China, Alibaba’s Sesame Credit uses social credit scores from AntFinancial and goods are also shipped first and paid later (see https://www.economist.com/news/finance-and-economics/21710292-chinas-consumer-credit-rating-culture-evolving-fastand-unconventionally-just). Other FinTechs that have publicly announced using digital footprints for lending decisions include ZestFinance and Earnest in the U.S., Kreditech in various emerging markets, and Rapid Finance, CreditEase, and Yongqianbao in China (see https://www.nytimes.com/2015/01/19/technology/banking-start-ups-adopt-new-tools-for-lending.html and https://www.forbes.com/sites/rebeccafeng/2017/07/25/chinese-fintechs-use-big-data-to-give-credit-scores-to-the-unscorable/#45b0e6ed410a).

Our results suggest that even the simple, easily accessible variables from the digital footprint proxy for income, character and reputation and are highly valuable for default prediction. For example, the difference in default rates between customers using iOS (Apple) and Android (for example, Samsung) is equivalent to the difference in default rates between a median credit score and the 80th percentile of the credit score.
Bertrand and Kamenica (2017) document that owning an iOS device is one of the best predictors for being in the top quartile of the income distribution. Our results are therefore consistent with the device type being an easily accessible proxy for otherwise hard to collect income data. Variables that proxy for character and reputation are also significantly related to future payment behavior. For example, customers coming from a price comparison website are almost half as likely to default as customers being directed to the website by search engine ads, consistent with marketing research documenting the importance of personality traits for impulse shopping.⁴ Belenzon, Chatterji, and Daley (2017) and Guzman and Stern (2016) have documented an eponymous-entrepreneurs-effect, implying that whether a firm is named after its founders matters for subsequent performance. Consistent with their results, customers having their names in the email address are 30% less likely to default.

⁴ See for example Rook (1987), Wells, Parboteeah, and Valacich (2011), and Turkyilmaz, Erdem, and Uslu (2015).

We provide a more formal analysis of the discriminatory power of digital footprint variables by constructing receiver operating characteristics and determining the area under the curve (AUC). The AUC is a simple and widely used metric for judging the discriminatory power of credit scores (see for example Stein, 2007; Altman, Sabato, and Wilson, 2010; Iyer, Khwaja, Luttmer, and Shue, 2016; Vallee and Zeng, 2018). The AUC ranges from 50% (purely random prediction) to 100% (perfect prediction) and is closely related to the Gini coefficient (Gini = 2*AUC – 1). The AUC corresponds to the probability of correctly identifying the good case if faced with one random good and one random bad case (Hanley and McNeil, 1982).
Following Iyer, Khwaja, Luttmer, and Shue (2016), an AUC of 60% is generally considered desirable in information-scarce environments, while AUCs of 70% or greater are the goal in information-rich environments. The AUC using the credit bureau score alone is 68.3% in our data set, comparable to the 66.6% AUC using the credit bureau score alone documented in a consumer loan sample of a large German bank (Berg, Puri, and Rocholl, 2017), as well as the 66.5% AUC using the credit bureau score alone in a loan sample of 296 German savings banks (Puri, Rocholl, and Steffen, 2017). As a comparison, Iyer, Khwaja, Luttmer, and Shue (2016) report an AUC of 62.5% in a U.S. peer-to-peer lending data set using a credit bureau score only. Similarly, in our own analysis we find an AUC of 59.8% using U.S. credit scores from Lending Club. This suggests that the score provided to us by a German credit bureau clearly possesses discriminatory power, and we use the credit bureau score related AUC of 68.3% as a benchmark for the digital footprint variables in our analysis.⁵ Interestingly, a model that uses only the digital footprint variables equals or exceeds the information content of the credit bureau score: the AUC of the model using digital footprint variables is 69.6%, higher than the AUC of the model using only the credit bureau score (68.3%).
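
The two properties of the AUC cited above, its reading as the probability of ranking a random good case above a random bad case (Hanley and McNeil, 1982) and the relation Gini = 2*AUC – 1, can be illustrated with a minimal sketch. This is not the authors' code: the function names and the toy scores are hypothetical, and counting ties as half a correct ranking is a common convention assumed here, not something stated in the paper.

```python
from itertools import product


def auc_pairwise(scores_good, scores_bad):
    """AUC as the share of (good, bad) pairs in which the good case
    receives the higher score. Ties count as half (assumed convention)."""
    correct = 0.0
    for g, b in product(scores_good, scores_bad):
        if g > b:
            correct += 1.0
        elif g == b:
            correct += 0.5
    return correct / (len(scores_good) * len(scores_bad))


def gini_from_auc(auc):
    """Gini coefficient implied by an AUC: Gini = 2*AUC - 1."""
    return 2.0 * auc - 1.0


# Hypothetical toy scores (higher = more creditworthy), not data from the paper.
good = [0.9, 0.8, 0.6, 0.4]   # customers who repaid
bad = [0.7, 0.3]              # customers who defaulted

auc = auc_pairwise(good, bad)  # 6 of the 8 pairs are ranked correctly
gini = gini_from_auc(auc)
```

Applied to the figures reported above, the same relation implies a Gini of about 36.6% for the credit bureau score (AUC of 68.3%) and about 39.2% for the digital footprint model (AUC of 69.6%).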