ANALYSIS OF SCORING IN PEER-TO-PEER LENDING: DETERMINANTS OF LOAN DEFAULT

Word count: 12,659

Davy Lust
Student number: 01201013

Supervisor: Prof. dr. Rudi Vander Vennet

Master’s Dissertation submitted to obtain the degree of:

Master of Science in Business Engineering

Academic year: 2016 - 2017

CONFIDENTIALITY AGREEMENT

PERMISSION

I declare that the content of this Master’s Dissertation may be consulted and/or reproduced, provided that the source is referenced.

Name of student:

Davy Lust

Signature

Dutch summary

This thesis focuses on determining the creditworthiness of a borrower in the peer-to-peer lending market. Attention is paid primarily to identifying the main determinants of a loan default, the situation in which the borrower can no longer meet his financial obligations. To do this, we use a data set from the largest American P2P-Lending platform, Lending Club, which contains all data on the loans issued on the platform. Based on this data set, we build a statistical model that relates the status of the loan (default or not) at the end of its term to the various data on the borrower, such as his income, current debts and payment history. In this way, it can be established which variables influence whether or not a loan default occurs, and how these variables are related to this probability.

The thesis begins with a description of the concept of peer-to-peer lending, in which the advantages and disadvantages for both the borrower and the investor are also discussed. Next, the current situation in the European, American and Asian P2P-Lending markets is discussed, and we examine more closely how credit scoring generally works in these financial markets.

The paper continues by describing the data used, and how these data were processed so that they could be included in the statistical model. We then look more closely at the statistical model applied, namely the logit model, the characteristics of this model, and how these affect our analysis.

Finally, the results of the research are presented. These results are compared with the current literature on credit scoring, as well as with similar studies, in order to draw meaningful conclusions. In addition, economic explanations are sought for each of the findings.

Foreword

This master’s dissertation serves as the conclusion of five years of intensive academic and personal development, and is the final stepping stone towards a promising future as a graduate in Business Engineering.

I would like to take this opportunity to first of all thank my parents for their continuous support, both mentally and financially, during this important period in my life. Secondly, I want to thank prof. dr. Rudi Vander Vennet, for granting me the opportunity to work on this fascinating and challenging topic, as well as Thomas Present, for his excellent guidance during the development of this thesis. Finally, I want to express my heartfelt gratitude towards my girlfriend, for her everlasting motivation and continuous belief in me.

Table of contents

Dutch summary
Foreword
Table of contents
List of used abbreviations
List of Figures and Tables
1 Introduction
2 Theoretical Background
2.1 What is Peer-To-Peer-Lending?
2.2 Advantages of P2P-Lending
2.2.1 Advantages for the lender
2.2.2 Advantages for the borrower
2.3 Disadvantages of P2P-Lending
2.4 Market overview
2.4.1 American market - USA
2.4.2 Asian market
2.4.3 European market
2.5 Credit Scoring
2.5.1 Credit Scoring in general
2.5.2 Credit Scoring in P2P-Lending
3 Data Description
3.1 Data set and variables
3.1.1 Dependent variable
3.1.2 Predictor variables
3.2 Descriptive statistics and correlation matrix
4 Econometrical Methodology
4.1 Model selection
4.2 Model characteristics
4.2.1 Goodness of Fit
4.2.2 Model significance
4.2.3 Significance of variables
4.2.4 Coefficient interpretation
5 Specification Adjustments
5.1 Employment length
5.2 Open Accounts & Total Accounts
5.3 Public records & Months since last record
6 Empirical Results
6.1 Non-significant variables
6.2 Significant variables
7 Conclusion
8 Further Research
References
Appendices

List of used abbreviations

Abbreviation Meaning

P2P-Lending Peer-To-Peer Lending

EU European Union

USA United States of America

UK United Kingdom

SME Small and Medium-sized Enterprises

FICO Fair Isaac Corporation

DTI Debt-To-Income

LC Lending Club

LPM Linear Probability Model

MLE Maximum Likelihood Estimation

LR Likelihood Ratio

OLS Ordinary Least Squares

List of Figures and Tables

Figure 1: FICO-score Components
Figure 2: VantageScore 3.0 Influences
Table 1: Model Variables and Description
Table 2: Descriptive statistics of numerical variables
Table 3: Correlation matrix of numerical variables
Table 4: Regression results initial model - coefficients and odds ratios
Figure 3: Regression coefficients employment length, including linear trendline
Table 5: Regression results Final Model
Table 6: Regression coefficients for different specifications

1 Introduction

In today’s ever-changing, global society where individualism and self-interest are frowned upon, and the prosperity of the community and the globe is becoming a core value in the policy of the future, we can observe the emergence of all kinds of social initiatives. This is also the case in the financial market, where actors often happily exchange the lack of connectedness or the institutional and authoritarian structures of mainstream financial institutions for more social, transparent and relational alternatives (Hulme & Wright, 2006). The emergence of social lending is a clear example of this current trend.

The main part of this paper aims at analysing the scoring of loans in the peer-to-peer lending market, based on data provided by Lending Club. This data is used to develop a model relating the probability of default of borrowers to personal information provided during the loan application, in order to define the main determinants of loan default in the P2P-Lending market.

In the first part of this paper, we briefly introduce the concept of P2P-Lending, its characteristics, its advantages and disadvantages compared to traditional investment or borrowing opportunities, and its influence on the financial market. This allows us to determine the need for adequate credit scoring in social lending. We further describe the emergence of P2P-Lending in the financial market, followed by an overview of the American, Asian and European P2P-Lending markets.

The paper continues with a description and interpretation of the data used in our analysis, and how this data will be incorporated into our model. We further describe the econometrical methodology, as well as its characteristics and its implications for the use and interpretation of our model.

Subsequently, the empirical results of this research are described and compared with the findings in current literature and similar studies on credit scoring in P2P-Lending, in order to draw meaningful conclusions. Finally, these conclusions, as well as the rest of this paper, are summarized.

2 Theoretical Background

2.1 What is Peer-To-Peer-Lending?

Peer-To-Peer-Lending (also known as person-to-person lending, social lending or P2P-Lending) is a type of consumer lending where one individual lends money to another individual, without the intervention of a financial institution acting as an intermediary (Investopedia, n.d.). Consumer lending generally consists of loans such as debt consolidation and refinancing, medical loans, auto loans and loans for home improvements or major purchases (Mateeschu, 2015). More recent trends show that the P2P-Lending market has broadened in terms of loan types, covering not only consumer loans, but other types of loans such as small business loans, student loans and real estate loans as well. The P2P-Lending market generally consists of online marketplaces or platforms (Mateeschu, 2015), acting as facilitators for both parties in the transaction (Bajpai, 2015). However, it needs to be noted that technically speaking, the act where one individual lends money to another individual without the use of an online marketplace or platform can be described as P2P-Lending as well.

In P2P-Lending, both parties often don’t know each other and have no direct relationship (Renton, 2012). The main reason these individuals engage in the financial transaction with each other is that their preferences regarding the characteristics of the loan match. The role of the lending platform in this situation is limited to the following tasks: (1) authenticating the participants, (2) managing the money movement and loan repayment, and (3) providing the users of the platform with detailed reports (Emekter, Tu, Jirasakuldech, & Lu, 2015). Next to this, the platform can offer certain services in case of a default.

Loans in the P2P-Lending market are unsecured, which means that there is no collateral to support the loan in case of a default, and consequently, the security of the loan only depends on the creditworthiness of the borrower (Investopedia, n.d.). This implies that the risk for the investor is often far greater than when he deposits his capital in a bank savings account, because, in most cases, these accounts are protected by a deposit guarantee scheme in case of default of the financial institution (Directive 2014/49/EU).

2.2 Advantages of P2P-Lending

P2P-Lending exists because it “provides an alternative and more efficient lending model compared to mainstream financial institutions” acting as an intermediary (Mateeschu, 2015). In what follows, the resulting advantages are described for both the lender and the borrower.

2.2.1 Advantages for the lender

The lender (or investor) as a first party in the P2P-Lending market has some clear advantages compared to the traditional investment options provided by mainstream financial institutions. Firstly, by disintermediation, or cutting out the middle man (in this case the financial institution), investors can obtain a higher interest rate as a return on their investment (Renton, 2012; Mateeschu, 2015). This is due to several reasons. The first reason is that P2P-Lending takes place online. Therefore, there are no operating costs with respect to physical locations, as opposed to traditional financial institutions, which mostly operate according to a brick-and-mortar business model. The second reason is that online P2P-Lending platforms often handle the loan application process faster and more efficiently, because operating online avoids slow paperwork and bureaucratic delays.

A second advantage is that P2P-Lending platforms work in a transparent way (Mateeschu, 2015). Most of the platforms provide their users with all sorts of historical and statistical data, allowing them to conduct their own analysis of the investment opportunities. This gives investors more authority over their investments, and enables them to gain a better understanding of what they invest in and what actually happens with their money.

Thirdly, P2P-Lending provides alternative opportunities for investors to diversify their investment portfolio and thus reduce the overall risk of their investments (Renton, 2012; Rind, 2016).

Fourthly, the investment process on P2P-Lending platforms is generally much easier, quicker, and more approachable for individual investors compared to that of mainstream financial institutions (Rind, 2016). It is easy to create an online investment account, and the minimum initial investment is often very low.

Finally, because online P2P-Lending companies use more credit variables than mainstream financial institutions when assessing the credit risk of a borrower, this credit risk is claimed to be presented more accurately in P2P-Lending (Mateeschu, 2015). This benefits investors because it enables them to base their investment decisions on more truthful information.

2.2.2 Advantages for the borrower

Next to the advantages for the lender, the borrower also has some clear incentives to enter the P2P-Lending market. First of all, the biggest advantage for the borrower is the lower cost of credit compared to the cost associated with the borrowing options at mainstream financial institutions or credit card companies (Renton, 2012; Rind, 2016; Mateeschu, 2015). This is mainly due to the same reasons the investors can obtain a higher rate of return on their investment, namely lower operating costs and a more efficient processing procedure.

A second big advantage for the borrowers is that obtaining a loan is less difficult in the P2P-Lending market than at financial institutions (Renton, 2012; Rind, 2016; Mateeschu, 2015). This has several reasons. Firstly, financial institutions are relatively strict in the loans they grant. Due to the more stringent regulations resulting from the financial crisis, banks are even more restricted in how much risk they can bear, and this has impacted their loan granting behaviour over the last couple of years (Finger, 2013). Secondly, financial institutions often require collateral when granting a loan. Many borrowers are not able to provide the necessary collateral to get their loan request approved. In the P2P-Lending market, loans are unsecured, which means they are not backed by collateral. This makes it easier for some borrowers to get approval for their loan request (Renton, 2012; Rind, 2016).

A third and final advantage is the fact that applying for a loan in the P2P-Lending market does not affect the credit score of the inquirer. This is because a credit application in the P2P-Lending market counts as a so-called “soft inquiry”, which means the application does not negatively impact the borrower’s credit score (Woodruff, 2014).

2.3 Disadvantages of P2P-Lending

P2P-Lending doesn’t only have advantages. There are also some disadvantages compared to the lending or investment options provided by traditional financial institutions. Firstly, for borrowers with a low credit score, interest rates are often very high (25%-35%), resulting in a high cost of borrowing (Rind, 2016). This makes it harder to keep fulfilling repayment obligations, which may damage the credit score even more in case of missed payments or loan defaults. Secondly, unlike capital held in a bank savings account, an investment in P2P-Lending is definitive, and can’t be withdrawn before the loan expires. Thirdly, the loans in a P2P-Lending market are unsecured and are not covered by deposit insurance, in contrast to deposits made with most financial institutions (Wright, 2015). Therefore, if the borrower is unable to fulfil his payment obligations or defaults on the loan, the investor loses his investment and the outstanding interest payments.

Fourthly, information asymmetry, or the situation where the parties engaging in an economic transaction do not possess equal material knowledge on each other or the transaction details (Investopedia, n.d.), is heavily present in the P2P-Lending market (Lin, Prabhala, & Viswanathan, 2013). Although some information on the borrower’s reasons for applying for a loan in the P2P-Lending market is presented to the investors, in most cases this information is incomplete. This may result in adverse selection, or the situation where one of the parties unknowingly engages in an undesired transaction due to this information asymmetry (Nickolas, 2015). Due to the lack of information on some aspects, combined with possibly wrong or deceiving information (for example the real reason why the borrower needs money), investors can be misled and invest in a loan request they would normally not invest in if they were in possession of truthful information (Berger & Gleisner, 2009). Next to this, the information asymmetry could lead to moral hazard, or the situation where the borrower changes his behaviour or intentions after the deal has been made, adding risk that was previously not present or not known to the other party. Therefore, investors might invest in loan requests that can harm their investment portfolio in terms of diversification or desired level of risk.

These disadvantages, and especially the information asymmetry and its consequences, make it clear that adequate risk evaluation is a crucial but challenging element in the P2P-Lending market. Individual investors often lack the knowledge necessary to appropriately evaluate the risk of investing in loans offered on P2P-Lending platforms. This paper therefore tries to discover signals of possible loan default by identifying its main determinants based on historical data provided by Lending Club.

2.4 Market overview

The following section first describes the emergence of P2P-Lending in the financial sector, followed by an overview of the current situation in the American, European and Asian P2P-Lending markets.

The first online P2P-Lending platform, Zopa, was founded in 2004 and launched in 2005 in the UK. The founders based their company strategy on one simple problem: borrowers were being charged high borrowing rates and investors were receiving low returns on their investments (Zopa, 2016). In their view, this problem could easily be solved by matching borrowers and investors directly through an online platform, and so Zopa was founded. Since then, over 100 platforms have risen and fallen in the UK alone (Gurney, 2017), and many more all over the world adopted the same business idea and entered the peer-to-peer lending market.

2.4.1 American market - USA

The American peer-to-peer lending market is currently dominated by three players, Lending Club, SoFi and Prosper, with Lending Club, founded by Renaud Laplanche in 2007, being the market leader. Lending Club reported at the end of 2016 that the company has funded over 24.5 billion dollars in loans since its launch in 2007, with close to 2 billion dollars in the last quarter of 2016 alone (LendingClub Corporation, 2017). Prosper, on the other hand, reports having funded over 9 billion dollars in loans (Prosper Marketplace, Inc, 2017), while SoFi claims to have funded loans worth over 18 billion dollars (Social Finance, Inc, 2017). Next to these three big players, other P2P-Lending platforms are active in the American market, including Peerform, founded in 2010 by Wall Street executives, Upstart, founded in 2012 by ex-Googlers, and Funding Circle, a company founded in the UK in 2010 with an exclusive focus on SMEs.

2.4.2 Asian market

The P2P-Lending market in Asia is still in its infancy, but a number of start-ups have emerged that are active in different regions of the continent (Fintechnews Singapore, 2016). According to Fintech News, a news outlet focusing on Digital Finance, the following companies are among the top players in the Asian P2P-Lending market. Crowdo, a Malaysian company founded in 2013, offers various solutions. Funding Societies, an Indonesian company founded in 2015 and active in Indonesia and Singapore, connects smaller businesses with both institutional and individual investors. MoolahSense, a Singaporean P2P-Lending platform founded in 2013, brings investors and local SMEs together on its online platform. WeLab Holdings, a company founded in Hong Kong in 2013, is the owner of WeLend.hk, an online lending platform in Hong Kong, and Wolaidai, one of the largest mobile lending platforms in China. Another big player in China is CreditEase, a P2P-Lending and microfinance platform founded in 2006, aimed at democratizing credit in China. Next to this, the company is the owner of the online lending platform Yirendai. In the Japanese P2P-Lending market, Maneo is the largest P2P-Lending platform, allowing SMEs to receive funding from investors. Crowdcredit, another Japanese company launched in 2014, offers the ability to lend money to SMEs and individuals in countries all over the world, including Estonia, Spain, Italy, Finland, Cameroon, and Peru.

2.4.3 European market

According to Fintech News, more than 84% of the European P2P-Lending activity is concentrated in the UK (Fintechnews Switzerland, 2016). Evelyn Bidenko, a finance coach and mentor with more than 12 years of experience working in the financial industry in London, states that this market is dominated by three players: Zopa, RateSetter and Funding Circle. Zopa, as stated above, was the first online P2P-Lending platform ever launched. Since its launch in 2005, it has lent more than 2.25 billion British pounds (equivalent to approximately 2.9 billion dollars or 2.65 billion euros) to consumers in the UK. RateSetter, founded in 2010, claims to be the biggest P2P-Lending platform in the UK, and has recorded over 1.8 billion British pounds in loans (approximately 2.3 billion dollars or 2.1 billion euros). The company states that thanks to its Provision Fund and 100% track record, investors haven’t lost a single penny. Funding Circle, founded in 2010, focuses on small businesses instead of individuals, and states that it has lent to more than 23 700 businesses, providing close to 2.25 billion British pounds to date.

In other countries in Europe, the P2P-Lending market is far less developed. According to Frédéric Dujeux, co-founder of the Belgian fintech company Mozenno founded in December 2015, this is due to the European Prospectus Law, which implies that individuals are prohibited from raising funds publicly (Dujeux, 2017). This law makes it very difficult for start-ups to set up a P2P-Lending platform. Nevertheless, some companies have managed to set up a platform and stay within the laws of their country. In Germany, Auxmoney has been active in the P2P-Lending market since 2006, and has a user base of over 2.1 million users. Younited Credit, formerly known as Prêt d’Union, is a French fintech company founded in 2009 that operates the biggest P2P-Lending platform in France. To date, it has funded close to 60 000 loans for a total amount of over 433 million euros, and the company plans to expand to other countries as well.

2.5 Credit Scoring

2.5.1 Credit Scoring in general

Credit scoring is the act of statistically determining and assigning a score or grade to an individual that represents the creditworthiness of that individual (Investopedia, n.d.). Consequently, the score reflects the probability that the individual fulfils his financial obligations and, by definition, does not default on his payments.

Credit scoring is a widely used technique in almost every financial institution. However, there is no standardized way of calculating a credit score. Nevertheless, there are a few well-developed techniques that have gained popularity and are seen as standards in the credit scoring industry.

Probably the most famous scoring technique is the one developed by the Fair Isaac Corporation, known as the FICO-score. According to the company, the score is used by 90% of the lenders. The FICO scoring technique was invented in 1989, and adopted in 1991 by the three biggest U.S. credit reporting agencies: Equifax, TransUnion and Experian (Fair Isaac Corporation, 2017). However, each credit reporting agency uses a different version of the FICO-score, accommodating for the structural differences in the databases of the agencies (Fair Isaac Corporation, 2017). Due to this difference, it is rather difficult to compare the scores reported by the agencies. The FICO-score ranges from 300 to 850, and although the exact calculation of the score is a well-kept company secret, there is some information on the type of factors that influence the score, as illustrated by Figure 1. The payment history of the borrower plays the most important role in calculating the score, with an estimated weight of approximately 35%. The amount of debt contributes approximately 30% to the score calculation, and the length of the credit history determines on average 15% of the score. The final two components, new credit and the credit mix, each have a weight of approximately 10% in the calculation of the score.

Figure 1: FICO-score Components

Source: Website FICO

In reaction to the dominant market position of the FICO-score, as well as the inability to compare their scores with one another, the three previously mentioned U.S. credit reporting agencies have developed their own credit rating score, the VantageScore, launched in 2006. The latest version of the score, VantageScore 3.0, released in 2013, uses the same scale as the FICO-score, ranging from 300 to 850. The factors that influence the score are similar to those of the FICO-score (VantageScore Solutions, LLC, 2017). From Figure 2 we can learn that payment history has the biggest impact on your score, followed by the age and type of your credit, and the percentage of your total credit limit you use. Your balance to debt ratio moderately influences your VantageScore credit score, and the factors ‘available credit’ and ‘recent credit behaviour and inquiries’ are the least influential when it comes to determining your credit score according to the VantageScore credit scoring model.

2.5.2 Credit Scoring in P2P-Lending

Credit scoring in the P2P-Lending market is very similar to how mainstream financial institutions conduct their credit scoring. Lending Club uses the self-reported FICO-score of the borrower to conduct an initial screening and provide an estimate of the borrowing interest rate. When the borrower decides to apply for a loan, Lending Club gathers all the information it deems relevant to truthfully assess the creditworthiness of the borrower. In most cases, critical information such as the reported annual income is verified by Lending Club before the loan is approved or declined. An approved loan will be assigned a loan grade ranging from A to G, each of which is subdivided into 5 subgrades, ranging from 1 to 5. Each subgrade corresponds to an interest rate, where current macroeconomic factors such as the current risk-free rate are taken into account as well.
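To make the grading structure concrete, the following minimal sketch enumerates the 35 subgrades that result from 7 grades with 5 subgrades each. The label format (such as "C3") and the ordinal rank are illustrative assumptions, as is the convention that A1 is the best grade; the interest rate attached to each subgrade is not reproduced here.

```python
# Grades A to G, each subdivided into subgrades 1 to 5, as described in the text.
GRADES = "ABCDEFG"
SUBGRADE_LEVELS = range(1, 6)

# Labels such as "A1" or "C3" are an assumed naming format for illustration.
subgrades = [f"{grade}{level}" for grade in GRADES for level in SUBGRADE_LEVELS]

# Hypothetical ordinal encoding: 0 for A1 (assumed best) up to 34 for G5.
subgrade_rank = {name: rank for rank, name in enumerate(subgrades)}

print(len(subgrades))       # number of subgrades
print(subgrade_rank["C3"])  # position of C3 in the ordering
```

An ordinal encoding of this kind is only one possible way to treat the subgrade as a numerical variable.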

Other P2P-Lending platforms such as Zopa and Prosper conduct their credit scoring process in a similar way, basing their scoring on the information provided by credit rating agencies such as Equifax, in combination with their own analysis based on provided and self-gathered information on the borrower.

Figure 2: VantageScore 3.0 Influences

Source: Website VantageScore

3 Data Description

3.1 Data set and variables

The goal of this paper is to develop a model that relates the probability of default of a borrower to certain borrower characteristics, based on the information provided during the loan application. This enables us to identify the main determinants of loan default in the P2P-Lending market. We define a defaulted loan as a loan on which the payments are more than 120 days late.

To estimate our model, we use a data set provided by Lending Club, which can be found and downloaded on their website (footnote 1). The data set contains all the information gathered by Lending Club during the loan application process, as well as during the term of the loan. To develop our model, we only use the information provided and gathered during the loan application process. The dependent variable, however, will be the loan status at the end of maturity.

Lending Club offers loans with a maturity of 36 months and 60 months. For consistency purposes, we will focus on loans with a maturity of 36 months, and only include loans for which the maturity has ended. This gives us a sample of 175,037 observations (after corrections, see following sections), consisting of loans initiated between June 2007 and December 2013.
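The sample selection described above can be sketched as a simple filter. This is an illustration under stated assumptions, not the procedure actually used: the field names (`term_months`, `issue_date`, `loan_status`) and the four example records are hypothetical, and "maturity has ended" is approximated by requiring one of the two final loan statuses.

```python
from datetime import date

# Hypothetical loan records; field names are illustrative, not Lending Club's.
loans = [
    {"id": 1, "term_months": 36, "issue_date": date(2012, 5, 1), "loan_status": "Fully paid"},
    {"id": 2, "term_months": 60, "issue_date": date(2012, 5, 1), "loan_status": "Charged off"},
    {"id": 3, "term_months": 36, "issue_date": date(2013, 11, 1), "loan_status": "Charged off"},
    {"id": 4, "term_months": 36, "issue_date": date(2015, 3, 1), "loan_status": "Current"},
]

# Assumption: a loan whose maturity has ended carries one of these final statuses.
FINAL_STATUSES = {"Fully paid", "Charged off"}
FIRST_ISSUE = date(2007, 6, 1)    # June 2007
LAST_ISSUE = date(2013, 12, 31)   # December 2013


def in_sample(loan):
    """Keep 36-month loans issued in the sample window that have reached a final status."""
    return (
        loan["term_months"] == 36
        and FIRST_ISSUE <= loan["issue_date"] <= LAST_ISSUE
        and loan["loan_status"] in FINAL_STATUSES
    )


sample = [loan for loan in loans if in_sample(loan)]
print([loan["id"] for loan in sample])
```

In this toy example, loan 2 is dropped because of its 60-month maturity and loan 4 because it was issued after the window and has not reached a final status.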

The data set contains 115 variables for each observation, of which the full list can be found in Appendix 1. However, a large share of these variables can’t be used in our model, for several reasons. A first reason is that several variables were introduced while the lending platform was already operational, so that the early loans have no information on these variables. A second reason is that some variables gather non-standardized, user-generated information. This is for example the case for the variables ‘job title’ and ‘loan description’. As a result, these variables can’t be included in a statistical model. A third reason is that some variables are based on information gathered during the term of the loan. Our model tries to relate loan default to borrower characteristics based on the information gathered during the loan application process, and consequently, such variables can’t be included in our model. A fourth and final reason that limits us in the use of the available variables is the fact that some variables in the data set have been developed by Lending Club itself, based on the information the loan applicant has provided. A few examples of these variables are the loan grade and subgrade, the interest rate applicable to the loan, and the monthly installment.

All of the limitations described above result in a new data set, consisting of 16 predictor variables and one dependent variable, as described in Table 1.

1 https://www.lendingclub.com/info/download-data.action

Variable | Description
Dependent variable |
Loan Status | Current status of the loan
Predictor variables |
Loan Amount | The listed amount of the loan applied for by the borrower. If at some point in time, the credit department reduces the loan amount, then it will be reflected in this value.
Employment Length | Employment length in years. Possible values are between 0 and 10 where 0 means less than one year and 10 means ten or more years.
Home Ownership | The home ownership status provided by the borrower during registration or obtained from the credit report. Our values are: RENT, OWN, MORTGAGE, OTHER
Annual Income | The self-reported annual income provided by the borrower during registration.
Debt-to-Income Ratio | A ratio calculated using the borrower’s total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower’s self-reported monthly income.
Delinquencies 2 years | The number of 30+ days past-due incidences of delinquency in the borrower's credit file for the past 2 years
Earliest Credit Line | The month the borrower's earliest reported credit line was opened
Inquiries last 6 months | The number of inquiries in past 6 months (excluding auto and mortgage inquiries)
Months since last delinquency | The number of months since the borrower's last delinquency.
Months since last record | The number of months since the last public record.
Open Accounts | The number of open credit lines in the borrower's credit file.
Public Records | Number of derogatory public records
Revolving Balance | Total credit revolving balance
Revolving Utilization | Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit.
Total Accounts | The total number of credit lines currently in the borrower's credit file
Initial Listing Status | The initial listing status of the loan. Possible values are W (whole), F (fractional)

Table 1: Model Variables and Description

Source: Data Dictionary from Lending Club Statistics webpage


3.1.1 Dependent variable

The dependent variable in our model is the status of the loan at the end of maturity. This status is either “Fully paid”, meaning that all financial obligations have been fulfilled, or “Charged off”, meaning that no further payments are expected and the borrower has defaulted. We model this variable as a dummy variable, ‘dummy loan status’, where a value of 0 indicates the loan status “Fully paid” and a value of 1 indicates the loan status “Charged off”.

3.1.2 Predictor variables

3.1.2.1 Loan Amount

The variable ‘loan amount’ represents the amount (in US dollars) the borrower applied for in his loan application, as approved by the credit department of Lending Club. This is a numerical variable, and it is integrated into the model in this form.

3.1.2.2 Employment Length

The variable ‘employment length’ tells us how many years the borrower has been employed in his current job. The variable ranges from 0 to 10, where 0 means less than one year and 10 means ten or more years. If there is no value for this variable, or the value is ‘n/a’, the borrower is unemployed.

To make the interpretation of this variable more meaningful, and to allow testing for different types of relation (linear, exponential, …) between ‘employment length’ and the dependent variable, we have recoded this variable into 11 dummy variables. These dummy variables are ‘dummy_<1y’, ‘dummy_1y’, ‘dummy_2y’, … , ‘dummy_9y’ and ‘dummy_10+y’. The first dummy variable takes a value of 1 if the borrower has been employed for less than 1 year, and a value of 0 otherwise. The second to tenth dummy variables take a value of 1 for an employment of 1 to 9 years, respectively, and a value of 0 otherwise. The final dummy variable, ‘dummy_10+y’, takes a value of 1 if the borrower has been employed for 10 or more years, and a value of 0 otherwise. If all dummy variables have a value of 0, the borrower is unemployed.
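The recoding described above can be sketched in a few lines of Python. This is only an illustration, not the Stata code used for the thesis: the helper name `employment_dummies` and the assumption that the raw value arrives as an integer from 0 to 10 (or `None` for a missing or ‘n/a’ value) are ours.

```python
# Dummy names follow the text; the input coding (int 0..10 or None) is an assumption.
DUMMY_NAMES = ["dummy_<1y"] + [f"dummy_{k}y" for k in range(1, 10)] + ["dummy_10+y"]

def employment_dummies(emp_length):
    """Return the 11 employment-length dummies for one borrower.

    emp_length: 0 (less than one year) up to 10 (ten or more years),
    or None when the value is missing or 'n/a' (borrower is unemployed).
    """
    dummies = {name: 0 for name in DUMMY_NAMES}
    if emp_length is not None:
        if not 0 <= emp_length <= 10:
            raise ValueError("employment length must lie between 0 and 10")
        dummies[DUMMY_NAMES[emp_length]] = 1
    return dummies

print(employment_dummies(3))     # only 'dummy_3y' is set to 1
print(employment_dummies(None))  # all zeros: the unemployed reference group
```

Because the unemployed borrowers are the group with all dummies equal to 0, every estimated dummy coefficient is measured relative to unemployment.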

3.1.2.3 Home Ownership

The variable ‘home ownership’ is a qualitative, categorical variable that takes 5 different values in the data set: ‘OWN’, ‘MORTGAGE’, ‘RENT’, ‘NONE’, and ‘OTHER’. The meaning of the first three values is self-evident, but the values ‘NONE’ and ‘OTHER’ are not clearly defined. Of the 175,251 observations, 39 have the value ‘NONE’ and 175 have the value ‘OTHER’. To keep the interpretation unambiguous, we have decided to omit these observations from the data set.

We again create dummy variables to transform this qualitative, categorical variable into a form that can be used in our model. Two new variables are introduced, ‘dummy home mortgage’ and ‘dummy home rent’, taking a value of 1 if the borrower has a mortgage on his home or rents his home, respectively, and a value of 0 otherwise. If both dummies take a value of 0, the borrower is the owner of his home.
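As an illustration of this step, the following Python sketch drops the ‘NONE’ and ‘OTHER’ observations and adds the two dummies. The field name `home_ownership` and the function name are hypothetical; the thesis itself performs this step in Stata. Note that 175,251 − 39 − 175 = 175,037, which is the number of observations reported in Table 2.

```python
def home_ownership_dummies(records):
    """Drop 'NONE'/'OTHER' rows and add the two home-ownership dummies.

    records: list of dicts with a 'home_ownership' key (hypothetical name).
    'OWN' is the reference category, for which both dummies equal 0.
    """
    kept = []
    for row in records:
        status = row["home_ownership"]
        if status not in ("OWN", "MORTGAGE", "RENT"):
            continue  # omit the ambiguous 'NONE' and 'OTHER' observations
        kept.append({
            **row,
            "dummy_home_mortgage": int(status == "MORTGAGE"),
            "dummy_home_rent": int(status == "RENT"),
        })
    return kept

# Made-up sample with one row per category.
sample = [{"home_ownership": s} for s in ("OWN", "MORTGAGE", "RENT", "NONE", "OTHER")]
cleaned = home_ownership_dummies(sample)
print(len(cleaned))  # three rows remain
```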

3.1.2.4 Annual Income

The variable ‘annual income’ is a numerical variable representing the annual income (in US dollars) of the borrower at the time of initiating the loan. No transformation is required to use this variable in our model.

3.1.2.5 Debt-to-Income Ratio

The variable ‘debt-to-income ratio’ represents, in the words of Lending Club (2017), “a ratio calculated using the borrower’s total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower’s self-reported monthly income.” This is a numerical variable, reported to two decimal places, and can therefore be integrated into our model without transformation.

3.1.2.6 Delinquency 2 years

The variable ‘delinquency 2 years’ is a numerical variable that represents the number of delinquencies reported in the credit file of the borrower for the past 2 years. We define a delinquency as a payment that is more than 30 days past due. This numerical variable can be integrated into our model in this form.

3.1.2.7 Earliest Credit Line

The variable ‘earliest credit line’ is a numerical variable in the form of a date that represents the month and year in which the borrower opened his first credit line. Because Stata, the statistical software package used to estimate our model, can correctly interpret and use a date variable, no transformation is needed to integrate this variable into our model.

3.1.2.8 Inquiries last 6 months

The variable ‘inquiries last 6 months’ represents the number of hard inquiries on the credit report of the borrower during the last 6 months. A hard inquiry occurs when a financial institution checks the credit report in order to make a lending decision following a loan application by the borrower (Irby, 2016). This numerical variable can be integrated into our model without a transformation.

3.1.2.9 Months since last delinquency

The variable ‘months since last delinquency’ represents the number of months since the borrower's most recent delinquency, as reported in his credit history file. A value of 0 means there is no recorded delinquency in the credit file of the borrower. To capture the effect of having no recorded delinquencies, we introduce an additional dummy variable, labelled ‘dummy delinquencies’, which has a value of 1 if there are recorded delinquencies in the credit file of the borrower, and a value of 0 otherwise.

3.1.2.10 Months since last record

The variable ‘months since last record’ reports the number of months since the last time a public record was registered in the credit history file of the borrower. A credit report can usually contain three types of public records, namely (1) bankruptcy filings, (2) tax liens, and (3) civil judgements (Irby, 2016). As with the previous variable, a value of 0 means there are no public records in the credit report of the borrower. We again create an additional dummy variable, ‘dummy public records’, taking a value of 1 if there are public records in the credit report, and a value of 0 otherwise.

3.1.2.11 Open Accounts

The numerical variable ‘open accounts’ represents the number of currently open credit lines in the credit file of the borrower. This variable can be integrated into the model without a transformation.

3.1.2.12 Public Records

The variable ‘public records’ is a numerical variable that represents the total number of derogatory public records in the credit file of the borrower. This variable needs no transformation to be integrated into our model.

3.1.2.13 Revolving Balance

The variable ‘revolving balance’ is a numerical variable that represents the total credit revolving balance (in US dollars) over the lifetime of the borrower, as recorded by his credit history. Revolving balance, or revolving credit, is the amount of credit that goes unpaid at the end of a billing cycle. This numerical variable can be integrated into our model without a transformation.

3.1.2.14 Revolving Utilization

The numerical variable ‘revolving utilization’ represents the utilization rate of the total available credit of the borrower. In other words, this variable is the ratio of the average monthly credit use to the total available monthly credit, expressed as a percentage. This variable can be integrated into our model in this form.

3.1.2.15 Total Accounts

The variable ‘total accounts’ is a numerical variable that represents the total number of credit lines that are now available to the borrower, or have been available to the borrower in the past, as currently stated in the credit file. This numerical variable requires no transformation to be integrated into our model.


3.1.2.16 Initial listing status

Finally, the qualitative, categorical variable ‘initial listing status’ represents the listing status of the loan at the time the loan was approved and listed. The variable can take two values, ‘f’ and ‘w’, where ‘f’ represents the listing status ‘fractional’ and ‘w’ the listing status ‘whole’. A fractional loan can be funded by multiple investors on the platform, whereas a loan with the listing status ‘whole’ must be funded in full by a single investor. To use the information of this variable in our model, we introduce a dummy variable, ‘listing status’, which has a value of 1 if the initial listing status of the loan was ‘fractional’, and a value of 0 if this status was ‘whole’.

3.2 Descriptive statistics and correlation matrix

Table 2 reports, for each numerical variable described above, the mean, standard deviation, minimum and maximum value.

| Variable | N | Mean | Std Dev | Min | Max |
|---|---|---|---|---|---|
| Loan Amount | 175,037 | 11,862 | 7,202 | 500 | 35,000 |
| Employment Length | 175,037 | 5.490 | 3.644 | 0 | 10 |
| Annual Income | 175,037 | 69,423 | 55,528 | 1,896 | 7,141,778 |
| Debt-to-Income Ratio | 175,037 | 16.06 | 7.604 | 0 | 34.99 |
| Delinquencies last 2 years | 175,037 | 0.220 | 0.675 | 0 | 29 |
| Inquiries last 6 months | 175,037 | 0.836 | 1.147 | 0 | 33 |
| Months since last delinquency | 175,037 | 14.66 | 22.29 | 0 | 152 |
| Months since last record | 175,037 | 7.650 | 25.72 | 0 | 129 |
| Open Accounts | 175,037 | 10.53 | 4.601 | 1 | 62 |
| Public Records | 175,037 | 0.101 | 0.397 | 0 | 54 |
| Revolving Balance | 175,037 | 15,012 | 20,060 | 0 | 2,568,995 |
| Revolving Utilization | 175,037 | 0.558 | 0.245 | 0 | 1.404 |
| Total Accounts | 175,037 | 23.45 | 11.15 | 1 | 105 |

Table 2: Descriptive statistics of numerical variables

Source: Stata output

Table 3 presents the correlation matrix for the numerical variables. Highly correlated predictor variables can cause estimation problems, in particular imprecise coefficient estimates. This is addressed in Section 5.


Column numbers refer to the variables in the same order as the rows.

| Correlation | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10) | (11) | (12) | (13) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (1) Loan Amount | 1.00000 | | | | | | | | | | | | |
| (2) Employment Length | 0.12249 | 1.00000 | | | | | | | | | | | |
| (3) Annual Income | 0.34618 | 0.10778 | 1.00000 | | | | | | | | | | |
| (4) Debt-to-Income Ratio | 0.03834 | 0.04496 | -0.17127 | 1.00000 | | | | | | | | | |
| (5) Delinquencies last 2 years | 0.00755 | 0.03669 | 0.05873 | 0.00025 | 1.00000 | | | | | | | | |
| (6) Inquiries last 6 months | -0.02070 | -0.01940 | 0.06121 | -0.00493 | 0.02157 | 1.00000 | | | | | | | |
| (7) Months since last delinquency | -0.01638 | 0.04342 | 0.02793 | 0.00405 | -0.02960 | 0.02815 | 1.00000 | | | | | | |
| (8) Months since last record | -0.06976 | 0.03833 | -0.04209 | -0.02760 | -0.02526 | 0.00572 | 0.01467 | 1.00000 | | | | | |
| (9) Open Accounts | 0.20299 | 0.07316 | 0.16242 | 0.31487 | 0.06241 | 0.10212 | 0.04453 | -0.03359 | 1.00000 | | | | |
| (10) Public Records | -0.05644 | 0.02696 | -0.01973 | -0.03260 | -0.01913 | 0.01261 | 0.03682 | 0.73076 | -0.02249 | 1.00000 | | | |
| (11) Revolving Balance | 0.30121 | 0.09884 | 0.32538 | 0.14306 | -0.02174 | 0.00958 | -0.04793 | -0.08122 | 0.22379 | -0.06918 | 1.00000 | | |
| (12) Revolving Utilization | 0.07954 | 0.05497 | 0.01822 | 0.24112 | -0.01233 | -0.08887 | 0.02256 | -0.01099 | -0.09715 | -0.02255 | 0.18809 | 1.00000 | |
| (13) Total Accounts | 0.23344 | 0.14251 | 0.23957 | 0.23805 | 0.13346 | 0.12422 | 0.13280 | -0.03366 | 0.67566 | -0.00232 | 0.22139 | -0.07367 | 1.00000 |

Table 3: Correlation matrix of numerical variables

Source: Stata output
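A correlation matrix like Table 3 can be screened for problematic pairs automatically. The sketch below is a minimal Python illustration under our own assumptions: the helper names, the toy data and the threshold of 0.6 are not from the thesis, which obtained the matrix from Stata and inspected it by eye.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def high_correlations(columns, threshold=0.6):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold.

    The threshold is an assumed cut-off, chosen only for illustration.
    """
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs

# Made-up illustrative values, not taken from the Lending Club data.
toy = {
    "open_accounts": [5, 8, 12, 7, 15, 10],
    "total_accounts": [11, 20, 25, 14, 33, 19],
    "inquiries": [1, 0, 2, 0, 1, 3],
}
flagged = high_correlations(toy)
print(flagged)  # only the open/total accounts pair is flagged
```

Applied to Table 3, a cut-off of this size would single out the two pairs with correlations of 0.67566 and 0.73076.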


4 Econometrical Methodology

4.1 Model selection

To use our available data and estimate a model relating the probability of default of the loan to the borrower characteristics, we need to define the model specification and functional form that best fit this goal and our data. According to Bolton (2009), the first step in this process is to analyse the dependent variable. In this case, the dependent variable is the loan status at the end of maturity. This variable can take two values, ‘Fully Paid’ or ‘Charged Off’, and is therefore by definition a dichotomous or binary dependent variable (Wooldridge, 2002). According to Wooldridge (2002), the simplest model to estimate and use in this situation is the linear probability model (LPM), which is a multiple linear regression model where the dependent variable is a binary variable. The model specification is defined by equation 4.1.

$$P(y = 1 \mid x) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k \tag{4.1}$$

In this model, the regression coefficient $\beta_j$ measures the change in the probability of the occurrence of the event depicted by the dependent variable, in our case a loan default, for a one-unit change in the predictor variable $x_j$, ceteris paribus. The results of this regression can be found in Appendix 2.

Although this model seems to fit the requirements of our case, it has some limitations that have to be taken into account. First, the fitted probabilities, i.e. the probabilities obtained by entering the observed variable values into the estimated equation, can be greater than 1 or less than 0. Second, the partial effect of each predictor variable is constant (Wooldridge, 2002). Finally, the error terms in the regression are usually non-normal and heteroscedastic, making it difficult to perform reliable hypothesis tests based on the t-statistics the regression generates (Verbeek, 2012). These three disadvantages of the model motivate us to explore other options.

Another binary choice model similar to the LPM is the logit model, based on the idea of applying a transformation $G$ to the linear relation defined by the LPM. This gives us the general form depicted by equation 4.2.

$$P(y = 1 \mid x) = G(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k) \tag{4.2}$$

In a logit model, this transformation $G$ is the logistic transformation, as defined by equation 4.3. This generates a function that takes values strictly between 0 and 1 for all real numbers $z$ (Wooldridge, 2002).

$$G(z) = \frac{e^{z}}{1 + e^{z}} \tag{4.3}$$


If we now define $\pi(x) = P(y = 1 \mid x)$ and $z = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$, then our model becomes:

$$\pi(x) = \frac{e^{\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k}}{1 + e^{\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k}} \tag{4.4}$$

Rearranging this to make the right-hand side linear gives us equation 4.5, which is the logit regression model we will use.

$$\ln\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k \tag{4.5}$$
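The relation between equations 4.3 and 4.5 can be checked numerically: the log-odds transformation is the inverse of the logistic function. The following Python sketch is an illustration only; the function names are our own, and the numerically stable branch for negative values is an implementation detail that is not part of the thesis.

```python
import math

def logistic(z):
    """G(z) = e^z / (1 + e^z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def logit(p):
    """Log-odds ln(p / (1 - p)), the inverse of the logistic function."""
    return math.log(p / (1.0 - p))

for z in (-5.0, 0.0, 2.5):
    p = logistic(z)
    # p always lies between 0 and 1; logit(p) recovers z up to rounding.
    print(z, round(p, 4), round(logit(p), 4))
```

Whatever value the linear index takes, the resulting probability stays between 0 and 1, which is exactly the property the linear probability model lacks.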

To fit this model, we make use of Maximum Likelihood Estimation (MLE), a method that estimates the regression coefficients by choosing the parameter values under which the observed sample is most likely to have occurred (Wooldridge, 2002). This method defines a likelihood function that needs to be maximized iteratively in order to obtain the estimated parameters. In practice, we usually work with the log-likelihood function, as it is more convenient to use (Verbeek, 2012). This log-likelihood function is defined by equation 4.6, where $F(x_i'\beta) = P(y_i = 1 \mid x_i; \beta)$.

$$\log L(\beta) = \sum_{i=1}^{N} y_i \log\left(F(x_i'\beta)\right) + \sum_{i=1}^{N} (1 - y_i) \log\left(1 - F(x_i'\beta)\right) \tag{4.6}$$

Maximizing this function gives us the estimated parameters of the model.
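To make equation 4.6 concrete, the sketch below evaluates the log-likelihood of a logit model for given coefficients. It is a minimal Python illustration with made-up toy data; the function names are our own, and no optimizer is included, whereas Stata maximizes this function iteratively.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(beta, X, y):
    """Equation 4.6 with F the logistic function.

    beta: coefficients, beta[0] is the intercept.
    X: list of rows of predictor values; y: list of 0/1 outcomes.
    """
    total = 0.0
    for row, yi in zip(X, y):
        z = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
        p = logistic(z)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

# Made-up toy data with one predictor, not Lending Club observations.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
ll_flat = log_likelihood([0.0, 0.0], X, y)     # every p equals 0.5
ll_better = log_likelihood([-3.0, 2.0], X, y)  # p rises with the predictor
print(ll_flat, ll_better)
```

The second parameter vector yields a log-likelihood closer to zero, so it describes the toy sample better than the flat model; maximum likelihood searches for the vector where this value is highest.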

This model and the corresponding estimation method adequately fit the requirements of our case. Although other binary choice models, such as the probit model, would qualify as well, we decide to use the logit model, because it is commonly accepted as the standard model in credit scoring and default prediction.

4.2 Model characteristics We now use a statistical software package, namely Stata, to estimate this model based on the dataset we composed. Stata estimates the logit model by executing the MLE method based on the log-likelihood function, and reports the estimated parameters, as well as some information with respect to statistical tests. The results can be found in Table 4. The Stata command can be found in Appendix 3.1.

Before interpreting the results, it’s important to test the characteristics of the model and its variables.


| Variable | Coeff | Std Error Coeff | z | p-value | Odds Ratio | Std Error OR |
|---|---|---|---|---|---|---|
| Loan Amount | 0.0000 | 0.0000 | 9.0224 | 0.000 | 1.0000 | 0.0000 |
| Employment Length < 1 year | -0.4254 | 0.0403 | -10.5576 | 0.000 | 0.6535 | 0.0263 |
| Employment Length 1 year | -0.4722 | 0.0422 | -11.1850 | 0.000 | 0.6236 | 0.0263 |
| Employment Length 2 years | -0.4478 | 0.0397 | -11.2851 | 0.000 | 0.6390 | 0.0254 |
| Employment Length 3 years | -0.4258 | 0.0407 | -10.4689 | 0.000 | 0.6533 | 0.0266 |
| Employment Length 4 years | -0.4471 | 0.0429 | -10.4181 | 0.000 | 0.6395 | 0.0274 |
| Employment Length 5 years | -0.4390 | 0.0412 | -10.6594 | 0.000 | 0.6447 | 0.0266 |
| Employment Length 6 years | -0.3596 | 0.0426 | -8.4440 | 0.000 | 0.6980 | 0.0297 |
| Employment Length 7 years | -0.3590 | 0.0438 | -8.2029 | 0.000 | 0.6984 | 0.0306 |
| Employment Length 8 years | -0.3817 | 0.0465 | -8.2153 | 0.000 | 0.6827 | 0.0317 |
| Employment Length 9 years | -0.4058 | 0.0501 | -8.0948 | 0.000 | 0.6665 | 0.0334 |
| Employment Length 10+ years | -0.3955 | 0.0342 | -11.5521 | 0.000 | 0.6734 | 0.0231 |
| Dummy Home Mortgage | -0.1521 | 0.0274 | -5.5406 | 0.000 | 0.8589 | 0.0236 |
| Dummy Home Rent | 0.1052 | 0.0268 | 3.9247 | 0.000 | 1.1109 | 0.0298 |
| Annual Income | 0.0000 | 0.0000 | -21.0859 | 0.000 | 1.0000 | 0.0000 |
| Debt-to-Income Ratio | 0.0128 | 0.0011 | 11.3438 | 0.000 | 1.0129 | 0.0011 |
| Delinquencies 2 years | 0.0557 | 0.0135 | 4.1443 | 0.000 | 1.0573 | 0.0142 |
| Earliest Credit Line | 0.0000 | 0.0000 | 6.9047 | 0.000 | 1.0000 | 0.0000 |
| Inquiries last 6 months | 0.2111 | 0.0059 | 36.0733 | 0.000 | 1.2350 | 0.0072 |
| Months since last delinquency | -0.0005 | 0.0006 | -0.7065 | 0.480 | 0.9995 | 0.0006 |
| Dummy Delinquencies | 0.0999 | 0.0316 | 3.1591 | 0.002 | 1.1051 | 0.0350 |
| Months since last record | 0.0003 | 0.0010 | 0.2806 | 0.779 | 1.0003 | 0.0010 |
| Dummy Public Records | 0.0844 | 0.1043 | 0.8085 | 0.419 | 1.0880 | 0.1135 |
| Open Accounts | 0.0248 | 0.0023 | 10.8474 | 0.000 | 1.0251 | 0.0023 |
| Public Records | 0.0049 | 0.0326 | 0.1493 | 0.881 | 1.0049 | 0.0328 |
| Revolving Balance | 0.0000 | 0.0000 | -0.5469 | 0.584 | 1.0000 | 0.0000 |
| Revolving Utilization | 0.8345 | 0.0339 | 24.6008 | 0.000 | 2.3037 | 0.0781 |
| Total Accounts | -0.0115 | 0.0010 | -11.0940 | 0.000 | 0.9886 | 0.0010 |
| Listing Status | 0.0768 | 0.0197 | 3.9060 | 0.000 | 1.0798 | 0.0212 |
| Constant | -2.5773 | 0.0682 | -37.7908 | 0.000 | 0.0760 | 0.0052 |

Table 4: Regression results initial model - coefficients and odds ratios

Source: Stata output

4.2.1 Goodness of Fit

To assess the goodness of fit of our model, or how well the model fits the observed data (Verbeek, 2012), we analyse the pseudo R-squared statistic of the model, which is a statistic ranging from 0 to 1. There are several ways to calculate the pseudo R-squared of a logit model, but there is no agreement on which one is to be preferred.


The pseudo R-squared of our model, reported by Stata, is 0.0301. Although a single pseudo R-squared statistic of a logit model is hard to interpret on its own, a value this close to zero indicates that our model fits the data poorly. A possible explanation is that the model is incomplete, and that other variables are required to predict the probability of default more accurately. Unfortunately, we are restricted by our data set, and no other variables are available.
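The pseudo R-squared that Stata reports for a logit model is McFadden's measure, which compares the log-likelihood of the fitted model with that of a constant-only model. The Python sketch below shows the calculation; the two log-likelihood values are hypothetical and chosen only to give a result of the same order of magnitude as ours, since the actual log-likelihoods are not reported in this section.

```python
def mcfadden_r2(ll_full, ll_null):
    """McFadden's pseudo R-squared: 1 - lnL(full) / lnL(constant only)."""
    return 1.0 - ll_full / ll_null

# Hypothetical log-likelihoods, for illustration only.
print(round(mcfadden_r2(-970.0, -1000.0), 4))
```

A model that improves the log-likelihood by only 3% relative to the constant-only model has a pseudo R-squared of about 0.03.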

However, in logit models, the goodness of fit of the model is relatively unimportant compared to the statistical and economic significance of the model and its predictor variables (Wooldridge, 2002). We therefore leave these findings out of account in the remainder of this analysis, and focus on the estimated regression coefficients and their interpretation.

4.2.2 Model significance

Assessing the significance of a logit model essentially comes down to comparing the full model to the model where the only predictor variable is a constant, and determining whether the log-likelihood of the full model is statistically significantly greater than the log-likelihood of the restricted model. Following the likelihood ratio test described by Wooldridge (2002), a likelihood ratio (LR) test statistic is calculated as in equation 4.7, where $\mathcal{L}$ denotes the log-likelihood of the respective model.

$$LR = 2\left(\mathcal{L}_{full} - \mathcal{L}_{restricted}\right) \tag{4.7}$$

This test statistic has a chi-square distribution of which the number of degrees of freedom is equal to the difference between the number of predictor variables in the full model and the number of predictor variables in the restricted model.

Calculating the LR test statistic of our model gives us a value of 3990.604. The critical chi-square value at a significance level of 1% and 29 degrees of freedom is approximately 49.59. The test statistic exceeds this value, and we can therefore conclude that the model is statistically significant at the 1% significance level.
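The comparison above can be reproduced without statistical tables. The Python sketch below computes the LR statistic of equation 4.7 and approximates the chi-square critical value with the Wilson-Hilferty formula; this approximation is our own choice for the illustration and is not the method used by Stata, which works with the exact distribution.

```python
from statistics import NormalDist

def lr_statistic(ll_full, ll_restricted):
    """Equation 4.7: LR = 2 * (lnL_full - lnL_restricted)."""
    return 2.0 * (ll_full - ll_restricted)

def chi2_critical_approx(df, alpha):
    """Approximate upper-alpha chi-square critical value (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3

critical = chi2_critical_approx(df=29, alpha=0.01)
print(round(critical, 2))   # close to the tabulated value of 49.59
print(3990.604 > critical)  # True: the model is jointly significant
```

With a test statistic of almost 4,000, the small error of the approximation has no influence on the conclusion.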

4.2.3 Significance of variables

Assessing the significance of the variables in our model can be done by testing whether the regression coefficient corresponding to each predictor variable is statistically significantly different from 0. The easiest way to do this is by looking at the p-values of the coefficients. A p-value represents the smallest significance level at which the null hypothesis that the coefficient equals 0 can be rejected (Wooldridge, 2002). In other words, the smaller the p-value, the stronger the evidence that the coefficient differs from 0.

Based on the model output in Table 4, we can observe that most of the coefficients are statistically significantly different from 0, with p-values close to or equal to zero. There are 5 coefficients, however, whose p-values indicate that they are not statistically significantly different from 0. These are the coefficients of the variables ‘months since last delinquency’, ‘months since last record’, ‘dummy public records’, ‘public records’ and ‘revolving balance’. The economic implications of these findings will be discussed in a later section, where the results of this research are analysed.

4.2.4 Coefficient interpretation

The interpretation of the regression coefficients of a logistic regression is rather different from that of an OLS regression. As can be derived from the model depicted by equation 4.5, a regression coefficient represents the increase in the logarithmic odds of the occurrence of the event coded in the dependent variable, in our case a loan default, for a one-unit increase in the predictor variable, all other variables remaining constant (Verbeek, 2012). This relation is rather difficult to interpret, and we therefore generate odds ratios for each predictor variable. To do so, we simply raise the mathematical constant $e$ to the power of the coefficient corresponding to each variable, as illustrated by equation 4.8. This can be done in Stata by using the ‘logistic’ command, as illustrated in Appendix 3.2. The results have been added to Table 4.

$$OR_i = e^{\beta_i} \tag{4.8}$$

In our model, the odds ratio corresponding to a certain predictor variable is the ratio of the odds of default after a one-unit increase in the value of the predictor variable to the odds of default before that increase. In other words, it is the multiplier that defines the change in the odds of a loan default for a one-unit increase in the value of the predictor variable. An odds ratio ranges from zero to positive infinity. A value lower than 1 represents a decrease in the odds of default, and therefore corresponds to a negative relation between the dependent variable and the predictor variable. An odds ratio of exactly 1 implies no relation between the dependent variable and the predictor variable. Note that an odds ratio of 1 corresponds to a regression coefficient of 0, so a coefficient that is not statistically different from 0 has an odds ratio that is not statistically different from 1. An odds ratio greater than 1 corresponds to a positive relation between the dependent variable and the predictor variable.
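As a worked example of this conversion, the following Python sketch exponentiates three coefficients from Table 4 and expresses the result as a percentage change in the odds of default. The helper names are our own; in the thesis the odds ratios come directly from Stata's ‘logistic’ command.

```python
import math

def odds_ratio(coefficient):
    """Odds ratio implied by a logit coefficient."""
    return math.exp(coefficient)

def percent_change_in_odds(coefficient):
    """Percentage change in the odds for a one-unit increase in the predictor."""
    return (math.exp(coefficient) - 1.0) * 100.0

# Coefficients taken from Table 4.
for name, beta in [("Inquiries last 6 months", 0.2111),
                   ("Dummy Home Mortgage", -0.1521),
                   ("Revolving Utilization", 0.8345)]:
    print(name, round(odds_ratio(beta), 4), round(percent_change_in_odds(beta), 2))
```

Up to rounding of the reported coefficients, the results match the odds ratios in Table 4: each additional inquiry raises the odds of default by roughly 23.5%, while having a mortgage instead of owning the home outright lowers them by roughly 14%.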

The odds ratios corresponding to each predictor variable, and their implications in our model, will be discussed in a following section.


5 Specification Adjustments

Before we can correctly interpret the results of our model, some adjustments need to be made to our initial specification. These adjustments, the reasons behind them, and their implications for our model are discussed in this section.

5.1 Employment length

As previously mentioned, the variable ‘employment length’ has been recoded into 11 dummy variables, primarily to test for different types of relation between this variable and the probability of default. The regression coefficients of these dummy variables, including a linear trendline, are displayed in Figure 3.

At first sight, there seems to be no clear positive or negative relation between the employment length of the borrower and his probability of default. The trendline does not give a definitive answer either, showing only a marginally positive relation (see footnote 2). This initial finding corresponds with the findings of Serrano-Cinca, Gutiérrez-Nieto & López-Palacios (2015), who found no significant relation either. We can therefore conclude that employment length does not have a significant impact on the probability of default of the borrower.

[Figure 3 chart: "Regression coefficients Employment Length". Horizontal axis: Employment Length (<1y, 1y, 2y, …, 9y, 10+y); vertical axis: Coefficient (0 to -0.5).]

Figure 3: Regression coefficients employment length, including linear trendline

Source: Stata output, own calculations

2 This positive relation is counter-intuitive, because it indicates that the probability of default is higher when the borrower is employed for a longer time.


However, due to the fact that every regression coefficient is statistically significantly different from zero, the employment status of the borrower does seem to have an impact. The data points towards the possibility that a borrower with a job has a significantly lower probability of default than an unemployed borrower. We can test this by replacing the 11 dummy variables in our model with a single new dummy variable, ‘employment’, representing whether or not the borrower is employed. A value of 1 indicates employment, a value of 0 represents unemployment.

Comparing the new model with the initial model by means of the LR test shows whether there is a statistical difference between these two models. If there is no statistical difference, the replacement causes no loss of information or predictive power, and the replacement of variables is therefore valid.

The LR test statistic, calculated according to equation 4.7, equals 17.82. The critical chi-square value with a significance level of 1% and 10 degrees of freedom amounts to approximately 23.21. The test statistic doesn’t exceed the critical value, which means the null hypothesis of no statistical difference between the models can’t be rejected. Our replacement of variables is therefore valid, and the adjusted specification can be used. The results of this regression can be found in Table 6, in the column of Model 1.
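The same conclusion can be expressed as a p-value. For a chi-square distribution with an even number of degrees of freedom, the upper-tail probability has a simple closed form, which the Python sketch below uses. The function name is our own, and this shortcut only applies because the test here has 10 degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square with even df.

    For df = 2m the survival function equals
    exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

p_value = chi2_sf_even_df(17.82, df=10)
print(round(p_value, 3))  # about 0.058, above the 1% level
```

The p-value of roughly 0.058 confirms that the null hypothesis is not rejected at the 1% level. It also shows that the result holds at the 5% level but would not hold at the 10% level, which is worth keeping in mind when judging how safely the 11 dummies can be collapsed.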

5.2 Open Accounts & Total Accounts

As can be seen in Table 3, we found a relatively high correlation (0.67566) between the variables ‘open accounts’ and ‘total accounts’. Such a high correlation could make the estimates of the corresponding regression coefficients imprecise and unstable. We therefore execute the regression twice, each time including only one of these two variables. The results can be found in Table 6, labelled as Model 2 and Model 3.

Based on these results, we can conclude the following. Both variables remain statistically significant, and their relation with the dependent variable remains the same as in the initial model. Only the actual value of the regression coefficients slightly differs from those of the initial specification, as can be expected. We therefore decide to keep both variables in the model.

5.3 Public records & Months since last record

Table 3 shows us that the variables ‘public records’ and ‘months since last record’ are highly correlated as well, with a correlation of 0.73076. We therefore again execute two regressions, each containing one of the highly correlated variables. The results of these regressions can be found in Table 6, in the columns of Model 4 and Model 5.

These results show us that the variable ‘months since last record’ and its corresponding dummy variable remain statistically not significant, whereas the variable ‘public records’ becomes significant when integrated separately into our model. We therefore decide to only keep the significant variable ‘public records’ in our model.

The regression coefficients of each of the models used in this section are summarised in Table 6. Model 1 is the full model, where the dummy variables for employment length have been replaced with a single dummy variable representing the employment status of the borrower. This model serves as the basis for the following adaptations. In Model 2, the variable ‘total accounts’ has been left out, and in Model 3, the same has been done for the variable ‘open accounts’. In Model 4, the variables ‘months since last record’ and ‘dummy public records’ have been omitted, and in Model 5, this is the case for the variable ‘public records’.

In conclusion, the final model that will serve as the base for our analysis is the model as presented in Table 5, where the dummy variable ‘employment’ has been introduced as a replacement for the dummies of the variable ‘employment length’. Next to this, the variables ‘months since last record’ and ‘dummy public records’ have been omitted due to their high correlation with ‘public records’ and their statistical insignificance, and the variables ‘months since last delinquency’ and ‘revolving balance’ have been omitted due to their statistical insignificance.

| Variable | Coeff | Std Dev | z | p-value | Odds Ratio | Std Dev OR |
|---|---|---|---|---|---|---|
| Loan Amount | 0.0000110 | 1.21e-6 | 9.0713 | 0.000 | 1.0000110 | 1.21e-6 |
| Employment | -0.4142 | 0.0323 | -12.8412 | 0.000 | 0.6609 | 0.0213 |
| Dummy Home Mortgage | -0.1475 | 0.0274 | -5.3791 | 0.000 | 0.8629 | 0.0237 |
| Dummy Home Rent | 0.1011 | 0.0267 | 3.7811 | 0.000 | 1.1064 | 0.0296 |
| Annual Income | -0.0000062 | 2.8e-7 | -22.1264 | 0.000 | 0.9999938 | 2.8e-7 |
| Debt-to-Income Ratio | 0.0128 | 0.0011 | 11.4887 | 0.000 | 1.0129 | 0.0011 |
| Delinquencies 2 years | 0.0602 | 0.0111 | 5.4421 | 0.000 | 1.0620 | 0.0117 |
| Earliest Credit Line | 0.0000216 | 3.22e-6 | 6.7201 | 0.000 | 1.0000216 | 3.22e-6 |
| Inquiries last 6 months | 0.2104 | 0.0058 | 36.0289 | 0.000 | 1.2342 | 0.0072 |
| Dummy Delinquencies | 0.0827 | 0.0164 | 5.0349 | 0.000 | 1.0862 | 0.0178 |
| Open accounts | 0.0243 | 0.0023 | 10.7526 | 0.000 | 1.0246 | 0.0023 |
| Public records | 0.0622 | 0.0170 | 3.6682 | 0.000 | 1.0642 | 0.0180 |
| Revolving utilization | 0.8330 | 0.0332 | 25.0895 | 0.000 | 2.3002 | 0.0764 |
| Total Accounts | -0.0114 | 0.0010 | -11.0754 | 0.000 | 0.9887 | 0.0010 |
| Listing Status | 0.0737 | 0.0196 | 3.7541 | 0.000 | 1.0765 | 0.0211 |
| Constant | -2.5489 | 0.0675 | -37.7650 | 0.000 | 0.0782 | 0.0053 |

Table 5: Regression results Final Model

Source: Stata output


| Variable | (1) Model 1 | (2) Model 2 | (3) Model 3 | (4) Model 4 | (5) Model 5 |
|---|---|---|---|---|---|
| Loan Amount | 1.12e-05*** (1.22e-06) | 1.07e-05*** (1.22e-06) | 1.15e-05*** (1.22e-06) | 1.11e-05*** (1.22e-06) | 1.12e-05*** (1.22e-06) |
| Employment | -0.411*** (0.0323) | -0.411*** (0.0323) | -0.398*** (0.0322) | -0.414*** (0.0323) | -0.411*** (0.0323) |
| Dummy Home Mortgage | -0.149*** (0.0274) | -0.168*** (0.0274) | -0.151*** (0.0274) | -0.147*** (0.0274) | -0.149*** (0.0274) |
| Dummy Home Rent | 0.100*** (0.0268) | 0.105*** (0.0267) | 0.104*** (0.0267) | 0.101*** (0.0268) | 0.100*** (0.0268) |
| Annual Income | -6.13e-06*** (2.91e-07) | -6.64e-06*** (2.90e-07) | -5.99e-06*** (2.89e-07) | -6.14e-06*** (2.91e-07) | -6.13e-06*** (2.91e-07) |
| Debt-to-Income Ratio | 0.0130*** (0.00113) | 0.0113*** (0.00112) | 0.0156*** (0.00110) | 0.0129*** (0.00113) | 0.0130*** (0.00113) |
| Delinquencies 2 years | 0.0553*** (0.0134) | 0.0509*** (0.0134) | 0.0561*** (0.0134) | 0.0551*** (0.0134) | 0.0553*** (0.0134) |
| Earliest Credit Line | 2.18e-05*** (3.27e-06) | 2.98e-05*** (3.22e-06) | 2.54e-05*** (3.26e-06) | 2.13e-05*** (3.26e-06) | 2.18e-05*** (3.27e-06) |
| Inquiries last 6 months | 0.210*** (0.00584) | 0.206*** (0.00582) | 0.211*** (0.00583) | 0.210*** (0.00584) | 0.210*** (0.00584) |
| Months since last delinquency | -0.000431 (0.000640) | -0.000379 (0.000641) | -0.000336 (0.000640) | -0.000406 (0.000640) | -0.000431 (0.000640) |
| Dummy Delinquencies | 0.100*** (0.0316) | 0.0702** (0.0315) | 0.0876*** (0.0316) | 0.0983*** (0.0316) | 0.100*** (0.0316) |
| Months since last record | 0.000356 (0.000975) | 0.00142 (0.000967) | 0.000881 (0.000972) | | 0.000317 (0.000945) |
| Dummy public records | 0.0816 (0.104) | -0.0245 (0.101) | 0.0289 (0.102) | | 0.0909 (0.0867) |
| Open accounts | 0.0245*** (0.00228) | 0.00955*** (0.00184) | | 0.0245*** (0.00228) | 0.0245*** (0.00228) |
| Public records | 0.00527 (0.0324) | 0.0142 (0.0292) | 0.0101 (0.0308) | 0.0617*** (0.0170) | |
| Revolving balance | -3.15e-07 (5.87e-07) | -2.77e-07 (5.92e-07) | 2.98e-07 (5.50e-07) | -4.16e-07 (5.91e-07) | -3.16e-07 (5.87e-07) |
| Revolving utilization | 0.838*** (0.0339) | 0.854*** (0.0339) | 0.773*** (0.0331) | 0.838*** (0.0339) | 0.838*** (0.0339) |
| Total Accounts | -0.0114*** (0.00103) | | -0.00494*** (0.000827) | -0.0114*** (0.00103) | -0.0114*** (0.00103) |
| Listing Status | 0.0753*** (0.0196) | 0.0769*** (0.0196) | 0.0719*** (0.0196) | 0.0739*** (0.0196) | 0.0753*** (0.0196) |
| Constant | -2.568*** (0.0678) | -2.708*** (0.0670) | -2.545*** (0.0679) | -2.548*** (0.0675) | -2.568*** (0.0678) |
| Observations | 175,037 | 175,037 | 175,037 | 175,037 | 175,037 |

Standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1. Empty cells indicate that the variable is omitted from that model.

Table 6: Regression coefficients for different specifications

Source: Stata output


6 Empirical Results

This section analyses the results of the regression by interpreting the regression coefficients and odds ratios of the variables incorporated into our model. These results can be found in Table 5. We compare these findings with those of similar studies and the current literature on credit scoring in P2P-Lending, and draw conclusions accordingly.

As previously described, the model defines a relation between the probability of default of a loan issued by Lending Club on the one hand, and a set of predictor variables gathered by Lending Club during the loan application process on the other hand. Therefore, when we talk about the probability of default, we are referring to the probability of the borrower defaulting on his loan at Lending Club.
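For reference, a model of this kind can be written out as follows, assuming the logit specification implied by the odds ratios reported in this section. The notation ($p_i$, $x_{ik}$, $\beta_k$) is ours and is introduced purely for illustration; it is a sketch of the standard textbook form, not a transcription of the estimation code.

```latex
\begin{aligned}
p_i &= \Pr(\text{default}_i = 1 \mid x_i)
     = \frac{1}{1 + e^{-\left(\beta_0 + \sum_k \beta_k x_{ik}\right)}} \\[4pt]
\frac{p_i}{1 - p_i} &= e^{\beta_0 + \sum_k \beta_k x_{ik}} \\[4pt]
\mathrm{OR}_k &= \frac{\text{odds}(x_{ik} + 1)}{\text{odds}(x_{ik})} = e^{\beta_k}
\end{aligned}
```

The last line is why each odds ratio discussed in this section equals $e$ raised to the power of the corresponding regression coefficient, holding all other variables constant.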

6.1 Non-significant variables

We first take a look at the variables whose regression coefficients we previously found not to be statistically significantly different from zero. As mentioned above, these variables are ‘months since last delinquency’, ‘months since last record’, ‘dummy months since last record’ and ‘revolving balance’. Note that the variable ‘public records’ has become statistically significant after the removal of the highly correlated variable ‘months since last record’ from our model.
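The significance statements in this section rest on the ratio of each coefficient to its standard error. As a rough illustration only (this is not the Stata routine used for the estimation), the sketch below computes a Wald z-statistic and a two-sided p-value under a standard normal approximation for a few coefficient and standard-error pairs as printed in Table 6. The function name `wald_p_value` is our own.

```python
import math


def wald_p_value(coef: float, std_err: float) -> tuple[float, float]:
    """Return the Wald z-statistic and its two-sided p-value (normal approximation)."""
    z = coef / std_err
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Coefficient / standard-error pairs copied from Table 6 (first specification).
examples = {
    "months since last delinquency": (-0.000431, 0.000640),
    "revolving balance": (-3.15e-07, 5.87e-07),
    "inquiries last 6 months": (0.210, 0.00584),
}

for name, (coef, se) in examples.items():
    z, p = wald_p_value(coef, se)
    print(f"{name}: z = {z:.2f}, p = {p:.3f}")
```

The first two variables yield p-values well above 0.10, whereas the third yields a p-value far below 0.01, which matches the presence or absence of stars in the table.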

Delinquencies

According to our model, the coefficient of the variable ‘months since last delinquency’ is not statistically significantly different from zero. This implies that how long ago a borrower had his last delinquency does not affect his probability of defaulting on his loan at Lending Club. The corresponding dummy variable, ‘dummy months since last delinquency’, which we created to capture the difference between ever having had a delinquency and never having had one, tells a different story. Its coefficient is statistically significantly different from zero, which implies that whether or not the borrower ever had a delinquency does affect his probability of default. The odds ratio of this dummy variable amounts to 1.0862. This points towards a positive relation between the borrower having a delinquency recorded on his credit file and the probability of default, and indicates that the odds of default are approximately 8.62% higher if the borrower ever had a delinquency than if he never had one.

The variable ‘delinquencies 2 years’, representing the number of delinquencies in the past two years, has a significant coefficient as well. According to the odds ratio corresponding to this variable, which is equal to 1.0620, each additional delinquency in the past two years increases the odds of the borrower defaulting on his loan by approximately 6.20%.


These findings are in line with what has been found in previous studies. As can be expected, borrowers who have had delinquencies in the past are more likely to miss payments or default on their loan in the future (Nefer, 2010). The Fair Isaac Corporation, developer of the FICO score, states that historical payment behaviour determines 35% of a borrower’s credit score (Fair Isaac Corporation, 2017). Next to this, Serrano-Cinca, Gutiérrez-Nieto and López-Palacios (2015) also found a positive relation between the number of delinquencies and the probability of default, and no statistical relation between the number of months since the borrower’s last delinquency and his probability of default.

Public records

The next set of variables we discuss relates to public records in the credit file of the borrower: ‘public records’, ‘months since last record’ and ‘dummy records’. As previously mentioned, the variables ‘months since last record’ and ‘dummy records’ have regression coefficients that are not statistically significantly different from zero. This implies that the number of months since a public record was last entered in the credit file of the borrower has no impact on his probability of default.

The variable ‘public records’, however, does have a regression coefficient that is statistically significantly different from zero. The number of public records in the credit file of the borrower seems to have an impact on his probability of default, and, as can be expected, the relation is positive. With a regression coefficient of approximately 0.0622 and a corresponding odds ratio of approximately 1.0642, we can state that, according to our model, each additional public record in the credit file of the borrower increases the odds of defaulting by approximately 6.42%.
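The step from coefficient to odds ratio to percentage change can be reproduced with a few lines of code. The following is a minimal sketch using the rounded coefficient quoted in the text; the function names are hypothetical and chosen only for this illustration.

```python
import math


def odds_ratio(coef: float) -> float:
    """Odds ratio implied by a logit coefficient for a one-unit increase."""
    return math.exp(coef)


def pct_change_in_odds(coef: float) -> float:
    """Percentage change in the odds for a one-unit increase."""
    return (math.exp(coef) - 1.0) * 100.0


coef_public_records = 0.0622  # rounded coefficient quoted in the text

print(round(odds_ratio(coef_public_records), 4))          # 1.0642
print(round(pct_change_in_odds(coef_public_records), 2))  # 6.42
```

The same two functions apply to every one-unit interpretation in this section; a coefficient below zero simply yields an odds ratio below 1 and a negative percentage change.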

This is broadly in line with what has been found in similar studies. According to Credit Karma (2012), public records on the credit report of a borrower have a significant negative impact on his credit score, which in turn signals a higher probability of default. The study conducted by Serrano-Cinca et al. (2015) shows that the number of public records on the credit file of the borrower is positively correlated with his probability of default.

This positive relation can easily be explained from an economic point of view. Public records are the result of serious financial delinquencies, such as bankruptcies or tax liens. In case of a tax lien, for example, the borrower owes a substantial amount of tax money to the state, which has a legal claim on the assets of the noncompliant taxpayer. The consequences of these delinquencies can therefore have a significant impact on the financial status of the borrower. This places the borrower in a vulnerable position with respect to future financial obligations, and he therefore has an increased chance of not being able to fulfil these obligations in the future.


Revolving balance

The final variable of which the regression coefficient is statistically not significantly different from zero is the variable ‘revolving balance’. This implies that the total credit revolving balance over the lifetime of the borrower has no impact on his probability of defaulting on future loans.

This finding is in line with what has been found in the study conducted by Emekter, Tu, Jirasakuldech, & Lu (2015). In most studies, however, the focus lies on the revolving line utilization, or the average amount of credit used relative to the total available credit. This variable is integrated into our model as well, and will be analysed in the following section.

6.2 Significant variables

We now further analyse the variables for which the regression coefficient is statistically significantly different from zero, and which consequently seem to have an impact on the probability of default of the borrower.

Loan amount

The regression coefficient on the variable loan amount is in our model equal to 0.000011. To make the interpretation of this coefficient and its corresponding odds ratio more meaningful, we multiply it by 100, giving us a coefficient of approximately 0.0011. The corresponding odds ratio is found by raising e to the power of this coefficient, and results in an odds ratio of approximately 1.001101. This odds ratio tells us that, according to our model, the odds of defaulting on the loan increase by approximately 0.11% for every increase in the loan amount of 100 units (or 100 dollars).
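The rescaling used here generalises to any step size: for a change of a given number of units, the odds ratio is e raised to the coefficient times that change. The sketch below illustrates this with the rounded loan-amount coefficient from the text; the function name `odds_ratio_for_change` and the 5,000 dollar step are hypothetical and serve only as an example.

```python
import math


def odds_ratio_for_change(coef: float, delta: float) -> float:
    """Odds ratio implied by a logit coefficient for a change of `delta` units."""
    return math.exp(coef * delta)


coef_loan_amount = 0.000011  # per dollar, rounded value quoted in the text

or_100 = odds_ratio_for_change(coef_loan_amount, 100)
or_5000 = odds_ratio_for_change(coef_loan_amount, 5000)  # hypothetical larger step

print(f"{or_100:.6f}")                # 1.001101
print(f"{(or_5000 - 1) * 100:.2f}%")  # 5.65%
```

Note that the effect compounds rather than adds: fifty steps of 100 dollars raise the odds by about 5.65%, slightly more than fifty times 0.11%.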

At first sight, this seems logical. The higher the amount the borrower wants to borrow, the higher his monthly installment, and, ceteris paribus, the bigger the chance the borrower won’t be able to fulfil these payment obligations. However, the study conducted by Serrano-Cinca et al. (2015) seems to find no relation between the loan amount and the probability of default. Similarly, a study by Kočenda and Vojtek (2009), where three models were tested, found that in two of their models, the loan amount was negatively correlated with the probability of default, whereas in the third model, a positive relation was found. We can therefore conclude that, based on these findings, there is generally no clear relation between the loan amount and the probability of default.

Employment

As previously stated, the length of employment doesn’t seem to have an impact on the probability of default of the borrower. However, according to our final model, the employment status does have a statistically significant impact on the probability of default. With an odds ratio of approximately 0.6609, we can state that for a borrower who has a job, the odds of defaulting are, ceteris paribus, approximately 33.91% lower compared to a borrower without a job.

From an economic point of view, this finding makes perfect sense. Being employed generally means having a steady income, which creates certainty for the future. This certainty is very important for investors, as it indicates that the borrower will remain creditworthy over the term of the loan, and will consequently continue to be able to fulfil his financial obligations. The length of employment plays a minor role in this certainty. One could argue that the longer the borrower is employed, the more certain he is of keeping his job. This argument, however, isn’t supported by the data, and we therefore conclude that the employment status of the borrower matters far more than the employment length.
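An odds ratio describes a change in the odds, not directly in the probability of default. To show how the two relate, the sketch below applies the employment odds ratio quoted above to a baseline default probability. The baseline of 20% is purely hypothetical and is not taken from the data; the function name `apply_odds_ratio` is our own.

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Probability obtained after multiplying the baseline odds by `odds_ratio`."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must lie strictly between 0 and 1")
    odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


OR_EMPLOYED = 0.6609  # odds ratio quoted in the text
baseline = 0.20       # hypothetical default probability for a borrower without a job

p_employed = apply_odds_ratio(baseline, OR_EMPLOYED)
print(f"{p_employed:.4f}")  # 0.1418
```

Under this assumed baseline, the probability of default falls from 20% to roughly 14.2%. The relative change in probability is therefore smaller than the relative change in odds, and the gap widens as the baseline probability grows.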

Home ownership

For the categorical variable ‘home ownership’, the dummy variables ‘dummy house mortgage’ and ‘dummy house rent’ have been created. The first dummy captures the difference in probability of default between owning a house outright and having a mortgage on it, and the second dummy does the same for the difference between owning a house and renting one. Analysing the odds ratios of these dummies, which are 0.8629 and 1.1064 respectively, allows us to conclude the following. The odds of defaulting are approximately 13.71% lower when the borrower has a mortgage on his house than when he owns it, ceteris paribus. For a borrower renting his house, the odds of defaulting are approximately 10.64% higher than for a borrower owning a house, ceteris paribus. All of this indicates that borrowers renting a house are more likely to default than borrowers owning a house, whereas borrowers having a mortgage on their home are less likely to default. These findings are in line with those of Serrano-Cinca et al. (2015).
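The dummy coding described above, with owning as the reference category, could be sketched as follows. The category labels come from the Lending Club data dictionary in Appendix 1; the function name and dictionary keys are hypothetical, and how the category OTHER was treated in the estimation is not stated here, so this sketch simply rejects it.

```python
def home_ownership_dummies(status: str) -> dict[str, int]:
    """Encode home ownership as two dummies, with OWN as the reference category."""
    status = status.strip().upper()
    if status not in {"OWN", "MORTGAGE", "RENT"}:
        # Assumption: categories such as OTHER fall outside this illustration.
        raise ValueError(f"unsupported home ownership status: {status!r}")
    return {
        "dummy_home_mortgage": int(status == "MORTGAGE"),
        "dummy_home_rent": int(status == "RENT"),
    }


for s in ("OWN", "MORTGAGE", "RENT"):
    print(s, home_ownership_dummies(s))
```

Because a borrower who owns his home has both dummies equal to zero, each dummy coefficient measures the difference relative to owning, which is exactly how the two odds ratios above are read.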

Annual income

Concerning the variable ‘annual income’, we intuitively expect a negative relation between the probability of default and the annual income of the borrower. Indeed, the regression coefficient is negative and the odds ratio is lower than 1. Multiplying the regression coefficient by 1,000 and calculating the corresponding odds ratio results in an odds ratio of approximately 0.9938. This indicates that for every additional 1,000 dollars of annual income, the odds of defaulting on the loan decrease by approximately 0.62%. Other studies come to the same conclusion concerning this negative relation.

Debt-to-income ratio

The next variable we analyse is the ‘debt-to-income ratio’ variable. This variable has a significant regression coefficient and an odds ratio of 1.0128. As can be expected, this implies a positive relation between the debt-to-income ratio of a borrower and his probability of default. More specifically, the odds ratio indicates that for every increase in the debt-to-income ratio of one unit, the odds of defaulting on the loan increase by approximately 1.28%. This positive relation is also found in the studies conducted by Serrano-Cinca et al. (2015), Carmichael (2014), Polena & Regner (2016) and Emekter et al. (2015). Intuitively, this relation makes sense as well. The more debt a borrower has relative to his income, the harder it is for him to fulfil all of his financial obligations, and the higher his probability of defaulting on these obligations. This statement is also supported by what can be seen in Figure 1 and Figure 2, where the main determinants of the FICO-score and VantageScore 3.0 are illustrated. Both scoring models allocate a substantial weight to the amount of debt of the borrower.

Earliest credit line

The variable ‘earliest credit line’ represents the date on which the borrower opened his first credit line. Each unit of this variable represents one day. For interpretation purposes, we therefore multiply the regression coefficient of 0.000022 by 365, and calculate the corresponding odds ratio. This odds ratio equals approximately 1.0079, indicating that the more recently a borrower opened his first credit line, the higher his probability of default. More precisely, according to this odds ratio, a borrower who opened his first credit line one year later than another borrower has, ceteris paribus, approximately 0.79% higher odds of defaulting. This finding is in line with what has been found by Serrano-Cinca et al. (2015), Polena & Regner (2016) and Carmichael (2014).

Inquiries in the last 6 months

The variable ‘inquiries in the last 6 months’, representing the number of hard inquiries on the credit report of the borrower during the last 6 months, has a significant, positive regression coefficient, and a corresponding odds ratio of approximately 1.2342. This indicates that for each additional hard inquiry on the credit file of the borrower during the last 6 months, the odds of defaulting increase by approximately 23.42%. The study conducted by Serrano-Cinca et al. (2015) confirms this positive relation.

From an economic point of view, this could be explained as follows. A large number of recent inquiries indicates that the borrower has applied for a loan several times during the last six months. This could mean either that he has taken on many loan commitments, or that several of his loan applications have been rejected. Both situations indicate an unhealthy financial position. On the one hand, many loan commitments result in many payment obligations, and consequently a higher chance of not fulfilling these obligations. Many loan rejections, on the other hand, clearly indicate that there is little belief in the creditworthiness of the borrower. We can therefore conclude that, from an economic point of view, a high number of inquiries on a credit report corresponds to a higher probability of default.


Open accounts

The number of open accounts in the credit file of the borrower has, according to our model, a significant impact on the probability of default as well. With an odds ratio of approximately 1.0246, we can state that for each additional open account on the credit file of the borrower, his odds of defaulting increase by approximately 2.46%. However, this finding is not supported by similar studies. The study conducted by Serrano-Cinca et al. (2015) finds a significant negative relation, whereas Polena & Regner (2016) and Emekter et al. (2015) find no significant relation between the number of open accounts and the probability of default of the borrower. Reasons for these discrepancies could be the use of different data sets, or the possibility that previously found relations have changed due to learning effects in the financial market. We therefore refrain from drawing firm conclusions on the relation between the number of open accounts in the credit file of the borrower and his probability of default.

Revolving utilization

As previously mentioned, similar studies have shown that the variable ‘revolving utilization’ has a quite significant impact on the probability of default of a borrower. This statement is supported by studies conducted by Serrano-Cinca et al. (2015), Emekter et al. (2015) and Carmichael (2014). With an odds ratio of approximately 2.3, our model tells us that for every increase in the revolving utilization of the borrower of 1 unit (or 100 percentage points), the odds of defaulting increase by approximately 130%. Recalculating the odds ratio for an increase of 10 percentage points gives us an odds ratio of approximately 1.087, indicating that an increase in the revolving utilization of 10 percentage points results in an increase in the odds of defaulting of approximately 8.7%. If we again take a look at Figure 2, we can see that the amount of credit used relative to the available credit plays an important role in the calculation of the VantageScore 3.0. Indeed, borrowers who use a substantial amount of their available credit might have more problems repaying that credit, resulting in a higher probability of defaulting on these and other financial obligations.
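The recalculation for a 10 percentage point step can be written out explicitly. The lines below are a worked sketch that assumes the coefficient of 0.838 reported for most specifications in Table 6 is the estimate underlying the quoted odds ratios; the symbol $\beta_{\text{util}}$ is our own notation.

```latex
\begin{aligned}
\mathrm{OR}_{1.0} &= e^{\beta_{\text{util}}} = e^{0.838} \approx 2.31 \\
\mathrm{OR}_{0.1} &= e^{0.1\,\beta_{\text{util}}} = \left(e^{0.838}\right)^{0.1} = e^{0.0838} \approx 1.087
\end{aligned}
```

The odds ratio for the smaller step is thus the tenth root of the full-unit odds ratio, not one tenth of its percentage effect: dividing 130% by ten would give 13%, whereas the correct figure is about 8.7%.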

Total accounts

According to our model, the total number of accounts, as currently reported by the borrower’s credit file, has a significant impact on his probability of default as well. However, as opposed to the variable ‘open accounts’, this variable has a negative relation with the probability of default. The odds ratio of 0.9887 indicates that for every additional account recorded in the credit file of the borrower, his odds of defaulting decrease by approximately 1.13%. Here as well, this finding is not in line with what other studies report. For example, Emekter et al. (2015) find no significant relation. These discrepancies could again be the result of the use of different data sets or learning effects in the financial market, but we are forced to conclude that, based on this analysis, no clear relation can be determined.


Listing status

The last variable discussed in this paper is the dummy variable ‘listing status’. As previously described, this variable takes a value of 0 for an initial listing status of ‘whole’, and a value of 1 for an initial listing status of ‘fractional’. The odds ratio corresponding to this dummy variable is 1.0765, which indicates that the odds of defaulting are, ceteris paribus, approximately 7.65% higher for a ‘fractional’ loan compared to a ‘whole’ loan.

The reason behind this result is difficult to determine, mainly due to the fact that at first sight, the listing status of the loan has nothing to do with the creditworthiness of the borrower. Other studies haven’t incorporated this variable in their research either. Therefore, it is likely that this finding is coincidental, and the listing status has in reality no real economic impact on the probability of default of the borrower, but merely a statistical correlation with it. Additional studies where this variable is included could confirm or deny this statement.


7 Conclusion

The aim of this dissertation was to define the main determinants of loan default in the P2P-Lending market, by developing a statistical model that relates the probability of default of a borrower to several borrower characteristics gathered during the loan application process. For this analysis, we used a data set provided by Lending Club, the largest P2P-Lending platform in the US. Based on current literature on credit scoring, in combination with the available data, we defined several model specifications to correctly determine the significance and impact of each of the variables under consideration. This has led to a final model that served as the base for the analysis of the results. Based on these results, we can conclude the following.

First of all, we concluded that delinquencies and public records registered in the credit file of the borrower raise his probability of default, and the more delinquencies or public records, the higher this probability. However, the time since the last registered delinquency or public record seems to have no impact on the default probability. Next to this, we found that the revolving balance of the borrower has no real impact on the probability of default either. The utilization rate of this revolving balance, however, does have a significant impact: the higher this utilization rate, the higher the probability of default of the borrower.

With respect to employment, we found that the employment length has no significant impact on the probability of default, but the employment status does. A borrower with a job has a substantially lower probability of default compared to a borrower without a job. The annual income of the borrower plays a significant role as well. As can be expected, the higher the income, the lower the probability of default.

Continuing with the solvency of the borrower, we can state the following. The ratio of current debt to total income has proven to be a powerful predictor of future loan default, with a high debt-to-income ratio corresponding to a high probability of default. The loan amount, however, has a less clear relation with the default probability. Our study found a positive relation, but this is contradicted by other studies, where negative or insignificant relations are found. We therefore refrain from drawing decisive conclusions with respect to the loan amount. Home ownership has a significant impact as well. According to our analysis, a borrower who has a mortgage on his house has the lowest probability of default, followed by a borrower who is the owner of his home. A borrower who rents his house has the highest probability of defaulting on his loan.

Finally, we found that the variables relating to the credit record of the borrower yield some valuable information as well. First of all, we can state that the more hard inquiries that have been made on the credit file of the borrower, the higher his probability of default is. Secondly, we found that the longer ago a borrower opened his first credit line, the lower his probability of default. The impact of the number of accounts in the credit file of the borrower is less clear. Based on our analysis, we found a positive relation with the probability of default for the number of open accounts, and a negative relation for the number of total accounts. This opposite relation in itself is rather counterintuitive, and similar studies contradict these findings as well. We therefore again decide to refrain from drawing conclusions with respect to the accounts registered in the credit file of the borrower.

8 Further Research

The analysis in this paper, and the corresponding results, have been compared with the findings from several similar studies in order to draw meaningful conclusions. However, it needs to be noted that this study alone is insufficient to give a complete picture of the determinants of loan default in the P2P-Lending market. This is due to several shortcomings. First of all, this study is focused on data from Lending Club, which is only one of the major players in the P2P-Lending market. Next to this, we focused only on loans with a maturity of 36 months. These two points show that there is ample opportunity to take further steps in this field of research. A first step could be to conduct the same analysis with a data set containing the Lending Club loans with a maturity of 60 months, and to compare those results with the ones found in this paper. Next to this, similar studies could be conducted with data from other P2P-Lending platforms, again comparing both results.


References

Bajpai, P. (2015). The 7 Best Peer-To-Peer Lending Websites (LC). Investopedia.

Berger, S. C., & Gleisner, F. (2009). Emergence of Financial Intermediaries in. BuR - Business Research, 39-65.

Credit Karma. (2012, January 12). Public Records on Your Credit Report. Retrieved from Credit Karma: https://www.creditkarma.com/article/public-records-on-credit-report

Dujeux, F. (2017, February 15). Interview with Frédéric Dujeux, Co-Founder of Mozzeno. (Wiseclerk, Interviewer)

Emekter, R., Tu, Y., Jirasakuldech, B., & Lu, M. (2015). Evaluating credit risk and loan. Applied Economics, 47(1), 54-70.

Fair Isaac Corporation. (2017). Learn About The FICO® Score and its Long History. Retrieved from Fico: http://www.fico.com/25years/

Fair Isaac Corporation. (2017). Why are my FICO® Scores different for the 3 credit bureaus? Retrieved from myFICO: http://www.myfico.com/credit-education/questions/why-are-my-credit-scores-different-for-3-credit-bureaus/

Finger, R. (2013, May 30). Banks Are Not Lending Like They Should, And With Good Reason. Retrieved from Forbes: http://www.forbes.com/sites/richardfinger/2013/05/30/banks-are-not-lending-like-they-should-and-with-good-reason/#348fd0fe44b1

Fintechnews Singapore. (2016, June 29). Asia’s Top 7 Peer-to-Peer Lending Platforms. Retrieved from Fintechnews: http://fintechnews.sg/3518/crowdfunding/asias-top-7-peer-peer-lending-platforms/

Fintechnews Switzerland. (2016, July 1). Europe’s Top 11 Peer-to-Peer Lending Platforms. Retrieved from Fintech News: http://fintechnews.ch/p2plending/europes-top-11-peer-to-peer-lending-platforms/4960/

Gurney, I. (2017). Companies. Retrieved from p2pmoney: http://www.p2pmoney.co.uk/companies.htm

Hörkkö, M. (2010). The Determinants of Default in Consumer Credit Market. Aalto University School of Economics.


Hulme, M. K., & Wright, C. (2006). Internet Based Social Lending: Past, Present and Future. Social Futures Observatory.

Investopedia. (n.d.). Adverse Selection. Retrieved from Investopedia: http://www.investopedia.com/terms/a/adverseselection.asp

Investopedia. (n.d.). Asymmetric Information. Retrieved from Investopedia: http://www.investopedia.com/terms/a/asymmetricinformation.asp

Investopedia. (n.d.). Credit Scoring. Retrieved from Investopedia: http://www.investopedia.com/terms/c/credit_scoring.asp

Investopedia. (n.d.). Moral Hazard. Retrieved from Investopedia: http://www.investopedia.com/terms/m/moralhazard.asp

Investopedia. (n.d.). Peer-To-Peer Lending (P2P). Retrieved from Investopedia: http://www.investopedia.com/terms/p/peer-to-peer-lending.asp

Investopedia. (n.d.). Revolving Credit. Retrieved from Investopedia: http://www.investopedia.com/terms/r/revolvingcredit.asp

Investopedia. (n.d.). Unsecured Loan. Retrieved from Investopedia: http://www.investopedia.com/terms/u/unsecuredloan.asp

Irby, L. (2016, November 10). Public Records and Your Credit Report. Retrieved from thebalance: https://www.thebalance.com/public-records-and-your-credit-report-960740

Irby, L. (2016, September 1). What is a Hard Inquiry? Retrieved from thebalance: https://www.thebalance.com/what-is-a-hard-inquiry-960549

Kočenda, E., & Vojtek, M. (2009). Default Predictors and Credit Scoring Models. CESifo.

LendingClub Corporation. (2017). Lending Club Statistics. Retrieved from Lending Club: https://www.lendingclub.com/info/statistics.action

Lin, M., Prabhala, N. R., & Viswanathan, S. (2013). Judging Borrowers by the Company They Keep: Friendship. Management Science, 17-35.

Mateescu, A. (2015). Peer-to-Peer Lending. Data&Society, 1-23.

Nefer, B. (2010, November 12). What Does Delinquency on a Credit Report Mean? Retrieved from Sapling: https://www.sapling.com/7491164/delinquency-credit-report-mean


Nickolas, S. (2015, April 24). What is the difference between moral hazard and adverse selection? Retrieved from Investopedia: http://www.investopedia.com/ask/answers/042415/what-difference-between-moral-hazard-and-adverse-selection.asp

Polena, M., & Regner, T. (2016). Determinants of borrowers' default in P2P lending. Jena Economic Research Papers, No. 2016-023.

Prosper Marketplace, Inc. (2017). About us. Retrieved from Prosper: https://www.prosper.com/plp/about/

Renton, P. (2012). The Lending Club Story: How the world's largest peer to peer lender is transforming finance and how you can benefit. Great Britain: Amazon.

Rind, V. (2016, April 26). Pros and Cons of Peer-To-Peer Lending. Retrieved from GoBankingRates: https://www.gobankingrates.com/personal-finance/5-perks-peer-to-peer-lending/

Serrano-Cinca, C., Gutiérrez-Nieto, B., & López-Palacios, L. (2015). Determinants of Default in P2P Lending. Plos One, 1-22.

Social Finance, Inc. (2017). Sofi. Retrieved from Sofi: https://www.sofi.com/

VantageScore Solutions, LLC. (2017). What influences your score. Retrieved from VantageScore: https://your.vantagescore.com/score-influences

Verbeek, M. (2012). Modern Econometrics. John Wiley & Sons Inc.

Woodruff, M. (2014, August 29). Here's what you need to know before taking out a peer-to-peer loan. Retrieved from Yahoo Finance: http://finance.yahoo.com/news/what-is-peer-to-peer-lending-173019140.html

Wooldridge, J. M. (2002). Introductory Econometrics - A Modern Approach. South-Western.

Wright, M. (2015, February 20). Pros and cons of peer-to-peer lending. Retrieved from MoneySuperMarket: http://www.moneysupermarket.com/c/news/pros-and-cons-of-peer-to-peer-lending/0085915/

Zopa. (2016). Our Story. Retrieved from Zopa: https://www.zopa.com/about/our-story


Appendices

Appendix 1: List of variables in the original dataset of Lending Club LoanStatNew Description acc_now_delinq The number of accounts on which the borrower is now delinquent. acc_open_past_24mths Number of trades opened in past 24 months. addr_state The state provided by the borrower in the loan application all_util Balance to credit limit on all trades The self-reported annual income provided by the borrower during annual_inc registration. The combined self-reported annual income provided by the co- annual_inc_joint borrowers during registration Indicates whether the loan is an individual application or a joint application_type application with two co-borrowers avg_cur_bal Average current balance of all accounts bc_open_to_buy Total open to buy on revolving bankcards. Ratio of total current balance to high credit/credit limit for all bc_util bankcard accounts. chargeoff_within_12_mths Number of charge-offs within 12 months collection_recovery_fee post charge off collection fee collections_12_mths_ex_med Number of collections in 12 months excluding medical collections The number of 30+ days past-due incidences of delinquency in the delinq_2yrs borrower's credit file for the past 2 years The past-due amount owed for the accounts on which the borrower delinq_amnt is now delinquent. desc Loan description provided by the borrower A ratio calculated using the borrower’s total monthly debt payments on the total debt obligations, excluding mortgage and the requested dti LC loan, divided by the borrower’s self-reported monthly income. A ratio calculated using the co-borrowers' total monthly payments on the total debt obligations, excluding mortgages and the requested LC loan, divided by the co-borrowers' combined self- dti_joint reported monthly income earliest_cr_line The month the borrower's earliest reported credit line was opened Employment length in years. Possible values are between 0 and 10 emp_length where 0 means less than one year and 10 means ten or more years. 
emp_title The job title supplied by the Borrower when applying for the loan.* The upper boundary range the borrower’s FICO at loan origination fico_range_high belongs to. The lower boundary range the borrower’s FICO at loan origination fico_range_low belongs to. funded_amnt The total amount committed to that loan at that point in time. The total amount committed by investors for that loan at that point funded_amnt_inv in time. grade LC assigned loan grade

IV

home_ownership  The home ownership status provided by the borrower during registration or obtained from the credit report. Our values are: RENT, OWN, MORTGAGE, OTHER
id  A unique LC assigned ID for the loan listing.
il_util  Ratio of total current balance to high credit/credit limit on all install acct
initial_list_status  The initial listing status of the loan. Possible values are – W, F
inq_fi  Number of personal finance inquiries
inq_last_12m  Number of credit inquiries in past 12 months
inq_last_6mths  The number of inquiries in past 6 months (excluding auto and mortgage inquiries)
installment  The monthly payment owed by the borrower if the loan originates.
int_rate  Interest Rate on the loan
issue_d  The month which the loan was funded
last_credit_pull_d  The most recent month LC pulled credit for this loan
last_fico_range_high  The upper boundary range the borrower’s last FICO pulled belongs to.
last_fico_range_low  The lower boundary range the borrower’s last FICO pulled belongs to.
last_pymnt_amnt  Last total payment amount received
last_pymnt_d  Last month payment was received
loan_amnt  The listed amount of the loan applied for by the borrower. If at some point in time, the credit department reduces the loan amount, then it will be reflected in this value.
loan_status  Current status of the loan
max_bal_bc  Maximum current balance owed on all revolving accounts
member_id  A unique LC assigned Id for the borrower member.
mo_sin_old_il_acct  Months since oldest bank installment account opened
mo_sin_old_rev_tl_op  Months since oldest revolving account opened
mo_sin_rcnt_rev_tl_op  Months since most recent revolving account opened
mo_sin_rcnt_tl  Months since most recent account opened
mort_acc  Number of mortgage accounts.
mths_since_last_delinq  The number of months since the borrower's last delinquency.
mths_since_last_major_derog  Months since most recent 90-day or worse rating
mths_since_last_record  The number of months since the last public record.
mths_since_rcnt_il  Months since most recent installment accounts opened
mths_since_recent_bc  Months since most recent bankcard account opened.
mths_since_recent_bc_dlq  Months since most recent bankcard delinquency
mths_since_recent_inq  Months since most recent inquiry.
mths_since_recent_revol_delinq  Months since most recent revolving delinquency.
next_pymnt_d  Next scheduled payment date
num_accts_ever_120_pd  Number of accounts ever 120 or more days past due
num_actv_bc_tl  Number of currently active bankcard accounts
num_actv_rev_tl  Number of currently active revolving trades
num_bc_sats  Number of satisfactory bankcard accounts
num_bc_tl  Number of bankcard accounts


num_il_tl  Number of installment accounts
num_op_rev_tl  Number of open revolving accounts
num_rev_accts  Number of revolving accounts
num_rev_tl_bal_gt_0  Number of revolving trades with balance >0
num_sats  Number of satisfactory accounts
num_tl_120dpd_2m  Number of accounts currently 120 days past due (updated in past 2 months)
num_tl_30dpd  Number of accounts currently 30 days past due (updated in past 2 months)
num_tl_90g_dpd_24m  Number of accounts 90 or more days past due in last 24 months
num_tl_op_past_12m  Number of accounts opened in past 12 months
open_acc  The number of open credit lines in the borrower's credit file.
open_acc_6m  Number of open trades in last 6 months
open_il_12m  Number of installment accounts opened in past 12 months
open_il_24m  Number of installment accounts opened in past 24 months
open_il_6m  Number of currently active installment trades
open_rv_12m  Number of revolving trades opened in past 12 months
open_rv_24m  Number of revolving trades opened in past 24 months
out_prncp  Remaining outstanding principal for total amount funded
out_prncp_inv  Remaining outstanding principal for portion of total amount funded by investors
pct_tl_nvr_dlq  Percent of trades never delinquent
percent_bc_gt_75  Percentage of all bankcard accounts > 75% of limit.
policy_code  publicly available policy_code=1; new products not publicly available policy_code=2
pub_rec  Number of derogatory public records
pub_rec_bankruptcies  Number of public record bankruptcies
purpose  A category provided by the borrower for the loan request.
pymnt_plan  Indicates if a payment plan has been put in place for the loan
recoveries  post charge off gross recovery
revol_bal  Total credit revolving balance
revol_util  Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit.
sub_grade  LC assigned loan subgrade
tax_liens  Number of tax liens
term  The number of payments on the loan. Values are in months and can be either 36 or 60.
title  The loan title provided by the borrower
tot_coll_amt  Total collection amounts ever owed
tot_cur_bal  Total current balance of all accounts
tot_hi_cred_lim  Total high credit/credit limit
total_acc  The total number of credit lines currently in the borrower's credit file
total_bal_ex_mort  Total credit balance excluding mortgage
total_bal_il  Total current balance of all installment accounts
total_bc_limit  Total bankcard high credit/credit limit


total_cu_tl  Number of finance trades
total_il_high_credit_limit  Total installment high credit/credit limit
total_pymnt  Payments received to date for total amount funded
total_pymnt_inv  Payments received to date for portion of total amount funded by investors
total_rec_int  Interest received to date
total_rec_late_fee  Late fees received to date
total_rec_prncp  Principal received to date
total_rev_hi_lim  Total revolving high credit/credit limit
url  URL for the LC page with listing data.
verification_status  Indicates if income was verified by LC, not verified, or if the income source was verified
verified_status_joint  Indicates if the co-borrowers' joint income was verified by LC, not verified, or if the income source was verified
zip_code  The first 3 numbers of the zip code provided by the borrower in the loan application.


Appendix 2 – Regression results LPM

                                (1)
VARIABLES                       Coeff

Loan amount                     4.96e-07***     (1.22e-07)
Employment Length < 1 year      -0.0586***      (0.00476)
Employment Length 1 year        -0.0640***      (0.00492)
Employment Length 2 years       -0.0617***      (0.00467)
Employment Length 3 years       -0.0595***      (0.00477)
Employment Length 4 years       -0.0617***      (0.00498)
Employment Length 5 years       -0.0610***      (0.00481)
Employment Length 6 years       -0.0523***      (0.00500)
Employment Length 7 years       -0.0525***      (0.00511)
Employment Length 8 years       -0.0550***      (0.00535)
Employment Length 9 years       -0.0574***      (0.00568)
Employment Length 10+ years     -0.0566***      (0.00412)
Dummy Home Mortgage             -0.0192***      (0.00296)
Dummy Home Rent                 0.0115***       (0.00297)
Annual Income                   -2.33e-07***    (1.65e-08)
Debt-to-Income Ratio            0.00191***      (0.000119)
Delinquencies 2 years           0.00611***      (0.00156)
Earliest Credit Line            2.65e-06***     (3.45e-07)
Inquiries last 6 months         0.0252***       (0.000696)
Months since last delinquency   -1.11e-05       (7.02e-05)
Dummy Delinquencies             0.00707**       (0.00347)
Months since last record        5.98e-05        (0.000105)
Dummy public records            0.00666         (0.0111)
Open accounts                   0.00226***      (0.000244)
Public records                  3.33e-05        (0.00350)
Revolving balance               -1.28e-07***    (4.49e-08)
Revolving utilization           0.0787***       (0.00347)
Total Accounts                  -0.00141***     (0.000106)
Listing Status                  0.00877***      (0.00207)
Constant                        0.0625***       (0.00728)

Observations                    175,037
R-squared                       0.021

Standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1
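
As a reading aid for the table above: in a linear probability model, each coefficient is the change in the predicted probability of the outcome for a one-unit change in that regressor, holding the others fixed. The worked figures below use coefficients from the table and rest on two assumptions not stated in this appendix: that the dependent variable equals 1 for a defaulted loan, and that loan amount and annual income are measured in US dollars.

```latex
% Marginal effect in a linear probability model (assumes y = 1 denotes default)
\Delta \hat{P}(y = 1) = \hat{\beta}_j \, \Delta x_j

% One additional inquiry in the last 6 months
0.0252 \times 1 = 0.0252 \quad (\approx 2.5 \text{ percentage points})

% Loan amount higher by 10{,}000 (assumed to be US dollars)
4.96 \times 10^{-7} \times 10{,}000 = 0.00496 \quad (\approx 0.5 \text{ percentage points})

% Annual income higher by 10{,}000 (assumed to be US dollars)
-2.33 \times 10^{-7} \times 10{,}000 = -0.00233 \quad (\approx -0.23 \text{ percentage points})
```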


Appendix 3: Stata commands

3.1: Initial model – regression coefficients

logit dummy_loan_status loan_amnt ///
    dummy_lessthan1y dummy_1y dummy_2y dummy_3y dummy_4y dummy_5y ///
    dummy_6y dummy_7y dummy_8y dummy_9y dummy_10y ///
    dummy_house_mortgage dummy_house_rent annual_inc dti delinq_2yrs ///
    earliest_cr_line inq_last_6mths ///
    months_since_last_delinq dummy_months_since_last_delinq ///
    months_since_last_record dummy_months_since_last_record ///
    open_acc pub_rec revol_bal revol_util total_acc dummy_listing_status

3.2: Initial model – odds ratios

logistic dummy_loan_status loan_amnt ///
    dummy_lessthan1y dummy_1y dummy_2y dummy_3y dummy_4y dummy_5y ///
    dummy_6y dummy_7y dummy_8y dummy_9y dummy_10y ///
    dummy_house_mortgage dummy_house_rent annual_inc dti delinq_2yrs ///
    earliest_cr_line inq_last_6mths ///
    months_since_last_delinq dummy_months_since_last_delinq ///
    months_since_last_record dummy_months_since_last_record ///
    open_acc pub_rec revol_bal revol_util total_acc dummy_listing_status
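
The two commands above fit the same model and differ only in how the estimates are reported: Stata's logit displays the coefficients, while logistic displays their exponentiated values, the odds ratios. The relation is sketched below; the value 0.10 is a hypothetical illustration and is not taken from the thesis results.

```latex
% Odds ratio for regressor j, given the logit coefficient \beta_j
\mathrm{OR}_j = e^{\beta_j}, \qquad \beta_j = \ln(\mathrm{OR}_j)

% Hypothetical illustration (value not taken from the thesis):
% a coefficient of 0.10 multiplies the odds by about 1.105 per unit increase in x_j
e^{0.10} \approx 1.105
```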
