Portfolio Selections in P2P Lending: A Multi-Objective Perspective

∗ Hongke Zhao1 Qi Liu1 Guifeng Wang1 Yong Ge2 Enhong Chen1 1School of Computer Science and Technology, University of Science and Technology of China {zhhk, wgf1109}@mail.ustc.edu.cn {qiliuql, cheneh}@ustc.edu.cn 2Eller College of Management, University of Arizona, [email protected]

ABSTRACT many users (i.e., borrowers and lenders) and generates mas- P2P lending is an emerging wealth-management service for sive lending transactions. For instance, the total loan is- individuals, which allows lenders to directly bid and invest on suance amount of Lendingclub had reached more than $13.4 the loans created by borrowers. In these platforms, lender- billion at the end of 2015. s often pursue multiple objectives (e.g., non-default prob- The prevalence of P2P lending and the availability of trans- ability, fully-funded probability and winning-bid probability) action data have attracted many researchers’ attentions, which when they select loans to invest. How to automatically as- mainly focused on risk evaluation [23, 11], social relation sess loans from these objectives and help lenders select loan analysis [20, 13] and fully-funded analysis [28, 25]. Recently, portfolios is a very important but challenging problem. To authors in [34] proposed to study loan recommendations for that end, in this paper, we present a holistic study on portfo- lenders. However, due to the specific working mechanism of lio selections in P2P lending. Specifically, we first propose to P2P lending, the problem of loan/investment recommenda- adapt gradient boosting decision tree, which combines both tions in these platforms is still largely underexplored. static features and dynamic features, to assess loans from In P2P lending, there are mainly two kinds of roles: the multiple objectives. Then, we propose two strategies, i.e., borrowers who want to borrow money from others and the lenders who lend money to borrowers. Trading in these mar- weighted objective optimization strategy and multi-objective 3 optimization strategy, to select portfolios for lenders. For kets follows the Dutch Auction Rule [17, 31]. Specifically, each lender, the first strategy attempts to provide one opti- for borrowing money, a borrower will first create a listing mal portfolio while the second strategy attempts to provide to solicit bids from lenders by describing herself, the rea- a Pareto-optimal portfolio set. Further, we design two algo- son of lending (e.g., for wedding), the required amount (e.g., rithms, namely DPA and EVA, which can efficiently resolve $1,000) and the maximal interest rate (e.g., 10%). Then, if the optimizations in these two strategies, respectively. Fi- a lender wants to lend to this loan within its soliciting du- nally, extensive experiments on a large-scale real-world data ration (e.g., one week), a bid is created by describing both set demonstrate the effectiveness of our solutions. how much money she wants to lend (e.g., $50) and the min- imum interest rate (e.g., 9.5%). If this listing receives more Keywords than its required amount in its soliciting duration, those bids with lower rates will succeed/win, and other bids with high- P2P Lending; Portfolio Selection; Multi-objective Optimiza- er rates will be outbid/fail. In contrast, if this listing can’t tion; Dynamic Feature receive enough bids in time, it would be expired and all the 1. INTRODUCTION previous bids would also fail [34, 6]. Based on this trading rule, a rational lender Alice may have the following two con- Recent years have witnessed the rapid development of on- siderations while selecting loans to bid. Multi-objective. 1 2 line P2P lending platforms, e.g., Prosper , Lendingclub . As While selecting loans, Alice may evaluate a loan from the a new emerging wealth-management service for individuals, probability of this loan being fully funded, the probability of P2P lending allows individuals to borrow and lend money winning the bid, as well as the loan risk (i.e., default prob- directly from one to another without going through any tra- ability) [4]. Portfolio. To be a successful lender, Alice also ditional financial intermediaries. Indeed, P2P lending has has the portfolio [24] perspective in her mind, i.e., she usual- become a fast growing investment market which attracts ly wants to select more than one loan (i.e., a portfolio) to bid ∗ in each investment. Indeed, some platforms (e.g., Prosper) Corresponding author. 1 already instruct lenders to diversify their money on multiple https://www.prosper.com/ 2 loans to reduce risk. However, it is difficult and boring for https://www.lendingclub.com/ Permission to make digital or hard copies of all or part of this work for personal or lenders to select dozens of loans in each time. Thus, devel- classroom use is granted without fee provided that copies are not made or distributed oping an automatic approach to recommend portfolios for for profit or commercial advantage and that copies bear this notice and the full cita- lenders is very needed. tion on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or re- In this paper, we present a holistic approach to help P2P publish, to post on servers or to redistribute to lists, requires prior specific permission lenders select investment portfolios, which can satisfy lender- and/or a fee. Request permissions from [email protected]. KDD ’16, August 13-17, 2016, San Francisco, CA, USA 3 There exists another kind of trading rule in P2P lending, in which c 2016 ACM. ISBN 978-1-4503-4232-2/16/08. . . $15.00 the platform determines posted rates for loans [31]. This trading can DOI: http://dx.doi.org/10.1145/2939672.2939861 be treated as a special case of our studied scenario. Table 1: Mathematical notations. Notation Description Historical Data Auction Data Loan VA = {v1, ..., v|VA|} the set of being-auctioned loans currently UA = {u1, ..., u } the set of current active lenders |UA| Historical Loans All Lenders Being-auctioned Loans Recent Bids x = (x1, ..., x|VA|) a selected loan portfolio u Lender Rei lender ui’s preference on rate expectation v Dynamic Feature Extraction Active Lender Identification Rej loan vj ’s declared interest rate in auction Multi-objective Pj = [Rj ,Tj ,Cj ] loan vj ’s assessed profile Assessments Model Training …… GBDT αi = (αi1, αi2, αi3) lender ui’s personalized weighted vector Assessment Models …… s’ interest rate expectations, minimize investment risk (i.e., Offline Flow default probabilities) and maximize trading efficiency (i.e., Weighted Objective Strategy Multi-objective Optimization Strategy Portfolio fully-funded probabilities and winning-bid probabilities) si- A Portfolio A Portfolio Set Online Selections Flow multaneously. Specifically, we first identify active lenders DPA … EVA … …… in current market, i.e., the lenders who are most likely to in- vest in the following period, as our target users. Second, we Figure 1: Flowchart of portfolio selections. assess each being-auctioned loan from multi-objective views, importantly, we introduce how to assess being-auctioned loan- i.e., the non-default probability, fully-funded probability and s from multiple objectives. For better illustration, Table 1 winning-bid probability. Here, different from previous work- lists some mathematical notations used in this paper. s which only used static features in assessments, we also extract dynamic features of loans, and adapt an ensemble 2.1 Preliminaries method (i.e., Gradient Boosting Decision Tree) to combine Problem Statement. Formally, given the lenders’ his- both static features and dynamic features to improve the torical bidding records, and current being-auctioned loans prediction performances. Finally, given the identified ac- VA = {v1, ..., v|VA|} in market, our goal is to select loan tive lenders and assessed loans, we attempt to select port- portfolios from VA for each active lender. Active lenders folios for each active lender. As we described above, the UA = {u1, ..., u|UA|} are those who are most likely to lend in selection should take into account multiple economic fac- the following period. A portfolio x = (x1, ..., x|VA|) is an op- tors/objectives, and the recommendation for each lender timal combination of multiple loans, if xj = 1, the j-th loan should also be portfolios rather than single loans. Specif- in VA is selected and put into the portfolio. For each lender ically, we propose two strategies, i.e., weighted objective op- ui ∈ UA, the selected portfolios should satisfy her prefer- u timization strategy and multi-objective optimization strategy ence on rate expectation Rei with minimum risk (maximal to solve portfolio selections. Weighted objective strategy non-default probability) and maximum transaction efficien- combines three objectives into a single objective based on cy (fully-funded probability, winning-bid probability). a weighted objective vector, and provides each lender with Framework Overview. For tackling the above problem, an optimal portfolio. Multi-objective optimization strategy we propose a solution framework which is show in Figure 1. optimizes three objectives simultaneously and gets a Pareto- There are three major steps (brown backgrounds): (1) i- optimal solution set (portfolios) for each lender. For these dentifying the active lenders UA and learning their prefer- u two strategies, two efficient algorithms, i.e., DPA (dynamic ences on rate expectation Rei , i ∈ {1, ..., |UA|}; (2) assessing programming) and EVA (evolutionary algorithm), are de- each being-auctioned loan vj ∈ VA on multiple objectives, signed to solve the optimization problems respectively. The (i.e., Non-default probability Rj , Fully-funded probability contributions of this paper can be summarized as follows. Tj , Winning-bid probability Cj ); and (3) selecting portfolios • To the best of our knowledge, this is the first work for all active lenders. on assessing loans from a multi-objective perspective We identify active lenders online (green arrows) and learn in P2P lending. Furthermore, we also propose to ex- lenders’ preferences on rate expectation from their historical tract dynamic features from both bidding lenders and investment records. For assessing being-auctioned loans, we auctions, which are very helpful for loan assessments. train multiple assessment models using the historical loans offline (purple arrows). Given the identified active lenders • With our two portfolio selection strategies, we attempt and the assessed loans, we propose two strategies to select to recommendation portfolios rather than single loan- portfolios. Weighted objective optimization strategy pro- s for lenders. Especially in the multi-objective opti- vides an optimal portfolio and multi-objective optimization mization strategy, for each lender, we get a Pareto- strategy provides a Pareto-optimal (skyline) portfolio set for optimal (skyline) portfolio set. These distinguish our each lender. In these two strategies, two algorithms, namely study from other recommendation works very much. DPA and EVA, are designed. This portfolio selection step is • We develop two algorithms, DPA and EVA, which can also achieved online. The first two steps will be introduced select portfolios in two strategies effectively. Particu- in the rest of this section, and the portfolio selections will be larly, EVA optimizes for all lenders one by one with a introduced in the next section. special inherited initialization, which is more effective and efficient than conventional algorithms. 2.2 Active Lenders and Rate Preferences • We construct extensive experiments on a real-world da- According to the data analysis and observation, we find ta set. The experimental results clearly demonstrate that, in a certain period of time, only a small part of lenders the effectiveness of our solutions. (usually less than 10%) rather than all lenders will bid on current loans. We call these lenders who are most likely to 2. PRELIMINARIES AND ASSESSMENTS bid in the following period active lenders. Identifying active In this section, we first introduce the preliminaries of port- lenders in advance can achieve accurate service of portfolio folio selections. Then, we identify the active lenders and selection, also improve the service efficiency and user experi- learn their preferences on rate expectation. Finally and most ence. Indeed, most lenders often bid periodically, and invest Algorithm 1: ALI: Active lenders identification. Table 2: Feature examples. Name Description Class Input: Lenders U = {u1, ..., u|U|}, Time threshold TR, Bor Rat the maximum interest rate the borrower is Current time CT ; willing to pay Loan Output: Active Lenders UA = {u1, ..., u|UA|}, Category borrowing purpose u (static) Rate expectation preference Rei of ui ∈ UA; Cre Dat the created date of the listing of this loan Initialize: UA = φ; ...... for each ui ∈ U do Deb Inc debt to income ratio of the borrower Borrower if ui’s latest bid time in the range of TR before CT then Credit credit grade of the borrower (static) UA = UA ∪ {ui}, ...... u Compute Rei according to Equation (1); Am Rem the amount which remains to be funded Auction Auc Pro time interval from Cre Date to current CT return UA and Reu = {Reu, ..., Reu }. (dynamic) 1 |UA| ...... Def Per lenders’ past default loan percent Lender in a series of loans continuously. Thus, the lenders who bid Fun Per lenders’ past fully-funded loan percent (dynamic) in recent days have higher probabilities to invest in the fol- ...... lowing period than other lenders. In our study, we select among bids on a loan will take place and some bids will fail. the lenders who have bidding behaviors in a time range TR Thus, winning-bid probability reflects the biding competi- (e.g., 10 days) before current time CT as our current target tion on a loan [4, 18]. users, i.e., active lenders. Non-default probability Rj mainly affects lenders’ profits, In investment, lender’s preference is mainly reflected in while fully-funded probability Tj and winning-bid probabil- her return expectation. In P2P lending, the interest rate of ity Cj mainly affect the investment efficiency, i.e., helping a loan declared before auction is often treated as the return avoid invalid bids. In summary, for each being-auctioned [23, 34]. Thus, we can get a lender’s preference on return loan vj , we can adopt a three-element vector Pj = [Rj ,Tj ,Cj ] rate through her historical investments. Specifically, lender to denote its profile. Next, we will introduce how to estimate u ui’s preference Rei on rate expectation can be calculated as: these profile terms on specific objectives for given loans. u 1 X v Rei = Rej , (1) |VUi| 2.3.1 Features for Assessments vj ∈VUi Here, we introduce extracting features for loan assess- where VUi is the loan set that lender ui invested in the v ments. In previous works, researchers explored some clas- past, Rej is the declared interest rate of loan vj . Algorithm 1 shows the process of identifying active lenders and the sification or regression models on some single objective as- calculation of their preferences on rate expectation. sessment tasks, e.g., default [29, 9, 34], fully-funded [14, 28]. However, these studies only explored the static features from 2.3 Multi-objective Assessments for Loans both loan (e.g., rate, amount, purpose,) and borrower (e.g., In this subsection, we show the way of assessing the profiles credit, debt). In [23], authors evaluated a loan by the char- of being-auctioned loans on multiple objectives. According acteristics extracted from the bidding lenders, i.e., average to the previous study [4], there are three major economic and variance of the past real returns of bidding lenders. In- factors that lenders take into account when deciding to bid deed, a loan receives bids from lenders one by one during on a loan: lenders’ belief about the probability of a loan be- its auction. Thus, the features extracted from the lenders ing fully funded, the probability of winning the bid, as well who have bid on this loan are dynamic. Besides, we can as the interest rate. Similarly, we also formalize the loan also extract some dynamic features from loan auctions, e.g., assessment from a multi-objective view. In our study, we auction phase/time, percent fund. These dynamic features in assess a loan v on the following three objectives. auction reflect the popularity of a loan directly. In summary, j in this paper, we explore both static features and dynamic Non-default Probability(Rj ). Since the loan interest rate v features for better multi-objective loan assessments. Rej is given explicitly in market and often taken as a search- ing criterion by many lenders, we take the rate expectation Static Features. Static features are given explicitly before as one important constraint, and formalize another widely- auction, which are extracted from the loan’s properties and used assessment metric: Risk, i.e., default probability [23, the associated borrower’s properties. These static features 19, 34]. For consistency, we maximize loans’ non-default directly reflect the basic properties of loans and borrowers. probabilities rather than minimizing default probabilities on For example, borrowers with high credits are more likely to risk assessment. Formally, non-default probability is the es- repay in time than borrowers with poor credits. timated probability that one loan may repay the principal Dynamic Features. Dynamic features are extracted from and interest to lenders in time. the temporal auction and incremental bidding lenders of a loan, which are changing from time to time during an auc- Fully-funded Probability(Tj ). Fully-funded probability is the estimated probability that one loan may receive e- tion. Dynamic features can reflect the popularity of a loan. nough bids in its auction. According to the trading rule, the For example, popular loans may receive massive bids rapidly, T ransactions are valid only if the corresponding loan can re- thus, are more likely to be fully funded. Further, the dynam- ceive enough bids. Thus, fully-funded probability is another ic features extracted from lenders reflect the lenders’ views, important aspect to assess loans [14, 4]. such that, loans received many bids from experienced lender- 4 s (lenders whose investments success frequently and default Winning-bid Probability (Cj ). Winning-bid probability is the estimated probability that a lender’s bid on this loan rarely) may be better than other loans. Especially, dynamic under current loan status will finally success or participate features, Def Per, Fun Per, Win Per, extracted from the in- after the loan’s entire auction. Since some popular loans may cremental bidding lenders’ historical investments in line with receive more bids than their required amount, C ompetition our three assessment objectives, will have great help to the prediction performances on corresponding objectives. 4 This objective doesn’t exist on the loans with posted rates [31]. We represent all the features as numerics. For temporal features, such as Cre Dat, we convert raw features to a se- the weight coefficients γm is calculated by: rial date number, which represents the whole and fractional n number of days from a fixed preset date [6]. For categorical X ∂L(yi,Gm−1(zi)) γm = arg min L(yi,Gm−1(zi) − γ ). (5) γ ∂G (z ) features, such as Category and Credit, we convert a vari- i=1 m−1 i able with n categories into a n-dimensional binary vector, Repeat this building trees process until h (z), and then in which only the value in the corresponding category is set M G[(z) = G (z). After we get the models GBDT i, i ∈ {1, 2, 3}, to one. All features are normalized for comparability. For M on three objectives. For each being-auctioned loan v , we example, Def Per represents the fraction of all default loans j can get its estimations on three objectives, i.e., the terms of bidding lenders to their total bidding number in the past. in profile P by the model output probabilities on positive Please note that, for non-default objective and fully-funded j classes of these objectives. objective, we extract dynamic features every day, while for winning-bid objective, we extract dynamic features every bid GBDT i V A, {v1, ..., v|VA|} −−−−−−−−→ {(R1,T1,C1), ..., (i.e., the features for winning-bid objective are constructed i∈{1,2,3} (6) according to the latest bid on this loan at current). Thus, (R|VA|,T|VA|,C|VA|)}. the training instance numbers are different on different ob- jectives but the feature dimensions are same on all objec- For the scale consistency of different objectives, we re- tives. In summary, we use a total of 26 features (i.e., 9 loan process the values in each dimension of Pj by their relative features, 4 borrower features, 7 auction features and 6 lender values, e.g., inverse ranking values in all the being-auctioned features) in our study and some examples of these features loans. For example, Pj = [101, 111, 51] means loan vj outper- are shown in Table 2. forms other 100, 110, 50 loans on three objectives respec- tively. Now, we assessed the being-auctioned loans on three 2.3.2 Loan Assessments objectives, in the following, we will make our main effort to After the feature extraction and preparation, in this part, help active lenders select portfolios. we introduce the assessment models. Specifically, we adopt the gradient boosting decision tree (GBDT ) [10, 12] to assess 3. PORTFOLIO SELECTIONS loans, which has been used to predict P2P lending transac- In this section, we introduce the detailed information of tion in [6]. The reason for choosing GBDT is as follows. portfolio selections. Generally, portfolio is a famous theo- First, we should evaluate the being-auctioned loans with nu- ry in finance that attempts to maximize portfolio expected merical or ranking output which is the foundation in the return for a given amount of portfolio risk, or equivalently following portfolio selection formalization. GBDT can esti- minimize risk for a given level of expected return, by careful- mate the probabilities (non-default probability, fully-funded ly choosing the proportions of various assets [24]. Similarly, probability and winning-bid probability) for each loan. More in our study, we maximize the formalized multiple objec- u importantly, GBDT is an ensemble method where an indi- tives of portfolio for a given level of expected return (Rei ). vidual learner is a decision tree which only uses one variable In P2P lending, for a given lender, her bidding amount on at each node when it is trained/constructed as well as when specific loans are not significantly different, which are often it is applied to test data. This characteristic prevents us the smallest allowed amount (i.e., $25) or integer multiple of from worrying about heterogeneity in the features we gener- smallest amount (e.g., $50). Thus, when selecting portfolios, ated [6]. What’s more, compared with conventional machine we don’t care about the investment amount of a lender and learning models, the performances of ensemble methods are we mainly focus on solving a discrete selection optimization significantly better and well demonstrated [10, 32, 15, 1]. problem. Meanwhile, we also assume “lenders are experi- Suppose we have n training instances {(z1, y1), ..., (zn, yn)} enced and rational”, which is one of the most fundamental for a specific task (e.g., non-default objective), where zi is assumptions in economics [24, 26]. the feature vector of loan instance vi, and yi is the label of vi Specifically, given the active lenders UA with their rate u on this objective (e.g., if vi repay in time, yi=1; otherwise, expectation preferences Rei , i ∈ {1, ..., |UA|} and the pro- yi=0) . We train GBDT (G(z)) with M weak learners, each files of being-auctioned loans i.e., Pj , j ∈ {1, ..., |VA|}, we is a decision tree h(z) with a weight coefficient γm, propose two strategies, i.e., weighted objective optimization M strategy and multi-objective optimization strategy to help X G(z) = γmhm(z). (2) lenders select loan portfolios. The first strategy combines m=1 the three objectives into one single objective via a prede- Similar to other boosting algorithms, GBDT builds the ad- fined weighted vector and provides one optimal portfolio for ditive model in a forward stage-wise fashion. At each stage each lender through a dynamic programming algorithm that the decision tree hm(z) is chosen to minimize the loss func- we designed, i.e., DPA. Further, the multi-objective op- tion L given the current model Gm−1 and its fit Gm−1(zi). timization strategy optimizes all objectives simultaneously, and provides all the Pareto-optimal portfolios for each lender Gm(z) =Gm−1(z) + γmhm(z), through a novel evolutionary algorithm, i.e., EVA. n X (3) =Gm−1(z) + arg min L(yi,Gm−1(zi) − h(zi)). 3.1 Weighted Objective Strategy h i=1 Weighted objective strategy supposes there is a weight vec- GBDT attempts to solve this above minimization problem tor αi = (αi1, αi2, αi3) from lender ui to balance the impor- numerically via steepest descent, whose direction is the neg- tance of three objectives. This setting is practical since some ative gradient of the loss function evaluated at the current famous platforms, e.g., Prosper or Lendingclub allows users model Gm−1, to preset some criteria, e.g., their expected interest rates, n X before searching. The weight vector can also be integrated Gm(z) = Gm−1(z) + γm 5GL(yi,Gm−1(zi)), (4) i=1 into the user profiles. Through the multi-objective assessments in section 2.3, we Algorithm 2: DPA: Dynamic programming algorithm. get the profile terms of each loan, i.e., Pj . Thus, for lender u Input: A given lender ui, with her rate expectation Rei , ui, we can obtain a single weighted objective on all being- All being-auctioned loans VA, and each loan vj ∈VA, P|VA| T with its declared rate Rev and objective profit P r ; auctioned loans VA: f(x) = α Pj xj , where xj is the j j j=1 i Output: Portfolio selection vector x, boolean selecting label of loan vj . In this case, the portfolio Maximum objective profit f(x); selection problem for lender ui can be formalized as: Initialize: x, xj = 0, j = 1, ..., |VA|; D[0][0] = 0, other D[p][q] = −1; |VA| All G[p][q] = φ; X T max f(x) = α Pj xj , for j = 1 to |VA| do x i j=1 for p = K-1 down to 0 do for q = p*Rmax down to 0 do |VA| v (7) if (D[p][q] ≥ 0) ∧ (D[p][q] + P rj > D[p + 1][q + Rej ]) 1 X v u s.t. Rej xj ≥ Rei , then ||x||0 v j=1 D[p + 1][q + Rej ] = D[p][q] + P rj , v G[p + 1][q + Rej ] = G[p][q] ∪ {j}; K ≥ ||x||0, where x = (x1, ..., x|VA|) is the selection vector and xj =1 maxD=0,maxp=maxq=0; for p = 1 to K do means the j-th loan in VA is selected and put into ui’s port- 5 for q = 0 to K*Rmax do folio. K is the portfolio size which is given by lender ui . The u if q/p ≥ Rei ∧ D[p][q] > maxD then objective of this function is to maximize the weighted single maxD = D[p][q], objective, i.e., f(x). The first constraint means that the s- maxp=p, maxq=q; elected portfolio should satisfy the lender ui’s preference on u for j ∈ G[maxp][maxq] do rate expectation Rei . This formalization is a discrete con- xj = 1; strained optimization problem, which is NP-hard and will return x, and f(x) = maxD. PK |VA| need i=1 i computations in the worst case. This problem is difficult to solve directly by convention- be respectively defined by their corresponding profile terms, al algorithms since there are two constraints. In this study, i.e., Pj = [Rj ,Tj ,Cj ], j ∈ {1, ..., |VA|}. Thus, we design a dynamic programming algorithm, namely DPA, |VA| |VA| |VA| X X X with two main loops to obtain the optimal solution. First, f1(x) = Rj xj , f2(x) = Tj xj , f3(x) = Cj xj . (8) v we convert the declared rate value Rej of each loan in- j=1 j=1 j=1 to a positive integer representation, e.g., 10.50% → 1050 Pareto-optimal portfolio set (A* ). Suppose x and x0 v (Rej is a percentage with two decimal places in our data), are two feasible solutions/portfolios in the solution space Ω. and denote the largest rate value as Rmax, (i.e., Rmax = x is said to dominate x0 (denoted as x ≺ x0) if and only v 0 maxj∈{1,...,|VA|} Rej ). The weighted objective value of each if ∀e ∈ {1, 2, 3}, fe(x) ≥ fe(x ) and ∃e ∈ {1, 2, 3}, fe(x) > T 0 loan vj is donated as P rj (i.e., P rj = αi Pj ). DPA is shown fe(x ). Thus, a solution x∗ ∈ Ω is Pareto-optimal if there in Algorithm 2, in which D[p][q] denotes weighted objective is no other x ∈ Ω dominates x∗ , i.e., ¬∃x ∈ Ω, x ≺ x∗. of the portfolio with p loans and q summation of rate. The set A* of all Pareto-optimal solutions/portfolios in the In DPA, the first loop only considers the second con- decision space is called Pareto-optimal or skyline portfolio straint in problem (7) and can get all the optimal selections set, and each of those solutions can be treated as a specific under any p and q, and the computational complexity is trade-off among these contradictory objectives. 1 2 O( 2 |VA|K Rmax). Further, the second loop gets the final In fact, Pareto optimality is a state of allocation or selec- optimal solution by satisfying the first constraint, and the tion in which it is impossible to make any objective better computational complexity is O(K2Rmax). Please note that, off without making at least one objective worse off. Each only the second loop takes the lender’s rate expectation into selection portfolio in Pareto-optimal set is assessed under consideration, thus the optimal selections provided by the multiple objectives and no other option can categorically first loop are the same for all lenders. If we want to get the outperform any of the members in this Pareto-optimal set. final solutions for all lenders, we need to compute for all of In the multi-objective optimization strategy, we try to get them in the second loop, and thus, the overall computational the Pareto-optimal portfolio set as the recommendation can- 1 2 2 complexity is O(max{ 2 |VA|K Rmax, |UA|K Rmax}). didates to each lender. The multi-objective optimization Through DPA, we can get an exact optimal portfolio for problem for lender ui can be formalized as follows: each lender. However, in many cases, for most lenders, they max F(x) =(f1(x), f2(x), f3(x)), are ambiguous or not sensitive about the weights on the mul- x tiple objectives. Thus, we propose another strategy, multi- |VA| 1 X v u s.t. Re xj ≥ Re , (9) objective optimization, to solve portfolio selections. ||x|| j i 0 j=1

3.2 Multi-objective Optimization Strategy K ≥ ||x||0. Different from the weighted objective optimization strat- This formalized problem is much more complicated than egy, multi-objective optimization strategy does not need a the weighted optimization problem (7). Getting the exact weight vector and it optimizes the multiple objectives simul- Pareto-optimal solution set for this problem is unpractical. taneously and obtains a Pareto-optimal portfolio set instead Inspired by previous studies on multi-objective optimization of a single portfolio for each lender. In this strategy, the [7, 8, 33], evolutionary algorithms are well-suited to solve three objective functions on being-auctioned loans VA can multi-objective optimization problems because their inher- 5 ent parallelism allows them to find a set of Pareto-optimal Lenders may have different portfolio size K. In practice, we group lenders based on their K values. At each time, DPA precess all solutions in a single run. Being population-based stochas- lenders in a group together with a same K, which is the same in EVA. tic search approaches, evolutionary algorithms use concepts of population and of recombination inspired by Darwinian Algorithm 3: EVA: Evolutionary algorithm. evolution. An iterative process is executed, initialized by a Input: Active lenders UA = {u1, ..., u|UA|}, u u randomly chosen population (portfolios) which is also called their preferences on rate expectation {Re1 , ..., Re|UA|}; chromosome (i.e., x in our study, which also conforms bi- All being-auctioned loans VA; and each vj ∈VA, v nary encoding). In each generation, new solutions or off- with its declared rate Rej and assessed profile Pj ; Output: Pareto-optimal portfolio selections springs are generated through recombination and mutation. ∗ 1 i Ai = {x , ..., x , ...} for lender ui, i ∈ {1, ..., |UA|}; u Besides, selection operation in each generation leads the evo- Initialize: Rank UA according to Rei in descending order; lutionary process according to the defined fitness function. for i from 1 to |UA| do if i==1 then For multi-objective problems, fitness functions are often de- 1 N Initialize N solutions A1 = {x , ..., x } randomly; signed based on Pareto domination, e.g., [8, 16]. Evolution- Go to Evolution; ary algorithms are often used to solve many complicated else 1 N−|Ai−1| problems with providing approximate solutions efficiently. Initialize N − |Ai−1| solutions {x , ..., x } ∗ 1 N−|Ai−1| However, the conventional multi-objective evolutionary al- randomly, Ai = Ai−1 ∪ {x , ..., x }; gorithms, e.g., NSGA-II [8], MOEA/D [33], were designed Evolution: for one multi-objective optimization problem rather than a for x ∈ Ai do Repair x according to Equation (10); series of similar problems, e.g., portfolio selections for al- Reproduction: l lenders in our study. Thus, the conventional algorithms for j from 1 to N do have to run |UA| times independently for all active lenders Select two individuals randomly from Ai; in our problem which will take much computation cost. In Generate one offspring x, repair x, and Ai = Ai ∪ {x}; fact, for many lenders, their selection problems are similar, Select N x from Ai according to their dominations; Stopping Criterion: and the only difference is the rate expectation preference If stopping criterion is not satisfied, go to Reproduction; u Rei in the second constraint. Especially for some lenders Post-processing: Remove the dominated solutions from A , and get A∗; with similar preferences, their final optimal selections may i i return A∗. also be very similar. Thus, there is no need to reprocess from i beginning for all lenders. In this study, we propose a new In summary, for portfolio selections, we first identify the evolutionary algorithm, namely EVA, to solve these series of active lenders in market and get their preferences on rate ex- optimizations for all active lenders more effectively. pectation, and meanwhile, assess the being-auctioned loans EVA first ranks all active lenders in descending order ac- on multiple objectives. Further, we propose two strategies cording to their rate expectation preferences Reu. Then, we i to help active lenders select portfolios. Weighted objective have the following theorem: strategy works efficiently and provides an optimal portfolio ∗ Theorem 1. If Ai is the Pareto-optimal solution set for for each lender, which depends on a weighted vector from ∗ lender ui, each solution x ∈ Ai must be a feasible solution lenders. Multi-objective optimization strategy can automat- 0 (satisfy constraints) for each lender ui0 , i ∈ {i + 1, ..., |UA|}, ically provide an approximate Pareto-optimal portfolio set ∗ and Ai is a good approximation of the Pareto-optimal solu- for each lender. From the formalizations of these two strate- tion set of ui’s next neighbor lender, i.e., ui+1. gies, we can see that both of them can easily expand to other or more objectives. We will evaluate the advantages Considering Theorem 1, we can simplify and speed up and disadvantages of these two strategies in experiments. the convergence processes for lenders ui+1, i ∈ {1, ..., |UA|} by taking ui’s Pareto-optimal set as initialization. Besides, 4. EXPERIMENTS EVA adopts a random greedy strategy to repair the infea- In this section, we will construct extensive experiments on sible individual solutions during evolution. Suppose x is an a large-scale real-world data set. First, we make data anal- 1 P|VA| v u infeasible individual, (i.e., Re xj < Re , or K < ||x||0 j=1 j i ysis and explore the lenders’ bidding behaviors from multi- ||x||0), in each repair, determine an objective e ∈ {1, 2, 3} objective and portfolio perspectives. Second, we evaluate randomly, and select j0 ∈ {1, ..., |VA|} such that: the performances of multi-objective assessments. Finally,

0 j− v we evaluate our portfolio selections holistically. j = arg min (fe(x) − fe(x ))Rej , (10) j∈{1,...,|VA|} 4.1 Experimental Data and Analysis j− where x is different from x only in the position j, and The experimental data set is collected from one famous j− 6 xj = 0. Set xj0 = 0, and repeat repairing until x is feasi- P2P lending platform in America, i.e., Prosper . This data ble. What’s more, EVA selects population in each generation contains the transaction records in this platform of almost 6 according to their dominations, which is same with [8, 16]. years. We mainly use three tables of this data set. Listing The whole process of EVA is shown in Algorithm 3. table contains the loan temporal status on auction and some 2 In each iteration, the computational complication is O(N ) basic credit features of borrowers. This table is mainly used (N is the population size) since EVA adopts the same selec- to extract static features. Bid table contains the specific time tion operation with that in [8]. Thus, the whole computa- and some information of bid, e.g., bid amount of money, and 2 tional complication is O(|UA|GN )(G is the iteration gen- the bidding result of each lender on a certain loan. These erations). In fact, the iteration times for lenders ui+1, i ∈ investment records are the basis to construct dynamic fea- {1, ..., |UA|} are much less than that for u1 because of the tures. Loan table is used to evaluate the performances of a special initialization approach in EVA. loan, e.g., default and fully-funded. Each solution in the Pareto-optimal set is the best under We partition the data into five groups, and take four group- a certain tradeoff of multiple objectives. After getting the s as training data and the remaining one group as test data. Pareto-optimal portfolio set, a lender can select one portfolio 6 from this set autonomously or randomly. https://www.prosper.com/tools/DataExport.aspx Table 3: Experimental data statistics. 0.4 #Loan #Lender #T r1 #T r2 #T r3 #VA #UA 387,848 62,782 19,311 310,159 5,355,873 2,233 4,800 0.3 In experiments, we adopt five-fold cross-validation and al- l results are the average of five test rounds. The number 0.2

of training instances for GBDT i is denoted as T ri, and the Rate number of test instances (being-auctioned loans) is denoted as T e (i.e.,VA). In the test process or portfolio selections, we 0.1 select portfolios for lenders every a certain period, e.g., two weeks, instead of every day. This is because loans being auc- 0 tioned on neighbor days are almost completely overlapped. 1 2 3 4 5 6 7 8 9 10 Table 3 shows the basic statistics of the experimental data. Lenders Figure 4: Rate v.s. single loan and portfolio. 4.1.1 Analysis from Multi-objective Perspective haviors in the previous cumulative days (i.e., TR) before In this part, we analyze the lenders’ bidding behaviors current time CT in each test round, and get their bidding from the multi-objective perspective. Figure 2(a) shows the results in the following auctions. We can see that, identi- lender distribution on risk, where the X-axis represents the fied small part of lenders (blue bars) contribute the most non-default loan percent in one lender’s past investments and bids (red line), e.g., 6% lenders contribute 85% bids. Thus, Y-axis represents the lender percent. The blue bars are raw in portfolio selections, we mainly take the lenders who have results and red line is fitting curve by Gauss distribution. bids in previous 10 days before current time CT as the cur- Figure 2(b) shows the lender distribution on fully-funded rent active lenders. objective, in which the lender distribution is not uniform. 7 85 Average Bid Percent Figure 2(c) shows the lender distribution on winning-bid Average Lender Percent 80 objective. We can see that, only small part of lenders’ bid- 6 s perform well. In other words, many lenders often suffer 75 from risk and investment failure. Our goal is to help lenders 5 select loan portfolios with higher probabilities on all objec- 70 tives. Figure 2(d) shows the lender distribution on lenders’ 4 u 65 rate expectations (Rei ) which is treated as a personalized

3 Average Bid Percent (%) restrict in portfolio selections. We can see that most lenders Average Lender Percent (%) 60 prefer medium and similar rates, e.g., 0.1-0.15, rather than

2 highest rates. This characteristic of similar rate expectation 1 2 3 4 5 6 7 8 9 10 is taken used when designing DPA and EVA. Previous Cumulative Days (TR) Figure 5: Lender/bid percent v.s. TR. 4.1.2 Analysis from Portfolio Perspective 4.2 Evaluations of Loan Assessments In this part, we analyze lenders’ bidding behaviors from In this subsection, we evaluate the performances of multi- portfolio perspective. We randomly select 10 lenders with objective assessments for loans. Specifically, we evaluate more than 100 bidding records. Firstly, we obtain individu- both the utilities of dynamic features and the effectiveness al loans for each lender, besides, we partition each lender’s of GBDT models on different objectives. bidding loans into different portfolios based on the time in- tervals between two neighbor bids, i.e., if the interval of two 4.2.1 Setups and Baselines neighbor bids is more than 30 days, we partition them in- We denote the GBDT models using all features as GBDT, to different portfolios. Thus, we get the loan portfolios for and GBDT models only using static features as GBDT S. each lender, and we compute the average rate of loans in one We also adopt two comparison models, e.g., Logistic Regres- portfolio as this portfolio’s rate. Figure 4 shows the rates of sion and Decision Tree, which have been used in previous lenders using box plot, in which the blue boxes represent the studies on P2P lending [9, 34, 22]. Similarly, we denote the rates of single loans and green boxes represent the rates of comparison models as LR, LR S, DT, DT S respectively. portfolios of lenders. We can see that, different lenders have All these models get their best performances and parame- different rate preferences; furthermore, for a specific lender, ters through training and validation processes. We adopt portfolio rates are much more stable and focused than sin- two widely-used metrics, i.e., ROC curve and AUC [2] (area gle loan rates. In other word, rate expectation constraint or under ROC curve) to evaluate the assessment results. preference based on portfolio is more reasonable than that based on single loans. In fact, in an investment, a lender 4.2.2 Assessment Results may bid several loans whose rates may be quite different, The assessment results are shown in Figure 3. We can see but the portfolio rate or average rate of these loans is stable that, on three objectives, extracted dynamic features can im- and personalized. This finding demonstrates the rationality prove the performances of all models significantly, especial- of portfolio recommendation in our study rather than con- ly on the fully-funded objective prediction, i.e., about 10% ventional single loan recommendation. improvements on AUC of all models. On the other hand, GBDT models, i.e., GBDT and GBDT S, outperform other 4.1.3 Result of Active Lender Identification models on all three assessment tasks, especially on the non- In this part, we report the result of identifying active default objective with more than 10% improvements. Thus, lenders. Figure 5 shows the statistical result of active lender in portfolio selections, we adopt GBDT with all features to identification. We select the lenders who have lending be- assess the being-auctioned loans. 6.8 1.97 2.96 6.84

1.48 5.13 1.97

3.42 3.4 0.98

0.98 0.49 Lender Percent (%) Lender Percent (%)

1.71 Lender Percent (%) Lender Percent (%)

0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0.05 0.1 0.15 0.2 0.25 0.3 Non-default Loan Percent of Lender's Investments Fully-funded Loan Percent of Lenders' Investments Winning-bid Percent of Lenders' Investments Average Rate (Rate Expectation) (a) (b) (c) (d) Figure 2: Lender distributions/percents v.s. different objectives.

1.0 1.0 1.0 0.9

0.8 0.8 0.8 0.7

0.6 0.6 0.6 0.5 Random Random Random LR (AUC=0.76) LR (AUC=0.97) LR (AUC=0.80) 0.3 0.4 LR_S (AUC=0.73) 0.4 LR_S (AUC=0.79) 0.4 LR_S (AUC=0.78) DT (AUC=0.78) DT (AUC=0.98) DT (AUC=0.83) True PositiveTrue Rate DT_S (AUC=0.76) PositiveTrue Rate DT_S (AUC=0.88) PositiveTrue Rate DT_S (AUC=0.82) 0.1 0.2 0.2 0.2 GBDT (AUC=0.89) GBDT (AUC=0.99) GBDT (AUC=0.86) GBDT_S (AUC=0.86) GBDT_S (AUC=0.89) GBDT_S (AUC=0.82) -0.1 0.0 0.0 0.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Winning-bid Fully-funded Non-default False Positive Rate False Positive Rate False Positive Rate Non-default Fully-funded Winning-bid (a) Non-default assessments (b) Fully-funded assessments (c) Winning-bid assessments (d) Correlations Figure 3: Prediction performances of assessments. Besides, we also show the Pearson correlations of assessed adopts random initializations and needs 150 generations for loans’ profile values on different objectives in Figure 3(d). all lenders. The other setups in EVAR are the same with This figure tells us that the correlations between loans’ esti- these in EVA. Similar to EVA series, we also get EV AR1, mated profile values on different objectives are not significant EV AR2, EV AR3, and EV ARR. (i.e., values are less than 0.5). Thus, we can’t get the op- SELF . We also get the results selected by lenders them- timal loans on all objectives by recommending or selecting selves or recommending by criteria match, e.g., rate match, loans through conventional one-objective techniques. in Prosper, which are denoted as SELF . 4.3 Evaluations of Portfolio Selections Metrics. We adopt the real percents of loans with posi- tive labels on different objectives (Non-default, Fully-funded, In this subsection, we evaluate our portfolio selection ap- Winning-bid) in the selected portfolio as our evaluation met- proaches holistically. rics. Besides, we simulate the bidding and repayment pro- 4.3.1 Experimental Setups cesses of selected portfolios, i.e., a good loan means this loan In EVA, the population size is set as 100; the generations can be fully funded, a lender bid at current will success after are set as 150 for first lender and 30 for other lenders; the its auction and this loan will repay in time. Thus, through crossover rate is set as 0.5; and the mutation rate is set as simulation, we can compute the average return rates of port- 50K/|VA|. The portfolio size K is given by each lender. In folios for lenders theoretically, which can be treated as an selections, actually, we can remove some individual loans in overall metric considering three objectives. In our data, for advance that are dominated by many (i.e., 50) other loan- repayment, we only know the default boolean labels without s before selecting, since they are almost impossible to be knowing the specific installments. Thus, when computing chosen into any portfolios. the simulative average return rates, we suppose the default- Baseline Methods. To the best of our knowledge, there ing borrowers will default a certain percent principal amount, are not relevant works on portfolio selections from multi- which is set as 30% and 60% respectively, (i.e., Return-0.3 ple objectives in P2P lending or other domains. Thus, in and Return-0.6 are our two overall metrics). this experiment, we compare DPA, EVA and their variants. Specifically, we set the comparison methods as follows. 4.3.2 Experimental Results DPA series. For DPA, we adopt multiple different param- The selecting results are shown in Table 4. We can see eters α, (i.e., (0.5, 0.3, 0.2),(0.3, 0.4, 0.3),(0.3, 0.5, 0.2),(0.2, that, DPA series can get good results with some certain 0.6, 0.2)), and the corresponding algorithms are denoted as tradeoffs on different objectives and EVA series perform best, DPA1, DPA2, DPA3, DPA4. i.e., EVA1, EVA2, EVA3 perform best on Non-default, EVA series. Since EVA provides a portfolio set rather than Fully-funded, Winning-bid metrics respectively. Compara- a single portfolio for each lender, for comparison, we respec- tively, EVAR series also perform well and a little worse than tively select the portfolios with maximum non-default objec- EVA series in most cases. However, EVAR needs much more tive, fully-funded objective and winning-bid objective from time cost than EVA which will be reported later. Even the the Pareto-optimal candidate set as the final selections. The random selections, i.e., EVAR, from Pareto-optimal set are corresponding methods are denoted as EVA1, EVA2 and much better than the lenders’ own selections on all three EVA3. Besides, we can also select a portfolio randomly metrics. On the two overall metrics, i.e., Return-0.3 and from the Pareto-optimal portfolio set as the final result, and Return-0.6, most methods will improve lenders’ returns to this method is denoted as EVAR. varying degrees. Further, EVA3 and EVA1 can provide the EVAR series. As a variant of EVA, EVAR is a conven- highest average return rates, i.e., 11.5% and 10.3% return- tional evolutionary algorithm similar to EVA, which always s, respectively. In other words, EVA could provide lenders Winning-bid 1 research topics in P2P lending. For assessing the loan risk or EVA2 0.9 credit, some conventional classification models were adopt-

0.8 ed, such as Logistic Regression [9, 34] and Neural Network [3]. Fully-funded probability is another important aspect to 0.9 0.7 assess loans in P2P lending. Herzenstein et al. [14] stud- EVA3 ied both the borrower-related determinants and loan loan- 0.6 related determinants of funding success in Prosper. Their re-

DPA1 0.5 sults indicated that borrower-related financial determinants Fully-funded 0.8 DPA2 affected funding the most, while loan-related variables me- DPA3 0.4 DPA4 diate affected the likelihood of funding success. Ryan et al. EVA [28] proposed two regression models combining personal and EVAR 0.3 social determinants and financial determinants for funded 0.7 EVA1 0.6 0.7 0.8 0.9 1 percent and number of bids respectively. These works fo- Non-default cused on a certain objective assessments, e.g., risk or fully- Figure 6: Solutions found by different algorithms. funded, and none of them adopted or combined the dynamic with more economic profits. SELF and several method- features extracted from lenders and auctions. s with low performances on non-default objective even get Bidding analysis especially herding [4, 18], and social com- negative return rates on Return-0.6. munity [13, 6] were also well studied in previous studies. In Figure 6 visualizes the solutions getting from different [6], authors found lender team was an important community methods for a random lender, in which one point represents a to help lenders make decision and promoted lending activ- portfolio. X-axis represents the non-default assessment val- ities in P2P lending. In [5], authors predicted how likely ues (relative values mapping into 0-1), Y-axis represents the a given lender would fund a new loan through a gradien- fully-funded assessment values and the color represents the t boosting tree method. In [34], authors proposed to rec- winning-bid assessment values. We can see that, DPA se- ommend single personalized loans to lenders by considering ries get one portfolio selection while EVA (and EVAR) gets both lender preference and loan risk. However, there are few a Pareto-optimal portfolio set firstly. In fact, DPA with studies in P2P lending from the multi-objective assessments a certain parameter α is a specific tradeoff on three objec- or select/recommend portfolios to lenders in P2P lending. tives. Further, the Pareto-optimal portfolio set getting via In addition to the studies in P2P lending, works on recom- EVA is a little better than EVAR’ since the point skyline mendation may be also relevant to our portfolio selections or envelope surface of EVA dominates EVAR’. That may to some extent. Recommender system provides suggestions be because EVA adopts the special initialization strategy of items that may interest users and aims to predict users’ which can lead to faster convergence and is more conducive preferences with high accuracy [21, 27]. However, in P2P to inherit the good individuals/solutions. lending or other financial domains, accuracy may be not as We also compare the efficiency results of different algo- important as in the traditional recommendations, e.g., elec- rithms. Since DPA only provides one solution for each tronic commerce. In [35, 30], portfolio theory was used on lender while EVA gets a solution set, for comparison, we recommendation and information retrieval, but these works also report the results of DPA running 50 (Pareto set size) still aim to get single items rather than combination items times, which is denoted as DPAP. The running time results like loan portfolios in our study. of seconds (in log scale) are shown in Figure 7. We can see that DPA and DPAP are most efficient. EVA needs more 6. CONCLUSIONS time than DPAP does. EVAR takes much more time costs In this paper, we proposed a holistic study on portfolio se- compared with EVA since it needs more evolutionary gener- lections in P2P lending. First, we assessed loans on multiple ations. These comparisons of EVA and EVAR demonstrate objectives by a gradient boosting decision tree. We extracted the effectiveness and efficiency of EVA. dynamic features from lenders and auctions for better assess- ment performances. Then, to help lenders select portfolios, we proposed two strategies, i.e., weighted objective optimiza- tion strategy and multi-objective optimization strategy. The

5 first strategy provided an optimal portfolio and the multi- 10 objective optimization strategy provided a Pareto-optimal portfolio set for each lender. In two strategies, two selection DPA algorithms, i.e., DPA and EVA were designed respectively. DPAP 104 EVA For evaluating our approach, we constructed extensive ex- EVAR periments on Prosper data. The analysis and experimental Running Time (log (Seconds)) results demonstrated the significance of our study and the

1 2 3 4 5 effectiveness of our solutions. Specifically, the extracted dy- Test Round namic features and GBDT models significantly improved the Figure 7: Running time results. assessment performances. Further, the portfolio selections, especially the multi-optimization strategy and EVA algorith- 5. RELATED WORK m provided lenders with more economic profits by selecting To the best of our knowledge, there are few existing works Pareto-optimal portfolios both effectively and efficiently. on portfolio selections in P2P lending. However, there are some relevant studies in the P2P lending domain, e.g., loan 7. ACKNOWLEDGEMENTS risk assessment [9, 23], bidding behavior analysis [4, 18]. This research was partially supported by National Sci- Risk assessment and fully-funded prediction are two hot ence Foundation for Distinguished Young Scholars of China Table 4: Performances of portfolio selections. Result\Alg DPA1 DPA2 DPA3 DPA4 EVA1 EVA2 EVA3 EVAR EV AR1 EV AR2 EV AR3 EV ARR SELF Non-default 0.831 0.789 0.734 0.731 0.952 0.714 0.856 0.806 0.923 0.715 0.857 0.801 0.710 Fully-funded 0.178 0.220 0.323 0.342 0.256 0.543 0.214 0.289 0.256 0.529 0.202 0.267 0.123 Winning-bid 0.778 0.856 0.833 0.802 0.824 0.689 0.921 0.843 0.801 0.678 0.902 0.810 0.741 Return-0.3 0.067 0.073 0.069 0.070 0.112 0.080 0.115 0.105 0.105 0.085 0.097 0.086 0.057 Return-0.6 0.035 0.021 -0.026 -0.027 0.103 -0.063 0.069 0.044 0.099 -0.049 0.064 0.032 -0.046

(61325010), Natural Science Foundation of China (61403358, many-objective optimization. In Evolutionary 61572032, 71571093), National High Technology Research Multi-Criterion Optimization, pages 307–321. Springer, and Development Program of China (2014AA015203) and 2013. National Center for International Joint Research on E-Business [17] M. Kumar and S. I. Feldman. Internet auctions. In Proceedings of the 3rd USENIX Workshop on Electronic Information Processing (2013B01035). Qi Liu gratefully ac- Commerce, volume 31, 1998. knowledges the support of the Youth Innovation Promotion [18] E. Lee and B. Lee. Herding behavior in online p2p lending: Association of CAS and acknowledges the support of the An empirical investigation. Electronic Commerce Research CCF-Tencent Open Research Fund. and Applications, 11(5):495–503, 2012. [19] M. Lin, N. R. Prabhala, and S. Viswanathan. Judging 8. REFERENCES borrowers by the company they keep: friendship networks [1] R. J. Agrawal and J. G. Shanahan. Location disambiguation and information asymmetry in online peer-to-peer lending. in local searches using gradient boosted decision trees. In Management Science, 59(1):17–35, 2013. 18th SIGSPATIAL, pages 129–136. ACM, 2010. [20] D. Liu, D. Brass, Y. Lu, and D. Chen. Friendships in online [2] A. P. Bradley. The use of the area under the roc curve in peer-to-peer lending: Pipes, prisms, and relational herding. the evaluation of machine learning algorithms. Pattern Mis Quarterly, 39(3):729–742, 2015. recognition, 30(7):1145–1159, 1997. [21] Q. Liu, E. Chen, H. Xiong, C. H. Ding, and J. Chen. [3] A. Byanjankar, M. Heikkila, and J. Mezei. Predicting credit Enhancing collaborative filtering by user interest expansion risk in peer-to-peer lending: A neural network approach. In via personalized ranking. Systems, Man, and Cybernetics, Computational Intelligence, 2015 IEEE Symposium Series Part B: Cybernetics, IEEE Trans. on, 42(1):218–233, 2012. on, pages 719–725. IEEE, 2015. [22] B. Luo and Z. Lin. A decision tree model for herd behavior [4] S. Ceyhan, X. Shi, and J. Leskovec. Dynamics of bidding in and empirical evidence from the online p2p lending market. a p2p lending service: effects of herding and predicting loan Information Systems and e-Business Management, success. In 20th WWW, pages 547–556. ACM, 2011. 11(1):141–160, 2013. [5] J. Choo, C. Lee, D. Lee, H. Zha, and H. Park. [23] C. Luo, H. Xiong, W. Zhou, Y. Guo, and G. Deng. Understanding and promoting micro-finance activities in Enhancing investment decisions in p2p lending: an investor kiva. org. In 7th WSDM, pages 583–592. ACM, 2014. composition perspective. In 17th SIGKDD, pages 292–300. [6] J. Choo, D. Lee, B. Dilkina, H. Zha, and H. Park. To ACM, 2011. gather together for a better world: Understanding and [24] H. Markowitz. Portfolio selection*. The journal of finance, leveraging communities in micro-lending recommendation. 7(1):77–91, 1952. In 23rd WWW, pages 249–260. ACM, 2014. [25] E. Mollick. The dynamics of : An exploratory [7] C. A. C. Coello, D. A. Van Veldhuizen, and G. B. Lamont. study. Journal of Business Venturing, 29(1):1–16, 2014. Evolutionary algorithms for solving multi-objective [26] T. Odean. Volume, volatility, price, and profit when all problems, volume 242. Springer, 2002. traders are above average. The Journal of Finance, [8] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast 53(6):1887–1934, 1998. and elitist multiobjective genetic algorithm: Nsga-ii. IEEE [27] F. Ricci, L. Rokach, and B. Shapira. Introduction to Trans.Evo.Com., 6(2):182–197, 2002. recommender systems handbook. Springer, 2011. [9] G. Dong, K. K. Lai, and J. Yen. Credit scorecard based on [28] J. Ryan, K. Reuk, and C. Wang. To fund or not to fund: logistic regression with random coefficients. Procedia Determinants of loan fundability in the prosper. com Computer Science, 1(1):2463–2468, 2010. marketplace. WP, The Standord Graduate School of [10] J. H. Friedman. Greedy function approximation: a gradient Business, 2007. boosting machine. Annals of statistics, pages 1189–1232, [29] L. C. Thomas. Consumer Credit Models: Pricing, Profit 2001. and Portfolios: Pricing, Profit and Portfolios. OUP [11] Y. Guo, W. Zhou, C. Luo, C. Liu, and H. Xiong. Oxford, 2009. Instance-based credit risk assessment for investment [30] J. Wang and J. Zhu. Portfolio theory of information decisions in p2p lending. European Journal of Operational retrieval. In 32nd SIGIR, pages 115–122. ACM, 2009. Research, 249(2):417–426, 2016. [31] Z. Wei and M. Lin. Market mechanisms in online [12] T. Hastie, R. Tibshirani, J. Friedman, and J. Franklin. The peer-to-peer lending. Management Science, April 1, 2016. elements of statistical learning: data mining, inference and [32] J. Xie, V. Rojkova, S. Pal, and S. Coggeshall. A prediction. The Mathematical Intelligencer, 27(2):83–85, combination of boosting and bagging for kdd cup 2009-fast 2005. scoring on a large database. In KDD Cup, pages 35–43, [13] S. Herrero-Lopez. Social interactions in p2p lending. In 3rd 2009. Workshop on Social Network Mining and Analysis, page 3. [33] Q. Zhang and H. Li. Moea/d: A multiobjective ACM, 2009. evolutionary algorithm based on decomposition. IEEE [14] M. Herzenstein, R. L. Andrews, U. M. Dholakia, and Trans.Evo.Com., 11(6):712–731, 2007. E. Lyandres. The democratization of personal consumer [34] H. Zhao, L. Wu, Q. Liu, Y. Ge, and E. Chen. Investment loans? determinants of success in online peer-to-peer recommendation in p2p lending: A portfolio perspective lending communities. Boston University School of with risk management. In ICDM, pages 1109–1114. IEEE, Management Research Paper, (2009-14), 2008. 2014. [15] M. Jahrer, A. T¨oscher, and R. Legenstein. Combining [35] H. Zhu, H. Xiong, Y. Ge, and E. Chen. Mobile app predictions for accurate recommender systems. In 16th recommendations with security and privacy awareness. In SIGKDD, pages 693–702. ACM, 2010. 20th SIGKDD, pages 951–960. ACM, 2014. [16] H. Jain and K. Deb. An improved adaptive approach for elitist nondominated sorting genetic algorithm for