electronics

Article Predicting Fundraising Performance in Medical Campaigns Using Machine Learning

Nianjiao Peng 1 , Xinlei Zhou 2 , Ben Niu 1,3 and Yuanyue Feng 1,*

1 College of Management, Shenzhen University, Shenzhen 518061, China; [email protected] (N.P.); [email protected] (B.N.) 2 College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518061, China; [email protected] 3 Greater Bay Area International Institute for Innovation, Shenzhen University, Shenzhen 518061, China * Correspondence: [email protected]; Tel.: +86-18682456820

Abstract: The coronavirus disease (COVID-19) pandemic has flooded public health organizations around the world, highlighting the significance and responsibility of medical crowdfunding in filling a series of gaps and shortcomings in the publicly funded health system and providing a new fundraising solution for people that addresses health-related needs. However, the fact remains that medical fundraising from crowdfunding sources is relatively low and only a few studies have been conducted regarding this issue. Therefore, the performance predictions and multi-model comparisons of medical crowdfunding have important guiding significance to improve the fundraising rate and promote the sustainable development of medical crowdfunding. Based on the data of 11,771 medical crowdfunding campaigns from a leading donation-based platform called Weibo Philanthropy, machine-learning algorithms were applied. The results demonstrate the potential of ensemble-based machine-learning algorithms in the prediction of medical crowdfunding project fundraising amounts and leave some insights that can be taken into consideration by new researchers and help to produce   new management practices.

Citation: Peng, N.; Zhou, X.; Niu, B.; Keywords: crowdfunding; medical crowdfunding; machine learning; fundraising prediction; weibo Feng, Y. Predicting Fundraising philanthropy Performance in Medical Crowdfunding Campaigns Using Machine Learning. Electronics 2021, 10, 143. https://doi.org/10.3390/ 1. Introduction electronics10020143 Pandemics are increasingly becoming a constant threat to human beings [1,2]. As the Received: 8 December 2020 latest example, the coronavirus (COVID-19) pandemic, which has affected public healthcare Accepted: 4 January 2021 systems and caused severe economic and social distress worldwide [3–5], reminded people Published: 11 January 2021 of the significance of medical crowdfunding. As a novel and noteworthy fundraising chan- nel, medical crowdfunding helps with relieve the high medical cost burdens [6–8]. Among Publisher’s Note: MDPI stays neu- one type of donation-based crowdfunding supported by an online platform, medical crowd- tral with regard to jurisdictional clai- funding has arisen as a new form of charitable fundraising solution for individuals dealing ms in published maps and institutio- with financial pressures and reducing the possibility of personal or family bankruptcy from nal affiliations. the costs of medical treatment [9]. More importantly, medical crowdfunding plays an increasingly crucial complementary role when there are certain gaps and deficiencies (e.g., a lack of public insurance, difficulty in obtaining specialist care and long waiting time for treatment) in the public funded Copyright: © 2021 by the authors. Li- health system under a pandemic situation. Medical crowdfunding therefore is a good censee MDPI, Basel, Switzerland. This article is an open access article response to these inadequacies and represent a way to satisfy health-related needs [10]. distributed under the terms and con- The total amount of global crowdfunding platform financing reached US $72.3 billion in ditions of the Creative Commons At- 2017, about half of this amount was collected from medical crowdfunding [11]. Raisers tribution (CC BY) license (https:// could approach the public and raise the necessary funds for their proposed aiding projects creativecommons.org/licenses/by/ through medical crowdfunding projects [9]. Based on the continuous improvement of 4.0/).

Electronics 2021, 10, 143. https://doi.org/10.3390/electronics10020143 https://www.mdpi.com/journal/electronics Electronics 2021, 10, 143 2 of 16

“digital philanthropy” and , medical crowdfunding has become widely accepted and now represents a channel with great potential. During the Covid-19 pandemic, a medical crowdfunding platform called Weibo Philanthropy (Weibo GongYi) took on a great deal of social responsibility and performed very well. The platform itself first invested 100 million Yuan a special rescue fund and then opened donation channel used by early 2,800,000 people that, together, donated about approximately 8 million US dollars to support the fight against the epidemic through this platform. For example, the first project to offer support in the pandemic was achieved its target of 22,000 US Dollars in 50 min [12]. Even if relatively low socio-economic classes and under-educated groups are in urgent need for funds and ways to promote it [13], developing a high-performance fundraising project without guidance is still extremely challenging [11,14,15].Therefore, it is applicable to predict the results of project financing through understanding the influencing factors and attributes of medical crowdfunding. With a better understanding of this scheme, it would be possible to efficiently give particular guidance and optimization to craft fundraising campaigns. This is one of the reasons why this study was conducted. The aim of this paper is to take “Weibo Philanthropy” as the key example, and apply machine-learning algorithm to study the data of Project (including Target Amount; Funding duration; Number of updates; Word count of description; Fundraising success rate) and data of Participant (Type of raisers; Number of donors; Number of retweets). The purpose of this paper is to predict the impact of these factors on the performance of medical crowdfunding projects and the capacity of projects’ fundraising, to provide more conducive suggestions to fundraisers. This paper attempts to make the following contributions. Firstly, the current research on donation-based crowdfunding mainly focuses on the concept, characteristics, impact factors and operation mechanism [16,17]. These attributes related to the performance of medical financing need to be further studied [11]. Other features are centered on the motivation of donors [18–21], or individual biases [22,23]. Moreover, there are few pieces of research on the topic segmentation into medical crowd- funding, let alone adopting machine learning to study this topic. For instance, if the topic of “crowdfunding” is searched on Google Scholar, it will return 6430 results. However, just 39 of those results would be for “medical crowdfunding”, and only 2 results would involve the use of machine learning for medical crowdfunding, representing a gap on the field that must be filled. Secondly, the innovative database is also one of the leads of this study. Unlike pre- vious research data, which is generally obtained from GoFundMe, GiveForward or Kick- starter, the data used here was crawled from the popular platform “Weibo Philanthropy” (gongyi.weibo.com), between February 2012 and March 2020 (eight years) on this platform, which has not been studied before. It is an online platform that was established in 2012 for needy individuals and groups to solicits public welfare fundraising campaigns. It also carries out grassroots activities nationwide. In the past three years, this site has raised over 400 million donations from more than 20 million users [24]. Thirdly, in reality, the success of donation-based crowdfunding projects, especially when it comes to medical assistance projects, dos not entirely driven by the behavior of donors. Factors such as the raiser’s understanding of the project, the donor’s attitude to- wards the project and the interaction between participants, may also affect the performance of crowdfunding campaigns. Lastly, the fundraising of medical crowdfunding is an uncertain process, and its per- formance changes dynamically [11,25]. Based on this, this paper dynamically predicts the financing outcomes to guide raisers how to launch and follow up projects more efficiently. With a methodology based on machine learning, it also puts forward a suggestion for a rea- sonable fundraising target amount. It is noteworthy that this is an exploring research, since systematic attempts to discover the potential and advantages of using machine learning for medical crowdfunding financing predictions through research are still lacking. Electronics 2021, 10, 143 3 of 16

2. Literature Review 2.1. Donation-Based Crowdfunding The concept of crowdfunding comes from the broader concept of [26]. Such strategy focuses on the idea of using people,in other words, the crowd to obtain ideas, feedbacks and solutions to develop corporate activities [27], and it is now considered a new source of value creation [23,28–30]. A myriad of social causes can contribute to the emersion of a donation-based crowdfunding, which generally respond to the lack of consummation of public goods. Moreover, it offers financial or social aids for individuals and communities that could hardly seek help elsewhere [31]. The prosocial motivation, defined as the desire to help others, is perceived to be most considered in donation-based crowdfunding. Individuals behave prosocially for reasons such as feeling good about helping others, making impact or developing a favorable image—all the motives usually associated with the funding performance [32,33]. Previous researchers have suggested the people with higher engagement in prosocial behaviors tend to make financing decisions for public benefits [18,34]. For this reason, donation-based crowdfunding is very likely to attract a wide variety of potential contributors. Notably, projects on donation-based platforms have concrete and well-defined terms and guidelines, especially in comparison with conventional and loosely constructed fundraising campaigns.

2.2. Medical Crowdfunding As one of donation-based crowdfunding, medical crowdfunding arises as a novel solution and configurations for individuals dealing with short-term financial pressures from the costs of medical treatment. Through medical crowdfunding platform raisers could approach the crowd and raise necessary funds for their proposed aiding projects online [9]. These platforms/websites dis-intermediate the fundraising process and exert strong network effects among contributors [35]. At the same time, medical care could incorporate with be positively linked with poverty reduction and children’s education and other projects [36–39]. Medical crowdfunding is increasingly becoming an opportunity area of crowdfunding with great potential and might become an effective way to alleviate financial pressures of patients [10,40–42]. Importantly, it differs from traditional approach in several aspects and such differences must be acknowledge. Firstly, increased funding requirements for health initiatives, such as public health vaccine development, for instance, make medical crowdfunding as a viable funding chan- nel [38,40]. Individuals who are sick, especially those who cannot afford the cost for medical costs, turn to crowdfunding to raise funds [10,15,42,43]. Medical crowdfunding campaigns are the channels to collect funds for various reasons: hospital costs, disease treatment, home care needs, post diagnostic agreements, general support for drugs, postop- erative care. In this scenario, a number of dedicated medical crowdfunding platforms have emerged, including YouCaring, Giveforward and GofundMe in US, Weibo Philanthropy and Tecent Donation in China. Secondly, one significant feature of medical crowdfunding is emphasizing beneficia- ries’ abilities in soliciting funds in terms of the scale. Environmental trend factors, including geographical constraints, deficiency of medical insurance and health care regulations, drive population in need to medical crowdfunding platforms to get funded [10,15,38,44,45]. The scale of capital acquisition has been expanded by aggregating potential donors in nominated locations, eliminating offline spatial and temporal constraints [22,46]. Thirdly, information sharing features are one of the greatest features of Medical crowd- funding and can be linked to a boost in quantity and speed of the fundraising. The platform not only enables funding projects to simultaneously broadcast to immediate relations such as friends and family, but also considerably propagates information sharing via social media to broader social relations, thus helping to spread the word [36]. That said, the quan- tity of potential donors of medical crowdfunding could easily outperforms traditional face-to-face solicitations. Electronics 2021, 10, 143 4 of 16

Finally, the threshold of donation is extremely low, since contributor could voluntarily donate a very small amount of money. Since the medical crowdfunding network has formulated and attracted an ocean of existing and potential donors, the financial resources are accumulated and aggregated exponentially. Therefore, with a low threshold of donation, individuals are more likely to give a hand. Even those with very weak ties to the beneficiary may donate [47].

2.3. Research on Crowdfunding Prediction with Machine Learning Previous studies on crowdfunding prediction usually apply traditional statistical methods, such as linear regression and logistic regression [48,49], assuming that the input variables are also independent variables, Consequently, the the regression method may produce a larger prediction error when the input variables are correlated [50]. Although machine learning is an effective method to analyze hidden association in big data set and identify complex data [51], scholars begin to apply such algorithms to practical issues. For example, using machine-learning algorithm to predict the success of projects [50,52,53], such as Support Vector Machine(SVM), Decision Trees (DT) and K-Nearest Neighbor(KNN) algorithm [54–57], as well as using XgBoost, Gradient Boosting, Random Forest and GLM to construct prediction models [58]. Kamath et al. (2016) builds several supervised learning models to predict success rate of crowdfunding campaigns (including Neural Network, Random Forest, Naïve Bayes and Decision Tree) and found that Neural Network performed better [59]. Furthermore, a multi-modal deep learning framework also has been developed to predict the outcome of crowdfunding projects [60]. The others are based on a framework of text, and using a Random Forest algorithm to predict the financing performance of crowdfunding [61]. Notwithstanding the aforementioned novel methods, their studies are focused on the category of general donation-based crowdfunding. Based on that, the subdivision that is the focus of this paper is medical crowdfunding campaigns belonging to donation- based crowdfunding category. In addition, the prediction model here proposed is based on eXtreme Gradient Boosting algorithm and, differently than its applicability in other studies; it is used to compare it with other four algorithm (including Classification and Regression Tree, K-Nearest Neighbor, Linear Regression, Artificial Neural Network). To the best of our knowledge, this study is the first to use machine-learning algorithm to establish an effective prediction model for Weibo Philanthropy medical crowdfunding campaigns, and compared the five algorithms to get the higher accuracy outcome.

3. Methodology 3.1. Data Collection The data for this research was crawled from a leading donation-based platform called “Weibo Philanthropy” (gongyi.weibo.com), established in 2012 with the aim to help in- dividuals and groups in need to solicit public welfare fundraising campaigns. It is an affiliated crowdfunding platform of Sina Weibo currently the largest micro-blogging plat- form in China, with over 516 million active users. The platform’s impressive fundraising performance, and considerable number of active users were the main reasons why it was chosen as a subject for this research, not to mention its wide range of dissemination. Nearly 28 million people donated about $8.3427 million in medical crowdfunding projects in Weibo Philanthropy to support the fight against the COVID-19 epidemic. The first project of them reached its target of $22,000 in 50 min (Sina 2020). It raised more than 100 million Yuan in 57 h to help the victims of the Lushan earthquake in Sichuan Province (Sina 2013). Thus, it can be said that this platform provides reliable data with multiple attributes for training and testing models, which make the prediction results more realistic and more comparable. The analysis object of this work is data between February 2012 and March 2020 (eight years) on this platform, which reflect the fact on online medical crowdfunding field comprehensively. Electronics 2021, 10, 143 5 of 16

3.2. Data Preprocessing Analysis The data set contains 11,771 instances of medical crowdfunding projects. As shown in Figure1, first step, 25,413 sets of donation-based crowdfunding data from February 2012 to March 2020 on the website were crawled by using Python, followed by the second step to remove 4558 projects that were still in the process of completion. Then, 1747 sets of duplicate data or messy data as well as other data sets except medical crowdfunding in donation-based crowdfunding (including Animal Protection, Education, Environment Protection, Poverty Alleviation) were taken away. Finally, the data sets with missing “Target Amount” in medical crowdfunding campaigns were cleared away. Upon completing the above five steps, we get 11,771 sets of data. As shown in Table1 each sample consists of 8 factors about both project and participant, including type of raisers, number of donators, number of retweets, target amount, funding duration, number of updates, word count of description, and fundraising. The factors were chosen according to the literature and properly summarized in Table2. We take the fundraising which is the total amount of raised funds as the target feature because the purpose of our work is to analyze the influencing factors of the fundraising and predict the fundraising ability of users. After selection, there are 11,771 samples left in the data set that contain 2 missing values of 2 samples. We filled both missing values of duration via the update report of these two samples, respectively. The visualized relationships between each attribute and the final amount of crowdfunding (fundraising) are shown in Figure2 which explicates there are outliers in the data set intuitively. We calculated 24 outliers by comparing the Euclidean distance between the sample and the mean value of each attribute and setting the hyperparameters based on the principle of keeping the greatest number of objects. Figure3, containing 11,747 samples in each subgraph, illustrates the relationship between each attribute and the final amount of crowdfunding (fundraising).

Table 1. Feature and description.

Feature Description TypeRaiser Type of raisers (personal = 0, organization = 1) Target The dollar amount that is sought through the campaign (CNYY) NumRetweet Number of people who shared the campaign link Duration Time period for which the campaign has been active (days) NumDonator Number of people who donated to a campaign NumUpdate Number of updates between the launched time and the ended time NumWord The number of words contained in project description. Fundraising The total amount of raised fund

Table 2. Summary of factors and related references.

Factors Related References

Target Sudhir et al. (2016), Ryzhov et al. (2016), Gleasure and Feller (2016), Qian and Lin (2017), Sulaeman and Lin (2018), Chen et al. (2020)

Type of Raisers Burnett (2002), Sargeant (1999), Baron (1997), Kogut and Ritov (2007), Ein-Gar and Levontin (2013)

Duration Mollick (2014), Cumming et al. (2015), Cordova et al. (2015), Liu and Liu, 2016, Lukkarinen et al. (2016) Number of Retweets Kromidha and Robson (2016), De Vries et al. (2012), Liu et al. (2017) Number of Words Qian and Lin (2017), Sulaeman and Lin (2018), Zhou et al. (2018)

Number of Donors Kromidha and Robson (2016), Sudhir et al. (2016), Giudici et al. (2018), Gleasure and Morgan(2018) Zheng et al. (2014), Xu et al. (2014), Beaulieu et al. (2015), Number of Updates Kraus et al. (2016), Kuppuswamy and Bayus (2017), Block et al. (2018), Gleasure and Morgan (2018), De Larrea et al. (2019), Wang and Wang (2019) Electronics 2021, 10, 143 6 of 16

Figure 1. Sample selection process. Electronics 2021, 10, 143 7 of 16

Figure 2. Distributions of selected samples from the perspective of correspondence between each factor and final fundraising amount.

Figure 3. Distributions of samples without outliers from the perspective of correspondence between each factor and the final fundraising amount.

The distribution of fundraising is shown in Figure4. Obviously, it approximates a long-tailed distribution. The min, median, mean, max, and standard deviation of crowd- funding are 136,900, 605,883.25, 7,259,000, and 1,111,957.73, respectively. Table3 shows the descriptive statistics for each factor of the data set. As illustrated in Table3, the difference among the mean of each factor is large. However, the large differences affect the output significantly of machine-learning models. To eliminate the dimensional influence between Electronics 2021, 10, 143 8 of 16

features, data standardization is required. Standardization is a sort of technology that scales the data to a small specific interval removing the unit limit of the data and converting data into a dimensionless value. In this way, the contribution of each feature to the result is balanced, and the iterative speed of gradient descent can also be improved. We perform Min-Max normalization converts each feature to the interval [0,1].

Table 3. Descriptive statistics.

Min Median Mean Max Std.Dev. Target 100 50,000 52,830.43 1,000,000 57,881.63 NumRetweet 0 34 123.70 999 229.29 Duration 0 60 56.04 180 14.76 NumDonator 0 82 227.82 14,315 637.63 NumUpdate 0 4 5.47 42 4.67 NumWord 66 1103 1096.78 2713 409.41 Fundraising 0 136,900 605,883.25 7,259,000 1,111,957.73

Figure 4. Distribution of Fundraising. 3.3. Models The existing quantitative analysis methods cannot predict the fundraising ability, and due to the limitation of the expression ability of traditional analysis methods, there may be problems such as insufficient analysis. Therefore, we put our effort into finding a more effective way to help us analyze the social phenomenon, medical crowdfunding. Recently, machine-learning algorithms have been widely used in various fields such as biology, medical treatment, chemical engineering, and finance. However, there are rare articles about applying machine-learning algorithms to medical crowdfunding for fundraising ability prediction. Although there have been related studies in other crowdfunding fields, such as equity crowdfunding, the data distribution characteristics of medical crowdfunding are obviously different from those in other crowdfunding fields due to the different forms, objects, and goals of crowdfunding. Moreover, There are many kinds of algorithms in machine learning that can find patterns in the training data and map the factors to the target. However, the performance of such algorithms depends heavily on the representation of the collected data and logical inference method. As mentioned in [62], numerous machine- learning algorithms are applicable to a variety of contexts, and the performance of the Electronics 2021, 10, 143 9 of 16

algorithms is at varying levels. Thus, it is meaningful to use machine-learning algorithms to analyze data in the field of medical crowdfunding and get a conclusion about the suitable algorithm for specific medical crowdfunding tasks. In this section, the predictive ability of five representative machine-learning models were compared, i.e., nearest neighbor, linear regression, neural network, decision tree, and an integrated method, to analyze the most suitable machine-learning models for different learning tasks in medical crowdfunding. A brief introduction to each algorithm and the application in medical crowdfunding campaigns are stats below. K-Nearest Neighbors (KNN) proposed by Cover and Hart [63] in 1967 is a simple widely accepted model that can be used to solve both classification and regression problems for crowdfunding campaigns [54–56]. The main idea of KNN is that a sample should be most similar with k nearest samples in the data set. The first step is to calculate the Euclidean distance between the test sample and each training sample. Then, we select k nearest samples after sorting the distance. And the predicting target in this work, fundraising amount, can be calculated by mean value of k samples. Specifically, the inputs of model KNN are testing sample x∗ and the training data set which contains 11,747 instances with the label (fundraising amount of each instance). Then, the Euclidean distances between x∗ and each training sample are calculated and sorted from small to large. The output of the model is the fundraising amount of test sample x∗, which can be obtained by calculating the mean value of top k samples. Linear regression, a traditional method for medical crowdfunding campaigns, is a simple formed and basic regression algorithm, which modeling the linear relationship between our target, fundraising amount y and the set of 7 factors x. The essential of linear regression is solving the parameters ω and b by minimizing a constructed loss function. Linear regression model employed in this work can be formed as:

> yˆ = ω0 + ω1x1 + ... + ωmxm + b = X W + b, (1)

where yˆ represents the predicted value of fundraising amount, m is equal to 7 in this work, because ω denotes the weight of the 7 factors, and > is the transpose. The essential of linear regression is solving the parameters ω and b by minimizing a constructed loss function. Also, the mean-squared error is taken as a loss function in this research and defined as:

n 1 2 L(ω, b) = ∑(ωxi + b − yi) , (2) n i=1

where n is the number of samples which is equal to 11,747 in this work. Thus, the objective function can be written as:

n ∗ ∗ 2 (ω , b ) = argmin ∑(ωxi + b − yi) . (3) (w,b) i=1

Gradient descent was applied to minimize the distance between the predicted value of fundraising amount and value of training samples. It is considered to be an effective method to approach the minimum value of the objective function, since it can keep updating the parameters (seeking partial derivatives for w and b) continuously. Specifically, the input of LR in the training stage are samples with description features and the target feature, i.e., fundraising. The weights of each feature would be set randomly at the beginning, the sum of each weighted feature is the output of the model. The weights are updated in each iteration by minimizing the difference between the output and real value of the target feature. In the testing stage, sample x∗ with description features required for fundraising prediction is the input of the well-trained model LR, the fundraising amount for sample x* calculated by the sum of weighted description features. Artificial Neural Networks (ANN), an effective method for medical crowdfunding campaigns [59], abstract the human brain neuron network from the perspective of infor- Electronics 2021, 10, 143 10 of 16

mation processing, establishing a simple model and form different networks according to connection methods. Figure5 illustrates an ANN with four layers, each one containing multiple neurons (nodes). The nodes of adjacent layers are connected with a weighted linear summation and pass values by feedforward. The data set that used in this research contains 7 features and one target, which determines the number of nodes in the input layer is 7. We take ReLU as the nonlinear activation function on each node to transform the result of the linear weighted sum. The activation function is usually smooth and differen- tiable, such as ReLU, making it possible to train the network based on back-propagation. Specifically, ANN is an extension of LR, which adds the activation function to enhance the expression of the model. In the testing stage, the output and input of ANN are also the predicted fundraising amount and test samples with description features, respectively.

Figure 5. Artificial Neural Network.

Decision tree is a type of classical machine-learning algorithm able to make predictions for the medical crowdfunding campaigns by generating a tree structure that consists of leaf nodes, internal nodes, and a root node. As one of the representative algorithms of the decision tree, Classification and Regression Trees (CART) [64] proposed by Breiman et al. in 1984 can be used for both classification and regression tasks. In contrast to C4.5 proposed by Quinlan [65] in 1994, CART is essentially a binary division of the feature space (i.e., the de- cision tree generated by CART is a binary tree). The CART constructed in this work based on 7 factors and a continuous target, fundraising amount. In this regression task we select the best splitting factors for each node by measuring the Least Square Deviation (LSD) of two parts to construct a CART. Specifically, all the samples are stored in the root node at beginning. Samples in the current node split into two parts according to the best splitting factor that minimize the square deviation of both. Finally, the input space will be divided into M regions according to 7 factors. The predicting target, fundraising amount could be obtained by averaging this value of training samples in the same region. The training strategy of CART is to construct a binary tree by selecting the most important description feature before each division of samples. The sample x∗ which ask for the target feature, fundraising, is the input in the testing stage, and is would be judged by the description features of each node in the well-trained CART model and assigned to the corresponding child node. The output is the mean value of the fundraising amount that all test samples corresponding to the leaf node where sample x∗ located. Gradient Boosting Decision Tree (GBDT) [66], also known as Multiple Additive Regres- sion Tree (MART), is a representative algorithm based on the Boosting strategy in ensemble learning, which can make predictions for medical crowdfunding campaigns. Chen Tianqi improved GBDT in the Kaggle competition and created the eXtreme Gradient Boosting Electronics 2021, 10, 143 11 of 16

(Xgboost) package [67], greatly improving the algorithm’s speed and effect. Here, CART was taken as base learners because it is relatively simple, with low variance and high bias. The model was constructed by adding trees continuously according to the residual between the predicted fundraising amount and the true value of the training sample. A new tree was then included in the model to fit the residual of the previous prediction in each iteration. Predictions are aggregated in an additive manner in which each added model is trained so it will minimize the loss function. The final fundraising amount can be obtained by adding all the scores (each leaf node corresponds to a score) of the well-trained ensemble model with several basic trees. Specifically, Xgboost is an extension of the decision tree, which consists of a set of trees. In the testing stage, the fundraising of a testing sample x* can be output by summing up the weak results of each tree obtained.

4. Experiments The experiment consists of two major regression tasks. The first one employs the collected data set to predict the amount of medical crowdfunding fundraising, making a comparison among the results of five different kinds of machine-learning algorithms. Next, samples with medical crowdfunding fundraising rate (the value obtained after dividing the total amount of raised fund by target amount) over 50% are used to predict the fundraising amount. The experiments were then implemented in Python and ran on a computer with the Windows operating system, an i7-8850H CPU, and 16 GB of RAM (Intel, Santa Clara, CA, USA). One of the steps was to randomly take 80% of the data set as training samples, and the remaining 20% as the testing samples. 10-fold cross-validation was applied in training stage to enhance the generalization ability of the models. In total, five machine- learning algorithms are compared in this work, including CART, KNN, MLP, and Xgboost and the performance of these methods were evaluated based on three metrics: mean absolute error (MAE), mean-squared error (MSE), and R-squared.

4.1. Parameters The parameters optimized by grid search for each algorithm are mentioned in detail below. In the first task, the Manhattan distance between the target and the 15 nearest points was considered to calculate the regression value in KNN. The maximum depth of the CART is limited to 5 and there are 2 hidden layers in the constructed artificial neural network with 50 and 20 neurons in each one. Here, ReLU is taken as the activation function and the parameter of L2 regularization is 0.001, together with the learning rate that is set as 0.01. The model is trained based on Mean-Squared Loss and optimized with Adam [68], an extension of Stochastic Gradient (SGD). The maximum depth of xgboost is 4, the number of basic regression trees is 100, and the learning rate is 0.35. In task 2, the optimal parameters of some models differ from task 1. For example, the best maximum depth is 3 and 5 for CART and Xgboost. There are 3 hidden layers in ANN with 200, 150, 200 hidden nodes, respectively. The most suitable activate function for task 2 is tanh, a hyperbolic functions.

4.2. Experimental Analysis The Pearson correlation coefficient is a statistic method that measure linear correlation coefficient and returns a value of between −1 and +1. The correlations for the factors of medical crowdfunding campaigns, including target, type of raisers, number of retweets, duration, number of donators, number of updates, words of description, and fundraising amount are shown in Table4. According to the standard proposed by the Political Science Department at Quinnipiac University [69], the correlation between the number of retweets and fundraising, the amount is moderate, and the number of donators and fundraising amounts is strong, indicating that the number of retweets and number of donators are two significant factors influencing the final amount of medical crowdfunding fundraising. Electronics 2021, 10, 143 12 of 16

Table5 reports the results of five algorithms evaluated based on 3 metrics obtained in two different tasks. We visualize the results of two tasks separately into Figures6 and7, in which different color represents different machine-learning algorithms, and the length of each bar indicates the Mean absolute error (MAE), Mean-squared error (MSE), and R-squared value of each model. Among all the five algorithms, Xgboost produced the best result across the three performance metrics. In task 1, the performance of all kinds of models, overall, was not satisfactory because of the uneven data distribution and the limited number of attributes that are strongly related to the predicted target. Under such conditions, the performance of Linear regression, CART, ANN, and an ensemble model, Xgboost, are significantly improved compared to the simpler models KNN and LR, with not much difference among these three models. ANN performs best on MAE, while Xgboost performs best on MSE and R-squared. In task 2, we consider samples that fundraising rate is over 50%. Figure7 illustrates the performance of algorithms improved when we focus on samples that fundraising rate is over 50%. Specifically, MSE decreased and R-squared increased significantly, even though MAE reduced slightly. Xgboost performs best in MSE and MAE while CART performs best in MAE. Hence, the long-tail distribution of data limits the performance of algorithms. We can employ data more effectively when a reasonable data segmentation point is found. The algorithm based on ensemble performs best overall for predicting fundraising performance in medical crowdfunding campaigns.

Table 4. Pearson correlations.

Target Raisers NumRetweet Duration NumDonator NumUpdate NumWord Fundraising Target 1 0.073 −0.017 0.177 0.026 0.108 0.105 0.048 Raisers 0.073 1 −0.186 0.038 −0.046 0.107 0.047 −0.154 NumRetweet −0.017 −0.186 1 −0.103 0.200 * 0.044 −0.132 0.365 ** Duration 0.177 0.038 −0.103 1 −0.025 0.016 0.107 −0.109 NumDonator 0.026 −0.046 0.200 * −0.025 1 0.053 0.037 0.455 *** NumUpdate 0.108 0.107 0.044 0.016 0.053 1 0.019 0.109 NumWord 0.105 0.047 −0.132 0.107 0.037 0.019 1 −0.024 Fundraising 0.048 −0.154 0.365 ** −0.109 0.455 *** 0.109 −0.024 1 Note: “*”, “**”, and “***” represent correlation is weak, moderate, and strong, respectively.

Table 5. Performance of models in Task 1 and Task 2.

Task 1 Task 2 MAE MSE R-Squared MAE MSE R-Squared KNN 0.06044 0.01352 0.38398 0.08148 0.01438 0.74662 LR 0.07270 0.01566 0.28610 0.08071 0.01132 0.80050 ANN 0.04976 0.01057 0.51823 0.07230 0.01018 0.82069 CART 0.05147 0.01057 0.51823 0.06915 0.00978 0.82768 Xgboost 0.05140 0.01035 0.52832 0.07109 0.00962 0.83057 Note: The bold highlights the best performed algorithm based on each metric. Electronics 2021, 10, 143 13 of 16

Figure 6. Comparison performance among five machine-learning algorithms (K-Nearest Neighbor(KNN), Linear Regres- sion(LR), Artificial Neural Network(ANN), Classification and Regression tree(CART), Xgboost) based on 3 indicators (Mean absolute error(MAE), Mean-squared error(MSE), R-squared) in task 1.

Figure 7. Comparison performance among five machine-learning algorithms (K-Nearest Neighbor(KNN), Linear Regres- sion(LR), Artificial Neural Network(ANN), Classification and Regression tree(CART), Xgboost) based on 3 indicators (Mean absolute error(MAE), Mean-squared error(MSE), R-squared) in task 2.

5. Conclusions and Future Work 5.1. Discussion Findings This study first performed preprocessing and exploratory analysis based on Weibo Philanthropy samples and then introduced a series of machine-learning algorithms rarely used in medical crowdfunding before. The 10-fold cross-validation is employed in the training stage, and parameters are optimized by grid search for each algorithm. Indicators mean abstract error, mean-squared error and R-squared are applied to evaluate the per- formance of algorithms. The experimental results show the performance of Classification and Regression tree, Artificial Neural Network, Xgboost are not much different, outper- forming other algorithms, such as K-Nearest Neighbors and Linear Regression. Xgboost constructed based on ensemble performers best among all the algorithms with the smallest mean-squared error and the largest R-squared in both tasks of the experiment. The out- comes demonstrate the potential of the ensemble-based machine-learning algorithms to predict medical crowdfunding project fundraising amounts and provide inspiration for guiding follow-up research and management practices.

5.2. Limitations and Further Research Considering this is an exploratory study, even though several algorithms were com- pared and taken into consideration, there are still several limitations that must be acknowl- edged. First, in order to obtain new data that has not been studied in the past, the data selected in this study was extracted from a single medical crowdfunding platform affili- ated to Sina.com which is one of the leading Internet media and services companies for communities worldwide [70,71], however, resulting in limited data. In future research, in order to prove that the conclusion of this study is applicable to more public welfare Electronics 2021, 10, 143 14 of 16

crowdfunding platforms and users, we should obtain more data from multiple platforms, such as GoFundme and Tencent Donation. Also, the selection of attributes needs to be optimized. Since this paper intends to provide new research ideas and the perspective of innovative methods, thus the attribute selection is more based on the actual data and the conclusions of previous empirical studies. However, due to the protection of website information, limited technology and other reasons, the scope of attribute selection still needs to be expanded in future research. It can be a good idea, for example, consider and analyze the influencing factors: the influence of text/image content, the number of potential followers/donors, and the social influence of followers/re-tweeters [6,15]. To improve the performance of the model, data prepossessing and algorithm param- eter adjustment are adopted. Rather than stop at the classical model comparison and application, future researchers should consider extending the content of optimization to the construction of the model. For instance, particle swarm optimization-based extreme gradient boosting for predicting the success of medical crowdfunding campaigns might be a good strategy to obtain better performance in medical crowdfunding fundraising predictions.

Author Contributions: Conceptualization: N.P.; methodology, software: X.Z.; validation: N.P. and X.Z.; formal analysis: N.P.; resources: N.P.; writing—original draft preparation: N.P., X.Z.; writing— review and editing: B.N., Y.F.; supervision: Y.F., B.N. All authors have read and agreed to the published version of the manuscript. Funding: This work described in this paper was supported by the National Natural Science Founda- tion of China [Grant No.71971143, 71702111, 71571120]; The Natural Science Foundation of Guang- dong Province [Grant No.2020A1515010749]; Major Projects of Universities of Guangdong Provincial Department of Education [Grant No.2019KZDXM030]; Guangdong Province Postgraduate Education Innovation Research Project [Grant No.2019SFKC46]; Postgraduate Innovation Development Fund Project of Shenzhen University [Grant No.315-0000470417]. Acknowledgments: The second author would like to thank her supervisor, Xizhao Wang, for his helpful guidance. Conflicts of Interest: The authors declare no conflict of interest.

References 1. Perc, M.; Gorišek Miksi´c,N.; Slavinec, M.; Stožer, A. Forecasting Covid-19. Front. Phys. 2020, 8, 127. [CrossRef] 2. Hâncean, M.G.; Perc, M.; Lerner, J. Early spread of COVID-19 in Romania: Imported cases from Italy and human-to-human transmission networks. R. Soc. Open Sci. 2020, 7, 200780. [CrossRef][PubMed] 3. Momtazmanesh, S.; Ochs, H.D.; Uddin, L.Q.; Perc, M.; Routes, J.M.; Vieira, D.N.; Al-Herz, W.; Baris, S.; Prando, C.; Rosivall, L.; et al. All together to Fight COVID-19. Am. J. Trop. Med. Hyg. 2020, 102, 1181–1183. [CrossRef][PubMed] 4. Mansourabadi, A.H.; Sadeghalvad, M.; Mohammadi-Motlagh, H.R.; Rezaei, N. The immune system as a target for therapy of SARS-CoV-2: A systematic review of the current immunotherapies for COVID-19. Life Sci. 2020, 258, 118185. [CrossRef] 5. Mohamed, K.; Rodríguez-Román, E.; Rahmani, F.; Zhang, H.; Ivanovska, M.; Makka, S.A.; Joya, M.; Makuku, R.; Islam, M.S.; Radwan, N.; et al. Borderless collaboration is needed for COVID-19—A disease that knows no borders. Infect. Control Hosp. Epidemiol. 2020, 41, 1245–1246. [CrossRef] 6. Jin, P. Medical crowdfunding in China: Empirics and ethics. J. Med. Ethics 2019, 45, 538–544. [CrossRef] 7. Renwick, M.J.; Mossialos, E. Crowdfunding our health: Economic risks and benefits. Soc. Sci. Med. 2017, 191, 48–56. [CrossRef] 8. Snyder, J. Crowdfunding for medical care: Ethical issues in an emerging health care funding practice. Hastings Cent. Rep. 2016, 46, 36–42. [CrossRef] 9. Burtch, G.; Chan, J. Investigating the relationship between medical crowdfunding and personal bankruptcy in the United States: Evidence of a digital divide. MIS Q. 2018, 237–262. [CrossRef] 10. Snyder, J.; Mathers, A.; Crooks, V.A. Fund my treatment!: A call for ethics-focused social science research into the use of crowdfunding for medical care. Soc. Sci. Med. 2016, 169, 27–30. [CrossRef] 11. Wang, T.; Jin, F.; Cheng, Y. Early Predictions for Medical Crowdfunding: A Deep Learning Approach Using Diverse Inputs. arXiv 2019, arXiv:1911.05702. 12. News, T.P. Sina Set up a 100 Million Yuan Special Fund to Fight Against the Epidemic. [EB/OL]. Available online: http: //news.sina.com.cn/o/2020-01-29/doc-iimxxste7506408.shtml/ (accessed on 9 November 2020). Electronics 2021, 10, 143 15 of 16

13. Hiskes. Crowdfunding for Medical Bills a Band-Aid, not a Cure-All, UW Bothell Study Finds. [EB/OL]. Available online: https: //www.washington.edu/news/2017/03/13/crowdfundingformedicalbillsabandaidnotacurealluwstudyfinds/ (accessed on 9 November 2020). 14. Gao, F.; Li, X.; Cheng, Y.; Hu, Y.J. Ladies First, Gentlemen Third! The Effect of Narrative Perspective on Medical Crowdfunding; HEC Paris Research Paper No. MKG-2019-1336; HEC: Paris, France, 2019. 15. Berliner, L.S.; Kenworthy, N.J. Producing a worthy illness: Personal crowdfunding amidst financial crisis. Soc. Sci. Med. 2017, 187, 233–242. [CrossRef][PubMed] 16. Young, M.J.; Scheinberg, E. The rise of crowdfunding for medical care: Promises and perils. JAMA 2017, 317, 1623–1624. [CrossRef][PubMed] 17. Valle, V.M. An assessment of Canada’s healthcare system weighing achievements and challenges. Norteamérica 2016, 11, 193–218. [CrossRef] 18. Hong, Y.; Hu, Y.; Burtch, G. Embeddedness, pro-sociality, and social influence: Evidence from online crowdfunding. MIS Q. 2018. [CrossRef] 19. Sulaeman, D.; Lin, M. Reducing Uncertainty in Charitable Crowdfunding. In Proceedings of the PACIS, Yokohama, Japan, 26–30 June 2018. 20. Bagheri, A.; Chitsazan, H.; Ebrahimi, A. Crowdfunding motivations: A focus on donors’ perspectives. Technol. Forecast. Soc. Chang. 2019, 146, 218–232. [CrossRef] 21. Ren, J.; Raghupathi, V.; Raghupathi, W. Understanding the Dimensions of Medical Crowdfunding: A Visual Analytics Approach. J. Med. Internet Res. 2020, 22, e18813. [CrossRef] 22. Agrawal, A.; Catalini, C.; Goldfarb, A. Crowdfunding: Geography, social networks, and the timing of investment decisions. J. Econ. Manag. Strategy 2015, 24, 253–274. [CrossRef] 23. Burtch, G.; Ghose, A.; Wattal, S. Cultural differences and geography as determinants of online prosocial lending. MIS Q. 2014, 38, 773–794. [CrossRef] 24. Tan, X.; Lu, Y.; Tan, Y. An Examination of Social Comparison Triggered by Higher Donation Visibility over Social Media Platforms. In Proceedings of the International Conference on Information Systems (ICIS 2016), Dublin, Ireland, 11–14 December 2016. 25. Vismara, S. Sustainability in equity crowdfunding. Technol. Forecast. Soc. Chang. 2019, 141, 98–106. [CrossRef] 26. Schwienbacher, A.; Larralde, B. Crowdfunding of small entrepreneurial ventures. In Handbook of Entrepreneurial Finance; Oxford University Press: Oxford, UK, 2010. 27. Belleflamme, P.; Lambert, T.; Schwienbacher, A. Crowdfunding: Tapping the right crowd. J. Bus. Ventur. 2014, 29, 585–609. [CrossRef] 28. Aitamurto, T. The impact of crowdfunding on journalism: Case study of Spot. Us, a platform for community-funded reporting. J. Pract. 2011, 5, 429–445. [CrossRef] 29. Lehner, O.M. A Literature Review and Research Agenda for Crowdfunding of Social Ventures, article presented first at the 2012 Research Colloquium on Social Entrepreneurship 16th–19th July; University of Oxford, Skoll Center of SAID Business School UK: Oxford, UK, 2012. 30. Davidson, R.; Poor, N. The barriers facing artists’ use of crowdfunding platforms: Personality, emotional labor, and going to the well one too many times. New Media Soc. 2015, 17, 289–307. [CrossRef] 31. Allison, T.H.; Davis, B.C.; Short, J.C.; Webb, J.W. Crowdfunding in a prosocial microlending environment: Examining the role of intrinsic versus extrinsic cues. Entrep. Theory Pract. 2015, 39, 53–73. [CrossRef] 32. Bretschneider, U.; Leimeister, J.M. Not just an ego-trip: Exploring backers’ motivation for funding in incentive-based crowdfund- ing. J. Strateg. Inf. Syst. 2017, 26, 246–260. [CrossRef] 33. Wang, X.; Wang, L. What makes charitable crowdfunding projects successful: A research based on data mining and social capital theory. In Parallel and Distributed Computing, Applications and Technologies, Proceedings of the International Conference on Parallel and Distributed Computing: Applications and Technologies, Jeju Island, Korea, 20–22 August 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 250–260. 34. Anik, L.; Norton, M. On being the “tipping point”: Threshold incentives motivate behavior. J. Assoc. Consum. Res. 2015, 1, 1. 35. Parker, G.G.; Van Alstyne, M.W. Two-sided network effects: A theory of information product design. Manag. Sci. 2005, 51, 1494–1504. [CrossRef] 36. Mollick, E. The dynamics of crowdfunding: An exploratory study. J. Bus. Ventur. 2014, 29, 1–16. [CrossRef] 37. Davies, R. Three provocations for civic crowdfunding. Inf. Commun. Soc. 2015, 18, 342–355. [CrossRef] 38. Otero, P. Crowdfunding. A new option for funding health projects. Arch. Argent Pediatr. 2015, 113, 154–157. 39. Lee, C.H.; Zhao, J.L.; Hassna, G. Government-incentivized crowdfunding for one-belt, one-road enterprises: Design and research issues. Financ. Innov. 2016, 2, 2. [CrossRef] 40. Cameron, P.; Corne, D.W.; Mason, C.E.; Rosenfeld, J. Crowdfunding genomics and bioinformatics. Genome Biol. 2013, 14, 1–5. [CrossRef][PubMed] 41. Gustke, C. Managing Health Costs with Crowdfunding. New York Times, 30 January 2015, p. 1. 42. Bourque, A. Is Crowdfunding Your Medical Expenses Risky or Reasonable. Huffington Post, 17 April 2015, p. 1. 43. Barclay, E. The Sick Turn to Crowd Funding to Pay Medical Bills. National Public Radio Shots Blog, 24 October 2012, p. 1. 44. Sisler, J. Crowdfunding for medical expenses. CMAJ JAMC 2012, 184, 123–124. [CrossRef][PubMed] Electronics 2021, 10, 143 16 of 16

45. Dragojlovic, N.; Lynd, L.D. Crowdfunding drug development: The state of play in oncology and rare diseases. Drug Discov. Today 2014, 19, 1775–1780. [CrossRef][PubMed] 46. Sorenson, O.; Assenova, V.; Li, G.C.; Boada, J.; Fleming, L. Expand innovation finance via crowdfunding. Science 2016, 354, 1526–1528. [CrossRef] [PubMed] 47. Meer, J. Effects of the price of charitable giving: Evidence from an online crowdfunding platform. J. Econ. Behav. Organ. 2014, 103, 113–124. [CrossRef] 48. Sevim, C.; Oztekin, A.; Bali, O.; Gumus, S.; Guresen, E. Developing an early warning system to predict currency crises. Eur. J. Oper. Res. 2014, 237, 1095–1104. [CrossRef] 49. Barboza, F.; Kimura, H.; Altman, E. Machine learning models and bankruptcy prediction. Expert Syst. Appl. 2017, 83, 405–417. [CrossRef] 50. Yeh, J.Y.; Chen, C.H. A machine learning approach to predict the success of crowdfunding fintech project. J. Enterp. Inf. Manag. 2020.[CrossRef] 51. Ghahramani, Z. Probabilistic machine learning and artificial intelligence. Nature 2015, 521, 452–459. [CrossRef] 52. Mitra, T.; Gilbert, E. The language that gets people to give: Phrases that predict success on . In Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing, Baltimore, MD, USA, 15–19 February 2014; pp. 49–61. 53. Li, Y.; Rakesh, V.; Reddy, C.K. Project success prediction in crowdfunding environments. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, Francisco, CA, USA, 22–25 February 2016; pp. 247–256. 54. Greenberg, M.D.; Pardo, B.; Hariharan, K.; Gerber, E. Crowdfunding support tools: Predicting success & failure. In CHI’13 Extended Abstracts on Human Factors in Computing Systems; Association for Computing Machinery: New York, NY, USA, 2013; pp. 1815–1820. 55. Etter, V.; Grossglauser, M.; Thiran, P. Launch hard or go home! Predicting the success of Kickstarter campaigns. In CHI’13: Conference on Online Social Networks; Association for Computing Machinery: New York, NY, USA, 2013; pp. 177–182. 56. Lu, C.T.; Xie, S.; Kong, X.; Yu, P.S. Inferring the impacts of social media on crowdfunding. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining, New York, NY, USA, 24–28 February 2014; pp. 573–582. 57. Wang, W.; Zheng, H.; Wu, Y.J. Prediction of fundraising outcomes for crowdfunding projects based on deep learning: A multi- model comparative study. Soft Comput. 2020, 24, 8323–8341. [CrossRef] 58. Lai, C.Y.; Lo, P.C.; Hwang, S.Y. Incorporating comment text into success prediction of crowdfunding campaigns. In Proceedings of the Pacific Asia Conference on Information Systems (PACIS), Association For Information Systems, Langkawi, Malaysia, 16–20 July 2017. 59. Kamath, R.; Kamat, R. Supervised learning model for kickstarter campaigns with R mining. Int. J. Inf. Technol. Model. Comput. IJITMC 2016, 4.[CrossRef] 60. Cheng, C.; Tan, F.; Hou, X.; Wei, Z. Success Prediction on Crowdfunding with Multimodal Deep Learning. IJCAI 2019, 2158–2164. [CrossRef] 61. Yuan, H.; Lau, R.Y.; Xu, W. The determinants of crowdfunding success: A semantic text analytics approach. Decis. Support Syst. 2016, 91, 67–76. [CrossRef] 62. Chen, X.; Wang, H.; Ma, Y.; Zheng, X.; Guo, L. Self-adaptive resource allocation for cloud-based software services based on iterative QoS prediction model. Future Gener. Comput. Syst. 2020, 105, 287–296. [CrossRef] 63. Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27. [CrossRef] 64. Breiman, L.; Friedman, J.; Stone, C.J.; Olshen, R.A. Classification and Regression Trees; CRC Press: Boca Raton, FL, USA, 1984. 65. Salzberg, S.L. C4. 5: Programs for Machine Learning by j. Ross Quinlan; Morgan Kaufmann Publishers, Inc.: Burlington, MA, USA, 1994. 66. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 66, 1189–1232. [CrossRef] 67. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd Acm Sigkdd International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. 68. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. 69. To, S.H. Correlation Coefficient: Simple Definition, Formula, Easy Steps. [EB/OL]. Available online: https://www.statisticshowt o.com/probability-and-statistics/correlation-coefficient-formula/ (accessed on 9 November 2020). 70. Kim, S.E.; Lee, K.Y.; Shin, S.I.; Yang, S.B. Effects of tourism information quality in social media on destination image formation: The case of Sina Weibo. Inf. Manag. 2017, 54, 687–702. [CrossRef] 71. Zhao, Y.; Cheng, S.; Yu, X.; Xu, H. Chinese public’s attention to the COVID-19 epidemic on social media: observational descriptive study. J. Med. Internet Res. 2020, 22, e18825. [CrossRef]