
Using Language to Predict Kickstarter Success

Kartik Sawhney Caelin Tran Ramon Tuason

Stanford University, Department of Computer Science
{kartiks2, cktt13, rtuason}@stanford.edu

Abstract

Kickstarter is a popular platform that is used by people to seek support on a variety of campaigns. Existing literature has already tackled the problem of predicting campaign success (meeting the funding target by the deadline), typically analyzing the evolution of a campaign's funding and number of backers over time to predict its success. While time-dependent features are effective in predicting success, especially as a campaign nears its deadline, we took a different approach to the problem and built a binary classifier that predicts the success of a campaign by analyzing campaign content, linguistic features, and meta-information at the time of its inception (through techniques such as sentiment analysis and latent Dirichlet allocation). With sufficient training, our classifier achieves over 70% accuracy on large test data sets. We show that language and campaign properties alone can be used to make moderate predictions about the success of a campaign, which helps us understand how the subtleties of persuasion and the choice of campaign topics can influence campaign donors, and which informs the process of evaluating and developing effective campaign pitches.

1 Introduction

Kickstarter is an internet service for people to raise money through crowdfunding. For a person to receive any resources, his or her campaign must meet its target by a deadline set at the time of publishing. In order to be successful (i.e., achieve full funding by the deadline), a campaign must effectively convey its mission to potential backers and persuade them to support it over other campaigns. Previous literature on predicting the success of Kickstarter campaigns has focused on applying machine learning to time series data (e.g., the amount of funds raised over time) and social networking. In this paper, we investigate whether it is possible to use information present at the beginning of a campaign to predict its success without using time-dependent data or data external to the Kickstarter campaign itself. We have built a binary predictor of Kickstarter success that focuses on the actual content of a campaign and not its performance over time, capturing (1) people's initial reactions to the language used in a campaign and the description of its purpose and (2) the characteristics and types of campaigns that attract people.

2 Related Works

Literature on this topic can be broadly divided into three categories: work aimed specifically at understanding crowdfunding dynamics, natural language understanding systems that predict audience reaction, and general theories of persuasion that can be used to develop our work further.

There have been several studies that leverage machine learning techniques to predict the success of a campaign. Vincent Etter et al. (see references section) analyzed the social network by constructing a projects-backers graph and monitoring Twitter for tweets that mention the project. They also discuss predictions based on the time series of early funding obtained. Combining these features with social information helped to improve the model substantially. A study by Ethan Mollick on Kickstarter dynamics found that higher funding goals and longer project durations lead to lower chances of success, while inclusion of a video in a project pitch and frequent updates on the campaign increase the likelihood of full funding.

While these studies rely on time-dependent features and social information for predictions, recent advances in natural language understanding have helped make predictions based on linguistic considerations as well. For instance, studying the linguistic aspects of politeness, Danescu-Niculescu-Mizil et al. show that Wikipedia editors are more likely to be promoted to a higher status when they are polite. Furthermore, Althoff et al. analyze social features in language to determine relations that can predict whether an altruistic request will be accepted by a donor. In addition, by analyzing the language and words used in daily conversation, tools such as LIWC (Pennebaker et al.) can be used to infer psychologically meaningful styles and social behavior from unstructured text. Other resources like the NTU Sentiment Dictionary and SentiWordNet also help in opinion mining on text-based content (Pang and Lee).

Finally, we turn to more general theories of influence, noting the important role of reciprocity in social dynamics in investment situations, as shown by Berg et al. In fact, Cialdini's iconic work on the psychology of persuasion demonstrates the significant influence of reciprocity, in the form of limited rewards, on the success of a Kickstarter campaign.

3 Implementation of our Predictor

3.1 Input-Output Behavior

The input-output behavior of campaign prediction is straightforward: after training a predictor using labeled (success / failure) Kickstarter campaigns, we use the predictor on arbitrary campaigns (input) and see whether we expect them to succeed (output). For our raw dataset, we used Vitulskis et al.'s pre-existing, regularly updated, and labeled dataset created by crawling all Kickstarter pages.

The evaluation metric we used for a predictor is its classification accuracy on test campaigns. Given an input campaign, a predictor will classify it as a success (1) or a failure (0).

As a concrete example, the following is a sample object representing a Kickstarter campaign that contains the relevant fields that our predictor will use. This campaign failed — perhaps its unoriginal idea and unexciting language were contributing factors.

{"name": "Spinning Pots: Transformations of Clay through Human Hands",
 "blurb": "Creating a small making functional pots/wares with clay and heat. Raising funds to purchase a kiln.",
 "goal": 3000,
 "created_at": 1431738817,
 "deadline": 1434680329}

Figure 1, Campaign Example: Note that the creation time and deadline are expressed as Unix timestamps.

3.2 Baseline and Oracle

To test the reliability of lingual features in predicting a campaign's success, we implemented as our baseline a naive Bayes (NB) classifier that uses unigram occurrences as features. Unigrams are individual words, such as "spinning," "pots," and "transformations" from the sample above. We chose unigram occurrences as the features of our baseline since they provide a very basic analysis of language and content: they do not provide insight into how words are used or how they relate to one another. We randomly distributed the campaigns in our dataset (160,000+ campaigns) into 80% training data and 20% testing data. Applying our classifier, we obtained a classification accuracy of 65%. This suggests that it is possible to classify Kickstarter campaigns with significant accuracy.
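For illustration, a minimal sketch of such a unigram naive Bayes baseline is shown below. It assumes scikit-learn; the toy texts, labels, and variable names are ours and purely illustrative, not our actual pipeline code.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import MultinomialNB

    # Toy stand-ins for campaign text (title + blurb) and success labels (1 = funded).
    texts = ["Spinning Pots: Transformations of Clay through Human Hands",
             "An open-source smart watch for runners",
             "A documentary about community gardens",
             "Yet another phone case"]
    labels = [0, 1, 1, 0]

    # Unigram occurrence features, as in our baseline.
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(texts)

    # 80/20 train/test split, then a multinomial naive Bayes classifier.
    X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2, random_state=0)
    classifier = MultinomialNB().fit(X_train, y_train)
    print(classifier.score(X_test, y_test))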

To determine an upper limit for the Kickstarter prediction problem, we implemented as our oracle a support vector machine that considers unigram occurrences along with information such as how many backers a campaign had by the end of its run, whether it was featured on the Kickstarter website's "Spotlight" page, and whether it was presented as a "Staff Recommendation." Applying this classifier, we got an average classification accuracy of 92%. The oracle illustrates the upper limit of any Kickstarter predictor based on data that it can receive from its start time to its end time. This reflects our challenge of predicting a campaign's outcome just from the information available at its inception. After all, if you incorporate more information as the campaign progresses, your prediction accuracy quickly increases, which is reflected in predictor performance in the current literature.

We have a gap of about 27% between our oracle, which looks at the future, and our baseline, which looks only at language from the beginning of campaigns. However, the problem is not so intractable that language has no part in determining a campaign's success. Despite its simple nature, our naive Bayes classifier actually performed above our expectations, yielding an accuracy well above the random chance threshold of 50%. Naive Bayes assumes that, given a certain success state ("True" or "False"), the unigram features of a campaign are conditionally independent. Of course, this assumption is not accurate, because a campaign about a given topic is likely to have many related words and linguistic structures. This suggests that, with a more intelligent feature extractor and a more advanced predictor, we can further improve our classification accuracy.

4 Predictor Design and Implementation

In order to achieve better classification accuracy, we focused the design of our predictor on the need for more advanced features and a more advanced classifier. Our program consists of four stages: primary feature extraction, meta-feature assignment, training, and classification.

Figure 2, Program Flow: Our program has a straightforward linear flow that can be simplified even further into a training stage (primary feature extraction, meta-feature assignment, SVM training) and a testing stage (SVM classification). Our most novel contributions involve the feature extraction and meta-feature assignment components.

Our program interfaces with a JSON-formatted dataset through our primary feature extractor, which builds a list of campaign objects containing features of interest, such as unigrams, sentiment, Flesch-Kincaid readability score, and other language features. All of these features are campaign-specific, i.e., they may be extracted without comparison to any other campaigns. Some of the features are trivial to obtain, such as unigrams, but others require relatively complex natural language processing methods, such as sentiment analysis. The goal of this step in our program is to find features that tell us as much as possible about the topic of a campaign and the type of language it uses.

The next step is meta-feature assignment. Meta-features, or secondary features, come from algorithms that operate on the entire dataset to draw comparisons between campaigns, analyzing the individual campaign features extracted in the first step. Examples of algorithms used here include latent Dirichlet allocation (LDA). Assignment consists of (1) applying algorithms to compare campaigns extracted from our training data and (2) assigning the results to those campaigns as features for use in predictor training.

Because there are tens of thousands of features that our data points can have, e.g., the presence of specific unigrams, most feature values that any given campaign has are equal to zero. Therefore, with our high-dimensional sparse datasets, we save ample memory with sparse matrices by storing only the non-zero components of each feature vector in memory. We use the sklearn library's FeatureHasher class, which uses a hash function to compute the matrix column corresponding to a specific feature name.

The last two steps are training and executing our classifier, a support vector machine (SVM). An SVM learns a hyperplane that divides test campaigns into successes and failures. We chose an SVM as the foundation of our predictor because of our need to process large numbers of features for each campaign and its ability to handle many different dimensions. This part of our program processes the features and labels of our campaign objects into sparse vectors, trains our SVM, and uses our SVM to predict the success of campaigns in our test dataset as well as in the training dataset.

We use an SVM with a linear kernel instead of a non-linear one since we are essentially trying to solve a text classification problem in which we focus on the language used in the presentation of campaigns. Linear SVMs typically perform well with text classification because text is inherently discrete, sparse, and high-dimensional. With a very large number of features, SVMs generally do not see a notable improvement in performance when data is mapped to a higher-dimensional space, which is what happens with SVMs that have non-linear kernels. In addition, given the massive amount of data that we process, we must also take speed into consideration, which supports the use of SVMs with linear kernels since they train relatively fast. We would also have fewer parameters to tune with grid search.
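A minimal sketch of this stage is shown below; it assumes scikit-learn, the feature names and values are illustrative rather than our exact feature set, and LinearSVC stands in for whichever linear-kernel SVM implementation is used.

    from sklearn.feature_extraction import FeatureHasher
    from sklearn.svm import LinearSVC

    # Each campaign is a dictionary of named features (unigram counts, sentiment,
    # readability, etc.); FeatureHasher maps feature names to sparse matrix columns.
    campaigns = [
        {"unigram=spinning": 1, "unigram=pots": 2, "sentiment": 0.1, "readability": 72.0},
        {"unigram=smart": 1, "unigram=watch": 1, "sentiment": 0.8, "readability": 65.5},
    ]
    labels = [0, 1]  # 1 = fully funded

    hasher = FeatureHasher(n_features=2**18)  # sparse, memory-friendly representation
    X = hasher.transform(campaigns)

    # Linear SVM over the hashed sparse features.
    svm = LinearSVC(C=0.007)  # C value from the tuning described in Section 4.1
    svm.fit(X, labels)
    print(svm.predict(X))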

4.1 Tuning Hyperparameters

Hyperparameter optimization, or model selection, seeks to optimize a measure of an algorithm's performance on any given dataset by choosing a set of hyperparameters for a learning algorithm. This concept is distinct from the actual learning problem, which instead optimizes a loss function on a training set only. In other words, learning algorithms learn parameters that model or reconstruct their inputs well, while hyperparameter optimization seeks to ensure the model does not overfit its data by tuning certain values.

We performed hyperparameter optimization with grid search — also known as parameter sweeping — which simply searches through a manually-specified set of values within the hyperparameter space of a learning algorithm. Given that a grid search algorithm must be guided by some performance metric, we used internal cross-validation on the training set.

Because we use an SVM classifier with a linear kernel, we tuned its regularization constant C. This parameter tells the SVM's optimization process how much we want to avoid misclassifying each training example. For larger values of C, the SVM will choose a smaller-margin hyperplane if that hyperplane leads to greater success in correctly classifying all the training points. On the other hand, smaller values of C will cause the optimizer to look for a larger-margin hyperplane, even if that hyperplane misclassifies more points. For very small values of C, misclassified examples become much more likely, even if the given training data is linearly separable.

To find an optimal value of C for our SVM given our entire training data set and full set of features, we searched through a list of logarithmically-spaced points between 10^-6 and 10^3, ultimately getting an optimal value of approximately 0.007. Much smaller than the default value of 1.0, this value curtails overfitting and helps our classifier better accommodate randomized campaign test samples. This is useful given the innumerable ways language can be used.

4.2 Regularization

As SVM algorithms categorize multidimensional data and provide solutions that generalize to new data points, regularization algorithms seek to find the right balance between fitting training data and avoiding overfitting. Aside from the hyperparameter tuning done for C as discussed in the previous section, we also scale and translate each feature individually such that any given feature's value falls between -1 and 1, inclusive. This technique is useful because of the SVM's classification process. As it tries to minimize the distance between its separating hyperplane and its support vectors, certain features may overshadow others if they tend to have very high values that play a relatively large role in calculating the distance between their corresponding support vectors and the hyperplane. By looking at all the values for a given feature and scaling them to new values between -1 and 1 based on this distribution, we prevent any single feature from dominating a support vector's distance from the hyperplane.
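A minimal sketch of this tuning and scaling step is shown below, assuming scikit-learn; the pipeline layout, number of grid points, and cross-validation folds are illustrative choices rather than our exact configuration.

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import MaxAbsScaler
    from sklearn.svm import LinearSVC

    # Scale each feature into [-1, 1] (MaxAbsScaler preserves sparsity), then fit a
    # linear SVM whose regularization constant C is chosen by cross-validated grid
    # search over logarithmically-spaced values between 10^-6 and 10^3.
    pipeline = Pipeline([("scale", MaxAbsScaler()), ("svm", LinearSVC())])
    param_grid = {"svm__C": np.logspace(-6, 3, 10)}
    search = GridSearchCV(pipeline, param_grid, cv=5)

    # X, y would be the hashed feature matrix and success labels from Section 4.
    # search.fit(X, y)
    # print(search.best_params_)   # in our runs, C was approximately 0.007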

5 Features

We chose the following primary and secondary features due to their contribution to classification accuracy and to time limitations; together they allowed us to achieve high performance with our SVM.

5.1 Unigrams

The first features we chose to use with our SVM were unigrams. To parse the title and summary of a campaign, we used the scikit-learn library's CountVectorizer class, which has a built-in tokenizer that parses strings into unigrams, ensuring that different punctuation marks are handled correctly. Recall that unigrams were the only features used in our baseline, which achieved up to 65% accuracy, so we know that they are powerful features for predicting performance. We gain knowledge of a given campaign's subject and purpose based on the unigrams extracted from its title and summary, as well as some understanding of its sentiment or mood. While unigrams gave us a rather high classification accuracy when used with naive Bayes alone, adding features helped provide more contextual information, such as fine-grained sentiment or language patterns, making the decision boundary even more concrete and well-defined.

5.2 Sentiment Analysis

We used scikit-learn's sentiment analysis algorithm, trained on a corpus of labeled movie reviews. By doing so, we can better understand the attitudes and emotions behind the messages of successful campaign authors. Rather than just using unigrams to determine sentiment, this algorithm uses deep learning techniques to perform sentiment analysis on whole sentences, helping us capture the subtleties of sentiment more accurately. In addition, while we use the sentiment value as a feature, we also use the probabilistic distribution to get a more comprehensive understanding and potentially resolve "neutral" sentiment.

5.3 Parts-of-Speech Tagging

To analyze any correlation between the success of a campaign and its sentence structure, we use the Stanford NLP group's POS tagger to parse parts-of-speech tags within a sentence. The generated POS tree helps us identify linguistic patterns within a sentence that lead to success. For instance, adverbs could be appealing, but too many adjectives and superlatives could detract from the efficacy of the campaign's message. This high-level structure, when combined with semantic understanding, helps provide the model with more comprehensive information for predictions.

5.4 Flesch-Kincaid Readability

Since Kickstarter campaigns should generally be accessible to a wide range of potential contributors, it becomes important to ensure that they are fairly easy to read, leading to our inference that the simplicity of the blurb positively influences the success of a given campaign. Using a combination of word length and sentence length as determinants of the difficulty of a sentence, we apply the Flesch-Kincaid readability tests to the campaigns in our dataset and use the results as features. The results map to the approximate grade level of the audience that can understand the text. The readability score can be calculated as:

206.835 − 1.015 × (tWords / tSent) − 84.6 × (tSyll / tWords)

Here, 'tWords,' 'tSent,' and 'tSyll' are short-hand for 'total words,' 'total sentences,' and 'total syllables,' respectively. A score between 100.0 and 90.0 corresponds to a 5th grade reading level, while a score between 30.0 and 0 corresponds to a reading level best fit for university graduates.

5.5 Latent Dirichlet Allocation

One of our secondary features came from implementing latent Dirichlet allocation, which is used to model text corpora and discover the topics addressed by a document. The fundamental idea of the algorithm is that documents are represented as random mixtures over latent topics, where each topic is characterized by a distribution over words. These topics provide us with more semantic information than a simple term-frequency estimation or basic clustering technique. Furthermore, the fact that we identify multiple topics (as opposed to a single centroid in K-means, for instance) helps us extract more comprehensive information about a given campaign.

We divide our training dataset into two documents based on a campaign's failure or success. The hope is that the algorithm will be able to identify common topics that characterize these two classes of campaigns, boosting our predictions. For estimating model parameters, we use collapsed Gibbs sampling, and with a few optimizations such as the elimination of stop words, we get reasonably logical topics. For example, "creativity," "painting," and "art" constitute one topic. We then use the overlap of a campaign's title and summary with the words constituting the top ten identified topics as our features. Overall, LDA allows the SVM to better understand the different topics that a single campaign touches on.

5.6 Conditional Probability

One of our secondary features incorporates naive Bayes with unigrams, our baseline. We understood that we were not likely to achieve substantial performance gains by using this feature given that unigram occurrences were already being used as SVM features, but we wanted to combine the benefits of the SVM as a more advanced classification algorithm with naive Bayes, which relies on conditional independence. While conditional independence does not completely hold for our dataset, the relatively low degree of overlap (as shown by our naive Bayes performance) was convincing enough that we felt we could potentially extract more information by including the magnitudes of the conditional probabilities (the probability of the input data given that we assume success, and the probability of the input data given that we assume failure) as features in our SVM.
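A minimal sketch of how such class-conditional probabilities can be obtained from a fitted naive Bayes model and attached as extra SVM features is shown below; it assumes scikit-learn, works with log-probabilities for numerical stability, and uses toy data for illustration.

    from scipy.sparse import hstack
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    texts = ["functional pots and ceramics studio", "smart watch for runners"]
    labels = [0, 1]  # 1 = success

    X = CountVectorizer().fit_transform(texts)
    nb = MultinomialNB().fit(X, labels)

    # Per-campaign log P(words | failure) and log P(words | success), up to a
    # class-independent term: the dot product of the unigram counts with the
    # fitted per-class word log-probabilities.
    conditional_log_probs = X @ nb.feature_log_prob_.T   # shape: (n_campaigns, 2)

    # Append the two conditional (log-)probability columns to the SVM feature matrix.
    X_with_nb = hstack([X, conditional_log_probs])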

5.7 Basic Campaign Data

While the preliminary stages of our project focus on the language and latent topics found in a campaign's title and summary, we also include some basic campaign data. Specifically, we include information on how much money a campaign is trying to raise and how much time it has between its inception and its deadline. After all, there are several key questions about the nature of crowdfunding that involve these pieces of information. For example, while it is reasonable to assume that smaller funding goals are easier to reach, there may be many relatively expensive campaigns whose missions better encourage the public to donate. Similarly, while campaigns with more time have more opportunities to be discovered by potential contributors, those with closer deadlines but worthwhile missions may give their backers a greater sense of urgency, encouraging them to reach out to others to help as well. All in all, these basic campaign data can hopefully give our model a better idea of trends that improve a campaign's chances for success.
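As a small illustration, the funding goal and campaign duration can be derived directly from the fields shown in Figure 1; the helper below is a sketch, and the function name is ours rather than part of any library.

    SECONDS_PER_DAY = 60 * 60 * 24

    def basic_campaign_features(campaign):
        """Funding goal and campaign duration (in days) from the raw JSON fields."""
        duration_days = (campaign["deadline"] - campaign["created_at"]) / SECONDS_PER_DAY
        return {"goal": float(campaign["goal"]), "duration_days": duration_days}

    # The failed campaign from Figure 1 ran for roughly 34 days with a goal of 3,000.
    example = {"goal": 3000, "created_at": 1431738817, "deadline": 1434680329}
    print(basic_campaign_features(example))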

6 Results and Discussion

On large datasets, we achieved around 71% test classification accuracy with our SVM once we fully loaded it with our features, beating the performance of our naive Bayes baseline (65%) and getting closer to our oracle's performance (92%).

Figure 4, Accuracy vs. Dataset Size: This shows the average train and test accuracy of the SVM and baseline as we increase the size of the dataset (randomly chosen from the full Kickstarter dataset) in increments of 10,000. Due to time constraints, the baseline was only tested up to a size of 50,000 campaigns.

We see that the training accuracy of our SVM declines as the dataset size increases (ρ = −0.86), but the test accuracy increases (ρ = 0.81) until plateauing around 71%. This suggests that the generalization of the SVM improves as the volume of data grows.

We noticed that naive Bayes occasionally has better test accuracy on smaller datasets (n ≤ 10,000, but it was rare for this to occur near 10,000), which is likely due to the SVM's need for an ample amount of data to appropriately train its classification boundary. The accuracy of the baseline never gets far above 65%.

Figure 3, Overall Accuracy: This shows the average classification accuracy of the SVM, baseline, and oracle on the train and test datasets. Error bars represent standard deviations. The entire Kickstarter dataset (160,000+ campaigns) was used for three different randomizations of the train/test sets, which had an 80/20 split.

6.1 Time Performance

The time for our program to extract features, train, and test increased with dataset size in an O(n²) polynomial fashion.

Analyzing meta-features was the most time-intensive component of our program — executing on the entire Kickstarter dataset took almost an entire day. The baseline was noticeably faster, especially with larger datasets. However, we do not believe that time performance is a very important metric in evaluating our predictor. This is because it is only necessary to update the model periodically, once a significant quantity of new Kickstarter ventures complete their campaigns, and, from the point of view of real-world application, it is only necessary to predict the success of a given campaign a single time: once it is created.

6.2 Performance by Category

Users must choose a category for their campaign at the time of publishing. Example categories include "Web," "Video games," and "Ceramics." We hoped that analyzing our dataset by category would provide insight into the performance of our predictor in specific fields. Unfortunately, no category had more than 10,000 campaigns, which is not enough for us to draw sound conclusions about trends in the performance of our predictor.

Figure 5, Accuracy of SVM on Individual Categories: For illustration of the range of performance on different categories, this shows the categories with the best and worst test classification accuracies.

Across all categories, the test accuracy was 71% ± 11% with a range of 25% to 100%, and the training accuracy was 88% ± 4% with a range of 81% to 100%. The number of campaigns per category ranged from 11 to 4,001, with an average of 1,067 ± 1,148. On a category-by-category basis, there are not enough campaigns to adequately train our SVM, which shows in the wide range of test accuracies and in the discrepancy with training accuracy. In general, the SVM performed better on larger categories, but this was also variable. For example, "Latin" is one of the smallest categories, but it had great test and train accuracies.

6.3 Discarded Features

In addition to the features that made it into the final version of our predictor, there were others that we considered and rejected due to complexity, issues with runtime performance, or lack of justification in terms of increased predictor accuracy.

Regarding primary features, we considered including bigrams as well as unigrams. We thought that bigrams could provide a better indication of the kind of language that a campaign author can use to persuade others to contribute to his or her cause. However, bigrams added a significant number of features and did not contribute much more than unigrams, so we did not include them. We also omitted the author name feature, which was supposed to provide insight into authors with previously successful Kickstarters; the location of the campaign author, which we realized just correlated with population density; and the length of the campaign description, which was subsumed by our Flesch-Kincaid readability feature.

We originally included K-means centroids as campaign meta-features. We obtained a number of topics (centroids) from K-means during the meta-feature assignment step of the training phase. During the testing phase, we assigned the closest centroid to each test campaign as a feature. The idea behind using K-means was to cluster campaigns based on textual similarity, hoping to identify interesting groups (corresponding to different campaign topics). However, because K-means assumes each campaign to have one "topic centroid," we instead use LDA, which allows each campaign to be represented as a combination of topics. Since we are still able to get important clustering information with the added benefit of multiple topics, we decided to use LDA instead of K-means. K-means, as a meta-feature, also had a hugely detrimental effect on runtime performance before we removed it.
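The contrast can be seen in a small sketch like the one below, which assumes scikit-learn; note that scikit-learn's LatentDirichletAllocation uses variational inference rather than the collapsed Gibbs sampling described in Section 5.5, and the toy texts are illustrative.

    from sklearn.cluster import KMeans
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    texts = ["a documentary film about painting and art",
             "an indie video game with hand-painted pixel art",
             "a ceramics studio making functional pots and wares"]
    counts = CountVectorizer(stop_words="english").fit_transform(texts)

    # K-means assigns each campaign to exactly one cluster ("topic centroid") ...
    cluster_ids = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(counts)

    # ... whereas LDA represents each campaign as a mixture over several topics.
    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    topic_mixtures = lda.fit_transform(counts)  # one row of topic proportions per campaign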

7 Future Work

7.1 Analysis of Video Transcripts

As observed by Mollick, including a video of the proposed product or service in the Kickstarter campaign can increase the chances of it achieving full funding. Videos can also play a large role in drawing in the audience emotionally, encouraging them to donate. Therefore, it would be interesting to run natural language parsing tools on video transcripts to evaluate them for pathos and overall effectiveness. More ambitious options would be to run computer vision techniques that classify or describe the objects being presented (e.g., beach, wholesome family, etc.), or to run audio analysis software that can give a good idea of what kinds of sounds the video is presenting (e.g., soothing voice, action-movie background music, etc.).

7.2 Analysis of Campaign Mentions on Social Networks

Etter et al. analyzed the effect of social network interactions on campaigns by constructing a projects-backers graph and monitoring Twitter for tweets that mention campaigns. While our project currently relies on campaign characteristics at the time of inception and does not monitor a campaign's popularity on social networks, it would be interesting to combine the two approaches, especially given the crucial influence of social networks. Other related factors may include the social media presence of a campaign author, or how popular its subject matter is in general.

7.3 Campaign Improvement Recommendations

Our SVM identifies the most important features that lead to a successful campaign, thereby classifying a test campaign as successful or not. The next logical step is to go beyond mere binary classification to logistic regression, scoring a campaign on various parameters. This in turn can be used to offer recommendations that help improve campaigns on certain well-defined aspects. For instance, one of the features could involve the optimal use of certain adverbs in a sentence, and based on previous successful examples, the model could make suggestions to improve the campaign on this particular feature. Features can eventually be combined to help develop the most effective campaigns.

7.4 Role of Rewards

Cialdini highlights the important role of rewards as an incentive for contributing to a campaign. It would be interesting to analyze this further by developing a strategy to associate value with rewards. Difficulties with this involve the different goals and priorities that different contributors can have, but it may be possible to extract trends regarding the nature of the rewards offered that can help with a campaign's success.

7.5 GloVe Representation

We would also like to make use of GloVe, an unsupervised learning algorithm created by the Stanford NLP Group for generating vector representations for words. Training for GloVe (Global Vectors for Word Representation) is performed on a given corpus's statistics on aggregated global word-word co-occurrence, and the representations that the model produces exhibit insightful linear substructures of the word vector space. GloVe models are effective and efficient tools for capturing co-occurrence information, given that they are decently fast to train and scalable to very large corpora. Even with small corpora and vectors with low numbers of dimensions, GloVe can still work reasonably well. For example, using GloVe reveals that the closest words to "frog," aside from "frogs" and "toad," are "litoria," "leptodactylidae," and "rana," which are all uncommonly-used words for different species of frogs (see the sketch at the end of this section). Hopefully, using GloVe will allow us to better understand the underlying semantic connotations found in titles and summaries.

7.6 Coreference Resolution

Coreference resolution refers to the task of clustering different mentions of the same entity. It is particularly useful in NLP tasks such as retrieving information about specific named entities and machine translation. In this context, we believe that this technique will help disambiguate pronominal references to previously mentioned entities, thereby helping our classifier train on more specific mentions. This idea is inspired by "A Multi-Pass Sieve for Coreference Resolution" by Raghunathan et al.
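As a concrete illustration of the GloVe nearest-neighbor behavior mentioned in Section 7.5, the sketch below loads pre-trained vectors and ranks words by cosine similarity; it assumes a downloaded glove.6B.100d.txt file from the Stanford NLP Group's GloVe page, and the helper functions are ours.

    import numpy as np

    def load_glove(path):
        """Read GloVe vectors from a whitespace-separated text file into a dict."""
        vectors = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip().split(" ")
                vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
        return vectors

    def nearest_neighbors(vectors, word, k=5):
        """Return the k words whose vectors have the highest cosine similarity to `word`."""
        target = vectors[word] / np.linalg.norm(vectors[word])
        scores = {w: float(np.dot(target, v / np.linalg.norm(v)))
                  for w, v in vectors.items() if w != word}
        return sorted(scores, key=scores.get, reverse=True)[:k]

    # vectors = load_glove("glove.6B.100d.txt")  # pre-trained vectors from the GloVe project
    # print(nearest_neighbors(vectors, "frog"))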

8 Conclusion

By using metadata, linguistic features, and fine-tuned classification algorithms, we were able to develop a binary classifier that predicts the success of a Kickstarter campaign, with reasonable accuracy, using only the information available at the time of creation. This in turn can help improve the odds of campaigns achieving their goals. While we restrict ourselves to Kickstarter campaigns in this paper, this can broadly be understood as a text classification problem with multiple dependencies influencing the decision. Thus, our work can be used in various fields — from helping evaluate the effectiveness of slide decks for investors and VCs to providing general recommendations for improving documents. It will be interesting to investigate how the model changes with different applications and whether there is a common framework that can be used for these related problems.

9 Acknowledgements

• Timothy Lee: Thank you for your suggestions and support of the project over the course of the quarter.

• Percy Liang: Thank you for helping us hit the ground running with initial ideas and strategies.

10 References

Vincent Etter, Matthias Grossglauser, and Patrick Thiran. "Launch Hard or Go Home!" École Polytechnique Fédérale de Lausanne (EPFL). Lausanne, Switzerland.

Ethan Mollick. "The Dynamics of Crowdfunding: An Exploratory Study." The Wharton School of the University of Pennsylvania. Philadelphia, Pennsylvania.

Cristian Danescu-Niculescu-Mizil, Moritz Sudhof, Dan Jurafsky, Jure Leskovec, and Christopher Potts. "A Computational Approach to Politeness with Application to Social Factors." Stanford University and Max Planck Institute SWS.

Tim Althoff, Cristian Danescu-Niculescu-Mizil, and Dan Jurafsky. "How to Ask for a Favor: A Case Study on the Success of Altruistic Requests." Stanford University and Max Planck Institute SWS.

James W. Pennebaker and Martha E. Francis. Linguistic Inquiry and Word Count. Lawrence Erlbaum Associates, Inc. 1999.

Bo Pang and Lillian Lee. "Opinion Mining and Sentiment Analysis." Yahoo Research and Cornell University.

Joyce Berg, John Dickhaut, and Kevin McCabe. "Trust, Reciprocity, and Social History." University of Minnesota.

Robert Cialdini. Influence: The Psychology of Persuasion. HarperCollins Publishers, LLC.

Karthik Raghunathan, Heeyoung Lee, Sudarshan Rangarajan, Nathanael Chambers, Mihai Surdeanu, Dan Jurafsky, and Christopher Manning. "A Multi-Pass Sieve for Coreference Resolution." Stanford University.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. "GloVe: Global Vectors for Word Representation." Stanford University.

Stanford Log-linear Part-Of-Speech Tagger. http://nlp.stanford.edu/software/tagger.shtml

Vitulskis et al. "Web Robots Kickstarter Datasets." https://webrobots.io/kickstarter-datasets/
