Picking Winners: a Big Data Approach to Evaluating Startups and Making Venture Capital Investments by Ajay Saini S.B
Total Page:16
File Type:pdf, Size:1020Kb
Picking Winners: A Big Data Approach To Evaluating Startups And Making Venture Capital Investments by Ajay Saini S.B. Massachusetts Institute of Technology (2018) Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY June 2018 ○c Massachusetts Institute of Technology 2018. All rights reserved. The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created. Author............................................................. Department of Electrical Engineering and Computer Science May 25, 2018 Certified by . Tauhid Zaman KDD Career Development Professor in Communications and Technology Thesis Supervisor Accepted by. Katrina LaCurts Chair, Master of Engineering Thesis Committee 2 Picking Winners: A Big Data Approach To Evaluating Startups And Making Venture Capital Investments by Ajay Saini Submitted to the Department of Electrical Engineering and Computer Science on May 25, 2018, in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science Abstract We consider the problem of evaluating the quality of startup companies. This can be quite challenging due to the rarity of successful startup companies and the complexity of factors which impact such success. In this work we collect data on tens of thousands of startup companies, their performance, the backgrounds of their founders, and their investors. We develop a novel model for the success of a startup company based on the first passage time of a Brownian motion. The drift and diffusion of the Brownian motion associated with a startup company are a function of features based its sector, founders, and initial investors. All features are calculated using our massive dataset. Using a Bayesian approach, we are able to obtain quantitative insights about the features of successful startup companies from our model. To test the performance of our model, we use it to build a portfolio of companies where the goal is to maximize the probability of having at least one company achieve an exit (IPO or acquisition), which we refer to as winning. This picking winners framework is very gen- eral and can be used to model many problems with low probability, high reward outcomes, such as pharmaceutical companies choosing drugs to develop or studios selecting movies to produce. We frame the construction of a picking winners portfolio as a combinatorial optimization problem and show that a greedy solution has strong performance guarantees. We apply the picking winners framework to the problem of choosing a portfolio of startup companies. Using our model for the exit probabilities, we are able to construct out of sam- ple portfolios which achieve exit rates as high as 60%, which is nearly double that of top venture capital firms. Thesis Supervisor: Tauhid Zaman Title: KDD Career Development Professor in Communications and Technology 3 4 Acknowledgments First, I would like to thank my advisor, Tauhid Zaman, for all of his guidance and mentor- ship throughout the course of this project. His research advising style is exactly the kind that works best for me. He not only gave me the independence to explore problems in the research of my choosing, but also always took the time to work through questions that came up along the way with me. Tauhid always encouraged me to think big in terms of the applications of my work and in the time I worked with him, changed the way I think about problems. His investment in his students and in this project is what made it a truly enjoyable experience. I would also like to thank David Scott Hunter, a graduate student of Tauhid, with whom I began this project. His advice on data collection and modeling insights were invaluable during the first phases of the project. My gratitude also goes out to all the friends I’ve made at MIT. It was the people I spent time with that made my MIT experience as unforgettable and life-changing as it was. Special thanks in particular to both my living group, Pi Lambda Phi, and the MIT Sport Taekwondo team for being such amazing communities and creating so many memorable moments for me. Lastly, I would like to thank my family, whose immense support throughout my life has allowed me to be where I am today. 5 6 Contents 1 Introduction 15 1.1 Our Contributions . 17 2 Literature Review 19 3 Data Analysis 23 3.1 Funding Rounds Data . 24 3.2 Sector, Investor, and Leadership Data and Features . 25 3.2.1 Imputation of Missing Data . 32 4 Model for Startup Company Funding 33 4.1 Brownian Motion Model for Company Value . 33 4.2 Modeling drift and diffusion . 34 4.3 The Data Likelihood . 36 5 Bayesian Model Specification 39 5.1 Homoskedastic Model . 39 5.2 Heteroskedastic Model . 40 5.3 Robust Heteroskedastic Model . 41 5.4 Hierarchical Bayesian Model . 42 5.5 Model Estimation . 43 6 Parameter Estimation Results 45 6.1 Model Fit Scores . 45 7 6.2 Robust Heteroskedastic Model . 46 6.3 Hierarchical Bayesian Model . 49 7 Picking Winners Portfolio 53 7.1 Objective Function . 53 7.2 Independent Events . 54 7.3 Picking Winners and Log-Optimal Portfolios . 56 7.4 Portfolio Construction for Brownian Motion Model . 58 8 Model Validation with Picking Winners Portfolios 61 9 An Online Platform for Venture Capitalists 65 9.1 Company Search and Analysis . 66 9.2 Portfolio Building . 67 9.3 Personalization: Saving Companies and Portfolios . 69 10 Conclusion 71 A Proofs 73 A.1 Proof of Theorem 7.1.1 . 73 A.2 Proof of Lemma 1 . 74 A.3 Proof of Theorem 7.3.1 . 74 A.4 Proof of Theorem 7.3.2 . 76 B Sector Names 81 C Top School Names 83 D Calculation of Company Variance 85 E Sector Groupings for Hierarchical Model 87 F Details of the MCMC Sampler 89 F.1 Details of the Metropolis Step . 89 8 F.2 The Homoskedastic Model Parameters . 90 F.3 The Heteroskedastic and Robust Heteroskedastic Model Parameters . 91 F.4 The Hierarchical Model Parameters . 91 G Companies Chosen in the Portfolios of the four Models 95 9 10 List of Figures 3-1 (left) Histogram of the founding year of the startup companies in our dataset. (right) Distribution of the maximum funding round achieved up to 2016 broken down by company founding year. 26 3-2 (left) Plot of the evolution of funding rounds for different startup com- panies. (right) Boxplot of the time to reach different maximum funding rounds before 2016. 26 3-3 Pie graph of the funding round distribution of companies with maximum acquisition fraction less than 0.5 (left) and greater than 0.5 (right). 28 3-4 Pie graph of the funding round distribution of companies with competitors acquisition less than 0.5 (left) and greater than 0.5 (right). 29 3-5 Pie graph of the funding round distribution of companies with Executive IPO less than 0.5 (left) and greater than 0.5 (right). 32 6-1 Boxplots of the 훽2010 sample values for selected sector (left) and non-sector (right) features. 47 6-2 Boxplots of the 훾 sample values for selected sector (left) and non-sector (right) features. 48 6-3 Boxplots of the 훽2010 sample values for selected features across industry groups. 50 6-4 Boxplots of the 훾 sample values for selected features across industry groups. 51 11 8-1 Performance curve for different model portfolios constructed using com- panies with seed or series A funding in 2011 (left) and 2012 (right). Also shown are the performances of venture capital firms Greylock Partners (Greylock), Founders Fund (Founders), Kleiner Perkins Caufield & Byers (KPCB), Sequoia Capital (Sequoia), and Khosla Ventures (Khosla). For reference the performance curves for a random portfolio and a perfect port- folio are also shown. 62 8-2 A pie graph showing the distribution of companies for both the independent and correlated portfolios for the hierarchical model model. 64 9-1 The advanced company search feature on the picking winners website. 66 9-2 An example of a company information page on the picking winners website. 67 9-3 The portfolio building page with access to the portfolio from categories, portfolio from names, and portfolio from csv upload tools on the picking winners website. 68 12 List of Tables 6.1 The deviance information criterion scores for each of the four models trained with observation year 2010. 46 6.2 Posterior mean of 훽, 훿, and coefficient of variation for the sector features with the largest magnitude coefficient of variation. 48 6.3 Posterior mean of 훽, 훿, and coefficient of variation for the non-sector fea- tures with the largest magnitude coefficient of variation. 49 6.4 Statistics of the posterior distribution of 휈 and 휏 across sectors. The poste- rior mean is shown with the fifth and ninety-fifth quantiles in parentheses. 52 8.1 The top companies in our 2011 portfolio constructed with greedy opti- mization and the robust heteroskedastic correlated model. Shown are the company name, the highest funding round achieved, the predicted exit probability, and the greedy objective value (probability of at least one exit) when the company is added rounded to the nearest hundredth. Companies which exited are highlighted in bold. 63 8.2 The top companies in our 2012 portfolio constructed with greedy opti- mization and the robust heteroskedastic correlated model. Shown are the company name, the highest funding round achieved, the predicted exit probability, and the greedy objective value (probability of at least one exit) when the company is added rounded to the nearest hundredth.