Picking Winners: A Big Data Approach To Evaluating Startups And Making Investments by Ajay Saini S.B., Massachusetts Institute of Technology (2018) Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY June 2018 © Massachusetts Institute of Technology 2018. All rights reserved. The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

Author...... Department of Electrical Engineering and Computer Science May 25, 2018 Certified by...... Tauhid Zaman KDD Career Development Professor in Communications and Technology Thesis Supervisor Accepted by...... Katrina LaCurts Chair, Master of Engineering Thesis Committee

Picking Winners: A Big Data Approach To Evaluating Startups And Making Venture Capital Investments by Ajay Saini

Submitted to the Department of Electrical Engineering and Computer Science on May 25, 2018, in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science

Abstract

We consider the problem of evaluating the quality of startup companies. This can be quite challenging due to the rarity of successful startup companies and the complexity of factors which impact such success. In this work we collect data on tens of thousands of startup companies, their performance, the backgrounds of their founders, and their investors. We develop a novel model for the success of a startup company based on the first passage time of a Brownian motion. The drift and diffusion of the Brownian motion associated with a startup company are a function of features based on its sector, founders, and initial investors. All features are calculated using our massive dataset. Using a Bayesian approach, we are able to obtain quantitative insights about the features of successful startup companies from our model. To test the performance of our model, we use it to build a portfolio of companies where the goal is to maximize the probability of having at least one company achieve an exit (IPO or acquisition), which we refer to as winning. This picking winners framework is very general and can be used to model many problems with low probability, high reward outcomes, such as pharmaceutical companies choosing drugs to develop or studios selecting movies to produce. We frame the construction of a picking winners portfolio as a combinatorial optimization problem and show that a greedy solution has strong performance guarantees. We apply the picking winners framework to the problem of choosing a portfolio of startup companies. Using our model for the exit probabilities, we are able to construct out-of-sample portfolios which achieve exit rates as high as 60%, which is nearly double that of top venture capital firms.

Thesis Supervisor: Tauhid Zaman Title: KDD Career Development Professor in Communications and Technology

Acknowledgments

First, I would like to thank my advisor, Tauhid Zaman, for all of his guidance and mentorship throughout the course of this project. His research advising style is exactly the kind that works best for me. He not only gave me the independence to explore problems in the research of my choosing, but also always took the time to work through questions that came up along the way with me. Tauhid always encouraged me to think big in terms of the applications of my work and in the time I worked with him, changed the way I think about problems. His investment in his students and in this project is what made it a truly enjoyable experience. I would also like to thank David Scott Hunter, a graduate student of Tauhid, with whom I began this project. His advice on data collection and modeling insights were invaluable during the first phases of the project. My gratitude also goes out to all the friends I’ve made at MIT. It was the people I spent time with that made my MIT experience as unforgettable and life-changing as it was. Special thanks in particular to both my living group, Pi Lambda Phi, and the MIT Sport Taekwondo team for being such amazing communities and creating so many memorable moments for me. Lastly, I would like to thank my family, whose immense support throughout my life has allowed me to be where I am today.

Contents

1 Introduction 15 1.1 Our Contributions ...... 17

2 Literature Review 19

3 Data Analysis 23 3.1 Funding Rounds Data ...... 24 3.2 Sector, Investor, and Leadership Data and Features ...... 25 3.2.1 Imputation of Missing Data ...... 32

4 Model for Startup Company Funding 33 4.1 Brownian Motion Model for Company Value ...... 33 4.2 Modeling drift and diffusion ...... 34 4.3 The Data Likelihood ...... 36

5 Bayesian Model Specification 39 5.1 Homoskedastic Model ...... 39 5.2 Heteroskedastic Model ...... 40 5.3 Robust Heteroskedastic Model ...... 41 5.4 Hierarchical Bayesian Model ...... 42 5.5 Model Estimation ...... 43

6 Parameter Estimation Results 45 6.1 Model Fit Scores ...... 45

6.2 Robust Heteroskedastic Model ...... 46 6.3 Hierarchical Bayesian Model ...... 49

7 Picking Winners Portfolio 53 7.1 Objective Function ...... 53 7.2 Independent Events ...... 54 7.3 Picking Winners and Log-Optimal Portfolios ...... 56 7.4 Portfolio Construction for Brownian Motion Model ...... 58

8 Model Validation with Picking Winners Portfolios 61

9 An Online Platform for Venture Capitalists 65 9.1 Company Search and Analysis ...... 66 9.2 Portfolio Building ...... 67 9.3 Personalization: Saving Companies and Portfolios ...... 69

10 Conclusion 71

A Proofs 73 A.1 Proof of Theorem 7.1.1 ...... 73 A.2 Proof of Lemma 1 ...... 74 A.3 Proof of Theorem 7.3.1 ...... 74 A.4 Proof of Theorem 7.3.2 ...... 76

B Sector Names 81

C Top School Names 83

D Calculation of Company Variance 85

E Sector Groupings for Hierarchical Model 87

F Details of the MCMC Sampler 89 F.1 Details of the Metropolis Step ...... 89

F.2 The Homoskedastic Model Parameters ...... 90 F.3 The Heteroskedastic and Robust Heteroskedastic Model Parameters ...... 91 F.4 The Hierarchical Model Parameters ...... 91

G Companies Chosen in the Portfolios of the four Models 95

List of Figures

3-1 (left) Histogram of the founding year of the startup companies in our dataset. (right) Distribution of the maximum funding round achieved up to 2016 broken down by company founding year...... 26

3-2 (left) Plot of the evolution of funding rounds for different startup companies. (right) Boxplot of the time to reach different maximum funding rounds before 2016...... 26

3-3 Pie graph of the funding round distribution of companies with maximum acquisition fraction less than 0.5 (left) and greater than 0.5 (right)...... 28

3-4 Pie graph of the funding round distribution of companies with competitors acquisition less than 0.5 (left) and greater than 0.5 (right)...... 29

3-5 Pie graph of the funding round distribution of companies with Executive IPO less than 0.5 (left) and greater than 0.5 (right)...... 32

6-1 Boxplots of the 훽2010 sample values for selected sector (left) and non-sector (right) features...... 47

6-2 Boxplots of the 훾 sample values for selected sector (left) and non-sector (right) features...... 48

6-3 Boxplots of the 훽2010 sample values for selected features across industry groups...... 50

6-4 Boxplots of the 훾 sample values for selected features across industry groups. 51

8-1 Performance curve for different model portfolios constructed using companies with seed or series A funding in 2011 (left) and 2012 (right). Also shown are the performances of the venture capital firms Greylock Partners (Greylock), Founders Fund (Founders), Kleiner Perkins Caufield & Byers (KPCB), Sequoia Capital (Sequoia), and Khosla Ventures (Khosla). For reference the performance curves for a random portfolio and a perfect portfolio are also shown...... 62 8-2 A pie graph showing the distribution of companies for both the independent and correlated portfolios for the hierarchical model...... 64

9-1 The advanced company search feature on the picking winners website. . . . 66 9-2 An example of a company information page on the picking winners website. 67 9-3 The portfolio building page with access to the portfolio from categories, portfolio from names, and portfolio from csv upload tools on the picking winners website...... 68

List of Tables

6.1 The deviance information criterion scores for each of the four models trained with observation year 2010...... 46

6.2 Posterior mean of 훽, 훿, and coefficient of variation for the sector features with the largest magnitude coefficient of variation...... 48

6.3 Posterior mean of 훽, 훿, and coefficient of variation for the non-sector features with the largest magnitude coefficient of variation...... 49

6.4 Statistics of the posterior distribution of 휈 and 휏 across sectors. The posterior mean is shown with the fifth and ninety-fifth quantiles in parentheses...... 52

8.1 The top companies in our 2011 portfolio constructed with greedy optimization and the robust heteroskedastic correlated model. Shown are the company name, the highest funding round achieved, the predicted exit probability, and the greedy objective value (probability of at least one exit) when the company is added rounded to the nearest hundredth. Companies which exited are highlighted in bold...... 63

8.2 The top companies in our 2012 portfolio constructed with greedy optimization and the robust heteroskedastic correlated model. Shown are the company name, the highest funding round achieved, the predicted exit probability, and the greedy objective value (probability of at least one exit) when the company is added rounded to the nearest hundredth...... 64

G.1 The top companies in our 2011 portfolio constructed with greedy optimization and the homoskedastic independent model...... 95

G.2 The top companies in our 2011 portfolio constructed with greedy optimization and the homoskedastic correlated model...... 96
G.3 The top companies in our 2012 portfolio constructed with greedy optimization and the homoskedastic independent model...... 96
G.4 The top companies in our 2012 portfolio constructed with greedy optimization and the homoskedastic correlated model...... 96
G.5 The top companies in our 2011 portfolio constructed with greedy optimization and the heteroskedastic independent model...... 97
G.6 The top companies in our 2011 portfolio constructed with greedy optimization and the heteroskedastic correlated model...... 97
G.7 The top companies in our 2012 portfolio constructed with greedy optimization and the heteroskedastic independent model...... 97
G.8 The top companies in our 2012 portfolio constructed with greedy optimization and the heteroskedastic correlated model...... 98
G.9 The top companies in our 2011 portfolio constructed with greedy optimization and the robust heteroskedastic independent model...... 98
G.10 The top companies in our 2011 portfolio constructed with greedy optimization and the robust heteroskedastic correlated model...... 98
G.11 The top companies in our 2012 portfolio constructed with greedy optimization and the robust heteroskedastic independent model...... 99
G.12 The top companies in our 2012 portfolio constructed with greedy optimization and the robust heteroskedastic correlated model...... 99
G.13 The top companies in our 2011 portfolio constructed with greedy optimization and the hierarchical independent model...... 99
G.14 The top companies in our 2011 portfolio constructed with greedy optimization and the hierarchical correlated model...... 100
G.15 The top companies in our 2012 portfolio constructed with greedy optimization and the hierarchical independent model...... 100
G.16 The top companies in our 2012 portfolio constructed with greedy optimization and the hierarchical correlated model...... 100

Chapter 1

Introduction

An important problem in venture capital is evaluating the quality of a startup company. This problem is known to be extremely challenging for a variety of reasons. First, successful companies are rare, so there are not many to study in order to discern patterns of success. Second, at the time a startup company is founded, there is not very much data available. Typically one only knows basic information about the company’s founders, the sector or market in which it operates, and the identity of initial investors. It is not a priori clear if this raw data is sufficient to allow one to measure a startup company’s quality or predict whether or not it will succeed. Third, it is not obvious how to model the evolution of a startup company and how to use this model to measure its quality. Fourth, one needs a principled approach to test the validity of any startup company model or quality measure. The problem of evaluating startup company quality is faced by venture capital firms. These firms spend a large amount of time and resources screening and selecting startup company deals (Da Rin et al. 2011, Metrick and Yasuda 2011, Sahlman 1990). In particular, firms take a hands-on approach to investing by evaluating the attractiveness and risk behind every investment, considering factors that include relative market size, strategy, technology, customer adoption, competition, and the quality and experience of the management team. The investment screening process is oftentimes a thorough and intensive one that can take several months. Venture capital firms could greatly improve their investment process if they had a quantitative model which could perform a preliminary evaluation of the quality of a large number of potential deals. They could use such a model to prioritize and more

efficiently allocate their human resources for evaluating different deals.

While venture capital firms still make up a large majority of the investors in early-stage startups, with the creation and adoption of the Jumpstart Our Business Startups (JOBS) Act (Congress 2012), early-stage investment opportunities are now available to the everyday investor in the United States. Before this policy change, only certain accredited investors were allowed to take equity stakes in most private companies. With the adoption of this policy, there are now many companies such as FlashFunders and NextSeed (FlashFunders 2018, NextSeed 2018) that assist the everyday investor in finding and participating in startup investment opportunities. While these sites provide some relevant information about the startup companies of interest, users of these sites do not have direct access to the company in the same way that a venture capital firm does. Thus, oftentimes they do not have the information needed to execute a thorough and intensive selection process. Intuition suggests that these investment opportunities will have a lower success rate than that of venture capital backed startups. This means that there is a need for quantitative measures of startup company quality which can be developed from publicly available data. Such measures can serve to level the playing field and empower the everyday investor.

It is useful to take an operational perspective for validating a model for startup company quality. Such a model would be part of the process used by venture capital firms or other investors to build portfolios of startup companies. Therefore, the operational value of a model is its ability to produce successful portfolios. In typical venture capital portfolios, a few companies are responsible for most of the returns (Peters 2014). Startups can produce excessively high returns if they exit, which means they are either acquired or they have an initial public offering (IPO). For instance, Peter Thiel invested $500,000 in Facebook for 10% of the company in 2004, and in 2012 he sold these shares for $1 billion, resulting in a return of 2,000 times his initial investment (Julianne Pepitone and Stacy Cowley 2012).

In general, an investor in a startup company will lose money unless the company exits, which we refer to as winning. Because only a few winners are needed for a portfolio to be profitable, one way to build a portfolio is to pick companies that maximize the probability of at least one winner. We refer to this as the picking winners problem. In addition to building portfolios of startup companies, this framework can be used to model a variety of

problems which have a low probability, high reward structure. For instance, most new drugs developed by pharmaceutical companies fail. However, with a small probability some of the drugs are successful. These winning drugs provide an immense profit for the company, as well as a large benefit to society. Having one winning drug can be enough for a company to be successful. For another example, consider a studio that has to select a set of movies to produce. The studio will be successful if they can produce one blockbuster. In these examples, one approach to the problem is to select a portfolio of items (drugs, movies, startup companies) to maximize the probability that at least one of them has exceptional performance, or “wins”.
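To make this objective concrete, the following minimal Python sketch (not from the thesis; the items and win probabilities are hypothetical) greedily builds a portfolio that maximizes the probability of at least one winner when the win events are assumed independent, in which case that probability is one minus the product of the individual failure probabilities.

    # Greedy "picking winners" sketch under an independence assumption.
    # The candidate items and their win probabilities are hypothetical.
    def prob_at_least_one_winner(probs):
        """P(at least one win) = 1 - prod(1 - p_i) for independent events."""
        q = 1.0
        for p in probs:
            q *= (1.0 - p)
        return 1.0 - q

    def greedy_portfolio(candidates, k):
        """candidates: dict mapping item -> estimated win probability."""
        chosen = []
        remaining = dict(candidates)
        for _ in range(min(k, len(candidates))):
            # Add the item that gives the largest increase in the objective.
            best = max(remaining, key=lambda c: prob_at_least_one_winner(
                [candidates[x] for x in chosen] + [candidates[c]]))
            chosen.append(best)
            del remaining[best]
        return chosen

    candidates = {"startup_a": 0.15, "startup_b": 0.08, "drug_x": 0.03}
    portfolio = greedy_portfolio(candidates, k=2)
    print(portfolio, prob_at_least_one_winner([candidates[c] for c in portfolio]))

Under independence the greedy step simply picks the largest remaining probability; the correlated case treated later in the thesis requires the joint distribution of the win events.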

1.1 Our Contributions

In this work we present a data-driven approach to evaluate the quality of startup companies. Our results are especially useful to venture capital firms which may have to analyze thousands of startup companies before making investment decisions. We provide a way to quantify the quality of a startup company. Our work involves a large, comprehensive dataset on startup companies, a novel model for their evolution, and a new approach to validate our model. Our first contribution is the collection and analysis of a massive dataset for several thousand companies. This data consists of information on the companies’ funding round dates, investors, and detailed professional and educational history of the founders. Exploratory analysis of this data reveals interesting properties about the progression of the funding rounds of startup companies and factors that impact their success. Our second contribution is a predictive model for the evolution of a startup company’s funding rounds. This model describes a startup company using a Brownian motion with a company dependent drift and diffusion. The first passage times of the Brownian motion at different levels correspond to the startup company receiving a new round of funding, and the highest level corresponds to the company exiting. One useful property of our model is that it naturally incorporates the censoring that occurs due to limited observation times. We estimate the model parameters using a Bayesian approach. This allows us to perform a

statistical analysis of features that impact company success and how this impact varies by industry. Our third contribution is a validation of our model with picking winners portfolios. We build such portfolios for future years using model estimation results on past data. We find that our portfolios achieve exit rates as high as 60%. The performance of our portfolios exceeds that of top venture capital firms. This not only provides evidence supporting our model, but it also demonstrates its potential operational utility for investment decisions. Our final contribution is a web platform which makes our data analysis, model estimation, and portfolio building results publicly available to venture capital investors. In addition to both a company search feature and a portfolio building feature, our platform provides a company tracking and portfolio saving feature. The platform has applications in helping venture capital investors understand which features about a company are anomalous and indicative of success from a data perspective, prioritize companies to look at, and also in training new investors to make better portfolios. The remainder of the paper is structured as follows. We begin with a literature review in Section 2. We present an exploratory analysis of our startup company data in Section 3. In Section 4 we introduce our Brownian motion model for startup funding. We take a Bayesian approach to develop four variations of our core model in Section 5. In Section 5.5 we discuss the estimation of our model which is further detailed in Appendix F. We study the insights gained from the four estimated Brownian motion models in Section 6. In Section 7 we formulate the picking winners problem and discuss some of its theoretical properties and relationship with submodular optimization and log-optimal portfolios. We present the performance of picking winners portfolios based on our model in Section 8. In Section 9 we describe in detail the functionality of a web platform we built to make our results publicly accessible to venture capital investors. We conclude in Section 10. All proofs as well as more detailed information about the data, model formulation, and model estimation are included in the appendix.

Chapter 2

Literature Review

There have been several studies that take a quantitative approach to determining what factors are correlated with startup company success. It is argued in Porter and Stern (2001) that geographic location is an important factor for success in innovative activity. The work in (Guzman and Stern 2015b) and (Guzman and Stern 2015a) uses a logit model to predict startup success based on features such as company name, patents, and registration, and then uses the model to assess the entrepreneurial quality of geographic locations. In (Azoulay et al. 2018) it is found that successful entrepreneurs are generally older. Eesley et al. (2014) found that founding team composition has an effect on the startup’s ultimate performance. In particular, this study found that more diverse teams tend to exhibit higher performance, where higher performance is measured by exit rate. Additionally, researchers have considered how startup success correlates with how connected the founders are within a social network (Nann et al. 2010). Another study (Gloor et al. 2011) considers how startup entrepreneurs’ e-mail traffic and social network connectedness correlate with startup success. In Marmer et al. (2011), the researchers studied how the founders of Silicon Valley startups influence a company’s success and found that the founders’ ability to learn and adapt to new situations is most important to company success. In (Xiang et al. 2012) the authors use topic modeling on news articles to predict acquisitions. Nanda et al. (2017) find that with venture capital firms, successful investments increase the likelihood of success in future investments. The question of building portfolios of startup companies was studied in (Bhakdi 2013),

where a simulation-based approach was used to properly structure a portfolio of early-stage startups to adjust for market volatility. In the private sector, several venture capital firms have been adopting analytical tools to assist in the investing process. Google Ventures (Google 2018), the venture capital arm of Alphabet, Inc., is known to use quantitative algorithms to help with investment decisions (Kardashian 2015). Social Capital (SocialCapital 2018) is a venture capital firm that has a service called capital-as-a-service which makes investment decisions based on data about a company (Carroll 2017). Correlation Ventures (Correlation Ventures 2018), a venture capital firm with over $350 million under management, uses a predictive model to make startup company investment decisions. Correlation Ventures has truly adopted a quantitative approach to their venture capital decisions — they ask startups to submit five planning, financial, and legal documents, they take only two weeks to make an investment decision, and they never take administrative board seats. Interestingly enough, they only take co-investment opportunities, and they rely on some vetting by a primary investor (Leber 2012).

There is a large body of research related to the picking winners problem. There is a natural relationship between this problem and a variety of covering problems. There is the problem of placing a fixed number of sensors in a region to detect some signal as fast as possible. This has been studied for water networks (Leskovec et al. 2007) and general spatial regions (Guestrin et al. 2005). There is also a problem in information retrieval of selecting a set of documents from a large corpus to cover a topic of interest (Agrawal et al. 2009, Chen and Karger 2006, El-Arini et al. 2009, Swaminathan et al. 2009). Another related problem is selecting a set of users in a social network to seed with a piece of information in order to maximize how many people the message reaches under a social diffusion process (Kempe et al. 2003). Another way to view this problem is that one wants the message to reach a certain target user in the network, but one does not know precisely who this user is, so one tries to cover as much of the network as possible. One can view the sensors, documents, or seed users as items in a portfolio and winning means one of the sensors detects the signal, one of the documents covers the relevant topic, or one of the seed users is able to spread the message to the unknown target user.

The similarity between picking winners and these covering problems has important implications.

The objective functions in these covering problems are submodular and maximizing them is known to be NP-hard (Karp 1975). The same holds for the picking winners problem. The submodular property is important because it means that one has a lower bound on the performance of a greedy solution (Nemhauser et al. 1978). This fact, plus the simplicity with which one can calculate greedy solutions, motivates us to use a greedy approach to pick winners. There are also interesting connections between the picking winners problem and traditional portfolio construction methods. For instance, if the items have binary returns, meaning the return is either very high or very low, then there are striking similarities between the optimal portfolio in the picking winners problem and what is known as the log-optimal portfolio (Cover 1991). In fact, we show in Section 7 that when there is independence the portfolios are equivalent, and under certain conditions, the picking winners portfolio produces a strong solution to the log-optimal portfolio selection problem.

Chapter 3

Data Analysis

We collected a large amount of relevant data regarding startup company performance and features. For each company, there were two categories of data we acquired: the time it reaches different funding rounds and features relating to its sector, investors, and leadership. We obtained our data from three different sources. The first is Crunchbase (CrunchBase 2018), a public database of startup companies. The information in Crunchbase is provided by public users, subject to approval by a moderator (Kaufman 2013). The second is a privately maintained database known as Pitchbook that also collects information on startup companies (Pitchbook 2018). The two databases combined provide us with data for over 83,000 companies founded between 1981 and 2016. These databases give us information about when a company receives a funding round or achieves an exit, and which investment groups are involved in a particular funding round. Additionally, these databases provide us with information on over 558,000 company employees. In particular, we have some information relating to which companies a person has worked for, their roles within these companies, and their employment timeline. To build on this person-level information, we also use LinkedIn, a business and employment-focused social network (LinkedIn 2018). Including LinkedIn data gave us complete educational and occupational histories for the employees.

3.1 Funding Rounds Data

A startup company typically receives funding in a sequence of rounds. The initial funding is known as the seed round. After this, the company can receive funding in a series of ‘alphabet rounds’ which are called series A, series B, etc. The alphabet rounds typically do not go beyond series F. Additionally, a company can reach an exit, which is when a startup company is either acquired or has an IPO. Crunchbase provided the funding round and exit dates for each startup company, while the Pitchbook data that we used only provided the founding date. Therefore, our base set of companies was dictated by Crunchbase. We used Pitchbook to resolve founding date errors in Crunchbase. For instance, there were some companies in Crunchbase where the listed founding date occurred after later funding rounds. In these cases, we ignored the Crunchbase founding date and used the Pitchbook date.

For our analysis we only use companies founded in 2000 or later. We use this cutoff because before this year the databases did not contain very many unsuccessful companies, creating a biased dataset. Furthermore, for our analysis we are particularly interested in measuring the likelihood of a company in an early round eventually reaching an exit. Therefore, we only consider startup companies where we have reliable information on their seed or series A funding rounds. In particular, if we do not have information on when a startup company reached one of these early rounds, then we omit them from our analysis. Additionally, we chose to focus on startup companies that are founded in the United States. Using 2000 as a cutoff year and only considering American companies with reliable information for their early rounds left us with approximately 24,000 companies founded between 2000 and 2016. We plot the number of companies founded in each year in Figure 3-1. As can be seen, there are a few companies in the early years, but by 2011 the number of companies increases by a large amount. This increase is most likely due to the fact that Crunchbase was founded in 2011 and after this date many companies began entering their information into the database. The drop in the number of companies in 2016 is due to the fact that we collected our data in the middle of 2016.
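As a rough illustration of these filtering steps, the short pandas sketch below applies the founding-year, country, and early-round criteria; the file name and column names (founded_year, country, seed_date, series_a_date) are assumptions for illustration and not the actual schema used in this work.

    # Illustrative filtering of the company table; column names are assumed.
    import pandas as pd

    companies = pd.read_csv("companies.csv")  # hypothetical input file

    mask = (
        (companies["founded_year"] >= 2000)
        & (companies["country"] == "USA")
        # keep only companies with reliable seed or series A timing information
        & (companies["seed_date"].notna() | companies["series_a_date"].notna())
    )
    companies = companies[mask]
    print(len(companies), "companies retained")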

We first wanted to understand the distribution of the maximum funding round achieved

(as of 2016) for these companies. We plot in Figure 3-1 this distribution broken down by year. We observe a few interesting properties from this figure. First, the average fraction of companies founded in a given year that have an IPO is typically a few percent. For acquisitions, this value goes up to 21%, but as can be seen, the fraction of acquisitions changes a great deal over the years. After 2011, the fraction of acquisitions falls below 14% and decreases each subsequent year. The decrease is most likely due to data censoring, as companies founded in these later years may not have had enough time to exit. From 2000 to 2002 the fraction of acquisitions actually increases. This may be due to the small number of companies in our dataset and sampling bias leading to an under-reporting of less successful companies. We next wish to understand the temporal dynamics of the funding rounds. Figure 3-2 shows the evolution of funding rounds for different well known companies that achieved IPOs. As can be seen, the rate of achieving an IPO fluctuates between companies. The boxplot in Figure 3-2 shows the distribution of the time to hit the different rounds. From here it can be seen that IPOs happen after about six years, but acquisitions typically take three years. The time between the other funding rounds (series B, series C, etc.) is about a year. The typical variation in these times across companies is about one to two years. Furthermore, we can see that most of the acquisitions happen around the same time that most companies are at series C funding. Beyond this point, there are fewer companies that get acquired, thus indicating that as a company gets older and progresses past series C, its exit will more likely be an IPO rather than an acquisition if it ever exits.

3.2 Sector, Investor, and Leadership Data and Features

We now describe the data and features that we used in our model. We were able to build a wide variety of features for our model, ranging from simple features such as a company’s sector, to more complex features relating to the similarity of the academic and occupational backgrounds of the startup founders. The features for each company are constructed using only information known at the time the company received its first funding round, which we refer to as the baseline date of the company. We now describe these features in more detail.


Figure 3-1: (left) Histogram of the founding year of the startup companies in our dataset. (right) Distribution of the maximum funding round achieved up to 2016 broken down by company founding year.


Figure 3-2: (left) Plot of the evolution of funding rounds for different startup companies. (right) Boxplot of the time to reach different maximum funding rounds before 2016.

Sector Data and Features. The majority of the features that were used in our model relate to the sector that a startup company was involved in. In particular, from Crunchbase we were able to obtain sector labels for each company, with some of the companies belonging to multiple sectors. While Crunchbase has informative labels for many different features, we chose to include binary indicators for a subset of them. We made a point to include a wide variety of sectors in our model, ranging from Fashion to Artificial Intelligence. We provide a complete list of the sectors that we used in Appendix B.

Investor Data and Features. From the Crunchbase data, we are able to construct a dynamic network of investors and companies, such that there is a node for every investor and company that exists at the particular time of interest, and there is an edge connecting an investor to a company if the investor has participated in a funding round for that company before the particular time of interest. We construct this dynamic network using all of our available data – meaning that we consider roughly 83,000 companies and 48,000 investors. We derive features for an individual company based on this dynamic network at the time when the company had its first funding round. Recall that we omit companies without reliable time information for their seed or series A funding rounds. Therefore, for a particular company we consider the dynamic network of investors at the time of the earliest of these rounds. Using this dynamic network, we construct a feature that we call the investor neighborhood.

For each initial investor 푗 connected to company 푖, we define 푓푗 as the fraction of companies connected to 푗 at 푡푖 that also had an IPO before 푡푖. The feature value is then the maximum value of 푓푗 amongst all initial investors in 푖. A related feature we define is called maximum acquisition fraction. This feature is identical to maximum IPO fraction, except we use the fraction of companies that were ac-

27 2011 Max. acquisition fraction <0.5 2011 Max. acquisition fraction >0.5 709 companies 8 companies IPO

Acquisition Acquisition

Series A Series A Series F Series E Series D

Series C Series C

Series B Series B

Figure 3-3: Pie graph of the funding round distribution of companies with maximum ac- quisition fraction less than 0.5 (left) and greater than 0.5 (right).

quired rather than the fraction that had an IPO. Both maximum IPO fraction and maximum acquisition fraction are measures of the success rate of a company’s initial investors. To visualize the impact of the investor features, we plot the distribution of companies final funding round conditioned on the maximum acquisition fraction feature (rounded to the nearest whole number) in Figure 3-3.We can see that the exit rate is slightly higher for companies with a high value for this feature. To be precise, among companies where this feature is less than 0.5 (709 companies), 22% are acquisitions and 1% are IPOs, while among companies where this feature is greater than 0.5 (8 companies), 25% are acquisi- tions. This is a small difference, but still seems to agree with previous studies (Nanda et al. 2017). However, because of the small number of companies with a high value for this feature, one cannot make any strong conclusions from Figure 3-4. Competitors Data and Features. We also construct a dynamic network of companies where edges between companies signify that they are competitors. Specifically, a directed

edge 푒푖푗 exists between company 푖 with baseline date 푡푖 and company 푗 with baseline date

푡푗 < 푡푖 if at least one of company 푖 or 푗 listed the other as a competitor on Crunchbase or if both companies 푖 and 푗 listed the same company 푘 with baseline date 푡푘 < 푡푖 as a customer on Crunchbase. Using this network, we define features competitors A, competitors B, competitors C, competitors D, competitors E, competitors F, competitors acquisition, and competitors

28 2011 Competitors acquisition <0.5 2011 Competitors acquisition >0.5 713 companies 4 companies IPO

Acquisition

Series A Acquisition Series F Series E Series A Series D

Series C Series B Series B

Figure 3-4: Pie graph of the funding round distribution of companies with competitors acquisition less than 0.5 (left) and greater than 0.5 (right).

IPO. Each of these features for a company 푖 is calculated as the number of edges 푒푖푗 point- ing to companies 푗 whose highest round at time 푡푖 is the specified round for the feature di- vided by the out-degree of 푖. Additionally, for every company 푖 we include a feature called had competitor info which is a binary indicator for whether the company self-reported it’s competitors on Crunchbase. We include had competitor info as a feature because whether or not founders were willing to self-report their competitors is potentially a psychological factor which influences whether or not the founders are able to run a successful company. To visualize the impact of the competitor features, we plot the distribution of compa- nies final funding round conditioned on the competitors acquisition feature (rounded to the nearest whole number) in Figure 3-4. We can see that the exit rate is slightly higher for companies with a high value for this feature. Among companies where this feature is less than 0.5 (713 companies), 22% are acquisitions and 1% are IPOs, while among companies where this feature is greater than 0.5 (4 companies), 25% are acquisitions. As with the investment feature from Figure 3-3, the huge imbalance in companies prevents one from making any strong conclusions from Figure 3-4. Leadership Data and Features. The leadership features of a company are derived from the Crunchbase and LinkedIn data for its founders and executives. We are particularly interested in measuring the experience, education, and ability of the company’s leadership. We first use the Crunchbase data to consider the employees, executives, and advisors of

29 the company of interest. We construct indicator features Job IPO, Job Acquired, Executive IPO, Executive Acquired, Advisory IPO, and Advisory Acquired which indicate if someone affiliated with the company has been part of a previous company that was either acquired or reached an IPO before our date of interest. In particular, for the Job variables we consider people who work for the company but are not executives or advisors, for the Executive variables we consider people who are labeled as an executive in the company, and for the Advisor variables we consider people who are labeled as advisors but not executives.

Next we consider the features based on the LinkedIn data. Note that we took particular care to ensure that we only use LinkedIn information that could be known before our time of interest. One LinkedIn feature that we use is previous founder which is the fraction of the leadership that had previously founded a company before the given company’s baseline date. We also use number of companies affiliated, which is the average number of compa- nies each leadership member was affiliated with before joining the given company. A final pair of experience features is work overlap mean and work overlap standard deviation. To construct these features, we calculate the Jaccard index of previous companies for each pair of leadership members. The Jaccard index is defined as the intersection of previous com- panies divided by the union of previous companies for the pair of members. We then take the mean and standard deviation of these values across all pairs of leadership members to obtain the two features. We chose to construct these feature using the LinkedIn data instead of the Crunchbase data, because we often found that the LinkedIn information suggested that a person had previous startup experience, when the Crunchbase data did not contain this information. We believe that this is due to the fact that Crunchbase is slightly biased to include companies that were somewhat successful.

An education feature we use is from top school which is the fraction of leadership members that went to a top school. This list of top schools was created from a combination of known top rankings (U. S. News and World Report National Universities Rankings 2017) and our own knowledge and is provided in Appendix C. We also have features for the highest education level of the leadership, measured by degree received. These features are high school, bachelor’s, master’s, and Ph.D. For each degree, we measure the fraction of leadership members whose maximum education level equals that degree to obtain the feature value.

We also have features based on the education and academic major overlap. These features are education overlap mean, education standard deviation, major overlap mean, and major standard deviation. For each one we calculate the relevant Jaccard index over all pairs of leadership members, and then take the mean and standard deviation. For education, the Jaccard index is taken with respect to the schools attended by each member and for major, the Jaccard index is taken with respect to the academic majors of each member.

A more complex feature we use is major company similarity which captures the similarity of the academic major of the leadership and the company sector. We use the WordNet lexical database to create a semantic similarity score between each member’s major and the sector of their company (NLTK 3.0 documentation 2017). We use the Palmer-Wu similarity score, which measures the similarity of words in a semantic network based on the distances to their most common ancestor and to the root (Wu and Palmer 1994). This score is zero for totally different words and one for equivalent words. We average the Palmer-Wu major-sector similarity score for each member to obtain the feature value.

A final feature we use is leadership age. This is simply the average age of all leadership members when the company was founded. To estimate the age we assume that each member is 18 when finishing high school and 22 when finishing undergraduate education. With this assumption, we set the age of a member to be equal to 22 + (company founding year − year the member received or would have received his undergraduate degree). For emphasis, we mention again that great care was taken to ensure that the information that we used from LinkedIn did not violate causality. In particular, we made sure that an investor could have known the information that we acquired from LinkedIn when they were considering investing in a startup.

In order to understand the impact of the leadership features, we plotted the distribution of companies by funding round with the leadership features rounded to the nearest whole number. Shown in Figure 3-5 is an example for the feature Executive IPO. From these graphs, we can see that there is a higher exit rate for companies with a high Executive IPO value. The large difference in the exit rates suggests that previous success is a good indicator of future success.
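The major company similarity feature described above can be sketched with NLTK’s WordNet interface, whose wup_similarity method implements the Wu-Palmer score; mapping each major and sector to its noun synsets, as done below, is a simplifying assumption rather than the exact procedure used in the thesis.

    # Average Wu-Palmer similarity between leadership majors and the company sector.
    # Requires the WordNet corpus: nltk.download('wordnet')
    from nltk.corpus import wordnet as wn

    def wu_palmer(word_a, word_b):
        """Maximum Wu-Palmer similarity over noun synsets of the two words (0 if none)."""
        best = 0.0
        for sa in wn.synsets(word_a, pos=wn.NOUN):
            for sb in wn.synsets(word_b, pos=wn.NOUN):
                score = sa.wup_similarity(sb)
                if score is not None:
                    best = max(best, score)
        return best

    def major_company_similarity(leader_majors, company_sector):
        scores = [wu_palmer(major, company_sector) for major in leader_majors]
        return sum(scores) / len(scores) if scores else 0.0

    print(major_company_similarity(["biology", "chemistry"], "biotechnology"))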


Figure 3-5: Pie graph of the funding round distribution of companies with Executive IPO less than 0.5 (left) and greater than 0.5 (right).

3.2.1 Imputation of Missing Data

For many of the LinkedIn features we were unable to obtain data. We still needed some value for these features to use the resulting feature vector in our models. To do this we imputed the missing values. We concatenated all the 푀-dimensional feature vectors for 푁 companies, creating an 푀 × 푁 matrix with missing values. To impute these missing values, we use a matrix completion algorithm known as Soft-Impute which achieves the imputation by performing a low-rank approximation to the feature matrix using nuclear norm regularization (Mazumder et al. 2010). Soft-Impute requires a regularization parameter and a convergence threshold. To obtain the regularization parameter, we replace all missing values with zero and then calculate the singular values of the resulting matrix. We set the regularization parameter to be the maximum singular value of this filled in matrix divided by 100 and we set the convergence threshold to 0.001. We then apply Soft-Impute with these parameters to the incomplete feature matrix to fill in the missing values. This imputed feature matrix is used for all subsequent model fitting and prediction tasks. Note that we repeat this process for each training set and testing set that we consider, to ensure that we are not violating causality.
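The numpy sketch below illustrates the Soft-Impute idea (repeated soft-thresholding of the singular values, with observed entries held fixed) together with the regularization choice described above; it is a simplified stand-in for the implementation of Mazumder et al. (2010) used in this work.

    # Simplified Soft-Impute: iterative SVD with soft-thresholded singular values.
    import numpy as np

    def soft_impute(X, shrinkage, tol=1e-3, max_iters=100):
        """X: feature matrix with np.nan marking missing entries."""
        missing = np.isnan(X)
        Z = np.where(missing, 0.0, X)              # start with missing values set to zero
        for _ in range(max_iters):
            U, s, Vt = np.linalg.svd(Z, full_matrices=False)
            s = np.maximum(s - shrinkage, 0.0)      # soft-threshold the singular values
            Z_new = (U * s) @ Vt
            Z_new = np.where(missing, Z_new, X)     # keep observed entries unchanged
            if np.linalg.norm(Z_new - Z) <= tol * max(np.linalg.norm(Z), 1e-12):
                return Z_new
            Z = Z_new
        return Z

    X = np.array([[1.0, np.nan, 3.0], [2.0, 2.0, np.nan], [np.nan, 1.0, 4.0]])
    # Regularization parameter: largest singular value of the zero-filled matrix / 100.
    shrinkage = np.linalg.svd(np.nan_to_num(X), compute_uv=False).max() / 100
    print(soft_impute(X, shrinkage))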

Chapter 4

Model for Startup Company Funding

We now present a stochastic model for how a company reaches different funding rounds and the relevant probability calculations needed to construct portfolios. Our model captures the temporal evolution of the funding rounds through the use of a Brownian motion process.

4.1 Brownian Motion Model for Company Value

We assume a company has a latent value process 푋(푡) which is a Brownian motion with time-dependent drift 휇(푡) and diffusion coefficient 휎²(푡). The latent value process 푋(푡) is initiated with value 0 when the company receives its first funding round. The set of possible funding rounds that we use is ℛ = {Seed, A, B, C, D, E, F, Exit}. We denote each funding round by an index 0 ≤ 푙 ≤ 7, with 푙 = 0 corresponding to seed funding, 푙 = 1 corresponding to series A funding, etc. The final level is 푙 = 7, which corresponds to

exiting. For each funding round 푙 we have a level ℎ푙 ≥ 0, and the levels are ordered such

that ℎ푙−1 ≤ ℎ푙. We choose the spacing of these levels to be linear, but one is free to choose

them arbitrarily. In our model, we let ℎ푙 = ∆푙 for some ∆ > 0. A company receives round

푙 of funding when 푋(푡) hits ℎ푙 for the first time, and we denote this time as 푡푙. Therefore, the time to receive a new round of funding is the first passage time of a Brownian motion. The first passage time distribution for a Brownian motion with arbitrary time-varying drift and diffusion terms is difficult to solve. In particular, one must solve the Fokker-Planck equation with an appropriate boundary condition (Molini et al. 2011). However,

we can solve the Fokker-Planck equation exactly with the appropriate boundary condition when the ratio of the drift to the diffusion is constant in time. Therefore, we assume that

the drift and diffusion terms are of the form $\mu(t) = \mu_0 f(t)$ and $\sigma^2(t) = \sigma_0^2 f(t)$, where $\mu_0$, $\sigma_0^2$, and $f(t)$ are appropriately chosen. Under these assumptions, we have the following standard result for the first passage time distribution.

Theorem 4.1.1 [Molini et al. (2011)] For a Brownian motion $X(t)$ with drift $\mu(t) = \mu_0 f(t)$, diffusion $\sigma^2(t) = \sigma_0^2 f(t)$, and initial value $X(v_0) = 0$, let $V_\alpha = \inf_{t > v_0}\{X(t) \geq \alpha\}$ denote the first passage time to level $\alpha > 0$ after time $v_0$. Then, the probability density function (PDF) of $V_\alpha$ is

$f_0(v; v_0, \mu(t), \sigma(t), \alpha) = \frac{\sigma^2(v)\,\alpha}{\sqrt{16\pi S^3}} \exp\left(-\frac{(\alpha - M)^2}{4S}\right)$   (4.1.1)

and the cumulative distribution function (CDF) is

$F_0(v; v_0, \mu(t), \sigma(t), \alpha) = \Phi\left(\frac{M - \alpha}{\sqrt{2S}}\right) + \exp\left\{\frac{M\alpha}{S}\right\} \Phi\left(\frac{-(M + \alpha)}{\sqrt{2S}}\right),$   (4.1.2)

where $M = \int_{v_0}^{v} \mu(s)\,\mathrm{d}s$, $S = \frac{1}{2}\int_{v_0}^{v} \sigma^2(s)\,\mathrm{d}s$, and $\Phi(\cdot)$ is the standard normal CDF.
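As a small numerical illustration of Theorem 4.1.1, the sketch below computes $M$ and $S$ by quadrature and evaluates the density (4.1.1) and distribution function (4.1.2); the drift, diffusion, and level values in the example are arbitrary.

    # First passage time PDF and CDF of Theorem 4.1.1 for user-supplied mu(t), sigma^2(t).
    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm

    def first_passage_pdf_cdf(v, v0, mu, sigma2, alpha):
        """mu and sigma2 are callables giving the drift and diffusion; alpha > 0."""
        M = quad(mu, v0, v)[0]                     # M = integral of mu(s) from v0 to v
        S = 0.5 * quad(sigma2, v0, v)[0]           # S = (1/2) integral of sigma^2(s)
        pdf = sigma2(v) * alpha / np.sqrt(16 * np.pi * S**3) \
              * np.exp(-(alpha - M) ** 2 / (4 * S))
        cdf = norm.cdf((M - alpha) / np.sqrt(2 * S)) \
              + np.exp(M * alpha / S) * norm.cdf(-(M + alpha) / np.sqrt(2 * S))
        return pdf, cdf

    # Constant drift and diffusion (f(t) = 1) with illustrative parameter values.
    print(first_passage_pdf_cdf(v=2.0, v0=0.0, mu=lambda t: 1.5,
                                sigma2=lambda t: 4.0, alpha=10.0))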

4.2 Modeling drift and diffusion

From Section 3, recall that companies which succeed typically take a constant amount of time to reach each successive funding round, which motivates these companies having a positive and constant drift coefficient for some time period. Additionally, many companies succeed in reaching early funding rounds somewhat quickly, but then they fail to ever reach an exit. This motivates these companies having a drift that decreases in time. Lastly, as time gets very large, a company that has not achieved an exit will most likely not achieve a new funding round. This translates to our latent value process moving very little as time gets large, which can be modeled by having the drift and diffusion terms moving toward zero. To incorporate these properties, we use the following model for the drift and diffusion

coefficients:

$\mu(t) = \mu_0 \left( \mathbf{1}\{t \leq \nu\} + e^{-\frac{t-\nu}{\tau}} \mathbf{1}\{t > \nu\} \right)$

$\sigma^2(t) = \sigma_0^2 \left( \mathbf{1}\{t \leq \nu\} + e^{-\frac{t-\nu}{\tau}} \mathbf{1}\{t > \nu\} \right)$

where $\mu_0$, $\sigma_0^2$, $\nu$, and $\tau$ are appropriately chosen based on the data. Under this model, the drift and diffusion are constant for a time $\nu$, after which they decay exponentially fast with time constant $\tau$. In our model, every company will have the same $\nu$ and $\tau$. However, each company will have a different drift term $\mu_0$ and diffusion term $\sigma_0$ that will be determined by its features.

푀 2 diffusion coefficient. We define another parameter vector 훾 ∈ R , and we let 휎푖0 = (︀ 푇 )︀2 푔 훾 x푖 . We let 푔(푧) be the positive region of the hyperbola with asymptotes at 푧 = 0 and 푔(푧) = 푧. The exact definition and motivation behind 푔(푧) are in Appendix D. In practice, we considered many other models for the drift and diffusion terms. We found that the model presented here performed well with respect to fitting the data, and ad-

additionally we found this model to be intuitive and interpretable. Therefore, we will restrict our future results to those that were obtained using this particular model.
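To make the specification above concrete, the sketch below evaluates $\mu(t)$ and $\sigma^2(t)$ for a single company; the feature vector, weight vectors, and the stand-in for $g(\cdot)$ (a simple absolute value rather than the hyperbola defined in Appendix D) are illustrative assumptions.

    # Drift and diffusion of Section 4.2: constant until nu, exponential decay after,
    # scaled by company-specific terms computed from its features.
    import numpy as np

    def time_profile(t, nu, tau):
        """f(t): 1 for t <= nu, exp(-(t - nu)/tau) for t > nu."""
        return np.where(t <= nu, 1.0, np.exp(-(t - nu) / tau))

    def company_drift_diffusion(t, x, beta_y, gamma, nu, tau, g=np.abs):
        """mu_i(t) = (beta_y^T x) f(t); sigma_i^2(t) = g(gamma^T x)^2 f(t).
        g is a placeholder here; the thesis uses the hyperbola of Appendix D."""
        f = time_profile(np.asarray(t, dtype=float), nu, tau)
        mu0 = beta_y @ x
        sigma0_sq = g(gamma @ x) ** 2
        return mu0 * f, sigma0_sq * f

    x = np.array([1.0, 0.0, 1.0])        # hypothetical feature vector
    beta_y = np.array([0.8, -0.2, 0.5])  # hypothetical drift weights for the baseline year
    gamma = np.array([0.3, 0.1, 0.4])    # hypothetical diffusion weights
    print(company_drift_diffusion([1.0, 5.0], x, beta_y, gamma, nu=3.0, tau=2.0))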

4.3 The Data Likelihood

Using our definitions of drift and diffusion, we can now define the likelihood for the observed data of a given company. For notation, suppose that we observe a company at time

$t_{obs}$. Before this time it reaches a set of funding rounds with indices $\mathbf{i} = \{i_1, i_2, \ldots, i_L\}$. The random variables corresponding to the times of these rounds are $\mathbf{t} = \{t_{i_1}, t_{i_2}, \ldots, t_{i_L}\}$ (we assume $i_0 = 0$ and normalize the times so $t_{i_0} = 0$). There may be missing data, so the funding rounds are not necessarily consecutive. For a given company $c$, we use $\mathbf{t}_c$ for the times of the funding rounds, $\mathbf{x}_c$ for the feature vector, and $c_y$ for the baseline year.

If an exit is observed for a company $c$, then $i_L = 7$ and by the independent increments property and translation invariance of Brownian motion, the likelihood is given by

$\mathbb{P}(\mathbf{t}_c, t_{obs}, \mathbf{x}_c, h_{i_L} = 7\Delta \mid \beta_{c_y}, \gamma, \delta, \nu, \tau) = \prod_{l=1}^{L} f_0(t_{i_l}; t_{i_{l-1}}, \mu(t), \sigma(t), h_{i_l} - h_{i_{l-1}}),$   (4.3.1)

where $f_0$ is defined in equation (4.1.1).

If the company does not exit before $t_{obs}$, then $i_L < 7$ and there is a censored inter-round time which is greater than or equal to $t_{obs} - t_{i_L}$. In this case the likelihood is given by

$\mathbb{P}(\mathbf{t}_c, t_{obs}, \mathbf{x}_c, h_{i_L} < 7\Delta \mid \beta_{c_y}, \gamma, \delta, \nu, \tau) = \bigl(1 - F_0(t_{obs}; t_{i_L}, \mu(t), \sigma(t), \Delta)\bigr) \prod_{l=1}^{L} f_0(t_{i_l}; t_{i_{l-1}}, \mu(t), \sigma(t), h_{i_l} - h_{i_{l-1}}),$   (4.3.2)

where $f_0$ and $F_0$ are defined in equations (4.1.1) and (4.1.2) respectively.

For simplicity in notation, let the set of companies be $C$ and let $\mathbf{T} = \{\mathbf{t}_c, \mathbf{i}_c, \{h_{i_j} \mid \forall i_j \in \mathbf{i}_c\} \mid \forall c \in C\}$ be the set of time series information (funding round levels and time to reach each level) for all companies $c$ in our data. Furthermore, let $\mathbf{X}$ be the data matrix containing the features for all companies in our data.

We use the following additional definitions: $C_e$ is the set of companies in our data that have exited, $C_f$ is the set of companies in our data that have not exited, $\beta$ is the set of all $\beta_y$, and $c_y$ is the baseline year of company $c$. Using this notation we can write the full data likelihood as

$\mathbb{P}(\mathbf{T}, \mathbf{X} \mid \beta, \gamma, \delta, \nu, \tau) = \Bigl( \prod_{c \in C_e} \mathbb{P}(\mathbf{t}_c, t_{obs}, \mathbf{x}_c, h_{i_L} = 7\Delta \mid \beta_{c_y}, \gamma, \delta, \nu, \tau) \Bigr) \Bigl( \prod_{c \in C_f} \mathbb{P}(\mathbf{t}_c, t_{obs}, \mathbf{x}_c, h_{i_L} < 7\Delta \mid \beta_{c_y}, \gamma, \delta, \nu, \tau) \Bigr).$   (4.3.3)
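A sketch of the per-company likelihood terms in equations (4.3.1) and (4.3.2) is given below. It assumes the first_passage_pdf_cdf helper from the sketch following Theorem 4.1.1 is in scope; the round times, level spacing, and drift/diffusion callables are illustrative.

    # Log-likelihood of one company's funding history under the Brownian motion model.
    import numpy as np

    def company_log_likelihood(round_times, levels, t_obs, exited, mu, sigma2, delta=10.0):
        """round_times: observed round times (normalized so the first round is at 0);
        levels: the corresponding round indices (0 = seed, ..., 7 = exit)."""
        ll = 0.0
        for k in range(1, len(round_times)):
            gap = delta * (levels[k] - levels[k - 1])          # h_{i_l} - h_{i_{l-1}}
            pdf, _ = first_passage_pdf_cdf(round_times[k], round_times[k - 1],
                                           mu, sigma2, gap)
            ll += np.log(pdf)
        if not exited:
            # Censoring term of (4.3.2): the next level has not been hit by t_obs.
            _, cdf = first_passage_pdf_cdf(t_obs, round_times[-1], mu, sigma2, delta)
            ll += np.log(1.0 - cdf)
        return ll

For exited companies, the robust variant introduced in Section 5.3 would additionally add the log of $F_0(t_{obs}; 0, \mu(t), \sigma(t), 7\Delta)$ to this quantity.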

Chapter 5

Bayesian Model Specification

In this section we present four variations of the Brownian motion model, developed using a Bayesian approach, on which we focus our analysis for the remainder of the paper. Throughout this section, we use the subscript $i$ to denote the $i$th component of a parameter vector, and denote $\mathcal{N}(\mu, \sigma^2)$ as a normal distribution with mean $\mu$ and variance $\sigma^2$, $\mathcal{IG}(a, b)$ as an inverse gamma distribution with shape parameter $a$ and scale parameter $b$, $Exp(\lambda)$ as an exponential distribution with mean $\lambda^{-1}$, and $\mathcal{U}(a, b)$ as a uniform distribution on $[a, b]$.

5.1 Homoskedastic Model

The first variation of the model that we consider is a homoskedastic model. For this variation, we let $\sigma_0^2$ be the same for all companies rather than computing it as a function of the company features and omit the parameter $\gamma$ from the model.

Here, the posterior distribution of the parameters is as follows:

$\mathbb{P}(\beta, \sigma_0^2, \delta, \nu, \tau \mid \mathbf{T}, \mathbf{X}) \propto \mathbb{P}(\mathbf{T}, \mathbf{X} \mid \beta, \sigma_0^2, \delta, \nu, \tau) \Bigl( \prod_{y=1}^{Y} \mathbb{P}(\beta_y \mid \beta_{y-1}) \Bigr) \mathbb{P}(\beta_0)\,\mathbb{P}(\sigma_0^2)\,\mathbb{P}(\delta)\,\mathbb{P}(\nu)\,\mathbb{P}(\tau),$   (5.1.1)

with prior distributions defined as follows:

$\beta_{yi} \sim \mathcal{N}(\beta_{(y-1)i}, \delta_i)$
$\beta_{0i} \sim \mathcal{N}(\mu_\beta, \sigma_\beta^2)$
$\sigma_0^2 \sim \mathcal{IG}(a_{\sigma_0^2}, b_{\sigma_0^2})$
$\delta_i \sim Exp(\lambda_0)$   (5.1.2)
$\nu \sim \mathcal{U}(a_\nu, b_\nu)$
$\log \tau \sim \mathcal{U}(a_\tau, b_\tau).$

The specification of the priors can be found in Appendix F.

5.2 Heteroskedastic Model

The second model we introduce is a heteroskedastic model with drift and diffusion modeled as described in Section 4.2.

For this model, the posterior distribution of our parameters Θ ={훽푦 for all 푦 in our data, 훾, 훿, 휈, and 휏} given the data over 푌 total years can be written as:

$\mathbb{P}(\beta, \gamma, \delta, \nu, \tau \mid \mathbf{T}, \mathbf{X}) \propto \mathbb{P}(\mathbf{T}, \mathbf{X} \mid \beta, \gamma, \delta, \nu, \tau) \Bigl( \prod_{y=1}^{Y} \mathbb{P}(\beta_y \mid \beta_{y-1}) \Bigr) \mathbb{P}(\beta_0)\,\mathbb{P}(\gamma)\,\mathbb{P}(\delta)\,\mathbb{P}(\nu)\,\mathbb{P}(\tau),$   (5.2.1)

with prior distributions defined as follows:

$\beta_{yi} \sim \mathcal{N}(\beta_{(y-1)i}, \delta_i)$
$\beta_{0i} \sim \mathcal{N}(\mu_\beta, \sigma_\beta^2)$
$\gamma_i \sim \mathcal{N}(\mu_\gamma, \sigma_\gamma^2)$   (5.2.2)
$\delta_i \sim Exp(\lambda_0)$
$\nu \sim \mathcal{U}(a_\nu, b_\nu)$
$\log \tau \sim \mathcal{U}(a_\tau, b_\tau).$

The specification of the priors can be found in Appendix F.
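For illustration, the numpy sketch below draws one sample from priors of the form (5.2.2); the hyperparameter values are placeholders, since the actual settings are given in Appendix F.

    # One draw from heteroskedastic-model priors with placeholder hyperparameters.
    import numpy as np

    rng = np.random.default_rng(0)
    M, Y = 5, 3                                     # feature dimension, number of years
    mu_beta, sigma_beta = 0.0, 1.0                  # placeholder hyperparameters
    mu_gamma, sigma_gamma = 0.0, 1.0
    lambda0, a_nu, b_nu, a_tau, b_tau = 1.0, 0.0, 10.0, -2.0, 3.0

    delta = rng.exponential(1.0 / lambda0, size=M)       # delta_i ~ Exp(lambda0), mean 1/lambda0
    gamma = rng.normal(mu_gamma, sigma_gamma, size=M)    # gamma_i
    beta = [rng.normal(mu_beta, sigma_beta, size=M)]     # beta_0
    for _ in range(Y):
        beta.append(beta[-1] + rng.normal(0.0, np.sqrt(delta)))  # beta_y | beta_{y-1}
    nu = rng.uniform(a_nu, b_nu)                         # nu ~ U(a_nu, b_nu)
    tau = np.exp(rng.uniform(a_tau, b_tau))              # log tau ~ U(a_tau, b_tau)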

5.3 Robust Heteroskedastic Model

We introduce a modified version of the heteroskedastic model described in Section 5.2. Note that the likelihood for companies that have not exited given by equation (4.3.2) directly encodes the information that a company has not exited by multiplying the funding

round time series by 1 − 퐹0(·), where 1 − 퐹0(·) is the probability that the company does not proceed to the next round. However, equation (4.3.1) for companies that have exited does not directly encode such information. Instead, if a company exits, the sum of the in-

crements ℎ푙 − ℎ푙−1 of its brownian motion will equal 7∆, which indirectly encodes the fact that the company has exited. Note that we can use the cumulative distribution function of the Inverse Gaussian distribution to directly encode information that a company has exited similar to how we use it to encode information that a company has not exited in equation (4.3.2). We do so by multiplying the likelihood in equation (4.3.1) by the inverse Gaussian

CDF between the values 0 and 푡표푏푠, which is the timeframe in which we observed the com- pany. For this model, we modify the data likelihood for a company that has exited from equation (4.3.1) to the below equation:

P(t푐, 푡표푏푠, x푐, ℎ푖퐿 = 7∆|훽푦, 훾, 훿, 휈, 휏) 퐿 ∏︁ = 퐹0(푡표푏푠; 0, 휇(푡), 휎(푡), 7∆) 푓0(푡푖푙 ; 푡푖푙−1 , 휇(푡), 휎(푡), ℎ푖푙 − ℎ푖푙−1 ). (5.3.1) 푙=1

We use the same posterior distributions as the Bayesian heteroskedastic model described in Section 5.2. Note that another way to view our modification is that we are proposing that the actual time series of a company that has exited matters less than simply the fact that the company has exited. By adding this extra term to the likelihood, we are adding a term that averages out all possible funding paths a company could have taken to exit within the time period 0 to 푡표푏푠 and including direct information that the company has exited. We have found that this method performs substantially better than the unmodified het- eroskedastic model on the portfolio selection problem as shown in Section 8. We will refer to this model as the robust heteroskedastic model while we refer to the unmodified model

41 as the heteroskedastic model for the remainder of the paper.

5.4 Hierarchical Bayesian Model

Companies in different industries may have different dynamics in terms of obtaining funding rounds and achieving exits. In order to capture this phenomenon, we propose a hierarchical Bayesian model where companies are grouped by industry. In this section, we present the hierarchical structure for this model.

Assign each company in the dataset to a group g among G total groups and let each group g have its own set of parameters Θ^g = {β_y^g for all y in our data, γ^g, δ^g, ν^g, τ^g}. Here, we use the same posterior as in equation (5.2.1), with the exception that the likelihood for each company is computed using the parameters for that company's group. The group parameters are each drawn from a prior specified by a set of global parameters common to all groups. The resulting hierarchical priors on the parameters are as follows:

$$\begin{aligned}
\beta_{yi}^g &\sim \mathcal{N}(\beta_{(y-1)i}^g, \delta_i^g) \\
\beta_{0i}^g \mid \mu_\beta', \sigma_\beta'^2 &\sim \mathcal{N}(\mu_\beta', \sigma_\beta'^2) \\
\mu_\beta' &\sim \mathcal{N}(\mu_\beta, \sigma_\beta^2) \\
\sigma_\beta'^2 &\sim \mathcal{IG}(a_{\sigma_\beta^2}, b_{\sigma_\beta^2}) \\
\gamma_i^g \mid \mu_\gamma', \sigma_\gamma'^2 &\sim \mathcal{N}(\mu_\gamma', \sigma_\gamma'^2) \\
\mu_\gamma' &\sim \mathcal{N}(\mu_\gamma, \sigma_\gamma^2) \\
\sigma_\gamma'^2 &\sim \mathcal{IG}(a_{\sigma_\gamma^2}, b_{\sigma_\gamma^2}) \\
\delta_i^g \mid \lambda_0' &\sim Exp(\lambda_0') \\
\lambda_0' &\sim \mathcal{G}(a_{\lambda_0}, b_{\lambda_0}) \\
\nu^g &\sim \mathcal{U}(a_\nu, b_\nu) \\
\log \tau^g &\sim \mathcal{U}(a_\tau, b_\tau).
\end{aligned} \qquad (5.4.1)$$

The specification of the priors on the global parameters can be found in Appendix F.
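As a rough illustration of this hierarchy, the snippet below draws one set of group-level parameters from global hyperparameters; the hyperparameter values used here are placeholders, not the values specified in Appendix F.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_groups = 20, 7

# placeholder global hyperparameters (the actual values are specified in Appendix F)
mu_beta, sigma_beta2 = 0.0, 1.0
mu_gamma, sigma_gamma2 = 0.0, 1.0
a_lam, b_lam = 2.0, 1.0

# global (shared) parameters, drawn once
mu_beta_p = rng.normal(mu_beta, np.sqrt(sigma_beta2))
sigma_beta2_p = 1.0 / rng.gamma(2.0, 1.0)          # inverse gamma draw via 1/Gamma
mu_gamma_p = rng.normal(mu_gamma, np.sqrt(sigma_gamma2))
sigma_gamma2_p = 1.0 / rng.gamma(2.0, 1.0)
lam0_p = rng.gamma(a_lam, 1.0 / b_lam)             # Gamma with shape a and rate b

# group-level parameters, drawn per sector group conditional on the global draws
groups = {}
for g in range(n_groups):
    groups[g] = {
        "beta0": rng.normal(mu_beta_p, np.sqrt(sigma_beta2_p), size=n_features),
        "gamma": rng.normal(mu_gamma_p, np.sqrt(sigma_gamma2_p), size=n_features),
        "delta": rng.exponential(1.0 / lam0_p, size=n_features),  # Exp with mean 1/lambda
    }
```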

5.5 Model Estimation

To estimate the models we first select a value for the level spacing Δ and for the observation year t_obs. We choose Δ = 10 and we set t_obs equal to December 31st of the chosen year.

We include in the training set all companies founded between 2000 and t_obs. All data used in the model estimation must have been available before t_obs; otherwise we do not include it. For instance, the observed funding round times occur before t_obs, and if a company had an exit after t_obs, this information is not included during model estimation. Also, all company features are constructed using data available when the company received its first funding round. We do this because these features are what would have been available to someone who wanted to invest in the company in an early round, which is the investment time frame on which we focus.

For the hierarchical model we divided companies into seven groups based on the sector they are in. The seven sectors are: Software, AI, Biotechnology, Healthcare, Hardware, Finance, and None. The "None" group was used for companies with no industry information. Details about how we divided the companies into sector groups are in Appendix E. Additionally, because companies were split into groups based on their industry, we removed all sector features from the feature vectors of the companies when estimating the hierarchical model.

We used blocked Gibbs sampling to estimate the model, with each parameter vector (β_y, γ, δ) in its own block and all scalar values, such as ν, τ, and the hierarchical parameters, in their own block. We found blocked Gibbs sampling to have the best tradeoff between model estimation quality and speed. The exact details of how we sampled each parameter block are in Appendix F. In our estimation, we ran 5 chains, each containing 25,000 samples. We used a burn-in period of 15,000 samples and took every tenth sample beyond that to reduce autocorrelation in the samples. Convergence was assessed with the Gelman-Rubin criterion (Gelman and Rubin 1992).
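For reference, a minimal sketch of this post-processing (burn-in, thinning, and the Gelman-Rubin statistic across chains) is shown below; it operates on a generic array of scalar samples and is not tied to any particular parameter block of our model.

```python
import numpy as np

def gelman_rubin(chains):
    """Gelman-Rubin potential scale reduction factor for a scalar parameter.
    chains: array of shape (n_chains, n_samples), already past burn-in."""
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1]
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()      # within-chain variance
    B = n * chain_means.var(ddof=1)            # between-chain variance
    var_hat = (n - 1) / n * W + B / n          # pooled posterior variance estimate
    return np.sqrt(var_hat / W)                # values near 1 indicate convergence

def postprocess(samples, burn_in=15_000, thin=10):
    """Discard the burn-in and keep every `thin`-th sample to reduce autocorrelation."""
    return samples[:, burn_in::thin]

# example with synthetic chains: 5 chains of 25,000 draws each
rng = np.random.default_rng(1)
raw = rng.normal(size=(5, 25_000))
print("R-hat:", gelman_rubin(postprocess(raw)))
```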

Chapter 6

Parameter Estimation Results

Here, we present the results of estimating the four models (referred to as homoskedastic, heteroskedastic, robust heteroskedastic, and hierarchical) presented in Section 5. First, we present the deviance information criterion for all models in order to determine which one gives the best tradeoff between number of parameters and in-sample fit. Then, we analyze the statistical significance of the estimated parameter values in order to understand which features are most important in determining the success probability of a company. In Section 8, we will show that the robust heteroskedastic model gives the best startup company portfolios. Therefore, among the homoskedastic, heteroskedastic, and robust heteroskedastic models, we show parameter estimation results for the robust heteroskedastic model in this section. In all sections describing parameter value estimation, we focus our analysis on the model trained with observation year 2010.

6.1 Model Fit Scores

We calculated the deviance information criterion (DIC) (Spiegelhalter et al. 2002) for each of the four models in order to evaluate the tradeoff between the number of parameters in each model and how well each model fits the data. The scores are shown in Table 6.1. Note that a smaller DIC score indicates a better model fit.

Table 6.1: The deviance information criterion scores for each of the four models trained with observation year 2010.

Model Type               DIC Score
Homoskedastic            41026.51
Heteroskedastic          38650.72
Robust Heteroskedastic   52741.08
Sector Hierarchical      44170.65

Among the four models, the heteroskedastic model had the lowest score, and thus the best tradeoff between fit to the data and number of parameters. The robust heteroskedastic model had the highest DIC score. This may be because including the extra terms for the exit probabilities is equivalent to adding extra data points, which results in a lower likelihood value since the probabilities are less than one. However, we will see in Section 8 that this model actually does the best in terms of picking portfolios. This is because the model trades off a poorer fit on the training data for better generalization on the testing data, due to the modification made to its likelihood function, which acts as a regularizer. Lastly, the hierarchical model has a higher DIC than the heteroskedastic model. This is because the hierarchical model has many more parameters, which the DIC score penalizes.
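For completeness, the sketch below shows how a DIC score of this kind can be computed from posterior samples, using the common form DIC = D̄ + p_D with p_D = D̄ − D(θ̄); `log_likelihood` stands in for the full-data log-likelihood of whichever model is being scored, and the names are illustrative.

```python
import numpy as np

def dic(posterior_samples, log_likelihood, data):
    """Deviance information criterion from posterior samples.
    posterior_samples: list of parameter dictionaries drawn from the posterior.
    log_likelihood:    function (params, data) -> full-data log-likelihood."""
    deviances = np.array([-2.0 * log_likelihood(theta, data) for theta in posterior_samples])
    d_bar = deviances.mean()                       # posterior mean deviance
    # deviance at the posterior mean of the parameters (element-wise mean of each entry)
    theta_bar = {k: np.mean([s[k] for s in posterior_samples], axis=0)
                 for k in posterior_samples[0]}
    d_hat = -2.0 * log_likelihood(theta_bar, data)
    p_d = d_bar - d_hat                            # effective number of parameters
    return d_bar + p_d
```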

6.2 Robust Heteroskedastic Model

We now study the robust heteroskedastic model estimation results in more detail, as this is the best performing model in terms of portfolio construction (see Section 8). Shown in

Figure 6-1 are boxplots of the β_2010 samples for selected features, with a separate plot for sector features. We see that industries which typically are not cyclical and are known for being constant sources of revenue, such as animation and advertising, are positive and statistically significant, and hence associated with a higher company drift. However, sectors which are known for being highly competitive or over-regulated, such as social media, biotechnology, finance, and healthcare, are negative and statistically significant, and hence associated with a lower company drift. From the right boxplot, which shows non-sector features, we see that having previous entrepreneurship success among the executives of a company yields a higher company drift (the executive ipo and executive acquisition features), while having an education similar to the field of the company results in a lower company drift. Note that the positive correlation with company exit probability that our model shows for the executive ipo feature matches the empirical analysis of the feature shown in Section 3.2.

Figure 6-1: Boxplots of the β_2010 sample values for selected sector (left) and non-sector (right) features.

Shown in Figure 6-2 are boxplots of the γ sample values for selected features. From the

left boxplot, which shows sector features, we see a similar trend to what we saw in the β_2010 values in that high-risk, highly regulated industries like biotechnology and health are negatively correlated with diffusion and thus result in a lower exit probability. However, here we also see that insurance, which creates a positive drift, creates a negative diffusion. For the non-sector features shown in the right boxplot, we see that while competitors acquisition was positively correlated with drift, it is negatively correlated with diffusion, which indicates that having many competitors which have been acquired results in a lower volatility startup that still has a high chance of exiting. Furthermore, we see the same trend as we saw with the β_2010 values in that having previously successful leadership is positively correlated with diffusion and thus results in a higher exit probability.

Figure 6-2: Boxplots of the γ sample values for selected sector (left) and non-sector (right) features.

We note that the non-sector features tend to be statistically significant more often than the sector features. This suggests that non-sector features, such as those related to the founding team, executives, and competitors of a company, are more important in determining success than the specific industry of the company.

Recall that for the time-varying β parameters we have the standard deviation δ, which characterizes the variation of these parameters over time. We show the top coefficients of variation, defined as the median δ divided by the absolute value of the average of the medians of the β samples over all years, for both the sector and non-sector features in Tables 6.2 and 6.3.

Table 6.2: Posterior mean of β, δ, and coefficient of variation for the sector features with the largest magnitude coefficient of variation.

Sector feature       Avg. Median β   Median δ   Coefficient of variation
Facebook             -0.0125         1.558      124.51
Computer             -0.0517         3.911      75.71
Open source          -0.0628         2.1379     34.05
Biotechnology        -0.1699         3.0617     18.02
Health diagnostics   0.1302          1.4768     11.34

Table 6.3: Posterior mean of β, δ, and coefficient of variation for the non-sector features with the largest magnitude coefficient of variation.

Non-sector feature                     Avg. Median β   Median δ   Coefficient of variation
High school                            -0.2795         1.951      6.979
Max ipo proportion                     0.2777          1.735      6.246
Competitors D                          0.2988          0.042      4.599
Education overlap standard deviation   0.6290          2.580      4.102
Phd                                    -0.7373         2.850      3.865

The mean β values of these features are much lower than those of the top β features, which makes their coefficients of variation high. Therefore, these features, despite being highly variable, likely do not strongly impact the exit probability of a company.

Finally, we investigate the timing parameters. We find that the posterior distribution of ν has a mean of 3.21 years, with fifth and ninety-fifth quantiles at 2.99 years and 4.00 years respectively. We find that τ has a posterior mean of 3.50 years, with fifth and ninety-fifth quantiles at 2.71 years and 4.03 years respectively. The estimated value of ν makes sense because the majority of exits in our data are acquisitions and, as we saw from our data analysis, acquisitions take approximately three to four years to happen. Beyond the four year mark, we see decay in both the drift and diffusion of the company. The estimated value of τ, the rate at which the drift and diffusion decay, also makes sense based on our data, because after ν + τ ≈ 7 years very few companies exit.

6.3 Hierarchical Bayesian Model

For the sector hierarchical Bayesian model, we divided companies into seven groups: Software, AI, Biotechnology, Healthcare, Hardware, Finance, and None. The details of the division are in Appendix E. In our analysis in this section, we ignore the group "None" because the goal of the analysis is to understand how the dynamics of companies vary between industries. We use our parameter estimation results to understand how the features of companies which are important to success differ between sectors. Shown in Figures 6-3 and 6-4 are boxplots of the β_2010 and γ samples for selected features across the sector groups.

Figure 6-3: Boxplots of the β_2010 sample values for selected features across industry groups.

Figure 6-4: Boxplots of the γ sample values for selected features across industry groups.

From the boxplots of the β_2010 values, we can see notable differences in how the previous exits of a company's leaders affect its drift and hence its success probability. For example, the executive acquisition feature is positive with statistical significance for the Software sector while it is negative with statistical significance for the Health sector, indicating that in the Health sector previous acquisitions among the executives are actually negatively correlated with company success, while in the Software sector they are positively correlated with success. Furthermore, we see that the advisory ipo feature is positively correlated with drift in the Hardware sector but negatively correlated with drift in the BioTech sector, indicating that having advisors with previous exits also influences success probability differently between sectors.

We can also see some similarities between companies in different sectors. For example, having highly educated founders is positively correlated with drift in both the Finance and Health sectors, as seen by the highest degree masters feature in Finance and the highest degree phd feature in Health. Additionally, having a high proportion of competitors that have gone public is positively correlated with drift in both the AI and Software sectors, as seen by the competitors ipo feature.

Table 6.4: Statistics of the posterior distribution of ν and τ across sectors. The posterior mean is shown with the fifth and ninety-fifth quantiles in parentheses.

Sector       ν                      τ
AI           9.21 (2.44, 14.50)     1826.29 (0.044, 11690.583)
BioTech      9.11 (2.48, 14.20)     1488.98 (0.034, 10715.03)
Hardware     8.66 (2.98, 14.28)     1322.41 (0.038, 9252.23)
Healthcare   5.61 (4.95, 6.24)      2.02 (1.21, 3.26)
Software     4.03 (3.96, 4.12)      3.21 (2.68, 3.82)

From the boxplots of the γ values, we see that the Finance and Hardware sectors had no statistically significant features and the AI sector had only one statistically significant feature, indicating that while the features we chose are predictive of the drift values of the companies in these sectors, they are not significantly predictive of the diffusion values. However, one notable observation is that the executive ipo feature shows a clear difference in the γ values between sectors: it is positively correlated with diffusion, and hence exit probability, in the BioTech sector and negatively correlated in the Software sector. We also see a similarity between the BioTech and Healthcare sectors in that the num companies affiliated feature is negatively correlated with diffusion in both, indicating that having founders who have worked at many companies before starting their own is negatively correlated with exit probability.

Lastly, we examine the ν and τ parameters in order to see how the time dynamics of companies vary between sectors. Table 6.4 shows statistics for the posterior distribution of these parameters. From this table, we can see notable differences in the lifetimes of companies in different sectors. Specifically, the high values of ν and τ for the AI, BioTech, and Hardware sectors show that companies in these sectors tend to have much longer growth phases and have the potential to last for many years without exiting. Conversely, the Healthcare and Software industries have much shorter growth phases, and companies in them will likely fail if they do not exit within the first ν + τ ≈ 7 years of their lifetime.

Chapter 7

Picking Winners Portfolio

We now present the picking winners portfolio framework that we will use to evaluate the performance of our model for evaluating the quality of startup companies. We describe the framework in general and provide theoretical analysis of the portfolio construction algorithm. We also compare our framework to the well-known log-optimal portfolio framework.

7.1 Objective Function

Consider a set of events E = {E_i}_{i=1}^m defined on a probability space (Ω, F, P), with sample space Ω, σ-algebra F, and probability measure P. Each event E_i corresponds to item i winning. The aim is to select a subset S ⊆ [m] = {1, ..., m} of size k such that we maximize the probability that at least one item in S wins, which is given by P(∪_{i∈S} E_i). For notational convenience, we denote U(S) = P(∪_{i∈S} E_i). The optimization problem we want to solve is then

$$\max_{\mathcal{S} \subseteq [m],\, |\mathcal{S}| = k} U(\mathcal{S}). \qquad (7.1.1)$$

In general, solving this optimization problem can be difficult. In fact, we have the following result.

Theorem 7.1.1 Maximizing U(S) is NP-hard.

Despite the computational complexity of maximizing U(S), the objective function possesses some nice properties which we can use to obtain good solutions efficiently, given by the following well-known result.

Lemma 1 The function U(S) is non-negative, non-decreasing, and submodular.

Monotonically non-decreasing submodular functions have an important property: one can bound the sub-optimality of a greedy solution. Such a solution is obtained by sequentially adding elements to the set S such that each new addition results in the largest marginal increase in the objective function. Formally, let S_G = (f_1, f_2, ..., f_k) be the greedy solution of the optimization problem. The greedy solution is an ordered set, with item f_i being added on the ith step of the greedy optimization. Let S_G^i = (f_1, f_2, ..., f_i) be the solution obtained after the ith item is added to S_G, with S_G^0 = ∅. Then the ith item added to S_G is given by

$$f_i = \arg\max_{f \in [m] \setminus \mathcal{S}_G^{i-1}} U\left(\mathcal{S}_G^{i-1} \cup \{f\}\right) - U\left(\mathcal{S}_G^{i-1}\right), \quad 1 \le i \le k. \qquad (7.1.2)$$

For submodular functions, we have the following well known result.

Theorem 7.1.2 (Nemhauser et al. (1978)) Let U(S) be a non-decreasing submodular function. Let S_G be the greedy maximizer of U(S) and let S_W be the maximizer of U(S). Then

$$\frac{U(\mathcal{S}_G)}{U(\mathcal{S}_W)} \ge 1 - e^{-1}. \qquad (7.1.3)$$

Therefore, greedy solutions have performance guarantees for submodular maximization problems. In addition, they can often be computed very efficiently. For these reasons, we use a greedy approach for our problem, as sketched below.
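A minimal sketch of this greedy construction is given here for a generic objective; `objective` stands in for U(S) and is assumed to be cheap to evaluate on any candidate set.

```python
def greedy_portfolio(items, objective, k):
    """Greedily build a set of size k by adding, at each step, the item with the
    largest marginal increase in the objective (equation (7.1.2))."""
    selected = []
    for _ in range(k):
        remaining = [i for i in items if i not in selected]
        if not remaining:
            break
        best = max(remaining, key=lambda i: objective(selected + [i]) - objective(selected))
        selected.append(best)
    return selected
```

By Theorem 7.1.2, the set returned by this procedure achieves at least a 1 − e⁻¹ fraction of the optimal objective value.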

7.2 Independent Events

We first consider the simplest case where all E_i are independent. Let p_i = P(E_i) and let E_i^c denote the complement of E_i. Because of the independence of the events, we can rewrite the objective function as

$$U(\mathcal{S}) = 1 - \mathbb{P}\left(\bigcap_{i\in\mathcal{S}} E_i^c\right) = 1 - \prod_{i\in\mathcal{S}} (1 - p_i). \qquad (7.2.1)$$

The optimization problem can then be written as

$$\max_{\mathcal{S}\subseteq[m],\,|\mathcal{S}|=k} U(\mathcal{S}) = \min_{\mathcal{S}\subseteq[m],\,|\mathcal{S}|=k} \prod_{i\in\mathcal{S}} (1 - p_i). \qquad (7.2.2)$$

The form of the objective function for independent events leads to a simple solution, given by the following theorem.

Theorem 7.2.1 For a set of independent events E = {E_1, E_2, ..., E_m}, let p_i = P(E_i) and without loss of generality assume that p_1 ≥ p_2 ≥ ... ≥ p_m. Let the k element subset S_W ⊆ [m] maximize U(S). Then S_W consists of the indices with the largest p_i, i.e. S_W = [k].

The proof of this result is immediate from the form of U(S) in equation (7.2.1). From Theorem 7.2.1 we see that if the events are independent, then one simply chooses the most likely events in order to maximize the probability that at least one event occurs. This simple solution relies crucially upon the independence of the events. Another useful property when the events are independent concerns the greedy solution.

Lemma 2 For a set of independent events E = {E_1, E_2, ..., E_m}, let the k element greedy maximizer of U(S) be S_G. Then S_W = S_G.

In this case the greedy solution is also an optimal solution. Indeed, the independent events case is almost trivial to solve, as sketched below. This also suggests that when the events have low dependence, greedy solutions should perform well.
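In code, the independent case reduces to a top-k selection on the estimated win probabilities; a minimal sketch:

```python
import numpy as np

def top_k_portfolio(p, k):
    """Independent events: pick the k items with the largest win probabilities p_i."""
    return np.argsort(-np.asarray(p))[:k]   # indices of the k largest probabilities
```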

7.3 Picking Winners and Log-Optimal Portfolios

We now study the relationship between the picking winners portfolio and another well-known portfolio, the log-optimal portfolio. To do this, we consider a model where the items have a top-heavy payoff structure, similar to the picking winners problem. We let the random variable X_i denote the return of the item associated with index i, and we assume there are m possible items. In particular, we consider the case where there are two possible values for the returns, a and b, such that 0 < a < b, and we let X_i(ω) = b for ω ∈ E_i and X_i(ω) = a otherwise. This approximately models the returns of a startup company, which are huge if it exits and negligible otherwise. Because it can be hard to predict the exact returns when a company exits, one can assume that they all have the same average value b given no other information. In this way, the only factor differentiating companies is their probability of exiting.

There are many different ways to construct a portfolio. The traditional Markowitz approach maximizes the mean portfolio return subject to an upper bound on its variance (Markowitz 1952). These portfolios are parametric because one must select the upper bound on the variance. Another approach is the log-optimal portfolio, where one constructs a portfolio which maximizes the expected natural logarithm of the portfolio return (Cover 1991). In our analysis, we assume that investment decisions are binary for each item, i.e. an item is either in the portfolio or it is not. We do not allow for continuous investment amounts for the items. Under this setting, a portfolio is a subset of [m], as in the picking winners problem. We define the return of a portfolio S as Σ_{i∈S} X_i / |S|. The corresponding expected log return is

$$V(\mathcal{S}) = \mathbb{E}\left[\ln\left(\sum_{i\in\mathcal{S}} X_i / |\mathcal{S}|\right)\right]. \qquad (7.3.1)$$

We constrain the portfolio to contain k elements, as in the picking winners problem. Then the log-optimal portfolio is defined by the following optimization problem:

$$\max_{\mathcal{S}\subseteq[m],\,|\mathcal{S}|=k} V(\mathcal{S}). \qquad (7.3.2)$$

There are no arbitrary parameters in the log-optimal portfolio formulation, unlike the Markowitz portfolio. This allows us to directly compare the log-optimal and picking winners portfolios. Once again, we begin by considering the simple case where the events {E_i}_{i=1}^m are independent. We have the following result:

Theorem 7.3.1 For a set of independent events E = {E_1, E_2, ..., E_m}, let p_i = P(E_i) and without loss of generality assume that p_1 ≥ p_2 ≥ ... ≥ p_m. Let the k element subset S_L ⊂ [m] maximize V(S). Then S_L consists of the indices with the largest p_i, i.e. S_L = [k].

Comparing the above result and Theorem 7.2.1, we see that when the events are independent, the problem of determining a log-optimal portfolio reduces to the problem of finding a greedy solution to the picking winners problem. The equivalence only relies on the assumptions that 0 < a < b and independence. In particular, we make no assumptions about the values of the marginal probabilities P(E_i). While the independence assumption results in a clean equivalence between the log-optimal and picking winners portfolios, in reality this assumption will be violated. However, we are interested in examples where the probability of a particular item winning is small. In this setting, we can quantify how much the two portfolio types deviate from each other.

Theorem 7.3.2 Let S_L denote a log-optimal portfolio, i.e. it is optimal for (7.3.2), and let S_W denote a portfolio that is optimal for the picking winners problem. Let G_l denote the collection of all subsets of [m] with cardinality l. Suppose there exist λ ∈ [0, 1] and p ∈ (0, 1/k] such that for all l ∈ [k] and for all T ∈ G_l the following holds:

$$(1-\lambda)p^{l}(1-p)^{k-l} \le \mathbb{P}\left(\left(\bigcap_{E\in T} E\right) \cap \left(\bigcap_{F\in\mathcal{S}\setminus T} F^c\right)\right) \le (1+\lambda)p^{l}(1-p)^{k-l}. \qquad (7.3.3)$$

Then it follows that

$$\frac{U(\mathcal{S}_W) - U(\mathcal{S}_L)}{U(\mathcal{S}_W)} \le \frac{2\zeta(3)\lambda k p(1-p)}{\ln\left(1 + \frac{b-a}{ka}\right)(1-\lambda)\left(1-(1-p)^{k}\right)},$$

where ζ(s) is the Riemann zeta function, i.e. ζ(s) = Σ_{n=1}^∞ 1/n^s.

We begin by interpreting the assumptions of Theorem 7.3.2, starting with (7.3.3). This assumption effectively places restrictions on the dependency between events. To gain some intuition, consider the case where all of the events have the same probability, and let p_i = p for all i ∈ [m]. Then it is clear that λ is a measure of the dependency between events. In particular, when the events are independent we can choose λ = 0, and as the events become more correlated we expect that we will have to choose a larger λ for equation (7.3.3) to hold. Additionally, Theorem 7.3.2 does not require the probabilities of all the events to be the same; it just requires there to be some p ∈ (0, 1/k] and some λ ∈ [0, 1] for which (7.3.3) holds. If the items have very different values of p_i, then Theorem 7.3.2 might produce a very weak bound. If all the events of interest have an equal probability of winning, and once again we let p = p_i for all i ∈ [m], then the assumption p ∈ (0, 1/k] implies that the expected number of winning items for any portfolio of size k is one or less. For our applications of interest, the winning probabilities are somewhat small and thus this seems like a reasonable assumption.

The main insight of Theorem 7.3.2 is that, for our particular applications of interest, a log-optimal portfolio must also be a good solution to the picking winners problem. Throughout this discussion, suppose that the conditions in the theorem hold. As an example, assume that λ = 0.5, k = 10, and p = 0.01. This assumes that the probability of an individual item winning is low and allows for some correlation. For the possible values of the returns, let us assume b = $10^9 and a = $1 (these are the approximate values for a very successful exit, such as for Facebook). Then applying Theorem 7.3.2, we find that the percent difference in the objective function (probability of at least one winner) between a winning and a log-optimal portfolio is less than 27%. This example illustrates that under these conditions, which are relevant for investing in startup companies, the optimal portfolio for the picking winners problem will be similar in performance to a log-optimal portfolio.

7.4 Portfolio Construction for Brownian Motion Model

We now discuss portfolio construction for our Brownian motion model. Define E_i as the event that company i reaches an exit sometime in the future. We wish to build a portfolio of k companies to maximize the probability that at least one company exits. To do this, we must be able to evaluate the exit probability. In our models this can easily be done. Recall that for a set of companies S, we define U(S) = P(∪_{i∈S} E_i). We now show how to calculate the exit probabilities.

If we assume that the performance of the companies is independent, then all we need to calculate is p_i = P(E_i) for each company i ∈ S to construct the portfolio. Recall that this is the probability that the company's Brownian motion hits level 7Δ in finite time. We assume we are given μ_i(t) and σ_i(t) for company i. Then, from (4.1.2), the exit probability is

$$p_i = \lim_{t\to\infty} F_0\left(t;\, 0, \mu_i(t), \sigma_i(t), 7\Delta\right). \qquad (7.4.1)$$

Additionally, recall that we are uncertain about the future values of β_y. This causes uncertainty in the drift, which results in the companies no longer being independent. To see why this is the case, assume we are given the parameter vector β_{y−1}. This is the situation when we train on past data up to year y − 1 and try to build a portfolio for companies founded in year y. Therefore, we need to know β_y = [β_{y1}, β_{y2}, ..., β_{yM}]. Recall that β_{yi} is normally distributed with mean β_{y−1,i} and standard deviation δ_i. This results in random drift coefficients for all companies founded in year y. We do not know the exact values of the company drifts, but we do know that they will all be affected the same way by the actual realization of β_y. This is how the uncertainty creates a positive correlation between the companies. To calculate U(S), we simply average over the uncertainty in β_y. Using equations (7.2.1) and (7.4.1) we have

$$U(\mathcal{S}) = \mathbb{E}_{\beta_y}\left[\mathbb{P}\left(\bigcup_{i\in\mathcal{S}} E_i \,\middle|\, \beta_y\right)\right] = 1 - \mathbb{E}_{\beta_y}\left[\prod_{i\in\mathcal{S}} (1 - p_i)\right] = 1 - \mathbb{E}_{\beta_y}\left[\prod_{i\in\mathcal{S}} \left(1 - \lim_{t\to\infty} F_0\left(t;\, 0, \mu_i(t), \sigma_i(t), 7\Delta\right)\right)\right]. \qquad (7.4.2)$$

Because β_y is jointly normal and it is simple to evaluate F_0(·), this expectation can easily be computed using Monte Carlo integration. For our model, because the number of startup companies that we are choosing from is quite large, we create our portfolio using the greedy approach. This is computationally feasible because in practice the portfolio will not be very large (typical venture capital firms manage no more than a few hundred companies) and it is easy to evaluate U(S) for a set S; a sketch of this estimate is given below. We consider the performance of portfolios built using our model in Section 8.
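The following sketch of the Monte Carlo estimate assumes, for simplicity, a constant drift μ_i = β_y^T x_i and constant diffusion σ_i per company, so that the limit in (7.4.1) has the closed form min(1, exp(2 μ_i · 7Δ / σ_i²)) in place of the decaying drift and diffusion of the full model; all names here are illustrative.

```python
import numpy as np

def exit_prob(mu, sigma, barrier):
    """P(a drifted Brownian motion ever hits `barrier`), assuming constant mu and sigma."""
    return np.minimum(1.0, np.exp(2.0 * mu * barrier / sigma**2))

def correlated_U(S, X, sigma, beta_prev, delta, barrier, n_draws=2000, rng=None):
    """Monte Carlo estimate of U(S) = 1 - E_beta[ prod_{i in S} (1 - p_i(beta)) ],
    averaging over beta_y ~ N(beta_{y-1}, diag(delta^2)) as in equation (7.4.2).
    X: feature matrix (companies x features); sigma: per-company diffusions."""
    rng = rng or np.random.default_rng()
    S = list(S)
    total = 0.0
    for _ in range(n_draws):
        beta_y = rng.normal(beta_prev, delta)   # one joint draw, shared by all companies
        mu = X[S] @ beta_y                      # drifts for the companies in S
        total += np.prod(1.0 - exit_prob(mu, sigma[S], barrier))
    return 1.0 - total / n_draws
```

Plugging this estimator into the greedy procedure of Section 7.1 yields correlated portfolios; the independent variant instead averages p_i for each company separately and then applies equation (7.2.1).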

Chapter 8

Model Validation with Picking Winners Portfolios

While we tested our models for many different years, we report the results for the 2011 and 2012 testing sets. We use these years because they give a sufficient amount of time for the companies to achieve an exit (we collected the data to conduct this analysis in late 2016). For each test year, we define the companies with baseline year (earliest funding year) equal to that year as the test set, and all other companies used in model estimation as the training set. The models are estimated on companies with baseline year before the test year, as described in Section 5.5. The features used to calculate the exit probabilities of the test companies are built using only data that would have been available at the time of the companies' earliest funding round. This way, we only use the data that would be available to an early investor.

We use the greedy portfolio construction method from equation (7.1.2). We build portfolios under two assumptions: one that companies are dependent and the other that they are independent. The difference between the two assumptions is how we average over uncertainty in the drift parameters. For the dependent assumption, we do the averaging for all companies jointly, as shown in equation (7.4.2). For the independent assumption, we average equation (7.4.1) over the uncertainty for each company individually, and then use the resulting exit probabilities to construct the portfolio. The averaging is done using Monte Carlo integration. Specifically, for every β sample, we generate a random draw from a normal

61 30 2011 30 2012 Homoskedatsic Correlated Homoskedatsic Correlated Heteroskedastic Correlated Heteroskedastic Correlated Robust Heteroskedastic Correlated Perfect Robust Heteroskedastic Correlated Perfect 25 Hierarchical Sector Correlated 25 Hierarchical Sector Correlated Ordered Logistic Ordered Logistic

20 20

15 15

Greylock Capital Sequoia Founders Capital Fund 10 10 Khosla Sequoia Ventures

Number of exits Number of exits Capital Founders 5 Fund a16z 5

Random KPCB Random KPCB Khosla Greylock 0 Ventures 0 Capital 0 5 10 15 20 25 30 0 5 10 15 20 25 30 Portfolio size Portfolio size

Figure 8-1: Performance curve for different model portfolios constructed using companies with seed or series A funding in 2011 (left) and 2012 (right). Also shown are the perfor- mances of venture capital firms Greylock Partners (Greylock), Founders Fund (Founders), Kleiner Perkins Caufield & Byers (KPCB), Sequoia Capital (Sequoia), and Khosla Ven- tures (Khosla). For reference the performance curves for a random portfolio and a perfect portfolio are also shown.

distribution for each element of the 훽 vector for the testing year, with mean given by 훽2010 or 훽2011 (depending on the testing year) and standard deviation given by the corresponding element in 훿. We then average the relevant function over these random variates.

We also benchmark our model against an ordered logistic regression model using our features and labels corresponding to the largest round that a company achieved before the test year.

We construct portfolios for all four models detailed in Section 5 with a maximum of 30 companies, which is a reasonable size for a typical venture capital firm. To visualize the performance of our portfolios, we plot the portfolio size versus the number of companies in the portfolio that exit, as shown in Figure 8-1. We refer to this as a portfolio performance curve. It is similar to a receiver operating characteristic (ROC) curve used in machine learning to evaluate the performance of binary classifiers.

We consider five different types of portfolios. Ordered logistic regression is our benchmark model as discussed earlier. Independent and correlated correspond to our Brownian motion model with the different dependence assumptions. Perfect corresponds to the portfolio where every company exits. Random corresponds to portfolios where the number of exits is the portfolio size multiplied by the fraction of exits in the test set. This corresponds roughly to picking a portfolio randomly. We also show the performance points for top venture capital firms in these years. The coordinates of these points are the number of test companies that the venture capital firm invests in and the number of these companies that exit before 2017.

Table 8.1: The top companies in our 2011 portfolio constructed with greedy optimization and the robust heteroskedastic correlated model. Shown are the company name, the highest funding round achieved, the predicted exit probability, and the greedy objective value (probability of at least one exit) when the company is added, rounded to the nearest hundredth. Companies which exited are highlighted in bold.

Company                   Highest funding round   Exit probability   Objective value
Funzio                    Acquisition             0.54               0.54
Longboard Media           Acquisition             0.43               0.74
AirTouch Communications   A                       0.49               0.86
Sequent                   B                       0.47               0.92
Bitzer Mobile             Acquisition             0.47               0.95
SHIFT                     Acquisition             0.44               0.96
Livestar                  Acquisition             0.38               0.97
Jybe                      Acquisition             0.4                0.98
Whitetruffle              Seed                    0.4                0.99
Adteractive               A                       0.27               0.99

For our best performing model, the robust heteroskedastic model, we see that we obtain six exits with a ten company portfolio in 2011, giving an exit rate of 60%. This is higher than the exit rates of the top venture capital firms shown in Figure 8-1. The robust heteroskedastic model is also the dominant model for 2012. For this model, we list the companies chosen for the test years 2011 and 2012 in Tables 8.1 and 8.2. Tables showing the top ten companies in both the independent and correlated portfolios for all four models can be found in Appendix G.

Another notable result we found was that in the portfolios selected by the hierarchical model, there are significant differences in the distribution of sectors in the independent and correlated portfolios. Shown in Figure 8-2 are pie graphs of the distribution of sectors in each of the portfolios. Note that the correlated portfolio contains companies from a variety of different sectors while the independent portfolio contains almost entirely AI companies.

Table 8.2: The top companies in our 2012 portfolio constructed with greedy optimization and the robust heteroskedastic correlated model. Shown are the company name, the highest funding round achieved, the predicted exit probability, and the greedy objective value (probability of at least one exit) when the company is added, rounded to the nearest hundredth.

Company        Highest funding round   Exit probability   Objective value
Struq          Acquisition             0.76               0.76
Metaresolver   Acquisition             0.73               0.93
Trv            C                       0.67               0.97
Glossi Inc     Seed                    0.75               0.99
Boomerang      Seed                    0.71               0.99
Alicanto       A                       0.3                1.0
SnappyTV       Acquisition             0.25               1.0
AVOS Systems   A                       0.17               1.0
Adept Cloud    Acquisition             0.17               1.0
Runa           Acquisition             0.21               1.0

Figure 8-2: A pie graph showing the distribution of companies in both the independent and correlated portfolios for the hierarchical model.

This is due to the fact that the correlated portfolio chooses companies by jointly averaging over all samples of the betas and aims to minimize pairwise correlation between the companies, hence creating a more diverse portfolio. Conversely, the independent portfolio simply uses the mean of the beta samples, making it possible for it to select a point in the solution space that heavily favors a single industry.

Chapter 9

An Online Platform for Venture Capitalists

In order to make our model and startup company analysis available to the public, we developed a web platform which gives users access to both our model estimation and portfolio selection results. The target audience of the web platform is venture capital investors; however, it is accessible to anyone who is curious about startups and what influences their success. The platform is available at www.pickingwinners.org. For the results displayed on the website, we used the parameter estimates for the robust heteroskedastic model trained with observation year 2016 (we collected the data at the end of 2016). For details on the model see Section 5.3, and for details on how the observation year is used in estimation see Section 5.5. The platform has three primary features. The first feature is a company search and analysis feature which allows users to access all companies we used in this research and view information about a company such as which of its features are anomalous, what the model estimated its various β (mean) and γ (variance) coefficients to be, and what the funding round time series of the company looks like. This feature has applications in allowing venture capital investors to understand which factors are most indicative of the success of a company from our data modeling perspective. The second feature is a portfolio building tool which allows investors to use our model predictions and portfolio building approach to build and evaluate portfolios of their own. This feature has applications in both allowing investors to prioritize companies to talk to and in training new investors in evaluating startup portfolios. Lastly, we provide a user profile feature which allows investors to keep track of companies they are interested in and portfolios they have created.

Figure 9-1: The advanced company search feature on the picking winners website.

9.1 Company Search and Analysis

We released a database of all companies studied in our research and provide two ways for users to search through it. The first method is simply typing in the name of the company. The second is an advanced search feature where users can select attributes of industry, highest funding round achieved, year founded, and state located, and receive a list of all companies in the database that match the selected attributes. The feature is shown in Figure 9-1.

When a user selects a company through either of the search features, they are redirected to a page about the company. The page contains basic information about the company such as its industries, baseline year (year of first funding), and a description. It also displays a graph of the funding rounds of the company in addition to the mean exit probability of the company as calculated by the model (shown as the "Score" field). Lastly, the page shows a table of z-scores and the sample means of the β (mean coefficient) and γ (variance coefficient) values. We show these values for anomalous features of the company, which we define as having a z-score greater than 2 or less than −2. These values give an investor insight into which features of a particular company are both different from most other companies and also most indicative of company success according to our model. An example of a company page is shown in Figure 9-2 for the company Facebook.

Figure 9-2: An example of a company information page on the picking winners website.

We provided three tools through which users can use the estimation results of our model to build venture capital portfolios. The first is a portfolio from categories tool which is similar to the company advanced search feature in that users can select attributes from the four categories industry, highest funding round achieved, year founded, and state located.

Figure 9-3: The portfolio building page with access to the portfolio from categories, portfolio from names, and portfolio from csv upload tools on the picking winners website.

However, this time the user also enters a portfolio size and is given an optimal portfolio chosen from the set of companies matching the selected attributes using our correlated portfolio selection algorithm detailed in Section 8. The second is a portfolio from names tool which allows users to enter the names of companies they would like to be in a portfolio and then have the model return the list of companies ordered by their mean exit probability. The last tool is a portfolio from csv upload tool which allows users to upload a comma-separated values (csv) file containing the names of companies they would like to be in a portfolio. From there, the model will return a list of the companies ordered by their mean exit probability. The reason we included the portfolio from csv upload tool in addition to the portfolio from names tool is a matter of user convenience. We found that some users prefer to manually type in names for short lists of companies while others prefer to use csv files when they want to analyze a large number of companies. A picture of the portfolio building page is shown in Figure 9-3. Note that the portfolio from names tool can be accessed by clicking "Select Companies" and the portfolio from csv upload tool can be accessed by clicking "Upload a CSV".

We have found that our portfolio building tools have a number of applications beyond building venture capital portfolios. For example, one common problem in the venture world

is narrowing down a very large list of companies into a more manageable number to have in-person meetings with. Our portfolio from csv upload tool allows investors to upload a list of such companies and then get a prioritized list of companies ordered by how likely they are to succeed. Furthermore, our portfolio selection tools can be used as training tools for new investors. With our platform, new investors have the ability to compare some of their chosen portfolios against those chosen by the model, in addition to having the model evaluate the quality of their chosen portfolios by ordering the companies they chose based on the model's predicted mean exit probability.

9.3 Personalization: Saving Companies and Portfolios

In order to use the features of the website, users must register accounts in which they specify their name, email, and the organization with which they are affiliated. From there, we provide personalization features to users by allowing them to both save companies they are interested in looking at and save portfolios of interest.

Through talking with investors, we have found that a common problem in the venture world is that there are not many online tools that allow investors to keep track of the hundreds or thousands of companies that they are potentially interested in. There are also not many tools that allow venture capital investors to easily keep track of their investment portfolios, whether they be potential portfolios they are considering or portfolios they have actually invested in. The goal of this feature is to provide both tools, integrated with the company search and portfolio building tools. When a user views the company page for a given company, they can click the "Favorite" button which is displayed in the upper right in Figure 9-2. Clicking this button allows the user to optionally write notes about the company before saving it to the user's profile. Similarly, when a user makes a portfolio using any of the portfolio construction methods detailed in Section 9.2, they can click the "Save Portfolio" button which is displayed above the portfolio. Clicking this button requires the user to name the portfolio and optionally write notes about it before saving it to the user's profile.

Each user has a profile page which displays their name, email, and organization. Additionally, it displays both a list of the companies the user has favorited and a list of the portfolios the user has saved, which link to the saved companies and portfolios respectively.

Chapter 10

Conclusion

We have presented a Bayesian modeling framework for evaluating the quality of startup companies using online data. We first analyzed features constructed from our dataset to gain an empirical understanding of which factors influence company success. Then, we introduced a Brownian motion model for the evolution of a startup company's funding rounds to calculate the relevant probabilities of winning, or exiting in the case of startup companies. From there, we developed four variants of the Brownian motion model using different likelihood or prior distribution specifications. We estimated the four models on data for thousands of companies. Because we focused on a Bayesian modeling framework, we were able to use our parameter estimates to determine, with statistical significance, which features are associated with successful companies. Furthermore, our use of hierarchical Bayesian modeling to estimate a sector-specific model allowed us to gain insight into how the factors of success differ for companies in different industries.

In addition to our framework for evaluating the quality of startups, we have presented a new framework for constructing portfolios of items which either have huge returns or very low returns. We formulated the problem as one of maximizing the probability of at least one item achieving a large return, or winning. The submodularity of this probability allowed us to efficiently construct portfolios using a greedy approach. We applied our portfolio construction method to investing in startup companies, which have this top-heavy return structure. To calculate the relevant probabilities of winning, we used our estimated Brownian motion model. We constructed portfolios with our approach which outperform top venture capital firms. The results show that our modeling and portfolio construction methods are effective and provide a quantitative methodology for venture capital investment.

Lastly, we built a web platform to make the results of our work publicly available to venture capital investors. Our platform provides a company search feature, a portfolio building feature, and a company tracking and portfolio saving feature. The platform has applications in helping venture capital investors understand which features of a company are anomalous and indicative of success from a data perspective, prioritize companies to look at, and train new investors to make better portfolios.

There are several future steps for this work. One concerns the manner in which company exits are treated. Acquisitions and IPOs are equivalent in our model, but in reality IPOs are much rarer and can result in a much higher payoff. Our model could be modified to distinguish between IPOs and acquisitions, or it could incorporate the monetary value of the IPO or acquisition. For IPOs this data is public, but acquisition data is generally private and may be difficult to obtain.

Another direction concerns the features used in our model. While we included many relevant features, there could be other useful features which have predictive power. For instance, the values of different macro-economic indicators in the founding year of a company may have an impact on its exit probability. In addition to the types of features used, one could also explore the way in which they are incorporated into the model. We used a simple linear mapping between the features and the drift and diffusion. However, more complex models which use non-linear mappings may improve performance.

Appendix A

Proofs

A.1 Proof of Theorem 7.1.1

We show that maximizing U(S) is NP-hard by a reduction from the maximum coverage problem. In the maximum coverage problem one is given a set U of n elements and a collection E = {E_i}_{i=1}^N of N subsets of U such that ∪_{E∈E} E = U. The goal is to select M sets from E such that their union has maximum cardinality. This is known to be an NP-hard optimization problem. To show that this is an instance of maximizing U(S), we assume that the sample space Ω is countable and finite with R elements. We also assume that each element ω ∈ Ω has equal probability, i.e. P(ω) = R⁻¹. Let F be the σ-algebra of Ω. For any set S ∈ F, we can write U(S) = R⁻¹ |∪_{ω∈S} ω|. Then we have

$$\max_{\mathcal{S}\subseteq\mathcal{E},\,|\mathcal{S}|=M} U(\mathcal{S}) = \max_{\mathcal{S}\subseteq\mathcal{E},\,|\mathcal{S}|=M} R^{-1}\left|\bigcup_{\omega\in\mathcal{S}} \omega\right| = \max_{\mathcal{S}\subseteq\mathcal{E},\,|\mathcal{S}|=M} \left|\bigcup_{\omega\in\mathcal{S}} \omega\right|.$$

Therefore, maximizing U(S) is equivalent to the maximum coverage problem.

A.2 Proof of Lemma 1

The function U(S) is non-negative and non-decreasing because it is the probability of a union of events. We must show that it is also submodular. A submodular function f satisfies

$$f\left(\mathcal{S} \cup v\right) - f(\mathcal{S}) \ge f\left(\mathcal{T} \cup v\right) - f(\mathcal{T}) \qquad (A.2.1)$$

for all elements v and pairs of sets S and T such that S ⊆ T. We show that the function U(S) is submodular as follows. We let the σ-algebra of the probability space be F. Consider sets S, T, v ∈ F such that S ⊆ T. We can write v = v_S ∪ v_T ∪ v_0, where we define v_S = v ∩ S, v_T = v ∩ T ∩ S^c, and v_0 = v ∩ T^c. Then we have

$$U\left(\mathcal{T} \cup v\right) - U(\mathcal{T}) = \mathbb{P}(v_0)$$

and

$$U\left(\mathcal{S} \cup v\right) - U(\mathcal{S}) = \mathbb{P}\left(v_{\mathcal{T}} \cup v_0\right) \ge \mathbb{P}(v_0) = U\left(\mathcal{T} \cup v\right) - U(\mathcal{T}),$$

thus showing that U(S) satisfies the submodularity condition.

A.3 Proof of Theorem 7.3.1

We will begin by defining some notation. For 0 ≤ l ≤ k, let G_l(S) denote the collection of all subsets of S with cardinality l. Additionally, for 0 ≤ q ≤ k, we define the events W_q(S) and Y_q(S) as follows:

$$W_q(\mathcal{S}) = \bigcup_{T\in\mathcal{G}_q(\mathcal{S})} \left(\left(\bigcap_{i\in T} E_i\right) \cap \left(\bigcap_{j\in\mathcal{S}\setminus T} E_j^c\right)\right), \qquad Y_q(\mathcal{S}) = \bigcup_{t=q}^{k} W_t(\mathcal{S}).$$

In particular, W_q(S) is the event where exactly q of the events E_i for i ∈ S occur. Naturally, Y_q(S) is the event where q or more of the events E_i for i ∈ S occur. Using this notation, when S is of cardinality k we get the following:

$$\begin{aligned}
V(\mathcal{S}) &= \sum_{q=0}^{k} \mathbb{P}(W_q(\mathcal{S})) \ln\left(\frac{qb + (k-q)a}{k}\right) \\
&= \sum_{q=0}^{k-1} \left(\mathbb{P}(Y_q(\mathcal{S})) - \mathbb{P}(Y_{q+1}(\mathcal{S}))\right) \ln\left(\frac{qb + (k-q)a}{k}\right) + \mathbb{P}(Y_k(\mathcal{S}))\ln\left(\frac{kb}{k}\right) \\
&= \ln(a) + \sum_{q=1}^{k} \mathbb{P}(Y_q(\mathcal{S})) \ln\left(1 + \frac{b-a}{(q-1)b + (k-q+1)a}\right). \qquad (A.3.1)
\end{aligned}$$

The second line above follows from the definitions of Y_q(S) and W_q(S), and the third line follows from basic properties of the logarithm. We will conclude by showing that S = [k] maximizes every term in the sum given in (A.3.1), and thus it maximizes V(S). Now, because b > a by assumption, we have that

$$\ln\left(1 + \frac{b-a}{(q-1)b + (k-q+1)a}\right) > 0$$

for all q ∈ [k]. Due to the positivity of this value, maximizing every term in the sum given in (A.3.1) reduces to the problem of maximizing P(Y_q(S)). Finally, we show that S = [k] maximizes P(Y_q(S)) for all q ∈ [k]. If [k] is a maximizer of P(Y_q(S)), then we are done. We define S_L = [k] and, for the purpose of contradiction, we suppose that there exists an S† of cardinality k such that S† ≠ S_L and P(Y_q(S†)) > P(Y_q(S_L)). Then we know there exist an r ∈ S† and a t ∈ S_L such that r ∉ S_L, t ∉ S†, and p_t ≥ p_r, since otherwise we would have S† = S_L. Then from the definition of Y_q(S), we have that

$$\begin{aligned}
\mathbb{P}(Y_q(\mathcal{S}^\dagger)) &= p_r \mathbb{P}\left(Y_q(\mathcal{S}^\dagger) \mid E_r\right) + (1 - p_r)\mathbb{P}\left(Y_q(\mathcal{S}^\dagger) \mid E_r^c\right) \\
&= \mathbb{P}\left(Y_q(\mathcal{S}^\dagger) \mid E_r^c\right) + p_r\left(\mathbb{P}\left(Y_q(\mathcal{S}^\dagger) \mid E_r\right) - \mathbb{P}\left(Y_q(\mathcal{S}^\dagger) \mid E_r^c\right)\right) \\
&= \mathbb{P}\left(Y_q(\mathcal{S}^\dagger \setminus \{r\})\right) + p_r\left(\mathbb{P}\left(Y_{q-1}(\mathcal{S}^\dagger \setminus \{r\})\right) - \mathbb{P}\left(Y_q(\mathcal{S}^\dagger \setminus \{r\})\right)\right).
\end{aligned}$$

By similar logic, we have that

$$\mathbb{P}\left(Y_q(\mathcal{S}^\dagger \cup \{t\} \setminus \{r\})\right) = \mathbb{P}\left(Y_q(\mathcal{S}^\dagger \setminus \{r\})\right) + p_t\left(\mathbb{P}\left(Y_{q-1}(\mathcal{S}^\dagger \setminus \{r\})\right) - \mathbb{P}\left(Y_q(\mathcal{S}^\dagger \setminus \{r\})\right)\right).$$

Finally, Y_{q−1}(S† ∖ {r}) ⊇ Y_q(S† ∖ {r}), and thus it follows that P(Y_q(S†)) ≤ P(Y_q(S† ∪ {t} ∖ {r})). By repeating this argument for all elements of S† that are not in S_L, we arrive at the conclusion P(Y_q(S†)) ≤ P(Y_q(S_L)). However, this is a contradiction, and thus S_L is a maximizer of P(Y_q(S)).

A.4 Proof of Theorem 7.3.2

We will begin in a way that is very similar to the proof in section A.3. In particular, we begin by defining 푊푞(풮) and 푌푞(풮) for 푙 ∈ [푘] as follows

⎛(︃ )︃ ⎛ ⎞⎞ ⋃︁ ⋂︁ ⋂︁ ⋂︁ 푐 푊푞(풮) = ⎝ 퐸푖 ⎝ 퐸푗 ⎠⎠

푇 ∈풢푞(풮) 푖∈푇 푗∈풮∖푇

푘 ⋃︁ 푌푞(풮) = 푊푡(풮). 푡=푞

Now, let 푆푘 denote a binomial random variable with parameters 푘 and 푝. Then, from the assumptions given in the statement of the theorem, we have that for the log-optimal

76 portfolio 푆퐿,

푘 ⎛(︃ )︃ ⎛ ⎞⎞ ∑︁ ∑︁ ⋂︁ ⋂︁ ⋂︁ 푐 P (푌푗 (풮퐿)) = P ⎝ 퐸 ⎝ 퐹 ⎠⎠

푖=푗 푇 ∈풢푖(풮퐿) 퐸∈푇 퐹 ∈풮퐿∖푇 푘 ∑︁ ∑︁ ≤ (1 + 휆) 푝푖(1 − 푝)푘−푖

푖=푗 푇 ∈풢푖(풮퐿) 푘 ∑︁ (︂푘)︂ = (1 + 휆) 푝푖(1 − 푝)푘−푖 푖 푖=푗

= (1 + 휆)P (푆푘 ≥ 푗) .

for all 푗 ∈ [푘]. By the same logic, we also get that for the picking winners portfolio 푆푊 ,

P (푌푗 (풮푊 )) ≥ (1 − 휆)P (푆푘 ≥ 푗) .

Then, using equation (A.3.1), the inequalities given above, and the fact that P(푌1(풮)) = 푈(풮), we obtain

$$V(\mathcal{S}_L) - V(\mathcal{S}_W) \le \left(U(\mathcal{S}_L) - U(\mathcal{S}_W)\right) \ln\left(1 + \frac{b-a}{ka}\right) + 2\lambda \sum_{j=2}^{k} P(S_k \ge j) \ln\left(1 + \frac{b-a}{ka + (j-1)(b-a)}\right).$$

Now, using Chebyshev’s inequality, the variance of a binomial random variable, and the above inequality we have that

$$V(\mathcal{S}_L) - V(\mathcal{S}_W) \le \left(U(\mathcal{S}_L) - U(\mathcal{S}_W)\right) \ln\left(1 + \frac{b-a}{ka}\right) + 2\lambda \sum_{j=2}^{k} P\left(|S_k - kp| \ge j - kp\right) \ln\left(1 + \frac{b-a}{ka + (j-1)(b-a)}\right)$$
$$\le \left(U(\mathcal{S}_L) - U(\mathcal{S}_W)\right) \ln\left(1 + \frac{b-a}{ka}\right) + 2\lambda \sum_{j=2}^{k} \frac{kp(1-p)}{(j-kp)^2} \ln\left(1 + \frac{b-a}{ka + (j-1)(b-a)}\right).$$

Now, recall that ln(1 + 푥) ≤ 푥 for 푥 > −1. Therefore

$$V(\mathcal{S}_L) - V(\mathcal{S}_W) \le \left(U(\mathcal{S}_L) - U(\mathcal{S}_W)\right) \ln\left(1 + \frac{b-a}{ka}\right) + 2\lambda \sum_{j=2}^{k} \frac{kp(1-p)}{(j-kp)^2} \cdot \frac{1}{j-1}$$
$$\le \left(U(\mathcal{S}_L) - U(\mathcal{S}_W)\right) \ln\left(1 + \frac{b-a}{ka}\right) + 2\lambda kp(1-p) \sum_{j=2}^{k} \frac{1}{(j-1)^3}$$

where for the second inequality we used the assumption $p \in \left(0, \tfrac{1}{k}\right]$. Some further analysis gives

$$V(\mathcal{S}_L) - V(\mathcal{S}_W) \le \left(U(\mathcal{S}_L) - U(\mathcal{S}_W)\right) \ln\left(1 + \frac{b-a}{ka}\right) + 2\lambda kp(1-p) \sum_{j=1}^{\infty} \frac{1}{j^3}$$
$$\le \left(U(\mathcal{S}_L) - U(\mathcal{S}_W)\right) \ln\left(1 + \frac{b-a}{ka}\right) + 2\lambda kp(1-p)\,\zeta(3). \tag{A.4.1}$$

Now, because 푆퐿 is log-optimal, we have that 푉 (풮퐿)−푉 (풮푊 ) ≥ 0. Using this relationship we can rearrange equation (A.4.1) to obtain

$$\left(U(\mathcal{S}_W) - U(\mathcal{S}_L)\right) \ln\left(1 + \frac{b-a}{ka}\right) \le 2\lambda kp(1-p)\,\zeta(3)$$
$$U(\mathcal{S}_W) - U(\mathcal{S}_L) \le \frac{2\lambda kp(1-p)\,\zeta(3)}{\ln\left(1 + \frac{b-a}{ka}\right)}. \tag{A.4.2}$$

We next lower bound $U(\mathcal{S}_W)$ using the assumptions of Theorem 7.3.2 and basic properties of the binomial distribution to obtain

$$U(\mathcal{S}_W) = P(Y_1(\mathcal{S}_W)) \ge (1 - \lambda) P(S_k \ge 1) = (1 - \lambda)\left(1 - (1-p)^k\right). \tag{A.4.3}$$

Combining this lower bound with equation (A.4.2) we obtain our final result.

$$\frac{U(\mathcal{S}_W) - U(\mathcal{S}_L)}{U(\mathcal{S}_W)} \le \frac{2\lambda kp(1-p)\,\zeta(3)}{\ln\left(1 + \frac{b-a}{ka}\right)(1 - \lambda)\left(1 - (1-p)^k\right)}.$$
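For a rough sense of scale, the right-hand side of this bound can be evaluated numerically; the sketch below uses illustrative values of $k$, $p$, $a$, $b$, and $\lambda$ (respecting the assumption $p \le 1/k$), not estimates from our data:

```python
import math

def relative_gap_bound(k, p, a, b, lam):
    """Right-hand side of the bound on (U(S_W) - U(S_L)) / U(S_W)."""
    zeta3 = 1.2020569031595942  # Apery's constant, zeta(3)
    numerator = 2.0 * lam * k * p * (1.0 - p) * zeta3
    denominator = math.log(1.0 + (b - a) / (k * a)) * (1.0 - lam) * (1.0 - (1.0 - p) ** k)
    return numerator / denominator

# 10 companies, exit probability 0.05 each (<= 1/k), payouts a = 1, b = 10, lambda = 0.01.
print(relative_gap_bound(k=10, p=0.05, a=1.0, b=10.0, lam=0.01))  # roughly 0.045
```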

Appendix B

Sector Names

The sector names that we used from the Crunchbase database are: 3d printing, advertising, analytics, animation, apps, artificial intelligence, automotive, autonomous vehicles, big data, bioinformatics, biotechnology, bitcoin, business intelligence, cloud computing, computer, computer vision, dating, developer apis, e-commerce, e-learning, edtech, education, Facebook, fantasy sports, fashion, fintech, finance, financial services, fitness, gpu, hardware, health care, health diagnostics, hospital, insurance, internet, internet of things, iOS, lifestyle, logistics, machine learning, medical, medical device, messaging, mobile, nanotechnology, network security, open source, personal health, pet, photo sharing, renewable energy, ride sharing, robotics, search engine, social media, social network, software, solar, sports, transportation, video games, virtual reality, and virtualization.

Appendix C

Top School Names

The schools used for the top school feature are: Berkeley, Brown, California Institute of Technology, Carnegie Mellon, Columbia, Cornell, Dartmouth, Duke, Harvard, Johns Hopkins, Massachusetts Institute of Technology, Northwestern, Princeton, Stanford, University of Chicago, University of Pennsylvania, Wharton, and Yale.

Appendix D

Calculation of Company Variance

In Section 4.2, we said that the company variance is equal to $g(\gamma^T x)^2$ for a function $g(z)$ which is used to calculate the company's standard deviation. Our goal in designing this function was to make it so that a small value of $\gamma^T x$ results in a low standard deviation and a high value of $\gamma^T x$ results in a high standard deviation. Since the standard deviation also must be positive, we chose to pick the positive region of the hyperbola $g(z)$ with asymptotes at $z = 0$ and $g(z) = z$. This choice allows for the standard deviation to be on the same scale as the mean (i.e. on the order of $\gamma^T x$). We also found that using an asymmetric function made the Bayesian estimation of the model converge much more easily. The exact function is detailed below. Note that all trigonometric functions use radians.

$$g(z) = \frac{-Bz + \sqrt{(Bz)^2 - 4C\left(Az^2 - \tan^2\frac{\pi}{8}\right)}}{2C}, \quad \text{where}$$
$$A = -\tan^2\frac{\pi}{8}\cos^2\frac{\pi}{8} + \sin^2\frac{\pi}{8} \approx -2.776 \times 10^{-17}$$
$$B = 2\sin\frac{\pi}{8}\cos\frac{\pi}{8}\left(-1 - \tan^2\frac{\pi}{8}\right) \approx -0.828$$
$$C = -\tan^2\frac{\pi}{8}\sin^2\frac{\pi}{8} + \cos^2\frac{\pi}{8} \approx 0.828.$$
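A direct implementation of this function is straightforward; the following sketch simply transcribes the constants above and checks the positivity and the $g(z) = z$ asymptote numerically:

```python
import numpy as np

t = np.pi / 8  # rotation angle used to construct the hyperbola
A = -np.tan(t) ** 2 * np.cos(t) ** 2 + np.sin(t) ** 2  # ~0 up to floating point error
B = 2 * np.sin(t) * np.cos(t) * (-1 - np.tan(t) ** 2)  # ~ -0.828
C = -np.tan(t) ** 2 * np.sin(t) ** 2 + np.cos(t) ** 2  # ~ 0.828

def g(z):
    """Company standard deviation as a function of z = gamma^T x (positive branch)."""
    disc = (B * z) ** 2 - 4 * C * (A * z ** 2 - np.tan(t) ** 2)
    return (-B * z + np.sqrt(disc)) / (2 * C)

z = np.linspace(-5, 5, 11)
print(np.all(g(z) > 0))   # True: the standard deviation is always positive
print(g(5.0) / 5.0)       # close to 1: g(z) approaches the asymptote g(z) = z
```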

Appendix E

Sector Groupings for Hierarchical Model

Below are the names of the six sector groups we used for the hierarchical model formulated in Section 5.4, along with the industries from Section B that are in each of the groups. For each company, we took the set of all specific sector names associated with the company and then placed it in the group with which it had the most matching sector names (a short sketch of this matching rule follows the list below). Some companies did not have any industry information. For those companies, we created a seventh group called "None" when training the hierarchical model.

∙ AI: analytics, artificial intelligence, autonomous vehicles, big data, computer vision, machine learning
∙ BioTech: bioinformatics, biotechnology, nanotechnology
∙ Finance: business intelligence, fintech, finance, financial services, insurance
∙ Hardware: 3d printing, automotive, fantasy sports, fashion, gpu, hardware, pet, renewable energy, robotics, solar, sports, transportation, virtual reality
∙ Healthcare: health care, health diagnostics, hospital, medical, medical device, personal health
∙ Software: advertising, animation, apps, bitcoin, cloud computing, computer, dating, developer apis, e-commerce, e-learning, edtech, education, facebook, fitness, internet, internet of things, lifestyle, logistics, messaging, mobile, network security, open source, photo sharing, ride sharing, search engine, social media, social network, software, video games, virtualization, ios
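A minimal sketch of this matching rule is below; the group dictionary is abbreviated to a few sector names per group for readability, and ties are broken arbitrarily:

```python
GROUPS = {
    "AI": {"analytics", "artificial intelligence", "big data", "machine learning"},
    "BioTech": {"bioinformatics", "biotechnology", "nanotechnology"},
    "Finance": {"fintech", "finance", "financial services", "insurance"},
    "Hardware": {"3d printing", "hardware", "robotics", "solar"},
    "Healthcare": {"health care", "medical", "medical device"},
    "Software": {"apps", "e-commerce", "mobile", "software"},
}

def assign_group(company_sectors):
    """Place a company in the group with the most matching sector names ('None' if no info)."""
    sectors = set(company_sectors)
    if not sectors:
        return "None"
    return max(GROUPS, key=lambda group: len(sectors & GROUPS[group]))

print(assign_group({"mobile", "apps", "machine learning"}))  # Software (2 matches vs. 1 for AI)
print(assign_group(set()))                                   # None
```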

Appendix F

Details of the MCMC Sampler

We use a blocked Metropolis-within-Gibbs scheme to sample from the posterior distribution of the model parameters. In Section F.1, we describe the details of the Metropolis step used to sample the parameters. In the following sections of this appendix, we describe how we sample the parameters of each of the four models described in Section 5. Note that in the following sections we use Θ to describe the set of model parameters and the notation

Θ−훼 to describe the set of model parameters excluding parameter 훼.

F.1 Details of the Metropolis Step

Let $\alpha_i$ denote the $i$th sample of parameter block $\alpha$. In each step of the random walk Metropolis-Hastings, the proposal for sample $(i + 1)$ is drawn from a normal distribution with mean $\alpha_i$ and variance $\epsilon^2$. In the case where the parameter block $\alpha$ is a vector, we use a multivariate normal proposal distribution with mean $\alpha_i$ and diagonal covariance matrix equal to $\epsilon^2 I$. Graves (2011) found that tuning the step size of MCMC algorithms helps with keeping a high acceptance probability during the sampling. Therefore, we adjust the step size $\epsilon$ based on the acceptance rate of previous samples according to the schedule proposed by the default implementation of the PyMC3 framework (Salvatier et al. 2016) as follows (a sketch of this rule appears after the list):

1. Initialize the step size to 1.

2. Every 100 samples, do the following:

∙ If the acceptance rate is less than .05, multiply 휖 by .5

∙ If the acceptance rate is less than .2, multiply 휖 by .9

∙ If the acceptance rate is greater than .5, multiply 휖 by 1.1

∙ If the acceptance rate is greater than .75, multiply 휖 by 2

∙ If the acceptance rate is greater than .95, multiply 휖 by 10

∙ Else do not change 휖
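A sketch of this adaptation rule is shown below; the thresholds come from the schedule above, and treating them as mutually exclusive cases checked in this order is an assumption about how the schedule is applied:

```python
def tune_step_size(eps, acceptance_rate):
    """Rescale the Metropolis proposal scale epsilon based on the recent acceptance rate."""
    if acceptance_rate < 0.05:
        eps *= 0.5    # acceptance far too low: shrink proposals aggressively
    elif acceptance_rate < 0.2:
        eps *= 0.9
    elif acceptance_rate > 0.95:
        eps *= 10.0   # acceptance far too high: take much larger steps
    elif acceptance_rate > 0.75:
        eps *= 2.0
    elif acceptance_rate > 0.5:
        eps *= 1.1
    return eps        # otherwise leave epsilon unchanged

eps = 1.0                                        # step 1: initialize the step size to 1
eps = tune_step_size(eps, acceptance_rate=0.03)  # step 2, every 100 samples: here 1.0 -> 0.5
```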

F.2 The Homoskedastic Model Parameters

The set of model parameters in the homoskedastic model described in Section 5.1 is $\Theta = \{\beta_y$ for all $y$ in our data, $\sigma_0^2, \delta, \nu, \tau\}$. For this model, we set the hyperparameters to be $\mu_\beta = 0$, $\sigma_\beta^2 = 100$, $a_{\sigma_0^2} = 3$, $b_{\sigma_0^2} = 1$, $\lambda_0 = 1$, $a_\nu = 0$, $b_\nu = 50$, $a_\tau = -10$, and $b_\tau = 10$. In general, we choose our hyperpriors to be flat and uninformative. For the priors on $\nu$ and $\tau$, we incorporate information from our data analysis in Section 3. Specifically, our data showed that no companies ever exit after 50 years, so we set the upper bound of $\nu$ to be 50. Also, from our data analysis, we can reasonably expect that company performance has a half life of less than 10 years, so we bound $\tau$ by -10 and 10. We note that for this model, the posterior cannot be written in closed form for any of the model parameters due to lack of conjugacy between the data likelihood $P(\mathbf{T}, \mathbf{X} \mid \Theta)$ and the parameter priors. Therefore, we sample each parameter block $\alpha \in \Theta$ using a random walk Metropolis step according to the rules in Section F.1 with sampling distribution

$$P(\alpha \mid \mathbf{T}, \mathbf{X}, \Theta_{-\alpha}) \propto P(\mathbf{T}, \mathbf{X} \mid \Theta_{-\alpha}) \left( \prod_{y=1}^{Y} P(\beta_y \mid \beta_{y-1}) \right) P(\beta_0)\, P(\sigma_0^2)\, P(\delta)\, P(\nu)\, P(\tau), \tag{F.2.1}$$

where the likelihood and priors are as described in Section 5.1. Here, the parameter blocks in the Metropolis-within-Gibbs sampler are $\beta_y$ for all $y$, $\sigma_0^2$, $\delta$, $\nu$, and $\tau$.
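The block update itself is a standard random walk Metropolis step. The sketch below shows the accept/reject logic for a single block, assuming a user-supplied log_posterior function (an illustrative name) that evaluates the unnormalized log of the right-hand side of (F.2.1) as a function of the block's value, holding the other parameters fixed:

```python
import numpy as np

def metropolis_block_update(alpha, log_posterior, eps, rng):
    """One random walk Metropolis update for a parameter block alpha (scalar or vector)."""
    proposal = alpha + eps * rng.standard_normal(np.shape(alpha))
    log_accept = log_posterior(proposal) - log_posterior(alpha)
    if np.log(rng.uniform()) < log_accept:
        return proposal, True   # accept the proposal
    return alpha, False         # reject and keep the current value
```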

F.3 The Heteroskedastic and Robust Heteroskedastic Model Parameters

The set of model parameters in both the heteroskedastic and robust heteroskedastic models

described in Sections 5.2 and 5.3, respectively, is $\Theta = \{\beta_y$ for all $y$ in our data, $\gamma, \delta, \nu, \tau\}$. For these models, we set the hyperparameters to be $\mu_\beta = 0$, $\sigma_\beta^2 = 100$, $\mu_\gamma = 0$, $\sigma_\gamma^2 = 100$,

휆0 = 1, 푎휈 = 0, 푏휈 = 50, 푎휏 = −10, and 푏휏 = 10. In general, we choose our hyperpriors to be flat and uninformative. For the priors on 휈 and 휏, we incorporate information from our data analysis in Section 3. Specifically, our data showed that no companies ever exit after 50 years, so we set the upper bound of 휈 to be 50. Also, from our data analysis, we can reasonably expect that company performance has a half life of less than 10 years, so we bound 휏 by -10 and 10. We note that for these models, the posterior cannot be written in closed form for any of

the model parameters due to lack of conjugacy between the data likelihood $P(\mathbf{T}, \mathbf{X} \mid \Theta)$ and the parameter priors. Therefore, we sample each parameter block $\alpha \in \Theta$ using a random walk Metropolis step according to the rules in Section F.1 with sampling distribution

$$P(\alpha \mid \mathbf{T}, \mathbf{X}, \Theta_{-\alpha}) \propto P(\mathbf{T}, \mathbf{X} \mid \Theta_{-\alpha}) \left( \prod_{y=1}^{Y} P(\beta_y \mid \beta_{y-1}) \right) P(\beta_0)\, P(\gamma)\, P(\delta)\, P(\nu)\, P(\tau), \tag{F.3.1}$$

where the appropriate likelihood and priors are used for each model type described in Sections 5.2 and 5.3. Here, the parameter blocks in the Metropolis-within-Gibbs sampler are $\beta_y$ for all $y$, $\gamma$, $\delta$, $\nu$, and $\tau$.

F.4 The Hierarchical Model Parameters

The set of model parameters in the hierarchical model described in Section 5.4 is $\Theta = \{\beta_y^g$ for all $y$ in our data, $\gamma^g, \delta^g, \nu^g, \tau^g, \mu_\beta', \sigma_\beta'^2, \mu_\gamma', \sigma_\gamma'^2, \lambda_0'\}$. For this model, we set the hyperparameters to be $\mu_\beta = 0$, $\sigma_\beta^2 = 1$, $a_{\sigma_\beta^2} = 3$, $b_{\sigma_\beta^2} = 1$, $\mu_\gamma = 0$, $\sigma_\gamma^2 = 1$, $a_{\sigma_\gamma^2} = 3$, $b_{\sigma_\gamma^2} = 1$, $a_{\lambda_0} = 3$, $b_{\lambda_0} = 1$, $a_\nu = 0$, $b_\nu = 50$, $a_\tau = -10$, and $b_\tau = 10$. We choose our hyperpriors to be stronger here than in the previous models in order to regularize and prevent overfitting because the hierarchical model has many more parameters than the previous models. For the priors on $\nu$ and $\tau$, we incorporate information from our data analysis in Section 3. Specifically, our data showed that no companies ever exit after 50 years, so we set the upper bound of $\nu$ to be 50. Also, from our data analysis, we can reasonably expect that company performance has a half life of less than 10 years, so we bound $\tau$ by -10 and 10.

Note that for all $g \in G$, the set of all groups in the hierarchical model, the parameters $\beta_y^g$, $\gamma^g$, $\delta^g$, $\nu^g$, and $\tau^g$ have a posterior distribution that cannot be written in closed form due to lack of conjugacy between the data likelihood and the parameter priors. Therefore, we sample each parameter block $\alpha \in \{\beta_y^g$ for all $y$ in our data, $\gamma^g, \delta^g, \nu^g, \tau^g\}$ using a random walk Metropolis step according to the rules in Section F.1 with sampling distribution

$$P(\alpha \mid \mathbf{T}, \mathbf{X}, \Theta_{-\alpha}) \propto P(\mathbf{T}, \mathbf{X} \mid \Theta_{-\alpha}) \left( \prod_{y=1}^{Y} P(\beta_y^g \mid \beta_{y-1}^g) \right) \cdot$$
$$\left( \prod_{g \in G} P(\beta_0^g \mid \mu_\beta', \sigma_\beta'^2)\, P(\gamma^g \mid \mu_\gamma', \sigma_\gamma'^2)\, P(\delta^g \mid \lambda_0')\, P(\nu^g)\, P(\tau^g) \right) P(\mu_\beta')\, P(\sigma_\beta'^2)\, P(\mu_\gamma')\, P(\sigma_\gamma'^2)\, P(\lambda_0'), \tag{F.4.1}$$

where the likelihood and priors are as defined in Section 5.4.

We can sample the posterior distribution of parameters $\mu_\beta'$, $\sigma_\beta'^2$, $\mu_\gamma'$, $\sigma_\gamma'^2$, and $\lambda_0'$ directly due to conjugacy with their prior distributions.

Parameters $\mu_\beta'$ and $\sigma_\beta'^2$. The conditional distribution of $\mu_\beta'$ is a normal with mean $\mu_{\mu_\beta'}$ and variance $\sigma_{\mu_\beta'}^2$, so it can be directly sampled. The mean $\mu_{\mu_\beta'}$ and variance $\sigma_{\mu_\beta'}^2$ are

$$\mu_{\mu_\beta'} = \left(DG + \sigma_\beta'^2 \sigma_\beta^{-2}\right)^{-1} \sum_{g=1}^{G}\sum_{d=1}^{D} \beta_{0d}^g$$
$$\sigma_{\mu_\beta'}^2 = \left(DG + \sigma_\beta'^2 \sigma_\beta^{-2}\right)^{-1} \sigma_\beta'^2, \tag{F.4.2}$$

where $D$ is the dimensionality of the data, $G$ is the number of groups in the hierarchy, $\mu_\beta = 0$, and $\sigma_\beta^2 = 1$ as previously specified.

The conditional distribution of $\sigma_\beta'^2$ is an inverse-gamma distribution with shape parameter $a_{\sigma_\beta'^2}$ and scale parameter $b_{\sigma_\beta'^2}$, so it can be directly sampled. The shape parameter $a_{\sigma_\beta'^2}$ and scale parameter $b_{\sigma_\beta'^2}$ are

$$a_{\sigma_\beta'^2} = a_{\sigma_\beta^2} + \frac{DG}{2}$$
$$b_{\sigma_\beta'^2} = b_{\sigma_\beta^2} + \frac{1}{2}\sum_{g=1}^{G}\sum_{d=1}^{D}\left(\beta_{0d}^g - \mu_\beta'\right)^2, \tag{F.4.3}$$

where $D$ is the dimensionality of the data and $G$ is the number of groups in the hierarchy. We also have $a_{\sigma_\beta^2} = 3$ and $b_{\sigma_\beta^2} = 1$ as previously specified.

Parameters $\mu_\gamma'$ and $\sigma_\gamma'^2$. The conditional distribution of $\mu_\gamma'$ is a normal with mean $\mu_{\mu_\gamma'}$ and variance $\sigma_{\mu_\gamma'}^2$, so it can be directly sampled. The mean $\mu_{\mu_\gamma'}$ and variance $\sigma_{\mu_\gamma'}^2$ are

$$\mu_{\mu_\gamma'} = \left(DG + \sigma_\gamma'^2 \sigma_\gamma^{-2}\right)^{-1} \sum_{g=1}^{G}\sum_{d=1}^{D} \gamma_{0d}^g$$
$$\sigma_{\mu_\gamma'}^2 = \left(DG + \sigma_\gamma'^2 \sigma_\gamma^{-2}\right)^{-1} \sigma_\gamma'^2, \tag{F.4.4}$$

where $D$ is the dimensionality of the data, $G$ is the number of groups in the hierarchy, $\mu_\gamma = 0$, and $\sigma_\gamma^2 = 1$ as previously specified.

The conditional distribution of $\sigma_\gamma'^2$ is an inverse-gamma distribution with shape parameter $a_{\sigma_\gamma'^2}$ and scale parameter $b_{\sigma_\gamma'^2}$, so it can be directly sampled. The shape parameter $a_{\sigma_\gamma'^2}$ and scale parameter $b_{\sigma_\gamma'^2}$ are

$$a_{\sigma_\gamma'^2} = a_{\sigma_\gamma^2} + \frac{DG}{2}$$
$$b_{\sigma_\gamma'^2} = b_{\sigma_\gamma^2} + \frac{1}{2}\sum_{g=1}^{G}\sum_{d=1}^{D}\left(\gamma_{0d}^g - \mu_\gamma'\right)^2, \tag{F.4.5}$$

where $D$ is the dimensionality of the data and $G$ is the number of groups in the hierarchy. We also have $a_{\sigma_\gamma^2} = 3$ and $b_{\sigma_\gamma^2} = 1$ as previously specified.

Parameter $\lambda_0'$. The conditional distribution of $\lambda_0'$ is a gamma distribution with shape parameter $a_{\lambda_0'}$ and rate parameter $b_{\lambda_0'}$, so it can be directly sampled. The shape parameter $a_{\lambda_0'}$ and rate parameter $b_{\lambda_0'}$ are

$$a_{\lambda_0'} = a_{\lambda_0} + DG$$
$$b_{\lambda_0'} = b_{\lambda_0} + \sum_{g=1}^{G}\sum_{d=1}^{D} \lambda_d^g, \tag{F.4.6}$$

where $D$ is the dimensionality of the data and $G$ is the number of groups in the hierarchy. We also have $a_{\lambda_0} = 3$ and $b_{\lambda_0} = 1$ as previously specified.
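These direct draws are simple to write with NumPy. In the sketch below, beta0 and lam are illustrative stand-ins for the $\beta_{0d}^g$ and $\lambda_d^g$ values, the hyperparameters match the values stated above, and the prior means $\mu_\beta = \mu_\gamma = 0$ are baked into the normal update; the updates for $\mu_\gamma'$ and $\sigma_\gamma'^2$ are symmetric and omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
G, D = 7, 20                          # number of groups and feature dimension (illustrative)
beta0 = rng.normal(size=(G, D))       # stand-ins for the beta^g_{0d} values
lam = rng.gamma(1.0, size=(G, D))     # stand-ins for the lambda^g_d values
sigma_beta2, a_sb2, b_sb2 = 1.0, 3.0, 1.0
a_lam0, b_lam0 = 3.0, 1.0

def draw_mu_beta_prime(sigma_beta_prime2):
    """Normal conditional (F.4.2); the prior mean mu_beta = 0 is baked in."""
    denom = D * G + sigma_beta_prime2 / sigma_beta2
    return rng.normal(beta0.sum() / denom, np.sqrt(sigma_beta_prime2 / denom))

def draw_sigma_beta_prime2(mu_beta_prime):
    """Inverse-gamma conditional (F.4.3), sampled as the reciprocal of a gamma draw."""
    shape = a_sb2 + D * G / 2.0
    scale = b_sb2 + 0.5 * ((beta0 - mu_beta_prime) ** 2).sum()
    return 1.0 / rng.gamma(shape, 1.0 / scale)

def draw_lambda0_prime():
    """Gamma conditional (F.4.6) with shape a + DG and rate b + sum of lambda^g_d."""
    return rng.gamma(a_lam0 + D * G, 1.0 / (b_lam0 + lam.sum()))
```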

Appendix G

Companies Chosen in the Portfolios of the Four Models

Here, we show the top 10 companies in both the independent and correlated portfolios selected by each of our four models for test years 2011 and 2012. The companies which exited in each portfolio are highlighted in bold and the exit probabilities and objective values are rounded to the nearest hundredth.

Table G.1: The top companies in our 2011 portfolio constructed with greedy optimization and the homoskedastic independent model.

Company | Highest funding round | Exit probability
Sequent | B | 0.65
Syapse | C | 0.31
AirTouch Communications | A | 0.29
Whitetruffle | Seed | 0.29
Metromile | D | 0.29
Funzio | Acquisition | 0.22
Bitzer Mobile | Acquisition | 0.19
Axtria | C | 0.17
Livestar | Acquisition | 0.16
nprogress | Seed | 0.08

Table G.2: The top companies in our 2011 portfolio constructed with greedy optimization and the homoskedastic correlated model.

Company | Highest funding round | Exit probability | Objective value
Sequent | B | 0.57 | 0.57
AirTouch Communications | A | 0.35 | 0.71
Syapse | C | 0.34 | 0.79
Bitzer Mobile | Acquisition | 0.3 | 0.83
Whitetruffle | Seed | 0.4 | 0.85
Funzio | Acquisition | 0.29 | 0.87
Metromile | D | 0.4 | 0.88
nprogress | Seed | 0.17 | 0.89
Livestar | Acquisition | 0.2 | 0.9
Snapguide | Acquisition | 0.14 | 0.91

Table G.3: The top companies in our 2012 portfolio constructed with greedy optimization and the homoskedastic independent model.

Company | Highest funding round | Exit probability
Boomerang | Seed | 0.22
Struq | Acquisition | 0.21
Glossi Inc | Seed | 0.21
Trv | C | 0.2
ScaleGrid | Seed | 0.07
UpGuard | B | 0.06
Viewfinder | Acquisition | 0.04
Mozio | Seed | 0.04
HighGround | A | 0.04
iLumen | Seed | 0.04

Table G.4: The top companies in our 2012 portfolio constructed with greedy optimization and the homoskedastic correlated model.

Company | Highest funding round | Exit probability | Objective value
Struq | Acquisition | 0.32 | 0.32
Trv | C | 0.29 | 0.51
ScaleGrid | Seed | 0.25 | 0.63
Glossi Inc | Seed | 0.31 | 0.68
Boomerang | Seed | 0.3 | 0.72
UpGuard | B | 0.17 | 0.74
Viewfinder | Acquisition | 0.1 | 0.75
The Zebra | A | 0.12 | 0.76
Bolster | Seed | 0.09 | 0.77
Alicanto | A | 0.13 | 0.77

Table G.5: The top companies in our 2011 portfolio constructed with greedy optimization and the heteroskedastic independent model.

Company | Highest funding round | Exit probability
Metromile | D | 0.47
Leaky | Acquisition | 0.27
Syapse | C | 0.25
AirTouch Communications | A | 0.21
Snapguide | Acquisition | 0.2
Fullscreen | Acquisition | 0.13
Axtria | C | 0.1
Livestar | Acquisition | 0.08
TappIn | Acquisition | 0.07
BloomBoard | B | 0.03

Table G.6: The top companies in our 2011 portfolio constructed with greedy optimization and the heteroskedastic correlated model.

Company | Highest funding round | Exit probability | Objective value
Metromile | D | 0.49 | 0.49
Snapguide | Acquisition | 0.27 | 0.63
Syapse | C | 0.28 | 0.72
Axtria | C | 0.41 | 0.78
AirTouch Communications | A | 0.27 | 0.83
NextDocs | C | 0.25 | 0.86
Striiv | A | 0.11 | 0.88
Leaky | Acquisition | 0.36 | 0.89
Bitzer Mobile | Acquisition | 0.13 | 0.91
AdexLink | Seed | 0.08 | 0.91

Table G.7: The top companies in our 2012 portfolio constructed with greedy optimization and the heteroskedastic independent model.

Company | Highest funding round | Exit probability
UpGuard | B | 0.56
Bolster | Seed | 0.53
Beam Dental | A | 0.34
Struq | Acquisition | 0.26
Flutter | Acquisition | 0.25
Glossi Inc | Seed | 0.25
HighGround | A | 0.19
iLumen | Seed | 0.18
Camino Real | A | 0.18
Newsle | Acquisition | 0.17

Table G.8: The top companies in our 2012 portfolio constructed with greedy optimization and the heteroskedastic correlated model.

Company | Highest funding round | Exit probability | Objective value
UpGuard | B | 0.55 | 0.55
Bolster | Seed | 0.53 | 0.76
Beam Dental | A | 0.4 | 0.83
Struq | Acquisition | 0.3 | 0.88
Glossi Inc | Seed | 0.29 | 0.91
Flutter | Acquisition | 0.29 | 0.93
iLumen | Seed | 0.2 | 0.94
Camino Real | A | 0.2 | 0.95
ScaleGrid | Seed | 0.27 | 0.96
HighGround | A | 0.2 | 0.96

Table G.9: The top companies in our 2011 portfolio constructed with greedy optimization and the robust heteroskedastic independent model.

Company | Highest funding round | Exit probability
Funzio | Acquisition | 0.55
AirTouch Communications | A | 0.48
Bitzer Mobile | Acquisition | 0.46
Sequent | B | 0.45
SHIFT | Acquisition | 0.43
Longboard Media | Acquisition | 0.41
Livestar | Acquisition | 0.37
Jybe | Acquisition | 0.37
Whitetruffle | Seed | 0.37
Adynxx | A | 0.32

Table G.10: The top companies in our 2011 portfolio constructed with greedy optimization and the robust heteroskedastic correlated model.

Company | Highest funding round | Exit probability | Objective value
Funzio | Acquisition | 0.54 | 0.54
Longboard Media | Acquisition | 0.43 | 0.74
AirTouch Communications | A | 0.49 | 0.86
Sequent | B | 0.47 | 0.92
Bitzer Mobile | Acquisition | 0.47 | 0.95
SHIFT | Acquisition | 0.44 | 0.96
Livestar | Acquisition | 0.38 | 0.97
Jybe | Acquisition | 0.4 | 0.98
Whitetruffle | Seed | 0.4 | 0.99
Adteractive | A | 0.27 | 0.99

Table G.11: The top companies in our 2012 portfolio constructed with greedy optimization and the robust heteroskedastic independent model.

Company | Highest funding round | Exit probability
Struq | Acquisition | 0.8
Glossi Inc | Seed | 0.79
Metaresolver | Acquisition | 0.74
Boomerang | Seed | 0.74
Trv | C | 0.7
Mozio | Seed | 0.31
Alicanto | A | 0.25
adRise | B | 0.21
Runa | Acquisition | 0.19
SnappyTV | Acquisition | 0.17

Table G.12: The top companies in our 2012 portfolio constructed with greedy optimization and the robust heteroskedastic correlated model.

Company | Highest funding round | Exit probability | Objective value
Struq | Acquisition | 0.76 | 0.76
Metaresolver | Acquisition | 0.73 | 0.93
Trv | C | 0.67 | 0.97
Glossi Inc | Seed | 0.75 | 0.99
Boomerang | Seed | 0.71 | 0.99
Alicanto | A | 0.3 | 1.0
SnappyTV | Acquisition | 0.25 | 1.0
AVOS Systems | A | 0.17 | 1.0
Adept Cloud | Acquisition | 0.17 | 1.0
Runa | Acquisition | 0.21 | 1.0

Table G.13: The top companies in our 2011 portfolio constructed with greedy optimization and the hierarchical independent model.

Company | Highest funding round | Exit probability
100Plus | Acquisition | 1.0
Nodeable | Acquisition | 1.0
Hadapt | Acquisition | 1.0
Saygent | A | 1.0
Celeris Corporation | Seed | 0.99
Colabo | A | 0.99
Bityota | A | 0.99
SeeVolution | Seed | 0.99
Gingerd | A | 0.99
DeepField | Seed | 0.99

Table G.14: The top companies in our 2011 portfolio constructed with greedy optimization and the hierarchical correlated model.

Company | Highest funding round | Exit probability | Objective value
100Plus | Acquisition | 0.5 | 0.5
OFACS LLC | Seed | 0.38 | 0.74
Cloudability | B | 0.41 | 0.85
AnaBios | Seed | 0.38 | 0.9
CurTran | Seed | 0.28 | 0.93
Tykoon | Seed | 0.36 | 0.95
OCZ Technology | A | 0.39 | 0.96
ICEX | Seed | 0.38 | 0.97
Nodeable | Acquisition | 0.5 | 0.97
Phizzle | A | 0.15 | 0.98

Table G.15: The top companies in our 2012 portfolio constructed with greedy optimization and the hierarchical independent model.

Company | Highest funding round | Exit probability
Cagenix | Seed | 1.0
GenePeeks | Seed | 1.0
Lab Automate Technologies | Seed | 1.0
Rothman Healthcare | Seed | 1.0
BioSurplus | A | 1.0
Bellbrook Labs | Seed | 1.0
BerGenBio | D | 1.0
BioElectronics | Seed | 1.0
BioAnalytix | B | 0.99
Relmada Therapeutics | A | 0.95

Table G.16: The top companies in our 2012 portfolio constructed with greedy optimization and the hierarchical correlated model.

Company | Highest funding round | Exit probability | Objective value
Lab Automate Technologies | Seed | 0.51 | 0.51
PromoJam | A | 0.44 | 0.72
QBotix | B | 0.37 | 0.84
Rentalroost.com | Seed | 0.31 | 0.91
Survata | A | 0.22 | 0.95
Churchkey Can Co | Seed | 0.23 | 0.96
CeutiCare | Seed | 0.26 | 0.97
RFEyeD | Seed | 0.2 | 0.98
PernixData | Acquisition | 0.45 | 0.98
Ondore | Seed | 0.2 | 0.99

Bibliography

Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, and Samuel Ieong. Diversifying search results. In Proceedings of the Second ACM International Conference on Web Search and Data Mining, pages 5–14. ACM, 2009.
Pierre Azoulay, Benjamin Jones, J Daniel Kim, and Javier Miranda. Age and high-growth entrepreneurship. Technical report, National Bureau of Economic Research, 2018.
John Bhakdi. Quantitative VC: A new way to growth. The Journal of Private Equity, 17(1):14–28, 2013.
Ashley Carroll. Capital-as-a-service: A new operating system for early stage investing, 2017. https://medium.com/social-capital/capital-as-a-service-a-new-operating-system-for-early-stage-investing-6d001416c0df.
Harr Chen and David R Karger. Less is more: Probabilistic models for retrieving fewer relevant documents. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 429–436. ACM, 2006.
US Congress. Jumpstart Our Business Startups Act. Public Law, 112-106, 2012.
Correlation Ventures. http://correlationvc.com/, 2018.
Thomas M Cover. Universal portfolios. Mathematical Finance, 1(1):1–29, 1991.
CrunchBase, 2018. https://www.crunchbase.com/#/home/index.
Marco Da Rin, Thomas F Hellmann, and Manju Puri. A survey of venture capital research. Technical report, National Bureau of Economic Research, 2011.
Charles E Eesley, David H Hsu, and Edward B Roberts. The contingent effects of top management teams on venture performance: Aligning founding team composition with strategy and commercialization environment. Strategic Management Journal, 35(12):1798–1817, 2014.
Khalid El-Arini, Gaurav Veda, Dafna Shahaf, and Carlos Guestrin. Turning down the noise in the blogosphere. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 289–298. ACM, 2009.
FlashFunders. https://www.flashfunders.com/, 2018.
Andrew Gelman and Donald B Rubin. Inference from iterative simulation using multiple sequences. Statistical Science, pages 457–472, 1992.
P Gloor, Pierre Dorsaz, and Hauke Fuehres. Analyzing success of startup entrepreneurs by measuring their social network distance to a business networking hub. In Proceedings of the 3rd International Conference on Collaborative Innovation Networks (COINs), Basel, pages 8–10, 2011.
Google Ventures. https://www.gv.com/, 2018.

Todd L Graves. Automatic step size selection in random walk Metropolis algorithms. arXiv preprint arXiv:1103.5986, 2011.
Carlos Guestrin, Andreas Krause, and Ajit Paul Singh. Near-optimal sensor placements in Gaussian processes. In Proceedings of the 22nd International Conference on Machine Learning, pages 265–272. ACM, 2005.
Jorge Guzman and Scott Stern. Nowcasting and placecasting entrepreneurial quality and performance. Technical report, National Bureau of Economic Research, 2015a.
Jorge Guzman and Scott Stern. Where is Silicon Valley? Science, 347(6222):606–609, 2015b.
Julianne Pepitone and Stacy Cowley. Facebook's first big investor, Peter Thiel, cashes out. CNNTech, August 2012.
Kirk Kardashian. Could algorithms help create a better venture capitalist?, August 2015. http://fortune.com/2015/08/05/venture-capital-hits-average/.
Richard M Karp. On the computational complexity of combinatorial problems. Networks, 5(1):45–68, 1975.
Matt Kaufman. CrunchBase, People+, and the EFF, 2013. https://about.crunchbase.com/2013/11/crunchbase-people-and-the-eff/.
David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 137–146. ACM, 2003.
Jessica Leber. An algorithm to pick startup winners, July 2012. https://www.technologyreview.com/s/428427/an-algorithm-to-pick-startup-winners/.
Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, and Natalie Glance. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 420–429. ACM, 2007.
LinkedIn. https://www.linkedin.com/, 2018.
Harry Markowitz. Portfolio selection. The Journal of Finance, 7(1):77–91, 1952.
M Marmer, Bjoern Lasse Herrmann, E Dogrultan, and R Berman. Startup genome report: A new framework for understanding why startups succeed. Berkeley University and Stanford University, Tech. Rep., 2011.
Rahul Mazumder, Trevor Hastie, and Robert Tibshirani. Spectral regularization algorithms for learning large incomplete matrices. Journal of Machine Learning Research, 11(Aug):2287–2322, 2010.
Andrew Metrick and Ayako Yasuda. Venture Capital and the Finance of Innovation. John Wiley & Sons, Inc., 2011.
Annalisa Molini, Peter Talkner, Gabriel G Katul, and Amilcare Porporato. First passage time statistics of Brownian motion with purely time dependent drift and diffusion. Physica A: Statistical Mechanics and its Applications, 390(11):1841–1852, 2011.
Ramana Nanda, Sampsa Samila, and Olav Sorenson. The persistent effect of initial success: Evidence from venture capital. 2017.
Stefan Nann, Jonas S Krauss, Michael Schober, Peter A Gloor, Kai Fischbach, and Hauke Führes. The power of alumni networks: Success of startup companies correlates with online social network structure of its founders. MIT Sloan Working Paper, 4766-10, 2010.

George L Nemhauser, Laurence A Wolsey, and Marshall L Fisher. An analysis of approximations for maximizing submodular set functions - I. Mathematical Programming, 14(1):265–294, 1978.
NextSeed. https://www.nextseed.com/, 2018.
NLTK 3.0 documentation. WordNet Interface. http://www.nltk.org/howto/wordnet.html, 2017.
Basil Peters. Venture Capital Funds - How the Math Works, 2014. http://www.angelblog.net/Venture_Capital_Funds_How_the_Math_Works.html.
Pitchbook. https://pitchbook.com/, 2018.
Michael E Porter and Scott Stern. Innovation: Location matters. MIT Sloan Management Review, 42(4):28, 2001.
William A Sahlman. The structure and governance of venture-capital organizations. Journal of Financial Economics, 27(2):473–521, 1990.
John Salvatier, Thomas V Wiecki, and Christopher Fonnesbeck. Probabilistic programming in Python using PyMC3. PeerJ Computer Science, 2:e55, 2016.
SocialCapital. https://www.socialcapital.com/, 2018.
D. J. Spiegelhalter, N. G. Best, B. P. Carlin, and A. van der Linde. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, 64:583–639, 2002.
Ashwin Swaminathan, Cherian V Mathew, and Darko Kirovski. Essential pages. In Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01, pages 173–182. IEEE Computer Society, 2009.
U.S. News and World Report National Universities Rankings. https://www.usnews.com/best-colleges/rankings/national-universities, 2017.
Zhibiao Wu and Martha Palmer. Verbs semantics and lexical selection. In Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics, pages 133–138. Association for Computational Linguistics, 1994.
Guang Xiang, Zeyu Zheng, Miaomiao Wen, Jason I Hong, Carolyn Penstein Rosé, and Chao Liu. A supervised approach to predict company acquisition with factual and topic features using profiles and news articles on TechCrunch. In ICWSM, 2012.
