Serendipity and strategy in rapid innovation T. M. A. Fink∗†, M. Reeves‡, R. Palma‡ and R. S. Farr† †London Institute for Mathematical Sciences, Mayfair, London W1K 2XF, UK ∗Centre National de la Recherche Scientifique, Paris, France ‡BCG Henderson Institute, The Boston Consulting Group, New York, USA

Abstract. Innovation is to organizations what evolution is to organ- process. isms: it is how organisations adapt to changes in the environment Serendipity. On the other hand, a serendipitous approach is and improve. Yet despite steady advances in our understanding of seen in firms like Apple, which is notoriously opposed to mak- evolution, what drives innovation remains elusive. On the one hand, ing innovation choices based on incremental consumer demands, organizations invest heavily in systematic strategies to accelerate in- novation. On the other, historical analysis and individual experience and Tesla, which has invested for years in their vision of long- suggest that serendipity plays a significant role in the discovery pro- distance electric cars [13]. In science, many of the most impor- cess. To unify these two perspectives, we analyzed the mathematics of tant discoveries have serendipitous origins, in contrast to their innovation as a search process for viable designs across a universe of published step-by-step write-ups, such as penicillin, heparin, component building blocks. We then tested our insights using histor- X-rays and nitrous oxide [9]. The role of vision and intuition ical data from language, gastronomy and technology. By measuring the number of makeable designs as we acquire more components, tend to be under-reported: a study of 33 major discoveries in we observed that the relative usefulness of different components is biochemistry “in which serendipity played a crucial role” con- not fixed, but cross each other over time. When these crossovers are cluded that “when it comes to ‘chance’ factors, few scientists unanticipated, they appear to be the result of serendipity. But when ‘tell it like it was’” [14, 15]. we can predict crossovers ahead of time, they offer an opportunity Key results. To unify these two perspectives and understand to strategically increase the growth of our product space. Thus we what drives innovation, in this Article we do four things. First, find that the serendipitous and strategic visions of innovation can be viewed as different manifestations of the same thing: the changing we study data from three sectors: language, gastronomy and importance of components over time. technology (Fig. 1). We measure how the number of makeable products (words, recipes and software products) grows as we Innovation is how governments, institutions and firms adapt to acquire new components (letters, ingredients and development changes in the environment and improve [1]. Organizations that tools). We observe that the relative usefulness of components innovate are more likely to prosper and stand the test of time; is not fixed, but cross each other in time. Second, to explain those that fail to do so fall behind their competitors and suc- these crossovers, we prove a conservation law for the innova- cumb to market and environmental change [2, 3]. But despite tion process over time (eq. (1)). The conserved quantity is a the importance of innovation, what drives innovation remains combination of the usefulness of components and the complex- elusive [1, 4]. There is a perennial tension between a strategic ity of products. We use it to forecast crossovers in the future approach, which views innovation as a rational process which based on information we already have about the products we can be measured and prescribed [5–8]; and a belief in serendip- can make. Third, we identify a spectrum of innovation strate- ity and the intuition of extraordinary individuals [9–11]. gies dependent on how far into the future we forecast: from Strategy. The strategic approach is seen in firms like P&G and short term gain to long term growth. A short-sighted strategy Unilever, which use process manuals and consumer research to maximizes what a new component can do for us now, whereas maintain a reliable innovation factory [12], and Zara, which sys- a far-sighted strategy maximizes what it could do for us later. tematically scales new products up and down based on real-time We apply both strategies to our three sectors and find that they sales data. In scientific discovery, “traditional scientific training differ from each other to the extent that each sector contains and thinking favor logic and predictability over chance” [9]. If crossovers (Fig. 4). Fourth, we resolve the tension between the discoveries are actually made in the way that scientific publi- strategic and serendipitous interpretations of innovation. Both cations suggest, the path to invention is a step-wise, rational

4 R 10 cayenne Rails cocoa 104 jQuery UI Language F Gastronomy 100 Technology 1000 lime 50 X 1000 Sauce Labs

100 10 100 5

10 10 1 0.5 Makeable words ( usefulness ) Makeable recipes ( usefulness ) 1 1 Makeable software ( usefulness ) 0 2 4 6 8 10 12 14 16 18 20 22 24 26 0 127 254 381 0 331 662 993 Acquired letters Acquired ingredients Acquired development tools 1st R 1st cayenne 1st Rails 2nd F 2nd cocoa 2nd jQuery UI

Rank 3rd X Rank 3rd lime Rank 3rd Sauce Labs

FIG. 1: Products, components and usefulness. (Top) We studied products and components from three sectors. In language, the products are 79,258 English words and the components are the 26 letters. In gastronomy, the products are 56,498 recipes from the allrecipes.com, epicurious.com, and menupan.com [16] and the components are 381 ingredients. In technology, the products are 1158 software products catalogued by stackshare.io and the components are 993 development tools used to make them. (Bottom) The usefulness of a component is the number of products we can make that contain it. We find that the relative usefulness of a component depends on how many other components have already been acquired. For each sector, we show the usefulness of three typical components: averaged at each stage over all possible choices of the other acquired components and—for gastronomy—for a particular random order of component acquisition (points). 2 can be viewed as different manifestations of the changing im- that you believe will help you reach this goal. Let’s now sup- portance of components over time. When component crossovers pose each player approaches this differently. Your approach is are unexpected, they appear to be the result of serendipity. But to follow your gut, arbitrarily selecting bricks that look intrigu- when we can forecast crossovers in advance, they provide an ing. Alice uses what we call a short-sighted strategy, carefully opportunity to strategically increase the growth of our product picking Lego men and their firefighting hats to immediately space. make simple toys. Meanwhile, Bob chooses pieces such as axels, Lego game. Let’s illustrate these ideas using Lego bricks. wheels, and small base plates that he noticed are common in Think back to your childhood days. You’re in a room with two more complex models, even though he is not able to use them friends Bob and Alice, playing with a big box of Lego bricks— straightaway to produce new toys. We call this a far-sighted say, a fire station set. All three of you have the same goal: to strategy. build as many new toys as possible. As you continue to play, Who wins. At the end of the day, who will have innovated each of you searches through the box and chooses those bricks the most? That is, who will have built the most new toys? We find that, in the beginning, Alice will lead the way, surging ahead with her impatient strategy. But as the game progresses, E A fate will appear to shift. Bob’s early moves will begin to look I R serendipitous when he is able to assemble a complex fire truck N from his choice of initially useless axels and wheels. It will seem T 6 O that he was lucky, but we will soon see that he effectively cre- S L ated his own serendipity. What about you? Picking components C on a hunch, you will have built the fewest toys. Your friends had U D an information-enabled strategy, while you relied on chance. M 13 Language P Spectrum of strategies. What can we learn from this? If in- H G novation is a search process, then your component choices to- Y day matter greatly in terms of the options they will open up B F to you tomorrow. Do you pick components that quickly form V 20 K simple products and give you a return now, or do you choose W those components that give you a higher future option value? Z X By understanding innovation as a search for designs across a J Q universe of components, we made a surprising discovery. Infor- 26 mation about the unfolding process of innovation can be used egg wheat to form an advantageous innovation strategy. But there is no butter onion garlic one superior strategy. As we shall see, the optimal strategy de- milk vegetable_oil pends on time—how far along the innovation process we have cream tomato olive_oil advanced—and the sector—some sectors contain more oppor- 95 black_pepper pepper tunities for strategic advantage than others. vanilla cayenne vinegar cane_molasses bell_pepper cinnamon 1 Results parsley chicken Components and products. Just like the Lego toys are made up Gastronomy 190 lemon_juice beef cocoa of distinct kinds of bricks, we take products to be made up corn bread of distinct components. A component can be an object, like a scallion mustard ginger touch screen, but it can also be a skill, like using Python, or a basil celery 286 carrot routine, like customer registration. Only certain combinations potato chicken_broth of components form products, according to some predetermined yeast rice mushroom universal recipe book of products. Examples of products and cheese soy_sauce the components used to make them are shown in Fig. 1. Now cumin oregano 381 suppose that we possess a basket of distinct components, which

Google Analytics we can combine in different ways to make products. We have GitHub jQuery nginx more than enough copies of each component for our needs, so Bootstrap Slack we do not have to worry about running out. There are N possi- JavaScript New Relic Redis ble component types in total, but at any given stage n we only Google Apps 248 Amazon S3 have n of these N possible building blocks. At every stage, we Amazon EC2 Git AngularJS pick a new type of component to add to our basket. Node.js MySQL Usefulness. The usefulness of a component is the number of Amazon CloudFront Trello Rails products we can make that contain it [17]. In other words, the PostgreSQL Technology 496 Ruby MongoDB usefulness uα of some component α is how many more products Python MailChimp we can make with α in our basket than without α in our basket. Mixpanel Pingdom PHP As we gather more components, uα increases or stays the same; FIG. 2: Crossovers. The relative Docker Mandrill it cannot decrease. We write u ( ) to indicate this dependence Sublime Text α n usefulness of different components 745 Elasticsearch Heroku on n: uα(n) is the usefulness of α given possession of α and changes as the number of components Stripe Sass we possess increases. For example, if you Google Drive n − 1 other components, the combined set of components being SendGrid npm . Averaging over all choices of the n−1 other components from are only allowed six letters, the ones that show Jenkins n Bower up in the most words are a, e, i, o, s, r. For gastro- Grunt the N − 1 that are possible gives the mean usefulness, uα(n). nomy and technology, for clarity we only show the 993 Usefulness experiment. To measure the usefulness of different 40 components most useful when we have all N components. A pure components as the innovation process unfolds and we acquire short-sighted strategy acquires components in the order that they more components, we did the following experiments. Using data intersect the diagonal; whereas a pure far-sighted strategy acquires from each of our three sectors, we put a given component α into them in the order that they intersect a vertical. If there are no an empty basket, and then added, one component at a time, the crossovers, the strategies are the same. 3 remaining N − 1 other components, measuring the usefulness some in gastronomy and many in technology. This means, for of α at every step. We averaged uα(n) over all possible orders example, that the most useful letters for making words in Scrab- in which to add the N − 1 components to obtain uα(n). (We ble (a basket of seven letters) are nearly the same as the most explain how in Methods 3.b.) We repeated this process for all of useful letters for making words with a full basket (26 letters); the components α. Typical results from these experiments are the key ingredients in a small kitchen (20 ingredients) are mod- shown in Fig. 1. We find that the mean usefulnesses of different erately different from those in a big one (80 ingredients); the components cross each other as the number of components in most-used development skills for a young software firm (ex- our basket increases. As Fig. 1 shows for gastronomy, this is perience with 40 tools) are significantly different from those true for both the average over all possible orderings of compo- for an advanced one (160 tools). We call components that do nents (lines) as well as a specific random ordering (points). not cross in time isochronic, like the letters; and those that do Bumps charts. To visualise the relative usefulness of compo- anisochronic, like the tools. nents over time, for each sector we created its “bumps chart” Why crossovers happen. To understand why crossovers hap- (Fig. 2). These show the rank order of mean usefulness at every pen, let’s have a closer look at how the mean usefulness in- stage of the innovation process. We see that the crossovers in creases for a single component (Fig. 3). To make a product of Fig. 1 are commonplace, but that some sectors contain more complexity s, we must possess all s of its distinct components. crossovers than others. There are few crossings in language, So making a complex product is harder than making a simple one, because there are more ways that we might be missing a necessary component. We therefore group together the prod- SMALL KITCHEN BIG KITCHEN ucts we can make containing α according to their complexity. 127 ingredients: almond to fenugreek 381 ingredients: almond to zucchini That is, the usefulness u ( , s) of component α is how many Recipe α n complexity more products of complexity s we can make with α in our bas- 21 ket than without α in our basket. Summing uα(n, s) over s gives 597 recipes in total 56,498 recipes in total 17 uα(n). The advantage of this refined grouping is that, by un- derstanding the behaviour of uα(n, s), we can understand the 13 more difficult uα(n). Our key result, which we prove in Meth- s−1 9 ods 3.b, is that uα(n, s)/n is constant over all stages of the innovation process. In other words, for two stages n and n0, 5 0 0 s−1 A 1 B uα(n , s) ' uα(n, s)(n /n) . (1) 600 Recipes 0 0 Recipes 6000 This tells us that the number of products containing α of com- 21 plexity s grows much faster for higher complexities than for 89 recipes contain cocoa 4801 recipes contain cocoa lower complexities. Early on, uα(n, s) will tend to be small for 17 higher complexities, but depending on how far ahead we look, 13 the bigger growth rate can more than compensate for this, as we see in Fig. 3. Summing eq. (1) over size s, we find 9 0 2 5 uα(n ) ' uα(n, 1) + uα(n, 2) x + uα(n, 3) x + ..., (2)

0 C 1 D where x = n /n. The growth of the mean usefulness of α 100 Recipes with cocoa 0 0 Recipes with cocoa 1000 strongly depends on the complexity of products containing α. Valence. So far we have only characterised a component by Cocoa is more useful than cayenne Cayenne is more useful than cocoa its usefulness: the number of products we can make that contain 21 it. Now we introduce another way of describing a component: 43 recipes contain cayenne 7950 recipes the average complexity of the products it appears in. We call 17 contain cayenne this the valence. The valence vα of component α is the aver- 13 age complexity of the products it appears in at stage N, when we have all N components. Think of the valence as the typi- 9 cal number of co-stars a component performs with, plus one. 5 We show the usefulness and valence for each of the components in our three sectors in Fig. 4ABC. More valent components are E 1 F unlikely to be useful until we possess a lot of other components, 100 Recipes with cayenne 0 0 Recipes with cayenne 1000 so that we have a good chance of hitting upon the ones they FIG. 3: Why crossovers happen. On the right is a big kitchen with need. These are the wheels and axels in our Lego set. On the 381 ingredients. On the left is a small kitchen with one-third as many other hand, less valent components are likely to boost our prod- ingredients. In the big kitchen (B), we can make a total of 56,498 uct space early on, when we have acquired fewer components. recipes. Each bar counts recipes with the same number of ingredients These are the Lego men and their firefighting hats. This insight (complexity). When we move to the smaller kitchen (A), the number of makable recipes shrinks dramatically to 597, or 1.0%. But this suggests that more valent components will tend to rise in rela- reduction is far from uniform across different bars. Higher bars shrink tive usefulness, and less valent components fall. This is verified more, on average by an extra factor of 3 with each bar. Thus the in our experiments: components on the right of the plots in number of recipes of complexity one (first bar) shrinks about 3-fold; Fig. 4ABC tend to rise in the bumps charts in Fig. 2, such as the number of complexity two (second bar) 9-fold, and so on. Of onion, tomato, Javascript and Git; whereas components on the all the recipes in the big kitchen, 4801 contain cocoa (D) and 7950 left tend to fall, like cocoa, vanilla, Google Apps and SendGrid. contain cayenne (F). The cayenne recipes tend to be more complex, containing on average 10.6 ingredients, whereas the cocoa recipes are 2 Discussion simpler, averaging 7.2 ingredients. Because higher bars suffer stronger Interpreting crossovers. A crossover in the usefulness of com- reduction, overall fewer cayenne recipes (0.5%) survive in the smaller kitchen (E) than cocoa recipes (1.8%) (C). Thus cayenne is more ponents means that the things that matter most today are useful in the big kitchen, but cocoa is more useful in the small kitchen. not the same as the things that will matter most tomorrow. 4

How we interpret crossovers in practice depends on whether has a two-fold advantage at first, but later far-sighted wins by they are unanticipated, and take us by surprise, or anticipated, a factor of two. For technology, short-sighted surges ahead by and can be planned for and exploited. When they are unantic- an order of magnitude, but later far-sighted is dominant. ipated, beneficial crossovers can seem to be serendipitous. But Serendipity. Writing about the The Three Princes of when they can be anticipated, crossovers provide an opportu- Serendip, Horace Walpole records that the princes “were al- nity to strategically increase the growth of our product space. ways making discoveries, by accidents and sagacity, of things To harness this opportunity, we turn to forecasting component they were not in quest of”. Serendipity is the fortunate develop- crossovers using the complexity of products containing them. ment of events, and many organizations and researchers stress Short-sighted strategy. To maximise the size of our product its importance [9, 10]. Crossovers in component usefulness help space when crossovers are unanticipated, the optimal approach us see why. Components which depend on the presence of many is to acquire, at each stage, the component that is most useful others can be of little benefit early on. But as the innovation from the ones that are remaining. Think of this as a “greedy” process unfolds and the acquired components pay off, the re- approach. It has a geometric interpretation: it is equivalent to sults will seem serendipitous, because a number of previously acquiring the components that intersect the diagonals in Fig. low-value components become invaluable. Thus, what appears 2. At every stage we lock in to a specific component, unaware as serendipity is not happenstance but the delayed fruition of of the future implications of the choices we make. A component components reliant on the presence of others. After the acqui- poorly picked is an opportunity lost. sition of enough other components, these components flourish. Far-sighted strategy. Using only information about the prod- For example, the initially useless axels and wheels were later ucts we can already make with our existing components, how- found to be invaluable to building many new toys. In a similar ever, we can forecast the usefulness of our components into way, the low value attributed to Flemming’s initial identifica- the future. Eq. (2) shows us how, and we give an example in tion of lysosome was later revised to high value in the years Methods 3.c. Here the optimal approach is to acquire the com- leading to the discovery of penicillin, when other needed com- ponent that will be most useful at some later stage n0. This also ponents emerged, such as sulfa drugs which showed that safe has a geometric interpretation: it is equivalent to acquiring the antiseptics are possible [9]. Interestingly, the word “serendip- components that intersect a vertical at n0 in Fig. 2, and thus ity” does not have an antonym. But as our bumps charts show, depends on how far into the future we forecast. for every beneficial shift in a crossover, there is a detrimental Strategy comparison. A short-sighted strategy considers only one. Each opportunity for serendipity goes hand-in-hand with a the usefulness uα, whereas a far-sighted strategy considers both chance for anti-serendipity: the acquisition of components use- the usefulness uα and the valence vα. Short-sighted maximises ful now but less useful later. Avoiding these over-valued compo- what a potential new component can do for us now, whereas far- nents is as important as acquiring under-valued ones to securing sighted maximises what it could do for us later. Depending on a large future product space. our desire for short-term gain versus long-term growth, we have Strategy. Our research shows that the most important a spectrum of strategies dependent on n0. A pure short-sighted components—materials, skills and routines—when an organiza- strategy (n0 = n) and a pure far-sighted strategy (n0 = N) tion is less developed tend to be different from when it is more are compared in Fig. 4DEF. Like the Lego approaches of Bob developed. Instead, the relative usefulness of components can and Alice, both strategies beat acquiring components in a ran- change over time, in a statistically repeatable way. Recognising dom order. As our theory predicts, the extent to which the how an organization’s priorities depend on its maturity enable two strategies differ from each other increases with the number it to balance short-term gain with long-term growth. For ex- of crossovers. For language, they are nearly identical, because ample, our insights provide a framework for understanding the there are hardly any crossovers. For gastronomy, short-sighted poverty trap. When a less-developed country imitates a more-

E wheategg Google Analytics A A I B C R N T 20 000 butter onion 700 L O C 600 S garlic jQuery GitHub 4 D M milk 2×10 U 500 nginx G Y vegetable_oil P cream B H black_peppertomato Bootstrap New RelicSlack JavaScript vanilla 1×10 4 olive_oil 400 Amazon EC2 F cane_molasses pepper Redis Git V 8000 vinegar Google AppsAngularJS Amazon S3 K cayenne Node.js W bell_pepper 300 Amazon CloudFrontMySQL 5000 cinnamon parsley Rails chicken TrelloRuby cocoa lemon_juice beef PostgreSQL MongoDB Mixpanel PHP MailChimp Z scallion Docker X corn mustard bread basil PingdomPython 4000 carrotcelery Sass npm 2000 yeastpotato rice 200 SendGrid MandrillElasticsearchSublime Text cheesemushroomginger oreganocumin StripeHeroku Jenkins

Usefulness: no. of words a letter is in chicken_broth J Q Bower soy_sauce Grunt

1000 Usefulness: no. of recipes an ingredient is in Usefulness: no. of software products a tool is in 6.6 6.8 7.0 7.2 7.4 7.6 7.8 7 8 9 10 11 12 26 28 30 32 34 36 38 Valence: average complexity of words a letter is in Valence: average complexity of recipes an ingredient is in Valence: average complexity of software a tool is in D E 1000 F

500 104 104

100 1000 1000 50

Far-sighted strategy 100 Impatient strategy 10

Total makeable words 100 Total makeable recipes

Pseudo-random(alphabetical) Total makeable software 5 10 Language Gastronomy Technology 10 1 0 5 10 15 20 25 0 50 100 150 200 250 300 350 0 200 400 600 800 Acquired letters Acquired ingredients Acquired development tools

FIG. 4: (ABC) Scatter plots of component usefulness versus component valence for our three sectors. For gastronomy and technology, we only show the top 40 components; the complete set is in Fig. 5. (DEF) Both the short-sighted and far-sighted strategies beat a typical random component ordering (here alphabetical), but they diverge from each other only insofar that there are crossings in the bumps charts. 5 developed country by acquiring similar production capabilities The same must be true when we replace n by n0, and therefore [6], it is unable to quickly reap the rewards of its investment, n 0 0 n0 because it does not have in place enough other needed capabil- uα(n, s) n/ s = uα(n , s) n / s . (3) ities. This in turn prevents it from further investment in those needed components. Our analysis gives quantitative backing to When the number of components is big compared to the prod- 0 n n0 s the “lean start-up” approach to building companies and launch- uct size (n, n  s), we can approximate s and s by n ing products [18]. Start-ups are wise to employ a short-sighted and n0s, and thus strategy and release a minimum viable product. Without the s−1 0 0s−1 resources to sustain a far-sighted approach, they need to quickly uα(n, s)/n ' uα(n , s)/n . bring a simple product to market. On the other hand, firms that For simplicity, we use this approximation in the Results section, can weather an initial drought will see their sacrifice more than but we could just as well have used the exact expression in eq. paid off when their far-sighted approach kicks in. By tracking (3). how potential new components combine with existing ones, or- 3.c Forecasting crossovers in usefulness ganizations can develop an information-advantaged strategy to Here we show how we can forecast the usefulness of components adopt the right components at the right time. In this way they at stage n0 from information we have at some earlier stage n, can create their own serendipity, rather than relying on intu- where n is the number of components we have acquired. As in ition and chance. Fig. 3, we have a set k of 127 ingredients in a small kitchen— 3 Methods almond to fenugreek—and a set K of 381 ingredients in a big kitchen—almond to zucchini. 3.a Data In the small kitchen, we can make a total of 597 recipes. Our three data sets—described in Fig. 1—were obtained as fol- Of these 597 recipes, 43 contain cayenne, but they are not all lows. In language, our list of 79,258 common English words is equally complex. Two of the 43 recipes contain one ingredient from the built-in WordList library in Mathematica 10. Of the (namely, cayenne itself) and have complexity one; one recipe 84,923 KnownWords, we only considered those made from the contains two ingredients and has complexity two; 18 contain 26 letters a–z, ignoring case: we excluded words containing a three ingredients and have complexity three; and so on. Simi- hyphen, space, etc. In gastronomy, the 56,498 recipes can be larly, 89 of the 597 recipes contain cocoa: six have complexity found in the supplementary material in [16]. In technology, the one; 22 have complexity two; and so on. Using eq. (2), we can 1158 software products and the development tools used to make write the mean usefulness of these two components as them can be found at the site stackshare.io. 3.b Proof of components invariant 0 2 3 4 5 7 uca(n |k ) ' 2 + x + 18x + 12x + 8x + x + x and Let α be some component. Let N1 be the set of N − 1 other 0 2 3 4 uco(n |k ) ' 6 + 22x + 37x + 16x + 8x , possible components not including α, n1 be a subset of n − 1 components chosen from , and be a subset of s − 1 com- N1 s1 where x = n0/127. As expected, ponents chosen from n1. The usefulness uα(n, s) is how many more products of complexity s that we can make from the com- 0 uca(n |k ) = 43 and ponents together with α, than from the components alone: x=1 n1 n1 0 uco(n |k ) = 89. X x=1 uα(n, s) = prod(α ∩ s1) − prod(s1), In the big kitchen, we can make a total of 56,498 recipes. ⊆ s1 n1 Of these, 7950 contain cayenne and 4801 contain cocoa. Again using eq. (2), where prod(α ∩ s1) takes the value 0 if the combination of com- ponents α ∩ s1 forms no products of complexity s and 1 if 0 2 28 30 uca(n |K ) ' 2 + 19x + 64x + ... + 2x + 2x and α ∩ s1 forms one product of complexity s. (Occasionally, the 0 2 20 21 same combination of components α ∩ s1 forms multiple prod- uco(n |K ) ' 6 + 54x + 195x + ... + 2x + 3x . ucts: for example, beef, butter and onion together form two dis- where x = n0/381. As expected, tinct recipes of length three. In such cases, prod(α ∩ s1) takes the value 2 if α ∩ s1 forms two products, and so on.) The ex- 0 uca(n |K ) = 7950 and pected usefulness of component α, uα(n, s), is the average of x=1 N−1 u (n0| ) = 4801. uα(n, s) over all subsets n1 ⊆ N1; there are n−1 such subsets. co K x=1 Therefore So far, none of this is surprising. The punchline is that we can N−1 X uα(n, s) = 1 n−1 uα(n, s) estimate the usefulness of components in the big kitchen from n1⊆N1 what we know about our small kitchen. To do so, we simply N−1 X X evaluate the small-kitchen polynomials at the big-kitchen stage: = 1 n−1 prod(α ∩ s1) − prod(s1). 0 0 n1⊆N1 s1⊆n1 u (n | ) ' u (n | ) ' 3569 and ca K n0=381 ca k x=3 0 u (n0| ) ' u (n0| ) ' 1485. Consider some particular combination of components s1. The co K n0=381 co k x=3 0 double sum above will count s1 once if s = n, but multiple times 0 In log terms—log usefulness being the natural unit of measure— if s < n, because s1 will belong to multiple sets n1. How many? these are accurate to within 11% and 9% of the true values. In In any set n1 that contains s1, there are n − s free elements to choose, from N − s other components. Therefore the double particular, this predicts the crossover of cayenne and cocoa in N−s Figure 3. sum will count every combination s1 a total of n−s times, and

N−sN−1 X uα(n, s) = n−s n−1 prod(α ∩ s1) − prod(s1) [1] D. Erwin, D. Krakauer, ‘Insights into innovation’, Science 304, 1117 s ⊆N 1 1 (2004). nN [2] M. Reeves, K. Haanaes, J. Sinha, Your Strategy Needs a Strategy = N/n s s uα(N , s). (Harvard Business Review Press, 2015). [3] C. Weiss et al., ‘Adoption of a high-impact innovation in a homoge- 6

neous population’, Phys Rev X, 4, 041008 (2014). [13] K. Bullis, ‘How Tesla is driving electric car innovation’, MIT Tech [4] J. McNerney et al., ‘Role of design complexity in technology im- Rev, 8 (2013). provement’, Proc Natl Acad Sci, 108, 9008 (2011). [14] J. Comroe, ‘Roast pig and scientific discovery: Part II’, Am Rev [5] R. Van Noorden, ‘Physicists make ‘weather forecasts’ for economies’, Respir Dis, 115, 853 (1977). Nature, 1038, 16963 (2015):. [15] F. Tria et al., ‘The dynamics of correlated novelties’, Sci Rep 4, [6] A. Tacchella et al., ‘A new metric for countries’ fitness and products’ 5890 (2014). complexity’, Sci Rep, 2, 723 (2012). [16] Y.-Y. Ahn, S. E. Ahnert, J. P. Bagrow, A.-L. Barabsi, ‘Flavor net- [7] P. Drucker, ‘The discipline of innovation’, Harvard Bus Rev 8, 1 work and the principles of food pairing’, Sci Rep 1, 196 (2011). (2002). [17] We make no assumptions about the values of different products, [8] V. Sood et al., ‘Interacting branching process as a simple model of which will depend on the market environment and may change with innovation’, Phys Rev Lett, 105, 178701 (2010). time. But we can be sure that maximising the number of products is [9] M. Rosenman, ‘Serendipity and scientific discovery’, Res Urban a proxy for maximising any reasonable property of them. A similar Economics, 13, 187 (2001). proxy is used in evolutionary models, where evolvability is defined as [10] F. Johansson, ‘When success is born out of serendipity’, Harvard the number of new phenotypes in the adjacent possible (1-mutation Bus Rev 18, 22 (2012). boundary) of a given phenotype; see A. Wagner, ‘Robustness and [11] W. Isaacson, The Innovators: How a Group of Hackers, Geniuses, evolvability: a paradox resolved’, Proc Roy Soc B 91, 275 (2008). and Geeks Created the Digital Revolution, (2014). [18] E. Ries, The Lean Startup, (Portfolio Penguin, 2011). [12] B. Brown, S. Anthony, ‘How P&G tripled its innovation success rate’, Harvard Bus Rev 6 (2011). 7

wheat egg onion butter garlic

milk vegetable_oil cream olive_oil black_peppertomato 104 cane_molasses cayennevinegar vanilla pepper bell_pepper cinnamon chicken cocoa corn beef parsley lemon_juice bread scallionmustard ginger potato basil carrotcelery yeast cheese rice mushroom chicken_broth lemonlard cheddar_cheese macaroni oregano cumin cream_cheesewalnut nutmeg apple honey parmesansoy__saucecheese thyme bacon starch pork cilantrogreen_bell_pepper pecan almond olivefish cucumberwhite_winerosemary raisin vegetable tamarind shrimp coriander gelatin pineappleorange_juiceorangebuttermilk coconutlime_juice bean seed sesame_oil redpork_wine_sausage nut ham mozzarella_cheese bay turmeric dill cider chivelettuce strawberry cherry yogurt soybean pea banana oatlime milk_fatmint 1000 wine shallot fennel fenugreek grape_juice pumpkin zucchini peanut_butter broccoli meat tabasco_pepper celery_oil cranberrywhole_grain_wheat_flourlemon_peel cabbage turkey sage beef_broth raspberry coffee apricot roasted_sesamefeta_cheese_seed sake sesameorange_seed_peel avocadosherry crab sweet_potatoswiss_cheese radish shiitake rum maple_syrup pear clam peanutasparagussalmon tuna squash smoke peach blue_cheese lamb black_bean marjoram fruit date artichoke tarragon blueberry horseradish kidney_bean cardamom leek chickpeaoyster hazelnut whitebrandybreadmango brown_rice cottage_cheese_ romanosmoked_cheese_sausagecauliflower cured_pork grape lentil mandarin egg_noodlebeer currant barley caraway scalloppimento saffron corn_flake pistachio anise plum beet goat_cheese cashew peanut_oil squid berry turnip rhubarb roasted_beef cereal chinese_cabbage seaweed cod provolone_cheeseveal peppermintmelon tequila sauerkraut corn_gritfig kelp tomato_juice mussel lemongrass whiskeyblackberry citrus roasted_pork roasted_peanut bourbon_whiskey chicory lima_bean watermelon brassica endive parsnip rye_flour kale watercress lobster savory grapefruit mace okra porcini 100 enokidake tea champagnemacadamia_wine _nut kiwi wasabi lime_peel_oilstar_anise thai_pepper popcorn brussels_sproutsmoked_salmon rye_bread root gin anise seed yam bitter_orange wheat_ _bread buckwheat lavender potato_chip catfish nira red_kidney_beanmatsutake oatmeal truffle rose sour_nectarinecherry port_wine katsuobushi cognac papayamackerel chicken_liver sour milk black_tea tangerine _ chervil bone_oil red_bean cacao gruyere_cheese octopus juniper_berry flower liver kumquatquince frankfurter malt woodapple_palmbrandy caviar prawn munstercherry_cheese_brandy rutabaga green_tea haddock chayote black_mustard_seed_oil black_sesame_seedpassion_fruit shellfish Number of recipes an ingredient appears in ( usefulness ) sassafras cassava bartlett_cheese_pearroquefort mung_bean cabernet_sauvignon_wine eellicorice prickly_pear roasted_meat orange_flower japanese_plum coconut_oil salmon_roe camembert_cheese sumac artemisia guava pear_brandy mandarin_peel concord_grape black_currant 10 baked_potato armagnac condiment huckleberrybeef_liver clove herring litchi lingonberry carob ouzo gardenialeaf black_raspberry grape_brandy jasmine sunflower_oil bergamot elderberry smoked_fish violet kohlrabi blackberry_brandy sea_algae pork_liver spearmint hop citrus_peel carnationrapeseed chamomile balm roasted_almond sheep_cheese raw_beef holy_basil strawberry_juice red_algae soybean_oil pimenta

4 6 8 10 12 14 Average complexity of recipes an ingredient appears in(valence)

egg wheat butter onion garlic milk vegetable_oil cream tomato olive_oil black_pepper pepper vanilla cayenne vinegar cane_molasses bell_pepper cinnamon parsley chicken lemon_juice beef cocoa corn bread scallion 95 mustard ginger basil celery carrot potato chicken_broth yeast rice mushroom cheese soy_sauce cumin oregano parmesan_cheese macaroni lard lemon thyme cheddar_cheese cream_cheese walnut starch green_bell_pepper nutmeg 190 honey apple almond cilantro pecan white_wine bacon pork raisin bean rosemary fish cucumber olive coconut orange orange_juice tamarind vegetable buttermilk pineapple shrimp coriander bay lime_juice 286 gelatin red_wine pork_sausage sesame_oil chive seed ham mozzarella_cheese oat turmeric nut shallot lettuce cider dill pea zucchini cherry lime strawberry yogurt soybean peanut_butter celery_oil

FIG. 5: (Top) The valence-usefulness scatter plot for all ingredients that are used in two or more recipes (365 of the 381 ingredients). (Bottom) The relative usefulness of different ingredients as the number of ingredients we possess increases, for the 100 ingredients most useful when we have all 381 ingredients. 8

Google Analytics

jQuery GitHub

500 nginx Bootstrap NewSlack Relic JavaScript Redis Amazon S3 Google Apps AngularJS MySQL Amazon EC2 Node.js Git Amazon CloudFront Rails Trello PostgreSQLRuby Python MongoDB MixpanelPHP Pingdom Docker MailChimpElasticsearchSublime Text Sass Google Drive Mandrill npm JenkinsBower SendGridStripeHeroku Grunt Zendesk Java VagrantDropbox HTML5 ApacheCloudFlare HTTP ServerBitbucketWordPress gulp Amazon Route 53 Backbone.js Optimizely ReactIntercom Amazon RDS Sentry JIRA Objective-C Vim Memcached Android SDK DigitalOceanjQuery UI CoffeeScriptLess HipChat Chef Django Travis CIGo VirtualBox 100 Mailgun Asana SegmentPagerDutyInVision Skype Twilio Xcode CircleCI SidekiqPayPal Underscore RabbitMQ Capistrano ExpressJS Ansible ConfluenceBrowserStackSeleniumMarkdown RequireJS HAProxy Atom AmazonCodeshipPapertrail SES TestFlight Android Studio VarnishStatusPage.ioD3.jsCrashlyticsUnicorn KISSmetrics Scala Handlebars.js Swift GitHub Pages Code Climate Amazon ElastiCacheAmazonEBS Olark Flask Logentries Ember.jsPuppet Labs Amazon CloudWatch AdRoll .NET Datadog Balsamiq Amazon SQS Postman GitLab Disqus SourceTreeIntelliJIDEA 50 Pivotal TrackerCassandra Browserify Mocha Laravel Google Maps Rollbar Socket.IO ParseMongoLab Rackspace Cloud ServersUserVoice AmazonPusher DynamoDBAmazon VPCPhpStorm Braintree Fastly Keen IO Zapier Solr Compass boot2docker Chartbeat NagiosHadoopBugsnagC#Basecamp Sinatra Salesforce Sales Cloud CeleryFoundation Apache Maven Yeoman Algolia Heap Airbrake Ionic Jasmine Amazon EC2 Container Service Microsoft SQL ServerPassenger Visual Studio Help Scout Google App Engine Compose SymfonyHAML Spring Loggly AWS Elastic Beanstalk DNSimpleCustomer.io Flux Firebase SQLite MaxCDNMarketo Amazon Redshift FabricLinode Buffer Jade Desk.com Meteor Karma Kafka Windows Azure Librato GraphiteCloudinary Sauce Labs Eclipse StatsD DevisePhoneGap WebStorm Gradle Crazy Egg Apache Tomcat Google Compute EngineZookeeperHubSpot R Play TeamCity SoftLayerAkamai RubyMine AWS CloudFormationTumblrMariaDB Number of software products a tool appears in ( usefulness ) iDoneThis Microsoft IISMemCachier Piwik OpenStack Jekyll Emacs Qualaroo Swiftype Filepicker Clojure Amazon SNSAmazonEMR WebpackDrupal Material Design for Angular InfluxDBDyn SaltFlowdock Recurly TrackJS HighchartsAWS ElasticLogstashKibana Load Balancing(ELB) Stylus Campaign Monitor Hubot MustacheCoveralls Gunicorn SVN(Subversion) Apache Mesos MongooseMEAN PerfectEmbedly Audience ConsulRaygunNeo4j PackerClickTaleAWS IAM WistiaPerl C++HockeyApp OVH AWS OpsWorks UserTestingStorm HBase ErlangHoneybadgerDjango REST framework Amazon RDSStack for PostgreSQL Overflow IronMQ TornadoScout PostmarkApacheGrafana Spark Zopimwercker Flurry CodeIgniter AzureHeroku Websites Postgres Azure Storage HackPad GoSquaredHarvest YiiTerraform RedmineOpenShift AmazonCloud9 Kinesis IDE Framer LookerCouchDBApiary DeployBotHHVM(MiddlemanHipHopPumaGroovy Virtual Clicky MachineNotepadShopifySemanticUI) ++ Ghost Phabricator Geckoboard 10 Marionette ZenPayroll Amplitude HoganApp.jsAnnie Discourse Aviary PubNubimgix C3.js NetBeans IDE Redis Cloud UXPin OneLoginSumoLitmus Logic Jetty RunscopeOracle SemaphoreSails.js Nexmo waffle.io EdgeCast Squarespace BeanstalkdZeroMQ Transifex TowerBoxPostGISLeaflet HelloSignGearman sendwithusResquePyCharm AWS Lambda jQueryUnbounce MobileCakePHP Zencoder Join.me BeanstalkUrban Airship YammerXamarin

20 30 40 50 60 70 Average complexity of software product a tool appears in(valence)

Google Analytics GitHub jQuery nginx Bootstrap Slack JavaScript New Relic Redis Google Apps Amazon S3 Amazon EC2 Git AngularJS Node.js MySQL Amazon CloudFront Trello Rails PostgreSQL Ruby MongoDB Python MailChimp Mixpanel Pingdom 248 PHP Docker Mandrill Sublime Text Elasticsearch Heroku Stripe Sass Google Drive SendGrid npm Jenkins Bower Grunt Zendesk Dropbox Vagrant HTML5 Java Amazon Route 53 Bitbucket Apache HTTP Server gulp WordPress Amazon RDS 496 CloudFlare JIRA Backbone.js Objective-C Sentry Intercom React Optimizely Memcached jQuery UI DigitalOcean Less Vim Android SDK HipChat CoffeeScript Mailgun Chef Django VirtualBox Go Travis CI InVision Skype Twilio 745 Asana PagerDuty Segment Xcode RabbitMQ Underscore CircleCI ExpressJS Ansible HAProxy RequireJS PayPal Sidekiq Capistrano Atom Markdown Selenium BrowserStack Confluence Amazon SES Varnish Codeship D3.js TestFlight

FIG. 6: (Top) The valence-usefulness scatter plot for the 365 technology tools most useful in making software products. (Bottom) The relative usefulness of different tools as the number of tools we possess increases, for the 100 tools most useful when we have all 993 tools.