Serendipity and strategy in rapid innovation

T. M. A. Fink∗†, M. Reeves‡, R. Palma‡ and R. S. Farr†

†London Institute for Mathematical Sciences, Mayfair, London W1K 2XF, UK
∗Centre National de la Recherche Scientifique, Paris, France
‡BCG Henderson Institute, The Boston Consulting Group, New York, USA

Innovation is to organizations what evolution is to organisms: it is how organizations adapt to changes in the environment and improve [1]. Governments, institutions and firms that innovate are more likely to prosper and stand the test of time; those that fail to do so fall behind their competitors and succumb to market and environmental change [2, 3]. Yet despite steady advances in our understanding of evolution, what drives innovation remains elusive [1, 4]. On the one hand, organizations invest heavily in systematic strategies to drive innovation [5–8]. On the other, historical analysis and individual experience suggest that serendipity plays a significant role in the discovery process [9–11]. To unify these two perspectives, we analyzed the mathematics of innovation as a search process for viable designs across a universe of building blocks. We then tested our insights using historical data from language, gastronomy and technology. By measuring the number of makeable designs as we acquire more components, we observed that the relative usefulness of different components is not fixed: the usefulness curves of different components cross each other over time. When these crossovers are unanticipated, they appear to be the result of serendipity. But when we can predict crossovers ahead of time, they offer an opportunity to strategically increase the growth of our product space. Thus we find that the serendipitous and strategic visions of innovation can be viewed as different manifestations of the same thing: the changing importance of component building blocks over time.

Lego game. Let’s illustrate the idea using Lego bricks. Think back to your childhood days. You’re in a room with two friends, Bob and Alice, playing with a big box of Lego bricks (say, a fire station set). All three of you have the same goal: to build as many new toys as possible. As you continue to play, each of you searches through the box and chooses those bricks that you believe will help you reach this goal. Let’s now suppose each player approaches this differently. Your approach is to follow your gut, arbitrarily selecting bricks that look intriguing. Alice uses what we call a short-sighted strategy, carefully picking Lego men and their firefighting hats to immediately make simple toys. Meanwhile, Bob chooses pieces such as axles, wheels and small base plates that he noticed are common in more complex models, even though he is not able to use them straightaway to produce new toys. We call this a far-sighted strategy.

Who wins. At the end of the day, who will have innovated the most? That is, who will have built the most new toys? We find that, in the beginning, Alice will lead the way, surging ahead with her impatient strategy. But as the game progresses, fate will appear to shift. Bob’s early moves will begin to look serendipitous when he is able to assemble a complex fire truck from his choice of initially useless axles and wheels. It will seem that he was lucky, but we will soon see that he effectively created his own serendipity. What about you? Picking components on a hunch, you will have built the fewest toys. Your friends had an information-enabled strategy, while you relied on chance.

Spectrum of strategies. What can we learn from this? If innovation is a search process, then your component choices today matter greatly in terms of the options they will open up to you tomorrow. Do you pick components that quickly form simple products and give you a return now, or do you choose those components that give you a higher future option value? By understanding innovation as a search for designs across a universe of components, we made a surprising discovery. Information about the unfolding process of innovation can be used to form an advantageous innovation strategy. But there is no one superior strategy. As we shall see, the optimal strategy depends on time (how far along the innovation process we have advanced) and on the sector (some sectors contain more opportunities for strategic advantage than others).

Components and products. Just like the Lego toys are made up of distinct kinds of bricks, we take products to be made up of distinct components. A component can be an object, like a touch screen, but it can also be a skill, like using Python, or a routine, like customer registration. Only certain combinations of components form products, according to some predetermined universal recipe book of products. Examples of products and

arXiv:1608.01900v4 [physics.soc-ph] 17 Mar 2017

[Figure 1. Top: example products and components in language (R, F, X), gastronomy (cayenne, cocoa, lime) and technology (Rails, jQuery UI, Sauce Labs). Bottom: makeable words, recipes and software containing a component (usefulness) versus the number of acquired letters (0 to 26), ingredients (0 to 381) and development tools (0 to 993), with rank labels (1st, 2nd, 3rd) for the three example components in each sector.]

FIG. 1: Products, components and usefulness. (Top) We studied products and components from three sectors. In language, the products are 79,258 English words and the components are the 26 letters. In gastronomy, the products are 56,498 recipes from allrecipes.com, epicurious.com and menupan.com [12], and the components are 381 ingredients. In technology, the products are 1158 software products catalogued by stackshare.io, and the components are the 993 development tools used to make them. (Bottom) The usefulness of a component is the number of products we can make that contain it. We find that the relative usefulness of a component depends on how many other components have already been acquired. For each sector, we show the usefulness of three typical components, averaged at each stage over all possible choices of the other acquired components (lines) and, for gastronomy, also for a particular random order of component acquisition (points).

the components used to make them are shown in Fig. 1. Now suppose that we possess a basket of distinct components, which we can combine in different ways to make products. We have more than enough copies of each component for our needs, so we do not have to worry about running out. There are N possible component types in total, but at any given stage n we only have n of these N possible building blocks. At every stage, we pick a new type of component to add to our basket.

Usefulness. The usefulness of a component is the number of products we can make that contain it [13]. In other words, the usefulness uα of some component α is how many more products we can make with α in our basket than without α in our basket. As we gather more components, uα increases or stays the same; it cannot decrease. We write uα(n) to indicate this dependence on n: uα(n) is the usefulness of α given possession of α and n − 1 other components, so that we hold n components in total. Averaging over all choices of the n − 1 other components from the N − 1 that are possible gives the mean usefulness, ūα(n).

Usefulness experiment. To measure the usefulness of different components as the innovation process unfolds and we acquire more components, we did the following experiments. Using data from each of our three sectors, we put a given component α into an empty basket and then added, one component at a time, the remaining N − 1 other components, measuring the usefulness of α at every step. We averaged uα(n) over all possible orders in which to add the N − 1 components to obtain ūα(n). (We explain how in SI B.) We repeated this process for all of the components α. Typical results from these experiments are shown in Fig. 1.
[Figure 2: bumps charts showing the rank of each component's mean usefulness against the number of acquired components, for language (26 letters), gastronomy (381 ingredients) and technology (993 development tools).]

FIG. 2: Crossovers. The relative usefulness of different components changes as the number of components we possess increases. For example, if you are only allowed six letters, the ones that show up in the most words are a, e, i, o, s, r. For gastronomy and technology, for clarity we only show the 40 components most useful when we have all N components. A pure short-sighted strategy acquires components in the order that they intersect the diagonal, whereas a pure far-sighted strategy acquires them in the order that they intersect a vertical. If there are no crossovers, the strategies are the same.

[Figure 3: recipe-complexity histograms. (A) Small kitchen: 127 ingredients, almond to fenugreek, 597 recipes in total. (B) Big kitchen: 381 ingredients, almond to zucchini, 56,498 recipes in total. (C, D) Recipes containing cocoa: 89 in the small kitchen, 4801 in the big kitchen. (E, F) Recipes containing cayenne: 43 in the small kitchen, 7950 in the big kitchen. Panel labels: "Cocoa is more useful than cayenne" (small kitchen) and "Cayenne is more useful than cocoa" (big kitchen).]

FIG. 3: Why crossovers happen. On the right is a big kitchen with 381 ingredients. On the left is a small kitchen with one-third as many ingredients. In the big kitchen (B), we can make a total of 56,498 recipes. Each bar counts recipes with the same number of ingredients (complexity). When we move to the smaller kitchen (A), the number of makeable recipes shrinks dramatically to 597, or 1.0%. But this reduction is far from uniform across different bars. Bars of higher complexity shrink more, on average by an extra factor of 3 with each bar. Thus the number of recipes of complexity one (first bar) shrinks about 3-fold, the number of complexity two (second bar) about 9-fold, and so on. Of all the recipes in the big kitchen, 4801 contain cocoa (D) and 7950 contain cayenne (F). The cayenne recipes tend to be more complex, containing on average 10.6 ingredients, whereas the cocoa recipes are simpler, averaging 7.2 ingredients. Because higher-complexity bars suffer a stronger reduction, overall fewer cayenne recipes (0.5%) survive in the smaller kitchen (E) than cocoa recipes (1.8%) (C). Thus cayenne is more useful in the big kitchen, but cocoa is more useful in the small kitchen.

We find that the mean usefulnesses of different components cross each other as the number of components in
our basket increases. As Fig. 1 shows for gastronomy, this is true both for the average over all possible orderings of components (lines) and for a specific random ordering (points).

Bumps charts. To visualise the relative usefulness of components over time, for each sector we created its “bumps chart” (Fig. 2). These show the rank order of mean usefulness at every stage of the innovation process. We see that the crossovers in Fig. 1 are commonplace, but that some sectors contain more crossovers than others. There are few crossings in language, some in gastronomy and many in technology. This means, for example, that the most useful letters for making words in Scrabble (a basket of seven letters) are nearly the same as the most useful letters for making words with a full basket (26 letters); the key ingredients in a small kitchen (20 ingredients) are moderately different from those in a big one (80 ingredients); and the most-used development skills for a young software firm (experience with 40 tools) are significantly different from those for an advanced one (160 tools). We call components that do not cross in time isochronic, like the letters, and those that do anisochronic, like the tools.

Why crossovers happen. To understand why crossovers happen, let’s have a closer look at how the mean usefulness increases for a single component (Fig. 3). To make a product of complexity s, we must possess all s of its distinct components. So making a complex product is harder than making a simple one, because there are more ways that we might be missing a necessary component. We therefore group together the products we can make containing α according to their complexity. That is, the usefulness uα(n, s) of component α is how many more products of complexity s we can make with α in our basket than without α in our basket. Summing uα(n, s) over s gives uα(n). The advantage of this refined grouping is that, by understanding the behaviour of ūα(n, s), we can understand the more difficult ūα(n). Our key result, which we prove in SI B, is that ūα(n, s)/nˢ⁻¹ is approximately constant over all stages of the innovation process. In other words, for two stages n and n′,

$$\bar u_\alpha(n', s) \simeq \bar u_\alpha(n, s) \left( \frac{n'}{n} \right)^{s-1}. \qquad (1)$$

This tells us that the number of products containing α of complexity s grows much faster for higher complexities than for lower complexities. Early on, ūα(n, s) will tend to be small for higher complexities, but depending on how far ahead we look, the bigger growth rate can more than compensate for this, as we see in Fig. 3. Summing eq. (1) over size s, we find

$$\bar u_\alpha(n') \simeq \bar u_\alpha(n, 1) + \bar u_\alpha(n, 2)\, x + \bar u_\alpha(n, 3)\, x^2 + \cdots, \qquad (2)$$

where x = n′/n. The growth of the mean usefulness of α therefore depends strongly on the complexity of the products containing α.

Valence. So far we have only characterised a component by its usefulness: the number of products we can make that contain it. Now we introduce another way of describing a component: the average complexity of the products it appears in. We call this the valence. The valence vα of component α is the average complexity of the products it appears in at stage N, when we have all N components. Think of the valence as the typical number of co-stars a component performs with, plus one. We show the usefulness and valence for each of the components in our three sectors in Fig. 4ABC. More valent components are unlikely to be useful until we possess a lot of other components, so that we have a good chance of hitting upon the ones they need. These are the wheels and axles in our Lego set. On the other hand, less valent components are likely to boost our product space early on, when we have acquired fewer components. These are the Lego men and their firefighting hats. This insight suggests that more valent components will tend to rise in relative usefulness, and less valent components will tend to fall. This is borne out in our experiments: components on the right of the plots in Fig. 4ABC, such as onion, tomato, JavaScript and Git, tend to rise in the bumps charts in Fig. 2, whereas components on the left, like cocoa, vanilla, Google Apps and SendGrid, tend to fall.

Interpreting crossovers. A crossover in the usefulness of components means that the things that matter most today are not the same as the things that will matter most tomorrow. How we interpret crossovers in practice depends on whether they are unanticipated, taking us by surprise, or anticipated, so that they can be planned for and exploited. When they are unanticipated, beneficial crossovers can seem to be serendipitous. But when they can be anticipated, crossovers provide an opportunity to strategically increase the growth of our product space. To harness this opportunity, we turn to forecasting component

[Figure 4. (A–C) Usefulness (number of words, recipes or software products a component is in) versus valence (average complexity of the words, recipes or software a component is in), for letters, ingredients and development tools. (D–F) Total makeable words, recipes and software versus the number of acquired letters, ingredients and development tools, for the far-sighted strategy, the impatient (short-sighted) strategy and a pseudo-random (alphabetical) ordering.]

FIG. 4: (ABC) Scatter plots of component usefulness versus component valence for our three sectors. For gastronomy and technology, we only show the top 40 components; the complete set is in SI Fig. 5. (DEF) Both the short-sighted and far-sighted strategies beat a typical random component ordering (here alphabetical), but they diverge from each other only insofar as there are crossings in the bumps charts.

crossovers using the complexity of products containing them.

Short-sighted strategy. To maximise the size of our product space when crossovers are unanticipated, the optimal approach is to acquire, at each stage, the component that is most useful among the ones that remain. Think of this as a “greedy” approach. It has a geometric interpretation: it is equivalent to acquiring the components that intersect the diagonal in Fig. 2. At every stage we lock in to a specific component, unaware of the future implications of the choices we make. A component poorly picked is an opportunity lost.

Far-sighted strategy. Using only information about the products we can already make with our existing components, however, we can forecast the usefulness of our components into the future. Eq. (2) shows us how, and we give an example in SI C. Here the optimal approach is to acquire the component that will be most useful at some later stage n′. This also has a geometric interpretation: it is equivalent to acquiring the components that intersect a vertical at n′ in Fig. 2, and thus depends on how far into the future we forecast.

Strategy comparison. A short-sighted strategy considers only the usefulness uα, whereas a far-sighted strategy considers both the usefulness uα and the valence vα. Short-sighted maximises what a potential new component can do for us now, whereas far-sighted maximises what it could do for us later. Depending on our desire for short-term gain versus long-term growth, we have a spectrum of strategies dependent on n′. A pure short-sighted strategy (n′ = n) and a pure far-sighted strategy (n′ = N) are compared in Fig. 4DEF. Like the Lego approaches of Bob and Alice, both strategies beat acquiring components in a random order. As our theory predicts, the extent to which the two strategies differ from each other increases with the number of crossovers. For language, they are nearly identical, because there are hardly any crossovers. For gastronomy, short-sighted has a two-fold advantage at first, but later far-sighted wins by a factor of two. For technology, short-sighted surges ahead by an order of magnitude, but later far-sighted is dominant.

Serendipity and strategy. Our research helps resolve the tension between a strategic approach to innovation, which views innovation as a rational process that can be measured and prescribed [3, 4, 7, 8], and a belief in serendipity and the intuition of extraordinary individuals [9–11]. A strategic approach is seen in firms like P&G and Unilever, which use process manuals and consumer research to maintain a reliable innovation factory [14], and Zara, which systematically scales new products up and down based on real-time sales data. In scientific discovery, “traditional scientific training and thinking favor logic and predictability over chance” [9]. If discoveries are actually made in the way that scientific publications suggest, the path to invention is a step-by-step, rational process. On the other hand, a serendipitous approach is seen in firms like Apple, which is notoriously opposed to making innovation choices based on incremental consumer demands, and Tesla, which has invested for years in its vision of long-distance electric cars [15]. In science, many of the most important discoveries, such as penicillin, heparin, X-rays and nitrous oxide, have serendipitous origins, in contrast to their published step-by-step write-ups [9]. The role of vision and intuition tends to be under-reported: a study of 33 major discoveries in biochemistry “in which serendipity played a crucial role” concluded that “when it comes to ‘chance’ factors, few scientists ‘tell it like it was’” [16, 17].

Serendipity. Writing about The Three Princes of Serendip, Horace Walpole records that the princes “were always making discoveries, by accidents and sagacity, of things they were not in quest of”. Serendipity is the fortunate development of events, and many organizations and researchers stress its importance [9, 10]. Crossovers in component usefulness help us see why. Components which depend on the presence of many others can be of little benefit early on. But as the innovation process unfolds and the acquired components pay off, the results will seem serendipitous, because a number of previously low-value components become invaluable. Thus, what appears as serendipity is not happenstance but the delayed fruition of components reliant on the presence of others. After the acquisition of enough other components, these components flourish. For example, the initially useless axles and wheels were later found to be invaluable to building many new toys. In a similar way, the low value attributed to Fleming’s initial identification of lysozyme was later revised to high value in the years leading to the discovery of penicillin, when other needed components emerged, such as sulfa drugs, which showed that safe antiseptics are possible [9]. Interestingly, the word “serendipity” does not have an antonym. But as our bumps charts show, for every beneficial shift in a crossover, there is a detrimental one. Each opportunity for serendipity goes hand-in-hand with a chance for anti-serendipity: the acquisition of components useful now but less useful later. Avoiding these over-valued components is as important as acquiring under-valued ones to securing a large future product space.

Strategy. Our research shows that the most important components (materials, skills and routines) when an organization is less developed tend to be different from those when it is more developed. Rather than being fixed, the relative usefulness of components can change over time, in a statistically repeatable way. Recognising how an organization’s priorities depend on its maturity enables it to balance short-term gain with long-term growth. For example, our insights provide a framework for understanding the poverty trap. When a less-developed country imitates a more-developed country by acquiring similar production capabilities [6], it is unable to quickly reap the rewards of its investment, because it does not have in place enough other needed capabilities. This in turn prevents it from further investment in those needed components. Our analysis gives quantitative backing to the “lean start-up” approach to building companies and launching products [18]. Start-ups are wise to employ a short-sighted strategy and release a minimum viable product. Without the resources to sustain a far-sighted approach, they need to quickly bring a simple product to market. On the other hand, firms that can weather an initial drought will see their sacrifice more than paid off when their far-sighted approach kicks in. By tracking how potential new components combine with existing ones, organizations can develop an information-advantaged strategy to adopt the right components at the right time. In this way they can create their own serendipity, rather than relying on intuition and chance.

[1] D. Erwin, D. Krakauer, ‘Insights into innovation’, Science 304, 1117 (2004).
[2] M. Reeves, K. Haanaes, J. Sinha, Your Strategy Needs a Strategy (Harvard Business Review Press, 2015).
[3] C. Weiss et al., ‘Adoption of a high-impact innovation in a homogeneous population’, Phys Rev X 4, 041008 (2014).
[4] J. McNerney et al., ‘Role of design complexity in technology improvement’, Proc Natl Acad Sci 108, 9008 (2011).
[5] R. Van Noorden, ‘Physicists make ‘weather forecasts’ for economies’, Nature 1038, 16963 (2015).
[6] A. Tacchella et al., ‘A new metric for countries’ fitness and products’ complexity’, Sci Rep 2, 723 (2012).
[7] P. Drucker, ‘The discipline of innovation’, Harvard Bus Rev 8, 1 (2002).
[8] V. Sood et al., ‘Interacting branching process as a simple model of innovation’, Phys Rev Lett 105, 178701 (2010).
[9] M. Rosenman, ‘Serendipity and scientific discovery’, Res Urban Economics 13, 187 (2001).
[10] F. Johansson, ‘When success is born out of serendipity’, Harvard Bus Rev 18, 22 (2012).
[11] W. Isaacson, The Innovators: How a Group of Hackers, Geniuses, and Geeks Created the Digital Revolution (2014).
[12] Y.-Y. Ahn, S. E. Ahnert, J. P. Bagrow, A.-L. Barabási, ‘Flavor network and the principles of food pairing’, Sci Rep 1, 196 (2011).

[13] We make no assumptions about the values of different products, which will depend on the market environment and may change with time. But we can be sure that maximising the number of products is a proxy for maximising any reasonable property of them. A similar proxy is used in evolutionary models, where evolvability is defined as the number of new phenotypes in the adjacent possible (1-mutation boundary) of a given phenotype; see A. Wagner, ‘Robustness and evolvability: a paradox resolved’, Proc Roy Soc B 275, 91 (2008).
[14] B. Brown, S. Anthony, ‘How P&G tripled its innovation success rate’, Harvard Bus Rev 6 (2011).
[15] K. Bullis, ‘How Tesla is driving electric car innovation’, MIT Tech Rev 8 (2013).
[16] J. Comroe, ‘Roast pig and scientific discovery: Part II’, Am Rev Respir Dis 115, 853 (1977).
[17] F. Tria et al., ‘The dynamics of correlated novelties’, Sci Rep 4, 5890 (2014).
[18] E. Ries, The Lean Startup (Portfolio Penguin, 2011).

Online supplementary information (SI)

A. Data

Our three data sets, described in Fig. 1, were obtained as follows. In language, our list of 79,258 common English words is from the built-in WordList library in Mathematica 10. Of the 84,923 KnownWords, we only considered those made from the 26 letters a–z, ignoring case: we excluded words containing a hyphen, space, etc. In gastronomy, the 56,498 recipes can be found in the supplementary material of [12]. In technology, the 1158 software products and the development tools used to make them can be found at the site stackshare.io.

B. Proof of components invariant

Let α be some component. Let N₁ be the set of N − 1 other possible components not including α, n₁ be a subset of n − 1 components chosen from N₁, and s₁ be a subset of s − 1 components chosen from n₁. The usefulness uα(n, s) is how many more products of complexity s we can make from the components n₁ together with α than from the components n₁ alone:

$$u_\alpha(n, s) = \sum_{s_1 \subseteq n_1} \left[ \mathrm{prod}(\alpha \cup s_1) - \mathrm{prod}(s_1) \right],$$

where prod(α ∪ s₁) takes the value 0 if the combination of components α ∪ s₁ forms no products of complexity s and 1 if α ∪ s₁ forms one product of complexity s. (Occasionally, the same combination of components α ∪ s₁ forms multiple products: for example, beef, butter and onion together form two distinct recipes of length three. In such cases, prod(α ∪ s₁) takes the value 2 if α ∪ s₁ forms two products, and so on.) The expected usefulness of component α, ūα(n, s), is the average of uα(n, s) over all subsets n₁ ⊆ N₁; there are $\binom{N-1}{n-1}$ such subsets. Therefore

$$\bar u_\alpha(n, s) = \binom{N-1}{n-1}^{-1} \sum_{n_1 \subseteq N_1} u_\alpha(n, s) = \binom{N-1}{n-1}^{-1} \sum_{n_1 \subseteq N_1} \sum_{s_1 \subseteq n_1} \left[ \mathrm{prod}(\alpha \cup s_1) - \mathrm{prod}(s_1) \right].$$

Consider some particular combination of components s₁. The double sum above will count s₁ once if s = n, but multiple times if s < n, because s₁ will belong to multiple sets n₁. How many? In any set n₁ that contains s₁, there are n − s free elements to choose, from N − s other components. Therefore the double sum will count every combination s₁ a total of $\binom{N-s}{n-s}$ times, and

$$\bar u_\alpha(n, s) = \frac{\binom{N-s}{n-s}}{\binom{N-1}{n-1}} \sum_{s_1 \subseteq N_1} \left[ \mathrm{prod}(\alpha \cup s_1) - \mathrm{prod}(s_1) \right] = \frac{N}{n} \, \frac{\binom{n}{s}}{\binom{N}{s}} \, \bar u_\alpha(N, s).$$

The same must be true when we replace n by n′, and therefore

$$\bar u_\alpha(n, s) \, \frac{n}{\binom{n}{s}} = \bar u_\alpha(n', s) \, \frac{n'}{\binom{n'}{s}}. \qquad (3)$$

When the number of components is big compared to the product size (n, n′ ≫ s), we can approximate $\binom{n}{s}$ and $\binom{n'}{s}$ by nˢ/s! and n′ˢ/s! (the factor s! cancels), and thus

$$\bar u_\alpha(n, s) / n^{s-1} \simeq \bar u_\alpha(n', s) / n'^{\,s-1}.$$

For simplicity, we use this approximation in the main manuscript, but we could just as well have used the exact expression in eq. (3).

C. Forecasting crossovers in usefulness

Here we show how we can forecast the usefulness of components at stage n′ from information we have at some earlier stage n, where n is the number of components we have acquired. As in Fig. 3, we have a set k of 127 ingredients in a small kitchen (almond to fenugreek) and a set K of 381 ingredients in a big kitchen (almond to zucchini).

In the small kitchen, we can make a total of 597 recipes. Of these 597 recipes, 43 contain cayenne, but they are not all equally complex. Two of the 43 recipes contain one ingredient (namely, cayenne itself) and have complexity one; one recipe contains two ingredients and has complexity two; 18 contain three ingredients and have complexity three; and so on. Similarly, 89 of the 597 recipes contain cocoa: six have complexity one, 22 have complexity two, and so on. Using eq. (2), we can write the mean usefulness of these two components as

$$u_{ca}(n' \mid k) \simeq 2 + x + 18x^2 + 12x^3 + 8x^4 + x^5 + x^7,$$
$$u_{co}(n' \mid k) \simeq 6 + 22x + 37x^2 + 16x^3 + 8x^4,$$

where x = n′/127. As expected, at x = 1 these give u_ca(n′ | k) = 43 and u_co(n′ | k) = 89.

In the big kitchen, we can make a total of 56,498 recipes. Of these, 7950 contain cayenne and 4801 contain cocoa. Again using eq. (2),

$$u_{ca}(n' \mid K) \simeq 2 + 19x + 64x^2 + \cdots + 2x^{28} + 2x^{30},$$
$$u_{co}(n' \mid K) \simeq 6 + 54x + 195x^2 + \cdots + 2x^{20} + 3x^{21},$$

where x = n′/381. As expected, at x = 1 these give u_ca(n′ | K) = 7950 and u_co(n′ | K) = 4801.

So far, none of this is surprising. The punchline is that we can estimate the usefulness of components in the big kitchen from what we know about our small kitchen. To do so, we simply evaluate the small-kitchen polynomials at the big-kitchen stage n′ = 381, that is, at x = 3:

$$u_{ca}(n' \mid K)\big|_{n'=381} \simeq u_{ca}(n' \mid k)\big|_{x=3} \simeq 3569,$$
$$u_{co}(n' \mid K)\big|_{n'=381} \simeq u_{co}(n' \mid k)\big|_{x=3} \simeq 1485.$$

In log terms (log usefulness being the natural unit of measure), these are accurate to within 11% and 9% of the true values. In particular, this predicts the crossover of cayenne and cocoa in Fig. 3.
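The SI C calculation takes only a few lines to reproduce. The Python sketch below is not the authors' code, and `SMALL_KITCHEN`, `forecast_usefulness` and `pick` are hypothetical names. It stores the small-kitchen complexity counts listed above, evaluates eq. (2) at a later stage n′, and ranks components by the forecast: n′ = n recovers the short-sighted choice and n′ = N the pure far-sighted one. The `exact` flag swaps in the binomial factor implied by eq. (3) for (n′/n)^(s−1). As in SI C, the single 127-ingredient basket is treated as representative of the mean.

```python
from math import comb

# Small-kitchen complexity counts from SI C (n = 127 ingredients).
# Entry s - 1 is the number of makeable recipes of complexity s containing
# the ingredient, i.e. the coefficient of x**(s - 1) in the polynomials above.
SMALL_KITCHEN = {
    "cayenne": [2, 1, 18, 12, 8, 1, 0, 1],  # 2 + x + 18x^2 + 12x^3 + 8x^4 + x^5 + x^7
    "cocoa": [6, 22, 37, 16, 8],            # 6 + 22x + 37x^2 + 16x^3 + 8x^4
}


def forecast_usefulness(counts, n, n_future, exact=False):
    """Forecast usefulness at stage n_future from complexity counts at stage n.

    exact=False scales each complexity class by (n_future / n)**(s - 1), as in
    eq. (1), so the sum is eq. (2). exact=True uses the factor implied by
    eq. (3): n * C(n_future, s) / (n_future * C(n, s)). Assumes n >= len(counts).
    """
    total = 0.0
    for s, c in enumerate(counts, start=1):
        if exact:
            factor = n * comb(n_future, s) / (n_future * comb(n, s))
        else:
            factor = (n_future / n) ** (s - 1)
        total += c * factor
    return total


def pick(counts_by_component, n, n_future, exact=False):
    """Component with the largest forecast usefulness at stage n_future.

    n_future == n gives the short-sighted (greedy) choice; n_future == N gives
    the pure far-sighted choice.
    """
    return max(
        counts_by_component,
        key=lambda name: forecast_usefulness(counts_by_component[name], n, n_future, exact),
    )


if __name__ == "__main__":
    n, N = 127, 381
    for name, counts in SMALL_KITCHEN.items():
        print(name, forecast_usefulness(counts, n, n), forecast_usefulness(counts, n, N))
    print("short-sighted pick:", pick(SMALL_KITCHEN, n, n))
    print("far-sighted pick:", pick(SMALL_KITCHEN, n, N))
```

Run as a script, it prints the small-kitchen usefulness (43 for cayenne, 89 for cocoa) and the forecasts at n′ = 381 (3569 and 1485), so the short-sighted pick is cocoa while the far-sighted pick is cayenne, the crossover of Fig. 3. Because each binomial factor exceeds (n′/n)^(s−1) for s ≥ 2 when n′ > n, the exact variant gives somewhat larger forecasts.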

[SI figure: usefulness (number of recipes an ingredient appears in) versus valence (average complexity of recipes an ingredient appears in) for all ingredients, followed by the ingredient labels of a gastronomy bumps chart.]

FIG. 5: (Top) The valence-usefulness scatter plot for all ingredients that are used in two or more recipes (365 of the 381 ingredients). (Bottom) The relative usefulness of different ingredients as the number of ingredients we possess increases, for the 100 ingredients most useful when we have all 381 ingredients.
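The two axes of the top panel are simple per-ingredient statistics. Below is a minimal sketch of how they could be computed from a list of recipes. It assumes each recipe is given as a set of ingredients and that a recipe's complexity is its number of distinct ingredients; the toy recipes are hypothetical, not data from the paper. The final filter mirrors the caption's restriction to ingredients used in two or more recipes.

```python
from collections import defaultdict


def usefulness_and_valence(recipes):
    """Map each ingredient to (usefulness, valence).

    usefulness: number of recipes the ingredient appears in.
    valence: mean complexity of those recipes, where complexity is
    ASSUMED here to be the number of distinct ingredients in a recipe.
    """
    uses = defaultdict(int)
    complexity_sum = defaultdict(int)
    for recipe in recipes:
        ingredients = set(recipe)  # count each ingredient once per recipe
        for ingredient in ingredients:
            uses[ingredient] += 1
            complexity_sum[ingredient] += len(ingredients)
    return {i: (uses[i], complexity_sum[i] / uses[i]) for i in uses}


# Hypothetical toy recipes, not data from the paper.
recipes = [
    {"wheat", "egg", "butter"},
    {"egg", "milk"},
    {"wheat", "egg", "butter", "milk", "vanilla"},
]
stats = usefulness_and_valence(recipes)

# Mirror the top panel: keep only ingredients used in two or more recipes.
plotted = {i: s for i, s in stats.items() if s[0] >= 2}
print(sorted(plotted))   # ['butter', 'egg', 'milk', 'wheat']
print(stats["vanilla"])  # (1, 5.0)
```

Here egg appears in recipes of complexity 3, 2 and 5, so its usefulness is 3 and its valence is 10/3. Vanilla appears only in the most complex recipe, which gives it high valence but low usefulness.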

[Figure 6 plot. Top panel: technology tools plotted by valence (average complexity of software products a tool appears in) against usefulness (number of software products a tool appears in). Bottom panel: ranking of the 100 most useful tools as the number of tools possessed increases.]
FIG. 6: (Top) The valence-usefulness scatter plot for the 365 technology tools most useful in making software products. (Bottom) The relative usefulness of different tools as the number of tools we possess increases, for the 100 tools most useful when we have all 993 tools.
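The bottom panels of Figs. 5 and 6 track how an item's relative usefulness shifts as we possess more items. The sketch below illustrates one way such a ranking could be computed. It rests on an assumption, not on a definition confirmed by the text: an item's usefulness is counted only over products that can be built entirely from the possessed set. The tool names come from Fig. 6, but the products themselves are hypothetical.

```python
def makeable(products, possessed):
    """Products whose every component is already possessed."""
    return [p for p in products if p <= possessed]


def usefulness_ranking(products, possessed):
    """Rank possessed components by the number of makeable products using them.

    ASSUMPTION: usefulness is counted only over products buildable entirely
    from the possessed set. Ties are broken alphabetically for determinism.
    """
    counts = {c: 0 for c in possessed}
    for product in makeable(products, possessed):
        for c in product:
            counts[c] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical software products, each a set of tools.
products = [
    frozenset({"Git", "Python"}),
    frozenset({"Git", "Python", "PostgreSQL"}),
    frozenset({"Git", "nginx", "Redis", "Python"}),
    frozenset({"Git", "nginx", "Redis", "Node.js"}),
    frozenset({"nginx", "Redis", "Node.js"}),
]

small = {"Git", "Python", "PostgreSQL", "Redis"}
large = small | {"nginx", "Node.js"}
small_rank = usefulness_ranking(products, small)
large_rank = usefulness_ranking(products, large)
print(small_rank)
print(large_rank)
```

With the small set, only the first two products are makeable, so Redis has usefulness 0 and ranks last: `[('Git', 2), ('Python', 2), ('PostgreSQL', 1), ('Redis', 0)]`. Adding nginx and Node.js makes every product buildable, and Redis rises to usefulness 3: `[('Git', 4), ('Python', 3), ('Redis', 3), ('nginx', 3), ('Node.js', 2), ('PostgreSQL', 1)]`. Meanwhile PostgreSQL falls from third to last.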