Scaling up analogical innovation with crowds and AI

Aniket Kittura,1, Lixiu Yub, Tom Hopec, Joel Chand, Hila Lifshitz-Assafe, Karni Gilonc, Felicia Nga, Robert E. Krauta, and Dafna Shahafc aHuman–Computer Interaction Institute, Carnegie Mellon University, Pittsburgh, PA 15213; bRobert Bosch, Research and Technology Center, Division 3, Pittsburgh, PA 15222; cSchool of Computer Science and Engineering, The Hebrew University of Jerusalem, 9190401 Jerusalem, Israel; dCollege of Information Studies, University of Maryland, College Park, MD 20742; and eDepartment of Information, Operations and Management Sciences, Leonard N. Stern School of Business, New York University, New York, NY 10012

Edited by Ben Shneiderman, University of Maryland, College Park, MD, and accepted by Editorial Board Member Eva Tardos November 7, 2018 (received for review May 20, 2018)

Analogy—the ability to find and apply deep structural patterns Wide Web. Here we describe initial steps toward a future where across domains—has been fundamental to human innovation people, augmented by machines, can search through billions of in science and technology. Today there is a growing opportu- sources based on deep structural similarity rather than simple nity to accelerate innovation by moving out of a single keywords to solve important societal problems. For example, sci- person’s mind and distributing it across many information pro- entists or designers might find potential solutions in other fields cessors, both human and machine. Doing so has the potential to the problems they are trying to solve, and lawyers or legal to overcome cognitive fixation, scale to large idea repositories, scholars might find legal precedents sharing similar systems of and support complex problems with multiple constraints. Here we relations to a contemporary case. lay out a perspective on the future of scalable analogical innova- The key insight from this paper is that instead of considering tion and first steps using crowds and artificial intelligence (AI) to analogy as the province of a single mind (e.g., a “lone genius”), augment that quantitatively demonstrate the promise one can disaggregate the analogical processing typically done by of the approach, as well as core challenges critical to realizing a single individual into discrete steps assigned to different sets this vision. of individuals and/or machine agents. This approach has several potential advantages, including leveraging each agent’s comple- analogy | innovation | crowdsourcing | AI | machine learning mentary strengths while ameliorating their weaknesses; scaling up the number of agents to increase the number of potential found; and capturing the mental work that each agent he ability to find and apply analogies from other domains does so that it can be built upon by others. However, the dis- Thas been fundamental to human achievement across numer- aggregation also introduces several new challenges, including ous domains, including architecture, design, technology, art, and how to coordinate many diverse agents’ efforts and determine mathematics (1–4). For example, in 2013 a group of engineers which of the many possible configurations of human and machine partnered with a world-renowned origami expert to design a processors are most effective at boosting analogical innovation. large solar array to be carried by a narrow rocket. Using an anal- ogy to origami-folding techniques, they were able to fit the array Approach into 1/10th of its deployed size (5). The history of innovation in In this paper, the approach to addressing these challenges builds science and industry is replete with similar cases of analogical on a foundation of past research in cognitive psychology, engi- transfer, in which ideas from one domain are profitably used to neering, management, and design that has investigated the pro- solve a problem in another one (1, 6, 7): Salvador Luria advanced cesses that humans use to find and apply analogies (e.g., refs. 4, the theory of bacterial mutation by applying an analogy between 6, and 11–16) as well as research on the development of meth- bacteria and slot machines, and the Wright brothers used an ods and tools to help people design by analogy (e.g., refs. 17–22). analogy to a bicycle to design a steerable aircraft. Analogical processing within a single individual’s mind has been Despite its importance, finding and applying analogies to drive extensively studied, with past research suggesting that it involves innovation remains challenging. While people can find deep rela- three core processes: abstracting the problem into a schema tional similarities between domains given the appropriate data (i.e., representing problems and potential solutions in a way that (8), the rapid increase of information makes it harder to mine drops out surface features and facilitates comparison), searching any single field for analogies, much less identify deep similari- for analogies (i.e., identifying potentially fruitful domains that ties across multiple fields. Furthermore, the sensitivity of human are distant from an initial problem and finding analogies within memory to surface similarity means that people often become them), and applying the found analogies to generate solutions to fixated on surface-level details that prevent them from retrieving the original problem (12, 13). distant analogs or applying them (9, 10). Finally, many real-world problems are complex and have multiple subproblems. Multiple analogies at different levels of abstraction might be needed to This paper results from the Arthur M. Sackler Colloquium of the National Academy of solve the set of subproblems. These three challenges—scalability, Sciences, “Creativity and Collaboration: Revisiting Cybernetic Serendipity,” held March fixation, and complexity—are fundamental barriers to analogical 13–14, 2018, at the National Academy of Sciences in Washington, DC. The complete pro- innovation. gram and video recordings of most presentations are available on the NAS website at On the other hand, the recent explosion of data available www.nasonline.org/Cybernetic Serendipity.y Author contributions: A.K., L.Y., T.H., J.C., K.G., R.E.K., and D.S. designed research; A.K., online together with novel machine-learning and crowdsourc- L.Y., T.H., J.C., K.G., F.N., R.E.K., and D.S. performed research; A.K., L.Y., T.H., J.C., K.G., ing techniques represents new opportunities to develop methods F.N., R.E.K., and D.S. contributed new reagents/analytic tools; A.K., L.Y., T.H., J.C., K.G., for finding analogies in multiple domains. For example, there F.N., R.E.K., and D.S. analyzed data; and A.K., L.Y., T.H., J.C., H.L.-A., K.G., F.N., R.E.K., and are more than 9 million patents in the US Patent database; D.S. wrote the paper.y more than 2 million product and solution ideas submitted to The authors declare no conflict of interest.y ideation platforms such as InnoCentive, Kickstarter, Quirky, This article is a PNAS Direct Submission. B.S. is a guest editor invited by the Editorial and OpenIDEO (www.InnoCentive.com, www.Kickstarter.com, Board.y www.Quirky.com, www.OpenIDEO.com); hundreds of millions Published under the PNAS license.y of scientific papers and legal cases searchable on Google Scholar; 1 To whom correspondence should be addressed. Email: [email protected] and billions of web pages and videos searchable on the World Published online February 4, 2019.

1870–1877 | PNAS | February 5, 2019 | vol. 116 | no. 6 www.pnas.org/cgi/doi/10.1073/pnas.1807185116 itre al. et Kittur 46, (36–38, address they 51). dis- challenges 50, the 48, propose and systems AI-powered we and and crowd- Instead, human both head. processors, information person’s the machine. many Traditionally one across problem. process inside the original tributing the place to takes solutions process generate apply- to and analogies, them for ing searching problem, the abstracting processes: core 1. Fig. generate case to the in them or, problem encouraging repre- their of to as problem representations abstract developed such abstract multiple 27–29), been (e.g., more (17, have problem build sentations approaches source designers Several their individual 26). of help and often features 25 designers surface refs. that on they shown because fixated domains has other attack are work from analogs to of relevant up retrieve body splitting to army fail large an A of target. analog can- an a involving than analogs radiation retrieve or to cer likely For more radiation 23). (24) much 13, Lees’ are 9, and problem Duncker (8, solve object to the trying to people example, relations differ- share from that analogs domains, similar structurally ent or attributes “far,” similarities, share over object that surface fixa- an analogs of to of within-domain, sensitive because or highly “near,” analogies favoring is find analogical retrieval to of Human fail practice tion. people and that science is the innovation in challenge recurring A Fixation key the overcome complexity. and to scalability forms human fixation, various of leveraging challenges in with, capabilities machine experimented and have we 1 configurations differ- Fig. machines. in and distributed humans ( be multiple across can configurations processes ent application and search, itself. right can bottom-heavy, to that a it as chair such egg-shaped applies designs sta- in and resulting the domains problem, of original other the one in picks found designer mechanisms the bilization or Finally, cups toys. spill-proof as Wobble of such Weeble structure inspi- analogies surface for finding the domain, search by original to the unfettered problem domains abstracted distant the in uses over). rations tip she not or does designer he olds; on 5-y Next, the focuses by and First, moveable (e.g., problem chair. requirements the core of kindergarten parts a surface-level away designing abstracts of task the Bottom ntefloigscin eepoehwteabstraction, the how explore we sections following the In ( 1 Fig. oe fteaaoydie noainpoes ossigo three of consisting process, innovation analogy-driven the of Model umrzsorsuis iulyilsrtn different illustrating visually studies, our summarizes ) Bottom Top eosrtstepoes o xml,consider example, For process. the demonstrates ) umrzsasre fsuisw aecnutdwith conducted have we studies of series a summarizes nlgclisiaini rsne,pol fe alt recog- to fail an often when people even other presented, that from is shown has inspirations inspiration work analogical analogical previous useful Finally, and chairs) (e.g., domains. kindergarten relevant problem about find source is problem the to the of that details knowing sufficient the the without have schema. seeing or not good might without domain a schema context source the generate given to the workers problem either Subsequent the in abstracting of versed process sufficiently be not part strength. small a of the a whole—into challenge only the solve core of sees and person a each origi- turning inspirations cognition—that and an collaborative find fixation of who avoiding schemas thus people problem, mecha- develop from who the Such problem problem. people apply nal original separates could the process to people analogies a the of source in set the discovered on inspi- nisms another fixation for by Yet search unfettered and problem. domains, of problem distant set peo- abstracted in different of the rations a set use Downstream, the one could problem. In 1), people the 35). (Fig. abstract chair (34, could kindergarten tasks ple a cognitive across designing complex people of in of case engage crowdsourcing thousands to of globe and crowds the technologies assembling web support that in developments recent 30–33). refs. (e.g., problems relational their the represen- of represent structure to specialized ontologies) using functional formalized, as (such be tations can that problems of etrso h rbe,o h rgnlpolmdescrip- found problem schema problem original abstract the the or given concrete Crowds problem, problem, itself. a the tion of of schema abstract features the through crowdsourcing ( given search repository the inspirations to idea Using asked community-generated were it. a crowds address above, repos- data to described de- a process searches mechanisms a and for mind (37), in itory problem ideation a has problem-based already signer In innovation. analogical stage. the each of from retention outputs selective best the allowed stages enabling con- into evaluations, transfer the intermediate analogical of of Furthermore, process derived. the features up were breaking schemas surface which on from diverse examples fixation crete more avoided significantly and explored domains people brain- using schema, as When abstract inspiration. such for an examples techniques, concrete and design using novel or traditional more storming significantly than to ideas led This useful advan- inspirations. a taking the then problem, of concrete and tage original, schemas the single the solved group on a third based for inspirations schema analogical a found generate to trying when could well product. but as products the do related identify not multiple to asked behind gener- were principles Crowds subsequently they common domains. were when diverse schemas that from high-quality and ated inspirations find judges to expert high used as multiple rated Given schemas by generate details. mechanism to quality able surface a were omitting crowds schema and training, features), problem this things) comb-like a a detaches into uses (e.g., (e.g., rake) (e.g., product purpose a They concrete a from schemas: leaves a specifying induce separate pro- of to to training rerepresentation device brief learn a a workers shown developed crowd were (36) help al. to et cedure other Yu Mellon from all Carnegie participants. consent informed and the all included These by and Board schema). approved Review its were Institutional and here product described for a studies 2 Fig. of (see example schemas abstracted an of and process distributed this target the to 13). them 9, transfer (8, and domain relations structural deep the nize hspoescudfi nsvrlpae.Codwresmight workers Crowd places. several in fail could process This of advantage takes paper this in approach the contrast, In ehv xlrdtoatraecngrtoso distributed of configurations alternate two explored have We workers crowd of group second a stage, abstraction the After of value the explored we 37) (36, experiments of series a In PNAS | eray5 2019 5, February | www.Quirky.com o.116 vol. | o 6 no. | for ) 1871

COMPUTER SCIENCES PSYCHOLOGICAL AND COLLOQUIUM COGNITIVE SCIENCES PAPER In the second stage, new workers searched in these domains to find inspirations that might be adapted to solve the original problem. The key insight here is that a rich set of expert- generated ideas, solutions, and skills has already been docu- mented on the web, such that nonexperts might find and appro- priate these resources and suggest promising directions despite not possessing deep expertise themselves. Crowd workers who were given the abstract problem schema identified more distant and fruitful expert domains in which to search, compared with crowd workers who were given the orig- inal problem description. For example, those given the problem Fig. 2. Example of a concrete product description and a schema represen- schema of “fit objects of different sizes into a container” iden- tation of it. tified distant professions such as contortionists, carpenters, or experts on Japanese aesthetics, whereas workers given the orig- inal problem description identified professions closer in domain significantly more analogs that matched the deep problem struc- to power strips, such as computer technicians, electricians, or ture in the repository. Moreover, other participants given a interior designers. When workers searched the web for inspira- sample of the schema-inspired analogs generated more creative tions in the distant domains, they found interesting analogs: For ideas than those under the other conditions. Free ideation (36) example, individuals searching in a carpenter’s domain retrieved instead starts from the identification of an interesting problem– curved drawers as analogs, and those searching in the domain mechanism schema from one or more examples (e.g., a power of Japanese aesthetics retrieved multilevel buildings. Adapting strip that supports more plugs by placing them at different analogs from the abstract problem condition generated signifi- heights) and uses it to first identify other domains in which cantly more creative ideas, such as placing plugs in curved lines the schema could be applied (e.g., storing planes in a hangar or at differing heights, respectively (38). so they do not block each other) and then apply the mech- anism to that domain to generate a solution (e.g., designing With AI. Another approach to scalability involves using AI to win- sections of a hangar at different heights so planes’ wings can now down large datasets to promising analogically similar items. overlap but not touch). Crowds were able to reliably gener- Doing so may have complementary benefits to using crowds, ate high-quality schemas from multiple examples, use those with the machine able to process millions or billions of items schemas to find distant and useful domains and analogs, and at a speed unrivaled by people. Thus, even if the results of the generate more creative ideas than they were able to in control AI’s search are noisy and require humans to inspect, select, and conditions. adapt the found analogs to generate creative solutions, boost- ing the likelihood of humans processing useful analogs could Scalability nonetheless significantly accelerate innovation. Another key challenge with analogy is scalability. The work Finding analogies is challenging for machines because doing described in the previous section looked at relatively small so requires an understanding of the deep relational similar- idea repositories, where searchers were looking through hun- ity between two entities that may be very different in terms dreds of ideas. Today, more sources of potential analogies of surface attributes (39). For example, Chrysippus’ analogy are easily available than ever before—millions of patents, re- between sound waves and water waves required ignoring many search papers, videos, products, web pages, and more. How- surface differences between the two, such as the viscous liquid ever, searching through these sources to find distant but useful nature of water or its visibility (8). Much work addressing this analogs often becomes overwhelming. This section describes in computational analogy has focused on fully structured data, two processes for reliably finding useful analogical inspira- often with logic-based representations. For example, Gentner tions in very large datasets. The first one is crowd based: and coworkers’ (40) seminal structure-mapping engine relies on Crowd workers first identify types of experts who are likely predicate calculus representations. In contrast, most descrip- to have interesting insights into solving an abstracted prob- tions of products, problems, or ideas in existing online databases lem, and then new workers retrieve inspirations from those are short or sparse or lack consistent structure. Although logic- expert domains. The second system uses artificial intelli- based representations are very expressive, they can be difficult gence (AI) to go through large datasets and suggest potential to generate reliably from unstructured text, even for seemingly analogies. simple items, and doing so requires significant training and effort. Representations of more complex items, like biological With Crowds. Finding “outside-the-box” inspirations in very large organisms, can take tens of person hours per item (31). More idea repositories such as the web is difficult, because without concerning from our standpoint, automatic extraction of rela- guidance, people select domains to search based on surface tional representations across domains remains a difficult open similarity to the problem’s domain (similar to the challenge in problem (41). Fixation above). To address this challenge, we explored a two- Conversely, recent advances in data mining and information stage process (38) in which crowds first identify domains of retrieval rely on regularities in the surface attributes of language expertise that are distant from the initial problem but relevant (e.g., word co-occurrence, parts of speech) (42) to calculate simi- enough to inspire useful and nonobvious solutions. For exam- larity measures, for example word embeddings (43), vector-space ple, we rerepresented the problem of fitting more plugs into a models like latent semantic indexing (44), and probabilistic topic power strip with its schematic representations, e.g., fitting objects modeling approaches like latent Dirichlet allocation (45). While of different sizes into a container. We then asked crowd work- these approaches scale well on raw text and excel at detecting ers to use the schematic representations to recommend relevant surface similarity, they are often unable to detect deep relational domains of expertise. For example, they were asked to “suggest similarity between documents whose word distributions are dis- three types of experts who might provide useful or interest- parate. This problem is especially acute when the source and ing perspectives in solving [the schematic problem] and explain target domains are different. why,” resulting in relevant expert domains such as “plumber,” We introduce two insights that together make this problem “magician,” and “architect.” tractable for analogical innovation. First, rather than trying to

1872 | www.pnas.org/cgi/doi/10.1073/pnas.1807185116 Kittur et al. itre al. et Kittur Clearance Copyright through Inc. conveyed Center, permission Computing 46; for ref. Association purpose from product’s of to Machinery, permission the network with of neural Republished representations it mechanism. recurrent output how and and a (i.e., text train mechanisms product to to raw product related annotations take parts the these the of used and purposes We for?) works). the good about it is be What to (i.e., considered they that description 3. Fig. shown for) as good works), is it it how (i.e., what mechanisms (i.e., to product related parts the the of and purposes considered the they about that be description to product a crowds of Specifically, parts and mining. the models analogy annotated AI for develop in suited to traces for metrics embedded those similarity search mechanisms uses then and they and purpose analogs as the the workers label crowd approach and of this analogies level, traces high a behavioral At collects (46). that features search surface for beyond allows ideas, goes two these on based text, that unstructured ways alone. datasets in existing learn with possible models struc- be analogical machine-learning not of help would signals can rich that harvest crowdsourcing compu- to ture in possible analogies advances it useful that made is find have insight and second above, represent The described to tationally. experiments ways the representa- useful in mechanism are useful proved and which purpose trading tions, research words, the our other Specifically, whether extraction). (in investigated of scale ease at for with expressivity off reasoned can and useful that representations learned practically structural be retrieving coarse use for this can that reasoning, one analogies, idea analogical the structured exploits fully approach of problem the solve u prahfrcmuainlyfidn nlge from analogies finding computationally for approach Our eakdcodwrest noaeteprso h product the of parts the annotate to workers crowd asked We loigteue oepoedsatbtrlvn at fthe of parts the relevant using retrieved but inspirations thus distant given explore Participants generators), space. to motion-powered design user products, the other allowing bat- backup for (e.g., relevance teries structural maintaining diverse while returned meetup results approach and purpose–mechanism pods the shampoo while (e.g., cell apps); results highly relevant (e.g., returned less baseline results but random diverse the nondiverse chargers); but base- and standard cases relevant the phone 4, highly Fig. retrieved in illustrated line different As ran- 4). a a (Fig. or and baseline baseline, dom purpose retrieval information similar standard a a mechanism), with products cell (i.e., given representation a were approach purpose–mechanism Participants example, the phone. from for the drawn (47), charges inspirations also product that existing case an phone were redesign participants which to in task, asked ideation standard a through ated space. different similarity from the exemplars span provides system that clusters and the items results, distant diverse the provide clusters to represented spaces, are similarity dimensions vector both as Since mechanism). using (e.g., different dimension a another a having on (e.g., distant are dimension but one purpose) on similar match closely that items finding scale. representations at mechanism vector and take output purpose to and products’ mecha- trained of text and was purpose description network of product neural corpus recurrent raw a a Using pairs, 3. label Fig. nism in example the in ae faaoyfidn;ti eutsget htsprtn the separating that higher suggests even result this yielded mech- finding; background and analogy purpose of ignoring rates of explicitly Impor- similarity (49). but on abstracts anism based vector the pairs in using words ranking baseline the tantly, with a of all papers did of research than representations in precision analogies higher known substantially of discovery out?). enabled find abstract authors paper’s research paper’s scientific scheme. the a this of ( with did example annotated and an What out?), shows find work? 5 ( or Fig. it know?], it do or (Did authors do ings paper’s to the want did authors (How this paper’s might the How do ( work? research(ers)?], this with other by scheme help furthered be annotation can higher- ( an goals/questions (48): the yielded study elements the elements: This four what of new (“findings”). results two the found and incorporate (“background”) problem to level adapted lower-level was the on done is matching if is fruitful subproblems. papers more research be subproblems over to search likely lower-level analogical most predic- that the algorithmic suggesting typically for tions), to rationales are interpretable related generating paper (e.g., causally given Furthermore, and a subproblem). pervad- directly by (lower-level algorithms contributed lives the mechanisms audit digital (higher-level systems to their algorithmic citizens how ing in explore enabling harm might by of paper problem) risk research the a reduce also on example, to but dependent For description, hierarchically have other. product are cases each typical that legal a purposes or distinct than multiple text patents, more papers, only research complex not prod- more as of example, domains such For other domain artifacts structures. that the different possible is require for it might above, useful explored was innovation uct approach for schema Domains. 46 Complex anism ref. More to (see Extending conditions com- baseline condition the to blind in judges details). those by good with as pared rated twice ideas approximately many as generated approach purpose–mechanism h ytmsaiiyt epgnrt raieieswsevalu- was ideas creative generate help to ability system’s The by search analogical of type a enable representations These h eutn etrrpeettoso ae’ abstract paper’s a of representations vector resulting The scheme purpose–mechanism original the intuition, this test To PNAS i akrud[htohr(higher-level) other [What background ) | eray5 2019 5, February ii ups Wa pcfi thing(s) specific [What purpose ) hl h purpose–mech- the While | o.116 vol. iii | mechanism ) o 6 no. iv find- ) | 1873

COMPUTER SCIENCES PSYCHOLOGICAL AND COLLOQUIUM COGNITIVE SCIENCES PAPER Cell Phone Charger Case Cell phone case that acts as a secondary battery for your phone when charge is running low. It protects your phone while charging it. Simple design would allow easy replace- SEED

tablet well after the battery is dead.

OUR APPROACH SURFACE (TF-IDF) RANDOM

Flash Charge Carabiner Multi-adapter case Solar pool skimmer

USB tower with backup Cell phone battery Shampoo pods battery with GPS INSPIRATIONS

Human pulley-powered Cell phone case Dog meetup app generator suit with GPS

A case that tracks steps and Carry extra batteries! Have small solar panels on generates power using your Make a phone that is dura- the backside of the phone Fig. 4. Overview and excerpts of the ideation ex- movements ble and big enough to hold with cooling pads behind periment, comparing analogical inspirations found A case with multiple wireless extra battery life them to keep the phone cool using scalable coarse structural representations and while its in the sun, which power hubs that you can Allow the phone to die baseline methods. (Top) Seed product. Workers were plug in and charge your could charge the phone, for before the GPS goes, There- example, at the beach. asked to solve the same problem in a different phone wirelessly anywhere way. (Middle) Top three inspirations for each con- there is an outlet Make the case a little bigger dition. The TF-IDF baseline returns results from the IDEAS matter the battery life A case that draws power on one end so it can house a same domain, while our method returns a broader Have a universal charging small battery that holds from any light source near cord therefor you can charge range of products. (Bottom) Ideas generated by you, similar to the technolo- enough charge to fully users exposed to the different conditions. Repub- - charge a phone once and gy used in some wireless droid and apple lished with permission of Association for Comput- keyboards. then it is depleted and it can be recharged itself. ing Machinery, from ref. 46; permission conveyed through Copyright Clearance Center, Inc. higher-level and lower-level purposes of a paper may be an For example, the design of a kindergarten chair requires address- important extension to the purpose–mechanism approach. In a ing multiple constraints, such as its safety (preventing it from case study involving a domain expert in mechanical engineering tipping over or pinching fingers) and flexibility (making it easy to who had spent months seeking analogies from other domains move or store). Furthermore, each of these constraints could be (e.g., materials science, civil engineering, aerospace engineer- represented at multiple levels of abstraction, with more abstract ing), our approach provided twice as many papers that the expert levels (e.g., “safety”) potentially matching more analogs but run- identified as valuable, unexplored sources of new ideas compared ning the risk of including less relevant analogs to the target with a standard machine-learning baseline. These results suggest problem. Below, we discuss extending our general approach to that the strategy of obtaining intermediate structural represen- support multiple constraints and developing a computational sys- tations holds promise for enabling AI systems to suggest useful tem to help augment people’s exploration of multiple constraints analogies from a range of different real-world datasets, ranging and levels of abstraction. from relatively simple consumer product descriptions to complex research papers. With Crowds. We developed reliable crowd processes for de- composing complex multiconstraint problems into multiple- Complexity constraint schemas (e.g., safety vs. flexibility), manipulating the Exploring increasingly complex real-world problems quickly level of abstraction for each of those schemas (e.g., safety vs. reveals the challenges with assuming that problems involve only a pinching fingers), and then integrating the analogs found for single analogical schema. Instead, many product design and engi- each of those schemas into a solution that addressed each of neering problems involve multiple, often conflicting schemas. the relevant constraints (50). This research found that crowd

1874 | www.pnas.org/cgi/doi/10.1073/pnas.1807185116 Kittur et al. itre al. et Kittur judged were that chairs for designs create to all integrate constraints to the able that were of dolls workers crowd Weeble Using other cups. as inspirations, coffee these such buttress-stabilized flying ways, and novel themselves tip right in not constraints could that the will domains remote satisfy stackable, then in examples could is inspirational workers diverse movable, crowd find Other easily extremities). better- is protects and a that over, (e.g., into chair constraints chair) a well-defined design kindergarten comprising creative statement a structured design prob- (e.g., design open-ended lem ill-formed, an transform could workers through conveyed Inc. permission Center, 48; Clearance ref. Copyright from Associ- Machinery, of Computing permission large for with ation over Republished papers. queries research analogical complex support of datasets to scheme, annotation four-part fied 5. Fig. xml farsac ae btatanttdwt u modi- our with annotated abstract paper research a of Example napr ftepoutdsrpin(etnal o different for sizes. (“extendable different description of product bars the soap of part to a dish on soap ( a 6 adjust Fig. to ways ing core the relations understanding the for of crucial structure. are properties relational tar- that key description a involved the in entities product features specifying and given surface by which a its manner beyond in for focus geted system focus that a abstract a then introduced specify and ana- we can augmenting 51 designer to ref. a approach in of AI innovation, level our logical the on of manipulations Building abstrac- allowing abstraction. targeted constraints, and those constraints of tion specific multiple, both in on designers supporting focusing of importance critical the highlights AI. the With chair from better significantly distant to still led led were designs. that They over) that domain. tipping schemas problem inspirations prevent original safety), relevant (e.g., (e.g., more concrete to constraint constraint the the the away made of schemas. abstracted constraint nature that the schemas specific for higher-level abstraction with of Compared level the was rations original the inspira- to the exposure representation. from when problem generated than were original) received they and tions practical more (i.e., better n oteqeyadte ok o ace.Fg demonstrates 6 Fig. matches. for looks then accord- and corpus query the the abstracts to first ing system specific the designer’s particular, the In to needs. tuned matches (52) relevant knowledge analogically commonsense of for database large a search tionally thus are and Some abstraction the quantities. to spatial irrelevant dropped. other are “bars,” or like weights, words, the adjust heights, in can func- “size” that different products cleaning word to consider the or to abstract description solubility, product can water original designer color, the its Similarly, keeping as tion. dropping problem, such and the product”) others “personal abstract out (like they soap of Next, properties bars”). some soap of sizes o xml,cnie einrwoi neetdi explor- in interested is who designer a consider example, For inspi- the of usefulness the affected that factor important One u ytmte ssti ou-btatdqeyt computa- to query focus-abstracted this uses then system Our Top hsapoc fadesn opeiywt crowds with complexity addressing of approach This eosrtsoritrae einr rtfocus first Designers interface. our demonstrates ) PNAS ovydtruhCprgtCerneCne,Inc. Center, permission Clearance 51; Copyright Association through ref. conveyed of from Machinery, permission Computing for with Republished abstrac- of tion. levels and complex aspects multiple for with search problems analogical an supporting query, then ( system analogies ( using The problem problem finds the bases. a of knowledge of aspect common-sense aspect this abstract relevant then a and on focus to 6. Fig. | eray5 2019 5, February ebitasse htalw designers allows that system a built We Bottom | o h focus-abstracted the for ) o.116 vol. | o 6 no. | Top 1875 )

COMPUTER SCIENCES PSYCHOLOGICAL AND COLLOQUIUM COGNITIVE SCIENCES PAPER the abstracted query and shows example matches found for the scale can power AI approaches that can find more analogs in dis- soap dish problem—a knife rolodex with multiple slots for dif- tant domains than current machine-learning approaches, which ferent sizes and a telescopic frame adjusting to different phones, are highly influenced by surface features. This computational which the designer might adapt to the soap dish. approach can play a valuable role by identifying inspirations The system was evaluated across a range of focused query for human problem solvers by performing a first pass of sift- scenarios from seeds sampled from Quirky. For example, when ing through vast idea repositories of millions (e.g., patents) or designing a camping coffee maker, a designer might be interested billions (e.g., the web) of items. in separately exploring the inspiration space around cooking We have explored only a small number of the possible config- without electricity or around notifying the user when the food urations of people and machine processors for boosting analogi- is done. Significant benefits were found for both the focusing cal discovery. These explorations (including several unsuccessful aspect of the system (which provided more relevance compared ones not discussed here) suggest several principles about which with wholesale matching) and specifying the level of abstrac- configurations lead to successful outcomes and what challenges tion (which returned more distant inspirations compared with remain to be addressed. Specifically, we have found that humans traditional machine-learning baselines). Formative studies with are needed in several points in the process—vetting candidate designers suggested that they found these aspects of the tool use- inspirations retrieved by AI, applying them to the problem at ful in more systematically engaging with aspects of the problem hand, and integrating across multiple potentially conflicting inspi- and abstractions, leading them to consider possibilities they had rations. This brings up questions that require future study about not previously of. where and when in a distributed analogical innovation pipeline to deploy what kinds of expertise for maximum effectiveness. Discussion Domain expertise can have positive effects, for example enabling In summary, this paper described ways of distributing the process deep understanding of problem constraints and ways to adapt new of analogical innovation across many human and machine infor- solutions. On the other hand, it can also have negative effects, mation processors, mediated through a computational system including greater fixation on defining the problem, finding solu- that employs each to its maximum benefit. A series of crowd- tions, and evaluating their suitability. We speculate that adapting sourcing processes and AI systems demonstrate the promise and integrating analogical inspirations to solve complex R&D of scaling up serendipity in a distributed way. These processes problems is likely to require deep knowledge of the various con- and systems overcome three key challenges to distributing anal- straints of a domain (e.g., business, technical, operational). In ogy: (i) fixation, allowing humans and machines to look beyond contrast, the interstitial steps of the pipeline, including abstraction surface features to find structurally similar analogs in distant and finding domains and analogies, may be less sensitive to the domains; (ii) scalability, scaling up the process of finding analogs need for domain knowledge or in fact might benefit from includ- in large idea repositories; and (iii) complexity, supporting mul- ing people outside of the domain to increase diversity. However, tiple constraints and levels of abstraction. See Table 1 for a our research has not yet tackled the role of human expertise. summary of selected findings. The role of AI in the innovation pipeline is also a rich subject Increasing the efficiency of the methods used to use large for future study. In the short term, additional training data for numbers of people in the analogical innovation process through purpose–mechanism representations could likely improve the hit distributed analogy could have a number of important benefits. rate of relevant analogies found and, with minor modifications, First, increasing the number of people involved increases the be useful even for complex domains such as scientific literature capacity to find and use analogies across many domains. Second, and patents. However, we believe there is a long way to go to by distributing the steps to different people, alternative innova- support the true hierarchical, complex, and interrelated nature tion paths could be explored effectively and in parallel. Third, of real-world problems. Increasing the expressivity of represen- disaggregating the innovation pipeline opens up participation to tation to model hierarchies of subproblems and multiple levels more types of people; one need not be an expert in a problem of abstraction among them could have significant additional domain, for example, to find analog solutions in remote domains benefits. For example, being able to automatically model and that experts have already proposed. Fourth, in contrast to inno- explore the space of multiple subproblems could vastly expand a vation contests such as InnoCentive that recruit many people designer’s capabilities to explore a problem space, similar to how to innovate but reward only a few contest winners (53), the approaches in computer-aided design can already do so for more sequential method proposed here allows people to build on each formalized domains such as material and shape. Furthermore, other’s work and preserves the value of labor from the majority many scientific analogies rely on systems of interconnected rela- of participants. tions rather than independent relations. For example, the Bohr Involving AI in the process adds complementary benefits. model of the atom is similar to the structure of the solar sys- While people are unparalleled in their ability to induce and apply tem not only because the electrons orbit the nucleus but also deep relational schemas from unstructured real-world data, they in their relative size compared with the nucleus; meanwhile, the are limited in their ability to broadly search across huge repos- high speed of electrons (vs. planets) suggests that other factors itories of potential analogs. Our research has shown how using such as relativity might be in play. Supporting such interrelated coarse structural schemas (i.e., purposes and mechanisms) at relations could greatly increase the expressivity and power of

Table 1. Summary of selected results Challenges Findings Fixation Schemas help crowds find more analogies, especially far analogies (not sharing surface features). Crowds are able to reliably generate high-quality schemas. Scalability Schemas help crowds identify a more diverse set of domains to explore, which in turn results in more creative solutions. Coarse representations that can be learned and reasoned with at scale can enable AI systems to suggest useful analogies from very large, unstructured corpora. Complexity Crowds can generate multiple constraints at different levels of abstraction. The level of abstraction influences the quality of the solutions (abstract domain, concrete constraints worked best). AI can help users generate targeted abstractions. Analogies suggested by the system achieved higher relevance and domain distance (far analogies).

1876 | www.pnas.org/cgi/doi/10.1073/pnas.1807185116 Kittur et al. 30. 29. 28. 27. 26. 25. 24. 23. 22. 21. 20. 19. 18. 17. 16. 15. 14. 13. 12. 11. 10. itre al. et Kittur these by inspired inno- be boosting may in researchers effective future that distributed be hope of may We that configurations vation. cognition possible AI the and of human subset explores here small described a research only persistent, the However, on-demand, one. more reliable a and toward innovation of process the algorithmic any datasets. for existing those account from in to biases learn and to datasets noisy AI but enabling large-scale involve in to advances need research likely anal- will further distributed expressivity in to increases approaches However, ogy. computational and crowd both 9. 8. 7. 6. 5. 4. 3. 2. 1. F ,e l 21)Temaigo na”ad“a” h mato structuring of impact The “far”: and “near” of meaning The (2013) al. et K, Fu innovation: Increasing (2007) AB Markman KL, Wood E, Clauss JP, Laux JS, Linsey analogi- to distance analogical of relationship The (2007) CD Schunn role BT, critical Christensen The product NASA: at a boundaries in knowledge Dismantling innovation (2018) and H Lifshitz-Assaf brokering Technology (1997) RI Sutton A, transfer. Hargadon analogical and induction Schema (1983) KJ Holyoak solving. problem ML, of Analogical Gick case (1980) The KJ cognition: Holyoak to ML, approach Gick vitro vivo/in in in The (2001) effectiveness I problem-solving Blanchette K, and Dunbar Marginality (2010) KR Lakhani LB, Jeppesen Gr S aao 19)Uigaaoyt xedtebhvorsaesaein space state behaviour the extend to analogy Using (1998) V Kazakov JS, Gero design-by-analogy. for method A WordTrees: (2008) KL Wood AB, and Markman J, Presentation Linsey innovation: Increasing (2008) AB Markman KL, WordTree Wood the of JS, study A Linsey analogy: by Design (2012) KL Wood AB, Markman JS, Linsey analogy. analogies: Representing in (2006) T Kurtoglu representation KL, Wood AB, and Markman JT, Modality Murphy JS, Linsey (2008) AB transfer: Markman in KL, similarity Wood of JS, problem-solving. On roles Linsey (1945) The LS Lees (1993) K, Duncker KD Forbus MJ, Rattermann D, Gentner (2000) SD Savransky (1961) WJJ creativity. Gordon and analogy, Design, design: (1997) innovative AK for Goel analogies of pitfalls and benefits the On (2011) al. et J, Chan GnnrD adr 18)Aaoia eidn:Ago ac shr ofind. to hard is match good A reminding: Analogical (1985) R Landers D, Gentner (1996) P Thagard KJ, Holyoak KJ, Holyoak Kepler. Johannes of works the in creativity and Analogy (1997) al. et D, in Gentner Kepler. space Johannes of more case The make discovery: scientific to in origami Analogy (2002) use D Gentner engineers BYU 2013) 12, (December M new during Monk thinking analogical of choice. value and consumer influence The and (2002) P Moreau comparison DW, Dahl Structural (2010) J Loewenstein analogies. of AB, science the Markman and Mathematics (2003) C Pask (1966) MB Hesse nsmay ehv rsne n ahtwr transforming toward path one presented have we summary, In fSde,Sde) p113–143. pp Sydney), Sydney, of design. 13.1407.1–13.1407.14. pp DC), Conference Washington, Annual ASEE 2008 the of Proceedings method. design-by-analogy Conference WordTree IDETC/CIE the of evaluation re-representation. problem for method Conference Engineering Information in and Computers & Conferences Technical Engineering Design International innovation. of probability the Increasing Manuf Anal Des Eng Intell Artif soundness. inferential from retrievability Separating Solving Problem Inventive of York). New of modality and commonness, distance, analogical examples. on based output. performance design Ideation on analogy of distance of effect Des the and databases design 145–159. pp York), New Conference Engineering and Computers in & method. Information Conferences Technical design-by-analogy Engineering Design a International towards 2007 ASME experiments of trilogy A design. engineering of case The 35:29–38. structure: preinventive and function cal innovation. open in identity professional of firm. development 15:1–38. analogy. search. broadcast Cybernetics and 607–613. Man pp Systems, York), on New Conference International the of Proceedings MA). Cambridge, Press, 403–459. (MIT pp DC), Washington, Association, Psychological (Am J Processes Vaid and SM, Structures Smith Conceptual of Investigation An Thought: ative Reasoning Based space. ideation. product Psychol Consum J IN). Dame, 135:021007. h al Universe Daily The opttoa oeso raieDsg IV Design Creative of Models Computational rnsCgi Sci Cognit Trends ehDes Mech J d ann ,Nresa J(pigr otn,p 21–39. pp Boston), (Springer, NJ Nersessian L, Magnani eds , ra Sci Organ 20:126–137. oesadAaoisi Science in Analogies and Models akRes Mark J d c Q Sci Adm niern fCetvt:Itouto oteTI Methodology TRIZ the to Introduction Creativity: of Engineering . yetc:TeDvlpeto raieCapacity Creative of Development The Synectics: 133:081004. . 5:334–339. A o fMcaia nier,NwYr) p265–282. pp York), New Engineers, Mechanical of Soc (Am 21:1016–1033. 39:47–60. CCPes oaRtn FL). Raton, Boca Press, (CRC 42:716–749. 22:85–100. ehDes Mech J Aeia oit fMcaia Engineers, Mechanical of Society (American rceig fIECCE20 SE2006 ASME 2006 IDETC/CIE of Proceedings etlLas nlg nCetv Thought Creative in Analogy Leaps: Mental d c Q Sci Adm sco Monogr Psychol EEExpert IEEE A o o niern Education, Engineering for Soc (Am 134:041009. Ui or aePes Notre Press, Dame Notre (Univ onPsychol Cogn d eoJ,MhrM (Univ ML Maher JS, Gero eds , 63:746–782. onPsychol Cogn mJPhys J Am rceig fte2008 the of Proceedings 12:62–70. 58:i–114. rceig fthe of Proceedings 25:524–575. 71:526–534. 12:306–355. d adTB, Ward eds , onPsychol Cogn e Cognit Mem (Harper, Mech J Model- (IEEE, Cre- 36. 35. 34. 33. 32. 31. 53. 52. 51. 50. 49. 48. 47. 46. 45. 44. 43. 42. 41. 40. 39. 38. 37. ojnto ihteIre ainlCbrBra ntePieMinister’s Prime the in Institute. Bureau Research in Cyber Industrial Center the National Security and Israel Office; Cyber Jerusalem the of with University conjunction Hebrew the 1764/15; Grant grant; Mellon’s Foundation Alon Science an Carnegie Israel Google; CMMI-1535539; IIS- Bosch; CHS-1526665, and and initiative; Grants Web2020 support IIS-1217559, NSF their by IIS-1816242, for supported was 1149797, community Serendipity work (NSF) This Cybernetic Sciences feedback. Revisiting helpful of Collaboration: Academy National and and Creativity on important quium and challenging ACKNOWLEDGMENTS. of set faces. society innova- diverse our finding that the problems of to process solutions the tive accelerate further and examples Y ,Ktu ,KatR 21)Dsrbtdaaoia dagnrto:Inventing generation: idea analogical Distributed (2014) RE Kraut A, Kittur L, Crowdsourcing Yu Cascade: (2013) JA Landay DS, Weld D, Edge G, Little LB, Chilton turk. mechanical with studies user Crowdsourcing (2008) B Suh EH, Chi A, catego- Kittur phenomenon biology The (2014) for RB basis Stone functional S, A Immel (2002) F, KL Berthelsdorf Wood R, Arlitt S, Szykman DA, Mcadams RB, in Stone creativity J, Fostering Hirtz DANE: (2011) J Yen AK, Goel M, Helms B, Wiltgen S, Vattam LkaiK(08 noetv.o A.HradBsns colCs o 608-170 No. Case School Business Harvard (A). InnoCentive.com (2008) K Lakhani infrastructure. knowledge in investment large-scale A : (1995) DB Lenat needs. design specific for mining Analogy (2018) al. et K, Gilon multi- with generation idea analogical Distributed (2016) A Kittur RE, Kraut L, Yu representa- word for vectors Global Glove: (2014) C Manning R, Socher J, Pennington for system initiative mixed A (2018) A Kittur D, Shahaf T, Hope JC, Chang system. J, support design Chan engineering mechanical ideal anal- the Toward through (2002) innovation DG Accelerating Ullman (2017) D Shahaf A, Kittur J, Chan T, Hope allocation. Dirichlet Latent by (2003) rep- MI Indexing Jordan word (1990) AY, Ng R of DM, Harshman Blei estimation TK, Landauer Efficient GW, (2013) Furnas ST, J Dumais Dean S, Deerwester G, Corrado K, Chen T, Mikolov NLP. in metaphor of Models (2010) E Shutova framenet Out-of-domain (2017) I engine: Gurevych structure-mapping T, Martin The I, (1989) Kuznetsov D S, Gentner Hartmann KD, Forbus B, Falkenhainer analogy. for framework theoretical A Structure-mapping: (1983) D Gentner crowd in thinking box” the- “outside- Encouraging (2016) RE Kraut A, Kittur L, Yu crowds. with ideas analogical for Searching (2014) RE Kraut A, Kittur L, Yu ihcrowds. with 1999–2008. pp York), New (ACM, Systems puting Systems creation. Computing taxonomy in 453–456. pp Factors York), New Human (ACM, on al. et Conference M, SIGCHI Burnett the of ceedings design. inspired biologically of support in Des Mech framework J computation human A rizer: efforts. previous evolving and Reconciling design: engineering 115–122. pp London), (Springer, design. inspired biologically through and HradBsns col abig,MA). Cambridge, School, Business (Harvard ACM ’18 121:1–121:11. CHI pp York), Systems, New Computing (ACM, in MC Shraefel Factors E, Human Cutrell on Conference CHI 2018 1236–1245. pp Computing Social & Work Cooperative constraints. ple 1532–1543. pp PA), (EMNLP) Processing tion. ’18 CSCW Computing, Monroy-Hern Social K, 31:1–31:21. pp Karahalios & G, Work Fitzpatrick Cooperative eds Computer-Supported on papers. ence research between analogies finding Des Eng Res ’17 KDD Mining, Data and Discovery Knowledge mining. ogy 1022. analysis. semantic latent 2013. 16, January posted Preprint, arXiv:1301.3781. space. vector in resentations 688–697. pp PA), Stroudsburg, Linguistics Linguistics, Computational Computational for Association the of ing pp 1, Vol PA), Papers Stroudsburg, Long Linguistics, Computational Linguistics: for 471–482. (Assoc Computational A Koller for P, Association Blunsom the of ter labeling. role semantic examples. and Algorithm 7:155–170. Computing, Social & Work expertise. Cooperative ’16 Computer-Supported CSCW of on domains Conference ACM identifying through innovation Systems, Computing in Factors Human on Conference ’14 ACM CHI Annual 32nd the of ings ’14 1254. CHI Systems, puting rceig fte21 ofrneo miia ehd nNtrlLanguage Natural in Methods Empirical on Conference 2014 the of Proceedings 38:33–38. d rwtrS okunA(C,NwYr) p1225–1234. pp York), New (ACM, A Cockburn S, Brewster eds , d jr ,KntnJ(C,NwYr) p1214–1222. pp York), New (ACM, J Konstan P, Bjørn eds , 136:111105. 13:55–64. rceig fte2r C IKDItrainlCneec on Conference International SIGKDD ACM 23rd the of Proceedings rceig fteSGH ofrneo ua atr nCom- in Factors Human on Conference SIGCHI the of Proceedings d økrS rwtrS ads ,BadunLfnM akyWE Mackay M, Beaudouin-Lafon P, Baudish S, Brewster S, Bødker eds , rceig fte1t C ofrneo Computer-Supported on Conference ACM 19th the of Proceedings dMro AscfrCmuainlLnusis Stroudsburg, Linguistics, Computational for (Assoc Y Marton ed , rceig fteSGH ofrneo ua atr nCom- in Factors Human on Conference SIGCHI the of Proceedings PNAS rceig fte1t ofrneo h uoenChap- European the of Conference 15th the of Proceedings d rwtrS okunA(C,NwYr) p1245– pp York), New (ACM, A Cockburn S, Brewster eds , h uhr hn h rhrM ake Collo- Sackler M. Arthur the thank authors The mScIfSci Inf Soc Am J ri Intell Artif | eray5 2019 5, February 41:1–63. d rwtrS okunA(C,NwYork), New (ACM, A Cockburn S, Brewster eds , 41:391–407. einCetvt 2010 Creativity Design rceig fte2s C Confer- ACM 21st the of Proceedings rceig fte4t nulMeet- Annual 48th the of Proceedings ne ,NcosJ(C,NwYork), New (ACM, J Nichols A, andez ´ AM e ok,p 235–243. pp York), New (ACM, d hn ,KenP(so for (Assoc P Koehn J, Chang eds , | o.116 vol. rceig fte19th the of Proceedings ahLanRes Learn Mach J e n Des Eng Res d ar ,NgiY Nagai T, Taura eds , rceig fthe of Proceedings | d aaaM, Lapata eds , o 6 no. d e A, Dey eds , 13:65–82. Commun Proceed- onSci Cogn | 3:993– 1877 eds , Pro- ,

COMPUTER SCIENCES PSYCHOLOGICAL AND COLLOQUIUM COGNITIVE SCIENCES PAPER