
Patterns in the Emergence of Nanotechnology: The Case of Fullerenes

Sarah Kaplan* University of Toronto, Rotman School 105 St. George Street Toronto, ON, M5S 3E6, Canada 416-978-7403 [email protected]

Keyvan Vakili University of Toronto, Rotman School 105 St. George Street Toronto, ON, M5S 3E6, Canada 607-708-4960 [email protected]

This draft: April 11, 2011

Preliminary. Grace Gui, Sara Sojung Lee, and Neal Parikh provided much-appreciated research assistance. We are grateful to Michael Lounsbury for the suggestion to study fullerenes and nanotubes within the broad domain of nanotechnology, to Juan Alcácer for help and advice in creating the initial data set, and to Hanna Wallach for her early test-driving of topic modeling on these data. This work is partially supported by the Mack Center for Technological Innovation at the Wharton School, University of Pennsylvania (where Kaplan is a Senior Fellow) and the Canadian Social Sciences and Humanities Research Council under grant # 410-2010-0219. The usual disclaimers apply.

*Corresponding author.

Patterns in the Emergence of Nanotechnology: The Case of Fullerenes

Abstract:

We propose a new methodology – topic modeling – to evaluate the emergence and interpretation of new technologies and use this approach to examine trends in the field of nanotechnology. We treat patents not as measures of innovation but rather as historical documents written by particular human beings, in particular places, at particular times. We argue that studying the language in patents provides a closer reading of interpretations of an emerging technology than can be obtained through indicators such as USPTO classifications. By analyzing the text of the abstracts of 2,826 patents, we find that interpretations of what this “general purpose technology” was and how it might be used changed over time. Our descriptive results contrast with extant literature on sources and value of inventions.

1. Introduction

There is increasing interest in understanding technologies and markets in their emergent or nascent phases (Aldrich & Fiol, 1994; Santos & Eisenhardt, 2009). While we know a great deal about the development of technological trajectories once competing new technologies emerge (Christensen & Rosenbloom, 1995; Dosi, 1982; Sahal, 1981; Utterback, 1994; Nelson & Winter, 1977), we know much less about what goes on in the era of ferment where variation has been assumed to be due to forces (Tushman & Rosenkopf, 1992).

Traditional research in science and technology studies (Bijker, Hughes & Pinch, 1987) and in the management of technology (Henderson, 1995; Tushman & Rosenkopf, 1992; Utterback, 1994) has shown that social processes are at work in the construction of technologies over the course of the technology life cycle but has offered few insights into how the technologies emerge initially. More recently, scholars have begun to examine the potential for social factors to shape the quantity and quality of technological variations in the era of ferment. Some attribute variation to the backgrounds of individual inventors or organizations (Fleming, 2002; Fleming & Singh, 2010). Others have suggested that institutional actors such as social movements, the media, communities or firms affect which technologies are funded and explored (Kaplan & Radin, 2011; Kennedy, 2008; Weber, Heinze & DeSoucey, 2008; Wry, Greenwood, Jennings & Lounsbury, 2010).

Implicit in these models is an assumption that such social factors as career backgrounds or external forces shape inventors’ and entrepreneurs’ interpretations and choices about which technologies to pursue. In these highly uncertain settings, actors’ cognitive frames become the basis for action. Because new technologies are inherently “equivocal” (Weick, 1990: 2), they are subject to sensemaking by actors (scientists, managers, regulators, patent examiners, etc.) who need to make choices about how to act. These interpretations and actions can, in a co-evolutionary manner, shape the direction of the technical change itself (Kaplan & Tripsas, 2008). While cognition is receiving increasing attention in management research in general (Elsbach, Barr & Hargadon, 2005; Walsh, 1995), research on the emergence of new technologies has been largely silent about cognition’s role (Kaplan & Tripsas, 2008; Nightingale, 1998).

Filling this gap requires addressing important methodological challenges. Tracking the emergence of a new technological field risks falling into a “Whiggish” progressivist history in which the past is retrospectively understood based on the outcomes that we observe (Lamoreaux, Raff & Temin, 2004). We may miss the paths not taken, the “flotsam and jetsam” of history (Schneiberg, 2007), that disappear once a particular path is followed. Or, we may impose categories developed later in time on interpretations occurring at the period of emergence.

These challenges dog large-scale studies using data such as patents or scientific publications to track knowledge flows systematically over time. These studies have drawn conclusions about technology evolution and the emergence of technological novelty based on proxies such as US Patent and Trademark Office (USPTO)-designated technology classifications (e.g., Fleming & Sorenson, 2004) or key words (Azoulay, Ding & Stuart, 2007) that are potentially problematic. USPTO patent classes are pre-established categories used to facilitate patent examiners’ searches for prior art (Benner & Waldfogel, 2008) and therefore may substantially lag the emergence of new technological arenas. The USPTO does, from time to time, update its classifications and then reclassify existing patents according to the new system, but these classifications risk ex post interpretation of the prior emergence of a technological arena. They may also miss truncated paths in which a new idea emerges but dies off quickly. So far, the field has not had a means for identifying new technological ideas as they emerge or for tracking their evolution over time.

In this paper, we introduce a new method – topic modeling of text – for addressing these challenges and show how its application in the analysis of patent texts provides new avenues for avoiding retrospective bias while analyzing the emergence of new technologies. The idea behind the use of textual (or content) analysis is that language is tightly connected with human cognition (Duriau et al., 2007). Topic modeling uses Bayesian statistical techniques to determine the meanings of words by looking at co-presence with other words in the same document or block of text. This method is becoming established in computer science research, but is only beginning to emerge as a technique for the social sciences (Ramage et al., 2009; McCallum et al., 2006; McCallum et al., 2007). It generates categories (topics) from the texts themselves rather than imposing pre-determined categories on the information. This approach allows an examination of interpretations as they develop contemporaneously and provides a systematic way of tracking the evolution of interpretations over time, even over very large bodies of texts.

We illustrate this method’s potential through a descriptive analysis of the emergence and evolution of a new nanotechnology: fullerenes. Specifically, we use the texts from the abstracts of all fullerene-related patents to examine how this technology has been understood over time. We treat patents not as measures of innovation (Jaffe and Trajtenberg 2002, Griliches 1998) but as historical documents written by scientists at particular periods of time (Alcacer & Gittelman 2006). Topic modeling provides us with a direct method for tracking the interpretations embodied in the texts of patents.

Using this method, we develop a portrayal of the evolution of fullerenes and nanotubes that contributes to our knowledge about the nature of technological emergence and how to measure it. We show that the understanding of fullerenes and nanotubes evolved over time, from an early focus on methods to a later focus on different types of applications. The sources of these inventions, both in terms of the types of inventors and their geographic location, tended to be from outside the core of technology development. That is, the inventors of the important breakthroughs tended not to have much targeted experience in the domain and tended not to be located in the standard technology clusters (Silicon Valley and Route 128). Only later ideas, particularly those about specific applications, tended to be generated within those clusters.

Using topic modeling allows us to disaggregate measures of the creativity of the invention and measures of its value (where in the past, the calculation of the breakthrough nature of an invention and of its value were both based on a count of subsequent citations to the patent). In doing so, we find that topic-generating patents (those that initiate a new idea or theme) were more likely to have higher forward citations than follow-on patents, yet not all highly cited patents were those that created breakthroughs. Controlling for a patent’s degree of cognitive breakthrough, we find that social factors such as the number of inventors listed on the patent contributed to its subsequent number of citations, factors that might be less associated with a patent’s “value” than with its position in a social network. These analyses contribute a more granular understanding of technology emergence and evolution using the words of the inventors themselves to understand the technology’s interpretation over time.

2. Nanotechnology as an emerging technological field

This paper focuses on nanotechnology, or more appropriately, nanotechnologies (as there are many). Nanotechnologies are “very small” (a nanometer = 1 billionth of a meter) technologies that often have different properties at the nanoscale than they do at larger sizes.

Explorations in this field are occurring across a wide range of disciplines (e.g., , , ) as well as in the medical sciences, and . While nanotechnology is widely recognized as involving an increasing ability to manipulate matter, there is no consensus as to how this will happen or where such inventions could be applied. As a result, nanotechnology is bursting with meanings.

We examine the evolution of one particular nanotechnology: fullerenes (and the chemically related nanotubes). Prior studies of the emerging field of nanotechnology have found fullerenes and nanotubes to be a useful site for analysis (Wry et al., 2010). Figure 1 shows images of each of these related technologies, both of which have the chemical formula C60 (or Carbon 60). Buckminsterfullerenes (also known as fullerenes or “buckyballs” because of their geodesic dome shape) were discovered in 1985 by Dr. , and Harold Kroto (for which they won the in 1996). Carbon nanotubes are in the fullerene “family” and their discovery is attributed to of NEC Corporation in 1991. Fullerenes and carbon nanotubes can be conceived of as “general purpose technologies” (Bresnahan & Trajtenberg, 1995; Helpman, 1998) because they can be applied in a very broad range of potential applications from medicine to automotive to . Methodologically, fullerenes and nanotubes are appropriate for the application of topic modeling because they are the subject of substantial patenting and scientific publication over time.

-- Insert Figure 1 about here --

To date, very little commercialization of anything technologically significant has taken place, thus we must rely on patents as indicators of potential commercial applications.

These patents show that revolutionary new applications are being developed (e.g., implantable medical devices to control insulin levels for diabetics, more targeted treatments for cancer, structural materials for combat and sports gear, the material for super lightweight batteries and new computing processors that provide quantum leaps in speed and storage capability). Because of this range of potential applications, researchers and managers in organizations have broad purview to guide the of the technology in many directions. As a result, their interpretations of what the technology is and how it might be used have consequences for the development and evolution of the technology. Research and development (and ultimately commercialization) resources will be placed in some areas and not others depending on the interpretations and choices these researchers make.

3. Methodology and sample

3.1. Patents as a means for assessing a knowledge space.

The use of patents to track innovation has a long-established history (Jaffe & Trajtenberg, 2002). Researchers have used counts of patents as proxies for innovative effort (Kaplan, 2008; Schmookler, 1972) as well as innovative output (often when using the counts of the citations to the focal patents to proxy importance) (Griliches, 1990; Jaffe & Trajtenberg, 2002). Given that patents codify inventions, it also seems reasonable to use them to capture knowledge and ideas.1 Many scholars have followed this path. These works, for example, measure knowledge spillovers from public and private scientific research by counting technologically-related patents (as measured by patent classifications) granted in nearby geographic locations (Alcacer & Chung, 2007; Furman, Kyle, Cockburn & Henderson, 2005; Jaffe, 1986; Jaffe, Trajtenberg & Henderson, 1993). Studies have also used patents to capture the ties that contribute to knowledge flows by examining networks of co-authorships, citations and patent classes (e.g., Amburgey, Dacin & Singh, 1996; Fleming & Sorenson, 2004; Singh, 2005; Sorenson & Fleming, 2004; Sorenson, Rivkin & Fleming, 2006). Yet, in all of these cases, patents have been used as proxies of knowledge and innovation, and often recognized as imperfect proxies, at that. Most studies that use patents as variables acknowledge the myriad reasons they cannot fully or accurately capture the underlying constructs they propose to measure.

1 Scientific publications are also a means for tracking scientific ideas. Further exploration of the emergence of the field of fullerenes and nanotubes would take into account such publications as well. However, as scholars have identified a correlation between scientific publishing and patenting (Agrawal & Henderson 2002; Murray 2002), an analysis of patenting patterns is a reasonable place to start.

Benner and Waldfogel (2008) offer a detailed critique of the use of patent classifications to proxy location in technological space. One concern is that most scholars focus on the primary three-digit patent classification even when many patents are classified in multiple classes and sub-classes. Further, in the case of a field such as nanotechnology, as they note (Benner & Waldfogel 2008: 1558), this classification (class 977) is only a cross-reference class and therefore, by the rules of the Patent Office, cannot be included as a primary patent class. Thus, an emerging field such as nanotechnology would not be captured in primary three-digit classification measures that tend to be used in technology management studies, despite the USPTO having created a cross-reference class to facilitate search.

But, even if these challenges were addressed (by including all patent classifications, for example), the use of such classifications may not be suitable for tracking the emergence of new technologies, as they “reflect the uneven growth derived from the first general scheme created in 1900” and revised many times since then (USPTO 2005 p. 1), they are “primarily designed to assist patent examiners performing patentability searches” (p. 1), and the classification of patents into these categories is inherently the “subjective” assessment of examiners based on their interpretations of the claims and the rules for making classification decisions (p. 9).2

Thus, as Alcacer and Gittelman (2006) remind us, patents are historical documents produced by inventors, prosecuted by patent attorneys and evaluated by patent examiners. This suggests that rather than count patents or their citations to evaluate knowledge, it should be useful to focus our attention on what the authors of the patents write in these documents. This allows the patents to represent contemporaneous interpretations of the technology, which avoids problems of retrospection in tracking the emergence of new technological arenas. Scholars have recognized that patent texts such as assigned keywords can be used to assess how patentable a technology might be (Azoulay, Ding & Stuart 2007) or the degree of closeness of different scientists in scientific space (Jaffe 1986).

3.2 Methodology: text analysis of patents

Following this line of thinking, the methodological move made in the study reported here is to treat patents not as proxies or measures but rather as historical documents written by particular human beings, in particular places, at particular times. Studying the language in the documents should give a closer reading of the understanding of an emerging technology. The idea behind the use of textual (or content) analysis – an idea that has recently been rearticulated by Duriau and colleagues (2007) – is that language is tightly connected with human cognition. This is the Whorf-Sapir hypothesis (Sapir, 1944; Whorf, 1956) from which many content analysis techniques are drawn. More specifically, groups of words can be seen to represent important themes (Abrahamson & Hambrick, 1997; Huff, 1990).

2 Note that the Handbook of Classification (USPTO 2005) describes the inherently subjective manner in which patents are classified by their staff. “To ensure uniform classification of patent documents and to provide for “infringement” type searches, the claimed subject matter interpreted in light of the total disclosure contained in a patent, i.e. the “claimed disclosure”, has been selected as the primary informational content of the patent that receives “mandatory classifications”. This narrows down to manageable proportions the subjective judgments that must be made relative to the uniform placement of patents.” (p. 9) “The present USPC system reflects the uneven growth derived from the first general scheme created in 1900. Classification before 1900 closely paralleled economic groupings of the period with informal and arbitrary subdivisions to provide manageable size collections. Relationships among such patent collections, if they existed, were lost in the alphabetical ordering of titles assigned to each of the collections. Search notes, class and subclass definitions, or schedule explanations either did not exist or, at most, were primitive. While all of the present major groupings have been “revised” since 1900, each class reflects the theories of classification that existed at the time it was reclassified.” (p. 1) Classification orders have been issued 1905 times in the history of the USPTO (through February 2011). Since 2003, an average of 10 classification orders have been made each year (http://www.uspto.gov/patents/resources/classification/orders/index.jsp, accessed March 29, 2011). The patent office does provide information on the dates each patent class was established at: http://www.uspto.gov/patents/resources/classification/numeric/index.jsp, accessed March 29, 2011.

We measure interpretations using the text in the abstracts of patents to understand how different actors describe what the technology is and could be. Doing this analysis over large numbers of patents requires automated text analysis procedures. Where the concern is specifically about identifying themes and trends, newly developed computer science text analysis techniques such as topic modeling will be most appropriate (Blei, Griffiths, Jordan & Tenenbaum, 2004; Blei & Lafferty, 2007; Mimno, Wallach & McCallum, 2008; Ramage, Rosen, Chuang, Manning & McFarland, 2009; Wallach, 2008). The goal of topic modeling techniques as developed in the computer sciences is unsupervised analysis of text designed both to develop a model that would predict future texts and to provide a representation of the topics in an existing corpus (Chang, Boyd-Graber, Wang, Gerrish & Blei, 2009; Hall, Jurafsky & Manning, 2008). For our purposes, we focus on this second goal – representing topics in a body of text – and we will use these data to track the emergence and evolution of meaning over time.

The topic modeling approach we use is based on the Bayesian statistical technique of latent Dirichlet allocation, which determines the meanings of words by looking at co-presence with other words in the same document or block of text (after removing “stop words” such as “the,” “and,” “that,” or “were”) (Blei, Ng & Jordan 2002).3 The approach assumes that there is a latent set of topics within each document and that any word appearing in the document is attributable to one of these topics. The same word may have different meanings depending on its association with other words in a document. Given a particular corpus of texts, topic models infer a set of topics and identify the words associated with each topic. Practically speaking, the algorithm identifies these themes (or topics) through the association with semantically related words in blocks of text and then assigns weights to each theme for each block of text. The output is a list of topics and vectors of the weight of each topic in each document. For our analysis, we used the publicly available “Stanford Topic Modeling Toolbox” developed by the Stanford Natural Language Processing Group and made available in 2009 (Ramage et al 2009).4 While topic modeling was initially developed for computer science applications, e.g., for improving Internet search algorithms, the Stanford toolbox was adapted specifically with the needs of social scientists in mind.

This method allows the researcher to quantify meaning over large numbers of texts and to understand the relationship between different texts according to the themes or topics emphasized in each. For computer science applications such as the development of predictive models for text searches, the best fit model often produces very large numbers of topics.5 However, Chang et al (2009) show that these best fit models do not produce topics that represent clearly distinct meanings and that smaller numbers of topics make interpretation more feasible and reliable. In our data, the best fit according to machine learning criteria would include hundreds of topics, where the distinctions between them are statistically reliable but hard to interpret even by experts in the field. Therefore, we specified in the model the maximum number of topics that would still be interpretable by nanotechnologists with specific knowledge of fullerenes and nanotubes, which for our analysis was 100 topics. This provided both statistically and semantically meaningful topics.

3 Based on standard lists of stopwords, we removed 905 different words from the abstracts.

4 See http://nlp.stanford.edu/software/tmt/tmt-0.3/ for further details on the toolbox.

5 This often involves holding out part of the corpus, training the algorithm on one portion and then testing its accuracy in modeling the text in the held out portion, see Wallach et al, 2009.
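To make the mechanics of latent Dirichlet allocation concrete, the following is a minimal sketch in Python of a collapsed Gibbs sampler, a common way of fitting such models. It is not the Stanford Topic Modeling Toolbox used in this study, and the stop word list, the four sample abstracts, the function names and the hyperparameters `alpha` and `beta` are all illustrative assumptions rather than the settings used here.

```python
import random
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; the study removed 905 words).
STOP_WORDS = {"the", "and", "that", "were", "a", "an", "of", "for",
              "is", "are", "by", "in", "on", "to"}

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents.

    Returns (theta, top_words): theta[d][k] is the weight of topic k in
    document d (each row sums to 1); top_words[k] lists up to 20 words
    most often assigned to topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # tokens in doc d assigned to topic k
    topic_word = [Counter() for _ in range(n_topics)]  # times word w is assigned to topic k
    topic_total = [0] * n_topics                       # tokens assigned to topic k
    assign = []
    # Random initial assignment of every token to a topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic given all other assignments.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    top_words = [[w for w, c in tw.most_common(20) if c > 0] for tw in topic_word]
    return theta, top_words

# Invented abstracts, for illustration only.
abstracts = [
    "A method for producing fullerenes by combustion of carbon soot.",
    "Carbon nanotubes are grown on a substrate to form a field emission display.",
    "Fullerene soot is produced by sputtering a carbon target.",
    "A display device uses carbon nanotube emitters on a substrate.",
]
docs = [tokenize(a) for a in abstracts]
theta, top_words = fit_lda(docs, n_topics=2, n_iter=100)
for row in theta:
    print([round(w, 2) for w in row])
```

The two outputs correspond to the quantities described above: a list of words per topic, and for each document a vector of topic weights that sums to one. Fixing the number of topics in advance, as this sketch does through `n_topics`, mirrors the choice to cap the model at an interpretable number of topics rather than the statistically best-fitting one.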

The algorithm does not provide a summary name for each topic; the topics are simply numbered (though some scholars have experimented with the automatic labeling of topics, this approach is not reliable enough to have been widely adopted, Mei, Shen & Zhai, 2007). Thus, a final step requires topical experts to evaluate the words in the topic and assign a name that most accurately represents the meaning of the topic. In our study, we provided a spreadsheet with each of 100 topics and the top 20 words associated with each topic to 3 field experts (professors and doctoral students in nanotechnology fields). They each separately created a short name for each topic. We then met as a group to discuss differences and agreed on the most appropriate name.

3.3. Sample of fullerene and related patents

To examine the evolution of fullerenes and related technologies, we collected 2,826 fullerene (and nanotube) patents granted by the US patent office through December 2007, the first of which was applied for in 1990. To ensure that our sample is not biased by previous methodologies used to categorize patents, we identified the population of patents using three separate search techniques. First, we selected all utility patents with the terms “fullerene” or “carbon nanotube” in the title, abstract or claims in the patent. Second, we used the Derwent technology classifications to select all patents they identify as pertaining to either of these technologies.6 Finally, the US Patent Office established a nanotechnology “cross reference” patent class (#977) in 2004, which was applied retroactively to all previously-granted patents the USPTO deemed relevant as well as all new nanotechnology patents. Several of the subclasses pertain to fullerenes and nanotubes (977/735-752). All the patents classified in these categories were selected. We did not use the International Patent Classification system because the most relevant classes (B82B 1/00 and 3/00) were not granular enough to separate out fullerenes and nanotubes from other nanotechnology patents. Figure 2 demonstrates that no individual sampling technique provided a comprehensive picture of patents that could plausibly be associated with fullerenes and carbon nanotubes, and we believe our approach to developing the population of patents in this field compensates for biases created by any one classification system. Note in particular that the text search method adds a proportionally large number of patents to our population (1,585 out of the 2,826 identified patents are unique to the text search method). This provides us with a more comprehensive portrait of the emerging field than the published classification systems.

6 We selected the following DWPI (Derwent World Patent Index) codes: B05-U; C05-U; E05-U; E31-U02; L02-H04B; U21-C01T; X12-D02C2D; X12-D07E2A; X12-E03D; X16-E06A1A (mainly related to “fullerene type cage structures”) as identified using the DWPI manual code lookup function and searching for “fullerene” or “nanotube.”
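The logic of combining the three search techniques can be sketched as a set union, with the contribution of each technique measured by the patents that only it finds. The function name, the source labels and the patent identifiers below are hypothetical and serve only to illustrate the bookkeeping.

```python
def combine_samples(text_hits, derwent_hits, class977_hits):
    """Union three patent-ID collections and report the IDs unique to each source."""
    sources = {
        "text_search": set(text_hits),
        "derwent": set(derwent_hits),
        "uspto_977": set(class977_hits),
    }
    population = set().union(*sources.values())
    unique = {}
    for name, ids in sources.items():
        # IDs found by every other source, pooled together.
        others = set().union(*(s for other, s in sources.items() if other != name))
        unique[name] = ids - others
    return population, unique

# Hypothetical patent identifiers, for illustration only.
population, unique = combine_samples(
    text_hits=["P1", "P2", "P3", "P4"],
    derwent_hits=["P2", "P5"],
    class977_hits=["P3", "P5", "P6"],
)
print(len(population), {name: sorted(ids) for name, ids in unique.items()})
```

Applied to the real search results, the size of `population` would correspond to the 2,826 patents and the size of `unique["text_search"]` to the 1,585 patents found only by the text search.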

-- Insert Figure 2 about here --

While the first nanotube patent application was filed in 1990, the number of patents in this technological field grew rapidly in subsequent years. Figure 3 shows the number of patents by application year (only successfully granted patents are included). The decline in the number of patents after 2003 is due to an average thirty-four-month lag between the application date and the eventual patent grant.7 In the analyses presented below, we account for this right truncation in the data where appropriate.

-- Insert Figure 3 about here --

3.4 Deriving fullerene and nanotube topics

For each patent, the abstract from its US Patent Office document was used in the topic modeling analysis.8 Analysis of the data shows that in several cases multiple patents with the same abstract have been applied for to protect a single invention. To prevent multiple counting of such cases, we grouped patents with identical abstracts into a single patent family. This resulted in 2,384 patent families based on the 2,826 patents. Using the “Stanford Topic Modeling Toolbox”9, we subsequently identified 100 separate topics based on the abstracts. As noted above, the modeling algorithm first created a pool of all the words appearing in all of the abstracts (less all of the “stop words”). Using the latent Dirichlet allocation method (Blei, Ng & Jordan 2002), it then identified the topics and reported the frequency ratios with which each of the words may appear in each topic. Using the top 20 words identified by the model for each topic, a panel of technical experts reviewed and named the topics.

7 In our sample, the range of time to grant from application is 5 months to 98 months with a mean of 34 months.

8 The abstracts available from Derwent were not used because Derwent often retrospectively modifies the texts of the abstracts they provide in their database. While this may make the patents potentially more comprehensible or consistent, it takes away from the original language used by the inventors, thus making those abstracts less useful when attempting to understand the contemporaneous interpretations of the technology.

9 See http://nlp.stanford.edu/software/tmt/tmt-0.3/ for further details on the toolbox. Ramage et al. (2009) provides further discussion on the use of topic modeling in the social sciences.

Table 1 shows sample topics with the top 20 words associated with each. The algorithm also produces a vector of weights of each topic in each patent abstract. Patents may contain several topics, though of different weights. Figure 4 shows a sample abstract and the weight of its most important topics (on average, only three topics have weights more than 10 percent and there will be a larger number of topics that register only a small presence, with weights under 1 percent). For example, this particular abstract for patent number 5494558 “Production of fullerenes by sputtering” contains topic 10 (“producing fullerene-containing soot”) with 11 percent weight and topic 41 (“combustion synthesis of carbon nano-materials”) with 78 percent weight. Note that the sum of the weights of all the topics for any given patent is 100 percent.

-- Insert Table 1 and Figure 4 about here --

To understand the frequency and importance of each topic in the whole sample, we need to assess the number of patents appearing in each topic. Since each patent contains multiple topics with different weights, there are multiple ways to calculate the number of patents in each topic. First, for each topic, we can simply sum up the weights of the topic in each of the patents that contains it (“weighted sum”). Alternatively, similar to the USPTO patent technological classification, we can choose a primary topic for each patent (the topic with maximum weight). For example, topic 41 is the primary topic of the patent shown in Figure 4.
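The two counting measures can be sketched in a few lines of Python. The weight matrix below is invented for illustration (it is not the data in Table 2), and the function name is hypothetical; each row is one patent and each column one topic, with rows summing to one.

```python
def topic_counts(weights):
    """Return (weighted_sum, primary_count) per topic.

    weights[p][k] is the weight of topic k in patent p.
    weighted_sum[k] adds topic k's weight across all patents;
    primary_count[k] counts patents whose maximum-weight topic is k.
    """
    n_topics = len(weights[0])
    weighted_sum = [sum(row[k] for row in weights) for k in range(n_topics)]
    primary = [max(range(n_topics), key=row.__getitem__) for row in weights]
    primary_count = [primary.count(k) for k in range(n_topics)]
    return weighted_sum, primary_count

# Invented weights for five patents and three topics.
weights = [
    [0.05, 0.55, 0.40],
    [0.05, 0.55, 0.40],
    [0.05, 0.55, 0.40],
    [0.10, 0.00, 0.90],
    [0.10, 0.00, 0.90],
]
weighted_sum, primary_count = topic_counts(weights)
print([round(s, 2) for s in weighted_sum], primary_count)
```

In this toy example the third topic has the largest weighted sum, because it appears with some weight in every patent, while the second topic is the primary topic of more patents. The two measures can therefore rank topics differently.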

Table 2 demonstrates these two measures and their differences for a hypothetical sample of 10 patents and 3 topics. Based on the data presented in Table 2, one can conclude that topic 3 appears more frequently in the whole dataset than the two other topics, i.e. topic 3 has the highest “weighted sum” among the three. However, using the “primary topic” concept, we see that only three patents belong to topic 3 whereas four patents belong to topic 2. While each patent abstract has one or two top words of topic 3, there are few patents that include a good number of them. By contrast, while there are only a few patent abstracts that include the top words of topic 2, these few include a good number of these top words. This suggests that topic 3 is a more general and inclusive topic whereas topic 2 is a more specific and exclusive one.

-- Insert Table 2 about here --

In brief, the “weighted sum” measure gives us a good idea of how much each topic generally appears in the whole dataset. On the other hand, assigning a single primary topic to each patent gives us a sense of the prominence of a topic where it appears. It also prevents the possible erroneous counting of patents in topics that do not strongly represent them. Taking these two measures together provides a reasonable perspective on the importance of different topics and can be used to track the evolution of topics over time from multiple perspectives. The summary statistics are reasonably similar for each of these two measures for the whole sample of topics calculated over the period 1991-2005 (Table 3). The number of patents in each topic ranges from a low of 8 to a high of 47 with an average around 23 depending on which measure we choose. There is also a high correlation of .83 between the calculated total number of patents in the topics based on each of the measures.
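The two counting measures can be illustrated with a minimal Python sketch. The topic weights, the function names, and the tie-breaking rule (the first topic with the maximum weight wins) are illustrative assumptions; the numbers are not taken from Table 2, and this is not the authors' implementation.

```python
from collections import Counter

# Hypothetical topic weights: one dict per patent, mapping topic id -> weight.
patents = [
    {1: 0.2, 3: 0.8},
    {2: 0.7, 3: 0.3},
    {2: 0.6, 3: 0.4},
    {1: 0.5, 2: 0.1, 3: 0.4},
]

def weighted_sum(patents):
    """Sum each topic's weight over all patents that contain it."""
    totals = Counter()
    for weights in patents:
        for topic, weight in weights.items():
            totals[topic] += weight
    return dict(totals)

def primary_topic_counts(patents):
    """Count patents per topic, assigning each patent to its max-weight topic."""
    counts = Counter()
    for weights in patents:
        primary = max(weights, key=weights.get)  # ties: first topic encountered
        counts[primary] += 1
    return dict(counts)

print(weighted_sum(patents))
print(primary_topic_counts(patents))
```

In this toy data, topic 3 has the largest weighted sum because it appears in every patent, while topic 2 is the primary topic of the most patents, which mirrors the general-versus-specific contrast described for Table 2.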

-- Insert Table 3 about here --

Previously, the main social science approach to characterizing the knowledge represented by a patent was to use the patent office-assigned patent classifications. However, the correlation between themes developed using topic modeling and the USPTO-assigned technological classes (using primary topics and primary 3-digit patent class) shows that patents containing the same topic may spread over several different patent classes. Similarly, patents in the same patent class may belong to different topics. Although there is some correlation between primary topics and the primary patent classifications (the average correlation is 0.23 with a standard deviation of 0.10; see footnote 10), these two means of capturing the knowledge in a patent convey different information: topics are generated from the writings of the scientists (and others who help construct the patent), whereas patent classifications are assigned by patent examiners based on their understanding of what the patent application contained and using previously established classification categories.
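The averaging procedure described in footnote 10 can be sketched as follows. This is a minimal illustration that assumes the correlation is a Pearson correlation between 0/1 indicators of primary topic and primary patent class; the function names and the treatment of constant indicators (correlation set to zero) are assumptions made here, not details given in the paper.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant indicator carries no correlation
    return cov / sqrt(var_x * var_y)

def average_max_correlation(primary_topics, primary_classes):
    """For each topic, take its maximum correlation over all patent classes,
    then average those maxima over topics."""
    topics = sorted(set(primary_topics))
    classes = sorted(set(primary_classes))
    maxima = []
    for t in topics:
        topic_ind = [1 if p == t else 0 for p in primary_topics]
        best = max(
            pearson(topic_ind, [1 if c == k else 0 for c in primary_classes])
            for k in classes
        )
        maxima.append(best)
    return sum(maxima) / len(maxima)

# Hypothetical example: topics line up perfectly with classes.
print(average_max_correlation([1, 1, 2, 2], ["A", "A", "B", "B"]))
```

A value near 1 would mean topics and classes partition the patents in the same way; the reported average of 0.23 indicates that they largely do not.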

4. Results: perspectives on fullerenes and nanotubes

While there are a wide variety of potential applications of topic modeling of patents to address numerous questions about science, technology and knowledge flows, as a means for demonstrating this method’s potential impact on social science research in this domain, we focus on the emergence of a new technological field, the sources of the ideas and the value these ideas create.

10 To calculate the average correlation, first we find the maximum correlation for each topic with all the possible patent classes. We then take the average over all the calculated maximum correlation values.

4.1 Emergence and evolution of topics over time

Topic modeling provides us a means to observe the emergence of a new technology. Without a technique for evaluating the text of the patents, scholars are left with patent office classifications that are applied retrospectively to identify technological arenas. This can be problematic because the standardized categories used to understand a technology may emerge much later than the technology itself. In our analysis, we examine the texts of the patents to understand the categories or topics as they emerged. As such, topic modeling provides us a means to identify and quantify the origins of technological trajectories.

Using this approach in an analysis of fullerene and nanotube patents, we find that scientists’ understanding of fullerenes and nanotubes and the descriptions of their inventions have changed over time. The most important topics (those with the highest share across all patents) in the early period (1991-1994) were primarily related to methods and techniques for the synthesis of fullerenes and their derivatives (mainly polymeric derivatives such as “fullerene amine-containing ”). The most important topics in the later period (2003-2006) focus on the application of fullerenes and nanotubes in a number of fields such as , field emission devices, conductive thermoplastic resins, and thermal interface material. Figure 5 shows the shifting shares of these topics over time: the share of the top topics for 1991-1994 (based on a measure of the sum of the weights for each topic in the patents in which it appears) decreases as the share of top topics for 2003-2006 increases. Using the “primary” (most important) topic measure results in similar patterns. Overall, the analysis shows a movement from methods of producing fullerenes and nanotubes toward their applications in electronics and beyond.

-- Insert Figure 5 about here --

The sources of these topics also change over time: universities and public research institutions play a more substantial role in the top topic areas in the early development of the field (associated more with methods and “proof of concept”), and corporations play a larger role in later topic areas (associated more with applications) (Figure 6). This suggests a difference in focus between the private sector and universities and public research institutes.

-- Insert Figure 6 about here --

We found that the ways in which patents cited prior art (both previous patents and scientific literature) changed in parallel to these trends (Figure 7). The number of backward citations to previous US patents increases as the total patent base grows over time (see footnote 11). Surprisingly, despite the remarkable growth of the scientific research in the field during the past decade (Miyazaki and Islam, 2007; Meyer, 2007), the number of references to non-patent materials decreases over time. This may signal a gradual divergence between the focus of the scientific community and that of the private sector in the field of fullerenes and nanotubes. The topic modeling analysis demonstrating the movement from methods of synthesis towards discovering new applications, as well as the documented shift of focus from universities and public research institutes towards corporations, may explain the decline in the average number of references to scientific (non-patent) material as well as the increase in citations to prior patents over time.

11 Research has documented the importance of considering both inventor- and examiner-listed prior art (Alcácer, Gittelman & Sampat 2009; Alcácer & Gittelman 2006). Because these distinctions are only available from the USPTO since 2001, we are not able to evaluate these differences across our entire sample. However, for the patents where we have this information, the examiner citation shares are similar. Comparing our sample to that of Alcácer et al. (2009) at the patent level (where they examine more than 500,000 patents from all technological arenas), we find an overall examiner citation rate of 50% vs. 63%, for academia 43% vs. 45%, and for foreign 69% vs. 74%. To the extent that fullerenes and nanotubes look more like the “chemical” patents in Alcácer et al. (2009), our overall rate of 50% examiner citations compares favorably to their “chemical” patents rate of 52%. Tan and Roberts (2010) note that there will be more examiner-added citations where the technologies are more ambiguous or cut across various knowledge domains. We do not find this result in our own data, which, if anything, have slightly lower examiner-added citations than those in the Alcácer et al. benchmark. Alcácer and Gittelman (2006) conclude, however, that examiner and inventor citations generally track each other closely. Therefore, for this reason and because these data are not available for any of the pre-2001 patents in our dataset, the analyses included in this paper do not consider these two sets of citations separately. We are aware, however, that this would bias some results, such as if we were to use patent renewal rates as a dependent variable in our analyses (see Hegde & Sampat 2009).

-- Insert Figure 7 about here --

These changes in the patent themes and the emergence of new topics raise numerous questions. In this study, we take a first step toward some answers. In particular, we focus on three aspects regarding the evolution of topics: What are the types and sources of inventions? Where do they emerge from? And how do the generation and combination of topics affect the value of inventions?

4.2. Types and sources of inventions

One of the important questions in research on technologies is about breakthrough innovations: how to identify them and where they come from. In these studies, breakthrough innovations have typically been measured by the number of “forward citations” (citations made to the focal patent by subsequent patents). Higher numbers of citations (often at the top 5 percent of the distribution) are said to indicate that a patent represented a breakthrough innovation (e.g., Trajtenberg 1990, Singh & Fleming 2010). Using forward citation measures, scholars have sought to determine the sources of innovative breakthroughs at the organizational, team, inventor and invention levels (e.g., Ahuja and Lampert, 2001; Fleming, 2001; Fleming, 2002; Singh & Fleming, 2010).

For example, Singh and Fleming (2010) found that teams of researchers are more likely to generate path-breaking inventions than individual researchers (with the implication that teams provide more diverse inputs). Conti, Gambardella and Mariani (2010) posit that the total stock of individual inventive experience positively affects the number of new inventions, but negatively affects the likelihood that an invention turns out to be a breakthrough. And Fleming (2001) has argued that breakthrough innovations are those that are the product of the most creative processes, as captured by the degree to which the patents recombine diverse knowledge domains (measured as the diversity of primary patent classifications of the prior art cited by the focal patent).

Using topic modeling to understand the themes represented in patents offers a new, potentially more precise, means for characterizing breakthrough innovations, because it allows us to distinguish between the cognitive or knowledge dimensions of an invention (as measured by themes in the texts of the patents) and the measures of its future economic value (as proxied by forward citations). To measure breakthroughs, we argue that the formation of each new topic in the patent data can be seen as the emergence of a new area of innovative activity (be it a new method, a new composition, or a new application). Using the primary topic measure, for each topic we identified the first patent that clearly represented it (see footnote 12). This resulted in 100 patents representing the origin of 100 topics in our sample. All other patents associated with a topic are considered “follow-on” patents. Table 4 shows the distribution of these patents over the sample time period. To be able to make a meaningful comparison, we focus our attention on the 1991-1998 period, in which the number of topic-generating patents is comparable to the number of follow-on patents (see footnote 13).
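As a rough illustration of how topic-generating patents might be flagged, the sketch below picks, for each topic, the earliest patent whose primary topic has at least a minimum weight. Note that the authors selected each topic's first representative patent by manual reading (footnote 12); the `min_weight` threshold, the record layout, and the function name are hypothetical stand-ins for that manual step, not the authors' procedure.

```python
def topic_generating_patents(patents, min_weight=0.5):
    """Return {topic: id of the earliest patent that strongly represents it}.

    Each patent is a dict with an 'id', an ISO 'date' string, and a
    'weights' dict mapping topic id -> weight (assumed layout).
    """
    first_by_topic = {}
    for patent in sorted(patents, key=lambda p: p["date"]):
        weights = patent["weights"]
        primary = max(weights, key=weights.get)
        # Skip patents that only weakly represent their primary topic.
        if weights[primary] >= min_weight and primary not in first_by_topic:
            first_by_topic[primary] = patent["id"]
    return first_by_topic

# Hypothetical records; patent "A" is earliest in topic 1 but too weak.
example = [
    {"id": "A", "date": "1991-05-01", "weights": {1: 0.30, 2: 0.25}},
    {"id": "B", "date": "1992-02-01", "weights": {1: 0.70, 2: 0.30}},
    {"id": "C", "date": "1993-01-01", "weights": {1: 0.90}},
    {"id": "D", "date": "1990-12-01", "weights": {2: 0.60, 1: 0.40}},
]
print(topic_generating_patents(example))
```

All patents not returned by such a procedure would be treated as follow-on patents for their primary topic.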

-- Insert Table 4 about here --

Having constructed this new measure of patent breakthrough status, we sought to identify precursors. Following previous studies, we examined team-, inventor- and invention-level characteristics. At the team level, we counted the number of inventors on each patent (a proxy for team size). At the inventor level, we focused on both the inventors’ total inventive experience (as measured by Conti et al. (2010) and others) as well as their targeted experience in the field of fullerenes and nanotubes. To measure the total inventive experience, using previously established approaches (e.g., Cassiman et al., 2008; Singh & Fleming, 2010), we sum up the total number of inventions for each inventor (as represented by prior successful patent applications).

12 Note that sometimes a patent is not a good representative of its primary topic (i.e., the primary topic still has a low weight). Such patents cannot be considered as topic-generating inventions. To address this issue, we manually read through the first few patents of each topic and picked the first one that thematically represented that topic.

13 Few topics originated after 1998.

To calculate the targeted experience, we extended the sample to include all the patents cited by the 2,826 fullerene and nanotube patents in the core sample and then counted the number of inventors’ previous inventions in the extended sample (this expands the sample to 17,735 patents or 15,686 patent families). The idea here was to look at the experience of the inventor in technologically-related areas.

At the invention level, we counted the number of citations to previous US patents and the number of citations to non-patent materials (mainly scientific publications). These measures of technological and scientific background have previously been linked to the degree of forward citations received (Singh & Fleming 2010). Next, we examined the creativity of an invention by assessing the degree to which the patent represented combinations of knowledge from different arenas (its degree of recombination). Whereas in prior studies recombination (sometimes called “patent originality”) has been calculated as the degree of recombination of patent-office-applied classifications (Hall, Jaffe & Trajtenberg 2001), here we examine the texts of the patents and the prior art in order to determine if the descriptions of the invention indicate that different fields of knowledge have been combined. The topic recombination measure is constructed based on the Herfindahl index of citation concentration:

$$\text{Topic Recombination}_i = 1 - \sum_{j=1}^{n_i} s_{ij}^2$$

where $s_{ij}$ denotes the percentage of citations made by patent $i$ to patents in topic $j$, out of $n_i$ patent topics. Hall (2002) shows that for firms with few patents the Herfindahl-based measures using patent data are biased downward. Extrapolating this idea to the case where patents cite few other patents as prior art, we adjusted the index as follows in order to ensure that the measure is not biased due to different numbers of backward citations:

$$\text{Topic Recombination}_i^{\text{adj}} = \frac{\text{Number of Patents}_i}{\text{Number of Patents}_i - 1} \times \text{Topic Recombination}_i$$

where $\text{Number of Patents}_i$ is the number of patents cited by patent $i$.

If a patent cites patents in a wide range of topics, it scores high on “topic recombination” whereas a patent that only cites patents from a single topic scores zero. A high value of topic recombination shows that inventors have combined different ideas from different topics in the field. In order to calculate this measure, we needed to assign topics to the patents that were cited by the patents in our original sample. Since our main topic modeling analysis was done only for the 2,826 core fullerene and nanotube patents, we first needed to develop topics for all of the cited patents (the backwards citations). To do so, we used our extended sample (17,735 patents including core patents plus all of their backward citations). Then, using the same topic modeling method, we identified a new set of 100 topics and assigned a primary topic to each of the patents in the extended sample. With topics assigned to all the cited patents, we then were able to calculate the “topic recombination” for each of the patents in the core sample.
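The topic recombination index and its small-sample adjustment can be sketched in a few lines of Python. The function names and the handling of edge cases (no cited patents returns `None`; a single cited patent returns zero, since the correction factor is undefined there) are assumptions made for illustration.

```python
from collections import Counter

def topic_recombination(cited_topics):
    """1 minus the Herfindahl concentration of the primary topics
    of the patents cited by a focal patent."""
    n = len(cited_topics)
    if n == 0:
        return None  # assumption: undefined without backward citations
    shares = [count / n for count in Counter(cited_topics).values()]
    return 1.0 - sum(s * s for s in shares)

def adjusted_topic_recombination(cited_topics):
    """Apply the N / (N - 1) correction for patents citing few patents."""
    n = len(cited_topics)
    if n == 0:
        return None
    if n == 1:
        return 0.0  # assumption: one citation means one topic, so no recombination
    return topic_recombination(cited_topics) * n / (n - 1)

# Hypothetical inputs: each list holds the primary topic of every cited patent.
print(adjusted_topic_recombination([7, 7, 7]))     # all citations in one topic
print(adjusted_topic_recombination([1, 1, 2, 2]))  # citations split over two topics
```

A patent whose citations all fall in one topic scores zero, and the score rises as citations spread more evenly over more topics, as described in the text.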

Looking at these team-, inventor- and invention-level characteristics as precursors to breakthrough inventions, we find intriguing results. Table 5 compares topic-generating and follow-on patents across different dimensions for 1991-1998. Contrary to prior work on breakthrough innovations (Singh & Fleming, 2010), topic-generating patents are produced by smaller teams than follow-on patents. In addition, there is no significant difference in the inventors’ total experience (that is, prior patenting in any technical domain) between inventors of topic-generating patents and those of follow-on patents. Where we do find a difference in experience, it is in the degree of targeted experience (that is, prior patenting in the fullerene or nanotube domain, or in patents cited by fullerene and nanotube patents). On average, inventors of topic-generating patents have around half the targeted inventive experience of the inventors of follow-on patents. Our findings add nuance to the results from Conti et al. (2010) and Banerjee and Campbell (2009) by suggesting that it is an increase in the stock of targeted experience, and not the total experience, which has a negative effect on generating breakthroughs.

-- Insert Table 5 about here --

At the invention level, we find that topic-generating patents do not appear to emerge from the combination of patents in other topics, at least relative to follow-on patents. This stands in contrast to research using primary patent classifications as a means to assess categories of knowledge, which has previously argued that breakthrough patents (those with the most forward citations) are more likely to combine different categories of knowledge (Fleming, 2001).

However, topic modeling methods give us a new means for assessing the nature of breakthroughs. By focusing on “topic-generating” patents rather than the most highly cited patents, and by examining the topics of the prior art (the “topic recombination” index), we provide a new picture of the precursors to breakthrough patents. Topic-generating patents also contain fewer references to previous patents, and fewer references to non-patent materials (again, different from findings by Gittelman & Kogut, 2003, and Deng, Lev & Narin, 1999).

As an example of the differences that come from using topic generation to characterize breakthroughs, in Table 6 we replicate findings by Singh and Fleming (2010) using their measure of breakthroughs (whether the patent is in the top 5% of the distribution of forward citations) and our proposed measure of breakthroughs (whether a patent is topic generating). It is worth noting that there is not a one-to-one correspondence between patents that are topic generating and those that are highly cited. Indeed, of the 325 patents listed in Table 5, 15 are both topic generating and in the top 5% of citations, 55 are topic generating but have lower citation levels, 36 are in the top 5% of citations but are follow-on patents, and the remaining 219 are follow-on patents that have lower citation rates.

In our sample, which is decidedly smaller than theirs (2000 or fewer observations vs. their 500,000 observations), we find in column (1) results that are fully consistent with theirs (though the significance is attenuated). Teams, especially those affiliated with organizations and those with more experience, are most likely to produce highly cited patents, and these patents have more citations to prior patent and non-patent references. We show in columns (2) and (3) that targeted experience is more strongly associated with future citations than average total experience. In columns (4) through (6), we examine breakthroughs as being patents that generate new topics in the field. Here, we find contrasting results. Teams, organizational affiliation and total experience are no longer associated with breakthroughs. On the other hand, citations to prior art are actually negatively associated with outcomes, as is the targeted experience of the inventors.

-- Insert Table 6 about here --

These analyses of breakthrough patents as characterized by topic generation offer a means for distinguishing between breakthroughs in terms of cognition or knowledge and those patents which are later determined to have economic value (as characterized by citations made to the focal patent). When examining the conceptual breakthroughs, we find that they are more likely to come from smaller teams of researchers outside of the mainstream of research in the field.

4.3. Geographic clustering of the emergence of new topics

Research has highlighted the importance of technology clusters as sources of technological innovation (Audretsch & Feldman, 1996; Feldman & Francis, 2003). A topic modeling analysis of fullerene and nanotube patents suggests that such clusters matter, but in circumscribed ways. In examining the geographic sources of fullerene and nanotube patents (looking at the first inventor on each patent), we find that they come primarily from the US, and within the US primarily from its main technology cluster, California. The patents are concentrated in the US, Japan, and Korea, with 59 percent, 16 percent, and 8 percent shares of the total patents respectively, though inventors represent 38 different countries in the total sample. Within the US, California has the highest share of fullerene and nanotube patents with 13 percent of the US patents. Texas, Massachusetts, and stand in the next places with 6 percent, 5 percent, and 4 percent shares respectively, though, again, most US states are represented in the sample.

However, an examination of the geographic sources and clustering of topics shows a subtler pattern. First, the most important topics in the early phase appeared mainly from outside of the standard technology clusters (away from Silicon Valley and Route 128 near Boston). Texas appears as a location of topic-generating patents because the fullerene discoveries happened at Rice University in Houston. New Jersey and Illinois also figure prominently in the important early topics. However, as time moved on, the important topics in the later period tended to arise in California and Massachusetts, and to a certain extent in New York.

We also find that some conversations about fullerenes and nanotubes are clustered tightly geographically. For example, topic 56 (“application of nanotubes in field emission devices”) emerges primarily from Korea (with more than 56% share), while topic 71 (“functionalized derivatives of fullerenes and nanotubes”) comes mainly from inventors in Japan (with almost 52% share). Similarly, within the US, topic 34 (“nanotube- and nanoscale-devices”) has most of its inventors located in California (with more than 35% share), while topic 86 (“producing single-wall carbon nanotubes and its applications”) emerges mainly from Texas (with more than 65% share).

These results suggest two patterns in technology emergence and development in fullerenes and nanotubes. First, it appears that inventions may emerge from anywhere, but that the later development and search for applications tends to favor pre-existing technology clusters. Second, within this broad pattern, the interpretations about what fullerenes and nanotubes could do often clustered into local “conversations,” where specific topics tend to be located in fairly narrow geographic settings.

4.4. Generation and recombination of topics and the value of patents

Having examined two different measures of creativity, breaking from previous inventive lines (by generating new topics) and combining different topics (as measured by topic recombination), we next investigated their impact on an invention’s value. Following previous research, we use the forward citation rate of patents as a measure of their value (Trajtenberg, 1990; Hall et al., 2005; Harhoff et al., 1999). Note that, as described above, the number of forward citations has also been used to capture whether an invention is a breakthrough (where the patents with the highest number of citations to them were considered breakthroughs). Thus, the constructs of breakthroughs and high-value patents have been confounded in a single measure. Our analytical approach, using topic modeling of the texts of patents, allows us to distinguish between the level of citations to the patent and different measures of technical inventiveness (described in section 4.2): the degree to which the patent represents a cognitive breakthrough (as a topic-generating patent) and the creativity of the patent (the degree of recombination of other topics it represents).

We started by calculating the number of forward citations to each patent in our sample (the measure of “value”). To address the issue of right truncation in our sample, we only count the number of citations received in the four-year window after each patent application date. For this preliminary examination, we used a very simple modeling approach to provide a general sense of the relationships at play. Since the dependent variable (number of forward citations) is a count variable, we used a Poisson-based estimation method with robust standard errors. Furthermore, since the variance of the dependent variable exceeds its mean (overdispersion), we use a negative binomial specification, which is a generalization of the pure Poisson method and is commonly employed under conditions of overdispersion (Cameron and Trivedi, 1998). Annual time dummies are included as a simple control for possible time trends.
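The construction of the dependent variable and the overdispersion check can be sketched as follows. This minimal example assumes that a citation is dated by the citing patent's application date and that the window boundaries are inclusive; both choices, and the function names, are assumptions for illustration. It does not fit the negative binomial model itself.

```python
from datetime import date
from statistics import mean, variance

def forward_citations_in_window(application_date, citing_dates, years=4):
    """Count citing patents dated within `years` of the focal application date."""
    start = (application_date.year, application_date.month, application_date.day)
    end = (application_date.year + years, application_date.month, application_date.day)
    # Compare (year, month, day) tuples so that a 29 February start date is safe.
    return sum(1 for d in citing_dates if start <= (d.year, d.month, d.day) <= end)

def is_overdispersed(counts):
    """True when the sample variance of the counts exceeds their mean."""
    return variance(counts) > mean(counts)

# Hypothetical focal patent and citing-patent dates.
focal = date(2000, 1, 15)
citing = [date(1999, 12, 1), date(2001, 6, 1), date(2004, 1, 15), date(2004, 1, 16)]
print(forward_citations_in_window(focal, citing))

# Hypothetical citation counts for five patents.
print(is_overdispersed([0, 0, 1, 2, 12]))
```

When the variance of the counts is well above their mean, as in the toy counts here, a pure Poisson model understates the spread of the data, which motivates the negative binomial specification.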

Table 7 shows the summary statistics of the variables and the results from this stripped-down regression analysis. We included as controls other factors that scholars have found to be associated with the value of an innovation, and we find the expected effects. Consistent with Singh and Fleming (2010), team size is positively associated with invention value. Also, in line with previous research (Gittelman & Kogut, 2003; Deng, Lev & Narin, 1999), our results suggest that the number of references to non-patent materials (a proxy for scientific intensity) is positively correlated with the forward citation rate of inventions.

When examining topics in this simple regression, we find that both measures of creativity (topic generation and topic recombination) are associated with higher value compared to inventions that are less creative according to these measures. Topic-generating patents have significantly higher forward citation rates than follow-on patents, and patents that have higher topic recombination (combine a higher number of other topics) have higher forward citation rates than patents that combine few topics. Again, we find that total past experience is not associated with an invention’s value but the degree of previous targeted experience of inventors is. Note that this effect appears only when we control for the creativity of the patent. Thus, as described above, higher degrees of prior targeted experience are not associated with breakthrough innovations, but once we control for whether the invention is a breakthrough, a greater degree of prior targeted experience is actually associated with higher invention value (as measured by forward citations).

-- Insert Table 7 about here --

Thus, the topic modeling approach provides a new means for assessing the creativity of inventions and differentiating such creativity from the commercial value of the patent as proxied by citations.

5. Discussion and implications

These preliminary descriptive analyses suggest that we can use the texts of patents to trace interpretations of new technologies and patterns of technology evolution. These analyses reveal how meanings given to technologies evolve over time. They provide a new means for assessing the knowledge represented by patents, where previously the focus has been on the USPTO (or other agency’s) patent classification system and other more distant proxies. In doing so, they respond to recent calls to understand better the appropriate use of patent data in strategy and technology management research (Alcácer & Gittelman, 2006; Alcácer, Gittelman & Sampat, 2009; Benner & Waldfogel, 2008; Hegde & Sampat, 2009; Jaffe, Trajtenberg & Fogarty, 2000; Tan & Roberts, 2010).

The low correlation between topics and patent classes indicates that the classification system may not be an accurate proxy of the knowledge represented by the patent. And the imperfect relationship between topic-generating patents and those that receive high citations also indicates that there are different kinds of “breakthroughs”: those which are cognitive and those which are associated with later market value. As a result, our analyses find results that contrast with conclusions drawn previously by scholars looking to determine the sources and trends in inventive activity.

The topic modeling approach treats patents not as proxies of innovation but rather as historical documents produced by inventors, lawyers, patent examiners and others. By paying attention to how these actors interpreted the technology, we usefully complement existing research on technology evolution, in particular that which draws on patent data to understand the sources and value of innovation.

Our preliminary findings using this technique suggest that the understanding of what fullerenes and nanotubes could do evolved over time, and that these understandings were often geographically clustered in specific “conversations.” Further, cognitive breakthroughs in this field tended to come from outsiders: not necessarily lone inventors, but teams that were operating outside the targeted field and often outside of traditional research and development clusters. On the other hand, the follow-on inventions were more likely to emerge from existing technology clusters and larger organizations with interests in commercializing specific applications. Further refinements of this analysis are likely to shed more light on the creative inputs and the impact of inventions in this field.

In addition to calculating breakthrough inventions and creative recombination, the topic modeling approach may usefully contribute to other areas of research on science and technology such as analyses of technological distance between entities. To date, this research has primarily been conducted through an examination of the overlaps in USPTO patent classification amongst the patents of the different entities (e.g., Jaffe 1986, Ahuja 2000, Song, Almeida & Wu 2003) or citations between entities (e.g., Stuart & Podolny 1996; Mowery, Oxley & Silverman 1996). These vectors of overlaps are constrained to indicator variables (either there is an overlap or not). However, because topic modeling provides a weight of each topic for each patent, there is an opportunity to evaluate the strength of the ties according to weights. This approach may be superior to the use of classifications because it tracks the actual language of the actors rather than the classifications assigned by others. It also supplements the cross-citation approach, both by examining the ideas directly rather than inferring them from citation ties and by allowing for the possibility that such connections occur even if specific patents are not cited.
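One way such a weight-based measure of technological proximity might look is sketched below. It assumes that each entity is profiled by summing the topic weights of its patents and that proximity is measured by cosine similarity; the paper does not specify a similarity measure, so both choices and all names here are hypothetical.

```python
from math import sqrt

def entity_topic_profile(patent_weights, n_topics):
    """Sum topic weights over an entity's patents into one vector.

    `patent_weights` is a list of dicts mapping topic index -> weight.
    """
    profile = [0.0] * n_topics
    for weights in patent_weights:
        for topic, weight in weights.items():
            profile[topic] += weight
    return profile

def cosine_similarity(u, v):
    """Cosine of the angle between two topic-weight vectors (0 if either is all zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical entities with patents over three topics (indices 0, 1, 2).
firm_a = entity_topic_profile([{0: 0.8, 1: 0.2}, {0: 0.6, 1: 0.4}], n_topics=3)
firm_b = entity_topic_profile([{0: 0.7, 1: 0.3}], n_topics=3)
firm_c = entity_topic_profile([{2: 1.0}], n_topics=3)
print(cosine_similarity(firm_a, firm_b))  # close to 1: similar topic mix
print(cosine_similarity(firm_a, firm_c))  # 0: no shared topics
```

Unlike a 0/1 overlap indicator, this kind of measure varies continuously with how strongly each entity's patents draw on each topic.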

These examples are just a few of the possibilities opened up by the application of topic modeling to studies in the management of technology. It is worth noting that topic modeling methodologies are quite new, even to the field of computer science, and their application to the social sciences is even newer (Ramage et al. 2009). The approach used in this paper is the state of the art for which a toolkit is available. New techniques such as correlated topic modeling (which allows for the correlation of topics) (Blei & Lafferty, 2007), dynamic topic modeling (which takes into account the passage of time in computing topics) (Ahmed & Xing, 2009; Wang et al., 2008; Zhang & Wang, 2010) and author-matched topic modeling (Rosen-Zvi et al. 2010) may offer better representations of topics as they emerge and evolve. Future analyses should take advantage of these approaches as they are refined and as toolkits become available.

Finally, the analyses performed here have been conducted on the abstracts of the patents alone. These methods could be extended to examine the full texts of patents, or at least the claims within them, though the computational demands would be high for the average social scientist’s computer! They also can be used equally well in the evaluation of knowledge in scientific publications and any other corpus of comparable documents. We hope that this demonstration of the use of these methods can open up further avenues for research into invention and innovation.

Emergence of Nanotechnology - 29 - References

Abernathy, W. J. & Utterback, J. M. 1978. Patterns of Industrial Innovation. Technology Review, 80(7): 40-47.
Abrahamson, E. & Hambrick, D. C. 1997. Attentional homogeneity in industries: The effect of discretion. Journal of Organizational Behavior, 18: 513-532.
Ahmed, A. & Xing, E. 2009. Timelines: Recovering Birth and Evolution of Topics in Scientific Literature Using Dynamic Non-Parametric Bayesian Models. Working Paper, Carnegie Mellon University.
Ahuja, G. & Lampert, C. M. 2001. Entrepreneurship in the Large Corporation: A Longitudinal Study of How Established Firms Create Breakthrough Inventions. Strategic Management Journal, 22: 521-544.
Agrawal, A. & Henderson, R. 2002. Putting Patents in Context: Exploring Knowledge Transfer from MIT. Management Science, 48(1): 44-60.
Alcácer, J. & Chung, W. 2007. Location strategies and knowledge spillovers. Management Science, 53(5): 760-776.
Alcácer, J. & Gittelman, M. 2006. Patent citations as a measure of knowledge flows: The influence of examiner citations. Review of Economics and Statistics, 88(4): 774-779.
Alcácer, J., Gittelman, M., & Sampat, B. 2009. Applicant and Examiner Citations in US Patents: An Overview and Analysis. Research Policy, 38(2): 415-427.
Aldrich, H. E. & Fiol, C. M. 1994. Fools rush in? The institutional context of industry creation. Academy of Management Review, 19(4): 645.
Amburgey, T. L., Dacin, T., & Singh, J. V. 1996. Learning races, patent races, and capital races: Strategic interaction and embeddedness within organizational fields. Advances in Strategic Management, 13: 303-322.
Anderson, P. & Tushman, M. L. 1990. Technological Discontinuities and Dominant Designs: A Cyclical Model of Technological Change. Administrative Science Quarterly, 35(4): 604-633.
Audretsch, D. B. & Feldman, M. P. 1996. R&D Spillovers and the Geography of Innovation and Production. American Economic Review, 86: 630-640.
Azoulay, P., Ding, W., & Stuart, T. 2007. The Determinants of Faculty Patenting Behavior: Demographics or Opportunities? Journal of Economic Behavior & Organization, 63(4): 599-623.
Banerjee, P. M. & Campbell, B. 2009. Inventor bricolage and firm technology research and development. R&D Management, 39(5): 473-487.
Benner, M. & Waldfogel, J. 2008. Close to You? Bias and Precision in Patent-Based Measures of Technological Proximity. Research Policy, 37(9): 1556-1567.
Bijker, W. E., Hughes, T. P., & Pinch, T. J. 1987. The social construction of technological systems: New directions in the sociology and history of technology. Cambridge, Mass.: MIT Press.
Blei, D. M., Griffiths, T. L., Jordan, M. I., & Tenenbaum, J. B. 2004. Hierarchical topic models and the nested Chinese restaurant process. Advances in Neural Information Processing Systems, 16: 17-24.
Blei, D. M. & Lafferty, J. D. 2007. A Correlated Topic Model of Science (Vol 1, Pg 17, 2007). Annals of Applied Statistics, 1(2): 634-634.
Blei, D. M., Ng, A. Y., & Jordan, M. I. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3: 993-1022.
Bresnahan, T. F. & Trajtenberg, M. 1995. General purpose technologies: 'Engines of growth'? Journal of Econometrics, 65(1): 83-108.
Burton, M. D., Sorensen, J., & Beckman, C. M. 2002. Coming from good stock: Career histories and new venture formation. In M. Lounsbury & M. J. Ventresca (Eds.), Research in the Sociology of Organizations, Vol. Social Structure and Organizations Revisited: 229-262. JAI Press.
Cameron, C. A. & Trivedi, P. K. 1998. Regression Analysis of Count Data. Cambridge: Cambridge University Press.
Cassiman, B., Veugelers, R., & Zuniga, P. 2008. In search of performance effects of (in)direct industry science links. Industrial and Corporate Change, 17(4): 611-646.
Chang, J., Boyd-Graber, J., Wang, C., Gerrish, S., & Blei, D. M. 2009. Reading Tea Leaves: How Humans Interpret Topic Models. Conference Proceedings, Neural Information Processing Systems.
Christensen, C. M. & Rosenbloom, R. S. 1995. Explaining the attacker's advantage: Technological paradigms, organizational dynamics, and the value network. Research Policy, 24(2): 233-257.
Conti, R., Gambardella, A., & Mariani, M. 2010. Learning to Be Edison? Individual Inventive Experience and Breakthrough Inventions. Working paper.
Deng, Z., Lev, B., & Narin, F. 1999. Science and technology as predictor of stock performance. Financial Analysts Journal, 53(3): 20-32.
Dosi, G. 1982. Technological Paradigms and Technological Trajectories: A Suggested Interpretation of the Determinants and Directions of Technical Change. Research Policy, 11(3): 147-162.
Duriau, V. J., Reger, R. K., & Pfarrer, M. D. 2007. A Content Analysis of the Content Analysis Literature in Organization Studies: Research Themes, Data Sources, and Methodological Refinements. Organizational Research Methods, 10(1): 5-34.
Eisenhardt, K. M. 1989. Making Fast Strategic Decisions in High-Velocity Environments. Academy of Management Journal, 32(3): 543-576.
Elsbach, K. D., Barr, P. S., & Hargadon, A. B. 2005. Identifying Situated Cognition in Organizations. Organization Science, 16(4): 422-433.
Feldman, M. P. & Francis, J. 2003. Fortune Favors the Prepared Region: The Case of Entrepreneurship and the Capitol Region Biotechnology Cluster. European Planning Studies, 11: 765-788.
Fleming, L. 2001. Recombinant Uncertainty in Technological Search. Management Science, 47(1): 117-132.
Fleming, L. 2002. Finding the organizational sources of technological breakthroughs: The story of Hewlett-Packard's thermal ink-jet. Industrial and Corporate Change, 11(5): 1059-1084.
Fleming, L. & Sorenson, O. 2004. Science as a map in technological search. Strategic Management Journal, 25(8-9): 909-928.
Fleming, L. & Singh, J. 2010. Lone inventors as sources of breakthroughs: Myth or reality? Management Science, 56(1): 41-56.
Fleming, L. & Szigety, M. C. 2006. Exploring the tail of creativity: An evolutionary model of breakthrough invention. Advances in Strategic Management, 23: 335-362.
Furman, J., Kyle, M. K., Cockburn, I., & Henderson, R. forthcoming. Public & Private Spillovers, Location, and the Productivity of Pharmaceutical Research. Annales d'Economie et de Statistique.
Garud, R. & Rappa, M. A. 1994. A socio-cognitive model of technology evolution: The case of cochlear implants. Organization Science, 5(3): 344-362.
Garud, R., Jain, S., & Kumaraswamy, A. 2002. Institutional entrepreneurship in the sponsorship of common technological standards: The case of Sun Microsystems and Java. Academy of Management Journal, 45(1): 196-214.
Gittelman, M. & Kogut, B. 2003. Does Good Science Lead to Valuable Knowledge? Biotechnology Firms and the Evolutionary Logic of Citation Patterns. Management Science, 49(4): 366-382.
Griliches, Z. 1990. Patent Statistics as Economic Indicators: A Survey. Journal of Economic Literature, 28(4): 1661-1707.
Griliches, Z. 1998. R&D and Productivity: The Econometric Evidence. Chicago: University of Chicago Press.
Hall, B. H., Jaffe, A. B., & Trajtenberg, M. 2001. The NBER Patent Citation Data File: Lessons, Insights and Methodological Tools. NBER Working Paper 8498.
Hall, B. H. 2002. A note on the bias of Herfindahl-type measures based on count data. In A. Jaffe & M. Trajtenberg (Eds.), Patents, Citations, and Innovations: 454-459. Cambridge, Mass.: MIT Press.
Hall, D., Jurafsky, D., & Manning, C. D. 2008. Studying the History of Ideas Using Topic Models. Proceedings of the 2008 Conference on Empirical Methods of Natural Language Processing: 363-371.
Hargadon, A. B. & Douglas, Y. 2001. When innovations meet institutions: Edison and the design of the electric light. Administrative Science Quarterly, 46(3): 476-501.
Harhoff, D., Narin, F., Scherer, F. M., & Vopel, K. 1999. Citation Frequency and the Value of Patented Inventions. Review of Economics and Statistics, 81(3): 511-515.
Hegde, D. & Sampat, B. 2009. Examiner Citations, Applicant Citations, and the Private Value of Patents. Economics Letters, 105(3): 287-289.
Helpman, E. 1998. Diffusion of General Purpose Technologies. In E. Helpman (Ed.), General Purpose Technologies and Economic Growth: 85-117. Cambridge, Mass.: MIT Press.
Henderson, R. 1995. Of life cycles real and imaginary: The unexpectedly long old age of optical lithography. Research Policy, 24(4): 631-643.
Higgins, M. C. 2005. Career imprints: Creating leaders across an industry (1st ed.). San Francisco: Jossey-Bass.
Huff, A. S. 1990. Mapping strategic thought. Chichester, New York: John Wiley and Sons.
Jaffe, A. B. 1986. Technological Opportunity and Spillovers of Research and Development: Evidence from Firms' Patents, Profits, and Market Value. American Economic Review, 76(5): 984-1001.

Jaffe, A. B., Trajtenberg, M., & Henderson, R. 1993. Geographic localization of knowledge spillovers as evidenced by patent citations. The Quarterly Journal of Economics, 108(3): 577-598.
Jaffe, A. B. & Trajtenberg, M. 2002. Patents, citations, and innovations: A window on the knowledge economy. Cambridge, Mass.: MIT Press.
Kaplan, S. 2008. Cognition, capabilities, and incentives: Assessing firm response to the fiber-optic revolution. Academy of Management Journal, 51(4): 672-695.
Kaplan, S. & Tripsas, M. 2008. Thinking about technology: Applying a cognitive lens to technical change. Research Policy, 37(5): 790-805.
Kaplan, S. & Radin, J. 2011. Bounding an Emerging Technology: Para-scientific media and the Drexler-Smalley Debate on Nanotech. Social Studies of Science, 41(4): xx-xx.
Kennedy, M. T. 2005. Behind the one-way mirror: Refraction in the construction of product market categories. Poetics, 33(3-4): 201-226.
Kennedy, M. T. 2008. Getting counted: Markets, media, and reality. American Sociological Review, 73(2): 270-295.
Lamoreaux, N., Raff, D. M. G., & Temin, P. 2004. Against Whig History. Enterprise and Society, 5: 376-387.
Lounsbury, M., Ventresca, M. J., & Hirsch, P. M. 2003. Social movements, field frames and industry emergence: A cultural-political perspective on US recycling. Socio-Economic Review, 1(1): 71-104.
Mann, G. S., Mimno, D., & McCallum, A. 2006. Bibliometric impact measures leveraging topic analysis. Opening Information Horizons: 65-74.
McCallum, A., Wang, X. R., & Corrada-Emmanuel, A. 2007. Topic and role discovery in social networks with experiments on Enron and academic email. Journal of Artificial Intelligence Research, 30: 249-272.
McCallum, A., Wang, X. R., & Mohanty, N. 2007. Joint group and topic discovery from relations and text. Statistical Network Analysis: Models, Issues, and New Directions, 4503: 28-44.
Mei, Q., Shen, X., & Zhai, C. 2007. Automatic labeling of multinomial topic models. Proceedings of 2007 Conference on Knowledge Discovery & Data Mining (KDD).
Meyer, M. 2007. What do we know about innovation in nanotechnology? Some propositions about an emerging field between hype and path-dependency. Scientometrics, 70: 779-810.
Meyer, A. D., Gaba, V., & Colwell, K. A. 2005. Organizing Far from Equilibrium: Nonlinear Change in Organizational Fields. Organization Science, 16(5): 456-473.
Mimno, D., Wallach, H., & McCallum, A. 2008. Gibbs Sampling for Logistic Normal Topic Models with Graph-Based Priors. Paper presented at the Proceedings of the 2008 Workshop on Analyzing Graphs: Theory and Applications.
Miyazaki, K. & Islam, N. 2007. Nanotechnology systems of innovation: An analysis of industry and academic research activities. Technovation, 27: 661-675.
Murray, F. 2002. Innovation as Co-Evolution of Scientific and Technological Networks: Exploring Tissue Engineering. Research Policy, 31(8-9): 1389-1403.
Nelson, R. R. & Winter, S. G. 1977. In search of a useful theory of innovation. Research Policy, 6: 36-76.

Nightingale, P. 1998. A cognitive model of innovation. Research Policy, 27(7): 689-709.
Peng, F. C. & McCallum, A. 2006. Information extraction from research papers using conditional random fields. Information Processing & Management, 42(4): 963-979.
Phillips, D. J. 2002. A genealogical approach to organizational life chances: The parent-progeny transfer among Silicon Valley law firms, 1946-1996. Administrative Science Quarterly, 47(3): 474-506.
Porac, J. F., Ventresca, M. J., & Mishina, Y. 2001. Interorganizational cognition and interpretation. In J. A. C. Baum (Ed.), Blackwell companion to organizations: 579-598. Malden, MA: Blackwell Publishers.
Ramage, D., Rosen, E., Chuang, J., Manning, C., & McFarland, D. A. 2009. Topic Modeling for the Social Sciences. NIPS 2009 Workshop on Applications for Topic Models: Text and Beyond.
Rao, H., Monin, P., & Durand, R. 2003. Institutional change in Toque Ville: Nouvelle cuisine as an identity movement in French gastronomy. American Journal of Sociology, 108(4): 795-843.
Rindova, V. P. & Fombrun, C. J. 1999. Constructing competitive advantage: The role of firm-constituent interactions. Strategic Management Journal, 20(8): 691-710.
Rosen-Zvi, M., Chemudugunta, C., Griffiths, T., Smyth, P., & Steyvers, M. 2010. Learning Author-Topic Models from Text Corpora. ACM Transactions on Information Systems, 28(1): Article 4.
Sahal, D. 1981. Patterns of Technological Innovation. London: Addison-Wesley.
Santos, F. M. & Eisenhardt, K. M. 2009. Constructing Markets and Shaping Boundaries: Entrepreneurial Power in Nascent Fields. Academy of Management Journal, 52(4): 643-671.
Sapir, E. 1944. Grading: A study in semantics. Philosophy of Science, 11: 93-116.
Schmookler, J. 1972. Patents, invention, and economic change: Data and selected essays. Cambridge, Mass.: Harvard University Press.
Schneiberg, M. 2007. What's on the Path? Path Dependence, Organizational Diversity and the Problem of Institutional Change in the US Economy, 1900-1950. Socio-Economic Review, 5(1): 47.
Singh, J. 2005. Collaborative networks as determinants of knowledge diffusion patterns. Management Science, 51(5): 756-770.
Singh, J. & Fleming, L. 2010. Lone Inventors as Sources of Breakthroughs: Myth or Reality? Management Science, 56(1): 41-56.
Sorenson, O. & Fleming, L. 2004. Science and the diffusion of knowledge. Research Policy, 33(10): 1615-1634.
Sorenson, O., Rivkin, J. W., & Fleming, L. 2006. Complexity, networks and knowledge flow. Research Policy, 35(7): 994-1017.
Subramaniam, M. & Youndt, M. A. 2005. The Influence of Intellectual Capital on the Types of Innovative Capabilities. Academy of Management Journal, 48(3): 450-464.
Tan, D. & Roberts, P. W. 2010. Categorical Coherence, Classification Volatility and Examiner-Added Citations. Research Policy, 39(1): 89-102.
Trajtenberg, M. 1990. A Penny for Your Quotes: Patent Citations and the Value of Innovations. The Rand Journal of Economics, 21(1): 172.
Tushman, M. L. & Rosenkopf, L. 1992. Organizational Determinants of Technological Change: Toward a Sociology of Technological Evolution. Research in Organizational Behavior, 14: 311-347.
U.S. Patent and Trademark Office. 2005. Handbook of Classification. Available at: http://www.uspto.gov/web/offices/opc/documents/handbook.pdf. Accessed March 29, 2011.
Utterback, J. M. 1994. Mastering the Dynamics of Innovation. Cambridge, Mass.: HBS Press.
Wallach, H. M. 2008. Structured Topic Models for Language. Cambridge University, Cambridge.
Wallach, H. M., Murray, I., Salakhutdinov, R., et al. 2009. Evaluation methods for topic models. In the Proceedings for the International Conference on Machine Learning.
Walsh, J. P. 1995. Managerial and organizational cognition: Notes from a trip down memory lane. Organization Science, 6(3): 280-321.
Wang, C., Blei, D., & Heckerman, D. 2008. Continuous Time Dynamic Topic Models. Working Paper, Computer Science Department, Princeton University.
Weber, K., Heinze, K. L., & DeSoucey, M. 2008. Forage for Thought: Mobilizing Codes in the Movement for Grass-fed Meat and Dairy Products. Administrative Science Quarterly, 53(3): 529-567.
Weick, K. E. 1990. Technology as equivoque. In P. Goodman & L. Sproull (Eds.), Technology and Organizations: 1-44. San Francisco: Jossey-Bass.
Whorf, B. L. 1956. Science and linguistics. In J. B. Carroll (Ed.), Language, thought, and reality: Selected writings of Benjamin Lee Whorf: 207-219. Cambridge, MA: MIT Press.
Wry, T., Greenwood, R., Jennings, P. D., & Lounsbury, M. 2010. Institutional Sources of Technological Knowledge: A Community Perspective on Nanotechnology Emergence. Research in the Sociology of Organizations, 29: 149-176.
Zhang, X. & Wang, T. 2010. Topic Tracking with Dynamic Topic Model and Topic-Based Weighting Method. Journal of Software, 5(5): 482-489.

Figure 1: Buckminsterfullerene and carbon nanotube

Photo credits: Buckminsterfullerene (http://www.thenanoage.com/buckminsterfullerene.htm), nanotube (Journal of Nuclear Medicine, http://jnm.snmjournals.org/cgi/content/full/48/7/1039#FIG1)

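Figure 2: Population of fullerene and carbon nanotube patents

[Venn diagram of three overlapping patent sources.]

Source                                       Patents
USPTO 977 cross ref class                        305
Text search                                    2,504
Derwent classes                                1,057

Region                                       Patents
USPTO class only                                  50
Text search only                               1,585
Derwent classes only                             268
USPTO class and text search only                 134
USPTO class and Derwent classes only               4
Text search and Derwent classes only             668
All three sources                                117

Total population = 2,826 patents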
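
The counts in Figure 2 can be checked for internal consistency. The following is a minimal sketch, assuming the seven unlabeled numbers in the figure are the regions of a three-set Venn diagram; the mapping of each number to a region is an inference from the three source totals, not something the figure states, and the names `regions` and `source_total` are illustrative only.

```python
# Region sizes of the three-source Venn diagram in Figure 2.
# ASSUMPTION: the assignment of each number to a region is inferred
# from the source totals (305, 2,504, 1,057); it is not given in the text.
regions = {
    frozenset({"uspto"}): 50,
    frozenset({"text"}): 1585,
    frozenset({"derwent"}): 268,
    frozenset({"uspto", "text"}): 134,
    frozenset({"uspto", "derwent"}): 4,
    frozenset({"text", "derwent"}): 668,
    frozenset({"uspto", "text", "derwent"}): 117,
}


def source_total(source):
    """Sum every region whose membership includes the given source."""
    return sum(n for members, n in regions.items() if source in members)


totals = {s: source_total(s) for s in ("uspto", "text", "derwent")}
population = sum(regions.values())  # each patent is counted exactly once

print(totals)
print(population)
```

Under this assignment the three source totals and the overall population both match the figures reported in the caption.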

Figure 3: Number of fullerene and nanotube patents per year, by application date

[Chart: y-axis, number of patents (0 to 450); x-axis, application year (1990 to 2006).]

Note: Only patents whose applications were successfully granted are included here. The decline in recent years is due to right truncation, as the average lag between application and grant is 34 months.

Figure 4: Example coded abstract

Patent Number: 5494558
Title: Production of fullerenes by sputtering
Inventors: Bunshah; Rointan F. (Playa del Rey, CA), Jou; Shyankay (Santa Monica, CA), Prakash; Shiva (Santa Barbara, CA), Doerr; Hans J. (Westlake Village, CA)
Assignee: The Regents of the University of California (Oakland, CA)
Application date: August 1992
Issue date: February 1996

Abstract: A process[41] and system[52] for producing[10] fullerenes[41] by sputtering[41]. A carbon target[59] is sputtered[41] to form a vapor[25] of sputtered carbon. The sputtered[41] carbon atoms are quenched in an atmosphere[43] of inert[10] gas[10] and deposited onto a collection substrate[19]. The resulting carbon soot[40] is [41 41 41 79] extracted to recover fullerenes. The process produces carbon soot which is rich in C70 and higher[59] fullerenes[41].

Vector of top topics:
Topic 10 (producing fullerene-containing soot): 11%
Topic 41 (combustion synthesis of carbon nano-materials): 78%
Topic 43 (method for producing ): 5%
Topic 59 (application of nano-materials in forming or absorbing beams): 4%
Topic 95 (producing carbon nanotubes using metallic catalytic particles): 0.5%

Figure 5: The relative share of top 3 topics 1991-1994 versus the top 3 topics 2003-2006

[Chart: y-axis, relative share (0% to 100%); x-axis, year (1991 to 2006); series: topic56, topic17, topic2, topic41, topic30, topic27.]

Note: Both the top topics and the relative shares are determined using the “weighted sum” measure.

Top 3 topics 1991-1994:
• Topic 27: methods to form multi-layer fullerenes
• Topic 30: fullerene amine-containing polymers
• Topic 41: combustion synthesis of nano-materials

Top 3 topics 2003-2006:
• Topic 2: production and use of nanotubes in electronics and related fields
• Topic 17: application of fullerenes and nanotubes in semiconductors
• Topic 56: application of fullerenes and nanotubes in field emission devices

Figure 6: Share of US and non-US corporations, universities & public research institutes

                                Whole sample   Top topics-1st 4 years   Top topics-Last 4 years
Foreign University-Institute    10%            8%                       11%
Foreign Corporation             35%            21%                      38%
US University-Institute         17%            23%                      15%
US Corporation                  38%            48%                      36%

Note: Years assessed using priority dates

Figure 7: The evolution of the average number of references to previous US patents and references to non-patent materials over time

[Chart: y-axis, average number of references (0 to 25); x-axis, year (1990 to 2006); series: average number of references to non-patent materials, average number of references to previous US patents.]

Table 1: Sample topics including the top 20 words associated with each (core fullerene dataset)

Topic 2: production and use of nanotubes in electronics and related fields
Topic 17: application of fullerenes and nanotubes in semiconductors
Topic 27: methods to form multi-layer fullerenes
Topic 30: fullerene amine-containing polymers
Topic 41: combustion synthesis of nano-materials
Topic 56: application of fullerenes and nanotubes in field emission devices

present forming fullerene compositions fullerenes gate materials polymers soot emission relates providing fullerenes relates process cathode provides steps derivatives useful high insulating directed depositing molecular weight hydrocarbon hole properties removing c60 containing purity display use form composition combustion insulation new comprises form novel zone emitter novel making diamond making reactor portion uses fabricating sieve present mixture focusing particular doped involves impurities cavity inventive provided provided effective hydrocarbons lines processing exposing providing equal source preparation opposite number free flame openings suitable growing enclosed described containing covering unique arrays multilayer properties continuous provided encapsulating thin-film stream oil form fed comprises n-type using greater produced triode manufacture plurality column treating aromatic using insulative improved especially higher exposed

Table 2: Hypothetical sample of topics and the calculated number of patents in each topic using “weighted sum” and “primary topic” measures

                 Topic 1   Topic 2   Topic 3
Patent 1         0.22      0.13      0.65
Patent 2         0.17      0.80      0.03
Patent 3         0.52      0.06      0.42
Patent 4         0.33      0.02      0.65
Patent 5         0.12      0.79      0.09
Patent 6         0.25      0.03      0.72
Patent 7         0.62      0.05      0.33
Patent 8         0.15      0.84      0.01
Patent 9         0.10      0.65      0.25
Patent 10        0.45      0.12      0.43
Weighted Sum     2.93      3.49      3.58
Primary Topic    3         4         3

Note: Bold underlined numbers in each row show the primary topic for the patent in that row.
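
The two measures in Table 2 can be reproduced directly from the patent-by-topic weights. The sketch below is illustrative and is not the authors' code: it assumes that single-digit entries in the extracted table (for example 0.3 for Patent 2, Topic 3) are hundredths (0.03), which is the reading under which every row sums to 1 and the reported totals are recovered. The function names are hypothetical.

```python
# Topic weights per patent (rows) for Topics 1-3 (columns), from Table 2.
# ASSUMPTION: single-digit entries are read as hundredths (0.3 -> 0.03)
# so that each patent's weights sum to 1.
weights = [
    [0.22, 0.13, 0.65],
    [0.17, 0.80, 0.03],
    [0.52, 0.06, 0.42],
    [0.33, 0.02, 0.65],
    [0.12, 0.79, 0.09],
    [0.25, 0.03, 0.72],
    [0.62, 0.05, 0.33],
    [0.15, 0.84, 0.01],
    [0.10, 0.65, 0.25],
    [0.45, 0.12, 0.43],
]


def weighted_sum(rows):
    """Column sums: each patent adds its fractional weight to every topic."""
    n_topics = len(rows[0])
    return [round(sum(row[t] for row in rows), 2) for t in range(n_topics)]


def primary_topic_counts(rows):
    """Count each patent once, under the topic with its largest weight."""
    counts = [0] * len(rows[0])
    for row in rows:
        counts[row.index(max(row))] += 1
    return counts


print(weighted_sum(weights))
print(primary_topic_counts(weights))
```

The weighted sum spreads each patent across all topics, while the primary-topic count assigns each patent to exactly one topic, so the two measures can rank topics differently, as Topics 2 and 3 show here.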

Table 3: Summary statistics of the two measures of topic patent count

                                          median   mean    SD     min    max     Correlation
                                                                                 (1)     (2)
(1) weighted sum of topics                21.69    22.87   6.88   9.14   44.61   1.00
(2) number of patents in primary topics   21       22.87   8.69   8      47      0.83    1.00

Table 4: Total number of patents and number of topic-generating patents in each year

Year    Topic-generating patents    Follow-on patents
1987    1                           1
1988    1                           0
1989    0                           2
1990    1                           0
1991    11                          10
1992    3                           23
1993    20                          27
1994    10                          37
1995    5                           50
1996    5                           29
1997    8                           41
1998    9                           38
1999    12                          109
2000    3                           144
2001    6                           281
2002    2                           366
2003    0                           422
2004    1                           370

Table 5: Comparison between topic-generating and follow-on patents 1991-1998

                                                     Topic-generating   Follow-on   Mean
                                                     patents            patents     Difference
Number of Observations                               71                 254

Number of inventors’ previous targeted inventions    1.61               2.95        -1.34*
                                                     (3.39)             (4.95)
Total number of inventors’ previous inventions       26.11              28.12       -2.00
                                                     (36.91)            (48.78)
Topic recombination (adjusted)                       0.56               0.63        -0.06
                                                     (0.43)             (0.40)
Number of references to previous US patents          4.34               9.28        -4.90
                                                     (4.54)             (28.17)
Number of references to non-patent materials         7.70               12.14       -4.45
                                                     (11.88)            (22.46)
Number of inventors                                  2.32               2.69        -0.36+
                                                     (1.27)             (1.70)
Claims                                               16.44              21.50       -5.07+
                                                     (14.02)            (23.32)
Assigned to Corporations                             0.68               0.57        0.10
                                                     (0.47)             (0.49)
Assigned to Individuals                              0.14               0.11        0.03
                                                     (0.35)             (0.32)
Assigned to Universities and Public Institutions     0.18               0.31        -0.13*
                                                     (0.39)             (0.46)

+p<0.10; *p<0.05; **p<0.01; ***p<0.001. Standard deviations are shown in parentheses.

Table 6: Differences in measures of “breakthrough” patents

                              (1)         (2)         (3)         (4)         (5)         (6)
                              Top 5%      Top 5%      Top 5%      Topic       Topic       Topic
                              cites       cites       cites       generating  generating  generating
                              Logistic    Logistic    Logistic    Logistic    Logistic    Logistic
Team dummy                    0.412       0.459       0.354       -0.066      -0.016      0.062
                              (0.291)     (0.290)     (0.294)     (0.330)     (0.344)     (0.348)
Assigned dummy                1.144+      1.150+      1.144+      -0.685      -0.763      -0.755+
                              (0.596)     (0.602)     (0.598)     (0.448)     (0.460)     (0.451)
Ln(claims)                    0.026       0.038       0.032       -0.039      0.010       0.008
                              (0.114)     (0.114)     (0.114)     (0.183)     (0.186)     (0.184)
Ln(patent references)         0.090       0.073       0.088       -0.294+     -0.284      -0.259
                              (0.096)     (0.098)     (0.097)     (0.176)     (0.178)     (0.176)
Ln(non-patent references)     0.280***    0.269***    0.260**     -0.478***   -0.482***   -0.501***
                              (0.083)     (0.084)     (0.083)     (0.147)     (0.148)     (0.149)
Ln(avg. experience)           0.149+      0.050                   0.050       0.103
                              (0.079)     (0.090)                 (0.137)     (0.147)
Ln(avg. related experience)               0.349*                              -0.305
                                          (0.172)                             (0.302)
Ln(total related experience)                          0.239*                              -0.285
                                                      (0.108)                             (0.181)
Constant                      -6.657***   -6.613***   -6.440***   0.203       0.135       -0.117
                              (0.986)     (0.989)     (0.994)     (0.851)     (0.851)     (0.798)
Year fixed effects            Yes         Yes         Yes         Yes         Yes         Yes
Observations                  2063        2063        2063        325         325         325
Chi Square                    99.85       106.59      100.64      36.37       36.21       35.80
Log Likelihood                -393.96     -391.65     -392.68     -146.87     -146.32     -145.28

Notes: Column (1) replicates column (3) of Table 4 in Singh and Fleming (2010) using the core fullerene and nanotube sample from 1991-2005. Top 5% cites: the patent is in the top 5% in citation impact (over 4 years). Robust standard errors in parentheses. +p<0.10; *p<0.05; **p<0.01; ***p<0.001

Table 7: Factors associated with the value of nanotube and fullerene patents

                                                  Mean     St. Dev   Min   Max    Negative Binomial
                                                                                  Regression
Dependent Variable:
Forward citation in 4 year window (1991-2005)     4.73     8.72      0     109
Topic-generating patent                           0.04     0.20      0     1      0.367**
                                                                                  (0.138)
Topic recombination (adjusted)                    0.69     0.36      0     1      0.029
                                                                                  (0.107)
# domestic references*                            15.94    37.73     0     712    0.125**
                                                                                  (0.046)
# non-patent references*                          10.22    22.54     0     249    0.113***
                                                                                  (0.031)
# inventors*                                      3.11     1.96      1     15     0.041
                                                                                  (0.093)
# claims*                                         22.46    19.59     1     296    0.169***
                                                                                  (0.042)
# inventors’ previous inventions*                 28.22    49.77     0     640    0.052+
                                                                                  (0.028)
# inventors’ previous targeted inventions*        1.81     4.821     0     62     0.071
                                                                                  (0.051)
Corporate patent                                  0.64     0.48      0     1      0.520***
                                                                                  (0.128)
University or Public Research Institute Patent    0.25     0.43      0     1      0.529***
                                                                                  (0.128)
Assigned to individuals                           0.11     0.31      0     1      Omitted
Constant                                                                          -1.163***
                                                                                  (0.262)
Year fixed effects                                                                Yes
Observations                                                                      2243
Chi Square                                                                        506.01
Log Likelihood                                                                    -5496.558

Robust standard errors in parentheses. +p<0.10; *p<0.05; **p<0.01; ***p<0.001
*Calculated as the natural log in the regression.
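
As an aid to reading the coefficients on the dummy variables in Table 7, they can be converted to incidence-rate ratios. This is a sketch under an assumption the source does not state explicitly: that the negative binomial model uses the conventional log link, so that exp(coefficient) is the multiplicative change in expected forward citations when a dummy switches from 0 to 1. The function name is illustrative.

```python
import math

# Negative binomial coefficients on dummy regressors, taken from Table 7.
coefficients = {
    "Topic-generating patent": 0.367,
    "Corporate patent": 0.520,
    "University or Public Research Institute Patent": 0.529,
}


def incidence_rate_ratio(beta):
    """exp(beta): ratio of expected counts for dummy = 1 versus dummy = 0.

    ASSUMPTION: log-link negative binomial specification.
    """
    return math.exp(beta)


for name, beta in coefficients.items():
    irr = incidence_rate_ratio(beta)
    print(f"{name}: IRR = {irr:.2f} ({(irr - 1) * 100:.0f}% higher expected citations)")
```

Under that assumption, a topic-generating patent is associated with roughly 44% more expected forward citations than a follow-on patent, holding the other regressors fixed. The ratios for the two assignee dummies are relative to the omitted category, patents assigned to individuals.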
