
Paper to be presented at DRUID19 Copenhagen Business School, Copenhagen, Denmark June 19-21, 2019

A Firm Scientific Community

Stefano H Baruffaldi, University of Bath, School of Management; Felix Poege, Max Planck Institute for Innovation and Competition


February 28, 2019

ABSTRACT

The diffusion of scientific knowledge to industry is instrumental to technological change and productivity growth. In this paper, we investigate the extent to which firms participate in the activities of scientific communities and whether this participation facilitates the exchange and transfer of scientific knowledge to firms’ technological activities. We focus on two modes of interaction between firms and scientific communities: participation in and sponsorship of international scientific conferences. We exploit a newly constructed database comprising all conference proceedings in computer science from 1996 until 2015 from a specialized computer science database (DBLP), matched to Web of Science, Scopus and conference ranking information. We track knowledge transfers through patent and scientific citations to these proceedings. First, we document that both conference participation and sponsorship by firms are frequent and concentrated in the highest-quality events. Contributions of firms at conferences are of higher average quality, even within individual conferences. Second, we find that firms are significantly more likely to cite, in their patents and papers, the scientific articles presented at a conference they attended than comparable articles presented at other conferences. We use airline connectivity between researcher and conference locations in an instrumental variable strategy to establish causality.

KEYWORDS: Corporate Science, Conferences

1 Introduction

The way academic scientific knowledge diffuses to industry has consequences for technological change, productivity growth and distribution. Knowledge does not flow spontaneously, nor do firms wait passively downstream to receive it. Its diffusion is embodied in both informal and institutionalized voluntary interactions (Jaffe, 1989; Nelson, 1986). On the one hand, science is organized in scientific communities, united by a common interest in specific topics, that share knowledge systematically through publications in reference journals and by meeting regularly at international conferences (Cetina, 1999). On the other hand, firms’ activities in the realm of science may grant them direct access to these loci of upstream knowledge flows (Fleming and Sorenson, 2004). The extant literature has studied the motivations for and effects of firms’ investments in science (Cohen and D. Levinthal, 1990; Cohen and D. A. Levinthal, 1989; Rosenberg, 1990; Simeth and Raffo, 2013), but the effect of firms’ participation in the main activities of specific scientific communities remains relatively unexplored.

Traditional economic models have deemed firms’ investment in science partly irrational. A number of subsequent studies have pointed out that these investments serve several purposes: they may generate knowledge directly useful to the firm; they are a precondition for learning from external sources of knowledge (Cohen and D. Levinthal, 1990; Cohen and D. A. Levinthal, 1989); they constitute a strategy to establish connections and collaborations with academic institutions (Almeida, Hohberger, and Parada, 2011; Gittelman and Kogut, 2003); they increase the firm’s attractiveness to talent and its ability to retain it (Stern, 2004); and they may signal innovative capabilities (Simeth and Cincera, 2016). Interestingly, Arora, Belenzon, and Patacconi (2018) and Arora, Belenzon, and Sheer (2017) have recently noted that the average ratio between firms’ scientific publications and employees has declined.
At the same time, scientific knowledge remains highly valuable for the development of technologies, as revealed by the constant share of patent citations to scientific papers. The authors therefore suggest that firms increasingly focus on research activities that have direct value for the internal development of technologies or that give them access to external knowledge. The involvement of firms in scientific activities is often mentioned in the past literature; however, further theoretical investigation and, in particular, empirical evidence are scant (with the exception of Vlasov, Bahlmann, and Knoben, 2016).

In this paper, we investigate the effect of direct interactions of firms with scientific communities, as captured by firms’ attendance at scientific conferences. We measure firms’ conference attendance with two main indicators: the authorship of publications presented at conferences (henceforth “authorship”) and the sponsorship of scientific conferences (henceforth “sponsorship”). We speak of a firm’s “participation” when it takes part through authorship, sponsorship, or both. Both authorship and sponsorship represent substantial investments and indicate an actual engagement in the activities of a scientific community. Semi-structured interviews with firm participants at a conference confirm this assessment. We posit that these investments in authorship and sponsorship can be of primary importance.

First, both authorship and sponsorship respond to the need to comply with the norms of scientific communities, where active participation in the scientific discourse and the status of researchers and institutions play a fundamental role (Cetina, 1999; Gittelman, 2007). Second, geographical and technological boundaries usually constrain firms’ search possibilities (Fleming and Sorenson, 2004). The cosmopolitan nature of scientific communities, manifest in international conferences, may give firms a unique opportunity to overcome such boundaries (Gittelman, 2007). Third, numerous studies have demonstrated that physical proximity, even when temporary, can have a substantial impact on the diffusion of knowledge and the development of professional ties (Boudreau et al., 2014; Catalini, 2017).

Our contribution is twofold. Descriptively, we provide representative evidence on firms’ participation in scientific conferences. Causally, we provide evidence of the impact of participation in specific conferences on firms’ innovation activities.

We build a unique database on conferences in Computer Science (CS). We use the Digital Bibliography & Library Project (DBLP) to obtain a comprehensive list of conference proceedings publications in CS, with information on conference events and conference series. We obtain conference ranking information from the CORE conference portal. We merge this database with both Web of Science (WOS) and Scopus to obtain complementary bibliographic information and the list of conference sponsors. We focus on the period between 1996 and 2010. Combining these sources yields a dataset that is largely representative of all relevant CS conferences held worldwide in this period. Patent information is obtained from the EPO Worldwide Patent Statistical Database (PATSTAT).
We link non-patent literature (NPL) citations to scientific papers based on the database described in Knaus and Palzenberger (2018) and Poege et al. (2018). Finally, we match affiliation information, conference sponsors and patent applicants to firm names.

Our final dataset consists of 5523 firms that participate in at least one conference event, and 7503 conference events belonging to 1080 conference series. We find that firms’ participation in conferences, through both authorship and sponsorship, is a consistent phenomenon. Eighty percent of the firms in our sample have participated in at least one conference, and about 70% of conferences have at least one attending firm. The share of conferences with at least one corporate sponsor is about 18%. Both figures are stable over our period of observation. In addition, we find that firms participate in conferences of the highest quality and that their proceedings are also among the most cited within the conferences in which they participate.

We analyze the probability that a firm cites conference proceedings in patents and in other conference proceedings. We construct a counterfactual sample using proceedings presented at comparable conferences. To identify comparable conferences, we use exact matching, considering only conferences held in the same year, in the same subfield of research and at the same quality level. When more than five counterfactual conferences remain, we restrict the matched sample to the five most similar ones, ranked by a measure of similarity based on the likelihood of cross-citations between conference series before the focal year.

We estimate a two-stage IV regression model. In the first stage, we model the probability that a conference proceeding is presented at a given conference. To address endogeneity, we exploit exogenous variation arising from the presence of direct flight connections from the location of origin of the authors (henceforth “origin”) to the location of the conference. We build the airline connectivity network with data from the International Civil Aviation Organization (ICAO) and the US Bureau of Transportation Statistics (BTS). In the second stage, we study the probability that a firm cites a conference proceeding. Our setting allows us to control for fine-grained fixed effects (FE). These include pair FE for the combination of origin and research subfield; origin-year FE and firm-year FE, to control for idiosyncratic time-varying characteristics of a region and a firm; and firm-origin pair FE, to control for the pair-specific relevance of a region’s scientific activities for the firm. We can go as far as controlling for the combination of firm, origin and year FE. Importantly, within these specifications, the identifying variation derives mostly from changes in the location of conference series over time and from changes in airline connectivity for researchers in different regions and in different research fields within regions. Conditional on the set of FE, this variation is plausibly exogenous. To support this, we analyze, in an event-study framework, the number of participants from an origin at a conference series over time as a function of direct flight connectivity. The analysis shows that a direct flight connection has an immediate and statistically significant effect on the number of participants, which is not anticipated in previous years. The first-stage regression confirms this finding: the presence of a direct flight connection is a strong instrument for the probability that a conference proceeding is presented at a given conference.
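The construction of the counterfactual sample described above (exact matching on year, subfield and quality level, then keeping at most five conferences ranked by prior cross-citation similarity) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors’ code: the `Conference` record, the citation-count dictionary and the symmetric `cross_citation_share` score are hypothetical stand-ins for the paper’s similarity measure, whose exact definition is not given here, and all series names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conference:
    series: str    # conference series identifier (hypothetical names below)
    year: int
    subfield: str  # CORE subfield
    rank: str      # CORE quality level: "A*", "A", "B" or "C"


def cross_citation_share(focal, other, citations, year):
    """Assumed similarity score: mean of the share of focal's citations going to
    other and the share of other's citations going to focal, using only counts
    from years before `year`. `citations` maps (citing, cited, year) -> count."""
    def share(src, dst):
        past = {k: n for k, n in citations.items() if k[0] == src and k[2] < year}
        total = sum(past.values())
        to_dst = sum(n for k, n in past.items() if k[1] == dst)
        return to_dst / total if total else 0.0
    return (share(focal, other) + share(other, focal)) / 2


def comparable_conferences(focal, conferences, citations, k=5):
    """Exact match on year, subfield and rank; keep the k most similar if more remain."""
    target = (focal.year, focal.subfield, focal.rank)
    matches = [c for c in conferences
               if c.series != focal.series and (c.year, c.subfield, c.rank) == target]
    if len(matches) > k:
        matches.sort(
            key=lambda c: cross_citation_share(focal.series, c.series, citations, focal.year),
            reverse=True,
        )
        matches = matches[:k]
    return matches


# Usage with made-up data: seven exact matches and one conference from another year.
focal = Conference("FOCAL", 2005, "AI", "A*")
candidates = [Conference(f"S{i}", 2005, "AI", "A*") for i in range(7)]
candidates.append(Conference("OTHER", 2006, "AI", "A*"))  # different year: never matched
# FOCAL cited series S0..S6 between 1 and 7 times in 2003, i.e. before the focal year.
cites = {("FOCAL", f"S{i}", 2003): i + 1 for i in range(7)}
picked = comparable_conferences(focal, candidates, cites)
print([c.series for c in picked])  # ['S6', 'S5', 'S4', 'S3', 'S2']
```

The exact-match step alone already guarantees comparability in time, field and quality; the similarity ranking only decides which matches to keep when there are more than five.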
Returning to the IV regression model, we find in the second stage that proceedings presented at conferences in which firms participate have a causal effect on those firms’ future innovations. The effect is particularly strong for scientific citations, both to the proceedings presented at the conference and to previous proceedings by the same authors. The effect on patent citations is weaker and not always significant when considering citations to proceedings presented at the same conference; by contrast, it remains highly significant when previous proceedings by the same authors are also considered. One main concern is that scientific citations may merely reflect scientists being more likely to cite documents they have seen, in research projects they would have performed anyway, without any actual effect on their research trajectory. To dispel this doubt, we construct a measure of text similarity between conference proceedings and find equivalent results.

Section 2 discusses the context of CS in general and of conferences in particular. Section 3 introduces the data. Section 4 lays out our econometric strategy. Section 5 presents descriptive results and Section 6 shows the results of our causal approach. Section 8 concludes.
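The logic of the two-stage IV strategy can be illustrated with a stylized simulation in Python. Here a binary instrument (think of the availability of a direct flight) shifts an exposure variable that is also driven by an unobserved confounder, so OLS is biased while two-stage least squares recovers the true effect. This is a sketch, not the paper’s estimator: the data-generating process, coefficients and variable names are hypothetical, the variables are continuous rather than binary, and the high-dimensional fixed effects and standard errors of the actual specifications are omitted.

```python
import random


def ols_slope(x, y):
    """OLS slope of y on x with an intercept: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def two_stage_least_squares(z, x, y):
    """Just-identified 2SLS: one instrument z, one endogenous regressor x."""
    n = len(z)
    # First stage: regress x on z and build fitted values x_hat.
    pi = ols_slope(z, x)
    alpha = sum(x) / n - pi * sum(z) / n
    x_hat = [alpha + pi * zi for zi in z]
    # Second stage: regress y on x_hat (point estimate only; naive SEs would be wrong).
    return ols_slope(x_hat, y), pi


# Hypothetical data-generating process; the true effect of x on y is BETA.
rng = random.Random(42)
N, BETA = 20_000, 1.0
z = [float(rng.random() < 0.5) for _ in range(N)]  # instrument, e.g. direct flight available
u = [rng.gauss(0, 1) for _ in range(N)]            # unobserved confounder
x = [0.5 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]         # exposure
y = [BETA * xi + 2.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]  # outcome

b_ols = ols_slope(x, y)
b_iv, first_stage = two_stage_least_squares(z, x, y)
print(f"first stage {first_stage:.2f}, OLS {b_ols:.2f}, 2SLS {b_iv:.2f}")
```

With a single instrument, the second-stage slope equals cov(z, y) / cov(z, x), the Wald estimator. Because the confounder raises both exposure and outcome, the OLS slope lands near 2 in this setup, while the 2SLS estimate stays close to the true value of 1.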

2 Research context

We focus our analysis on Computer Science (CS). CS is a broad research field including a wide variety of heterogeneous subfields spanning the entire spectrum between basic and applied research (Ahmadpoor and Jones, 2017): from the physical properties of materials to the development of hardware and software. At the same time, the economic relevance of this field is undeniable. Most of the main recent and ongoing technological revolutions stem from it: from the development of computers to the Internet and Artificial Intelligence (AI). CS is also an ideal setting for our study for pragmatic reasons related to the role of conferences within the field. Conference proceedings constitute a primary outlet for the publication of CS research results (Franceschet, 2010). As a consequence, conference proceedings in CS are better represented in bibliographic databases than those of other research fields. This is convenient, as it allows us to observe conferences on a large scale through bibliographic information. In addition, most conference organizers attempt to ensure the participation of authors, for instance by conditioning the publication of a proceedings paper on the physical participation of at least one author in the conference. This ensures that the information obtained from conference proceedings largely reflects the actual composition of active participants.

CS is characterized, more than other research fields, by private investment in scientific activities. Scientific contributions in CS are also often cited in patents (Ahmadpoor and Jones, 2017). However, the field is not an outlier with respect to the relevance of scientific conferences for scientists (beyond the specific value of publishing conference proceedings) or to the presence of firms at scientific events. We explore this descriptively by examining the presence of firms at scientific conferences across fields, based on the affiliation information of conference proceedings in Scopus.¹ Figure 1 shows the share of proceedings with a firm affiliation across aggregated scientific fields.
In CS, 7.5% of all conference proceedings in Scopus are associated with firms. In Physics and Engineering, two other areas with large coverage of scientific publications in Scopus, 9.7% and 11.3% of proceedings, respectively, are associated with firms. The variation in fields with fewer covered conferences is larger, ranging from only 5.2% in Biochemistry/Genetics to 17.9% in Earth and Planetary Sciences. In light of this, we note that CS is not an outlier when it comes to the scientific involvement of firms.²

¹ The methodology to identify conference proceedings associated with firms extends the methodology discussed in Section 3 to all conference proceedings in Scopus beyond CS.

² Note that the overall coverage of conference proceedings in Scopus is unknown. While we can assert that coverage for CS is fairly representative of all the most relevant conference series (see Section 3), we cannot ensure that this is the case for other fields. These descriptives are only qualitatively indicative and should be interpreted with care.

[Figure 1 plot: share of firm affiliations (x-axis) by scientific field (y-axis). Fields in plot order: Biochemistry, Genetics; Agriculture; Medicine; Computer Science; Mathematics; Physics and Astronomy; Chemical Engineering; Engineering; Materials Science; Chemistry; Environmental Science; Energy; Earth Sciences.]

Notes: Fields are identified based on ASJC codes from Scopus, 1996-2015. The largest fields (millions of proceedings) are Engineering (2.19), CS (1.49), Physics (0.53), Mathematics (0.30), Materials Science (0.28) and Energy (0.18). The smallest fields are Agriculture (0.02), Biochemistry (0.02), Chemistry (0.03), Medicine (0.03) and Environmental Science (0.08). Fields with fewer than 15,000 items and fields in the social sciences or humanities are disregarded.

Figure 1: Share of conference proceedings with associated firm by field

2.1 Understanding firms’ attendance at scientific conferences

In this paper, we capture firms’ conference attendance on a large scale through data on the authorship of conference proceedings and on the sponsorship of conference events. We collected qualitative evidence to better understand the reality behind these indicators. We attended the European Conference on Computer Vision 2018 (ECCV, https://eccv2018.org/) in Munich. The ECCV is a large, biennial, A-ranked conference in computer vision, a subfield of AI. In 2018, about 3500 people attended. At the ECCV, we spoke with representatives of 13 firms and a small number of academic scientists. On the firm side, we conducted semi-structured interviews with both scientists and human resources (HR) or administrative personnel. We talked to US, Asian and German firms, investigating the processes taking place before, during and after conferences.

We could ascertain that firms’ participation in scientific conferences constitutes an actual firm-level investment and represents a genuine engagement in the activities of scientific communities. We also verified that both authorship and sponsorship directly correspond to real participation. The appearance of firm scientists as authors of proceedings largely coincides with their physical presence at the conference. Almost all firm scientists we interviewed declared that the acceptance of a conference proceeding on which they appear as authors is a sufficient condition to obtain permission and financial support to attend, regardless of which author presents the work. Participation without authorship is also possible but rarer. Sponsorship of the conference, in most cases, grants firms the right to set up a booth at the conference venue. One or more employees among HR personnel, engineers and scientists are normally responsible for representing the firm at the booth. The intensity of a firm’s participation in a conference event can vary considerably.
This is manifest first of all in the number of conference proceedings presented and, consequently, of scientists attending, which may range from one or a few to several dozen. Similarly, sponsorship is an additional, optional investment that may take different forms depending on the packages offered by the conference organizers. Some firms have very large booths, with several representatives present at all times, located in prominent positions within the conference venue. Especially large booths typically serve to provide information and disseminate material on the firm’s scientific activities, including the work presented at the conference by the firm’s scientists, career opportunities and vacancies, and may also present products. These sponsors often also organize dedicated lunch or dinner events. Smaller booths mostly feature information material, with only one or two firm employees present. A few sponsors are not physically present, but their advertisements are displayed. Pictures in Appendix C give an impression.

In summary, firms’ activities at a conference can be categorized into (i) scientific activities and (ii) branding and recruiting activities. These two categories relate to rather distinct underlying dynamics and processes. The former is reflected in the conference participation of scientists who present their work and typically interact with their academic and corporate peers. Generally, firm scientists conveyed the impression of a high degree of autonomy, with considerable freedom in deciding which conferences to attend and, to a large extent, what to present. Firm-level processes, mostly unknown to academic scientists, play a role mainly in the screening of work before the conference and in follow-up activities after it. Interestingly, the screening of work submitted to conferences is primarily a selection based on quality: most firms have internal systems in place to ensure that they present above-average scientific work. This screening also entails ensuring sufficient intellectual property protection. All firm scientists interviewed declared that, prior to a conference presentation, firm lawyers verify whether a patent application is necessary, to protect potentially valuable inventions and to avoid compromising the future option of obtaining a patent.³ Nonetheless, the work that firm scientists presented at the conference appeared mostly unrelated to current product development. Research closely related to product development is normally kept secret and performed by different organizational units. After a conference event, all firms appeared to have knowledge-sharing processes in place. Depending on the firm, these may take the form of informal activities, such as sharing references among colleagues (including those who did not attend the conference). More often, researchers were expected to write structured reports or to prepare presentations on the content of the conference, to be discussed in internal meetings.

Official recruiting and branding activities are mostly carried out by personnel at the conference booths and are directly connected with sponsorship. HR personnel, in particular, advertise job opportunities, mostly for PhDs and young researchers, meet potential candidates at the booth and schedule follow-up interviews after the conference.
Interestingly, most HR representatives we interviewed declared that the choice of which conferences to sponsor largely follows the scientists’ preferences about where to present their research. The HR units then prepare the material and define the main activities before the conference, and hold follow-up meetings after the conference to discuss outcomes and possible improvements. Finally, we observed that, despite being distinct, scientific and branding activities are not fully disconnected. On the one hand, the presence of scientists at the booths is planned in order to facilitate conversations with potential job candidates, who are often interested in discussing in detail the research developed by the firm. Moreover, informal interactions of scientists at the conference may also serve as a vehicle to reach and engage potential job candidates. On the other hand, the firm’s sponsorship and the personnel at its booths also advertise the scientific activities of the firm’s scientists more generally and can offer them organizational and logistical support.

To conclude, it is also possible to approximate the level of a firm’s investment in a conference. As a representative example, we take Google’s participation in the Neural Information Processing Systems (NIPS) conference in 2017, for which sponsorship costs are publicly available.⁴ Google was a second-tier sponsor of the conference, which corresponds to a price of 40’000$. Fifty-five proceedings by Google, authored by 86 distinct scientists, were presented at the conference. Being conservative, we can assume that 50% of these scientists attended the conference. In addition, we count five HR and administrative staff. The conference took place in Long Beach, California, and lasted six days. We assume a travel cost of 130$, a daily cost for accommodation and expenses of 200$ per person, and an average yearly wage of 120’000$, which we divide by 260 working days. We neglect expenses related to the preparation and submission of proceedings and other general costs for preparing the material, the booth and conference activities. This sums to approximately 260’000$.

³ In most patent jurisdictions, making an invention public generates prior art that jeopardizes the novelty of a subsequent patent application, even if the inventors and the authors of the publication are the same.

⁴ https://medium.com/syncedreview/a-statistical-tour-of-nips-2017-438201fb6c8a

The evidence discussed here is necessarily anecdotal. In particular, it is based on a single event, at which we focused our attention primarily on large firms. Moreover, we were told that the level of firm investment at ECCV and similar conferences has risen sharply in recent years. Nonetheless, we expect the type of firm activities carried out at other conferences to be similar and, while the level of investment may have varied over time and across subfields, the nature of these activities is likely to be the same. Most importantly, this evidence indicates that firms’ participation in conferences constitutes a substantial firm-level investment, which is well approximated by our quantitative data.
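The NIPS 2017 cost estimate is simple arithmetic, sketched below in Python with the stated assumptions as default parameters. The function and parameter names are illustrative. Rounding 50% of 86 authors to 43 attendees, and applying the six conference days to both wages and daily expenses, is a literal reading of the text rather than a documented calculation.

```python
def conference_cost(
    sponsorship=40_000.0,    # second-tier sponsorship fee ($)
    authors=86,              # distinct firm scientists on accepted proceedings
    attend_share=0.5,        # conservative share of authors who attend
    staff=5,                 # HR and administrative personnel
    days=6,                  # conference length (days)
    travel=130.0,            # travel cost per person ($)
    daily_expenses=200.0,    # accommodation and expenses per person per day ($)
    yearly_wage=120_000.0,   # average yearly wage ($)
    working_days=260,        # working days per year
):
    """Return (total cost, number of attendees) under the given assumptions."""
    attendees = round(authors * attend_share) + staff
    daily_wage = yearly_wage / working_days
    per_person = travel + days * (daily_expenses + daily_wage)
    return sponsorship + attendees * per_person, attendees


total, people = conference_cost()
print(f"{people} attendees, total {total:,.0f}$")
```

Read this way, the stated assumptions give roughly 237’000$ for 48 attendees, of the same order as the approximately 260’000$ reported. The text does not detail what accounts for the difference, and the parameters make it easy to explore alternative assumptions.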

3 Data

We combine different data sources, whose relationships to each other are documented in Figure 2. Table 1 summarizes the type of information obtained from each source.

Figure 2: Structure of the data sources

We obtain the core information on conference proceedings, conference titles, conference series, locations and dates from the Digital Bibliography & Library Project (DBLP, http://dblp.uni-trier.de/). This database specializes in proceedings and publications in CS, has very broad coverage (Cavacini, 2015)⁵ and, compared to other sources, contains more consistent conference and conference series identifiers. Additionally, DBLP supplies a high-quality author name disambiguation (Kim, 2018). However, other relevant bibliographic information is missing. We complement the DBLP data with information from WOS and Scopus on author affiliations and paper citations. Importantly, WOS and Scopus also contain the list of sponsors associated with a particular conference.

Data source | Variables
DBLP | Conference and conference series information, including place, time and presented papers; author disambiguation
CORE | Conference series quality ranking, subfield classification
WOS, Scopus | Affiliation information, citations, scientific classifications of articles, sponsorship information
SNPL data | NPL citations from patents to conference proceedings
PATSTAT | Patent information, applicant and inventor names and addresses
ICAO, BTS | Direct flight connections, airport regions
ORBIS, GRID, EU Scoreboards | Firm names, ownership structure, industry information

Table 1: Data sources.

The match between DBLP and the complete WOS and Scopus is done using the DOI and the cleaned title. Matches are verified using page numbers, publication years and author names, and only matches showing sufficient overlap are kept. The achieved coverage rate is displayed in figures ?? and ??: after 1996, rates of 70-90% are observed. The full Scopus database is only available to us from 1996 onwards, which explains the lack of coverage before that year. Without Scopus, the analysis would lack representativeness (WOS adds around 10% of coverage in all years), which forces us to restrict our period of analysis to 1996 onwards. Combining DBLP, WOS and Scopus yields the largest possible coverage of bibliographic information in CS in this period. Necessarily, we drop conferences and conference proceedings for which no match is found in either WOS or Scopus.

From the Computing Research and Education portal (CORE, http://www.core.edu.au/conference-portal), we take the classification of conference series into quality levels (A*, A, B, C and Unclassified) and into subfields of CS. CORE provides expert-based assessments of a comprehensive set of CS conferences. We match CORE to the DBLP conference series manually, supported by probabilistic string matching algorithms. We retain only conference series that match with CORE ranking information and drop conference series that are unclassified.

⁵ DBLP is found to have the highest coverage rate among specialized databases. WOS and Scopus have a higher coverage rate, but the information in DBLP can be expected to be more consistent.

We obtain patent-level information from PATSTAT. PATSTAT contains patent information for all major patent jurisdictions worldwide, including information on inventors and applicants, their locations, and patent citations. However, citations to scientific articles are only available as strings within the broader field of non-patent literature (NPL) citations. One cornerstone of our data effort is an additional dataset in which references to scientific articles in NPL citations are singled out and linked to bibliometric records in both WOS and Scopus (henceforth SNPL data). The preparation of this dataset is described in Knaus and Palzenberger (2018) and documented in greater detail in Poege et al. (2018). This dataset allows us to track citations from patents to the conference proceedings in our sample.

We match affiliations, sponsors and patent applicants with a custom firm database. Sources for firm data include ORBIS, the Global Research Identifier Database (GRID, http://www.grid.ac) and the EU scoreboards (https://ec.europa.eu/growth/industry/innovation/facts-figures/scoreboards_en). From ORBIS, we take US and German firms as well as firms associated with a patent in the proprietary firm-patent match by Bureau van Dijk. We select a list of important firms as targets. These firms tend to be large and established, but likely cover the relevant portion of the global CS industry as well as upstream and downstream firms. For the matching methodology, we borrow the approach of Autor et al. (2016), who use the search engine Bing to generate search results for firm names. The similarity of search results is used to judge whether two name strings refer to the same firm. Additionally, standard string similarity measures are used. The advantage of this methodology is its high tolerance for spelling mistakes and its ability to incorporate additional information found on the Internet, for example about merged and renamed firms. This information is combined in a supervised machine learning algorithm.⁶ We invest substantial manual post-processing effort, especially to link entities that occur in multiple databases.
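The pairwise features behind this kind of firm-name matching can be sketched in Python with the standard library: a string similarity between cleaned names and the overlap (Jaccard index) of the two names’ search results. This is a hedged illustration, not the authors’ pipeline: no search engine is queried (the result lists are hard-coded, hypothetical data), the legal-form list is illustrative, and the fixed thresholds in `same_firm` merely stand in for the trained classifier described in the text.

```python
import re
from difflib import SequenceMatcher

# Illustrative legal-form tokens to ignore; not the paper's cleaning rules.
LEGAL_FORMS = {"inc", "corp", "corporation", "ltd", "llc", "gmbh", "ag", "co", "sa", "plc"}


def normalize(name):
    """Lower-case, keep alphanumeric tokens, drop legal-form tokens."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)


def jaccard(a, b):
    """Overlap of two collections of search-result URLs."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def pair_features(name_a, name_b, results_a, results_b):
    """Features for one candidate pair of firm names."""
    return {
        "string_sim": SequenceMatcher(None, normalize(name_a), normalize(name_b)).ratio(),
        "search_overlap": jaccard(results_a, results_b),
    }


def same_firm(features, string_cut=0.9, overlap_cut=0.5):
    """Hypothetical threshold rule standing in for a trained classifier."""
    return features["string_sim"] >= string_cut or features["search_overlap"] >= overlap_cut


# Made-up top search results for two spellings of the same firm.
res_a = ["ibm.com", "research.ibm.com", "en.wikipedia.org/wiki/IBM", "linkedin.com/company/ibm"]
res_b = ["ibm.com", "research.ibm.com", "en.wikipedia.org/wiki/IBM", "twitter.com/ibm"]
feats = pair_features("International Business Machines Corp.", "IBM", res_a, res_b)
print(feats, same_firm(feats))
```

The example shows why search results help: the two names share almost no characters, yet their result sets overlap strongly, which string similarity alone would miss.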
We also code mergers and acquisitions so that the firm structure is roughly consistent over 1996-2012. We aggregate at the level of the corporate group.

Data on direct flights are obtained from the International Civil Aviation Organization (ICAO, https://www4.icao.int/newdataplus) and the US Bureau of Transportation Statistics (BTS, https://www.bts.gov/). The ICAO data cover international flights, but not domestic flights. The US is one of the most important geographic areas for scientific activity in CS, and flights are very important for US domestic travel. Hence, we use BTS data, which cover domestic US flights, as a complement. Both data sources come with a definition of market regions, usually the name of a city. When a region such as London, Paris or Milan has several airports, the airports therefore already come grouped at a reasonable level of aggregation. We geocode these market regions and subsequently use this level of analysis for conference and researcher locations. Each conference and each researcher affiliation is uniquely assigned to a market region: we choose the busiest market region (highest passenger volume) among the geographically closest candidates within a maximum distance of 100 km. We view this approach as pragmatic but conservative with respect to direct flight availability.

Table 2 provides an overview of the number of observations in our data. Merging DBLP with WoS/Scopus and CORE inevitably reduces the number of available observations.

6 As a basis, we use the Python package Dedupe. https://github.com/dedupeio/dedupe.

Table 2: Observation counts

                        (1) All    (2) WoS/Scopus   (3) With CORE   (4) ≤ 2010
Dataset
  Proceedings           1617817    1444813          1000540         621320
  Conference events     22404      20361            11344           7503
  Conference series     3767       3505             1136            1080
Firms
  All firms             –          10415            7673            5523
  Participants          –          9608             7126            5092
  Sponsors              –          2211             1450            1029

Notes: Observation counts for different matching steps. Column (1): all DBLP items. Column (2): DBLP items found in WoS or Scopus. Column (3): additionally restricted to conference series matched with CORE; this column is relevant for the descriptive part. Column (4): additionally restricted to 1996-2010; this is the estimation sample.

The sample is biased against small conference events and short conference series, which are likely not covered in generic databases such as WoS and Scopus, and against conference series of the lowest quality, which are less likely to be ranked in CORE. However, we believe that the data remain largely representative of all relevant conference events in CS in our period of observation. Table 2 also shows the difference between our estimation sample, covering 1996-2010, and our full sample, covering 1996-2015. Whenever possible, we show descriptive evidence for data until 2015. Citation-based variables require time windows in which the citations can be observed; we choose five-year windows. This truncation issue forces us to limit the descriptive evidence based on scientific and patent citations, as well as our econometric estimations, to the 1996-2010 sample. Our final dataset consists of a total of 7673 firms and 11344 conference events in the 1996-2015 period, pertaining to 1136 conference series. Overall, we work with more than one million papers presented at the conferences in our sample. Due to truncation in the main dependent variables and citation-based measures, regression analyses are restricted to the sample of conferences up to 2010, comprising 5523 firms and 7503 conference events pertaining to 1080 conference series, for a total of more than 600,000 conference proceedings.
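The assignment of conference and affiliation locations to airport market regions can be sketched in a few lines of Python. The sketch rests on two assumptions that the text does not spell out: distances are spherical great-circle (haversine) distances, and the rule is read as picking, among all market regions within 100 km, the one with the highest passenger volume. The coordinates are approximate city coordinates and the passenger counts are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def assign_market_region(lat, lon, regions, max_km=100.0):
    """Return the busiest market region within max_km of (lat, lon), or None."""
    candidates = [
        r for r in regions
        if great_circle_km(lat, lon, r["lat"], r["lon"]) <= max_km
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["passengers"])["name"]


# Illustrative market regions; passenger volumes are made-up numbers.
regions = [
    {"name": "London", "lat": 51.5074, "lon": -0.1278, "passengers": 80_000_000},
    {"name": "Birmingham", "lat": 52.4862, "lon": -1.8904, "passengers": 12_000_000},
]

# Oxford lies within 100 km of both regions; the busier one wins.
print(assign_market_region(51.7520, -1.2577, regions))  # London
print(assign_market_region(0.0, 0.0, regions))          # None
```

A location with no market region within 100 km stays unassigned, which is one way to read the "conservative" choice: no location is attached to a distant hub whose direct flights its researchers are unlikely to use.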

4 Econometric strategy

4.1 Econometric model

We study how firms' participation in scientific conferences affects their innovation activities, as measured by patent and scientific citations. A first challenge is that we only observe realized conference participations and actual citations. Second, the participation of both firms and scientists in conferences is likely endogenous. A firm would normally choose consciously which conferences to attend. At the same time, scientists, both from academia and from other firms, select conference events based on considerations about research focus, the quality of the conference and expectations about who the other participants may be. A simple comparison of a firm's citations to conferences with and without the firm among the participants would therefore be misguided.

We address these challenges, first, by constructing a sample of counterfactual conference proceedings. For each conference where a paper is presented, we create a group of similar conferences with the same year, ranking and subfield of research. When more than five conferences fulfill these criteria, we select the five with the highest share of cross-citation flows, as a further proxy for similarity. Accordingly, we consider the conference proceedings presented at these similar conferences as a valid counterfactual. This relies on the assumption that, given their quality, subfield of research and year of publication, proceedings within these pools of similar conferences could have been presented at any of the other matched conferences. We take the presence of firms at a conference as given: that is, we study the effect of the presence of the researcher on citations, conditional on the presence of the firm. Therefore, a conference proceeding is retained in the regression sample if a firm participates in the conference at which it was presented, or if the proceeding is linked by matching to a conference in which a firm participates. This shifts the matched sample slightly: about 20% of the full sample is dropped because no company is associated with the corresponding conference. By construction, about one out of six observations in the initial matched sample are actual participations. After focusing only on companies that are actually present, the share rises to about one in three.
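The construction of the counterfactual pools can be illustrated with a short Python sketch. The data structures, function names and cross-citation shares are hypothetical; in particular, the sketch assumes the cross-citation share is available as a lookup keyed by (focal conference, candidate conference). Each actual presentation is stacked with its counterfactual venues, which is why roughly one in six rows of the initial matched sample is an actual participation. The subsequent restriction to conferences where a firm is present is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conference:
    cid: str
    year: int
    rank: str   # CORE rank, e.g. "A*", "A", "B", "C"
    field: str  # CORE subfield


def counterfactual_pool(focal, conferences, cross_share, k=5):
    """Up to k conferences with the same year, rank and subfield as `focal`,
    preferring those with the highest cross-citation share with `focal`."""
    peers = [
        c for c in conferences
        if c.cid != focal.cid
        and (c.year, c.rank, c.field) == (focal.year, focal.rank, focal.field)
    ]
    peers.sort(key=lambda c: cross_share.get((focal.cid, c.cid), 0.0), reverse=True)
    return peers[:k]


def stacked_rows(paper_id, focal, pool):
    """One row per (paper, conference): D = 1 for the actual venue, 0 otherwise."""
    rows = [(paper_id, focal.cid, 1)]
    rows += [(paper_id, c.cid, 0) for c in pool]
    return rows


focal = Conference("C0", 2005, "A", "AI")
others = [Conference(f"C{i}", 2005, "A", "AI") for i in range(1, 8)]
others.append(Conference("X1", 2006, "A", "AI"))  # different year: never eligible
shares = {("C0", f"C{i}"): i / 10 for i in range(1, 8)}  # made-up shares

pool = counterfactual_pool(focal, [focal] + others, shares)
print([c.cid for c in pool])  # ['C7', 'C6', 'C5', 'C4', 'C3']
print(stacked_rows("p42", focal, pool))
```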
To further address endogeneity, we use an econometric model with fine-grained fixed effects (FE), and we use an instrumental variable approach to instrument the probability that a conference proceeding is presented at a given conference. As a source of exogenous variation, we use changes in airline connectivity from scientists' affiliation locations to conference venues. The cost of reaching a conference venue directly affects the probability of participating in a conference series and is determined by the geographic location of the conference in a given year and by the transport connections available to that location. Specifically, we use the presence of direct flights to conference venues as the instrumental variable. Direct flight connections tend to reduce costs and travel time and to eliminate layovers. Flight connections have been used before as explanatory variables for the behavior of both firms (Giroud, 2012) and scientists (Catalini, Fons-Rosen, and Gaulé, 2016). As in these studies, we have to consider that the introduction of direct flights may itself be a function of demand shocks that are specific to particular times and location pairs. Unlike in these studies, however, connectivity in our context relates to destinations that in most cases are a third location, distinct from the locations of firms and scientists. In other words, while we use connectivity between scientists' locations and conference venues as the instrument, we look at interactions between scientists and firms that are likely located in areas different from the conference venue. This feature also reduces concerns related to the exclusion restriction, i.e. the possibility that the instrument may affect the outcome variables through channels other than the endogenous variable of interest. In the following, we present our econometric model in detail and discuss more formally how we address identification concerns.

Figure 3: Empirical setup

We implement a two-stage regression model. The first-stage equation (4.1) models the probability that a proceeding $p$ is presented in a conference series $c$ in which firm $f$ participates. Presentation is a function of the presence of a direct flight between the location of the authors of $p$ and the location of $c$ in the year of the paper. Note that, as conference proceedings $p$ are nested in years, we do not need a separate index for years. Since the estimations are conditional on firm participation, the index $f$ is also relevant for variables that are defined at the proceeding-conference series level, such as $D\{p \text{ presented at } c\}_{fpc}$.

First stage:

$$D\{p \text{ presented at } c\}_{fpc} = \beta_1 D\{\text{direct flight}\}_{fpc} + \beta_2 X_{fpc} + u_{fpc} \tag{4.1}$$

In the second-stage equation (4.2), the dependent variables are proxies for the extent to which the innovation activities of firms rely on the research findings described in paper $p$. For instance, in our main set of models, we look at the number of citations by firm $f$ to proceeding $p$ (at the actual or counterfactual conference $c$).

Second stage:

$$\log\left(1 + N\{f \text{ citations to } p\}_{fpc}\right) = \gamma_1 D\{p \text{ presented at } c\}_{fpc} + \gamma_2 X_{fpc} + \varepsilon_{fpc} \tag{4.2}$$
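To make the logic of equations (4.1) and (4.2) concrete, the following Python sketch simulates data in which an unobserved confounder raises both the probability of presentation and the outcome, and then compares OLS with a manual two-stage estimate using a binary instrument. It is deliberately stripped down and its assumptions differ from our setting: there are no controls or fixed effects $X_{fpc}$, the outcome is a simulated continuous variable rather than $\log(1 + N)$, and all parameter values are invented. With one instrument and no controls, the manual two-stage slope equals the ratio cov(z, y)/cov(z, d).

```python
import random

random.seed(1)

N = 200_000
TRUE_GAMMA = 0.5  # causal effect of presentation on the outcome in the simulation

z, d, y = [], [], []
for _ in range(N):
    zi = 1 if random.random() < 0.5 else 0        # direct flight (instrument)
    ui = 1 if random.random() < 0.5 else 0        # unobserved confounder
    p = 0.2 + 0.2 * zi + 0.3 * ui                 # presentation probability
    di = 1 if random.random() < p else 0          # presented at c (endogenous)
    yi = TRUE_GAMMA * di + 0.8 * ui + random.gauss(0, 0.5)  # outcome
    z.append(zi)
    d.append(di)
    y.append(yi)


def cov(a, b):
    """Sample covariance of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (w - mb) for x, w in zip(a, b)) / (len(a) - 1)


# OLS slope of y on d: biased upward because u raises both d and y.
ols = cov(d, y) / cov(d, d)

# First stage: regress d on z and form fitted values.
b1 = cov(z, d) / cov(z, z)
a1 = sum(d) / N - b1 * sum(z) / N
d_hat = [a1 + b1 * zi for zi in z]

# Second stage: regress y on the fitted values.
two_sls = cov(d_hat, y) / cov(d_hat, d_hat)

print(f"first stage: {b1:.3f}, OLS: {ols:.3f}, 2SLS: {two_sls:.3f}")
```

Two caveats apply. Standard errors from a manually run second stage are not valid, because they ignore that the fitted values are themselves estimated; dedicated IV routines correct for this. And with high-dimensional fixed effects, the fixed effects in $X_{fpc}$ would have to be partialled out of all variables (or absorbed) before applying the same formulas.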

$X_{fpc}$ is a vector of control variables and FE controls. Since $X_{fpc}$ contains at least fixed effects for years, firms, proceeding locations and conference series, we omit the intercept in the specification.

To clarify our identification strategy, we represent our empirical setup in a stylized scenario in Figure 3. A researcher chooses between conferences A and B, which take place in two different locations. A direct flight connects the researcher's region of origin with the location of conference A; hence, the probability of attending A increases relative to B, for which no direct flight exists. The firms present at conference A and at conference B will differ, at least in part. Our test is whether a firm present at conference A has a higher likelihood of citing the researcher than a firm present at conference B. In an equivalent framing, we ask whether a firm participating in A is more likely to cite the researcher participating in A than a researcher participating in B. To the extent that the presence of a direct flight to conference A rather than B affects the probability of participation and can be considered exogenous to the likelihood of citations from the firm to the researcher, the model would identify a causal effect.

However, this example abstracts from the heterogeneity across researchers, firms and conferences, and from general time trends. For instance, in a naive approach without additional controls, we would have to worry that the most innovative firms, as well as the most productive researchers, are located in specific geographic regions that are likely better connected to the destinations where the best conferences take place, and better connected to each other. A positive coefficient would then likely reflect location in more innovative geographic areas rather than an effect of the conference. Accordingly, in our baseline specification we include FE for all the main panel dimensions in our data: firms, regions of origin of researchers, years and conference series. We further develop our model to account for pair-specific and time-specific variation that could bias the results. First, conference series normally focus on specific research subfields. While origin FE broadly control for the strength of a region in CS, regions may be specialized in these subfields. This is a concern to the extent that conference events do not take place in random locations, but more likely in locations that the communities they refer to can reach. To account for this possibility, we include region × subfield pair FE. Second, we want to account for the possibility that firms and researchers are co-located or located in proximate regions. A major concern, in this case, would be that connectivity to conference venues is correlated with the connectivity between the locations of the firm and of the researchers.
We include firm × region-of-origin pair FE to control generally for all pair-specific features of the firm and the location of the scientists. Moreover, we control for a set of pair-level variables related to geographic distance, which are detailed in the following section (4.2). We then focus on additional sources of variation related to the location of researchers that may imply a spurious correlation between the likelihood of researchers' participation and the existence of direct flights to conference venues. We account for local, time-specific shocks in researchers' regions. Economic and innovation trends may be region- and time-specific, and better infrastructure, including transportation networks, may follow such trends. The presence of direct flights would then correlate with the quantity and quality of scientific activity in a region. We can control for year × region-of-origin pair FE to fully absorb this variation. A second general concern relates to firm-level sources of variation over time. To control for firm-level shocks over time, we include year × firm FE. This set of controls captures a large part of the variation in the data, besides the variation generated by the conference location and by the emergence over time of new direct flight connections to conference venues for researchers in the relevant research fields. Finally, we can go as far as controlling for firm × region of researchers × year FE, which allows us to control for changes in the connectivity between firms and researchers over time. In this latter case, the only identifying variation derives from researchers within the same region but in different subfields, and from firms participating in multiple conferences. More specifically, firms may be exposed to different research from one specific region because of differences in how the region is connected to conferences in the different subfields in which the firm participates.

Due to the structure of our data and analyses, we have to account for serial correlation and within-group correlation in the standard errors. In our main specification, we cluster standard errors at the level of the region of origin of scientists. We show in appendix tables that alternative specifications with paper × firm clusters, firm-level clusters, or origin × conference location pair clusters provide equivalent results. In the appendix, we also report the comparison between OLS specifications, in which the variable of interest is treated as exogenous, and our model specification.
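Two ingredients discussed above, absorbing fixed effects and clustering standard errors by region of origin, can be illustrated on simulated data. The Python sketch below absorbs a single set of region FE by demeaning (the within transformation) and computes a cluster-robust sandwich variance with a simple G/(G-1) small-sample factor (one of several conventions in use). It is a simplification under stated assumptions: our specifications involve several interacted, high-dimensional FE, which require iterated demeaning or dedicated software, and the simulated data and all parameters are invented.

```python
import math
import random
from collections import defaultdict

random.seed(7)

# Simulated data: 200 origin regions (FE and cluster unit), 50 observations each.
rows = []
for g in range(200):
    alpha = random.gauss(0, 1)          # region fixed effect
    sigma = random.uniform(0.5, 2.0)    # region-specific error scale
    for _ in range(50):
        x = 0.5 * alpha + random.gauss(0, 1)  # regressor correlated with the FE
        y = 2.0 * x + alpha + random.gauss(0, sigma)
        rows.append((g, x, y))


def demean_by_group(rows):
    """Within transformation: subtract group means of x and y (one-way FE)."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for g, x, y in rows:
        totals[g][0] += x
        totals[g][1] += y
        totals[g][2] += 1
    return [(g, x - totals[g][0] / totals[g][2], y - totals[g][1] / totals[g][2])
            for g, x, y in rows]


dm = demean_by_group(rows)
sxx = sum(x * x for _, x, _ in dm)
beta = sum(x * y for _, x, y in dm) / sxx   # FE (within) estimate of the slope

# Cluster-robust sandwich variance: squared per-cluster sums of x * residual.
score = defaultdict(float)
for g, x, y in dm:
    score[g] += x * (y - beta * x)
G = len(score)
var_cr = (G / (G - 1)) * sum(s * s for s in score.values()) / sxx ** 2

# Naive pooled OLS without FE, for comparison (biased because x depends on alpha).
mx = sum(x for _, x, _ in rows) / len(rows)
my = sum(y for _, _, y in rows) / len(rows)
beta_pooled = (sum((x - mx) * (y - my) for _, x, y in rows)
               / sum((x - mx) ** 2 for _, x, _ in rows))

print(f"pooled OLS: {beta_pooled:.3f}, FE: {beta:.3f}, clustered SE: {math.sqrt(var_cr):.4f}")
```

The pooled estimate picks up the correlation between the regressor and the region effect, while the within estimate removes it; the clustered variance allows errors to be arbitrarily heteroskedastic and correlated within each region.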

4.2 Definition of variables

Dependent variables

Patent citations (Pat): We denote with the abbreviation Pat a first category of dependent variables based on citations from patent documents of the focal firm. It counts the number of citations received by a conference proceeding from the firm's patents filed after the conference. We aggregate SNPL citations at the patent family level and consider patent families with priority years up to five years after the conference year. For this reason, the variable is defined only for conferences up to 2010, as citations to conference proceedings of subsequent years could not be fully observed (truncation). Even this restriction may be insufficient to eliminate truncation. SNPL citations are added to patent families over time in subsequent publications of the patent (e.g. the grant publication and international filings), by both examiners and applicants. As grant lags of several years are not uncommon, many citations may remain unobserved for the latest conferences in our sample. We verified that our results are not affected by this issue by running regressions for samples of conferences up to 2009 and up to 2008.

Science citations (Sci): We denote with the abbreviation Sci a second category of dependent variables based on citations from other conference proceedings of the focal firm. In this case, too, we restrict the count of citations to a five-year window after the year of the conference. Truncation concerns are limited here, because all possible citations originating from subsequent conferences within this window are actually observed in our data.

We adopt two different variants of these variables. The first variant, labeled dir (Pat (dir) and Sci (dir), respectively), refers to the count of citations to the exact conference proceeding presented (or counterfactually presented) at the conference where a firm participates.
The second variant also takes into account citations to CS publications by the same authors as the focal conference proceeding, published in the previous five years. In other words, we consider citations to the recent bibliography of the authors of conference proceedings presented at a conference. Here too, we consider citations generated from the year of the conference in which the firm participates up to five years later; we exclude citations occurring before the conference. This variable is labeled bib (Pat (bib) and Sci (bib)). These variables have, first of all, a practical justification. Patent citations, in particular, are fairly rare events, so whether a conference proceeding is cited by a patent is a rather sparse variable. Considering the recent bibliography of the authors induces more variation in the variable. From a theoretical standpoint, this takes into account the possibility that, by learning from a given conference proceeding or from its authors, firms may develop innovations not strictly related to that proceeding, but rather within the broader line of research associated with it.

For science citations, we consider a third variant. We want to understand whether follow-on science by firms has technological relevance. To do so, we create patent-weighted science citation counts. For each conference proceeding p by a firm f which cites the focal conference proceeding, we count the number of patents which cite p. We consider patents with priority years from the publication year of p up to five years later. In a first version of this variable, we consider patents by all firms, f and others (Sci (PatW)). In a second version, we consider only patents by firm f (Sci (PatW, same)).

Finally, we consider a last variant specific to patent citations: indirect patent citations. A firm might not use the focal proceeding directly as a knowledge input for a patent, but instead keep track of the literature and use subsequent contributions. We therefore take all CS publications P published from the year of the conference up to five years later. For this variable, we count the number of patents by the firm which cite any publication P within five years of the publication of P. Consequently, we create one variable, Pat (indir). Like the bib variables above, this variable is meant to capture firms' innovation activities possibly triggered by exposure to the content of a given conference proceeding, but that do not lead to direct patent citations to the specific proceeding presented at the conference. In contrast to bib, however, this variable considers the future knowledge stream related to the focal conference proceeding.
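The difference between the dir and bib variants of the patent-citation outcome can be sketched in Python. The record layout and the firm and publication names are hypothetical, and several details are our assumptions rather than documented choices: windows are inclusive (priority years from the conference year to five years later, author publications from the five years before the conference), bib includes the focal proceeding itself, and both variables count distinct citing patent families.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatentCitation:
    family_id: str      # citing patent family (citations aggregated at this level)
    firm: str           # applicant firm (corporate group)
    priority_year: int
    cited_pub: str      # cited scientific publication


def pat_dir_and_bib(citations, firm, focal_pub, conf_year, author_pubs, window=5):
    """Return (Pat(dir), Pat(bib)) for one firm and one focal proceeding.

    author_pubs maps the authors' other CS publications to their publication
    year; only those from the five years before the conference enter bib.
    """
    recent_bib = {p for p, yr in author_pubs.items()
                  if conf_year - window <= yr < conf_year}
    targets_bib = {focal_pub} | recent_bib
    in_window = [
        c for c in citations
        if c.firm == firm and conf_year <= c.priority_year <= conf_year + window
    ]
    pat_dir = len({c.family_id for c in in_window if c.cited_pub == focal_pub})
    pat_bib = len({c.family_id for c in in_window if c.cited_pub in targets_bib})
    return pat_dir, pat_bib


cites = [
    PatentCitation("F1", "AcmeCorp", 2006, "p_focal"),
    PatentCitation("F1", "AcmeCorp", 2006, "p_old"),    # same family, counted once
    PatentCitation("F2", "AcmeCorp", 2008, "p_old"),    # cites the authors' earlier work
    PatentCitation("F3", "AcmeCorp", 2012, "p_focal"),  # outside the five-year window
    PatentCitation("F4", "OtherInc", 2007, "p_focal"),  # different firm
]
authors = {"p_old": 2003, "p_ancient": 1995}
print(pat_dir_and_bib(cites, "AcmeCorp", "p_focal", 2005, authors))  # (1, 2)
```

In the example, the bib count picks up family F2, which never cites the focal proceeding but cites the authors' recent work; this is the extra variation that motivates the bib variant.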

Figure 4: Outcome variables

Notes: Orange: Focal conference proceeding or bibliography of the authors of the focal conference proceeding. Black: Patents or scientific articles by the focal firm. Gray: Any scientific articles. Diamonds: Patents. Round rectangles: Scientific articles.

Main variables

Attendance. The main endogenous variable of interest is the actual presentation of a conference proceeding at a conference where a firm is present. This variable is constructed on the basis of the counterfactual matched sample described above (Section 4.1). It equals one if the paper is actually presented at the conference and zero otherwise.

Direct flight. This variable equals one if a direct flight is available between the region of origin of the researchers of a conference proceeding and the location of the conference venue. It is used as the instrumental variable in the first-stage regression (see Section 4.1).

Geographic distance controls. We include control variables that capture the geographic distance between the regions where researchers are located and the conference location. In particular, we control for the logarithm of the geographic distance, a dummy equal to one if the conference takes place in the same region as the researcher, and a dummy equal to one for locations within the US.7 Geographic distance is measured as the minimum great-circle (geodetic) distance between the conference venue and the locations of the authors of the paper.
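The distance controls can be computed as in the following Python sketch. Several points are assumptions for illustration: locations are the coordinates of the assigned market regions, distances use the spherical haversine formula rather than an ellipsoidal geodesic, the log distance is set to zero when the distance is zero (so that the same-region dummy captures that case), and the country-level dummy is omitted.

```python
import math

EARTH_RADIUS_KM = 6371.0  # spherical approximation


def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points in degrees."""
    (lat1, lon1), (lat2, lon2) = a, b
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    h = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def distance_controls(venue, author_locations):
    """Minimum venue-author distance, its log, and a Dist=0 dummy.

    Assumption: the log is set to 0 when the distance is 0, so that the
    Dist=0 dummy captures the same-region case.
    """
    d_min = min(great_circle_km(venue, loc) for loc in author_locations)
    same_region = d_min == 0.0
    log_dist = 0.0 if same_region else math.log(d_min)
    return {"dist_km": d_min, "log_dist": log_dist, "dist0": int(same_region)}


VANCOUVER = (49.2827, -123.1207)
SEATTLE = (47.6062, -122.3321)
LONDON = (51.5074, -0.1278)

print(distance_controls(VANCOUVER, [LONDON, SEATTLE]))  # nearest author region: Seattle
print(distance_controls(LONDON, [LONDON]))              # {'dist_km': 0.0, 'log_dist': 0.0, 'dist0': 1}
```

Taking the minimum over authors means a paper counts as "close" to a venue as soon as one co-author is close, which matches the wording "minimum ... distance between the conference venue and the locations of the authors".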

5 Descriptives

We present descriptive characteristics of our data. We focus on the geographic location of conferences, the growth trends of different subfields of research, and the frequency and types of firm participation. We can present reliable descriptives for the years 1996-2015. However, due to truncation, the estimation sample for citation-based regressions only contains the years 1996-2010. Figure 5a shows the distribution of conference locations over time and regions. While the number of conference proceedings covered in our dataset has grown substantially, non-US locations have become more prominent. In 1996, half of all conferences took place in the United States; by 2015, it was about a quarter.

In the appendix, we provide more detailed figures on conference locations and researcher locations. Figure 10 shows frequency-weighted scientist locations and Figure 11 shows frequency-weighted conference locations. Both figures are based on the final analysis sample, so that the locations correspond to the airport regions to which scientists and conferences are assigned. Interestingly, some regions are much more represented as conference locations than as origins of researchers. Generally, regions in Northern Europe, the Northern US and Asia tend to produce more papers than they host conferences, while the opposite is true for Southern Europe and the Southern US. Prominent examples are locations in Florida, Southern France and Hawaii. The strong presence of Vancouver as a conference location is also interesting and can be explained by the multi-year presence of the large AI conference "NIPS" in that location.

We use CORE data to group conferences into subfields of CS. Figure 5b shows the most relevant fields. The size of all fields, in terms of published conference proceedings, has increased over time, but the most prominent increases can be observed in Artificial Intelligence and Information Systems. The full list of CS subfields and their full names is available in table 11.

7 The latter control variable is introduced to account for a feature of our data, which contain complete flight information for US domestic flights but mostly only international flights for other countries.

Figure 5: Conference locations and fields. Panel (a): Conference locations (number of proceedings per year, 1996-2015, by region: USA, EU, Asia, Other). Panel (b): Size of different CS fields (number of proceedings per year; fields include AI, Communication, Data, Information Systems, Distributed Computing, Theory, Hardware and Software).

Notes: Only DBLP conferences with matched Web of Science/Scopus articles as well as CORE information are considered. Conferences in panel 5b are assigned to their first CORE field code.

Participation of firms in scientific conferences and their sponsorship of scientific conferences are frequent and consistent phenomena. Here and in all analyses below, we consider a proceeding to be a firm proceeding when it is associated with at least one firm through author affiliation. Figure 6a shows this at the conference level, considering the share of conferences with the participation of at least one firm; Figure 6b instead presents the share of proceedings. More than 80% of all conferences have at least one firm participant. […] Twenty percent of conferences have at least one firm as sponsor […]. As sponsorship is defined at the conference event level, this implies that larger events are more likely to be sponsored.

Proceedings can also be coauthored by scientists of different firms or by firm and academic scientists. In our dataset, around 1% of all proceedings are produced by two or more firms collaborating, […] and around 5% of all proceedings are produced by one or more firms collaborating with academia. Conditional on being a firm proceeding, less than 5% of proceedings are collaborations between firms, whereas around 20% (15% in 1995, 25% in 2015) are collaborations between firms and academia. Collaborations are somewhat more frequent for […] publications, but otherwise patterns are stable across ranks and CS fields.

We also find that conference proceedings are often related to firms' patents. In Figure 6, we show the prevalence of SNPL citations of firm patents towards conference proceedings. After 2010, the rate drops quickly, which is likely due to a residual effect of truncation.8 In particular, the share of conference proceedings cited by firms' patents decreases over time. However, the absolute number of conference proceedings cited by patents has increased, while the total number of conference proceedings has grown proportionally more.

Figure 6 panels: (a) Firm activity by conferences; (b) Firm activity by proceedings. Series: firm affiliation, cited by firm patent, firm sponsor. Vertical axes: share of conferences (a) and share of proceedings (b); horizontal axis: conference year.

Notes: Due to truncation, we restrict the data based on patent citations to 2010 and earlier. The sudden drop in 2010 in Figure 6a is still likely an artifact of truncation in the citation data.

Figure 6: Firm activity

Finally, we find that firm contributions are concentrated at events of high quality and stand out for quality also within those events. Figure 7a shows the share of firm contributions at conferences by conference rank. At the top outlets in terms of prestige and quality, A*, more than 15% of contributions are by firms. At levels A and B, around 10% of contributions are by firms, and at C-level conferences only around 7%. Figure 7b shows that, within conference ranks, firm proceedings are of particularly high quality. Within A* conferences, firm proceedings gather on average five more forward citations within five years than non-firm proceedings. These descriptive results are reinforced by the regression analysis presented in appendix table 12. As regressors, we include a dummy indicating whether at least one author is affiliated with a firm (Firm), a dummy indicating whether the conference at which a proceeding is presented is sponsored by a firm (Sponsor), and a dummy indicating whether the presenting firm is also a sponsor. In all regressions, we control for year FE. In columns (3) and (4), conference series FE capture time-invariant quality and field differences across conference series. In columns (5) and (6), conference event FE remove all variation except that within individual events.

8 Considering patent families with priority years up to five years after the conference year should allow us to observe all patent references to 2010 conference proceedings occurring within a five-year window. However, SNPL citations are added to patent families over time, also after the first patent filing (e.g. at grant or with international filings). Hence, the drop in 2010 likely still reflects truncation. In general, descriptive results towards the end of the sample have to be taken with great care, and we choose not to show citation rates after 2010.

Figure 7: Quality of firm contributions. Panel (a): Share of firm contributions by conference rank (A*, A, B, C), by year. Panel (b): Citation counts by firms, conference rank (A*, A, B, C, Unclassified), for proceedings without and with a firm author.

We find that conference proceedings authored by firms receive on average more citations. The coefficient decreases when controlling for conference series FE and conference event FE, but remains highly significant. This implies that firms tend to present research at conferences of the highest quality, but also that, within conference series and single conference events, research by firms receives more citations. Overall, scientific articles that are associated with at least one firm receive roughly 4.4% more citations than other proceedings at the same conference event (column 5).

The results on sponsorship suggest that corporate sponsorship is concentrated among high-quality conferences. However, within conference series, firms do not seem to sponsor events with a higher number of citations (compare columns 1 and 2 with columns 3 and 4). Since sponsorship is defined at the conference level, it cannot be included in columns 5 and 6. When the presenting and the sponsoring firm coincide, the paper receives especially many citations: column 6 shows that, within a conference event, proceedings where the firm is also a sponsor receive 11% more citations than proceedings where the firm does not sponsor.

These descriptive results do not imply any causality. For instance, it could be that firms sponsor especially to reinforce strong quality signals. It could also be the case that sponsoring creates an additional halo effect, which leads to more visibility and follow-on research. A priori, the descriptive finding on firm contributions is not given, as many ways of firm participation in conferences could be plausible. Firms could attend conferences without presenting their own research, to access knowledge or to contact researchers within the conference. […] If firms presented their own work without much additional consideration, the quality of their contributions would more likely be comparable to the overall body of research. On the contrary, this evidence further suggests that firms fully engage in the activities of scientific communities by participating in scientific conferences, which aligns with our qualitative findings in section 2.1. The descriptive evidence in this section demonstrates that their participation has been a significant and over-time consistent phenomenon.

6 Empirical analysis and results

6.1 First stage regression results: the effect of direct flights on conference participation

The first-stage regression results of our econometric model confirm a strongly significant effect of the instrumental variable. In Table 3, we report the first-stage regression results for our main model specification and for our first outcome of interest. First-stage regression results are almost identical for the other outcomes. In all specifications, we find a highly significant and positive coefficient for the variable Direct flight. The magnitude is economically meaningful: the existence of a direct flight increases the probability of a paper being presented at a conference by about 2 percentage points. Since, by construction,9 this probability is about 30% in the sample, this corresponds to a 6.6% increase in probability. The F-statistic for the excluded instrument always exceeds 20 and is often substantially higher, depending on the specification.
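As a back-of-the-envelope check of these magnitudes, the relative effect and the first-stage F-statistic can be approximated from the rounded entries in column (1) of Table 3. This is only an approximation: with rounded inputs the relative increase comes out at about 6.7% rather than 6.6%, and the reported F-statistics are computed from unrounded cluster-robust estimates. With a single excluded instrument, the F-statistic equals the square of the instrument's t-statistic.

```python
# Rounded values from Table 3, column (1).
coef = 0.020   # first-stage coefficient on Direct flight
se = 0.003     # its clustered standard error
base = 0.30    # approximate share of actual presentations in the sample

relative_increase = coef / base   # 2 percentage points relative to a 30% base
t_stat = coef / se
f_approx = t_stat ** 2            # one excluded instrument: F equals t squared

print(f"relative increase: {relative_increase:.1%}")    # relative increase: 6.7%
print(f"t = {t_stat:.2f}, implied F = {f_approx:.1f}")  # t = 6.67, implied F = 44.4
```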

9 Recall that we match each actual attendance with up to five counterfactual attendances. The attendance probability is nevertheless larger than one in six, as we only consider conferences that firms actually attended. We also consider only firm-year observations in which firms have at least one citation. This skews the dataset further towards actual conference participations.

                                 (1)        (2)        (3)        (4)        (5)
Dependent variable: Attendance; sample: Sci (dir)
Direct flight               0.020***   0.019***   0.019***   0.019***   0.020***
                             (0.003)    (0.003)    (0.003)    (0.003)    (0.003)
log(Dist from-to)          −0.030***  −0.029***  −0.030***  −0.030***  −0.030***
                             (0.002)    (0.002)    (0.002)    (0.002)    (0.002)
Domestic non-US             0.130***   0.131***   0.135***   0.134***   0.131***
                             (0.009)    (0.009)    (0.009)    (0.009)    (0.009)
Dist=0                     −0.079***  −0.070***  −0.074***  −0.077***  −0.076***
                             (0.022)    (0.021)    (0.021)    (0.021)    (0.021)
Firm FE                          Yes
Origin FE                        Yes
Year FE                          Yes        Yes
Conf Ser FE                      Yes        Yes        Yes        Yes        Yes
Origin × Field FE                           Yes        Yes        Yes
Year × Origin FE                                       Yes
Year × Firm FE                                         Yes
Origin × Firm FE                            Yes        Yes
Year × Origin × Firm FE                                           Yes        Yes
Cluster                       Origin     Origin     Origin     Origin     Origin
Number clusters                 1735       1676       1674       1673       1673
Adj. R-Square                  0.214      0.234      0.240      0.249      0.241
Observations                22123130   21996508   21996470   21789739   21789792

Table 3: First stage

Notes: Only firm-year observations with at least one non-zero y variable are considered. Estimates are conditional on firm participation in conferences, i.e. only the citation behavior of firms actually present at conferences is analyzed.

Despite the level of detail of our controls, some concerns about identification remain. In particular, highly specific shocks at the level of sub-field, region, year and firm cannot be ruled out, as this is ultimately the level of variation used for estimation. For instance, researchers in a specific sub-field and from a specific region might suddenly become more inclined to participate in the events of a particular conference series, and connectivity from the region to the conference location might respond to this change. This could happen either because airline companies foresee this demand or because conference organizers pick the conference location to reach these researchers more easily. In general, we deem this possibility unlikely, as the location of conferences and the emergence of new airline connections, conditional on our set of FE controls, are more likely driven by broader and exogenous supply shocks. To provide further evidence of the validity of our approach, we explore the dynamic effects of the presence of a direct flight connection to a conference series on the participation of researchers, in an event-study framework. The methodology and graphical results are presented in Appendix A. Importantly, we find no evidence that the presence of a direct flight connection to a conference series venue is anticipated by participation of researchers in the previous year. The number of participants from a region to a conference series increases sharply in the year in which a direct flight is present.

6.2 Second stage regression results: the effect of conference participation on firms’ innovation

We present the results of the second-stage regressions in Tables 4 to 10. These tables present results, in order, for the following dependent variables: direct citations from patents (Pat (dir)); citations from patents to the recent bibliography of the authors (Pat (bib)); direct citations from conference proceedings (Sci (dir)); citations from conference proceedings to the recent bibliography of the authors (Sci (bib)); direct citations from conference proceedings, weighted by patent citations by any firm (Sci (PatW)); direct citations from conference proceedings, weighted by patent citations by the same firm (Sci (PatW, same)); and indirect patent citations via any conference proceeding (Pat (indir)). For details, see Section 4.2. In each table, column 1 presents the baseline model with simple FE for firms, origin of researchers, year and conference series. In column 2, we add FE for the combinations of origin and research sub-field and of origin and firm. In column 3, we add time-specific FE for the origin and for the firm. In columns 4 and 5, we include FE for the combination of year, origin and firm. Unlike column 5, column 4 retains the FE for the combination of origin and research sub-field.
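The nesting of the five columns can be made explicit by writing each specification as a set of fixed-effect groupings, read off the FE rows of Tables 4 to 10. This is a minimal sketch with hypothetical variable names (firm, origin, year, field, conf_series) and toy rows that are not from our data; in estimation, such interacted FE would typically be absorbed by a high-dimensional FE routine rather than built by hand.

```python
# Hypothetical encoding of the FE structure of columns (1)-(5) in Tables 4 to 10.
# Each FE is the tuple of dimensions it interacts.
SPECS = {
    1: {("firm",), ("origin",), ("year",), ("conf_series",)},
    2: {("year",), ("conf_series",), ("origin", "field"), ("origin", "firm")},
    3: {("conf_series",), ("origin", "field"), ("year", "origin"),
        ("year", "firm"), ("origin", "firm")},
    4: {("conf_series",), ("origin", "field"), ("year", "origin", "firm")},
    5: {("conf_series",), ("year", "origin", "firm")},
}


def encode_fe(rows, dims):
    """Map each row's combination of `dims` to a dense integer group id."""
    ids = {}
    return [ids.setdefault(tuple(row[d] for d in dims), len(ids)) for row in rows]


# Toy observations (hypothetical, for illustration only)
rows = [
    {"firm": "F1", "origin": "BOS", "year": 2005, "field": "AI", "conf_series": "C1"},
    {"firm": "F1", "origin": "BOS", "year": 2006, "field": "AI", "conf_series": "C2"},
    {"firm": "F2", "origin": "SFO", "year": 2005, "field": "DB", "conf_series": "C1"},
]

# Number of distinct groups per FE in each column
for col, fes in sorted(SPECS.items()):
    n_groups = {" x ".join(fe): len(set(encode_fe(rows, fe))) for fe in fes}
    print(col, n_groups)

# Column 4 keeps every FE of column 5 and adds origin x field
print(SPECS[5] <= SPECS[4])  # True
```

Representing columns as sets makes the one difference between columns 4 and 5 a single set difference.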

                                 (1)        (2)        (3)        (4)        (5)
Dependent variable: Pat (dir)
Attendance                     0.004      0.004     −0.001      0.000      0.000
                             (0.003)    (0.003)    (0.003)    (0.003)    (0.003)
1st Stage Controls               Yes        Yes        Yes        Yes        Yes
Firm FE                          Yes
Origin FE                        Yes
Year FE                          Yes        Yes
Conf Ser FE                      Yes        Yes        Yes        Yes        Yes
Origin × Field FE                           Yes        Yes        Yes
Year × Origin FE                                       Yes
Year × Firm FE                                         Yes
Origin × Firm FE                            Yes        Yes
Year × Origin × Firm FE                                           Yes        Yes
Cluster                       Origin     Origin     Origin     Origin     Origin
Number clusters                 1732       1661       1659       1658       1660
Adj. R-Square                  0.006      0.006      0.010      0.027      0.026
Observations                14571177   14540485   14540433   14458552   14458636
F (Cluster)                     40.9       41.0       40.9       42.4       43.2

Table 4: Direct citations (Patents)

Notes: Only firm-year observations with at least one non-zero y variable are considered. Estimates are conditional on firm participation in conferences, i.e. only the citation behavior of firms actually present at conferences is analyzed.

                                 (1)        (2)        (3)        (4)        (5)
Dependent variable: Pat (bib)
Attendance                   0.081**    0.070**    0.065**    0.071**   0.105***
                             (0.038)    (0.035)    (0.028)    (0.028)    (0.029)
1st Stage Controls               Yes        Yes        Yes        Yes        Yes
Firm FE                          Yes
Origin FE                        Yes
Year FE                          Yes        Yes
Conf Ser FE                      Yes        Yes        Yes        Yes        Yes
Origin × Field FE                           Yes        Yes        Yes
Year × Origin FE                                       Yes
Year × Firm FE                                         Yes
Origin × Firm FE                            Yes        Yes
Year × Origin × Firm FE                                           Yes        Yes
Cluster                       Origin     Origin     Origin     Origin     Origin
Number clusters                 1637       1575       1575       1573       1573
Adj. R-Square                  0.106      0.153      0.162      0.180      0.165
Observations                14813844   14754562   14754520   14620303   14620355
F (Cluster)                     38.3       38.8       38.9       41.8       43.3

Table 5: Bibliography citations (Patents)

Notes: Only firm-year observations with at least one non-zero y variable are considered. Estimates are conditional on firm participation in conferences, i.e. only the citation behavior of firms actually present at conferences is analyzed.

                                 (1)        (2)        (3)        (4)        (5)
Dependent variable: Sci (dir)
Attendance                     0.009    0.016**    0.014**    0.017**    0.016**
                             (0.007)    (0.006)    (0.006)    (0.007)    (0.007)
1st Stage Controls               Yes        Yes        Yes        Yes        Yes
Firm FE                          Yes
Origin FE                        Yes
Year FE                          Yes        Yes
Conf Ser FE                      Yes        Yes        Yes        Yes        Yes
Origin × Field FE                           Yes        Yes        Yes
Year × Origin FE                                       Yes
Year × Firm FE                                         Yes
Origin × Firm FE                            Yes        Yes
Year × Origin × Firm FE                                           Yes        Yes
Cluster                       Origin     Origin     Origin     Origin     Origin
Number clusters                 1735       1676       1674       1673       1673
Adj. R-Square                  0.044      0.050      0.055      0.053      0.051
Observations                22123130   21996508   21996470   21789739   21789792
F (Cluster)                     49.4       45.7       46.4       47.1       49.6

Table 6: Direct citations (Science)

Notes: Only firm-year observations with at least one non-zero y variable are considered. Estimates are conditional on firm participation in conferences, i.e. only the citation behavior of firms actually present at conferences is analyzed.

                                 (1)        (2)        (3)        (4)        (5)
Dependent variable: Sci (bib)
Attendance                   0.175**   0.172***    0.132**   0.150***   0.232***
                             (0.073)    (0.065)    (0.056)    (0.056)    (0.060)
1st Stage Controls               Yes        Yes        Yes        Yes        Yes
Firm FE                          Yes
Origin FE                        Yes
Year FE                          Yes        Yes
Conf Ser FE                      Yes        Yes        Yes        Yes        Yes
Origin × Field FE                           Yes        Yes        Yes
Year × Origin FE                                       Yes
Year × Firm FE                                         Yes
Origin × Firm FE                            Yes        Yes
Year × Origin × Firm FE                                           Yes        Yes
Cluster                       Origin     Origin     Origin     Origin     Origin
Number clusters                 1639       1591       1591       1589       1589
Adj. R-Square                  0.233      0.295      0.316      0.324      0.301
Observations                21740193   21480533   21480507   21164046   21164080
F (Cluster)                     39.8       38.7       38.7       41.9       44.0

Table 7: Bibliography citations (Science)

Notes: Only firm-year observations with at least one non-zero y variable are considered. Estimates are conditional on firm participation in conferences, i.e. only the citation behavior of firms actually present at conferences is analyzed.

                                 (1)        (2)        (3)        (4)        (5)
Dependent variable: Sci (PatW)
Attendance                    −0.001      0.001      0.003      0.003      0.003
                             (0.004)    (0.004)    (0.004)    (0.005)    (0.004)
1st Stage Controls               Yes        Yes        Yes        Yes        Yes
Firm FE                          Yes
Origin FE                        Yes
Year FE                          Yes        Yes
Conf Ser FE                      Yes        Yes        Yes        Yes        Yes
Origin × Field FE                           Yes        Yes        Yes
Year × Origin FE                                       Yes
Year × Firm FE                                         Yes
Origin × Firm FE                            Yes        Yes
Year × Origin × Firm FE                                           Yes        Yes
Cluster                       Origin     Origin     Origin     Origin     Origin
Number clusters                 1636       1577       1576       1574       1574
Adj. R-Square                  0.011      0.015      0.019      0.033      0.032
Observations                12935845   12902385   12902345   12827388   12827431
F (Cluster)                     38.8       38.4       39.6       41.7       43.6

Table 8: Patent-weighted science citations

Notes: Only firm-year observations with at least one non-zero y variable are considered. Estimates are conditional on firm participation in conferences, i.e. only the citation behavior of firms actually present at conferences is analyzed.

                                 (1)        (2)        (3)        (4)        (5)
Dependent variable: Sci (PatW, same)
Attendance                    −0.001      0.000      0.002      0.003      0.003
                             (0.003)    (0.003)    (0.003)    (0.003)    (0.003)
1st Stage Controls               Yes        Yes        Yes        Yes        Yes
Firm FE                          Yes
Origin FE                        Yes
Year FE                          Yes        Yes
Conf Ser FE                      Yes        Yes        Yes        Yes        Yes
Origin × Field FE                           Yes        Yes        Yes
Year × Origin FE                                       Yes
Year × Firm FE                                         Yes
Origin × Firm FE                            Yes        Yes
Year × Origin × Firm FE                                           Yes        Yes
Cluster                       Origin     Origin     Origin     Origin     Origin
Number clusters                 1633       1567       1567       1565       1565
Adj. R-Square                  0.008      0.014      0.017      0.031      0.030
Observations                 8691380    8678394    8678341    8644087    8644162
F (Cluster)                     35.2       34.1       34.8       36.6       38.0

Table 9: Patent-weighted science citations (patents by same firm)

Notes: Only firm-year observations with at least one non-zero y variable are considered. Estimates are conditional on firm participation in conferences, i.e. only the citation behavior of firms actually present at conferences is analyzed.

                                 (1)        (2)        (3)        (4)        (5)
Dependent variable: Pat (indir)
Attendance                     0.003      0.006     0.012*    0.014**    0.016**
                             (0.009)    (0.008)    (0.007)    (0.007)    (0.007)
1st Stage Controls               Yes        Yes        Yes        Yes        Yes
Firm FE                          Yes
Origin FE                        Yes
Year FE                          Yes        Yes
Conf Ser FE                      Yes        Yes        Yes        Yes        Yes
Origin × Field FE                           Yes        Yes        Yes
Year × Origin FE                                       Yes
Year × Firm FE                                         Yes
Origin × Firm FE                            Yes        Yes
Year × Origin × Firm FE                                           Yes        Yes
Cluster                       Origin     Origin     Origin     Origin     Origin
Number clusters                 1636       1573       1572       1570       1570
Adj. R-Square                  0.036      0.047      0.055      0.068      0.064
Observations                12680404   12648058   12648014   12561543   12561599
F (Cluster)                     35.3       37.2       37.9       40.5       41.8

Table 10: Indirect patent citations

Notes: Only firm-year observations with at least one non-zero y variable are considered. Estimates are conditional on firm participation in conferences, i.e. only the citation behavior of firms actually present at conferences is analyzed.

All outcome variables are in logs. Therefore, all reported coefficients are semi-elasticities, indicating the effect of a shift from not observing the focal conference proceeding to observing it. With respect to patenting activity, we find no effect on patent citations to the focal conference proceeding (Table 4). In all specifications, we can reject effects larger than ±1%. Instead, we find significant and positive coefficients for citations to the recent bibliography of the authors of these conference proceedings (Table 5). Point estimates vary between 6.5% and 10.5%, and results are robust across specifications. Scientific activities of firms are affected both when looking at citations from firms' conference proceedings to the focal conference proceeding (Table 6) and to the bibliography of its authors (Table 7). Significance levels and magnitudes of coefficients are stable in both tables, except in column 1 of Table 6, where a smaller coefficient is not statistically significant. The results indicate effect sizes of 1.4-1.7% for direct science citations and 13.2-23.2% for bibliography citations. When introducing patent weights to the science citations (Tables 8 and 9), results are similar to those for direct patent citations: coefficients are small and insignificant, and we can typically reject effects larger than ±1%. For indirect patent citations (Table 10), coefficient sizes and significance levels are mixed. Columns 1 and 2 have small coefficients that cannot be distinguished from zero, whereas columns 3-5 feature effect sizes of 1.2-1.6%, comparable to direct science citations.
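Two pieces of arithmetic sit behind this reading. The ±1% bound follows from a 95% confidence interval: with standard errors of 0.003 in Table 4, even the largest coefficient (0.004) has an upper bound just below 0.01. A coefficient β on the binary attendance indicator is a log-point change; for log(y) outcomes the exact proportional change is exp(β) − 1, which matters only for the larger bibliography effects. This is a sketch under a normal approximation and assumes log(y) outcomes; if the outcomes are log(1 + y), as is common with count data, the percentage reading is a further approximation.

```python
import math

Z_95 = 1.96  # two-sided 95% critical value of the standard normal


def ci95(coef: float, se: float) -> tuple:
    """Normal-approximation 95% confidence interval."""
    return coef - Z_95 * se, coef + Z_95 * se


def exact_pct_change(beta: float) -> float:
    """Exact proportional change in y implied by a dummy coefficient when the outcome is log(y)."""
    return math.exp(beta) - 1


# Table 4, largest coefficient (0.004, SE 0.003): the upper bound stays below 1%
low, high = ci95(0.004, 0.003)
print(f"[{low:.4f}, {high:.4f}]")        # [-0.0019, 0.0099]

# Table 7, column 5 (0.232): log-point vs. exact proportional reading
print(f"{exact_pct_change(0.232):.3f}")  # 0.261
```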
The results constitute substantial evidence that exposure to conference proceedings and conference events influences firms' innovation, in both scientific and technological activities. Yet firm science and firm patenting do not seem to be fully intertwined. When weighting firms' scientific publications by forward patent citations to these publications (Tables 8 and 9), we do not find significant results. This suggests that the scientific publications that firms produce as a result of conference participation are not more likely to be cited by patents than the firms' other scientific publications. Moreover, the effect on patenting activities is more evident when looking at the broader set of citations to the recent bibliography of the authors, or to subsequent research related to the focal article. This is compatible with the idea that technological activities build on a larger pool of scientific contributions and relate less directly to a single conference proceeding observed at a conference. We discuss the implications of these findings in detail in Section 8.

7 Robustness: Text similarity analysis

In this paper, we claim that patent SNPL and scientific citations are signs of learning and follow-on research. However, in the literature, there is the concern that citations may instead reflect strategic behavior or salience. This concern applies especially to scientific citations as scientists may cite a conference proceeding only because they have seen it and not because it actually constituted a knowledge input. We alleviate this concern by showing that besides generating additional citations, observing a paper at a conference changes the material content of the research subsequently carried out by the firm.

In our text similarity analysis, we study scientific articles published by the firm in conferences of the same field in subsequent years. We show that they are more similar to the articles that the firm observed, again employing our instrumentation strategy. For each focal conference proceeding, we find all conference proceedings with an abstract in WoS/Scopus that are presented at a conference in years t + 1 to t + 5 and are authored by a firm. These comparison proceedings must fall into the same CORE field, but may be published at conferences of different quality. In total, this yields about 1.5 billion comparisons. After the comparisons, each firm × year × field group may contain several relevant articles with similarity scores. We then calculate the mean and the maximum similarity within these groups and run regressions on those. The similarity measure is the cosine similarity between reduced term frequency–inverse document frequency (tf-idf) vectors of the abstracts and titles. Results are presented in the appendix and discussed here. The instrumental variable regressions show that the average similarity score is significantly higher for papers that could be observed due to flight connections than for those that could not. Table ?? shows that the increase in the mean similarity is almost 550 in the first year and falls to values between 300 and 400 in subsequent years. For the maximum similarity, Table ?? shows an increase of about 950 in the first year and of 550-770 subsequently. These effects are sizable compared with the overall distribution: the t + 1 effects amount to 0.85 (mean) and 0.72 (maximum) standard deviations. This is a reassuring pattern: the most similar paper that a firm develops becomes substantially more similar to the observed paper, and the content shift can be observed even at the mean.
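A minimal sketch of the similarity computation follows. It assumes whitespace tokenization and the plain tf-idf weighting raw count × log(N/df). Our actual preprocessing and the dimensionality reduction implied by "reduced" tf-idf are not reproduced here, and the three toy documents are hypothetical. The mean and maximum over a group correspond to the two outcomes described above.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse tf-idf vectors: raw term count times log(N / document frequency)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n_docs / df[t]) for t, c in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_and_max(scores):
    """Group-level aggregates used as regression outcomes."""
    return sum(scores) / len(scores), max(scores)


# Toy example: one observed (focal) paper and two later papers by the same firm
docs = [
    "image segmentation with deep networks for autonomous driving".split(),  # focal
    "deep networks for image segmentation in video".split(),                # firm paper 1
    "query optimization in distributed databases".split(),                  # firm paper 2
]
vecs = tfidf_vectors(docs)
scores = [cosine(vecs[0], v) for v in vecs[1:]]
mean_sim, max_sim = mean_and_max(scores)
print(f"scores={[round(s, 3) for s in scores]} mean={mean_sim:.3f} max={max_sim:.3f}")
```

The second firm paper shares no terms with the focal paper, so its similarity is exactly zero and the mean falls below the maximum.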
The effect is especially large in the short run, which is expected for CS conferences, where papers turn over very quickly. But the effect persists over a longer time horizon, reinforcing our belief that firms' participation in the scientific community has real and relevant effects on their R&D activities.

8 Conclusion

The interactions between firms and scientific communities have attracted the attention of economists in previous studies. However, there has been limited empirical investigation of firms' participation in international scientific conferences and its effect on the diffusion of scientific knowledge. Whether this is a consistent phenomenon or a marginal by-product of firms' scientific activities is a simple descriptive question that has so far remained unanswered. Moreover, previous research has found evidence that firms' investment in basic research has declined, while scientific knowledge remains highly relevant for technology.

In this paper, we investigate the extent to which firms participate in scientific conferences, actively by presenting research articles or by sponsoring conference events. We leverage a newly constructed large-scale database of conference series in CS between 1996 and 2015. We look at the participation behavior of the global population of larger firms active in research and patenting. We find that both active participation and sponsorship are consistent phenomena. More than 60% of conferences have at least one corporate participant and about 18% are sponsored by at least one firm, stably over our period of observation. This, together with the qualitative and anecdotal evidence that we collected, suggests that firms actively invest and engage in the activities of scientific communities.

In our main empirical analysis, we exploit plausibly exogenous variation in the likelihood that a firm is exposed to different research results, derived from airline connectivity between the researchers' origin and the location where conferences take place. We find evidence for the hypothesis that the scientific results firms come into contact with at conferences influence their innovation activities. Firms are more likely to cite, in their patents and in their scientific contributions, research material they have been exposed to at conferences where they actively participated. Research material includes presented papers as well as previous contributions by the authors of presented papers.

Beyond the general finding of technological learning through exposure, the results are more nuanced. We do not find that firms increase their patenting directly based on the exact proceedings they were exposed to at conferences. Instead, we find substantial indirect channels through which the firms' patenting benefits. We find strong evidence that firms increasingly patent based on previous research of the authors they encounter. Research that builds on the proceedings the firms encountered is also increasingly used in the firms' patents. A possible conclusion is that for technological development, knowing and understanding research trends and streams is more important than encountering particular research results embedded in a single paper. Similarly, personal interactions may be especially important for technology development. For science, both personal interactions and exposure to a single paper seem to matter.
We find that even though exposure to scientific findings at conferences is conducive to both scientific and technological follow-on research, firm science is not necessarily just a means to a technological end. In such a linear view, follow-on scientific research by firms is mostly necessary to break frontier knowledge down to an applicable form. A firm then uses patents to protect the subject matter of this applied research, or technological developments based on it. Science would then be a means towards technology, and both activities would go hand in hand. Our results already show that there is more firm science derived from materials the firm was exposed to at conferences (direct science citations). If science led to technology linearly, we would also see more patents based on materials the firm was exposed to at conferences (direct patent citations), and more patents based on the firm's own follow-on science (patent-weighted direct science citations). Our results reject both conjectures: participation does not trigger firm scientific activities that are particularly more (or less) technologically relevant. While preliminary, these findings suggest that interactions between scientific communities and the private sector remain strong and shape the way scientific knowledge flows to firms, potentially affecting their innovative capacity. Several avenues for further research remain open. In particular, we plan to consider explicitly the determinants of firms' strategic decision to present research results to a given scientific community, and the effect of this decision on the research and technological trajectories of academic scientists and other firms.

References

Ahmadpoor, Mohammad and Benjamin F Jones (2017). “The dual frontier: Patented inventions and prior scientific advance.” In: Science 357.6351, pp. 583–587.

Almeida, Paul, Jan Hohberger, and Pedro Parada (2011). “Individual scientific collaborations and firm-level innovation.” In: Industrial and Corporate Change 20.6, pp. 1571–1599.

Arora, Ashish, Sharon Belenzon, and Andrea Patacconi (2018). “The decline of science in corporate R&D.” In: Strategic Management Journal, n/a–n/a.

Arora, Ashish, Sharon Belenzon, and Lia Sheer (Feb. 2017). Back to Basics: Why do Firms Invest in Research? 23187. National Bureau of Economic Research.

Autor, David, David Dorn, Gordon H. Hanson, Gary Pisano, and Pian Shu (Dec. 2016). Foreign Competition and Domestic Innovation: Evidence from U.S. Patents. Working Paper 22879. National Bureau of Economic Research.

Boudreau, Kevin J., Tom Brady, Ina Ganguli, Patrick Gaule, Eva Guinan, Tony Hollenberg, and Karim Lakhani (2014). “A Field Experiment on Search Costs and the Formation of Scientific Collaborations.” In: SSRN Electronic Journal.

Catalini, Christian (2017). “Microgeography and the Direction of Inventive Activity.” In: Management Science, p. 59.

Catalini, Christian, Christian Fons-Rosen, and Patrick Gaulé (2016). “Did Cheaper Flights Change the Direction of Science?”

Cavacini, Antonio (Mar. 1, 2015). “What is the best database for computer science journal articles?” In: 102.3, pp. 2059–2071.

Cetina, Karin Knorr (1999). Epistemic cultures: how the sciences make knowledge. Ed. by Harvard University Press. Vol. 53. 9. Harvard University Press, pp. 1689–1699.

Cohen, Wesley M and Daniel A Levinthal (1990). “A new perspective on learning and innovation.” In: Administrative Science Quarterly 35.1, pp. 128–152.

Cohen, Wesley M and Daniel A Levinthal (1989). “Innovation and Learning: The Two Faces of R&D.” In: The Economic Journal 99.397, p. 569.

Fleming, Lee and Olav Sorenson (2004). “Science as a map in technological search.” In: Strategic Management Journal 25.8–9, pp. 909–928.

Franceschet, Massimo (Dec. 2010). “The Role of Conference Publications in CS.” In: Commun. ACM 53.12, pp. 129–132.

Giroud, Xavier (Dec. 2012). “Proximity and Investment: Evidence from Plant-Level Data.” In: Quarterly Journal of Economics 128.2, pp. 861–915.

Gittelman, Michelle (2007). “Does Geography Matter for Science-Based Firms? Epistemic Communities and the Geography of Research and Patenting in Biotechnology.” In: Organization Science 18.4, pp. 724–741.

Gittelman, Michelle and Bruce Kogut (2003). “Does good science lead to valuable knowledge? Biotechnology firms and the evolutionary logic of citation patterns.” In: Management Science 49.4, pp. 366–382.

Jaffe, Adam B (1989). “Real effects of academic research.” In: American Economic Review, pp. 957–970.

Kim, Jinseok (2018). “Evaluating author name disambiguation for digital libraries: a case of DBLP.” In: Scientometrics 116.3, pp. 1867–1886.

Knaus, Johannes and Margit Palzenberger (2018). PARMA. A full text search based method for matching non-patent literature citations with scientific reference databases. A pilot study.

Nelson, Richard R. (1986). “Institutions supporting technical advance in industry.” In: American Economic Review 76.2, pp. 186–189.

Poege, Felix, Stefano Baruffaldi, Fabian Gaessler, and Dietmar Harhoff (2018). Tracing the Path from Science to Innovation - A Novel Link between Non-Patent Literature References and Bibliometric Data. Working Paper. Max Planck Institute for Innovation and Competition.

Rosenberg, Nathan (1990). “Why do firms do basic research (with their own money)?” In: Research Policy 19.2, pp. 165–174.

Simeth, Markus and Michele Cincera (2016). “Corporate Science, Innovation, and Firm Value.” In: Management Science 62.7, pp. 1970–1981.

Simeth, Markus and Julio D. Raffo (Nov. 2013). “What makes companies pursue an Open Science strategy?” In: Research Policy 42.9, pp. 1531–1543.

Stern, Scott (2004). “Do Scientists Pay to Be Scientists?” In: Management Science 50.6, pp. 835–853.

Vlasov, Stanislav A., Marc D. Bahlmann, and Joris Knoben (Mar. 2016). “A study of how diversity in conference participation relates to SMEs’ innovative performance.” In: Journal of Economic Geography, lbw004.

A Appendix: Event study on the effect of direct flights to conference locations

For our event study, we consider the time line of conferences on the basis of the following regression:

Notes: ECCV: European Conference on Computer Vision. We visited this conference in 2018 and discuss findings in Section 2.

Figure 8: ECCV locations over time

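$$ y_{rct} = \sum_{j=-4}^{5} \gamma_j D^{j}_{rct} + \mu_{rt} + \mu_{rc} + \mu_{ct} + \beta X_{rct} + \varepsilon_{rct} \tag{A.1} $$

Here $y_{rct}$ is the number of participants from region $r$ at conference series $c$ in year $t$, $D^{j}_{rct}$ indicates that year $t$ lies $j$ years relative to the introduction of a direct flight connection from region $r$ to the venue of series $c$, and $\mu_{rt}$, $\mu_{rc}$ and $\mu_{ct}$ are region × year, region × conference series and conference event fixed effects. Figure 9 plots the regression results of equation (A.1). $\beta$ is a vector of coefficients on additional control variables $X_{rct}$. The coefficients $\gamma_j$ show the effect of introducing a direct flight at $t = 0$ relative to the baseline year $t = -1$, for which $\gamma_{-1}$ is normalized to zero. There is a small but statistically significant increase in the number of participants. This effect starts immediately at the point of the flight introduction and remains for two periods, but it is only imprecisely estimated. Based on the regression results, pre-trends or anticipation effects are likely only a minor concern. All other coefficients behave as expected: researchers are more likely to attend when the conference is in their home region, and less likely to attend distant conferences.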
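The event-time indicators $D^{j}_{rct}$ in (A.1) can be built as follows. This is a minimal sketch under two assumptions not stated in the text: each region × conference series pair has at most one flight introduction year, and years outside the window are binned into the endpoints j = −4 and j = 5. The cap of 25 participants follows the notes to Figure 9.

```python
EVENT_WINDOW = range(-4, 6)  # event times j = -4, ..., 5 as in (A.1)
BASELINE = -1                # reference year; gamma_{-1} is normalized to zero


def event_dummies(year, flight_start_year):
    """Indicators D^j for one region x conference-series pair in a given year.

    flight_start_year is None if no direct flight is ever observed (all zeros).
    Event times outside the window are binned into the endpoints (assumption).
    """
    dummies = {j: 0 for j in EVENT_WINDOW if j != BASELINE}
    if flight_start_year is None:
        return dummies
    j = min(max(year - flight_start_year, EVENT_WINDOW[0]), EVENT_WINDOW[-1])
    if j != BASELINE:
        dummies[j] = 1
    return dummies


def winsorize_count(n_participants, cap=25):
    """Top-code participation counts at 25, as in the notes to Figure 9."""
    return min(n_participants, cap)


print(event_dummies(2005, 2008))  # only j = -3 is switched on
print(event_dummies(2007, 2008))  # baseline year: all zeros
print(winsorize_count(40))        # 25
```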

Notes: Coefficients from linear regression with 95% confidence intervals. Estimation for years 1996-2016. FE for region × year, region × conference series and conference events included. Other controls include log distance and indicator variables for the conference being held in the researcher's region and for non-US domestic connections. Clustering is at the region × conference series level. The participation count is winsorized at 25 participants.

Figure 9: Regression results

A.1 Maps

Figures 10 and 11 are based on the estimation sample. They aggregate paper counts for the years 1996-2010 for conferences where at least one firm was present. Each paper and each conference is assigned to the airport with the largest passenger volume in its vicinity; details are explained in Section 3.
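A minimal sketch of this assignment rule follows. It assumes a fixed search radius: the 100 km radius, the illustrative passenger volumes and the toy airport list are hypothetical, and the actual procedure is the one described in Section 3.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def assign_airport(lat, lon, airports, radius_km=100.0):
    """Code of the airport with the largest passenger volume within radius_km, or None."""
    nearby = [a for a in airports if haversine_km(lat, lon, a["lat"], a["lon"]) <= radius_km]
    if not nearby:
        return None
    return max(nearby, key=lambda a: a["passengers"])["code"]


# Approximate coordinates; passenger volumes are illustrative placeholders
AIRPORTS = [
    {"code": "CPH", "lat": 55.618, "lon": 12.656, "passengers": 30.0},
    {"code": "MMX", "lat": 55.536, "lon": 13.376, "passengers": 2.0},
    {"code": "BLL", "lat": 55.740, "lon": 9.152, "passengers": 3.0},
]

# A venue in central Copenhagen: CPH and MMX are within 100 km, BLL is not
print(assign_airport(55.681, 12.530, AIRPORTS))  # CPH
```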

Notes: Frequency-weighted scientist airport regions are shown. See the explanation in A.1.

Figure 10: Geographic location of scientists

Notes: Frequency-weighted conference airport regions are shown. See the explanation in A.1.

Figure 11: Geographic location of conference events

B Appendix: Tables

                                                  Share     Count     Share     Count
                                                (first)   (first)    (freq)    (freq)
Computer Science (general)                          2.8     25574       2.3     20870
Engineering (general)                               0.0       434       0.4      3459
Technology (general)                                0.0       166       0.1       824
Design (general)                                    0.0         0       0.1       649
Artificial Intelligence and Image Processing       28.3    261359      28.0    259267
Computation Theory and Mathematics                  7.3     67681       7.2     67046
Computer Software                                  10.2     93895       9.8     90913
Data Format                                         6.4     59501       6.4     58870
Distributed Computing                              10.2     93977       9.5     87454
Information Systems                                14.1    130600      13.3    123079
Library and Information Studies                     0.0         0       0.6      5692
Other Information and Computing Sciences            1.2     10651       1.2     10651
Automotive Engineering                              1.7     15874       1.7     15874
Biomedical Engineering                              1.4     12755       1.7     15472
Civil Engineering                                   0.4      4136       0.4      3723
Environmental Engineering                           0.0         0       0.0       413
Geomatic Engineering                                0.0       192       0.0       334
Manufacturing Engineering                           0.0       260       0.6      5657
Mechanical Engineering                              1.6     14377       0.5      5074
Communications Technologies                         3.6     33647       3.9     36023
Computer Hardware                                  10.2     94200      10.4     96309
Architecture                                        0.0       323       0.0       304
Design Practice and Management                      0.6      5251       1.8     16657
Urban and Regional Planning                         0.0         0       0.0        18
Cognitive Sciences                                  0.0         0       0.0       222
Total                                             100.0    924853     100.0    924853

Notes: Each conference series is associated with up to three CORE fields. Shares (%) and counts are shown using either the first CORE field (first) or equal weighting among the associated CORE fields (freq). Data cover 1996-2015.

Table 11: CORE fields.
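The two column pairs in Table 11 differ only in how each conference series' papers are spread over its CORE fields. The sketch below reads "(first)" as assigning all papers to the first listed field and "(freq)" as the equal weighting mentioned in the notes; series names and counts are hypothetical. Both rules preserve the total number of papers, which is why the two Total entries coincide.

```python
from collections import defaultdict


def allocate(series_fields, paper_counts, rule="first"):
    """Distribute each conference series' paper count over its CORE fields.

    rule="first": all papers go to the first listed field.
    rule="equal": papers are split equally among the listed fields (up to three).
    """
    totals = defaultdict(float)
    for series, fields in series_fields.items():
        n_papers = paper_counts[series]
        if rule == "first":
            totals[fields[0]] += n_papers
        elif rule == "equal":
            for field in fields:
                totals[field] += n_papers / len(fields)
        else:
            raise ValueError(f"unknown rule: {rule}")
    return dict(totals)


def shares(totals):
    """Percentage shares, as in the Share columns of Table 11."""
    grand_total = sum(totals.values())
    return {field: 100 * n / grand_total for field, n in totals.items()}


# Hypothetical series and counts
series_fields = {
    "S1": ["Computer Software", "Information Systems"],
    "S2": ["Computer Hardware"],
}
paper_counts = {"S1": 100, "S2": 50}

first = allocate(series_fields, paper_counts, "first")
equal = allocate(series_fields, paper_counts, "equal")
print(first)  # {'Computer Software': 100.0, 'Computer Hardware': 50.0}
print(equal)  # {'Computer Software': 50.0, 'Information Systems': 50.0, 'Computer Hardware': 50.0}
```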

C ECCV Impressions

                       (1)       (2)       (3)       (4)       (5)       (6)
Dependent variable: log 5y citations
Firm              0.249***  0.228***  0.055***  0.048***  0.044***  0.038***
                   (0.009)   (0.009)   (0.004)   (0.004)   (0.004)   (0.004)
Sponsor           0.129***  0.121***     0.001    −0.002
                   (0.017)   (0.017)   (0.010)   (0.010)
Firm=Sponsor                0.358***            0.121***            0.113***
                             (0.036)             (0.021)             (0.018)
Year FE                Yes       Yes       Yes       Yes
Conf FE                                                        Yes       Yes
Conf Series FE                             Yes       Yes
R2                   0.015     0.015     0.267     0.267     0.330     0.330
Clusters             14665     14665     14639     14639     14476     14476
Obs                1054982   1054982   1054956   1054956   1054793   1054793

Notes: Standard errors in parentheses, clustered at the conference level. ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001. Conference series with fewer than 30 associated papers are disregarded. Outcome variable: log(1 + cit), where cit is the number of forward citations of DBLP conference proceedings by DBLP items within a five-year window. Mean: 2.95.

Table 12: Scientific and commercial value of corporate proceedings.

Figure 12: List of all sponsors by sponsorship rank

Figure 13: Large sponsorship booths

Figure 14: Smaller sponsorship booths

Read: top-left, top-right, bottom-left, bottom-right.

Figure 15: Google brochure on scientists activities at ECCV
