How Open Data Entrepreneurs Advance Institutional Change

by

Helen Lasthiotakis

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy

Graduate Department of Leadership, Higher and Adult Education

University of Toronto

© Copyright by Helen Lasthiotakis 2017

Abstract

Openness is one of the fundamental principles of scientific inquiry, and the benefits of open research data include accountability, transparency, and efficiency. Change is also an expected prerequisite for scientific progress, yet, despite recent technological advances enabling open data, not all data generated during scientific inquiry is normally available. This thesis examines the institutional work that open science entrepreneurs undertake to introduce open data practices, thus advancing institutional change in science. Through a multi-case study of five institutional entrepreneurs, this study is guided by the question: How do open data entrepreneurs institutionalize a new open data practice? First, the background of open data as a scientific practice is presented, highlighting the obstacles to its adoption. Second, the theoretical framework of institutional entrepreneurship and institutional work is discussed, along with concepts of multilevel institutional work. The findings indicate that open data entrepreneurs actively advance change by persevering in conducting a variety of institutional work such as counterfactual thinking, disassociation, creation of standards of practice, mobilization of resources, forging new relations and alliances, and advocacy. They are adept at strategizing iteratively between institutional levels, fluidly mediating between their organization, scientific communities, and broader societal levels.

Theorization of open data occurs at all levels, with an expanding set of specifications and justifications attuned to each level. The outcome of their work is distinct at each level: individual opportunity recognition; development and establishment at the organization level; legitimation and diffusion at the scientific community level; and policy change at the societal level. The study casts light on the nature of science-based institutional entrepreneurship and contributes to the study of agency and multilevel institutional change. The iterative, multilevel institutional work of the open data entrepreneurs provides support to the recursive change models that highlight the interplay of both bottom-up and top-down processes. As research funding agencies, professional societies, and policy bodies are increasingly enacting open data and data sharing policies, the findings from this study are significant in demonstrating the importance of open data legitimation and diffusion at the level of the scientific community for the successful adoption of open data by researchers.

Acknowledgments

The journey to completing a doctoral research study and achieving a PhD has been a fascinating and rewarding one. In deciding to begin, I was excited to explore institutional change and policy in universities, with a real desire to delve deeply and fully into the associated theoretical foundations and research paradigms. And now at the end, it is so very clear that the journey was only possible with the guidance, questioning, and support of professors, colleagues, family, and friends.

I wholeheartedly thank my wise thesis supervisor, Professor Creso Sá, for providing me with tremendous academic advice and positive support, as well as opportunities to receive feedback from others along the way. He was supportive even before he became my supervisor, always encouraging me to publish papers coming out of coursework. He patiently helped me come up with the thesis topic and guided me through its development, pushing me to clarify concepts and, importantly, think strategically. He also formed the Sá Research Group of fellow graduate students that met regularly, and I thank them as well for generously sharing their research and questions, as well as for their camaraderie and their intellectual grilling.

I would like to thank my thesis committee: Professors Glen Jones, Daniel Lang, Scott Davies, and John Willinsky for their insightful comments and encouragement, as well as their questions which pushed me to broaden my research perspective and flesh out the theoretical background and its presentation. My experience at OISE has been rich thanks to OISE faculty members teaching the courses that set the foundation for my research—so incredibly interesting and inspiring, providing very different perspectives of scholarship than that of my own science-based background.

Importantly, the study participants were generous with their insights, and I am grateful for their time and patience with my questions regarding their efforts on open data.

I am also incredibly thankful to my University of Toronto work colleagues who have helped me through the difficult times, and also pushed me to complete the study when they sensed I was postponing tackling a new chapter. I’ve worked in different offices at U of T while I was engaged in the thesis. All my supervisors – Professors David Cameron, Meric Gertler, Vivek Goel, Cheryl Misak, Cheryl Regehr and Ms. Sheree Drummond – were very supportive, allowing me time for my studies and keeping me on track for completion. Colleagues in the Division of the Vice-President, Research & Innovation were unflagging in cheering me to the finish line.

And my greatest thanks to my family and friends who, with a hazy concept of my research area, listened to me patiently and shared my stresses and successes, as well as holding back any eye rolling when I came home raving on about a new idea or exciting research area I had learned about. My husband Joseph has been steadfast and encouraging, and Katerina, Kosta and Elizabeth have supported me through stressful times, providing emotional support as well as the needed lattes. My friends, in particular Elaine and Susan, never failed in their encouragement and were always there to listen and provide cheerful and, at times, saucy encouragement.

Thank you to you all – the thesis would not have been possible without you.

Table of Contents

Acknowledgments
Table of Contents
List of Tables
Chapter 1 Introduction
1.1 Statement of the Problem
1.2 Research Questions
1.3 Significance of the Study
1.4 Outline of the Thesis
Chapter 2 Background
2.1 Openness within the Institution of Science
2.2 Open Science and Open Data
2.3 The Case for Open Data
2.4 Open Data Policies and Supports
2.5 Resistance to Open Data
2.6 Open Data Innovators
Chapter 3 Theoretical Context
3.1 Institutions, Institutional Theory, and Multi-level Institutions
3.2 Institutional Change and Entrepreneurship
3.3 Institutional Work
3.4 Multilevel Institutional Work
Chapter 4 Research Design
4.1 Overview and Rationale for Design
4.2 Data Collection and Analysis
4.2.1 Participant Identification
4.2.2 Ethical Considerations
4.2.3 Selection of Participants and Document Assembly
4.2.4 Pilot Study
4.2.5 Interviews with the Open Science Entrepreneurs
4.2.6 Data Analysis
4.3 Description of the Participants
4.4 Limitations of the Study
Chapter 5 Findings: Multilevel Institutional Work
5.1 Individual Level: Opportunity Recognition
5.2 Organization Micro-level: Innovation Design and Establishment
5.3 Scientific Community Meso-level: Legitimation and Diffusion
5.4 Societal Macro-level: Policy Change
Chapter 6 Conclusions and Implications
6.1 Summary of the Findings
6.1.1 The Institutional Work of the Open Data Entrepreneurs
6.1.2 Multilevel Institutional Work
6.2 Main Conclusions and Implications
References
Appendix 1: Informed Consent Letter
Appendix 2: Participant Consent Letter
Appendix 3: List of Analyzed Documents
Appendix 4: Interview Protocol

List of Tables

Table 1. Multi-level institutional work
Table 2. Characteristics of selected informants
Table 3. List of participants
Table 4. Alignment of interview questions with research questions
Table 5. Opportunity recognition: Individual level institutional work
Table 6. Developing and establishing the innovation: Micro-level organization institutional work
Table 7. Legitimation and diffusion: Meso-level scientific community institutional work
Table 8. Policy change: Macro-level societal institutional work
Table 9. Institutional work and outcomes at each institutional level

Chapter 1 Introduction

1.1 Statement of the Problem

Change is an expected and fundamental prerequisite for scientific progress, and openness is one of the fundamental principles of scientific inquiry. Yet despite the benefits for scientific accountability, transparency, replicability, and extending the life cycle of research data, and despite the emergence of open science paradigms coupled with technological advances that have increased the capacity for open data, not all data generated during scientific inquiry is normally publicly available. Open science reflects a changing paradigm in how research data and methodologies, in addition to research findings, are shared within the scientific community and with the public. Open data approaches include the availability of all the underlying research data and information relating to how scientific results were obtained, methodologies, tools and relevant analysis that allow the results to be understood and validated (den Besten, David, & Schroeder, 2010). In the last two decades, there has been an explosion of information and communication technology applications that enable full storage and disclosure of all research findings to the scientific community and the public, barring exceptions for ethical, political and industrial reasons. Research funding agencies in the United States, United Kingdom and Canada, as well as science journals and non-governmental organizations, have embraced open data and have begun to put in place policies to formalize the disclosure and management of research findings (Lasthiotakis, Kretz & Sá, 2015). And yet, despite technological advances and policy changes, the literature indicates a lack of change by scientists towards increased sharing of primary research data (Dasgupta & David, 1994; David, 1998; Mayernik, 2017; Piwowar & Chapman, 2010; Resnik, 2006; Shamoo & Resnik, 2009). This may not be a surprising development if one considers that modern science is a social institution (Ziman, 2000) that embodies a set of characteristic values, norms and organizations that prescribe the behavior of scientists (Merton, 1942, 1957; Ziman, 2000).

The process by which institutions change is an important question in the scholarship of institutions, as one of their key characteristics is stability, permanence, and resistance to change (DiMaggio & Powell, 1983). The question of how institutional actors are able to envision new practices and act on them has been identified as the paradox of embedded agency arising from institutional theory (Hardy & Maguire, 2008). The concept of institutional entrepreneurship has emerged as one mechanism to explain endogenous institutional change, i.e. to understand how actors can instigate and contribute to change within institutions despite resistance. Institutional logics refer to the norms, belief systems, practices, and regulations that prescribe behavior within social institutions (Friedland & Alford, 1991; Thornton & Ocasio, 2001, 2008). Institutional entrepreneurs are considered key agents that change existing practices and/or introduce new practices, and then ensure that these become adopted more widely by other actors in the field (DiMaggio, 1988; Maguire, Hardy, & Lawrence, 2004).

Although there have been extensive studies of the traits of institutional entrepreneurs, there have been calls from the scholarly community for studies to focus also on their actions, in particular, the means by which institutional changes are instigated, established, legitimized and adopted within an institution (Suddaby & Greenwood, 2005). The processes of how institutional entrepreneurs introduce and institutionalize new practices, and thus change an institution, have been studied through the lens of entrepreneurial agency and institutional work. Agency refers to the ability of an actor to have some effect on a social institution through their actions by altering the rules or how resources are distributed (Battilana & D’Aunno, 2009; Emirbayer & Mische, 1998; Scott, 2001).1 Institutional work refers to the purposive, on-the-ground practices by individuals to create, maintain and change institutions (Lawrence & Suddaby, 2006; Ritvala & Granqvist, 2009; Thompson, Herrmann, & Hekkert, 2015; Tracey, Phillips & Jarvis, 2011).

Scholars have also highlighted an important research direction in considering institutions as a nested system, acknowledging the ‘multiple embeddedness’ of institutional entrepreneurship and the need to study entrepreneurs’ work at multiple institutional levels, from the individual to the organization, professional field, and society (Battilana, Leca, & Boxenbaum, 2009; Kaghan & Lounsbury, 2011; Lawrence, Suddaby, & Leca, 2011; Thornton & Ocasio, 2008).

1 This definition of agency relates to social and institutional theory. There is a different definition of agency relationship within economic theory (Ross, 1973) and the principal agency theory that relates to the study of relationships between one party (the principal) that delegates work to another (the agent), who performs the work (Eisenhardt, 1989).

The study of the institutional work of scientists who have created and institutionalized open data innovations and practices provides a useful perspective for the study of endogenous change within the institution of science. One of the norms of science is the publication of research results in peer-reviewed journals, along with the sense that the research data itself is proprietary (Cutcher-Gershenfeld, et al., 2017; De Silva & Vance, 2017). Open data scientists can be considered institutional entrepreneurs who are changing the norms of research result dissemination by pioneering and making acceptable new ways to make all their research findings available, including, in some cases, the near real-time results of their research projects (Borgman, 2012; Nielsen, 2009). The study of open data entrepreneurs also allows for exploration of multilevel institutional work as, in practice, scientific communities are situated concurrently both within a local academic organization and globally within a scientific discipline (Crane, 1972).

Although a growing scholarly literature about the subject of open data is emerging (De Silva & Vance, 2017; Fecher & Friesike, 2013; Kim & Stanton, 2016; Mayernik, 2017; Rowhani-Farid, Allen, & Barnett, 2017), there have been no studies of the institutional work employed by open scientists (RIN/NESTA, 2010; Whyte & Pryor, 2011). To address this gap, and provide insight into the work of institutional entrepreneurs, this study investigates how five open data entrepreneurs institutionalized open data innovations and practices. This study furthers the understanding of how institutional entrepreneurs engage in institutional work across institutional levels: recognizing the opportunity for an innovation, designing and establishing its form, legitimating it (that is, establishing its social acceptability and credibility), and ensuring its adoption. In addition to providing insights into how agents engage in changing an institutional logic, the study of successful institutional change in open data is also important for promoting the success of open data policies by government agencies and academic organizations.

1.2 Research Questions

Open data entrepreneurs who have institutionalized their innovations have diverged from traditional forms of the dissemination of research results and have persuaded other members of their community to adopt them. In this study, those who have developed and successfully implemented open data practices are considered institutional entrepreneurs: individuals who have an interest in certain institutional arrangements and are responsible for leveraging resources to deliberately transform their existing institutions (Maguire, Hardy, & Lawrence, 2004). The unit of analysis for each case is an individual open scientist who has established an open data innovation.

Through an exploratory qualitative multi-case study, this thesis investigates the institutional work undertaken by open data entrepreneurs to recognize an opportunity, design, establish, legitimate and diffuse an open data innovation. The research focuses on the following research question and sub-questions:

How do open data entrepreneurs institutionalize an open data innovation?

What is the institutional work conducted by open data entrepreneurs in order to institutionalize an open data innovation?

The framework of institutional work has increasingly been utilized as a theoretical lens for studies of individual institutional entrepreneurs (Lawrence & Suddaby, 2006; Ritvala & Granqvist, 2009; Thompson, 2013; Tracey et al., 2011). The literature indicates several forms of institutional work (Hardy & Maguire, 2008; Lawrence & Suddaby, 2006; Ritvala & Granqvist, 2009; Thompson et al., 2015; Tracey et al., 2011) such as theorization, counterfactual thinking, mobilization of resources, connecting to macro-level discourses, and advocacy. A few of these studies have explored the institutional work of scientists, in particular, health researchers and professionals (Hardy & Maguire, 2008; Ritvala & Granqvist, 2009).

What are the institutional levels at which open data entrepreneurs conduct work?

Recognizing that there are multiple dimensions to entrepreneurial activities, models have emerged that encompass institutional work at different institutional levels (Hjorth, Jones, & Gartner, 2008; Tracey et al., 2011; Watson, 2013; Zilber, 2013). Lawrence & Suddaby (2006) suggested that Holm’s (1995) perspective of institutions as nested systems well describes a broad view of institutions as existing at many levels, from the individual level of self and the micro-level of groups and organizations, to field-level (meso-level) institutions associated with professions or industries, and the macro-level of societal institutions.

Although a limited amount of research exists that explores the institutional work of entrepreneurs at these different levels, findings indicate that individual entrepreneurs conduct institutional work on more than one level in order to create opportunities and legitimate a change (Ruef & Lounsbury, 2007; Thornton & Ocasio, 1999; Tracey, et al., 2011).

Scientists are the actors with the legitimacy and status necessary to change existing institutional practices and beliefs from within the institution of science (Ritvala & Granqvist, 2009). Ideas tend to emerge in local contexts and, over time, can become institutionalized beyond the local level as they are spread through formal and informal communications and networks: journal publication, conference presentations, research collaborations, and scientific communities (Crane, 1972; Ritvala & Granqvist, 2009).

What is the institutional work of the open data entrepreneurs at different institutional levels?

The literature on institutional change suggests that different practices and outcomes are associated with different stages of institutionalization (Lawrence & Suddaby, 2006; Perkmann & Spicer, 2007) and scholars have acknowledged the multiple embeddedness of institutional entrepreneurship (Battilana, et al., 2009; Kaghan & Lounsbury, 2011; Lawrence et al., 2011). Only a few studies have examined the differing work undertaken, and outcomes achieved, by institutional entrepreneurs at different institutional levels (Ritvala & Granqvist, 2009; Tracey, et al., 2011). The lack of attention that institutional theorists have paid to the study of levels and level interactions in organizational and change studies more broadly has been raised as an issue (see Bitektine & Haack, 2015). The study of the institutional work of open scientists provides an opportunity to study multilevel institutional work as scientific communities are situated within a local academic organization and concurrently globally within a scientific discipline.

In order to answer these questions, a qualitative multi-case study was conducted in which five open data entrepreneurs who had established open data initiatives were identified. The initiation and implementation of their novel open initiative was studied through document analysis and interviews.

The study was bounded by the process of establishing a particular open data initiative associated with each case study.

1.3 Significance of the Study

The research question that the thesis addresses is how scientists initiate, establish and legitimate change in order to institutionalize open data initiatives within the institution of science through their institutional work at the level of themselves as individuals, their organization, scientific community, and within the larger societal context. This is the first study of open scientists through the use of a multi-case study method. The study of the institutional work of open data entrepreneurs is significant for two reasons.

Firstly, the study provides insight into the process of multilevel institutional change and serves to extend the thinking on institutional entrepreneurship and work in an area that has not been studied. The thesis sheds light on the institutional work needed to enact change for an institutional logic where gaps have been identified. Scholars have only recently begun to focus on institutional work perspectives in the study of institutional change, for example, in studying entrepreneurial activities within their broader context within organizations, professional communities, and society (Lounsbury & Boxenbaum, 2013; Watson, 2013; Zilber, 2013). As well, within the study of neo-institutional theory, there have been calls to integrate macro-institutional change studies with studies of entrepreneurs and their institutional work at the individual level (Tracey, et al., 2011) and micro-level (Greenwood, Raynard, Kodeih, Micelotta, & Lounsbury, 2011; Lawrence & Suddaby, 2006; Lounsbury & Boxenbaum, 2013; Thompson, 2013; Watson, 2013). The study of open data entrepreneurs from an institutional work perspective is new and extends the applicability of this scholarship, as well as providing insight into the nature of diffusion of change within the institution of science.

Secondly, this study may serve to identify possible practices that can be supported and encouraged to implement open data. The demand for change in the dissemination of research data is increasing among research funding agencies, organizations, scientists and the public (Cutcher-Gershenfeld, et al., 2017; Lasthiotakis, et al., 2015; Rowhani-Farid, et al., 2017). Efforts to change current practices and norms, however, face institutional inertia with respect to the institutional logic or norm of publication of final successful research results. Institutional change is a complex process that involves different external and internal forces and agents. An understanding of the strategies of open data entrepreneurs, in particular the institutional levels that are most responsive to change, may allow funding bodies and agencies to revise and enact policies that are more successful in establishing a culture of open data.

Thus, the study is important both in supporting conceptual models for future study and in identifying organizational strategies for supporting open data.

1.4 Outline of the Thesis

In examining open science entrepreneurship, this thesis is organized in six chapters.

Through a literature review, Chapter 2 provides an overview of openness in science, open science and open data, the case for open data, open data policies and supports, resistance to open data, and open data innovators.

Chapter 3 outlines the key, broad theoretical concepts relating to institutional theory, institutional entrepreneurship and scientists as entrepreneurs, institutional work, and multilevel institutional work that frame the research questions.

The research methodology and design are presented in Chapter 4 including the rationale for the design, the data collection method, participant selection, description of the participants, and limitations of the study.

The research findings are presented in Chapter 5, including the institutional work undertaken by the open scientists at the level of individual, organization micro-level, scientific community meso-level, and society macro-level.

In the final chapter, a summary of the research findings is presented. Conclusions relating to the research questions are summarized and discussed, as are the implications of the findings.

Chapter 2 Background

This chapter provides a review of the literature relevant to the study. The first section contains a general discussion on openness within the institution of science, followed by a discussion of open and traditional approaches to the dissemination of research results. The case for open data is then presented, followed by a review of open data policies and supports as well as an outline of the resistance to the adoption of open data. Finally, an example of an open data innovator is presented.

2.1 Openness within the Institution of Science

Modern science has been described as a social institution whose primary purpose is to advance and produce knowledge (Crane, 1972; Storer, 1966; Ziman, 2000). It is present within virtually every country and has a global scope. The institution of science embodies a set of characteristic values and norms (Merton, 1942, 1957; Ziman, 2000) and organized activities (Storer, 1966). Robert K. Merton argued that such a prescriptive set is both functional, in terms of advancing knowledge, as well as morally binding on the scientist as a professional (Merton, 1942, 1957, 1973).

Merton’s idealized norms reflect the ‘rules’ that define the appropriate values, beliefs, attitudes and behaviours of the institution of science: scientific results should be openly published in a timely fashion (communism), knowledge findings should be subject to impersonal evaluation criteria (universalism), personal interests should not appear in scientific procedures (disinterestedness), and questioning and criticism should be allowed and encouraged (skepticism). These distinct social norms are considered the desired prerequisites for the production of objective knowledge, and open communication and accessibility are imperatives for scientific integrity (Merton, 1942, 1957, 1973; Ziman, 2000). This normative structure emphasizes the relative autonomy, neutrality and rationality of modern science and its insulation from other societal influences. According to Merton, the institution of science embodies:

an ethos [or an] affectively toned complex of values and norms which are held to be binding on scientists. The norms are expressed in the form of prescriptions, proscriptions, preferences, and permissions. They are legitimized in terms of institutional values [ . . . ] and internalized by the scientist [ . . . ] Although the ethos of science has not been codified, it can be inferred from the moral consensus of scientists as expressed in use and wont, in countless writings on the scientific spirit and in moral indignation toward contraventions of the ethos. (Merton 1942/1973, p. 268-9).

The norm that is the most relevant for this study is that of communism, i.e. that the products of science should be considered as public knowledge. Sharing of information and ideas has deep roots in ancient Greek philosophy and science where an emphasis was placed on open, free and rational debate (Resnik, 2006).2 Communism requires that the knowledge gained through academic science should be considered ‘public knowledge’. The core of this norm connects communism to greater efficacy in the cumulative generation of knowledge that can be validated and shared through the prohibition of secrecy: all the evidence needed to support the results of an inquiry should be publicly available, and as quickly as possible.

Based in the Western European scientific activities of the late sixteenth and seventeenth centuries, openness in traditional science emphasizes the practice of publicly disseminating successful research results in papers published in peer-reviewed journals (Carey, 2013; David, 1998; Nielsen, 2011; Ziman, 2000). The subject of scientific papers is the outcome of an investigation, in the form of a claim to a discovery, backed up by the supporting evidence of data and observations. The process of publication occurs after the research has been completed and includes the sharing of research findings that support a study’s conclusions (Resnik, 2006). One of the primary incentives in this tradition is a strong norm of priority and recognition that is based on confirmed claims of priority (den Besten et al., 2010) for scholars across all disciplines (Aldrich, 2012; Benner & Sandström, 2000; Dasgupta & David, 1994; David, 1998; Merton, 1957, 1973; Shamoo & Resnik, 2009). As a result, research scientists traditionally consider their data as private, and may share only some data and results in their scientific publications (De Silva & Vance, 2017).

2 These have been a part of university life since the formation of the first universities in the twelfth century (Shamoo & Resnik, 2009).

Traditional publication of research results does not normally include the publication of the complete data, nor information relating to how the results were obtained, methodologies, tools and relevant analysis (den Besten et al., 2010). Null results are published less frequently than statistically significant results and thus are likely inaccessible (Nosek et al., 2015). As research data in traditional science is embedded within the research process, the public and the broader research community do not normally have access to the research data itself unless it is included in the publication of the final research report.

2.2 Open Science and Open Data

Open science is an emerging approach to the scientific process, and the dissemination and sharing of the product of scientific inquiry. An open science approach makes public the underlying data (both statistically significant and null results) and information relating to how the results were obtained, methodologies, tools and relevant analysis that allow the results to be understood and validated (den Besten et al., 2010). Open science is defined in various ways but definitions tend to encompass three related themes that cover the complete research process, its underlying infrastructure as well as its results:

1) High pre- and post-publication transparency of data, tools, activities, and communications amongst researchers that often includes collaboration amongst researchers and possibility for re-use of the data;

2) Full, candid, timely publication of research results; and,

3) Absence or minimization of intellectual property restrictions with the understanding that not all aspects of science are suited to openness (Fecher & Friesike, 2013; Science Commons, 2008).

Different forms of open science have emerged, including open access and open data (Lasthiotakis et al., 2015). Open access refers to the practice of making the products of research available for re-use and distribution without price or permission barriers as long as there is appropriate attribution for the author (Berlin Declaration, 2003; Bethesda Statement, 2003; Budapest Open Access Initiative, 2002). The two main vehicles for achieving open access are open access journals and open access repositories; both seek to lower the barriers for users to make use of articles and data by not charging user fees (Quint-Rapoport, 2010; Willinsky, 2006).

The term open data refers to the process of releasing both raw and processed research data, enabling others to analyse and use it without restriction (Gewin, 2016; Kitchin, 2014; Molloy, 2011; Murray-Rust, 2008; Willinsky, 2006), subject at most to citation/attribution and/or share-alike (Pollock, 2006).3 More recently, Gurstein (2013) has highlighted that open data should be considered as a service process rather than a product, taking into account the needs of end-users. Open data also implicates the technology (tools and methods) used to acquire the data (den Besten, et al., 2010; Sá & Grieco, 2016; Uhlir & Schroeder, 2007). As an example of a tool, open source encompasses collaborative software and scientific tool development that ensures access to an end product’s source materials; in the case of software this is normally source code (den Besten, et al., 2010). The availability of source code allows an end user to re-run an analysis as well as to understand the underlying functionality of the program (Wilson & Edwards, 2015). In this thesis, the term open data is inclusive of both the data and the open source tools and methods.

With open data, the ongoing investigation and its data can be followed, analyzed and potentially contributed to by others, and can be used for reproducibility work or for synthesis work such as meta-analysis, thus extending the life of the data beyond the immediate data collection project.4 Open data dissemination can take various forms including providing the data to a journal as supplemental material, depositing data in a public collection, and posting datasets on a public website or publicly within a community of colleagues (Borgman, 2012). The data made public include those from failed, less significant, and unpublished work. Open research methods, such as open notebook science, include the publication of a primary research record, i.e. data and/or research activities and methods including associated raw data files, in near real time. Open notebook proponents make their ongoing research projects publicly available, incorporating

3 More recently, the similar term ‘data sharing’ (emphasizing the procedural nature of open data) has arisen, referring to “making raw research data available in an open data depository, and includes controlled access where data is made available upon request which may be required due to legal or ethical reasons.” (Rowhani-Farid, et al., 2017 p. 2).

4 Open science that also includes an interactive element, wherein scientists collaborate in a computer-networked or internet-based environment, has been termed ‘extreme openness’ (Nielson, 2008), ‘dynamic system’ (Mukherjee & Stern, 2009) and Science 2.0 (Shneiderman, 2008; Waldrop, 2008).


blogs or wikis in order to allow for interaction and commentary for an ongoing experiment (Bradley, Lang, Koch, & Neylon, 2011; Carter-Thomas & Rowley-Jolivet, 2016).5

2.3 The Case for Open Data

Open data allows a study to serve the scientific enterprise beyond the original analysis, extending its impact and contribution and advancing understanding. Some have suggested that the data itself should be the main publication as the “data will outlive the paper, as others re-analyse within the context of new scientific discoveries” (Watson, 2015, p. 101). Data and tools can be reused and combined with other publicly available data in order to explore new hypotheses, encouraging new perspectives as well as contributing to the efficient use of resources by avoiding duplicate data collection (Bond-Lamberty, Smith, & Bailey, 2016; Piwowar & Vision, 2013; Uhlir & Schroeder, 2007).

Open data is not a new concept;6 however, its spread has been greatly facilitated by advances, beginning in the late 1900s, in telecommunications and information technology tools that can be used for information dissemination, collaboration and storage (Atkins, Droegemeier, & Feldman, 2003; OECD, 2015; Schroeder, 2008). The increased ability to share and use data online through the internet, continued increases in computing speed and power, and the development of sophisticated statistical software have made it possible to post large, searchable databases online, thus allowing scientists to contribute to existing databases as well as to search for patterns within them (Arzberger, et al., 2004).7

5 A wiki is a type of computer software that allows users easily to create, edit and link web pages, enabling documents to be written collaboratively, in a simple markup language using a web browser. A wiki is essentially a database for creating, browsing and searching information. A blog or weblog is a website in which entries are written and presented in chronological order. Blogs generally provide commentary or news on a particular subject, and can contain text, images and links to other blogs, web pages, and media related to their subject.

6 In the 1660s, Henry Oldenburg, the first secretary of the Royal Society in the UK, persuaded the then new society to publish the ‘letters’ he received when corresponding on scientific matters, as long as each new concept was accompanied by the data (or evidence) on which it was based (Boulton, 2014).

7 The Berlin Declaration (2003) hailed the promotion of the internet as a “functional instrument for a global scientific knowledge base and for human reflection”.


With advances in computer and communication technologies, the dissemination, sharing, analyzing, and storing of data is becoming increasingly easy and fast. The potential of sharing through the internet has amplified sharing expectations (Mayernik, 2017). There is growing interest in making the results of scientific inquiry readily accessible in order to maximize their value. Lack of transparency and/or access to research data and methods is at the root of the calls for more openness in science. These calls seek to increase the availability of publicly-funded scientific research results; to enhance their accountability and reproducibility; and to increase the availability and impact of research data for reuse and/or contribution by other researchers and the public (Boulton, Rawlins, Vallance, & Walport, 2011).

Accountability and reproducibility

Publishing or archiving the data underlying scientific papers is considered an essential component of scientific publication and, critically, of its subsequent reproducibility (Vines et al., 2013). Researchers are accountable for their data when they archive the data and provide and describe them in a way sufficient to explain how the data may be expected, problematic, anomalous or correctable (Mayernik, 2017). Ongoing concern regarding the reproducibility of published scientific results relates to problematic practices such as selective reporting and analysis, and insufficient delineation of the conditions required to obtain the results (Baker, 2015). For example, in a replication study of 98 experiments that were published in 2008 in highly ranked psychology journals, researchers found that only about one-third to one-half of the original findings were observed in their replication study (Open Science Collaboration, 2015). There have also been high profile issues with respect to the conduct of research, such as the much publicized problems at the Climate Research Unit at the University of East Anglia (Bricker, 2013; Ryghaug & Skjølsvold, 2011) that resulted in a greater public awareness of the need for accountability and reproducibility in research data and processes. In 2009, hundreds of illegally obtained emails from the Climate Research Unit were made public online, and a subset of these emails seemed to indicate that there had been interference with the independence of the peer-review process for the Anthropogenic Climate Change (ACC) hypothesis (Bricker, 2013). The term ‘climate-gate’ was coined to highlight the suggestion that research data were manipulated (Delingpole, 2009). The implications of deliberate deceit and manipulation of data strongly influenced public opinion, with calls for open science as a way to encourage high


standards in research such as careful data production, well-tested modelling and sound software (Nature Special, 2013).

Enhancing the availability and impact of research data for reuse and/or contribution

Most developed countries allocate a substantial amount of public funding to research that generates enormous amounts of data. However, although many scientific disciplines are integrative and collaborative, in practical terms research results are not readily available; the sharing of interpretations of data, rather than data sharing per se, appears to be the norm (Reichman, Jones & Schildhauer, 2011). Researchers mostly collect and use their own data in their own projects and have access to few external data sources.8 The release of the human genome sequence data is often cited as an example of the value of the open release of research data, which enabled world-wide scientific collaborations leading to new insights for a variety of health conditions, including rare diseases (de Silva & Vance, 2017).

Studies have found that sharing research data may increase article download and/or citation rates (Eysenbach, 2006; Piwowar, Becich, Bilofsky, & Crowley, 2008; Piwowar & Chapman, 2010; Piwowar, Day, & Fridsma, 2007; Piwowar & Vision, 2013). Data sharing has been encouraged in the clinical research enterprise in order to improve the development of drugs and devices, as well as to benefit public health (Ross & Krumholz, 2013). There are examples of the power of shared data to assist in health care advances, such as the 2011 containment of a severe gastrointestinal infection in Germany (Boulton, 2014) and malaria drug development (Wells, Willis, Burrows, & van Huijsduijnen, 2016). As noted above, open data advocates invoke the success of research in genomics through data sharing via the GenBank and HapMap archives (Choudhury, et al., 2014). Success is also documented in industry’s participation in open data initiatives, for example, with the Structural Genomics Consortium (Perkmann &

8 As well, aside from a small community of scholars and students associated with research universities, timely access to academic journals within traditional science publication is neither optimal nor cost-free. In the United States, policy makers enacted legislation in 1980 aimed at stimulating universities to undertake more industrially relevant research. This especially influenced engineering and applied sciences, life sciences, and physical sciences. The rise of the resulting ‘secret science’ produced an ‘anti-commons’ effect that further provided support for Open Science.


Schildt, 2015). Sharing and reusing data has also been useful for qualitative social sciences, for example, in community studies and reanalyses of oral histories (Bishop, 2014).

Societal desire for greater access to publicly-funded scientific data

Given the allocation of public funds to the conduct of research, there is a sense that publicly-funded scientists should more openly share resulting data and methods with the public, both for accountability reasons and to maximize knowledge dissemination (Borgman, 2012). Uhlir & Schroeder write that “precious little of that investment is devoted to promoting the value of the resulting data by preserving and making them broadly available. The largely ad hoc approach to managing such data, however, is now beginning to be understood as inadequate to meet the exigencies of the national and international research enterprise” (2007, p.45).

Openness is also critical for the development of public policy that is well-informed. Following an oil spill in the Gulf of Mexico in 2010, researchers highlighted that despite numerous ecological studies in the area, there was very little accessible current and historical ecological data that would have allowed for the effects of the oil spill to be fully understood (Reichman, et al., 2011). These authors suggested that, based on their experience, less than 1% of ecological research data is accessible after publication (Reichman, et al., 2011). More recently, the relatively wide spread of Ebola and Zika virus outbreaks has highlighted the critical need for medical and public health research to accelerate outbreak control, with calls for researchers to share data rapidly and widely during public health emergencies (Chretien, Rivers, & Johansson, 2016).

2.4 Open Data Policies and Supports

Data and methods sharing is a critical issue in current scientific research (Hey & Trefethen, 2005; Kim & Stanton, 2016; Tenopir, et al., 2011). While the technological feasibility of sharing large amounts of research data increases, there is at the same time criticism about the accessibility of data on which policy and/or regulatory decisions are based (Boulton, et al., 2011). National research funding agencies, policy bodies, professional societies, and publishers in the US, UK and Canada increasingly require open data practices, specifically


related to the storing and sharing of primary data collected by scientists (Kim & Stanton, 2016; Lasthiotakis, et al., 2015). As discussed in Section 2.3, open data is framed as 1) an efficient use of the public funds invested in research, 2) a way to re-use data and to be able to address complex problems, and, 3) as a way to ensure data validity and reproducibility (Arzberger, et al., 2004; Pampel & Dallmeier-Tiessen, 2014).

Internationally, the open access movement as applied to research publications crystallized in the formulation of the Budapest Declaration in 2002, later expanded to include “original scientific research results, raw data and metadata, source materials, digital representations of pictorial and graphical materials and scholarly multimedia material” (Berlin Declaration on Open Access to Knowledge in the Sciences and the Humanities, 2003, p. 1). The Organization for Economic Co-operation and Development (OECD) Committee for Scientific and Technological Policy called for “open access to and wide use of research data” (OECD, 2004) and the OECD continues to encourage open science and research principles (Lasthiotakis, et al., 2015; Väänänen & Peltonen, 2016).

National research funding agencies and policy bodies began requiring open data practices in the early 2000s, often linked with open access policies (Lasthiotakis, et al., 2015). The UK e-Science programme, established in 2001, was the first coordinated initiative to involve all the research councils in supporting the development of cyberinfrastructure in science and engineering, and to encourage the development of open middleware, that is, software that bridges a database and applications. In 2003, the US National Institutes of Health (NIH) was a pioneer with its Data Sharing Policy that required applicants for a grant upwards of $500,000 US to make statements on Data Sharing (NIH, 2003a, b). In addition, at around the same time, both countries established organizations to support the cyberinfrastructure needed for open data sharing. Leading scientific organizations signed on to the Berlin Declaration, committing to open access and the wider sense of open data.9

In the US, the National Science Foundation (NSF) established the Office of Cyberinfrastructure (OCI) to coordinate e-infrastructure programmes and to fund information technology infrastructure research and training, including access to data (NSF, 2006; OCI, n.d.). In 2011, the

9 http://www.fu-berlin.de/sites/open_access/Veranstaltungen/oa_berlin/poster/Berlin-Declaration_Simone-Rieger_MPIWG.pdf


NSF required recipients of funds “to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants” (National Science Foundation, 2011). In 2013, US federal agencies with more than $100 million in annual research and development expenditures were directed to develop plans for increasing public access to the results of the research they support through scholarly publications and digital data (Holdren, 2013).

In Europe, open science has been adopted by the European Commission and several science policy actions were initiated in 2015 including post-grant publishing funds, and the Open Research Data Pilot (Spires-Jones, Poirazi & Grubb, 2016). The Pilot “aims to make the research data generated by selected Horizon 2020 projects accessible with as few restrictions as possible, while at the same time protecting sensitive data from inappropriate access”.10 Finland has promoted the availability of open research information and open publication, seeking to become a leading country for open science by 2017 (Ministry of Education and Culture, Finland, 2014). Government funding agencies in both the US and the UK require scientists to deposit their data in publicly available archives, even before they have published any articles related to the data (Borgman, 2012).

More recently, government and funding agencies have also begun to provide incentives and acknowledgements for open data and open science in general. In 2013, the US White House celebrated Open Science Champions of Change, highlighting the Human Genome Project and the Global Positioning System as strategies for “driving positive change”.11 In 2016, the UK Wellcome Trust, and the US National Institutes of Health and Howard Hughes Medical Institute launched the Open Science Prize in order to “encourage and support the prototyping and development of services, tools and/or platforms that enable open content – including publications, datasets, code and other research outputs – to be discovered, accessed and re-used in ways that will advance research, spark innovation and generate new societal benefits”.12

10 https://www.openaire.eu/opendatapilot

11 https://www.whitehouse.gov/blog/2013/05/07/seeking-outstanding-open-science-champions-change

12 https://www.openscienceprize.org/res/p/FAQ/


Scholarly journals began to require that authors make their data available to the public as a condition of publication (Borgman, 2012; Piwowar & Chapman, 2010; Resnik, 2006; Vines et al., 2013). In 2011, Science enhanced its data access requirements to include “a specific statement regarding the availability and curation of data” as well as “computer codes involved in the creation or analysis of data” (Hanson et al., 2011). PLoS journals have required open data associated with their publications as of 2014.13 In 2015, the Nature Publishing Group launched Scientific Data as an online, peer-reviewed, open access journal housing descriptions of scientific datasets and research for the natural, clinical and social sciences.14 Scientific communities and related journals are coming together to improve transparency, including the archiving and availability of research data and tools, as well as citation of archived content (Gewin, 2016; Parker, Nakagawa & Gurevitch, 2016).

Within scientific communities and agencies, large-scale collaborative data-intensive science has led in the support of open data (Piwowar & Chapman, 2010). For data to be reused, it must not only be archived but also be understandable or ‘transparent’ (Mayernik, 2017). Scientific disciplines that make use of large datasets, such as those in the life and physical sciences (genomics, proteomics, epidemiology, astrophysics, geology, neuroscience), fit what Resnik has termed a ‘data-driven’ model (2006).15 Several data-driven disciplines have led in the development of their own open data repositories and infrastructures, setting metadata archiving standards and data-sharing norms (Mayernik, 2016; Resnik, 2006). The field of genomics led with the launch of the Bermuda Principles in 1996, establishing that DNA sequence data from the human genome be released within 24 hours of generation and be freely available in the public domain (Arias et al., 2015).

Data sharing mandates have been encouraged and supported in data-intensive scientific communities and consortia such as the ecological sciences (Michener, 2015), archaeology (Richard & Winters, 2015; Wilson & Edwards, 2015), genetics (Perkmann & Schildt, 2015), astronomy (Borgman & Sands, 2016), and cognitive neuroscience (Choudhury, Fishman,

13 http://blogs.plos.org/everyone/2014/02/24/plos-new-data-policy-public-access-data-2/

14 http://www.nature.com/sdata/about

15 Borgman (2012) has also noted that certain fields that are ‘data-driven’ or ‘big-science’ lead in open science. However, even in these fields open science aspects of data sharing are not uniform.


McGowan, & Juengst, 2014). Examples of open data systems include the US National Institutes of Health’s GenBank (Benson, Karsch-Mizrachi, Lipman, Ostell, & Sayers, 2012), the European Molecular Biology Laboratory Nucleotide Sequence Database, the Structural Genomics Consortium (Perkmann & Schildt, 2015), the Ocean Biogeographic Information System (OBIS) database, Japan's DNA DataBank (Nicol, Caruso, & Archambault, 2013), NASA’s Earth Science Data System, the Brain Research Through Advancing Innovative Neurotechnologies (BRAIN) Initiative, and the EU Human Brain Project.

Granting agencies have also supported and facilitated the development of databases, as well as charters that bring together researchers and stakeholders from sectors including data repositories, scholarly publishers, academic libraries, and scholarly research service providers in order to allow for the development of community-based solutions to open data challenges (Cutcher-Gershenfeld, et al., 2017; Mayernik, Phillips & Nienhouse, 2016). As an example, in 2017, the NSF convened a meeting of geoscience facilities that curate, share and preserve data to form a Council of Data Facilities (Cutcher-Gershenfeld, et al., 2017), which led to the adoption of a charter to foster initiatives to credit authors for the sharing and reuse of their data and to advance common standards. The US National Data Service was formed in 2014 by leaders from the US supercomputing centres, university scholars, government agencies, and publishers in order to promote open software and software services for data sharing (Cutcher-Gershenfeld, et al., 2017). National agencies have also collaborated with agencies in other nations to support open data. In 2013, funders in the European Commission, the US NSF and National Institute of Standards and Technology, and the Australian government Department of Innovation together established the Research Data Alliance (RDA) “with the goal of building the social and technical infrastructure to enable open sharing of data”,16 focusing on social systems that include identification of community-generated use cases and standards (Treloar, 2014). Studies have focused on open data requirements within agencies, government organizations, and scientific communities (see Cutcher-Gershenfeld, et al., 2017; Dawes, Vidiasova, & Parkhimovich, 2016; Sá & Grieco, 2016; Susha, Grönlund, & Janssen, 2015). At the organizational level, universities have developed research data policies that include aspects of open data. For example, in 2012 the University of Oxford implemented its Policy on the Management of Research Data and

16 https://www.rd-alliance.org/about-rda


Records17 that includes provisions for infrastructure to host open databases and open source products. However, there are only a few studies on open access policies at universities (Veletsianos, 2015; Vincent-Lamarre, Boivin, Gargouri, Larivière, & Harnad, 2014) and no reviews of policies and supports for open data.

2.5 Resistance to Open Data

Despite an ethos of openness, public sentiment and government agency requirements supporting open data, the majority of research data and source dissemination in science still occurs through traditional science methods, i.e. the dissemination of data and research results has not fundamentally changed in the majority of scientific disciplines (Blumenthal, et al., 2006; de Silva & Vance, 2017; Fry, Schroeder, & den Besten, 2009; Hedstrom & Niu, 2008; Nielson, 2009; Piwowar, 2011; Piwowar & Chapman, 2010; Richards & Winter, 2015; Vickers, 2006; Williams, 2008). Even in fields such as health and medical research, where the benefits of open data are potentially great, many studies have demonstrated the low rate of open data (see Rowhani-Farid, et al., 2017). As described above, the benefits of open data have been enumerated, policy changes have been enacted, and there have been significant advances in information and communication technologies (Arzberger, et al., 2004) to support open data. Open data infrastructure challenges such as the standardization of datasets and metadata, the linking of datasets across organizations and systems, and checks on the quality of data (Martin, Law, Ran, Helbig, & Birkhead, 2016; Mayernik, et al., 2015) are being addressed. However, studies show that both barriers and perceived risks to sharing persist amongst researchers (den Besten, et al., 2010; Eschenfelder & Johnson, 2014; Grand, 2015; Pisani, et al., 2016; Smith & Roberts, 2016; Tenopir, et al., 2015).

Studies have explored scientists’ resistance to open science and scientists’ beliefs regarding the sharing of research findings (Eschenfelder & Johnson, 2014; Jamali, 2015; Leonelli, Spichtinger, & Prainsack, 2014; Tenopir, et al., 2011, 2015). At the level of the individual scientist, perceived career benefits and effort, and a sense of scholarly altruism have been found to influence sharing behaviours, whilst at the scientific community level, journal regulatory pressures and

17 http://libereurope.eu/wp-content/uploads/2014/06/LIBER-Case-Study-UOX.pdf


disciplinary normative pressures have been found to influence scientists’ data-sharing behaviours (Kim & Stanton, 2016). Research indicates that the reasons that scientists cite for not making their data available include legal and proprietary issues and misuse and/or conditions of data use (Campbell, et al., 2002; Pryor, 2009), insufficient time and/or lack of funding for expert data support (Berman & Cerf, 2013), the structure of data sharing models (Wilhelm, Oster, & Shoulson, 2014), fear of lack of citation and/or recognition (Tenopir, et al., 2011, 2015), political issues (Gray, 2015; Pisani, et al., 2016), lack of relevant skills (Grand, 2015), and the change in culture of reporting failures as well as successes (Grand, 2015). The academic reward system has been cited as not incentivizing open science (Nosek, et al., 2015; Rowhani-Farid, et al., 2017). A recent study by Tenopir, et al. (2015) examining the state of data sharing and reuse perceptions and practices among research scientists found disciplinary differences, and differences across age groups, with younger respondents being more favourable toward data sharing and reuse, but making less of their data available than older respondents.

Regardless of perceived obstacles, however, there is a growing literature highlighting the benefits of openness, noting that the individual-level ‘excuses’ for not sharing can and should be overcome and that open data should become routine (Smith & Roberts, 2016; Spires-Jones, et al., 2016). Although studies have explored the obstacles to open science and scientists’ beliefs regarding the sharing of research data, the resistance to open data is not a surprising development, given that the institution of science has been observed to be “very sensitive to attempts to bypass its traditional channels of communication” (Ziman, 2000, p. 36). Research indicates that “the concept of Open Science finds a lot of support in theory, yet struggles in practice” (Friesike & Schildhauer, 2015, p. 277). Attitudes of scholars are described as “conservative and seemingly entrenched” (Jamali, 2015, p. 7). Science is described as ‘self-serving’ and ‘uncooperative’, full of examples of secrecy and resistance to change, and the natural state of researchers is one of possessiveness (Nielson, 2012). There have been suggestions that significant changes in the scientific culture are needed for a widespread uptake of open science and data by researchers (Friesike & Schildhauer, 2015; Leonelli, et al., 2014; Nielsen, 2011).


2.6 Open Data Innovators

Whilst researchers have found that scientists have concerns about open data, open data innovators express puzzlement at the slow pace of change, as they take for granted that data should be fully open (for example, Gezelter, 2015). A small set of scientists and organizations have led the establishment of open science initiatives and they are referenced in the literature and in the open science community (Nielson, 2012; RIN/NESTA, 2010). These open science leaders and innovators embrace open data principles, developing and implementing initiatives to disclose their own and others’ discoveries via a method that is accessible and can be further used by researchers.

Early innovators played leading roles in developing the ideas and principles associated with open data that break with the traditional institutional process of publication of research data and methods, helping to transform the process of research data dissemination and science collaboration from within the traditional institution of science (Nielson, 2011; Waldrop, 2008). Open science and open data leaders have had an acknowledged critical effect on policy and practice especially with respect to research funding agencies and policy makers (Waldrop, 2008; RIN/NESTA, 2010). They publicly advocate and call for the adoption of open science practices on web sites and social media, and at conferences and meetings (Nielson, 2012).

As an example of an open data innovator and leader, I highlight the biography of Jean-Claude Bradley, an organic chemistry professor at Drexel University in Philadelphia. Studying the field of nanotechnology, Bradley worked in the traditional science mode, publishing articles and acquiring patents (Bradley, 1997; Korneva, et al., 2005). In 2005, he began to consider that his research could have greater impact in a more open research environment. He stated that he ‘cut ties’ with collaborators who did not share his views (Poynder, 2010), creating the UsefulChem project18 in 2005, an internet-based initiative through which he aimed to reflect a research process that was as transparent as possible.19 On UsefulChem, the raw details of every experiment being worked on in Bradley’s lab were made freely and publicly available within hours of production. The main scientific objective of the project was the synthesis and testing of

18 http://usefulchem.wikispaces.com/

19 Building on the work of Edelson, Pia & Gomez on “The Collaborative Notebook”, 1996.


new anti-malarial agents. Bradley included all the data generated from his experiments, including the experiments that had failed. Bradley coined the term ‘open notebook science’ in 2006 (Poynder, 2010), first using it in a blog post, then later including it in a correspondence to Nature Proceedings in 2007 (Bradley, 2007).20 He explained that open notebook science “is a way of doing science in which, as best as you can, you make all your research freely available to the public, and in real time” (Poynder, 2010).

In 2008, Bradley launched the Open Notebook Science Challenge, a crowd-sourcing project through which he called upon scientists to measure the solubility of different compounds and solvents and enter the information in an open database he had set up and to which he had made the initial contributions. Over 700 solubility measurements were collected through the Challenge. Bradley sought funding from prestigious organizations, such as the Royal Society of Chemistry, as well as industry to award cash prizes to Challenge participants. Both the database and following reaction database are available freely online through the UsefulChem website. The names of the students and participants are listed as co-authors.

Bradley made use of a free Wikispace account as a paper notebook in that a page of the wiki served as a lab notebook. The contents of the notebook were linked to the raw data that had been entered into a Google Spreadsheet. Bradley noted that: “The principle is that if anyone wants to find out what happened in any experiment we have done, they can simply go to the wiki and review all the details. And if the experiment included a calculation, they are automatically directed to the Google Spreadsheet containing the data” (Poynder, 2010).

Bradley incorporated his open science methods in his teaching classes and research labs. He advocated for open notebook science, and open science in general, in his scientific communications and in his public blogs and editorials. Acknowledged as a ‘hero of open notebook science’,21 Bradley is considered by many in the scientific community to be a pioneer of open notebook science, as well as a vocal advocate for open data in general (Shaikh-Lesko, 2014). In 2013, he was invited to the White House to discuss the role of open notebook science

20 http://precedings.nature.com/documents/39/version/1 21 Murray-Rust (2014) https://blogs.ch.cam.ac.uk/pmr/2014/05/19/jean-claude-bradley-hero-of-open-notebook-science-it-must-become-the-central-way-of-doing-science/

in allowing the determination of the melting points of over 27,000 substances, many of which had never before been agreed upon.

Since he first coined the term, many open notebook science projects have emerged. Electronic laboratory notebooks (ELNs) have been developed for use as a scientific data infrastructure that allows metadata capture, provenance trails, and curation at source (for example, Bird, Willoughby, & Frey, 2013). Open notebook science practitioners work in fields such as theoretical physics22 and genomics (Carter-Thomas & Rowley-Jolivet, 2016). Many acknowledge the pioneering work of Bradley. As an example, open notebook science researcher Professor Carl Boettiger, Department of Environmental Science, Policy and Management at University of California Berkeley, includes on his website the open notebook science logo created by Dr. Bradley’s research team.23 Summaries and links to several hundred active online open notebooks and experiments can be found within sites such as OpenNotebookScience Challenge24 and OpenWetWare25 that allow for sharing of electronic lab notebooks in the biological sciences and engineering.

I consider Bradley and similar open data innovators as institutional entrepreneurs who have conducted institutional work to alter the institution of science from one of traditional data dissemination to open data. Despite studies on the development and adoption of open data paradigms and the obstacles to their adoption, as well as literature on the formalization of open data repositories and agreements (Blumenthal, et al., 2006; de Silva & Vance, 2017; Fry, et al., 2009; Hedstrom & Niu, 2008; Nielson, 2009; Piwowar, 2011; Piwowar & Chapman, 2010; Richards & Winter, 2015; Vickers, 2006; Williams, 2008), little research exists that explores how open science entrepreneurs such as Bradley initiated and established their open data initiatives, changing the institution of traditional research data dissemination. The theoretical framework of institutional entrepreneurship and work, and multilevel institutional work are discussed in the next chapter.

22 For example, Garrett Lisi http://www.deferentialgeometry.org/ 23 http://www.carlboettiger.info/index.html 24 http://onschallenge.wikispaces.com/list+of+experiments 25 http://openwetware.org

Chapter 3 Theoretical Context

This chapter contains a general literature review that is divided into four sections, each of which contributes to the rationale for the study. The first and second sections present the theoretical context of institutions and institutional entrepreneurs, respectively, as well as concepts of the institution of science and of scientific entrepreneurs. The third section presents a discussion of institutional work. The final section presents multilevel institutional work as related to institutional change models.

3.1 Institutions, Institutional Theory, and Multi-level Institutions

An institution is broadly defined by formal rule sets and agreements, less formal shared norms and strategies, and taken-for-granted assumptions that organizations and individuals are expected to follow (Bruton, et al., 2010; Crawford & Ostrom, 1995; Thornton & Ocasio, 2008).26 Such societal-level structures include the market economy, the legal system, the family, religion and science as examples. Institutions are durable and “provide stability and meaning to social life” (Scott, 2001, p. 48) through the transmission of various types of carriers, including symbolic and relational systems, routines and artifacts operating at multiple levels of jurisdiction from global systems to interpersonal relationships (Scott, 2001). Each institution has overarching ‘logics’ or rules—practice, beliefs, regulations—that prescribe social organizational and individual behavior (Alford & Friedland, 1985). Institutions have their own infrastructure of occupations and professions, organizations and organizational fields (communities of organizations), and groups associated with these regulative, normative, and cultural-cognitive systems (Scott, 1995, 2001).

26 Although there is a widespread use of the term ‘institution’ in the social sciences, there is no consensus regarding the definition of the concept (Hodgson, 2007). Key concepts include rules, norms and social structures. Institutions are thus described as “any collectively accepted system of rules (procedures, practices) that enable us to create institutional facts” (Searle, 2005, p.21) and “systems of established and prevalent social rules that structure social interactions” (Hodgson, 2007, p. 2). Crawford & Ostrom (1995) consider that the boundaries of an institution depend on the theoretical question being asked as well as the time-scale being considered and the pragmatics of the research.


The shared norms and practices set an internal equilibrium that is inherent to institutions, i.e. they exist as shared understandings and resultant behaviours of individuals and do not require external enforcement (Crawford & Ostrom, 1995). In reviewing the analytical frameworks for institutions, Scott (2001) described three important pillars or elements of institutions—the regulative, normative and cultural-cognitive systems—that form a continuum from the implicit to explicit and from legally-enforced to taken-for-granted mores that influence behaviours. The regulative system includes rule-setting, monitoring and sanctioning activities, while normative systems are prescriptive and evaluative, including an obligatory element. Cultural-cognitive systems are the shared conceptions that reflect the nature of social reality and meaning. Through such systems, each institution has its own overarching logic that provides the beliefs and rules for acceptable behaviour (see Thornton & Ocasio, 2008). Scott (2001) also highlighted the importance of the concept of legitimacy—social credibility and acceptability—as related to these systems as a “condition reflecting perceived consonance with relevant rules and laws, normative support, or alignment with cultural-cognitive frameworks” (p. 59). Berger & Luckmann (1967) considered legitimacy as a second order of meaning in that institutionalized practices develop as repeated behavioural patterns that summon shared meanings, connected to the wider cultural norms.

Institutional theory is concerned with the effects of social rules, norms and expectations on organizations and individuals (Greenwood, Hinings, & Whetten, 2014; Scott, 2001). Although scholarship on institutions, and the relationship of institutions and individuals, began in the late nineteenth century among several disciplines (see Scott, 2001), institutional theory also came to the fore in the study of organizations in the mid-1970s, building on theories of open systems as related to organizations, providing insight into the importance of the broader institution in shaping organizations (Scott, 2001). Organizations, such as universities, are social entities that are goal-directed, deliberately structured and coordinated systems that are also linked to the external environment (Daft, 2012). Organizational fields are sets of interdependent, differentiated organizations that are engaged in a similar function, for example, an educational system (DiMaggio & Powell, 1983; Scott, 2001). Institutional analysis in the 1970s to early 1990s emphasized institutional isomorphism (see Scott, 2001; Thornton & Ocasio, 2008) and studies focused on how institutional environments shape the structures and behaviours of organizations and individuals by embodying the common rules and norms that regulate

behaviour and provide structure to daily activities (Sine & David, 2010).27 Institutional theory emphasizes that institutions arise in good measure independent of local circumstances—deriving from wider socio-cultural environments that support and even require local structure around exogenous models and meanings (Meyer, Ramirez, Frank, & Schofer, 2005). The basis of institutional theory is that normative expectations and socially-shared assumptions often direct organizational decision-making and practice (DiMaggio & Powell, 1983; Meyer & Rowan, 1977; Zucker, 1977). The normative, regulative and cognitive dimensions of the institutional environment constrain mature organizations and the ability to adopt new structures within them (DiMaggio & Powell, 1983; Meyer & Rowan, 1977; Tolbert & Zucker, 1983).

Institutions define what is appropriate and therefore render actions socially acceptable and credible (legitimate), unacceptable or beyond consideration (DiMaggio & Powell, 1991). John Meyer and colleagues such as Brian Rowan in 1977, Richard Scott in 1983, and Lynne Zucker in 1977, formulated a neo-institutional theory that proposed that formal organizational structures are shaped by institutional forces such as rational myths, knowledge legitimated through the educational system and by the professions, public opinion, and the law in addition to the more traditional technical demands and resource dependencies (see Scott, 2004). The main tenet of neo-institutionalism is that organizations are completely embedded in social and political environments, suggesting that organizational practices and structures are often either responses to or reflections of societal rules, beliefs, and conventions. Thus, these studies turned the emphasis to legitimacy, organizational fields, templates, and schema (later institutional logics), institutional entrepreneurship, and institutional work (Greenwood et al., 2008). Institutional fields constitute the environment within which organizations operate (Garud, Hardy, & Maguire, 2007) and are “characterized by the elaboration of rules and requirements to which individual organizations must conform if they are to receive support and legitimacy” (Scott, 1995, p. 132). Institutional logics are the cultural beliefs and rules—a set of organizing principles comprised of “material practices and symbolic constructions” (Friedland & Alford, 1991, p. 248)—that guide the behaviour and decision-making of organizations and actors in similar ways in response to

27 The study of organizations as a recognized field of study arose in the 1940s, building on the scholarship of institutional theorists (Scott, 2001). Organizational theory focuses on the organizational level of analysis with a concern for groups and the environment (Daft, 2012). It does not consider the behaviour of individuals but their aggregate behaviour in groups or within the organization.


institutionalized norms and practices (Thornton & Ocasio, 1999; Thornton, Ocasio, & Lounsbury, 2012).28

Scholars have described institutions as embedded or nested systems across several societal levels (Holm, 1995; Lawrence & Suddaby, 2006; Meyer & Rowan, 1977; Scott, 1995, 2001) from micro-level institutions in groups and organizations that regulate forms of interaction among members, to mid-level institutions such as fields that are associated with professions or industries, to macro-level societal institutions concerned with, for example, the role of family. Organizational fields, organizations, and individuals are embedded within various contexts that shape behaviour such as the nation state and professional associations (Meyer & Rowan, 1977). Governments and international agencies can exercise authority over organizational fields and institutions and can take a variety of direct regulatory actions—for example, by allocating resources and exercising regulatory controls through their agencies—or by exerting their influence through normative pressures to induce change in organizations (Scott, 2001).

Scholars have identified institutional processes occurring at different institutional levels and among different actors. For example, DiMaggio & Powell (1983) identified nation states and the professions as particularly important actors that exercise authority. Agencies of a nation state can take a variety of actions to affect behavior such as allocating key resources, imposing taxes and regulatory controls, and setting institutional structure. Professions are groups that have claim to formal knowledge and exert control via setting cultural-cognitive and normative processes, and “by defining reality—by devising ontological frameworks, proposing distinctions, creating typifications, and fabricating principles or guidelines for action” (Scott & Backman, 1990, p. 290).

Scott (1995, 2001) summarized six institutional levels of analysis at which institutional theory is applied: world system, society, organizational field, organizational population, organization, and organizational subsystem. As an example of a multi-level approach, using Scott’s (1995) conceptual model of multilevel institutional theory to reflect on higher education, Austin & Jones (2016) identified the societal level (for example, world organizations such as the OECD),

28 A review of the institutional logic and work perspectives, their strengths and shortcomings, as well as possible links is presented by Zilber (2013).

second-level structures of organizational fields (higher education, for example) and organizations (such as universities), and lower level individuals or groups (such as faculty members or faculty committees). By conforming to the logics of an organizational field such as higher education, organizational forms such as professional norms and practices are legitimated (Austin & Jones, 2016).

Science has been described as a social institution whose primary purpose is to advance and produce knowledge (Crane, 1972; Storer, 1966; Ziman, 2000). The institution of science embodies institutional logics—a set of characteristic values, norms and practices—that prescribe its function and the behavior of organizations and scientists. Robert K. Merton identified such a prescriptive set of values and norms for science that he argued were functional, in terms of advancing knowledge, as well as morally binding on the scientist as a professional (Merton, 1942, 1957, 1973). Merton’s four idealized norms reflect the ‘rules’ that define the appropriate values, beliefs, attitudes and behaviours by which science operates: scientific results should be openly published in a timely fashion (communism), knowledge findings should be subject to impersonal evaluation criteria (universalism), personal interests should not appear in scientific procedures (disinterestedness), and questioning and criticism should be allowed and encouraged (skepticism). This normative structure emphasizes the relative autonomy, neutrality and rationality of modern science and its insulation from other societal influences as well as defining the desired prerequisites for the production of objective knowledge, open communication and accessibility that are imperatives for scientific integrity (Merton, 1942, 1957, 1973; Ziman, 2000). Storer characterized science as the “organized social activity of men and women who are concerned with extending man’s body of empirical knowledge through the uses of these techniques” (1966, p.3).

Data can be considered as one of the foundations of science, as a critical part of the research process involves the collection and analysis of data to make inferences and support theories. The institution of science also has an institutional logic in terms of the norms and practices related to the dissemination of research data and findings. In conducting research, scientists produce, use, collect, compile, store, and interpret data in various forms (Gitelman, 2013). Mayernik (2016) noted the importance of an institutional perspective for the study of research data practices, in particular that the processes for the creation, documentation, managing and sharing of data are present at several institutional levels, including within that of organizations (universities and research


centres, for example), at the disciplinary level (professional guidelines and funding agencies), as well as for the norms of modern science. He noted that scientists encounter data practices that are embedded within these institutional levels through their education and personal interactions over time.

As discussed in Section 2.1, openness in the logic of traditional science emphasizes the norm or practice of the public dissemination of successful research results within papers published in peer-reviewed journals (Carey, 2013; David, 1998; Nielsen, 2011; Ziman, 2000). The process of publication occurs after the research has been completed and includes the sharing of research findings that support a study’s conclusions (Resnik, 2006). One of the primary incentives in this tradition is a strong norm of priority and recognition that is based on confirmed claims of priority (den Besten, et al., 2010). Scientists traditionally consider their data as private, and may share only some data and results in their scientific publications (de Silva & Vance, 2017).

The institutional logic of open data, referring to the process of releasing both raw and processed research data, enabling others to analyze and use it without restriction (Gewin, 2016; Kitchin, 2014; Molloy, 2011; Murray-Rust, 2008; Willinsky, 2006), subject at most to citation/attribution and/or share-alike (Pollock, 2006), has been spurred on in recent decades by technological advances as well as a growing interest in making the results of scientific inquiry readily accessible in order to maximize their value and increase data transparency and access, as discussed in Section 2.3.

The institution of science exists at the global level of the scientific community. In advancing knowledge, science is fundamentally an international activity as its products are evaluated in the global arena (Crane, 1971; Ziman, 2000). Science can also be considered as a multi-level or embedded institutional system. Aligning with Scott’s (1995, 2001) conceptual model of varying levels of institutional analysis, individual scientists and groups can be considered as the organizational subsystem or lower level, academic units and universities as the organizational level, and educational systems and disciplinary scientific communities as an organizational field level. There also exists a diverse array of international scientific organizations and associations that connect scientific disciplines globally. A broader societal level is reflected when science interacts with non-science actors such as within areas of policy, industry or with society at large.


Science is organized around many disciplinary areas each of which consists of communities of scholars that connect formally through publications, associations and agreements, as well as informally through collaborations, conferences and meetings, correspondence and pre-prints. Members of a disciplinary community come together by their interests and commitment to a particular research area toward a set of common problems (Crane, 1972). Within the sociology of science, Crane named the latter form of informal network amongst scientists as the ‘invisible college’ (1971, p. 585) in which “scientists in a research area maintain continual contact with each other in order to monitor recent developments in the area and adjust their activities accordingly” (1971, p. 585).29 Invisible colleges act as a communication network within a discipline in which researchers are connected by strong ties of informal collaboration that facilitate diffusion of information.30 In studying mathematics authors and rural sociology authors in the United States using bibliometric methods and questionnaires to find emergent relationships through sociometric methods, Crane found that the social interaction through these networks played a role in scientific growth or the process of knowledge diffusion. In relation to the promulgation of open data, scholars have studied the importance of subject-related consortia (Cutcher-Gershenfeld, et al., 2017; Kim & Stanton, 2016). Cutcher-Gershenfeld, et al. (2017) studied more than a dozen scientific consortia, concluding that when they work well, consortia serve as catalysts to accomplish what individuals and organizations cannot do alone with respect to data sharing.
Studies have also examined the role of scholarly journals in the adoption of open data through both the creation of regulations for open data, as discussed in Section 2.4, and the provision of incentives that acknowledge open data contributions (de Silva & Vance, 2017; Kim & Stanton, 2016; Rowhani-Farid, et al., 2017).

At the organizational level, science is conducted by individuals and groups at a variety of enduring organizations including universities, colleges, government and private research agencies, as well as in corporations.31 For this study, the most relevant organization is that of the

29 Crane (1972) conducted a study of mathematics authors and rural sociology authors in the United States using bibliometric methods and questionnaires to find emergent relationships through sociometric methods. 30 The concept of disciplines is, however, complex as disciplines can exhibit fluid boundaries over time (Mayernik, 2016). 31 Scholars have also considered these organizations at an institutional level, for example, universities as institutions (Meyer, Ramirez, Frank & Schofer, 2006).

university in which the practices and rules reflect the logics of the institution of science, professional norms, the higher education sector, as well as government regulations (Clark, 1983). Austin & Jones (2016) examine how institutional theory assists in understanding university practices and governance structures noting that the concept of an organizational field that reflects a community’s organization “resonates with universities given the common elements associated with these organizational forms” (Austin & Jones, 2016, p. 27) and is reflected in their adoption of similar behaviours and practices. They also highlight that in organizations such as universities in which social, cultural, legal, and political demands are prevalent, rewards are associated with conforming to the expectations of social institutions (Hatch, 2006) in order to gain acceptance and legitimacy (DiMaggio & Powell, 1983; Suchman, 1995). Organizations such as universities incorporate institutional norms and rules into their own structures and practices becoming homogeneous with respect to other universities (Austin & Jones, 2016).

Science also intersects at the societal level with the nation state in terms of research funding policy (see de Silva & Vance, 2017; Ritvala & Granqvist, 2009; Spruijit, et al., 2014), industry (for example, Etzkowitz & Leydesdorff, 1997) and society at large. As discussed in Section 2.4, the role of national funding agencies has been important for the promotion and legitimation of open data practices, with the goal of sharing data generated through public funds in a timely manner. Academic organizations as well as individual scholars, through their connections with government agencies and professional organizations, can play a role in influencing government policies, as well as professional standards, practices and ethics.

At the most basic level in the institution of science are the individual actors, such as researchers within an organizational setting such as a university, who can be constrained or enabled by the norms and rules within the levels of the institution of science as well as those of the organization (Austin & Jones, 2016; Casati & Genet, 2014). One norm in this organizational environment is the expectation of professional autonomy in researchers’ pursuits, as well as a high level of specialization in their scientific area of inquiry (Austin & Jones, 2016).32 Scientists can influence policy beyond their organization both informally through the results of their research and

32 An additional norm, at times perceived as constraining, is that of achieving academic tenure in terms of scholarly productivity (see Lawrence, Celis, & Ott, 2014).


formally by providing expert advice (Jones, 1983; Stoutenborough, Bromley-Trujillo, & Vedlitz, 2015) and by announcing new research findings, noting a policy issue and/or possible policy instruments, or via efforts to emphasize scientific consensus on a pending policy issue (see Stoutenborough, et al., 2015). The individual actors that are most relevant for this study are the open science entrepreneurs, who are considered as institutional entrepreneurs and are discussed in the next section.

3.2 Institutional Change and Entrepreneurship

Although institutions are durable, ensuring stability and order, they can undergo change as well as remain stable (Scott, 2001). Scholars have been interested in institutional change and how new organizational forms emerge and are diffused since the mid-twentieth century (Scott, 2001; Thornton & Ocasio, 2008) as these processes are considered to be a critical source of innovation in society (Schumpeter, 1942; Stinchcombe, 1965). Within the institution of science, scholars have examined how new knowledge is diffused, in particular through cognitive paradigm shifts and social interaction (Crane, 1972; Kuhn, 1996).

Given that one of the key characteristics of institutions is their stability, permanence, and resistance to change (DiMaggio & Powell, 1983) the question of how actors embedded in an institutional field—with its regulative, cognitive and normative pressures—are able to envision new practices and act on them was identified as the paradox of embedded agency arising from institutional theory (Hardy & Maguire, 2008). Scott (2001) describes the early institutional scholars in sociology such as Talcott Parsons, and efforts to encapsulate the role of normative institutional frameworks and how they are internalized with individuals conforming to an institutional norm or value. In the 1990s, researchers began to question neo-institutional theory in its ability to explain endogenous, rather than exogenous, institutional change (DiMaggio, 1988; Fligstein, 1997).33 The concept of institutional entrepreneurship emerged as one mechanism to explain such change, i.e. to understand how actors instigate and contribute to

33 Early institutional scholars in sociology had emphasized the importance of the interdependence of individuals and institutions. In particular, Scott (2001) described the work of Cooley and Hughes, working in the early to mid-twentieth century, in putting forward interdependent models of interaction of individuals and institutions.


change within institutions despite resistance (Battilana, et al., 2009; DiMaggio, 1988; Greenwood & Suddaby, 2006; Maguire, et al., 2004; Thornton & Ocasio, 2008; Zilber, 2013). One of the key concepts in the literature is that of ‘embedded agency’ (Greenwood & Suddaby, 2006; Holm, 1995; Seo & Creed, 2002) that reflects how actors who are constrained by institutions are nonetheless able to change those institutions.

Institutional entrepreneurs are the critical actors that change existing practices and/or introduce new practices, beliefs, or values and then ensure that these are diffused and become adopted more widely by other actors in the field (Hardy & Maguire, 2008).34 The term was introduced by DiMaggio to describe actors who initiate changes to either transform existing or create new institutions (DiMaggio, 1988). Research has moved away from the initial perspective of the institutional entrepreneur as a ‘heroic’ individual agent to the study of how institutional entrepreneurs engage in strategic interventions and practices such as the mobilization of constituencies and resources to promote institutional change (Battilana, et al., 2009; Hardy & Maguire, 2008; Lawrence, 1999; Lawrence & Suddaby, 2006; Rao, Morrill, & Zald, 2000).35

Institutional entrepreneurs “break away from scripted patterns of behaviour” (Dorado, 2005, p. 388) and aim “to develop strategies and shape institutions” (Leca & Naccache, 2005, p. 627). They are integral to the building and establishment of new practices, forms and structures at the organizational level (Battilana, 2006; Battilana et al., 2009; Garud & Karnøe, 2003; Greenwood & Hinings, 2006; Tolbert, David, & Sine, 2011). These individuals have a high degree of agency (Westenholz, 2009) where agency is defined as an actor’s engagement with society such that their actions can reproduce and/or change an environment’s structure (Battilana & D’Aunno, 2009; Emirbayer & Mische, 1998), changing the rules or norms or the distribution of resources (Scott, 2001). Agency is reflected in the “interpretive processes whereby choices are imagined, evaluated, and contingently reconstructed by actors in ongoing dialogue with unfolding situations” (Emirbayer & Mische, 1998, p. 966).

34 Lawrence, Leca & Zilber (2013) review the literature exploring who engages in institutional work, highlighting the role of professionals. Suddaby & Viale (2011) reviewed work on professions and institutional change, and found four ways in which such change occurs through professionals: 1) use of ‘expertise and legitimacy to challenge the incumbent order’; 2) use of ‘their inherent social capital and skill to populate the field with new actors and new identities’; 3) introduction of ‘nascent new rules and standards’; and 4) management of ‘the use and reproduction of social capital within a field’ (p. 423). 35 In addition to the actions of actors themselves, the literature explores the role of existing conditions that also enable or facilitate change. These include field conditions and actors’ social position (Battilana, et al., 2009) and precipitating jolts that can be social, technological, or regulatory (Greenwood, et al., 2002).


Institutional entrepreneurs lead collective attempts to bring new beliefs, norms and values into social structures, often taking risks to put forward a larger cause or public good (Rao & Giorgi, 2006; Tolbert, et al., 2011). They are actors that are able to “contextualize past habits and future projects within the contingencies of the moment” (Emirbayer & Mische, 1998, p. 963). Garud, et al. suggest that, in order to qualify as institutional entrepreneurs, individuals “must break with existing rules and practices associated with the dominant institutional logic(s) and institutionalize the alternative rules, practices or logics they are championing” (2007, p. 962). Full institutionalization can occur when a new idea is legitimized and is taken for granted as a natural and appropriate arrangement (Greenwood, Suddaby, & Hinings, 2002).

As change agents, institutional entrepreneurs perceive institutional complexities, ambiguities, and contradictions within and between logics and take advantage of resulting opportunities (Seo & Creed, 2002; see Thornton & Ocasio, 2008), initiating change and seeking legitimacy for new endeavors (Bruton, Ahlstrom, & Li, 2010). They motivate others to collaborate by initiating change that allows for shared sense making and identity (Suddaby & Greenwood, 2005; Westenholz, 2009). They make use of both symbolic and material means from their environments (Thornton & Ocasio, 2008). In addition to investing time and resources in technological and social innovations, they may also enjoy a social status that confers legitimacy, which can carry over to the causes they promote. They “creatively manipulate social relationships by importing and exporting cultural symbols and practices from one institutional order to another” (Thornton & Ocasio, 2008, p. 115).

Studies of institutional entrepreneurs explore how institutions are changed or reassembled and practices and standards are altered through the role of individuals and agency (DiMaggio, 1988; Dorado, 2005; Garud, et al., 2007; Kalantaridis & Fletcher, 2012; Seo & Creed, 2002; Thompson et al., 2015; Tracey, et al., 2011; Westenholz, 2012) and the wider environment that both defines and creates opportunities for change (Battilana, et al., 2009; Hwang & Powell, 2005). Since the 1970s, when entrepreneurship and its various forms became a popular field of systematic research, a variety of methodological approaches have been employed. Research methods have been mainly descriptive, empirical and based on structured surveys (Bygrave, 2002; Neergaard & Ulhøi, 2007). In response to initial quantitative survey-based research on entrepreneurship, there were calls for studies using qualitative research methods in order “to develop concepts that enhance the understanding of social phenomena in natural settings, with due emphasis on the meanings, experiences and views of all participants” (Neergaard & Ulhøi, 2007, p. 4). For example, Greenwood & Suddaby (2006) studied institutional entrepreneurship in the accounting industry that provided advice to corporations and governments on business matters, showing how the institutional entrepreneurs within this mature organizational field used strategies to change institutional logics and bring about change in the organizational form towards multidisciplinary accounting practices.

Scholars have proposed that scientists be considered as institutional entrepreneurs, as the descriptions of institutional entrepreneurs describe scientists well. Frestedt (2008) considered that, as entrepreneurs, scientists anticipate the next stages of scientific development and construct future trajectories.36 Crane (1971) noted that, as a group, scientists wield very little political or economic power and exert influence based on their expert knowledge. Casati & Genet (2014) proposed that principal investigators can be considered as scientific entrepreneurs who engage in practices to innovate and problem solve, shape new paradigms and models, and broker or animate new ways for science to interact with society. Through interviews and document analysis, they studied twenty principal investigators in the field of nanotechnology working at universities and a national lab in order to gain insight into their scientific careers. In particular, they sought information on four key patterns of the scientists’ actions: producing science, building legitimacy, interacting with actors and communities, and envisioning.
Based on their study, Casati & Genet defined scientific entrepreneurs as scientists who work within academia not only to conduct research but who are “also involved in acquiring resources from different sources (funding agencies, firms, professional associations, etc.), in combining internal and external resources to shape scientific avenues, and in gaining legitimacy for these new avenues by organizing workshops, conferences, special issues or setting up new journals, building on their scientific reputation to transfer it to other networks (economic, business, policy makers)” (2014, p. 24). They also found that, although scientists were involved to a greater or lesser extent in each set of practices, the emphasis differed depending on the stage of their careers: while new principal investigators were focused on scientific production, as they gained tenure or professorships their roles diversified, with some undertaking a greater role in the broader scientific community (managing academic associations or being editor of journals), in university or research organization management, in knowledge mobilization, or in influencing interactions between science and society.

36 Scientific entrepreneurship is used as a term distinct from ‘academic entrepreneurship’, which is usually defined as contributions made by university research to society and includes university scientists engaged in commercial activities, including technology transfer (Casati & Genet, 2014).

3.3 Institutional Work

The topic of how change is initiated and legitimized by entrepreneurial individuals has only recently begun to be studied within the framework of institutional work. Researchers have described different forms of institutional change, often incorporating them within proposed models (Greenwood, et al., 2002; Perkmann & Spicer, 2007, 2008; Tracey, et al., 2011). Lawrence (1999) defined institutional change strategies as “patterns of organizational action concerned with the formation and transformation of institutions, fields and the rules and standards that control those structures” (p. 167). Building on this definition, Lawrence & Suddaby (2006) coined the term institutional work as a conceptual umbrella to frame the purposive action of individuals and organizations to create, maintain and disrupt institutions by “redefining, recategorizing, reconfiguring, abstracting, problematizing and, generally, manipulating the social and symbolic boundaries that constitute institutions” (p. 238). They considered the theoretical foundations of institutional work to be twofold: first, institutional theory and agency, and second, the sociology of practice, emphasizing both the agency of actors and the structure of institutions. The study of institutional work is an important contribution to the study of entrepreneurship because the study of practices sheds light on how individuals take action that results in significant institutional changes (Battilana, et al., 2009).

Institutional work is discussed as the change or ‘disruption’ of institutions (Lawrence & Suddaby, 2006), ‘intervention strategies’ (Hardy & Maguire, 2008), ‘mechanisms’ (Thornton & Ocasio, 2008), and ‘instigation of divergent change’ or ‘divergent change implementation’ (Battilana, et al., 2009). Lawrence & Suddaby (2006) reviewed the empirical literature that examined organizations for descriptions of change and catalogued distinct forms of institutional work through which individual actors engaged in institutional change.37 Although acknowledging that it is at times difficult to distinguish institutional change or disruption from institutional creation processes (Scott, 2001), they found that, in creating new institutions, institutional entrepreneurs engage in advocacy, theorizing, mimicry, educating, etc. In disrupting institutions, they found institutional work related to disconnecting sanctions, disassociating moral foundations, and undermining assumptions and beliefs.38

37 Examples were drawn from the film industry (Leblebici, Salancik, Copay, & King, 1991) and Norwegian fish industry (Holm, 1995) for institutional work of disconnecting sanctions and rewards, employment in Japanese firms for disassociation (Ahmadjian & Robinson, 2001), and for the film industry (Leblebici, et al., 1991) and coal industry (Wicks, 2001).

38 Recent research suggests that although associations and individuals engage in similar strategies, they differ in their scale of enactment, with individuals focusing more on the scale of personal interaction (Thompson, et al., 2015).

Suddaby & Viale (2011) extended the review of institutional change by examining the institutional work of professionals as key causal agents for institutional change. They considered that, because of their role in serving elites, professionals “are sensitive to and able to reproduce social structures of hierarchy and status” (2011, p. 436) that are critical for institutional change. Based on a literature review on institutional change in which professionals played a key role, they identified critical dynamics through which professionals changed institutions, including the use of their expertise and legitimacy to challenge existing structures, formulate new ideas or practices, and define a new ‘uncontested space’, and the use of their social capital and skill to inhabit the field with new actors and ideas.

The institutional work of individual institutional entrepreneurs has been studied in a variety of areas, including the adoption of new HIV/AIDS guidelines (Maguire, Hardy & Lawrence, 2004), the health sector (Nigam, 2013; Ritvala & Granqvist, 2009) and food sector (Rao, Monin, & Durand, 2003), social innovation (Tracey, et al., 2011), management fashions (Perkmann & Spicer, 2008), the academic context (Symon, Buehring, Johnson & Cassell, 2008), and the sustainability industry (Thompson, et al., 2015).

Institutional work encompasses a range of activities including problem identification, theorization, use of technical demonstrations of efficiency and effectiveness, naming and creation of new symbols, negotiations and advocacy, incentivizing, forging new alliances, establishing standards of practice, and attaching new practices to pre-existing organizational routines (see reviews: Battilana, et al., 2009; Hardy & Maguire, 2008; Lawrence, 1999; Lawrence & Suddaby, 2006). Some examples of types of institutional work are described below.

The term theorizing was first defined by Strang & Meyer (1993) who suggested institutional entrepreneurs generate and develop an account of the faults of a current practice and champion alternative practices as more effective/efficient, necessary and/or culturally appropriate (Greenwood, et al., 2002; Hardy & Maguire, 2008; Suddaby & Greenwood, 2005; Tolbert & Zucker, 1996). Through theorization “institutional roles and practices are abstracted into comprehensive and compelling theoretical models that foster institutional change and the subsequent diffusion of those roles and practices” (Mena & Suddaby, 2016, p. 1669). The term arises from institutional theory and is similar to the legitimating concept of framing in social movement theory (Hardy & Maguire, 2008; Battilana, et al., 2009) through which new practices can be justified as indispensable, appropriate, and valid (Rao, 1998).

Theorization of roles and practices is the critical work through which institutional entrepreneurs render their ideas comprehensible and compelling to others (Greenwood, et al., 2002; Strang & Meyer, 1993; Tolbert & Zucker, 1996). Ideas are aligned with prevailing normative prescriptions, giving them ‘moral legitimacy’, and/or the innovations are argued to have functional superiority, or ‘pragmatic legitimacy’. Greenwood, et al. (2002) further argued that a demonstrated conformity of an innovation with existing values is important. Perkmann & Spicer (2007, 2008) proposed that institutional entrepreneurs frame institutions in ways that appeal to wider audiences. Similarly, Battilana, et al. (2009) identified the articulation or development of a vision of change as a key change process, including making the case and sharing the vision for change through diagnostic, prognostic, and motivational framing. Hardy & Maguire considered the interpretive struggles of actors over meaning, including translation, in which actors are viewed as “active interpreters of practices” (2008, p. 205). Perkmann & Spicer (2007, 2008) proposed that the work also involves technical activities to create models and shared world views.

Theorization occurs via a two-fold process that encompasses specifying the need for a new practice and justifying it through the provision of an explanation as to how the new process meets the needs, i.e. setting out the moral and/or pragmatic legitimacy of the proposed change in ways that will resonate with others (Battilana, et al., 2009; Greenwood, et al., 2002; Hardy & Maguire, 2008; Perkmann & Spicer, 2007, 2008; Tracey, et al., 2011). Specification focuses on the need for change and can be broken down into several components (Battilana, et al., 2009; Hardy & Maguire, 2008). These include the identification of the problem and an account of why it is important (punctuation) and the diagnosis of the problem in terms of how it has come about (elaboration). The failings of a current practice can be decried as inefficient, ineffective, out-of-date, or unjust. These failings can be generalized to a profession by indicating the profession is under threat and/or subject to forces for change (Hardy & Maguire, 2008; Strang & Meyer, 1993; Suchman, 1995; Tolbert & Zucker, 1996). The specification of an institutional failing supports the justification of the need for the proposed local innovation and solution by providing for the latter’s moral legitimacy, by aligning a practice with normative prescriptions and the current social context and/or by asserting how the new practices are superior in a functional or pragmatic manner (Tolbert & Zucker, 1996).39

In emerging fields, where shared norms do not yet exist, Maguire, et al. (2004) proposed that institutional entrepreneurs frame a variety of reasons in order to satisfy a group of diverse stakeholders and be more influential than putting forward a single justification. In studying how institutional entrepreneurs motivated the adoption of new practices related to HIV/AIDS, the researchers found that they used persuasive argumentation to frame problems and justify solutions with different stakeholders in order to develop a broad support base. Similarly, broad mobilization and involvement of a variety of community stakeholders was a crucial strategy in the institutionalization of new eating habits as part of the Finnish heart study (Ritvala & Granqvist, 2009).

In order for a new practice to be adopted, the specification and justification of the innovation must be presented in a compelling way by institutional entrepreneurs (David, Sine, & Haveman, 2013; Greenwood, et al., 2002; Hardy & Maguire, 2008). The innovation can be justified as morally legitimate, as practically or pragmatically superior, as aligning with societal or cultural practices, or as better reflecting professional values and standards (Déjean, Gond, & Leca, 2004; Thompson et al., 2015; Tolbert & Zucker, 1996). For professionals, theorization includes debate within the profession as well as the reframing of professional identities as they are presented externally (Greenwood, et al., 2002). The justification is presented in a way that resonates with internal and external stakeholders and encourages actors to participate in change (Hardy & Maguire, 2008; Thompson, et al., 2015). Institutional entrepreneurs can be altruistic in expressing concern for an innovation’s broad social benefits rather than their own welfare (David, et al., 2013). Change can be presented as a natural, and even inevitable, progression (Greenwood, et al., 2002). As the current organizational failings are explicitly linked to new potential solutions, associated change strategies are legitimating accounts that usually combine normative- and interest-based appeals.

39 In social movement theory, justification aligns with the concept of prognostic framing in which a project or practice is cast as superior to a current arrangement, as well as the concept of motivational framing in which actors provide compelling reasons and/or a vision to support a new practice (Battilana et al., 2009; Hardy & Maguire, 2008).

Counterfactual thinking and opportunity recognition: Institutional entrepreneurs envision new forms and independently innovate, introducing new ideas and the possibility of change. In their literature review of institutional change, Greenwood, et al. (2002) noted the distinctive change associated with the emergence of new players and ascendance of actors that introduce new ideas and the possibility of change (‘deinstitutionalization’) and independent innovation (‘pre-institutionalization’). Counterfactual thinking rests on individuals challenging assumptions, revealing causes, and generating creative solutions to a specific problem (Gaglio, 2004; Tracey, et al., 2011). Identification of problems and engagement in counterfactual thinking to develop a novel solution have been documented in several studies of institutional entrepreneurs (Karlesky, 2015; Tracey, et al., 2011). In an emerging field, new solutions can be drawn from forms of expertise that are accepted in other fields but have not yet been applied to the particular problem (David, et al., 2013; Tracey, et al., 2011), aligning with concepts of borrowed logics from one institution to another (Friedland & Alford, 1991; Thornton & Ocasio, 2008). As an example, Tracey, et al. (2011) conducted a case study of two institutional entrepreneurs who established a catalogue business that employed homeless people as a new organizational form to tackle homelessness in the United Kingdom. They found that the institutional entrepreneurs conducted institutional work of opportunity recognition through the two processes of problem framing and counterfactual thinking to develop an alternative solution to the support of the homeless. The entrepreneurs framed the issue of homelessness not as a simple lack of housing but as a complex set of factors, both individual and social. They recognized that current societal supports did not address this set of factors and, through counterfactual thinking and the combining of institutional logics, they developed alternative solutions underpinned by a new, hybrid logic. Similarly, Gregoire & Shepherd (2012) suggested that opportunities are identified when entrepreneurs perceive a match between a new supply and a market where a new method of supply can be introduced.

Technical demonstrations of efficiency and effectiveness: Within their persuasive arguments, institutional entrepreneurs often justify an innovation on grounds of increased efficiency and effectiveness. They may bring measurement techniques and quantification to the fore in order to reduce uncertainty and ensure that stakeholders view the proposed change as better in some way (Déjean, et al., 2004; Ritvala & Granqvist, 2009; Nigam, 2013; Thompson, et al., 2015; Tracey, et al., 2011; Zilber, 2007). Technical demonstrations may include the creation of new methods of quantification (Déjean, et al., 2004). Quantification involves the reduction of an object to subsets of elements and analysis, whereby the measurement is a mechanism to establish its legitimacy according to existing values (Kondra & Hinings, 1998).

Naming and creation of new symbols: Institutional entrepreneurs can create symbols such as pictures, diagrams, and logos, as well as assigning new names to their innovations, in order to assist in the symbolic sharing of new ideas and also to enable a collective sense of identity with an initiative (Lounsbury & Glynn, 2001; Zilber, 2007; Thompson, et al., 2015).

Creation of standards of practice: Scholars have emphasized the regulatory facets of institutions (Lawrence & Suddaby, 2006; Scott, 2001). Regulatory processes include the establishment of rules, inspection of conformity to rules, as well as the development of sanctions and/or rewards in order to influence behaviour (Lawrence & Suddaby, 2006).


Forging new relations, alliances, coalitions and associations: Institutional entrepreneurs establish new relations with like-minded actors through individual relationships, collaborations, alliances, trade or professional associations and coalitions in order to enhance their legitimacy and/or advance a change via collective action (Battilana et al., 2009; Garud, et al., 2007; Hardy & Maguire, 2008; Ritvala & Nyquist, 2009; Stuart, Hoang, & Hybels, 1999; Thompson, et al., 2015). They often forge new relations with individual legitimate actors and/or may choose to act collectively by sharing responsibilities, networks, and resources in order to increase their resource-power and/or legitimacy (Stuart, et al., 1999) so that they may reshape the institution in a way that they could not have done alone. They can enhance the legitimacy of a new practice by mobilizing support on an interpersonal level with key constituents such as highly embedded agents, respected professionals, policy makers, government officials and experts, as well as with associations (David, et al., 2012; Thompson, et al., 2015; Tracey, et al., 2011) who operate at the centre of a field (Battilana, et al., 2009). These affiliations signal to others that their personal reputations are legitimate through the implicit sanctioning by elites and professional organizations.

Membership rules: In institutionalizing a new practice, institutional entrepreneurs can highlight and define their new practice by creating institutional boundaries with rules of membership (Lawrence, 1999; Lawrence & Suddaby, 2006). These can provide legitimacy to an institution and provide value to actors associated with the new practice through the perception of the formation of an elite group, as well as through the explicit expansion and visibility of the space within which the expertise related to a new practice is considered (Lawrence, 1999).

Attaching or aligning practices to pre-existing organizational routines and values: In order to minimize the objections to a new initiative and enhance opportunities for support, institutional entrepreneurs can explicitly align with pre-existing organizational routines and values (Maguire, et al., 2004) and/or connect with broader societal narratives (Lawrence & Suddaby, 2006). Institutional entrepreneurs reaffirm the alignment of new practices with important stakeholder values on an ongoing basis (Hardy & Maguire, 2008).


Mobilization of resources including financial, material, political (state and professional associations), and cultural resources (Hardy & Maguire, 2008): Institutional entrepreneurs can mobilize a range of resources, including material resources to sustain a project or encourage others (Battilana, et al., 2009; Hardy & Maguire, 2008) and support from other actors in order to strengthen their social position or enable collective action. Mobilizing resources involves securing material resources from a variety of sources in order to sustain initiatives and provide financial or other incentives (DiMaggio, 1988; Hardy & Maguire, 2008). These sources can include social investors and firms (Tracey, et al., 2011), government agencies (Nigam, 2013), and associations to which members contribute funds (Thompson, et al., 2015). Mobilizing support encompasses gaining other actors’ support of new practices, as well as motivating others to engage in activities to support the change. Institutional entrepreneurs specialize in mobilizing and leveraging resources through political negotiation and advocacy; bargaining, compromise, and negotiation; and coalition-building (DiMaggio, 1998; Dorado, 2005; Perkmann & Spicer, 2007).

Bargaining, compromise and incentivizing: Institutional entrepreneurs mobilize allies and material resources through bargaining, negotiating, and compromise in order to ensure that various stakeholders agree to support a new practice, or at least not undermine it (Battilana, et al., 2009; Hardy & Maguire, 2008). Hardy & Maguire (2008) noted that institutional entrepreneurship involves dependency on other actors and resources, which makes bargaining and negotiation inevitable. Institutional entrepreneurs operate through such exchange mechanisms, in which support for a project relies on the perception that there will be tangible or intangible benefits forthcoming (Colomy, 1988).

Political negotiations and advocacy: Institutional entrepreneurs mobilize political and regulatory support through direct and deliberate techniques of social suasion, such as political negotiation and advocacy (Maguire, et al., 2004).

3.4 Multilevel Institutional Work

As discussed in Section 3.1, scholars have put forward the model of institutions as systems existing at many levels, from the micro-level of groups and organizations, to field-level (meso-level) institutions associated with professions or industries, and the macro-level of societal or global institutions (Holm, 1995; Lawrence & Suddaby, 2006; Scott, 1995, 2001). In coining the term institutional work, Lawrence & Suddaby (2006) highlighted an important possible research direction in considering institutional work within institutional systems. The ‘multiple embeddedness’ of institutional entrepreneurship is also linked to calls for multilevel studies of work conducted at different institutional levels (Battilana, et al., 2009; Kaghan & Lounsbury, 2011; Lawrence, et al., 2011). The lack of attention that institutional theorists have paid to the study of levels and level interactions in organizational and change studies more broadly has been raised as an issue (see Bitektine & Haack, 2015).

Studies and reviews have proposed models of change that include institutional entrepreneurs engaging in institutional work in a discursive manner (Battilana, et al., 2009; Hardy & Maguire, 2008), while others have suggested a temporal order (Greenwood, et al., 2002; Perkmann & Spicer, 2007). In terms of temporal order, scholars have suggested change occurs in a particular order, beginning with interactional activities that build networks and organization, followed by increasing importance of the technical and then cultural processes (Greenwood, et al., 2002; Perkmann & Spicer, 2007). Building on the work of Strang & Meyer (1993) and Tolbert & Zucker (1996), Greenwood et al. (2002) first proposed a six-stage institutional change model. In addition to, and following, an exogenous precipitating jolt stage, their model included stages of deinstitutionalization (the emergence of new players and ascendance of actors that introduce new ideas and the possibility of change); pre-institutionalization (independent innovation); institutionalization, including theorization (the specification and justification of an innovation through explication of its moral and/or pragmatic legitimacy) and diffusion (the increasing objectification of the change, and the imparting of its pragmatic legitimacy and value); and the end stage of re-institutionalization (cognitive legitimacy and being taken for granted as the natural and appropriate arrangement). Although the stage model of institutional entrepreneurship put forward by Greenwood, et al. (2002) was not presented as a multilevel model, it can be considered that the pre-institutionalization stage is one that is at the individual and local level, whilst theorization and diffusion occur at a meso- or macro-level, followed by cognitive legitimacy.

Other scholars have highlighted the multi-directional nature of institutional change as related to institutional levels. Scott (1995) highlighted the top-down and bottom-up process of institutional change between societal and global institutions, organizational fields and organizations, and individual actors. Similarly, Lawrence & Suddaby (2006) proposed that the nested relationship at different levels involved particular forms of work “that connects institutions across levels, potentially drawing one level to create new institutions at another level” (p. 248). Individual actors may be constrained or legitimized by occurrences or changes at the other levels (Hartley, et al., 2002; Lawrence & Suddaby, 2006; Ritvala & Granquist, 2008; Tracey, et al., 2011). In studying legitimacy, Meyer & Scott (1983) and Scott (2001) highlight the “vertical dimension” of legitimation from higher institutional levels for organizations, for example through agents of the state or professional associations. But the multi-directional nature of institutional change is also important for legitimation, in particular early activities by actors to develop regular patterns of behaviour that evoke shared meanings for participants but which are then connected to broader cultural logics (Berger & Luckman, 1967).

The particular level at which institutional work occurs has been shown to be critical in the diffusion of institutional change, particularly at the level of professional institutions. As an example, in studying the change towards the art museum organizational form in the early 20th century, DiMaggio (1991) found that the change was strongly influenced by the professionalization of art museum workers and that the diffusion of change occurred at the level of professional fields, not at the level of organizations. In discussing the importance of legitimation for organizations, Scott (2001) considers that agents of the state and professional organizations are critical for conferring legitimacy on organizations through, for example, the conferring of certification or accreditation.

Studies have begun to investigate multilevel institutional entrepreneurship through the study of individual entrepreneurs (Ritvala & Granquist, 2008; Tracey, et al., 2011),40 defining the institutional micro-, meso-, and macro-levels in relation to their particular context. Lawrence & Suddaby (2006) had set three institutional levels: the micro-level of groups and organizations, the meso-level of fields associated with professions and industries, and the macro-level of broader society. Tracey, et al. (2011) introduced an individual level and termed it the micro-level. Both studies describe a local community level; however, in Tracey, et al. (2011) this is termed the meso-level, whilst in Ritvala & Granqvist (2009) it is defined as the micro-level, aligning with the Lawrence & Suddaby (2006) definition of scope. In terms of the macro-level, in the Ritvala & Granqvist (2009) study it is that of broader societal public policy both in Finland and globally, which could be considered a professional meso-level, whilst in the Tracey, et al. (2011) study the macro-level is that of the societal public sphere in the United Kingdom. These two studies are described in greater detail below.

40 Smets, Morris & Greenwood (2012) and Smets & Jarzabkowski (2013) also present a multi-level model of institutional change; however, they study the changes that emerge from everyday work rather than work by institutional entrepreneurs.

In their in-depth case study of a social enterprise in the UK, Tracey, et al. (2011) present a multilevel model that highlights the specific work of entrepreneurs at the micro- (individual), meso- (organizational) and macro-levels (societal). In establishing a new social enterprise that provided employment for homeless people in England in the late 1990s, the institutional entrepreneurs acted between and across the individual, organizational and societal levels of institutional processes in order to establish the new institutional model. The entrepreneurs bridged two conflicting logics from different organizational fields—for-profit retail and nonprofit—to construct a new organizational form that combined aspects of both and established a new and accepted organizational form. Institutional work at each of the three institutional levels had a different and complementary role in the establishment of the organizational form. The study showed that institutional entrepreneurs use specific and different forms of institutional work at the three levels: opportunity recognition at the micro-level, organizational design at the meso-level, and organizational legitimation at the macro-level as shown in Table 1, and discussed below. The forms of institutional work align with the forms presented and discussed in section 3.3.

Table 1. Multi-level institutional work (modified from Tracey, et al., 2011, Figure 1, p. 64)

Aggregate Theoretical Dimensions | Institutional work
Micro-level (individual): Opportunity recognition | 1. Frame problem differently; 2. Develop a new solution through counterfactual thinking
Meso-level (organization): Design of the new organizational form | 3. Building the organizational template; 4. Theorize the organizational template
Macro-level (societal): Legitimation of the new organizational form | 5. Connecting with appropriate macro-level discourses; 6. Aligning with very legitimate actors

Institutional work on the individual micro-level occurred as opportunities for change were recognized by the two institutional entrepreneurs, Harrod and Richardson. This form of institutional work involved the identification and expression of a novel understanding of a problem and refocusing attention on alternative aspects of the issue (Tracey, et al., 2011). The entrepreneurs considered that homelessness resulted from an individual’s lack of basic skills to sustain employment. Through counterfactual thinking, the entrepreneurs came up with the novel idea to create an enterprise with a sustainable business model that would offer homeless people paid jobs, training, and support. The work of problem framing and counterfactual thinking was conducted at the individual level between the two of them. The motivation to build the new enterprise was rooted in their own experiences volunteering for charities that supported the homeless. Tracey, et al. (2011) found that the entrepreneurs’ “creativity and imagination allowed them to visualize a new institutional configuration that was counterintuitive and contrary to the accepted wisdom among homeless support organizations” (p. 70).

At the meso-level, Tracey, et al. (2011) found that the two institutional entrepreneurs engaged in an intentional effort to build a new organizational form by building the new organizational template and theorizing the new form to key business investors and not-for-profit stakeholders. Harrod and Richardson actively justified the new model and the rationale for their enterprise to investors, explaining how it would be a superior approach to meeting the needs of the homeless compared to conventional charity-based initiatives. In promoting their business model to local investors and potential customers, they emphasized the sustainable business model of their new enterprise as well as the social benefits it would generate for the homeless.

Tracey, et al. (2011) found that at the macro-level, Harrod and Richardson worked to legitimate their new social business model by connecting with a societal discourse on homelessness and aligning with highly legitimate actors. The work that the entrepreneurs had engaged in at the micro- and meso-levels gave them the ‘right to voice’ in broader discussions on homelessness and social enterprise. Harrod and Richardson leveraged their position to legitimate their new enterprise and convince actors from multiple fields, tapping into discourses within the United Kingdom and, in particular, those of the recently elected New Labour government. By tapping into the macro-level narratives, Harrod and Richardson courted the national media and senior politicians to explain their new model for homelessness, highlighting the alignment of their new enterprise with the government’s rhetoric on the complementarity of social and commercial enterprises.

They also connected with senior business leaders in high profile companies that were interested in profiling and promoting corporate social responsibility. The two entrepreneurs actively worked to align themselves with highly legitimate figures in politics and government (for example, the UK Prime Minister, members of the Royal Family, and senior bureaucrats), the media, and in the business and nonprofit sectors. They joined government-organized working groups considering social enterprises, and leveraged these contacts and their own initial successes to further legitimate their own social enterprise. In time, politicians and business leaders began to cite their social enterprise as evidence of a potentially successful model. Tracey, et al. (2011) suggested that these macro-level connections played a significant role in building the success of Harrod and Richardson’s social enterprise.

Tracey, et al. (2011) documented how the two institutional entrepreneurs conducted institutional work on all three levels, moving fluidly and iteratively between them. In particular, their continued work at the meso- and macro-levels was iterative: the establishment of their own social enterprise provided them with credibility at the macro-level, and they used the conferred legitimacy to further support their work at the meso-level. Tracey, et al. (2011) also suggested that institutional entrepreneurship was less linear and predictable than is portrayed in the literature.

The consideration of scientists as institutional entrepreneurs and their practices also affords an opportunity to study multilevel models of institutional work. In order to manage projects and organize their activities to produce scientific results, scientists operate, often concurrently, at a variety of institutional levels: individually and/or with their research groups, at the level of their organization (for example, university, college, research agency), and at the level of their international disciplinary community (Crane, 1971; Melkers & Klopa, 2010; Wagner, 2009).41 They effectively oversee their own work, exercising professional judgment as very skilled individuals with internalized norms. They are employed within an organization. Scientists can travel internationally and interact with recognized peers; there are international organizations and associations for a broad spectrum of disciplines. Within their disciplinary community, they interact both formally and informally with peers and associations. Crane (1972) described the

41 Increased mobility and information and telecommunications have resulted in increased international co-operation and co-authorship (Wagner & Leydesdorff, 2005).

nature of the scientific community, noting that the structure of informal scientific relationships changes with time. At first, a particular research area may contain a few isolated members who, with time and the production of research of interest, attract new members who may develop a long-term commitment to the research area and be very productive. In turn, these scientists train students and collaborate both with them and other scientists. Each of the productive scientists communicates formally and informally with others, producing a connecting network. Scientists also interact with non-academic actors such as policy makers, industry or the broader society.

One study has considered multi-level institutional change by scientists. Although not looking at the specific types of institutional work, Ritvala & Granqvist (2009) considered a historical case study of a pioneering heart health initiative in eastern Finland in the 1970s, specifically in terms of how the emergence of novel scientific findings and discussion in a global scientific context served as a catalyst for significant change locally and vice-versa. In the study of the adoption of heart disease community prevention programs, the researchers found that health sector institutional entrepreneurs engaged in grass-roots strategies at the local community level, defined as the micro-level, as well as in the structuring of institutions resulting in an impact on public policy in Finland and globally, defined as the macro-level. The authors found that engaging in change with many community-level organizations was critical, given that dietary habits were culturally and economically deeply rooted within communities but the entrepreneurs also worked with policy makers at the national level to change the regulatory policies.

Ritvala & Granqvist (2009) were particularly interested in exploring the links between both levels. They concluded that, in mobilizing across local institutions and global scientific communities, the entrepreneurship of like-minded scientists created impetus for the emergence of a novel movement. These science entrepreneurs were able to mediate between the local level of the organization and the global level by strategically working two communication channels: scientific journals and mass media. The former was critical as legitimation and institutional change occur at the level of the global scientific community, not the local academic organization. The authors also suggested that alliance building occurs across the local and global levels in science-based fields because scientific and intellectual movements are global in nature. Based on their findings, they suggested that the capacity of an agent to theorize and link local and global scientific communities was an important capability of science-based institutional entrepreneurship and that scientists who induce institutional change

through translating and mobilizing a novel idea are active not only in their own scientific fields but also in other sectors of society.

In this thesis, I aim to contribute to the emerging literature on institutional work and multi-level models of institutional change by studying the institutional work of open scientists in establishing an open data innovation. I investigate whether the open scientists engage in different institutional work at different institutional levels (individual, micro-, meso-, and macro-). To examine this, I propose that open data scientists who successfully establish an open data innovation can be considered as institutional entrepreneurs within the institution of science. The rationale for this consideration is outlined in the next section.

Chapter 4 Research Design

In this chapter, the particulars of the study’s research design are described including the participant selection, data collection and procedures. I describe the data analysis methods used to answer the research questions, as well as the possible limitations of the methodology. The literature review and theoretical constructs from institutional entrepreneurship and institutional work helped to shape the design.

The overarching aim of this study is to investigate the institutional work employed by open data entrepreneurs to initiate and establish open data innovations. Three sub-questions were explored in order to answer the main research question of how researchers initiate and implement open science paradigms within the institution of science:

What is the institutional work conducted by open data entrepreneurs in order to institutionalize an open data innovation?

What are the institutional levels at which open data entrepreneurs conduct institutional work?

What is the institutional work of the open data entrepreneurs at different institutional levels?

4.1 Overview and Rationale for Design

A qualitative case study approach was selected because the study’s aim is to explore the institutional work of the open data entrepreneurs through interviews and the examination of related documents and materials. The informants are individual open data entrepreneurs who have implemented an open data innovation in the release of scientific data or tools.

The cases are individual open data entrepreneurs. The open scientists selected have developed open data innovations and overcome resistance in order to implement them successfully, and are considered to be institutional entrepreneurs: individuals who have an interest in certain institutional arrangements and are responsible for leveraging resources to deliberately transform their existing institutions (Maguire, et al., 2004).

The primary data consists of five interviews that I undertook between January and July 2014. The research design involved a content analysis of several sources including interviews with five open data scientists and related documented interviews of them, their own blogs, and both academic and popular media articles.

The case studies are bounded by several contexts: the open scientists themselves and their experiences in developing and establishing open data innovations. The bounding of a case study is consistent with a design that is exploratory in nature (Yin, 1989, 2009). A qualitative research design is appropriate for building and testing theory (Eisenhardt, 1989; Eisenhardt & Graebner, 2007) and well-suited to examining a real-life, contemporary phenomenon (Creswell, 2013; Yin, 2009) such as open science that has not been studied. As well, Lawrence, Hardy, & Phillips (2002) suggest that qualitative approaches are appropriate for studying the dynamics of institutional change. The study of the dissemination of practices from an individual to macro-level calls for the use of qualitative methods to uncover how practices are promoted (Tempel & Walgenbach, 2007). The case study method is normally adopted to provide insights into specific individuals, and multiple case studies can be used to develop theoretical constructs and/or midrange theory as well as to test theories (Eisenhardt & Graebner, 2007).

By studying the activities of different open science institutional entrepreneurs in detail, the study responds to calls for multilevel empirical research into institutional entrepreneurship (see Lawrence & Suddaby, 2006; Tracey, et al., 2011). The study’s intent is to understand the specific agency that open science entrepreneurs undertake to initiate and establish their open data innovations and to test the hypothesis that open scientists make use of different institutional work to accomplish their objectives at different institutional levels.

The institutional work is studied and illustrated through the selection of multiple cases of open scientists at different research sites in order to examine different perspectives (Stake, 1995). This intent to study a single issue through a multiple case (or collective) study is defined as an instrumental case study (Stake, 1995). The multiple cases are analyzed and compared. As the aim of

the research is to test the hypothesis in terms of individual scientists’ institutional work at different institutional levels, only scientists were considered.

The study is focused on an identified open data innovation and its establishment. Boundaries for the study (times, events, and process constraints) are related to the nature of the innovation that each researcher has established. The beginning and ending of the case study may not be clear as it relates to a specific open data project, but can be approximately constrained in terms of the scientists’ definitions of the initiation and establishment of a project. With respect to the latter, the establishment of a project is considered to be complete when the project is accepted and used by others.

The research design included several steps: 1) open science participant identification and document analysis; 2) a pilot interview by Skype and related document analysis; 3) interviews by Skype with four additional open scientists, and related document analysis.

4.2 Data Collection and Analysis

4.2.1 Participant Identification

As a first step in the study, I conducted a review of the open science, data and methods landscape through a study of the scholarly literature, popular science articles, blogs, and awards. This allowed for the identification of scientists engaged in open data initiatives and recognized as open data leaders.

In case study research, purposeful sampling allows for the opportunity to select and learn from the most promising participants (Merriam, 1998).42 This sampling is more likely to provide relevant and rich data for replicating or extending theory in exploratory and case-based research (Yin, 2009). The sample in this study was limited to open data entrepreneurs and, as a result of

42 The term theoretical sampling is also used (Eisenhardt & Graebner, 2007) to denote cases that are selected because they are suitable for “illuminating and extending relationships and logic among constructs” (Merriam, 1998, p. 27).

my literature review, I purposively selected a number of potential informants based on the criteria for institutional entrepreneurs expressed by Garud, et al. (2007): “To qualify as institutional entrepreneurs, individuals must break with existing rules and practices (divergent) associated with the dominant institutional logic(s) and institutionalize the alternative rules, practices or logics they are championing” (p. 961). The selected scientists had been described as breaking with traditional data approaches, and advocating and institutionalizing open data. Potential participants were identified from the pool of open data researchers working in the United Kingdom and the United States during the last decade as these two countries have a documented history of open science and open data publication policies and funding support during this time frame (Lasthiotakis, et al., 2015; de Silva & Vance, 2017). The selection of potential participants included the consideration of both unusual cases as well as similar cases, in addition to the practical aspects of convenience sampling in terms of available documentation on each potential participant. The sampling criteria are summarized in Table 2.43

Table 2: Characteristics of selected informants
1. Scientists who have practiced and championed open data as different from the dominant institutional logics and who have institutionalized the open data initiative.
2. Scientists from a range of organizations and disciplines.
3. Scientists were working within the US or UK when the initiative was established.
4. Sufficient documentation available for each participant and their initiative.

A data collection matrix was developed to record the amount and type of information likely to be collected for each potential participant. Data for each potential participant consisted of interviews that had been conducted with them, YouTube videos, and articles and blogs they had written. A pilot case was selected as well and is discussed below.

One of the matters for consideration in designing a multiple case study is the ideal number of cases for inclusion. Although there is no one answer, researchers typically choose four to five cases in order to balance in-depth study of each case with generalizability across the cases (Creswell, 2013). Eisenhardt (1989) considers that cases should be added until theoretical saturation is reached but also considers 4-10 cases suitable for a multiple-case study. With fewer than 4 cases it becomes

43 Although it has been noted that sampling can change during a study and researchers need to remain flexible, researchers need to plan their sampling strategy as much as possible ahead of time (Creswell, 2013).

difficult to generate much theory and its empirical grounding will be less convincing, whereas with more than 10 cases, it becomes difficult to cope with the complexity and volume of the data. Creswell suggests that 4-5 cases allow for “ample opportunity to identify themes of the cases as well as conduct cross-theme analysis” (2013, p. 157). I decided to select five cases for this study from the pool of potential informants, after realistically taking into consideration barriers such as time constraints and the accessibility and availability of participants.

In addition to the identification of potential participants, a content analysis (Krippendorff, 2013) of the documents gathered for them was conducted in order to provide insight into the possible strategies they employed to implement an open data innovation. The content analysis served to inform the framing of the interview questions for a pilot case study.

4.2.2 Ethical Considerations

An ethics protocol was adopted and approval sought through the University of Toronto Office of Research Ethics (ORE). Free and informed consent is a hallmark of ethical research. The University of Toronto Guidelines and Practices Manual for Research Involving Human Subjects was followed in the development of the recruitment and informed consent letters for participants and the interview protocol.

Following ethics approval from the University of Toronto Research Ethics Board, potential participants were approached by email and asked to participate in the study (see recruitment letter Appendix 1). When a potential participant did not respond to the initial email, a second follow-up reminder was sent by email. If there was no response, another was selected from the potential list with the aim of preserving a diversity of disciplines and organizations represented by the participants. All data gathered from the participants was collected with participants’ explicit permission.

As the group of open scientists is relatively small and individuals are well-known in the scientific community, participants agreed to have their names presented in the thesis and in any subsequent publications arising from the research. The open scientists who were willing to participate in the study received an individual consent form by email prior to the interview

(Appendix 2). Thus, as part of the informed consent and interview processes, participants knew that, by participating in the study, their identities would be known.

The interview audio files and transcripts, case study reports, and case study database were stored on a computer within encrypted files in order to restrict access and maintain confidentiality. Data for each participant was kept in separate folders and dated/time-stamped for ease of retrieval.

4.2.3 Selection of Participants and Document Assembly

Potential informants, including the pilot study participant, were invited to the study. As per the identified characteristics of the selected informants (Table 2 above), the participants were scientists who have practiced and institutionalized an open data initiative, and who were working within the US or UK when the open data initiative was established. The invited informants reflected a range of open data innovations, institutions, and academic disciplines, partly because the nature of open data is that there are not many scientists within each type of activity and partly to be able to make use of both literal and theoretical replication on these dimensions (Yin, 2009). Such triangulation also supported the confirmation of the validity of the research process by relying on multiple sources of data (Yin, 1994). For each informant, extensive archival material was available through the internet.

A total of ten scientists were invited to participate in the study during early 2014. Five open scientists, including the pilot case, agreed to participate in the study (Table 3). All five established, developed and promoted their innovations over the course of their careers before open data began to be entrenched in funding agency mandates in the mid- to late 2000s: Jean-Claude Bradley began releasing his research data through UsefulChem in 2005, and developed and promoted it until 2014; Daniel Gezelter began Jmol as open source code as a graduate student in 1998; Eric Kansa began Open Context in 2001 to publish archaeological data and then established the Alexandria Archive Institute at around the same time, with the first substantive data set being published in 2007; Peter Murray-Rust co-founded the World-Wide Molecular Matrix source code and has promoted open science initiatives since 2002; and Fernando Pérez began working on IPython source code as a graduate student in 2001.

Table 3. List of participants

Participant | Position | Institution | Discipline | Open Science Initiative(s) | Year Initiated
Dr. Daniel Gezelter | Professor | Department of Chemistry & Biochemistry, University of Notre Dame, US | Biochemistry | Open source: Jmol, OpenMD | 1998 (Jmol); 2004 (OpenMD)
Dr. Fernando Pérez | Researcher | Berkeley Institute for Data Science, UC Berkeley, US | Data sciences | Open source: IPython; NumFocus | 2001 (IPython); 2011 (NumFocus)
Dr. Peter Murray-Rust | Reader Emeritus | Department of Chemistry, University of Cambridge, UK | Chemistry | Open data: World Wide Molecular Matrix (WWMM) | 2002
Dr. Jean-Claude Bradley (Pilot) | Associate Professor | Department of Chemistry, Drexel University, US | Chemistry | Open notebook science/Open data: UsefulChem | 2005
Dr. Eric Kansa | Technology Director and Open Context Program Director | Alexandria Archive Institute, US (not-for-profit) | Archaeology | Open data: Open Context http://opencontext.org | 2006

As the participants were being confirmed, documents relating to the scientists were assembled and analyzed in order to prepare for the interview. These documents were publicly available through the internet. Documents included several forms of qualitative data including articles and blogs written by the informants, informants’ presentations (written or video), interview transcripts of the informants, and articles others had written about the scientists and/or their initiatives. Multiple sources of data were collected as is the practice in qualitative case study research (Eisenhardt, 1989; Denzin & Lincoln, 2005; Yin, 2009). This multi-method approach in gathering empirical material added rigor, breadth, and depth in addition to allowing for enhanced understanding and

strengthening the internal validity of the study (Denzin & Lincoln, 2005; Yin, 2009). The List of Analyzed Documents for each participant is presented in Appendix 3.

This archival material complemented the research interviews, enabling a comprehensive analysis of the participants’ responses in terms of substantiating their stated behaviors, as well as providing insight into the history and development of their open science initiatives. In particular, the blogs of the open scientists provided insight into both the timing and the motivations of the scientists. The blogging motivations and approaches of scientists have been studied recently (see Trench, 2012). In general, science blogging has been considered as a means for scientists to communicate directly and casually with peers as well as the public (Batts, Anthis, & Smith, 2008; Butler, 2005) in matters related not only to the science per se but also to the political dimensions of science (Trench, 2012). A survey study of medical bloggers found that the respondents’ motivations included both the sharing of practical skills and knowledge, as well as the desire to influence the way readers think (Kovic, Lulic, & Brumini, 2008). The same study indicated that these bloggers censored their thoughts and comments less than might be expected in a public setting.

A content analysis of the documents for each informant was conducted (Krippendorff, 2013). The analysis provided basic information on the open data innovations including the history of a particular initiative, as well as insights into the informants’ work in implementing their initiatives and their chronology. The analysis indicated that the scientists had engaged in institutional work at the level of themselves as individuals, within an organization, a scientific community, and, for a few, within broader society. This information was important in preparing for the interviews and for the later analysis, providing further detail and evidence for either corroboration or contradiction of the data collected through the interviews (Merriam, 1998).

4.2.4 Pilot Study

The pilot study, comprising document analysis and a semi-structured interview with Jean-Claude Bradley by Skype, was conducted early in 2014. The main goal of the pilot was to test whether the research questions were relevant to the informant’s implementation of his open data innovation. The focus of the questions was broad. The interview confirmed that the scientist had, in fact, engaged in institutional work at different institutional levels to implement the innovation. His responses to the open-ended questions were rich in indicating the work he had

undertaken to implement his initiative. He provided insights into the realities and obstacles related to promoting open paradigms in science. The interview indicated that the scientist had engaged in work at the level of himself as an individual, his organization, the scientific community, and broader society.

The interview also confirmed that the overall nature of the interview questions was appropriate. Two of the questions were refined in order to better focus the responses.

4.2.5 Interviews with the Open Science Entrepreneurs

Semi-structured interviews were conducted with the five open scientists between January and July 2014 in order to deepen the understanding of their work strategies. Each interview typically lasted between 1 and 1.5 hours. In the interviews, participants were encouraged to elaborate on their work, how they approached advancing their open data innovation, and how their activities unfolded at the level of themselves as individuals and at other institutional levels they viewed as important. The goal of the interviews was to obtain from the participants their own accounts of how they advanced their own innovation.

As the participants were located in other countries, the interviews were conducted by Skype, with video enabled for four interviews and audio only for one. Accommodation was made to ensure the interview times were appropriate for the time of day in the participants’ local area.

The interviews were digitally audio recorded. During the interviews, handwritten notes were taken either to extend the questions or to serve as personal notes in the form of questions for further investigation. I conducted a full review of the existing documentation before the interview with each participant so that if any contradictory information arose in the interview, clarification could be sought (Yin, 2009).

During the interview, participants were provided with a verbal overview of the project, asked if they had any questions, and then asked the interview questions. The format was semi-structured with open-ended questions. This design allows for consistent interview questions, in general, and also for the flexibility to engage in natural conversation for deeper understanding and to ask new questions if new areas come up to be explored (Yin, 2009). The interview questions focused on understanding the research question and sub-questions as reflected in this proposal, i.e. the institutional work employed by open scientists, whether there

were different levels, and whether work was employed at these institutional levels. Participants were first asked questions regarding their perception of the organizational and scientific institutional norms at the time they conceived their own open science innovation. Then they were asked to describe the work they undertook to establish it and what obstacles they encountered. In order to test whether institutional work was related to different institutional levels, one question asked explicitly about work undertaken within their own work/lab, academic department, the scientific community, and within the broader society. Participants were also asked if there was a temporal order to their institutional work. Attention was paid to the tone of the interviews, as well as key points that seemed to be important for them. Information indicating whether the scientists consciously engaged in agency was also noted and probed if not mentioned by the scientist.

A map outlining how the interview questions align with the research sub-questions is presented in Table 4. The conceptual framework was used to ensure the issues of interest were being discussed in a way that provided helpful information but did not bias or lead the responses. The questions were developed with consideration of advice noted in Kvale & Brinkman (2007) and Yin (2009). The interview protocol is presented in Appendix 4.

Table 4. Alignment of interview questions with research questions

Research Question | Interview Questions
What is the institutional work conducted by open science entrepreneurs in order to institutionalize an open data innovation? | What were the data publication norms for your discipline at the time you were establishing your innovation? What were the important steps or actions you took to establish your [open data innovation] from the idea to the establishment stage? What was your role in seeing these steps or actions through? What obstacles did you encounter? Did you overcome them? If so, how?
What are the institutional levels at which open science entrepreneurs conduct work? | Were there steps or actions you took yourself, within your organization/department, within the scientific community, and/or broader society?
What is the institutional work of the open data entrepreneurs at different institutional levels? | Was there a temporal order to the actions you undertook to establish your initiative? Why did you feel these actions were necessary and significant?


The participants were also asked if there are any other public domain documents that could be available to understand the project and its establishment. These could be proposals, technical reports, budget or administrative documents.

The recorded interviews were transcribed. The transcripts were provided to informants for review and checking, as well as to allow them to provide any further responses to the interview questions. Participant checking of the transcripts was conducted for purposes of verification and validation of the transcribed information (Merriam, 1998; Stake, 1995). One participant, Jean-Claude Bradley, passed away in May 2014, a few months after the interview and verification of his interview transcript.

Data was collected from sources in addition to the interviews. The key principles and advances of the informants’ initiatives were documented in academic journals and in popular articles, which assisted in the verification of data consistency. Given the scientists’ ethos of openness and transparency, the initiatives are particularly well documented in these sources, as well as their own personal blogs and statements from organizations and association documents. Appendix 3 presents the additional data sources used in the study.

4.2.6 Data Analysis

The primary interview and secondary archival data were analyzed and interpreted through several steps (Creswell, 1998) that included preparation, reading, detailed analysis, description, and interpretation. Using this method, data collection and analysis were ongoing throughout the study, iterating between data, literature, and tentative findings, thus allowing for rich exploration of the institutional work of open scientists.

As a first step, the transcripts were carefully read, along with the documents for each open scientist. This allowed me to get a general sense of the information collected, any potential similarities and differences, and the points that were key for each participant. Notes were made in the document margins, recording key words and phrases as well as examples of institutional work and levels.

With respect to the institutional levels, four levels were identified as ones for which the informants had engaged in institutional work: individual, organization, scientific community, and broader society. These aligned with the levels found in my literature review (see Section 3.3) that identifies institutional work at the level of the individual (Tracey, et al., 2011), local groups and organizations (Lawrence & Suddaby, 2006; Ritvala & Granqvist, 2009; Tracey, et al., 2011), fields associated with professions and industries (Lawrence & Suddaby, 2006; Ritvala & Granqvist, 2009; Thompson, et al., 2015), and broader society (Lawrence & Suddaby, 2006; Tracey, et al., 2011). These were coded as: individual, micro-level organization, meso-level scientific community, and the macro-level societal.

Documents for each scientist, including articles they had published or that had been published about them, were coded according to the institutional level of the target audience. For example, material in which the informants elucidated their own thought or action processes was labelled as individual level. Instructions to students for a teaching course or for a research lab process, and interactions within their academic department or organization, were labelled as micro-level organization. Articles published in discipline-related (for example, related to chemistry for Jean-Claude Bradley) or general science journals or interviews were coded as meso-level scientific community. Information published as part of personal blogs and wikis was more complicated, as the audience for blogs and wikis ranged from micro-level organization to meso-level to macro-level. These were coded as micro-, meso-, or macro-level depending on the contents.

The analysis was conducted throughout the process of data collection through constant comparison thematic analysis followed by cross-case analysis (Creswell, 1994; Merriam, 1998; Stake, 2006; Yin, 2009) and logged within the case study database. The documents (interview transcripts, documents, notes, and artifacts) were manually coded for meaning. Through the process of coding, data reduction, and analysis, I organized the data to allow for the generation of preliminary meaning by noting patterns and themes across the information collected, in order to arrive at comparisons and contrasts that would test the multilevel model and potentially rule out rival models (Miles & Huberman, 1984).

A detailed data analysis of each transcript and data source was conducted in order to identify strategies and levels for each scientist. In terms of understanding the institutional context, content analysis was used to identify key terms important within the institutional documents. Sections of the data were assigned codes by open coding, i.e. assigning codes according to what the informants had noted. This detailed data analysis was conducted by reading each document to see whether the scientists indicated that they had engaged in strategies and work as defined in the literature. The informants’ agency was coded for institutional work such as theorization, counterfactual thinking, advocacy, etc. In terms of institutional work at different levels based on the multilevel institutional entrepreneurship model, the data were analyzed to see whether the informants had engaged in institutional work at multiple levels as identified above.

Information gleaned from the analysis of the interview transcripts was compared and cross-referenced with information garnered from other documents related to each participant. This enabled the identification of any themes or issues within each case (within-case analysis). The documents were then read again to see if any labels had been missed or any others emerged. A case study report was developed for each participant. In addition to individual case study reports, and in order to increase the reliability and accessibility of the findings, a formal case study database was developed so that evidence could be reviewed directly across all the cases (Yin, 2009). The database was organized according to the individual case and research questions. It includes data resulting from the interviews as well as from the study of additional documents. Notes from interviews were not edited or rewritten when included in the database (Yin, 2009). The main purpose of the latter information was to document the connection between the various pieces of evidence and case study issues. These notes were helpful in the analysis and in writing the final results for the study.

Following analysis of each participant’s data, a cross-case comparison was performed in which similar strategies as well as unique strategies were identified (Yin, 2009). These cross-case findings were then described and presented. When the process of coding for meaning was complete, I rechecked the concepts against all the interviews and data collected to seek instances that contradicted my insights. The data were reexamined to identify initial concepts and the institutional work and levels.

In the second stage of the analysis, I re-examined the data to identify initial concepts and group them into work themes. This process proceeded iteratively, moving among data, emerging work patterns, and the literature until the data were refined into institutional work at the four institutional levels (Eisenhardt, 1989).


4.3 Description of the Participants

As described above, five open science entrepreneurs participated in the study. The participants, reflecting a range of disciplines, institutions, and initiatives, were selected in order to make use of both literal and theoretical replication on these dimensions (Yin, 2009), with the expectation that each would bring a unique perspective with respect to their open science efforts. The participants were scientists working within the US or UK who practiced and institutionalized an open data innovation.

As noted in Table 4, the informants have created and helped to found a range of open data innovations (open notebook, open data, and open source) in a variety of academic disciplines (chemistry, archaeology, data sciences). Each open scientist has a background in academia and holds a PhD, with four holding positions at a university and one working at a not-for-profit research organization. Brief biographies and the importance of their open science initiatives are included below.

Dr. Jean-Claude Bradley – UsefulChem

Through his development of Open Notebook Science, Dr. Jean-Claude Bradley institutionalized new practices for how chemistry research experiments and the resulting data and results are disseminated to the scientific community. He stated that he severed ties with his nanotechnology research colleagues who did not share his views on open data. He constructed the new practice of open notebook science, taking advantage of the internet, social media, and database-enabling technologies. He motivated others to collaborate by initiating change. His open notebook practices were institutionalized with the formation of a public web site and resources marshaled for an Open Notebook Challenge that resulted in hundreds of contributors and advocates (Williams, 2008). Dr. Bradley was interviewed many times (for example, Bradley, 2006; Coturnix, 2008; Drahl, 2009; Poynder, 2010; Sanderson, 2008; Udell, 2008) and strongly advocated for open paradigms in science (Bradley, et al., 2009; Bradley, et al., 2011; Bradley, 2013). Open Notebook Science, along with other web-based tools such as wikis and blogs, has been described by others as “positioned to change the way that research collaborations are initiated, maintained and expanded” (Hohman, et al., 2009, p. 261).


He received his PhD in organic chemistry from Laurentian University. He was appointed as an assistant professor of chemistry at Drexel University in Philadelphia, US in 1996, after having served as a postdoctoral researcher at Duke University and the Collège de France in Paris, and was later promoted to Associate Professor (2003). Beginning his academic career in the field of nanotechnology, he worked in the traditional science mode, publishing articles and acquiring patents in the areas of synthetic and mechanistic chemistry and nanotechnology (for example, Bradley, 1997; Korneva et al., 2005; Rossi, Ye, Gogotsi, Babu, Ndungu, & Bradley, 2004). In 2005, he began to consider that his research could have greater impact in a more open research environment and began to work on the synthesis and testing of new anti-malarial agents, publishing his lab notebook online. He stated that he ‘cut ties’ with collaborators who did not share his views (Poynder, 2010).

In 2005, Dr. Bradley created UsefulChem44 as an online notebook to make the scientific process as transparent as possible by publishing all his research work online in real time. He coined the term Open Notebook Science (ONS) in 2006 to describe the approach of making all primary records of an experiment (including failed and successful experiments) available to all online (Bradley, et al., 2007). This includes placing the personal notebook of a researcher online, along with raw and processed data, and any associated material as it is recorded.

The UsefulChem format is that of a wiki with links to relevant raw data, blogs and posts. Bradley’s idea was to discover and work on urgent chemistry problems and report on the research in a transparent way. On UsefulChem, the raw details of every experiment being worked on in his lab were made available freely within hours of production. Bradley included all the generated data, including that related to failed experiments. Bradley noted that: “The principle is that if anyone wants to find out what happened in any experiment we have done, they can simply go to the wiki and review all the details. And if the experiment included a calculation, they are automatically directed to the Google Spreadsheet containing the data” (Poynder, 2010). Bradley frequently chronicled the research of UsefulChem on a public blog,45 along with general comments related to open science and chemistry. Bradley also used the public wiki medium to challenge the publishing community. He wrote a paper on a wiki and incorporated links to experiment pages on an online notebook wiki as valid references. Public exposure to research results before publication is not the norm in traditional science publication, but Bradley noted that preprints are already hosted on institutional repositories (Williams, 2008).

44 www.usefulchem.wikispaces.com
45 http://www.blogger.com/profile/6833158

Bradley was a vocal and active advocate of open data and open science. He was an acknowledged leader in promoting open science (Williams, 2008).46 Open notebook science practitioners now practice in fields such as theoretical physics47 and genomics (Carter-Thomas & Rowley-Jolivet, 2016). Many acknowledge the pioneering work of Bradley. As an example, open notebook science researcher Professor Carl Boettiger, Department of Environmental Science, Policy and Management at University of California Berkeley, includes on his website the open notebook science logo created by Dr. Bradley’s research team.48 Summaries and links to several hundred active online open notebooks and experiments can be found within sites such as OpenNotebookScience Challenge49 and OpenWetWare50 that allow for sharing of electronic lab notebooks in the biological sciences and engineering. Bradley was acknowledged as a “Hero of Open Notebook Science” (Murray-Rust, 2014)51 and an “Open Science Evangelist” (Shaikh-Lesko, 2014).

Dr. Daniel Gezelter – Jmol, OpenMD

Dr. Gezelter is the founder of several open science initiatives and can be considered an open source software (OSS) developer in that he designed, developed and managed software applications that were released for free for continued development and use by others (Quint-Rapoport, 2012). He received his PhD from the University of California Berkeley, then was hired as an Assistant Professor in the Department of Chemistry and Biochemistry at the University of Notre Dame. He was promoted to Associate Professor in 2005, and full professor in 2015. Around 2008, he began to write and advocate for open science, and created an ‘open science’ tag on his blog.

46 In 2013, he was invited to present a poster on ONS at the White House as part of its Open Science Poster session.
47 For example, Garrett Lisi http://www.deferentialgeometry.org/
48 http://www.carlboettiger.info/index.html
49 http://onschallenge.wikispaces.com/list+of+experiments
50 http://openwetware.org
51 https://blogs.ch.cam.ac.uk/pmr/2014/05/19/jean-claude-bradley-hero-of-open-notebook-science-it-must-become-the-central-way-of-doing-science/

Gezelter first created Jmol52 as an open source molecular visualization tool. He began Jmol as an unfunded project in 1998; since 2002 it has been supported by five lead developers. It subsequently became the standard for showing protein structures at the RCSB Protein Data Bank and is widely used in chemical instruction, databanks, and a number of chemistry journals (Gezelter, blog post, 7 Nov 2012). A text has been published to guide users on how to use the tool to study and present molecular structures (Herráez, 2008). The Jmol website lists over 120 websites that use Jmol,53 as well as references to scientific articles whose authors have made use of the program in their analyses and teaching (for example, Glasser, Herráez, & Hanson, 2009; Godbeer, Al-Khalili, & Stevenson, 2015). Writing in the Journal of Applied Crystallography, Hanson titled his paper “Jmol – a paradigm shift in crystallographic visualization” (2010, p. 1250) and described its “dedicated community of users and developers”. In 2004, Gezelter established OpenMD,54 an open source molecular dynamics engine that simulates liquids, proteins, nanoparticles, interfaces, and other complex systems, which has also been adopted for use by the scientific community (for example, Elkhedim, Benard, Bronz, Gavrilovic, & Bonnin, 2016; Thamali, et al., 2015).

Gezelter is an advocate for open science, setting up and directing the Open Science Project (openscience.org) in 1999 in order to catalyze open source development in science. Others have noted that Gezelter is “giving away source code for our simulations and spreading information about our lab protocols so that all sorts of scientists can use this information” (Funnell, 2010). He is sought by the media to comment on other open science initiatives (for example, Owens, 2016).

52 http://jmol.sourceforge.net
53 http://wiki.jmol.org/index.php/Websites_Using_Jmol
54 http://openmd.org


Dr. Eric Kansa – Open Context

Dr. Kansa is the founder of Open Context,55 an online open data and source publishing system for archaeology and related disciplines that is considered a ‘pioneer’ in open archaeology (Wilson & Edwards, 2015, p. 2). Open Context provides a web-based tool for researchers and collections managers to upload, mark up, and publish archaeological and museum collection datasets that can be browsed, searched, and analyzed. Open Context reviews, edits, and publishes archaeological research data and archives data with university-backed repositories, including the California Digital Library.

Kansa received his PhD from Harvard in 2001. He was appointed as a lecturer and undergraduate tutor at Harvard from 2001-03, and then as an adjunct professor at the University of California Berkeley School of Information from 2007-2010. His main areas of research are web architecture, service design, and how these issues relate to the social and professional context of the digital humanities and social sciences, as well as policy issues relating to intellectual property, including text-mining and cultural property concerns. He is a principal investigator and co-investigator on projects funded by the National Science Foundation, the William and Flora Hewlett Foundation, the US National Endowment for the Humanities, the Institute for Museum and Library Services, Hewlett-Packard, the Sunlight Foundation, Google, the Alfred P. Sloan Foundation, and the Encyclopedia of Life. He is a member of the Board of the Shelby White and Leon Levy Program for Archaeological Publications, a granting program that funds archaeological publications.

While completing his PhD, he established the Alexandria Archive Institute in 2001 as a non-profit organization to help preserve and share archaeological data. Leaving academia following completion of his PhD, he founded Open Context in 2006-07. He notes that Open Context was built on the idea of the Archaeological Data Service (ADS) in the UK, a trailblazer in building a repository of archaeological data and media. Nothing comparable had been available at an institutional level in the US (Wilson & Edwards, 2015).

55 http://opencontext.org


As measures of its legitimation, almost 200 users are formally registered with Open Context, and as of January 2015 it contained 54 projects comprising records derived from one or more associated datasets (Sheehan, 2015). In 2011, the National Endowment for the Humanities (NEH) listed Open Context as an example of a venue for data storage to fulfill its new data management plan requirements. Two archaeological journals, the Journal of Open Archaeological Data and Internet Archaeology, recommend Open Context as a repository for their peer-reviewed data (Kratz & Strasser, 2014; Sheehan, 2015). In reviewing electronic research tools for archaeologists, Ross, et al. (2013) describe Open Context as one of the “high-quality online archives and publication services for archaeological data” (2013, p. 107), while Sheehan (2015) notes that Open Context has emerged as a key player “in the development of technology and Web platforms for preservation and public online access to archaeological research data” (2015, p. 173). Sheehan also noted that Open Context had become increasingly visible in the US archaeological community, “through presentations and marketing at conferences, recognition by professional associations, and articles published in archaeological journals and newsletters” (p. 176). Open Context has also partnered with the National Association of State Archaeologists in an NSF-funded initiative to create a Digital Index of North American Archaeology (Sheehan, 2015).

Kansa is an active and vocal advocate of better ethics and practices in sharing and preserving knowledge of the past. He is acknowledged as having helped to change the norms in his field, transforming its practices to become more open and increasing access to archaeological data (Huggett, 2014, p. 2). In 2013, Kansa was announced as one of thirteen Champions of Change for Open Science:

Eric Kansa is an archaeologist and a computer geek, with a passion for making our knowledge of the human experience and our shared cultural heritage, available for everyone to explore and debate. Frustrated with the pervasive lack of access to quality research data in the humanities and social sciences, Eric spearheaded the development of Open Context (http://opencontext.org), an open access publishing venue for data in archaeology and related fields.

(White House website, http://www.whitehouse.gov/champions/open-science/eric-kansa,-ph.d)


Dr. Peter Murray-Rust – World Wide Molecular Matrix (WWMM); chemical markup language

Dr. Peter Murray-Rust has helped to create a number of open innovations. After receiving his PhD from Oxford University, he became a lecturer in chemistry at the University of Stirling. In 1982 he moved to Glaxo Group Research at Greenford to lead Molecular Graphics, Computational Chemistry, working in computational chemistry and structural bioinformatics. From 1996-2000 he was Professor of Pharmacy at the University of Nottingham, setting up a Virtual School of Molecular Sciences. He is a Reader in Molecular Informatics at the University of Cambridge, studying informatics in molecular science. He is associated with the Centre for Molecular Informatics, which is part of the Chemistry Department at Cambridge.56 His research is in molecular informatics: bringing tools from computer science to chemistry, biosciences and earth sciences, and information management. He is formally retired but has a group in Cambridge developing software for extracting chemistry from publications.

In 2001, along with colleagues in the chemistry department at the University of Cambridge, he created the World Wide Molecular Matrix (WWMM), an online, free, peer-to-peer system and electronic repository for unpublished chemical data. It includes data on over 250,000 molecules (Murray-Rust, Adams, Downing, Townsend, & Zhang, 2011) and has been cited by researchers as a valuable open repository (for example, Costa, Qin & Wang, 2014; Frey & Bird, 2011; Glen & Aldridge, 2002; Karthikeyan, Krishnan, Pandey & Bender, 2006). In 2005, Murray-Rust founded the Blue Obelisk, an informal group of chemists who promote open data, open source, and open standards (O’Boyle, et al., 2011). In 2010, he co-authored the Panton Principles for Open Data in Science57 with Cameron Neylon, Rufus Pollock, and John Wilbanks. His research group develops peer-to-peer systems for publishing molecular information at source so it becomes freely available. In 2014, he launched the Content Mine,58 an automated machine content-mining effort that aims to “liberate 100,000,000 facts from the scientific literature” (Price, 2015).59

56 http://www.ch.cam.ac.uk/about
57 http://pantonprinciples.org/
58 http://contentmine.org

Murray-Rust is a very active promoter of openness in science, particularly open data. He is a member of the Open Knowledge Foundation Advisory Board that aims to promote open knowledge “which anyone is free to use, re-use and redistribute without legal, social or technological restriction” (Murray-Rust, 2008). In 2011 a symposium around his career and visions was organized, called Visions of a Semantic Molecular Future.60 In 2014, the UK Shuttleworth Foundation awarded him a Fellowship to develop automated mining of science from existing published literature. He has openly published his grant proposals (Martone, et al., 2016).

Murray-Rust was an acknowledged leader in the advocacy that resulted in the UK government introducing text and data mining copyright exceptions for non-commercial research purposes in 2014 (Mounce, 2014; Poynder, 2008b, 2012).

Dr. Fernando Pérez – IPython

Dr. Fernando Pérez is the founder of IPython (for Interactive Python),61 an open source program for interactive computing, data analysis and visualization, essentially the computing equivalent of an online lab notebook. He received his PhD from the University of Colorado in 2002. He is a research scientist at the Henry H. Wheeler Jr. Brain Imaging Centre at UC Berkeley who works at the interface between high-level scientific computing tools and the mathematical questions that arise in the analysis of neuroimaging data. He has a strong interest in building tools for scientific computing. He is the recipient of grants from the NSF and the Sloan Foundation. He is a senior fellow of the Berkeley Institute for Data Science (BIDS), which launched in 2013.

59 http://blogs.ch.cam.ac.uk/pmr/2014/06/13/we-launch-the-content-mine-in-vienna-interviews-talks-and-our-first-public-workshop/; Youtube post Nov 2015 UKSGLIVE
60 http://www.jcheminf.com/series/semantic_mol_future
61 http://ipython.org


Pérez began IPython in 2001 while a graduate student. For his research on quantum field theory, Pérez brought together his various computer codes and data-analysis tools and, using the Python programming language, he created IPython, an open source, integrated platform to type code, run his analyses, plot and visualize data and include graphics within a single system. Released in 2011, the IPython Notebook, the online notebook version, provides an internet-based computational environment for code execution, text, math, plots and media. Pérez has stated that “what we are trying to contribute is having a tool that provides a very fluid experience, so when scientists are working with their data and they are trying to understand a problem, they are as close to the data and the results and experience as possible, with as few barriers between the code they're trying to type and the [obtaining] of results. That then allows them to communicate whatever insights they obtain with others” (Krill, 2014).

Pérez is an active advocate of open source and open scientific tools. In 2012, he co-founded the NumFocus Foundation. In 2012 he was awarded the Award for the Advancement of Free Software (Ravven, 2013). Researchers have cited their use of IPython (for example, Stevens, Elver & Bender, 2013) and published papers directly from running data on IPython (Howe & Chair, 2015; Shen, 2014). IPython has been very successful, with several universities offering courses on and/or adopting IPython (University of California Berkeley, Harvard University, the Massachusetts Institute of Technology, and Columbia University), and educational texts have been published for its use (for example, Martins, 2014; Rossant, 2014). In 2013, a survey of its users conducted by IPython revealed that it is primarily used in the US, with 455 users reported from 48 countries.62 The book and online project, Mining the Social Web (Russell, 2013), profiles almost 130 examples with the IPython Notebook. Microsoft has provided funds to sponsor its development. In 2012, IPython was awarded a $1.15 million grant from the Alfred P. Sloan Foundation towards the funding of the core team.

62 https://ipython.org/usersurvey2013.html


4.4 Limitations of the Study

A limiting factor in this research project was the biased nature of the sample of participants selected for this study. The participants included scientists working only in the US and the UK and only in certain research fields, who may not hold views similar to those of other open scientists (Creswell, 2013). By selecting participants from a variety of organizations, different disciplines, and different open science initiatives, the aim was to mitigate potential biases in the sample of participants. However, five open scientists selected to participate in the study did not respond to the invitations.

The information collected through the interviews reflected the participants’ own personal perceptions. To minimize this threat, different methods of data collection (triangulation) were used to cross-check interpretations, including checking articles, other interviews, and articles by others on the open scientists and their initiatives. Validation of the data was also sought from participants regarding the accuracy of responses, emerging themes, and categories (both during the interview and afterwards) (Creswell, 2013; Denzin & Lincoln, 1994). I also read technical literature to understand more clearly the specific subject areas of the informants (Yin, 2009).

A primary theoretical threat to validity is the inaccurate representation of what the participants said. Recording and accurately transcribing the interviews verbatim, as well as taking detailed and chronological field notes during the interview process, were used to mitigate this threat. Threats to interpretation validity may occur as a result of my own biases and beliefs in how I approached the participants during the interviews and in the data analysis.

Various measures to ensure the validity and reliability of qualitative research studies are described in the literature (Yin, 2009). Validity refers to the degree to which a study measures what it is supposed to measure. Reliability refers to the ability to demonstrate that the operations of a study, such as the data collection procedures, can be repeated with the same results.

The methods of data collection reflect procedures to ensure the validity and reliability of the study. In terms of validity, multiple sources of evidence (cases) were used to ensure triangulation. Transcripts were presented to the participants for checking. In terms of reliability, I provide “thick” and rich descriptions of the interviews and document analysis (Yin, 2009).


The data collected from the interviews are unique in their scope, providing first-hand perspectives on the institutional work of these open scientists. The findings, linked to the research questions and to the theoretical framework, are presented in Chapter 5.

Chapter 5 Findings: Multilevel Institutional Work

Chapter 5 presents the institutional work of the open data entrepreneurs, examining what types of institutional work they engaged in at the four institutional levels. The chapter addresses the research questions of the study: 1) What is the institutional work conducted by open data entrepreneurs in order to institutionalize an open data innovation? 2) What are the institutional levels at which the open data entrepreneurs conduct work? 3) What is the institutional work of the open data entrepreneurs at different institutional levels? The institutional work and themes are presented as related to the institutional levels. From the analysis, the distinct outcomes of the informants’ institutional work at each level are highlighted.

5.1 Individual Level: Opportunity Recognition

The informants were asked to describe the important steps in the establishment of their innovation, including the stage at which the idea came to them. As a starting point, they were also asked to elaborate on the existing norms for data dissemination in their research discipline. This level was labeled the individual level, aligning with the multilevel institutional research of Tracey, et al. (2011), which identified distinct institutional work at the individual level. The data analysis of the interviews and archival data revealed themes that I then grouped into two institutional work types. All five open science entrepreneurs performed these two distinct types of institutional work at the level of the individual: 1) theorization – specification and justification of the need for a change, and 2) counterfactual thinking. This work resulted in opportunity recognition for their open data innovation.

At the time of their individual-level work, Pérez was completing his doctoral dissertation at the University of Colorado, Gezelter was conducting a postdoctoral fellowship at Columbia University, Kansa was engaged in a lectureship at Harvard University, and Murray-Rust and Bradley were faculty members at Cambridge University and Drexel University respectively.


The institutional work at the individual level is summarized in Table 5 and discussed below.

Table 5: Opportunity recognition: Individual level institutional work

Institutional Work

Theorization: Specifications and justifications for the need for change
Traditional publication of research data morally inadequate for science and society as process lacks transparency and verifiability.
Traditional methods are impeding collaboration, data evaluation, and efficient progress of science in sharing results that may be useful.
No formalized structure for sharing data in place that is easy to use.
Diagnosis of problem; how it has come about.
Open data is key for solving of scholarly questions; securing the reproducibility, transparency, and verifiability of findings.
New information and communication technologies can and should be harnessed.

Counterfactual thinking to identify alternative solutions
Consideration of openness in other areas (open source software).

Institutional work: Theorization – Specifying and justifying the need for change

For the informants, the motivation to create a new open data innovation was rooted in their experiences as scientists and arose through their ongoing scholarly work as they sought solutions to their own needs. They engaged in theorization to specify the need for change by identifying the problem of traditional science. The term theorization, or framing, refers to the practice of generating and developing an account of the faults of a current practice and championing alternative practices as more effective/efficient, necessary and/or culturally appropriate (Greenwood, et al., 2002; Hardy & Maguire, 2008; Strang & Meyer, 1993; Suddaby & Greenwood, 2005). During the data analysis of the interviews and archival documents, four main specifications emerged and were coded as theorization: 1) The informants expressed concern that the traditional publication of research data was not serving science and society morally, in terms of transparency, verifiability, and benefit to society; 2) They identified pragmatic issues with traditional methods that impeded collaboration, data evaluation and progress in science; 3) They pointed out that there was no formalized structure for sharing data that was easy to use; and 4) They diagnosed the problem and considered how the closed nature of data dissemination had come about. They justified open data as key to solving scholarly questions and to securing the reproducibility, transparency, and verifiability of findings, and argued that information and communication technologies can and should be harnessed.

On the one hand, the open scientists specified the problem of traditional data dissemination as an altruistic and moral one. They stated that the conduct of science should be selfless in that the results of the scientific work ultimately belonged to and should benefit society, not the individual researcher, academic organization or private sector firm. This aligns with research on data sharing by scientists in which scholarly altruism is found to be a predictor of data sharing (Kim and Stanton, 2015). The informants were clear in highlighting the moral dimensions of the secret or closed nature of traditional scientific data dissemination:

In thinking about what has meaning for me as a scientist, I realized that the work I was doing wasn’t having the kind of impact that I would like it to have, and it was not benefitting mankind in the way I would have hoped. I concluded that this was partly a consequence of secrecy. (Bradley in Poynder, 2010)

I felt strongly that data of this sort should by right belong to the community and not to the publisher. (Murray-Rust, 2008)

The informants also considered that all the data and source generated through scientific research were of pragmatic value to the scientific community. They considered that the traditional dissemination of scientific research was counterproductive to the efficiency and progress of science itself, as only the final, generally positive, findings were reported. They specified the problem of traditional science in terms of the effectiveness of current norms, which impeded transparency and collaboration, data evaluation and reproducibility, and the progress of scientific work. Several of the informants discussed this point quite explicitly:

Donoho made the argument at the time, and I am going to paraphrase here a quote that is really easy to find, that the scholarly contribution of a scientific computational publication is not the article itself; it is the code and the data that goes into the idea. The paper is merely an advertisement for the scholarly work. When publications are made, it is the code and the data for the readers to be able to replicate the work… And that’s sort of the context where I started. (Pérez, Interview 2014)

That really most of what I did [previous nanotechnology] was not benefitting the community because it wasn’t shared with anybody. (Bradley, Interview 2014)

This specification of the problem with the traditional dissemination of research findings arose through the informants’ ongoing scholarly work as they sought solutions to their own needs. The informants were frustrated that there was no formalized and accessible way for data and source to be shared. Kansa noted that there was a formal structure for archaeological data dissemination in place in the U.K., but not in the U.S. Both Gezelter and Kansa noted that similar programming tools existed (Xmol and Mathematica/Matlab respectively) but were clear that these were undesirable or unavailable given their closed nature:

Very little in the way of sharing structured [archaeological] digital data before that point. For the most part, the main trailblazer was the Archaeological Data Service which is in the UK. That was more or less it. There may have been a couple of departmental web sites, small individual projects that had downloadable spreadsheets and that sort of thing. But not much in the way of anything that was institutionalized, except for the ADS. (Kansa, Interview 2014)

An important aspect of specification is elaboration, or the diagnosis of the problem in terms of how it has come about (Greenwood, Suddaby, & Hinings, 2002). The informants recognized that there were issues with alternative methods of publishing research data in that they involved greater work on the part of the researcher:

There had been a long history of sharing code in the chemistry community that had not been formally open source. There was something dating back to the 70s called the Quantum Chemistry Program exchange which had been just a bunch of computational chemists sharing code. But there were usually costs associated with it. The source code always came with it. If you bought it, you got a tape with the source code on it. So, chemistry has always expected some of the source sharing. But in general it wasn’t a formalized open source the way things are in computer science. Chemistry is much less worried about actual sort of code sharing. Usually it has been very informal that if you ask for something you usually get it or get some sort of academic license. There was no formalized structure in place. (Gezelter, Interview 2014)

The informants linked specifications to justifications in that they explored how open data could present a novel solution to the problem of sharing their research findings or tools. Two justifications emerged: 1) open data is key to solving scholarly questions and to securing the reproducibility, transparency, and verifiability of findings, and 2) new information and communication technologies can and should be harnessed. They linked their justifications directly to their specifications:

I would like to be able to create self-sustaining and fully transparent open science research systems that produce results unambiguously useful to humans. These systems should eventually be mostly, if not completely, automated. By transparent, I mean that not only the results, but also the reasoning behind research projects be made publicly available in real time. (Bradley, 2006 online article)

The Internet has clearly revolutionized the creation and dissemination of information and the tools to support these activities. … Similarly in programming the Internet has greatly increased the quality and power of systems available to scientists. (Murray-Rust, 2005)

We interpreted the spirit of the age to be the dawn of a data and knowledge-rich infosphere which would be self-evidently valuable to science and where every discipline would be actively publishing their data on the web. (Murray-Rust, et al., 2011)


Institutional work: Counterfactual thinking to develop alternative solutions

Counterfactual thinking is defined as a set of cognitive processes that allows for the envisioning of unexpected or unusual approaches (Roese & Olson, 2014). I found that the informants were able to creatively envision a new way of presenting their research data that was contrary to that of traditional science. As noted above, the scientists’ counterfactual thinking was motivated by their concern with the theorized faults of traditional science dissemination and by their wish for sharing to benefit both society and the scientific community. The informants considered that in order to address the issue of lack of transparency and reproducibility, they would need to create an open data innovation that allowed for the timely and open release of their research findings or the source code for the tools that they used in their labs.

The informants wanted to try something different that built on concepts of openness they had come across in other fields of scientific practice. Pérez was influenced by David Donoho, a statistician who wrote about the merits of computational reproducibility (Donoho, 2010). Two of the scientists noted that they had been influenced by the philosophy of open science paradigms – both Gezelter and Pérez stated they had read Eric Raymond’s The Cathedral and the Bazaar (1999), an important text on the development of the Linux open source system:

When I started writing Jmol I had been reading some of the formative open source documents. Eric Raymond and The Cathedral and the Bazaar and I said let’s actually formalize this as open source. Let’s actually put a real license on it [Jmol] and make sure that people can keep using it. (Gezelter, Interview, 2014)

All the informants recognized the possibility of taking advantage of information and communication technological advances to achieve open science. They spoke to the development of the internet and data as providing an opportunity for technology to solve their problem. As noted in the literature, institutional change can be accomplished during times of larger upheaval or change (Lawrence & Suddaby, 2006) and the exploding availability of internet and computing capabilities was cited by the scientists as a game changer in terms of enabling how science should be conducted in their fields:

The open source movement was already a fairly recognized force in the computing space at the time. This was in late 2001, right after the dot-com crash, which obviously kind of affected the commercial sphere which had established really in the kind of broader, technically-savvy community an understanding of the value of open source for other things. This was the age of Linux, all that type of LAMP Stack (Linux, Apache, MySQL and Perl/PHP/Python), Apache and Perl as a way of creating websites. So, I think people were very aware of the value, the impact, the importance of open source and tools for other reasons, from a philosophical perspective, but there was not a lot of presence for those ideas in academic settings. Sure, scientists would use Linux or they would use Emacs but the ethos of open science, of building the process of research itself from open source tools was not really established in science. (Pérez, Interview)

Technology also enables openness in research, sharing research as it happens, almost in real time. (Bradley, 2013)

The social web has evolved into a more semantically aware, machine-readable web in which an ecology of services and tools has flowered. Chemical information can now be communicated to both human and automaton at every point on the spectrum of openness. (Bradley, 2013)

Individual level institutional work outcome - Opportunity recognition

In summary, at the individual level, the informants engaged in theorization that includes both specification of the faults of a current system and justification for a proposed change (Greenwood, et al., 2002; Hardy & Maguire, 2008; Strang & Meyer, 1993; Suddaby & Greenwood, 2005). The informants drew attention to the importance of the dissemination of scientific data and methods in a particular way and specified how the traditional avenues of publication were not able to meet current needs: to make their research data available, to have it shared and used by others, and to have it easily verifiable and reproducible. The informants drew on Mertonian principles of the openness and transparency of science as well as identifying their personal values related to transparency and efficiency in the conduct of science. Concurrently, the informants engaged in counterfactual thinking to develop an alternative, open data innovation that drew on open logics they had come across in other fields of scientific practice and software development.

The main outcome of the individual level institutional work, including theorization and counterfactual thinking, was labeled as opportunity recognition in that it supported the informants in recognizing the opportunity for them to implement a novel practice. Counterfactual thinking is a component of opportunity recognition, in that as institutional entrepreneurs, they “need a particular kind of insight to think beyond the current institutional arrangements and realize that there is an opportunity” (Tracey et al., 2011). This opportunity recognition occurred at different points in the informants’ careers: for Pérez while completing his doctoral dissertation, for Gezelter while engaged in a postdoctoral fellowship, for Kansa while he was engaged in a lectureship, and for Murray-Rust and Bradley as faculty members.

5.2 Organization Micro-level: Innovation Design and Establishment

While analyzing the data, I labeled the level of the organization as the micro-level. It included institutional work within the informants’ academic organizations (teaching classrooms with students, research labs with students and research colleagues, and academic departments with colleagues) as well as within organizations created by the informants outside of their academic environment to support open innovations. The labeling of the organization as the micro-level aligns with the group- and organization-level scope of the institutional micro-level as described by Lawrence & Suddaby (2006).

At this level, the informants conducted institutional work to: 1) disassociate with traditional modes, 2) name and create new symbols, 3) create standards of practice, 4) theorize, and 5) mobilize resources in order to establish and develop their innovation. The institutional work at the micro-level is summarized in Table 6 and discussed below.

Table 6: Developing and establishing the innovation: Micro-level organization institutional work

Institutional Work

Disassociation with traditional modes
Cutting ties with collaborators and/or existing tools
Establishment of a new organizational form

Naming and creation of new symbols
Encapsulating theorization themes related to openness

Creation of standards of practice: training and education
Mandated for use by students in teaching classrooms
Developed with graduate students in research labs

Theorization: Specifications and justifications for the need for change
Moral value: Transparency and reproducibility of scientific dissemination in serving the scientific community and the public
Pragmatic value: ease of use, efficiency, and ability to time track the finding of a scientific result
Way of the future conduct of science

Mobilization of resources
Financial resources financed through traditional research grants and/or through set up of separate organizations
Lack of human resources in terms of colleagues within department


Institutional work: Disassociation with traditional modes

At the time of the development of their open science innovations, the informants were conducting research within academic organizations, either as professors or postdoctoral research associates (Bradley, Gezelter, Murray-Rust), as a graduate student (Pérez), or as a lecturer (Kansa). In seeking to establish their open data innovation, in different ways, three informants conducted institutional work to disassociate themselves from their traditional science practices and deliberately create new ones to align with open data. Two approaches emerged: 1) cutting ties with collaborators in traditional science (Bradley), or 2) the establishment of a parallel organizational form (Kansa, Pérez). In disassociating with traditional modes, the informants broke with existing institutional assumptions in order to facilitate new ways of acting (Lawrence & Suddaby, 2006).

Bradley broke off his connections with collaborators and completely changed the field of studies of the work conducted in his research lab. In his early academic career, his focus was in the field of nanotechnology, working in the traditional science mode and acquiring patents (Bradley, 1997; Korneva, et al., 2005; Rossi, et al., 2004). Based on his work at the individual level, he considered that he could not work in an open manner within nanotechnology. He thus changed his disciplinary studies to chemistry so that he would be able to work in a more open manner, finding an area of research that would most benefit from an open data model. He was explicit that turning to open data was deliberate, resulting from his assessment of his specification and counterfactual thinking, and necessitated the change of his scholarly focus:

However, I couldn’t be open with the project I was then working on, because I was collaborating with someone who didn’t feel the same way as me. My decision to do open science meant cutting ties with my previous collaborators. Having done that in 2005, I started the project UsefulChem. (Bradley in Poynder, 2010)

Initially, the project consisted of a single blog, UsefulChem.blogspot.com, with the objective of carrying out chemistry research in areas that could benefit most from an Open Science model. (Bradley, Blog April 24, 2007)

Although not changing their field of studies, Murray-Rust and Gezelter disassociated themselves from traditional practices within their organizations. Murray-Rust, et al. noted that “almost all publishers of chemistry are closed access and have determinedly remained so” (2011, p. 1). Along with colleagues, Murray-Rust developed the WWMM and disassociated himself from traditional methods of data publication within his academic department in which “they still take their spectra and print them out on paper and stick them in their books; and no electronic data are around at all even though they were created on machines” (Interview, 2014). Gezelter refused to make use of a proprietary tool in his research lab and decided to create his own, releasing the source code publicly and continuing its development in an open source model:

Xmol only ran on SGI workstations and we were decommissioning all our SGI workstations. We had a bunch of new Linux machines so I wrote to the Minnesota Supercomputer Centre and I said, “We use this tool. It’s only available for SGIs. Do you have a Linux version of it?” And they said, “No.” I said, “OK, well, can I borrow the source code, I’ll make you Linux version of it and give it back to you, and you can do what you want with it.” And they said, “No.” At that stage, we said, well ok, this program is not that complicated; we’ll write our own. (Gezelter, Interview 2014)

Kansa and Pérez were explicit that they were discouraged from pursuing open data as an integral part of their research while at their academic organizations, as open data was not recognized as a valid scholarly pursuit for a tenure-track position and pursuing it would negatively impact their academic careers:

A lot of this was seen as work that you should do on the side and that is unimportant. I mean, it may be valuable, but it is valuable in the sense that having pipes that don’t break is valuable in your house but you think that once the pipes are there you can forget about them and they are not important; or they are not interesting. So, it really was seen a plumbing and janitorial work rather than a first class scientific contribution. And that manifests itself in the difficulty of getting grants, as getting it as part of your academic CV; of getting hired; of getting tenure, etc. etc. … I was told not to do it. I was told by folks senior not to do this sort of thing because it was, as somebody said “a waste of your talents and a waste of your time”. (Pérez, Interview 2014)

Both Kansa and Pérez pursued non-tenure track academic positions and established their open data innovations outside of their academic organization.

Kansa established the Alexandria Archive Institute (AAI) in 2001 while lecturing at Harvard. At that time, he was affiliated with the University of California Berkeley as an Adjunct Associate Professor at its School of Information. Kansa stated that the AAI was begun in order to preserve and openly share archaeological data, and that he had found that existing computing infrastructure was not going to meet these needs (Ellson, 2013). The AAI was established as a non-profit entity, with Kansa as Technology Director & Open Context Program Director, and his spouse, Dr. Sarah Whitcher Kansa, as Executive Director. The Institute staff include usability and web designers and research fellows who work on specific research projects. With the aid of seed funding, he began to develop Open Context as a ‘side project’ for three years while also continuing his teaching and administrative obligations (Interview, 2014). In 2006, he launched the first public database in Open Context, having launched the AAI website in order to make it publicly accessible.63 He disassociated himself from the traditional mode of pursuing a tenure-track position, noting that he launched Open Context and established AAI to pursue open archaeological data where he is in an “institutional context where I don’t have the constraints and pressures of a tenure review committee” because he wanted to get into archaeological data sharing but “didn’t see a way to do it within the university context” (Interview, 2014).

Pérez began working on the open source IPython in 2001 while completing his graduate studies at the University of Colorado. He continued his graduate studies and followed with postdoctoral studies, working in parallel on the development and release of the first public version of IPython in 2006.64 He built an organizational form that followed an open source model, conducting institutional work in seeking out collaborators from other organizations.65 He was active in theorizing the benefits of IPython as an open source innovation, and was able to garner human resources to support its development. In 2011, he co-founded NumFocus (Numerical Foundation for Open Code and Usable Science), an organization with a mission to “promote sustainable high-level programming languages, open code development, and reproducible scientific research”.66 By 2011, Pérez noted, there was a realization that a more formal structure was needed for the open source development of IPython and other open source tools:

So the birth of NumFocus came from the recognition of a number of us in the community. Those of us who have been doing this since the beginning and struggling with this, that as these tools grow and mature, we really need infrastructure that goes beyond a few crazy, like-minded people who get together at conferences and were willing to do this kind of, because it was crazy. We wanted a more formal infrastructure. (Interview, 2015)

Institutional work: Naming and creation of new symbols

Institutional entrepreneurs can create names and symbols that assist in sharing new ideas and enable a collective sense of identity with an initiative (Zilber, 2007; Thompson, et al., 2015). The informants indicated that the naming of their innovation related to how it would be received and located at the micro-level as well as announcing or formalizing it at the meso-level. For the informants, naming served as a convenient way to identify the innovation, drawing attention to it and distinguishing it, supporting the establishment of the innovation at the micro-level, as well as with legitimation at the level of the scientific discipline.

63 https://alexandriaarchive.org/about/history/
64 http://blog.fperez.org/2012/01/ipython-notebook-historical.html
65 http://blog.fperez.org/2012/01/ipython-notebook-historical.html
66 http://www.numfocus.org/about.html

Bradley coined the name UsefulChem for his blog and website in order to reflect his decision to work on science that was useful, including working on anti-malarial compounds (Poynder, 2010). He also coined the term open notebook science in 2006,67 less than a year after launching the UsefulChem project (Bradley, 2008; Bradley, et al., 2011; Poynder 2010). He named and defined ‘Open Notebook Science’ as a term to encompass a way of doing open science that went beyond his own UsefulChem initiative. He first used it in a blog post and at a talk at Drexel University at the micro-level, and then formally created an entry on Wikipedia, as well as alerting the scientific community through scientific journals such as Nature Precedings. Although the name was created at the micro-level, he had conducted research at the meso-level to see if it had already been used, as he wanted to ensure the name reflected a distinct approach:

I started using the term ["Open Notebook Science"] a year ago to describe our UsefulChem project because it had no hits on Google and so it offered an opportunity to start with a fresh definition. There are currently over 43 000 hits for that term and it is nice to see that the first hit is still the post with the original definition. (Bradley blog, October 4, 2007)

Jean-Claude coined the term Open Notebook Science to distinguish this approach from other more restricted forms of Open Science… (Bradley, Nature Precedings, 2008)

To clear up confusion, I will use the term Open Notebook Science, which has not yet suffered meme mutation. By this I mean that there is a URL to a laboratory notebook (like this) that is freely available and indexed on common search engines. It does not necessarily have to look like a paper notebook but it is essential that all of the information available to the researchers to make their conclusions is equally available to the rest of the world. Basically, no insider information. (Bradley, Blog September 26, 2006 http://drexel-coas-elearning.blogspot.ca/2006/09/open-notebook-science.html)

67 http://drexel-coas-elearning.blogspot.com/2006/09/open-notebook-science.html

Murray-Rust and colleagues at Cambridge University named the World-Wide Molecular Matrix (WWMM). It was announced in a scientific paper and at an eScience conference (Murray-Rust, et al., 2011). He also defined the term ‘Open Data’ in 2008, and, similar to Bradley, looked beyond the micro-level for its use:

So about 2007, I think, I went out to Wikipedia to look for open data – no, I went to Google to look for open data. I couldn’t find any mention of the phrase. You might find that strange now but there was no mention until I did a little, I got together with some people and wrote a page in Wikipedia about it – what is open data. (Murray-Rust, Interview 2014)

An entry in Wikipedia on Open Data. Wikipedia requires a neutral point of view (NPOV), and I did my best to review the usage of “Open Data” as accurately as possible. I came to realize that it was used outside scholarly publishing and listed the main areas (see below). There have been several valuable contributions, but the structure of the entry is largely unchanged. (Murray-Rust, 2008)

Three informants incorporated the term ‘open’ in the names of their innovations (Open Context, Open Notebook Science, OpenMD) to reflect the theorization narrative of openness. Thompson, et al. (2015) found that sustainability entrepreneurs created new symbols and slogans that encapsulated their theorization narratives. Slogans such as the AAI’s “Opening the past, inspiring the future” captured Kansa’s theorization narrative for the availability of archaeological data. Similarly, the ‘tagline’ for OpenMD is “Molecular Dynamics in the Open”, a double reinforcement of openness.68 Two informants named initiatives to align with existing named entities. Pérez noted that he liked using the Python programming language but wanted to create a more interactive and open system. He gave it the distinctive name IPython, for ‘interactive’ Python.

Gezelter named Jmol and OpenMD and developed logos and websites for them from within his academic organization. Similarly, Pérez started using an existing programming language, Python, in 2001. Kansa developed Open Context and the website for the Alexandria Archive Institute. These informants also indicated that the naming related to how their open data innovation would be received both at the micro- and meso-level.

The informants and their colleagues also created symbols and logos to assist in sharing new ideas. The Bradley lab developed a logo for the identification of research conducted with Open Notebook Science, so that other researchers could include the logo on their own lab notebooks to “specify a researcher's intent in making their lab notebook available” and to link to the open notebook science claims page (Bradley Blog, March 12, 2009).

68 http://openmd.org/


Institutional work: Creation of standards of practice - training and education

Three informants (Bradley, Murray-Rust and Gezelter) developed and incorporated their open innovation within their teaching classrooms and research labs, performing institutional work to create new standards of practice through training and education to align with open science practices. Training and education practices allow for the educating of others in the skills and knowledge that are required to support a change (Lawrence, 1999; Nigar, 2013).

One informant, Bradley, made use of his open data innovation as an integral part of his undergraduate teaching and research. He mandated the use of open data methods by students in his undergraduate teaching classrooms, thereby establishing new standards of training and education. For example, within his organic chemistry undergraduate classes, Bradley assigned projects that were open and whose results were public.

In one of my classes they have to write an essay on a topic of their choosing in chemistry. I make it very clear at the beginning of class that this is a public wiki. (Bradley, Interview 2014)

Students will be required to record their experiments using an Open Notebook on a wiki and will receive and respond to comments from the online world. (Bradley, Nature Precedings, 2008)

To a lesser extent, Gezelter has incorporated OpenMD in his undergraduate and graduate courses. For example, the course description for the Computational Chemistry course notes that the computer lab sections of the course cover a range of topics and software packages including OpenMD.69 He also taught the Science 2.0 course, which deals with topics in modern science and includes topics and readings on open data, source and access.70

The informants also established new standards of training and education in their research labs by incorporating open innovations as part of the research process. Gezelter has made OpenMD, launched in 2004, his research group’s primary molecular dynamics code.71 The Murray-Rust Group at Cambridge made use of open data as well.72

69 http://gezelterlab.org/teaching/
70 http://gezelterlab.org/teaching/science-2-0/syllabus/
71 http://gezelterlab.org/software/

Bradley named his research lab the “UsefulChem” group, aligning directly with his open data innovation:

The UsefulChem group had 2 meetings this week. Everyone needs to subscribe to the UsefulChem blog, the UsefulChem -Molecules blog and the UsefulChem Wiki. (Bradley Blog, 11 January 2006)

I have set up a separate blog to put all experimental details for the UsefulChem project, at least for experiments done in my blog. (Bradley, Blog, 7 February 2006)

We have now been trying for 2 months to execute on the synthetic work we planned for the diketopiperazine malaria project. There are now 4 students working on this: 2 graduate students Alicia and Khalid, and 2 undergrads James and Brett. And we might get one more undergrad next week. Things have been progressing slowly since all of these students are new to this type of lab work (Bradley Blog, March 23, 2006)

The informants made use of their teaching classrooms and research labs to test and refine their open innovations:

So, that’s why very early on this project, this open notebook science project started in the summer of 2005. And although initially for a few months I did use a blog, I realized very quickly that for a lab notebook that didn’t work because it doesn’t, for example, keep track of corrections. You can’t tell when text or data has been changed. That is extremely important in my opinion in order to understand how science actually happens. If there are errors and they are corrected, I think it is very important to know when the error was made and when it was corrected. (Bradley, Interview 2014)

In developing UsefulChem, Bradley highlighted how open data could easily support the aspirational goal of complete openness in the research process:

If you have been wondering why there has been a drop in the activity on the UsefulChem Experiments Blog, remember that we have moved our group lab book to the wiki. The best way to find out who is doing what on an hourly basis is to click on the Recent Changes link on the left of the wiki…. That way it is possible to find out not only what happened but also HOW the student, supervisor and colleagues arrived at the results, presented arguments in the discussion and came to their conclusions. (Bradley, Blog August 29, 2006)

The three informants expressed that incorporating new standards of practice into the teaching classrooms was a challenge, but one that was manageable at the undergraduate level. At the graduate level, students were attracted by open data to work with the informants. The informants were also clear on the importance of training graduate students early on in open data:

72 http://www-pmr.ch.cam.ac.uk/wiki/Group_Members


What was the biggest challenge? I guess maybe convincing my students that is was a good thing to do. And to convince them to put their data in on the same day that they do it. If you keep a lab book sometimes you can wait a while before you put stuff in but if you're making this available ultimately you do have to input your data fairly frequently. We'd like it to be there the same day so people can comment. So that's probably been a challenge that I think has been overcome. (Bradley, interview at Drexel University to students of the Information School, Ritter-Guth, B. 2006)

Mainly because they (graduate students) came to work for a guy who cares about open source and wants all their contributions out in the public. (Gezelter, Interview 2014)

One of my first students, a guy named Chris Fallon, has just started an assistant professorship at Oklahoma State. He certainly got trained in how to release code and how to encourage people to use it from me. (Gezelter, Interview 2014)

In discussing undergraduate students as unique, Murray-Rust’s and Bradley’s statements include elements of evangelizing, a legitimation process in which actors co-create, spread, or support novel practices through a desire to express unique identities (Suchman, 1995; Jones & Massa, 2013):

I say, you should get in the first week of a graduate student coming in to the institution; get them before they are institutionalized. The way I would do it is I would have the 3rd year graduate students instruct the 1st year graduate students on how to manage data. Because the 3rd year students have been through it. They know the pain they’ve been through because they didn’t do it properly. And so they will be listened to by the 1st year students. … But I do also talk about undergraduates because the thing about undergraduates is that at that stage, they are not necessarily going on to do research at all. So they are not tied down by the worry about their h-factor or any of this crap. (Murray-Rust, Interview 2014)

Kansa and Pérez also developed training and educational opportunities for Open Context and IPython, respectively; however, these were launched at the level of the scientific community (see Section 5.3).

Institutional work: Theorization

Within their academic organizations, the informants theorized the need for change along similar themes as expressed at the individual level. In addition, they justified the appropriateness of their proposed solution, as well as specifying open data practices as being beneficial for learning and career development. The faculty members communicated with their students and research colleagues through courses, and their own and others’ blogs, papers, and talks. Although three informants were able to incorporate their innovations within their teaching classrooms and/or research labs, the informants reported overall failure in their attempts to gain the support of colleagues and institutionalize open data within their academic departments or units.

One informant highlighted the benefits of open data for his students both in the classroom and in their labs. Bradley theorized open notebook science not only as a positive learning and career-development tool, but also as a way to prepare students for the way science would be conducted in the future. In his course outline, in addition to the development of organic and analytical chemistry skills, he stated:

Students will be required to record their experiments using an Open Notebook on a wiki and will receive and respond to comments from the online world. Maintaining a public laboratory notebook can be a very efficient way to learn about the proper way to document an experiment because the adviser and other interested parties can provide immediate and ongoing feedback, which is impossible with a conventional closed paper notebook. They will also be encouraged to engage in conversation about their project on various social networks, including mailing lists (e.g. OrgList, UsefulChem), our collaborators' blogs and wikis, Facebook, Nature Networks, SciVee, Flickr, etc. Not only will interacting with peers and mentors be valuable as a learning experience, the contacts formed may be helpful for the progress of the student's career after graduation. … Coupled with their work doing conventional manual organic synthesis, these students will be competitively trained to enter the 21st century chemistry workforce (Bradley, Nature Precedings, 2008)

Bradley also reached out beyond his own disciplinary academic department to information technology and information units within his university, theorizing the appropriateness of his innovation there as well. He presented to his university’s Information School, for example:

And this actually looks very similar to what it would look like in a paper notebook. And that's on purpose. We wanted to make things easy as possible for people to get involved with Open Notebook Science. (Bradley, interview at Drexel University to students of the Information School, Ritter-Guth, B. 2006)

We are trying to get high quality information processing without having to become computer scientists to do it.... the idea is that this is replicatable. ... this is all made explicit, so you don't have to ask the researcher for permission. ... The idea of Open Notebook Science is basically to report the work that you do in the laboratory in real time or as close as you can to real time, so that the entire world knows as much as you do about your research. Like I said, there are a number of references here that you can take a look at the background of this. But, the motivation is that - well, it should be self-evident that it's a way to do faster science compared to either not disclosing some things or significantly delaying them. (Bradley, interview at Drexel University to students of the Information School, Ritter-Guth, B. 2006)

Two informants noted that their open data work was discounted within their academic units:

About a year later I started my own faculty position and so then it was up to me how I spent my time. It was, again, not something that has been formally recognized in any formal way by my university. At this stage, I’ve gone through one promotion and about to do another and I always get the feeling that the stuff that has code associated with it is “Well, that’s nice, but it doesn’t count.” It is not really counted as part of the academic reward system in the same way that a paper would be…. I think my colleagues in my own department are pretty oblivious … It’s not viewed locally…. I think it’s still, “Oh, that’s not really chemistry….” So even though this does not get a heck of a lot of traction in my own department, outside my department I think it is viewed pretty positively. (Gezelter, Interview 2014)

For Kansa, in establishing the AAI, the theorization is reflected in the stated vision and values of the organization that reflect the themes of the moral and pragmatic value of open data for science and society, as well as the development of legitimate resources and tools:

The Alexandria Archive Institute will lead the research community in developing accessible, reliable, comprehensive, and open access scholarly resources. We will help transform these data into knowledge, making them valuable and relevant for students and scholars worldwide.

• We support universal accessibility to scholarship.
• We promote the development of authoritative scholarly resources that are reliable, comprehensive, and open.

(https://alexandriaarchive.org/about/mission/)

Institutional work: Mobilization of resources

Resources are required to build a new initiative and can include financial, material or human resources (Battilana, et al., 2009); seeking these resources was critical for the informants to operate and sustain their innovations. At the organization micro-level, the informants sought financial and human resources through their academic organizations, but had limited success. Kansa and Pérez established organizations outside of their universities and sought resources from the scientific community (Section 5.3).

Although the informants theorized the low cost of some aspects of their open science initiatives, their research did require financial resources. They acknowledged that ensuring the financial sustainability of their initiatives took a significant amount of time and effort but that it was critical to developing and sustaining those initiatives:

Financing all of this is hard. I don’t know – you spend almost half your time just trying to raise funds—grant writing and everything like that. There is a lot to do. …. I would love have money to bring in a full-time developer and more than a full-time developer. It’s just the, you know, it is not financially feasible at all. That’s the biggest constraint. (Kansa, Interview 2014)

The informants sought funding through their existing research grants, incorporating low cost information and communication technology, as well as seeking non-traditional research funding. Bradley and Gezelter funded their initiatives through existing research grants (Interviews, 2014).


Gezelter noted that funding for Jmol did not exist separately and it never received any formal funding (Interview, 2014). For his second open source initiative, OpenMD, Gezelter received some seed funding from the Sloan Foundation but also “piggy-backed” funding for it onto ongoing research funding (Interview, 2014). Bradley also attempted to find support beyond that of traditional research agencies, enlisting the support of the Institutional Advancement Office at Drexel University (Bradley, Blog, April 24, 2007).

Murray-Rust may have been the most successful in amassing financial resources from within an academic organization: at the same time as he began to work on open data, the University of Cambridge Department of Chemistry established the Unilever Centre for Molecular Informatics73 in 2000 and the UK government established its ambitious eScience program in 2002 (Lasthiotakis, et al., 2015). The WWMM received some early funding from the fledgling UK eScience programme (Murray-Rust, et al., 2011) and funding from the JISC74 (Murray-Rust, Interview, 2014). Murray-Rust emphasized the importance of funding from the Unilever Centre, noting that it helped to draw in human resources for his open data projects:

I am extremely fortunate in Cambridge to be able to get funding from… You know, I work within the Unilever Centre. They gave core funding, and I was able to run a group which I think was probably without equal in scientific programming in academia. It is run by a wonderful postdoc called Jim Downing. Jim was an engineering and Jim had a vision of how software should be developed and that ethos has penetrated me and everybody else that has come in contact with our group. (Murray-Rust, Interview, 2014)

Kansa and Pérez noted that their existing academic environments were restrictive and did not allow their open data innovations (Open Context and IPython, respectively) to be sustained: the research culture did not value open data, and the research funding policies did not readily accommodate either the informants themselves or non-traditional funding. As a result of the lack of funding opportunities within their academic organizations, Kansa and Pérez developed and established their innovations by forming new organizations (Alexandria Archive Institute for Open Context) and new ways of working (open source for IPython and co-founding NumFocus).

73 Later renamed as the Centre for Molecular Informatics (Fuchs, Bender & Glen, 2015).
74 Formerly the Joint Information Systems Committee, JISC is a United Kingdom public body whose role is to support higher education and research by providing advice, digital resources and network and technology services, as well as supporting the development of new technologies.


As an Adjunct Associate Professor at UC Berkeley, Kansa was able to garner some initial resources for Open Context from the William and Flora Hewlett Foundation. This was important funding that allowed the Alexandria Archive Institute “to get off the ground” (Ellson, 2013), but he noted that the funding was soft money and that it ran out with the financial crisis (Interview, 2014). He established the Alexandria Archive Institute as a separate, tax-exempt organization in order to be able to seek out research funding:

So, part of that is also, universities have pretty strict rules about who can and who can’t apply for grants, and we wanted to have an independent organization through which we could apply for grants. It has been really valuable. (Kansa, Interview)

AAI has been successful in receiving support from government and public funding agencies and organizations such as the National Endowment for the Humanities (NEH), the National Science Foundation (NSF), the Sloan Foundation, and the Leon Levy Foundation. Kansa also set up the Alexandria Archive Institute so that it is able to accept donations and provide tax receipts to donors.75 The Institute uses these funds to support the further development of Open Context, advocacy, industry collaboration and research, as well as educational activities. The Institute is also able to charge for the inclusion of archaeological data in Open Context, including services such as editing, annotating, peer-review, and open access publishing, as well as data archiving that includes data cleaning and organization, annotation with linked data standards, and stable web identifiers for citation. Financial statements of account are published as part of the Institute’s publicly available Annual Report.

Pérez was able to receive a small amount of funding for IPython from the NiPy Project, a community of practice using the Python programming language in neuroimaging analysis. He described this funding as a critical breakthrough for IPython as the funds supported a collaborator to work on IPython full-time leading to a developmental “bottleneck”.76 Similar to Kansa, he noted the difficulty of research funding through his academic organization:

The bulk of the funding for IPython right now, are grants that I run through the university. But first of all, historically that had been really, really hard. It is only in the last couple of years that I am getting significant academic funding for these projects. Up until now, almost every time we have tried we have failed. Not all of them. For example, this neuroscience project actually did get funded by the NIH in 2008. So not every time have we had failure, but many times. (Pérez, Interview 2014)

75 https://alexandriaarchive.org/contribute/?ref=opencontext
76 http://blog.fperez.org/2012/01/ipython-notebook-historical.html

The establishment of NumFocus in 2011, together with like-minded collaborators, allowed funds to flow directly to the development of IPython, making certain funding (from companies such as Microsoft and Racksoft) easier to accept through a separate organization:

We also realized that we needed a tax exempt entity that would allow people to make tax-exempt donations, at least under US law… We realized it would be great if we had a legal entity that could help us; that would serve as a donation and financial centre for people to make contributions; could distribute money to individual projects and serve as a fiscal sponsor; and could interface with industry and larger parties who may be willing and interested in collaborating but are not going to send a random cheque to some random person out on the internet. (Interview, 2014)

It’s not always easy to run funding through the university. It depends on what you are trying to do and how you are trying to do it. Certain classes of funding work very well in universities, others it’s much harder. For example, IPython has $100,000 donation now that came from Microsoft. And we ran through Numfocus, not through the university. It was easier, the overhead was lower, and it lets us spend the resources much more flexibly. Brian Granger is at Calpoly, I’m here, it is a lot easier to make decisions on how to use that money with NumFocus than it is with a grant from the university that is managed by one PI that has to be in that location. (Interview, 2014)

In terms of human resources, at the organizational level, the informants who taught courses and/or ran research labs noted the importance of students that supported open data and its development. As noted above, Bradley tested and refined UsefulChem through its use in his classrooms and research labs. Similarly, Murray-Rust noted that:

I have the great fortune of working with fantastic second year students during vacations. They formed the basis of my group in many cases. I managed to get funding for that. What I would do is I would research a problem – it was writing software – I would scope this out, I’d build some prototypes and then put them onto it and give them their head. I would tell them where we’re starting from but I wouldn’t tell them how to do it because they would come up with different ways of doing it. (Murray-Rust, Interview)

Although Gezelter noted that OpenMD is tied to his research group, he noted that sustainability of an open source initiative was an issue in terms of resources needed to maintain the code:

Say I have a graduate student or a postdoc who works on code and leaves the group and goes someplace else. Unless that grad student has actually put their work in our group repository, after about two years nobody knows how to use it; nobody knows how it works at all anymore. So that code kind of decays…. Anytime somebody works on a program that is related to our simulation package, that piece has to be imported into the main group code. OpenMD has a bunch of small utility programs to process the data that comes out. A lot of those have been the work of various graduate students. And as they finish writing up, part of the writing up is putting their code into the group repository if it is not there already; to document it; to leave behind a readme file about how to use it and things like that. (Interview 2014)

Organization micro-level institutional work outcome – Innovation establishment and development

At this organizational micro-level, the informants conducted institutional work to develop and establish their open data innovations by 1) disassociating themselves from current modes, 2) naming and creating new symbols, 3) creating standards of practice, including training and education, 4) theorizing, and 5) mobilizing resources. The three informants who were faculty members were successful in establishing the use of their innovation in their own research labs and courses but were frustrated in attempting to engage in institutional change within their academic departments. Two established parallel organizations or communities alongside their academic ones in order to house and sustain their innovations.

The institutional work of theorization continued at the micro-level, with similar justifications as expressed at the individual level: the moral and pragmatic value of the innovations for research transparency and efficiency was highlighted, as per the informants’ reflections at the individual level. In addition, the open data innovation was specified as a positive learning tool and as a way of future learning and networking. The informants dissociated themselves from current modes, either in terms of their research discipline and/or use of existing research tools, or through the development of a separate organization or development community outside of their academic organization. Establishment of a separate organization was deemed appropriate by two informants, as it did not appear to them that there was a way for open data scholarship to be recognized, and hence sustained, within a university environment. Establishment of a separate organization facilitated their ability to mobilize financial and human resources. Informants who were faculty members at universities established standards of practice by incorporating their open innovations into their teaching classrooms and/or research labs. This institutional work was critical in the development and establishment of their open data innovations.

The informants attempted to engage in institutional work within their academic departments. Although those with tenure-stream appointments were able to incorporate their innovations within their teaching classrooms and research labs, the informants reported overall failure in their attempts to legitimize their innovations in their academic departments. Gezelter cited lack of recognition by institutions as a major obstacle to open data, along with lack of citation academically (Interview, 2014). Beyond a few active scientific collaborations (Bradley, 2008; Murray-Rust, et al., 2011), the open scientists expressed that overall their attempts to foster a culture of open data within their academic departments were unsuccessful, and they discounted their colleagues as conservative and unsupportive, and the academic unit as not a place to seek change:

I have made very little progress in the department of chemistry, University of Cambridge. And that will probably be true for almost all other chemistry departments. I have for example, for the 5 years, including funding from JISC (the UK digital library and networking group). I’ve had money to develop tools for managing chemical data and so on with zero results…. I mean I could weep but I don’t. (Murray-Rust, Interview 2014)

I didn’t gather the department together and tell them I was doing this. I didn’t hide it. If anybody talked to me, I certainly was very happy to tell them what it is that I was doing. But, you know, if the question, was anyone else interested in even adopting components of it. Well, no. I think most faculty are happy with the way things are going. (Bradley, Interview 2014)

About a year later I started my own faculty position and so then it was up to me how I spent my time. It was, again, not something that has been formally recognized in any formal way by my university…. It is not really counted as part of the academic reward system in the same way that a paper would be…. I think my colleagues in my own department are pretty oblivious … It’s not viewed locally…. I think it’s still, “Oh, that’s not really chemistry….” So even though this does not get a heck of a lot of traction in my own department, outside my department I think it is viewed pretty positively. (Gezelter, Interview 2014)

The one exception for support from a university unit was that of Murray-Rust who, although he did not gain support from his Department of Chemistry, did garner both financial and human resources through the university’s Unilever Centre for Molecular Informatics that aligned with and benefited from the UK government’s ambitious eScience program.

As the informants began to gain legitimacy for their open innovations at the disciplinary meso-level within their scientific community (described in Section 5.3 below), they then presented their work within their own academic units, as well as arranging for like-minded colleagues within their disciplines outside of their own academic institutions to present. For example, two years after establishing UsefulChem and coining the term Open Notebook Science, Bradley began inviting open science colleagues to present talks and symposia in his department at Drexel.77

5.3 Scientific Community Meso-level: Legitimation and Diffusion

Soon after designing and implementing their open science initiatives for their own teaching and/or research, or through the development of a new organizational form, the informants began to engage in institutional work within the scientific community, both within their own discipline and in other disciplines where they found ‘like-minded’ open data practitioners. This level of the scientific community was labeled the meso-level, which aligns with the institutional mid-level described by Lawrence & Suddaby (2006) as including the field-level associated with professions or industries. The majority of institutional work appears to have occurred at this level, the main outcome of which was to legitimate and diffuse their innovation through the scientific community.

In practice, scientific communities are concurrently situated both within a local academic organization and globally within a scientific community. Understandably, the institutional work at the meso-level began very soon after, and almost in tandem with, the establishment of the innovation at the organizational micro-level. For four of the five informants, connecting with the scientific community began within the same year as work within their academic organization had been initiated. Bradley concurrently announced the establishment of UsefulChem and the naming of Open Notebook Science in his blog as well as in scientific journals such as Nature in 2005. Kansa launched the Alexandria Archive Institute in 2001 and Open Context in 2006, both aimed at the archaeological community. Pérez began working on the open source IPython source code in 2001 and reached out to work with collaborators across the scientific community to further develop and launch it publicly. He also collaborated with other open source innovators to launch the non-profit NumFocus in 2011. Gezelter worked with collaborators across scientific disciplines to develop Jmol. Murray-Rust publicly announced the World-Wide Molecular Matrix in a 2002 paper and presented in 2003 at an eScience conference (Murray-Rust, et al., 2003).

77 Examples of events Bradley organized at Drexel University: a talk by open chemist Cameron Neylon (Blog, October 5, 2007); the conference “How Web 2.0 is Changing Scholarly Communication” (Blog, February 19, 2008); and a talk by open scientist Anthony Williams, President of ChemSpider, “a vision of future directions” (Blog, August 5, 2008).

In analyzing the data, I coupled the work the open scientists conducted within their own disciplines and other disciplines, especially in information technology, because the informants themselves noted the ‘dual nature’ of their professional careers that they were conducting simultaneously:

Much of my academic career has been spent living a "double life", split between the responsibilities of a university scientist and trying to build valuable open source tools for scientific computing. I started writing IPython when I was a graduate student, and I immediately got pulled into the birth of the modern Scientific Python ecosystem. I worked on a number of the core packages, helped with conferences, wrote papers, taught workshops and courses, helped create a Foundation, and in the process met many other like-minded scientists who were committed to the same ideas. (Pérez, Blog http://blog.fperez.org/; Interview 2012 - https://www.youtube.com/watch?v=F4rFuIb1Ie4)

I have gone to a couple of meetings that have been primarily about open science issues. Data archiving and reproducibility. There was a really interesting reproducibility workshop that involved about 40 of us that were all in the open science community but in different domains. Yeah, it is a split personality, that is very separate sphere of activity from my domain of science. (Gezelter, Interview 2014)

So, there is communicating with the archaeological audience through professional conferences and through papers like that. And the other side is working with the larger open science and open data movements in the sciences. So, we also try to participate in those sorts of venues. (Kansa, Interview 2014)

At the meso-level, the informants performed new as well as similar institutional work as at the organization micro-level, at times with an expansion of the related themes. Distinct from the work at the micro-level, the informants engaged in institutional work in: 1) the creation of normative associations, 2) the forging of new relations, alliances, coalitions, and associations, and 3) advocacy. Similar to the institutional work at the micro-level, the informants also: 1) theorized the innovation and articulated a vision for change, 2) created standards of practice, and 3) mobilized resources. The informants’ research and teaching at the academic organization micro-level provided them with the ‘right to voice’ (Maguire, et al., 2002, p. 86; Tracey, et al., 2011) at the meso-level, wherein they cited the success of their innovation in their own research and/or teaching as proof of its appropriateness as well as providing technical demonstrations of its efficiency and effectiveness. The data analysis revealed that the majority of the critical institutional work occurred at this level as the open scientists worked to legitimate and diffuse their open data innovation, as well as open science in general. The institutional work at the meso-level of the scientific community is summarized in Table 7 and discussed below.

Table 7: Legitimation and Diffusion: Meso-level scientific community institutional work

Institutional Work

Association with taken-for-granted practices
    Technical alignment with practices
    Fitting in to the larger disciplinary ecosystem
    Inclusion of professional elements

Forging new relations, alliances, associations, membership
    Conferences and workshops with a broad segment of scientists
    Media as legitimizing agents
    Aligning with very legitimate actors

Theorization: Specifications and justifications for the need for change
    Moral and ethical necessity for use
    Technical demonstrations of effectiveness and efficacy
    Rebels challenging the status quo, alignment with way of the future, next generation
    Way of the future, harnessing technological advances
    Gains for citation and recognition
    Ability to attract human, financial and material resources

Creation of practice: training and education, support and awards
    Demonstrations of successful training practices at micro-level
    Educational and financial supports needed
    Competitions and awards

Mobilization of resources
    Financial, human and material resources
    Greater potential for collaborators from the larger meso-level

Advocacy
    Need for funding mechanisms by research agencies to support open data
    Need to change the norms of how open data and source achievements are recognized

Institutional work: Associating the innovation with taken-for-granted practices

As discussed above in Section 5.2, the informants first developed and used their open data innovation within their teaching classrooms and research labs at the academic organization micro-level. Successful initiatives that begin as solutions to everyday problems are often adopted more broadly because they turn out to be typical problems for a larger set of users (Raymond, 1999). In order to have their initiatives institutionalized at the scientific community meso-level, the informants considered the practices, needs and culture of the scientific community, and strategically incorporated aspects of taken-for-granted practices and acknowledged professional elements so that their innovation could more easily be adopted at the meso-level.

Murray-Rust acknowledged these requirements when he stated: “Selling ourselves is very difficult. We have to: be technically excellent in chemistry; adopt best mainstream practice in computing (JUnit, Schematron, etc.); look good visually; be easy to use; do something useful; save people effort; be portable; be easily maintainable and developable” (Murray-Rust, 2005). As an example of incorporating taken-for-granted practices, Bradley noted that in designing UsefulChem as an Open Notebook he mimicked the existing practice of keeping a written lab notebook, carrying it over into electronic form:

It does not necessarily have to look like a paper notebook but it is essential that all of the information available to researchers to make their conclusions is equally available to the rest of the world. (Bradley, 2006)

Open Notebook Science maintains the integrity of data provenance making assumptions explicit. (Bradley, 2009 – NASA presentation)

Kansa notes how Open Context data needs to fit into a larger archaeological data ecosystem:

The way we designed Open Context with this idea and basically trying to do linked data kinds of things—that was a pretty early architecture decision that we made. That in part is driven by a recognition of how we think we need to fit within a larger information ecosystem and how researchers communicate. The ability to reference, say, a specific thing with a stable URL is a technical decision. It creates a requirement around database design, architecture of the web site, that sort of thing, but all of those sort of technical decisions are really predicated on how this needs to work within the context of researchers communicating on the web and integrating/linking up web-based communication with other forms of scholarly communication. … So, yeah, I would say that the technical things are interesting but they are more interesting in how they fit within the professional landscape…. My main role around it is basically curating the entire system and trying to make sure that it works well within that larger information ecosystem of what’s on the web. (Kansa, Interview 2014)

In establishing the AAI, Kansa incorporated standards of practice that supported its legitimacy as an organization, including publication of annual reports and tax forms. The AAI’s governance structure78 includes a Board of Directors and an Advisory Board. The Board of Directors approves the AAI Bylaws and evaluates the AAI’s performance and supports its growth. The Advisory Board is composed of established academics and professionals that provide periodic guidance for AAI. AAI also has policies for Conflict of Interest and Record Retention. An Open Context Editorial Board was established in 2010.

78 https://alexandriaarchive.org/about/governance/

Open Context has a team of editors and an editorial board comprising experts in various archaeological domains and specializations. Editorial boards can perform important signaling roles in academia by elevating the prestige of data sharing. Editorial oversight, coupled with clear and trustworthy citation practices, can make data dissemination a recognized and professionally valued form of publication. (Kansa and Kansa, 2013, p. 92-93)

One of the primary incentives for traditional publication is the strong norm of priority and recognition that is based on confirmed claims of priority (den Besten et al., 2010) for scholars across all disciplines (Aldrich, 2012; Benner & Sandström, 2000; Dasgupta and David, 1994; David, 1998; Merton, 1957, 1973; Shamoo & Reznik, 2009). Informants also noted the need to incorporate acknowledged professional elements of their research communities as related to citation and reward mechanisms in publication and licensing:

We argue here for a new model of “data sharing as publication” in order to address the technological, ethical, and professional concerns surrounding archaeological data distribution today. (Kansa and Kansa, 2013, p. 89)

Yes, initially I started with a blog but realized fairly quickly that it was not sufficient to function as a lab notebook because there is no record of changes made. A wiki is really close to a perfect tool for the actual notebook since all page versions are time-stamped. We use Wikispaces as our hosting service, which has the advantage of providing third-party timestamps on everything recorded or changed. (Bradley in Coturnix, 2008)

I think that is the most important aspect of the discussion right now. How to give credit to people who have done the work. We use a lot of tools to analyze data. And I try every single time we submit a paper to make sure that every single one of those tools that I use gets some credit. I cite tools that normally wouldn’t be cited, software tools that normally would be ignored. And even things like visualization programs – because I wrote one. If we visualize a graph, I make sure I note where that graphing program came from… That’s actually one of the reasons that we do Open MD as open source is that even though that program may not be as widely-used as some other molecular simulation tools, parts of that code have diffused into other widely used packages. So we have parts of our electrostatics code that has shown up in other packages that everybody has heard about. So that’s the thing I can say, “Well, these guys are using our electrostatics code in their standard program and that program uses our heat transfer code.” So I can treat it just like a citation; these major packages that have imported parts of our code. (Gezelter, Interview 2014)

Institutional Work: Forging new relations, alliances, coalitions, and memberships

Institutional entrepreneurs establish new relations with like-minded actors in order to enhance their legitimacy and/or advance a change via collective action (Battilana, et al., 2009; Garud, et al., 2007; Hardy & Maguire, 2008; Ritvala & Nyquist, 2009; Stuart, et al., 1999; Thompson, et al., 2015). Relationships can be individual, collaborations, alliances, trade or professional associations and coalitions, and can include forging alliances with very legitimate actors. The network of scientific collaborators has been shown as critical in the diffusion of knowledge in science (Crane, 1971, 1972).

In order to forge these relations, in addition to publicly chronicling their progress and obstacles on their open data initiative on their personal blogs and websites, the informants communicated through traditional publications and editorials in scientific journals, as well as granting interviews to the scientific media. They attended conferences in order to network with a broad segment of researchers in addition to their own scientific discipline (Appendix 3: List of Analyzed Documents for each researcher). The informants were diligent in forging new relations and alliances within the scientific community. Conferences and workshops at which researchers from a breadth of disciplines were represented were considered important strategic opportunities to both advertise their initiative as well as to identify new relations and form alliances. New relations were viewed as critical to building the legitimacy of their innovation as well as for increasing its diffusion or uptake by others.

The development of networks was also important in amassing further resources, as well as in building legitimacy and diffusion for the innovation. For example, when asked how he raised funds from the Royal Society of Chemistry to support the ONS Challenge, Bradley replied: “[I] just know people, by, like going to conferences. It’s a pretty small community ultimately of open scientists” (Bradley, Interview 2014). Similarly:

Some of my involvement around other kinds of digital humanities or open science projects, hard to pin it down as science in the archaeology world, straddle the humanities and natural sciences and social sciences, but a lot of what I do is working with other teams that are doing different kinds of things with the open web. I see it as pretty strategic in the sense that it helps build ties with other types of projects and that people get invested a little bit in what we do with Open Context because we invite in/try to incorporate someone else’s API to bring in related data. We work with other systems that publish data and we reference them as linked data. What is useful about that is that other people know about us. It is also basically reciprocity that because we, essentially endorse what they are doing by trying to incorporate what they are doing into our project, then that’s a positive outcome for them, but it also means that they value what we are doing as well. (Kansa, Interview 2014)

Andrew Lang has been a close collaborator for a long time and has written code that enables us to visualize our solubility results (with Rajarshi Guha) and process NMR files automatically. Andy has also initiated other high impact actions that are unrelated to writing code: our ONS Wikipedia entry, recruiting David Bulger at ORU to do solubility measurements and adding our measurements to common chemicals in Wikipedia - which has ended being a popular portal to our data. Cameron Neylon has done a tremendous amount - recently he pushed to get a group of us to publish a chapter in the upcoming O'Reilly Media book Beautiful Data. Organizing my trip to the UK last fall was another major accomplishment that he made happen. Cameron speaks extensively about Open Notebook Science…. (Bradley, Blog May 9, 2009)

I got hooked by that [open source] community. I met John Hunter there, and we struck up a friendship immediately. We started collaborating very quickly on matplotlib, integrating IPython. … The notebook viewer was written kind of like a weekend hack by a grad student; a core developer of IPython, Matthias Bussonnier, a physicist in Paris. (Pérez, Interview 2014)

The informants connected with the broader open science discourse by participating in and organizing open science conferences and meetings, where they highlighted their innovations in order to increase their diffusion (visibility and uptake) in various scientific communities. For example, Gezelter participated in workshops on data and code-sharing that included participants from a wide range of backgrounds – scientists, lawyers, funding agencies, as well as members of the open science community (Gezelter Blog November 23, 2009). Having participated diligently at conferences, Pérez chose a conference as an important ‘marketing’ venue when IPython was ready for a public launch, noting that “all we did really was announce it at a couple of conferences and it spread like wildfire” (Interview, 2014). Kansa organized a mini-conference with the Creative Commons and the Internet Archive on intellectual property and open education:

The nice thing about doing that and talking about those sorts of issues, we were some of the first people to look at, say, this world of open licensing, creative commons, copyleft – that sort of thing, with cultural property issues. That’s one of the things that… you know, my primary objective has been around archaeological data sharing and everything, but this built recognition and built credibility that this is an interesting and thought-provoking area. (Kansa, Interview 2014)

And the other side is working with the larger open science and open data movements in the sciences. So, we also try to participate in those sorts of venues. Recently, I was part of a panel discussion at something called Publish or Perish which is at UC. All the videos of that are online. There are a gazillion people tweeting it so as twitter it was great. (Kansa, Interview 2014)

Kansa’s efforts to legitimate and diffuse Open Context were also observed by others. For example, in discussing open tools for archaeology, Sheehan (2015) described how Open Context had become increasingly visible in the U.S. archaeological community through recognition in professional associations, articles published in journals and newsletters, as well as presentations and marketing at conferences such as the Society for American Archaeology Annual Conference. Indeed, Sheehan considers “high visibility” within a disciplinary community to be one of the ideal characteristics of a data repository, as the “more visible a repository and its contents are, the more likely it is that data will be discovered, used, and re-used” (p. 184) and the more likely the repository is to receive data submissions, thus increasing its relevance and perceived value.

The open scientists extensively highlighted these relationships and communities in their personal blogs and scientific publications to support the legitimacy of open science. In addition to communicating through less traditional online disciplinary venues, they also gave interviews highlighting their collaborations to mainstream journals and magazines such as Nature and Science (see Appendix 3: List of Analyzed Documents). The informants noted the importance of the media and acknowledged their value as legitimizing agents:

In the case of ONS, journalists and authors of review articles in both the popular media and the peer-reviewed literature turned out to be important collaborators. The journalists obtained material for their pieces on the changing dynamics of scientific collaboration and the open science movement and projects like UsefulChem received a significant amount of coverage that often led to new collaborations with other scientists as a result. News coverage also proved to be critical to lending legitimacy to the effort allowing the Wikipedia entry on ONS to be accepted in October 2008. (Bradley, et al, 2011)

I just spoke with a reporter last week. And that is another way to communicate what projects we are involved in, like, for example, this chemical rediscovery survey, I haven’t really gone out and promoted it but if I’m asked I reply. (Bradley, Interview 2014)

Through their networking, informants formed or participated in formal and informal alliances or coalitions. In 2005, with a fellow collaborator, Murray-Rust set up the ‘Blue Obelisk movement’ within the chemistry community with the aim of making “it easier to carry out chemistry research by promoting interoperability between chemistry software, encouraging cooperation between Open Source developers, and developing community resources and Open Standards” (O’Boyle, et al., 2011, p. 37). Blue Obelisk is an informal grass-roots organization with membership open to chemists who share the goals of openness in data, standards, or source (Guha, et al., 2006; Murray-Rust, 2005). Murray-Rust wrote papers in scientific journals such as Nature theorizing the need for a community to “encourage openness in chemistry” (Murray-Rust, 2008). By bringing together researchers and developers with common interests in openness, additional resources were sought and made openly available to the chemistry community (O’Boyle, et al., 2011). Murray-Rust also collaborated with colleagues within the scientific community to develop the Panton Principles for open data in science (Murray-Rust, 2011).


The informants sought to build relationships with highly legitimate actors in order to capitalize on the profile for the innovation they had built at the micro-level. The actors with whom they aligned themselves included acknowledged scientific leaders, organizations and industries. The informants were able to develop these important contacts and then leverage them to legitimate themselves as actors who were competent in their own disciplinary research as well as open data. Bradley highlighted his work with an Australian chemist, Matt Todd, and a British chemist, Cameron Neylon (Poynder, 2010), both known practitioners of open data within the scientific community. Kansa was able to leverage a key relationship with Dr. Joukowsky, a senior and highly respected archaeologist:

Sarah [Kansa] did a bone collection on another Nabataean site, in Jordan. Because she did that, Martha Joukowsky learned about Sarah. So Sarah studied the animal bones from Petra. And that’s how Sarah was able to communicate to Martha that we were working on this data-sharing initiative. Because she had already built up a relationship of trust with Martha Joukowsky (Sarah does excellent work). And, Martha Joukowsky saw that as an option. Similarly, a lot of the other data sets…Sarah has been able to work on several different projects and build up a pretty extensive professional network in that way. That’s been really instrumental in, word-of-mouth, essentially. And it is because of the zooarchaeology that a lot of the other archaeological data sets have been brought into Open Context. (Kansa, Interview 2014)

The scientists participated in conferences in other disciplines, announcing these invitations and presentations on blogs and in interviews, explicitly or implicitly aligning with the legitimacy of these organizations. For example, in noting that he had spoken at the National Institute of Standards and Technology, Bradley described it as an “organization that has always been associated with authoritative and reliable measurements” (Bradley Blog, December 14, 2008). As they created their networks, the informants highlighted the legitimacy provided to their own initiatives. In highlighting the funding he received for Open Context, Kansa presented several highly legitimate organizations such as the National Endowment for the Humanities, the Centre for Hellenic Studies at Harvard, and several others.79 The Alexandria Archive Institute partnered with Carleton University in 2016 to sponsor the Open Context & Carleton Prize for Archaeological Visualization for open source data visualization.80 In the interview, Kansa noted that these were strategic partnerships, for example:

79 https://opencontext.org/about/sponsors

80 http://ux.opencontext.org/2016/06/20/open-context-carleton-prize-for-archaeological-visualization/


That linkage with the California Digital Library and our collaboration with them is one of the key, sort of, institutional ties that makes Open Context credible… We are, again, actively working with the California Digital Library and that’s a partnership that is very strategic for us. It helps build credibility in a variety of ways. We are starting to build close ties with archaeological publishers. And another thing, is that I will be working early in the fall with the German Archaeological Institute (Deutsches Archäologisches Institut, DAI). This is a national institution. It is actually part of the German Foreign Ministry and it is the main organizational entity of archaeology in Germany. The Germans have a tremendous history of archaeological research, very big, significant projects. So, very significant collection and resources and that is also very strategic for us. (Kansa, Interview 2014)

Institutional Work: Theorization including technical demonstrations of effectiveness and efficiency

Similar to the individual and micro-level, at the scientific community meso-level, the informants theorized the appropriateness of their innovation. They specified both the moral and pragmatic appropriateness of open data and articulated a vision for change in the conduct of science. In addition to specifying the pragmatic benefits of open data in relation to efficiency and effectiveness, at this institutional level, the informants theorized aspects of open data that could be understood by actors across their own and other scientific disciplines and by associated stakeholders. In particular, the informants specified the benefits for citation and recognition, as well as the mobilization of human, material, and financial resources.

The informants continued to theorize critiques of the closed and inefficient aspects of traditional science and through narrative and rhetoric, they put forward the moral and ethical necessity for open science both for society and the scientific community. The informants theorized how open science aligned more closely in practice with basic principles of the scientific revolution, and also presented their particular innovation as meeting that need:

Transparency facilitates rapid access to existing and new collaborators, as well as exposing our work to the scrutiny of many, which can only make it better. (Bradley in Drahl, 2009)

Michael Faraday’s advice to his junior colleague to: “Work. Finish. Publish.” needs to be revised. It shouldn’t be enough to publish a paper anymore. If we want open science to flourish, we should raise our expectations to: “Work. Finish. Publish. Release.” That is, your research shouldn’t be considered complete until the data and meta-data is put up on the web for other people to use, until the code is documented and released, and until the comments start coming in to your blog post announcing the paper. If our general expectations of what it means to complete a project are raised to this level, the scientific community will start doing these activities as a matter of course. (Gezelter, Blog, July 28, 2009)


In addition, at the meso-level, the informants justified their innovations by providing technical demonstrations of efficiency and effectiveness, citing the successful use of their own open initiative in their research and teaching at the micro-level of their academic organization. The establishment of their innovation provided the informants with the ‘right to voice’ (Hardy & Maguire, 2008) in legitimizing their innovation. The open data entrepreneurs made use of rhetoric and engaging storytelling to highlight the success of their open data innovations.

We have named this approach Open Notebook Science and have demonstrated its implementation and feasibility with the UsefulChem project, started in the summer of 2005, with the aim of synthesizing novel anti-malarial compounds. (Bradley, 2008)

As an example, my group and collaborators (Andrew Lang and Antony Williams) have created and curated open collections of solubility and melting point data. We now have over 27,000 melting points, sufficient to create models based on these datasets that can provide usable estimates where experimental values do not exist (a good fraction of the organic chemical space). With no restrictions on these datasets and models, and with various machine-readable interfaces, anyone in the world can obtain the melting point of an organic compound without financial or computational obstacle. I expect that within a few years virtually any organic chemical property will be available, either as a collection of open data measurements or ‘good enough’ estimates. (Bradley, 2013)

The informants engaged in storytelling by putting forward an exciting future in which the pragmatic benefits of increased efficiency and effectiveness of open data could be used to work collaboratively to find cutting-edge results, with the sense that open science was a new movement and revolutionary way of conducting science. Pérez, for example, presents an engaging narrative that conveys the real-time excitement of scientific discovery that was enabled by IPython:

I was invited to the workshop and went with the lead on IPython works with me at Berkeley… I gave a technical seminar to the same group two days prior to the workshop. My colleague who invited me said, “Look. That seminar was good but it was kind of boring and dry and too technical. Your examples were taken from numerical computing and biologists are not interested in that stuff. Do you have an example that a little bit more interesting to biologists.” I said, “I take your criticism to heart but I am not a biologist. Do you have some data or some problem that, now that you saw what we can do, you think we can use. And if you think so, I’m happy to give it a try. And at a party on a Monday night, he described it to me over beers and I kind of walked through it in my head. I said, “Ok. If it is really what you are telling me, I think we can work with that. Let’s see what happens.” And so, on Wednesday or Tuesday morning, we sat down at 9 in the morning to try to take his scripts, because they had a code to do this serially, to parallelize it, and we put an IPython cluster on Amazon. We all began working on his script in parallelizing it and by midday I was getting nervous that this was beginning too ambitious and complicated. And I told one of the students, “Look. I just need to have something that works. We are not trying a paper here, I just need to have a little demo for this afternoon when my talk is scheduled. Let’s just keep it simple.” And I see him walking up later to a screen, very excited. And I said, ‘What’s going on?” He says, “Oh my god, the results are running and we are actually seeing stuff in the data. Now that this is running in parallel, it is much faster. I’m going to call the editor of ISME (which is a journal from the Nature group in microbiology) to see if they would accept a rapid communication.” And at first, I thought he was honestly joking, because if you start at 9 in the morning, you are not having a conversation about a Nature paper at 2 pm, right? I’m dead serious. Ten minutes later he catches me in the hallway, “I just got off the phone with the editor. Let’s do it.” And so literally, I gave the talk – we got it working -- I gave the talk at 4 pm. The next day by the time we got on the plane to leave the workshop, we had the paper drafted 50%; we finished running the analysis with a little bit more data over the next few days. And a week later we submitted the manuscript. So, the analysis would have taken over a month to run on one computer and in a week, we had done the parallelization, run the code, completed the analysis, created all the documents and written the manuscript and submitted it. In a week.

It was a very unusual anecdote in science because things like this don’t happen. But it was an interesting example of the collaborative power of the tool, right. Because at 9 in the morning, the team that began working was people who knew IPython and didn’t know anything about the biology, and biologists who know python but didn’t know our stack very well. But we could all log in to the web, type together, work together, talk about who is doing what, and have a fully working scientific tool by the end of that very same afternoon. (Pérez, Interview)

With these demonstrations and storytelling, the informants undermined the assumption of the traditional scientific process by highlighting how the open innovation could work and easily support the aspirational goals of open science.

In establishing their own innovation at the micro-level, the informants portrayed themselves as innovators, challenging the morality of traditional methods, aligning with the new generation and relating their experiences. They included reference to their own open data innovations as meeting the moral and ethical necessity for open science:

But what really caught my attention was his [Karl Bailey] mention of UsefulChem and the image of skateboarders he used on the post. What a great representation of Open Source Science, at least the way that many of my friends and I conceive of it. I also get the same vibe from many of the young people that see me after I speak on the topic. I suppose it represents a form of rebellion from the status quo, but not without standards for competence and dedication. Without that rebellion is just cynicism. (Bradley, Blog April 21, 2007 http://usefulchem.blogspot.ca/2007_04_01_archive.html)

So, I went in there and I’m basically challenging that paradigm which is that you’re classifying these reactions as being successful or failed and I think that that is, it can be a useful concept within a certain framework but I think it is more productive to simply answer the question what happens when you mix A and B. And there is no success or failure. If nothing happens, it is not a failure. As a chemist, I want to know if I mix two chemicals together, if nothing happens, that is useful information. (Bradley, Interview 2014)

We started our nonprofit (the Alexandria Archive Institute) to help preserve and share archaeological data, because these data are the only way we'll ever understand our history and origins. If these data disappear through neglect, our past is erased. It's that simple. (Kansa in Ellson, 2013)


If most scientists are motivated by ego, is it possible to do egoless science - and what would that look like? For starters I think that keeping a true Open Notebook (All Content shared immediately) does a lot to keep your ego in check. If you report on what you find, when you find it, you don't have time to succumb to the temptation to cherry pick results and embellish the story of what happened. (Bradley blog 2009 Apr 7 Is the Human Ego good for Science? http://usefulchem.blogspot.ca/2009_04_01_archive.html)

The informants articulated a vision for change that was enabled by technological advances, arguing that open science is the way of the future and that data and source sharing norms were changing in science. Similar to micro-level themes, they pointed to the internet and information technology advances as enabling this future. When asked about the single most essential piece of software for chemists, Bradley responded that this was the ‘general purpose Internet browser’ (Bradley, 2006):

The trends of Open Science, crowdsourcing, automation and cheminformatics are creating new opportunities for increasing the efficiency of discovery. The integration of these phenomena promises to enable new forms of scientific collaboration. (Bradley, 2008. in Nature Precedings)

In the past few years the same scenario has been unfolding in chemistry research and chemical information. The social web has evolved into a more semantically aware, machine-readable web in which an ecology of services and tools has flowered. Chemical information can now be communicated to both human and automaton at every point on the spectrum of openness. (Bradley, 2013 in )

The rise of the World Wide Web represents one of the most significant transitions in communications since the printing press or even since the origins of writing... Nevertheless, Open Access and Open Data face steep adoption barriers… Rather, these reform movements offer much needed and trenchant critiques of the academy’s many dysfunctions. These dysfunctions, ranging from the expectations of tenure and review committees to the structure of the academic publishing industry, go largely unknown and unremarked by most archaeologists. At a time of cutting fiscal austerity, Open Access and Open Data offer desperately needed ways to expand research opportunities, reduce costs and expand the equity and effectiveness of archaeological communication. (Kansa, 2012, p. 498 in World Archaeology)

Technology also enables openness in research, sharing research as it happens, almost in real time. This philosophy has proven remarkably fault resistant because it requires making public not only the experimental details but also all of the raw data used to draw inferences. Any researcher can step through every detail and make an independent evaluation. Students make mistakes – as do professors – and in the past trusting people might have been a necessary evil. Today, it is a choice. Optimally, trust should have no place in science. (Bradley, 2013 in Chemistry World)

Using semantically rich formats and automation at zero publication cost – is this the way to the technological singularity? (Bradley, 2007 in Nature Precedings)

Again, they put forward how their own open data initiatives fit this exciting open science future:

In the Google Age, the retrieval and sorting costs for a large number of documents are negligible. Thus, in the Google Age, the most important criterion for the usability of a document is immediate online availability... Articles and information sources that are not immediately available will only be pursued as a second choice, even if the vehicle is reputable. It is now possible to connect a researcher providing information with a researcher looking for that information very quickly with minimal technological obstacles. Keeping the actual laboratory notebook of a research group in real time on a public wiki and holding discussions on a public blog is the natural extension of the openness concept leveraging today's technologies and global infrastructure. Operating with such transparency and demonstrating that science can be accomplished in this type of an environment is at the core of the UsefulChem project. (Bradley, Blog April 24, 2007)

They continued to theorize open science in general, highlighting the professional benefits of open science for citation and recognition, and specified how their own open data innovation met the justified need for recognition:

Researchers participating in data publishing therefore can see continued use and impact of their data contributions, and in turn, earn rewards coming from enhanced prestige and recognition. In other words, data publication models can align professional and career interests with the research interests of the larger community (see Costello 2009; Griffiths 2009; Piwowar, Day, and Fridsma 2007). (Kansa & Kansa, 2013, p. 92)

Open data publishing promises to improve the efficiency and quality of data-sharing in much the same way that conventional publication improves the dissemination of research findings. (Kansa and Kansa, 2013, p. 90 in Journal of Eastern Mediterranean Archaeology and Heritage Studies)

As scientists become more open, it is likely that their ability to claim sole priority for all aspects of a discovery will be reduced. However, they will retain priority for the observations and calculations that they made first. (Bradley blog 2010 Jul 11 Secrecy in Astronomy and the Open Science Ratchet http://usefulchem.blogspot.ca/2010_07_01_archive.html)

There has been a lot of discussion about the fear of getting "scooped" as a reason to be weary of using new scientific publication vehicles... Considering all of these difficulties over the years is really the main motivation behind our migration away from a login based system like SMIRP to our adoption of Open Notebook Science based on a wikis and blogs, which are very efficiently indexed in real time by Google and thus easily discoverable without additional formatting work. (Bradley blog 2010 Apr 2 Bipolar Electrodeposition of CdS: Scientific Results in Limbo? http://usefulchem.blogspot.ca/2010/04/bipolar-electrodeposition-of-cds.html )

They also noted the importance of open data for patent priority recognition:

Also against company patent, ONS: I have often mentioned during my talks that Open Notebook Science could be used not only in a defensive manner to claim academic priority - but also as an offensive tactic to block patent applications. (Bradley blog 2010 Jun 1 Use of ONS to protect Open Research: the case of the Ugi approach to Praziquantel http://usefulchem.blogspot.ca/2010_06_01_archive.html )

However, unfortunately, after we had done the experiment, Todd found that a German patent had recently been awarded for the process, so he won’t be able to use it after all. I posted about this on my blog, pointing out that we had missed a good opportunity…. Had we done that work 3 or 4 years ago (which we might easily have done) and recorded it in our open notebook, we might have been able to block the German patent…it also reminds us that one of the best reasons for using ONS is to prevent people blocking science with patents. (Bradley, Online journalist and advocate for openness - Poynder, 2010)

The informants also provided examples to justify how the open nature of their innovation allowed for greater resources - both human and material -- from the scientific community:

I think the open part is pretty critical. Because for example, in order to do what we’re doing on an Amazon cluster, we were using a project that came out of MIT called StarCluster. We were actually locked in over IRC with a StarCluster developer who was helping us. There was no way he would have added support for IPython parallel when he developed starcluster so that we could do this out of the box – if IPython had not been an open source tool. And there is no way that we could have installed all these libraries into starcluster immediately from the biology library if they had not been open source. So, because what we were doing was actually gluing together work from my own team, work from the biologists who had developed their genetics libraries, which themselves used python and numpy and all that, and running on this tool that was build by this MIT guy who – I became friends with him afterwards – but at first he just did it. A completely independent project, right, to do cluster stuff at MIT. But he added support for IPython of his volition. That intersection could not have happened if these were proprietary tools. They would probably be in silos that would not talk to each other. (Pérez, Interview 2014)

We just recently published data for zooarchaeology in Turkey…. They document the origins and dispersion of domesticated animals from the near east and going into Europe. It is a bunch of different sites in Turkey. More than, I think about 34 different zooarchaeologists created these data from 15 different sites. So there are lots of people participating in this. And 15 sites in Turkey. What is interesting about this, is that it is more or less the entire zooarchaeology community of Turkey all of a sudden switching to sharing data, at once. That I think is an interesting kind of case study because it can show how a norm can change. And now the expectations are probably going to be quite different about how one conducts zooarchaeological research in that area. (Kansa, Interview 2014)

The scientists highlighted the benefits of their innovations for different stakeholders, including connections with research collaborators, private sector partners and research agencies that would not have known about their work without openness. For example, Bradley highlighted his successful collaborations with the National Cancer Institute and how collaborators there had found out about his UsefulChem project through the network of open scientists in 2007 (Waldrop, 2008; Bradley et al., 2011).

Our experience with the melting point data was truly a win-win situation for the chemistry community and Alfa Aesar. The provenance information from our collections leads directly to their catalogue – a form of free marketing and advertising. And it is probably also beneficial that their contribution is critical to this story’s telling. (Bradley, 2013)

Over time, a collaboration with other scientists has evolved. Rajarshi Guha at Indiana University and Tsu-Soo Tan from Nanyang Polytechnic in Singapore have invested significant amounts of time in running docking calculations for UsefulChem virtual libraries and reporting their results openly, in near real-time. Dan Zaharevitz, from the National Cancer Institute has contributed by testing compounds for potential anti-tumor activity. Phil Rosenthal, from UCSF, is in the process of testing some compounds for anti-malarial activity. (Bradley, 2008)

Although they theorized the need for change, several of the open scientists were explicit that open data would not be successfully adopted within the scientific community by mandating its use; what was required was a change in thinking about the conduct of science.

Open chemistry will not appeal to everyone. But it does not need unanimous openness; the actions of a few are all that is required to effect its progress. And its benefits are available to all – the spectrum’s whole population, those who share and withhold alike. Indeed, the spectrum of participation is both necessary and useful. Open chemistry is unalterably inclusive. (Bradley, 2013)

I’m not, you may have seen some of my interviews, not a proponent of mandating openness. I don’t think that is going to be very successful because there’s always… I think if you don’t want to share your data, you will find ways of not sharing it. (Bradley, interview)

Institutional work: Creation of standards of practice, including training, competitions, and awards

Similar to the creation of standards of practice in teaching and training at the organizational micro-level, the informants conducted institutional work at the level of the scientific community, including codes of practice, training, competitions and awards. The creation of standards of practice supports the legitimacy of a practice, and standards can be institutionalized through training and education programs (Lawrence, 1999).

The informants established standards of practice such as guidelines for the use of their particular open data innovation. These are available at the organizational level and beyond. Open Context, for example, has an extensive description of how data will be reviewed, published, and annotated.81 Gezelter, along with his students and collaborators, initiated the development of manuals and online tutorials for OpenMD82 and Jmol83. Guidelines for the use of IPython are available in text as well as video form.84

One informant put forward meso-level informal standards for open data. Working with collaborators Cameron Neylon of the UK Science and Technology Facilities Council, Rufus Pollock of Cambridge University, and John Wilbanks of Science Commons, in 2010, Murray-Rust co-authored the Panton Principles for Open Data in Science (Murray-Rust, 2011). This effort supported the legitimation of open data. It was described in the scientific community as a way “to articulate a clear definition of ‘open data’ and help scientists make the right choices in trying to make their data ‘open.’ The principles set forth the general steps that scientists should take to create more effective and sustainable data commons” (Bollier, 2010). A news article published in the UK journal Science stated that, “The Panton Principles provide guidelines on how to liberate your data” (Wald, 2010).

81 https://opencontext.org/about/publishing 82 http://openmd.org/wp-content/docs/OpenMD-2.4.pdf 83 http://wiki.jmol.org/index.php/Main_Page 84 YouTube video https://www.youtube.com/watch?v=XFw1JVXKJss

The informants were clear that training, education and support were needed in order to build up the community and network of scientist allies to further legitimize the innovation at the meso-level:

It is being able to support new people. As difficult as it may have been for someone like me, and I will admit that there have been a lot of difficulties in doing this in academia, and many of these people who began 10-12 years ago like me, ended leaving academia… there is a lot of younger new talent coming up through the pipeline who wants to contribute, who we would like to support. And we would like to support them in a way that is better than the kinds of hoops we had to jump through in the past. So we would like to educate them, train them, provide them with financial assistance, at least temporarily. We may not have the funds to support someone for 5 years, but starting with something like a 6-month fellowship, that’s already something that none of us had when we started. And we really want to make that better because growing this, training these people, not only gives us immediate work, more importantly, we think it creates people who work in this way, think in this way, will themselves be leaders in the future of these kinds of ideas when they go into industry, when they go into research, when they go into faculty positions. And so, it will spread the reach of these ideas. Not just the tools, but these methods of work. We really are trying to build communities. (Pérez, Interview 2014)

The informants set up their own training and education platforms for conveying the standards of practice in order to educate their disciplinary colleagues with the skills and knowledge necessary to support their initiative. Murray-Rust initiated open data bootcamps (Murray-Rust, 2013 UKSGLIVE presentation). Bradley set up the Chemical Rediscovery Survey and worked with high school students on open green chemistries (Bradley, Interview 2014). Pérez and a colleague led workshops to teach others how to use IPython for their own research as well as to further develop the open source tool (Pérez, Interview 2014).

Researchers have even begun to publish papers directly from IPython, says Pérez. The University of California, Berkeley, offers courses in IPython, and Harvard University and the Massachusetts Institute of Technology in Cambridge, and Columbia University in New York, among others, have adopted it. (Mascarelli, 2014 in Nature)

In 2005, John Hunter and I were invited to teach a very small workshop at UC Berkeley because there was team of neuroscientists who were beginning to build open source tools for neuroscience with python and they contacted John Hunter and he contacted me and asked me to come in, I think, December... John and I also taught a workshop at Los Alamos where we had been invited by Erik Handberg who now is the project lead for NetworkX, the graph theory analysis. (Pérez, Interview 2014)

The informants also set up incentive and reward mechanisms—competitions and awards—both to recognize and celebrate open data achievements as well as to provide financial resources to recipients. In 2008, Bradley set up the crowd-sourcing Open Notebook Science Challenge (for worldwide participants) as well as the Submeta Open Notebook Science Awards (for US and UK university students) to fund open notebook science initiatives in order to encourage students to contribute to the online notebook (Bradley, et al., 2009; Drahl, 2009; Poynder, 2010). He negotiated with the Royal Society of Chemistry to receive funding in order to award cash prizes to students that participated in the Challenge (Bradley, Interview 2014). Murray-Rust awarded Blue Obelisk Awards85 to recognize achievements in promoting open data (Interview, 2014).

Institutional work: Mobilizing financial, human, and material resources

Institutional entrepreneurs mobilize allies and material resources to ensure that stakeholders agree to support a new practice and to sustain an initiative (Hardy & Maguire, 2008; Battilana, et al., 2009). As discussed in Section 5.2, the informants found that resources from traditional research grants within their academic organization were limited, as was support from their departmental colleagues. They sought and found greater opportunities to mobilize human, material and financial resources at the meso-level of the scientific community. At this level, they engaged in bargaining and negotiating in order to mobilize resources for their open data initiatives. Bargaining and negotiating are inevitable, and these exchange mechanisms rely on the perception that there will be tangible or intangible benefits forthcoming (Colomy, 1988). The informants aimed to ensure that their data innovation became “formally, legally, and financially established and sustainable” (Pérez, Interview 2014).

85 https://sourceforge.net/p/blueobelisk/bowiki/Blue_Obelisk_Awards/ Recipients included Jean-Claude Bradley.

Financial resources, both large and small, were important to both incentivize and legitimize the informants’ open science initiatives. At the meso-level, the informants sought resources from non-traditional organizations, scientific societies, and the private sector to fund their own innovation, as well as to fund open data competitions and collaborations across the scientific community.

The informants also partnered with private corporations and publicized the resulting collaborations and funding, especially those involving well-known corporations such as Microsoft (Pérez, Interview 2014):

We are thrilled to announce that in August 2013, Microsoft made a donation of $100,000 to sponsor IPython’s continued development… We are extremely grateful for this contribution, which we will use to continue strengthening multiple aspects of IPython. (Microsoft release posted on IPython website, http://ipython.org/microsoft-donation-2013.html)

Bradley applied for a grant with Submeta, a small, not-for-profit organization to fund an Open Notebook Science challenge across the scientific community. He then leveraged these funds to obtain further funding from the Royal Society of Chemistry (Bradley, Interview 2014) and the Nature Publishing Group (Bradley, Blog, November 2008).

Through his institutional work at the micro- and meso-levels, Kansa was successful in bringing Open Context to recognition at the meso-level. Beginning in 2011, the US National Science Foundation (NSF) required all submitted proposals to include a data management plan.86 Although not endorsing any specific data repository, the NSF includes mention of Open Context as an example of an archaeological data repository. Open Context is similarly referenced by the National Endowment for the Humanities (NEH) (Kansa, 2012, p. 499). Kansa has put in place a funding model for the Open Context data service for the archaeological community so that costs associated with archiving data can be included as part of a researcher’s agency budget submission. In addition to providing ongoing financial resources for Open Context, this meso-level institutional success served to legitimize Open Context as well as to diffuse its use through the archaeological community.

Only one informant stated that he did not seek resources at the meso-level. Gezelter used existing research grants to develop Jmol and OpenMD, and he stated that they were self-sustaining as open source. However, he also stated that “we talk about creating a service company that would run molecular dynamics simulations for other people” (Interview, 2014) but given he was coming up for tenure and also had a young family, there had been no time to devote to such an endeavour.

86 https://www.nsf.gov/sbe/SBE_DataMgmtPlanPolicy.pdf

The informants also conducted institutional work to mobilize human resources by forging new relations and alliances and by aligning with highly legitimate actors, as described above. The informants were clear that the meso-level, understandably, allowed for a bigger pool from which to identify a “community of like-minded people” (Pérez, Interview 2014) from the “coalition of the willing” (Murray-Rust, Interview 2014).

Open chemistry will not appeal to everyone. But it does not need unanimous openness; the actions of a few are all that is required to effect its progress. (Bradley, 2013)

Yeah there certainly is a lot of resistance… But you know people who are resistant to this kind of procedure are not going to be doing it. So it's kind of irrelevant, we are really interested in interacting with people who understand why we are doing open source science and want to participate in it. (Bradley podcast transcript 2006)

So, the Society for American Archaeology has roughly, I don’t know, about six or seven thousand members I believe? The American School of Oriental Research that focuses on near eastern archaeology, has upwards, roughly maybe a thousand members. A minority of these people, say 10%, are heavily engaged in data sharing, then that is actually quite busy. It is a matter of, we don’t need to convince everybody, we just need to convince a sufficient number of people that this is an important issue. (Kansa, Interview 2014)

The informants were explicit that negotiating with this community was critical in allowing them to seek out new supporters to build relationships – both with academia and industry – and to also diffuse the innovation through the expansion of its user base:

My first public talk on IPython was actually at the scipy in 2003… So that was a little bit less than 2 years after starting the project…. And when I attended the conference it was really interesting because I found a community of like-minded people basically. …. They might be in astronomy; they might be in medical imaging; they might be in electromagnetics, in computer science, in geophysics -- but they were all people who happened to care enough about open tools to spend some of their own time and their own money out of their own pockets to go to these things….I really got involved with trying to build an ecosystem and not just IPython. There were a few of us; it was a small group of like-minded people…. So we all sort of banded together and ‘Let’s build this thing.’ And we would do whatever needed to be done – if it was teaching a workshop, writing a library, helping someone on the list – this smallish coalition of people would kind of jump in and do everything. (Pérez, Interview 2014)

Some of my involvement around other kinds of digital humanities or open science projects, hard to pin it down as science in the archaeology world, straddle the humanities and natural sciences and social sciences, but a lot of what I do is working with other teams that are doing different kinds of things with the open web. I see it as pretty strategic in the sense that it helps build ties with other types of projects and that people get invested a little bit in what we do with Open Context because we invite in/try to incorporate someone else’s API to bring in related data. We work with other systems that publish data and we reference them as linked data. What is useful about that is that other people know about us. It is also basically reciprocity that because we, essentially endorse what they are doing by trying to incorporate what they are doing into our project, then that’s a positive outcome for them, but it also means that they value what we are doing as well. (Kansa, Interview 2014)

For Pérez, financial resources were critical both to acquiring human resources and to diffusing IPython through the community:

These things have given us real capabilities to hire people, to travel to conferences and workshops, etc. The things money can buy and which when you the right people and the right things to do with money is very useful. (Interview 2014)

The informants forged alliances with collaborators to support the building of their innovation. After developing the initial version of Jmol, Gezelter enlisted collaborators as lead programmers, characterizing the building of Jmol as working on a ‘Doctor Who’ model (Gezelter, blog November 7, 2012). This development of a virtual and interested development community is particularly relevant for the open source scientists and builds on the broader open source development model (Schweik & Grove, 2002).

Jmol has worked by handing off lead development to different people over the years. We call it the Doctor Who model; the Dr. regenerates model. I was the first doctor and every once and a while I come back and make some modifications to the code, but not very often any more. The second doctor was a guy who worked in a pharmaceutical company. And then the third doctor was a computer science graduate student, completely outside the chemistry world, who just happened to be really expert in graphics. That’s what we needed at the time. And it has come back – the current lead develop is a guy named Bob Hanson who is another chemistry professor. It has changed its focus a lot based on the interests of who those lead developers are but it has come back, in and out of the chemistry world…. The fact that all the code is out there means that if someone even drops the ball it doesn’t really matter. Somebody else can pick it up; it’s not a big deal. (Gezelter, Interview 2014)

The informants also sought non-financial resources in the form of materials and other tools. They negotiated for material resources in the form of data for their innovations. Bradley convinced Alfa Aesar, a manufacturer and supplier of research chemicals, to donate its entire collection of melting points to the public domain: “We are talking here about over 10,000 melting points that were closed and all of a sudden were open which was really amazing” (Bradley, Interview 2014). With this initial success, Bradley actively and doggedly approached and negotiated with other companies and organizations to donate chemical information to the public domain:

We needed some melting points but that actually came from my chemical information retrieval class where students had to find 5 different sources for a given property… Alfa Aesar is a chemical vendor. I was actually contacted by their VP for Marketing. They had read the blog post and they wanted to see if they could collaborate in some manner. I took a chance and asked them if they would be willing to donate their entire collection of melting points to the public domain. And they agreed to it. So we are talking here about over 10,000 melting points that were closed and all of a sudden were open which was really amazing. (Bradley, Interview 2014)

After that, then, I was more active, I contacted more companies. No other company agreed to donate their melting points. But the EPA [Environmental Protection Agency] did. They donated all of the melting points they had to the public domain. And again, here the distinction is I could have approached them and said, “I’m interested in doing a model melting point, could I have a special license to use your melting points in my study?” … For me, the key was not donating to me but donating to the public domain. (Bradley, Interview 2014)

He negotiated with firms, emphasizing the benefits they would gain by providing materials for his research:

So, I contacted all these other companies, to tell them, “Look, there is also a way…” Like, every time I give a talk about this, I am essentially acting as … you know, I am marketing for Alfa Aesar. I mention them over and over again, because they helped us and I do whatever I can do. The other thing is that when people click on that value, there is link to that catalogue. So they, having nothing to lose from doing it. But it is not something that all companies think through. I feel I lucked out that Alfa Aesar did see it as a benefit to them. (Bradley, Interview)

The informants also used their professional networks to obtain material resources and, by publicizing these relationships, gave others an incentive to contribute further resources:

I think an early one would be that we got a pretty high profile data set early on with Petra, the great temple excavations, Martha Joukowsky’s at Brown. It was sufficiently large and diverse with a wide variety of materials, including excavation content, context, object catalogues, zooarchaeology, diaries describing the process of excavation, and lots of pictures. That was a powerful kind of testament in addition to a few of other testaments just to show that Open Context had a lot of flexibility in order to accommodate a wide set of materials that are common on archaeological projects. (Kansa, Interview 2014)

So, I contacted all these other companies, to tell them, “Look, there is also a way…” Like, every time I give a talk about this, I am essentially acting as … you know, I am marketing for Alfa Aesar. I mention them over and over again, because they helped us and I do whatever I can do. (Bradley Interview 2014)

Institutional work: Advocacy

Advocacy is defined as the “mobilization of political and regulatory support through direct and deliberate techniques of social suasion” (Lawrence & Suddaby, 2006, p. 221). Institutional entrepreneurs can draw on regulatory systems to create new rules and/or provide support for change (Garud, et al., 2002). Within the scientific community, the informants actively engaged in advocacy with organizations and research granting agencies to garner resources for open data initiatives. The informants advocated along two themes: the lack of funding mechanisms by research agencies to support open data and source activities, and the need to change the norms of how open data and source achievements are recognized. Informants also actively engaged in advocacy at the societal level as described in Section 5.4 below.

The informants worked within the norms of science organizations and structures in order to advocate for open science from a place of accepted legitimacy. For example, Bradley became co-editor of Chemistry Central Journal and gained experience in seeing the “whole process of how much data authors submit and what is acceptable by reviewers, editors, and that’s been very eye opening. There are a lot of problems with the system because you are not required to share your raw data” (Bradley, Interview 2014). He also made the strategic choice to accept an invitation to join the Advisory Board of Chemical and Engineering News, noting that the appointment allowed him to attend meetings in Washington D.C. “where important trends in chemistry and the focus of the magazine are discussed” (Bradley, Blog, June 18, 2009). From 2007-2008, he provided a regular column on ScientificBlogging.com called Chemistry Wide Open.87 In 2011, Bradley co-moderated a session on Open Notebook Science at the Science Online conference that attracted “a large proportion of people advocating Open Science” (Bradley, Blog January 17, 2011).88 Such roles gave him positions of authority from which he could further advocate for open science:

As Co-Editor-in-Chief at Chemistry Central Journal, Bradley also has concerns about scientific data and reproducibility issues. He said that the standards for what and how much information is needed by journals varies widely by field. “The reality is,” he admonished, “that the editors have been pretty lenient about how much information should be required before it is accepted.” Essentially, a lack of standards and established benchmarks within and across disciplines is a problem. In the next section, I will argue that a mandated, centralized model can help to fix that problem. (Bohle, 2014 SciLogs)

For Murray-Rust, advocacy at the meso- and macro-levels became critical institutional work:

Today, therefore, the main challenge is to persuade researchers and publishers to share their data, which is why Murray-Rust is now a passionate advocate of Open Data - a cause to which he spends an increasing amount of time, involved in activities such as lobbying publishers, educating researchers, and alerting the world to the issue via his blog. (Computer Weekly, 2008)

87 http://www.science20.com/chemistry_wide_open

88 http://usefulchem.blogspot.ca/2011/01/science-online-2011-thoughts.html#links


As the informants were conducting institutional work at the meso-level to legitimize their open data innovations in the mid- to late 2000s, government policy makers in both the UK and the US were communicating interest in ensuring that the results of nationally funded research would be accessible to the general public. Several arguments and resulting initiatives directly connected national public accountability to the need for access to research results (Lasthiotakis et al., 2015; de Silva & Vance, 2017). The UK e-Science programme, established in 2001, was the first coordinated initiative to involve all the research councils in supporting the development of cyberinfrastructure in science and engineering, and to encourage the development of middleware. In 2003, the NIH released its statement on sharing research data, under which investigators submitting a grant proposal seeking $500,000 or more were expected to include a data sharing plan (NIH, 2003a, b). In 2004, the science ministers from both governments formally pledged support by adopting the OECD Declaration on Access to Research Data from Public Funding.89 In the US, the House Appropriations Committee adopted recommendations for the 2005 federal budget calling for the development of a policy by the NIH requiring free online access to articles based on NIH-funded research. In addition, at around the same time, both countries established organizations to support the cyberinfrastructure needed for open data sharing. The US National Science Foundation (NSF) established the Office of Cyberinfrastructure (OCI) to coordinate e-infrastructure programmes and to fund information technology infrastructure research and training, including access to data (NSF, 2006; OCI, n.d.).

The long-term commitment of funding agencies to open data was critical for the informants. They spoke of the ‘unfunded mandate’ implied by agencies that mandate data management plans without financially supporting the mechanisms to make data openly accessible, and of norms of scientific publication that do not acknowledge the importance of open data:

The harder issue about this is that most of the open data, at least in the US, is an unfunded mandate associated with research. An unfunded mandate is a requirement but there are no resources for the implementation, no additional financial resources for the implementation. My big worry is that we get a lot of data being thrown into repositories which is more or less useless because no effort has been put into their curation because there is nothing to fund that effort. That’s where I get a little concerned about this. We can’t just clap our hands and say ‘yeah, we won’ with open data or open access, there is a huge concern about the sustainability of all of this and the meaningfulness of it, in the sense …. We definitely showed with the animal bones in Turkey

89 http://acts.oecd.org/Instruments/ShowInstrumentView.aspx?InstrumentID=157


that it takes a tremendous amount of work to actually bring different researcher data sets to a state that can be compared in a meaningful way. That is not free. That requires some support that doesn’t really exist institutionally. … There is a wonderful new world, people getting excited about data, open data, wanting to work with it. It seems like a very high profile sort of thing. But, the sort of long term sustained support of all of that is something that there has not been commitment towards yet. (Kansa, Interview 2014)

So, it was basically this hard attitude: that is not scholarly work. If you have to do it, do it. But then come back and write more papers. That’s what matters because that is what matters. Even though someone like David Donoho was making these arguments, purely on scientific grounds, that this is where scholarship and the value of science was, a large fraction of the establishment that controls funding, that controls tenure decisions, hiring decisions, etc., goes back to saying, “Well, show me how many papers in Nature, Science.” … That mindset is changing only very, very, very slowly. The NSF recently changed its guidelines for CVs to indicate that now. Instead of being called relevant publications it is called relevant research outcomes. They explicitly say that can be publications or it can be software, etc. That was a hugely valuable signal by the NSF that says that your 2-page summary statement of who you are in the sciences, is now not a summary of your publications anymore; it is a summary of your contributions. That shift is important but it is very recent. (Pérez, Interview 2014)

The existing blogs and avenues in which the informants appeared became vehicles for advocacy. In a Nature article, Murray-Rust noted that, “I also started a blog and have found that campaigning for Open Data has become one of the main themes” (Murray-Rust, 2008). The informants directly lobbied funding agencies for support for open data:

Research Councils UK and the Higher Education Funding Council for England have agreed to work together to advance the transition to open-access publishing of research. Peter Murray-Rust, an open-access advocate and a chemistry researcher at the University of Cambridge, urged funders to enforce their mandates more rigorously. “The reality is that unless scientists are forced to comply [by a financial penalty or some other sanction], many of them will not,” he said. (Times Higher Education – Corbyn, 2010)

Through his institutional work and advocacy to change the ‘unfunded mandate’ of open data, Kansa was successful in institutionalizing Open Context at the meso-level given that, as noted above, the NSF includes mention of Open Context as an example of an archaeological data repository. Kansa was also explicit that, in order to gain material resources in the form of archaeological data for Open Context from highly legitimate actors such as the German Deutsches Archäologisches Institut, what was required was “Just lots of advocacy,” (Kansa, Interview 2014).

The informants participated in events or created events at the meso-level that included funding agencies and legitimate actors, broadening the scope of their advocacy beyond their own innovation (Bradley, Blog February 8, 2008; Gezelter Blog 23 November 2009). For example,


speaking as part of panel discussions that included representatives from the National Science Foundation, Bradley argued that funding agencies should see the benefits of open science as a “higher return on investment - in terms of knowledge gained and shared with the scientific community - as well as the wider population ultimately footing the bill…. Funding agencies can help transparency by making it clear that the whole truth is more valuable than a subset of the truth presented in a way that might be conveniently misleading” (Bradley, Blog February 8, 2008).

Scientific community meso-level institutional work outcome – Legitimation and diffusion

At the meso-level, analysis revealed that the informants conducted several types of institutional work. Similar to the institutional work at the micro-level, the informants created standards of practice and mobilized resources but expanded their scope. They also continued the critical work of theorization, with an expanded set of themes that would resonate with the more diverse scientific community. They further developed their innovations to align with the needs and practices of the scientific community, associating their innovations with taken-for-granted practices. Distinct from the work at the organization micro-level, the open data entrepreneurs engaged in institutional work in the forging of new relations, alliances, and associations as well as advocacy for resources with granting agencies and organizations.

Similar to their institutional work at the organization level, at the level of the scientific community, the informants created standards of practice, including creation of open data principles, training, competitions, financial supports and awards. These efforts expanded the network of users for the open data innovations, as well as supporting their legitimacy through increased use, and acknowledgement within the meso-level. The informants also actively mobilized and advocated for financial, material and human resources through the scientific community, including research granting councils, industries, and professional associations.

As discussed in Section 5.2, an analysis of the interviews and archival documents shows that the work of the informants to advance institutional change was iterative between the micro- and meso-levels. Soon after designing and implementing their open data innovation for their own teaching and/or research, the informants engaged in institutional work within the scientific community, actively seeking ‘like-minded’ open data practitioners (both in their disciplines and


beyond) in order to diffuse their innovation as broadly as possible. The informants continued to publish their scientific work in academic journals and present at conferences and workshops while at the same time incorporating their open data innovations or presenting on them separately. The work at the micro-level was critical as their own research and teaching in their organizations provided them the ‘right to voice’ (Maguire & Hardy, 2008) at the meso-level. Scholarly publication at the meso-level also allowed the informants to raise global scientific interest in their academic work and to build a position of legitimacy for both their scholarly and open data endeavours.

The data analysis revealed that the majority of the critical institutional work occurred at this level as the open scientists worked to legitimate and diffuse their open data innovation. Through their institutional work at the micro- and meso-levels, the informants were successful at the meso-level in having their innovations legitimized and adopted for use within the scientific community. Kansa’s Open Context is referenced for research data management by both the National Science Foundation (NSF) and the National Endowment for the Humanities (NEH) (Kansa, 2012, p. 499). A 2013 survey of its users conducted by IPython revealed that it is primarily used in the US, with 455 users reported from 48 countries.90 The book and online project, Mining the Social Web (Russell, 2013), profiles almost 130 examples with the IPython Notebook.

At the meso-level, the informants themselves were also acknowledged as legitimate leaders in the open data movement. In 2011, Murray-Rust and collaborator Harry Rzepa were recognized by the American Chemical Society’s Division of Chemical Information for their work in advancing new ways to collaborate and exchange chemical data.91 In 2012, as the creator of IPython, Pérez was awarded the annual Award for the Advancement of Free Software by the Free Software Foundation,92 a non-profit with “a worldwide mission to promote computer user freedom and to defend the rights of all free software users”, which noted that: “In addition to technical excellence,

90 https://ipython.org/usersurvey2013.html

91 http://www.ch.cam.ac.uk/news/peter-murray-rust-wins-herman-skolnik-award

92 https://www.fsf.org/about/


Fernando's skill as a leader has attracted a large and growing group of contributors to IPython”.93 The Archaeological Institute of America (AIA) named Open Context the 2016 recipient of its Award for Outstanding Work in Digital Archaeology.94

5.4 Societal Macro-level: Policy Change

The informants conducted institutional work at the societal macro-level and participated in discourses on openness that extended beyond the scientific community, connecting to themes such as openness and transparency in society. This level was labelled the macro-level, aligning with the institutional macro-level described by Lawrence & Suddaby (2006) as including societal institutions concerned with the role of, for example, family, gender and religion. At this level, the informants conducted institutional work in: 1) theorization, 2) connecting to macro-level open science discourses, and 3) advocacy. The main outcome of this level of work by the informants was policy change for open data practices in general. Through their institutional work at the meso-level and then at the macro-level, three of the informants gained exposure and became sought after as acknowledged leaders in the open data movement at the societal level. The institutional work at the societal macro-level is summarized in Table 8 and discussed below.

Table 8: Policy change: Macro-level societal institutional work

Institutional Work

Theorization: Specifications and justifications for the need for change
  Open data and science important for society and scientists; distinction from closed corporate science
  Validity and reproducibility important
  Digital enlightenment
  Fighting for societal rights; revolutionary

Connecting with macro-level discourses
  Results of research should be easily and inexpensively accessed by the public
  High costs in place by large academic publishers are wrong

Advocacy
  Public statements of support
  Help shape government policy
  Aligning with very legitimate actors

93 https://www.fsf.org/news/2012-free-software-award-winners-announced

94 http://ux.opencontext.org/2016/01/20/open-context-wins-award-for-outstanding-work-in-digital-archaeology/


Institutional work: Theorization

At the societal level, the justifications and specifications presented by the informants in theorizing the need for open data built on the narratives provided at the scientific community level. These included theorizing the moral and ethical reasons for practicing open data more broadly, as well as for pragmatic reasons of efficiency in the progress of science. They also specified open data as taking advantage of the new information and communication technologies, and embodying the future orientation of open science. In addition, at this level, the informants increased focus on specifying open data as enabling knowledge as a public good and as a communal benefit to humanity, as well as highlighting the publicly-funded nature of research that should be easily and freely available to the public.

Building on their work with the media at the meso-level, the informants provided interviews on open data to the media, both traditional and new forms of social media, aimed at the broader society beyond the scientific community. They connected with mainstream general news and science media outlets such as Scientific American (Bradley in Waldrop, 2008), Drug Discovery Today, Information Today (Bradley in Poynder, 2010), National Geographic (Bradley in Coturnix, 2008), National Public Radio, ABC Radio National Australia (Gezelter in Funnell, 2010), and various local media and online fora (see Appendix 4: List of Analyzed Documents) including their own blogs. In addition to presenting at public venues, they placed videos of their presentations and interviews on YouTube, which reached a broad public audience.

They presented at meetings and conferences beyond those of the scientific community. For example, Murray-Rust sought out opportunities to present outside of academia, such as at Wikimania in 2014 with hundreds of citizens in the audience95 (Murray-Rust, Interview 2014); Bradley presented on “Accelerating Discovery by Sharing: a case for Open Notebook Science” at the National Breast Cancer Coalition Annual Advocacy Conference in 2011 (Bradley Blog).

The informants continued to justify open data in terms of its pragmatic benefits, as they had at the individual, micro- and meso-levels:

95 Wikimania 2014 – Peter Murray-Rust presentation https://commons.wikimedia.org/wiki/Category:Peter_Murray-Rust_at_Wikimania_2014


One of the ideas of doing a more open form science is that there are a lot of eyes looking at the same experiment, and every once in a while, somebody from a totally different field will come along and say, 'Oh, you should just try that', and it turns out that 'just trying that', which is something that you hadn't thought of, is a really great idea. If you close yourself off and don't publish and don't release information about what you're doing, you're missing out on that piece of science. So yes, I mean I think there are a number of advantages for doing science in a more open fashion, not only to the community, but also to the individual researcher. (Gezelter in Future Tense ABC Radio National, Radio Australia - Funnell, 2010)

The productivity gain achieved by not having to unnecessarily repeat experiments or struggle with technical issues, and by the higher guarantee of reproducibility, will allow more science to be done. (The Scientist Magazine – Murray-Rust and Brooks, 2011)

"It's sort of going away from a culture of trust to one of proof," Bradley says. "Everybody makes mistakes. And if you don't expose your raw data, nobody will find your mistakes." (in Science, Wald, 2010)

While we're really excited about Web technologies, lots about what we do with Open Context is not really about technology. We're also trying to promote more openness and collaboration in the research community, and also greater positive recognition for researchers who share data back to the larger community. This incentive issue is a big challenge, since many archaeologists hesitate to share their data since they worry they won't get any credit for all their hard work. That's why we're making sharing data a form of peer-review publication. Since it helps to both make higher-quality more usable data and it helps give researchers the right kinds of kudos. (Kansa in The Alamedan - Ellson, 2013)

Similar to the institutional work observed by Hardy and Maguire (2008) in their study of HIV/AIDS advocacy entrepreneurs, the informants constructed rationales for institutional change as appropriate and discredited existing practices in ways that would resonate with the public.

If you're doing research and not sharing it, and not sending out your experiments, or not publishing, or not providing enough details for other people to reproduce it, in some sense you're really not doing science any more, you're doing research and development for a corporate entity. Science is more about increasing the global total of knowledge. (Gezelter, Future Tense ABC Radio National, Radio Australia - Funnell, 2010)

At the societal macro-level, the scientists also used powerful and evocative statements to connect through rhetoric. They characterized open data as “liberation software…. to liberate knowledge” and part of the “digital enlightenment” (Murray-Rust, 2014, FWF talk; Interview, 2014), and as “communities of open source representing the ideals of science” (Pérez, Interview 2014). Kansa spoke of making archaeological data accessible as the “only way we’ll ever understand our history and our origins” (in Ellson, 2013), and Bradley cited open notebook science as making research “available to the rest of the world” (Bradley, 2006).


Institutional work: Connecting with macro-level discourses

At the societal level, three informants connected with “broad discourses and associated sets of institutions that extend beyond the boundaries of any institutional field and are widely understood and broadly accepted in a society” (Lawrence & Phillips 2004, p. 691). Murray-Rust and Kansa, in particular, tapped into macro-level discourses related to the high cost of scholarly publishing.

As discussed in Section 5.3, at the time that the informants were working to legitimize their open data innovations in the mid- to late 2000s, policy makers in both the UK and the US had shown increasing interest in ensuring that the results of nationally-funded research would be accessible to the general public. Several arguments and resulting initiatives directly connected national public accountability to the need for access to research results (Lasthiotakis et al., 2015). One particular high-profile public discourse emerged that related to the costs and restrictions of scholarly publishing. The charges by large academic publishers such as Elsevier for access to individual articles and subscription costs were criticized with harsh characterizations such as “the most ruthless capitalists in the western world” (Monbiot, 2011) in mainstream media such as the Guardian. Universities in the US had singled out Elsevier and other large academic journals for their high pricing policies.96 The UK House of Commons Science and Technology Committee 2004 Report noted that, “[w]e are convinced that the amount of public money invested in scientific research and its outputs is sufficient to merit Government involvement in the publishing process” (House of Commons, 2004). In the US, in 2003 the Public Access to Science Act was followed in 2006 by the introduction of the bipartisan Federal Research Public Access Act (FRPAA) (Garson, 2008).

Kansa, Murray-Rust, and to a lesser extent Bradley, tapped into the discourse related to the costs of scientific publishing and intellectual property barriers that did not allow the public to easily access research results. They specified the lack of access to scientific outputs that resulted from these practices: Kansa blogged on influential sites such as that of the London School of Economics, using strong rhetoric to characterize open proponents as “the rebels blasting against the exhaust

96 Faculty Senate minutes February 19 meeting Stanford Report, 25 February 2004 http://news.stanford.edu/news/2004/february25/minutes-225.html

vents of Elsevier’s Death Star” (Kansa, 2014).97 Murray-Rust spoke out against Elsevier specifically through blogs, articles, presentations (e.g., Presentation, 2013, from Columbia University), and several mainstream media interviews:

An academic is asking researchers and librarians to send him more examples of cases where open access article fees have been paid to the publisher Elsevier but the article in question remains behind a paywall. The call has been made by Peter Murray-Rust, reader in molecular informatics at the University of Cambridge, after Elsevier admitted it had charged some people to reuse articles published with open licences… But, on his blog, Dr Murray-Rust, a critic of Elsevier’s approach to open access and data mining, dismisses the response as inadequate: “This problem is of similar seriousness to faulty electrical goods where responsible companies will advertise in the national press [to alert customers to the problems].” (Chronicle of Higher Education - Jump, 2014)

A huge part of this is to find a sustainable way to finance our work in publishing open data. The hard part of data, is that if you start restricting access or create intellectual property barriers by charging for data, you end up breaking and ruining lots of the value of data. Scientific data works best as a "public good," free for anyone to enjoy and reuse. And like most public goods, that requires some form of public financing to create. But let's face it. The public (taxpayers) fund research, including archaeology. Currently though the vast majority of that research results in private intellectual property, in the form of articles that cost $35 or more to even look at. These articles don't even have the raw data on which conclusions are based! So, if we don't support open data in the science, we're throwing away scarce public money, and worse, public financing of research ends up to ever more limited hands of the big commercial scientific publishers. So, what’s next for us is going to be continued advocacy to help make sure the public actually benefits from their support of science! (Kansa -The Alamedan - Ellson, 2013)

By connecting to this larger discourse, these informants were able to gain access to mainstream media and, concurrently, advocate for open data and open science.

Institutional work: Advocacy

As described in the meso-level Section 5.3, within the scientific community, the informants engaged in advocacy for their own open data innovations as well as open science in general. They advocated for funding mechanisms by research agencies to support open data and source activities, and recognition for open data. At the societal level, the informants continued to theorize and engage in social suasion for resources and recognition for open data, as well as to

97 http://blogs.lse.ac.uk/impactofsocialsciences/2014/01/27/its-the-neoliberalism-stupid-kansa/ It’s the Neoliberalism, Stupid: Why instrumentalist arguments for Open Access, Open Data, and Open Science are not enough.

advocate for changes towards open data as a way to fight the high cost of scholarly publishing, as well as for open science in general. As part of their institutional work at the macro-level, the informants connected with highly legitimate policy, political, and corporate actors to advocate for policy change and to enhance the legitimacy of open data initiatives.

The informants theorized that open data was critical for the future conduct of science, but were also clear that institutional change required advocacy:

Open Data in science is now recognized as a critically important area which needs much careful and coordinated work if it is to develop successfully. Much of this requires advocacy; it is likely that when scientists are made aware of the value of labeling their work, the movement will grow rapidly. (Nature – Murray-Rust, 2008)

Today, therefore, the main challenge is to persuade researchers and publishers to share their data, which is why Murray-Rust is now a passionate advocate of Open Data - a cause to which he spends an increasing amount of time, involved in activities such as lobbying publishers, educating researchers, and alerting the world to the issue via his blog. (Computer Weekly, 2008)

In relation to scholarly publishing, Murray-Rust successfully advocated for open access for data to the UK science minister and UK university leadership (Jump, 2014). In mainstream and highly legitimate media such as the Times Higher Education, he publicly challenged established scholarly publishers such as Elsevier on the ability to content mine scientific articles without seeking permission from publishers. From his blog, he advocated for financial penalties for scientists who did not comply with open access publishing of their research (Corbyn, 2010). At the macro-level, the tone of his advocacy communications included themes of combat – portraying those who advocated for open science as fighting for societal rights:

Academics highlight other concerns over the control journals exert. Murray-Rust puts them in the dock over the copyright restrictions they impose, describing them as a "major impediment" to progress. As an example, he points to the way so-called text or data mining - the use of technological tools to extract and tabulate data automatically from online papers - is becoming "increasingly expressly forbidden" by most major subscription-based publishers (although Nature has recently changed its policies to allow some). "You are actually barred from using modern techniques to enhance your science ... it has taken us back ten years in the use of scientific information," Murray-Rust says. (Corbyn 2010 – Times Higher Education)

Access to public knowledge is a fundamental human right…. “Lack of information to scholarly information means people die… I’ve spent half my life fighting. We are fighting in Europe because the publishers are trying to tie down text mining. We are in the middle of a digital revolution. And we are seeing a split between the cultures. And the young people are not accepting what we have offered them. (Murray-Rust, Columbia 2013 talk)


Top universities, working together, could force the reform of copyright laws, Murray-Rust believes, but, given their inaction, he thinks that a better answer might be "civil disobedience on a mass scale". (Corbyn 2010 – Times Higher Education)

Last year I applied for a Shuttleworth Foundation Fellowship… And, in March 2014 I was awarded one (This was my fifth job in open source). We are going to extract 100 million facts from the literature whether or not the publishers like it, because we've had the law changed. (Opensource.org – Murray-Rust, 2014)

The informants associated with highly legitimate macro-level actors at the policy and government levels. In 2005, Murray-Rust was asked to join the Advisory Board of the international, nonprofit Open Knowledge Foundation (OKF),98 a “worldwide non-profit network of people passionate about openness, using advocacy, technology and training to unlock information and enable people to work with it to create and share knowledge”.99 In the late 2000s, the open movement had gained prominence in the UK, including in mainstream media, as the Guardian newspaper campaigned to “Free our Data”.100 Kitchin (2014) stated that the shift to openness had been facilitated by influential lobby groups such as the Open Knowledge Foundation.

Bradley, Kansa and Gezelter took advantage of invitations to the White House as part of its open science initiatives, and highlighted alignments with their own open data innovations:

I was invited to visit the White House over the summer, because they have this open science committee. One pretty powerful, well, perhaps powerful is not the word, but passionate group of people that are demanding science to be more open are patients. Where they can’t freely access journals that are relevant to their disease. They have to pay $40-60 just to read an article that may turn out to be completely irrelevant. And they figure, those are my tax dollars that paid for that study, why shouldn’t… So, that falls more under the category of open access journals. But, I mean, honestly, as kind of a poster child for openness, you push it even further… Imagine, that these researchers could in fact have access to the lab notebooks. … As long as we can find a way to run the assays openly. …They could see the results of the assays. And they wouldn’t have to be scientists. (Bradley, Interview 2014)

98 https://blogs.ch.cam.ac.uk/pmr/2011/07/04/the-open-knowledge-foundation-builds-its-organizational-dna-okcon2011-jiscopenbib/

99 https://okfn.org/about/ Formed in 2004. Now known as Open Knowledge International. One of Murray-Rust’s collaborators, Dr. Rufus Pollock, was one of the co-founders of the OKF and sits on the UK government Public Sector Transparency Board https://www.gov.uk/government/groups/public-sector-transparency-board#members-of-the-public-sector-transparency-board

100 http://www.theguardian.com/technology/free-our-data


Several of the informants developed important contacts amongst policy circles. The profile that Pérez had developed for IPython at the meso-level helped him to legitimate it with key actors in the public and corporate sectors. He described a personal contact with the director of the Alfred P. Sloan Foundation's Digital Information Technology program, whom he was able to approach with “By the way, I have this project in which I think you might be interested in,” an approach that contributed to the development and funding of the Berkeley Institute for Data Science (BIDS), founded in 2013 (Pérez, Interview 2014).

Societal macro-level institutional work outcome – Policy change

At the societal level, four informants conducted institutional work by theorizing, connecting to macro-level discourses, and advocating. Their efforts were directed towards policy and regulatory changes to support open data and open science in general.

Four informants (Bradley, Kansa, Pérez and Murray-Rust) actively engaged in institutional work at this level. Gezelter had noted that he was focused on successfully completing the tenure process at his university (for which, he noted, the design of JMol and OpenMD would not necessarily be considered a scholarly achievement) as well as on his family and young children. Given these responsibilities, he may not have had the time at this point in his career to devote to macro-level institutional work.

Through their institutional work at the meso- and macro-levels, Murray-Rust, Kansa, Pérez, and Bradley gained exposure and became sought after as acknowledged leaders in the open science movement at the societal macro-level. Bradley was described as a pioneer of the open notebook science ‘movement’ (Coturnix, 2008) and Open Notebook Science was described as a ‘radically transparent’ approach in Scientific American (Waldrop, 2008). In 2014, Kansa was acknowledged by the White House as an open science leader – a Champion of Change for Open Science – “dedicated to better ethics and practices in sharing and preserving knowledge of the past”.101 Both Bradley and Gezelter were also acknowledged as open data advocates and were invited to present at and attend the Office of Science and Technology Policy recognition event for the Champions of Change for Open Science.

101 http://www.whitehouse.gov/champions/open-science/eric-kansa,-ph.d

This chapter described the findings of the institutional work of informants at four institutional levels. A summary of the research findings is presented in the next chapter, and findings are related to the literature and theoretical framework. Conclusions relating to the research questions are summarized and discussed, as are implications and possible future research directions.

Chapter 6 Conclusions and Implications

This chapter summarizes the findings of the research, noting responses to the research sub-questions, and frames the implications of the study. The first section of the chapter presents the main research findings in terms of the institutional work of the open data entrepreneurs, the multilevel nature of their institutional work and its iterative nature. The key findings are discussed in the broader context of institutional work as related to the literature. The second section presents the main conclusions and implications for policy contexts.

6.1 Summary of the Findings

Data analysis of the interviews and archival data revealed that all five informants purposefully engaged in institutional work at the individual level, the organization micro-level, and the scientific community meso-level. Several informants also conducted significant institutional work at the broader societal macro-level. The findings suggest that open data entrepreneurs conduct a variety of institutional work at multiple institutional levels, with some types occurring across levels and the greatest effort expended at the level of the scientific community. The focus of the outcome of the institutional work is distinct at each level: opportunity recognition at the individual level; development and establishment of an innovation at the organization level; legitimation and diffusion at the scientific community level; and policy change at the societal level. In addition, the findings suggest that the institutional entrepreneurs are adept at conducting institutional work in an iterative manner between the organization, scientific community, and societal levels in order to advance institutional change for open data practices within the institution of science.

6.1.1 The Institutional Work of the Open Data Entrepreneurs

Institutional entrepreneurs are considered the actors that change existing practices and/or introduce new practices, beliefs or values and then ensure that these become adopted more widely by other actors in the field (Battilana, et al., 2009; Hardy & Maguire, 2008). To understand the institutional work of the open science entrepreneurs, the first research question posed was: What is the institutional work conducted by open data entrepreneurs in order to institutionalize an open data innovation? Through the data analysis of the interviews and archival documents, the findings indicate that Bradley, Kansa, Gezelter, Pérez and Murray-Rust purposefully conducted institutional work as they sought to change existing practices, introduce innovations, and have these adopted as a way to change the institution of science towards the use of open data practices. The open data entrepreneurs engaged in ten distinct types of institutional work: 1) theorizing; 2) counterfactual thinking; 3) disassociating with traditional modes; 4) creation of standards of practice; 5) mobilization of resources; 6) naming and creation of new symbols; 7) forging new relations, alliances, coalitions and associations, membership; 8) association with taken-for-granted practices; 9) advocacy; and, 10) connecting with macro-level discourses.

Theorization

Greenwood, et al. (2002) argued that theorization is the critical work through which institutional entrepreneurs render their ideas comprehensible and compelling to others. It was clearly a key form of institutional work for the open data entrepreneurs, by which they justified and specified the need for change towards open data. Although they engaged in this form of institutional work at each institutional level, they increasingly broadened the associated justifications for open data in a way that would resonate with members of their organizations, scientific communities, and society.

At the individual level, the open science entrepreneurs were explicit about their own reflections and thought processes as they began to consider their open data process or tool. Although they were at different points in their academic careers, their motivation to create an open data initiative was rooted in their own research experiences and their frustrations with traditional science, in
particular, the lack of open sharing of data. As change agents, institutional entrepreneurs perceive institutional complexities, ambiguities, and contradictions (Seo & Creed, 2002; Thornton & Ocasio, 2008), and justify how new practices are indispensable, appropriate, and valid (Hardy & Maguire, 2008; Rao, 1998). The open data entrepreneurs highlighted the contrast between idealistic Mertonian principles of science and the reality of the closed nature of science in terms of data sharing. Specification focuses on the need for change and can be broken down into components that include the identification of the problem and an account of why it is important (punctuation) and the diagnosis of the problem in terms of how it has come about (elaboration) (Battilana, et al., 2009; Hardy & Maguire, 2008). In analyzing the data for the reflections of the open science entrepreneurs at the individual level, four main justifications were identified. They specified the need for change in that they identified aspects of the problems associated with traditional science and why these are important (punctuation) including: 1) concerns that the traditional publication of research data was not serving science and society morally in terms of transparency and verifiability; 2) pragmatic issues with traditional methods that impeded collaboration, data evaluation and progress in science; 3) the lack of a formalized structure for sharing data; and, 4) a diagnosis of how the closed nature of science had come about. An important aspect of specification is elaboration of the problem in terms of how it has come about (Greenwood, et al., 2002). The open data entrepreneurs diagnosed the problem, speaking to the history of traditional science in their fields and how the closed nature of science had evolved, including the competition among scientists for citation and recognition, as well as the lack of formalized and easy-to-use practices and infrastructure for open data in their particular area of study.

The open data entrepreneurs specified their motivation and justified why it was important to use open science methods including: 1) scholarly altruism for the betterment of science including drawing on Mertonian principles for the openness and transparency of science; and, 2) a way to deal with their professional frustration with the lack of a formalized and accessible way for data and source sharing for their own particular scholarly needs. The first rationale aligns with research on data sharing by scientists that has identified scholarly altruism as a predictor of data sharing (Kim & Stanton, 2016). The open data entrepreneurs justified open data as an integral component of the conduct of science, both critical and valid in supporting its moral and pragmatic value.

At the organization, scientific community, and societal levels, they continued to justify the need for open data, drawing on themes similar to those expressed at the individual level. Demonstrated conformity of an innovation with existing values is an important feature of theorization (Battilana, et al., 2009; Greenwood, et al., 2002; Hardy & Maguire, 2008; Perkman & Spicer, 2007, 2008; Tracey, et al., 2011), and the open data entrepreneurs continued to specify moral and pragmatic issues of traditional science, and to justify open data for research transparency and efficiency. In addition, at the organization and scientific community levels, they justified the appropriateness of their own proposed solution.

However, the open data entrepreneurs expanded upon these specifications in ways that were meant to resonate with the different audiences at these levels. In emerging fields, where shared norms do not yet exist, Maguire, et al. (2004) proposed that institutional entrepreneurs frame a variety of reasons in order to satisfy a group of diverse stakeholders, an approach that is more influential than putting forward a single justification. In the case of emerging open data, at the organizational level, the open data entrepreneurs that incorporated open data in their teaching classrooms and/or research labs theorized open data in a way that would resonate with students, i.e. the benefits of open data as a learning tool and career development tool, as well as presenting open data as a natural progression in how science is conducted. At the scientific community level, they articulated a vision for change and the benefits of open data as the wave of the future, presenting themselves as rebels challenging the status quo. At this level, they also theorized the benefits of open data for citation and recognition, and the mobilization of human, material, and financial resources. They further theorized how open science aligned more closely in practice with basic principles of the scientific revolution, and presented their particular innovation as a modern way to meet the ideal of openness, distancing the conduct of scholarship from that of corporate research.

Counterfactual Thinking

Counterfactual thinking is defined as a set of cognitive processes that allows for the envisioning of unexpected or unusual approaches (Roese & Olson, 2014). The open science entrepreneurs engaged in counterfactual thinking to identify alternative open data solutions to the traditional science issues they had identified as problematic, as well as to consider how emerging information and communication technologies could be harnessed to enable open data. They
sought a match between emerging technologies and how these could be introduced to support open data in the research practices of their discipline. In particular, they actively explored open data logics that had emerged in other areas such as open source software to see if open data could solve the issues they identified in traditional data dissemination for their research. They engaged in counterfactual thinking to consider how open data could meet their own personal values and add value to the sharing of research results, to the benefit of both their discipline and society.

Disassociation from Traditional Modes

Institutional entrepreneurs “break away from scripted patterns of behaviour” (Dorado, 2005, p. 388). In seeking to establish their open science innovation, two informants incorporated their open data innovations into their existing research but disassociated themselves from traditional practices and developed new ones. Two approaches emerged: 1) cutting ties with collaborators in traditional science (Bradley), and 2) the establishment of a parallel organizational form (Kansa, Pérez). Bradley changed his field of scholarship from nanochemistry to areas of chemistry that allowed greater openness, including solubility testing. Having been discouraged from pursuing open data as part of their academic careers, Kansa and Pérez established organizational models outside of their academic departments to develop and provide a base for their innovations. Kansa established the Alexandria Archive Institute to house Open Context, and Pérez developed an open source organizational model to support the development of IPython.

Mobilization of Resources

Financial, human, and material resources are required to build a new initiative (Battilana, et al., 2009) and, not surprisingly, seeking out resources was critical institutional work for the open science entrepreneurs. The informants noted the time and energy they invested to garner resources at the organizational and scientific community levels.

Each of the open data entrepreneurs found different ways to assemble resources at the organization and scientific community levels. Bradley and Gezelter used existing research grants to support their open initiatives. Government funding from the timely e-Science initiatives launched in the UK, as well as support from his academic unit (the Unilever Centre in the Department of Chemistry), provided Murray-Rust with financial and human resources to build his open data initiatives. For Kansa and Pérez, the ability to raise resources was a key enabling
factor that led them to establish separate organizations to provide a structure for the development and establishment of their innovations.

The open data entrepreneurs’ efforts to mobilize resources were most successful at the scientific community level. Bradley, Kansa, Pérez, and Murray-Rust devoted significant time to seeking financial resources from non-traditional organizations, scientific societies, and the private sector to fund the further development of their own innovations, and to support their legitimation and diffusion. As a result of institutional work at the micro- and meso-levels, Kansa’s Open Context is referenced as an example of a data repository that can be used as part of a scientist’s NSF data management submission, supporting its ongoing financial sustainability. The contribution of material resources to research by Alfa Aesar was critical for Bradley’s UsefulChem and he devoted significant time to contacting other firms for similar contributions. For Open Context, the incorporation of archaeological data from a well-regarded archaeologist was a critical point for Kansa and he was active in conducting institutional work to seek out new contributors.

Naming and Creation of New Symbols

The open data entrepreneurs created names and new symbols for their innovations to assist in sharing ideas and enable a sense of identity as part of their development. Names, symbols, slogans and diagrams are important in shaping ideas regarding an innovation as well as providing a sense of identity (Zilber, 2007; Thompson, et al., 2015). Three incorporated the term ‘open’ in the names of their innovations (Open Context, Open Notebook Science, OpenMD) to reflect the theorization narrative of openness. Taglines such as the AAI’s “Opening the past, inspiring the future” and OpenMD’s “Molecular Dynamics in the Open” reflected Kansa’s and Gezelter’s theorization narratives, aligning with research on sustainability entrepreneurs that revealed that they created new symbols and slogans that encapsulated their theorization narratives (Thompson, et al., 2015).

Scientific communities are situated both within local academic organizations and globally within a scientific community. Thus, although the creation of names, logos, and slogans for their innovations occurred at the organizational level, the open data entrepreneurs considered how these would be received at the level of the scientific community. These considerations informed their choices. The names and symbols were conveyed almost concurrently at the organizational and scientific community levels.

Creation of New Standards of Practice

Formalization of new practices can be supported through institutional entrepreneurs’ efforts to put in place practices encoded through training and education programs, guidelines and standards of practice (Lawrence, 1999).

Education and training are critical for educating others in the skills and knowledge that are required to support a change (Nigar, 2013). At the organizational level, three open data entrepreneurs (Bradley, Gezelter, and Murray-Rust) incorporated their innovations and practices into their teaching classrooms and/or research labs, and both Bradley and Gezelter mandated their use. At the level of the scientific community, Kansa and Pérez developed training and educational opportunities for Open Context and IPython, respectively.

The open data entrepreneurs also created standards of practice, including creation of open data principles, training, competitions and awards. They developed guidelines for use of their own innovations. Murray-Rust co-authored the Panton Principles for Open Data in Science for the meso-level community. Pérez and colleagues mounted IPython workshops and training sessions, in addition to providing financial assistance to individuals for its further development. Murray-Rust initiated open data bootcamps. Bradley and Murray-Rust set up competitions and/or awards to recognize, celebrate, and bring attention to open data achievements.

Association with Taken-for-Granted Practices

In order for their initiatives to diffuse at the scientific community meso-level, the open data entrepreneurs considered the practices, needs and culture of the scientific community, and incorporated aspects of taken-for-granted practices and acknowledged professional elements into their open innovations and practices. They considered how their innovation would fit into and meet the needs of the scientific community in order to both enhance its legitimacy and increase the potential for diffusion. Bradley designed UsefulChem as an open notebook to mimic traditional lab notebooks and included third-party timestamps in UsefulChem to meet the recognition needs of the scientific community. Gezelter promoted the fact that parts of OpenMD code were incorporated into other open source tools. Kansa developed a model of ‘data sharing as publication’ for Open Context and also incorporated governance elements into the structure of the AAI that housed Open Context. The open data entrepreneurs were attuned to software and practices being developed within the scientific community, and incorporated and adjusted their innovations to meet its needs. The creation of standards of practice, including the development of workshops and training, as well as the establishment of competitions and/or awards to recognize, celebrate, and bring attention to open data achievements, also paralleled acknowledged practices within the scientific community.

Forging of New Relations and Alliances

The forging of new relations and alliances has been identified as key institutional work to enhance legitimacy and/or advance a change via collective action (Battilana, et al., 2009; Garud, et al., 2007; Hardy & Maguire, 2008; Ritvala & Nyquist, 2009; Stuart, et al., 1999; Thompson, et al., 2015). Institutional entrepreneurs often forge new relations with individual legitimate actors and/or may choose to act collectively by sharing responsibilities, networks, and resources in order to increase their resource-power and/or legitimacy (Stuart, et al., 1999) so that they may reshape the institution in a way that they could not have done alone.

For the open data entrepreneurs, the level of the scientific community provided them with a large and rich pool in which they could seek out like-minded individuals and establish new relations. These relationships included colleagues at other institutions, scientific associations, as well as highly legitimate actors. Their own research and teaching at the academic organization micro-level provided them with the ‘right to voice’ (Maguire, et al., 2002, p. 86; Tracey, et al., 2011) at the meso-level, wherein they cited and made use of the success of their innovation in their own research and/or teaching as proof of its appropriateness.

The open data entrepreneurs were active in cultivating networks through their engagement in a broad spectrum of academic conferences and workshops, as well as open science meetings. They described these relationships as critical for building the legitimacy of their innovation as well as its success in terms of uptake or diffusion by others. They acknowledged the media as valuable legitimizing agents and also cultivated relationships with them in order to explain their work, responding to their requests for interviews. Two informants formed alliances through their activities. Blue Obelisk was set up by Murray-Rust as an informal grass-roots organization that was open to chemists with similar open data goals, met on an annual basis and gave awards to individuals that promoted open data. By defining membership for those that wanted to promote openness, the establishment of Blue Obelisk provided legitimacy to open data, through the
“explicit expansion and visibility of the space within which the expertise related to a new practice is considered” (Lawrence, 1999). Along with like-minded individuals drawn from the scientific community, Pérez established NumFocus as a formal organization to promote open code development and reproducible scientific results as well as to be able to more easily garner resources for IPython and other open source initiatives.

Institutional entrepreneurs can enhance the legitimacy of a new practice by mobilizing support with key constituents such as highly embedded agents, and respected professionals, policy makers, government officials, experts, and associations (David, et al., 2012; Thompson, et al., 2015; Tracey, et al., 2011) who operate at the centre of a field (Battilana, et al., 2009). Analysis revealed that the open science entrepreneurs did seek to build relationships with highly legitimate actors in order to capitalize on the profile for the open data innovation they had built at the micro-level. The actors with whom they aligned themselves included acknowledged scientific leaders (for example, Bradley with Prof. Todd, Kansa with Dr. Joukowsky), organizations (Bradley with the Royal Society of Chemistry support of the Open Notebook Challenge, Kansa with the Centre for Hellenic Studies at Harvard, Carleton University, the California Digital Library, and the Deutsches Archäologisches Institut) and industries (Bradley for data contributed by Alfa Aesar, Pérez with funding from Microsoft). The informants were able to develop these important contacts and then leverage them to legitimate themselves as actors who were competent in their own disciplinary research as well as leaders in open data. The affiliations the open data entrepreneurs developed and cultivated with highly legitimate actors enhanced the legitimacy of their innovations and signaled the legitimacy of their personal reputations through the implicit sanctioning by elites and professional organizations.

Advocacy

The open data entrepreneurs actively engaged in advocacy at the level of the scientific community and three (Murray-Rust, Kansa and Bradley) also did so at the societal level.

Within the scientific community level, they engaged in advocacy in order to create new rules and support change for open data in general as well as their own innovations. They advocated along two themes: the lack of funding mechanisms by research agencies to support open data and source activities, and the need to change the norms of how open data and source achievements are recognized. The long-term commitment of funding agencies for open data was critical for the
open data entrepreneurs and they used opportunities in their writing and networking to advocate for sustainable financial support for open data. They attended or organized events that included representatives from research funding agencies. Bradley worked within the norms of scientific organizations and structures in order to advocate for open data from a place of accepted legitimacy, for example, as an editor of disciplinary journals. Kansa, for example, was successful in that the NSF included Open Context as a legitimate tool for archaeological research data management.

Connecting with Macro-Level Discourses

Three open data entrepreneurs connected with macro-level discourses that are widely understood and discussed in a society (Lawrence & Phillips 2004) as part of their institutional work of legitimizing open data and enacting policy change. This type of institutional work aligns with that observed by Tracey, et al. (2011) in their study of social entrepreneurs. Specifically, Murray-Rust, Kansa and Bradley tapped into macro-level discourses in the UK and US related to the high cost of scholarly publishing and intellectual property barriers that did not allow for easy access to research results by the public. This macro-level discourse was part of the narrative of government to support public accountability for the cost of the funding of research. For example, the UK House of Commons Science and Technology Committee’s 2004 report Scientific Publications: Free for all? concluded that a Government strategy was urgently needed to improve access to research results (House of Commons, 2004). By connecting to this discourse, these open data entrepreneurs were able to amass legitimacy for open data, connecting with mainstream media to advocate concurrently for open data and for policy change.

6.1.2 Multilevel Institutional Work

The second sub-question explored the institutional levels: What are the institutional levels at which the open science entrepreneurs conduct institutional work? The third sub-question explored the multilevel nature of institutional work: What is the institutional work of the open data entrepreneurs at different institutional levels? Coding revealed that all five open data entrepreneurs conducted the institutional work at three levels of analysis: individual, organizational micro-level, and scientific community meso-level. Only four actively pursued
institutional work at the societal macro-level. At the level of the organization, they were successful in establishing open data in areas in their teaching classrooms and/or research labs, but not within their academic departments. Two established organizations in parallel with their academic work. The entrepreneurs were successful in legitimizing their open data innovations within their research communities, as well as in supporting the institutionalization of open data as a taken-for-granted, natural and appropriate arrangement at the level of the scientific community (Greenwood, et al., 2002).

Although similar types of institutional work such as theorization occurred at more than one institutional level, the institutional work at each level was associated with a particular outcome: opportunity recognition at the individual level; design and establishment of the innovation at the organization micro-level; legitimation and diffusion at the scientific community meso-level; and policy change at the societal macro-level. The work at the micro-, meso- and macro-levels was discursive in nature. The institutional work and outcomes at each level are summarized in Table 9.

Table 9: Institutional work and outcomes at each institutional level

Level and Outcome: Institutional Work

Individual Level (Opportunity Recognition):
    Theorization
    Counterfactual thinking

Organization Micro-level (Development and Establishment):
    Disassociation with traditional modes
    Naming and creation of new symbols
    Creation of standards of practice
    Theorization
    Mobilization of resources

Scientific Community Meso-level (Legitimation and Diffusion):
    Association with taken-for-granted practices
    Forging new relations, alliances, associations, and membership
    Theorization
    Creation of standards of practice
    Mobilization of resources
    Advocacy

Societal Macro-level (Policy Change):
    Theorization
    Connecting with macro-level discourse
    Advocacy

Individual Level – Opportunity Recognition

At the level of the individual, the open data entrepreneurs engaged in the institutional work of theorization to both specify and justify the need for change for themselves and their research. They also engaged in counterfactual thinking to identify alternative open data solutions to the issues they had identified as problematic, as well as to consider how emerging information and communication technologies could be harnessed to enable open data. Based on the analysis, the institutional work of theorization and counterfactual thinking at this level was associated with an outcome of opportunity recognition for an open data innovation.

At the individual level, the open data entrepreneurs revealed their own reflections as they began to formulate their theorization for the need for an open data innovation in their research discipline, and to engage in counterfactual thinking. They may be considered as actors that introduce new ideas and the possibility of change in Suddaby & Hinings (2002); however, it was observed that the institutional work of theorization for themselves was an integral part of institutional change that occurred before they began to engage in institutional work at the organizational or scientific community levels. This observation aligns with the definition of a pre-institutionalization stage that occurs at an individual level in the institutional entrepreneurship model put forward by Greenwood, et al. (2002). The observation also aligns with the individual level institutional work findings of Tracey, et al. (2011) in their study of two social entrepreneurs in the UK. These authors observed that institutional work at the individual level (in their study, labeled as the micro-level) occurred as opportunities for change were identified by the institutional entrepreneurs. Tracey, et al. (2011) found that institutional work at the level of the individual, including problem framing and counterfactual thinking, contributed to opportunity recognition through the expression of a novel understanding of a problem and a refocusing of attention on alternative aspects of the issue.

Organization Micro-level – Innovation Design and Establishment

The open data entrepreneurs were successful at independently innovating to design and establish a solution for their own research needs at the institutional micro-level. At this level of the organization (research labs, teaching classes, academic departments/institutes), the work of the
open data entrepreneurs was focused on members of their academic organizations (undergraduate and graduate students and/or colleagues) or on the creation of their own organizations. They conducted institutional work to disassociate with traditional modes and/or establish new organizations; create new standards of practice, including training and education; mobilize resources; and name and create new symbols for their innovations. The institutional work of theorization continued into the micro-level from the individual level, with the open data entrepreneurs expanding their theorization justifications and specifications, aligning them in ways that would resonate with students and colleagues. The organizational level both enabled and constrained the work of the open scientists.

Enabled by the high level of professional autonomy afforded to them within their academic organizations (Casati & Genet, 2014), the open data entrepreneurs that were faculty members were able to use their classrooms and research labs as crucibles in which to test, design and establish the form of their innovations. Studies of institutional change in which professionals play a key role have shown that the entrepreneurs can make use of their expertise and legitimacy to challenge existing structures and formulate new ones (see Suddaby & Viale, 2011). As university faculty members, Bradley, Gezelter and Murray-Rust were able to make use of their legitimacy and authority in their own classrooms and/or labs to employ and mandate open data as part of the curricula and/or research lab processes, as well as to ‘bootstrap’ and include funding for their open data innovations as part of their research grants. For Pérez as a graduate student and Kansa as a lecturer, the university was constraining rather than enabling, leading these open data entrepreneurs not to pursue an academic career and to create their own organizations (a non-profit institute and an open source model respectively) outside of a university in order to mobilize resources and establish their open data innovations.

The institutional work of disassociation from traditional modes, creation of standards of practice, theorization, mobilization of resources, and naming and creation of symbols at the micro-level of the organization was critical in the development and establishment of their innovation. This finding aligns with a previous study of multilevel institutional work in which Tracey, et al. (2011) found that the work of building the organizational template by the social entrepreneurs was specific to the organizational level. The research contributions and success with open data within their own organizations (university units as well as those established outside of universities) provided them with a ‘right to voice’ in their institutional work at the meso-level.

However, although the faculty members were successful in enacting change at this level within their own labs and courses, they uniformly expressed frustration in attempting to engage in change within their academic departments. They described either indifference to their initiative or active discounting of open data as a valid scholarly pursuit or methodology. This also had an impact on their ability to mobilize resources within their university. Thus, the open data entrepreneurs also engaged iteratively in work within the scientific community to create standards of practice, theorize, and mobilize resources at this important meso-level. The reasons for this were two-fold: firstly, scientific communities and the legitimacy of scientific endeavours are situated at the level of the scientific community globally; and secondly, given the indifference to and/or discounting of the worth of open data as a valid scholarly pursuit within their universities, the open scientists were pragmatic in seeking allies from the larger pool of researchers at the level of the scientific community, both within and outside of their own disciplines. Given the relation of the institution of science and the university organization, the critical work of legitimating and diffusing open data at the scientific community meso-level had to be undertaken concurrently. Only once open data was legitimated at that institutional level would organizations such as universities conform to the expectations of the higher social institution of science (Hatch, 2006) to, in turn, reflect their own legitimacy (DiMaggio & Powell, 1983; Suchman, 1995).

Scientific Community Meso-level – Legitimation and Diffusion

Soon after designing and implementing their open data initiative for their own teaching and/or research, or through the development of their own organizational form, the open data entrepreneurs engaged in institutional work within the level of the scientific community. The data analysis revealed that the majority of the critical institutional work occurred at this level as the open scientists worked to legitimate and diffuse their innovation, as well as the concept of open data in general. The open data entrepreneurs were adept at conducting institutional work in an iterative manner between institutional levels of the organization and scientific community, fluidly mediating between their organizations and scientific communities in order to advance institutional change for open data practices in the institution of science.

In addition to formal publications, the open data entrepreneurs were very active in seeking ‘like-minded’ open science practitioners within and beyond their research disciplines through informal
networks or the ‘invisible college’ (Crane, 1972) in order to legitimate their innovation and diffuse it as broadly as possible. They continued to develop their innovations for the needs of the scientific community, associating their innovations with taken-for-granted practices. They also continued the critical work of theorization, but with an expanded set of justifications and specifications that would resonate with the more diverse scientific community, including highlighting the benefits of open data for citation and recognition, the ability to garner greater resources, as well as justifying open data as the future way of conducting science. Similar to their institutional work at the micro-level, at the meso-level the informants continued to create standards of practice and mobilize resources, but expanded their scope to include competitions and awards for the former and the seeking of resources from non-traditional research funding sources for the latter. Distinct from the work at the organization micro-level, they engaged in institutional work in: forging new relations, alliances, and associations; increased mobilization of both human and material resources; and advocacy for the recognition of the legitimacy of open data and the required resources.

As noted in the previous section, the open data entrepreneurs first developed and established their innovation within their organization micro-level, either within a university or a separate organization that they established and led. However, in order for their innovation to be adopted more broadly, the open data entrepreneurs considered the practices, needs and culture of the larger scientific community, and strategically incorporated aspects of taken-for-granted practices and acknowledged professional elements. Bradley mimicked aspects of traditional scientific notebooks in the further development of his UsefulChem as well as collaborating with others to develop new open data software; Gezelter incorporated and cited the incorporation of code with OpenMD; Kansa designed Open Context with an eye to the needs of the professional landscape.

Within their persuasive theorization, institutional entrepreneurs often justify their innovation on grounds of increased effectiveness and efficiency (Déjean, et al., 2004; Nigam, 2013; Ritvala & Granqvist, 2009; Thompson, et al., 2015; Tracey, et al., 2011; Zilber, 2007). As the open data entrepreneurs had established their particular innovation at the micro-level, and as their use diffused through the scientific community, they justified their innovations as appropriate by incorporating narratives of technical demonstrations of efficiency and effectiveness. They also created standards of practice, including the development of open data principles, training, competitions and awards. Their institutional work at this level aligns with the definitions of scientific
entrepreneurs by Casati & Genet (2014) as scientists who are also involved in acquiring resources to shape scientific directions and gaining legitimacy by organizing workshops, conferences, standards and building on their scientific reputation to build networks.

Institutional work related to the legitimation of an institutional change involves its ongoing, increasing objectification, that is, the gaining of social consensus concerning an innovation’s pragmatic value (Suchman, 1995). Greenwood, et al. (2002) referred to this work as diffusion: the increasing objectification of a change, and the imparting of its pragmatic legitimacy and value that leads to its accepted legitimacy. An important characteristic of this meso-level is that it served as a communication network in which researchers were connected by strong ties of informal collaboration that facilitate diffusion of information, especially of new developments (Crane, 1971). Whilst theorizing assists in mobilizing and securing support for and acceptance of institutional change (Fligstein, 1997; Wijen & Ansari, 2007), during diffusion, institutional entrepreneurs attempt to further embed their institutional change or innovation. Theorization continues, but with an emphasis on mobilizing resources and forming new alliances in order to gain objectification.

In order to further diffuse their innovation within the scientific community level, the open data entrepreneurs advocated along two themes: the lack of funding mechanisms by research agencies to support open data activities, and the need to change the norms of how open data and open source achievements are recognized. The long-term commitment of funding agencies for open data was critical for the open science entrepreneurs and they engaged in opportunities in their writings and networking to advocate for the need for sustainable financial support. They attended or organized events that included representatives from research funding agencies. Bradley worked within the norm of science organizations and structures in order to advocate for open science from a place of accepted legitimacy, for example, as editors of disciplinary journals. Kansa, for example, was successful in that the NSF included Open Context as a legitimate tool for archaeological research data management.

The data analysis revealed that the majority of the critical institutional work occurred at this meso-level as the open scientists worked to legitimize and diffuse their open data innovation within their own disciplinary and other scientific communities, including researchers and practitioners, funding agencies, and related industry sponsors. This supports the observation by
Lawrence & Suddaby (2006) that the legitimation and diffusion of an innovation requires substantial institutional work by entrepreneurs in order to persuade others to adopt the innovation, modify it as needed in order to gain legitimacy, and create new practical connections for the change. The intensity of work at this level also aligns with the observation of Casati & Genet (2014) that critical alliance building occurs across the local and global levels in science-based fields as scientific and intellectual movements are global in nature. Although the stage model of institutional entrepreneurship put forward by Greenwood et al. (2002) was not presented as a multilevel model, they proposed that theorization and diffusion occurred at a meso- or macro-level. Within the multilevel institutional work study of Tracey, et al. (2011), the authors did not describe an equivalent field-level dimension, and the main legitimation and diffusion work of the entrepreneurs was focused on the societal level.

The open data entrepreneurs were adept at conducting institutional work in an iterative manner between institutional levels of the organization and scientific community, fluidly mediating between their organizations and scientific communities in order to advance institutional change for open data practices in the institution of science. Their teaching and/or research endeavours at the organization micro-level provided them with legitimacy at the scientific community meso-level, and concurrently their institutional work at the meso-level provided legitimacy in their continued academic work and development of their innovation at the organization level. They published their scientific work in academic journals and presented at conferences and workshops while at the same time incorporating their open data innovations or presenting on them separately to open science networks. As noted above, the creation of names and logos for their innovations occurred at the organization micro-level but the open science entrepreneurs considered how the names and symbols would resonate at the scientific community level. Another example of the multilevel iterative nature of their institutional work was visible in their efforts to change practices within their own academic units. As they struggled to gain recognition for open data as a legitimate scholarly pursuit, they turned to seek legitimacy within the larger pool of the scientific community. And, as they began to be successful at this meso-level, they themselves presented their open data work within their own academic units and also arranged for external like-minded colleagues within their disciplines to present.

Although some institutional change models suggest a temporal order (Greenwood, et al., 2002; Perkmann & Spicer, 2007), the iterative and fluid nature of institutional work of the open data
entrepreneurs in this study aligns with observations in multi-level institutional work studies such as that of social entrepreneurs (Tracey, et al., 2011). The findings also align with and support previous studies of the particular multilevel nature of the institution of science, including the global formal and informal network of scientific communities, organizations such as universities, and associated individual researchers (Austin & Jones, 2016; Crane, 1971; Ziman, 2000). These findings also support previous findings that highlight the striking ability of scientific entrepreneurs to strategically and adeptly mediate between different institutional levels, conducting institutional work such as theorizing to link local and global scientific communities and building on their scientific legitimacy to advance institutional change (Casati & Genet, 2014; Ritvala & Granqvist, 2009).

Societal Macro-level – Policy Change

Four open data entrepreneurs actively engaged in institutional work at this level. Murray-Rust, Kansa, Bradley and Pérez conducted institutional work to theorize, connect with macro-level discourses, and advocate for open data practices in general. The outcome of their institutional work at the society macro-level focused on policy change as well as continued legitimation for open data and open science in general. Through their institutional work at the meso- and macro-levels, three of the informants gained exposure and became sought after as acknowledged leaders in the open data movement at the societal level.

They constructed rationales for institutional change as appropriate and discredited existing practices in ways that would resonate with the public, expanding upon the theorization themes presented at the individual, organization and scientific community levels. At this level, they used social suasion to advocate for open data as enabling knowledge for the good and a communal benefit to humanity. They advocated that publicly-funded research should result in data that is easily and freely available to the public. This observation of the expanded theorization themes put forward by the open science entrepreneurs aligns with similar insights put forward by Hardy & Maguire (2008) in their study of HIV/AIDS advocacy entrepreneurs.

Murray-Rust, Bradley, Kansa and Pérez made strategic use of public fora. They engaged with mass media, both traditional mainstream media and new forms of social media channels. They
placed videos of their presentations and interviews on YouTube and presented at large public events and meetings, maximizing their potential audience, making use of their legitimacy as open data scientists at the meso-level. They also connected with highly legitimate actors in policy, political, and corporate spheres to further advocate for open data. For example, Murray-Rust joined the advisory board of an open data lobby group.

In addition, Murray-Rust and Kansa tapped into macro-level discourses related to the high cost of scholarly publishing and public access to publicly-funded research. They included powerful and evocative statements as part of their theorizations, advocating for open data. By connecting to this larger discourse, they were able to gain access to mainstream media and leverage this to advocate for regulatory changes to support open data and open science.

The observations indicate that not all institutional entrepreneurs devote time to institutional work at the macro-level. Four of the five open science entrepreneurs (Bradley, Kansa, Pérez and Murray-Rust) actively engaged in institutional work at this level, with only two connecting to macro-level public discourses. Gezelter was focused on successfully completing the tenure process at his university (for which he noted that the design of JMol and OpenMD would not necessarily be considered a scholarly achievement) as well as on his family and young children. This finding aligns with the work of Casati & Genet (2014), who noted that although scientific entrepreneurs were involved to a greater or lesser extent in change practices, the emphasis differed depending on the stage of their careers: new principal investigators focused on scientific production, while those that had gained tenure diversified, with some undertaking a greater role in the broader scientific community or influencing interactions between science and society.

6.2 Main Conclusions and Implications

Research funding agencies, policy bodies, professional societies, and publishers are increasingly encouraging and/or requiring open data practices, specifically related to the storing and sharing of primary data collected by scientists (Davis & Vickery, 2007; de Silva & Vance, 2017; Lasthiotakis, et al., 2015). The institutional change literature suggests that institutional entrepreneurs are key agents that change existing practices and/or introduce new practices and ensure that these
become adopted more widely by other actors in the field (Hardy & Maguire, 2008). Research has focused on the study of how institutional entrepreneurs engage in strategic interventions to promote institutional change (Lawrence, 1999; Rao et al., 2000; Lawrence & Suddaby, 2006; Hardy & Maguire, 2008; Battilana, et al., 2009), and studies of institutional change indicate that individual institutional entrepreneurs, including scientific entrepreneurs, conduct institutional work at different institutional levels in order to innovate, shaping new paradigms and practices. Scholars have acknowledged this multiple embeddedness of institutional entrepreneurship and called for multilevel studies of work conducted at different institutional levels (Battilana, et al., 2009; Kaghan & Lounsbury, 2011; Lawrence et al., 2011; Scott, 2001). The lack of attention that institutional theorists have paid to the study of levels and level interactions in organizational and change studies more broadly has also been raised as an issue (see Bitektine & Haack, 2015).

This research study is the first in the area of institutional change to focus on the multilevel institutional work of individual open data entrepreneurs, analyzing their institutional work in order to introduce and institutionalize open data innovations within the institution of science. The results contribute to the understanding of institutional work strategies, multi-level recursive and iterative institutional work, scientists as institutional entrepreneurs, and the nature of change within the institution of science including academic organizations. The findings align with previous research that indicates that institutional entrepreneurs engage in a variety of institutional work at different institutional levels, adeptly mediating in an iterative manner between the local level of the organization and the global scientific and societal levels. This study may also be significant to scientists and to university and government administrators who are considering the implementation of open data policies.

The following conclusions emerge based on the research findings:

The open data entrepreneurs engaged in a variety of forms of purposeful action or institutional work to change data dissemination practices towards open data within the institution of science including: 1) theorizing; 2) counterfactual thinking; 3) disassociating from traditional modes; 4) creation of standards of practice; 5) mobilization of resources including financial, human and material resources; 6) naming and creation of new symbols; 7) forging new relations, alliances, coalitions, associations, and memberships; 8) association with taken-for-granted practices; 9) advocacy; and, 10) connecting with macro-level discourses. The forms of
institutional work are similar to those previously observed in studies of institutional entrepreneurs including scientific entrepreneurs (Casati & Genet, 2014; Maguire, et al., 2004; Perkmann & Spicer, 2008; Rao, Monin, & Durand, 2003; Ritvala & Granqvist, 2009; Symon, Buehring, Johnson & Cassell, 2008; Thompson, et al., 2015; Tracey, et al., 2011).

The open data entrepreneurs purposefully engaged in multilevel institutional work including work at the individual level, the academic organization micro-level, and the scientific community meso-level, supporting models of multi-level institutional change (Holm, 1995; Lawrence & Suddaby, 2006; Meyer & Rowan, 1977; Scott, 1995, 2001). Most, but not all, of the entrepreneurs also conducted significant institutional work at the societal macro-level in order to effect policy change. Institutional entrepreneurs render their ideas comprehensible to others in a compelling way that resonates with others, and the critical work of theorization was observed as a key form of institutional work at each institutional level. At each level from the individual to the societal, the open data entrepreneurs increasingly broadened the associated justifications and specifications, theorizing open data in a way that would resonate with members of their organizations, scientific communities, and society. A similar broadening of scope was observed for the institutional work of creating standards of practice, mobilizing resources, and advocacy from the micro- to macro-levels.

The findings contribute to the understanding of multilevel institutional work (Tracey, et al., 2011) and multilevel institutional change models (Holm, 1995; Scott, 1995, 2001), as well as to the understanding of institutional entrepreneurship and change within science and academic organizations. The institutional work of the open data entrepreneurs was associated with distinct outcomes at each institutional level: opportunity recognition at the individual level; development and establishment of the open data innovation and practice at the organization micro-level; legitimation and diffusion at the scientific community meso-level; and policy change at the societal macro-level. At the organization micro-level, the open data entrepreneurs conducted institutional work to establish their open data innovation, making use of the organization as a crucible to test and develop their innovation. The organizational level of the academic organization both enabled and constrained the work of the open scientists. The three informants that were faculty members were successful in establishing their innovation in their own research labs and courses but were frustrated in attempting to engage in change within their academic departments. Two informants that were not faculty members created parallel
organizations external to their academic ones in order to develop and establish their innovations. Aside from a few collaborations with colleagues, the entrepreneurs described the environment of their academic department as either indifferent to their initiative or actively discounting open data as a valid scholarly pursuit or methodology. At the meso-level of the scientific community, the informants conducted work to legitimate and diffuse their innovation, with the researchers making use of their work at the micro-level as a foundation for legitimation at the meso-level (Maguire, et al., 2004). The entrepreneurs were successful in legitimizing their open data innovations at the level of the scientific community, as well as in supporting the institutionalization of open data as a taken-for-granted, natural and appropriate arrangement with scientists as well as with research agencies. At the broader societal macro-level, four informants continued their legitimation institutional work and also focused on public policy change by connecting with related macro-level discourses, working directly with the mass media. One early stage informant did not conduct work at the societal macro-level while he focused on attaining tenure at his university, supporting the findings of Casati & Genet (2014) in which new principal investigators were more focused on scientific production, while those that had gained tenure undertook a greater role in the broader scientific community or influencing interactions between science and society.

The institutional work of the open science entrepreneurs at the micro- and meso-levels was iterative in nature, while those that engaged in macro-level institutional work began to conduct work at that level after they had established their innovations at the organizational level. Given the relation of the institution of science and the university organization, the institutional work at the organization and scientific community levels was undertaken concurrently by the open science entrepreneurs. The open data entrepreneurs were adept at conducting institutional work in such an iterative manner between institutional levels, fluidly mediating between their organizations and scientific communities in order to advance institutional change for open data practices in the institution of science. Similar to previous observations of science entrepreneurs (Casati & Genet, 2014; Ritvala & Granqvist, 2009), they were able to adeptly mediate between the scientific and societal levels by strategically working communication channels through scientific journals and networks, as well as mass media. Their teaching and/or research endeavours at the organization micro-level provided them with legitimacy at the scientific community meso-level, and concurrently their institutional work at the meso-level provided
legitimacy in their continued academic work and development of their innovation at the organization level. They continued to publish their scientific work in academic journals and present at conferences and workshops while at the same time incorporating their open data innovations or presenting on them separately to other disciplines and open data communities.

The iterative, multilevel institutional work of the open data entrepreneurs provides support to the recursive, iterative multilevel institutional change models that highlight the interplay of both bottom-up and top-down processes that combine to result in institutional change (Holm, 1995; Scott, 1995, 2001). Based on their scientific findings and innovations from within their research labs, the open data entrepreneurs were able to conduct work at the scientific community level; at the same time, once open data innovations were legitimated at the scientific community level, organizations such as universities would then conform to the expectations of the higher social institution of science (Hatch, 2006) to, in turn, reflect their own legitimacy (DiMaggio & Powell, 1983; Suchman, 1995). These findings also support the observation by Ritvala & Granqvist (2009) that the capacity of a scientist to theorize and link organizational and scientific communities is an important capability of scientific entrepreneurship, with institutional change being advanced through the institutional work within their own scientific fields and organizations as well as in other disciplines and institutional levels.

This study may also be significant to scientists and to university and government administrators who are considering the implementation of open data innovations and policies. For academic researchers, the findings indicate that, despite movements towards the acknowledgement and demonstration of the value of open data practices for the institution of science as well as requirements and supports from some research funding agencies and scientific journals and technological advances, open data as a taken-for-granted practice is not incorporated within all disciplines nor by all researchers, even barring ethical and intellectual property considerations. Open data entrepreneurs need to be able to devote considerable time and effort to institutional work at the level of the organization, within the scientific community, and at the broader societal level. At the organizational, scientific community and government levels, support for open data also needs to involve a multilevel approach that includes fostering recognition of open data as an integral aspect of the conduct of research, supported by resources and policies to allow for open data innovation development and scholarship at all institutional levels. In particular, universities can foster supportive environments for the initiation and establishment of open data innovations, beyond
scientific entrepreneurs’ own teaching and/or research labs. Open data entrepreneurs can be supported and encouraged to pursue open innovations at all levels, leveraging their abilities to adeptly conduct a variety of forms of institutional work and mediating between the scientific and societal levels to advance institutional change.

In summary, the findings contribute to the institutional entrepreneurship and institutional work literature, and in particular to the understanding of iterative, multilevel institutional work within the institution of science, in several ways. Firstly, the study adds to the literature on institutional work in examining how institutional entrepreneurs work to establish their innovations and practices as individuals, and within organizations, the scientific community, and society. Secondly, the study develops the multilevel institutional work approach by looking at agency across different levels, that is, the distinct institutional work conducted by agents as well as the iterative manner of their agency between the levels of an organization, the scientific community and society. The thesis draws attention to the adept nature of institutional entrepreneurs at conducting institutional work in such an iterative manner between levels, in particular between the organization and scientific community levels within the institution of science. Thirdly, the study adds to the literature on entrepreneurship in an area not related to economic entrepreneurship, and casts light on the nature of science-based institutional entrepreneurship and the advancement of change within the institution of science and associated academic organizations. In particular, it highlights the importance of the bottom-up and top-down aspects of institutional change in the development and establishment of an open data innovation at the level of an organization and its legitimation and diffusion at the level of the scientific community.

References

Aldrich, H. E. (2012). The emergence of entrepreneurship as an academic field: A personal essay on institutional entrepreneurship. Research Policy, 41(7), 1240-1248.

Alford, R. R., & Friedland, R. (1985). Powers of Theory: Capitalism, the State, and Democracy. Cambridge: Cambridge University Press.

Arias, J. J., Pham-Kanter, G., & Campbell, E. G. (2015). The growth and gaps of genetic data sharing policies in the United States. Journal of Law and the Biosciences, 2(1), 56–68.

Armstrong, A., Smith, M., Thomas, J., & Johnson, A. (2015). Consumerism and Higher Education: Pressures and Faculty Conformity. The William and Mary Educational Review, 3(2), 36-46.

Arzberger, P., Schroeder, P., Beaulieu, A., Bowker, G., Casey, K., Laaksonen, L., Moorman, D., Uhlir, P., & Wouters, P. (2004). An International Framework to Promote Access to Data. Science, 303(5665), 1777-1778.

Atkins, D., Droegemeier, K., & Feldman, S. E. (2003). Final report of the NSF blue ribbon advisory panel on cyberinfrastructure: Revolutionizing science and engineering through cyberinfrastructure. Retrieved from http://www.nsf.gov/od/oci/reports/atkins.pdf

Austin, I., & Jones, G. A. (2015). Governance of higher education: Global perspectives, theories, and practices. Routledge.

Baker, M. (2015). Over half of psychology studies fail reproducibility test. Nature News, 27.

Battilana, J. (2006). Agency and institutions: The enabling role of individuals’ social position. Organization, 13(5), 653-676.

Battilana, J., & D’aunno, T. (2009). Institutional work and the paradox of embedded agency. Institutional work: Actors and agency in institutional studies of organizations, 31-58.

Battilana, J., Leca, B., & Boxenbaum, E. (2009). How Actors Change Institutions: Towards a Theory of Institutional Entrepreneurship. The Academy of Management Annals, 3(1), 65-107.

Batts, S. A., Anthis, N. J., & Smith, T. C. (2008). Advancing science through conversations: Bridging the gap between blogs and the academy. PLoS biology, 6(9), e240.

Benner, M., & Sandström, U. (2000). Institutionalizing the triple helix: research funding and norms in the academic system. Research Policy, 29(2), 291-301.

Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J., & Sayers, E. W. (2012). GenBank. Nucleic Acids Research, 40, D48-53.

Berger, P. L., & Luckmann, T. (1967). The Social Construction of Reality. New York: Doubleday Anchor.

Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities (2003) Available at: http://oa.mpg.de/lang/en-uk/berlin-prozess/berliner-erklarung/ (accessed on 1 March 2013).

Berman, F., & Cerf, V. (2013). Who will pay for public access to research data? Science, 341(6146), 616-617.

Bethesda Statement on Open Access Publishing (2003) Available at: http://dash.harvard.edu/handle/1/4725199 (accessed on 1 March 2013).

Bird, C. L., Willoughby, C., & Frey, J. G. (2013). Laboratory notebooks in the digital era: the role of ELNs in record keeping for chemistry and other sciences. Chemical Society Reviews, 42(20), 8157-8175.

Bishop, L. (2014). Re-using qualitative data: A little evidence, on-going issues and modest reflections. Studia Socjologiczne, (3), 167.

Bitektine, A., & Haack, P. (2015). The “macro” and the “micro” of legitimacy: Toward a multilevel theory of the legitimacy process. Academy of Management Review, 40(1), 49-75.

Blumenthal, D., DesRoches, C., Donelan, K., Ferris, T., Jha, A., Kaushal, R., ... & Shield, A. (2006). Health information technology in the United States: the information base for progress. Robert Wood Johnson Foundation.

Bohle, S. (2014). A Four Part Series on Open Notebook Science. SciLogs. January 16, 2014. http://www.scilogs.com/scientific_and_medical_libraries/a-four-part-series-on-open-notebook-science-part-three/

Bollier, David (2010). Can That Data Be Shared? News and Perspectives on the Commons. February 23, 2010 http://bollier.org/can-data-be-shared

Borgman, C. L. (2012). The conundrum of sharing research data. Journal of the American Society for Information Science and Technology, 63(6), 1059-1078.

Borgman, C. L., & Sands, A. E. (2016). Open Data in Astronomy Sky Surveys. Panel Presentation, SciDataCon. Available at: http://works.bepress.com/borgman/392/

Boulton, G. (2014). Large volumes of data are challenging open science. SciDevNet. http://www.scidev.net/global/data/opinion/large-volumes-of-data-are-challenging-open-science.html

Boulton, G., Rawlins, M., Vallance, P., & Walport, M. (2011). Science as a public enterprise: the case for open data. The Lancet, 377(9778), 1633-1635.

Bradley, D. (2006). Jean-Claude Bradley Drexel University and blogmaster of usefulchem.blogspot.com. Chemistry WebMagazine. 51 (January 2006).

Bradley, J. (1997). Creating electrical contacts between metal particles using directed electrochemical growth. Nature. 389: 268-271. http://www.nature.com/nature/journal/v389/n6648/abs/389268a0.html

Bradley, J. (2007). Open Notebook Science using Blogs and Wikis. Nature Precedings. Presented at the American Chemical Society, 27 March 2007 http://precedings.nature.com/documents/39/version/1/files/npre200739-1.pdf

Bradley, J. (2013). Opening up and sharing. Chemistry World. 18 April 2013 http://www.rsc.org/chemistryworld/2013/04/open-science-chemistry-sharing-information

Bradley, J., Guha, R., Lang, A., Lindenbaum, P., Neylon, C., Williams, A., & Willighagen, E. (2009). Beautifying data in the real world. Beautiful Data, Sebastopol, US: O’Reilly Media, Inc. 259-278.

Bradley, J. C., Lang, A. S., Koch, S., & Neylon, C. (2011). Collaboration using open notebook science in academia. Collaborative computational technologies for biomedical research, 425-452.

Bricker, B. J. (2013). Climategate: A Case Study in the Intersection of Facticity and Conspiracy Theory. Communication Studies, 64(2), 218-239.

Bruton, G. D., Ahlstrom, D., & Li, H. L. (2010). Institutional theory and entrepreneurship: Where are we now and where do we need to move in the future? Entrepreneurship Theory and Practice, 34(3), 421-440.

Budapest Open Access Initiative (2002) Available at: http://www.opensocietyfoundations.org/openaccess/read (accessed on 1 March 2013).

Butler, D. (2005). Joint efforts. Nature, 438(1 December), 548-549.

Bygrave, W. D. (2002). The entrepreneurship paradigm (I). Entrepreneurship: Critical Perspectives on Business and Management, 3(1), 415.

Campbell, E. G., Clarridge, B. R., Gokhale, M., Birenbaum, L., Hilgartner, S., et al. (2002). Data withholding in academic genetics: evidence from a national survey. The Journal of the American Medical Association, 287, 473–480.

Carey, J. (2013). Scientific communication before and after networked science. Information & Culture, 48(3), 344-367.

Carter-Thomas, S., & Rowley-Jolivet, E. (2016). Open science notebooks: New insights, new affordances. Journal of Pragmatics.

Casati, A., & Genet, C. (2014). Principal investigators as scientific entrepreneurs. The Journal of Technology Transfer, 39(1), 11-32.

Choudhury, S., Fishman, J. R., McGowan, M. L., & Juengst, E. T. (2014). Big data, open science and the brain: lessons learned from genomics. Frontiers in human neuroscience, 8, 239.

Chretien, J. P., Rivers, C. M., & Johansson, M. A. (2016). Make Data Sharing Routine to Prepare for Public Health Emergencies. PLoS Med, 13(8), e1002109.

Clark, B. R. (1983). The higher education system: Academic organization in cross-national perspective. Berkeley, CA: University of California Press.

Colomy, P. (1998, June). Neofunctionalism and neoinstitutionalism: Human agency and interest in institutional change. In Sociological forum (Vol. 13, No. 2, pp. 265-300). Kluwer Academic Publishers-Plenum Publishers.

Corbyn, Z. (2009). A threat to scientific communication. Times Higher Education. 13 August 2009. http://www.timeshighereducation.co.uk/407705.article

Costa, M. R., Qin, J., & Wang, J. (2014, September). Research networks in data repositories. In Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries (pp. 403-406). IEEE Press.

Coturnix. (2008). Doing science publicly: Interview with Jean-Claude Bradley. Science Blogs. http://scienceblogs.com/clock/2008/05/23/doing-science-publicly-intervi/

Crane, D. (1971). Transnational networks in basic science. International Organization, 25(03), 585-601.

Crane, D. (1972). Invisible colleges; diffusion of knowledge in scientific communities. Chicago: University of Chicago Press.

Crawford, S. E. S. and Ostrom, E. (1995). A grammar of institutions. American Political Science Review, 89(3), 582–600.

Creswell, J. W. (2013). Qualitative inquiry and research design: Choosing among five approaches (3rd ed.). Los Angeles, CA: Sage.

Cutcher-Gershenfeld, J., Baker, K. S., Berente, N., Flint, C., Gershenfeld, G., Grant, B., ... & Lewis, S. (2017). Five ways consortia can catalyse open science. Nature, 543(7647), 615.

Dasgupta, P., & David, P. A. (1994). Toward a new economics of science. Research policy, 23(5), 487-521.

David, P. A. (1998). Common agency contracting and the emergence of "open science" institutions. The American Economic Review, 88(2), 15-21.

David, R. J., Sine, W. D., & Haveman, H. A. (2013). Seizing opportunity in emerging fields: How institutional entrepreneurs legitimated the professional form of management consulting. Organization Science, 24(2), 356-377.

Davis, H. M., & Vickery, J. N. (2007). Datasets, a shift in the currency of scholarly communication: Implications for library collections and acquisitions. Serials Review, 33(1), 26-32.

Dawes, S. S., Vidiasova, L., & Parkhimovich, O. (2016). Planning and designing open government data programs: An ecosystem approach. Government Information Quarterly, 33(1), 15-27.

De Silva, P. U., & Vance, C. K. (2017). Sharing Scientific Data: Moving Toward “Open Data”. In Scientific Scholarly Communication (pp. 41-56). Springer International Publishing.

Déjean, F., Gond, J. P., & Leca, B. (2004). Measuring the unmeasured: An institutional entrepreneur strategy in an emerging industry. Human relations, 57(6), 741-764.

Delingpole, J. (2009, November 20). Climategate: The final nail in the coffin of “Anthropogenic Global Warming”? The Telegraph. Retrieved from http://blogs.telegraph.co.uk/news/jamesdelingpole/100017393/climategate-the-final-nail-in-the-coffin-of-anthropogenic-global-warming/

Den Besten, M., David, P. A., & Schroeder, R. (2010). Research in e-science and open access to data and information. In International Handbook of Internet Research (pp. 65-96). Springer Netherlands.

Denzin, N.K., and Lincoln, Y.S. (Eds.). (2005). The Sage handbook of qualitative research. Sage.

DiMaggio, P. J. (1988). Interest and agency in institutional theory. Institutional patterns and organizations: Culture and environment, 1, 3-22.

DiMaggio, P. J. (1991). Constructing an organizational field as a professional project: U.S. art museums, 1920-1940. In W.W. Powell & P.J. DiMaggio (Eds.), The New Institutionalism in Organizational Analysis (pp. 267-292). Chicago, IL: University of Chicago Press.

DiMaggio, P. J., & Powell, W. W. (1983). The iron cage revisited: Institutional isomorphism and collective rationality in organizational fields. American Sociological Review, 48, 147-160.

Donoho, D. L. (2010). An invitation to reproducible computational research. Biostatistics, 11(3), 385-388.

Dorado, S. (2005). Institutional entrepreneurship, partaking, and convening. Organization studies, 26(3), 385-414.

Drahl, C. (2009). Organic chemist champions open science, Web technology. Chemical & Engineering News. 87(6): 34-35.

Dunn, M. B., & Jones, C. (2010). Institutional logics and institutional pluralism: The contestation of care and science logics in medical education, 1967–2005. Administrative Science Quarterly, 55(1), 114-149.

Eisenhardt, K. M. (1989). Agency theory: An assessment and review. Academy of management review, 14(1), 57-74.

Eisenhardt, K. M. (1989). Building theories from case study research. Academy of Management Review, 14(4), 532–550.

Eisenhardt, K. M. and Graebner, M.E. (2007). Theory building from cases: Opportunities and Challenges. Academy of Management Journal. 50(1): 25-32.

Ellson, M. (2013). The Profiler: Open Context’s Eric Kansa. The Alamedan. June 24, 2013.

Elkhedim, B., Benard, E., Bronz, M., Gavrilovic, N., & Bonnin, V. (2016, July). Optimal design of long endurance mini UAVs for atmospheric measurement. In 2016 Applied Aerodynamics Conference.

Emirbayer, M., & Mische, A. (1998). What Is Agency? American journal of sociology, 103(4), 962-1023.

Eschenfelder, K. R., & Johnson, A. (2014). Managing the data commons: controlled sharing of scholarly data. Journal of the Association for Information Science and Technology, 65(9), 1757-1774.

Etzkowitz, H., & Leydesdorff, L. (1997). Introduction to special issue on science policy dimensions of the Triple Helix of university-industry-government relations. Science and Public Policy, 24(1), 2-5.

Fecher, B., & Friesike, S. (2013). Open Science: One Term, Five Schools of Thought (No. 218). German Council for Social and Economic Data (RatSWD).

Fligstein, N. (1997). Social skill and institutional theory. American Behavioral Scientist, 40, 397-405.

Frestedt, J. (2008). The role and impact of the principal investigator. Monitor, 31–35.

Frey, J. G., & Bird, C. L. (2011). Web-based services for drug design and discovery. Expert opinion on drug discovery, 6(9), 885-895.

Friedland, R., & Alford, R. R. (1991). Bringing society back in: Symbols, practices and institutional contradictions. In Powell W.W. and DiMaggio, P.J. (Eds). The New Institutionalism in Organizational Analysis pp. 232-263, Chicago: University of Chicago Press.

Friesike, S., & Schildhauer, T. (2015). Open science: many good resolutions, very few incentives, yet. In Incentives and Performance (pp. 277-289). Springer International Publishing.

Fry, J., Schroeder, R., & den Besten, M. (2009). Open science in e-science: Contingency or policy? Journal of Documentation, 65(1), 6–32.

Fuchs, J. E., Bender, A., & Glen, R. C. (2015). Cheminformatics Research at the Unilever Centre for Molecular Science Informatics Cambridge. Molecular informatics, 34(9), 626-633.

Funnel, A. (2010). Interview – Open Science. Future Tense ABC Radio National, Radio Australia and ABC online. 4 February 2010. http://www.abc.net.au/radionational/programs/futuretense/open-science/3100152#transcript

Gaglio, C. M. (2004). The role of mental simulations and counterfactual thinking in the opportunity identification process. Entrepreneurship Theory and Practice, 28(6), 533-552.

Garson, G. D. (Ed.). (2008). Handbook of Research on Public Information Technology. IGI Global.

Garud, R., Hardy, C., & Maguire, S. (2007). Institutional entrepreneurship as embedded agency: An introduction to the special issue. Organization Studies, 28(7), 957.

Garud, R., Jain, S., & Kumaraswamy, A. (2002). Institutional entrepreneurship in the sponsorship of common technological standards: The case of Sun Microsystems and Java. Academy of Management Journal, 45(1), 196-214.

Garud, R., & Karnøe, P. (2003). Bricolage versus breakthrough: distributed and embedded agency in technology entrepreneurship. Research policy, 32(2), 277-300.

Gewin, V. (2016). Data sharing: An open mind on open data. Nature, 529 (7584), 117-119.

Gezelter, J. D. (2015). Open Source and Open Data Should Be Standard Practices. The Journal of Physical Chemistry Letters, 6(7), 1168-1169.

Gitelman, L. (Ed.). (2013). Raw data is an oxymoron. Cambridge, MA: MIT Press.

Glasser, L., Herráez, A., & Hanson, R. M. (2009). Interactive 3D phase diagrams using Jmol. J. Chem. Educ, 86(5), 566.

Glen, R., & Aldridge, S. (2002). Developing tools and standards in molecular informatics. Chemical Communications, 2002(23), 2745-2747.

Godbeer, A. D., Al-Khalili, J. S., & Stevenson, P. D. (2015). Modelling proton tunnelling in the adenine–thymine base pair. Physical Chemistry Chemical Physics, 17(19), 13034-13044.

Grand, A. (2015). Open Science. Journal of Science Communications. 14(4) C02.

Gray, J. (2015). Talk on “The Politics of Open Data: Past, Present and Future” at Data Power conference, 22nd June 2015.

Greenwood, R., & Hinings, C. R. (2006). Radical organizational change. The Sage handbook of organization studies, 814-842.

Greenwood, R., Hinings, C. R., & Whetten, D. (2014). Rethinking institutions and organizations. Journal of Management Studies, 51(7), 1206-1220.

Greenwood, R., Raynard, M., Kodeih, F., Micelotta, E. R., & Lounsbury, M. (2011). Institutional complexity and organizational responses. The Academy of Management Annals, 5(1), 317-371.

Greenwood, R., & Suddaby, R. (2006). Institutional entrepreneurship in mature fields: The Big Five accounting firms. Academy of Management Journal, 49, 27–48.

Greenwood, R., Suddaby, R., & Hinings, C. R. (2002). Theorizing change: The role of professional associations in the transformation of institutionalized fields. Academy of management journal, 45(1), 58-80.

Grégoire, D. A., & Shepherd, D. A. (2012). Technology-market combinations and the identification of entrepreneurial opportunities: An investigation of the opportunity-individual nexus. Academy of Management Journal, 55(4), 753–785.

Guha, R., Howard, M. T., Hutchison, G. R., Murray-Rust, P., Rzepa, H., Steinbeck, C., ... & Willighagen, E. L. (2006). The Blue Obelisk—interoperability in chemical informatics. Journal of chemical information and modeling, 46(3), 991-998.

Gurstein, M. (2013). Should “Open Government Data” be a product or a service (and why does it matter?). Gurstein’s Community Informatics, 3 February. http://gurstein.wordpress.com/2013/02/03/is-open-government-data-a-product-or-a-service-and-why-does-it-matter/

Hallonsten, O. (2016). Big Science Transformed. Palgrave Macmillan.

Hannay, T. (2011). A new kind of science? Nature Physics, 7(10), 742.

Hanson, B., Sugden, A., & Alberts, B. (2011). Making data maximally available. Science, 331(6018), 649.

Hanson, R. M. (2010). Jmol–a paradigm shift in crystallographic visualization. Journal of Applied Crystallography, 43(5), 1250-1260.

Hardy, C., and Maguire, S. (2008). Institutional entrepreneurship. In R. Greenwood, C. Oliver, R. Suddaby, and K. Sahlin-Andersen (Eds.), Handbook of organizational institutionalism. (pp. 198-217). London, UK: SAGE.

Hartley, J., Butler, M., & Benington, J. (2002). Local government modernization: UK and comparative analysis from an organizational perspective. Public Management Review, 4 (3), 387–404.

Hatch, M. J. (2006). Organization theory: Modern symbolic and postmodern perspectives. Oxford, England: Oxford University Press.

Hedstrom, M., & Niu, J. (2008). Incentives for Data Producers to Create “Archive-Ready” Data: Implications for Archives and Records Management. In Proceedings of the Society of American Archivists (SAA) Research Forum.

Herráez, A. (2008). How to use Jmol to study and present molecular structures (Vol. 1). Lulu.com.

Hey, T., & Trefethen, A. E. (2005). Cyberinfrastructure for e-Science. Science, 308(5723), 817-821.

Hohman, M., Gregory, K., Chibale, K., Smith, P. J., Ekins, S., & Bunin, B. (2009). Novel web-based tools combining chemistry informatics, biology and social networks for drug discovery. Drug Discovery Today, 14, 261-270.

Holdren, J.P. (2013, Feb 22). “Increasing Access to the Results of Federally Funded Scientific Research.” Office of Science and Technology Policy, Executive Office of the President: Washington, DC.

Holm, P. (1995). The dynamics of institutionalization: Transformation processes in Norwegian fisheries. Administrative science quarterly, 398-422.

Hjorth, D., Jones, C., & Gartner, W. B. (2008). Introduction for ‘recreating/recontextualising entrepreneurship’. Scandinavian Journal of Management, 24(2), 81-84.

Hodgson, G. M. (2007). Institutions and individuals: interaction and evolution. Organization studies, 28(1), 95-116.

House of Commons. (2004). Scientific Publications: Free for all? Science and Technology Committee. Volume 1. https://www.publications.parliament.uk/pa/cm200304/cmselect/cmsctech/399/399.pdf

Howe, A., & Chain, P. S. (2015). Challenges and opportunities in understanding microbial communities with metagenome assembly (accompanied by IPython Notebook tutorial). Frontiers in microbiology, 6.

Huggett, J. (2014). Promise and Paradox: Accessing Open Data in Archaeology. In: Clare Mills, Michael Pidd and Esther Ward. Proceedings of the Digital Humanities Congress 2012. Studies in the Digital Humanities. Sheffield: HRI Online Publications.

Hwang, H. and Powell, W.W. (2005). Institutions and Entrepreneurship. Handbook of Entrepreneurship Research. 179-210

Jamali, H. R. (2015). Analysis of Emerging Reputation and Funding Mechanisms in the Context of Open Science 2.0.

Jones, C. (2001). Co-evolution of entrepreneurial careers, institutional rules and competitive dynamics in American film, 1895-1920. Organization Studies, 22(6), 911-944.

Jones, C., & Massa, F. G. (2013). From novel practice to consecrated exemplar: Unity Temple as a case of institutional evangelizing. Organization Studies. 34(8) 1099–1136.

Jones, G. A. (1993). Professorial pressure on government policy: University of Toronto Faculty. Review of Higher Education, 16 (4), 461–482.

Jump, P. (2011). RCUK and Hefce step up push for open access. Times Higher Education. May 26, 2011. http://www.timeshighereducation.co.uk/news/rcuk-and-hefce-step-up-push-for-open-access/416334.article

Jump, P. (2014). Elsevier: bumps on road to open access. Times Higher Education. March 27, 2015. https://www.timeshighereducation.com/news/elsevier-bumps-on-road-to-open-access/2012238.article

Kaghan, W., & Lounsbury, M. (2011). Institutions and work. Journal of Management Inquiry, 20(1), 73-81.

Kalantaridis, C., & Fletcher, D. (2012). Entrepreneurship and institutional change: A research agenda. Entrepreneurship & Regional Development, 24(3-4), 199-214.

Kansa, E. (2012). Openness and archaeology’s information ecosystem. World Archaeology, 44(4), 498–520. doi:10.1080/00438243.2012.737575

Kansa, E. (2014). It’s the neoliberalism, stupid: Why instrumentalist arguments for open access, open data, and open science are not enough [blog post]. Impact of Social Sciences, The London School of Economics and Political Science. http://blogs.lse.ac.uk/impactofsocialsciences/2014/01/27/its-the-neoliberalism-stupid-kansa/

Kansa, E.C., & Kansa, S.W. (2013). We all know that a 14 is a sheep: Data publication and professionalism in archaeological communication. Journal of Eastern Mediterranean Archaeology and Heritage Studies, 1(1), 88–97. doi:10.5325/jeasmedarcherstu.1.1.0088

Karthikeyan, M., Krishnan, S., Pandey, A. K., & Bender, A. (2006). Harvesting chemical information from the internet using a distributed approach: ChemXtreme. Journal of chemical information and modeling, 46(2), 452-461.

Karlesky, M. J. (2015). Identifying Entrepreneurial Opportunities: Cognition and Categorization in Nascent Entrepreneurs (Doctoral dissertation, University of Michigan).

Kim, Y., & Stanton, J. M. (2016). Institutional and individual factors affecting scientists' data-sharing behaviors: A multilevel analysis. Journal of the Association for Information Science and Technology, 67(4), 776-799.

Kitchin, R. (2014). The data revolution: Big data, open data, data infrastructures and their consequences. Sage Publications, Thousand Oaks, California

Kondra, A. Z., & Hinings, C. R. (1998). Organizational diversity and change in institutional theory. Organization studies, 19(5), 743-767.

Korneva, G., Ye, H., Gogotsi, Y., Halverson, D., Friedman, G., Bradley, J. C., & Kornev, K. G. (2005). Carbon nanotubes loaded with magnetic particles. Nano letters, 5(5), 879-884.

Kovic, I., Lulic, I., & Brumini, G. (2008). Examining the medical blogosphere: an online survey of medical bloggers. Journal of Medical Internet Research, 10(3).

Kraatz, M. S., & Block, E. S. (2008). Organizational implications of institutional pluralism. The SAGE handbook of organizational institutionalism, 840.

Kratz, J., & Strasser, C. (2014). Data publication consensus and controversies. F1000Research, 3.

Krill, P. (2014). IPython founder details road map for interactive computing platform. InfoWorld. February 14, 2014. Retrieved from http://www.infoworld.com/t/data-visualization/ipython-founder-details-road-map-interactive-computing-platform-236429

Krippendorff, K. (2013). Content Analysis: An Introduction to Its Methodology (3rd ed.). California, US: Sage Publications, Inc.

Kuhn, T. S. (1996). The Structure of Scientific Revolutions. London: University of Chicago Press.

Kvale, S., & Brinkman, S. (2007). Ethical issues of interviewing. Doing Interviews. London.

Lasthiotakis, H., Kretz, A., & Sá, C. (2015). Open science strategies in research policies: A comparative exploration of Canada, the US and the UK. Policy Futures in Education, 13(8), 968-989.

Lawrence, J. H., Celis, S., & Ott, M. (2014). Is the tenure process fair? What faculty think. The Journal of Higher Education, 85(2), 155-192.

Lawrence, T. B. (1999). Institutional strategy. Journal of management, 25(2), 161-187.

Lawrence, T. B., Hardy, C., & Phillips, N. (2002). Institutional effects of interorganizational collaboration: The emergence of proto-institutions. Academy of management journal, 45(1), 281-290.

Lawrence, T. B., Leca, B., & Zilber, T. B. (2013). Institutional work: Current research, new directions and overlooked issues. Organization Studies, 34(8), 1023-1033.

Lawrence, T. B., & Phillips, N. (2004). From Moby Dick to Free Willy: Macro-cultural discourse and institutional entrepreneurship in emerging institutional fields. Organization, 11(5), 689-711.

Lawrence, T.B., & Suddaby, R. (2006). Institutions and institutional work. In S.R. Clegg, C. Hardy, T.B. Lawrence, & W.R. Nord (Eds.), Handbook of organization studies (2nd ed., pp. 215–254). London: Sage.

Lawrence, T. B., Suddaby, R., & Leca, B. (2009). Introduction: theorizing and studying institutional work. In Lawrence, T. B., Suddaby, R., & Leca, B. (Eds.). Institutional work: Actors and agency in institutional studies of organizations (pp. 1-28). Cambridge University Press.

Lawrence, T., Suddaby, R., & Leca, B. (2011). Institutional work: Refocusing institutional studies of organization. Journal of Management Inquiry, 20(1), 52-58.

Leblebici, H., Salancik, G. R., Copay, A., & King, T. (1991). Institutional change and the transformation of interorganizational fields: An organizational history of the US radio broadcasting industry. Administrative science quarterly, 333-363.

Leca, B., & Naccache, P. (2006). A critical realist approach to institutional entrepreneurship. Organization, 13(5), 627-651.

Leonelli, S., Spichtinger, D., & Prainsack, B. (2015). Sticks and carrots: encouraging open science at its source. Geo: Geography and Environment.

Lounsbury, M. and Boxenbaum, E. (2013). Institutional Logics in Action. In M. Lounsbury and E. Boxenbaum (Eds.), Institutional logics in action, Part B: Research in the Sociology of Organizations (Vol. 39a, pp. 3-22). Bingley, UK: Emerald.

Lounsbury, M., & Glynn, M. A. (2001). Cultural entrepreneurship: Stories, legitimacy, and the acquisition of resources. Strategic management journal, 22(6‐7), 545-564.

Maguire, S., Hardy, C., & Lawrence, T. B. (2004). Institutional entrepreneurship in emerging fields: HIV/AIDS treatment advocacy in Canada. Academy of management journal, 47(5), 657-679.

Martin, E. G., Law, J., Ran, W., Helbig, N., & Birkhead, G. S. (2016). Evaluating the Quality and Usability of Open Data for Public Health Research: A Systematic Review of Data Offerings on 3 Open Data Platforms. Journal of Public Health Management and Practice.

Martins, L. F. (2014). IPython notebook essentials. Packt Publishing Ltd.

Martone, M., Murray-Rust, P., Molloy, J., Arrow, T., MacGillivray, M., Kittel, C., Kasberger, S., Steel, G., Oppenheim, C., Ranganathan, A., Tennant, J., & Udell, J. (2016). ContentMine/Hypothes.is Proposal. Research Ideas and Outcomes, 2, e8424.

Mascarelli, A. (2014). Research tools: Jump off the page. Nature, 507(7493), 523-525.

Mayernik, M. S. (2016). Research data and metadata curation as institutional issues. Journal of the Association for Information Science and Technology, 67(4), 973-993.

Mayernik, M. S. (2017). Open data: Accountability and transparency. Big Data & Society, 4(2), 2053951717718853.

Mayernik, M., Phillips, J., & Nienhouse, E. (2016). Linking Publications and Data: Challenges, Trends, and Opportunities. D-Lib Magazine, 22(5), 4.

Melkers, J., & Kiopa, A. (2010). The social capital of global ties in science: The added value of international collaboration. Review of Policy Research, 27(4), 389-414.

Mena, S., & Suddaby, R. (2016). Theorization as institutional work: The dynamics of roles and practices. Human Relations, 69(8), 1669-1708.

Merriam, S. B. (1998). Qualitative Research and Case Study Applications in Education. Revised and Expanded from "Case Study Research in Education." San Francisco: Jossey-Bass Publishers.

Merton, R. K. (1942). Science and technology in a democratic order. Journal of legal and political sociology, 1(1-2), 115-126.

Merton, R. K. (1957). Priorities in scientific discovery: a chapter in the sociology of science. American sociological review, 22(6), 635-659.

Merton, R. K. (1973). The sociology of science: Theoretical and empirical investigations. University of Chicago press.

Meyer, J. W., and Rowan, B. (1977). Institutionalized organizations: Formal structure as myth and ceremony. American Journal of Sociology, 83(2), 340–363.

Meyer, J. W., Ramirez, F. O., Frank, D. J., & Schofer, E. (2005). Higher Education as an Institution. In P. Gumport (Ed.) Sociology of Higher Education: Contributions and Their Contexts, Baltimore, MD: The Johns Hopkins University Press.

Michener, W. K. (2015). Ecological data sharing. Ecological Informatics. doi: 10.1016/j.ecoinf.2015.06.010

Miles, M. B., & Huberman, A. M. (1984). Qualitative data analysis: a sourcebook of new methods.

Ministry of Education and Culture, Finland (2014). Open science and research leads to surprising discoveries and creative insights: Open science and research roadmap 2014–2017. Reports of the Ministry of Education and Culture, Finland 2014:21. http://www.minedu.fi/export/sites/default/OPM/Julkaisut/2014/liitteet/okm21.pdf?lang=en

Molloy, J. C. (2011). The Open Knowledge Foundation: Open data means better science. PLoS Biology, 9, e1001195.

Monbiot, G. (2011). Academic publishers make Murdoch look like a socialist. The Guardian.

Mounce, R. (2014). The right to read is the right to mine: Text and data mining copyright exceptions introduced in the UK. London School of Economics. http://blogs.lse.ac.uk/impactofsocialsciences/2014/06/04/the-right-to-read-is-the-right-to-mine-tdm/

Mukherjee, A., & Stern, S. (2009). Disclosure or secrecy? The dynamics of open science. International Journal of Industrial Organization, 27(3), 449-462.

Murray-Rust, P. (2002). The World Wide Molecular Matrix-a peer-to-peer XML repository for molecules and properties. EuroWeb2002, Oxford, UK, 1024-1035.

Murray-Rust, P. (2005). The Blue Obelisk. CDK News, 2, 43-46. Retrieved from http://superb-dca2.dl.sourceforge.net/project/cdk/CDK%20News/2_2/cdknews2.2.pdf

Murray-Rust, P. (2008). Open data in science. Serials Review, 34(1), 52-64.

Murray-Rust, P. (2014). We cannot do modern science unless it's open. Opensource.com 11 Aug 2014 http://opensource.com/education/14/8/open-source-deep-academia

Murray-Rust, P., Adams, S. E., Downing, J., Townsend, J. A., & Zhang, Y. (2011). The semantic architecture of the World-Wide Molecular Matrix (WWMM). Journal of cheminformatics, 3(1), 1.

Murray-Rust, P., & Brooks, B. (2011). Thought Experiment-The Scientist's Amanuensis-A virtual lab—Where all sorts of parameters are monitored and recorded—Promises researchers a higher degree of reproducibility. The Scientist, 25(7), 24.

Murray-Rust, P., Glen, R., Rzepa, H., Stewart, J., Townsend, J., Willighagen, E., & Zhang, Y. (2003). A semantic GRID for molecular science. In 2nd UK e-Science Conference All Hands Meeting (AHM03), Nottingham, UK.

Murray-Rust, P., Glen, R. C., Zhang, Y., & Harter, J. (2002). The World Wide Molecular Matrix - a peer-to-peer XML repository for molecules and properties. In B. Matthews, B. Hopgood, & M. Wilson (Eds.), EuroWeb2002, The Web and the GRID: From e-science to e-business (pp. 163-164). The British Computer Society.

Nature Special 2013 Challenges in irreproducible research (www.nature.com/nature/focus/reproducibility/index.html) Accessed 10 June 2015

Neergaard, H. and Ulhøi, J.P. (eds) (2007). Handbook of Qualitative Research Methods in Entrepreneurship. Cheltenham: Edward Elgar.

Nicol, A., Caruso, J., & Archambault, É. (2013). Open data access policies and strategies in the european research area and beyond. info@ science, 1, 495-6505.

Nielsen, M. (2009). Doing science in the open. Physics World, 22 5: 30-35.

Nielsen, M. (2012). Reinventing Discovery: The New Era of Networked Science. Princeton University Press, Princeton.

Nigam, A. (2013). How institutional change and individual researchers helped advance clinical guidelines in American health care. Social Science & Medicine, 87, 16-22.

NIH (2003a) Data Sharing Policy Brochure. Available at: http://grants.nih.gov/grants/policy/data_sharing/data_sharing_brochure.pdf (accessed on 1 March 2013).

NIH (2003b) NIH Data Sharing Policy and Implementation Guidance. Available at: http://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm (accessed on 1 March 2013).

Nosek, B. A., Alter, G., Banks, G. C., Borsboom, D., Bowman, S. D., Breckler, S. J., ... & Contestabile, M. (2015). Promoting an open research culture. Science, 348(6242), 1422-1425.

NSF (2006) NSF Names Daniel Atkins to Head New Office of Cyberinfrastructure [Press release 06-025]. Available at: http://www.nsf.gov/news/news_summ.jsp?cntn_id=105820 (accessed on 1 March 2013).

NSF (2011) Award and Administration Guide (NSF Document 11-1). Available at: http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/gpg_2.jsp#dmp (accessed on 1 March 2013).

OCI (n.d.) About Office of Cyberinfrastructure. Available at: http://www.nsf.gov/od/oci/about.jsp (accessed on 1 March 2013).

O’Boyle, N. M., Guha, R., Willighagen, E. L., Adams, S. E., Alvarsson, J., Bradley, J. C., & Murray-Rust, P. (2011). Open data, open source and open standards in chemistry: The blue obelisk five years on. Journal of cheminformatics, 3(1), 1-15.

OECD. (2004, January 30). Science, Technology and Innovation for the 21st Century. Meeting of the OECD Committee for Scientific and Technological Policy at Ministerial Level, 29-30 January 2004

OECD. (2015). Making open science a reality. OECD Science, Technology and Industry Policy Papers No. 25. Paris: OECD. doi:10.1787/5jrs2f963zs1-en

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.

Owens, B. (2016). Montreal institute going ‘open’ to accelerate science. Science, 351(6271), 329-329.

Pampel, H., & Dallmeier-Tiessen, S. (2014). Open research data: From vision to practice. In Opening science (pp. 213-224). Springer International Publishing.

Parker, T. H., Nakagawa, S., & Gurevitch, J. (2016). Promoting transparency in evolutionary biology and ecology. Ecology Letters, 19(7), 726-728.

Perkmann, M., & Schildt, H. (2015). Open data partnerships between firms and universities: The role of boundary organizations. Research Policy, 44(5), 1133-1143.

Perkmann, M., & Spicer, A. (2007). 'Healing the scars of history': Projects, skills and field strategies in institutional entrepreneurship. Organization Studies, 28(7), 1101-1122.

Perkmann, M., & Spicer, A. (2008). How are management fashions institutionalized? The role of institutional work. Human Relations, 61(6), 811-844.

Pisani, E., Aaby, P., Breugelmans, J. G., Carr, D., Groves, T., Helinski, M., ... & Mboup, S. (2016). Beyond open data: realising the health benefits of sharing data. BMJ, 355, i5295.

Piwowar, H. A. (2011). Who shares? Who doesn't? Factors associated with openly archiving raw research data. PloS one, 6(7), e18657

Piwowar, H. A., Becich, M. J., Bilofsky, H., & Crowley, R. S. (2008). Towards a data sharing culture: recommendations for leadership from academic health centers. PLoS medicine, 5(9), e183.

Piwowar, H. A., & Chapman, W. W. (2010). Public sharing of research datasets: a pilot study of associations. Journal of informetrics, 4(2), 148-156.

Piwowar, H. A., Day, R. S., & Fridsma, D. B. (2007). Sharing detailed research data is associated with increased citation rate. PloS one, 2(3), e308.

Piwowar, H. A., & Vision, T. J. (2013). Data reuse and the open data citation advantage. PeerJ, 1, e175.

Pollock, R. (2006) ‘The value of the public domain’, IPPR, http://www.ippr.org/publication/55/1526/the-value-of-the-public-domain (last accessed 13 August 2013).

Poynder, R. (2008a). The Open Access interviews: Peter Murray-Rust. 21 January 2008. Retrieved from http://poynder.blogspot.ca/2008/01/open-access-interviews-peter-murray.html

Poynder, R. (2008b). Peter Murray-Rust and the data-mining robots. Computer Weekly. 05 February 2008 http://www.computerweekly.com/news/2240084902/Peter-Murray-Rust-and-the-data-mining-robots

Poynder, R. (2010). Interview With Jean-Claude Bradley - The Impact of Open Notebook Science. Information Today. 27(8): September 2010 http://www.infotoday.com/it/sep10/Poynder.shtml

Poynder, R. (2012). A New Declaration of Rights: Open Context Mining. 8 June 2008. Retrieved from http://poynder.blogspot.co.uk/2012/06/new-declaration-of-rights-open-content.html

Price, A. (2015). Content Mining of the bioscience literature. The International Network for Knowledge about Wellbeing.

Pryor, G. (2009). Multi-scale data sharing in the life sciences: Some lessons for policy makers. The International Journal of Digital Curation, 4(3), 71–82.

Quint-Rapoport, M. (2010). Open source in higher education: A situational analysis of the open journal systems software project (Doctoral dissertation, University of Toronto).

Quint-Rapoport, M. (2012). Open Source in Higher Education: towards an understanding of networked universities. Policy Futures in Education, 10(3), 315-327.

Rao, H. 1998. Caveat Emptor: The Construction of Nonprofit Consumer Watchdog Organizations. American Journal of Sociology 103:912–61.

Rao, H. and Giorgi, S. (2006). Code breaking: How entrepreneurs exploit cultural logics to generate institutional change. Research in Organization Behavior, 27, 269-304.

Rao, H., Monin, P., & Durand, R. (2003). Institutional Change in Toque Ville: Nouvelle Cuisine as an Identity Movement in French Gastronomy. American journal of sociology, 108(4), 795-843.

Rao, H., Morrill, C., & Zald, M. N. (2000). Power plays: How social movements and collective action create new organizational forms. Research in organizational behavior, 22, 237-281.

Ravven, W. (2013). Wresting New Tricks From a Python: Fernando Pérez Wins 2012 Award for the Advancement of Free Software. UC Berkeley Research News. April 11, 2013. http://vcresearch.berkeley.edu/news/wresting-new-tricks-python-fernando-Pérez-wins-2012-award-advancement-free-software

Raymond, E. (1999). The Cathedral and the Bazaar. Sebastopol, CA: O'Reilly & Associates.

Reichman, O. J., Jones, M. B., & Schildhauer, M. P. (2011). Challenges and opportunities of open data in ecology. Science, 331(6018), 703-705.

Resnik, D. B. (2006). Openness versus Secrecy in Scientific Research Abstract. Episteme (Edinburgh), 2(3), 135.

Richards, J. D., & Winters, J. (2015). Digging into data: Open Access and Open Data. Post-Classical Archaeologies, 285-298.

RIN/NESTA. (2010). Open to All? Case studies of openness in research. http://www.rin.ac.uk/ourwork/data-management-and-curation/open-science-case-studies. Accessed 10 February 2011

Ritter-Guth, B. (2006). Interview with Jean-Claude Bradley. Drexel CoAS E-Learning Transcript. http://drexel-coas-talks-mp3-podcast.blogspot.ca/2006/09/interview-with-jean-claude-bradley.html

Ritvala, T., & Granqvist, N. (2009). Institutional entrepreneurs and local embedding of global scientific ideas—The case of preventing heart disease in Finland. Scandinavian Journal of Management, 25(2), 133-145.

Roese, N. J., & Olson, J.M.. (2014). Counterfactual thinking: A critical overview. N. J. Roese, J. M. Olson, eds., What Might Have Been: The Social Psychology of Counterfactual Thinking. Lawrence Erlbaum Associates, Mahwah NJ, 1–55

Ross, J. S., & Krumholz, H. M. (2013). Ushering in a new era of open science through data sharing: the wall must come down. Jama, 309(13), 1355-1356.

Ross, S. A. (1973). The economic theory of agency: The principal's problem. The American Economic Review, 63(2), 134-139.

Ross, S., Sobotkova, A., Ballsun-Stanton, B., & Crook, P. (2013). Creating eresearch tools for archaeologists: The federated archaeological information management systems project. Australian Archaeology, 77(1), 107-119.

Rossant, C. (2014). IPython interactive computing and visualization cookbook. Packt Publishing Ltd.

Rossi, M. P., Ye, H., Gogotsi, Y., Babu, S., Ndungu, P., & Bradley, J. C. (2004). Environmental scanning electron microscopy study of water in carbon nanopipes. Nano Letters, 4(5), 989-993.

Rowhani-Farid, A., Allen, M., & Barnett, A. G. (2017). What incentives increase data sharing in health and medical research? A systematic review. Research Integrity and Peer Review, 2(1), 4.

Ruef, M., & Lounsbury, M. (2007). Introduction: The sociology of entrepreneurship. Research in the Sociology of Organizations, 25, 1-29.

Russell, M. A. (2013). Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More. O'Reilly Media, Inc.

Ryghaug, M., & Skjølsvold, T. M. (2010). The global warming of climate science: Climategate and the construction of scientific facts. International Studies in the Philosophy of Science, 24(3), 287-307.

Sá, C., & Grieco, J. (2016). Open Data for Science, Policy, and the Public Good. Review of Policy Research, 33(5), 526-543.

Sanderson, K. (2008) Data on display. Nature News, September 2008 http://www.nature.com/news/2008/080915/full/455273a.html

Science Commons 2008. Principles for Open Science http://sciencecommons.org/resources/readingroom/principles-for-open-science/. Accessed 20 February 2011.

Schroeder, R. (2008). e-science as research technologies: Reconfiguring disciplines, globalizing knowledge. Social Science Information, 47(2), 131–157.

Schumpeter, J.A. (1942, 1975). Capitalism, Socialism and Democracy. Harper & Row: New York.

Schweik, C. M., & Grove, J. M. (2000). Fostering open-source research via a world wide web system. Public Administration and Management: An Interactive Journal, 5(3).

Scott, W. R. (1995). Institutions and organizations. Thousand Oaks, CA: Sage.

Scott, W. R. (2001) Institutions and Organizations, 2nd Ed. Foundations for Organizational Science Series, SAGE Publications, Thousand Oaks, CA

Scott, W.R. (2004). Institutional theory: Contributing to a theoretical research programme. In Smith K.G. and Hitt, M.A. (Eds), Great minds in management: The process of theory development, 460–85, Oxford: OUP.

Searle, J. R. (2005). What is an institution. Journal of institutional economics, 1(1), 1-22.

Seo, M. G., & Creed, W. D. (2002). Institutional contradictions, praxis, and institutional change: A dialectical perspective. Academy of management review, 27(2), 222-247.

Shaikh-Lesko, Rina. (2014). Open Science Evangelist Dies. The Scientist. 2014 May 30. http://www.the-scientist.com/?articles.view/articleNo/40105/title/Open-Science-Evangelist-Dies/

Shamoo, A. E., & Resnik, D. B. (2009). Responsible conduct of research. Oxford University Press.

Sheehan, B. (2015). Comparing Digital Archaeological Repositories: tDAR Versus Open Context. Behavioral & Social Sciences Librarian, 34(4), 173-213.

Shen, H. (2014). Interactive notebooks: Sharing the code. Nature, 515(7525), 151-152.

Shneiderman, B. (2008). Copernican challenges face those who suggest that collaboration, not computation are the driving energy for socio-technical systems that characterize Web 2.0. Science, 319(5868), 1349-1350.

Sine, W. D. and David, R. J., (2010). Institutions and Entrepreneurship. Research in the Sociology of Work, v21, Emerald Group Publishing, Bingley, UK

Slocum, M. (2014). IPython creator Fernando Pérez: Surprises from IPython's evolution. Interviewed by Mac Slocum at O’Reilly Media. July 7, 2014-08-01. Retrieved from https://www.youtube.com/watch?v=g8xQRI3E8r8

Smets, M., & Jarzabkowski, P. (2013). Reconstructing institutional complexity in practice: A relational model of institutional work and complexity. Human Relations, 66(10), 1279-1309.

Smets, M., Morris, T. I. M., & Greenwood, R. (2012). From practice to field: A multilevel model of practice-driven institutional change. Academy of Management Journal, 55(4), 877-904.

Smith, R., & Roberts, I. (2016). Time for sharing data to become routine: the seven excuses for not doing so are all invalid. F1000Research, 5.

Spires-Jones, T. L., Poirazi, P., & Grubb, M. S. (2016). Opening Up: open access publishing, data sharing, and how they can influence your neuroscience career. European Journal of Neuroscience, 43(11), 1413-1419.

Spruijt, P., Knol, A. B., Vasileiadou, E., Devilee, J., Lebret, E., & Petersen, A. C. (2014). Roles of scientists as policy advisers on complex issues: a literature review. Environmental Science & Policy, 40, 16-25.

Stake, R. (1995). The art of case study research (pp. 49-68). Thousand Oaks, CA: Sage.

Stevens, J. L. R., Elver, M., & Bednar, J. A. (2013). An automated and reproducible workflow for running and analyzing neural simulations using Lancet and IPython Notebook. Frontiers in neuroinformatics, 7.

Stinchcombe, A. (1965). Social structure and organizations. In J.G. March (Ed), The Handbook of Organizations. Chicago: Rand-McNally.

Storer, N. W. (1966). The social system of science. Holt, Rinehart and Winston.

Stoutenborough, J. W., Bromley-Trujillo, R., & Vedlitz, A. (2015). How to win friends and influence people: Climate scientists’ perspectives on their relationship with and influence on government officials. Journal of Public Policy, 35(02), 269-296.

Strang, D., & Meyer, J. W. (1993). Institutional conditions for diffusion. Theory and society, 22(4), 487-511.

Stuart, T. E., Hoang, H., & Hybels, R. C. (1999). Interorganizational endorsements and the performance of entrepreneurial ventures. Administrative science quarterly, 44(2), 315-349.

Suchman, M. C. (1995). Managing legitimacy: Strategic and institutional approaches. Academy of management review, 20(3), 571-610.

Suddaby, R., & Greenwood, R. (2005). Rhetorical strategies of legitimacy. Administrative science quarterly, 50(1), 35-67.

Suddaby, R. and Viale, T. (2011). ‘Professionals and field-level change: Institutional work and the professional project’. Current Sociology, 59: 423–442.

Susha, I., Grönlund, Å., & Janssen, M. (2015). Organizational measures to stimulate user engagement with open data. Transforming Government: People, Process and Policy, 9(2), 181-206.

Symon, G., Buehring, A., Johnson, P., & Cassell, C. (2008). Positioning qualitative research as resistance to the institutionalization of the academic labour process. Organization Studies, 29(10), 1315-1336.

Tempel, A., & Walgenbach, P. (2007). Global standardization of organizational forms and management practices? What new institutionalism and the business‐systems approach can learn from each other. Journal of Management Studies, 44(1), 1-24.

Tenopir, C., Allard, S., Douglass, K., Aydinoglu, A. U., Wu, L., Read, E., ... & Frame, M. (2011). Data sharing by scientists: practices and perceptions. PloS one, 6(6), e21101.

Tenopir, C., Dalton, E. D., Allard, S., Frame, M., Pjesivac, I., Birch, B., ... & Dorsett, K. (2015). Changes in data sharing and data reuse practices and perceptions among scientists worldwide. PloS one, 10(8), e0134826.

Thamali, R. J. K. A., Kumbalatara, A. A., Liyanage, D. D., Ukwatta, A., Hewage, J., & Witharana, S. (2015). Modelling and Understanding of Highly Energy Efficient Fluids. arXiv preprint arXiv:1512.09242.

Thompson, N.A. (2013). Entrepreneurial assemblage, institutional complexity, and the creation of new organizational forms. Paper to be presented at the 35th DRUID Celebration Conference 2013, Barcelona Spain, June 17-19 2013.

Thompson, N. A., Herrmann, A. M., & Hekkert, M. P. (2015). How sustainable entrepreneurs engage in institutional change: insights from biomass torrefaction in the Netherlands. Journal of Cleaner Production, 106, 608-618.

Thornton, P. H., & Ocasio, W. (1999). Institutional logics and the historical contingency of power in organizations: Executive succession in the higher education publishing industry, 1958–1990. American Journal of Sociology, 105(3), 801–843.

Thornton, P.H., and Ocasio, W. (2008). Institutional logics. In R. Greenwood, C. Oliver, R. Suddaby, and K. Sahlin (Eds.), Handbook of organizational institutionalism (pp. 99–129). Thousand Oaks, CA: Sage.

Thornton P.H., Ocasio W., and Lounsbury M. (2012). The Institutional Logics Perspective: A New Approach to Culture, Structure, and Process. New York: Oxford University Press

Tolbert, P. S., David, R. J., and Sine, W. D. (2011). Studying choice and change: The intersection of institutional theory and entrepreneurship research. Organization Science, 22(5): 1332-1344

Tolbert, P. S., and Zucker, L. G. (1983). Institutional sources of change in the formal structure of organizations: The diffusion of civil service reform, 1880–1935. Administrative Science Quarterly, 28(1), 22–39.

Tolbert, P. A., & Zucker, L. G. (1996). The institutionalization of institutional theory in Handbook of organizational studies. SR Clegg, C. Hardy, and WR Nord (eds.), 175-190.

Tracey, P., Phillips, N., & Jarvis, O. (2011). Bridging institutional entrepreneurship and the creation of new organizational forms: A multilevel model. Organization Science, 22(1), 60-80.

Trench, B. (2012). Scientists’ blogs: Glimpses behind the scenes. In The Sciences’ Media Connection–Public Communication and Its Repercussions (pp. 273-289). Springer Netherlands.

Treloar, A. (2014). The research data alliance: Globally co-ordinated action against barriers to data publishing and sharing. Learned Publishing, 27(5), 9-13.

Udell, J. (2008). A conversation with Jean-Claude Bradley about open notebook science and the educational uses of Second Life. http://blog.jonudell.net/2008/06/24/a-conversation-with-jean-claude-bradley-about-open-notebook-science-and-the-educational-uses-of-second-life/

Uhlir, P. F., & Schröder, P. (2007). Open data for global science. Data Science Journal, 6(0), OD36-OD53.

Väänänen, I., & Peltonen, K. (2016). Promoting Open Science and Research in Higher Education: A Finnish Perspective. In Blessinger, P., & Bliss, T. J. (Eds.). Open Education: International Perspectives in Higher Education. Open Book Publishers. p. 268.

Veletsianos, G. (2015). A case study of scholars’ open and sharing practices. Open Praxis, 7(3), 199-209.

Vincent-Lamarre, P., Boivin, J., Gargouri, Y., Larivière, V., & Harnad, S. (2014). Estimating open access mandate effectiveness. The MELIBEA Score. (Submitted) http://eprints.soton.ac.uk/370203.

Vickers, A. J. (2006). Whose data set is it anyway? Sharing raw data from randomized trials. Trials, 7(1), 15.

Vines, T. H., Andrew, R. L., Bock, D. G., Franklin, M. T., Gilbert, K. J., Kane, N. C., ... & Veen, T. (2013). Mandated data archiving greatly improves access to research data. The FASEB journal, 27(4), 1304-1308.

Wald, C. (2010) Scientists Embrace Openness. Science. April 0, 2010 http://www.sciencemag.org/careers/2010/04/scientists-embrace-openness

Waldrop, M.M. (2008). Science 2.0 -- Is Open Access Science the Future? Scientific American (January 9, 2008).

Watson, T. J. (2013). Entrepreneurship in action: bringing together the individual, organizational and institutional dimensions of entrepreneurial action. Entrepreneurship & Regional Development, 25(5-6), 404-422.

Watson, M. (2015). When will ‘open science’ become simply ‘science’? Genome biology, 16(1), 101.

Wagner, C. S. (2009). The new invisible college: Science for development. Brookings Institution Press.

Wagner, C. S., & Leydesdorff, L. (2005). Network structure, self-organization, and the growth of international collaboration in science. Research policy, 34(10), 1608-1618.

Wells, T. N., Willis, P., Burrows, J. N., & van Huijsduijnen, R. H. (2016). Open data in drug discovery and development: lessons from malaria. Nature Reviews Drug Discovery.

Westenholz, A. (2009). Institutional entrepreneurs performing in meaning arenas: Transgressing institutional logics in two organizational fields. Research in the Sociology of Organizations, 27, 283-311.

Whyte, A., & Pryor, G. (2011). Open science in practice: Researcher perspectives and participation. International Journal of Digital Curation, 6(1), 199-213.

Wicks, D. (2001). Institutionalized mindsets of invulnerability: Differentiated institutional fields and the antecedents of organizational crisis. Organization studies, 22(4), 659-692.

Wijen, F., & Ansari, S. (2007). Overcoming inaction through collective institutional entrepreneurship: Insights from regime theory. Organization studies, 28(7), 1079-1100.

Wilhelm, E. E., Oster, E., & Shoulson, I. (2014). Approaches and costs for sharing clinical research data. Jama, 311(12), 1201-1202.

Williams, A. J. (2008). Internet-based tools for communication and collaboration in chemistry. Drug discovery today, 13(11), 502-506.

Willinsky, J. (2006). The access principle: The case for open access to research and scholarship. Cambridge, Mass.: MIT Press.

Wilson, A. T., & Edwards, B. (Eds.). (2015). Open Source Archaeology: Ethics and Practice. Walter de Gruyter GmbH & Co KG.

Yin, R. (1984/1989), 'Case Study Research, Design and Methods', Beverly Hills, CA, Sage.

Yin, R.K. (2009). Case study research: Design and method (4th ed.). Thousand Oaks, CA: Sage.

Zilber, T.B. (2007). Stories and the Discursive Dynamics of Institutional Entrepreneurship: The Case of Israeli High-tech after the Bubble. Organizational Studies, 28(7): 1035-1054.

Zilber, T. (2013). Institutional logics and institutional work: Can they be agreed? In M. Lounsbury and E. Boxenbaum (Eds.), Institutional logics in action, Part A: Research in the Sociology of Organizations, 39a: 77–96. Bingley, UK: Emerald Group.

Ziman, J. (2000). Real science: What it is and what it means. Cambridge University Press.

Appendix 1: Informed Consent Letter

OISE letterhead

[Date]

Dear ______,

I am writing to invite you to participate in a multiple-case research study regarding Open Science researchers. This letter provides information for you to decide whether you wish to participate.

The purpose of the study is to determine and describe the institutional work undertaken by scientists: their strategies to disseminate research results within an open science paradigm, implementing and establishing new forms of data and research project sharing. Sharing of data and research processes amongst scientists in Open Science organizational forms is a little-studied area, yet understanding how such institutional change occurs is critical for the success of Open Science and related policies. The five scientists participating in the study will be selected from a pool of scientists who have practiced and championed open science research data publication and who have institutionalized open science practices. I am considering these scientists as institutional entrepreneurs in that they have an interest in particular institutional arrangements and they leverage resources to create new organizations or to transform existing ones.

This study will be carried out under the supervision of Professor Creso Sá, Department of Leadership, Higher and Adult Education, at the Ontario Institute for Studies in Education, University of Toronto. The data is being collected for the purposes of a PhD thesis and perhaps for subsequent research articles.

The study will entail a one-hour interview over the telephone or Skype, at no cost to yourself. During the interview you will be asked questions about the approaches and strategies you used to establish your open science initiative [insert here the particular initiative, e.g. UsefulChem], what your strategies were to establish this initiative both within and external to your organization, how you considered and overcame, or not, obstacles, and whether there was a timing or sequence to your strategies. As the interview proceeds, I may ask questions for clarification or further understanding, but my part will be mainly to listen to you speak about your views, experiences, and the ways you approached developing and establishing your open science initiative. During the interview, I will write brief notes that will be used to assist me in considering further questions or items that I want to highlight for my own consideration after the interview.

It is the intention that each interview will be digitally audio taped and later transcribed to paper; you have the choice of declining to have the interview taped. Your transcript will be sent to you to read in order for you to add any further information or to correct any misinterpretations that could result. The information obtained in the interview will be stored at a secure location as an encrypted file. All raw data (i.e. transcripts, field notes) will be destroyed five years after the completion of the study.

You may at any time refuse to answer a question or withdraw from the interview process. You may request that any information, whether in written form or audiotape, be eliminated from the project. At no time will value judgments be placed on your responses nor will any evaluation be made of your effectiveness as a principal. Finally, you are free to ask any questions about the research and your involvement with it and may request a summary of the findings of the study. I would be happy to share my findings with you after the research is completed.

The risks of participating are limited to the possibility that you may feel uncomfortable with some of the questions asked during the interviews. Should you feel uncomfortable with anything asked of you at any time during the interview or subsequent follow-up, you may end your participation with no penalty to yourself.

The expected benefits associated with your participation are the information about the experiences of open science entrepreneurs and the opportunity to participate in a qualitative research study.

If you have any questions, please feel free to contact me at (416) 801-2827 or at [email protected]. You may also contact my supervisor, Prof. Sá at (416) 923-6641. Finally, you may also contact the U of T Office of Research Ethics for questions about your rights as a research participant at [email protected] or 416-946-3273.

Thank you in advance for your participation.

Helen Lasthiotakis Prof. Creso Sá PhD Candidate Theory and Policy Studies in Education Theory and Policy Studies in Education OISE OISE University of Toronto University of Toronto Tel: 416.978.1206 Tel: 416.801.2827 Email: [email protected] Email: [email protected]

By signing below, you are indicating that you are willing to participate in the study, you have received a copy of this letter, and you are fully aware of the conditions above.

Name: ______ Institution: ______ Signed: ______ Date: ______

Please initial if you would like a summary of the findings of the study upon completion: _____

Please initial if you agree to have your interview audio taped: _____

Please keep a copy of this form for your records.

Appendix 2: Participant Consent Letter

OISE ONTARIO INSTITUTE FOR STUDIES IN EDUCATION UNIVERSITY OF TORONTO

March 16, 2014

To the participants of the study,

The purpose of the present study is to consider Open Science researchers as institutional entrepreneurs and to determine their strategies to establish new forms of data and research process sharing. Your participation would entail a one-hour interview over the telephone or Skype, at no cost to yourself. Your participation would make an invaluable contribution to this study, which will develop a better conceptualization of the institutional work undertaken by Open Science researchers to establish their initiatives within their organizations and within their disciplinary community.

This study will be carried out by phone and Skype under the supervision of Professor Creso Sá, Department of Leadership, Higher and Adult Education, Ontario Institute for Studies in Education/University of Toronto. The data is being collected for the purposes of a PhD thesis and perhaps for subsequent research articles.

Participation in this study will involve you answering 8 open-ended questions about your strategies to establish your open science initiative. The interview will be conducted by telephone or Skype and will take approximately one to one and a half hours. During the interview you will be asked questions about your strategies to establish your open science initiative both within and external to your organization, and how you considered and overcame, or not, obstacles. As the interview proceeds, I may ask questions for clarification or further understanding, but my part will be mainly to listen to you speak about your strategies. I will also ask you for any relevant publicly available documents, published documents and reports, and institutional and organizational documents that you consider may be relevant to this study.

It is the intention that each interview will be audio taped and later transcribed to paper; you have the choice of declining to have the interview taped. Your transcript will be sent to you to read in order for you to add any further information or to correct any misinterpretations that could result. All data collected will be used for the purposes of a PhD thesis and perhaps for subsequent research articles. By participating in the study, your identity will be presented in the thesis and possible subsequent publications. As the main selection criterion for the participants is that they practice open science, this lack of anonymity is not expected to be an issue as participants have provided numerous interviews regarding their opinions and role in promoting open science. All raw data (i.e. transcripts, field notes) will be destroyed five years after the completion of the study.

The research will be carried out in accordance with the University of Toronto ethical standards for research. Given the nature of open science, researchers taking part in the study will be identified. You may at any time during the interview refuse to answer a question or withdraw from the interview process. A specific explanation is not required. You may request that any information, whether in written form or audiotape, be eliminated from the project at any time up to the point that you approve of the interview transcript. At no time will value judgments be placed on your responses. Finally, you are free to ask any questions about the research and your involvement with it and may request a summary of the findings of the study.

If you have any questions, please feel free to contact me at (416) 801-2827 or at [email protected]. You may also contact my supervisor, Prof. Sá at (416) 923-1206 / [email protected]. Finally, you may also contact the U of T Office of Research Ethics for questions about your rights as a research participant at [email protected] or 416-946-3273.

Thank you in advance for your participation.

Sincerely,

Helen Lasthiotakis Prof. Creso Sá PhD Candidate Leadership, Higher and Adult Education Leadership, Higher and Adult Education OISE OISE University of Toronto University of Toronto Tel: 416.978.1206 Tel: 416.801.2827 Email: [email protected] Email: [email protected]

Appendix 3: List of Analyzed Documents

Jean-Claude Bradley

| Year | Source | Type | Main Level |
| --- | --- | --- | --- |
| 1997 | Bradley, Jean-Claude. 1997. Creating electrical contacts between metal particles using directed electrochemical growth. Nature. 389, 268-271. http://www.nature.com/nature/journal/v389/n6648/abs/389268a0.html | Participant – article/paper (nano) | Meso |
| 2004 | M. Pía Rossi, Haihui Ye, Yury Gogotsi, Sundar Babu, Patrick Ndungu, and Jean-Claude Bradley. Environmental Scanning Electron Microscopy Study of Water in Carbon Nanopipes. Nano Letters, 2004, 4 (5), pp 989–993. DOI: 10.1021/nl049688u | Participant – article/paper (nano) | Meso |
| 2005 | Guzeliya Korneva, Haihui Ye, Yury Gogotsi, Derek Halverson, Gary Friedman, Jean-Claude Bradley, and Konstantin G. Kornev. Korneva, C 2005. Carbon Nanotubes Loaded with Magnetic Particles. Nano Lett., 2005, 5 (5), pp 879–884. DOI: 10.1021/nl0502928 http://pubs.acs.org/doi/abs/10.1021/nl0502928 | Participant – article/paper (nano) | Meso |
| 2005-12 | UsefulChem blog http://UsefulChem.blogspot.ca This blog chronicles the research of the UsefulChem project in the Bradley lab at Drexel University. The main project currently involves the synthesis of novel anti-malarial compounds. The work is done under Open Notebook Science conditions with the actual detailed lab notebook located at UsefulChem.wikispaces.com. More general comments posted here relate to Open Science, especially when associated with chemistry. http://usefulchem.wikispaces.com/Jean-Claude+Bradley | Participant – Blog | All |
| 2006 | Ritter-Guth, B. (2006). Interview with Jean-Claude Bradley. Drexel CoAS E-Learning Transcript. Coining of term. http://drexel-coas-talks-mp3-podcast.blogspot.ca/2006/09/interview-with-jean-claude-bradley.html | Interview/article | Micro |
| 2006 | Bradley, D. (2006). Jean-Claude Bradley Drexel University and blogmaster of usefulchem.blogspot.com. Chemistry Magazine http://www.reactivereports.com/51/51_0.html | Interview/article | Meso |
| 2007-08 | Bradley on Science 2.0 http://www.science20.com/chemistry_wide_open/crowds_solubility_and_future_organic_chemistry_0 | Participant – Blog | Meso |
| 2007 | Bradley, J.C. 2007. Open Notebook Science using Blogs and Wikis. Nature Precedings. Presented at the American Chemical Society, 27 March 2007 http://precedings.nature.com/documents/39/version/1/files/npre200739-1.pdf | Participant – Blog | Meso |
| 2007 | Zivkovic, B., Bradley, J. C., Stemwedel, J., Edwards, P., & Vaughan, K. T. L. (2007). Opening science to all: Implications of blogs and wikis for social and scholarly scientific communication. Proceedings of the American Society for Information Science and Technology, 44(1), 1-3. http://onlinelibrary.wiley.com/doi/10.1002/meet.1450440122/epdf | Participant – article/paper (openness) | Meso/Macro |
| 2007 | Turner, B. (2007). The Pursuit of Automation: Open Notebook Science. The Per Contra Interview with Jean-Claude Bradley. Per Contra. Summer 2007. http://www.percontra.net/archive/7bradley.htm | Interview/article | Macro |
| 2008 | Bradley, JC. Curriculum Vitae - http://openwetware.org/images/d/d6/BradleyCVshort.doc | | |
| 2008 | Bradley, J.C. (2008, November 11). iSchool Open Notebook Science Talk. Drexel CoAS E-Learning Transcript. http://drexel-coas-elearning-transcripts.blogspot.ca/ | Interview | Macro |
| 2008 | Bradley, J.C. (2008, July 10). How should Open Notebook Science be used? (Web log comment) Retrieved from http://usefulchem.blogspot.no/2008/07/how-should-open-notebook-science-be.html | Participant – article/paper (openness) | Macro |
| 2008 | Bradley, J. C., Owens, K., & Williams, A. (2008). Chemistry crowdsourcing and open notebook science. Nature Precedings. | Participant – article/paper (openness) | Meso |
| 2008 | Waldrop, M. M. (2008). Science 2.0. Scientific American, 298(5), 68-73. | Interview/article | Macro |
| 2008 | Udell, J. (2008 June 24). A conversation with Jean-Claude Bradley about open notebook science and the educational uses of Second Life. Interviews with Innovators. Udell Blog Podcast. Podcast retrieved from http://blog.jonudell.net/2008/06/24/a-conversation-with-jean-claude-bradley-about-open-notebook-science-and-the-educational-uses-of-second-life/ | Interview/article | Macro |
| 2008 | Bradley, J. C. (2008). Open Notebook Science: Implications for the Future of Libraries. Presentation to the University of British Columbia Library School http://eprints.rclis.org/11362/1/UBC08.pdf | Participant – Presentation | Macro |
| 2008 | Bradley, JC. 2008 Open Notebook Science in 15 minutes. Drexel Mini-Symposium. 10 October 2008. http://www.youtube.com/watch?v=_LE36oSy8n0 | Participant – Presentation | Meso/Macro |
| 2008 | Bradley, J. C., Rosenthal, P., Guha, R., Mirza, K., & Gut, J. (2008). Open Notebook Science–Falcipain-2 Preliminary Results. Nature Precedings | Participant – Presentation | Macro |
| 2008 | Sanderson, K. (2008). Data on display. Nature News http://www.nature.com/news/2008/080915/full/455273a.html | Interview/article | Macro |
| 2008 | Williams, A. J. (2008). Internet-based tools for communication and collaboration in chemistry. Drug discovery today, 13(11), 502-506. | Interview/article | Meso |
| 2008 | Coturnix. 2008. Doing science publicly: Interview with Jean-Claude Bradley. ScienceBlogs. http://scienceblogs.com/clock/2008/05/23/doing-science-publicly-intervi/ | Interview/article | Macro |
| 2009 | Bradley, J. (2009). The learning revolution. Nature, 457(7226), 151-152 | Participant – article/paper (openness) | Macro |
| 2009 | Drahl, C. 2009. Jean-Claude Bradley. Chemical & Engineering News http://cen.acs.org/articles/87/i6/Jean-Claude-Bradley.html | Interview/article | Meso |
| 2009 | Bradley, J.C., Guha, R., Lang, A., Lindenbaum, P., Neylon, C., Williams, A., & Willighagen, E. (2009). Beautifying data in the real world. In Beautiful Data: The Stories Behind Elegant Data Solutions, p. 259-278. Sebastopol, US: O’Reilly Media, Inc. | Participant – article/paper (openness) | Macro |
| 2009 | Segaran, T., & Hammerbacher, J. (2009). Beautiful data: the stories behind elegant data solutions. O'Reilly Media, Inc. Chicago | Interview/article | Macro |
| 2009 | Bradley, J. C., Lancashire, R. J., Lang, A. S., & Williams, A. J. (2009). The spectral game: leveraging Open Data and crowdsourcing for education. Journal of cheminformatics, 1(1), 1-10 | Participant – article/paper (openness) | Meso |
| 2009 | Bradley, J. C., Neylon, C., Williams, A., Guha, R., Hooker, B., Lang, A. S., ... & Truong, H. (2009). Open notebook science challenge: Solubilities of organic compounds in organic solvents. ONS Books. | Participant – article/paper (openness) | Meso |
| 2009 | Lang, A., & Bradley, J. C. (2009). Chemistry in second life. Chemistry Central Journal, 3(1), 14. | Participant – article/paper (openness) | Meso |
| 2009 | Bradley, J.C. 2009. Open Notebook Science. NASA Goddard Space Flight Centre. http://istcolloq.gsfc.nasa.gov/spring2009/presentations/bradley.pdf | Participant – Presentation | Macro |
| 2009 | Bradley, JC, Canton, B., and Zivkovic, B. (2009). Open Science: Good For Research, Good For Researchers? Panel at Columbia University, New York. August 10, 2010. https://www.youtube.com/watch?v=Zh7wzv6Oauc | Participant – Presentation | Macro |
| 2009 | Brumfiel, G. (2009). Breaking the convention? Nature, 459(7250), 1050-1051. | Interview/article | Macro |
| 2009 | Leman, H. (2009). Open Notebook Science: Interview with Jean-Claude Bradley. Next Generation Science. February 26, 2009. http://archive.today/u7X1v | Interview/article | Macro |
| 2009 | Nielsen, M. (2009). Information awakening. Nature Physics, 5(4), 238-240 | Interview/article | Macro |
| 2010 | Wald, C. (2010). Scientists Embrace Openness. Science. April 9, 2010. http://sciencecareers.sciencemag.org/career_magazine/previous_issues/articles/2010_04_09/caredit.a1000036 | Interview/article | Macro |
| 2010 | Poynder, R. 2010. Interview with Jean-Claude Bradley - The Impact of Open Notebook Science. Information Today. 27(8), Sept. 2010 http://www.infotoday.com/it/sep10/Poynder.shtml | Interview/article | Macro |
| 2010 | Bradley, J. C., Mirza, K., Lang, A., Bohinski, T., Bulger, D., Merchant, A., ... & Shah, M. (2010). Reaction Attempts Edition 1: the UsefulChem Project. | Participant – article/paper (chemistry) | Meso |
| 2010 | Stafford, N. (2010). Science in the digital age. Nature, 467(7317), S19-S21. | Interview/article by other | Macro |
| 2011 | Bradley JC., Curtin, E., Lang, A., and Williams, A. (2011) Open Notebook Science Melting Point Data – First Edition. ONSBooks. Retrieved from http://usefulchem.blogspot.com/2010/04/reaction-attempts-book-edition-1-and.html | Participant – article/paper (chemistry) | Meso |
| 2011 | Bradley, J.-C., Lang, A.S.I.D., Koch, S. & Neylon, C. (2011). Collaboration using open notebook science in academia. Collaborative Computational Technologies for Biomedical Research (eds S. Ekins, M.A.Z. Hupcey & A.J. Williams), pp. 425–452. John Wiley & Sons, Hoboken, NJ | Participant – article/paper (chemistry) | Macro |
| 2011 | Nielsen, M. (2011) Reinventing discovery: the new era of networked science. Princeton University Press. | Interview/article | Macro |
| 2011 | O’Boyle, N. M., Guha, R., Willighagen, E. L., Adams, S. E., Alvarsson, J., Bradley, J. C., & Murray-Rust, P. (2011). Open data, open source and open standards in chemistry: The Blue Obelisk five years on. Journal of cheminformatics, 3(1), 1-15. | Participant – article/paper (openness/chemistry) | Meso |
| 2013 | Bradley, J.C. 2013. Opening and sharing. Chemistry World. http://www.rsc.org/chemistryworld/2013/04/open-science-chemistry-sharing-information | Participant – Op-Ed (openness) | Meso |
| 2013 | Bradley, JC. 2013. Open Education. UDelaware Tech Talk. http://www.youtube.com/watch?v=BN8UjULNG9A | Participant – Presentation | Macro |
| 2013 | Bird, C. L., Willoughby, C., & Frey, J. G. (2013). Laboratory notebooks in the digital era: the role of ELNs in record keeping for chemistry and other sciences. Chemical Society Reviews, 42(20), 8157-8175. http://pubs.rsc.org/en/content/articlehtml/2013/cs/c3cs60122f | Interview/article | Meso |
| 2014 | Bohle, S. (2014). A Four Part Series on Open Notebook Science. SciLogs. January 16, 2014. http://www.scilogs.com/scientific_and_medical_libraries/a-four-part-series-on-open-notebook-science-part-three/ | Interview/article | Macro |

Daniel Gezelter

| Year | Source | Type | Main Level |
| --- | --- | --- | --- |
| 2004-14 | Gezelter, J.D. The OpenScience Project http://www.openscience.org/blog/ | Participant – Blog | Micro/Meso |
| 1999 | Gezelter, J.D. (1999). Catalyzing Open Source Development in Science: The Open Science Project, 19 March 2000, http://www.openscience.org/talks/bnl/OSOS.pdf | Participant – Blog | Micro/Meso |
| 1999 | Gezelter, D. (1999, October). Catalyzing Open Source Development in Science: The OpenScience Project. In OpenSource/OpenScience conference, Brookhaven National Laboratory, Upton, NY, October (Vol. 2). | Participant – Presentation | Meso |
| 1999 | Gezelter, J.D. (1999). Catalyzing Open Source Development in Science: The Open Science Project, 19 Mar 2000, | Participant – Presentation | Macro |
| 2000 | Gezelter, J.D. (2000). 3 March 2000 | | |
| 2000 | Schweik, C. M., & Grove, J. M. (2000). Fostering open-source research via a world wide web system. Public Administration and Management: An Interactive Journal, 5(3). This is why chemist Dan Gezelter (1999, 2000) and others (see Wilson, 1999) are calling for open-source programming in science, so others have all the information necessary to test and replicate a scientific endeavor. | Interview/article | Macro |
| 2010 | Funnell, A. (2010). Open Science. Radio ABC Australia – Future Tense. 4 February 2010. Retrieved from http://www.abc.net.au/radionational/programs/futuretense/open-science/3100152#transcript | Interview/article | Macro |
| 2011 | Gezelter, J. D. (2011). Open Science and Verifiability. The Open Science Project, 5. http://web.stanford.edu/~vcs/Nov21/dg-OpenScienceandVerifiability.pdf | Participant – article/paper (openness) | Macro |
| 2015 | Gezelter, J.D. (2015). Open Source and Open Data Should be Standard Practices, J. Phys. Chem. Lett. 6 (7), pp. 1168-1169. DOI: 10.1021/acs.jpclett.5b00285 | Participant – article/paper (openness) | Meso |

Eric Kansa

| Year | Source | Type | Main Level |
| --- | --- | --- | --- |
| 2005-12 | http://www.alexandriaarchive.org/blog/? | Blog | Micro/Meso |
| 2005 | Kansa, E. (2005). A community approach to data integration: Authorship and building meaningful links across diverse archaeological data sets. Geosphere, 1(2), 97-109. | Participant – article/paper (openness) | Meso |
| 2005 | Kansa, E.C., Schultz, J., and Bissell, A.N. (2005). Protecting Traditional Knowledge and Expanding Access to Scientific Data: Juxtaposing Intellectual Property Agendas via a “Some Rights Reserved” Model. International Journal of Cultural Property. 12(3):285–314. | Participant – article/paper (openness) | Meso |
| 2007 | Kansa, S.W., Kansa, E.C., & Schultz, J. M. (2007). An Open Context for Near Eastern Archaeology. Near Eastern Archaeology, 70(4):188-194. | Participant – article/paper (openness) | Meso |
| 2007 | Kansa, E.C. (2007). An Open Context for Small-scale Field Science Data. Proceedings of the IATUL Conferences. Paper 12. | Participant – article/paper (openness) | Meso |
| 2007 | Kansa, E.C. (2007) Publishing Primary Data on the World Wide Web: Opencontext.org and an Open Future for the Past. Technical Briefs in Historical Archaeology, 2(1):1-11. | Participant – article/paper (openness) | Meso |
| 2009 | Weier, M. H. (2009). Transparency 2.0. InformationWeek, (1230), 54-60. | | Macro |
| 2010 | Kansa, E., Kansa, S.W., Burton, M.M., and Stankowski, C. (2010) Googling the Grey: Open Data, Web Services, and Semantics. Archaeologies, Journal of the World Archaeological Congress 6(2):301-326. | Participant – article/paper (openness) | Meso |
| 2011 | Matei, A., Kansa, S., Kansa, E., & Rauh, N. (2011). The Visible Past/Open Context Loosely Coupled Model for Digital Humanities Ubiquitous Collaboration and Publishing: Collaborating Across Print, Mobile, and Online Media. Spaces & Flows: An International Journal of Urban & Extra Urban Studies, 1(3). | Article/paper on openness | Meso |
| 2011 | Kansa, E.C. and Kansa, S.W. (2011). Enhancing Humanities Research Productivity in a Collaborative Data Sharing Environment. White Paper to the NEH Division of Preservation and Access Advancing Knowledge: The IMLS/NEH Digital Partnership. Accessed June 11, 2014 http://ux.opencontext.org/blog/wp-content/uploads/2011/06/white_paper_PK_50072.pdf | Participant – article/paper (openness) | Meso |
| 2011 | Kansa, E. C., & Kansa, S. (2011). Toward a do-it-yourself cyberinfrastructure: open data, incentives, and reducing costs and complexities of data sharing. Archaeology 2.0: New Approaches to Communication and Collaboration, 57-91. Edited by E.C. Kansa, S.W. Kansa, and E. Watrall. Cotsen Institute of Archaeology Press: Los Angeles, CA. | Participant – article/paper (openness) | Meso |
| 2011 | Kansa, E. C., & Kansa, S. (2011). Toward a do-it-yourself cyberinfrastructure: open data, incentives, and reducing costs and complexities of data sharing. Archaeology 2.0: New Approaches to Communication and Collaboration, 57-91. | Participant – article/paper (openness) | Meso |
| 2011 | Kansa, E., Kansa, S.W. and Watrall, E., eds. (2011). Archaeology 2.0: New Approaches to Communication and Collaboration. Cotsen Digital Archaeology Series Volume 1. Cotsen Institute of Archaeology: UC Los Angeles. | Participant – article/paper (openness) | Meso |
| 2012 | Kansa, E. (2012). Openness and archaeology’s information ecosystem. World Archaeology, 44(4), 498–520. doi:10.1080/00438243.2012.737575 | Participant – article/paper (openness) | Meso |
| 2012 | Kansa, E.C. (2012) – Blog notes response to Office of Science and Technology Policy (OSTP) recently issued a Request for Information welcoming comments and recommendations for ensuring long-term stewardship of, and broad public access to, digital data resulting from federally funded research. http://ux.opencontext.org/blog/2012/01/12/our-office-of-science-and-technology-policy-recommendations/#more-910 | Participant – presentation | Macro |
| 2013 | Kansa, E. (2013, December 11). It’s the Neoliberalism, Stupid: Why instrumentalist arguments for Open Access, Open Data, and Open Science are not enough. Digging Digitally. Retrieved from http://www.alexandriaarchive.org/blog/?m=201312 | Participant – article/paper (openness) | Meso |
| 2013 | Kansa, E.C., & Kansa, S.W. (2013). We all know that a 14 is a sheep: Data publication and professionalism in archaeological communication. Journal of Eastern Mediterranean Archaeology and Heritage Studies, 1(1), 88–97. doi:10.5325/jeasmedarcherstu.1.1.0088 | Participant – article/paper (openness) | Meso |
| 2013 | Kansa, E.C. (digital futures) September 12, 2013. A More Open Future for the Past: Publishing, Data, and Archaeology. http://www.youtube.com/watch?v=trJPbqEKEhE Harvard University digital humanities consortium, Digital Futures, presents its inaugural guest speaker in a series of talks regarding digital scholarship | Participant – presentation | Meso |
| | Kansa, E.C. Publish or Perish Conference: Beyond Journals & New Forms of Digital Publishing http://icis.ucdavis.edu/?page_id=22 http://icis.ucdavis.edu/?page_id=187 | Participant – presentation | Meso |
| 2013 | Kansa, E.C. (2013) Reimagining Archaeological Publication for the 21st Century. Conference Abstract 2013. http://www.caa2013.org/drupal/speakers | Participant – presentation | Meso |
| 2013 | Ellson, M. (2013). The Profiler: Open Context’s Eric Kansa. The Alamedan. June 24, 2013. | Interview/article | Meso |
| 2014 | Kansa, E., Kansa, S.W. and Arbuckle, B. (2014) Publishing and Pushing: Mixing Models for Communicating Research Data in Archaeology. International Journal of Digital Curation 9(1):57-70. | Participant – article/paper (openness) | Meso |
| 2015 | ArchaeoWebby (2015) Open Context and Data Sharing with the Kansas - Episode 6. February 9, 2015 http://www.archaeologypodcastnetwork.com/archaeotech/6 | Interview/article | Meso/Macro |

Peter Murray-Rust


| Year | Source | Type | Main Level |
| --- | --- | --- | --- |
| 1998 | Murray-Rust, P. (1998). The globalization of crystallographic knowledge. Acta Crystallographica Section D: Biological Crystallography, 54(6), 1065-1070. | Participant – article/paper (openness) | Meso |
| 1999 | Peter Murray-Rust and Henry S Rzepa. Chemical markup, XML, and the Worldwide Web. 1. Basic principles. Journal of Chemical Information and Computer Sciences, 39(6):928–942, 1999. | Participant – article/paper (openness) | Meso |
| 2003 | Murray-Rust, P., Glen, R. C., Rzepa, H. S., Stewart, J. J. P., Townsend, J. A., Willighagen, E. L., & Zhang, Y. (2003). A semantic GRID for molecular science. UK e-Science All Hands Meeting, Nottingham, UK, 2-4 September 2003. Retrieved from: http://www.nesc.ac.uk/events/ahm2003/AHMCD/pdf/157.pdf | Participant – Presentation | Meso |
| 2005 | Murray-Rust, P. (2005) The Blue Obelisk. CDK News, 2:43-46. Retrieved from http://superb-dca2.dl.sourceforge.net/project/cdk/CDK%20News/2_2/cdknews2.2.pdf | Participant – article/paper (openness) | Meso |
| 2006-2016 | http://blogs.ch.cam.ac.uk/pmr/ | Blog | Micro/Meso/Macro |
| 2008 | Murray-Rust, P. (2008). Open data in science. Nature. Serials Review, 34(1), 52-64. | Participant – article/paper (openness) | Meso/Macro |
| 2008 | Murray-Rust, P. (2008). Chemistry for everyone. Nature, 451(7179), 648-651. | Participant – article/paper (openness) | Meso |
| 2008 | Poynder, R. (2008). The Open Access interviews: Peter Murray-Rust. 21 January 2008. Retrieved from http://poynder.blogspot.ca/2008/01/open-access-interviews-peter-murray.html | Interview/article | Meso/Macro |
| 2010 | Murray-Rust, P. (2010). Peter Murray-Rust on Open Data. Faculty of 1000. 16 November 2008. Retrieved from https://www.youtube.com/watch?v=DIOeNCrBYk4 | Interview/article | Meso |
| 2011 | Murray-Rust, P. (2011). Semantic science and its communication-a personal view. Journal of Cheminformatics, 3(1), 48. | Participant – article/paper (openness) | Meso |
| 2011 | Murray-Rust, P., Adams, S. E., Downing, J., Townsend, J., & Zhang, Y. (2011). The semantic architecture of the World-Wide Molecular Matrix (WWMM). Journal of Cheminformatics, 3, 42. | Participant – article/paper (openness) | Meso |
| 2011 | O'Boyle, N. M., Guha, R., Willighagen, E. L., Adams, S. E., Alvarsson, J., Bradley, J. C., ... & Murray-Rust, P. (2011). Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk five years on. J. Cheminformatics, 3, 37. | Participant – article/paper (openness) | Meso |
| 2011 | Murray-Rust, P. (2011, March). Open Data and the Panton Principles. In ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY (Vol. 241). 1155 16TH ST, NW, WASHINGTON, DC 20036 USA: AMER CHEMICAL SOC. | Participant – article/paper (openness) | Meso |
| 2011 | Jump, P. (2011). RCUK and HEFCE step up push for open access. Times Higher Education. May 26, 2011 http://www.timeshighereducation.co.uk/news/rcuk-and-hefce-step-up-push-for-open-access/416334.article | Interview/article | Meso/Macro |
| 2012 | Poynder, R. (2012). A New Declaration of Rights: Open Content Mining. 8 June 2008. Retrieved from http://poynder.blogspot.co.uk/2012/06/new-declaration-of-rights-open-content.html | Interview/article | Macro |
| 2013 | Murray-Rust, P. (2013) Research Data Symposium Opening Keynote: Peter Murray-Rust. Held at Columbia University on February 27, 2013. Retrieved from https://www.youtube.com/watch?v=QVY-WDVITWw | Participant – Presentation | Meso |
| 2013 | Murray-Rust, P. (2013) Scientific data costs billions but almost all is thrown away - what should be done? Published November 15, 2013. Retrieved from https://www.youtube.com/watch?v=LHkHGgYfaP0 | Participant – Presentation | Macro |
| 2014 | Murray-Rust, P. and Michelle, B. (2014) Open Notebook Science. Austrian Science Fund FWF. Vienna on June 3, 2014. Retrieved from https://www.youtube.com/watch?v=6_bad49Celg | Participant – Presentation | Meso |
| 2014 | Murray Rust, P., & Murray Rust, D. (2014). Reproducible physical science and the declaration. Chapman and Hall/CRC. | Participant – article/paper (openness) | Meso |
| 2014 | Jump, P. (2014). Elsevier: bumps on road to open access. Times Higher Education. March 27, 2015 https://www.timeshighereducation.com/news/elsevier-bumps-on-road-to-open-access/2012238.article | Interview/article | Meso/Macro |
| 2014 | Bowley, C. (2014). We can’t live with anything less than Open. PLOS http://blogs.plos.org/thestudentblog/2014/03/15/cant-live-anything-less-open/ | Interview/article | Macro |
| 2014 | Robinson, S. (2014). Elsevier loves Open Access, especially when they can still charge for articles. Melville House. https://www.mhpbooks.com/elsevier-loves-open-access-especially-when-they-can-still-charge-for-articles/ | Interview/article | Macro |
| 2014 | (2014). Elsevier Still Charging For Open Access Copies, Two Years After It Was Told Of The Problem. https://www.techdirt.com/articles/20140319/11185526626/elsevier-still-charging-open-access-copies-two-years-after-it-was-told-problem.shtml | Interview/article | Macro |
| 2015 | Price, A. (2015). Content Mining of the bioscience literature. The International Network for Knowledge about Wellbeing. | Interview/article | Macro |
| 2015 | Polich, K. (2015) Audio interview with Peter Murray-Rust on the Data Skeptic Podcast. http://discuss.contentmine.org/t/audio-interview-with-peter-murray-rust-on-the-data-skeptic-podcast-53-minutes/134 | Interview/article | Macro |

Fernando Pérez

| Year | Source | Type | Main Level |
| --- | --- | --- | --- |
| 2007 | Pérez, F., & Granger, B. E. (2007). IPython: a system for interactive scientific computing. Computing in Science & Engineering, 9(3), 21-29. | Participant – article/paper (openness) | Meso |
| 2011 | Pérez, F. (2011). Reproducible software vs. reproducible research. Retrieved from http://web.stanford.edu/~vcs/AAAS2011/fPérez_aaas_2011_repro_research_extabs.pdf | Participant – article/paper (openness) | Meso |
| 2012 | Von Krogh, G., Haefliger, S., Spaeth, S., & Wallin, M. W. (2012). Carrots and rainbows: Motivation and social practice in open source software development. Mis Quarterly, 36(2), 649-676. | Interview/article | Meso |
| 2012 | Pérez, F. (2012) Science And Python: retrospective of a (mostly) successful decade. Pycon Canada. November 11, 2012. Retrieved from https://www.youtube.com/watch?v=F4rFuIb1Ie4 | Participant – Presentation | Meso |
| 2013-15 | http://blog.fperez.org/ | Participant – Blog | Micro/Meso |
| 2013 | Ravven, W. (2013). Wresting New Tricks From a Python: Fernando Pérez Wins 2012 Award for the Advancement of Free Software. UC Berkeley Research News. April 11, 2013. http://vcresearch.berkeley.edu/news/wresting-new-tricks-python-fernando-Pérez-wins-2012-award-advancement-free-software | Interview/article | Meso |
| 2013 | Pérez, F. and Granger, B.E. (2013). An open source framework for interactive, collaborative and reproducible scientific computing and education. Sloan Grant Proposal. Retrieved from http://ipython.org/_static/sloangrant/sloan-grant.html | Participant – article/paper (openness) | Meso |
| 2014 | Millman, K.J. and Pérez, F. (2014) Developing open source scientific practice. In Stodden, V., Leisch, F., & Peng, R. D. (Eds.). Implementing Reproducible Research. CRC Press. | Participant – article/paper (openness) | Meso |
| 2014 | Krill, P. (2014). IPython founder details road map for interactive computing platform. InfoWorld. February 14, 2014. Retrieved from http://www.infoworld.com/t/data-visualization/ipython-founder-details-road-map-interactive-computing-platform-236429 | Interview/article | Meso |
| 2014 | Mascarelli, A. (2014). Research tools: Jump off the page. Nature, 507(7493), 523-525. | Interview/article | Meso |
| 2014 | Slocum, M. (2014). IPython creator Fernando Pérez: Surprises from IPython's evolution. Interviewed by Mac Slocum at O’Reilly Media. July 7, 2014-08-01. Retrieved from https://www.youtube.com/watch?v=g8xQRI3E8r8 | Interview/article | Meso |
| 2014 | Pérez, F. (2014) IPython: From Interactive computing to computational narratives. Presentation at the University of Manchester Department of Mathematics. April 2014. Retrieved from http://www.walkingrandomly.com/?p=5450 | | Meso |


Appendix 4: Interview Protocol

Date of Interview: ______
Time: ______
Interview method: Skype Address ______ Telephone Number ______

Interviewer:
Interviewee:

Interviewer provides a brief overview of the project: Thank you for agreeing to be interviewed today. As you know, the purpose of this study is to determine and describe scientists’ strategies to disseminate research results within an open science paradigm, implementing and establishing new forms of data and research project sharing. I am considering scientists such as yourself as institutional entrepreneurs. Today, I’d like to speak with you about [insert specific open science initiative, for example, UsefulChem].

Questions:

What were the data publication norms for your discipline at the time you were establishing your initiative?

What were the important steps you took to establish [open science initiative] from the idea you had to its establishment?

Why did you feel these steps were necessary and significant?

What was your role in seeing these steps through?

What obstacles did you encounter? Did you overcome them? If so, how?

If not answered as part of above: Were there steps you took with your colleagues, within your organization, and/or within the broader discipline, and beyond?

Was there a temporal order to the steps you undertook to establish your initiative?

Are there any other public domain documents that you have that can help me to understand the project and its establishment?

Interviewer thanks the individual for participating in the interview and discusses the confidentiality of responses and potential future interviews. The transcript of the interview will be sent to her/him for review and error checking.