Downloadable Software for Academic Research Purposes

What Computational Scientists Need to Know About Intellectual Property Law: A Primer Victoria Stodden1 Abstract As computation takes a central role in scientific research, the production of digital scholarly objects has new implications for Intellectual Property Law. The goal of this chapter is to provide a “rough guide” to IP Law for scientists who work with computers and are making available datasets and code, as well as the published article. These digital scholarly objects are potentially subject to copyright and patent law, and this chapter attempts to disentangle the various options available to scientists who practice really reproducible research and share their code and data. The basics of copyright law are explained in the context of scientific research, and options such as the various Creative Commons licenses, open licensing for software, and permissioning for dataset re-use. This chapter addresses the three primary digital research outputs in turn, the manuscript (including open access publishing), the code, and the data. It ends with citation recommendations for each of these, in particular for software and data. Introduction Data and code are becoming as important to research dissemination as the traditional manuscript. For computational science the evidence is clear: it is typically impossible to verify scientific claims without access to the code and data that generated published findings. Gentleman and Lang [1] introduced the notion of the “Research Compendium” as the unit of scholarly communication, a triple including the explanatory narrative, the code, and the data used in deriving the results. One of the reasons for including the code and data is to facilitate the production of really reproducible research, a phrase coined by Jon Claerbout in 19912 to mean research results that can be regenerated from the available code and data. Claerbout’s approach was paraphrased by Buckheit and Donoho [2] as follows: The idea is: An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures. 1 Victoria wishes to thank an anonymous reviewer for many extremely helpful comments. 2 See http://sepwww.stanford.edu/doku.php?id=sep:research:reproducible for the Stanford Exploration Project’s pioneering recommendations for reproducible research. Enabling computational replication typically means supplying the data, software, and scripts, including all parameter settings, that produced the results. [3, 4]. This approach runs headlong and unavoidably into current Intellectual Property law, which creates a stumbling block rather than an impassable barrier to the dissemination of really reproducible research. In this chapter I describe these Intellectual Property stumbling blocks to the open sharing of computational scientific knowledge and present solutions that coincide with longstanding scientific norms. In Section 1, I motivate scientific communication as a narrative with a twofold purpose: to communicate the importance of the findings within the larger scientific context and to provide sufficient information that the results may be verified by others in the field. Sections 2 and 3 then discuss Intellectual Property barriers and solutions that enable code and data sharing respectively. Each of these three research outputs, the research article, the code, and the data, require different legal analyses and action in the scientific context as described below. The final section discusses citation for digital scholarly output, focusing on code and data. A widely accepted scientific norm, as labeled by Robert K. Merton, is Communism or Communalism [5]. By this Merton meant that property rights in scientific research extend only to the naming of scientific discoveries (Arrow’s Impossibility Theorem for example, named for its originator Kenneth Arrow), and all other intellectual property rights are given up in exchange for recognition and esteem. This notion, at least in the abstract, underpins the current system of publication and citation that forms the basis for academic promotion and reward. Computational science today is facing a credibility crisis: without access to the code and data that underlie scientific discoveries, published findings are all but impossible to verify [4]. Reproducible computational science has attracted attention since Claerbout wrote some of the first really reproducible manuscripts in 1992.3 More recently, a number of researchers have adopted reproducible methods [2, 6, 7] or introduced them in their role as journal editors [8, 9, 10]. This chapter discusses how Intellectual Property Law applies to data in the context of communicating scientific research. 1. Publishing the Research Article Scientific publication has taken the well-recognized form of the research article since 1665 with the first issue of the Philosophical Transactions of the Royal Society of London.4 This section motivates the sharing of the research paper, and discusses the clash that has arisen between the need for scientific dissemination and modern intellectual property law in the United States. Scientific results are described in the research manuscript, including their derivation and context, and this manuscript is typically published in an established academic journal. It is of primary importance that the body of scientific knowledge, comprised of journal publications, have as little error as possible. This is in part accomplished through peer review, and in part through the very act of publication and permitting a wide audience access to the work. The recognition that the scientific 3 See http://sepwww.stanford.edu/doku.php?id=sep:research:reproducible 4 For a brief history see http://rstl.royalsocietypublishing.org/ including an image of the first issue with the endearing title “Philosophical Transactions Giving Some Account of the Present Undertakings, Studies, and Labours of the Ingenious in Many Considerable Parts of the World.” research process is error prone, that error can creep in anywhere and from any source, is central to the scientific method and wider access to the findings increases the chances that errors will be caught. The second reason property rights have been eschewed in scientific research is the understanding that scientific knowledge about our world, such as physical laws, mathematical theorems, or the nature of biological functions, are to be discovered, not invented created, and this knowledge belongs to all of humanity. This is not to say scientific discovery is not a creative act, quite the contrary, but that the underlying scientific fact is a public good, a facet of our world not subject to ownership. This is the underlying rationale behind U.S. federal government grants of over $50 billion dollars for scientific research in 2012 [11]. This vision is also reflected both in the widespread understanding of scientific facts as “discoveries” and not “inventions,” and in current intellectual property law which does not recognize a scientific discovery as rising to the level of individual ownership, unlike an invention or other contribution. We will see this notion rise again in the discussion on scientific data. Copyright law in the United States originated in the Constitution, stating that “The Congress shall have Power … To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.”5 Through a series of subsequent laws, copyright has come to assign a specific set of rights to authors of original expressions of ideas by default. In the context of scientific research, this means that the written description of a finding is automatically copyrighted by the author(s) (how copyright applies to data and code is discussed in the following two sections). Copyright secures exclusive rights vested in the author to both reproduce the work and prepare derivative works based upon the original. There are exceptions and limitations to this power, such as Fair Use, but these do not provide for an intellectual property framework for scientific knowledge that matches longstanding scientific norms of openness, access, and transparency. Intellectual Property law, and its interpretation by academic and research institutions, means that authors have copyright over their research manuscripts. Copyright can be transferred to others and the copyright holders can grant permissions for use to others as they see fit. In a system established many decades ago journals typically request that copyright be assigned to the publisher for free, rather than remain with the authors, as a condition of publication. Many journals have a second option for authors if they request it, where copyright remains with the author but permission is granted to the journal to publish the article.6 If copyright was transferred, access to the published article usually involves paying a fee to the publisher. Typically scientific journal articles are available only to the privileged few affiliated with a university library that pays the journal subscription fees, and articles are otherwise offered for a surcharge of about $30 each. Authors of scientific articles, and the owners of copyright, typically transfer copyright to publishers as a condition of publication. Publishing scientists today have other options. A transformation is

Downloadable Software for Academic Research Purposes

ERC Document on Open Research Data and Data Management Plans

Open Science Toolbox

Assessing the Impact and Quality of Research Data Using Altmetrics and Other Indicators

Response to NSF 20-015 Request for Information on Data-Focused

Generalist Repository Comparison Chart

How to Cite Datasets and Link to Publications

2018 Organizational and Membership Report

Data Governance Workshop, Final Report Arlington, VA December 14-15, 2011

Open Data As Open Educational Resources: Towards Transversal Skills and Global Citizenship

A Study of Current Policies, Practices and Infrastructure Supporting the Sharing of Data to Prevent and Respond to Epidemic and Pandemic Threats

The Open Knowledge Foundation: Open Data Means Better Science

Assigning Creative Commons Licenses to Research Metadata: Issues and Cases