The Legal Framework for Reproducible Scientific Research Licensing and Copyright
Total Page:16
File Type:pdf, Size:1020Kb
R EPRODUCIBLE RESEARCH The Legal Framework for Reproducible Scientific Research Licensing and Copyright As computational researchers increasingly make their results available in a reproducible way, questions naturally arise regarding copyright, subsequent use and citation, and ownership rights in general. The author proposes the Reproducible Research Standard for all components of scientific researchers’ scholarship, which should encourage replicable scientific investigation through attribution, the facilitation of greater collaboration, and the promotion of the engagement of the larger community in scientific learning and discovery. n the US, when scientists put their original computational work because researchers often research on the Web, it automatically falls need more than just the published article to re- under copyright. However, copyright is produce the results. an unsuitable legal structure for scientific Although this trend seems to be changing, the Iworks. Scientific norms guide scientists to repro- typical journal publishing format doesn’t allow for duce and build on others’ research, and default transmission of supporting files such as accom- copyright law by its very purpose runs counter panying images, source code, or demonstrations to these goals. In this article, I present a meth- of work to interested readers—although some odology for scientists to rescind copyright from journals are beginning to require that code or their work in such a way that it realigns scientific data be released as a precondition for publication, information sharing with long-established scien- (for example, Nature [www.nature.com/authors/ tific norms. editorial_policies/availability.html], The Insight Journal [www.insight-journal.org], and Annals of Options for Researchers Internal Medicine [www.annals.org/cgi/content/ Computational research is becoming more per- full/0000605-200703200-00154v1]). Evidence ex- vasive across a growing number of fields—for ex- ists that reproducible research receives more cita- ample, in the June 1996 issue of The Journal of the tions than nonreproducible work,1,2 and releasing American Statistical Association, nine of 20 articles research on the Web is a growing trend that seems were computational, whereas a decade later, in to be gathering institutional support. On 12 Feb- the June 2006 issue, 33 of 35 were computation- ruary 2008, Harvard University’s faculty of arts al. Different journals have different agreements and sciences adopted a policy that requires faculty with article authors, but most require authors to members to let the university make their scholarly relinquish their ownership rights to articles, in- cluding copyright. Thus authors have very little 1521-9615/09/$25.00 © 2009 IEEE or—more typically—no say in how their work is COPUBLI S HED BY THE IEEE CS AND THE AIP used after publication; they also find that it’s fre- Victoria Stodden quently bound away in journals that can be very expensive to access. This is especially tough for Harvard Law School COMPUTING IN SC IEN C E & ENGINEERING THIS ARTICLE HAS BEEN PEER-REVIEWED. 35 articles available freely online (rights are turned the underlying idea or discovery (www.copyright. over to the university, nonexclusively): gov/title17). Another option, quite different from copyright, is the patent, granted by the US Patent Each Faculty member grants to the President and Office only after reviewing an invention to make Fellows of Harvard College permission to make sure it’s relevant, useful, and not obvious. Where- available his or her scholarly articles and to exer- as authors don’t need to apply for copyright pro- cise the copyright in those articles. In legal terms, tection because it “follows the author’s pen across the permission granted by each Faculty member the page,”3 inventors must apply for a patent. Just is a nonexclusive, irrevocable, paid-up, world- as a copyright protects an author from plagiarism, wide license to exercise any and all rights under a patent’s raison d’être is to open the knowledge copyright relating to each of his or her scholarly publicly while granting the right to exclude oth- articles, in any medium, and to authorize others ers from making, using, offering for sale, or sell- to do the same, provided that the articles aren’t ing the invention in the United States (35 USC sold for a profit (www.fas.harvard.edu/~secfas/ 154(a)(1)). However, copyright and patent laws February_2008_Agenda.pdf, p. 3). work counter to prevailing scientific norms— copyright was intended to give authors of creative However, Stuart M. Shieber, the Harvard works (literature and music, for example) exclusive University computer science professor who pro- rights, such as the right to be credited, to deter- posed the new policy, said in a press release that mine who may adapt or perform the work, or who the decision “should be a very powerful message may benefit financially from it. A derivative work to the academic community that we want and is one that’s “based upon one or more preexist- should have more control over how our work is ing works” and gives the original copyright holder the exclusive right to prepare such works (17 USC Copyright and patent laws work counter to 106(2)). A scientific contribution is considered valuable if, among other things, researchers can prevailing scientific norms—copyright was reproduce the results successfully (verifiability), and the work is built upon it, thus uncovering new intended to give authors of creative works scientific discoveries. Copyright stands in the way of both actions. exclusive rights. As explained earlier, open licenses offer an op- tion to override default copyright law. Two of the most common types of open licenses that rescind used and disseminated” (www.news.harvard.edu/ copyright are those designed for code (for ex- gazette/2008/02.14/99-fasvote.html). Stanford’s ample, the GNU Public License or GPL and the School of Education soon followed suit with a Berkeley Software Distribution or BSD license) or mandate for open access: All faculty members media (for example, the family of Creative Com- will deposit a copy of their published work in an mons licenses). open access repository as of 26 July 2008 (www. news.harvard.edu/gazette/2008/02.14/99-fasvote. Licenses for Code html and www.earlham.edu/~peters/fos/2008/06/ Because copyright extends to code, Richard Stall- oa-mandate-at-stanford-school-of-ed.html). man began the Free Software movement in the General concern clearly exists about ownership early 1980s to encourage programmers to release rights to scholarly research. Tension arises be- their source code along with the software com- cause the scientific ethos guides scientists to both piled for end users (www.gnu.org/gnu/thegnu reproduce previous results and build on them, project.html). His license, which became the thereby generating further scientific understand- GPL, has two main components: ing. Copyright stands as a bar preventing the open sharing, dissemination, and use of work. To re- •• if publicly distributed, all software subject to scind copyright on scientific research, you must the license must also have its source code re- actively choose to do so, and this is typically done leased, and with a license. •• once the license is attached to code, it also at- taches to any body of code that uses the origi- Licensing and Rescinding Copyright nal code. Copyright is a set of rights that attach by default to “original works of authorship” although not to In brief, Stallman’s license has a viral effect de- 36 COMPUTING IN SC IEN C E & ENGINEERING signed to propagate the release of all source code. the same or similar license to this one (see http:// This means that if you use GPL-licensed code in creativecommons.org/licenses/by-nc-sa/2.0, for the development of another body of code, your example). The Creative Commons CC BY license entire work must also carry the GPL unless you ensures attribution but doesn’t contain the Share negotiate an alternative with the original’s copy- Alike component. right holders. This is the Share Alike provision of the license. The Share Alike Provision BSD license is an attribution license and doesn’t in the Scientific Context contain the Share Alike provision. Software under The Share Alike concept is inappropriate in the a BSD license retains the original license when it’s scientific context because it can impose limits on used in a derivative work, but the entire derivative the use and reuse of others’ work, which in the work doesn’t necessarily become BSD-licensed scientific context, should be avoided whenever unless the downstream author chooses to do so. possible. This would render the research un- Typically, the computational researcher re- available for reuse by people who don’t want to leases code that consists of instruction scripts use the same license as the original research for for a proprietary compiled language, rather than their own resulting research compendia. Ideally, a compiled binary, although it might require downstream researchers will choose to license the proprietary binary code to run. There has been original components of their compendia so that a marked increase in the use of these types of researchers, even those working within a propri- quantitative programming environments such as etary context, can build upon the work without Matlab, SPSS, SAS, R, and Stata for experimenta- legal encumbrance, but scientific research should tion and data display across a variety of fields. To be encouraged over particular license use. rescind copyright from these instruction scripts, Furthermore, expanding the license to cover a license for code would be used. But all of a com- the entire derivative work product makes attempts putational scientist’s research can be considered at attribution more difficult. Under Share Alike, code—for example, figures, articles, data struc- it’s no longer clear how to give credit to upstream tures, and even pseudocode descriptions wouldn’t work in a derivative product because a single attri- be classified as code.