The Open Knowledge Foundation: Open Data Means Better Science
Community Page

Jennifer C. Molloy*
Department of Zoology, University of Oxford, Oxford, United Kingdom

Citation: Molloy JC (2011) The Open Knowledge Foundation: Open Data Means Better Science. PLoS Biol 9(12): e1001195. doi:10.1371/journal.pbio.1001195
Published December 6, 2011
Copyright: © 2011 Jennifer C. Molloy. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: The author received no specific funding for this work.
Competing Interests: I have read the journal’s policy and have the following conflicts: I volunteer with the OKF as the Coordinator of the Open Data in Science Working Group.
Abbreviations: BMC, BioMedCentral; IIOD?, Is It Open Data?; OA, open access; OKD, Open Knowledge Definition; OKF, Open Knowledge Foundation; WDTK?, What Do They Know?
* E-mail: [email protected]
The Community Page is a forum for organizations and societies to highlight their efforts to enhance the dissemination and value of scientific knowledge.
PLoS Biology | www.plosbiology.org | December 2011 | Volume 9 | Issue 12 | e1001195

Data provides the evidence for the published body of scientific knowledge, which is the foundation for all scientific progress. The more data is made openly available in a useful manner, the greater the level of transparency and reproducibility and hence the more efficient the scientific process becomes, to the benefit of society. This viewpoint is becoming mainstream among many funders, publishers, scientists, and other stakeholders in research, but barriers to achieving widespread publication of open data remain. The Open Data in Science working group at the Open Knowledge Foundation is a community that works to develop tools, applications, datasets, and guidelines to promote the open sharing of scientific data. This article focuses on the Open Knowledge Definition and the Panton Principles for Open Data in Science. We also discuss some of the tools the group has developed to facilitate the generation and use of open data and the potential uses that we hope will encourage further movement towards an open scientific knowledge commons.

Introduction

Science is built on data: its collection, analysis, publication, reanalysis, critique, and reuse. However, the current system of scientific publishing works against maximum dissemination of the scientific data underlying publications. Barriers include inability to access data, restrictions on usage applied by publishers or data providers, and publication of data that is difficult to reuse, for example, because it is poorly annotated or “hidden” in unmodifiable tables like PDF documents. In addition, there is a cultural reluctance to publish data openly, for multiple reasons, from researchers’ fears about releasing data “into the wild” where they lack control over its usage to a lack of incentive or credit for doing so.

In response to these problems, multiple individuals, groups, and organisations are involved in a major movement to reform the process of scientific communication. The promotion of open access and open data and the development of platforms that reduce the cost and difficulty of data handling play a principal role in this.

One such organisation is the Working Group on Open Data in Science (also known as the Open Science Working Group) at the Open Knowledge Foundation (OKF). The OKF is a community-based organisation that promotes open knowledge, which encompasses open data, free culture, the public domain, and other areas of the knowledge commons. Founded in 2004, the organisation has grown into an international network of communities that develop tools, applications, and guidelines enabling the opening up of data, and subsequently the discovery and use of that data. Its working groups are in fields as broad as government, development, science, economics, archaeology, and geodata. However, all are united by the same organisational values and principles, and share a common understanding of openness, as set out in the Open Knowledge Definition (OKD; http://www.opendefinition.org/okd/).

The OKF Working Group on Open Data in Science (http://science.okfn.org/About/) began in 2009 with the purpose of developing guidelines, tools, and applications to promote open data in the sciences and enable scientists to maximise the use and impact of that data. It is now a diverse and international community of scientists, data wranglers, lawyers, and other individuals with interests in both open data and the broader concept of open science.

The Open Knowledge Definition

The definition of “open”, crystallised in the OKD, means the freedom to use, reuse, and redistribute without restrictions beyond a requirement for attribution and share-alike. Any further restrictions make an item closed knowledge. It also emphasises the importance of usability and access to the entire dataset or knowledge work:

“The work shall be available as a whole and at no more than a reasonable reproduction cost, preferably downloading via the Internet without charge. The work must also be available in a convenient and modifiable form.”

This is an important consideration for scientific data where in some cases data is accessible, for example, in online supplements to published papers, but is not licensed to be reuseable; or it’s accessible and reuseable but in a form that inhibits capture and modification. Prior to online supplementary materials, requesting and obtaining permissions and data was an extremely time-consuming process, but even with instant downloads, deciding what rights one has to reuse data can be confusing due to a lack of licensing and clear terms of use. In some cases, the supplementary data associated with papers is open even if the article itself is not; but this is often not explicit. Clear labelling and licensing is vital to save scientists the many hours they may spend discovering the openness or otherwise of datasets and becomes even more imperative as computerised analysis of the scientific literature increases, for example via data and text mining. Websites such as the crystallography data aggregator CrystalEye (http://wwmm.ch.cam.ac.uk/crystaleye/) prominently display an Open Data web button on their website and link to the Public Domain Dedication and License (PDDL) license as well as the OKD (Figure 1).

Figure 1. Screenshot of the CrystalEye entry for the structure of coenzyme cob(II)alamin with a copy of the OKF Open Data button displayed on the site. doi:10.1371/journal.pbio.1001195.g001

Deciding what constitutes open is particularly pertinent to the movement in science towards open access, or OA, which is related to open data but has different immediate goals. OA is defined in the Bethesda Statement (http://www.earlham.edu/~peters/fos/bethesda.htm) in terms that embrace open data. However, non-OA publishers often use the term to mean “free” access to publications. An important distinction is drawn within the open community between libre “free as in freedom”, as expressed in the OKD, and gratis “free as in beer”.

[…] extend the OKD with a new set of principles specific to the scientific field.

The Panton Principles for Open Data in Science

In collaboration with John Wilbanks of Creative Commons, key members of the OKF, namely Rufus Pollock (University of Cambridge), Peter Murray-Rust (University of Cambridge), and Cameron Neylon (STFC), spent two years developing a set of principles for publishing open scientific data, using the OKD and the Science Commons’ Protocol for Implementing Open Access Data (http://sciencecommons.org/projects/publishing/open-access-data-protocol/) as precedents and guides. The result was the Panton Principles (see Box 1; http://www.pantonprinciples.org/), named after the Panton Arms pub in Cambridge where the majority of the drafting sessions occurred. The principles were officially launched in February 2010 and have since gained more than 150 endorsers.

The scope of the principles covers all primary experimental data published within or alongside research papers, including the data content of any table or graph and all images, audio, or video acting as the primary mechanism of data capture, e.g., protein gels or animal vocalisation recordings. The crux of the Panton message is that all such data, with very few exceptions, should be placed explicitly in the public domain. Good reasons for not releasing data would include the risk of violating patient privacy or revealing the precise location of an endangered species.

The Open Data Movement in Science

The Panton Principles are not an isolated initiative but part of a wider movement to promote open data in science that is gathering momentum. Historically, scientific data has not been openly available, for a great variety of reasons. Some are technological (paper is not an efficient form of sharing datasets), but the web has opened up not just new possibilities for sharing, collaboration, and analysis, but also for exploring new forms of scientific enquiry. For example, automated text and data mining of large swathes of the published corpus of scientific knowledge is now feasible if such material is accessible.

Encouraging scientists to share their data is a challenge, even when it directly supports published work. A 2009 report by the Research Information Network [1] found that some researchers were unwilling to share their data openly due to fears of exploitation, particularly for datasets where they felt they could extract multiple publications; another problem is the lack of career rewards, recognition, or incentives to publish data, which makes it difficult for researchers to justify the time and effort required to make data available. However, there is top-down pressure to move towards open data publication from funders such as the Wellcome Trust and