Open Metadata of Scholarly Publications
Open Science Monitor Case Study
Ludo Waltman
July 2019

European Commission
Directorate-General for Research and Innovation
Directorate G — Research and Innovation Outreach
Unit G.4 — Open Science
E-mail [email protected] [email protected]
European Commission
B-1049 Brussels

Manuscript completed in July 2019.

This document has been prepared for the European Commission however it reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.

More information on the European Union is available on the internet (http://europa.eu).

Luxembourg: Publications Office of the European Union, 2019

PDF ISBN 978-92-76-12011-7 doi: 10.2777/132318 KI-01-19-807-EN-N

© European Union, 2019. Reuse is authorised provided the source is acknowledged. The reuse policy of European Commission documents is regulated by Decision 2011/833/EU (OJ L 330, 14.12.2011, p. 39). For any use or reproduction of photos or other material that is not under the EU copyright, permission must be sought directly from the copyright holders.

Table of Contents
ACKNOWLEDGEMENTS
1 Introduction
2 Drivers
3 Barriers
4 Impact
5 Lessons learnt
6 Policy conclusions

ACKNOWLEDGEMENTS

Disclaimer: The information and views set out in this study report are those of the author(s) and do not necessarily reflect the official opinion of the Commission. The Commission does not guarantee the accuracy of the data included in this case study. Neither the Commission nor any person acting on the Commission’s behalf may be held responsible for the use which may be made of the information contained therein.

The case study is part of the Open Science Monitor led by the Lisbon Council together with CWTS, ESADE and Elsevier.

Authors
Ludo Waltman – Centre for Science and Technology Studies (CWTS)

STUDY ON OPEN SCIENCE: MONITORING TRENDS AND DRIVERS (Reference: PP-05622-2017)

1 Introduction

The Open Science Monitor partly relies on proprietary data sources, in particular the Scopus database. Scopus is a data source that provides metadata of scholarly publications. It has been created by Elsevier, which contributes to the Open Science Monitor as a subcontractor.

The use of Scopus data in the Open Science Monitor has been the subject of debate. In a complaint to the European Commission, the use of Scopus data has been criticized. Among other things, the signatories of this complaint raised the following question: “Given the EU’s emphasis on Open Science, including Open Data, why is there (apparently) no requirement to insist that the Open Science Monitor must be based upon open data, open standards, and open source tools (with appropriate licenses for re-use accessibility) as a matter of principle?”1

The response of the Open Science Monitor consortium has been that it is not possible to create the Monitor based exclusively on open data sources. Given the currently available data sources, the only way to create the Open Science Monitor is to make use of proprietary data sources such as Scopus or Web of Science.
The same response has also been given by the European Commission: “Overall, the Commission wishes to have an as comprehensive Monitor as possible. … as long as there is in the European Union no fully open and transparent data-infrastructure, we are dependent on a fragmented data infrastructure and data sources from private operators. This implies that the Monitor has to be constructed under non-optimal conditions.”

The debate about the Open Science Monitor illustrates the importance of developments toward open metadata of scholarly publications (e.g., open metadata of articles in scholarly journals and in conference proceedings). For many publications, metadata such as titles, abstracts, author lists, and reference lists is available in proprietary data sources such as Scopus, produced by Elsevier, and Web of Science, produced by Clarivate Analytics. The use of metadata provided by these proprietary data sources usually involves considerable cost and is subject to significant restrictions. Open data sources make metadata of publications available under minimal restrictions.

Open metadata has several benefits. Open availability of metadata enables more researchers to carry out bibliometric studies, which will help to get a better understanding of the science system. There will also be more possibilities for testing the reproducibility of bibliometric studies. In addition, open metadata can be used in applied bibliometric analyses that aim to support research evaluation and research management. These analyses can be made more transparent, which will contribute to more responsible ways of using bibliometrics. There will also be more freedom in designing applied bibliometric analyses. For instance, these analyses do not need to rely on decisions made by a central authority (e.g., the producer of Scopus or Web of Science) on which scientific literature can and cannot be included in an analysis. Finally, open metadata may make scientific literature easier to find.
New search engines for scientific literature can be developed based on open metadata.

Open metadata is closely related to open access publishing. An increasing proportion of all scholarly publications are openly accessible. If a publication is openly accessible, its metadata is openly accessible as well, although not necessarily in a machine-readable format or in association with similar metadata from other publications. Conversely, if a publication is not openly accessible, the metadata of the publication may or may not be openly accessible, depending on the policies of the publisher.

This report first provides an overview of the drivers of and barriers to open metadata of scholarly publications. It then demonstrates the impact of open metadata. Finally, lessons learnt and policy conclusions are discussed. The focus of this report is on metadata of scholarly publications. Metadata of other types of scholarly outputs (e.g., data sets and software) is also of considerable importance, but falls outside the scope of this report.

1 https://doi.org/10.5281/zenodo.2554199

2 Drivers

A prominent driver of open metadata of scholarly publications is the United States National Library of Medicine (NLM) at the National Institutes of Health. The NLM maintains PubMed, an open data source of metadata of a large share of all scholarly publications in the biomedical domain. PubMed was launched more than two decades ago, in 1996. It is widely used by biomedical researchers. A limitation of PubMed is that it does not include the reference lists of publications. Citation links between publications are therefore not available in PubMed. Also, for many publications, PubMed does not provide complete data on author affiliations.

In recent years, there have been a number of significant developments toward open metadata of scholarly publications.
First of all, scholarly publishers have increasingly made metadata of their publications openly available in Crossref, a registration agency for Digital Object Identifiers (DOIs). Publishers that are members of Crossref are obliged to “deposit timely and accurate metadata for (their) content”.2 When a publisher registers a DOI for a publication, Crossref obtains basic metadata for this publication, such as the title, the names of the authors, and the name of the journal in which the publication has appeared. Crossref then makes this metadata openly available. The data “is not subject to copyright and available to use for whatever purpose you may have”.3 In many cases, publishers also deposit the references of publications in Crossref. However, Crossref makes these references openly available only if the publisher grants permission to do so.

To persuade publishers to make the references of publications openly available in Crossref, the Initiative for Open Citations (I4OC) was established in April 2017.4 I4OC is an advocacy group that started as a collaboration of six organizations: OpenCitations, Wikimedia Foundation, PLOS, eLife, DataCite, and the Centre for Culture and Technology at Curtin University. The initiative is supported by a large number of other organizations. I4OC has had a major effect on the openness of citation data. Before the launch of I4OC, the references were open for only 1% of the publications with references deposited in Crossref. Two years after the launch of I4OC, this share had increased to 55%, resulting in about half a billion references being openly available in Crossref. Most large publishers, including for instance Springer Nature and Wiley, support I4OC and make the references