HEFCE Letter on “Open Access and Submission of Outputs to a Post-2014 REF”
Response from EMBL-EBI, European Bioinformatics Institute
March 2013

Key Points

• The European Bioinformatics Institute (EMBL-EBI) strongly supports HEFCE’s plans to make open access to research publications a requirement in the post-2014 REF.
• Europe PubMed Central, run by the EMBL-EBI, is clearly embedded in the workflow of life scientists; omission of Europe PubMed Central as a repository recognised by the post-2014 REF would be out of step with community practice regarding deposition and use of subject-specific repositories.
• Europe PubMed Central is actively developing several mechanisms to interoperate with Institutional Repositories to maximise access and reuse.
• Europe PubMed Central is developed in the context of other major international databases at the EMBL-EBI, providing an opportunity to support existing and future mandates on access to research data in conjunction with articles.

Introduction

The European Bioinformatics Institute (EMBL-EBI) maintains the world’s most comprehensive range of freely available databases and services in support of research in the life sciences. Since 2011, the EMBL-EBI has led the development of Europe PubMed Central (Europe PMC), the database of life science literature, in partnership with The University of Manchester and the British Library, on behalf of 19 funders of life science research, led by the Wellcome Trust. As with all the major data resources at the EMBL-EBI, Europe PMC is part of an International Collaboration, in this case based on PubMed Central in the USA, and also including PubMed Central Canada.

Europe PMC contains 2.6 million full text research articles and 28 million abstracts. All the content can be searched and read freely, with no barriers. Furthermore, almost 600,000 of the full text articles are "gold" open access, allowing not only reading but reuse, for example, to data mine.
All the content is available via a website and APIs, and in the case of the fully open access article set, by FTP for bulk download. Currently the website is visited by about 1.3 million unique IP addresses viewing over 15 million pages per month.

Europe PMC meets international standards for journal article archiving formats

The articles in Europe PMC are archived in XML according to the JATS DTD (NISO Z39.96-2012). Each time an article is viewed on the web or served through an API, it is retrieved from the database and converted from XML to HTML on the fly, which serves to constantly check the integrity and accessibility of the archive. If the content of Europe PMC were distributed across many locations in a variety of non-standard formats (e.g. MS Word, LaTeX, OpenOffice, PDF), this would significantly hinder effective archiving and the ability to capitalise on the content. The uniformity and richness of the format, along with the central nature of the collection, encourage the development of new tools and services.

How content gets deposited in Europe PMC

One of the reasons that Europe PMC has been able to build a significant content base is that the deposition of articles is made as easy as possible for researchers. In the simplest case, if a researcher publishes their article Open Access, then the journal deposits the XML in PMC on their behalf. The researcher has to do nothing more. In addition, many journals contribute to Europe PMC by depositing all content in full after a period of time (often 12 months); these include prestigious journals such as EMBO J. and Proc Natl Acad Sci USA. Very little of the content in Europe PMC (10% or less) is deposited via self-archiving.

We strongly support the notion that researchers should have to deposit once, with minimal effort and maximal impact, to comply with mandates. We think that this activity should primarily be done via Europe PMC for the life sciences.

Europe PMC: added value

In addition to the core content management of Europe PMC, value is layered in several ways:

(1) Citation counts are calculated and the citing articles are listed. Results from searches can be sorted according to how often the articles have been cited.
(2) Data in core public molecular biology databases are cross-linked. Data citations are also mined from full text articles, along with key concepts such as organism names, gene names, and diseases.
(3) Rich metadata also include extra information on, for example, funding source, MeSH keywords, and full text DOIs.
(4) Europe PMC supports Europe PMC Plus, which allows grant PIs to archive articles and link articles to funding information.
(5) Europe PMC Labs supports researchers in using the open access content to develop data mining applications.

The development of the above services is possible because the content is gathered in a common rich format, allowing the necessary software and algorithm development.

Europe PMC in the context of EMBL-EBI data resources

We recognise that not all disciplines have the same level of information infrastructure as the life sciences; however, we suggest that where a solid infrastructure is already established, it should be included in efforts such as the post-2014 REF.

The EMBL-EBI has been mandated for 20 years to run core life science databases and services on behalf of the scientific community. We feel strongly that biological data should be open, archived and shared in such ways as to be of maximal use. The data are both deposited and used by the scientific community; as such, the EMBL-EBI is in a position of public trust as custodian of this collective endeavour. This position has evolved in conjunction with journal editorial policies, funder mandates, and international scientific collaboration.
We think that running Europe PMC in the context of related data resources is a natural extension of the public data requirement to do research, and that this position is compatible with the vision for the post-2014 REF.

Europe PMC and data mining

The EMBL-EBI strongly supports efforts that allow maximal reuse of information, as this progresses science and innovation. In Europe PMC, it is very clear which articles carry licences that allow data mining. These articles can be retrieved in full via the APIs (including all figures and supplemental data files), and the XML is also made available via an FTP site for easy bulk download and reuse. By contrast, the licence information on content in institutional repositories is often difficult to ascertain, and there is no easy way to gather full text from those sources for data mining purposes.

Approaches to integrating Europe PMC with Institutional Repositories

Europe PMC is actively developing several services that will allow deep integration with Institutional Repositories:

• Europe PMC is collaborating with the JISC-funded Repository Junction project to push content from Europe PMC to repositories at universities via the Repository Junction service.
• The EMBL-EBI is a partner of OpenAIRE, the project that supports the European Commission’s open access pilot under FP7. We have mapped content in Europe PMC to content in OpenAIRE-compliant repositories, demonstrating that OpenAIRE records can be enriched with links to full text and related datasets from Europe PMC.
• Europe PMC is developing a service that will allow institutional repositories to link out from articles in Europe PMC to records held in their repository. This service will be made public later in 2013.

Taking all of the above into consideration, the EMBL-EBI strongly recommends that HEFCE include Europe PubMed Central as a valid repository for inclusion in the post-2014 REF.