SCOSS FUNDING APPLICATION FORM, 2019

Please note that questions marked with “*” are weighted more highly in the evaluation than others.

Deadline: 26 November 2019

For questions, email: [email protected]

1. General

1.1. Service name Include full name, acronym and URL:

OpenCitations (OC), http://opencitations.net

1.2. Name of organisation operating the service. Incl. acronym and URL

Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy (FICLIT, UniBo), http://www.ficlit.unibo.it/it

1.3. Short description of the service. What does it do and who does it serve? Please also include the country of the geographical home of the service.

OpenCitations is a small independent scholarly infrastructure organization dedicated to open scholarship and the publication of open bibliographic and citation data by the use of Semantic Web (Linked Data) technologies. It is also engaged in advocacy for semantic publishing and, as a key member of the Initiative for Open Citations (I4OC), for open citations. Dr David Shotton and Professor Silvio Peroni are its two Directors. It provides, maintains and updates the OpenCitations Data Model (https://doi.org/10.6084/m9.figshare.3443876), which is based on our widely used SPAR (Semantic Publishing and Referencing) Ontologies (http://www.sparontologies.net). These ontologies may be used to encode all aspects of scholarly bibliographic and citation data in RDF, enabling them to be published as Linked Open Data (LOD). Separately, OpenCitations provides open source software of generic applicability for searching, browsing and providing APIs over RDF triplestores (https://github.com/opencitations). It has developed the OpenCitations Corpus (OCC, http://opencitations.net/corpus), a database of open downloadable bibliographic and citation data recorded in RDF and released under a CC0 public domain waiver, which currently contains information about ~14 million citation links to over 7.5 million cited resources. These are described using the OpenCitations Data Model, and are made freely available so that others may build upon, enhance and reuse them for any purpose, without restriction under copyright or database law. It has recently published a formal definition of an Open Citation (https://doi.org/10.6084/m9.figshare.6683855), and has launched a system of globally unique and persistent identifiers (PIDs) for bibliographic citations – Open Citation Identifiers (OCIs, https://doi.org/10.6084/m9.figshare.7127816) – which has been accepted by the community, and for which it maintains a resolution service at http://opencitations.net/oci.
In addition, it is currently developing a number of Open Citation Indexes (http://opencitations.net/index), using the data available in third-party bibliographic databases. The first and largest of these is COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations (http://opencitations.net/index/coci), which presently contains information on more than 445 million citations, released under a CC0 waiver. Launched only last July, and pulling additional bibliographic information live from the Crossref API, COCI is already enjoying widespread use, with over one million API accesses between 1st Dec 2018 and 28th February 2019.

Currently, the OpenCitations hardware, data and services are hosted at the University of Bologna in the Department of Classical Philology and Italian Studies and in the Department of Computer Science and Engineering.

1.4. Year of establishment. 2010 (original prototype, University of Oxford); 2015 (new instance, University of Bologna).

1.5. Intentions for funding. In brief, describe your need for funding * The scholarly digital publishing landscape is undergoing a period of unprecedented disruptive change. As the open scholarship model gains traction, subscription models for access both to journal content and to citation indexes are crumbling, as evidenced by the Big Deal Cancellation Tracking website maintained by SPARC (https://sparcopen.org/our-work/big-deal-cancellation-tracking/), which presently lists in excess of fifty academic libraries, institutions and consortia that have cancelled their subscriptions to the journals of Elsevier and other major publishers, judging that they are no longer value for money. These libraries are now wondering how they might more strategically spend the considerable sums of money saved by these cancellations, totalling many millions of dollars.

Among the subscription services that have yet to suffer in this manner are those for the citation indexes Web of Science (WoS) and Scopus. New providers of free citation data, such as Google Scholar (launched in 2004) and Dimensions (launched in 2018), now offer direct alternatives to the monopolistic position previously held by these two commercial indexes. However, these newer sources also lack APIs from which citation data can be downloaded in bulk, and they restrict the open publication and reuse of the citation data they provide. Offering a genuine alternative to all these commercial citation indexes is OpenCitations, which offers free access to a growing corpus of totally open and reusable citation data. If OpenCitations is to achieve sufficient coverage of the world’s citations to provide a genuine open alternative to the citation data provided by WoS, Scopus and other commercial sources, and thus make the world’s citation data freely available to everyone, it needs significant public funding to maintain and expand its activities, datasets and services. This application for adoption as a SCOSS funding recipient is made in direct response to that need.

2. Value of the service to the Open Access or Open Science Community

2.1. How does this service fit into the Open Science landscape? Describe the service's general value to the Open Science / Open Access Community. How can you demonstrate your value as opposed to competing services?

The development of the open scholarship (open science) movement can be characterised as having four phases:

a) Open source software, leading, for example, to the creation and widespread uptake of Linux as an operating system (https://www.linux.com/), and to the establishment of the Free Software Foundation (https://www.fsf.org/);
b) Open access publication, with the rise of new publishers such as the Public Library of Science (PLoS; https://www.plos.org/), the development of new publishing and peer review models including eLife (https://elifesciences.org/) and F1000 Research (https://f1000research.com/), and initiatives such as Plan S (https://www.coalition-s.org/);
c) The increasing emphasis on the publication of open research datasets, with the development of data repositories such as Dryad (https://datadryad.org/), support from organizations such as CODATA (http://www.codata.org/) and the Research Data Alliance (https://www.rd-alliance.org/), and infrastructures such as the European Open Science Cloud (EOSC, https://ec.europa.eu/research/openscience/index.cfm?pg=open-science-cloud); and most recently,
d) Open metadata, involving, for example, the Metadata Registry (http://metadataregistry.org/), that has grown from the Semantic Web Community's Simple Knowledge Organization System (SKOS) to provide services for controlled vocabularies, and OpenCitations.

The importance of open metadata to support learning and research, to disseminate knowledge and to foster innovation has been emphasised by the early Jisc-funded Discovery metadata ecosystem programme (http://discovery.ac.uk/), by the actions of leading academic libraries including Harvard University Library (https://emeritus.library.harvard.edu/open-metadata) and the British Library (https://www.bl.uk/bibliographic/pdfs/sharing-bl-open-metadata-non-library-communities.pdf), and by the recent conference DC-2018: Open Metadata for Open Knowledge (http://dcevents.dublincore.org/IntConf/dc-2018/). Open bibliographic metadata supports open scholarship, underpins bibliometrics and scientometrics, and facilitates the emerging field of study known as the Science of Science (also referred to as Research on Research).

Bibliographic citations (the links created when the author of a published work acknowledges other works in its bibliographic references) are one of the most fundamental types of bibliographic metadata, and are central to the world of scholarship. They knit together independent works of scholarship into a global endeavour, and are important for assigning credit to other researchers. The open availability of citation data is a crucial requirement for Open Science, since analyses of citations can reveal how scientific knowledge develops over time and can illuminate patterns of authorship (e.g. self-citations). Such information is essential for assessing scholars’ influence and making wise decisions about research investment. Bibliographic databases and citation indexes are also crucial to individual researchers, since they enable researchers to use automated tools to search the literature for papers relevant to their own work.

At present, the two most authoritative sources of citation data are Clarivate Analytics’ Web of Science (WoS), which grew from the Science Citation Index created by Eugene Garfield in 1964, and Elsevier’s Scopus, launched in 2004. Neither is open: most research universities have to pay tens of thousands of dollars annually to access one or both of them, while institutions and independent scholars that cannot afford such costs have no access. More recently, in addition to a number of subject-specific indexes, other sources of general citation data have been made available by other commercial companies, for example by Google (Google Scholar), Digital Science (Dimensions) and Microsoft (Microsoft Academic, formerly Microsoft Academic Search). However, all have significant license restrictions on users’ ability to reuse and republish the citation data they provide, which seriously limits the full description and reproducibility of research studies using these data.

OpenCitations has been established for the specific purpose of disrupting that status quo, by providing a fully free and open alternative means of accessing global scholarly citation data. A bibliographic citation is factual in nature, and such facts cannot be copyrighted and should not be placed behind subscription paywalls. Rather, citations form a crucial component of the factual metadata describing scholarly publications, full access to and reusability of which is vital for open scholarship.

OpenCitations is already making available as CC0 material, in the OpenCitations Corpus (OCC) and in COCI (the OpenCitations Index of Crossref open DOI-to-DOI citations), more than 450 million citations – a very significant fraction of all scholarly citations that exist. In particular, COCI contains the basic metadata for each citation – its Open Citation Identifier (OCI), its publication date and timespan, whether or not it is an author-self-citation or a journal-self-citation, and the DOIs of the citing and cited publications – and then pulls additional bibliographic metadata on these publications live from the Crossref, DataCite and Unpaywall APIs as required. These citations are all queryable and browsable by means of OpenCitations interfaces and query services, while the entire dataset can be downloaded in bulk for reuse for any purpose. The majority of the current scientific citations missing from this OpenCitations dataset either are not submitted to Crossref by publishers, or are submitted within the Crossref “Closed” category. OpenCitations thus does not yet have access to all existing scholarly citations. However, neither WoS nor Scopus is comprehensive either, since each selects which citation data to host, to the exclusion of others. For example, WoS contains very few computer science papers, since these are predominantly given as conference presentations, which it does not comprehensively record.
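The per-citation metadata listed above can be pictured as a small record type. The following Python sketch is illustrative only: the class name, the field names, the "yes"/"no" encoding of the self-citation flags, the ISO 8601-style timespan string and all example values are assumptions made for this example, not the actual OpenCitations schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CitationRecord:
    """One citation as described in the text (all field names are hypothetical)."""
    oci: str                     # Open Citation Identifier of the citation
    citing_doi: str              # DOI of the citing publication
    cited_doi: str               # DOI of the cited publication
    creation: str                # publication date of the citation
    timespan: str                # interval between the two publication dates
    journal_self_citation: bool  # citing and cited works share a journal
    author_self_citation: bool   # citing and cited works share an author


def record_from_row(row: dict) -> CitationRecord:
    """Build a record from a flat row, e.g. one line of a CSV dump.

    The column names and the "yes"/"no" flag encoding are assumptions.
    """
    def flag(value: str) -> bool:
        return str(value).strip().lower() == "yes"

    return CitationRecord(
        oci=row["oci"],
        citing_doi=row["citing"],
        cited_doi=row["cited"],
        creation=row["creation"],
        timespan=row["timespan"],
        journal_self_citation=flag(row["journal_sc"]),
        author_self_citation=flag(row["author_sc"]),
    )


# Placeholder values only; these are not real identifiers.
example = record_from_row({
    "oci": "example-oci-1",
    "citing": "10.1000/citing.example",
    "cited": "10.1000/cited.example",
    "creation": "2019-03",
    "timespan": "P1Y2M",
    "journal_sc": "no",
    "author_sc": "yes",
})
```

Because the record holds only identifiers, dates and flags, any further bibliographic detail would be looked up separately using the two DOIs, which mirrors the live enrichment from external APIs described above.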

Other providers that offer genuinely open citation data include Crossref itself (in the form of open reference lists, comprising about half of all the bibliographic reference lists submitted to it, the remainder – primarily from Elsevier – being in its “Closed” category), Wikidata, DataCite and smaller repositories such as Dryad (which hosts citations from submitted datasets to related journal articles). OpenCitations is planning in the near future to release as Linked Open Data new OpenCitations Indexes (http://opencitations.net/index) of the open citations held by all three of these other providers, to complement the index of those available from Crossref in COCI. Using the OpenCitations API, it will be possible to conduct federated searches across all these indexes simultaneously. With the exception of Wikidata, OpenCitations is the only substantial source of open citation data published in RDF as Linked Open Data.

Thus, in summary, OpenCitations has the advantages over other sources of citation data that:

a) all its data are fully open, published under a Creative Commons CC0 Public Domain waiver;
b) all its data are made available in RDF via a SPARQL endpoint, and are also available via the OpenCitations REST API (in JSON and CSV formats), via HTTP requests in different formats (HTML, RDF/XML, Turtle or JSON-LD, via content negotiation), via our searching and browsing Web interfaces OSCAR and LUCINDA, and as bulk data dumps;
c) all its bibliographic and citation data are supported by full open provenance data;
d) all its software is fully open, published under the permissive free ISC License for software (https://choosealicense.com/licenses/isc/); and
e) all its publications describing its data and services are Open Access.
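To illustrate the access routes in point (b), the following Python sketch prepares, without sending, two kinds of HTTP request: one that asks for a resource in a chosen RDF serialisation via content negotiation, and one addressed to a REST API. The media types are the standard registered ones; the function names, the API path and the choice of the Accept header to select JSON or CSV are assumptions made for illustration and should be checked against the OpenCitations documentation.

```python
from urllib.parse import quote
from urllib.request import Request

# Standard media types for the four formats named in the text.
MEDIA_TYPES = {
    "html": "text/html",
    "rdfxml": "application/rdf+xml",
    "turtle": "text/turtle",
    "jsonld": "application/ld+json",
}

# Assumed base URL and path of a citations REST API (hypothetical).
API_BASE = "https://opencitations.net/index/coci/api/v1"


def negotiated_request(resource_url: str, fmt: str = "turtle") -> Request:
    """Prepare a GET request asking for one serialisation of a resource."""
    return Request(resource_url, headers={"Accept": MEDIA_TYPES[fmt]})


def citations_api_request(doi: str, fmt: str = "json") -> Request:
    """Prepare a GET request for the citations of a DOI (assumed path)."""
    accept = {"json": "application/json", "csv": "text/csv"}[fmt]
    url = f"{API_BASE}/citations/{quote(doi, safe='/')}"
    return Request(url, headers={"Accept": accept})


# Build, but do not send, one request of each kind (placeholder identifiers).
rdf_req = negotiated_request("http://example.org/resource/1", "jsonld")
api_req = citations_api_request("10.1000/example.doi", "csv")
```

Passing either object to `urllib.request.urlopen` would perform the actual call; the requests are only constructed here so that the sketch makes no network access.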

It needs to be made clear that it is not OpenCitations’ purpose to replicate all the added-value services that Web of Science and Scopus provide, built at considerable effort and cost on top of the citation data they host, for which subscription access is totally justifiable. Indeed, those commercial enterprises would be welcome to use open citation data from OpenCitations to enhance their services. Rather, OpenCitations’ primary focus is to make the ‘raw’ information concerning scholarly citations and associated bibliometrics metadata freely available for all to reuse, allowing others, if they choose, to develop their own competitive added-value services over these data.

2.2. Describe the service's general value to the Open Science / Open Access Community and how it fulfills an international need. Also describe how far the infrastructure fits into a key area of importance to serve a broad Open Access and Open Science need as opposed to a specific disciplinary need * OpenCitations, as a scholarly infrastructure organization, espouses fully the founding principles of Open Science. It complies with the FAIR data principles (https://www.force11.org/fairprinciples) proposed by Force11, of which Dr Shotton is a founding member, that open scholarly data should be findable, accessible, interoperable and re-usable, and it complies with the recommendations of the Initiative for Open Citations (I4OC, https://i4oc.org), of which Professor Peroni and Dr Shotton are both founding members, that citation data in particular should be structured, separable, and open. OpenCitations has published a formal definition of an open citation at https://doi.org/10.6084/m9.figshare.6683855.

OpenCitations is thus fully dedicated to open scholarship and the publication of open bibliographic and citation data under a CC0 waiver, as well as to the release of open source software that enables scholars to work with these data and to reuse them in other contexts.

The citation information provided by OpenCitations is, by its nature, global in scope, and useful to scholars worldwide. Since English is the default language of science in much of the world, the OpenCitations website is in the English language. Automated Web browser translation of this text into the languages of non-English-speaking users is now sufficiently accurate that OpenCitations has no immediate plans to ‘internationalize’ its web site.

We are collaborating with the following groups and academic projects, both to promote the use of the OpenCitations Data Model (OCDM), and to provide a publication venue for the citation data that they are liberating from the scholarly literature:

• Matteo Romanello of the Digital Humanities Laboratory at the University of Lausanne is using OCDM for modelling citations of the classical literature within ancient Venetian documents in the context of the Venice Scholar Index (https://www.venicescholar.eu/), and is currently working to produce a dataset of citation data compliant with OCDM so as to be directly ingestable into and published by the OpenCitations Corpus (OCC).
• Two DFG-funded German projects that are extracting citations from Social Science publications:
  o The Linked Open Citations Database (LOC-DB, https://locdb.bib.uni-mannheim.de/) at the University of Mannheim is using OCDM to model their data, with the aim of producing them in conformity with that model so that they can be directly ingested into the OCC.
  o Steffen Staab (University of Koblenz) and Philipp Mayr (GESIS) are running the EXCITE Project (http://west.uni-koblenz.de/en/research/excite/), which also uses OCDM to model their citation data, and have already adopted the OCC as their publication platform. In fact, in September 2018, ~1 million citations coming from the EXCITE Project were successfully ingested into and published by the OCC.
• Sergey Parinov is technically leading CitEcCyr (https://github.com/citeccyr/CitEcCyr), which is an open repository of citation relationships obtained from research papers in the Russian language and Cyrillic script. This project intends to model its citations using the OCDM, and will use the OpenCitations Corpus as its publication platform.

The following projects and organizations have let us know that they are using OpenCitations data, either from the OpenCitations Corpus (OCC) or from COCI:

• Wikidata (https://www.wikidata.org/) includes alignments between several of its bibliographic entries and OCC resources;
• OpenAIRE (https://www.openaire.eu/) imported OCC metadata about articles into their LOD database;
• Daniel Ecer and Lisa Knoll of eLife (https://elifesciences.org/) have performed analytics on the OCC data;
• Ontotext demonstrated SPARQL query federation (https://2017.semantics.cc/ivelina-nikolova-and-ilian-uzunov) between Springer Nature LOD and OCC;
• Anna Kamińska published a bibliometrics case study of PLOS ONE articles in OCC (https://doi.org/10.5281/zenodo.1010450);
• Daniel Himmelstein processed OpenCitations data to create DOI-to-DOI citation tables (https://github.com/greenelab/opencitations);
• Thiago Nunes and Daniel Schwabe are using OCC to exemplify their XPlain framework (https://vimeo.com/227356693);
• Antonina Dattolo and Marco Corbatto are using the OCC as a data source for their VisualBib framework (http://sasweb.uniud.it/visualBib/visualBib.html);
• Nees Jan van Eck and Ludo Waltman of Leiden University’s Centre for Science and Technology Studies (CWTS) have extended VOSviewer (http://www.vosviewer.com/), a software tool for constructing and visualizing bibliometric networks, so as to use data in the OCC and in COCI by means of the OpenCitations REST APIs;
• Barney Walker developed Citation Gecko (http://citationgecko.com/), a graph-based citation discovery tool based on the OCC and COCI for retrieving citation data about papers;
• Philipp Zumstein developed a Zotero plugin (https://github.com/zuphilip/zotero-open-citations) that gives information about open citations using COCI;
• Dominique Rouger developed a Web application (https://dossier-ng.univ-st-etienne.fr/scd/www/oci/OCI_graphe_accueil.html) that allows one to search articles by means of the COCI REST API, which are then visualised in a graph showing citations to the retrieved articles. It then enriches this visualisation by adding additional information about the publication venues, publication dates, and other related metadata;
• Stephen Pearson presented a study (https://blog.research-plus.library.manchester.ac.uk/2019/03/04/using-open-citation-data-to-identify-new-research-opportunities/) run on publications by scholars at the University of Manchester which used COCI to retrieve citations between these publications, so as to investigate possible cross-discipline and cross-department potential collaborations;
• Angelo Di Iorio and colleagues have used COCI data to conduct an experiment (https://arxiv.org/abs/1902.03287) on the latest Italian Scientific Habilitation (the national exercise that evaluates whether a scholar is appropriate to receive an Associate/Full Professorship position in an Italian university), which aimed at trying to replicate part of the outcomes of this evaluation exercise for the Computer Science research field by using only open scholarly data.

The SPAR Ontologies (http://www.sparontologies.net), which are maintained by OpenCitations and that are used for defining the OpenCitations Data Model, are in use by about 40 other projects and organizations (full list at http://www.sparontologies.net/uptake), including:

• The United States Global Change Information System, which encodes federal information relating to climate change, makes extensive use of SPAR ontology terms.
• The United Nations Document Ontology (UNDO) has been specifically aligned with FaBiO (http://purl.org/spar/fabio).
• Wikidata has many classes that have been aligned with FaBiO and CiTO (http://purl.org/spar/cito).
• DBpedia’s DataID ontology uses the FaBiO and DataCite (http://purl.org/spar/datacite) ontologies.
• W3C’s Data on the Web Best Practices: Dataset Usage Vocabulary uses SPAR Ontologies.

2.3. Describe the benefits of your service for specific stakeholder groups. Also explain any user engagement activities. Include key endorsements for any of the following: * OpenCitations’ publication of open citation data in a variety of formats, and its provision of related services for their access, serve several actors with different needs. To illustrate the support from these varied users of bibliographic citations for our goal of providing a fully open alternative to the expensive subscription access to global citation data provided by WoS and Scopus, we attach letters of support from a number of institutions and organisations.

2.3.1 Funders The availability of open citation data will help funders to assess the impact of scientific work and to decide which researchers, ideas and projects are worth funding to support scientific progress.

A letter of support is attached from Robert Kiley, Head of Open Research at the Wellcome Trust, and Coordinator of cOAlition S.

2.3.2 Research institutions Research institutions and universities that wish to track the scholarly productivity and influence of their members will benefit by being able to do so more readily and without cost, once the bibliographic and citation data for these individuals are openly available in machine-readable form. Other institutions act as bibliographic data providers, working in parallel and in collaboration with OpenCitations.

A letter of support is provided from Johanna McEntyre, Director of the European Bioinformatics Institute, which publishes Europe PubMed Central, from which OpenCitations harvests research article reference lists and metadata.

2.3.3 Libraries Any scholarly library needs to support its stakeholder communities (authors, researchers, students, institutional administrators) by providing access to data about scholarly products, with particular regard to their citations. Having such data openly available is the key component for the further development of these library services, made affordable by savings on current subscriptions.

Letters of support are included from Torsten Reimer, Head of Research Services at the British Library, UK, and from Sören Auer, Director of the Leibniz Information Centre for Science and Technology University Library, Germany.

2.3.4 Researchers / Authors The open citation data published by OpenCitations will benefit all scholars and researchers, particularly those who are not members of the elite club of research universities that can afford subscription access to WoS and Scopus. These data are of particular value to bibliometricians, since they not only permit open research, but also allow re-publication of the actual data upon which the research findings are based. This is rarely possible when the research is based on data from proprietary citation indexes. Such scholars will now be able to pursue their studies with greater freedom, following reference trails through the citation network without hindrance, and have their own publications more easily found, discussed and cited.

Letters of support are attached from the following leading bibliometricians:

Ludo Waltman, Professor of Quantitative Science Studies and Deputy Director of the Centre for Science and Technology Studies (CWTS) at Leiden University, and Editor in Chief of the journal Quantitative Science Studies (MIT Press);

Vincent Larivière, Associate Professor of Information Science at the École de Bibliothéconomie et des Sciences de l'Information, Université de Montréal, Montreal, Quebec, Canada, and Associate Editor of the journal Quantitative Science Studies; and

Cassidy Sugimoto, Program Director for Science of Science, National Science Foundation, USA, and President of the International Society of Scientometrics and Informetrics.

2.3.5 Research managers Research managers will also benefit, because the data and software created by OpenCitations will be available for integration with other similarly described resources within their CRIS systems, including research information encoded using CERIF, the Common European Research Information Framework. In particular, FRAPO (the Funding, Research Administration and Projects Ontology, http://purl.org/cerif/frapo), one of our suite of interoperable SPAR ontologies, is a CERIF-compliant OWL 2 DL ontology for describing administrative information relating to grant funding and research projects.

2.3.6 Repositories Collecting and maintaining all of scientific knowledge in a single repository would be extraordinarily difficult, if not impossible. However, the use of Semantic Web technologies, already adopted by OpenCitations and other services such as Wikidata, provides a basis for the creation of a federation of decentralised scholarly databases that can cooperate with each other by providing interoperable data. The idea of organizing existing scholarly metadata repositories (e.g. OpenCitations’ datasets, Wikidata, OpenAIRE) as part of a bigger interlinked graph of open repositories is envisioned in the 2017 report of COAR (https://www.coar-repositories.org/files/NGR-Final-Formatted-Report-cc.pdf), and has the benefit of allowing each of them to scale independently in terms of their infrastructure and the amount of data they need to handle. (See further in Section 6 Foresight below.) Key players within the open science landscape are data repositories, which contain bibliographic citations to the journal publications describing the research datasets they house.

A letter of support is provided from John Chodacki, Director of the California Digital Library at the University of California (CDL), both on behalf of CDL itself and also on behalf of the Dryad Data Repository, which CDL is currently managing.

2.3.7 Publishers Publishers will benefit considerably from open citation data relating to their publications, since more readers will be readily guided from the open citation data to their online journal articles. Few people these days scan the Table of Contents of each issue of relevant journals as they are published. Rather, a researcher will typically come to an article by following a web search or a citation link, hoping the publication might be relevant to his or her line of enquiry. Thus, the more readily accessible such citation links are, and the more that user interfaces facilitate such searches, the greater the traffic to the articles. In addition, we anticipate that journals that are more readily discoverable will benefit by attracting additional article submissions.

2.3.8 Other Computer scientists and software providers:

The open citation data made available by OpenCitations will also benefit computer science developers, who can exploit this free availability to build new applications and visualisations that we cannot even begin to imagine.

An example of what has already been achieved is given by VOSviewer (http://www.vosviewer.com/), the leading citation network visualization software, which has recently been expanded to use the OpenCitations API to access open citation data not only within the OpenCitations datasets but also from Wikidata.

A letter of support is attached from Nees van Eck, Senior Researcher at the Centre for Science and Technology Studies (CWTS) at Leiden University, and the lead developer of VOSviewer.

Open scholarship support institutions:

The Educopia Institute (http://educopia.org) exists to empower collaborative communities to create, share, and preserve knowledge. It recently conducted a census of scholarly communications infrastructure providers entitled "Mapping Scholarly Communication Infrastructure", in which OpenCitations participated, aimed at providing a system-level view of the major players in this space, and the governance, finance, community, and communications elements that support them, to improve collective understanding of how to stabilize and sustain the scholarly communications system and the diverse elements that comprise it.

Since then, we have had two extended videoconference discussions with Katherine Skinner, Executive Director of the Educopia Institute, and her colleagues, about how the Educopia Institute might support OpenCitations in this exciting transition to sustainability (see further in Section 5.5 Sustainability below). Since she is familiar with SCOSS, she has asked us to mention Educopia’s support of OpenCitations in this SCOSS application, and has provided a letter of support, which is attached.

2.4 How will SCOSS funding institutions be able to contribute feedback to the ongoing development and delivery of your infrastructure/service? Any SCOSS funding institution can provide feedback by means of the existing communication channels provided by OpenCitations, i.e. email, social network accounts (Twitter), and the issue trackers available in each GitHub repository included within the OpenCitations GitHub organisation at https://github.com/opencitations. In addition, it will be possible to share feedback with any member of the OpenCitations Board and by participating in relevant workshops co-organised by OpenCitations, such as the workshop “Open Citations: Opportunities and Ongoing Developments” that will be held as part of the forthcoming 17th International Conference on Scientometrics and Informetrics in Rome (ISSI 2019, https://www.issi2019.org/), and the subsequent 2020 Workshop on Open Citations that will be held in Bologna.

3. Technical details

3.1. Technical relevance. Describe the hardware and software infrastructure, e.g. machines, location, redundancy, backup/failover arrangements, comments on robustness, load management, sustainability. Database(s) used, software, security. Open source access * The current OpenCitations server is an independent physical server – a Dell PowerEdge R730xd, 2 Intel Xeon E5-2620 2.1GHz, 512GB of RAM, 22.6TB HD – that both stores and handles all the datasets made available by OpenCitations in a number of separate databases, and also offers adequate performance to handle the OpenCitations query services (i.e. the SPARQL endpoints and the REST APIs). This server is supplemented with 30 additional Raspberry Pi 3Bs which are able to work in parallel to gather new reference data to upload into the datasets. In addition, the server is accompanied by a NAS for secure data backups – QNAP TS-1253U-RP NAS, equipped with 32TB of HD – and by a UPS – UPS Eaton 5PX 3000i RT2U Netpack – for handling possible interruptions to the electrical supply. All the hardware is currently located in the Department of Computer Science and Engineering of the University of Bologna, makes use of that department’s generic infrastructure, and is maintained by members of its IT staff. The server runs a Debian Linux OS (version 9.2), while each Raspberry Pi runs a Raspbian OS.

All the software used and/or developed by OpenCitations is open source. In particular, the entire software suite developed by OpenCitations for ingesting new citation data, and for enabling search on and browsing of them, is available as open source code (released under the permissive ISC license) on GitHub at https://github.com/opencitations.

The graph databases used for storing all the data are realised by means of different Blazegraph instances (https://www.blazegraph.com). All the data stored in such databases are also replicated in the backup NAS server and in third party data repositories, namely Figshare (https://figshare.com) and the Internet Archive (https://archive.org).

All the OpenCitations data are released in CC0, while the OpenCitations Data Model, which is used for describing the data and is entirely based on the Semantic Publishing and Referencing (SPAR) Ontologies (http://www.sparontologies.net) developed by OpenCitations, is released in CC-BY.

The current physical infrastructure should support the continuing population of existing OpenCitations citation datasets, and the creation of the planned additional citation data indexes over the short term – i.e. for the next two years. However, if the OpenCitations datasets continue to grow as anticipated (see Section 6, Foresight), we expect that by the end of the SCOSS funding period we will either require additional local hardware or a full migration to cloud services, which, while reducing local maintenance requirements, will have an ongoing service cost for which continuing financial provision will be required.

3.2. Provide user data that demonstrates impact and significance. E.g. Web usage statistics by geographic region (and country where possible), incl. extent of usage, visitors, sessions, usages via API/harvesting, geographical distribution in the previous year *

In the past year, the OpenCitations website, with all its services and pages, has been accessed more than 3.1 million times by more than 68,000 unique visitors (identified by their IP addresses); these counts exclude all accesses made by automated agents and bots. Specifically, the number of accesses made between April 2018 and March 2019 (inclusive) is shown in Figure 1, which lists the five main categories of information access services available on the website – i.e. direct HTTP access to a particular bibliographic resource and/or citation (“HTTP_CONT_NEG”), the search/browse interfaces (“INTERFACE”), the REST APIs (“API”), SPARQL queries to the endpoints (“SPARQL”), and ‘other’ (visits to the OpenCitations homepage and other web pages). It is worth mentioning that the APIs were formally introduced in June 2018, following a few internal experiments run in May 2018. They have rapidly become the main service used for querying the citation data available in OpenCitations.

Figure 1. The overall number of accesses to the website pages and services in the past year, month by month. “HTTP_CONT_NEG” (i.e. HTTP content negotiation) indicates the direct accesses to stored resources by means of their HTTP URI, “INTERFACE” indicates the use of Web interfaces for browsing and searching bibliographic and citation data, “API” shows the calls to the various OpenCitations REST APIs, “SPARQL” indicates the calls to the OpenCitations SPARQL endpoints, while “OTHER” lists the accesses to all the other resources. Note that the y-axis is logarithmic.

These data are complemented by the chart in Figure 2, which shows the number of requests from different countries worldwide (identified by the IP addresses of the requests). As the diagram makes clear, 43% of such requests came (surprisingly) from , followed by the United States of America (21%) and Italy (14%).

Figure 2. A map showing the relative frequency of accesses to the OpenCitations website in the last year, organised by country. Only six countries worldwide, coloured in white, did not access the website in the past year.

In Figure 3, we show the statistics concerning the OpenCitations resources made available on Figshare (i.e. the dumps of the datasets made available by OpenCitations, as well as the definition documents). In the past year, these Figshare documents have obtained more than 20,000 views and 3,000 downloads overall.

Figure 3. The overall number of views and downloads to all the OpenCitations resources stored in Figshare – mainly dataset dumps and definition documents.

All the CSV data used to create the previous charts are available at https://doi.org/10.6084/m9.figshare.8050352.

3.3. Provide information on your customer service, i.e. on your ability to perform, respond to issues of concern, etc. What process is in place and what is the average response time for what kinds of questions?

OpenCitations is not yet a legal entity, and has no formal customer service arrangements. However, the Directors and their collaborators are reachable in several ways – via email ([email protected]), Twitter (https://twitter.com/opencitations), Wordpress (https://opencitations.wordpress.com), and GitHub (https://github.com/opencitations). On average, the response time to any request we receive through these channels is one day.

4. Costs

4.1. Total annual operational costs of the previous 2 years. * Please provide a financial report for the previous year preferably approved by an accountant. This should include a detailed breakdown of income and expenses, including itemised staff costs, including roles and functions, IT expenses and miscellaneous costs, i.e. travel & meetings.

Please also include the number of FTE. This data will be made public.

Gifts in kind that have benefitted OpenCitations include the provision of office space and services (heat, light, cleaning, etc.) by the University of Bologna, the time given by the University's grant administrators and finance officers, the free provision of computer hardware and IT network services by the University's Computer Science Department, and the time donated by that department's IT staff, together worth an estimated €20,000. Additionally, Dr Shotton, Director of OpenCitations, has donated considerable amounts of his time to OpenCitations over the past twelve months, worth €16,195, that was not covered by the Sloan Foundation grant.

With the exception of these gifts in kind, the costs of OpenCitations over the past two years have been entirely covered by the grant provided by the Alfred P. Sloan Foundation for the project entitled “The OpenCitations Enhancement Project”, the final report for which is available on the OpenCitations blog at https://opencitations.wordpress.com/2019/01/02/opencitations-enhancement-project-final-report/.

Specifically, we received from the Sloan Foundation a total amount of 124,993 USD, which has been spent as follows:

• 36,646.39 USD for a full-year PostDoc (1.0 FTE) working at the University of Bologna;
• 24,853.40 USD for consultancy (for the Consultant Co-Investigator, 0.28 FTE);
• 13,264.05 USD for attendance at international conferences;
• 1,182.39 USD for supplies, subscriptions to services (i.e. Crossref), and APCs;
• 37,274.17 USD for hardware (server, laptops, etc.);
• 3,852.90 USD for organising workshops and events;
• 7,919.70 USD for institutional overheads.
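As a quick consistency check, the itemised amounts above can be summed and compared with the stated grant total of 124,993 USD. The sketch below is illustrative only; the dictionary keys are shortened labels chosen here, not names taken from the financial report.

```python
from decimal import Decimal

# Amounts in USD, copied from the itemised list above.
# The keys are shortened labels introduced for this sketch.
sloan_spending = {
    "postdoc_1.0_fte": Decimal("36646.39"),
    "consultancy_0.28_fte": Decimal("24853.40"),
    "conference_attendance": Decimal("13264.05"),
    "supplies_subscriptions_apcs": Decimal("1182.39"),
    "hardware": Decimal("37274.17"),
    "workshops_and_events": Decimal("3852.90"),
    "institutional_overheads": Decimal("7919.70"),
}

grant_total = Decimal("124993")  # total received from the Sloan Foundation

# Decimal avoids binary floating-point rounding in currency sums.
spent = sum(sloan_spending.values(), Decimal("0"))

print(f"Itemised spending: {spent} USD")
print(f"Matches grant total: {spent == grant_total}")
```

Run as written, the itemised figures add up to exactly the grant total.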

4.2. Total organisational costs for the 2 years of requested funding. * Please provide an organisational budget for the 2 years requested for funding. This should include a detailed breakdown of income and expenses, including itemised staff costs, including roles and functions, IT expenses and miscellaneous costs, i.e. travel & meetings. This should also include any amounts of secured funding and/or further expected funding from other sources in each year. Please also include the planned number of FTE. This data will be made public.

The requested SCOSS budget (itemized spreadsheet attached) covers one year of OpenCitations operations while it continues to be hosted within the University of Bologna, and one subsequent year as an independent legal entity which we will establish during Year One to receive and use funds and provide citation services as an independent body. The requested budget for a further third year is “for the bank” as defined by SCOSS, and is costed at 50% of the combined budgets for Years One and Two.

The requested amounts shown in the bottom line exclude the funding separately promised by CWTS, the Centre for Science and Technology Studies at Leiden University, Netherlands, which has already pledged the generous support shown in the penultimate line of the budget spreadsheet.

The requested items shown in the budget spreadsheet can be summarized in the following categories:

• Consultancy fees for the Educopia Institute to assist OpenCitations in making a successful transition to sustainability.
• Legal and administrative costs to establish and maintain OpenCitations as an independent legal entity that can hold and administer funds to further the aims of OpenCitations.
• Salaries for a CEO (Chief Executive Officer) (from Year 2), three computer specialists, one data curator, one Policy Advocate and Community Outreach Officer, and the OpenCitations Manager / Accountant / Administrator, to enable the work of OpenCitations to expand as outlined in Section 6 Foresight, below.
• Consultancy fee for OpenCitations’ current Director David Shotton (recently retired from the University of Oxford, and receiving no other salary) to contribute 20% of his time, which will be spent in guiding the continuing development of and providing publicity for OpenCitations in the role of Consultant Director.
• Personal computer hardware to support the above-mentioned employed staff.
• A budget to cover the costs of conference attendance, meetings of the OpenCitations Board, and hosting of a further Open Citation Workshop in 2020 anticipated to have between 100 and 150 delegates.
• A budget for consumables and Crossref subscriptions.
• A budget to cover annual operating costs and a contingency fund to cover hardware repair/replacement, including in the first year departmental overheads as part of the University of Bologna, and in the subsequent year, after OC becomes a legal entity, service operating costs for ongoing server hosting and infrastructure provision by the Department of Computer Science, University of Bologna, plus OpenCitations’ own independent costs for office rental and for web site and infrastructure maintenance.

It has been clear for some time that what we are able to achieve has become severely limited by lack of human resources to develop OpenCitations, particularly since, in his recently promoted professorship, Professor Peroni has the very demanding and interesting task of developing his computer science teaching activities within the Department of Classical Philology and Italian Studies at the University of Bologna. Thus, for example, work to resume ingest of reference lists from PubMed Central using parallel processing on Raspberry Pis, and to improve the OpenCitations API to work more effectively with VOSviewer, the visualization software using OpenCitations data to visualize citation networks, are both on hold for lack of available computer scientist time.

It is thus very easy to justify the several staff members whose salaries are requested in this application, who will enable OpenCitations to expand its activities beyond its present resource straitjacket. Each of the three computer scientists has a distinct role - being responsible, respectively, for development of the computational infrastructure, for citation and bibliographic data harvesting and management, and for user interfaces and Web development. The data manager/curator will be responsible for the accuracy and completeness of citation data and of bibliographic metadata, as we expand our activities to ingest citation data from new sources of more variable quality (see below), and to store bibliographic metadata in house for the first time for all publications referenced in the various OpenCitations Indexes. The Policy Advocate and Community Outreach Officer has a triple function: first to interact with those, particularly publishers and intermediaries such as Crossref and PubMed Central, who are the sources of the bibliographic and citation data that we will publish on our open platform, secondly to reach out to and help the scholars and citizens who will use OpenCitations data, to many of whom OpenCitations is presently completely unknown, and thirdly to promote OpenCitations to potential SCOSS support organizations and financial contributors. In addition, this person will be responsible for PR and for non-technical publications describing OpenCitations. The final two positions, of the OpenCitations Chief Executive Officer and of the OpenCitations Manager / Accountant / Administrator, will remove from the shoulders of Professor Peroni the current administrative load of managing, running and representing OpenCitations as an open scholarly infrastructure organization, of taking day-to-day responsibility for staff HR issues, and of managing its finances, since Professor Peroni’s university responsibilities presently inhibit him from expanding these activities as he would wish.

The other budget entries are self-explanatory. We are particularly keen to organize a further Open Citation Workshop in 2020, either alone or in partnership with Europe PubMed Central and other organizations, following the enormous success of the 2018 workshop at the University of Bologna, which did so much to cement together members of this nascent community of interest surrounding scholarly citations.

4.3. Total funding requested. Please indicate your figures in Euros for year1 and year2. * You may request up to 2 times your annual operational costs + 1 annual operational costs (for the bank), or a percentage of that (in total 3 years)

Please indicate the total of funding requested by SCOSS in Euros.

Year One: 452,990.00 Euros
Year Two: 546,975.00 Euros
Year Three: 499,982.50 Euros (for the bank, calculated as the average of Years One and Two)

For a full breakdown of costs, please see the attached OpenCitations SCOSS application budget spreadsheet.

In addition to these costs, there is also a one-time fee payable to SCOSS of 25,000 Euros + VAT (21%), for a total of 30,250 Euros.

The total funding requested from SCOSS (Years One, Two and Three + SCOSS fee) is 1,530,198 Euros.
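The figures in this section follow from simple arithmetic, which the short sketch below reproduces. It assumes that Year Three is the mean of Years One and Two and that the stated total is rounded to the nearest whole Euro; the variable names are chosen here for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Requested amounts in Euros, as stated above.
year_one = Decimal("452990.00")
year_two = Decimal("546975.00")

# Year Three ("for the bank") is the average of Years One and Two.
year_three = (year_one + year_two) / 2

# One-time SCOSS fee: 25,000 Euros plus 21% VAT.
fee_net = Decimal("25000")
vat_rate = Decimal("0.21")
fee_gross = fee_net * (1 + vat_rate)

total = year_one + year_two + year_three + fee_gross

# Assumption: the application rounds the total to whole Euros.
total_rounded = total.quantize(Decimal("1"), rounding=ROUND_HALF_UP)

print(f"Year Three: {year_three} Euros")
print(f"SCOSS fee incl. VAT: {fee_gross} Euros")
print(f"Total: {total} Euros, rounded to {total_rounded} Euros")
```

The exact sum is 1,530,197.50 Euros, which rounds to the 1,530,198 Euros quoted above.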

4.4. Indicate the % of the total budget requested * 100 %

5. Sustainability measures

5.1. Describe your funding model, i.e. how you source your funds. Provide information on your funding sources, including key revenue streams, and the total % of external funding that currently covers your total expenses

The OpenCitations Corpus prototype was funded by a small Jisc grant to Dr Shotton at the University of Oxford. More recently, as mentioned in Section 4.1 above, the funding we have used to achieve the current status of OpenCitations has come from a grant from the Alfred P. Sloan Foundation for The OpenCitations Enhancement Project (https://sloan.org/grant-detail/8017, start date May 2017, end date Nov 2018, total amount: 124,993 USD). We now have a new grant from the Wellcome Trust for a one-year project entitled Open Biomedical Citations in Context Corpus, https://wellcome.ac.uk/funding/people-and-projects/grants-awarded/open-biomedical-citations-context-corpus, starting later this year (amount: 55,500 EUR). In addition, we have recently been offered 48,000 EUR by the EU-funded project RISIS2 (https://www.risis2.eu) to support one new member of staff to provide citation data services tailored to that project’s needs, the administrative arrangements for which are currently being organized.

5.2. Describe any previous business model history if different and provide a short analysis of what your challenges have been in raising funds.

As mentioned above, the OpenCitations Oxford prototype (2010) was funded by a small one-year grant from Jisc, which was awarded a six-month extension. There then followed a period without external funding, until the Sloan grant mentioned above was obtained in 2017. During that intervening period, development of OpenCitations was continued slowly on a ‘voluntary’ basis from the University of Oxford and the University of Bologna.

Since 2017, assisted by this Sloan Foundation grant, we have been able to expand the SPAR (Semantic Publishing and Referencing) Ontologies, create, revise and expand our new OpenCitations Data Model, install new OpenCitations hardware at the University of Bologna, set up the various OpenCitations databases and the new open citation indexes on Blazegraph instances running on that hardware, and populate these with new data. In addition, we have created and launched Open Citation Identifiers (OCIs) and their OpenCitations resolution service, and have provided new APIs and search and browse interfaces of generic usefulness using our OpenCitations open software OSCAR, LUCINDA and RAMOSE.

OpenCitations is now poised for the major expansion described in Section 6 (Foresight) below, that will enable us to move to being a sustained and enduring open infrastructure organization that publishes comprehensive open scholarly citation data of global value, presenting a genuine alternative to the citation data currently offered by the commercial citation indexes of Clarivate Analytics and Elsevier, for which this opportunity to apply for SCOSS funding is acutely timely.

5.3. How will the service credit or promote the SCOSS programme? *

Should OpenCitations be fortunate enough to be chosen as a recipient for the second round of SCOSS funding, we will, of course, prominently mention this on the OpenCitations website, write about the award in the OpenCitations blog and on Twitter, and acknowledge this support in all our publications and conference presentations. If invited to do so, we would also be pleased to participate in SCOSS events.

Additionally, we will recommend SCOSS to other open scholarship services seeking sustainability financial support, and will work as an exemplar to strengthen the existing links between SCOSS (SPARC Europe) and Educopia.

5.4. How will the service drive and support its own SCOSS fund-raising campaign? *

While we envision that uptake and use of the OpenCitations open citation data services will continue to grow as it has been doing, without the need for strenuous publicity efforts on our behalf, beyond their initial announcements via our social media and conference presentations, both the Directors and the OpenCitations Policy Advocate and Community Outreach Officer will work to drive and support its SCOSS fund-raising campaign.

We already have numerous contacts with key individuals in major university and national libraries, research institutions, funding agencies and related organizations, and the members of our proposed OpenCitations Board, drawn from across our stakeholder communities, will have many more such contacts. We will mount our OpenCitations SCOSS fund-raising campaign by initially addressing these individuals, knowing that personal approaches often work best. So that this time-consuming task does not fall exclusively to the existing Directors and the new Board Members, this will be undertaken jointly with the Policy Advocate and Community Outreach Officer and by the OpenCitations Manager / Accountant / Administrator whom we propose to appoint.

Specifically, we will first write to these key individuals, and then schedule personal visits where possible. We will prepare a presentation for us all to use describing the benefits of OpenCitations, and the need for financial support from those academic institutions that will benefit from open citation data. We will in particular argue the financial benefits and value for money of supporting the growth of open citation data to a coverage that can offer a realistic alternative to Scopus and WoS, whereby small investments in OpenCitations will secure enormous future savings for these institutions in cancelled commercial citation index subscriptions.

For those that cannot be met with face-to-face, we will schedule one or more webinars during which we can provide the same information. We will make this same appeal in a series of blog posts on the OpenCitations Blog, and where appropriate at conferences and in scholarly publications.

5.5. How do you intend to become more sustainable after the SCOSS campaign is over? *

The open scholarship landscape is changing rapidly and its future development is impossible to predict accurately, so we are presently unable to give a clear description of how OpenCitations will maintain sustainability funding following the SCOSS campaign. There is no ‘off-the-peg’ financial model that we can cite now that will apply to our situation in two or three years’ time. Rather, we will work towards a sustainable financial model in collaboration with our Board Members, SPARC Europe, Educopia and others who have offered to support us (see the Letters of Support attached). In particular, we wish to work closely with the Educopia Institute to achieve sustainability, and have requested funds in this SCOSS budget to support their work in assisting us. It is our hope that most of the libraries and scholarly institutions that offer support to OpenCitations during the SCOSS campaign will elect to continue to do so thereafter, and that others will join them as they realize the value of OpenCitations’ open citation data and the potential savings that using it will offer.

One thing that is clear is that there is no shortage of funds currently being used to obtain access to citation data. Major research universities and research institutions are presently spending many millions of dollars annually on subscriptions to WoS and Scopus, and even more on journal subscriptions. And as more and more cancel their “Big Deal” subscriptions, large amounts of money are being liberated within library budgets that could be diverted to support alternative open citation indexes and services that accord with the ‘open’ mission statements of such organizations, such as those provided by OpenCitations. Just 5% of the current subscriptions spent by academic libraries worldwide on commercial citation indexes would be sufficient to fund OpenCitations in perpetuity!

While the ‘targeted crowdsourcing’ model of funding adopted by SCOSS may be a hard sell to academic administrators, who would prefer to buy specific goods or services for known outlays e.g. by following a ‘membership’ model, it is the ideal method of providing ongoing support for open infrastructure organizations such as OpenCitations. This is what lies behind the Invest in Open Infrastructures (IOI) movement (https://investinopen.org/), in which Educopia, SPARC, SPARC Europe, ORFG and others are creating a decentralized organization to support deserving open infrastructure organizations by using ‘crowdsourced’ donations from major research institutions.

Having already participated in the first Invest in Open Infrastructures webinar led by Maurice York of the University of Michigan, we will continue to work with these people towards the adoption of OpenCitations by the Invest in Open campaign as a natural follow-on from the three years of SCOSS sustainability funding. The great advantage of this is that while SCOSS has a European focus, Invest in Open has more exposure in the United States, whose institutional support will be vital for long-term sustainability of OpenCitations. It may well be that OpenCitations will be one of the grateful recipients of such future ongoing IOI support, or alternatively will be taken under the wing of a major academic library or institution with a mission to support open scholarship - we certainly hope so!

In addition, given the extent of the big deal cancellations tracked by SPARC (see https://sparcopen.org/our-work/big-deal-cancellation-tracking/), if all the institutions listed there would contribute (on average) 10,000 euros of their saved amount to OpenCitations, there would be enough money to entirely cover one year of OpenCitations’ request to SCOSS.

It is worth stressing that our first offer of sustainability funding has already come from CWTS, the Centre for Science and Technology Studies at Leiden University (see the Letter of Support from Professor Ludo Waltman). To support the work of OpenCitations, which Professor Waltman considers to be “crucial, innovative and unique”, CWTS will make available to OpenCitations a one-off financial contribution of 50,000 euros, followed by further contributions of 15,000 euros annually.

Finally, we will continue to apply for targeted project funding from other sources such as EU Horizon calls, and private charitable foundations including the Arcadia Fund, the Sloan Foundation and the Chan Zuckerberg Initiative.

5.6. Describe how the service addresses the Principles for Open Scholarly Infrastructures: http://cameronneylon.net/blog/principles-for-open-scholarly-infrastructures

Bilder et al. recommend three sets of principles to which open scholarly infrastructures should adhere, under the headings Insurance, Governance and Sustainability.

Of these three categories of principles, OpenCitations is already completely fulfilling those described under Insurance:

• Open source: All the software released by OpenCitations is available on GitHub at https://github.com/opencitations and released with the ISC License, which is a very permissive free software license that allows maximum reuse of the software in different contexts, either commercial or non-commercial.
• Open data: All the data published by OpenCitations are released with the CC0 waiver, while all the models used to describe these data (i.e. OpenCitations Data Model and the SPAR Ontologies) are made available in CC-BY.
• Available data: All the data can be accessed by means of the OpenCitations SPARQL endpoints, the OpenCitations REST APIs (in JSON and CSV formats), through HTTP requests in different formats (HTML, RDF/XML, Turtle or JSON-LD, via content negotiation), using the OpenCitations searching/browsing Web interfaces, or as bulk data dumps (see http://opencitations.net/download) that are published periodically following updates to the datasets.
• Patent non-assertion: OpenCitations does not hold nor will it seek to obtain any patent for any of its products.
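To illustrate the content negotiation mentioned under “Available data”, the following minimal sketch builds HTTP requests whose Accept header asks for one of the listed formats. The media types are the standard registered ones for HTML, RDF/XML, Turtle and JSON-LD. The resource URI, the format keys and the helper name build_request are hypothetical and used only for illustration, and the exact media types honoured by the OpenCitations server are an assumption here. No request is actually sent.

```python
from urllib.request import Request

# Standard media types for the formats listed above.
# The short keys are labels introduced for this sketch.
MEDIA_TYPES = {
    "html": "text/html",
    "rdfxml": "application/rdf+xml",
    "turtle": "text/turtle",
    "jsonld": "application/ld+json",
}


def build_request(resource_uri: str, fmt: str) -> Request:
    """Return a GET request asking for `resource_uri` in the given format.

    Hypothetical helper: the format is selected through the HTTP Accept
    header, which is how content negotiation works in general.
    """
    return Request(resource_uri, headers={"Accept": MEDIA_TYPES[fmt]})


# Placeholder URI, not a real OpenCitations resource.
EXAMPLE_URI = "https://example.org/resource/1"

req = build_request(EXAMPLE_URI, "turtle")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Passing the resulting object to urllib.request.urlopen would perform the request; the same URI can then return HTML to a browser and RDF to a script, depending only on the Accept header.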

We are currently working to address more fully the points introduced by the two other categories, Governance and Sustainability:

Concerning Governance:

• Coverage across the research enterprise: OpenCitations is engaged with citation data covering the whole spectrum of the scholarly research domain. In addition, all the OpenCitations applications developed for searching and browsing over these citation data have been designed to be of generic usefulness, made available in a manner that permits their reuse by members of the community in a plethora of different scenarios that need not be in any way related to bibliographic metadata and citation data. OSCAR (https://github.com/opencitations/oscar, https://doi.org/10.3233/DS-190016), LUCINDA (https://github.com/opencitations/lucinda), and RAMOSE (https://github.com/opencitations/ramose), OpenCitations’ tools developed for creating the text search interfaces, the browser interfaces, and the REST APIs, respectively, are all available from the OpenCitations website, and can all be used with data stored in any RDF triplestore providing a SPARQL endpoint.

• Stakeholder Governed: While OpenCitations is currently governed by its two Directors, the plan, as OpenCitations moves to become an independent legal entity, is to extend its governance by creating an OpenCitations Board involving eight Board Members drawn from the main stakeholder communities (librarians, bibliometricians, academics, data service providers, etc.), who will then work with the Directors to guide the development of the OpenCitations’ organization, services and infrastructure in ways that will best benefit these communities. Invitations to board membership have already been issued, and some provisional acceptances (pending explicit details) have already been received, including that for the important role of Chair. Whatever form OpenCitations’ future governance takes, we will ensure that its original aim of free provision of open bibliographic and citation data, services and software is maintained, and that OpenCitations as an organization cannot in future be taken over or controlled by commercial interests.

• Non-discriminatory membership: The membership of the proposed OpenCitations Board will be open to an adequate number of people who have already demonstrated strong support for the goals of open scholarship in general and of OpenCitations in particular, namely the publication of open citation data and the development of relevant associated open services. New Board Members will be considered as vacancies occur if those individuals already show evidence of being strongly aligned with the aims of OpenCitations, and have competencies that will facilitate its further development. We will strive for appropriate internationality and gender balance among Board Members.

• Transparent operations: The process that will be implemented to renew the OpenCitations Board during the existence of OpenCitations as a legal entity, together with minutes of its meetings and its budgets, will be appropriately documented and published as open material as far as the limitations regarding confidentiality and personal data permit.

• Cannot lobby: All the developmental choices that OpenCitations has made in recent years have been directly guided by the community surrounding open citations, as determined through events that OpenCitations has hosted or in which it has participated, including the 2018 Workshop on Open Citations (co-organised by OpenCitations) held in Bologna, and the WikiCite and PIDapalooza conference series, and through social interactions via OpenCitations’ Blog and Twitter social networks, its mailing lists, and even the issue tracker system of the OpenCitations GitHub repositories. In addition, OpenCitations’ current Directors, both through I4OC and independently, continue to lobby for publishers to open their reference lists and bibliographic metadata, and for funders to mandate such openness. However, OpenCitations is not, and will not become, involved in political, regulatory, legislative or financial lobbying of any kind.

• Living will: Since all OpenCitations software and data are openly published, it is already possible for a third party to completely replicate what we are presently doing.

• Formal incentives to fulfil mission & wind-down: As long as scholarship survives and new scholarly results continue to be published, there will always be future demand for open citation and bibliographic data. Thus the work of providing such data for past, present and future scholarly publications will never be completed. If a third party were to provide the same open citation data and services that OpenCitations currently provides and plans to provide as it expands in future, and if the continuation of that alternative service could be guaranteed into the future, then there would be no purpose in the further maintenance and development of OpenCitations to duplicate that provision, and we could happily turn our attentions to other worthwhile projects. However, such an alternative service does not presently exist, and we consider that future scenario to be unlikely. That being the case, we will continue to devote our energies to this small but vital aspect of open scholarship.

About Sustainability:

• Time-limited funds are used only for time-limited activities: OpenCitations will continue to apply for targeted grants for specific projects, either alone or with partners, by participating in H2020 calls and by approaching additional funders including the Alfred P. Sloan Foundation, the Chan-Zuckerberg Initiative, and the Arcadia Fund. However, this SCOSS application is the only present application for OpenCitations sustainability funding, in contrast to such targeted project funding.

• Goal to generate surplus; goal to create contingency fund to support operations for 12 months; mission-consistent revenue generation; and revenue based on services, not data: Because all OpenCitations’ data and services are open, it has nothing to sell or against which to charge membership fees (as Crossref does from publishers and DataCite does from data repositories for the issuance of DOIs; and as ORCiD does from scholarly institutions for ORCiD identifiers for their members). Charging for ‘added value’ services on top of free provision of basic open citation data (the ‘Freemium’ model) is a possibility, but runs counter to OpenCitations’ basic philosophy that all its data and services should be free. While the alternative crowd-sourcing model used by Wikipedia is in principle attractive, the scope of OpenCitations is too specialized to achieve the widespread public support that would make this a viable financial option. Ideally, OpenCitations would obtain ongoing institutional support from one or more major scholarly libraries (as arXiv has from the Cornell University Library) or from one or more funding agencies (as PubMed has from NIH, and as Europe PubMed Central has from the Wellcome Trust and other biomedical research charities), those institutions thus gaining credit for supporting OpenCitations in conformity with their missions. To this end, OpenCitations has already made overtures to two major libraries, but so far without positive results. That is why the SCOSS model of ongoing targeted crowd-sourced funding by a number of academic libraries and other interested scholarly institutions, discussed in Section 5.5 above, is so attractive and appropriate. If OpenCitations continues to expand its coverage of the scholarly domain as it has been doing (from ~14 million to over 445 million citations in the past year), so as to offer a genuine alternative in coverage to the extensive citation data offered by WoS and Scopus (WoS has over a billion citations), then it stands every chance of attracting financial support from university libraries at a fraction of the cost of their current subscriptions to those commercial citation indexes.

5.7. Describe how the service addresses the Good Practice Principles for Scholarly Communication Services: https://sparcopen.org/our-work/good-practice-principles-for-scholarly-communication-services

• Good governance: For OpenCitations’ present and future governance practices, see Section 5.6 (Governance) above. We will ensure that different communities of interest are represented on the OpenCitations Board, and able to influence both its strategic and operational governance, while ensuring that our principles of openness can never in future be compromised nor brought under the control of commercial interests.

• Open standards: Everything that OpenCitations does and publishes is based on open standards and open source software. In particular, the open SPAR (Semantic Publishing and Referencing) Ontologies that OpenCitations has developed not only underpin the generic OpenCitations Data Model and all of OpenCitations’ data, but are also widely used throughout the Semantic Web community. Community input into the development of these ontologies is both welcomed and already occurring – see https://sparontologies.github.io/. As mentioned in Section 5.6 (Governance) above, OpenCitations’ OSCAR, LUCINDA and RAMOSE software tools have been specifically developed to be generic, so that the text search interfaces, the browser interfaces, and the REST APIs that they enable can be used by third parties over any data source providing a SPARQL endpoint, whether or not it has anything to do with bibliographic citations. In addition, all the technologies used are entirely based on open Semantic Web standards, published as Recommendations by the World Wide Web Consortium.

• FAIR data collection: As mentioned in Section 2.2, OpenCitations is fully compliant with the FAIR data principles of Force11 that open scholarly data should be findable, accessible, interoperable and re-usable (see https://w3id.org/people/essepuntato/papers/oc-garr2017.html for further details). The only open personal data recorded by OpenCitations relate to authors and editors of scholarly publications, their ORCiDs, and (in future) their institutional affiliations, as provided to OpenCitations by publishers, Europe PubMed Central, Crossref and ORCiD.

• Transparent pricing and contracts: Since OpenCitations makes all its data, services and software openly available, issues of pricing and contracts do not arise!

• Easy migration: Since all OpenCitations data are open and recorded using open standards, it is possible even now for third parties to take and re-use the data, or migrate it to new platforms, at any time.

• Succession planning: OpenCitations is in the process of forming itself into a formal non-profit legal entity, so that it can manage its own affairs and receive and use financial income. The statutes of this legal entity will include provision for the winding up of the organization if it has fulfilled its mission, outlived its usefulness, or been superseded by a superior open citation service.

• Open content: To facilitate reuse, all OpenCitations’ content and provenance data are immediately made openly and freely available in machine-readable format via open standards under a CC0 license. Usage data are published under CC0 once they have been collected and analysed. Software is made open on the OpenCitations GitHub repository under a permissive ISC license as it is developed. Publications describing OpenCitations’ data, services and software are made (green) open access upon submission of the final versions of the papers.

6. Foresight

Outline your work plan for the coming 2 years for which funding is requested. Please indicate in detail what activities you have planned to substantiate the funding requested. Please indicate how far funding will cover maintenance costs, and how far it will fund improving and innovating the current service. *

OpenCitations’ goal is to support open scholarship by offering a fully open alternative to the citation data provided by WoS and Scopus, for the entire global corpus of scholarly citation data and associated bibliographic metadata.

The principal current limitations to achieving this goal are:

a) the existence of a large number of scholarly publishers, particularly in the arts and humanities, and of organizations like WHO that publish citable reports, that do not use Crossref DOIs for their publications, and thus fall outside the current ‘catchment area’ for OpenCitations datasets;

b) the failure of many academic publishers who do use Crossref DOIs to submit open reference lists to Crossref along with their publication metadata;

c) the refusal of certain academic publishers, of which Elsevier is the largest, to open their reference lists already deposited with Crossref;

d) hardware performance restrictions limiting the speed at which OpenCitations can harvest references from the Open Access Subset of PubMed Central and from other sources;

e) lack of programmers and development staff who would permit OpenCitations to increase its current ingest efficiency, to develop ingestion systems for citations from new sources such as the arXiv e-prints archive, and to create added-value services over these open citation data, for example integration with citation network visualization systems, and citation metrics;

f) lack of curatorial staff who could further improve the completeness and accuracy of OpenCitations’ data holdings and, indirectly, of the tools used for the automatic ingestion of bibliographic and citation data;

g) lack of expertise and administrative staff to assist in OpenCitations’ structural transition from a small university-based academic activity to a vital component of the global Open Science infrastructure;

h) lack of time to expand contacts with our user communities and potential support organizations;

i) lack of sustained financial support (as opposed to time-limited grant funding for specific projects and developments) to maintain and develop OpenCitations as a sustainable major infrastructure organization in the service of open scholarship.

OpenCitations will overcome these limitations and achieve its overall objectives in a series of incremental steps:

To address points (a), (b) and (c), OpenCitations:

• will continue its active involvement with the Initiative for Open Citations (I4OC) in campaigning for publishers to submit and publish open reference lists through Crossref, from which they can be harvested into COCI,

• will seek out arrangements with non-Crossref-DOI publishers to access their references,

• will appoint a Policy Advocate and Community Outreach Officer, to take much of the work this entails off the shoulders of the Directors, and separately

• will continue to populate CROCI, the Crowdsourced Open Citations Index (https://arxiv.org/abs/1902.02534) that it has just launched to host citation data submitted by third parties – authors, editors, scholars – allowing them to upload to OpenCitations citation data that are not already openly available, for example from their own publications, journals and reference collections, in an effort to fill the gap of missing citations from some publishers (particularly Elsevier) which are not presently available in Crossref as open material, and

• will seek community support in making CROCI a success.

To address points (d) and (e), and specifically to expand its coverage of open citations across the entire scholarly domain, OpenCitations:

• will appoint programmers to activate the ingestion of complete bibliographic and citation data from the Open Access Subset of PubMed Central, using our new and tested parallel processing ingest hardware;

• will, as a crucial component of the OpenCitations Wellcome Trust project, develop a new dataset entitled the Open Biomedical Citations in Context Corpus, that will house the textual contexts of each in-text citation (in-text reference pointer) harvested from full text articles obtained from PubMed Central, so that scholars can use this textual information to deduce the purpose or meaning of each in-text citation, which can then be encoded using CiTO, our Citation Typing Ontology;

• will also continue to ingest complete bibliographic and citation data into the OpenCitations Corpus from our collaborative sister projects - the ExCITE Project (http://excite.west.uni-koblenz.de/website/) which has already extracted and submitted to the OpenCitations Corpus around one million citations from German-language Social Science publications, and the Venice Scholar Project (https://venicescholar.dhlab.epfl.ch/about), which is extracting citations from publications on the history of Venice, and will publish these data under a CC0 license, using OpenCitations Corpus as the platform for publishing their citation data as Linked Open Data; and

• will, in a separate planned collaboration with colleagues from the ExCITE Project and Cornell University, harvest all the citations present within the reference lists of the 1.5 million e-prints stored in the arXiv e-print repository (https://arxiv.org/), which mainly contains papers from the fields of mathematics, physics, astronomy and the other ‘hard’ sciences, once these have been extracted and formatted in RDF according to the OpenCitations Data Model, and will then publish them under a CC0 waiver within the OpenCitations Corpus.
These will complement the biomedical citations that presently constitute the majority of the OpenCitations Corpus holdings.

In addition:

• following the success of COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations, OpenCitations will release the following new citation indexes of existing open citation datasets: WOCI, the OpenCitations Index of open Wikidata citations, DOCI, the OpenCitations Index of open DataCite citations, and DROCI, the OpenCitations Index of open citations from the Dryad Data Repository;

• will develop software that will automate the updating of these indexes as information is added to their data sources; and

• will further develop the OpenCitations API to permit federated searches over all the OpenCitations’ indexes.

Furthermore:

• in a major new initiative to be undertaken in collaboration with our bibliometrics colleagues, OpenCitations will develop OpenCitations Meta, a new database containing the metadata relating to scholarly publications, both the basic bibliographic information of the type already present in the OpenCitations Corpus, and additionally the publications’ abstracts, keywords, author affiliations and funding details, information of crucial value for informed bibliometric analyses of research. This information will be of particular interest to universities and research institutes wishing to evaluate the output of their scholars. In addition, this new database will play a crucial role in improving the existing OpenCitations APIs, since it will allow more complex calls on them and will reduce the time currently spent waiting for responses from external API services to retrieve the metadata of the bibliographic resources involved in a citation.

These developments should be understood as a radical refocusing of OpenCitations’ data provision strategy. Initially, the OpenCitations Corpus was conceived as a single database that would contain all our bibliographic and citation information. Now, with the developments (i) of COCI and the other OpenCitations indexes containing citation metadata but not metadata about the citing and cited bibliographic entities, (ii) of the proposed OpenCitations Meta database that will contain bibliographic metadata but not citation data, and (iii) of the new OpenCitations database to be developed for the Wellcome Trust project that will house textual fragments that constitute the citation contexts of each in-text citation occurrence, together with information about their location within the publication (Introduction, Methods, Conclusions, etc.), we are moving to a federated system of interoperable and complementary OpenCitations triplestore resources.

This is because, as OpenCitations’ coverage of the global citation landscape expands, it will become technically inappropriate to handle and maintain everything (metadata, citations, reference lists, in-text reference contexts, abstracts, etc.) within a single repository. It is better to organize each specific data type within one of a set of complementary and interoperable triplestore repositories, each encoding data in RDF according to the (expanded) OpenCitations Data Model, and all searchable using federated SPARQL queries.

Having bibliographic metadata ‘in house’ in OpenCitations Meta will result in significantly improved performance over the current situation when querying COCI for citation information, where such bibliographic metadata is pulled on-the-fly by live calls to the Crossref API. Additionally, it will avoid duplication of data by efficiently permitting us to keep in the Meta database a single copy of the metadata for each of the bibliographic entities involved as citing or cited entities in the different OpenCitations’ citation indexes, since these same citing and cited entities may be referenced independently within the different indexes, from Crossref references, Wikidata references, DataCite references, etc.
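The "single copy" idea described above can be illustrated with a minimal plain-Python sketch. It is not OpenCitations code: the class name MetaStore, the function build_index and the example DOIs are all hypothetical, and a real implementation would hold the records as RDF in a triplestore rather than in a dictionary. The sketch only shows that several citation indexes can refer to the same bibliographic entity by key, while its metadata is stored once.

```python
class MetaStore:
    """Hypothetical stand-in for a metadata database: one record per entity."""

    def __init__(self):
        self._records = {}

    def register(self, doi, metadata=None):
        # DOIs are case-insensitive, so normalise before using them as keys.
        key = doi.strip().lower()
        record = self._records.setdefault(key, {})
        if metadata:
            record.update(metadata)
        return key

    def get(self, key):
        return self._records[key]

    def __len__(self):
        return len(self._records)


def build_index(meta, citations):
    """Store each citation as a (citing_key, cited_key) pair.

    The index holds only keys; the metadata itself lives in `meta`.
    """
    return [(meta.register(citing), meta.register(cited))
            for citing, cited in citations]


# Illustrative data only: two indexes that both mention the same citing paper.
meta = MetaStore()
meta.register("10.1000/A", {"title": "Paper A"})
crossref_like_index = build_index(meta, [("10.1000/a", "10.1000/b")])
wikidata_like_index = build_index(meta, [("10.1000/A", "10.1000/c")])
```

Here both indexes point at the same key for the citing paper, so its title is stored once and can be looked up locally, without a live call to an external service.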

This segregation of distinct data types into different triplestore repositories can then, in future, be extended. And these different repositories can, if necessary, be maintained on different computers at different locations across the Internet or in the cloud, equipped with different hardware according to the specific need of each repository. General interoperability will be guaranteed by means of SPARQL and its federated service for queries, and by use of the same generic data model (the OpenCitations Data Model, OCDM) and of standard Linked Data protocols for describing all these data.
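As an illustration of how such federation might look in practice, the following sketch builds a SPARQL 1.1 federated query that joins citation links held in one repository with titles held in another, using the standard SERVICE keyword. The two endpoint URLs and the entity IRI are hypothetical placeholders, not real OpenCitations addresses, and the use of cito:Citation, cito:hasCitingEntity, cito:hasCitedEntity and dcterms:title is an assumption about how the data would be modelled. The code only constructs the query text and does not contact any server.

```python
# Hypothetical endpoint URLs, used purely for illustration.
INDEX_ENDPOINT = "https://example.org/index/sparql"
META_ENDPOINT = "https://example.org/meta/sparql"


def build_federated_query(cited_iri,
                          index_endpoint=INDEX_ENDPOINT,
                          meta_endpoint=META_ENDPOINT):
    """Return a SPARQL 1.1 query joining two repositories.

    The first SERVICE block finds the entities that cite `cited_iri` in a
    citation index; the second fetches their titles from a metadata store.
    """
    return f"""
PREFIX cito: <http://purl.org/spar/cito/>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?citing ?title WHERE {{
  SERVICE <{index_endpoint}> {{
    ?citation a cito:Citation ;
              cito:hasCitingEntity ?citing ;
              cito:hasCitedEntity <{cited_iri}> .
  }}
  SERVICE <{meta_endpoint}> {{
    ?citing dcterms:title ?title .
  }}
}}
""".strip()


if __name__ == "__main__":
    print(build_federated_query("https://example.org/br/1"))
```

Because the join is expressed in the query itself, each repository can live on different hardware or at a different location, provided that all of them share the same data model and identifiers.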

In future, the OpenCitations “New Corpus” will thus be a set of federated SPARQL-based and OCDM-encoded repositories, each describing a specific type of data, that can talk with each other.

Our plan is to continue to use the original OpenCitations Corpus database itself as a kind of experimental sandbox (https://en.wikipedia.org/wiki/Sandbox_(software_development)) in which all the data types handled by OpenCitations can be stored together, and over which we can test new software and new extensions to the OpenCitations Data Model on a known large-but-finite set of meaningful papers and their references harvested from the OA subset of Europe PubMed Central.

This approach is novel and revolutionary, and we are still working out the details. However, it opens the possibility of wider information federation between those resources maintained by OpenCitations and similar interoperable open resources maintained by third parties. Those might provide, for example, information about the publication types of journal items (research articles, Comment and Opinion pieces, corrigenda, reviews, etc.) in one repository that might be maintained by Crossref; about authors, their ORCiD and/or VIAF identifiers and current and past institutional affiliations in another repository, maintained hopefully by ORCiD and VIAF; and about the geographical focus of published infectious disease reports in yet a third, possibly maintained by WHO. Those resources would be maintained by third parties, thereby spreading the load of providing open scholarly information, while what would unite these Semantic Web resources would be their use of SPARQL and the common data model.

To address point (f), OpenCitations:

• will appoint one full-time curator whose work will be to ensure that the data contained in our various databases is as complete and accurate as possible, and supported by complete provenance information. Her/his work will also be crucially useful in the following additional ways: by detecting errors in the data, she/he can reveal bugs in the code developed for the automatic ingestion of citations; she/he can provide additional services to users (e.g. by creating specific small dumps of citations to publications from a single institution); and she/he can also assist and train others in the use of OpenCitations software and interfaces for curatorial activities (see below);

• will, to support this activity, develop new user interfaces for use by this curator for the specific purpose of inspecting and curating the datasets, with automatic provenance recording of all curation activities; and

• will seek to extend and outsource OpenCitations curation activities to trusted members of the scholarly community, for example academic librarians, who will be able to use the OpenCitations interfaces to correct errors and input new information, with complete provenance tracking of who has made these changes and why.
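The kind of provenance tracking described above (who made a change, and why) can be sketched in a few lines of plain Python. This is an illustration only, not the planned curation interface: the class names CurationEvent and CuratedRecord, the field names and the curator identifier are hypothetical, and OpenCitations data would record such information in RDF rather than in Python objects.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class CurationEvent:
    """One immutable provenance entry for a single correction (hypothetical)."""
    entity: str
    field_name: str
    old_value: Optional[str]
    new_value: str
    curator: str
    reason: str
    timestamp: str


class CuratedRecord:
    """A metadata record that logs every correction applied to it."""

    def __init__(self, entity, data):
        self.entity = entity
        self.data = dict(data)
        self.history = []

    def correct(self, field_name, new_value, curator, reason):
        # Capture the previous value before overwriting it, so that the
        # change can later be audited or reverted.
        event = CurationEvent(
            entity=self.entity,
            field_name=field_name,
            old_value=self.data.get(field_name),
            new_value=new_value,
            curator=curator,
            reason=reason,
            timestamp=datetime.now(timezone.utc).isoformat(),
        )
        self.data[field_name] = new_value
        self.history.append(event)
        return event


# Illustrative use with made-up values.
record = CuratedRecord("br/1", {"title": "Teh paper"})
record.correct("title", "The paper", curator="librarian-42",
               reason="typo in title")
```

Keeping the previous value, the curator and the stated reason in each entry is what would allow corrections made by external contributors to be reviewed later.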

To address point (g), having studied and taken on board the advice given by our colleagues Geoff Bilder, Jenny Lin and Cameron Neylon in their seminal 2015 paper Principles for Open Scholarly Infrastructure (http://dx.doi.org/10.6084/m9.figshare.1314859) (see Section 5.6 above), OpenCitations, during the period of SCOSS funding:

• will form itself into an independent non-profit legal entity (most likely based in Italy);

• will, with assistance from external organizations, particularly Educopia, develop a sustainable business model;

• will formalize the OpenCitations Statutes, which will be modelled on those of DOAJ, SPARC Europe, ORCID and DataCite;

• will appoint the OpenCitations Board; and, to facilitate this,

• will appoint the OpenCitations Manager / Accountant / Administrator.

To address point (h), OpenCitations:

• will appoint a Policy Advocate and Community Outreach Officer to reach out to our user communities and to solicit financial support from libraries and other relevant organizations.

To address point (i), OpenCitations:

• will require substantial sustained funding (as opposed to time-limited grant funding for specific projects and developments), for which (if this application is successful) SCOSS funding will be the first step.

We estimate that 15% or less of the SCOSS funding will be used to continue the present activities of OpenCitations, and 85% or more to pursue the new initiatives described above.

The technical infrastructure supporting the OpenCitations services (server, backup, etc.) is currently hosted in Bologna, Italy, in the premises of the University of Bologna, as outlined above. During the SCOSS funding period, OpenCitations:

• will maintain, update and expand as necessary the computational infrastructure that OpenCitations uses;

• will negotiate with the University of Bologna to keep that infrastructure running in place, with the payment of appropriate maintenance fees to the University once OpenCitations becomes an independent legal entity; and additionally

• will evaluate options for future computational and data storage provision following the end of the SCOSS funding period, whether that be maintained at the University of Bologna as part of their computational support for digital scholarship and in particular for digital humanities and social science, or alternatively whether that should in future be outsourced to local or cloud providers.

To ensure ongoing community engagement and input into the development of OpenCitations, OpenCitations:

• will host in 2020 its second Open Citations Workshop, either in Italy or the UK, and its third workshop in 2022. The first workshop in Bologna (https://workshop-oc.github.io), which attracted 60 people, was by all estimates a great success. The second will be used in particular as a forum for community involvement for assessing the usefulness and determining the future direction of OpenCitations; and in addition

• will continue to use its blog and Twitter social media accounts to communicate with its community members and to publicise its activities, and will continue to communicate its development by the publication of conference papers and journal articles.

Having recently completed the Educopia “Mapping the Scholarly Infrastructure” survey and engaged in discussions with them, OpenCitations will work directly with the Educopia Institute (https://educopia.org/), who will provide expertise and advice to assist OpenCitations in making this crucial transition into a sustainable infrastructure organization.

This will be of particular importance as OpenCitations expands the scope of its activities to serve not just the sciences, where the open scholarship movement has already gained significant traction, but also the humanities and social sciences (HSS), the first recent steps in this direction being our publication in the OpenCitations Corpus of the citations from the ExCITE Project and the future publication of the citations from the Venice Scholar Project, mentioned above.

7. Best alternative to a negotiated agreement (BATNA)

Describe scenarios if funding is not successful, i.e. what the service plans are, e.g. reduce operations, close operations, other

If the SCOSS funding application is not successful, our main effort over the next two years will be focussed on delivering the expected output for the Wellcome Trust project (Open Biomedical Citations in Context Corpus), finding additional research grants to expand our collaborations and increase the coverage of the citation data in our open citation indexes, and setting up OpenCitations as a formal legal entity. In addition, we will continue our existing collaborations, particularly with ExCITE, arXiv and the Venice Citation Index.

We will also apply for additional funding from other sources. For example, OpenCitations would fit very well in future infrastructure projects similar to RISIS, funded by future EU Horizon calls.

All the developments described in the previous section will still be within scope, but they will be delivered over a longer time-window, slowed by the lack of appropriate OpenCitations staff, to the extent made possible by the support we do receive from existing agreements (e.g. with CWTS), from future agreements (for example via Invest in Open Infrastructure) and grants, and through voluntary contributions.

8. Governance

8.1. Describe your organisational governance structure. Please describe the organisational governance structure and process – membership and representation, meeting cycle, reporting relationships, decision-making structure, as well any possible role SCOSS might have in your governance. How does your governance structure reflect your user ?

OpenCitations is presently co-directed by Professor Silvio Peroni (Assistant Professor, Department of Classical Philology and Italian Studies, University of Bologna) and Dr David Shotton (Retired University Reader, and Senior Researcher at the Oxford e-Research Centre, University of Oxford), who are presently its only staff members. Professor Peroni undertakes its direction, maintenance and technical development as part of his academic activities at the University of Bologna, and Dr Shotton presently serves in a part-time voluntary capacity.

In addition, Ivan Heibi, formerly employed as a developer/programmer within the OpenCitations Enhancement Project funded by the Alfred P. Sloan Foundation, and currently registered as Ph.D. student in the Department of Classical Philology and Italian Studies, University of Bologna, under the supervision of Professor Silvio Peroni, is still actively contributing to the software development and publications of OpenCitations.

8.2. How do you expect to adapt your current governance structure in the coming period if successfully recommended by SCOSS? What possible roles would SCOSS contributors have? *

As mentioned above, we will move OpenCitations to become an independent legal entity during the first year of SCOSS funding. At the same time, the plan is to extend its governance by creating the OpenCitations Board involving leaders from the main stakeholder communities, which will then guide the future development of OpenCitations. Among such Board Members, we aim to include people who represent some of the SCOSS contributing organizations and who will thus take part in the decisions governing the development of OpenCitations in future years.
