OPENAIRE2020 FINAL SUMMARY REPORT

Aug 2018 OpenAIRE2020 Infrastructure for Research in Europe towards 2020 Dissemination level: PUBLIC

This is a summary of the activities of all tasks, as reported by consortium members, for the whole duration of the project: January 2015 to June 2018. Detailed descriptions are in the project deliverable reports.

Natalia Manola, University of Athens, Greece
Najla Rettberg, University of Goettingen
Paolo Manghi, CNR-ISTI

H2020-EINFRA-2014-1 Topic: e-Infrastructure for Open Access Research & Innovation action Grant Agreement 643410


TABLE OF CONTENTS
OpenAIRE2020 FINAL Summary Report
1| OPENAIRE DATA
2| LEGAL ENTITY
3| NETWORK OPERATION
4| DISSEMINATION AND OUTREACH
  4.1 BRANDING
  4.2 FACTSHEETS
  4.3 OUTREACH
  4.4 WORKSHOPS
    4.4.1 EUROPEAN WORKSHOPS
    4.4.2 FAIR
    4.4.3 DIGITAL INFRASTRUCTURES FOR EUROPE DI4R
    4.4.4 PORTAL USAGE
    4.4.5 NEWSLETTER
  4.5 EXECUTIVE REPORTS
  4.6 BLOGS
  4.7 SOCIAL (AND OTHER) MEDIA OUTREACH
5| TRAINING AND SUPPORT
  5.1 WEBINARS
  5.2 RDM SUPPORT
  5.3 TICKETING SYSTEM
6| INTERNATIONAL OUTREACH
  Global Alignment of Repository Networks
  Implementation of OpenAIRE Guidelines in Latin America and beyond
  Liaising with related EU and global initiatives
7| FP7 POST-GRANT OA PILOT
  7.1 PAYMENT OF APCS
  7.2 FUNDS FOR ALTERNATIVE PUBLISHING MECHANISMS
8| TECHNICAL ACTIVITIES
  Guidelines
  OpenAIRE portal and end user services
  Data quality monitoring services
  Research monitoring services
  Usage statistics monitoring services
  Literature-Data Integration
  OA Publication Broker
  LOD Services
  OpenAIRE APIs
  Anonymization services
  Inference and research analytics services
9| MISC
  Open Tender Calls
  Data Pilot Legal Issues
  Open Review
  Measuring OA Impact
  Studies Impact

OpenAIRE2020 FINAL Summary Report Page 1 PUBLIC

OPENAIRE2020 OBJECTIVES

OpenAIRE2020 represents a pivotal phase in the long-term effort to implement and strengthen the impact of the Open Access (OA) policies of the European Commission (EC), building on the achievements of the OpenAIRE projects. OpenAIRE2020 will (1) expand its focus from the agents and resources of scholarly communication to workflows and processes, (2) extend it from publications to data, software and other research outputs, and the links between them, and (3) strengthen the relationship of European OA infrastructures with other regions of the world, in particular Latin America and the U.S. Through these efforts OpenAIRE2020 will support and accelerate Open Science and Scholarship, to which Open Access is of fundamental importance.

OpenAIRE2020 continues and extends OpenAIRE’s scholarly communication infrastructure to manage and monitor the outcomes of EC-funded research. It combines its substantial networking capacities and technical capabilities to deliver a robust infrastructure offering support for the Open Access policies in Horizon 2020, via a range of pan-European outreach activities and a suite of services for key stakeholders. It provides researcher support and services for the Open Research Data Pilot and investigates its legal ramifications. The project offers national funders the ability to implement OpenAIRE services to monitor research output, whilst new impact measures for research are investigated. OpenAIRE2020 engages with innovative publishing and data initiatives via studies and pilots. By liaising with global infrastructures, it ensures international interoperability of repositories and their valuable OA contents.

To ensure sustainability and long-term health for the overall OpenAIRE infrastructure, the OpenAIRE2020 project will establish a legal entity, which will manage the production-level responsibilities, securing 24/7 reliability and continuity for all relevant user groups, data providers and other stakeholders.


OpenAIRE2020 Summary of Activities
JANUARY 2015 – JUNE 2018

1| OPENAIRE DATA

OPENAIRE DATA IN PRODUCTION | January 2015 | August 2018
Publications | 9.4 million (not unique) | 22.5 million (unique publications)
Data objects linked to publications or projects | 5.7 K | 807 K
Funders | 4 | 16
Publications linked to funders | 16 K | 900 K
Data providers | 570 | 1.160

ZENODO | January 2015 | August 2018
Publications | 7 K | 175 K
Datasets | 600 | 31 K
Software | 3 K | 20 K
Images | 80 | 189 K
Other | 700 | 8.7 K
Communities | 250 | 2.2 K

EC RELATED DATA | Publications | OA rate | Articles | OA rate | Data objects | OA rate [1]
FP7 [2] | 243.997 | 71% | 198.571 | 67% | 8.245 | 98%
FP7/SC39 [3] | 33.625 | 76% | | | |
H2020 [4] | 41.501 | 77% | 25.328 | 79% | 1.081 | 65%

[1] These numbers, especially for FP7, are biased towards OA as they come from OA data repositories.
[2] Identified in 16,454 projects (from a total of 25,891).
[3] Identified in 1,566 projects (from a total of 1,907).
[4] Identified in 5,659 projects (from a total of 19,043).
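The project counts in the footnotes above imply what share of each programme's projects has at least one identified output. The following is a minimal Python sketch of that arithmetic, assuming the footnote figures are as reported; the dictionary and function names are illustrative only and are not part of any OpenAIRE service.

```python
# Project counts taken from the footnotes above:
# (projects with identified outputs, total projects in the programme).
# All names here are illustrative assumptions.
PROJECT_COUNTS = {
    "FP7": (16454, 25891),
    "FP7/SC39": (1566, 1907),
    "H2020": (5659, 19043),
}


def coverage_percent(identified: int, total: int) -> float:
    """Share of projects with at least one identified output, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * identified / total


def coverage_table(counts: dict) -> dict:
    """Coverage per programme, rounded to one decimal place."""
    return {
        programme: round(coverage_percent(identified, total), 1)
        for programme, (identified, total) in counts.items()
    }


if __name__ == "__main__":
    for programme, share in coverage_table(PROJECT_COUNTS).items():
        print(f"{programme}: {share}% of projects have identified outputs")
```

Under these figures the coverage is roughly 64% for FP7, 82% for FP7/SC39 and 30% for H2020.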


2| LEGAL ENTITY

OpenAIRE2020 has marked the transition of OpenAIRE from a solely project-based initiative to a more permanent European structure through the formation of a legal entity (LE). To identify the optimum choice for the OpenAIRE LE and to ensure an orderly transition from the existing governance and operation of the OpenAIRE network to the legal person, a mixed methodology of desk and field research was followed:
• Extensive literature and web research aimed at understanding best practices at both the governance and legal entity levels.
• A set of questionnaires shared with the OpenAIRE community (NOADs) to obtain specific feedback on the functioning of the OpenAIRE network (services, operations, governance, need for specific roles). As these questionnaires covered many difficult aspects at the national and EU level, NOADs were supported by two webinars.
• Study visits and in-depth interviews with key stakeholders to further refine the results of the desk research and enrich the data obtained from the questionnaires.
• Consultations and deliberations to ensure the proposed legal entity matches the emerging landscape brought about by EOSC developments in Europe.

The proposed governance structure of OpenAIRE reflects its core needs and objectives, as well as the following levels of OpenAIRE activity:
• Policy development: at multiple levels (EU, Member State, organisational), with the ability to create a forum for its discussion and implementation.
• Service provision: provision of pan-European services at a desired level of quality.
• Support, advocacy, interoperability and provision of toolkits: at both the EU and MS level.
• Fora: developing actual or virtual spaces where multi-stakeholder dialogue may occur to improve the provided services.

FIGURE 1. GOVERNANCE MODEL FOR THE OPENAIRE LEGAL PERSON AND COMMUNITY

The OpenAIRE legal entity has been registered in Greece as a non-profit organization.


3| NETWORK OPERATION

The OpenAIRE National Open Access Desks (NOADs) have actively contributed to the promotion of OA and open science in Europe, in terms of both policy and technology alignment. At the start of OpenAIRE2020 the aims specifically addressed to NOADs were as follows:
• Increase the number of OA EC-funded publications significantly, to at least 75% of total publications by 2018. Result: 71% OA in FP7 (77% in FP7 SC39) and 79% in H2020.
• Become the OA monitoring body for the EC and other funding agencies in Europe. Result: reached out to 12 national funders.
• Facilitate the reuse of OpenAIRE APIs to attract third-party providers. Result: indexed by 2 global organizations and 2 commercial providers, showing a six-fold access increase.
• Establish OpenAIRE as a permanent, legally recognised entity.

In addition, the major achievements at the network level were outreach through national workshops (38 in total), building capacity and recognising RDM as the ‘norm’ in the Open Science workflow, and introducing the concept of EOSC and how the OpenAIRE networking infrastructure can contribute by providing the open science arm. As illustrated in Figure 2, NOADs are now well placed in national settings to influence both policy and implementation aspects and to play a key role in EOSC developments.

FIGURE 2. NOADS POSITIONING AT NATIONAL SETTING FOR POLICY AND INFRASTRUCTURE

The OpenAIRE2020 NOADs coordination was achieved via two main lines of activities:

Regular regional meetings: Monthly monitoring and regional meetings by the regional coordinators have ensured a clear communication path to central coordination. These regular meetings have helped knowledge exchange through stories from the NOADs' regions, while informing the network of key developments in Open Science in Europe. Information on all outreach activities and attended events has been collected and regularly shared with all. Information about consortium activities, in particular the evolution of the OpenAIRE services, upgrades to the infrastructure/portal, Zenodo and the guidelines, has been regularly communicated to NOADs via webinars and briefing sessions, and NOAD feedback has been actively sought. NOADs have also promoted OpenAIRE and its dissemination materials nationally, including in local languages, maintained their national portal pages, and promoted and participated in OpenAIRE events (including many OpenAIRE national workshops).

Action plans: As such a large network requires rigorous and tight management and monitoring, all parties (including technical partners) base their work on action plans, which are regularly checked and adapted. Based on the ‘NOAD Guide’, revised in 2016, NOADs were able to plan their tasks ahead and set out all actions in more detail. This guide has continued to be the basis for regional meetings and has helped all to schedule their input to the dissemination channels. The template for these Action Plans was revised in 2016 in line with changes in priorities for OpenAIRE (e.g., more emphasis on helping data repositories become OpenAIRE compatible). All NOADs have produced annual Action Plans according to this template, which are regularly checked by the regional coordinators and revisited monthly.


4| DISSEMINATION AND OUTREACH

4.1 Branding

A new logo has been designed to mark the shift from open access to publications to open access to research data, and to open science in general. The new logo was progressively introduced and used in all OpenAIRE dissemination channels.

The dissemination material (leaflets and posters) targeted different stakeholders. The initial items mainly addressed the EC H2020 OA policies, while the later ones are also shaped around the OpenAIRE services (dashboards).


4.2 Factsheets

7 factsheets and 4 guides have been created and updated for specific stakeholders, on specific themes. The numbers below show how often they were viewed and downloaded from our site:
• OPEN ACCESS IN H2020 (5236 HITS, 2382 DOWNLOADS)
• RESEARCHERS (3911 HITS, 2777 DOWNLOADS)
• RESEARCH ADMINISTRATORS (4478 HITS, 3092 DOWNLOADS)
• DATA PROVIDERS (3698 HITS, 1288 DOWNLOADS)
• FP7 POST GRANT OA PILOT (2815 HITS, 2778 DOWNLOADS)
• OPEN DATA PILOT (4857+2351 HITS, 2697+1055 DOWNLOADS)
• PERSONAL DATA AND OPEN RESEARCH DATA PILOT (2569 HITS, 1118 DOWNLOADS)
• ZENODO (1589 HITS, 871 DOWNLOADS)
• FUNDERS (2241 HITS, 848 DOWNLOADS)

4.3 Outreach

The consortium carried out broad outreach to establish the foundations for OA, and more specifically to support implementing and aligning with H2020 policies:
• The NOADs have contacted 53.200 H2020 participants, 3.281 repositories and 5.730 journals, and handled 2.543 enquiries. They contacted policy makers and intermediaries influencing national policies and their implementations (2.456 multipliers).
• In all, the NOADs and the OpenAIRE networking team:
  • Held 87 webinars and trained 5.900 stakeholders via our online pan-European training programme.
  • Disseminated OpenAIRE via posters or presentations at 182 conferences and 38 national workshops with 4.491 attendees.
  • Attended 136 other conferences and workshops around the world.
  • Published 264 Open Science blog or newsletter items on themes for either the OpenAIRE blog/newsletter or a local news source.
  • Published 20 papers and 2 monographs on open science topics (6 peer reviewed).

The core team has been invited to major conferences and events in Europe, the most prominent and relevant ones being: the EU high-level summit “A new start for Europe, Opening up to an ERA of Innovation”, where we presented our views in the “Developing Research Infrastructures for Open Science” panel; a key presentation at ICT Lisbon in the “Open Science and Open Data for Innovation” session, showing the possibility of connected e-Infrastructures; a presentation on monitoring for policy makers at the OECD Blue Sky III conference; an Open Science seminar at OECD headquarters, co-organized with OpenMinTeD; active presence in three preparatory ESC workshops (summit); participation in the EOSCpilot forum event; participation in the E-IRG meeting (e-Infrastructures KPIs); and organization of an EOSC-related project session, “Data and computing infrastructures for open scholarship”, at the 6th RDA International Plenary.


In parallel, the technical team co-organized 7 workshops, attended 40+ workshops/conferences and published 19 peer-reviewed articles, all related to different technical aspects (LOD, de-duplication algorithms, metadata-related services, text and data mining algorithms), while OpenAIRE services have been presented at community-relevant conferences: IDCC, ELPUB, Open Repositories and the International Open Science Conference.

4.4 Workshops

In 2016, OpenAIRE organised a total of 56 conferences or workshops across Europe on a wealth of topics related to Open Science. Some highlights follow. 38 national workshops have been organised during OpenAIRE2020: all of them showed a strong interest in discussing the nuances of Open Science, from opportunities to critical issues, from policies to prospective developments, from services to best practices. A report for each of them is available on the OpenAIRE blog.1

TABLE 1. OPENAIRE NATIONAL WORKSHOPS
 | 2015 | 2016 | 2017 | 2018 | Total
Workshops organized | 3 | 18 | 13 | 4 | 38
Countries | 3 | 17 | 13 | 4 | 32
Participants | 284 | 2,076 | 1,867 | 264 | 4,491

4.4.1 European Workshops

In 2015, OpenAIRE organised its 5th workshop: “Sharing Research Data and Open Access to publications in H2020” (November 18, Ghent). The workshop focussed on the Open Research Data Pilot and the H2020 Open Access mandate, as well as practical tools and information on data management planning. BLOG | RECORDING

In 2016, OpenAIRE held a workshop on “Open Peer Review: Models, Benefits and Limitations” (June 7th, Göttingen) in conjunction with the International Conference on Electronic Publishing (Elpub). To explore the possibilities of Open Peer Review, the wider problems with traditional peer review and the challenges of open peer review were discussed. Participants also considered different facets and features of an open peer review system and acknowledged that there is no “one size fits all”. They concluded that disciplinary differences need to be considered when designing a new peer review system and that creating a code of conduct is essential. BLOG | RECORDING

In 2017, OpenAIRE organised 2 further workshops:

“Measuring OA and impact” [5] (February 14th, Oslo) featured speakers representing a wide spectrum of stakeholder organisations and institutions such as OpenAIRE, Knowledge Exchange, the Danish OA Indicator, Crossref, the European Research Council (ERC) and SCOAP3. Their presentations introduced and highlighted important elements and challenges concerning measuring Open Access. They also provided insight and input on what to measure and how to measure the impact of Open Access. One of the topics touched upon was the connection between Open Access, impact and research assessment. This connection is an important part of the discussion on how to achieve Open Access, and also one of the crucial reasons for establishing consistent and meaningful methods of measurement for both researchers and research funders. Carlos Galan-Diaz, Research Impact Officer from the University of Glasgow, talked about researcher engagement and the social impact and metrics of science. In addition to the invited speakers, the workshop had a lightning talk session with five country representatives from different parts of Europe. Among these was Katie Shamash from Jisc, who mentioned the increasing costs of both journal subscriptions and APCs (Article Processing Charges) in the UK, illustrating some of the benefits of tracking and measuring APC costs. The discussion which rounded off the day reflected several of the previous topics. The panellists advocated openness and standardisation of definitions and indicators across borders. BLOG | RECORDING

[5] Workshop programme https://www.openaire.eu/impact-and-measurement-of-open-access-7th-openaire-workshop-february-2017 and report https://scholarlycommunications.jiscinvolve.org/wp/2017/03/14/openaire-impact-and-measurement-of-open-access/

“Legal Issues in Open Data” [6] (April 4, 2017, Barcelona) took place as part of the RDA Ninth Plenary Meeting and explored legal hindrances and possible solutions to opening up research data. With the presentation of legal studies on making data open and interoperable, lessons learnt from the funders and a critical understanding of issues related to text and data mining, data privacy and licenses, the event was an excellent opportunity to explore all these issues in depth. The workshop was also the opportunity to discuss the new OpenAIRE legal study (in print) and to present the data anonymization service developed by the OpenAIRE IT experts. BLOG | RECORDING

[6] Presentations https://www.slideshare.net/OpenAIRE_eu/clipboards/8th-openaire-worlshop and recordings https://vimeo.com/channels/openaireworkshop8

4.4.2 Open Science FAIR

One highlight of OpenAIRE’s outreach activities was organising the Open Science Fair, held in Athens on September 6-8, 2017. The three-day conference, a collaborative effort from OpenAIRE, OpenMinTeD, OpenUP and FOSTER, brought together a diverse international audience, including research communities, librarians and other stakeholders. The event offered a comprehensive 360° view of the current global dynamics between three major aspects of Open Science: Open Science policies and processes, Open Science impact, and FAIR data and services. During the conference, short and long-term visions for the future were addressed, and leading Open Science experts from all stakeholder groups discussed top priorities to accelerate the transition to Open Science. The opening and keynote presentations set the pace of the event, highlighting the challenges, opportunities and benefits of a paradigm shift in scholarly communication and research dissemination, and emphasising the importance of continuous methodological education and practical training, and of aligning all stakeholders’ interests, as the way to resolve the ambivalence of the current scenario.

Parallel sessions and workshops offered an emerging outline of an Open Science framework and also gave attendees the opportunity to actively participate in lively discussions and hands-on activities. These included evaluating important Open Science drivers from the researchers’ standpoint; testing the Open Knowledge Maps interface and potential; becoming familiar with the innovative OpenUp Hub dissemination toolbox; and critically analysing the effectiveness and adequacy of the web-based FAIR data assessment tool.

A short film about the Open Science Fair has been released on the OpenAIRE Vimeo channel: https://vimeo.com/233290266. [7]

4.4.3 Digital Infrastructures for Europe DI4R

The Digital Infrastructures for Europe (DI4R) conference is a joint user forum from Europe’s leading e-infrastructures (EGI, EUDAT, GÉANT, OpenAIRE and RDA Europe) aiming to gather researchers, developers, data practitioners and service providers to find new ways to support and facilitate science and research across Europe.

Krakow, 28-30 October 2016: OpenAIRE was represented with various contributions presenting OpenAIRE tools and services and discussing how to collaborate closely with other key e-Infrastructures on services, training, user engagement and support. An annual event of this kind would benefit our communities, as OpenAIRE has a lot to share and a lot to learn.

Brussels, 30 Nov – 1 Dec 2017: The theme for 2017 was “Connecting the building blocks for Open Science”, showcasing policies, processes, best practices, services and initiatives at a national, regional, European and international level that represent the building blocks of the European Open Science Cloud and European Data Infrastructure. The conference was attended by over 380 people from all over Europe. OpenAIRE took part in the joint e-infrastructure booth and engaged in many other ways during the two-day event, organising multiple sessions which demonstrated the crossover of horizontal infrastructures providing services to vertical communities (e.g., National Initiatives and Cross e-infrastructure of training/technical support).

[7] During the Open Science Fair, OpenAIRE NOADs had the chance to share if and how OpenAIRE is effectively helping them in raising awareness for Open Science in their own countries. A collection of these videos is available at https://www.openaire.eu/open-science-fair-2017.


• OpenAIRE support and training activities, as a lightning talk during the session “Cross e-infrastructure of training/technical support”
• Community engagement in OpenAIRE, as a presentation in the session “EOSC engagement with target groups”
• OpenAIRE Dashboard for Content Providers: monitoring and enriching local collections using OpenAIRE services, as a lightning talk in the related session
• OpenAIRE services in support of “Open Science as-a-Service”, in the session “Service and data interoperability”
• Everything counts in large amounts: measuring the impact of usage activity in Open Access Scholarly Environments, in the session on “Impact evaluation and metrics”

4.4.4 Portal usage

Portal usage saw a 5x increase in visitors (starting from 83K visitors at the end of 2014) and a 4x increase in page views over the project duration. This is expected to rise considerably with the next release, which fixes search engine indexing issues. [8]

FIGURE 3. OPENAIRE PORTAL USAGE DATA (UNTIL JUNE 2018)

The graphs on the right show the different ways OpenAIRE pages are accessed: 3.177 websites around the world link to OpenAIRE, and global access is significant, with 55% of hits coming from Europe and the rest from around the globe.

FIGURE 4. PORTAL GLOBAL ACCESS

[8] Usage statistics for the OpenAIRE APIs (a 6-8x increase) and Zenodo (an 8-10x increase) are found in the respective sections.


4.4.5 Newsletter

OpenAIRE produced a monthly newsletter with news on Open Science, policy and infrastructure, and OpenAIRE activities and services. Over the course of the project we reached 13.500 subscribers (with a 20% average reading rate), with 1.500 subscribers immediately after GDPR compliance and an upward trend in subscriptions since then. Each edition contains a mixture of blog content and news items from the portal, with an average of 5 articles authored by partners from across the consortium and beyond.

4.5 Executive reports

In the 2015-2018 period OA and open science have seen a large uptake, mainly by policy makers. OpenAIRE has been at the centre of these developments and has produced a number of executive statements and interventions on various topics, addressing various stakeholders. All have been widely disseminated and very well received.

TABLE 2. OPENAIRE STATEMENTS AND INTERVENTIONS
Date | Title | Link
03.13.2015 | Open Peer Review in OpenAIRE | https://blogs.openaire.eu/?p=144
04.23.2015 | Policy Recommendations for Open Access to Research Data | https://blogs.openaire.eu/?p=198
05.11.2015 | The Hague Declaration: official launch | https://blogs.openaire.eu/?p=247
06.26.2015 | Not Enough: Elsevier's Sharing Policy not as Open as it seems | https://www.openaire.eu/not-enough-elsevier-s-sharing-policy-not-as-open-as-it-seems-2
05.30.2016 | Full OA by 2020 Competitiveness Council’s conclusions on Open Science | https://www.openaire.eu/full-oa-by-2020-competetiveness-council-s-conclusions-on-open-science
05.30.2016 | Open Science Policy Platform announced | https://www.openaire.eu/open-science-policy-platform
06.20.2016 | SCHOLIX: A Global Framework for Linking Publications and Datasets | https://blogs.openaire.eu/?p=1048
10.27.2016 | Implementing funder Open Access policies: How OpenAIRE can help | https://blogs.openaire.eu/?p=1347
10.29.2016 | OpenAIRE welcomes the European Open Science Cloud HLEG Report | https://blogs.openaire.eu/?p=1361
01.30.2017 | Europe and Latin America expand their collaboration for open science | https://www.openaire.eu/europe-and-latin-america-expand-their-collaboration-for-openscience
04.06.2017 | OpenAIRE is proud to support the new Initiative for Open Citations (I4OC) | https://www.openaire.eu/openaire-is-proud-to-support-the-new-initiative-for-open-citations-i4oc
05.05.2017 | OpenAIRE as the basis for a European Open Access Platform | https://blogs.openaire.eu/?p=1961
05.30.2017 | OpenAIRE and ICSU World Data System Announce Cooperation to Advance Open Science | https://www.openaire.eu/openaire-and-icsu-wds-announce-cooperation-to-advance-open-science
06.13.2017 | OpenAIRE’s input statement at EOSC Summit, Brussels, 12 June 2017 | https://blogs.openaire.eu/?p=2032
09.18.2017 | OpenAIRE Position Paper on Open Research Europe | https://blogs.openaire.eu/?p=2221
09.18.2017 | EOSC Declaration: OpenAIRE's Response | https://blogs.openaire.eu/?p=2237
09.21.2017 | OpenAIRE calls on the European Parliament to halt potentially harmful copyright reform | https://www.openaire.eu/openaire-calls-on-the-european-parliament-to-halt-potentially-harmful-copyright-reform
10.31.2017 | European University Association declared strong support to EU current Open Science policies and to OpenAIRE: EUA statement issued on October 27th | https://blogs.openaire.eu/?p=2388
03.06.2018 | Open Science in practice in FP9 | https://www.openaire.eu/open-science-in-practice-in-fp9
05.17.2018 | OpenAIRE partners with the Open Science MOOC | https://www.openaire.eu/openaire-partners-with-the-open-science-mooc

4.6 Blogs

The OpenAIRE blog is the place to communicate and discuss developments, provide opinions, and raise issues related to policy and infrastructure for open science. In the 2015-2018 period we published ~260 blog articles on a range of themes related to OpenAIRE, OA and Open Science, including national updates, conference reports and updates on the progress of project activities. The blog is open to direct editing by all project partners, and is a great engagement tool for the NOADs, but also for external parties, making it a wide-ranging and important source of news across Europe.

FIGURE 5. BLOGS UPTAKE SINCE JUNE 2018


4.7 Social (and other) Media Outreach

OpenAIRE’s outreach via social media has increased, which clearly shows that it is a channel of information and communication for (global) dissemination on aspects of Open Science.

Channel | Jan-2015 | Jan-2016 | Jan-2017 | June 2018 | Increase
Twitter followers | 3.200 | 4.211 | 6.000 | 9.023 | x 3
Facebook members | 600 | 940 | 1.267 | 1.660 | x 3
LinkedIn members | 183 | 306 | 400 | 436 | x 2
Vimeo videos | 91 | 98 | 120 | 125 | x 1.3
SlideShare items | 138 | 200 | 232 | 320 | x 2
SlideShare followers | 25 | 79 | 98 | 178 | x 7
SlideShare downloads | 300 | 1.444 | 2.632 | 4.198 | x 14
SlideShare views | 28.000 | 66.564 | 82.567 | 146.710 | x 5


5| TRAINING AND SUPPORT

Activities were centred on gathering, creating and disseminating support and training material. The team developed or revised a set of support tools: guides, FAQs, helpdesk, factsheets, information and dissemination material, copyright issues updates and briefing papers. In parallel, an intensive webinar programme was delivered, adding further resources to the training package (video recordings, slides).

5.1 Webinars

OpenAIRE carried out 96 webinars with 6.633 participants in the period Jan 2015-June 2018. A few of them were in collaboration with EUDAT (for Research Data Management) and with FOSTER (for open science practices in domain discipline areas).

TABLE 3. WEBINARS OVER THE YEARS
 | OS policies/practices | H2020 ORD pilot/RDM | OpenAIRE services | For NOADs | National | Total | Participants
2015 | 4 | 3 | 2 | 3 | 6 | 18 | 802
2016 | 6 | 5 | 5 | 9 | 13 | 38 | 2.837
2017/18 | 4 | 15 | 4 | 9 | 8 | 40 | 2.994
Total | 14 | 23 | 11 | 21 | 27 | 96 | 6.633

The webinars addressed different topics as illustrated in the accompanying tables and figure.

TABLE 4. TOPICAL ATTENDANCE IN OPENAIRE2020 WEBINARS
OpenAIRE services (incl. post-grant FP7) | 1.184
RDM | 1.750
OS Policies | 2.685
OS Practices | 1.511
Internal for NOADs | 351
National Approach | 1.205

FIGURE 6. ATTENDANCE FOR OPENAIRE2020 WEBINAR TOPICS


5.2 RDM support

OpenAIRE2020 was crucial in supporting the EC’s Open Research Data Pilot (ORDP) by building new materials, holding webinars and answering requests. Through the OpenAIRE portal, fact sheets specifically on the ORDP and personal data are available, as well as more general information on Zenodo. 17 webinars and 90 workshops focussed on RDM.

H2020 OPEN DATA PILOT

Data Pilot and RDM training and support has been one of the major activities in OpenAIRE's support work. The relevant documents and support information available in the portal were complemented by several public webinars led by DANS and DCC, in partnership with FOSTER and EUDAT: 12 webinars took place, attended by 1,457 people. With the aim of easing data management planning and reviewing in the future, OpenAIRE and EUDAT jointly presented the EC with a couple of recommendations regarding the DMP template.

DMP SURVEY

In addition, OpenAIRE and the FAIR Data Expert Group ran a survey over summer 2017 eliciting feedback on the Commission’s approach to Data Management Plans. Nearly 300 responses were received, 50% of them from researchers. Feedback was reassuring, with 60% finding the process of developing a DMP a positive experience, 45% agreeing that the template was very useful, and 23% encountering no issues in following the EC guidelines. Naturally there is always room for improvement, and many respondents provided detailed textual responses explaining points of confusion or offering suggestions.

The concept of FAIR was felt to be well understood. However, the free-text questions probing terminology issues, redundant questions and potential improvements elicited many detailed responses, indicating that while the concepts work at a high level, many questions remain around implementation. Interoperability is the main term that caused confusion, closely followed by metadata, standard vocabularies and ontologies. There were a number of references to local RDM support teams, national data services and project officers who clearly helped to interpret the requirements and apply them to the researchers’ context. Preservation and legal issues were felt to be underplayed in the template, while several respondents questioned the inclusion of ethical issues since there are separate requirements on this. The template was felt to be too structured and prescriptive to allow for different types of projects.

Our chief recommendations are:
▪ consider restructuring the DMP template to make it easier to complete, with tailored guidance, examples and terminology;
▪ reduce technical terminology;
▪ provide worked examples and guidelines on costing;
▪ clarify how the DMPs will be reviewed and by whom.

In January 2018, we organised the webinar “Results survey on H2020 DMP template” and shared the survey results with research support staff, data librarians, researchers, project leaders and research funders. The 140 participants were encouraged to continue sharing their DMPs and feedback.


5.3 Ticketing system
Even though most questions and requests are intercepted by the NOADs, the helpdesk system, the ASK A QUESTION service, was a relevant and easy-to-use service that provides quick answers and solutions for specific concerns and issues reported by OpenAIRE users. The support given by the technical and guidelines teams on reported aggregation and compatibility issues is a good example of this effect (see Figure 7). The update of the EC participant portal, where the list of H2020 project publications is now delivered automatically from the OpenAIRE information space, resulted in a higher number of tickets associated with this topic (mainly tickets created in the Open Access, Research Managers and Portal categories). “Data providers” and “FP7 post-grant OA pub funds” continued to be the most frequently used categories in the system, out of a total of about 1.000 tickets.
FIGURE 7. TICKETS BY STAKEHOLDER CATEGORY AND HIGH LEVEL TOPIC

FIGURE 8. TICKETS BY TOPIC


6| INTERNATIONAL OUTREACH
OpenAIRE’s visibility and status as an important infrastructure for open science grew hugely during the timeframe of OpenAIRE2020. The main highlights were:
• The emphasis placed on improving interoperability across regional networks. A repository networks accord, endorsed in May 2017 by 7 networks, laid the groundwork for a 2-day intensive technical meeting of 20 organizations representing repository networks around the world in May 2018.
• The launch of a technical working group for repository networks, a plan to internationalize vocabularies and guidelines, regular information sharing across networks, and engagement with other stakeholder organizations.
• Increased efforts to engage in bilateral collaborations (as part of OpenAIRE2020 and OpenAIRE Advance), most notably with LA Referencia in Latin America around common guidelines, but also with Canada, China, Japan, South Korea and West Africa.
Global Alignment of Repository Networks
Awareness and advocacy: Significant and ongoing efforts have been made to raise awareness of the important contribution of repositories and repository networks such as OpenAIRE in the context of open science and scholarly communication. This requires making a case for repository services as part of interoperable networks in order to make library content visible, and for the need for collaboration.
Bilateral engagement with networks/national initiatives: OpenAIRE has been increasing collaborative activities with various networks. In 2017, LA Referencia, the Latin American repository network, agreed to adopt common vocabularies and the OpenAIRE guidelines and is working with national networks in Latin America to have them adopted at the level of local repositories. In addition, OpenAIRE has increased information exchange with a number of other networks (National Institute of Informatics in Japan, Chinese Academy of Sciences, KISTI in South Korea), assisting the implementation of national data services efforts.
The OpenAIRE Advance project will further build on these collaborations and expand pilots to other organizations (Canadian Association of Research Libraries and West and Central Africa Research and Education Network). Metadata guidelines and vocabularies: LA Referencia (the Latin American network which aggregates metadata from 9 countries) has already begun to implement the OpenAIRE guidelines into their network. Canada and Japan are also currently assessing the relevance of the guidelines for their regions and will likely begin to adopt them in the fall of 2018. International accord: In May 2017, the Confederation of Open Access Repositories convened a meeting to share information about current state of repository and repository networks across several regions: Canada, China, Europe, Japan, Latin America, South Africa, and United States. At this meeting an International Accord for Repository Networks was endorsed by eight regional organizations: Australasia, Canada, China, Europe, Japan, Latin America, South Africa, and United States. The aim of the accord is to increase collaboration across regional repository

networks in order to strengthen and enhance the distributed, community-based open access infrastructure around the world.
Technical and strategic meeting: On May 14-15, 2018, COAR held a meeting of over 20 repository networks in Hamburg, DE, at the Leibniz Information Centre for Economics, to expand the number of regional networks taking part in the discussions about international alignment, where OpenAIRE plays a leading role. The aim of the meeting was to map current network services, define specific areas to improve interoperability, and work together on common challenges and solutions. There were three concrete outcomes, all related to OpenAIRE strategic priorities:
1. Launch a technical working group so that the networks can share information regularly
2. Develop a roadmap for internationalizing metadata and vocabularies
3. Promote the adoption and development of permanent identifiers with data providers (e.g. author IDs, funder IDs and organizational IDs)
Implementation of OpenAIRE Guidelines in Latin America and beyond
LA Referencia approved the plan to adopt the OpenAIRE Guidelines on June 15, 2015 and delivered the first Report of current state and roadmap for implementation of guidelines in Latin America in May 2016⁹. Within 2016 we saw the following adoption of the OpenAIRE guidelines:
▪ Support to SNAAC, the National Node of Colombia: the new national guidelines follow OpenAIRE.
▪ Mexico adopts OpenAIRE guidelines for literature and data repositories.
▪ Launch of the new National Node of Costa Rica (Kimuk) with OpenAIRE Guidelines.
In 2017, technical efforts helped to accelerate the processes for a faster and more efficient alignment/integration with a new version of the LA Referencia Harvester (3.0), adapted to OpenAIRE Guidelines v3. The LA Referencia harvester was transferred in 2017 to all countries, with one exception, as a national infrastructure component.
In parallel, two interoperability pilots were carried out: one resulted in approx. 1.4 million OpenAIRE-compatible records from the nine LA Referencia nodes being harvested, indexed and made visible in OpenAIRE; the other in a proof of concept for the identification of funders/grants and results (CONICYT). The main achievements and impact are summarized as follows:
• At the technical/data level, the use of common technologies and guidelines facilitated interoperability, common services, the transition to new practices, and a community of developers.
• At the political level, there is a growing recognition of the role of LA Referencia in Open Science as a partner of a major EU eInfrastructure, placing it in an advantageous position in terms of managing research data and the adoption of data repositories.
• LA Referencia is now viewed as a key player in the OA and repositories community, participating in the development and evolution of OpenAIRE guidelines and vocabularies,

9 EU and Latin America working together towards a common Open Access implementation (press release).


following the advances of v4.0 of the OpenAIRE Guidelines, and defining priority areas of further interaction such as distributed statistics and value-added services such as brokers.
In addition to LA Referencia, the Japanese repository community, represented by the National Institute of Informatics and the JPCOAR Consortium, announced in 2018 that they will also be adopting the OpenAIRE guidelines in Japanese repositories, while KISTI in South Korea is currently in agreement with OpenAIRE for the implementation of the OpenAIRE Guidelines for Literature and for Data Repositories as they are setting up their national data services.
Liaising with related EU and global initiatives
With the emerging developments in the EU, as presented with the European Open Science Cloud (EOSC) initiative and the proposed timeline for the integration of services for key EU e-Infrastructures, OpenAIRE’s plans have expanded to include:
▪ Intensified discussions on the positioning of OpenAIRE within EOSC. This involves more targeted outreach activities to established RIs to find commonalities, gaps and synergies and, if possible, to design and carry out pilots that will showcase a better integration of services. The focus of the discussions has primarily been on the assistance of OpenAIRE to RIs and research communities (CESSDA, CLARIN, Elixir, EPOS, AgInfra+/FoodCloud, PARTHENOS) to provide support, training and services/tools for their acceleration to OS.
▪ Discussions on the integration of services (including support and training on open science) with EUDAT, EGI, Geant, OpenMinTeD and other e-Infras and VREs. More specifically:
- OpenAIRE participated in the creation of the EC booklet “E-infrastructures: making Europe the best place for research and innovation”.
- OpenAIRE was involved in the steering and programme committees for Digital Infrastructures for Research (DI4R)¹⁰, held in Krakow in September 2016 and Brussels in October 2017.
▪ Extension of the agenda with national, international and scientific communities that concern aspects discussed in the Open Science Policy Platform (models, impact, assessment, monitoring). OpenAIRE has actively participated in many strategic and technical hands-on collaborations with other similar or related initiatives, in order to exchange simple experience, data, software and/or services, or to undertake common pilots and visions. Among active collaborations, the most relevant include: • EOSC-Hub, EOSCpilot, eInfraCentral in preparation for collaborations for EOSC • DataCite, CrossRef, ORCID, ANDS, THOR/FREYA for collaboration on identifiers and their linking via Scholix • Software Heritage and Software Sustainability Institute for incorporating software as a research artifact in OpenAIRE • Force11 groups to identify implementation approaches for emerging services.

10 Co-organized by EGI, EUDAT, Geant, OpenAIRE, PRACE and RDA


7| FP7 POST-GRANT OA PILOT
The OpenAIRE Post-grant OA Pilot was launched by the European Commission (EC) in the context of the OpenAIRE2020 project for the period June 2015 to April 2017, and was extended for 10 months until Feb 2018. The Pilot aims to fund the OA publication of research outputs arising from Framework Programme 7 (FP7)-funded projects and was initiated as a consequence of the EC’s “Communication Towards better access to scientific information: Boosting the benefits of public investments in research”, and the associated Recommendation on access to and preservation of scientific information. The Communication confirmed that ‘Gold’ OA publishing costs would be maintained in Horizon 2020 and included a commitment to consider ‘whether and under what conditions open access publication fees can be reimbursed after the end of the grant agreement’.
7.1 Payment of APCs

1.161 OA publications from 1.856 applications/requests
727 finished FP7 projects
A total amount of 1.923.743 EUR

Article average price: 1.473 EUR
Monograph average price: 5.354 EUR
Book chapter average price: 1.172 EUR

HIGHLIGHTS OF THE METHODOLOGY AND PROCESSES
Rules: After a 6-month deliberation and consultation period, the consortium established a set of eligibility criteria:
▪ The post-grant Open Access Pilot will cover OA Article Processing Charges (APCs) for FP7 projects up to two years after they end.
▪ A maximum of three publications per FP7 project will be funded.
▪ Funded publications must be peer-reviewed and be made available under a CC-BY license where possible.
▪ Publications should be deposited into an OpenAIRE-compliant repository.
▪ A maximum of €2,000 funding will be provided for covering the OA publication fees for research articles, book chapters and conference proceedings. A maximum of €6,000 will be allocated for publishing OA monographs (including VAT).
▪ The €2,000 maximum funding above only applies to fully OA journal titles. No hybrid journals will be supported by this Pilot.
▪ Eligible journal titles must be listed in standard directories like the Directory of Open Access Journals (DOAJ), Scopus, Web of Science or PubMed.
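The eligibility criteria above can be read as a small rule set. The following is a minimal sketch in Python, not the logic of the Pilot's actual platform: the `Request` fields, the function names and the reading of the funding cap as "fund up to the cap" are illustrative assumptions, and the CC-BY and repository-deposit criteria are left out because the text states them as "where possible" and "should".

```python
from dataclasses import dataclass
from datetime import date

# Limits taken from the eligibility criteria listed above.
ARTICLE_CAP_EUR = 2000      # articles, book chapters, conference proceedings
MONOGRAPH_CAP_EUR = 6000    # monographs (including VAT)
MAX_FUNDED_PER_PROJECT = 3
GRACE_PERIOD_YEARS = 2


@dataclass
class Request:
    """Hypothetical shape of a funding request (field names are assumptions)."""
    project_end: date
    submitted: date
    already_funded: int       # publications already funded for this project
    kind: str                 # "article", "book_chapter", "proceedings", "monograph"
    fee_eur: float
    peer_reviewed: bool
    fully_oa_venue: bool      # False for hybrid journals
    listed_in_directory: bool  # DOAJ, Scopus, Web of Science or PubMed


def add_years(d, years):
    """Shift a date by whole years, mapping 29 Feb to 28 Feb when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def check_eligibility(req):
    """Return a list of violated criteria; an empty list means eligible."""
    problems = []
    if req.submitted > add_years(req.project_end, GRACE_PERIOD_YEARS):
        problems.append("project ended more than two years ago")
    if req.already_funded >= MAX_FUNDED_PER_PROJECT:
        problems.append("three publications already funded for this project")
    if not req.peer_reviewed:
        problems.append("publication is not peer-reviewed")
    if req.kind == "article":
        if not req.fully_oa_venue:
            problems.append("hybrid journals are not supported")
        if not req.listed_in_directory:
            problems.append("journal is not listed in a standard directory")
    return problems


def funded_amount(req):
    """Amount covered, assuming fees above the cap are funded up to the cap."""
    cap = MONOGRAPH_CAP_EUR if req.kind == "monograph" else ARTICLE_CAP_EUR
    return min(req.fee_eur, cap)


example = Request(date(2015, 6, 30), date(2016, 9, 1), 1, "article",
                  2300.0, True, True, True)
print(check_eligibility(example), funded_amount(example))
```

Returning the list of violated criteria, rather than a single boolean, mirrors the moderation step described for the online platform, where a reviewer would need to know why a request fails.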


Processes: The funds were distributed via:
1. An online platform, specifically built to accept the requests/applications from researchers, to process the requests for moderation, accounting and financing, and to monitor the . The platform worked quite efficiently and allowed us to better monitor the whole process. All APC values were fed to the OpenAPC initiative.
2. Block funds to libraries and publishers. This mechanism allowed us to delegate the effort (eligibility checking and funds processing) and to test out the interaction aspects. 355K EUR were channeled via 2 libraries (39K EUR) and 5 publishers (361K EUR).
FIGURE 9. DISTRIBUTION OF FUNDING CHANNELS
Results: Some key facts are illustrated below:
▪ Both averages are well below the imposed funding caps of € 2.000 for articles and € 6.000 for monographs. The median fee for articles was about € 1.413 and the lowest APC funded was € 209.
▪ In total, we have received requests from 36 countries.

FIGURE 10. FP7-POST GRANT PILOT APC FUNDS OVER EUROPE


FIGURE 11. FP7 POST GRANT PILOT FUNDING DISTRIBUTION OVER EUROPE

FIGURE 12. POST GRANT FP7 OA PILOT DISTRIBUTION OVER INSTITUTIONS

FIGURE 13. POST GRANT FP7 OA PILOT DISTRIBUTION OVER PUBLISHERS


7.2 Funds for alternative publishing mechanisms
In addition to the APC funds, the consortium had two open tender calls¹¹ for funding OA publishing mechanisms that are alternatives to APCs. More specifically, the first call (bids 1-11 in Table 5) was focused on facilitating technical improvements for existing OA journals and platforms, while the second call (bids 12-15 in Table 5) intended to experiment with sustainable funding models alternative to author-facing article/book processing charges (APC/BPC). A total amount of 397.400 EUR was distributed to the institutes as illustrated in the following table, indicating a good distribution over Europe.
TABLE 5. ALTERNATIVE FUNDING CONTRACTS
No | Title | Institute | Country | Amount (€)
1 | Internet Policy Review | Alexander von Humboldt Institute for Internet and Society | Germany | 16.000
2 | Annals of Geophysics | Istituto Nazionale di Geofisica e Vulcanologia | Italy | 20.000
3 | Hrčak | University of Zagreb Computing Centre (SRCE) | Croatia | 28.000
4 | SCIndeks: The Serbian Citation Index | Centre for Evaluation in Education and Science (CEON) | Serbia | 17.000
5 | Scientific Journals Online | Federation of Finnish Learned Societies | Finland | 10.000
6 | Revistas CSIC | Spanish National Research Council | Spain | 35.000
7 | Information Bulletin on Variable Stars | Konkoly Observatory | | 14.000
8 | Open Praxis | International Council for Open and Distance Education | Norway/Spain | 10.400
9 | EKT ePublishing | National Documentation Centre/National Hellenic Research Foundation | Greece | 14.800
10 | Hungarian Educational Research Journal (HERJ) | University of Debrecen | Hungary | 15.000
11 | International Journal of Digital Curation (IJDC) | Digital Curation Centre (DCC) | UK | 17.200
12 | Mattering Press | Lancaster University | UK | 47.500
13 | Language Science Press | Language Science Press | Germany | 18.500
14 | Fair Open Access Alliance (FOAA) | Foundation Fair Open Access Alliance | International | 40.500
15 | SciPost | University of Amsterdam | Netherlands | 39.500
16 | Open Library of Humanities | Birkbeck College | UK | 30.500
17 | IBL PAN | Institute of Literary Research of the Polish Academy of Sciences | Poland | 23.500
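As a quick arithmetic check of Table 5, the short Python sketch below sums the contract amounts and compares them with the stated total of 397.400 EUR. The grouping into calls follows the bid ranges given in the text (1-11 and 12-15); contracts 16 and 17 are not assigned to a call there, so they are summed separately. The variable and function names are illustrative only.

```python
# Amounts in EUR per contract number, transcribed from Table 5.
AMOUNTS_EUR = {
    1: 16_000, 2: 20_000, 3: 28_000, 4: 17_000, 5: 10_000, 6: 35_000,
    7: 14_000, 8: 10_400, 9: 14_800, 10: 15_000, 11: 17_200,
    12: 47_500, 13: 18_500, 14: 40_500, 15: 39_500,
    16: 30_500, 17: 23_500,
}


def subtotal(first, last):
    """Sum the amounts for contract numbers first..last (inclusive)."""
    return sum(AMOUNTS_EUR[n] for n in range(first, last + 1))


total = sum(AMOUNTS_EUR.values())
first_call = subtotal(1, 11)    # technical improvements for existing journals/platforms
second_call = subtotal(12, 15)  # alternative funding models
remaining = subtotal(16, 17)    # contracts not assigned to a call in the text

print(f"first call:  {first_call} EUR")
print(f"second call: {second_call} EUR")
print(f"contracts 16-17: {remaining} EUR")
print(f"total: {total} EUR")
```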

11 https://blogs.openaire.eu/?p=2373, https://www.openaire.eu/second-call-for-proposals-for-alternative- funding-mechanism-for-non-author-fee-based-open-access-publishing


8| TECHNICAL ACTIVITIES
OpenAIRE maintains a TRL9 data platform which includes separate production, beta, testing and development environments, each with its own backend and frontend. These include dedicated servers for individual services as well as separate Hadoop clusters for compute-intensive tasks (again for testing, production, and a dedicated cluster for inference operations) and adequate storage (separate for backups), hosted at ICM data centre facilities. The operation was set up, configured and tuned for optimum performance in 2016-17. The big data technologies run on dedicated clusters supporting Cloudera Hadoop (CDH5), Spark and Virtuoso (for LOD). The production infrastructure is based on 384 cores (768 threads), 2048 GB RAM and 384 TB HDD (HDFS)¹². As the size of LOD data generated in OpenAIRE grows, the machine supporting Virtuoso has been expanded to 16 CPU cores and 48 GB of memory.
Guidelines
Literature Guidelines: Through a series of consultation processes with partners of international and national repository networks, the Literature Guidelines are now in version 4.0. The schema replaces the current info:eu-repo Application Profile and allows the recording of identifier information.
Data Archive Guidelines: The Data Archive Guidelines follow the DataCite 4.0 schema and allow for the granular indication of funding information as well as the linking to publications.
CRIS Guidelines: The OpenAIRE-CRIS Guidelines are the result of collaboration with the EuroCRIS task group and use CERIF as an intermediate layer of information exchange.
OpenAIRE portal and end user services
The portal services have gone through an overhaul process to allow for improved user experience (UI/UX), better response times and more efficient maintenance procedures.
Using a mix of Google’s Angular JS framework and a traditional CMS, the new integrated portal design marks a shift to a Dashboard-like view, where each Dashboard groups functionalities addressed to specific stakeholders. All dashboards are interconnected through a Single Sign On (SSO) mechanism (AAI)¹³ which will effectively allow OpenAIRE services to seamlessly link to EOSC:
▪ www.openaire.eu - main entry point for information on policies, support, training, helpdesk, organization and the projects
▪ explore.openaire.eu - search, claiming, depositing or publishing, project and research institution dashboard services
▪ provide.openaire.eu - registration, validation, subscription to broker, metrics/usage statistics for content providers
▪ monitor.openaire.eu - funder dashboard for statistics & reports
▪ develop.openaire.eu - OpenAIRE’s APIs: OAI-PMH, REST, LOD

12 These numbers come from end of 2017 and will be updated in fall 2018. Compare with 282 CPU cores, 1TB of RAM, and 38 TB of disk space in 2015. 13 AAI service has been a joint effort with the OpenMinTeD project via GRNET’s implementation.


Data quality monitoring services
Starting from the aggregation of a few million objects, OpenAIRE now aggregates 100+ million metadata records, provided by 1100+ data sources. This requires a variety of automated processes that take the data from ingestion to production (acquisition, cleansing and enrichment workflows), implemented with big data technologies running on the cloud, which often results in quality issues in the final data. The Data Flow Monitoring Service performs semi-automated checks to verify the correctness and completeness of the final content before it becomes public, effectively checking the aggregation, de-duplication, inference and statistics mechanisms. The service, in production since October 2015, sees incremental enhancements to include new monitoring features. It offers a web UI from which it is possible to configure, observe and control whether data processing and data provisioning workflows have been correctly executed and whether the data has the properties expected from one indexing to the next, and overall to keep the “expected behaviour” of the system under control. Meeting OpenAIRE’s deduplication requirements, which derive from the high heterogeneity, multidisciplinarity and dynamicity of the aggregated metadata records, as well as from the considerable number of objects subject to deduplication, led to a big improvement in OpenAIRE’s data quality while at the same time decreasing human intervention in the dataflows.
Research monitoring services
Going beyond the EC, we have initiated collaborations with national funding agencies to provide the funding extraction services. This has been an ongoing activity throughout the project and has resulted in the Funder Monitoring Dashboard (OpenAIRE.MONITOR). The table below illustrates the progress on different funders, run on full texts.

Based on the OpenAIRE data (with off-production linking with Scimago) OpenAIRE provided yearly reports to the EC, used by the H2020 HLEG for the interim evaluation, and by DG RTD for policy making. These reports were published in the EU’s Open Data Portal.

OpenAIRE collaborated with SMEs and other initiatives to provide data and data analysis services for:
a. a pilot/prototype “Open Science Monitor”, a DG-RTD tender (led by RAND UK)
b. three (2) ex-post FP7 Health evaluation tenders (led by PPMI LTI)
c. an ERC commissioned study (led by PPMI LTI)
d. feeding data to EU projects and other initiatives for research on funder indicator assessment (e.g., Data4Impact - http://www.data4impact.eu/)


TABLE 6. FUNDER DATABASES IN OPENAIRE
Funder | Country | Status | # of projects | # of related publications
FCT | Portugal | production | 37.277 | 42.053
Australian Research Council | Australia | production | 24.269 | 15.993
National Health and Medical Research Council | Australia | production | 26.500 | 12.202
National Science Foundation | USA | production | 497.646 | 176.057
Science Foundation Ireland | Ireland | production | 3.411 | 3.238
Ministry of Science Education and Sport | Croatia | production | 2.120 | 2.319
Croatian Science Foundation | Croatia | production | 881 | 1.392
Netherlands Organisation for Scientific Research (NWO) | Netherlands | production | 24.180 | 17.804
Deutsche Forschungsgemeinschaft (DFG) | Germany | testing | 21.796 | 6.585
National Institutes of Health (NIH) | USA | production | 1.682.692 | 187.660
Serbia National Funding Scheme (MESTD) | Serbia | production | 777 | 13.867
Swiss National Science Foundation (SNSF) | Switzerland | production | 68.612 | 11.410
Austrian Science Fund (FWF) | Austria | production | 14.395 | 15.668
Tara Expeditions Foundation | Italy | beta | 5 | 73
Academy of Finland | Finland | beta | 23.932 | 6.615
CONICYT - Comisión Nacional de Investigación Científica y Tecnológica | Chile | beta | 8.613 | 7.257
European Commission | EU | production | 44.406 | 281.369
FECYT - Gobierno de España | Spain | beta | 6.674 | 6.786
Research Council UK | UK | production | 82.964 | 49.647
Tubitak - Türkiye Bilimsel ve Teknolojik Araştırma Kurumu | Turkey | production | 16.609 | 2.024
Wellcome Trust | UK | production | 12.966 | 61.798

Usage statistics monitoring services
The usage statistics service, which started as a pilot and is now a fully operating service, currently retrieves usage data from 128 data providers and processes 47 million usage events. The activity and the service itself have started to attract attention from both literature and data repositories (e.g. the RDA Data metrics WG; it has also been a key objective in the collaboration of EOSC-hub and OpenAIRE-Advance), as usage data becomes part of the broader altmetrics and research assessment trends.


The service supports the tracking of usage events from individual repositories via compatible plugins for the DSpace, EPrints and Islandora platforms (developed and distributed by the OpenAIRE technical team to support a more accurate tracking of downloads¹⁴) or alternatively via COUNTER reports retrieved from SUSHI-Lite endpoints (e.g. from IRUS-UK compatible repositories). The service adopts the Matomo/Piwik Open Source Software while taking into consideration the de-duplicated records of OpenAIRE and the COUNTER rules. The tracking software addresses data privacy issues by anonymising the IP addresses inside the usage event data already on the client side and by applying the General Data Protection Regulation (GDPR) on the server side¹⁵. Usage statistics have been integrated into the Content Provider Dashboard (OpenAIRE.PROVIDE) (beta) so that repository managers can join the service, receive configuration details and access usage statistics as graphs and reports for their data source(s). A Usage Statistics Guide is available at: https://openaire.github.io/usage-statistics-guidelines/. While all aggregated and cleaned usage statistics are visible to all in the new OpenAIRE.EXPLORE portal (beta), machine access is enabled via the OpenAIRE SUSHI-Lite endpoint, which supports the generation of COUNTER-compliant reports about all items hosted in a data source, about individual items, and more specifically about journal articles, books and book chapters.
Zenodo
Zenodo.org is the OpenAIRE catch-all repository, offering preservation, DOI minting, data curation and community management functionalities (and others) to researchers around the world, free of charge up to a 50GB quota per deposition. Zenodo has been updated during OpenAIRE2020 to enhance such functionalities and to align its metadata model with the OpenAIRE data model, in order to ensure smooth interoperability between the repository and the OpenAIRE Information Space services.
Zenodo went through a major overhaul in OpenAIRE2020 which brought major user experience improvements, while making maintenance and development significantly more efficient, i.e., allows us to provide more features and support the increased usage of Zenodo with the existing manpower. The re-launch also improved service stability and fixed the bottlenecks previously reported (e.g. average response time went down from 1000ms to 200ms for the front-page and search went from 8000ms to 700ms). New functionalities were added as follows:

14 https://github.com/openaire/OpenAIRE-Piwik-DSpace, https://github.com/openaire/EPrints-OAPiwik 15 OpenAIRE Usage Statistics – GDPR: https://docs.google.com/document/d/1UzPZ90EtuQshht0XVBUC1mE7J7fIoEGxpPUfoo2AYjs/edit


▪ Access to restricted content: enable users to request access to restricted content on Zenodo (e.g. clinical trial data).
▪ New module for backend monitoring and service metrics integrated with CERN cloud monitoring infrastructure.
▪ New upload types for project deliverables, project milestones and lessons.
▪ New metadata fields for contributors and subjects with support for identifiers (e.g. GND).
▪ Spam fighting mechanisms to ensure quality of content.
▪ User profile with support for changing email address.
▪ Launch of DOI versioning support, allowing users to update their datasets while still maintaining specificity and persistent identification of individual versions.
▪ Improved integration between OpenAIRE and Zenodo: as soon as a record is uploaded to Zenodo, an OpenAIRE icon appears that allows the user to see the record in OpenAIRE seconds after the upload.
▪ Integrated multiple funders so that users can upload and link records on Zenodo with any funder supported by OpenAIRE.
▪ Integrated XML Sitemaps on Zenodo for better indexing by Google.
▪ Updated all DataCite DOIs from v3.x to v4.1 of the schema in a rate-limited manner.
▪ Integrated preview for images larger than 5MB and generation of thumbnails.
▪ Developed a module to dump the full metadata of all Zenodo records into a file, to support the transfer of all Zenodo content to other providers.
▪ Added dockerization of Zenodo.

Usage-wise, Zenodo saw an eight-fold increase (8x) in the number of visitors per year, from 140k visitors in 2015 to 1.3 million visitors in 2018 (50% from Europe), and a twenty-fold increase (20x) in the number of records, from 20k to 423k, with a global ‘clientele’.

Literature-Data Integration
ScholExplorer: In the context of the WDS/RDA Publishing Data Services WG, OpenAIRE has contributed to (i) the design and development of a Data-Literature Interlinking Service, and (ii) the development of the Scholix.org framework for enabling the exchange of scholarly links. The OpenAIRE ScholExplorer Service, an instantiation of the Scholix framework, is now available in production¹⁶, offering access to one of the largest collections of links between publications and datasets: 38 million bi-directional links from 13.000 publishers and 10 data centers (via DataCite, CrossRef and OpenAIRE). The service has had a great reception among RDA members, as shown by the rapid uptake of the service API by international publishers (800 million hits in Feb-Jun 2018, the majority coming from Elsevier Science Direct, which has added a link-to-dataset resolution

16 ElasticSearch cluster with 7 virtual machines and 44 CPUS

plugin to the platform UI). It will further add value to the open science ecosystem through its data integration with the OpenAIRE OA Broker.
Data Citation index: EMBL-EBI, DANS and other partners have examined the way data is cited, providing an overview of ongoing efforts in the field across the life, earth and social sciences and the humanities, as well as building requirements for the improvement of data citation practices. EBI has studied the extent to which metadata retrieved through public APIs for records from typical biomedical databases conforms to the FORCE11 data citation principles. A jointly funded OpenAIRE-THOR Literature-Data integration workshop was conducted, with several actions being agreed upon. Furthermore, EMBL-EBI prototyped and introduced a tool to ease data citation in JATS at DCIP Boston.
OA Publication Broker
Via the OpenAIRE dashboard for content providers (OpenAIRE.PROVIDE), repository managers subscribe to and receive notifications delivered by the OpenAIRE Broker Service.
- Additions are events regarding the addition of records to a repository: this means that the event provider must be aware of the records available to a repository and realize that a given record is not part of the collection but should very likely be (e.g. one of the authors is affiliated with the institution operating the repository).

- Enrichments are events regarding enhancements of the records in a repository collection: this means the event provider has somehow enriched the original records of the repository. - Alerts are events regarding the identification of mistakes in a repository record, ranging from invalid XMLs, to wrong metadata information: this means the event provider has processed the original records of the repository and realized a mistake was present. Event providers, properly registered to the broker, can generate events as triples and send them via APIs to the broker using the OpenAIRE repository identification system, which is based on OpenDOAR. Content providers can subscribe to given topics relative to enrichments, additions, or alerts and receive the relative notifications. They can also configure selection criteria (restricting notification to matching events based on given authors, title, subject criteria), notification time intervals, and notification means (e.g. via mail or via APIs, for the moment only the former has been implemented). OA Broker activities and tasks are further enhanced via the OpenAIRE-Connect project. LOD Services The OpenAIRE LOD, one of the largest LOD datasets globally17, is accessible via a SPARQL endpoint and an RDF dump downloadable from Zenodo. The activities to develop, build, maintain and publish it include the following aspects:

17 The size of the OpenAIRE LOD has reached the technical limits of open source, non-commercial RDF databases like Virtuoso, pushing the technical team to move to commercial, academic licensed approaches.


Ontology specification: A mapping of the rich OpenAIRE data model into a LOD-compatible model, using emerging community ontologies and vocabularies. The main challenges were to identify the most suitable vocabularies for reuse, and to define our own OpenAIRE-specific vocabulary terms for fields not covered by existing Linked Data vocabularies. Reused vocabularies include Dublin Core for general metadata, SKOS for classification and CERIF for research organizations and activities. Our vocabulary specification maps entity types of the OpenAIRE data model to RDF classes, and their attributes and relationships to RDF properties.

Mapping the OpenAIRE information graph to RDF: After comparing mappings from different materialisations of the OpenAIRE graph (the HBase table, the Solr full-text index and the CSV export), we selected CSV as the most efficient mapping strategy in terms of scalability, extensibility and maintenance. Through a D-NET workflow consisting of a chain of Map/Reduce jobs, the data is automatically mapped, transformed and fed into a Virtuoso installation.

Interlinking to related datasets: A scalable, generic pipeline for interlinking the OpenAIRE LOD with other candidate datasets has been implemented, and work is in progress. The target datasets interlinked so far are DBLP, OpenCitations and WikiData, with preliminary results under evaluation.

OpenAIRE APIs

OpenAIRE provides a set of APIs that allow third parties to retrieve its data (the scholarly communication graph or parts of it). The data is offered in different forms for different purposes, all documented at http://api.openaire.eu, which describes the OpenAIRE schema and provides documentation, license, SLAs and changelogs.

HTTP Search API: provides endpoints to retrieve any type of research entity. Recent enhancements include results of type software, and requests for community-related research results.

FIGURE 14. INCREASING DEMAND OF OPENAIRE DATA (EXCLUDES BOTS)

OAI-PMH Publisher: the endpoint for the OAI-PMH protocol.

Project exporter for DSpace/ePrints repositories: specific APIs that support the ePrints and DSpace repository platforms in accessing the list of EC projects during metadata deposition, in order to comply with EC regulations.

Linked Open Data APIs: provide a SPARQL endpoint to access the OpenAIRE LOD. As with other initiatives, this has a dedicated page accessible at http://lod.openaire.eu.
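As an illustration of how a third party might page through the OAI-PMH Publisher mentioned above, the following minimal sketch builds ListRecords requests and reads the resumption token from a response. The verb and argument names come from the OAI-PMH 2.0 protocol itself; the base URL, the sample response and the function names are assumptions made for this example, and the actual endpoint should be taken from http://api.openaire.eu.

```python
from typing import Optional
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Assumed base URL, for illustration only; check http://api.openaire.eu.
BASE_URL = "http://api.openaire.eu/oai_pmh"


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token is not None:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it must not be combined with metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def next_token(response_xml: str) -> Optional[str]:
    """Return the resumption token of a response, or None on the last page."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


# Hand-written sample response (not real OpenAIRE output).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_request = list_records_url(BASE_URL)
follow_up = list_records_url(BASE_URL, resumption_token=next_token(SAMPLE_RESPONSE))
```

A harvester would fetch `first_request`, then keep requesting the follow-up URL until `next_token` returns `None`.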


Anonymization services

AMNESIA, the anonymization service co-funded by OpenAIRE, allows scientists to anonymize their sensitive data collections to different degrees so that they are able to publish them. It supports three algorithms for k-anonymity and km-anonymity, covering tabular and set-valued data, and it is integrated with Zenodo. AMNESIA has been publicly released at https://amnesia.openaire.eu and is available both as a standalone application (to be installed locally in intranets or embedded in research production/publishing workflows) and as a web application intended primarily for testing.

Inference and research analytics services

The backend inference system (IIS) has seen significant development over the project duration in order to (i) scale up to big data through the incorporation of Apache Spark and ensure seamless interaction with the aggregation and de-duplication systems, and (ii) accommodate new types of data sources and new requirements.
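To make the k-anonymity property referred to under Anonymization services more concrete, the sketch below checks whether a small tabular dataset is k-anonymous and shows how generalizing one attribute can achieve it. This is not AMNESIA's implementation or one of its three algorithms; the records, the attribute names and both functions are hypothetical and serve only to illustrate the definition: every combination of quasi-identifier values must be shared by at least k records.

```python
from collections import Counter
from typing import Any, Dict, List, Sequence

Record = Dict[str, Any]


def is_k_anonymous(records: List[Record], quasi_identifiers: Sequence[str], k: int) -> bool:
    """True if every quasi-identifier combination occurs in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


def generalize_age(records: List[Record], width: int) -> List[Record]:
    """Replace exact ages with ranges of the given width (e.g. 34 -> '30-39')."""
    generalized = []
    for r in records:
        low = (r["age"] // width) * width
        generalized.append({**r, "age": f"{low}-{low + width - 1}"})
    return generalized


# Hypothetical sample data: 'age' and 'zip' act as quasi-identifiers,
# 'diagnosis' is the sensitive attribute that is left untouched.
records = [
    {"age": 34, "zip": "10115", "diagnosis": "A"},
    {"age": 37, "zip": "10115", "diagnosis": "B"},
    {"age": 52, "zip": "10117", "diagnosis": "A"},
    {"age": 58, "zip": "10117", "diagnosis": "C"},
]

before = is_k_anonymous(records, ["age", "zip"], k=2)
after = is_k_anonymous(generalize_age(records, width=10), ["age", "zip"], k=2)
```

With exact ages every record is unique, so the data is not 2-anonymous; after generalizing ages to ten-year ranges, each (age range, zip) combination is shared by two records.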

ENTITY RESOLUTION ALGORITHMS

A continuous stream of new text and data mining (entity resolution) modules and workflows was introduced to enrich, create and connect research artifacts/entities:

- Affiliation matching: Building on and improving the CERMINE tool, we extracted metadata and full text from PDF files for affiliation parsing, i.e., identifying the elements of an affiliation string (institution name, address, country) for each author.

- Links to accession numbers: Domain-specific concept mining for extracting gene symbols, chemicals, organisms and various bio-entities. In 2016 the mining of RCSB (PDB) codes was deployed in production, covering 8,171 publications with 35,335 links.

- Links to publications: Citations are extracted from PDFs and included as a property of OpenAIRE publications.

- Links to patents: Ongoing development of a patent-publication link mining algorithm, initially targeting EPO. A 'strict' TDM algorithm produced 112 EPO patents on the ArXiv dataset.

- Links to funders/grants: Text mining algorithms that identify and extract funding information from the full text of scientific publications. Each funding mining algorithm identifies the funders of a publication at the project (grant) level and is specific to one funder, taking into account that funder's characteristic pattern of acknowledgement statement.

- Links to OpenTrials: An extraction module that mines publications to identify links to OpenTrials (visible from the OpenTrials web site).

- Links to software: Through a collaboration with Software Heritage (SH), whose archive holds more than 4 billion files from more than 80 million origins, OpenAIRE and SH work together to enable references and citations to software projects in SH (pending in the OpenAIRE portal).

- Links to communities: A set of customized inference algorithms capable of identifying links between publications and communities based on input provided by community administrators (combined work with OpenAIRE-Connect).


- Narrative pattern analysis: We investigated narrative pattern analysis and prepared a prototype algorithm. Tested on PubMed data, it achieved the best F-score among comparable tools (PDFX, GROBID, ParsCit).
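The funder-specific approach described under "Links to funders/grants" in the list above can be sketched as one pattern per funder applied to acknowledgement text. The example below is a deliberately simplified illustration, not the IIS algorithms: the two regular expressions, the funder keys and the sample sentences are assumptions made for this sketch, and production mining would need far richer patterns plus validation against the funders' project lists.

```python
import re
from typing import Dict, List, Pattern, Tuple

# Hypothetical, simplified patterns: one per funder, each capturing a grant identifier.
FUNDER_PATTERNS: Dict[str, Pattern[str]] = {
    # e.g. "grant agreement no. 643410" (six-digit EC grant agreement number)
    "EC": re.compile(r"grant agreement\s+(?:no\.?|number)?\s*(\d{6})\b", re.IGNORECASE),
    # e.g. "NSF award DMS-1234567" (illustrative identifier format)
    "NSF": re.compile(r"\bNSF\b[^.]*?\b(?:grant|award)\s+(?:no\.?\s*)?([A-Z]{3}-\d{7})\b"),
}


def extract_funding_links(text: str) -> List[Tuple[str, str]]:
    """Return (funder, grant id) pairs found in a piece of full text."""
    links = []
    for funder, pattern in FUNDER_PATTERNS.items():
        for match in pattern.finditer(text):
            links.append((funder, match.group(1)))
    return links


acknowledgement = (
    "This work was supported by the European Commission under "
    "grant agreement no. 643410 (OpenAIRE2020). Funded by NSF award DMS-1234567."
)
links = extract_funding_links(acknowledgement)
```

Keeping one pattern per funder mirrors the point made in the text that each funder has its own acknowledgement conventions, so patterns can be added or tuned independently.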

CLASSIFICATION AND CLUSTERING ALGORITHMS

Document classifiers: We developed a document classification module based on a classification scheme similar to the Naive Bayes classifier, optimized through an extended relational database. The OpenAIRE classification module has been trained and evaluated using several taxonomies (arXiv, WoS, PMC, DDC and ACM). During the course of the project we introduced a pre-classification step, whose purpose is to estimate the prior probability that a publication belongs to a given taxonomy, before the taxonomy-specific classification step. In this manner, we are able to appropriately classify documents from a wide range of scientific/scholarly domains.

Probabilistic topic modelling: We have developed and deployed (for internal use) an end-to-end Probabilistic Topic Modeling (PTM) platform which analyzes massive collections of documents and related metadata to produce an interactive research map. It assists policy makers to:

▪ identify active areas of research: discover hidden themes (topics)
▪ understand what is actually produced: project the output to the reduced topic space (calculate topic distributions per document/entity)
▪ discover clusters and communities: topic-based similarity analysis
▪ identify emerging research areas: topic-based trend analysis
▪ assess coverage, identify gaps or new challenges: compare EU-funded research vs. global research
▪ assess research collaboration: authorship network analysis

The platform consists of a set of integrated modules:

- innovative supervised and unsupervised machine learning algorithms: intelligent, scalable PTM for mining Text Augmented Heterogeneous Information Networks (TA-HINets)
- algorithms for trend-based analysis
- a Wikipedia semantic annotator module, using the DBpedia Spotlight annotator and the DBpedia SPARQL endpoint, which automatically annotates all documents with DBpedia (Wikipedia) terms
- an interactive web-based interface that captures and demonstrates topic-based similarities between projects/grants, publications and authors, and identifies groupings and relations among them
- interactive visualization mechanisms that provide an automated and extensible multi-dimensional analysis of all scholarly content, including textual information (publications, project summaries, reports) as well as related side information such as funder research areas, grants, related taxonomies (e.g., MeSH terms, ACM classification) and journals
- metrics that promote qualitative and quantitative analysis of the results

The platform is actively used in several real-world use cases, mainly related to research funding evaluation, e.g., evaluating FP7 or specific FP7 research sub-areas.
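The two-step document classification described above (a pre-classification step over taxonomies, followed by a taxonomy-specific classifier) can be illustrated with a small sketch. It assumes a plain multinomial Naive Bayes with add-one smoothing and simply picks the most probable taxonomy before classifying within it; the OpenAIRE module is only described as "similar to" Naive Bayes, so the class, the function names and the toy training data below are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over whitespace tokens."""

    def __init__(self) -> None:
        self.doc_counts: Counter = Counter()
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()

    def fit(self, samples: List[Tuple[str, str]]) -> "NaiveBayes":
        for text, label in samples:
            tokens = text.lower().split()
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, text: str) -> Dict[str, float]:
        """Unnormalized log posterior per label; unseen tokens are ignored."""
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for token in text.lower().split():
                if token in self.vocab:
                    score += math.log((counts[token] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text: str) -> str:
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def train_two_stage(samples: List[Tuple[str, str, str]]):
    """samples are (text, taxonomy, category) triples."""
    pre = NaiveBayes().fit([(text, tax) for text, tax, _ in samples])
    per_taxonomy = {
        tax: NaiveBayes().fit([(t, c) for t, x, c in samples if x == tax])
        for tax in {tax for _, tax, _ in samples}
    }
    return pre, per_taxonomy


def classify(text: str, pre: NaiveBayes, per_taxonomy: Dict[str, NaiveBayes]) -> Tuple[str, str]:
    taxonomy = pre.predict(text)                        # pre-classification step
    return taxonomy, per_taxonomy[taxonomy].predict(text)  # taxonomy-specific step


# Toy training data, invented for this example.
samples = [
    ("quantum field theory particles", "arXiv", "physics"),
    ("neural network learning algorithm", "arXiv", "cs"),
    ("protein gene expression cells", "PMC", "biology"),
    ("clinical trial patients treatment", "PMC", "medicine"),
]
pre, per_taxonomy = train_two_stage(samples)
result = classify("gene expression in cells", pre, per_taxonomy)
```

A softer variant would weight each taxonomy-specific result by the taxonomy probability instead of committing to a single taxonomy, which is closer to treating the first step as a prior.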


9| MISC

Open Tender Calls

OpenAIRE has provided a fund of 80,000 EUR, divided into two lots, via an open call (published in Nov 2018) to address different levels of the OpenAIRE e-Infrastructure:

LOT #1: REPOSITORY TOOLS AND SERVICES - 30,000 EUR

This lot addressed the lower level of the infrastructure, i.e., the repository/journal/CRIS platform service providers, aiming to facilitate services that improve interoperability. The target was a large impact on the community, either by covering popular repository platforms or by targeting large platforms with proprietary APIs, so as to ensure that the repository community takes up OpenAIRE services at large. The lot included the following non-exhaustive list of topics:

▪ Implementation and customization of new protocols
▪ Implementation of OpenAIRE metadata guidelines and OpenAIRE-API interaction
▪ Integration with OpenAIRE OA Notification Broker

LOT #2: VALUE-ADDED SERVICES - 50,000 EUR

This lot aimed at the provision of value-added services on top of OpenAIRE's collected data, and included the following non-exhaustive list of suggested services, as gathered by previous requirement elicitation processes:

▪ Services to enhance scholarly communication
▪ Links to other infrastructures
▪ Services that improve internal OpenAIRE technologies/mechanisms
▪ Visualization
▪ Mobile applications

13 bids were submitted, 4 in LOT-I and 9 in LOT-II. They were evaluated by six (6) members of OpenAIRE on the following criteria:

C1 – Alignment with the OpenAIRE objectives and promotion of open scholarship
C2 – Appropriateness and feasibility of methodology
C3 – Risk assessment and timescales
C4 – Appropriateness of level of staffing, resources, expertise
C5 – Level of innovation
C6 – Cost breakdown, price and value for money
C7 – Dissemination plan and initiatives
C8 – Project experience and proven track record in planning, management, delivery
C9 – Impact on open scholarship, OpenAIRE services and consumers


TABLE 7. OPEN TENDER CALL WINNERS

Columns: Proposal, Budget (EUR)

Lot-I

Repository tools (Andrea Bollini from 4Science). Budget: 24,000
Objective: Increase the interoperability features supported out of the box by the most widely used platforms in the open science ecosystem for literature repositories, data repositories, journal platforms and CRIS/RIMS, such as DSpace, Dataverse, OJS and DSpace-CRIS. In particular we propose to implement:
- the Signposting patterns in OJS
- the ResourceSync framework in DSpace
- the OpenAIRE Guidelines for Data repositories in Dataverse

Metis4OpenAIRE (Pablo de Castro from euroCRIS). Budget: 14,000
Objective: Implement the CERIF-XML Guidelines for CRIS Managers on METIS, the institutional CRIS platform of Radboud University in Nijmegen, the Netherlands, through a joint effort by the developers at the institution and euroCRIS.

Lot-II

VIPER from Open Knowledge Maps (OKM). Budget: 15,000
Objective: Build a graph visualization of the publications and datasets of projects in the OpenAIRE database with the award-winning open source mapping software Head Start. Exploit a unique property of OpenAIRE data, namely the link between projects and their publications and datasets, to realize a unique open science application.

OpenAIRE MatchMaker (Know-Center). Budget: 14,500
Objective: A novel recommender service that enables researchers and institutions to find potential research collaborators, built upon OpenAIRE's project and other scholarly information available via the OpenAIRE API.

Data2Paper (Jemura Ltd). Budget: 10,000
Objective: Leverage OpenAIRE metadata to improve the user experience in a 'one-click' process that streamlines the data paper publication workflow, to incentivize researchers to deposit their data in high-quality (DataCite and ORCID compliant) repositories and, through data papers, to encourage and enable its sharing, verification and re-use, in alignment with the FAIR (Findable, Accessible, Interoperable and Re-usable) principles.

Total: 77,500


Data Pilot Legal Issues

OpenAIRE carried out two legal studies examining areas of legal relevance for the Open Research Data Pilot:

Data protection: The report describes the European data protection directive as well as the situation in selected Member States. In addition, it describes the new European general data protection regulation, which will enter into force in 2018, and its implications for the open access use of research data. Special focus is placed on the leading data protection principles. The report then assesses the open access use of research data as intended within the Open Research Data Pilot and the experiences of the Commission in running the Data Pilot, and gives some basic examples of repository use forms.

Implications of PSI legislation for Open Data: The second part of the study analyses the extent to which legislation on public sector information (PSI) influences access to and re-use of research data. The PSI Directive (2003/98/EC) and the impact of its revision in 2013 (2013/37/EU) are described. There is a special focus on the application of PSI legislation to public libraries, including university and research libraries, and its practical implications. In the final part of the study the results are critically evaluated and core recommendations are made to improve the legal situation in relation to research data. The full study is now published as a monograph (footnote 18).

Open Review

2016 saw the successful implementation of the tenders for open review prototypes from (1) The Winnower and (2) Open Scholar. Both small-scale projects provided different ways to implement open peer review functionalities on top of e-Infrastructures, with The Winnower acting as a platform for reviews of Zenodo content and Open Scholar building a prototype open peer review module (OPRM) for open access repositories. Work was completed in early 2016.

Open Edition (linked via ) carried out open peer review experiments in cooperation with the journal vertigo.revues.org. UGOE conducted multi-stakeholder surveys on attitudes to OPR via 1) focus groups held at the OPR workshop in Goettingen on 6th June 2016, and 2) an online survey on attitudes to Open Peer Review from 8th September to 7th October (with 3062 complete responses received). The experiment concluded in 2016, generating invaluable feedback on the implementation process and underlining the importance of human mediation in open peer review (footnote 19).

A unified definition for "open review" or "open peer review" has now been created and is described in three OpenAIRE blog posts:

• Defining Open Peer Review: Part One – Competing Definitions
• Defining Open Peer Review: Part Two – Seven Traits of OPR
• Defining Open Peer Review: Part Three – A Community Endorsed Definition

18 OpenAIRE2020 Legal Study on PSI https://www.univerlag.uni-goettingen.de/handle/3/isbn-978-3-86395-334-8

19 The reports for all three experiments/prototypes are available: https://doi.org/10.5281/zenodo.154647


The multi-stakeholder survey on attitudes to OPR (footnote 20) attracted considerable interest from the publishing community, and initial results were featured in Nature News.

Measuring OA Impact

This study (footnote 21) analysed the OpenAIRE data to produce multidimensional indicators of scholarly performance and indicators of second-level impact that link OA to possible social, cultural and economic impact. It explored which data sources should underpin OpenAIRE indicators, and defined the core principles on which both the standards for OpenAIRE evaluation metrics and an infrastructure of OpenAIRE indicators should be based.

Based on OpenAIRE data for FP7, CWTS produced a report correlating EC-funded OA publications with citation analysis metrics. This final report allowed FP7 publications from OpenAIRE to be synchronised with the (annual) release of Web of Science, to produce the most relevant and complete analysis possible.

The study offers an interesting perspective on the European research landscape, on the teams that were successful in obtaining EU funding from the FP7 programme, and on whether or not these teams published in open access journals. Compared to other studies, it found that these teams reach relatively high impact scores for their research outputs published in open access. Another interesting finding was that the Eastern European Member States in particular seem to be relatively more successful in publishing in open access journals than their Western European partners and/or competitors. More specifically, universities and research institutes in the engineering domains are quite successful in open access publishing.

Studies Impact

These studies were instrumental in moving the scholarly communication field forward. A number of publications resulted, four of them in peer-reviewed journals, which marks significant interaction with the field and raised awareness of OpenAIRE's standing in it.

TABLE 8. PUBLICATIONS FROM OPENAIRE2020 STUDIES

Bousfield D, McEntyre J, Velankar S et al. Patterns of database citation in articles and patents indicate long-term scientific and industry value of biological data resources [version 1; referees: 3 approved]. F1000Research 2016, 5(ELIXIR):160 (doi: 10.12688/f1000research.7911.1)

Schmidt, B., Deppe, A., Bordier, J, and Ross-Hellauer, T., 2016, “Peer Review on the Move from Closed to Open”, In: Positioning and Power in Academic Publishing: Players, Agents and Agendas (Proceedings of the 20th International Conference on Electronic Publishing), pp. 91-98, DOI: 10.3233/978-1-61499-649-1-91

Ross-Hellauer, T., 2017, “OpenAIRE2020 D7.4 - Novel Models for Open Peer Review”, DOI: 10.5281/zenodo.1257257

20 a) Focus groups held at the OPR workshop in Göttingen, 6th June 2016, and b) an online survey on attitudes to Open Peer Review from 8th September to 7th October (with 3062 complete responses received).

21 Study on OA impact https://zenodo.org/record/1257286#.WypBZEiFNPZ


Bordier, J., 2017, “Open peer review: from an experiment to a model: A narrative of an open peer review experimentation”, Retrieved from: https://hal.archives-ouvertes.fr/hal-01283582

Ross-Hellauer, T., Deppe, A., Schmidt, B., 2017, “OpenAIRE survey on open peer review: Attitudes and experience amongst editors, authors and reviewers”, DOI: 10.5281/zenodo.570864

Ross-Hellauer, T., 2017, “What is open peer review? A systematic review”, F1000Research, DOI: 10.12688/f1000research.11369.2

Adrian Burton, Hylke Koers, Paolo Manghi, Markus Stocker, Martin Fenner, Sandro La Bruzzo, Amir Aryani, Michael Diepenbroek, and Uwe Schindler (2016). The Scholix Framework for Interoperability in Data-Literature Information Exchange. D-Lib Magazine http://www.dlib.org/dlib/january17/burton/01burton.html
