Report Exploring and optimising the Dutch data landscape

Final report of the project team “Exploring and optimising the national data landscape”, part of the FAIR data programme line

NPOS (2020) Final report Exploring and optimising the Dutch data landscape

Colophon

Copyright © 2020 National Programme Open Science / programme line FAIR data

Authors: Melle de Vries (Royal Netherlands Academy of Arts and Sciences, KNAW), Project Manager; Ruben Kok (Dutch Techcentre for Life Sciences, DTL); Maurice Bouwhuis (SURF); Pieter Schipper (Netherlands Organisation for Scientific Research, NWO)

Title: NPOS (2020) Final report Exploring and optimising the Dutch data landscape

English translation: Livewords Maastricht

Use, reproduction and distribution of the content of this report is permitted if referenced to this report.

The views of the report’s authors do not necessarily reflect those of the organisations for which they work.

Contents

Summary
Chapter 1. Introduction
    Background
    Approach
    Project scope
    Reader’s guide
Chapter 2. National and international trends and developments
    The Netherlands
    Europe
    Worldwide
Chapter 3. Services
    A. Coordination and networks
    B. Knowledge development and dissemination
    C. Repositories and other facilities
    D. Support and training
Chapter 4. Organisations
    Policy organisations
    Research funding bodies
    Research institutions
    Service organisations
    Cooperation
    National knowledge institutions
    Commercial parties
Chapter 5. Regulatory aspects
Chapter 6. Governance
    Governance of research at national level
    Support at national level
    Local level
    Consultation structures
    Conclusion
Chapter 7. Conclusions and recommendations
    Findings
    Recommendations
    A roadmap for the FAIR Data programme line
Appendices

Summary

Study

This paper is the final report on the National Open Science Programme project ‘Exploring and optimising the Dutch data landscape’. The project commenced in May 2019 and involved studying various documents and websites, talking to stakeholders, holding panel meetings and conducting working visits. The focus was on organisational matters: which parties and structures play a role in the Dutch data landscape?

It proved difficult to acquire a detailed and exhaustive overview, on the one hand because the subject ‘data landscape’ or ‘data services’ is not clearly delineated and on the other because many matters are in fact organised, but not always in an obvious way. Our conclusions, in brief, are as follows. The Netherlands has an abundant but fragmented and complex data landscape, not only in terms of data services but also with regard to the development and dissemination of knowledge concerning research data management. The risk is that there will be overlaps and inefficiency and that we will miss out on opportunities for connectivity and innovation. Researchers themselves say that they lack an overview.

It is important to point out that our study concerns the Dutch national data landscape, whereas science is essentially an international affair. Dutch researchers often take part in international projects and infrastructures, making use of facilities and tools outside the country’s borders. At the same time, research clearly depends on national funding and rules, with the EU, the OECD, UNESCO and other international bodies relying heavily on countries in this respect. That requires national coordination, which will promote more effective use of available international resources and also deliver firmer guarantees for Dutch input at international level.

Optimisation

The project also makes proposals for optimising the Dutch data landscape through the National Open Science Programme’s FAIR data programme line.

NPOS FAIR data programme line

The aim of this programme line is to ensure appropriate facilities and satisfy other criteria for optimising the use/reuse of research data in the Netherlands, in line with European and other international trends and developments:
• A consistent system for FAIR access to research data: Practical elaboration and implementation of FAIR criteria within technical and policy parameters.
• Sustainable storage of research data for reuse: Research data stored in a consistent, reliable and sustainable manner.
• Implementing standards to achieve interoperability of datasets across disciplines, where possible making use of existing interoperability between data management systems in organisations.

Essentially, by introducing some form of coordination, the programme seeks to achieve closer alignment, a better overview, greater synergy and more progress in the well-stocked Dutch data landscape with a view to reusing research data optimally in an international context (which is becoming ever more critical given the growing importance of artificial intelligence and data science).

The proposed approach uses existing structures/organisations as much as possible, but within a more clearly delineated framework. The recommendation is to put the NPOS Steering Group in charge for the next few years during the transition phase, assisted by a Programme Board and a small, temporary coordination centre.

In addition to closer coordination, real progress towards optimising the reuse of research data will require additional funding (in particular for data stewards) and improvements to recognition and reward mechanisms.

Roadmap for the FAIR data programme line: initial efforts

Proposed actions

1. Set up a National Data Coordination and Expertise Centre (NDCE) for at least the 2020-2025 period under the auspices of the NPOS Steering Group. The Centre will coordinate and support implementation of the recommendations, or itself undertake implementation, and establish the necessary alliances at national and international level, including participation in the European Open Science Cloud (EOSC).
2. Establish an advisory Programme Board for the FAIR data programme line, consisting of representatives of the research funding bodies, the knowledge institutions (universities, UMCs, KNAW, NWO-I and universities of applied sciences) and large-scale research infrastructures, both researchers and support staff. Consider the specific characteristics and related needs and coordination structures in individual fields of science, but monitor overall coherence.
3. Encourage knowledge institutions (if possible by providing additional funding) to make demonstrable progress towards making FAIR research data available and free up the necessary capacity and infrastructure to do so. Use simple and accepted indicators to measure progress.

4. Establish a national network of all data service providers and repositories (a ‘National Open Science Cloud’ or ‘Commons’ or ‘Network’) aimed at closer coordination, a better overview and greater synergy, for example illustrated by a national catalogue of services, with the NDCE acting as coordinator. After a certain period, consideration can be given to appointing a coordinator.
5. Set up a national expertise centre (building on the achievements of the National Coordination Point Research Data Management (LCRDM) and NPOS Project F) for knowledge and expertise concerning all FAIR data-related aspects (including such aspects as data stewardship, ownership and ethical, legal and social implications (ELSI)) which will also offer general training courses, with the NDCE acting as coordinator. Here too, consider the specific characteristics of different fields of science and existing coordination structures, such as Health-RI in biomedical research. After a certain period, consideration can be given to appointing a coordinator.
6. Establish an international node for transparent liaison with European and international data organisations (EOSC, CODATA, GO FAIR, RDA, WDS), with the NDCE acting as coordinator.
7. Have the NDCE emphasise the importance of a national initiative concerning recognition and reward mechanisms (position paper: ‘Ruimte voor ieders talent’, November 2019) and contribute to this initiative from the perspective of open science.
8. Ask the NWO, ZonMw and organisations active in the field (VSNU, NFU, and VH) to plan possible improvements in data management funding (including data stewardship) for research.
9. Review the programme and the chosen governance structure at least once every two years by holding an international consultation procedure.
10. Within three years, develop a plan for organising the national data landscape over the longer term.

Remarks on the roadmap

• The present study only addresses research data. In time, data in higher education (including educational materials) will also need to be taken into account.
• Project F (Education and Training in Open Science and Data Stewardship) has yet to be completed. The results of this project will be incorporated into the final version of the roadmap at a later stage.
• The NDCE must have certain powers if it is to coordinate the data landscape effectively. To play an effective role in optimising the data landscape, it must have some measure of control over the activities undertaken by parties active in that landscape. Those parties and their stakeholders should therefore give the NDCE a mandate to act. This study makes no recommendation as to the scope of that mandate, which will depend in part on whether the relevant parties are willing to relinquish a degree of autonomy in favour of a more effective larger whole.
• In optimising the data landscape, the relationship with the other NPOS programme lines, specifically Open Access and Citizen Science, should be clear.
• The transition to open science and the associated approach to FAIR data is not happening in isolation, nor is it occurring solely in the science sector. The relationship to education, the public sector and business is important because their data is crucial to research, and vice versa.

• What must also be noted in this connection is the business community’s attempts to arrive at a general set of data-sharing agreements (the data coalition) and its efforts regarding artificial intelligence, in which data plays a key role.

NDCE activities

The NDCE referred to in item 1 above could potentially focus on the following activities:
• Harmonising, safeguarding and promoting (on a website) existing agreements, policy, good practices, protocols and standards
• Coordinating international liaison: Making better use of international expertise and initiatives and harmonising national interfaces with these, including in specific scientific fields
• Acting as a secondary source of expertise for the local Data Competence Centres (DCCs) at all knowledge institutions
• Resolving remaining data issues (policy, ethical, legal, technical)
• Launching and embedding a national curriculum for data stewards, coordinating training
• Creating synergy in the national data services for scientific research, working towards a national catalogue of services
• Encouraging recognition and reward mechanisms for data-sharing activities in scientific research, including working towards a recognised system of indicators (metrics)1
• Advocating the removal of constraints in privacy legislation in the interests of open science and FAIR data in particular.

NDCE Governance

The NDCE must act as a catalysing and unifying force, a role that is difficult to reconcile with the set of tasks that certain organisations in the field routinely carry out. We therefore recommend positioning the NDCE as a neutral/independent entity, and not assigning its tasks and duties to one of the parties that perform an operational role in the national data landscape. The NDCE would be accountable to the NPOS Steering Group and consult the FAIR Data Programme Board on important issues.

1 See also the draft RDA Recommendation (2020). FAIR Data Maturity Model. Specification and Guidelines 2020 (https://www.rd-alliance.org/groups/fair-data-maturity-model-wg)

Chapter 1. Introduction

Background

In the first half of 2016, during the Netherlands’ EU Presidency, the Amsterdam Call for Action on Open Science2 recommended action items to accelerate the transition to open science. Dutch knowledge institutions subsequently published a National Plan for Open Science (NPOS) in 2017. One of the main aims of this Plan is to optimise the reuse of research data,3 based on the FAIR principles.4

After an initial period of investigation that involved setting up theme groups for the various targets, the NPOS was transformed into a project-driven programme. With so many parties in the Netherlands working towards the optimal reuse of research data, one of the programme projects (launched 1 May 2019) focused on mapping the Dutch data landscape and issuing recommendations for optimising this landscape.

Approach

The project group assembled for this project consists of Maurice Bouwhuis (SURF), Ruben Kok (DTL), Pieter Schipper (NWO), and Melle de Vries (KNAW, Project Manager).

Between May and September 2019, the project group drafted a scoping paper5 based on a literature review and a number of preliminary discussions. The NPOS Steering Group approved the scoping paper in October 2019.

The project group held two panel meetings with both researchers and research support staff in October and November 2019 to gain further input on the data landscape.

The project group paid working visits to the research institutions between December 2019 and March 2020 and interviewed both researchers and research support staff there. The purpose of the visits was to survey local initiatives, good practices for further dissemination, and problems that might be tackled at national level.

Since then, there have been discussions with other national and international stakeholders. The national study is being conducted concurrently with similar initiatives in other European countries and with the study undertaken by the Landscape Working Group of the Executive Board of the European Open Science Cloud (EOSC).

The draft report was shared in a written consultation procedure in May 2020 with all (>250) of the individuals who were interviewed for this project or who play an important role in the national data landscape.

2 https://www.government.nl/documents/reports/2016/04/04/amsterdam-call-for-action-on-open-science
3 https://datasupport.researchdata.nl/en/start-the-course/i-a-birds-eye-view/data-jargon/research-data
4 https://www.force11.org/group/fairgroup/fairprinciples
5 https://www.openscience.nl/files/openscience/2019-12/Note%20%E2%80%93%20Exploring%20the%20Dutch%20data%20landscape%20%20-%20approved%20.pdf

Project scope

The scope of the project has been defined as follows:
• The project primarily concerns data from publicly funded research but is not limited to publicly funded service providers.
• The project concerns research data management (managing research data, metadata and the software required for this purpose), covering everything from data generation or collection to sustainable data availability, and not the IT that supports research (nor the development or configuration of software). This report therefore does not address basic IT support in terms of computing, networking and storage (making facilities available), but rather how these facilities are used to manage research data.
• The project concerns organisations that are active in the Netherlands while also taking into account the importance of cross-border cooperation in research as well as the role of international service providers.
• The FAIR principles are the starting point,6 but this study is mainly about the organisations that operate in this area, in conjunction with service optimisation. Which data researchers should store and for how long are questions being addressed by the relevant KNAW advisory committee.

Organisation within research institutions

This study of the Dutch data landscape addresses the participation of institutional data stewards, data scientists and software engineers in research projects only with regard to the overall picture (to a certain extent). The same goes for local, primary support that research institutions have set up for this purpose. The knowledge institutions are expected to establish a form of primary support within their own organisation (consistent with the duty of care for data management identified in the Netherlands Code of Conduct for Research Integrity). The way in which they do so is beyond the remit of the present study.

Funding, governance and quality

This study also does not address funding, governance or quality of service, although these aspects are mentioned in the final report’s recommendations.

Reader’s guide

Chapter 2 sketches the Dutch, European and global setting. Chapter 3 discusses services that support the optimal reuse of data. Chapter 4 lists the organisations active in this domain. Chapter 5 discusses data legislation and regulations. Chapter 6 describes how governance of the transition to FAIR data is currently organised in the Netherlands in terms of decision-making, consultative structures, and so on. Chapter 7 presents the project group’s findings and recommendations as well as a roadmap for the NPOS FAIR data programme line.

The appendices give the sources consulted, an overview of legislation and regulations, and the outcomes of the panel meetings and the working visits, as well as remarks and comments about the consultation process.

6 Some disciplines are still struggling to define what the FAIR principles mean for them.

Chapter 2. National and international trends and developments

This chapter describes the setting, which has proved to be a dynamic one. What is the Dutch, European and global context in which the Dutch data landscape is taking shape? What trends and developments influence that context?

The Netherlands

This first section highlights a number of Dutch initiatives undertaken by existing organisations in the field (listed in alphabetical order).

KNAW

The Royal Netherlands Academy of Arts and Sciences (KNAW) organised an expert meeting on 4 December 2018 whose participants concluded that KNAW would do well to issue an advisory opinion clarifying and structuring the discussion surrounding the improvement of data storage and data availability for research purposes. Across the full breadth of science, the trend is towards making research data more readily available for reuse in accordance with the FAIR principles. KNAW wishes to clarify and structure discussions on data storage and availability, and to arrive at a coordinated, national approach. The KNAW Board has appointed Frank den Hollander to chair the committee preparing the advisory opinion. The coronavirus pandemic and other factors have led to some delay in the advisory process. KNAW expects to issue its advisory opinion in late 2020.

NFU

The Netherlands Federation of University Medical Centres (NFU) runs a Data4lifesciences programme which aims to create an integrated research data infrastructure within, for, by and between the university medical centres (UMCs) and their partners. In Data4lifesciences, the UMCs are partnered with national programmes and organisations such as TraIT, BBMRI-NL, Parelsnoer, DTL, Mondriaan and SURF. The Data4lifesciences infrastructure includes not only technical facilities, but also, for example, an online catalogue of biobank specimens, a standard method for making data from an Electronic Health Record (EHR) available, privacy guidelines, and a manual detailing how researchers can best manage data (i.e. data stewardship). This infrastructure enables the UMCs involved in Data4lifesciences to make an important collective contribution to the larger, overarching health data infrastructure initiative, Health Research Infrastructure (Health-RI).

Health-RI, launched in January 2020, aims to create a single, integrated infrastructure for personalised medicine and health research. This infrastructure will link Dutch biobanks, population cohorts, data collections, image collections and experimental facilities and is closely aligned with the development of the European Open Science Cloud. Health-RI was initiated by BBMRI-NL2.0, EATRIS-NL and ELIXIR-NL, all three firmly embedded in pan-European research infrastructures (ESFRIs). It is being developed in cooperation with the NFU (Data4lifesciences), the Life Sciences and Health top sector (Health~Holland), the Dutch Techcentre for Life Sciences (DTL), the Netherlands Organisation for Health Research and Development (ZonMw), various ministries, universities, health fundraising organisations and the business community.

In terms of biomedical and clinical research, Health-RI focuses on the reuse of data generated by the care process, subject to specific reusability and consent criteria. This illustrates how the availability of data obtained in the field is a determining factor in designing a data approach within a specific domain of science, in this case health research.

NWO

The Dutch Research Council / Netherlands Organisation for Scientific Research (NWO) introduced a data management protocol in 2016 that adheres largely to the European Commission’s policy for Horizon2020 and the European Research Council (ERC). NWO requires a data management plan (DMP) to be drafted for each project that is awarded funding, which must describe:
− whether data is being collected during the project, and which data;
− where data is being stored during and after the project;
− whether and in what way this data will be made available for reuse after the project ends.

NWO’s basic principle is for data to be ‘as open as possible, as closed as necessary’. As a minimum, NWO requires that the data underpinning publications stemming from NWO funding must be open and available for reuse. On 1 January 2020, NWO revised its own data policy to align it with the latest developments. The most important changes are:
− the introduction of a new DMP form based on the Core Requirements for Data Management Plans developed by Science Europe, which represents the main public research or public research funding organisations in Europe;
− allowing institutions to use their own templates;
− stricter requirements for (ex post) evaluation of data management plans by institutional data stewards.

The introduction of NWO’s data policy in 2015 has proved to be a major impetus in the Netherlands. The fact that universities now invest in local university support offices and data stewardship offices is just one example of its influence.

Large-Scale Scientific Infrastructure

NWO appointed the Permanent Committee for Large-Scale Scientific Infrastructure (PC-GWI) in July 2015, at the request of the Ministry of Education, Culture and Science. The Ministry has charged the Committee with the task of preparing a national strategy for investment in large-scale research facilities.

The Committee has surveyed the boards and directors of knowledge institutions to find out which large-scale infrastructures are already in place and which investment plans have been approved for the next five years. It approached universities, research institutes, the ‘TO2’ applied research institutes and state-owned institutes (including the Royal Netherlands Meteorological Institute/KNMI) and collected information on a total of 158 facilities.

There are many different types of large-scale research facilities. They may be highly specialised (expensive) equipment, such as particle accelerators, large telescopes or high field magnets. They may also be ‘virtual’ facilities, for example vast databases or computer networks. Research collections, such as a collection of soil samples or a university book collection, also qualify as large-scale research facilities.

Examples of ‘virtual’ facilities from the national roadmap (NWO 2016):
• BBMRI-NL: Biobanking and Biomolecular Research Infrastructure, the Netherlands
• CIT Data Warehouse Infrastructure. After SURF, the ’s Information Technology Centre is the second most important national centre for high-performance computing (HPC)
• CLARIAH: Common Lab Research Infrastructure for the Arts and Humanities
• Delpher: The national infrastructure for full-text access and publication of Dutch-language works (books, newspapers and periodicals)
• ELIXIR-NL: Life science data and bioinformatics infrastructure
• ESSNeth: European Social Survey in The Netherlands
• MESS: Advanced Multi-Disciplinary Facility for Measurement and Experimentation in the Social Sciences. The most important component is the Longitudinal Internet Studies for the Social sciences (LISS) panel
• SHARE: Survey of Health, Ageing and Retirement in Europe
• SURF: The national e-Infrastructure for research.

Source: www.onderzoeksfaciliteiten.nl

Note: NWO was to have published a new Roadmap for Large-Scale Scientific Infrastructure in 2020. The coronavirus pandemic has led to a postponement until April 2021. Preparing the new roadmap has involved asking scientists and researchers to update the 2016 Landscape Inventory. Representatives of large-scale facilities were asked to register for inclusion in the new Landscape Inventory before the end of 2019.

The ICT Subcommittee appointed by the PC-GWI has issued an advisory report entitled Excellent research requires excellent infrastructure (Wyatt et al. 2017) in which NWO advises the new Dutch government (formed after the general elections that year) to make additional funds available ensuring that the Netherlands can retain its world-class national digital infrastructure and meet the Dutch research community’s growing need for such infrastructures.

In the 2017-2021 coalition agreement, the government pledged to invest an annual sum of €20 million in research digitalisation. In late 2019, NWO submitted an investment implementation plan for the digital infrastructure to the Minister of Education, Culture and Science (NWO 2019). This plan is a more detailed version of an earlier advisory report, Integrale aanpak voor digitalisering in de wetenschap (NWO 2018; available in Dutch only), addressing how the research IT infrastructure and associated expertise can be improved in the Netherlands. In this plan, NWO explains how the annual €20 million that the government had pledged to invest in the Dutch IT infrastructure for science will be used. Part of the funds will be spent on upgrading the national computing facilities and the rest on supporting the ongoing digitalisation of science, specifically by setting up a federated network of data competence centres (DCCs) in knowledge institutions.

The first call, for proposals supporting local DCCs at knowledge institutions, was issued in May 2020. A second call, for proposals supporting cross-institutional, thematic DCCs, is expected in spring 2021.

SURF

The national collaborative organisation for IT in Dutch education and research, SURF, has devoted much of the past year to overhauling its governance structure and to an internal reorganisation. In addition to its standard data services (specifically storage), SURF has begun coordinating and facilitating the federated network of DCCs (NWO 2019). SURF has also worked with a number of institutions on implementing internal pilot research data management (RDM) services. SURF further delivers a number of operational services that help institutions to meet their RDM needs. It deliberately positions these services to integrate seamlessly with institutional facilities, for example so that extra data storage becomes an ‘invisible extension’ of the institution’s own system, as it were (https://www.surf.nl/en/expand-your-storage-space-with-storage-scale-out). Various universities and other institutions make use of this service.

DCC implementation network

In late 2019, the National Coordination Point Research Data Management (LCRDM) took steps to facilitate knowledge-building and knowledge-sharing between local DCCs by initiating an ‘implementation network’ (drawing on experience acquired within the GO FAIR context). The goals and tasks of this expertise network have been defined in a ‘positioning paper’ (LCRDM 2020). The network will go live when NWO issues its call for proposals for local DCCs.

VH

The Association of Universities of Applied Sciences (VH) helped to initiate the National Platform for Applied Sciences in late 2019. The applied sciences yield valuable knowledge and products for practitioners and education, but making this information more visible would foster a much broader dissemination of knowledge and products. This project unites associate professors, lecturer-researchers, research support staff, IT specialists and policymakers in an effort to improve the visibility of the applied sciences. The project will run until 2022.

VSNU

The Association of Universities in the Netherlands (VSNU) has been working on behalf of the Dutch universities for several years (more recently in cooperation with NFU and NWO) to negotiate 100% open access publishing with major publishers. Back in 2015, VSNU asked SURF to set up a Research Data Management Coordination Centre, now known as the National Coordination Point Research Data Management (LCRDM).

Joint initiatives

National Open Science Programme

At the start of 2017, ten national knowledge institutions (GO FAIR, the National Library of the Netherlands/KB, KNAW, NFU, NWO, PhD Candidates Network of the Netherlands/PNN, SURF, VH, VSNU, and ZonMw) signed the National Open Science Plan. In early 2020, this plan became the National Open Science Programme, with specific projects dispersed across three programme lines: Open Access, FAIR Data, and Citizen Science. One theme that cuts across all the programme lines is that of recognition and reward.

The outcomes of projects E (Exploring the Dutch Data Landscape) and F (Education and Training in Open Science and Data Stewardship) will form the basis for the FAIR data roadmap (as from summer 2020).

NPOS Project F focuses on education and training in data stewardship in the Netherlands and makes recommendations concerning data steward job descriptions and the recognition of data stewards’ competencies, including training. It builds on the recommendations of two previous projects addressing data stewardship education and training in the Netherlands (Scholtens et al. 2019; Verheul et al. 2019).

NPOS FAIR Data programme line

The aim of this programme line is to establish appropriate facilities and satisfy other criteria for optimising the use/reuse of research data in the Netherlands, in line with European and other international trends and developments:
• A consistent system for FAIR access to research data: Practical elaboration and implementation of FAIR criteria within technical and policy parameters.
• Sustainable storage of research data for reuse: Research data stored in a consistent, reliable and sustainable manner.
• Implementing standards to achieve interoperability of datasets across disciplines, where possible making use of existing interoperability between data management systems in organisations.

Recognition and reward In a joint position paper issued in 2019, Ruimte voor ieders talent (available in Dutch only), the Dutch public knowledge institutions and research funding bodies (VSNU, NFU, KNAW, NWO and ZonMw) have taken a step towards modernising the existing recognition and reward mechanisms in a manner that also encourages open science in all its facets.

Encouraging open science Specific steps must be taken to create more scope for open science. This new approach to science gives other people besides researchers the opportunity to collaborate on, contribute to and make use of research. This means, for example, that scientists share the results of their research more widely with the public, that they make their research results accessible, and that they involve the public in the research process itself (e.g. citizen science). Open science and the modernisation of recognition and reward mechanisms are inextricably linked. Researchers are asked to spend time and effort on something that does not automatically qualify as traditional scientific output, such as a publication, but that can have a major impact on society and science (e.g. the sharing of research data).

Europe In Europe, the most important development is the European Open Science Cloud (EOSC), part of the European Cloud Initiative – Building a competitive data and knowledge economy in Europe, launched in 2016. The EOSC federates existing and emerging data infrastructures to provide European science, industry and public authorities with a world-class data infrastructure to store and manage data, high-speed connectivity to transport data, and powerful high-performance computers to process data.


The EOSC will offer 1.7 million European researchers and 70 million professionals in science, technology, the humanities and social sciences a virtual environment with open and seamless services for storage, management, analysis and reuse of research data, across borders and scientific disciplines by federating existing scientific data infrastructures, currently dispersed across disciplines and the EU Member States.

Starting in 2018, work has continued on building the EOSC with the support of an Executive Board (with NCOS Karel Luyben as co-chair), a Governance Board (consisting of representatives from the member countries, including Santje van Londen from the Dutch Ministry of Education, Culture and Science) and a Stakeholders Forum.

In mid-2019, the Executive Board drew up a Strategic Implementation Plan (Jones et al. 2019). The plan provides for the creation of various Working Groups, six of which commenced their work in 2019. They are:

Architecture: This Working Group (with Erik van den Bergh of Wageningen University & Research and Hylke Koers of SURF representing the Netherlands) is studying how the national research infrastructures can support the technical framework enabling and sustaining the EOSC. In terms of data exchange, the proposal is to establish a single interoperability framework, in line with national efforts such as GO FAIR and other technical FAIR initiatives. Maintaining persistent identifiers for data is also seen as essential for the EOSC architecture; the Netherlands leads the way in this respect, with various libraries making active use of DataCite DOIs and with SURF’s EPIC PID options. In addition to data exchange, the Working Group regards the Authentication and Authorisation Infrastructure (AAI) as essential. The Netherlands is fully prepared because SURF’s SURFconext offers an AARC blueprint-compatible national AAI system used by virtually all Dutch institutions. The development of SRAM will flesh out this service, among other things to improve data rights management within cross-institutional partnerships.

FAIR: This Working Group (with Rob Hooft of DTL representing the Netherlands) is working on recommendations regarding the implementation of FAIR practices within the EOSC. It is addressing interoperability, the best structure for persistent identifiers (PIDs), indicators (metrics), certification and user practices that allow for differences between scientific disciplines. Although the Netherlands is a step ahead of other countries in this regard, the expectation is that the Working Group’s final recommendations will serve as a catalyst for activities in the Dutch data landscape and, in particular, ensure that the Netherlands adheres as closely as possible to agreements made within and between disciplines regarding the application of the FAIR principles. It is important that the knowledge gained in this Working Group is disseminated within the Netherlands to the DCCs and data stewards at the institutions.

Landscape: The aim of this Working Group (with Ronald Stolk of the University of Groningen representing the Netherlands) is to map out the relevant national infrastructures for incorporation into the EOSC. The WG held a virtual meeting on 27 and 28 April 2020 during which it discussed a draft report (dated 17 April 2020). A number of European projects have already provided a broad survey of the Dutch situation.7,8 In the remaining months of 2020, the WG will complete the survey, analyse the results and make recommendations.

Rules of Participation: This Working Group (with Wouter Los of the representing the Netherlands) is examining the Rules of Participation governing all EOSC transactions. The general idea is that the EOSC will not become a monolithic organisation but adhere to a federated model. The WG is considering what needs to be done to ensure homogeneity. Will this require adopting such principles as ‘open and transparent’, or recommendations and guidelines, or more technical operational conditions, or mandatory rules? And to whom do the ‘Rules’ apply and are there different ‘Rules’ for data providers, digital service providers, and the EOSC itself? Will public and private service providers be lumped together or not? What about each party’s rights, obligations and liability? The Dutch position is that market forces within the scientific community should be powerful enough to ensure the visibility of providers and the quality of data and services. After all, few users will be interested in data sets that are hard to access because they do not comply with FAIR data management practices, or that contain data of questionable quality. If enforcement is by a set of ‘Rules’ imposed from the top down, the question is who will check compliance and who will pay for these checks. An alternative is to make data publications subject to peer review. The draft ‘Rules’ currently under discussion lay down general principles but beyond that are limited to such necessary requirements as PIDs, machine-readable metadata, legal conditions and possible costs, accessibility and use, and quality of service.

Skills & Training: This Working Group (with Celia van Gelder of DTL representing the Netherlands) began its work in February 2020 and is focusing on EOSC competencies and capabilities, whereas actual capacity-building is regarded as a task for the member countries and the institutes. The WG is addressing four themes: (a) EOSC minimum skills set; (b) landscape analysis of European, thematic and national competence centres; (c) recommendations for positioning EOSC skills/training within national digital skills roadmaps and strategies; (d) specifications for an EOSC training catalogue. The initial impression is that the Netherlands is at the forefront when it comes to developing data stewardship competencies and skills and organising research support desks and competence centres at institutions. In addition, to the best of our knowledge, the Netherlands is the only country that plans to issue a national call for proposals for a network of institutional and inter-institutional competence centres.

Sustainability: This Working Group (with Franciska de Jong of Utrecht University representing the Netherlands) will make recommendations concerning the implementation of an operational, scalable and sustainable EOSC federation after 2020. It recognises the enormous variety of national research data instruments and policies. Proposals for ‘business models’ are in keeping with the idea that an initial Minimum Viable EOSC should focus on the needs of researchers in the public sector who wish to make use of or contribute to open data collections. In part, these proposals concern incentives and rewards to encourage researchers to contribute to and participate in a culture of sharing research data, and more generally, to adhere to the FAIR data principles. The Netherlands has already taken several steps in this direction. It is important to strike the right balance between having a local and national infrastructure that furnishes support and developing thematic (= domain-specific) services. This balance is important to avoid interfering with dynamic scientific processes that are, in principle, regulated internationally.

7 http://www.eosc-synergy.eu/europe/netherlands/
8 https://www.openaire.eu/item/netherlands

All Working Groups must have completed their work by the end of 2020.

In the meantime, the Executive Board is working to establish a partnership between the European Commission and a new Association that will embed the EOSC for the longer term.


A few other relevant European projects:
• FREYA. This three-year project began in December 2017. It aims to build a persistent identifier (PID) infrastructure as a core element of open science. One of the project objectives is to improve the visibility of research data by building on existing PID systems, such as Crossref, DataCite, ORCID and identifiers.org. DANS represents the Netherlands in the project organisation.
• EOSC Synergy. This three-year project commenced in autumn 2019 and aims to expand EOSC coordination at national level in the member countries, i.e. Czechia, Germany, the Netherlands, Poland, Portugal, Slovakia, Spain and the United Kingdom. DANS represents the Netherlands in the project organisation (Peter Doorn). This project also involves a landscape survey, in particular to see which national services can join the EOSC.
• EOSC Pillar. This three-year project started in July 2019 and aims to coordinate and harmonise the national efforts of Austria, Belgium, France, Germany and Italy in aligning with and implementing the EOSC.
• EOSC Nordic. This three-year project began in September 2019 and aims to facilitate EOSC initiatives in the Nordic and Baltic States and to achieve synergies at policy and service level.
• NI4OS. This project aims to develop and build national open science initiatives jointly in the countries of south-east Europe.

In its report on the national EOSC nodes, the e-Infrastructure Reflection Group concludes that ‘the (organisation of the) national e-Infrastructure landscape varies considerably between the countries’ (E-IRG 2019, 21) and consequently recommends that ‘[m]embers [sic] states and associated countries should continue to increase the level of coordination between and consolidation of the various national players on e-Infrastructure provisioning’ (E-IRG 2019, 22).


For a European landscape survey and a more detailed review of all infrastructures and initiatives undertaken at EU level, see the study by the EOSC Executive Board’s Landscape Working Group. A draft version was discussed during a virtual meeting on 27/28 April 2020. It makes clear just how many generic and domain-specific infrastructures are available. The report distinguishes between computing infrastructures (high-performance computing/high-throughput computing), e-infrastructures (networks, computing facilities and data centres, such as those offered by SURF in the Netherlands), data infrastructures (for managing and sharing research data, including data repositories), thematic infrastructures (research infrastructures, including RIs and ESFRIs, which in many cases include a Dutch node), and the European Intergovernmental Research Organisations (including CERN, EMBL, and ESA). The WG’s final report is scheduled to be issued in autumn 2020.

The All European Academies Working Group E-Humanities offers FAIR guidance to researchers in the humanities. Two of its recommendations are (ALLEA 2020, 29):
• To ensure the best possible stewardship of your data, choose to deposit it in a digital repository that is certified by a recognised standard such as the CoreTrustSeal. The Registry of Research Data Repositories (re3data) provides a good starting point, noting disciplines, standards, content types, certification status and more. FAIRsharing (manually curated information on standards, databases, policies and collections) allows you to search databases by subject, and includes entries tagged ‘Humanities and Social Sciences’.
• Use disciplinary repositories where they exist, as they are more likely to be developed around domain expertise, disciplinary practices and community-based standards, which will promote the findability, accessibility, interoperability and ultimately the reuse and value of your data. The level of curation available in a repository is key to data quality and reusability.

In the report’s conclusion, the Working Group states: ‘The present recommendations therefore join other voices in encouraging research institutions, policymakers and funders to fundamentally review their research support services, as well as their definitions of the roles and activities that feed into research under this new paradigm’ (ALLEA 2020, 37).

Comparing the Netherlands with other European countries It is beyond the scope of this project to produce an in-depth comparison between the Netherlands and other countries, in part because of differences in the funding and organisation of science across countries. However, we have tried to learn from similar initiatives in neighbouring countries.

Published reports reveal the following:
• A study carried out in Belgium (Flanders) in 2018 looked at how to improve research data management and the role of the Flemish government in this regard. The study also considered what lessons can be learned from Austria, France, Ireland and the Netherlands. It made an urgent call for ‘ambition, leadership and central coordination’ (Fikkers et al. 2018, 22). At the end of 2019, the Flemish government adopted an open science policy and established the Flemish Open Science Board. FWO, Flanders’ research funding body, will play a coordinating role (Flemish government, 2019).
• The United Kingdom also surveyed its data landscape in 2017/2018. Once again, the recommendations refer to active coordination and cooperation. There is a need for ‘coordinating mechanisms and incentives to promote cooperation across a federated national ecosystem of provision, with links also to international services and initiatives. In short, there is a need for a strategy – and a governance structure – to build on strengths, remedy weaknesses, and fill gaps, including community-led initiatives’ (ORDT 2017, 52). One key recommendation is that UK Research and Innovation, the UK’s national research funding body, ‘should establish for itself a coordinating role – while taking full account of the critical importance of active leadership from other stakeholders including research organisations, funders, specialist service providers, publishers, learned societies, and senior representatives of the research community – in overseeing the development of ORD policies, infrastructure and services’ (ORDT 2018, 32).
• Denmark’s Ministry of Science and Higher Education had an analysis carried out in 2018. ‘The analysis shows that many of the elements needed to realise FAIR data already exist, but they are fragmented and dispersed. Coordination and collaboration are crucial for developing a common approach and understanding of the FAIR data principles. …A FAIR data solution should build on local solutions with a cohesive national superstructure – not a “one-size-fits-all” solution, but a solution that can grow based on local, academic and research environments’ (Oxford 2018, 5).


• In Germany, a decision was taken in late 2018 to establish a national data infrastructure. For the 2019-2028 funding period, the federal government and the Länder have made €90 million available per annum to set up a national research data network, and this sum is explicitly not intended for hardware etc. ‘The aim of the national research data infrastructure (NFDI) is to systematically manage scientific and research data, provide long-term data storage, backup and accessibility, and network the data both nationally and internationally. The NFDI will bring multiple stakeholders together in a coordinated network of consortia tasked with providing science-driven data services to research communities.’ Three calls for proposals are planned for 2020, 2021 and 2022. The first call elicited 22 consortium applications involving a total of 142 participating institutions (www.dfg.de).
• In Sweden, the Swedish Research Council delivered a national roadmap in this area in 2019. The roadmap also describes the national landscape. Key words are coordination, cooperation and the clarification of roles and responsibilities. The roadmap also highlights the importance of national coordination with a view to liaising with the EOSC. ‘Based on the observation that the current fragmentation is becoming increasingly challenging and is probably not cost efficient, the panel recommends that now is the time to consider an encompassing national e-infrastructure coordination and even organizational mergers of e-infrastructures’ (SRC 2019, 30).
• The Nordic countries work together in the Nordic e-Infrastructure Collaboration (NeIC) and intend to address the challenge of open science in that context. ‘The Nordic countries are particularly well suited for collaboration among each other due to social and cultural similarities. Also, as the countries are individually small, unifying efforts in science and technology to realise common undertakings will generally result in a better end-product and greater impact in the international arena. Finally, a Nordic-wide collaboration reduces the risks of duplication of effort and therefore promotes a more cost-efficient R&D segment within the Nordics. …Development of the concept of open science is still in its infancy and it will require significant effort and funding to fully realise the potential of aligning research practices in modern science with the capabilities offered by semantic metadata modelling, linked data and knowledge graphs. To get there, it is necessary to build the essential infrastructures to support this vision’ (Jaunsen 2018, 7, 37).

A European strategy for data The European Commission presented a data strategy in early 2020 involving an open consultation procedure that will end in late May 2020.

The Commission confirms its EOSC ambitions in the strategy. ‘The EU will continue to make data resulting from its research and deployment programmes available in line with the principle “as open as possible, as closed as necessary”, and will continue to facilitate discovery, sharing of, access to and reuse of data and services by researchers through the European Open Science Cloud (EOSC)’ (EC 2020, 15).


Worldwide

CODATA The Committee on Data for Science and Technology (CODATA) is an interdisciplinary committee founded in 1966 which is now part of the International Science Council (ISC). CODATA promotes global collaboration on improving the availability and usability of data for all areas of research. CODATA also works to advance the interoperability and use/reuse of such data. CODATA organises conferences and workshops. Research Data Netherlands (RD-NL) represents the Netherlands as a member.

GO FAIR GO FAIR is a bottom-up, stakeholder-driven initiative aimed at implementing the FAIR data principles worldwide and creating an ‘internet of FAIR data & services’. GO FAIR is a Dutch initiative that began as a DTL spin-off intended to help implement the FAIR approach beyond the life sciences. The GO FAIR Foundation, which has its registered office in Leiden, has set up an International Support & Coordination Office (GFISCO) that offers individuals, institutions and organisations an open and inclusive ecosystem for collaborating in implementation networks (INs). France, Germany and the Netherlands have taken the lead in this emerging international network. INs are now active in various disciplines.

Research Data Alliance The Research Data Alliance (RDA) is a UK-based foundation, established in 2013 by the European Commission, the US Government’s National Science Foundation and National Institute of Standards and Technology, and the Australian Department of Innovation with the aim of building the social and technical infrastructure needed to share and reuse data. The Netherlands is represented by Ingrid Dillo (DANS), who co-chairs the RDA Council.

There are more than 90 Working Groups and Interest Groups addressing various generic and discipline-specific topics. Dutch experts are involved in several RDA WGs. The RDA is open to both individuals and organisations. It has almost 50 member organisations and more than 10,000 individual members from 144 countries. DANS is the member organisation representing the Netherlands as well as the RDA ‘national node’. The purpose of the RDA’s national nodes is to connect local researchers and data experts working with research data and to align local activities with broader RDA activities, for example through the RDA WGs and IGs.

World Data System Like CODATA, World Data System (WDS) falls under the International Science Council (ISC). Its mission is ‘to support the ISC’s vision by promoting long-term stewardship of, and universal and equitable access to, quality-assured scientific data and data services, products, and information across all disciplines in the Natural and Social Sciences, and the Humanities’. WDS organises working groups, provides training and certifications, and publishes reports. It has more than 80 regular members (data-related organisations), including DANS.


Data Together CODATA, GO FAIR, RDA, and WDS coordinate and cooperate closely at international level. They are working on an international and complementary approach to research data under the ‘Data Together’ banner. It may be advisable for these four organisations to cooperate functionally at national level as well.

OECD The Organisation for Economic Cooperation and Development (OECD) has 36 member countries and was founded to foster discussion, study and coordination of social and economic policy. Member countries seek to solve common problems and coordinate international policy.

The OECD published a set of principles and guidelines for access to research data from public funding as early as 2007. It is currently updating these guidelines. In any case, it is important that member countries ensure good data governance (OECD 2020).

UNESCO The mission of the United Nations Educational, Scientific and Cultural Organization (UNESCO) is to build peace, reduce poverty and promote sustainable development and intercultural dialogue through education, science, culture and communication. Every UNESCO Member State has a UNESCO National Commission. The Netherlands’ UNESCO National Commission was established in 1947 by Royal Decree. The Commission has a maximum of five members who are appointed by the Minister of Education, Culture and Science. The members volunteer their expertise in one or more of UNESCO’s four areas of activity: education, science, culture and communication.

At the most recent General Conference in November 2019, the UNESCO Member States resolved to develop new global standards for open science. It is UNESCO’s aim to have the new standards ready within two years so that they can be presented to the Member States at the next General Conference in November 2021. UNESCO’s first step in designing the global standards is to ask stakeholders – researchers, science organisations, experts and users – for their suggestions, opinions and information. UNESCO invites everyone who wants to help think about a set of global open science standards to complete its survey, available until late April 2020. Further consultation rounds will follow later this year. National consultation sessions will be organised in the various Member States as well, ensuring that the process of drafting the standards is truly global and inclusive.


Chapter 3. Services

In the context of this report, ‘services’ refers to all activities that Dutch organisations undertake to deliver added value for the national data landscape. These may be services rendered against payment, services provided in the public interest, or activities stemming from an organisation’s main mission. Customers may be institutions that host research, individual researchers or any stakeholders with an interest in research data.

Which FAIR data services are currently available in the Dutch data landscape, bearing in mind the project scope defined in Chapter 1?

The scoping paper (September 2019) offered a preliminary list of services on offer in the Dutch data landscape:
• IT infrastructure services (access to/use of)
• Storage services (short-term)
• Archiving services (long-term)
• Certification support
• Knowledge-generation and development of standards, ontologies and guidelines
• Outreach, consultancy, education and training
• Coordination (policy content, financial, technical, etc.)
  o policy-making, defining funding terms and conditions, setting standards
  o supporting coordination and cooperation within a domain
  o liaising with international initiatives – as a node, both within and between domains

To make this list manageable and avoid getting lost in the details, we have divided our report into the following overall categories of services:

A. Coordination and networks

B. Knowledge development and dissemination

C. Repositories and other facilities

D. Support and training


A. Coordination and networks

These are services provided indirectly. They do not support research data management directly, but help optimise the use and reuse of research data by delivering coordination mechanisms and by building networks.

There is no official, centralised coordination of services and facilities in the Netherlands. Several different organisations take responsibility for certain aspects. For example, the UKB (an alliance of university libraries and the National Library of the Netherlands) cooperates by distributing, concentrating and connecting expertise in national and international networks, but it restricts itself to the academic world, in which it is well connected. Networks, partnerships and joint projects have been set up within various domains of science in which standards are coordinated and implemented internally. Examples include DTL, Health-RI, CLARIAH and ODISSEI.

The Social Science and Humanities Digital Infrastructure Platform (PDI-SSH, https://pdi-ssh.nl/nl/home/) was established in early 2020. PDI-SSH was set up by the Social Sciences and Humanities Council and digital infrastructures in this domain (CLARIAH and ODISSEI). PDI-SSH fosters cooperation and coordination between digital infrastructure facilities, whether that involves existing or new initiatives, including coordination with DANS, the eScience Center and SURF, for example.

SURF has set up the Research Data Management-Technology Expertise Centre (RDM-TEC), which unites virtually all of the Dutch universities in an effort to coordinate the supply of and demand for RDM services (for now focusing on iRODS as an enabling federated technology). One recent initiative is the scaling up of YODA (an RDM system developed by Utrecht University) to national level.

NWO’s investment implementation plan for the digital infrastructure (2019) accords SURF a critical technical role in facilitating and coordinating a safe, federated system linking the DCCs. SURF will support the local and thematic DCCs on two fronts:
• First of all, in technical terms, it will facilitate and coordinate the safe, federated system linking the DCCs. These are technologies and tools that will have already passed through the early stage of innovation and become federated in the roll-out stage. They will include support for facilities that the institutions make available to one another, with SURF playing a coordinating role.
• Second, SURF will use its expertise for the benefit of knowledge-sharing and coordination. It will support the DCCs in their policymaking, bring experts together and assist research support staff.

SURF’s LCRDM is a network of more than 200 professionals whose daily work involves addressing policy and implementation issues related to research data management at their knowledge institution. Together, they tackle topics and develop proposals or products for knowledge-sharing. These products are not automatically adopted by the institutions or incorporated into national policy, however.

In this connection, it is worth mentioning the communities organised from the bottom up:
• The Netherlands Research Software Engineer community (NL-RSE). Its purpose is ‘to bring together the community of people writing and contributing to research software from Dutch universities, knowledge institutes, companies and other organizations to share knowledge, to organize meetings, and raise awareness for the scientific recognition of research software.’
• The universities’ various open science communities.

Conclusion: The Netherlands has several (sometimes overlapping) networks and there is coordination on a number of fronts (some based on specific areas of expertise, others within specified domains).


B. Knowledge development and dissemination

These services involve developing and disseminating the knowledge required to optimise data reuse.

Knowledge development Knowledge development usually implies nationally or internationally funded projects. One current example is the Horizon 2020 project FAIRsFAIR, launched in March 2019 and coordinated by DANS. The project aims to supply practical solutions for using the FAIR data principles throughout the research life cycle.

Mention must also be made of the knowledge development activities for research data management at the international organisations mentioned in Chapter 2, such as CODATA and the Research Data Alliance.

DANS has its own Research and Innovation department, which is making a major contribution to the projects referred to above.

In most cases, there is no guarantee as to how these initiatives and projects will contribute to the Netherlands’ open science strategy.

Knowledge dissemination The projects referred to above undertake their own dissemination activities with a view to promoting the optimal reuse of data. There are also the following initiatives (the list is not exhaustive):
• Most knowledge institutions offer researchers and research support staff a more or less extensive range of guidance documents (on their website), training courses and workshops.
• Research Data Netherlands has an introductory training course, Essentials 4 Data Support, for staff who support researchers in storing, managing, archiving and sharing their research data.
• LCRDM provides support materials but also has a pool of experts who can be called in to assist.
• DTL offers various background materials and (online) training courses for its own network and organises knowledge transfer between experts and users in the life sciences, including industry.
• The Copyright Information Network provides information on its website about data rights.
• Research funding bodies NWO and ZonMw also provide background information.

Although there is considerable overlap between the above initiatives, there is no guarantee that all the information provided is up to date, consistent and correct.


C. Repositories and other facilities

Since it was launched in 2012, re3data (www.re3data.org) has become a popular source of information about research data repositories. It indexes and provides comprehensive information on nearly 2500 repositories worldwide. Universities and research centres register their institutional, disciplinary and interdisciplinary repositories in re3data, allowing researchers, funding bodies, publishers and research institutions to select suitable repositories for research data storage and searches. re3data is run by DataCite and is hosted and managed by the library of the Karlsruher Institut für Technologie.

We consulted the website on 19 March 2020 and at that time found 56 Dutch repositories in the register. In the list below, the 16 certified repositories are shaded green. Each repository is marked against the domains in which it is available:
• HSS – Humanities and Social Sciences
• LS – Life Sciences
• NS – Natural Sciences
• ES – Engineering Sciences

It should be noted that almost half of the repositories, 26 in number, do not use unique identifiers, meaning that they do not comply with the F of FAIR.

Repository: description [number of domains marked, out of HSS, LS, NS and ES]

• 4TU.Centre for Research Data: Data archive for storing and reusing research data in the technical sciences. [domains marked: 3]
• AlgaeBase: AlgaeBase is a database of information on algae that includes terrestrial, marine and freshwater organisms. [domains marked: 1]
• Amsterdam Cohort Studies: The Amsterdam cohort study on HIV infection and AIDS among homosexual men, expanded to include drug users. [domains marked: 1]
• CancerData.org: The CancerData site is an effort of the Medical Informatics and Knowledge Engineering team of Maastro Clinic, Maastricht. [domains marked: 1]
• CARIBIC: CARIBIC is a scientific project to study and monitor important chemical and physical processes in the Earth’s atmosphere. [domains marked: 1]
• CLAPOP: CLAPOP is the portal of the Dutch CLARIN community. [domains marked: 2]
• CLARIN INT Portal: Resources that are relevant to the lexicological study of the Dutch language and on resources relevant for research in and development of language and speech technology. [domains marked: 1]
• CLARIN-ERIC: CLARIN has a focus on language resources (data and tools). It is being implemented and improved at leading institutions in a large and growing number of European countries. [domains marked: 2]
• data.enanomapper: A substance database for nanomaterial safety information. [domains marked: 2]
• DataverseNL: Online storage, sharing and registration of research data, during the research period and up to ten years after its completion. [domains marked: 4]
• DHS Data Access: The DNB Household Survey (DHS) supplies longitudinal data to the international academic community, with a focus on the psychological and economic aspects of financial behavior. [domains marked: 1]
• Donders Repository9: The repository of the Donders Institute for Brain, Cognition and Behaviour at the Radboud University. [domains marked: 1]
• e-Depot of the National Archives of the Netherlands: The National Archives of the Netherlands holds over 3.5 million records that have been created by the central government, organisations and individuals and are of national significance. Many records relate to the colonial and trading history of the Netherlands in the period from 1600 to 1975. [domains marked: 1]
• eartH2Observe: EartH2Observe brings together the findings from European FP projects. It will integrate available global earth observations. [domains marked: 1]
• EASY: Online archiving system with access to thousands of datasets in the humanities, the social sciences and other disciplines. [domains marked: 3]
• EDGAR: The Emissions Database for Global Atmospheric Research provides independent estimates of the global anthropogenic emissions and emission trends. [domains marked: 1]
• EIDA: EIDA is a distributed data centre established to (a) securely archive seismic waveform data, and (b) provide transparent access to the archives by geosciences research communities. [domains marked: 1]
• eLaborate: eLaborate is an online work environment in which scholars can upload scans, transcribe and annotate text, and publish the results as an online text edition which is freely available. [domains marked: 1]
• European Climate Assessment & Dataset project: Presented is information on changes in weather and climate extremes, as well as the daily dataset needed to monitor and analyse these extremes. [domains marked: 1]
• Huygens ING: Huygens ING intends to open up old and inaccessible sources, and to understand them better. Huygens ING aims to publish digital sources and data responsibly and with care. [domains marked: 1]
• ICOS Carbon Portal: Data portal of the Integrated Carbon Observation System. It provides observational data from the state of the carbon cycle in Europe and the world. [domains marked: 1]
• ICTWSS database: The ICTWSS database covers four key elements of modern political economies: trade unionism, wage setting, state intervention and social pacts. [domains marked: 1]
• IISH Dataverse: The IISH Dataverse contains micro-, meso-, and macro-level datasets on social and economic history. [domains marked: 1]
• ISRIC - World Soil Information: ISRIC has a mission to serve the international community with information about the world’s soil resources to help addressing major global issues. [domains marked: 2]
• ISRIC Soil Metadata Catalogue: data.isric.org is the central location for searching and downloading soil data bases/layers from around the world. [domains marked: 2]
• KNMI Climate Explorer: The KNMI Climate Explorer is a web application to analyse climate data statistically. [domains marked: 1]
• KNMI Data Centre: The KNMI Data Centre (KDC) provides access to weather, climate and seismological datasets of KNMI. [domains marked: 1]
• Land Portal: The Land Portal collects metadata from statistical datasets relating to land, peer-reviewed articles and other research reports, national laws and policies, grey literature but also news, blogs and organization profiles. [domains marked: 3]
• Leiden Open Variation Database: LOVD portal provides LOVD software and access to a list of worldwide LOVD applications through Locus Specific Database list and List of Public LOVD installations. [domains marked: 1]
• LISS Panel: The LISS panel (Longitudinal Internet Studies for the Social sciences) consists of 4500 households, comprising 7000 individuals. [domains marked: 1]
• Longitudinal Aging Study Amsterdam: LASA focuses on physical, emotional, cognitive and social functioning in late life, the connections between these aspects, and the changes that occur in the course of time. [domains marked: 1]
• Maddison Project: Maddison’s work contains the Project Dataset with estimates of GDP per capita for all countries in the world between 1820 and 2010 in a format amenable to analysis in R. [domains marked: 1]
• Meertens Institute Collections: The focus is on resources relevant for the study of function, meaning and coherence of cultural expressions and resources relevant for the study of language variation within (Dutch) [domains marked: 1]
• MycoBank: MycoBank is a service to the mycological and scientific society by documenting mycological nomenclatural novelties (new names and combinations) and associated data. [domains marked: 1]
• Open Rotterdam Glaucoma Imaging Data Sets: To accommodate a wider scope of ophthalmic data, we launched our new Rotterdam Ophthalmic Data Repository. This portal has a successor in RODR (see below). [domains marked: 1]
• OpenML: OpenML is an open ecosystem for machine learning. OpenML is a platform to share detailed experimental results with the community at large and organize them for future reuse. [domains marked: 2]
• Penn World Table 9.0: PWT version 9.0 is a database with information on relative levels of income, output, input and productivity, covering 182 countries between 1950 and 2014. [domains marked: 1]
• PROFILES Registry: Patient Reported Outcomes Following Initial treatment and Long term Evaluation of Survivorship is a registry for the study of the impact of cancer and its treatment. [domains marked: 1]
• Pseudobase: Since the first discovery of RNA pseudoknots more and many more pseudoknots have been found. However, not all of those pseudoknot data are easy to trace. [domains marked: 1]
• Rotterdam Ophthalmic Data Repository: The Rotterdam Ophthalmic Data Repository contains data sets related to ophthalmology that the Rotterdam Ophthalmic Institute has made freely available for researchers worldwide. [domains marked: 1]
• SeaDataNet: SeaDataNet is a standardized system for managing the large and diverse data sets collected by the oceanographic fleets and the automatic observation systems. [domains marked: 1]
• Sound and Vision: Sound and Vision has one of the largest audiovisual archives in Europe. The institute manages over 70 percent of the Dutch audiovisual heritage. [domains marked: 1]
• SACA&D: Southeast Asian Climate Assessment & Dataset is focusing on the digitization and use of high-resolution historical climate data from Indonesia and other Southeast Asian countries. [domains marked: 1]
• STITCH 4.0: The Database explores the interactions of chemicals and proteins. [domains marked: 1]
• SURF Data Repository: The SURF Data Repository allows researchers to store, annotate and publish research datasets of any size to ensure long-term preservation and availability of their data. [domains marked: 4]
• SHARE: The Survey of Health, Ageing and Retirement in Europe is a multidisciplinary and cross-national panel database of micro data on health, socio-economic status and social networks. [domains marked: 2]
• ISO Data Archive: The Infrared Space Observatory (ISO) is designed to provide detailed infrared properties of selected Galactic and extragalactic sources. [domains marked: 1]
• The Language Archive: The Language Archive is storing a lot of unique material, from a large variety of languages worldwide, which is recorded and analyzed by researchers from different linguistic disciplines. [domains marked: 1]
• Tilburg University Dataverse: TiU Dataverse is the central online repository for research data at Tilburg University. [domains marked: 1]
• TRAILS: TRAILS is a prospective cohort study, with young people from the Northern part of the Netherlands. Information that spans the total period from preadolescence up until young adulthood. [domains marked: 2]
• TreeBASE: TreeBASE is a repository of phylogenetic information, specifically user-submitted phylogenetic trees and the data used to generate them. [domains marked: 1]
• UvA / AUAS figshare: The University of Amsterdam and the Amsterdam University of Applied Sciences cooperate to connect academic research with the insights and experiences from professional practice. [domains marked: 4]
• World Christian Database: The World Christian Database provides comprehensive statistical information on world religions, Christian denominations, and people groups. [domains marked: 1]
• World Religion Database: The World Religion Database contains detailed statistics on religious affiliation for every country of the world. [domains marked: 1]
• WorldClim – Global Climate Data: WorldClim is a set of global climate layers (climate grids) with a spatial resolution of about 1 square kilometer. The data can be used for mapping and spatial modeling in a GIS or with other computer programs. [domains marked: 1]
• YODA: Yoda publishes research data on behalf of researchers that are affiliated with Utrecht University, its research institutes and consortia where it acts as a coordinating body. [domains marked: 4]

Totals per domain: HSS 28, LS 24, NS 22, ES 8.

9 As of 1 September 2020, this will be the Radboud Data Repository, a university-wide repository.

If we compare the list above with the outdated ‘Mapping of the European Research Infrastructure Landscape’ (portal.meril.eu), which lists 57 research infrastructures in the Netherlands, and limit ourselves to the categories ‘Data Archives, Data Repositories and Collections’ and ‘Research Data Service Facilities’, we can add the following:

• Historical Sample of the Netherlands (HSN) (but appears in the IISH repository)
• Naturalis Biodiversity Center
• Cultural Heritage Agency (part of the Ministry of Education, Culture and Science)

In addition to repositories, the national data landscape is populated by various parties that deliver data facilities, for example to process and enhance data. Some of these facilities are aimed at a limited target group, others are available to all researchers in the Netherlands. For example, DANS’s services focus primarily on the social sciences and humanities, and SURF’s services are intended for researchers and institutions in the Dutch higher education and research sector. Statistics Netherlands’ Microdata catalogue also merits mentioning.

In addition to these bodies, there are collaborative organisations that deliver data services to a specific community or scientific domain in the form of Large-Scale Scientific Infrastructures. These include several popular facilities, such as Code Ocean, Dryad, Figshare, GitHub, Mendeley, Open Science Framework, and Zenodo. It has proved difficult to obtain an exhaustive overview within the scope of this project, let alone examine the extent to which Dutch researchers use these facilities.

Tools such as the Mendeley Data Monitor and Search services index more than 2000 data repositories worldwide, allowing researchers to search for datasets. In addition, datasets are linked to knowledge institutions and countries, making it possible to produce lists by country and by knowledge institution.

National data portal

The Netherlands’ National Data Portal (data.overheid.nl) lists the data made available by the Dutch government.
• More than 150 government organisations have published data on data.overheid.nl.
• Virtually all data is updated every night.
• The metadata standard for data sharing is W3C’s Data Catalog Vocabulary, DCAT.
• Datasets can be published on CKAN (an open source data portal platform).

The portal is maintained by the Knowledge and Resource Centre for Official Publications (KOOP) by order of the Ministry of the Interior and Kingdom Relations.
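To make the combination of DCAT metadata and a CKAN platform more concrete, the following Python sketch shows what a minimal DCAT-style dataset description and a CKAN search request might look like. It is an illustration under stated assumptions, not a description of how the portal is actually configured: the base URL of the API, the function names and the example values are hypothetical. The only elements taken from published specifications are the DCAT, Dublin Core and FOAF terms and the name of CKAN's `package_search` action. No network call is made.

```python
from urllib.parse import urlencode

# Assumption: the portal exposes CKAN's standard Action API under this base URL.
CKAN_API_BASE = "https://data.overheid.nl/data/api/3/action"


def package_search_url(query: str, rows: int = 5, base: str = CKAN_API_BASE) -> str:
    """Build a CKAN `package_search` request URL (nothing is sent over the network)."""
    return f"{base}/package_search?{urlencode({'q': query, 'rows': rows})}"


def dcat_dataset(title: str, publisher_name: str, access_url: str) -> dict:
    """Return a minimal DCAT dataset description as a JSON-LD style dictionary."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
            "foaf": "http://xmlns.com/foaf/0.1/",
        },
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:publisher": {"@type": "foaf:Agent", "foaf:name": publisher_name},
        # A dataset can have several distributions (files or endpoints).
        "dcat:distribution": [
            {"@type": "dcat:Distribution", "dcat:accessURL": access_url}
        ],
    }


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    record = dcat_dataset(
        title="Example soil measurements",
        publisher_name="Example government agency",
        access_url="https://example.org/data/soil.csv",
    )
    print(record["dct:title"])
    print(package_search_url("bodem", rows=3))
```

A shared vocabulary of this kind is what allows a catalogue to harvest and index dataset descriptions from many different government organisations in a uniform way.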

Note that the available public data is subject to significant constraints within the context of research. Most of the data is aggregated, and the source data (which makes correlations and comparisons over time possible) is missing (for reasons of privacy and business confidentiality). See the advisory report Hergebruik van publieke data: Meer wetenschap en beter beleid (KNAW 2019a, 24).

Research Data Exchange

The University of Amsterdam has started to flesh out the concept of data sovereignty, i.e. the possibility of sharing data under certain conditions.

Research Data Exchange (RDX) allows universities to share their research data in a responsible and secure manner. RDX builds on existing infrastructures, such as university repositories and disciplinary data platforms. Using this concept, knowledge institutions can ensure that much of the data for which they are responsible is archived in accordance with the FAIR principles. But that does not mean that the data is available to one and all.

Obstacles to data sharing:
• ownership, authorship and copyright cannot be properly managed;
• in the case of personal data, the GDPR and limited informed consent may impose restrictions;
• conflicting laws and regulations are in force (GDPR, DTM, PSI);
• an embargo period can be imposed;
• retention periods may be limited;
• there are restrictions on data use (purpose limitations, no dual use, no resale, etc.);
• universities fear a commercial lock-in if they disclose all their data without any constraints.

RDX makes it possible to share data in a responsible and secure manner. Under RDX, access to data can be regulated automatically, conditions of access can be legally defined, and compliance can be technically monitored and enforced.

RDX is an infrastructure that makes reliable research data transport possible while ensuring data sovereignty. Data holders determine what data can be shared with whom and under which conditions, including what processed data may be used after giving external algorithms access. RDX is not a data storage or data management facility. RDX is neutral regarding the data transported over the networks and only provides technical facilities for managing data sharing.
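The idea that data holders set the conditions and that access is then regulated automatically can be illustrated with a small Python sketch. It does not describe how RDX is actually built: the class names, fields and rules below are hypothetical and are chosen only to mirror some of the conditions mentioned above (who may receive the data, for which purpose, and whether an embargo applies).

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class SharingPolicy:
    """Hypothetical conditions a data holder attaches to a dataset."""
    allowed_institutions: FrozenSet[str]
    allowed_purposes: FrozenSet[str]
    embargo_until: Optional[date] = None  # no access before this date


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical request by an external party to use the dataset."""
    institution: str
    purpose: str
    requested_on: date


def is_access_allowed(policy: SharingPolicy, request: AccessRequest) -> bool:
    """Grant access only if every condition set by the data holder is met."""
    if policy.embargo_until is not None and request.requested_on < policy.embargo_until:
        return False  # still under embargo
    return (
        request.institution in policy.allowed_institutions
        and request.purpose in policy.allowed_purposes
    )


if __name__ == "__main__":
    policy = SharingPolicy(
        allowed_institutions=frozenset({"University A"}),
        allowed_purposes=frozenset({"replication"}),
        embargo_until=date(2021, 1, 1),
    )
    request = AccessRequest("University A", "replication", date(2021, 2, 1))
    print(is_access_allowed(policy, request))
```

In this sketch the decision rests entirely on the policy supplied by the data holder, which reflects the point that the exchange itself is neutral about the data and only provides the means to manage sharing.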


D. Support and training

To equip researchers and data stewards to manage and reuse research data properly, support and training are essential and must build on the knowledge development and dissemination efforts discussed previously.

It goes without saying that this support and training should preferably be offered in the vicinity of the researchers themselves, by persons who understand the nature and context of the research in question. The term data stewardship10 is relevant in this regard. NPOS Project F (Education and Training in Open Science and Data Stewardship) is building on the results of an LCRDM task force and a ZonMw project on data stewardship in the Netherlands (Scholtens et al. 2019). The LCRDM report (Verheul et al. 2019) distinguishes between ‘embedded data stewards’ (often operational, and positioned within a research unit) and ‘generic data stewards’ (positioned at institutional level). The ZonMw project distinguishes between data stewards’ three roles, namely ‘focused on policy’, ‘focused on supporting research’, and ‘focused on facilitating infrastructure’.

The DCCs at the knowledge institutions should be mentioned in this context. Data stewards play a role in local support and are therefore involved, directly or indirectly, in the local DCC. There is little disagreement about the DCC concept but how that concept will be fleshed out may differ from one knowledge institution to another.

The local DCC will therefore have to offer support and training, but in turn will require support and training and therefore call on organisations that provide relevant services, for example SURF. SURF has been assigned a coordinating role in the federated infrastructure of DCCs, aimed at delivering technical facilities on the one hand and, on the other, policy support, knowledge sharing, and coordination in cross-disciplinary matters. DANS, 4TU.ResearchData and similar parties will also provide such support.

Work is being done on a standardised range of technical facilities in liaison with SURF’s RDM Coordination Group. There is no coordination regarding the other generic support; more coordination in discipline- or domain-specific support and training could also improve efficiency and transparency, ultimately for the benefit of the researchers.

10 Responsible planning and executing of all actions on digital data before, during and after a research project, with the aim of optimizing the usability, reusability and reproducibility of the resulting data (https://www.lcrdm.nl/begrippenlijst)


Chapter 4. Organisations

In addition to the research institutions themselves, the data landscape is populated by a multitude of organisations that play an active role, from policymaking to funding and from delivering and providing access to infrastructure and data services to training and coordination in liaison with international organisations. A distinction can be made between organisations specialising in a specific scientific field (often linked to international domain infrastructures) and organisations adopting a cross-domain approach. This chapter reviews the broad spectrum of organisations that play a role in defining the Dutch data landscape.

As our review of data services (Chapter 3) shows, alongside the capacity of the national IT landscape, research data as a field of endeavour requires wide-ranging specialist expertise, jobs and roles to flesh out the data content and background specifically for each scientific field. The Netherlands is widely acknowledged as having built a wealth of experience and a broad knowledge base in the field of research data. At the same time, this review reveals a highly fragmented national data landscape.

Policy organisations

At national level, the ministries define the policy. The Dutch government’s 2017-2021 coalition agreement states that ‘open science’ and ‘open access’ would become the norm in academic research. In addition, in her 2019 science white paper, the Minister of Education, Culture and Science explains why she is making an extra investment in the digital infrastructure: ‘With the € 20 million investment, I want…together with stakeholders, [to] invest in data infrastructure for open science. For example, for the reuse of research data, one of the top priorities of open science, it is essential to strengthen the digital infrastructure.’ This investment supports the aim that ‘the Netherlands wants to continue to be a part of top-class science worldwide. This requires cooperation at the national and international level, between scientific and social partners and with businesses. A strong Dutch system with good research facilities improves the position of our researchers for working on global challenges together with leading scientists from other countries.’

The report The Value of Data - Policy Implications (Bennett Institute for Public Policy, 2020) aims to contribute to a better understanding of the value of data and how that value can be enhanced, in a broader economic and societal context. The report focuses mainly on public data, but is also relevant to research data.

The government must make choices when it comes to investing in and defining the parameters of data accessibility. The data economy must not become a market-driven economy.

‘Capturing the value from data will often need specific capabilities (e.g. data science and analytical skills, management know-how) or complementary investments (e.g. software, other capital equipment). Our interviewees consistently indicated that a lack of capabilities is a major barrier to capturing the potential value from data use.’ (p. 7)

‘Appropriate institutional and regulatory structures will be vital for a thriving data economy, regulating the permissions different types of entity have to access different types of data and monitoring and enforcing compliance. Work on the principles and structures of data governance for the maximum social welfare is in its early days and much more thought needs to be given to the specifics of regulatory and institutional design.’ (p. 38)

Two of the authors’ recommendations:
• ‘Provide a trustworthy institutional and regulatory environment. The value data has is dependent on the environment in which it exists. Institutions are needed to regulate who has access to data, monitor impact, and enforce compliance with regulation, technical standards and codes of conduct.’ (p. 42)
• ‘Simplify data regulation and licensing. Complex and overlapping regulation and intricate licensing schemes create uncertainties that hold back organisations from using and sharing data. Existing regulation should be simplified, new regulation should be coherent, and clear guidance should be provided.’ (p. 42)

Research funding bodies

In the Netherlands, NWO and ZonMw are the national public bodies that fund research. In addition, a significant proportion of medical research is funded by the health fundraising organisations, known collectively as the SGF.

The funding bodies’ role extends beyond merely furnishing money to conduct research. They can, after all, make demands on the recipients of funding and in that way encourage certain trends and developments. See the text box on the following page.

NWO. In addition to publications, research data resulting from projects funded by NWO must, as far as possible, be open and available for reuse. The adage is ‘as open as possible, as closed as necessary’. Matters such as privacy, public safety, ethical constraints, intellectual property rights and commercial interests may be arguments for departing from this rule. To ensure that data is open, it must be findable, accessible, interoperable and reusable (FAIR). To achieve this, NWO began working with a data management policy that became effective for NWO instruments on 1 October 2016. Specifically, this means that applicants must fill in the data management section of each funding application. After funding has been awarded, the applicants must also submit a data management plan.

ZonMw seeks to improve the impact of research output, including research data, on science and society. To achieve this, it must be possible to reuse research data to verify results or in future research. ZonMw thus requires researchers to apply research data management & stewardship (RDM) principles and to share their data in an effort to contribute to future, innovative research. ZonMw’s procedures for RDM are geared towards creating FAIR data generated in the high-end research projects that it funds through calls for proposals. ZonMw is developing its research data management policy in close consultation with GO FAIR. It is also working actively with DTL, Health-RI, PNN, VH, VSNU and SURF to develop competence profiles and training requirements for data stewards (NPOS Project F).

In ZonMw’s view, research funders are an integral part of the knowledge institution-data service-research funder triad. This view underpins the ZonMw FAIR data management method. The role of the research funder in that triad is:
a. to drive progress by setting the criteria and parameters for research funding. The result is that knowledge institutions have come to pay more attention to data stewardship and RDM support and training.
b. to focus on standards by setting criteria. ZonMw encourages researchers working in a chosen discipline/subdiscipline to select specific standards, techniques, infrastructure, portals, registers, etc. This promotes the interoperability of data and metadata and, consequently, their reuse.
c. to monitor outcomes. In ZonMw’s method, the responsibility rests with the knowledge institutions. ZonMw establishes the frameworks, focuses on standards and monitors outcomes according to 7 ‘key items’ that researchers must complete to show what their data management efforts have yielded at the end of their project. ZonMw uses this as (1) an accountability tool and (2) to chart the progress of a discipline/subdiscipline. This is one way to promote the use of standards.

The three parties of the triad form the ‘setting’ in which researchers can optimise their research, including open science and FAIR data. Researchers and data stewards at universities recognise and explicitly acknowledge the importance of the funder’s role: ‘Without your demands, we would not have done this’.

SGF. The health fundraising organisations are committed to disease prevention and cure and to providing patients with good care. They are a major funder of research and their aim is for that research to help prevent and cure diseases and improve quality of life. The SGF considers that the knowledge and findings arising from such research should be available to all free of charge. They should be available to the SGF’s own affiliates, which want access to the knowledge produced by research and are eager to know where their investments are leading, and to researchers and practitioners, who can use this knowledge in further research and to accelerate the development of useful applications. The SGF endorses the objectives of the open science movement and has adopted open science as a policy.

Research institutions

Research data management or RDM (increasingly referred to internationally by the broader term ‘data stewardship’) is a process closely related to the research process itself. Regardless of the field, researchers know the background of research data resulting from experiments or observations. As a result, research institutions play a key role in the national data landscape through their scientific departments and central research support services (usually libraries, IT departments and Technology Transfer Offices). At the same time, many institutions have only recently adopted a policy in this regard and have only limited budgets for data management and data stewardship. The field of research data covers a very broad and heterogeneous range of scientific and technical disciplines, each with its own background, set of instruments and funding channels, resulting in a broad array of solutions, expertise and services.

Most institutions now have the following research data management systems (in some cases as part of a larger research support system):
• General policy framework with FAIR data as an aim, in accordance with relevant legislation and regulations; this policy framework is or will be fleshed out in faculty policy or protocols.
• Some universities have carried out or intend to carry out an audit or survey to check compliance with the policy.
• An institution-wide project programme (for open science, research data management or research support) to implement policy, with or without extra funding.
• A central support desk (department/contact point/unit) (the institutions themselves see these as data competence centres [under development], in most cases taking the form of virtual collaboration between the university’s library and, as a minimum, its IT services division).
• Data stewards appointed to faculties or institutes. The appointments vary widely, with researchers in research departments often taking on the work (in some cases as an extra task).
• An important element of the support system is help in completing a data management plan (DMP), with or without a tool (DMPonline). PhD students at Erasmus University are required to update their DMP every year.
• Many of the websites offer useful information about research data management, in some cases including a catalogue of services.
• Storage facilities are provided at central and/or remote units. Some universities have their own data repository and/or their own GitHub or GitLab facility.
• Reference is also often made to facilities at SURF, DANS, and 4TU.ResearchData.
• In addition, researchers use a wide range of external open source, non-profit or semi-commercial facilities (such as Code Ocean, Dropbox, Dryad, EMBL-EBI, Figshare, GBIF/NLBIF, Genbank, Google cloud, HPC cloud, ISRIC WDS-Soils, Mendeley, OSF, Pangaea, PDB, Talkbank, Wormbase or Zenodo), but there is little overview. In most cases, the users are individual researchers.
• Electronic Lab Notebooks are increasingly being used to document work in laboratories, for example.
• There are also sites experimenting with Virtual Research Environments, digital project environments in which researchers can perform analyses in conjunction with other project participants (including running software on data and analysing computational cloud infrastructure).
• Registration of datasets in a central research information system. Radboud University has a facility for uploading data to DANS-EASY.
• Training courses and workshops are available covering policy-related and technical aspects of research data management. Increasingly, such training is being made compulsory (at least for PhD students).
• Some universities also offer bachelor’s and master’s students tuition in data science or data stewardship.
• A growing number of institutions have an open science community of researchers who are interested in sharing their experiences on social media and during low-threshold meetings.

Service organisations

4TU.ResearchData

This service was set up for the four Dutch universities of technology. 4TU.ResearchData’s main task is to maintain a data archive for long-term storage, access and curation of research datasets, with a focus on the technical sciences. The archive went live in 2010 and has since been managed as a service to which researchers (from universities around the world) can upload and share their datasets and from which other researchers can download and use data in their research. 4TU.ResearchData is hosted and managed by Delft University’s library, where datasets are locally stored on the university’s servers, and is governed by the university’s legal provisions.

DANS Data Archiving and Networked Services (DANS) is the Netherlands’ institute for permanent access to digital research resources. DANS is a joint KNAW and NWO institute and encourages researchers to make their digital research data findable, accessible, interoperable and reusable (FAIR). It does this by providing expert advice and certified services. DANS focuses on digital archiving. Its main services are DataverseNL for short-term data management, EASY for long-term archiving, and NARCIS, the national portal for research information.

By participating in Dutch and international projects, networks and research, DANS contributes to the ongoing innovation of the global scientific data infrastructure. For example, DANS participates actively in the international Research Data Alliance.

eScience Center The Netherlands eScience Center is the Dutch centre of expertise on research software. Set up by NWO and SURF, it develops calls for project proposals in which researchers from Dutch organisations across the entire spectrum of science can apply for budgets, in part to deploy their own staff, in all cases supplemented by the eScience Center’s own research software engineers. Wherever possible, software developed by the eScience Center is reused in projects.

The eScience Center is one of the driving forces behind the community of Research Software Engineers (RSEs), which is committed to professionalism and sustainability in research software (software stewardship, FAIR software). The eScience Center offers partners and research groups workshops, training courses and network meetings on digital methods, skills and research software.

Cooperation Due to the cross-domain or cross-institutional nature of research data and data management, a number of partnerships have been formed in recent years aimed specifically at pooling resources – including public-private ones – for the digitalisation of research and capacity-building in data stewardship. These partnerships are already driving a major process of coordination, international alignment and harmonisation of solutions.

Below, we distinguish between member organisations (or partner organisations), generic partnerships and specific partnerships.

Member organisations

SURF SURF is the collaborative organisation for IT in Dutch education and research. All publicly funded education and research institutions in the Netherlands are members and play a role in SURF’s governance.

In the context of open science, SURF aims to consolidate the entire digital research process by working with researchers and institutions to develop policy, infrastructure and tools. SURF offers a wide range of services for the various phases of the research data life cycle, i.e. the safe storage, management, sharing and reuse of data. It also provides high-performance computing and research data processing, analysis and visualisation services.

‘SURF will play a critical technical role in facilitating and coordinating a safe, federated system linking the DCCs. SURF will support the local and thematic DCCs on two fronts: First of all, in technical terms, it will facilitate and coordinate the safe, federated system linking the university DCCs. These are technologies and tools that will have already passed through the early stage of innovation and become federated in the roll-out stage. They will include support for facilities that the institutions make available to one another, with SURF playing a coordinating role. Second, SURF will use its expertise for the benefit of knowledge-sharing and coordination. It will support the DCCs in their policymaking, bring experts from different DCCs together and assist research support staff. These are indispensable elements for the formation of a safe federated network’ (unofficial translation of NWO 2019, 39-40).

RDM Coordination Group (i.e. RDM Expert Group) The RDM Coordination Group has been set up within SURF’s governance structure to coordinate the above developments (particularly the first item) with the institutions. The RDM Coordination Group consists of delegates from the relevant SURF departments (the IT directors at the universities, universities of applied sciences and UMCs) and from the UKB. The RDM Coordination Group focuses on generic RDM facilities being developed by SURF (and on connecting external facilities to SURF’s generic services). All this is based on an ‘architecture for RDM facilities’.

National Coordination Point Research Data Management (LCRDM) For the second item referred to above, knowledge-sharing and coordination, SURF has set up the LCRDM, initially at the request of the VSNU but now working with and for all higher education and research institutions. The LCRDM facilitates the interface between policy and practice. Experts from the institutions work together in the LCRDM to put RDM topics on the agenda that are too much for one institution to handle on its own and require a common national approach.

DTL Dutch Techcentre for Life Sciences (DTL) is a public-private alliance of more than 50 Dutch life science organisations united in the DTL Foundation. It acts as a network of professionals in academia and business who are working together to improve the Dutch life science research infrastructure, with a focus on accessible high-end technologies (shared research facilities), FAIR data stewardship, data science and expert training. DTL was one of the pioneers of the global FAIR data approach and founded the spin-off GO FAIR organisation, meant to support FAIR implementation worldwide and beyond the life sciences. DTL is an active participant in the development of Health-RI, meant to be a recognised national platform for health data. DTL also initiates and facilitates the Data Stewards Interest Group (DSIG), a community for data stewards that serves as a platform for informal knowledge-sharing.

DTL also functions as the Dutch node of ELIXIR, the European data infrastructure for the life sciences. ELIXIR’s partnership consists of 21 countries plus EMBL, uniting Europe’s leading life science organisations to manage, preserve and make optimal use of the growing amount of data generated by publicly funded research. ELIXIR-NL has been on the Dutch Roadmap for Large-Scale Scientific Infrastructure since 2015, although a sizeable funding award under this banner has not been forthcoming.

Generic partnerships

UKB The alliance of university libraries and the National Library of the Netherlands supports and accelerates progress in science by distributing, concentrating and connecting expertise in national and international networks. The UKB has a Research Data working group whose purpose is to foster knowledge-sharing in the UKB and to transfer that knowledge to the academic research community.

The aims that it has identified for the 2017-2020 period state: ‘We take a joint approach to the development of research data management services. We have a national network (LCRDM) and provide effective and efficient support at the local level. Researchers know where to go for findable, accessible, interoperable and reusable (FAIR) storage of research data during and after their investigation and where to turn with questions about ownership and legal aspects of research data management. In addition, we are developing innovative open science and data management training courses for young researchers. …We support data-intensive research through digital scholarship centres. Centres of this kind offer a range of facilities, including specialised tools, methods and techniques for collecting, processing and visualising data’.

SHB The SHB website only lists the services provided by the libraries of universities of applied sciences:
• Support and advice in archiving and making (raw) research data available via the university of applied sciences data repository for reference and (re)use.
• Support and advice in drawing up data management plans.
• Referrals to DANS, 3TU or other national and international repositories for very large files, special formats and/or long retention periods.

RD-NL Research Data Netherlands is an alliance between 4TU.ResearchData, DANS and SURFsara that focuses on sustainable data archiving and research data reuse. The three partners share a front office that provides access to storage capacity through archiving; the organisations’ repositories comply with CoreTrustSeal guidelines.

Dutch Data Prize The Dutch Data Prize is an RD-NL initiative. It is awarded annually to a researcher or research group who contributes to science by making research data available for new or additional research. Researchers can nominate themselves or another researcher or research group for the prize. The Dutch Data Prize is awarded in three categories: 1) Humanities and social sciences, 2) Exact and technical sciences, 3) Medical and life sciences. The 2020 Dutch Data Prize is being sponsored by DANS, 4TU.ResearchData, SURFsara, DTL, LCRDM, Netherlands eScience Center, Netwerk Digitaal Erfgoed, OpenAIRE and the UKB Research Data Working Group.

Specific partnerships

CLARIAH Common Lab Research Infrastructure for the Arts and Humanities (CLARIAH) is the distributed research infrastructure for the humanities and social sciences. CLARIAH is the infrastructure through which researchers gain access to large collections of digital data and to innovative and user-friendly data processing applications. Both the data and the applications are managed sustainably so that they will continue to be useful to scientists in the future, from literary scholars, historians and archaeologists to linguists, speech technologists and media scientists. CLARIAH is the Dutch node of CLARIN-EU and DARIAH-EU, the two humanities infrastructure programmes on the ESFRI Roadmap.

Health-RI Health-RI (‘enabling data-driven health’) is an initiative arising from the KNAW Agenda for Large-Scale Scientific Research Facilities (2016) as a synthesis of plans stemming from BBMRI-NL/EATRIS-NL, DTL/ELIXIR-NL and Maastro, later supplemented by NFU-Data4LS, NFU-Parelsnoer, CTMM-TraIT, ELSI service desk, X-Omics, GO FAIR and other projects. Health-RI is aimed at creating the national federated data infrastructure for biomedical and health research in the Netherlands. This includes biobanks (both national BBMRI-NL and PALGA, as well as local collections, mainly held at UMCs). The Health-RI Foundation was founded in January 2020, with financial support from NFU and ZonMw, among others.

UMCs are important drivers for Health-RI, but the board is pursuing an inclusive strategy, with Health-RI bringing together all the relevant stakeholders: knowledge institutions, hospitals, businesses, public organisations in the care chain, national quality registries, SURF, and so on. Important guiding principles for Health-RI programmes are the FAIR principles for data stewardship and the consolidation of federated analysis according to the Personal Health Train concept. Health-RI is setting up a portal of nationwide data services in healthcare and is committed to harmonising access to privacy-sensitive data (ELSI service desk). In terms of biomedical and clinical research, Health-RI focuses on the reuse of data generated by the care process, subject to specific reusability and consent criteria. This illustrates how the availability of data obtained in the field is a determining factor in designing a data approach within a specific domain of science, in this case health research.

ODISSEI ODISSEI (Open Data Infrastructure for Social Science and Economic Innovations) is the national research infrastructure for the social sciences in the Netherlands. Through ODISSEI, researchers have access to large-scale, longitudinal data collections that can be linked to Statistics Netherlands (CBS) records. This virtual network enables researchers to answer new interdisciplinary research questions and to investigate existing questions in novel ways. ODISSEI is a joint initiative undertaken by 34 partner organisations, including the project directors of large-scale data collections, Statistics Netherlands, SURFsara, CentERdata, and NWO. It is on the Dutch National Roadmap for Large-Scale Scientific Infrastructure 2016-2020 and was awarded relevant funding by NWO in spring 2020.

ODISSEI focuses on research data in the social sciences collected by observation or through links to data records. Employees of member organisations are entitled to several forms of compensation, such as discounts and grants for Statistics Netherlands’ Microdata and the LISS panel. Researchers can use the ODISSEI Secure Supercomputer (OSSC) to perform analyses of highly sensitive data – such as CBS Microdata – in SURF’s high-performance computing environment Cartesius.

Note: The above list of domain- or discipline-specific partnerships is not exhaustive. The Netherlands has approximately 30 funded large-scale scientific infrastructures, and data is an important factor across the board. We have only listed the alliances in which research data plays a prominent role in external exposure.

National knowledge institutions

Statistics Netherlands As the Netherlands’ main statistical office, Statistics Netherlands (CBS) provides reliable statistical information and data on socially relevant issues. In doing so, it supports public debate, policy development and decision-making, thus contributing to prosperity, welfare and democracy. CBS manages data from more than 200 government organisations and acts as a centre of expertise for big data analysis. The institution’s legal basis is the Statistics Netherlands Act. Its costs are funded from the government budget.

‘CBS’ objective in the period 2019–2023 is to upgrade the existing data infrastructure to the extent necessary to provide maximum support for the reuse of data. This will enable CBS to continue performing its role as a data hub and producer of statistics for the entire government within the statutory parameters. The benefits for society are likely to be substantial: data from government organisations can be permanently stored and managed at a single location, and then quickly made available by CBS for statistical analyses and research at national, regional and local level. This will enable many government institutions to achieve their ambitions to become data-driven organisations, while using existing facilities for data storage and data management. The result will be lower costs for society’ (CBS 2018, 64-65).

CBS has a Microdata Services facility (www.cbs.nl/microdata) that makes its own microdata sets available to researchers working at authorised research institutions, under strict conditions. Researchers can also upload their own datasets and link them to the microdata. CBS has been offering this service for almost 20 years. It is widely used by universities and other research institutions, making it an important ODISSEI component that supports upscaling through the ODISSEI Secure Supercomputer.

KNMI The Royal Netherlands Meteorological Institute (KNMI) is the national research and information centre for meteorology, climate, air quality and seismology. It makes its high-quality knowledge and information operationally available 24 hours a day, seven days a week to government, policymakers, the aviation sector, businesses and the public. KNMI is an agency of the Ministry of Infrastructure and the Environment. Its unique task is to collect information on the atmosphere and the subsurface and interpret that information in terms of risks to society. It continuously extends and deepens this knowledge in cooperation with research institutes, universities and businesses. The KNMI’s supercomputer has 39 terabytes of working memory. The agency receives 1500 gigabytes of raw data every day.

‘The KNMI provides easy access to data, products and information in accordance with the open government principle. Because meteorology, climate and seismology are of growing importance to almost every group in society, making data and products freely available supports the development of economically valuable and sustainable innovations’ (unofficial translation of KNMI-strategie 2015–2020, 6).

RIVM The Dutch National Institute for Public Health and the Environment (RIVM) is committed to a healthy population in a sustainable, safe and healthy living environment. Scientific research is fundamental to all of RIVM’s tasks. Its expertise covers three domains: Infectious Diseases and Vaccinology, Public Health and Health Services, and Environment and Safety. The RIVM website (https://www.rivm.nl/gezondeleefomgeving/data) furnishes an overview of data, graphs and maps covering the Living Environment and Health. Much of this information is available for reuse.

Commercial parties All the major scientific publishers active in the Netherlands furnish data services in addition to their catalogue of publications. Elsevier has the Mendeley Data service, Springer Nature cooperates with Figshare and Wiley with Dryad. The big publishers generally claim to support open data and open research results and say that they are offering their services to make sharing easy for researchers. In many cases, they do not charge researchers for using and depositing datasets in their repository (within certain parameters, such as the format and sometimes the size of the dataset), but researchers do not always know what happens to the data once they have deposited it.

Academic publishing is in a state of flux and non-university publishers operate in a market driven by private interests and investment companies. The Holtzbrinck Publishing Group, for example, owns Nature, Springer and Macmillan, as well as Digital Science, a technology firm that furnishes various services to researchers and research organisations, including Figshare.

In addition to the academic publishers, there are Amazon, Google and other major IT companies that provide science-related services. Amazon, for example, provides all kinds of data storage, analysis, archiving and exchange services. Researchers also use Google products such as Dataset Search, Google Drive and Google Scholar. Dropbox and other firms also focus on academic researchers.

It goes beyond the scope of this study to discuss all the available commercial data services, not least because this market is very volatile. The fact is that providing research data facilities is an interesting investment opportunity, and that is precisely what may be cause for thought: it is not so much the cost as the ethical and legal implications that may make it necessary to turn to public alternatives, or to a form of collective procurement where the parties not only negotiate the price but also the conditions under which the service is provided so that public values are safeguarded.

Commercial publishers It is not yet clear what role commercial publishers will come to play in all this. The relationship between research organisations and publishers is under strain when it comes to publications. The strain appears to be extending to research data management, as several publishers are also (or will be) offering RDM services, whether or not in conjunction with collaborative platforms.

The Scholarly Publishing and Academic Resources Coalition (SPARC) recently published a landscape study and warns of possible adverse effects. ‘This report was commissioned in response to the growing trend of commercial acquisition of critical infrastructure in our institutions.’ (SPARC 2019)

Chapter 5. Regulatory aspects

This chapter discusses:
- which laws and regulatory provisions apply to research data management and where researchers can learn how to deal with legislation that touches on data management;
- sources of data management guidelines (both generic and discipline-specific, national and international) and how well they are adhered to and enforced.
Appendix 2 provides more details regarding existing regulatory provisions.

Guidelines In addition to their own research requirements, how researchers handle their data also depends on the requirements laid down in legislation, regulatory provisions and guidelines. Legislation may be determined at national or international level and applies to all researchers. Guidelines may be generic or (discipline-)specific, mandatory or recommended, and be set by a research funder, a knowledge institution, a publisher, a facility or a community. Guidelines describe how (where) and for how long data should be stored, to what extent and when data should be made available and to whom, and what documentation, metadata and analysis software / scripts should be made available to enable interpretation of the data.

Note: Not all researchers are familiar with the guidelines that apply to them. At times, different sets of guidelines may also be contradictory, requiring harmonisation and better accessibility.

Various research funders (including NWO and ZonMw but also the European Commission under the Horizon 2020 FP) have funding rules stipulating how researchers must deal with data in funded projects. In terms of discipline-specific guidelines, these funding bodies rely on local expertise at knowledge institutions: local RDM support provided by the DCCs helps researchers to draw up and monitor their data management plans. The enforcement of data management policy is based on an assessment of the data management plan prior to the research commencing.

Knowledge institutions all have their own data management policies, anchored in institutional/institute/faculty guidelines, the collective agreement / employment contract with researchers and the Netherlands Code of Conduct for Research Integrity. The Standard Evaluation Protocol (and to a greater extent its successor, the Strategy Evaluation Protocol) also contains data management provisions within the context of quality assurance. In many cases, institution-wide framework guidelines are drawn up and then elaborated by the faculties. A number of knowledge institutions also require a data management plan prior to the start of any research project that is receiving direct funding. Institutions do little or nothing to enforce data management policy and generally have little idea of the level of compliance with their policy. However, there is strong evidence that the use of data management plans has raised awareness of the importance of data management and has therefore improved its quality. Knowledge institutions can give open science-related performance agreements with researchers a legal basis in their data management policy and employment contracts.

The UMCs use the Handbook for Adequate Natural Data Stewardship. The updated version (HANDS 2.0) addresses new legislation and regulatory provisions, for example covering privacy, and gives researchers a practical toolbox for responsible data management.

There are no signs that discrepancies in data management rules and guidelines significantly impede data sharing between knowledge institutions. The various guidelines share enough common ground (code of conduct, collective agreement) and consequently differ mainly in the details. This may well be desirable, as they can be better attuned to the local data infrastructure and research priorities of the various knowledge institutions.

Infrastructures can also dictate data management guidelines. Data infrastructures often have rules and requirements for using and depositing data. Physical research facilities also often place demands on the management and availability of data generated by their use. Adherence to these guidelines is often essential for the infrastructure’s performance and maintenance and is generally strictly enforced.

Publishers of scientific journals also often adopt guidelines for sharing the data that underpins the conclusions of a publication. The Transparency and Openness Promotion (TOP) guidelines,11 agreed in 2015, have been a driving force behind this trend. Various publishers now back this initiative. Publishers vary in their level of stringency. A prestigious journal such as PLOS ONE, for example, is very strict and will no longer accept articles that do not include a data availability statement. Other publishers are more reserved. There is some debate as to what role publishers should play in this regard. Some critics are worried that, in their search for new revenue models, publishers will increasingly position themselves as players in the research data arena. Public knowledge institutions would do well to consider the extent to which they wish to collaborate with (commercial) publishers in that respect. This is one of the issues currently under review in contract negotiations with Elsevier.

Legislation The General Data Protection Regulation (GDPR) became effective on 25 May 2018 and applies throughout the European Union. The GDPR is directly applicable in the Netherlands. Matters in which the GDPR allows for national discretion have been fleshed out in the Dutch GDPR Implementation Act (UAVG).

The GDPR and the UAVG provide for explicit derogations regarding the use of personal data in scientific research. Article 24 of the UAVG, for example, states that ‘…the prohibition on processing special categories of personal data does not apply if: a. processing is necessary for scientific or historical research purposes or statistical purposes in accordance with Article 89(1) of the Regulation; b. the research referred to in a. serves a public interest; c. it is impossible or would involve a disproportionate effort to request express consent; and d. safeguards have been put in place for the processing such that the data subject’s privacy is not disproportionately compromised’.

It is mainly personal data legislation such as the GDPR that gives researchers headaches. Many researchers, including those who work with personal data, are not fully aware of what the law obliges them to do and prohibits them from doing. Concerns about GDPR non-compliance can also hamper data sharing; because GDPR-compliant sharing of data from research involving humans calls for an investment (time at the expense of research time, money) and specialist knowledge, researchers may be hesitant to do so.

11 https://science.sciencemag.org/content/348/6242/1422.full

Researchers can seek advice on personal data and other legislation from various sources. The most obvious source would be their local DCC, but according to a number of DCCs, GDPR issues are so complicated and specialised that they cannot always offer researchers adequate assistance. Local DCCs share GDPR information and best practices through the LCRDM, for example, but several would like to see more pooling of expertise. Some of the necessary expertise overlaps and some is specific to the social sciences and medical/healthcare sciences. Closer cooperation between these disciplines would be advisable. A national coordination centre may be a springboard for achieving this. It can serve as a collection point for best practices (as LCRDM is currently doing), organise generic training courses and provide ad hoc support on specific subjects.

Netherlands Code of Conduct for Research Integrity The Netherlands Code of Conduct for Research Integrity was revised in 2018. The preamble refers to the evolving nature of research and research practices and the growing importance of the way data is used and managed and developments in the field of open science. The Code itself defines standards for good research practices and formulates institutions’ duty of care.

Under the Code, institutions are themselves responsible for providing a data infrastructure, for storing research data permanently, and for adhering to the FAIR principles, even if they have outsourced some or all of these duties.

Standard Evaluation Protocol KNAW, NWO and VSNU have agreed on a protocol for evaluating research quality in the Netherlands. The current Protocol (effective 2015-2021) will be replaced by a new version (effective 2021-2027) in the course of 2020.

Under the current protocol, the assessment committees, in considering research integrity as a factor, also focus on how the research unit deals with data and data management. In its self-assessment report, the research unit must therefore describe how it ‘deals with and stores raw and processed data’. With regard to output indicators, the research unit may also state how many/which datasets it produced during the assessment period and to what extent the datasets have been used.

The new protocol, known as the Strategy Evaluation Protocol, devotes much more attention to open science. A research unit’s self-assessment should address four specific aspects, with open science being listed first. The assessment covers the involvement of stakeholders, data use, and open access to publications and other products of research.

‘The assessment committee considers the extent to which the research unit involves stakeholders, if possible and relevant, in the preparation and execution of the aims and strategy. It also considers to which extent the research unit opens up its work to other researchers and societal stakeholders in the context of its strategy and policy. Furthermore, the committee considers whether the research unit reuses data where possible; how it stores the research data according to the FAIR principles; how it makes its research data, methods and materials available; and when publications are available through open access. Even if Open Science was not yet considered by the research unit for the past period, the assessment committee evaluates the unit’s considerations and plans for the future with regard to Open Science.

In the self-evaluation, the research unit reflects on how it involves stakeholders, to which extent the research unit opens up its work to other researchers and societal stakeholders, how it pays attention to other aspects of open science and what its future plans are in this respect.’ (SEP 2021-2027, p. 9)

Protocol for Quality Assurance in Research at Universities of Applied Sciences The universities of applied sciences conduct applied research meant to improve the quality of UAS graduates, ensure that education remains responsive to the needs of society, and innovate professional practice.

The joint Dutch universities of applied sciences have a ‘Quality Assurance System for Applied Research’ in which the Protocol for Quality Assurance in Research 2016-2022 plays a key role. The Protocol only mentions the word ‘data’ once, in the table of indicators showing use and recognition of UAS research output, i.e. ‘using research data in knowledge development’.

Chapter 6. Governance

The previous chapter dealt with legislation and regulatory provisions, i.e. indirect governance. This penultimate chapter describes how governance of the transition to FAIR data is currently organised in the Netherlands in terms of decision-making, research funding (and thus parameters), consultative structures, and so on.

We begin our discussion by quoting a recommendation made during one of the working visits to the universities: ‘The governance of research data management (i.e. its implementation) must follow the same line as the governance of research. It is also preferable to flesh out the prerequisites along that line’.

Governance of research at national level

We begin this chapter by looking at the governance of research at the national level and then zoom in. At the national level, we can distinguish between: (1) policymaking, (2) funding and (3) conduct of research. These three aspects are distinct but inextricably linked.

Re 1) National, overarching policy is the exclusive preserve of the relevant ministries (especially Education, Culture and Science, but also Economic Affairs and Climate Policy, Agriculture, Nature and Food Quality, and Health, Welfare and Sport). The ministries are responsible for research policy (the Higher Education and Research Plan issued every four years). The Minister of Education, Culture and Science may impose quality assurance conditions on research funding furnished directly to research institutions.

Re 2) Research funding is a Government responsibility, as evidenced by the block grant for designated institutions, but much of that responsibility is also entrusted to NWO and ZonMw (the latter for health research).12 The Minister of Education, Culture and Science determines the Science Budget. NWO’s statutory tasks13 are as follows:

12 The task of ZonMw (or rather its legal predecessor, ZON) is described in the ZON Act, i.e. ‘seeing that projects, experiments, research and developments in the field of health, prevention and care are carried out, funded or awarded. In doing so, the organisation monitors quality and coherence and also promotes use of the results’.
13 Article 3 of the NWO Act.

1. NWO promotes the quality of scientific research and initiates and encourages new developments in scientific research.
2. NWO performs this task specifically by allocating funding.
3. NWO also promotes dissemination of the results of the research it initiates and encourages for the benefit of society.
4. In performing this task, NWO focuses primarily on university research, taking special note of matters of coordination and promoting these where necessary.

Re 3) The responsibility for conducting the research lies with the institutions themselves. Within the above-mentioned overarching policy, the designated institutions have their own competences. The Higher Education and Research Act (WHW) states that universities must conduct research, train researchers and transfer knowledge for the benefit of society. The same applies to the other research institutions, university medical centres and universities of applied sciences. Academic freedom is embedded in the Act.

Overarching organisations such as VSNU, NFU and VH serve the institutions concerned and, within the context of this study, advocate (e.g. in negotiations with publishers) and speak on their behalf. They appear less inclined to pursue a joint, explicit policy on research data.

The knowledge institutions (i.e. their overarching organisations) have jointly identified good research practices and institutions’ duty of care in the Netherlands Code of Conduct for Research Integrity. In doing so, they confirm that they are responsible for providing the right support.

The NPOS Steering Group (in which the Ministry of Education, Culture and Science, NWO, VSNU and NFU represent the three aspects of policymaking, funding and conduct of research) has stated its wish to coordinate the development of the national data landscape.

International representation

Incidentally, Dutch representation in international bodies in this domain merits further attention and could certainly be more clearly defined or streamlined.
• Most international domain infrastructures (such as ESFRIs and ERICs) have national nodes. In general, such nodes are reasonably well established and participate in the relevant infrastructure on behalf of the country concerned. There should be some concern about the Netherlands’ policy on the funding of its participation in international infrastructures, given the lasting role that a number of these infrastructures play in research data management.
• The current agreements within the EOSC with regard to the Executive and Governance Boards are clear. It is telling, however, that the national liaison group has 24 participants. And en route to the new EOSC partnership structure (effective from 2021 onwards), the question is which organisation within that structure will be ‘mandated’ from the Netherlands.
• The participants are also expected to deliver input for international projects on behalf of the participating countries, whereas they have often not been so empowered.
• The Netherlands’ involvement and representation in international organisations such as CODATA, GO FAIR, LIBER, OECD, RDA, Sparc Europe, UNESCO, and WDS varies widely.
• To the extent that member organisations (such as CESAER and LERU) do not have national organisations, representation is not an issue.

Support at national level

Research support for institutions at national level – above and beyond the support that they organise internally – can be divided into roughly three categories: (1) technical infrastructure, (2) data services and (3) software development. The three are inextricably linked.

Re 1) The technical infrastructure and the IT services surrounding it are the responsibility of SURF, which was established several decades ago and has more recently become a cooperative formed by all affiliated Dutch educational and research institutions (more than 100). Since the knowledge institutions themselves are members of this cooperative and are represented in its Members’ Council, their responsibility and duty of care empower them to influence the delivery of infrastructure services and their innovation. This was a particular area of focus during the recent overhaul of SURF’s governance structure in 2019.

This view was echoed by SURF’s Scientific Technical Council (WTR, 2020) in a recent advisory report recommending that SURF should play a leading role ‘in a data-oriented service infrastructure and be empowered to make innovative use of that infrastructure for education and research’. It should be noted, however, that this mainly applies to the IT facilities for research data management.

Re 2) Data services in the Netherlands are provided by DANS (a joint KNAW/NWO institute), DTL (network organisation for the life sciences) and 4TU.ResearchData (a joint initiative of the four universities of technology but in legal terms a service furnished by Delft University of Technology). SURF also increasingly delivers research data services. These organisations vary in their tasks and governance structures. There are also data services organised within the framework of the Netherlands’ national large-scale research infrastructures, for example in CLARIAH, ODISSEI and Health-RI (and previously, the data4lifesciences programme).

The repositories referred to in Chapter 3 should also be mentioned in this context. Some are embedded in institutions, while others function as more or less temporary partnerships. Once again, the governance structures vary considerably. SURF has the delivery of data services in its service portfolio and also ‘plays a critical technical role in facilitating and coordinating a safe, federated system that should link the DCCs to one another’ (see NWO’s 2019 investment implementation plan for the digital infrastructure).

Re 3) The Netherlands eScience Center is the nationally funded centre for the development of research software, established by NWO and SURF and in that sense given a national mandate. The purpose of the eScience Center is to improve software design and use in all disciplines. It pursues its mission along two lines: 1) By issuing calls for project proposals focusing on innovative software solutions, to be funded from the national science budget. 2) By running a dissemination and outreach programme concerning software sustainability and other matters. The eScience Center will not be managing and/or maintaining software itself. That is the responsibility of the institutions. It will, however, actively support the development of software competences and quality, for example by training local DCCs and software engineers, giving workshops on machine learning and so on. The eScience Center plays a coordinating role in this regard. Capacity-building is the responsibility of the institutions. In terms of software, in other words, the institutions remain responsible.

Governance of the data landscape in other countries

In Germany, the joint authorities (the federal government and the Länder together) are investing heavily in establishing a National Research Data Infrastructure (NFDI). Between 2019 and 2028, they will invest some €90 million annually in setting up 30 disciplinary or thematic consortia to achieve their open science and FAIR data objectives, deliberately opting for a coordinated, research-driven, bottom-up approach. A national Directorate and a Scientific Senate have an important role to play in the NFDI structure and in ensuring synergy between the consortia.

In Flanders, the government decided at the end of 2019 to create a Flemish Open Science Board (FOSB) and to invest €5 million annually. The FOSB is responsible for designing the entire open science ecosystem. The Flemish research council (FWO) will play a coordinating role in this. Flanders will set up the Flemish Research Data Network (FRDN), consisting of research institutions and an operational unit hosted by the FWO. The FRDN will have a Knowledge Hub, a Data Hub and a Discovery Hub. A Coordination Hub will manage the three hubs and monitor the FRDN’s internal coherence. The Coordination Hub members will be representatives of the institutions, which will provide specific input concerning RDM and open science.

In Denmark, a law was passed in 2012 establishing a virtual organisation under the Ministry of Education and Science. This organisation, the Danish e-infrastructure Cooperation (DeiC), coordinates the Danish digital research infrastructure for the eight Danish universities and provides networking, storage and computing facilities. DeiC also coordinates Denmark’s participation in regional (Nordic) and European e-infrastructure organisations and projects. DeiC has a Board whose members represent the universities and it is headed by a director who reports to the DeiC Board.

In Sweden, the Swedish government in early 2020 mandated the Swedish Research Council (SRC) to coordinate national efforts to make research data accessible.14 The aim is to round off implementation by 2026 at the latest. In the first three-year period, the SRC will issue guidelines (recommendations, checklists, examples), tighten up funding requirements, undertake awareness-raising activities, provide for training and knowledge-sharing, and resolve practical and technical issues.

In its report on the national EOSC nodes, the e-Infrastructure Reflection Group concludes that ‘the (organisation of the) national e-Infrastructure landscape varies considerably between the countries’ (E-IRG 2019, 21) and consequently recommends that ‘Members [sic] states and associated countries should continue to increase the level of coordination between and consolidation of the various national players on e-Infrastructure provisioning’ (E-IRG 2019, 22).

Local level

Setting up a Data Competence Centre (DCC) at each knowledge institution (as the primary internal and external points of contact for data issues) will impose some structure on the national data landscape. Meanwhile, those units in knowledge institutions that are active in their own data management peer networks (policy and governance support, library, IT services, information management, legal affairs) will remain so, and while this is crucial (when it comes to exchanges on specific fields of expertise), it can also be confusing.

Incidentally, local governance also merits reflection, as each institution has its own approach. This is a topic that lies beyond the scope of this study, however.

KNAW’s report on big data in research involving personal data (KNAW 2018) makes the following recommendation to the Ministry of Education, Culture and Science: Take the lead in establishing a national infrastructure to support and complement the local infrastructure in collecting, processing, sharing and analysing big data on persons. This entails having data specialists and legal and ethical experts provide facilities and support that go beyond the scope of individual institutes or faculties. The infrastructure is intended to facilitate cooperation within the scientific community and with public- and private-sector partners. The infrastructure can only be created if extra investment is made available for the long term.

Consultation structures

Finally, we survey some national networks and consultation structures and their role in governing the national data landscape. There appears to be some overlap here, in any event.
• DataverseNL Advisory Board: The Board consists of representatives of the institutions that use the Dataverse software developed by Harvard University (USA). DANS has managed the network since 2014; the participating institutions manage the data deposited in the local repositories. Each institution participates in the Dataverse Advisory Board, which determines the policies of the service. Local managers on this Board share their experiences and work together on developing RDM good practices. Institutions pay a fixed amount to participate in DataverseNL.
• Coordinating SURF Contacts (SURF): Each institution that is a SURF member has its own representative at SURF: the Coordinating SURF Contact (CSC, usually a CIO, IT Director or Information Manager). This contact person streamlines and coordinates communication between their institution and SURF. The CSCs regularly consult with colleagues in their own sector (there are five sectors: universities of applied sciences, research universities, secondary vocational educational institutions, UMCs, and research institutions/other). The chairpersons of the five sector groups meet regularly to coordinate their efforts. They also meet with the SURF Board of Directors four times a year.
• Implementation network of DCCs (under development): The persons responsible for establishing (or expanding) a DCC within their own institution have united in an implementation network (based on the GO FAIR approach).
• LCRDM: The National Coordination Point Research Data Management has an advisory group and task groups that focus on specific issues. More than 200 persons are currently involved in these groups.
• Open science contact groups: This group was initiated by the VSNU. It is not clear how participants are selected.
• Coordination Group for RDM facilities: NWO’s 2019 investment implementation plan for the digital infrastructure says the following about this group: ‘The proposal is for the existing consultation structure involving the SURF Board and the knowledge institutions (the participants being the CIOs of universities and UMCs and the directors of the university libraries) to function as a forum in which the institutions jointly identify their needs, develop a plan, and draft an agenda for underlying projects. In 2019, this role will be fulfilled by the Coordination Group for RDM Facilities. …This existing group is part of SURF’s governance, with the SURF Members’ Council initiating the process on 13/6/2019 to develop a proposal to adapt this part of its governance structure to the new governance requirements. This process will be completed in late 2019.’ SURF’s new governance structure has become effective and the coordination group will become an expert group in the summer of 2020.
• UKB Research Data Working Group: The purpose of this working group is to foster knowledge-sharing in the UKB and transfer that knowledge to the academic research community.

14 https://www.vr.se/english/mandates/open-science/open-access-to-research-data/progress-in-the-work-of-implementing-open-access-to-research-data.html

In the blog ‘New Landscapes on the Road of Open Science: 6 key issues to address for research data management in The Netherlands’ (November 2019), the authors (Delft University of Technology) argue in favour of transparent governance and closer coordination: ‘The responsibilities for research data management are often shared between different departments within a university – the library, ICT, legal, research support. These existing silos make it difficult for universities to provide the frictionless support for their research communities. All of us working in support services should be collaborating to see how we can make workable connections between these silos.

But these institutional boundaries also manifest themselves at a national level. Many of the librarians congregate around LCRDM, the Surf CSC group rounds up the ICT managers, while the big decisions at NPOS are taken by senior policy players. Nevertheless, these stakeholders are still dealing with the same fundamental concerns about Open Science and research data – all of them are travelling the same road.

So we need much better coordination, and smarter routes of governance. We can start by being more transparent. What is each organisation doing, what is its role and responsibilities, where is it going? This is the first milestone in openness. And once we have that we can move on with the coordination and governance issues. Do we look to leadership from our government (OCW), or at least make firm proposals to them, perhaps in exchange for more financial stimulus? Or do we develop grass-roots communities of governance that move more quickly but risk leaving some stakeholders behind?’

Conclusion

From the above, as well as from the comments made during the panel meetings (Appendix 3) and working visits (Appendix 4), it is clear that, at the very least, the governance of the data landscape raises various points of concern regarding: a) international representation; b) the position and management of various national data service organisations; c) overlaps in the network and consultation structures.

Much remains to be done to achieve the objectives relating to open science in general (as the ‘norm in academic research’) and FAIR data in particular, and to maintain and improve alignment with European initiatives in this regard. The good news is that many organisations and associations in the Netherlands are currently working hard on this. The bad news is that we are not always aware of what the others are doing and therefore often work at cross purposes.

The basic premise is that knowledge institutions are themselves responsible for conducting research, achieving their goals and complying with legislation and guidelines, including those pertaining to open science and FAIR data.

In a bid to achieve their open science aims, the knowledge institutions have embraced the National Open Science Programme. The NPOS Steering Group has expressed a wish for coordination in developing the national data landscape.

At some point, a bottom-up strategy will require coordination

We are used to the bottom-up approach in the Netherlands (and to its advantages) and we should cherish it, even when it comes to research data. That is why it is so crucial for the disciplines, the knowledge institutions and large-scale scientific infrastructures to own their responsibility.

At the same time, it is clear that if we have too many initiatives in a single field, we risk losing clarity, efficiency, overview and connectivity. In essence, there is nothing wrong with all these wonderful initiatives, but we need something more (call it a ‘catalyst’) to achieve that clarity, efficiency, overview and connectivity from a basis of support and with authority.

After all, all the existing initiatives and organisations were conceived with good and legitimate intentions, but it is unrealistic to expect them to bring about the hoped-for improvement in the data landscape on a national level.

The very point of national coordination is to offer researchers and institutions working in the fragmented landscape clear and more efficient support.

With regard to such coordination, we propose the following:
• National Open Science Cloud: The Netherlands requires a clear and comprehensive overview of all national data services, how they overlap and any gaps that may exist. This ‘National Open Science Cloud’ (or ‘Coalition’ or ‘Commons’) will allow service providers to cooperate more closely and will improve interoperability and quality (e.g. by means of certifications), at first in the form of a network structure that could culminate later in a National Data Centre.
• National centre of expertise: The available knowledge and expertise regarding FAIR data (both generic and discipline-specific, including accepted agreements, protocols and standards) can be made more accessible and authoritative by introducing an official quality label, so that such knowledge and expertise can be more easily referenced and so that a more uniform conceptual framework will emerge that can also be used for advisory purposes and in educating/training researchers and research support staff. This proposal can build on what the LCRDM has already achieved and on the forthcoming results of NPOS Project F (Education and Training in Open Science and Data Stewardship).
• National node (for international liaison): The previous point is especially pertinent when it comes to the (canalised) relationship between what is available at national level and what is available and being developed at European or international level, in research infrastructures, projects, and networks and alliances.
• FAIR Data Programme Board: A programme board will be established to ensure that the knowledge institutions are involved in and can influence the aforementioned services and developments. The members of the Programme Board will include representatives of research funding bodies, knowledge institutions and large-scale research facilities, both researchers and research support staff. The Programme Board will advise on the policy agenda or programme for the NPOS FAIR Data programme line (subject to the approval of the NPOS Steering Group) and may replace some existing consultation structures.
• A small and temporary data coordination and expertise centre is needed at national level to encourage and support these initiatives.

Thoughts on the above proposals

How fragmented is the data landscape?

Almost everyone acknowledges that the data landscape is fragmented. Opinions differ as to how serious this is and not everyone experiences it in the same way. Producing (statistical) evidence would require a very thorough (and expensive/time-consuming) study. A great many people are active in the field of FAIR data, but the general feeling is that we are somehow coming up short. All the knowledge institutions have their own teams working to resolve the same issues. There are also quite a few FAIR data consultation structures whose scope and mandate are unclear to many, leading to overhead spending at the national level that could be used in other ways. The underlying problem is that decision-making on collective issues is not properly organised, a problem that has arisen in the LCRDM, for example.

Why establish one or more additional bodies if the landscape is already fragmented?

The bottom-up approach is crucial and must be cherished. At the same time, it is leading to a situation where there is nothing to keep the different parties from working at cross purposes and from growing apart. Every existing organisation in the data landscape was set up and is maintained for valid reasons, but none of these organisations (which are often focused on implementation) has the natural authority to create more synergy and connectivity, not least because each one puts its own interests first (as well they should). That is why an additional body is needed temporarily to help the NPOS Steering Group to create more synergy and connectivity in the data landscape. As shown in the illustration, the three virtual networks are not additional organisations but coalitions within which efficiency, synergy and connectivity must be achieved. Because this is ultimately in the interests of Dutch science, the NDCE should be joined by a Programme Board consisting of representatives of the research funding bodies, knowledge institutions and large-scale scientific infrastructures.

How will all this relate to the existing organisations?

The organisations cited in this study will all have a place in the new structure. What that place will be is under discussion, not least with the organisations themselves.

How does this tie in with NWO’s investment implementation plan for the digital infrastructure?

NWO’s investment implementation plan justifiably argues in favour of a federated network of DCCs. The DCCs are now in the process of being set up at all the knowledge institutions. SURF has been assigned the role of technical coordinator and facilitator. The optimisation proposed in this report goes a step further and focuses on the data landscape as a whole.

But isn’t science an international endeavour?

True, but at the same time it is still organised nationally in many respects (legislation, code of conduct, funding), and the EU and OECD also rely heavily on national-level organisations.

Chapter 7. Conclusions and recommendations

This chapter summarises our most important findings. Our focus is the national landscape for research data. We conclude the chapter with recommendations, including a proposed roadmap for the National Open Science Programme’s FAIR data programme line.

It has proved difficult in our project to acquire a detailed and exhaustive overview, on the one hand because the subject of ‘data landscape’ or ‘data services’ cannot be clearly delineated and on the other because many matters are in fact organised, but not always in an obvious way. We have attempted to convey this in Chapters 2 to 6. Our conclusion, in brief, is as follows: The Netherlands has an abundant but fragmented data landscape, not only in terms of data services and organisations but also with regard to the development and dissemination of knowledge concerning research data management. The risk is that there will be overlaps and inefficiency and that we will miss out on opportunities for connectivity and innovation. Researchers themselves say that they lack an overview. Moreover, a wealth of initiatives is not always a guarantee of rapid progress towards achieving goals.

It is important to point out that the study concerns the Dutch data landscape, whereas science is essentially an international affair. Dutch researchers often take part in international projects and infrastructures, making use of facilities and tools outside the country’s borders. At the same time, research clearly depends on national funding and rules, with the EU, the OECD, UNESCO and other international bodies relying heavily on countries in this respect. That requires national coordination, which will promote more effective use of available international resources and also deliver firmer guarantees for Dutch input at international level.

Findings

We have broken down our findings concerning the Dutch data landscape using the SWOT analysis methodology (strengths, weaknesses, opportunities and threats).

Strengths

• The Netherlands has a solid position in Europe with regard to open science and FAIR data, with the Dutch Presidency of the EU in 2016 providing a major impetus and with the Netherlands making a major contribution to defining the FAIR principles.
• In 2017, the Dutch government declared that ‘open science and open access will become the norm in academic research’.
• The Ministry of Education, Culture and Science is a strong and active proponent of open science.
• The national research funders (NWO, ZonMw) have made data management one of their funding criteria.
• Scrupulous data management in both the short and long term has been incorporated into a widely accepted code of conduct for research integrity.
• Virtually all knowledge institutions have an up-to-date data policy and offer researchers a reasonably detailed set of data services, often with dedicated research support staff (data stewards).
• The Netherlands has a robust culture of collaboration, as evidenced by the large number of partnerships (including those concerning the large-scale scientific infrastructures) and SURF, the national IT organisation for education and research.
• The Netherlands has a large number of data repositories, both generic and discipline-specific, several of which are certified.
• DANS has initiated the international quality label CoreTrustSeal (formerly known as Data Seal of Approval) for repositories.

Weaknesses

• The landscape of Dutch data services has overlaps and is fragmented (and therefore may be inefficient). There is a lack of coordination, overview, standardisation and alignment.
• While there are policies, agreements, protocols and standards, these are not generally accepted and/or implemented.
• Most of the data from recent research is held on the relevant knowledge institution’s network drives and is not FAIR.
• Researchers still do not always know where they stand, what the rules are (ethical and legal aspects) and what is available to them. The absence of a uniform terminology does not help.
• Researchers are unsure how to deal with sensitive business and personal data during and after their research. They lack proper facilities for this.
• The degree to which disciplines and domains are organised varies widely; to some extent, this depends on their available international alliances (with national nodes) and temporary or long-term funding.
• Research funding is often project-based, whereas scrupulous data management requires a long-term investment.
• The research sector still lacks satisfactory recognition and reward mechanisms for scrupulous data management, data sharing and the associated roles.
• Data management takes time and requires a different set of skills. There is a major shortage of capacity and little available training both for researchers and the data stewards who support them.
• Development and use of data services are poorly embedded in mainstream funding instruments and research programmes.
• To the extent that there is financial leeway to procure data services, state aid rules create obstacles. That is because in most cases, data services are set up as private parties and therefore cannot be procured using public funds (or only to a limited extent).

Opportunities

• The federated infrastructure of data competence centres (DCCs) emerging from the incentive scheme for digitalisation in science.
• Data sharing with businesses and public organisations. Examples include such initiatives as AMDEX, but also the various data science institutes. Possible membership of the DataDeelCoalitie (including for public investment).
• National cooperation on improving Recognition & Reward mechanisms. The relevant position paper states as follows: ‘Open science and the modernisation of recognition and reward mechanisms are inextricably linked.’
• The new Strategy Evaluation Protocol (2021-2027) emphasises the contribution that research units make to open science.
• The development of the European Open Science Cloud and the European Commission’s new Data Strategy.
• The growing level of cooperation between the international platforms/networks for research data (CODATA, GO FAIR, RDA and WDS => Data Together).
• A common national roadmap to overcome perceived obstacles, promote national synergy and optimise international liaisons.

Threats

• Policymakers, researchers and research support staff are working separately on open science, data sharing and scrupulous data management.
• There has been a steady increase in the number of commercial services proposing to meet the needs of researchers. This trend nevertheless poses risks with respect to data ownership, dependence, the vested interests of commercial parties, etc.
• Strict laws and regulatory provisions (e.g. on copyright and personal data) that discourage the sharing and reuse of research data.
• Threats to the security of IT infrastructures, criminal interest in datasets.

The above findings, like the recommendations below, were ‘collected’ during the study. It is obviously important to capitalise on our strengths and achievements (all the progress we have already made in the Netherlands) to overcome weaknesses and tackle threats.

The relationship between our findings and our recommendations can be summarised as follows: we are good at starting up initiatives, establishing partnerships and making agreements in the Netherlands, but we have work to do on safeguarding and implementing these actions (for example securing the resources and establishing the necessary conditions). One major concern is that a multitude of initiatives, partnerships and agreements does not necessarily improve efficiency and transparency. As a result, we may lose momentum, fail to capitalise on opportunities (such as alliances with companies, international alliances and developments regarding recognition and reward) and fail to see threats for what they are (such as cyberthreats and dependence on commercial parties).

Recommendations

We have grouped our recommendations for optimising the Dutch landscape for research data into a number of priority areas.


Raising awareness
• Institutions should devote more effort to raising awareness of open science, data sharing and scrupulous data management (including data stewardship), making use of the informal open science communities or data champions that are already active at various institutions.
• Coordinated support can be provided from the National Open Science Programme (in NPOS Project H, Accelerate open science).
• KNAW can help to raise awareness by issuing its advisory report on the storage and availability of research data (end of 2020).

Capacity
• Institutions should fast-track the appointment and training of additional data stewards to give their researchers the support they need.
• The Ministry of Education, Culture and Science can support this by providing one-off funding (on top of the basic funding now being used to set up DCCs), so that we move more quickly towards meeting our open science objectives and so that institutions can accommodate the necessary staffing changes.

Expertise
• Existing knowledge and expertise (whether or not acquired in an international context) can be pooled and made more accessible, building on the experience gained in LCRDM and the results of NPOS Project F (Education and Training in Open Science and Data Stewardship).
• Similarly to the University Teaching Qualification (BKO), institutions should organise education and training in scrupulous data management for both researchers and data stewards (based on the good research practices defined in the Netherlands Code of Conduct for Research Integrity), with special attention being given to ethical review boards.
• An accredited (basic) training programme and recognised and accepted HR profiles (job classification systems such as UFO, FUWAVAZ, HBO, etc.) should be established for data stewards.
• Coordinated support can be provided from the National Open Science Programme in terms of generic competence profiles and educational materials.

Services and facilities
• At national level, a sector architecture or service catalogue can clarify which research data and software facilities are available and certified, how they are related and any gaps that may exist.
• The current owners of data repositories must ensure that they are adequately certified (CoreTrustSeal or similar).
• SURF and the eScience Center can devise alternatives for long-term software archiving for relevant datasets.

Recognition
• Building on the existing national recognition and reward initiative (by VSNU, NFU, NWO, ZonMw and KNAW) can help to accelerate the creation of recognition and reward mechanisms for activities that support open science, data sharing and scrupulous data management, both in selecting and appraising staff and in funding and assessing research.
• The National Open Science Programme can contribute through its project exploring open science indicators (metrics).

The Open Science Career Assessment Matrix (OS-CAM)15 suggests the following indicators for recognising and rewarding researchers for their open science activities:
- Datasets and research results
  o Using the FAIR data principles
  o Adopting quality standards in open data management and open data sets
  o Making use of open data from other researchers
- Research integrity
  o Being aware of the ethical and legal issues relating to data sharing
  o Fully recognizing the contribution of others in research projects, including…open data providers
- Teaching
  o Developing curricula and programs in open science methods, including open science data management

Funding
• Data service providers could reassess their costing and revenue models.
• Knowledge institutions could stipulate FAIR data requirements for research supported by direct funding (the first funding stream or basic institutional funding).
• Knowledge institutions should consider how much structural funding they will require to meet their longer-term needs in terms of both infrastructure and human resources.
• To promote thematic DCCs, consideration could be given to funding projects that contribute to a nationally coordinated alliance in a discipline or domain of science, for example to improve a repository (so that it complies with FAIR principles), to establish a system of protocols and standards that harmonise with international protocols or standards, or to provide discipline-specific training for data stewards and organise community support.
• NWO and ZonMw should work with VSNU, NFU and VH and consult other research funders on alternatives for funding scrupulous data management during and after a research project and how they would relate to project funding. Investments in the Dutch data infrastructure could feature in the scheduled update of the national roadmap for Large-Scale Scientific Infrastructures in 2021.
• The Ministry of Education, Culture and Science, the Ministry of Economic Affairs and Climate Policy, the Ministry of Health, Welfare and Sport and the Ministry of Agriculture, Nature and Food Quality, which are responsible for open science, must rule on and/or offer a solution to the obstacle currently posed by state aid rules relating to the procurement of data services that exist as a legal entity, such as a foundation or a private party. Under current state aid rules, it is not possible to procure services using public funds (for example funding awarded by research funders), or only to a very limited extent.

We have chosen in this report to focus on data obtained from publicly funded research, while acknowledging that some of the problems stem from hybrid forms of research funding, e.g. in public-private partnerships. This is certainly a point that should be addressed later.

15 EC (2017). Evaluation of Research Careers fully acknowledging Open Science Practices.


Governance
• Working with a compact, temporary data coordination and expertise centre and a programme council (consisting of researchers and research support staff), the NPOS Steering Group can serve as a coordinating body in the following areas:
  o National policy agenda/roadmap for FAIR data
  o Better awareness of and synergy between data service providers and repositories (as a ‘National Open Science Cloud’)
  o Access to and acceptance of available knowledge and expertise regarding scrupulous data management and data sharing
  o Encouraging and facilitating consistency/collaboration between disciplines and domains
  o Alignment with and use of national, European and international trends and developments
• As much use as possible should be made of the new federated structure of DCCs in this respect.

Although it is beyond the scope of this report, the study does show that, in the interests of science and to provide more clarity for researchers, the Dutch government should interpret the GDPR less strictly than it has done.


A roadmap for the FAIR Data programme line

Proposed actions
1. Set up a National Data Coordination and Expertise Centre (NDCE) for at least the 2020-2025 period under the auspices of the NPOS Steering Group. The Centre will coordinate and support implementation of the recommendations, or itself undertake implementation, and establish the necessary alliances at national and international level, including participation in the European Open Science Cloud (EOSC).
2. Establish an advisory Programme Board for the FAIR data programme line, consisting of representatives of the research funding bodies, the knowledge institutions (universities, UMCs, KNAW, NWO-I and universities of applied sciences) and large-scale research infrastructures, both researchers and support staff. Consider the specific characteristics and related needs and coordination structures in individual fields of science, but monitor overall coherence.
3. Encourage knowledge institutions (if possible by providing additional funding) to make demonstrable progress towards making FAIR research data available and free up the necessary capacity and infrastructure to do so. Use simple and accepted indicators to measure progress.
4. Establish a national network of all data service providers and repositories (a ‘National Open Science Cloud’ or ‘Commons’ or ‘Network’) aimed at closer coordination, a better overview and greater synergy, for example illustrated by a national catalogue of services, with the NDCE acting as coordinator. After a certain period, consideration can be given to appointing a coordinator.
5. Set up a national expertise centre (building on the achievements of the National Coordination Point Research Data Management (LCRDM) and NPOS Project F) for knowledge and expertise concerning all FAIR data-related aspects (including such aspects as data stewardship, ownership and ethical, legal and social implications (ELSI)) which will also offer general training courses, with the NDCE acting as coordinator. Here too, consider the specific characteristics of different fields of science and existing coordination structures, such as Health-RI in biomedical research. After a certain period, consideration can be given to appointing a coordinator.
6. Establish an international node for transparent liaison with European and international data organisations (EOSC, CODATA, GO FAIR, RDA, WDS), with the NDCE acting as coordinator.
7. Have the NDCE emphasise the importance of a national initiative concerning recognition and reward mechanisms (position paper: ‘Ruimte voor ieders talent’, November 2019) and contribute to this initiative from the perspective of open science.
8. Ask the NWO, ZonMw and organisations active in the field (VSNU, NFU, and VH) to plan possible improvements in data management funding (including data stewardship) for research.
9. Review the programme and the chosen governance structure at least once every two years by holding an international consultation procedure.
10. Within three years, develop a plan for organising the national data landscape over the longer term.

Remarks on roadmap
• The present study only addresses research data. In time, data in higher education (including educational materials) will also need to be taken into account.
• Project F (Education and Training in Open Science and Data Stewardship) has yet to be completed. The results of this project will be incorporated into the final version of the roadmap at a later stage.
• The NDCE must have certain powers if it is to coordinate the data landscape effectively. To play an effective role in optimising the data landscape, it must have some measure of control over the activities undertaken by parties active in that landscape. Those parties and their stakeholders should therefore give the NDCE a mandate to act. This study makes no recommendation as to the scope of that mandate; to some extent, that depends on whether the parties active in the landscape are willing to relinquish a degree of autonomy in favour of a more effective larger whole.
• In optimising the data landscape, the relationship with the other NPOS programme lines, specifically Open Access and Citizen Science, should be clear.
• The transition to open science and the associated approach to FAIR data is not happening in isolation, nor is it occurring solely in the science sector. The relationship to education, the public sector and business is important because their data is crucial to research and vice versa.
• What must also be noted in this connection is the business community’s attempts to arrive at a general set of data-sharing agreements (the data coalition) and its efforts regarding artificial intelligence, in which data plays a key role.

NDCE activities
The NDCE referred to in item 1 above could potentially focus on the following activities:
• Harmonising, safeguarding and promoting (on a website) existing agreements, policy, good practices, protocols and standards
• Coordinating international liaison: making better use of international expertise and initiatives and harmonising national interfaces with these, including in specific scientific fields
• Acting as a secondary source of expertise for the local Data Competence Centres (DCCs) at all knowledge institutions
• Resolving remaining data issues (policy, ethical, legal, technical)
• Launching and embedding a national curriculum for data stewards, coordinating training
• Creating synergy in the national data services for scientific research, working towards a national catalogue of services
• Encouraging recognition and reward mechanisms for data-sharing activities in scientific research, including working towards a recognised system of indicators (metrics)16
• Advocating the removal of constraints in privacy legislation in the interests of open science and FAIR data in particular.

NDCE Governance
The NDCE must act as a catalysing and unifying force, a role that is difficult to reconcile with the set of tasks that certain organisations in the field routinely carry out. We therefore recommend positioning the NDCE as a neutral/independent entity, and not assigning its tasks and duties to one of the parties that perform an operational role in the national data landscape. The NDCE is accountable to the NPOS Steering Group and consults the FAIR Data Programme Board on important issues.

16 See also the RDA Recommendation (2020). FAIR Data Maturity Model. Specification and Guidelines 2020 (https://www.rd-alliance.org/groups/fair-data-maturity-model-wg)


Appendices

Appendix 1. Sources consulted

Appendix 2. Overview of relevant legislation, regulatory provisions and guidelines

Appendix 3. Minutes of panel meetings

Appendix 4. Reports on working visits

Appendix 5. Other sources

Appendix 6. Consultation
