How to Discover Requirements for Research Data Management Services
Angus Whyte (DCC) and Suzie Allard (DataONE)

A Digital Curation Centre ‘working level’ guide

With contributions from: D. Scott Brandt (Purdue University), Susan Wells Parham (Georgia Institute of Technology), Sherry Lake (University of Virginia)

Please cite as: Whyte, A and Allard, S. (Eds) 2014. ‘How to Discover Research Data Management Service Requirements’. DCC How-to Guides. Edinburgh: Digital Curation Centre. Available online: http://www.dcc.ac.uk/how-guide

Digital Curation Centre, March 2014
This work is licensed under Creative Commons Attribution BY 2.5 Scotland

1. Introduction

This guide is meant for people whose role involves developing services or tools to support research data management (RDM) and digital curation, whether in a Higher Education Institution or a project working across institutions. Your RDM development role might be embedded with the research groups concerned, or at a more centralised level, such as a library or computing service. You will need a methodical approach to plan, elicit, analyse, document and prioritise a range of users’ requirements. The term ‘requirements discovery’ covers these activities, and this guide relates them to the process of developing RDM services. A DCC Guide How to Develop Research Data Management Services describes these services in more detail. Further support is available from the DCC (1) and DataONE (2) websites, and from the resources listed at the end of this guide.

In section 2 of this guide we summarise data management roles and responsibilities, and why these make a methodological approach to requirements worthwhile. Effective research data management involves many actors, supporting technologies and organisation, including coordination of human and financial resources. Collectively this research data management infrastructure will call on different support services at different stages of the research data lifecycle. Support services will need to integrate their information systems to some degree. Projects that create research data will also need systems to help manage that data within the project lifetime. And, once the data creators’ needs have been fulfilled, other actors and systems may be involved in data archiving and preservation to fulfil the needs of subsequent users and re-users.

Non-academic stakeholders will have requirements, for example any external partners, or research participants who use or help to produce the data. Service offerings from other institutions and commercial providers will also have an important role. For example there will be a need to match individual researchers’ needs for long-term storage with the repositories or archives that best serve their discipline, and ensure these also meet the expectations of the funding body or institution.

In section 3 we look at the contexts for RDM requirements. We take as the starting point a high-level model of the services that are typically needed. Then we consider why the research context is different from other areas of business process change, and what challenges that presents for identifying requirements. To help work around the diversity and complexity of research, we suggest three key areas likely to shape requirements:

• Ideas, artefacts and facilities: you will need to identify what kinds of things researchers consider ‘data’, the implied relationships to other records or information, and what form these take. Consider how far researchers depend on others to get their work done, and whether they use centralised facilities or standardised protocols. Any drivers for researchers to work at a broader scale in their research domain will likely add to the needs for RDM, while those aspects of their process that have the most technical uncertainty will also be the more challenging for RDM.

• Research stakeholders: producers, users and policymakers; developing any RDM service to work at institutional or cross-institutional scale demands a stakeholder analysis, to reduce the risk that some group or service provider’s needs are not properly assessed. This should involve people whose roles typically span organisational boundaries, such as academic liaison librarians or research computing staff.

• Institutional rules and research norms: guidelines that articulate the benefits and risks of change will need to be established if stakeholders are to buy in to it. This especially applies to decisions on keeping and sharing data. Key areas of benefit will be around providing technical support during projects, helping researchers get acknowledged for sharing, help to decide what to keep, support for storage and for using data catalogues.

The RDM development cycle can be viewed as a recurring sequence of six phases: envision, initiate, discover, design, implement and evaluate (3). Figure 1 illustrates the first five core activities in context. The sixth phase, ‘evaluation’, is shown separately as it often involves different actors and may apply to the whole development process as well as its outcomes.

In section 4 of this guide we outline key phases of RDM development and the activities these involve. In section 5 we describe tools to support the phases of initiating change and discovering requirements. The guide focuses on these activities, starting from the need for a broad strategy through to description of services to be implemented. It takes a step back from the techniques typically used to gather requirements information, such as surveys, interviews and group discussion. We consider how these techniques fit into service development, and what aspects of the research context need to be considered.

[Figure 1 The process of developing research data management services. The diagram relates Context, Inputs, Activities (Envision, Initiate, Discover, Design, Implement, Evaluate), Actors, Outputs (Services), Outcomes and Principles.]

In section 5 we also identify the main elements of some common RDM requirements approaches. These include:

• Performing in-depth case studies
• Surveying data practices and benchmarking service capabilities, using the Data Asset Framework (DAF), Collaborative Assessment of Research Data Infrastructure and Objectives (CARDIO), or DMVitals.
• Documenting data lifecycles with Data Curation Profiles
• Stakeholder profiles, personas and scenarios
• Development workshop events, e.g. hackdays and mashups

Finally in Section 6 we consider the next step of managing requirements once they have been scoped, and some future challenges for scoping RDM requirements.

DCC and DataONE

The Digital Curation Centre is funded by the UK organization Jisc to build the capability for good data curation practice across the UK higher education sector. The DCC provides coordinated training and support to institutions, and enables the transfer of knowledge and best practice. This guide draws on the DCC programme of engagement with universities in the UK, and from working alongside related projects developing RDM services (4).

DataONE is a cyber-infrastructure funded by the US National Science Foundation (NSF). It aims to facilitate biological and environmental data discovery and access across distributed repositories, and to provide scientists with an integrated set of tools that support all phases of the data lifecycle including data management and curation (www.DataONE.org).

1 DCC website. Available at: www.dcc.ac.uk
2 DataONE website. Available at: http://www.dataone.org/
3 These phases are informed by service design principles and stages in business process re-engineering (BPR) e.g. Kettinger, W. J., Teng, J. T. C., & Guha, S. (1997). Business Process Change: A Study of Methodologies, Techniques, and Tools. MIS Quarterly, 21(1), 55–80. doi:10.2307/249742
4 Jisc (2012) Managing Research Data Programme 2011-13. Retrieved from: http://www.jisc.ac.uk/whatwedo/programmes/di_researchmanagement/managingresearchdata.aspx

2. Data Management Roles and Responsibilities

Policy makers in funding bodies and institutions are placing more importance on research data management and digital curation, putting new responsibilities on institutions and researchers. Scoping their needs may be complex, and stakeholders may be unfamiliar with data management and digital curation terminology. As their needs and

Funding bodies define responsibilities for data management at a broad level. In the UK for example the Research Councils UK (RCUK) has set out Common Principles on Data Policy (5). Individual Research Councils adopt their data policies, which set out responsibilities that are shared between individual researchers and their institutions.

Most universities and other research institutions will have already developed a repository for publications. To help make research data accessible many now have a need to provide a ‘core’ repository for data outputs that are considered valuable assets, alongside services to help researchers ensure more digital
