IFLA Volume 42 Number 4 December 2016 IFLA
Contents
Special Issue: Research Data Services Guest Editors: Michael Witt and Wolfram Horstmann
Guest Editorial International approaches to research data services in libraries 251 Michael Witt and Wolfram Horstmann
Articles Modifying researchers’ data management practices: A behavioural framework for library practitioners 253 Susan Hickson, Kylie Ann Poulton, Maria Connor, Joanna Richardson and Malcolm Wolski Research data services: An exploration of requirements at two Swedish universities 266 Monica Lassi, Maria Johnsson and Koraljka Golub ‘Essentials 4 Data Support’: Five years’ experience with data management training 278 Ellen Verbakel and Marjan Grootveld Research Data Services at ETH-Bibliothek 284 Ana Sesartic and Matthias To¨we Beyond the matrix: Repository services for qualitative data 292 Sebastian Karcher, Dessislava Kirilova and Nicholas Weber Data governance, data literacy and the management of data quality 303 Tibor Koltay Data information literacy instruction in Business and Public Health: Comparative case studies 313 Katharine V. Macy and Heather L. Coates
Abstracts 328
Aims and Scope IFLA Journal is an international journal publishing peer reviewed articles on library and information services and the social, political and economic issues that impact access to information through libraries. The Journal publishes research, case studies and essays that reflect the broad spectrum of the profession internationally. To submit an article to IFLA Journal please visit: http://ifl.sagepub.com IFLA Journal Official Journal of the International Federation of Library Associations and Institutions ISSN 0340-0352 [print] 1745-2651 [online] Published 4 times a year in March, June, October and December
Editor Steve Witt, University of Illinois at Urbana-Champaign, 321 Main Library, MC – 522 1408 W. Gregory Drive, Urbana, IL, USA. Email: [email protected] Editorial Committee Barbara Combes, School of Information Studies, Charles Sturt University, Wagga Wagga, NSW Australia. Email: [email protected] Ben Gu, National Library of China, Beijing, People’s Republic of China. Email: [email protected] Dinesh Gupta, Vardhaman Mahaveer Open University, Kota, India. Email: [email protected]/[email protected] Mahmood Khosrowjerdi, Allameh Tabataba’i University, Tehran, Iran. Email: [email protected]/[email protected] Jerry W. Mansfield (Chair) Congressional Research Service, Library of Congress, Washington, DC. Email: [email protected] Ellen Ndeshi Namhila (Governing Board Liaison) University of Namibia, Windhoek, Namibia. Email: [email protected] Seamus Ross, Faculty of Information, University of Toronto, Toronto, Canada. Email: [email protected] Shali Zhang, University of Montana, Missoula, Montana, United States. Email: [email protected] Publisher SAGE, Los Angeles, London, New Delhi, Singapore, Washington DC and Melbourne. Copyright © 2016 International Federation of Library Associations and Institutions. UK: Apart from fair dealing for the purposes of research or private study, or criticism or review, and only as permitted under the Copyright, Designs and Patents Acts 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the Publishers, or in the case of reprographic reproduction, in accordance with the terms of licences issued by the Copyright Licensing Agency (www.cla.co.uk/). US: Authorization to photocopy journal material may be obtained directly from SAGE Publications or through a licence from the Copyright Clearance Center, Inc. (www.copyright.com/). Inquiries concerning reproduction outside those terms should be sent to SAGE. Annual subscription (4 issues, 2017) Free to IFLA members. Non-members: full rate (includes electronic version) £304/$561. Prices include postage. Full rate subscriptions include the right for members of the subscribing institution to access the electronic content of the journal at no extra charge from SAGE. The content can be accessed online through a number of electronic journal intermediaries, who may charge for access. Free e-mail alerts of contents listings are also available. For full details visit the SAGE website: sagepublishing.com Student discounts, single issue rates and advertising details are available from SAGE, 1 Oliver’s Yard, 55 City Road, London EC1Y 1SP, UK. Tel: +44 (0) 20 7324 8500; e-mail: [email protected]; website: sagepublishing.com. In North America from SAGE Publications, PO Box 5096, Thousand Oaks, CA 91359, USA. Please visit ifl.sagepub.com and click on More about this journal, then Abstracting/indexing, to view a full list of databases in which this journal is indexed. Printed by Henry Ling Ltd, Dorset, Dorchester, UK. IFLA Guest Editorial
International Federation of Library Associations and Institutions 2016, Vol. 42(4) 251–252 International approaches to research ª The Author(s) 2016 Reprints and permission: data services in libraries sagepub.co.uk/journalsPermissions.nav DOI: 10.1177/0340035216678726 ifl.sagepub.com
Michael Witt Purdue University, West Lafayette, Indiana, USA
Wolfram Horstmann Go¨ttingen State and University Library, University of Go¨ttingen, Germany
Libraries and archives around the world are acquiring Alliance for the 81st IFLA World Library and Infor- new skills and applying the principles of library and mation Congress in Cape Town in 2015. The number archival sciences to solve challenges and provide new and quality of manuscripts submitted in response to services related to research data management. Librar- the call for papers enjoined the expansion from one ians are helping researchers address needs throughout special issue to two and the allowance of some addi- the research data lifecycle, for example, by conduct- tional time for reviewing and editing. ing assessments and outreach, consulting on data Major themes that emerged from the submissions management plans and metadata, incorporating data include the assessment of researcher needs and prac- into information literacy instruction and collection tices, training for librarians, examples of different management, providing reference services to help data services and approaches to designing and offer- patrons find and cite data, and providing data publi- ing them, and data information literacy. The papers cation and preservation solutions. They are creating are presented in this order as a sensible progression web guides and tutorials, training colleagues within that many libraries are undertaking themselves: to and outside of the library, contributing to discussions identify patron needs related to research data, then learn related to research data including policy development the skills required to help meet their needs, then design and planning, and in some cases, participating directly and offer services, and lastly assist patrons in using with researchers on data-intensive projects. the services and related resources. The degree to which libraries are offering or plan- In this issue, researchers were interviewed at Grif- ning to offer data services has been explored in detail fith University in Australia with the application of the for member organizations of the Association of Col- A-COM-B conceptual framework to better under- lege and Research Libraries in North America by stand researcher behaviors regarding research data Tenopir et al. (2012, 2015) and more recently for and related practices. Their results suggest that atti- LIBER (Ligue des Bibliothe`ques Europe´ennes de tude is the key element to be addressed in designing Recherche) in Europe (Tenopir et al., 2016); however, strategies to support researchers in data management. these studies only begin to paint a part of the picture in Librarians at two Swedish universities extended the terms of the variety of services that libraries around Data Curation Profile instrument and used it in inter- the world are designing and deploying. And how are viewing researchers from a variety of different disci- they creating and providing these services? plines to explore their needs for effective data This is the central motivation for the next two spe- management, in particular for subject control and cial issues of IFLA Journal: to gather the latest theory, other requirements for descriptive metadata. In the research, and state-of-the-art practices from libraries Netherlands, the ‘Essentials 4 Data Support’ taught that are informing and innovating effective data ser- vices from an international perspective. The idea for this theme was inspired by the high level of atten- Corresponding author: dance and interest in a conference program on the Michael Witt, Purdue University, West Lafayette, Indiana, topic of research data and libraries that was created IN 47907, USA. in collaboration with IFLA and the Research Data Email: [email protected] 252 IFLA Journal 42(4) data support skills to over 170 Dutch librarians and broader range of countries and libraries. It is our hope information technology professionals. Learning out- that these two issues will add to our shared under- comes are achieved by either a six-week blended standing of the latest developments and practices of course of in-person and coached online course, or libraries that are designing and providing research an online course that is accompanied by coaching, data services. or a self-directed online-only course. Two papers present examples of data services: ETH-Bibliothek Declaration of Conflicting Interests in Switzerland takes a data lifecycle approach that The author(s) declared no potential conflicts of interest incorporates researchers working in private, shared, with respect to the research, authorship, and/or publication and public domains with corresponding levels of of this article. access control and other functionalities; the Qualita- tive Data Repository hosted by Syracuse University in Funding the United States is tailored to meet the needs of qualitative and multi-method social inquiry research The author(s) received no financial support for the research, authorship, and/or publication of this article. methods with emphases on protecting data that involve human subjects and providing the capability to relate data to published text through scholarly References annotation. A literature review and discussion con- Tenopir C, Birch B, and Allard S (2012) Academic libraries ducted in Hungary explores and brings attention to and research data services: Current practices and plans the relationship between data governance and data for the future. Association of College and Research literacy. Finally, a group of American librarians Libraries. Tenopir C, et al. (2015) Research Data Services in Aca- examine why data information literacy should be inte- demic Libraries: Data Intensive Roles for the Future? grated in the areas of business and public health and Journal of eScience Librarianship, 4(2). discuss how it can be accomplished. Tenopir C, et al. (2016) Research Data Services in A second special issue of IFLA Journal on research European Academic Research Libraries (Submitted). data services in libraries, forthcoming in March 2017, Pre-available at: http://libereurope.eu/blog/2016/10/13/ will continue and build upon these four themes while research-data-services-europes-academic-research- incorporating approaches and perspectives from a libraries (accessed 17 October 2016). IFLA Article
International Federation of Library Associations and Institutions 2016, Vol. 42(4) 253–265 Modifying researchers’ data ª The Author(s) 2016 Reprints and permission: management practices: A behavioural sagepub.co.uk/journalsPermissions.nav DOI: 10.1177/0340035216673856 framework for library practitioners ifl.sagepub.com
Susan Hickson Griffith University, Gold Coast Campus, Australia
Kylie Ann Poulton Griffith University, Nathan Campus, Australia
Maria Connor Griffith University, Gold Coast Campus, Australia
Joanna Richardson Griffith University, Nathan Campus, Australia
Malcolm Wolski Griffith University, Nathan Campus, Australia
Abstract Data is the new buzzword in academic libraries, as policy increasingly mandates that data must be open and accessible, funders require formal data management plans, and institutions are implementing guidelines around best practice. Given concerns about the current data management practices of researchers, this paper reports on the initial findings from a project being undertaken at Griffith University to apply a conceptual (A-COM-B) framework to understanding researchers’ behaviour. The objective of the project is to encourage the use of institutionally endorsed solutions for research data management. Based on interviews conducted by a team of librarians in a small, social science research centre, preliminary results indicate that attitude is the key element which will need to be addressed in designing intervention strategies to modify behaviour. The paper concludes with a discussion of the next stages in the project, which involve further data collection and analysis, the implementation of targeted strategies, and a follow-up activity to assess the extent of modifications to current undesirable practices.
Keywords Attitude, behaviour, behavioural framework, capability, libraries, motivation, opportunity, research data management
Submitted: 13 May 2016; Accepted: 7 September 2016.
Introduction and Fong, 2015; Weller and Monroe- Gulick, 2014), In recent years, advances in data management, data although many more institutions do not yet provide curation, dissemination, sharing and research infra- targeted outreach programmes (Tenopir et al., 2013). structure have transformed the provision of library ser- vices to researchers. A considerable amount of literature Corresponding author: has been published on the emergence in academic Susan Hickson, Griffith University, Gold Coast Campus, libraries of research data services for their faculty and University Drive, Southport, Queensland 4222, Australia. students (Peters and Dryden, 2011; Si et al., 2015; Wang Email: [email protected] 254 IFLA Journal 42(4)
This paper reports on the progress of a broader data management which clearly outlines how the data research project that investigates whether the actions will be disseminated, shared and retained. This of library and information specialists to improve data informs the Digital Single Market, which aims to management practices can be enhanced by under- open up digital opportunities for citizens’ access to standing the attitudes and behaviour of researchers. information and promote modern open government. Wolski and Richardson (2015: 1) have proposed a behavioural framework (A-COM-B) for service deliv- Australia ery teams to ‘a) better understand the cohort with Similar to the United Kingdom and the United which they are engaging, b) identify where and when States, the Australian Research Council (ARC) to focus their attention, and c) develop more effective (2014) considers data management planning an plans to bring about changed researcher practices’. important part of the responsible conduct of The objective of the research is to use the A-COM-B research. The ARC strongly encourages the deposit- model as a lens for improving understanding of data ing of data arising from a research project in an management practices in order to develop, plan and appropriate, publicly accessible subject and/or insti- provide interventions to change researchers’ data tutional repository. Additionally, the National management practices and to investigate the results Health and Medical Research Council‘s (2014) Pol- of such interventions. This paper reports on progress icy on the Dissemination of Research Findings states to date. that the metadata from all NHMRC projects should also be made available in an open source journal Environmental research context within 12 months of publication. International The Australian Code for the Responsible Conduct of Research (National Health and Medical Research The international research environment promotes the Council, 2007) promotes integrity in the proper man- sharing of information awarded by publicly funded agement of research data, publishing, dissemination research and encourages the reuse of data sets for the and attribution of authorship by use of a best practice betterment of the community by ensuring that practice framework. Researchers and institutions each have a reflects policy and guidelines. For example, in the responsibility to comply with legislation, guidelines, Research Councils UK’s (RCUK) (2014a) Research policy and codes of practice as part of responsible Outcomes Overview, it is stipulated that researchers research. must ‘demonstrate the value and impact of research supported through public funding’ and ‘Information on research outcomes attributed to all RCUK funded Griffith University awards are now collected online’. In addition, the Griffith University is a comprehensive, research- RCUK policy (Research Council UK, 2015) mandates intensive university, ranking 37th in the 2015/16 that research data be open and discoverable and that QS University Rankings Top 50 Under 50 (Quac- policies and procedures should be in place to ensure quarelli Symonds, 2015). Located in the rapidly compliance with these practices. RCUK actively growing corridor between Brisbane and the Gold works with researchfish (Research Councils UK, Coast in Southeast Queensland, the University offers 2014b), which is a portal where researchers can log more than 200 degrees across five campuses to more the outputs, outcomes and impacts of their work. than 43,000 students from 130 countries studying at Likewise, the United States’ National Science undergraduate through to doctoral level in one of Foundation (NSF) is ‘dedicated to the support of fun- four broad academic groups: arts, education and law; damental research and education in all scientific and business; science; and health. Griffith’s strategic engineering disciplines’(National Science Founda- research investment strategy has positioned it to be tion, 2016a). The NSF’s webpage on the dissemina- a world leader in the fields of Asian politics, trade tion and sharing of data states that all applicants must and development; climate change adaptation; crim- submit a data management plan and describe how it inology; drug discovery and infectious disease; meets the policy on the dissemination and sharing health; sustainable tourism; water science; music of research results (National Science Foundation, and the creative arts. 2016b). The Division of Information Services (INS), with Similarly the European Commission (EC) has which the authors are affiliated, has a long and proud instituted an Open Research Data Pilot as part of tradition of providing quality service to Griffith Horizon 2020 (European Commission, 2016). All students and staff. It also has an international reputa- research proposals must include a section on research tion for being innovative and cutting-edge in the Hickson et al.: Modifying researchers’ data management practices 255 deployment of emerging technologies (Brown et al., Literature review 2015; Searle et al., 2015; Stanford Libraries, 2013). In the context of this paper, data management follows In 2014 Griffith University introduced its Best the definition by O’Reilly et al. (2012: 2): ‘all aspects Practice Guidelines for Researchers: Managing of creating, housing, delivering, maintaining, and Research Data and Primary Materials, in response retiring data’. Data management is a part of the every- to the Australian Code for the Responsible Conduct day research process and work habits rather than a of Research (National Health and Medical Research specific task undertaken separately within the Council, 2007). The subsequent Griffith University research lifecycle. ‘Data management is not necessa- Code for the Responsible Conduct of Research was rily a formalised process, but rather actions taken in endorsed by the University’s Academic Committee response to a researcher’s current information needs in July 2012 and subsequently updated in Novem- and work goals’ (Fear, 2011: 71). ber 2015. According to the principles of the Code, The literature exploring data management prac- the University is required to provide research tices of academics has significantly increased in the infrastructure, training and professional develop- last decade, with many universities surveying the data ment opportunities as well as elicit an understand- management practices of their academics and ing from researchers that they are to properly researchers (Fear, 2011; Henty et al., 2008; Jahnke manage their data, while adhering to all applicable et al., 2012; Kennan and Markauskaite, 2015; Peters policies, legislation and standards with honesty and Dryden, 2011; Schumacher and VandeCreek, and integrity. 2015; Weller and Monroe-Gulick, 2014). As the Griffith University has invested significantly in the amount of research data being generated grows expo- provision of solutions and services for capturing, stor- nentially, universities are taking notice of how their ing, analysing, managing, sharing and publishing researchers manage research data. Historically the data. Examples of institutional storage solutions creator was solely responsible for their own data; include (a) Research Storage Service (https:// however, as new policies, both at a national and pub- research-storage.griffith.edu.au/), a self-serve private lisher level, now require research data to be made cloud solution based on the open source application, publicly available, institutions are required to provide ownCloud, and (b) a research data repository. Since adequate infrastructure which allows researchers to implementing the ownCloud solution in 2015, as of safely access, store and archive their data. May 2016 there were 580 Griffith users. Although a That said, institutional data storage services are pleasing start, this constitutes only approximately infrequently used as they do not adequately meet 15% of potential users at the institution. Given the researcher needs. O’Reilly et al. (2012: 13) report status of Griffith as a research intensive university, that: ‘Data storage is often inadequate, with research- a minimum of 30% take-up by May 2016 seemed a ers resorting to suboptimal data storage methods, reasonable expectation. However, it became apparent resulting in data that are often unreliable and short that there is a lack of incentives for researchers to use lived’. Westra (2014) has reported on the need to enterprise services. The research project described in address this issue as part of the University of Oregon’s this paper is but one response to bring these services decision to develop an integrated research data man- to the attention of the academic community. agement strategy. Jahnke (2012) urges institutions to Griffith University is currently developing an insti- make networked storage more readily available to tutional approach to research data support by utilising ‘multi-institutional research projects’ (p. 17), inte- the Australian National Data Service (ANDS) Data grate‘thedatapreservationsystemwiththeactive Management Framework (Australian National Data research cycle’ (p. 19), and enhance storage systems Service, 2016b) and the Capability Maturity Model with ‘intuitive live linking visualization tools’ which Integration (CMMI) (Australian National Data Ser- could entice researchers to use the systems and assist vice, 2016a). Key stakeholders of the University, with ‘curatorial decision making’. i.e. Griffith Graduate Research School, Office for Researchers are ‘largely unaware of the basic Research, and Information Services, are seeking a principles of ...data management’ (Schumacher and common understanding for the strategic direction of VandeCreek, 2015: 107) and receive little to no data management, education, training, services and training other than what they learn through doing solutions in order to develop a data support model research (Fear, 2011: 64; Peters and Dryden, 2011: and an institutional data service. 395). As a result, researchers are largely left to create The research project described in this study, along their own unique, ad hoc approach to organising with others currently being undertaken, will inform their research data. Furthermore, data management the process of developing a suitable framework. 256 IFLA Journal 42(4) reflects individuals’ organisational style, which is not always easy for others to interpret (Fear, 2011: 64). This can cause problems for those trying to re-use data later on. Not surprisingly, researchers consistently and pri- marily manage their research data on their PCs, a USB device and/or CDROM, closely followed by cloud-based services such as Google Drive and Dropbox ( Schumacher and VandeCreek, 2015; Weller and Monroe-Gulick, 2014; Wolff et al., 2016). All of these solutions pose potential privacy and security challenges, particularly the freely avail- Figure 1. A-COM-B framework for understanding able alternatives to institutional data storage, e.g. behaviour. Dropbox, Figshare, and SurveyMonkey. Jahnke (2012: 12) elaborates: Methodology The present study has used the descriptive research ...the terms of service for these products are often method, based on surveying selected participants poorly understood by researchers and the research par- through interviews. The basis for the survey design ticipants. Furthermore, the terms of service may not be was the behavioural framework described below. sufficient to meet the data protection and confidentiality standards that researchers and their institutional review boards (IRBs) require. Dropbox’s well publicized June A-COM-B Framework 2011 security glitch, which left all Dropbox accounts open to access without a password for several hours, is In investigating interventions which could be used to indicative of this problem. Applying additional security improve researchers’ data management practices, measures, such as encrypting files locally prior to shar- Wolski and Richardson (2015) examined a number ing them via a cloud service, is beyond the technical of behaviour and behaviour change theories and mod- skills of many researchers .... els. Ultimately they chose the COM-B system devel- oped by Michie et al. (2011) as the simplest, yet most The data management practices of STEM research- comprehensive framework on which to base their ers are heavily represented in the literature (Fear, approach. In the COM-B system, C ¼ Capability; 2011; Peters and Dryden, 2011). A number of studies O ¼ Opportunity; and M ¼ Motivation, all of which have surveyed academics from across all disciplines interact to generate behaviour (B). within their institutions (Schumacher and Vande- However, influenced by the work of Piderit (2000), Creek, 2015; Weller and Monroe-Gulick, 2014), and Wolski and Richardson modified the COM-B system social science and humanities researchers are often by incorporating attitude as a key component in discussed within these broader studies. Jahnke et al understanding researcher behaviour. Figure 1 offers (2012) solely investigate the data management prac- a diagrammatic representation of this concept, now tices of social science researchers. Their study presented as A-COM-B. The single-headed and explores the ‘nonlinear nature of the research process’ double-headed arrows represent the potential for (Jahnke et al., 2012: 10) and how this impacts data influence between the various elements. management practices, along with concerns around Attitude is defined as ‘an individual’s evaluation or data sharing, infrastructure and technical issues. belief about something’ (Wolski and Richardson, Martinez-Uribe and Macdonald (2009: 314) found 2015: 6); capability is the ‘the psychological or phys- that ‘the curation of research data requires trusted ical ability to enact the behaviour’ (Michie et al., 2011: 4). relationships achieved by working and conversing Motivation is defined as ‘all those brain processes that with researchers ...’. Jahnke et al. (2012: 19) recom- energize and direct behaviour, not just goals and con- mend that ‘extensive outreach to scholars is necessary scious decision-making. It includes habitual processes, to build the relationships that will facilitate data emotional responding, as well as analytical decision- preservation’. making’ (Michie et al., 2011: 4). Opportunity is defined This paper adds to the literature by applying a as ‘all the factors that lie outside the individual that make newly developed conceptual framework to the atti- the behaviour possible or prompt it’ (Michie et al., 2011: tudes and behaviours of researchers in regard to data 4). Behaviour is the result of the interaction between management practices. these four key elements. Hickson et al.: Modifying researchers’ data management practices 257
In applying the framework to a situation in which Table 1. Distribution of invited participants by career one wishes to modify a current practice, the starting stage. point is to (1) identify the underlying elemental beha- Early – mid career Late career viours that make up the practice and (2) identify (Within 15 years of being (15þ years since which of these needs to be changed. Importantly the awarded PhD) awarded PhD) Total next step is to: 10 14 24 ...identify current attitudes to the desired change in behaviour. However a challenge is that unlike beha- viour, attitudes are more difficult to observe, measure There are 23 questions in the survey, comprised of and quantify. Therefore attention may need to be paid to a mixture of multiple-choice and free-text formats. employing techniques such as qualitative interviewing The interviews were conversationally led by one coupled with good listening skills. This is an important research project team member, while a second team step as understanding the nature of attitudes will nor- member noted responses to the questions in a Google mally provide insights into the other elements of the form. Each interview was allocated 30 minutes. The framework, i.e. capability, motivation and opportunity. An understanding of all these elements creates a founda- interviews were also recorded and later transcribed. tion for developing an intervention plan. (Wolski and It is estimated that the research project will take Richardson, 2015: 8) 12 months to complete. The initial phase of the proj- ect as described in this paper was carried out over a In developing an intervention plan with specific six-month period. It is expected that the next phase of reference to data management, Wolski and Richard- the project will include devising and implementing son strongly advocated the need for research support intervention strategies, follow-up surveys and analy- teams to understand individual behaviours within the sis, and the creation of a toolkit for service providers. context of their local cohort level rather than at the It is anticipated that these activities will take a further larger faculty or institutional level. They suggested six months. that librarians could trial the strategy with a research centre. In addition, any intervention should be ‘a Demographics of the research centre multi-pronged approach which targets the different The centre chosen for the study is a high-profile, inter- elements of the framework’ (Wolski and Richardson, disciplinary social science research group. It com- 2015: 8). prises researchers from several departments across the University, whose research focuses on human resource management, industrial relations and organi- Applying the framework sational behaviour. A recent benchmarking exercise, The research team, comprising of three of the authors, conducted by librarians in Library and Learning selected a small but high-profile social sciences Services, placed the research centre among the top research centre to test the theories in Wolski and national and international research centres with a sim- Richardson’s A-COM-B model. This centre was cho- ilar research focus. sen because of the existing relationship which the For the purpose of the study, adjunct members and authors had already built with its researchers and those on extended leave were excluded. The number because this choice would align with Wolski and of researchers invited to participate was 24. Richardson’s (2015: 8) theory that ‘to understand the As shown in Table 1, there is a fairly even distri- current attitudes and to plan an ‘intervention’ plan, bution among early/mid-career and late career local service delivery teams may need to understand researchers. their local cohort to develop an effective response’. Table 2 indicates that 20 (83%) of the participants To identify underlying attitudes and behaviour, have an academic rank of senior lecturer or above. the research project team designed a series of inter- At the time of writing this paper, 12 (50%) of the view questions, categorised according to five broad invited participants had been interviewed. headings: Initial findings How and where researchers store their data; Criteria for selecting the solutions they use; Types of data Backup methods and approaches; Researchers were asked to describe their research Data management planning; and areas and the types of data they collect. Most Additional aspects of their research habits. researchers reported collecting both qualitative and 258 IFLA Journal 42(4)
Table 2. Distribution of invited participants by academic rank.
Research Assistant Research Fellow Lecturer Senior Lecturer Associate Professor Professor Total
11266824
Figure 3. Criteria for selecting data management solutions.
Figure 2. Solutions for managing research data. quantitative data, with only two respondents (16.7%) reporting that they collected qualitative data only. The most common method of data collection was audio recording interviews and transcribing to text-based format. Only two respondents (16.7%) used online surveys to collect data.
Managing data Dropbox and hard drives were the most common ways of managing research data. Of the nine research- ers (75% of total participants) who responded that Figure 4. Issues encountered with data management. they used Dropbox, five used the free Dropbox prod- uct and four used the paid Dropbox product. Sharing data with collaborators was the most com- All 12 of the researchers reported using more than monly reported problem with the way researchers one solution for managing their data, with most manage their data (Figure 4). Some of the reported using Dropbox as well as their hard drive, Google issues included complications arising from collabora- Drive and a USB device. Only one participant (8.3%) tors working between the free and the paid versions of used one of the University’s data storage services, Dropbox, and working with collaborators who used i.e. Griffith Research Space (Figure 2). However, different systems. Size of data being shared was an most researchers reported, in a separate section of issue for only one researcher (8.3%). the survey, that the size of their research data was Participants were asked to identify all methods used less than 5GB, with only two (16.7%) reporting that for backing up their data (if applicable) (Figure 5). it was more than 5GB. Using an external hard drive was the most popular way Participants were asked to indicate the criteria to back up research data, with 5 participants (41.6%) which influenced them to choose the solutions selecting this solution. Two researchers who consid- identified in Figure 2 for managing their data. See ered Dropbox to be their data backup solution Figure 3. explained that they relied on Dropbox to do the backup The most common reason given for their choice in automatically, and one researcher who used Google data management options was ease of use. One Drive ‘assumed that going to the inconvenience of researcher described Dropbox as ‘ridiculously easy’, using Google Drive [sic], it will be backed up’. and another cited difficulties using Google Drive that The one researcher (8.3%) who indicated that they do not arise when using Dropbox. did not back up their data cited ‘time constraints, Hickson et al.: Modifying researchers’ data management practices 259
Figure 7. Methods for sharing data with collaborators.
Figure 5. Solutions for backing up data. sharing passwords and login information with colla- borators to access confidential material said: ‘it’s get this done now, there isn’t time to work through the better way to do things’. In response to a question as to who owns research data in collaborative research projects, the responses from researchers suggested that although they had not previously thought about data ownership, they had clear perceptions on where ownership should lie. Some of the responses included that it was ‘jointly owned’ for collaboratively collected data and ‘assume that it’s mine’. Others believed it was ‘wherever the grant sits’ or that the University owned it.
Data management plans When asked if they had a data management plan, all of the researchers interviewed responded that they did not. One researcher referred to ‘an informal, unwrit- ten one’ and another explained, ‘people just decide what to do as they go along’. Another researcher said it was ‘instinctive’.
Figure 6. Frequency of data backups. Data sharing laziness and naive trust in the systems along with When asked whether they share or intend to share data optimism that all is good until the first time something publicly, there was an overwhelmingly negative goes wrong’ as the rationale. response from the researchers. Two researchers Figure 6 shows that one-third of the participants (5) recounted hearing stories of data being put online, and back up their data as frequently as on a daily basis. others subsequently re-using the data to write papers. However, this result should be understood in the con- This practice was perceived as contentious. One text that most of the researchers who reported backing researcher described the UK requirement to publish up their data daily, were relying on it being backed up data from publicly funded research projects as ‘con- systematically by the storage products they used. troversial’ and another researcher responded, ‘I would be horrified being forced to do that’. Others suggested that they could not see that their data was useful to Collaboration anybody else. Only one researcher reported that they Most researchers reported using Dropbox to share had thought about making their data open and that data with collaborators (see Figure 7). One researcher they did not perceive any problems in doing that. explained that it was the chief investigator’s respon- An interesting finding, which corroborates surveys sibility to ensure that the data is stored and shared reported in the literature (Federer et al., 2015; Tenopir appropriately. Another researcher who spoke about et al., 2011), was that a number of these same 260 IFLA Journal 42(4) researchers reported re-using data from research proj- order to influence researcher attitudes and analyse ects or requesting data from researchers with whom whether this translates into behavioural change. they had worked previously. Managing data Griffith research storage services From the interviews to date, it is apparent that While most researchers answered that they were researchers’ attitude is to choose the ‘easiest-to-use’ aware of Griffith’s research storage solution, only one products for storage and moving data. Some research- researcher said they used it. The most common reason ers reported frustrations with using varied and differ- for not using the service was that they did not know ing systems and having to make choices regarding the how. When researchers were asked what would moti- use of them. As one respondent said: ‘I might use vate them to use the storage solution, several referred Google Drive because I know how to use it ...and to ease of use and accessibility as motivators. One to go and use R drive and Research Storage Service researcher stated that they might start using the ser- and the Vault, I have to go and find out about it’. vice ‘if it’s the right thing to do’ and added ‘it’s just Another stated: ‘[there] isn’t time to work through the another thing, it may be well and good but you have to better way to do things’. This has translated into a learn it and it contributes to a sense of overload’. behaviour of using their preferred solution, mainly Another researcher commented: ‘if I had time to do Dropbox and a hard drive. it. I feel I am competent to do it, I just don’t have the By using the A-COM-B model to design interven- time available’. tion strategies, the research project team could create intervention plans that will target researchers’ attitude Discussion towards managing data and address capability, moti- Wolski and Richardson’s (2015) A-COM-B model vation and opportunity factors. was used to analyse the responses from a small group If it can be shown that Griffith’s preferred data of targeted researchers to test whether researchers’ management solution is as easy to use as Dropbox, data management behaviour can be understood in the the research project team could target motivators and context of attitude, capability, motivation and oppor- opportunities to trigger a change in behaviour. How- tunity. The initial findings have been analysed to ever, if Griffith’s preferred solution is not as easy to ascertain the attitude of the researchers, how it is use as Dropbox, then rather than focus on motivation reflected in behaviour, and whether the model can or capability, intervention strategies could target be used to plan intervention. Wolski and Richardson instead the researchers’ attitude that ‘easy-to-use’ is (2015: 6) assert that ‘attitude is an individual’s eva- the best reason to choose a product. luation or belief about something’ and ‘When plan- ning to initiate a plan to change behaviours of staff, consideration should first be given to the attitudes of Data backup staff and having an appreciation of their views and While only one researcher reported not backing up attitudes in relation to the change being considered’. their data, most used an external hard drive and relied Wolski and Richardson (2015: 8) also describe atti- on automated backup utilities. While backing up data tude as ‘difficult to observe, measure and quantify’. is good data management practice, using an external By analysing the open-ended responses obtained as hard drive may expose the data to security risk part of the initial stage of the research project, a pic- through theft or damage. ture of the researchers’ attitude towards data manage- The researchers’ responses demonstrate a casual ment and how this influences their behaviour has and indifferent attitude towards data backup. While become clearer. a few consciously and regularly backed up their data, Wolski and Richardson (2015: 6) also state that many left it to the ‘system’ to do. As nothing had gone ‘Understanding the nature of the attitude (more often wrong in the past, they did not perceive any threat and than not ambivalence in relation to how researchers therefore were not motivated to change their beha- regard data management) should provide insights into viour. One researcher, who regularly and deliberately the most appropriate responses that will garner the backed up their data, explained that, although they desired attitudinal change’. The authors will under- had not suffered any loss of data, it was witnessing take subsequent research to further test the applicabil- the consequences when close colleagues lost data that ity of the A-COM-B model, by designing and had made them act carefully with data backup. This is implementing intervention strategies that address an example of a motivator that has triggered a beha- capability, opportunity and motivation factors in vioural change. Hickson et al.: Modifying researchers’ data management practices 261
By applying the A-COM-B model to intervention not have felt motivated to change their behaviour and strategies, the research project team could target the conduct more robust data management planning. underlying attitudes surrounding data backup to When planning interventions, the research project reframe the task as being important enough to take team could target the researchers’ attitude that their considered, planned and careful steps to ensure data is data is unimportant or too small to manage effec- backed up securely. Citing the example above might tively. One motivator could be ensuring data plan- even be a useful intervention strategy, i.e. communi- ning requirements from grant funders are addressed; cating the impact of actual cases of lost data. As the providing data management plan templates and sup- well-known axiom says, ‘Facts tell, stories sell’. port could help to address issues around capability and opportunity. Collaboration When asked what led them to choose their data Data sharing management options, 25% of the researchers said that it was because a collaborator was using that The attitude of most of the interviewed researchers option. Many explained that either whoever was in towards data sharing was that it was contentious, that charge or the chief investigator on a research proj- their data would not be of interest to other researchers, ect normally decided on where and how data was and that sharing data might raise some methodologi- stored and managed. The researchers’ attitude cal issues. As one researcher stated: towards collaborative data management was that assumptions and unspoken understandings were The data you collect is so project specific and so raises sufficient and not necessarily in need of more accu- some general methodological questions you have to ask yourself about the merits or otherwise of using data rate and specific definition. collected for one purpose, or for one question, or one Although many researchers believed that the chief set of hypothesis, to address another one. investigator took the lead in data management, one senior researcher explained that they left it up to their And another responded: ‘I don’t think it would be of research assistant to establish the data management value to other people. They don’t have the conceptual systems of choice. They explained further that the framework to make any sense of them’. reason Dropbox was used was because the research Although most of the researchers could not per- assistant had made that decision. ceive the benefits of sharing their data with others, In targeting the attitude of researchers regarding many had themselves benefited from re-using data collaborative data management, the research project from historical research projects. team could design interventions to reframe research- An intervention strategy might leverage the atti- ers’ attitudes towards becoming more careful and tude that re-using data from historical research proj- considered when planning collaborative projects. ects or data previously collected by collaborators was Interventions targeted at senior levels might focus acceptable practice. This could be reframed as data on establishing clear and uniform collaborative data sharing can be beneficial. Motivators might include management guidelines, whereas interventions tar- the citation advantage of data publication and the geted at more junior levels might focus on the selec- publication and re-use of data as a measure of tion of data management solutions. research impact. Capability and opportunity might be enabled through support and education around Data management plans platforms and systems used to publish and share data. Another finding was that many of the researchers considered that there was a direct correlation between the size of a dataset and the importance of managing Griffith University’s research storage services that dataset. Their attitude was that their datasets were Although most of the researchers were aware of Grif- not large enough or important enough to warrant more fith’s research storage solutions, only one said they conscientious planning. One researcher, who said had used it. The researcher who reported using it they did not see themselves as a data manager, sug- expressed frustration with the service: ‘I think I just gested that anyone who made the effort to manage found it difficult to use and I tried to get some assis- their data was a ‘scientist with oodles of scientific tance and I gave up ...’ data’. Another agreed that ‘we need to get it right’, The attitude of the researchers’ towards Griffith’s but ‘it may not be seen as important’. With only rel- research storage services reflects their attitude toward atively small datasets, the centre’s researchers may choosing data management solutions. Their current 262 IFLA Journal 42(4) practices work effectively, they are easy to use, and as management behaviour. The A-COM-B model has a result the researchers feel unmotivated to change. proved a powerful tool with which to analyse Many researchers cited time constraints as a reason researcher attitude and behaviour. Using the frame- for not changing their data management behaviour. work has allowed the research team to map how they With increasing pressure on researchers to increase can tailor and directly address the interactions of atti- their research outputs, demonstrate impact and adhere tude, capability, opportunity and motivation in order to local and national policies and mandates, data man- to influence behavioural change. agement planning can be seen as an additional burden. As the team embarks on the second phase of the Interestingly, although Wolski and Richardson research project, it is worth noting that the experience (2015) suggest that the mandated policies and guide- with the initial data collection and analysis has led to a lines will not guarantee improvement to data manage- refinement in conversational interviewing and data ment practices, several researchers said that they collection techniques as the research project team would be motivated by ‘mandates’. One said ‘being members attempt to elicit nuanced responses from told to do it’ would motivate them to change their data researchers in order to obtain a more detailed under- management behaviour and another said: ‘if it was the standing of their attitudes. right thing to do’. One researcher, who reluctantly The research project team has benefited not only used the University’s Google Drive for research data from a wider understanding of data management storage, described attending a seminar where they issues, but also a broader understanding of how the were told they must use Google Drive and said that framework can be applied to changing behaviours in ‘put the fear of god into me about using Dropbox’. other areas. The research project has additionally They continued, ‘Colleagues didn’t attend that semi- influenced the way the team approaches researchers nar and they all use Dropbox because it’s easier and I and academics in the wider university environment agree with them, it’s much easier’. about data management issues.
Implications for service providers Next steps and challenges As service providers grapple with the challenges of This paper reports on the initial stages of the project getting researchers to manage their data effectively, and tests whether the A-COM-B model can be applied especially as the imperative from national and local to data management behaviours. During the initial drivers increases, many studies have focused on the research phase, a tailored response was created for behaviours of researchers towards data management. each of the researchers interviewed thus far. By This research project aims to provide an insight into selecting appropriate interventions from a suite of researcher attitudes and whether attitude change is the tools and products, the research team was able to key to changing behaviour (World Bank, 2010). By deliver a response to the researchers quickly. In the testing the A-COM-B framework for understanding next stage, the remaining researchers in the group will data management behaviour, the authors hope to pro- be interviewed, and the data collected and analysed. vide an approach for service providers wanting to Responses will be analysed using the A-COM-B design intervention strategies to change researchers’ model and intervention strategies will be planned and data management behaviour. implemented. Each researcher will receive a tailored From the research undertaken so far, there have response that is formulated through addressing spe- been two key findings: (1) the size of the datasets cific capability, opportunity and motivational issues, produced by a social science research centre of this with the aim of influencing attitude and the corre- type is quite small, which has implications for data sponding behaviours. management and storage; and (2) attitude is the big- Additional intervention methods will take the form gest challenge when seeking to modify the target of workshops, consultations and drop-in sessions for group’s data management practices. As a conse- researchers and support staff. A follow-up interview quence, the research projectteamwillinvestigate with the researchers will take place in approximately more thoroughly attitude change strategies, as identi- six months from the initial interview to determine fied in the literature. whether attitude and behaviour towards data manage- The research team has been influenced by the many ment have altered and whether engagement with the insights into researchers’ attitudes and behaviours. By suggested solutions has occurred. analysing behaviours through the A-COM-B lens, the Given that the A-COM-B framework is assisted by research project team now has a unique perspective on service providers having insights into individual atti- how to approach the challenge of changing data tudes and behaviours, the challenge of translating the Hickson et al.: Modifying researchers’ data management practices 263 framework into a model which can be applied to the library and information specialists are better equipped wider University community will be explored in the to identify underlying behaviours that influence deci- next phase of the research project. Mechanisms such sion making. Although this paper reports on the initial as a simplified online survey to capture individual stages of a larger research project, the preliminary data management practices, filtered through the findings suggest that attitude is the predominant A-COM-B categories, and coding of responses in deterrent to good data management behaviour. By order to select intervention strategies quickly will using this framework, practitioners can design inter- be considered. vention strategies that are aligned to individual need, Based on the findings from the second phase of the and that lead researchers to using safe and secure research project, the research team expects to imple- institutional solutions and services. ment a four-part plan: Declaration of Conflicting Interests 1. The research team will create a generic toolkit The author(s) declared no potential conflicts of interest for use by other research support service pro- with respect to the research, authorship, and/or publication viders within the Division. of this article. 2. The research team will train this cohort in the methodology – and any relevant tools – as out- Funding lined in this paper and subsequently refined by The author(s) received no financial support for the feedback from the second phase of the research, authorship, and/or publication of this article. research project. 3. Research support service providers for the References four main academic areas within the Univer- Australian National Data Service (2016a) Capability Matu- sity will be encouraged to initially work with rity. Available at: http://www.ands.org.au/guides/capa a research centre with which they have bility-maturity (accessed 21 April 2016). already established a good relationship, so Australian National Data Service (2016b) Creating a Data as to encourage open and honest feedback Management Framework. Available at: http://www from the participants. .ands.org.au/guides/creating-a-data-management-frame work (accessed 21 April 2016). 4. Using the data from these later projects, the Australian Research Council (2014) Funding Rules for research team will then explore whether com- Schemes under the Discovery Program for the Years monalities in data management behaviour 2015 and 2016 – Australian Laureate Fellowships, Dis- that can be inferred through categories such covery Projects, Discovery Early Career Researcher as a researcher’s career stage, academic rank Award and Discovery Indigenous. Canberra: Australian andresearcharea,couldbeusedtoscaleup Research Council. intervention strategies across the University Brown RA, Wolski M and Richardson J (2015) Developing as a whole. new skills for research support librarians. Australian Library Journal 64(3): 224–234. Based on a continuous improvement cycle, the plan European Commission (2016) Guidelines on Data Man- relies on evaluation and adjustment to fine-tune its agement in Horizon. Available at: http://ec.europa.eu/ application. Assuming positive feedback in general research/participants/data/ref/h2020/grants_manual/hi/ throughout the cycle, this would be viewed as a oa_pilot/h2020-hi-oa-data-mgt_en.pdf (accessed 3 May multi-year initiative until such time as critical mass 2016). was achieved. Fear K (2011) ‘You made it, you take care of it’: Data The data collected from this study to date com- management as personal information management. International Journal of Digital Curation 6(2): 53–77. prises audio interview files, transcripts, interview Federer LM, Lu YL, Joubert DJ, et al. (2015) Biomedical notes and data analysis. The data will be restricted data sharing and reuse: Attitudes and practices of clin- because the nature of the information means that par- ical and scientific research staff. PLoS ONE 10(6): ticipants could potentially be identified. The authors e0129506. intend to create a record about the data in the institu- Henty M, Weaver B, Bradbury SJ, et al. (2008) Investigat- tional data repository. ing Data Management Practices in Australian Univer- sities. Australia: APSR. Conclusion Jahnke L, Asher A and Keralis SD (2012) The Problem of Data. Council on Library and Information Resources This study has found that in understanding and apply- (CLIR) Report, pub. #154. Available at: https://www ing a theoretical framework to the data management .clir.org/pubs/reports/pub154/pub154.pdf (accessed 30 practices of researchers in a small research centre, September 2016). 264 IFLA Journal 42(4)
Kennan MA and Markauskaite L (2015) Research data Searle S, Wolski M, Simons N, et al. (2015) Librarians as management practices: A snapshot in time. Interna- partners in research data service development at Griffith tional Journal of Digital Curation 10(2): 69–95. University. Program: Electronic Library & Information Martinez-Uribe L and Macdonald S (2009) User engage- Systems 49: 440–460. ment in research data curation. In: Research and Si L, Xing W, Zhuang X, et al. (2015) Investigation and advanced technology for digital libraries: 13th analysis of research data services in university libraries. European conference, ECDL 2009 (eds M Agosti, J The Electronic Library 33(3): 417–449. Borbinha, S Kapidakis S, et al.), Corfu, Greece, 27 Stanford University Libraries (2013) Stanford Prize for September–2 October 2009, pp. 309–314. Berlin, InnovationinResearchLibraries(SPIRL).Available Heidelberg: Springer. at: https://library.stanford.edu/projects/stanford-prize- Michie S, van Stralen MM and West R (2011) The beha- innovation-research-libraries-spirl/2013-prizes (accessed viour change wheel: A new method for characterising 20 April 2016). and designing behaviour change interventions. Imple- Tenopir C, Allard S, Douglass K, et al. (2011) Data sharing mentation Science 6(42): 1–12. by scientists: Practices and perceptions. PLoS ONE 6(6): National Health and Medical Research Council (2014) e21101. NHMRC’s Policy on the Dissemination of Research Tenopir C, Sandusky RJ, Allard S, et al. (2013) Academic Findings. Available at: https://www.nhmrc.gov.au/ librarians and research data services: Preparation and grants-funding/policy/nhmrc-open-access-policy attitudes. IFLA Journal 39(1): 70–78. (accessed 21 April 2016). Wang M and Fong BL (2015) Embedded data librarianship: National Health and Medical Research Council, Australian A case study of providing data management support for Research Council and Universities Australia (2007) Aus- a science department. Science & Technology Libraries tralian Code for the Responsible Conduct of Research. 34(3): 228–240. Available at: https://www.nhmrc.gov.au/guidelines- Weller T and Monroe-Gulick A (2014) Understanding publications/r39 (accessed 30 September 2016). methodological and disciplinary differences in the data National Science Foundation (2016a) National Science practices of academic researchers. Library Hi Tech Foundation History. Available at: http://www.nsf.gov/ 32(3): 467–482. about/history/ (accessed 21 April 2016). Westra B (2014) Developing data management services for National Science Foundation (2016b) Dissemination and researchers at the University of Oregon. In: Ray J (ed.) Sharing of Research Results. Available at: http://www Research Data Management: Practical Strategies for .nsf.gov/bfa/dias/policy/dmp.jsp (accessed 21 April Information Professionals. West Lafayette, IN: Purdue 2016). University Press, pp. 375–391. O’Reilly K, Johnson J and Sanborn G (2012) Improving Wolff C, Rod AB and Schonfeld RC (2016) Ithaka SþRUS university research value: A case study. SAGE Open Faculty Survey 2015. Available at: http://www.sr. 2(3): 1–13. ithaka.org/publications/ithaka-sr-us-faculty-survey- Peters C and Dryden AR (2011) Assessing the academic 2015/ (accessed 20 April 2016) library’s role in campus-wide research data manage- Wolski M and Richardson J (2015) Improving data man- ment: A first step at the University of Houston. Science agement practices of researchers by using a behavioural & Technology Libraries 30(4): 387–403. framework. In: THETA 2015, Gold Coast, Australia, 11– Piderit SK (2000) Rethinking resistance and recognizing 13 May 2015. Available at: http://hdl.handle.net/10072/ ambivalence: A multidimensional view of attitudes 69141 (accessed 30 September 2016). toward an organizational change. Academy of Manage- World Bank (2010) Theories of Behavior Change. Commu- ment Review 25(4): 783–794. nication for Governance and Accountability Program Quacquarelli Symonds (2015) QS top 50 under 50 2015. (CommGAP). Washington, DC: World Bank. Available at: http://www.topuniversities.com/top-50- under-50 (accessed 10 May 2016). Research Councils UK (2014a) Research Outcomes Over- Author biographies view. Available at: http://www.rcuk.ac.uk/research/ researchoutcomes/ (accessed 20 April 2016). Susan Hickson is a Library Services Manager (Business) at Research Councils UK (2014b) About researchfish. Avail- Griffith University, Australia and a librarian by profession. able at: http://www.rcuk.ac.uk/research/researchout Sue is responsible for leading a team of professional Librar- comes/researchfish/ (accessed 20 April 2016). ians, Learning Advisers and Digital Capability Advisers Research Councils UK (2015) RCUK Common Principles who support academics, researchers and students in the on Data Policy. Available at: http://www.rcuk.ac.uk/ areas of Research, Learning and Teaching. Previously Sue research/datapolicy/ (accessed 20 September 2016). was a Faculty Librarian supporting the Health Group. Schumacher J and VandeCreek D (2015) Intellectual cap- ital at risk: Data management practices and data loss by Kylie Ann Poulton began her library career as a researcher faculty members at five American universities. Interna- in investment banking before joining Information Services tional Journal of Digital Curation 10(2): 96–109. at Griffith University. During her career at Griffith she has Hickson et al.: Modifying researchers’ data management practices 265 held roles in Liaison Librarianship, Project Management and Australia, and has been a lecturer in Library and Informa- Change Management. She has also led Griffith’s Higher tion Science. Recent publications have been centred around Education Research Data Collection team and worked as a library support for research and research data management project officer in the University’s Research Office. She is frameworks. currently Griffith University’s Business Librarian. Malcolm Wolski is the Director, eResearch Services, at Maria Connor is the Business Librarian at Griffith Uni- Griffith University. In his role, he is responsible for the versity Gold Coast Campus, Australia. She holds an MLIS development, management and delivery of eResearch ser- from Victoria University, Wellington, New Zealand. vices to support the University’s research community, which includes the associated information management Joanna Richardson PhD is Library Strategy Advisor at systems, infrastructure provision, data management ser- Griffith University, Australia. Previously she was respon- vices and media production. Malcolm is a part of the senior sible for scholarly content and discovery services including leadership team providing library, information and IT ser- repositories, research publications and resource discovery. vices at the Griffith University and he works closely with Joanna has also worked as an information technology these groups to provide service desk, infrastructure and librarian in university libraries in both North America and outreach services to the research community. IFLA Article
International Federation of Library Associations and Institutions 2016, Vol. 42(4) 266–277 Research data services: An exploration of ª The Author(s) 2016 Reprints and permission: requirements at two Swedish universities sagepub.co.uk/journalsPermissions.nav DOI: 10.1177/0340035216671963 ifl.sagepub.com
Monica Lassi Lund University, Sweden
Maria Johnsson Lund University, Sweden
Koraljka Golub Linnaeus University, Sweden
Abstract The paper reports on an exploratory study of researchers’ needs for effective research data management at two Swedish universities, conducted in order to inform the ongoing development of research data services. Twelve researchers from diverse fields have been interviewed, including biology, cultural studies, economics, environmental studies, geography, history, linguistics, media and psychology. The interviews were structured, guided by the Data Curation Profiles Toolkit developed at Purdue University, with added questions regarding subject metadata. The preliminary analysis indicates that the research data management practices vary greatly among the respondents, and therefore so do the implications for research data services. The added questions on subject metadata indicate needs of services guiding researchers in describing their datasets with adequate metadata.
Keywords Academic libraries, data services, metadata and semantic web, organization of information, services to user populations, types of libraries and information providers
Submitted: 20 May 2016; Accepted: 6 September 2016.
Introduction with researchers, following the Data Curation Pro- Over the recent years a plethora of discussions have files (DCP) Toolkit (Carlson and Brandt, 2014). been taking place in relation to scholarly research data Based on the toolkit, a data curation profile is created and the role of academic libraries and research insti- for each of the interviewed researchers. The profile tutions in providing various services to support may be used by the researcher for research data man- research data management (see, for example, Borg- agement (RDM), and by libraries in developing sup- man, 2015). Libraries that are not yet offering support port for RDM. for research data may be at the stage of developing The outline of the paper is as follows. In the next such services, as is the case of two Swedish univer- section (Background), the context of research data sities in focus of this paper. The libraries of Linnaeus services provided by libraries as seen in professional University and Lund University have chosen to first discussions, project reports and research literature is identify characteristics, requirements, needs and related issues of managing different types of research data produced by researchers employed by the respec- Corresponding author: Monica Lassi, Department of Scholarly Communication at tive universities, which would then serve as a founda- University Library, Lund University, PO Box 3, Lund, 22100 tion to create most appropriate research data Sweden. management services. The study is based on interviews Email: [email protected] Lassi et al.: Research data services 267 provided. This is followed by a section (Methodol- quality RDM systems and related professional sup- ogy) on the methods of the study. The results are port (Andersson et al., 2011). presented and analysed in the next section (Results). In 2014 the Lund University Library undertook an The last section (Discussion) concludes the paper with investigation of the conditions for RDM within a discussion on the implications of the results for Humanities and Social Sciences at the university. It developing research data services. both looked at the organizational aspects for RDM, and brought in researchers’ opinions through a small survey (A˚ hlfeldt and Johnsson, 2014). Background These above mentioned projects were all early, Research data management and services in Sweden exploring projects on RDM, with a focus on data sharing. The conditions for this project with inter- During the past years several RDM projects and views with Data Curation Profiles have been a bit events took place at Swedish universities; many of different, as the debate on open data and RDM have which had been initiated by academic libraries and been more animated the last year. With the Data Cura- archives. There has been a lot of focus on data sharing tion Profiles we have had the possibility to study and open data, and lately also on preservation and researchers’ experiences on a more individual basis archiving, in particular with projects for e-archiving. and close to their everyday work. Several projects have investigated researchers’ At the national level it is the Swedish Research experiences and attitudes regarding research data Council that is the major stakeholder in RDM, active management. In 2007 libraries and archives at Uni- particularly in funding related research infrastructures versity of Gothenburg, Lund University and the in Sweden, such as SND, Environment Climate Data Swedish University of Agricultural Sciences under- Sweden (ECDS), Bioinformatics Infrastructure for took a joint pre-study with the aim of exploring Life Sciences (BILS) and the Max IV Laboratory research data in open archives and university which provides access to synchrotron X-rays. At the archives. The study comprised a survey on research- request of the Swedish government, in 2015 the ers’ attitudes towards publishing research data and Swedish Research Council proposed a national policy looked specifically at the future roles of open archives for open access to scientific information, including and university archives (Bj¨orklund and Eriksson, publications and research data (Swedish Research 2007). The Swedish National Data Service (SND) Council, 2015). The Swedish government’s stance performed a major study in 2008–2009 on practices on the proposal is expected to be made clear in the of open access to and reuse of research data. The fall of 2016 when the research bill is anticipated. The study comprised two surveys targeting professors and policy aims to ensure open access to all Swedish sci- PhD students working at Swedish universities. The entific information that has been fully or partially survey results pointed to the following: funded by public funds, by the year 2025. In Sweden, a number of barriers to sharing research data; research data are owned by the academic or research unresolved issues regarding legal and ethical institution at which the researcher is employed (see aspects; Bohlin, 1997), and not by the researchers themselves lack of resources to make research data as is the case in many countries, such as Finland. available. However, this is not very well known by Swedish researchers; they regard the data they have collected It concluded that researchers need to be trained in as their own (Swedish Research Council, 2015). The relation to research methods, digital research data- proposal suggests a model in which the responsibility bases and accessible e-tools, as well as that funds for archiving and providing access would lie with should be made available for preparing research data different organizations. Academic and research insti- to be shared and archived (Carlhed and Alfredsson, tutions would have responsibility for archiving 2009). research data, and a national facility would be respon- SND participated in a similar project also involving sible for making the research data available by linking university libraries at the University of Gothenburg, to the research data archived at the academic and Lund University, Link¨oping University and Malm¨o research institutions. University, focusing on researchers within Arts and In spite of the absence of an established national Humanities. The participating libraries interviewed policy, there is a relatively strong spirit of profes- researchers about their attitudes towards publishing sional development in RDM at Swedish universities their research data. The project revealed a positive and research infrastructures, as is particularly evident attitude in this respect, and identified the need for from the considerable number of events aimed at 268 IFLA Journal 42(4)
RDM research and training. The primary player is the different fields. The Lund University Library is also Swedish National Data Service (SND). Apart from involved in projects of research infrastructures at the arranging training events in research data manage- university, such as the ICOS Carbon Portal (ICOS ment, SND is also coordinating pilot projects on RDM Carbon Portal, 2016). at several Swedish universities during 2016.
Subject metadata in research data Development of research data services – Linnaeus University Naming and organizing data and relationships among them can have ‘profound effects on the ability to dis- Research data services at Linnaeus University (LNU) cover, exchange, and curate data’ (Borgman 2015: have been in the planning stage since 2015, when a 65). Standardized metadata schemes increase these librarian working as a research strategist at the uni- abilities. While in many scientific areas metadata versity library was chosen to serve as the representa- schemes are well used (although there are still gaps tive for SND. Apart from taking part in SND’s in a number of them), the question is raised here about training events, the LNU library liaised with the Digi- the degree to which subject metadata are standar- tal Humanities Initiative at LNU (Linnaeus Univer- dized. Subject searching (searching by topic or theme) sity, 2016) beginning in February 2016. As one of the is one of the most common and at the same time the original eight pilot projects planned as part of this most challenging type of searching in information initiative, the Humanities Data Curation pilot project systems. Subject index terms taken from standardized was envisioned. It set out to be conducted by the knowledge organization systems (KOS), like classifi- Digital Humanities researchers who would collabo- cation systems and subject headings systems, provide rate with LNU Library SND representative as well numerous benefits compared to free-text indexing: as the Lund University Library, with a particular consistency through uniformity in term format and focus on the survey reported here. The plan for further the assignment of terms, provision of semantic rela- steps is under development. tionships among terms, support of browsing by pro- vision of consistent and clear hierarchies (for a Development of research data services – Lund detailed overview see, for example, Lancaster University Library 2003). In terms of cross-searching based on different metadata schemas using subject terms from different At the Lund University Library, the Department of KOS, challenges like mapping across the KOS need to Scholarly Communication addresses RDM and devel- be addressed in order to meet the established objec- opment of related services. Lund University Library is tives of quality controlled information retrieval sys- one of 26 libraries in the network of Lund University tems like those provided by libraries. Libraries. Whereas most libraries provide a service to Libraries are providing quality subject access of a particular faculty, department or centre, the Univer- resources described in library catalogues through, for sity Library provides a service to the entire University example, classification schemes and subject headings; (Lund University Libraries, 2016). There are a num- however, when it comes to research data, there seems ber of RDM-related projects coming up at Lund Uni- to be a lack of controlled subject terms in metadata versity, and in several of these library staff is schemes. An exploratory analysis of 36 disciplinary involved. The university hosts several research infra- metadata standards from the list provided by the Digi- structures which are working in different ways with tal Curation Centre (2016) in the UK shows that 18 questions of data sharing, data preservation, etc. Dur- (50%) of them to do not provide any subject metadata ing the fall 2015 Lund University Library conducted a field. Of those that do, only a few offer clear guide- survey of the university faculties regarding their lines on the controlled vocabulary use, while others experiences of and attitudes to RDM (Johnsson and leave it to be a freely added keyword. Lassi, 2016). The survey, in which faculty managers Several examples of the former include the and library staff were interviewed, showed that the following: respondents expect RDM to become more important in the future, and that it will become a part of SDAC (Standard for Documentation of Astro- researchers’ work tasks in a more structured way than nomical Catalogues) which provides three cate- at present. The results of the survey also showed the gories of keywords: (1) the ones as in the diversity of research data generated within a full uni- printed publication, (2) controlled ADC key- versity, and verified the need to investigate the char- words (take from controlled sets), and (3) mis- acteristics of different types of research data within sion name, a header which precedes the Lassi et al.: Research data services 269
satellite name for data originating from the sat- information about a particular data set and the ellite mission. researcher’s doings in terms of curating that data set. FGDC/CSDGM (Federal Geographic Data The method consists of an interview template with Committee Content Standard for Digital Geos- questions concerning RDM of a specific data set. The patial Metadata) supports two categories of interviewee is the researcher who has created the data subject metadata: (1) theme keyword the- set and the interviewer is a librarian or from a related saurus, from a formally registered thesaurus profession. Also included in the DCP Toolkit are or a similar authoritative source of theme key- guidelines and instructions to the interviewer as well words, (2) theme keyword, a common-use as instructions to the interviewee. The interviews are word or phrase used to describe the subject of to result in data curation profiles for specific research the data set. areas or projects, and could be useful both to research ClinicalTrials.gov Protocol Data Element support staff as well as to the researchers themselves. Definitions for its keyword element advises use The DCP template comprises 13 modules on RDM of words or phrases that best describe the pro of a specific data set. The questions cover the nature tocol with a note to use Medical Subject Head of the specific data set, such as form, format, size, and ings (MeSH) where appropriate. bring up aspects like sharing, archiving, discovery, organization and description, linking, interoperabil- Several of the others use Dublin Core’s dct: subject ity, and measuring impact. They are formulated in a element, which as in the previous examples allows generic way in order to allow their use in all kinds of free keywords, while recommending the use of con- scientific areas. In order to address metadata and sub- trolled vocabulary. ject access in particular, four questions were added to the interview worksheet in Module 6 – Organization Methodology and Description of Data, each followed by a 5-point The research questions guiding the study are: Likert scale ranging from ‘not a priority’ to ‘high priority’, followed by the ‘I do not know’ option 1. What are the respondents’ current practices (please see Appendix for the modified version of the of research data management? list of questions in Module 6): 2. What are the respondents’ current practices of using subject metadata to describe their data? The ability to apply standardized subject clas- 3. What are the implications for developing sification to the data set. (Please list relevant research data services? controlled vocabularies next to this table); The ability for automatic suggestions of key- The research questions have been investigated words for subject classification; through structured interviews with researchers, during The ability to add your own tags to the data set; which they were asked to fill out the interview work- and sheet of the Data Curation Profiles Toolkit, while the The ability to connect the data set to the key- session was audio recorded. The researcher was asked words of the publications that are based on it. to respond to the questions in the interview worksheet with a specific project in mind on themes such as After the development and launch of the DCP sharing, ingestion to a repository and organization and Toolkit, in 2011 about a dozen workshops on how description of data. The study design is further elabo- to use the toolkit were held across the USA. As a rated below. result, several libraries have constructed data curation profiles which have been published in a public direc- Data Curation Profiles Toolkit tory of the DCP (Carlson and Brandt, 2015). In this exploratory study the Data Curation Profiles In the years since 2011 several projects have used (DCP) Toolkit was used, developed through a the DCP in the development of RDM services research project conducted by the Purdue University (Bracke, 2011; Brandt and Kim, 2014; Carlson and Libraries and the Graduate School of Library and Bracke, 2013; McLure et al., 2014; Wright et al., Information Science at the University of Illinois 2013). The Cornell University Library used DCP Urbana-Champaign (Witt et al., 2009). The DCP when developing a research data registry at the uni- Toolkit is envisioned as an aid to starting discussions versity (Wright et al., 2013). The library performed between librarians and faculty and in the planning of eight interviews with researchers in a wide range of data services that directly address the needs of subject areas. In spite of considerable variety in researchers. The profiles are supposed to give researchers’ priorities, the project team was able to 270 IFLA Journal 42(4) discern similarities among their needs. In another At the start of each interview, the respondents were project at Purdue University, DCP was used to inves- presented with information about the study and the tigate the needs for data curation for researchers interview session, and were asked to read through and within agricultural science (Bracke, 2011). The proj- sign an informed consent form. Following the ect was focused on establishing the role which the research ethics guidelines of the Swedish Research subject librarian in Agricultural Science could take Council (2011) the informed consent form stated that in data curation. The project performed interviews all answers would be anonymized and that no risks of based on the DCP, and the identified data sets were participating in the study could be predicted. Other put in a prototype data repository. information in the consent form included the aims of One example where they used selected parts of the the study and the explanation that the data collected DCP was a project at the Library of Colorado State might be used for scientific publishing and in other University (McLure et al., 2014). They conducted a studies, in which case all data would be anonymized. number of focus group interviews with researchers in The respondents were asked to fill out the DCP order to explore what kind of data sets researchers interview worksheet during the interview session, had, how they managed their data, and what kind of which was also audio recorded. The audio recordings support they would need in terms of RDM. Methodo- served as an aid for the interviewers to capture any logically, they also investigated the feasibility of explanations or discussions that arose from the inter- adapting DCP to focus groups. The findings showed view worksheet questions that might not have been that focus groups may be very useful when investigat- written down in the worksheet. The interview work- ing general conditions for RDM and to spot trends and sheets were scanned and stored along with consent behaviours among researchers. On the other hand, forms and audio recordings in an online collaborative conducting individual interviews using the DCP gives environment, with access allowed only to the three more specific, granular detail on researchers’ data authors of the paper. management (McLure et al., 2014). Note that the DCP Toolkit recommends that two sessions are conducted with each researcher in order to allow for time to learn from the discussions that Data collection arise from the interviews. In the reported study only Data was collected through structured interviews one session per respondent was scheduled in order to which were guided by the DCP Toolkit’s interview avoid the risk of getting fewer respondents because it worksheet, described above. The DCP interview would take too much of their time for anyone to template was translated into Swedish to fit the accept the invitation to take part in the study. Swedish-speaking researchers at Lund University. At Linnaeus University all interviews were held in English, but allowing the respondents to reply in Data processing Swedish if that was their preference – this was The responses from the interview worksheet were because one of the interviewers there was not a transferred into MS Excel. The respondents were native Swedish speaker. The respondents were anonymized in the MS Excel spreadsheet in order to recruited through email advertisements, personal facilitate the reuse of the data set. The audio record- contacts with researchers and a faculty survey (the ings were used to take notes of statements that could latter only at Lund University). be of interest in relation to the research questions, the In total 12 interviews were conducted – five at statements that further explain something from the Linnaeus University and seven at Lund University. interview worksheet or those that were interesting The interviews were conducted between 28 January examples of situations concerning RDM. and 28 April 2016 and lasted between 46 and 119 For each of the respondents at Lund University, a minutes. Respondents were at different stages of their data curation profile was created based on the instruc- research career, ranging from PhD candidate to pro- tions provided by the DCP Toolkit. The audio record- fessor. Further, the respondents were active in a wide ings contributed to creating a set of recommendations range of research areas, such as archaeology, biology, tailored to each respondent concerning their current business administration, film studies, library and and future RDM practices. The recommendations sec- information science, and media and journalism. In tion is a modification of the original data curation this exploratory study we could not cover all disci- profile. The data curation profile was delivered to the plines, e.g. informatics and chemistry. A follow-up researcher with an invitation to a follow-up meeting study with a larger sample size would be able to cover on the DCP. We followed the detailed guidelines and more disciplines. instructions for constructing the data curation profiles Lassi et al.: Research data services 271 from the collected data. Together with the audio questions to the data, or start with the deeper form of recordings, the paper-based interview worksheets analysis. Of the 12 data sets four were bound by pri- provided rich material for creating the data curation vacy or confidentiality agreements, whereas five were profiles. not, and one respondent was unsure.
Data analysis Attitudes towards, and incentives for, sharing data Guided by the research questions, the data analysis Out of the 12 respondents two have deposited the data was started by a simple numeric analysis of the num- in focus in the interview in a repository. Among the ber of responses to each question. These results were other 10 respondents, eight stated that they were will- complemented by analysing the relevant modules of ing to deposit their data, while two were hesitant to the audio recordings, aiming to find any explanatory share their data. The hesitancy was motivated by the statements or comments to the questions and their fact that the data was sensitive, comprising inter- responses. Whereas the numerical data analysis was views, and that the researcher had promised their conducted for all the data, in this paper we focused on respondents confidentiality. Further, 11 of the 12 Modules 3 (Sharing), 5 (Ingestion), 6 (Metadata) and respondents stated that their data would probably be 7 (Discovery) for the qualitative analysis, as these of interest to others, suggesting, for example, public were deemed to in conjunction cover many, but not libraries, media companies, teachers, study partici- all, of the aspects of RDM. pants, activists and non-government organizations. When speculating about the use that others could have Results of their data, respondents suggested addressing new research questions, developing educational pro- As stated, the respondents conduct research in a wide grammes, conducting statistical analysis, and serving range of scientific research areas, using a wide range for informed policy making. As to citation, nine of the of research methods to collect, process and analyse 12 respondents would require a citation or an data. Also, the study is based on a small number of acknowledgement when others used their data. respondents, 12 persons in total. Therefore, the results The first data stage of the data management cycle are presented to show this broad array of experiences was commonly seen as the best stage to share, by 10 and descriptions. out of 12; of these, five would share the data with anyone, and the other five would prefer to share with Research data sets immediate collaborators or others in their field. The tools used by the respondents to generate data Whether the respondents would share the data at the varied greatly and included audio recorder (3), cam- second data stage differed: three respondents stated era (3), questionnaires (2), pen and paper (2), sensors, that they would not share at that stage, while nine field notes and a plethora of software tools. Most of stated that they would share, but predominantly with the tools required to utilize the respondents’ data are immediate collaborators or researchers in the same proprietary software that requires a purchase to use, field (5). The hesitancy to share data at this stage such as Excel (4/12), SPSS (2), NVivo (1), Word (1) could perhaps be explained by the processing or initial and MATLAB (1). The R software environment and analysis done to the data, having moved from raw data programming language and the Genome browser are to another state which could possibly be more difficult two examples of tools used, available under licences for others to understand or use. Of the 12 respondents allowing free use for academic purposes. eight indicated that they would share their data after The DCP interview guide asked the respondents to they had published results. identify the stages that their data had passed through As stated above, in Sweden research data are during their research project. All respondents identi- owned by the academic or research institution at fied at least two stages, commonly three. The first which the researcher was employed at the time of the stage typically concerns raw data collected or data collection. Of the 12 respondents five reported obtained by the respondents. The second stage typi- knowing that their employers owned the data, cally concerns some type of processing, e.g. quality whereas the answers to this question among the other checking or cleaning the data, or preliminary analysis, seven respondents varied to include the researcher (4), e.g. first coding. In total 11 out of 12 persons declared the research community (1) and the general public (1). they had a third data stage. In the third data stage the The research funders of the data collected in the data are often in a second form of analysis format, and respondents’ studies generally do not require data to in this phase the researchers start posing their research be shared or deposited in a repository (11/12), while 272 IFLA Journal 42(4) one person stated that there was such a requirement publications, they might not know the name for this but that it would be illegal to share the data due to activity. Respondents who elaborated on their privacy protection. responses in the worksheet indicated that subject clas- The ability to see usage statistics of their data sets sification is important for others to find their data sets had a high priority (5/12) or medium priority (5) for and that standardization of the classification is needed the majority of the respondents. The majority of the to ensure findability. Also, a respondent described the respondents indicated that the ability to gather infor- importance of standardized KOS in archaeology, as mation about the people who have used the data set places have changed names throughout history. would be of high (6/12) or medium priority (3), and a The ability for automatic suggestions of keywords few that it would be of low priority (3). Suggestions of for subject classification was also deemed as high to other metrics or analytics that could be of interest to medium priority (high priority 6/11, medium priority the respondents include citations (4), seeing publica- 3, low priority 2). Out of the six respondents who tions based on the data set, and the country, academic rated this as having a high priority, four stressed the or research institution, and time at which a data set has importance of them being able to make the final deci- been used. sion as to which suggestions to add to the data set. One respondent noted that automatic suggestions of keywords would be a good idea to start to think about Metadata classification; they generally do not think about key- The respondents organized and described their data words before it is required to add them to a sets in a wide variety of ways, including metadata publication. schemes, codebooks, a paper filing system using plas- Ability to add one’s own tags was considered the tic folders ordered geographically and by topic, and top priority (high priority 9/11, medium priority 2) tables in a MS Excel file. Two respondents used stan- regarding subject metadata features. One respondent dardized metadata schemes: one scheme is provided justified the high priority by explaining that the devel- by a Swedish national infrastructure, and one is a opment of the research area moves quicker than the domain-specific taxonomy. One respondent reported classification systems, so tagging as a complement to using a particularly wide range of different descrip- standardized subject classification would increase tions to cover the needs of others who may be inter- findability and show the data set’s uniqueness. ested in the data, including: a standardized metadata Another respondent suggested that tags could be use- scheme coupled with their own classifications to ful to inform others about anomalies that might be describe instruments and parameters measured, tech- interpreted as errors but are actually just outliers. nical specifications of, for example, how often mea- Ability to connect the dataset to the keywords of surements were conducted, and a verbal description of the publications based on it was also considered how the data had been collected and processed. rather important (high priority 5/11, medium priority The majority (7/12) considered that the existing 3, low priority 1, N/A or I do not know 1). Among organization and description are sufficient for others the more hesitant respondents, one responded that it to understand and use the data. One respondent noted would just be confusing, and another one stated that that there is a risk that codebooks are too detailed and everything that can be automated does not have to be complex for someone else to read, which could automated in its own right; choosing keywords one- impede the understanding and correct use of the data. self could add a level of reflection to the process. The ability to apply standardized metadata from Having a more positive attitude towards this, one the respondents’ own fields was considered important respondent stated that the importance of this ability by most (high priority by 8/11, medium priority by 2, is more or less self-evident. low priority by 1). The ability to apply standardized subject classification to the data set was also consid- ered of medium to high importance (high priority 5/ Data ingestion into a repository 11, medium priority 5, low priority 1). All (12/12) respondents reported on the need to pre- Most (10) respondents requested more information pare the data for their ingestion into a data repository about what was meant by the question ‘[t]he ability to to some extent, ranging from what the respondents apply standardized subject classification to the data viewed as hardly any work at all (2) for raw data, to set’. This could indicate that the question is designed a lot of work (1). Activities needed were reported with library and information science jargon; while to be related to making the data set understandable researchers may be required to add keywords and to others by preparing codebooks and metadata; fill- other subject classification when submitting ing out gaps or fixing errors in the data; digitizing Lassi et al.: Research data services 273 materials that are on paper; and anonymizing identi- the distribution was three high priority, three medium, fying information from interviews. One respondent four low, two not a priority; for discovery via a search expanded on the process of making the data set under- service like Google the distribution of answers was standable through adding sufficient metadata to them, five high priority, two medium, three low, two not a stating that it takes a lot of time and effort, and needs priority. to be balanced with how much work is needed for the The replies varied greatly as to how the researchers data to be understandable and usable. envision the users would use the data, and included Most respondents (7/12) expressed high priority for various types of discovery services, ranging from the ability to submit the data by themselves, while a Google, databases like repositories or library catalo- few (4) thought it was low priority. Two respondents gues, integrated cross-database search, to defining stated that preparing and submitting the data should interfaces such as access through metadata, datasets be done by experts, one of them suggesting the library divided into subject areas, search box with keywords, as a potential service provider. A respondent who classification-based browsing interface. expressed high priority to submitting the data them- The ability to connect data sets to visualization or selves explained that they wanted control of the inges- analytical tools was highly desired by most respon- tion process. dents (7/12), but had a low (2) or no priority (1) for a As to the automatized submission process, the opi- few others. Allowing others to comment on or anno- nions ranged from ‘it is not possible’ (1) and ‘not a tate their data sets had varying importance from high priority’ (5), to ‘medium priority’ (2) and ‘high pri- (4/12) and medium priority (3), to low (2) or no pri- ority’ (4). Two respondents stated that they would ority (3). appreciate it as a time-saving factor and the fact that Regarding the ability to connect data with publica- it would be possible only for some types of data such tions and other research output, most respondents as those that do not need to be anonymized, or partic- reported a high (5/12) or medium priority (4), with a ularly secured for privacy reasons. One respondent few giving this a low priority (2), or did not know (1). stated that they were generally in favour of automat- Support for web service APIs to give access to their ing processes, but for these purposes it seems a bit data was seen as being either of high priority (4/12) or risky in case there are errors in the data set that they being of low priority (3). Three respondents have not yet had time to fix before the automated responded that they did not know or that this was not submission process starts. applicable for their data. The ability to connect or The possibility of batch upload to repository was merge their data with other data sets was again considered high priority by six of 12 respondents, reported as either a high (4/12) or medium priority medium priority by a few (2), and low and no priority (5), or not a priority at all (3). by one person each; in addition, two people chose the N/A/I do not know option. As above, the time factor was reported as an important factor for batch upload, Data preservation as well as creating a collection of all the data (thou- Which parts of the data set that would be most impor- sands of documents, photos and notes) in one set. tant to preserve over time varied greatly. Some respon- dents (4 out of 12) responded that everything is equally important, including documents from funders, project Discovery and use of the data descriptions, data at different stages of the research The ability for researchers within the same scientific process and publications. Other respondents (4) indi- area to easily find the data was ranked as high priority cated that it would be sufficient to preserve the raw by all respondents, although one referred to metadata data. Lastly, some respondents (4) responded that one about the data, rather than data itself due to high con- or a few stages of the processed data would be most fidentiality. Possibility for researchers outside the sci- important to preserve. The time period for which the entific area at hand to easily find the data was also respondents estimated that their data would be useful considered a high priority (10 high priority, 2 or valuable to others ranged from indefinitely (5) to medium; of the high 1 answered for metadata in the 10–20 years (3), 5–10 years (1), 3–5 years (2). One same context as for the previous question). The pos- person indicated that they did not know. sibility for the general public to easily find the data The ability to audit datasets to ensure structural and the possibility to discover them via a search ser- integrity long-term was indicated to be of either high vice was also considered important; on average it was (6/12) or medium priority (3). Two respondents less important than the first two possibilities to make responded that this had a low priority, and one that the data available to researches. For the general public they did not know. 274 IFLA Journal 42(4)
Migrating datasets into new formats over time had including setting embargos on data sets when meta- a high (8/12) or medium priority (3), whereas only data could be available, though the data themselves one respondent indicated that it had a low priority. are not. Similarly, the need for a secondary storage site for As to data ingestion, the fact that most respondents datasets was indicated to be of high (8/12) or medium need to prepare the data for their ingestion into a data (2) priority, one respondent responding that it was not repository means that in order to save the time of the a priority, and one respondent indicating that they did researcher, a data management plan would need to be not know. Having the secondary storage site at a dif- in place, which could help the researcher plan for the ferent geographic location was not as prioritized, ingestion already in the planning phase of a project. responses varying from high (5) to medium priority Similarly, services from the library could be provided (4), one respondent giving it a low priority, and one to support both the planning and the ingestion, which responding that they did not know. was suggested by a few respondents. Combining the possibility for researchers to submit data to a reposi- tory themselves with a service in which librarians Concluding discussion deposit data on the researcher’s behalf would be use- During the interviews, the respondents shared their ful, to facilitate sharing for researchers who want con- experiences with collecting, processing, managing trol over the ingestion phase, as well as those who do and storing data in their everyday work. They could not have the time to do this themselves. By providing easily give examples of situations that arise in their training in data ingestion, researchers could become RDM practices and procedures that they take for more and more self-sufficient in sharing their data their data. However, they had not always reflected themselves. upon why they worked in a certain manner and if As to data organization and description, services to there were other ways of managing their data, for support organization and description of data in order example concerning going about long-term preser- for others to use them, need to be provided. Availabil- vation and sharing of data. As expected, some ity in multiple formats should also be supported. Stan- respondents had more elaborate and thorough proto- dardized metadata, controlled vocabularies, cols for RDM, predominately in areas that require automated suggestions of keywords in general and meticulous protocols such as environmental research automated assignment of keywords from the related based on observational data. publication(s) in a repository, ability to add own key- Generally the respondents are positive towards the words should all be supported. Providing training to idea of sharing research data, and they show a clear researchers in how to use metadata to correctly awareness about it. For most of the researchers, shar- describe data also seems to be a valuable service. ing and exchanging data with research colleagues is Such training could include the different characteris- very common and natural, in particular when colla- tics and uses of subject metadata derived from con- borating with others. This could possibly include trolled vocabularies compared to add user-generated sharing data for educational purposes, as open educa- tags, and how to use the correct terms for the intended tional resources. If there are sensitive aspects to the audience of the data set. research data, the researchers are also aware of this, As to funders, there still does not seem to be a not least if they have had to get permissions from an policy requiring them to draft a data management ethics board to collect the data. Concerning which plan, to share publish or deposit data in a repository, types of research data to share, it seems least proble- and in a majority of cases to preserve the data beyond matic for the respondents to share data from the first the life of the funding. However, this may change data stage, e.g. raw data. It seems the respondents are soon depending on the Swedish government’s deci- more hesitant about sharing data which has undergone sions regarding the policy suggested by the Swedish some initial processing or analysis, corresponding to National Research Council, so it is a good time to start the second or third data stage. Those who are willing planning for these services. Provision of authorized to share data from the second or third data stage would access only would also need to be ensured, as a con- prefer to share with immediate collaborators or siderable portion of datasets may be bound by privacy researchers in the same field only. There is a concern or confidentiality concerns. that the ‘early processed’ data may not be properly A long-term preservation policy, whether local or analysed or interpreted by an external researcher. In national, needs to take into consideration the these context libraries and other research support ser- resources required to archive vast amounts of data vices may play an active role in providing guidance created at universities in Sweden. Although archives on how to share data in trustful and secure ways, typically require some selection of resources to be Lassi et al.: Research data services 275 archived, the respondents saw great value in all their References materials being archived. It may also be relevant to Andersson U, Alfredsson A, Arvidsson S, et al. (2011) preserve auxiliary resources related to a research data Projektet Forskningsdata inom humaniora och set, including data collection instruments, software konstnarliga¨ vetenskaper - Open access? Projektrap- and databases used and created during different stages port till Kungl. biblioteket, Programmet for¨ OpenAc- of the research process. This could be addressed in cess.se [The Project Research Data in the Humanities future research. and Arts – Open Access? Project Report to the Using the DCP Toolkit to gather data about National Library of Sweden, Programme for OpenAc- researchers’ RDM and needs for service development cess.se]. Available at: https://gupea.ub.gu.se/bitstream/ 2077/29112/1/gupea_2077_29112_1.pdf (accessed 9 was a valuable experience. Based on the interviews September 2016). we created data curation profiles that can provide Bj¨orklund C and Eriksson J (2007) Forskningsdata i valuable guidance for the researchers and the results oppna¨ arkiv och universitetsarkiv: en forstudie¨ vid of the interviews will be considered in our further Goteborgs¨ universitet, Lunds universitet och Sveriges research data service developments. In some of the Lantbruksuniversitet.Projektrapport till Kungliga bib- interviews, the work sheet responses were quite brief, lioteket, Programmet for¨ OpenAccess.se [Research and thus hard to translate into something meaningful Data in Open Repositories and University Reposi- in the data curation profile. The audio recordings of tories: A Pilot Study at Gothenburg University, Lund the interviews provided extra information what University and Swedish University of Agricultural helped to make a richer profile. Also, we added an Sciences. Project Report to the National Library of extra section to the data curation profile with perso- Sweden, Programme for OpenAccess.se]. Available nalized recommendations focusing on the research at: https://gupea.ub.gu.se/bitstream/2077/7379/1/for skningsdata_rapportKB.pdf (accessed 9 September data management of future research projects. This 2016). was a particularly appreciated part of the data curation Bohlin A (1997) Offentlighet & sekretess i myndighets profile, according to communication and follow-up forskningsverksamhet [Public Disclosure and meetings with the respondents. As the construction Secrecy in Government Research]. Stockholm: of data curation profiles was a rather cumbersome Riksarkivet. process, we would want to simplify the process if Borgman CL (2015) Big Data, Little Data, No Data: Scho- we were to introduce data curation profiles as a reg- larship in the Networked World. Cambridge, MA: MIT ular service. This would require further investigation Press. into the benefits and challenges of the DCP Toolkit Bracke M (2011) Emerging data curation roles for librar- from the researchers’ perspective as well as the ians: A case study of agricultural data. Journal of Agri- library’s perspective. According to the website, the cultural and Food Information 12(1): 65–74. DOI: 10. DCP Toolkit is in the planning stages of redesign, 1080/10496505.2011.539158. Brandt SD and Kim E (2014) Data curation profiles as a so the profile construction process may be made sim- means to explore managing, sharing, disseminating or pler in the future. preserving digital outcomes. International Journal of The next step of this project is to continue to ana- Performance Arts and Digital Media 10(1): 21–34. DOI: lyse the qualitative data, i.e. the recordings of the 10.1080/14794713.2014.912498. interview sessions, which will provide complemen- CarlhedCandAlfredssonI(2009)SwedishNationalData tary information to the DCP Toolkit questions, includ- Service’s strategy for sharing and mediating data: Prac- ing explanations to the responses to the worksheet tices of open access to and reuse of research data - the questions. Lund University Library is in the planning state of the art in Sweden 2009. In: IASSIST/IFDO stages of a study evaluating a tool for data manage- 2009, Mobile data and life cycle. IASSIST’s 35th ment plans, which will complement the results from annual conference, Tampere, Finland, 26–29 May this study concerning, for example, file types, 2009. Available at: http://uu.diva-portal.org/smash/ volumes of data and needs for storage solutions. get/diva2:396306/FULLTEXT01.pdf (accessed 9 September 2016). Carlson J and Bracke MS (2013) Data management Declaration of Conflicting Interests and sharing from the perspective of graduate stu- The author(s) declared no potential conflicts of interest dents: An examination of the culture and practice with respect to the research, authorship, and/or publication at the Water Quality Field Station. portal: Libraries of this article. and the Academy 13(4): 343–361. DOI: 10.1353/pla. 2013.0034. Funding Carlson J and Brandt SD (2014) Data Curation Profiles. The author(s) received no financial support for the Available at: http://datacurationprofiles.org/ (accessed 9 research, authorship, and/or publication of this article. September 2016). 276 IFLA Journal 42(4)
Carlson J and Brandt SD (eds) (2015) Data Curation Pro- Author biographies files Directory. Available at: http://docs.lib.purdue.edu/ Koraljka Golub is a researcher in the field of digital dcp/ (accessed 9 September 2016). libraries and information retrieval. Her research has in Digital Curation Centre (2016) Disciplinary Metadata. Available at http://www.dcc.ac.uk/resources/metadata- particular focused on topics related to knowledge organi- standards (accessed 9 September 2016). zation, integrating existing knowledge organization sys- ICOS Carbon Portal (2016) ICOS Carbon Portal. Avail- tems with social tagging and/or automated subject able at: https://www.icos-cp.eu (accessed 9 September indexing, and evaluating resulting end-user information 2016). retrieval. She is currently running a Digital Humanities Johnsson M and Lassi M (2016) Hantering av forsknings- initiative at Linnaeus University, and is also exploring data vid fakulteterna inom Lunds universitet – en establishment of an iSchool at the same university. She lagesbeskrivning¨ hosten¨ 2015: Rapport av ett projekt works as an Associate Professor at the Department of utfort¨ vid Universitetsbiblioteket [Research Data Man- Library and Information Science, School of Cultural agement at the Lund University Faculties – A Descrip- Sciences, Linnaeus University. She also serves as an tion of the Current Status, Fall 2015: Report of a Adjunct at the School of Information Studies, Charles Project at the Lund University Library]. Lund Univer- Sturt University, Australia. For over five years in the sity Library. Available at: http://portal.research.lu.se/ portal/files/5616081/8725586.pdf (accessed 9 Septem- recent past she worked as an information science ber 2016). researcher at UKOLN, University of Bath, UK. Her edu- Lancaster FW (2003) Indexing and Abstracting in Theory cational background covers technology, social sciences and Practice. 3rd edn. Champaign, IL: University of and humanities. In 2007 she obtained her doctoral degree Illinois. from the Department of Information Technology, Lund Linnaeus University (2016) Digital Humanities. Available University, Sweden, on the topic of automated subject at: https://lnu.se/en/digihum (accessed 9 September classification. Her earlier degrees are from the Depart- 2016). ment of Information Sciences and the Department of Eng- Lund University Libraries (2016) Organisation. Available lish, University of Zagreb, Croatia. at: http://www.lub.lu.se/en/about-the-library-network/ organisation (accessed 9 September 2016). Maria Johnsson is a librarian specializing in research McLure M, Level AV, Cranston CL, et al. (2014) Data support services in the Section of Scholarly Communi- curation: A study of researcher practices and needs. cation at Lund University Library. She has a special Libraries and the Academy 14(2): 139–164. DOI: 10. 1353/pla.2014.0009. focus on research data management and e-science, and Swedish Research Council (2011) God forskningssed on how libraries may develop services within research [Research Ethics]. Available at: https://publikationer. data management. Before joining the University Library vr.se/produkt/god-forskningssed/ (accessed 9 Septem she had a position at the Library of Faculty of Engi- ber 2016). neering, Lund University. She also has experience in Swedish Research Council (2015) Forslag¨ till nationella working with library and information services at differ- riktlinjer for¨ oppen¨ tillga˚ng till vetenskaplig information ent companies. She has a Master’s in Library and Infor- [Proposal for National Guidelines for Open Access to mation Science, combined with studies in Modern Scientific Information]. Available at: https://publika Languages. tioner.vr.se/produkt/forslag-till-nationella-riktlinjer-for- oppen-tillgang-till-vetenskaplig-information/ (accessed Monica Lassi is a librarian focusing on research data man- 9 September 2016). agement and research infrastructures, working at the Witt M, Carlson JD, Brandt SD, et al. (2009) Construct- ing data curation profiles. International Journal of Department of Scholarly Communication, Lund University Digital Curation 4(3): 93–103. DOI: 10.2218/ijdc. Library. She also works for the research infrastructure v4i3.117. ICOS Carbon Portal, a pan-European research infrastruc- Wright SJ, Kozlowski WA, Dietrich D, et al. (2013) Using ture collecting and disseminating quality-controlled obser- data curation profiles to design the datastar dataset reg- vational data related to greenhouse gas. She is a member of istry. D-Lib Magazine 19(7/8): 37–49. Available at: the research group Information Practices: Communication, http://www.dlib.org/dlib/july13/wright/07wright.html Culture and Society at the Department of Arts and Cultural (accessed 9 September 2016). Sciences, Lund University. Before joining the Lund Uni- ˚ Ahlfeldt J and Johnsson M (2015) Research Libraries and versity Library in 2014, Monica worked as a lecturer at the Research Data Management within the Humanities and Swedish School of Library and Information Science for 14 Social Sciences: Project Report. Lund: Lund University years, teaching knowledge organization, content manage- Library. Available at: http://portal.research.lu.se/portal/ ment and interaction design. Monica’s educational back- files/6286782/5050466.pdf (accessed 9 September ground includes library and information science, 2016). Lassi et al.: Research data services 277 informatics and language technology. She holds a PhD in a socio-technical approach to the design of a collaboratory Library and Information Science from the University of for Library and Information Science’. She also holds an Gothenburg, Sweden. The title of her doctoral thesis, MSc in Library and Information Science, which includes defended in 2014, is ‘Facilitating collaboration: Exploring studies in systems architecture.
Appendix Questions added to the original interview worksheet are indicated with an asterisk (*)
Module 6 – Organization and Description of Data 3. Please prioritize your need for the following types of services for your data.
Not a Low Medium High I don’t know priority priority priority priority or N/A
The ability to make the data accessible in multiple formats. The ability to apply standardized metadata from your field or discipline to the dataset. * The ability to apply standardized subject classification to the data set. (Please list relevant controlled vocabularies next to this table) * The ability for automatic suggestions of keywords for subject classification. * The ability to add your own tags to the data set. * The ability to connect the data set to the keywords of the publications that are based on it. IFLA Article
International Federation of Library Associations and Institutions 2016, Vol. 42(4) 278–283 ‘Essentials 4 Data Support’: Five years’ ª The Author(s) 2016 Reprints and permission: experience with data management training sagepub.co.uk/journalsPermissions.nav DOI: 10.1177/0340035216674027 ifl.sagepub.com
Ellen Verbakel TU Delft, Netherlands
Marjan Grootveld Data Archiving and Networked Services (DANS), Netherlands
Abstract This article describes a research data management course for support staff such as librarians and IT staff. The authors, who coach the participants, introduce the three course formats and describe the training in more detail. In the last years over 170 persons have participated in this training. It combines a wealth of online information with face-to-face meetings. The aim of the course is to support the participants in strengthening various skills and acquiring knowledge so they feel confident to support, advise and train researchers. Interaction among the students is embedded in the structure of the training, because we regard it as a valuable instrument to develop a professional network. Recently the course has taken on a new challenge: in addition to the regular courses a couple of in house trainings have been delivered on request. The paper ends with a description of the key group assignments for such compact trainings.
Keywords Blended learning, data education, data literacy, data support, information skills, training library staff
Submitted: 15 May 2016; Accepted: 12 September 2016.
Introduction to ‘Essentials 4 Data Support’ 4 Data Support course aims to contribute to professio- At the end of 2011 a ‘Data Intelligence 4 Librarians’ nalization of data supporters and coordination course was developed by 3TU.Datacentrum to pro- between them. Data Supporters are people who sup- vide online resources and training for digital preser- port researchers in storing, managing, archiving and vation practitioners, specifically for library staff. The sharing their research data’ (Essentials 4 Data Sup- preparation, design and mission of this training is port, 2014). The training setup as well as all training described in De Smaele et al. (2013: 218): ‘The course content was thoroughly revised, as described in objectives are to transfer and exchange knowledge Grootveld and Verbakel, (2015). about data management, and to provide participants As of May 2016, more than 170 data supporters with the skills required to advise researchers or from Dutch universities, Higher Vocational Educa- research groups on efficient and effective ways of tion organisations, academic hospitals and other adding value to their data’. knowledge institutes have participated in the face- Lessons learned during these training courses and to-face training that Research Data Netherlands developments in the research data management land- provides. scape have led to a revision in 2014. By then, the We are aware that the term ‘data supporter’ is training had become the flagship service of Research unusual, but exactly this makes it inclusive: The Data Netherlands (RDNL). RDNL is a coalition in the field of long-term research data archiving, consisting Corresponding author: of 4TU.Centre for Research Data, Data Archiving and Ellen Verbakel, TU Delft Library, PO Box 98, NL-2600, MG Delft, Networked Services (DANS) and SURFsara. The Netherlands. goal of the training was redefined as: ‘The Essentials Email: [email protected] Verbakel and Grootveld: ‘Essentials 4 Data Support’ 279 course appeals to (aspiring) information profession- Set-up of the blended ‘Essentials’ course als, data stewards, data librarians, research support During the six weeks of a full course the coaches and officers and others. The intentionally unusual term the students have a very intensive contact. The official saves a lot of discussion about job titles and defini- start is on the first face-to-face day. Here the students tions in a highly dynamic field. meet their fellow students and the coaches. The coa- In this paper the coaches present ‘Essentials 4 Data ches inform the students about the course, what will Support’, one of the few courses on data management be asked from them and what the coaches will deliver. that explicitly focus on data supporters. In early 2016 In a game setting the students discuss several aspects Knowledge Exchange organised a survey and work- of data support – differences in the maturity of data shop to collect and share information on current prac- management in their respective organisations start to tices around RDM training. One of the conclusions in show. Also on the agenda are the first expert presen- the survey report is: ‘It is striking that a near totality of tations. The coaches have a broad network of special- respondents provides training for PhD students ( ...) ists working in the field of research data management. librarians, data curators and IT departments/computer These experts are invited to give a presentation on centres are catered for by a third or less of respon- their daily work in the field: what problems occur, dents’ (Goldstein, 2016: 7). what worked well, what did not? What experiences In the next section the three training variants are can they share with the students? In general, the stu- introduced. This is followed by sections on the setup dents appreciate these presentations very much; they of the blended learning course and on the intended ask many questions which feed the discussion and the competencies and learning goals. After a sketch of subsequent assignments. Data management planning the course website the authors describe how the reg- is a standard topic on the first day, to prepare the ular blended learning course enables them to also participants for their first assignment, viz. to write a accept invitations for tailor-made in-house training. data management plan. A very important part of the course is the commu- nication among the students and with the coaches. A Training variants private forum is provided for the group on the course Since the first training in 2011 the course materials website. Students have to upload their assignments have been publicly available on the website, both in and are required to give feedback on the work of their English and in Dutch. The current offering consists of fellow students. This helps them to explore different three variants: angles and approaches. Furthermore, questions can be posted on the forum about the course and/or about the The full – blended – course consists of two content. Because the forum is restricted to the group face-to-face days, with fellow students, experts the students usually feel safe to do so. This can lead to and coaches. In the six weeks between the first very lively discussions on various aspects of data and the second day students familiarise them- management. The coaches use the group forum too selves with the content of the online learning for their comments on the weekly assignments and to environment, do assignments, and give feed- answer questions. back on the assignments of their fellow stu- The second face-to-face day again contains one or dents. For this they use the group’s private two guest lectures, but a substantial part of this day is forum. Students pay a fee for the full course. reserved for student presentations. One of the assign- In the Onlineþ course, registered students have ments, spanning the full six weeks, is to follow recent access to the online course materials. They are developments in one of the fields of research data not entitled to support from coaches, but are management close to their personal learning goals and welcome to contribute to discussions in the to share some findings with the group. That gives an open forum. overview of the subjects they learned about and how Online-Only students can take the course at they approached the task – varying from using general their own pace, without access to social search engines to subscribing to data and research features. blogs, mailing lists and twitter streams. From their colleagues’ presentations on the information- The next section describes the set-up of the blended gathering process and the achieved results every stu- course. Recently, we have accepted a few requests for dent learns a lot on recent topics. in-house training. Such training is bespoke and The final task at the second day, called ‘good inten- reduced variations on the full course and will be tions’, aims at the immediate future: The students described in the section ‘Recent developments’. 280 IFLA Journal 42(4) must formulate data-related activities that they will the Interest Group on Education and Training from carry out once they return home, e.g. design a data the Research Data Alliance (2013) is also defining the support workflow, discuss institutional data policies competencies needed for handling research data. For with their manager, or localise data teaching materials our purposes it is important to keep in mind that for young researchers. The activities are ideally for- RDNL intends to offer basic knowledge and skills mulated as small steps towards a concrete goal that to support staff; a project like EDISON (2015), for can be reached in the next six months. At that time the example, which is developing a curriculum for data coaches contact the former students to ask what has scientists, has clearly higher ambitions. been achieved. The students appreciate the interest, as they recognise the purpose to keep the flow of the course going during daily work, and to really put into Learning goals practice what they found inspiring. The name for the course – ‘Essentials 4 Data Support’ – refers to the main goal of the course: teaching the basic Competencies knowledge and skills (essentials) to enable a data sup- In the process of developing as well as during revising porter to take the first steps towards supporting the course the following competencies were defined researchers in storing, managing, archiving and sharing to be essential for the future data supporters. A data their research data. Some learning goals are: supporter: To get an overview of the data supporter’s field of work, including the research life cycle and the orga- Skillfully handles ICT: Efficiently uses avail- nisation of data support, e.g. in the form of the able information technology. In the last RDNL front office – back office model (Dillo and decades research data became digitally born, Doorn, 2014): and ICT knowledge becomes even more important. To understand the different parts of a data man- Shows entrepreneurship: Proactive attitude to agement plan; improve data services in response to changing To know about the various ways to store, needs in the field. Keeps an eye on trends backup, organize and document research data; which emerge in the profession, knows where To know types of archives, data publication knowledge is available (networks) and dissemi- and data citation; nates important information to key people in To advise researchers in balancing legislation the organisation. and practice; Sees from the whole: Acknowledges that data To be able to engage in a discussion with are only part of the scientific lifecycle and is researchers. aware of the significance research data have or The complete set of learning objectives of the six carrying out scientific research. Sees data and chapters can be found in (Grootveld and Verbakel, information services as part of larger whole in 2015). which decisions are made. An implicit goal, which for us has been dominant Has consulting skills: Can handle questions from the start, however, is that participants gain self- skillfully and knows when to address a dedi- confidence in discussing research data with research- cated expert. Can empathise with customer ers. The training has been developed to encourage perceptions. them in their professional dealings, by providing the Has co-operative skills: Examines how colla- knowledge to be a partner to the researcher. Strictly boration with others (employees, researchers, speaking this concerns the participants’ attitude rather institutions) may enhance service provision. than their knowledge or skill set. The DigCurV project has developed the DigCurV Curriculum Framework (DigCurV, 2013) which offers a means to identify, evaluate and plan training The ‘Essentials’ website to meet the skill requirements of staff engaged in In 2014 the website was redesigned by an external digital curation, both now and in the future. This developer. The platform for the course is T3Elearn- framework has inspired us when we redesigned the ing, a Learning Content Management Framework training, and the DigCurv game is still part of based on Typo3 CMS. The content has been the face-to-face training (see section: Setup of the designed and developed with valuable input from blended ‘Essentials’ course). A more recent initiative, subject matter experts within and outside RDNL, Verbakel and Grootveld: ‘Essentials 4 Data Support’ 281
Figure 1. User interface of Essentials 4 Data Support, Chapter II. of whom several also give a guest lecture at one of contain a large amount of external links to good prac- the training days. tices, definitions and reading materials provided by The website changed into a more user-friendly other organisations. This makes the site a valuable design. This was achieved by including more work of reference – the video clip about data manage- images and some videos with brief talks by ment planning has been viewed 4500 times – and the researchers and ‘how to’ videos about data manage- students rate the website highly in their evaluations. ment planning, data sharing concerns, reproducible The assignments and quizzes in the left-hand menu in research, data citation and persistent identifiers. Figure 1 are only accessible to participants who have Alsonewinthereviseddesignaretheendofchapter enrolled in the full course. The private forum, which quizzes, which allow for a quick recapitulation of these students and the coaches use between the face- the chapter’s contents. to-face meetings for assignments and discussion, is Whereas the initial training featured chapters like part of this website and remains accessible afterwards ‘Technical skills’ and ‘Advisory skills’ of the data for future knowledge sharing. supporters, in the current version the research cycle is leading, rather than the activities of the data sup- porters. This can be seen at the top of Figure 1, where Recent developments the planning phase precedes the research phase, as In the previous year we initiated two new develop- well as the phase in which researchers make their data ments, which we describe in this section. The first available for reuse. Each chapter starts with a short initiative served to strengthen the contacts with and outline of the sections and learning objectives of that among former course participants, while the second particular chapter. In addition to RDNL’s own content one can be seen as a spin-off of the regular blended (available under a CC-BY-SA licence) all chapters training. 282 IFLA Journal 42(4)
Last autumn the coaches organised a reunion for all (DMP) and with identifying the stakeholders in data participants from ‘Data Intelligence 4 Librarians’ and management. With regards to the former, participants ‘Essentials 4 Data Support’. The response was quite have to draft a DMP for a given, fictitious project. good: About 20% of the students, from both course This is an assignment that students do in pairs (in the iterations, accepted the invitation to discuss their regular training) or in small groups (in abridged train- daily work in RDM. They were surprised they had ings), because it is relatively demanding. However, it so much in common, although they represented very is usually the most appreciated assignment: all parti- different institutions. The key speakers at the meeting cipants know that research funders and research insti- were former students, who now have become special- tutes increasingly require DMPs, but they seldom ists, very able and willing to share their experiences. have any practical experience with them at the start With regard to our goal to increase the data support- of the training. Having written a DMP themselves, ers’ professional self-confidence, this meeting con- they feel more comfortable at the prospect of advis- vinced us of the longer-term value of our training ing researchers. In the stakeholder exercise students offer, in a way that even positive evaluations at the have to chart all staff roles, departments and organi- end of a training day cannot do. sations which also play a role in data management, The second development concerns in-house train- and to reflect on the quality of all communication ing, on request. In earlier days RDNL had received and collaboration: ‘Do I know who to contact some requests about in house training for (prospec- when ...and how do I make sure that department tive) front office staff, but decided against it because XYZ informs me about ...?’ We regard both exer- we value the networking possibilities of a mixed audi- cises as crucial stepping stones for exploring the ence even higher than the intended team-building diversity of research data management, and this has effect. However, two recent requests did concern worked out well in both in-house training courses. mixed audiences from the start. First, a Dutch institute The two in-house courses also featured an item on for higher vocational education was looking for RDM storing and archiving research data, and for the training for support staff and researchers, and eventu- Danish train-the-trainer course we designed an ally the institute’s policy department was involved as exercise to analyse and prepare an institutional well. Second, the Danish Forum for Data Manage- implementation of the recent Danish Code of Con- ment, which consists of all Danish universities, two duct for Research Integrity. national libraries and the national archive, was look- ing for a train-the-trainer course. The process in these cases was similar: the customer provided information Conclusion about the training needs of the group and any wishes RDNL’s training ‘Essentials 4 Data Support’ results regarding the duration and intensity of the training, from a clear need for professional data management e.g. with or without assignments ahead of the training support training, a need that was identified five years day or days. On that basis the coaches drafted a pro- ago. Since then Research Data Netherlands has posal in which they included the relevant parts from trained about 170 so-called data supporters, working the regular face-to-face training, or suggested to in research libraries, IT departments, medical centres develop new modules. Clearly, there cannot be a stan- and elsewhere. As front office staff they play an dard fee for a tailor-made training. essential role in research data advocacy and support In comparison to the off-the-shelf training, the coa- for gradually realising the international Open Science ches now became trainers: without guest presenters, it ambitions. We are very pleased to see a vibrant net- was up to them to convey the contents of the course, in work of data supporters in The Netherlands and addition to coaching the participants in the practical abroad, and to be recognised as a party that contri- exercises. Furthermore, for these exercises relatively butes to their professional development. more time was planned than in the regular training, because all networking interaction and knowledge transfer between the participants has to take place dur- Declaration of Conflicting Interests ing the meeting. The combination of the customer’s The author(s) declared no potential conflicts of interest requirements, the broad but not always specialist with respect to the research, authorship, and/or publication knowledge of the coaches/trainers and the time needed of this article. for interaction has led, in both cases, to an abridged and more focused version of the regular course. Funding In both in-house training courses we have included The author(s) received no financial support for the exercises with writing a data management plan research, authorship, and/or publication of this article. Verbakel and Grootveld: ‘Essentials 4 Data Support’ 283
References Research Data Alliance (2013) Interest Group: Education De Smaele M et al. (2013) Data intelligence training for and Training on Handling of Research Data. Available library staff. International Journal of Digital Curation at: https://rd-alliance.org/groups/education-and-training- 8(1): 218–228. DOI: 10.2218/ijdc.v8i1.255. handling-research-data.html (accessed 22 September DigCurV (2013) A Curriculum Framework for Digital 2016). Curation. Available at: http://www.digcurv.gla.ac.uk/ (accessed 22 September 2016). Dillo I and Doorn P (2014) The front office-back office Author biographies model: Supporting research data management in the Netherlands. International Journal of Digital Curation Ellen Verbakel is a trained librarian, but moved to 9(2): 39–46. DOI: 10.2218/ijdc.v9i2.333. 4TU.Centre for Research Data where she is now working EDISON (2015) EDISON: Building the Data Science Pro- as a data librarian. She is situated in the TU Delft Library. fession. Available at: http://edison-project.eu/ (accessed She co-developed the course ‘Data Intelligence 4 Librar- 22 September 2016). ians’ and was involved in redesigning the course into Essentials 4 Data Support (2014) Essentials 4 Data Support ‘Essentials 4 Data Support’. Being a coach at the course Course. Available at: http://datasupport.researchdata.nl/ is a very important part of her daily work. en/ (accessed 22 September 2016). Goldstein S (2016) Training for Research Data Management: Marjan Grootveld is senior policy consultant at Data Comparative European Approaches. Report, Knowledge Archiving and Networked Services (DANS). She advises Exchange, May 2016. DOI: 10.5281/zenodo.50068. knowledge institutes and research funders on data manage- Grootveld M and Verbakel E (2015) Essentials for data ment policy and practice, both in international research support: Training the front office. International Journal projects and by coaching participants of the Research Data of Digital Curation 10(1): 240–248. DOI: 10.2218/ijdc. v10i1.364. Netherlands training ‘Essentials 4 Data Support’. IFLA Article
International Federation of Library Associations and Institutions 2016, Vol. 42(4) 284–291 Research Data Services at ETH-Bibliothek ª The Author(s) 2016 Reprints and permission: sagepub.co.uk/journalsPermissions.nav DOI: 10.1177/0340035216674971 ifl.sagepub.com Ana Sesartic ETH-Bibliothek, ETH Zu¨rich, Switzerland
Matthias To¨we ETH-Bibliothek, ETH Zu¨rich, Switzerland
Abstract The management of research data throughout its life-cycle is both a key prerequisite for effective data sharing and efficient long-term preservation of data. This article summarizes the data services and the overall approach to data management as currently practised at ETH-Bibliothek, the main library of ETH Zu¨rich, the largest technical university in Switzerland. The services offered by service providers within ETH Zu¨rich cover the entirety of the data life-cycle. The library provides support regarding conceptual questions, offers training and services concerning data publication and long-term preservation. As research data management continues to play a steadily more prominent part in both the requirements of researchers and funders as well as curricula and good scientific practice, ETH-Bibliothek is establishing close collaborations with researchers, in order to promote a mutual learning process and tackle new challenges.
Keywords Data life-cycle, data management plan, libraries, preservation, research data, research data management
Submitted: 12 May 2016; Accepted: 8 September 2016.
Introduction in different ways and likewise, expectations from The growing volume of data produced in research has their patrons can vary widely, e.g. by scientific disci- created new challenges for its management and cura- pline. Therefore, the following report should be tion to ensure continuity, transparency and account- understood as a case study rather than a general ability. Timely and effective management of research recommendation. data throughout its life-cycle ensures its long-term Libraries are never the only service providers in a value and prevents data from falling into digital obso- university and, ideally, a range of providers caters for lescence (Corti et al., 2014; Goodman et al., 2014). researchers’ and students’ needs. When it comes to Proper data management is a key prerequisite for data management in particular, IT services will obvi- effective data sharing within a specific scientific com- ously be a strong player on the technical side whereas munity and for data publication beyond any particular research offices must take an interest in how research- target group. This, in turn, increases the visibility of ers comply with internal and external requirements. In scholarly work and is likely to increase citation rates such a landscape, it is important to note that libraries (Piwowar and Vision, 2013: 25; Piwowar et al., should focus on their strengths such as metadata man- 2007). Managing research data throughout its life- agement, content curation, and support and training of cycle is not only a key prerequisite for effective data their customers, in this case the researchers. Also, the sharing but also for efficient long-term preservation services offered are never carved in stone and should because the latter must rely on technical, administra- be adapted to the current needs of science. tive and rights metadata, as well as sufficient context information being available to make sure that data Corresponding author: remains usable and understandable in the long run. Ana Sesartic, ETH Zu¨rich, ETH-Bibliothek, Ra¨mistrasse 101, Depending on their respective institutional setting, CH-8092 Switzerland. libraries can contribute to research data management Email: [email protected] Sesartic and To¨we: Research Data Services at ETH-Bibliothek 285
Figure 1. Actors and their tasks at ETH Zu¨rich along phases of the data life-cycle. Units in red boxes belong to ETH Library.
In the following, we summarize the data services can be challenging to communicate the division of and approach to data management as currently prac- labour transparently to customers. Ideally, there tised at ETH-Bibliothek (ETH Zu¨rich, 2016c), the should be only a single point of contact for customers main library of ETH Zu¨rich (ETH Zu¨rich, 2016b), the to turn to. This has not been established so far. How- largest technical university in Switzerland. ever, it should not matter who customers contact as long as they are redirected appropriately. This, again, requires a reasonable understanding of other units’ The overall concept services, which obviously evolve over time. The role of the ETH-Bibliothek within ETH Zu¨rich is With the Digital Curation Office (ETH-Bibliothek, to support researchers from early on in the data life- 2016a), the ETH-Bibliothek offers a point of contact cycle (see Figure 1), starting with general consulting for technical and conceptual questions regarding regarding compliance, support through the develop- long-term preservation and management of research ment of data management plans, the offer of data data. It also offers help and support to the researchers archiving and publication services, to the creation of of ETH Zu¨rich in managing and publishing their data, DOIs for easier citation and re-use of data. Important as well as following the requirements as stated in the phases of the life-cycle mainly in its active research Guidelines for Research Integrity of ETH Zu¨rich phase (see Figure 1) require competencies beyond (ETH Zu¨rich, 2011: 31). Regarding intellectual prop- what the library can offer. Other actors within the erty and research ethics issues, the ETH-Bibliothek university support these phases and a clarification of closely collaborates with the Technology Transfer the division of tasks between those service providers Office and the Office of Research, respectively. is needed. At ETH Zu¨rich, a good level of mutual Figure 1 illustrates the coverage of roughly defined understanding, for example of the activities in storage stages of the research data life-cycle by actors and ser- and preservation, was achieved between library staff vices at ETH Zu¨rich with their respective tasks. Note and the storage section of the IT services as part of a that Guidelines for Good Scientific Practice and further common small-scale project starting in 2006. Regular legal requirements (centre) apply throughout the cycle, meetings have continued ever since. Nevertheless, it but should obviously be considered from early on. 286 IFLA Journal 42(4)
For a first-time or occasional customer, the multi- and graduate students, as well as scientists, regarding tude of actors can obviously be confusing and users RDM. The philosophy behind this approach is the should not be left alone with it. From our experience, conviction that researchers themselves must be customers appreciate being able to get in touch with a empowered to make informed decisions on their data, contact person they can talk to about their needs. In a as they are the experts with the most intimate knowl- first instance, it is not even expected that this person edge of their own data. The workshop was focused on can provide an immediate solution, but it is important activating teaching methods, engaging the partici- that someone takes care of the issue and provides pants in group work and discussions. guidance as to where to turn. This means that all The majority of the participants at this event were actors must be aware of tasks not within their own working on their doctoral studies with few Master’s portfolio as well, and we are aware that this remains a students and post-docs present. They showed very challenge and requires an ongoing learning process. varied needs and levels of knowledge, but they were As a detail to underline the importance of personal well aware of the problems regarding RDM before- contacts: as one of the first teams within the ETH- hand and were looking for solutions. This is why we Library, the Digital Curation Office put portrait also offer tailored training courses in RDM for groups images of its staff on its website to lower the barrier and departments. These can range from so-called of writing to an otherwise anonymous email box. Tools & Tricks mini lectures, short 15-minutes inputs over coffee for lunch break, to fully-fledged one-day training workshops. It is important to note that certain Data management training departments and institutes already offer similar or Research data management (RDM) has gained increas- overlapping internal training, which is why commu- ing attention over the last years, due to growing aware- nication and coordination are key. As of now, there is ness of the value contained in research data and of the no dedicated course on RDM within the ETH curri- risks of losing such data over time. Apart from the need culum; however, some departments offer methodolo- to manage data over the course of a project, there is a gical courses including research ethics and scientific need to retain and curate data which one plans to work writing which might cover some aspects. With with in the future. It might be possible and sometimes increasing concerns about data management issues, advisable to repeat an experimental measurement, but it is to be expected that the topic will figure more in other cases it might either be prohibitively expensive prominently in curricula in the future. or otherwise ethically unacceptable to do so. For unique observational data, a repetition is not possible at all and accordingly, communities relying on such Data management plan checklist data have long been aware of the challenges. Today, the availability of well-managed data is part While these issues are mainly related to the effi- of good scientific practice and ensures the reprodu- ciency and effectiveness of research and its funding, cibility of research results, a key requirement at the RDM also addresses the accountability of science. One core of the research process. Many funding organi- principle of the scientific process is the requirement to zations prescribe the use of data management plans be able to justify results by providing underlying data and insist on open access publication of the research where necessary. There are significant reputational results they funded. risks involved for individual researchers, principal In some parts of EU’s research programme Hori- investigators and institutions who fail to comply with zon 2020, DMPs will, for example, be evaluated as good scientific practice of which RDM is just one part. part of the impact of a proposal and in the reporting To address these questions a workshop within the during the course of a funded project (European Com- ETH Critical Thinking Annual Programme (ETH mission, 2013: 6). But even if a funding body does not Zu¨rich, 2016e) was developed. The overall aim of this explicitly demand data management, following pro- programme is to strengthen critical thinking and a fessional curation and preservation concepts has responsible approach to taking actions beyond disci- numerous advantages (DLCM, 2016): plinary competences. The workshop introduced some services and tools for RDM, as well as encouraging It greatly facilitates the reuse of research data; the participants to share both their experiences and the As a result, this increases the impact of research methods and tools they use, during the interactive results; parts of the workshop. The goal of the workshop It saves precious research funds and ultimately within the Critical Thinking programme was to natural and human resources by avoiding unne- increase the critical thinking skills of undergraduate cessary duplication of work. Sesartic and To¨we: Research Data Services at ETH-Bibliothek 287
As effective and efficient data management Active data management becomes more and more challenging for both Data management is understood as a comprehensive researchers and information specialists, the question task throughout the data life-cycle. It therefore needs arose how they can best be reached and supported on a to comprise the handling of research data while the national level regarding best innovative practices in actual research is carried out. We call this active data digital preservation so as to succeed in this ambitious management to signify that data at this stage is usually enterprise. This led to the creation of the Swiss Data not static, but keeps being analysed and worked upon Life-Cycle Management (DLCM) project (Blumer as part of the research. At this stage of the life-cycle, and Burgi, 2015: 16), facilitated by swissuniversities subject-specific tools are employed in data processing (swissuniversities, 2016) and involving collaboration which may be implemented and run by a specialized between eight Swiss higher education institutions support unit or by research groups themselves. (EPF Lausanne, ETH Zu¨rich, Universities of Basel, At ETH Zu¨rich, the section Scientific IT Services Geneva (lead), Lausanne, Zu¨rich, Geneva School of of the central IT Services provides this kind of sup- Business Administration at the Western Switzerland port. Research groups from life sciences are known to University of Applied Sciences and Arts, and be among the most intensive users of their services. SWITCH, the national IT service provider for higher Among them figures the data management software education institutions). platform openBIS (ETH Zu¨rich, 2016d), which has The Data Management Plan (DMP) Checklist also been extended with components to serve as an (ETH-Bibliothek and EPFL Library, 2016) is among electronic laboratory notebook (ELN) and/or as a the first tangible deliverables of the DLCM project. It laboratory information management system (LIMS). is meant to be an essential tool aiding researchers in Obviously, this platform and other tools with sim- the management of their data, thus preparing them for ilar aims must already capture a lot of information, later publication and preservation, as needed. By giv- which is relevant for current research. Part of this ing clear guidelines, it should facilitate this task for information might be gathered automatically, while researchers and eventually save them time and effort. more will be required to be entered by researchers, The list has been customized for Switzerland based on with quality depending on their willingness to comply pre-existing national and international policies. It and therefore on the ease of the process. Most of this covers general planning and the phases of the data input and possibly additional documentation will be life-cycle, from data collection and creation to data needed in order to use and make sense of the research sharing and long-term management. Special sections data at a much later stage and with the active data cover documentation and metadata, file formats, stor- platform no longer being available. age, ethical and intellectual property issues. Ideally, While such systems themselves are not meant as the list should be used by researchers to critically publishing or preservation tools for research data, assess their data management and to gather informa- they can very well serve as sources for these pro- tion they might need to create a data management cesses. Ideally, researchers working in such a system plan. It can also serve as a starting point for further should be able to decide which part of the content face-to-face discussions of data management issues must be preserved or can be published to trigger an within research groups and with support staff if export, e.g. to a long-term preservation solution. required. The list is static with no further functionality Such a process has not yet been implemented, but and it will be observed whether a more interactive it is envisaged that an interface from openBIS to the solution will be required later. ETH Data Archive will be developed in the current The checklist was created in close collaboration DLCM project. between the Digital Curation Office at ETH- It should be noted that the exchange between the Bibliothek and the Research Data Team at EPFL different systems must not be understood merely as a Library. It is currently available through the technical transfer. Data at this stage also crosses a ETH-Bibliothek and EPFL Library websites (ETH- boundary between the research world in a narrow Bibliothek and EPFL Library, 2016) and will soon sense and a curation domain (see Figure 2). Such a be disseminated on a national portal on DLCM transition is prone to suffer from misunderstandings which will be launched in the next few months. The between the parties involved due to a lack of common portal itself will touch on many aspects, aggregating standards or even a uniform vocabulary. and providing further information about data orga- This already highlights the importance of close nization, training (in person and online) of end- collaboration not only of researchers and service pro- users, and consulting regarding best practices viders, but also among different providers with among many others. 288 IFLA Journal 42(4)
Figure 2. Transitions between curation domains along the research data life-cycle. Adapted from (Treloar, 2012). complementary competencies. An essential part of the data. This was in line with expectations from both work to achieve this has been a constructive exchange researchers and other service providers in the univer- over several years between ETH Library and the Cen- sity. At the same time, it was decided to address pre- tral IT Services. Even before the Digital Curation servation issues with a view on all kinds of scientific Team existed, staff from various teams in the library and cultural heritage unique to ETH Zu¨rich. This (e.g. University Archives of ETH Zu¨rich, Library IT includes research data from ETH Zu¨rich staff, admin- Services, Consortium of Swiss Academic Libraries) istrative records or personal bequests to ETH Zu¨rich engaged in early pilot projects mainly with the Cen- University Archives (ETH-Bibliothek, 2016c) and tral IT Services team in charge of storage manage- digital born and digitized content of ETH- ment. These activities from 2006 onwards served to Bibliothek, in particular doctoral theses which must understand the required functions, to get an idea of the be deposited in digital form and master files from available competencies and not the least to build trust several large scale digitization projects. within an, albeit loose, network of players in the field Since 2014, ETH-Bibliothek has offered the ETH of data management within the university. This Data Archive as a productive service. It is based on helped to achieve a common understanding of the the commercial long-term preservation system tasks and to raise awareness, e.g. for the need to Rosetta (Ex-Libris Group, 2016b). The application define more precisely what each stakeholder under- itself is maintained by the library’s own IT services, stands when talking about ‘archiving’. while both virtual servers and different types of stor- Two rather surprising outcomes arose from the age at two different sites of the university are pro- exchange with a retiring professor on how to transfer vided by ETH Zu¨rich’s central IT services. his well-managed archive to the library. Firstly, one ETH researchers can deposit content into the ETH of his post-docs involved in the operation could be Data Archive and define the appropriate access rights. hired to work with ETH Library’s Digital Curation This can be done manually via a web client, e.g. for Team, and secondly, the discussion of the principles supplementary material belonging to an article. underlying this research group’s archive recently led Depositors are quite free to define on which level they to a publication highlighting the general concepts want to or can provide metadata: an archival package behind their approach which had proven useful for or intellectual entity may contain just a single file almost three decades of research practice (Sesartic´ with its individual description or a ZIP- or TAR- et al., 2016). container with thousands of files included under only one metadata set. The latter is sometimes made use of when a collection of files belonging to a concluded Publication and preservation of data thesis or another publication needs to be retained for a From the start of its activities around research data, defined period of time (10 years minimum). In this ETH-Bibliothek saw a central role in support of the case, it is often assumed that the thesis itself is the processes for publication and preservation of such most comprehensive documentation of the data Sesartic and To¨we: Research Data Services at ETH-Bibliothek 289 package and should suffice for the professor to answer data producer to give them more control of which part any inquiries. This might not be an ideal arrangement, of their content is archived when. For example, pro- but it represents considerable progress compared to ducers might want to collect, (re-)structure and some previously existing group archives held exclu- describe their data before finally deciding to submit sively on CD-ROM and DVD. When the focus is on an archival package. In other cases, professors want to safeguarding data for a limited period of time only, no review datasets from their groups before submission strict requirements on the longevity of file formats are and start the transfer themselves. enforced. If requested by researchers, support is pro- Deposit in the ETH Data Archive is not coupled vided with identifying suitable formats. with an immediate publication of the content. While In other cases, metadata is routinely gathered for ETH Zu¨rich encourages open access also to research research data on the level of individual folders and data, it is currently up to the data producer to decide files in research groups who have already seen a need which data they want to make accessible as long as to operate a managed archive. They may use the open they are observing existing requirements, e.g. from source editor and viewer docuteam packer (docuteam, funders. They may opt for an embargo period of e.g. 2016; ETH-Bibliothek, 2016e) locally to organize two years or for limited access within the IP-range of files in a defined structure and add metadata as ETH Zu¨rich only. Even very restrictive access rights required. To support this potentially laborious task, for defined persons only are currently accepted. certain metadata can be pre-defined or inherited from Metadata of published content are published in the the top-most level. A researcher can then decide when ETH Knowledge Portal (ETH Zu¨rich, 2016c), in the to submit the whole or part of the structure she or he Primo Central Index (Ex-Libris Group, 2016a) and in created to the ETH Data Archive. For data in docu- the Data Citation Index (Thomson Reuters, 2016). team packer, DOIs (Digital Object Identifiers) can Currently, the workflow for the deposit of research already be reserved which will be registered to data is largely separated from the one for publications, become active after submission to the ETH Data e.g. articles to be published via the green road of open Archive. The advantage of this method is that the DOI access. In the future, a new platform will pull several can already be used in a manuscript even before the workflows for publications, bibliographic records and data has actually been deposited. DOIs are registered research data together into one service. with the consortium DataCite (2016). Likewise, For published research data in particular, it is a access rights and a retention period can be defined reasonable expectation that it should remain available which will be enforced after submission. in the long term similar to what is expected from Submission itself is handled by the tool docuteam formal publications. Whether this will actually be feeder, which processes the Submission Information possible in the future depends to a large degree on Package (SIP) put out by docuteam packer into the the file formats being employed. In the heterogeneous according Archival Information Packages (AIP) to be environment of a university with numerous contra- submitted to the ETH Data Archive. Docuteam packer dictory pressures and constraints on researchers, it will also be used for submissions to ETH Zu¨rich Uni- would not be a realistic approach to admit only a versity Archives with a modified configuration and a limited number of well-documented open formats to more interactive workflow between depositors and a data archive. Rather, the Digital Curation Office at University Archives staff. ETH-Bibliothek offers some guidance on preferred Obviously, there are limits to which kind of sub- formats and recommends a few of them depending missions can be comfortably handled via a web user on the expected retention period (ETH-Bibliothek, interface and via docuteam packer. Automated pro- 2016d). This may limit the chances of actual preser- cesses with submission applications relying on exist- vation measures in the future to the extent where ing sources for metadata and content are currently in only bitstream preservation remains as viable. This use only for library content from the institutional is made transparent to researchers submitting data repository ETH E-Collection, for master files from and usually they are fully aware of this serious lim- the digitization of rare books from ETH-Bibliothek itation. However, those depositing data ‘just in case’ in e-rara (ETH-Bibliothek, 2016b), and for digitized do not consider this as a major problem because they material from ETH Zu¨rich’s Archives of Contempo- expect a need to invest effort into using such data in rary History (ETH Zu¨rich, 2016a). In the future, such the future, anyway. Their perspective then is to post- interfaces should also be created with existing sources pone this effort to the point where it is actually for research data, such as the platform openBIS men- needed rather than making an upfront investment tioned above. Apart from fully automated processes, which might be in vain, given that only a very small there might still be others requiring a trigger from the part of their data might ever be re-used. 290 IFLA Journal 42(4)
On the other hand, data producers who regularly specific needs of one discipline. A combined and share and exchange data with colleagues in their own well-integrated landscape of institutional, networked community have usually overcome the barrier of pro- and subject-specific approaches might then cover prietary formats and often rely on open community most needs over time. standards. However, this might only apply to the core While libraries need to build on their strengths to of research data, while accompanying material may become an active part of the overall landscape for contain less suitable formats. Given the high level RDM in a university, they must also consider that the of awareness of these users, they might want to services offered are not set in stone but have to be re-consider those formats, as well. adapted and developed continuously. As scientific A particular format issue concerns the vast num- practice evolves rapidly, a constant learning and inno- ber of research data files in plain text formats, but vation process is needed to keep up with changing with a large variety of sometimes misleading file requirements. Libraries – and other service providers – extensions. While text files with documented encod- need to open up and reach out further towards ing are actually very suitable for preservation pur- researchers and collaborate more closely with the poses, it can be challenging to identify them in the researchers, in order to establish a mutual learning first place and in many cases, the identification will process on the part of both libraries and researchers. not be a technical one, but will rely on information Requirements of researchers will not only evolve fromthedataproducers. through technical and scientific developments in their Taking the sections above into account, it is obvi- field of research, but it can also be expected that RDM ous that communication with researchers forms an will play a more prominent part in curricula and in essential part of providing research data services for good scientific practice in the years to come. Constant publication and preservation. It is in the best cases efforts in information and training on RDM should rewarding for customers and staff alike, but neverthe- help to further implement it as an essential task of less time consuming when time might already be researchers with each new generation of, for example, pressing, for example when a manuscript is about to doctoral students. A top-down commitment from uni- be submitted and supplementary material needs to be versities can certainly support this, provided it is deposited on time. Obviously, this kind of communi- appropriately translated into activities, which really cation is much facilitated if appropriate skills are reach researchers at their workplace. available on both sides. It is therefore very helpful to have staff with a scientific background in the Digi- Acknowledgments tal Curation Office, although it is obvious that they The authors would like to thank all members of the Swiss cannot cover all fields in depth. DLCM project for insightful discussions. Special thanks go to the Research Data Team at EPFL Library for the excel- lent collaboration on the Data Management Checklist. Conclusions Declaration of Conflicting Interests With the growing digitization of science and society, The authors are employees of ETH Zu¨rich and participate a curation gap between research practice and curation in the DLCM project. needs opens up. However, if there is collaboration and communication between IT services, libraries and Funding researchers, the discrepancy between research prac- The authors received no financial support for the research, tice and research content preservation can be mini- authorship, and/or publication of this article beyond their mized and the curation gap closed. In order to do employment by ETH Zu¨rich. so, university libraries and data centres must continue to support and educate researchers, which also References requires a thorough understanding of researchers’ Blumer E and Burgi P-Y (2015) Data Life-Cycle Man- work practices and the challenges they meet. The agement Project : SUC P2 2015-2018. Revue e´lectro- heterogeneity of their needs limits the possibility to nique suisse de science de l’information 16: 1–17. Available at: http://archive-ouverte.unige.ch/unige: generalize services – or the other way round: in some 79346 (accessed 5 October 2016). cases, it may only be possible to serve needs close to Corti L, Van den Eynden V, Bishop L et al. (eds) (2014) the smallest common denominator between various Managing and Sharing Research Data. London: SAGE. interests. This is the reason why libraries should not Available at: http://ukdataservice.ac.uk/manage-data/ aim at serving all communities equally themselves, handbook/ (accessed 7 October 2016). but rather also keep an eye on subject specific solu- DataCite (2016) DataCite. Available at: http://www.data tions, which are created by third parties to address cite.org (accessed 6 May 2016). Sesartic and To¨we: Research Data Services at ETH-Bibliothek 291
DLCM (2016) Data Management Checklist. Available at: Ex-Libris Group (2016b) Rosetta. Available at: http:// http://www.dlcm.ch/ressources/data-management- knowledge.exlibrisgroup.com/Rosetta (accessed 10 checklist (accessed 22 September 2016). May 2016). docuteam (2016) docuteam packer. Available at: http:// Goodman A, Pepe A, Blocker AW et al. (2014) Ten simple www.docuteam.ch/en/products/it-for-archives/software rules for the care and feeding of scientific data. PLoS (accessed 10 May 2016). Computational Biology. 10(4): e1003542. ETH Zu¨rich (2011) Guidelines for Research Integrity.Zu¨r- Piwowar HA and Vision TJ (2013) Data reuse and the open ich: ETH Zu¨rich. Available at: www.vpf.ethz.ch/services/ data citation advantage. PeerJ 1: e175 Available at: researchethics/Broschure.pdf (accessed 7 October 2016). https://doi.org/10.7717/peerj.175 (accessed 5 October ETH Zu¨rich (2016a) ETH Archives for Contemporary His- 2016). tory. Available at: http://www.afz.ethz.ch (accessed 22 Piwowar HA, Day RS and Fridsma DB (2007) Sharing September 2016). detailed research data is associated with increased cita- ETH Zu¨rich (2016b) ETH Zu¨rich. Available at: http:// tion rate. PLoS ONE 2(3). Available at: http://dx.doi. www.ethz.ch/en.html (accessed 11 May 2016). org/10.1371/journal.pone.0000308 (accessed 5 October ETH Zu¨rich (2016c) ETH-Bibliothek. Available at: http:// 2016). www.library.ethz.ch/en/ (accessed 11 May 2016). Sesartic´ A, Fischlin A and To¨we M (2016) Towards nar- ETH Zu¨rich (2016d) OpenBIS. Available at: http://openbis- rowing the curation gap: Theoretical considerations and eln-lims.ethz.ch (accessed 2 September 2016). lessons learned from decades of practice. ISPRS Inter- ETH Zu¨rich (2016e) Critical Thinking Initiative. Available national Journal of Geo-Information 5(6): 91. DOI: 10. at: https://www.ethz.ch/services/en/teaching/critical- 3390/ijgi5060091. thinking-initiative/ct-annual-programme.html (accessed swissuniversities (2016) Rectors’ Conference of Swiss 22 September 2016). Higher Education Institutions. Available at: http:// ETH-Bibliothek (2016a) Digital Curation Office. Avail- www.swissuniversities.ch/en/ (accessed 12 May able at: http://www.library.ethz.ch/Digital-Curation 2016). (accessed 11 May 2016). Thomson Reuters (2016) Data Citation Index. Web of Sci- ETH-Bibliothek (2016b) E-Rara. Available at: http://www. ence. Available at: http://wokinfo.com/products_tools/ e-rara.ch/?lang=en (accessed 6 May 2016). multidisciplinary/dci/ (accessed 6 May 2016). ETH-Bibliothek (2016c) ETH Zu¨rich University Archives. Treloar A (2012) Private Research, Shared Research, Pub- Available at: http://www.library.ethz.ch/en/Resources/ lication, and the Boundary Transitions. Available at: Archival-holdings-documentations/ETH-Zurich-Univer http://andrew.treloar.net/research/diagrams/data_cura sity-Archives (accessed 11 May 2016). tion_continuum.pdf (accessed 22 September 2016). ETH-Bibliothek (2016d) File Formats for Archiving. Available at: http://www.library.ethz.ch/en/Media/ Files/File-formats-for-archiving (accessed 11 May Author biographies 2016). ETH-Bibliothek (2016e) Intended Purpose of Docuteam Ana Sesartic is an environmental scientist with a doctoral Packer. Available at: http://www.library.ethz.ch/en/ degree in Atmospheric Physics. She brings hands-on expe- Media/Files/Intended-purpose-of-docuteam-packer rience in working with and managing big data sets, and is (accessed 10 May 2016). now bridging the gap between scientists and librarians at ETH-Bibliothek and EPFL Library (2016) Data Manage- the Digital Curation Office. Currently she is working on the ment Checklist. Available at: http://bit.ly/rdmchecklist Data Life-Cycle Management (DLCM) project. (accessed 22 September 2016). European Commission (2013) Guidelines on Data Man- Matthias To¨we is a chemist with a doctoral degree in agement in Horizon 2020. Available at: http://ec. Experimental Physics and a trained scientific librarian. europa.eu/research/participants/data/ref/h2020/grants_ manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf Since 2003 he has been working at ETH-Bibliothek in (accessed 22 September 2016). different roles, first for the Consortium of Swiss Academic Ex-Libris Group (2016a) Primo Central Index. Available Libraries and later for the Swiss electronic library: e-lib.ch. at: http://www.exlibrisgroup.com/category/PrimoCen Since late 2010 he has been head of the Digital Curation tral (accessed 6 May 2016). Office of ETH Zu¨rich at ETH-Bibliothek. IFLA Article
International Federation of Library Associations and Institutions 2016, Vol. 42(4) 292–302 Beyond the matrix: Repository ª The Author(s) 2016 Reprints and permission: services for qualitative data sagepub.co.uk/journalsPermissions.nav DOI: 10.1177/0340035216672870 ifl.sagepub.com
Sebastian Karcher Syracuse University, USA
Dessislava Kirilova Syracuse University, USA
Nicholas Weber University of Washington, USA
Abstract The Qualitative Data Repository (QDR) provides infrastructure and guidance for the sharing and reuse of digital data used in qualitative and multi-method social inquiry. In this paper we describe some of the repository’s early experiences providing services developed specifically for the curation of qualitative research data. We focus on QDR’s efforts to address two key challenges for qualitative data sharing. The first challenge concerns constraints on data sharing in order to protect human participants and their identities and to comply with copyright laws. The second set of challenges addresses the unique characteristics of qualitative data and their relationship to the published text. We describe a novel method of annotating scholarly publications, resulting in a ‘‘transparency appendix’’ that allows the sharing of such ‘‘granular data’’ (Moravcsik et al., 2013). We conclude by describing the future directions of QDR’s services for qualitative data archiving, sharing, and reuse.
Keywords Collection development, data collections, data services, services to user populations, social science literatures
Submitted: 17 May 2016; Accepted: 6 September 2016.
Introduction for secondary analysis almost nonexistent (see e.g. Social science has a rich history of archiving, sharing, Medjedovic´ and Witzel, 2010, focusing on Germany; and reusing data for secondary analysis. Institutions Yoon 2014). such as the Inter-university Consortium for Political The Qualitative Data Repository (QDR) provides and Social Research, the Roper Center for Public infrastructure for the sharing and reuse of digital data Opinion Research, UK Data, and the data banks of used in qualitative and multi-method social inquiry (Elman et al., 2010). QDR is housed at the Center for various international organizations have for decades 2 provided quantitative, matrix-based1 datasets for a Qualitative and Multi-Method Inquiry (a unit of Syr- acuse University’s Maxwell School of Citizenship variety of empirical studies across the social sciences. 3 However, a large number of primary data collections and Public Affairs ), and funded by the National Sci- created by social scientists still remain invisible to the ence Foundation. The repository is guided by four broader research community (Tenopir et al., 2011: see beliefs: especially Tables 10 and 11). And, until recently, the absence of dedicated repositories for qualitative mate- Corresponding author: rials made the likelihood that qualitative social scien- Sebastian Karcher, Syracuse University, 346 Eggers Hall, Syracuse, tists would either share their own data in a way that NY 13244-1100, USA. increases research transparency or make use of others’ Email: [email protected] Karcher et al.: Beyond the matrix 293