Integrating Dataverse and Archivematica for Research Data Preservation

Meghan Goodchild
Scholars Portal & Queen’s University
Canada
[email protected]
ORCID 0000-0003-0172-4847

Grant Hurley
Scholars Portal
Canada
[email protected]
ORCID 0000-0001-7988-8046

Abstract – Scholars Portal sponsored Artefactual Systems Inc. to develop the ability for the preservation processing tool Archivematica to receive packages from Dataverse, a popular tool for uploading, curating, and accessing research data. The integration was released as part of Archivematica 1.8 in 2018. This paper situates the integration project in the broader context of research data preservation; describes the scope and history of the project and the features and functionalities of the current release; and concludes with a discussion of the potential for future developments to meet additional use cases, service models and preservation approaches for research data.

Keywords – research data; Archivematica; preservation pipeline; Dataverse

Conference Topics – collaboration; technical infrastructure

I. INTRODUCTION

Between 2015 and 2018, Scholars Portal contracted Artefactual Systems Inc. to develop an integration between Dataverse, a popular repository tool for uploading, curating, and accessing research data, and Archivematica, a workflow application for creating preservation-friendly packages for long-term storage and management. Scholars Portal is the information technology service provider for members of the Ontario Council of University Libraries (OCUL), a 21-member consortium of academic libraries in the province of Ontario, Canada.¹ Founded in 2002, Scholars Portal is supported by OCUL members and operated under a service agreement with the University of Toronto Libraries. Our services support both research data management via a hosted, multi-institutional instance of Dataverse² and digital preservation services via Permafrost,³ a hosted Archivematica-based service that pairs with the OCUL Ontario Library Research Cloud (OLRC) for preservation storage.⁴ The Dataverse-Archivematica integration project was initially undertaken as a research initiative to explore how research data preservation aims might functionally be achieved using Dataverse and Archivematica together. The results of a proof-of-concept phase were developed into a working integration released as part of Archivematica version 1.8 in November 2018. This paper situates the integration project in the broader context of research data preservation in theory and practice; describes the scope and history of the project and the features and functionalities of the current release; and concludes with a discussion of the potential for future developments to meet additional use cases, service models and preservation approaches for research data.

1 Scholars Portal: https://scholarsportal.info/.
2 Scholars Portal’s Dataverse instance: https://dataverse.scholarsportal.info/.
3 Permafrost: https://permafrost.scholarsportal.info/.
4 Ontario Library Research Cloud: https://cloud.scholarsportal.info/.

16th International Conference on Digital Preservation, iPRES 2019, September 16-20, 2019, Amsterdam, The Netherlands. Copyright held by the author(s). The text of this paper is published under a CC BY-SA license (https://creativecommons.org/licenses/by/4.0/). DOI: 10.1145/nnnnnnn.nnnnnnn

II. RESEARCH DATA PRESERVATION IN CONTEXT

In this paper, the term “research data” refers to a broad set of potential outputs from research activities across sectors and disciplines. The key uniting characteristic is that these materials stand as unique evidence supporting a set of research findings, whether scholarly, technical, or artistic [1]. Furthermore, these data may constitute the research findings themselves, such as in public statistical or geospatial data gathering. The communities of stakeholders who value research findings depend on the maintenance of the originary data sources in a trustworthy manner that privileges ensuring their continued authenticity, availability and reliability into the future. These concepts have been codified within the sector as the FAIR Principles for research data: findable, accessible, interoperable, reusable [2]. While the FAIR Principles do not specifically cite long-term preservation as a requirement, preservation is crucial to the continued ability to discover and access research data into the future [3]. The FAIR Principles therefore link to the stewardship responsibilities that repositories take on behalf of stakeholders: in order to fulfill the FAIR Principles, organizations with access to sustained resources and infrastructure must commit to ensuring the long-term maintenance of the materials under their care. The requirements for this maintenance are outlined in standards such as the Open Archival Information System reference model (ISO 14721)⁵ and audit and certification frameworks including CoreTrustSeal,⁶ nestor,⁷ and Audit and Certification of Trustworthy Data Repositories (ISO 16363).⁸ Repositories with stewardship responsibilities therefore seek to translate audit and certification requirements into repeatable practices to ensure that data are kept reliably into the future. There is a series of interrelated stages that make up the lifecycle required for responsible data curation and preservation over time, including creation and receipt, appraisal and selection, preservation actions, storage, and access and discovery [4]. One tool that implements some of these stages of the lifecycle is the research data repository web application Dataverse.⁹

Dataverse is developed and maintained as an open source tool by the Institute for Quantitative Social Science (IQSS) at Harvard University. It has been developed since 2006 [5]. A large open Dataverse instance is hosted by IQSS and 38 additional individual known installations of Dataverse exist throughout the world as of the time of writing [6]. While Dataverse was developed by members of the social science community, its use is not limited to any specific disciplinary area [5]. Users can deposit and describe their data files using general and discipline-specific metadata standards, generate unique identifiers, and assign access permissions. Institutions can enable self-deposit or mediated workflows for depositors, and offer Dataverse to researchers as a method of fulfilling funder requirements to deposit data in an accessible repository. Published datasets are searchable and downloadable, and tabular data files can be explored using visualization tools within the platform itself.

Dataverse includes a suite of functions that contribute to the ability for a stewarding organization to reliably preserve research data. When it comes to data receipt, it enables efficient capture of materials from a researcher’s individual computing systems through user-friendly upload tools, which tackles a major initial barrier of accessing data from the risky (and often inaccessible) environments of personal storage systems [7]. Researchers can also describe and contextualize their submissions through a variety of metadata fields and by linking to related publications and datasets. All submitted files receive MD5 checksums upon receipt that can enable verification of integrity over time. File format identification is conducted using JHOVE and displayed using a set of MIME-type tags [8]. Finally, a set of tabular data formats (SPSS, Stata, RData, CSV, and Excel) are converted to tabular text data files upon ingest, and citation-related metadata files are created for the tabular files [9]. Dataverse converts tabular data files as accurately as possible with the caveat that some commercial applications like SPSS have not published their specifications [10]. Tabular files also receive UNF checksums that can be used to verify the semantic content of the derivatives [11].

Initiatives in research data preservation, including those using Dataverse, emphasize the necessity of storing and monitoring datasets as independent from the submission and discovery platforms that users usually interact with. This approach appears to be informed by an interpretation of the OAIS reference model, which emphasizes the flow of received data into stored [...]

[...] preserving research data.¹⁰ A series of test implementations at the University of York and University of Hull were deemed successful and Archivematica was among the preservation providers tested with Jisc’s Research Data Shared Service pilot [14]. Therefore, Dataverse’s functions primarily map to the “Producer” end of the OAIS model, where materials are negotiated and accepted for ingest, and some baseline preservation-supporting functions are performed. Further research is required on how platforms like Dataverse might fulfill the requirements of the Producer-Archive Interface Methodology Abstract Standard [...]

5 ISO 14721:2012 (CCSDS 650.0-M-2) Space data and information transfer systems -- Open archival information system (OAIS) -- Reference model.
6 Core trustworthy data repositories requirements, https://www.coretrustseal.org/wp-content/uploads/2017/01/Core_Trustworthy_Data_Repositories_Requirements_01_00.pdf.
7 nestor seal for trustworthy data archives, https://www.langzeitarchivierung.de/Subsites/nestor/EN/Siegel/siegel_node.html.
8 ISO 16363:2012 (CCSDS 652.0-R-1) Space data and information transfer systems -- Audit and certification of trustworthy digital repositories.
9 Dataverse: https://dataverse.org/.
