Digital Curation at Work Modeling Workflows for Digital Archival Materials

Total Page:16

File Type:pdf, Size:1020Kb

Digital Curation at Work Modeling Workflows for Digital Archival Materials Digital Curation at Work Modeling Workflows for Digital Archival Materials Colin Post Alexandra Chassanoff Christopher A. Lee University of North Carolina Educopia Institute University of North Carolina [email protected] [email protected] [email protected] Andrew Rabkin Yinglong Zhang Katherine Skinner University of North Carolina University of North Carolina Educopia Institute [email protected] [email protected] [email protected] Sam Meister Educopia Institute [email protected] ABSTRACT materials that have typically been the province of archives and This paper describes and compares digital curation workflows from special collections, a great deal of processing work needs to occur 12 cultural heritage institutions that vary in size, nature of digital before materials are available to users. While libraries and archives collections, available resources, and level of development of digital have established practices and policies for managing analog mate- curation activities. While the research and practice of digital cura- rials, individual institutions and the archival field as a whole are tion continues to mature in the cultural heritage sector, relatively still developing best practices, standards, and shared terminologies little empirical, comparative research on digital curation activities for processing digital materials [3, 14]. has been conducted to date. The present research aims to advance Digital curation activities are complex, involving diverse skills knowledge about digital curation as it is currently practiced in and techniques, software tools and systems, and a range of profes- the field, principally by modeling digital curation workflows from sional and paraprofessional staff. Due to the variety of functions different institutional contexts. This greater understanding can con- involved and the ongoing nature of digital curation work, no single tribute to the advancement of digital curation software, practices, software environment can manage the entire scope of the steward- and technical skills. In particular, the project focuses on the role of ship of digital materials. Furthermore, digital curation responsibili- open-source software systems, as these systems already have strong ties often cross departments or units, and benefit from collaboration support in the cultural heritage sector and can readily be further among individuals at an institution with different areas of exper- developed through these existing communities. This research has tise [12, 21]. Digital curation approaches also encompass factors surfaced similarities and differences in digital curation activities, specific to particular institutional contexts: institutions must work as well as broader sociotechnical factors impacting digital curation within their existing processes, policies, institutional constraints, work, including the degree of formalization of digital curation ac- and technical platforms to marshal the necessary technologies and tivities, the nature of collections being acquired, and the level of staff to address local needs. institutional support for various software environments. This paper describes preliminary findings from the OSSArcFlow Project, a two year (2017-2019) project funded by the Institute for CCS CONCEPTS Museum and Library Services (IMLS) involving project members at the University of North Carolina at Chapel Hill and the Educopia • Applied computing → Digital libraries and archives; Institute. The project team began working with 12 partner insti- KEYWORDS tutions in July 2017 to explore how three open-source software (OSS) environments (the BitCurator environment, ArchivesSpace, Digital curation, workflows, open-source software and Archivematica)1 can support digital curation activities. These and other OSS tools have already been widely adopted in the cul- 1 INTRODUCTION tural heritage sector. The vitality and collaborative nature of these Libraries, archives, and other cultural heritage institutions care communities of users will facilitate the further development of OSS for collections of digitized and born-digital materials of increasing solutions—aided by empirical research of digital curation activities size and variety. In addition to published materials in standardized and needs. formats like electronic journals and e-books, collections include The project endeavors to foster reflection on the current state of unique or special objects in a diverse array of formats: organiza- digital curation across a variety of institutional contexts by consti- tional records, personal or family archives, websites, audio and tuting a cohort of partner institutions. The project team has worked video recordings, and more. Digital curation, or the ongoing care directly with these partner institutions, and has facilitated collab- and attention needed to keep objects viable for present and future oration, discussion, and sharing of resources among the partners. use, starts from at or before the time of acquisition and continues 1For more on these environments see https://bitcuratorconsortium.org/, well after the initial provision of access. For records or manuscript https://archivesspace.org/, https://www.archivematica.org/. The partner institutions represent varying sizes and types, geo- Library, citing many issues that arose over the course of this com- graphic locations, nature of digital collections, available resources, plex, large-scale digital library project. Principally, the team had to and level of development of digital curation activities.2 The project develop a federated system that would integrate into the existing team compared similarities and differences in digital curation ap- infrastructure for several institutions with quite different student proaches across the various institutions. We analyzed workflow bodies and disparate processes for managing ETDs. Cocciolo [7] steps, the order of steps, the tools used for different steps, as well reflects on how differing professional identities can alter perspec- sociotechnical factors impacting the development and sustainability tives on digital curation work—and occasionally lead to clashes—for of digital curation work in particular institutional contexts. instance, between digital archivists and digital asset managers, who The modeling and analysis is part of the broader aims of the had conflicting notions of how the digital asset management system OSSArcFlow project to advance workflows that incorporate OSS so- at their institution should be used. lutions to support stewardship of digital collections across a variety Recent discussions of digital library work more generally have of cultural heritage institutional contexts. Institutions regularly re- emphasized the need for rubrics or frameworks for systematic eval- port that there are both gaps and overlaps between different digital uation, both in terms of specific aspects of this work, like preserving curation tools and software environments that have to be man- particular media types [11], as well as holistic assessments to inform aged. Gaps between tools can make it difficult to push data through long-term planning [5, 6]. Andrews, Harker, and Krahmer [2] sug- workflows. For example, the output from one tool may have tobe gest the analytical hierarchy process for weighing many contextual transformed before it is compatible with the next tool. Practitioners factors that might impact collection development and management can spend large portions of time transforming data and metadata for an institutional repository, such as faculty attitudes toward so that it can interface with different systems. Overlaps between self-archiving, institutional policies regarding open access, and tools challenge curators to make decisions about when and where technical infrastructure. For long-term digital preservation, Becker to complete particular functions. While the OSSArcFlow project and Rauber [6] discuss an evaluative approach to multi-criteria focuses on three OSS environments, the workflow modeling and decision-making that attempts to encompass the highly contextual analysis has also attended to the range of other tools and systems factors that impact preservation planning, such as variability in in use, as well as factors specific to local contexts. quality of tools, dynamic nature of goals and constraints over time, The OSSArcFlow project strives to advance empirical knowl- and differing needs for various user communities. edge about the current state of digital curation to contribute to The present research has modeled workflows as a method for the development of digital curation software and practices. Al- both gaining a deeper empirical understanding about current digital though tools, techniques, and skills continue to mature, digital curation work and generating documentation that communicates curation scholarship and practice is still in many ways at a nascent a high-level representation of these activities. Increasingly, mod- stage—institutions are actively developing digital curation work- eling workflows has been recognized as a valuable approach to flows, practices, and policies, and there are notable gaps between conceptualizing work for a variety of cultural heritage activities, tools. These workflows not only provide insight into how digital from dealing with missing or damaged books [19] to managing curation is currently being practiced and conceptualized, but they e-resources [10]. For archival contexts specifically, Daines III [9] also serve as analytical tools for investigating various
Recommended publications
  • Jamie Shiers, CERN
    Investing in Curation A Shared Path to Sustainability [email protected] Data Preservation in HEP (DPHEP) Outline 1. Pick 2 of the messages from the roadmap & comment – I could comment on all – but not in 10’ 1. What (+ve) impact has 4C already had on us? – Avoiding overlap with the above 1. A point for discussion – shared responsiblity / action THE MESSAGES The 4C Roadmap Messages 1. Identify the value of digital assets and make choices 2. Demand and choose more efficient systems 3. Develop scalable services and infrastructure 4. Design digital curation as a sustainable service 5. Make funding dependent on costing digital assets across the whole lifecycle 6. Be collaborative and transparent to drive down costs IMPACT OF 4C ON DPHEP International Collaboration for Data Preservation and Long Term Analysis in High Energy Physics “LHC Cost Model” (simplified) Start with 10PB, then +50PB/year, then +50% every 3y (or +15% / year) 10EB 1EB 6 Case B) increasing archive growth Total cost: ~$59.9M (~$2M / year) 7 1. Identify the value of digital assets and make choices • Today, significant volumes of HEP data are thrown away “at birth” – i.e. via very strict filters (aka triggers) B4 writing to storage To 1st approximation ALL remaining data needs to be kept for a few decades • “Value” can be measured in a number of ways: – Scientific publications / results; – Educational / cultural impact; – “Spin-offs” – e.g. superconductivity, ICT, vacuum technology. Why build an LHC? BEFORE! 1 – Long Tail of Papers 2 – New Theore cal Insights 3 4 3 – “Discovery” to “Precision” Volume: 100PB + ~50PB/year (+400PB/year from 2020) 11 Zimmermann( Alain Blondel TLEP design study r-ECFA 2013-07-20 5 Balance sheet – Tevatron@FNAL • 20 year investment in Tevatron ~ $4B • Students $4B • Magnets and MRI $5-10B } ~ $50B total • Computing $40B Very rough calculation – but confirms our gut feeling that investment in fundamental science pays off I think there is an opportunity for someone to repeat this exercise more rigorously cf.
    [Show full text]
  • Module 8 Wiki Guide
    Best Practices for Biomedical Research Data Management Harvard Medical School, The Francis A. Countway Library of Medicine Module 8 Wiki Guide Learning Objectives and Outcomes: 1. Emphasize characteristics of long-term data curation and preservation that build on and extend active data management ● It is the purview of permanent archiving and preservation to take over stewardship and ensure that the data do not become technologically obsolete and no longer permanently accessible. ○ The selection of a repository to ensure that certain technical processes are performed routinely and reliably to maintain data integrity ○ Determining the costs and steps necessary to address preservation issues such as technological obsolescence inhibiting data access ○ Consistent, citable access to data and associated contextual records ○ Ensuring that protected data stays protected through repository-governed access control ● Data Management ○ Refers to the handling, manipulation, and retention of data generated within the context of the scientific process ○ Use of this term has become more common as funding agencies require researchers to develop and implement structured plans as part of grant-funded project activities ● Digital Stewardship ○ Contributions to the longevity and usefulness of digital content by its caretakers that may occur within, but often outside of, a formal digital preservation program ○ Encompasses all activities related to the care and management of digital objects over time, and addresses all phases of the digital object lifecycle 2. Distinguish between preservation and curation ● Digital Curation ○ The combination of data curation and digital preservation ○ There tends to be a relatively strong orientation toward authenticity, trustworthiness, and long-term preservation ○ Maintaining and adding value to a trusted body of digital information for future and current use.
    [Show full text]
  • INLS 752: Digital Preservation and Access
    Summer, 2015 Monday-Friday, 8:00-11:30 INLS 752: Digital Preservation and Access The Instructors. Dr. Helen R. Tibbo : 962-8063(w); 929-6248(h) Office: 201 Manning Hall FAX #: (919) 962-8071 : Tibbo at email dot unc dot edu Class Listserv: INLS752- [email protected] Office Hours. I will be available until 12:30 most days. I will probably not be in the office in the afternoons during this intensive course session. Feel free to call me at home in the evening before 9:00 PM. Course Timeline. First Class: Wednesday, May 13, 2015 Last Class: Thursday, May 28, 2015 Final Project Due: Friday, May 29, 2015 Brief Course Description. This course focuses on integrating state-of-the-art information technologies, particularly those related to the digital curation lifecycle, digital repositories, and long-term digital preservation, into the daily operations of archives, records centers, museums, special collections libraries, visual resource collections, historical societies, and other information centers. Issues, topics, and technologies covered will include the promise & challenge of long-term digital preservation and curation; creating durable digital objects, approaches to preservation; development of institutional repositories; image processing; selecting materials for digitization and managing digitization projects; resource allocation and costing, risk management, digitization and metadata; rights management and other legal and ethical issues; digital asset management; standards; file formats; quality control; funding for developing and sustaining digitization projects and programs; and trusted repositories. Goals and Objectives. By the end of the course, the student should be able to: Identify the key events in the history of digital preservation and access.
    [Show full text]
  • A New Digital Dark Age? Collaborative Web Tools, Social Media and Long-Term Preservation Stuart Jeffrey Version of Record First Published: 05 Dec 2012
    This article was downloaded by: [University of York] On: 10 December 2012, At: 04:01 Publisher: Routledge Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK World Archaeology Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/rwar20 A new Digital Dark Age? Collaborative web tools, social media and long-term preservation Stuart Jeffrey Version of record first published: 05 Dec 2012. To cite this article: Stuart Jeffrey (2012): A new Digital Dark Age? Collaborative web tools, social media and long-term preservation, World Archaeology, 44:4, 553-570 To link to this article: http://dx.doi.org/10.1080/00438243.2012.737579 PLEASE SCROLL DOWN FOR ARTICLE Full terms and conditions of use: http://www.tandfonline.com/page/terms-and-conditions This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. The publisher does not give any warranty express or implied or make any representation that the contents will be complete or accurate or up to date. The accuracy of any instructions, formulae, and drug doses should be independently verified with primary sources. The publisher shall not be liable for any loss, actions, claims, proceedings, demand, or costs or damages whatsoever or howsoever caused arising directly or indirectly in connection with or arising out of the use of this material. A new Digital Dark Age? Collaborative web tools, social media and long-term preservation Stuart Jeffrey Abstract This paper examines the impact of exciting new approaches to open data sharing, collaborative web tools and social media on the sustainability of archaeological data.
    [Show full text]
  • The DCC Curation Lifecycle Model Higgins, Sarah
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Aberystwyth Research Portal Aberystwyth University The DCC Curation Lifecycle Model Higgins, Sarah Published in: International Journal of Digital Curation DOI: 10.2218/ijdc.v3i1.48 Publication date: 2008 Citation for published version (APA): Higgins, S. (2008). The DCC Curation Lifecycle Model. International Journal of Digital Curation, 3(1), 134-140. https://doi.org/10.2218/ijdc.v3i1.48 General rights Copyright and moral rights for the publications made accessible in the Aberystwyth Research Portal (the Institutional Repository) are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the Aberystwyth Research Portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain • You may freely distribute the URL identifying the publication in the Aberystwyth Research Portal Take down policy If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim. tel: +44 1970 62 2400 email: [email protected] Download date: 03. Oct. 2019 134 The DCC Curation Lifecycle Model The International Journal of Digital Curation Issue 1, Volume 3 | 2008 The DCC Curation Lifecycle Model Sarah Higgins, Standards Advisor, Digital Curation Centre, University of Edinburgh June 2008 Summary Lifecycle management of digital materials is necessary to ensure their continuity.
    [Show full text]
  • Web Archiving and You Web Archiving and Us
    Web Archiving and You Web Archiving and Us Amy Wickner University of Maryland Libraries Code4Lib 2018 Slides & Resources: https://osf.io/ex6ny/ Hello, thank you for this opportunity to talk about web archives and archiving. This talk is about what stakes the code4lib community might have in documenting particular experiences of the live web. In addition to slides, I’m leading with a list of material, tools, and trainings I read and relied on in putting this talk together. Despite the limited scope of the talk, I hope you’ll each find something of personal relevance to pursue further. “ the process of collecting portions of the World Wide Web, preserving the collections in an archival format, and then serving the archives for access and use International Internet Preservation Coalition To begin, here’s how the International Internet Preservation Consortium or IIPC defines web archiving. Let’s break this down a little. “Collecting portions” means not collecting everything: there’s generally a process of selection. “Archival format” implies that long-term preservation and stewardship are the goals of collecting material from the web. And “serving the archives for access and use” implies a stewarding entity conceptually separate from the bodies of creators and users of archives. It also implies that there is no web archiving without access and use. As we go along, we’ll see examples that both reinforce and trouble these assumptions. A point of clarity about wording: when I say for example “critique,” “question,” or “trouble” as a verb, I mean inquiry rather than judgement or condemnation. we are collectors So, preambles mostly over.
    [Show full text]
  • Introduction to Archives and Digital Curation Fall 2016 | Tuesdays | 6:00 PM – 8:45 PM | HBK 0109 Instructor: Dr
    University of Maryland College of Information Studies INST 604: Introduction to Archives and Digital Curation Fall 2016 | Tuesdays | 6:00 PM – 8:45 PM | HBK 0109 Instructor: Dr. Ricardo L. Punzalan, Assistant Professor Office Hours: By appointment | Office: 4117-J Hornbake Bldg., South Wing Office Telephone: 301-405-6518 | Email: [email protected] Archival thinking is an important skill in caring for an increasingly complex, multimedia, and heterogeneous information. This course provides you with an overview of fundamental theories and practices as well as the essential principles and standards that archivists apply in designing and implementing strategies for the preservation and long-term access of information. As a class, we shall examine the changing informational, organizational, societal, and technological landscapes and consider how those changes are affecting archival practices, the information and preservation professions, and the implementation of foundational archival ideas. You will also become acquainted with the values of the archives profession that underlie the mandate to manage and care for a body of information resources in diverse organizational and institutional contexts. This is a foundational course if you are training to become a professional archivist, manuscripts curator, records manager, digital curator, data librarian, etc. Thus, the course will provide you with essential knowledge for pursing a variety of career paths, including: • Professional careers in archives and records management - This course provides you an introduction to the field; introduces terms and concepts that will be used in more advanced courses; and builds a foundation for internships and professional networking. • Careers in related information fields - This course provides you with a survey of broadly applicable concepts used in information management, data curation, information policy, and user services.
    [Show full text]
  • Implications of Preservation of Information in the Digital Age
    City Research Online City, University of London Institutional Repository Citation: Roland, L. and Bawden, D. (2012). The Future of History: Investigating the Preservation of Information in the Digital Age. Library & Information History, 28(3), pp. 220- 236. doi: 10.1179/1758348912Z.00000000017 This is the unspecified version of the paper. This version of the publication may differ from the final published version. Permanent repository link: https://openaccess.city.ac.uk/id/eprint/3186/ Link to published version: http://dx.doi.org/10.1179/1758348912Z.00000000017 Copyright: City Research Online aims to make research outputs of City, University of London available to a wider audience. Copyright and Moral Rights remain with the author(s) and/or copyright holders. URLs from City Research Online may be freely distributed and linked to. Reuse: Copies of full items can be used for personal research or study, educational, or not-for-profit purposes without prior permission or charge. Provided that the authors, title and full bibliographic details are credited, a hyperlink and/or URL is given for the original metadata page and the content is not changed in any way. City Research Online: http://openaccess.city.ac.uk/ [email protected] The future of history: implications of preservation of information in the digital age Lena Roland and David Bawden Department of Information Science, City University London Abstract This study investigates the challenges of preserving information in the digital age, and explores how this may affect the future of historical knowledge. The study is based on literature analysis and a series of semi-structured interviews with 41 historians, archivists, librarians, and web researchers.
    [Show full text]
  • Preservation in the Digital Age a Review of Preservation Literature, 2009–10
    56(1) LRTS 25 Preservation in the Digital Age A Review of Preservation Literature, 2009–10 Karen F. Gracy and Miriam B. Kahn This paper surveys research and professional literature on preservation-related topics published in 2009 and 2010, identifies key contributions to the field in peri- odicals, monographs, and research reports, and provides a guide to the changing landscape of preservation in the digital age. The authors have organized the reviewed literature into five major areas of interest: tensions in preservation work as libraries embrace digital resources, mass digitization and its effects on collec- tions, risk management and disaster response, digital preservation and curation, and education for preservation in the digital age. his review article critically examines the literature of preservation published T during a two-year period, 2009–10. Almost a decade has passed since the last review of the preservation literature appeared in Library Resources and Technical Services, covering the period of 1999–2001.1 In the interim, the evolu- tion of the preservation field noted by Croft has accelerated, encompassing whole areas of practice that were in their infancy at the turn of the twenty-first century. In her review, Croft identified ten areas of emphasis in the literature: clarifying preservation misconceptions triggered by the publication of Nicholson Baker’s Double Fold; the continued importance of the artifact in the wake of new digital reformatting technologies; remote storage; mass deacidification; physical treat- ment
    [Show full text]
  • Data and Software Preservation for Open Science (DASPOS)
    Data And Software Preservation for Open Science (DASPOS) For the past few years, the worldwide High Energy Physics (HEP) Community has been developing the background principles and foundations for a community-wide initiative to move in the direction of open access, preservation, and reuse of data collected and analyzed by the field. As a subcommittee of the International Committee on Future Accelerator (ICFA) the Study Group for Data Preservation in HEP (DPHEP) has held a number of meetings in different continents and published an initial report that is the most in-depth analysis of the issues produced to date.1 Given the scope, breadth and depth of the sociological, technical and governance challenges there are many activities developing around this nucleus. Recently, a laboratory consortium including CERN, DESY and CNRS in Europe submitted a proposal for International Long-term Data and Analysis Preservation (ILDAP) to develop and prototype a first set of data reuse archival and prototyping services in the context of the Large Hadron Collider (LHC) experiments. The ILDAP proposal represents an initial attempt to translate the recommendations and concepts outlined in the DPHEP report into a framework than could form the basis for Data and Analysis preservation in HEP. Because this is an initial foray into the technical aspects of curation in HEP, the ILDAP effort is largely a review and planning exercise. Its main focus is to establish the technical requirements for data preservation and access. It will explore, in conjunction with the HEP experiments, possible standards for common simplified data formats and metadata descriptors that could be adopted across the field.
    [Show full text]
  • Digital Preservation Handbook Technical Solutions and Tools
    Digital Preservation Handbook Technical Solutions and Tools Illustration by Jørgen Stamp digitalbevaring.dk CC BY 2.5 Denmark Who is it for? Operational managers (DigCurV Manager Lens) and staff (DigCurV Practitioner Lens) in repositories, publishers and other data creators, third party service providers. Assumed level of knowledge Novice to Intermediate. Purpose To focus on technical tools and applications that support digital preservation: software, applications, programs and technical services. To consider the practical deployment of preservation techniques and technologies whether as relatively small and discrete programs (like DROID) or enterprise wide solutions that integrate many tools. This section excludes other more strategic or policy issues and standards that are sometimes described as tools: these are covered elsewhere in the Handbook. Gold sponsor Silver sponsors Bronze sponsors Reusing this information You may re-use this material in English (not including logos) with required acknowledgements free of charge in any format or medium. See How to use the Handbook for full details of licences and acknowledgements for re-use. For permission for translation into other languages email: [email protected] Please use this form of citation for the Handbook: Digital Preservation Handbook, 2nd Edition, http://handbook.dpconline.org/, Digital Preservation Coalition © 2015. 2 Contents Tools .........................................................................................................................................................
    [Show full text]
  • A DIY Approach to Digital Preservation
    Practical Digital Solutions: A DIY Approach to Digital Preservation by Tyler McNally A Thesis Submitted to the Faculty of Graduate Studies of the University of Manitoba In Partial Fulfillment of the Requirements of the Degree of MASTER OF ARTS Department of History (Archival Studies) Joint Masters Program University of Manitoba/University of Winnipeg Winnipeg, Manitoba Copyright © 2018 by Tyler McNally Table of Contents Abstract…………………………………………………………...i AcknoWledgments…………………………………………….ii Acronym IndeX…………………………………………………iii Introduction……………………………………………………..1 Chapter 1………………………………………………………….12 Chapter 2………………………………………………………….54 Chapter 3………………………………………………………….81 Conclusion ……………………………………………………….116 Bibliography……………………………………………………..125 iii Abstract Since the introduction of computers, archivists have had to find ways to deal with digital records. As more records are born digital (created through digital means) and digital technologies become more entrenched in hoW data is created and processed, it is imperative that archivists properly preserve these records. This thesis seeks to propose one possible solution to this issue. Rather than advocate for paid solutions or electronic record management systems, it advocates for more practical in-house DIY solutions. The first chapter lays out background information and the historiography of digital archiving in Canada at the federal level. The second chapter moves step-by-step through a Workflow developed at the University of Manitoba’s Faculty of Medicine Archives that lays out one possible DIY style solution. The third chapter is an audit of the WorkfloW from the second chapter against three important international standards for preserving digital information. iv Acknowledgments I would like to acknoWledge and thank Professors Thomas Nesmith and Greg Bak. Their role as professors of the Archival Studies program has been a great source of support and inspiration as well as their knowledge and passion for both archives and their students.
    [Show full text]