The Future of Email Archives
Total Page:16
File Type:pdf, Size:1020Kb
The Future of Email Archives A Report from the Task Force on Technical Approaches for Email Archives August 2018 COUNCIL ON LIBRARY AND INFORMATION RESOURCES The Future of Email Archives A Report from the Task Force on Technical Approaches for Email Archives August 2018 Sponsored by COUNCIL ON LIBRARY AND INFORMATION RESOURCES ISBN 978-1-932326-59-8 CLIR Publication No. 175 Published by: Council on Library and Information Resources 1707 L Street NW, Suite 650 Washington, DC 20036 Website at https://www.clir.org Print copies are available for $20 each. Orders may be placed through CLIR’s website at https://www.clir.org/pubs/reports/pub175/ Copyright © 2018 by Council on Library and Information Resources. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Library of Congress Cataloging in Publication Data: Names: Task Force on Technical Approaches for Email Archives, author. | Andrew W. Mellon Foundation, sponsoring body. | Digital Preservation Coalition, sponsoring body. Title: The future of email archives : a report from the Task Force on Technical Approaches for Email Archives, August 2018 / sponsored by The Andrew W. Mellon Foundation and Digital Preservation Coalition. Description: Washington, DC : Council on Library and Information Resources, [2018] | Series: CLIR publication ; no. 175 | “August 2018.” | Includes bibliographical references and index. Identifiers: LCCN 2018027625 | ISBN 9781932326598 (pbk. : alk. paper) Subjects: LCSH: Electronic mail messages. | Electronic records. | Digital preservation. Classification: LCC CD974.4 .T37 2018 | DDC 004.692--dc23 LC record available at https://lccn.loc.gov/2018027625 Cover illustration: faithie/Shuttertock.com iii Contents Acknowledgments ........................................................................................................vi The Task Force on Technical Approaches for Email Archives ............................... vii Executive Summary ........................................................................................................1 1. The Untapped Potential of Email Archives ..........................................................4 1.1 Email as the Story Keeper ..............................................................................5 1.2 How Email Archives Are Different ...............................................................7 1.3 Harnessing Technology ..................................................................................8 1.4 Adapting Archival Practices ..........................................................................9 2. The Email Stewardship Lifecycle .........................................................................11 2.1 Email as Organizational Records ................................................................11 2.2 Email from Personal Records and Donated Materials .............................12 2.3 Email Lifecycle Stages ...................................................................................13 2.3.1 Creation and Use ............................................................................14 2.3.2 Appraisal and Selection .................................................................16 2.3.3 Acquisition .......................................................................................18 2.3.4 Archival Processing ........................................................................19 2.3.5 Preservation .....................................................................................20 2.3.6 Discovery and Access for Research..............................................20 3. Email as a Documentary Technology ...................................................................22 3.1 Defining Email ...............................................................................................22 3.2 System Architecture ......................................................................................23 3.2.1 Architectural Characteristics .........................................................24 3.2.2 User Features ...................................................................................25 3.2.3 Operational and Administrative Features ..................................25 3.2.4 Email Message Data Model...........................................................26 3.2.5 Message Components ....................................................................28 3.3 Accounts .........................................................................................................29 3.4 Data Transmission Model ............................................................................30 3.5 Vulnerabilities of Email ................................................................................31 3.6 Beyond the ASCII Message: Additional Components .............................31 3.6.1 Attachments .....................................................................................32 3.6.2 Links and Resources Outside the Message .................................33 3.6.3 Signature Blocks .............................................................................33 4. Current Services and Trends .................................................................................34 4.1 The Evolving Email Ecosystem ...................................................................34 4.1.1 Abuse, Abuse Prevention, Security, and Deliverability ............34 4.1.2 Marketing and eCommerce Services ...........................................36 4.1.3 Consumer Email Services ..............................................................37 4.1.4 Enterprise Email Services and Operations .................................38 4.1.5 Email Storage, Compliance, and Records Management ...........39 4.1.6 Compliance and Legal Tools ........................................................41 iv 4.2 Challenges for Repositories .........................................................................42 4.2.1 Capturing Email .............................................................................42 4.2.2 Ensuring Authenticity ....................................................................46 4.2.3 Tracking Processing and Preservation Actions ..........................47 4.2.4 Preserving Attachments and Linked Content ............................49 4.2.5 Ensuring Security and Privacy .....................................................53 4.2.6 Processing High-Volume or Numerous Collections .................54 5. Potential Solutions and Sample Workflows .......................................................57 5.1 Preservation Strategies .................................................................................57 5.1.1 Bit-Level Preservation ....................................................................58 5.1.2 Migration .........................................................................................58 5.1.3 Emulation ........................................................................................60 5.2 Interoperability to Support Flexible Workflow Design ...........................61 5.2.1 Processing Functionality Across Multiple Tools .......................62 5.2.2 Developing a Community Data Model .......................................63 5.2.3 Defining Format Requirements ....................................................64 5.2.4 APIs and Interoperability ..............................................................65 5.3 Workflows and Implementation Scenarios ...............................................68 5.3.1 Bit-Level Preservation Workflow Scenario .................................68 5.3.2 Migration Workflow Scenarios.....................................................69 5.3.3 Emulation Workflow Scenario .....................................................74 6. The Path Forward: Recommendations and Next Steps .....................................76 6.1 Community Development and Advocacy .................................................77 6.1.1 Low-Barrier/Short-Term Actions ..................................................77 6.1.2 High-Impact/Long-Term Activities .............................................80 6.2 Tool Support, Testing, and Development ..................................................83 6.2.1 Low-Barrier/Short-Term Actions ..................................................83 6.2.2 High-Impact/Long-Term Activities .............................................84 Appendix A: Automating System Processes ...........................................................90 Appendix B: Email Tools for Libraries, Archives, and Museums ......................91 Archivematica ......................................................................................................91 DArcMail (Digital Archive Mail System) .........................................................93 EAS (Electronic Archiving System) ..................................................................94 ePADD (Email: Process, Appraise, Discover, Deliver) ..................................96 Preservica Standard Edition ..............................................................................99 TOMES Tool (Transforming Online Mail with Embedded Semantics).....100 Appendix C: Email Preservation Research Projects ............................................102 Archiving Email Symposium ..........................................................................102 Carcanet Press Email Preservation Project ....................................................102 CERP (Collaborative Electronic Records