Archiving E-Mail
DIGITAL DILEMMAS: ARCHIVING E-MAIL
PRE-CONFERENCE WORKSHOP
ASSOCIATION OF CANADIAN ARCHIVISTS
JUNE 10, 2008

Presented By: COLLABORATIVE ELECTRONIC RECORDS PROJECT TEAM
Nancy Adgent, Steve Burbeck, Ricc Ferrante, Lynda Schmitz Fuhrig, Darwin Stapleton

DIGITAL DILEMMAS: ARCHIVING E-MAIL WORKSHOP
June 10, 2008

9:00 – 10:30
CERP Inception, funding, goals (Darwin Stapleton)
Need for e-mail preservation, why an issue (Ricc Ferrante)
Identifying the issues, developing guidelines (Nancy Adgent)
Results of testing (Nancy Adgent, Lynda Schmitz Fuhrig)
Review workflow and tools (Lynda Schmitz Fuhrig)
Questions

10:30 – 10:45 Break

10:45 – 12:00
Exercise 1: Complete accession and processing forms (Nancy Adgent)
Exercise 2: Convert msg to mbox via Aid4Mail (Nancy Adgent)
Exercise 3: Convert pst to mbox via MessageSave (Lynda Schmitz Fuhrig)
Exercise 4: Start AIP (Lynda Schmitz Fuhrig)

12:00 – 12:15 Questions

12:15 – 1:30 Lunch on your own (McConnell Hall cafeteria is a short walk from the class location)

1:30 – 3:00
Overview of technical issues (Ricc Ferrante)
AIP post parsing (Ricc Ferrante)
Why XML (Ricc Ferrante)
Overview of parser (Steve Burbeck)
How testbed message oddities contributed to development
Collaboration with NC
Demonstration of parser (Steve Burbeck)
Questions

3:00 – 3:15 Break

3:15 – 4:15
Exercise 5: Convert mbox via parser (Steve Burbeck)
Exercise 6: Complete AIP (Lynda Schmitz Fuhrig)
DSpace Introduction (Ricc Ferrante)
Exercise 7: Parse attendee’s messages
Summary (Ricc Ferrante)

4:15 – 4:30 Questions

TABLE OF CONTENTS
Exercise 1: Complete accession and processing forms
Exercise 2: Convert msg to mbox via Aid4Mail
Exercise 3: Convert pst to mbox via MessageSave
Exercise 4: Start AIP
Exercise 5: Convert mbox via parser
Exercise 6: Complete AIP
Exercise 7: Parse attendee’s messages
Appendix A: Forms and Guidelines on CERP website
Appendix B: Application to Use Materials Produced by CERP
Appendix C: CERP Processing Workflow Model
Appendix D: Metadata Narrative Template
Appendix E: METS Sample
Appendix F: EAD Sample
Appendix G: Resources and Related Projects
Appendix H: Software Download Links

This documentation is released by the Collaborative Electronic Records Project under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License, 2008 and 2009. This license can be viewed at http://creativecommons.org/licenses/by-nc-sa/3.0/us/

Citation
CERP. (2008). “Digital Dilemmas: Archiving E-Mail.” Sleepy Hollow, NY and Washington DC: The Collaborative Electronic Records Project.

Digital Dilemmas: Archiving E-Mail
Collaborative Electronic Records Project
Association of Canadian Archivists
June 10, 2008

Dr. Darwin H. Stapleton
Executive Director
Rockefeller Archive Center
15 Dayton Avenue
Sleepy Hollow, NY 10591
914-631-4505
[email protected] [email protected]

Overview
Funding
Purpose
Collaboration/Management
Accomplishments
"We're good, but not perfect"
Serendipity

Riccardo Ferrante
IT Archivist and Electronic Records Program Director
Smithsonian Institution Archives
Capital Gallery Building
600 Maryland Avenue, SW, Suite 3000
Washington, D.C. 20024-2520
202-633-5906
[email protected]

Nancy Adgent
Project Archivist
Rockefeller Archive Center
15 Dayton Avenue
Sleepy Hollow, NY 10591
914-366-6355
[email protected]

ROCKEFELLER ARCHIVE CENTER
RAC DEPOSITOR CHART
[Figure: RAC depositor chart. Labels recoverable from the chart: R. Foundation, R. Family, R. University, R. Brothers Fund, Individuals; Rockefeller, Other Related; General Education Board, Commonwealth Fund, Markle Foundation, Russell Sage Foundation, Foundation Center, Population Council, China Medical Board, Foundation for Child Development; NAR founded, JDR Jr. founded; Own, Some On Deposit.]

Key Survey Findings
No records management policy
No naming standards
No procedures for organizing or saving
Some have no on-site IT staff

Inbox Folder Organization

Inbox – Non Standard File Names
Suggested Subject Names: “Staff Meeting.Minutes.2006.08.02”

ISSUES
Unknown formats
Deteriorating media
Data on portable devices
Native format vs. converting
Upgraded hardware/old media
Obsolete or unsupported software
Duplicates, personal, junk mingled
Information quantity & rate increase
Traditional archival concepts/new era

Best Practices Guidance

E-MAIL GUIDELINES

TRANSFER GUIDELINES
Prepared by the Collaborative Electronic Records Project, Rockefeller Archive Center, January 2007
This document may be freely used and modified by any non-profit organization.

Retention Guidelines
Records Disposition Schedule

Forms
Accession
Administrative & Descriptive Metadata
Transfer
Verification
Migration/Refresh
METS AIP

Metadata
Accession
Administrative Metadata
Descriptive Metadata

Completed METS AIP Form

Transfer Documentation Form

Verification Documentation

Migration / Refresh Schedule
From CD to server
From Word to PDF
From preservation copy CD to new CD

Testbed Findings
WYSIWYG?
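One way to probe the WYSIWYG question is to compare what a mail client shows with what the raw message carries. The following is a minimal sketch, not part of the CERP tools, that uses Python's standard-library email package to list the header fields a default client view would leave out. The sample message, its example.org addresses, and the DISPLAYED_FIELDS set are hypothetical and chosen only for illustration.

```python
from email import message_from_string

# Hypothetical raw message, for illustration only (not from the CERP testbed).
RAW_MESSAGE = """\
Return-Path: <sender@example.org>
Received: from pc.example.org by mail.example.org with ESMTP id abc123;
 Fri, 19 May 2006 14:41:08 -0400 (EDT)
Message-Id: <20060519144014.0359d620@mail.example.org>
X-Mailer: Example Mailer 1.0
Date: Fri, 19 May 2006 14:41:04 -0400
To: <recipient@example.org>
From: <sender@example.org>
Subject: test
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"

Body text.
"""

# Assumption: the fields a typical mail client displays by default.
DISPLAYED_FIELDS = {"from", "to", "subject", "date"}


def hidden_header_fields(raw):
    """Return the header field names that a default client view would not show."""
    message = message_from_string(raw)
    return [name for name in message.keys() if name.lower() not in DISPLAYED_FIELDS]


if __name__ == "__main__":
    for name in hidden_header_fields(RAW_MESSAGE):
        print(name)
```

Run against a message exported from a real account, a listing like this gives a quick inventory of the routing and client metadata that would be lost if only the displayed view were captured.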
Internet Header Metadata

Header Metadata
METADATA
Return-Path: <[email protected]>
Received: from (localhost [999.999.9.9]) by mailserver1 with LMTP for <[email protected]>; Fri, 19 May 2006 14:41:09 -0400
Received: from mailserver1edu ([999.999.9.9]) by mailserver1EDU (3.0.2/sieved-3-0-build-942) for <[email protected]>; Fri, 19 May 2006 14:41:09 -0400
Received: from Stapleton-pc.Rockefeller.edu (localhost [999.999.9.9]) by mailserver1 with ESMTP id k4JIf82s018784 for <[email protected]>; Fri, 19 May 2006 14:41:08 -0400 (EDT)
Message-Id: <7.0.1.0.2.20060519144014.0359d620@mailserver1edu>
X-Mailer: QUALCOMM Windows Eudora Version 7.0.1.0
Date: Fri, 19 May 2006 14:41:04 -0400
To: <[email protected]>
From: <[email protected]>
Subject: mitelman
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
X-MASF: 0.00%

Change of Author

Change of Creation Date

Original Capture
mbox
Web Browser Display

--============_-1211362437==_============_E2mXatt
Content-Disposition: attachment; filename="XXXXX.doc"
Content-Type: application/octet-stream; name="XXXXX.doc"
Content-Transfer-Encoding: base64
0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAPgADAP7/CQAGAAAAAAAAAAAAAAABAAAAJQ
AAAAAEAAAJwAAAAEAAAD+////AAAAACQAAAD////////////////////////////////////
[14 pages of character strings were in this space.]
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
--============_-1211362437==_============--

From ???@??? Mon Sep 17 16:44:44 2001
Return-Path: <XXXXXX @mail.rockefeller.edu>
Received: from [123.45.67.890] (XXXXXX.rockefeller.edu [123.45.67.890]) by mail.rockefeller.edu (6.23.5/7.89.0) with ESMTP id f8HK8Js01021 for <XXXXX>; Mon, 17 Sep 2001 16:08:19 -0400 (EDT)
Message-Id: <a0501040bb7cc1683f959@[123.45.67.890]>
Date: Mon, 17 Sep 2001 16:07:46 -0500
To: XXXXXXXX
From: Jane Doe <[email protected]>
Subject: Edited version of letter
X-UIDL: ?e%!!Z"H!!V=^!!+[~!!
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="============_-1211361629==_============"
This is a multi-part message in MIME format.
--============_-1211361629==_============
Content-Type: text/plain; charset="iso-8859-1"
--============_-1211361629==_============
Content-Type: text/plain;"XXXXXXX.doc 1" (missing attachment)
--============_-1211361629==_============--

Aid4Mail Conversion

Missing Attachment

Attachment Conversion

Lynda Schmitz Fuhrig
Project Archivist
Smithsonian Institution Archives
Capital Gallery Building
600 Maryland Avenue, SW, Suite 3000
Washington, D.C. 20024-2520
202-633-5917
[email protected]

SIA testbed relationship
Three SI units agreed to participate

SIA
• Collects, preserves, and makes available the official records of the Smithsonian Institution
• Carries out a records management program for Smithsonian offices, advising them on the disposition of records and pertinent documentary materials in analog and digital form.

Administrative / Financial / Public-Research
• Deposits records regularly
• Deposits made irregularly.
• Has very active relationship with SIA.
• Records disposition schedules with SIA.
• Archives formalized records being created
• One of our largest depositors.
disposition schedule for office and/or updated

Bad transfer

Transferred email
36,000+ messages

Archival Information Package

CERP model
* The SIP is the submission information package.
It contains the email collection (a variety of formats is possible) received from the depositor and the metadata narrative (information supplied by the depositor and updated by the archivist).
* The AIP is the archival information package. It contains the source email from the depositor, metadata (manually created METS, narrative, and other), the finding aid (manually created), .mbox files, the parsed XML file, parsed attachments, bad messages from the parser, and the parser subject-sender log.
* The DIP is the dissemination information package. It could include the entire package for viewing or downloading, or specific email message(s) for viewing. The AIP remains in its original form.

CERP model continued

SIP
* SIP to AIP
• Archivist converts the collection to .mbox (a generic email format), if it is not already in this format.
• Archivist runs the parser to convert the .mbox file(s) to an XML preservation file with encoded attachments.
• Archivist creates a package of all components (metadata, source, outputs, finding aids) in zip format and submits it to a digital repository.

AIP
* AIP to DIP
The researcher queries the digital repository (DSpace) to find and retrieve