Digital Preservation in Archives: Overview of Current Research and Practices
National Archives of Sweden
January 2004 - February 2005
LDB-enheten
Raivo Ruusalepp
Table of Contents
1. Introduction
1.1 Archives as a stakeholder
1.2 Structure of the report
2. Digital records in records management
2.1 Guidance and consultation of records creators
2.2 Metadata used in records management
2.3 Digital signatures as records authentication method
2.4 Appraisal of electronic records
2.5 Conclusion
3. From agency to archive
3.1 What to capture?
3.2 Significant properties of electronic records
3.3 Transfer of electronic records and its legal regulation
3.4 Conclusion
4. Digital preservation and digital archive
4.1 Digital preservation strategies
4.1.1 Emulation
4.1.2 Migration strategies
4.1.3 Encapsulation
4.1.4 Persistent Objects / Persistent Archives
4.2 Comparing digital preservation strategies – a small summary
4.3 Digital archive models and systems
4.3.1 The OAIS model
4.3.2 OAIS model implementations and related projects
4.3.3 Digital repository management systems
4.4 Digital preservation – conclusion
5. From archive to users
5.1 Significant properties
5.2 Dynamic documents
5.3 Automation of finding aids
5.4 Digitisation of archives’ holdings
5.5 Access to digital archives – conclusions
6. Conclusion – a look into the future
6.1 Cost models for digital archiving
6.2 Automation and scalability of digital archives
6.3 Benchmarking
6.4 Risk analysis
6.5 Preservation of web and dynamic records
6.6 Metadata
6.7 Records created with open-source software
7. Bibliography
Appendix 1. Matrix of digital preservation research projects and research topics
1. Introduction
“The last decade and a half has produced more records than any previous similar period of human activity. The fact, that the majority of these records is less reliable, retrievable or accessible than ever before, is one of the ironies of the modern information age.”¹
Idiosyncratic software systems generate, manage and store digital data using proprietary technologies and media that were not designed to prevent manipulation and that are subject to obsolescence. Long-term preservation of digital information is therefore plagued by the short lifespan of storage media, obsolete hardware and software, proprietary file formats, and defunct technologies and web sites.
Indeed, the majority of computer products and services on the market today did not exist five years ago.² The essence of the “problem” of digital preservation is the lack of proven methods to ensure that digital information will continue to exist, that we will be able to access it using the technology tools available, or that any accessible information is authentic and reliable. The last twenty-five years have seen vigorous research into the issues of digital preservation: technological obsolescence, the fragility of storage media and the manipulability of electronic systems, all of which challenge our capacity to guarantee the long-term preservation and authenticity of electronic records. The formidable body of literature that has accumulated on the topic is divided among several, sometimes conflicting, suggestions:
• to study and understand the technological context of electronic records in each individual case;
• to base the preservation of the authenticity of records over time on requirements and procedures that are independent of specific technological contexts;
• to use platform-independent, open, standardised file formats for preservation;
• to emulate the original technological environment on future technology platforms;
• to migrate the records into formats suitable for access on current platforms;
• to preserve the contents of digital records on paper or microfilm;
• etc.
Archives’ practical implementations of these (theoretical) suggestions take two forms: policies and strategies on the one hand and, where time for planning has been short, ad hoc practical solutions on the other. As the following report testifies, the two have rarely been joined into a whole, with insufficient funding often cited as the main cause. The resources devoted to the preservation of electronic records within archival institutions have not been commensurate with the task.
Policy and strategy are fine, but without implementation they are not worth very much.³ Implementation is arguably the greatest unresolved issue of digital preservation, and the most difficult to deliver, because it requires major resources, compliant organisations, dedicated management and appropriately skilled staff. Archivists have sometimes felt that they are expected to provide answers and solutions to problems that exceed their capabilities.
1.1 Archives as a stakeholder
Naturally, archives are but one interest group involved in finding solutions to the questions of long-term digital preservation: librarians, scientific communities, the cultural heritage sector, businesses and government agencies, the technology industry, medical institutions and many others are seeking to overcome the same problems of technology obsolescence and of extending the usability of digital data that depends on this technology. Many of these stakeholders have their own specific requirements and, correspondingly, a specific angle on the digital preservation problem. For many years, the different interest groups worked on their theories and solutions separately, but in recent years they have started to co-operate again, and the results are beginning to emerge. This co-operation has also made the specific needs of the different stakeholders, archives included, clearer and better defined.
1 L. Duranti, ‘From Here to Eternity’: Concepts and Principles for the Management of Electronic Records (1999), p. 1
2 S.S. Chen, The Paradox of Digital Preservation (2001), p. 2
3 G. O’Shea, Research issues in Australian approaches to policy development (1997), p. 253
For archives, public or private, the provision of their “traditional” service, long-term access to authentic and usable records, has become considerably more complicated through the use of digital technology.
In developing their “new” digital services, archives also have to bear in mind their own stakeholders, who have differing interests. For example:
• The users of archives
They have ever-rising expectations of access to the resources held in archives: access should be electronic, rapid, precise and preferably on-line.
• The records’ creators
They have invested in technical infrastructure for creating records electronically and clearly want the functionality of their records to be maintained in the archives as well, and for the long term.
• Society in general
Digital records form part of society’s memory and need to be kept for future generations as evidence and a sign of our time and generation.
• The heritage industry
In general public perception, archives are often grouped together with ‘heritage organisations’ or ‘memory organisations’, alongside libraries, museums and other institutions that “do preservation”. Many archives are making various collections accessible to the general public on-line through their digitisation programmes. Digitised material, even if created only to provide access to digital copies of original materials, tends to increase the need for digital preservation in the archives.
• Technology
Technology keeps developing and changing, making the archivists’ task of preserving access to digital resources easier and more difficult at the same time. The first serious attempts to convey archivists’ concerns about technology development to the ICT industry were made by the DLM-Forum⁴ and were met with a supportive response.⁵
• Funders
Archivists and records managers constantly need to convince their paymasters that the solutions they suggest offer a good return on investment.
Archives have occasionally suffered a temporary “loss of voice” because the answers to all the problems of managing and preserving digital records are not easy to find, particularly when the solutions sought by governments have to be clearly defined, cheap to implement, easy to use and reliable in the long term. These answers and solutions are only 4