This article was downloaded by: [University of Texas ] On: 02 October 2013, At: 14:13 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Encyclopedia of Library and Information Sciences, Third Edition
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/doi/book/10.1081/E-ELIS3

Digital Archiving
Patricia Galloway, School of Information, University of Texas at Austin, Austin, Texas, U.S.A.
Published online: 09 Dec 2009

To cite this entry: Patricia Galloway. Digital Archiving. In Encyclopedia of Library and Information Sciences, Third Edition. Taylor and Francis: New York, Published online: 09 Dec 2009; 1518-1527.
To link to this chapter: http://dx.doi.org/10.1081/E-ELIS3-120044332


Digital Archiving

Patricia Galloway School of Information, University of Texas at Austin, Austin, Texas, U.S.A.

Abstract

Digital archiving emerged during the 1990s as a compulsory support for digital recordkeeping in governments and digital publication in academia. Its concepts are governed generally by archival theory, while many of its practices have increasingly been borrowed from library and general work with digital objects. Progress in research dealing with the obstacles to reliable archiving of digital objects has been steady but slow since the early 1990s, but the basic outlines of a set of workable solutions are beginning to emerge, and real digital archives are now functioning and providing operational data to help bootstrap the process.

INTRODUCTION

Digital archiving, the practice of preserving (long-term or indefinitely) authentic digital cultural objects for present and future use, is based theoretically in archival science but draws its technological support from many areas of the computer and information sciences. Further, as archivist [...] pointed out in a keynote speech at the 2007 European Conference on Digital Libraries, digital archiving is just as vital to support the long-term holdings of digital libraries as it is for archives of unique born-digital objects.[1] Although national archives began to preserve digital data as early as the 1970s, it was not until the 1990s that [...] began to pay serious attention to the problem and research supported by grant funding and in-house research in archives and libraries began to take place. By the turn of the twenty-first century, emergent agreement on many aspects of digital archiving had enabled the creation of credible digital archives.

SIGNIFICANCE

The significance to society of digital archiving, like that of all archiving, touches everyone, from the individual to whole cultures. People are beginning to record their construction of identity through ever-multiplying forms of digital inscription, from blogs to social networking platforms to activity in virtual worlds, and just as archives have preserved paper diaries they are beginning to preserve such new expressions. Digital records of organizations of all kinds, from e-mail lists to [...] to collaboration spaces, likewise require preservation, for current accountability and [...] alike. Finally, the sum of a culture’s large scale manifestations, in spheres of government, commerce, art, and science broadly construed, constitutes the platform upon which it bases its continuity, and all of these evidences too are increasingly digital and require archiving.

The special significance of digital archiving to the disciplines of the academy is based on the fact that knowledge work itself is becoming crucially digital. As these disciplines pursue discovery, they draw upon digital evidence already available and generate additional digital evidence that others will use. Scholarly communication itself manifests increasingly in digital media, such that the very reward system of the academy is dependent upon reliable and permanent digital archiving. For information science itself digital archiving is especially important. The impact of the availability of huge datasets of Google searches and the release of the Enron e-mail corpus demonstrated that large archived corpora of all kinds of digital genres are vital to the study not only of their content but of how to study, search, summarize, index, and process their content, areas of study that lie at the heart of information science. Hence digital archiving is an indispensable element of the global cyberinfrastructure being constructed to support the movement of large parts of scholarly enquiry and human interaction to a digitally constructed environment. Without archiving of any kind, a culture is without memory; digital archiving has emerged to preserve the digital instantiations of all of our memories.

CONCEPTS, TECHNOLOGIES, AND THEORIES

Concepts

The concepts that inform digital archiving are drawn from multiple fields, but central to the practice are those of archival science. Digital archiving is not worth very much if the objects archived cannot be trusted for their genuineness. Archival science bases its concept of authenticity on

Encyclopedia of Library and Information Sciences, Third Edition. DOI: 10.1081/E-ELIS3-120044332. Copyright © 2010 by Taylor & Francis. All rights reserved.

the requirement to know the provenance or source of an object, analyzing documents and collections using the diplomatic questions who, what, in what manner, with what support, why, where, and when; in other words, establishing the context of creation and the subsequent history of the object itself, including its history while in archival custody.[2] For digital archiving, some researchers have been trying to reframe these concepts in terms of automated metadata creation and harvesting, which has led to the assertion that archiving a suitably authentic record requires archival intervention in the design of recordkeeping systems. But clearly the digital archives is charged with retaining the original bitstreams of digital objects unchanged since they were committed to archival custody or supervision. Digital archiving carries with it also the necessity for preserving and providing usable forms of digital objects so as to present to future users at least the significant properties or affordances of the original object that have been judged important to the object’s character. Hence, in addition to the original bitstream, derivative use copies are often created for digital archiving.

Technologies

Digital archiving depends on a broad range of information technologies, simply because it foresees preserving digital objects ranging in size and affordances from single files to entire systems and from simple ASCII encoding to complex dynamic multimedia objects. Because of the enormous size of the potential digital archiving task, it will be dependent on petabyte-scale storage management schemes, pioneered by the San Diego Supercomputer Center.[3] Because of the need to identify and manage digital objects individually, automated metadata harvesting rather than handmade description is required for archived files. Since it is recognized that large-scale digital archiving will probably require significant redundancy in the face of potential risks, technologies like polled peer-to-peer integrity-checking among mirrored repositories are already being used by the CLOCKSS (Controlled LOCKSS, LOCKSS = Lots of Copies Keep Stuff Safe) consortium to guarantee the authenticity of replicated holdings.[4] The challenges of file size may be somewhat mitigated through the use of lossless data compression schemes, but once compressed, resource discovery for text is dependent upon prior automated indexing, while indexing of images will be dependent on automated pattern recognition techniques. It is anticipated that new developments in digital technologies like information retrieval, data mining, and social network mapping will be used to find and reflect both original and emergent knowledge structures in digital archival collections.

Theories

Thus far the theories that underlie digital archiving are either adaptations or contradictions of such theory as has guided archival practice pertaining to paper records. Thus because the most significant archival activity has been that which has supported governments, the theory describing the production of a record of human activity has depended significantly upon a concept of hierarchical organization.[5] At a more granular level, the seventeenth-century practice of diplomatics, originally developed in order to establish the genuineness of documents on the basis of their formal characteristics, has served as a theory of record production. Although both of these theories have found some resonance with respect to the digital records of governments through the international research of the InterPARES group,[6] reservations have been expressed with respect to their applicability to digital records in other settings, such as the holdings of the records of individuals in collecting archives, because of the transformative effect of digital recordkeeping on communication in general. New concerns have been raised about the effects of digital communications on the flattening of bureaucratic hierarchies. In this context the themes of what constitutes an adequate representation of the collective cultural memory of a society and what can provide an adequate support for governmental accountability to the citizen have become important questions in planning for digital archiving.[7]

The basic physical and mathematical theories that support the technical activities of computing involved in digital archiving are important to accomplishing it, but in general they draw broadly from those of computer science. One area of active research that is especially significant to digital archiving, however, is that which pertains to large-scale information patterning as developed in information retrieval and machine learning. Not only can this research make use of the contents of large-scale digital archives for the development of theory, but actual management and use of digital archives may draw upon such theory for planning purposes and even for the purpose of selecting what to preserve in the first place, so the activities of digital archiving may be very dependent upon ongoing empirical research.

SOCIAL CONTEXTS OF DIGITAL ARCHIVING

Archiving tends to be an institutionalized and large-scale practice, situated under the control of social institutions like governments, educational institutions, and religions, the only kinds of institutions with enough power, funding, and motivation to guarantee persistence over a significant period of time. Investigation of digital archiving has therefore so far been undertaken primarily by governments and educational institutions. The transition to computer-supported work and communication in the 1980s made paper recordkeeping insufficient to support the accomplishment of government aims and educational institutions’ needs to preserve research materials and

interest in opportunities for research into the archiving process itself. But as a 2007 report for the Dutch Koninklijke Bibliotheek points out, the internationalization of scientific publication and communication points to the future likelihood that digital archiving may become an internationalized cooperative effort.[8] That report also mentions a possible risk that dependence upon a large-scale but restricted set of digital repositories, controlled by governments and perhaps outsourced to private contractors, might call the trustworthiness of digital archiving into question.[9]

Efforts so far toward digital archiving have been differently focused depending on which of the institutions of memory aims to undertake it. Libraries primarily hold published materials that exist in many places in multiple copies. As they moved to digitize their non-digital holdings, their concerns were initially for access together with preservation of the original physical object. Hence digital libraries, otherwise in many ways instantiating many of the same features as digital archives, have been less concerned with provenance (which the digitizer after all controls) and the long-term preservation of the digitized bitstream. As libraries increasingly acquire digital-original materials like books and journals, however, it is beginning to be recognized that the [...] itself must be archived in order to perpetuate its research value. At a minimum, digital libraries must at least provide somehow for perpetual access to the journals and digital datasets on which scholars significantly depend. Archives’ holdings, on the other hand, have always consisted of unpublished, unique objects, so they are primarily concerned to preserve original digital objects for as long as their remit demands, which in many cases (but not always) means forever.

Digital archiving thus represents a superset of digital librarianship, but both librarianship and archival science have supported important work contributing to digital archiving. The library world has brought to digital archiving its traditional concentration on discovery and access and its broad influence in literate countries. Significant national library initiatives have been undertaken by the National Library of Australia; by the British Library and related organizations in the United Kingdom; by the Koninklijke Bibliotheek in the Netherlands; by the Library of Congress in the United States; and by the European Union as a whole. The strengths of the archival world in this effort lie with archives’ track record in preserving information, in some cases over millennia, and their very long term perspective. National initiatives in the archival field, clearly mandated first of all by their countries’ concerns about the accountability of government digital recordkeeping, have been carried out by Archives Canada; by the National Archives and Records Administration in the United States; and by the British National Archives.

In most of these cases, national efforts are reinforced by various related nonprofit institutions. For example, in the United States an interlocking cluster of memory institutions and their supporters exists, including the Research Libraries Group, Online Computer Library Center, the Digital Libraries Federation, and Council on Library and Information Resources, as well as philanthropic institutions like the Mellon Foundation and governmental granting agencies like the National Archives’ National Historical Publications and Records Commission and the National Science Foundation, the latter of which has begun to administer a large fund appropriated for the Library of Congress under an initiative known as the National Digital Information Infrastructure and Preservation Program. Technology efforts like the Corporation for National Research Initiatives (with a focus on the U.S. national information infrastructure), the Coalition for Networked Information, and the National Space Science Data Center (which led to the development of the dominant abstract reference model for digital repositories) have brought significant skills and motivation to the table. Finally, private institutions like the Getty Foundation have brought considerable expertise and influence to bear on practice and research. In [...], Canada, and Australia efforts are much less fragmented, exhibiting tighter connections between governments and educational institutions and the more collaborative approach to research that this fosters.

The importance of a combined view including both library and archival perspectives cannot be overstated. Considering the move to mass digitization by both in order to make existing non-digital holdings widely available, together with the fact that both research libraries and archives now receive into their holdings digital-only materials that cannot have a non-digital representation, it is clear that digital archiving will ultimately unite the two fields in some way. At present it seems that the library field is still concerned with preservation in the short and medium terms only,[10] however, while most archives other than governmental ones are still primarily concerned with looming backlogs of non-digital materials.

OVERVIEW OF THE PROCESSES OF DIGITAL ARCHIVING

In 2002 the Open Archival Information System (OAIS), the basic reference model for a trusted digital repository worked out by the NSSDC (National Space Science Data Center), became ISO standard 14721.[11] The OAIS reference model frames the functions that such a repository should support; these include ingest process, data management, archival storage, administration, preservation planning, and access, and each of these is further subdivided to include subtasks. In addition, the model envisions an environment in which both the “producers” of materials to be archived and the “designated user community” for whom they are being archived will be explicitly considered by the overall management of the repository.
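
As a rough illustration of how three of the functions just named (ingest, archival storage, and access) relate to one another, the following minimal Python sketch may help. It is not drawn from the OAIS standard or from any repository software: the class and method names (`ToyRepository`, `ArchivalPackage`, `ingest`, `verify`, `access`) are hypothetical, and the use of a SHA-256 digest as the fixity value is an assumption made only for the example.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ArchivalPackage:
    """Hypothetical stored unit: the original bitstream plus minimal metadata."""
    identifier: str
    producer: str
    bitstream: bytes
    sha256: str  # fixity value recorded when custody was accepted


@dataclass
class ToyRepository:
    """Illustrative stand-in for ingest, archival storage, and access (names are assumptions)."""
    _storage: dict = field(default_factory=dict)

    def ingest(self, identifier: str, producer: str, bitstream: bytes) -> ArchivalPackage:
        # Ingest: accept a submission from a producer and record its fixity value.
        digest = hashlib.sha256(bitstream).hexdigest()
        package = ArchivalPackage(identifier, producer, bitstream, digest)
        # Archival storage: keep the bitstream exactly as it was received.
        self._storage[identifier] = package
        return package

    def verify(self, identifier: str) -> bool:
        # Recompute the digest and compare it with the value recorded at ingest.
        package = self._storage[identifier]
        return hashlib.sha256(package.bitstream).hexdigest() == package.sha256

    def access(self, identifier: str) -> bytes:
        # Access: release a copy only if the stored bitstream is still unchanged.
        if not self.verify(identifier):
            raise ValueError(f"fixity check failed for {identifier}")
        return self._storage[identifier].bitstream


repo = ToyRepository()
repo.ingest("memo-001", "Records Office", b"Minutes of the meeting")
copy_for_user = repo.access("memo-001")
```

The sketch keeps the original bitstream untouched and checks it before every release; data management, administration, and preservation planning are deliberately left out to keep the example short.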

The reference model is not a repository itself, but a kind of boundary object which has become a basic reference point for discussions of digital archiving by most library and archives communities, including those interested in preserving scientific data, business assets, government recordkeeping, and cultural resources. Several repository software systems have been developed that adhere to its features, in both the open-source arena (DSpace, Fedora, DAITSS) and commercially (IBM’s DIAS).[12] Additional efforts beyond the original reference model document have included a substantial document detailing the ingest process and an ongoing project to define the requirements for certification of compliance both to the OAIS requirements and to further requirements for a “trusted digital repository.”[13,14]

The mandatory responsibilities outlined in the OAIS model, listed below, clearly mirror in significant ways the basic archiving steps recognized by the archival community as core responsibilities, added in square brackets:

1. Negotiate for and accept appropriate information from information producers [appraisal and selection].
2. Obtain sufficient control of the information to meet long-term preservation objectives [acquisition].
3. Determine the scope of the [...]’s user community [management].
4. Ensure that the preserved information is independently understandable to the user community, in the sense that the information can be understood by users without the assistance of the information producer [arrangement and description].
5. Follow documented policies and procedures to ensure that the information is preserved against all reasonable contingencies, and to enable dissemination of authenticated copies of the preserved information in its original form, or in a form traceable to the original [preservation].
6. Make the preserved information available to the user community [reference and access].[12]

In fact the authors of the OAIS model have indicated that there is no reason that the model cannot support a physical archives as well as a digital one, or indeed a [...] containing non-textual objects. At the same time, the details of these responsibilities go beyond traditional archival practice to fit digital affordances that non-digital objects lack, and as such they have challenged archival thinking significantly, as witness a 2006 assertion by the president of the Society of American Archivists, Richard Pearce-Moses, that the tasks performed by archivists remain the same in spirit, but how they are done must change with respect to digital records.[15] I will accordingly address each of the canonical archiving tasks in sequence to unpack these details and how they have been tackled (and modified) by early attempts to implement the OAIS model and its children.

SELECTION AND APPRAISAL

Conventional selection and appraisal have been in turmoil over the past 20 years as a result of questions raised by the tasks of digital archiving.[16] Initially, it was accepted that there would be no change in how digital records were selected and then appraised for archiving. In the United States, at least in government settings, the rhetoric of avalanches and floods had been applied to twentieth-century paper records as a rationale for appraising up to 99% of them as unworthy of retention due to space considerations in the practice of government archiving. If anything, it was believed that the digital tsunami would call for even more swingeing cuts on quantity, in spite of the very much lessened space requirements for digital records, if users were ever to make any sense of retained materials.[17] In Europe, where there was more of a tendency to leave selection of materials for preservation to the creating agencies, the focus went in another direction: selection and appraisal depended upon the ability to establish provenance and authenticity for digital records.[18]

But whereas it seems fairly straightforward to apply traditional archival practices to the selection of digital objects that look and apparently behave like paper ones, the situation is not nearly so clear when it comes to objects whose affordances cannot be rendered on paper. E-mail can certainly be printed out, but in doing so much of its metadata and all of its manipulability will be lost. Static Web sites consisting of multiple pages, each of which contains all of the material to be displayed, can also be captured after a fashion through screenshots and source code, but this is no longer the case with dynamic Web sites that may draw materials from many servers and whose content may primarily come from a dynamic database that is constantly updated. Finally, to move to the extreme end of the spectrum of complexity, persistent virtual worlds like those of videogames and collaboration spaces consist not only of visible structures and programmed characters or functions, but of the activity of players or participants interacting with one another, and none of this can be rendered to any useful degree on paper.[19] In such cases a great deal more research remains to be done to permit us to say with confidence just what should be saved to meet one or another documentation goal. Analyses simply of communication patterns in the Enron e-mail materials demonstrated that with a complete collection, significant silences, like the failure to communicate of an actor later known to have had significant influence on the course of events, become visible.[20] This example and ongoing research suggest that appraisal of digital archival materials will benefit significantly from the use of digital tools for performing analysis of structure in a given corpus of material (e.g., to define an archival “series”). The complexities of automating creator-side classification of records through a records management system have proved to be so great that some

observers have suggested that rather than appraise piecemeal in that way, it would be cheaper in the long run to keep everything.

ACQUISITION

Whether in fact digital archiving will always mean acquisition for preservation in one or more designated locations by an archiving authority has been the first issue up for discussion under the head of acquisition. The so-called postcustodial position has held that it is unrealistic for archival authorities to take custody of and preserve especially business records, and researchers have argued that archivists should instead work with the creating entity to supervise and audit digital archiving in place.[21] Others have argued that to fail to take custody of archivable materials is to fail in the historical duty of an archives, which is to be a trustworthy third party preserving materials without bias.[22] Although in many situations postcustodialism has become the default response when archival authorities have been denied adequate funding to create a digital archiving service, most theoretical discussion of digital archiving takes the second position and assumes that materials worthy of cultural investment should be preserved in archival custody.

With that being the case, the acquisition process for digital materials follows several by now well-defined general steps, modeled to some degree on conventional archival practice. Having established as a result of an appraisal process that a set of digital materials is desirable, the digital archives must arrange to negotiate with the creating entity what will be transferred, how it will be formatted, and what kind of metadata will come with it; in OAIS terms, this is negotiation of a Submission Information Package (SIP) agreement. It assumes that, as has been the case traditionally, digital archivists will take care of the accessioning of the materials into the digital archives. There is, however, another approach, and it has been instantiated in so-called institutional repositories, digital libraries, or archives created initially by universities to capture and highlight the intellectual output of their faculties. Thus although adhering to the OAIS framework, the DSpace repository software was intended by MIT (Massachusetts Institute of Technology) to serve it as an institutional repository for the digital materials produced by its academic departments and research institutes. It was assumed that each of these units would have a contact person who would assist faculty members to self-archive their materials, while the university library would attend to cataloging and more complex technical tasks. Institutional repositories have been freely available for experimentation since 2002, when the DSpace code was released to the public, and digital archivists are still waiting for the rush that never came. Many thousands of dollars have been spent on surveys and studies, but the reluctance of individual faculty to self-archive (not significantly different from the reluctance of office workers to classify their e-mail into a system) has still not been adequately explained.[23]

The process of ingest was very simply described in the original OAIS document, which left preprocessing steps to the records creators. The results of the deployment of institutional repositories have shown that digital self-archiving, in the absence of built-in repository software that adds ideal metadata on creation and ingest processes that automatically vet the materials submitted, still requires additional activity by digital archivists to deal with these issues. Research is ongoing to move the automated interface between repository and creator closer to the creator. At present there is also a significant amount of research aimed at simplifying the creation of the SIP itself, to consist of a set of related files packaged with a “manifest” describing its internal relationships and each of the components in the package.[24,25] With the emergence of multiple architectures built on the OAIS model, it is believed that creating a sort of universal SIP, ingestible into multiple repositories, will make future replication of collections for security purposes easier.

ARRANGEMENT AND DESCRIPTION

It is an article of faith that libraries classify their content by subject and archives classify by provenance; but in the world of digital archiving, as both are discovering, one can have it both ways. Conventional archival arrangement attempts to keep materials from the same source or [...] together (unitary provenance) and to mirror the arrangement used by the creator to make sense of his/her own records (original order), while providing for the potential user of the records a so-called finding aid designed to give some idea of what this structure is and what may be found in each grouping of materials. This kind of aggregational procedure was necessary when records had only a physical form and could only be in one place at once, but in the case of digital records, which must be managed individually, these requirements change.

Where the materials in question are textual, it is possible to know in far greater detail than describers of paper records could what exactly is to be found in the records themselves. What this means for digital archiving is that although the digital archivist does take care to recover the arrangement of the materials in their native system context as seen and used by the creator (by recovering, e.g., a record of the original file system listing of the files, itself of course a virtual view), it is not necessary to preserve the digital materials in their original order as long as that order can be reconstructed for use. It is even possible to do what archivists could not do without copying whole collections, and that is to create multiple virtual orderings or presentations of a collection to enhance its research use. In addition, search engines are already being used to allow users

to drill down into collections to find just what they want, without concerning themselves with the overview of collection-level metadata unless they wish to do so.

Traditional archival descriptive practice has resisted these ideas until recently. In order to draw attention to paper collections, archivists in the 1980s began to create a markup convention, Encoded Archival Description (EAD), for the online presentation of finding aids.[26] Although EAD is as extensible as its XML substrate allows and can well point directly to digital archival objects at the lowest level of its traditional hierarchical presentation, it is still seldom used to describe digital collections except where an existing institutional practice is in place. The fact is that even in the case of paper records, the artisanal process of archival description of unique collections using a controlled descriptive practice as instantiated in one of the myriad standards (MAD, RAD, ISAD(G), DACS, etc.) cannot benefit from collective cataloging as books can, and this has been a major factor in the cost of accessioning archival collections. Even for paper collections this kind of work is beginning to be seriously curtailed simply because it cannot be justified in terms of delays to access.[27] The widespread adoption of Dublin Core metadata for digital library and institutional repository collections has presented a similar though not so extreme bottleneck. Archival researchers are experimenting with so-called Web 2.0 possibilities like user tagging that would share some of the work.[28] Another solution may present itself in the form of automation of metadata harvesting. Most digital creation programs that people use actually insert some metadata into the created file itself, usually for the convenience of the program or the system rather than for the creator. With increasing pressure from records management needs of all kinds, vendors of programs have begun to add more informative metadata to created files, although they have not usually agreed to open metadata standards except in the case of some image and sound formats where such standards were commercially profitable. This is, however, beginning to change under the pressure of the legalization

of archivists since the threat of a “Digital Dark Age” began to be apprehended. From the beginning archivists presumed that it was crucial to determine whether their duty to the preservation of public memory in digital form could even be carried out at all over the long term. In 2003, Spindler provided a list of seven challenges of “electronic record” preservation:

1. Physical degradation of storage media.
2. Physical obsolescence of storage media.
3. Incompatibility/noninteroperability of storage media.
4. Software, operating system, or encoding incompatibility/noninteroperability.
5. Human error/vandalism.
6. Backups and snapshots.
7. Metadata.[30]

Only a few years later, although all of these challenges are still recognized, some (notably 1, 2, 4, and 5) have been set aside as eminently solvable (and even solved) by existing information technology practices, while the remaining two, version incompatibility and metadata, have moved to the center of preservation concerns (if not those of technology specialists).[31]

Since digital objects are mediated by technology, changes in the technology can make them unavailable, and that fact is at the heart of the most intractable problem of [...]: the ability to render a digital object and make it available in the future beyond paradigm shifts in software and hardware function. Digital objects, whether logically conceived as arrays (text), matrices of arrays (tables), irregularly bounded arrays (sound, still and moving image), complex objects (groupings of mixed types), or really complex objects (complex objects with a temporally changing makeup), nevertheless are digitally represented as a sequence of bits, some of which (metadata, markup) may be used to indicate the nature of the object(s) so represented. With no indication of what the bits represent or how they are to be arranged to render the object, it is impossible to recover the object without extremely complex crypto-

Downloaded by [University of Texas Libraries] at 14:13 02 October 2013 of digital records as evidence and the development of legal graphic techniques and the possibility of assuming that the digital discovery methods, added to legislated require- object itself comes from a culture whose representational ments for records retention in the industries that buy most conventions are understood. For this reason proprietary and of the software. Thus for the purposes of digital archiving constantly changing software encoding schemes are the it is becoming increasingly feasible to harvest metadata bane of the digital archivist, and obtaining at least limited from digital files in order to automate their packaging for access to them for archival purposes is considered compul- Design digital archiving, while the documentation of file-system sory in the long term. –Digital structures has made it possible to use digital tools to parse This is not to say that carrying digital objects into the the structure of digital collections in situ before removal future has not been done for years already. Widespread from their original technological contexts.[29] use of databases has required it since the 1960s and the U.S. National Archives and Records Administration has been preserving such files since the 1970s, setting up one PRESERVATION of the first standard articles of faith: if the digital object is worth preserving, it must be preserved in its original Preservation of digitally archived objects has received bitstream form. This practice of nontransformative bit- much of the attention and project-based experimentation stream preservation provides not only the guarantee of 1524 Digital Archiving

the authenticity of the object, but also provides the plat- been used in the past. So far, however, the nontransforma- form for creation of transforms that may in time be re- tive methods have been relatively little tested. quired by the end user. Judging these methods requires a return to the theoret- Transformative practices have been used routinely in ical realms of traditional archiving, because preservation many environments, usually for short-term use. Most per- itself and how it is viewed depends significantly upon the sonal computer users who spend much time creating digi- social ends that it is supposed to support. From the archi- tal objects will at some time upgrade one or more pieces val perspective, authenticity—that a digital object has not of software, and often it will be possible to use the new undergone change—is vital, but the salient question is just software to carry out what over time becomes a practice what is meant by that in the digital environment. One of serial conversion: using the conversion affordances favored approach in government builds on the paper ana- provided by the new software to modify the digital object logue, where in spite of deterioration enough of the sig- so that it can be rendered in the new environment. The nificant properties of the object (e.g., human-readable if drawback of this practice for archival use is that over time slightly faded writing) remain for the object to be judged the affordances of the digital object will change subtly, authentic. By this analogy some of the properties of a depending upon what the software vendor thinks is worth digital object (original font, some formatting) can be lost keeping or indeed adding, and it will be difficult and as well, as long as enough properties (at minimum, the expensive to discern just what changes have been made. 
“content”) remain.[6] Agreement on what those are or Hence if transformative practices are to be applied for ought to be has not yet been achieved.[36] This is not preservation purposes, digital archivists prefer to be in surprising, because the definition of significant properties control of the transformation themselves. Some have cho- depends on who is looking, and a broad range of stake- sen the option of conversion to one of a few standard open holder groups (the public, academics, content creators, formats upon accession by the repository, using either a content owners, digital libraries and archives, software proprietary conversion tool or one written by the preser- companies, and computer scientists, to name a sample) vers (and coupled with retention of the original bitstream have different ideas as to what they expect, want, and as well).[32] At present the most favored formats for this wish to possess of digital objects. Furthermore, the broad kind of conversion are XML and Adobe’s PDF/A, both and growing range of complex and really complex digital now open formats and both presumed to have long lives objects are likely to be candidates for preservation, most ahead of them and accordingly no need for additional con- of them existing in proprietary formats, guarantees that no version work for a long time. Another strategy is to pre- matter how many and good the solutions found, they will serve the original bitstream without converting it until such never be the last word. 
time as someone wants to use it but it will no longer render in a given standard environment; at that point migration on demand will be undertaken, whereby a conversion program REFERENCE AND ACCESS that can convert all objects of the same format will be written and made available to potential users.[33] In physical archives, where patrons seek access to physi- Two additional strategies, both of them nontransforma- cal materials, archivists have traditionally considered that tive, have also been suggested and modestly tested. Emu- the finding aid must of necessity be reinforced by the lation, familiar to the world of software creation and expertise of the reference archivist. For this reason, apart players of older videogames, means leaving the original from marking up finding aids in EAD to put them online, bitstream alone and instead providing a program for the many archivists have tended to believe that they need do

Downloaded by [University of Texas Libraries] at 14:13 02 October 2013 new environment that supports the applications program- no more. Others, however, have learned from libraries ming interface constituted by the digital object’s original grappling with the Internet and have moved to adopt environment (viewer tool, operating system, and hard- both more uniform library-style cataloging and, in the ware) and thus displays the object as it was originally case of digital materials, have begun to experiment with –Digital Design experienced with all of its features intact. Jeff Rothenberg providing greatly amplified access to the resources them- has long been an evangelist for this method as a kind of selves. Research at the NARA is investigating how to gold standard for preservation.[34] In a variation on this incorporate so-called Web 2.0 tools, especially search theme, Raymond Lorie has proposed that each digital engines, as plug-ins into the in-progress design of the object format be provided with a program capable of Electronic Records Archives.[37] Aggregating metadata rendering it perceivable, written to run on a Universal from multiple repositories is being made easier and more Virtual Computer (UVC). The UVC can then be mapped detailed by the further development of the Open Archives onto current computing environments by means of a sin- Initiative Protocol for Metadata Harvesting (OAI-PMH) gle universal schema interpreter.[35] Significant studies to support aggregations, and this work [Open Archives have been undertaken to evaluate each of these options, Initiative Object Reuse and Exchange (OAI-ORE)] is but the fact is that there is likely to be a use in the long likely to have great usefulness for the fonds-based aggre- term of all of these strategies, just as many of them have gates that digital archives are likely to produce.[38] Digital Archiving 1525
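As a rough illustration of the metadata aggregation that OAI-PMH supports, the following Python sketch builds a ListRecords request and extracts identifiers and Dublin Core titles from a response. It is a minimal sketch under stated assumptions, not a description of any particular repository's setup: the endpoint URL, the record identifier, and the embedded sample response are hypothetical, and no network request is made. Only the verb, the `oai_dc` metadata prefix, and the XML namespaces come from the OAI-PMH and Dublin Core specifications.

```python
"""Sketch of one OAI-PMH harvesting step (no network access is performed)."""
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and the Dublin Core element set.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response, shaped like an OAI-PMH ListRecords reply.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repo-a.example.org:1</identifier>
        <datestamp>2007-12-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Departmental correspondence, 1998</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for a repository's OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec  # optionally restrict the harvest to one set
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return a list of (identifier, title) pairs and the resumption token."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        records.append((identifier, title))
    # A non-empty token means the repository has more records to send.
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token or None)


if __name__ == "__main__":
    # "https://repo-a.example.org/oai" is a made-up endpoint for illustration.
    print(list_records_url("https://repo-a.example.org/oai"))
    print(parse_records(SAMPLE_RESPONSE))
```

An aggregator would repeat the request with the returned resumption token until none is supplied, and would do so for each repository it harvests; that loop is omitted here to keep the sketch free of network calls.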

Digital archivists will also have to solve serious questions of access. Digital objects, when made available over the Internet, can potentially be copied at will by the user. This is fine when public information from government archives is at issue, but even government archives may not make all their digital holdings available online immediately, as when they take custody of records that are restricted for some period of time. Collecting archives, however, which frequently hold materials for which full rights are not yet available to them, will have to consider such options more carefully. Some have restricted digital object access to controlled workstations within the archives itself, but under such circumstances it is unlikely that over time users will tolerate being forced to copy materials by hand when they may be permitted within fair use to photocopy paper resources. And if copies are not legally permissible, ways will need to be found to provide a collection of digital tools to support the kinds of analysis that patrons want to carry out and to allow the patron to take away at least his/her analysis results.

MANAGEMENT, INFRASTRUCTURE, EXOSTRUCTURE

In addition to the technology-supported aspects of digital archiving, serious solutions must be found to the human equations involved in its management, including policies, institutionalization, and the environment in which digital archiving must take place. One fundamental issue, as suggested above, is that of intellectual property (IP). Much of the literature of digital archiving proceeds as though IP were not an issue or will not be so for long; yet one of the strongest drivers for digital archiving development has been the need to guarantee the persistence of scholarly journals in an environment where the publishers are loath to part with their interest in the content.[10] Threats to such interests and indeed to the public interest in preserving materials without copyright restrictions, in the form of inadvertent or intentional damage or loss of digital materials, make security for digital repositories a central concern. As the OAIS model and its successors are increasingly developed to add additional standardized detail to the model for a trustworthy digital repository, requirements in both of these areas will be addressed by an emerging certification regime.[14]

The infrastructure that must be provided for digital archiving will be managed locally but to be of much use must be interoperable. The repositories themselves will likely not be based upon the same software for reasons of risk, but they must respond to the same standards, including those required for managing IP commitments, authentication for access, and certification of such interoperability. IP commitments are not the only requirement for which metadata standards must be supported by repository architecture; metadata standards are needed, as we have seen, to support resource discovery, complex digital object structure, provenance/context, preservation, and even usage records.[39]

Digital archiving exists in a social context, an “exostructure” or ecology that supports the act of digital archiving and its ongoing development. National and international studies of cyberinfrastructure requirements for the support of scholarly work in the twenty-first century have come to the conclusion that the guarantee of the preservation of the world’s digital heritage will require a relatively small, geographically separated, and trusted network of digital repositories, or perhaps sets of such repositories devoted to different interests or requirements.[40,41] Bringing these networks into being entails a delicate diplomacy among the governments and educational institutions of the developed world, not unlike the emergence of big science during the Cold War of the twentieth century, to create another kind of defense of societies and cultures.

It would be impossible to follow the development of this broad complex of alliances and commitments without the digital network connecting the participants, because the very development is happening essentially online. A certain dramatis personae is beginning to emerge, meeting regularly in an established round of conferences and invitational workshops but also reporting their work online, whether as a condition of the receipt of grant moneys or to remain in the game. Initial exposure of research, following the information science model, appears in the (now online) proceedings of refereed conferences like JCDL (Joint Conference on Digital Libraries), ECDL (European Conference on Digital Libraries), iPres (International Conference on Preservation of Digital Objects), and Open Repositories, as well as restricted groups like the DLF (Digital Library Federation); more detailed work is made available through frequently appearing open online journals like D-Lib, JoDi, First Monday, RLG DigiNews, and Ariadne; while academically fungible research is published more conventionally in journals like JASIST, Journal of Internet Cataloging, International Journal of Digital Curation, and Archival Science. Most of this is kept track of thanks to the Australian National Library and the British Digital Preservation Coalition, which together sponsor the Preserving Access to Digital Information (PADI) gateway and archive of the cumulative literature of digital archiving.

CONCLUSION: FUTURE TRENDS

Digital archives are beginning to exist in some numbers, especially as digital library–archives hybrids, and commitments are beginning to be made to the collaborative work that will be vital to guaranteeing the security of their contents for the foreseeable future. Research to seek solutions to outstanding problems is likely to be shaped by several trends:

1. Governments and educational institutions will continue to dominate the development of digital archives, but it is likely that governments especially will begin to outsource production work to emergent digital archives management businesses.
2. Content owners will somehow get their content archived, and will try to make it happen with as little cost as possible to themselves, preferably through governmental initiatives that will also favor the continued extension of IP ownership.
3. Hardware and software producers will continue to make change in order to make profit, but moving to standards-based formats may increasingly be necessary in order to secure the desired outcome above.
4. Artificial intelligence, neural network, and machine learning methods will be harnessed to assist in improving the results of information retrieval access to the large-scale contents of digital archives, leading to developments that might be called emergent classification or self-describing archives and tools for on-the-fly creation of virtual classifications.
5. Public response to the centralization and gating of digital archiving will parallel current concerns about media concentration. Requirements that digital archives pay for themselves will ironically lead to demands for pay-per-view for almost all digitally archived materials.
6. The availability of open-source digital archiving environments may lead to small-scale personal digital archives or local grass-roots peer-to-peer digital archiving on a scale unimaginable before the creation of the Internet.

Many other scenarios are possible; but the good news is the last suggestion. As living on the Internet becomes less of a second life and more of a first one for more people, individuals will increasingly take on these issues as their own concerns and perhaps their own responsibility.

REFERENCES

1. Ross, S. Digital Preservation, Archival Science and Methodological Foundations for Digital Libraries. In Keynote speech presented at ECDL, Budapest, Hungary, September 16–21, 2007. http://www.ecdl2007.org/Keynote_ECDL2007_SROSS.pdf (accessed December 2007).
2. Duranti, L. Diplomatics: New Uses for an Old Science; Scarecrow Press: Lanham, MD, 1998.
3. Moore, R.; Marciano, R.; Wan, M.; Sherwin, T.; Frost, R. Towards the interoperability of web, database, and mass storage technologies for petabyte archives. In Proceedings of the Fifth NASA GSFC Conference on Mass Storage Systems and Technologies, Goddard Space Flight Center: College Park, MD, September 1996.
4. Reich, V.; Rosenthal, D.S.H. LOCKSS: A permanent web publishing and access system. D-Lib Mag. 2001, 7 (6). http://www.dlib.org/dlib/june01/reich/06reich.html (accessed December 2007).
5. Yates, J. Control Through Communication; Johns Hopkins: Baltimore, MD, 1989.
6. MacNeil, H. Providing grounds for trust II: The findings of the authenticity task force of InterPARES. Archivaria 2002, 54 (2), 24–58.
7. Cook, T. Macroappraisal in theory and practice: Origins, characteristics, and implementation in Canada, 1950–2000. Arch. Sci. 2005, 5 (2), 101–161.
8. Hoorens, S.; Rothenberg, J.; van Orange, C.; van der Mandele, M.; Levitt, R. Addressing the Uncertain Future of Preserving the Past: Towards a Robust Strategy for Digital Archiving and Preservation. RAND Technical Report prepared for the Koninklijke Bibliotheek; RAND Corporation: Santa Monica, CA, 2007. http://www.rand.org/pubs/technical_reports/2007/RAND_TR510.pdf (accessed December 2007).
9. Moss, M.; Ross, S. Educating Information Management Professionals: The Glasgow Perspective; DigCCurr: Chapel Hill, NC, April 18–20, 2007. http://www.ils.unc.edu/digccurr2007/papers/rossMoss_paper_6-1.pdf (accessed December 2007).
10. Kenney, A.; Entlich, R.; Hirtle, P.; McGovern, M.; Buckley, E. e-Journal Archiving Metes and Bounds: A Survey of the Landscape. Council on Library and Information Resources: Washington, DC, September 2006. http://www.clir.org/pubs/reports/pub138/contents.html (accessed December 2007).
11. Consultative Committee for Space Data Systems. Reference Model for an Open Archival Information System (OAIS); Blue Book; CCSDS: Washington, DC, January 2002. http://public.ccsds.org/publications/archive/650x0b1.pdf (accessed December 2007).
12. Lavoie, B.F. The Open Archival Information System Reference Model: Introductory Guide. OCLC: DPC Technology Watch Series Report 04-01, January 2004. http://www.dpconline.org/docs/lavoie_OAIS.pdf (accessed December 2007).
13. Consultative Committee for Space Data Systems. Producer-Archive Interface Methodology Abstract Standard, Blue Book; CCSDS: Washington, DC, May 2004. http://public.ccsds.org/publications/archive/651x0b1.pdf (accessed December 2007).
14. Ambacher, B.I. Government archives and the digital repository audit checklist. J. Digital Inform. 2007, 8 (2). http://journals.tdl.org/jodi/article/view/190/171 (accessed December 2007).
15. Pearce-Moses, R. Janus in cyberspace: Archives on the threshold of the digital era. Presidential address; Society of American Archivists annual meeting, Washington, DC, July 31–August 5, 2006; http://www.lib.az.us/diggovt/presentations/Janus_abridged.pdf (accessed December 2007).
16. Craig, B. Archival Appraisal: Theory and Practice; K.G. Saur: Munich, 2004.
17. Henry, L.J. Schellenberg in cyberspace. Am. Archivist 1998, 61 (2), 309–327.
18. Duranti, L. The concept of appraisal and archival theory. Am. Archivist 1994, 57 (2), 328–344.
19. Botticelli, P. Records appraisal in network organizations. Archivaria 2000, 49 (1), 161–191.

20. Heer, J. Exploring Enron: Visualizing ANLP results. http://jheer.org/enron/v1/ (accessed December 2007).
21. Bearman, D.; Sochats, K. Functional requirements for evidence in recordkeeping: The Pittsburgh Project. In Documents reporting on NHPRC Grant 93-030. http://www.archimuse.com/papers/nhprc/BACartic.html (accessed December 2007).
22. Eastwood, T. Should creating agencies keep electronic records indefinitely. Arch. Manus. 1996, 24 (2), 256–267.
23. Markey, K.; Young, S.; St Jean, B.; Kim, J.; Yakel, E. Census of Institutional Repositories in the United States: MIRACLE Project Research Findings; Council on Library and Information Resources: Washington, DC, February 2007. http://www.clir.org/pubs/reports/pub140/contents.html (accessed December 2007).
24. Smorul, M.; McGann, M.; JaJa, J. PAWN: A policy-driven software environment for implementing producer-archive interactions to support long term digital preservation. IS&T’s Archiving, May 21–24, 2007; Arlington, VA, 2007. http://adaptwiki.umiacs.umd.edu/twiki/pub/Lab/Papers/pawnrchiving07-ploaded.pdf (accessed December 2007).
25. METS Editorial Board. Metadata Encoding and Transmission Standard: Primer and Reference Manual, version 1.6; Library of Congress: Washington, DC, September 2007. http://www.loc.gov/standards/mets/METS Documentation final 070930 msw.pdf (accessed December 2007).
26. Pitti, D.V.; Duff, W., Eds. Encoded Archival Description on the Internet; Haworth Press: Binghamton, NY, 2002.
27. Greene, M.; Meissner, D. More product, less process: Revamping traditional archival processing. Am. Archivist 2005, 68 (2), 208–263.
28. Yakel, E.; Reynolds, P. The next generation finding aid: The polar bear expedition digital collections: A case study in reference and access to digital materials. New Skills for a Digital Era; National Archives and Records Administration: Washington, DC, May 31–June 2, 2006. http://rpm.lib.az.us/NewSkills/CaseStudies/8_Yakel_Reynolds.pdf (accessed December 2007).
29. National Library of New Zealand. Metadata Extraction Tool Developer’s Guide, version 3.0. http://meta-extractor.sourceforge.net/meta-extractor-developers-guide-v3.pdf (accessed December 2007).
30. Spindler, R.P. Electronic records preservation. ELIS 2003, 1016–1022.
31. Rosenthal, D.S.H. Format obsolescence: Scenarios. http://blog.dshr.org/2007/04/format-obsolescence-scenarios.html (accessed December 2007).
32. Public Record Office Victoria. Victorian Electronic Records Strategy: Final Report, 1998; http://www.prov.vic.gov.au/vers/pdf/final.pdf (accessed December 2007).
33. CURL Exemplars in Digital Archives Project. CEDARS Guide to Digital Preservation Strategies; 2002. http://www.leeds.ac.uk/cedars/guideto/dpstrategies/dpstrategies.html (accessed December 2007).
34. Rothenberg, J. Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation; Council on Library and Information Resources: Washington, DC, January 1998. http://www.clir.org/pubs/reports/rothenberg/contents.html (accessed December 2007).
35. Lorie, R.A. A project on preservation of digital data. RLG DigiNews 2001, 5 (3). http://digitalarchive.oclc.org/da/ViewObjectMain.jsp;jsessionid=84ae0c5f8240b4ad099c48e7470eafdbc194a6eecc7a?fileid=0000070519:000006287672&reqid=15674-feature2 (accessed December 2007).
36. Wilson, A. Significant Properties Report, 10/14/2007; JISC InSPECT Project. http://www.significantproperties.org.uk/documents/wp22_significant_properties.pdf (accessed December 2007).
37. Nguyen, Q.; Le, D. Search framework for a large digital records archives; Digital Libraries Federation Spring Forum; Pasadena, CA, April 23–25, 2007. http://www.diglib.org/forums/spring2007/spring2007abstracts.htm (accessed December 2007).
38. Van de Sompel, H.; Lagoze, C.; Bekaert, J.; Liu, X.; Payette, S.; Warner, S. An interoperable fabric for scholarly value chains. D-Lib Mag. 2006, 12 (10). http://www.dlib.org/dlib/october06/vandesompel/10vandesompel.html (accessed December 2007).
39. Lavoie, B.; Gartner, R. Preservation metadata. DPC Technology Watch Series Report 05-01. http://www.dpconline.org/docs/reports/dpctw05-01.pdf (accessed December 2007).
40. National Science Foundation. Revolutionizing Science and Engineering through Cyber-Infrastructure: Report of the National Science Foundation Blue-Ribbon Advisory Panel on Cyberinfrastructure. NSF: Arlington, VA, 2003. http://www.communitytechnology.org/nsf_ci_report/report.pdf (accessed December 2007).
41. American Council of Learned Societies. Our Cultural Commonwealth: The Final Report of the American Council of Learned Societies Commission on Cyberinfrastructure for the Humanities & Social Sciences; ACLS: New York, 2006. http://www.acls.org/cyberinfrastructure/OurCulturalCommonwealth.pdf (accessed December 2007).

BIBLIOGRAPHY

1. Blouin, F.X., Jr.; Rosenberg, W.G., Eds. Archives, Documentation, and Institutions of Social Memory: Essays from the Sawyer Seminar; University of Michigan Press: Ann Arbor, MI, 2006.
2. Gilliland[-Swetland], A.J. Enduring Paradigm, New Opportunities: The Value of the Archival Perspective in the Digital Environment. Council on Library and Information Resources: Washington, DC, February 2000. http://www.clir.org/pubs/reports/pub89/contents.html (accessed December 2007).
3. Jones, R.; Andrew, T.; MacColl, J. The Institutional Repository; Chandos Publishing: Oxford, 2006.
4. van Dijck, J. Mediated Memories in the Digital Age; Stanford University Press: Stanford, 2007.